SQL: How would you split 100,000 records from an Oracle table into 5 chunks?


Solution 1

If you just want to assign the values 1-5 to equal-sized groups (the sizes differ by at most one row), then use ntile():

select t.*, ntile(5) over (order by NULL) as num
from (select t.*
      from t
      where rownum <= 100000
     ) t;
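To see how NTILE(5) sizes the groups, here is a small Python model of the rule Oracle documents for NTILE (an illustration, not Oracle code): with n rows and k buckets, each bucket gets n // k rows and the leftover n mod k rows go one each to the lowest-numbered buckets. The function name `ntile_buckets` is made up for this sketch.

```python
from collections import Counter
from typing import List


def ntile_buckets(n_rows: int, buckets: int) -> List[int]:
    """Model of NTILE(buckets): the 1-based bucket number of each ordered row.

    Each bucket gets n_rows // buckets rows; the first n_rows % buckets
    buckets get one extra row, so bucket sizes differ by at most one.
    """
    base, extra = divmod(n_rows, buckets)
    result: List[int] = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result


if __name__ == "__main__":
    print(Counter(ntile_buckets(100_000, 5)))  # five buckets of 20,000 rows
    print(Counter(ntile_buckets(100_003, 5)))  # buckets 1-3 get one extra row
```

With exactly 100,000 rows every bucket holds 20,000, which is what the question asks for.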

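If you want to insert into 5 different tables, then use insert all. Since there is no VALUES clause, each target table receives every column of the select list, including num, so t1 to t5 need a matching column (or give each INTO an explicit column list and VALUES clause):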

insert all
    when num = 1 then into t1
    when num = 2 then into t2
    when num = 3 then into t3
    when num = 4 then into t4
    when num = 5 then into t5
    select t.*, ntile(5) over (order by NULL) as num
    from (select t.*
          from t
          where rownum <= 100000
         ) t;
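INSERT ALL tests every WHEN clause for each row and inserts into every target whose condition is true (INSERT FIRST, by contrast, stops at the first match). Because the five num = k conditions are mutually exclusive, each row lands in exactly one table. A rough Python model of that routing, with lists standing in for t1 to t5 (`insert_all` and the sample rows are illustrative only):

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, int]  # (id, num): a stand-in for a row plus its NTILE bucket


def insert_all(rows: List[Row],
               clauses: List[Tuple[Callable[[Row], bool], str]]) -> Dict[str, List[Row]]:
    """Model of INSERT ALL: evaluate every WHEN clause for every row and
    append the row to each target whose condition holds."""
    targets: Dict[str, List[Row]] = {name: [] for _, name in clauses}
    for row in rows:
        for condition, name in clauses:
            if condition(row):
                targets[name].append(row)
    return targets


if __name__ == "__main__":
    # 100 rows with num assigned in 5 consecutive buckets, like NTILE(5)
    rows = [(i, (i - 1) // 20 + 1) for i in range(1, 101)]
    clauses = [(lambda r, k=k: r[1] == k, f"t{k}") for k in range(1, 6)]
    tables = insert_all(rows, clauses)
    print({name: len(r) for name, r in tables.items()})
```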

Solution 2

A bit harsh, downvoting another fair question.

Anyway, NTILE is new to me, so I wouldn't have discovered that were it not for your question.

My way of doing this, the old-school way, would have been to MOD the rownum to get the group number (0 to 4), e.g.:

select t.*, mod(rn,5) as num
from (select t.*, rownum rn
      from t
      where rownum <= 100000
     ) t;
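The MOD approach deals the rows out round-robin: row 1 goes to group 1, row 5 to group 0, row 6 back to group 1, and so on. The groups come out equal in size, but each one is interleaved across the whole range rather than being a contiguous block. A quick Python check of that arithmetic for 100,000 row numbers (plain arithmetic standing in for MOD(rownum, 5); `mod_groups` is a made-up name):

```python
from collections import defaultdict
from typing import Dict, List


def mod_groups(n_rows: int, groups: int) -> Dict[int, List[int]]:
    """Assign row numbers 1..n_rows to group rn % groups, like MOD(rownum, groups)."""
    result: Dict[int, List[int]] = defaultdict(list)
    for rn in range(1, n_rows + 1):
        result[rn % groups].append(rn)
    return dict(result)


if __name__ == "__main__":
    g = mod_groups(100_000, 5)
    for num in sorted(g):
        rows = g[num]
        # equal sizes, but each group spans almost the whole 1..100000 range
        print(num, len(rows), rows[0], rows[-1])
```

That is fine for writing files, but it means a per-group min/max pair would not work as a BETWEEN range.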

This solves the SQL part, i.e. how to group rows into equal chunks, but that is only half your question. The other half is how to write these to 5 separate files.

You can either have 5 separate queries, each spooling to a separate file, e.g.:

spool f1.dat
    select t.*
    from (select t.*, rownum rn
          from t
          where rownum <= 100000
         ) t
    where mod(t.rn,5) = 0;
spool off

spool f2.dat
    select t.*
    from (select t.*, rownum rn
          from t
          where rownum <= 100000
         ) t
    where mod(t.rn,5) = 1;
spool off

etc.

Or you could use UTL_FILE. You could try something clever with a single query and an array of UTL_FILE file handles, where the array index is derived from MOD(rn,5); then you wouldn't need logic like "IF MOD(rn,5) = 0 THEN UTL_FILE.PUT_LINE(f0, ...)".

So, something like this (untested, just a rough outline for guidance; I've never tried this myself):

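DECLARE
   -- associative arrays (INDEX BY) need no constructor or EXTEND before use
   TYPE name_list   IS TABLE OF VARCHAR2(100)      INDEX BY PLS_INTEGER;
   TYPE handle_list IS TABLE OF UTL_FILE.FILE_TYPE INDEX BY PLS_INTEGER;
   fname name_list;
   fh    handle_list;
   CURSOR c1 IS
     select t.*, mod(rn,5) as num
     from (select t.*, rownum rn
           from t
           where rownum <= 100000
          ) t;
BEGIN
  -- open one file per chunk in the directory object THE_DIR
  FOR idx IN 1..5 LOOP
      fname(idx) := 'data_' || idx || '.dat';
      fh(idx) := UTL_FILE.FOPEN('THE_DIR', fname(idx), 'w');
  END LOOP;
  -- num is 0..4, so num + 1 picks handle 1..5
  FOR r1 IN c1 LOOP
     -- {column value from C1} is a placeholder for the column(s) to write
     UTL_FILE.PUT_LINE(fh(r1.num + 1), r1.{column value from C1});
  END LOOP;
  FOR idx IN 1..5 LOOP
      UTL_FILE.FCLOSE(fh(idx));
  END LOOP;
EXCEPTION
  WHEN OTHERS THEN
    UTL_FILE.FCLOSE_ALL;  -- don't leave files open if anything fails
    RAISE;
END;
/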
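The idea behind the UTL_FILE version, one open handle per chunk kept in an array and selected by MOD(rn,5), is not Oracle-specific. A minimal Python sketch of the same pattern, assuming the rows arrive as plain strings and writing to hypothetical files data_1.dat to data_5.dat in a directory you pass in:

```python
import os
import tempfile
from contextlib import ExitStack
from typing import Iterable, List


def split_to_files(rows: Iterable[str], directory: str, chunks: int = 5) -> List[str]:
    """Write row rn (1-based) to file number (rn % chunks) + 1.

    The open handles sit in a list indexed by rn % chunks, so no if/elif
    chain is needed to choose the file.
    """
    paths = [os.path.join(directory, f"data_{i}.dat") for i in range(1, chunks + 1)]
    with ExitStack() as stack:  # closes every opened file, even on error
        handles = [stack.enter_context(open(p, "w")) for p in paths]
        for rn, row in enumerate(rows, start=1):  # rn plays the role of rownum
            handles[rn % chunks].write(row + "\n")
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for p in split_to_files((f"row {i}" for i in range(1, 101)), d):
            with open(p) as f:
                print(os.path.basename(p), len(f.read().splitlines()))
```

The ExitStack plays the same role as the FCLOSE_ALL in the exception handler: no handle is left open if a write fails.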

Solution 3

You can even try with simple aggregation:

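create table test_chunk(val) as
(
    select floor(dbms_random.value(1, level * 10)) from dual
    connect by level <= 100
);

-- 100 rows / 20 rows per chunk = 5 chunks; numbering the rows in val order
-- makes each chunk a contiguous range of val
select ceil(num/20) as chunk_no, min(val), max(val)
from (select rownum as num, val
      from (select val from test_chunk order by val))
group by ceil(num/20)
order by chunk_no;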
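The aggregation trick turns a row number into a chunk number with integer arithmetic: with a chunk size of s, ceil(num / s) sends rows 1..s to chunk 1, rows s+1..2s to chunk 2, and so on. A small Python version of the same GROUP BY on made-up values (`chunk_ranges` is an illustrative name), sorting first so that each chunk covers a contiguous slice of the values:

```python
from typing import Dict, List, Tuple


def chunk_ranges(values: List[int], chunk_size: int) -> Dict[int, Tuple[int, int]]:
    """Group sorted values by ceil(row_number / chunk_size); return (min, max) per chunk.

    Row numbers start at 1, as ROWNUM does.
    """
    ranges: Dict[int, Tuple[int, int]] = {}
    for num, val in enumerate(sorted(values), start=1):
        chunk = (num + chunk_size - 1) // chunk_size  # integer form of ceil(num / chunk_size)
        lo, hi = ranges.get(chunk, (val, val))
        ranges[chunk] = (min(lo, val), max(hi, val))
    return ranges


if __name__ == "__main__":
    # 100 values in reverse order, 20 per chunk -> 5 chunks
    print(chunk_ranges(list(range(100, 0, -1)), 20))
```

For the real table, 100,000 rows in chunks of 20,000 would use ceil(num/20000).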

Solution 4

Thanks so much to Gordon Linoff for giving me a starting point for the code.

Just an update on how to get the min and max values for the 5 chunks:

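select num, min(cre_surr_id) as min_id, max(cre_surr_id) as max_id
from (select p.cre_surr_id,
             -- order by the key (not NULL) so each chunk is a contiguous id range
             ntile(5) over (order by p.cre_surr_id) as num
      from (select cre_surr_id
            from (select cre_surr_id from productions order by cre_surr_id)
            where rownum <= 100000   -- the 100,000 lowest ids
           ) p
     )
group by num
order by num;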
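If the min/max pairs are going to be fed into a BETWEEN, each chunk has to be a contiguous range of cre_surr_id; otherwise the ranges overlap and one row can be picked up by several chunks. A Python sketch with hypothetical ids (shuffled to mimic an unordered scan) comparing NTILE-style buckets assigned in key order with buckets assigned in arbitrary order:

```python
import random
from typing import Dict, List, Tuple


def bucket_ranges(ids: List[int], buckets: int) -> Dict[int, Tuple[int, int]]:
    """NTILE-style buckets over ids in the order given; return (min, max) per bucket."""
    base, extra = divmod(len(ids), buckets)
    ranges: Dict[int, Tuple[int, int]] = {}
    start = 0
    for b in range(1, buckets + 1):
        size = base + (1 if b <= extra else 0)
        chunk = ids[start:start + size]
        start += size
        if chunk:
            ranges[b] = (min(chunk), max(chunk))
    return ranges


def between_counts(ids: List[int], ranges: Dict[int, Tuple[int, int]]) -> List[int]:
    """How many ids each 'WHERE id BETWEEN lo AND hi' would return."""
    return [sum(lo <= i <= hi for i in ids) for lo, hi in ranges.values()]


if __name__ == "__main__":
    ids = list(range(1, 100_001))
    shuffled = ids[:]
    random.Random(42).shuffle(shuffled)  # arbitrary scan order
    print(between_counts(ids, bucket_ranges(sorted(shuffled), 5)))  # 20,000 each
    print(between_counts(ids, bucket_ranges(shuffled, 5)))          # ranges overlap
```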
Author: Shaun Kinnair

Updated on January 07, 2020

Comments

  • Shaun Kinnair, over 4 years ago

    I'm trying to figure out a way to split the first 100,000 records from a table that has 1 million+ records into five chunks of 20,000 records to go into a file. Maybe some SQL that returns the min and max rowid or primary key for each of the five 20,000-record chunks, so I can put the min and max values into variables, pass them into the SQL and use a BETWEEN in its WHERE clause?

    Can this be done?

    I'm on an Oracle 11g database.

    Thanks in advance.

  • Shaun Kinnair, about 8 years ago
    Thanks TenG. "A bit harsh, downvoting another fair question"... I'm not too bothered by people not liking my question, so long as it gets answered, and thanks to people like you, questions do get answered.
  • Shaun Kinnair, about 8 years ago
    Another great answer by Aleksej. Thanks, guys, you've all been a big help.
  • Gordon Linoff, about 5 years ago
    @BN . . . This answer is for Oracle.