Split a data table into small tables in R


Solution 1

If you do

split_list <- split(dat,dat$x2)

you can access the split results with

split_list[[1]]
split_list[[2]]
....
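Here is a minimal, self-contained sketch of the steps above. The small `dat` below only mimics the shape of the table in the question, and its values are made up for illustration:

```r
# toy data shaped like the question's table (values invented for illustration)
dat <- data.frame(
  x1 = c(7809, 7809, 7809, 3453),
  x2 = c(243638, 243638, 243638, 222222),
  x3 = c(1, 1, 1, 1),
  x4 = c(1, 1, 1, 0)
)

# one data frame per distinct value of x2, collected in a named list
split_list <- split(dat, dat$x2)

names(split_list)        # "222222" "243638"
split_list[["243638"]]   # the three rows with x2 == 243638
nrow(split_list[[1]])    # 1
```

`split()` converts `x2` to a factor, so the list elements are named and ordered by the sorted distinct values of `x2`, and you can index them either by position or by that name.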

To convert a result:

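# to a data.frame (split() on a data.frame already returns data.frames,
# so this is only needed if the input was something else)
df1 <- as.data.frame(split_list[[1]])

# to a table: as.table() needs an array, so convert to a matrix first
# (this works when all columns are numeric, as in the question)
t1 <- as.table(as.matrix(split_list[[1]]))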

To store each piece as a separate object in your environment (even though I don't see the benefit of it):

# assign each list element to an object named after its x2 value
names1 <- names(split_list)

for(i in seq_along(names1)){
  assign(names1[i], split_list[[i]])
}
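Base R's `list2env()` does the same job as the loop in a single call and works the same way for any number of groups (for example the 700+ mentioned in the comments). A sketch, again on made-up toy data; the `grp_` prefix is a hypothetical choice that makes the object names syntactic, because a bare name like `243638` has to be written with backticks:

```r
# toy data standing in for the question's table (values invented)
dat <- data.frame(
  x1 = c(7809, 7809, 3453),
  x2 = c(243638, 243638, 222222)
)
split_list <- split(dat, dat$x2)

# hypothetical prefix so that names are syntactic (grp_243638 instead of `243638`)
names(split_list) <- paste0("grp_", names(split_list))

# create one object per list element in the global environment
invisible(list2env(split_list, envir = .GlobalEnv))

grp_243638   # the two rows with x2 == 243638
```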

Solution 2

I know the question is a bit old, but I came across a similar problem recently and still want to share the code.

I wanted to split a data.table into equally sized chunks. I computed the chunk size beforehand by dividing the total number of rows of the data.table by the number of smaller data.tables I wanted to get. I wrote a function that splits the data.table (input x) into chunks with an equal number of rows (no_rows_per_frame; the last chunk may be smaller) and writes each chunk as a CSV file to the given path (path_to_store).
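The chunk-size arithmetic can be checked on small numbers. In this sketch `n_rows` and `n_chunks` are hypothetical names with made-up values; `ceiling()` makes sure that `n_chunks` chunks cover every row, and the last chunk ends up shorter when the division is not exact:

```r
n_rows   <- 10  # total rows in the data.table (made-up number)
n_chunks <- 3   # how many smaller tables we want (hypothetical)

# rows per chunk, rounded up so no rows are left over
no_rows_per_frame <- ceiling(n_rows / n_chunks)               # 4

# first row of each chunk
split_vec <- seq(1, n_rows, no_rows_per_frame)                # 1 5 9

# last row of each chunk, capped at the table's last row
last_rows <- pmin(split_vec + no_rows_per_frame - 1, n_rows)  # 4 8 10
```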

I needed it to hand-collect variables for the chunks, but you could rewrite it to simply return all data.tables separately. Or, better, following @David Arenburg's answer: store them in a list and don't pollute your global environment. NB: the code uses a loop and might not be the most efficient, but it was pretty fast for my sample of almost 500k observations (as data.table usually is).
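A minimal sketch of the list-based alternative, which swaps the loop for `split()`: each row gets a chunk number, and `split()` groups the rows by it. `chunk_id` is a hypothetical name, and the sketch is shown on a plain data.frame with made-up values (data.table also provides a `split()` method for its own objects):

```r
# toy data standing in for the real data.table (values invented)
x <- data.frame(id = 1:10, value = letters[1:10])
no_rows_per_frame <- 4

# chunk number for every row: 1,1,1,1,2,2,2,2,3,3
chunk_id <- ceiling(seq_len(nrow(x)) / no_rows_per_frame)

# list of data frames, one per chunk; nothing is assigned to the global environment
chunks <- split(x, chunk_id)

length(chunks)        # 3
sapply(chunks, nrow)  # 4 4 2, named "1" "2" "3"
```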


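library(data.table)

# function: split into equally sized samples for hand-collection and write
# each one to "<path_to_store>sample_until_<last row>.csv"
split_data_table <- function(x, no_rows_per_frame, path_to_store){

  # first row of every chunk
  split_vec <- seq(1, nrow(x), no_rows_per_frame)

  for (split_cut in split_vec) {
    # last row of this chunk, capped so the final chunk is not padded with NA rows
    last_row <- min(split_cut + no_rows_per_frame - 1, nrow(x))
    chunk <- x[split_cut:last_row]
    fwrite(chunk, paste0(path_to_store, "sample_until_", last_row, ".csv"))
  }
}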

# apply sample cut
split_data_table(x = vendor_tab, no_rows_per_frame = 5000, 
                 path_to_store = "C/...")

Hope it helps!

Author by user4751931

Updated on June 04, 2022

Comments

  • user4751931 almost 2 years

    I have the following table (more than 1k rows):

      x1      x2  x3  x4 
    7809  243638   1   1 
    7809  243638   1   1
    7809  243638   1   1 
    ... 
    3453  222222   1   0
    

    and I need to split this table into small tables (which should end up in my environment as data frames) based on the 2nd column, x2. I've tried split(dat, dat$x2) and R did it correctly, but the result is a list.

  • user4751931 about 9 years
    Got it, and how do I loop over it for more than 700 results?
  • Ben Bolker about 9 years