Split a data table into smaller tables in R
Solution 1
If you run
split_list <- split(dat, dat$x2)
you can access the split results with
split_list[[1]]
split_list[[2]]
....
To convert a result:
# to a data.frame
df1 <- as.data.frame(split_list[[1]])
# to a table (as.table() cannot coerce a data.frame directly, so go through a matrix)
t1 <- as.table(as.matrix(split_list[[1]]))
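To store each piece as a separate object in your environment (even though I don't see the benefit of it over keeping the list):
names1 <- names(split_list)
for (i in seq_along(names1)) {
  # the names are the x2 values (e.g. "243638"), so the objects need backticks: `243638`
  assign(names1[i], split_list[[i]])
}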
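If the goal is only to process every piece, the list can be used as it is. The following is a minimal sketch, assuming a small made-up table in place of the real dat; it shows one way to loop over all pieces and to look one up by its x2 value without creating hundreds of objects.

```r
# Toy data standing in for the question's table (values are made up)
dat <- data.frame(
  x1 = c(7809, 7809, 7809, 3453),
  x2 = c(243638, 243638, 243638, 222222),
  x3 = c(1, 1, 1, 1),
  x4 = c(1, 1, 1, 0)
)

split_list <- split(dat, dat$x2)

# Apply the same step to every piece; here: count the rows of each
row_counts <- vapply(split_list, nrow, integer(1))

# A single piece can be addressed by the x2 value it belongs to
piece <- split_list[["243638"]]
```

The same pattern works with lapply() when the per-piece step returns something other than a single number.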
Solution 2
I know the question is a bit older, but I came across a similar problem recently and still want to share the code.
I wanted to split a data.table into equally sized chunks. I computed the number of rows per chunk beforehand by dividing the total number of rows of the data.table by the number of smaller data.tables I wanted to end up with. I wrote a function that splits the data.table (input x) into chunks with an equal number of rows (no_rows_per_frame) and takes a path where the frames are stored (path_to_store).
I needed it to hand-collect variables for the chunks, but you could rewrite it to simply return all data.tables separately. Or better, following @David Arenburg's answer: store them in a list and don't pollute your global environment. NB: The code might not be efficient as it uses a loop, but it was pretty fast for my sample of almost 500k observations (as data.table generally is).
# function: split into equally sized samples for hand collection
library(data.table)

split_data_table <- function(x, no_rows_per_frame, path_to_store) {
  split_vec <- seq(1, nrow(x), by = no_rows_per_frame)
  for (split_cut in split_vec) {
    # cap the end index so the last chunk does not run past the table
    split_end <- min(split_cut + no_rows_per_frame - 1, nrow(x))
    chunk <- x[split_cut:split_end]
    # path_to_store is pasted as-is, so it should end with a path separator
    fwrite(chunk, paste0(path_to_store, "sample_until_", split_end, ".csv"))
  }
}

# apply sample cut
split_data_table(x = vendor_tab, no_rows_per_frame = 5000,
                 path_to_store = "C/...")
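For the variant that keeps the chunks in a list instead of writing files, here is a rough sketch. The helper name split_into_chunks and the toy input are made up for illustration, and the example uses a plain data.frame with base split(); it is not the code I actually ran.

```r
# Hypothetical helper: return the chunks as a list instead of writing files
split_into_chunks <- function(x, no_rows_per_frame) {
  # chunk id for every row: 1, 1, ..., 2, 2, ...
  chunk_id <- ceiling(seq_len(nrow(x)) / no_rows_per_frame)
  split(x, chunk_id)
}

# Toy input (made-up values): 12 rows split into chunks of 5
toy <- data.frame(id = 1:12, value = (1:12) * 10)
chunks <- split_into_chunks(toy, no_rows_per_frame = 5)

length(chunks)        # number of chunks
sapply(chunks, nrow)  # rows per chunk; the last one is shorter
```

Because the chunk ids are derived from the row numbers, the last chunk simply holds whatever rows are left over.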
Hope it helps!
user4751931
Updated on June 04, 2022
Comments
-
user4751931 almost 2 years
I have the following table (more than 1k rows):
x1    x2      x3  x4
7809  243638  1   1
7809  243638  1   1
7809  243638  1   1
...
3453  222222  1   0
and I need to split this table into small tables (which should be in my environment as data frames) based on the 2nd column, x2. I've tried split(dat, dat$x2) and R did it right, but it returned a list.
-
user4751931 about 9 years
Got it, and how do I loop over it for more than 700 results?
-
Ben Bolker about 9 years