Is there a way to guess the size of a data.frame based on rows, columns and variable types?
Solution 1
You can simulate an object and use object.size() to estimate the memory needed to store it as an R object:

m <- matrix(1, nrow = 1e5, ncol = 150)
m <- as.data.frame(m)
m[, 1:20]  <- sapply(m[, 1:20], as.character)
m[, 29:30] <- lapply(m[, 29:30], as.factor)  # lapply preserves the factor class; sapply would collapse the factors to integer codes
object.size(m)
# 120017224 bytes
print(object.size(m), units = "Gb")
# 0.1 Gb
Solution 2
You could create dummy variables that hold examples of the data you will be storing in the data frame. Then use object.size() to find their size and scale by the numbers of rows and columns accordingly.
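A minimal sketch of this approach, assuming the column mix from the question (120 numeric, 20 string, 10 factor) and an arbitrary sample size; the linear scaling by row count is the simplifying assumption:

```r
# Build a small dummy data.frame with the same column types, measure it,
# and scale the measured size up by the expected row count.
n_sample <- 1000   # rows in the small dummy data.frame
n_target <- 1e5    # rows you actually expect to generate

num_cols <- as.data.frame(replicate(120, runif(n_sample)))
chr_cols <- as.data.frame(replicate(20, sample(letters, n_sample, replace = TRUE)),
                          stringsAsFactors = FALSE)
names(chr_cols) <- paste0("S", seq_len(20))   # avoid name clashes with num_cols
fct_cols <- as.data.frame(setNames(
  lapply(seq_len(10),
         function(i) factor(sample(letters[1:5], n_sample, replace = TRUE))),
  paste0("F", seq_len(10))))
dummy <- cbind(num_cols, chr_cols, fct_cols)

# Assume memory grows roughly linearly in the number of rows
est_bytes <- as.numeric(object.size(dummy)) / n_sample * n_target
est_bytes
```

This ignores fixed per-column overhead (attributes, string caching), so it is an estimate, not an exact figure.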
Solution 3
Check out the pryr package as well. It has object_size(), which may be slightly better for you. From Advanced R:

This function is better than the built-in object.size() because it accounts for shared elements within an object and includes the size of environments.

You also need to account for the size of attributes, as well as the column types, etc.:

object.size(attributes(m))
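As a self-contained illustration (the small data frame here is made up for the example), the attribute overhead can be inspected separately from the column data; the pryr call is left commented out since it is an external package:

```r
# Attributes (names, row.names, class) carry their own overhead on top of
# the column data; object.size() can report them separately.
df <- data.frame(x = runif(1000), y = sample(letters, 1000, replace = TRUE))
object.size(df)              # size of the whole data.frame as counted by base R
object.size(attributes(df))  # names, row.names and class of the data.frame

# If pryr is installed, object_size() also counts shared elements and
# environments that base object.size() can miss:
# pryr::object_size(df)
```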
Ajay Ohri
http://about.me/aohri I am a data scientist with a startup at http://decisionstats.org I have written two books on R. R for Cloud Computing and R for Business Analytics. I blog at DecisionStats.com
Updated on July 06, 2022

Comments
-
Ajay Ohri almost 2 years
I am expecting to generate a lot of data and then work with it in R. How can I estimate the size of the data.frame (and thus the memory needed) from the number of rows, the number of columns and the variable types?
Example.
If I have 10000 rows and 150 columns, out of which 120 are numeric, 20 are strings and 10 are factors, what size of data frame can I expect? Will the result change depending on the data stored in the columns (as in
max(nchar(column))
)?

> m <- matrix(1, nrow = 1e5, ncol = 150)
> m <- as.data.frame(m)
> object.size(m)
120009920 bytes
> a = object.size(m) / (nrow(m) * ncol(m))
> a
8.00066133333333 bytes
> m[, 1:150] <- sapply(m[, 1:150], as.character)
> b = object.size(m) / (nrow(m) * ncol(m))
> b
4.00098133333333 bytes
> m[, 1:150] <- sapply(m[, 1:150], as.factor)
> c = object.size(m) / (nrow(m) * ncol(m))
> c
4.00098133333333 bytes
> m <- matrix("ajayajay", nrow = 1e5, ncol = 150)
> m <- as.data.frame(m)
> object.size(m)
60047120 bytes
> d = object.size(m) / (nrow(m) * ncol(m))
> d
4.00314133333333 bytes
-
nicola almost 9 years
I guess that the point is to know the size before creating it.
-
agstudy almost 9 years
@nicola It is an estimation. You can use it as a reference, assuming that the memory allocation is a linear function.
-
Pierre L almost 9 years
This solution works because after doing a few examples the user will be able to better estimate their output for other cases.
-
nicola almost 9 years
Yes, but what's the point of generating an object so similar to the one we want to create? One can just create the object and see. Maybe one can instead see how much space a single column of a given type requires and do some math; this would work even for very large objects.
-
agstudy almost 9 years
@nicola Well, I think you simplify too much how memory is allocated. I don't know what you mean by "some math", but I don't think you can get the memory size from simple additions. Take a look at
memory.profile()
-
Ajay Ohri almost 9 years
Because of expected variances in object size, and also the practicalities of working in a server environment with many processes and users running code, memory estimation is useful.
-
nicola almost 9 years
In your example, for instance, see how close
object.size(m)
and
object.size(m[[1]]) * ncol(m)
are. There is no reason to generate 150 columns.
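The per-column arithmetic suggested in the last comment can be sketched like this, using the 120/20/10 column mix from the question; treating every column of a type as the same size is the simplifying assumption:

```r
# Measure one column of each type once, then sum: no need to build all
# 150 columns just to estimate the total.
n <- 1e5
per_num <- as.numeric(object.size(runif(n)))                           # one numeric column
per_chr <- as.numeric(object.size(sample(letters, n, replace = TRUE))) # one character column
per_fct <- as.numeric(object.size(factor(sample(letters[1:5], n,
                                                replace = TRUE))))    # one factor column

est_bytes <- 120 * per_num + 20 * per_chr + 10 * per_fct
est_bytes  # ignores data.frame attribute overhead, which is comparatively small
```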