maximum size of a matrix in R


Solution 1

The theoretical limit for a vector in R 2.14.1 (the version in the question) is 2147483647 (2^31 - 1) elements. For a matrix with 2 columns, that works out to about 1 billion rows.
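As a quick check of that arithmetic, here is a minimal sketch in base R. It assumes only that the element limit equals the largest 32-bit signed integer, which R exposes as .Machine$integer.max; the variable names are made up for illustration.

```r
# Largest value a signed 32-bit integer index can hold.
max_elems <- .Machine$integer.max   # 2147483647, i.e. 2^31 - 1

# A matrix stores all of its cells in one vector, so the rows
# available to a 2-column matrix are the element limit divided by 2.
n_cols <- 2L
max_rows <- max_elems %/% n_cols    # integer division

print(max_rows)                     # 1073741823
print(max_rows == 2^30 - 1)         # TRUE
```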

But that amount of data does not fit in 4 GB of memory, especially not with strings in a character vector. Each string takes at least 96 bytes (object.size('a') == 96), and each element in your matrix is a pointer (8 bytes) to such a string (there is only one instance of each unique string, though).
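To put rough numbers on this, the sketch below estimates the pointer cost of a 2-column character matrix and measures a small sample with object.size. The link count of 5 million is a hypothetical stand-in for "several million links", and the measured sizes will differ between R versions and platforms, so treat the result as an order-of-magnitude estimate only.

```r
# Hypothetical edge count, standing in for "several million links".
n_links <- 5e6
n_cols <- 2
pointer_bytes <- 8   # one pointer per matrix cell on 64-bit R

# Lower bound: the cell pointers alone, ignoring the strings themselves.
pointer_mb <- n_links * n_cols * pointer_bytes / 2^20
print(pointer_mb)    # about 76 MB

# Measure a small sample to see the real per-cell cost on this machine.
sample_names <- as.character(seq_len(1000))
small <- matrix(sample_names, ncol = 2)          # 500 rows x 2 columns
small_bytes <- as.numeric(object.size(small))
print(small_bytes / length(small))               # bytes per cell, sample only
```

Because unique strings are stored once, a matrix with many repeated vertex names costs less per cell than this all-unique sample suggests.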

So what typically happens is that the machine starts using virtual memory and starts swapping. Heavy swapping typically kills all hope of ever finishing in this century - especially on Windows.

But if you are using a package (igraph?) and you're asking it to produce the matrix, it probably does a lot of internal work and creates lots of auxiliary objects. So even if you're nowhere near the memory limit for the single result matrix, the algorithm used to produce it can run out of memory. It can also be non-linear (quadratic or worse) in time, which would again kill all hope of ever finishing in this century...

A good way to investigate is to time it on a small graph (e.g. using system.time), and then again after doubling the graph size a couple of times. Then you can see whether the time is linear or quadratic, and you can estimate how long it will take to complete your big graph. If the prediction says a week, well then you know ;-)
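A minimal sketch of that doubling experiment follows. The function build_edge_matrix is a hypothetical placeholder that just builds a random 2-column character matrix; it is not the igraph call from the question, so swap in the real step you want to measure.

```r
# Hypothetical stand-in for the slow step; replace with the real call.
build_edge_matrix <- function(n) {
  from <- as.character(sample.int(n, n, replace = TRUE))
  to   <- as.character(sample.int(n, n, replace = TRUE))
  cbind(from, to)
}

# Double the problem size a few times and record elapsed seconds.
sizes <- c(25000, 50000, 100000, 200000)
elapsed <- sapply(sizes, function(n) {
  system.time(build_edge_matrix(n))[["elapsed"]]
})

# Ratio of successive timings: about 2 suggests linear growth,
# about 4 suggests quadratic growth.
ratios <- elapsed[-1] / elapsed[-length(elapsed)]
print(data.frame(n = sizes, seconds = elapsed))
print(ratios)
```

Very short timings are noisy and can even be reported as zero, so pick sizes large enough that each run takes a measurable amount of time before reading anything into the ratios.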

Solution 2

R matrices can be addressed in single-index notation, because they are really vectors with a dim attribute of length 2. In R, vectors are addressed by a signed 32-bit integer, even if you are using the 64-bit version. So a 2-column matrix can have a maximum of 2^30-1 rows.
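The vector-with-dim view can be seen directly in a small example. This is only an illustration with made-up values; it relies on nothing beyond base R.

```r
# A 3 x 2 character matrix, filled column by column.
m <- matrix(c("a", "b", "c", "d", "e", "f"), ncol = 2)
print(dim(m))                        # 3 2

# Single-index notation walks the underlying vector in column-major order,
# so element 5 is row 2 of column 2.
print(identical(m[5], m[2, 2]))      # TRUE

# Setting a dim attribute on a plain vector yields the same matrix.
v <- c("a", "b", "c", "d", "e", "f")
dim(v) <- c(3, 2)
print(identical(v, m))               # TRUE
```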

A data.frame would allow you to use 2^31-1 rows and columns.

Author: Peter Flom

I'm a statistical consultant to graduate students and researchers in fields including the behavioral and health sciences. I've assisted with review of articles, and with preparation of grants, dissertations and papers.

Updated on June 12, 2022

Comments

  • Peter Flom
    Peter Flom almost 2 years

    I am using igraph to do some network analysis. As part of that, I have to create a matrix with 2 columns and as many rows as there are links. I have a large network (several million links) and creating this matrix didn't work after 3 hours of run time (no errors, just no result, and it shows "not responding").

    What is the maximum size of such a character matrix? How long does it take to run?

    I am running 64-bit R 2.14.1 on a Windows 7 machine with 4 GB of memory running at 2.67 GHz.

    thanks

    ADDED Thanks for the quick responses. This made me positive it wasn't the size of the matrix; it turned out to be an error in which columns of another matrix I was using to create that matrix.

  • James
    James about 12 years
    There is some overhead on a vector; in the limit, object.size(character(n))/n shows that each character element takes 8 bytes.
  • Tommy
    Tommy about 12 years
    @James - True. The strings' size increases by 8 for every 8 characters. So object.size('abcdefgh') == 104 (on 64-bit systems)