How to store 100 million integers in a Java array?

11,628

Solution 1

Use an associative array. The key is an integer, and the value is the count (the number of times the integer has been added to the list).

This should get you some decent space savings if the distribution is relatively random, much more so if it's not.

Solution 2

If you need to store 1 billion completely random integers then I am afraid that you really do need to corresponding space, i.e. about 4GB of memory for 32-bit int numbers. You can try increasing the JVM heap space but you need to have a 64-bit OS and at least as much physical memory - and there is only so far that you can go.

On the other hand, you might be able to store those number more efficiently if you can make use of specific constraints within your application.

E.g. if you only need to know if a specific int is contained in a set, you could get away with a bit set - i.e. a single bit for each value in the int range. That is about 4 billion bits, i.e. 512 MB - a far more reasonable space requirement. For example, a handful of BitSet objects could cover the whole 32-bit integer range without you having to write any bit-handling code...

Solution 3

May be using memory mapped files will help? They are not allocated from the heap. Here is an article how to create a matrix. An array should be easier.

Using a memory mapped file for a huge matrix - Peter Lawrey

Share:
11,628
user2035996
Author by

user2035996

Updated on June 06, 2022

Comments

  • user2035996
    user2035996 about 2 years

    I am working on a small task where I am required to store around 1 billion integers in an Array. However, I am running into a heap space problem. Could you please help me with this?

    Machine Details : Core 2 Duo Processor with 4 GB RAM. I have even tried -Xmx 3072m . Is there any work around for this? The same thing works in C++ , so there should definitely be a way to store this many numbers in memory.

    Below is the code and the exception I am getting :

    public class test {
        private static int C[] = new int[10000*10000];
    
        public static void main(String[] args) {
            System.out.println(java.lang.Runtime.getRuntime().maxMemory()); 
    
        }
    
    }
    

    Exception : Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at test.(test.java:3)

  • Audrius Meškauskas
    Audrius Meškauskas over 11 years
    RAM is not the only memory we have. There is also swap.
  • thkala
    thkala over 11 years
    @AudriusMeškauskas: yes, but as far as I know, the JVM will absolutely refuse to allocate more heap that what is available as physical RAM...
  • Audrius Meškauskas
    Audrius Meškauskas over 11 years
    Good idea. Really, it should be many repetitive values in the array of this size.
  • thkala
    thkala over 11 years
    @Dave: a Hashtable, at least as implemented in the Java libraries is one of the worst methods w.r.t. space efficiency. Your approach would only make sense when you have a only few integers with high appearance frequency...
  • Audrius Meškauskas
    Audrius Meškauskas over 11 years
    I am not convinced. Where have you seen this documented? Difference between RAM and swap kind of should be transparent for the program. But probably will be slow.
  • thkala
    thkala over 11 years
    @Dave: that's the point: most associative arrays have a significant overhead per-entry. Unless a lot of the integers are repeated, a primitive int[] may still be better...
  • Dave
    Dave over 11 years
    @thkala: it's only better until you run out of space :)
  • thkala
    thkala over 11 years
    @AudriusMeškauskas: Sorry, this indeed does seem to be the case in modern JVMs. My JVM will happily start with -Xmx80g which is actually more than both RAM+swap on my system...
  • thkala
    thkala over 11 years
    Of course, depending on what one is doing with all those ints, using swap space might be little better than doing the computations by hand :-)
  • phuclv
    phuclv about 10 years
    the address space of a 32-bit process is only 2GB (3GB with large address aware enabled)