C# dictionary - how to solve limit on number of items?


Solution 1

Buy more memory, install a 64-bit version of the OS, and recompile for 64 bits. No, I'm not kidding. If you want that many objects... in RAM... then call it a "feature". If the new Android can require 16 GB of memory just to be compiled...

I was forgetting... You could begin by reading "C# array of objects, very large, looking for a better way".

Do you know how much 13 million objects really is?

To make a comparison: a 32-bit Windows app has access to less than 2 GB of address space, so that's 2 billion bytes (give or take)... 2 billion / 13 million = something around 150 bytes per object. Now, if we consider how much a reference type occupies... it's quite easy to eat up 150 bytes.
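
To put a number on that back-of-the-envelope arithmetic, here is a rough sketch (not a proper benchmark) that measures the approximate per-entry cost of a Dictionary. Run it as a 64-bit process; the exact figure varies by runtime version, and string keys would cost considerably more than the int keys used here:

```csharp
using System;
using System.Collections.Generic;

class DictionaryMemoryEstimate
{
    static void Main()
    {
        const int Count = 13_000_000;

        long before = GC.GetTotalMemory(forceFullCollection: true);

        // int keys and values: the cheapest case - no boxing, no per-entry
        // objects, just the dictionary's internal arrays.
        var dict = new Dictionary<int, int>(Count);
        for (int i = 0; i < Count; i++)
            dict[i] = i;

        long after = GC.GetTotalMemory(forceFullCollection: true);

        Console.WriteLine($"~{(after - before) / (double)Count:F1} bytes per entry");
        GC.KeepAlive(dict);
    }
}
```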

I'll add something: I've looked into my Magic 8-Ball and it told me: show us your code. If you don't tell us what you are using for the keys and the values, how can we help you? What are you using, a class, a struct, or "primitive" types? Tell us the "size" of your TKey and TValue. Sadly, our crystal ball broke yesterday :-)

Solution 2

C# is not a language that was designed to solve heavy-duty scientific computation problems. It is absolutely possible to use C# to build tools that do what you want, but the off-the-shelf parts like Dictionary were designed to solve more common business problems, like mapping zip codes to cities and that sort of thing.

You're going to have to go with external storage of some sort. My recommendation would be to buy a database and use it to store your data. Then use a DataSet or some similar technology to load portions of the data into memory, manipulate them, and then pour more data from the database into the DataSet, and so on.
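
A minimal sketch of that load-a-portion-at-a-time pattern, assuming a SQL Server table named Sequences with Name and Length columns (the table name, column names, and page size are hypothetical, not from the question):

```csharp
using System.Data;
using System.Data.SqlClient;

class PagedLoad
{
    const string ConnectionString = "...";  // replace with a real connection string
    const int PageSize = 500_000;           // tune to available memory

    static void Main()
    {
        using (var adapter = new SqlDataAdapter(
            "SELECT Name, Length FROM Sequences ORDER BY Name",
            ConnectionString))
        {
            var dataSet = new DataSet();
            int start = 0;
            int loaded;
            do
            {
                dataSet.Tables.Clear();  // drop the previous page
                loaded = adapter.Fill(dataSet, start, PageSize, "Sequences");
                if (loaded == 0) break;

                foreach (DataRow row in dataSet.Tables["Sequences"].Rows)
                {
                    // ... work with (string)row["Name"] and (int)row["Length"] ...
                }

                start += loaded;
            } while (loaded == PageSize);
        }
    }
}
```

Note that this Fill overload skips the first `start` records client-side, so each page re-reads the result set from the beginning; for a very large table you would want to page with a WHERE clause on an indexed key instead.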

Solution 3

Well, I had almost exactly the same problem.

I wanted to load about 12.5 million [string, int] pairs into a dictionary from a database (for all the programming "gods" above who don't understand why: the answer is that it is enormously quicker, when you are working with a 150 GB database, if you can cache a proportion of one of the key tables in memory).

It annoyingly threw an out-of-memory exception at pretty much the same place - just under the 12 million mark - even though the process was only consuming about 1.3 GB of memory (reduced to about 800 MB after a judicious change in the db read method to not try to do it all at once), despite running on an i7 with 8 GB of memory.

The solution was actually remarkably simple: in Visual Studio (2010), right-click the project in Solution Explorer and select Properties. On the Build tab, set Platform Target to x64 and rebuild.

It rattles through the load into the Dictionary in a few seconds and the Dictionary performance is very good.
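
For reference, that Build-tab setting corresponds to the `<PlatformTarget>x64</PlatformTarget>` MSBuild property in the project file, and you can verify at runtime that you actually got a 64-bit process with a quick check like this (these APIs have existed since .NET 4.0):

```csharp
using System;

class BitnessCheck
{
    static void Main()
    {
        // A 32-bit process reports IntPtr.Size == 4 and tops out below
        // 2 GB of address space; a 64-bit process reports IntPtr.Size == 8.
        Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
        Console.WriteLine("64-bit OS:      " + Environment.Is64BitOperatingSystem);
        Console.WriteLine("IntPtr.Size:    " + IntPtr.Size);
    }
}
```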

Author: Perlnika, student of bioinformatics and sociology

Updated on June 08, 2022

Comments

  • Perlnika
    Perlnika over 1 year

    I am using a Dictionary and I need to store almost 13,000,000 keys in it. Unfortunately, after adding the 11,950,000th key I got the exception "System out of memory". Is there any solution to this problem? I will need my program to run on less powerful computers than mine in the future...

    I need that many keys because I need to store pairs - sequence name and sequence length; it is for solving a bioinformatics-related problem.

    Any help will be appreciated.

  • Oded
    Oded about 12 years
    Depending on the processing requirements, MapReduce may also be an option.
  • svick
    svick about 12 years
    13M items in the dictionary doesn't necessarily mean 13M objects that have to be GCed. That is, if both the keys and the values are value types (and are not boxed). And you're usually not limited by available RAM, but by available virtual memory. If you run out of physical RAM, the application may run terribly slowly, but it will still run, not throw an OOM exception.
  • Jon Hanna
    Jon Hanna about 12 years
    @svick If they're in a dictionary, then internally they are in structures on the heap. Besides, if they were value types on the stack then you'd run out of stack space even sooner.
  • Salvatore Previti
    Salvatore Previti about 12 years
    Not really, Jon. A .NET dictionary, unlike a Java dictionary, uses only a constant number of reference objects - three, if I remember correctly. If the key and the value are value types, they are stored internally in an array of entry structures (itself a value-type array) and an array of buckets (an array of integers); see the layout sketch after these comments. So the .NET dictionary is very well written and fast, and the .NET garbage collector is smart enough not to scan pure value types during garbage collection.
  • svick
    svick about 12 years
    @JonHanna, yes, they are on the heap. But that doesn't mean GC has to track every one of them. GC tracks the whole dictionary (and the constant number of arrays it uses internally), but not every value in it (unless they are reference types, of course).
  • svick
    svick about 12 years
    The Dictionary is pretty memory-efficient, though. So, it's not unrealistic to fit 13M items in a dictionary into 2 GB of memory. But you're right that it's quite easy to get above the size where it doesn't fit anymore.
  • Jeff Reddy
    Jeff Reddy about 12 years
    You could simply create a DataSet (or even a simple text file) and then persist the info to disk. You could then load only portions of the file into memory as needed. I doubt there is a need to load 13M keys and values into memory.
  • xanatos
    xanatos about 12 years
    @svick From what I'm seeing of the implementation of Dictionary, for big numbers it searches for a prime > 2 * Count, so > 26 million. Let's say it's 26 million: it then creates an int[] of 26M and an array of structs each containing 2x int, 1x TKey, 1x TValue. So the pure overhead is 3x int * 26M = 12 bytes * 26M ≈ 300 MB (I'm looking at Resize). If he is lucky and is able to fill it before the doubling, it's still 150 MB of overhead.
  • Eric Lippert
    Eric Lippert about 12 years
    It has nothing to do with available physical memory. It is all about available address space. Are you running this on a 32 bit process, with 2GB of address space, or a 64 bit process with a billion GB of address space?
  • svick
    svick about 12 years
    Yeah, pretty much. Except that the prime is searched for only when resizing, so it's actually a prime > Count. For 13M objects, the actual prime it uses in my tests is 23,997,907.
  • xanatos
    xanatos about 12 years
    @svick The last prime in the table in .NET 4.0 seems to be 7,199,369, so after that I would have thought it would find something around 14 million.
  • svick
    svick about 12 years
    Actually, the second-to-last number in the table (~6M) is used, and then comes manual computation (i.e. approximately doubling), so 24M fits. Of course, this all assumes the capacity is left at its default. If the number of elements is known, setting the capacity up front might help quite a lot here - see the capacity sketch after these comments.
  • Tigran
    Tigran about 12 years
    Just use SQLite, much simpler IMHO
  • xanatos
    xanatos about 12 years
    @EricLippert It's only 8 TB of address space, even on higher-end Windows Server :-D Don't give new users false hopes :-D
  • Joel B
    Joel B over 7 years
    As discussed in other comments/answers, using a DB isn't always an option. If you want to keep everything in memory for performance reasons, using a database will potentially kill your performance.
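
On Salvatore Previti's and xanatos's point about the internals: a simplified sketch of how the .NET Framework's Dictionary<TKey, TValue> lays out its entries, following the names in the published reference source (illustrative only; details vary across runtime versions):

```csharp
// Simplified view of the .NET Framework Dictionary<TKey, TValue> internals,
// after the published reference source; not the exact shipped code.
class DictionaryLayoutSketch<TKey, TValue>
{
    private struct Entry
    {
        public int hashCode;  // 4 bytes (lower bits of the key's hash)
        public int next;      // 4 bytes: index of the next entry in the chain
        public TKey key;      // stored inline when TKey is a value type
        public TValue value;  // stored inline when TValue is a value type
    }

    private int[] buckets;    // one int per bucket, indexing into 'entries'
    private Entry[] entries;  // every entry lives in this single array

    // For Dictionary<int, int>, each Entry is 16 bytes plus 4 bytes of
    // bucket, roughly 20 bytes per slot; with the ~24M slots discussed
    // above for 13M items, that is on the order of 480 MB.
}
```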
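
And svick's capacity point in code: growing from the default capacity forces a couple dozen resizes on the way to 13M entries, and each resize allocates a new, larger array while the old one is still alive, which in a 32-bit process is often the allocation that actually fails. Pre-sizing allocates once:

```csharp
using System.Collections.Generic;

class PresizedDictionary
{
    static void Main()
    {
        // Ask for the full capacity up front; the dictionary picks the next
        // suitable prime and never has to resize during the load.
        var lengths = new Dictionary<string, int>(13_000_000);

        // lengths.Add(sequenceName, sequenceLength);  // fill as usual
    }
}
```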