Alternative to MemoryStream for large data volumes

Solution 1

Programmers try too hard to avoid using a file. In Windows, the difference between memory and a file is a very small one. Any memory you use for a MemoryStream in fact requires a file: the storage is backed by the paging file, c:\pagefile.sys. The reverse is true as well: any file you use is backed by memory, because file data is cached in RAM by the file system cache. So if the machine has sufficient RAM then you will in fact only read and write from/to memory if you use a FileStream, and you get the performance you expect from using memory. It is entirely free: you don't have to write any code to enable this, nor do you have to manage it.

If the machine doesn't have enough RAM, then both approaches deteriorate the same way. When you use a MemoryStream, the paging file starts thrashing and you'll be slowed down by the disk. When you use a file, the data won't fit in the file system cache and you'll be slowed down by the disk.

With a file you'll of course get one extra benefit: you won't run out of memory anymore. Use a FileStream instead.
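
As a rough sketch of what "use a FileStream instead" could look like for scratch data, the helper below opens a temporary file that the OS removes when the stream is closed. The class name `TempFileStreams`, the method name `Create`, and the 80 KB default buffer are assumptions made for illustration; only `FileStream`, `Path`, and `FileOptions.DeleteOnClose` come from the framework.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the framework): a scratch stream backed
// by a temp file instead of a MemoryStream.
public static class TempFileStreams
{
    public static FileStream Create(int bufferSize = 81920)
    {
        // A random name under the user's temp directory; CreateNew fails
        // rather than overwriting if the name happens to exist already.
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());

        return new FileStream(
            path,
            FileMode.CreateNew,
            FileAccess.ReadWrite,
            FileShare.None,
            bufferSize,
            FileOptions.DeleteOnClose); // file is deleted when the stream closes
    }
}
```

Callers can then treat the result like any other seekable stream: write to it, set `Position = 0`, read it back, and dispose it in a `using` block so the file is cleaned up.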

Solution 2

This is expected to happen when using MemoryStream, so you should implement your own logic or use some external class. Here is a post that explains the problems with MemoryStream and big data and gives an alternative to MemoryStream: A replacement for MemoryStream
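
One possible shape for "your own logic", sketched under assumptions: a stream that keeps data in a MemoryStream until a size threshold is crossed and then moves it to a temporary file. The class name `SpillStream`, the `IsFileBacked` property, and the 64 MB default threshold are all hypothetical; this is not the implementation from the linked post, and it is not thread-safe.

```csharp
using System;
using System.IO;

// Hypothetical sketch: in-memory until 'threshold' bytes, then file-backed.
public sealed class SpillStream : Stream
{
    private readonly long _threshold;
    private Stream _inner = new MemoryStream();
    private bool _spilled;

    public SpillStream(long threshold = 64L * 1024 * 1024)
    {
        _threshold = threshold;
    }

    // True once the data has been moved to a temp file.
    public bool IsFileBacked { get { return _spilled; } }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return true; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { return _inner.Length; } }

    public override long Position
    {
        get { return _inner.Position; }
        set { _inner.Position = value; }
    }

    public override void Flush() { _inner.Flush(); }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return _inner.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return _inner.Seek(offset, origin);
    }

    public override void SetLength(long value)
    {
        if (value > _threshold) Spill();
        _inner.SetLength(value);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // Switch to the file before the write that would cross the threshold.
        if (_inner.Position + count > _threshold) Spill();
        _inner.Write(buffer, offset, count);
    }

    private void Spill()
    {
        if (_spilled) return;

        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        var file = new FileStream(path, FileMode.CreateNew, FileAccess.ReadWrite,
                                  FileShare.None, 81920, FileOptions.DeleteOnClose);

        // Copy what is in memory so far, then restore the caller's position.
        long position = _inner.Position;
        _inner.Position = 0;
        _inner.CopyTo(file);
        file.Position = position;

        _inner.Dispose();
        _inner = file;
        _spilled = true;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

Because it derives from `Stream`, code that currently accepts a MemoryStream through a `Stream` parameter could take this instead. Code that relies on MemoryStream-specific members such as `ToArray()` or `GetBuffer()` would need changes.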

Solution 3

We've run into similar obstacles on my team. Some commenters have suggested that developers need to be more comfortable with using files. If using the filesystem directly is an option, do that, but it isn't always an option.

If, as we did, you need to pass data read from a file around your application, you can't simply pass the FileStream object, because it can be disposed before you're done reading the data. We originally resorted to MemoryStreams to let us pass the data around easily, but ran into the same problem.

We've used a couple of different workarounds to mitigate the problem:

  • Implement a wrapper class that stores the data in multiple byte[] objects (since a single array is still limited to int.MaxValue entries) and exposes methods that let you treat them almost like a Stream. We still try to avoid this at all costs.
  • Use some sort of "token" to pass a reference to the location of the data, and wait to load the data "just in time" where the application actually needs it.
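
The first option can be sketched roughly as follows. This is not the team's actual wrapper: the class name `ChunkedBuffer`, its `Append` and `Read` methods, and the 64 KB default chunk size are assumptions chosen for illustration. It is append-only and not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: data spread over many fixed-size byte[] chunks so that
// no single array has to hold everything.
public sealed class ChunkedBuffer
{
    private readonly int _chunkSize;
    private readonly List<byte[]> _chunks = new List<byte[]>();
    private long _length;

    public ChunkedBuffer(int chunkSize = 64 * 1024)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        _chunkSize = chunkSize;
    }

    // Total number of bytes stored; a long, so it can exceed int.MaxValue.
    public long Length { get { return _length; } }

    // Appends 'count' bytes from 'source', allocating new chunks as needed.
    public void Append(byte[] source, int offset, int count)
    {
        while (count > 0)
        {
            // All existing chunks are full: start a new one.
            if (_length == (long)_chunks.Count * _chunkSize)
                _chunks.Add(new byte[_chunkSize]);

            int chunkOffset = (int)(_length % _chunkSize);
            int n = Math.Min(count, _chunkSize - chunkOffset);
            Buffer.BlockCopy(source, offset, _chunks[_chunks.Count - 1], chunkOffset, n);

            offset += n;
            count -= n;
            _length += n;
        }
    }

    // Copies up to 'count' bytes starting at logical 'position';
    // returns the number of bytes actually copied.
    public int Read(long position, byte[] destination, int offset, int count)
    {
        int total = 0;
        while (count > 0 && position < _length)
        {
            int chunkIndex = (int)(position / _chunkSize);
            int chunkOffset = (int)(position % _chunkSize);
            long available = Math.Min(_chunkSize - chunkOffset, _length - position);
            int n = (int)Math.Min(count, available);
            Buffer.BlockCopy(_chunks[chunkIndex], chunkOffset, destination, offset, n);

            position += n;
            offset += n;
            count -= n;
            total += n;
        }
        return total;
    }
}
```

Small chunks also avoid the need for one large contiguous block of address space, which is typically what fails first in a 32-bit process. The total is still bounded by the memory available to the process, so this does not remove the limit, it only raises it.
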
Author by

Andy

Updated on June 15, 2022

Comments

  • Andy
    Andy almost 2 years

    I'm having problems with out-of-memory exceptions when using a .NET MemoryStream if the data is large and the process is 32-bit.

    I believe that the System.IO.Packaging API silently switches from memory to file-backed storage as the data volume increases, and on the face of it, it seems it would be possible to implement a subclass of MemoryStream that does exactly the same thing.

    Does anyone know of such an implementation? I'm pretty sure there is nothing in the framework itself.

  • Andy
    Andy over 10 years
    Thanks, I did find that before, but their MemoryTributary only allows approximately double the data size of a standard MemoryStream, so it's not a general solution.
  • René
    René about 8 years
    -1 There are a lot of points I believe are flat out wrong in this answer, but in the interest of space I'll only challenge this: "So if the machine has sufficient RAM then you will in fact only read and write from/to memory if you use a FileStream". This is absolutely not true. I'm currently working on a project where I'm writing 3 GB to a FileStream on a system with more than 50 GB of RAM. And I can tell you for a fact it's being continuously persisted to disk. And MemoryStreams are not.
  • Jeffrey Kevin Pry
    Jeffrey Kevin Pry over 6 years
    -1 ... I use MemoryStream all of the time and have no page file enabled on my Windows system. Can you cite your source saying a MemoryStream always uses a file on the disk? From github.com/Microsoft/referencesource/blob/master/mscorlib/…: // A MemoryStream represents a Stream in memory (ie, it has no backing store). // This stream may reduce the need for temporary buffers and files in // an application.
  • user1703401
    user1703401 over 6 years
    These are just the basics of a demand-paged virtual memory operating system. RAM pages are backed by the paging file. If you don't have one then you'd better have a lot of RAM, not that hard to come by these days. If the OS is under pressure anyway then it will start pilfering pages that it can unmap. Generally those will be pages with code, backed by the executable file. Mapping them back in when the code needs to run uses the disk to restore their content. Perhaps you can set such a hardware requirement for your customer as well, I never can.
  • user247702
    user247702 about 6 years
    I'm confused. When using a FileStream, on a low memory system you'll avoid OOM and on a high memory system you'll benefit from writing to memory. But then you say in your comment it's wrong to use a FileStream instead of a MemoryStream on a high memory system? Can you please clarify, thanks.
  • Teorist
    Teorist over 5 years
    True about paging. However, sometimes using a MemoryStream, or any other kind of memory storage, to hold temporary data makes more sense than using files. For example, using files leaves more mopping up to do if sensitive data is being processed, and the data is more prone to being breached if the process crashes before it manages to properly delete the temporary files.