Why Large Object Heap and why do we care?

Solution 1

A garbage collection doesn't just get rid of unreferenced objects; it also compacts the heap. That is a very important optimization. It doesn't just make memory usage more efficient (no unused holes), it also makes the CPU cache much more efficient. The cache is a really big deal on modern processors; they are easily an order of magnitude faster than the memory bus.

Compacting is done simply by copying bytes. That, however, takes time. The larger the object, the more likely that the cost of copying it outweighs the possible CPU cache usage improvements.

So they ran a bunch of benchmarks to determine the break-even point, and arrived at 85,000 bytes as the cutoff where copying no longer improves performance. There is a special exception for arrays of double: they are considered 'large' when the array has more than 1000 elements. That is another optimization for 32-bit code: the large object heap allocator has the special property that it allocates memory at addresses aligned to 8, unlike the regular generational allocator, which only aligns to 4. That alignment is a big deal for double, since reading or writing a mis-aligned double is very expensive. Oddly, the sparse Microsoft documentation never mentions arrays of long; not sure what's up with that.
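
A quick way to see where an allocation lands is GC.GetGeneration, since LOH objects report generation 2 immediately after allocation. The sizes below are illustrative rather than exact boundary values, and the double[] behaviour only applies to the 32-bit runtime; a minimal sketch:

    using System;

    class LohThresholdSketch
    {
        static void Main()
        {
            byte[] small = new byte[10000];   // well under 85,000 bytes -> small object heap
            byte[] large = new byte[100000];  // over 85,000 bytes -> large object heap

            Console.WriteLine(GC.GetGeneration(small)); // typically 0
            Console.WriteLine(GC.GetGeneration(large)); // 2, the LOH is reported as gen 2

            // The double[] exception: on a 32-bit runtime an array this size goes to the
            // LOH even though 16,000 bytes is far below the 85,000-byte cutoff.
            double[] doubles = new double[2000];
            Console.WriteLine(GC.GetGeneration(doubles)); // 2 on 32-bit, usually 0 on 64-bit
        }
    }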

Fwiw, there's lots of programmer angst about the large object heap not getting compacted. This invariably gets triggered when they write programs that consume more than half of the entire available address space, followed by using a tool like a memory profiler to find out why the program bombed even though there was still lots of unused virtual memory available. Such a tool shows the holes in the LOH: unused chunks of memory where a large object previously lived but got garbage collected. Such holes are the inevitable price of the LOH; a hole can only be re-used by an allocation for an object that is equal or smaller in size. The real problem is assuming that a program should be allowed to consume all virtual memory at any time.
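
A rough sketch of the allocation pattern that produces those holes (the sizes and counts are made up for illustration, not taken from any real program): after the collection, roughly 100 MB of holes exist, but none is bigger than 1 MB, so the final allocation has to come out of fresh address space instead of reusing them. That is exactly how a 32-bit process runs out of address space while a profiler still shows plenty of free memory.

    using System;
    using System.Collections.Generic;

    class LohFragmentationSketch
    {
        static void Main()
        {
            var blocks = new List<byte[]>();
            for (int i = 0; i < 200; i++)
                blocks.Add(new byte[1024 * 1024]);  // 200 one-megabyte LOH allocations

            // Drop every other block; each freed slot becomes a hole wedged between survivors.
            for (int i = 0; i < blocks.Count; i += 2)
                blocks[i] = null;

            GC.Collect(); // reclaims the dropped blocks but (historically) does not compact the LOH

            // None of the 1 MB holes can satisfy this 2 MB request, so the heap must grow.
            byte[] big = new byte[2 * 1024 * 1024];
            GC.KeepAlive(big);
            GC.KeepAlive(blocks);
        }
    }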

A problem that otherwise disappears completely by just running the code on a 64-bit operating system. A 64-bit process has 8 terabytes of virtual memory address space available, 3 orders of magnitude more than a 32-bit process. You just can't run out of holes.

Long story short, the LOH makes code run more efficiently, at the cost of using the available virtual memory address space less efficiently.


UPDATE: .NET 4.5.1 now supports compacting the LOH through the GCSettings.LargeObjectHeapCompactionMode property. Beware the consequences, please.
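
A minimal sketch of how that property is used. Note, as a commenter points out below, that the setting resets itself once the compaction has run, so it must be set again before every compaction you want:

    using System;
    using System.Runtime;

    class LohCompactionSketch
    {
        static void Main()
        {
            // Ask for the LOH to be compacted during the next blocking gen-2 collection.
            GCSettings.LargeObjectHeapCompactionMode =
                GCLargeObjectHeapCompactionMode.CompactOnce;

            GC.Collect(); // blocking full collection; the LOH is compacted this one time

            // After the compaction the mode has reverted to Default on its own.
            Console.WriteLine(GCSettings.LargeObjectHeapCompactionMode);
        }
    }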

Solution 2

If an object's size is greater than a fixed threshold (85,000 bytes in .NET 1), then the CLR puts it in the Large Object Heap. This optimises:

  1. Object allocation (small objects are not mixed with large objects)
  2. Garbage collection (the LOH is collected only on a full, gen-2 collection; see the sketch below)
  3. Memory defragmentation (the LOH is rarely, and historically never, compacted)
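
A small sketch of point 2, using only the standard GC API (the exact numbers printed vary by runtime and build): a dead LOH buffer survives a gen-0 collection and is only reclaimed once a full, gen-2 collection runs.

    using System;

    class LohCollectionTimingSketch
    {
        static void AllocateAndDrop()
        {
            // An 8 MB buffer goes straight to the LOH and becomes garbage on return.
            byte[] buffer = new byte[8 * 1024 * 1024];
            GC.KeepAlive(buffer);
        }

        static void Main()
        {
            AllocateAndDrop();

            GC.Collect(0);                            // gen-0 only: the dead LOH buffer stays
            long afterGen0 = GC.GetTotalMemory(false);

            GC.Collect(2);                            // full collection: the LOH is swept too
            long afterGen2 = GC.GetTotalMemory(false);

            Console.WriteLine("after gen 0: {0:N0} bytes, after gen 2: {1:N0} bytes",
                              afterGen0, afterGen2);
        }
    }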

Solution 3

The essential difference between the Small Object Heap (SOH) and the Large Object Heap (LOH) is that memory in the SOH gets compacted when it is collected, while the LOH does not, as this article illustrates. Compacting large objects costs a lot: similarly to the examples in the article, say moving a byte in memory takes 2 cycles; then compacting an 8 MB object on a 2 GHz machine takes about 8 ms, which is a large cost. Considering that large objects (arrays in most cases) are quite common in practice, I suppose that is the reason why Microsoft leaves large objects where they are in memory and proposes the LOH.
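
That 8 ms figure is just back-of-the-envelope arithmetic; a tiny sketch that makes the assumed numbers explicit (2 cycles per byte and a 2 GHz clock are assumptions from the paragraph above, not measurements):

    using System;

    class CompactionCostEstimate
    {
        static void Main()
        {
            const double bytesToMove   = 8 * 1024 * 1024; // one 8 MB object
            const double cyclesPerByte = 2;               // assumed cost of moving one byte
            const double clockHz       = 2e9;             // assumed 2 GHz clock

            double milliseconds = bytesToMove * cyclesPerByte / clockHz * 1000;
            Console.WriteLine("{0:F1} ms to compact one object", milliseconds); // ~8.4 ms
        }
    }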

BTW, according to this post, the LOH usually doesn't generate memory fragmentation problems.

Solution 4

The principle is that it is unlikely (and quite possibly bad design) that a process would create lots of short-lived large objects, so the CLR allocates large objects to a separate heap on which it runs GC on a different schedule from the regular heap. http://msdn.microsoft.com/en-us/magazine/cc534993.aspx


Comments

  • Manish Basantani
    Manish Basantani over 4 years

    I have read about Generations and the Large Object Heap. But I still fail to understand the significance (or benefit) of having a Large Object Heap.

    What could have gone wrong (in terms of performance or memory) if the CLR had just relied on Generation 2 (considering that the thresholds for Gen0 and Gen1 are too small to handle large objects) for storing large objects?

    • Jacob Brewer
      Jacob Brewer about 10 years
      This gives me two questions for the .NET designers: 1. Why isn't a LOH defrag run before an OutOfMemoryException is thrown? 2. Why don't LOH objects have an affinity for staying together (large objects preferring the end of the heap and small ones the beginning)?
  • Christopher Currens
    Christopher Currens over 12 years
    Also, putting large objects on, say, generation 2 could wind up hurting performance, since it would take a long time to compact the memory, especially if a small amount was freed and HUGE objects had to be copied to a new location. The current LOH is not compacted for performance reasons.
  • CodesInChaos
    CodesInChaos over 12 years
    I think it's only bad design because the GC doesn't handle it well.
  • Christian.K
    Christian.K over 12 years
    @CodeInChaos Apparently, there are some improvements coming in .NET 4.5
  • supercat
    supercat over 12 years
    @CodeInChaos: While it may make sense for the system to wait until a gen2 collection before trying to reclaim memory from even short-lived LOH objects, I can't see any performance advantage to declaring LOH objects (and any objects to which they hold references) unconditionally live during gen0 and gen1 collections. Are there some optimizations that are made possible by such an assumption?
  • Johnny_D
    Johnny_D about 12 years
    @Hans Passant, could you please clarify about x64 systems: do you mean this problem completely disappears?
  • supercat
    supercat over 11 years
    Some implementation details of the LOH make sense, but some puzzle me. For example, I can understand that if many large objects are created and abandoned, it may generally be desirable to delete them en masse in a Gen2 collection than piecemeal in Gen0 collections, but if one creates and abandons e.g. an array of 22,000 strings to which no outside references exist, what advantage exists to having Gen0 and Gen1 collections tag all 22,000 strings as "live" without regard for whether any reference exists to the array?
  • Lothar
    Lothar over 10 years
    Of course the fragmentation problem is just the same on x64. It will only take a few days more of running your server process before it kicks in.
  • user1703401
    user1703401 over 9 years
    Hmm, no, never underestimate 3 orders of magnitude. How long it takes to garbage collect a 4 terabyte heap is something you can't avoid discovering long before it gets close to that.
  • Shiv
    Shiv over 8 years
    Loading large quantities of data into managed objects usually dwarfs the 8ms cost to compact the LOH. In practice in most big data applications, the LOH cost is trivial next to the rest of the application performance.
  • Shiv
    Shiv over 8 years
    The 4.5.1 option needs to be manually set each time you want to compact. The value is reset to the default each time it fires the compaction, so somehow you need to work out a good strategy yourself for managing LOH fragmentation in a 32-bit application.
  • relatively_random
    relatively_random about 7 years
    @HansPassant Could you, please, elaborate on this statement: "How long it takes to garbage collect a 4 terabyte heap is something you can't avoid discovering long before it gets close to that."
  • robbie fan
    robbie fan about 4 years
    @supercat I looked at the link mentioned by Myles McDonnell. My understanding is: 1. LOH collection happens in a gen 2 GC. 2. LOH collection doesn't include compaction (at the time the article was written). Instead, it marks dead objects as reusable, and these holes serve future LOH allocations if they are large enough. Because of point 1, and considering that a gen 2 GC would be slow if there are many objects in gen 2, I think it's better to avoid using the LOH as much as possible in this case.
  • supercat
    supercat about 4 years
    @robbiefan: If at the time of a G0 collection, the only existing reference to an L0 or L1 object is held by a brand new LOH object, I would think it would be better to try to determine whether any references exist to the LOH object, and only promote the L0 or L1 objects to which it holds references if so, than to unconditionally keep alive all references held in the LOH object without regard for whether any references exist to it.
  • XTL
    XTL over 3 years
    @HansPassant What happens when an object on the SOH grows to more than 85 KB? Is it moved to the LOH, or does it remain in the SOH?