Why has the size of L1 cache not increased very much over the last 20 years?

Solution 1

30 KB of Wikipedia text isn't as helpful as an explanation of why a cache that is too large becomes less optimal. When the cache gets too large, the latency to find an item in the cache (factoring in cache misses) begins to approach the latency of looking up the item in main memory. I don't know what proportions CPU designers aim for, but I would think it's something analogous to the 80-20 guideline: you'd like to find your most common data in the cache 80% of the time, and the other 20% of the time you'll have to go to main memory to find it (or whatever proportions the CPU designers actually intend).

EDIT: I'm sure it's nowhere near 80%/20%, so substitute X and 1-X. :)
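
To make that trade-off concrete, here is a minimal back-of-the-envelope sketch in C using the usual average-memory-access-time formula. The latencies and hit rates are illustrative assumptions, not measurements of any real CPU; the point is only how quickly the average is dominated by the miss penalty as the hit rate falls.

```c
#include <stdio.h>

/* Average memory access time (AMAT) = L1 hit time + miss rate * miss penalty.
 * All latencies and hit rates below are illustrative assumptions, not
 * figures for any particular CPU. */
int main(void) {
    double l1_hit_ns = 1.0;     /* assumed L1 hit latency (ns)      */
    double dram_ns   = 100.0;   /* assumed main-memory latency (ns) */
    double hit_rates[] = { 0.80, 0.90, 0.95, 0.99 };

    for (int i = 0; i < 4; i++) {
        double h    = hit_rates[i];
        double amat = l1_hit_ns + (1.0 - h) * dram_ns;
        printf("hit rate %2.0f%% -> average access ~%5.1f ns\n", h * 100.0, amat);
    }
    return 0;
}
```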

Solution 2

One factor is that L1 fetches start before the TLB translation is complete, so as to decrease latency. With a small enough cache and a high enough associativity, the index bits of the cache are the same in the virtual and physical addresses. This probably decreases the cost of maintaining memory coherency with a virtually-indexed, physically-tagged cache.
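
A rough sketch of the arithmetic behind that constraint (the page size, line size, and associativity below are common x86 values, assumed here for illustration): a virtually-indexed, physically-tagged cache that indexes with only page-offset bits can hold at most one page per way, so its capacity is capped at page size times associativity.

```c
#include <stdio.h>

/* A VIPT cache must pick its set using only the page-offset bits, which are
 * identical in the virtual and physical address.  That caps each way at one
 * page, so total capacity <= page_size * associativity.
 * 4 KiB pages, 64-byte lines and 8 ways are typical x86 values, assumed here. */
int main(void) {
    unsigned page_size = 4096;  /* bytes per page on x86   */
    unsigned ways      = 8;     /* assumed associativity   */
    unsigned line_size = 64;    /* bytes per cache line    */

    unsigned max_bytes = page_size * ways;
    printf("max VIPT L1 size: %u bytes (%u KiB), i.e. %u lines of %u bytes\n",
           max_bytes, max_bytes / 1024, max_bytes / line_size, line_size);
    return 0;
}
```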

Solution 3

Cache size is influenced by many factors:

  1. Speed of electric signals (if not the speed of light, then something of the same order of magnitude; see the sketch after this list for what that implies at gigahertz clock rates):

    • 300 meters in one microsecond.
    • 30 centimeters in one nanosecond.
  2. Economic cost (circuits at different cache levels may be built differently, and certain cache sizes may simply not be worth it):

    • Doubling the cache size does not double performance (even if physics allowed a cache that size to work at full speed): for small sizes, doubling gives far more than double the performance; for large sizes, doubling gives almost no extra performance.
    • At Wikipedia you can find a chart showing, for example, how little is gained by making caches bigger than 1 MB (bigger caches do exist, but keep in mind that those serve multiple processor cores).
    • For L1 caches there should be other charts (that vendors don't show) that make 64 KB a convenient size.
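
To see what that propagation speed implies at modern clock rates, here is a small sketch (the clock frequencies are just example values): at a few gigahertz, one cycle corresponds to only a few centimetres of signal travel, so a cache that must answer within a cycle or two has to stay physically small and close to the execution units.

```c
#include <stdio.h>

/* Signals travel roughly 30 cm per nanosecond (see the list above).
 * One cycle at f GHz lasts 1/f ns, so the distance a signal can cover
 * within a single cycle shrinks as clock rates rise.  The frequencies
 * below are example values, not specific CPUs. */
int main(void) {
    double cm_per_ns = 30.0;
    double ghz[] = { 1.0, 2.0, 3.0, 4.0 };

    for (int i = 0; i < 4; i++) {
        double cycle_ns = 1.0 / ghz[i];
        printf("%.0f GHz: cycle = %.3f ns -> ~%.1f cm of signal travel\n",
               ghz[i], cycle_ns, cycle_ns * cm_per_ns);
    }
    return 0;
}
```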

If the L1 cache size hasn't changed past 64 KB, it's because growing it further was no longer worthwhile. Also note that there is now a greater "culture" around caches: many programmers write "cache-friendly" code and/or use prefetch instructions to reduce latency.
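
As a minimal illustration of what "cache-friendly" means in practice (the array dimensions are arbitrary example values), the classic case is loop order over a 2D array: walking it in the order it is laid out in memory hits the cache, walking it the other way does not.

```c
#include <stdio.h>

#define ROWS 4096
#define COLS 4096

/* C stores a 2D array row by row, so the row-major walk reads memory
 * sequentially (cache- and prefetcher-friendly), while the column-major walk
 * jumps COLS * sizeof(int) bytes between reads and misses the cache far more
 * often, even though both loops touch exactly the same elements. */
static int grid[ROWS][COLS];

long sum_row_major(void) {
    long s = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            s += grid[r][c];
    return s;
}

long sum_col_major(void) {
    long s = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            s += grid[r][c];
    return s;
}

int main(void) {
    printf("row-major sum: %ld, column-major sum: %ld\n",
           sum_row_major(), sum_col_major());
    return 0;
}
```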

I once tried creating a simple program that accessed random locations in an array of several megabytes: that program almost froze the computer, because for each random read a whole cache line was moved from RAM into the cache, and since that happened very often, the simple program drained nearly all the memory bandwidth, leaving very few resources for the OS.
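
Here is a minimal sketch of the kind of experiment described above (the array size, timing method, and random-number step are arbitrary choices, and absolute timings will vary by machine): the random walk defeats the cache and the prefetchers, so it runs far slower than the sequential pass over the same data.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (16u * 1024u * 1024u)   /* 16 M ints = 64 MiB, far bigger than any cache */

/* Compare a sequential pass and a pseudo-random walk over the same buffer.
 * The random pattern misses the cache on nearly every read, so it is
 * dramatically slower even though both loops do the same number of reads. */
int main(void) {
    int *a = malloc((size_t)N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = (int)i;

    volatile long sum = 0;

    clock_t t0 = clock();
    for (size_t i = 0; i < N; i++) sum += a[i];          /* sequential reads    */
    clock_t t1 = clock();

    unsigned idx = 12345u;
    for (size_t i = 0; i < N; i++) {                     /* pseudo-random reads */
        idx = idx * 1664525u + 1013904223u;              /* linear congruential step */
        sum += a[idx % N];
    }
    clock_t t2 = clock();

    printf("sequential: %.2f s, random: %.2f s\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(a);
    return 0;
}
```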

Solution 4

I believe it can be summed up simply by stating that the bigger the cache, the slower access will be. So a larger cache doesn't necessarily help, since the whole point of a cache is to avoid slow bus communication with RAM.

Since the speed of the processor has been increasing rapidly, the same-sized cache must perform faster and faster in order to keep up with it. So the caches may be significantly better (in terms of speed) but not in terms of storage.

(I'm a software guy so hopefully this isn't woefully wrong)

Solution 5

From L1 cache:

The Level 1 cache, or primary cache, is on the CPU and is used for temporary storage of instructions and data organised in blocks of 32 bytes. Primary cache is the fastest form of storage. Because it's built into the chip with a zero wait-state (delay) interface to the processor's execution unit, it is limited in size.

SRAM typically uses six transistors per bit and can hold data without external assistance for as long as power is supplied to the circuit. This contrasts with dynamic RAM (DRAM), which must be refreshed many times per second in order to hold its data contents.

Intel's P55C MMX processor, launched at the start of 1997, was noteworthy for the increase in size of its Level 1 cache to 32 KB. The AMD K6 and Cyrix M2 chips launched later that year upped the ante further by providing Level 1 caches of 64 KB. 64 KB has remained a standard L1 cache size, though various multiple-core processors may utilise it differently.

EDIT: Please note that this answer is from 2009 and CPUs have evolved enormously in the last 10 years. If you have arrived at this post, don't take all our answers here too seriously.

Comments

  • eleven81
    eleven81 about 1 year

    The Intel i486 has 8 KB of L1 cache. The Intel Nehalem has 32 KB L1 instruction cache and 32 KB L1 data cache per core.

    The amount of L1 cache hasn't increased at nearly the rate that clock rates have increased.

    Why not?

    • Keltari
      Keltari over 10 years
      You are comparing apples to oranges. Clock rates have increased, but there is no correlation to the need for more cache. Just because you can do something faster, doesn't mean you benefit from a bigger bucket.
    • Fiasco Labs
      Fiasco Labs over 10 years
      Excess cache and the management overhead can slow a system down. They've found the sweet spot and there it shall remain.
  • sYnfo
    sYnfo almost 14 years
    "When the cache gets too large the latency to find an item in the cache (factoring in cache misses) begins to approach the latency of looking up the item in main memory." Are you sure about this? For example doubling the amount of installed RAM will certainly not increase it's latency, why would this be true for cache? And also, why would the L2 cache grow bigger with new CPUs, if this is a problem? I'm no expert in this, I really want to know :)
  • JMD
    JMD almost 14 years
    I had prepared a big, long description of caching in software, and measuring when your cache has outgrown itself and should be dumped/rebuilt, but then I decided it might be best to admit that I'm not a hardware designer. :) In either case, I suspect the answer can be summed up by the law of diminishing returns. I.e. more is not always better.
  • Brian Knoblauch
    Brian Knoblauch almost 14 years
    From my long history of fiddling with hardware at low levels, but not actually being a designer, I'd say that latency appears to be related to how many ways the cache is associative, not the size. My guess is that the extra transistors that would go into the cache have proven to be more effective elsewhere to overall performance.
  • sYnfo
    sYnfo almost 14 years
    @JMD I'd be interested in that description nevertheless ;) Although comments are probably not the best place for this, true. @Brian So, if I understand it correctly, they decided to put less transistors in L1 cache and in the same time put much more in L2, which is significantly slower? Please take no offense, I'm just curious :)
  • lukecampbell
    lukecampbell over 10 years
    A typical SRAM cell is made up of six MOSFETs. Each bit in an SRAM is stored on four transistors (M1, M2, M3, M4) that form two cross-coupled inverters. Source Second Source
  • CoffeDeveloper
    CoffeDeveloper almost 10 years
    most interesting answer:)
  • b_jonas
    b_jonas about 8 years
    I believe this is the reason, but let me give the number. The page size on the x86 architecture is 4096 bytes. The cache wants to choose the cache bucket in which to look for the entry of the cache line (64 bytes) before the page translation is complete. It would be expensive to have to decide between too many entries in a bucket, so each bucket only has 8 entries in it. As a result, for the last ten years, all the expensive x86 cpus have exactly 32768 bytes (512 cache lines) in their L1 data cache.
  • b_jonas
    b_jonas about 8 years
    As this is so hard to increase, the cpus add a middle level of cache, so we have separate L2 and L3 caches now. Also, the L1 code cache and L1 data cache are separate, because the CPU knows if it's accessing code or data.
  • Eonil
    Eonil almost 5 years
    This is just a description of the situation; it does not explain anything about why.
  • Ramhound
    Ramhound almost 5 years
    @Eonil - We could not provide the “why” answer if we wanted to. However, diminishing returns on performance is a reasonable explanation. When the question was written nearly a decade ago, it was much more expensive to increase the size without incurring a performance hit. This answer attempted to at least address the intended question that was asked.