Why do we need multiple levels of cache memory?


This is to do with the physical size of the cache on the die. Each bit in a cache is held by one or more transistors, so a large cache needs a lot of transistors. The speed of a cache is essentially limited by its distance from the unit that wants to access it: in devices this small, signals slow down as the path gets longer. This is partly down to substrate impedance, but at this scale some more complex physics phenomena come into play as well.

If we want a single large cache, it has to be within a very short distance of the MMU, the ALU, and so on, all at the same time. That makes the physical design of the processor quite difficult, because a large cache takes up a lot of space. To make the cache "local" to those subunits, you have to sacrifice the subunits' locality to one another.

By using a small, fast local cache (L1) we maximise locality and speed, at the cost of capacity. We then add a secondary cache (L2) to hold larger amounts of data, with a slight sacrifice in locality (and therefore speed). This gives us the best of both worlds: we can store a lot of data, but still have a very fast local cache for the processor's subunits to use.
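
To make the trade-off concrete, here's a minimal sketch of the average memory access time (AMAT) arithmetic. All latencies and hit rates below are illustrative assumptions, not figures for any real processor:

```c
#include <stdio.h>

/* A sketch of why a two-level hierarchy pays off. All latencies
 * (in cycles) and hit rates are made-up, illustrative numbers. */
int main(void) {
    double l1_hit = 4,  l1_rate = 0.95;  /* small, close, fast      */
    double l2_hit = 12, l2_rate = 0.90;  /* bigger, farther, slower */
    double mem    = 200;                 /* system memory fetch     */

    /* AMAT = L1 time + L1 misses * (L2 time + L2 misses * memory time) */
    double amat2 = l1_hit + (1 - l1_rate) * (l2_hit + (1 - l2_rate) * mem);

    /* Compare a single big cache: assume it matches the two levels'
     * combined hit rate but, being large, also the L2's access time. */
    double big_rate = 0.995;             /* 1 - (1-l1_rate)*(1-l2_rate) */
    double amat1 = l2_hit + (1 - big_rate) * mem;

    printf("two-level AMAT: %.2f cycles\n", amat2); /* ~5.6  */
    printf("one big cache:  %.2f cycles\n", amat1); /* ~13.0 */
    return 0;
}
```

Even with made-up numbers the pattern holds: the common case is served at L1 speed, while the big, slower level only has to absorb the misses.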

In multicore processors, the L3 cache is usually shared between cores. In this type of design, the L1 and L2 caches are built into the die of each core, and the L3 cache sits between the cores. This gives reasonable locality to each core, but also allows for a very large cache.

The behaviour of caches in modern processors is very complicated, so I won't attempt a proper description, but a very simplified picture is this: a target address is looked up in L1, then L2, then L3, before resorting to a system memory fetch. Once that fetch completes, the data is pulled back up through the caches.
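
As a rough illustration of that lookup order, here's a toy model in C. The direct-mapped layout, sizes, and fill-on-miss policy are simplifying assumptions of mine; real cache controllers are far more sophisticated (associativity, replacement policies, coherence, and so on):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

#define LINES 64                       /* lines per toy cache level */

typedef struct {
    uint64_t tag[LINES];
    bool     valid[LINES];
} cache_t;

/* Does this level currently hold the address? */
static bool probe(cache_t *c, uint64_t addr) {
    size_t i = addr % LINES;           /* direct-mapped index */
    return c->valid[i] && c->tag[i] == addr;
}

/* Install the address, evicting whatever occupied the slot. */
static void fill(cache_t *c, uint64_t addr) {
    size_t i = addr % LINES;
    c->valid[i] = true;
    c->tag[i] = addr;
}

/* Probe L1, then L2, then L3, then fall back to "memory",
 * pulling the line back up through the caches on the way. */
static const char *lookup(cache_t *l1, cache_t *l2, cache_t *l3,
                          uint64_t addr) {
    if (probe(l1, addr)) return "L1 hit";
    if (probe(l2, addr)) { fill(l1, addr); return "L2 hit"; }
    if (probe(l3, addr)) { fill(l2, addr); fill(l1, addr); return "L3 hit"; }
    fill(l3, addr); fill(l2, addr); fill(l1, addr);
    return "memory fetch";
}

int main(void) {
    cache_t l1 = {0}, l2 = {0}, l3 = {0};
    printf("%s\n", lookup(&l1, &l2, &l3, 0x1000)); /* memory fetch */
    printf("%s\n", lookup(&l1, &l2, &l3, 0x1000)); /* L1 hit */
    return 0;
}
```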

Comments

  • ganesh
    ganesh over 1 year

    In general a cache is useful because the processor is faster than RAM (both keep getting faster, but the gap remains). So reducing the number of memory accesses is desirable for performance.

    My question is: why do we need multiple levels of cache (L1, L2, L3) and not just one?

    I know that L1 is the fastest and the smallest, L2 is a bit slower and a bit bigger, and so on... but why is the hardware built this way?

  • Loren_
    Loren_ over 10 years
    Nice! Do you have any sources to link to?
  • Paul A. Clayton
    Paul A. Clayton over 10 years
    One interesting aspect of the impact of physical size for large memories is that a designer can consider using slower storage devices for each bit and yet increase access speed if the slower devices are sufficiently smaller. This means that DRAM can make sense for very large caches. (There are also Non-Uniform Cache Architecture techniques where closer parts of L2 have lower latency. With partitioned L3 caches, physical [or on-chip network] proximity can be exploited by placement choices to improve performance.)
  • Polynomial
    Polynomial over 10 years
    @Thor This is one of those weird bits of knowledge that I've picked up over years of random reading and playing with microprocessors, but the wiki article is pretty comprehensive, especially the part about multi-level cache architectures.