Are processor caches L1, L2 and L3 all made of SRAM?

In general they are all implemented with SRAM.

(IBM's POWER and zArchitecture chips use DRAM for L3. This is called embedded DRAM (eDRAM) because it is implemented in the same type of process technology as logic, allowing fast logic to be integrated on the same chip as the DRAM. For POWER4 the off-chip L3 used eDRAM; POWER7 has the L3 on the same chip as the processing cores.)

Although they all use SRAM, they do not all use the same SRAM design. SRAM for L2 and L3 is optimized for size (to increase the capacity possible within a limited manufacturable chip size, or to reduce the cost of a given capacity), while SRAM for L1 is more likely to be optimized for speed.

More importantly, the access time is related to the physical size of the storage. With a two-dimensional layout, one can expect physical access latency to be roughly proportional to the square root of the capacity. (Non-uniform cache architecture exploits this to provide a subset of the cache at lower latency. The L3 slices of recent Intel processors have a similar effect; a hit in the local slice has significantly lower latency.) This effect can make a DRAM cache faster than an SRAM cache at high capacities because the DRAM is physically smaller.
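This size-versus-latency effect is easy to observe from software with a pointer-chasing microbenchmark: every load depends on the previous one, so the average time per step approximates the load-to-use latency of whichever level of the hierarchy the working set fits in. Below is a minimal C sketch of the idea; the working-set sizes are illustrative guesses at typical L1/L2/L3 capacities, and the timing relies on POSIX clock_gettime.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static volatile size_t sink; /* keeps the chase loop from being optimized away */

/* Average nanoseconds per dependent load over a working set of `bytes`. */
static double ns_per_load(size_t bytes, long steps)
{
    size_t n = bytes / sizeof(size_t);
    size_t *next = malloc(n * sizeof *next);
    size_t *perm = malloc(n * sizeof *perm);
    if (!next || !perm) { fprintf(stderr, "alloc failed\n"); exit(1); }

    /* Fisher-Yates shuffle of 0..n-1. */
    for (size_t i = 0; i < n; i++) perm[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    /* Chain the permutation into a single cycle so the chase visits
       every element in an order hardware prefetchers cannot predict. */
    for (size_t i = 0; i < n; i++)
        next[perm[i]] = perm[(i + 1) % n];

    struct timespec t0, t1;
    size_t idx = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long s = 0; s < steps; s++)
        idx = next[idx];               /* each load depends on the last */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = idx;

    free(next); free(perm);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / steps;
}

int main(void)
{
    /* Illustrative sizes meant to land in L1, L2, L3, and DRAM. */
    size_t sizes[] = { 16u << 10, 256u << 10, 8u << 20, 256u << 20 };
    for (int i = 0; i < 4; i++)
        printf("%9zu KiB: %6.1f ns/load\n",
               sizes[i] >> 10, ns_per_load(sizes[i], 20 * 1000 * 1000));
    return 0;
}
```

On typical hardware the printed ns/load climbs in steps as the working set outgrows each cache level, which is exactly the capacity-latency relationship described above.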

Another factor is that most L2 and L3 caches access tags and data serially, whereas most L1 caches access tags and data in parallel. This is a power optimization: L2 miss rates are higher than L1 miss rates, so a speculative data access is more likely to be wasted work; L2 data access generally requires more energy, roughly in proportion to the capacity; and L2 caches usually have higher associativity, which means that more data entries would have to be read speculatively. Obviously, having to wait for the tag match before accessing the data adds to the time required to retrieve the data. (L2 access also typically begins only after an L1 miss is confirmed, so the latency of L1 miss detection is added to the total access latency of L2.)
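A toy model makes this trade-off concrete. The C sketch below is purely illustrative (the struct and function names are made up, and no real processor's arrays look like this): the "parallel" lookup starts reading the data of every way while the tags are compared, as an L1 typically does, while the "serial" lookup reads data only from the way whose tag matched, as L2/L3 typically do.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define WAYS 4   /* associativity of the set */
#define LINE 64  /* bytes per cache line */

/* One way of a set: tag, valid bit, and the line's data. */
struct way {
    uint64_t tag;
    bool     valid;
    uint8_t  data[LINE];
};

/* L1 style: the data arrays of ALL ways are read while the tags are
 * compared, and a late select picks the matching way. One array access
 * of delay, but WAYS data reads of energy (all wasted on a miss). */
static const uint8_t *lookup_parallel(const struct way set[WAYS], uint64_t tag)
{
    const uint8_t *speculative[WAYS];
    int hit = -1;
    for (int w = 0; w < WAYS; w++) {
        speculative[w] = set[w].data;            /* speculative data read */
        if (set[w].valid && set[w].tag == tag)   /* tag compare, in parallel */
            hit = w;
    }
    return hit >= 0 ? speculative[hit] : NULL;
}

/* L2/L3 style: tags are compared first; data is read only from the
 * matching way, and not at all on a miss. One data read of energy,
 * but two array accesses of delay in series. */
static const uint8_t *lookup_serial(const struct way set[WAYS], uint64_t tag)
{
    for (int w = 0; w < WAYS; w++)
        if (set[w].valid && set[w].tag == tag)
            return set[w].data;                  /* data read starts only now */
    return NULL;                                 /* miss: no data read at all */
}

int main(void)
{
    struct way set[WAYS] = {{0}};
    set[2].valid = true;
    set[2].tag = 0xABC;
    set[2].data[0] = 42;

    const uint8_t *p = lookup_parallel(set, 0xABC);
    const uint8_t *s = lookup_serial(set, 0xABC);
    printf("parallel: %d, serial: %d\n", p ? p[0] : -1, s ? s[0] : -1);
    return 0;
}
```

Real hardware does the "parallel" reads with sense amplifiers and a way multiplexer rather than a loop, of course; the code only models which arrays are touched and in what order.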

In addition, L2 cache is physically more distant from the execution engine. Placing the L1 data cache close to the execution engine (so that the common case of L1 hit is fast) generally means that L2 must be placed farther away.


Comments

  • Acaz Souza over 1 year

    Are processor caches L1, L2 and L3 all made of SRAM? If so, why is L1 faster than L2, and L2 faster than L3? I did not understand this part when I read about them.

  • Harshavardhan Ramanna about 6 years

    Great answer. But I do not agree with your statement that L2 miss rates are higher than L1 miss rates. As we move lower in the memory hierarchy, we have bigger structures that produce fewer misses but at increased latency.
  • Paul A. Clayton about 6 years

    @HarshavardhanRamanna Yes, increases in capacity and associativity help the miss rate, but the lower levels see filtered accesses (the traditional transfer of a whole block from L2 filters out short-term spatial locality within the block; a hit counts as only one access at L2, while the block it brings in is likely to provide hits in L1 for several more accesses). The total miss rate still goes down: e.g., an L2 with a decent 80% local hit rate behind an L1 with a 95% hit rate yields a 99% global hit rate, as the arithmetic below shows.
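Spelling out the arithmetic in that last comment, with local hit rates of 95% for L1 and 80% for L2:

$$\text{global hit rate} = H_{L1} + (1 - H_{L1})\,H_{L2} = 0.95 + 0.05 \times 0.80 = 0.99$$

Only the 5% of accesses that miss in L1 ever reach L2, and L2 catches 80% of those, leaving a 1% global miss rate.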