How are cache memories shared in multicore Intel CPUs?

performance x86 multiprocessing intel cpu-cache

25,733

Solution 1

In a multiprocessor system or a multicore processor (Intel Quad Core, Core two Duo etc..) does each cpu core/processor have its own cache memory (data and program cache)?

Yes. It varies by the exact chip model, but the most common design is for each CPU core to have its own private L1 data and instruction caches.

On old and/or low-power CPUs, the next level of cache is typically a L2 unified cache is typically shared between all cores. Or on 65nm Core2Quad (which was two core2duo dies in one package), each pair of cores had their own last-level cache and couldn't communicate as efficiently.

Modern mainstream Intel CPUs (since the first-gen i7 CPUs, Nehalem) use 3 levels of cache.

32kiB split L1i/L1d: private per-core (same as earlier Intel)
256kiB unified L2: private per-core. (1MiB on Skylake-avx512).
large unified L3: shared among all cores

Last-level cache is a a large shared L3. It's physically distributed between cores, with a slice of L3 going with each core on the ring bus that connects the cores. Typically 1.5 to 2.25MB of L3 cache with every core, so a many-core Xeon might have a 36MB L3 cache shared between all its cores. This is why a dual-core chip has 2 to 4 MB of L3, while a quad-core has 6 to 8 MB.

On CPUs other than Skylake-avx512, L3 is inclusive of the per-core private caches so its tags can be used as a snoop filter to avoid broadcasting requests to all cores. i.e. anything cached in a private L1d, L1i, or L2, must also be allocated in L3. See Which cache mapping technique is used in intel core i7 processor?

David Kanter's Sandybridge write-up has a nice diagram of the memory heirarchy / system architecture, showing the per-core caches and their connection to shared L3, and DDR3 / DMI(chipset) / PCIe connecting to that. (This still applies to Haswell / Skylake-client / Coffee Lake, except with DDR4 in later CPUs).

Can one processor/core access each other's cache memory, because if they are allowed to access each other's cache, then I believe there might be lesser cache misses, in the scenario that if that particular processors cache does not have some data but some other second processors' cache might have it thus avoiding a read from memory into cache of first processor? Is this assumption valid and true?

No. Each CPU core's L1 caches tightly integrate into that core. Multiple cores accessing the same data will each have their own copy of it in their own L1d caches, very close to the load/store execution units.

The whole point of multiple levels of cache is that a single cache can't be fast enough for very hot data, but can't be big enough for less-frequently used data that's still accessed regularly. Why is the size of L1 cache smaller than that of the L2 cache in most of the processors?

Going off-core to another core's caches wouldn't be faster than just going to L3 in Intel's current CPUs. Or the required mesh network between cores to make this happen would be prohibitive compared to just building a larger / faster L3 cache.

The small/fast caches built-in to other cores are there to speed up those cores. Sharing them directly would probably cost more power (and maybe even more transistors / die area) than other ways of increasing cache hit rate. (Power is a bigger limiting factor than transistor count or die area. That's why modern CPUs can afford to have large private L2 caches).

Plus you wouldn't want other cores polluting the small private cache that's probably caching stuff relevant to this core.

Will there be any problems in allowing any processor to access other processor's cache memory?

Yes -- there simply aren't wires connecting the various CPU caches to the other cores. If a core wants to access data in another core's cache, the only data path through which it can do so is the system bus.

A very important related issue is the cache coherency problem. Consider the following: suppose one CPU core has a particular memory location in its cache, and it writes to that memory location. Then, another core reads that memory location. How do you ensure that the second core sees the updated value? That is the cache coherency problem.

The normal solution is the MESI protocol, or a variation on it. Intel uses MESIF.

Solution 2

Quick answers 1) Yes 2)No, but it all may depend on what memory instance/resource you are referring, data may exist in several locations at the same time. 3)Yes.

For a full length explanation of the issue you should read the 9 part article "What every programmer should know about memory" by Ulrich Drepper ( http://lwn.net/Articles/250967/ ), you will get the full picture of the issues you seem to be inquiring about in a good and accessible detail.

Solution 3

To answer your first, I know the Core 2 Duo has a 2-tier caching system, in which each processor has its own first-level cache, and they share a second-level cache. This helps with both data synchronization and utilization of memory.

To answer your second question, I believe your assumption to be correct. If the processors were to be able to access each others' cache, there would obviously be less cache misses as there would be more data for the processors to choose from. Consider, however, shared cache. In the case of the Core 2 Duo, having shared cache allows programmers to place commonly used variables safely in this environment so that the processors will not have to access their individual first-level caches.

To answer your third question, there could potentially be a problem with accessing other processors' cache memory, which goes to the "Single Write Multiple Read" principle. We can't allow more than one process to write to the same location in memory at the same time.

For more info on the core 2 duo, read this neat article.

http://software.intel.com/en-us/articles/software-techniques-for-shared-cache-multi-core-systems/

25,733

Author by

goldenmean

struct descriptionOf { int elligent_Developer; short list_of_proven_skills; long list_of_ambitions; long long int erest_in_various_technologies; double effort_in_achieving_skills_and_ambitions; float ing_innovator; char of_a_hands_on_doer; }goldenmean; Software Developer with work experience in areas of Video/Image processing and codecs,DSP and multimedia Systems,on DSP/Multicore processor architecures, for devices and applications in Consumer Electronics, Communications industry. Programming languages: C,C++,Matlab/Octave,Python,DSP or RISC assembly languages.

Updated on July 09, 2022

Comments

goldenmean almost 2 years
I have a few questions regarding Cache memories used in Multicore CPUs or Multiprocessor systems. (Although not directly related to programming, it has many repercussions while one writes software for multicore processors/multiprocessors systems, hence asking here!)
1. In a multiprocessor system or a multicore processor (Intel Quad Core, Core two Duo etc..) does each cpu core/processor have its own cache memory (data and program cache)?
2. Can one processor/core access each other's cache memory, because if they are allowed to access each other's cache, then I believe there might be lesser cache misses, in the scenario that if that particular processors cache does not have some data but some other second processors' cache might have it thus avoiding a read from memory into cache of first processor? Is this assumption valid and true?
3. Will there be any problems in allowing any processor to access other processor's cache memory?
Peter Cordes over 6 years

The "variety" of solutions is really not that varied. Pretty much everything uses some minor variation on the MESI protocol. Many caches can have a copy of a Shared line, but only one cache in the coherency domain (i.e. the system) can have a line in Modified or Exclusive state. So to write a line, a CPU does a Read-For-Ownership to make sure no other cache in the system has a copy of that line. Related: how atomic read-modify-write works (lock inc [mem]): stackoverflow.com/questions/39393850/…
Peter Cordes over 6 years

Update 8 years later: these days it's typical for CPUs to have private per-core L1 and L2 caches, with a shared L3. (Intel since Nehalem.) The L3 can back-stop coherency traffic so it doesn't have to go all the way to memory.
HongboZhu about 6 years

Also NUMA (en.wikipedia.org/wiki/Non-uniform_memory_access) is an interesting and relevant topic.
Peter Cordes over 5 years

This answer was in need of a major overhaul for various reasons: 3-level caches on everything after Nehalem, and various slight technical mis-statements. I hope I managed to improve things without making it unreadable for people without cpu-architecture experience.