Impact of the L3 cache on performance - worth a dual-processor system?


Solution 1

As always with caching questions, the answer is "it entirely depends on your workload". The L3 cache is only of any use if your running processes spend a significant amount of time accessing memory, exhibit noticeable locality of reference in their memory accesses, and are not already well served by the smaller per-core L1/L2 caches.

Running a high number of processes or threads increases the odds of thrashing the shared cache, which diminishes the performance gains that might otherwise have been achieved. This is also why cache size is increased along with core count: the more memory-competing threads you have running, the larger your shared cache likely needs to be in order to be useful at all.
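
To make the thrashing argument concrete, here is a minimal Python sketch. It assumes a fully associative cache with pure LRU replacement and processes that loop cyclically over disjoint working sets, with all sizes measured in cache lines. Real L3 caches are set-associative and use other replacement policies, so treat this as an illustration of the trend, not a model of the E5-2687W. `LRUCache` and `shared_cache_hit_rate` are hypothetical names introduced here.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with LRU replacement, counting hits and misses per cache line."""

    def __init__(self, capacity_lines: int) -> None:
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # keys are line addresses, ordered least -> most recently used
        self.hits = 0
        self.misses = 0

    def access(self, line: int) -> None:
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.lines[line] = None
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used line

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def shared_cache_hit_rate(cache_lines: int, working_sets: list[int], rounds: int = 20) -> float:
    """Interleave processes that each loop over their own working set (sizes in cache lines)
    through one shared cache, and return the overall hit rate."""
    cache = LRUCache(cache_lines)
    for _ in range(rounds):
        for offset in range(max(working_sets)):
            for pid, size in enumerate(working_sets):
                if offset < size:
                    # Each process gets a disjoint address range, so no data is shared between them.
                    cache.access(pid * 1_000_000_000 + offset)
    return cache.hit_rate()


if __name__ == "__main__":
    print(shared_cache_hit_rate(1500, [1000]))        # one process alone: 0.95
    print(shared_cache_hit_rate(1500, [1000, 1000]))  # two processes thrash the cache: 0.0
    print(shared_cache_hit_rate(3000, [1000, 1000]))  # doubled cache: 0.95
```

A 1,000-line working set fits comfortably in a 1,500-line cache on its own, two such processes sharing that cache thrash it completely, and doubling the cache restores the hit rate. Giving each process its own 1,500-line cache (one per socket, as the question suggests) is equivalent to the first call in this model. The cyclic access pattern is the worst case for LRU, so real hardware and real workloads usually degrade less abruptly.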

There is an older article from Tom's Hardware comparing two old Pentium 4 chips, with and without L3 cache, across a number of rendering and graphics workloads. The numbers are rubbish, as is the whole benchmark, but the article contains a nice explanation of caching architecture in general and L3 caching in particular.

The bottom line: you likely would not notice the difference, but if you need the exact numbers, you would have to purchase both CPUs and run your workload on both of them to compare runtimes.
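
If you do go the measure-it-yourself route, one simple approach is to run the same workload several times on each machine and compare median wall-clock times. The sketch below assumes the workload can be started as a single command line; `median_runtime` is a hypothetical helper, and the untimed warm-up run is there so cold-start effects do not skew the first measurement.

```python
import statistics
import subprocess
import sys
import time


def median_runtime(cmd: list[str], runs: int = 5, warmup: int = 1) -> float:
    """Run `cmd` several times and return the median wall-clock time in seconds."""
    for _ in range(warmup):
        # Untimed run(s) so cold-start effects (e.g. loading files from disk) do not skew the result.
        subprocess.run(cmd, check=True)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises CalledProcessError if the workload fails
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Smoke test on an empty Python process; replace with your own workload's command line.
    print(f"median: {median_runtime([sys.executable, '-c', 'pass']):.3f} s")
```

On each system you would call something like `median_runtime(["my_workload.exe", "--input", "data.bin"])`, where the command and file names are placeholders. The median is used rather than the mean because it is less sensitive to the occasional slow run caused by background activity.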

Solution 2

People saying "a mere 20 MB increase in L3 cache" simply do not know what they are talking about. A significant increase in cache size for a given architecture is likely to cause a noticeable boost in performance, even with an average load. This is even more true when you consider the Turbo Boost technology implemented in Sandy Bridge and Ivy Bridge processors.

I have had the chance to experience this personally at several different stages of the x86/x86_64 architecture: Sempron vs. Athlon, Celeron vs. Pentium 4, Pentium 4 vs. Athlon, Pentium 4-M vs. Pentium M, Pentium 4 vs. Xeon, i7 vs. Xeon E5. I saw it whenever the cache was bigger (usually doubled or almost doubled).

Whether the cost of doubling the cache is affordable is up to you. Xeons are, however, better for stability, since they support ECC memory and similar technologies, which are obviously a must-have in certain applications (such as 3D simulations for aluminum die-casting, which is my case).

Solution 3

From your description of what you do and how your current system handles it, I can only wonder why you want to replace it. At best, the extra L3 cache would give a trifling boost at great expense, and in the use case you describe you can't expect to see any difference resulting from a mere 20 MB increase in L3 cache.

Author: Dan Nissenbaum

Freelance C++ and PHP developer with a PhD in physics, in Brattleboro, VT, USA. Generally with bare feet. Best to all!

Updated on September 18, 2022

Comments

  • Dan Nissenbaum
    Dan Nissenbaum almost 2 years

    I will be purchasing a new high-end system, and I would like to have a better sense of whether a dual-processor Xeon system (I am looking at the new, high-end Xeon E5-2687W) might, realistically, provide a noticeable performance improvement due to the doubling of the L3 cache (20 MB per CPU).

    (This is in addition to the occasional added advantage due to the doubling of cores and RAM.)

    My usage scenario is, roughly, that I have many background applications running at any time: 3 or 4 data compression/backup applications, a low-impact web server, one or two virtual machines at any given time (usually fairly idle), and perhaps 20 utility programs that each use a noticeable (but small) portion of the CPU. In total, when I am not actively using the computer, about 25% of the total CPU power is utilized on my current i7-970 6-core (12-thread) system.

    When I am doing routine work, the CPU utilization often exceeds 50%, and occasionally hits 75%-80%.

    The Xeon E5-2687W is not only a second-generation i7 (so it should improve performance for that reason), but it also has 8 cores (16 threads) rather than 6. For this reason, I expect to reach the 75% CPU range even less frequently. Nonetheless, the ability to double the cores and the RAM is a consideration.

    However, in the end, I believe this decision comes down to whether the doubling of the L3 cache will provide a noticeable improvement. There are many benchmarks, and a lot of discussion, regarding CPU power. However, I find very little discussion of L3 cache utilization, and how increases in the L3 cache (such as doubling it with dual processors) affect performance.

    For example: if only two processes are running, but each benefits from a large L3 cache (as might be the case for background processes that frequently scan the file system), perhaps overall system performance would noticeably improve with dual CPUs, even if only a single core is active on each CPU, because each process would effectively have double the L3 cache.

    I am hoping that someone has a sense of the benefits of increasing (or doubling) the L3 cache size.

    Note: the CPU I am considering (the Xeon E5-2687W) has 20 MB of L3 cache, so a dual-CPU system would have 40 MB of L3 cache in total.

    • ewwhite
      ewwhite about 12 years
      Which operating systems will be in use?
    • Dan Nissenbaum
      Dan Nissenbaum about 12 years
      Windows 7 Professional. The VMs (fairly low-impact) are not of major importance; they will run various OSes.
    • Admin
      Admin about 12 years
      Performance gains from a larger cache tend to grow only logarithmically with cache size, so doubling it buys a bit more performance. Apart from that, nothing you mentioned in your usage scenario seems to be memory intensive.
  • Joe Yahchouchi
    Joe Yahchouchi over 7 years
    Why do you consider the numbers of that benchmark to be rubbish? That benchmark seems to be the only thing on the internet addressing the cache performance with some numbers.
  • syneticon-dj
    syneticon-dj over 7 years
      @JoeYahchouchi to be honest, I do not remember. But looking at the article today, I suppose it was "rubbish" in the sense of "the benchmark numbers are not applicable to your environment, and the third-level cache gain observed there cannot be extrapolated to your specific use case".