How to catch L3-cache hits and misses with the perf tool in Linux


Solution 1

It is strange that LLC (Last Level Cache) is aliased to "L2" when the hardware has an L3 cache. I don't know the internals of perf yet, though, and these settings may be generic.

I think the only solution you have is to use a "raw hardware event" (see the end of the "perf list" output, the line starting with "rNNN"). That lets you encode the hardware register settings for the event directly.

The perf user guide and tutorial only say: "To measure an actual PMU as provided by the HW vendor documentation, pass the hexadecimal parameter code". I don't know what the syntax is on Intel, or whether there are different implementations of the performance monitor on this architecture. You could start here:

http://code.google.com/p/kernel/wiki/PerfUserGuide#Hardware_events

Solution 2

I have had more success using raw event counters, looking directly at the Intel Software Developer Manual for detailed definitions.

http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.html

From section: 18.2.1.2 Pre-defined Architectural Performance Events

r412e "LLC Misses" is likely the one you want

perf stat -e r412e <command>

(Note that for me, this gives the same number as using -e cache-misses.)
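The raw code can be derived from the manual's table rather than memorized. Below is a minimal sketch, assuming the usual x86 perf raw format in which the low byte is the event select and the next byte is the unit mask (umask). The helper name raw_event is made up for illustration. The values are Intel's architectural "LLC Misses" (event 0x2E, umask 0x41) and "LLC Reference" (event 0x2E, umask 0x4F):

```python
def raw_event(event_select: int, umask: int) -> str:
    """Build a perf raw event string (rUUEE) from an event select and a umask.

    Hypothetical helper for illustration; it only covers the two basic
    8-bit fields and ignores extra bits such as cmask, edge or inv.
    """
    if not (0 <= event_select <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event_select and umask must each fit in one byte")
    # umask occupies bits 8-15, event select occupies bits 0-7
    return f"r{(umask << 8) | event_select:04x}"


LLC_MISSES = raw_event(0x2E, 0x41)      # architectural "LLC Misses"
LLC_REFERENCES = raw_event(0x2E, 0x4F)  # architectural "LLC Reference"

print(LLC_MISSES, LLC_REFERENCES)  # prints "r412e r4f2e"
```

Both strings can then be passed to perf stat -e, separated by a comma, to count misses and references in the same run.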

Solution 3

To get system-wide L3 cache miss rate, just do:

$ sudo perf stat -a -e LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses,LLC-prefetch-misses sleep 5


Performance counter stats for 'system wide':

    24,477,266,369      LLC-loads                                                     (22.65%)
     1,409,470,007      LLC-load-misses           #    5.76% of all LL-cache hits     (29.79%)
        88,584,705      LLC-stores                                                    (30.32%)
        10,545,277      LLC-store-misses                                              (30.03%)
       150,785,745      LLC-prefetch-misses                                           (34.71%)

      13.773144159 seconds time elapsed

This prints both the misses and the total references. The ratio of misses to references is the L3 cache miss rate.
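The ratio can be worked out from the output above with a small script. This is only a sketch: the function name parse_counts and the idea of pasting the output into a string are assumptions for illustration. Note that the counts are scaled estimates, because the events were multiplexed (the percentages in parentheses show the share of time each event was actually being counted).

```python
import re

# Counter lines copied from the perf stat output above.
SAMPLE = """\
    24,477,266,369      LLC-loads                                                     (22.65%)
     1,409,470,007      LLC-load-misses           #    5.76% of all LL-cache hits     (29.79%)
        88,584,705      LLC-stores                                                    (30.32%)
        10,545,277      LLC-store-misses                                              (30.03%)
       150,785,745      LLC-prefetch-misses                                           (34.71%)
"""


def parse_counts(text: str) -> dict:
    """Map event name -> count for lines shaped like '<count>  <event> ...'."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\d,]+)\s+([\w-]+)", line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


counts = parse_counts(SAMPLE)

# Load miss rate: the figure perf itself prints next to LLC-load-misses.
load_miss_rate = counts["LLC-load-misses"] / counts["LLC-loads"]

# One possible combined rate that also includes stores (prefetches left out).
combined_miss_rate = (counts["LLC-load-misses"] + counts["LLC-store-misses"]) / (
    counts["LLC-loads"] + counts["LLC-stores"]
)

print(f"load miss rate:     {load_miss_rate:.2%}")
print(f"combined miss rate: {combined_miss_rate:.2%}")
```

The load miss rate should agree with the 5.76% that perf annotates on the LLC-load-misses line.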

See complete event list on wiki: https://perf.wiki.kernel.org/index.php/Tutorial#Events

Author by

Admin

Updated on June 05, 2022

Comments

  • Admin almost 2 years

    Is there any way to catch L3-cache hits and misses with the perf tool in Linux? According to the output of perf list cache, the L1 and LLC caches are supported. The perf_evsel__hw_cache array in perf's source code is defined as follows:

    const char *perf_evsel__hw_cache[PERF_COUNT_HW_CACHE_MAX]
                                    [PERF_EVSEL__MAX_ALIASES] = {
     { "L1-dcache", "l1-d",         "l1d",          "L1-data",              },
     { "L1-icache", "l1-i",         "l1i",          "L1-instruction",       },
     { "LLC",       "L2",                                                   },
     { "dTLB",      "d-tlb",        "Data-TLB",                             },
     { "iTLB",      "i-tlb",        "Instruction-TLB",                      },
     { "branch",    "branches",     "bpu",          "btb",          "bpc",  },
     { "node",                                                              },
    };
    

    LLC is an alias for the L2 cache. My question is how to catch L3-cache hits and misses with the perf tool in Linux. Thanks in advance!

  • osgx almost 10 years
    The page bnikolic.co.uk/blog/hpc-prof-events.html also has advice on finding and using raw perf events with the help of the libpfm4 (perfmon2) utilities showevtinfo and check_events.
  • Zheng Shao over 6 years
    To get system-wide L3 cache miss rate, just do: sudo perf stat -a -e LLC-loads -e LLC-load-misses -e LLC-stores -e LLC-store-misses -e LLC-prefetch-misses which prints out both misses and total references. The ratio is the L3 cache miss rate.
  • blaze9 over 3 years
    May I ask what LLC-prefetch-misses are and how they should be used in calculating the L3 cache miss rate? So far, my calculation is simply (LLC-load-misses+LLC-store-misses) / (LLC-loads+LLC-stores).