What do the perf cache events mean?
Solution 1
You seem to think that the cache-misses event is the sum of all other kinds of cache misses (L1-dcache-load-misses, and so on). That is not true. The cache-misses event represents the number of memory accesses that could not be served by any of the caches.
I admit that perf's documentation is not the best around.
However, one can learn quite a lot by reading the documentation of the perf_event_open() function, assuming you already have a good knowledge of how a CPU and a performance monitoring unit work (this is clearly not a computer architecture course):
http://web.eece.maine.edu/~vweaver/projects/perf_events/perf_event_open.html
For example, by reading it you can see that the cache-misses event shown by perf list corresponds to PERF_COUNT_HW_CACHE_MISSES, which that documentation describes as usually indicating last-level cache misses.
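To make the distinction concrete, here is a minimal sketch of how the two kinds of events are selected in perf_event_open(). The numeric constants are copied from linux/perf_event.h, and the packing formula for cache events is the one given in the perf_event_open() documentation. The SELECTORS table and the hw_cache_config() helper are hypothetical names used only for illustration, and the name-to-selector pairing reflects my understanding of how perf names these events.

```python
# Constants copied from <linux/perf_event.h>.
PERF_TYPE_HARDWARE = 0   # generic events, e.g. cache-misses
PERF_TYPE_HW_CACHE = 3   # "[Hardware cache event]" entries in perf list

PERF_COUNT_HW_CACHE_REFERENCES = 2
PERF_COUNT_HW_CACHE_MISSES = 3

PERF_COUNT_HW_CACHE_L1D = 0
PERF_COUNT_HW_CACHE_L1I = 1
PERF_COUNT_HW_CACHE_LL = 2

PERF_COUNT_HW_CACHE_OP_READ = 0
PERF_COUNT_HW_CACHE_RESULT_MISS = 1


def hw_cache_config(cache_id, op_id, result_id):
    """Pack a PERF_TYPE_HW_CACHE config: id | (op << 8) | (result << 16)."""
    return cache_id | (op_id << 8) | (result_id << 16)


# Hypothetical lookup table (not part of perf): name -> (attr.type, attr.config).
SELECTORS = {
    "cache-references": (PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES),
    "cache-misses": (PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES),
    "L1-dcache-load-misses": (
        PERF_TYPE_HW_CACHE,
        hw_cache_config(PERF_COUNT_HW_CACHE_L1D,
                        PERF_COUNT_HW_CACHE_OP_READ,
                        PERF_COUNT_HW_CACHE_RESULT_MISS),
    ),
    "L1-icache-load-misses": (
        PERF_TYPE_HW_CACHE,
        hw_cache_config(PERF_COUNT_HW_CACHE_L1I,
                        PERF_COUNT_HW_CACHE_OP_READ,
                        PERF_COUNT_HW_CACHE_RESULT_MISS),
    ),
    "LLC-load-misses": (
        PERF_TYPE_HW_CACHE,
        hw_cache_config(PERF_COUNT_HW_CACHE_LL,
                        PERF_COUNT_HW_CACHE_OP_READ,
                        PERF_COUNT_HW_CACHE_RESULT_MISS),
    ),
}

if __name__ == "__main__":
    for name, (type_id, config) in SELECTORS.items():
        print(f"{name:24} type={type_id} config={config:#x}")
```

Each name selects its own (type, config) pair, so cache-misses is requested as a separate counter rather than being computed from the per-level miss events.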
Solution 2
Some answers:
- L1 is the Level-1 cache, the smallest and fastest one. LLC, on the other hand, refers to the last level of the cache hierarchy, and thus denotes the largest but slowest cache.
- i vs. d distinguishes the instruction cache from the data cache. Only L1 is split in this way; the other caches are shared between data and instructions.
- TLB refers to the translation lookaside buffer, a cache used when mapping virtual addresses to physical ones.
- Different TLB counters are kept depending on whether the address referred to an instruction or to data.
- For all data accesses, different counters are kept depending on whether the given memory location was read, written, or prefetched (i.e. retrieved for reading at some later time).
- The number of misses indicates how often a given item of data was accessed but was not present in the cache.
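The naming scheme described in this list can be sketched as a small decoder. This is only an illustration of the convention, not something perf provides: describe_event(), UNITS, and SUFFIXES are hypothetical names, and the branch-* and node-* events are deliberately left out because the list above does not cover them.

```python
# Hypothetical helper tables; the descriptions follow the list above.
UNITS = {
    "L1-dcache": "level-1 data cache",
    "L1-icache": "level-1 instruction cache",
    "LLC": "last-level cache",
    "dTLB": "data TLB",
    "iTLB": "instruction TLB",
}

# Map each name suffix to (operation, is_miss).
SUFFIXES = {}
for op, plural in (("load", "loads"), ("store", "stores"),
                   ("prefetch", "prefetches")):
    SUFFIXES[plural] = (op, False)          # e.g. "loads": all accesses
    SUFFIXES[op + "-misses"] = (op, True)   # e.g. "load-misses": misses only


def describe_event(name):
    """Split a perf cache event name into (unit, operation, is_miss)."""
    for prefix, unit in UNITS.items():
        if name.startswith(prefix + "-"):
            suffix = name[len(prefix) + 1:]
            if suffix in SUFFIXES:
                op, is_miss = SUFFIXES[suffix]
                return unit, op, is_miss
    raise ValueError(f"not a recognised cache event name: {name!r}")


if __name__ == "__main__":
    for event in ("L1-icache-load-misses", "LLC-stores", "dTLB-prefetch-misses"):
        print(event, "->", describe_event(event))
```

For instance, describe_event("L1-icache-load-misses") yields the level-1 instruction cache, a load operation, and the miss flag set.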
Solution 3
According to the perf tutorial, Performance Monitoring Unit (PMU) events, or hardware events, are events that map directly to CPU-specific events defined by the CPU vendor. Hardware cache events, by contrast, are monikers provided by perf that may be mapped to actual events provided by the CPU. To list perf's cache events, run perf list cache in a Linux terminal.
Comments
-
Manuel Selva over 4 years
I am trying to figure out why a modified C program runs faster than its unmodified counterpart (I am adding a few lines of code that perform some additional work). I suspect "cache effects" (instruction cache) to be the main explanation. So I turned to the perf (https://perf.wiki.kernel.org/index.php/Main_Page) profiling tool, but unfortunately I am not able to understand the meaning of its output regarding cache misses.
Several cache-related events are provided:
cache-references [Hardware event]
cache-misses [Hardware event]
L1-dcache-loads [Hardware cache event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-dcache-store-misses [Hardware cache event]
L1-dcache-prefetches [Hardware cache event]
L1-dcache-prefetch-misses [Hardware cache event]
L1-icache-loads [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-prefetches [Hardware cache event]
L1-icache-prefetch-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-stores [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-prefetches [Hardware cache event]
LLC-prefetch-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-prefetches [Hardware cache event]
dTLB-prefetch-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
iTLB-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
branch-load-misses [Hardware cache event]
node-loads [Hardware cache event]
node-load-misses [Hardware cache event]
node-stores [Hardware cache event]
node-store-misses [Hardware cache event]
node-prefetches [Hardware cache event]
node-prefetch-misses [Hardware cache event]
Where can I find an explanation of these fields? The cache-misses event is always smaller than the other events. What does this event measure?
How should I interpret the 26,760 L1-icache-load-misses for ls versus the 5,708 cache-misses in the following example?
perf stat -e L1-icache-load-misses ls
caches caches~ out
Performance counter stats for 'ls':
26,760 L1-icache-load-misses
0.002816690 seconds time elapsed

perf stat -e cache-misses ls
caches caches~ out
Performance counter stats for 'ls':
5,708 cache-misses
0.002822122 seconds time elapsed