Why does perf stat show "stalled-cycles-backend" as <not supported>?


Solution 1

It looks like perf has not been updated to understand all the performance monitoring events that Ivy Bridge supports. Fortunately, there is a generic, albeit cumbersome, interface that gives you access to the full list of performance monitoring events. I didn't see stalled-cycles-backend in the list when I gave it a quick look, but maybe I missed it, or maybe it has been broken down into the different events that can stall the backend.

We start with

perf list --help

...shows the following NOTE

    1. Intel(R) 64 and IA-32 Architectures Software Developer's Manual
       Volume 3B: System Programming Guide
       http://www.intel.com/Assets/PDF/manual/253669.pdf

...armed with that URL you end up in

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf

...you want section 19.3

19.3 PERFORMANCE MONITORING EVENTS FOR 3RD GENERATION INTEL® CORE™ PROCESSORS

3rd generation Intel® Core™ processors and Intel Xeon processor E3-1200 v2 product family are based on Intel microarchitecture code name Ivy Bridge. They support architectural performance-monitoring events listed in Table 19-1. Non-architectural performance-monitoring events in the processor core are listed in Table 19-5. The events in Table 19-5 apply to processors with CPUID signature of DisplayFamily_DisplayModel encoding with the following values: 06_3AH.

...so for architectural events you need Table 19-1

19.1 ARCHITECTURAL PERFORMANCE-MONITORING EVENTS

Architectural performance events are introduced in Intel Core Solo and Intel Core Duo processors. They are also supported on processors based on Intel Core microarchitecture. Table 19-1 lists pre-defined architectural performance events that can be configured using general-purpose performance counters and associated event-select registers.

Table 19-1. Architectural Performance Events

[image: Table 19-1, in two parts]

... now comes the tricky part: you take the UMask Value as the upper 2 hex digits and the Event Num as the lower 2 hex digits of a 4-hex-digit raw event code to be given to perf stat.
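The digit shuffling can be sketched as a small shell helper. `raw_event` is a hypothetical name of mine, not something perf ships; the example values are the LLC Misses row of Table 19-1 (Event Num 2EH, UMask Value 41H).

```shell
# raw_event EVENT_NUM UMASK
# Prints a perf raw event descriptor: umask in the upper two hex digits,
# event number in the lower two. Hypothetical helper, not part of perf.
raw_event() {
    printf 'r%02x%02x\n' "$(( $2 ))" "$(( $1 ))"
}

# LLC Misses: Event Num 2EH, UMask Value 41H
raw_event 0x2e 0x41    # prints r412e
```

The result can then be passed straight to perf, e.g. `perf stat -e "$(raw_event 0x2e 0x41)" date`.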

perf stat --help
   -e, --event=
       Select the PMU event. Selection can be a symbolic event name (use
       perf list to list all events) or a raw PMU event (eventsel+umask) in
       the form of rNNN where NNN is a hexadecimal event descriptor.

... it says NNN, but you can give it NNNN. Let's verify that this works by asking perf stat for cache-misses both as a symbolic event name and as a hex number from Table 19-1. We'll invoke the date command for simplicity.

$ perf stat -e r412e -e cache-misses date

Fri Mar 28 09:28:52 CDT 2014

Performance counter stats for 'date':

          2292 r412e                                                       
          2292 cache-misses                                                

   0.003322663 seconds time elapsed

$ 

As you can see, both reported the same number, so far so good. Now we go to Table 19-5 for the non-architectural events, of which there are too many to list here, but I'll list a few:

[image: excerpt from Table 19-5]

Solution 2

perf (or its in-kernel part) has not been updated to support your CPU, so it is unable to map the generic event name "stalled-cycles-backend" to an actual hardware event.

In that case it can be easier to look up the CPU-specific event names, e.g. for Intel CPUs in Intel's optimization manual http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf (which groups events by type and explains how to use them to measure various parts). I don't know of a similar document for AMD.

To use event names with perf without manually converting them into raw event ids (as amdn describes in his answer), you can use the converter programs showevtinfo and check_events from perfmon2 (libpfm4; examples folder), as explained in the article "How to monitor the full range of CPU performance events" by Bojan Nikolic http://www.bnikolic.co.uk/blog/hpc-prof-events.html. perfmon2 knows both AMD and Intel CPUs and is written in C/C++.
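As a rough sketch of that conversion, the helper below turns check_events output into a raw event for perf. It assumes check_events prints a line of the form `Codes          : 0x...`; that layout is my assumption and may differ between libpfm4 versions. `codes_to_raw` is a hypothetical name, and the sample input is made up rather than taken from a real run.

```shell
# codes_to_raw: read check_events output on stdin, print a perf raw event.
# Assumes a line such as "Codes          : 0x5301b1"; this layout is an
# assumption about libpfm4's example tool and may differ between versions.
codes_to_raw() {
    sed -n 's/^Codes[[:space:]]*:[[:space:]]*0x\([0-9a-fA-F]*\).*/r\1/p' | head -n 1
}

# Demonstration with made-up sample text instead of a real check_events run
printf 'Requested Event: SOME_EVENT_NAME\nCodes          : 0x5301b1\n' | codes_to_raw
```

With a built check_events binary this would be used as `perf stat -e "$(./check_events SOME_EVENT_NAME | codes_to_raw)" date`, where SOME_EVENT_NAME is a placeholder for an event name that showevtinfo lists for your CPU.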

For Intel CPUs the easiest way is to use ocperf, a wrapper around perf from "pmu-tools", Andi Kleen's open source Python project at Intel. It is hosted on GitHub at https://github.com/andikleen/pmu-tools, was introduced on the mailing list (https://lwn.net/Articles/556983/) and is described in Andi's blog: http://halobates.de/blog/p/245

ocperf understands all Intel event names from Intel's optimization manual.

ocperf also supports every hardware event on older Linux kernels. It has its own database, in TSV or JSON format, of all hardware events and their codes at https://download.01.org/perfmon/ (pmu-tools includes an auto-downloader), and the database is constantly updated by Intel employees. The format of the database is documented in the readme: https://download.01.org/perfmon/readme.txt

For Sandy Bridge, Ivy Bridge or Haswell CPUs and kernels 3.10 or newer, you can also use the toplev.py script from "pmu-tools" to investigate performance. Its author, Andi Kleen, describes it in "pmu-tools, part II: toplev" (http://halobates.de/blog/p/262). It is based on the "TopDown" method from Ahmad Yasin: "How to Tune Applications Using a Top-Down Characterization of Microarchitectural Issues" and "Top Down Analysis. Never lost with performance counters".

Solution 3

Just found Re: perf, x86: Add parts of the remaining haswell PMU functionality:

> AFAICS backend stall cycles are documented to work on Ivy Bridge.

I'm not aware of any documentation that presents these events
as accurate frontend/backend stalls without using the full
TopDown methology (Optimization manual B.3.2)

So IIUC stalled-cycles-backend counters are too unreliable on Ivy Bridge, and that's why the kernel devs have decided to not support them.

And sure enough, Linux' perf_event_intel.c supports PERF_COUNT_HW_STALLED_CYCLES_BACKEND for Nehalem, Xeon E7 and SandyBridge, but not for IvyBridge. PERF_COUNT_HW_STALLED_CYCLES_FRONTEND is supported for IvyBridge, though.

So I guess there won't be a way to get this counter on my current CPU - either switch CPUs or use the full top-down methodology mentioned in the mail (and described here and here)
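To see quickly which generic events a given machine does not map, the `<not supported>` lines can be filtered out of the perf stat output. This is only a sketch; `unsupported_events` is a hypothetical helper name, and the sample lines are copied from the perf stat output in the question.

```shell
# unsupported_events: read `perf stat` output on stdin and print the name of
# every event reported as "<not supported>". Hypothetical helper.
unsupported_events() {
    awk '/<not supported>/ { sub(/.*<not supported>[ \t]*/, ""); print $1 }'
}

# Sample lines copied from the perf stat output in the question
printf '%s\n' \
    '           2096636 stalled-cycles-frontend   #   59.64% frontend cycles idle' \
    '   <not supported> stalled-cycles-backend' |
    unsupported_events    # prints stalled-cycles-backend
```

perf stat writes its counter summary to stderr, so on a live system the call would look like `perf stat ls 2>&1 >/dev/null | unsupported_events`.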

Author by

oliver

Updated on June 15, 2022

Comments

  • oliver
    oliver almost 2 years

    Running perf stat ls shows this:

    Performance counter stats for 'ls':
    
              1.388670 task-clock                #    0.067 CPUs utilized          
                     2 context-switches          #    0.001 M/sec                  
                     0 cpu-migrations            #    0.000 K/sec                  
                   266 page-faults               #    0.192 M/sec                  
               3515391 cycles                    #    2.531 GHz                    
               2096636 stalled-cycles-frontend   #   59.64% frontend cycles idle   
       <not supported> stalled-cycles-backend  
               2927468 instructions              #    0.83  insns per cycle        
                                                 #    0.72  stalled cycles per insn
                615636 branches                  #  443.328 M/sec                  
                 22172 branch-misses             #    3.60% of all branches        
    
           0.020657192 seconds time elapsed
    

    Why is stalled-cycles-backend shown as "not supported"? What kind of CPU, hardware, kernel or user-space software do I need to see this value?

    Currently tried this on RHEL with Linux 3.12 for x86_64, with matching perf version, on different Intel Core i5 and i7 systems (Ivy Bridge type). None of them support stalled-cycles-backend.

    Some more info:

    $ perf list | grep stalled
      stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
      stalled-cycles-frontend OR cpu/stalled-cycles-frontend/ [Kernel PMU event]
    
    $ ls /sys/devices/cpu/events/
    branch-instructions  bus-cycles    cache-references  instructions  mem-stores
    branch-misses        cache-misses  cpu-cycles        mem-loads     stalled-cycles-frontend
    
    $ cat /sys/bus/event_source/devices/cpu/events/stalled-cycles-frontend
    event=0x0e,umask=0x01,inv,cmask=0x01
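For reference, a sysfs spec like the one above can be folded into a raw event code by hand. The sketch below assumes the Intel IA32_PERFEVTSELx bit layout (event in bits 0-7, umask in bits 8-15, edge at bit 18, inv at bit 23, cmask in bits 24-31) and ignores any other field; `sysfs_to_raw` is a hypothetical helper name.

```shell
# sysfs_to_raw SPEC
# Converts an Intel sysfs event spec such as
# "event=0x0e,umask=0x01,inv,cmask=0x01" into a perf raw event.
# Bit positions follow the Intel IA32_PERFEVTSELx layout; fields other than
# event, umask, cmask, edge and inv are ignored in this sketch.
# The body runs in a subshell so the IFS change does not leak to the caller.
sysfs_to_raw() (
    config=0
    IFS=,
    for field in $1; do
        case $field in
            event=*) config=$(( config | ${field#event=} )) ;;
            umask=*) config=$(( config | (${field#umask=} << 8) )) ;;
            cmask=*) config=$(( config | (${field#cmask=} << 24) )) ;;
            edge)    config=$(( config | (1 << 18) )) ;;
            inv)     config=$(( config | (1 << 23) )) ;;
        esac
    done
    printf 'r%x\n' "$config"
)

sysfs_to_raw 'event=0x0e,umask=0x01,inv,cmask=0x01'    # prints r180010e
```

perf also accepts the spec directly in its PMU syntax, e.g. `perf stat -e cpu/event=0x0e,umask=0x01,inv,cmask=0x01/ ls`, so the manual conversion is mainly useful for comparing against raw codes quoted elsewhere.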
    

    Edit: just tried this on an AMD Phenom II X6 1045T CPU, under Ubuntu 12.04 with Linux 3.2 (32bit) - and here it does show values for both stalled-cycles-frontend and stalled-cycles-backend.