perf fails to work on really large process
I experienced something similar, but less severe.
In my case, I would get the CONFIG_PERF_EVENTS
error maybe 10% of the time. The rest of the time perf record
would complete without error, but it would fail to record any samples (looking at the output file with perf report
would state "file has no samples", and perf report -D ... | grep -c RECORD_SAMPLE
would confirm a count of zero).
Following the advice in this stackoverflow post, I added the -e cpu-clock
parameter to perf record
. This did not fix the CONFIG_PERF_EVENTS error, and perf record
would still take 2x-4x as long as I specified via sleep
- but at least it worked. Maybe it will help you, too.
Related videos on Youtube
Shlomi
Updated on September 18, 2022Comments
-
Shlomi over 1 year
I use perf quite often to profile my applications. Recently I got a few new machine with ~750G RAM each. I am trying to profile a process that uses about 400G of ram on one of them. This works fine on all the new machines but one. They were all installed the same way, running ubuntu.
I tried both
perf
as installed throughapt-get
and compiled my own from git.When running
./perf top -p 14182
I some time manage to get results (after a long wait, and very rarely) but most other times I am getting:The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles:pp). /bin/dmesg may provide additional information. No CONFIG_PERF_EVENTS=y kernel support configured?
For the same pid(!).
I get similar problems when trying to record, for instance:
time ./perf record -F 111 -a -g -p 14182 -- sleep 3 Warning: PID/TID switch overriding SYSTEMsleep: Terminated Command exited with non-zero status 255 2.66user 91.58system 1:36.68elapsed 97%CPU (0avgtext+0avgdata 4896maxresident)k 0inputs+5248outputs (0major+4847minor)pagefaults 0swaps
This command takes way longer than 3 seconds, I cannot ctrl+c it, and killing it requires
-9
. It emits some data intoperf.data
but when I try to view it, i getfile has no samples
.Everything works just fine on other (smaller) processes on this machine, and as I mentioned earlier this works great on the rest of the machines, for processes weighting about 200G. When I run the previous command on the other machines, it also uses much less cpu (33% rather than 97%).
I am not sure where else to look, and google has not been very helpful for me. Any ideas or directions?
Edit
Seeing how system was high, I ran this with
strace -c
, this is what I get:Another machines with a process weighting ~200G:
_ strace -c perf record -F 111 -a -g -p 27879 -- sleep 3 Warning: PID/TID switch overriding SYSTEM[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.684 MB perf.data (~29876 samples) ] % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 91.43 2.548664 109 23481 read 2.47 0.068712 11 6457 15 stat 2.42 0.067392 9 7303 write 0.68 0.018960 12 1646 getdents 0.68 0.018841 22 847 1 mmap
This machine with a process weighting ~400G:
_ strace -c perf record -F 111 -a -g -p 14182 -- sleep 3 /tmpvaknins9@snlp-brevis-3 Warning: PID/TID switch overriding SYSTEMsleep: Terminated sleep: Terminated % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 99.70 131.243199 472 278178 read 0.16 0.207757 8 25895 write 0.04 0.048962 8 6398 5 stat 0.02 0.031247 14 2217 mmap 0.02 0.026385 9 3091 close
It seems lots of time is being spent on reading
/proc/tid/maps|status
.. Perhaps my process is simply too large? Any clues how I could still get it to record this process?Thanks!