Measure disk IO latencies of a running process

linux strace perf-event dtrace


Solution 1

This is actually complicated, but here are some hints:

Learn about SystemTap, the Linux analog of DTrace. I think it may even ship with an example script for a similar task.
Learn blktrace. In theory, you may be able to parse its output. Note that this measures device latency (service time) rather than the response time the program sees on read().
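As a rough illustration of parsing blktrace output, here is a minimal sketch that assumes blkparse's default text format (device, CPU, sequence, time, PID, action, RWBS, sector + blocks, [process]). It pairs each D event (request issued to the driver) with the C event (completion) for the same device and start sector, and prints the service time in microseconds. This is device latency, not what read() sees. `d2c_latencies` is a hypothetical helper. Merged or split requests are not handled. If you only need aggregates, the btt tool that ships with blktrace already reports D2C and Q2C summaries.

```shell
# d2c_latencies: read blkparse text output (default format) on stdin and print,
# one per line, the microseconds between a request being issued to the driver
# (action D) and its completion (action C), matched by device and start sector.
d2c_latencies() {
    awk '
        # Assumed layout: dev cpu seq time pid action rwbs sector + nblocks [comm]
        $9 == "+" && $8 ~ /^[0-9]+$/ {
            key = $1 " " $8                      # device + starting sector
            if ($6 == "D") {
                issued[key] = $4                 # dispatch timestamp in seconds
            } else if ($6 == "C" && (key in issued)) {
                printf "%.0f\n", ($4 - issued[key]) * 1000000
                delete issued[key]
            }
        }
    '
}
```

For example: `blktrace -d /dev/sda -o - | blkparse -i - | d2c_latencies > d2c.txt`. blktrace traces a device, not a process, so the numbers cover every process using that disk.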

Yes, strace may not be appropriate, since it traces everything (all syscalls, even when you use an -e filter). That loads the server and slows the process down considerably. Perf is a very obscure tool. There may be moments when you think you understand its output but actually don't, and its feature set depends heavily on the kernel version. Basically, perf is currently suitable for measuring CPU time (cycles) and is not [yet] suitable for measuring response times (which is what you actually need). I heard there were plans to implement something to make that easier, so very recent development kernels may have something. (If you investigate further, also look at the perf scripts listed by perf script -l.)
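If you still want to try strace despite the overhead, its -y option prints the path behind each file descriptor. That lets you drop network traffic (socket:[...] descriptors), which the question asks for. Here is a minimal sketch, assuming output produced with something like `strace -p PID -T -y -e trace=read,write,pread64,pwrite64 -o io.trace`. The hypothetical `strace_disk_times` helper keeps only calls whose descriptor resolves to a path starting with `/`. It prints their -T duration in seconds, one per line, ready for a histogram. Terminals (/dev/pts/...) also match. Calls that -f splits into unfinished/resumed lines are skipped.

```shell
# strace_disk_times: read `strace -T -y` output on stdin and print the duration
# (seconds, as strace reports it) of each read/write-family call whose file
# descriptor resolves to a filesystem path, skipping sockets and pipes.
strace_disk_times() {
    awk '
        # Optional "[pid N] " or "N " prefix (from -f), then e.g. read(3</path>, ...
        /^(\[pid +[0-9]+] |[0-9]+ +)?(read|write|pread64|pwrite64|readv|writev)\([0-9]+<\// {
            if (match($0, /<[0-9]+\.[0-9]+>$/))   # -T appends <seconds>
                print substr($0, RSTART + 1, RLENGTH - 2)
        }
    '
}
```

Usage: `strace_disk_times < io.trace > latencies.txt`.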

Maybe you will be able to get something from ftrace. Read this article: http://lwn.net/Articles/370423/ (and this one for the intro). As far as I can see, you can limit ftrace by PID and by function, and then trace something like sys_read. I tried this as an example for you:

# mount -t debugfs debugfs /sys/kernel/debug # if it's not already mounted
# cd /sys/kernel/debug/tracing
# echo $$ > set_ftrace_pid  # PID of the process to trace ($$ traces this shell itself)
# echo sys_read sys_write > set_ftrace_filter
# echo function_graph > current_tracer
# head trace

# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)   8.235 us    |  sys_write();
 0)   3.393 us    |  sys_write();
 0) ! 459859.3 us |  sys_read();
 0)   6.289 us    |  sys_write();
 0)   8.773 us    |  sys_write();
 0) ! 1576469 us |  sys_read();
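The durations in this output can be turned into the histogram the question asks for. Here is a minimal sketch, assuming the function_graph format shown above, where each traced call is a leaf line such as `sys_read();` with its duration in microseconds. The hypothetical `ftrace_hist` helper buckets durations by powers of two. On newer kernels the traced symbol may carry an architecture prefix (for example `__x64_sys_read`), but a substring match on `sys_read();` still catches it. As the very long reads above suggest, sys_read also covers terminal, pipe, and socket reads, so this is not limited to disk I/O.

```shell
# ftrace_hist FUNC: read function_graph output on stdin and print a power-of-two
# histogram of FUNC durations in microseconds. Each output line is
# "<bucket lower edge> <count>"; bucket 1 also holds sub-microsecond calls.
ftrace_hist() {
    awk -v fn="$1" '
        index($0, fn "();") && match($0, /[0-9]+(\.[0-9]+)? us/) {
            us = substr($0, RSTART, RLENGTH - 3) + 0   # strip the " us" suffix
            b = 1
            while (b * 2 <= us) b *= 2                  # largest power of two <= us
            hist[b]++
        }
        END { for (b in hist) print b, hist[b] }
    ' | sort -n
}
```

For example: `ftrace_hist sys_read < /sys/kernel/debug/tracing/trace`.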

Solution 2

If you're only interested in the number of "read" or "write" calls to block devices, this is Red Hat's SOP for determining that.

Using the block dump feature and a bit of scripting, you can gather a high-level overview of the I/O actions that processes are producing. To do so, complete the following steps:

Disable system logging for a short period of time (so it doesn't get in the way of the data capture):

# service syslog stop
# echo 1 > /proc/sys/vm/block_dump

Wait for the high-iowait issue to occur. Once it has passed, re-enable syslog (or rsyslog, if you use that) and disable the block dump:

# service syslog start
# echo 0 > /proc/sys/vm/block_dump

Use the following command to parse the dmesg output for READ/WRITE/dirtied actions issued by each process:

# dmesg | awk '/(READ|WRITE|dirtied)/ {activity[$1]++} END {for (x in activity) print x, activity[x]}'| sort -nr -k 2,2| head -n 10

kjournald(1425): 5984
kjournald(3681): 1269
pdflush(27301): 725
iostat(2913): 134
crond(26919): 61
crond(28985): 60
crond(7026): 54
sshd(28175): 50
sshd(15388): 50
nautilus(24498): 46

The example output above shows the top 10 processes that issued READ, WRITE, and dirtied operations while the block dump was running. This data gives a high-level overview of how many operations each process is issuing, and it can help determine whether a single process is contributing heavily to iowait.
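The one-liner above counts by `$1`, which assumes each message begins with `comm(pid):`. If dmesg prefixes lines with a `[seconds.micro]` timestamp, `$1` is part of that timestamp instead. The hedged variant below, with the hypothetical name `block_dump_top`, strips such a prefix before counting. Its optional argument is the number of rows to show (default 10).

```shell
# block_dump_top [N]: count READ/WRITE/dirtied block_dump messages per
# "comm(pid):" in dmesg text on stdin and print the top N (default 10).
block_dump_top() {
    awk '/(READ|WRITE|dirtied)/ {
             sub(/^\[ *[0-9.]+] */, "")   # drop an optional "[ 1234.567890] " timestamp
             activity[$1]++               # $1 is now "comm(pid):"
         }
         END { for (x in activity) print x, activity[x] }' |
        sort -nr -k 2,2 | head -n "${1:-10}"
}
```

Usage: `dmesg | block_dump_top 10`.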

There are also several command-line tools, such as atop and iotop, that give you per-process iowait statistics and can be run as part of a script (they have batch modes that can do a single iteration for particular PIDs).

EDIT: After more research, it looks like you can get per-process iowait from /proc/$pid/stat (search for "Aggregated block I/O delays"; in proc(5) this is field 42, delayacct_blkio_ticks, measured in clock ticks).
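To read that counter for one process, field 2 (the command name) needs care because it is wrapped in parentheses and may itself contain spaces. Here is a minimal sketch, using the hypothetical helper `blkio_ticks`, that prints field 42 from a given stat file. Divide the result by `getconf CLK_TCK` to convert ticks to seconds. If the value never moves, delay accounting may be disabled on your kernel (some kernels gate it behind the `delayacct` boot parameter or the `kernel.task_delayacct` sysctl).

```shell
# blkio_ticks STATFILE: print field 42 (delayacct_blkio_ticks) of a
# /proc/<pid>/stat file, i.e. the aggregated block I/O delay in clock ticks.
blkio_ticks() {
    # Field 2 (comm) is in parentheses and may contain spaces or ")", so drop
    # everything through the LAST ") " first. Field 3 then becomes $1,
    # which makes field 42 $40.
    awk '{ sub(/.*\) /, ""); print $40 }' "$1"
}
```

Usage: `blkio_ticks /proc/1234/stat`.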


Author: ajduff574

Updated on September 18, 2022

Comments

ajduff574 over 1 year

I'm trying to measure the disk IO latencies of a running process to make a histogram.

I could do this with DTrace on operating systems that provide it (e.g. as in this Joyent paper), but my application is running on Linux. My first thought was to try perf, and I can get counters, but I can't find any way to get time deltas. I can get time deltas with strace (e.g. strace -e read -T), but I'm not sure if I can restrict the trace to disk IO (this system has a busy network interface as well).

Is there any way to do this in Linux?
ajduff574 about 11 years

It looks like neither of those will get me enough information to make a histogram. I'm specifically interested in the wait time distribution, rather than the aggregate wait time.
Bratchley about 11 years

/proc/pid/stat should still help; you would just need a script that runs periodically and records the value, subtracting the value found the previous time. After that, you're pretty much talking about sample rate.
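Here is a minimal sketch of that periodic sampling, assuming field 42 of /proc/PID/stat (delayacct_blkio_ticks) as described in the answer. The hypothetical `sample_blkio` prints, once per interval, how many milliseconds the process spent waiting on block I/O since the previous sample. This gives a distribution of per-interval wait time, which is coarser than per-request latency, so the choice of interval matters.

```shell
# sample_blkio PID INTERVAL COUNT: every INTERVAL seconds, COUNT times, print the
# milliseconds PID spent waiting on block I/O since the previous sample.
sample_blkio() {
    pid=$1 interval=$2 count=$3
    hz=$(getconf CLK_TCK)                       # clock ticks per second
    # Strip "pid (comm) " through the last ") " so field 42 becomes $40.
    prog='{ sub(/.*\) /, ""); print $40 }'
    prev=$(awk "$prog" "/proc/$pid/stat") || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        sleep "$interval"
        cur=$(awk "$prog" "/proc/$pid/stat") || return 1
        echo $(( (cur - prev) * 1000 / hz ))    # tick delta converted to ms
        prev=$cur
        i=$((i + 1))
    done
}
```

For example, `sample_blkio 1234 1 60 > waits.txt` samples once a second for a minute.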
ajduff574 about 11 years

Thanks, it looks like there are several options in there. I started with blktrace, which seems like it's supposed to do what I want, but I couldn't get the -t option to work. I will look at the others as well.
catpnosis almost 10 years

Interesting approach here: brendangregg.com/blog/2014-07-01/perf-heat-maps.html