Determining Specific File Responsible for High I/O



Solution 1

There are several aspects to this question which have been addressed partially through other tools, but there doesn't appear to be a single tool that provides all the features you're looking for.

iotop

This tool shows which processes are consuming the most I/O, but it has no option to show specific file names.

$ sudo iotop
Total DISK READ:       0.00 B/s | Total DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                        
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
    5 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u:0]
    6 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
    7 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]

By default it does for disk I/O what regular top does for processes vying for CPU time. You can coax it into giving you a 30,000-foot view with the -a switch, which shows I/O accumulated per process over time instead of the current bandwidth.

$ sudo iotop -a
Total DISK READ:       0.00 B/s | Total DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                        
  258 be/3 root          0.00 B    896.00 K  0.00 %  0.46 % [jbd2/dm-0-8]
22698 be/4 emma          0.00 B     72.00 K  0.00 %  0.00 % chrome
22712 be/4 emma          0.00 B    172.00 K  0.00 %  0.00 % chrome
 1177 be/4 root          0.00 B     36.00 K  0.00 %  0.00 % cupsd -F
22711 be/4 emma          0.00 B    120.00 K  0.00 %  0.00 % chrome
22703 be/4 emma          0.00 B     32.00 K  0.00 %  0.00 % chrome
22722 be/4 emma          0.00 B     12.00 K  0.00 %  0.00 % chrome
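As far as I know, iotop's per-process figures come from the kernel's per-task I/O accounting, and the same counters are also exposed in /proc/<pid>/io, which helps on a box without iotop installed. The sketch below is only an illustration; the function name proc_io_bytes is hypothetical. It pulls out read_bytes and write_bytes (the bytes the process caused to be fetched from or sent to the storage layer) from a copy of that file. Sampling twice and subtracting gives a rough per-process rate, but still nothing per file.

```shell
# Hypothetical helper: print "<read_bytes> <write_bytes>" from a /proc/<pid>/io-style file.
# Reading another user's /proc/<pid>/io requires root or the process owner.
proc_io_bytes() {
   awk '$1 == "read_bytes:"  { r = $2 }
        $1 == "write_bytes:" { w = $2 }
        END { printf "%.0f %.0f\n", r, w }' "$1"
}
```

For example, run proc_io_bytes /proc/1234/io, wait ten seconds, run it again, and subtract the two results.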

i* tools (inotify, iwatch, etc.)

These tools provide access to file access events, but they have to be pointed at specific directories or files. So they aren't much help when you're trying to track down rogue file access by an unknown process while debugging performance issues.

Also, the inotify framework reports only the type of access (open, modify, close, etc.), not how much data was read or written, so these tools give you no information about the volume of data being moved back and forth.
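Counting events per path is about as close as inotify gets to an I/O breakdown. This is a minimal sketch, assuming inotifywait from inotify-tools is run with --format '%e %w%f' so that each line reads "EVENTS path". The function name inotify_event_counts is hypothetical. Event counts are not byte counts, so treat the ranking as a hint only.

```shell
# Hypothetical helper: count inotifywait events per path, busiest first.
# Input lines: "<EVENT[,EVENT...]> <path>" (inotifywait --format '%e %w%f').
# Optional $1: only count lines whose event list contains that event, e.g. MODIFY.
inotify_event_counts() {
   awk -v want="$1" '
      {
         ev = $1
         path = $0
         sub(/^[^ ]+ /, "", path)          # strip the event list, keep spaces inside the path
         if (want == "" || index("," ev ",", "," want ",")) n[path]++
      }
      END { for (p in n) printf "%d\t%s\n", n[p], p }
   ' | sort -rn
}
```

For example: timeout 30 inotifywait -m -r -e modify --format '%e %w%f' /var/lib/mailman 2>/dev/null | inotify_event_counts MODIFY. Keep in mind that watching a large tree recursively can run into the fs.inotify.max_user_watches limit.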

iostat

Shows overall performance (reads and writes) for a given device (hard drive) or partition, but doesn't provide any insight into which files are generating that activity.

$ iostat -htx 1 1
Linux 3.5.0-19-generic (manny)  08/18/2013  _x86_64_    (3 CPU)

08/18/2013 10:15:38 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.41    0.00    1.98    0.11    0.00   79.49

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda
                  0.01     0.67    0.09    0.87     1.45    16.27    37.06     0.01   10.92   11.86   10.82   5.02   0.48
dm-0
                  0.00     0.00    0.09    1.42     1.42    16.21    23.41     0.01    9.95   12.22    9.81   3.19   0.48
dm-1
                  0.00     0.00    0.00    0.02     0.01     0.06     8.00     0.00  175.77   24.68  204.11   1.43   0.00
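iostat derives these figures from the cumulative counters in /proc/diskstats. In that file, fields 6 and 10 are sectors read and sectors written, always in 512-byte units. This minimal sketch reads them for one device from a saved copy of the file. The function name diskstats_bytes is hypothetical. Like iostat, it reports only per-device totals and nothing per file.

```shell
# Hypothetical helper: print "<bytes read> <bytes written>" for device $2
# from a copy of /proc/diskstats given as $1. Returns 1 if the device is absent.
diskstats_bytes() {
   awk -v dev="$2" '
      $3 == dev { printf "%.0f %.0f\n", $6 * 512, $10 * 512; found = 1 }
      END { if (!found) exit 1 }
   ' "$1"
}
```

To get a rate, save two copies a few seconds apart (cat /proc/diskstats > /tmp/ds1; sleep 5; cat /proc/diskstats > /tmp/ds2), run diskstats_bytes on both, and subtract.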

blktrace

This option is too low-level. It shows only raw block numbers, with no visibility into which files or inodes are being accessed.

$ sudo blktrace -d /dev/sda -o - | blkparse -i -
  8,5    0        1     0.000000000   258  A WBS 0 + 0 <- (252,0) 0
  8,0    0        2     0.000001644   258  Q WBS [(null)]
  8,0    0        3     0.000007636   258  G WBS [(null)]
  8,0    0        4     0.000011344   258  I WBS [(null)]
  8,5    2        1 1266874889.709032673   258  A  WS 852117920 + 8 <- (252,0) 852115872
  8,0    2        2 1266874889.709033751   258  A  WS 852619680 + 8 <- (8,5) 852117920
  8,0    2        3 1266874889.709034966   258  Q  WS 852619680 + 8 [jbd2/dm-0-8]
  8,0    2        4 1266874889.709043188   258  G  WS 852619680 + 8 [jbd2/dm-0-8]
  8,0    2        5 1266874889.709045444   258  P   N [jbd2/dm-0-8]
  8,0    2        6 1266874889.709051409   258  I  WS 852619680 + 8 [jbd2/dm-0-8]
  8,0    2        7 1266874889.709053080   258  U   N [jbd2/dm-0-8] 1
  8,0    2        8 1266874889.709056385   258  D  WS 852619680 + 8 [jbd2/dm-0-8]
  8,5    2        9 1266874889.709111456   258  A  WS 482763752 + 8 <- (252,0) 482761704
...
^C
...
Total (8,0):
 Reads Queued:           0,        0KiB  Writes Queued:           7,       24KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        3,       24KiB
 Reads Requeued:         0       Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        5,       24KiB
 Read Merges:            0,        0KiB  Write Merges:            3,       12KiB
 IO unplugs:             2           Timer unplugs:           0

Throughput (R/W): 0KiB/s / 510KiB/s
Events (8,0): 43 entries
Skips: 0 forward (0 -   0.0%)
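blkparse's text output can at least be summarized per process. This minimal sketch assumes the default blkparse output format shown above: the action is in column 6, the RWBS flags are in column 7, "sector + count" is in columns 8 to 10, and the process name is in brackets at the end of the line. The function name blkparse_dispatch_summary is hypothetical. It counts only D (issued to driver) events and converts 512-byte sectors to KiB. The result is still broken down by process, not by file.

```shell
# Hypothetical helper: sum KiB dispatched (action "D") per direction and process
# from blkparse text output. Prints "<KiB>\t<R|W|other>\t<process>", largest first.
blkparse_dispatch_summary() {
   awk '$6 == "D" && $9 == "+" {
           dir = index($7, "W") ? "W" : (index($7, "R") ? "R" : "other")
           proc = "?"
           if (match($0, /\[.*\]$/)) proc = substr($0, RSTART + 1, RLENGTH - 2)
           kib[dir "\t" proc] += $10 * 512 / 1024   # count is in 512-byte sectors
        }
        END { for (k in kib) printf "%.1f\t%s\n", kib[k], k }' | sort -rn
}
```

For example: sudo timeout 30 blktrace -d /dev/sda -o - | blkparse -i - | blkparse_dispatch_summary. Using timeout instead of Ctrl-C lets the rest of the pipeline finish and print its totals.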

fatrace

fatrace is a new tool built on the kernel's fanotify interface, and a welcome one. It's only packaged in newer distros such as Ubuntu 12.10. My Fedora 14 system was lacking it 8-).

It provides the same access that you can get through inotify without having to target a particular directory and/or files.

$ sudo fatrace
pickup(4910): O /var/spool/postfix/maildrop
pickup(4910): C /var/spool/postfix/maildrop
sshd(4927): CO /etc/group
sshd(4927): CO /etc/passwd
sshd(4927): RCO /var/log/lastlog
sshd(4927): CWO /var/log/wtmp
sshd(4927): CWO /var/log/lastlog
sshd(6808): RO /bin/dash
sshd(6808): RO /lib/x86_64-linux-gnu/ld-2.15.so
sh(6808): R /lib/x86_64-linux-gnu/ld-2.15.so
sh(6808): O /etc/ld.so.cache
sh(6808): O /lib/x86_64-linux-gnu/libc-2.15.so

The above shows you which process (name and PID) is accessing which file, but it doesn't give you any overall bandwidth usage, so each access is indistinguishable from any other access.

So what to do?

The fatrace option shows the most promise for FINALLY providing a tool that can show you aggregate usage of disk I/O based on files being accessed, rather than the processes doing the accessing.
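Until such a tool exists, one rough stand-in is to count fatrace events per file. This minimal sketch assumes fatrace's default output format shown above, "comm(pid): FLAGS path". The function name fatrace_summary is hypothetical. An optional argument keeps only events whose flags contain that letter, for example W for writes. These are event counts, not bytes.

```shell
# Hypothetical helper: count fatrace events per file, busiest first.
# Input lines: "comm(pid): FLAGS /path". Optional $1: a flag letter to require, e.g. W.
fatrace_summary() {
   awk -v want="$1" '
      {
         i = index($0, "): ")
         if (i == 0) next                    # not an event line
         rest = substr($0, i + 3)            # FLAGS followed by the path
         j = index(rest, " ")
         if (j == 0) next
         flags = substr(rest, 1, j - 1)
         path = substr(rest, j + 1)          # path may itself contain spaces
         if (want == "" || index(flags, want)) n[path]++
      }
      END { for (p in n) printf "%d\t%s\n", n[p], p }
   ' | sort -rn
}
```

For example: sudo timeout 30 fatrace | fatrace_summary W | head.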

References

Solution 2

I haven't gotten an answer yet but I did write this script (at the end) and it seems to do what I want. I haven't tested it on other systems and it's Linux-specific.

Basically, it wraps strace for 30 seconds, filtering for file-related system calls, and makes an effort to extract the filename. It counts the number of times each file appears in the strace output and presents a paginated summary to the user. It's not perfect, but the number of system calls made against a particular file may have some weak correlation with how much I/O it's performing.

I haven't tested it fully, but if it doesn't work out of the box, it should give people a place to start. If it gets fleshed out any further, it may be advisable to rewrite it in a higher-level language like Python.

If no one suggests a less homebrewed way of doing this within a week (even if it's just another tool that counts the I/O of a particular process), I'll accept this as my answer for posterity.

Script:

#!/bin/bash

####
# Creates files underneath /tmp
# Requires commands: timeout  strace  stty
####
#
# All commands are GNU unless otherwise stated
#
##########################################################


####
## Initialization
####

outputFile=/tmp/out.$RANDOM.$$
uniqueLinesFile=/tmp/unique.$RANDOM.$$
finalResults=/tmp/finalOutput.txt.$$

if [ $# -ne 1 ]; then
    echo "USAGE: traceIO [PID]" >&2
    exit 2
fi

if ! [[ "$1" =~ ^[0-9]+$ ]]; then
    echo "USAGE: traceIO [PID]" >&2
    echo -e "\nGiven Process ID is not a number." >&2
    exit 2
fi

if [ ! -e /proc/$1 ]; then
    echo "USAGE: traceIO [PID]" >&2
    echo -e "\nThere is no process with $1 as the PID." >&2
    exit 2
fi

if [[ -z "$PAGER" ]]; then

   for currentNeedle in less more cat; do

      # command -v is the portable way to test whether a command exists
      if command -v "$currentNeedle" >/dev/null 2>&1; then
         PAGER=$currentNeedle
         break
      fi

   done

   if [[ -z "$PAGER" ]]; then

      echo "Please set \$PAGER appropriately and re-run" >&2
      exit 1

   fi

fi

####
## Tracing
####

echo "Tracing command for 30 seconds..."

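# Keep only the first double-quoted field (the path) of each strace line;
# cut -s drops lines that contain no quote at all (attach/detach messages)
timeout 30 strace -e trace=file -fvv -p "$1" 2>&1 | grep -E -v -e "detached$" -e "interrupt to quit$" | cut -s -f2 -d '"' > "$outputFile"

# Check the status of timeout/strace, not of cut. timeout exits with 124
# when the 30 seconds run out, which is the expected outcome here.
traceStatus=${PIPESTATUS[0]}

if [ "$traceStatus" -ne 0 ] && [ "$traceStatus" -ne 124 ]; then
   echo -e "\nError performing Trace. Exiting" >&2
   rm -f "$outputFile" 2>/dev/null
   exit 1
fi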

echo "Trace complete. Preparing Results..."

####
## Processing
####

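echo -e "\n--------  RESULTS --------\n\n  #\t Path " > "$finalResults"
echo -e " ---\t-------" >> "$finalResults"

# Count exact occurrences of each path. uniq -c compares whole lines, so a path is
# not also counted inside longer paths, and regex characters in paths (".", "[", "*")
# have no special meaning. awk turns uniq's "   N path" into "N<TAB>path".
sort "$outputFile" | uniq -c | sort -rn | awk '{ n = $1; sub(/^ *[0-9]+ /, ""); printf "%s\t%s\n", n, $0 }' >> "$finalResults"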

####
## Presentation
####

resultSize=$(wc -l < "$finalResults")
currentWindowSize=$(stty size | awk '{print $1}')

  # We put five literal lines in the file so if we don't have more than that, there were no results
if [ "$resultSize" -eq 5 ]; then

   echo -e "\n\n No Results found!"

elif [ "$resultSize" -ge "$currentWindowSize" ] ; then

   $PAGER "$finalResults"

else

   cat "$finalResults"

fi

  # Cleanup
rm -f "$uniqueLinesFile" "$outputFile" "$finalResults"
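Because the script counts system calls rather than bytes, one possible refinement is to trace the calls that move data and let strace's -y option print the path behind each file descriptor, as in read(3</etc/passwd>, ...) = 1200. This is not part of the script above and is untested against real workloads. The function name strace_bytes_per_file is hypothetical. It sums successful return values per call type and path. It skips failed calls and misses calls that strace splits across "<unfinished ...>" and "resumed" lines, because the resumed line no longer shows the path.

```shell
# Hypothetical helper: sum bytes per syscall and path from "strace -y" output.
# Prints "<bytes>\t<call>\t<path>", largest first.
strace_bytes_per_file() {
   awk '
      match($0, /(read|write|pread64|pwrite64)\([0-9]+<[^>]*>/) {
         call = substr($0, RSTART, RLENGTH)          # e.g. read(3</etc/passwd>
         op = substr(call, 1, index(call, "(") - 1)
         path = substr(call, index(call, "<") + 1)
         sub(/>$/, "", path)
         if (match($0, / = [0-9]+$/))                # successful calls only
            bytes[op "\t" path] += substr($0, RSTART + 3) + 0
      }
      END { for (k in bytes) printf "%.0f\t%s\n", bytes[k], k }
   ' | sort -rn
}
```

For example: sudo timeout 30 strace -f -y -e trace=read,write,pread64,pwrite64 -p 1234 2>&1 | strace_bytes_per_file | head.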

Solution 3

You can use iWatch (see "Using iWatch"):

iWatch is very simple to use. Suppose you want to watch for changes in /etc; you just need to run it in the console:

$ iwatch /etc

iWatch will then tell you whenever something changes in this directory. If you want to be notified by email:

$ iwatch -m [email protected] /etc

In this case, the admin will get an email notification (you could even use an SMS gateway account, so you're alerted immediately, anytime and anywhere). If you want to monitor many different directories, you can use a configuration file. This configuration file is an XML file with an easy-to-understand structure.


Bratchley

Updated on September 18, 2022

Comments

Bratchley almost 2 years

This is a simple problem but the first time I've ever had to actually fix it: finding which specific files/inodes are the targets of the most I/O. I'd like to be able to get a general system overview, but if I have to give a PID or TID I'm alright with that.

I'd like to go without having to do a strace on the program that pops up in iotop. Preferably, using a tool in the same vein as iotop but one that itemizes by file. I can use lsof to see which files mailman has open but it doesn't indicate which file is receiving I/O or how much.

I've seen elsewhere where it was suggested to use auditd but I'd prefer to not do that since it would put the information into our audit files, which we use for other purposes and this seems like an issue I ought to be able to research in this way.

The specific problem I have right now is with LVM snapshots filling too rapidly. I've since resolved the problem but would like to have been able to fix it this way rather than just doing an ls on all the open file descriptors in /proc/<pid>/fd to see which one was growing fastest.
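The manual approach described here, watching which file behind /proc/<pid>/fd grows fastest, can be scripted. This is a minimal sketch with hypothetical function names. fd_snapshot records the size of every regular file a process has open, and fd_growth compares two snapshots. It assumes GNU readlink and stat, and it needs root (or the process owner) to read another process's fd directory. It only sees growth in file size, not reads or in-place overwrites, and it skips deleted files.

```shell
# Hypothetical helper: print "<size>\t<path>" for each regular file linked from fd dir $1.
fd_snapshot() {
   local fd target
   for fd in "$1"/*; do
      target=$(readlink -f -- "$fd") || continue
      if [ -f "$target" ]; then              # skips sockets, pipes and deleted files
         printf '%s\t%s\n' "$(stat -L -c %s -- "$fd")" "$target"
      fi
   done
}

# Hypothetical helper: compare snapshot files $1 (before) and $2 (after);
# print "<bytes grown>\t<path>", fastest-growing first.
fd_growth() {
   awk -F'\t' 'FILENAME == ARGV[1] { before[$2] = $1; next }
               { printf "%.0f\t%s\n", $1 - before[$2], $2 }' "$1" "$2" | sort -rn
}
```

For example: fd_snapshot /proc/1234/fd > /tmp/before; sleep 10; fd_snapshot /proc/1234/fd > /tmp/after; fd_growth /tmp/before /tmp/after | head.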
- slm almost 11 years
  
  possibly related: unix.stackexchange.com/questions/9520/…
- Bratchley almost 11 years
  
  Yeah, I hadn't seen that one before, but most of the answers to this question were basically like that: "Well, if you do things this incredibly specific way, and do something weird, you can get a rough idea," versus something that directly solves the problem without requiring the admin to get too fancy. I don't mean to criticize others, and I realize now that the difficulty of this problem is probably why such solutions were offered, but it seems like even if there wasn't an older tool like fatrace, something like the script I wrote should have been offered, since it's more widely usable.
- Bratchley almost 11 years
  
  Just to be clear: I'm not criticizing the others who did offer help. Help is always better than no help. It's just frustrating when you feel the problem should have a straightforward answer and all you can figure out yourself, or see others suggesting, are either kludgy workarounds or very manual processes (such as what I ended up doing with my mailman problem).
- slm almost 11 years
  
  Yeah, I'm always amazed when I find answers to new Q's here buried in the site that don't show up until I dig for a while. Seems like something's broken there 8-). Hence why it's good to ask the same Q multiple ways and link it to the older ones as they're rooted out. Agreed, your script is a better approach; I'm still surprised that there isn't a general-purpose tool that does what you ask. Seems like a big gap in Unix.
- slm almost 11 years
  
  Most of the help is just extremely targeted which can get a little annoying, since when answering you're saying the same thing a lot of times over and over in different ways. But that's the nature of the SE sites. I don't know how Gilles does it. I like these longer form Q&A's better.
- Bratchley almost 11 years
  
  If I had to guess, it's probably because applications generally have their own load/traffic metrics. So it probably was a marginal problem, especially since multi-use servers are typically like that because they're low traffic in the first place. Mailman, though, doesn't have any metrics to speak of, which is frustrating since I now have to rewrite part of the web UI and introduce a custom handler just to get that going. They have standard logging but no access to metrics or tracking progress or ANYTHING. Still, this problem should be solvable at the platform level just for cases like that.
- Bratchley almost 11 years
  
  For cases where the application traffic monitoring sucks I mean. If that is even the logic of why it took so long to develop something like fatrace.
- slm almost 11 years
  
  fatrace looks like it went through a rough path to get into the kernel. The posts I referenced were from 2009, and it's only showing up now (2012-2013) in the more recent kernels. It seemed to encounter a lot of resistance, but I don't truly understand why. Without it, rolling your own "tool" from the regular cast of characters I mentioned was the only option. Seems stupid that each SA/dev would have to make their own tool for something that seems so basic once you ask the Q.
Bratchley almost 11 years

I'm supposing this is using inotify, is that correct? I was hesitant to use anything based on inotify since you have to give it paths (which is essentially what I'm looking for), and I was worried about how much overhead there would be if I just did everything underneath /. Can this filter by PID? I might be able to tolerate temporary slowness if it's going to be easy enough to extract which program is doing it. The website also doesn't have any example command output.
vfbsilva almost 11 years

@JoelDavis I'm really not sure. As far as I know it consumes a huge amount of RAM, hence running it under "/" would be dangerous.
Bratchley almost 11 years

Sweet baby Jesus, slm. You are like the rockstar of Unix SE as far as I'm concerned. Your answers are always incredibly educational and show a lot of research all in one place. Most people (if they knew about it) would have just posted the last bit about fatrace and not developed it much past that. I really do appreciate how you go the extra mile to make sure people understand the complete picture, and I wish I could do more than just upvote and give a bounty.
slm almost 11 years

@JoelDavis - thanks for your very kind words. I liked your idea of making a canonical answer so I was attempting to start that here. I've run into this problem many times as well and wished I had a resource like this so I figured we'd create it here 8-).
Bratchley almost 11 years

One thing I'm confused about: when I did the install, yum pulled in python3's libraries for some reason. I ran file on it and it looks like it's an ELF executable. ldd doesn't show any links to python, and neither did strings. Any idea why it bothered with python3?
slm almost 11 years

@JoelDavis - which distro? CentOS 6? I did not see the package on Cent6, my commands were from Ubuntu 12.10.
slm almost 11 years

@JoelDavis - BTW I like that you ask questions that aren't just selfishly trying to solve just your problems but also leaving a path for others in the future.
Bratchley almost 11 years

This is on Fedora 18. And thank you, sir.
Bratchley almost 11 years

BTW, apparently I have to wait some time after accepting the answer to award the bounty. Not that it matters to someone with roughly half of Unix SE's aggregate reputation points, but just an FYI.
slm almost 11 years

@JoelDavis - NP. Is the lack of an aggregate bandwidth from fatrace an issue for you? That implementation detail feels like a hook that was exposed for other tools to step in and expand on as needed, no?
Bratchley almost 11 years

Not really an issue for me, no. I can get the information I need about that via the appropriate iotop and iostat calls. Also, I figured out the python thing, it looks like (on Fedora 18 at least) there's a "power-usage-report" python script so yum was just responding to the fact that python is in the RPM's dependencies. So that particular mystery is solved.
Bratchley almost 11 years

Basically I'd use iostat to confirm bandwidth saturation on a particular device, use iotop to get a short list of applications using a lot of I/O, then use fatrace to confirm whether that application's I/O was related to the bandwidth saturation (i.e., whether the I/O iotop is reporting is targeted at the device I'm concerned about).
slm almost 11 years

@JoelDavis - yeah it's still a blending of tools situation, I hate having to do that, esp. as a DevOp on some production system at 2am. 8-).
Bratchley almost 11 years

Call me a weirdo but I actually like blending tools. Mix and match helps me solve problems the developers have no way to be able to anticipate an admin encountering. It's just that you can go too far in that direction and have to perform 50 different steps just to answer a simple question. fatrace looks like it solves that problem by cutting out 47 of the steps (not to mention a single central solution for multiple environments to build procedures/skillsets around and direct bugfixes towards).
Bratchley almost 11 years

Not to say that knowing how much bandwidth fileX is taking up wouldn't be useful, though. As long as I can see the I/O going out in fatrace and it gets quantified by iostat I'm fine.
slm almost 11 years

@JoelDavis - don't get me wrong, I like the ability to cut new solutions using Unix Lego blocks too, just not at 2am when I'm under the gun 8-). I was going to keep looking for some other options to see if we can mix in the bandwidth more systematically.
slm almost 11 years
