List the files accessed by a program

Solution 1

I gave up and coded my own tool. To quote from its docs:

SYNOPSIS
    tracefile [-adefnu] command
    tracefile [-adefnu] -p pid

OPTIONS
    -a        List all files
    -d        List only dirs
    -e        List only existing files
    -f        List only files
    -n        List only non-existing files
    -p pid    Trace process id
    -u        List only files once

It only outputs the files so you do not need to deal with the output from strace.

https://gitlab.com/ole.tange/tangetools/tree/master/tracefile

Solution 2

You can trace the system calls with strace, but there is indeed an inevitable speed penalty. You need to run strace as root if the command runs with elevated privileges:

sudo strace -f -o foo.trace su user -c 'mycommand'
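The resulting foo.trace still has to be reduced to a list of files. A minimal sketch of that step, assuming the usual strace line layout (PID, call name, first quoted argument is the path); the function names are made up for illustration, and paths containing escaped double quotes are not handled:

```shell
#!/bin/sh
# Hypothetical helpers (not part of strace): both read strace output on
# stdin and rely on the first double-quoted argument being the path.

# Print every traced path once.
list_traced_files() {
    # Lines without a quoted argument (e.g. "+++ exited with 0 +++") are dropped.
    sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' | sort -u
}

# Print only the paths whose system call failed with ENOENT, i.e. files the
# program looked for but did not find.
list_missing_files() {
    grep -F ' = -1 ENOENT' | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' | sort -u
}

# Usage sketch:
#   list_traced_files  < foo.trace
#   list_missing_files < foo.trace
```

Adding -e trace=file to the strace command line keeps the trace limited to system calls that take a file name, which makes the first-quoted-argument assumption more reliable.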

Another method that's likely to be faster is to preload a library that wraps around filesystem access functions: LD_PRELOAD=/path/to/libmywrapper.so mycommand. The LD_PRELOAD environment variable won't be passed to programs invoked with elevated privileges. You'd have to write the code of that wrapper library (here's an example from “Building library interposers for fun and profit”); I don't know if there is reusable code available on the web.

If you're monitoring the files in a particular directory hierarchy, you can make a view of the filesystem with LoggedFS such that all accesses through that view are logged.

loggedfs -c my-loggedfs.xml /logged-view
mycommand /logged-view/somedir

To configure LoggedFS, start with the sample configuration shipped with the program and read LoggedFS configuration file syntax.

Another possibility is Linux's audit subsystem. Make sure the auditd daemon is started, then configure what you want to log with auditctl. Each logged operation is recorded in /var/log/audit/audit.log (on typical distributions). To start watching a particular file:

auditctl -w /path/to/file -p rwxa

If you put a watch on a directory, the files in it and its subdirectories recursively are also watched. Take care not to watch the directory containing the audit logs. You can restrict the logging to certain processes, see the auditctl man page for the available filters. You need to be root to use the audit system.
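To turn the audit records into a plain list of files, the PATH records can be filtered for their name= field. This is only a sketch: it assumes the raw record layout commonly seen in audit.log (type=PATH ... name="/some/path" ...), the function name and the key "tracefiles" are made up for illustration, and names that the audit system logs hex-encoded or as (null) are skipped:

```shell
#!/bin/sh
# Hypothetical helper: read raw audit records on stdin and print the unique
# quoted name= values found in PATH records.
audit_paths() {
    sed -n 's/^type=PATH .* name="\([^"]*\)".*/\1/p' | sort -u
}

# Usage sketch (as root; "tracefiles" is an arbitrary key chosen here):
#   auditctl -w /path/to/dir -p rwxa -k tracefiles
#   mycommand
#   ausearch -k tracefiles --raw | audit_paths
#   auditctl -W /path/to/dir -p rwxa -k tracefiles   # remove the watch again
```

Names in PATH records may be relative to the working directory recorded in the accompanying CWD record, so the list can mix absolute and relative paths.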

Solution 3

I think you want lsof (possibly piped to grep for the program and its children). It will tell you every file that's currently open on the filesystem. To see which files a given process has open (from here):

lsof -n -p "$(pidof your_app | tr ' ' ',')"

Solution 4

I tried that tracefile. For me it gave far fewer matches than my own strace ... | sed ... | sort -u. I even added -s256 to the strace(1) command line, but it did not help much...

Then I tried that loggedfs. First it failed since I did not have read/write access to the directory I tried to log with it. After doing chmod 755 temporarily I did get some hits...

But, for me, doing the following seems to work best:

inotifywait -m -r -e OPEN /path/to/traced/directory

And then postprocess the output after running the process of interest.
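The postprocessing can be as simple as de-duplicating the logged paths and keeping only regular files. A rough sketch, assuming inotifywait is run with --format '%w%f' so that each output line is one full path; the helper name and the log file name open.log are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read one path per line on stdin and print the unique
# entries that are (still) regular files; directories and paths that no
# longer exist are dropped.
unique_regular_files() {
    sort -u | while IFS= read -r path; do
        if [ -f "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Usage sketch (needs inotify-tools):
#   inotifywait -m -r -e open --format '%w%f' /path/to/traced/directory > open.log &
#   watcher=$!
#   mycommand
#   kill "$watcher"
#   unique_regular_files < open.log
```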

This doesn't catch files that the process accesses outside of the traced directory, nor does it know whether some other process accessed the same directory tree, but in many cases it is a good enough tool to get the job done.

EDIT: inotifywait does not catch symlink access (just the targets after the symlinks are resolved). I was hit by this when I archived libraries accessed by a program for future use. I used some extra perl glob hackery to pick up the symlinks alongside the notified libraries to get the job done in that one particular case.

EDIT2: At least when the file and the symlink themselves are given on the inotifywait command line (e.g. inotifywait -m file symlink or inotifywait symlink file), the output reports the access under whichever name comes first on the command line, regardless of whether the file or the symlink was accessed. inotifywait does not support IN_DONT_FOLLOW. When I tried that flag programmatically, it just made the access show up under the file (which may, or may not, be what one expects), regardless of the order on the command line.

Solution 5

While it might not give you enough control (yet?), I have written a program that at least partially fulfills your needs. It uses the Linux kernel's fanotify and unshare to monitor only files modified (or read) by a specific process and its children. Compared to strace, it is quite fast (;

It can be found on https://github.com/tycho-kirchner/shournal

Example on the shell:

$ shournal -e sh -c 'echo hi > foo1; echo hi2 > foo2'
$ shournal -q --history 1
  # ...
  Written file(s):
 /tmp/foo1 (3 bytes) Hash: 15349503233279147316
 /tmp/foo2 (4 bytes) Hash: 2770363686119514911

Author: Ole Tange

I am strong believer in free software. I do not believe in Santa, ghosts, fairies, leprechauns, unicorns, goblins, and gods. Author of GNU Parallel.

Updated on September 18, 2022

Comments

  • Ole Tange
    Ole Tange almost 2 years

    time is a brilliant command if you want to figure out how much CPU time a given command takes.

    I am looking for something similar that can list the files being accessed by a program and its children. Either in real time or as a report afterwards.

    Currently I use:

    #!/bin/bash
    
    strace -ff -e trace=file "$@" 2>&1 | perl -ne 's/^[^"]+"(([^\\"]|\\[\\"nt])*)".*/$1/ && print'
    

    but it fails if the command to run involves sudo. It is not very intelligent (it would be nice if it could list only files that exist or that had permission problems, or group them into files that are read and files that are written). Also, strace is slow, so a faster alternative would be good.

    • Gilles 'SO- stop being evil'
      Gilles 'SO- stop being evil' almost 13 years
      Given your use of strace, I assume you're specifically interested in Linux. Correct?
    • Ole Tange
      Ole Tange almost 13 years
      Linux is my primary concern.
    • Erik Aas
      Erik Aas over 4 years
      Interesting side effect: when running this code with $@ equal to "firefox", when I'm already running firefox, a new firefox process spawns.
  • Ole Tange
    Ole Tange almost 13 years
    But it only gives me a snapshot. What I need is what files it tried to access. Think of the situation where a program refuses to start because it says "Missing file". How do I figure out what file it was looking for?
  • David Given
    David Given about 8 years
    LD_PRELOAD also won't work on static binaries.
  • xeruf
    xeruf almost 6 years
    thanks! strace's output is absolutely unreadable. I don't know where to find the docs though - it would be nice if it had a -h/--help option. I'd also appreciate an option that only shows file edits, not accesses.
  • Ole Tange
    Ole Tange almost 6 years
    @Xerus Clone gitlab.com/ole.tange/tangetools and run make && sudo make install. Then you can run man tracefile.
  • Ole Tange
    Ole Tange almost 5 years
    "For me it gave much less matches than my own" Can you share an example of tracefile missing a file access?
  • Tomi Ollila
    Tomi Ollila almost 5 years
    I am not sure what you're exactly asking :)... If I look at files inside /path/to/traced/directory/ I see OPEN in the inotify output... BUT when stat(1)-ing the files I seem to get no results in the few cases I tried (I wonder why; is some caching hiding directory content reading from view?)
  • Tomi Ollila
    Tomi Ollila almost 5 years
    I'm commenting on the fanotify post below (I have only 21 reputation, although I've had an account for more than a decade; requiring 50 for commenting has always been an obstacle for me...) -- fanotify is good stuff, but it cannot get around the symlink dereference issue (i.e. in the case of symlinks, the final file accessed is found by reading /proc/self/fd/<fd>)... anyway, +1:ing the answer :D
  • Jeff Schaller
    Jeff Schaller almost 4 years
    "Also strace is slow, so it would be good with a faster choice"
  • mazunki
    mazunki almost 4 years
    To color only the filenames: strace subl 2>&1 | grep 'openat' | grep -e '".*"' --color=auto