strace: order of <unfinished ...> and <... resumed>

11,547

The system call begins when strace writes the line close(449 <unfinished ...>, and ends when it outputs <... close resumed>.

The close is not interrupted by any other call or signal: the other call is executed by another process, while the kernel is closing your file descriptor.

There is no way to know which is the exact point when the file descriptor is closed; the only thing you know is that it is not closed until the syscall is executed, and that it is closed when the syscall finishes.

Share:
11,547
try-catch-finally
Author by

try-catch-finally

We make our world significant by the courage of our questions and the depth of our answers. (Carl Sagan)

Updated on August 01, 2022

Comments

  • try-catch-finally
    try-catch-finally almost 2 years

    I'm writing a script that analyzes file access traced with strace.

    The trace contains some calls which have been interrupted by another process. strace shows them with <unfinished ...> and <... close resumed> (in case of an interrupted close call) markers.

    [pid 26817] 12:48:22.972737 close(449 <unfinished ...>
    [pid 28708] 12:48:22.972797 fcntl(451, F_SETFD, FD_CLOEXEC <unfinished ...>
    [pid 26817] 12:48:22.972808 <... close resumed> ) = 0 
    

    The process and all its threads have been traced with

    strace -f -tt -p <pid>
    

    The man page is uncertain about when the call has been finished.

    If a system call is being executed and meanwhile another one is being called from a different thread/process then strace will try to preserve the order of those events and mark the ongoing call as being unfinished. When the call returns it will be marked as resumed.

    While I'd assume that, natually the resumed marker will indicate that the call is now finished. I'd like to ask if it is so.

    Can the above trace excerpt be reconstructed to

    A

    [pid 28708] 12:48:22.972797 fcntl(451, F_SETFD, FD_CLOEXEC <unfinished ...>
    [pid 26817] 12:48:22.972808 close(449) = 0
    

    or should it be reconstructed to

    B

    [pid 26817] 12:48:22.972737 close(449) = 0 
    [pid 28708] 12:48:22.972797 fcntl(451, F_SETFD, FD_CLOEXEC <unfinished ...>
    

    The order is crucial here since there may be multiple calls between the unfinished and resumed and one of them might do something with the file that is about to be closed at the moment.

  • Stevel
    Stevel over 4 years
    Keep in mind that "unfinished" does NOT mean "interrupted at the kernel level." Rather, it means that the close() system call of 28708 takes some time (about 71 usec) and during that period, a different process/thread started a new system call - 28708 starts a fcntl(). Since strace can't report the return value (or any modified structure "output" values) until such call completes, it splits the report into two parts: the call initiation with "unfinished" and the call completion with "resumed." This has strace output interleave system call reports, as noted.