waitpid - WIFEXITED returning 0 although child exited normally


On Unix and Linux systems, the status returned from wait or waitpid (or any of the other wait variants) has this structure:

bits   meaning

0-6    signal number that caused child to exit,
       or 0177 if child stopped / continued
       or zero if child exited without a signal

 7     1 if core dumped, else 0

8-15   low 8 bits of value passed to _exit/exit or returned by main,
       or signal that caused child to stop/continue

(Note that POSIX doesn't define the bits, only the macros, but these are the bit definitions used by at least Linux, Mac OS X/iOS, and Solaris. Also note that waitpid only returns for stop events if you pass it the WUNTRACED flag, and for continue events if you pass it the WCONTINUED flag.)
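The table can be read off directly with masks and shifts. The following is only a sketch: the helper names are hypothetical, and it assumes the conventional layout described above, which POSIX does not guarantee. Portable code should use the W* macros instead.

```c
#include <stdbool.h>

/* Hypothetical helpers mirroring the bit table.
 * Assumption: the conventional (non-POSIX) layout used by Linux,
 * Mac OS X/iOS and Solaris. */

/* Bits 0-6: terminating signal, 0 for a normal exit, 0177 if stopped. */
static inline int status_signal_bits(int status)
{
    return status & 0x7f;
}

/* Bit 7: set if the child dumped core. */
static inline bool status_core_bit(int status)
{
    return (status & 0x80) != 0;
}

/* Bits 8-15: exit code for a normal exit, or the stop signal. */
static inline int status_high_byte(int status)
{
    return (status >> 8) & 0xff;
}
```

Under this layout, a raw status of 11 has signal bits 11 and a high byte of 0, while a child that called exit(3) yields 3 << 8, that is, 768.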

So a status of 11 means the child was terminated by signal 11, which is SIGSEGV (again, the number is not specified by POSIX, but it is the conventional one).
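In portable code the same distinction is made with the POSIX macros rather than raw bits. Here is a minimal sketch; describe_wait_status is a hypothetical helper name, not part of any library.

```c
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: write a human-readable description of a wait
 * status into buf. Returns whatever snprintf returns. */
int describe_wait_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status))
        return snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    if (WIFSIGNALED(status))
        return snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    if (WIFSTOPPED(status))
        return snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
    return snprintf(buf, len, "unrecognized status 0x%x", (unsigned)status);
}
```

Checking WIFEXITED alone, as the question's code does, cannot tell a crash apart from other outcomes; testing WIFSIGNALED as well makes the SIGSEGV visible.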

Either your program is passing invalid arguments to execv (which is a C library wrapper around execve or some other kernel-specific call), or the child behaves differently when you execv it than when you run it from the shell or gdb.

If you are on a system that supports strace, run your (parent) program under strace -f to see whether execv is causing the signal.

Author: Andreas Grapentin

Updated on June 04, 2020

Comments

  • Andreas Grapentin
    Andreas Grapentin almost 4 years

    I have been writing a program that spawns a child process, and calls waitpid to wait for the termination of the child process. The code is below:

      // fork & exec the child
      pid_t pid = fork();
      if (pid == -1)
        {
          // here is error handling code that is **not** triggered
        }
    
      if (!pid)
        {
          // binary_invocation is a NULL-terminated array holding the child program and its arguments
          execv(args.binary_invocation[0], (char * const*)args.binary_invocation);
          // here is some error handling code that is **not** triggered
        }
      else
        {
          int status = 0;
          pid_t res = waitpid(pid, &status, 0);
    
          // here I see res being a positive integer > 0
          // and status being 11, which means WIFEXITED(status) is 0.
          // this triggers a warning in my program's output.
        }
    

    The manpage of waitpid states for WIFEXITED:

    WIFEXITED(status)
        returns  true  if  the child terminated normally, that is, by calling exit(3) or
        _exit(2), or by returning from main().
    

    I interpret this to mean that it should return a nonzero integer on success, which is not happening in my program: I observe WIFEXITED(status) == 0.

    However, executing the same program from the command line results in $? == 0, and starting from gdb results in:

    [Inferior 1 (process 31934) exited normally]
    

    The program behaves normally, except for the triggered warning, which makes me think something else is going on here, that I am missing.

    EDIT:
    As suggested below in the comments, I checked whether the child was terminated by a segfault, and indeed, WIFSIGNALED(status) returns 1 and WTERMSIG(status) returns 11, which is SIGSEGV.

    What I don't understand, though, is why a call via execv would fail with a segfault while the same call via gdb or a shell succeeds.

    EDIT2:
    The behaviour of my application heavily depends on the behaviour of the child process, in particular on a file the child writes in a function declared __attribute__ ((destructor)). After the waitpid call returns, this file exists and is generated correctly, which means the segfault occurs either in another destructor or somewhere outside of my control.

    • rob mayoff
      rob mayoff about 10 years
      Status 11 means the child received signal 11, SIGSEGV. A non-signal exit is 256 times the low 8 bits of the value passed to _exit or exit or returned by main. If you are on a platform (like Linux) that has strace, use it (with the -f flag) to see whether the child gets the signal due to a bad call to execv, or after a successful exec.
    • Andreas Grapentin
      Andreas Grapentin about 10 years
      @robmayoff you are right! I was not aware that the status variable holds both the exit status of the spawned process and the number of the signal that terminated it. Thanks for pointing that out!
    • Dabo
      Dabo about 10 years
      "What I don't understand though, is why a call via execv would fail with a segfault" ... what does args.binary_invocation look like? Where does it come from; do you create it?
    • Andreas Grapentin
      Andreas Grapentin about 10 years
      @Dabo yes, args.binary_invocation is a NULL-terminated array of char pointers holding the name of the child application and its arguments. I have verified that the array is correct.
    • Andreas Grapentin
      Andreas Grapentin about 10 years
      @robmayoff I found the reason for the segfault thanks to your comment - the issue was that my application altered the environment for the child process, which I did not reproduce in my independent tests, which is why the segfault was hidden outside of the exec environment. I would like to give you credit for that, because you sent me in the right direction. So, if you would like to make your comments into an answer, I would gladly accept it :)
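The environment difference described in the last comment can be reproduced in isolation by giving the child an explicit environment. This is a minimal sketch, assuming a hypothetical helper run_with_env built on execve; the actual variables the asker's program changed are not stated in the thread.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with exactly the environment in envp
 * and store the raw wait status in *status. Returns 0 on success, -1 if
 * fork or waitpid failed. */
int run_with_env(char *const argv[], char *const envp[], int *status)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        execve(argv[0], argv, envp);
        _exit(127); /* only reached if execve failed */
    }

    pid_t res;
    do {
        res = waitpid(pid, status, 0);
    } while (res == -1 && errno == EINTR); /* retry if interrupted */

    return res == -1 ? -1 : 0;
}
```

Running the child once with the shell's environment and once with the environment the parent actually passes makes it possible to check whether the crash depends on that difference.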