What can cause exec to fail? What happens next?


Solution 1

From the exec(3) man page:

The execl(), execle(), execlp(), execvp(), and execvP() functions may fail and set errno for any of the errors specified for the library functions execve(2) and malloc(3).

The execv() function may fail and set errno for any of the errors specified for the library function execve(2).

And then from the execve(2) man page:

ERRORS

Execve() will fail and return to the calling process if:

  • [E2BIG] - The number of bytes in the new process's argument list is larger than the system-imposed limit. This limit is specified by the sysctl(3) MIB variable KERN_ARGMAX.
  • [EACCES] - Search permission is denied for a component of the path prefix.
  • [EACCES] - The new process file is not an ordinary file.
  • [EACCES] - The new process file mode denies execute permission.
  • [EACCES] - The new process file is on a filesystem mounted with execution disabled (MNT_NOEXEC in <sys/mount.h>).
  • [EFAULT] - The new process file is not as long as indicated by the size values in its header.
  • [EFAULT] - Path, argv, or envp point to an illegal address.
  • [EIO] - An I/O error occurred while reading from the file system.
  • [ELOOP] - Too many symbolic links were encountered in translating the pathname. This is taken to be indicative of a looping symbolic link.
  • [ENAMETOOLONG] - A component of a pathname exceeded {NAME_MAX} characters, or an entire path name exceeded {PATH_MAX} characters.
  • [ENOENT] - The new process file does not exist.
  • [ENOEXEC] - The new process file has the appropriate access permission, but has an unrecognized format (e.g., an invalid magic number in its header).
  • [ENOMEM] - The new process requires more virtual memory than is allowed by the imposed maximum (getrlimit(2)).
  • [ENOTDIR] - A component of the path prefix is not a directory.
  • [ETXTBSY] - The new process file is a pure procedure (shared text) file that is currently open for writing or reading by some process.

malloc() is a lot less complicated, and uses only ENOMEM. From the malloc(3) man page:

If successful, calloc(), malloc(), realloc(), reallocf(), and valloc() functions return a pointer to allocated memory. If there is an error, they return a NULL pointer and set errno to ENOMEM.

Solution 2

The problem with handling exec failure is that exec is usually performed in a child process, while you want to do the error handling in the parent process. But you can't just exit(errno), because (1) error codes are not guaranteed to fit in an exit status, and (2) you can't distinguish a failed exec from a failure exit code returned by the new program you exec.

The best solution I know is using pipes to communicate the success or failure of exec (a minimal C sketch follows the list):

  1. Before forking, open a pipe in the parent process.
  2. After forking, the parent closes the writing end of the pipe and reads from the reading end.
  3. The child closes the reading end and sets the close-on-exec flag for the writing end.
  4. The child calls exec.
  5. If exec fails, the child writes the error code back to the parent using the pipe, then exits.
  6. The parent reads EOF (a zero-length read) if the child performed exec successfully, since close-on-exec caused the successful exec to close the writing end of the pipe. If exec failed instead, the parent reads the error code and can proceed accordingly. Either way, the parent blocks until the child calls exec.
  7. The parent closes the reading end of the pipe.
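
A minimal sketch of those steps in C might look like the following. Error handling is abbreviated, and spawn_and_report is a hypothetical helper name, not anything from the original answer:

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch of the pipe technique described above.
   Returns 0 if exec succeeded, -1 (with errno set) if it failed. */
static int spawn_and_report(const char *path, char *const argv[])
{
    int pipefd[2];
    if (pipe(pipefd) == -1)                    /* step 1: pipe before forking */
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (pid == 0) {                            /* child */
        close(pipefd[0]);                      /* step 3: close the read end... */
        fcntl(pipefd[1], F_SETFD, FD_CLOEXEC); /* ...and mark the write end close-on-exec */
        execv(path, argv);                     /* step 4 */
        int err = errno;                       /* step 5: exec failed */
        write(pipefd[1], &err, sizeof err);
        _exit(127);
    }

    close(pipefd[1]);                          /* step 2: parent closes the write end */
    int err;
    ssize_t n = read(pipefd[0], &err, sizeof err);  /* step 6: blocks until exec */
    close(pipefd[0]);                          /* step 7 */
    if (n == 0)
        return 0;                              /* EOF: exec succeeded */
    errno = err;                               /* the child sent us its errno */
    return -1;
}

When the read returns 0, the child is already running the new program (or has exited from it), and the parent can waitpid() for it as usual.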

Solution 3

What you do after the exec() call returns depends on the context - what the program is supposed to do, what the error is, and what you might be able to do to work around the problem.

One source of trouble could be that you specified a simple program name instead of a pathname; maybe you could retry with execvp(), or convert the command into an invocation of sh -c 'what you originally specified'. Whether any of these is reasonable depends on the application. If there are major security issues involved, you probably don't retry at all.
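
As a rough illustration of that fallback chain (exec_with_fallbacks and cmdline are hypothetical names), each exec call returns only on failure, so the calls simply cascade:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch of a fallback chain, to be run in the child after fork().
   Passing the raw command line to sh -c is only safe if you trust
   where it came from. */
static void exec_with_fallbacks(const char *path, char *const argv[],
                                const char *cmdline)
{
    execv(path, argv);               /* try the exact path first */
    execvp(argv[0], argv);           /* retry with a PATH search */
    execl("/bin/sh", "sh", "-c", cmdline, (char *)NULL);  /* last resort */
    perror("exec");
    _exit(127);
}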

If you specified a pathname and there is a problem with that (ENOTDIR, ENOENT, EPERM), then you may not have any sensible fallback, but you can report the error meaningfully.

In the old days (10+ years ago), some systems did not support the '#!' shebang notation, and if you were not sure whether you were executing an executable or a shell script, you tried it as an executable and then retried it as a shell script. That might or might not work if you were running a Perl script, but in those days, you wrote your Perl scripts to detect that they were being run by a shell and to re-exec themselves with Perl. Fortunately, those days are mostly over.
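
For what it's worth, that pre-shebang fallback looked roughly like this sketch (run_script_or_binary is a hypothetical name); modern kernels return ENOEXEC for files whose format they cannot interpret:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch: try the file as a binary, and if the kernel rejects its
   format, re-run it as a shell script. */
static void run_script_or_binary(const char *path)
{
    execl(path, path, (char *)NULL);   /* try it as an executable */
    if (errno == ENOEXEC)              /* unrecognized format */
        execl("/bin/sh", "sh", path, (char *)NULL);  /* retry as a script */
    perror("exec");
    _exit(127);
}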

To the extent possible, it is important to ensure that the process reports the problem so that it can be traced - writing its message to a log file, to stderr, or even to syslog() - so that those who have to work out what went wrong have more to go on than the hapless end user's report "I tried X and it didn't work". If nothing works, it is crucial that the exit status is not 0, since 0 indicates success. Even that might be ignored - but you did what you could.
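
A minimal sketch of that kind of reporting, assuming exec has already failed and "myprog" is a placeholder program name:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <syslog.h>

/* Sketch: report the failure where a human will find it, then exit
   with a nonzero status. */
static void report_exec_failure(const char *path)
{
    int err = errno;
    fprintf(stderr, "myprog: cannot exec %s: %s\n", path, strerror(err));
    syslog(LOG_ERR, "cannot exec %s: %s", path, strerror(err));
    exit(EXIT_FAILURE);                /* never exit 0 if nothing worked */
}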

Solution 4

Other than just panicking, you could take a decision based on errno's value.
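
For instance, a sketch of branching on errno after a failed exec; which cases deserve special treatment is entirely application-specific, and exec_or_decide is a hypothetical name:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch: exec only returns on failure, so errno is valid here. */
static void exec_or_decide(const char *path, char *const argv[])
{
    execv(path, argv);
    switch (errno) {
    case ENOENT:        /* file missing: likely an install problem */
        fprintf(stderr, "%s: not found (is it installed?)\n", path);
        break;
    case EACCES:        /* permissions: suggest a fix */
        fprintf(stderr, "%s: permission denied\n", path);
        break;
    default:            /* anything else: just report it */
        perror(path);
        break;
    }
    _exit(127);
}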

Solution 5

Exec should always succeed (except for shells, e.g. if the user entered a bogus command).

If exec does fail, it indicates:

  • a "fault" with the program (missing or bad component, wrong pathname, bad memory, ...), or
  • a serious system error (out of memory, too many processes, disk fault, ...)

For any serious error, the normal approach is to write the error message on stderr, then exit with a failure code. Almost all of the standard tools do this. For exec:

execl("bork", "bork", NULL);
perror("failed: exec");
exit(127);

The shell does that, too (more or less).

Normally if a child process fails, the parent has failed too and should exit. It does not matter whether the child failed in exec, or while running the program. If exec failed, it does not matter why exec failed. If the child process failed for any reason, the calling process is in trouble and needs to stop.

Don't waste lots of time trying to anticipate all possible error conditions. Don't write code that tries to handle each error code in the best possible way. You'll just bloat the code, and introduce many new bugs. If your program is broken, or it's being abused, it should simply fail. If you force it to continue, worse trouble will come of that.

For example, if the system is out of memory and thrashing swap, we don't want to cycle over and over trying to run a process; it would just make the situation worse. If we get a filesystem error, we don't want to continue running on that filesystem; it might make the corruption worse. If the program was installed wrongly, or has a bug, or has memory corruption, we want to stop as soon as possible, before that broken program does some real damage (such as sending a corrupted report to a client, trashing a database, ...).

One possible alternative: a failing process might call for help, pause itself (SIGSTOP), then retry the operation if told to continue. This could help when the system is out of memory, or disks are full, or perhaps even if there is a fault in the program. Few operations are so expensive and important that this would be worthwhile.
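
A sketch of that idea, assuming a hypothetical try_operation() that returns 0 on success: the process stops itself with SIGSTOP and retries only if an operator sends SIGCONT.

#include <signal.h>
#include <stdio.h>

/* raise(SIGSTOP) suspends the process until someone sends SIGCONT,
   at which point the loop retries. try_operation() is hypothetical. */
extern int try_operation(void);

void retry_with_operator(void)
{
    while (try_operation() != 0) {
        fprintf(stderr, "operation failed; stopping for operator attention\n");
        raise(SIGSTOP);    /* execution resumes here after SIGCONT */
    }
}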

If you're making an interactive GUI program, try to do it as a thin wrapper over reusable command-line tools (which exit if something goes wrong). Every function in your program should be accessible through the GUI, through the command-line, and as a function call. Write your functions. Write a few tools to make command-line and GUI wrappers for any function. Use sub-processes too.

If you are making a truly critical system, such as a controller for a nuclear power station, or a program to predict tsunamis, then what are you doing reading my dumb advice? Critical systems should not depend entirely on computers or software. There needs to be a 'manual override', with someone to drive it. Especially, do not attempt to build a critical system on MS Windows; that is like building sandcastles underwater.


Comments

  • pythonic metaphor
    pythonic metaphor almost 2 years

    What are the reasons that an exec (execl,execlp, etc.) can fail? If you make a call to exec and it returns, are there any best practices other than just panicking and calling exit?

    • bta
      bta over 13 years
      Panicking is rarely the best practice.
    • Jonathan Leffler
      Jonathan Leffler over 13 years
      @bta: but a good panic can be remarkably therapeutic, if not cathartic.
    • Aidan Cully
      Aidan Cully over 13 years
      Honest question, and I don't mean this to be taken badly... How did you learn to program Linux? Whatever mechanism you used, it should have taught you to look at man pages for questions like this...
    • Jonathan Leffler
      Jonathan Leffler over 13 years
      @Aidan: the man pages I've seen don't really cover best practices for what to do if the exec() fails.
    • spartygw
      spartygw over 3 years
      Not only do the man pages not cover best practices, they really don't do diddly squat for enumerating the possible values for errno. Quote: may fail and set errno for any of the errors specified for the library functions execve(2) and malloc(3). Yeah, ok...so I'm reading the man page for execve and it refers to itself without any specification. Real useful.
  • Dana the Sane
    Dana the Sane over 13 years
Also, I'd consider grouping the errno values together by some scheme so you don't have to handle them all individually. It may also be helpful to keep track of the codes that come up during testing and deployment so you know whether the scope of your error handling is reasonable.
  • user877329
    user877329 almost 8 years
    I have some problems when standard streams are redirected. The parent process hangs when waiting for the write end to be closed. Everything works if there is no redirection going on.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 8 years
    Normally this means the parent has its own fd for the write end of the pipe due to failure to close it after forking.
  • user877329
    user877329 almost 8 years
    It appears that read does not return until the child process has finished.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 8 years
    Did you forget to set the close-on-exec flag on the fd in the child before performing exec?
  • user877329
    user877329 almost 8 years
    No. It is set fcntl(exec_error_write_end,FD_CLOEXEC). The call succeeds, yet it has no effect.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 8 years
    Read the documentation for fcntl. That's not how you use it, and unfortunately the variadic API prevents the compiler from being able to tell you you're using it wrong...
  • frankster
    frankster over 5 years
    This is quite an opinionated answer.