What to do if a POSIX close call fails?

c linux unix error-handling posix

12,120

Solution 1

In practice, close should never be retried on error, and the fd you passed to close is always invalid (closed) after close returns, regardless of whether an error occurred. In some cases, an error may indicate that data was lost (certain NFS setups) or unusual hardware conditions for devices (e.g. tape could not be rewound), so you may want to be cautious to avoid data loss, but you should never attempt to close the fd again.
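The rule above (close once, keep the error for diagnostics, never retry) can be sketched as a small wrapper. This is only an illustration: `close_once` is a hypothetical name, not part of any library, and printing to standard error is an assumption about how the program reports problems.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: close *fd exactly once and forget the number.
 * Returns 0 on success, or the errno value reported by close().
 * The descriptor is treated as gone in both cases, so it is never retried.
 */
static int close_once(int *fd)
{
    assert(fd != NULL);
    if (*fd < 0)
        return 0;               /* nothing to close */

    int err = 0;
    if (close(*fd) == -1)
        err = errno;            /* keep the error for diagnostics only */

    *fd = -1;                   /* prevent any later double close */

    if (err != 0)
        fprintf(stderr, "close failed: %s\n", strerror(err));
    return err;
}
```

Overwriting the stored descriptor with -1 makes an accidental second call harmless, which is the property that matters in multithreaded programs.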

In theory, POSIX was unclear in the past as to whether the fd remains open when close fails with EINTR, and systems disagreed. Since it's important to know the state (otherwise you have either fd leaks or double-close bugs which are extremely dangerous in multithreaded programs), the resolution to Austin Group issue #529 specified the behavior strictly for future versions of POSIX, that EINTR means the fd remains open. This is the right behavior consistent with the definition of EINTR elsewhere, but Linux refuses to accept it. (FWIW there's an easy workaround for this that's possible at the libc syscall wrapper level; see glibc PR #14627.) Fortunately it never arises in practice anyway.

Solution 2

First of all, EINTR means exactly what it says: the system call was interrupted. If this happens on a close() call, there is nothing you can do about it.

Apart from keeping track of the fact that, if the fd belonged to a file, that file is possibly corrupt, there is not much you can do about errors on close() at all, whatever the return value. AFAIK the only case where a close can be retried is EBUSY, but I have yet to see that.

So:

- Not checking the result of close() might mean that you miss file corruption, especially truncation.
- Depending on the error, most of the time you can do nothing: a failed close() just means something has gone awfully wrong outside the scope of your application.
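The two points above can be sketched as a save routine that treats a failed close() the same way as a failed write(). The name `save_buffer` is hypothetical, and the `fsync()` call is an addition not mentioned in the answer: it is assumed here as a way to surface deferred I/O errors before the descriptor is closed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path and report any failure,
 * including one from close(), as a failed save.
 * Returns 0 on success, -1 on failure (a message goes to stderr).
 */
static int save_buffer(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        fprintf(stderr, "%s: open: %s\n", path, strerror(errno));
        return -1;
    }

    int failed = 0;
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* retrying write() is safe, unlike close() */
            fprintf(stderr, "%s: write: %s\n", path, strerror(errno));
            failed = 1;
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Assumption: fsync() is used to surface deferred I/O errors early. */
    if (!failed && fsync(fd) == -1) {
        fprintf(stderr, "%s: fsync: %s\n", path, strerror(errno));
        failed = 1;
    }

    /* Single close(); the descriptor is gone afterwards, error or not. */
    if (close(fd) == -1) {
        fprintf(stderr, "%s: close: %s\n", path, strerror(errno));
        failed = 1;
    }
    return failed ? -1 : 0;
}
```

A caller that gets -1 back can then try to write the data somewhere else, exactly as it would after a write error.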

Ilya Popov

Updated on June 07, 2022

Comments

Ilya Popov almost 2 years

On my system (Ubuntu Linux, glibc), the man page for close lists several error values the call can return. It also says

Not checking the return value of close() is a common but nevertheless serious programming error.

and at the same time

Note that the return value should only be used for diagnostics. In particular close() should not be retried after an EINTR since this may cause a reused descriptor from another thread to be closed.

So I am allowed neither to ignore the return value nor to retry the call.

Given that, how shall I handle the close() call failure?

If the error happened when I was writing something to the file, I am probably supposed to try to write the information somewhere else to avoid the data loss.

If I was only reading the file, can I just log the failure and continue the program, pretending nothing happened? Are there any caveats, such as leaked file descriptors?
- PSkocik over 8 years
  
  Thought about this too. (unix.stackexchange.com/questions/231677/…) Close failures make sense in certain cases (e.g., faulty disk syncs), but I think it should be safe to assume close won't fail in some other cases, such as closing a duplicated file descriptor that isn't the last one pointing to the same physical file, or closing a pipe, because those failures would basically be kernel bugs. I would love to hear a more enlightened answer, though.
- Michael Burr over 8 years
  
  FWIW, Raymond Chen's take on this general type of situation: blogs.msdn.com/b/oldnewthing/archive/2008/01/07/7011066.aspx
- Nominal Animal over 8 years
  
  Whatever you do, always let the user know. Just "logging" it into some internal log file nobody ever looks at is not enough; you'll want the user to know that something hinky is happening. For GUI applications, I'd pop up a modal dialog box. For command line applications, I'd print a warning to standard error. For services, the log file suffices. If close() error happens after writing to a file, I'd abort exactly the same way I would if I encountered a write error during writing to the file.
Luis Colorado over 8 years

So if the fd remains open on EINTR, does it continue to be an error to try to close it again? What the hell should we do in that case? Closing an invalid descriptor should return EBADF, so it's not a problem to try again (despite what the manual says about other threads opening file descriptors without your knowledge; what should happen if another thread opens a descriptor while you are in the middle of an I/O redirection? That is a two-phase procedure). Hmmm... aren't we forcing the machine too much?
R.. GitHub STOP HELPING ICE over 8 years

@LuisColorado: Per the (amended) standard, on EINTR the fd remains open. However, Linux does not honor this and glibc does not work around the failure to honor it. See the links in my answer. Fortunately EINTR does not happen on close on Linux in any real-world situations I'm aware of, anyway.
Luis Colorado over 8 years

EINTR means the syscall was interrupted and will not be retried, so it was not executed at all. The call did not execute, so the file descriptor is not closed and must still be closed. What about the atomicity of system calls? If the system call was not performed and we cannot close the descriptor again, what happens when this process repeats many times, leaking one descriptor each time? Normally, the implementation of calls that block (like close(), though not for normal files) just undoes what has been done and does a longjmp(3) to the saved context, precisely to preserve atomicity.
Luis Colorado over 8 years

EINTR can happen on close(2) on NFS filesystems, on network connections (well, on TCP/IP sockets the kernel actually does the work, but I'm not sure about other protocols), as well as on every device that needs handshaking to be closed (on the last close, depending on when the device driver returns from close). And Linux is not the only POSIX system that exists.
Eugen Rieck over 8 years

@LuisColorado Retrying close() after EINTR is not a good idea on Linux. It might close a different fd.
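The hazard described in the comment above can be reproduced in a single thread, because open() hands out the lowest free descriptor number. The function below is a hypothetical demonstration only: the second open() stands in for another thread opening a file between the failed close() and the retry, and `/dev/null` is used merely as a file that is assumed to exist.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical demonstration: returns 1 if a second close() on a stale
 * descriptor number ended up closing an unrelated, newly opened file.
 */
static int stale_close_hits_new_file(void)
{
    int a = open("/dev/null", O_RDONLY);
    assert(a != -1);
    close(a);                       /* first close: the number a is free again */

    /* Stand-in for another thread opening a file in the meantime. */
    int b = open("/dev/null", O_RDONLY);
    assert(b != -1);
    int reused = (b == a);          /* the lowest free number is handed out */

    close(a);                       /* the mistaken "retry" */

    /* If the retry closed b, querying its flags fails with EBADF. */
    int b_closed = (fcntl(b, F_GETFD) == -1);
    if (!b_closed)
        close(b);
    return reused && b_closed;
}
```

In a real multithreaded program the victim would be whatever file, socket, or pipe the other thread had just opened.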
R.. GitHub STOP HELPING ICE over 8 years

@LuisColorado: At least on Linux, the release file op cannot cause EINTR (its return value is ignored), but flush can. See the question I linked. This probably prevents it from happening in practice, but I may be mistaken. I agree there are other systems to worry about. Unless you have posix_close available (added for POSIX-future) the only fully-safe thing to do is to mask all interrupting signals whenever you call close... :-(
Luis Colorado over 8 years

Device driver writers are always warned about the close primitive, as it must do its work independently of hardware conditions and leave a stable environment. This means device driver writers must sometimes leave resources locked in memory while waiting for the device to respond, or mark it as unusable until some condition is met, even though the process disconnected long ago. Keep in mind that device drivers aren't normally written by the same people who write the operating system. I'm not talking specifically about Linux but about POSIX, which means a lot of different systems.
Luis Colorado over 8 years

Then close(2)ing and dup(2)ing to redirect output is not a good idea on Linux either, as the hole can be filled by another thread. I understood your message the first time. Atomicity is a hard problem indeed, but don't think that you are telling the truth and I am not. So far, nobody has written a thread-safe way to redirect a file descriptor, and the whole system continues to work without pain. Perhaps treating close(2) as not thread-safe and locking the context for the whole process would allow you to retry a close(2) call until you are safe. Or report the leak.
Luis Colorado over 8 years

And consider that if the result of close(2) means that you have not closed the descriptor, it cannot have moved elsewhere. You will certainly close the same file descriptor because, as it has not been closed, its slot cannot be filled by a different descriptor (even in multithreaded environments). What is indeed an error is to call close(2) again without having checked the return code of the first call, assuming the descriptor is not closed, and redoing the syscall on a possibly closed and reopened file descriptor.