Mutex in shared memory when one user crashes?


Solution 1

If you're working in Linux or something similar, consider using named semaphores instead of (what I assume are) pthreads mutexes. I don't think there is a way to determine the locking PID of a pthreads mutex, short of building your own registration table and also putting it in shared memory.

Solution 2

It seems that the exact answer has been provided in the form of robust mutexes.

According to POSIX, pthread mutexes can be initialised as "robust" using pthread_mutexattr_setrobust(). If a process dies while holding such a mutex, the next thread to lock it receives EOWNERDEAD (but still acquires the mutex successfully), so it knows it has to clean up whatever state the mutex protects. It must then mark the mutex as consistent again by calling pthread_mutex_consistent() before unlocking it.

Obviously you need both kernel and libc support for this to work. On Linux the kernel support behind this is called "robust futexes", and I've found references to userspace updates being applied to glibc HEAD.

In practice, support for this doesn't seem to have filtered down yet, in the Linux world at least. If these functions aren't available, you might find pthread_mutexattr_setrobust_np() there instead, which as far as I can gather appears to be a non-POSIX predecessor providing the same semantics. I've found references to pthread_mutexattr_setrobust_np() both in Solaris documentation and in /usr/include/pthread.h on Debian.

The POSIX spec can be found here: http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_setrobust.html

Solution 3

How about file-based locking (using flock(2))? Such a lock is automatically released when the process holding it dies.

Demo program:

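#include <stdio.h>
#include <time.h>
#include <unistd.h>     /* getpid(), sleep() */
#include <sys/file.h>   /* flock() */

int main(void) {
  FILE *f = fopen("testfile", "w+");
  if (f == NULL) {
    perror("fopen");
    return 1;
  }

  printf("pid=%ld time=%ld Getting lock\n", (long)getpid(), (long)time(NULL));
  flock(fileno(f), LOCK_EX);   /* blocks until the lock is free */
  printf("pid=%ld time=%ld Got lock\n", (long)getpid(), (long)time(NULL));

  sleep(5);
  printf("pid=%ld time=%ld Crashing\n", (long)getpid(), (long)time(NULL));
  *(volatile int *)NULL = 1;   /* deliberate segfault while holding the lock */
  return 0;
}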

Output (I've truncated the PIDs and times a bit for clarity):

$ ./a.out & sleep 2 ; ./a.out 
[1] 15
pid=15 time=137 Getting lock
pid=15 time=137 Got lock
pid=17 time=139 Getting lock
pid=15 time=142 Crashing
pid=17 time=142 Got lock
pid=17 time=147 Crashing
[1]+  Segmentation fault      ./a.out
Segmentation fault

What happens is that the first program acquires the lock and starts to sleep for 5 seconds. After 2 seconds, a second instance of the program is started which blocks while trying to acquire the lock. 3 seconds later, the first program segfaults (bash doesn't tell you this until later though) and immediately, the second program gets the lock and continues.
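If you would rather check this from a single program than from two shell invocations, something along these lines might do. It is only a sketch: lock_file_acquire() and demo_flock_released_on_death() are made-up names, the lock file path is whatever you pass in, and the crash is simulated with SIGKILL. Note that the child opens the file itself after fork(); a descriptor inherited from the parent would share the same lock.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) the lock file and take an
 * exclusive flock() on it. Returns the descriptor holding the lock, or -1. */
static int lock_file_acquire(const char *path, int nonblocking)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | (nonblocking ? LOCK_NB : 0)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Demo: a child takes the lock and is killed while holding it.
 * Returns 1 if the lock was busy while the child lived and became available
 * after it died, 0 if not, -1 if the setup failed. */
int demo_flock_released_on_death(const char *path)
{
    int ready[2];
    if (pipe(ready) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    if (pid == 0) {
        close(ready[0]);
        int fd = lock_file_acquire(path, 0);
        char ok = (fd >= 0);
        if (write(ready[1], &ok, 1) != 1)
            _exit(1);
        pause();                /* hold the lock until we are killed */
        _exit(0);
    }

    close(ready[1]);
    char ok = 0;
    ssize_t n = read(ready[0], &ok, 1);   /* wait until the child has the lock */
    close(ready[0]);
    if (n != 1 || !ok) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    /* While the child is alive, a non-blocking attempt should fail. */
    int fd = lock_file_acquire(path, 1);
    int busy_while_alive = (fd < 0);
    if (fd >= 0)
        close(fd);

    kill(pid, SIGKILL);         /* simulated crash */
    waitpid(pid, NULL, 0);

    /* The kernel closed the child's descriptor, which drops its lock. */
    fd = lock_file_acquire(path, 1);
    int free_after_death = (fd >= 0);
    if (fd >= 0)
        close(fd);              /* closing the descriptor releases our lock */
    unlink(path);

    return busy_while_alive && free_after_death;
}
```

The pipe is only there so the parent knows the child really holds the lock before killing it.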

Author: Vivek

Updated on June 28, 2022

Comments

  • Vivek
    Vivek almost 2 years

    Suppose a process creates a mutex in shared memory, locks it, and then dumps core while still holding the lock.

    How can another process detect that the mutex is locked but no longer owned by any live process?

  • Duck
    Duck over 14 years
    Not in all resources. If OP uses the POSIX semaphore as suggested and the process holding the lock dies the value of the semaphore will not revert, potentially deadlocking the other processes.
  • Duck
    Duck over 14 years
    Agree in general with the semaphore recommendation, but POSIX semaphores don't really solve the problem, since they neither record the PID of the locking process nor unlock upon its untimely death. Rusty and clumsy though they may be, SysV semaphores do keep track of PIDs and can revert when called with the SEM_UNDO option.
  • Vivek
    Vivek over 14 years
    I don't think that will be released either; whether it's a file or memory, it's the same thing.
  • Wim
    Wim over 14 years
    I don't mean by writing something inside the file (which would indeed be similar), but to use flock(2). When your process dies, the file will be closed automatically, and the lock on it should be released.
  • Joseph Garvin
    Joseph Garvin over 13 years
    I think this is a better answer. I've been using the robust mutex on Solaris so far with success.
  • Jonathan Wakely
    Jonathan Wakely over 11 years
    Robust mutexes are great, but be aware they may not work correctly on GNU/Linux prior to glibc 2.15 if the mutex was created in a parent process which then forks and the child dies while holding the mutex. That bug is fixed in glibc 2.15. If the two processes sharing the mutex are not a parent and child created by forking then robust mutexes work fine even with older glibc versions.
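For reference, the SysV SEM_UNDO behaviour Duck mentions in the comments can be sketched roughly as follows, assuming Linux. The names semun_arg, sem_create_binary(), sem_lock_undo() and demo_sem_undo() are invented for this example, and the dying process is again a forked child that exits without releasing.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* On Linux the caller has to define the semctl() argument union itself. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: create a private one-semaphore set with value 1,
 * i.e. an unlocked binary semaphore. Returns the set id, or -1. */
static int sem_create_binary(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    union semun_arg arg = { .val = 1 };
    if (semctl(id, 0, SETVAL, arg) != 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Hypothetical helper: take the semaphore. SEM_UNDO asks the kernel to
 * reverse this decrement if the process exits without releasing it. */
static int sem_lock_undo(int id)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
    return semop(id, &op, 1);
}

/* Demo: a child takes the semaphore and exits without giving it back.
 * Returns the semaphore value seen by the parent afterwards, or -1 if the
 * setup failed. A value of 1 means the kernel undid the child's decrement. */
int demo_sem_undo(void)
{
    int id = sem_create_binary();
    if (id < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    if (pid == 0) {
        if (sem_lock_undo(id) != 0)
            _exit(1);
        _exit(0);               /* exit while still "holding" the semaphore */
    }
    waitpid(pid, NULL, 0);

    int value = semctl(id, 0, GETVAL);
    semctl(id, 0, IPC_RMID);    /* remove the set so it does not linger */
    return value;
}
```

semctl() with GETPID reports the PID of the last process that operated on the semaphore, which is the PID tracking the comment refers to. Unlike a robust mutex, nothing here tells the next locker that the previous owner died, so the protected data may still be in a half-updated state.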