deadlock detection in multithreaded application


Solution 1

Deadlock detection can be done at a higher level. For processes, it can be done at the operating-system level. For threads, the main thread could detect deadlock among its sub-threads, provided the sub-threads do not depend on system-level resources such as I/O.

One way to detect deadlock is to construct a resource allocation graph for the shared resources. Suppose the system has created two threads, Ta and Tb, and both require resources Ra and Rb. If Ta has acquired Ra and is requesting Rb, this can be drawn as a directed graph: Ra->Ta->Rb. If Tb is merely holding Rb, the graph becomes Ra->Ta->Rb->Tb and there is no deadlock. But if Tb is holding Rb and is also requesting Ra, the graph becomes Ra->Ta->Rb->Tb->Ra, which contains a cycle, and that means deadlock.
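The cycle check on such a graph can be sketched as a depth-first search. This is a minimal illustration, not a production detector: the `Graph` alias and the function names are hypothetical, nodes are plain strings, and it assumes single-instance resources such as mutexes, where a cycle does imply deadlock.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: threads and resources are nodes named by strings.
// An edge R -> T means "T holds R"; an edge T -> R means "T is waiting for R".
using Graph = std::map<std::string, std::vector<std::string>>;

// Depth-first search that reports whether a cycle is reachable from `node`.
// `on_path` holds the nodes on the current search path; `done` holds nodes
// whose outgoing edges have already been fully explored.
static bool dfs_finds_cycle(const Graph& g, const std::string& node,
                            std::set<std::string>& on_path,
                            std::set<std::string>& done) {
    if (on_path.count(node)) return true;   // back edge: we returned to our own path
    if (done.count(node)) return false;     // already explored, no cycle through here
    on_path.insert(node);
    auto it = g.find(node);
    if (it != g.end()) {
        for (const auto& next : it->second) {
            if (dfs_finds_cycle(g, next, on_path, done)) return true;
        }
    }
    on_path.erase(node);
    done.insert(node);
    return false;
}

// Returns true if the graph contains any cycle.
bool has_deadlock(const Graph& g) {
    std::set<std::string> on_path, done;
    for (const auto& entry : g) {
        if (dfs_finds_cycle(g, entry.first, on_path, done)) return true;
    }
    return false;
}
```

With the edges Ra->Ta, Ta->Rb and Rb->Tb, `has_deadlock` returns false; adding Tb->Ra closes the loop and it returns true.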

For this specific case, I think it is possible to detect deadlock from logs. As above, suppose Ta logs Ra->Ta->Rb and Tb logs Rb->Tb->Ra; the main thread can then check whether the combined entries form a cycle. This can be complex and expensive, though, so the better approach is to use the synchronization protocol carefully and prevent deadlock in the first place.

Solution 2

Detecting a deadlock is a matter of upfront design. You should not design your application to detect deadlocks; you should strive to eliminate them from the design through review and testing.

Strategies for avoiding/rooting out deadlock in your design

  • static analysis tools (Coverity Prevent, clang's --analyze, etc)
  • testing
  • RAII
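As an illustration of the RAII item, a guard object can release a pthread mutex on every exit path, including exceptions, so a forgotten unlock cannot leave other threads blocked forever. This is a sketch only: `MutexGuard` and `increment_or_throw` are hypothetical names, and in C++11 or later `std::lock_guard` with `std::mutex` plays the same role.

```cpp
#include <pthread.h>
#include <stdexcept>

// Hypothetical RAII wrapper: locks in the constructor, unlocks in the destructor.
class MutexGuard {
public:
    explicit MutexGuard(pthread_mutex_t& m) : m_(m) { pthread_mutex_lock(&m_); }
    ~MutexGuard() { pthread_mutex_unlock(&m_); }
    MutexGuard(const MutexGuard&) = delete;             // a guard owns one lock,
    MutexGuard& operator=(const MutexGuard&) = delete;  // so copying is disabled
private:
    pthread_mutex_t& m_;
};

// Hypothetical critical section: the mutex is released whether the function
// returns normally or leaves through the exception.
int increment_or_throw(pthread_mutex_t& m, int& counter, bool fail) {
    MutexGuard guard(m);
    if (fail) throw std::runtime_error("simulated failure");
    return ++counter;
}
```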

Once you've hit a deadlock, it is typically painfully easy to investigate. Unlike heap corruption, which can be tricky to track back to its origin, a deadlocked system will remain in the bad state until you come along to rescue it. Attach your debugger and look at the stack traces of all threads, and the problem should be apparent.

Author by

Dr. Debasish Jana


Updated on July 26, 2022

Comments

  • Dr. Debasish Jana almost 2 years

    I have a multithreaded C++ application that uses POSIX threads, i.e. threads are created through pthread_create. Several semaphores control the synchronization. They are primarily of two kinds:

    1. Mutex semaphores - critical-section code runs under mutual exclusion between pthread_mutex_lock and pthread_mutex_unlock calls (made by the same thread).

    2. Synchronization semaphores - one thread waits for some task to be done by calling sem_wait; another thread, upon completing the task, signals through sem_post.

    The application seems to become unresponsive after running for a long time while processing a large volume of data.

    Two possibilities that come to my mind:

    1. There is a deadlock: two threads are waiting on one another (cyclic waiting).
    2. The parent process (telnet session) times out, which sends a SIGHUP signal to the child process.

    Questions:

    1. Can I detect deadlock through a log? For example, on every wait/post: log "going to wait" before sem_wait, log "waiting over" after sem_wait returns, and log "signal/post for waiting thread" after sem_post (which is non-blocking anyway). Then, if the program appears to hang, check the log for any cyclic waiting. Is there a better way?
    2. Should I run the application using nohup so that SIGHUP is ignored?
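The logging idea in question 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: the wrapper names and log format are hypothetical, and it swaps sem_wait for sem_timedwait so that a suspected hang is itself logged instead of blocking silently.

```cpp
#include <semaphore.h>
#include <cerrno>
#include <cstdio>
#include <ctime>

// Hypothetical wrapper: logs before and after the wait, and logs a timeout
// if the semaphore is not posted within timeout_sec seconds.
// Returns true if the semaphore was acquired.
bool logged_sem_wait(sem_t* sem, const char* name, int timeout_sec) {
    std::fprintf(stderr, "[wait-begin] %s\n", name);
    timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  // sem_timedwait takes an absolute time
    deadline.tv_sec += timeout_sec;
    int rc;
    do {
        rc = sem_timedwait(sem, &deadline);
    } while (rc == -1 && errno == EINTR);      // retry if interrupted by a signal
    if (rc == 0) {
        std::fprintf(stderr, "[wait-end] %s\n", name);
        return true;
    }
    std::fprintf(stderr, "[wait-timeout] %s (errno=%d)\n", name, errno);
    return false;
}

// Hypothetical wrapper: posts the semaphore, then logs that it did so.
void logged_sem_post(sem_t* sem, const char* name) {
    sem_post(sem);
    std::fprintf(stderr, "[post] %s\n", name);
}
```

A `[wait-begin]` entry with no matching `[wait-end]` points at the semaphore a thread is stuck on; adding a per-thread label to each line would make it possible to reconstruct who waits on whom.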

    Any other suggestions?
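As an in-process counterpart to the nohup idea in question 2, the application could ignore SIGHUP itself at startup. A minimal sketch, assuming a POSIX system; `ignore_sighup` is a hypothetical helper name:

```cpp
#include <signal.h>

// Hypothetical helper: ask the kernel to discard SIGHUP for this process.
// Returns true if the disposition was installed successfully.
bool ignore_sighup() {
    struct sigaction sa = {};   // zero-initialize all fields
    sa.sa_handler = SIG_IGN;    // ignore the signal instead of terminating
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, nullptr) == 0;
}
```

Calling this once early in main, before any threads are created, would keep a telnet timeout from killing the process, which helps separate the SIGHUP hypothesis from the deadlock hypothesis.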