How to know where a program is stuck in Linux?


Solution 1

My first step would be to run strace on the process, ideally with

 strace -s 99 -ffp 12345

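if your process ID is 12345. This shows every system call the program makes: -s 99 prints up to 99 characters of each string argument, and -ff follows all threads and child processes. "How to strace a process" tells you more.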
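If the program only hangs now and then, it can help to let strace run unattended for a fixed time and keep one trace file per thread. A minimal sketch, assuming GNU coreutils timeout; strace_for is a hypothetical helper name, not something strace provides:

```shell
# strace_for PID [SECONDS] [PREFIX]  (hypothetical helper)
# Attach strace to PID and all of its threads for SECONDS (default 10).
# With -ff together with -o, strace writes each thread's trace to PREFIX.<tid>.
# SIGINT makes strace detach cleanly; the traced process keeps running.
strace_for() {
  timeout -s INT "${2:-10}" strace -s 99 -ff -o "${3:-/tmp/strace.$1}" -p "$1"
}
```

For example, strace_for 12345 30 /tmp/hang traces for 30 seconds into /tmp/hang.12345 plus one file per additional thread. The first lines of each file typically show what that thread was doing when strace attached.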

If you insist on getting a stack trace, Google tells me the equivalent is pstack. But since I do not have it installed, I use gdb:

 tweedleburg:~ # sleep 3600 &
 [2] 2621
 tweedleburg:~ # gdb
 (gdb) attach 2621
 (gdb) bt
 #0  0x00007feda374e6b0 in __nanosleep_nocancel () from /lib64/libc.so.6
 #1  0x0000000000403ee7 in ?? ()
 #2  0x0000000000403d70 in ?? ()
 #3  0x000000000040185d in ?? ()
 #4  0x00007feda36b8b05 in __libc_start_main () from /lib64/libc.so.6
 #5  0x0000000000401969 in ?? ()
 (gdb)

Solution 2

Two other answers explain how to get the user-space stack trace of a program (remember to install debugging symbols first, otherwise frames show up as ?? like in the gdb output above). If you want to find out where a system call got stuck, examine /proc/PID/stack, which lists the kernel stack. Example:

$ cat /proc/self/stack
[<ffffffff81012b72>] save_stack_trace_tsk+0x22/0x40
[<ffffffff81213abe>] proc_pid_stack+0x8e/0xe0
[<ffffffff81214960>] proc_single_show+0x50/0x90
[<ffffffff811cd970>] seq_read+0xe0/0x3e0
[<ffffffff811a6a84>] vfs_read+0x94/0x180
[<ffffffff811a7729>] SyS_read+0x49/0xb0
[<ffffffff81623ad2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
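The example above reads the kernel stack of cat itself. For a stuck program you read the same file under its PID, and since each thread has its own kernel stack, a multi-threaded program is best inspected per thread under /proc/PID/task/TID/. A minimal sketch that walks every thread and also prints its state and wchan (the kernel function it is waiting in); kstacks is a hypothetical helper name:

```shell
# kstacks PID  (hypothetical helper)
# For every thread of PID, print its TID, state, wchan and kernel stack.
kstacks() {
  [ -d "/proc/$1" ] || { echo "no such process: $1" >&2; return 1; }
  for task in /proc/"$1"/task/[0-9]*; do
    tid=${task##*/}
    state=$(cat "$task/stat" 2>/dev/null) || continue  # thread exited meanwhile
    # the state letter follows the last ") " in stat (comm may contain spaces)
    state=${state##*') '}
    state=${state%% *}
    wchan=$(cat "$task/wchan" 2>/dev/null)
    echo "== TID $tid state=$state wchan=${wchan:-?}"
    # stack usually needs root and a kernel built with CONFIG_STACKTRACE
    cat "$task/stack" 2>/dev/null || echo "   (kernel stack not readable)"
  done
}
```

Run it as root for useful output, since the stack files are usually readable only by the superuser (see the comments below).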

Solution 3

On most Unix systems, you can use GDB:

gdb -batch -ex bt -p 1234
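bt only covers the thread gdb happens to select. For a multi-threaded program, gdb's thread apply all bt prints a backtrace for every thread. A minimal sketch wrapping that in a shell function; gdb_all is a hypothetical name:

```shell
# gdb_all PID  (hypothetical helper)
# Attach to PID, print a backtrace of every thread, then detach:
# -batch makes gdb exit after running the -ex commands, which detaches from PID.
gdb_all() {
  gdb -batch -ex 'thread apply all bt' -p "$1"
}
```

Taking two such dumps a few seconds apart and comparing them is the same idea as the eu-stack snapshots in Solution 4.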

There's also pstack (not a standard utility; you'll probably have to install it manually). It looks like the equivalent of AIX's procstack. But on my Debian wheezy amd64 it always seems to error out. On i386, for a program compiled without debugging symbols, it doesn't print any symbols, not even from libraries for which debugging symbols are available.

You can also use strace -p1234 to see the system calls performed by the process.

Solution 4

You want eu-stack from elfutils. For example,

$ sudo eu-stack -id -p $$
PID 9189 - process
TID 9189:
#0  0x00007fd36c69e687 __GI___waitpid
#1  0x000055ba004c0c19
#2  0x000055ba004c234b wait_for
#3  0x000055ba004b1b64 execute_command_internal
#4  0x000055ba004b1bf2 execute_command
#5  0x000055ba0049c274 reader_loop
#6  0x000055ba0049ac7f main
#7  0x00007fd36c5dbb97 __libc_start_main
#8  0x000055ba0049b54a _start

You can write one-liners for more complex tasks. For example, to list the stacks of all threads of Google Chrome, you can run

sudo true
pidof chrome | sudo xargs -rn1 timeout 1s eu-stack -id -p

The timeout prevents eu-stack from stalling on processes in uninterruptible sleep. Set a longer timeout if you are willing to wait longer for a stack.
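To see which processes are in uninterruptible sleep in the first place, you can look for threads whose state is D. A minimal sketch in plain sh that reads /proc directly rather than relying on ps; dstate is a hypothetical helper name, and wchan may read as 0 or be hidden depending on kernel version and privileges:

```shell
# dstate  (hypothetical helper)
# Print TID, command name and wchan of every thread in state D.
dstate() {
  for f in /proc/[0-9]*/task/[0-9]*/stat; do
    line=$(cat "$f" 2>/dev/null) || continue  # thread may have exited
    rest=${line##*') '}                       # drop "tid (comm) "
    [ "${rest%% *}" = D ] || continue
    comm=${line#*'('}
    comm=${comm%')'*}
    wchan=$(cat "${f%/stat}/wchan" 2>/dev/null)
    printf '%s\t%s\t%s\n' "${line%% *}" "$comm" "${wchan:-?}"
  done
}
```

Threads that keep showing up across several runs are good candidates for the stuck ones; their kernel stacks are in /proc/PID/task/TID/stack, as described in Solution 2.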

And if you have a busy system where only some of the threads or processes are stuck, you can take snapshots of the current process states like this

sudo true
pidof apache2 | sort -n | sudo xargs -rn1 timeout 1s eu-stack -id -p > snap1
sleep 10s
pidof apache2 | sort -n | sudo xargs -rn1 timeout 1s eu-stack -id -p > snap2
diff -u snap1 snap2

and then identify stacks that did not change during those 10 seconds. If you get errors such as

eu-stack: dwfl_linux_proc_report pid 25062: No such file or directory
eu-stack: dwfl_linux_proc_report pid 25068: Exec format error

it just means that process 25062 exited after pidof listed it but before eu-stack could query its state, and that process 25068 was being torn down while its stack trace was being built, so building the stack failed while reading the executable that the proc filesystem points to.

Solution 5

pstack prints a stack trace of a running process. gstack is a common equivalent if pstack is not available or does not support your distribution or architecture.
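Since which of these tools is installed (and works) varies between distributions, a small wrapper can try them in turn. A minimal sketch; the order is an assumption that prefers eu-stack because the comments below report pstack failing on amd64, and stack_of is a hypothetical name:

```shell
# stack_of PID  (hypothetical helper)
# Print user-space stacks of PID with the first tool found on PATH.
stack_of() {
  if command -v eu-stack >/dev/null 2>&1; then
    eu-stack -p "$1"
  elif command -v gstack >/dev/null 2>&1; then
    gstack "$1"
  elif command -v pstack >/dev/null 2>&1; then
    pstack "$1"
  elif command -v gdb >/dev/null 2>&1; then
    gdb -batch -ex 'thread apply all bt' -p "$1"
  else
    echo "stack_of: none of eu-stack, gstack, pstack or gdb found" >&2
    return 127
  fi
}
```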


Author: Pavan Manjunath (Software Engineer @ Lyft, SF Bay Area)

Updated on September 18, 2022

Comments

  • Pavan Manjunath (over 1 year)

    I am running the following command on my Ubuntu server:

    root@slot13:~# lxc-stop --name pavan --logfile=test1.txt --logpriority=trace
    

    It seems to hang indefinitely. Whenever this happened on AIX, I would simply get the PID of the offending process and run

    $ procstack <pid_of_stuck_process>
    

    and it would show the whole call stack of the process. Is there any equivalent of procstack in Linux/Ubuntu?

  • Pavan Manjunath (over 9 years)
    pstack seems to throw errors every time, something like:

    root@sarai:~# pstack 6312
    6312: /usr/bin/python /usr/bin/twistd -n --uid=maas --gid=maas --pidfile=/run/maas-pserv.pid --logfile=/dev/null maas-pserv --conf...
    (No symbols found)
    crawl: Input/output error
    Error tracing through process 6312
  • Gilles 'SO- stop being evil' (over 9 years)
    @PavanManjunath pstack seems to be broken on amd64; I observe the same on Debian wheezy amd64.
  • DocSalvager (over 9 years)
    NOTE: per man7.org/linux/man-pages/man5/proc.5.html ... "/proc/[pid]/stack (since Linux 2.6.29) This file provides a symbolic trace of the function calls in this process's kernel stack. This file is provided only if the kernel was built with the CONFIG_STACKTRACE configuration option."
  • Stéphane Chazelas (over 9 years)
    pstack's README does say it's only for 32-bit, ELF, x86, GNU.
  • Stéphane Chazelas (over 9 years)
    Also note that access to that file is limited to the superuser or the process itself (at least on the systems I've tried).
  • Stéphane Chazelas (over 9 years)
    See also /proc/pid/wchan and the WCHAN column in ps -l or ps -o wchan output for the top of that stack. (The ps part works on many Unices, but is often not very useful by itself.)
  • Hi-Angel (almost 4 years)
    Not gonna help if the program hung in a system call. The answer by Lekensteyn works.
  • Mikko Rantalainen (about 3 years)
    @Aktau: Do you know if the process was entering uninterruptible sleep often but didn't actually stay there for very long? In my experience you cannot get a stack for a process in true uninterruptible sleep, at least with Linux kernel 4.15.x, using the method I describe above.
  • Aktau (about 3 years)
    I was on Linux 5.7. I'm "reasonably" sure it wasn't flipping around in states because of the nature of the stack trace that I uncovered and the code I then read. There is no retry loop around that and it made total sense why it would be in uninterruptible sleep in exactly that path.