How to know where a program is stuck in Linux?
Solution 1
My first step would be to run strace on the process, ideally as
strace -s 99 -ffp 12345
if your process ID is 12345. This shows every system call the program makes. "How to strace a process" tells you more.
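A stuck process often sits in a single system call, so an attached strace can wait forever without printing anything new. The following is a minimal sketch, not part of the original answer, that wraps the command above in a time limit and writes the trace to files. The helper name trace_pid and the strace.PID output prefix are made up for illustration, and it assumes GNU timeout is available.

```shell
# trace_pid PID [SECONDS]: attach strace to PID for a limited time.
# Hypothetical helper; the name and output file prefix are assumptions.
trace_pid() {
    pid=$1
    secs=${2:-10}
    # Accept only a plain decimal PID.
    case $pid in
        ''|*[!0-9]*) echo "usage: trace_pid PID [SECONDS]" >&2; return 2 ;;
    esac
    # /proc/PID exists only while the process is alive.
    if [ ! -d "/proc/$pid" ]; then
        echo "trace_pid: no such process: $pid" >&2
        return 1
    fi
    if ! command -v strace >/dev/null 2>&1; then
        echo "trace_pid: strace not found" >&2
        return 127
    fi
    # -s 99: show up to 99 bytes of each string argument
    # -ff with -o: follow threads/children, one output file per traced PID
    timeout "${secs}s" strace -s 99 -ff -o "strace.$pid" -p "$pid"
}
```

Calling trace_pid 12345 5 would trace for about five seconds. The last line of each output file is then a reasonable first guess at the call the process is blocked in.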
If you insist on getting a stack trace, Google tells me the equivalent is pstack. As I do not have it installed, I use gdb:
tweedleburg:~ # sleep 3600 &
[2] 2621
tweedleburg:~ # gdb
(gdb) attach 2621
(gdb) bt
#0 0x00007feda374e6b0 in __nanosleep_nocancel () from /lib64/libc.so.6
#1 0x0000000000403ee7 in ?? ()
#2 0x0000000000403d70 in ?? ()
#3 0x000000000040185d in ?? ()
#4 0x00007feda36b8b05 in __libc_start_main () from /lib64/libc.so.6
#5 0x0000000000401969 in ?? ()
(gdb)
Solution 2
Two answers have been given for finding the stack trace of a program (remember to install debugging symbols first!). If you want to find out where a system call got stuck, examine /proc/PID/stack, which lists the kernel stack. Example:
$ cat /proc/self/stack
[<ffffffff81012b72>] save_stack_trace_tsk+0x22/0x40
[<ffffffff81213abe>] proc_pid_stack+0x8e/0xe0
[<ffffffff81214960>] proc_single_show+0x50/0x90
[<ffffffff811cd970>] seq_read+0xe0/0x3e0
[<ffffffff811a6a84>] vfs_read+0x94/0x180
[<ffffffff811a7729>] SyS_read+0x49/0xb0
[<ffffffff81623ad2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
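As the comments below point out, this file is usually readable only by root and exists only on kernels built with CONFIG_STACKTRACE. A small sketch, with the made-up helper name kernel_where, that falls back to /proc/PID/wchan when the stack cannot be read:

```shell
# kernel_where [PID]: print the kernel stack of PID, or its wait channel
# if the stack file cannot be read. Hypothetical helper name.
kernel_where() {
    pid=${1:-self}
    if stack=$(cat "/proc/$pid/stack" 2>/dev/null) && [ -n "$stack" ]; then
        printf '%s\n' "$stack"
    elif wchan=$(cat "/proc/$pid/wchan" 2>/dev/null) && [ -n "$wchan" ]; then
        # wchan names only the innermost kernel function the task sleeps in
        printf 'wchan: %s\n' "$wchan"
    else
        echo "kernel_where: cannot read kernel state of $pid" >&2
        return 1
    fi
}
```

Run as root, kernel_where 12345 should print the full kernel stack. As an ordinary user it will most likely print only the wchan line, which on many kernels is just 0 for a task that is not sleeping.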
Solution 3
On most unix systems, you can use GDB.
gdb -batch -ex bt -p 1234
There's also pstack (not a standard utility, you'll probably have to install it manually). It looks like an equivalent of AIX's procstack. But on my Debian wheezy amd64, it seems to always error out. On i386, for a program compiled without debugging symbols, it doesn't print any symbol, not even from libraries for which debugging symbols are available.
You can also use strace -p1234 to see the system calls performed by the process.
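For a multi-threaded program, the gdb -batch form above prints only the current thread. A hedged sketch that asks gdb for every thread with its thread apply all bt command; the wrapper name all_thread_bt is invented here, and the usual ptrace permissions (same user or root) are assumed:

```shell
# all_thread_bt PID: print a backtrace for every thread of PID via gdb.
# Hypothetical helper name; needs permission to ptrace the target.
all_thread_bt() {
    pid=$1
    # Accept only a plain decimal PID.
    case $pid in
        ''|*[!0-9]*) echo "usage: all_thread_bt PID" >&2; return 2 ;;
    esac
    if [ ! -d "/proc/$pid" ]; then
        echo "all_thread_bt: no such process: $pid" >&2
        return 1
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "all_thread_bt: gdb not found" >&2
        return 127
    fi
    # -batch makes gdb exit after the -ex commands, which also detaches
    # from the process so that it keeps running.
    gdb -batch -ex "thread apply all bt" -p "$pid"
}
```

The target is paused while gdb is attached, so expect a short stall in the traced program.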
Solution 4
You want eu-stack from elfutils. For example,
$ sudo eu-stack -id -p $$
PID 9189 - process
TID 9189:
#0 0x00007fd36c69e687 __GI___waitpid
#1 0x000055ba004c0c19
#2 0x000055ba004c234b wait_for
#3 0x000055ba004b1b64 execute_command_internal
#4 0x000055ba004b1bf2 execute_command
#5 0x000055ba0049c274 reader_loop
#6 0x000055ba0049ac7f main
#7 0x00007fd36c5dbb97 __libc_start_main
#8 0x000055ba0049b54a _start
One can write one-liners for more complex stuff. For example, to list all thread stacks for Google Chrome one can do
sudo true
pidof chrome | sudo xargs -rn1 timeout 1s eu-stack -id -p
The timeout is used to prevent stalling on processes in uninterruptible sleep. Set a longer timeout if you're willing to wait longer to get a stack.
And if you have a busy system where only some of the threads or processes are locked, you can take snapshots of the current process states like this
sudo true
pidof apache2 | sort -n | sudo xargs -rn1 timeout 1s eu-stack -id -p > snap1
sleep 10s
pidof apache2 | sort -n | sudo xargs -rn1 timeout 1s eu-stack -id -p > snap2
diff -u snap1 snap2
and you can identify stacks that did not change during 10 seconds. If you get errors such as
eu-stack: dwfl_linux_proc_report pid 25062: No such file or directory
eu-stack: dwfl_linux_proc_report pid 25068: Exec format error
it just means that process 25062 exited after pidof listed it but before eu-stack could query its state, and that process 25068 was being torn down while its stack trace was being built, so the stack building failed while reading the executable referenced via the proc filesystem.
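diff -u highlights what changed, while the interesting threads are the ones that did not. A rough sketch that lists the thread IDs whose stack is byte-for-byte identical in both snapshots. It assumes the snapshot files use the layout shown in the sample output above ("PID ..." headers, "TID n:" lines, then frames), and the function name unchanged_stacks is made up:

```shell
# unchanged_stacks SNAP1 SNAP2: print TIDs whose stack is identical in both
# snapshot files. Assumes eu-stack output laid out as in the sample above.
unchanged_stacks() {
    awk '
        /^TID [0-9]+:/ { tid = $2; sub(/:$/, "", tid); next }  # new thread block
        /^PID /        { tid = ""; next }                       # process header
        tid != "" {
            # Collect the frame lines of each thread, per input file.
            if (FILENAME == ARGV[1]) a[tid] = a[tid] $0 "\n"
            else                     b[tid] = b[tid] $0 "\n"
        }
        END {
            for (t in a)
                if ((t in b) && a[t] == b[t]) print t
        }
    ' "$1" "$2" | sort -n
}
```

With the files from the example, unchanged_stacks snap1 snap2 prints one TID per line. An identical stack is only a hint: a thread that is legitimately idle, for example a worker waiting for requests, looks the same as one that is stuck.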
Solution 5
pstack will print a stack trace of a running process for you. gstack is a common equivalent if pstack is not available or doesn't support your distro/arch.
Comments
-
Pavan Manjunath over 1 year
I am running the following command on my ubuntu server
root@slot13:~# lxc-stop --name pavan --logfile=test1.txt --logpriority=trace
It seems to hang indefinitely. Whenever this happened on AIX, I simply used to get the PID of the offending process and say
$ procstack <pid_of_stuck_process>
and it used to show the whole call stack of the process. Is there any equivalent of procstack in Linux/Ubuntu?
-
Pavan Manjunath over 9 years
pstack seems to be throwing errors every time. Something like
root@sarai:~# pstack 6312
6312: /usr/bin/python /usr/bin/twistd -n --uid=maas --gid=maas --pidfile=/run/maas-pserv.pid --logfile=/dev/null maas-pserv --conf...
(No symbols found)
crawl: Input/output error
Error tracing through process 6312
-
Gilles 'SO- stop being evil' over 9 years
@PavanManjunath pstack seems to be broken on amd64, I observe the same on Debian wheezy amd64.
-
DocSalvager over 9 years
NOTE: per man7.org/linux/man-pages/man5/proc.5.html ... "/proc/[pid]/stack (since Linux 2.6.29) This file provides a symbolic trace of the function calls in this process's kernel stack. This file is provided only if the kernel was built with the CONFIG_STACKTRACE configuration option."
-
Stéphane Chazelas over 9 years
pstack's README does say it's only for 32bit, ELF, x86, GNU.
-
Stéphane Chazelas over 9 years
Also note access to that file is limited to superuser or the process itself (at least on the systems I've tried).
-
Stéphane Chazelas over 9 years
See also /proc/pid/wchan and the WCHAN column in ps -l output or ps -o wchan for the top of that stack. (The ps part works on many Unices, but is not always (often) useful in itself.)
-
Hi-Angel almost 4 years
Not gonna help if the program hung on a system call. The answer by Lekensteyn works.
-
Mikko Rantalainen about 3 years
@Aktau: Do you know if the process was entering uninterruptible sleep often but didn't actually hang there for overly long? In my experience you cannot get a stack for true uninterruptible sleep, at least with Linux kernel 4.15.x, using the method I describe above.
-
Aktau about 3 years
I was on Linux 5.7. I'm "reasonably" sure it wasn't flipping around in states because of the nature of the stack trace that I uncovered and the code I then read. There is no retry loop around that and it made total sense why it would be in uninterruptible sleep in exactly that path.