How to find out why a server hangs but is still reachable with ping

Solution 1

I'd leave some light profiling commands logging to files, so you can get an inside look at what went wrong after the fact. For example:

nohup top -b -d 60 >> top.log & # runs every 60 seconds
nohup vmstat 5 >> vmstat.log &
nohup iostat 5 >> iostat.log &

nohup is there so they aren't killed when you lose connection to the server. You can also use screen for that.
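By default the periodic lines from vmstat and iostat carry no timestamps, which makes them harder to line up with the moment of the hang. A minimal sketch of one way around that, assuming a POSIX shell; the helper name stamp is hypothetical and not part of the original answer:

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not from the original answer):
# prefix every line read from stdin with the current local date and time.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Example use, left commented out so that sourcing this file starts nothing:
#   vmstat 5 | stamp >> vmstat.log
```

Because stamp is a shell function, it has to run from a script or an interactive shell (for example inside screen) rather than directly under nohup. Where your versions support it, vmstat -t and iostat -t should give similar timestamps without a helper.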

A more robust alternative to the last two commands would be to set up sar.
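Once sar is collecting data, the interesting part is reading back the window just before the hang. The sketch below only locates the daily data file; it assumes the files are named saDD and live in /var/log/sysstat or /var/log/sa, which varies by distribution, and the function name sar_file_for_day is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the path of the sar data file for a given
# two-digit day of the month. The default directories are an assumption
# (Debian-style and Red Hat-style layouts); pass your own as extra arguments.
sar_file_for_day() {
    day=$1
    shift
    # Fall back to the assumed default directories if none were supplied.
    [ "$#" -gt 0 ] || set -- /var/log/sysstat /var/log/sa
    for dir in "$@"; do
        if [ -f "$dir/sa$day" ]; then
            printf '%s\n' "$dir/sa$day"
            return 0
        fi
    done
    return 1
}

# Example use, left commented out: CPU figures for the minutes before a
# 16:00 hang on the 25th.
#   sar -u -s 15:30:00 -e 16:05:00 -f "$(sar_file_for_day 25)"
```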

Solution 2

When I have seen issues like this, it usually ends up being a problem with a cron job.

Check your syslog for cron jobs that run at the time of day when the server hangs. Also, check your root crontab (crontab -l to list it, crontab -e to edit it) and the jobs in /etc/cron.daily for anything that might be responsible.
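The syslog check can be narrowed to the hour of the hang. This is a rough sketch, assuming the traditional syslog line layout ("Mon DD HH:MM:SS host tag: message") and a cron daemon whose tag contains "cron" in some capitalization; the function name cron_lines_at_hour is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print cron-related lines from a syslog-style file
# whose timestamp falls in the given two-digit hour.
#   $1 = log file, $2 = hour such as 16
cron_lines_at_hour() {
    # Field 3 is assumed to be HH:MM:SS and field 5 the program tag.
    awk -v hh="$2" 'index($3, hh ":") == 1 && tolower($5) ~ /cron/' "$1"
}

# Example use, left commented out:
#   cron_lines_at_hour /var/log/syslog 16
#   crontab -l
#   ls -l /etc/cron.daily
```

If the hang time drifts, running the function for the neighbouring hours as well is a cheap way to spot a job that starts earlier and runs long.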

Author: Martin Schlagnitweit

Updated on September 18, 2022

Comments

  • Martin Schlagnitweit
    Martin Schlagnitweit almost 2 years

    One of my servers, which runs in a German data center, "hangs" every night, but I can't find out why. No errors are found in /var/log/messages or /var/log/syslog.

    The server responds to ping, but all services are down (ssh, apache, ...). After a reset everything runs normally.

    A hardware test has been performed. It looks like a software issue.

    • HUB
      HUB about 13 years
      Is it possible to log into the local console when the server "hangs"? Examine the top output. Maybe some process is taking almost all the CPU time and all network services just get connection timeouts.
    • Martin Schlagnitweit
      Martin Schlagnitweit about 13 years
      It's possible to log into a local console, but a little bit complicated. That was going to be my next step. For now I will log top output every minute, as Eduardo suggested.
    • Martin Schlagnitweit
      Martin Schlagnitweit about 13 years
      It seems to be a kernel panic; I can see this at the local console. But which log file should it be written to?
    • HUB
      HUB about 13 years
      Is there a "/var/log/kern.log" file?
    • Martin Schlagnitweit
      Martin Schlagnitweit about 13 years
      In the kern.log file there are some entries like this: Jun 25 14:05:39 solunic kernel: [369632.475072] php-cgi[15194] general protection ip:6914c9 sp:7fffaf0f84d0 error:0 in php5-cgi[400000+6f9000] Today the server crashed again at 16:00 CET, but the last error in kern.log was one hour before that.
    • Martin Schlagnitweit
      Martin Schlagnitweit about 13 years
      The first process line in top.log shows: 1378 mysql 20 0 354m 168m 4592 S 0.6 16.8 43:39.59 mysqld
    • Martin Schlagnitweit
      Martin Schlagnitweit about 13 years
      The last entry from the sar command shows:
      00:00:01 cpu %usr %nice %sys %irq %softirq %wait %idle
      cpu ...
      15:50:01 all 10 0 2 0 0 1 87
  • mfinni
    mfinni about 13 years
    I bet you meant to use ">>" instead of ">", right?
  • Eduardo Ivanec
    Eduardo Ivanec about 13 years
    It'd be better in some cases, but it's not necessary - each command keeps running and so opens/overwrites the output files only once at start. But I'll change it just in case, thanks!
  • mfinni
    mfinni about 13 years
    If those keep running after the "symptom" happens, then the output files will get overwritten. That's why I'm suggesting the change.
  • mfinni
    mfinni about 13 years
    Oh wait - you mean the commands continue to run in the background, with the output redirected only once? Neat trick.
  • Michael Lowman
    Michael Lowman about 13 years
    -1 for not reading the question
  • Eduardo Ivanec
    Eduardo Ivanec about 13 years
    Well, actually I meant the commands to be executed separately, and yes - they do keep running/cycling by themselves. But I'll add nohup and & because I wasn't that clear, thanks.
  • Martin Schlagnitweit
    Martin Schlagnitweit about 13 years
    An extensive hardware test has been performed by the hosting company.