How to troubleshoot a hardware problem on Linux?

Solution 1

Try booting memtest86+ from bootable media and see what it says about your memory and memory subsystem integrity.

Also, the last job started by cron might be logged to /var/log/syslog or /var/log/messages.
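
To find that last cron entry, here is a minimal Python sketch. It assumes the classic Debian/Ubuntu syslog line format (`Mar  1 12:00:01 host CRON[1234]: (user) CMD (command)`); other distributions, or systems that log only to journald, may format or store this differently. The function name `last_cron_command` is hypothetical.

```python
import re
from typing import Iterable, Optional

# Assumed Debian/Ubuntu-style syslog line, for example:
#   "Mar  1 12:00:01 myhost CRON[1234]: (root) CMD (run-parts /etc/cron.hourly)"
CRON_CMD = re.compile(r"CRON\[\d+\]: \((?P<user>[^)]+)\) CMD \((?P<cmd>.*)\)\s*$")


def last_cron_command(lines: Iterable[str]) -> Optional[tuple[str, str, str]]:
    """Return (timestamp, user, command) for the last cron job in `lines`, or None."""
    last = None
    for line in lines:
        match = CRON_CMD.search(line)
        if match:
            # The first 15 characters hold the traditional "Mmm dd hh:mm:ss" stamp.
            last = (line[:15], match.group("user"), match.group("cmd"))
    return last
```

For example, `with open("/var/log/syslog", errors="replace") as f: print(last_cron_command(f))`. On Debian/Ubuntu, reading that file usually needs root or membership in the adm group.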

If cron doesn't log it, and you are debugging this issue on an ongoing basis, you could set up auditd, plus a cron job that runs ps, to continuously log system activity and which jobs are running.
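
As a rough stand-in for the ps half of that idea (not a replacement for auditd), the sketch below reads process names directly from /proc and appends a timestamped snapshot to a log file. The function names are hypothetical, and calling os.fsync after each snapshot is my own assumption: it is there so the last snapshot before a hard lockup is more likely to have reached the disk.

```python
import os
import time
from pathlib import Path


def snapshot_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, command name) for every process visible under proc_root."""
    procs = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            name = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited between listing and reading
        procs.append((int(entry.name), name))
    return sorted(procs)


def append_snapshot(log_path: str, proc_root: str = "/proc") -> None:
    """Append one timestamped snapshot to log_path and force it to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        for pid, name in snapshot_processes(proc_root):
            log.write(f"{stamp} {pid} {name}\n")
        log.flush()
        os.fsync(log.fileno())  # push buffered data to disk before a possible freeze
```

You could call `append_snapshot` from a small script that cron runs every minute, writing to a hypothetical path such as /var/log/proc-snapshots.log; after a lockup, the tail of that file shows what was running shortly before it.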

Solution 2

Kernel device drivers report problems to the kernel ring buffer (viewable with dmesg), which may also be logged separately, for example in /var/log/kern.log.
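
A quick way to skim dmesg or kern.log output for likely trouble is a keyword filter. This is a minimal Python sketch; the pattern list is an illustrative assumption, not an exhaustive or authoritative set of kernel error messages, and `suspect_kernel_lines` is a hypothetical name.

```python
import re
from typing import Iterable

# Illustrative phrases only (an assumption, not an exhaustive list) that often show up
# in kernel messages about failing hardware or misbehaving drivers.
SUSPECT = re.compile(
    r"\b(hardware error|machine check|i/o error|call trace|oops|panic|timed out)\b",
    re.IGNORECASE,
)


def suspect_kernel_lines(lines: Iterable[str]) -> list[str]:
    """Return the kernel log lines that match any SUSPECT phrase, newline stripped."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]
```

You can feed it the lines of /var/log/kern.log, or the output of `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`; on systems with `kernel.dmesg_restrict` enabled, dmesg needs root.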

For serious problems, a POST diagnostics board may be used.

Solution 3

On most Linux distributions today, you should be able to get an MCE (Machine Check Exception) log, which can be decoded to find the actual hardware errors (http://freshmeat.net/projects/mcelog/). You can also set up kernel crash dumps (kdump): a second, capture kernel is loaded alongside the Linux kernel you use daily, and when that kernel crashes, the capture kernel takes over and saves a dump so you can debug the cause.
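
Kdump only helps if it was set up before the crash. As a sketch, assuming the usual interfaces (a `crashkernel=` argument on the kernel command line to reserve memory, and /sys/kernel/kexec_crash_loaded reading 1 once a capture kernel is loaded), this Python snippet checks both. The function name `kdump_status` is hypothetical, and your distribution's kdump tooling remains the proper way to configure it.

```python
from pathlib import Path


def kdump_status(sys_root: str = "/sys", proc_root: str = "/proc") -> dict[str, bool]:
    """Check whether kdump memory is reserved and a capture kernel is loaded."""
    # A crashkernel=... boot argument reserves memory for the capture kernel.
    cmdline = (Path(proc_root) / "cmdline").read_text()
    reserved = any(arg.startswith("crashkernel=") for arg in cmdline.split())
    # This sysfs flag reads "1" once a capture kernel has been loaded (e.g. by kexec -p).
    flag = Path(sys_root) / "kernel" / "kexec_crash_loaded"
    loaded = flag.exists() and flag.read_text().strip() == "1"
    return {"crashkernel_reserved": reserved, "capture_kernel_loaded": loaded}
```

If either value is False, a crash will not leave a dump behind, so this is worth checking before you wait for the next lockup.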

Solution 4

Logs are the first place to look, as kmarsh says, but if they don't reveal much in the case of a serious HW failure, it doesn't matter what OS you use; it just takes some old-school trial and error.

Determine whether it is a hardware issue by running a live CD; otherwise it could be a driver issue misdiagnosed as a hardware failure.

HW lockups are random, but frequent. I'd start by removing graphics cards (use on-board graphics or a backup card), network cards, or (gasp) modems if you have any, one at a time until you pinpoint the culprit. Run with one memory stick at a time (if you have two), or swap in other sticks while testing.

Your PSU could also be failing. Sometimes adding a new card eats up your wattage and, if the PSU isn't powerful enough, starves the CPU, causing random failures.

If nothing else gives a lead, it could be your mainboard (usually corrosion if it is two or more years old, depending on the humidity where you live) or your CPU.

Use software to monitor CPU temperature; overheating can cause lockups too.
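
On many Linux systems the kernel exposes temperature sensors under /sys/class/thermal/thermal_zone*/, where `temp` is reported in millidegrees Celsius. A minimal Python sketch that reads them follows; `read_temperatures` is a hypothetical name. Which zone corresponds to the CPU varies by hardware (on Intel machines it is often the one whose type is x86_pkg_temp), and some machines expose CPU temperatures only through hwmon drivers instead, so treat this as an assumption-laden starting point.

```python
from pathlib import Path


def read_temperatures(thermal_root: str = "/sys/class/thermal") -> dict[str, float]:
    """Map each thermal zone (name and type) to its temperature in degrees Celsius."""
    temps = {}
    for zone in sorted(Path(thermal_root).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # some zones refuse reads or report nothing usable
        temps[f"{zone.name} ({kind})"] = millideg / 1000.0  # sysfs reports millidegrees
    return temps
```

Running it periodically (for example from cron, appending to a log file) gives you a record of temperatures leading up to a lockup.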

After trying everything under the sun, with no luck, it might be time for a new PC ;)

Author: Jack

Updated on September 17, 2022

Comments

  • Jack
    Jack almost 2 years

    Just to note: I am not having a problem at the moment, but I have had one previously, which sparked my curiosity.

    When a computer locks up suddenly, so that Caps Lock flashes incessantly and the only option is to restart, how do you troubleshoot what is causing it? On Windows there would be some errors in the event log; on Linux there seems to be no opportunity for anything to be written to the log, which makes it hard to troubleshoot.

    In this case, how would you troubleshoot the problem on Linux?

    • kmarsh
      kmarsh about 14 years
      Sudden H/W lockups rarely get logged by any operating system.
    • Jack
      Jack about 14 years
      Well, they do on Windows, even if the entry is vague.
    • quack quixote
      quack quixote about 14 years
      Not always; it depends on the problem. If it's a true hardware freeze, the first indication Windows will give (in the error logs) is that it's rebooting. (BSoDs are not true hardware lockups in this sense.)
    • Marius Gedminas
      Marius Gedminas almost 14 years
      Flashing caps lock indicates a kernel panic (which is more or less the same as the BSoD on Windows). It's not necessarily a hardware problem, it could be a bug in the kernel/drivers.
  • Jack
    Jack about 14 years
    No, no; as I said, I am not having any problem at the moment. I just want to know the equivalent way to see hardware problems on Linux, as I can on Windows.
  • Kevin M
    Kevin M about 14 years
    Driver errors can happen on a live CD just as on a fully installed system. It all depends on which drivers the system is using. If you use only generic drivers and it still happens, THEN it would be a HW issue.