linux freezes - how to find out if hardware or software is the cause?

Solution 1

Since you have replaced such a lot of hardware, I presume you have already ruled out temperature issues.
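If you would rather have numbers than a hand on the heatsink, a small logger along these lines could record temperatures right up to the freeze. This is only a sketch: it assumes the kernel exposes thermal zones under /sys/class/thermal (not every board does), and the function names `read_thermal_zones` and `log_once` are made up for illustration.

```python
import os
import time
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone.

    Assumption: each zone directory holds a 'temp' file in millidegrees
    Celsius, as the Linux sysfs thermal interface normally provides.
    """
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone has no readable temperature; skip it
        readings[zone.name] = millideg / 1000.0
    return readings


def log_once(log_path, base="/sys/class/thermal"):
    """Append one timestamped reading and force it out to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write("%s %s\n" % (stamp, read_thermal_zones(base)))
        log.flush()
        os.fsync(log.fileno())  # so the line survives a hard freeze
```

Calling `log_once` every few seconds from a loop, with the log on a persistent disk rather than the live system's RAM-backed filesystem, leaves the last reading before the freeze on disk.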

What if you try out some completely different distro instead of Kubuntu 10.04? Download some other live distribution, for example openSUSE or even some BSD flavour, and see if they reproduce the freeze as well. That way you can be sure this isn't some kind of bug in Kubuntu 10.04.

How much data do you have under the directory trees you are diffing? And more importantly, is it only a couple of large files or a huge number of small files?
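To answer that quickly, a rough sketch like the following could summarize a tree. The function name `summarize_tree` and the 100 MiB cut-off for "large" are arbitrary choices of mine, not anything standard.

```python
import os


def summarize_tree(root, large_threshold=100 * 1024 * 1024):
    """Count files and bytes under root, and how many files are 'large'.

    Symlinks are not followed; a symlink is counted by its own size.
    """
    file_count = 0
    total_bytes = 0
    large_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # unreadable entry; skip rather than abort
            file_count += 1
            total_bytes += size
            if size >= large_threshold:
                large_count += 1
    return {"files": file_count, "bytes": total_bytes, "large_files": large_count}
```

Running it on each of the two directories being diffed shows whether the workload is dominated by a few big files or by a very large number of small ones.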

When you replaced the hard drives, how did you copy the data from the old drive to the new one? dd_rescue or some imaging program? Just plain old cp? If you used some kind of imaging program or dd_rescue and the original filesystem somehow contained some strange corruption, perhaps diffing hits the corrupted area and causes a crash? Rare and unlikely, but certainly possible. Just like it's possible that lightning hits you out there.
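One way to test the "diff hits a bad area" theory is to make the comparison leave a trail. The sketch below is my own stand-in for whatever diff tool is in use: it writes each file's path to a log and syncs it to disk before reading the file, so after a freeze the last line of the log names the file being read. `compare_trees` and the log handling are illustrative, and the log has to live on a persistent disk other than the ones being compared.

```python
import filecmp
import os


def compare_trees(left, right, log_path):
    """Compare every file under left with its counterpart under right.

    Returns the relative paths that are missing from right or differ.
    Files that exist only under right are not reported.
    """
    differing = []
    with open(log_path, "a") as log:
        for dirpath, _dirnames, filenames in os.walk(left):
            rel_dir = os.path.relpath(dirpath, left)
            for name in sorted(filenames):
                rel = os.path.normpath(os.path.join(rel_dir, name))
                # Record the file *before* touching it, and force it to disk.
                log.write(rel + "\n")
                log.flush()
                os.fsync(log.fileno())
                a = os.path.join(left, rel)
                b = os.path.join(right, rel)
                # shallow=False compares contents, not just size and mtime.
                if not os.path.isfile(b) or not filecmp.cmp(a, b, shallow=False):
                    differing.append(rel)
    return differing
```

If several runs all freeze on the same file, that points at the data or the filesystem driver; if the last logged file is different every time, it points elsewhere.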

Solution 2

You need to get a crash dump and take a look through it. Looking in the logs won't help, as nothing gets written to them in the event of a kernel panic/oops. If you have console access, you may get to see whether there is a panic message. If the crash dump gets written to disk, it will contain the contents of the kernel ring buffer (what you see in dmesg). If that doesn't help, you need to start doing a full analysis of the dump.

https://wiki.ubuntu.com/Kernel/CrashdumpRecipe?action=show&redirect=KernelTeam%2FCrashdumpRecipe

appears to be a starting point for Ubuntu. Googling "redhat crash whitepaper" will also give you some pointers.
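Before waiting for the next freeze, it is worth checking that the machine is actually set up to capture a dump. A minimal sketch, assuming the usual kdump arrangement (a `crashkernel=` parameter on the kernel command line and the kernel's /sys/kernel/kexec_crash_loaded flag); the helper names are my own:

```python
def crashkernel_setting(cmdline):
    """Return the value of crashkernel= from a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def crash_kernel_loaded(path="/sys/kernel/kexec_crash_loaded"):
    """Return True/False from the kernel's flag, or None if it is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return None  # file absent, e.g. kernel built without kexec support


def kdump_status(cmdline_path="/proc/cmdline"):
    """Collect both checks for the running system."""
    with open(cmdline_path) as f:
        cmdline = f.read()
    return {
        "crashkernel": crashkernel_setting(cmdline),
        "crash_kernel_loaded": crash_kernel_loaded(),
    }
```

If `kdump_status()` reports no `crashkernel` value or the crash kernel is not loaded, a panic will most likely not produce a dump, and the setup from the recipe needs to be completed first.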

Author: ssc

Updated on September 18, 2022

Comments

  • ssc
    ssc over 1 year

    a couple of weeks ago, my linux server (kubuntu 10.04) started to give me trouble.

    it freezes after a certain uptime, seemingly between a couple of minutes and a few hours - GUI is unresponsive, no reaction to mouse or keyboard (not even REISUB), top in an ssh session stops updating and the session itself is aborted after a timeout:

    Read from remote host 10.1.1.9: Operation timed out
    Connection to 10.1.1.9 closed.
    

    back then, I assumed a hardware issue, so i started replacing more and more hardware - graphics card, motherboard, cpu, ram, harddrives, psu. now i've replaced the entire machine and it still freezes.

    i've checked /var/log/messages and some other logs - there is no clue in them at all. a hardware issue seems unlikely considering it's all been replaced, but is still possible.

    i've stripped the machine down to the bare minimum. i boot a kubuntu live system from a usb stick, mount a couple of harddrives read-only and start diffing folders on them. this seems to produce the freeze somewhat reliably. so far, i haven't gotten beyond a few hours of uptime.

    my server is down, this has been going on for weeks now. i am at the end of my wisdom and i am clutching at straws.

    how can i reliably determine if this is a hardware or a software issue ? how would you approach a problem like that ?

  • ssc
    ssc almost 13 years
    thanks for your response! :-) it is always possible that temperature contributes to the problem, but there are quite a lot of fans in the machine, the case is open, no hardware gets more than hand-warm, there are no airflow errors in the hd smart data, and i would rather expect the machine to shut down instead of just freezing - or at least log some errors.
  • ssc
    ssc almost 13 years
    i thought about booting freebsd before, but there doesn't seem to be a ready to use live system i can boot from a usb stick, so i might go with opensuse as you suggested or centos. a different distribution will probably use the same filesystem drivers, so some bsd might be required in the end after all. i'm diffing two folders of approximately 500GB each. unfortunately, they're on ntfs and hfs+ drives which makes matters more complex.
  • ssc
    ssc almost 13 years
    the data was not on that machine when the trouble started, i've used these drives for quite a while now and never had any issues, so there's some reason to believe they're ok. i haven't copied anything.
  • Xiong Chiamiov
    Xiong Chiamiov almost 13 years
    You can easily create USB-bootable images from isos using UNetBootin.
  • Janne Pikkarainen
    Janne Pikkarainen almost 13 years
    Oh -- DragonFlyBSD has a USB-bootable image. dragonflybsd.org/download
  • ssc
    ssc almost 13 years
    I get 'Content not found' on your link, did you mean kde-look.org/content/show.php/Sensors-Monitor?content=111150 ?
  • ssc
    ssc almost 13 years
    OpenSUSE ran the entire night. Trying out Kubuntu 10.04 32 bit right now, but the difference is already visible. Seems like this is not hardware-related, it's Kubuntu 10.04 64bit freezing. I really feel kinda stupid right now - but then, I never had any Linux just freeze without any further message, unless there was a hardware issue...
  • Janne Pikkarainen
    Janne Pikkarainen almost 13 years
    ssc: Back when Red Hat was not yet Red Hat Enterprise Linux (around Red Hat 7.3), I had my own share of unexpected freezes. That's when I learnt my lesson and started to try with another distro - Knoppix was available back then. :)