Ubuntu restarts randomly and there is no log about the reason


Solution 1

This sounds like a combination of issues.

When an individual system reboots randomly, I would replace the power supply in its chassis with one that provides more than adequate amperage for the connected components, so that it keeps running during periods of peak power draw.

When the entire rack reboots simultaneously, I would look at an inadequate UPS as the root cause, or possibly overheating caused by an air-conditioning (AC) failure in the server location.

An intermittent short in the feed cord to the multi-tap could also cause the simultaneous reboots that you describe.

Solution 2

If your server has a BMC (Baseboard Management Controller), you can check its System Event Log (SEL) for power failures with the following command:

sudo ipmitool sel list | grep -i power

You can install ipmitool with:

sudo apt install ipmitool

Here is an example of the output:

   4 |  Pre-Init  |0000000057| Power Unit #0x3f | Power off/down | Deasserted
   d |  Pre-Init  |0000000021| Power Unit #0x3f | Power off/down | Deasserted
  13 |  Pre-Init  |0000000022| Power Unit #0x3f | Power off/down | Deasserted
  16 | 09/12/2013 | 14:18:00 | Power Supply #0x30 | Presence detected | Asserted
  17 | 09/12/2013 | 14:18:00 | Power Supply #0x31 | Presence detected | Asserted

Also make sure the IPMI kernel module is loaded:

sudo modprobe ipmi_devintf

You can confirm that the module loaded with dmesg:

dmesg | grep -i ipmi
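The checks above can be combined into a small shell sketch. It assumes a BMC reachable through the local IPMI driver, ipmitool already installed, and sudo rights. The function names filter_power_events and check_sel_for_power_events are hypothetical helpers, not part of ipmitool. Loading ipmi_si alongside ipmi_devintf follows what the asker did in the comments, and the optional "elist" argument follows the later suggestion to try "ipmitool sel elist".

```shell
#!/bin/sh
# Sketch only: assumes a local BMC, ipmitool installed, and sudo rights.
# filter_power_events and check_sel_for_power_events are hypothetical
# helper names, not part of ipmitool.

# Read SEL output on stdin and keep only lines that mention "power"
# (case-insensitive), e.g. "Power Unit" or "Power Supply" entries.
# grep exits with status 1 when nothing matches; "|| true" keeps the
# function's exit status at 0 so "no power events" is not treated as an error.
filter_power_events() {
    grep -i 'power' || true
}

# Load the IPMI kernel modules, show the related kernel messages, then
# print power-related SEL entries. Pass "elist" for the extended listing;
# the default is "list".
check_sel_for_power_events() {
    sel_cmd=${1:-list}
    # ipmi_si is the system interface driver for the BMC;
    # ipmi_devintf provides the /dev/ipmi* device that ipmitool opens.
    sudo modprobe ipmi_si
    sudo modprobe ipmi_devintf
    # Kernel messages about IPMI, to confirm the modules loaded.
    sudo dmesg | grep -i ipmi
    # Query the BMC's System Event Log and keep only power-related lines.
    sudo ipmitool sel "$sel_cmd" | filter_power_events
}
```

After sourcing the file, run `check_sel_for_power_events` or `check_sel_for_power_events elist`. An empty result only means that no SEL entry mentions "power"; on its own it is not proof that the power supply is fine.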

Author: Hankook Lee

Updated on September 18, 2022

Comments

  • Hankook Lee (over 1 year ago)

    I installed Ubuntu 16.04 Desktop on a machine and use it for my research over SSH.

    Sometimes the machine restarts randomly, but I cannot find out why.

    $ last reboot
    reboot   system boot  4.4.0-62-generic Wed Feb  8 01:34   still running
    reboot   system boot  4.4.0-62-generic Mon Feb  6 09:16   still running
    reboot   system boot  4.4.0-62-generic Sun Feb  5 16:43   still running
    reboot   system boot  4.4.0-62-generic Sun Feb  5 00:37   still running
    

    I checked /var/log/syslog ...

    Feb  7 23:31:37 niaserver7 systemd[1]: Started Session 77 of user swmo.
    Feb  8 00:17:01 niaserver7 CRON[17883]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
    Feb  8 00:34:07 niaserver7 systemd[1]: Started CUPS Scheduler.
    Feb  8 01:17:01 niaserver7 CRON[17893]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
    Feb  8 01:35:01 niaserver7 rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="1315" x-info="http://www.rsyslog.com"] start
    Feb  8 01:35:01 niaserver7 systemd-modules-load[538]: Inserted module 'lp'
    Feb  8 01:35:01 niaserver7 systemd-modules-load[538]: Inserted module 'ppdev'
    Feb  8 01:35:01 niaserver7 kernel: [    0.000000] Initializing cgroup subsys cpuset
    Feb  8 01:35:01 niaserver7 rsyslogd-2222: command 'KLogPermitNonKernelFacility' is currently not permitted - did you already set it via a RainerScript command (v6+ config)? [v8.16.0 try http://www.rsyslog.com/e/2222 ]
    Feb  8 01:35:01 niaserver7 systemd-modules-load[538]: Inserted module 'parport_pc'
    Feb  8 01:35:01 niaserver7 rsyslogd: rsyslogd's groupid changed to 108
    Feb  8 01:35:01 niaserver7 rsyslogd: rsyslogd's userid changed to 104
    Feb  8 01:35:01 niaserver7 loadkeys[541]: Loading /etc/console-setup/cached.kmap.gz
    Feb  8 01:35:01 niaserver7 kernel: [    0.000000] Initializing cgroup subsys cpu
    Feb  8 01:35:01 niaserver7 systemd[1]: Started udev Kernel Device Manager.
    

    How can I fix it?

    • Elder Geek (about 7 years ago)
      Random reboots are often the result of a failure to provide adequate power. You may have a failing power supply, or you may have upgraded hardware that draws more power than the existing supply can provide. The only other thing I can think of is someone rebooting the system via SSH.
    • albert j (about 7 years ago)
      Can you run memtest86 to check your RAM?
    • Hankook Lee (about 7 years ago)
      @ElderGeek Thanks for your response. There are 4 machines in a rack, and they are connected to the same power supply. The machines sometimes reboot individually, but sometimes they reboot simultaneously. In this setting, is power failure the most likely cause of this problem?
    • Elder Geek (about 7 years ago)
      I think you are telling me that you have 4 machines in a rack, all connected to the same UPS, while I was talking about the power supplies in the individual machines. Can you verify that I understand your situation and have stated it correctly?
    • Hankook Lee (about 7 years ago)
      @ElderGeek Sorry for the confusion. I meant that they are connected to a multi-tap. I think they may exceed its power limit. As you say, each of the 4 machines also has its own power supply.
    • user535733 (about 7 years ago)
      Ubuntu does not have a 'reboot ungracefully without logging' feature. Who would want that? There is a 95% probability that you have a hardware issue; from your description, heat or power. I think you are wise to start with power.
    • Hankook Lee (about 7 years ago)
      @albertj Thanks for your response. I cannot run memtest86 right now. I'll test later.
  • Hankook Lee (about 7 years ago)
    The command sudo ipmitool sel list | grep -i power produces nothing. Does this mean there was no power failure?
  • Hankook Lee (about 7 years ago)
    Thanks for your answer! I'll check things you mentioned.
  • 0x0C4 (about 7 years ago)
    That looks strange. Please run it without "| grep ..." and check whether there is any output or an error message. I will post an example of what you should see at a minimum after the system is powered on for the first time.
  • Hankook Lee (about 7 years ago)
    I already ran modprobe ipmi_devintf and modprobe ipmi_si. Without grep, the result is: 9 | 02/06/2017 | 00:14:42 | Physical Security #0xaa | General Chassis intrusion () | Asserted, a | 02/07/2017 | 16:33:50 | Physical Security #0xaa | General Chassis intrusion () | Asserted, and so on.
  • 0x0C4 (about 7 years ago)
    Okay, it looks like you are using a Supermicro system (?), and these entries are caused by the chassis intrusion sensor when the server chassis is opened. No problem so far. If you use Supermicro boards, please have a look at the IPMI user manual: ftp.supermicro.com/utility/IPMIView/IPMIView20.pdf. Page 53 shows an example of power messages. Please check with "ipmitool sel elist" whether you can see such messages (you can use grep again).
  • Hankook Lee (about 7 years ago)
    Yes, the machines use Supermicro boards. However, ipmitool sel elist produces similar results: a | 02/07/2017 | 16:33:50 | Physical Security Chassis Intru | General Chassis intrusion () | Asserted