Ubuntu restarts randomly and there is no log about the reason
Solution 1
This sounds like a combination of issues.
In the case of an individual system rebooting randomly I would want to replace the power supply in the chassis with one that provided more than adequate amperage for the connected components (as you want it to keep running during periods of peak power draw).
In the case where the entire rack reboots simultaneously, I would look at an inadequate UPS as the root cause, or possibly an overheat condition due to AC failure in the server location.
An intermittent short in the feed cord to the multi-tap could also cause the simultaneous reboots of several machines that you describe.
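Before swapping hardware, it may help to confirm that the restarts really are abrupt rather than clean reboots issued by someone over ssh. The sketch below is one possible way to check, not a definitive test: it assumes that `last -x shutdown reboot` lists records newest first and that the wtmp file is intact. The function name `count_unclean_restarts` is hypothetical. It counts boots that were not preceded by a recorded shutdown.

```shell
# Hypothetical helper: reads `last -x shutdown reboot` output on stdin and
# counts "reboot" records that directly follow another "reboot" record,
# meaning no "shutdown" record was written between the two boots.
# Assumption: records are listed newest first, as `last` normally prints them.
count_unclean_restarts() {
    awk '
        $1 == "reboot"   { if (prev == "reboot") n++; prev = "reboot" }
        $1 == "shutdown" { prev = "shutdown" }
        END              { print n + 0 }
    '
}

# Usage (run on the affected machine):
#   last -x shutdown reboot | count_unclean_restarts
```

A non-zero count suggests the system went down without a clean shutdown, which is consistent with a power or other hardware problem; zero suggests that someone or something is rebooting the machine deliberately.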
Solution 2
If your server has a BMC (Baseboard Management Controller), you can check for power failures with ipmitool:
ipmitool sel list|grep -i power
You can install ipmitool with:
apt install ipmitool
Here is an example output:
4 | Pre-Init |0000000057| Power Unit #0x3f | Power off/down | Deasserted
d | Pre-Init |0000000021| Power Unit #0x3f | Power off/down | Deasserted
13 | Pre-Init |0000000022| Power Unit #0x3f | Power off/down | Deasserted
16 | 09/12/2013 | 14:18:00 | Power Supply #0x30 | Presence detected | Asserted
17 | 09/12/2013 | 14:18:00 | Power Supply #0x31 | Presence detected | Asserted
Also make sure you have loaded the kernel module for IPMI:
modprobe ipmi_devintf
You can also check that the module loaded with the dmesg command:
dmesg|grep ipmi
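A plain grep for "power" also matches the word anywhere in a line. As a slightly stricter alternative, the sketch below splits each SEL line on `|` and keeps only entries whose sensor field starts with "Power". It assumes the six-column layout shown in the example output above; the function name `sel_power_events` is hypothetical.

```shell
# Hypothetical helper: reads `ipmitool sel list` (or `sel elist`) output on
# stdin and prints only entries whose sensor column (4th field) begins with
# "Power", for example "Power Unit" or "Power Supply".
# Assumption: fields are separated by "|" in the six-column layout shown above.
sel_power_events() {
    awk -F'|' '
        {
            # Trim leading and trailing blanks from every field.
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
        }
        tolower($4) ~ /^power/ { print $2, $3, "-", $4, "-", $5, "(" $6 ")" }
    '
}

# Usage (run on the affected machine):
#   sudo ipmitool sel elist | sel_power_events
```

If this prints nothing while the unfiltered log does contain entries, the BMC simply has not recorded any power events; that does not by itself rule out a power problem.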
Hankook Lee
Updated on September 18, 2022
Comments
-
Hankook Lee over 1 year
I installed the Ubuntu 16.04 desktop version on a machine, and used it for my research via ssh.
Sometimes the machine restarts randomly, but I cannot find out why it restarts.
$ last reboot
reboot system boot 4.4.0-62-generic Wed Feb 8 01:34 still running
reboot system boot 4.4.0-62-generic Mon Feb 6 09:16 still running
reboot system boot 4.4.0-62-generic Sun Feb 5 16:43 still running
reboot system boot 4.4.0-62-generic Sun Feb 5 00:37 still running
I checked
/var/log/syslog
...
Feb 7 23:31:37 niaserver7 systemd[1]: Started Session 77 of user swmo.
Feb 8 00:17:01 niaserver7 CRON[17883]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 8 00:34:07 niaserver7 systemd[1]: Started CUPS Scheduler.
Feb 8 01:17:01 niaserver7 CRON[17893]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 8 01:35:01 niaserver7 rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="1315" x-info="http://www.rsyslog.com"] start
Feb 8 01:35:01 niaserver7 systemd-modules-load[538]: Inserted module 'lp'
Feb 8 01:35:01 niaserver7 systemd-modules-load[538]: Inserted module 'ppdev'
Feb 8 01:35:01 niaserver7 kernel: [ 0.000000] Initializing cgroup subsys cpuset
Feb 8 01:35:01 niaserver7 rsyslogd-2222: command 'KLogPermitNonKernelFacility' is currently not permitted - did you already set it via a RainerScript command (v6+ config)? [v8.16.0 try http://www.rsyslog.com/e/2222 ]
Feb 8 01:35:01 niaserver7 systemd-modules-load[538]: Inserted module 'parport_pc'
Feb 8 01:35:01 niaserver7 rsyslogd: rsyslogd's groupid changed to 108
Feb 8 01:35:01 niaserver7 rsyslogd: rsyslogd's userid changed to 104
Feb 8 01:35:01 niaserver7 loadkeys[541]: Loading /etc/console-setup/cached.kmap.gz
Feb 8 01:35:01 niaserver7 kernel: [ 0.000000] Initializing cgroup subsys cpu
Feb 8 01:35:01 niaserver7 systemd[1]: Started udev Kernel Device Manager.
How can I fix it?
-
Elder Geek about 7 years
Random reboots are often the result of a failure to provide adequate power. You may have a failing power supply or may have upgraded hardware that is drawing more power than the existing supply can provide. The only other thing I can think of is someone rebooting the system via ssh.
-
albert j about 7 years
Can you run memtest86 to check your RAM?
-
Hankook Lee about 7 years
@ElderGeek Thanks for your response. There are 4 machines in a rack, and they are connected to the same power supply. The machines sometimes reboot individually, but sometimes they reboot simultaneously. In this setting, is power failure the most plausible cause of the problem?
-
Elder Geek about 7 years
I think you are telling me that you have 4 machines in a rack all connected to the same UPS, while I was talking about the power supplies in the individual machines. Can you verify that I am understanding your situation and have stated it correctly?
-
Hankook Lee about 7 years
@ElderGeek Sorry for the confusion. I meant that they are connected to a multi-tap. I think they may exceed its power limit. As you say, each of the 4 machines has its own power supply.
-
user535733 about 7 years
Ubuntu does not have a 'reboot ungracefully without logging' feature. Who would want that? There is a 95% probability that you have a hardware issue. From your description, heat or power. I think you are wise to start with power.
-
Hankook Lee about 7 years
@albertj Thanks for your response. I cannot run memtest86 right now. I'll test it later.
-
-
Hankook Lee about 7 years
The command
sudo ipmitool sel list | grep -i power
produces nothing. Does this mean there was no power failure?
-
Hankook Lee about 7 years
Thanks for your answer! I'll check the things you mentioned.
-
0x0C4 about 7 years
Looks strange. Please run it without "|grep..." and check whether there is any message, or whether you get an error message. I will post an example of what you should see as a minimum after the system first gets power.
-
Hankook Lee about 7 years
I already ran modprobe ipmi_devintf and modprobe ipmi_si. Without grep, the result is:
9 | 02/06/2017 | 00:14:42 | Physical Security #0xaa | General Chassis intrusion () | Asserted
a | 02/07/2017 | 16:33:50 | Physical Security #0xaa | General Chassis intrusion () | Asserted
and so on.
-
0x0C4 about 7 years
Okay, it looks like you use a Supermicro system (?), and these entries are logged by the chassis intrusion sensor when the server chassis is opened. No problem so far. If you use Supermicro boards, please have a look at the IPMI User Manual: ftp.supermicro.com/utility/IPMIView/IPMIView20.pdf --> On page 53 you will see an example of power messages. Please check with "ipmitool sel elist" whether you can see such messages (you can use grep again).
-
Hankook Lee about 7 years
Yes, the machines use Supermicro boards. However,
ipmitool sel elist
produces similar results:
a | 02/07/2017 | 16:33:50 | Physical Security Chassis Intru | General Chassis intrusion () | Asserted