Regular freezing on Ryzen based system, 16.04 LTS and newer kernel
Solution 1
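I had the same problem. This is what I did to solve it.
Install cpufrequtils first (it provides cpufreq-set):
sudo apt-get install cpufrequtils
Switch to the performance governor for the current session:
sudo cpufreq-set -r -g performance
Make the setting persist across reboots:
echo 'GOVERNOR="performance"' | sudo tee /etc/default/cpufrequtils
sudo systemctl disable ondemand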
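To confirm that the governor change actually took effect, the kernel's cpufreq sysfs files can be read back. The following is only a minimal sketch: the function name show_governors and its optional directory argument are made up for illustration, and it assumes the usual /sys/devices/system/cpu layout.

```shell
#!/bin/sh
# Print the active scaling governor of every CPU that exposes cpufreq.
# show_governors is a hypothetical helper; the optional first argument
# overrides the sysfs root (assumed default: /sys/devices/system/cpu).
show_governors() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    found=0
    for gov_file in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$gov_file" ] || continue      # skip CPUs without cpufreq
        cpu="${gov_file#"$cpu_root"/}"      # strip the root prefix
        cpu="${cpu%%/*}"                    # keep only "cpuN"
        printf '%s: %s\n' "$cpu" "$(cat "$gov_file")"
        found=1
    done
    if [ "$found" -ne 1 ]; then
        echo "no cpufreq information found under $cpu_root" >&2
        return 1
    fi
}
```

Calling show_governors with no argument after a reboot should list "performance" for each CPU if the setting persisted.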
Solution 2
I had much the same problem as you, on a Ryzen 1800X.
I suggest that you:
Re-enable SMT. There is no need to disable it.
Go back to the stock kernel for Ubuntu 16.04, which is currently 4.4.0-93.
Disable all "power saving" Global C-State options in the BIOS.
Disable the Cool'n'Quiet option as well.
Increase the SoC voltage to 1.1 V for stability. This is recommended in this video: https://www.hardocp.com/news/2017/05/01/how_to_stabilize_your_amd_ryzen_memory_cpu_overclocking_attempts
The above recommendation applies whether you are stressing the CPU or idling.
Download the latest AMD drivers for your card from the AMD website. You can also try the latest open-source drivers via "Additional Drivers" under "Software & Updates". I recommend trying this option first.
Before doing any of the above, reset the BIOS to its defaults and check whether a newer version is available.
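After changing the C-state options in the BIOS, it can help to see which idle states the running kernel reports. This is a rough sketch only: show_idle_states is a hypothetical helper name, it assumes the cpuidle sysfs directory is present, and whether a given BIOS option changes this listing depends on the firmware.

```shell
#!/bin/sh
# List the idle states (C-states) the kernel exposes for one CPU.
# show_idle_states is a hypothetical helper; the optional first argument
# overrides the cpuidle directory (assumed default shown below).
show_idle_states() {
    idle_dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    if [ ! -d "$idle_dir" ]; then
        echo "cpuidle is not exposed at $idle_dir" >&2
        return 1
    fi
    for state_dir in "$idle_dir"/state[0-9]*; do
        [ -r "$state_dir/name" ] || continue
        # Print the directory name (stateN) followed by the state's name.
        printf '%s %s\n' "${state_dir##*/}" "$(cat "$state_dir/name")"
    done
}
```

Running uname -r alongside it shows which kernel is booted, which is useful when switching back to the stock 4.4 kernel.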
ankit7540
Updated on September 18, 2022
Comments
-
ankit7540 over 1 year
I am running a Ryzen 1700X CPU and doing computations. Every now and then the system crashes while running 16.04 LTS (kernel 4.10). The system does not reboot. There is no signal on the display, and the keyboard and mouse do not work. I cannot connect via SSH.
I saved the kern.log and syslog files while running 16.04 LTS.
After reading several posts about issues with the new architecture, I decided to try a more recent kernel and moved to 4.12.8 (dated 16th Aug, 2017) from here. I used this post on AskUbuntu to update the kernel. The system booted fine and my application ran fine for ~10 hours.
After about 11 hours the system crashed again, with the same messages in the syslog as seen with kernel 4.10 on 16.04 LTS, given below. (Kernel and syslog files with the 4.12 kernel: kern.log with new kernel and syslog with new kernel.)
Aug 18 17:27:13 vriksha systemd[1]: Starting Cleanup of Temporary Directories...
Aug 18 17:27:13 vriksha systemd-tmpfiles[4661]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Aug 18 17:27:13 vriksha systemd[1]: Started Cleanup of Temporary Directories.
Aug 18 17:28:25 vriksha ntpd[1516]: 209.242.224.117 local addr 192.168.2.15 -> <null>
Aug 18 17:35:01 vriksha CRON[4821]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 18 17:35:40 vriksha systemd[1]: Started Session 5 of user vani.
Aug 18 17:42:18 vriksha sensord: Chip: amdgpu-pci-2700
Aug 18 17:42:18 vriksha sensord: Adapter: PCI adapter
Aug 18 17:42:18 vriksha sensord: fan1: 1423 RPM
Aug 18 17:42:18 vriksha sensord: temp1: 43.0 C
Aug 18 17:42:18 vriksha sensord: Chip: asus-isa-0000
Aug 18 17:42:18 vriksha sensord: Adapter: ISA adapter
Aug 18 17:42:18 vriksha sensord: cpu_fan: 0 RPM
Aug 18 17:45:01 vriksha CRON[6142]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 18 17:55:01 vriksha CRON[6431]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 18 18:05:01 vriksha CRON[6607]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 18 18:09:52 vriksha kernel: [ 3459.913711] perf: interrupt took too long (2529 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
Aug 18 18:12:18 vriksha sensord: Chip: amdgpu-pci-2700
Aug 18 18:12:18 vriksha sensord: Adapter: PCI adapter
Aug 18 18:12:18 vriksha sensord: fan1: 1431 RPM
Aug 18 18:12:18 vriksha sensord: temp1: 40.0 C
Aug 18 18:12:18 vriksha sensord: Chip: asus-isa-0000
Aug 18 18:12:18 vriksha sensord: Adapter: ISA adapter
Aug 18 18:12:18 vriksha sensord: cpu_fan: 0 RPM
Aug 18 18:15:01 vriksha CRON[6785]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 18 18:17:01 vriksha CRON[6825]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 18 18:25:01 vriksha CRON[6967]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
After the last line in the above message (in syslog) the system froze. I had to reset to reboot again. This happened again with the new kernel.
System details:
CPU: Ryzen 1700X, no SMT
BIOS version: 3401, dated 12/08/2017 (AGESA 1071)
RAM: 32 GB
GPU: AMD RX 470
OS: Lubuntu 16.04 LTS, LXDE with Openbox
Can somebody help me out?
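Most of the syslog excerpt quoted above is periodic sensord and CRON output, which makes it hard to spot anything unusual just before a freeze. One way to thin it out is sketched below. The function name filter_syslog_noise is made up for illustration, and the default path /var/log/syslog is the usual Ubuntu location, assumed here.

```shell
#!/bin/sh
# Print a syslog file without the periodic sensord and CRON entries,
# so kernel and systemd messages stand out.
# filter_syslog_noise is a hypothetical helper; the optional first
# argument is the log file to read (assumed default: /var/log/syslog).
filter_syslog_noise() {
    grep -v -e ' sensord: ' -e ' CRON\[' "${1:-/var/log/syslog}"
}
```

Piping the result through tail -n 20 shows the last few remaining entries before the log stops.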
Updates
The application I am running is not using gcc or g++.
lspci output is here.
dmesg | egrep 'drm|radeon' output is here.
(root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1) is related to the sysstat package, which I removed. The problem still exists.
-
glxinfo | grep -i open output for the AMD RX 470 GPU is given below:
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD POLARIS10 (DRM 3.15.0 / 4.12.8-041208-generic, LLVM 4.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.0.7
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 17.0.7
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 17.0.7
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:
I have connected only one display to this computer. The crashes happen only when running CPU-intensive tasks for long periods. (I leave the system with its display off and control and check it over an SSH connection. After 5-6 hours or so, the SSH connection becomes unavailable. When I come back to the machine, moving the mouse or pressing keys does nothing to bring the display back. A hard reset is required.)
-
To check whether the GPU is responsible, I switched to an nVidia GTX 1080, for which I installed the proprietary driver. Under a similar load the system still freezes. I changed back to the AMD GPU and the problem persists there too. I therefore rule out the GPU type as the cause. For the nVidia card the glxinfo | grep -i open output is as follows:
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce GTX 1080/PCIe/SSE2
OpenGL core profile version string: 4.5.0 NVIDIA 384.81
OpenGL core profile shading language version string: 4.50 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.5.0 NVIDIA 384.81
OpenGL shading language version string: 4.50 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 384.81
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:
- Updated the BIOS to version 3401 (12/08/2017, AGESA 1071) and the problem persists.
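Since the freezes described in these updates only appear after hours of load, with SSH as the only view into the machine, a small heartbeat log can help pin down when the system stopped responding and how loaded it was. This is only a sketch: heartbeat_once and the default file name are hypothetical, and it assumes /proc/loadavg is available (as on any normal Linux system).

```shell
#!/bin/sh
# Append one timestamped load-average record to a file and flush it to
# disk, so the last record survives a hard reset.
# heartbeat_once is a hypothetical helper; the optional first argument
# is the output file (assumed default: $HOME/heartbeat.log).
heartbeat_once() {
    hb_file="${1:-$HOME/heartbeat.log}"
    # First three fields of /proc/loadavg: 1, 5 and 15 minute load averages.
    hb_load=$(cut -d' ' -f1-3 /proc/loadavg 2>/dev/null || echo unknown)
    printf '%s load=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$hb_load" >> "$hb_file"
    sync
}
```

Run in a loop with a sleep of a minute or so between calls, the last line of the file gives an approximate time of the freeze to compare against the end of syslog.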
-
ankit7540 over 6 years
I disabled SMT intentionally, since the application(s) I use may suffer from cache misses, and hence so may the numerical accuracy of the results. This scenario arises in high-performance computing when parallel computations run for long durations.
-
ankit7540 over 5 years
I tried this. After running sudo systemctl disable ondemand, I received:
ondemand.service is not a native service, redirecting to systemd-sysv-install
Executing /lib/systemd/systemd-sysv-install disable ondemand
insserv: warning: current start runlevel(s) (empty) of script ondemand overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (2 3 4 5) of script ondemand overrides LSB defaults (empty).
Is this normal?