18.04 or 19.04 on Samsung NVMe SSD → Bus errors
Solution 1
I solved this problem by disabling ASPM: add pcie_aspm=off to GRUB_CMDLINE_LINUX_DEFAULT in the /etc/default/grub file, then run sudo update-grub and reboot.
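As a sketch (assuming Ubuntu's stock "quiet splash" value — adjust if your line differs), the edited line should end up looking like this; the sed below just demonstrates the transformation on a sample string:

```shell
# Sample of the line from /etc/default/grub (assumed stock Ubuntu default):
line='GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"'
# Insert the option just before the closing quote:
echo "$line" | sed 's/"$/ pcie_aspm=off"/'
# → GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off"
```

After saving the real file, sudo update-grub regenerates the boot configuration and the option takes effect on the next reboot.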
Solution 2
I haven't had a failure since updating grub with the following option: nvme_core.default_ps_max_latency_us=6000
It's been 2 days now, with active gaming, which usually produced the bug within ~30 minutes.
It failed at 0, it still failed at 3000, but 6000 is going strong. And I didn't have to replace the drive.
Edit: it still crashes. I was "fine" in xfce4; I switched back to Plasma, and it crashed. I rebooted to Xfce again, and it crashed again (while gaming).
New plan: reinstall on the other drive, and use the Seagate drive for holding files, not the OS.
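When experimenting with different latency values like this, it helps to confirm the option actually reached the kernel after each reboot. A minimal sketch (the grep is shown on a sample string; on a live system, read /proc/cmdline instead):

```shell
# Sample kernel command line (assumed; on a real system use: cat /proc/cmdline)
cmdline='quiet splash nvme_core.default_ps_max_latency_us=6000'
# Extract the NVMe power-state latency option, if present:
echo "$cmdline" | grep -o 'nvme_core\.default_ps_max_latency_us=[0-9]*'
# → nvme_core.default_ps_max_latency_us=6000
# The live value is also exposed in sysfs at:
#   /sys/module/nvme_core/parameters/default_ps_max_latency_us
```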
Comments
-
Déjà vu over 1 year
Beforehand:
- Did update BIOS (MB: Asus Z390-F)
- Did update NVMe SSD firmware (Samsung 970 EVO Plus, 500G)
- Did run memtest86 for hours (after the problems happened: it's OK)
- CPU is Intel i5-9400 providing UHD Graphics 630
- UEFI boot
- Not overclocking
- BIOS is all "optimized defaults", except for Wake On LAN (On), "Download and install Armory Crate" (Off)
- The MB (motherboard) has nothing else (no CD, no graphic card...)
Ubuntu install (the SSD was supposed to be dedicated to Ubuntu 18.04)
- Installed fresh Ubuntu 18.04.3
- Selected "Update while installing" + "Install 3rd party drivers..."
- Did manual partitioning, all ext4 (then tried also "Erase disk and install")
- → The install "crashed", said "sorry" and "click to reboot"
So I tried the whole thing again, without "Update while installing" & "Install 3rd party drivers..." (and removed the network to be sure)
- this time it went through
- Then did apt update ; apt upgrade
- Reboot
- Then after a while, using Ubuntu GUI, suddenly pages and pages of that
Edit, dmesg excerpt:
[ 109.632452] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
[ 109.632466] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 109.632474] pcieport 0000:00:1d.0: device [8086:a330] error status/mask=00000001/00002000
[ 109.632479] pcieport 0000:00:1d.0: [ 0] RxErr
[ 134.215214] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
[ 134.215219] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 134.215222] pcieport 0000:00:1d.0: device [8086:a330] error status/mask=00000001/00002000
[ 134.215224] pcieport 0000:00:1d.0: [ 0] RxErr
[ 144.288685] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[ 144.320962] print_req_error: I/O error, dev nvme0n1, sector 111572600 flags 80700
[ 144.344627] nvme 0000:03:00.0: enabling device (0000 -> 0002)
[ 144.344951] nvme nvme0: Removing after probe failure status: -19
[ 144.369035] print_req_error: I/O error, dev nvme0n1, sector 127856976 flags 801
[ 144.369062] print_req_error: I/O error, dev nvme0n1, sector 332338688 flags 801
[ 144.369075] EXT4-fs warning (device nvme0n1p8): ext4_end_bio:323: I/O error 10 writing to inode 16253466 (offset 0 size 233472 starting block 41542393)
[ 144.369079] Buffer I/O error on device nvme0n1p8, logical block 70080
[ 144.369099] Buffer I/O error on device nvme0n1p8, logical block 70081
[ 144.369103] Buffer I/O error on device nvme0n1p8, logical block 70082
[ 144.369106] Buffer I/O error on device nvme0n1p8, logical block 70083
[ 144.369110] Buffer I/O error on device nvme0n1p8, logical block 70084
[ 144.369113] Buffer I/O error on device nvme0n1p8, logical block 70085
[ 144.369116] Aborting journal on device nvme0n1p5-8.
[ 144.369118] Buffer I/O error on device nvme0n1p8, logical block 70086
[ 144.369121] Buffer I/O error on device nvme0n1p8, logical block 70087
[ 144.369124] Buffer I/O error on device nvme0n1p8, logical block 70088
[ 144.369127] Buffer I/O error on device nvme0n1p8, logical block 70089
[ 144.369136] EXT4-fs error (device nvme0n1p5) in ext4_reserve_inode_write:5901: Journal has aborted
[ 144.369140] Buffer I/O error on dev nvme0n1p5, logical block 3178496, lost sync page write
[ 144.369144] JBD2: Error -5 detected when updating journal superblock for nvme0n1p5-8.
[ 144.369164] Buffer I/O error on dev nvme0n1p5, logical block 0, lost sync page write
[ 144.369169] EXT4-fs (nvme0n1p5): I/O error while writing superblock
[ 144.369173] EXT4-fs (nvme0n1p5): Remounting filesystem read-only
[ 144.369245] EXT4-fs warning (device nvme0n1p5): ext4_end_bio:323: I/O error 10 writing to inode 1311005 (offset 0 size 0 starting block 31736)
get displayed. Had to reset the PC.
Initially I was looking at a bad NVMe SSD, but...
Windows 10 trial-install. So I installed Windows 10 (the free ISO) to check whether the problem is the SSD (Samsung supports only Windows)
- taking the whole disk,
- creating as many partitions as there were on Ubuntu
- did the MS updates
- did a stress test on all partitions, copying files over and over, filling the partitions
- rebooted many times
- → No errors anywhere, nothing in the SMART log (Samsung Magician)
Therefore, it seems hard to blame the SSD.
Ubuntu 19.04 install. Then, on the data partitions created in Windows (except the one where Windows is) I installed Ubuntu 19.04 (not 18.04), so those partitions went from ntfs to ext4.
- All went fine
- Did apt update && apt upgrade
- Reboot
- Using Ubuntu,
- → then same as before, pages and pages of the same messages as above
- → Reboot: goes to Grub "shell"
Booting (from BIOS) to Windows is OK. Ubuntu boot goes to grub shell.
Not sure what to do next ...
- could it be that the Ubuntu upgrade affects Grub (due to the NVMe SSD)?
- could it be a bad SSD? But in that case, why does Windows have no problem?
- both 18.04.3 and 19.04 use kernel 5.0.x; could that be linked to the problem?
⇒ Edit ⇒ It seems this problem has happened to many people, mostly on Samsung NVMe drives.
Following the advice on these pages, I installed Ubuntu (18.04.3, kernel 5.0.0.19) adding the option
nvme_core.default_ps_max_latency_us=0
to the Grub Linux launch command → it was fine. Then rebooted, adding the same option to the command → fine. And confirmed APST was disabled:
# nvme get-feature -f 0x0c -H /dev/nvme0
Autonomous Power State Transition Enable (APSTE): Disabled
Then did (same session)
# apt update
# apt upgrade
and a few seconds later... the error messages are back, covering the screen, GUI locked, even invading the other Ctrl+Alt+Fx consoles... (actually it happens after some time, not just because the command is apt). Not sure if it is a Linux bug, or Samsung ignoring Linux, but I'm stuck!
(edit +7h
- Removed the Samsung,
- plugged (SATA) a "classic" SSD,
- installed Ubuntu 19.04, made all updates, did a find / -ls and other commands, used the graphics, rebooted several times
- → Works!
My next step is to replace the Samsung with a SanDisk NVMe to be purchased tomorrow (have had one in a laptop with Ubuntu for months, no problem at all)
Will keep the page updated!
)
-
Déjà vu almost 4 years
What's your Ubuntu version?
-
GHorev almost 4 years
I had the same errors on 18.04 and 20.04. CPU: i7-8750H, SSD: ADATA SX6000PNP
-
Matjaz over 3 years
My system worked fine for 1.5 years, but started to crash as described here after I filled it to more than about 300 GB (at least, this seems the most likely trigger). It seems I also solved it with pcie_aspm=off. I have been error-free for 6 hours now, whereas previously I got an error within half an hour. I am on an AMD TR 2950X, Asrock X399 Taichi and Samsung SSD 970 PRO 512GB (Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983).
-
curran about 3 years
To add some detail: in order to "add" that option, it needs to be inserted inside the space-delimited string. The relevant line in /etc/default/grub looks like this before the edit:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
and like this after the edit:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off"
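A quick way to sanity-check the edit before running sudo update-grub is to grep for the option. A minimal sketch (demonstrated on a temp copy; on the real system, grep /etc/default/grub itself):

```shell
# Write the expected edited line to a temp file standing in for /etc/default/grub:
tmp=$(mktemp)
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off"' > "$tmp"
# Count occurrences of the option (should be exactly 1):
grep -c 'pcie_aspm=off' "$tmp"
# → 1
rm -f "$tmp"
```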