EXT4-fs error after Ubuntu 17.04 upgrade

filesystem ext4 17.04


Solution 1

As pointed out in a comment by Elder Geek, this is due to a known bug.

From the bug report:

APST support just landed in the latest Zesty kernel (4.10.0-14.16) as part of https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1664602. That patch has a quirk for certain 256GB Samsung drives found in Dell laptops that do not behave well when APST is enabled. I am experiencing the same symptoms with the same model laptop except with a 512GB Samsung. Prior to manually disabling APST the drive would die and system would go down in flames with I/O errors within 20 to 40 minutes of boot.

Until a proper fix is implemented, a workaround is suggested, which involves adding a kernel parameter:

Please try nvme_core.default_ps_max_latency_us=5500, if the issue persists, please try nvme_core.default_ps_max_latency_us=200.
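The two suggested values appear to line up with the Ent_Lat and Ex_Lat columns of the power-state table in the asker's smartctl output. The sketch below assumes that the kernel only lets APST use non-operational states (marked "-" in the Op column) whose entry plus exit latency fits within the limit; that is my reading of the driver's behaviour, not something stated in the bug report. The helper name allowed_states and its input format are hypothetical.

```shell
# allowed_states LIMIT_US
# Reads rows of "state op entry_latency_us exit_latency_us" on stdin and prints
# the states that would stay eligible under the given latency limit.
# Assumption: only non-operational states ("-") are APST idle targets, and a
# state is eligible when entry + exit latency <= LIMIT_US.
allowed_states() {
    limit="$1"
    while read -r state op ent ex; do
        [ "$op" = "-" ] || continue              # skip operational states
        [ $((ent + ex)) -le "$limit" ] && echo "$state"
    done
    return 0
}

# Example with the PM951 table from the question:
#   printf '3 - 500 5000\n4 - 2000 22000\n' | allowed_states 5500
```

Under that assumption, 5500 keeps state 3 (500 + 5000) but excludes the deepest state 4 (2000 + 22000), while 200 excludes both.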

To add a kernel boot parameter, edit the configuration file for GRUB:

sudo nano /etc/default/grub

Find the line beginning GRUB_CMDLINE_LINUX_DEFAULT and add the boot parameter to the others already between the quotes. For example, in this case you will probably end up with

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=5500"

Save the file and exit, then to make the change effective, run

sudo update-grub
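If you would rather not edit the file by hand, the steps above can be sketched as two small shell helpers. Both names (add_kernel_param and has_param) are hypothetical and not part of GRUB; the sketch assumes GNU sed and a GRUB_CMDLINE_LINUX_DEFAULT line in double quotes, as in the example above. Try it on a copy of the file first.

```shell
# add_kernel_param FILE PARAM
# Appends PARAM inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT in FILE,
# unless a parameter with the same name is already on that line.
add_kernel_param() {
    file="$1"
    param="$2"
    name="${param%%=*}"
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$name" "$file"; then
        return 0                                 # already set, leave file alone
    fi
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"|" "$file"
}

# has_param CMDLINE NAME
# Succeeds if "NAME=" appears as a separate word in the given command line.
has_param() {
    case " $1 " in
        *" $2="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage (run update-grub afterwards, then check after a reboot):
#   sudo cp /etc/default/grub /etc/default/grub.bak
#   cp /etc/default/grub /tmp/grub && add_kernel_param /tmp/grub nvme_core.default_ps_max_latency_us=5500
#   has_param "$(cat /proc/cmdline)" nvme_core.default_ps_max_latency_us && echo "parameter active"
```

After rebooting, /proc/cmdline shows the parameters the running kernel was actually booted with, which is a quick way to confirm the change took effect.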

Solution 2

First, I'd visit the Samsung support website and make sure that you have the latest firmware installed for your SSD model.

Then, your fsck didn't make a whole lot of sense, so do it this way...

To check the file system on your Ubuntu partition...

boot to the GRUB menu
choose Advanced Options
choose Recovery mode
choose Root access
at the # prompt, type sudo fsck -f /
repeat the fsck command if there were errors
type reboot
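To decide whether fsck needs to be repeated, it can help to look at its exit status, which the fsck man page documents as a sum of flag values. The helper below is a minimal sketch that decodes that status; the name describe_fsck_status is hypothetical.

```shell
# describe_fsck_status STATUS
# Translates an fsck exit status (a sum of documented flag values) into words.
describe_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    [ $((status & 1)) -ne 0 ]   && out="$out errors-corrected"
    [ $((status & 2)) -ne 0 ]   && out="$out reboot-needed"
    [ $((status & 4)) -ne 0 ]   && out="$out errors-left-uncorrected"
    [ $((status & 8)) -ne 0 ]   && out="$out operational-error"
    [ $((status & 16)) -ne 0 ]  && out="$out usage-error"
    [ $((status & 32)) -ne 0 ]  && out="$out cancelled"
    [ $((status & 128)) -ne 0 ] && out="$out shared-library-error"
    echo "${out# }"
}

# Possible usage at the recovery prompt:
#   fsck -f /; describe_fsck_status $?
```

A status that includes errors-left-uncorrected is the case where running fsck again is worthwhile.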


Ben B

Updated on September 18, 2022

Comments

Ben B over 1 year

I have a Dell XPS 15 9550. I've been running Ubuntu 16.10 on it for four months with no dramas.

Two days ago, I upgraded to Ubuntu 17.04. About an hour after upgrading, my drive was remounted read-only. When I switched to a tty, this appeared:

[ 746.341551] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #525023: comm NetworkManager: reading directory iblock 0
[ 746.343318] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #524289: comm pool: reading directory iblock 0
[ 746.356125] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272213: comm systemd-udevd: reading directory iblock 0
[ 746.356139] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
[ 746.356332] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272193: comm systemd-udevd: reading directory iblock 0
[ 746.356338] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272825: comm systemd-udevd: reading directory iblock 0
[ 746.356400] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
[ 746.474632] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #524539: comm unity-settings-: reading directory iblock 0
[ 746.992814] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #5506108: comm BrowserBlocking: reading directory iblock 0
[ 746.304451] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #5506117: comm BrowserBlocking: reading directory iblock 0

Here's what fdisk -l shows:

Disk /dev/nvme0n1: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 3CD27380-DAC8-48DC-910A-D084CE857DA3

Device             Start        End   Sectors   Size Type
/dev/nvme0n1p1      2048    1026047   1024000   500M EFI System
/dev/nvme0n1p2   1026048    1288191    262144   128M Microsoft reserved
/dev/nvme0n1p3   1288192  487948287 486660096 232.1G Microsoft basic data
/dev/nvme0n1p4 972302336  973223935    921600   450M Windows recovery environmen
/dev/nvme0n1p5 973223936  998094847  24870912  11.9G Windows recovery environmen
/dev/nvme0n1p6 998094848 1000204287   2109440     1G Windows recovery environmen
/dev/nvme0n1p7 487948288  939046911 451098624 215.1G Linux filesystem
/dev/nvme0n1p8 939046912  972302335  33255424  15.9G Linux swap

Partition table entries are not in disk order.

I rebooted, and continued to get the error around once an hour, so I reinstalled Ubuntu 17.04 from scratch. However, I am still getting the same issue.

I tried running fsck by creating a /forcefsck file (I created a wrapper shell script that adds the -v flag and outputs stdout to a file). Here's the result:

fsck.fat 4.0 (2016-05-06)                               
Checking we can access the last sector of the filesystem
Boot sector contents:                                   
System ID "MSDOS5.0"                                    
Media byte 0xf8 (hard disk)                             
       512 bytes per logical sector                     
      4096 bytes per cluster                            
      6206 reserved sectors                             
First FAT starts at byte 3177472 (sector 6206)          
         2 FATs, 32 bit entries                         
    508416 bytes per FAT (= 993 sectors)                
Root directory start at cluster 2 (arbitrary size)      
Data area starts at byte 4194304 (sector 8192)          
    126976 data clusters (520093696 bytes)              
63 sectors/track, 255 heads                             
      2048 hidden sectors                               
   1024000 sectors total                                
Reclaiming unconnected clusters.                        
Checking free cluster summary.                          
/dev/nvme0n1p1: 212 files, 15526/126976 clusters

I tried booting from a live USB and running e2fsck -p /dev/nvme0n1p7 as suggested here (https://askubuntu.com/a/768813/679041). It didn't give any errors.

I also tried to run smartctl -t long /dev/nvme0n1p7; however, the results seem to indicate that the tool doesn't work with my particular SSD:

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.10.0-19-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       PM951 NVMe SAMSUNG 512GB
Serial Number:                      S29PNX0H611013
Firmware Version:                   BXV77D0Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Utilization:            254,982,533,120 [254 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Apr 17 17:45:48 2017 AEST
Firmware Updates (0x06):            3 Slots
Optional Admin Commands (0x0017):   Security Format Frmw_DL *Other*
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.00W       -        -    0  0  0  0        5       5
 1 +     4.20W       -        -    1  1  1  1       30      30
 2 +     3.10W       -        -    2  2  2  2      100     100
 3 -   0.0700W       -        -    3  3  3  3      500    5000
 4 -   0.0050W       -        -    4  4  4  4     2000   22000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x2002

Any idea why this issue might be occurring and how I might solve it? Thanks! :)

Admin about 7 years

Welcome to AskUbuntu! It looks like you may be affected by this bug. I recommend that you let the devs know that this bug also affects you and subscribe to the bug so that you can be notified of progress/resolution.
Admin about 7 years

I'm having the exact same problem on a Lenovo Thinkpad X270 with a Toshiba SSD "THNSF5256GPUK TOSHIBA". I guess it's good to know I'm not the only one.
Admin about 7 years

@ElderGeek reading the linked bug report, it seems that until the issue is fixed, a temporary fix would be to disable APST, however from the discussion there it is unclear to me how to do that. It seems like a way to do so would be a valid answer to this question.
Admin about 7 years

Thanks for your comments guys :) Impatiently, I reinstalled again last night, however this time I explicitly formatted /dev/nvme0n1p7 and deleted /dev/nvme0n1p8 beforehand (I thought perhaps a reinstall with all the default options might not actually format, and instead only delete old files before installing new ones). Am yet to experience the issue after 4 hours of uninterrupted use however only time will tell. You'll hear my sobs across the pacific if I do :)
Admin about 7 years

OK I can confirm - I just got the issue again despite completely formatting the partition. Will add comment to bug linked above
Admin about 7 years

@Maeher That's certainly possible, however I can't seem to find the kernel option parameter to disable it. Perhaps you can. For a complete list of all known options, please see the file Documentation/kernel-parameters.txt in the kernel source tree and the individual architecture-specific documentation files. :-)
Admin about 7 years

I just want to report that I have the same machine and experience the same bug, but with Ubuntu 16.04 LTS and kernel 4.8.0.48.20 (I have the hardware enablement stack). I think they backported the bug from 4.10, because it started happening just one or two kernel updates ago (well after the update from 4.4 to 4.8). For me the workaround is just using older kernels, but that's not advisable in 17.04, I guess...
Admin about 6 years

VTR (Vote to Reopen). This problem affects many users, including one today who confirms that this Q&A's accepted answer solves it: askubuntu.com/questions/1018685/… Because there are numerous times we recommend changing kernel parameters in grub for suspend/resume, graphics cards, or whatever to work properly, this question should be treated similarly.

Ben B about 7 years

Thanks for your response! I've reinstalled, but this time I explicitly formatted the problem partition first (in case the default reinstall process didn't actually format). Hopefully it's OK now, however if the issue persists I'll run an fsck and post the results (though I would say if the problem persists on a freshly formatted partition, it might be beyond fsck's capabilities)
Ben B about 7 years

The issue occurred again; however, as pointed out by Elder Geek in the comments below my question, it seems to be due to a known bug (bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184).
Boris Hamanov about 7 years

@BenB did you ever check the firmware in your Samsung SSD, as I had suggested earlier? Depending on the model, they had some very mandatory updates to make the drive work right.
justmyfault about 7 years

Is this fix working for you? BTW, here is a link on how to set kernel parameters for anyone who might stumble on your answer: wiki.ubuntu.com/Kernel/KernelBootParameters
Ben B about 7 years

I'm not actually 100% sure how to do this. I found some firmware here however I am not 100% certain any of those apply to my particular SSD. The bug report doesn't point to any firmware-related problems anyway, so at this point I'd rather wait for more info from the devs tackling the bug before trying to upgrade the firmware (knowing me, I'd do it wrong and lose all my stuff :P).
Boris Hamanov about 7 years

The Samsung Magician Software shown on the link that you gave is an excellent way to check your firmware. Your model # and firmware version are shown in your SMART report. When you run fsck does it show any errors?
Ben B about 7 years

fsck shows no errors. The problem isn't any sort of firmware issue or SSD corruption. It's due to APST, which has been enabled in 17.04. Setting the kernel parameter 'nvme_core.default_ps_max_latency_us=5500' has fixed the issue for me, and others have reported that disabling APST altogether fixes it for them.
lukecampbell over 6 years

I am running Ubuntu 16.04 and I have been upgrading packages piecewise to zesty, something I wouldn't recommend to anyone but doing out of necessity. The last package was libc, something so integral to the system that if something would go wrong it would be while upgrading libc. On reboot, I saw all of the EXT4 errors mentioned in the question above, but adding the kernel parameter finally allowed me to reboot in peace and continue. Thank you.
PPP over 6 years

any updates on this? I'm suffering this problem on my razer blade stealth with a samsung 512gb ssd
Ben B over 6 years

The above workaround worked for me, but the bug has been fixed in package linux - 4.10.0-22.24. If you are still having issues you should open up a new bug report on launchpad.
Mike Schroll over 5 years

I tried both values, but it still crashed. nvme_core.default_ps_max_latency_us=0 worked for me. Kernel 4.15.0-36-generic Ubuntu 16.04
Vanja D. over 3 years

I am trying with nvme_core.default_ps_max_latency_us=5500 on 4.15.0-122-generic Ubuntu 18.04.5 LTS.