Are these SATA errors dangerous?


Solution 1

While I essentially agree with Geppettvs D'Constanzo's answer, I would suggest that some of the first things you might also try are

  1. Checking that your SATA cable is securely attached and plugged into the sockets on the motherboard and hard drive.

  2. Replacing your SATA cable. SATA cables are (relatively) inexpensive and you do sometimes get a "bad" one. Often simply replacing the cable is the easiest way to diagnose and solve a problem like this.

(Although it is somewhat unexpected that two cables would both be bad at the same time. Still, it's an easy thing to check so in my opinion probably worth doing.)

I just saw your pastebins containing the SMART data for your drives. Notice the unexpectedly large number of CRC errors for drives sdb and sdc. I suggest you start by checking the cables and connections for those drives.

junior@mediacenter:/$ sudo  smartctl -a /dev/sda
...
Model Family:     SAMSUNG SpinPoint M7E (AFT)
Device Model:     SAMSUNG HM321HI
...
199 UDMA_CRC_Error_Count    0x0036   200   200   000   Old_age  Always -    0

junior@mediacenter:/$ sudo  smartctl -a /dev/sdb
...
Model Family:     SAMSUNG SpinPoint F4 EG (AFT)
Device Model:     SAMSUNG HD204UI
...
199 UDMA_CRC_Error_Count    0x0036   100   100   000   Old_age  Always  -  57

junior@mediacenter:/$ sudo  smartctl -a /dev/sdc
...
Model Family:     SAMSUNG SpinPoint F4 EG (AFT)
Device Model:     SAMSUNG HD204UI
...
199 UDMA_CRC_Error_Count    0x0036   100   100   000   Old_age  Always  - 398
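If you want to keep an eye on just this counter, a small helper along these lines may do. It is only a sketch: crc_count is a name I made up, and it assumes the raw value is the last column of the attribute 199 line, as it is in the output above.

```shell
# Print the raw UDMA_CRC_Error_Count (SMART attribute 199) from
# `smartctl -A` output supplied on stdin.
# crc_count is a hypothetical helper name, not part of smartmontools.
crc_count() {
    # Assumes the raw value is the last field on the attribute line.
    awk '$1 == 199 { print $NF }'
}

# Example use (needs root and smartmontools; device names are the ones
# from the question, adjust to your system):
#   for d in sda sdb sdc; do
#       printf '%s: ' "$d"
#       sudo smartctl -A "/dev/$d" | crc_count
#   done
```

The raw count never goes back down, so what matters is whether it keeps climbing after you have reseated or replaced the cable.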

OK. So not a laptop then. ;-)
Of course, if this is happening on a laptop then none of the above applies and I'm not sure what advice to offer. Maybe remove and re-install the hard drive? Perhaps it just needs to be re-seated in its socket to improve the connection?


sdb and sdc are connected on the same external e-sata cable (Thermaltake Duo HDD Dock). I'll replace my e-sata cable.

It could be due to a faulty or low quality cable. It could also be that the cable is somehow moved, bumped, or otherwise jostled while the drive is being used.

Solution 2

It looks like you have a poor-quality or damaged SATA power or data cable, which may be causing the bad CRCs. The errors aren't harmful to the drive itself and you can live with them for a while, but left unfixed you risk losing a lot of data.

The SMART reports for your hard disk drives look sane, so I suspect a power supply issue, based on my experience of running 5 hard disk drives in the same case from one power source. I ended up using an external power source (475W) for 2 drives and the case's 600W supply for everything else, including the GPU, the optical drives and the remaining hard disk drives.

Anyway, I suggest you run a full backup before you do anything else. If possible, clone your hard disk drive, after which you should check your cables and power source voltages.

Solution 3

There seems to be a problem between some kernel versions and some SATA controllers.

I have recently started to suffer a very similar problem (not sure if it is just the same) on a web server running Scientific Linux.

The most accurate and complete information I have found about such problem is this launchpad bug.

In short: Disabling NCQ seems to be the best workaround for users having this problem.
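For anyone who wants to try that workaround, here is a rough sketch of turning NCQ off for a single disk at runtime by setting its queue depth to 1 through sysfs. The function name disable_ncq and the SYSFS_ROOT variable are my own inventions for illustration (SYSFS_ROOT only exists so the function can be pointed at a scratch directory); the queue_depth file itself is the standard libata knob.

```shell
# Set the queue depth of one disk to 1, which effectively turns NCQ off
# for that disk until the next reboot. Run as root.
# disable_ncq and SYSFS_ROOT are hypothetical names used for this sketch;
# SYSFS_ROOT defaults to the real /sys.
disable_ncq() {
    dev="$1"    # kernel device name without /dev/, e.g. sdb
    path="${SYSFS_ROOT:-/sys}/block/$dev/device/queue_depth"
    if [ ! -w "$path" ]; then
        echo "cannot write $path" >&2
        return 1
    fi
    echo 1 > "$path"
}

# Example use (as root):
#   disable_ncq sdb
#   cat /sys/block/sdb/device/queue_depth
```

To make the change survive a reboot, the usual route is the kernel boot parameter libata.force=noncq, which applies to every port, or a port-specific form such as libata.force=3.00:noncq. Expect somewhat lower throughput under heavy parallel I/O with NCQ off.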

Solution 4

Had the same issue. In my case it was due to a 4-pin to SATA power adapter not being plugged in snugly.

Solution 5

This error is unlikely to damage your hard drive but is highly likely to corrupt your filesystem(s). Begin by determining which drive is throwing the errors. This can usually be determined easily by a number of approaches such as:

1) Issuing the command dmesg | grep ata3 and looking for the hard drive make and model (ata3 is the port throwing the error in your situation; adjust accordingly). This will provide output similar to this:

dmesg | grep ata3
[    4.756081] ata3: SATA max UDMA/133 abar m2048@0xf7f26000 port 0xf7f26200 irq 135
[    5.071981] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    5.077850] ata3.00: HPA detected: current 1953523055, native 1953525168
[    5.077959] ata3.00: ATA-8: SAMSUNG HD103SJ, 1AJ10001, max UDMA/133
[    5.077960] ata3.00: 1953523055 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    5.084057] ata3.00: configured for UDMA/133

A quick glance indicates that the drive connected to ata3 is the SAMSUNG HD103SJ
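If you have several ports to check, the same glance can be scripted. This is a sketch only: ata_models is a made-up name, and it assumes the identification line has the "ataN.NN: ATA-x: MODEL, FIRMWARE, ..." shape shown above.

```shell
# Read kernel log text on stdin and print "link model" for every drive
# identification line, e.g. "ata3.00 SAMSUNG HD103SJ".
# ata_models is a hypothetical helper name. It assumes lines of the form
#   ataN.NN: ATA-x: MODEL, FIRMWARE, max UDMA/133
ata_models() {
    sed -nE 's/.*(ata[0-9]+\.[0-9]+): ATA-[0-9]+: ([^,]+),.*/\1 \2/p'
}

# Example use:
#   dmesg | ata_models
```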

2) Issue the command below:

find -L /sys/bus/pci/devices/*/ata*/host*/target* -maxdepth 3 -name "sd*" 2>/dev/null | egrep block |egrep --colour '(ata[0-9]*)|(sd.*)'

This will provide both the ports and the device names highlighted on the same line as seen below:

(screenshot of the command output)

It's easy to see that the device connected to ata3 has been assigned the device name sdb

3) Install lsscsi with sudo apt install lsscsi and issue the command lsscsi:

$ lsscsi
[0:0:0:0]    cd/dvd  ATAPI    iHAS124   F      CL9M  /dev/sr0 
[1:0:0:0]    disk    ATA      WDC WD2003FZEX-0 1A01  /dev/sda 
[2:0:0:0]    disk    ATA      SAMSUNG HD103SJ  0001  /dev/sdb 
[3:0:0:0]    disk    ATA      ST6000VN0033-2EE SC60  /dev/sdc 

Note that the first entry on each line above is the scsi_host, channel, target_number and LUN. It is placed in brackets and each element is colon separated. When there are multiple SCSI devices their entries are sorted in ascending order.

Simply adding 1 to the first number in each line of output gives you the ATA port. You can find more detail on lsscsi here and here.
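The add-one rule can be applied mechanically. The sketch below reads lsscsi output and prints the ATA port next to each device node; scsi_to_ata is a name I made up, and the mapping is only as reliable as the rule of thumb itself, so cross-check against dmesg if the result looks odd.

```shell
# Read `lsscsi` output on stdin and print "ataN /dev/xxx" for each line,
# using the host-number-plus-one rule of thumb.
# scsi_to_ata is a hypothetical helper name, not part of lsscsi.
scsi_to_ata() {
    awk 'NF {
        host = $1
        sub(/^\[/, "", host)       # "[2:0:0:0]" becomes "2:0:0:0]"
        split(host, parts, ":")    # parts[1] is the SCSI host number
        printf "ata%d %s\n", parts[1] + 1, $NF
    }'
}

# Example use:
#   lsscsi | scsi_to_ata
```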

Since in your case we are seeing errors thrown on both 3.00 and 3.01, you have more than one drive connected to the same ATA port. You are going to want to carefully check connectivity to both ata3.00 and ata3.01. This could be a multi-bay drive enclosure connected to the same cable. Since both drives are throwing errors, replacing the cable to the aforementioned multi-drive bay should eliminate the problem for both drives. These devices usually have an external power source, which could also be the culprit and need to be replaced, but the cable (being the weakest link) is by far the most likely root cause of the problem.
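To see at a glance which links are reporting exceptions, and how often, you can tally them from the kernel log. A minimal sketch, with ata_exception_links being a name of my own choosing:

```shell
# Read kernel log text on stdin and print "link count" for every ATA link
# that logged an exception, e.g. "ata3.01 1".
# ata_exception_links is a hypothetical helper name.
ata_exception_links() {
    grep -oE 'ata[0-9]+\.[0-9]+: exception' |
        cut -d: -f1 |
        sort |
        uniq -c |
        awk '{ print $2, $1 }'
}

# Example use:
#   dmesg | ata_exception_links
```

If every link behind one port shows up (in the question that is ata3.00, ata3.01 and ata3.15, the last of which I believe is the port multiplier's own control port), the shared cable or enclosure is a more likely suspect than any single drive.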

Sources:

Experience

https://linux.die.net/man/8/lsscsi

http://sg.danny.cz/scsi/lsscsi.html

https://serverfault.com/questions/244944/linux-ata-errors-translating-to-a-device-name/868943#868943

Marcos Junior
Author by

Marcos Junior

Updated on September 18, 2022

Comments

  • Marcos Junior
    Marcos Junior over 1 year

    I'm getting these errors randomly, and I don't know if it's normal or not.

    [39441.061856] ata3.00: failed to read SCR 1 (Emask=0x40)
    [39441.061866] ata3.01: failed to read SCR 1 (Emask=0x40)
    [39441.061892] ata3.15: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
    [39441.061897] ata3.15: irq_stat 0x08000000, interface fatal error
    [39441.061904] ata3.15: SError: { UnrecovData 10B8B BadCRC }
    [39441.061910] ata3.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
    [39441.061917] ata3.01: exception Emask 0x100 SAct 0xe SErr 0x0 action 0x6 frozen
    [39441.061923] ata3.01: failed command: READ FPDMA QUEUED
    [39441.061933] ata3.01: cmd 60/a8:08:b0:48:62/00:00:00:00:00/40 tag 1 ncq 86016 in
    [39441.061940] ata3.01: status: { DRDY }
    [39441.061944] ata3.01: failed command: READ FPDMA QUEUED
    [39441.061953] ata3.01: cmd 60/a8:10:b0:49:62/00:00:00:00:00/40 tag 2 ncq 86016 in
    [39441.061959] ata3.01: status: { DRDY }
    [39441.061963] ata3.01: failed command: READ FPDMA QUEUED
    [39441.061972] ata3.01: cmd 60/58:18:58:4a:62/00:00:00:00:00/40 tag 3 ncq 45056 in
    [39441.061978] ata3.01: status: { DRDY }
    [39441.061987] ata3.15: hard resetting link
    [39441.608302] ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    [39441.609090] ata3.00: hard resetting link
    [39441.929246] ata3.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
    [39441.929333] ata3.01: hard resetting link
    [39442.249184] ata3.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
    [39442.263242] ata3.00: configured for UDMA/133
    [39442.277570] ata3.01: configured for UDMA/133
    [39442.277725] ata3: EH complete
    

    I'm also pasting smartctl -a for sda, sdb and sdc.

    Thanks in advance for your help.

  • irrational John
    irrational John almost 12 years
    Out of curiosity, was that GPU a big, honkin' power hungry GPU?
  • Geppettvs D'Constanzo
    Geppettvs D'Constanzo almost 12 years
    nVidia Quadro 4000, not that hungry indeed.
  • irrational John
    irrational John almost 12 years
    Interesting. I have a 400w Antec (Neo-Eco) PSU, 5 hard drives, 2 optical drives, and an NVIDIA GeForce 9500 GT and I do not think I have had any power supply related problems. I do have drive CRC errors, but I think they are from stupid user errors I made a while back. (Bumping a cable & such.) I haven't noticed any warning logs in my kernel messages. Still, I guess I should keep a closer watch on it just to be safe.
  • Geppettvs D'Constanzo
    Geppettvs D'Constanzo almost 12 years
    1xIDE DVD-RW, 1xSATA DVD-RW and 1xSATA Blu-Ray ROM optical drives on this side. 4 SATA and 1 IDE HDD; the GPU has a power consumption of 142 watts. I can't say I am absolutely sure that it was a power source issue, but when I added the new power source the problems went away. BTW, my drives seem to be healthy. But thank you for making me see that. Your opinion is really appreciated on this side. Thank you!
  • irrational John
    irrational John almost 12 years
    Uh, 142 watts for a GPU is ... something. My entire system (usually) uses less than that. As I type this my desktop box is pulling ~117 watts. (According to the Kill-A-Watt I had forgotten I still have it plugged into. ;-)
  • Marcos Junior
    Marcos Junior almost 12 years
    sdb and sdc are both external hd's, connected on another power source.
  • Geppettvs D'Constanzo
    Geppettvs D'Constanzo almost 12 years
    You didn't mention that. But thank you for clarifying. Good luck!
  • psusi
    psusi over 11 years
    Disabling NCQ is a common workaround for buggy hardware. There does not appear to be a kernel bug.
  • Xen2050
    Xen2050 over 6 years
    You mean you changed the SATA port the hard drive was plugged into, right? Or do you mean replaced the entire hard drive with another? I think it's the former, but just double-checking
  • ultrajohn
    ultrajohn over 6 years
    It's the former.
  • Elder Geek
    Elder Geek over 4 years
    @GeppettvsD'Constanzo This is a good answer, but I would "clone your hard disk drive" after replacing the cables and ensuring appropriate voltage, not before. As the link is going up and down, it will take longer and be more difficult to get an accurate clone while the root cause of the problem still exists. Cheers!
  • Elder Geek
    Elder Geek over 4 years
    After over 30 years of troubleshooting these things for a living, I can assure you that in my experience this is almost always a dodgy cable. And since they are cheap you try that first.
  • reukiodo
    reukiodo over 4 years
    Holy $#!+ that worked! All my error messages went away and my system stopped crashing! I entirely disagree with not a kernel bug, since I can use older kernel version (all the way back to at least 2.6 series) without any crashes. I can't believe I didn't find this sooner!
  • Samik R
    Samik R over 3 years
    Had to take the power adapter off and put it back again.
  • Marcelo Scofano Diniz
    Marcelo Scofano Diniz over 3 years
    Fine, precise and informative answer, should be up more....