Are these SATA errors dangerous?
Solution 1
While I essentially agree with Geppettvs D'Constanzo's answer, I would suggest that some of the first things you might also try are
Checking that your SATA cable is securely attached and plugged into the sockets on the motherboard and hard drive.
Replacing your SATA cable. SATA cables are (relatively) inexpensive and you do sometimes get a "bad" one. Often simply replacing the cable is the easiest way to diagnose and solve a problem like this.
(Although it is somewhat unexpected that two cables would both be bad at the same time. Still, it's an easy thing to check so in my opinion probably worth doing.)
I just saw your pastebins containing the SMART data for your drives. Notice the unexpectedly large number of CRC errors for drives sdb and sdc. I suggest you start by checking the cables and connections for those drives.
junior@mediacenter:/$ sudo smartctl -a /dev/sda
...
Model Family: SAMSUNG SpinPoint M7E (AFT)
Device Model: SAMSUNG HM321HI
...
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
junior@mediacenter:/$ sudo smartctl -a /dev/sdb
...
Model Family: SAMSUNG SpinPoint F4 EG (AFT)
Device Model: SAMSUNG HD204UI
...
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 57
junior@mediacenter:/$ sudo smartctl -a /dev/sdc
...
Model Family: SAMSUNG SpinPoint F4 EG (AFT)
Device Model: SAMSUNG HD204UI
...
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 398
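If you want to keep an eye on that counter after reseating or swapping a cable, something along these lines might help. It is only a sketch: crc_error_count is a helper name I made up, and it assumes your drive reports attribute 199 under the name UDMA_CRC_Error_Count, as the three drives above do. The raw value never goes back down, so what matters is whether it keeps climbing once the cable has been changed.

```shell
# Print the raw value (last column) of SMART attribute 199 read from stdin.
# Expects the attribute table that `smartctl -A` (or `smartctl -a`) prints.
# crc_error_count is a hypothetical helper name, not part of smartmontools.
crc_error_count() {
    awk '$1 == 199 && $2 == "UDMA_CRC_Error_Count" { print $NF }'
}

# Example use on real hardware (needs root):
#   sudo smartctl -A /dev/sdb | crc_error_count
```

Run it once before and once some days after the cable swap; if the number has stopped growing, the cable was most likely the problem.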
OK. So not a laptop then. ;-)
Of course, if this is happening on a laptop then none of the above applies and I'm not sure what advice to offer. Maybe remove and re-install the hard drive? Perhaps it just needs to be re-seated in its socket to improve the connection?
sdb and sdc are connected on the same external eSATA cable (Thermaltake Duo HDD Dock). I'll replace my eSATA cable.
It could be due to a faulty or low quality cable. It could also be that the cable is somehow moved, bumped, or otherwise jostled while the drive is being used.
Solution 2
It looks like you have a poor-quality or damaged SATA power/data cable, which may be causing the bad CRCs. The CRC errors themselves do not harm the drive and you can live with them for a while, but if they keep occurring you risk losing a lot of data.
The SMART report of your hard disk drives looks sane, so I suspect power supply issues, based on my experience when setting up 5 hard disk drives in the same case on one power source. I ended up using an external power source (475W) for 2 drives and the case's 600W supply for everything else, including the GPU, the optical drives and the remaining hard disk drives.
Anyway, I suggest you run a full backup before you do anything else. If possible, clone your hard disk drive, after which you should check your cables and power source voltages.
Solution 3
There seems to be a problem between some kernel versions and some SATA controllers.
I have recently started to suffer a very similar problem (not sure if it is exactly the same) on a web server running Scientific Linux.
The most accurate and complete information I have found about this problem is this launchpad bug.
In short: Disabling NCQ seems to be the best workaround for users having this problem.
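The answer does not say how to disable NCQ, so the following is only one common way of doing it and not necessarily what the bug report recommends. It lowers the device's queue depth to 1 through sysfs, which leaves no room for queued commands. set_queue_depth and its optional third argument are names I made up for this sketch; the sysfs path is the usual location on Linux, but check that it exists on your system first.

```shell
# set_queue_depth DEVICE DEPTH [SYSFS_ROOT]
# Writes DEPTH to the device's queue_depth attribute. A depth of 1 leaves
# no room for queued commands, which effectively turns NCQ off.
# SYSFS_ROOT defaults to /sys; it is a hypothetical extra argument that
# exists only so the function can be tried against a scratch directory.
set_queue_depth() {
    qd_file="${3:-/sys}/block/$1/device/queue_depth"
    if [ ! -w "$qd_file" ]; then
        echo "cannot write $qd_file (wrong device name, or not root?)" >&2
        return 1
    fi
    echo "$2" > "$qd_file"
}

# Example use on real hardware, as root, for the drive behind sdc:
#   set_queue_depth sdc 1
```

The setting is lost on reboot. If it helps, the kernel boot parameter libata.force=noncq is, as far as I know, the usual way to make the change permanent.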
Solution 4
Had the same issue. In my case it was due to a 4-pin to SATA power adapter not being plugged in snugly.
Solution 5
This error is unlikely to damage your hard drive but is highly likely to corrupt your filesystem(s). Begin by determining which drive is throwing the errors. This can usually be determined easily by a number of approaches such as:
1) Issuing the command dmesg | grep ata3 and looking for the hard drive make and model (ata3 is the port throwing the error in your situation; adjust accordingly). This will provide output similar to this:
dmesg | grep ata3
[ 4.756081] ata3: SATA max UDMA/133 abar m2048@0xf7f26000 port 0xf7f26200 irq 135
[ 5.071981] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 5.077850] ata3.00: HPA detected: current 1953523055, native 1953525168
[ 5.077959] ata3.00: ATA-8: SAMSUNG HD103SJ, 1AJ10001, max UDMA/133
[ 5.077960] ata3.00: 1953523055 sectors, multi 16: LBA48 NCQ (depth 32), AA
[ 5.084057] ata3.00: configured for UDMA/133
A quick glance indicates that the drive connected to ata3 is the SAMSUNG HD103SJ
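If you would rather not eyeball the output, a small filter can pull out just the model string. This is a sketch under the assumption that your kernel logs the identification line in the same "ataN.MM: ATA-x: MODEL, FIRMWARE, ..." shape as above; ata_model is a hypothetical helper name.

```shell
# ata_model PORT
# Reads kernel log lines on stdin and prints the model string reported
# for devices on PORT (for example "ata3").
# ata_model is a hypothetical helper; it relies on the
# "ataN.MM: ATA-x: MODEL, FIRMWARE, ..." line format shown above.
ata_model() {
    sed -n "s/^.*$1\.[0-9][0-9]*: ATA-[0-9][0-9]*: \([^,]*\),.*\$/\1/p"
}

# Example use:
#   dmesg | ata_model ata3
```

If the port has more than one device behind it, you get one model per line.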
2) Issue the command below:
find -L /sys/bus/pci/devices/*/ata*/host*/target* -maxdepth 3 -name "sd*" 2>/dev/null | egrep block |egrep --colour '(ata[0-9]*)|(sd.*)'
This prints each matching path with both the port (ataN) and the device name (sdX) highlighted on the same line.
It's easy to see that the device connected to ata3 has been assigned the device name sdb
3) Install lsscsi with sudo apt install lsscsi and issue the command lsscsi:
$ lsscsi
[0:0:0:0] cd/dvd ATAPI iHAS124 F CL9M /dev/sr0
[1:0:0:0] disk ATA WDC WD2003FZEX-0 1A01 /dev/sda
[2:0:0:0] disk ATA SAMSUNG HD103SJ 0001 /dev/sdb
[3:0:0:0] disk ATA ST6000VN0033-2EE SC60 /dev/sdc
Note that the first entry on each line above is the scsi_host, channel, target_number and LUN. It is placed in brackets and each element is colon separated. When there are multiple SCSI devices their entries are sorted in ascending order.
Simply adding 1 to the first number in each line of output gives you the ATA port. You can find more detail on lsscsi here and here.
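The "add 1" rule can be applied mechanically. The sketch below takes the rule exactly as stated here and does nothing to verify it, so treat the result as a hint if your system also has non-ATA SCSI hosts such as USB storage. lsscsi_to_ata is a hypothetical helper name.

```shell
# Reads `lsscsi` output on stdin and prints "ataN /dev/xxx" for each line,
# where N is the SCSI host number (first field inside the brackets) plus 1.
# lsscsi_to_ata is a hypothetical helper; the +1 mapping is the rule of
# thumb from the text above, not something this function checks.
lsscsi_to_ata() {
    awk '{
        host = $1
        sub(/^\[/, "", host)    # drop the leading bracket
        sub(/:.*/, "", host)    # keep only the SCSI host number
        print "ata" (host + 1), $NF
    }'
}

# Example use:
#   lsscsi | lsscsi_to_ata
```

For the listing above, the line for /dev/sdb has host number 2, so it comes out as ata3, which matches what dmesg reported.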
Since in your case we are seeing errors thrown on both 3.00 and 3.01, you have more than one drive connected to the same ATA port. You are going to want to carefully check connectivity to both ata3.00 and ata3.01. This could be a multi-bay drive enclosure connected to the same cable. Since both drives are throwing errors, replacing the cable to the aforementioned multi-drive bay should eliminate the problem for both drives. These devices usually have an external power source which also could be the culprit and need to be replaced, but the cable (being the weakest link) is by far the most likely root cause of the problem.
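To see at a glance which links on a port are reporting trouble, you could filter the kernel log along these lines. It is a rough sketch: failing_links is a hypothetical helper name, and it only looks for the "exception" and "failed command" lines that appear in the log from the question, so other kinds of error messages are not counted.

```shell
# Reads kernel log lines on stdin and prints each distinct ataN.MM link
# that logged an "exception" or "failed command" line, one per line.
# failing_links is a hypothetical helper; the two patterns are taken from
# the log in the question and are not an exhaustive list of libata errors.
failing_links() {
    grep -E 'ata[0-9]+\.[0-9]+: (exception|failed command)' |
        grep -oE 'ata[0-9]+\.[0-9]+' |
        sort -u
}

# Example use:
#   dmesg | failing_links
```

Several links under the same ataN prefix point at something they share, such as the cable or the enclosure, rather than at one particular drive.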
Sources:
Experience
https://linux.die.net/man/8/lsscsi
http://sg.danny.cz/scsi/lsscsi.html
https://serverfault.com/questions/244944/linux-ata-errors-translating-to-a-device-name/868943#868943
Marcos Junior
Updated on September 18, 2022
Comments
-
Marcos Junior over 1 year
I'm getting these errors randomly, and I don't know if it's normal or not.
[39441.061856] ata3.00: failed to read SCR 1 (Emask=0x40)
[39441.061866] ata3.01: failed to read SCR 1 (Emask=0x40)
[39441.061892] ata3.15: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
[39441.061897] ata3.15: irq_stat 0x08000000, interface fatal error
[39441.061904] ata3.15: SError: { UnrecovData 10B8B BadCRC }
[39441.061910] ata3.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
[39441.061917] ata3.01: exception Emask 0x100 SAct 0xe SErr 0x0 action 0x6 frozen
[39441.061923] ata3.01: failed command: READ FPDMA QUEUED
[39441.061933] ata3.01: cmd 60/a8:08:b0:48:62/00:00:00:00:00/40 tag 1 ncq 86016 in
[39441.061940] ata3.01: status: { DRDY }
[39441.061944] ata3.01: failed command: READ FPDMA QUEUED
[39441.061953] ata3.01: cmd 60/a8:10:b0:49:62/00:00:00:00:00/40 tag 2 ncq 86016 in
[39441.061959] ata3.01: status: { DRDY }
[39441.061963] ata3.01: failed command: READ FPDMA QUEUED
[39441.061972] ata3.01: cmd 60/58:18:58:4a:62/00:00:00:00:00/40 tag 3 ncq 45056 in
[39441.061978] ata3.01: status: { DRDY }
[39441.061987] ata3.15: hard resetting link
[39441.608302] ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[39441.609090] ata3.00: hard resetting link
[39441.929246] ata3.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[39441.929333] ata3.01: hard resetting link
[39442.249184] ata3.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[39442.263242] ata3.00: configured for UDMA/133
[39442.277570] ata3.01: configured for UDMA/133
[39442.277725] ata3: EH complete
I'm also pasting smartctl -a for sda, sdb and sdc. Thanks in advance for your help.
-
irrational John almost 12 years
Out of curiosity, was that GPU a big, honkin' power hungry GPU?
-
Geppettvs D'Constanzo almost 12 years
nVidia Quadro 4000, not that hungry indeed.
-
irrational John almost 12 years
Interesting. I have a 400w Antec (Neo-Eco) PSU, 5 hard drives, 2 optical drives, and an NVIDIA GeForce 9500 GT and I do not think I have had any power supply related problems. I do have drive CRC errors, but I think they are from stupid user errors I made a while back. (Bumping a cable & such.) I haven't noticed any warning logs in my kernel messages. Still, I guess I should keep a closer watch on it just to be safe.
-
Geppettvs D'Constanzo almost 12 years
1x IDE DVD-RW, 1x SATA DVD-RW and 1x SATA Blu-Ray ROM optical drives on this side. 4 SATA and 1 IDE HDD; the GPU draws 142 watts. I can't say I am absolutely sure that it was a power source issue, but when I added the new power source the problems went away. BTW, my drives seem to be healthy. But thank you for making me see that. Your opinion is really appreciated on this side. Thank you!
-
irrational John almost 12 years
Uh, 142 watts for a GPU is ... something. My entire system (usually) uses less than that. As I type this my desktop box is pulling ~117 watts. (According to the Kill-A-Watt I had forgotten I still have it plugged into. ;-)
-
Marcos Junior almost 12 years
sdb and sdc are both external HDs, connected to another power source.
-
Geppettvs D'Constanzo almost 12 years
You didn't mention that. But thank you for clarifying. Good luck!
-
psusi over 11 years
Disabling NCQ is a common workaround for buggy hardware. There does not appear to be a kernel bug.
-
Xen2050 over 6 years
You mean you changed the SATA port the hard drive was plugged into, right? Or do you mean replaced the entire hard drive with another? I think it's the former, but just double-checking.
-
ultrajohn over 6 years
It's the former.
-
Elder Geek over 4 years
@GeppettvsD'Constanzo This is a good answer, but I would "clone your hard disk drive" after replacing the cables and ensuring appropriate voltage, not before. As the link is going up and down, it will take longer and be more difficult to get an accurate clone while the root cause of the problem still exists. Cheers!
-
Elder Geek over 4 years
After over 30 years of troubleshooting these things for a living, I can assure you that in my experience this is almost always a dodgy cable. And since they are cheap, you try that first.
-
reukiodo over 4 years
Holy $#!+ that worked! All my error messages went away and my system stopped crashing! I entirely disagree that this is not a kernel bug, since I can use older kernel versions (all the way back to at least the 2.6 series) without any crashes. I can't believe I didn't find this sooner!
-
Samik R over 3 years
Had to take the power adapter off and put it back again.
-
Marcelo Scofano Diniz over 3 years
Fine, precise and informative answer, should be up more....