Making SATA disk write cache safe

cache sata data-integrity linux raid

6,079

Solution 1

Upgrading to kernel 2.6.38-2-amd64 (from sid) fixes the problem, at the cost of a huge performance penalty (very similar to just turning off the write caches).

After some research into this, it seems that MD didn't support I/O barriers (except on RAID1) until 2.6.33-rc1 (commit a2826aa92e2e14db372eda01d333267258944033).
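
As a rough illustration of the version cutoff mentioned above, here is a minimal Python sketch that checks whether a kernel release string is at least 2.6.33. The function names and the `MD_BARRIER_MIN` constant are made up for this example, and the check only looks at the upstream version number, so it cannot detect distribution kernels that backport (or lack) the change.

```python
import re

# First kernel series with MD barrier support beyond RAID1, per the commit
# cited above. Assumption: only the upstream version number is considered.
MD_BARRIER_MIN = (2, 6, 33)

def parse_kernel_release(release):
    """Extract the numeric prefix from a release string like '2.6.32-5-amd64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))

def md_has_barriers(release):
    """Return True if the release is at or past the 2.6.33 cutoff."""
    return parse_kernel_release(release) >= MD_BARRIER_MIN
```

On a running system, `md_has_barriers(platform.release())` (after `import platform`) would apply the same check to the current kernel.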

Solution 2

Yes, as far as I know this is the cost of being safe. The PostgreSQL mailing lists have many threads about data safety and its speed cost at every filesystem and storage layer. Lately they have been discussing SSD safety, for example: only the Vertex 2 Pro and the latest Intel SSD series, which have a small memory attached (like the battery-backed cache in a RAID controller), are safe for database use, and the problem with SSDs can't be fixed by disabling the write cache.

Here are two links, but there are many more examples in the mailing list archives; a search will turn them up.

http://archives.postgresql.org/pgsql-performance/2010-06/msg00076.php

http://archives.postgresql.org/pgsql-general/2011-04/msg00709.php

Solution 3

That's why you really should be using a hardware RAID controller with a BBU (battery backup unit). Then you can have your write cache on and still be safe.


derobert

Updated on September 18, 2022

Comments

derobert almost 2 years
Supposedly (see, e.g., a question about it here), on NCQ-enabled drives the write cache is safe, in that the drive doesn't lie to the OS about data being committed to the platters when it isn't. I'm trying to figure out what settings are required to make this a reality.

I'm using diskchecker.pl to confirm whether all blocks survive a pull of the power plug. The server is configured like this:
- 4x ST3500514NS running in Linux MD RAID10. Intel 3420 chipset. In AHCI mode.
- LVM running on RAID10.
- Tested filesystem is ext4 (with barrier=1,data=ordered) on a logical volume. I also tried testing directly on a logical volume (block device); that didn't help.
- Debian 6.0 (squeeze); kernel 2.6.32-5-amd64
If I turn off write-cache (hdparm -W0), then it works (at a huge performance penalty). So it seems like the upper layers are capable.
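
The idea behind this kind of power-pull test can be sketched in Python as follows. This is not diskchecker.pl itself, only a simplified local illustration: the function names, the block size, and the block contents are assumptions for the example. In a real test the list of acknowledged blocks has to be recorded on a different machine (diskchecker.pl uses a separate server for this), since anything kept locally could be lost in the same power cut.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumed block size for this sketch

def make_block(index):
    """Build a block whose content is derived from its index, so it can be re-checked later."""
    seed = hashlib.sha256(str(index).encode()).digest()  # 32 bytes
    return (seed * (BLOCK_SIZE // len(seed)))[:BLOCK_SIZE]

def write_durably(path, indices):
    """Write each block and fsync it; return the indices the OS reported as committed."""
    acknowledged = []
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        for index in indices:
            os.pwrite(fd, make_block(index), index * BLOCK_SIZE)
            os.fsync(fd)  # should not return until the data is on stable storage
            acknowledged.append(index)
    finally:
        os.close(fd)
    return acknowledged

def find_lost_blocks(path, acknowledged):
    """After a power cycle, return acknowledged indices whose on-disk content is wrong."""
    lost = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for index in acknowledged:
            if os.pread(fd, BLOCK_SIZE, index * BLOCK_SIZE) != make_block(index):
                lost.append(index)
    finally:
        os.close(fd)
    return lost
```

If every layer honors the flush, `find_lost_blocks` should return an empty list for blocks acknowledged before the plug was pulled; any index it does return is a write that was reported as committed but never reached the platters.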

I've tried enabling FUA in libata (by passing fua=1 when loading the module, and confirming via dmesg), but that did not help.
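
For reference, confirming the cache and FUA state from the kernel log can be automated with a small Python sketch like the one below. It assumes the line format the Linux sd driver logs when it probes a disk ("Write cache: ..., read cache: ..., supports DPO and FUA" or "doesn't support DPO or FUA"); the function name is hypothetical and the sample text is illustrative, not output from the machine described here.

```python
import re

# Line format logged by the Linux sd driver when it reads a disk's cache settings.
CACHE_LINE = re.compile(
    r"\[(?P<disk>sd[a-z]+)\] Write cache: (?P<write>enabled|disabled), "
    r"read cache: (?P<read>enabled|disabled), "
    r"(?P<fua>supports DPO and FUA|doesn't support DPO or FUA)"
)

def parse_cache_lines(dmesg_text):
    """Map each disk name to its reported write-cache and FUA status."""
    status = {}
    for match in CACHE_LINE.finditer(dmesg_text):
        status[match["disk"]] = {
            "write_cache": match["write"] == "enabled",
            "fua": match["fua"].startswith("supports"),
        }
    return status

# Illustrative sample, NOT taken from the server in this question.
SAMPLE = """\
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
"""
```

Feeding it the real log, for example `parse_cache_lines(subprocess.run(["dmesg"], capture_output=True, text=True).stdout)` after `import subprocess`, gives one entry per disk whose probe line is still in the kernel ring buffer.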

Any suggestions on how to make this work?

edit: found the reason (see my answer); any suggestions on how to get at least some of the performance back?
derobert about 13 years

Yeah. I have a 3ware one on a different box. The data is indeed safe, in the sense that had Sony bought some of those, PSN data would be safe from crackers. Every time a single drive timed out, it'd discard cache on all drives, leading to massive corruption. I turned off the cache.
skuda21 about 13 years

I know the disks you are talking about are not SSDs, but I wanted to point you to one example of the interesting threads about data safety on the PostgreSQL mailing list. They have discussed mechanical disks, storage-layer safety, barriers, write caches, battery-backed units in hardware RAID, and the speed cost of being safe many times.
derobert about 13 years

Ummm, these are Seagate Constellation ES drives. The other server (the one with the 3ware card) has VelociRaptor drives. No cheap desktop drives in sight. Though, honestly, I've got other machines with cheap desktop drives, and they've proved only a little less reliable.
derobert about 13 years

Quite possible. Soon I hope to pull that machine, and do extensive testing on it.
RichVel over 12 years

There is more about write barriers, write caching, etc. in this answer about LVM risks: serverfault.com/questions/279571/lvm-dangers-and-caveats/279577