What happens if a file is modified while you're copying it?

cp file-copy

Solution 1

If fileA.big grows during the copy, the copy will include the data that was appended.

If the file is truncated to a length shorter than the copy's current position, the copy stops right where it is, and the destination file contains whatever had been copied up to that point.

Solution 2

Patrick has it more or less correct, but here's why. The way you copy a file under UNIX works like this:

1. Try to read some (more) bytes from fileA.
2. If we failed to get bytes because we're at (or past) the end of the file, we're done; quit.
3. Otherwise, write the bytes to fileB and loop back to step 1.
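
The three steps above can be sketched in a few lines of Python. This is only an illustration of the loop, not cp's actual source; the name `naive_copy` and the 64 KiB chunk size are assumptions made for the example, and real cp implementations are more elaborate.

```python
import os

def naive_copy(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path with a plain read/write loop.

    Returns the number of bytes copied. Hypothetical helper, not cp's code.
    """
    copied = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                # Step 1: try to read some (more) bytes from the source.
                chunk = os.read(src, chunk_size)
                # Step 2: an empty read means we are at (or past) the end.
                if not chunk:
                    break
                # Step 3: write the bytes out, then loop back to step 1.
                written = 0
                while written < len(chunk):
                    written += os.write(dst, chunk[written:])
                copied += len(chunk)
        finally:
            os.close(dst)
    finally:
        os.close(src)
    return copied
```

Note that the loop never looks at the file's size up front. The only end-of-file test is "did the last read return nothing?", which is what produces the corner cases discussed here.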

Knowing that, and knowing it's as simple as that, lets us see some corner cases.

As soon as we find the end of the file, the copy is done. So let's say our file is growing during the copy, but is growing more slowly than we're copying it. The copy program will keep going past the original file size, because by the time it gets there, there is more to the file. But at some point, it catches up with the end of the file, and it knows it's at the end because it can't read any more bytes right now. So it quits right there, even if the file is about to grow further.
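
To make the growing-file case concrete, here is a small Python simulation. It is a sketch under assumptions: the function name `copy_while_growing`, the tiny 4-byte reads, and the in-memory "destination" are all made up for the demonstration, and the appends are interleaved by hand rather than by a real concurrent writer.

```python
import os
import tempfile

def copy_while_growing():
    """Simulate a read loop over a file that is appended to mid-copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "fileA.big")
        with open(src_path, "wb") as f:
            f.write(b"A" * 8)

        copied = bytearray()  # stands in for the destination file
        with open(src_path, "rb", buffering=0) as src, \
             open(src_path, "ab", buffering=0) as writer:
            first = True
            while True:
                chunk = src.read(4)
                if not chunk:      # end of file as seen right now
                    break
                copied += chunk
                if first:
                    # The file grows while the copy is still in progress.
                    writer.write(b"B" * 4)
                    first = False
            # The file grows again, but the copy has already seen EOF.
            writer.write(b"C" * 4)

        with open(src_path, "rb") as f:
            final_source = f.read()
    return bytes(copied), final_source

copied, final_source = copy_while_growing()
print(copied)        # b'AAAAAAAABBBB'
print(final_source)  # b'AAAAAAAABBBBCCCC'
```

The data appended before the loop reached the end is in the copy; the data appended after the empty read is not.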

If the file is truncated, the copy program says "Whoa, I'm past the end of the file!" and quits.
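
The truncation case can be simulated the same way. This is a sketch for a UNIX-like system; the function name `copy_while_truncated` and the 10-byte file are assumptions chosen to keep the example small.

```python
import os
import tempfile

def copy_while_truncated():
    """Read part of a file, truncate it below the read offset, read again."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "fileA.big")
        with open(src_path, "wb") as f:
            f.write(b"0123456789")

        with open(src_path, "rb", buffering=0) as src:
            copied = src.read(6)       # the copy is now at offset 6
            os.truncate(src_path, 4)   # the file shrinks to 4 bytes
            rest = src.read(6)         # offset 6 is past the new end
    return copied, rest

copied, rest = copy_while_truncated()
print(copied)  # b'012345'
print(rest)    # b''
```

The second read returns nothing, which the copy loop treats as end of file. The destination keeps the six bytes already copied, even though the source now holds only four.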

And if pieces of the file are being updated at random by, say, a database program :-), then your copy is going to be some mix of old and new data, because the data is not all copied at the same time. The result will probably be a corrupt copy, which is why it's not generally a good idea to make copies of live databases.
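
Here is a minimal Python sketch of how such a mixed copy arises. The two 4-byte "records" and the name `torn_copy` are invented for the illustration; a real database would update much larger structures, but the interleaving is the same idea.

```python
import os
import tempfile

def torn_copy():
    """Copy a two-record file while a writer updates both records in place."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "live.db")
        with open(src_path, "wb") as f:
            f.write(b"old1old2")

        with open(src_path, "rb", buffering=0) as reader, \
             open(src_path, "r+b", buffering=0) as writer:
            first = reader.read(4)       # copies the first record (old)
            writer.write(b"new1new2")    # writer updates both records
            second = reader.read(4)      # copies the second record (new)
    return first + second

print(torn_copy())  # b'old1new2'
```

The result matches neither the old state nor the new state of the file, which is exactly the kind of inconsistency that can corrupt a copied database.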

(That said, I'm not familiar with CouchDB, and it's possible to design a database to be resistant to this sort of corruption. But best to be absolutely sure.)

Mâtt Frëëman

Updated on September 17, 2022

Comments

Mâtt Frëëman almost 2 years

What is the effect of copying a file, say fileA.big (900MB), from location B to location C if, about 35% of the way through that cp operation, fileA.big is appended with new information and grows from 900MB to 930MB?

What is the result of the end copy (i.e. fileA.big at location C)?

What if the copy is about 70% through, and the original file is updated but this time truncated to 400MB (i.e. the progress of the copy is beyond the truncation point), what is the result of the end copy?

Referring to a Linux OS on an ext3/ext4 filesystem. No volume shadow magic etc. Just plain old cp. Curiosity sparked by copying live CouchDB files for backup, but more interested in general scenarios rather than specific use case.
Admin over 13 years

Thanks for asking this one. My 'knowledge' was mostly a guess... until now.
syntaxerror over 10 years

Good explanation. BTW, I've always been surprised that this is possible under UNIX-like OSes without getting the typical error message known from Windows ("Can't access file - file in use"). There, you could not even delete an MP3 file while it was playing. Under Unix, you can (surprisingly), with no problems at all. I guess UNIX-based OSes always work with backup copies of the files, so this is feasible.
Jander over 10 years

Actually, being able to read a deleted file comes from a different UNIX feature: under UNIX, files and filenames are different things. When you delete a file, what you are really doing is deleting a named "link" to the file. When a program opens a file, that also counts as a reference to it, much like a link. The system will delete the file itself only when it has no links and no open references left.
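
A short Python sketch of that behaviour, assuming a UNIX-like system (the function name `read_after_unlink` is made up for the example, and the same code would be expected to fail on Windows, where an open file normally cannot be removed):

```python
import os
import tempfile

def read_after_unlink():
    """Remove a file's only name while it is open, then read from it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "song.mp3")
        with open(path, "wb") as f:
            f.write(b"still here")

        with open(path, "rb") as f:
            os.unlink(path)                       # remove the only name
            name_exists = os.path.exists(path)    # the name is gone
            data = f.read()                       # the open file is not
    return name_exists, data

print(read_after_unlink())  # (False, b'still here')
```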
Bladt almost 9 years

So if the file grows faster than we can copy it, cp will never terminate? I realize that's unlikely, as whatever writes to the file would have to write faster than cp can read from it.