How does "cp" handle open files?
Solution 1
cp does not know about open files. If a user is uploading a big file and a cron job (or any other process) starts copying it, cp will copy only as much as has been written so far. Think of it this way: cp copies what is currently in the file, regardless of whether the file is complete. Otherwise you could not copy log files, for example.
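A minimal sketch of this behaviour, assuming a throwaway temporary directory and hypothetical file names (upload.txt, copy.txt). The "upload" is simulated sequentially rather than with a concurrent writer, but the effect is the same: the copy holds only what existed at the moment cp ran.

```shell
#!/bin/sh
# Hypothetical demo; every path lives in a temporary directory.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Simulate an upload that is only half done: 5 of 10 lines written so far.
printf '%s\n' 1 2 3 4 5 > "$work/upload.txt"

# cp copies the bytes that exist right now; it does not wait for the writer.
cp "$work/upload.txt" "$work/copy.txt"

# The "upload" finishes after the copy was taken.
printf '%s\n' 6 7 8 9 10 >> "$work/upload.txt"

# Count lines in both files; $(( )) strips any padding wc may add.
src_lines=$(( $(wc -l < "$work/upload.txt") ))
copy_lines=$(( $(wc -l < "$work/copy.txt") ))
echo "source: $src_lines lines, copy: $copy_lines lines"
```

The copy stays truncated until something copies the file again.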
Solution 2
cp doesn't know what other programs may have the files open. There's no magic in cp. The design of Unix purposefully avoids putting any kind of lock on files unless there's a compelling reason (compelling meaning the kernel needs it). On this topic, see "Does redirecting output to a file apply a lock on the file?"
Such situations, where a file is produced by a producer and, once complete, consumed by a consumer, are common. The usual way to handle this is to have the producer write a temporary file that the consumer will not look for, then once the producer is finished move the file into a place where the consumer will find it. Moving a file (on the same filesystem) is an atomic operation: at some point, for the consumer, the file changes from not being there to being there.
So arrange for your upload job to move the files to a different directory when it's finished doing the upload. Point the cron job at this different directory.
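The write-then-move pattern could look roughly like the sketch below. The directory names (staging, incoming, archive), the file name report.txt and the consume function are all hypothetical; the sketch also assumes staging and incoming sit on the same filesystem, so that mv is a plain rename.

```shell
#!/bin/sh
# Hypothetical layout; all directories are created under a temporary base.
set -eu
base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
staging="$base/staging"     # uploads are written here; the consumer never looks
incoming="$base/incoming"   # the only directory the cron job reads
archive="$base/archive"     # where the cron job copies files to
mkdir "$staging" "$incoming" "$archive"

# Stand-in for the cron job: copy every regular file found in incoming.
consume() {
    for f in "$incoming"/*; do
        [ -f "$f" ] || continue   # the glob matched nothing
        cp -p "$f" "$archive/"
    done
}

# Producer starts writing; the file is still incomplete.
printf 'part 1\n' > "$staging/report.txt"

# A cron run during the upload sees nothing to copy.
consume
early=$(( $(ls "$archive" | wc -l) ))

# Producer finishes, then publishes the file with a rename.
printf 'part 2\n' >> "$staging/report.txt"
mv "$staging/report.txt" "$incoming/report.txt"

# The next cron run sees the complete file.
consume
final_lines=$(( $(wc -l < "$archive/report.txt") ))
echo "files copied early: $early, lines in final copy: $final_lines"
```

If staging and incoming were on different filesystems, mv would fall back to copy-and-delete and the consumer could again see a partial file.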
Solution 3
It seems like you want a directory sync job.
The -u (--update) option of cp copies a file only when the SOURCE file is newer than the destination file or when the destination file is missing.
So you can add a cron job such as cp -auv SOURCEDIR/* DESTDIR, which copies the files whose modification time has changed. That means DESTDIR will eventually get the complete copy once the upload has finished.
rsync can do the same job, e.g. rsync -av SOURCEDIR/ DESTDIR.
Although the -a option is applied, some attributes (e.g. ownership) can only be preserved by the superuser.
See man cp and man rsync for details.
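A small sketch of how repeated cp -au runs converge, assuming GNU cp (the -u option is not in POSIX) and hypothetical names (src, dst, upload.txt, sync_once). Note that the first pass still copies a partial file; it is only replaced on a later pass.

```shell
#!/bin/sh
# Assumes GNU cp; directories are hypothetical and created under a temp base.
set -eu
base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
src="$base/src"
dst="$base/dst"
mkdir "$src" "$dst"

# Stand-in for one cron run of: cp -auv SOURCEDIR/* DESTDIR
sync_once() { cp -au "$src"/* "$dst"; }

# First pass happens while the upload is only half written.
printf 'first half\n' > "$src/upload.txt"
sync_once
after_first=$(( $(wc -l < "$dst/upload.txt") ))

# The upload completes later; sleep so the new mtime is clearly newer.
sleep 1
printf 'second half\n' >> "$src/upload.txt"

# Second pass: the source is newer than the destination, so it is recopied.
sync_once
after_second=$(( $(wc -l < "$dst/upload.txt") ))
echo "lines after first pass: $after_first, after second pass: $after_second"
```

Between the two passes the destination holds an incomplete file, so anything reading DESTDIR has to tolerate that.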
Stuffy
Updated on September 18, 2022
Comments
-
Stuffy almost 2 years
I have two separate directories. The user uploads a file into the first. There's a cron job running in the background which copies the files over to the second directory every 5 minutes.
What happens if the user has not completed the upload when the cron job copies the files? Note that the two directories are owned by different users, and the cron job runs as root.
-
Serge over 11 years
Please read this post to see what happens in such a case: unix.stackexchange.com/questions/49299/…
-
Stuffy over 11 years
Thanks, good post you wrote. But my question was more about cp than about Linux file handling in general. I thought maybe cp checks whether the file is still open and waits until it's closed or something.
-
Serge over 11 years
No. cp will not wait until the file is completely uploaded. Since we expect the network transfer rate to be lower than the rate of copying a file from one location to another inside the same host, at some point cp will reach the current end-of-file and stop copying. The solution to your problem may be simple: first the user uploads the file under a specially mangled file name (for example, prefixed with . (the dot character)). When the transfer is done, the user renames it to the original name. The cron job then looks only for files whose names do not start with a dot.
-
-
Stuffy over 11 years
Thanks, that's what I wanted to know! Is there a simple way to avoid that? I checked the cp man page but found nothing of use.
-
Krzysztof Adamski over 11 years
To do what exactly? To copy all the files except open ones? I don't think there is any easy way of doing this (other than writing your own script that uses fuser + cp). Such a copy would be very unreliable. It won't copy any file that is open in a text editor, for example.
-
Richard Fortune over 11 years
Just beware of relying on recent entries in the destination folder: they may not be complete files.
-
Wojtek over 11 years
@Stuffy, maybe in your cron job you could list open files with lsof? Its output is meant to be easy to process. You could filter for the files that are open for writing (say, by an instance of cp).
-
Stuffy over 11 years
@WojtekRzepala, I'll have a look at this, thanks. Maybe I'll write a small script that gets executed by the cron job.
-
Krzysztof Adamski over 11 years
@Stuffy: Keep in mind that it may not be reliable unless it's run as root (fuser has the same problem, of course), as this tool may not show all the files.
-
JohnyTex over 2 years
What if I want to overwrite the file that is being used?