How does "cp" handle open files?


Solution 1

cp does not know about open files. If a user is still uploading a large file when a cron job (or any other process) starts copying it, cp copies only what has been written so far. Think of it this way: cp copies whatever is currently in the file, whether or not the file is complete. Otherwise you could not copy log files, for example.
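A minimal sketch of this behaviour, assuming a POSIX shell and `mktemp`. The shell itself plays the uploader by keeping the file open on descriptor 3; the file names are hypothetical.

```shell
#!/bin/sh
# cp copies whatever bytes the file holds at the moment it reads it,
# even though another process (this shell, via descriptor 3) still has it open.
workdir=$(mktemp -d)
upload="$workdir/upload.txt"        # hypothetical upload target
copy="$workdir/copy.txt"            # hypothetical copy made mid-upload

exec 3>"$upload"                    # open for writing and keep it open
printf 'first half\n' >&3           # the uploader has written part of the data

cp "$upload" "$copy"                # cp neither waits nor fails

printf 'second half\n' >&3          # the uploader writes the rest
exec 3>&-                           # and only now closes the file

copy_content=$(cat "$copy")         # holds only the first line
upload_content=$(cat "$upload")     # holds both lines
printf 'copy:\n%s\n\nupload:\n%s\n' "$copy_content" "$upload_content"

rm -rf "$workdir"
```

The copy ends up with only the first line, and cp reports no error.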

Solution 2

cp doesn't know what other programs may have the files open. There's no magic in cp. The design of Unix purposefully avoids putting any kind of lock on files unless there's a compelling reason (compelling meaning the kernel needs it). On this topic, see "Does redirecting output to a file apply a lock on the file?"

Such situations, where a file is produced by a producer and, once complete, consumed by a consumer, are common. The usual way to handle this is to have the producer write a temporary file that the consumer will not look for, then once the producer is finished move the file into a place where the consumer will find it. Moving a file (on the same filesystem) is an atomic operation: at some point, for the consumer, the file changes from not being there to being there.

So arrange for your upload job to move the files to a different directory when it's finished doing the upload. Point the cron job at this different directory.
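The write-then-move pattern described above can be sketched roughly as follows. The directory names (`incoming`, `ready`, `dest`) and the `copy_ready` helper are hypothetical, and both working directories are assumed to live on the same filesystem so that `mv` is a plain rename.

```shell
#!/bin/sh
base=$(mktemp -d)
incoming="$base/incoming"   # uploads are written here; the cron job never looks here
ready="$base/ready"         # only finished files appear here
dest="$base/dest"           # where the cron job copies to
mkdir "$incoming" "$ready" "$dest"

# Consumer (what the cron job would run): copy every regular file in $ready.
copy_ready() {
    for f in "$ready"/*; do
        [ -f "$f" ] || continue      # also skips the unexpanded glob when empty
        cp -p "$f" "$dest/"
    done
}

# Producer: the upload is still in progress.
printf 'first part ' > "$incoming/report.txt"
copy_ready                           # cron fires mid-upload and finds nothing
early=$(ls -A "$dest")               # empty

# Producer: the upload finishes, then the file is published with one rename.
printf 'second part\n' >> "$incoming/report.txt"
mv "$incoming/report.txt" "$ready/report.txt"
copy_ready                           # the next cron run sees the whole file
final=$(cat "$dest/report.txt")

printf 'early: [%s]\nfinal: [%s]\n' "$early" "$final"
rm -rf "$base"
```

Because the consumer only ever reads from `ready`, it never sees a file until the rename has happened.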

Solution 3

It seems like you want to do a dir sync job.

The -u, --update option of cp does this:

copy only when the SOURCE file is newer than the destination file or when the destination file is missing

So you can add a cron job such as cp -auv SOURCEDIR/* DESTDIR, which copies only the files that are newer than their copy in DESTDIR or missing from it. That means DESTDIR will eventually get the complete copy once the upload has finished.
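A small sketch of two successive cron runs, assuming GNU cp (the -u option is not available in every cp implementation). The file names are hypothetical, and `touch -t` is used only to give the demo fixed timestamps; in real use the uploader's writes move the modification time forward by themselves.

```shell
#!/bin/sh
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/dst"

# Run 1: the upload is still in progress, so a partial file gets copied.
printf 'partial' > "$workdir/src/file"
touch -t 202001010000 "$workdir/src/file"
cp -auv "$workdir/src/"* "$workdir/dst"
first=$(cat "$workdir/dst/file")

# The upload finishes; the source now has a newer modification time.
printf 'partial plus the rest' > "$workdir/src/file"
touch -t 202001010005 "$workdir/src/file"

# Run 2: the source is newer than the copy, so -u copies it again.
cp -auv "$workdir/src/"* "$workdir/dst"
second=$(cat "$workdir/dst/file")

printf 'after run 1: %s\nafter run 2: %s\n' "$first" "$second"
rm -rf "$workdir"
```

Note that between the two runs DESTDIR holds an incomplete file, so anything reading from DESTDIR has to tolerate that.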

rsync can do the same job. e.g., rsync -av SOURCEDIR/ DESTDIR.

Although the -a option is given, some attributes (e.g., ownership) can only be preserved by the superuser.

See man cp, man rsync for details.


Author: Stuffy

Updated on September 18, 2022

Comments

  • Stuffy
    Stuffy almost 2 years

    I have two separate directories. The user loads a file into the first. There's a cron job running in the background which copies the files over to the second directory every 5 minutes.

    What happens if the user has not completed the upload when the cron job copies the files? Note that the two directories are owned by different users, and the cron job runs as root.

    • Serge
      Serge over 11 years
      please read this post to see what happens in such case: unix.stackexchange.com/questions/49299/…
    • Stuffy
      Stuffy over 11 years
      Thanks, good post you wrote. But my question was more about cp than about Linux file handling in general. I thought maybe cp checks whether the file is still open and waits until it's closed or something.
    • Serge
      Serge over 11 years
      No. cp will not wait until the file is completely uploaded. Since we expect the network transfer rate to be lower than the rate of copying a file from one location to another on the same host, at some point cp will reach the current end-of-file and stop copying. The solution to your problem may be simple: first the user uploads the file under a specially mangled name (for example, prefixed with a dot). When the transfer is done, the user renames it to the original name. The cron job then looks only for files whose names do not start with a dot.
  • Stuffy
    Stuffy over 11 years
    Thanks, that's what I wanted to know! Is there a simple way to avoid that? I checked the cp man page but found nothing of use.
  • Krzysztof Adamski
    Krzysztof Adamski over 11 years
    To do what exactly? To copy all the files except open ones? I don't think there is any easy way of doing this (other than writing your own script that uses fuser+cp). Such a copy would really be very unreliable. It won't copy any file that is open in a text editor, for example.
  • Richard Fortune
    Richard Fortune over 11 years
    Just beware of relying on recent entries in the destination folder---they may not be complete files.
  • Wojtek
    Wojtek over 11 years
    @Stuffy, maybe in your cronjob you could list open files with lsof? The output of that is meant to be easy to process. You could filter the files being opened (say, by an instance of cp) for writing.
  • Stuffy
    Stuffy over 11 years
    @WojtekRzepala, I'll have a look at this, thanks. Maybe I'll write a small script which gets executed by the cronjob
  • Krzysztof Adamski
    Krzysztof Adamski over 11 years
    @Stuffy: Keep in mind that it may not be really reliable if it's not run by root user (the same problem is with fuser of course) as this tool may not show all the files.
  • JohnyTex
    JohnyTex over 2 years
    What if I want to overwrite the file that is being used?
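The dot-prefix approach suggested in the comments can be sketched roughly like this. It relies on the shell's `*` glob not matching names that start with a dot by default. The directory names, file names, and the `copy_finished` helper are hypothetical.

```shell
#!/bin/sh
base=$(mktemp -d)
uploads="$base/uploads"     # the user uploads here
dest="$base/dest"           # the cron job copies to here
mkdir "$uploads" "$dest"

# What the cron job would run: "*" skips names beginning with a dot.
copy_finished() {
    for f in "$uploads"/*; do
        [ -f "$f" ] || continue
        cp -p "$f" "$dest/"
    done
}

printf 'complete file\n' > "$uploads/notes.txt"
printf 'data so far' > "$uploads/.video.part"    # upload still in progress

copy_finished
run1=$(ls -A "$dest")        # only notes.txt has been copied

# The upload finishes and the user renames the file to its real name.
printf ' and the rest\n' >> "$uploads/.video.part"
mv "$uploads/.video.part" "$uploads/video.part"

copy_finished
run2=$(ls -A "$dest")        # now both files are present

printf 'run 1:\n%s\n\nrun 2:\n%s\n' "$run1" "$run2"
rm -rf "$base"
```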