Unix Shell scripting for copying files and creating directory

Solution 1

This is remarkably similar to a (closed) question: Bash scripting copying files without overwriting. The answer I gave cites the 'find | cpio' solution mentioned in other answers (minus the time criteria, but that's the difference between 'similar' and 'same'), and also outlines a solution using GNU 'tar'.

ctime

When I tested on Solaris, neither GNU tar nor (Solaris) cpio was able to preserve the ctime setting; indeed, I'm not sure that there is any way to do that. For example, the touch command can set the atime or the mtime or both - but not the ctime. The utime() system call also only takes the mtime or atime values; it does not handle ctime. So, I believe that if you find a solution that preserves ctime, that solution is likely to be platform-specific. (Weird example: hack the disk device and edit the data in the inode - not portable, requires elevated privileges.) Rereading the question, though, I see that 'preserving ctime' is not part of the requirements (phew); it is simply the criterion for whether the file is copied or not.
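The difference between mtime and ctime can be checked with nothing more than touch and find. The following is a minimal sketch, not part of the original answer; the directory and file names are made up for the demonstration. It backdates a file's mtime and shows that the ctime still records the moment of the change.

```shell
# Scratch directory and file (hypothetical names, for illustration only).
demo_dir=$(mktemp -d)
touch "$demo_dir/sample.txt"

# Backdate atime and mtime to 1 January 2000. The inode is modified by this
# operation, so the kernel sets ctime to "now"; touch cannot set ctime itself.
touch -t 200001010000 "$demo_dir/sample.txt"

# The file looks old when selected by mtime...
old_by_mtime=$(find "$demo_dir" -type f -mtime +365)
# ...but recent when selected by ctime.
new_by_ctime=$(find "$demo_dir" -type f -ctime -1)

echo "matched by -mtime +365: $old_by_mtime"
echo "matched by -ctime -1:   $new_by_ctime"

rm -rf "$demo_dir"
```

Both find commands should report the same file, which is why a copy that restores mtime (cpio -m, tar) still leaves the destination with a fresh ctime.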

chdir

I think that the 'cd' operations are necessary, but they can be wholly localized to the script or command line, as illustrated in the question cited and in the command lines below. The second of these assumes GNU tar.

(cd /my; find source/directory -ctime -2 | cpio -pvdm /my/dest/directory)

(cd /my; find source/directory -ctime -2 | tar -cf - -T - ) |
    (cd /my/dest/directory; tar -xf -)

Without using chdir() (aka cd), you need specialized tools or options to handle the manipulation of the pathnames on the fly.
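One such option is tar's -C, which makes tar itself change directory before it reads or writes files. The sketch below is an illustration rather than a tested recipe from the answer: the function name copy_recent_no_cd and its three parameters are hypothetical, and it assumes GNU find (for -printf with %P and \0) and GNU tar (for --null and -T). It also restricts the selection to regular files, because tar recurses into any directory named in its file list.

```shell
# copy_recent_no_cd SRC_PARENT SUBDIR DEST   (hypothetical helper)
# Copies files under SRC_PARENT/SUBDIR changed in the last 2 days to
# DEST/SUBDIR without the calling shell ever changing directory.
copy_recent_no_cd() {
    mkdir -p "$3/$2" || return 1
    # %P prints each name relative to the starting point, NUL-terminated.
    find "$1/$2" -type f -ctime -2 -printf '%P\0' |
        # -C makes tar resolve the relative names inside the source tree.
        tar -C "$1/$2" -cf - --null -T - |
        # The second tar unpacks them under the destination tree.
        tar -C "$3/$2" -xf -
}
```

For the layout in the question, the call would be copy_recent_no_cd /my source/directory /my/dest/directory.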

Names with blanks, newlines, etc

The GNU-specific 'find -print0' and 'xargs -0' are very powerful and effective, as noted by Adam Hawes. Funnily enough, GNU cpio has an option to handle the output from 'find -print0', and that is '--null' or its short form '-0'. So, using GNU find and GNU cpio, the safe command is:

(cd /my; find source/directory -ctime -2 -print0 |
    cpio -pvdm0 /my/dest/directory)

Note: This does not overwrite pre-existing files under the backup directory. Add -u to the cpio command for that.

Similarly, GNU tar supports --null (apparently with no -0 short form), and could also be used. In current GNU tar, --null applies to the -T options that follow it, so it goes first:

(cd /my; find source/directory -ctime -2 -print0 | tar -cf - --null -T - ) |
    (cd /my/dest/directory; tar -xf -)

The GNU handling of file names with the null terminator is extremely clever and a valuable innovation (though I only became aware of it fairly recently, courtesy of SO; it has been in GNU tar for at least a decade).

Solution 2

You could try cpio using the copy-pass mode, -p. I usually use it with overwrite all (-u), create directories (-d), and maintain modification time (-m).

find myfiles | cpio -pmud target-dir

Keep in mind that find should produce relative path names, which doesn't fit your absolute path criteria. This could, of course, be 'solved' using cd, which you also don't like (why not?):

(cd mypath; find myfiles | cpio ... )

The brackets spawn a subshell, which keeps the state change (i.e. the directory switch) local. You could also define a small procedure to abstract away the 'ugliness'.
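Such a procedure might look like the sketch below. The name copy_recent and its parameters are hypothetical, and the -0 flag assumes a cpio that accepts NUL-terminated input (GNU cpio does) together with a find that supports -print0. The function body is written in brackets rather than braces, so the whole body runs in a subshell and the cd never leaks into the caller.

```shell
# copy_recent SRC_PARENT SUBDIR DEST [DAYS]   (hypothetical helper)
# Mirrors files under SRC_PARENT/SUBDIR whose ctime is less than DAYS days
# old (default 2) into DEST/SUBDIR, creating directories as needed.
copy_recent() (
    src_parent=$1; subdir=$2; dest=$3; days=${4:-2}
    mkdir -p "$dest" || exit 1
    # Resolve DEST to an absolute path before changing directory.
    dest=$(cd "$dest" && pwd) || exit 1
    cd "$src_parent" || exit 1
    # -p pass-through, -d create directories, -m keep mtime,
    # -u overwrite unconditionally, -0 read NUL-terminated names.
    find "$subdir" -ctime "-$days" -print0 | cpio -pdmu0 "$dest"
)
```

For the layout in the question, the call would be copy_recent /my source/directory /my/dest/directory.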

Solution 3

If you're using find, always (well, almost always) use -print0 and pipe the output through xargs -0. With find's default newline-terminated output, the first file with a space in its name will bork the script.
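The failure is easy to reproduce. This is a small sketch with made-up file names; it assumes a find and xargs that support -print0 and -0, and a temporary directory whose own path contains no whitespace. It counts how many arguments xargs sees in each mode.

```shell
# Scratch directory with one ordinary name and one name containing a space.
work_dir=$(mktemp -d)
touch "$work_dir/plain.txt" "$work_dir/two words.txt"

# Default output: xargs splits on whitespace, so "two words.txt"
# is handed over as two separate arguments.
newline_args=$(find "$work_dir" -type f | xargs -n 1 echo | wc -l)

# NUL-terminated output: each file name arrives as exactly one argument.
nul_args=$(find "$work_dir" -type f -print0 | xargs -0 -n 1 echo | wc -l)

echo "arguments seen without -print0: $newline_args"
echo "arguments seen with -print0:    $nul_args"

rm -rf "$work_dir"
```

With two files on disk, the first count comes out one higher than the second, and any command run by xargs in the first pipeline would be given names that do not exist.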

I agree with all the other posters - use cpio or tar if you can. It'll do what you want and save the hassle.

Author: Nick Fortescue (software developer)

Updated on June 05, 2022

Comments

  • Nick Fortescue almost 2 years

    I have a source directory eg /my/source/directory/ and a destination directory eg /my/dest/directory/, which I want to mirror with some constraints.

    • I want to copy files which meet certain criteria of the find command, eg -ctime -2 (less than 2 days old) to the dest directory to mirror it
    • I want to include some of the prefix so I know where it came from, eg /source/directory
    • I'd like to do all this with absolute paths so it doesn't depend which directory I run from
    • I'd guess not having cd commands is good practice too.
    • I want the subdirectories created if they don't exist

    So

    /my/source/directory/1/foo.txt -> /my/dest/directory/source/directory/1/foo.txt
    /my/source/directory/2/3/bar.txt -> /my/dest/directory/source/directory/2/3/bar.txt
    

    I've hacked together the following command line but it seems a bit ugly, can anyone do better?

    find /my/source/directory -ctime -2 -type f -printf "%P\n" | xargs -IFILE rsync -avR /my/./source/directory/FILE /my/dest/directory/
    

    Please comment if you think I should add this command line as an answer myself, I didn't want to be greedy for reputation.

  • Nick Fortescue about 15 years
    Nice answer, I hadn't thought of tar.
  • Nick Fortescue about 15 years
    Actually, this doesn't handle the conditions based on file time.
  • falstro about 15 years
    that's why I put it in brackets, that'll spawn a subshell and so won't have side effects outside the brackets. I'll add that...
  • falstro about 15 years
    I think mkdir -p will succeed if the directory already exists, so the if is a bit redundant. You could use it in conjunction with an existence test to make sure it's not a file though.
  • Jonathan Leffler about 15 years
    @vatine: it was not, fortunately, preserving ctime but simply selecting based on ctime.
  • jfs about 15 years
    The OP said "mirror it" so cpio requires -u. I don't remember times when GNU utils didn't support the -0 option. Such command-line tools are like an unsafe razor. It is easy to shoot yourself in the foot. Nice answer!
  • Dave C about 15 years
    Any white space or special characters in the file or directory names will bork this up. In scripts always quote such variables; e.g. mkdir -p "$SUBDST". Better to just use find and cpio as others have suggested instead of recreating a non-round wheel.
  • jandersson about 15 years
    Quoting will not help at all, since it's all about the delimiter of the for loop (IFS). I agree that in this case it's better to use cpio, but what if you want to implement custom logic based on a file name, and not just copy files as cpio does?