How do I save changed files?

Solution 1

With rsync

What you're doing is essentially an incremental backup: your friend (your backup) already has the original files, and you want to make an archive containing the files you've changed from that original.

Rsync has features for incremental backups.

cd ORIGINAL_AND_MY_CHANGES
rsync -a -c --compare-dest=../ORIGINAL . ../CHANGES_ONLY
  • -a (archive mode) copies recursively and preserves most attributes (times, permissions, ownership, symbolic links, etc.).
  • -c means to compare file contents and not rely on date and size.
  • --compare-dest=/some/directory means that files which are identical under that directory and the source tree are not copied. Note that the path is relative to the destination directory.

Rsync copies all directories, even if no files end up in them. To get rid of these empty directories, run find ../CHANGES_ONLY -depth -type d -empty -delete (the starting path must come before the tests). If your find doesn't have -delete and -empty, run find ../CHANGES_ONLY -depth -type d -exec rmdir {} + 2>/dev/null instead.
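To see what the cleanup does, here is a minimal sketch on a throwaway tree. The directory names under the temporary directory are made up for illustration, and it assumes a find that supports -empty and -delete (GNU and BSD find do).

```shell
set -eu
work=$(mktemp -d)   # scratch area, so nothing real is touched

# A nested chain of empty directories plus one directory that holds a file.
mkdir -p "$work/CHANGES_ONLY/a/b/c" "$work/CHANGES_ONLY/keep"
printf 'x\n' > "$work/CHANGES_ONLY/keep/file.txt"

# -depth visits children before parents, so a/b/c goes first,
# which leaves a/b empty, and so on up the chain.
find "$work/CHANGES_ONLY" -depth -type d -empty -delete

# Only CHANGES_ONLY, keep and keep/file.txt remain.
find "$work/CHANGES_ONLY" | sort
```

Because the traversal is depth-first, one pass is enough to remove whole chains of nested empty directories.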

Then make the archive from the CHANGES_ONLY directory.
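A minimal sketch of the packing and unpacking steps, run on a tiny stand-in tree. The file names are invented for the demo; MY_CHANGES.tgz is the archive name from the question. Packing with -C stores member paths relative to CHANGES_ONLY, so the archive can be unpacked directly on top of a copy of ORIGINAL.

```shell
set -eu
work=$(mktemp -d)   # scratch area; the trees below are stand-ins
cd "$work"

mkdir -p ORIGINAL/src CHANGES_ONLY/src
printf 'old\n'  > ORIGINAL/src/a.txt
printf 'keep\n' > ORIGINAL/src/b.txt
printf 'new\n'  > CHANGES_ONLY/src/a.txt    # the only changed file

# Pack: -C switches into CHANGES_ONLY first, so members are ./src/a.txt etc.
tar -czf MY_CHANGES.tgz -C CHANGES_ONLY .

# Unpack on the friend's side, on top of their copy of ORIGINAL.
tar -xzf MY_CHANGES.tgz -C ORIGINAL

cat ORIGINAL/src/a.txt   # prints "new"; src/b.txt is left untouched
```

Note that an archive built this way can only add or overwrite files; it carries no record of files that were deleted.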

The pedestrian way

Traverse the directory with your files. Skip files that are identical to the original. Create directories in the target as necessary. Copy changed files.

cd ORIGINAL_AND_MY_CHANGES
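find . \! -type d -exec sh -c '
  # the extra sh after the script fills $0, so every path lands in "$@"
  for x; do
    # skip files whose contents are identical to the original
    if cmp -s "$x" "../ORIGINAL/$x"; then continue; fi
    # create the parent directory in the target, then copy the file
    mkdir -p "../CHANGES_ONLY/${x%/*}"
    cp -p "$x" "../CHANGES_ONLY/$x"
  done
' sh {} +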
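One way to check the result is to simulate the friend's side: overlay CHANGES_ONLY on a copy of ORIGINAL and compare it with the modified tree. The sketch below does this on a small made-up tree; FRIEND_COPY and the file names are hypothetical, and the loop is the same idea as the one above.

```shell
set -eu
work=$(mktemp -d)   # scratch area; all trees below are stand-ins
cd "$work"

mkdir -p ORIGINAL/dir ORIGINAL_AND_MY_CHANGES/dir ORIGINAL_AND_MY_CHANGES/new
printf 'same\n'  > ORIGINAL/dir/same.txt
printf 'same\n'  > ORIGINAL_AND_MY_CHANGES/dir/same.txt
printf 'v1\n'    > ORIGINAL/dir/edit.txt
printf 'v2\n'    > ORIGINAL_AND_MY_CHANGES/dir/edit.txt
printf 'added\n' > ORIGINAL_AND_MY_CHANGES/new/file.txt

# Collect new and changed files into CHANGES_ONLY.
(
  cd ORIGINAL_AND_MY_CHANGES
  find . ! -type d -exec sh -c '
    for x; do
      cmp -s "$x" "../ORIGINAL/$x" && continue
      mkdir -p "../CHANGES_ONLY/${x%/*}"
      cp -p "$x" "../CHANGES_ONLY/$x"
    done
  ' sh {} +
)

# Simulate the friend: overlay the changes on a copy of ORIGINAL.
cp -R ORIGINAL FRIEND_COPY
cp -R CHANGES_ONLY/. FRIEND_COPY/

# diff -r exits non-zero (and set -e aborts) if the two trees differ.
diff -r FRIEND_COPY ORIGINAL_AND_MY_CHANGES
echo "trees match"
```

On a tree the size of the Android sources this comparison is slow, but it only needs to be run once before sending the archive.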

Solution 2

The command

rsync --only-write-batch=FILE $other_options ORIGINAL_AND_MY_CHANGES/ ORIGINAL/

would produce a batch FILE containing the changes required (without modifying anything).

The batch can then be applied at another site, once you have copied the batch FILE there, with

rsync --read-batch=FILE ORIGINAL/
Author by

Dmitry

Updated on September 18, 2022

Comments

  • Dmitry, over 1 year ago

    I have two folders:

    ORIGINAL/
    ORIGINAL_AND_MY_CHANGES/
    

    My friend has a copy of ORIGINAL/. I would like to generate MY_CHANGES.tgz; it should contain only the new and changed files from ORIGINAL_AND_MY_CHANGES/ compared to ORIGINAL/, so that my friend can unpack it into his copy of ORIGINAL/ and get ORIGINAL_AND_MY_CHANGES/.

    How can I do this?

    P.S. I tried diff, but it can't save binary data, and rsync --link-dest, but it generates hard links, which are useless in the archive.

    P.P.S. In my case modification time can't be used to decide which file was changed.

  • Dmitry, over 12 years ago
    It's an even better solution than enzotib's, because I can put MY_CHANGES in source control and update and track these changes. (If I kept rsync's batch file under source control, it would be impossible to see which files were changed.)
  • Gilles 'SO- stop being evil', over 12 years ago
    @Dmitry If you're using source control, why not import and track ORIGINAL and make ORIGINAL_AND_MY_CHANGES a branch? Then find out CHANGES with an SCM command.
  • Dmitry, over 12 years ago
    In my case ORIGINAL is the Android platform sources (3 GB, 126,000 files). Even running rsync takes about 15-20 minutes. I think that adding all this to source control would take too much space and time.
  • Gilles 'SO- stop being evil', over 12 years ago
    @Dmitry That settles it then. If it's Android sources, use repo and git. Work on your own branch. It's hard enough managing those with version control, I shudder to think what it may be like without it. Fortunately git is very good at managing local branches.