faster alternative to cp -a


Solution 1

Try tar, pax or cpio, with something doing the buffering.

(cd /home && bsdtar cf - .) |
  pv -trab -B 500M |
  (cd /dest && bsdtar xpSf -)

I suggest bsdtar instead of tar because, on at least some Linux distributions, tar is GNU tar, which, contrary to bsdtar (from libarchive), doesn't handle preserving extended attributes, ACLs or Linux attributes.
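If your GNU tar turns out to be recent enough (as noted in the comments below, current GNU tar does have --acls), a roughly equivalent pipeline could look like the sketch below; whether --acls and --xattrs are actually available depends on how your tar was built:

(cd /home && tar --acls --xattrs --sparse -cf - .) |
  pv -trab -B 500M |
  (cd /dest && tar --acls --xattrs -xpf -)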

pv will buffer up to 500M of data, so it can better accommodate fluctuations in reading and writing speeds on the two file systems (though in reality, you'll probably have one disk slower than the other, and the OS's write-back mechanism will do that buffering as well, so it will probably not make much difference). Older versions of pv don't support -a (for average speed reporting); you can use pv -B 200M alone there.

In any case, those will not have the limitation of cp, which does the reads and the writes sequentially. Here we've got two tar processes working concurrently, so one can read one FS while the other is busy waiting for the other FS to finish writing.

For ext4, and if you're copying onto a partition that is at least as large as the source, see also clone2fs, which works like ntfsclone: it copies only the allocated blocks, and does so sequentially, so on rotational storage it is probably going to be the most efficient.

partclone generalises that to a few different file systems.
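For instance, a device-to-device clone of an ext4 partition with partclone might look like this sketch (the device names are placeholders, and both file systems should be unmounted):

# copy only the allocated blocks from one ext4 partition to another
partclone.ext4 -b -s /dev/sdX1 -o /dev/sdY1

# or dump them to an image file for later restoration
partclone.ext4 -c -s /dev/sdX1 -o /path/to/sdX1.img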

Now a few things to take into consideration when cloning a file system.

Cloning means copying all the directories, files and their contents... and everything else. Now, the everything else varies from file system to file system. Even if we only consider the common features of traditional Unix file systems, we have to consider:

  • links: symbolic links and hard links. Sometimes we'll have to consider what to do with absolute symlinks, or symlinks that point outside the file system/directory being cloned
  • last modification, access and change times: only the first two can be copied using the filesystem API (cp, tar, rsync...)
  • sparseness: if you've got a 2TB sparse file which is a VM disk image that only takes 3GB of disk space, the rest being sparse, a naive copy would fill up the destination drive.

Then if you consider ext4 and most Linux file systems, you'll have to consider:

  • ACLs and other extended attributes (like the ones used for SELinux)
  • Linux attributes like immutable or append-only flags

Not all tools support all of those, or when they do, you have to enable it explicitly, like the --sparse, --acls... options of rsync, tar... And when copying onto a different file system, you have to consider the case where the two don't support the same feature set.
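As an illustration, a local rsync invocation that tries to carry most of those features across (hard links, ACLs, xattrs, sparse files) could look like the following sketch; --info=progress2 needs rsync ≥ 3.1, and which flags you actually need depends on both file systems:

rsync -aHAXS --numeric-ids --info=progress2 /home/ /dest/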

You may also have to consider attributes of the file system itself, like the UUID, the reserved space for root, the fsck frequency, the journalling behaviour, the format of directories...
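On ext2/3/4, some of those can be inspected with dumpe2fs and adjusted after the copy with tune2fs; as a sketch (the device names are placeholders):

# inspect the superblock of the source
dumpe2fs -h /dev/sdX1

# give the copy its own UUID, and set reserved space, fsck count and interval
tune2fs -U random /dev/sdY1
tune2fs -m 5 -c 30 -i 180d /dev/sdY1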

Then there are more complex file systems where you can't really copy the data just by copying files. Consider for example zfs or btrfs, where you can take snapshots of subvolumes and branch them off... Those have their own dedicated tools to copy data.
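As a rough sketch of what those dedicated mechanisms look like (pool, subvolume and mount-point names here are made up):

# zfs: snapshot a dataset and replicate it to another pool
zfs snapshot tank/home@copy
zfs send tank/home@copy | zfs receive backup/home

# btrfs: send a read-only snapshot to another btrfs file system
btrfs subvolume snapshot -r /home /home/.snap
btrfs send /home/.snap | btrfs receive /mnt/dest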

A byte-to-byte copy of the block device (or at least of the allocated blocks, when possible) is often the safest if you want to make sure you copy everything. But beware of the UUID clash problem, and it implies you're copying onto something at least as large (though you could resize a snapshot copy of the source before copying).
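A minimal sketch of such a block-level copy (device names are placeholders, both file systems must be unmounted, and the destination must be at least as large as the source):

# raw copy of the whole partition
dd if=/dev/sdX1 of=/dev/sdY1 bs=64M status=progress

# then give the copy a new UUID so both can coexist on the same system (ext4 here)
tune2fs -U random /dev/sdY1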

Solution 2

I recommend rsync, for example:

rsync -av --progress --stats orig dest

Or, to transfer with compression:

rsync -avz --progress --stats orig dest
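Note that -z only pays off when the data travels over a network link; for a local disk-to-disk copy it mostly costs CPU. A sketch of a remote transfer where compression can help (host and paths are placeholders):

rsync -avz --progress --stats /home/ user@backuphost:/dest/home/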

Comments

  • Yurij73
    Yurij73 almost 2 years

    For a simple transfer of /home to another disk I use cp -a, which seems to me an extremely slow way. I'd like to know a more efficient way to complete the task. I have /home mounted as a logical volume, but the target disk is not an LVM system.

    • daisy
      daisy over 11 years
      If cp is slow, other methods will be slow too, unless they don't do file-oriented copying.
  • vonbrand
    vonbrand over 11 years
    GNU tar has the --acls option, to store the ACLs into the archive. And I'd be surprised if an alien (of sorts) tool like bsdtar handles it better than the (essentially) native one...
  • Stéphane Chazelas
    Stéphane Chazelas over 11 years
    @vonbrand. Your tar must have been patched for that (I think RedHat has a patch for GNU tar for ACLs), because the latest version of GNU tar doesn't support such an option. There exist a number of implementations of tar for Linux (star, bsdtar, tar), I'm not aware that GNU tar is any better than the others. The choice for GNU tools is generally more political than technical (see for instance bash).
  • vonbrand
    vonbrand over 11 years
    Using GNU tools might be a political choice, but it is the default choice nevertheless. And as they are much more popular than the alternatives, there also is more developer (and other) manpower behind them.
  • Yurij73
    Yurij73 over 11 years
    Thanks, the next time I will use pv and tar rather than cp
  • Victor Aurélio
    Victor Aurélio over 11 years
    Thanks for this information :) but I never compared these two...
  • Totor
    Totor over 11 years
    rsync is mostly efficient if you already partially have the source data available on the destination volume, because it will only transfer missing/changed data. I would not use it for a fast "first copy".
  • Ploni
    Ploni almost 6 years
    @StéphaneChazelas Currently GNU tar does support --acls