faster alternative to cp -a
Solution 1
Try tar, pax or cpio, with something buffering:
(cd /home && bsdtar cf - .) |
pv -trab -B 500M |
(cd /dest && bsdtar xpSf -)
I suggest bsdtar instead of tar because, at least on some Linux distributions, tar is GNU tar, which, contrary to bsdtar (from libarchive), doesn't handle preserving extended attributes, ACLs or Linux attributes.
pv will buffer up to 500M of data, so it can better accommodate fluctuations in reading and writing speeds on the two file systems (though in reality, one disk will probably be slower than the other, and the OS's write-back mechanism will do that buffering as well, so it will probably not make much difference). Older versions of pv don't support -a (for average speed reporting); with those, you can use pv -B 200M alone there.
In any case, those will not have the limitation of cp, which does the reads and the writes sequentially. Here we've got two tar processes working concurrently, so one can read from one FS while the other is busy waiting for the other FS to finish writing.
For ext4, and if you're copying onto a partition that is at least as large as the source, see also clone2fs, which works like ntfsclone: it copies only the allocated blocks, and sequentially, so on rotational storage it is probably going to be the most efficient. partclone generalises that to a few different file systems.
Now a few things to take into consideration when cloning a file system.
Cloning means copying all the directories, files and their contents... and everything else. That everything else varies from file system to file system. Even if we only consider the common features of traditional Unix file systems, we have to consider:
- links: symbolic links and hard links. Sometimes, we'll have to consider what to do with absolute symlinks or symlinks that point out of the file system/directory to clone
- last modification, access and change times: only the first two can be set via the filesystem API (by cp, tar, rsync...)
- sparseness: you've got that 2TB sparse file which is a VM disk image that only takes 3GB of disk space, the rest being sparse, doing a naive copy would fill up the destination drive.
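To see why sparseness matters, here is a small demonstration using a scratch directory (truncate, du and cp --sparse are GNU coreutils; the 100M size is just an illustration):

```shell
# Create a 100MiB sparse file: large apparent size, (almost) no blocks used.
tmp=$(mktemp -d)
truncate -s 100M "$tmp/disk.img"

ls -lh "$tmp/disk.img"   # apparent size: 100M
du -h "$tmp/disk.img"    # actual usage: 0

# GNU cp can detect runs of zeros and keep the copy sparse:
cp --sparse=always "$tmp/disk.img" "$tmp/copy.img"
du -h "$tmp/copy.img"    # still (nearly) 0

rm -rf "$tmp"
```

A naive copier that reads and writes every byte would instead allocate the full 100MiB (or 2TB, in the VM image example) on the destination.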
Then if you consider ext4 and most Linux file systems, you'll have to consider:
- ACLs and other extended attributes (like the ones used for SELinux)
- Linux attributes like the immutable or append-only flags
Not all tools support all of those, or when they do, you have to enable the support explicitly, like the --sparse, --acls... options of rsync, tar... And when copying onto a different file system, you have to consider the case where it doesn't support the same feature set.
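As a sketch of what such an explicit invocation can look like, here is an rsync command covering most of the metadata discussed above (/home and /dest are the placeholder paths from the question; -A and -X need an rsync built with ACL and xattr support):

```shell
# Copy /home to /dest preserving as much metadata as rsync can.
rsync -aHAXS --numeric-ids /home/ /dest/
# -a  archive mode: permissions, ownership, times, symlinks, devices...
# -H  preserve hard links (not implied by -a)
# -A  preserve ACLs
# -X  preserve extended attributes (including SELinux labels)
# -S  re-create sparse files as sparse on the destination
```

Even then, flags like -A and -X will only help if the destination file system supports ACLs and xattrs in the first place.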
You may also have to consider attributes of the file system themselves like the UUID, the reserved space for root, the fsck frequency, the journalling behavior, format of directories...
Then there are more complex file systems where you can't really copy the data by copying files. Consider for example zfs or btrfs, where you can take snapshots of subvolumes and branch them off... Those have their own dedicated tools to copy data.
The byte-to-byte copy of the block device (or at least of the allocated blocks when possible) is often the safest if you want to make sure you copy everything. But beware of the UUID clash problem, and it implies you're copying onto something at least as large (though you could resize a snapshot copy of the source before copying).
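The byte-to-byte copy and the UUID fix can be demonstrated safely on image files instead of real disks (a sketch, assuming e2fsprogs is installed; with real disks you would use /dev/... paths and the file systems must be unmounted):

```shell
# Clone an ext4 image byte for byte, then fix the UUID clash.
tmp=$(mktemp -d)
truncate -s 16M "$tmp/src.img"
mkfs.ext4 -F -q "$tmp/src.img"

# Byte-to-byte copy, as you would do with dd between two partitions:
dd if="$tmp/src.img" of="$tmp/dst.img" bs=1M status=none

# Both images now share a UUID; give the clone a new random one:
tune2fs -U random "$tmp/dst.img"

# -p probes the images directly, bypassing the blkid cache:
blkid -p -o value -s UUID "$tmp/src.img" "$tmp/dst.img"   # now different
rm -rf "$tmp"
```

Without the tune2fs step, mounting both copies by UUID on the same system would be ambiguous.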
Solution 2
I recommend rsync (source first, destination second), for example:
rsync -av --progress --stats orig dest
Or, to compress the data in transit (mainly useful over a network):
rsync -avz --progress --stats orig dest
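One rsync subtlety worth knowing when giving orig and dest: a trailing slash on the source means "the contents of" the directory, not the directory itself. A quick sandbox demonstration (paths are throwaway temp directories):

```shell
tmp=$(mktemp -d)
mkdir "$tmp/orig"
echo data > "$tmp/orig/file"

rsync -a "$tmp/orig"  "$tmp/dest1"   # copies the directory itself
rsync -a "$tmp/orig/" "$tmp/dest2"   # copies only its contents

ls "$tmp/dest1/orig/file"   # file ends up one level deeper
ls "$tmp/dest2/file"        # file ends up directly in dest2
rm -rf "$tmp"
```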
Yurij73
Updated on September 18, 2022

Comments
-
Yurij73 almost 2 years: For a simple transfer of /home to another disk I use cp -a, which seems to me an extremely slow way. I'd like to know a more efficient way to complete the task. I have /home mounted as a logical volume, but the target disk is not an LVM system.
-
daisy over 11 years: If cp is slow, other methods would be slow too. Unless it's not file-oriented copying.
-
vonbrand over 11 years: GNU tar has the --acls option, to store the ACLs into the archive. And I'd be surprised if an alien (of sorts) tool like bsdtar handles it better than the (essentially) native one.
-
Stéphane Chazelas over 11 years: @vonbrand. Your tar must have been patched for that (I think RedHat has a patch for GNU tar for ACLs), because the latest version of GNU tar doesn't support such an option. There exist a number of implementations of tar for Linux (star, bsdtar, tar); I'm not aware that GNU tar is any better than the others. The choice of GNU tools is generally more political than technical (see for instance bash).
-
vonbrand over 11 yearsUsing GNU tools might be a political choice, but it is the default choice nevertheless. And as they are much more popular than the alternatives, there also is more developer (and other) manpower behind them.
-
Yurij73 over 11 years: thanks, the next time I will use pv and tar rather than cp
-
Victor Aurélio over 11 years: Thanks for this information :) but I never compared these two...
-
user2948306 over 11 years
-
Totor over 11 years: rsync is mostly efficient if you already partially have the source data available on the destination volume, because it will only transfer missing/changed data. I would not use it for a fast "first copy".
-
Ploni almost 6 years: @StéphaneChazelas Currently GNU tar does support --acls