Does any file system implement a copy-on-write mechanism for cp?


Solution 1

The keyword to search for is reflink. It was recently implemented in XFS.

EDIT: the XFS implementation was initially marked EXPERIMENTAL. This warning was removed in the kernel release 4.16, a number of months after I wrote the above :-).

Solution 2

From cp man page:

When --reflink[=always] is specified, perform a lightweight copy, where the data blocks are copied only when modified. If this is not possible the copy fails, or if --reflink=auto is specified, fall back to a standard copy.

This works on file systems which support Copy-on-Write (reflink), mainly BTRFS at the moment. XFS reflink implementation is in development [1][2].

Solution 3

Linux has an interface that lets userspace processes ask the kernel to make copy-on-write copies of files: the FICLONE and FICLONERANGE requests to the ioctl system call. FICLONE clones a whole file, and FICLONERANGE clones a range within a file.

This is used by cp --reflink to make the copies where the file system supports this.
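As a rough illustration of the ioctl described above, here is a minimal Python sketch that tries FICLONE and falls back to an ordinary copy when the file system refuses. The helper name `clone_or_copy` is made up for this example, and the numeric value of FICLONE is an assumption: it is the `_IOW(0x94, 9, int)` encoding used on x86 and ARM, and may differ on other architectures.

```python
import fcntl
import shutil

# Assumed value of FICLONE, i.e. _IOW(0x94, 9, int) from <linux/fs.h>, as
# encoded on x86 and ARM. Some architectures encode ioctl numbers differently.
FICLONE = 0x40049409


def clone_or_copy(src_path, dst_path):
    """Try to reflink src_path to dst_path; fall back to a plain byte copy.

    Hypothetical helper for illustration. Returns "reflink" or "copy" to
    report which path was taken.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # ioctl(dest_fd, FICLONE, src_fd): share src's blocks with dst.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return "reflink"
        except OSError:
            # The file system (for example ext4) does not support cloning,
            # or the two files are on different file systems.
            shutil.copyfileobj(src, dst)
            return "copy"
```

On a file system without reflink support the call simply reports "copy", which mirrors what cp --reflink=auto does.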

Solution 4

Unless you introduce a syscall for cp (or at least one to copy a block), the OS has a hard time figuring out that the data the cp program is going to write is the same as the data it just read from another block. On top of that, you'd have additional overhead to manage the "several files share the same blocks" scenario. Large similar files that differ in only a few blocks are rare. So it's cheaper on the whole to just copy those blocks than to add this administrative overhead to all files.
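To make the "several files share the same blocks" bookkeeping concrete, here is a toy in-memory sketch of reference-counted blocks. It is not how any real file system is implemented; the `BlockStore` and `File` classes are invented for this example.

```python
class BlockStore:
    """Toy model of reference-counted data blocks (not a real file system)."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes
        self.refcount = {}   # block id -> number of files pointing at it
        self.next_id = 0

    def alloc(self, data):
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = data
        self.refcount[bid] = 1
        return bid

    def release(self, bid):
        self.refcount[bid] -= 1
        if self.refcount[bid] == 0:
            del self.blocks[bid], self.refcount[bid]


class File:
    def __init__(self, store, block_ids):
        self.store = store
        self.block_ids = list(block_ids)

    @classmethod
    def create(cls, store, chunks):
        return cls(store, [store.alloc(c) for c in chunks])

    def clone(self):
        # A clone copies only the block list and bumps each reference count.
        for bid in self.block_ids:
            self.store.refcount[bid] += 1
        return File(self.store, self.block_ids)

    def write_block(self, index, data):
        bid = self.block_ids[index]
        if self.store.refcount[bid] > 1:
            # Shared block: give this file a private block before writing.
            self.store.release(bid)
            self.block_ids[index] = self.store.alloc(data)
        else:
            self.store.blocks[bid] = data

    def read(self):
        return b"".join(self.store.blocks[b] for b in self.block_ids)
```

Every write has to consult the reference count first, which is the extra administrative work the paragraph above refers to.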

Now if you "copy" files (lots of them) by adding another clone/snapshot of the file system in, say, BTRFS, the situation is different: Now you've "copied" all files in the filesystem, and any changes to them will be copy-on-write. This exists, but not in ext4.

"Journalling" is a completely independent concept from that, it's the administrative structures for the files that count.



Author: Mridul Verma

Updated on September 18, 2022

Comments

  • Mridul Verma
    Mridul Verma over 1 year

    We have seen the OS do a copy-on-write optimisation when forking a process. The reason is that most of the time fork is followed by exec, so we don't want to incur the cost of page allocations and of copying the data from the caller's address space unnecessarily.

    So does this also happen when doing cp on Linux with ext4 or XFS (journaling) file systems? If it does not happen, then why not?

  • Stéphane Chazelas
    Stéphane Chazelas over 6 years
    Some network file systems like NFS, CIFS, OCFS2 may pass those along to the server as well.
  • bitifet
    bitifet over 5 years
    Large files, one being a binary copy of the other, very rarely differ in a single bit, and when that happens it is due to an error.
  • Q the Platypus
    Q the Platypus almost 5 years
    A system call for copy has been introduced (see my answer).
  • Rashini Gamalath
    Rashini Gamalath about 3 years
    In 5.11 it's supported in Btrfs, CIFS, NFS 4.2, OCFS2, overlayfs, and XFS.