Are there any disadvantages of `cp --sparse=always`?


There are a few reasons why it is not the default: backwards compatibility, performance, and, last but not least, the principle of least surprise.

My understanding is that enabling this option adds CPU overhead, which might not always be acceptable; backwards compatibility is also key. The cp command works reliably without it. The option does save a little space, but these days that saving is negligible, in most cases at least.
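
To see how much space is actually at stake, the allocated size of a copy can be compared with and without the option. The following is only a minimal sketch: it assumes GNU coreutils (`cp`, `dd`, `stat`) and a file system that supports holes, and the file names and the `workdir` variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare the allocated size of copies made with different --sparse modes.
# Assumes GNU coreutils and a file system that supports sparse files.
set -eu

workdir=$(mktemp -d)              # hypothetical scratch directory
trap 'rm -rf "$workdir"' EXIT

# 8 MiB of real zero bytes written out by dd (data, not a hole).
dd if=/dev/zero of="$workdir/zeros.img" bs=1M count=8 status=none

# One copy that never creates holes, one that turns zero runs into holes.
cp --sparse=never  "$workdir/zeros.img" "$workdir/dense.img"
cp --sparse=always "$workdir/zeros.img" "$workdir/sparse.img"

# %s = apparent size in bytes, %b = number of blocks actually allocated.
for f in dense.img sparse.img; do
    stat -c '%n: %s bytes, %b blocks allocated' "$workdir/$f"
done
```

Both copies should report the same apparent size. On a file system with hole support, the `--sparse=always` copy would be expected to show far fewer allocated blocks, but the exact numbers depend on the file system.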

I think the comments you received also highlighted other reasons.

The principle of least surprise means you do not change something needlessly. cp has been around for decades, and changing its default behavior would upset many veterans.


Author: Tom Hale

Updated on September 18, 2022

Comments

  • Tom Hale almost 2 years

    Is there any reason not to use --sparse=always with every invocation of cp?

    info cp says:

    ‘--sparse=WHEN’
         A “sparse file” contains “holes”—a sequence of zero bytes that does
         not occupy any physical disk blocks; the ‘read’ system call reads
         these as zeros.  This can both save considerable disk space and
         increase speed, since many binary files contain lots of consecutive
         zero bytes.  By default, ‘cp’ detects holes in input source files
         via a crude heuristic and makes the corresponding output file
         sparse as well.  Only regular files may be sparse.
    
        The WHEN value can be one of the following:
    

    ...

        ‘always’
              For each sufficiently long sequence of zero bytes in the input
              file, attempt to create a corresponding hole in the output
              file, even if the input file does not appear to be sparse.
              This is useful when the input file resides on a file system
              that does not support sparse files (for example, ‘efs’ file
              systems in SGI IRIX 5.3 and earlier), but the output file is
              on a type of file system that does support them.  Holes may be
              created only in regular files, so if the destination file is
              of some other type, ‘cp’ does not even try to make it sparse.
    

    It also says:

    [...] with the following alias, ‘cp’ will use the minimum amount of space supported by the file system.

    alias cp='cp --reflink=auto --sparse=always'
    

    Why isn't --sparse=always the default?

    • Stephen Kitt almost 7 years
      It’s incompatible with --reflink, apart from that I don’t know...
    • cat almost 7 years
      perhaps just because the developers wanted to utilise the principle of least surprise, or because POSIX specified otherwise? (is cp even in posix, i forget)
    • frostschutz almost 7 years
      Checking for sparseness might be detrimental to performance, sparse files may cause severe filesystem fragmentation, and there was at least one instance of data corruption with cp --sparse.
    • Stephen Kitt almost 7 years
      @frostschutz the default still creates sparse files; with --sparse=always the behaviour is slightly different. If you want to avoid the issues you mention, you need to disable sparse writes explicitly with --sparse=never.
    • Stephen Kitt almost 7 years
      To clarify, as I understand it, the default copies existing holes, while --sparse=always also looks for runs of zeroes which could be used to create new holes.
    • meuh almost 7 years
      Copying the data for (mainly non-sparse) files through a read/write loop involves DMA of data into and out of memory, whereas looking for runs of zeroes, as implied by --sparse=always (or by auto, where the number of blocks doesn't match the file size), will drag the data into the CPU caches and use much more CPU bandwidth and cycles.
    • Stéphane Chazelas almost 7 years
    • Kusalananda almost 7 years
      @cat cp is in POSIX, but GNU cp implements extra stuff.
    • Tom Hale almost 7 years
      @StephenKitt It is compatible with --reflink. info cp says: "with the following alias, ‘cp’ will use the minimum amount of space supported by the file system": alias cp='cp --reflink=auto --sparse=always'
    • Stephen Kitt almost 7 years
      @Tom hah, serves me right for just reading the source code and not seeing the auto variant!
    • Tom Hale almost 7 years
      I like the way you roll, @StephenKitt!
    • Admin almost 7 years
      Some bootloaders don't like holes in files...
  • Atemu over 2 years
    How big is the CPU overhead of --sparse?
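
The thread does not give a number for the overhead, and it probably varies with the hardware, the file system, the page-cache state and the cp version. A rough way to measure it locally is sketched below. It assumes GNU coreutils (`cp`, `head`, and a `date` that supports `%N`); the `bench_dir` variable and the `time_copy` helper are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch: rough wall-clock comparison of --sparse=never and --sparse=always.
# Assumes GNU coreutils; results are only indicative.
set -eu

bench_dir=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$bench_dir"' EXIT

# 32 MiB of random data, so there are no long zero runs to turn into holes.
head -c 33554432 /dev/urandom > "$bench_dir/src.bin"

# Hypothetical helper: copy once with the given mode, print elapsed milliseconds.
time_copy() {
    sparse_mode=$1
    start=$(date +%s%N)
    cp --sparse="$sparse_mode" "$bench_dir/src.bin" "$bench_dir/copy-$sparse_mode.bin"
    end=$(date +%s%N)
    echo $(( ($end - $start) / 1000000 ))
}

for mode in never always; do
    echo "--sparse=$mode: $(time_copy "$mode") ms"
done
```

Random input is used here so that neither mode can skip writes, which should leave any difference down to the copy method itself. A single run is noisy, so repeating the loop several times and using a larger file gives a more meaningful comparison.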