tar/bz2 compress a file removing uncompressed original

tar can't do that, but you can achieve what you want with:

find dir1 -depth -print0 | xargs -0 tar --create --no-recursion --remove-file --file - | bzip2 > dir1.tar.bz2

where:

  • find dir1 -depth -print0

    lists all files and directories in dir1, listing the directory contents before the directory itself (-depth). The use of -print0 (and -0 in xargs below) is the key to supporting directory and file names with embedded spaces.

  • xargs -0 tar --create --no-recursion --remove-file --file -

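    creates a tar archive and adds every file or directory to it. --no-recursion stops tar from descending into directories on its own (find already lists every entry), and --remove-file (which GNU tar accepts as an abbreviation of --remove-files) deletes each entry after it has been added. The tar archive is sent to standard output with option --file -.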

  • bzip2 > dir1.tar.bz2

    compresses the tar archive from standard input to a file called dir1.tar.bz2.
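To see why -depth matters, here is a minimal sketch that lists a small tree the same way the pipeline does. The scratch directory and the names in it are made up for illustration. Each directory is printed only after everything inside it, so by the time tar reaches a directory, its contents have already been archived and removed.

```shell
#!/bin/bash
# Build a throwaway tree with spaces in its names (hypothetical names).
scratch=$(mktemp -d)
mkdir -p "$scratch/dir 1/subdir 1"
touch "$scratch/dir 1/file 1" "$scratch/dir 1/subdir 1/file 1"

# Same listing the pipeline uses; NULs become newlines only for display.
listing=$(cd "$scratch" && find "dir 1" -depth -print0 | tr '\0' '\n')
printf '%s\n' "$listing"

# With -depth, the top-level directory is always the last entry.
last=$(printf '%s\n' "$listing" | tail -n 1)
echo "last entry: $last"

rm -rf "$scratch"
```

The order of siblings may vary between filesystems, but a directory never appears before its own contents.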

The amount of free disk space needed is roughly the compressed size of whichever file in dir1 is largest after compression, because tar deletes each file only once it has finished archiving it. Since tar's output is piped to bzip2, each file briefly resides in two places before tar removes it: uncompressed in the filesystem and compressed inside dir1.tar.bz2.
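Following that reasoning, the uncompressed size of the largest file gives a rough, slightly pessimistic idea of the headroom to look for before starting. A sketch, assuming GNU find (for -printf); the function name largest_file_bytes is made up:

```shell
#!/bin/bash
# Print the size in bytes of the largest regular file under a directory.
# Assumes GNU find, whose -printf '%s\n' prints each file's size in bytes.
largest_file_bytes() {
    find "$1" -type f -printf '%s\n' | sort -n | tail -n 1
}

# Demo on a scratch directory with files of known sizes.
scratch=$(mktemp -d)
head -c 1000 /dev/zero > "$scratch/small"
head -c 5000 /dev/zero > "$scratch/large file"
largest=$(largest_file_bytes "$scratch")
echo "largest file: $largest bytes"
rm -rf "$scratch"
```

Comparing that number with the Avail column of df is only a sanity check; data that does not compress (like the random files in the experiment below) can come out slightly larger than it went in.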

I was curious to see how disk space was used so I made this experiment on my Ubuntu VM:

  1. Create a 1 GB filesystem:

    $ dd if=/dev/zero of=/tmp/1gb bs=1M count=1024
    $ sudo losetup /dev/loop0 /tmp/1gb
    $ sudo mkfs.ext3 /dev/loop0
    $ mkdir /tmp/mnt
    $ sudo mount /dev/loop0 /tmp/mnt
    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/loop0     1008M   34M  924M   4% /tmp/mnt
    
  2. Fill the filesystem with 900 one-megabyte files:

    $ sudo chown jaume /tmp/mnt
    $ mkdir /tmp/mnt/dir1
    $ for (( i=0; i<900; i++ )); do dd if=/dev/urandom of=/tmp/mnt/dir1/file$i bs=1M count=1; done
    $ chown -R jaume /tmp/mnt
    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/loop0     1008M  937M   20M  98% /tmp/mnt
    

    The filesystem is now 98% full.

  3. Make a copy of dir1 for later verification:

    $ cp -a /tmp/mnt/dir1 /tmp/dir1-check
    
  4. Compress dir1:

    $ cd /tmp/mnt
    $ ls
    dir1  lost+found
    $ find dir1 -depth -print0 | xargs -0 tar --create --no-recursion --remove-file --file - | bzip2 > dir1.tar.bz2
    $
    

    Note that the commands ran without any 'no space left on device' errors.

    dir1 was removed; only dir1.tar.bz2 remains:

    $ ls /tmp/mnt
    dir1.tar.bz2  lost+found
    
  5. Expand dir1.tar.bz2 and compare to /tmp/dir1-check:

    $ tar --extract --file dir1.tar.bz2 --bzip2 --directory /tmp
    $ diff -s /tmp/dir1 /tmp/dir1-check
    (...)
    Files /tmp/dir1/file97 and /tmp/dir1-check/file97 are identical
    Files /tmp/dir1/file98 and /tmp/dir1-check/file98 are identical
    Files /tmp/dir1/file99 and /tmp/dir1-check/file99 are identical
    $
    

    The copy of dir1 and the contents extracted from dir1.tar.bz2 are identical!
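dir1 in this experiment is flat, so a plain diff is enough. For trees with subdirectories, a recursive comparison is a reasonable equivalent. A sketch with made-up directory names and a hypothetical same_tree helper:

```shell
#!/bin/bash
# Compare two directory trees; diff exits 0 only if it finds no differences.
same_tree() {
    diff -r -q "$1" "$2" > /dev/null
}

scratch=$(mktemp -d)
mkdir -p "$scratch/a/sub" "$scratch/b/sub"
echo data > "$scratch/a/sub/f"
echo data > "$scratch/b/sub/f"

# Identical trees first, then change one file and compare again.
if same_tree "$scratch/a" "$scratch/b"; then before=identical; else before=different; fi
echo changed > "$scratch/b/sub/f"
if same_tree "$scratch/a" "$scratch/b"; then after=identical; else after=different; fi

echo "before change: $before, after change: $after"
rm -rf "$scratch"
```

Note that diff compares names and contents only, not ownership or permissions.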

This can be generalized in a script:

  1. Create a file called tarrm (or any other name of your liking) with these contents:

    #!/bin/bash
    
    # This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
    # This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
    # You should have received a copy of the GNU General Public License along with this program.  If not, see <http://www.gnu.org/licenses/>.
    
    # dir is first argument
    dir="$1"
    # check dir exists
    if [ ! -d "$dir" ]; then
        echo "$(basename "$0"): error: '$dir' doesn't exist or isn't a directory" 1>&2
        exit 1
    fi
    
    # normalize directory name (for example, /home/jaume//// is a legal directory name, but will break ${dir}.tar.bz2 - it needs to be converted to /home/jaume)
    # this is done before the archive names are derived from it, so that the checks below test the right files
    name=$(dirname "$dir")/$(basename "$dir")
    
    # check if tar file exists
    if [ -f "${name}.tar" ] || [ -f "${name}.tar.bz2" ]; then
        echo "$(basename "$0"): error: '${name}.tar' or '${name}.tar.bz2' already exists" 1>&2
        exit 1
    fi
    
    # --keep is second argument
    if [ "X$2" == "X--keep" ]; then
        # keep mode
        removefile=""
        echo " Tarring '$dir'"
    else
        removefile="--remove-file"
        echo " Tarring and **deleting** '$dir'"
    fi
    
    # create compressed tar archive and delete files after adding them to it
    # (pipefail makes the pipeline fail if find, xargs/tar or bzip2 fails, not only bzip2;
    # $removefile is deliberately unquoted so that an empty value adds no argument)
    set -o pipefail
    find "$name" -depth -print0 | xargs -0 tar --create --no-recursion $removefile --file - | bzip2 > "${name}.tar.bz2"
    
    # report failure and return the status of the pipeline
    status=$?
    if [ $status -ne 0 ]; then
        echo "$(basename "$0"): error while creating '${name}.tar.bz2'" 1>&2
    fi
    exit $status
    
  2. Make it executable:

    chmod a+x tarrm
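One detail about the status check at the end of the script: in bash, $? after a pipeline is the exit status of the last command only, unless the pipefail option is set. A minimal sketch of the difference, using false and true as stand-ins for the pipeline stages:

```shell
#!/bin/bash
# Without pipefail, the pipeline's status is that of its last command (true).
without=0
false | true || without=$?

# With pipefail, a failure in any stage makes the whole pipeline fail.
set -o pipefail
with=0
false | true || with=$?
set +o pipefail

echo "without pipefail: $without, with pipefail: $with"
# prints "without pipefail: 0, with pipefail: 1"
```

For a pipeline that ends in bzip2, this is the difference between noticing a tar failure and silently reporting success.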

The script does some basic error checking (dir1 must exist; dir1.tar.bz2 and dir1.tar must not) and has a keep mode. It also supports directory and file names with embedded spaces.
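A caveat for very large trees: xargs splits its input across several invocations of the command when the argument list would otherwise get too long, and each tar invocation would then write its own archive into the same stream. The sketch below shows only the batching itself, forced with -n 2; the names a to e are placeholders.

```shell
#!/bin/bash
# Five NUL-separated names, at most two per invocation: echo runs three times.
batches=$(printf '%s\0' a b c d e | xargs -0 -n 2 echo)
printf '%s\n' "$batches"

# Each output line corresponds to one invocation of the command.
invocations=$(printf '%s\n' "$batches" | wc -l)
echo "invocations: $invocations"
```

The 900 files in the experiment above fit comfortably into a single invocation. For much larger trees, GNU tar can read the list itself (find ... -print0 | tar --create --no-recursion --null --files-from - ...), which should avoid the split, and its --ignore-zeros option reads past the end-of-archive markers of concatenated archives when extracting. Neither variant was part of the test above.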

I've tested the script but can't guarantee it is flawless, so first use it in keep mode:

./tarrm dir1 --keep

This invocation will add dir1 to dir1.tar.bz2 but won't delete the directory.
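Keep mode works because $removefile is expanded unquoted, so an empty value contributes no argument at all. A bash array is a common alternative for optional arguments. A sketch, with a hypothetical build_args function that only prints the arguments instead of calling tar:

```shell
#!/bin/bash
# Collect tar's arguments in an array; the removal flag is added only when wanted.
build_args() {
    local args=(--create --no-recursion)
    if [ "${1:-}" != "--keep" ]; then
        args+=(--remove-files)   # GNU tar's full name for the option
    fi
    args+=(--file -)
    printf '%s\n' "${args[@]}"
}

# Join the printed arguments with spaces for display.
keep_args=$(build_args --keep | tr '\n' ' ')
delete_args=$(build_args | tr '\n' ' ')
echo "keep:   $keep_args"
echo "delete: $delete_args"
```

In a real script the array would be passed on as "${args[@]}", which keeps every element a separate word without relying on word splitting.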

When you trust the script, use it like this:

./tarrm dir1

The script will inform you that dir1 will be deleted in the process of tarring it:

Tarring and **deleting** 'dir1'

For example:

$ ls -lF
total 4
drwxrwxr-x 3 jaume jaume 4096 2013-10-11 11:00 dir 1/
$ find "dir 1"
dir 1
dir 1/subdir 1
dir 1/subdir 1/file 1
dir 1/file 1
$ /tmp/tarrm dir\ 1/
 Tarring and **deleting** 'dir 1/'
$ echo $?
0
$ ls -lF
total 4
-rw-rw-r-- 1 jaume jaume 181 2013-10-11 11:00 dir 1.tar.bz2
$ tar --list --file dir\ 1.tar.bz2 
dir 1/subdir 1/file 1
dir 1/subdir 1/
dir 1/file 1
dir 1/
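
The trailing slash in dir\ 1/ above is the case the script's normalization step is there for: without it, the archive name would be built from 'dir 1/' rather than 'dir 1'. A sketch of that step on its own (the normalize function name is made up):

```shell
#!/bin/bash
# Rebuild the path from its dirname and basename, which drops trailing slashes.
normalize() {
    printf '%s/%s\n' "$(dirname "$1")" "$(basename "$1")"
}

n1=$(normalize "dir 1/")
n2=$(normalize "/home/jaume////")
echo "${n1}.tar.bz2"
echo "${n2}.tar.bz2"
```

A relative name gains a leading ./ in the process, which is harmless for the archive file name.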

Author: Gregg Leventhal

Updated on September 18, 2022

Comments

  • Gregg Leventhal
    Gregg Leventhal over 1 year

    Is there any way to turn a directory called dir1 into dir1.tar.bz2 without keeping the original? I need to save space and want to compress some large files but don't have enough room to keep a compressed copy and the original. Is there any way to transform the existing file into an archive directly?

  • user
    user over 10 years
    Interesting approach. It does seem to depend on there being enough disk space to hold both the uncompressed tar archive and the almost-fully-compressed archive, though, since bzip2 (like the other tools I'm aware of) doesn't actually compress in place. Maybe, just maybe, you could use a pipe from a subshell to help with that?
  • jaume
    jaume over 10 years
    Yes, my proposed solution does indeed need enough space to compress dir1.tar. Another (much simpler) approach would be to use zip instead: zip --recurse-paths --move "dir 1.zip" "dir 1". I've edited my answer to mention zip...
  • jaume
    jaume over 10 years
    Well, as it turns out zip is not an option. I rolled back to the original answer. zip doesn't provide what the OP wants (from the man page): --move Move the specified files into the zip archive; actually, this deletes the target directories/files after making the specified zip archive. If a directory becomes empty after removal of the files, the directory is also removed. No deletions are done until zip has created the archive without error.
  • Gregg Leventhal
    Gregg Leventhal over 10 years
    Impressive solution. I guess the remaining question is how much free space is required to run this operation. Does it require original file + archive, or is it less than that?
  • jaume
    jaume over 10 years
    @MichaelKjörling Thanks for the hint, I noticed I could use tar --create instead of tar --append so I improved the solution to pipe and send the compressed output to a file. Now the amount of free disk space needed is the compressed size of the file in dir1 that is the largest after being compressed, much less than dir.tar.
  • jaume
    jaume over 10 years
    @GreggLeventhal I've improved the solution and now the amount of free disk space needed is the compressed size of the file in dir1 that is the largest after being compressed. I ran a test with a filesystem that was 98% full and it worked without a hitch.
  • Gregg Leventhal
    Gregg Leventhal over 10 years
    I haven't tested it myself yet, but am awarding you the answer on the amount of time and effort you put in. You should put this script into a higher level language like Python and opensource it. Thanks for your work!
  • jaume
    jaume over 10 years
    I appreciate your comments very much, Gregg. My Python (and PHP, Perl, etc for that matter) is pretty bad so I don't think I'll rewrite the script but I've added the standard GPL copying permission statement. Make sure you have a backup of your data before testing the script.
  • user
    user over 10 years
    I'd upvote this again if I could, but alas, I've already used my allotment of upvotes on this particular answer...
  • mveroone
    mveroone over 10 years
    I think something may be missing here. The variable "removedir" isn't used, and when tested, this program deletes only files, not directories. I'll just add if [ "$removedir" == "true" ]; then rm -rf "$dir"; fi
  • jaume
    jaume over 10 years
    You are right, thanks for pointing it out and for your suggestion. I'll check the script later and edit it.
  • jaume
    jaume over 10 years
    @Kwaio In my tests I found out that the removedir variable is not necessary, so I've removed it. The directory is deleted if --keep is not specified, I wonder why it only deleted files in your test. I ran tarrm on OS X 10.9.1.
  • mveroone
    mveroone over 10 years
    I'm using RHEL3 with kernel 2.4.17 and coreutils 4.5.3, which may change things. Thanks for the script anyway; files are what take up the most space.