tar/bz2 compress a file removing uncompressed original

linux compression tar bzip2

6,340

tar can't do that, but you can achieve what you want with:

find dir1 -depth -print0 | xargs -0 tar --create --no-recursion --remove-file --file - | bzip2 > dir1.tar.bz2

where:

find dir1 -depth -print0

lists all files and directories in dir1, listing the directory contents before the directory itself (-depth). The use of -print0 (and -0 in xargs below) is the key to supporting directory and file names with embedded spaces.

xargs -0 tar --create --no-recursion --remove-file --file -

creates a tar archive and adds every file or directory to it. The tar archive is sent to standard output with option --file -.

bzip2 > dir1.tar.bz2

compresses the tar archive from standard input to a file called dir1.tar.bz2.
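
To see what the -depth ordering looks like in practice, here is a small sketch on a throwaway directory (the scratch directory and the file names are made up for illustration). tr turns the NUL separators into newlines so the list is readable. The directory itself is printed last, after everything inside it, which is what lets it be removed once its contents are gone.

```shell
# Scratch directory with hypothetical names, only for illustration
work=$(mktemp -d)
mkdir -p "$work/dir 1/subdir 1"
touch "$work/dir 1/file 1" "$work/dir 1/subdir 1/file 1"
cd "$work" || exit 1

# Same find invocation as in the pipeline; NUL separators shown as newlines
listing=$(find "dir 1" -depth -print0 | tr '\0' '\n')
printf '%s\n' "$listing"
```

The order of entries inside one directory depends on the filesystem, but each directory always comes after its own contents.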

The amount of free disk space needed is roughly the compressed size of whichever file in dir1 is largest after compression, because tar deletes a file only after it has finished archiving it. Since tar is piped to bzip2, every file briefly resides in two places before tar removes it: uncompressed in the filesystem and compressed inside dir1.tar.bz2.
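
One caveat with the xargs stage: when the list of names is longer than the maximum command line, xargs runs tar more than once, and each run writes its own complete archive to standard output. The result is several archives concatenated, and GNU tar stops reading at the end of the first one unless it is given --ignore-zeros. The sketch below forces the split with xargs -n 2 on a scratch directory (names are made up) and then shows a variant, assuming GNU tar, in which tar reads the NUL-separated list itself through --null --files-from=- so only one archive is written. The bzip2 stage is left out to keep the sketch short; it can be appended to the pipe exactly as in the command above.

```shell
# Scratch directory with hypothetical names, only for illustration
work=$(mktemp -d)
cd "$work" || exit 1
mkdir d
for i in 1 2 3 4; do echo "$i" > "d/f$i"; done

# Force xargs to start a new tar for every 2 names (5 names -> 3 archives)
find d -depth -print0 | xargs -0 -n 2 tar --create --no-recursion --file - > split.tar

# Members seen by a plain listing vs. a listing that skips end-of-archive markers
plain=$(tar --list --file split.tar | wc -l)
all=$(tar --list --ignore-zeros --file split.tar | wc -l)
echo "plain listing: $plain members, with --ignore-zeros: $all members"

# Variant (assumes GNU tar): tar reads the name list itself, so there is one archive.
# --no-recursion and --null come before --files-from because GNU tar applies
# them to the names that follow.
find d -depth -print0 | tar --create --no-recursion --null --files-from=- --remove-files --file - > single.tar
single=$(tar --list --file single.tar | wc -l)
echo "single invocation: $single members"
```

With GNU tar, the plain listing of split.tar should show fewer members than the --ignore-zeros listing, while single.tar lists everything without extra options. --remove-files is the full GNU spelling of the option; --remove-file, as used in the answer, is accepted as an abbreviation.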

I was curious to see how disk space was used so I made this experiment on my Ubuntu VM:

Create a 1 GB filesystem:

$ dd if=/dev/zero of=/tmp/1gb bs=1M count=1024
$ losetup /dev/loop0 /tmp/1gb
$ mkfs.ext3 /dev/loop0
$ sudo mount /dev/loop0 /tmp/mnt
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0     1008M   34M  924M   4% /tmp/mnt

Fill the filesystem with 900 one-megabyte files:

$ chown jaume /tmp/mnt
$ mkdir /tmp/mnt/dir1
$ for (( i=0; i<900; i++ )); do dd if=/dev/urandom of=/tmp/mnt/dir1/file$i bs=1M count=1; done
$ chown -R jaume /tmp/mnt
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0     1008M  937M   20M  98% /tmp/mnt

The filesystem is now 98% full.

Make a copy of dir1 for later verification:

$ cp -a /tmp/mnt/dir1 /tmp/dir1-check

Compress dir1:

$ ls /tmp/mnt
dir1  lost+found
$ find /tmp/mnt/dir1 -depth -print0 | xargs -0 tar --create --no-recursion --remove-file --file - | bzip2 > /tmp/mnt/dir1.tar.bz2
$

Note that the commands ran without any 'no space left on device' errors.

dir1 was removed; only dir1.tar.bz2 remains:

$ ls /tmp/mnt
dir1.tar.bz2  lost+found

Expand dir1.tar.bz2 and compare to /tmp/dir1-check:

$ tar --extract --file dir1.tar.bz2 --bzip2 --directory /tmp
$ diff -s /tmp/dir1 /tmp/dir1-check
(...)
Files /tmp/dir1/file97 and /tmp/dir1-check/file97 are identical
Files /tmp/dir1/file98 and /tmp/dir1-check/file98 are identical
Files /tmp/dir1/file99 and /tmp/dir1-check/file99 are identical
$

The copy of dir1 and the contents extracted from dir1.tar.bz2 are identical!

This can be generalized in a script:

Create a file called tarrm (or any other name of your liking) with these contents:

#!/bin/bash

# This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
# This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
# You should have received a copy of the GNU General Public License along with this program.  If not, see <http://www.gnu.org/licenses/>.

# dir is first argument
dir="$1"
# check dir exists
if [ ! -d "$dir" ]; then
    echo "$(basename "$0"): error: '$dir' doesn't exist" 1>&2
    exit 1
fi

# normalize directory name (for example, /home/jaume//// is a legal directory name, but will break ${dir}.tar.bz2 - it needs to be converted to /home/jaume)
# this is done before the next check so that the check looks at the real archive name
dir=$(dirname "$dir")/$(basename "$dir")

# check if tar file exists
if [ -f "${dir}.tar" ] || [ -f "${dir}.tar.bz2" ]; then
    echo "$(basename "$0"): error: '${dir}.tar' or '${dir}.tar.bz2' already exists" 1>&2
    exit 1
fi

# --keep is second argument (messages show the directory name as it was typed)
if [ "$2" = "--keep" ]; then
    # keep mode
    removefile=""
    echo " Tarring '$1'"
else
    removefile="--remove-file"
    echo " Tarring and **deleting** '$1'"
fi

# make the pipeline fail if any stage (find, xargs/tar or bzip2) fails, not only the last one
set -o pipefail

# create compressed tar archive and delete files after adding them to it
find "$dir" -depth -print0 | xargs -0 tar --create --no-recursion $removefile --file - | bzip2 > "${dir}.tar.bz2"

# report a failed pipeline and pass the failure on to the caller
if [ $? -ne 0 ]; then
    echo "$(basename "$0"): error while creating '${dir}.tar.bz2'" 1>&2
    exit 1
fi

Make it executable:

chmod a+x tarrm

The script does some basic error checking (dir1 must exist; dir1.tar.bz2 and dir1.tar must not) and has a keep mode. It also supports directory and file names with embedded spaces.
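
The directory name normalization in the script is easiest to understand from a couple of inputs. This is a small sketch of the same dirname/basename expression applied to two sample arguments; the helper function name is hypothetical and the paths are examples only, they are never accessed.

```shell
# Same expression as in the script, wrapped in a helper (hypothetical name)
normalize() {
    printf '%s/%s\n' "$(dirname "$1")" "$(basename "$1")"
}

abs=$(normalize "/home/jaume////")   # trailing slashes are dropped
rel=$(normalize "dir 1/")            # a relative name gets a leading ./
printf '%s\n' "$abs" "$rel"
```

This prints /home/jaume and ./dir 1, so appending .tar.bz2 always yields a file name next to the directory rather than one inside it.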

I've tested the script but can't guarantee it is flawless, so first use it in keep mode:

./tarrm dir1 --keep

This invocation will add dir1 to dir1.tar.bz2 but won't delete the directory.
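
Since keep mode leaves the directory in place, the archive can be compared against it before anything is deleted. A minimal sketch, assuming GNU tar, whose --diff operation (also spelled --compare) reports archive members that differ from the files on disk. It uses a scratch directory with made-up names and an uncompressed archive to stay short; for a .tar.bz2 produced by tarrm, adding --bzip2 to the tar --diff command should work the same way.

```shell
# Scratch directory with hypothetical names, only for illustration
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p "dir 1/subdir 1"
echo one > "dir 1/file 1"
echo two > "dir 1/subdir 1/file 1"

# Equivalent of keep mode: archive without removing anything
find "dir 1" -depth -print0 | xargs -0 tar --create --no-recursion --file - > "dir 1.tar"

# Compare every archive member with the file of the same name on disk
if tar --diff --file "dir 1.tar"; then
    verified=yes
else
    verified=no
fi
echo "archive matches directory: $verified"

# A later change to the directory is reported as a difference
echo changed > "dir 1/file 1"
if tar --diff --file "dir 1.tar" > /dev/null 2>&1; then
    stale=no
else
    stale=yes
fi
echo "difference detected after modification: $stale"
```

The comparison has to be run from the directory where the archive was created, because the member names are relative paths.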

When you trust the script use it like this:

./tarrm dir1

The script will inform you that dir1 will be deleted in the process of tarring it:

Tarring and **deleting** 'dir1'

For example:

$ ls -lF
total 4
drwxrwxr-x 3 jaume jaume 4096 2013-10-11 11:00 dir 1/
$ find "dir 1"
dir 1
dir 1/subdir 1
dir 1/subdir 1/file 1
dir 1/file 1
$ /tmp/tarrm dir\ 1/
 Tarring and **deleting** 'dir 1/'
$ echo $?
0
$ ls -lF
total 4
-rw-rw-r-- 1 jaume jaume 181 2013-10-11 11:00 dir 1.tar.bz2
$ tar --list --file dir\ 1.tar.bz2 
dir 1/subdir 1/file 1
dir 1/subdir 1/
dir 1/file 1
dir 1/


Gregg Leventhal

Updated on September 18, 2022

Comments

Gregg Leventhal over 1 year

Is there any way to turn a directory called dir1 into dir1.tar.bz2 without keeping the original? I need to save space and want to compress some large files but don't have enough room to keep a compressed copy and the original. Is there any way to transform the existing file into an archive directly?
user over 10 years

Interesting approach. It does seem to depend on there being enough disk space to hold both the uncompressed tar archive and the almost-fully-compressed archive, though, since bzip2 (like the other tools I'm aware of) doesn't actually compress in place. Maybe, just maybe, you could use a pipe from a subshell to help with that?
jaume over 10 years

Yes, my proposed solution does indeed need enough space to compress dir1.tar. Another (much simpler) approach would be to use zip instead: zip --recurse-paths --move "dir 1.zip" "dir 1". I've edited my answer to mention zip...
jaume over 10 years

Well, as it turns out zip is not an option. I rolled back to the original answer. zip doesn't provide what the OP wants (from the man page): --move Move the specified files into the zip archive; actually, this deletes the target directories/files after making the specified zip archive. If a directory becomes empty after removal of the files, the directory is also removed. No deletions are done until zip has created the archive without error.
Gregg Leventhal over 10 years

Impressive solution. I guess the remaining question is how much free space is required to run this operation. Does it require original file + archive, or is it less than that?
jaume over 10 years

@MichaelKjörling Thanks for the hint, I noticed I could use tar --create instead of tar --append so I improved the solution to pipe and send the compressed output to a file. Now the amount of free disk space needed is the compressed size of the file in dir1 that is the largest after being compressed, much less than dir.tar.
jaume over 10 years

@GreggLeventhal I've improved the solution and now the amount of free disk space needed is the compressed size of the file in dir1 that is the largest after being compressed. I ran a test with a filesystem that was 98% full and it worked without a hitch.
Gregg Leventhal over 10 years

I haven't tested it myself yet, but am awarding you the answer on the amount of time and effort you put in. You should put this script into a higher level language like Python and opensource it. Thanks for your work!
jaume over 10 years

I appreciate your comments very much, Gregg. My Python (and PHP, Perl, etc for that matter) is pretty bad so I don't think I'll rewrite the script but I've added the standard GPL copying permission statement. Make sure you have a backup of your data before testing the script.
user over 10 years

I'd upvote this again if I could, but alas, I've already used my allotment of upvotes on this particular answer...
mveroone over 10 years

I think something may be missing here. The variable "removedir" isn't used and, when tested, this program deletes only files, not directories. I'll just add if [ "$removedir" == "true" ]; then rm -rf "$dir"; fi
jaume over 10 years

You are right, thanks for pointing it out and for your suggestion. I'll check the script later and edit it.
jaume over 10 years

@Kwaio In my tests I found out that the removedir variable is not necessary, so I've removed it. The directory is deleted if --keep is not specified, I wonder why it only deleted files in your test. I ran tarrm on OS X 10.9.1.
mveroone over 10 years

I'm using RHEL3 with kernel 2.4.17 and coreutils 4.5.3; that may change things. Thanks for the script anyway; files are what take the most space.