tar/bz2 compress a file removing uncompressed original
tar can't do that, but you can achieve what you want with:
find dir1 -depth -print0 | xargs -0 tar --create --no-recursion --remove-file --file - | bzip2 > dir1.tar.bz2
where:
find dir1 -depth -print0
lists all files and directories in dir1, listing the directory contents before the directory itself (-depth). The use of -print0 (and -0 in xargs below) is the key to supporting directory and file names with embedded spaces.
xargs -0 tar --create --no-recursion --remove-file --file -
creates a tar archive and adds every file or directory to it. The tar archive is sent to standard output with option --file -.
bzip2 > dir1.tar.bz2
compresses the tar archive from standard input to a file called dir1.tar.bz2.
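To see the three stages work together, here is a minimal sketch that runs the same pipeline on a throwaway directory. The directory and file names are made up for the example, and it spells the option --remove-files, which is how GNU tar documents it (GNU tar accepts --remove-file as an abbreviation). It assumes GNU tar, find, xargs and bzip2 are installed.

```shell
#!/bin/bash
# Sketch only: try the pipeline on a throwaway directory.
# The directory and file names below are made up for this example.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1

mkdir -p "dir 1/subdir 1"
echo "hello" > "dir 1/file 1"
echo "world" > "dir 1/subdir 1/file 1"

# -depth lists "dir 1/subdir 1/file 1" before "dir 1/subdir 1", so tar has
# already archived (and removed) a directory's contents when it reaches it.
# --remove-files is the documented GNU tar spelling of --remove-file.
find "dir 1" -depth -print0 |
  xargs -0 tar --create --no-recursion --remove-files --file - |
  bzip2 > "dir 1.tar.bz2"

# Show what ended up in the archive.
listing=$(tar --list --bzip2 --file "dir 1.tar.bz2")
printf '%s\n' "$listing"
```

Because the names travel NUL-separated from find to xargs, the spaces in "dir 1" and "file 1" reach tar intact.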
The amount of free disk space needed is roughly the compressed size of the largest file in dir1, because tar waits until a file has been archived before deleting it. Since tar is piped to bzip2, for a short moment before tar removes it, each file resides in two places: uncompressed in the filesystem and compressed inside dir1.tar.bz2.
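If you want a rough idea of that headroom before starting, one option is to compress each file to a pipe and keep the largest byte count. The function below, largest_compressed_size, is a hypothetical helper and not part of the original answer. It reads every file once, so it can be slow on large trees, and it only estimates the figure described above.

```shell
#!/bin/bash
# Hypothetical helper (not from the answer): print the size in bytes of the
# largest bzip2-compressed regular file under the directory given as $1.
largest_compressed_size() {
  # Each file is compressed to a pipe and counted; nothing is written to disk.
  find "$1" -type f -exec sh -c '
    for f do bzip2 --stdout < "$f" | wc -c; done' sh {} + |
    sort -n | tail -n 1
}

# Example with made-up data: a 1 MiB file of zeros compresses to almost
# nothing, so the 64 KiB file of random bytes determines the result.
demo=$(mktemp -d) || exit 1
head -c 1048576 /dev/zero    > "$demo/zeros"
head -c 65536   /dev/urandom > "$demo/random"
largest_compressed_size "$demo"
```

Comparing the printed number with the Avail column of df gives a quick sanity check before running the real pipeline.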
I was curious to see how disk space was used so I made this experiment on my Ubuntu VM:
Create a 1 GB filesystem:
$ dd if=/dev/zero of=/tmp/1gb bs=1M count=1024
$ losetup /dev/loop0 /tmp/1gb
$ mkfs.ext3 /dev/loop0
$ sudo mount /dev/loop0 /tmp/mnt
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0     1008M   34M  924M   4% /tmp/mnt
Fill the filesystem with 900 one-megabyte files:
$ chown jaume /tmp/mnt
$ mkdir /tmp/mnt/dir1
$ for (( i=0; i<900; i++ )); do dd if=/dev/urandom of=/tmp/mnt/dir1/file$i bs=1M count=1; done
$ chown -R jaume /tmp/mnt
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0     1008M  937M   20M  98% /tmp/mnt
The filesystem is now 98% full.
Make a copy of dir1 for later verification:
$ cp -a /tmp/mnt/dir1 /tmp/dir1-check
Compress dir1:
$ ls /tmp/mnt
dir1 lost+found
$ find /tmp/mnt/dir1 -depth -print0 | xargs -0 tar --create --no-recursion --remove-file --file - | bzip2 > /tmp/mnt/dir1.tar.bz2
$
Note that the commands ran without any 'no space left on device' errors.
dir1 was removed; only dir1.tar.bz2 exists:
$ ls /tmp/mnt
dir1.tar.bz2 lost+found
Expand dir1.tar.bz2 and compare to /tmp/dir1-check:
$ tar --extract --file dir1.tar.bz2 --bzip2 --directory /tmp
$ diff -s /tmp/dir1 /tmp/dir1-check
(...)
Files /tmp/dir1/file97 and /tmp/dir1-check/file97 are identical
Files /tmp/dir1/file98 and /tmp/dir1-check/file98 are identical
Files /tmp/dir1/file99 and /tmp/dir1-check/file99 are identical
$
The copy of dir1 and the extracted dir1.tar.bz2 are identical!
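The same round trip can be reproduced at small scale without a dedicated filesystem. This is only a sketch with made-up paths under a temporary directory. It uses diff --recursive and checks its exit status instead of reading the per-file output of diff -s, and it assumes GNU tar and GNU diff.

```shell
#!/bin/bash
# Sketch with made-up paths: archive a directory, restore it elsewhere and
# compare the restored tree with a copy taken beforehand.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1

mkdir -p dir1/sub
head -c 4096 /dev/urandom > dir1/file0
echo "some text" > dir1/sub/file1

cp -a dir1 dir1-check            # reference copy, as in the experiment

find dir1 -depth -print0 |
  xargs -0 tar --create --no-recursion --remove-files --file - |
  bzip2 > dir1.tar.bz2

mkdir restored
tar --extract --bzip2 --file dir1.tar.bz2 --directory restored

# diff --recursive exits with 0 only if both trees have identical contents.
if diff --recursive restored/dir1 dir1-check; then
  verdict="identical"
else
  verdict="different"
fi
echo "$verdict"
```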
This can be generalized in a script:
Create a file called tarrm (or any other name of your liking) with these contents:
#!/bin/bash
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at your
# option) any later version.
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
# for more details.
# You should have received a copy of the GNU General Public License along
# with this program. If not, see <http://www.gnu.org/licenses/>.

# dir is first argument
dir="$1"

# check dir exists
if [ ! -d "$dir" ]; then
    echo "$(basename "$0"): error: '$dir' doesn't exist" 1>&2
    exit 1
fi

# check if tar file exists
if [ -f "${dir}.tar" -o -f "${dir}.tar.bz2" ]; then
    echo "$(basename "$0"): error: '$dir.tar' or '${dir}.tar.bz2' already exist" 1>&2
    exit 1
fi

# --keep is second argument
if [ "X$2" == "X--keep" ]; then
    # keep mode
    removefile=""
    echo " Tarring '$dir'"
else
    removefile="--remove-file"
    echo " Tarring and **deleting** '$dir'"
fi

# normalize directory name (for example, /home/jaume//// is a legal directory
# name, but will break ${dir}.tar.bz2 - it needs to be converted to /home/jaume)
dir=$(dirname "$dir")/$(basename "$dir")

# create compressed tar archive and delete files after adding them to it
# ($removefile is left unquoted on purpose: it must vanish when empty)
find "$dir" -depth -print0 | xargs -0 tar --create --no-recursion $removefile --file - | bzip2 > "${dir}.tar.bz2"

# check the exit status of the last command in the pipeline (bzip2)
if [ $? -ne 0 ]; then
    echo "$(basename "$0"): error while creating '${dir}.tar.bz2'" 1>&2
    exit 1
fi
Make it executable:
chmod a+x tarrm
The script does some basic error checking: dir1 must exist, and neither dir1.tar.bz2 nor dir1.tar may exist. It also has a keep mode and supports directory and file names with embedded spaces.
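One caveat about error checking in pipelines: in bash, $? after a pipeline holds only the exit status of its last command (bzip2 here), so a failure in find or tar would not be noticed. A possible way around this is bash's PIPESTATUS array. The function archive_checked below is a hypothetical helper, not part of tarrm, and it deliberately leaves out --remove-file so nothing is deleted.

```shell
#!/bin/bash
# Hypothetical helper, not part of tarrm: archive "$1" to "$1.tar.bz2"
# (without deleting anything) and fail if any pipeline stage fails.
archive_checked() {
  local dir=$1 status
  find "$dir" -depth -print0 |
    xargs -0 tar --create --no-recursion --file - |
    bzip2 > "${dir}.tar.bz2"
  # PIPESTATUS must be read right after the pipeline; it holds one exit
  # status per stage (find, xargs/tar, bzip2).
  for status in "${PIPESTATUS[@]}"; do
    [ "$status" -eq 0 ] || return 1
  done
}

# Example with made-up names.
work=$(mktemp -d) || exit 1
mkdir "$work/dir1"
echo "data" > "$work/dir1/file"
if ( cd "$work" && archive_checked dir1 ); then
  echo "archive created"
else
  echo "archiving failed"
fi
```

An alternative with a similar effect is set -o pipefail, which makes $? reflect the last failing stage instead.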
I've tested the script but can't guarantee it is flawless, so first use it in keep mode:
./tarrm dir1 --keep
This invocation will add dir1 to dir1.tar.bz2 but won't delete the directory.
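While the directory is still on disk, GNU tar can compare the archive against it with --diff (also spelled --compare), which is one way to build trust before dropping --keep. The sketch below uses made-up names and builds the archive with the same pipeline the script runs in keep mode. It assumes GNU tar.

```shell
#!/bin/bash
# Sketch with made-up names: build an archive without removing anything
# (what keep mode does), then let GNU tar compare it with the files on disk.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1

mkdir -p "dir 1/subdir 1"
echo "hello" > "dir 1/file 1"
echo "world" > "dir 1/subdir 1/file 1"

find "dir 1" -depth -print0 |
  xargs -0 tar --create --no-recursion --file - |
  bzip2 > "dir 1.tar.bz2"

# --diff (alias --compare) exits with 0 when every archive member matches
# its counterpart on disk, and with a non-zero status otherwise.
if tar --diff --bzip2 --file "dir 1.tar.bz2"; then
  keep_check="archive matches directory"
else
  keep_check="archive differs from directory"
fi
echo "$keep_check"
```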
When you trust the script, use it like this:
./tarrm dir1
The script will inform you that dir1 will be deleted in the process of tarring it:
Tarring and **deleting** 'dir1'
For example:
$ ls -lF
total 4
drwxrwxr-x 3 jaume jaume 4096 2013-10-11 11:00 dir 1/
$ find "dir 1"
dir 1
dir 1/subdir 1
dir 1/subdir 1/file 1
dir 1/file 1
$ /tmp/tarrm dir\ 1/
Tarring and **deleting** 'dir 1/'
$ echo $?
0
$ ls -lF
total 4
-rw-rw-r-- 1 jaume jaume 181 2013-10-11 11:00 dir 1.tar.bz2
$ tar --list --file dir\ 1.tar.bz2
dir 1/subdir 1/file 1
dir 1/subdir 1/
dir 1/file 1
dir 1/
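Names like "dir 1" survive because find and xargs exchange them NUL-separated. A small sketch, with a made-up file name, that counts how many arguments xargs hands to the command with and without that option pair:

```shell
#!/bin/bash
# Sketch with a made-up file name: count how many arguments xargs passes on.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1
touch "file 1"

# Newline-separated names, default xargs splitting: the space splits the name.
split=$(find . -type f | xargs sh -c 'echo "$#"' sh)

# NUL-separated names: the name arrives as a single argument.
whole=$(find . -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "without -print0/-0: $split argument(s)"
echo "with    -print0/-0: $whole argument(s)"
```

Without the NUL separators, tar would be asked to archive "./file" and "1", neither of which exists.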
Gregg Leventhal
Updated on September 18, 2022
Comments
-
Gregg Leventhal over 1 year
Is there any way to turn a directory called dir1 into dir1.tar.bz2 without keeping the original? I need to save space and want to compress some large files but don't have enough room to keep a compressed copy and the original. Is there any way to transform the existing file into an archive directly?
-
user over 10 years
Interesting approach. It does seem to depend on there being enough disk space to hold both the uncompressed tar archive as well as the almost-fully-compressed archive, though, since bzip2 (as well as other tools that I'm aware of) doesn't actually compress in place. Maybe, just maybe, you could use a pipe from a subshell to help with that?
-
jaume over 10 years
Yes, my proposed solution does indeed need enough space to compress dir1.tar. Another (much simpler) approach would be to use zip instead: zip --recurse-paths --move "dir 1.zip" "dir 1". I've edited my answer to mention zip...
-
jaume over 10 years
Well, as it turns out, zip is not an option. I rolled back to the original answer. zip doesn't provide what the OP wants (from the man page):
--move Move the specified files into the zip archive; actually, this deletes the target directories/files after making the specified zip archive. If a directory becomes empty after removal of the files, the directory is also removed. No deletions are done until zip has created the archive without error.
-
Gregg Leventhal over 10 years
Impressive solution. I guess the remaining question is how much free space is required to run this operation. Does it require original file + archive, or is it less than that?
-
jaume over 10 years
@MichaelKjörling Thanks for the hint. I noticed I could use tar --create instead of tar --append, so I improved the solution to pipe and send the compressed output to a file. Now the amount of free disk space needed is the compressed size of the file in dir1 that is the largest after being compressed, much less than dir1.tar.
-
jaume over 10 years
@GreggLeventhal I've improved the solution and now the amount of free disk space needed is the compressed size of the file in dir1 that is the largest after being compressed. I made a test with a filesystem 98% full and it worked without a hitch.
-
Gregg Leventhal over 10 years
I haven't tested it myself yet, but am awarding you the answer for the amount of time and effort you put in. You should port this script to a higher-level language like Python and open-source it. Thanks for your work!
-
jaume over 10 years
I appreciate your comments very much, Gregg. My Python (and PHP, Perl, etc. for that matter) is pretty bad so I don't think I'll rewrite the script, but I've added the standard GPL copying permission statement. Make sure you have a backup of your data before testing the script.
-
user over 10 years
I'd upvote this again if I could, but alas, I've already used my allotment of upvotes on this particular answer...
-
mveroone over 10 years
I think something may be missing here. The variable "removedir" isn't used, and when tested, this program deletes only files, not directories. I'll just add
if [ "$removedir" == "true" ]; then rm -rf "$dir"; fi
-
jaume over 10 years
You are right, thanks for pointing it out and for your suggestion. I'll check the script later and edit it.
-
jaume over 10 years
@Kwaio In my tests I found out that the removedir variable is not necessary, so I've removed it. The directory is deleted if --keep is not specified; I wonder why it only deleted files in your test. I ran tarrm on OS X 10.9.1.
-
mveroone over 10 years
I'm using RHEL3 with kernel 2.4.17 and coreutils 4.5.3; that may change things. Thanks for the script anyway; files are what take the most space.