How do I compress files in-place?
Solution 1
gzip or bzip2 will compress the file and remove the uncompressed one automatically (this is their default behaviour).
However, keep in mind that during compression both files will exist.
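Because the original and the compressed copy coexist until gzip finishes, a nearly full disk can still run out of space partway through. Below is a minimal sketch of a pre-check. It assumes GNU coreutils (stat -c, df --output) and treats the original's size as a rough worst-case bound for the temporary compressed copy. compress_if_room is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: gzip FILE only if its filesystem has at least
# FILE's size in free bytes (rough worst case for the temporary .gz copy).
# Assumes GNU coreutils for `stat -c` and `df --output`.
compress_if_room() {
    file=$1
    # size of the file in bytes
    size=$(stat -c %s -- "$file") || return 1
    # free bytes on the filesystem holding the file (last line, spaces removed)
    avail=$(df --output=avail -B1 -- "$file" | tail -n 1 | tr -d ' ')
    if [ "$avail" -lt "$size" ]; then
        echo "not enough free space to compress $file" >&2
        return 1
    fi
    # default gzip behaviour: write FILE.gz, then delete FILE
    gzip -- "$file"
}
```

For example, compress_if_room myfile either produces myfile.gz or leaves myfile untouched and returns 1.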
If you want to compress log files (i.e. files containing text), you may prefer bzip2, since it usually achieves a better ratio on text files.
bzip2 -9 myfile # will produce myfile.bz2
Comparison and examples:
$ ls -l myfile
-rw-rw-r-- 1 apaul apaul 585999 29 april 10:09 myfile
$ bzip2 -9 myfile
$ ls -l myfile*
-rw-rw-r-- 1 apaul apaul 115780 29 april 10:09 myfile.bz2
$ bunzip2 myfile.bz2
$ gzip -9 myfile
$ ls -l myfile*
-rw-rw-r-- 1 apaul apaul 146234 29 april 10:09 myfile.gz
UPDATE: as @JJoao pointed out in a comment, xz interestingly seems to have the best ratio on plain files with its default options:
$ xz -9 myfile
$ ls -l myfile*
-rw-rw-r-- 1 apaul apaul 109384 29 april 10:09 myfile.xz
For more information, here is an interesting benchmark of different tools: http://binfalse.de/2011/04/04/comparison-of-compression/
For the examples above, I used -9 for the best compression ratio, but if compression time matters more than the ratio, you'd better not use it (use a lower level, e.g. -1, or something in between).
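To reproduce a comparison like the one above without the bunzip2 step between runs, each tool can write to stdout with -c so the original file stays in place. A minimal sketch follows; compare_sizes is a hypothetical helper name, and its level argument (default 6 here, an arbitrary choice that gzip, bzip2 and xz all accept) makes it easy to compare a low level such as -1 against -9.

```shell
# Hypothetical helper: print the size of FILE and of its gzip, bzip2 and xz
# compressed forms at LEVEL (default 6), without modifying FILE.
compare_sizes() {
    file=$1
    level=${2:-6}
    printf '%s\t%s\n' original "$(wc -c < "$file")"
    for tool in gzip bzip2 xz; do
        # -c sends the compressed data to stdout; wc -c counts its bytes
        printf '%s\t%s\n' "$tool -$level" "$("$tool" "-$level" -c -- "$file" | wc -c)"
    done
}
```

In bash, running time compare_sizes myfile 1 and then time compare_sizes myfile 9 shows both sides of the trade-off: the byte counts for the ratio and the elapsed time for the speed.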
Solution 2
I figured out a tar solution myself.
It deletes each file after compressing it into the target archive.
Compression is not very fast, though. The command looks like this:
tar -zcvf my_log.tar.gz *.log --remove-files
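Since the originals are deleted as soon as they are archived, it can be reassuring to list the archive afterwards. A minimal sketch, assuming GNU tar (BSD tar lacks --remove-files, see Solution 4) and that ARCHIVE is given as an absolute path; archive_logs is a hypothetical helper name.

```shell
# Hypothetical helper: move every *.log file in DIR into the gzip-compressed
# tar ARCHIVE (absolute path), deleting each original once GNU tar has added
# it, then print the number of entries in the archive.
archive_logs() {
    dir=$1
    archive=$2
    # subshell: expand *.log inside DIR so the archive stores bare file names
    (cd "$dir" && tar -zcf "$archive" --remove-files -- *.log) || return 1
    # -t lists the entries without extracting them
    tar -tzf "$archive" | wc -l
}
```

If DIR contains no .log files, the glob stays unexpanded, tar fails, and the function returns 1 without creating a usable archive.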
Solution 3
In addition to @apaul's answer, I want to emphasize that compressing files individually
bzip2 *.log.*
(replace bzip2 with gzip, xz, or whatever your favorite compressor is) may be important: this way you can still view (bzcat file.bz2), search (bzgrep pattern file.bz2) and edit (vi file.bz2) the compressed files, and remove the older ones when necessary.
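The per-file approach combines naturally with find for routine clean-up. A minimal sketch, assuming rotated logs named *.log.* in one directory and day thresholds that are placeholders to adapt; rotate_logs is a hypothetical helper name. It relies on bzip2 giving each .bz2 file the same modification date as its original, so the retention check still sees the log's real age.

```shell
# Hypothetical helper: compress each rotated log in DIR on its own, then
# delete compressed logs past a retention period.
# -mtime +N matches files whose age, counted in whole days, exceeds N.
rotate_logs() {
    dir=$1
    compress_after_days=${2:-1}   # placeholder threshold
    delete_after_days=${3:-30}    # placeholder retention period
    # bzip2 turns each argument into its own .bz2 file, readable with bzcat/bzgrep
    find "$dir" -type f -name '*.log.*' ! -name '*.bz2' \
        -mtime +"$compress_after_days" -exec bzip2 -- {} +
    # bzip2 keeps the original mtime, so this sees each log's real age
    find "$dir" -type f -name '*.log.*.bz2' -mtime +"$delete_after_days" -delete
}
```

The current log (for example app.log) does not match *.log.* and is left alone.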
Solution 4
I was trying to do this with the BSD version of tar, where the --remove-files option is not available. What I ended up doing (and it worked) was:
find folder_to_tar -type f -exec tar --append --file=output_tar_file.tar {} \; -exec rm -v {} \;
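In the find command above, each -exec also acts as a test, so rm only runs when tar exits successfully. The same idea can be written as a loop. This is a minimal sketch that assumes only the short options -r (append) and -f, which both BSD tar and GNU tar accept, file names without newlines, and an archive located outside the source folder; append_and_remove is a hypothetical helper name. Like the original command, it produces an uncompressed tar.

```shell
# Hypothetical helper: append every regular file under SRC to the tar file
# ARCHIVE one at a time, removing each file only after tar succeeded.
append_and_remove() {
    src=$1
    archive=$2
    find "$src" -type f | while IFS= read -r f; do
        # -r appends the file to ARCHIVE; && makes the rm conditional
        tar -rf "$archive" "$f" && rm -- "$f"
    done
}
```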
Solution 5
When you use I/O redirection in bash with >, the target file is truncated before any new data is written.
The dd command, however, can overwrite part of a file without truncating it first, so the following command may work:
gzip -c some-file | dd conv=notrunc of=some-file
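Usually, compressed data is smaller than the original data: after gzip has read the first N bytes, it has output only M bytes, where M < N, so the first M bytes of the original file can safely be overwritten with compressed data while everything after the first N bytes is left unchanged.
But the file will still contain leftover original data after the end of the gzip stream, which has to be cut off afterwards.
Also, if dd's writes ever get ahead of what gzip has read so far (for example, with data that does not compress well), the unread part of the input would be overwritten before gzip reads it, corrupting the result.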
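The leftover bytes can be cut off once the size of the gzip stream is known, and dd reports how many bytes it wrote. A minimal sketch, assuming GNU coreutils (truncate, and dd's "N bytes ... copied" summary on stderr) and compressible input, since the read/write race described above still applies; keep a backup when trying it. gzip_in_place_dd is a hypothetical helper name.

```shell
# Hypothetical helper: replace FILE with its gzip stream in place, then drop
# the leftover original bytes and rename the result to FILE.gz.
# Assumes GNU coreutils and compressible data (see the caveat above).
gzip_in_place_dd() {
    file=$1
    # dd's stderr summary ("<N> bytes ... copied") goes into the pipe and its
    # stdout is discarded; LC_ALL=C keeps the summary untranslated for awk
    bytes=$(gzip -c -- "$file" \
        | LC_ALL=C dd conv=notrunc of="$file" 2>&1 >/dev/null \
        | awk '/ bytes/ { print $1 }')
    [ -n "$bytes" ] || return 1
    truncate -s "$bytes" -- "$file"   # cut everything after the gzip stream
    mv -- "$file" "$file.gz"
}
```

After this, gzip -t some-file.gz should report no trailing garbage.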
Alternatively, you can map the file to a block device with losetup. For a block device, the > redirection does not truncate the original data:
loop_device=$(losetup -f --show some-file)   # needs root
gzip -c < "$loop_device" > "$loop_device"
losetup -d "$loop_device"                     # detach the loop device afterwards
Zen
Updated on September 18, 2022
Comments
-
Zen over 1 year
I have a machine with 90% hard-disk usage. I want to compress its 500+ log files into a smaller new file. However, the hard disk is too small to keep both the original files and the compressed ones.
So what I need is to compress all log files into a single new file one by one, deleting each original once compressed.
How can I do that in Linux?
-
Hermann over 4 years: Duplicate: superuser.com/questions/378230
-
JJoao about 9 years: +1; Just curious: could you add a xz myfile? -
apaul about 9 years: @JJoao thanks! It's interesting; I'm not used to using xz, but I'll consider it now. See the update to my post. -
ju5tin about 9 years: Please don't do xz -9. It greatly increases the memory required for compression/decompression, without significantly improving the compression ratio. The manpage even says (emphasis theirs) "Specifically, it's not a good idea to blindly use -9 for everything like it often is with gzip(1) and bzip2(1)". The default xz -6 is good enough, and even xz -0/xz -1 usually compress better than gzip -9. -
apaul about 9 years: @user49740 you're right. I rarely use -9, but I used it here since I wanted to make some kind of benchmark for compression ratio "on the same scale". But once again, you're totally right: it's a bad idea to blindly use -9. -
pgilmon over 7 years: By the way, the BSD version is what you get by default if you happen to be using macOS
-
Chris about 3 years: # you can also just point to the log directory ('log' being the directory containing the log files)
tar --remove-files -zcvf my_log.tar.gz log