How do I compress files in-place?

Solution 1

gzip or bzip2 will compress the file and remove the uncompressed original automatically (this is their default behaviour).

However, keep in mind that during compression both files exist, so you need enough free space for the compressed copy.
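Since both copies exist for a moment, it can help to check the free space first. The following is a minimal sketch, not part of the original answer: the file name `space_demo.log` is hypothetical, and it assumes the worst case where the compressed output is about as large as the input.

```shell
#!/bin/sh
# Sketch: check that the filesystem has room for the temporary second copy.
# "space_demo.log" is a hypothetical file created here for illustration;
# point "file" at the log you actually plan to compress.
file=./space_demo.log
printf 'example log line\n' > "$file"

# Size of the file in KiB, rounded up.
bytes=$(wc -c < "$file")
need_kb=$(( (bytes + 1023) / 1024 ))

# Free space in KiB on the filesystem holding the file.
# -P forces the POSIX layout (one line per filesystem), -k reports KiB.
avail_kb=$(df -Pk "$file" | awk 'NR==2 {print $4}')

# Assumed worst case: the compressed output is as large as the input.
if [ "$avail_kb" -gt "$need_kb" ]; then
    echo "enough room: need at most about ${need_kb} KiB, ${avail_kb} KiB free"
else
    echo "not enough room to compress $file safely" >&2
fi
```

In practice text logs shrink a lot, so this check is conservative.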

If you want to compress log files (i.e. files containing text), you may prefer bzip2, since it usually achieves a better ratio than gzip on text files.

bzip2 -9 myfile       # will produce myfile.bz2

Comparison and examples:

$ ls -l myfile
-rw-rw-r-- 1 apaul apaul 585999 29 april 10:09 myfile

$ bzip2 -9 myfile

$ ls -l myfile*
-rw-rw-r-- 1 apaul apaul 115780 29 april 10:09 myfile.bz2

$ bunzip2 myfile.bz2

$ gzip -9 myfile

$ ls -l myfile*
-rw-rw-r-- 1 apaul apaul 146234 29 april 10:09 myfile.gz

UPDATE: as @JJoao suggested in a comment, I also tried xz. Interestingly, it gives the best ratio of the three on this plain text file (run here with -9, like the other tools):

$ gunzip myfile.gz

$ xz -9 myfile

$ ls -l myfile*
-rw-rw-r-- 1 apaul apaul 109384 29 april 10:09 myfile.xz

For more information, here is an interesting benchmark of different tools: http://binfalse.de/2011/04/04/comparison-of-compression/

In the example above, I use -9 for the best compression ratio, but if the time needed to compress the data matters more than the ratio, you had better not use it (use a lower level, e.g. -1, or something in between).
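To see the trade-off on your own data before committing to a level, you can compress to stdout and count bytes, which leaves the original untouched. This is a rough sketch using gzip only; the file `levels_demo.log` and its contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare gzip levels on a sample file without modifying it.
# "levels_demo.log" is a hypothetical file generated here for illustration.
sample=./levels_demo.log

# Generate repetitive, log-like text so there is something to compress.
i=0
while [ "$i" -lt 2000 ]; do
    echo "2022-09-18 10:09:00 INFO request $i handled in 12 ms"
    i=$((i + 1))
done > "$sample"

orig_size=$(wc -c < "$sample")
echo "original: $orig_size bytes"

# -c writes the compressed stream to stdout, so the file stays in place.
for level in 1 6 9; do
    size=$(gzip -"$level" -c "$sample" | wc -c)
    echo "gzip -$level: $size bytes"
done
```

Prefixing each gzip call with `time` would show the speed side of the trade-off as well.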

Solution 2

I figured out a tar solution by myself.
It deletes each file after adding it to the target archive.
The compression is not very fast, though. The command looks like this:

tar -zcvf my_log.tar.gz *.log --remove-files
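Before relying on this with real logs, it may be worth trying it in a scratch directory and listing the archive afterwards. The sketch below assumes GNU tar (the `--remove-files` option is not available everywhere); the directory `tar_demo` and the sample files are hypothetical.

```shell
#!/bin/sh
# Sketch (assumes GNU tar): archive logs and let tar delete the originals,
# then list the archive to confirm the files made it in.
demo=./tar_demo             # hypothetical scratch directory
rm -rf "$demo" && mkdir -p "$demo"

# Create three small sample logs.
for n in 1 2 3; do
    echo "sample entry $n" > "$demo/app$n.log"
done

# Run tar inside the directory (in a subshell) so the glob matches the logs.
(
    cd "$demo" || exit 1
    tar --remove-files -zcf my_log.tar.gz ./*.log
)

# List the archive contents, then show what is left on disk.
tar -tzf "$demo/my_log.tar.gz"
ls "$demo"
```

After the run, only `my_log.tar.gz` should remain in the scratch directory.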

Solution 3

In addition to @apaul's answer, I want to emphasize that compressing the files individually

 bzip2 *.log.*

(replace bzip2 with gzip, xz, or whatever your favorite compressor is) may be important:

This way you can still view (bzcat file.bz2), search (bzgrep pattern file.bz2), and edit (vi file.bz2) each compressed file, and remove the older ones when necessary.
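A per-file loop also keeps the extra disk usage down to one compressed copy at a time. The following is a small sketch with gzip; the directory `logs_demo` and the file names are made up for illustration, and the same pattern should apply to bzip2 or xz.

```shell
#!/bin/sh
# Sketch: compress logs one at a time, so only one file is ever
# present in both forms on disk.
demo=./logs_demo            # hypothetical scratch directory
rm -rf "$demo" && mkdir -p "$demo"

# Create three small sample logs.
for n in 1 2 3; do
    printf 'line one of log %s\nERROR something failed in log %s\n' "$n" "$n" \
        > "$demo/app.log.$n"
done

# gzip removes each original only after its .gz was written successfully.
# Stop at the first failure (for example, when the disk is full).
for f in "$demo"/app.log.*; do
    gzip "$f" || { echo "failed on $f" >&2; break; }
done

ls "$demo"

# The compressed logs can still be searched: gzip -dc decompresses to stdout.
gzip -dc "$demo/app.log.2.gz" | grep ERROR
```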

Solution 4

I was trying to do this with the BSD version of tar, where the --remove-files option is not available. What I ended up doing (and it worked) was:

find folder_to_tar -type f -exec tar --append --file=output_tar_file.tar {} \; -exec rm -v {} \;
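Note that find evaluates the second -exec only when the first one exits successfully, so a file is removed only after tar has appended it. A scratch run like the sketch below can confirm this on your system; the directory `append_demo` and the file names are hypothetical, and the resulting archive is a plain (uncompressed) tar.

```shell
#!/bin/sh
# Sketch: append files to a tar archive one by one, deleting each
# original only if the append succeeded.
demo=./append_demo          # hypothetical scratch directory
rm -rf "$demo" && mkdir -p "$demo/logs"

# Create three small sample logs.
for n in 1 2 3; do
    echo "sample entry $n" > "$demo/logs/app$n.log"
done

# -r appends to the archive (the short form of --append).
# The rm runs only when the preceding tar command returned success.
find "$demo/logs" -type f \
    -exec tar -rf "$demo/out.tar" {} \; \
    -exec rm {} \;

# List what ended up in the archive and what is left in the directory.
tar -tf "$demo/out.tar"
find "$demo/logs" -type f
```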

Solution 5

When you use I/O redirection in bash with >, the original file is truncated to zero length before any new data is written.

The dd command can overwrite part of a file without truncating it first, so the following command may work:

gzip -c some-file | dd conv=notrunc of=some-file

Compressed data is usually smaller than the original. When gzip has read the first N bytes, it has output only M bytes where M < N, so one can safely overwrite the first M bytes of the original file with compressed data and leave the data after the first N bytes unchanged.

However, because the file is not truncated, leftover original data will remain after the end of the gzip stream.

Also, if dd writes faster than gzip reads, I do not know what will happen.


Alternatively, you can map the file to a block device with losetup. Writing to a block device does not truncate the existing data.

loop_device=$(losetup -f --show some-file)
gzip -c "$loop_device" > "$loop_device"

Author: Zen

Updated on September 18, 2022

Comments

  • Zen
    Zen over 1 year

    I have a machine with 90% hard-disk usage. I want to compress its 500+ log files into a smaller new file. However, the hard disk is too small to keep both the original files and the compressed ones.

    So what I need is to compress all log files into a single new file one by one, deleting each original once compressed.

    How can I do that in Linux?

  • JJoao
    JJoao about 9 years
    +1; Just curious: could you add an xz myfile example?
  • apaul
    apaul about 9 years
    @JJoao thanks! It's interesting. I'm not used to using xz, but I'll consider it now. See the update to my post.
  • ju5tin
    ju5tin about 9 years
    Please don't do xz -9. It greatly increases the memory required for compression/decompression, without significantly improving the compression ratio. The manpage even says (emphasis theirs) "Specifically, it's not a good idea to blindly use -9 for everything like it often is with gzip(1) and bzip2(1)". The default xz -6 is good enough, and even xz -0/xz -1 usually compress better than gzip -9.
  • apaul
    apaul about 9 years
    @user49740 you're right. I rarely use -9, but I used it here since I wanted to make some kind of benchmark for compression ratio "on the same scale". But once again, you're totally right: it's a bad idea to blindly use -9.
  • pgilmon
    pgilmon over 7 years
    By the way, the BSD version is what you get by default if you happen to be using MacOS
  • Chris
    Chris about 3 years
    You can also just point tar at the log directory ('log' being the directory containing the log files): tar --remove-files -zcvf my_log.tar.gz log