Is there a way to determine the decompressed size of a .bz2 file?


As noted by others, the bzip2 format doesn't store the uncompressed size in its headers. But this technique works: you still have to decompress the file, but you don't have to write the decompressed data to disk, which may be a "good enough" solution for you:

$ ls -l foo.bz2
-rw-r--r-- 1 quack quack 2364418 Jul  4 11:15 foo.bz2

$ bzcat foo.bz2 | wc -c         # bzcat decompresses to stdout, wc -c counts bytes
2928640                         # number of bytes of decompressed data

You can pipe that output into something else to give you a human-readable form:

$ ls -lh foo.bz2
-rw-r--r-- 1 quack quack 2.3M Jul  4 11:15 foo.bz2

$ bzcat foo.bz2 | wc -c | perl -lne 'printf("%.2fM\n", $_/1024/1024)'
2.79M
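If you want the same count from a script rather than a pipeline, here is a minimal Python sketch of the same streaming idea, using only the standard-library `bz2` module. `bz2.open` decompresses lazily as you read, so the full output never lands on disk or in memory (the function name `decompressed_size` is just illustrative, not a standard API):

```python
import bz2

def decompressed_size(path, chunk_size=1 << 20):
    """Count the decompressed bytes of a .bz2 file by streaming it
    through the decompressor in 1 MiB chunks, never writing to disk."""
    total = 0
    with bz2.open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total

# Example:
# print(f"{decompressed_size('foo.bz2') / 1024 / 1024:.2f}M")
```

Like `bzcat | wc -c`, this still burns CPU proportional to the whole archive; it only avoids the disk writes.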

Author: endolith
Updated on September 17, 2022

Comments

  • endolith
    endolith over 1 year

    Is there a way to print the decompressed size of a .bz2 file without actually decompressing the entire thing?

    • endolith
      endolith over 14 years
      So there is no metadata about the original file in the bzip output? >:(
    • quack quixote
      quack quixote over 14 years
      not that i've seen reference to. :/
  • endolith
    endolith over 14 years
    Well, that only took five minutes of 100% CPU to calculate.
  • quack quixote
    quack quixote over 14 years
    only? AND it would fill up a disk? i've got a compressed tarball of an old linux install that's only 407meg yet took my poor ancient server 30-45 minutes to extract. that included writing to disk, tho, i'll have to run that script to time it. get back to ya in half an hour... :)
  • endolith
    endolith over 14 years
    I picked the smallest file for the first test, of course. 140 MB compressed --> 3 GB uncompressed. The larger files are 5 GB compressed...
  • quack quixote
    quack quixote over 14 years
    heh .. lemme know how big the 5GBs turn out to be... and how long it takes to figure it out via this XD
  • Nick Russo
    Nick Russo about 6 years
    bzcat and zless don't work together like this. Use "bzcat file.bz2 | less" or "bzless file.bz2", or if you have a gzipped file, "zcat file.gz | less" or "zless file.gz". In fact, the man page for zless notes that "Zless does not work with compressed data that is piped to it via standard input; it requires that input files be specified as arguments."
  • Skippy le Grand Gourou
    Skippy le Grand Gourou almost 4 years
    FWIW, it took about 30 minutes (7’35 sys) to find a 5.6 GB archive was 71 GB uncompressed on a 2×2.4 GHz, 8 GB RAM virtual machine.
  • Admin
    Admin almost 2 years
    @SkippyleGrandGourou The performance of the CPU almost doesn't matter – it's the read speed of the disk which is going to dominate the performance equation, here.
  • Admin
    Admin almost 2 years
    @ChristopherSchultz I may be wrong but I seem to recall (b)zip(2) compression and decompression are CPU-intensive operations.