Why do bzip2 and gzip corrupt large heapdump files and can I work around it?

Solution 1

If this is more than one file, try putting them in a tar archive first:

tar czvf dumps.tar.gz file1 file2

or, for bzip2 compression:

tar cjvf dumps.tar.bz2 file1 file2

I've never had any problems with either method on numerous systems and filesystems.

It will also work for a single file, of course!
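
Either way, it's worth testing the archive on the server before transferring it: both gzip and bzip2 have a built-in integrity check, and tar can list the contents without extracting them. A minimal check, reusing the file names from the examples above:

gzip -t dumps.tar.gz               # silent with exit status 0 if the archive is intact
bzip2 -t dumps.tar.bz2             # same check for the bzip2 variant
tar tzf dumps.tar.gz > /dev/null   # listing the contents also exercises the whole compressed stream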

Solution 2

gzip versions 1.2.4 and older have problems decompressing files larger than 4 GB (see http://www.gzip.org/#faq10).

According to bzip2's changelog, it also seems to have had some trouble with larger files prior to version 1.0.0.
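
To rule this out, check which versions are actually installed on both the machine doing the compression and the one doing the decompression, for example:

gzip --version | head -n 1      # e.g. "gzip 1.3.12"
bzip2 --help 2>&1 | head -n 1   # the first line of bzip2's help banner includes its version number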

Comments

  • Hanno Fietz almost 2 years

    I'm trying to analyze heap dumps from a server, which are fairly large files (10-15 GB). I create those files on the server and want to analyze them on my machine, so for downloading them I tried compressing them with both bzip2 and gzip. Both programs consistently produce corrupted files that they can't decompress anymore.

    I'm using ext3 with a block size of 4 KiB, so the file size limit should be 2 TiB and therefore irrelevant in my case. I'm using gzip 1.3.12 and bzip2 1.0.5 on Ubuntu Jaunty, 64-bit server edition, in a mostly vanilla state (only some added packages, nothing fancy).

    There is a RAID-1 running, but it reports no synchronization problems or delays.

    The dumps are created with jmap.

    Is there any particular type of data that makes those programs choke?

    Is the size a problem?

    What could I try to find out more or circumvent the problem? (The checks suggested in the comments below are pulled together into a short sketch after this thread.)

    • Paul over 12 years
      Are you doing your test decompressions on the same machine that you do the compression on?
    • Kjetil Jørgensen over 12 years
      Does gzip and/or bzip2 exit successfully? (gzip file; echo $?) At which point is the file corrupted, before or after the transfer? (That is, if you try to gunzip/bunzip2 the file where it was created, is it already corrupted at that stage?)
    • Kjetil Jørgensen over 12 years
      Also, which versions of gzip and bzip2 are you using? (--version) Apparently, older gzip versions had problems decompressing files larger than 4 GB. gzip.org/#faq10
    • Wayne Jhukie over 12 years
      Have you run out of space for intermediate files?
    • Hanno Fietz over 12 years
      @Paul - Yes, I am, with the -t flag.
    • Hanno Fietz over 12 years
      @pjc50 - No. The file system is fine, there's plenty of space and no file size limit I have to care about (it's actually 2 TiB).
    • t0r0X over 7 years
      I know this is way too late, but... If you get weird file corruption while copying or compressing large files locally on a system, you might have defective RAM modules. Happened to me some years ago, went almost nuts because of it... :-/
  • Hanno Fietz over 12 years
    This is useful information. My versions are higher, though. (Will edit them into the original question)
  • Steve over 12 years
    I can only guess that passing it through tar resolves some sort of timing issue. I have no end of trouble with zipping on various NFS systems at work, and yet tar works every time. In my case the filesystems are over half the planet away, but I've had issues with gzip at home too, and that's all local. I just tar things up by default now.
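
To narrow down where the corruption happens, as suggested in the comments, one minimal check sequence is to record the exit status and a checksum on the server and then repeat the integrity test after the download. The name heap.hprof below is only a placeholder for whatever file jmap actually wrote:

# on the server (heap.hprof stands in for the real dump file name)
gzip heap.hprof; echo "gzip exit status: $?"   # a non-zero status already points at the compression step
gzip -t heap.hprof.gz                          # a failure here means the archive is corrupt before any transfer
sha256sum heap.hprof.gz > heap.hprof.gz.sha256

# on the local machine, after downloading both files
sha256sum -c heap.hprof.gz.sha256              # fails if the download changed the file
gzip -t heap.hprof.gz                          # fails with a matching checksum means the archive itself is bad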