Why use deflate instead of gzip for text files served by Apache?

Solution 1

The simple answer is: don't.


RFC 2616 defines deflate as:

deflate The "zlib" format defined in RFC 1950 in combination with the "deflate" compression mechanism described in RFC 1951

The zlib format is defined in RFC 1950 as:

     0   1
     +---+---+
     |CMF|FLG|   (more-->)
     +---+---+

       0   1   2   3
     +---+---+---+---+
     |     DICTID    |   (more-->)
     +---+---+---+---+

     +=====================+---+---+---+---+
     |...compressed data...|    ADLER32    |
     +=====================+---+---+---+---+

So: a two-byte header, an optional dictionary ID, and an ADLER32 checksum at the end.
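To make the layout concrete, here is a minimal sketch that pulls those fields out of a stream produced by Python's standard `zlib` module. The helper name `describe_zlib_stream` and the sample data are made up for illustration; the field meanings follow RFC 1950.

```python
import struct
import zlib


def describe_zlib_stream(stream: bytes) -> dict:
    """Split an RFC 1950 zlib stream into its header fields and trailer."""
    cmf, flg = stream[0], stream[1]
    return {
        "compression_method": cmf & 0x0F,        # CM: 8 means deflate
        "window_bits": (cmf >> 4) + 8,           # CINFO + 8
        "header_check_ok": (cmf * 256 + flg) % 31 == 0,  # FCHECK rule
        "has_preset_dict": bool(flg & 0x20),     # FDICT: DICTID follows if set
        # The trailer is the Adler-32 of the uncompressed data, big-endian.
        "adler32": struct.unpack(">I", stream[-4:])[0],
    }


sample = b"hello, deflate " * 20
info = describe_zlib_stream(zlib.compress(sample))
print(info)
print("trailer matches:", info["adler32"] == zlib.adler32(sample))
```

Everything between the two header bytes and the four trailer bytes is the raw deflate data.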

RFC 2616 defines gzip as:

gzip An encoding format produced by the file compression program "gzip" (GNU zip) as described in RFC 1952 [25]. This format is a Lempel-Ziv coding (LZ77) with a 32 bit CRC.

RFC 1952 defines the compressed data as:

The format presently uses the DEFLATE method of compression but can be easily extended to use other compression methods.
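As a rough illustration of the gzip wrapper, the sketch below inspects a stream produced by Python's standard `gzip` module. The helper name `describe_gzip_stream` is hypothetical; the magic bytes, method byte and trailer layout are taken from RFC 1952.

```python
import gzip
import struct
import zlib


def describe_gzip_stream(stream: bytes) -> dict:
    """Pick out the fixed parts of an RFC 1952 gzip member."""
    # Trailer: CRC-32 of the uncompressed data, then its length mod 2**32,
    # both little-endian.
    crc32, isize = struct.unpack("<II", stream[-8:])
    return {
        "magic_ok": stream[:2] == b"\x1f\x8b",   # ID1, ID2
        "compression_method": stream[2],         # CM: 8 means deflate
        "crc32": crc32,
        "isize": isize,
    }


sample = b"hello, gzip " * 20
info = describe_gzip_stream(gzip.compress(sample))
print(info)
print("crc matches:", info["crc32"] == zlib.crc32(sample))
```

The method byte is what leaves room for "other compression methods", although in practice it is always 8 (deflate).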

CRC-32 is slower to compute than ADLER32. Adler-32 is usually described like this:

Compared to a cyclic redundancy check of the same length, [Adler-32] trades reliability for speed (preferring the latter).
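If you want to check the speed claim on your own hardware, a quick measurement along these lines is enough. This is only a sketch: the buffer size and repeat count are arbitrary, and the result depends heavily on the zlib build in use (some builds use hardware-accelerated CRC-32, which can narrow or even reverse the gap).

```python
import timeit
import zlib

# 1 MiB of sample data; size and contents are arbitrary choices.
block = bytes(range(256)) * 4096


def time_checksum(fn, data: bytes, repeats: int = 20) -> float:
    """Return the total seconds taken to checksum `data` `repeats` times."""
    return timeit.timeit(lambda: fn(data), number=repeats)


adler_seconds = time_checksum(zlib.adler32, block)
crc_seconds = time_checksum(zlib.crc32, block)
print(f"adler32: {adler_seconds:.4f}s  crc32: {crc_seconds:.4f}s")
```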

So ... we have two compression mechanisms that use the same algorithm for compression, but different headers and checksums.
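A small sketch of that point, using the `wbits` parameter of Python's `zlib` module to select the wrapper (negative for raw deflate, 15 for zlib, 31 for gzip). The helper name `deflate_with` and the JSON-like payload are invented for the example, and the byte counts assume zlib's default minimal gzip header.

```python
import zlib


def deflate_with(wbits: int, data: bytes) -> bytes:
    """Compress `data` with the same deflate settings, varying only the wrapper."""
    compressor = zlib.compressobj(level=6, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


payload = b'{"lat": 56.1, "lon": -3.9}' * 10

raw = deflate_with(-15, payload)  # bare RFC 1951 deflate data
zl = deflate_with(15, payload)    # RFC 1950 zlib wrapper (HTTP "deflate")
gz = deflate_with(31, payload)    # RFC 1952 gzip wrapper (HTTP "gzip")

print("raw:", len(raw), "zlib:", len(zl), "gzip:", len(gz))

# Stripping each wrapper leaves a deflate stream that decodes on its own.
assert zlib.decompress(zl[2:-4], wbits=-15) == payload
assert zlib.decompress(gz[10:-8], wbits=-15) == payload
```

For very small responses, the fixed per-response overhead of the wrapper is the only size difference between the two encodings.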

Now, the underlying TCP packets are already pretty reliable, so the issue here is not the ADLER32 checksum versus the CRC-32 that gzip uses.


It turns out that many browsers over the years implemented deflate incorrectly. Instead of expecting the zlib header defined in RFC 1950, they simply expected the raw compressed payload. Various web servers made the same mistake.

So, over the years, browsers started implementing a fuzzy-logic deflate decoder: they try for the zlib header and Adler checksum, and if that fails they try for the raw payload.
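The fallback described above can be sketched in a few lines. This is an illustration of the idea only, not any browser's actual code; the function name `decode_http_deflate` is hypothetical.

```python
import zlib


def decode_http_deflate(body: bytes) -> bytes:
    """Decode a 'Content-Encoding: deflate' body leniently."""
    try:
        # What the spec says: zlib header + deflate data + Adler-32 trailer.
        return zlib.decompress(body)
    except zlib.error:
        # What some servers actually send: raw deflate, no header or checksum.
        return zlib.decompress(body, wbits=-15)


text = b"body { margin: 0; } " * 10

spec_compliant = zlib.compress(text)
compressor = zlib.compressobj(wbits=-15)
headerless = compressor.compress(text) + compressor.flush()

print(decode_http_deflate(spec_compliant) == text)
print(decode_http_deflate(headerless) == text)
```

Note that the headerless form carries no checksum at all, so the fallback path cannot detect a corrupted body.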

The result of having complex logic like that is that it is often broken. Verve Studio has a user-contributed test section that shows how bad the situation is.

For example, deflate works in Safari 4.0 but is broken in Safari 5.1, and it has always had issues in IE.


So the best thing to do is avoid deflate altogether; the minor speed boost (due to ADLER32) is not worth the risk of broken payloads.

Solution 2

GZip is simply deflate plus a checksum and header/footer. Deflate is faster, though, as I learned the hard way.

[Figure: gzip vs deflate graph]

Solution 3

You are likely not able to actually pick deflate as an option. Contrary to what you might expect, mod_deflate does not use deflate but gzip. So while most of the points made are valid, they are likely not relevant for most people.

Solution 4

I think there's no big difference between deflate and gzip, because gzip basically is just a header wrapped around deflate (see RFCs 1951 and 1952).

Solution 5

The main reason is that deflate is faster to encode than gzip, and on a busy server that might make a difference. With static pages it's a different question, since they can easily be pre-compressed once.
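Pre-compressing a static file once could look roughly like the sketch below. The helper name `precompress` and the `.gz` sibling-file naming are assumptions made for the example, and the web server still has to be configured separately to serve the `.gz` copy with the right `Content-Encoding` header, which is not shown here.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def precompress(path: Path) -> Path:
    """Write a gzip copy of `path` next to it and return the new file's path."""
    target = path.with_name(path.name + ".gz")
    # Maximum compression is affordable because this runs once, not per request.
    with path.open("rb") as src, gzip.open(target, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    return target


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_bytes(b"<html><body>" + b"<p>hello</p>" * 200 + b"</body></html>")
    original_size = page.stat().st_size
    compressed = precompress(page)
    compressed_size = compressed.stat().st_size
    round_trip_ok = gzip.decompress(compressed.read_bytes()) == page.read_bytes()

print(original_size, "->", compressed_size, "bytes")
```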

Author: Ken

Updated on May 27, 2020

Comments

  • Ken
    Ken almost 4 years

    What advantages does either method offer for HTML, CSS and JavaScript files served by a LAMP server? Are there better alternatives?

    The server provides information to a map application using JSON, so a high volume of small files.

    See also Is there any performance hit involved in choosing gzip over deflate for http compression?

    • Ken
      Ken over 11 years
      switched accepted answers... the current consensus is two to one in favour of gzip
    • SPRBRN
      SPRBRN over 9 years
      mod_deflate is for Apache 2, mod_gzip is for Apache 1.3.
  • Dijar
    Dijar over 15 years
    Seems to be mainly the checksum, which would match with the smaller difference on faster systems.
  • Admin
    Admin about 15 years
    On the Intel chips it doesn't make a bit of difference, since they get that fancy SSE4.2 "CRC32" instruction...
  • Jack Lloyd
    Jack Lloyd over 14 years
    Not to mention that zlib doesn't have support for the extension, and even if it did, the CRC32 instruction in SSE 4.2 uses the polynomial 1EDC6F41, and the gzip format uses the polynomial EDB88320 - totally different algorithms, effectively.
  • Jack Lloyd
    Jack Lloyd over 14 years
    In the gzip format, the checksum comes at the end of the file, specifically so one can start writing deflate blocks as they are processed without having to hold everything up.
  • David Murdoch
    David Murdoch over 14 years
    gzip is deflate with a header. And HTTP 1.1 deflate is actually zlib (which is also a wrapper around deflate)
  • Seun Osewa
    Seun Osewa over 14 years
    GZip probably checks the checksum on decompression, hence the speed difference for decompression.
  • David Murdoch
    David Murdoch over 13 years
    And since deflate is faster, why is SO using gzip?
  • Randy the Dev
    Randy the Dev over 13 years
    Deflate is actually the zlib format, which is raw deflate data with a much smaller header and the computationally cheaper Adler32 checksum on the end.
  • Sam Saffron
    Sam Saffron about 12 years
    Well, this answer turns out to be incorrect ... see: zoompf.com/blog/2012/02/lose-the-wait-http-compression ... in particular, clients have two ways they can "interpret" deflate: headerless/checksumless, and with a zlib header. The implementation of a correct deflate across browsers is bad. deflate should be avoided.
  • Peter Burns
    Peter Burns about 12 years
    @sam additionally I just re-ran the benchmarks and on a modern Intel chip, I get gzip 1441/692 and deflate 1286/531. Second number is decompress, first is compress. So deflate is still faster, do your benchmarks show otherwise? (I agree it may not be useful for other reasons, but the answer is correct, deflate is faster..)
  • Ken
    Ken almost 12 years
    @JeffAtwood but the question wasn't faster?
  • Pacerier
    Pacerier almost 12 years
    Shouldn't there be a new standard that combines adler32 with gzip?
  • Evgeniy Berezovsky
    Evgeniy Berezovsky almost 10 years
    @Ken Your question was: What advantages ... for ... files served by LAMP server?. And performance is one of the issues on the server side for anyone who's successful in attracting lots of requests. So Jeff's performance comparison - the main point of his answer - is pertinent. Coincidentally Sam's comment does not address it at all. I guess he's just criticizing the GZip is simply deflate plus a checksum and header/footer intro, which might be technically wrong.
  • Gert van den Berg
    Gert van den Berg almost 10 years
    Except that that enables gzip, since mod_deflate confusingly implements only gzip compression...
  • Xegara
    Xegara over 7 years
    @Sam Saffron, does this mean if the web browser is not in the picture, I can use deflate over gzip? For instance, if I'm going to upload a compressed file to my FTP server.
  • mar
    mar almost 7 years
    Sorry but you seem to be comparing apples to oranges. Deflate is a compression algorithm while GZip is a file format. The only difference is that GZip computes checksum (CRC-32).
  • mar
    mar almost 7 years
    CRC-32 is slow, Adler32 is slow as well. These can easily account for those extra 20-40%. So naturally, no checksum is faster. But then you also can't detect corrupt streams.
  • Mark Adler
    Mark Adler over 6 years
    Another very minor difference is that the zlib wrapper is six bytes vs. 18 bytes for gzip. So for very small packets, there may be an advantage to sending 12 fewer bytes. The conclusion does not change however, which is that due to Microsoft screwing it up for everyone by misinterpreting what "deflate" meant in what they delivered on their IIS servers, it is easier to just use the gzip format.
  • aidan
    aidan over 6 years
    @GertvandenBerg I've updated my answer, but for the record, gzip is deflate, just with extra headers and a checksum
  • Gert van den Berg
    Gert van den Berg over 6 years
    @aiden yep but the checksum has a performance impact... (and raw deflate is not standard compliant)
  • user1095108
    user1095108 over 5 years
    But how could the payload possibly be broken, if it is transmitted using TCP? The whole idea of TCP is to transmit unbroken payloads.
  • ihebiheb
    ihebiheb almost 5 years
    This answer dates from 2012. Do modern browsers still suffer from the incorrect implementation of the deflate algorithm, or is it safe to use now? Is this part of the answer still up to date?
  • GChuf
    GChuf over 4 years
    For whoever is wondering, with deflate my text files go from 30KB to 10KB - so the files have to be even smaller than that to not get any savings. I am guessing less than 1KB or something similar.