Compressing request body with python-requests?


Solution 1

Or perhaps there is a mistake in my approach?

I'm unsure how you arrived at your approach, frankly, but there's certainly a simpler way of doing this.

First, a few things:

  1. The files parameter constructs a multipart/form-data body. So you're compressing something that the server potentially has no clue about.
  2. Content-Encoding and Transfer-Encoding are two very different things. You want Transfer-Encoding here.
  3. You don't need to set a suffix on your NamedTemporaryFile.
  4. Since you didn't explicitly mention that you're trying to compress a multipart/form-data request, I'm going to assume that you don't actually want to do that.
  5. Your call to session.Request (which I assume should be requests.Request) is missing a method; it should be: requests.Request('POST', endpoint_url, ...)

With those out of the way, here's how I would do this:

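import gzip
from tempfile import NamedTemporaryFile

# Assuming `file` is a file-like obj opened in binary mode
with NamedTemporaryFile(delete=True) as gzfile:
    # Close the GzipFile before calling tell() so the gzip trailer is written
    with gzip.GzipFile(fileobj=gzfile, mode="wb") as gz:
        gz.write(file.read())
    headers = {'Content-Length': str(gzfile.tell()),
               'Transfer-Encoding': 'gzip'}
    gzfile.seek(0, 0)
    response = session.post(endpoint_url, data=gzfile,
                            headers=headers)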

Assuming that file holds the XML content and all you want is to compress it, this should work for you. You will probably also want to set a Content-Type header, for example:

 headers = {'Content-Length': str(gzfile.tell()),
            'Content-Type': 'application/xml',  # or 'text/xml'
            'Transfer-Encoding': 'gzip'}

The Transfer-Encoding tells the server that the request is being compressed only in transit and it should uncompress it. The Content-Type tells the server how to handle the content once the Transfer-Encoding has been handled. 
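
If the payload fits in memory, the temporary file is not strictly needed. Here is a minimal sketch of the compression step on its own, using only the standard library. The helper name gzip_body and the sample XML are made up for illustration. It deliberately leaves out the encoding header: the answer above uses Transfer-Encoding, and the comments below question how well that is supported, so that choice is left to the caller.

```python
import gzip


def gzip_body(raw: bytes, content_type: str = "application/xml"):
    """Compress `raw` in memory and build headers describing the result.

    Hypothetical helper, not part of requests.
    """
    compressed = gzip.compress(raw)
    headers = {
        "Content-Type": content_type,
        # Length of the compressed bytes, not of the original payload.
        "Content-Length": str(len(compressed)),
    }
    return compressed, headers


# Sample payload (an assumption): repetitive XML compresses well.
xml = b"<root>" + b"<item>value</item>" * 1000 + b"</root>"
body, headers = gzip_body(xml)
print(len(xml), "bytes before,", len(body), "bytes after")
```

The returned body and headers could then be passed to session.post(endpoint_url, data=body, headers=headers) after adding whichever encoding header the server expects.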

Solution 2

I had a question that was marked as an exact duplicate. I was concerned with both ends of the transaction.

The code from sigmavirus24 wasn't a direct cut and paste fix, but it was the inspiration for this version.

Here's how my solution ended up looking:

sending from the python end

import gzip
import io
import json

import requests

url = "http://localhost:3000"
headers = {"Content-Type": "application/octet-stream"}
data = [{"key": 1, "otherKey": "2"},
        {"key": 3, "otherKey": "4"}]

# gzip operates on bytes, so encode the JSON text first (Python 3)
payload = json.dumps(data).encode("utf-8")

# Compress into an in-memory buffer; leaving the with-block writes the trailer
out = io.BytesIO()
with gzip.GzipFile(fileobj=out, mode="wb") as f:
    f.write(payload)

r = requests.post(url + "/zipped", data=out.getvalue(), headers=headers)

receiving at the express end

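var express = require("express");
var bodyParser = require("body-parser");
var zlib = require("zlib");

var app = express();
// Accept any content type and expose the body as a raw Buffer
var rawParser = bodyParser.raw({type: '*/*'});

app.post('/zipped', rawParser, function(req, res) {

    zlib.gunzip(req.body, function(err, buf) {
        if(err){
            console.log("err:", err );
        } else{
            console.log("in the inflate callback:",
                        buf,
                        "to string:", buf.toString("utf8") );
        }
    });

    res.status(200).send("I'm in ur zipped route");
});

app.listen(3000);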

There's a gist here with more verbose logging included. This version doesn't have any safety or checking built in either.
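
As for the missing safety checks, one thing a receiver may want is a cap on how much data a compressed body is allowed to expand to. Below is a rough Python sketch of that idea (not the Express code above). The function name bounded_gunzip and the default limit are assumptions chosen for illustration.

```python
import gzip
import zlib


def bounded_gunzip(body: bytes, max_size: int = 1_000_000) -> bytes:
    """Decompress a gzip body, refusing to produce more than max_size bytes.

    Hypothetical helper; the limit is an arbitrary example value.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    # Ask for one byte more than allowed so an oversized body is detectable.
    out = decomp.decompress(body, max_size + 1)
    if len(out) > max_size:
        raise ValueError("decompressed body exceeds limit")
    if not decomp.eof:
        raise ValueError("incomplete gzip stream")
    return out


# Round trip with a small JSON-like payload.
sample = gzip.compress(b'[{"key": 1, "otherKey": "2"}]')
print(bounded_gunzip(sample))
```

Malformed input still raises zlib.error, so a real handler would probably catch that as well and answer with a 4xx status instead of logging and returning 200.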

Author: Dan Lenski

Updated on June 09, 2022

Comments

  • Dan Lenski
    Dan Lenski almost 2 years

    (This question is not about transparent decompression of gzip-encoded responses from a web server; I know that requests handles that automatically.)

    Problem

    I'm trying to POST a file to a RESTful web service. Obviously, requests makes this pretty easy to do:

    files = dict(data=(fn, file))
    response = session.post(endpoint_url, files=files)
    

    In this case, my file is in a really highly-compressible format (yep, XML) so I'd like to make sure that the request body is compressed.

    The server claims to accept gzip encoding (Accept-Encoding: gzip in response headers), so I should be able to gzip the whole request body, right?

    Attempted solution

    Here's my attempt to make this work: I first construct the request and prepare it, then I go into the PreparedRequest object, yank out the body, run it through gzip, and put it back. (Oh, and don't forget to update the Content-Length and Content-Encoding headers.)

    files = dict(data=(fn, file))
    request = requests.Request('POST', endpoint_url, files=files)
    
    prepped = session.prepare_request(request)
    with NamedTemporaryFile(delete=True) as gzfile:
        gzip.GzipFile(fileobj=gzfile, mode="wb").write(prepped.body)
        prepped.headers['Content-Length'] = gzfile.tell()
        prepped.headers['Content-Encoding'] = 'gzip'
        gzfile.seek(0,0)
        prepped.body = gzfile.read()
        response = session.send(prepped)
    

    Unfortunately, the server is not cooperating and returns 500 Internal Server Error. Perhaps it doesn't really accept gzip-encoded requests?

    Or perhaps there is a mistake in my approach? It seems rather convoluted. Is there an easier way to do request body compression with python-requests?

    EDIT: Fixed (3) and (5) from @sigmavirus24's answer (these were basically just artifacts I'd overlooked in simplifying the code to post it here).

  • Dan Lenski
    Dan Lenski about 9 years
    Thanks very much for the answer! I fixed #3 and #5, which were just artifacts of my attempt at simplifying the code to post it. As for #1 and #4, I'm aware that I'm posting a multipart/form-data body: the service requires this in order to accept the file in its uncompressed form. You're right that I'm definitely confused about the semantics of Content-Encoding as opposed to Transfer-Encoding. Here is one of the clearer discussions that I've found... any better explanation out there?
  • Dan Lenski
    Dan Lenski about 9 years
    I was not worrying about the fact that the body is a multipart/form-data because I thought the effect of Content-Encoding: gzip would be to tell the server to transparently gunzip the entire request body before handling it in any way. Seems like this is a faulty assumption?
  • Ian Stapleton Cordasco
    Ian Stapleton Cordasco about 9 years
    Yeah, your original code will mostly work then, but you'll want to set the Transfer-Encoding. Alternatively, if you don't want to hijack the request preparation part of requests, you can use the requests-toolbelt to generate the body and appropriate headers and then gzip it with the proper Transfer-Encoding.
  • Julian Reschke
    Julian Reschke over 7 years
    I don't believe T-E is going to help here; it has even less support than Content-Encoding in requests (and it also doesn't exist in HTTP/2).