Requests Gzip HTTP download and write to disk


Solution 1

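import zlib
import requests

try:
    response = requests.get(url, params=paramDict)
except Exception as e:
    print(e)
    raise  # response is undefined past this point, so do not continue

# MAX_WBITS | 32 makes zlib auto-detect the gzip (or zlib) header
data = zlib.decompress(response.content, zlib.MAX_WBITS | 32)

# data is bytes, so open the file in binary mode
with open('outFileName.txt', 'wb') as outFile:
    outFile.write(data)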

Here is the code I wrote that ended up working. As sigmavirus said, the file was gzipped to begin with. I knew this, but apparently did not describe it clearly enough, and I kept reading and writing the gzipped bytes.

Using the zlib module, I was able to decompress the content of the response all at one time into the data variable; I then wrote that variable containing the decompressed data into a file.
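
As a minimal sketch of that zlib call (written for Python 3; the sample payload is made up and stands in for response.content), adding 32 to the wbits argument asks zlib to auto-detect a gzip or zlib header:

```python
import gzip
import zlib

# Hypothetical payload standing in for response.content: gzip-compressed text.
payload = gzip.compress(b"line one\nline two\n")

# MAX_WBITS | 32 enables automatic gzip/zlib header detection.
data = zlib.decompress(payload, zlib.MAX_WBITS | 32)

# With the default wbits, zlib expects a zlib-wrapped stream and rejects gzip.
try:
    zlib.decompress(payload)
    plain_call_failed = False
except zlib.error:
    plain_call_failed = True
```

The second call is only there to show why the extra wbits argument matters for gzip data.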

I'm not sure if this is the best or most Pythonic way to do this, but it worked. If anyone can explain why I cannot gzip.open this content (perhaps I needed an alternative method; I tried the gzipstream library to no avail), I would appreciate it, but I do consider this question answered.
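
One likely explanation, offered as a sketch rather than a diagnosis of the original code: in Python 2.7, gzip.open expects a filename, while response.content is a byte string already in memory. Wrapping the bytes in io.BytesIO and handing that to gzip.GzipFile through its fileobj parameter is one way around this. The payload below is made up and stands in for response.content; gzip.compress is used only to build it and requires Python 3.

```python
import gzip
import io

# Hypothetical payload standing in for response.content.
payload = gzip.compress(b"hello from a gzipped file\n")

# GzipFile reads from any file-like object, so wrap the bytes in BytesIO.
with gzip.GzipFile(fileobj=io.BytesIO(payload), mode="rb") as gz:
    text_bytes = gz.read()
```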

Thanks to everyone who helped me, even if you didn't have the solution, you helped encourage me to persevere!

Solution 2

So the combination here of stream=True and iter_content is what is causing your problems. What you might want to do is something akin to this (to preserve the streaming behaviour):

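import requests

try:
    response = requests.get(url, params=paramDict, stream=True)
except Exception as e:
    print(e)
    raise  # nothing to read if the request failed

raw = response.raw
with open(outName, 'wb') as out_file:
    while True:
        # decode_content=True makes urllib3 undo the gzip Content-Encoding
        chunk = raw.read(1024, decode_content=True)
        if not chunk:
            break
        out_file.write(chunk)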

Note that you still want to write bytes: you haven't determined the character encoding of the content, so what you have is still bytes, just no longer gzipped bytes.
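
If the payload is itself a gzip file, as in Solution 1, the two approaches can be combined by decompressing each chunk as it arrives instead of holding the whole response in memory. This is a sketch under that assumption (Python 3): decompress_chunks is a hypothetical helper, and the chunk list stands in for response.iter_content(chunk_size=1024) or repeated raw.read calls.

```python
import gzip
import zlib


def decompress_chunks(chunks):
    """Yield decompressed bytes for an iterable of gzip-compressed chunks."""
    # MAX_WBITS | 32 lets zlib auto-detect the gzip header on the first chunk.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 32)
    for chunk in chunks:
        piece = decompressor.decompress(chunk)
        if piece:
            yield piece
    # Emit anything still buffered once the input is exhausted.
    tail = decompressor.flush()
    if tail:
        yield tail


# Made-up data: compress locally, then split into small chunks.
original = b"some text\n" * 1000
compressed = gzip.compress(original)
chunks = [compressed[i:i + 64] for i in range(0, len(compressed), 64)]
restored = b"".join(decompress_chunks(chunks))
```

With a real response, each yielded piece could be written straight to a file opened in 'wb' mode.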

Author: jaxas

Updated on June 09, 2022

Comments

  • jaxas
    jaxas almost 2 years

    I'm using the requests library and Python 2.7 to download a gzipped text file from a web API. Using the code below, I'm able to successfully send a GET request and, judging from the headers, receive a response in the form of a gzip file.

    I know Requests decompresses these files for you automatically if it detects from the header that the response is gzipped. I wanted to take that download in the form of a file stream and write the contents to disk for storage and future analysis.

    When I open the resulting file in my working directory, however, I get characters like this: —}}¶— Q@Ï 'õ

    For reference, some of the response headers include 'Content-Encoding': 'gzip', 'Content-Type': 'application/download', 'Accept-Encoding,User-Agent'

    Am I wrong to write in binary? Am I not encoding the text correctly (i.e., could it be ASCII vs. UTF-8)? There is no apparent character encoding noted in the response headers.

    try:
        response = requests.get(url, paramDict, stream=True)
    except Exception as e:
        print(e)
    
    with open(outName, 'wb') as out_file:
        for chunk in response.iter_content(chunk_size=1024):
            out_file.write(chunk)
    

    EDIT 3.30.2016: Now I've changed my code a little bit to utilize gzipstream library. I tried using the stream to read the entirety of the Gzipped text file that is in my response content:

    with open(outName, 'wb') as out_file, GzipStreamFile(response.content) as fileStream:
        streamContent = fileStream.read()
        out_file.write(streamContent)
    

    I then received this error: out_file.write(streamContent) AttributeError: '_GzipStreamFile' object has no attribute 'close'

    The output was an empty text file with the file name as anticipated. Do I need to initialize my streamContent variable outside of the with block so that it doesn't automatically try to call a close method at the end of the block?

    EDIT 4.1.2016: Just thought I'd clarify that this DOES NOT have to be a stream; that was just one solution I encountered. I just want to make a daily request for this gzipped file and have it saved locally in plaintext.