How to download .gz files with requests in Python without decoding them?

Question

I am downloading a file using requests:

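import requests

req = requests.get(url, stream=True)
with open(local_filename, 'wb') as f:
    # iter_content() undoes any Content-Encoding (e.g. gzip) before yielding chunks
    for chunk in req.iter_content(chunk_size=1024):
        if chunk:  # skip empty chunks
            f.write(chunk)
            f.flush()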

The problem with gzip files is that they are automatically decoded by requests, so I get the unpacked file on disk, while I need the original file.

Is there a way to tell requests not to do this?

Here's what I found when Googling "python requests gzip": ["Requests automatically decompresses gzip-encoded responses ... You can get direct access to the raw response (and even the socket), if needed as well."](http://docs.python-requests.org/en/latest/community/faq/#encoded-data) So then I searched the docs for raw responses and found [`requests.Response.raw`](http://docs.python-requests.org/en/latest/api/?highlight=raw#requests.Response.raw); perhaps that's what you need? — Dan Lenski, Sep 09 '14 at 16:19
This might help http://stackoverflow.com/questions/18364193/python-requests-disable-auto-decoding — yonili, Sep 09 '14 at 16:20
The code you've displayed correctly downloads the `.gz` file for me. What server are you using? What is the value of `req.headers`? Is the URL you are downloading publicly available, and can you share it with us? — Robᵩ, Sep 09 '14 at 16:20
Just to be sure, can you tell us how you determined that you are getting the unpacked file on the disk? — Robᵩ, Sep 09 '14 at 16:29
Can you try your code with `url='https://wiki.mozilla.org/images/f/ff/Example.json.gz'` and `local_filename='Example.json.gz'`? Does that still automatically decompress? — Robᵩ, Sep 09 '14 at 16:33
@DanLenski, you're right, this works perfectly. Please post it as an answer so that I can accept it. — funkifunki, Sep 12 '14 at 13:55
@Robᵩ It is easy to spot the difference in file size. Besides, the .txt.gz file downloads without errors, but any simple text editor (like nano) displays the plain text rather than the compressed content. Anyway, using `req.raw.stream` fixes the problem. — funkifunki, Sep 12 '14 at 13:58
@funkifunki Which Python version are you using? I tried your code in Python 2 and it doesn't decode the file. — Ciprian Tomoiagă, Feb 12 '17 at 16:36
It was a long time ago and I can't recall for sure, but it was most probably 2.6. Also, Dan Lenski's answer solved the issue. — funkifunki, Feb 13 '17 at 08:41
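Besides comparing file sizes or opening the file in an editor, a quick way to check what actually landed on disk is to look at its first two bytes: every gzip stream begins with the magic number `1f 8b` (RFC 1952). A minimal sketch; the helper `looks_gzipped` and the demo file names are hypothetical.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes (RFC 1952)


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


# Demo with two throwaway files: one still compressed, one already decoded.
data = b"example line\n" * 50
with tempfile.TemporaryDirectory() as tmp:
    packed = os.path.join(tmp, "packed.txt.gz")
    unpacked = os.path.join(tmp, "unpacked.txt.gz")  # decoded bytes under a .gz name
    with open(packed, "wb") as f:
        f.write(gzip.compress(data))
    with open(unpacked, "wb") as f:
        f.write(data)
    packed_ok = looks_gzipped(packed)
    unpacked_ok = looks_gzipped(unpacked)

print(packed_ok, unpacked_ok)  # True False
```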

Boban P. · Answer 1 · score 8

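import requests

r = requests.get(url, stream=True)
with open(local_filename, 'wb') as f:
    # Read from the underlying urllib3 response and skip Content-Encoding
    # decoding (e.g. gzip), so bytes are written exactly as the server sent them
    for chunk in r.raw.stream(1024, decode_content=False):
        if chunk:
            f.write(chunk)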

This way, you avoid the automatic decompression of the gzip-encoded response and save it to the file exactly as it is received from the web server, chunk by chunk.
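The difference between this loop and the `iter_content()` loop in the question can be checked without a real download. The sketch below assumes the third-party `requests` package is installed and starts a hypothetical throwaway local server that labels a gzip payload with `Content-Encoding: gzip`, which is the header that makes requests decompress on the fly. It then reads the same URL once with `iter_content()` and once with `raw.stream(..., decode_content=False)`, collecting the chunks in memory instead of writing a file.

```python
import gzip
import http.server
import threading

import requests

# Hypothetical .gz file served with "Content-Encoding: gzip", which is what
# makes requests decompress it on the fly.
PAYLOAD = gzip.compress(b"hello, world\n" * 100)


class GzipHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/gzip")
        self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # silence per-request logging


# Port 0 lets the OS pick a free port for the demo server.
server = http.server.HTTPServer(("127.0.0.1", 0), GzipHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/file.gz" % server.server_port

# 1) iter_content(): requests decodes the Content-Encoding for you.
with requests.get(url, stream=True) as r:
    decoded = b"".join(r.iter_content(chunk_size=1024))

# 2) raw.stream(decode_content=False): bytes exactly as sent by the server.
with requests.get(url, stream=True) as r:
    raw_bytes = b"".join(r.raw.stream(1024, decode_content=False))

server.shutdown()
server.server_close()

print(len(PAYLOAD), len(decoded), len(raw_bytes))
print(raw_bytes == PAYLOAD)  # True
```

Only the second read returns the payload byte-for-byte; the first returns the decompressed text.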

answered Jun 04 '20 at 18:51 · edited Jun 05 '20 at 15:57

score 6 · Accepted Answer · answered Sep 12 '14 at 15:27

As discussed in the comments above, this seems to have solved the issue:

From the docs for the requests module:

Requests automatically decompresses gzip-encoded responses ... You can get direct access to the raw response (and even the socket), if needed as well.

Searching the docs for "raw responses" yields requests.Response.raw, which gives a file-like representation of the raw response stream.
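Because `Response.raw` is file-like, it can also be copied straight to disk, for example with `shutil.copyfileobj`. A sketch under the same kind of assumptions (requests installed, a hypothetical local server that sends `Content-Encoding: gzip`); `download_raw` is a hypothetical helper. Requests creates the underlying urllib3 response with decoding turned off, so `raw.read()` returns the undecoded bytes by default; the sketch sets `decode_content = False` explicitly anyway.

```python
import gzip
import http.server
import os
import shutil
import tempfile
import threading

import requests

PAYLOAD = gzip.compress(b"some text\n" * 200)  # hypothetical .gz file


class GzipHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # silence per-request logging


def download_raw(url, local_filename):
    """Save the response body to disk without undoing its Content-Encoding."""
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # Response.raw is urllib3's file-like HTTPResponse; with
        # decode_content=False its read() returns the undecoded bytes.
        r.raw.decode_content = False
        with open(local_filename, "wb") as f:
            shutil.copyfileobj(r.raw, f)


server = http.server.HTTPServer(("127.0.0.1", 0), GzipHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Example.txt.gz")
    download_raw("http://127.0.0.1:%d/Example.txt.gz" % server.server_port, path)
    with open(path, "rb") as f:
        saved = f.read()
server.shutdown()
server.server_close()

print(saved == PAYLOAD)  # True
```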
