
I am downloading a file using requests:

import requests

req = requests.get(url, stream=True)
with open(local_filename, 'wb') as f:
    for chunk in req.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
            f.flush()

The problem with gzip files is that they are automatically decoded by requests, so I get the unpacked file on disk, while I need the original file.

Is there a way to tell requests not to do this?
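Whether requests decodes the body at all depends on the response headers: it only undoes a `Content-Encoding` (such as `gzip`) that the server declares, so a `.gz` file served without that header is written to disk unchanged by the code above. Below is a minimal sketch for checking what a given server sends; `content_encoding` is a hypothetical helper name, not part of requests:

```python
import requests


def content_encoding(url):
    """Return the Content-Encoding header the server sent for `url`, or None.

    Hypothetical helper for illustration; not part of the requests API.
    """
    # stream=True reads only the status line and headers up front;
    # the body is never downloaded here.
    with requests.get(url, stream=True) as r:
        return r.headers.get("Content-Encoding")
```

If this returns `'gzip'`, `iter_content()` will hand you the decompressed bytes rather than the file as stored on the server.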

funkifunki
  • Here's what I found when Googling "python requests gzip": ["Requests automatically decompresses gzip-encoded responses ... You can get direct access to the raw response (and even the socket), if needed as well."](http://docs.python-requests.org/en/latest/community/faq/#encoded-data) So then I searched the docs for raw responses and found [`requests.Response.raw`](http://docs.python-requests.org/en/latest/api/?highlight=raw#requests.Response.raw); perhaps that's what you need? – Dan Lenski Sep 09 '14 at 16:19
  • This might help http://stackoverflow.com/questions/18364193/python-requests-disable-auto-decoding – yonili Sep 09 '14 at 16:20
  • The code you've displayed correctly downloads the `.gz` file for me. What server are you using? What is the value of `req.headers`? Is the URL you are downloading publicly available, and can you share it with us? – Robᵩ Sep 09 '14 at 16:20
  • Just to be sure, can you tell us how you determined that you are getting the unpacked file on the disk? – Robᵩ Sep 09 '14 at 16:29
  • Can you try your code with `url='https://wiki.mozilla.org/images/f/ff/Example.json.gz'` and `local_filename='Example.json.gz'`? Does that still automatically decompress? – Robᵩ Sep 09 '14 at 16:33
  • @DanLenski, you're right, this works perfectly. Please put it as an answer so that I can accept it. – funkifunki Sep 12 '14 at 13:55
  • @Robᵩ it is easy to spot the difference in the file size. Besides, it downloads .txt.gz just fine, but any simple text editor (like nano) displays the text, not the archived content. Anyway, the use of request.raw.stream fixes the problem – funkifunki Sep 12 '14 at 13:58
  • @funkifunki What Python version are you using? I tried your code in Python 2 and it doesn't decode the file – Ciprian Tomoiagă Feb 12 '17 at 16:36
  • It was long ago and I cannot recall for sure, but most probably it was 2.6. Also, Dan Lenski's answer solved the issue – funkifunki Feb 13 '17 at 08:41
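A more direct way than comparing file sizes to tell whether the saved file is still compressed: every gzip stream begins with the two magic bytes `0x1f 0x8b` (RFC 1952). Below is a sketch using only the standard library; `is_gzip_file` is a hypothetical helper name:

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

A file that was silently decompressed by requests will return `False` here even if its name still ends in `.gz`.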

2 Answers

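import requests

r = requests.get(url, stream=True)
r.raise_for_status()  # fail early on HTTP errors instead of saving an error page
with open(local_filename, 'wb') as f:
    # Read from the underlying urllib3 response; decode_content=False keeps
    # the gzip-encoded bytes exactly as the server sent them.
    for chunk in r.raw.stream(1024, decode_content=False):
        if chunk:
            f.write(chunk)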

This way you avoid the automatic decompression of a gzip-encoded response and save it to the file exactly as it is received from the web server, chunk by chunk.
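For contrast, here is a sketch that wraps both loops in one hypothetical `download` function (the function name and the `keep_encoding` flag are illustrative, not part of requests). It assumes the server sends `Content-Encoding: gzip`. Internally, `iter_content()` reads through `raw.stream(..., decode_content=True)`, which is why the question's loop produces the unpacked file:

```python
import requests


def download(url, local_filename, keep_encoding=True, chunk_size=1024):
    """Save the body of `url` to `local_filename` (hypothetical helper).

    keep_encoding=True  -> bytes exactly as sent (gzip stays gzip)
    keep_encoding=False -> what iter_content() yields (Content-Encoding undone)
    """
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        if keep_encoding:
            # Bypass requests' decoding by streaming the urllib3 response.
            chunks = r.raw.stream(chunk_size, decode_content=False)
        else:
            # iter_content() transparently decodes gzip/deflate bodies.
            chunks = r.iter_content(chunk_size=chunk_size)
        with open(local_filename, "wb") as f:
            for chunk in chunks:
                if chunk:
                    f.write(chunk)
    return local_filename
```

Keeping both paths side by side makes it easy to confirm which behaviour a particular server triggers.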

Boban P.

As discussed in the comments above, this seems to have solved the issue:

From the docs for the requests module:

Requests automatically decompresses gzip-encoded responses ... You can get direct access to the raw response (and even the socket), if needed as well.

Searching the docs for "raw responses" yields requests.Response.raw, which gives a file-like representation of the raw response stream.
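Because `Response.raw` behaves like a file, the whole body can also be copied with `shutil.copyfileobj`. Below is a minimal sketch; `save_raw` is a hypothetical name. Setting `decode_content = False` on the urllib3 response is spelled out explicitly, even though requests does not decode `raw.read()` by default:

```python
import shutil

import requests


def save_raw(url, local_filename):
    """Copy the undecoded response body of `url` into `local_filename`."""
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # r.raw is the file-like urllib3 response; with decode_content False
        # its read() returns the bytes as sent, so gzip data stays gzip.
        r.raw.decode_content = False
        with open(local_filename, "wb") as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename
```

Setting the flag to `True` instead is the usual way to get decoded bytes from `r.raw`.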

Dan Lenski