
Using http.client in Python 3.3+ (or any other builtin python HTTP client library), how can I read a chunked HTTP response exactly one HTTP chunk at a time?

I'm extending an existing test fixture (written in python using http.client) for a server which writes its response using HTTP's chunked transfer encoding. For the sake of simplicity, let's say that I'd like to be able to print a message whenever an HTTP chunk is received by the client.

My code follows a fairly standard pattern for reading a large response:

import http.client

conn = http.client.HTTPConnection(...)
conn.request(...)
response = conn.getresponse()

resbody = []

while True:
    chunk = response.read(1024)  # returns b"" once the body is exhausted
    if chunk:
        resbody.append(chunk)
    else:
        break

conn.close()

But this reads the body in 1024-byte blocks, regardless of whether the server is sending 10-byte chunks or 10 MiB chunks.

What I'm looking for would be something like the following:

while True:
    chunk = response.readchunk()
    if len(chunk):
        resbody.append(chunk)
    else:
        break

If this is not possible with http.client, is it possible with another builtin HTTP client library? If it's not possible with a builtin client library, is it possible with a pip-installable module?

Ben Burns

2 Answers


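I found it easier to use the requests library, like so:

import requests

# url, foo and bar are placeholders for your own request parameters
r = requests.post(url, data=foo, headers=bar, stream=True)

# r.raw is the underlying urllib3 response object
for chunk in r.raw.read_chunked():
    print(chunk)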
Preston
  • Thanks a lot. This is exactly what I was looking for. – KalC May 28 '19 at 04:04
  • I use this method: `for chunk in response.iter_content(chunk_size=maxsizepossible)` – Lei Yang Jun 22 '21 at 03:32
  • @Preston Does this apply backpressure to the `http` stream? That is, does that iterator get the next chunk over the network? Or is the entire request read into memory? – lmonninger May 25 '22 at 17:59
  • @lmonninger I'm honestly not sure. Accessing the raw data like this requires stream=True, which keeps the connection open until you have read all of the data. I assume read_chunked will force you to wait if the data isn't finished; that's what my testing shows, anyway. Note: the official documentation recommends using a with statement for cases where you fail to read all of the data. – Preston Jun 29 '22 at 14:33

Update:

The benefit of chunked transfer encoding is that it allows the transmission of dynamically generated content. Whether an HTTP library lets you read individual chunks or not is a separate issue (see RFC 2616 - Section 3.6.1).

I can see how what you are trying to do would be useful, but the standard python http client libraries don't do what you want without some hackery (see http.client and httplib).

What you are trying to do may be fine for use in your test fixture, but in the wild there are no guarantees. It is possible for the chunking of the data read by your client to be different from the chunking of the data sent by your server. For example, the data could have been "re-chunked" by a proxy server before it arrived (see RFC 2616 - Section 3.2 - Framing Techniques).
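To make the re-chunking point concrete, here is a small sketch that frames the same payload with two different chunk sizes, roughly as an origin server and an intermediary might. The helpers `encode_chunked` and `chunk_sizes` are hypothetical names written for this illustration (they are not part of http.client), and the decoder ignores chunk extensions and trailers for brevity.

```python
def encode_chunked(payload: bytes, chunk_size: int) -> bytes:
    """Frame payload as a chunked body using fixed-size chunks (illustrative helper)."""
    out = bytearray()
    for start in range(0, len(payload), chunk_size):
        piece = payload[start:start + chunk_size]
        # Each chunk is: hex size, CRLF, data, CRLF
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    out += b"0\r\n\r\n"  # zero-size chunk terminates the body
    return bytes(out)


def chunk_sizes(wire: bytes) -> list:
    """Return the size of every chunk in a chunked body (no extensions or trailers)."""
    sizes = []
    pos = 0
    while True:
        eol = wire.index(b"\r\n", pos)
        size = int(wire[pos:eol].decode("ascii"), 16)
        if size == 0:
            return sizes
        sizes.append(size)
        pos = eol + 2 + size + 2  # skip size line, data, and trailing CRLF


payload = b"hello chunked world"
as_sent = encode_chunked(payload, 5)     # what the server might write
as_relayed = encode_chunked(payload, 8)  # what a proxy might forward instead
print(chunk_sizes(as_sent))     # [5, 5, 5, 4]
print(chunk_sizes(as_relayed))  # [8, 8, 3]
```

Both byte streams carry the same 19-byte payload, so a client that only looks at the decoded body cannot tell them apart; only the framing differs.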


The trick is to tell the response object that it isn't chunked (resp.chunked = False) so that it returns the raw bytes. This allows you to parse the size and data of each chunk as it is returned.

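import http.client

conn = http.client.HTTPConnection("localhost")
conn.request('GET', "/")
resp = conn.getresponse()
# Stop http.client from decoding the chunked framing itself
resp.chunked = False

def get_chunk_size():
    # Read the size line byte by byte until its terminating CRLF
    size_str = resp.read(2)
    while size_str[-2:] != b"\r\n":
        byte = resp.read(1)
        if not byte:
            raise EOFError("connection closed inside a chunk-size line")
        size_str += byte
    # The size is hexadecimal; ignore any chunk extension after ";"
    return int(size_str[:-2].split(b";")[0], 16)

def get_chunk_data(chunk_size):
    data = resp.read(chunk_size)
    resp.read(2)  # discard the CRLF that ends the chunk
    return data

respbody = b""
while True:
    chunk_size = get_chunk_size()
    if chunk_size == 0:
        break
    chunk_data = get_chunk_data(chunk_size)
    print("Chunk Received:", chunk_data)
    respbody += chunk_data

conn.close()
# Decode once at the end so a multi-byte character split across chunks survives
print(respbody.decode())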
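The same parsing logic can also be sketched as a generator over any binary file-like object, which makes it easy to exercise in a test without a server. `iter_chunks` is a hypothetical helper name, not an http.client API. It additionally skips chunk extensions and trailer fields. The example feeds it an `io.BytesIO`; passing a response object with `resp.chunked = False` instead is an assumption I have not verified here.

```python
import io


def iter_chunks(fp):
    """Yield the payload of each chunk read from a binary file-like object."""
    while True:
        size_line = fp.readline()
        if not size_line.endswith(b"\r\n"):
            raise ValueError("truncated chunk-size line")
        # The size is hexadecimal; anything after ';' is a chunk extension
        size_field = size_line.split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            # Consume optional trailer fields up to the final blank line
            while fp.readline() not in (b"\r\n", b""):
                pass
            return
        data = fp.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk data")
        if fp.read(2) != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
        yield data


# A hand-built chunked body: a 5-byte chunk, then a 0x1a (26) byte chunk
# carrying a made-up chunk extension, then the terminating zero-size chunk.
wire = b"5\r\nhello\r\n1a;note=x\r\n" + b"x" * 26 + b"\r\n0\r\n\r\n"
chunks = list(iter_chunks(io.BytesIO(wire)))
print([len(c) for c in chunks])  # [5, 26]
```

Yielding bytes rather than decoded text leaves the decision about character encoding to the caller, which avoids decoding errors when a multi-byte character straddles a chunk boundary.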
Martin Carpenter
poida
  • Upvoted because you actually answered my question. Didn't accept however because it's a bit of a hack. Benefit of chunked encoding should be the ability to read chunk-by-chunk. The fact that http.client supports chunked encoding but apparently doesn't expose it is a bit sad... – Ben Burns Dec 16 '14 at 10:28
  • Small bug: chunk size is represented in hex so should read `int(size_str[:-2], 16)`. See HTTP/1.1 https://tools.ietf.org/html/rfc7230#section-4.1 (I edited). Otherwise: works well enough, if hacky as Ben says. – Martin Carpenter Nov 01 '17 at 09:51