
I'm scraping HTML pages and have set up an HTTP client like so:

client := &http.Client{
    Transport: &http.Transport{
        Dial: (&net.Dialer{
            Timeout:   30 * time.Second,
            KeepAlive: 30 * time.Second,
        }).Dial,
        TLSHandshakeTimeout:   10 * time.Second,
        ResponseHeaderTimeout: 10 * time.Second,
    },
}

Now, when I make GET requests to multiple URLs, I don't want to get stuck on URLs that deliver a massive amount of data.

response, err := client.Get(page.Url)
checkErr(err)
body, err := ioutil.ReadAll(response.Body)
checkErr(err)
page.Body = string(body)

Is there a way to limit the amount of data (in bytes) that a GET request accepts from a resource, so that it stops reading once the limit is reached?

kostix
KiwiJuicer
  • Not an answer to your question, but related: if you want to limit the incoming request size (instead of the response of an outgoing request): [Limiting file size in FormFile](http://stackoverflow.com/questions/28073395/limiting-file-size-in-formfile) – icza Aug 10 '16 at 14:25

2 Answers


Use an io.LimitedReader

A LimitedReader reads from R but limits the amount of data returned to just N bytes.

limitedReader := &io.LimitedReader{R: response.Body, N: limit}
body, err := io.ReadAll(limitedReader)

or

body, err := io.ReadAll(io.LimitReader(response.Body, limit))    
JimB
  • Or just simply call [`io.LimitReader()`](https://golang.org/pkg/io/#LimitReader) (so you don't need to know the internals of `io.LimitedReader`): `io.LimitReader(response.Body, limit)` – icza Aug 10 '16 at 14:20
  • @icza: thanks ;) I always forget that one, I think because it's not a `New*` function. – JimB Aug 10 '16 at 14:21
  • @JimB ...and maybe because it is not grouped under the `LimitedReader` type in the doc (because it returns `io.Reader` and not `io.LimitedReader`). – icza Aug 10 '16 at 14:31

You can use io.CopyN:

package main

import (
   "io"
   "net/http"
   "os"
)

func main() {
   r, e := http.Get("http://speedtest.lax.hivelocity.net")
   if e != nil {
      panic(e)
   }
   defer r.Body.Close()
   io.CopyN(os.Stdout, r.Body, 100)
}

Or send a Range header to request only the first bytes (the server is free to ignore it and return the whole body):

package main

import (
   "net/http"
   "os"
)

func main() {
   req, e := http.NewRequest("GET", "http://speedtest.lax.hivelocity.net", nil)
   if e != nil {
      panic(e)
   }
   req.Header.Set("Range", "bytes=0-99")
   res, e := new(http.Client).Do(req)
   if e != nil {
      panic(e)
   }
   defer res.Body.Close()
   os.Stdout.ReadFrom(res.Body)
}
Zombo