3

Specifically, I'm trying to scrape this entire page, but am only getting a portion of it. If I use:

r = requests.get('http://store.nike.com/us/en_us/pw/mens-shoes/7puZoi3?ipp=120')

it only gets the "visible" part of the page, since more items load as you scroll downwards.

I know there are some solutions in PyQT such as this, but is there a way to have python requests continuously scroll to the bottom of a webpage until all items load?

Jaroslav Bezděk
  • 6,967
  • 6
  • 29
  • 46
David Yang
  • 2,101
  • 13
  • 28
  • 46

1 Answers1

3

You could monitor page network activity with browser development console (F12 - Network in Chrome) to see what request does the page do when you scroll down, use that data and reproduce the request with requests. As an alternative, you can use selenium to control a browser programmatically to scroll down until page is ended, then save its HTML.

I guess I found the right request

Request URL:http://store.nike.com/html-services/gridwallData?country=US&lang_locale=en_US&gridwallPath=mens-shoes/7puZoi3&pn=3
Request Method:GET
Status Code:200 OK
Remote Address:87.245.221.98:80

Request Headers

Provisional headers are shown
Accept:application/json, text/javascript, */*; q=0.01
Referer:http://store.nike.com/us/en_us/pw/mens-shoes/7puZoi3?ipp=120
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
X-NewRelic-ID:VQYGVF5SCBAJVlFaAQIH
X-Requested-With:XMLHttpRequest

Seems that query parameter pn means the current "subpage". But you still need to understand the response correctly.

Andrew Che
  • 928
  • 7
  • 20