
I have the following search query in Google: https://www.google.com/search?q=bonpland&tbm=isch&hl=en-US&tbs=qdr:w

This search returns all images found in the last week for the search term bonpland. I want all of this HTML, or the image links and image redirections, returned to my Python console using the get function of the requests library. If I open this URL in my browser, it initially shows ~108 images. If I click one of the images, more load, and if I scroll down, more and more are loaded until ~450 images are shown, at which point a "Show more results" button appears. Once clicked, another ~480 images load, so let's say roughly a thousand images are found with this query.

However, when I run the get command in Python as shown below, only 49 original images are returned:

import requests
from bs4 import BeautifulSoup  # needed for the parsing step below

# Browser-like headers copied from a real request
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Connection': 'keep-alive',
    'DNT': '1',
    'Accept-Language': 'en-US,en;q=0.5',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36 OPR/55.0.2994.37',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
}

response = requests.get(
    'https://www.google.com/search?q=bonpland&tbm=isch&hl=en-US&tbs=qdr:w',
    headers=headers,
)
# Parse the initial HTML; this only contains the first batch of images
soup = BeautifulSoup(response.text, 'html.parser')
soup

Is there any way we can modify the URL to return all links, or modify the code such that we can retrieve all results with this library? I have tried to modify the URL in several ways without success.

I scrolled down while watching the network tab to see what happens. Scrolling seems to trigger a POST request that returns JSON. I can recreate this request in Python, but I am unable to decode the JSON response, and I cannot think of any logic for generating these requests myself:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36 OPR/55.0.2994.37',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://www.google.com/',
    'X-Same-Domain': '1',
    'x-goog-ext-190139975-jspb': '["NL","ZZ","KgKka8TAAAFqmWCfx71ZfQ=="]',
    'Content-Type': 'application/x-www-form-urlencoded;charset=utf-8',
    'Origin': 'https://www.google.com',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin'
}

params = {
    'rpcids': 'HoAMBc',
    'source-path': '/search',
    'f.sid': '2747221314367002709',
    'bl': 'boq_visualfrontendserver_20230813.07_p1',
    'hl': 'en-US',
    'authuser': '0',
    'soc-app': '162',
    'soc-platform': '1',
    'soc-device': '1',
    '_reqid': '303769',
    'rt': 'c',
}

data = 'f.req=%5B%5B%5B%22HoAMBc%22%2C%22%5Bnull%2Cnull%2C%5B3%2Cnull%2C4294967246%2C1%2C3766%2C%5B%5B%5C%22CYxr5OmPOtywOM%5C%22%2C259%2C194%2C536870912%5D%2C%5B%5C%222BSR5sBuDzSHqM%5C%22%2C306%2C165%2C0%5D%2C%5B%5C%22ZA_122FexBY1nM%5C%22%2C268%2C188%2C34340864%5D%2C%5B%5C%22-3b9ovO7KQ_dYM%5C%22%2C275%2C183%2C10485760%5D%2C%5B%5C%220iXGzZD6KO-t_M%5C%22%2C275%2C183%2C444596224%5D%2C%5B%5C%22hO_2vHlaM5M5mM%5C%22%2C277%2C182%2C0%5D%2C%5B%5C%22eLrlE2L34a8f8M%5C%22%2C323%2C156%2C0%5D%2C%5B%5C%22Ahf1fxWknMx_AM%5C%22%2C259%2C195%2C956301312%5D%2C%5B%5C%22Nv1VenvVudaghM%5C%22%2C261%2C193%2C134217728%5D%2C%5B%5C%22Q9QbJWUxHV4hnM%5C%22%2C171%2C295%2C179568640%5D%2C%5B%5C%22FOerVX6mz_YP4M%5C%22%2C225%2C225%2C-2147483648%5D%2C%5B%5C%22kCWjgzlqhj6N8M%5C%22%2C225%2C225%2C-1257766912%5D%5D%2C%5B%5D%2C%5B%5D%2Cnull%2Cnull%2Cnull%2C0%5D%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2C%5B%5C%22bonpland%5C%22%2C%5C%22en-US%5C%22%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2C%5C%22qdr%3Aw%5C%22%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2C%5B%5D%5D%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2Cnull%2C%5Bnull%2C%5C%22CAM%3D%5C%22%2C%5C%22GKwCIAA%3D%5C%22%5D%5D%22%2Cnull%2C%22generic%22%5D%5D%5D&at=AAuQa1qdstatNh2yQw-sJIcvETC_%3A1692054165315&'

response = requests.post(
    'https://www.google.com/_/VisualFrontendUi/data/batchexecute',
    params=params,
    headers=headers,
    data=data,
)

response.content

Returns:

b')]}\'\n\n128460\n[["wrb.fr","HoAMBc","[null,[],null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,[],null,null,null,null,false,null,null,null,null,null,null,null,null,null,null,null,null,null,[null,[[\\"/search?q\\\\u003dbonpland\\\\u0026source\\\\u003dlmns\\",null,null,\\"All\\",false,null,null,null,null,\\"WEB\\",[0],null,null,0],[\\"/search?q\\\\u003dbonpland\\\\u0026source\\\\u003dlmns\\\\u0026tbm\\\\u003disch\\",null,null,\\"Images\\",true,null,null,null,null,\\"IMAGES\\",[6],null,null,6]],[[\\"//maps.google.com/maps?q\\\\u003dbonpland\\\\u0026source\\\\u003dlmns\\\\u0026entry\\\\u003dmt\\",null,null,\\"Maps\\",false,null,null,null,null,\\"MAPS\\",[8],null,null,8],[\\"/search?q\\\\u003dbonpland\\\\u0026source\\\\u003dlmns\\\\u0026tbm\\\\u003dvid\\",null,null,\\"Videos\\",false,null,null,null,null,\\"VIDEOS\\",[13],null,null,13],[\\"/search?q\\\\u003dbonpland\\\\u0026source\\\\u003dlmns\\\\u0026tbm\\\\u003dnws\\",null,null,\\"News\\",false,null,null,null,null,\\"NEWS\\",[10],null,null,10],[\\"/search?q\\\\u003dbonpland\\\\u0026source\\\\u003dlmns\\\\u0026tbm\\\\u003dbks\\",null,null,\\"Books\\",false,null,null,null,null,\\"BOOKS\\",[2],null,null,2],[\\"/travel/flights?q\\\\u003dbonpland\\\\u0026source\\\\u003dlmns\\\\u0026tbm\\\\u003dflm\\",null,null,\\"Flights\\",false,null,null,null,null,\\"FLIGHTS\\",[20],null,null,20],[\\"/search?q\\\\u003dbonpland\\\\u0026source\\\\u003dlmns\\\\u0026tbm\\\\u003dfin\\",null,null,\\"Finance\\",false,null,null,null,null,\\"FINANCE\\",[22],null,null,22]]],0,null,null,null,null,null,null,null,null,null,true,null,null,null,[[false],false,null,null,null,null,[true,false],true,null,0.564668],false,[[{\\"444381080\\":[]}],[[[[{\\"444383007\\":[7,null,null,null,null,null,null,\\"b-GRID_STATE0\\",-1,null,null,null,[\\"GRID_STATE0\\",null,null,null,null,null,1,[],null,null,null,[4,null,4294966996,1,3766,[[\\"mroB5K80ptCTuM\\",259,194,16777216],[\\"c1GanxYo-04FqM\\",299,168,117440512],[\\"CwPZkiI
yxII1IM\\",259,194,524288],[\\"K-EOlTyweDVg8M\\",275,183,16777216],[\\"7aHfaSFG7gX8iM\\",225,225,-1874853888],[\\"cgME6WrQ91TqbM\\",264,191,262144],[\\"zYwz1EEHHPaiqM\\",225,225,-1090519040],[\\"eTmT6kDpI3uIQM\\",223,226,0],[\\"Q0rGwJ3za5hy4M\\",276,183,50593792],[\\"eiNmj70lbdjNUM\\",260,194,17563648],[\\"7g0dolqpZILMhM\\",300,168,1040187392],[\\"UfVRJEEcMgRyyM\\",183,275,-1074003968],[\\"7v0HRR8xWKSvLM\\",215,234,2097152],[\\"wpRPgCuU5zJAsM\\",300,168,-1788084224],[\\"ttZT8wyA9AIcpM\\",193,261,17039360]],null,null,null,null,null,0],null,null,null,null,[true,null,null,\\"CAQ\\\\u003d\\",\\"GJADIAA\\\\u003d\\"],null,null,null,null,null,null,null,null,null,20],[[1692054866047747,117621638,1745567696],null,null,null,null,[[1]]]]}],[[[[{\\"444383007\\":[1,[0,\\"cjxY8tC9TPK3gM\\",[\\"https://encrypted-tbn0.gstatic.com/images?q\\\\u003dtbn:ANd9GcQkdUVoe0SeMM_uE_oUKymwnw4XFeg5IQ_a0xmxkByykYSCPGI1icA-E1WsxqOzfqSEvb8\\\\u0026usqp\\\\u003dCAU\\",159,316],[\\"https://www.rematadores.com/rematadores/remates/2023/27986_5.jpg\\",576,1140],null,0,\\"rgb(240,240,221)\\",null,false,null,null,null,null,null,null,null,null,null,null,null,null,false,false,null,false,{\\"2001\\":[null,null,null,0,0,0,0,true,false],\\"2003\\":[\\"6 days ago\\",\\"N-SEEMfLWqwvkM
Rivered

1 Answer

Although I cannot give you the correct answer, I will give you some directions that will hopefully set you on the right path.

A related question recommends monitoring network activity and/or using Selenium. Note: with Selenium you can quite easily load all the images by scrolling down automatically, as described in: Scrape entire scrolling-load page with Python Requests. (Try to look for similar questions before posting.)
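A minimal sketch of the scrolling idea, not a tested solution for Google Images specifically. The helper name scroll_until_stable, the pause length and the round limit are my own choices, and the sketch does not click the "Show more results" button (that would need a selector I have not verified). It only assumes a driver object with an execute_script method, which Selenium's WebDriver provides:

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=50):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scroll rounds that were performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        # Jump to the bottom so the page requests its next batch of results.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the lazy-loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop scrolling
        last_height = new_height
    return rounds


def main():
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    url = "https://www.google.com/search?q=bonpland&tbm=isch&hl=en-US&tbs=qdr:w"
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        scroll_until_stable(driver)
        # Collect whatever <img> elements are present after scrolling.
        sources = [img.get_attribute("src")
                   for img in driver.find_elements(By.TAG_NAME, "img")]
        print(len(sources), "img elements found")
    finally:
        driver.quit()
```

Calling main() opens a Chrome window, scrolls until the page height stops changing and prints how many img elements were found. A consent page or the "Show more results" button may stop the loading early, so treat the count as a lower bound.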

If you really want to use requests, you should look into the XMLHttpRequest (XHR) calls that the page makes. An informative page is: https://www.accordbox.com/blog/how-crawl-infinite-scrolling-pages-using-python/
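As for decoding a response like the one in the question, here is a rough sketch based only on what that output looks like: a )]}' prefix, then numeric length lines, then JSON arrays whose "wrb.fr" entries hold a second JSON document as a string. The function names are made up for illustration, and the assumption that each JSON chunk sits on a single line may not always hold:

```python
import json

XSSI_PREFIX = ")]}'"


def parse_batchexecute(text):
    """Return the decoded inner payloads of a batchexecute-style response body."""
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    payloads = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue  # skip blank lines and the numeric length lines
        try:
            envelope = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: a chunk that is not valid JSON is ignored
        for entry in envelope:
            # Data entries look like ["wrb.fr", rpc_id, "<json string>", ...]
            if (isinstance(entry, list) and len(entry) > 2
                    and entry[0] == "wrb.fr" and isinstance(entry[2], str)):
                # The third element is itself JSON, so decode it a second time.
                payloads.append(json.loads(entry[2]))
    return payloads


def collect_urls(node, found=None):
    """Walk nested lists/dicts and collect every string that looks like a URL."""
    if found is None:
        found = []
    if isinstance(node, str):
        if node.startswith(("http://", "https://")):
            found.append(node)
    elif isinstance(node, list):
        for child in node:
            collect_urls(child, found)
    elif isinstance(node, dict):
        for child in node.values():
            collect_urls(child, found)
    return found
```

With the request from the question this would be used as collect_urls(parse_batchexecute(response.text)). The second json.loads also turns escapes such as \u003d back into =. The result mixes thumbnail and original image URLs, and working out how to generate the follow-up requests (the f.req body) is a separate problem that this does not address.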

Hopefully this helps.

statisjar