
I need to get the src of all the video banners (thumbnails) from a YouTube page. When I inspect the page with XPath I see 171 src values, but in Selenium, using the same XPath or a CSS selector, I only get 120-odd src values.

youtube page - https://www.youtube.com/@JohnWatsonRooney/videos


Banners = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((
        By.XPATH, '//div[contains(@id,"dismissible")]/ytd-thumbnail/a')))

Above, the length of Banners is 171.


Banners_src = []
for element in Banners:
    Banners_src.append(element.find_element_by_xpath('.//img').get_attribute('src'))

The size of Banners_src is 123 (it keeps varying; sometimes it's 121 or 122).


Banners_Sample = driver.find_elements_by_xpath('//img[contains(@src, "https://i.ytimg.com")]') 

The size of Banners_Sample is 123 (same result).

But when inspecting in the browser with the same XPath, I get 171 src values.


1 Answer


You should consider doing something like

xpBanners = '//div[contains(@id,"dismissible")]/ytd-thumbnail/a'
## WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, xpBanners)))
## scroll to load all Banners (will explain below)

Banners_imgs = driver.find_elements_by_xpath(f'{xpBanners}//img')
Banners_src = [b.get_attribute('src') for b in Banners_imgs]

instead of

Banners_src = [] 
for element in Banners:
    Banners_src.append(element.find_element_by_xpath('.//img').get_attribute('src'))

because element.find_element... sometimes behaves unexpectedly [see this and this for example].
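A likely reason for getting only 123 of 171 values (my own assumption, consistent with the scrolling advice below): YouTube lazy-loads thumbnails, so an img that has never been scrolled into view can report None or an empty string for src. A small helper to keep only populated values and let you see how many are still unloaded; the function name is mine, a sketch:

```python
def collect_srcs(img_elements):
    """Collect non-empty src values from a list of <img> elements.

    Thumbnails that haven't scrolled into view yet may report None
    (or an empty string) for src, so those are skipped; compare
    len(result) against len(img_elements) to see how many thumbnails
    are still unloaded.
    """
    srcs = []
    for img in img_elements:
        src = img.get_attribute('src')
        if src:  # skip None / empty lazy-load placeholders
            srcs.append(src)
    return srcs
```

If len(collect_srcs(Banners_imgs)) is smaller than len(Banners_imgs), you haven't scrolled enough yet.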


On my browser, I start with just 30 matches for //div[contains(@id,"dismissible")]/ytd-thumbnail/a and have to scroll to the bottom to load the next 30. Normally, I'd suggest that you either scroll to the last element using

# import time
scrollCt = 10 # 5 should be enough, but just in case...
scrollWait = 2 # in seconds - adjust according to loading speed
xpBanner = '//div[contains(@id,"dismissible")]/ytd-thumbnail/a'

## wait to load... as necessary
for x in range(scrollCt):
    lastBanner = driver.find_elements_by_xpath(xpBanner)[-1]
    driver.execute_script('arguments[0].scrollIntoView(false);', lastBanner)
    time.sleep(scrollWait)

or scroll to the bottom of the page with

for x in range(scrollCt):
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(scrollWait)    

But neither of those seems to work with YouTube pages... So instead, you could try scrolling bit by bit with PGDN.

# import time
# from selenium.webdriver.common.keys import Keys
# from selenium.webdriver.common.by import By

pgCt = 10 # adjust according to your browser (mine needs 7 - used 10 just in case)
scrollCt = 10 # 5 should be enough, but just in case...
scrollWait = 2 # in seconds - adjust according to loading speed

for x in range(scrollCt * pgCt):
    driver.find_element(By.XPATH, '//body').send_keys(Keys.PAGE_DOWN)
    if (x + 1) % pgCt == 0: time.sleep(scrollWait)  # pause every pgCt presses so more thumbnails can load
Driftr95
  • Thanks for your suggestion. xpBanners size 50, Banners_imgs size 171, 123 values out of 171 extracted. This is what I get as per your suggestion. I'm still unable to get all 171 values. At the beginning of the script I scroll to the end of the page, then I collect the XPath values. – Siva Ranjjan Jan 04 '23 at 08:44
  • @SivaRanjjan I got 171 with both manual and `Keys.PAGE_DOWN` scrolling... how many times did you scroll to the bottom? Did you wait to see if more thumbnails loaded? Like I said, each time you scroll to the bottom a few more load and then you have to scroll to the bottom again... although 123 is a strange number. Btw, did you check in the inspect tab **of the selenium-driven window** how many `//div[contains(@id,"dismissible")]/ytd-thumbnail/a//img` there were? Did the page look different from when manually browsing? – Driftr95 Jan 04 '23 at 16:47