
I'm kind of new to Python, but I'm trying to make a web scraper script that downloads all the pictures on a website. I'm using requests and PyQuery because, after some research, many people recommended them. This is all I have right now, and I'm not sure where to go from here.

import requests
from pyquery import PyQuery as pq
r = requests.get("some url")
images = pq(r.text)
for image in images.find("img"):
    pass  # how do I get the src from here?

I know that I need to get the src of each img, but how do I do that after finding the img tags? Also, I've viewed the page source of some sites, and some pictures are hosted on the site itself, so their src is a relative path starting with "/" instead of a full URL. How would I get the full URL for those?

Took
You can check my answer here, which covers a similar question: https://stackoverflow.com/questions/43982002/extract-src-attribute-from-img-tag-using-beautifulsoup/47166671#47166671 – Abu Shoeb, Mar 02 '18 at 01:18

1 Answer


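(Python 3)

from pyquery import PyQuery as pq
import requests
from urllib.parse import urljoin

url = "..."
html = requests.get(url).text
for image in pq(html)("img"):
    src = image.get("src")  # None if this <img> has no src attribute
    if not src:
        continue
    imgurl = urljoin(url, src)  # turns "/path/pic.jpg" into a full URL
    print(imgurl)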

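In your defense, the PyQuery docs seem out of date. Iterating over a PyQuery selection gives you plain lxml elements, so image.get("src") reads the attribute directly. urllib.parse.urljoin then takes care of resolving relative URLs (such as a src starting with "/") into absolute ones, using the page URL as the base.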
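To see what urljoin does with the kinds of src values you are likely to run into, here is a small stdlib-only check. The page URL and image paths are made-up examples, not taken from any real site.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only to illustrate how urljoin resolves src values.
page_url = "https://example.com/gallery/index.html"

examples = [
    "/static/cat.jpg",             # root-relative: replaces the whole path
    "thumbs/dog.png",              # relative: resolved against /gallery/
    "../logo.svg",                 # goes up one directory from /gallery/
    "//cdn.example.net/bird.gif",  # protocol-relative: borrows https from the page
    "https://other.org/fish.jpg",  # already absolute: returned unchanged
]

# Map each raw src value to the absolute URL urljoin produces.
resolved = {src: urljoin(page_url, src) for src in examples}
for src, full in resolved.items():
    print(f"{src!r:32} -> {full}")
```

Note the plain relative case: "thumbs/dog.png" is resolved against the page's directory, which is why the base should be the page URL itself rather than just the domain.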

xavier
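For the download step the question mentions, here is a minimal sketch that assumes you already have a list of absolute image URLs (for example, the imgurl values collected in the loop above). The helper names filename_for and download_images and the "images" output folder are hypothetical. It uses urllib.request only to stay dependency-free; requests.get(img_url).content would give you the same bytes.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlsplit


def filename_for(img_url, index):
    """Hypothetical helper: derive a local file name from an image URL.

    Uses the last segment of the URL path (the query string is ignored) and
    falls back to image_<index> when the path has no usable name.
    """
    name = posixpath.basename(urlsplit(img_url).path)
    return name or f"image_{index}"


def download_images(img_urls, out_dir="images"):
    """Hypothetical helper: save each URL's bytes into out_dir.

    Returns the list of file paths written. Two URLs with the same file name
    will overwrite each other in this simple sketch.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for i, img_url in enumerate(img_urls):
        path = os.path.join(out_dir, filename_for(img_url, i))
        # Fetch the raw image bytes; the timeout keeps a slow server from hanging the script.
        with urllib.request.urlopen(img_url, timeout=10) as resp:
            data = resp.read()
        with open(path, "wb") as f:
            f.write(data)
        saved.append(path)
    return saved
```

Keeping the file-naming logic in its own function makes it easy to swap in a different scheme (for example, numbering every file) if the sites you scrape reuse the same image names.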