i just started programming. I have the task to extract data from a HTML page to Excel. Using Python 3.7. My Problem is, that i have a website, whith more urls inside. Behind these urls again more urls. I need the data behind the third url. My first Problem would be, how i can dictate the programm to choose only specific links from an ul rather then every ul on the page?
from bs4 import BeautifulSoup
import urllib
import requests
import re
page = urllib.request.urlopen("file").read()
soup = BeautifulSoup(page, "html.parser")
print(soup.prettify())
for link in soup.find_all("a", href=re.compile("katalog_")):
links= link.get("href")
if "katalog" in links:
for link in soup.find_all("a", href=re.compile("alle_")):
links = link.get("href")