I am working through an exercise in the book Practical Data Analysis where the goal is to scrape the price of gold from a website. The original code does not work, and I have traced the problem to what I think is a redesign of the website since the original script was written.
To still get the exercise to work, I have been revamping the script a bit:
from bs4 import BeautifulSoup
import requests
import re
from time import sleep
from datetime import datetime

def getGoldPrice():
    url = "http://www.gold.org"
    req = requests.get(url)
    soup = BeautifulSoup(req.text, "lxml")
    price = soup.find_all("dd", class_="value")[1]
    return price

with open("goldPrice.out","w") as f:
    for x in range(0,3):
        sNow = datetime.now().strftime("%I:%M:%S%p")
        f.write("{0}, {1} \n ".format(sNow, getGoldPrice()))
        sleep(59)
This worked at first, until I realized it was not picking up the live values that update every minute (the original goal). After a bit more research I found that I could dig further with
soup.find('script', type="text/javascript").text
in place of the .find_all() usage and run a regex on the script.
This worked very well, except that the regex from the original post no longer matched, so I started working out a pattern to capture the price for the "ask" group. When I went back to run this updated regex on the file, my expression no longer gave the same base result.
Currently if I do a
soup.find_all('script', type="text/javascript")
I get a different set of results than with a
soup.find('script', type="text/javascript").text
Unfortunately, I can't call .text on the soup.find_all result the way I can on the soup.find result. Is there something about these commands that I am missing that would explain such different results?
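To make the difference concrete, here is a minimal sketch that uses a made-up HTML snippet (an assumption for illustration, not the real gold.org markup) and the built-in "html.parser" instead of "lxml". As far as I can tell, find gives back a single tag, while find_all gives back a list-like collection whose items have to be indexed or looped over before their text can be read:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page (assumption, not gold.org markup).
html = """
<html><head>
<script type="text/javascript">var a = 1;</script>
<script type="text/javascript">var b = 2;</script>
</head></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag.
first = soup.find("script", type="text/javascript")
first_text = first.string

# find_all() returns every matching tag in a list-like ResultSet,
# so the text has to be read from each item, not from the collection.
all_tags = soup.find_all("script", type="text/javascript")
all_texts = [tag.string for tag in all_tags]

# A single item can also be picked out by index.
second_text = all_tags[1].string
```

If this matches what happens on the real page, the two calls would differ simply because find stops at the first script tag while find_all collects all of them.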
Thanks for the help!
EDIT: With help from the answer, I ended up replacing the price component with the following lines to get what I was looking for:
js_text = soup.find_all('script', type="text/javascript")[10]
js_text = js_text.string
regex = re.compile('"ask":{"css":"minus","price":"(.*)","performance":-1}},"G')
price = re.findall(regex, js_text)
Admittedly my regex is very specific to my problem.
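A somewhat less specific pattern might look like the sketch below. It assumes the script text contains a JSON-like "ask" object with a quoted "price" field; the sample string is made up for illustration and is not the actual page content. The idea is to anchor on "ask" and capture only the quoted price, so the match does not depend on the css or performance values:

```python
import re

# Made-up script text imitating the shape of the data (assumption,
# not copied from the real page).
js_text = (
    'var data = {"bid":{"css":"plus","price":"1,201.10","performance":1},'
    '"ask":{"css":"minus","price":"1,202.35","performance":-1}};'
)

# Anchor on the "ask" object, skip any fields before "price" without
# leaving that object, then capture everything up to the closing quote.
regex = re.compile(r'"ask":\{[^}]*?"price":"([^"]*)"')

match = regex.search(js_text)
price = match.group(1) if match else None
```

Using search with a None check also avoids an exception when the page layout changes and the pattern stops matching.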