
I am working through an exercise in the Practical Data Analysis book where the goal is to scrape the price of gold from a website. The original code no longer works, and I have traced the problem to what I think is a redesign of the website since the original script was written.

To get the exercise working anyway, I have been revising the script:

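from bs4 import BeautifulSoup
import requests
import re
from time import sleep
from datetime import datetime

def getGoldPrice():
    url = "http://www.gold.org"
    req = requests.get(url)
    req.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(req.text, "lxml")
    # The second <dd class="value"> element held the price in the page layout at the time.
    price = soup.find_all("dd", class_="value")[1]
    # Return the element's text; returning the Tag itself writes its raw HTML to the file.
    return price.get_text(strip=True)

# Take three samples about a minute apart, one "time, price" line each.
with open("goldPrice.out", "w") as f:
    for _ in range(3):
        sNow = datetime.now().strftime("%I:%M:%S%p")
        f.write("{0}, {1}\n".format(sNow, getGoldPrice()))
        sleep(59)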

This worked at first, until I realized it was not picking up the live price values that update every minute (which was the original goal). After more research I found that I could dig into the page's embedded JavaScript with

soup.find('script', type="text/javascript").text

in place of the .find_all() call, and then run a regex on the script text.

This worked very well, except for the original post's regex, so I was working out what pattern would get the price for the "ask" group. When I went back to run the updated regex against the page, my expression no longer returned the same base result.

Currently,

soup.find_all('script', type="text/javascript")

returns a different set of results than

soup.find('script', type="text/javascript").text

Unfortunately, I can't call .text on the soup.find_all result the way I can on the soup.find result. Is there something about these two calls that I am missing, which explains why the results are so different?
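To make the difference concrete, here is a minimal sketch run against a small hypothetical page. The two inline scripts stand in for the site's real markup, which is an assumption. It uses Python's built-in "html.parser" so lxml isn't required:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two inline scripts (not the real site's markup).
SAMPLE_HTML = """
<script type="text/javascript">var first = 1;</script>
<script type="text/javascript">var second = 2;</script>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")  # built-in parser, no lxml needed

one = soup.find("script", type="text/javascript")       # first matching Tag only
many = soup.find_all("script", type="text/javascript")  # ResultSet: a list of Tags

print(type(one).__name__)   # Tag
print(type(many).__name__)  # ResultSet
print(one.string)           # var first = 1;

# A ResultSet is a list, so it has no .text of its own.
try:
    many.text
except AttributeError:
    print("find_all() result has no .text")

# Take the text from each Tag in the list instead.
texts = [str(tag.string) for tag in many]
print(texts)  # ['var first = 1;', 'var second = 2;']
```

In this sketch, find() never sees the second script. If the data you want lives in a later script, find() and find_all() will appear to disagree.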

Thanks for the help!

EDIT: With help from the answer, I replaced the price lookup with the following lines, which get what I was looking for:

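    # Index 10 was the script holding the prices on the page at the time.
    js_text = soup.find_all('script', type="text/javascript")[10]
    js_text = js_text.string
    regex = re.compile(r'"ask":{"css":"minus","price":"(.*)","performance":-1}},"G')
    price = regex.findall(js_text)  # findall returns a list of captured prices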

Admittedly my regex is very specific to my problem.
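The regex in the edit anchors on "css":"minus" and "performance":-1. These look like they describe the direction of the latest price move, but that is an assumption. If so, the pattern may stop matching as soon as the price rises.

Below is a minimal sketch of a looser pattern. It uses only the standard library and hypothetical script fragments shaped like the one in the edit, and it anchors only on the "ask" object and its "price" key:

```python
import re

# The edit's pattern, tied to "minus" and "performance":-1.
ORIGINAL = re.compile(r'"ask":\{"css":"minus","price":"(.*)","performance":-1\}\},"G')
# Looser sketch: stay inside the "ask" object and capture up to the closing quote.
LOOSER = re.compile(r'"ask":\{[^}]*?"price":"([^"]+)"')

# Hypothetical fragments: one falling price, one rising price.
samples = [
    '"ask":{"css":"minus","price":"1,190.50","performance":-1}},"G',
    '"ask":{"css":"plus","price":"1,191.20","performance":1}},"G',
]

results = []
for js_text in samples:
    match = LOOSER.search(js_text)
    results.append((ORIGINAL.findall(js_text), match.group(1) if match else None))

print(results)  # [(['1,190.50'], '1,190.50'), ([], '1,191.20')]


def price_to_float(price):
    """Turn a string such as '1,190.50' into 1190.5 for arithmetic."""
    return float(price.replace(",", ""))


print(price_to_float(results[1][1]))  # 1191.2
```

Two parts of the pattern keep the capture tight without a long trailing literal. [^}]*? keeps the match inside the "ask" object, and ([^"]+) stops at the first closing quote.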


1 Answer

for a in soup.find_all('script', type="text/javascript"):
    print(a.text)

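find_all() returns a list of tags (a ResultSet), like:

[tag1, tag2, tag3]

find() returns only the first matching tag:

tag1

If you want the text of every tag in the list, use a for loop to iterate over it, as above. The list itself has no .text, which is why calling .text on the find_all() result doesn't work.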

宏杰李
    Thanks for the help! The `for` loop did not get the result I was looking for but with the explanation of it being a list of tags vs a single tag I was able to get the correct index for the specific portion I was looking for and able to get a regex to narrow down what I needed! – Bryon Martinez Jan 28 '17 at 02:59
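The edit picks the script by a hard-coded index ([10]), which shifts whenever the site adds or removes a script. Building on the answer's loop, one alternative is to keep the first script whose text mentions "ask". The sample page, the search string and the helper name find_script_containing below are hypothetical:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page: the price data is not in the first script.
SAMPLE_HTML = """
<script type="text/javascript">var analytics = true;</script>
<script type="text/javascript">var prices = {"ask":{"css":"plus","price":"1,190.50"}};</script>
"""

ASK_PRICE = re.compile(r'"ask":\{[^}]*?"price":"([^"]+)"')


def find_script_containing(soup, needle):
    """Return the text of the first text/javascript <script> containing needle, or None."""
    for tag in soup.find_all("script", type="text/javascript"):
        text = tag.string  # None when the <script> tag is empty
        if text and needle in text:
            return str(text)
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
js_text = find_script_containing(soup, '"ask"')
match = ASK_PRICE.search(js_text) if js_text else None
print(match.group(1) if match else None)  # 1,190.50
```

Beautiful Soup can also filter on a tag's string directly. For example, soup.find("script", string=re.compile('"ask"')) should perform the same search in a single call.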