When I run the following code:
```python
from bs4 import BeautifulSoup
import csv

soup = BeautifulSoup(open("43rd-congress.htm"))
final_link = soup.p.a
final_link.decompose()

f = csv.writer(open("43rd_congress_all.csv", "w"))
f.writerow(["Name", "Years", "Position", "Party", "State", "Congress", "Link"])
trs = soup.find_all('tr')

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get('href')
        print fulllink  # print in terminal to verify results

    tds = tr.find_all("td")

    # We are using "try" because the table is not well formatted.
    # This allows the program to continue after encountering an error.
    try:
        names = str(tds[0].get_text())  # isolates the item by its column in the table and converts it into a string
        years = str(tds[1].get_text())
        positions = str(tds[2].get_text())
        parties = str(tds[3].get_text())
        states = str(tds[4].get_text())
        congress = tds[5].get_text()
    except:
        print "bad tr string"
        continue  # tells the computer to move on to the next item after it encounters an error

    print names, years, positions, parties, states, congress
    f.writerow([names, years, posiitons, parties, states, congress, fullLink])
```
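For comparison, here is a minimal sketch of the row-writing logic in Python 3, using only the standard library. Each variable is spelled one way throughout, the link is reset for every row, and `writerow` is called inside the loop. The helper names `build_row` and `write_rows` are hypothetical. The decision to pass in plain lists of cell texts and hrefs is an assumption made so the sketch runs without the HTML file, and this is not the tutorial's own code.

```python
import csv
import io

# Column order taken from the header row in the question.
HEADER = ["Name", "Years", "Position", "Party", "State", "Congress", "Link"]


def build_row(cells, links):
    """Return one CSV row, or None if the table row is malformed.

    `cells` holds the text of each <td> in one <tr>; `links` holds the href
    of each <a> in that same <tr> (hypothetical inputs standing in for what
    BeautifulSoup would extract).
    """
    if len(cells) < 6:
        # Explicit check instead of a bare except: header rows and short
        # rows simply have too few cells.
        return None
    names, years, positions, parties, states, congress = (c.strip() for c in cells[:6])
    # Last link in the row, like the original inner loop; default to "" so
    # the name is always defined, even for rows without a link.
    full_link = links[-1] if links else ""
    return [names, years, positions, parties, states, congress, full_link]


def write_rows(rows, out):
    """Write HEADER plus every well-formed row to the file-like `out`."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    written = 0
    for cells, links in rows:
        row = build_row(cells, links)
        if row is None:
            print("bad tr string")
            continue
        writer.writerow(row)  # inside the loop, so every good row is saved
        written += 1
    return written
```

With BeautifulSoup, each pair could be built per `tr` as `cells = [td.get_text() for td in tr.find_all("td")]` and `links = [a.get("href") for a in tr.find_all("a")]`. The pairs would then be passed to `write_rows` along with a file opened as `open("43rd_congress_all.csv", "w", newline="")`.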
I get a NameError. When I try to correct it, I get an error on the last line of code saying the variables are undefined. I have already made some corrections with help from the community to get the code to where it is now. How do I fix it?
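Python looks up variable names exactly, including letter case. A name spelled differently from where it was assigned therefore raises `NameError`. A tiny standalone illustration in Python 3 syntax (the URL is a made-up placeholder):

```python
# "fulllink" and "fullLink" are two different names in Python.
fulllink = "https://example.com/member"  # hypothetical value

try:
    print(fullLink)  # capital L: this name was never assigned
except NameError as err:
    message = str(err)
    print(message)  # reports that 'fullLink' is not defined
```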
I appreciate your help.
I am running this in Notepad++ and PowerShell. I am on the last section of this tutorial: http://jeriwieringa.com/blog/2012/11/04/beautiful-soup-tutorial-part-1/