
When I run the following code:

from bs4 import BeautifulSoup
import csv
soup = BeautifulSoup (open("43rd-congress.htm"))

final_link = soup.p.a
final_link.decompose()


f = csv.writer(open("43rd_congress_all.csv", "w"))
f.writerow(["Name","Years","Position","Party", "State", "Congress", "Link"])
trs = soup.find_all('tr')

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')

        print fulllink #print in terminal to verify results

        tds = tr.find_all("td")

        try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
            names = str(tds[0].get_text()) # This structure isolate the item by its column in the table and converts it into a string.
            years = str(tds[1].get_text())
            positions = str(tds[2].get_text())
            parties = str(tds[3].get_text())
            states = str(tds[4].get_text())
            congress = tds[5].get_text()

        except:
            print "bad tr string"
            continue #This tells the computer to move on to the next item after it encounters an error

        print names, years, positions, parties, states, congress
        f.writerow([names, years, posiitons, parties, states, congress, fullLink])

I get a NameError. However, when I try to correct the error, I get another error on the last line of code saying the variables are undefined. With the community's help I have corrected the code to its current state. How do I fix it?

I appreciate your help.

I am running this in Notepad++ and PowerShell. I am on the last section of this tutorial: http://jeriwieringa.com/blog/2012/11/04/beautiful-soup-tutorial-part-1/

Rob B.
  • In the future, just copy and paste the full stack trace instead of trying to describe the error in your own words. In your case it seems that you have two typos in the last line: ``posiitons`` and ``fullLink``. – fjarri Oct 21 '13 at 05:11
  • Thank you...at some point you should never code past midnight. I can't believe I missed all that. – Rob B. Oct 21 '13 at 05:17

2 Answers


names, years, positions, parties, states and congress will never be created if the first line inside the try block raises an error.

What's happening is that an error is raised inside the try block. Let's say names = str(tds[0].get_text()) raises an error. You catch it, but the later variables are never created.

You might want to consider setting default values before your try/except, e.g. names = ''.
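
As a rough illustration of the default-value idea, here is a minimal sketch. The `cells` list and its contents are hypothetical stand-ins for the text of one row's td cells, not data from the real file. Every field gets an empty default first, so each name still exists when the try block fails partway through.

```python
# Hypothetical stand-in for one table row that has only two cells.
cells = ["Doe, Jane", "1873-1875"]

# Give every field a default first, so each name exists even if the try fails.
names = years = positions = ""

try:
    names = cells[0]
    years = cells[1]
    positions = cells[2]  # raises IndexError: this row has no third cell
except IndexError:
    pass  # keep whatever was assigned before the failure

row = [names, years, positions]
print(row)
```

Whether a half-filled row should be written out or skipped is a separate decision; the defaults only guarantee that the names are defined.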


Your indentation error could simply be caused by a mix of tabs and spaces, because your code looks fine to me.
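
One way to check for mixed indentation is to scan the source text yourself. The helper below is a hypothetical sketch (`indent_styles` is not part of any library): it reports which lines are indented with tabs and which with spaces, so a file that shows up in both lists is a likely culprit.

```python
def indent_styles(source):
    """Return (tab_lines, space_lines): 1-based numbers of lines whose
    leading whitespace contains a tab or a space, respectively."""
    tab_lines, space_lines = [], []
    for number, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is whatever lstrip would remove from the front.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            tab_lines.append(number)
        if " " in indent:
            space_lines.append(number)
    return tab_lines, space_lines


# Hypothetical snippet: line 2 is indented with spaces, line 3 with a tab.
sample = "for tr in trs:\n    tds = 1\n\tnames = 2\n"
tab_lines, space_lines = indent_styles(sample)
print(tab_lines, space_lines)
```

Most editors, Notepad++ included, can also display whitespace characters, which is usually the quicker check.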

TerryA
    #                       |-> Different from when passed below
    print names, years, positions, parties, states, congress
    f.writerow([names, years, posiitons, parties, states, congress, fullLink])
    #                             |-> Different from original name    |-> Same with fullLink: it was named fulllink when it was assigned.

In the above example, positions and posiitons are not the same. This is a simple typing mistake.
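
A small sketch of how such a typo surfaces, assuming nothing beyond the two names from the question: the NameError message quotes the misspelled identifier, which is why reading the full traceback usually points straight at the mistake.

```python
positions = "Representative"  # the name that was actually assigned

try:
    row = [posiitons]  # misspelled on purpose: this name was never assigned
except NameError as exc:
    message = str(exc)  # the message includes the undefined name
    print(message)
```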

Take a look at the following code, and see if it runs, as I don't have your files.

from bs4 import BeautifulSoup
import csv

soup = BeautifulSoup(open("43rd-congress.htm"))

final_link = soup.p.a
final_link.decompose()

f = csv.writer(open("43rd_congress_all.csv", "w"))
f.writerow(["Name", "Years", "Position", "Party", "State", "Congress", "Link"])
trs = soup.find_all('tr')

for tr in trs:
    for link in tr.find_all('a'):
        fullLink = link.get('href')

        print fullLink  # print in terminal to verify results

        tds = tr.find_all("td")

        try:  # We use "try" because the table is not well formatted; this lets the program continue after
              # encountering a malformed row.
            # Each line isolates one item by its column in the table and converts it to a string.
            names = str(tds[0].get_text())
            years = str(tds[1].get_text())
            positions = str(tds[2].get_text())
            parties = str(tds[3].get_text())
            states = str(tds[4].get_text())
            congress = tds[5].get_text()

            print names, years, positions, parties, states, congress
            f.writerow([names, years, positions, parties, states, congress, fullLink])
        except IndexError:
            print "bad tr string"
            continue  # This tells the computer to move on to the next item after it encounters an error
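
The code above catches IndexError rather than using a bare except. Here is a minimal sketch of why that matters, written for Python 3 (an assumption; the code in this thread uses Python 2 print statements) and using plain lists with made-up values in place of the parsed td cells. Only rows with missing cells are skipped, so an unrelated bug such as a misspelled variable name would still raise instead of being reported as a bad row.

```python
import csv
import io

# Hypothetical rows standing in for the text of each tr's td cells.
rows = [
    ["Doe, Jane", "1873-1875", "Representative", "Republican", "NY", "43"],
    ["header fragment"],  # malformed row: too few cells
    ["Roe, John", "1873-1875", "Senator", "Democrat", "OH", "43"],
]

buffer = io.StringIO()  # in-memory stand-in for the output CSV file
writer = csv.writer(buffer, lineterminator="\n")
skipped = 0

for cells in rows:
    try:
        record = [cells[i] for i in range(6)]  # IndexError if a cell is missing
    except IndexError:
        skipped += 1  # only the expected failure is swallowed
        continue
    writer.writerow(record)

output = buffer.getvalue()
print(output)
```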
Games Brainiac