Trying to find the right variable for screen scraping

Question

I have written some code and tested the first part (logging into the website). Now I am trying to add a screen-scraping step and am having trouble getting the result I want. When I run the code it prints "None", and I'm unsure what is causing this. I suspect I don't have the right attribute for the element I am trying to scrape.

import requests
import urllib2
from bs4 import BeautifulSoup

with requests.session() as c:
    url = 'https://signin.acellus.com/SignIn/index.html'
    USERNAME = 'My user name'
    PASSWORD = 'my password'
    c.get(url)
    login_data = dict(Name=USERNAME, Psswrd=PASSWORD, next='/')
    c.post(url, data=login_data, headers={"Referer": "https://www.acellus.com/"})
    page = c.get('https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326')


quote_page = 'https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
price_box = soup.find('div', attrs={'class':'Object7069'})
price = price_box
print price

This is a screenshot of the "inspect element" of the data I want to screen scrape

I'm confused; you get page using requests (while logged in); but then get it again using urllib2 in which you don't log in... did you check whether the second one redirected you to a login page? — Foon, Jan 09 '18 at 23:01
Sorry, this probably sounds like a stupid question, but how would I check whether it redirected me to a login page? — Kyle, Jan 10 '18 at 02:05
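One way to answer the question in the comment above is to look at where the request actually ended up. The following is a minimal sketch in Python 3 syntax, not a confirmed description of how this site behaves: it assumes that an unauthenticated request is redirected to the sign-in host, and the helper names `describe_redirects` and `landed_on_login` are made up for illustration. It relies only on the `url` and `history` attributes that a `requests` response carries.

```python
from urllib.parse import urlparse


def describe_redirects(response):
    """Return every URL visited, in order, ending with the final one."""
    # requests keeps the intermediate redirect responses in response.history
    hops = [r.url for r in response.history]
    return hops + [response.url]


def landed_on_login(response, login_host="signin.acellus.com"):
    """Return True if the final URL is on the (assumed) sign-in host."""
    # Assumption: the site sends logged-out visitors back to login_host.
    return urlparse(response.url).netloc == login_host


# Hypothetical usage with the session from the question:
#   page = c.get('https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326')
#   print(describe_redirects(page))
#   print(landed_on_login(page))
```

For the object returned by `urllib2.urlopen`, the final URL is available through `geturl()` instead of a `url` attribute, so the same comparison can be done on `page.geturl()`.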

score 0 · Answer 1 · answered Jan 24 '18 at 23:54

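I don't think mixing requests and urllib2 is a good idea here: the urllib2.urlopen call does not share the cookies of the requests session, so the second fetch is not logged in. For Python 2.x there is the mechanize module, which lets you log in through the page's form and then retrieve content with the same browser object. Here is how your code would look: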

import mechanize
from bs4 import BeautifulSoup

# logging in...
br = mechanize.Browser()
br.set_handle_robots(False)
br.open("https://signin.acellus.com/SignIn/index.html")
br.select_form(nr=0)
br['AcellusID'] = 'your username'
br['Password'] = 'your password'
br.submit()

# parsing required information..
quote_page = 'https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326'
page = br.open(quote_page).read()
soup = BeautifulSoup(page, 'html.parser')
# find() returns None when no matching <div> is on the page
price_box = soup.find('div', attrs={'class':'Object7069'})
print price_box

Reference link: http://www.pythonforbeginners.com/mechanize/browsing-in-python-with-mechanize/

P.S.: At the time of writing, mechanize is only available for Python 2.x. If you wish to use Python 3.x, there are other options (Installing mechanize for python 3.4).
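As a Python 3 alternative, the "None" result itself can be made easier to diagnose. The sketch below is only an illustration under assumptions: the helper name `find_div_text` and the two sample HTML strings are invented, and the class name `Object7069` is taken from the question without knowing whether it matches the real page. It shows that `soup.find` returns `None` whenever the requested element is absent, which is what happens if the HTML being parsed is a login page rather than the progress page.

```python
from bs4 import BeautifulSoup


def find_div_text(html, css_class):
    """Return the text of the first <div> with css_class, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    box = soup.find('div', attrs={'class': css_class})
    if box is None:
        # Nothing matched: wrong attribute, or not the page we expected.
        return None
    return box.get_text(strip=True)


# Invented sample pages, for illustration only.
progress_html = '<html><body><div class="Object7069">87%</div></body></html>'
login_html = '<html><body><form><input name="Name"></form></body></html>'

print(find_div_text(progress_html, 'Object7069'))
print(find_div_text(login_html, 'Object7069'))
```

With a `requests` session, the HTML to pass in would be the `text` of the response fetched by that same logged-in session (for example `find_div_text(page.text, 'Object7069')`), rather than a second, separate download of the URL.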
