I'd like to download my medical summary page from the Stanford Health website (https://myhealth.stanfordmedicine.org/myhealth/inside.asp?mode=download&view=true) and dump it into a JSON file. However, I can't even get past the login page.
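Writing the result to disk is the easy part once the summary has been pulled out of the page. The following is a minimal sketch using Python's standard `json` module. It assumes the page has already been parsed into a plain dict. The field names are hypothetical placeholders, not the real layout of the Stanford page:

```python
import json


def save_summary(summary, path):
    """Write an already-parsed summary (a plain dict) to `path` as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(summary, fh, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical fields; the real page's structure is not known here.
    example = {"name": "Jane Doe", "allergies": ["penicillin"], "medications": []}
    save_summary(example, "summary.json")
```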
Here's the code I've come up with so far:
```python
import mechanize

br = mechanize.Browser()
br.set_handle_robots(True)  # obey robots.txt
# Follow <meta http-equiv="refresh"> redirects, but only those of 1 second or less
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# Open the webpage and inspect its contents
url = "https://myhealth.stanfordmedicine.org/"
response = br.open(url)
# Test to make sure we've got the right page
# print(response.read())  # the text of the page

# Select the first form on the page (assumed to be the login form)
br.select_form(nr=0)

# User credentials
br.form["Login"] = 'user@example.com'
br.form["Password"] = 'password123'
br.submit()
```
However, when I run it, I get the following error:
```
Traceback (most recent call last):
  File "test_mech_bitbybit.py", line 27, in <module>
    br.form["Login"] = 'user@example.com'
  File "build/bdist.macosx-10.6-intel/egg/mechanize/_form.py", line 2784, in __setitem__
ValueError: control 'Login' is disabled
```
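The error means mechanize is honouring the `disabled` attribute present in the HTML it received. Mechanize does not run JavaScript, so it never sees the fields get enabled. The following is a minimal sketch, using only the Python 3 standard library, that lists every `<input>` of each form along with its disabled flag. It also exposes any hidden fields a browser would submit. The sample markup in the test and the `token` field are hypothetical, not copied from the real login page:

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name, type, value and disabled state of <input> elements in each <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one list of field dicts per <form>
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)    # boolean attributes map to None
        if tag == "form":
            self._in_form = True
            self.forms.append([])
        elif tag == "input" and self._in_form:
            self.forms[-1].append({
                "name": attrs.get("name"),
                "type": (attrs.get("type") or "text").lower(),
                "value": attrs.get("value") or "",
                # `disabled` is a boolean attribute: its mere presence disables the field
                "disabled": "disabled" in attrs,
            })

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def list_form_fields(html):
    """Return a list (one entry per form) of the form's <input> fields."""
    parser = FormFieldParser()
    parser.feed(html)
    parser.close()
    return parser.forms
```

Mechanize's controls carry the same flag as a writable `disabled` attribute. Setting something like `br.form.find_control("Login").disabled = False` before the assignment may get past the `ValueError`. Whether the server then accepts the login depends on what the page's JavaScript would otherwise have added to the request.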
From some research, it appears that JavaScript must be enabled for the login to work. With JavaScript disabled, the login and password fields are disabled and nothing can be typed into them. This makes me think JavaScript has something to do with keeping the session alive and possibly handing cookies off to the browser. This is where I get overwhelmed and start to question whether mechanize is even the right tool for this task.
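On the cookie question: session cookies are normally set by the server in `Set-Cookie` headers rather than by JavaScript. Any cookie-aware client keeps them, and mechanize's `Browser` already does this by default. If the login turns out to be an ordinary form POST, the whole exchange can be reproduced with a cookie jar. The following is a minimal sketch using the Python 3 standard library. It assumes the form's action URL and field names are read from the real page. `ACTION_URL` and the field values below are hypothetical placeholders:

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Return an opener that keeps cookies across requests, like a browser session."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener, jar


def build_login_request(action_url, fields):
    """Build (but do not send) a form-encoded POST request for a login form."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical usage (ACTION_URL and field names must come from the real form):
# opener, jar = make_opener()
# opener.open("https://myhealth.stanfordmedicine.org/")  # collect initial session cookies
# resp = opener.open(build_login_request(ACTION_URL, {"Login": "...", "Password": "..."}))
```

If the page's JavaScript computes extra values (for example, a hidden token filled in on load), those would still have to be found in the page's scripts and added to `fields` by hand.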
Would anyone with experience be kind enough to walk me through this? Specifically, what do I need to do to get past this login page, and/or how can I mimic whatever the JavaScript is doing?