
I am trying to log in to my university website using Python and the requests library with the following code, but I am not able to.

import requests

payloads = {
    "User_ID": "<username>",
    "Password": "<password>",
    "option": "credential",
    "Log in": "Log in",
}

with requests.Session() as session:
    session.post("", data=payloads)  # URL of the login endpoint
    get = session.get("")            # URL of the page I actually want
print(get.text)

Does anyone have any idea what I am doing wrong?

Nazim Kerimbekov
  • We're gonna need more to go on, short of actually doing this ourselves. Requests posts data as form-encoded data when using the data keyword and as JSON when using the json keyword: `session.post(url, json=data)`. That's a fairly common problem, so that may be it for you (see the sketch after these comments). – DerekR Jan 16 '18 at 16:44
  • Is it because you're passing in the password as the userid and the userid as the password? – Alan Hoover Jan 16 '18 at 16:47
  • @AlanHoover Good call, but sadly no, this is just a typo I made when I entered the code on Stack Overflow – Nazim Kerimbekov Jan 16 '18 at 16:49
  • @Fozoro I just took a look at the website in question. It's form-encoded data, so using data instead of json should still work. But there's another field that looks like it's required: `"option": "credential"`. You should use the Network tab in Chrome's developer tools to find out this information. Basically, just view what valid requests look like and then try to recreate those with Python. The website could also be filtering out requests that don't look right, e.g. improper `Referer` headers, etc... – DerekR Jan 16 '18 at 21:10
  • @DerekR I have updated my code on Stack Overflow, please take a look. I have tried adding "option" and a few other things to the payload, but it keeps printing the same thing over and over again (PS: I tried just using requests.get without payloads and it gives me the same thing as when I use the payloads) – Nazim Kerimbekov Jan 16 '18 at 21:21
  • Try using an HTTP client like Postman or Insomnia and see if you can recreate the login request first before you get into Python. When doing this kind of stuff it helps to start small and incrementally add in the layers so you know where the problems originate. – DerekR Jan 16 '18 at 22:01
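
A minimal sketch of the difference DerekR is pointing at, using `httpbin.org` (which simply echoes the request back) in place of the real login URL:

import requests

payload = {"User_ID": "<username>", "Password": "<password>"}

# data= sends an application/x-www-form-urlencoded body, which is what an HTML form submits
form_response = requests.post("https://httpbin.org/post", data=payload)

# json= sends an application/json body instead
json_response = requests.post("https://httpbin.org/post", json=payload)

# The echoed requests show the different Content-Type headers the server sees
print(form_response.json()["headers"]["Content-Type"])  # application/x-www-form-urlencoded
print(json_response.json()["headers"]["Content-Type"])  # application/json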

2 Answers


In order to log in you will need to post all the information requested by the form's <input> tags. In your case you will also have to provide the hidden inputs. You can do this by scraping those values and then posting them along with your credentials. You might also need to send some headers to simulate browser behaviour.

from lxml import html
import requests

s = requests.Session()
login_url = "https://intranet.cardiff.ac.uk/students/applications"
session_url = "https://login.cardiff.ac.uk/nidp/idff/sso?sid=1&sid=1"

# Fetch the login page and collect every hidden <input> inside the form
to_get = s.get(login_url)
tree = html.fromstring(to_get.text)
hidden_inputs = tree.xpath(r'//form//input[@type="hidden"]')
payloads = {x.attrib["name"]: x.attrib.get("value", "") for x in hidden_inputs}

# Add the credential fields to the same payload
payloads["Ecom_User_ID"] = "<username>"
payloads["Ecom_Password"] = "<password>"

# Send a browser-like User-Agent so the request does not look like a script
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
result = s.post(session_url, data=payloads, headers=headers)
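
A rough way to check whether the login actually worked (assuming, as is common, that a failed login leaves you on a page that still contains the login form):

# Re-request the protected page with the same, now hopefully authenticated, session
check = s.get(login_url)
print(check.status_code)
print("Ecom_Password" in check.text)  # True usually means we are still looking at the login form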

Hope this works

Gozy4
  • I plugged my login details into your code and, sadly, it is still not working – Nazim Kerimbekov Jan 16 '18 at 17:15
  • Maybe check this [guide](https://brennan.io/2016/03/02/logging-in-with-requests/) about logging in to websites. In the bottom part, "Single Sign On", there is a section about logging in to websites which delegate authentication to somebody else, which might be your case. – Gozy4 Jan 17 '18 at 07:25

In order to log in to a website with Python, you will have to use a more involved method than the requests library, because you will have to simulate the browser in your code and have it make the requests that log in to the school's servers. The reason for this is that you need the school's server to think it is getting the request from a browser; it should then return the contents of the resulting page, and those contents have to be rendered so that you can scrape them. Luckily, a great way to do this is with the Selenium module in Python.
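
A minimal sketch of what that looks like (the URL and field names are borrowed from the other answer and are assumptions; this uses the Selenium API as it stood around the time of this question, and needs a matching chromedriver installed):

from selenium import webdriver

driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
driver.get("https://intranet.cardiff.ac.uk/students/applications")

# Field names are assumptions borrowed from the other answer's payload;
# check the real name= attributes in your browser's developer tools
driver.find_element_by_name("Ecom_User_ID").send_keys("<username>")
driver.find_element_by_name("Ecom_Password").send_keys("<password>")
driver.find_element_by_name("Ecom_Password").submit()  # submits the enclosing form

# After the post-login page loads, the rendered HTML is available for scraping
print(driver.page_source)
driver.quit()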

I would recommend googling around to learn more about Selenium. This blog post is a good example of using Selenium to log in to a web page, with detailed explanations of what each line of code is doing. This SO answer on using Selenium to log in to a website is also a good entry point.

Rohan Varma