I have read extensively about how to do this and I have tried a number of different variations, but I can't get it to work.

Basically, I just want to log in to the ConEdison website and scrape my billing history. Here is what I have:

            Connection.Response loginForm = Jsoup.connect("https://apps.coned.com/cemyaccount/NonMemberPages/Login.aspx?lang=eng")
                    .data("_LASTFOCUS","")
                    .data("_EVENTTARGET","")
                    .data("_EVENTARGUMENT","")
                    .data("_VIEWSTATE", viewState)
                    .data("_EVENTVALIDATION", eventValidation)
                    .data("ctl00$Main$Login1$UserName", username)
                    .data("ctl00$Main$Login1$Password", password)
                    .data("ctl00$Main$Login1$LoginButton", "Sign In")
                    .userAgent("Mozilla/5.0")
                    .method(Method.POST)
                    .execute();

            Map<String, String> loginCookies = loginForm.cookies();

            Document document = Jsoup.connect("https://apps.coned.com/CEMyAccount/CSOL/BillHistory.aspx?lang=eng")
                    .cookies(loginCookies)
                    .get();

            Elements data = document.select("table.ctl00_Main_lvBillHistory_Table1");

            //checking if it found the right page
            System.out.println("document: " + document);
            //checking if it found the table
            System.out.println("data: " + data);

I know the information is correct (though I don't know if I really need to pass the data parameters with no values).

I am not getting any errors; the program just prints out the login page (https://apps.coned.com/cemyaccount/NonMemberPages/Login.aspx?lang=eng).

Any help would be greatly appreciated.

Thanks

EDIT

So, I am now convinced that I was not able to get to the internal page because the login involves more than one request. After the POST to https://apps.coned.com/cemyaccount/NonMemberPages/Login.aspx?lang=eng, 3 cookies are set. The browser then sends a GET request to https://apps.coned.com/cemyaccount/SessionTransfer.aspx?dir=2asp&url=https://apps.coned.com/csol/MainHome.asp?src=DOTNET and another to https://apps.coned.com/csol/SessionTransfer.asp?dir=2asp&guid=3c413f48-d2eb-434a-896b-f9c4eb100714&url=https://apps.coned.com/csol/MainHome.asp?src=DOTNET&frm= to pick up additional cookies before going to the homepage.

Does anyone know how I can follow all these redirects and get the cookies in the end?

Here is what I currently have, but I cannot get the cookies from the POST call.

                Response response = Jsoup
                        .connect("https://apps.coned.com/cemyaccount/NonMemberPages/Login.aspx?lang=eng")
                        .method(Method.GET)
                        .execute();

                Map<String, String> cookies = response.cookies();
                cookies.put("NSC_DpoFe_Bqqt-TTM-pme", response.cookie("NSC_DpoFe_Bqqt-TTM-ofx"));
                System.out.println("response cookies: " + cookies);

                response = Jsoup
                        .connect("https://apps.coned.com/cemyaccount/NonMemberPages/Login.aspx?lang=eng")
                        .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
                        .header("Accept-Encoding", "gzip, deflate")
                        .header("Accept-Language", "en-US,en;q=0.8")
                        .header("Connection", "keep-alive")
                        .cookies(cookies)
                        .header("Host", "apps.coned.com")
                        .referrer("https://apps.coned.com/cemyaccount/NonMemberPages/Login.aspx?lang=eng&login=0")
                        .userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36")
                        .data("_LASTFOCUS", "")
                        .data("_EVENTTARGET", "")
                        .data("_EVENTARGUMENT", "")
                        .data("_VIEWSTATE", viewState)
                        .data("_EVENTVALIDATION", eventValidation)
                        .data("ctl00$Main$Login1$UserName", username)
                        .data("ctl00$Main$Login1$Password", password)
                        .data("ctl00$Main$Login1$LoginButton", "Sign In")
                        .followRedirects(false)
                        .method(Method.POST)
                        .execute(); 

                System.out.println("post cookies: " + response.cookies());
                cookies.putAll(response.cookies());
                System.out.println("response cookies: " + cookies);

                response = Jsoup
                        .connect("https://apps.coned.com/cemyaccount/SessionTransfer.aspx?dir=2asp&url=https:"
                                + "//apps.coned.com/csol/MainHome.asp?src=DOTNET")
                        .cookies(cookies)
                        .followRedirects(false)
                        .method(Method.GET)
                        .execute();

                cookies.putAll(response.cookies());
                System.out.println("response cookies: " + cookies);
                String guid = response.header("location");

                response = Jsoup
                        .connect("https://apps.coned.com/csol/SessionTransfer.asp?dir=2asp&guid="
                                + guid + "&url=https://apps.coned.com/csol/MainHome.asp"
                                + "?src=DOTNET&frm=")
                        .cookies(cookies)
                        .method(Method.GET)
                        .execute();

                cookies.putAll(response.cookies());
                System.out.println("response cookies: " + cookies);

                Document dataPage = Jsoup
                        .connect("https://apps.coned.com/CEMyAccount/CSOL/BillHistory.aspx?lang=eng")
                        .cookies(cookies)
                        .get();

                System.out.println("data page: " + dataPage);

                Elements data = dataPage.select("table.ctl00_Main_lvBillHistory_Table1");

                System.out.println("data: " + data);

In the output I get all the cookies except the ones from the POST, which are blank.
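One detail in the snippet above may also be worth checking: `response.header("location")` returns the whole redirect target rather than just the GUID, so concatenating it into the `guid=` parameter would likely produce a malformed URL. Below is a minimal sketch of pulling a single query parameter out of a `Location` value, assuming the redirect looks like the `SessionTransfer.asp` URL seen in the browser trace. The `RedirectParams` class and `queryParam` helper are hypothetical names for illustration, not part of jsoup.

```java
import java.util.Optional;

final class RedirectParams {
    private RedirectParams() {}

    // Returns the raw (not URL-decoded) value of the first query parameter
    // called `name` in `location`, or an empty Optional if it is absent.
    static Optional<String> queryParam(String location, String name) {
        if (location == null) {
            return Optional.empty();
        }
        int queryStart = location.indexOf('?');
        if (queryStart < 0) {
            return Optional.empty();
        }
        for (String pair : location.substring(queryStart + 1).split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (key.equals(name)) {
                return Optional.of(eq < 0 ? "" : pair.substring(eq + 1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed shape of the Location header, copied from the trace in the question.
        String location = "https://apps.coned.com/csol/SessionTransfer.asp?dir=2asp"
                + "&guid=3c413f48-d2eb-434a-896b-f9c4eb100714"
                + "&url=https://apps.coned.com/csol/MainHome.asp?src=DOTNET&frm=";
        System.out.println(queryParam(location, "guid").orElse("(no guid)"));
    }
}
```

Alternatively, requesting the `Location` value itself as the next URL would avoid rebuilding the address by hand.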

ComlyW

2 Answers

  • Open Developer Tools (press F12).
  • Select the Network tab.
  • In the top-left corner there is a round button. If it's not red, click it. This will record the traffic.
  • Visit the login page. Enter your credentials and log in.
  • Check what happens in Developer Tools. There is a table there, showing all the files that were received.
  • Check the Type column. Search for the row that has the value document. Select it. This will open a new panel.
  • Select Headers and scroll down until you locate the Request Headers. There you will find the request that your browser made, with all the values that were sent to the server.
  • Search for the parameters you need. Take the values and hardcode them in your code. Use the same user-agent (just in case) and in general try to imitate this request with your code.

All the steps above are for the Chrome browser.

Alkis Kalogeris
  • Hey alkis, those are definitely the steps I took to find what I'm currently sending. – ComlyW Jul 22 '15 at 18:34
  • Here is a screen shot of what I see - http://snag.gy/3h0ir.jpg. I really focused on the Form Data, but I just tried to include ALL of the Request Headers (except cookies) and I ended in the same place (printed out the login page) – ComlyW Jul 22 '15 at 18:45
  • Try again with the complete user agent (just in case). Include the referrer too. – Alkis Kalogeris Jul 22 '15 at 19:02
  • If the above doesn't work, then the problem must be HTTPS. Check this http://stackoverflow.com/questions/7744075/how-to-connect-via-https-using-jsoup – Alkis Kalogeris Jul 22 '15 at 19:06
  • Yikes, that's a scary road to go down. Before I do, I have two questions: A) Should the Request Headers be .header(arg1, arg2) or .data(arg1, arg2) B) when I print the loginForm.cookies() I only get two (NSC_DpoFe_Bqqt-TTM-ofx and ASP.NET_SessionId) not NSC_DpoFe_Bqqt-TTM-pme, ASPSESSIONIDCWTSDTQS, or .ASPXFORMSAUTH like I see in the Chrome browser. Any ideas? – ComlyW Jul 22 '15 at 20:04
  • `data()` refers to parameters (either in the body for a POST or in the URL for a GET). This will be your Form Data. What you need to put in the Headers section should be added with `.header()`. If you check the implementation you will see, for example, that `userAgent()` calls `header()`, and the user-agent is part of the Request Headers. About the second part, well, it might be because of HTTPS, but I'm not sure. – Alkis Kalogeris Jul 22 '15 at 20:28
  • Try setting this http://jsoup.org/apidocs/org/jsoup/Connection.html#validateTLSCertificates(boolean) to false – Alkis Kalogeris Jul 22 '15 at 20:42
  • Hey alkis, thanks for the help so far. I don't think it's an SSL problem. Please check out the edit to my question. If you have any ideas, that would be awesome. Thanks – ComlyW Jul 23 '15 at 19:08

The answer was painfully simple: the form fields with the underscores (such as `__VIEWSTATE` and `__EVENTVALIDATION`) start with 2 underscores, and I was only using 1. Doh
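For anyone hitting the same thing, here is a rough sketch of the corrected form data, assuming the same field names as in the question with the leading underscore doubled. The `AspNetLoginForm` class and its `fields` method are hypothetical names used only for illustration, and the placeholder values in `main` are not real.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AspNetLoginForm {
    private AspNetLoginForm() {}

    // Builds the POST parameters for the login form. The hidden ASP.NET
    // fields start with TWO underscores; a single underscore is ignored
    // by the server and the login page is simply returned again.
    static Map<String, String> fields(String viewState, String eventValidation,
                                      String username, String password) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("__LASTFOCUS", "");
        data.put("__EVENTTARGET", "");
        data.put("__EVENTARGUMENT", "");
        data.put("__VIEWSTATE", viewState);
        data.put("__EVENTVALIDATION", eventValidation);
        data.put("ctl00$Main$Login1$UserName", username);
        data.put("ctl00$Main$Login1$Password", password);
        data.put("ctl00$Main$Login1$LoginButton", "Sign In");
        return data;
    }

    public static void main(String[] args) {
        // Placeholder values; the real ones come from the login page and the user.
        Map<String, String> data = fields("viewStateValue", "eventValidationValue", "user", "secret");
        data.keySet().forEach(System.out::println);
    }
}
```

The resulting map can be handed to jsoup's `Connection.data(Map<String, String>)` in place of the chained `.data(name, value)` calls.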

ComlyW