I'm trying to get the HTML content of a page that requires a login, using PHP. I have the login details, and I can log in to the site via curl, as the following code demonstrates (I removed the real URLs and credentials for this question, but the code does work for me):

    <html>
    <head>
    </head>
    <body><?php

    $login_url = 'https://www.MyTopSecretDomainPage/login.asp';
    $remotePageUrl = 'http://www.MyTopSecretDomainPage/TheOtherPageThatIWantToGetTheContentsOf.asp?variablesThatAreImport=Things';
    //These are the post data username and password
    $post_data = 'action=login&cookieexists=true&redirect=&page=&partner=&email=someThing@gmail.com&password=TopSecretPassword&userid_to_cookie=1&saveID=yes';

    //Create a curl object
    $ch = curl_init();

    //Set the useragent
    $agent = $_SERVER["HTTP_USER_AGENT"];
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);

    //Set the URL
    curl_setopt($ch, CURLOPT_URL, $login_url );

    //This is a POST query
    curl_setopt($ch, CURLOPT_POST, 1 );

    //Set the post data
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);

    //We want the content after the query
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);



    /*
    Set the cookie storing files
    Cookie files are necessary since we are logging in and session data needs to be saved
    */

    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');

    //Execute the action to login
    $postResult = curl_exec($ch);



    //Here I'm trying to get the data of another page
    curl_setopt_array(
        $ch, array(
        CURLOPT_URL => $remotePageUrl ,
        CURLOPT_RETURNTRANSFER => true
    ));

    $output = curl_exec($ch);
    //I first echo $postResult because I need to log in, but that causes a redirect. Not sure how else to do it so that I can get the other data afterwards?
    echo($postResult);
    ?>
    <script>

    </script>
    </body>
    </html>

Most of this code is based on other answers I found here, but those answers didn't address my problem: after the script logs in to the other website successfully, the browser is automatically redirected to that website, even though I turned CURLOPT_FOLLOWLOCATION off, as you can see. I suspect the redirect is somehow built into the other website's response. I still need to access the content of a different page via PHP, but once I'm redirected, my PHP page is gone.

I think that if I could load the login page (and log in) in a hidden window, then after the redirect happens in that window I could fetch the content of the page I want. I'm not sure whether that's the correct solution, or whether it would work at all.

So: I have successfully logged in to the other site, but how do I get the content of a different page on that site, given that it redirects me automatically?

  • *"and once I'm redirected -- no more php page"* – I don't get what that means, and it seems central to your issue. Please explain better. – deceze Oct 07 '18 at 09:17
  • `curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);` probably won't help. Set this to `true` so redirects are followed – Scuzzy Oct 07 '18 at 09:32
  • @Scuzzy I don't want it to redirect, that's the thing – B''H Bi'ezras -- Boruch Hashem Oct 07 '18 at 18:46
  • @deceze it means that my PHP page is no longer there, as it has redirected me to the next page, and therefore there's no way I can use that PHP page to get the content of a second page – B''H Bi'ezras -- Boruch Hashem Oct 07 '18 at 18:46
  • @Scuzzy I just tried that and it still redirects me to the page after login, which is not what I want, 'cause then there's no way to get the content of the 2nd page – B''H Bi'ezras -- Boruch Hashem Oct 07 '18 at 18:51
  • Is it possible that the content of `$postResult` contains a redirect via some other means, e.g. a meta refresh, and that by echoing it you're executing the redirect? `CURLOPT_FOLLOWLOCATION` won't redirect your user; it tells curl to follow 301/302 responses before populating the result. – Scuzzy Oct 07 '18 at 20:57
  • @Scuzzy yes that's the point, it does redirect on the page itself somehow, and that's what I need to stop – B''H Bi'ezras -- Boruch Hashem Oct 07 '18 at 21:55
  • If `$remotePageUrl` is supposed to be the URL after the login redirect, but you've prevented the redirect, then there's obviously something about curl and what I've said that both of us aren't getting from each other. I don't understand why you'd need to call curl exec twice. – Scuzzy Oct 07 '18 at 22:22
  • @Scuzzy I don't know anything about curl, all I'm trying to do is to get the content of a certain page that requires login with PHP, and I took most of the code above from another answer, tried to modify it a bit, and ran into some problems – B''H Bi'ezras -- Boruch Hashem Oct 07 '18 at 22:36
  • My assumption is that after you post the login, a session cookie is returned and the location header is sent. The cookie may be all you need to retain to maintain the session, so following the redirect may be unnecessary, but without knowing more about the third-party implementation, this is just a guess :) – Scuzzy Oct 07 '18 at 22:59
  • @Scuzzy Thanks but I don't really understand at all... I tried doing a file_put_contents after requesting the next file, and this is what I got if that helps: and afterwards it redirects me to the page (obviously) so I'm not sure how to actually save the CONTENTS of the page it's redirecting me to (which isn't even the right page but that would be close enough) – B''H Bi'ezras -- Boruch Hashem Oct 07 '18 at 23:15
  • You might need to look at https://stackoverflow.com/questions/41978957/get-header-from-php-curl-response/41979193 – Scuzzy Oct 07 '18 at 23:43
  • @Scuzzy thanks I was able to retrieve the header but what should I do with it now? – B''H Bi'ezras -- Boruch Hashem Oct 07 '18 at 23:46
  • Ignore what I said about the header, it's wrong. OK, so the hope is that at this point your `cookie.txt` file has something in it. Then just do another curl request to the page whose contents you actually want. Don't print `$postResult`, because then it's making YOUR browser redirect. Think of curl as a second browser that you're using behind the scenes to fetch the content, using its cookie store for the login/session. – Scuzzy Oct 07 '18 at 23:56
  • @Scuzzy so how should I get the information if not by print? Can I save it to a file? So far when I try to get the other page it returns the redirect string – B''H Bi'ezras -- Boruch Hashem Oct 07 '18 at 23:59
  • You could use regular expressions to extract the URL from that string https://3v4l.org/Steu5 `preg_match( '/window\.top\.location\s*=\s*"([^"]+)";/', $postResult, $match )` – Scuzzy Oct 08 '18 at 00:18
  • @Scuzzy but then what? That's the string I'm trying to get to in the first place, and it showed me that response, what do I do after I get it again? If I make another curl request to the same URL it'll probably just give me that URL again – B''H Bi'ezras -- Boruch Hashem Oct 08 '18 at 00:42
  • But at this point, the assumption is your curl "browser" is logged in and you should see the content you're expecting (confirm if `cookie.txt` has content now). – Scuzzy Oct 08 '18 at 00:57
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181449/discussion-between-scuzzy-and-user2016831). – Scuzzy Oct 08 '18 at 00:57
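
The approach the comments converge on can be sketched roughly as follows. This is a minimal sketch, not a confirmed solution for the site in question: the function names `fetchWithSession`, `extractJsRedirect` and `fetchProtectedPage` are hypothetical, the regular expression is the one suggested in the comments, and the sketch assumes that the site keeps the session in cookies and that any script redirect uses an absolute, double-quoted URL. Unlike the code in the question, it lets curl follow HTTP redirects internally, as suggested in the comments.

```php
<?php
// Sketch only: URLs, form fields and the cookie path are supplied by the caller.

/**
 * Perform one request with curl, sharing cookies through $cookieFile,
 * and return the response body as a string. Nothing is echoed.
 */
function fetchWithSession(string $url, string $cookieFile, ?string $postData = null): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow HTTP 301/302 inside curl only
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here
    ]);
    if ($postData !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    }
    $body  = curl_exec($ch);
    $error = curl_error($ch);
    unset($ch); // release the handle so curl flushes cookies to the jar
    if ($body === false) {
        throw new RuntimeException('curl request failed: ' . $error);
    }
    return $body;
}

/**
 * Return the target of a script redirect of the form
 * window.top.location = "..."; or null if the body contains none.
 */
function extractJsRedirect(string $html): ?string
{
    // Pattern from the comment thread; assumes a double-quoted target.
    if (preg_match('/window\.top\.location\s*=\s*"([^"]+)";/', $html, $match) === 1) {
        return $match[1];
    }
    return null;
}

/**
 * Log in, then fetch another page with the same cookie file.
 */
function fetchProtectedPage(string $loginUrl, string $postData, string $pageUrl, string $cookieFile): string
{
    // The login response is discarded, never echoed, so a script redirect
    // inside it cannot run in the visitor's browser.
    fetchWithSession($loginUrl, $cookieFile, $postData);

    $page = fetchWithSession($pageUrl, $cookieFile);

    // If the site answers with a script redirect instead of content,
    // request its target once (assumed here to be an absolute URL).
    $target = extractJsRedirect($page);
    if ($target !== null) {
        $page = fetchWithSession($target, $cookieFile);
    }
    return $page;
}
```

A call such as `$html = fetchProtectedPage($login_url, $post_data, $remotePageUrl, __DIR__ . '/cookie.txt');` would then hold the second page in a variable. To inspect it without triggering any embedded script, it could be written to a file with `file_put_contents` or printed with `echo htmlspecialchars($html);`.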

0 Answers