you're doing several things wrong. for example,
you're trying to parse HTML with regex, and that's bound to fail.
also, your regex-html-parser doesn't account for html encoding, so if there are any encoded characters in the input data, your code
will send the wrong data.
(for example, if 1 of the csrf tokens contains an &, it will be html encoded as &amp;, which you must then html decode back to an &,
but your getInputs() function makes no attempt to detect/decode html encoded characters)
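to make the encoding problem concrete, here's a small sketch. the markup, the field name csrf and the token value are made up for illustration; they are not what google actually sends. it compares what a regex pulls out with what html_entity_decode() and DOMDocument give you:

```php
<?php
declare(strict_types=1);

// hypothetical sample markup: the token value contains an html-encoded ampersand
$html = '<form id="login"><input type="hidden" name="csrf" value="abc&amp;def"></form>';

// regex extraction returns the raw, still-encoded attribute text
preg_match('/name="csrf" value="([^"]*)"/', $html, $m);
$rawToken = $m[1];

// decoding by hand restores the real value
$decodedToken = html_entity_decode($rawToken, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// a DOM parser decodes entities in attribute values for you
$domd = new DOMDocument();
@$domd->loadHTML($html);
$domToken = $domd->getElementsByTagName('input')->item(0)->getAttribute('value');

var_dump($rawToken, $decodedToken, $domToken);
```

sending $rawToken back to the server would submit abc&amp;def instead of abc&def, which is the kind of silent corruption described above.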
a little further down, you attempt to send both the username and password in 1 go to https://accounts.google.com/ServiceLoginAuth -
that's not how it works. you must send them in 2 requests: first the username, then, in a separate request, the password.
also, the URLs are dynamic, they're different for each cookie session, but in your code you have the URLs hardcoded.
stop doing that. instead, make a request to https://gmail.com/ , it will http-location-redirect you a few times to a dynamic url.
that url's html will contain a <form> with the id gaia_loginform. the form's "action" attribute dictates where you send the username,
and the form also has a bunch of hidden <input> fields which you need to parse out and add to the request.
that request, if successful, will http-location-redirect you a few more times to another dynamic url, whose form
dictates where you're supposed to send the password, along with more hidden <input> fields.
if that request is successful, you've logged in. but use a proper DOM parser, like DOMDocument; don't use regex for parsing HTML.
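the "read the form, keep its hidden fields, add your own value" step described above can be sketched roughly like this. extractForm() is a hypothetical helper name, and the sample markup, the accounts.example.test url and the token field are invented stand-ins for whatever the redirects actually lead to; no network request is made here:

```php
<?php
declare(strict_types=1);

/**
 * return the action url and the named <input> values of the form with the given id.
 * hypothetical helper for illustration; it is not part of hhb_curl.
 */
function extractForm(string $html, string $formId): array
{
    $domd = new DOMDocument();
    @$domd->loadHTML($html); // real-world html is rarely valid, so silence parser warnings
    $xp = new DOMXPath($domd);
    // note: $formId is pasted into the xpath as-is, so it must not contain a double quote
    $form = $xp->query('//form[@id="' . $formId . '"]')->item(0);
    if ($form === null) {
        throw new RuntimeException("form #{$formId} not found");
    }
    $fields = [];
    foreach ($xp->query('.//input[@name]', $form) as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return ['action' => $form->getAttribute('action'), 'fields' => $fields];
}

// made-up markup standing in for the page the redirects lead to
$sample = '<form id="gaia_loginform" action="https://accounts.example.test/step1?x=1&amp;y=2" method="post">'
    . '<input type="hidden" name="token" value="a&amp;b">'
    . '<input type="email" name="Email" value="">'
    . '</form>';

$form = extractForm($sample, 'gaia_loginform');
$form['fields']['Email'] = 'user@example.com';     // add the username to the hidden fields
$body = http_build_query($form['fields'], '', '&'); // body to POST to $form['action']
```

the same helper would be called again on the html returned by the username request, this time filling in the password field before posting to the new action url.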
luckily for you, i needed to log in to gmail programmatically some time back too. here's how i did it, using hhb_curl -
EDIT: warning, when gmail detects that something is "weird" with your login, it will
sometimes, apparently completely at random, ask you to verify that it's
really the account owner logging in. 1 of the ways to verify yourself is
to provide your recovery email (because only the account owner should
have that information, right?), and the original code i posted here on SO
will just crash when asked to verify identity. here's updated code
that takes a third parameter, the recovery email, and verifies
identity automatically when asked to:
https://gist.github.com/divinity76/544d7cadd3e88e057ea3504cb8b3bf7e
still, for historical reasons, and because i'm too lazy to keep the SO answer code updated, here's the original code i posted here:
<?php
declare(strict_types = 1);
// header ( "content-type: text/plain;charset=utf8" );
require_once ('hhb_.inc.php');
function loginGmail(string $username, string $password): \hhb_curl {
$hc = new hhb_curl ( '', true );
$hc->setopt_array ( array (
CURLOPT_TIMEOUT => 20, // i just have a shitty connection :(
CURLOPT_CONNECTTIMEOUT => 10
) );
if (0) {
$hc->setopt_array ( array (
CURLOPT_USERAGENT => 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) CriOS/56.0.2924.75 Mobile/14E5239e Safari/602.1'
) );
}
$html = $hc->exec ( 'https://gmail.com' )->getStdOut ();
// loadHTML() is not a static method; calling it statically throws an Error on PHP 8
$domd = new DOMDocument ();
@$domd->loadHTML ( $html );
$inputs = getDOMDocumentFormInputs ( $domd, true, false ) ['gaia_loginform'];
// hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut (), $inputs ) & die();
$loginUrl = $domd->getElementById ( "gaia_loginform" )->getAttribute ( "action" );
$inputs ['Email'] = $username;
$html = $hc->setopt_array ( array (
CURLOPT_POST => 1,
CURLOPT_POSTFIELDS => http_build_query ( $inputs ),
CURLOPT_URL => $loginUrl
) )->exec ()->getStdOut ();
$domd = new DOMDocument ();
@$domd->loadHTML ( $html );
$inputs = getDOMDocumentFormInputs ( $domd, true, false ) ['gaia_loginform'];
// hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut (), $inputs );
$loginUrl = $domd->getElementById ( "gaia_loginform" )->getAttribute ( "action" );
$inputs ['Passwd'] = $password;
try {
$starttime = microtime ( true );
$html = $hc->setopt_array ( array (
CURLOPT_POST => 1,
CURLOPT_POSTFIELDS => http_build_query ( $inputs ),
CURLOPT_URL => $loginUrl
) )->exec ()->getStdOut ();
} finally{
// hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut (), $inputs, (microtime ( true ) - $starttime) ) & die ();
}
$domd = new DOMDocument ();
@$domd->loadHTML ( $html );
$xp = new DOMXPath ( $domd );
$loginErrors = $xp->query ( '//span[contains(@class,"error-msg")]' );
$loginErrorText = '';
foreach ( $loginErrors as $tmp ) {
$tmp = trim ( $tmp->textContent );
if (strlen ( $tmp )) {
$loginErrorText .= ' - ' . $tmp;
}
}
if (! empty ( $loginErrorText )) {
throw new \RuntimeException ( 'errors logging in: ' . $loginErrorText );
} else {
// logged in! :D
}
// now we need to enable HTML view, it's a <form> POST request, but we can't use getDOMDocumentFormInputs (bug?)
$found = false;
foreach ( $domd->getElementsByTagName ( "form" ) as $form ) {
if (false === stripos ( $form->textContent, "Gmail's basic HTML view, which doesn't require JavaScript" )) {
continue;
}
$found = true;
$url = $form->getAttribute ( "action" );
if (! parse_url ( $url, PHP_URL_HOST )) {
$url = $hc->getinfo ( CURLINFO_EFFECTIVE_URL ) . $url;
}
// hhb_var_dump ( $url ) & die ();
$inputs = [ ];
foreach ( $form->getElementsByTagName ( "input" ) as $input ) {
$name = $input->getAttribute ( "name" );
if (empty ( $name )) {
continue;
}
$inputs [$name] = $input->getAttribute ( "value" );
}
// hhb_var_dump ( $inputs ) & die ();
break;
}
if (! $found) {
throw new \RuntimeException ( 'failed to find HTML version request form!' );
}
$html = $hc->setopt_array ( array (
CURLOPT_POST => 1,
CURLOPT_POSTFIELDS => http_build_query ( $inputs ),
CURLOPT_URL => $url
) )->exec ()->getStdOut ();
hhb_var_dump ( $hc->getStdErr (), $hc->getStdOut (), $inputs ); // & die ();
return $hc;
}
function rightTrim($str, $needle, $caseSensitive = true) {
$strPosFunction = $caseSensitive ? "strpos" : "stripos";
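// guard against an empty needle (would trim everything on PHP 8) and a needle longer than the string (invalid offset)
if ($needle !== '' && strlen ( $needle ) <= strlen ( $str ) && $strPosFunction ( $str, $needle, strlen ( $str ) - strlen ( $needle ) ) !== false) {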
$str = substr ( $str, 0, - strlen ( $needle ) );
}
return $str;
}
function getDOMDocumentFormInputs(\DOMDocument $domd, bool $getOnlyFirstMatches = false, bool $getElements = true): array {
// :DOMNodeList?
if (! $getOnlyFirstMatches && ! $getElements) {
throw new \InvalidArgumentException ( '!$getElements is currently only implemented for $getOnlyFirstMatches (cus im lazy and nobody has written the code yet)' );
}
$forms = $domd->getElementsByTagName ( 'form' );
$parsedForms = array ();
$isDescendantOf = function (\DOMNode $descendant, \DOMNode $ele): bool {
$parent = $descendant;
while ( NULL !== ($parent = $parent->parentNode) ) {
if ($parent === $ele) {
return true;
}
}
return false;
};
// i can't use array_merge on DOMNodeLists :(
$merged = function () use (&$domd): array {
$ret = array ();
foreach ( $domd->getElementsByTagName ( "input" ) as $input ) {
$ret [] = $input;
}
foreach ( $domd->getElementsByTagName ( "textarea" ) as $textarea ) {
$ret [] = $textarea;
}
foreach ( $domd->getElementsByTagName ( "button" ) as $button ) {
$ret [] = $button;
}
return $ret;
};
$merged = $merged ();
foreach ( $forms as $form ) {
$inputs = function () use (&$domd, &$form, &$isDescendantOf, &$merged): array {
$ret = array ();
foreach ( $merged as $input ) {
// hhb_var_dump ( $input->getAttribute ( "name" ), $input->getAttribute ( "id" ) );
if ($input->hasAttribute ( "disabled" )) {
// ignore disabled elements?
continue;
}
$name = $input->getAttribute ( "name" );
if ($name === '') {
// echo "inputs with no name are ignored when submitted by mainstream browsers (presumably because of specs)... follow suit?", PHP_EOL;
continue;
}
if (! $isDescendantOf ( $input, $form ) && $form->getAttribute ( "id" ) !== '' && $input->getAttribute ( "form" ) !== $form->getAttribute ( "id" )) {
// echo "this input does not belong to this form.", PHP_EOL;
continue;
}
if (! array_key_exists ( $name, $ret )) {
$ret [$name] = array (
$input
);
} else {
$ret [$name] [] = $input;
}
}
return $ret;
};
$inputs = $inputs (); // sorry about that, Eclipse gets unstable on IIFE syntax.
$hasName = true;
$name = $form->getAttribute ( "id" );
if ($name === '') {
$name = $form->getAttribute ( "name" );
if ($name === '') {
$hasName = false;
}
}
if (! $hasName) {
$parsedForms [] = array (
$inputs
);
} else {
if (! array_key_exists ( $name, $parsedForms )) {
$parsedForms [$name] = array (
$inputs
);
} else {
// a second form with the same id/name: append its inputs ($tmp was undefined here)
$parsedForms [$name] [] = $inputs;
}
}
}
unset ( $form, $tmp, $hasName, $name, $i, $input );
if ($getOnlyFirstMatches) {
foreach ( $parsedForms as $key => $val ) {
$parsedForms [$key] = $val [0];
}
unset ( $key, $val );
foreach ( $parsedForms as $key1 => $val1 ) {
foreach ( $val1 as $key2 => $val2 ) {
$parsedForms [$key1] [$key2] = $val2 [0];
}
}
}
if ($getElements) {
return $parsedForms;
}
$ret = array ();
foreach ( $parsedForms as $formName => $arr ) {
$ret [$formName] = array ();
foreach ( $arr as $ele ) {
$ret [$formName] [$ele->getAttribute ( "name" )] = $ele->getAttribute ( "value" );
}
}
return $ret;
}
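a note on the `$hc->getinfo ( CURLINFO_EFFECTIVE_URL ) . $url` line in loginGmail() above: that concatenation assumes the form's action can simply be appended to the current url. if you want something a bit more general, here's a rough sketch of resolving the action against the page url. resolveUrl() is a made-up name, the urls in the usage lines are just examples, and it deliberately skips "../" and "./" normalisation:

```php
<?php
declare(strict_types=1);

/**
 * resolve a form "action" against the url of the page it was found on.
 * hypothetical helper, simplified: it does not normalise "../" or "./" segments.
 */
function resolveUrl(string $base, string $relative): string
{
    if ($relative === '') {
        return $base; // an empty action means "submit to the current url"
    }
    if (parse_url($relative, PHP_URL_SCHEME) !== null) {
        return $relative; // already absolute
    }
    $b = parse_url($base);
    $origin = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $path = $b['path'] ?? '/';
    if (strncmp($relative, '//', 2) === 0) {
        return $b['scheme'] . ':' . $relative; // protocol-relative
    }
    if ($relative[0] === '/') {
        return $origin . $relative; // absolute path on the same host
    }
    if ($relative[0] === '?') {
        return $origin . $path . $relative; // same path, new query string
    }
    // plain relative path: replace the last path segment of the base
    $dir = preg_replace('#/[^/]*$#', '/', $path);
    return $origin . $dir . $relative;
}

echo resolveUrl('https://mail.google.com/mail/u/0/', '?ui=html'), PHP_EOL;
echo resolveUrl('https://mail.google.com/mail/u/0/index', 'next'), PHP_EOL;
```

in loginGmail() this would replace the concatenation with something like `$url = resolveUrl ( $hc->getinfo ( CURLINFO_EFFECTIVE_URL ), $url );`.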