
Hey guys, I am trying to scrape some data from AliExpress, but whenever I try to access any URL, it asks me to log in before showing the page. I don't know how to log in to the website automatically. Some of you may use cookies, but I don't know how to use cookies. Here is my code:

import requests
from bs4 import BeautifulSoup
import csv
from selenium import webdriver

g = csv.writer(open('aliexpressnew.csv', 'a',newline='',encoding="utf-8"))
#g.writerow(['Product Name','Price','Category','Subcategory'])

links = [
        "https://www.aliexpress.com/category/205838503/iphones.html?spm=2114.search0103.0.0.6ab01fbbfe33Rm&site=glo&g=n&needQuery=n&tag="

        ]


for i in links:
    getlink = i

    while getlink != 0:
        chromepath = r'C:\Users\Faisal\Desktop\python\chromedriver.exe'
        driver = webdriver.Chrome(chromepath)
        driver.get(getlink)
        soup = BeautifulSoup(driver.page_source, 'html.parser')


        a





            # Prefer the 'picRind j-p4plog' container; fall back to plain 'picRind '.
            img_block = itemsname1.find(class_='img-container left-block util-clearfix').find(class_='img')
            pic = img_block.find(class_='picRind j-p4plog') or img_block.find(class_='picRind ')
            img_tag = pic.find('img')
            # Use 'src' when it is set, otherwise the 'image-src' attribute.
            image = img_tag.get('src') or img_tag.get('image-src')


            image3 = 'http:'+ str(image)





            print(title)
            print(price)
            #print(rating2)
            print(image3)

            g.writerow([title,price,subcat2,image])


        next1 = soup.find(class_='ui-pagination-navi util-left')
        if next1.find(class_="page-end ui-pagination-next ui-pagination-disabled"):

            getlink=0

        else:   

            next22 = next1.find(class_='page-next ui-pagination-next')
            next3 = "http:" + next22.get('href')
            getlink = next3
        driver.close()
Muhammad Faisal

3 Answers


First, you need to log in after opening the website through the Selenium driver. You don't actually need cookies for that.

Inspect the login form's elements to find their IDs, locate them with your driver, and then use send_keys to fill in the inputs:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

delay = 10  # seconds before timeout

chromepath = r'C:\Users\Faisal\Desktop\python\chromedriver.exe'
driver = webdriver.Chrome(chromepath)

driver.get(ALI_EXPRESS_LINK)

# Wait until the page has loaded far enough for the login input to exist
# (you can find the input's id by inspecting the element in your browser)
WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, "fm-login-id")))

# find_element(By.ID, ...) works in Selenium 3 and 4;
# the find_element_by_* helpers were removed in Selenium 4
element = driver.find_element(By.ID, "fm-login-id")
element.send_keys(YOUR_LOGIN_ID)

# Do the same for the password
element = driver.find_element(By.ID, "fm-login-password")
element.send_keys(YOUR_PASSWORD)

# Then click the submit button
driver.find_element(By.CLASS_NAME, "password-login").click()

Don't forget to define:

  • ALI_EXPRESS_LINK

  • YOUR_LOGIN_ID

  • YOUR_PASSWORD

:)
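Since the question mentions cookies: if you would rather reuse the session than log in on every run, a rough sketch could look like the following. It relies on Selenium's `get_cookies()` and `add_cookie()` driver methods; the helper names `save_cookies` and `load_cookies` and the JSON file are assumptions made for illustration, and whether AliExpress accepts a restored session is not something this sketch can guarantee.

```python
import json
from pathlib import Path


def save_cookies(driver, path):
    """Write the browser's current cookies to a JSON file."""
    Path(path).write_text(json.dumps(driver.get_cookies()), encoding="utf-8")


def load_cookies(driver, path):
    """Add the cookies stored in a JSON file to the current session.

    Returns the number of cookies added, or 0 if the file does not exist.
    """
    cookie_file = Path(path)
    if not cookie_file.exists():
        return 0
    cookies = json.loads(cookie_file.read_text(encoding="utf-8"))
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

You would call `save_cookies(driver, "cookies.json")` once after a successful login. On later runs, open a page on the same domain first (Selenium only accepts cookies for the domain currently loaded), call `load_cookies(driver, "cookies.json")`, and then `driver.refresh()`.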


LaSul

It sounds like you're getting the browser's username/password prompt that appears before any page content loads. If that's the case, you can navigate to a URI of the following form:

http://<username>:<password>@your-url-here.com

for example:

http://foo:bar@example.com
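If the username or password contains characters such as `@`, `:` or `/`, they need to be percent-encoded before being placed in the URL. A minimal sketch using only the standard library, where `with_basic_auth` is a hypothetical helper name:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(url, username, password):
    """Return `url` with `username:password@` placed before the host."""
    parts = urlsplit(url)
    # Percent-encode the credentials so '@', ':' or '/' inside them
    # cannot be mistaken for URL delimiters.
    credentials = f"{quote(username, safe='')}:{quote(password, safe='')}"
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals must stay wrapped in brackets.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=f"{credentials}@{host}"))


print(with_basic_auth("http://example.com", "foo", "bar"))
# prints: http://foo:bar@example.com
```

The resulting string can be passed to `driver.get(...)` like any other URL.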

Stephen K

You can load a Chrome profile that already has the credentials stored, to avoid logging in manually.

How to open URL through default Chrome profile using Python Selenium Webdriver

You have to pass Chrome options to the webdriver:

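from selenium import webdriver

options = webdriver.ChromeOptions()
# Path to Chrome's user data directory on Windows
options.add_argument("user-data-dir=C:/Users/NameUser/AppData/Local/Google/Chrome/User Data")
options.add_argument("profile-directory=Default")

# chromepath is the chromedriver path from the question's code;
# chrome_options= is deprecated in favor of options=
driver = webdriver.Chrome(chromepath, options=options)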

Make sure the credentials are stored and you are logged in to the website when you start Chrome normally.
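On other operating systems the user data directory lives elsewhere. The sketch below returns the usual default locations for Windows, macOS and Linux and builds the two arguments used above. The helper names are hypothetical, and the paths are the standard defaults for Google Chrome, so a custom install or Chromium may differ.

```python
import os
import sys
from pathlib import Path


def default_chrome_user_data_dir(platform=sys.platform, home=None, env=None):
    """Return Chrome's usual default user data directory for `platform`."""
    home = Path(home) if home is not None else Path.home()
    env = os.environ if env is None else env
    if platform.startswith("win"):
        # Windows: %LOCALAPPDATA%\Google\Chrome\User Data
        local = env.get("LOCALAPPDATA") or str(home / "AppData" / "Local")
        return Path(local) / "Google" / "Chrome" / "User Data"
    if platform == "darwin":
        # macOS: ~/Library/Application Support/Google/Chrome
        return home / "Library" / "Application Support" / "Google" / "Chrome"
    # Linux: ~/.config/google-chrome
    return home / ".config" / "google-chrome"


def profile_arguments(user_data_dir, profile="Default"):
    """Build the two Chrome arguments that select a stored profile."""
    return [f"user-data-dir={user_data_dir}", f"profile-directory={profile}"]
```

Each string returned by `profile_arguments(default_chrome_user_data_dir())` can then be passed to `options.add_argument(...)`.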

  • Can you please explain it a little more? I did not understand your answer. I am a newbie; can you tell me exactly what I should do and what I should add to the code I posted above? – Muhammad Faisal Jul 17 '19 at 11:17
  • Edited. The idea is to load a Chrome profile in which you are already logged in to the website. If you don't use Windows, look up where Chrome stores its profile data on your operating system. – Francisco Rodeño Sanchez Jul 17 '19 at 11:54