Why does python show me text in Chinese?

Question

I am using requests and bs4 to scrape some data from a Chinese website that also has an English version. I wrote this to see if I get the right data:

import requests
from bs4 import BeautifulSoup

page = requests.get('http://dotamax.com/hero/rate/')
soup = BeautifulSoup(page.content, "lxml")
for i in soup.find_all('span'):
    print(i.text)

And I do. The only problem is that the text is in Chinese, although it is in English when I look at the page source. Why do I get Chinese instead of English, and how can I fix that?

I included the link now. – Chen Guevara Oct 07 '16 at 18:56

n1c9 · Accepted Answer · 2020-10-19T18:41:16.577

The website appears to check the GET request for an Accept-Language header. If the request doesn't have one, it serves the Chinese version. This is an easy fix: pass custom headers, as described in the requests documentation:

import requests
from bs4 import BeautifulSoup

headers = {'Accept-Language': 'en-US,en;q=0.8'}

page = requests.get('http://dotamax.com/hero/rate/', headers=headers)
soup = BeautifulSoup(page.content, "lxml")
for i in soup.find_all('span'):
    print(i.text)

produces:

Anti-Mage
Axe
Bane
Bloodseeker
Crystal Maiden
Drow Ranger
...

etc.
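
If you scrape more than one page from the site, it may be more convenient to set the header once on a session instead of passing it to every call. This is only a minimal sketch: `fetch` is a hypothetical helper name and the 10-second timeout is an arbitrary choice of mine, not something the site requires.

```python
import requests

# Headers set on a Session are sent with every request made through it.
session = requests.Session()
session.headers.update({'Accept-Language': 'en-US,en;q=0.8'})


def fetch(url):
    """Hypothetical helper: download a page through the shared session."""
    response = session.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()              # fail loudly on HTTP errors
    return response.content
```

You would then call `fetch('http://dotamax.com/hero/rate/')` and hand the result to BeautifulSoup exactly as above.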

Usually, when a page looks different in your browser than in the content requests returns, the cause is the type of request or the headers being sent. One really useful web-scraping tip that I wish I had realized much earlier: if you hit F12 and open the "Network" tab in Chrome or Firefox, you can see each request the browser makes, including its request headers, which is very useful for debugging.
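
You can do a similar check on the Python side without sending anything. The sketch below builds the request with `Session.prepare_request` and returns the headers that would go out, so you can compare them with what the browser shows. `headers_to_be_sent` is a hypothetical helper name of mine, not part of requests.

```python
import requests


def headers_to_be_sent(url, extra_headers=None):
    """Return the headers requests would send for a GET, without sending it."""
    session = requests.Session()
    request = requests.Request('GET', url, headers=extra_headers)
    # prepare_request merges the session's default headers with our own.
    prepared = session.prepare_request(request)
    return dict(prepared.headers)


url = 'http://dotamax.com/hero/rate/'
default = headers_to_be_sent(url)
custom = headers_to_be_sent(url, {'Accept-Language': 'en-US,en;q=0.8'})

print('Accept-Language' in default)  # False: requests does not set it by default
print(custom['Accept-Language'])     # en-US,en;q=0.8
```

Since requests does not add an Accept-Language header on its own, the server has nothing to go on and falls back to whatever language it treats as the default.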

score -1 · Answer 2 · answered Oct 07 '16 at 19:07

You have to tell the server which language you prefer in the HTTP headers:

    import requests
    from bs4 import BeautifulSoup
    header = {
        'Accept-Language': 'en-US'
    }
    page = requests.get('http://dotamax.com/hero/rate/', headers=header)
    soup = BeautifulSoup(page.content, "html5lib")
    for i in soup.find_all('span'):
        print(i.text)

kiviak

This has already been provided in the other answer, what is in yours that is different? – Padraic Cunningham Oct 07 '16 at 19:09
Hehe, when one reaches the end in the Olympics, should the other players quit the game? Joke, @Padraic Cunningham – kiviak Oct 07 '16 at 19:17
When the same answer is already there, maybe the player should not add the same answer again. – Padraic Cunningham Oct 07 '16 at 19:19
@Padraic Cunningham the truth is that when I was editing my answer, there was no other answer here – kiviak Oct 07 '16 at 19:22
