Python urllib2.urlopen(url).read() is different from source code seen in Firefox

Question

When I use urllib2.urlopen(url).read(), the source code I get is slightly different from what I see in Firefox. In the source shown by Firefox, some special characters, such as quotation marks (") and apostrophes ('), are converted to %22, %27, etc.

With urllib2.urlopen(url).read(), these special characters appear as plain text. I would like to get the source code of a web page in Python exactly as I see it in Firefox (with %22, %27, etc.).

Thank you, and sorry for my English.

Maybe check out [selenium](http://selenium-python.readthedocs.org/), and this similar Q&A: [How to get real source code of html page?](http://stackoverflow.com/questions/23657849/how-to-get-real-source-code-of-html-page) – chickity china chinese chicken, Sep 06 '17 at 01:06
Sorry, I don't want to use Selenium; I have just read about it. Are there other ways? I get the same complete source with Python, but some characters are encoded differently (' becomes %27). Why? – Luigi, Sep 06 '17 at 01:12

score 1 · Answer 1 · answered Sep 06 '17 at 01:09

Perhaps what you see in Firefox is URL-encoded (percent-encoded).

You can try to escape the result with urllib.quote:

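import urllib
import urllib2

data = urllib2.urlopen(url).read()  # raw response body, as sent by the server
print(urllib.quote(data))  # percent-encode reserved characters, e.g. ' becomes %27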
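
For illustration, here is a minimal sketch of what quoting a whole page does. The `html` string is a made-up stand-in for a downloaded page, and the import fallback is only there so the snippet also runs on Python 3 (where `quote` lives in `urllib.parse`). Note that every reserved character gets escaped, including the tag delimiters, not only the characters inside URLs.

```python
try:
    from urllib import quote  # Python 2, as used in this answer
except ImportError:
    from urllib.parse import quote  # Python 3 equivalent

# Hypothetical snippet standing in for the result of urlopen(url).read().
html = "<a href='/page'>it's</a>"

# quote() escapes everything outside letters, digits, "_.-" and "/".
quoted = quote(html)
print(quoted)
```

So the apostrophes do become %27, but `<`, `>`, `=` and spaces are escaped as well, which is probably more than the question asks for.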

Julio Daniel Reyes

Thank you, but that isn't the result I want. In the source code of a web page, the character ' is used in tag attributes to open and close a value, but inside the href attribute, for example, the character appears as %27. In Firefox I can see this difference; with urllib2 I cannot. urllib2 reads ' and %27 the same way. I don't want to use Selenium. Thank you for the answer. – Luigi Sep 06 '17 at 01:32
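
If the goal is only to have href values percent-encoded while the rest of the markup stays untouched, something along these lines might work. This is a rough sketch and not what Firefox does internally: `quote_hrefs`, `HREF_RE`, the `safe` character list and the sample markup are all assumptions made for illustration, and a regular expression is not a real HTML parser.

```python
import re

try:
    from urllib import quote  # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3

# Matches href="..." or href='...'; \2 requires the same closing delimiter.
HREF_RE = re.compile(r"""(href\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE | re.DOTALL)


def quote_hrefs(html):
    """Percent-encode only the values of href attributes (hypothetical helper)."""
    def _replace(match):
        prefix, delim, value = match.groups()
        # Keep URL structure characters and existing %XX escapes as they are.
        encoded = quote(value, safe="/:?=&#%;,+@")
        return prefix + delim + encoded + delim
    return HREF_RE.sub(_replace, html)


# Made-up markup: the apostrophe appears in an href, as a delimiter, and in text.
sample = """<a href="/find?name=O'Brien" title='x'>O'Brien</a>"""
print(quote_hrefs(sample))
```

Only the apostrophe inside the href value is turned into %27; the attribute delimiters and the link text are left alone.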
