
When I use urllib2.urlopen(url).read(), the source code I get is slightly different from what I see in Firefox. In the source shown by Firefox, some special characters, such as quotation marks (") and apostrophes ('), are converted to %22, %27, etc.

With urllib2.urlopen(url).read(), these special characters appear as plain text. I would like to get the source code of a web page in Python exactly as I see it in Firefox (with %22, %27, etc.).

Thank you, and sorry for my English.

Luigi
  • Maybe check out [selenium](http://selenium-python.readthedocs.org/), and this similar Q&A: [How to get real source code of html page?](http://stackoverflow.com/questions/23657849/how-to-get-real-source-code-of-html-page) – chickity china chinese chicken Sep 06 '17 at 01:06
  • Sorry, I don't want to use selenium; I have just read about it. Are there other ways? I read the same complete source with Python, but some characters are encoded differently (' = %27). Why? – Luigi Sep 06 '17 at 01:12

1 Answer


Perhaps those characters are URL-encoded (percent-encoded).

You can try escaping the result yourself:

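import urllib
import urllib2
data = urllib2.urlopen(url).read()  # raw page source; url is the address of the page
print(urllib.quote(data))  # percent-encode special characters, e.g. ' -> %27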
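
As a rough illustration of what quoting does, here is a minimal sketch. It assumes Python 3, where `urllib.quote` is available as `urllib.parse.quote`; the HTML fragment and the variable names are made up for the example and do not come from any real page.

```python
from urllib.parse import quote, unquote

# Hypothetical fragment standing in for a downloaded page.
html = "<a href='/search?q=it's'>link</a>"

# Quoting the whole document escapes the markup as well,
# not only the characters inside URL values.
print(quote(html))

# Quoting only the URL part leaves the surrounding markup untouched.
# The `safe` argument lists characters that must not be escaped.
href_value = "/search?q=it's"
encoded = quote(href_value, safe="/?=&")
print(encoded)  # /search?q=it%27s

# unquote reverses the encoding.
print(unquote(encoded) == href_value)  # True
```

Note that quoting the full page also turns `<`, `>` and spaces into escapes, so it is probably only useful when applied to individual URL values.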
Julio Daniel Reyes
  • Thank you, but that isn't the result I want. In the source code of a web page, the character ' is used in tag attributes to open and close a value, but inside the href attribute, for example, the character appears as %27. In Firefox I can see this difference; with urllib2 I can't. urllib2 reads ' and %27 in the same way. I don't want to use selenium. Thank you for the answer. – Luigi Sep 06 '17 at 01:32