In Python, using an HTML parser, is it possible to get the document.lastModified property of a web page?
I'm trying to retrieve the date on which the web page/document was last modified by its owner.
Vishwa Iyer
2 Answers
1
A somewhat related question, "I am downloading a file using Python urllib2. How do I check how large the file size is?", suggests that the following (untested) code should work:
import urllib2
req = urllib2.urlopen("http://example.com/file.zip")
# Last-Modified is an HTTP date string, not a number, so it must not be wrapped in int()
last_modified = req.info().getheader('Last-Modified')
You might want to pass a default value as the second parameter to getheader(), in case the server doesn't send the header.
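Since urllib2 only exists in Python 2, here is a minimal Python 3 sketch of the same idea using only the standard library. The function names parse_last_modified and fetch_last_modified are made up for illustration, and the sketch assumes the server answers HEAD requests and actually sends a Last-Modified header, which many servers do not.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def parse_last_modified(header_value):
    """Convert an HTTP date string to a datetime, or None if absent or malformed."""
    if not header_value:
        return None
    try:
        return parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


def fetch_last_modified(url, timeout=10):
    """Send a HEAD request and return the parsed Last-Modified header, if any."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        # headers.get() returns None when the header is missing.
        return parse_last_modified(response.headers.get("Last-Modified"))
```

Calling fetch_last_modified("http://example.com/file.zip") would return a timezone-aware datetime when the header is present and None otherwise, so the caller can decide on a fallback.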
1
You can also look for a last-modified date in the HTML code, most notably in the meta tags. The htmldate module does just that.
Here is how it could work:
1. Install the package:
pip/pip3/pipenv (your choice) install -U htmldate
2. Retrieve a web page, parse it and output the date:
from htmldate import find_date
find_date('http://blog.python.org/2016/12/python-360-is-now-available.html')
(disclaimer: I'm the author)
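To make the meta-tag idea concrete, here is a rough standard-library sketch. It is not how htmldate works internally; it only illustrates the lookup. The names MetaDateParser, find_meta_dates and DATE_KEYS are hypothetical, and the set of keys is an assumption, since pages differ in which meta tags they use, if any.

```python
from html.parser import HTMLParser

# Assumed candidate keys; real pages vary widely and may use none of these.
DATE_KEYS = {"last-modified", "article:modified_time", "og:updated_time"}


class MetaDateParser(HTMLParser):
    """Collect the content of <meta> tags whose key looks like a modification date."""

    def __init__(self):
        super().__init__()
        self.dates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # The key may live in name=, property= or http-equiv=.
        key = (attributes.get("name") or attributes.get("property")
               or attributes.get("http-equiv") or "").lower()
        content = attributes.get("content")
        if key in DATE_KEYS and content:
            self.dates[key] = content


def find_meta_dates(html_text):
    """Return a dict mapping each matching meta key to its raw content string."""
    parser = MetaDateParser()
    parser.feed(html_text)
    return parser.dates
```

The returned values are raw strings taken from the page, so they still need to be parsed and sanity-checked before being treated as dates.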

adbar
- Nice answer. Could you provide a sample of code in plain text to illustrate this approach? I invite you to read this guide to improve your question [answer] – Guillaume Raymond Jan 14 '20 at 13:17
- Thank you for the input! I added further instructions. – adbar Jan 14 '20 at 13:21