
This is Part 2 of this question, and thanks very much for David's answer. What if I need to extract only the dates that are bounded by two keywords?

Example:

text = "One 09 Jun 2011 Two 10 Dec 2012 Three 15 Jan 2015 End"

Case 1 bounding keywords: "One" and "Three"
Result expected: ['09 Jun 2011', '10 Dec 2012']

Case 2 bounding keywords: "Two" and "End"
Result expected: ['10 Dec 2012', '15 Jan 2015']

Thanks!

ohho
  • @Horace did you ask your question twice? http://stackoverflow.com/questions/2770040/python-regex-of-a-date-in-some-text – Kiril May 05 '10 at 03:09
  • @Lirik, the second question adds one more condition, so I better separate the two. – ohho May 05 '10 at 03:22

2 Answers


You can do this with two regular expressions. One regex gets the text between the two keywords. The other regex extracts the dates.

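import re
text = "One 09 Jun 2011 Two 10 Dec 2012 Three 15 Jan 2015 End"
# Non-greedy capture of everything between the whole words "One" and "Three".
match = re.search(r"\bOne\b(.*?)\bThree\b", text, re.DOTALL)
if match:
    betweenwords = match.group(1)
    dates = re.findall(r'\d\d (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}', betweenwords)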
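
Since the bounding keywords are user-defined, the two-regex approach could be wrapped in a small function. The following is only a sketch: `dates_between` and `DATE_RE` are hypothetical names that are not part of the answer, and it assumes each keyword starts and ends with a word character (otherwise `\b` will not behave as intended).

```python
import re

# Same date pattern as in the answer: two digits, month abbreviation, four-digit year.
DATE_RE = re.compile(r'\d\d (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}')

def dates_between(text, start, end):
    """Return the dates found between the whole words `start` and `end`.

    Hypothetical helper for illustration; assumes both keywords
    begin and end with word characters.
    """
    # re.escape keeps characters in the keywords from being read as regex syntax.
    pattern = r"\b" + re.escape(start) + r"\b(.*?)\b" + re.escape(end) + r"\b"
    match = re.search(pattern, text, re.DOTALL)
    if not match:
        return []
    # Search for dates only inside the captured span.
    return DATE_RE.findall(match.group(1))

text = "One 09 Jun 2011 Two 10 Dec 2012 Three 15 Jan 2015 End"
print(dates_between(text, "One", "Three"))  # ['09 Jun 2011', '10 Dec 2012']
print(dates_between(text, "Two", "End"))    # ['10 Dec 2012', '15 Jan 2015']
```

The non-greedy `(.*?)` stops at the first occurrence of the end keyword after the start keyword, and an empty list is returned when either keyword is missing.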
Jan Goyvaerts
  • It works, thx! Except re.findall(..., text) should be re.findall(..., betweenwords). Btw, are the first and last "\b" required in the first regex? – ohho May 05 '10 at 04:56
  • I have corrected the `findall` parameter. All 4 `\b` are required if you want your words to be matched as whole words. E.g. `\bEnd\b` cannot match `Ending`. If you don't care whether your two keywords are whole or partial words, then you can omit all 4 `\b`. – Jan Goyvaerts May 06 '10 at 10:02
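
The whole-word behaviour of `\b` described in the comment above can be checked with a short experiment. This is only an illustrative sketch; the sample string is made up for the demonstration and does not come from the question.

```python
import re

# Made-up sample text containing "End" both inside a longer word and on its own.
sample = "Ending soon. The End"

# With \b on both sides, "End" matches only as a whole word, so "Ending" is skipped.
whole = [m.start() for m in re.finditer(r"\bEnd\b", sample)]

# Without \b, "End" also matches the first three letters of "Ending".
partial = [m.start() for m in re.finditer(r"End", sample)]

print(whole)    # [17]
print(partial)  # [0, 17]
```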

Do you really need to worry about the keywords? Can you ensure that the keywords will not change?

If you don't need the keywords, the exact same solution from the previous question works here:

>>> import re
>>> text = "One 09 Jun 2011 Two 10 Dec 2012 Three 15 Jan 2015 End"
>>> match = re.findall(r'\d\d\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4}', text)
>>> match
['09 Jun 2011', '10 Dec 2012', '15 Jan 2015']

If you really only need two of the dates, you could just use list slicing:

>>> match[:2]
['09 Jun 2011', '10 Dec 2012']
>>> match[1:]
['10 Dec 2012', '15 Jan 2015']
jathanism
  • The keywords (user-defined) are important for excluding some dates which are not inside the relevant part of a document. – ohho May 05 '10 at 02:44
  • So the keywords will be different and will have variable lengths? You'll have to use greedy matching. Alpha only, or alphanumeric? These are all important considerations when building your patterns. – jathanism May 05 '10 at 02:50
  • Please consider the bounding keywords to be 2 constant strings. – ohho May 05 '10 at 03:24