
I have a vector of birth dates as character strings formatted "10-Feb-85".

When I use the as.Date() function in R, it assumes the two-digit year is after 2000 (none of these birth dates are after the year 2000).

Example: as.Date(x = "10-Feb-52", format = "%d-%b-%y")

Returns: 2052-02-10

I'm not proficient in regular expressions, but I think this calls for a regular expression that inserts "19" after the second "-", or equivalently before the last two digits.

I've found a regex that counts forward three characters and inserts a letter:

gsub(pattern = "^(.{3})(.*)$", replacement = "\\1d\\2", x = "abcefg")

But I'm not sure how to count two from the end.
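For reference, the forward-counting pattern above can be turned around to count from the end. This is a minimal sketch, assuming every string has at least two characters; the function name `insert_19_before_last_two` is hypothetical. It uses `sub()` rather than `gsub()` because the anchored pattern can only match once per string anyway.

```r
# Hypothetical helper; mirrors "^(.{3})(.*)$" but puts the fixed-width
# group at the end of the string instead of the start.
insert_19_before_last_two <- function(x) {
  # "(.*)" is greedy, so group 1 takes everything except the last two
  # characters, which "(.{2})$" captures as group 2.
  # In the replacement, "\\1" is group 1, "19" is literal text, "\\2" is group 2.
  sub(pattern = "^(.*)(.{2})$", replacement = "\\119\\2", x = x)
}

insert_19_before_last_two(c("10-Feb-52", "10-Feb-85"))
# [1] "10-Feb-1952" "10-Feb-1985"
```

R only recognises single-digit backreferences (`\\1` to `\\9`), so `\\119` is read as `\\1` followed by the literal `19`.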

Any help is appreciated.

timothy.s.lau
  • From `?strptime`: On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’. – jeremycg Feb 10 '17 at 17:04
  • @WiktorStribiżew, I am familiar enough with the data. There might be a few from the 1800s, but I doubt it. – timothy.s.lau Feb 10 '17 at 17:05
  • You may try the `lubridate`: `lubridate::dmy(c("10-Feb-85", "10-Feb-15")) [1] "1985-02-10" "2015-02-10"` – Uwe Feb 10 '17 at 17:34
  • @UweBlock: That way you add `19` or `20`, and they are added at the beginning, while the task is to add only `19`, before the last 2 digits that follow the second hyphen. – Wiktor Stribiżew Feb 10 '17 at 18:02
  • Also, the [`cron` solution](http://stackoverflow.com/questions/9508747/add-correct-century-to-dates-with-year-provided-as-year-without-century-y) suggested as a dupe source inserts either `20` or `19`, which is not what is expected here. – Wiktor Stribiżew Feb 10 '17 at 18:03
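A minimal sketch of the pivot described in the `?strptime` comment, assuming an English (or C) locale so that `%b` recognises `Feb`. The sample years are hypothetical, chosen to straddle the 68/69 boundary; it shows that only two-digit years from 00 to 68 land in the 2000s.

```r
# Assumes English/C month abbreviations for "%b".
x <- c("10-Feb-68", "10-Feb-69", "10-Feb-85")

# "%y" applies the POSIX rule: 00-68 -> 20xx, 69-99 -> 19xx.
d <- as.Date(x, format = "%d-%b-%y")
d
# [1] "2068-02-10" "1969-02-10" "1985-02-10"
```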

1 Answer


insert a "19" after the second "-" or before the last two digits.

Before the last two digits:

gsub(pattern = "-(\\d{2})$", replacement = "-19\\1", x = "10-Feb-52")

Here, `-` is matched first, then two digits (`(\\d{2})`) at the end of the string (`$`) are matched and captured into Group 1. The replacement `-19\\1` writes the hyphen back, followed by `19` and the captured digits.
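A minimal sketch of applying this pattern to a whole vector and then parsing the result, assuming every element ends in a hyphen plus a two-digit year from the 1900s and that the locale uses English month abbreviations for `%b`. `births` is hypothetical sample data. Once the year has four digits, the format needs `%Y` instead of `%y`.

```r
# Hypothetical sample data in the question's "dd-Mon-yy" layout.
births <- c("10-Feb-85", "10-Feb-52", "03-Jul-69")

# Insert "19" between the final hyphen and the two-digit year.
fixed <- gsub(pattern = "-(\\d{2})$", replacement = "-19\\1", x = births)
fixed
# [1] "10-Feb-1985" "10-Feb-1952" "03-Jul-1969"

# Four-digit year now, so parse with "%Y" rather than "%y".
parsed <- as.Date(fixed, format = "%d-%b-%Y")
parsed
# [1] "1985-02-10" "1952-02-10" "1969-07-03"
```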

After the second `-`:

gsub(pattern = "^((?:[^-]*-){2})", replacement = "\\119", x = "10-Feb-52")

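Here, two sequences (`{2}`) of zero or more characters other than `-` (`[^-]*`), each followed by `-`, are matched from the start of the string (`^`) and captured into Group 1. In the replacement, the backreference `\\1` restores the captured text and `19` is appended after it. Since R backreferences are single digits (`\\1` to `\\9`), `\\119` is read as `\\1` followed by `19`.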
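A quick sketch comparing the two patterns; the helper names `p_end` and `p_hyphen` and the sample strings are hypothetical. They agree on two-digit years, but they behave differently on a string that already has a four-digit year.

```r
# Hypothetical wrappers around the two patterns from this answer.
p_end    <- function(x) gsub("-(\\d{2})$", "-19\\1", x)        # before the last two digits
p_hyphen <- function(x) gsub("^((?:[^-]*-){2})", "\\119", x)   # after the second hyphen

x <- c("10-Feb-52", "10-Feb-85")
identical(p_end(x), p_hyphen(x))
# [1] TRUE

# A string that already has a four-digit year:
y <- "10-Feb-1985"
p_end(y)     # no "-" followed by exactly two final digits, so unchanged
# [1] "10-Feb-1985"
p_hyphen(y)  # still inserts after the second hyphen
# [1] "10-Feb-191985"
```

So if the vector might mix two- and four-digit years, the end-anchored pattern is the safer of the two.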

Wiktor Stribiżew