2

I am trying to replace one regex pattern with another regex pattern.

st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'

pattern = re.compile('\d+x\d+') # for st_srt
re.sub(pattern, 'S\1E\2',st_srt)

I know the use of S\1E\2 is wrong here. The reason I am using \1 and \2 is to capture the values 01 and 02 and reuse them in S\1E\2.

My desired output is:

st_srt = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt'

So, what is the correct way to achieve this?

RanRag
  • 1
    You're not replacing a regex with another regex, you're using a regex to replace a string with another string. Very important difference. Trust me, using a regex to process *other* regexes is a nightmare you don't need. – Justin Morgan - On strike Mar 30 '12 at 21:22
  • @JustinMorgan: Thanks for your input, but then what is the correct way to replace one regex with another, or to achieve my desired output using a regex-based solution? – RanRag Mar 30 '12 at 21:24
  • What I think you're saying is that you want to capture a group in your search string and use the group in the replace string. – alan Mar 30 '12 at 21:26
  • @Noob: See my answer, that should solve your problem. But it's important to understand that you're not trying to replace one regex with another. `Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt` is your *input string*, not your regex. When people refer to a "regex", they usually mean the *pattern* that you're using, which in this case is `\d+x\d+`. – Justin Morgan - On strike Mar 30 '12 at 21:27
  • @Noob: are you trying to use the st_mkv string as a pattern for changing the st_srt string? – alan Mar 30 '12 at 21:33
  • @alan: I need `Awake.01x02` to become `Awake.S01E02`, where the `S01E02` format comes from `st_mkv`. So if I change `st_mkv` to `Awake.S01-S02`, I need my `st_srt` to become `Awake.S01-S02`. – RanRag Mar 30 '12 at 21:37

4 Answers

3

You need to capture what you're trying to preserve. Try this:

import re
pattern = re.compile(r'(\d+)x(\d+)')  # capture the season and episode digits in st_srt
st_srt = re.sub(pattern, r'S\1E\2', st_srt)  # raw string keeps \1 and \2 as group references
Justin Morgan - On strike
  • (1) You should use raw strings. (2) You shouldn't call `re.sub` if you're not going to do anything with its return-value. ;-) – ruakh Mar 30 '12 at 21:29
  • Now I am getting `'Awake.S\x01E\x02.iNTERNAL.WEBRiP.XViD-GeT.srt'`. – RanRag Mar 30 '12 at 21:29
  • @ruakh - Thanks, I cut & pasted his python code on the assumption that it was correct. I'm a regex guy, not a python guy. Can you help with the syntax? – Justin Morgan - On strike Mar 30 '12 at 21:32
  • And if I change my input string to `st_srt = 'Awake.01x03.iNTERNAL.WEBRiP.XViD-GeT.srt'` I need my regex output to be `'Awake.S01E03.iNTERNAL.WEBRiP.XViD-GeT.srt'` but if I use your solution the output is `'Awake.S\x01E\x02.iNTERNAL.WEBRiP.XViD-GeT.srt'`. So it is same for any value of `st_srt`. – RanRag Mar 30 '12 at 21:34
  • @Noob - I did a little research on python raw strings. Try it now. – Justin Morgan - On strike Mar 30 '12 at 21:35
  • @JustinMorgan: Thanks, it worked like a charm. Wondering why it didn't work earlier. – RanRag Mar 30 '12 at 21:39
  • This isn't using 'st_mkv' as the source of the replace string for 'st_srt', which is what Noob said he needed in comments above. – alan Mar 30 '12 at 21:56
  • 2
    @Noob If you don't use raw strings, Python will interpret the backslashes as Python string escape sequences. In a normal string literal, `\1` and `\2` are synonymous with `\x01` and `\x02`, which are the Start-of-Heading and Start-of-Text control characters, which are not what you want. With raw strings, Python interprets the backslashes as backslashes, and the regex engine interprets them properly. – Taymon Mar 30 '12 at 21:58
  • @alan: Initially he didn't mention that. – RanRag Mar 30 '12 at 22:00
  • @RanRag: Initially, no. He clarified in comments later. Noob: you may want to edit the question to add that requirement. – alan Mar 30 '12 at 22:04
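
To make the raw-string point from the comments concrete, here is a minimal sketch that runs the same substitution with and without the `r` prefix. The variable names `plain`, `raw`, `broken` and `fixed` are only illustrative and do not come from the thread.

```python
import re

st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
pattern = re.compile(r'(\d+)x(\d+)')

# Without the r prefix, Python itself turns \1 and \2 into the control
# characters \x01 and \x02 before re.sub ever sees them.
plain = 'S\1E\2'
# With the r prefix, the backslashes survive and the regex engine reads
# \1 and \2 as references to the captured groups.
raw = r'S\1E\2'

broken = pattern.sub(plain, st_srt)
fixed = pattern.sub(raw, st_srt)

print(repr(plain))   # 'S\x01E\x02'
print(repr(raw))     # 'S\\1E\\2'
print(repr(broken))  # 'Awake.S\x01E\x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
print(repr(fixed))   # 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt'
```

The `broken` result is the same for any input, because the replacement string no longer refers to the groups at all.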
2

Well, it looks like you already accepted an answer, but I think this is what you said you're trying to do, which is get the replace string from 'st_mkv', then use it in 'st_srt':

import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'

replace_pattern = re.compile(r'Awake\.([^.]+)\.')
m = replace_pattern.match(st_mkv)
replace_string = m.group(1)

new_srt = re.sub(r'^Awake\.[^.]+\.', 'Awake.{0}.'.format(replace_string), st_srt)
print(new_srt)  # the parenthesized form works on both Python 2 and Python 3
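
The code above hard-codes the show name `Awake` and assumes the match succeeds. A minimal sketch of the same idea without those assumptions might look like the following. `copy_second_field` and `FIELDS` are hypothetical names, and the sketch assumes the episode tag is always the second dot-separated field of both file names.

```python
import re

# Hypothetical helper: copies the second dot-separated field (the episode tag)
# from source_name into target_name. Assumes both names look like
# 'Show.TAG.rest.ext'; names that do not match are returned unchanged.
FIELDS = re.compile(r'^([^.]+)\.([^.]+)\.')

def copy_second_field(source_name, target_name):
    m = FIELDS.match(source_name)
    if m is None:
        return target_name
    tag = m.group(2)
    # A function replacement inserts the tag literally, so backslashes in it
    # are never treated as group references.
    return FIELDS.sub(lambda t: t.group(1) + '.' + tag + '.', target_name, count=1)

st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
new_srt = copy_second_field(st_mkv, st_srt)
print(new_srt)  # Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt
```

Because the tag is taken verbatim from the source name, a differently formatted tag such as `S01-S02` would be carried over the same way.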
alan
1

Try using this regex:

([\w+\.]+){5}\-\w+

Copy the strings into http://www.gskinner.com/RegExr/ and paste the regex at the top.

It matches the name part of each string, leaving out the extension.

You can then go ahead and append the extension you want, to the string you want.

EDIT:

Here's what I used to do what you're after:

import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'  # don't actually need this one
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
replace_pattern = re.compile(r'([\w+\.]+){5}\-\w+')
m = replace_pattern.match(st_mkv)

new_string = m.group(0)
new_string += '.srt'

>>> new_string
'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt'
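
If only the extension has to change, the same result can probably be had without a regex. The sketch below swaps in the standard library's `os.path.splitext`, which is a different technique from the pattern above; the names `stem` and `ext` are only illustrative.

```python
import os.path

st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'

# splitext splits at the last dot, so only the final extension is removed.
stem, ext = os.path.splitext(st_mkv)
new_string = stem + '.srt'

print(new_string)  # Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt
```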
Alex
0
import re

st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'

st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'

pattern = re.compile(r'(\d+)x(\d+)')

st_srt_new = re.sub(pattern, r'S\1E\2', st_srt)

print(st_srt_new)
RanRag