Apologies for the simple question. I don't clean text or use regex often.

I have a large number of text files from which I want to remove every line that comes before the first match of my regex. There are usually about 15 lines of fluff before the match. I was hoping for a Perl one-liner that would look like this:

perl -p -i -e "s/.*By.unanimous.vote//g" *.txt

But this doesn't work.

Thanks

Vincent
  • In what way does it not work? – Cfreak Jun 08 '11 at 14:17
  • Including or excluding the line that matches? – Qtax Jun 08 '11 at 14:19
  • Are those supposed to be literal `.` characters in `By.unanimous.vote`, or should they be escaped? – Justin Morgan - On strike Jun 08 '11 at 14:38
  • 1. The expression I posted only removes text that is before the match, but *on the same line*. It does not remove previous lines. 2. Not critical for my application, but I suppose we can erase the match too. 3. Literal `.` characters. They should not be escaped. Thanks all for looking into this! – Vincent Jun 08 '11 at 19:57

5 Answers

4

Solution using the flip-flop operator:

perl -pi -e '$_="" unless /By.unanimous.vote/ .. 1' input-files

Shorter solution that also uses the x=!! pseudo operator:

perl -pi -e '$_ x=!! (/By.unanimous.vote/ .. 1)' input-files
mob
  • How does the flip-flop work? I read the linked page but I'm confused about the meaning of `/pattern/ .. 1`. Does it go in reverse order, i.e. from the line containing `By.unanimous.vote` back to the 1st line? – user13107 Oct 15 '13 at 16:03
3

Try the following.

If you want to remove everything up to and including the last By.unanimous.vote:

perl -00 -pe "s/.*By.unanimous.vote//s" inputfile > outputfile

If you want to remove everything up to and including the first By.unanimous.vote:

perl -00 -pe "s/.*?By.unanimous.vote//s" inputfile > outputfile
Toto
  • In that case use `.*?` without /g and with /s – Qtax Jun 08 '11 at 14:21
  • @Qtax: You're right if the OP wants to remove everything up to the first occurrence. – Toto Jun 08 '11 at 14:37
  • You still need /s, without it this will not work. Or does `.` follow the input record separator for its [^newline] matching? – Qtax Jun 08 '11 at 14:43
  • @Qtax: `-00` defines the record separator to null so there's no need for `/s` – Toto Jun 08 '11 at 14:47
  • @M42, apparently `.` does not care about the record separator, `perl -00 -e "print qq!foo\nbar! =~ /(.*)/"` prints `foo`. – Qtax Jun 08 '11 at 14:59
1

You haven't said whether you want to keep the By.unanimous.vote part, but it sounds to me like you want:

s/[\s\S]*?(?=By\.unanimous\.vote)//

Note the missing g flag and the lazy *? quantifier, because you want to stop matching once you hit that string. This should preserve By.unanimous.vote and everything after it. The [\s\S] matches newlines. In Perl, you can also do this with:

s/.*?(?=By\.unanimous\.vote)//s
Justin Morgan - On strike
1

Try something like:

perl -pi -e "$a=1 if !$a && /By\.unanimous\.vote/i; s/.*//s if !$a" *.txt

This should remove the lines before the matched line. If you also want to remove the matching line, you can do something like:

perl -pi -e "$a=1 if !$a && s/.*By\.unanimous\.vote.*//is; s/.*//s if !$a" *.txt

Shorter versions:

perl -pi -e "$a++if/By\.unanimous\.vote/i;$a||s/.*//s" *.txt
perl -pi -e "$a++if s/.*By\.unanimous\.vote.*//si;$a||s/.*//s" *.txt
Qtax
0

Solution using awk:

awk '/.*By.unanimous.vote/{a=1} a==1{print}' input > output
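Since awk writes to standard output, applying this to many files needs a loop. A minimal sketch, assuming a temporary `.new` file per input is acceptable (the file names and contents here are made up). Be aware that a file with no match at all would end up empty.

```shell
# Hypothetical sample files; contents are invented for illustration.
tmp=$(mktemp -d)
printf 'header 1\nBy.unanimous.vote approved\nbody 1\n' > "$tmp/a.txt"
printf 'other header\nanother header\nBy.unanimous.vote rejected\n' > "$tmp/b.txt"

for f in "$tmp"/*.txt; do
  # found becomes 1 on the first matching line; from then on every line prints.
  # awk starts fresh for each file, so the flag cannot leak between files.
  awk '/By.unanimous.vote/ { found = 1 } found { print }' "$f" > "$f.new" && mv "$f.new" "$f"
done

first=$(cat "$tmp/a.txt")
second=$(cat "$tmp/b.txt")
printf '%s\n---\n%s\n' "$first" "$second"
rm -rf "$tmp"
```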
Fredrik Pihl