1

Input:

OUT :abc123: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT :abc123 : : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT bcd111: : Succeeded.

I want to list only the hosts whose lines match "Warning".

Output:

abc123 
abc1234
bcd111

I tried the regex below, but it matches every occurrence, duplicates included.

([\w]+)\s+:\s+:\s+Warning

Is it possible to avoid duplicates using regex?

Vasanth

5 Answers

3

When you hear "unique" in Perl, think "hash":

#!/usr/bin/perl
use warnings;
use strict;

my %uniq;
while (<>) {
    /:?(\S+?)[:\s]+Warning/ and $uniq{$1} = 1;
}

print "$_\n" for keys %uniq;

BTW, your input and regex don't lead to the output you indicated. I changed the regex, but I'm not sure your input sample is correct. Is the placement of colons really so wild?
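Note that keys %uniq returns the hosts in no particular order. If the order of first appearance matters, as in the expected output of the question, a small variation on the same hash idea keeps it. This is only a sketch: the sub name unique_hosts and the hard-coded sample lines are made up for illustration, and the regex is the one used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the hosts that have a Warning line,
# each once, in the order they first appear.
sub unique_hosts {
    my @lines = @_;
    my (%seen, @hosts);
    for my $line (@lines) {
        # Same pattern as in the script above.
        next unless $line =~ /:?(\S+?)[:\s]+Warning/;
        # Post-increment returns the old count, so the push
        # happens only the first time a host is seen.
        push @hosts, $1 unless $seen{$1}++;
    }
    return @hosts;
}

# A few lines taken from the question's sample input.
my @sample = (
    'OUT :abc123: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)',
    'OUT :abc123 : : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT bcd111: : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)',
    'OUT bcd111: : Succeeded.',
);

# Prints abc123, abc1234 and bcd111, one per line.
print "$_\n" for unique_hosts(@sample);
```

The %seen hash still does the deduplication; the array exists only to remember the order.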

choroba
1
OUT\s*:?([^:]*):(?=.*?\bWarning\b)(?:(?!OUT).)*(?!.*?\1[:\s]*Warning)

You can try this. See the demo and grab the capture group.

http://regex101.com/r/sK8oK9/12

vks
0

You can use this perl one-liner:

perl -lane 'if (/\bWarning\b/) { $F[1] =~ s/\W+//g; print $F[1] }' file
abc123
abc123
abc1234
abc1234
abc1234
bcd111
anubhava
0

Use this pattern with the g and s options:

OUT\s*:?([^:]+):\s*:\s*Warning(?!.*?\1\s*:\s*:\s*Warning)  

Demo
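For anyone who wants to try the lookahead pattern from Perl rather than in an online tester, here is a rough sketch. It assumes the whole log is held in one string; the sub name hosts_by_lookahead and the shortened sample log are made up for illustration. The negative lookahead rejects a match whenever the same host has another Warning further down, so each host is reported at its last Warning line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns each host once, using only the regex.
sub hosts_by_lookahead {
    my ($log) = @_;
    # /s lets "." cross newlines so the lookahead can scan the rest of
    # the log; /g in list context returns every capture.
    return $log =~ /OUT\s*:?([^:]+):\s*:\s*Warning(?!.*?\1\s*:\s*:\s*Warning)/gs;
}

# Shortened sample log (messages abbreviated for illustration).
my $log = <<'END';
OUT :abc123: : Warning: /var/tmp/abc123.fw old
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old
OUT bcd111: : Succeeded.
END

# Prints abc123, abc1234 and bcd111, one per line.
print "$_\n" for hosts_by_lookahead($log);
```

Two things to keep in mind: the capture includes any whitespace that sits between the host and the first colon, and the lookahead rescans the rest of the input for every candidate, so a hash is likely the better choice for large logs.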

alpha bravo
0

This is more of a supplement/complement to @choroba's response above since he nailed it with "when you hear 'unique' think 'hash'". You should accept @choroba's answer :-)

Here I simplified the regex part of your question into a call to grep in order to focus on uniqueness, changed the data in your file a bit (so it could fit here) and saved it as dups.log:

# dups.log 
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Succeeded.

This one-liner gives the output below:

perl -E '++$seen{$_} for grep{/Warning/} <>; print keys %seen' dups.log

OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)

This is pretty much the same output you'd get with uniq dups.log | grep Warning (uniq only collapses adjacent duplicates, but that is all this file contains). It works because perl creates a hash key from each line it reads (here from the file named on the command line, via <>) and increments that key's value (with ++$seen{$_}) each time it sees the key again. For perl, "same key" here means a duplicate line. Try printing values %seen, or using -MDDP and p %seen, to get a sense of what is going on.
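To make the counting visible without installing anything, here is a small sketch that prints each distinct Warning line together with its count. The sub name count_warning_lines and the three hard-coded lines are only for illustration; the counting expression is the same ++$seen{$_} as in the one-liner.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: maps each distinct Warning line to the number
# of times it occurs.
sub count_warning_lines {
    my %seen;
    ++$seen{$_} for grep { /Warning/ } @_;
    return %seen;
}

my @log = (
    "OUT abc1234: : Warning: / filesystem 100% full\n",
    "OUT abc1234: : Warning: / filesystem 100% full\n",
    "OUT bcd111: : Succeeded.\n",
);

my %seen = count_warning_lines(@log);

# Hash order is not defined, so sort the keys for stable output.
for my $line (sort keys %seen) {
    chomp(my $text = $line);
    print "$seen{$line}x $text\n";
}
# Prints: 2x OUT abc1234: : Warning: / filesystem 100% full
```

The duplicate line ends up as one key with the value 2, and the Succeeded line never enters the hash.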

To get your output, use @choroba's regex so that the capture (instead of the whole line) is added to the hash:

perl -nE '/:?(\S+?)[:\s]+Warning/ && ++$seen{$1} }{ say for keys %seen' dups.log


Just as with the whole-line method above, each captured host becomes a single hash key, and ++ simply increments its value on every further match, so in the end you get "unique" keys à la uniq in the %seen hash.

It's a neat perl trick you never forget :-)

References:

  • The SO question has some good explanations of the perl idiom for uniq using a hash as per @choroba.
  • This is touched on in perlfaq4 which describes the %seen{} hash trick.
  • Perlmaven shows how to make your own "home made" uniq using this approach.
  • ...
G. Cito