This is more of a supplement/complement to @choroba's response above since he nailed it with "when you hear 'unique' think 'hash'". You should accept @choroba's answer :-)
Here I simplified the regex part of your question into a call to grep in order to focus on uniqueness, changed the data in your file a bit (so it could fit here), and saved it as dups.log:
# dups.log
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Succeeded.
This one-liner gives the output below:
perl -E '++$seen{$_} for grep{/Warning/} <>; print keys %seen' dups.log
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
This is pretty much the same output you'd get with uniq dups.log | grep Warning, except that hash keys come back in no particular order. It works because perl creates a hash key from each line it reads (from the files named on the command line, or from STDIN if there are none), adding the key to the hash and incrementing its value (with ++$seen{$_}) each time it sees that key. For perl, "same key" here means a line that is a duplicate. Try printing values %seen, or using -MDDP and p %seen, to get a sense of what is going on.
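If you'd rather see the counts without installing DDP, here is a minimal sketch of the same idea as a small script. The helper name count_warnings and the abridged sample lines are just for illustration (the lines are taken from dups.log above); it is not the only way to write this.

```perl
use strict;
use warnings;

# Count how often each line matching /Warning/ occurs.
# count_warnings() is a hypothetical helper name, not part of the one-liner.
sub count_warnings {
    my %seen;
    ++$seen{$_} for grep { /Warning/ } @_;
    return %seen;
}

# A few lines from dups.log (abridged sample data).
my @lines = (
    "OUT abc1234: : Warning: / filesystem 100% full\n",
    "OUT abc1234: : Warning: / filesystem 100% full\n",
    "OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)\n",
    "OUT bcd111: : Succeeded.\n",
);

my %seen = count_warnings(@lines);

# Print each unique line prefixed by its count. The keys are sorted
# because hash key order is not predictable.
printf "%d x %s", $seen{$_}, $_ for sort keys %seen;
```

Each distinct Warning line ends up as exactly one key, and its value is the number of times that line was seen.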
To get your output @choroba's regex adds the capture (instead of the whole line) to the hash:
perl -nE '/:?(\S+?)[:\s]+Warning/ && ++$seen{$1} }{ say for keys %seen' dups.log
but, just as with the whole line method above, each captured match creates its key only once and every later match just increments it with ++, so in the end you get "unique" keys à la uniq in the %seen hash.
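Spelled out as a script, the capture version might look roughly like the sketch below. The regex is the one from the one-liner; the helper name unique_hosts is hypothetical, and the sample lines are an abridged copy of dups.log.

```perl
use strict;
use warnings;

# Collect unique host names from Warning lines, using the capture as hash key.
# unique_hosts() is a hypothetical helper name used only for this sketch.
sub unique_hosts {
    my %seen;
    for my $line (@_) {
        # Same pattern as the one-liner: optional leading colon, the host
        # name, then colons/whitespace, then the word Warning.
        ++$seen{$1} if $line =~ /:?(\S+?)[:\s]+Warning/;
    }
    # Sorted so the result is stable; hash key order is not predictable.
    return sort keys %seen;
}

my @lines = (
    "OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)",
    "OUT abc1234: : Warning: / filesystem 100% full",
    "OUT abc1234: : Warning: / filesystem 100% full",
    "OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)",
    "OUT bcd111: : Succeeded.",
);

print "$_\n" for unique_hosts(@lines);
```

Only lines where the pattern matches contribute a key, so the Succeeded line is skipped and each host shows up once no matter how many warnings it logged.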
It's a neat perl trick you never forget :-)
References:
- The SO question has some good explanations of the perl idiom for uniq using a hash as per @choroba.
- This is touched on in perlfaq4 which describes the %seen{} hash trick.
- Perlmaven shows how to make your own "home made" uniq using this approach.
- ...