I need to parse a number of single-line strings that all follow a similar format:

This is an example instance [link[filename1|path_to_file1]] that I need to parse [link[filename2|path_to_file2]]. Here is another [link[filename3|path_to_file3]] and so on [link[filename4|path_to_file4]]

I want to capture the text before and after each | symbol.

I have the following regex:

while ( my @row = $sth->fetchrow_array() ) {
    my $line = $row[4];
    if ($line =~ /\[link\[(.*?)\|(.*?)\]?\]/g) {
        print "MATCH1: $1\n";
        print "MATCH2: $2\n";
    }
}

Unfortunately, this only returns the first instance (filename1, path_to_file1). How can I change it so that all instances are captured?

I would like the following output:

MATCH1:filename1
MATCH2:path_to_file1
MATCH1:filename2
MATCH2:path_to_file2
MATCH1:filename3
MATCH2:path_to_file3
MATCH1:filename4
MATCH2:path_to_file4
Steve
  • You need to be more clear in your question regarding the final _output_ that you want to see. I wanted to help you here but didn't really understand what you wanted. – Jonathan Benn Jun 30 '16 at 18:25
  • Ok, thank you. And what's the content of `$line`? Is it _one_ set of `"[link[filename1|path_to_file1]]"` or _two_ (or more) sets, like `"[link[filename1|path_to_file1]] … [link[filename2|path_to_file2]]"`? – PerlDuck Jun 30 '16 at 18:27
  • @JonathanBenn updated – Steve Jun 30 '16 at 18:33
  • Maybe you need a `while` rather than `if`? – Wiktor Stribiżew Jun 30 '16 at 18:34
  • @PerlDog `$line` is the example text at the beginning, so multiple sets in a single line. – Steve Jun 30 '16 at 18:34
  • @WiktorStribiżew Exactly! Just tried it out. So: `while($line =~ /\[link\[(.*?)\|(.*?)\]?\]/g){ ... }` – PerlDuck Jun 30 '16 at 18:35
  • @PerlDog you should probably make your comment into a full answer :) – Jonathan Benn Jun 30 '16 at 18:37
  • Or close as a dupe of http://stackoverflow.com/questions/6374783/matching-a-regular-expression-multiple-times-with-perl, or http://stackoverflow.com/questions/11208924/regex-match-all-occurrences – Wiktor Stribiżew Jun 30 '16 at 18:40
  • @JonathanBenn Thank you for the suggestion but Wiktor spotted it first, the OP has an answer, and the questions Wiktor refers to are indeed dupes of this one (or the other way round, for that matter). – PerlDuck Jun 30 '16 at 18:48
  • @WiktorStribiżew I just got reminded of the value of some rules here. I spent a few minutes reading through the question, and then through the comments to see whether somebody answered it. So, let's either post the answer or indeed mark it as a duplicate -- so that people don't spend time. (Not that my few minutes are so precious :) – zdim Jun 30 '16 at 19:05
  • @WiktorStribiżew Now that I reread my comment, I am not sure whether it sounds like criticism. I meant to say it'd be good to mark it as a duplicate (or that you post the answer). – zdim Jun 30 '16 at 19:27

1 Answer

Use the regex match in list context. With the /g modifier, it returns the captured groups from every match as one flat list:

while ( my @row = $sth->fetchrow_array() ) {
    my $line = $row[4];
    # List context + /g: @captures is (name1, path1, name2, path2, ...)
    my @captures = $line =~ /\[link\[(.*?)\|(.*?)\]?\]/g;
    # Consume the flat list two elements at a time
    while (@captures) {
        my $filename = shift @captures;
        my $path = shift @captures;
        print "MATCH1: $filename\n";
        print "MATCH2: $path\n";
    }
}
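
For comparison, the `while` approach suggested in the comments can be sketched as a small standalone script. This is only an illustration: `extract_links` is a hypothetical helper name, not something from the original code, and the sample line is a shortened copy of the one in the question rather than a database row.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a list of
# [filename, path] pairs found in a single line.
sub extract_links {
    my ($line) = @_;
    my @pairs;
    # In scalar context, /g resumes where the previous match ended,
    # so the loop body runs once per [link[...|...]] occurrence.
    while ( $line =~ /\[link\[(.*?)\|(.*?)\]?\]/g ) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

# Sample line shortened from the question.
my $line = 'This is an example instance [link[filename1|path_to_file1]] '
         . 'that I need to parse [link[filename2|path_to_file2]].';

for my $pair ( extract_links($line) ) {
    print "MATCH1: $pair->[0]\n";
    print "MATCH2: $pair->[1]\n";
}
```

Both variants use the same pattern. The `while` form handles one match at a time through `$1` and `$2`, while the list-context form collects everything first, which may be preferable if the captures are needed later.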