1

I asked in another topic about matching numbers like 123. That was too narrow, and as I get deeper into regex I see that you really have to define everything. So I asked about exponential notation and got this answer: /^keyword\s+(-?(?:\d+|\d*\.\d*)(?:[Ee]-?(?:\d+|\d*\.\d*))?)/. I have tried to understand it but have failed so far.

So I will be more specific now. I need to match numbers such as these:

13
-999
83.12300
.151
-.213
1e14
124e2
-9e-4

You get the idea: the usual math notation.

To be even more specific, here is my Perl code. I am searching for keyword at the start of a line and need to extract a value from that line. I'd like to get this value with a single regex because my workaround with the or-operator || seems to cause problems.

my $value;
open(FILE,"data.dat") or die "error on opening data: $!\n";
while (my $line = <FILE>) {
        if (($line =~ /^keyword\s+(-?(?:\d+|\d*\.\d*)(?:[Ee]-?(?:\d+|\d*\.\d*))?)/x) || ($line =~ /^keyword\s*(\d*\.\d*)/)) {
                $value = $1;
        };
}
close(FILE);

Edit

Thx to all for the hints so far.

EverythingRightPlace

4 Answers

2

Go to CPAN and get Regexp::Common.

Use it like this:

use Regexp::Common;

my $re = $RE{num}{real};

if ( $line =~ /^keyword\s+($re)/ ) {
  $value = $1;
}

Much easier than do-it-yourself regular expression rolling.

Borodin
xcramps
  • Thanks for this hint too, but I really want to understand this. Regex seems to be extremely powerful; I just need to get my head around it. – EverythingRightPlace Jul 31 '13 at 11:39
  • @Borodin Thanks for your opinion, but you can believe me that I have read something. Of course I am not as experienced as others. I have an example from a tutorial which states `/\d\d:\d\d/`, which should match something like `12:15`. That is just to show my assumption about the colon. Sorry that I didn't have the time to read hours and hours of tutorials; I plan to catch up on this later. But at the moment I have a specific problem and am trying to solve it. Thank you. – EverythingRightPlace Jul 31 '13 at 11:57
  • @bashophil: For now, you should solve your *"specific problem"* by taking xcramps' advice and using `Regexp::Common`. I have improved his answer to include sample code. – Borodin Jul 31 '13 at 12:17
1

The second regex in your code seems to be redundant, so you can safely remove it. The first regex should match all your test cases. Is there anything it doesn't seem to work with?

You should also tighten your regex, because it currently accepts -.e-. as a number. This is because \d*\.\d* also matches a lone decimal point. You could use (?:\d+(?:\.\d*)?|\.\d+) instead, which matches either 1) digits, 2) digits followed by a decimal point and possibly more digits, or 3) a decimal point followed by digits.
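The suggested mantissa can be checked against the examples from the question with a short script. This is only a sketch: the helper name is_number is made up for illustration, and the exponent is assumed to be an optionally negative integer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mantissa as suggested above: digits with an optional fraction,
# or a decimal point that must be followed by at least one digit.
my $mantissa = qr/(?:\d+(?:\.\d*)?|\.\d+)/;

# Assumption: the exponent is an optionally negative integer.
my $number = qr/-?$mantissa(?:[Ee]-?\d+)?/;

# Hypothetical helper: true if the whole string is a number.
sub is_number {
    my ($text) = @_;
    return $text =~ /\A$number\z/ ? 1 : 0;
}

for my $candidate (qw(13 -999 83.12300 .151 -.213 1e14 124e2 -9e-4 -.e-. .)) {
    printf "%-10s %s\n", $candidate,
        is_number($candidate) ? 'number' : 'not a number';
}
```

All eight examples from the question are reported as numbers, while -.e-. and a lone decimal point are rejected.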

Dan
  • Thx Dan. The first regex (in or-statement) works for exp notation. But it doesn't work for e.g. `12.124`. So I tried to add the second regex which works for this number if I use it alone. But in the or-statement it isn't working. – EverythingRightPlace Jul 31 '13 at 11:42
  • @bashophil: You are getting very confused. Your first regex matches `12.124` just fine, except for the `keyword` at the beginning that you have mysteriously failed to explain. – Borodin Jul 31 '13 at 11:49
  • @Borodin Please read my whole post. I search for a keyword and try to get a number (no matter which format) from this line. – EverythingRightPlace Jul 31 '13 at 11:54
  • @Borodin I checked it again: `if ($line =~ /^keyword\s+(-? (?:\d+|\d*\.\d*) (?:[Ee]-? (?:\d+|\d*\.\d*) ) ?)/x) {` gets the exp notation. With `if ($line =~ /^keyword\s*(\d*\.\d*)/) {` I am able to get something like `12.124` but if I combine the two statements with the or-statement it isn't working for this floating point number. – EverythingRightPlace Jul 31 '13 at 12:06
  • Is there space between the `keyword` and `12.124` in your test string? One regex uses `\s+` and the other uses `\s*`. – Dan Jul 31 '13 at 12:16
  • Also, and I'm not sure about this because I'm out of practice with Perl, but at the end of your first regex, `) ) ?)/x`, are you sure it doesn't have to be `) )? )/x`? Can you even have whitespace between a quantifier and the thing it's associated with? – Dan Jul 31 '13 at 12:17
  • @Dan: The difference between `\s+` and `\s*` is no problem; there is always at least one space on the line. The `(\d+)|(\d*\.\d*)` (I added the brackets because otherwise it makes no sense to me; we want to match integers too) seems to be happy with the integer part and matches just that. So it's not very greedy in this case. It doesn't match things like the `12.124` example. By the way, the spaces in the regex are no problem at all if you use the `/x` option. Greetings – EverythingRightPlace Aug 08 '13 at 08:51
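The behaviour described in these comments can be reproduced with a small sketch. In an alternation Perl takes the first alternative that lets the overall match succeed, so with `\d+|\d*\.\d*` the integer branch wins on `12.124` and the capture stops at `12`. Because that first match succeeds, the second regex after `||` is never tried. The helper name capture_with and the reordered pattern are illustrative assumptions, not code from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the question: the integer branch \d+ comes first.
my $original = qr/^keyword\s+(-?(?:\d+|\d*\.\d*)(?:[Ee]-?(?:\d+|\d*\.\d*))?)/;

# A variant in which the fraction is an optional tail of the integer part.
my $reordered = qr/^keyword\s+(-?(?:\d+(?:\.\d*)?|\.\d+)(?:[Ee]-?\d+)?)/;

# Hypothetical helper: returns the captured number, or undef on no match.
sub capture_with {
    my ($regex, $line) = @_;
    return $line =~ $regex ? $1 : undef;
}

my $line = 'keyword 12.124';
print "original:  ", capture_with($original,  $line), "\n";   # prints 12
print "reordered: ", capture_with($reordered, $line), "\n";   # prints 12.124
```

Swapping the alternatives, or making the fraction optional as in the second pattern, lets the capture cover the whole number.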
1

There is another way to do this, and you don't need regular expressions for it. You can use looks_like_number from Scalar::Util.

Here's an example from "How do I tell if a variable has a numeric value in Perl?", pasted here for you.


Example:

#!/usr/local/bin/perl

use warnings;
use strict;

use Scalar::Util qw(looks_like_number);

my @exprs = qw(1 5.25 0.001 1.3e8 foo bar 1dd);

foreach my $expr (@exprs) {
    print "$expr is", looks_like_number($expr) ? '' : ' not', " a number\n";
}

Gives this output:

1 is a number
5.25 is a number
0.001 is a number
1.3e8 is a number
foo is not a number
bar is not a number
1dd is not a number

Edit, in response to @Borodin's comment:

You could use it like this:

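use Scalar::Util qw(looks_like_number);

my $value;
open(FILE,"data.dat") or die "error on opening data: $!\n";
while (my $line = <FILE>) {
        # capture everything after the keyword, then validate it
        if (($line =~ /^keyword +(.*)/)) {
             my $number = $1;
             if ( looks_like_number($number) ) { 
                 $value = $number;
             }
        };
}
close(FILE);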

Edit: if you have to have a regex, you can use an expression like this:

 #!/bin/perl
 use strict;
 use warnings;

 my @numbers = ( 'keyword 13',
                 ' word   25',
                 'keyword -999',
                 'keyword 83.12300',
                 'keyword  .151',
                 'keyword -.213',
                 'keyword 1e14',
                 'keyword 124e2',
                 'keyword -9e-4 ',
                 ' keyword  e43e',
                 'keyword 4.5.6',
                 'keyword 4..e',
                 'keyword NaN',
                 'keyword Inf');

 for (@numbers) {

      if ( /^keyword +(-?((\d+\.?\d*)|(\d*\.?\d+))([Ee]-?\d+)?)/ ) {

         print "$1 is a number\n";

     } else {
         print "$_ does not match keyword or is not a number\n";
     }

 }
hmatt1
  • That won't pull a number out of the middle of a string, which is what seems to be required. – Borodin Jul 31 '13 at 12:27
  • @borodin, see edit. I guess my point is that we shouldn't need to make a regex to match a number since that functionality already exists in the core. – hmatt1 Jul 31 '13 at 12:36
  • But now you are relying on the number being the *whole* of the rest of the string after the `keyword`. Suppose you want to find the numbers in `keyword 3.14+.67+7e4`. All `looks_like_number` will do is, given a string, tell you whether or not the *whole thing* looks like a number. – Borodin Jul 31 '13 at 12:42
  • You could match `/^keyword (.*?)\+(.*?)\+(.*)/` and validate each number if that's what you expect. You could change `.*` to `\S*` if you're worried about whitespace. You still need a regular expression, but I think it makes your code more readable to check whether something is a number with the subroutine than with a long regular expression. – hmatt1 Jul 31 '13 at 12:50
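The approach discussed in these comments, a simple regex to isolate the text after the keyword plus looks_like_number to validate each piece, might look like the following sketch. The helper name numbers_after_keyword and the choice of `+` as the separator are assumptions taken from the `keyword 3.14+.67+7e4` example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Scalar::Util qw(looks_like_number);

# Hypothetical helper: returns the '+'-separated fields after the
# keyword that look like numbers, or an empty list if nothing matches.
sub numbers_after_keyword {
    my ($line) = @_;
    my ($rest) = $line =~ /^keyword\s+(\S+)/ or return;
    return grep { looks_like_number($_) } split /\+/, $rest;
}

my @found = numbers_after_keyword('keyword 3.14+.67+7e4');
print "@found\n";   # prints 3.14 .67 7e4
```

Note that looks_like_number follows Perl's own idea of a number, so it may accept strings such as Inf or NaN that the hand-written patterns in this thread reject.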
0

Thanks to your informative posts and the material I have read over the last few days, I was able to understand more of the regex structure. For this rather simple task I don't want to use additional modules/packages and want to stick to a regex. I made some tests and changes to remove redundancy and adapt the pattern to my task: there will never be several numbers on one line, there can be whitespace around the keyword, and the end of a number is marked by a semicolon. To summarize, here is my final code. Thanks to all for the help.

#!/usr/bin/perl

use strict;
use warnings;

my @numbers=(
"keyword 152;",
"keyword 12.23;",
"keyword -2.001;",
"keyword .123;",
"keyword -12.;",
"keyword 55.44.33;",
"keyword 3e14;",
"keyword -3.000e0014;",
"keyword 5e-04;",
"   keyword     5e-04;  ",
"keyword 5e-04  ;",
"keyword .1e2;",
"keyword 9.e3;",
"keyword -0.01E-03;",
"keyword 1.3e-03;",
"keyword 1dd;",
"keyword -12E3e1;",
"keyword -.e.;",
"keyword -.e-.;");

for (@numbers) {

if (    /\s* keyword \s+        # stuff before matched number
    ( -?            # optional minus sign
      (?:           # non-capturing group for the mantissa
        (?:\d+\.?\d*)       # leading digits, optional point, optional further digits
        |           # or
        (?:\.\d+)       # no leading digit: point followed by mandatory digits
      )
    (?:[Ee]-?\d+)?      # optional exponential notation
    )           # end of group to be captured
    ;\s*            # terminating semicolon and trailing whitespace
    /x) {

print "<<__$_\__>>\n\t $1 \n";
} else { 
print "<<__$_\__>>\n\t !!!!! no matching here !!!!!\n";
}
}

Output:

<<__keyword 152;__>>
     152 
<<__keyword 12.23;__>>
     12.23 
<<__keyword -2.001;__>>
     -2.001 
<<__keyword .123;__>>
     .123 
<<__keyword -12.;__>>
     -12. 
<<__keyword 55.44.33;__>>
     !!!!! no matching here !!!!!
<<__keyword 3e14;__>>
     3e14 
<<__keyword -3.000e0014;__>>
     -3.000e0014 
<<__keyword 5e-04;__>>
     5e-04 
<<__    keyword     5e-04;  __>>
     5e-04 
<<__keyword 5e-04   ;__>>
     !!!!! no matching here !!!!!
<<__keyword .1e2;__>>
     .1e2 
<<__keyword 9.e3;__>>
     9.e3 
<<__keyword -0.01E-03;__>>
     -0.01E-03 
<<__keyword 1.3e-03;__>>
     1.3e-03 
<<__keyword 1dd;__>>
     !!!!! no matching here !!!!!
<<__keyword -12E3e1;__>>
     !!!!! no matching here !!!!!
<<__keyword -.e.;__>>
     !!!!! no matching here !!!!!
<<__keyword -.e-.;__>>
     !!!!! no matching here !!!!!
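As a sketch of how this pattern could be plugged back into the file-reading loop from the question, the number part can be compiled once with qr// and interpolated into the line match. The helper name last_keyword_value, the anchoring with ^, and the in-memory string standing in for data.dat are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The number pattern from the code above, compiled once for reuse.
my $number = qr/
    -?                          # optional minus sign
    (?: \d+ \.? \d* | \. \d+ )  # mantissa, with or without leading digits
    (?: [Ee] -? \d+ )?          # optional exponent
/x;

# Hypothetical helper: returns the value of the last matching keyword line.
sub last_keyword_value {
    my ($fh) = @_;
    my $value;
    while (my $line = <$fh>) {
        $value = $1 if $line =~ /^\s*keyword\s+($number);/;
    }
    return $value;
}

# An in-memory file stands in for data.dat here.
my $data = "other 1;\nkeyword -3.000e0014;\nkeyword 1dd;\n";
open(my $fh, '<', \$data) or die "cannot open in-memory data: $!\n";
print last_keyword_value($fh), "\n";   # prints -3.000e0014
close($fh);
```

The compiled object keeps its own /x flag, so the outer regex does not need it.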

PS: I have read that the ?: may not actually save resources at run time, and it makes the regex harder to read, so one might leave it out.
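Whatever the effect on resources, the visible difference is in which groups get captured. The following sketch compares the two forms; the helper name captures is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern twice: once with (?:...) and once with plain (...) groups.
my $non_capturing = qr/(-?(?:\d+\.?\d*|\.\d+)(?:[Ee]-?\d+)?);/;
my $capturing     = qr/(-?(\d+\.?\d*|\.\d+)([Ee]-?\d+)?);/;

# Hypothetical helper: returns the list of captured groups.
sub captures {
    my ($regex, $text) = @_;
    my @groups = $text =~ $regex;
    return @groups;
}

for my $regex ($non_capturing, $capturing) {
    my @groups = captures($regex, 'keyword -0.01E-03;');
    print scalar(@groups), " group(s): @groups\n";
}
```

With the non-capturing form only $1 is set, to -0.01E-03. With plain brackets $2 and $3 are also filled, here with 0.01 and E-03. Leaving out ?: is therefore harmless as long as the code only reads $1.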

EverythingRightPlace