What does this Perl regex mean: m/(.*?):(.*?)$/g?


Solution 1

Break it up into parts:

$lines =~ m/ (.*?)      # Match any character (except newlines)
                        # zero or more times, not greedily, and
                        # stick the results in $1.
             :          # Match a colon.
             (.*?)      # Match any character (except newlines)
                        # zero or more times, not greedily, and
                        # stick the results in $2.
             $          # Match the end of the string, or just
                        # before a trailing newline.
           /gx;

So this will match strings like ":" (zero characters, then a colon, then zero characters before the end of the line; $1 and $2 are both empty strings), "abc:" ($1 = "abc", $2 is an empty string), or "abc:def:ghi" ($1 = "abc" and $2 = "def:ghi").

If you pass in a line that doesn't match (for a single line of text, that means a line that does not contain a colon), the code inside the braces of the if block is skipped. If it does match, the code inside the braces can use the special $1 and $2 variables (at least until the next successful regular expression match, if there is one inside the braces).
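
To check the cases described above, here is a minimal sketch. The helper name split_on_first_colon and the sample strings are made up for illustration, and the /g modifier is left out because a single match is all that is needed here.

```perl
use strict;
use warnings;

# Hypothetical helper: in list context, a match without /g returns the
# captured groups ($1, $2), or an empty list if the pattern fails.
sub split_on_first_colon {
    my ($line) = @_;
    my @captures = $line =~ m/(.*?):(.*?)$/;
    return @captures;
}

# Sample inputs mirroring the cases discussed in this answer.
for my $line (':', 'abc:', 'abc:def:ghi', 'no colon here') {
    my @parts = split_on_first_colon($line);
    if (@parts) {
        printf "'%s' matches: \$1 = '%s', \$2 = '%s'\n", $line, @parts;
    }
    else {
        printf "'%s' does not match\n", $line;
    }
}
```

The first three inputs match and the last one does not, because it contains no colon.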

Solution 2

There is a tool to help understand regexes: YAPE::Regex::Explain.

Ignoring the g modifier, which is not needed here:

use strict;
use warnings;
use YAPE::Regex::Explain;

my $re = qr/(.*?):(.*?)$/;
print YAPE::Regex::Explain->new($re)->explain();

__END__

The regular expression:

(?-imsx:(.*?):(.*?)$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

See also perldoc perlre.

Solution 3

It was written by someone who either knows too much about regular expressions or not enough about the $' and $` variables.

This could have been written as

if ($lines =~ /:/) {
    ... # use $` ($PREMATCH)  instead of $1
    ... # use $' ($POSTMATCH) instead of $2
}

or

if ( ($var1,$var2) = split /:/, $lines, 2 and defined($var2) ) {
    ... # use $var1, $var2 instead of $1,$2
}
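
As a rough, self-contained sketch of the split variant: the helper name split_pair and the sample input are hypothetical. One difference from the original regex is that $ matches just before a trailing newline, so $2 never includes it, whereas split keeps the newline in the second field. The sketch therefore calls chomp first.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (before, after) split at the first colon,
# or an empty list when the line contains no colon.
sub split_pair {
    my ($line) = @_;
    chomp $line;                                   # drop a trailing newline, as $ would ignore it
    my ($before, $after) = split /:/, $line, 2;    # limit 2: split at the first colon only
    return defined $after ? ($before, $after) : ();
}

if ( my ($var1, $var2) = split_pair("abc:def:ghi\n") ) {
    print "var1 = '$var1', var2 = '$var2'\n";
}
```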

Solution 4

(.*?) captures any characters, but as few of them as possible.

So it looks for patterns like <something>:<somethingelse><end of line>, and if there are multiple : in the string, the first one will be used as the divider between <something> and <somethingelse>.
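
To see why the non-greedy quantifier picks the first colon, it can help to compare it with the greedy form. This is only an illustrative sketch; the sample string is made up, and the greedy pattern is not part of the original code.

```perl
use strict;
use warnings;

my $line = 'abc:def:ghi';   # hypothetical input with two colons

# In list context each match returns its two captured groups.
my @lazy   = $line =~ m/(.*?):(.*?)$/;   # first group stops at the first colon
my @greedy = $line =~ m/(.*):(.*)$/;     # first group runs up to the last colon

print "lazy:   '$lazy[0]' | '$lazy[1]'\n";
print "greedy: '$greedy[0]' | '$greedy[1]'\n";
```

The non-greedy version yields "abc" and "def:ghi", while the greedy one yields "abc:def" and "ghi".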

Solution 5

That line says to perform a regular expression match on $lines with the regex m/(.*?):(.*?)$/g. It will effectively return true if a match can be found in $lines and false if one cannot be found.

An explanation of the =~ operator:

Binary "=~" binds a scalar expression to a pattern match. Certain operations search or modify the string $_ by default. This operator makes that kind of operation work on some other string. The right argument is a search pattern, substitution, or transliteration. The left argument is what is supposed to be searched, substituted, or transliterated instead of the default $_. When used in scalar context, the return value generally indicates the success of the operation.

The regex itself is:

m/    #Perform a "match" operation
(.*?) #Match zero or more repetitions of any character except newline, but match as few as possible (ungreedy)
:     #Match a literal colon character
(.*?) #Match zero or more repetitions of any character except newline, but match as few as possible (ungreedy)
$     #Match the end of the string (or just before a trailing newline)
/g    #Match globally; in the scalar context of an if, each evaluation finds only the next match in $lines

So if $lines matches against that regex, it will go into the conditional portion, otherwise it will be false and will skip it.
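
One side effect of /g in a condition is worth seeing in isolation: in scalar context the match resumes from pos($lines), the position where the previous successful match ended. The following sketch uses a made-up sample string and repeats the same test on an unchanged variable. In the original code $lines is presumably reassigned for each line, which resets that position, so this is an illustration rather than a claim about a bug there.

```perl
use strict;
use warnings;

my $lines = 'key:value';   # hypothetical sample line

# Each scalar-context //g match starts at pos($lines).
my $first  = ($lines =~ m/(.*?):(.*?)$/g) ? 1 : 0;   # starts at offset 0 and matches
my $second = ($lines =~ m/(.*?):(.*?)$/g) ? 1 : 0;   # resumes at the end: no colon left
my $third  = ($lines =~ m/(.*?):(.*?)$/g) ? 1 : 0;   # the failure reset pos(), so it matches again

print "first=$first second=$second third=$third\n";
```

This prints first=1 second=0 third=1, which is why dropping /g is the safer choice when only a yes/no answer is wanted.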

Author by

Admin

Updated on June 11, 2022

Comments

  • Admin
    Admin almost 2 years

    I am editing a Perl file, but I don't understand this regexp comparison. Can someone please explain it to me?

    if ($lines =~ m/(.*?):(.*?)$/g) { } .. 
    

    What happens here? $lines is a line from a text file.

  • brian d foy
    brian d foy over 13 years
    If you want to use /:/, use the /p flag and the ${^PREMATCH} and ${^POSTMATCH} variables from Perl 5.10. I'd prefer split, though, since that's what's actually happening.