Regular expression to capture last occurrence of a pattern

15,557

Solution 1

You could use this pattern:

(\w+)(?=\s*=)

Demo

(               # Capturing Group (1)
  \w            # <ASCII letter, digit or underscore>
  +             # (one or more)(greedy)
)               # End of Capturing Group (1)
(?=             # Look-Ahead
  \s            # <whitespace character>
  *             # (zero or more)(greedy)
  =             # "="
)               # End of Look-Ahead
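
As a minimal sketch, here is the pattern applied to the three sample lines from the question. The sub name word_before_equals is hypothetical, and it returns only the first match; a line containing several = signs would yield several matches if the pattern were applied with /g.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first word that is followed by
# optional whitespace and "=", or undef if there is no such word.
sub word_before_equals {
    my ($string) = @_;
    return $string =~ /(\w+)(?=\s*=)/ ? $1 : undef;
}

for my $string ("abc def = ghi", "abc def ghi = jkl", "abc def ghi=jkl mno") {
    my $word = word_before_equals($string);
    print defined $word ? "$word\n" : "(no match)\n";
}
# prints def, ghi, ghi (one per line)
```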

Solution 2

\b(\w+)\s*= would suffice for your examples. It matches a word, then optional whitespace, then =. The leading \b reduces backtracking, since a match can then only start at the beginning of a word.

\b(\w+)[^\w=]*= matches your "verbal expression" more precisely. For example, it will match abc in abc !@# = def.

  • \b matches between a \w and a \W, or between a \w and the start or end of the string.
  • \w matches a word character.
  • \W matches a character that's not a word character.
  • \s matches a whitespace character.
  • [^\w=] matches a non-word character other than =.
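
To make the difference between the two patterns concrete, here is a small sketch that runs both against one of the question's sample lines and against the abc !@# = def example. The sub names strict_match and loose_match are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: word, optional whitespace, then "=".
sub strict_match {
    my ($string) = @_;
    return $string =~ /\b(\w+)\s*=/ ? $1 : undef;
}

# Hypothetical helper: word, any run of non-word characters
# other than "=", then "=".
sub loose_match {
    my ($string) = @_;
    return $string =~ /\b(\w+)[^\w=]*=/ ? $1 : undef;
}

# Single quotes avoid any interpolation of the "@".
for my $string ('abc def ghi = jkl', 'abc !@# = def') {
    my $strict = strict_match($string);
    my $loose  = loose_match($string);
    printf "%-20s strict=%s loose=%s\n",
        $string,
        defined $strict ? $strict : '(no match)',
        defined $loose  ? $loose  : '(no match)';
}
```

On the second string only the [^\w=]* version finds abc, because \s* cannot skip over the punctuation between the word and the = sign.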

Solution 3

Jack's answer is probably the best, but I can't wrap my head around how it works. I like breaking things down into smaller chunks.

use warnings;
use strict;

my @strings = ( "abc def = ghi",
                "abc def ghi = jkl",
                "abc def ghi=jkl mno"
                );
# print the last word before "=" for each sample string
foreach (@strings) {
    my $last = get_last($_);
    print "$last\n";
}

sub get_last {
    my $string = shift;

    # capture everything to the left of the last "=" (the .* is greedy);
    # bail out if the string contains no "=" at all
    return unless $string =~ /(.*)=/;
    my $left_side = $1;

    # split the left side on whitespace and store the words in an array
    my @left_side = split (/\s+/, $left_side);

    # return the last element of that array
    return $left_side[-1];
}
Author by

Allen

Updated on August 02, 2022

Comments

  • Allen
    Allen over 1 year

    I tried several ways for last occurrence, but they are not working. The following is my case,

    abc def = ghi
    abc def ghi = jkl
    abc def ghi=jkl mno
    

    For the first line, my capture target is "def". For the second line, my capture target is "ghi", and for the 3rd line, my capture target is "ghi". The target can be verbally expressed as "the last occurrence of word before equal sign".

    What should the Perl regular expression look like?

  • ikegami
    ikegami about 9 years
    The look-ahead's only function here is to slow down the matching.
  • Allen
    Allen about 9 years
    I was using look-behind because I thought of it as looking back from the "=" sign to find the closest word, but the look-behind does not work for me.
  • Allen
    Allen about 9 years
    I thought [^\w=] means it does not match a word character or the "=" sign.
  • ikegami
    ikegami about 9 years
    @Allen, You can't use a lookbehind here because the length of what a lookbehind matches must be independent of the input. Even if it did work, using a lookbehind would just have slowed things down for nothing. I'm disappointed you accepted this subpar solution.
  • ikegami
    ikegami about 9 years
    @Allen, You are correct that it doesn't do that, but you might as well have said "[^\w=] doesn't swim". Saying one thing it doesn't do is useless, so I said what it does do (matches a char that's not a word char and not =).
  • Admin
    Admin about 9 years
    Yeah, it is a bit compact... In essence, grabbing an indexed element directly from a split array, i.e. split(/\s*=\s*/, $str)[0], is a syntax error. So you have to wrap the split in parentheses, like so: (split(/\s*=\s*/, $str))[0]. The rest of it is splitting that element on /\s+/ and grabbing the last element of the resulting array. Hopefully, that clears things up. Your writeup is good, too. :)
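
For reference, the two split steps described in the last comment could be put together roughly as follows. This is only a sketch: the sub name last_word_before_equals is hypothetical, and the original one-liner it paraphrases is not shown in this thread. Note that element [0] is the text before the first =, and that a string with no = at all is returned by split as a single element, so its last word would be returned.

```perl
use strict;
use warnings;

# Hypothetical helper built from the two split steps in the comment.
sub last_word_before_equals {
    my ($str) = @_;

    # The parentheses around split(...) are required before the [0] subscript.
    my $left = (split(/\s*=\s*/, $str))[0];
    return unless defined $left;

    # Split the left-hand part on whitespace and take the last word.
    return (split(/\s+/, $left))[-1];
}

for my $str ("abc def = ghi", "abc def ghi = jkl", "abc def ghi=jkl mno") {
    print last_word_before_equals($str), "\n";
}
# prints def, ghi, ghi (one per line)
```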