Regular expression to capture last occurrence of a pattern

regex perl

15,557

Solution 1

You could use this pattern:

(\w+)(?=\s*=)

(               # Capturing Group (1)
  \w            # <ASCII letter, digit or underscore>
  +             # (one or more)(greedy)
)               # End of Capturing Group (1)
(?=             # Look-Ahead
  \s            # <whitespace character>
  *             # (zero or more)(greedy)
  =             # "="
)               # End of Look-Ahead
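To see the pattern in action, here is a minimal sketch that runs it against the three sample lines from the question. The helper name `word_before_equals` is made up for illustration; it is not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return the word captured by (\w+)(?=\s*=),
# or undef when the line contains no such word.
sub word_before_equals {
    my ($line) = @_;
    return $line =~ /(\w+)(?=\s*=)/ ? $1 : undef;
}

for my $line ("abc def = ghi", "abc def ghi = jkl", "abc def ghi=jkl mno") {
    my $word = word_before_equals($line);
    print defined $word ? "$word\n" : "(no match)\n";
}
```

For these inputs it should print def, ghi and ghi. Note that if a line holds more than one =, the leftmost match wins.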

Solution 2

\b(\w+)\s*= would suffice for your examples. It matches a word, followed by optional whitespace, followed by =. The \b reduces backtracking.

\b(\w+)[^\w=]*= matches your "verbal expression" more precisely. For example, it will match abc in abc !@# = def.

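\b matches at a word boundary: between a \w and a \W, or between a \w and the start or end of the string.
\w matches a word character (letter, digit or underscore).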
\W matches a character that's not a word character.
\s matches a whitespace character.
[^\w=] matches a non-word character other than =.
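A small sketch comparing the two patterns may help. The sub names `strict_match` and `loose_match` are invented for this example, and the fourth test string is the abc !@# = def case mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: word followed only by optional whitespace, then "=".
sub strict_match {
    my ($line) = @_;
    return $line =~ /\b(\w+)\s*=/ ? $1 : undef;
}

# Hypothetical helper: word followed by any run of characters that are
# neither word characters nor "=", then "=".
sub loose_match {
    my ($line) = @_;
    return $line =~ /\b(\w+)[^\w=]*=/ ? $1 : undef;
}

for my $line ("abc def = ghi", "abc def ghi = jkl",
              "abc def ghi=jkl mno", "abc !@# = def") {
    my $strict = strict_match($line);
    my $loose  = loose_match($line);
    printf "%-22s strict=%-6s loose=%s\n",
        $line,
        defined $strict ? $strict : "(none)",
        defined $loose  ? $loose  : "(none)";
}
```

Both patterns agree on the three lines from the question. Only the second one finds abc in the last line, because the punctuation between the word and the = is not whitespace.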

Solution 3

Jack's answer is probably the best, but I can't wrap my head around how it works. I like breaking things down into smaller chunks.

use warnings;
use strict;

my @strings = ( "abc def = ghi",
                "abc def ghi = jkl",
                "abc def ghi=jkl mno"
                );
# Print the last word before the "=" in each string
foreach (@strings) {
    my $last = get_last($_);
    print "$last\n";
}

sub get_last {
    my $string = shift;

    # Keep only the text to the left of the "=".
    # Return nothing if the string has no "=" at all, so we never
    # split an undefined value.
    return unless $string =~ /(.*)=/;
    my $left_side = $1;

    # Split the left side on whitespace and store the words in an array
    my @left_side = split (/\s+/, $left_side);

    # Return the last element of that array
    return $left_side[-1];
}

Author by

Allen

Updated on August 02, 2022

Comments

Allen over 1 year
I tried several ways to match the last occurrence, but none of them work. Here is my case:
```
abc def = ghi
abc def ghi = jkl
abc def ghi=jkl mno
```
For the first line, my capture target is "def". For the second line it is "ghi", and for the third line it is also "ghi". The target can be described as "the last word before the equals sign".

What should the Perl regular expression look like?
ikegami about 9 years

The look-ahead's only function here is to slow down the matching.
Allen about 9 years

I was using a look-behind because I thought of it as looking back from the "=" sign to find the closest word, but the look-behind does not work for me.
Allen about 9 years

I thought [^\w=] meant "does not match a word character or the = sign".
ikegami about 9 years

@Allen, You can't use a lookbehind here because the length of what a lookbehind matches must be independent of the input. Even if it did work, using a lookbehind would just have slowed things down for nothing. I'm disappointed you accepted this subpar solution.
ikegami about 9 years

@Allen, You are correct that it doesn't do that, but you might as well have said "[^\w=] doesn't swim". Saying one thing it doesn't do is useless, so I said what it does do (matches a char that's not a word char and not =).
Admin about 9 years

Yeah, it is a bit compact... In essence, grabbing an indexed element directly from the result of split, i.e. split(/\s*=\s*/, $str)[0], is a syntax error. So you have to wrap the split in parentheses, like so: (split(/\s*=\s*/, $str))[0]. The rest of it is splitting that element on /\s+/ and grabbing the last element of the resulting array. Hopefully, that clears things up. Your writeup is good, too. :)
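
The answer this comment refers to is not shown on this page, so the following is only a sketch of the idiom as the comment describes it, spread over two steps for readability. The sub name `last_word_via_split` is made up, and the code assumes the input contains an "=" with at least one word before it.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the list-slice idiom from the comment.
# Assumes $str contains "=" with at least one word to its left.
sub last_word_via_split {
    my ($str) = @_;

    # The parentheses around split are required: they turn the result
    # into a list that can be sliced with [0].
    my $left = (split /\s*=\s*/, $str)[0];

    # Split the left-hand side on whitespace and take the last element.
    my $last = (split /\s+/, $left)[-1];
    return $last;
}

for my $str ("abc def = ghi", "abc def ghi = jkl", "abc def ghi=jkl mno") {
    print last_word_via_split($str), "\n";
}
```

For the three sample lines this should print def, ghi and ghi, the same result as the regex-only solutions.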