Regular expression to capture last occurrence of a pattern
Solution 1
You could use this pattern:
(\w+)(?=\s*=)
( # Capturing Group (1)
\w # <ASCII letter, digit or underscore>
+ # (one or more)(greedy)
) # End of Capturing Group (1)
(?= # Look-Ahead
\s # <whitespace character>
* # (zero or more)(greedy)
= # "="
) # End of Look-Ahead
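As a minimal sketch of how this pattern could be applied in Perl, the helper below wraps it in a function. The name word_before_eq_lookahead is hypothetical and not part of the original answer; the sample strings are the ones from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): returns the word
# captured by the look-ahead pattern, or undef when nothing matches.
sub word_before_eq_lookahead {
    my ($string) = @_;
    return $string =~ /(\w+)(?=\s*=)/ ? $1 : undef;
}

for my $line ("abc def = ghi", "abc def ghi = jkl", "abc def ghi=jkl mno") {
    my $word = word_before_eq_lookahead($line);
    print defined $word ? "$word\n" : "(no match)\n";
}
# prints def, ghi, ghi (one per line)
```

Because the look-ahead does not consume the "=" sign, the overall match covers only the word itself, and the capture group holds the same text.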
Solution 2
\b(\w+)\s*=
would suffice for your examples. It matches a word, optionally followed by whitespace, followed immediately by =. The \b reduces backtracking.
\b(\w+)[^\w=]*=
matches your "verbal expression" more precisely. For example, it will match abc in abc !@# = def.
- \b matches between a \w and a \W.
- \w matches a word character.
- \W matches a character that's not a word character.
- \s matches a whitespace character.
- [^\w=] matches a non-word character other than =.
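To make the difference between the two patterns concrete, here is a small sketch that runs both against the abc !@# = def example. The helper names word_before_eq_strict and word_before_eq_loose are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns discussed above.
sub word_before_eq_strict {    # \b(\w+)\s*=  (only whitespace allowed before "=")
    my ($string) = @_;
    return $string =~ /\b(\w+)\s*=/ ? $1 : undef;
}

sub word_before_eq_loose {     # \b(\w+)[^\w=]*=  (any non-word chars before "=")
    my ($string) = @_;
    return $string =~ /\b(\w+)[^\w=]*=/ ? $1 : undef;
}

# Single quotes keep Perl from trying to interpolate "@" in the sample.
for my $line ('abc def = ghi', 'abc def ghi=jkl mno', 'abc !@# = def') {
    my $strict = word_before_eq_strict($line);
    my $loose  = word_before_eq_loose($line);
    printf "%-22s strict=%-10s loose=%s\n",
        $line,
        defined $strict ? $strict : '(no match)',
        defined $loose  ? $loose  : '(no match)';
}
```

On the first two strings both patterns capture the same word. On abc !@# = def only the second pattern matches, because the punctuation between the word and the "=" sign is not whitespace.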
Solution 3
Jack's answer is probably the best, but I can't wrap my head around how it works. I like breaking things down into smaller chunks.
use warnings;
use strict;

my @strings = (
    "abc def = ghi",
    "abc def ghi = jkl",
    "abc def ghi=jkl mno",
);

# print the last word before the equal sign for each string
foreach (@strings) {
    my $last = get_last($_);
    print "$last\n";
}

sub get_last {
    my $string = shift;

    # group things as left side or right side of the "=" sign
    my $left_side;
    my $right_side;
    if ($string =~ /(.*)=(.*)/) {
        $left_side  = $1;
        $right_side = $2;
    }
    # without an "=" sign there is nothing to split
    return unless defined $left_side;

    # split things according to whitespace and store in an array
    my @left_side = split (/\s+/, $left_side);

    # return the last element of that array
    return $left_side[-1];
}
Allen
Updated on August 02, 2022
Comments
-
Allen over 1 year
I tried several ways to match the last occurrence, but none of them work. Here is my case:
abc def = ghi
abc def ghi = jkl
abc def ghi=jkl mno
For the first line, my capture target is "def". For the second line, my capture target is "ghi", and for the third line, my capture target is "ghi". The target can be verbally expressed as "the last occurrence of a word before the equal sign".
What should the Perl regular expression look like?
-
ikegami about 9 years
The look-ahead's only function here is to slow down the matching.
-
Allen about 9 years
I was using a look-behind because I thought it would look back from the "=" sign to find the closest word, but the look-behind does not work for me.
-
Allen about 9 years
I thought [^\w=] meant "does not match a word character or the = sign".
-
ikegami about 9 years
@Allen, You can't use a lookbehind here because the length of what the lookbehind matches must be independent of the input. Even if it did work, using a lookbehind would just have slowed things down for nothing. I'm disappointed you accepted this subpar solution.
-
ikegami about 9 years
@Allen, You are correct that it doesn't do that, but you might as well have said "[^\w=] doesn't swim". Saying one thing it doesn't do is useless, so I said what it does do (matches a char that's not a word char and not =).
-
Admin about 9 years
Yeah, it is a bit compact... In essence, grabbing an indexed element directly from the result of split, i.e. split(/\s*=\s*/, $str)[0], is a syntax error. So, you have to wrap the split in parentheses, like so: (split(/\s*=\s*/, $str))[0]. The rest of it is splitting that element on /\s+/ and grabbing the last element of that resulting array. Hopefully, that clears things up. Your writeup is good, too. :)
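The compact answer this comment refers to is not reproduced on this page, so the following is only a sketch of the two-step split the comment describes. The function name last_word_via_split is hypothetical, and the intermediate variables are added here for readability.

```perl
use strict;
use warnings;

# Hypothetical helper reconstructing the approach described in the comment.
sub last_word_via_split {
    my ($str) = @_;

    # The outer parentheses turn the split result into a list that can be
    # sliced with [0]; split(...)[0] without them is a syntax error.
    my $left = (split(/\s*=\s*/, $str))[0];
    return undef unless defined $left;

    # Split the left-hand side on whitespace and take the last field.
    my $last = (split(/\s+/, $left))[-1];
    return $last;
}

print last_word_via_split($_), "\n"
    for "abc def = ghi", "abc def ghi = jkl", "abc def ghi=jkl mno";
# prints def, ghi, ghi (one per line)
```

Note that for a string with no "=" sign this sketch returns the last word of the whole string, since the first split then yields the string unchanged.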