Perl one-liner to extract a multi-line pattern
Solution 1
The regex does not match even the single line. What do you think the double parentheses do?
You probably wanted
m/^\s*(\w+)\s+(\w+?)\s*\([\w0-9,*\s]+\)\s{/gm
Update: The specification has changed. The regex has (almost) not, but you have to change the code slightly:
perl -0777 -nle 'print "$1\n" while m/^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{)/gm'
Another update:
Explanation:
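- The switches are described in perlrun: -0 (zero; -0777 reads the whole file as a single record), -n (loop over the input without printing automatically), -l (automatic line-ending handling), -e (code given on the command line)
- The regex can be auto-explained by YAPE::Regex::Explain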
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{)/)->explain'
The regular expression:

(?-imsx:^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \w+?                     word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the least amount
                             possible))
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \(                       '('
----------------------------------------------------------------------
    [\w0-9,*\s]+             any character of: word characters (a-z,
                             A-Z, 0-9, _), '0' to '9', ',', '*',
                             whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \)                       ')'
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    {                        '{'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
- The /gm switches are explained in perlre
Solution 2
Use the Flip-Flop Operator for a One-Liner
Perl makes this really easy with the flip-flop operator, which will allow you to print out all the lines between two regular expressions. For example:
$ perl -ne 'print if /^abcd25/ ... /\bhj \) {/' /tmp/foo
abcd25
ef_gh
( fg*_h
hj_b*
hj ) {
However, a simple one-liner like this can't reject a range based on what lies between the delimiting patterns: it prints every block that starts and ends with the right lines, wanted or not. That calls for a more complex approach.
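To illustrate the limitation, here is a sketch run against an assumed sample file (the contents are modeled on the question, and the temp file is an assumption). The second block lacks the opening parenthesis, yet the flip-flop prints it anyway because only the start and end lines are tested. The brace is escaped as \{ for the benefit of newer Perl versions.

```shell
# Assumed sample input: wanted block, unwanted block (no "("), noise, single-line hit.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
abcd25
ef_gh
( fg*_h
hj_b*
hj ) {
abcd25
ef_gh
fg*_h
hj_b*
hj ) {
jhijdsiokd
()lmolmlxjk;
abcd25 ef_gh ( fg*_h hj_b* hj ) {
EOF

# Print every line from a start match through the next end match.
out=$(perl -ne 'print if /^abcd25/ ... /\bhj \) \{/' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

The noise lines outside any range are dropped, but the unwanted block still appears in the output, which is why an extra check on the collected block is needed.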
More Complicated Comparisons Benefit from Conditional Branching
One-liners aren't always the best choice, and regular expressions can get out of hand quickly if they become too complex. In such situations, you're better off writing an actual program that can use conditional branching rather than trying to use an over-clever regular expression match.
One way to do this is to build up your match with a simple pattern, and then reject any match that doesn't match some other simple pattern. For example:
#!/usr/bin/perl -nw
# Use flip-flop operator to select matches.
if (/^abcd25/ ... /\bhj \) {/) {
    push @string, $_;
}
# Reject multi-line patterns that don't include a particular expression
# between flip-flop delimiters. For example, "( fg" will match, while
# "^fg" won't.
if (/\bhj \) {/) {
    $string = join("", @string);
    undef @string;
    push(@matches, $string) if $string =~ /\( fg/;
}
END { print @matches }
When run against the OP's updated corpus, this correctly yields:
abcd25
ef_gh
( fg*_h
hj_b*
hj ) {
abcd25 ef_gh ( fg*_h hj_b* hj ) {
Comments
-
Gil almost 2 years
I have a pattern in a file, shown below, which may or may not span multiple lines:
abcd25
ef_gh
( fg*_h
hj_b*
hj ) {
What I have tried :
perl -nle 'print while m/^\s*(\w+)\s+(\w+?)\s*(([\w-0-9,* \s]))\s{/gm'
I don't know what the flags mean here; all I did was write a regex for the pattern and insert it into the pattern space. This matches well if the pattern is on a single line, as in:
abcd25 ef_gh ( fg*_h hj_b* hj ) {
But it fails in the multi-line case!
I started with Perl yesterday, but the syntax is way too confusing. So, as suggested by a fellow SO member, I wrote a regex and inserted it into the code he provided. I hope a Perl monk can help me in this case. Alternative solutions are welcome.
Input file:
abcd25
ef_gh
( fg*_h
hj_b*
hj ) {
abcd25
ef_gh
fg*_h
hj_b*
hj ) {
jhijdsiokdù ()lmolmlxjk;
abcd25 ef_gh ( fg*_h hj_b* hj ) {
Expected output :
abcd25
ef_gh
( fg*_h
hj_b*
hj ) {
abcd25 ef_gh ( fg*_h hj_b* hj ) {
The input file can contain other patterns whose start and end coincide with those of the required pattern. Thanks in advance for the replies.
-
Gil over 11 years
I am not sure what the double parentheses do :( I wrote the regex via a simulator ;)
-
Gil over 11 years
Now the single-line match is OK, but I am still stuck on the multi-line case!
-
Gil over 11 years
Yes, but this will interfere with other patterns in the file.
-
Todd A. Jacobs over 11 years
@Geekasaur Sorry, but this exactly matches your corpus and your expected output, as currently defined in your question. Please update your question if you have other and/or additional requirements.
-
pavel over 11 years
@Geekasaur: the above pattern also works with multi-line input!
-
Gil over 11 years
gnome: Sorry for not being specific. I will update the question to convey a better idea.
-
Todd A. Jacobs over 11 years
@Geekasaur If you change start-of-line to start-of-word, how does
perl -ne 'print if /^abcd25/ ... /\bhj \) {/' /tmp/foo
not do what you want?
-
Gil over 11 years
Yes, it does extract the pattern, but it throws in unwanted matches too! Maybe if you add a brief description to your code, I can tweak my regex a bit, e.g. should the end pattern of the match be at the beginning of the line?
-
Gil over 11 years
@pavel Thanks, indeed it does :) Can you add a brief description of the flags used and what Perl does in this scenario?