How can I extract a string between matching braces in Perl?
Solution 1
This can certainly be done with a regex, at least in modern versions of Perl:
my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;
print join "\n" => @array;
The regex matches a curly-brace block whose body is any mix of non-brace characters and recursive matches of the whole pattern, which is how nested braces are handled.
Edit: the code above needs Perl 5.10+. In earlier versions, the recursion is a bit more verbose:
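To make the recursion easier to try out, here is a minimal self-contained sketch. The sample string is made up for illustration, and `[^{}]++` (possessive) is a small variation on the pattern above that I am assuming is acceptable; it only limits backtracking and also needs Perl 5.10+.

```perl
use strict;
use warnings;

# Hypothetical sample input; HEADER, junk and TRAILER are just filler text.
my $str = 'HEADER {a {b} {c {d}}} junk {e {f}} TRAILER';

# (?0) recurses into the whole pattern, so every nested {...} is matched
# by the same rule as the outermost block. With /g in list context the
# match returns one captured string per top-level block.
my @blocks = $str =~ /( \{ (?: [^{}]++ | (?0) )* \} )/xg;

print "$_\n" for @blocks;
```

Text outside the outermost braces is simply skipped, because the regex engine moves on to the next `{` that starts a balanced block.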
my $re; $re = qr/ \{ (?: [^{}]* | (??{$re}) )* \} /x;
my @array = $str =~ /$re/xg;
Solution 2
Use Text::Balanced
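As a rough sketch of what that could look like with `extract_bracketed` alone: a single call returns only the first block, so the call is repeated on the remainder until nothing more is found. The sample string and the variable names are made up, and the `'[^{]*'` prefix is my assumption for skipping text such as HEADER (the default prefix only skips whitespace).

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical sample input.
my $text = 'HEADER {a {b} {c {d}}} junk {e {f}} TRAILER';

my @blocks;
while (1) {
    # In list context extract_bracketed returns (extracted, remainder, prefix).
    # The third argument is the prefix pattern to skip before the opening brace.
    my ($block, $rest) = extract_bracketed($text, '{}', '[^{]*');
    last unless defined $block;   # no further balanced block was found
    push @blocks, $block;
    $text = $rest;                # continue with what is left
}

print "$_\n" for @blocks;
```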
Solution 3
I second ysth's suggestion to use the Text::Balanced module. A few lines will get you on your way.
use strict;
use warnings;
use Text::Balanced qw/extract_multiple extract_bracketed/;

# Slurp the whole file into one string.
my $file;
open my $fileHandle, '<', 'file.txt' or die "Cannot open file.txt: $!";
{
    local $/ = undef; # or use File::Slurp
    $file = <$fileHandle>;
}
close $fileHandle;

# Collect every balanced {...} block; the final 1 discards text that is not a block.
my @array = extract_multiple(
    $file,
    [ sub { extract_bracketed($_[0], '{}') } ],
    undef,
    1
);
print $_, "\n" foreach @array;
OUTPUT
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}
{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}
{ABC|*|XYZ:ghi:jkl {JKL 0 372 0} {{Points {}}}}
{ABC|*|XYZ:mno:pqr {GHI 0 34 0} {{Points {}}}}
{
ABC|*|XYZ:abc:pqr {GHI 0 68 0}
{{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}}
}
Solution 4
You can always count braces:
# $data holds the input text.
my $depth = 0;
my $out = "";
my @list = ();
foreach my $fr (split(/([{}])/, $data)) {
    $out .= $fr;
    if ($fr eq '{') {
        $depth++;
    }
    elsif ($fr eq '}') {
        $depth--;
        if ($depth == 0) {
            # Trim text outside the outermost braces (braces escaped for newer Perls).
            $out =~ s/^.*?(\{.*\}).*$/$1/s;
            push @list, $out;
            $out = "";
        }
    }
}
print join("\n==================\n", @list);
This is old, plain Perl style (and ugly, probably).
Solution 5
I don't think pure regular expressions are what you want to use here (IMHO this might not even be parsable using regex).
Instead, build a small parser, similar to what's shown here: http://www.perlmonks.org/?node_id=308039 (see the answer by shotgunefx (Parson) on Nov 18, 2003 at 18:29 UTC)
UPDATE: It seems it might be doable with a regex. I saw a reference to matching nested parentheses in Mastering Regular Expressions (available on Google Books, so it can be searched for if you don't have the book; see Chapter 5, section "Matching balanced sets of parentheses").
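For the "small parser" route, something along these lines might do. This is only a sketch and not the code from the linked PerlMonks node; `top_level_blocks` is a hypothetical helper name, and ignoring a stray closing brace is a design choice I am assuming is acceptable.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every top-level {...} block found in $text.
sub top_level_blocks {
    my ($text) = @_;
    my ($depth, $start, @blocks) = (0);
    while ($text =~ /([{}])/g) {
        my $brace_pos = pos($text) - 1;             # offset of the brace just matched
        if ($1 eq '{') {
            $start = $brace_pos if $depth++ == 0;   # a top-level block opens here
        }
        elsif ($depth > 0 && --$depth == 0) {
            # Back at depth 0: the block runs from $start to this brace.
            push @blocks, substr($text, $start, $brace_pos - $start + 1);
        }
        # A stray '}' seen at depth 0 is ignored.
    }
    warn "unclosed block starting at offset $start\n" if $depth > 0;
    return @blocks;
}

my @blocks = top_level_blocks('HEADER {a {b} {c {d}}} junk {e {f}} TRAILER');
print "$_\n" for @blocks;
```

Working with offsets avoids building up and trimming an intermediate string, and it gives a natural place to report unbalanced input.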
Srilesh
Updated on June 15, 2022
Comments
-
Srilesh almost 2 years
My input file is as below :
HEADER
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}
{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}
{ABC|*|XYZ:ghi:jkl {JKL 0 372 0} {{Points {}}}}
{ABC|*|XYZ:mno:pqr {GHI 0 34 0} {{Points {}}}}
{
ABC|*|XYZ:abc:pqr {GHI 0 68 0}
{{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}}
}
TRAILER
I want to extract the file into an array as below :
$array[0] = "{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}"
$array[1] = "{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}"
$array[2] = "{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}"
..
..
$array[5] = "{ ABC|*|XYZ:abc:pqr {GHI 0 68 0} {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}} }"
Which means, I need to match the first opening curly brace with its closing curly brace and extract the string in between.
I have checked the link below, but it doesn't apply to my question: Regex to get string between curly braces "{I want what's between the curly braces}"
I am still trying, but it would really help if someone could assist me with their expertise ...
Thanks Sri ...
-
Srilesh about 14 years
Tried this, but I get the error Sequence (?0...) not recognized in regex; marked by <-- HERE in m/( \{ (?: [^{}]* | (?0 <-- HERE ) )* \} )/
-
Srilesh about 14 years
Based on ysth's suggestion, I used Text::Balanced, but I was getting only the first match. Thanks for helping me here; I need to use the extract_multiple sub too. Thank you.
-
Srilesh about 14 years
Thanks ysth, this is the best solution !!
-
Srilesh about 14 years
Thanks zig, your response is very helpful.
-
Ether about 14 years
@Srilesh: if you like this answer best, please click the outlined checkmark to the left of the answer.
-
Eric Strom about 14 years
@Srilesh => the code I posted requires Perl 5.10+; I have edited my answer to include a version that will work in older Perls.
-
Srilesh about 14 years
The solutions provided by @ysth, @Zaid and @leonbloy work fine for me, but @eric's solution has very good performance. I am applying the recursion on a 10MB file and the result is really fast compared to the others. I am choosing your answer as the best solution here. Thank you very much.