How can I extract a string between matching braces in Perl?

regex perl parsing matching braces

18,615

Solution 1

This can certainly be done with regex at least in modern versions of Perl:

my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;

print join "\n" => @array;

The regex matches a curly brace block that contains either non curly brace characters, or a recursion into itself (matches nested braces)

Edit: the above code works in Perl 5.10+, for earlier versions the recursion is a bit more verbose:

my $re; $re = qr/ \{ (?: [^{}]* | (??{$re}) )* \} /x;

my @array = $str =~ /$re/xg;

Solution 2

Use Text::Balanced

Solution 3

I second ysth's suggestion to use the Text::Balanced module. A few lines will get you on your way.

use strict;
use warnings;
use Text::Balanced qw/extract_multiple extract_bracketed/;

my $file;
open my $fileHandle, '<', 'file.txt';

{ 
  local $/ = undef; # or use File::Slurp
  $file = <$fileHandle>;
}

close $fileHandle;

my @array = extract_multiple(
                               $file,
                               [ sub{extract_bracketed($_[0], '{}')},],
                               undef,
                               1
                            );

print $_,"\n" foreach @array;

OUTPUT

{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}
{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}
{ABC|*|XYZ:ghi:jkl {JKL 0 372 0} {{Points {}}}}
{ABC|*|XYZ:mno:pqr {GHI 0 34 0} {{Points {}}}}
{
    ABC|*|XYZ:abc:pqr {GHI 0 68 0}
        {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}}
        }

Solution 4

You can always count braces:

my $depth = 0;
my $out = "";
my @list=();
foreach my $fr (split(/([{}])/,$data)) {
    $out .= $fr;
    if($fr eq '{') {
        $depth ++;
    }
    elsif($fr eq '}') {
        $depth --;
        if($depth ==0) {
            $out =~ s/^.*?({.*}).*$/$1/s; # trim
            push @list, $out;
            $out = "";
        }
    }
}
print join("\n==================\n",@list);

This is old, plain Perl style (and ugly, probably).

Solution 5

I don't think pure regular expressions are what you want to use here (IMHO this might not even be parsable using regex).

Instead, build a small parser, similar to what's shown here: http://www.perlmonks.org/?node_id=308039 (see the answer by shotgunefx (Parson) on Nov 18, 2003 at 18:29 UTC)

UPDATE It seems it might be doable with a regex - I saw a reference to matching nested parentheses in Mastering Regular Expressions (that's available on Google Books and thus can be googled for if you don't have the book - see Chapter 5, section "Matching balanced sets of parentheses")

View more solutions

18,615

Author by

Srilesh

Updated on June 15, 2022

Comments

Srilesh almost 2 years

My input file is as below :

HEADER 
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}

{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}

{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}

{ABC|*|XYZ:ghi:jkl {JKL 0 372 0} {{Points {}}}}

{ABC|*|XYZ:mno:pqr {GHI 0 34 0} {{Points {}}}}

{
    ABC|*|XYZ:abc:pqr {GHI 0 68 0}
        {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}}
        }
TRAILER

I want to extract the file into an array as below :

$array[0] = "{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}"

$array[1] = "{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}"

$array[2] = "{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}"

..
..

$array[5] = "{
    ABC|*|XYZ:abc:pqr {GHI 0 68 0}
        {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}}
        }"

Which means, I need to match the first opening curly brace with its closing curly brace and extract the string in between.

I have checked the below link, but this doesnt apply to my question. Regex to get string between curly braces "{I want what's between the curly braces}"

I am trying but would really help if someone can assist me with their expertise ...

Thanks Sri ...

Srilesh about 14 years

Tried this, but I get the error Sequence (?0...) not recognized in regex; marked by <-- HERE in m/( \{ (?: [^{}]* | (?0 <-- HERE ) )* \} )/
Srilesh about 14 years

Based on ysth's suggestion, i used Text::Balanced, but I was getting only the first match. Thanks for helping me here, I need to use the extract_multiple sub too. Thank you ..
Srilesh about 14 years

Thanks ysth, this is the best solution !!
Srilesh about 14 years

Thanks zig, your response is very helpful.
Ether about 14 years

@Srilesh: if you like this answer best, please click the outlined checkmark to the left of the answer.
Eric Strom about 14 years

@Srilesh => the code I posted required perl 5.10+, i have edited my answer to include a version that will work in older perls.
Srilesh about 14 years

Solutions provided by @ysth, @Zaid, @leonbloy works fine for me, but @eric's solution has very good performance. I am applying the recursion on a 10MB file and the result is really fast compared to the others. Choosing your answer to be the best solution here. Thank you very much.