How do we extract parts of a string in Perl?


Solution 1

It does not have to be regex, but in Perl it is so damn convenient:

my $str = "[ timestamp | integer | string ] Some other string here";
# \s* outside each group keeps the padding spaces out of the captures
my ($timestamp, $integer, $string, $other)
   = ($str =~ /\[\s*(.*?)\s*\|\s*(.*?)\s*\|\s*(.*?)\s*\]\s*(.*)/);
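A minimal sketch of the same regex wrapped in a helper that also reports when a line does not have the expected layout. The name `parse_line` is hypothetical, and the `\s*` around each group is an assumption so that the fields come back without their padding. The sample line is the one from the question.

```perl
use strict;
use warnings;

# parse_line is a hypothetical helper: it returns (timestamp, integer,
# string, rest), or an empty list when the line does not match.
sub parse_line {
    my ($line) = @_;
    # A list assignment in boolean context yields the number of values the
    # match returned, so a failed match (no values) triggers the return.
    my @fields = $line =~ /\[\s*(.*?)\s*\|\s*(.*?)\s*\|\s*(.*?)\s*\]\s*(.*)/
        or return;
    return @fields;
}

my @f = parse_line("[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION  ] Please check for malfunction");
print "string field: $f[2]\n";    # prints "string field: MALFUNCTION"

my @bad = parse_line("no brackets here");
print "no match\n" unless @bad;
```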

Solution 2

You could do it just like Java:

  • String.substring is substr.
  • String.lastIndexOf is rindex.
  • String.trim is sub trim { my $s = $_[0]; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s }.
  • String concatenation: Java's + is Perl's . operator.
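Putting those equivalents together, a direct translation of the question's Java line could look like the sketch below. Java's substring takes a begin and an end index, while Perl's substr takes an offset and a length, so the length has to be computed. Like the Java original, this sketch does not check whether rindex returned -1 (character not found).

```perl
use strict;
use warnings;

# trim() exactly as in the list above
sub trim { my $s = $_[0]; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s }

my $line = "[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION  ] Please check for malfunction";

# Java: line.substring(line.lastIndexOf("|") + 1, line.lastIndexOf("]")).trim()
# substr takes (offset, length) rather than (begin, end), so compute the length.
my $begin = rindex($line, '|') + 1;
my $end   = rindex($line, ']');
my $s     = trim(substr($line, $begin, $end - $begin));

print "$s\n";    # prints "MALFUNCTION"
```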

But that method finds the last | and the last ], not the second | and the ] that follows it. It will fail if either of those characters occurs later in the string. I'd use

# Matches against $_ and uses the trim() sub defined above
my ($ts, $i, $s, $rest) =
   map trim($_),
      /^\[ ([^|]*) \| ([^|]*) \| ([^\]]*) \] (.*)/sx;
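To see the failure described above, here is a sketch with a hypothetical line whose trailing text contains its own "[a|b]" (the input is an assumption chosen for illustration). The rindex-based version picks up the wrong | and ], while the anchored regex still returns the third field.

```perl
use strict;
use warnings;

sub trim { my $s = $_[0]; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s }

# Hypothetical input: the trailing text contains its own | and ]
my $line = "[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION ] see [a|b] for details";

# Java-style: the last | and the last ] both sit inside "[a|b]"
my $begin = rindex($line, '|') + 1;
my $naive = trim(substr($line, $begin, rindex($line, ']') - $begin));

# Anchored regex from the answer; for() aliases $_ to $line for the match
my ($ts, $i, $s, $rest);
for ($line) {
    ($ts, $i, $s, $rest) =
        map trim($_),
            /^\[ ([^|]*) \| ([^|]*) \| ([^\]]*) \] (.*)/sx;
}

print "naive: $naive\nregex: $s\n";    # prints "naive: b" then "regex: MALFUNCTION"
```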

Solution 3

If the strings you are matching don't contain other vertical bars, you could use a regular expression:

my $fullstring = '[ timestamp | integer | string ] Some other string here';
my ($string) = ($fullstring =~ /\| *([^|\]]*?) *]/);

Solution 4

Regular expressions are a natural Perl-ish way of doing things. In this case, we want the text between the first ']' and the '|' immediately preceding it, minus any surrounding whitespace.

my $string = ($line =~ m/
    \|  #The | character
    \s* #Arbitrary whitespace
    (   #Capture
        [^\|\]]*? #Some number of characters that are not | or ]
    )
    \s* #More whitespace
    \]  # The ] character
    /x)[0];

The idiom (m/(reg)ex/)[0] is a list slice: it evaluates the match in list context and keeps only the first capture group. Without the slice, the match would be evaluated in scalar context and return only whether it succeeded (1 or the empty string), not the captured text.
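A small sketch of the context difference, using the question's sample line and the same pattern as above. A list assignment such as my ($x) = ... is a common alternative to the slice; both force list context.

```perl
use strict;
use warnings;

my $line = "[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION  ] Please check for malfunction";

# Scalar context: the match returns 1 on success (empty string on failure)
my $matched = ($line =~ /\|\s*([^|\]]*?)\s*\]/);

# List slice: the match runs in list context and [0] keeps the first capture
my $first = ($line =~ /\|\s*([^|\]]*?)\s*\]/)[0];

# List assignment: parentheses on the left also force list context
my ($also_first) = $line =~ /\|\s*([^|\]]*?)\s*\]/;

print "$matched $first $also_first\n";    # prints "1 MALFUNCTION MALFUNCTION"
```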

The /x modifier on the regular expression causes whitespace and #comments to be ignored.

The *? token within the regular expression means "non-greedy" matching. Otherwise, the trailing whitespace would be captured, too.
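A sketch comparing the greedy and non-greedy versions of the capture on the question's sample line, which has two spaces before the ]:

```perl
use strict;
use warnings;

my $line = "[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION  ] Please check for malfunction";

# Greedy: [^|\]]* also swallows the trailing spaces, leaving \s* nothing
my ($greedy) = $line =~ /\|\s*([^|\]]*)\s*\]/;

# Non-greedy: [^|\]]*? stops as soon as \s*\] can match the rest
my ($lazy) = $line =~ /\|\s*([^|\]]*?)\s*\]/;

print "<$greedy> <$lazy>\n";    # prints "<MALFUNCTION  > <MALFUNCTION>"
```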

Solution 5

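The line can be parsed by splitting on the [, ] and | characters and then trimming the spaces from the extracted values:

# split returns an empty first field (the text before the leading "["), so discard it
my (undef, @arr) = map { s/^\s+|\s+$//g; $_ } split /[\[\]|]/, $line;

After that, $arr[0] is the timestamp, $arr[1] is the integer, and so on.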
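If the trailing message may itself contain [, ] or |, one option (an addition here, not part of the answer above) is split's optional LIMIT argument, which caps the number of fields returned. A minimal sketch with a hypothetical input line:

```perl
use strict;
use warnings;

# Hypothetical input: the trailing message contains [, ] and |
my $line = "[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION ] see [a|b] for details";

# LIMIT 5: the empty field before "[", timestamp, integer, string, and the
# rest of the line, returned whole even though it contains separators.
my (undef, $ts, $int, $str, $rest) =
    map { my $f = $_; $f =~ s/^\s+|\s+$//g; $f }
    split /[\[\]|]/, $line, 5;

print "$str | $rest\n";    # prints "MALFUNCTION | see [a|b] for details"
```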

Author: Cratylus

Updated on June 13, 2022

Comments

  • Cratylus

    I am new to Perl. I have a string of this format:
    [ timestamp | integer | string ] Some other string here

    Sample string:

    [ 2013/05/28 21:39:02 | 2212 | MALFUNCTION  ] Please check for malfunction
    

    The timestamp is an actual timestamp, e.g. 2013/05/28 20:38:02.
    The integer is a number, and the string can be a specific word out of a sequence of words.
    I am interested in extracting the string part of this.

    In Java I would do it as simple as:

    String s = line.substring(line.lastIndexOf("|") + 1, line.lastIndexOf("]")).trim();
    

    This just loops over the string character by character and gets the part of interest.
    But I don't know how this kind of problem is solved in Perl.
    How would I do this? Only via regular expressions?