How do we extract parts of a string in Perl?
Solution 1
It does not have to be regex, but in Perl it is so damn convenient:
my $str = "[ timestamp | integer | string ] Some other string here";
my ($timestamp, $integer, $string, $other)
= ($str =~ /\[(.*?)\|(.*?)\|(.*?)\](.*)/);
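As a minimal sketch, here is the same pattern applied to the sample line from the question. The trimming loop is an addition not present in the answer: the `.*?` captures keep the spaces around each field, so they are stripped afterwards.

```perl
use strict;
use warnings;

# Sample line taken from the question.
my $str = "[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION ] Please check for malfunction";

# Same pattern as above; list assignment puts each capture in its own variable.
my ($timestamp, $integer, $string, $other)
    = ($str =~ /\[(.*?)\|(.*?)\|(.*?)\](.*)/);

# The captures keep the spaces around each field, so strip them (added step).
for ($timestamp, $integer, $string, $other) {
    s/^\s+//;
    s/\s+\z//;
}

print "$string\n";    # MALFUNCTION
```

If the line does not match, all four variables are undefined, so real code would probably check the result of the match first.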
Solution 2
You could do it just like Java:
- String.substring is substr.
- String.lastIndexOf is rindex.
- String.trim is sub trim { my $s = $_[0]; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s }.
- + is . (string concatenation).
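Put together, a literal translation of the Java one-liner might look like the sketch below. The variable names `$line`, `$start` and `$end` are chosen here for illustration, and `trim` is the helper defined in the list above.

```perl
use strict;
use warnings;

# Stand-in for Java's String.trim, as defined in the list above.
sub trim { my $s = $_[0]; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s }

# Sample line taken from the question.
my $line = "[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION ] Please check for malfunction";

my $start = rindex($line, "|") + 1;   # like line.lastIndexOf("|") + 1
my $end   = rindex($line, "]");       # like line.lastIndexOf("]")

# Java's substring takes (begin, end); Perl's substr takes (offset, length).
my $s = trim(substr($line, $start, $end - $start));

print "$s\n";    # MALFUNCTION
```

The one real difference from Java is the third argument: substr expects a length, so the end index has to be converted with `$end - $start`.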
But that method finds the last | and the last ], not the second | and the ] that follows it. It will fail if either of those characters occurs later in the string. I'd use
my ($ts, $i, $s, $rest) =
map trim($_),
/^\[ ([^|]*) \| ([^|]*) \| ([^\]]*) \] (.*)/sx;
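A small sketch of why the anchored pattern is safer, using a made-up input whose trailing text contains extra | and ] characters. The match is bound to `$line` explicitly here, whereas the snippet above matches against `$_`. On this input the rindex-based translation would pick up the later delimiters and return B.

```perl
use strict;
use warnings;

sub trim { my $s = $_[0]; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s }

# Made-up input: the trailing text contains extra | and ] characters.
my $line = "[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION ] Check unit [A|B] now";

# Anchored at the start, so only the first bracketed group is parsed.
my ($ts, $i, $s, $rest) =
    map trim($_),
    $line =~ /^\[ ([^|]*) \| ([^|]*) \| ([^\]]*) \] (.*)/sx;

print "$s\n";       # MALFUNCTION
print "$rest\n";    # Check unit [A|B] now
```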
Solution 3
If the strings you are matching don't contain other vertical bars, you could use a regular expression:
$fullstring = '[ timestamp | integer | string ] Some other string here';
($string) = ($fullstring =~ /\| *([^|\]]*?) *]/);
Solution 4
Regular expressions are a natural Perl-ish way of doing things. In this case, we want the string between the last '|' and the first ']', minus any whitespace surrounding it.
my $string = ($line =~ m/
\| #The | character
\s* #Arbitrary whitespace
( #Capture
[^\|\]]*? #Some number of characters that are not | or ]
)
\s* #More whitespace
\] # The ] character
/x)[0];
The idiom (m/(reg)ex/)[0] is used to extract the first capture group from the regular expression. Without the list slice, the match would be evaluated in scalar context and return only whether it succeeded (1 or the empty string), not the captured text.
The /x modifier on the regular expression causes whitespace and #comments to be ignored.
The *? token within the regular expression means "non-greedy" matching. Otherwise, the trailing whitespace would be captured, too.
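The difference between the list slice and a plain scalar assignment can be seen in a short sketch. It uses the sample line from the question and a condensed copy of the pattern above; the variable `$matched` is introduced here for illustration.

```perl
use strict;
use warnings;

# Sample line taken from the question.
my $line = "[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION ] Please check for malfunction";

# List slice: the match runs in list context and [0] picks the first capture.
my $string = ($line =~ m/ \| \s* ( [^\|\]]*? ) \s* \] /x)[0];

# Plain scalar assignment: only the success flag of the match is stored.
my $matched = ($line =~ m/ \| \s* ( [^\|\]]*? ) \s* \] /x);

print "$string\n";     # MALFUNCTION
print "$matched\n";    # 1
```

An equivalent spelling is `my ($string) = $line =~ m/.../x;`, where the parentheses on the left-hand side force list context.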
Solution 5
The line can be parsed by splitting on the |, [ and ] characters and then trimming spaces from the extracted values:
my @arr = map { s/^\s+ | \s+$//xg; $_ } split / [\Q[]|\E] /x, $line;
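After that, $arr[0] is an empty string (the line starts with a [ separator, so split produces an empty leading field), $arr[1] is the timestamp, $arr[2] is the integer, and so on.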
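A runnable sketch of the split approach on the sample line from the question. The named variables are chosen here for illustration; the leading `undef` skips the empty field that split produces because the line begins with a separator.

```perl
use strict;
use warnings;

# Sample line taken from the question.
my $line = "[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION ] Please check for malfunction";

# Split on [, ] or |, then trim each piece.
my @arr = map { s/^\s+ | \s+$//xg; $_ } split / [\Q[]|\E] /x, $line;

# The line starts with a separator, so the first field is an empty string.
my (undef, $timestamp, $integer, $string, $rest) = @arr;

print "$string\n";    # MALFUNCTION
```

Like the rindex approach, this assumes the trailing text contains no further |, [ or ] characters; otherwise it would be split into additional fields.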
Cratylus
Updated on June 13, 2022
I am new to Perl. I have a string of this format:
[ timestamp | integer | string ] Some other string here
Sample string:
[ 2013/05/28 21:39:02 | 2212 | MALFUNCTION ] Please check for malfunction
The timestamp is an actual timestamp, e.g. 2013/05/28 20:38:02.
The integer is a number and the string can be a specific word out of a sequence of words.
I am interested in extracting the string part of this. In Java I would do it as simply as:
String s = line.substring(line.lastIndexOf("|") + 1, line.lastIndexOf("]")).trim();
This just loops over the string character by character and gets the part of interest.
But I don't know how this kind of problem is solved in Perl.
How would I do this? Only via regular expressions?