How to split a string with multiple patterns in Perl?
Solution 1
Use a character class in the regex delimiter to match on a set of possible delimiters.
my $string= "10:10:10, 12/1/2011";
my @string = split /[:,\s\/]+/, $string;
foreach(@string) {
print "$_\n";
}
Explanation
The pair of slashes /.../ denotes the regular expression or pattern to be matched. The pair of square brackets [...] denotes the character class of the regex. Inside is the set of possible characters that can be matched: colons (:), commas (,), any type of whitespace character (\s), and forward slashes (\/, with the backslash as an escape character). The + is needed to match 1 or more of the item immediately preceding it, which is the entire character class in this case. Without it, the comma-space would be treated as 2 separate delimiters, giving you an additional empty string in the result.
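To make the effect of the + concrete, here is a minimal sketch that runs both patterns on the same input. The variable names @with_plus and @without_plus are only illustrative.

```perl
use strict;
use warnings;

my $string = "10:10:10, 12/1/2011";

# With +, a run of delimiters such as ", " counts as one separator.
my @with_plus = split /[:,\s\/]+/, $string;

# Without +, "," and " " are separate delimiters, so an empty
# string appears between them.
my @without_plus = split /[:,\s\/]/, $string;

print scalar(@with_plus), " fields with +\n";        # 6 fields
print scalar(@without_plus), " fields without +\n";  # 7 fields
```

The extra field in the second list is the empty string between the comma and the space.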
Solution 2
Wrong tool! Match the fields you want rather than splitting on what separates them.
my $string = "10:10:10, 12/1/2011";
my @fields = $string =~ /([0-9]+)/g;
Solution 3
You can split on non-digits:
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;
my $string= "10:10:10, 12/1/2011";
say for split /\D+/, $string;
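One caveat with splitting on \D+, sketched below with a made-up input: if the string starts with non-digit text, split returns an empty first field. The input string and the names @raw and @numbers are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical input that begins with non-digit text.
my $string = "at 10:10:10, 12/1/2011";

# The leading "at " matches \D+, so split returns an empty first field.
my @raw = split /\D+/, $string;

# Dropping zero-length fields leaves only the numbers.
my @numbers = grep { length } @raw;

print join(" ", @numbers), "\n";
```

Filtering with grep is one way to handle this; matching the digits directly avoids the empty field altogether.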
Solution 4
If numbers are what you want, extract numbers:
use feature 'say';
my $string = "10:10:10, 12/1/2011";
my @numbers = $string =~ /\d+/g;
say for @numbers;
Capturing parentheses are not required, as specified in perlop:
The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
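The behaviour described in that passage can be sketched as follows. The third pattern, with two capture groups, is an illustrative assumption and not part of the original answers; it is there to show that captures from every match are returned as one flat list.

```perl
use strict;
use warnings;

my $string = "10:10:10, 12/1/2011";

# No parentheses: each whole match is returned.
my @plain = $string =~ /\d+/g;

# One capture group: each captured substring is returned,
# which gives the same list here.
my @captured = $string =~ /(\d+)/g;

# Two capture groups: both captures from every match are returned,
# so the result is a flat list of pairs.
my @pairs = $string =~ /(\d+)[:\/](\d+)/g;

print "@plain\n";
print "@captured\n";
print "@pairs\n";
```

For this input, @plain and @captured hold the same six numbers, which is why the parentheses can be dropped.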
Solution 5
my $string = "10:10:10, 12/1/2011";
# General form: split(m[(?:firstpattern|secondpattern|thirdpattern)+], $string);
my @string = split(m[(?:/| |,|:)+], $string);
print join "\n", @string;
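Because each alternative in an alternation can be longer than one character, the same form also handles delimiters that are whole words. A minimal sketch, assuming a made-up input string and made-up separators ("and", "then"):

```perl
use strict;
use warnings;

# Hypothetical input whose separators are whole words and punctuation.
my $string = "10 and 20, then 30 and 40";

# Each alternative may be more than one character long, which a
# character class cannot express.
my @fields = split m{(?:\s+and\s+|,\s*then\s+)+}, $string;

print join("\n", @fields), "\n";
```

For single-character delimiters, a character class remains the shorter way to write the same thing.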
quinekxi
Updated on July 17, 2022
Comments
-
quinekxi almost 2 years
I want to split a string with multiple patterns. For example:
my $string= "10:10:10, 12/1/2011";
my @string = split(/firstpattern/secondpattern/thirdpattern/, $string);
foreach(@string) {
print "$_\n";
}
I want to have an output of:
10
10
10
12
1
2011
What is the proper way to do this?
-
quinekxi over 12 years
Worked perfectly well! Thanks. Btw, would you mind explaining this code? /[:,\s\/]+/
-
quinekxi over 12 years
Thank you for the additional input, that simply explains everything! :D
-
TLP over 12 years
/| |,|: is better written as [/ ,:]
-
Joel Berger over 12 years
I hadn't known about the behavior you highlighted, thanks, and good for golf too!
-
Joel Berger over 12 years
@TLP, is it? IIRC alternations get compiled into a trie internally; does a character class? Not saying you are wrong, really a question.
-
TLP over 12 years
@JoelBerger I don't know about the internals, but I think it's more readable. Here's a benchmark:
perl -wE "use Benchmark qw(cmpthese); $a=qq(10:10:10, 12/1/2011); cmpthese(100000, { Piped => sub { my @r = split (m[(?:/| |,|:)+], $a); }, Class => sub { my @r = split (m[(?:[/ ,:])+], $a); } });"
          Rate Piped Class
Piped 142450/s    --  -27%
Class 194175/s   36%    --
Looks like the character class is 36% faster.
-
TLP over 12 years
Oops, didn't see that the m delimiter was brackets. Strange that it didn't complain. Well, with m##, the results go up to 45% faster.
-
quinekxi over 12 years
I didn't know I could use this kind of approach. Good thinking! Thank you so much!
-
TLP over 12 years
@quinekxi You're welcome. split is a very nice tool, but works best with uniform delimiters, I feel. In this case, the common element is numbers, so it's easier to work with them.
-
quinekxi over 12 years
Yes, I know, I'm sorry, I didn't know there was another approach to it.
-
quinekxi over 12 years
@TLP Yes, actually I used this approach, but I didn't mark it as the answer, just to comply with the original question. Anyway, thanks for your idea. I am glad I've got such great ideas from strangers like you.
-
TLP over 12 years
@quinekxi Many of my answers are not the solutions the OPs asked for, but the ones I thought they really wanted. Your question was really "How do I best extract the numbers from this string?" So that's the answer you got. :)
-
quinekxi over 12 years
@TLP Sorry for asking the wrong question for the problem I have. I'm still in the learning process of being a programmer-wanna-be. I'd like you to be my mentor, sort of. LOL, it's just a comment. :D
-
TLP over 12 years
@quinekxi You're always welcome to post new questions. It is a good way to learn.
-
ikegami over 12 years
@quinekxi, No need to apologise, you didn't do anything wrong. A good reply usually comes from considering the bigger picture. Questions are often too specific.
-
quinekxi over 12 years
Thanks though for giving me something to think about and another solution to consider.
-
Jacob over 12 years
Why are you linking to the 5.10.0 version of the page, instead of the version-agnostic perldoc.perl.org/perlre.html#Metacharacters ?
-
reinierpost over 12 years
@Brad Gilbert: Because that was the first one Google gave me, and I'm using 5.10 myself, and portability can potentially be an issue, and I didn't realize there was a version-agnostic version. Thanks for supplying the link.
-
KingsInnerSoul about 10 years
I know it is an old thread, but I am wondering how I should add []() to the list of delimiters? It seems to get rid of the []() when I just add it there.
-
stevenl about 10 years
@KingsInnerSoul, add a backslash in front of each of those, just like I have for the slash above.
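A minimal sketch of that suggestion, assuming a made-up input string in which brackets and parentheses surround some of the numbers:

```perl
use strict;
use warnings;

# Hypothetical input with brackets and parentheses around the numbers.
my $string = "10:10:10 [12/1/2011] (42)";

# \[ \] \( \) are escaped inside the character class, just like \/.
my @fields = split /[:,\s\/\[\]\(\)]+/, $string;

print join(" ", @fields), "\n";
```

The trailing ")" does not leave an empty field at the end, because split drops trailing empty fields by default.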
-
James O'Brien over 9 years
This answer is more general: it can be used with entire words as well.