How to split a string with multiple patterns in perl?

80,162

Solution 1

Use a character class in the regex delimiter to match on a set of possible delimiters.

my $string= "10:10:10, 12/1/2011";
my @string = split /[:,\s\/]+/, $string;

foreach(@string) {
    print "$_\n";
}

Explanation

  • The pair of slashes /.../ denotes the regular expression or pattern to be matched.

  • The pair of square brackets [...] denotes the character class of the regex.

  • Inside is the set of possible characters that can be matched: colons :, commas ,, any type of space character \s, and forward slashes \/ (with the backslash as an escape character).

  • The + quantifier matches one or more occurrences of whatever immediately precedes it, which here is the entire character class. Without it, the comma and the space after it would count as two separate delimiters, giving you an extra empty string in the result.
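
To make the effect of the + concrete, here is a small sketch that runs the same split with and without the quantifier. The variable names @with_plus and @without_plus are only illustrative.

```perl
use strict;
use warnings;

my $string = "10:10:10, 12/1/2011";

# With +, the comma and the space after it are treated as one delimiter.
my @with_plus = split /[:,\s\/]+/, $string;

# Without +, the comma and the space are two delimiters,
# so an empty field appears between them.
my @without_plus = split /[:,\s\/]/, $string;

print scalar(@with_plus),    " fields: ", join("|", @with_plus),    "\n";
print scalar(@without_plus), " fields: ", join("|", @without_plus), "\n";
```

The first line should report 6 fields, and the second 7, with an empty field where the comma-space pair was.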

Solution 2

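split is the wrong tool here. Since the numbers are what you want, match them directly instead of splitting on everything else: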

my $string = "10:10:10, 12/1/2011";
my @fields = $string =~ /([0-9]+)/g;

Solution 3

You can split on non-digits:

#!/usr/bin/perl
use strict;
use warnings;
use 5.014;

my $string= "10:10:10, 12/1/2011";
say for split /\D+/, $string;
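
One caveat worth checking, shown here as a sketch with a made-up input string: when the string starts with a non-digit, split returns an empty leading field, while a global match does not.

```perl
use strict;
use warnings;

# Hypothetical input that begins with non-digit text.
my $string = "Date: 12/1/2011";

# The leading "Date: " is a delimiter match at the start of the string,
# so split yields an empty first field.
my @split_fields = split /\D+/, $string;

# Matching the digits directly avoids the empty field.
my @match_fields = $string =~ /\d+/g;

print scalar(@split_fields), " fields from split\n";
print scalar(@match_fields), " fields from match\n";
```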

Solution 4

If numbers are what you want, extract numbers:

use 5.014;  # enables say
my $string = "10:10:10, 12/1/2011";
my @numbers = $string =~ /\d+/g;
say for @numbers;

Capturing parentheses are not required, as specified in perlop:

The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
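
A short sketch of the behavior quoted above: the same global match with and without parentheses, plus one pattern where the parentheses cover only part of the match. The variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "10:10:10, 12/1/2011";

# No parentheses: the list contains every whole match.
my @plain = $string =~ /\d+/g;

# Parentheses around the whole pattern: same result.
my @captured = $string =~ /(\d+)/g;

# Parentheses around part of the pattern: only the captured part
# is returned, here the digits that are followed by a colon.
my @before_colon = $string =~ /(\d+):/g;

print "plain:        @plain\n";
print "captured:     @captured\n";
print "before colon: @before_colon\n";
```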

Solution 5

my $string= "10:10:10, 12/1/2011";

# General form:
#   split(m[(?:firstpattern|secondpattern|thirdpattern)+], $string);

my @string = split(m[(?:/| |,|:)+], $string);

print join "\n", @string;
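
Alternation also accepts multi-character delimiters such as whole words, which a character class cannot express. The following is only a sketch; the input string and the choice of "and", "or" and a comma as delimiters are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input with word and punctuation delimiters.
my $text = "apples and oranges or pears, plums";

# \b keeps "or" from matching inside "oranges";
# \s* absorbs the whitespace around each delimiter.
my @items = split /\s*(?:\band\b|\bor\b|,)\s*/, $text;

print join("\n", @items), "\n";
```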

Author: quinekxi

Updated on July 17, 2022

Comments

  • quinekxi
    quinekxi almost 2 years

    I want to split a string with multiple patterns:

    ex.

    my $string= "10:10:10, 12/1/2011";
    
    my @string = split(/firstpattern/secondpattern/thirdpattern/, $string);
    
    foreach(@string) {
        print "$_\n";
    }
    

    I want to have an output of:

    10
    10
    10
    12
     1
    2011
    

    What is the proper way to do this?

  • quinekxi
    quinekxi over 12 years
    Worked perfectly well! Thanks. By the way, would you mind explaining this code? /[:,\s\/]+/
  • quinekxi
    quinekxi over 12 years
    Thank you for the additional input, that simply explains everything! :D
  • TLP
    TLP over 12 years
    /| |,|: better written as [/ ,:]
  • Joel Berger
    Joel Berger over 12 years
    I hadn't known about the behavior you highlighted, thanks, and good for golf too!
  • Joel Berger
    Joel Berger over 12 years
    @TLP, is it? IIRC alternations get compiled into a trie internally, does a character class? Not saying you are wrong, really a question.
  • TLP
    TLP over 12 years
    @JoelBerger I don't know about the internals, but I think it's more readable. Here's a benchmark:

    perl -wE "use Benchmark qw(cmpthese); $a=qq(10:10:10, 12/1/2011); cmpthese(100000, { Piped => sub { my @r = split (m[(?:/| |,|:)+], $a); }, Class => sub { my @r = split (m[(?:[/ ,:])+], $a); } });"

    Piped 142450/s    --  -27%
    Class 194175/s   36%    --

    Looks like character class is 36% faster.
  • TLP
    TLP over 12 years
    Oops, didn't see that the m delimiter was brackets. Strange that it didn't complain. Well, with m##, the results go up to 45% faster.
  • quinekxi
    quinekxi over 12 years
    I didn't know I could use this kind of approach. Good thinking! Thank you so much!
  • TLP
    TLP over 12 years
    @quinekxi You're welcome. split is a very nice tool, but works best with uniform delimiters, I feel. In this case, the common element is numbers, so it's easier to work with them.
  • quinekxi
    quinekxi over 12 years
    Yes, I know, I'm sorry, I didn't know there was another approach to it.
  • quinekxi
    quinekxi over 12 years
    @TLP Yes, I actually used this approach, but I didn't mark it as the answer so as to stay with the original question. Anyway, thanks for your idea. I am glad I've got such great ideas from strangers like you.
  • TLP
    TLP over 12 years
    @quinekxi Many of my answers are not the solutions the OPs asked for, but the one I thought they really wanted. Your question was really "How do I best extract the numbers from this string?" So that's the answer you got. :)
  • quinekxi
    quinekxi over 12 years
    @TLP Sorry for asking the wrong question for the problem I have. I'm still in the process of learning to be a programmer. I'd like you to be my mentor, sort of. LOL, it's just a comment. :D
  • TLP
    TLP over 12 years
    @quinekxi You're always welcome to post new questions. It is a good way to learn.
  • ikegami
    ikegami over 12 years
    @quinekxi, No need to apologise, you didn't do anything wrong. A good reply usually comes from considering the bigger picture. Questions are often too specific.
  • quinekxi
    quinekxi over 12 years
    Thanks though for giving me something to think of and consider another solution.
  • Jacob
    Jacob over 12 years
    Why are you linking to the 5.10.0 version of the page, instead of the version agnostic perldoc.perl.org/perlre.html#Metacharacters ?
  • reinierpost
    reinierpost over 12 years
    @Brad Gilbert: Because that was the first one Google gave me, and I'm using 5.10 myself, and portability can potentially be an issue, and I didn't realize there was a version-agnostic version. Thanks for supplying the link.
  • KingsInnerSoul
    KingsInnerSoul about 10 years
    I know it is an old thread, but I am wondering how I should add []() to the list of delimiters? It seems to get rid of the []() when I just add it there.
  • stevenl
    stevenl about 10 years
    @KingsInnerSoul, Add a backslash in front of each of those, just like I have for the slash above
  • James O'Brien
    James O'Brien over 9 years
    This answer is more general - it can be used with entire words as well
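
Following up on the exchange above about adding []() to the delimiters, here is a minimal sketch of the character class with those characters backslash-escaped. The input string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input containing brackets and parentheses.
my $string = "10:10[10], (12)/1/2011";

# Each bracket and parenthesis is escaped with a backslash
# inside the character class, just like the forward slash.
my @fields = split /[:,\s\/\[\]\(\)]+/, $string;

print "$_\n" for @fields;
```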