How do I extract words from a comma-delimited string in Perl?

Solution 1

Why not use the split function:

@parts = split(/,/,$myline);

split splits a string into a list of strings using the regular expression you supply as a separator.
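As a minimal sketch of the above, assuming the sample string from the question: list assignment from the split result gives each field its own named variable, which is usually what is wanted from $1, $2, and so on. The names $first and $second are illustrative only.

```perl
use strict;
use warnings;

my $myline = 'ca,cb,cc,cd,ce';

# Split on literal commas; each field becomes one list element.
my @parts = split /,/, $myline;

# List assignment puts the leading fields into named variables
# ($first and $second are hypothetical names for this sketch).
my ($first, $second) = @parts;

print scalar(@parts), " fields: @parts\n";
print "first=$first second=$second\n";
```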

Solution 2

Isn't it easier to use my @parts = split(/,/, $myline)?

Solution 3

Although split is a good way to solve your problem, a capturing regex in list context also works well. It's useful to know about both approaches.

my $line = 'ca,cb,cc,cd,ce';
my @words = $line =~ /(\w+)/g;
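The context matters here: in list context a /g match returns every captured substring, while in scalar context it only reports success. A small sketch of both, using the same sample string; the $count variable is added for illustration.

```perl
use strict;
use warnings;

my $line = 'ca,cb,cc,cd,ce';

# In list context, a /g match returns every captured substring.
my @words = $line =~ /(\w+)/g;

# Assigning through an empty list forces list context on the match
# and yields the number of captures ($count is illustrative).
my $count = () = $line =~ /(\w+)/g;

print "$count words: @words\n";
```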

Solution 4

Look into the CSV modules available on CPAN, e.g. Text::CSV or Text::CSV_XS.

These will get you what you need and also handle comma-separated values that happen to be quoted.

Using these modules makes it easy to split the data out and parse through it.

For example, once a Text::CSV object $csv has parsed a line with $csv->parse($myline):

my @field = $csv->fields;
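A fuller sketch of that call in context, assuming Text::CSV is installed from CPAN. The input line with a quoted comma is a made-up example, and the binary option is a common recommendation rather than something the question requires.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of core Perl

# binary => 1 is commonly recommended so non-ASCII data is accepted.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

# Hypothetical input: the second field contains a quoted comma.
my $line = 'ca,"cb,with comma",cc';

# parse() returns true on success; fields() then returns the parsed values.
$csv->parse($line) or die "Could not parse line: $line";
my @field = $csv->fields;

print "$_\n" for @field;
```

A plain split on commas would break the quoted field in two, which is the case these modules are meant to handle.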

Solution 5

If the number of elements is variable, a single match can't do what you're aiming for: a capture group inside a repeated group keeps only its last match, which is why $1 ends up as 'ce'. Instead, loop through the string using the global flag:

while($myline =~ /(\w+)\b/g) {
    # do something with $1
}

I am going to guess that your real data is more complex than 'ca,cb,cc,cd,ce'. If it isn't, regular expressions probably aren't warranted, and you'd be better off splitting the string on the delimiting character:

my @things = split ',', $myline;
Author by

Dmytro Leonenko

Updated on June 11, 2022

Comments

  • Dmytro Leonenko
    Dmytro Leonenko almost 2 years

    I have a line:

    $myline = 'ca,cb,cc,cd,ce';
    

    I need to match ca into $1, cb into $2, etc..

    Unfortunately

    $myline =~ /(?:(\w+),?)+/;
    

    doesn't work. With pcretest it only captures 'ce' into $1. How do I do it right?
    Do I need to put it into a while loop?

  • Dmytro Leonenko
    Dmytro Leonenko over 14 years
    You're right. It's much better to use split in my case. Why didn't I think of it?
  • Adam Bellaire
    Adam Bellaire over 14 years
    A notable difference is that split will preserve empty entries, giving an empty string in spots where commas are adjacent (trailing empty fields are dropped unless you pass a negative limit). The regex method ignores these places, as they don't contain one or more word characters.
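A small sketch of how the two approaches treat empty fields, using a made-up input with adjacent and trailing commas. With Perl's split, the empty slots come back as empty strings, and trailing ones are kept only when a negative limit is passed.

```perl
use strict;
use warnings;

my $line = 'ca,,cc,,';    # hypothetical input with empty fields

my @by_split = split /,/, $line;        # inner empty field kept, trailing ones dropped
my @keep_all = split /,/, $line, -1;    # negative limit keeps trailing empty fields
my @by_regex = $line =~ /(\w+)/g;       # only non-empty runs of word characters

print scalar(@by_split), " ", scalar(@keep_all), " ", scalar(@by_regex), "\n";
```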