Using regex to match multiple comma separated words

24,319

Solution 1

This regex

,?[a-zA-Z][a-zA-Z0-9]*,?

This matches 'words' optionally enclosed in commas. No spaces are permitted between a comma and the 'word', and the word must start with a letter (the following characters may be letters or digits).

See here for a demo.
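A minimal JavaScript sketch of this first regex, using String.prototype.match with the g flag (which returns an array of all matches, or null if there are none). The input is the question's sample string with an illustrative extra word, blue, appended to show what happens to a word that has no comma at all:

```javascript
// Solution 1's simple regex; the g flag makes match() return every occurrence.
const optionalCommas = /,?[a-zA-Z][a-zA-Z0-9]*,?/g;

// "blue" is an illustrative addition to the question's sample input.
const simpleMatches = "red, 1 ,yellow, 4, blue".match(optionalCommas);

console.log(simpleMatches); // → ["red,", ",yellow,", "blue"]
```

Because both commas are optional, blue is returned as well even though it touches no comma.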

To ensure that at least one comma is matched, use alternation:

(,[a-zA-Z][a-zA-Z0-9]*|[a-zA-Z][a-zA-Z0-9]*,)
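A minimal JavaScript sketch of the alternation, run with the g flag on the question's sample string plus an illustrative extra word, blue, that has no comma:

```javascript
// Either a comma followed by a word, or a word followed by a comma.
const atLeastOneComma = /(,[a-zA-Z][a-zA-Z0-9]*|[a-zA-Z][a-zA-Z0-9]*,)/g;

const commaMatches = "red, 1 ,yellow, 4, blue".match(atLeastOneComma);
console.log(commaMatches); // → ["red,", ",yellow"]

// A match consumes its comma, so that comma cannot also lead the next word.
const sharedComma = "red,blue".match(atLeastOneComma);
console.log(sharedComma); // → ["red,"]
```

Here blue is no longer matched, and ,yellow keeps only its leading comma because JavaScript takes the first alternative that succeeds rather than the longest one. The second call shows a limitation worth knowing: in red,blue the single comma is used up by red, and blue is not reported.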

Unfortunately, no regex engine that I am aware of supports cascaded matching (applying a second regex only to the substrings matched by a first one). However, since regexes are usually used from within a programming environment, you can match against one regex and feed the matched substrings into further matches. This can be achieved by chaining or iterating function calls, using special delimiter characters to mark the intermediate matches (these characters must be guaranteed not to occur in the input strings).

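Example (JavaScript):

const result = "red, 1 ,yellow, 4, red1, 1yellow yellow"
    // Wrap each preliminary match of the simple regex in angle brackets.
    .replace(/(,?[a-zA-Z][a-zA-Z0-9]*,?)/g, "<$1>")
    // Drop bracketed matches that contain no comma.
    .replace(/<[^,>]+>/g, "")
    // Replace the text between one match and the next (or the end) with a single space.
    .replace(/>[^>]+(<|$)/g, "> $1")
    // Remove any text before the first match.
    .replace(/^[^<]+</g, "<");
console.log(result); // → "<red,> <,yellow,> <red1,> "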

In this example, the simple regex is applied first. The call returns the string with each preliminary match delimited by angle brackets. Matches that do not contain the required substring (a comma, in this case) are then eliminated, as is all the intervening material.
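The same two-stage idea can also be sketched without delimiter characters. This is a swapped-in variant, not the chained replace calls above: it keeps the preliminary matches in an array and runs a second regex over each of them with Array.prototype.filter. The variable names are illustrative.

```javascript
const input = "red, 1 ,yellow, 4, red1, 1yellow yellow";

// Stage 1: the simple regex collects preliminary matches (commas optional).
// match() returns null when nothing matches, hence the fallback to [].
const preliminary = input.match(/,?[a-zA-Z][a-zA-Z0-9]*,?/g) || [];

// Stage 2: keep only the preliminary matches that contain the required comma.
const cascaded = preliminary.filter((m) => /,/.test(m));

console.log(cascaded); // → ["red,", ",yellow,", "red1,"]
```

This yields the same three matches as the chained replace calls, and it does not need to reserve delimiter characters that must never occur in the input.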

This technique might produce code that is easier to maintain than a complicated regex.

However, as a rule of thumb, if your regex gets too complicated to be maintained easily, chances are it wasn't the right tool in the first place. (Many engines do provide the x modifier, which lets you intersperse whitespace, such as line breaks and spaces, and comments at will.)
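JavaScript's RegExp has no x modifier, so one common workaround (a style choice, not something this answer prescribes) is to build the pattern from smaller, commented pieces via their source property. A minimal sketch that rebuilds the alternation regex this way; the names word and commaWordPattern are illustrative:

```javascript
// A word: one letter followed by any number of letters or digits.
const word = /[a-zA-Z][a-zA-Z0-9]*/;

const commaWordPattern = new RegExp(
  [
    "," + word.source, // a comma followed by a word
    word.source + ",", // a word followed by a comma
  ].join("|"),
  "g"
);

console.log(commaWordPattern.source); // → ",[a-zA-Z][a-zA-Z0-9]*|[a-zA-Z][a-zA-Z0-9]*,"

const composedMatches = "red, 1 ,yellow, 4".match(commaWordPattern);
console.log(composedMatches); // → ["red,", ",yellow"]
```

Concatenating source strings is only safe when each piece is a self-contained pattern, as it is here; pieces containing their own top-level alternation would need to be wrapped in a group first.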

Solution 2

The issues with your expression are:
- \w is equivalent to [a-zA-Z0-9_], so it also matches digits (and the underscore), which you do not want.
- The comma is only allowed at the end, so it will match foo, but not ,foo.
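Both issues can be reproduced with a short JavaScript sketch of the question's pattern; the g flag is an addition so that match() returns every occurrence:

```javascript
const original = /([\w]+,)/g;

// Issue 1: \w also matches digits, so "1," is reported alongside the words.
const withDigits = "red,1,yellow,4".match(original); // → ["red,", "1,", "yellow,"]

// Issue 2: the comma is only allowed after the word.
const trailing = "foo,".match(original); // → ["foo,"]
const leading = ",foo".match(original); // → null

console.log(withDigits, trailing, leading);
```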

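To fix this, you can use something like (,\s*[a-z]+)|([a-z]+\s*,), which matches a run of lowercase letters with a comma (optionally separated by whitespace) either before or after it. An example is available here.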
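A minimal JavaScript sketch of this pattern on the question's sample input, with the g flag added. The capitalized input and the i flag are illustrative additions, not part of the answer:

```javascript
// Solution 2's pattern, with the g flag added so match() returns every occurrence.
const lowercaseCommaWord = /(,\s*[a-z]+)|([a-z]+\s*,)/g;
const found = "red, 1 ,yellow, 4".match(lowercaseCommaWord); // → ["red,", ",yellow"]

// [a-z] is lowercase-only and there are no word boundaries, so capitalized
// words are matched only partially.
const partial = "Red, 1 ,Yellow, 4".match(lowercaseCommaWord); // → ["ed,", "ellow,"]

// Adding the i flag makes [a-z] accept capitals as well.
const foundMixedCase = "Red, 1 ,Yellow, 4".match(/(,\s*[a-z]+)|([a-z]+\s*,)/gi); // → ["Red,", ",Yellow"]

console.log(found, partial, foundMixedCase);
```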

Author by

user3153443

Updated on June 11, 2020

Comments

  • user3153443, about 4 years ago

    I am trying to find the appropriate regex pattern that allows me to pick out whole words either starting with or ending with a comma, but leave out numbers. I've come up with ([\w]+,) which matches the first word followed by a comma, so in something like:

    red,1,yellow,4

    red, will match, but I am trying to find a solution that will also match in a string like the following:

    red, 1 ,yellow, 4

    I haven't been able to find anything that can break strings up like this, but hopefully you'll be able to help!