C# - Regex Match whole words

Solution 1

If you are looking for all words that contain 'TEST', you can use:

@"(?<TM>\w*TEST\w*)"
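Here is a minimal sketch of reading the named group TM back out of each match. It assumes .NET 6+ top-level statements and uses the sample string from the question. The helper name WordsContainingTest is hypothetical and exists only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample input from the question; a verbatim string keeps the line breaks literal.
string sample = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";

// Hypothetical helper: collects the text captured by the named group "TM" for every match.
List<string> WordsContainingTest(string input)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(input, @"(?<TM>\w*TEST\w*)"))
    {
        results.Add(m.Groups["TM"].Value);
    }
    return results;
}

List<string> words = WordsContainingTest(sample);
Console.WriteLine(string.Join(", ", words)); // MYTESTING, YOUTESTED, TESTING
```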

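\w matches word characters. For plain ASCII input that is [A-Za-z0-9_], but in .NET \w is Unicode-aware and also matches letters and digits from other scripts, unless RegexOptions.ECMAScript is specified.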
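To illustrate the Unicode point, here is a small sketch with a made-up input, "TEST" followed by É (U+00C9), assuming .NET 6+ top-level statements. By default \w treats É as a word character. RegexOptions.ECMAScript restricts \w to [a-zA-Z_0-9]:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: "TEST" followed by the non-ASCII letter É (U+00C9).
string input = "TEST\u00C9";

// Default .NET behaviour: \w is Unicode-aware, so É is part of the match.
string unicodeMatch = Regex.Match(input, @"\w*TEST\w*").Value;

// ECMAScript mode: \w means [a-zA-Z_0-9], so the match stops before É.
string asciiMatch = Regex.Match(input, @"\w*TEST\w*", RegexOptions.ECMAScript).Value;

Console.WriteLine($"{unicodeMatch.Length} {asciiMatch.Length}"); // 5 4
```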

Solution 2

Keep it simple: try \w*TEST\w* as the match pattern.
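Without a capture group, each Match's Value is already the whole word. The sketch below collects the words into an array with LINQ. It assumes .NET 6+ top-level statements and uses the question's sample with \n line breaks:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string sample = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";

// No group needed: Match.Value is the full word containing TEST.
string[] words = Regex.Matches(sample, @"\w*TEST\w*")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", words)); // MYTESTING, YOUTESTED, TESTING
```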

Solution 3

I get the results you are expecting with the following:

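using System;
using System.Text.RegularExpressions;

string s = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";

var m = Regex.Matches(s, @"(\w*TEST\w*)", RegexOptions.IgnoreCase);
foreach (Match match in m)
    Console.WriteLine(match.Value); // MYTESTING, YOUTESTED, TESTING (one per line)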
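Whether RegexOptions.IgnoreCase matters depends on the input. Here is a quick sketch using a hypothetical lower-case version of the sample, assuming .NET 6+ top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical lower-case variant of the question's sample.
string lower = "abc.mytesting\nxyz.youtested\nany.testing";

// Case-sensitive by default: the literal TEST does not match "test".
int caseSensitive = Regex.Matches(lower, @"(\w*TEST\w*)").Count;

// With IgnoreCase the same pattern finds all three words.
int caseInsensitive = Regex.Matches(lower, @"(\w*TEST\w*)", RegexOptions.IgnoreCase).Count;

Console.WriteLine($"{caseSensitive} vs {caseInsensitive}"); // 0 vs 3
```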

Solution 4

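Try using \b. It is not a flag but a zero-width word-boundary assertion: it matches the position between a word character and a non-word character (or the start or end of the input), without consuming any characters. If you wanted to match both words you could use: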

/\b[a-z]+\b/i

BTW, .NET doesn't use the surrounding / delimiters, and the trailing i is just the case-insensitive flag (RegexOptions.IgnoreCase in .NET).
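To see that \b consumes nothing, this sketch lists the positions where \b matches in one line of the sample. Each match is an empty string, and only its Index is meaningful. It assumes .NET 6+ top-level statements:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "ABC.MYTESTING";

// Every \b match has length 0; Index is the boundary position in the string.
int[] boundaries = Regex.Matches(text, @"\b")
    .Cast<Match>()
    .Select(m => m.Index)
    .ToArray();

Console.WriteLine(string.Join(", ", boundaries)); // 0, 3, 4, 13
```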

.NET equivalent:

using System.Text.RegularExpressions;
var re = new Regex(@"\b[a-z]+\b", RegexOptions.IgnoreCase);
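Applied to the question's sample, \b[a-z]+\b returns every word, not only the ones containing TEST. This sketch compares it with the \btest[a-z]+\b variant Brad Christie suggests in the comments, which keeps only words that start with TEST. It assumes .NET 6+ top-level statements:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string sample = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";

// Every run of letters bounded by \b on both sides.
var re = new Regex(@"\b[a-z]+\b", RegexOptions.IgnoreCase);
string[] allWords = re.Matches(sample).Cast<Match>().Select(m => m.Value).ToArray();

// Only words that begin with TEST (the \b must sit right before the T).
var startsWithTest = new Regex(@"\btest[a-z]+\b", RegexOptions.IgnoreCase);
string[] testWords = startsWithTest.Matches(sample).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", allWords));  // ABC, MYTESTING, XYZ, YOUTESTED, ANY, TESTING
Console.WriteLine(string.Join(", ", testWords)); // TESTING
```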
Author: tvr

Updated on June 11, 2022

Comments

  • tvr
    tvr almost 2 years

    I need to match all the whole words containing a given string.

    string s = @"ABC.MYTESTING
    XYZ.YOUTESTED
    ANY.TESTING";
    
    Regex r = new Regex(@"(?<TM>[!\..]*TEST.*)", ...);
    MatchCollection mc = r.Matches(s);
    

    I need the result to be:

    MYTESTING
    YOUTESTED
    TESTING
    

    But I get:

    TESTING
    TESTED
    .TESTING
    

    How do I achieve this with regular expressions?

    Edit: Extended sample string.

  • mousio
    mousio about 13 years
    This matches a 1-letter word, not both words.
  • tvr
    tvr about 13 years
    Hmm. How do I specify that? I tried this, but it doesn't work: Regex r = new Regex("\b(?<TM>[!\..]*TEST.*)\b", ...);
  • Brad Christie
    Brad Christie about 13 years
    @mousio: Indeed, I did miss a quantifier, but it will now match both words.
  • Brad Christie
    Brad Christie about 13 years
    @tvr: Also, if you want only words starting with "TEST", use \btest[a-z]+\b, e.g. ideone.com/8KNQz
  • tvr
    tvr about 13 years
    @Brad Thanks for the sample code. This is a small part of a larger regular expression and I cannot change it now.
  • mousio
    mousio about 13 years
    +1 for pointing me to an online mini IDE and debugging tool – and your first sentence was the best answer to the OP's original question
  • Brad Christie
    Brad Christie about 13 years
    @mousio: Sometimes less is more. ;-)
  • Brad Christie
    Brad Christie about 13 years
    @tvr: be aware \w matches [0-9a-zA-Z_]. If you don't want numbers or underscores, stick with \b.
  • Alan Moore
    Alan Moore about 13 years
    The Multiline option isn't needed here, but IgnoreCase might be. And regarding [!\..]*, see my answer.
  • Alan Moore
    Alan Moore about 13 years
    +1 for verbatim strings and a (probably) correct regex, but RegexOptions.Multiline serves no purpose here.
  • manojlds
    manojlds about 13 years
    Yes, but I was just going with the pattern provided by the OP. The other patterns provided are better.
  • Alan Moore
    Alan Moore about 13 years
    @Brad: It matches a lot more than that, but the important thing is that it doesn't match non-word characters.
  • Alan Moore
    Alan Moore about 13 years
    Take it from me: it's never a good idea to use regexes from a question without validating them. Or any code, for that matter. Or from other answers. I've been burned that way too many times. :-/
  • arcain
    arcain about 13 years
    @alan Right you are, and now removed. That snuck in from my LINQPad script.
  • Alan Moore
    Alan Moore about 13 years
    Yeah, RegexBuddy always sneaks that in, too. Very annoying.
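A likely reason tvr's attempt in the comments, new Regex("\b(?<TM>[!\..]*TEST.*)\b", ...), does not work is that it uses a regular (non-verbatim) C# string literal. In a regular literal, \b is the backspace escape (U+0008), so the regex engine never sees a word boundary. The \. in the same literal is not a valid C# escape at all, so the compiler rejects it. A verbatim string (@"...") or doubled backslashes ("\\b") avoid both problems. This is a minimal sketch with a made-up input "A TEST", assuming .NET 6+ top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Regular literal: \b becomes a single backspace character (U+0008).
string regular = "\bTEST\b";
// Verbatim literal: the backslash survives, so the regex sees the \b anchor.
string verbatim = @"\bTEST\b";

bool regularMatches = Regex.IsMatch("A TEST", regular);   // searches for literal backspaces
bool verbatimMatches = Regex.IsMatch("A TEST", verbatim); // word boundary on both sides

Console.WriteLine($"{regular.Length} {verbatim.Length}"); // 6 8
Console.WriteLine($"{regularMatches} {verbatimMatches}"); // False True
```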