C# - Regex Match whole words
Solution 1
If you were looking for all words including 'TEST', you should use
@"(?<TM>\w*TEST\w*)"
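\w matches word characters. It is often summarized as [A-Za-z0-9_], but in .NET it also matches non-ASCII letters and digits (for example é); it is restricted to [a-zA-Z_0-9] only when RegexOptions.ECMAScript is set.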
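A minimal sketch of Solution 1, assuming a console project that uses C# 9 top-level statements. The newline-separated input mirrors the question's extended sample string, and the named group TM is read back with Groups["TM"]:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question, one entry per line
string input = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";

// Solution 1's pattern: the named group TM captures the whole word around TEST
var regex = new Regex(@"(?<TM>\w*TEST\w*)");

string[] words = regex.Matches(input)
    .Cast<Match>()
    .Select(m => m.Groups["TM"].Value)
    .ToArray();

Console.WriteLine(string.Join(" ", words)); // MYTESTING YOUTESTED TESTING
```

Because \w* cannot cross the '.' separator, each match stops at the edges of a word.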
Solution 2
Keep it simple: why not just try \w*TEST\w* as the match pattern?
Solution 3
I get the results you are expecting with the following:
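using System.Text.RegularExpressions;

string s = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
// IgnoreCase also lets lower-case words such as "mytesting" match
var m = Regex.Matches(s, @"(\w*TEST\w*)", RegexOptions.IgnoreCase);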
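A sketch of how the MatchCollection returned above might be consumed, assuming C# 9 top-level statements. The second pair of calls uses a made-up lower-case string to show that RegexOptions.IgnoreCase only matters when the input may not be upper case:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string s = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
var m = Regex.Matches(s, @"(\w*TEST\w*)", RegexOptions.IgnoreCase);

// Each Match.Value is one whole word containing TEST
var found = new List<string>();
foreach (Match match in m)
{
    found.Add(match.Value);
}
Console.WriteLine(string.Join(",", found)); // MYTESTING,YOUTESTED,TESTING

// Hypothetical lower-case input: only the IgnoreCase call finds the word
int withoutFlag = Regex.Matches("abc.mytesting", @"\w*TEST\w*").Count;
int withFlag = Regex.Matches("abc.mytesting", @"\w*TEST\w*", RegexOptions.IgnoreCase).Count;
Console.WriteLine($"{withoutFlag} {withFlag}"); // 0 1
```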
Solution 4
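Try using \b. It's the regex anchor for a word boundary: a zero-width position between a word character and a non-word character (or the start or end of the string). If you wanted to match both words you could use: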
/\b[a-z]+\b/i
BTW, .NET doesn't need the surrounding /, and the i is just a case-insensitive match flag.
.NET Alternative:
var re = new Regex(@"\b[a-z]+\b", RegexOptions.IgnoreCase);
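On its own, \b[a-z]+\b matches every word in the sample, not only the ones containing TEST. The sketch below (C# 9 top-level statements assumed) shows that, then combines the word-boundary idea with the TEST requirement; the combined pattern \b[a-z]*test[a-z]*\b is an illustrative variation, not part of the original answer:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string s = "ABC.MYTESTING XYZ.YOUTESTED ANY.TESTING";

// Solution 4's pattern: every run of letters bounded by word boundaries
var re = new Regex(@"\b[a-z]+\b", RegexOptions.IgnoreCase);
string[] allWords = re.Matches(s).Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(" ", allWords)); // ABC MYTESTING XYZ YOUTESTED ANY TESTING

// Variation: require "test" somewhere inside the bounded word
var withTest = new Regex(@"\b[a-z]*test[a-z]*\b", RegexOptions.IgnoreCase);
string[] testWords = withTest.Matches(s).Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(" ", testWords)); // MYTESTING YOUTESTED TESTING
```

Unlike \w, the [a-z] class also keeps digits and underscores out of the match.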
Author: tvr
Updated on June 11, 2022
Comments
-
tvr, almost 2 years ago:
I need to match all the whole words containing a given string.
string s = "ABC.MYTESTING XYZ.YOUTESTED ANY.TESTING";
Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);
I need the result to be:
MYTESTING YOUTESTED TESTING
But I get:
TESTING TESTED .TESTING
How do I achieve this with regular expressions?
Edit: Extended sample string.
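To see where the unexpected results come from, here is a sketch that runs the question's pattern (C# 9 top-level statements assumed; the elided RegexOptions are left out). It assumes the words sit on separate lines, as in Solution 3's string. [!\..] is a positive class containing '!' and '.', and .* runs to the end of the line, so each match begins at the first TEST and can pick up a leading dot:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Extended sample, one entry per line (assumed from Solution 3's version)
string s = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";

// The question's pattern, written as a verbatim string so \. reaches the regex engine
var r = new Regex(@"(?<TM>[!\..]*TEST.*)");
string[] got = r.Matches(s).Cast<Match>().Select(m => m.Groups["TM"].Value).ToArray();
Console.WriteLine(string.Join(" ", got)); // TESTING TESTED .TESTING

// On a single line, .* swallows everything after the first TEST
string oneLine = "ABC.MYTESTING XYZ.YOUTESTED ANY.TESTING";
Console.WriteLine(r.Match(oneLine).Value); // TESTING XYZ.YOUTESTED ANY.TESTING
```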
-
mousio, about 13 years ago: This matches a 1-letter word, not both words.
-
tvr, about 13 years ago: Hmm. How do I specify that? I tried this, but it doesn't work: Regex r = new Regex("\b(?<TM>[!\..]*TEST.*)\b", ...);
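One detail worth checking in the attempt above: in a regular (non-verbatim) C# string literal, \b is the backspace escape (U+0008), so the regex engine never sees a word-boundary anchor, and \. is not a valid C# escape at all. A small sketch, assuming C# 9 top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

string regular = "\b";    // C# escape: one backspace character (U+0008)
string verbatim = @"\b";  // two characters, backslash and 'b': the regex word boundary
string doubled = "\\b";   // same two characters as the verbatim form

Console.WriteLine(regular.Length);      // 1
Console.WriteLine(verbatim.Length);     // 2
Console.WriteLine(verbatim == doubled); // True

// Verbatim form: \b acts as a word boundary
bool boundaryWorks = Regex.IsMatch("ANY.TESTING", @"\bTESTING\b");
// Regular literal: the pattern contains literal backspace characters instead
bool backspaceWorks = Regex.IsMatch("ANY.TESTING", "\bTESTING\b");
Console.WriteLine($"{boundaryWorks} {backspaceWorks}"); // True False
```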
-
Brad Christie, about 13 years ago: @mousio: Indeed, I did miss a quantifier, but it will match both words.
-
Brad Christie, about 13 years ago: @tvr: Also, if you want only words starting with "TEST", use \btest[a-z]+\b, e.g. ideone.com/8KNQz
-
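A sketch of Brad Christie's \btest[a-z]+\b suggestion above, assuming C# 9 top-level statements and RegexOptions.IgnoreCase (the lower-case pattern only matches the upper-case sample with that flag). It also shows that the + quantifier requires at least one letter after "test":

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string s = "ABC.MYTESTING XYZ.YOUTESTED ANY.TESTING";

// Words that start with "test": \b anchors the start, [a-z]+ needs at least one more letter
var startsWith = new Regex(@"\btest[a-z]+\b", RegexOptions.IgnoreCase);
string[] hits = startsWith.Matches(s).Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(" ", hits)); // TESTING

// Because of the +, the bare word "TEST" is not matched; [a-z]* would accept it
bool bareWithPlus = startsWith.IsMatch("TEST");
bool bareWithStar = Regex.IsMatch("TEST", @"\btest[a-z]*\b", RegexOptions.IgnoreCase);
Console.WriteLine($"{bareWithPlus} {bareWithStar}"); // False True
```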
tvr, about 13 years ago: @Brad: Thanks for the sample code. This is a small part of a larger regular expression and I cannot change it now.
-
mousio, about 13 years ago: +1 for pointing me to an online mini IDE and debugging tool, and your first sentence was the best answer to the OP's original question.
-
Brad Christie, about 13 years ago: @mousio: Sometimes less is more. ;-)
-
Brad Christie, about 13 years ago: @tvr: Be aware that \w matches [0-9a-zA-Z_]. If you don't want numbers or underscores, stick with the \b-based pattern.
-
Alan Moore, about 13 years ago: The Multiline option isn't needed here, but IgnoreCase might be. And regarding [!\..]*, see my answer.
-
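Regarding Alan Moore's remark about [!\..]*: in .NET a character class is negated only by a ^ immediately after the opening bracket, so [!\..] is a positive class containing '!' and '.' (the dot is simply listed twice). A sketch, assuming C# 9 top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// The question's class: matches '!' or '.', nothing else
var fromQuestion = new Regex(@"^[!\..]$");
bool bang = fromQuestion.IsMatch("!");
bool dot = fromQuestion.IsMatch(".");
bool letter = fromQuestion.IsMatch("M");
Console.WriteLine($"{bang} {dot} {letter}"); // True True False

// Negated class: ^ right after [ means "any character except '.'"
var negated = new Regex(@"^[^.]$");
Console.WriteLine($"{negated.IsMatch("M")} {negated.IsMatch(".")}"); // True False
```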
Alan Moore, about 13 years ago: +1 for verbatim strings and a (probably) correct regex, but RegexOptions.Multiline serves no purpose here.
-
manojlds, about 13 years ago: Yes, but I was just going with the pattern provided by the OP. The other patterns provided are better.
-
Alan Moore, about 13 years ago: @Brad: It matches a lot more than that, but the important thing is that it doesn't match non-word characters.
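A sketch of Alan Moore's point that \w matches more than [0-9a-zA-Z_], assuming C# 9 top-level statements. By default .NET's \w is Unicode-aware; with RegexOptions.ECMAScript it is limited to [a-zA-Z_0-9]:

```csharp
using System;
using System.Text.RegularExpressions;

// Default .NET behaviour: \w covers letters and digits beyond ASCII, plus underscore
bool accented = Regex.IsMatch("\u00E9", @"^\w$");   // é
bool underscore = Regex.IsMatch("_", @"^\w$");
bool digit = Regex.IsMatch("7", @"^\w$");
bool period = Regex.IsMatch(".", @"^\w$");          // non-word character
Console.WriteLine($"{accented} {underscore} {digit} {period}"); // True True True False

// ECMAScript mode: \w is restricted to [a-zA-Z_0-9]
bool accentedEcma = Regex.IsMatch("\u00E9", @"^\w$", RegexOptions.ECMAScript);
Console.WriteLine(accentedEcma); // False
```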
-
Alan Moore, about 13 years ago: Take it from me: it's never a good idea to use regexes from a question without validating them. Or any code, for that matter. Or from other answers. I've been burned that way too many times. :-/
-
arcain, about 13 years ago: @Alan: Right you are, and it's now removed. That snuck in from my LINQPad script.
-
Alan Moore, about 13 years ago: Yeah, RegexBuddy always sneaks that in, too. Very annoying.