Regex to match a whole word with special characters not working?

Solution 1

If the search term starts or ends with a non-word character, you cannot rely on \b. Use lookarounds for whitespace or the string edges instead:

@"(?<=^|\s)" + pattern + @"(?=\s|$)"

Edit: As Tim mentioned in the comments, your regex fails precisely because \b cannot match between % and the whitespace next to it: both are non-word characters. \b matches only at the boundary between a word character and a non-word character.

See more on word boundaries here.
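
To make the \b behaviour concrete, here is a minimal sketch that compares a term ending in a word character with one ending in %. The class name WordBoundaryDemo and the helper CountWithWordBoundary are made up for illustration; only Regex.Escape and Regex.Matches come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordBoundaryDemo
{
    // Hypothetical helper: counts matches of a literal term wrapped in \b anchors.
    public static int CountWithWordBoundary(string input, string term)
    {
        string pattern = @"\b" + Regex.Escape(term) + @"\b";
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;
    }

    static void Main()
    {
        // "temp" ends in a word character, so \b matches before the following space.
        Console.WriteLine(CountWithWordBoundary("Hi temp dkfsfdf hi", "temp"));   // 1

        // '%' and ' ' are both non-word characters, so the trailing \b cannot match.
        Console.WriteLine(CountWithWordBoundary("Hi temp% dkfsfdf hi", "temp%")); // 0
    }
}
```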

Explanation

@"
(?<=        # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
               # Match either the regular expression below (attempting the next alternative only if this one fails)
      ^           # Assert position at the beginning of the string
   |           # Or match regular expression number 2 below (the entire group fails if this one fails to match)
      \s          # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
)
temp%       # Match the characters “temp%” literally
(?=         # Assert that the regex below can be matched, starting at this position (positive lookahead)
               # Match either the regular expression below (attempting the next alternative only if this one fails)
      \s          # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   |           # Or match regular expression number 2 below (the entire group fails if this one fails to match)
      $           # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
"

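Putting Solution 1 together, a sketch of a whole-word counter might look like the following. It assumes that a "whole word" means a term delimited by whitespace or the ends of the string; the class and method names (WholeWordSearch, CountWholeWord) are illustrative, not from the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class WholeWordSearch
{
    // Hypothetical helper: counts occurrences of a literal term that is
    // delimited by whitespace or by the start/end of the input.
    public static int CountWholeWord(string input, string term)
    {
        string pattern = @"(?<=^|\s)" + Regex.Escape(term) + @"(?=\s|$)";
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;
    }

    static void Main()
    {
        Console.WriteLine(CountWholeWord("Hi temp% dkfsfdf hi", "temp%"));  // 1
        Console.WriteLine(CountWholeWord("Hi temp%x dkfsfdf hi", "temp%")); // 0
        Console.WriteLine(CountWholeWord("(a) b (a)", "(a)"));              // 2
    }
}
```

Only the search term is passed through Regex.Escape; the input string is used as-is.
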
Solution 2

If the pattern can contain characters that are special to Regex, run it through Regex.Escape first.

You already did this, but do not escape the string that you search through. That is unnecessary, and because Regex.Escape also escapes white space, it changes the text you are searching.
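
As a rough illustration of why escaping the input hurts, the sketch below prints what Regex.Escape does to the sample sentence and then runs a whitespace-delimited search against both versions. The names EscapeDemo and CountWholeWord are hypothetical, and the lookaround pattern is the one from Solution 1.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // Hypothetical helper: whitespace-delimited search for a literal term.
    public static int CountWholeWord(string input, string term)
    {
        string pattern = @"(?<=^|\s)" + Regex.Escape(term) + @"(?=\s|$)";
        return Regex.Matches(input, pattern).Count;
    }

    static void Main()
    {
        string raw = "Hi temp% dkfsfdf hi";

        // Regex.Escape puts a backslash in front of every space.
        string escaped = Regex.Escape(raw);
        Console.WriteLine(escaped); // Hi\ temp%\ dkfsfdf\ hi

        // In the escaped text "temp%" is followed by '\', not whitespace.
        Console.WriteLine(CountWholeWord(escaped, "temp%")); // 0
        Console.WriteLine(CountWholeWord(raw, "temp%"));     // 1
    }
}
```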

Solution 3

output = Regex.Replace(output, "(?<!\w)-\w+", "")
output = Regex.Replace(output, " -"".*?""", "")
Author by

Gurucharan Balakuntla Maheshku

Updated on June 04, 2022

Comments

  • Gurucharan Balakuntla Maheshku almost 2 years

    I was going through this question C#, Regex.Match whole words

    It says that to match a whole word you should use "\bpattern\b". This works fine for whole words without any special characters, since it is meant for word characters only!

    I need an expression to match words with special characters also. My code is as follows

    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            string str = Regex.Escape("Hi temp% dkfsfdf hi");
            string pattern = Regex.Escape("temp%");
            var matches = Regex.Matches(str, "\\b" + pattern + "\\b" , RegexOptions.IgnoreCase);
            int count = matches.Count;
        }
    }
    

    But it fails because of %. Is there a workaround for this? There can be other special characters such as space, '(', ')', etc.

  • Tim Pietzcker over 12 years
    True, but not the (only) reason for his problem.
  • Tim Pietzcker over 12 years
    More exactly, if you have non-alphanumeric characters at the start or end of your search word, you can't use \b because that anchor matches between an alnum character and a non-alnum character.
  • Gurucharan Balakuntla Maheshku over 12 years
    @Yadala - Simply superb! It's almost there, except for one problem. Assume the string is "Hi this is stackoverflow" and the pattern is "this "; then it reports no matches. This happens because of the trailing space in the pattern. How can we handle this? Ideally it should report one match!
  • Narendra Yadala over 12 years
    @GuruC If you have white-space in your search string, how can it still be a whole-word search? I just verified this in Notepad++: if I select Whole word search and search for "this " in "Hi this is stackoverflow", it does not give any matches.
  • Gurucharan Balakuntla Maheshku over 12 years
    @Yadala - But I want the behavior that way. Even in Notepad++, if you use the regular expression option in search, it matches spaces too. You can use something like Visual Studio to check the behavior, which works superbly.
  • Narendra Yadala over 12 years
    @GuruC Whole words are bound by some character, generally it is white space in the start and the end. You need to define your whole word boundaries clearly before proceeding. You cannot select both Whole Word and Regular Expression options simultaneously in Notepad++ search.