Replace multiple words in a string from a list of words

Solution 1

string cleaned = Regex.Replace(input, "\\b" + string.Join("\\b|\\b", BAD_WORDS) + "\\b", "");
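
As the comments further down suggest, the pattern can be built once and each word passed through Regex.Escape. The following is only a sketch of that idea: the class name BadWordFilter and the method name Clean are made up for illustration, and the two-word list is the sample from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class BadWordFilter
{
    // Sample list from the question; the real list is said to be larger.
    static readonly string[] BAD_WORDS = { "xxx", "o2o" };

    // Built once and reused for every call. Regex.Escape protects against
    // entries that contain regex metacharacters such as '.' or '+'.
    static readonly Regex BadWordRegex = new Regex(
        @"\b(?:" + string.Join("|", BAD_WORDS.Select(Regex.Escape)) + @")\b",
        RegexOptions.Compiled);

    // Removes whole-word matches only; the surrounding whitespace is left as is.
    public static string Clean(string input)
    {
        return BadWordRegex.Replace(input, "");
    }
}
```

Because only the word itself is removed, the spaces on either side remain, so a follow-up step may be needed if double spaces matter.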

Solution 2

This is a great task for LINQ and the Split method. Try this:

return string.Join(" ", input.Split(' ').Where(w => !BAD_WORDS.Contains(w)));
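
A self-contained sketch of the same approach is shown below. The class name SplitFilter, the HashSet, and the case-insensitive comparer are assumptions added for illustration; the question does not say whether case should matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SplitFilter
{
    // A HashSet keeps lookups cheap for a list of ~100 words.
    // The case-insensitive comparer is an assumption, not part of the question.
    static readonly HashSet<string> BadWords =
        new HashSet<string>(new[] { "xxx", "o2o" }, StringComparer.OrdinalIgnoreCase);

    public static string Clean(string input)
    {
        // RemoveEmptyEntries drops the empty tokens produced by repeated spaces.
        var kept = input
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !BadWords.Contains(word));
        return string.Join(" ", kept);
    }
}
```

Note that this only splits on spaces, so a bad word followed by punctuation or a newline is not recognised, and runs of spaces in the input are collapsed to one.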

Solution 3

You could use the StartsWith and EndsWith methods, like this:

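foreach (var w in BAD_WORDS)
{
    // Contains(w) alone already covers the StartsWith, EndsWith and IndexOf checks.
    while (input.Contains(w) || input.StartsWith(w) || input.EndsWith(w) || input.IndexOf(w) > 0)
    {
        input = input.Replace(w, " ");
    }
}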

Hope this will fix your problem.

Solution 4

Add a padding space before and after the string variable input. That way the first and last words will be detected.

input = " " + input + " ";

foreach (var word in BAD_WORDS)
{
    string w = string.Format(" {0} ", word);
    while (input.Contains(w))
    {
        input = input.Replace(w, " ");
    }
}

Then trim the string:

input = input.Trim();
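
The same "delimited by spaces or by the ends of the string" rule can also be written as a regex with lookarounds, which avoids the padding step. This is a different technique from the loop above, sketched under assumptions: the class name WhitespaceDelimitedFilter and the Clean signature are hypothetical, and any whitespace (not only a space) counts as a delimiter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WhitespaceDelimitedFilter
{
    public static string Clean(string input, string[] badWords)
    {
        // Escape each word so regex metacharacters are matched literally.
        string alternatives = string.Join("|", badWords.Select(Regex.Escape));

        // (?<!\S) requires whitespace or the start of the string before the word;
        // (?!\S) requires whitespace or the end of the string after it.
        string pattern = @"(?<!\S)(?:" + alternatives + @")(?!\S)";
        string withoutBadWords = Regex.Replace(input, pattern, "");

        // Collapse the whitespace runs left behind, then trim both ends.
        return Regex.Replace(withoutBadWords, @"\s+", " ").Trim();
    }
}
```

For example, Clean("xxx foo xxx xxx oxxx bar o2o", new[] { "xxx", "o2o" }) yields "foo oxxx bar". The final cleanup step also turns newlines and tabs into single spaces, which may or may not be wanted.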

Solution 5

You can store the words of the text in a list, then remove every word that appears in the bad-word list, something like this:

List<string> myWords = input.Split(' ').ToList();
List<string> badWords = GetBadWords();

myWords.RemoveAll(word => badWords.Contains(word));
string Result = string.Join(" ", myWords);

Author: Rafael Herscovici

Updated on June 27, 2022

Comments

  • Rafael Herscovici, almost 2 years ago

    I have a list of words:

    string[] BAD_WORDS = { "xxx", "o2o" }; // My list is actually much bigger, about 100 words

    and I have some text (usually short, 250 words at most) from which I need to remove all the BAD_WORDS.

    I have tried this:

        foreach (var word in BAD_WORDS)
        {
            string w = string.Format(" {0} ", word);
            if (input.Contains(w))
            {
                while (input.Contains(w))
                {
                    input = input.Replace(w, " ");
                }
            }
        }
    

    But if the text starts or ends with a bad word, that word is not removed. I added the spaces so that partial words are not matched: for example, "oxxx" should not be removed, since it is not an exact match for any of the BAD_WORDS.

    Can anyone give me advice on this?

  • Tudor, over 11 years ago
    Don't you mean OR not AND? With your test it must simultaneously start, end and contain the word.
  • Rafael Herscovici, over 11 years ago
    That is a good idea and it will fix my code, but isn't there a nicer solution? The code seems a little weird to me; I wrote it because I had no other idea.
  • shannon, over 11 years ago
    Hold a moment, I missed something... working... There, fixed. :)
  • shannon, over 11 years ago
    Hee... :) Thanks Dementic. Do as I say, not as I do. I was only trying to say that all the nesting and LINQing and looping had a simple older/tried-and-true method.
  • Jon Hanna, over 11 years ago
    As long as spaces suffice. This won't catch the words at the start or end, if followed by a newline, if followed by punctuation etc. If that case needs to be dealt with, the regex-based answers will do a better job.
  • Jon Hanna, over 11 years ago
    +1 for catching words at start or other boundary conditions. As a bonus, if the replace needs to be done multiple times, the regex produced can be cached for repeated use. I'd use Regex.Escape though just in case BAD_WORDS contained something significant to the regex syntax.
  • Tim S., over 11 years ago
    Maybe not perfect code as others have pointed out improvements, but +1 for using regex word boundaries instead of splitting.
  • Rafael Herscovici, over 11 years ago
    This will still catch partial words: with the bad word 'aoooo' and the actual word 'aoooome', it will remove the 'aoooo'.
  • Rafael Herscovici, over 11 years ago
    You are trying to replace w, which you have removed from the code. Without the w, it will also replace partial word matches.
  • Professor Zoom, over 4 years ago
    This is adding extra spaces between words and I don't know why.
  • James Ellis-Jones, over 4 years ago
    The empty string was being joined with a space on both sides to the other items. I've edited the answer (and it's now neater!)