Replace multiple words in a string from a list of words
Solution 1
string cleaned = Regex.Replace(input, "\\b" + string.Join("\\b|\\b", BAD_WORDS) + "\\b", "");
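A slightly more defensive variant of the same regex idea, offered as a sketch rather than a definitive implementation. It escapes each word with Regex.Escape in case an entry contains regex metacharacters, builds the pattern once so it can be reused, and collapses the whitespace left behind by removed words. The class name BadWordFilter, the method name Clean, and the whitespace cleanup are assumptions added for illustration, and the two-word list is only the sample from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BadWordFilter
{
    // Sample list from the question; the real list is said to hold about 100 words.
    private static readonly string[] BAD_WORDS = { "xxx", "o2o" };

    // Built once and reused. Regex.Escape protects against entries that
    // contain characters with special meaning in regex syntax.
    private static readonly Regex BadWordPattern = new Regex(
        @"\b(?:" + string.Join("|", BAD_WORDS.Select(w => Regex.Escape(w))) + @")\b",
        RegexOptions.Compiled);

    // Hypothetical helper: removes whole-word matches, then tidies the spacing.
    public static string Clean(string input)
    {
        string stripped = BadWordPattern.Replace(input, "");
        // Collapse the runs of whitespace that removed words leave behind.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

With this sketch, BadWordFilter.Clean("xxx hello o2o world oxxx") should leave "oxxx" alone, because \b only matches at word boundaries. Note that \b relies on word characters, so entries that begin or end with punctuation would need a different boundary test.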
Solution 2
This is a great task for LINQ and the Split method. Try this:
return string.Join(" ", input.Split(' ').Where(w => !BAD_WORDS.Contains(w)));
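For context, here is that one-liner wrapped in a complete method, as a minimal sketch. The class name SplitFilter, the method name RemoveWords, the HashSet lookup, and the case-insensitive comparison are all assumptions that go beyond the original answer. StringSplitOptions.RemoveEmptyEntries drops the empty strings produced by consecutive spaces, so they are not joined back in as extra spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitFilter
{
    // Hypothetical helper: removes every word of input that appears in badWords.
    public static string RemoveWords(string input, IEnumerable<string> badWords)
    {
        // Assumption: matching should ignore case. Use StringComparer.Ordinal
        // instead if exact case is required.
        var blocked = new HashSet<string>(badWords, StringComparer.OrdinalIgnoreCase);

        // RemoveEmptyEntries skips the empty strings produced by repeated spaces.
        var kept = input
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => !blocked.Contains(w));

        return string.Join(" ", kept);
    }
}
```

Because this splits on spaces only, a bad word followed directly by punctuation or a newline would not be matched. The HashSet is a small design choice that keeps each lookup cheap when the list grows to around 100 words.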
Solution 3
You could use the StartsWith and EndsWith methods, like this:
while (input.Contains(w) || input.StartsWith(w) || input.EndsWith(w) || input.IndexOf(w) > 0)
{
    input = input.Replace(w, " ");
}
Hope this will fix your problem.
Solution 4
Put a fake space before and after the string variable input. That way it will detect the first and last words.
input = " " + input + " ";
foreach (var word in BAD_WORDS)
{
    string w = string.Format(" {0} ", word);
    if (input.Contains(w))
    {
        while (input.Contains(w))
        {
            input = input.Replace(w, " ");
        }
    }
}
Then trim the string:
input = input.Trim();
Solution 5
You can store the words from the text in a list, then remove every word that appears in the bad-word list, something like this:
List<string> myWords = input.Split(' ').ToList();
List<string> badWords = GetBadWords();
myWords.RemoveAll(word => badWords.Contains(word));
string Result = string.Join(" ", myWords);
Rafael Herscovici
Updated on June 27, 2022
Comments
-
Rafael Herscovici almost 2 years
I have a list of words:
string[] BAD_WORDS = { "xxx", "o2o" }; // My list is actually a lot bigger, about 100 words
and I have some text (usually short, max 250 words) from which I need to remove all the BAD_WORDS. I have tried this:
foreach (var word in BAD_WORDS)
{
    string w = string.Format(" {0} ", word);
    if (input.Contains(w))
    {
        while (input.Contains(w))
        {
            input = input.Replace(w, " ");
        }
    }
}
But if the text starts or ends with a bad word, it will not be removed. I added the spaces so that it will not match partial words; for example, "oxxx" should not be removed, since it is not an exact match for any of the BAD_WORDS.
Can anyone give me advice on this?
-
Tudor over 11 years
Don't you mean OR not AND? With your test it must simultaneously start, end and contain the word.
-
Rafael Herscovici over 11 years
That is a good idea, and it will fix my code, but isn't there a nicer solution to this? The code seems a little weird to me; I wrote it because I had no other idea.
-
shannon over 11 years
Hold a moment, I missed something... working... There, fixed. :)
-
shannon over 11 years
Hee... :) Thanks Dementic. Do as I say, not as I do. I was only trying to say that all the nesting and LINQing and looping had a simple older/tried-and-true method.
-
Jon Hanna over 11 years
As long as spaces suffice. This won't catch the words at the start or end, if followed by a newline, if followed by punctuation, etc. If that case needs to be dealt with, the regex-based answers will do a better job.
-
Jon Hanna over 11 years
+1 for catching words at start or other boundary conditions. As a bonus, if the replace needs to be done multiple times, the regex produced can be cached for repeated use. I'd use Regex.Escape though, just in case BAD_WORDS contained something significant to the regex syntax.
-
Tim S. over 11 years
Maybe not perfect code as others have pointed out improvements, but +1 for using regex word boundaries instead of splitting.
-
Rafael Herscovici over 11 years
This will still catch partial words (bad word = 'aoooo', actual word = 'aoooome'): it will remove the 'aoooo'.
-
Rafael Herscovici over 11 years
You are trying to replace w, which you have removed from the code. Without the w, it will replace partial word matches also.
-
Professor Zoom over 4 years
This is adding extra spaces between words and I don't know why.
-
James Ellis-Jones over 4 years
The empty string was being joined with a space on both sides to the other items. I've edited the answer (and it's now neater!)