How can I strip punctuation from a string?
Solution 1
// Requires: using System.Linq;
string stripped = new string(myCharCollection.Where(c => !char.IsPunctuation(c)).ToArray());
Solution 2
Why not simply:
string s = "sxrdct?fvzguh,bij.";
var sb = new StringBuilder();
foreach (char c in s)
{
    if (!char.IsPunctuation(c))
        sb.Append(c);
}
s = sb.ToString();
Regex is normally slower than simple char operations, and those LINQ operations look like overkill to me. You also can't use LINQ code in .NET 2.0.
Solution 3
Describes intent, easiest to read (IMHO) and best performing:
s = s.StripPunctuation();
To implement it:
public static class StringExtension
{
    public static string StripPunctuation(this string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (!char.IsPunctuation(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
This uses Hades32's algorithm, which was the best performing of the bunch posted.
Solution 4
Assuming "best" means "simplest", I suggest using something like this:
String stripped = input.replaceAll("\\p{Punct}+", "");
This example is for Java, but all sufficiently modern Regex engines should support this (or something similar).
Edit: the Unicode-aware version would be this:
String stripped = input.replaceAll("\\p{P}+", "");
The first version only looks at punctuation characters contained in ASCII.
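To make the difference between the two patterns concrete, here is a minimal sketch that runs both on the same input. The class name, method names, and sample string are made up for illustration; only `String.replaceAll` and the two character classes come from the answer above.

```java
public class StripPunctuationDemo {

    // ASCII-only: \p{Punct} is the POSIX class !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    static String stripAsciiPunctuation(String input) {
        return input.replaceAll("\\p{Punct}+", "");
    }

    // Unicode-aware: \p{P} is the Unicode general category "Punctuation"
    static String stripUnicodePunctuation(String input) {
        return input.replaceAll("\\p{P}+", "");
    }

    public static void main(String[] args) {
        // Sample input containing U+2026 (horizontal ellipsis) and
        // U+00BF (inverted question mark), written as escapes so the
        // source file stays pure ASCII.
        String input = "Hello, world\u2026 \u00BF$5?";
        System.out.println(stripAsciiPunctuation(input));
        System.out.println(stripUnicodePunctuation(input));
    }
}
```

With Java's default regex flags, the first call should leave the ellipsis and the inverted question mark in place but remove the `$`, while the second removes the ellipsis and the inverted question mark but keeps the `$`. Unicode classifies `$` as a symbol, not as punctuation, so neither pattern is a strict superset of the other.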
Solution 5
You can use the Regex.Replace method:
Regex.Replace(yourString, regularExpressionWithPunctuationMarks, emptyString)
Since this returns a string, your method will look something like this:
string s = Regex.Replace("Hello!?!?!?!", "[?!]", "");
You can replace "[?!]" with something more sophisticated if you want:
(\p{P})
This should find any punctuation.
Adam
Updated on July 05, 2022
Comments
-
Adam almost 2 years
For the hope-to-have-an-answer-in-30-seconds part of this question, I'm specifically looking for C#.
But in the general case, what's the best way to strip punctuation in any language?
I should add: Ideally, the solutions won't require you to enumerate all the possible punctuation marks.
Related: Strip Punctuation in Python
-
Adam over 15 years
Yup. It's powering the string operation I posted below.
-
Adam over 15 years
I know, right? A hobby of mine is committing sins against code in LINQ. But please, by all means, make it better.
-
Brian Low almost 14 years
Interesting tidbit: the following are not punctuation: $^+|<>=
-
Tom Anderson over 13 years
Please seek psychiatric help.
-
Tom Anderson over 13 years
+1 for using a Unicode character class. Concise, precise, and nice.
-
Chris Marisic over 12 years
Why the IEnumerable<char> to array to bytes to string conversion? Why not just new String(s.ToArray())? Or is that what new string will do under the hood anyway?
-
Dermot over 11 years
LINQ never ceases to amaze me.
-
Clément over 11 years
That's quadratic in the length of s; if you double the length, the code will be four times slower, because the + operator for string has to make a copy of the string :/
-
Saeed Neamati over 10 years
Brilliant. Less is more.
-
Admin over 10 years
Note that this approach also lets you replace punctuation with (for example) whitespace. Useful for tokenizing.
-
Stuart Dobson about 10 years
Doesn't work on $ or ^, maybe more. I'm sticking with ^[a-zA-Z][a-zA-Z0-9]*$
-
dnennis over 8 years
PhoneNumberTextBox.Text = new string(PhoneNumberTextBox.Text.Where(c => !char.IsPunctuation(c)).ToArray()).Replace(" ","");
-
Razvan Dumitru about 8 years
For $ or ^ you can use !char.IsSymbol(c) validation. Just for the record.
-
JProgrammer over 6 years
C# doesn't have the Punct class, but it does have P