How can I strip punctuation from a string?

Solution 1

// Requires: using System.Linq;
string stripped = new string(myCharCollection.Where(c => !char.IsPunctuation(c)).ToArray());

Solution 2

Why not simply:

string s = "sxrdct?fvzguh,bij.";
var sb = new StringBuilder();

foreach (char c in s)
{
   if (!char.IsPunctuation(c))
      sb.Append(c);
}

s = sb.ToString();

Regex is usually slower than simple char operations, and the LINQ versions look like overkill to me. LINQ also isn't available in .NET 2.0...

Solution 3

This describes intent, is the easiest to read (IMHO), and performs best:

 s = s.StripPunctuation();

To implement it:

public static class StringExtension
{
    public static string StripPunctuation(this string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (!char.IsPunctuation(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}

This uses Hades32's algorithm, which was the best performing of those posted.

Solution 4

Assuming "best" means "simplest", I suggest using something like this:

String stripped = input.replaceAll("\\p{Punct}+", "");

This example is for Java, but all sufficiently modern Regex engines should support this (or something similar).

Edit: the Unicode-aware version would be this:

String stripped = input.replaceAll("\\p{P}+", "");

The first version only matches the ASCII punctuation characters.
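
To make the difference between the two patterns concrete, here is a minimal sketch in Java. The class name PunctuationStripper and its method names are made up for illustration; only java.util.regex.Pattern is a real API. It assumes default pattern flags, under which \p{Punct} is the POSIX ASCII class and \p{P} is the Unicode punctuation category.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class PunctuationStripper {
    // POSIX class: ASCII punctuation only (this set includes $, ^ and +).
    private static final Pattern ASCII_PUNCT = Pattern.compile("\\p{Punct}+");
    // Unicode general category P: punctuation in any script, but not symbols.
    private static final Pattern UNICODE_PUNCT = Pattern.compile("\\p{P}+");

    static String stripAscii(String input) {
        return ASCII_PUNCT.matcher(input).replaceAll("");
    }

    static String stripUnicode(String input) {
        return UNICODE_PUNCT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Inverted question mark (U+00BF) at the start, ASCII '?' at the end.
        String spanish = "\u00BFQu\u00E9 tal?";
        System.out.println(stripAscii(spanish));    // keeps U+00BF, drops '?'
        System.out.println(stripUnicode(spanish));  // drops both
        System.out.println(stripAscii("$5 + 3"));   // drops '$' and '+'
        System.out.println(stripUnicode("$5 + 3")); // unchanged: both are symbols
    }
}
```

Note that the two classes are not subsets of one another: \p{Punct} removes ASCII symbols such as $ and +, which \p{P} leaves alone because Unicode classifies them as symbols rather than punctuation.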

Solution 5

You can use the Regex.Replace method:

 Regex.Replace(yourString, regexMatchingPunctuation, "")

Since this returns a string, your method will look something like this:

 string s = Regex.Replace("Hello!?!?!?!", "[?!]", "");

You can replace "[?!]" with something more sophisticated if you want:

(\p{P})

This should match any Unicode punctuation character.

Author: Adam

Updated on July 05, 2022

Comments

  • Adam
    Adam almost 2 years

    For the hope-to-have-an-answer-in-30-seconds part of this question, I'm specifically looking for C#

    But in the general case, what's the best way to strip punctuation in any language?

    I should add: Ideally, the solutions won't require you to enumerate all the possible punctuation marks.

    Related: Strip Punctuation in Python

  • Adam
    Adam over 15 years
    Yup. It's powering the string operation I posted below.
  • Adam
    Adam over 15 years
    I know, right? A hobby of mine is committing sins against code in Linq. But please, by all means, make it better.
  • Brian Low
    Brian Low almost 14 years
    interesting tidbit: the following are not punctuation: $^+|<>=
  • Tom Anderson
    Tom Anderson over 13 years
    Please seek psychiatric help.
  • Tom Anderson
    Tom Anderson over 13 years
    +1 for using a unicode character class. Concise, precise, and nice.
  • Chris Marisic
    Chris Marisic over 12 years
    Why the IEnumerable<char> to array to bytes to string conversion, why not just new String(s.ToArray())? Or is that what new string will do under the hood anyway?
  • Dermot
    Dermot over 11 years
    LinQ never ceases to amaze me.
  • Clément
    Clément over 11 years
    That's quadratic in the length of s; if you double the length, the code will be four times slower, because the + operator for string has to make a copy of the string :/
  • Saeed Neamati
    Saeed Neamati over 10 years
    Brilliant. Less is more.
  • Admin
    Admin over 10 years
    Note that this approach also lets you replace punctuation with (for example) whitespace. Useful for tokenizing.
  • Stuart Dobson
    Stuart Dobson about 10 years
    Doesn't work on $ or ^, maybe more. I'm sticking with ^[a-zA-Z][a-zA-Z0-9]*$
  • dnennis
    dnennis over 8 years
    PhoneNumberTextBox.Text = new string(PhoneNumberTextBox.Text.Where(c => !char.IsPunctuation(c)).ToArray()).Replace(" ","");
  • Razvan Dumitru
    Razvan Dumitru about 8 years
    for $ or ^ u can use !char.IsSymbol(c) validation. just for the record
  • JProgrammer
    JProgrammer over 6 years
    C# doesn't have the Punct class but it does have P