How do I strip non-alphanumeric characters (including spaces) from a string?


Solution 1

Your character class contains a space, so spaces are excluded from the match and are kept. You also need Regex.Replace() rather than string.Replace(), which I had overlooked completely at first:

result = Regex.Replace("Hello there(hello#)", @"[^A-Za-z0-9]+", "");

should work. The + makes the regex a bit more efficient by matching more than one consecutive non-alphanumeric character at once instead of one by one.
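
As a minimal sketch of why the original call left the string untouched (the class and variable names here are only illustrative): string.Replace treats its first argument as literal text, while Regex.Replace interprets the same text as a pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    public static void Main()
    {
        string input = "Hello there(hello#)";

        // string.Replace searches for the literal text "[^A-Za-z0-9]+",
        // which does not occur in the input, so nothing changes.
        string literal = input.Replace(@"[^A-Za-z0-9]+", "");
        Console.WriteLine(literal);   // Hello there(hello#)

        // Regex.Replace treats the same text as a pattern and removes
        // every run of characters outside A-Z, a-z and 0-9.
        string stripped = Regex.Replace(input, @"[^A-Za-z0-9]+", "");
        Console.WriteLine(stripped);  // Hellotherehello
    }
}
```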

If you want to keep non-ASCII letters/digits, too, use the following regex (\p{L} matches any Unicode letter and \p{N} any Unicode number):

@"[^\p{L}\p{N}]+"

which leaves

BonjourmesélèvesGutenMorgenliebeSchüler

instead of

BonjourmeslvesGutenMorgenliebeSchler
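
The two patterns can be compared side by side with a sketch like the following. The input sentence is an assumption chosen to reproduce the outputs quoted above (the original input is not shown), and the accented letters are assumed to be stored as single precomposed characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    public static void Main()
    {
        // Assumed sample input, not taken from the original answer.
        string input = "Bonjour, mes élèves! Guten Morgen, liebe Schüler.";

        // ASCII-only class: accented letters are removed as well.
        string asciiOnly = Regex.Replace(input, @"[^A-Za-z0-9]+", "");
        Console.WriteLine(asciiOnly);   // BonjourmeslvesGutenMorgenliebeSchler

        // Unicode categories: letters and numbers of any script are kept.
        string unicodeAware = Regex.Replace(input, @"[^\p{L}\p{N}]+", "");
        Console.WriteLine(unicodeAware); // BonjourmesélèvesGutenMorgenliebeSchüler
    }
}
```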

Solution 2

You can use LINQ to keep only the required characters:

  String source = "Hello there(hello#)";

  // "Hellotherehello"
  String result = new String(source
    .Where(ch => Char.IsLetterOrDigit(ch))
    .ToArray());

Or

  String result = String.Concat(source
    .Where(ch => Char.IsLetterOrDigit(ch)));  

With this approach you don't need regular expressions at all; you only need using System.Linq; for Where.
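
Char.IsLetterOrDigit also returns true for letters and digits outside ASCII, as the comments on this answer point out. If only A-Z, a-z and 0-9 should survive, a sketch along these lines could be used instead; IsAsciiAlphanumeric and StripToAscii are hypothetical names, not framework methods.

```csharp
using System;
using System.Linq;

public static class AsciiFilterDemo
{
    // Hypothetical helper: true only for A-Z, a-z and 0-9.
    public static bool IsAsciiAlphanumeric(char ch)
    {
        return (ch >= 'a' && ch <= 'z')
            || (ch >= 'A' && ch <= 'Z')
            || (ch >= '0' && ch <= '9');
    }

    // Keeps the characters accepted by the helper and joins them back together.
    public static string StripToAscii(string source)
    {
        return string.Concat(source.Where(IsAsciiAlphanumeric));
    }

    public static void Main()
    {
        Console.WriteLine(StripToAscii("Hello there(hello#)")); // Hellotherehello
    }
}
```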

Solution 3

Alternatively, loop over the characters yourself and keep only the ASCII letters and digits:

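    // Requires: using System.Text; (for StringBuilder)
    public static string RemoveNonAlphanumeric(string text)
    {
        // The result can never be longer than the input.
        StringBuilder sb = new StringBuilder(text.Length);

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            // Keep only ASCII letters and digits; everything else is dropped.
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
                sb.Append(c);
        }

        return sb.ToString();
    }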

Usage:

string text = SomeClass.RemoveNonAlphanumeric("text LaLa (lol) á ñ $ 123 ٠١٢٣٤");

//text: textLaLalol123

Solution 4

The mistake in the question was using Replace incorrectly: string.Replace doesn't take a regex and treats its argument as literal text (thanks CodeInChaos).

The following code should do what was specified:

Regex reg = new Regex(@"[^\p{L}\p{N}]+"); // Thanks to Tim Pietzcker for the regex
string regexed = reg.Replace("Hello there(hello#)", "");

This gives:

regexed = "Hellotherehello"

Solution 5

The same check can also be written as an extension method that replaces each non-alphanumeric character instead of removing it:

public static class StringExtensions
{
    public static string ReplaceNonAlphanumeric(this string text, char replaceChar)
    {
        StringBuilder result = new StringBuilder(text.Length);

        foreach(char c in text)
        {
            if(c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
                result.Append(c);
            else
                result.Append(replaceChar);
        }

        return result.ToString();
    } 
}

And an NUnit-style test:

[TestFixture]
public sealed class StringExtensionsTests
{
    [Test]
    public void Test()
    {
        Assert.AreEqual("text_LaLa__lol________123______", "text LaLa (lol) á ñ $ 123 ٠١٢٣٤".ReplaceNonAlphanumeric('_'));
    }
}

Author: James

I work in Mixed Reality at Microsoft.

Updated on July 09, 2022

Comments

  • James
    James almost 2 years

    How do I strip non-alphanumeric characters, including spaces, from a string in C# with Replace?

    I want to keep a-z, A-Z, 0-9 and nothing more (not even " " spaces).

    "Hello there(hello#)".Replace(regex-i-want, "");
    

    should give

    "Hellotherehello"
    

    I have tried "Hello there(hello#)".Replace(@"[^A-Za-z0-9 ]", ""); but the spaces remain.

    • CodesInChaos
      CodesInChaos over 12 years
      How about first defining what exactly you mean by alpha numeric? Do you just want A-Z,a-z,0-9? Unicode has plenty more letters and numbers.
    • Anders Abel
      Anders Abel over 12 years
      With that edit, it looks much better - taking back my minus vote.
    • CodesInChaos
      CodesInChaos over 12 years
      Why do you have a space in your bracket? And string.Replace doesn't take a regex in the first place.
    • CodesInChaos
      CodesInChaos over 12 years
      Just to be absolutely clear: You don't want a letter like ä either?
    • James
      James over 12 years
      I answered my question taking your tips into account (see below).
  • James
    James over 12 years
    I tried this...it's very close but it seems to leave spaces in - I want them stripped too! Thanks.
  • Tim Pietzcker
    Tim Pietzcker over 12 years
    No, it doesn't. Unless you have special spaces in there like the non-breaking space (U+00A0, character code 160), and the second version correctly removes those, too.
  • James
    James over 12 years
    Hmmm I tried the following: string t = "hello there - ( efrwef )"; string a = "New: " + t.Replace(@"[^\p{L}\p{N}]+", ""); and a ends up being "hello there - ( efrwef )" - completely unchanged - I know I'm doing something wrong here.
  • CodesInChaos
    CodesInChaos over 12 years
    string.Replace doesn't take a regex.
  • James
    James over 12 years
    AHHH that would explain all. So, how could I do what is described above with regex bits and pieces in C#?
  • CodesInChaos
    CodesInChaos over 12 years
    While I like the general approach, it doesn't fit the requirement of only allowing A-Z,a-z,0-9. It allows other letters and digits too.
  • CodesInChaos
    CodesInChaos over 12 years
    There is a regex class you can use.
  • CodesInChaos
    CodesInChaos over 12 years
    There are more than 10 digits in unicode too. ٠١٢٣٤ are some examples.
  • CodesInChaos
    CodesInChaos over 12 years
    Sorry, but it's still wrong. ToLower uses the current locale. So when you run it in Turkey, it won't allow I, but allows İ instead. en.wikipedia.org/wiki/Dotted_and_dotless_I
  • Adrianne
    Adrianne over 12 years
    @CodeInChaos wow... guess my laziness led me to do that. Fixed :)
  • ForceMagic
    ForceMagic over 11 years
    Welcome to SO. A little explanation always makes your answer more valuable. On SO, people tend to like to know why, instead of just how. ;)
  • PostureOfLearning
    PostureOfLearning over 10 years
    'string.Replace()' does not take regex as an argument
  • K D
    K D over 10 years
    @PostureOfLearning Thank you for your remark, but you should look at the question. The question is not about the Replace method; it is about the regex. The method usage is copied from the question itself and supplied with a helpful regex. Kindly take back your vote :)
  • PostureOfLearning
    PostureOfLearning over 10 years
    I understand the question and I realize that the question also has invalid code. However, I accept invalid code in a question since the asker is trying to learn, but I find incorrect code in an answer unacceptable. It is an answer and should work. Your answer led me in the wrong direction when looking to solve my own problem. Having said this, if you want to change it I'll be happy to take back the vote ;)
  • Marc L.
    Marc L. about 8 years
    Do yourself and SO a favor and remove this.
  • Marc L.
    Marc L. about 8 years
    Great addition! Would be interesting to know the relative performance of this to the Regex solution. Out of the gate, it reads a lot better.
  • Marc L.
    Marc L. about 8 years
    A quick test in LinqPad suggests there's negligible difference between this and even a compiled Regex solution. Readability wins for me.
  • Will Croxford
    Will Croxford about 6 years
    Looks really neat and readable; if the performance is the same, I'm using it, thanks. NB for new programmers like me: you need to add the line using System.Linq; at the top of the file for the C# compiler to recognise the Where method.