String replace diacritics in C#

Solution 1

It seems you want to strip the diacritics and keep the base characters. I'd recommend Ben Lings's solution for this:

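// Requires: using System.Globalization; using System.Linq; using System.Text;
string input = "ŠĐĆŽ šđčćž";
// FormD splits each accented letter into a base letter plus combining marks.
string decomposed = input.Normalize(NormalizationForm.FormD);
// Drop the combining marks so that only the base letters remain.
char[] filtered = decomposed
    .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
    .ToArray();
string newString = new string(filtered);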

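Edit: Slight problem! It doesn't work for Đ and đ. These letters (D with stroke) have no canonical decomposition, so FormD leaves them unchanged and there is no combining mark to filter out. The result is: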

SĐCZ sđccz
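
One way to work around the Đ problem is to map Đ and đ by hand before normalizing. The following is only a sketch under that assumption; the class and method names (`DiacriticsSketch`, `RemoveDiacritics`) are made up for illustration and are not part of the linked solution.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

static class DiacriticsSketch
{
    // Hypothetical helper: strips combining marks and special-cases Đ/đ.
    public static string RemoveDiacritics(string input)
    {
        // Assumption: Đ/đ should become D/d. They have no canonical
        // decomposition, so they are replaced explicitly up front.
        string preMapped = input.Replace('Đ', 'D').Replace('đ', 'd');

        // Split accented letters into base letter + combining marks.
        string decomposed = preMapped.Normalize(NormalizationForm.FormD);

        // Keep everything except the combining (non-spacing) marks.
        char[] filtered = decomposed
            .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray();

        return new string(filtered);
    }
}
```

With this sketch, `DiacriticsSketch.RemoveDiacritics("ŠĐĆŽ šđčćž")` should return `SDCZ sdccz`. Other letters without a decomposition (for example ł) would need their own entries.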

Solution 2

Jon Skeet mentioned the following code on a newsgroup...

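// Requires: using System.Text;
static string RemoveAccents(string input)
{
    // FormKD decomposes accented letters into base letter + combining marks.
    string normalized = input.Normalize(NormalizationForm.FormKD);
    // ASCII encoding whose fallback replaces every non-ASCII character with "",
    // so combining marks (and letters with no decomposition, such as Đ) are dropped.
    Encoding removal = Encoding.GetEncoding(Encoding.ASCII.CodePage,
                                            new EncoderReplacementFallback(""),
                                            new DecoderReplacementFallback(""));
    byte[] bytes = removal.GetBytes(normalized);
    return Encoding.ASCII.GetString(bytes);
}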

EDIT

Maybe I am crazy, but I just ran the following...

Dim Input As String = "ŠĐĆŽ-šđčćž"
Dim Builder As New StringBuilder()

For Each Chr As Char In Input
    Builder.Append(Chr)
Next

Console.Write(Builder.ToString())

And the output was SDCZ-sdccz

Author: ilija veselica

Updated on June 04, 2022

Comments

  • ilija veselica
    ilija veselica almost 2 years

    I'd like to use this method to create user-friendly URLs. Because my site is in Croatian, there are characters that I don't want to strip but rather replace with others. For example, this string:

    ŠĐĆŽ šđčćž

    needs to be:

    sdccz-sdccz

    So, I would like to make two arrays, one containing the characters to be replaced and the other containing the replacement characters:

    string[] character = { "Š", "Đ", "Č", "Ć", "Ž", "š", "đ", "č", "ć", "ž" };
    string[] characterReplace = { "s", "d", "c", "c", "z", "s", "d", "c", "c", "z" };
    

    Finally, these two arrays should be used in a method that takes a string, finds the matches and replaces them. In PHP I used the preg_replace function for this. In C# this doesn't work:

    s = Regex.Replace(s, character, characterReplace);
    

    I would appreciate it if someone could help.
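
The table-driven replacement described above can be sketched with a lookup dictionary instead of two parallel arrays, since `Regex.Replace` does not accept arrays of patterns and replacements. This is only an illustration: the names `SlugSketch` and `ToSlug` are hypothetical, and turning spaces into hyphens and lowercasing the remaining characters are assumptions inferred from the expected output `sdccz-sdccz`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SlugSketch
{
    // Same pairs as the two arrays in the question, stored as one map.
    static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { 'Š', 's' }, { 'Đ', 'd' }, { 'Č', 'c' }, { 'Ć', 'c' }, { 'Ž', 'z' },
        { 'š', 's' }, { 'đ', 'd' }, { 'č', 'c' }, { 'ć', 'c' }, { 'ž', 'z' }
    };

    // Hypothetical helper: replaces mapped characters one by one.
    public static string ToSlug(string input)
    {
        var builder = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            char replacement;
            if (Map.TryGetValue(c, out replacement))
                builder.Append(replacement);              // listed character
            else if (c == ' ')
                builder.Append('-');                      // assumption: space becomes hyphen
            else
                builder.Append(char.ToLowerInvariant(c)); // assumption: lowercase the rest
        }
        return builder.ToString();
    }
}
```

Calling `SlugSketch.ToSlug("ŠĐĆŽ šđčćž")` should give `sdccz-sdccz`. A single pass over the string avoids running ten separate `string.Replace` calls.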

  • ilija veselica
    ilija veselica about 14 years
    I get following error: 'string' does not contain a definition for 'Normalise' and no extension method 'Normalise' accepting a first argument of type 'string' could be found (are you missing a using directive or an assembly reference?)
  • Mark Byers
    Mark Byers about 14 years
    @ile: Apparently there was an error in the solution I copied this from. I have fixed it now. Unfortunately though this method fails for Đ, so either you will have to handle that case specially, or just do it the way you originally suggested.
  • Mark Byers
    Mark Byers about 14 years
    This removes the Đ completely.
  • ilija veselica
    ilija veselica about 14 years
    I see... but this is a very simple solution, so I will use it together with a special method to replace Đ and đ. Thanks!
  • Timothy Baldridge
    Timothy Baldridge about 14 years
    But character arrays are not. Create a character array and modify the values in it.
  • Josh Stodola
    Josh Stodola about 14 years
    @Mark You are right, but see my edit, it is somewhat unbelievable
  • Ahmad Mageed
    Ahmad Mageed about 14 years
    hmm I tried that VB.NET code locally and I get the original string.
  • Josh Stodola
    Josh Stodola about 14 years
    @Ahmad I bet it is somehow related to localization settings. I must say that I was daunted when it produced the desired output.
  • user1713059
    user1713059 almost 5 years
    This also removes "ł" (Polish).
  • ahaw
    ahaw over 3 years
    It doesn't work for the 'ł' and '€' characters.
