How to replace special characters with their equivalents (such as "á" with "a") in C#?


Solution 1

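You could try something like this:

using System.Globalization; // UnicodeCategory
using System.Linq;          // Where, ToArray
using System.Text;          // NormalizationForm
var decomposed = "áéö".Normalize(NormalizationForm.FormD);
var filtered = decomposed.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
var newString = new string(filtered.ToArray());

This decomposes each accented character into its base letter followed by combining marks (Unicode normalization form D), filters out the combining marks, and creates a new string from what remains. Combining diacritics are in the NonSpacingMark Unicode category.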
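
The same three steps can be wrapped in a reusable helper. The following is a minimal sketch rather than a definitive implementation: the class name `DiacriticsHelper` and the method name `RemoveDiacritics` are hypothetical names chosen for illustration, and the final recomposition to form C is an optional addition that the snippet above does not perform.

```csharp
using System.Globalization;
using System.Linq;
using System.Text;

public static class DiacriticsHelper // hypothetical name, for illustration only
{
    // Returns a copy of the input with combining diacritical marks removed.
    public static string RemoveDiacritics(string input)
    {
        // FormD splits "á" into "a" followed by U+0301 (combining acute accent).
        string decomposed = input.Normalize(NormalizationForm.FormD);

        // Keep every character except the non-spacing (combining) marks.
        char[] kept = decomposed
            .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray();

        // Optional step (an assumption, not in the original): recompose to form C.
        return new string(kept).Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, a call such as `DiacriticsHelper.RemoveDiacritics("ação")` should yield "acao", because "ç" and "ã" both decompose into a base letter plus one combining mark.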

Solution 2

// Requires: using System.Collections.Generic;
string text = ""; // set this to the text to replace characters in

Dictionary<char, char> replacements = new Dictionary<char, char>();

// add your characters to the replacements dictionary,
// key: char to replace
// value: replacement char

replacements.Add('ç', 'c');
// ...

System.Text.StringBuilder replaced = new System.Text.StringBuilder();
foreach (char character in text)
{
    // TryGetValue does a single lookup instead of ContainsKey plus the indexer
    replaced.Append(replacements.TryGetValue(character, out char replacement) ? replacement : character);
}

// replaced.ToString() is now your converted text
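
The loop above can also be packaged as a reusable method with a replacement table that is built once. This is a rough sketch under stated assumptions: `CharacterReplacer`, `Replacements` and `Replace` are hypothetical names, and the table holds only a few sample entries, so a real list would need every character (upper and lower case) you expect to meet.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CharacterReplacer // hypothetical name, for illustration only
{
    // Static table built once and shared by every call.
    // Sample entries only; upper and lower case need separate entries.
    private static readonly Dictionary<char, char> Replacements = new Dictionary<char, char>
    {
        { 'ç', 'c' }, { 'Ç', 'C' },
        { 'á', 'a' }, { 'Á', 'A' },
        { 'é', 'e' }, { 'É', 'E' },
    };

    // Replaces every character found in the table; all others are copied unchanged.
    public static string Replace(string text)
    {
        var replaced = new StringBuilder(text.Length);
        foreach (char character in text)
        {
            replaced.Append(Replacements.TryGetValue(character, out char replacement)
                ? replacement
                : character);
        }
        return replaced.ToString();
    }
}
```

Characters that are missing from the table pass through untouched, which makes gaps in the list easy to overlook.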

Solution 3

For future reference, this is exactly what I ended up with:

temp = stringToConvert.Normalize(NormalizationForm.FormD);
IEnumerable<char> filtered = temp;
filtered = filtered.Where(c => char.GetUnicodeCategory(c) != System.Globalization.UnicodeCategory.NonSpacingMark);
final = new string(filtered.ToArray());
Author: jehuty

Brazilian Graphics Software Developer working in the VFX industry in London.

Updated on June 28, 2022

Comments

  • jehuty
    jehuty almost 2 years

    I need to get the Portuguese text content out of an Excel file and create an XML file that is going to be used by an application that doesn't support characters such as "ç", "á", "é", and others. I can't just remove the characters; I need to replace them with their equivalents ("c", "a", "e", for example).

    I assume there's a better way to do it than checking each character individually and replacing it with its counterpart. Any suggestions on how to do it?

  • Gertjan
    Gertjan about 14 years
    Though it is the simplest solution (maybe not the most elegant), it does exactly what you want. It would be nicer if you created a reusable function (with a static list of replacements). One downside of this approach is that you need to know ALL the possible characters you want to replace, and you have to add both uppercase and lowercase characters to the list (which might take some trial and error). You are also likely to make mistakes when copying the Add statements to create new items (for example, forgetting to replace one of the strings), which might cause confusion when errors occur.
  • binball
    binball about 12 years
    Hi Ben, thank you for the snippet, but it doesn't handle the characters Ł and ł well (it keeps them as they are instead of changing them to L and l).
  • AgentFire
    AgentFire almost 5 years
    It doesn't do anything with æ either.
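
The last two comments concern characters such as Ł, ł and æ. These have no canonical decomposition in Unicode, so normalizing to form D leaves them unchanged and the filter has nothing to remove. One possible workaround, sketched here under assumptions, is to combine both approaches: map those characters explicitly and strip combining marks from everything else. The names `AsciiFolding`, `Special` and `Fold` are hypothetical, and the table entries (including spelling æ as "ae") are illustrative choices, not a complete or authoritative list.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class AsciiFolding // hypothetical name, for illustration only
{
    // Characters that FormD does not decompose; entries are illustrative, not exhaustive.
    private static readonly Dictionary<char, string> Special = new Dictionary<char, string>
    {
        { 'Ł', "L" }, { 'ł', "l" },
        { 'Æ', "AE" }, { 'æ', "ae" }, // assumption: a two-letter spelling is acceptable
    };

    public static string Fold(string input)
    {
        var result = new StringBuilder(input.Length);
        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            if (Special.TryGetValue(c, out string mapped))
            {
                result.Append(mapped); // explicit mapping for non-decomposable characters
            }
            else if (char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                result.Append(c);      // keep base letters, drop combining marks
            }
        }
        return result.ToString();
    }
}
```

The explicit table stays small because normalization already covers every character that does decompose; only the exceptions need listing.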