Converting String which contains Turkish characters to lowercase
Solution 1
You can do this in two steps.
1) First, remove the accents.
The following comes from the topic "Is there a way to get rid of accents and convert a whole string to regular letters?":
Use java.text.Normalizer to handle this for you.
string = Normalizer.normalize(string, Normalizer.Form.NFD);
This separates the accent marks from their base characters. Then you just need to throw out every character that is not a plain ASCII character:
string = string.replaceAll("[^\\p{ASCII}]", "");
If your text contains other non-ASCII characters that you want to keep, remove only the combining marks instead:
string = string.replaceAll("\\p{M}", "");
In Unicode-aware regular expressions, \P{M} (uppercase P) matches a base glyph and \p{M} (lowercase p) matches each accent.
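To make the difference between the two patterns concrete, here is a minimal sketch. The class and method names (AccentStripDemo, stripToAscii, stripMarks) are made up for illustration; only Normalizer.normalize and String.replaceAll come from the answer above. The input is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

class AccentStripDemo {
    // Decompose, then drop every character outside ASCII.
    static String stripToAscii(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    // Decompose, then drop only the combining marks (the accents).
    static String stripMarks(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // C with cedilla (U+00C7) followed by sharp s (U+00DF)
        String sample = "\u00C7\u00DF";
        // The sharp s has no decomposition, so the ASCII filter removes it entirely.
        System.out.println(stripToAscii(sample).length()); // 1
        // The mark filter keeps it, because it is a letter and not an accent.
        System.out.println(stripMarks(sample).length());   // 2
    }
}
```

The ASCII filter is the stricter of the two: it silently deletes any letter that has no ASCII base form, while the \p{M} filter leaves such letters in place.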
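2) Then convert the remaining String to lower case. Pass an explicit non-Turkish locale, so that "I" does not become the dotless "ı" on a machine whose default locale is Turkish:
string = string.toLowerCase(Locale.ENGLISH);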
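Put together, the two steps might look like the sketch below. The names TurkishToAscii and toAsciiLower are hypothetical. The explicit replacement of the dotless "ı" (U+0131) is an addition that is not part of the answer above: that letter has no decomposition, so the normalizer alone would leave it untouched.

```java
import java.text.Normalizer;
import java.util.Locale;

class TurkishToAscii {
    static String toAsciiLower(String input) {
        // Step 1: split accented letters into base letter + combining mark.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop the combining marks, keeping the base letters.
        String stripped = decomposed.replaceAll("\\p{M}", "");
        // Assumption: dotless i (U+0131) should map to a plain "i".
        stripped = stripped.replace('\u0131', 'i');
        // Step 2: lower-case with a fixed locale, independent of the JVM default.
        return stripped.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // U+0130 U+011E U+015E U+00C7 is the string from the question.
        System.out.println(toAsciiLower("\u0130\u011E\u015E\u00C7")); // igsc
    }
}
```

Because the combining dot is removed before lower-casing, the "i with an extra dot" described in the question does not appear.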
Solution 2
String testString = "İĞŞÇ";
System.out.println(testString);
// Pass language and country separately; new Locale("tr-TR") does not select Turkish.
Locale trlocale = new Locale("tr", "TR");
testString = testString.toLowerCase(trlocale);
System.out.println(testString);
Works like a charm :)
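Note that lower-casing with a Turkish locale keeps the Turkish letters; it does not map them to ASCII. The sketch below compares the Turkish and English rules on the same input. The names TurkishLowerDemo, lowerTurkish and lowerEnglish are illustrative only.

```java
import java.util.Locale;

class TurkishLowerDemo {
    private static final Locale TURKISH = new Locale("tr", "TR");

    // Lower-case using Turkish casing rules.
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    // Lower-case using English casing rules.
    static String lowerEnglish(String s) {
        return s.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        String dottedCapitalI = "\u0130"; // capital I with dot above
        // Turkish rules: one plain "i".
        System.out.println(lowerTurkish(dottedCapitalI).length()); // 1
        // English rules: "i" plus a combining dot above (U+0307), two chars.
        System.out.println(lowerEnglish(dottedCapitalI).length()); // 2
        // Turkish rules turn a plain "I" into the dotless i (U+0131).
        System.out.println(lowerTurkish("I").equals("\u0131"));    // true
    }
}
```

With the Turkish locale the string from the question becomes "iğşç", so the accents still have to be stripped separately if plain ASCII is required.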
Admin
Updated on June 04, 2022
Comments
-
Admin almost 2 years
I want to convert a string which contains Turkish characters to lowercase, with the Turkish characters mapped to their English equivalents, i.e. "İĞŞÇ" -> "igsc". When I use the toLowerCase(new Locale("en", "US")) function, it converts, for example, İ to i, but with an extra dot. How can I solve this problem? (I'm using Java 7)
Thank you.
-
Alain BECKER about 2 years
The least I can say is that your solution is... not universal. When I try it, I get "iğşç", while the OP asked for "igsc"...