Convert Latin 1 encoded UTF8 to Unicode

c# .net encoding

15,972

Encoding.UTF8.GetString(Encoding.GetEncoding("iso-8859-1").GetBytes(s))

Now you have a normal Unicode string containing Cyrillic.
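For anyone who wants to try this in isolation, here is a minimal sketch of the same one-liner wrapped in a helper. The class and method names (`MojibakeRepair`, `FromLatin1`, `ToLatin1Mojibake`) are made up for illustration; only the `Encoding` calls come from the framework. The second method just reproduces the fault so the repair can be tested as a round trip.

```csharp
using System;
using System.Text;

// Hypothetical helper class; the names are illustrative, not from any library.
public static class MojibakeRepair
{
    // Turn each misdecoded char back into its original byte (ISO-8859-1 maps
    // chars U+0000..U+00FF one-to-one onto bytes), then decode those bytes as
    // the UTF-8 they really were.
    public static string FromLatin1(string s)
    {
        byte[] raw = Encoding.GetEncoding("iso-8859-1").GetBytes(s);
        return Encoding.UTF8.GetString(raw);
    }

    // Simulates the fault: UTF-8 bytes wrongly decoded as ISO-8859-1.
    public static string ToLatin1Mojibake(string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8);
    }
}
```

With this, `MojibakeRepair.FromLatin1("Ð°Ð±Ð²")` should give back `абв`, since each `Ð` plus the following character is really the two-byte UTF-8 sequence for one Cyrillic letter.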

Note that your ‘Latin-1’ misencoded string might actually be a ‘Windows code page 1252’ misencoded string; I can't tell from the given example, as it doesn't use any of the characters that differ between the two encodings. If that is the case, use Encoding.GetEncoding(1252) instead.
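A sketch of the code page 1252 variant follows, under one assumption about your runtime: on .NET Core and .NET 5+, code page 1252 is not available until the code-pages provider is registered, whereas classic .NET Framework has it built in (there you can drop the static constructor). The class name `Cp1252Repair` is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper class; the name is illustrative.
public static class Cp1252Repair
{
    static Cp1252Repair()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // such as 1252 must be enabled through this provider first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Same repair as the Latin-1 version, but the misdecoded chars are mapped
    // back to bytes using Windows-1252 instead of ISO-8859-1.
    public static string FromCp1252(string s)
    {
        byte[] raw = Encoding.GetEncoding(1252).GetBytes(s);
        return Encoding.UTF8.GetString(raw);
    }
}
```

The difference only shows up for bytes in the 0x80 to 0x9F range. For example, the Cyrillic letter `р` is `D1 80` in UTF-8; read as code page 1252 it appears as `Ñ€`, and `Cp1252Repair.FromCp1252("Ñ€")` should return `р`. The euro sign has no byte in ISO-8859-1, so the Latin-1 version cannot recover that letter.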

Also, this assumes that it is the contents of the database that are at fault. If the database is in fact storing UTF-8 strings correctly but you're pulling them out as if they were Latin-1 (or code page 1252, because that is the system code page), then what you really need to do is reconfigure your data access layer to use the right encoding. If you're using SQL Server, it is better to start using NVARCHAR.

Author by

Admin

Updated on June 25, 2022

Comments

Admin almost 2 years

I ran into this while trying to convert a database that, from what it looks like, is encoded in UTF-8 into Windows-1251 encoding (don't ask, but I need to do this). All of the Russian characters in the db show up as Ð°Ð±Ð²Ð³Ð´Ð. When I pull them out of the db into strings in my C# app, I still see Ð°Ð±Ð²Ð³Ð´Ð. No matter what I do to interpret such a string as UTF-8, it seems to be interpreted as a Latin-1 single-byte string, and my text does not show up as Russian. What I basically need to do is convert this UTF-8 string that looks like Latin-1 into Unicode, so that I can convert it to 1251 later, but I have not been able to do this successfully. Anyone got any ideas?
o3o almost 11 years

getBytes(s)) should be GetBytes(s))
Alreadytakenindeed over 4 years

You sir, are pure gold with that "better to start using NVARCHAR", saved me tons of time searching for how to encode/decode strings or alter database collation. Live long and prosper!!!