How to convert a string to UTF8 in Ruby
Solution 1
Your string seems to have been encoded the wrong way round:
"DÃ©veloppement".encode("iso-8859-1").force_encoding("utf-8")
#=> "Développement"
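To see why this works, here is a minimal sketch of the same round trip, assuming the garbled text is valid UTF-8 that contains the two characters U+00C3 and U+00A9 where a single "é" should be. The helper name `repair_mojibake` is hypothetical and not part of any answer; it only guards against `encode` raising `Encoding::UndefinedConversionError` when the string contains characters that ISO-8859-1 cannot represent (for example U+201C or U+20AC).

```ruby
# The garbled text, written with escapes so the example does not depend on
# how this file is saved: "D", U+00C3, U+00A9, "veloppement".
garbled = "D\u00C3\u00A9veloppement"

# Step 1: transcode to ISO-8859-1. Every character below U+0100 becomes a
# single byte, so U+00C3 U+00A9 turn back into the bytes C3 A9.
latin1 = garbled.encode("iso-8859-1")

# Step 2: relabel those bytes as UTF-8 without touching them.
# C3 A9 is the UTF-8 encoding of "é".
repaired = latin1.force_encoding("utf-8")

puts repaired                  # Développement
puts repaired.valid_encoding?  # true

# Hypothetical helper: attempt the repair, keep the original string if the
# repair is impossible or would produce invalid UTF-8.
def repair_mojibake(str)
  candidate = str.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : str
rescue Encoding::UndefinedConversionError
  str # contains characters outside ISO-8859-1
end

puts repair_mojibake("D\u00C3\u00A9veloppement")  # Développement
puts repair_mojibake("\u201CCiti\u201D")          # left unchanged
```

The guard is a heuristic: a string that was never garbled but happens to survive the round trip as valid UTF-8 would also be rewritten, so it is safer to fix the encoding at the point where the data is read.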
Solution 2
It seems your string is labelled as UTF-8, but its bytes are really in some other encoding, probably ISO-8859-1.
Define (force) the correct encoding first, then convert it to UTF-8.
In your example:
puts "Développement".encode('iso-8859-1').encode('utf-8')
An alternative is:
puts "\xC3".force_encoding('iso-8859-1').encode('utf-8') #-> Ã
If the Ã makes no sense, then try another encoding.
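The difference between the two calls matters here: `force_encoding` only changes the label on the bytes, while `encode` rewrites the bytes for the target encoding. A small sketch, assuming the input bytes really are ISO-8859-1 but carry a UTF-8 label (the sample string is made up for illustration):

```ruby
# Bytes that are really ISO-8859-1 ("é" is the single byte E9) but are
# labelled UTF-8, as can happen with downloaded data.
raw = "D\xE9veloppement".dup.force_encoding("utf-8")
puts raw.valid_encoding?  # false: E9 followed by "v" is not valid UTF-8
puts raw.bytesize         # 13

# force_encoding changes only the label; the bytes stay as they are.
# dup keeps raw untouched, because force_encoding modifies its receiver.
relabeled = raw.dup.force_encoding("iso-8859-1")
puts relabeled.bytesize   # 13

# encode transcodes: E9 becomes the two UTF-8 bytes C3 A9.
utf8 = relabeled.encode("utf-8")
puts utf8                 # Développement
puts utf8.bytesize        # 14
```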
Solution 3
The question "ruby 1.9: invalid byte sequence in UTF-8" describes another good approach with less code:
file_contents.encode!('UTF-16', 'UTF-8')
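Note that this line on its own leaves the string in UTF-16, and it raises on invalid bytes unless told otherwise. A sketch of the full round trip, assuming the goal is to drop bytes that are not valid UTF-8 and end up with UTF-8 again (the sample input is made up; the `invalid:` and `replace:` options are standard `String#encode` options):

```ruby
# Sample input: "café" in UTF-8 followed by a stray C3 byte with no
# continuation byte, which makes the string invalid.
original = "caf\xC3\xA9 \xC3 bar"
file_contents = original.dup
puts file_contents.valid_encoding?  # false

# Transcode to UTF-16, dropping invalid bytes, then back to UTF-8.
# Going through another encoding forces a real transcoding pass; on old
# Ruby versions a UTF-8 to UTF-8 encode could be a no-op.
file_contents.encode!("UTF-16", "UTF-8", invalid: :replace, replace: "")
file_contents.encode!("UTF-8", "UTF-16")

puts file_contents.valid_encoding?  # true
p file_contents                     # "café  bar"

# On Ruby 2.1 and later, String#scrub does the same cleanup directly.
scrubbed = original.scrub("")
p scrubbed                          # "café  bar"
```

Dropping bytes loses information, so this is only appropriate when the invalid bytes are noise rather than text in a different encoding.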
Author: ciembor
Updated on February 28, 2020
Comments
-
ciembor about 4 years
I'm writing a crawler which uses Hpricot. It downloads a list of strings from a webpage, and then I try to write them to a file. Something is wrong with the encoding:
"\xC3" from ASCII-8BIT to UTF-8
I have items which are rendered on a webpage and printed this way:
DÃ©veloppement
str.encoding returns UTF-8, so force_encoding('UTF-8') doesn't help. How can I convert this to readable UTF-8?
-
ciembor almost 11 years
It works well in most cases. But sometimes it doesn't:
U+201C from UTF-8 to ISO-8859-1 in CIDEM / ACC1Ó
U+20AC from UTF-8 to ISO-8859-1 in Citi’s Sustainable Development Investments
Also, some names are converted but come out wrong, and I can't seed them into a database: I get an "incomplete multibyte character" error message.
-
Stefan almost 11 years
Sorry, this was not meant as a fix. You should fix the problem by setting/detecting the correct encoding when reading the strings into your app.
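One way to set the encoding at read time, sketched here for a file: pass an external and an internal encoding so Ruby transcodes while reading. The temporary file and its contents are made up for illustration, and ISO-8859-1 is an assumption about the source data; for downloaded pages the real charset would have to come from the HTTP headers or the page itself.

```ruby
require "tempfile"

text = nil

# Stand-in for downloaded data: a file holding ISO-8859-1 bytes.
Tempfile.create("page") do |f|
  f.binmode
  f.write("D\xE9veloppement".b)  # E9 is "é" in ISO-8859-1
  f.flush

  # "iso-8859-1:utf-8" means: the file is ISO-8859-1 (external encoding),
  # return UTF-8 strings (internal encoding).
  text = File.read(f.path, encoding: "iso-8859-1:utf-8")
end

puts text                  # Développement
puts text.encoding         # UTF-8
puts text.valid_encoding?  # true
```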
-
Todd about 6 years
There is also the option of using Encoding::UTF_8 instead of using more memory for the "utf-8" string literal (or any other encoding string).
-
Lucas Andrade over 5 years
Works for PDFs created with the Wicked PDF gem.