Rails 3, check CSV file encoding before import


Solution 1

You can use Charlock Holmes, a character encoding detecting library for Ruby.

https://github.com/brianmario/charlock_holmes

To use it, read the file and pass its contents to the detect method.

require 'charlock_holmes'

contents = File.read('test.xml')
detection = CharlockHolmes::EncodingDetector.detect(contents)
# => {:encoding => 'UTF-8', :confidence => 100, :type => :text}

You can also convert the contents to UTF-8 if they are in a different encoding:

utf8_encoded_content = CharlockHolmes::Converter.convert contents, detection[:encoding], 'UTF-8'

This spares users from having to fix the encoding themselves and upload the file again.
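If adding the gem is not an option, a rough fallback using only core Ruby (1.9 or later) might look like the sketch below. It is not what Charlock Holmes does: it performs no detection and simply assumes that anything which is not valid UTF-8 is Windows-1252, which is only a guess about what Excel exports. The helper name `to_utf8` and the default fallback encoding are hypothetical.

```ruby
# Hypothetical helper: returns raw file contents as a UTF-8 string.
# Assumption: input that is not valid UTF-8 is Windows-1252.
def to_utf8(raw, fallback = 'Windows-1252')
  text = raw.dup.force_encoding('UTF-8')
  # Already valid UTF-8 (plain ASCII included): return it unchanged.
  return text if text.valid_encoding?

  # Reinterpret the bytes in the fallback encoding and transcode them.
  # Bytes with no mapping are replaced instead of raising an error.
  raw.dup.force_encoding(fallback).encode('UTF-8', :invalid => :replace, :undef => :replace)
end
```

A wrong guess about the fallback encoding still produces valid UTF-8, just with the wrong characters, so a real detector is the safer choice when the source encoding varies.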

Solution 2

For Ruby 1.9 it's straightforward: tell CSV to expect UTF-8 and it will raise an error if the file isn't:

require 'csv'

begin
  # Raises if the file contains bytes that are not valid UTF-8.
  lines = CSV.read('bad.csv', :encoding => 'utf-8')
rescue ArgumentError
  puts "My users don't listen to me!"
end
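A different way to get the same yes/no answer on Ruby 1.9 or later, without parsing the file or depending on which exception the CSV library raises, is to check the raw bytes directly. This is a minimal sketch, not part of the answer above; `utf8_file?` is a hypothetical helper name.

```ruby
# Hypothetical helper: true when the file's bytes form valid UTF-8.
def utf8_file?(path)
  # Read in binary mode so Ruby does not transcode anything on the way in.
  bytes = File.open(path, 'rb') { |f| f.read }
  # Tag the bytes as UTF-8 (no conversion happens) and validate them.
  bytes.force_encoding('UTF-8').valid_encoding?
end
```

With an upload this could be called as `utf8_file?(@csv_file.tempfile.path)` before starting the import. A pure ASCII file also passes, since ASCII is a subset of UTF-8.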
Author by

alex.bour

Updated on June 05, 2022

Comments

  • alex.bour
    alex.bour almost 2 years

    In my app (Rails 3.0.5, Ruby 1.8.7), I created an import tool to import CSV data from a file.

    Problem: I asked my users to export the CSV file from Excel in UTF-8 encoding, but most of the time they don't.

    How can I verify that the file is UTF-8 before importing? Otherwise the import runs but gives strange results. I use FasterCSV to import.

    Example of a bad CSV file:

    ;VallÈe du RhÙne;CÙte Rotie;
    

    Thanks.

  • pguardiario
    pguardiario over 11 years
    No, but to me string encodings are the biggest difference between 1.8 and 1.9, so it seems like 1.9 is what you want.
  • Afzal Masood
    Afzal Masood over 9 years
    If you are getting the file directly from file_field_tag in a variable, say @csv_file, then instead of {lines = CSV.read('bad.csv', :encoding => 'utf-8')} use {lines = CSV.read(@csv_file.tempfile, encoding: 'utf-8')}
  • zx1986
    zx1986 over 7 years
    LOL! I like that puts "My users don't listen to me!"