Pandas.read_csv() with special characters (accents) in column names �

Solution 1

I ran into the same problem with Spanish text and solved it with the "latin1" encoding:

import pandas as pd

pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='latin1')

Hope it helps!
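
For what it's worth, here is a minimal sketch of why latin1 helps, using only the standard library. It assumes the header was saved as Latin-1 (the file's real encoding is not shown in the question, so that part is an assumption): the byte 0xe9 is "é" in Latin-1 but is not a complete sequence in UTF-8, so a UTF-8 decode either fails or substitutes the replacement character �.

```python
# Assumption: the CSV header was written in Latin-1 (ISO-8859-1).
raw = "IAS_lissé".encode("latin1")   # b'IAS_liss\xe9'

# Decoding with the matching codec recovers the accent.
as_latin1 = raw.decode("latin1")

# Decoding as UTF-8 fails, because 0xe9 starts a multi-byte sequence
# that is never completed.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# With errors="replace" the bad byte becomes U+FFFD, which prints as "�".
as_replaced = raw.decode("utf-8", errors="replace")

print(as_latin1)
print(utf8_error)
print(as_replaced)
```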

Solution 2

You can change the encoding parameter of read_csv; it is described in the pandas documentation for read_csv, and the available codec names are listed under "Standard Encodings" in the Python codecs documentation.

I believe for your example you can use the utf-8 encoding (assuming that your language is French).

df = pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='utf-8')

Here's an example showing some sample output. All I did was make a csv file with one column, using the problem characters.

df = pd.read_csv('sample.csv', encoding='utf-8')

Output:

    IAS_lissé
0   1
1   2
2   3
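
If you are not sure which encoding the file uses, one option is to try a few candidates in order and keep the first one that decodes without error. The sketch below uses only the standard library; pick_encoding is a hypothetical helper (not part of pandas) and the sample bytes are made up for illustration.

```python
def pick_encoding(raw_bytes, candidates=("utf-8", "latin1")):
    """Return the first candidate encoding that decodes raw_bytes cleanly.

    pick_encoding is a hypothetical helper, not part of pandas.
    latin1 accepts every byte value, so it should be listed last.
    """
    for name in candidates:
        try:
            raw_bytes.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Made-up sample rows, encoded two different ways.
utf8_sample = "IAS_lissé;1\n".encode("utf-8")
latin1_sample = "IAS_lissé;1\n".encode("latin1")

print(pick_encoding(utf8_sample))    # utf-8
print(pick_encoding(latin1_sample))  # latin1
```

The returned name can then be passed as encoding= to pd.read_csv after reading the file's bytes with open(path, "rb"). A clean decode does not prove the guess is right; it only rules out encodings that cannot be right.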

Solution 3

Try using:

import pandas as pd    
df = pd.read_csv('file_name.csv', encoding='utf-8-sig')
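
In case it helps to see what utf-8-sig changes: it is UTF-8 that also skips a leading byte-order mark (BOM), which some tools write at the start of a file. The sketch below assumes such a BOM is present and uses made-up sample bytes; it only shows the decoding difference, not what any particular pandas version does with the BOM.

```python
import codecs

# Assumption: the file starts with a UTF-8 byte-order mark (BOM).
raw = codecs.BOM_UTF8 + "IAS_lissé\n1\n".encode("utf-8")

plain = raw.decode("utf-8")         # BOM survives as U+FEFF
stripped = raw.decode("utf-8-sig")  # BOM is removed

print(repr(plain.splitlines()[0]))
print(repr(stripped.splitlines()[0]))
```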

Solution 4

Using utf-8 didn't work for me. E.g. this piece of code:

    bla = pd.DataFrame(data = [1, 2])
    bla.to_csv('funkyNamé , things.csv')
    blabla = pd.read_csv('funkyNamé , things.csv', delimiter=";", encoding='utf-8')
    blabla 

Ultimately returned: OSError: Initializing from file failed

I know you said you didn't want to modify the file. If you meant the file's content rather than its name, I would rename the file to something without an accent, read the CSV under its new name, then rename it back to its original name.

    import os
    import pandas as pd
    # Temporarily give the file an ASCII-only name, read it, then restore the name.
    originalfolder = r'C:\Users\myself'
    originalfilepath = originalfolder + "\\funkyNamé , things.csv"
    tempfilepath = originalfolder + "\\tempName.csv"
    os.rename(originalfilepath, tempfilepath)
    df = pd.read_csv(tempfilepath, encoding='ISO-8859-1')
    os.rename(tempfilepath, originalfilepath)

If you did mean "without modifying the filename", my apologies for not being helpful to you; I hope this helps someone else.
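
Another option that avoids touching the filename, as a sketch only: open the file yourself and hand the file object to read_csv, which accepts file-like objects as well as paths. Whether this sidesteps the OSError depends on your pandas and Python versions, which I have not checked here; the temporary folder and sample data are made up so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd

# Hypothetical setup: write a small CSV under an accented filename.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "funkyNamé , things.csv")
pd.DataFrame(data=[1, 2]).to_csv(path)

# Let Python open the file, then pass the handle instead of the path.
with open(path, encoding="utf-8", newline="") as handle:
    df = pd.read_csv(handle)

print(df)
```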

Author: farhawa

I am a junior Python developer. I use Python for machine-learning problems.

Updated on January 29, 2021

Comments

  • farhawa
    farhawa over 3 years

    I have a CSV file that contains some data with these column names:

    • "PERIODE"
    • "IAS_brut"
    • "IAS_lissé"
    • "Incidence_Sentinelles"

    I have a problem with the third one, "IAS_lissé": its "é" is misinterpreted by the pd.read_csv() method and returned as �.

    What is that character?

    It is causing a bug in my Flask application. Is there another way to read that column without modifying the file?

    In [1]: import pandas as pd
    
    In [2]: pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";").columns
    
    Out[2]: Index([u'PERIODE', u'IAS_brut', u'IAS_liss�', u'Incidence_Sentinelles'], dtype='object')
    
    • Sohier Dane
      Sohier Dane over 7 years
      Looks like Pandas can't handle unicode characters in the column names. Try converting the column names to ascii. Note that you'll lose the accent.
    • bsplosion
      bsplosion almost 4 years
      The comment above is not true and wasn't true as of its posting - see any of the answers below for the proper way to handle non-ASCII (generally by setting encoding to utf-8 or latin1).
  • farhawa
    farhawa over 7 years
    Oops! I got the same error: 'utf8' codec can't decode byte 0xe9 in position 8: unexpected end of data
  • Kartik
    Kartik over 7 years
    That is because your data is not encoded to utf-8. Try latin1: pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='latin1')...
  • shawnheide
    shawnheide over 7 years
    Yeah, latin1 is what I had at first, but changed it to utf-8. @farhawa, if you want a better answer you can post your csv or a sample of it with the header so that we know what your encoding is.
  • Abhishek Pansotra
    Abhishek Pansotra almost 6 years
    Thanks..encoding 'ISO-8859-1' worked for me. My data had pound sign, semi colons etc.
  • Unis
    Unis almost 5 years
    Latin1 encoding also works for German umlauts (utf8 did not). Gracias!
  • max
    max over 3 years
    Worked! Thank you
  • Brian Keith
    Brian Keith over 3 years
    Thank you! This solved my issue with importing data for a Brazilian client!
  • jeffsdata
    jeffsdata over 2 years
    This ended up working for me. UTF-8 wasn't throwing an error - but it was turning "é" into "é". latin1 didn't work - it threw an error on "ś".