Pandas.read_csv() with special characters (accents) in column names �
Solution 1
I ran into the same problem with Spanish and solved it with the "latin1" encoding:
import pandas as pd
pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";", encoding='latin1')
Hope it helps!
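To see why latin1 can help, here is a minimal sketch. The file name sample_latin1.csv and its contents are made up for illustration, and it assumes the real file was saved as latin1 (or a close relative such as cp1252). It writes a small semicolon-delimited file in latin1, then tries to read it back with both encodings:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file, written as latin1 so that "é" is the single byte 0xE9.
path = os.path.join(tempfile.mkdtemp(), "sample_latin1.csv")
with open(path, "w", encoding="latin1", newline="") as f:
    f.write("PERIODE;IAS_lissé\n2014-01;1.5\n")

# Reading with the wrong encoding fails, because 0xE9 followed by a newline
# is not a valid UTF-8 sequence.
try:
    pd.read_csv(path, delimiter=";", encoding="utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Reading with the encoding the file was actually written in keeps the accent.
df = pd.read_csv(path, delimiter=";", encoding="latin1")
print(utf8_failed, list(df.columns))
```

The point is that the encoding argument has to match how the file was saved; latin1 is not a universal fix.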
Solution 2
You can change the encoding parameter for read_csv; see the pandas documentation for read_csv and the list of Python standard encodings.
I believe for your example you can use the utf-8 encoding (assuming that your language is French).
df = pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='utf-8')
Here's an example showing some sample output. All I did was make a csv file with one column, using the problem characters.
df = pd.read_csv('sample.csv', encoding='utf-8')
Output:
IAS_lissé
0 1
1 2
2 3
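If you don't know which encoding the file uses, one option is to try a few candidates in order. This is only a sketch: the helper name read_csv_guess_encoding and the candidate list are my own assumptions, not part of pandas.

```python
import pandas as pd

# Candidate encodings, tried in order. latin1 goes last because it can decode
# any byte sequence, so it never fails (but it may produce the wrong characters).
CANDIDATES = ("utf-8-sig", "cp1252", "latin1")


def read_csv_guess_encoding(path, candidates=CANDIDATES, **kwargs):
    """Return (DataFrame, encoding) for the first candidate that decodes cleanly."""
    for enc in candidates:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {candidates} could decode {path!r}")
```

Usage would look like df, enc = read_csv_guess_encoding("Openhealth_S-Grippal.csv", delimiter=";"). A clean decode does not prove the guess is right, so check the column names afterwards.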
Solution 3
Try using:
import pandas as pd
df = pd.read_csv('file_name.csv', encoding='utf-8-sig')
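For context, utf-8-sig is meant for UTF-8 files that start with a byte order mark (BOM), which some Windows tools add when saving. A small sketch of the difference, using a made-up file called with_bom.csv:

```python
import codecs
import os
import tempfile

import pandas as pd

# UTF-8 content with a BOM in front, as some editors and spreadsheet tools save it.
raw = codecs.BOM_UTF8 + "IAS_lissé\n1\n2\n".encode("utf-8")

# Plain utf-8 keeps the BOM as an invisible first character (U+FEFF);
# utf-8-sig strips it while decoding.
with_bom = raw.decode("utf-8")
without_bom = raw.decode("utf-8-sig")
print(with_bom.startswith("\ufeff"), without_bom.startswith("IAS_"))

# Hypothetical file created only for this demo.
path = os.path.join(tempfile.mkdtemp(), "with_bom.csv")
with open(path, "wb") as f:
    f.write(raw)

df = pd.read_csv(path, encoding="utf-8-sig")
print(list(df.columns))
```

This only helps when the file really is UTF-8; it will not fix a file saved as latin1.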
Solution 4
Using utf-8 didn't work for me. E.g. this piece of code:
bla = pd.DataFrame(data = [1, 2])
bla.to_csv('funkyNamé , things.csv')
blabla = pd.read_csv('funkyNamé , things.csv', delimiter=";", encoding='utf-8')
blabla
Ultimately returned: OSError: Initializing from file failed
I know you said you didn't want to modify the file. If you meant the file content vs the filename, I would rename the file to something without an accent, read the csv file under its new name, then reset the filename back to its original name.
import os
import pandas as pd

originalfolder = r'C:\Users\myself'
originalfilepath = os.path.join(originalfolder, 'funkyNamé , things.csv')
temppath = os.path.join(originalfolder, 'tempName.csv')
os.rename(originalfilepath, temppath)  # temporarily give the file an ASCII-only name
df = pd.read_csv(temppath, encoding='ISO-8859-1')
os.rename(temppath, originalfilepath)  # restore the original name
If you did mean "without modifying the filename", my apologies for not being helpful to you, and I hope this helps someone else.
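A commonly suggested alternative that avoids renaming is to open the file in Python yourself and hand pandas the open file object instead of the path. This is a sketch of a different technique from the rename above; the helper name read_csv_from_handle is made up, and I have not verified it against the exact pandas version that raised the OSError.

```python
import os
import tempfile

import pandas as pd


def read_csv_from_handle(path, encoding="utf-8", **kwargs):
    """Open the file in Python and pass the handle to pandas, not the path."""
    with open(path, encoding=encoding, newline="") as handle:
        return pd.read_csv(handle, **kwargs)


# Hypothetical file with an accented name, created only for this demo.
path = os.path.join(tempfile.mkdtemp(), "funkyNamé , things.csv")
pd.DataFrame({"value": [1, 2]}).to_csv(path, index=False)

df = read_csv_from_handle(path)
print(df)
```

Here Python's own open() deals with the accented file name, so pandas never has to interpret the path.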
farhawa
I am a junior python developer. I am using python for machine-learning issues
Updated on January 29, 2021
Comments
-
farhawa over 3 years
I have a csv file that contains some data with column names:
- "PERIODE"
- "IAS_brut"
- "IAS_lissé"
- "Incidence_Sentinelles"
I have a problem with the third one, "IAS_lissé", which is misinterpreted by the pd.read_csv() method and returned as �. What is that character?
Because it's generating a bug in my Flask application, is there a way to read that column in another way without modifying the file?
In [1]: import pandas as pd
In [2]: pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";").columns
Out[2]: Index([u'PERIODE', u'IAS_brut', u'IAS_liss�', u'Incidence_Sentinelles'], dtype='object')
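On the "what is that character" part: � is usually U+FFFD, the Unicode replacement character, shown when a byte cannot be decoded. A plausible reconstruction, assuming the header was saved as latin1 so that "é" is the byte 0xE9:

```python
# "é" in latin1 is the single byte 0xE9, which is not a complete UTF-8 sequence.
raw = "IAS_lissé".encode("latin1")

# Strict decoding fails at index 8, the position of the 0xE9 byte.
try:
    raw.decode("utf-8")
    bad_index = None
except UnicodeDecodeError as exc:
    bad_index = exc.start
print(bad_index, hex(raw[8]))

# Lenient decoding substitutes U+FFFD, which is displayed as "�".
replaced = raw.decode("utf-8", errors="replace")
print(replaced == "IAS_liss\ufffd")
```

Whether the real file is latin1, cp1252 or something else is an assumption here; the sketch only shows how a single 0xE9 byte ends up as �.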
-
Sohier Dane over 7 years
Looks like Pandas can't handle unicode characters in the column names. Try converting the column names to ascii. Note that you'll lose the accent.
-
bsplosion almost 4 years
The comment above is not true and wasn't true as of its posting - see any of the answers below for the proper way to handle non-ASCII (generally by setting encoding to utf-8 or latin1).
-
farhawa over 7 years
Oops! I got an error (the same one):
'utf8' codec can't decode byte 0xe9 in position 8: unexpected end of data
-
Kartik over 7 years
That is because your data is not encoded as utf-8. Try latin1: pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='latin1') ...
-
shawnheide over 7 years
Yeah, latin1 is what I had at first, but changed it to utf-8. @farhawa, if you want a better answer you can post your csv or a sample of it with the header so that we know what your encoding is.
-
Abhishek Pansotra almost 6 years
Thanks. Encoding 'ISO-8859-1' worked for me. My data had pound signs, semicolons, etc.
-
Unis almost 5 years
Latin1 encoding also works for German umlauts (utf8 did not). Gracias!
-
max over 3 years
Worked! Thank you
-
Brian Keith over 3 years
Thank you! This solved my issue with importing data for a Brazilian client!
-
jeffsdata over 2 years
This ended up working for me. UTF-8 wasn't throwing an error - but it was turning "é" into "é". latin1 didn't work - it threw an error on "ś".