Writing/Reading special characters from CSV (Python 3.6)
- Never open text files without specifying an encoding (this is generally true).
- Always open CSV files with newline='' (this applies to the Python csv module).
So, assuming your CSV file is UTF-8-encoded, use:
import csv

with open('output.csv', 'r', encoding='UTF-8', newline='') as csvarchive:
    entrada = csv.reader(csvarchive)
    for reg in entrada:
        # do something with the data row, it's already decoded
        print(reg)
The same applies to writing the file:
with open('output.csv', 'w', encoding='UTF-8', newline='') as csvarchive:
    writer = csv.writer(csvarchive)
    # write data to the writer, it will be encoded automatically
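As a minimal sketch of the full round trip, assuming the word list from the question and a temporary directory instead of a real output path (the helper names write_rows and read_rows are made up for illustration), the strings can be written and read back unchanged:

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    # Hypothetical helper: write rows of plain str values as UTF-8 CSV.
    with open(path, 'w', encoding='UTF-8', newline='') as csvarchive:
        csv.writer(csvarchive).writerows(rows)


def read_rows(path):
    # Hypothetical helper: read the rows back as lists of decoded str values.
    with open(path, 'r', encoding='UTF-8', newline='') as csvarchive:
        return list(csv.reader(csvarchive))


lista = ['szczęśliwy', 'jabłko', 'słoń', 'kot']

# A temporary directory is used here only to keep the example self-contained.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'output.csv')
    write_rows(path, [lista])
    lista2 = read_rows(path)[0]

print(lista2 == lista)
```

The list is wrapped in another list because writerows expects one sequence per row.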
There is no need to do any manual string encoding. Write string values to the csv writer; the file encoding happens transparently.
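If a file has already been written with manually encoded values, as in the question, each cell holds the text of a bytes literal (the csv writer converts non-string values with str()). One possible way to recover the original words, sketched here under the assumption that every cell is such a literal, is to parse each cell with ast.literal_eval and decode the resulting bytes. The file content is simulated with an in-memory string, and recover_cell is a hypothetical helper name:

```python
import ast
import csv
import io

# Simulated content of the damaged file, copied from the question.
damaged = r"b'szcz\xc4\x99\xc5\x9bliwy',b'jab\xc5\x82ko',b's\xc5\x82o\xc5\x84',b'kot'" + "\n"


def recover_cell(cell):
    # Hypothetical helper: assumes the cell is the text of a bytes literal,
    # e.g. "b'kot'". literal_eval turns it back into a bytes object safely.
    raw = ast.literal_eval(cell)
    return raw.decode('UTF-8')


# Read the single row and recover every cell in it.
row = next(csv.reader(io.StringIO(damaged, newline='')))
recovered = [recover_cell(cell) for cell in row]

print(recovered)
```

This is only a repair step for existing files; for new files, writing plain strings as described above avoids the problem entirely.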
Pacullamen
Updated on July 24, 2022
Comments
-
Pacullamen almost 2 years
Let's assume that I need to write and then read a list of strings with Polish words in a .csv file in Python 3.6:
lista=['szczęśliwy','jabłko','słoń','kot']
Since it's not possible to write Unicode characters in the .csv, I encode the strings to utf-8, so data is saved like this in the file (all inside the first .csv cell):
b'szcz\xc4\x99\xc5\x9bliwy',b'jab\xc5\x82ko',b's\xc5\x82o\xc5\x84',b'kot'
But I am not able to decode the data from the output.csv file using this code:
with open('output.csv') as csvarchive:
    entrada = csv.reader(csvarchive)
    for reg in entrada:
        lista2=reg
        print(lista2)

["b'szcz\\xc4\\x99\\xc5\\x9bliwy'", "b'jab\\xc5\\x82ko'", "b's\\xc5\\x82o\\xc5\\x84'", "b'kot'"]
lista2 is still a list of strings, but with the UTF-8 encoding, and I am not able to recover the special characters. I tried several things, like reading the file in 'rb' mode and encoding and decoding again, but since I am new to these matters I didn't manage it. There must be a very easy solution.
-
Mark Ransom over 6 years
This right here is the beauty of Python 3's treatment of Unicode. Specify the encoding once, then forget about it.
-
Pacullamen over 6 years
Lesson learned. I guess I have to get more familiar with all these matters. Thanks a lot Tomalak.