Writing/Reading special characters from CSV (Python 3.6)

  1. Never open text files without specifying an encoding (this is generally true, not just for CSV).
  2. Always open CSV files with newline='' (this is specific to the Python csv module).
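
As a rough illustration of why both rules matter, here is a minimal sketch that writes a row containing non-ASCII text and an embedded line break, then reads it back. The scratch file name demo.csv and the sample rows are assumptions made up for this example; only open(), csv.writer and csv.reader come from the answer.

```python
import csv
import os
import tempfile

# Sample data (hypothetical): one field contains a line break.
rows = [['zażółć', 'first line\nsecond line'], ['kot', 'plain']]

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

# Explicit encoding plus newline='' on the way out...
with open(path, 'w', encoding='utf-8', newline='') as f:
    csv.writer(f).writerows(rows)

# ...and the same two arguments on the way back in.
with open(path, 'r', encoding='utf-8', newline='') as f:
    read_back = list(csv.reader(f))

# Raw bytes, to inspect what actually landed on disk.
with open(path, 'rb') as f:
    raw = f.read()

print(read_back == rows)
print(raw)
```

With newline='' the csv module controls line endings itself, so the line break inside the quoted field should survive unchanged and each row should end with the writer's default '\r\n' terminator on any platform.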

So, assuming your CSV file is UTF-8-encoded, use:

import csv

with open('output.csv', 'r', encoding='UTF-8', newline='') as csvarchive:
    entrada = csv.reader(csvarchive)
    for reg in entrada:
        # each row is a list of str values that are already decoded
        print(reg)

The same applies to writing the file:

with open('output.csv', 'w', encoding='UTF-8', newline='') as csvarchive:
    writer = csv.writer(csvarchive)
    # pass plain str values; they are encoded when written to the file
    writer.writerow(['szczęśliwy', 'jabłko', 'słoń', 'kot'])

There is no need to do any manual string encoding. Write plain string values to the csv writer; the file object encodes them transparently.
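
Putting the two snippets together, a minimal round-trip sketch with the list from the question might look like this. The temporary directory is an assumption so the example does not overwrite a real output.csv; the variable names follow the question.

```python
import csv
import os
import tempfile

lista = ['szczęśliwy', 'jabłko', 'słoń', 'kot']

# Hypothetical location: a scratch directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), 'output.csv')

# Write the strings as they are; no .encode() calls anywhere.
with open(path, 'w', encoding='utf-8', newline='') as csvarchive:
    csv.writer(csvarchive).writerow(lista)

# Read the single row back; the fields arrive as ordinary str objects.
with open(path, 'r', encoding='utf-8', newline='') as csvarchive:
    lista2 = next(csv.reader(csvarchive))

print(lista2)
```

The only place the encoding is mentioned is the open() call, which is the point of the answer.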

Author: Pacullamen

Updated on July 24, 2022

Comments

  • Pacullamen (almost 2 years ago)

    Let's assume that I need to write and then read a list of strings with Polish words in a .csv file in Python 3.6:

    lista=['szczęśliwy','jabłko','słoń','kot']
    

    Since I could not write the Unicode characters to the .csv directly, I encode the strings to UTF-8, so the data is saved like this in the file (all inside the first .csv cell):

    b'szcz\xc4\x99\xc5\x9bliwy',b'jab\xc5\x82ko',b's\xc5\x82o\xc5\x84',b'kot'
    

    But I am not able to decode the data from the output.csv file using this code:

    with open('output.csv') as csvarchive:
        entrada = csv.reader(csvarchive)
        for reg in entrada:
            lista2=reg
    
    print(lista2)
    ["b'szcz\\xc4\\x99\\xc5\\x9bliwy'", "b'jab\\xc5\\x82ko'", "b's\\xc5\\x82o\\xc5\\x84'", "b'kot'"]
    

    lista2 is still a list of strings, but each string holds the escaped UTF-8 byte representation rather than the original text, and I am not able to recover the special characters.

    I tried several things, like reading the file in 'rb' mode and encoding and decoding again, but since I am new to these matters I could not make it work. There must be a very easy solution.

  • Mark Ransom (over 6 years ago)
    This right here is the beauty of Python 3's treatment of Unicode. Specify the encoding once, then forget about it.
  • Pacullamen (over 6 years ago)
    Lesson learned. I guess I have to get more familiar with all these matters. Thanks a lot, Tomalak.
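
For a file that was already written with the b'...' representations shown in the question, one possible way to recover the text is sketched below. The csv writer converts non-string values with str(), which is why the bytes objects ended up as their textual representation. Using ast.literal_eval to turn each cell back into bytes is a suggestion added here, not part of the answer above, and it assumes every cell really is a valid bytes literal. The in-memory buffer stands in for the damaged file.

```python
import ast
import csv
import io

lista = ['szczęśliwy', 'jabłko', 'słoń', 'kot']

# Reproduce the problem in memory: bytes objects handed to the writer
# are stringified with str(), producing cells such as b'jab\xc5\x82ko'.
buf = io.StringIO(newline='')
csv.writer(buf).writerow([word.encode('utf-8') for word in lista])

# Read the damaged row back; every cell is the repr of a bytes object.
buf.seek(0)
damaged = next(csv.reader(buf))

# Parse each cell as a bytes literal, then decode the bytes as UTF-8.
recovered = [ast.literal_eval(cell).decode('utf-8') for cell in damaged]

print(recovered)
```

This is only a repair step for existing files. For new files, writing plain strings with an explicit encoding avoids the problem entirely.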