Remove unwanted commas from CSV using Python

10,690

Solution 1

You can define the separating and quoting characters with Python's CSV reader. For example:

With this CSV:

1,`Flat 5, Park Street`

And this Python:

import csv

with open('14144315.csv', 'rb') as csvfile:
    rowreader = csv.reader(csvfile, delimiter=',', quotechar='`')
    for row in rowreader:
        print row

You will see this output:

['1', 'Flat 5, Park Street']

This would use commas to separate values but inverted commas for quoted commas

Solution 2

The CSV file was not generated properly. CSV files should have some form of escaping of text, usually using double-quotes:

1,John Doe,"City, State, Country",12345

Some CSV exports do this to all fields (this is an option when exporting from Excel/LibreOffice), but ambiguous fields (such as those including commas) must be escaped.

Either fix this manually or properly regenerate the CSV. Naturally, this cannot be fixed programatically.

Edit: I just noticed something about "inverted commas" being used for escaping - if that is the case see Jason Sperske's answer, which is spot on.

Share:
10,690

Related videos on Youtube

merlin_1980
Author by

merlin_1980

A finance professional with an interest in data visualisation and an amateur (newbie) programmer!

Updated on September 15, 2022

Comments

  • merlin_1980
    merlin_1980 over 1 year

    I need some help, I have a CSV file that contains an address field, whoever input the data into the original database used commas to separate different parts of the address - for example:

    Flat 5, Park Street

    When I try to use the CSV file it treats this one entry as two separate fields when in fact it is a single field. I have used Python to strip commas out where they are between inverted commas as it is easy to distinguish them from a comma that should actually be there, however this problem has me stumped.

    Any help would be gratefully received.

    Thanks.

    • Martin Samson
      Martin Samson over 11 years
      The problem is not how it is stored in the database but how the CSV file was generated. If you still have access to the DB, use python's built-in CSV module to re-generate the CSV file. It will then have properly escaped string sequences.
    • Yinda Yin
      Yinda Yin over 11 years
      The proper way to handle this is to enclose the strings in double-quotes. CSV readers treat commas within quoted strings as part of the string.
    • l4mpi
      l4mpi over 11 years
      Is the address format the same for every record? Meaning, does every line contain the same amount of "unwanted" commas? If yes you can fix this in a few lines with split, surrounding the whole address field with double-quotes - or simply edit the header line to use multiple fields for the address.