Remove unwanted commas from CSV using Python
Solution 1
You can define the separating and quoting characters with Python's CSV reader. For example:
With this CSV:
1,`Flat 5, Park Street`
And this Python:
import csv
with open('14144315.csv', 'rb') as csvfile:
rowreader = csv.reader(csvfile, delimiter=',', quotechar='`')
for row in rowreader:
print row
You will see this output:
['1', 'Flat 5, Park Street']
This would use commas to separate values but inverted commas for quoted commas
Solution 2
The CSV file was not generated properly. CSV files should have some form of escaping of text, usually using double-quotes:
1,John Doe,"City, State, Country",12345
Some CSV exports do this to all fields (this is an option when exporting from Excel/LibreOffice), but ambiguous fields (such as those including commas) must be escaped.
Either fix this manually or properly regenerate the CSV. Naturally, this cannot be fixed programatically.
Edit: I just noticed something about "inverted commas" being used for escaping - if that is the case see Jason Sperske's answer, which is spot on.
Related videos on Youtube
merlin_1980
A finance professional with an interest in data visualisation and an amateur (newbie) programmer!
Updated on September 15, 2022Comments
-
merlin_1980 over 1 year
I need some help, I have a CSV file that contains an address field, whoever input the data into the original database used commas to separate different parts of the address - for example:
Flat 5, Park Street
When I try to use the CSV file it treats this one entry as two separate fields when in fact it is a single field. I have used Python to strip commas out where they are between inverted commas as it is easy to distinguish them from a comma that should actually be there, however this problem has me stumped.
Any help would be gratefully received.
Thanks.
-
Martin Samson over 11 yearsThe problem is not how it is stored in the database but how the CSV file was generated. If you still have access to the DB, use python's built-in CSV module to re-generate the CSV file. It will then have properly escaped string sequences.
-
Yinda Yin over 11 yearsThe proper way to handle this is to enclose the strings in double-quotes. CSV readers treat commas within quoted strings as part of the string.
-
l4mpi over 11 yearsIs the address format the same for every record? Meaning, does every line contain the same amount of "unwanted" commas? If yes you can fix this in a few lines with split, surrounding the whole address field with double-quotes - or simply edit the header line to use multiple fields for the address.
-