How to Delete Rows CSV in python
Solution 1
This solution uses fileinput
with inplace=True
, which writes to a temporary file and then automatically renames it at the end to your file name. You can't remove rows from a file but you can rewrite it with only the ones you want.
if the keyword argument
inplace=1
is passed tofileinput.input()
or to theFileInput
constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently). This makes it possible to write a filter that rewrites its input file in place.
fileA
h1,h2,h3
a,b,c
d,e,f
g,h,i
j,k,l
fileB
h1,h2,h3
a,b,c
1,2,3
g,h,i
4,5,6
import fileinput, sys, csv
with open('fileB', 'rb') as file_b:
r = csv.reader(file_b)
next(r) #skip header
seen = {(row[0], row[2]) for row in r}
f = fileinput.input('fileA', inplace=True) # sys.stdout is redirected to the file
print next(f), # write header as first line
w = csv.writer(sys.stdout)
for row in csv.reader(f):
if (row[0], row[2]) in seen: # write it if it's in B
w.writerow(row)
fileA
h1,h2,h3
a,b,c
g,h,i
Solution 2
As Lennart described, you can't modify a CSV file in-place as you iterate over it.
If you're really opposed to creating a third file, you might want to look into using a string buffer with StringIO, the idea being that you build up the new desired contents of file A in memory. At the end of your script, you can write the contents of the buffer over file A.
from cStringIO import StringIO
with open('fileB', 'rb') as file_b:
new_a_buf = StringIO()
writer = csv.writer(new_a_buf)
reader2 = csv.reader(file_b)
next(reader2)
for row in reader2:
if (row[0], row[2]) not in removal_list:
writer.writerow(row)
# At this point, the contents (new_a_buf) exist in memory
with open('fileA', 'wb') as file_a:
file_a.write(new_a_buf.getvalue())
Solution 3
CSV is not a database format. It is read and written as a whole. You can't remove rows in the middle. So the only way to do this without creating a third file is to read in the file completely in memory and then write it out, without the offending rows.
But in general it's better to use a third file.
justin
Updated on May 13, 2020Comments
-
justin almost 4 years
I'm trying to compare two csv files (fileA and fileB), and remove any rows from fileA that are not found in fileB. I want to be able to do this without creating a third file. I thought I could do this using the csv writer module but now I'm second guessing myself.
Currently, I'm using the following code to record my comparison data from file B:
removal_list = set() with open('fileB', 'rb') as file_b: reader1 = csv.reader(file_b) next(reader1) for row in reader1: removal_list.add((row[0], row[2]))
This is where I'm stuck and do not know how to delete the rows:
with open('fileA', 'ab') as file_a: with open('fileB', 'rb') as file_b: writer = csv.writer(file_a) reader2 = csv.reader(file_b) next(reader2) for row in reader2: if (row[0], row[2]) not in removal_list: # If row was not present in file B, Delete it from file A. #stuck here: writer.<HowDoIRemoveRow>(row)
-
Burhan Khalid about 11 yearsA word of caution here: you may exhaust the available memory for your system if your input files are large.
-
jamylak about 11 yearsYou may as well just write to a different file and rename it at the end, that is what my solution does
-
David Cain about 11 years@jamylak, I completely agree with you. And that's exactly what I would do in this situation. I just figured this would be useful in that in technically meets what the asker is looking for.
-
David Cain about 11 yearsA subtle improvement not addressed in the explanation: this code uses a set, a far more optimal data structure for answering "is this data present?" than a list (which must be iterated over each time).
-
jamylak about 11 years@David Op also used a set though
-
David Cain about 11 yearsD'oh. S/he clearly did. Well, small bit of advice- don't call it a removal "list", or bone-headed people like me will get confused as to the variable's type. =)
-
justin about 11 yearswhat version of python? I don't believe this syntax is 2.4 compatible
-
jamylak about 11 years@justin You tagged it as 2.7? You can just use
set((row[0], row[1]) for row in r)
instead -
justin about 11 yearsI did. I apologize. I didn't realize the house for my script is on a 2.4 box until now.