Remove entries from one CSV file that are already present in another
Solution 1
I'm assuming your CSV files look something like this:
File1
123123,,
222333,,
File2
111222,Jones,Sally
111333,Johnson,Roger
123123,Doe,John
444555,Richardson,George
222333,Smith,Jane
223456,Alexander,Philip
You could try using the join command, like so:
# join -t, -v 2 <(sort file1) <(sort file2)
111222,Jones,Sally
111333,Johnson,Roger
223456,Alexander,Philip
444555,Richardson,George
More information about the command can be found here: man join
join [OPTION]... FILE1 FILE2
-t CHAR
use CHAR as input and output field separator
-v FILENUM
like -a FILENUM, but suppress joined output lines
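To make the effect of -v 2 more concrete, here is a rough Python sketch of the same idea: sort both inputs on the first field, walk them in step, and keep only the lines of the second input whose key never appears in the first. The function name `unpaired_from_second` and the inline sample data are illustrative assumptions, not part of join, and the sketch sorts by plain string comparison on the key rather than by locale collation.

```python
def unpaired_from_second(lines1, lines2, sep=","):
    """Return lines of lines2 whose first field matches no first field in lines1.

    Hypothetical helper that imitates `join -t, -v 2` on sorted input.
    """
    def key(line):
        return line.split(sep)[0]

    left = sorted(lines1, key=key)
    right = sorted(lines2, key=key)

    result = []
    i = 0
    for line in right:
        k = key(line)
        # Advance the left cursor past keys smaller than the current right key.
        while i < len(left) and key(left[i]) < k:
            i += 1
        # Keep the line only if the left side has no entry with this key.
        if i == len(left) or key(left[i]) != k:
            result.append(line)
    return result


# Sample data copied from the File1/File2 listings above.
file1 = ["123123,,", "222333,,"]
file2 = [
    "111222,Jones,Sally",
    "111333,Johnson,Roger",
    "123123,Doe,John",
    "444555,Richardson,George",
    "222333,Smith,Jane",
    "223456,Alexander,Philip",
]

for line in unpaired_from_second(file1, file2):
    print(line)
```

As with join itself, the result comes out in key order rather than in the original order of File2.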
Solution 2
Try this:
awk 'BEGIN{FS=","};FNR==NR{a[$1];next};!($1 in a)' file1 file2 > file3
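The one-liner works in two passes: while awk reads file1 (FNR==NR), it stores each first field as an array key; for file2 it prints only the lines whose first field is not among those keys. The same logic can be sketched in Python as below. The function name `remove_known_ids` and the `key_field` parameter are hypothetical, added only to show how a different column could serve as the key; the sample data is held in memory instead of being read from real files.

```python
import csv
import io


def remove_known_ids(ids_rows, data_rows, key_field=0):
    """Yield rows of data_rows whose key_field value is absent from ids_rows.

    Hypothetical helper; key_field is a 0-based column index.
    """
    # First pass (awk: FNR==NR{a[$1];next}): remember every key seen in file1.
    seen = {row[key_field] for row in ids_rows if len(row) > key_field}
    # Second pass (awk: !($1 in a)): emit rows whose key was not remembered.
    for row in data_rows:
        if len(row) > key_field and row[key_field] not in seen:
            yield row


# In-memory stand-ins for file1 and file2, using the sample data from Solution 1.
file1 = io.StringIO("123123,,\n222333,,\n")
file2 = io.StringIO(
    "111222,Jones,Sally\n"
    "111333,Johnson,Roger\n"
    "123123,Doe,John\n"
    "444555,Richardson,George\n"
    "222333,Smith,Jane\n"
    "223456,Alexander,Philip\n"
)

kept = list(remove_known_ids(csv.reader(file1), csv.reader(file2)))

# Write the surviving rows back out as CSV text.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(kept)
print(out.getvalue(), end="")
```

Like the awk command, this keeps only the keys from file1 in memory and streams file2 row by row, so the order of file2 is preserved and memory use depends mainly on the size of file1.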
Solution 3
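You can also try the following Python 2 solution:
#!/usr/bin/env python2
import csv

# Collect the IDs (first column) of file_1 in a set for fast lookup.
with open('file_1') as f1:
    file_1_ids = set(line[0] for line in csv.reader(f1) if line)

# Print every row of file_2 whose ID is not in file_1,
# joined with commas so the output stays CSV.
with open('file_2') as f2:
    for line in csv.reader(f2):
        if line and line[0] not in file_1_ids:
            print(','.join(line))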
pgrason
Updated on September 18, 2022
Comments
-
pgrason almost 2 years
I have two files: 'file1' has employee ID numbers, 'file2' has the complete database of the employees. Here is what they look like:
file1
123123
222333
file2
111222 Jones Sally
111333 Johnson Roger
123123 Doe John
444555 Richardson George
222333 Smith Jane
223456 Alexander Philip
I want to compare the two files and eliminate the entries from file2 that have ID numbers in file1. I found this awk command, which works perfectly:
awk 'FNR==NR{a[$1];next};!($1 in a)' file1 file2 > file3
The result:
file3
111222 Jones Sally
111333 Johnson Roger
444555 Richardson George
223456 Alexander Philip
So this works as expected.
My problem is that the files are actually simplified .csv files, and I must use a comma as a separator rather than a space. I have tried everything I can think of to make this work (e.g. -F, , -F',' , -F"," everywhere in the command) with no success.
How do I get this to work with .csv files?
By the way, I am on a MacBook Pro, OS X Lion!
-
Admin over 9 years
Did you have a space after -F?
-
peterh over 9 years
The idea is okay, but a code snippet-only answer is not.
-
pgrason over 9 years
"join" works, thanks. However, sometimes I want to use a different field in the files. So maybe "awk" is better.
-
pgrason over 9 years
This works the way I want. I can choose which field to use as the key. Thanks. Will there be any problems with very large file1 & file2?
-
pgrason over 9 years
I just tried this command on two large .csv files and it worked just as I wanted. Thanks!
-
devnull over 9 years
@pgrason Define 'fields'. If there is a common field in both files, join should always work.
-
Matthias B almost 6 years
@pgrason Is this the way you solved your problem? Then please accept the answer, so others know what worked for you.