awk determine unique rows based on subset of columns
Solution 1
To print a certain column, let's say column 1 and 4, simply run:
awk -F',' '{print $1, $4}' file
The command here is to ask awk
to print the designated column separated by the ','
You can change the column number, add / remove the $NUM to get different columns.
Combined with sort
, you will get the unique rows:
awk -F',' '{print $1, $4}' file | sort -u
The -u
flag removes duplicate items from the output.
Solution 2
You can do this by combining the column-values in the hash key, e.g. assuming your input is sorted, this one-pass solution works for column 1-3:
awk '!h[$1,$2,$3]++ { NF--; print }' FS=, OFS=, data.csv
Output:
Col1,Col2,Col3
A,10,50
A,10,05
B,20,30
B,20,03
C,30,100
C,30,111
C,40,111
C,30,123
For columns 1 and 4, do something like this:
awk '!h[$1,$4]++ { print $1, $4 }' FS=, OFS=, data.csv
Output:
Col1,Col4
A,2017
B,2017
C,2017
C,2016
C,2015
Related videos on Youtube
conor
Updated on September 18, 2022Comments
-
conor over 1 year
I'm looking to return rows in a csv that are unique for a specified subset of columns.
Example data:
Col1,Col2,Col3,Col4 A,10,50,2017 A,10,05,2017 B,20,30,2017 B,20,03,2017 C,30,100,2017 C,30,111,2017 C,30,100,2016 C,40,111,2016 C,30,123,2015
This command outputs the unique values of a single column (column 1 in this case):
awk -F , '{ a[$1]++ } END { for (b in a) { print b } }' file
returns
Col1 A B C
I'd like a command to do this for 2 columns or n columns.
Desired output for unique rows using 2 columns (Col1 and Col3):
Col1,Col3 A,50 A,05 B,30 B,03 C,100 C,111 C,123
For Col1 and Col4
Col1,Col4 A,2017 B,2017 C,2017 C,2016 C,2015
For 3 columns Col1, Col2, and Col3
Col1,Col2,Col3 A,10,50 A,10,05 B,20,30 B,20,03 C,30,100 C,30,111 C,40,111 C,30,123
-
conor almost 7 yearsWorks as requested and can be modified for n columns, e.g.
awk -F',' '{print $1, $3, $4}' file | sort |uniq
. Upvote cast but not displaying due to my low rep. -
grooveplex almost 7 yearsI believe you can also use
sort -u
instead ofsort | uniq
-
P.-H. Lin almost 7 years@grooveplex oh right! Thanks for the reminder!