Bash: is it possible to use uniq on only one column of a line?

10,428

Solution 1

Try this:

sort -rnk3 myfile | awk -F"[. ]" '!a[$2]++'

awk removes the duplicates based on the 2nd field. With -F"[. ]" the line is split on a period or a space, so the 2nd field is the text right after the period (gu, gui, guT). This is a well-known awk idiom for removing duplicates. The array a keeps a count for each value of the 2nd field. For every record, a[$2]++ returns the current count and then increments it. The first time a value is seen, the expression returns 0 because ++ is a postfix operator, so !a[$2]++ is true and the record is printed. On subsequent occurrences the count is nonzero, the negation is false, and the record is discarded as a duplicate.
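To see the idiom at work, here is a minimal sketch that prints the value a[$2]++ returns for each line, together with the decision !a[$2]++ would make. The sample rows are the question's data in sorted order; the function name trace_dedup is made up for this illustration.

```shell
# Sample rows from the question, in the order produced by sort -rnk3.
sample='2.gu   Qxy  23
4.gui  Qxr  21
3.guT  QWS  18
1.gui  Qxx  16'

# Read lines from stdin and report, for each one, the key, the value that
# a[$2]++ evaluates to, and whether !a[$2]++ would print or skip the line.
trace_dedup() {
  awk -F'[. ]' '{
    seen = a[$2]++    # post-increment yields the count before this line
    printf "%s %d %s\n", $2, seen, (seen == 0 ? "print" : "skip")
  }'
}

printf '%s\n' "$sample" | trace_dedup
# gu 0 print
# gui 0 print
# guT 0 print
# gui 1 skip
```

Note that awk array keys are case-sensitive, so guT and gui are treated as different keys.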

Solution 2

Here you go:

sort -rnk3 file | awk -F'[. ]' '{ if (a[$2]++ == 0) print }' 

2.gu   Qxy  23
4.gui  Qxr  21
3.guT  QWS  18
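To check the pipeline end to end, the sketch below recreates the four-line input from the question in a temporary file (the path comes from mktemp and is not part of the original post) and runs the same command on it. Note that the surviving guT row is the one numbered 3 in the input.

```shell
# Recreate the input file from the question in a temporary file.
input=$(mktemp)
printf '%s\n' \
  '1.gui  Qxx  16' \
  '2.gu   Qxy  23' \
  '3.guT  QWS  18' \
  '4.gui  Qxr  21' > "$input"

# Sort by the 3rd column, numerically and in reverse, then keep only the
# first line seen for each key (the text between the period and the space).
result=$(sort -rnk3 "$input" | awk -F'[. ]' '{ if (a[$2]++ == 0) print }')
printf '%s\n' "$result"
# 2.gu   Qxy  23
# 4.gui  Qxr  21
# 3.guT  QWS  18

rm -f "$input"
```

Because the file is sorted before awk sees it, the first line kept for each key is the one with the greatest value in the 3rd column.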

This uses awk to drop lines whose second field has already been seen, where the field separator is either a space or a period. This is what awk treats as the second field:

$ sort -rnk3 file | awk -F'[. ]' '{ print $2 }'

gu
gui
guT
gui

In awk, the variable $0 represents the whole line, $1 represents the first field, and so on.

In awk -F'[. ]' '{ if (a[$2]++ == 0) print }', the -F option lets you specify the field separator; in this case it is either a single space or a period.
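One detail worth checking yourself: with a separator such as [. ], every single space is a separator of its own, so the run of spaces between columns produces empty fields. The sketch below, using one line of the question's data, lists every field awk sees. Only $2 matters for the de-duplication, but under this separator the number in the last column is not $3.

```shell
# List every field awk sees when splitting on a period or a single space.
line='4.gui  Qxr  21'
fields=$(printf '%s\n' "$line" |
  awk -F'[. ]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
# $1=[4]
# $2=[gui]
# $3=[]
# $4=[Qxr]
# $5=[]
# $6=[21]
```

If you also need the other columns by position, a separator like -F'[. ]+' (one or more periods or spaces) avoids the empty fields, while $2 stays the same.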

Author by

teutara

Updated on June 13, 2022

Comments

  • teutara
    teutara almost 2 years
        1.gui  Qxx  16
        2.gu   Qxy  23
        3.guT  QWS  18
        4.gui  Qxr  21
    

    I want to sort a file by the value in the 3rd column, so I use:

    sort -rnk3 myfile
    
    2.gu   Qxy  23
    4.gui  Qxr  21
    3.guT  QWS  18
    1.gui  Qxx  16
    

    Now I need the output to be (the line starting with 1.gui is dropped because the line with 4.gui has a greater value):

    2.gu   Qxy  23
    4.gui  Qxr  21
    3.guT  QWS  18
    

    I cannot use head because I have millions of rows and I do not know where to cut. I could not figure out a way to use uniq either: it compares whole lines, and since I cannot tell uniq to look only at the first column, every line counts as unique and gets printed, which is normal. I know uniq can skip a number of characters, but as you can see from the example, the first column can have a varying number of characters.

    Please advise.