Counting occurrences in first column of a file

If the input is sorted, you can use uniq:

<infile cut -d' ' -f1 | uniq -c

If not, sort it first:

<infile cut -d' ' -f1 | sort -n | uniq -c

Output:

  3 1
  1 3
  2 52

The columns are swapped relative to what you asked for (count first, value second). Appending awk '{ print $2, $1 }' to the pipeline fixes that:

1 3 
3 1
52 2
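Putting the steps above together, the whole pipeline might look like the sketch below. The function name count_first_column and the temporary file are assumptions made for illustration; the sample data is taken from the question.

```shell
# Sample data from the question, written to a temporary file
# (stands in for "infile" in the commands above).
infile=$(mktemp)
printf '%s\n' '1 2' '1 3' '1 2' '3 3' '52 1' '52 300' > "$infile"

# count_first_column is a hypothetical helper name, not part of the answer.
count_first_column() {
    # 1. cut    : keep only the first space-separated column
    # 2. sort -n: group equal values next to each other, in numeric order
    # 3. uniq -c: collapse each group into "count value"
    # 4. awk    : swap the two fields to get "value count"
    cut -d' ' -f1 "$1" | sort -n | uniq -c | awk '{ print $2, $1 }'
}

count_first_column "$infile"
```

Note that cut -d' ' assumes exactly one space between columns. If the file uses tabs or runs of spaces, the awk variant is the safer choice because awk splits on any whitespace by default.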

There's also the awk idiom, which does not require sorted input:

awk '{h[$1]++}; END { for(k in h) print k, h[k] }'

Output:

1 3
52 2
3 1

Because this output comes from a hash, its order is unspecified. Pipe it to sort -n if you need it sorted:

awk '{h[$1]++} END { for(k in h) print k, h[k] }' | sort -n

If you're using GNU awk, you can do the sorting from within awk:

awk '{h[$1]++} END { n = asorti(h, d, "@ind_num_asc"); for(i=1; i<=n; i++) print d[i], h[d[i]] }'

In the last two cases the output is:

1 3
3 1
52 2

Author: Arash

Updated on September 18, 2022

Comments

  • Arash
    Arash over 1 year

    We have this file:

    1 2 
    1 3
    1 2
    3 3
    52 1
    52 300
    

    and 1000 more.

    I want to count the number of times each value occurs in the first column.

    1  3 
    3  1
    52 2
    

    This means we saw 1 three times.

    How can I do that, in Perl, AWK or Bash?

    • slhck
      slhck over 11 years
      Hi arashams! I saw you recently asked very similar questions that all revolve around the same topic. I'm sure the community would like to help you, but maybe you could show us what you've already tried and where exactly you got stuck? We require people to show a little effort before asking their questions – there isn't any learning involved from simply asking others to give you the code for a specific thing. Why not tell us what exactly the background of this is? Maybe there is an easier way to accomplish what you want, and we don't need to resort to dummy examples with some abstract numbers?
    • Arash
      Arash over 11 years
      Thanks for your help. I'm working with bgpdump data and parsing it.
  • Arash
    Arash over 11 years
    Could you please explain the code? awk '{h[$1]++} END { for(k in h) print k, h[k] }' | sort -n
  • Thor
    Thor over 11 years
    @arashams: The {h[$1]++} block is evaluated for each line. h is a hash and $1 is the first column and used as the key into h. So this tallies how often unique $1's are seen. The END block is executed at the end of input, and prints the keys and tallies. sort -n sorts the output numerically.
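
The explanation above can be read alongside a commented version of the same command. This is only a sketch: the wrapper name tally_first_column is hypothetical, and the sample input is the data from the question.

```shell
# tally_first_column is a hypothetical wrapper name used only for this sketch.
tally_first_column() {
    awk '
        # Runs once per input line: use field 1 as the key and bump its counter.
        # An unseen key starts out empty (treated as 0), so the first ++ makes it 1.
        { h[$1]++ }

        # Runs once after the last line: print each key with its tally.
        # The for-in traversal order is unspecified, hence the sort afterwards.
        END { for (k in h) print k, h[k] }
    ' "$@" | sort -n
}

# Feed the sample data on standard input; file names can be passed as arguments instead.
printf '%s\n' '1 2' '1 3' '1 2' '3 3' '52 1' '52 300' | tally_first_column
```

With the sample data this prints 1 3, 3 1 and 52 2, one pair per line, matching the sorted output shown in the answer.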