Counting occurrences in first column of a file
If the input is sorted, you can use uniq:
<infile cut -d' ' -f1 | uniq -c
If not, sort it first:
<infile cut -d' ' -f1 | sort -n | uniq -c
Output:
3 1
1 3
2 52
The output is swapped compared to your requirement; pipe it through awk '{ print $2, $1 }'
to change that:
1 3
3 1
52 2
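Putting those steps together, a minimal sketch might look like the following. The function name count_first_col is made up for illustration, the input is the sample data from the question, and the columns are assumed to be separated by exactly one space.

```shell
# Hypothetical helper: count how often each first-column value occurs in a file.
# Assumes columns are separated by a single space, as in the sample data.
count_first_col() {
    cut -d' ' -f1 "$1" |          # keep only the first column
        sort -n |                 # bring equal values together, in numeric order
        uniq -c |                 # prefix each distinct value with its count
        awk '{ print $2, $1 }'    # swap the fields to "value count"
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' '1 2' '1 3' '1 2' '3 3' '52 1' '52 300' > "$sample"
count_first_col "$sample"
rm -f "$sample"
```

If the columns are separated by tabs or by runs of spaces, cut -d' ' will not split them as intended; replacing the cut step with awk '{ print $1 }' is one way around that.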
There's also the awk idiom, which does not require sorted input:
awk '{h[$1]++}; END { for(k in h) print k, h[k] }'
Output:
1 3
52 2
3 1
Because the output here comes from a hash, it is not ordered; pipe it to sort -n
if that is needed:
awk '{h[$1]++} END { for(k in h) print k, h[k] }' | sort -n
If you're using GNU awk, you can do the sorting from within awk:
awk '{h[$1]++} END { n = asorti(h, d, "@ind_num_asc"); for(i=1; i<=n; i++) print d[i], h[d[i]] }'
In the last two cases the output is:
1 3
3 1
52 2
Author by
Arash
Updated on September 18, 2022
Comments
-
Arash over 1 year
We have this file:
1 2
1 3
1 2
3 3
52 1
52 300
and 1000 more.
I want to count the number of times each value occurs in the first column, like this:
1 3
3 1
52 2
This means we saw 1 three times. How can I do that in Perl, AWK or Bash?
-
slhck over 11 years
Hi arashams! I saw you recently asked very similar questions that all revolve around the same topic. I'm sure the community would like to help you, but maybe you could show us what you've already tried and where exactly you got stuck? We require people to show a little effort before asking their questions – there isn't any learning involved from simply asking others to give you the code for a specific thing. Why not tell us what exactly the background of this is? Maybe there is an easier way to accomplish what you want, and we don't need to resort to dummy examples with some abstract numbers?
-
Arash over 11 years
Thanks for your help. I'm working with bgpdump data and parsing it.
-
Arash over 11 years
Could you please explain the code? awk '{h[$1]++} END { for(k in h) print k, h[k] }' | sort -n
-
Thor over 11 years
@arashams: The {h[$1]++} block is evaluated for each line. h is a hash, and $1 is the first column, used as the key into h. So this tallies how often each unique $1 is seen. The END block is executed at the end of input and prints the keys and tallies. sort -n sorts the output numerically.
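Laid out over several lines with comments, the one-liner explained above might look like this. It is only a sketch: the function name tally_first_col is made up for illustration, and the input is the sample data from the question.

```shell
# Hypothetical wrapper around the awk program from the answer; reads stdin.
tally_first_col() {
    awk '
        # Main block: runs once for every input line.
        # h is an associative array; $1 (the first column) is the key,
        # and the value is how many times that key has been seen so far.
        { h[$1]++ }

        # END block: runs once, after the last line has been read.
        # The order of "for (k in h)" is unspecified, hence the sort below.
        END { for (k in h) print k, h[k] }
    ' | sort -n    # order the "value count" lines numerically by value
}

# Sample data from the question, fed in on stdin.
printf '%s\n' '1 2' '1 3' '1 2' '3 3' '52 1' '52 300' | tally_first_col
```

Because awk splits fields on any run of spaces or tabs by default, this version does not depend on the columns being separated by exactly one space.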