Calculate and divide by total with AWK
Solution 1
To create the table with a single call to awk
:
$ awk 'FNR==NR{s+=$2;next;} {printf "%s\t%s\t%s%%\n",$1,$2,100*$2/s}' data data
foo 10 10%
bar 20 20%
oof 50 50%
rab 20 20%
How it works
The file data
is provided as an argument to awk
twice. Consequently, it will be read twice, the first time to get the total, which is stored in the variable s
, and the second to print the output. Looking at the commands in more detail:
FNR==NR{s+=$2;next;}
NR is the total number of records (lines) that
awk
has read and FNR is the number of records read so far from the current file. Consequently, whenFNR==NR
, we are reading the first file. When this happens, the variables
is incremented by the value in the second column. Then,next
tellsawk
to skip the rest of the commands and start over with the next record.Note that it is not necessary to initialize
s
to zero. Inawk
, all numeric variables are, by default, initialized to zero.printf "%s\t%s\t%s%%\n",$1,$2,100*$2/s
If we reach this command, then we are processing the second file. This means that
s
now holds the total of column 2. So, we print column 1, column 2, and the percentage,100*$2/s
.
Output format options
With printf
, detailed control of the output format is possible. The command above uses the %s
format specifier which works for strings, integers, and floats. Three other option that might be useful here are:
%d
formats numbers as integers. If the number is actually floating point, it will be truncated to an integer%f
formats numbers as floating point. It is also possible to specify widths and decimals places as, for example,%5.2f
.%e
provides exponential notation. This would be useful if some numbers were exceptionally large or small.
Make a shell function
If you are going to use this more than once, it is an inconvenience to type a long command. Instead create either a function or a script to hole the command.
To create a function called totals
, run the command:
$ totals() { awk 'FNR==NR{s+=$2;next;} {printf "%s\t%s\t%s%%\n",$1,$2,100*$2/s}' "$1" "$1"; }
With this function defined, the percentages for a data file called data
can be found by running:
$ totals data
To make the definition of totals
permanent, place it in your ~/.bashrc
file.
Make a shell script
If you prefer a script, create a file called totals.sh
with the contents:
#!/bin/sh
awk 'FNR==NR{s+=$2;next;} {printf "%s\t%s\t%s%%\n",$1,$2,100*$2/s}' "$1" "$1"
To get the percentages for a data file called data
, run:
sh totals.sh data
Solution 2
The "simple" way of doing this would be to call awk
twice: once to get the total, another time to calculate the ratios.
$ total=$(awk 'BEGIN{ total=0 } { total=total+$2 } END{ printf total }' data)
$ awk -v total=$total '{ print $1 "\t" $2 "\t" 100*$2/total "%" }' data
Now I'm sure someone will come up with a one-liner somehow...
Solution 3
Awk way with one file open(for completeness)
awk '{a[NR]=$0;x+=(b[NR]=$2)}END{while(++i<=NR)print a[i]"\t"100*b[i]/x"%"}' file
foo 10 10%
bar 20 20%
oof 50 50%
rab 20 20%
This will use more memory than the others but should be faster
This reads the line into array a
and field two into array b
.
Then increments x
by the value in field 2.
At the end it iterates from 1 to the number of records and outputs the correct line and calculates the percentage.
Related videos on Youtube
Rip Leeb
Updated on September 18, 2022Comments
-
Rip Leeb almost 2 years
Given the following
data
file...foo 10 bar 20 oof 50 rab 20
... how would I print column two as a percent of the total of column two? In other words, I want...
foo 10 10% bar 20 20% oof 50 50% rab 20 20%
... with less obvious numbers of course. I can create a running total easily enough, but I'm not sure how I can calculate the total before printing the lines. I am doing this in an awk file
totals.awk
...#!/usr/bin/awk -f BEGIN{ runningtotal=0 } { runningtotal=runningtotal+$2 print $1 "\t" $2 "\t" runningtotal "\t" $2/runningtotal }
So, running
./totals.awk data
yields...foo 10 10 1 bar 20 30 0.666667 oof 50 80 0.625 rab 20 100 0.2
Is there a way to loop twice, once to calculate the total, and once to print the lines? Is this possible in AWK, or must I use other utilities?
-
don_crissti over 9 years
-
syntaxerror over 9 yearsBTW you need not run it as
cat data | ./totals.awk
. Consider thatawk
WANTS a file, so just do a./totals.awk data
(or if you insist,totals.awk < data
, in this case it should not matter) instead. Anyways, the gist is: in any case, omit that superfluouscat
.
-
-
syntaxerror over 9 yearsJust a +1 for not using
cat
. It always hurts my eyes to seecat
"misused" like that. Poorcat
! (meow) ;) -
yorkshiredev over 9 yearsAnother alternative would be to use input redirection: this would even save
awk
the trouble of opening the file. -
syntaxerror over 9 yearsYes, but chances are the latter will not be entirely POSIX-compliant then. (From the top of my head though, would have to try it out to be sure, perhaps I'm again confusing it with command substitution ;))
-
αғsнιη over 9 yearsIsn't good to use
%d
for second and third columns instead of%s
for them?! -
John1024 over 9 years@KasiyA Based on your suggestion, I added a section on
printf
formats.%s
will show integer or floating point as the number requires.%d
would, of course, truncate floating point numbers to integer, potentially losing accuracy. Whether that format is better than more precise output is, I think, a matter for the OP to decide. -
syntaxerror over 9 yearsThanks, very interesting stuff. But do not even think of using this in
C
! Asawk
'sprintf
syntax is pretty alike to theC
syntax, this looks sooo darn tempting. ButC
, in contrary, will show itself obdurate here once more (just had to test it! :))warning: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘int’
. Trying to run that program even resulted in a segfault...!---So what this post is supposed to say is: don't assume that what you can do inawk
would be possible inC
as well.awk
is WAY more liberal in that respect. -
Admin over 9 years@John1024 you can also use the form
file{,}
for two files or add more commas for more, it saves space if filenames are large. -
Rip Leeb over 9 yearsIs there a way to parse the output stored in the associative array
a
inside ofEND
? Say I only wanted to print the first column and the percentage. -
Rip Leeb over 9 yearsI don't need a one-liner - in fact I am hoping for a .awk file which I can call like
./totals.awk data
. Is there a way to make this work without needing to input the file twice like./totals data data
? I appreciate the explanations btw. -
John1024 over 9 years@Nate I have added sections for making the code into (a) shell function or (b) a shell script. Both work as well. Both eliminate the need for write the file name twice. Use whichever approach you are more comfortable with.
-
Admin over 9 years@Nate There is but i don't think it would be necessary, you could just save the first column instead of the line,
a[NR]=$1
instead ofa[NR]=$0