Group by and sum in shell script without awk
7,341
Solution 1
I think this'll do:
awk 'BEGIN{FS=OFS=","}{a[$1]+=$2}END{ for (i in a) print i,a[i]}'
Solution 2
Pure bash
, one-liner:
unset x y sum; while IFS=, read x y; do ((sum[$x]+=y)); done < input.csv; for i in ${!sum[@]}; do echo $i,${sum[$i]}; done
Or in more readable form:
unset x y sum
while IFS=, read x y; do
((sum[$x]+=y)); done < input.csv
for i in ${!sum[@]}; do
echo $i,${sum[$i]}
done
The result:
100,400
201,400
300,600
Solution 3
With python
this can be done more effectively. This program by default expects the file to be named as 'file.txt', which you can change if needed.
#!/usr/bin/env python3
col1, col2 = [ list(y) for y in zip(*[ x.strip().split(',') for x in open('file.txt').readlines() if x != '\n' ]) ]
for (offset,x) in enumerate(list(col1)):
value = 0
while col1.count(x) > 1:
index = col1.index(x)
col1.pop(index)
value = int(col2.pop(index))
index = col1.index(x)
col2[index] = int(col2[index]) + value
for x, y in zip(col1, col2):
print(x,',',y)
Output:
201 , 400
300 , 600
100 , 400
Related videos on Youtube
Author by
Sunny Monga
Updated on September 18, 2022Comments
-
Sunny Monga almost 2 years
I have a file like:
$ cat input.csv 201,100 201,300 300,100 300,500 100,400
I want to add the values in column 2 which has same value in column 1. Expected output is as follows:
$ cat output.csv 201,400 300,600 100,400
I tried to do this by
awk
command but it is not working in Solaris. Please provide some alternative.-
PM 2Ring over 9 yearsPlease show us your awk code.
-
Stéphane Chazelas over 9 yearsOn Solaris, use
nawk
or/usr/xpg4/bin/awk
, or add aPATH=`getconf PATH`:$PATH
as the one in/bin
is an ancient non-standard one. -
Nickolay about 9 yearsThe answers here focus on one-liners and custom scripts. For those looking for an existing utility, see this question: unix.stackexchange.com/q/85204/41737
-
-
jimmij over 9 yearsThe title: "group by and sum in shell script without awk"
-
Kannan Mohan over 9 yearsThe answer is great, AWK rocks!!!
-
jimmij over 9 yearsSo use
uniq
orsort
, but if OP explicitly ask for non-awk solution I believe that should be respected. -
Marek Zakrzewski over 9 years@jimmij I am curious on your
sh
answer. If you can achieve the above insh
only, I'll remove my answer! -
jimmij over 9 years@val0x00ff I've just submitted solution in pure bash. I don't want to criticize, I didn't downvote etc, just saying that OP asked to not use
awk
. Of course I agree thatawk
is the best tool for this job. -
Marek Zakrzewski over 9 years@jimmij Your answer is ofcourse the OP wanted. Thanks for providing this solution!
-
orion over 9 yearsJust use
sys.argv[1]
for a filename or read fromsys.stdin
, if filename is not specified.