Group by and sum in shell script without awk

shell-script text-processing solaris csv-simple

7,341

Solution 1

I think this'll do:

awk 'BEGIN{FS=OFS=","}{a[$1]+=$2}END{ for (i in a) print i,a[i]}'

Solution 2

Pure bash, one-liner:

unset x y sum; while IFS=, read x y; do ((sum[$x]+=y)); done <  input.csv; for i in ${!sum[@]}; do echo $i,${sum[$i]}; done

Or in more readable form:

unset x y sum
while IFS=, read x y; do
    ((sum[$x]+=y)); done < input.csv
for i in ${!sum[@]}; do
    echo $i,${sum[$i]}
done

The result:

100,400
201,400
300,600

Solution 3

With python this can be done more effectively. This program by default expects the file to be named as 'file.txt', which you can change if needed.

#!/usr/bin/env python3

col1, col2 = [ list(y) for y in zip(*[ x.strip().split(',') for x in open('file.txt').readlines() if x != '\n' ]) ]

for (offset,x) in enumerate(list(col1)):
    value = 0
    while col1.count(x) > 1:
        index = col1.index(x)
        col1.pop(index)
        value =  int(col2.pop(index))

        index = col1.index(x)
        col2[index] = int(col2[index]) + value

for x, y in zip(col1, col2):
    print(x,',',y)

Output:

201 , 400
300 , 600
100 , 400

7,341

Sunny Monga

Updated on September 18, 2022

Comments

Sunny Monga almost 2 years
I have a file like:
```
$ cat input.csv
201,100
201,300
300,100
300,500
100,400
```
I want to add the values in column 2 which has same value in column 1. Expected output is as follows:
```
$ cat output.csv
201,400
300,600
100,400
```
I tried to do this by awk command but it is not working in Solaris. Please provide some alternative.
- PM 2Ring over 9 years
  
  Please show us your awk code.
- Stéphane Chazelas over 9 years
  
  On Solaris, use nawk or /usr/xpg4/bin/awk, or add a PATH=`getconf PATH`:$PATH as the one in /bin is an ancient non-standard one.
- Nickolay about 9 years
  
  The answers here focus on one-liners and custom scripts. For those looking for an existing utility, see this question: unix.stackexchange.com/q/85204/41737
jimmij over 9 years

The title: "group by and sum in shell script without awk"
Kannan Mohan over 9 years

The answer is great, AWK rocks!!!
jimmij over 9 years

So use uniq or sort, but if OP explicitly ask for non-awk solution I believe that should be respected.
Marek Zakrzewski over 9 years

@jimmij I am curious on your sh answer. If you can achieve the above in sh only, I'll remove my answer!
jimmij over 9 years

@val0x00ff I've just submitted solution in pure bash. I don't want to criticize, I didn't downvote etc, just saying that OP asked to not use awk. Of course I agree that awk is the best tool for this job.
Marek Zakrzewski over 9 years

@jimmij Your answer is ofcourse the OP wanted. Thanks for providing this solution!
orion over 9 years

Just use sys.argv[1] for a filename or read from sys.stdin, if filename is not specified.