Group by and sum in shell script without awk

7,341

Solution 1

I think this'll do:

awk 'BEGIN{FS=OFS=","}{a[$1]+=$2}END{ for (i in a) print i,a[i]}'

Solution 2

Pure bash, one-liner:

unset x y sum; while IFS=, read x y; do ((sum[$x]+=y)); done <  input.csv; for i in ${!sum[@]}; do echo $i,${sum[$i]}; done

Or in more readable form:

unset x y sum
while IFS=, read x y; do
    ((sum[$x]+=y)); done < input.csv
for i in ${!sum[@]}; do
    echo $i,${sum[$i]}
done

The result:

100,400
201,400
300,600

Solution 3

With python this can be done more effectively. This program by default expects the file to be named as 'file.txt', which you can change if needed.

#!/usr/bin/env python3

col1, col2 = [ list(y) for y in zip(*[ x.strip().split(',') for x in open('file.txt').readlines() if x != '\n' ]) ]

for (offset,x) in enumerate(list(col1)):
    value = 0
    while col1.count(x) > 1:
        index = col1.index(x)
        col1.pop(index)
        value =  int(col2.pop(index))

        index = col1.index(x)
        col2[index] = int(col2[index]) + value

for x, y in zip(col1, col2):
    print(x,',',y)

Output:

201 , 400
300 , 600
100 , 400
Share:
7,341

Related videos on Youtube

Sunny Monga
Author by

Sunny Monga

Updated on September 18, 2022

Comments

  • Sunny Monga
    Sunny Monga almost 2 years

    I have a file like:

    $ cat input.csv
    201,100
    201,300
    300,100
    300,500
    100,400
    

    I want to add the values in column 2 which has same value in column 1. Expected output is as follows:

    $ cat output.csv
    201,400
    300,600
    100,400
    

    I tried to do this by awk command but it is not working in Solaris. Please provide some alternative.

    • PM 2Ring
      PM 2Ring over 9 years
      Please show us your awk code.
    • Stéphane Chazelas
      Stéphane Chazelas over 9 years
      On Solaris, use nawk or /usr/xpg4/bin/awk, or add a PATH=`getconf PATH`:$PATH as the one in /bin is an ancient non-standard one.
    • Nickolay
      Nickolay about 9 years
      The answers here focus on one-liners and custom scripts. For those looking for an existing utility, see this question: unix.stackexchange.com/q/85204/41737
  • jimmij
    jimmij over 9 years
    The title: "group by and sum in shell script without awk"
  • Kannan Mohan
    Kannan Mohan over 9 years
    The answer is great, AWK rocks!!!
  • jimmij
    jimmij over 9 years
    So use uniq or sort, but if OP explicitly ask for non-awk solution I believe that should be respected.
  • Marek Zakrzewski
    Marek Zakrzewski over 9 years
    @jimmij I am curious on your sh answer. If you can achieve the above in sh only, I'll remove my answer!
  • jimmij
    jimmij over 9 years
    @val0x00ff I've just submitted solution in pure bash. I don't want to criticize, I didn't downvote etc, just saying that OP asked to not use awk. Of course I agree that awk is the best tool for this job.
  • Marek Zakrzewski
    Marek Zakrzewski over 9 years
    @jimmij Your answer is ofcourse the OP wanted. Thanks for providing this solution!
  • orion
    orion over 9 years
    Just use sys.argv[1] for a filename or read from sys.stdin, if filename is not specified.