How can I quickly sum all numbers in a file?


Solution 1

For a Perl one-liner, it's basically the same thing as the awk solution in Ayman Hourieh's answer:

 % perl -nle '$sum += $_ } END { print $sum'

If you're curious what Perl one-liners do, you can deparse them:

 %  perl -MO=Deparse -nle '$sum += $_ } END { print $sum'

The result is a more verbose version of the program, in a form that no one would ever write on their own:

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    $sum += $_;
}
sub END {
    print $sum;
}
-e syntax OK

Just for giggles, I tried this with a file containing 1,000,000 numbers (in the range 0 - 9,999). On my Mac Pro, it returns virtually instantaneously. That's too bad, because I was hoping the mmap approach would be really fast, but it takes just about the same time:

use 5.010;
use File::Map qw(map_file);

map_file my $map, $ARGV[0];

$sum += $1 while $map =~ m/(\d+)/g;

say $sum;

Solution 2

You can use awk:

awk '{ sum += $1 } END { print sum }' file
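The same pattern generalizes: point the sum at whichever field holds the numbers. And for very large totals, awk may fall back to exponential notation, so you can force fixed-point output with printf. (The two-column file layout below is hypothetical, just to illustrate.)

```shell
# Sum column 2 of a whitespace-separated file (hypothetical layout):
awk '{ sum += $2 } END { print sum }' file

# For very large totals, print the sum in fixed-point form
# so awk doesn't switch to exponential notation:
awk '{ sum += $1 } END { printf "%.0f\n", sum }' file
```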

Solution 3

None of the solutions thus far use paste. Here's one:

paste -sd+ filename | bc

As an example, calculate Σn where 1 ≤ n ≤ 100000:

$ seq 100000 | paste -sd+ | bc -l
5000050000

(For the curious: given a positive number n, seq n prints the sequence of numbers from 1 to n.)
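One portability caveat: GNU paste reads stdin when given no file operand, but BSD/macOS paste reportedly wants an explicit - for stdin, so the more portable form is:

```shell
# GNU paste reads stdin if no file operand is given:
seq 100000 | paste -sd+ | bc -l

# BSD/macOS paste wants an explicit "-" for stdin;
# GNU paste accepts this form too:
seq 100000 | paste -sd+ - | bc -l
```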

Solution 4

Just for fun, let's benchmark it:

$ for ((i=0; i<1000000; i++)) ; do echo $RANDOM; done > random_numbers

$ time perl -nle '$sum += $_ } END { print $sum' random_numbers
16379866392

real    0m0.226s
user    0m0.219s
sys     0m0.002s

$ time awk '{ sum += $1 } END { print sum }' random_numbers
16379866392

real    0m0.311s
user    0m0.304s
sys     0m0.005s

$ time { { tr "\n" + < random_numbers ; echo 0; } | bc; }
16379866392

real    0m0.445s
user    0m0.438s
sys     0m0.024s

$ time { s=0;while read l; do s=$((s+$l));done<random_numbers;echo $s; }
16379866392

real    0m9.309s
user    0m8.404s
sys     0m0.887s

$ time { s=0;while read l; do ((s+=l));done<random_numbers;echo $s; }
16379866392

real    0m7.191s
user    0m6.402s
sys     0m0.776s

$ time { sed ':a;N;s/\n/+/;ta' random_numbers|bc; }
^C

real    4m53.413s
user    4m52.584s
sys 0m0.052s

I aborted the sed run after 5 minutes.


I've been diving into Lua, and it is speedy:

$ time lua -e 'sum=0; for line in io.lines() do sum=sum+line end; print(sum)' < random_numbers
16388542582.0

real    0m0.362s
user    0m0.313s
sys     0m0.063s

and while I'm updating this, ruby:

$ time ruby -e 'sum = 0; File.foreach(ARGV.shift) {|line| sum+=line.to_i}; puts sum' random_numbers
16388542582

real    0m0.378s
user    0m0.297s
sys     0m0.078s
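For comparison (not benchmarked on the same machine), a Python one-liner along the same lines:

```shell
# Read each line from stdin, convert to int, and sum:
python3 -c "import sys; print(sum(int(line) for line in sys.stdin))" < random_numbers
```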

Heeding Ed Morton's advice, compare using $1

$ time awk '{ sum += $1 } END { print sum }' random_numbers
16388542582

real    0m0.421s
user    0m0.359s
sys     0m0.063s

vs using $0

$ time awk '{ sum += $0 } END { print sum }' random_numbers
16388542582

real    0m0.302s
user    0m0.234s
sys     0m0.063s

Solution 5

Another option is to use jq:

$ seq 10|jq -s add
55

-s (--slurp) reads the input lines into an array.
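Note that -s has to slurp the whole input into memory first. If that's a concern, jq can also compute the sum incrementally with reduce and inputs (same result, constant memory):

```shell
# Slurp: reads everything into an array, then adds:
seq 10 | jq -s add

# Streaming reduce over each input value, constant memory:
seq 10 | jq -n 'reduce inputs as $n (0; . + $n)'
```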

Author: Mark Roberts
Updated on April 03, 2021

Comments

  • Mark Roberts
    Mark Roberts about 3 years

I have a file which contains several thousand numbers, each on its own line:

    34
    42
    11
    6
    2
    99
    ...
    

    I'm looking to write a script which will print the sum of all numbers in the file. I've got a solution, but it's not very efficient. (It takes several minutes to run.) I'm looking for a more efficient solution. Any suggestions?

    • brian d foy
      brian d foy about 14 years
      What was your slow solution? Maybe we can help you figure out what was slow about it. :)
    • Mark Roberts
      Mark Roberts about 14 years
      @brian d foy, I'm too embarrassed to post it. I know why it's slow. It's because I call "cat filename | head -n 1" to get the top number, add it to a running total, and call "cat filename | tail..." to remove the top line for the next iteration... I have a lot to learn about programming!!!
    • dmckee --- ex-moderator kitten
      dmckee --- ex-moderator kitten about 14 years
      That's...very systematic. Very clear and straight forward, and I love it for all that it is a horrible abomination. Built, I assume, out of the tools that you knew when you started, right?
    • codeholic
      codeholic about 14 years
    • David W.
      David W. over 10 years
@MarkRoberts It must have taken you a long while to work that out. It's a very clever problem-solving technique, and oh so wrong. It looks like a classic case of overthinking. Several of Glen Jackman's solutions are shell scripting solutions (and two are pure shell that don't use things like awk and bc). These all finished adding up a million numbers in less than 10 seconds. Take a look at those and see how it can be done in pure shell.
    • Fortran
      Fortran about 7 years
      @ Mark Roberts 1place, stackoverflow.com/a/18380369/4592448 )))
    • FlipMcF
      FlipMcF about 2 years
      Such an awesome question and great journey in all the answers.
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten about 14 years
    Very readable. For perl. But yeah, it's going to have to be something like that...
  • paxdiablo
    paxdiablo about 14 years
    Wow, that shows a deep understanding on what code -nle actually wraps around the string you give it. My initial thought was that you shouldn't post while intoxicated but then I noticed who you were and remembered some of your other Perl answers :-)
  • brian d foy
    brian d foy about 14 years
$_ is the default variable. The line input operator, <>, puts its result in there by default when you use <> in while.
  • brian d foy
    brian d foy about 14 years
    -n and -p just put characters around the argument to -e, so you can use those characters for whatever you want. We have a lot of one-liners that do interesting things with that in Effective Perl Programming (which is about to hit the shelves).
  • daotoad
    daotoad about 14 years
@Mark, $_ is the topic variable--it works like the word 'it' in English. In this case <> assigns each line to it. It gets used in a number of places to reduce code clutter and help with writing one-liners. The script says "Set the sum to 0, read each line and add it to the sum, then print the sum."
  • daotoad
    daotoad about 14 years
    @Stefan, with warnings and strictures off, you can skip declaring and initializing $sum. Since this is so simple, you can even use a statement modifier while: $sum += $_ while <>; print $sum;
  • SourceSeeker
    SourceSeeker about 14 years
    It doesn't work. bc issues a syntax error because of the trailing "+" and lack of newline at the end. This will work and it eliminates a useless use of cat: { tr "\n" "+" | sed 's/+$/\n/'| bc; } < numbers2.txt or <numbers2.txt tr "\n" "+" | sed 's/+$/\n/'| bc
  • ghostdog74
    ghostdog74 about 14 years
    tr "\n" "+" <file | sed 's/+$/\n/' | bc
  • Frank
    Frank about 14 years
    Nice, what are these non-matching curly braces about?
  • jrockway
    jrockway about 14 years
    -n adds the while { } loop around your program. If you put } ... { inside, then you have while { } ... { }. Evil? Slightly.
  • conny
    conny over 12 years
    Big bonus for highlighting the -MO=Deparse option! Even though on a separate topic.
  • leef
    leef over 11 years
    program exceeded: maximum number of field sizes: 32767
  • David W.
    David W. over 10 years
    +1: For coming up with a bunch of solutions, and benchmarking them.
  • nh2
    nh2 over 10 years
    The cumulative version of your one-liner (a rolling sum, printing the current sum for each line): perl -nle '$sum += $_; print $sum} END {'
  • Ethan Furman
    Ethan Furman almost 10 years
    With the -F '\t' option if your fields contain spaces and are separated by tabs.
  • Brendan Maguire
    Brendan Maguire almost 10 years
    Very nice! And easy to remember
  • sevko
    sevko about 9 years
    A simple list comprehension with a named function is a nice use-case for map(): map(float, sys.stdin)
  • Andrea
    Andrea over 7 years
    Please mark this as the best answer. It also works if you want to sum the first value in each row, inside a TSV (tab-separated value) file.
  • Fortran
    Fortran about 7 years
    How fix Can't locate PDL.pm in @INC (you may need to install the PDL module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 ?)) for fun of course=)
  • Fortran
    Fortran about 7 years
    Best answer! Best speed)
  • Joel Berger
    Joel Berger about 7 years
    You have to install PDL first, it isn't a Perl native module.
  • rafi wiener
    rafi wiener almost 7 years
    time cat random_numbers|paste -sd+|bc -l real 0m0.317s user 0m0.310s sys 0m0.013s
  • glenn jackman
    glenn jackman almost 7 years
    that should be just about identical to the tr solution.
  • Steven the Easily Amused
    Steven the Easily Amused over 6 years
    This does not appear to be a standard component as I do not see it in my Ubuntu installation. Would like to see it benchmarked, though.
  • nisetama
    nisetama about 5 years
    Another option (when input is from STDIN) is ruby -e'p readlines.map(&:to_f).reduce(:+)'.
  • Simo A.
    Simo A. about 5 years
    seq 100000 | paste -sd+ - | bc -l on Mac OS X Bash shell. And this is by far the sweetest and the unixest solution!
  • Ed Morton
    Ed Morton about 5 years
    Your awk script should execute a bit faster if you use $0 instead of $1 since awk does field splitting (which obviously takes time) if any field is specifically mentioned in the script but doesn't otherwise.
  • David
    David about 5 years
    And it's probably one of the slowest solutions and therefore not so suitable for large amounts of numbers.
  • Tom Kelly
    Tom Kelly almost 5 years
    I'm a fan of R for other applications but it's not good for performance in this way. File I/O is a major issue. I've tested passing args to a script which can be sped up using the vroom package. I'll post more details when I've benchmarked some other scripts on the same server.
  • John
    John over 4 years
    It's an awesome tool for quick tasks like that, nearly forgot about it. thanks
  • Connor
    Connor about 4 years
@SimoA. I vote that we use the term unixiest in place of unixest, because the sexiest solution is always the unixiest ;)
  • Peter K
    Peter K about 4 years
    What is "64"? "10" I suppose is base?
  • dwurf
    dwurf about 4 years
    Yes, 10 is the base. 64 is the number of bits, if the resulting int can't be represented with that many bits then an error is returned. See golang.org/pkg/strconv/#ParseInt
  • user12719
    user12719 about 4 years
    Converting to float seems to be about twice as fast on my system (320 vs 640 ms). time python -c "print(sum([float(s) for s in open('random_numbers','r')]))"
  • drumfire
    drumfire almost 4 years
    I like this, but could you explain the curly brackets? It's weird to see } without { and vice versa.
  • edibleEnergy
    edibleEnergy almost 4 years
    @drumfire see @brian d foy's answer above with perl -MO=Deparse to see how perl parses the program. or the docs for perlrun: perldoc.perl.org/perlrun.html (search for -n). perl wraps your code with { } if you use -n so it becomes a complete program.
  • Bruno Unna
    Bruno Unna about 3 years
This is by far the best solution: simple, elegant, efficient.
  • Lo-Tan
    Lo-Tan over 2 years
    Wonderful solution. I had a tab delimited file where I wanted to sum column 6. Did that with the following command: awk '{ print $6 }' myfile.log | jq -s add
  • Dut A.
    Dut A. over 2 years
    for the rest of us who can't easily, how about you indicate which language this is in? PHP? Perl?
  • FlipMcF
    FlipMcF about 2 years
    If you have big numbers: awk 'BEGIN {OFMT = "%.0f"} { sum += $1 } END { print sum }' filename
  • CYNTHIA Blessing
    CYNTHIA Blessing about 2 years
    @EthanFurman I actually have a tab delimited file as you explained but not able to make -F '\t' do the magic. Where exactly is the option meant to be inserted? I have it like this awk -F '\t' '{ sum += $0 } END { print sum }' file
  • Ethan Furman
    Ethan Furman about 2 years
    @CYNTHIABlessing: Please ask that as a new question. Thanks!