How can I quickly sum all numbers in a file?


Solution 1

For a Perl one-liner, it's basically the same thing as the awk solution in Ayman Hourieh's answer:

 % perl -nle '$sum += $_ } END { print $sum'

If you're curious what Perl one-liners do, you can deparse them:

 %  perl -MO=Deparse -nle '$sum += $_ } END { print $sum'

The result is a more verbose version of the program, in a form that no one would ever write on their own:

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    $sum += $_;
}
sub END {
    print $sum;
}
-e syntax OK

Just for giggles, I tried this with a file containing 1,000,000 numbers (in the range 0 - 9,999). On my Mac Pro, it returns virtually instantaneously. That's too bad, because I was hoping the mmap approach would be really fast, but it takes just about the same time:

use 5.010;
use File::Map qw(map_file);

map_file my $map, $ARGV[0];

$sum += $1 while $map =~ m/(\d+)/g;

say $sum;

Solution 2

You can use awk:

awk '{ sum += $1 } END { print sum }' file
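The same pattern generalizes: point the sum at whichever field holds the numbers. And for very large totals, awk may fall back to exponential notation, so you can force fixed-point output with printf. (The two-column file layout below is hypothetical, just to illustrate.)

```shell
# Sum column 2 of a whitespace-separated file (hypothetical layout):
awk '{ sum += $2 } END { print sum }' file

# For very large totals, print the sum in fixed-point form
# so awk doesn't switch to exponential notation:
awk '{ sum += $1 } END { printf "%.0f\n", sum }' file
```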

Solution 3

None of the solutions thus far use paste. Here's one:

paste -sd+ filename | bc

As an example, calculate Σn where 1 ≤ n ≤ 100000:

$ seq 100000 | paste -sd+ | bc -l
5000050000

(For the curious: given a positive number n, seq n prints the sequence of numbers from 1 to n.)
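One portability caveat: GNU paste reads stdin when given no file operand, but BSD/macOS paste reportedly wants an explicit - for stdin, so the more portable form is:

```shell
# GNU paste reads stdin if no file operand is given:
seq 100000 | paste -sd+ | bc -l

# BSD/macOS paste wants an explicit "-" for stdin;
# GNU paste accepts this form too:
seq 100000 | paste -sd+ - | bc -l
```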

Solution 4

Just for fun, let's benchmark it:

$ for ((i=0; i<1000000; i++)) ; do echo $RANDOM; done > random_numbers

$ time perl -nle '$sum += $_ } END { print $sum' random_numbers
16379866392

real    0m0.226s
user    0m0.219s
sys     0m0.002s

$ time awk '{ sum += $1 } END { print sum }' random_numbers
16379866392

real    0m0.311s
user    0m0.304s
sys     0m0.005s

$ time { { tr "\n" + < random_numbers ; echo 0; } | bc; }
16379866392

real    0m0.445s
user    0m0.438s
sys     0m0.024s

$ time { s=0;while read l; do s=$((s+$l));done<random_numbers;echo $s; }
16379866392

real    0m9.309s
user    0m8.404s
sys     0m0.887s

$ time { s=0;while read l; do ((s+=l));done<random_numbers;echo $s; }
16379866392

real    0m7.191s
user    0m6.402s
sys     0m0.776s

$ time { sed ':a;N;s/\n/+/;ta' random_numbers|bc; }
^C

real    4m53.413s
user    4m52.584s
sys 0m0.052s

I aborted the sed run after 5 minutes.


I've been diving into Lua, and it is speedy:

$ time lua -e 'sum=0; for line in io.lines() do sum=sum+line end; print(sum)' < random_numbers
16388542582.0

real    0m0.362s
user    0m0.313s
sys     0m0.063s

and while I'm updating this, ruby:

$ time ruby -e 'sum = 0; File.foreach(ARGV.shift) {|line| sum+=line.to_i}; puts sum' random_numbers
16388542582

real    0m0.378s
user    0m0.297s
sys     0m0.078s
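For comparison (not benchmarked on the same machine), a Python one-liner along the same lines:

```shell
# Read each line from stdin, convert to int, and sum:
python3 -c "import sys; print(sum(int(line) for line in sys.stdin))" < random_numbers
```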

Heeding Ed Morton's advice, compare using $1

$ time awk '{ sum += $1 } END { print sum }' random_numbers
16388542582

real    0m0.421s
user    0m0.359s
sys     0m0.063s

vs using $0

$ time awk '{ sum += $0 } END { print sum }' random_numbers
16388542582

real    0m0.302s
user    0m0.234s
sys     0m0.063s

Solution 5

Another option is to use jq:

$ seq 10|jq -s add
55

-s (--slurp) reads the input lines into an array.
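Note that -s has to slurp the whole input into memory first. If that's a concern, jq can also compute the sum incrementally with reduce and inputs (same result, constant memory):

```shell
# Slurp: reads everything into an array, then adds:
seq 10 | jq -s add

# Streaming reduce over each input value, constant memory:
seq 10 | jq -n 'reduce inputs as $n (0; . + $n)'
```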

Author: Mark Roberts
Updated on April 03, 2021

Comments

  • Mark Roberts
    Mark Roberts about 3 years

I have a file which contains several thousand numbers, each on its own line:

    34
    42
    11
    6
    2
    99
    ...
    

    I'm looking to write a script which will print the sum of all numbers in the file. I've got a solution, but it's not very efficient. (It takes several minutes to run.) I'm looking for a more efficient solution. Any suggestions?

    • brian d foy
      brian d foy about 14 years
      What was your slow solution? Maybe we can help you figure out what was slow about it. :)
    • Mark Roberts
      Mark Roberts about 14 years
      @brian d foy, I'm too embarrassed to post it. I know why it's slow. It's because I call "cat filename | head -n 1" to get the top number, add it to a running total, and call "cat filename | tail..." to remove the top line for the next iteration... I have a lot to learn about programming!!!
    • dmckee --- ex-moderator kitten
      dmckee --- ex-moderator kitten about 14 years
      That's...very systematic. Very clear and straight forward, and I love it for all that it is a horrible abomination. Built, I assume, out of the tools that you knew when you started, right?
    • codeholic
      codeholic about 14 years
    • David W.
      David W. over 10 years
@MarkRoberts It must have taken you a long while to work that out. It's a very clever problem-solving technique, and oh so wrong. It looks like a classic case of overthinking. Several of Glen Jackman's solutions are shell scripting solutions (and two are pure shell that don't use things like awk and bc). These all finished adding up a million numbers in less than 10 seconds. Take a look at those and see how it can be done in pure shell.
    • Fortran
      Fortran about 7 years
      @ Mark Roberts 1place, stackoverflow.com/a/18380369/4592448 )))
    • FlipMcF
      FlipMcF about 2 years
      Such an awesome question and great journey in all the answers.
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten about 14 years
    Very readable. For perl. But yeah, it's going to have to be something like that...
  • paxdiablo
    paxdiablo about 14 years
    Wow, that shows a deep understanding on what code -nle actually wraps around the string you give it. My initial thought was that you shouldn't post while intoxicated but then I noticed who you were and remembered some of your other Perl answers :-)
  • brian d foy
    brian d foy about 14 years
$_ is the default variable. The line input operator, <>, puts its result in there by default when you use <> in while.
  • brian d foy
    brian d foy about 14 years
    -n and -p just put characters around the argument to -e, so you can use those characters for whatever you want. We have a lot of one-liners that do interesting things with that in Effective Perl Programming (which is about to hit the shelves).
  • daotoad
    daotoad about 14 years
@Mark, $_ is the topic variable--it works like the word 'it' in English. In this case <> assigns each line to it. It gets used in a number of places to reduce code clutter and help with writing one-liners. The script says "Set the sum to 0, read each line and add it to the sum, then print the sum."
  • daotoad
    daotoad about 14 years
    @Stefan, with warnings and strictures off, you can skip declaring and initializing $sum. Since this is so simple, you can even use a statement modifier while: $sum += $_ while <>; print $sum;
  • SourceSeeker
    SourceSeeker about 14 years
    It doesn't work. bc issues a syntax error because of the trailing "+" and lack of newline at the end. This will work and it eliminates a useless use of cat: { tr "\n" "+" | sed 's/+$/\n/'| bc; } < numbers2.txt or <numbers2.txt tr "\n" "+" | sed 's/+$/\n/'| bc
  • ghostdog74
    ghostdog74 about 14 years
    tr "\n" "+" <file | sed 's/+$/\n/' | bc
  • Frank
    Frank about 14 years
    Nice, what are these non-matching curly braces about?
  • jrockway
    jrockway about 14 years
    -n adds the while { } loop around your program. If you put } ... { inside, then you have while { } ... { }. Evil? Slightly.
  • conny
    conny over 12 years
    Big bonus for highlighting the -MO=Deparse option! Even though on a separate topic.
  • leef
    leef over 11 years
    program exceeded: maximum number of field sizes: 32767
  • David W.
    David W. over 10 years
    +1: For coming up with a bunch of solutions, and benchmarking them.
  • nh2
    nh2 over 10 years
    The cumulative version of your one-liner (a rolling sum, printing the current sum for each line): perl -nle '$sum += $_; print $sum} END {'
  • Ethan Furman
    Ethan Furman almost 10 years
    With the -F '\t' option if your fields contain spaces and are separated by tabs.
  • Brendan Maguire
    Brendan Maguire almost 10 years
    Very nice! And easy to remember
  • sevko
    sevko about 9 years
    A simple list comprehension with a named function is a nice use-case for map(): map(float, sys.stdin)
  • Andrea
    Andrea over 7 years
    Please mark this as the best answer. It also works if you want to sum the first value in each row, inside a TSV (tab-separated value) file.
  • Fortran
    Fortran about 7 years
    How fix Can't locate PDL.pm in @INC (you may need to install the PDL module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 ?)) for fun of course=)
  • Fortran
    Fortran about 7 years
    Best answer! Best speed)
  • Joel Berger
    Joel Berger about 7 years
    You have to install PDL first, it isn't a Perl native module.
  • rafi wiener
    rafi wiener almost 7 years
    time cat random_numbers|paste -sd+|bc -l real 0m0.317s user 0m0.310s sys 0m0.013s
  • glenn jackman
    glenn jackman almost 7 years
    that should be just about identical to the tr solution.
  • Steven the Easily Amused
    Steven the Easily Amused over 6 years
    This does not appear to be a standard component as I do not see it in my Ubuntu installation. Would like to see it benchmarked, though.
  • nisetama
    nisetama about 5 years
    Another option (when input is from STDIN) is ruby -e'p readlines.map(&:to_f).reduce(:+)'.
  • Simo A.
    Simo A. about 5 years
    seq 100000 | paste -sd+ - | bc -l on Mac OS X Bash shell. And this is by far the sweetest and the unixest solution!
  • Ed Morton
    Ed Morton about 5 years
    Your awk script should execute a bit faster if you use $0 instead of $1 since awk does field splitting (which obviously takes time) if any field is specifically mentioned in the script but doesn't otherwise.
  • David
    David about 5 years
    And it's probably one of the slowest solutions and therefore not so suitable for large amounts of numbers.
  • Tom Kelly
    Tom Kelly almost 5 years
    I'm a fan of R for other applications but it's not good for performance in this way. File I/O is a major issue. I've tested passing args to a script which can be sped up using the vroom package. I'll post more details when I've benchmarked some other scripts on the same server.
  • John
    John over 4 years
    It's an awesome tool for quick tasks like that, nearly forgot about it. thanks
  • Connor
    Connor about 4 years
@SimoA. I vote that we use the term unixiest in place of unixest, because the sexiest solution is always the unixiest ;)
  • Peter K
    Peter K about 4 years
    What is "64"? "10" I suppose is base?
  • dwurf
    dwurf about 4 years
    Yes, 10 is the base. 64 is the number of bits, if the resulting int can't be represented with that many bits then an error is returned. See golang.org/pkg/strconv/#ParseInt
  • user12719
    user12719 about 4 years
    Converting to float seems to be about twice as fast on my system (320 vs 640 ms). time python -c "print(sum([float(s) for s in open('random_numbers','r')]))"
  • drumfire
    drumfire almost 4 years
    I like this, but could you explain the curly brackets? It's weird to see } without { and vice versa.
  • edibleEnergy
    edibleEnergy almost 4 years
    @drumfire see @brian d foy's answer above with perl -MO=Deparse to see how perl parses the program. or the docs for perlrun: perldoc.perl.org/perlrun.html (search for -n). perl wraps your code with { } if you use -n so it becomes a complete program.
  • Bruno Unna
    Bruno Unna about 3 years
This is by far the best solution: simple, elegant, efficient.
  • Lo-Tan
    Lo-Tan over 2 years
    Wonderful solution. I had a tab delimited file where I wanted to sum column 6. Did that with the following command: awk '{ print $6 }' myfile.log | jq -s add
  • Dut A.
    Dut A. over 2 years
    for the rest of us who can't easily, how about you indicate which language this is in? PHP? Perl?
  • FlipMcF
    FlipMcF about 2 years
    If you have big numbers: awk 'BEGIN {OFMT = "%.0f"} { sum += $1 } END { print sum }' filename
  • CYNTHIA Blessing
    CYNTHIA Blessing about 2 years
    @EthanFurman I actually have a tab delimited file as you explained but not able to make -F '\t' do the magic. Where exactly is the option meant to be inserted? I have it like this awk -F '\t' '{ sum += $0 } END { print sum }' file
  • Ethan Furman
    Ethan Furman about 2 years
    @CYNTHIABlessing: Please ask that as a new question. Thanks!