How can I quickly sum all numbers in a file?
Solution 1
For a Perl one-liner, it's basically the same thing as the awk
solution in Ayman Hourieh's answer:
% perl -nle '$sum += $_ } END { print $sum'
If you're curious what Perl one-liners do, you can deparse them:
% perl -MO=Deparse -nle '$sum += $_ } END { print $sum'
The result is a more verbose version of the program, in a form that no one would ever write on their own:
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    $sum += $_;
}
sub END {
    print $sum;
}
-e syntax OK
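As a rough cross-check of what that deparsed loop does, here is a Python sketch of the same steps: read each line, strip its newline (chomp), add it to a running total, and report the total once at the end (the END block). The function name sum_lines is hypothetical. One difference is deliberate: Perl without -w quietly treats a non-numeric line like "abc" as 0, while Python's int() raises ValueError on it.

```python
import io
from typing import Iterable


def sum_lines(lines: Iterable[str]) -> int:
    """Rough analog of: perl -nle '$sum += $_ } END { print $sum'"""
    total = 0                      # $sum starts out undefined, which Perl treats as 0
    for line in lines:             # LINE: while (defined($_ = <ARGV>)) {
        line = line.rstrip("\n")   # chomp $_;
        total += int(line)         # $sum += $_;  (int() is stricter than Perl here)
    return total                   # reported once, like the END block's print $sum


# Hypothetical in-memory input standing in for a file.
print(sum_lines(io.StringIO("34\n42\n11\n6\n2\n99\n")))  # 194
```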
Just for giggles, I tried this with a file containing 1,000,000 numbers (in the range 0 - 9,999). On my Mac Pro, it returns virtually instantaneously. That's too bad, because I was hoping that using mmap would be really fast, but it takes just the same time:
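use 5.010;
use strict;
use warnings;
use File::Map qw(map_file);
map_file my $map, $ARGV[0];          # map the file named on the command line into memory
my $sum = 0;
$sum += $1 while $map =~ m/(\d+)/g;  # add up every run of digits in the mapped file
say $sum;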
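For comparison, the same memory-mapping idea can be sketched with Python's standard mmap and re modules: map the file read-only and add up every run of ASCII digits, as the Perl regex m/(\d+)/g does. This is an illustration, not a claim about its speed. The function names sum_digit_runs and sum_mapped are hypothetical, and, like the Perl version, the sketch ignores minus signs and decimal points.

```python
import mmap
import os
import re
import tempfile

DIGITS = re.compile(rb"\d+")  # bytes pattern (an mmap exposes raw bytes); \d is ASCII 0-9 here


def sum_digit_runs(buf) -> int:
    """Add up every run of digits in a bytes-like object."""
    return sum(int(m.group()) for m in DIGITS.finditer(buf))


def sum_mapped(path: str) -> int:
    """Memory-map the file at `path` read-only and sum its digit runs."""
    if os.path.getsize(path) == 0:
        return 0  # mmap refuses to map an empty file
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return sum_digit_runs(mm)


# Demo on a throwaway file with hypothetical contents.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("34\n42\n11\n6\n2\n99\n")
demo_total = sum_mapped(tmp.name)
os.remove(tmp.name)
print(demo_total)  # 194
```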
Solution 2
You can use awk:
awk '{ sum += $1 } END { print sum }' file
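awk splits each input line into fields on runs of blanks and tabs, and $1 is the first field, so this one-liner adds up the first column of every line. Below is a loose Python sketch of the same idea; the function name sum_first_field is hypothetical. Blank lines are skipped, which has the same effect as awk treating an empty $1 as 0, but text that awk converts leniently (for example, "12abc" becomes 12) raises ValueError here instead.

```python
import io
from typing import Iterable


def sum_first_field(lines: Iterable[str]) -> float:
    """Loose analog of: awk '{ sum += $1 } END { print sum }'"""
    total = 0.0                      # awk does its arithmetic in floating point
    for line in lines:
        fields = line.split()        # split on runs of whitespace, like awk's default FS
        if fields:                   # a blank line has no $1; awk would add 0
            total += float(fields[0])
    return total


demo = io.StringIO("34 apples\n42\n\n  11\t6\n")
print(sum_first_field(demo))  # 87.0
```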
Solution 3
None of the solutions thus far use paste. Here's one:
paste -sd+ filename | bc
As an example, calculate Σn where 1<=n<=100000:
$ seq 100000 | paste -sd+ | bc -l
5000050000
(For the curious, seq n prints the sequence of numbers from 1 to n, given a positive number n.)
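paste -s joins all input lines into a single line and -d+ uses + as the joiner, so bc receives one expression like 1+2+3+...+100000. The Python sketch below mimics that pipeline and cross-checks the result against the closed form n(n+1)/2, which for n = 100000 gives 100000 × 100001 / 2 = 5000050000. The helper names paste_sd_plus and evaluate_sum are hypothetical; the sketch splits on + and sums the terms instead of handing the expression to a general evaluator, since that is the only work bc does in this pipeline.

```python
from typing import Iterable


def paste_sd_plus(lines: Iterable[str]) -> str:
    """Mimic `paste -sd+`: join all lines (newlines removed) with '+'."""
    return "+".join(line.rstrip("\n") for line in lines)


def evaluate_sum(expression: str) -> int:
    """Evaluate a plain a+b+c expression, which is all bc is asked to do here."""
    return sum(int(term) for term in expression.split("+"))


n = 100_000
expr = paste_sd_plus(f"{i}\n" for i in range(1, n + 1))  # like `seq 100000 | paste -sd+`
print(expr[:20])                    # 1+2+3+4+5+6+7+8+9+10
total = evaluate_sum(expr)
assert total == n * (n + 1) // 2    # Gauss: n(n+1)/2
print(total)                        # 5000050000
```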
Solution 4
Just for fun, let's benchmark it:
$ for ((i=0; i<1000000; i++)) ; do echo $RANDOM; done > random_numbers
$ time perl -nle '$sum += $_ } END { print $sum' random_numbers
16379866392
real 0m0.226s
user 0m0.219s
sys 0m0.002s
$ time awk '{ sum += $1 } END { print sum }' random_numbers
16379866392
real 0m0.311s
user 0m0.304s
sys 0m0.005s
$ time { { tr "\n" + < random_numbers ; echo 0; } | bc; }
16379866392
real 0m0.445s
user 0m0.438s
sys 0m0.024s
$ time { s=0;while read l; do s=$((s+$l));done<random_numbers;echo $s; }
16379866392
real 0m9.309s
user 0m8.404s
sys 0m0.887s
$ time { s=0;while read l; do ((s+=l));done<random_numbers;echo $s; }
16379866392
real 0m7.191s
user 0m6.402s
sys 0m0.776s
$ time { sed ':a;N;s/\n/+/;ta' random_numbers|bc; }
^C
real 4m53.413s
user 4m52.584s
sys 0m0.052s
I aborted the sed run after 5 minutes.
I've been diving into Lua, and it is speedy:
$ time lua -e 'sum=0; for line in io.lines() do sum=sum+line end; print(sum)' < random_numbers
16388542582.0
real 0m0.362s
user 0m0.313s
sys 0m0.063s
And while I'm updating this, Ruby:
$ time ruby -e 'sum = 0; File.foreach(ARGV.shift) {|line| sum+=line.to_i}; puts sum' random_numbers
16388542582
real 0m0.378s
user 0m0.297s
sys 0m0.078s
Heed Ed Morton's advice. Using $1:
$ time awk '{ sum += $1 } END { print sum }' random_numbers
16388542582
real 0m0.421s
user 0m0.359s
sys 0m0.063s
vs. using $0:
$ time awk '{ sum += $0 } END { print sum }' random_numbers
16388542582
real 0m0.302s
user 0m0.234s
sys 0m0.063s
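Ed Morton's point is that awk only splits a record into fields when the script refers to a field, so sum += $0 skips work that sum += $1 has to do. The Python sketch below reproduces the shape of that comparison with the standard timeit module: one strategy converts each whole line, the other splits it first. The helper names and the data size are assumptions, the data mimics bash's $RANDOM (0 to 32767), and the timings it prints vary by machine, so no particular numbers are claimed here.

```python
import random
import timeit

# Hypothetical data set shaped like the one above: one $RANDOM-style value per line.
random.seed(1)
LINES = [f"{random.randint(0, 32767)}\n" for _ in range(200_000)]


def sum_whole_lines(lines):
    """Like `sum += $0`: convert each entire line, no field splitting."""
    return sum(int(line) for line in lines)


def sum_first_fields(lines):
    """Like `sum += $1`: split each line into fields, then convert the first."""
    return sum(int(line.split()[0]) for line in lines)


# Both strategies must agree before their speed is worth comparing.
assert sum_whole_lines(LINES) == sum_first_fields(LINES)

for fn in (sum_whole_lines, sum_first_fields):
    seconds = timeit.timeit(lambda: fn(LINES), number=3)
    print(f"{fn.__name__}: {seconds:.3f}s for 3 runs")
```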
Solution 5
Another option is to use jq:
$ seq 10|jq -s add
55
-s (--slurp) reads all of the input values (here, one number per line) into a single array, which add then sums.
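jq reads its input as a stream of JSON values, so each number on its own line is one value; -s collects the whole stream into one array and add sums that array. Below is a rough Python sketch of those two steps using only the standard json module. The function names slurp and add are hypothetical, and unlike jq's add, which gives null for an empty array, this add returns 0.

```python
import json


def slurp(text: str) -> list:
    """Roughly `jq -s .`: parse a stream of JSON values into one list."""
    decoder = json.JSONDecoder()
    values, pos = [], 0
    while True:
        # Skip the whitespace (including newlines) that separates the values.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return values
        value, pos = decoder.raw_decode(text, pos)  # parse one value starting at pos
        values.append(value)


def add(values: list):
    """Roughly jq's add for numbers; returns 0 where jq would give null."""
    return sum(values)


seq_10 = "".join(f"{i}\n" for i in range(1, 11))  # what `seq 10` prints
print(slurp(seq_10))        # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(add(slurp(seq_10)))   # 55
```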
Mark Roberts
Updated on April 03, 2021
Comments
-
Mark Roberts about 3 years
I have a file which contains several thousand numbers, each on its own line:
34
42
11
6
2
99
...
I'm looking to write a script which will print the sum of all numbers in the file. I've got a solution, but it's not very efficient. (It takes several minutes to run.) I'm looking for a more efficient solution. Any suggestions?
-
brian d foy about 14 years
What was your slow solution? Maybe we can help you figure out what was slow about it. :)
-
Mark Roberts about 14 years
@brian d foy, I'm too embarrassed to post it. I know why it's slow. It's because I call "cat filename | head -n 1" to get the top number, add it to a running total, and call "cat filename | tail..." to remove the top line for the next iteration... I have a lot to learn about programming!!!
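The head/tail loop is slow for an algorithmic reason, not only because of the extra processes: each pass has tail read everything that is left in order to drop one line, so summing n lines reads roughly n + (n-1) + ... + 1 = n(n+1)/2 lines in total, versus n for a single pass. The Python sketch below counts the lines each approach touches. It is a model of the loop as described in the comment, not the actual script (which wasn't posted), and the function names are hypothetical.

```python
from typing import List, Tuple


def sum_head_tail(lines: List[str]) -> Tuple[int, int]:
    """Model of the head/tail loop: take line 1, then rewrite the file minus line 1.

    Returns (sum, lines_read), counting every line each pass has to read.
    """
    remaining = list(lines)
    total = lines_read = 0
    while remaining:
        total += int(remaining[0])      # cat filename | head -n 1
        lines_read += len(remaining)    # tail reads everything still left...
        remaining = remaining[1:]       # ...and keeps all but the first line
    return total, lines_read


def sum_single_pass(lines: List[str]) -> Tuple[int, int]:
    """One read of the file, as awk or perl -n does."""
    return sum(int(line) for line in lines), len(lines)


numbers = ["2"] * 5000                   # hypothetical 5,000-line file
print(sum_head_tail(numbers))            # (10000, 12502500)
print(sum_single_pass(numbers))          # (10000, 5000)
```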
-
dmckee --- ex-moderator kitten about 14 years
That's... very systematic. Very clear and straightforward, and I love it for all that it is a horrible abomination. Built, I assume, out of the tools that you knew when you started, right?
-
codeholic about 14 years
Full duplicate: stackoverflow.com/questions/450799/…
-
David W. over 10 years
@MarkRoberts It must have taken you a long while to work that out. It's a very clever problem-solving technique, and oh so wrong. It looks like a classic case of overthinking. Several of Glenn Jackman's solutions are shell scripting solutions (and two are pure shell that don't use things like awk and bc). These all finished adding a million numbers up in less than 10 seconds. Take a look at those and see how it can be done in pure shell.
-
Fortran about 7 years
@Mark Roberts 1place, stackoverflow.com/a/18380369/4592448 )))
-
FlipMcF about 2 years
Such an awesome question and great journey in all the answers.
-
-
dmckee --- ex-moderator kitten about 14 years
Very readable. For Perl. But yeah, it's going to have to be something like that...
-
paxdiablo about 14 years
Wow, that shows a deep understanding of what code -nle actually wraps around the string you give it. My initial thought was that you shouldn't post while intoxicated, but then I noticed who you were and remembered some of your other Perl answers :-)
-
brian d foy about 14 years
$_ is the default variable. The line input operator, <>, puts its result in there by default when you use <> in while.
-
brian d foy about 14 years
-n and -p just put characters around the argument to -e, so you can use those characters for whatever you want. We have a lot of one-liners that do interesting things with that in Effective Perl Programming (which is about to hit the shelves).
-
daotoad about 14 years
@Mark, $_ is the topic variable; it works like the word 'it'. In this case <> assigns each line to it. It gets used in a number of places to reduce code clutter and help with writing one-liners. The script says "Set the sum to 0, read each line and add it to the sum, then print the sum."
-
daotoad about 14 years
@Stefan, with warnings and strictures off, you can skip declaring and initializing $sum. Since this is so simple, you can even use a statement modifier while: $sum += $_ while <>; print $sum;
-
SourceSeeker about 14 years
It doesn't work. bc issues a syntax error because of the trailing "+" and lack of newline at the end. This will work, and it eliminates a useless use of cat:
{ tr "\n" "+" | sed 's/+$/\n/'| bc; } < numbers2.txt
or
<numbers2.txt tr "\n" "+" | sed 's/+$/\n/'| bc
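To see why the plain tr version fails: tr turns every newline, including the last one, into +, so bc receives something like 34+42+11+ with a dangling operator and no terminating newline. The sed 's/+$/\n/' step swaps that final + for a newline. Below is a small Python sketch of the string transformation only; the helper names are hypothetical, and the final sum stands in for bc rather than reproducing it.

```python
def tr_newlines_to_plus(text: str) -> str:
    """Mimic `tr "\\n" "+"`: every newline, including the final one, becomes '+'."""
    return text.replace("\n", "+")


def strip_trailing_plus(expr: str) -> str:
    """Mimic `sed 's/+$/\\n/'`: swap a trailing '+' for a newline."""
    return expr[:-1] + "\n" if expr.endswith("+") else expr


raw = tr_newlines_to_plus("34\n42\n11\n")
print(repr(raw))                                # '34+42+11+'
fixed = strip_trailing_plus(raw)
print(repr(fixed))                              # '34+42+11\n'
print(sum(int(t) for t in fixed.split("+")))    # 87
```

The benchmark in the answer above avoids the same problem another way, by running echo 0 after tr so the expression ends in +0.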
-
ghostdog74 about 14 years
tr "\n" "+" <file | sed 's/+$/\n/' | bc
-
Frank about 14 years
Nice, what are these non-matching curly braces about?
-
jrockway about 14 years
-n adds the while { } loop around your program. If you put } ... { inside, then you have while { } ... { }. Evil? Slightly.
-
conny over 12 years
Big bonus for highlighting the -MO=Deparse option, even though it's on a separate topic.
-
leef over 11 years
program exceeded: maximum number of field sizes: 32767
-
David W. over 10 years
+1: For coming up with a bunch of solutions, and benchmarking them.
-
nh2 over 10 years
The cumulative version of your one-liner (a rolling sum, printing the current sum for each line):
perl -nle '$sum += $_; print $sum} END {'
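The same running-total idea in Python is a one-liner with itertools.accumulate, which produces each partial sum in turn. This is only a sketch; the function name running_sums is hypothetical.

```python
from itertools import accumulate
from typing import Iterable, Iterator


def running_sums(lines: Iterable[str]) -> Iterator[int]:
    """Return an iterator over the cumulative total after each line."""
    return accumulate(int(line) for line in lines)


for subtotal in running_sums(["34\n", "42\n", "11\n", "6\n"]):
    print(subtotal)   # 34, 76, 87, 93 on successive lines
```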
-
Ethan Furman almost 10 years
Add the -F '\t' option if your fields contain spaces and are separated by tabs.
-
Brendan Maguire almost 10 years
Very nice! And easy to remember.
-
sevko about 9 years
A simple list comprehension with a named function is a nice use-case for map(): map(float, sys.stdin)
-
Andrea over 7 years
Please mark this as the best answer. It also works if you want to sum the first value in each row, inside a TSV (tab-separated value) file.
-
Fortran about 7 years
How do I fix this? Can't locate PDL.pm in @INC (you may need to install the PDL module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 ?)) For fun, of course =)
-
Fortran about 7 years
Best answer! Best speed)
-
Joel Berger about 7 years
You have to install PDL first; it isn't a core Perl module.
-
rafi wiener almost 7 years
time cat random_numbers|paste -sd+|bc -l
real 0m0.317s
user 0m0.310s
sys 0m0.013s
-
glenn jackman almost 7 years
That should be just about identical to the tr solution.
-
Steven the Easily Amused over 6 years
This does not appear to be a standard component, as I do not see it in my Ubuntu installation. Would like to see it benchmarked, though.
-
nisetama about 5 years
Another option (when input is from STDIN) is ruby -e'p readlines.map(&:to_f).reduce(:+)'.
-
Simo A. about 5 years
On the Mac OS X Bash shell, use seq 100000 | paste -sd+ - | bc -l. And this is by far the sweetest and the unixest solution!
-
Ed Morton about 5 years
Your awk script should execute a bit faster if you use $0 instead of $1, since awk does field splitting (which obviously takes time) if any field is specifically mentioned in the script but doesn't otherwise.
-
David about 5 years
And it's probably one of the slowest solutions and therefore not so suitable for large amounts of numbers.
-
Tom Kelly almost 5 years
I'm a fan of R for other applications but it's not good for performance in this way. File I/O is a major issue. I've tested passing args to a script which can be sped up using the vroom package. I'll post more details when I've benchmarked some other scripts on the same server.
-
John over 4 years
It's an awesome tool for quick tasks like that; I nearly forgot about it. Thanks!
-
Connor about 4 years
@SimoA. I vote that we use the term unixiest in place of unixest, because the sexiest solution is always the unixiest ;)
-
Peter K about 4 years
What is "64"? I suppose "10" is the base?
-
dwurf about 4 years
Yes, 10 is the base. 64 is the number of bits: if the resulting int can't be represented with that many bits, an error is returned. See golang.org/pkg/strconv/#ParseInt
-
user12719 about 4 years
Converting to float seems to be about twice as fast on my system (320 vs 640 ms).
time python -c "print(sum([float(s) for s in open('random_numbers','r')]))"
-
drumfire almost 4 years
I like this, but could you explain the curly brackets? It's weird to see } without { and vice versa.
-
edibleEnergy almost 4 years
@drumfire See @brian d foy's answer above with perl -MO=Deparse to see how perl parses the program, or the docs for perlrun: perldoc.perl.org/perlrun.html (search for -n). perl wraps your code with { } if you use -n, so it becomes a complete program.
-
Bruno Unna about 3 years
This is by far the best solution: simple, elegant, efficient.
-
Lo-Tan over 2 years
Wonderful solution. I had a tab-delimited file where I wanted to sum column 6. I did that with the following command:
awk '{ print $6 }' myfile.log | jq -s add
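For a tab-delimited file, the column can also be picked out and summed in one step. Below is a Python sketch using the standard csv module, assuming the file has no header row and that column 6 (1-based, like awk's $6) holds a number wherever it is present; the function name sum_column is hypothetical. Note that awk '{ print $6 }' splits on any run of blanks or tabs, whereas splitting strictly on tabs (as awk -F '\t' or this sketch does) keeps fields that contain spaces intact.

```python
import csv
import io
from typing import TextIO


def sum_column(f: TextIO, column: int) -> float:
    """Sum a 1-based column of a tab-separated file (column=6 mirrors awk's $6)."""
    reader = csv.reader(f, delimiter="\t")
    # Rows too short to have the column are skipped; awk would add 0 for them.
    return sum(float(row[column - 1]) for row in reader if len(row) >= column)


# Hypothetical two-row log; the third field contains a space, which tab splitting keeps intact.
log = io.StringIO("a\tb\tc d\te\tf\t1.5\n" "g\th\ti j\tk\tl\t2.5\n")
print(sum_column(log, 6))  # 4.0
```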
-
Dut A. over 2 years
For the rest of us who can't easily tell, how about you indicate which language this is in? PHP? Perl?
-
FlipMcF about 2 years
If you have big numbers:
awk 'BEGIN {OFMT = "%.0f"} { sum += $1 } END { print sum }' filename
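The OFMT tip matters because awk does its arithmetic in double-precision floating point and, in some implementations, prints large totals through OFMT, whose POSIX default is "%.6g"; that format can turn a big sum into scientific notation. Doubles also stop representing every integer exactly above 2**53. Below is a Python sketch of both effects, using Python floats (IEEE 754 doubles) to stand in for awk's numbers; it illustrates the arithmetic rather than the behavior of any particular awk, and the helper names are hypothetical.

```python
def awk_default_format(x: float) -> str:
    """Format x with "%.6g", the POSIX default value of awk's OFMT."""
    return "%.6g" % x


def fixed_format(x: float) -> str:
    """Format x with "%.0f", as in the OFMT = "%.0f" tip."""
    return "%.0f" % x


total = 16379866392.0               # the benchmark sum above, held as a double
print(awk_default_format(total))    # 1.63799e+10
print(fixed_format(total))          # 16379866392

# Above 2**53, consecutive integers are no longer all representable as doubles.
print(2.0 ** 53 + 1 == 2.0 ** 53)   # True: the +1 is rounded away
print(2 ** 53 + 1 == 2 ** 53)       # False: Python ints are arbitrary precision
```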
-
CYNTHIA Blessing about 2 years
@EthanFurman I actually have a tab-delimited file as you explained, but I'm not able to make -F '\t' do the magic. Where exactly is the option meant to be inserted? I have it like this: awk -F '\t' '{ sum += $0 } END { print sum }' file
-
Ethan Furman about 2 years
@CYNTHIABlessing: Please ask that as a new question. Thanks!