How can I use std::accumulate and a lambda to calculate a mean?

c++ lambda accumulate

27,897

Solution 1

You shouldn't use an integer to store the result.

The return type of accumulate is T, the type of its third parameter:
T accumulate( InputIt first, InputIt last, T init, BinaryOperation op ); so you have to pass 0.0 as init to get the result as a double. With 0, T is int and every partial result is truncated.

#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    std::vector<int> v = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    // a is the running mean; b is the next element, converted to double.
    auto lambda = [&](double a, double b){ return a + b / v.size(); };
    // The 0.0 makes accumulate carry the partial result as a double.
    std::cout << std::accumulate(v.begin(), v.end(), 0.0, lambda) << std::endl;
}

Solution 2

This may not round quite as nicely, but it works even when there's no size() method on the container:

auto lambda = [count = 0](double a, int b) mutable { return a + (b-a)/++count; };

This takes advantage of a new C++14 feature, init-captures (initialized lambda captures), to store state within the lambda. (You can do the same thing by capturing an extra local variable, but then that variable lives in the enclosing local scope rather than for the lifetime of the lambda.) For older C++ versions, you can put the count in a member variable of a struct and use the lambda body as the implementation of its operator().

To prevent accumulation of rounding error (or at least dramatically reduce it), one can do something like:

auto lambda = [count = 0, error = 0.0](double a, int b) mutable {
   const double desired_change = (b-a-error)/++count;
   const double newa = a + (desired_change + error);
   const double actual_change = newa - a;
   error += desired_change - actual_change;
   return newa;
};

Solution 3

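Your running "average" is the first parameter to the lambda, so the update has the following form. Note that with int parameters b/v.size() is still an integer division, so each term is truncated.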

lambda = [&](int a, int b){return a + b/v.size();};


EMBLEM

All original source snippets I post on Stackoverflow.com are dedicated to the public domain. If you do find value in my answers, I would very much appreciate an attribution and acknowledgement where possible.

Updated on March 27, 2021

Comments

EMBLEM over 1 year
I have a standard library container of large numbers, so large that they may cause overflow if I add them together. Let's pretend it's this container:
```
std::vector<int> v = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
```
I want to calculate the mean of this container, using std::accumulate, but I can't add all the numbers together. I'll just calculate it with v[0]/v.size() + v[1]/v.size() + .... So I set:
```
auto lambda = ...;
std::cout << std::accumulate(v.begin(), v.end(), 0, lambda) << std::endl;
```
Here is what I have tried so far, where -> indicates the output:
```
lambda = [&](int a, int b){return (a + b)/v.size();};  ->  1
lambda = [&](int a, int b){return a/v.size() + b/v.size();};  ->  1
lambda = [&](int a, int b){return a/v.size() + b;};  ->  10
```
How can I produce the correct mean such that the output will be 5?
- Ben Voigt over 7 years
  
  5 is not the correct answer.
- EMBLEM over 7 years
  
  @BenVoigt It is if you're using integer division.
- Ben Voigt over 7 years
  
  Integer division is not used in calculation of a mean. Combined with std::accumulate, it's even worse -- it will ruin your partial sums. If you want the final result rounded according to the rules of integer division, you should say that explicitly in your question (and then you are not finding a mean). Otherwise your use of integer division looks like a bug to every reader.
dwcanillas over 7 years

isn't there going to be a problem with integer rounding?
AdamF over 7 years

It's good to use this approximation formula for large data sets, because the precision of double might not be enough in the original formula.
Ben Voigt over 7 years

@AdamF: One can keep track of an error term as well, to prevent rounding errors from accumulating.
AdamF over 7 years

Perfect. In the previous comment I also wanted to acknowledge your first formula : ) I have used it a few times; it's powerful, and we don't even need to keep the whole array in memory.
vsoftco over 7 years

@BenVoigt nice answer! You should mention that lambda capture expressions work only in C++14
Ben Voigt over 7 years

@vsoftco: I had been intending to do that, then got interested in the rounding-error. Thanks for reminding me.
Arne Vogel over 5 years

Using an error term is roughly like using twice the precision. Another way to reduce the error is to reduce the depth of the expression tree. std::accumulate uses a left fold, which is the worst possible case (linear depth).
AdamF almost 5 years

@rpattabi Could you elaborate a bit more on how to reproduce the warning without the return type? I'm not able to reproduce it.
Chris_128 almost 2 years

Isn't the second type of your lambda the type of the vector's elements? Meaning auto lambda = [&](double a, int b){//...
AdamF almost 2 years

It can't simply be changed to int, because then b / v.size() becomes an integer division and the result is incorrect. If you would like to pass it as int, then you should cast it to double inside the lambda: auto lambda = [&](double a, int b) {return a + (double)b / v.size(); };
Chris_128 almost 2 years

Good point. Please consider doing this (maybe with a static_cast<double>(b)) in your answer. I think that is better than the current answer because it clearly shows which of the lambda's arguments comes from the accumulation's carry-over and which one is the vector's element. Additionally, it explicitly shows that a cast is going on, instead of casting implicitly in the lambda's argument.