sum of double numbers in c++

11,840

Solution 1

Rational numbers are infinitely precise. Computers are finite.
Precision loss is a well-known problem in computer programming.
The real question is, how can you remedy it?

Consider using an approximation function when comparing floats for equality.

#include <algorithm>  // std::max
#include <cmath>      // std::abs for floating-point types
#include <iostream>
#include <limits>
using namespace std;

template <typename T>
bool ApproximatelyEqual(const T dX, const T dY)
{
    return std::abs(dX - dY) <= std::max(std::abs(dX), std::abs(dY))
    * std::numeric_limits<T>::epsilon();
}

int main() {
    double a=0.0132;
    double b=0.9581;
    double c=0.0287;

    //Evaluates to true and does not print error.
    if (!ApproximatelyEqual(a+b+c,1.0)) cout << "error" << endl;
}
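
A purely relative check like ApproximatelyEqual can be too strict when one of the values is zero or very close to it, because the allowed difference shrinks along with the operands. A common variation adds an absolute tolerance as well. The following is only a sketch under assumptions: the name `NearlyEqual` and both default tolerances are hypothetical and should be tuned to the problem at hand. Taking the absolute value of the difference also avoids the trap of a one-sided test such as `a+b+c-1 > 0.00001`, which never fires when the difference is negative.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not part of the answer above): true when x and y differ
// by at most abs_tol, or by at most rel_tol times the larger magnitude.
// Both default tolerances are illustrative assumptions, not recommended values.
inline bool NearlyEqual(double x, double y,
                        double abs_tol = 1e-12, double rel_tol = 1e-9)
{
    const double diff = std::fabs(x - y);
    if (diff <= abs_tol) {
        return true;  // covers comparisons against (or near) zero
    }
    return diff <= rel_tol * std::max(std::fabs(x), std::fabs(y));
}
```

With these assumed defaults, `NearlyEqual(a + b + c, 1.0)` holds for the three values from the question, while `NearlyEqual(1.0, 1.001)` does not.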

Solution 2

Floating point numbers in C++ have a binary representation. This means that most numbers that can be exactly represented by a decimal fraction with only a few digits cannot be exactly represented by floating point numbers. That's where your error comes from.

One example: 0.1 (decimal) is a periodic fraction in binary:

0.000110011001100110011001100...

Therefore it cannot be represented exactly with any finite number of bits in a binary encoding.
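
The effect can be made visible by printing 0.1 with more digits than the stream default of six. This is a minimal sketch: `ToDigits` and `TenTenths` are hypothetical helper names, and the behavior described assumes IEEE 754 double precision, which mainstream platforms use.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats value with the requested number of
// significant digits, using the stream's default floating-point notation.
inline std::string ToDigits(double value, int digits)
{
    std::ostringstream out;
    out << std::setprecision(digits) << value;
    return out.str();
}

// Hypothetical helper: adds 0.1 ten times. Each addition rounds,
// so the result is not guaranteed to be exactly 1.0.
inline double TenTenths()
{
    double sum = 0.0;
    for (int i = 0; i < 10; ++i) {
        sum += 0.1;
    }
    return sum;
}
```

Under that assumption, `ToDigits(0.1, 6)` yields `0.1`, while asking for 20 digits exposes the stored value, which is slightly above 0.1. Likewise, `TenTenths() == 1.0` evaluates to false.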

In order to avoid this type of error, you can use BCD (binary coded decimal) numbers which are supported by some special libraries. The drawbacks are slower calculation speed (not directly supported by the CPU) and slightly higher memory usage.
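
BCD itself needs a dedicated library, so it is not shown here. A related technique that needs no library is fixed-point (scaled integer) arithmetic, which is a different approach from BCD: every value is stored as a whole number of some small decimal unit. The sketch below is only an illustration; the type name `Fixed4` and the choice of four decimal places are assumptions made to fit the numbers in the question.

```cpp
#include <cstdint>

// Hypothetical fixed-point type: holds a value as a whole number of
// ten-thousandths, so 0.0132 is stored as the integer 132.
struct Fixed4 {
    static constexpr std::int64_t kScale = 10000;  // assumed: 4 decimal places suffice
    std::int64_t units;                            // value multiplied by kScale
};

// Integer addition and comparison are exact (barring overflow).
constexpr Fixed4 operator+(Fixed4 a, Fixed4 b) { return Fixed4{a.units + b.units}; }
constexpr bool operator==(Fixed4 a, Fixed4 b) { return a.units == b.units; }
```

With `Fixed4{132}`, `Fixed4{9581}` and `Fixed4{287}`, the sum compares equal to `Fixed4{Fixed4::kScale}`, that is, exactly 1. The price is a fixed number of decimal places and extra care for multiplication and division, which need rescaling.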

Another option is to represent the number as a general fraction and store the numerator and denominator as separate integers.
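
A minimal sketch of such a fraction type follows. The name `Fraction` and its interface are hypothetical, it relies on `std::gcd` from C++17, and it assumes the values stay small enough that the 64-bit products do not overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// Hypothetical fraction type, always kept in lowest terms with den > 0.
struct Fraction {
    std::int64_t num;
    std::int64_t den;

    Fraction(std::int64_t n, std::int64_t d) : num(n), den(d)
    {
        assert(d != 0);
        if (den < 0) {  // keep the sign in the numerator
            num = -num;
            den = -den;
        }
        const std::int64_t g = std::gcd(num, den);  // at least 1 because den != 0
        num /= g;
        den /= g;
    }
};

// a/b + c/d = (a*d + c*b) / (b*d); the constructor reduces the result.
inline Fraction operator+(const Fraction& a, const Fraction& b)
{
    return Fraction(a.num * b.den + b.num * a.den, a.den * b.den);
}

// Because both sides are in lowest terms, member-wise comparison is enough.
inline bool operator==(const Fraction& a, const Fraction& b)
{
    return a.num == b.num && a.den == b.den;
}
```

With this sketch, `Fraction(132, 10000) + Fraction(9581, 10000) + Fraction(287, 10000)` compares equal to `Fraction(1, 1)`, because every intermediate result is exact.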

Author by

g1368

Updated on June 04, 2022

Comments

  • g1368
    g1368 almost 2 years

    I want to calculate the sum of three double numbers and I expect to get 1.

    double a=0.0132;
    double b=0.9581;
    double c=0.0287;
    cout << "sum= "<< a+b+c <<endl;
    if (a+b+c != 1)
        cout << "error" << endl;
    

    The sum is equal to 1 but I still get the error! I also tried:

    cout<< a+b+c-1
    

    and it gives me -1.11022e-16

    I could fix the problem by changing the code to

    if (a+b+c-1 > 0.00001) cout << "error" << endl;

    and it works (no error). How can a negative number be greater than a positive number, and why don't the numbers add up to 1? Maybe it is something basic with summation and under/overflow, but I would really appreciate your help. Thanks

    • PaulMcKenzie
      PaulMcKenzie almost 8 years
      I'll let you do the research. What is 0.0132 in binary? 0.9581 in binary? etc.? The answer to that is the reason why you do not get the exact answer. Those numbers cannot be represented exactly in binary, and binary is what the computer is using. See this
    • Dimitri Podborski
      Dimitri Podborski almost 8 years
    • paulsm4
      paulsm4 almost 8 years
      The issue is "floating point precision" (or, in this case, IMprecision ;)): Look here or here.
    • WhozCraig
      WhozCraig almost 8 years
      cout << setprecision(24) << "sum= "<< a+b+c <<endl; - may be interesting for you to try.
    • g1368
      g1368 almost 8 years
      Thank you guys for your quick responses. I read the references and understood what my problem is.
    • Pete Becker
      Pete Becker almost 8 years
      While this problem typically gets labelled "floating point precision", it's not limited to floating point. int i = 1/3; i = 3 * i; std::cout << i << '\n'; will display 0, not 1, and nobody except the newest newbie is surprised by this. The difference is that programmers learn early on how to deal with limited precision in integer types, but rarely learn it for floating-point types.
  • 463035818_is_not_a_number
    463035818_is_not_a_number almost 8 years
    Rational numbers can be implemented without loss of precision as fractions. For real (i.e. including irrational) numbers this isn't true.
  • stark
    stark almost 8 years
    Another option is to do fixed point (scaled) arithmetic.
  • Trevor Hickey
    Trevor Hickey almost 8 years
    @tobi303 Of course you can get around precision loss with custom data types. Your fraction type though would actually just be two integer types with some special logic. I was referring to the fundamental data types of C++.
  • 463035818_is_not_a_number
    463035818_is_not_a_number almost 8 years
    I just had a bit of trouble with "Rational numbers are infinitely precise" because rational numbers are rather boring: there are only countably many of them (infinite but still countable), while if you want "infinite precision" you need real numbers (which cannot be represented precisely on a machine, not even with a custom data type).
  • Trevor Hickey
    Trevor Hickey almost 8 years
    @tobi303 Oh, I see. Thanks, that makes sense.