Is floating-point addition and multiplication associative?

14,976

Solution 1

Floating point addition is not necessarily associative. If you change the order in which you add things up, this can change the result.

The standard paper on the subject is What Every Computer Scientist Should Know about Floating Point Arithmetic. It gives the following example:

Another grey area concerns the interpretation of parentheses. Due to roundoff errors, the associative laws of algebra do not necessarily hold for floating-point numbers. For example, the expression (x+y)+z has a totally different answer than x+(y+z) when x = 1e30, y = -1e30 and z = 1 (it is 1 in the former case, 0 in the latter).

Solution 2

What is likely, with currently popular machines and software, is:

The compiler encoded .7 as 0x1.6666666666666p-1 (this is the hexadecimal numeral 1.6666666666666 multiplied by 2 to the power of -1), .2 as 0x1.999999999999ap-3, and .1 as 0x1.999999999999ap-4. Each of these is the number representable in floating-point that is closest to the decimal numeral you wrote.

Observe that each of these hexadecimal floating-point constants has exactly 53 bits in its significand (the "fraction" part, often inaccurately called the mantissa). The hexadecimal numeral for the significand has a "1" and thirteen more hexadecimal digits (four bits each, 52 total, 53 including the "1"), which is what the IEEE-754 standard provides for, for 64-bit binary floating-point numbers.

Let's add the numbers for .7 and .2: 0x1.6666666666666p-1 and 0x1.999999999999ap-3. First, scale the exponent of the second number to match the first. To do this, we will multiply the exponent by 4 (changing "p-3" to "p-1") and multiply the significand by 1/4, giving 0x0.66666666666668p-1. Then add 0x1.6666666666666p-1 and 0x0.66666666666668p-1, giving 0x1.ccccccccccccc8p-1. Note that this number has more than 53 bits in the significand: The "8" is the 14th digit after the period. Floating-point cannot return a result with this many bits, so it has to be rounded to the nearest representable number. In this case, there are two numbers that are equally near, 0x1.cccccccccccccp-1 and 0x1.ccccccccccccdp-1. When there is a tie, the number with a zero in the lowest bit of the significand is used. "c" is even and "d" is odd, so "c" is used. The final result of the addition is 0x1.cccccccccccccp-1.

Next, add the number for .1 (0x1.999999999999ap-4) to that. Again, we scale to make the exponents match, so 0x1.999999999999ap-4 becomes 0x.33333333333334p-1. Then add that to 0x1.cccccccccccccp-1, giving 0x1.fffffffffffff4p-1. Rounding that to 53 bits gives 0x1.fffffffffffffp-1, and that is the final result of .7+.2+.1.

Now consider .7+.1+.2. For .7+.1, add 0x1.6666666666666p-1 and 0x1.999999999999ap-4. Recall the latter is scaled to 0x.33333333333334p-1. Then the exact sum is 0x1.99999999999994p-1. Rounding that to 53 bits gives 0x1.9999999999999p-1.

Then add the number for .2 (0x1.999999999999ap-3), which is scaled to 0x0.66666666666668p-1. The exact sum is 0x2.00000000000008p-1. Floating-point significands are always scaled to start with 1 (except for special cases: zero, infinity, and very small numbers at the bottom of the representable range), so we adjust this to 0x1.00000000000004p0. Finally, we round to 53 bits, giving 0x1.0000000000000p0.

Thus, because of errors that occur when rounding, .7+.2+.1 returns 0x1.fffffffffffffp-1 (very slightly less than 1), and .7+.1+.2 returns 0x1.0000000000000p0 (exactly 1).

Solution 3

Floating point multiplication is not associative in C or C++.

Proof:

#include<stdio.h>
#include<time.h>
#include<stdlib.h>
using namespace std;
int main() {
    int counter = 0;
    srand(time(NULL));
    while(counter++ < 10){
        float a = rand() / 100000;
        float b = rand() / 100000;
        float c = rand() / 100000;

        if (a*(b*c) != (a*b)*c){
            printf("Not equal\n");
        }
    }
    printf("DONE");
    return 0;
}

In this program, about 30% of the time, (a*b)*c is not equal to a*(b*c).

Solution 4

Neither addition nor multiplication is associative with IEEE 743 double precision (64-bit) numbers. Here are examples for each (evaluated with Python 3.9.7):

>>> (.1 + .2) + .3
0.6000000000000001
>>> .1 + (.2 + .3)
0.6
>>> (.1 * .2) * .3
0.006000000000000001
>>> .1 * (.2 * .3)
0.006
Share:
14,976

Related videos on Youtube

Karen Tsirunyan
Author by

Karen Tsirunyan

Updated on June 17, 2022

Comments

  • Karen Tsirunyan
    Karen Tsirunyan almost 2 years

    I had a problem when I was adding three floating point values and comparing them to 1.

    cout << ((0.7 + 0.2 + 0.1)==1)<<endl;     //output is 0
    cout << ((0.7 + 0.1 + 0.2)==1)<<endl;     //output is 1
    

    Why would these values come out different?

    • M.M
      M.M almost 10 years
      Your example code differs in commutativity , not associativity. A version demonstrating associativity would be (0.7 + (0.1 + 0.2))
    • tmyklebu
      tmyklebu almost 10 years
      @MattMcNabb: + is a binary operation. With floating-point operands, it is commutative but not associative. Thus, if you have two expressions that produce different results, you cannot form one from the other by applying only commutativity.
    • M.M
      M.M almost 10 years
      @tmyklebu OK, so this does check associativity if and only if it is known that commutativity holds. (The C++ standard does not appear to guarantee commutativity).
  • M.M
    M.M almost 10 years
    or 0% of the time if RAND_MAX < 100000 !
  • Eric Postpischil
    Eric Postpischil over 5 years
    @Evg: The numbers I have in text are mathematical numbers and are not source code or floating-point numbers. They should not be put into source code style. Various things I have written in quotes, such as “.7+.1+.2”, do represent computations made with floating-point arithmetic and could be shown in source code format rather than quotations, so I would consider approving that as an edit.
  • Eric Postpischil
    Eric Postpischil over 5 years
    @Evg: It would not be a bad idea to put some of the “computed” numbers, as opposed to the exact mathematical numbers, in source code style, so I will look at that.