Why does the floating-point value of 4*0.1 look nice in Python 3 but 3*0.1 doesn't?


Solution 1

The simple answer is that 3*0.1 != 0.3 due to quantization (roundoff) error, whereas 4*0.1 == 0.4 because multiplying by a power of two is usually an "exact" operation. Python tries to find the shortest string that would round to the desired value, so it can display 4*0.1 as 0.4 because these are equal, but it cannot display 3*0.1 as 0.3 because these are not equal.

You can use the .hex method in Python to view the internal representation of a number (basically, the exact binary floating point value, rather than the base-10 approximation). This can help to explain what's going on under the hood.

>>> (0.1).hex()
'0x1.999999999999ap-4'
>>> (0.3).hex()
'0x1.3333333333333p-2'
>>> (0.1*3).hex()
'0x1.3333333333334p-2'
>>> (0.4).hex()
'0x1.999999999999ap-2'
>>> (0.1*4).hex()
'0x1.999999999999ap-2'

0.1 is 0x1.999999999999a times 2^-4. The "a" at the end means the digit 10 - in other words, 0.1 in binary floating point is very slightly larger than the "exact" value of 0.1 (because the final 0x0.99 is rounded up to 0x0.a). When you multiply this by 4, a power of two, the exponent shifts up (from 2^-4 to 2^-2) but the number is otherwise unchanged, so 4*0.1 == 0.4.
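One way to check the exponent-shift claim is math.frexp, which splits a float into a significand and a power-of-two exponent. This is only a small sketch; the variable names are made up for illustration.

```python
import math

# frexp(x) returns (m, e) with x == m * 2**e and 0.5 <= abs(m) < 1.
m1, e1 = math.frexp(0.1)
m4, e4 = math.frexp(4 * 0.1)

same_significand = (m1 == m4)   # multiplying by 4 leaves the significand alone
exponent_shift = e4 - e1        # the exponent moves up by 2, since 4 == 2**2

print(same_significand, exponent_shift)
```

Because only the exponent changes, no rounding happens in 4*0.1, and the result is the same double that the literal 0.4 produces.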

However, when you multiply by 3, the tiny little difference between 0x0.99 and 0x0.a0 (0x0.07) magnifies into a 0x0.15 error, which shows up as a one-digit error in the last position. This causes 0.1*3 to be very slightly larger than the rounded value of 0.3.
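To make the size of that error concrete, the values can be compared as exact rationals with the standard fractions module. The sketch below (names are illustrative) suggests that the exact product of 3 and the double 0.1 falls precisely halfway between two adjacent doubles. IEEE 754's default round-half-to-even rule then picks the neighbour whose last bit is even, which is the upper one (…334 rather than …333 in the hex output above).

```python
from fractions import Fraction

# Fraction(float) converts the double exactly, with no rounding.
exact_product = 3 * Fraction(0.1)   # exact value of 3 times the double 0.1
lower = Fraction(0.3)               # the double nearest to decimal 0.3
upper = Fraction(3 * 0.1)           # the double actually produced by 3*0.1

gap_below = exact_product - lower
gap_above = upper - exact_product

# The exact product sits strictly between the two doubles...
lies_between = lower < exact_product < upper
# ...and is equally far from both, so the tie-breaking rule decides.
is_tie = (gap_below == gap_above)

print(lies_between, is_tie)
```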

Python 3's float repr is designed to be round-trippable, that is, the value shown should be exactly convertible into the original value (float(repr(f)) == f for all floats f). Therefore, it cannot display 0.3 and 0.1*3 exactly the same way, or the two different numbers would end up the same after round-tripping. Consequently, Python 3's repr engine chooses to display one with a slight apparent error.
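The round-trip property can be spot-checked empirically. The following is a rough sketch, not a proof: it builds doubles from random bit patterns (the helper random_finite_double is a name invented here) and checks that each one survives a trip through repr and back.

```python
import math
import random
import struct

def random_finite_double(rng):
    """Build a double from 64 random bits, retrying on inf and nan."""
    while True:
        raw = rng.getrandbits(64).to_bytes(8, "little")
        (x,) = struct.unpack("<d", raw)
        if math.isfinite(x):
            return x

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [random_finite_double(rng) for _ in range(10_000)]

# Every sampled float should come back unchanged from its repr string.
all_roundtrip = all(float(repr(x)) == x for x in samples)

# Two different floats must therefore get two different strings.
distinct_reprs = repr(0.3) != repr(0.1 * 3)

print(all_roundtrip, distinct_reprs)
```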

Solution 2

repr (and str in Python 3) will put out as many digits as required to make the value unambiguous. In this case the result of the multiplication 3*0.1 isn't the closest value to 0.3 (0x1.3333333333333p-2 in hex), it's actually one LSB higher (0x1.3333333333334p-2) so it needs more digits to distinguish it from 0.3.

On the other hand, the multiplication 4*0.1 does get the closest value to 0.4 (0x1.999999999999ap-2 in hex), so it doesn't need any additional digits.

You can verify this quite easily:

>>> 3*0.1 == 0.3
False
>>> 4*0.1 == 0.4
True
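The "one LSB higher" claim can also be checked directly by reinterpreting each double's bytes as an integer. This is a small sketch using the standard struct module; float_to_bits is a helper name made up for this example. For positive doubles, adjacent values have bit patterns that differ by exactly 1.

```python
import struct

def float_to_bits(x):
    """Reinterpret a double's 8 bytes as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# Distance in units of the last bit between the product and the literal.
lsb_gap_3 = float_to_bits(3 * 0.1) - float_to_bits(0.3)
lsb_gap_4 = float_to_bits(4 * 0.1) - float_to_bits(0.4)

print(lsb_gap_3, lsb_gap_4)
```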

I used hex notation above because it's nice and compact and shows the bit difference between the two values. You can do this yourself using e.g. (3*0.1).hex(). If you'd rather see them in all their decimal glory, here you go:

>>> from decimal import Decimal
>>> Decimal(3*0.1)
Decimal('0.3000000000000000444089209850062616169452667236328125')
>>> Decimal(0.3)
Decimal('0.299999999999999988897769753748434595763683319091796875')
>>> Decimal(4*0.1)
Decimal('0.40000000000000002220446049250313080847263336181640625')
>>> Decimal(0.4)
Decimal('0.40000000000000002220446049250313080847263336181640625')

Solution 3

Here's a simplified conclusion from other answers.

If you check a float on Python's command line or print it, it goes through the repr function (or str, which gives the same result for floats), which creates its string representation.

Starting with version 3.2, Python's str and repr use a complex rounding scheme, which prefers nice-looking decimals if possible, but uses more digits where necessary to guarantee bijective (one-to-one) mapping between floats and their string representations.

This scheme guarantees that the value of repr(float(s)) looks nice for simple decimals, even if they can't be represented precisely as floats (e.g. when s = "0.1").

At the same time it guarantees that float(repr(x)) == x holds for every float x.
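The "shortest string that still round-trips" idea can be imitated with a brute-force search. To be clear, this is not how CPython does it (the real code is the dtoa.c routine linked in the comments); it is only a sketch with an invented function name, and it may not match repr's exact formatting for every float.

```python
def shortest_roundtrip(x):
    """Return the shortest '%.<n>g' rendering of x that converts back to x."""
    for ndigits in range(1, 18):
        s = format(x, f".{ndigits}g")
        if float(s) == x:
            return s
    # 17 significant digits are always enough to identify a finite double.
    raise AssertionError("unreachable for finite doubles")

print(shortest_roundtrip(4 * 0.1))  # one digit is enough
print(shortest_roundtrip(3 * 0.1))  # needs all 17 digits
```

For 4*0.1 the search stops at one significant digit, while for 3*0.1 every shorter candidate converts back to the neighbouring double 0.3, so the search has to go all the way to 17 digits.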

Solution 4

This is not really specific to Python's implementation; it should apply to any float-to-decimal-string function.

A floating point number is essentially a binary number, but in scientific notation with a fixed limit of significant figures.

The reciprocal of any number with a prime factor that is not shared with the base always has a recurring (non-terminating) representation in that base. For example, 1/7 has the prime factor 7, which is not shared with 10, and therefore has a recurring decimal representation. The same is true of 1/10 in binary: its prime factors are 2 and 5, and 5 is not shared with 2. This means that 0.1 cannot be represented exactly by a finite number of bits after the binary point.
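The prime-factor rule can be turned into a short check. This is a minimal sketch; terminates is a hypothetical helper name, and it only handles reciprocals 1/denominator with positive integer arguments.

```python
from math import gcd

def terminates(denominator, base):
    """True if 1/denominator has a finite expansion in the given base."""
    d = denominator
    g = gcd(d, base)
    # Strip every prime factor that the denominator shares with the base.
    while g > 1:
        while d % g == 0:
            d //= g
        g = gcd(d, base)
    # Anything left over is a prime factor the base does not have.
    return d == 1

print(terminates(10, 10))  # 0.1 is finite in decimal
print(terminates(10, 2))   # but not in binary, because of the factor 5
print(terminates(7, 10))   # 1/7 recurs in decimal
print(terminates(8, 2))    # 1/8 is finite in binary
```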

Since 0.1 has no exact representation, a function that converts the approximation to a decimal point string will usually try to approximate certain values so that they don't get unintuitive results like 0.1000000000004121.

Since the floating point is in scientific notation, any multiplication by a power of the base only affects the exponent part of the number. For example 1.231e+2 * 100 = 1.231e+4 for decimal notation, and likewise, 1.00101010e11 * 100 = 1.00101010e101 in binary notation (where 100 is binary for four and the exponents are written in binary as well). If I multiply by a non-power of the base, the significant digits will also be affected. For example 1.2e1 * 3 = 3.6e1.

Depending on the algorithm used, it may try to guess common decimals based on the significant figures only. Both 0.1 and 0.4 have the same significant figures in binary, because their floats are essentially rounded forms of (8/5)(2^-4) and (8/5)(2^-2) respectively. If the algorithm identifies the 8/5 sigfig pattern as the decimal 1.6, then it will work on 0.1, 0.2, 0.4, 0.8, etc. It may also have magic sigfig patterns for other combinations, such as the float 3 divided by float 10 and other magic patterns statistically likely to be formed by division by 10.

In the case of 3*0.1, the last few significant figures will likely be different from dividing a float 3 by float 10, causing the algorithm to fail to recognize the magic number for the 0.3 constant depending on its tolerance for precision loss.

Edit: https://docs.python.org/3.1/tutorial/floatingpoint.html

Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction. For example, the numbers 0.1 and 0.10000000000000001 and 0.1000000000000000055511151231257827021181583404541015625 are all approximated by 3602879701896397 / 2 ** 55. Since all of these decimal values share the same approximation, any one of them could be displayed while still preserving the invariant eval(repr(x)) == x.

There is no tolerance for precision loss: if float x (0.3) is not exactly equal to float y (0.1*3), then repr(x) is not exactly equal to repr(y).
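The figures quoted from the tutorial can be reproduced with float.as_integer_ratio, which returns the exact value of a float as a reduced fraction. A quick sketch (variable names are illustrative):

```python
# Exact value of the double nearest to 0.1, as numerator / denominator.
numerator, denominator = (0.1).as_integer_ratio()

# The three decimal strings from the tutorial all parse to that same double.
candidates = [
    "0.1",
    "0.10000000000000001",
    "0.1000000000000000055511151231257827021181583404541015625",
]
all_same_float = all(float(s) == 0.1 for s in candidates)

print(numerator, denominator == 2 ** 55, all_same_float)
```

Of those three strings, repr picks the shortest one, "0.1".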


Author: Aivar

Teaching assistant in University of Tartu, Department of CS

Updated on September 07, 2020

Comments

  • Aivar
    Aivar over 3 years

    I know that most decimals don't have an exact floating point representation (see "Is floating point math broken?").

    But I don't see why 4*0.1 is printed nicely as 0.4, but 3*0.1 isn't, when both values actually have ugly decimal representations:

    >>> 3*0.1
    0.30000000000000004
    >>> 4*0.1
    0.4
    >>> from decimal import Decimal
    >>> Decimal(3*0.1)
    Decimal('0.3000000000000000444089209850062616169452667236328125')
    >>> Decimal(4*0.1)
    Decimal('0.40000000000000002220446049250313080847263336181640625')
    
    • Bathsheba
      Bathsheba over 7 years
      @MorganThrapp: no it isn't. The OP is asking about the rather arbitrary-looking formatting choice. Neither 0.3 nor 0.4 can be represented exactly in binary floating point.
    • Morgan Thrapp
      Morgan Thrapp over 7 years
      It's not arbitrary at all, it's showing any significant digits.
    • BartoszKP
      BartoszKP over 7 years
      Obligatory link under every floating point related question: docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
    • Mooing Duck
      Mooing Duck over 7 years
      @BartoszKP: Having read the document several times, it doesn't explain why Python is displaying 0.3000000000000000444089209850062616169452667236328125 as 0.30000000000000004 and 0.40000000000000002220446049250313080847263336181640625 as .4 even though they appear to have the same accuracy, and thus doesn't answer the question.
    • Random832
      Random832 over 7 years
      See also stackoverflow.com/questions/28935257/… - I'm somewhat irritated that it got closed as a duplicate but this one hasn't.
    • Bakuriu
      Bakuriu over 7 years
      @Gilles No this is not a duplicate of that question. This is a question about string representation of floating points in python.
    • coteyr
      coteyr over 7 years
      Good ole 2 + 2 = 5 for extremely large values of 2
    • ShadowRanger
      ShadowRanger over 7 years
      The What's new in Python 3.1 docs (scroll to end of linked section, just before "New, Improved and Deprecated Modules") are a useful explanation for why/when Python 2.7/3.1+ have much shorter float reprs for some values. Straight from the horse's mouth, so to speak.
    • Antti Haapala -- Слава Україні
      Antti Haapala -- Слава Україні over 7 years
      Reopened, please do not close this as a duplicate of "is floating point math broken".
  • NPE
    NPE over 7 years
    This is an amazingly comprehensive answer, thank you. (In particular, thanks for showing .hex(); I didn't know it existed.)
  • Mark Ransom
    Mark Ransom over 7 years
    @NPE then you might be interested in float.fromhex() too, it does the reverse.
  • supercat
    supercat over 7 years
    I wonder if it would be worth noting the precise decimal values of the nearest "doubles" to 0.1, 0.3, and 0.4, since a lot of people can't read floating-point hex.
  • Mark Ransom
    Mark Ransom over 7 years
    @supercat you make a good point. Putting those super large doubles into the text would be distracting, but I thought of a way to add them.
  • supercat
    supercat over 7 years
    Out of curiosity, does Python always try to use the shortest string that is within 0.50 ulp of the given value, or does it use the shortest string that is within e.g. 0.47 ulp of the given value? Some floating-point libraries, if given a decimal string which is almost exactly halfway between two values that are representable as "double", may not always return the value which is closer to the exact value represented by the string, but printing one more decimal digit would solve that problem.
  • nneonneo
    nneonneo over 7 years
    @supercat: Python tries to find the shortest string that would round to the desired value, whatever that happens to be. Obviously the evaluated value must be within 0.5ulp (or it would round to something else), but it may require more digits in ambiguous cases. The code is very gnarly, but if you want to take a peek: hg.python.org/cpython/file/03f2c8fc24ea/Python/dtoa.c#l2345
  • Aivar
    Aivar over 7 years
    Can we then say that Python's repr uses selective rounding (meaning it doesn't use same simple rounding rule for all floats)?
  • Mark Dickinson
    Mark Dickinson over 7 years
    Your answer is accurate for Python versions >= 3.2, where str and repr are identical for floats. For Python 2.7, repr has the properties you identify, but str is much simpler - it simply computes 12 significant digits and produces an output string based on those. For Python <= 2.6, both repr and str are based on a fixed number of significant digits (17 for repr, 12 for str). (And nobody cares about Python 3.0 or Python 3.1 :-)
  • Aivar
    Aivar over 7 years
    Thanks @MarkDickinson! I included your comment in the answer.
  • Bakuriu
    Bakuriu over 7 years
    @supercat This has changed in python3.1, see the issue with the patch. In any case: the default representation is designed to produce the more readable result that completely preserves the value of the float. This means that eval(repr(f)) == f for all floats f (and eval(s) does the same as float(s)). However float('0.100000000000000012') == 0.1 even though it is actually closer to 0.10000000000000002 (which is the next representable double).
  • Mark Dickinson
    Mark Dickinson over 7 years
    @Bakuriu: I'm not sure what you're saying. The float constructor always does correct rounding. The nearest representable float to 0.100000000000000012 is 0.1000000000000000055511151231257827021181583404541015625, which Python displays as 0.1.
  • Mark Dickinson
    Mark Dickinson over 7 years
    @supercat: Always the shortest string that's within 0.5 ulp. (Strictly within if we're looking at a float with odd LSB; i.e., the shortest string that makes it work with round-ties-to-even). Any exceptions to this are a bug, and should be reported.
  • Bergi
    Bergi over 7 years
    What does the p stand for in that hex representation? And are these actually valid number literals (like hex integers), or are they only a custom formatting?
  • Mark Ransom
    Mark Ransom over 7 years
    @Bergi the p takes the place of the e in scientific notation, but I don't know the rationale for choosing a different letter. They are not valid literals, you need to use the float.fromhex() function with a string as I mentioned earlier.
  • Bergi
    Bergi over 7 years
    @MarkRansom Surely they did use something else than e because that's already a hex digit. Maybe p for power instead of exponent.
  • Mark Dickinson
    Mark Dickinson over 7 years
    @Bergi: The use of p in this context goes back (at least) to C99, and also appears in IEEE 754 and in various other languages (including Java). When float.hex and float.fromhex were implemented (by me :-), Python was merely copying what was by then established practice. I don't know whether the intention was 'p' for "Power", but it seems like a nice way to think about it.
  • Antti Haapala -- Слава Україні
    Antti Haapala -- Слава Україні over 7 years
    Note that the rounding from shell comes from repr thus the Python 2.7 behaviour would be identical...
  • Antti Haapala -- Слава Україні
    Antti Haapala -- Слава Україні over 7 years
    This does not really add much to the existing answers.
  • Mark Dickinson
    Mark Dickinson over 7 years
    "Depending on the algorithm used, it may try to guess common decimals based on the significant figures only." <- This seems like pure speculation. Other answers have described what Python actually does.
  • Aleksandr Dubinsky
    Aleksandr Dubinsky over 7 years
    @nneonneo "Python tries to find the shortest string that would round to the desired value." That should be the first line of your answer.