Fixed-width Floating-Point Numbers in C/C++


Solution 1

According to Annex F of the current C99 draft standard, that should be double. Of course, this assumes your compiler actually conforms to that annex: Annex F only binds implementations that define the __STDC_IEC_559__ macro.

For C++, I've checked the 0x draft and a draft of the 1998 version of the standard, but neither seems to specify anything about representation the way that part of the C99 standard does. All they offer is a bool in numeric_limits (std::numeric_limits<T>::is_iec559) that reports whether the type follows IEEE 754/IEC 559 on that platform, as Josh Kelley mentions.

Very few platforms do not support IEEE 754, though - it generally does not pay off to design another floating-point format since IEEE 754 is well-defined and works quite nicely - and if that is supported, then it is a reasonable assumption that double is indeed 64 bits (IEEE 754-1985 calls that format double-precision, after all, so it makes sense).

On the off chance that double isn't double-precision, build in a sanity check so users can report it and you can handle that platform separately. If the platform doesn't support IEEE 754, you're not going to get that representation anyway unless you implement it yourself.
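A compile-time sanity check along those lines might look like the sketch below. It assumes a C++11 compiler; the helper double_is_binary64 and the alias float64 are hypothetical names chosen for illustration, not standard facilities.

```cpp
#include <climits>   // CHAR_BIT
#include <limits>    // std::numeric_limits

// Hypothetical helper: true when `double` looks like IEEE 754 binary64
// on this platform.
constexpr bool double_is_binary64() {
    return std::numeric_limits<double>::is_iec559              // IEEE 754 / IEC 559 type
        && sizeof(double) * CHAR_BIT == 64                     // 64 bits of storage
        && std::numeric_limits<double>::radix == 2             // binary format
        && std::numeric_limits<double>::digits == 53           // 52 stored bits + 1 implicit bit
        && std::numeric_limits<double>::max_exponent == 1024;  // matches an 11-bit exponent field
}

// Stops the build on platforms where the assumption does not hold,
// so the failure can be reported and handled separately.
static_assert(double_is_binary64(),
              "double is not IEEE 754 binary64 on this platform; please report it");

// Hypothetical alias; it is only meaningful because of the check above.
typedef double float64;
```

Because the check is a static_assert, an unusual platform fails at build time rather than silently producing a different representation at run time.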

Solution 2

While I don't know of a type that guarantees a particular size and format, you do have a few options in C++. You can use the <limits> header and its std::numeric_limits class template to inspect the properties of a given type: std::numeric_limits<T>::digits tells you the number of bits in the mantissa (counting the implicit leading bit, so 53 for an IEEE double), and std::numeric_limits<T>::is_iec559 should tell you whether the type follows the IEEE format. The storage size itself comes from sizeof(T) * CHAR_BIT. (For sample code that manipulates IEEE numbers at the bit level, see the FloatingPoint class template in Google Test's gtest-internal.h.)
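As a rough illustration of bit-level access, the sketch below splits a double into its sign, exponent, and fraction fields. It is not taken from Google Test; DoubleFields, decompose, and decompose_example are hypothetical names, and the code assumes that double is IEEE 754 binary64 and that integers and doubles share the same byte order, which holds on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>   // std::uint64_t
#include <cstring>   // std::memcpy
#include <limits>    // std::numeric_limits

static_assert(std::numeric_limits<double>::is_iec559 &&
              sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit IEEE 754 double");

// Hypothetical holder for the three fields of a binary64 value.
struct DoubleFields {
    bool          sign;      // 1 bit
    unsigned      exponent;  // 11 bits, biased by 1023
    std::uint64_t fraction;  // 52 stored bits (the implicit leading bit is not included)
};

// Copies the object representation into an integer and masks out the fields.
// memcpy avoids the undefined behaviour of casting between pointer types.
inline DoubleFields decompose(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);

    DoubleFields f;
    f.sign     = (bits >> 63) != 0;
    f.exponent = static_cast<unsigned>((bits >> 52) & 0x7FFu);
    f.fraction = bits & ((std::uint64_t(1) << 52) - 1);
    return f;
}

// Small usage example: 1.0 is +1.0 * 2^0, so the biased exponent is 1023.
inline void decompose_example() {
    DoubleFields one = decompose(1.0);
    assert(!one.sign && one.exponent == 1023 && one.fraction == 0);

    // -2.5 is -1.25 * 2^1, so the biased exponent is 1024.
    DoubleFields neg = decompose(-2.5);
    assert(neg.sign && neg.exponent == 1024);
}
```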

Solution 3

The other issue is the representation of floating-point numbers. This is usually, but not always, determined by the hardware you are running on. Most systems use the IEEE 754 floating-point standard, but others can have their own formats (a VAX computer is one example).

Wikipedia explanation of IEEE 754: http://en.wikipedia.org/wiki/IEEE_754-2008

Solution 4

There's no variation in float/double that I'm aware of. Float has been 32 bits for ages and double has been 64. Floating-point semantics are pretty complicated, but constants describing them do exist in

#include <limits>

boost::numeric::bounds (from Boost.NumericConversion) is a simpler interface if you don't need everything in std::numeric_limits.
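To get something closer to an int32_t-style name, the constants in <limits> can drive a compile-time type selection. The sketch below is one possible approach, not an established library facility: is_ieee_format, select_float, ieee_float32, and ieee_float64 are hypothetical names, and a C++11 compiler is assumed.

```cpp
#include <climits>      // CHAR_BIT
#include <limits>       // std::numeric_limits
#include <type_traits>  // std::conditional, std::integral_constant, std::is_void

// Hypothetical trait: does T have the requested storage width and precision,
// and does the implementation report it as an IEEE 754 / IEC 559 type?
template <typename T, int Bits, int Digits>
struct is_ieee_format : std::integral_constant<bool,
    std::numeric_limits<T>::is_iec559 &&
    static_cast<int>(sizeof(T) * CHAR_BIT) == Bits &&
    std::numeric_limits<T>::digits == Digits> {};

// Hypothetical selector: the first of float, double, long double that matches,
// or void when none of them does.
template <int Bits, int Digits>
struct select_float {
    typedef typename std::conditional<
        is_ieee_format<float, Bits, Digits>::value, float,
        typename std::conditional<
            is_ieee_format<double, Bits, Digits>::value, double,
            typename std::conditional<
                is_ieee_format<long double, Bits, Digits>::value, long double,
                void>::type>::type>::type type;
};

typedef select_float<32, 24>::type ieee_float32;  // binary32: 24-bit significand
typedef select_float<64, 53>::type ieee_float64;  // binary64: 53-bit significand

// Refuse to build if the platform offers no matching type.
static_assert(!std::is_void<ieee_float32>::value, "no 32-bit IEEE 754 type found");
static_assert(!std::is_void<ieee_float64>::value, "no 64-bit IEEE 754 type found");
```

On mainstream platforms these aliases simply resolve to float and double; the value of the sketch is that an unusual platform is rejected at compile time instead of being silently accepted.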

Author by

Imagist


Updated on June 17, 2022

Comments

  • Imagist
    Imagist almost 2 years

    int is usually 32 bits, but in the standard, int is not guaranteed to have a constant width. So if we want a 32 bit int we include stdint.h and use int32_t.

    Is there an equivalent of this for floats? I realize it's a bit more complicated with floats since they aren't stored in a homogeneous fashion, i.e. sign, exponent, significand. I just want a double that is guaranteed to be stored in 64 bits with 1 sign bit, an 11-bit exponent, and a 52/53-bit significand (depending on whether you count the hidden bit).