Should I use double or float?


Solution 1

If you want to know the true answer, you should read What Every Computer Scientist Should Know About Floating-Point Arithmetic.

In short, although double offers higher precision in its representation, for certain calculations it can still produce larger errors. The "right" choice is: use as much precision as you need but no more, and choose the right algorithm.

Many compilers do extended floating point math in "non-strict" mode anyway (i.e. they use a wider floating-point type available in hardware, e.g. 80-bit or 128-bit), and this should be taken into account as well. In practice, you can hardly see any difference in speed -- both are native to the hardware anyway.

Solution 2

Unless you have some specific reason to do otherwise, use double.

Perhaps surprisingly, it is double and not float that is the "normal" floating-point type in C (and C++). The standard math functions such as sin and log take doubles as arguments, and return doubles. A normal floating-point literal, as when you write 3.14 in your program, has the type double. Not float.

On typical modern computers, doubles can be just as fast as floats, or even faster, so performance is usually not a factor to consider, even for large calculations. (And those would have to be large calculations, or performance shouldn't even enter your mind. My new i7 desktop computer can do six billion multiplications of doubles in one second.)

Solution 3

This question is impossible to answer since there is no context to the question. Here are some things that can affect the choice:

  1. Compiler implementation of floats, doubles and long doubles. The C++ standard states:

    There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double.

    So, all three can be the same size in memory.

  2. Presence of an FPU. Not all CPUs have FPUs; sometimes the floating point types are emulated in software, and sometimes they are simply not supported.

  3. FPU Architecture. The IA32's FPU is 80-bit internally - 32-bit and 64-bit floats are expanded to 80 bits on load and reduced on store. There's also SIMD, which can process four 32-bit floats or two 64-bit floats in parallel. Use of SIMD is not defined in the standard, so it would require a compiler that does more complex analysis to determine whether SIMD can be used, or requires the use of special functions (libraries or intrinsics). The upshot of the 80-bit internal format is that you can get slightly different results depending on how often the data is saved to RAM (thus losing precision). For this reason, compilers don't optimise floating point code particularly well.

  4. Memory bandwidth. If a double requires more storage than a float, then it will take longer to read the data. That's the naive answer. On a modern IA32, it all depends on where the data is coming from: if it's in L1 cache, the load is negligible provided the data comes from a single cache line; if it spans more than one cache line there's a small overhead; if it's from L2 it takes a while longer; if it's in RAM it's longer still; and if it's on disk it's a huge time. So the choice of float or double matters less than the way the data is used. If you want to do a small calculation on lots of sequential data, a smaller data type is preferable. Doing a lot of computation on a small data set would allow you to use a bigger data type without any significant effect. If you're accessing the data very randomly, then the choice of data size is unimportant - data is loaded in pages / cache lines, so even if you only want a byte from RAM, you could get 32 bytes transferred (this is very dependent on the architecture of the system). On top of all this, the CPU/FPU could be superscalar and pipelined, so even though a load may take several cycles, the CPU/FPU could be busy doing something else (a multiply, for instance) that hides the load time to a degree.

  5. The standard does not enforce any particular format for floating point values.

If you have a specification, then that will guide you to the optimal choice. Otherwise, it's down to experience as to what to use.

Solution 4

A double is more precise, but is typically stored in 8 bytes; a float is only 4 bytes, so less room and less precision.

You should be very careful if you have both double and float in your application. I had a bug caused by that in the past: one part of the code was using float while the rest of the code was using double. Copying a double to a float and then the float back to a double can cause precision errors that have a big impact. In my case, it was a chemical factory... fortunately it didn't have dramatic consequences :)

I think that it is because of this kind of bug that the Ariane 5 rocket exploded a few years ago!!!

Think carefully about the type to be used for a variable.

Solution 5

I personally go for double all the time until I see some bottlenecks. Then I consider moving to float, or optimizing some other part.



Author: Khaled Alshaya

I am member of the team responsible for the design, implementation and maintenance of the financial systems of the largest energy company in the world! If I could allocate some spare time, I spend it mostly on learning new technologies. C++ was the first programming language I learned and I still love using it for personal projects.

Updated on January 17, 2020

Comments

  • Khaled Alshaya
    Khaled Alshaya over 4 years

    What are the advantages and disadvantages of using one instead of the other in C++?

    • user3015682
      user3015682 over 4 years
      Has anyone tried making an array of floats and an array of doubles and see if indeed there are 4 bytes between members on floats and 8 bytes between members on doubles? It's possible that a 64bit compiler/computer might still reserve 8 bytes per member for floats even though they don't need that much.
  • lavinio
    lavinio almost 15 years
    Yes. With modern CPUs prefetching larger and larger chunks of memory, parallel numerical processing units and pipelined architectures, the speed issue is really not an issue. If you're dealing with huge quantities of numbers, than perhaps the size difference between a 4-byte float and an 8-byte double might make a difference in memory footprint.
  • Greg Rogers
    Greg Rogers almost 15 years
    Well SSE (or any vertor floating point unit) will be able to process twice the number of flops in single precision compared to double precision. If you are doing just x87 (or any scalar) floating point then it probably won't matter.
  • J-16 SDiZ
    J-16 SDiZ almost 15 years
    @Greg Rogers: compilers are not that smart at this moment. Unless you are writing raw assembly, it don't have large different. And yes, this may change as the compiler evolves.
  • J-16 SDiZ
    J-16 SDiZ almost 15 years
    An additional note: if you have absolutely no idea what the data looks like (or just have no clue at all about the maths in the links), just use double -- it is safer in most cases.
  • KriptSkitty
    KriptSkitty over 14 years
    On typical modern computers, double is just as fast as float.
  • sleske
    sleske about 14 years
    Note that 4/8 bytes for float/double is not even guaranteed; it will depend on the platform. They might even be the same type...
  • vonbrand
    vonbrand over 11 years
    @jokoon, there is nothing simple in floating point and the whole precision/numerical stability problem area.
  • cmwt
    cmwt about 5 years
    The Ariane 5 code tried to convert a 64 bit floating point, whose value was greater than 32,767, into a 16 bit signed integer. This generated an overflow exception which caused the rocket to initiate its self-destruct sequence. The code in question, was code that was reused from an older, smaller rocket.
  • Charlie Parker
    Charlie Parker over 3 years
    can you see the difference in speed in GPUs?
  • Déjà vu
    Déjà vu over 2 years
    "double would produce larger errors"? I had a look at the (1991) paper, and unless I misread it, he's saying the rounding error can be higher with a greater β; but β is not the mantissa (precision p in the doc), it's the base... and both double and float have β = 2.