What is the fastest way to convert float to int on x86

Solution 1

It depends on whether you want a truncating conversion or a rounding one, and at what precision. By default, C performs a truncating conversion when you go from float to int. There are FPU instructions that do the conversion, but they round according to the current FPU rounding mode rather than performing the ANSI C conversion, and there are significant caveats to using them (such as needing to know the FPU rounding state). Since the answer to your problem is quite complex and depends on some variables you haven't expressed, I recommend this article on the issue:

http://www.stereopsis.com/FPU.html

Solution 2

Packed conversion using SSE is by far the fastest method, since you can convert multiple values in the same instruction. ffmpeg has a lot of assembly for this (mostly for converting the decoded output of audio to integer samples); check it for some examples.
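
As a rough illustration of packed conversion (this is not ffmpeg's code), the sketch below uses the SSE2 intrinsics from emmintrin.h to convert four floats per instruction. The function names are hypothetical, and the sketch assumes an x86 target with SSE2 and an element count that is a multiple of 4.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Truncating conversion (CVTTPS2DQ): rounds toward zero like a C cast. */
static void floats_to_ints_trunc(const float *src, int32_t *dst, size_t n)
{
    assert(n % 4 == 0); /* this sketch has no scalar tail loop */
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);   /* unaligned load of 4 floats */
        __m128i r = _mm_cvttps_epi32(v);    /* 4 conversions at once */
        _mm_storeu_si128((__m128i *)(dst + i), r);
    }
}

/* Rounding conversion (CVTPS2DQ): uses the MXCSR rounding mode,
   which is round-to-nearest-even by default. */
static void floats_to_ints_round(const float *src, int32_t *dst, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);
        __m128i r = _mm_cvtps_epi32(v);
        _mm_storeu_si128((__m128i *)(dst + i), r);
    }
}
```

Real code would add a scalar loop for the leftover elements and could use aligned loads and stores where the buffers allow it.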

Solution 3

A commonly used trick for plain x86/x87 code is to force the mantissa part of the float to represent the int. The 32-bit version is shown at the end of this answer.

The 64-bit version is analogous. The Lua version posted in another answer is faster, but it relies on truncating a double to a 32-bit result; it therefore requires the x87 unit to be set to double precision, and it cannot be adapted for double to 64-bit int conversion.

The nice thing about this code is that it is completely portable across platforms conforming to IEEE 754; the only assumption made is that the floating-point rounding mode is set to nearest. Note: portable in the sense that it compiles and works. Platforms other than x86 usually benefit little from this technique, if at all.

static const float Snapper=3<<22;

union UFloatInt {
 int i;
 float f;
};

/** by Vlad Kaipetsky
portable assuming FP24 set to nearest rounding mode
efficient on x86 platform
*/
inline int toInt( float fval )
{
  Assert( fabs(fval)<=0x003fffff ); // only 23 bit values handled
  UFloatInt &fi = *(UFloatInt *)&fval;
  fi.f += Snapper;
  return ( (fi.i)&0x007fffff ) - 0x00400000;
}
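
Why this works: after adding 3<<22 (1.5 * 2^23), the sum lies in [2^23, 2^24), where adjacent floats are exactly 1 apart, so the addition itself rounds the fraction away and the low 23 mantissa bits hold 2^22 + n. The sketch below is a plain C variant of the same arithmetic that reads the bits with memcpy instead of a pointer cast. The name snap_to_int is hypothetical, and it assumes IEEE 754 single precision with round-to-nearest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static int32_t snap_to_int(float fval)
{
    /* Same input range as the Assert in toInt: |fval| <= 0x003fffff. */
    assert(fval >= -4194303.0f && fval <= 4194303.0f);

    /* volatile forces the sum to be rounded to a real 32-bit float,
       even if the compiler evaluates in a wider x87 register. */
    volatile float snapped = fval + 12582912.0f; /* 3 << 22 */
    float tmp = snapped;

    uint32_t bits;
    memcpy(&bits, &tmp, sizeof bits);            /* reinterpret, not convert */

    /* Low 23 bits are 2^22 + n; remove the 2^22 offset. */
    return (int32_t)(bits & 0x007fffffu) - 0x00400000;
}
```

For example, snap_to_int(2.7f) yields 3 and snap_to_int(-2.7f) yields -3; ties go to the even integer, so 2.5f gives 2.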

Solution 4

There is one instruction to convert a floating-point value to an int in assembly: FISTP. It pops the value off the floating-point stack, converts it to an integer, and then stores it at the address specified. I don't think there would be a faster way (unless you use extended instruction sets like MMX or SSE, which I am not familiar with).

Another instruction, FIST, leaves the value on the FP stack, but it only takes 16- and 32-bit destinations; the quad-word form exists only for FISTP.
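
For reference, here is one way FISTP might be wrapped for use from C. This is a sketch in GCC-style extended inline assembly (MSVC uses a different __asm syntax); the function name fistp_to_int is hypothetical, and on non-x86 targets it simply falls back to lrint. The result is rounded according to the x87 control word, which is round-to-nearest-even by default.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>

static int fistp_to_int(double d)
{
    assert(d >= INT_MIN && d <= INT_MAX); /* result must fit in an int */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int i;
    /* "t" places d in st(0). FISTP stores it as a 32-bit integer and pops
       the stack; the "st" clobber tells the compiler about the pop. */
    __asm__ __volatile__ ("fistpl %0" : "=m" (i) : "t" (d) : "st");
    return i;
#else
    return (int)lrint(d); /* portable fallback, same rounding behaviour */
#endif
}
```

With the default rounding mode, fistp_to_int(2.7) returns 3 and fistp_to_int(2.5) returns 2, which differs from the truncating (int) cast.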

Solution 5

The Lua code base has the following snippet to do this (check in src/luaconf.h from www.lua.org). If you find (SO finds) a faster way, I'm sure they'd be thrilled.

Oh, lua_Number means double. :)

/*
@@ lua_number2int is a macro to convert lua_Number to int.
@@ lua_number2integer is a macro to convert lua_Number to lua_Integer.
** CHANGE them if you know a faster way to convert a lua_Number to
** int (with any rounding method and without throwing errors) in your
** system. In Pentium machines, a naive typecast from double to int
** in C is extremely slow, so any alternative is worth trying.
*/

/* On a Pentium, resort to a trick */
#if defined(LUA_NUMBER_DOUBLE) && !defined(LUA_ANSI) && !defined(__SSE2__) && \
    (defined(__i386) || defined (_M_IX86) || defined(__i386__))

/* On a Microsoft compiler, use assembler */
#if defined(_MSC_VER)

#define lua_number2int(i,d)   __asm fld d   __asm fistp i
#define lua_number2integer(i,n)     lua_number2int(i, n)

/* the next trick should work on any Pentium, but sometimes clashes
   with a DirectX idiosyncrasy */
#else

union luai_Cast { double l_d; long l_l; };
#define lua_number2int(i,d) \
  { volatile union luai_Cast u; u.l_d = (d) + 6755399441055744.0; (i) = u.l_l; }
#define lua_number2integer(i,n)     lua_number2int(i, n)

#endif

/* this option always works, but may be slow */
#else
#define lua_number2int(i,d) ((i)=(int)(d))
#define lua_number2integer(i,d) ((i)=(lua_Integer)(d))

#endif
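
The constant 6755399441055744.0 in the union trick is 2^52 + 2^51. Adding it pushes the value into [2^52, 2^53), where adjacent doubles are exactly 1 apart, so the low 32 bits of the result's bit pattern hold the rounded integer in two's complement. Below is a standalone sketch of that idea, not Lua's code. The name magic_to_int is hypothetical, and it assumes IEEE 754 doubles, round-to-nearest, and arithmetic actually carried out in double precision (SSE2, or x87 with precision control set to double).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static int32_t magic_to_int(double d)
{
    assert(d >= -2147483648.0 && d <= 2147483647.0);

    /* volatile forces the sum to be stored as a 64-bit double. */
    volatile double shifted = d + 6755399441055744.0; /* 2^52 + 2^51 */
    double tmp = shifted;

    uint64_t bits;
    memcpy(&bits, &tmp, sizeof bits);   /* reinterpret the bit pattern */

    /* Keep the low 32 bits; negative results come out in two's complement. */
    return (int32_t)(uint32_t)bits;
}
```

For example, magic_to_int(2.7) gives 3 and magic_to_int(-2.7) gives -3. Like FISTP in its default mode, this rounds to nearest rather than truncating.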

Author by

robottobor

Updated on July 09, 2022

Comments

  • robottobor
    robottobor almost 2 years

    What is the fastest way you know to convert a floating-point number to an int on an x86 CPU? Preferably in C or assembly (that can be inlined in C), for any combination of the following:

    • 32/64/80-bit float -> 32/64-bit integer

    I'm looking for some technique that is faster than just letting the compiler do it.

    • JBB
      JBB over 15 years
      Switch from a Pentium 5 to a chip that does math right... (Man that makes me feel old...)
    • Kevin
      Kevin over 15 years
      I'm rolling around on the ground. Dang -- it's too bad people down-modded you for that!
    • akauppi
      akauppi about 15 years
      :) Is there actually a Pentium 5? And if there is, sorry, it does have SSE3 and is therefore perfectly all right when used wisely (see the SSE3 and FISTTP comments).
  • Asim Ihsan
    Asim Ihsan over 15 years
    It is a good suggestion; however, I will caveat it by saying it assumes two things: (1) that you have an x86 processor with SSE (>PII) or SSE2 (>PIII), and (2) that you do in fact want a truncating, not a rounding, conversion.
  • Don Neufeld
    Don Neufeld over 15 years
    You are simply incorrect. In this case rolling your own is a very demonstrable 10x speed improvement over the built in functions because when you do it yourself you can trust the state of the FPU flags, which the built in _ftol does not do, or you can do it parallelized using SSE.
  • akauppi
    akauppi about 15 years
    Or you can pass the '-msse3' flag (gcc) and have the 'fixed' FISTTP do it right, seamlessly.
  • chmike
    chmike about 15 years
    For unsigned integer it can be simpler: inline uint32_t toInt( float fval ) { static float const snapper = 1<<23; fval += snapper; return ((uint32_t)fval) & 0x007FFFFF; }
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    static float const snapper; makes this slower than necessary. Simply write fval += 1<<23;
  • Suma
    Suma over 13 years
    On x86 it is not slower, as the code generated is the same. There are no FPU instructions taking immediate arguments on x87.
  • Nick Dowell
    Nick Dowell about 13 years
    The compiler-supplied routines are not well suited for multimedia applications where performance is crucial
  • Cody Gray
    Cody Gray almost 11 years
    This is interesting, and appears to be correct, but in my tests the x64 compiler actually generates the exact same code (verified using a disassembler) for your code here and the MSDN example.
  • PhiS
    PhiS almost 11 years
    Also note the limitation that this will of course not be an option for an 80-bit floating point value
  • PhiS
    PhiS about 10 years
    re inline assembly: yes Embarcadero (formerly Borland) does support it (both C++ and Delphi compilers do)