What is the fastest way to convert float to int on x86
Solution 1
It depends on whether you want a truncating conversion or a rounding one, and at what precision. By default, C performs a truncating conversion when you go from float to int. There are FPU instructions that do the conversion, but the result is not the ANSI C conversion, and there are significant caveats to using them (such as needing to know the FPU rounding state). Since the answer to your problem is quite complex and depends on some variables you haven't expressed, I recommend this article on the issue:
http://www.stereopsis.com/FPU.html
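To make the truncating/rounding distinction concrete, here is a minimal sketch, assuming an x86 compiler that provides the SSE intrinsics in xmmintrin.h. The function names are made up for illustration. CVTTSS2SI always truncates like the C cast, while CVTSS2SI follows the current MXCSR rounding mode (round to nearest even unless something changed it).

```c
#include <xmmintrin.h>  /* SSE scalar intrinsics; x86 only */

/* Plain C cast: always truncates toward zero. */
static int c_cast_to_int(float x)
{
    return (int)x;
}

/* CVTTSS2SI: truncates toward zero regardless of the rounding mode. */
static int sse_truncate_to_int(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* CVTSS2SI: rounds according to MXCSR (nearest-even by default). */
static int sse_round_to_int(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}
```

With the default rounding mode, 2.7f truncates to 2 but rounds to 3, and the halfway case 2.5f rounds to 2 (ties go to the even integer).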
Solution 2
Packed conversion using SSE is by far the fastest method, since you can convert multiple values in the same instruction. ffmpeg has a lot of assembly for this (mostly for converting the decoded output of audio to integer samples); check it for some examples.
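As a rough illustration of the packed approach (this is not taken from ffmpeg), the sketch below converts four floats per instruction with SSE2 intrinsics. It assumes an x86 target with SSE2; the function name and the unaligned loads/stores are choices made for this example. It uses the truncating form; _mm_cvtps_epi32 would round according to MXCSR instead.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86 only */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts n floats to 32-bit ints, truncating
   toward zero, four values at a time. */
static void floats_to_ints_sse2(const float *src, int32_t *dst, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);       /* load 4 floats (unaligned ok) */
        __m128i r = _mm_cvttps_epi32(v);        /* CVTTPS2DQ: 4 truncating conversions */
        _mm_storeu_si128((__m128i *)(dst + i), r);
    }

    /* Scalar tail for the last 0..3 elements; the C cast also truncates. */
    for (; i < n; ++i)
        dst[i] = (int32_t)src[i];
}
```

If the buffers are known to be 16-byte aligned, the aligned load/store variants can be used instead.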
Solution 3
A commonly used trick for plain x86/x87 code is to force the mantissa part of the float to represent the int. The 32-bit version follows.
The 64-bit version is analogous. The Lua version posted in another answer is faster, but it relies on truncating a double to a 32-bit result; it therefore requires the x87 unit to be set to double precision and cannot be adapted for double to 64-bit int conversion.
The nice thing about this code is that it is completely portable to all platforms conforming to IEEE 754; the only assumption made is that the floating-point rounding mode is set to nearest. Note: portable in the sense that it compiles and works. Platforms other than x86 usually do not benefit much from this technique, if at all.
static const float Snapper=3<<22;

union UFloatInt {
    int i;
    float f;
};

/** by Vlad Kaipetsky
portable assuming FP24 set to nearest rounding mode
efficient on x86 platform
*/
inline int toInt( float fval )
{
    Assert( fabs(fval)<=0x003fffff ); // only 23 bit values handled
    UFloatInt &fi = *(UFloatInt *)&fval;
    fi.f += Snapper;
    return ( (fi.i)&0x007fffff ) - 0x00400000;
}
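The 64-bit variant is only mentioned, not shown. A minimal sketch of what it might look like in plain C follows, assuming IEEE 754 doubles, round-to-nearest mode, and inputs with magnitude below 2^51. The function name is hypothetical, and memcpy is used here instead of the union cast to read the bits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit analogue of toInt(): valid only for |d| < 2^51. */
static int64_t to_int64_magic(double d)
{
    /* 3 * 2^51: after adding it, the sum lies in [2^52, 2^53), where one
       unit in the last place is exactly 1, so the mantissa field holds
       2^51 + round(d). volatile forces a store as a 64-bit double. */
    volatile double stored = d + 6755399441055744.0;
    double biased = stored;
    uint64_t bits;

    memcpy(&bits, &biased, sizeof bits);

    /* Keep the 52 mantissa bits, then remove the 2^51 offset. */
    return (int64_t)(bits & 0x000fffffffffffffULL)
         - (int64_t)0x0008000000000000LL;
}
```

As with the 32-bit version, halfway cases go to the even integer, so 2.5 becomes 2 and -3.75 becomes -4.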
Solution 4
There is one instruction to convert a floating point to an int in assembly: use the FISTP instruction. It pops the value off the floating-point stack, converts it to an integer, and then stores it at the address specified. I don't think there would be a faster way (unless you use extended instruction sets like MMX or SSE, which I am not familiar with).
Another instruction, FIST, leaves the value on the FP stack but I'm not sure it works with quad-word sized destinations.
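For reference, here is one way FISTP can be reached from C. This is a sketch that assumes GCC or Clang extended inline assembly (AT&T syntax) on an x86 target; the wrapper name is made up. The result follows the current x87 rounding mode, which is round to nearest by default, so it is not the truncation that a C cast performs.

```c
#include <stdint.h>

/* Hypothetical wrapper around FISTP; GCC/Clang on x86 or x86-64 only. */
static int32_t fistp_to_int32(double value)
{
    int32_t result;

    __asm__ __volatile__(
        "fistpl %0"        /* store ST(0) as a 32-bit integer, then pop */
        : "=m"(result)     /* output: memory operand */
        : "t"(value)       /* input: compiler loads value into ST(0) */
        : "st");           /* ST(0) is consumed by the pop */

    return result;
}
```

With the default rounding mode, fistp_to_int32(2.7) gives 3 and the halfway case 2.5 gives 2.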
Solution 5
The Lua code base has the following snippet to do this (check in src/luaconf.h from www.lua.org). If you find (SO finds) a faster way, I'm sure they'd be thrilled.
Oh, lua_Number means double. :)
/*
@@ lua_number2int is a macro to convert lua_Number to int.
@@ lua_number2integer is a macro to convert lua_Number to lua_Integer.
** CHANGE them if you know a faster way to convert a lua_Number to
** int (with any rounding method and without throwing errors) in your
** system. In Pentium machines, a naive typecast from double to int
** in C is extremely slow, so any alternative is worth trying.
*/
/* On a Pentium, resort to a trick */
#if defined(LUA_NUMBER_DOUBLE) && !defined(LUA_ANSI) && !defined(__SSE2__) && \
(defined(__i386) || defined (_M_IX86) || defined(__i386__))
/* On a Microsoft compiler, use assembler */
#if defined(_MSC_VER)
#define lua_number2int(i,d) __asm fld d __asm fistp i
#define lua_number2integer(i,n) lua_number2int(i, n)
/* the next trick should work on any Pentium, but sometimes clashes
with a DirectX idiosyncrasy */
#else
union luai_Cast { double l_d; long l_l; };
#define lua_number2int(i,d) \
{ volatile union luai_Cast u; u.l_d = (d) + 6755399441055744.0; (i) = u.l_l; }
#define lua_number2integer(i,n) lua_number2int(i, n)
#endif
/* this option always works, but may be slow */
#else
#define lua_number2int(i,d) ((i)=(int)(d))
#define lua_number2integer(i,d) ((i)=(lua_Integer)(d))
#endif
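The non-Microsoft branch works because 6755399441055744.0 is 2^52 + 2^51: after the addition, the low 32 bits of the double's mantissa hold the rounded integer in two's complement. A standalone sketch of the same idea, assuming IEEE 754 doubles and round-to-nearest mode, is below. The function name is hypothetical, and memcpy replaces the union so the low bits are picked by value rather than by memory layout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper using the constant from luaconf.h. */
static int32_t double_to_int32_magic(double d)
{
    /* volatile forces the sum to be stored as a 64-bit double even if
       the x87 unit would otherwise keep it in an 80-bit register. */
    volatile double stored = d + 6755399441055744.0;
    double biased = stored;
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &biased, sizeof bits);
    low = (uint32_t)bits;                  /* low 32 bits of the mantissa */
    memcpy(&result, &low, sizeof result);  /* reinterpret as signed */
    return result;
}
```

Note that this rounds to nearest (ties to even) rather than truncating, which is why the Lua comment says "with any rounding method".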
robottobor
Updated on July 09, 2022
Comments
-
robottobor almost 2 years
What is the fastest way you know to convert a floating-point number to an int on an x86 CPU? Preferably in C or assembly (that can be in-lined in C) for any combination of the following:
- 32/64/80-bit float -> 32/64-bit integer
I'm looking for some technique that is faster than just letting the compiler do it.
-
JBB over 15 years
Switch from a Pentium 5 to a chip that does math right... (Man that makes me feel old...)
-
Kevin over 15 years
I'm rolling around on the ground. Dang -- it's too bad people down-modded you for that!
-
akauppi about 15 years
:) Is there actually a Pentium 5? And if there is, sorry, but it does have SSE3 and is therefore perfectly all right when used wisely (see the SSE3 and FISTTP comments).
-
Asim Ihsan over 15 years
It is a good suggestion; however, I will caveat it by saying it assumes two things:
- That you have an x86 processor with SSE (>PII) or SSE2 (>PIII)
- That you in fact do want a truncation, not a rounding, conversion
-
Don Neufeld over 15 years
You are simply incorrect. In this case rolling your own is a very demonstrable 10x speed improvement over the built-in functions, because when you do it yourself you can trust the state of the FPU flags, which the built-in _ftol does not do, or you can do it parallelized using SSE.
-
akauppi about 15 years
Or you can pass the '-msse3' flag (gcc) and have the 'fixed' FISTTP do it right, seamlessly.
-
chmike about 15 years
For unsigned integer it can be simpler: inline uint32_t toInt( float fval ) { static float const snapper = 1<<23; fval += snapper; return ((uint32_t)fval) & 0x007FFFFF; }
-
R.. GitHub STOP HELPING ICE over 13 years
static float const snapper; makes this slower than necessary. Simply write fval += 1<<23;
-
Suma over 13 years
On x86 it is not slower, as the code generated is the same. There are no FPU instructions taking immediate arguments on x87.
-
Nick Dowell about 13 years
The compiler-supplied routines are not well suited for multimedia applications where performance is crucial.
-
Cody Gray almost 11 years
This is interesting, and appears to be correct, but in my tests the x64 compiler actually generates the exact same code (verified using a disassembler) for your code here and the MSDN example.
-
PhiS almost 11 years
Also note the limitation that this will of course not be an option for an 80-bit floating point value.
-
PhiS about 10 years
Re inline assembly: yes, Embarcadero (formerly Borland) does support it (both the C++ and Delphi compilers do).