How do I convert from a decimal number to IEEE 754 single-precision floating-point format?

binary floating-point ieee-754

83,081

Solution 1

Find the largest power of 2 that does not exceed your number, e.g. if you start with x = 10.0 then 2³ = 8, so the exponent is 3. The exponent is biased by 127, so it is stored as 127 + 3 = 130. The mantissa is then 10.0/8 = 1.25. The leading 1 is implicit, so we only need to represent 0.25, which is 010 0000 0000 0000 0000 0000 when expressed as a 23-bit unsigned fractional quantity. The sign bit is 0 for positive. So we have:

s | exp [130]  | mantissa [(1).25]            |

0 | 100 0001 0 | 010 0000 0000 0000 0000 0000 |

0x41200000

You can test the representation with a simple C program, e.g.

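#include <inttypes.h>
#include <stdio.h>

/* Overlay a float and a 32-bit unsigned integer so the raw bits can be read. */
typedef union
{
    uint32_t i;
    float f;
} U;

int main(void)
{
    U u;

    u.f = 10.0f;

    /* Prints the value and its bit pattern: 10 = 0x41200000 */
    printf("%g = %#" PRIx32 "\n", u.f, u.i);

    return 0;
}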
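
To go the other way and look at the three fields separately, the bit pattern can be split with shifts and masks. This is only a sketch: `FloatFields` and `split_float` are names made up for illustration, and it assumes `float` is a 32-bit IEEE 754 value on your platform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a single-precision value. */
typedef struct
{
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, still biased by 127 */
    uint32_t fraction;  /* 23 bits, the implicit leading 1 is not stored */
} FloatFields;

/* Assumption: float is a 32-bit IEEE 754 single-precision value. */
FloatFields split_float(float f)
{
    uint32_t bits;
    FloatFields out;

    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);      /* copy the raw bit pattern */

    out.sign     = bits >> 31;           /* top bit */
    out.exponent = (bits >> 23) & 0xFFu; /* next 8 bits */
    out.fraction = bits & 0x7FFFFFu;     /* low 23 bits */
    return out;
}
```

For 10.0f this should give sign 0, exponent 130 and fraction 0x200000, matching the table above.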

Solution 2

Take the number 172.625, written in base 10.

Convert it to base 2. First convert the integer part, 172, to binary:

128 64 32 16  8  4  2  1
  1  0  1  0  1  1  0  0
172 = 10101100

Then convert the fractional part, 0.625, to binary by repeatedly multiplying by 2 and taking the integer digit each time:

0.625 * 2 = 1.250   1
0.250 * 2 = 0.500   0
0.500 * 2 = 1.000   1
0.625 = 0.101 (binary)
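
The repeated doubling can be written as a small loop. This is a rough sketch, not part of the original answer; `fraction_to_binary` is a hypothetical helper name, and the caller is assumed to supply a buffer of at least `max_bits + 1` characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the binary digits of frac (0 <= frac < 1)
   into out, stopping when the fraction reaches zero or after max_bits
   digits. Returns the number of digits written. */
size_t fraction_to_binary(double frac, char *out, size_t max_bits)
{
    size_t n = 0;

    assert(frac >= 0.0 && frac < 1.0);

    while (frac > 0.0 && n < max_bits) {
        frac *= 2.0;               /* shift one binary digit left */
        if (frac >= 1.0) {
            out[n++] = '1';        /* integer part is 1: emit it and drop it */
            frac -= 1.0;
        } else {
            out[n++] = '0';
        }
    }
    out[n] = '\0';
    return n;
}
```

Calling it with 0.625 and a 32-digit limit fills the buffer with "101", the same digits as the hand calculation.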

So 172.625 in binary is 10101100.101, i.e. 10101100.101 × 2⁰.

Shift the binary point seven places to the left to normalize the number:

1.0101100101 × 2⁷      (normalized)
1.0101100101 is the mantissa
7 is the exponent

Add the bias of 127 to the exponent: 7 + 127 = 134

Convert 134 to binary:

134 = 10000110

The number is positive, so the sign bit is 0.

0 |10000110 |01011001010000000000000

Explanation: The high-order bit is the sign of the number; the number is stored in sign-magnitude format. The exponent is stored in an 8-bit field, biased by 127. The digits to the right of the binary point are stored in the low-order 23 bits; the leading 1 is implicit and is not stored. NOTE: this is the IEEE 754 32-bit (single-precision) floating-point format.
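
The steps above (normalize, bias the exponent, pack the fields) can be sketched in C without reading the float's bytes at all. This is an illustration under stated assumptions, not a complete converter: `encode_float` is a hypothetical name, and it only handles finite, nonzero values in the normal range (no zero, subnormals, infinities or NaN).

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Hypothetical helper: builds the 32-bit pattern arithmetically.
   Assumption: x is finite, nonzero and not subnormal. */
uint32_t encode_float(float x)
{
    uint32_t sign = 0;
    uint32_t exponent, fraction;
    int e = 0;

    if (x < 0.0f) {                 /* sign-magnitude: record sign, keep |x| */
        sign = 1;
        x = -x;
    }
    assert(x >= FLT_MIN && x <= FLT_MAX);

    /* Normalize so that 1 <= x < 2, counting the shifts in e. */
    while (x >= 2.0f) { x /= 2.0f; e++; }
    while (x < 1.0f)  { x *= 2.0f; e--; }

    exponent = (uint32_t)(e + 127);                  /* add the bias */
    fraction = (uint32_t)((x - 1.0f) * 8388608.0f);  /* drop the leading 1, scale by 2^23 */

    return (sign << 31) | (exponent << 23) | fraction;
}
```

For 172.625f this should return 0x432CA000, which is the pattern 0 | 10000110 | 01011001010000000000000 worked out by hand.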

Solution 3

A floating point number is simply scientific notation. Let's say I asked you to express the circumference of the Earth in meters, using scientific notation. You would write:

4.007516×10⁷m

The exponent is just that: the power of ten here. The mantissa is the actual digits of the number. And the sign, of course, is just positive or negative. So in this case the exponent is 7 and the mantissa is 4.007516 .

The only significant difference between IEEE754 and grade-school scientific notation is that floating point numbers are in base 2, so it's not times ten to the power of something, it's times two to the power of something. So where you would write, say, 256 in ordinary human scientific notation as:

2.56×10² (mantissa 2.56 and exponent 2),

in IEEE754, it's

1×2⁸ — the mantissa is 1 and the exponent is 8.
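
To make the "mantissa times two to the power of something" reading concrete, here is a minimal sketch that rebuilds a value from a 32-bit pattern. `decode_bits` is a name invented for this example, and it assumes a normal number (the all-zeros and all-ones exponent fields, which are reserved for zero/subnormals and infinity/NaN, are not handled).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value = (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127).
   Assumption: bits encodes a normal single-precision number. */
double decode_bits(uint32_t bits)
{
    uint32_t sign     = bits >> 31;
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;
    double value;
    int e;

    assert(exponent != 0 && exponent != 255);

    value = 1.0 + fraction / 8388608.0;   /* restore the implicit leading 1 */
    e = (int)exponent - 127;              /* remove the bias */

    for (; e > 0; e--) value *= 2.0;      /* multiply by 2^e */
    for (; e < 0; e++) value /= 2.0;

    return sign ? -value : value;
}
```

With the pattern 0x43800000 (exponent field 135, fraction 0) this gives 1 × 2⁸ = 256.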


Author by

tgai

Updated on July 02, 2021

Comments

tgai almost 3 years

How would I go about manually changing a decimal (base 10) number into IEEE 754 single-precision floating-point format? I understand that there are three parts to it: a sign, an exponent, and a mantissa. I just don't completely understand what the last two parts actually represent.
Royi almost 10 years

Could you please create a MATLAB script to do so? Thank you.
Paul R almost 10 years

@Drazick: if you have a MATLAB problem then please post it as a new question.
Royi almost 10 years

Hi, A 'c' Code would be great as well. Just wanted a reference code to implement your solution. Thanks.
Paul R almost 10 years

@Drazick: there is C code in the answer above - or do you need something different ?
Royi almost 10 years

I meant a function whose input is a single / double precision number and whose output is a string of 32 / 64 bits representing the input number. Thank you.
Paul R almost 10 years

@Drazick: OK - I don't really have the bandwidth to do that just now - try and implement it yourself and then if you get stuck post a new question. Note also there are online calculators which can do this for you, e.g. h-schmidt.net/FloatConverter/IEEE754.html
Kimbluey over 8 years

Shouldn't the mantissa in your IEEE FP Format be 01011000000000000000000?
Peter Cordes almost 3 years

@Royi: The general case of strtod is highly non-trivial to make a computer do correctly and efficiently. It potentially requires extended-precision integer to handle an integer or fractional part that's wider than uint64_t. Actual code would either be wrong for some cases, or too complex to actually help understanding of the basic steps. See exploringbinary.com/… / exploringbinary.com/how-strtod-works-and-sometimes-doesnt / exploringbinary.com/how-glibc-strtod-works