Floating Point to Binary Value (C++)

c++ binary floating-point ieee-754

42,281

Solution 1

Use a union and std::bitset:

#include <iostream>
#include <bitset>
#include <climits>

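int main()
{
    static_assert(sizeof(float) == sizeof(int), "assumes float and int have the same size");

    // Note: reading a union member other than the one last written is undefined
    // behavior in ISO C++. GCC documents this kind of type punning as supported;
    // std::memcpy into an integer is the portable alternative.
    union
    {
        float input;
        int   output;
    } data;

    data.input = 2.25125f;

    std::bitset<sizeof(float) * CHAR_BIT> bits(data.output);
    std::cout << bits << std::endl;

    // or access single bits (bits[0] is the least significant bit)
    std::cout << "BIT 4: " << bits[4] << std::endl;
    std::cout << "BIT 7: " << bits[7] << std::endl;
}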

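A std::bitset is not an array, but you can access individual bits with operator[] as if it were one. bits[0] is the least significant bit, so bits[4] is the fifth character from the right in the printed string.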

Output

$ ./bits
01000000000100000001010001111011
BIT 4: 1
BIT 7: 0

Solution 2

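#include <climits>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
void toBits(float floatVar, int (&binaryRepresentation)[sizeof(float) * CHAR_BIT])
{
    std::uint32_t fl;
    std::memcpy(&fl, &floatVar, sizeof fl); // copy the bytes; *(int*)&floatVar breaks strict aliasing
    for (unsigned i = 0; i < sizeof(float) * CHAR_BIT; ++i)
        binaryRepresentation[i] = ((1u << i) & fl) != 0 ? 1 : 0; // element i holds bit i
}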

Explanation

(1 << i) shifts the value 1 left by i bits. The & operator computes the bitwise AND of its operands.

The for loop runs once for each of the 32 bits in the float. On each iteration, i is the index of the bit we want to extract. We compute the bitwise AND of the number and 1 << i:

Assume the number is 10001011 and i = 2.

Then 1 << i equals 00000100:

  10001011
& 00000100
==========
  00000000

If i = 3:

  10001011
& 00001000
==========
  00001000

In other words, the result has its ith bit equal to the ith bit of the original number, and all its other bits are zero. So the result is either zero, meaning the ith bit of the original number was 0, or nonzero, meaning that bit was 1.

Solution 3

Another approach, using the standard library:

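#include <bitset>
#include <climits>
#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    float f = 4.5f;
    std::uint32_t u; // assumes a 32-bit float; unsigned long may be 64 bits and over-read f
    std::memcpy(&u, &f, sizeof u);
    std::cout << std::bitset<sizeof f * CHAR_BIT>(u) << std::endl;
    return 0;
}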

Solution 4

Can you just read the binary in the memory that the float variable occupies?

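Yes. Copy its bytes into a 32-bit unsigned integer, for example with std::memcpy, and read the bits from that. A static_cast from float* to int* does not compile (it would need reinterpret_cast), and reading a float through an int pointer violates the strict aliasing rule. An IEEE 754 single-precision float is 32 bits, but C++ does not require float to use IEEE 754.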

Solution 5

If you need a particular floating point representation, you'll have to build that up semantically from the float itself, not by bit-copying.

The C standard does not define the format of floating-point numbers; see section 5.2.4.2.2 at http://c0x.coding-guidelines.com/5.2.4.2.2.html.


Author: user58389

Updated on September 29, 2020

Comments

user58389 over 3 years

I want to take a floating-point number in C++, like 2.25125, and get an int array filled with the binary value that is used to store the float in memory (IEEE 754).

So I could take a number and end up with an int num[16] array holding the binary value of the float: num[0] would be 1, num[1] would be 1, num[2] would be 0, num[3] would be 1, and so on...

Putting an int into an array isn't difficult; getting the binary value of a float is where I'm stuck. Can you just read the binary in the memory that the float variable occupies? If not, how could I go about doing this in C++?

EDIT: The reason for doing the comparison this way is that I want to learn to do bitwise operations in C++.
Christoph over 15 years

That's not what he wants: The binary representation must be an array of size sizeof(float) * CHAR_BIT (-1)
mmx over 15 years

@Christoph: I doubt it. Look at the question. He says he wants a binary representation of the float in an int array.
mmx over 15 years

continued: To quote from the question: "So I could take a number, and end up with a int num[16] array with the binary value of the float: num[0] would be 1 num[1] would be 1 num[2] would be 0 num[3] would be 1 and so on..."
Christoph over 15 years

He wants the int array to contain the bit pattern, i.e. one int for each bit; therefore, its size must be the number of bits in a float variable, i.e. 32 (he incorrectly assumed that a float value takes 16 bits...)
user58389 over 15 years

Also, I now see I need 32 bits, not 16.
Martin York over 15 years

Do not assume there are 8 bits in a byte. Use CHAR_BIT.
Martin York over 15 years

@unknown (yahoo): That's silly. It does not buy you anything. Assuming this is homework: put each float in an int and XOR the ints.
Sam over 15 years

I think the number of programmers left in the world who deal with CHAR_BIT as a necessity could be counted on one hand... (as of 2007 I am no longer part of that crowd)
Johannes Schaub - litb over 15 years

@sixlettervariables: that's just silly... It's part of the language spec, and it's the number of bits in a char. How about omitting the use of sizeof next...
user58389 over 15 years

This isn't homework; one of my professors asked me if I could do it for fun. It would be very simple to just subtract one float from another, and if you don't get 0, they are not the same. But I think this is simply an exercise in working with bits and binary logic.
mmx over 15 years

What kind of fun do professors ask for? Does it give you some increase in GPA?
user58389 over 15 years

Can you explain what is happening here: ((1 << i) & fl) != 0 ? 1 : 0;
Konrad Rudolph over 15 years

Mehrdad, any reason for using the pretty much deprecated C-style cast instead of the recommended reinterpret_cast here? There's pretty much consensus that C-style cast should never be used – especially not in a “textbook” example.
mmx over 15 years

@Konrad, It's shorter :) The sole purpose of my answer was the line in the for loop. I didn't want to clutter up the answer with unnecessary best practices.
user58389 over 15 years

Thank you Mehrdad Afshari! You have been a great help.
deft_code about 15 years

ieee754 floats are always 32 bits, c++ is spec'ed to use ieee754 for its floating point types. Long is also spec'ed to be 32 bits. Change the union to use long instead of int, and you'll have truly portable code.
Martin York almost 11 years

@deft_code: C++ is NOT spec'ed to use ieee754 (it can be). Long is NOT spec'ed as 32 bits (it must be at least 32). This will never be portable as assigning to one field in a union and reading from another is unspecified behavior. If I am incorrect about either of the above please let me know the clause in the C++ standards where it is defined because a simple search showed both statements as wrong.
underscore_d over 8 years

@deft_code not only that, but it's also false that "ieee754 floats are always 32 bits". Re-read the standard and note the 3 types specified there, then consider deleting your comment already.
RetroSeven almost 4 years

This is UB. Please don't ever do this.
Martin York almost 4 years

@MichalŠtein It's implementation-defined behavior. This technique is heavily used in C code, and for backwards compatibility (a very important consideration when new C++ features are designed) it needs to work in C++.
RetroSeven almost 4 years

@MartinYork It's UB in C++.
Martin York almost 4 years

@MichalŠtein What clause in the standard are you using to make that claim?