What is the fastest way to convert a float[] to a byte[]?

Solution 1

If you want the raw bytes with no value conversion, I would suggest Buffer.BlockCopy(), which copies a given number of bytes from one primitive array to another.

public static void BlockCopy(
    Array src,
    int srcOffset,
    Array dst,
    int dstOffset,
    int count
)

For example:

float[] floatArray = new float[1000];
byte[] byteArray = new byte[floatArray.Length * 4];

Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
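The same call can be wrapped in a pair of helpers. This is only a minimal sketch: the class and method names (FloatBytes, ToBytes, ToFloats) are made up for illustration, and it uses Buffer.ByteLength (mentioned in the comments) so the element size is not hard-coded.

```csharp
using System;

public static class FloatBytes
{
    // Copies the raw bytes of a float[] into a new byte[] (4 bytes per float).
    public static byte[] ToBytes(float[] source)
    {
        // Buffer.ByteLength returns the array's total size in bytes,
        // so the element size does not have to be hard-coded.
        int byteCount = Buffer.ByteLength(source);
        byte[] result = new byte[byteCount];
        Buffer.BlockCopy(source, 0, result, 0, byteCount);
        return result;
    }

    // Reverses ToBytes. Assumes bytes.Length is a multiple of sizeof(float);
    // any trailing partial element is dropped.
    public static float[] ToFloats(byte[] bytes)
    {
        float[] result = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, result, 0, result.Length * sizeof(float));
        return result;
    }
}
```

If the byte[] is allocated once and reused, as the asker describes, only the BlockCopy call needs to run per write.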

Solution 2

There is a quick-and-dirty way of doing this that is fast and needs no unsafe code:

[StructLayout(LayoutKind.Explicit)]
struct BytetoDoubleConverter
{
    [FieldOffset(0)]
    public Byte[] Bytes;

    [FieldOffset(0)]
    public Double[] Doubles;
}
//...
static Double Sum(byte[] data)
{
    BytetoDoubleConverter convert = new BytetoDoubleConverter { Bytes = data };
    Double result = 0;
    // Doubles.Length still reports the byte count, so divide by the element size.
    for (int i = 0; i < convert.Doubles.Length / sizeof(Double); i++)
    {
        result += convert.Doubles[i];
    }
    return result;
}

This will work, but I'm not sure how well it is supported on Mono or on newer versions of the CLR. The only strange thing is that array.Length reports the length in bytes. That is because the length is stored with the array object itself, and since the object was created as a byte array, the stored length is still the byte count. The indexer, however, does account for a Double being eight bytes wide, so no offset calculation is needed there.

I looked into this some more, and it is actually described on MSDN in "How to: Create a C/C++ Union by Using Attributes (C# and Visual Basic)", so chances are this will be supported in future versions. I am not sure about Mono though.
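For comparison, the same sum can be computed without aliasing the array at all. The sketch below is not the approach above: it swaps in BitConverter.ToDouble, and the class name SafeDoubleSum is made up for illustration. It shows the same length arithmetic, with the byte length divided by sizeof(double) to get the element count.

```csharp
using System;

public static class SafeDoubleSum
{
    // Sums the doubles stored back to back in a byte[] without aliasing the array.
    // Trailing bytes that do not form a whole double are ignored.
    public static double Sum(byte[] data)
    {
        double result = 0;
        int count = data.Length / sizeof(double); // element count, not byte count
        for (int i = 0; i < count; i++)
        {
            // BitConverter reads 8 bytes starting at the given byte offset,
            // using the machine's native byte order.
            result += BitConverter.ToDouble(data, i * sizeof(double));
        }
        return result;
    }
}
```

This is likely slower than the union trick because of the per-element call, but it does not depend on how the runtime lays out array objects.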

Solution 3

Premature optimization is the root of all evil! @Vlad's suggestion to iterate over each float is a much more reasonable answer than switching to a byte[]. Take the following table of runtimes for increasing numbers of elements (average of 50 runs):

Elements      BinaryWriter(float)      BinaryWriter(byte[])
-----------------------------------------------------------
10               8.72ms                    8.76ms
100              8.94ms                    8.82ms
1000            10.32ms                    9.06ms
10000           32.56ms                   10.34ms
100000         213.28ms                  739.90ms
1000000       1955.92ms                10668.56ms

There is little difference between the two for small numbers of elements. Once you reach very large element counts, the time spent copying from the float[] to the byte[] far outweighs the benefits.

So go with what is simple:

float[] data = new float[...];
foreach(float value in data)
{
    writer.Write(value);
}
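To reproduce a comparison like the one in the table, both write paths first need to produce the same output. The sketch below shows the two paths side by side. It is not the benchmark code behind the table: the class and method names are hypothetical, a MemoryStream stands in for the real file, and the byte-for-byte match assumes a little-endian machine.

```csharp
using System;
using System.IO;

public static class WriterComparison
{
    // Writes each float individually, as in the loop above.
    public static byte[] WritePerFloat(float[] data)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            foreach (float value in data)
            {
                writer.Write(value); // resolves to BinaryWriter.Write(float)
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Copies the floats into a byte[] first, then writes it in one call.
    public static byte[] WriteAsBlock(float[] data)
    {
        byte[] buffer = new byte[data.Length * sizeof(float)];
        Buffer.BlockCopy(data, 0, buffer, 0, buffer.Length);
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(buffer); // raw bytes, no length prefix
            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

Wrapping each method in a System.Diagnostics.Stopwatch and swapping the MemoryStream for the actual target stream would give timings for a specific setup.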

Solution 4

There is a way which avoids memory copying and iteration.

You can use a really ugly hack to temporarily change your array to another type using (unsafe) memory manipulation.

I tested this hack on both 32-bit and 64-bit operating systems, so it should be portable.

The source and sample usage are maintained at https://gist.github.com/1050703, but for your convenience I'll paste it here as well:

public static unsafe class FastArraySerializer
{
    [StructLayout(LayoutKind.Explicit)]
    private struct Union
    {
        [FieldOffset(0)] public byte[] bytes;
        [FieldOffset(0)] public float[] floats;
    }

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    private struct ArrayHeader
    {
        public UIntPtr type;
        public UIntPtr length;
    }

    private static readonly UIntPtr BYTE_ARRAY_TYPE;
    private static readonly UIntPtr FLOAT_ARRAY_TYPE;

    static FastArraySerializer()
    {
        fixed (void* pBytes = new byte[1])
        fixed (void* pFloats = new float[1])
        {
            BYTE_ARRAY_TYPE = getHeader(pBytes)->type;
            FLOAT_ARRAY_TYPE = getHeader(pFloats)->type;
        }
    }

    public static void AsByteArray(this float[] floats, Action<byte[]> action)
    {
        if (floats.handleNullOrEmptyArray(action)) 
            return;

        var union = new Union {floats = floats};
        union.floats.toByteArray();
        try
        {
            action(union.bytes);
        }
        finally
        {
            union.bytes.toFloatArray();
        }
    }

    public static void AsFloatArray(this byte[] bytes, Action<float[]> action)
    {
        if (bytes.handleNullOrEmptyArray(action)) 
            return;

        var union = new Union {bytes = bytes};
        union.bytes.toFloatArray();
        try
        {
            action(union.floats);
        }
        finally
        {
            union.floats.toByteArray();
        }
    }

    public static bool handleNullOrEmptyArray<TSrc,TDst>(this TSrc[] array, Action<TDst[]> action)
    {
        if (array == null)
        {
            action(null);
            return true;
        }

        if (array.Length == 0)
        {
            action(new TDst[0]);
            return true;
        }

        return false;
    }

    private static ArrayHeader* getHeader(void* pBytes)
    {
        return (ArrayHeader*)pBytes - 1;
    }

    private static void toFloatArray(this byte[] bytes)
    {
        fixed (void* pArray = bytes)
        {
            var pHeader = getHeader(pArray);

            pHeader->type = FLOAT_ARRAY_TYPE;
            pHeader->length = (UIntPtr)(bytes.Length / sizeof(float));
        }
    }

    private static void toByteArray(this float[] floats)
    {
        fixed(void* pArray = floats)
        {
            var pHeader = getHeader(pArray);

            pHeader->type = BYTE_ARRAY_TYPE;
            pHeader->length = (UIntPtr)(floats.Length * sizeof(float));
        }
    }
}

And the usage is:

var floats = new float[] {0, 1, 0, 1};
floats.AsByteArray(bytes =>
{
    foreach (var b in bytes)
    {
        Console.WriteLine(b);
    }
});

Solution 5

Although you can obtain a byte* pointer using unsafe and fixed, you cannot convert the byte* to a byte[] that the writer will accept as a parameter without copying the data. You do not want to do that, because it doubles your memory footprint and adds an extra pass over the data on top of the unavoidable pass needed to write it to disk.

Instead, you are still better off iterating over the array of floats and writing each float to the writer individually, using the Write(float) overload (Write(double) would emit 8 bytes per value). It will still be fast because of buffering inside the writer. See sixlettervariables's numbers.
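On newer runtimes there is a supported way to hand a stream the raw bytes without a copy, although a span is still not a byte[]. The sketch below assumes .NET Core 2.1 or later, where MemoryMarshal.AsBytes and Stream.Write(ReadOnlySpan<byte>) are available. It is not part of the original answer, and the SpanWriter name is made up for illustration.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class SpanWriter
{
    // Writes the raw bytes of a float[] to a stream without an intermediate byte[].
    public static void WriteFloats(Stream stream, float[] data)
    {
        // Reinterprets the float[] memory as bytes; no copy is made and
        // the array's type and length metadata are left untouched.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(data.AsSpan());
        stream.Write(bytes);
    }
}
```

The bytes are emitted in the machine's native byte order, which is the same layout Buffer.BlockCopy would produce.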

Author by

Nick

Updated on June 09, 2022

Comments

  • Nick
    Nick about 2 years

    I would like to get a byte[] from a float[] as quickly as possible, without looping through the whole array (via a cast, probably). Unsafe code is fine. Thanks!

    I am looking for a byte array 4 times longer than the float array, since each float is composed of 4 bytes. I'll pass this to a BinaryWriter.

    EDIT: To those critics screaming "premature optimization": I have benchmarked this using ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is exactly sized to match the sector size on the disk. The binary writer wraps a file handle created with pinvoke'd win32 API. The optimization occurs since this lessens the number of function calls.

    And, with regard to memory, this application creates massive caches which use plenty of memory. I can allocate the byte buffer once and re-use it many times--the double memory usage in this particular instance amounts to a roundoff error in the overall memory consumption of the app.

    So I guess the lesson here is not to make premature assumptions ;)

  • Nick
    Nick over 15 years
    Not sure what you mean. I just want byte-level indexing into the floating-point array (actually, I'm passing the array to a Writer).
  • ryeguy
    ryeguy over 15 years
    @Vlad: What is this supposed to mean? How can a datatype not be representable as bytes? See my answer.
  • vladr
    vladr over 15 years
    it means that the binary representation of (float)0 and that of (byte)0 are not the same (for one they don't have the same size.)
  • Nick
    Nick over 15 years
    Doesn't seem to work: error CS1503: Argument '1': cannot convert from 'byte*' to 'byte[]'
  • vladr
    vladr over 15 years
    This will double the amount of memory allocation in addition to iterating over your two arrays (once to copy, once to write). Very inefficient both speed-wise and memory-wise. Not recommended.
  • jdmichal
    jdmichal over 15 years
    Actually, you should probably just use Buffer.ByteLength: msdn.microsoft.com/en-us/library/system.buffer.bytelength.aspx
  • vladr
    vladr over 15 years
    You are better off to just iterate over the float[] array and call Write for each float. This solution is highly inefficient.
  • Jeremy
    Jeremy over 15 years
    Didn't know about that method, thanks! As for efficiency, whenever I have used BlockCopy, I had a byte[] and needed a float[] so there was no unneeded duplication. Plus if you stick with BlockCopy, you do not need unsafe code which can be advantageous. Pick the best method for your needs.
  • ShuggyCoUk
    ShuggyCoUk over 15 years
    Vlad is correct, you cannot fake the bits in memory that constitute a float[] as a byte[]. You CAN get a byte* to the front of the array, which is likely sufficient for your needs, but a byte* cannot be magicked into a byte[]
  • jdmichal
    jdmichal over 15 years
    @Jeremy: I didn't either, until 5 seconds before that comment :) @Vlad: Please just rate it up or down. No need to repeatedly post the same comment (while advertising for your answer). Let the asker and the users decide what is helpful. That's why the rating system exists.
  • Sam
    Sam over 15 years
    Posted answer which confirms @Vlad's suspicions
  • Sam
    Sam over 15 years
    @rstevens: you would have to use Marshal.SizeOf(typeof(float)), but the CLI standard says sizeof(float) should be 32bits.
  • Nick
    Nick over 15 years
    I have benchmarked this using ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is exactly sized to match the sector size on the disk. The binary writer wraps a file handle created with win32 API. ;)
  • Nick
    Nick over 15 years
    Please see my edit which explains why, in my specific case, Jeremy's answer does indeed speed up execution as confirmed by a profiler.
  • Omer Mor
    Omer Mor almost 14 years
    Actually you CAN fake the bits in memory to represent a byte[]. Check out my answer to see how it's done.
  • Gabe
    Gabe over 13 years
    -1 for being completely non-portable. Have you even tried this on a 64-bit machine?
  • Omer Mor
    Omer Mor over 13 years
    nope - it's a hack. If and when I get access to a 64 bit machine, I might check it out and perhaps adapt it. It is also not future proof. In CLR v.Next it might be completely broken. There is a trade-off here: You can use a more robust solution and pay in performance, or use the fastest way I can think of and live on the edge :-)
  • Omer Mor
    Omer Mor about 13 years
    I got a chance to use this on a 64-bit machine, so I made the code portable.
  • Robert Fraser
    Robert Fraser over 12 years
    +1 :-) Thanks for this! I use this method with custom structures, and it is indeed hellza helpful.
  • Admin
    Admin over 11 years
    +1 Nice technique! Is this reference aliasing safe from the garbage collector perspective?
  • Aidiakapi
    Aidiakapi over 11 years
    Just to avoid anybody getting the wrong idea, a System.Double (or in C# simply double) is 8 bytes (or 64 bits) and not 4 bytes (or 32 bits).
  • Cristian Diaconescu
    Cristian Diaconescu about 11 years
    +1 Pretty rad. I must ask, did you find any documentation on the memory layout for the type and length "fields" (for lack of a better word) of the arrays? I mean, how did you come up with this: FLOAT_ARRAY = *(UIntPtr*)(((byte*) pFloats) - 2*PTR_SIZE); ?
  • Cristian Diaconescu
    Cristian Diaconescu about 11 years
    Note to self and others: This article gets to the deeper end of the pool regarding internal type representation for .NET 2.0. codeproject.com/Articles/20481/…
  • Omer Mor
    Omer Mor about 11 years
    Thanks. I deduced the array header metadata fields using "reverse engineering" and some trial and error: I opened a memory window in visual studio, tinkered with the values, and deduced the layout. I updated the code to make it a little clearer.
  • TylerY86
    TylerY86 about 8 years
    This can also let you access uninitialized or other memory. Buffer overflow exploits go!
  • cdiggins
    cdiggins over 7 years
    "sizeof" is unsafe.
  • Peter Mortensen
    Peter Mortensen about 7 years
    There is a sweet spot at 10,000, 3 times faster (or is it a typo? - should it be 30.34 ms?) - how do you explain that?
  • Jan Kotas
    Jan Kotas almost 7 years
    This hack is corrupting the internal garbage collector data structures. It will cause intermittent crashes, data corruptions, and security bugs of the same class as use-after-free in C++. Hacking internal garbage collector data structures like this is absolutely not supported by the .NET runtime. github.com/HelloKitty/Reinterpret.Net/issues/1 has a long discussion about the crashes that this hack will lead to.
  • Omer Mor
    Omer Mor over 6 years
    @JanKotas thanks for the discussion link. Very interesting! I guess I could pin the array for the entire scope of the As{Float,Byte}Array() functions to prevent such corruptions. What do you think?
  • Oliver Bock
    Oliver Bock almost 4 years
    @OmerMor, I think you are right because (a) the garbage collector won't move it while pinned, and (b) the garbage collector won't traverse it because it is an array of simple values.
  • Peter
    Peter about 3 years
    Are you comparing this against foreach (byte b in bytedata) { writer.Write(b); }? Because that's a fairly silly compare, the whole reason why you want this to bytes is so you can use writer.Write(bytedata) directly, skipping the massive overhead per Write call. Writing 1MB to disk should not take 2 seconds, that's just plain absurd. You'd need a week to write a full PC backup this way.
  • Trương Quốc Khánh
    Trương Quốc Khánh about 3 years
    Maybe the length, heap address, and so on are stored in the CLI array itself, so Bytes.Length and Doubles.Length read the same address and therefore return the same value. That is not safe to use outside the function.