How to read byte blocks into struct


Solution 1

Assuming this is C#, I wouldn't make FileEntry a struct. I would replace the char[12] with a string and use a BinaryReader (http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx) to read the individual fields. You must read the data in the same order as it was written.

Something like:

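class FileEntry {
     // public so the object initializer below can set them
     public byte Value1;
     public char[] Filename;
     public byte Value2;
     public byte[] FileOffset;
     public float whatever;
}

  using (var reader = new BinaryReader(File.OpenRead("path"))) {
     // initializers run top to bottom, so they must follow the on-disk field order
     var entry = new FileEntry {
        Value1 = reader.ReadByte(),
        Filename = reader.ReadChars(12), // would replace this with string
        Value2 = reader.ReadByte(),
        FileOffset = reader.ReadBytes(3),
        whatever = reader.ReadSingle() // BinaryReader has no ReadFloat()
     };
  }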

If you insist on having a struct, make it immutable and give it a constructor with an argument for each field.
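A minimal sketch of that immutable variant, under some assumptions: the type name ImmutableFileEntry and its static Read helper are made up for illustration, the filename is assumed to be stored as one byte per character in ASCII, and it is assumed to end at the first zero byte. It also shows one possible way to get a string out of the fixed 12-byte field.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical immutable version of FileEntry; the names are illustrative only.
public struct ImmutableFileEntry
{
    public readonly byte Value1;
    public readonly string Filename;
    public readonly byte Value2;
    public readonly byte[] FileOffset;
    public readonly float Whatever;

    public ImmutableFileEntry(byte value1, string filename, byte value2,
                              byte[] fileOffset, float whatever)
    {
        Value1 = value1;
        Filename = filename;
        Value2 = value2;
        FileOffset = fileOffset;
        Whatever = whatever;
    }

    // Reads one entry; the calls follow the on-disk field order.
    public static ImmutableFileEntry Read(BinaryReader reader)
    {
        byte value1 = reader.ReadByte();
        byte[] nameBytes = reader.ReadBytes(12);
        byte value2 = reader.ReadByte();
        byte[] fileOffset = reader.ReadBytes(3);
        float whatever = reader.ReadSingle();

        // Assumption: ASCII text that stops at the first zero byte.
        int length = Array.IndexOf(nameBytes, (byte)0);
        if (length < 0) length = nameBytes.Length;
        string filename = Encoding.ASCII.GetString(nameBytes, 0, length);

        return new ImmutableFileEntry(value1, filename, value2, fileOffset, whatever);
    }
}
```

It would be called as ImmutableFileEntry.Read(reader) inside the same using block that creates the BinaryReader.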

 

Solution 2

If you can use unsafe code:

unsafe struct FileEntry{
     byte Value1;
     fixed char Filename[12];
     byte Value2;
     fixed byte FileOffset[3];
     float whatever;
}

public unsafe FileEntry Get(byte[] src)
{
     fixed(byte* pb = &src[0])
     {
         return *(FileEntry*)pb;
     } 
}

The fixed keyword embeds the array in the struct. Since it is fixed, this can cause GC issues if you are constantly creating these and never letting them go. Keep in mind that each buffer occupies n * sizeof(T) bytes, so Filename[12] allocates 24 bytes (each char is a 2-byte UTF-16 code unit) and FileOffset[3] allocates 3 bytes. This matters if the data on disk is not Unicode: a filename stored as one byte per character will not line up with a char buffer. I would recommend changing it to a byte buffer and converting the struct to a usable class where you can convert the string.
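A byte-based variant of the struct might look like the sketch below. Several things here are assumptions rather than facts about the file format: Pack = 1 presumes the file was written with no padding between fields, the filename is presumed to be ASCII ending at the first zero byte, and the names RawFileEntry and RawFileEntryReader are invented for the example. The project must be compiled with unsafe code enabled.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Assumption: Pack = 1, i.e. no padding between fields (1 + 12 + 1 + 3 + 4 = 21 bytes).
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct RawFileEntry
{
    public byte Value1;
    public fixed byte Filename[12];   // 12 bytes, one per character
    public byte Value2;
    public fixed byte FileOffset[3];  // 3 bytes
    public float Whatever;
}

public static class RawFileEntryReader
{
    // Copies the first sizeof(RawFileEntry) bytes of src into a struct.
    public static unsafe RawFileEntry Get(byte[] src)
    {
        if (src == null || src.Length < sizeof(RawFileEntry))
            throw new ArgumentException("Buffer is smaller than one entry.", nameof(src));

        fixed (byte* pb = src)
        {
            return *(RawFileEntry*)pb;
        }
    }

    // Converts the fixed buffer to a string, stopping at the first zero byte.
    public static unsafe string GetFilename(RawFileEntry entry)
    {
        var bytes = new byte[12];
        int length = 0;
        while (length < 12 && entry.Filename[length] != 0)
        {
            bytes[length] = entry.Filename[length];
            length++;
        }
        return Encoding.ASCII.GetString(bytes, 0, length);
    }
}
```

The length check is there because the pointer cast itself does no bounds checking; without it, a short buffer would be read past its end.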

If you can't use unsafe, you can do the whole BinaryReader approach:

// no unsafe modifier needed here
public FileEntry Get(Stream src)
{
     FileEntry fe = new FileEntry();
     var br = new BinaryReader(src);
     fe.Value1 = br.ReadByte();
     ...
     return fe;
}

The unsafe way is far faster, especially when you're converting a lot of structs at once. The question is whether you want to use unsafe code. My recommendation is to use the unsafe method only if you absolutely need the performance boost.

Solution 3

Based on this article, except that I have made it generic, this is how to marshal the data directly into the struct. Very useful for longer data types.

public static T RawDataToObject<T>(byte[] rawData) where T : struct
{
    var pinnedRawData = GCHandle.Alloc(rawData,
                                       GCHandleType.Pinned);
    try
    {
        // Get the address of the data array
        var pinnedRawDataPtr = pinnedRawData.AddrOfPinnedObject();

        // overlay the data type on top of the raw data
        return (T) Marshal.PtrToStructure(pinnedRawDataPtr, typeof(T));
    }
    finally
    {
        // must explicitly release
        pinnedRawData.Free();
    }
}

Example Usage:

[StructLayout(LayoutKind.Sequential)]
public struct FileEntry
{
    public readonly byte Value1;

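    //you may need to play around with this one: a string field needs
    //ByValTStr (ByValArray is for arrays), and a name that fills all 12
    //bytes may lose its last character; marshal it as byte[] if that matters
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 12)]
    public readonly string Filename;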

    public readonly byte Value2;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public readonly byte[] FileOffset;

    public readonly float whatever;
}

private static void Main(string[] args)
{
    // one entry's worth of bytes; fill it from the file stream or whatever
    byte[] data = new byte[Marshal.SizeOf(typeof(FileEntry))];
    //usage
    FileEntry entry = RawDataToObject<FileEntry>(data);
}
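To read a whole table of entries this way, one option is to read Marshal.SizeOf bytes per entry and marshal each block in turn. The sketch below is only illustrative: PackedFileEntry, EntryTable and the count parameter are hypothetical names, the filename is kept as raw bytes to sidestep string marshaling, and Pack = 1 assumes the file has no padding between fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

// Assumption: Pack = 1, so the marshaled size is 1 + 12 + 1 + 3 + 4 = 21 bytes.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedFileEntry
{
    public byte Value1;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)]
    public byte[] Filename;

    public byte Value2;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public byte[] FileOffset;

    public float Whatever;
}

public static class EntryTable
{
    // Reads 'count' consecutive entries from the current stream position.
    public static List<PackedFileEntry> ReadEntries(Stream stream, int count)
    {
        int size = Marshal.SizeOf(typeof(PackedFileEntry));
        var buffer = new byte[size];
        var entries = new List<PackedFileEntry>(count);

        for (int i = 0; i < count; i++)
        {
            // Stream.Read may return fewer bytes than requested, so loop until the block is full.
            int read = 0;
            while (read < size)
            {
                int n = stream.Read(buffer, read, size - read);
                if (n == 0) throw new EndOfStreamException("Truncated entry table.");
                read += n;
            }

            // Pin the buffer, overlay the struct on it, then release the pin.
            var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
            try
            {
                entries.Add((PackedFileEntry)Marshal.PtrToStructure(
                    handle.AddrOfPinnedObject(), typeof(PackedFileEntry)));
            }
            finally
            {
                handle.Free();
            }
        }
        return entries;
    }
}
```

Reusing one buffer is safe here because PtrToStructure copies the ByValArray fields into fresh managed arrays for each entry.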

Solution 4

Wrapping your FileStream with a BinaryReader will give you dedicated Read*() methods for primitive types: http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx

Off the top of my head, you could probably mark your struct with [StructLayout(LayoutKind.Sequential)] (to ensure proper representation in memory) and use a pointer in an unsafe block to fill the struct C-style. However, going unsafe is not recommended unless you really need it (interop, heavy operations like image processing and so on).

Solution 5

Not a full answer (it's been covered I think), but a specific note on the filename:

The Char type is not a one-byte thing in C#: .NET characters are two-byte UTF-16 code units, so they support character values far beyond 255, and interpreting your filename data as a Char[] array will give problems. So the first step is definitely to read that as Byte[12], not Char[12].

A straight conversion from byte array to char array is also not advised, though, since in binary indices like this, filenames that are shorter than the allowed 12 characters will probably be padded with '00' bytes, so a straight conversion will result in a string that's always 12 characters long and might end on these zero-characters.

However, simply trimming these zeroes off is not advised, since reading systems for such data usually simply read up to the first encountered zero, and the data behind that in the array might actually contain garbage if the writing system doesn't bother to specifically clear its buffer with zeroes before putting the string into it. It's something a lot of programs don't bother doing, since they assume the reading system will only interpret the string up to the first zero anyway.

So, assuming this is indeed such a typical zero-terminated (C-style) string, saved in a one-byte-per-character text encoding (like ASCII, DOS-437 or Win-1252), the second step is to cut off the string on the first zero. You can easily do this with Linq's TakeWhile function. Then the third and final step is to convert the resulting byte array to string with whatever that one-byte-per-character text encoding it's written with happens to be:

public String StringFromCStringArray(Byte[] readData, Encoding encoding)
{
    return encoding.GetString(readData.TakeWhile(x => x != 0).ToArray());
}

As I said, the encoding will probably be something like pure ASCII, which can be accessed from Encoding.ASCII, standard US DOS encoding, which is Encoding.GetEncoding(437), or Windows-1252, the standard US / western Europe Windows text encoding, which you can retrieve with Encoding.GetEncoding("Windows-1252").
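As a small illustration of why cutting at the first zero differs from trimming trailing zeroes, here is a sketch with a made-up 12-byte field that holds "A.TXT", a terminator, and leftover bytes from an earlier, longer name. The class name CStringDemo and the sample bytes are invented for the example, and ASCII is assumed. On .NET Core and later, code-page encodings such as 437 or Windows-1252 are only available after registering CodePagesEncodingProvider, which is why the example sticks to Encoding.ASCII.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CStringDemo
{
    // Same approach as above: keep the bytes before the first zero, then decode.
    public static string StringFromCStringArray(byte[] readData, Encoding encoding)
    {
        return encoding.GetString(readData.TakeWhile(x => x != 0).ToArray());
    }

    // Hypothetical field: "A.TXT", a zero terminator, then stale bytes "OLD" and padding.
    public static readonly byte[] SampleField =
    {
        0x41, 0x2E, 0x54, 0x58, 0x54, 0x00,
        0x4F, 0x4C, 0x44, 0x00, 0x00, 0x00
    };

    // Cuts at the first zero: "A.TXT"
    public static string CutAtFirstZero()
    {
        return StringFromCStringArray(SampleField, Encoding.ASCII);
    }

    // Only trims trailing zeroes: the stale bytes survive as "A.TXT\0OLD"
    public static string TrimTrailingZeroes()
    {
        return Encoding.ASCII.GetString(SampleField).TrimEnd('\0');
    }
}
```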

Author: Gabriel Sanmartin

Updated on August 24, 2022

Comments

  • Gabriel Sanmartin
    Gabriel Sanmartin over 1 year

    I have this resource file which I need to process, which packs a set of files.

    First, the resource file lists all the files contained within, plus some other data, such as in this struct:

    struct FileEntry{
         byte Value1;
         char Filename[12];
         byte Value2;
         byte FileOffset[3];
         float whatever;
    }
    

    So I would need to read blocks exactly this size.

    I am using the Read function from FileStream, but how can I specify the size of the struct? I used:

    int sizeToRead = Marshal.SizeOf(typeof(Header));
    

    and then pass this value to Read, but then I can only read a set of byte[] which I do not know how to convert into the specified values (well I do know how to get the single byte values... but not the rest of them).

    Also I need to specify an unsafe context which I don't know whether it's correct or not...

    It seems to me that reading byte streams is tougher than I thought in .NET :)

    Thanks!

  • Gabriel Sanmartin
    Gabriel Sanmartin over 12 years
    This worked like a charm. How would you "replace this with string"? Using ReadString() you can't specify a size, so it reads beyond the desired position.
  • Gabriel Sanmartin
    Gabriel Sanmartin over 12 years
    It would probably be smarter not to use unsafe then, since these are only small files where the difference in processing speed would not really be noticeable.
  • Vasea
    Vasea over 12 years
    Actually, the size of string is contained in the stream if it was written as a string before. From MSDN - "Reads a string from the current stream. The string is prefixed with the length, encoded as an integer seven bits at a time." (msdn.microsoft.com/en-us/library/…). However, then you also must use the BinaryWriter.Write(string) before that. You could construct a string using chars - "StringField = new string(reader.ReadChars(20));".
  • Nyerguds
    Nyerguds over 7 years
    Yeah, strings are probably better read as Byte[], and then retrieved using some method that converts it to Char[] and then to String. With some rudimentary "ascii-only" checking and classic end-on-0 c-string behaviour, that'd be something like String filename = new String(filenameArr.TakeWhile(x => x != 0).Select(x => x < 128 ? Convert.ToChar(x) : '?').ToArray());