serialize any data type as vector<uint8_t> - use reinterpret_cast?


Solution 1

My suggestion is to ignore all the people telling you that reinterpret_cast is bad. They say so because it is generally bad practice to take the memory representation of one type and pretend that it is another type. But in this case, that is exactly what you want to do, as your entire purpose is to transmit the object's memory as a series of bytes.

It is far better than using a double-static_cast, as it fully details the fact that you are taking one type and purposefully pretending that it is something else. This situation is exactly what reinterpret_cast is for, and dodging using it with a void pointer intermediary is simply obscuring your meaning with no benefit.

Also, I'm sure that you're aware of this, but watch for pointers in T: copying a pointer's bytes transmits an address that means nothing on the receiving side.
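As a rough sketch of what this answer recommends, here is a pack/unpack pair built on a single reinterpret_cast. The names pack_bytes and unpack_bytes are hypothetical, and the static_assert and the bounds check are additions that are not in the question. The cast goes to unsigned char, which is always allowed to alias any object.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper: appends the raw object representation of `data` to `dst`.
template <typename T>
inline void pack_bytes(std::vector<std::uint8_t>& dst, const T& data) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies only make sense for trivially copyable types");
    // One cast that states the intent: view the object as a sequence of bytes.
    const unsigned char* src = reinterpret_cast<const unsigned char*>(&data);
    dst.insert(dst.end(), src, src + sizeof(T));
}

// Hypothetical helper: copies sizeof(T) bytes starting at `index` back into `data`.
template <typename T>
inline void unpack_bytes(const std::vector<std::uint8_t>& src, std::size_t index, T& data) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies only make sense for trivially copyable types");
    assert(index + sizeof(T) <= src.size());  // refuse to read past the buffer
    unsigned char* out = reinterpret_cast<unsigned char*>(&data);
    std::copy(src.data() + index, src.data() + index + sizeof(T), out);
}
```

This only round-trips correctly between machines that share the same object representation (same endianness, same type sizes).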

Solution 2

Your situation is exactly what reinterpret_cast is for. It is simpler than a double static_cast and clearly documents what you're doing.

Just to be safe, you should use unsigned char instead of uint8_t:

  • Doing a reinterpret_cast to unsigned char * and then dereferencing the resulting pointer is safe and portable, and is explicitly permitted by [basic.lval] §3.10/10.
  • Doing a reinterpret_cast to std::uint8_t * and then dereferencing the resulting pointer violates the strict aliasing rule and is undefined behavior if std::uint8_t is implemented as an extended unsigned integer type.

    If it exists, uint8_t must always have the same width as unsigned char. However, it need not be the same type; it may be a distinct extended integer type. It also need not have the same representation as unsigned char (see When is uint8_t ≠ unsigned char?).

    (This isn't completely hypothetical: making [u]int8_t a special extended integer type allows some aggressive optimizations)

If you really want uint8_t, you could add a:

static_assert(std::is_same<std::uint8_t, unsigned char>::value,
              "We require std::uint8_t to be implemented as unsigned char");

so that the code won't compile on platforms on which it would result in undefined behavior.

Solution 3

You can get rid of one cast by exploiting the fact that any object pointer can be implicitly converted to void*. Also, you might want to add a few const qualifiers:

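//Beware, brain-compiled code ahead!
template <typename T>
inline void encode (std::vector< uint8_t >& dst, const T& data)
{
    // Any object pointer converts implicitly to (const) void*
    const void* pdata = &data;
    // static_cast cannot drop const, so the byte pointer must be const too
    const uint8_t* src = static_cast<const uint8_t*>(pdata);
    dst.insert(dst.end(), src, src + sizeof(T));
}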

You might want to add a compile-time check for T being a POD, no struct, and no pointer.
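One way such a compile-time check could look is sketched below. It assumes that "POD, no struct, and no pointer" can be approximated by accepting only arithmetic and enum types, which is an interpretation rather than something stated above. The name encode_checked is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical variant of encode() that rejects unsuitable types at compile time.
template <typename T>
inline void encode_checked(std::vector<std::uint8_t>& dst, const T& data) {
    // Assumption: only plain numbers and enums are allowed. This rules out
    // pointers, structs/classes and anything with non-trivial copying.
    static_assert(std::is_arithmetic<T>::value || std::is_enum<T>::value,
                  "encode_checked only accepts arithmetic and enum types");

    const std::size_t before = dst.size();
    const void* pdata = &data;
    const unsigned char* src = static_cast<const unsigned char*>(pdata);
    dst.insert(dst.end(), src, src + sizeof(T));
    // Postcondition: exactly sizeof(T) bytes were appended.
    assert(dst.size() == before + sizeof(T));
}
```

With this in place, a call such as encode_checked(dst, some_pointer) or encode_checked(dst, some_struct) fails to compile instead of silently sending garbage.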

However, interpreting some object's memory at the byte level is never going to be safe, period. If you have to do it, then do it in a nice wrapper (as you have done), and get over it. When you port to a different platform or compiler, keep an eye on these things.

Solution 4

You're not doing any actual encoding here, you're just copying the raw representation of the data from memory into a byte array and then sending that out over the network. That's not going to work. Here's a quick example as to why:

struct A {
  int a;
};

struct B {
  A* p_a;
};

What happens when you use your method to send a B out over the network? The recipient receives p_a, the address of some A object on your machine, but that object is not on their machine. And even if you sent them the A object too, it wouldn't be at the same address. There's no way that can work if you just send the raw B struct. And that's not even considering more subtle issues like endianness and floating point representation which can affect the transmission of such simple types as int and double.

What you are doing right now is fundamentally no different than just casting to uint8_t* as far as whether it's going to work or not is concerned (it won't work, except for the most trivial cases).

What you need to do is devise a method of serialization. Serialization means any way of solving this sort of problem: how to get objects in memory out onto the network in a form such that they can be meaningfully reconstructed on the other side. This is a tricky problem, but it is a well-known and repeatedly solved problem. Here's a good starting point for reading: http://www.parashift.com/c++-faq-lite/serialization.html
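To make the idea concrete, here is a minimal sketch of field-by-field serialization for the A and B structs above. It is not a complete scheme. It assumes a wire format of big-endian 32-bit integers and changes A::a to std::int32_t so that its size is fixed. All helper names (put_u32_be, get_u32_be, serialize, deserialize_a) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct A { std::int32_t a; };  // assumption: fixed-width field instead of int
struct B { A* p_a; };

// Writes v as 4 bytes, most significant first, regardless of host endianness.
inline void put_u32_be(std::vector<unsigned char>& dst, std::uint32_t v) {
    dst.push_back(static_cast<unsigned char>((v >> 24) & 0xFF));
    dst.push_back(static_cast<unsigned char>((v >> 16) & 0xFF));
    dst.push_back(static_cast<unsigned char>((v >> 8) & 0xFF));
    dst.push_back(static_cast<unsigned char>(v & 0xFF));
}

// Reads 4 big-endian bytes starting at index.
inline std::uint32_t get_u32_be(const std::vector<unsigned char>& src, std::size_t index) {
    assert(index + 4 <= src.size());
    return (static_cast<std::uint32_t>(src[index]) << 24) |
           (static_cast<std::uint32_t>(src[index + 1]) << 16) |
           (static_cast<std::uint32_t>(src[index + 2]) << 8) |
            static_cast<std::uint32_t>(src[index + 3]);
}

// Sends the value that p_a points to, never the pointer itself.
inline void serialize(std::vector<unsigned char>& dst, const B& b) {
    assert(b.p_a != nullptr);
    put_u32_be(dst, static_cast<std::uint32_t>(b.p_a->a));
}

// The receiver rebuilds its own A; it decides where that object lives.
inline A deserialize_a(const std::vector<unsigned char>& src, std::size_t index) {
    A out;
    out.a = static_cast<std::int32_t>(get_u32_be(src, index));
    return out;
}
```

The byte order is fixed by the shifts rather than by the host's memory layout, so both ends agree on the format even if their architectures differ.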

ezpz
Author by

ezpz

Updated on July 21, 2022

Comments

  • ezpz
    ezpz almost 2 years

    I didn't find anything directly related in searching, so please forgive me if this is a duplicate.

    What I am looking to do is serialize data across a network connection. My approach is to convert everything I need to transfer to a std::vector< uint8_t > and on the receiving side unpack the data into the appropriate variables. My approach looks like this:

    template <typename T>
    inline void pack (std::vector< uint8_t >& dst, T& data) {
        uint8_t * src = static_cast < uint8_t* >(static_cast < void * >(&data));
        dst.insert (dst.end (), src, src + sizeof (T));
    }   
    
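    template <typename T>
    inline void unpack (vector <uint8_t >& src, int index, T& data) {
        // The destination must be a byte pointer: copying to &data (a T*) would write sizeof (T) objects of type T
        uint8_t * dst = static_cast < uint8_t* >(static_cast < void * >(&data));
        copy (src.begin () + index, src.begin () + index + sizeof (T), dst);
    }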
    

    Which I'm using like

    vector< uint8_t > buffer;
    uint32_t foo = 103, bar = 443;
    pack (buffer, foo);
    pack (buffer, bar);
    
    // And on the receive side
    uint32_t a = 0, b = 0;
    size_t offset = 0;
    unpack (buffer, offset, a);
    offset += sizeof (a);
    unpack (buffer, offset, b);
    

    My concern is the

    uint8_t * src = static_cast < uint8_t* >(static_cast < void * >(&data));

    line (which I understand to do the same as reinterpret_cast). Is there a better way to accomplish this without the double cast?

    My naive approach was to just use static_cast< uint8_t* >(&data) which failed. I've been told in the past that reinterpret_cast is bad. So I'd like to avoid it (or the construct I have currently) if possible.

    Of course, there is always uint8_t * src = (uint8_t *)(&data).

    Suggestions?

  • ezpz
    ezpz almost 14 years
    So, yes, misnomer. Regarding the rest of your comment: the question, as posed, is a simplification to inquire about whether or not to reinterpret_cast (or similar) - I'll rename to be more specific. I'm aware of the subtleties in transferring data and internally everything has a pack/unpack which essentially does what I describe above for its own data.
  • ezpz
    ezpz almost 14 years
    I have the const in there but elided for brevity. I do not, however, have the check for pointer and/or struct. This is used only by myself, but it would probably be safest to add those checks to be sure. Thanks.
  • underscore_d
    underscore_d about 8 years
    +1 for this being better than chained static_casts and especially the warnings about uint8_t. I read a post like this, maybe even the same one, in the past - and quickly had to do a lot of s/uint8_t/unsigned char/g ;)