Serializing a class which contains a std::string

Solution 1

"I'm serializing by casting the class to a char* and writing it to a file with fstream. Reading of course is just the reverse."

Unfortunately, this only works as long as there are no pointers involved. You might want to give your classes std::ostream& MyClass::serialize(std::ostream&) const and std::istream& MyClass::deserialize(std::istream&), and call those. Note that the streams must be passed by reference, since stream objects cannot be copied. For this case, you'd want

std::ostream& MyClass::serialize(std::ostream &out) const {
    out << height;
    out << ','; //number separator
    out << width;
    out << ','; //number separator
    out << name.size(); //serialize size of string
    out << ','; //number separator
    out << name; //serialize characters of string
    return out;
}
std::istream& MyClass::deserialize(std::istream &in) {
    if (in) {
        int len=0;
        char comma;
        in >> height;
        in >> comma; //read in the separator
        in >> width;
        in >> comma; //read in the separator
        in >> len;  //deserialize size of string
        in >> comma; //read in the separator
        if (in && len) {
            std::vector<char> tmp(len);
            in.read(tmp.data() , len); //deserialize characters of string
            name.assign(tmp.data(), len);
        } else if (in) {
            name.clear(); //an empty string was serialized
        }
    }
    return in;
}

You may also want to overload the stream operators for easier use.

std::ostream &operator<<(std::ostream& out, const MyClass &obj)
{obj.serialize(out); return out;}
std::istream &operator>>(std::istream& in, MyClass &obj)
{obj.deserialize(in); return in;}
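
A round trip through a std::stringstream is a quick way to check that the two functions agree with each other. The sketch below is a condensed restatement of the code above, not the answer's exact code: MyClass is written as a struct with public members, and round_trip is a hypothetical helper added only for illustration.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the class in this answer; the member names
// height, width and name follow the code above.
struct MyClass {
    int height = 0;
    int width = 0;
    std::string name;

    std::ostream& serialize(std::ostream& out) const {
        // Text fields separated by commas; the name is length-prefixed.
        out << height << ',' << width << ',' << name.size() << ',' << name;
        return out;
    }

    std::istream& deserialize(std::istream& in) {
        std::size_t len = 0;
        char comma = 0;
        in >> height >> comma >> width >> comma >> len >> comma;
        if (in) {
            std::vector<char> tmp(len);
            if (len != 0) in.read(tmp.data(), static_cast<std::streamsize>(len));
            name.assign(tmp.begin(), tmp.end());
        }
        return in;
    }
};

// Serializes src into memory and reads it back into a fresh object.
MyClass round_trip(const MyClass& src) {
    std::stringstream buffer;
    src.serialize(buffer);
    MyClass copy;
    copy.deserialize(buffer);
    return copy;
}
```

Because the name is read with read() rather than operator>>, spaces inside the name should survive the round trip.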

Solution 2

Simply writing the binary contents of an object into a file is not only unportable but, as you've recognized, doesn't work for pointer data. You basically have two options: either you write a real serialization library, which handles std::strings properly by e.g. using c_str() to output the actual string to the file, or you use the excellent Boost serialization library. If at all possible, I'd recommend the latter; you can then serialize with simple code like this:

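#include <string>

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/string.hpp>

class A {
    private:
        std::string s;
    public:
        // Called for both saving and loading: operator& writes s when
        // Archive is an output archive and reads it when it is an input archive.
        template<class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/)
        {
            ar & s;
        }
};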

Here, the function serialize works for serializing and deserializing the data, depending on how you call it. See the documentation for more information.

Solution 3

The easiest serialization method for strings or other blobs with variable size is to first serialize the size, the same way you serialize integers, and then copy the content to the output stream.

When reading, you first read the size, then allocate the string, and then fill it by reading the correct number of bytes from the stream.

An alternative is to use a delimiter and escaping, but that requires more code and is slower for both serialization and deserialization (however, the result can be kept human-readable).

Solution 4

You'll have to use a more complicated method of serialization than casting a class to a char* and writing it to a file if your class contains any exogenous data (string does). And you're correct about why you're getting a segmentation fault.
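
A compile-time check can make the distinction visible. In the sketch below, PlainSize and NamedSize are hypothetical classes; std::is_trivially_copyable is the standard trait that says whether copying an object's raw bytes is well defined at all.

```cpp
#include <string>
#include <type_traits>

// Hypothetical class with only built-in members: its bytes describe it fully.
struct PlainSize {
    int height;
    int width;
};

// Hypothetical class with a std::string member: the string manages memory
// that a byte-for-byte dump of the object does not capture.
struct NamedSize {
    int height;
    int width;
    std::string name;
};

// Copying raw bytes is only defined for trivially copyable types.
static_assert(std::is_trivially_copyable<PlainSize>::value,
              "PlainSize can be copied as raw bytes");
static_assert(!std::is_trivially_copyable<NamedSize>::value,
              "NamedSize needs member-by-member serialization");
```

Even for a trivially copyable type, the raw bytes still depend on padding and byte order, so a file written this way is tied to the compiler and platform that produced it.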

I would make a member function that takes an input stream and reads the data from it, as well as an inverse function that takes an output stream and writes the object's contents to it to be restored later, like this:

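#include <iostream>
#include <string>

class MyClass {
public:
    MyClass() : str() { }

    // writes the characters of the string, not its address
    void serialize(std::ostream& out) const {
        out << str;
    }

    // note: operator>> stops at the first whitespace character,
    // so only strings without spaces round-trip
    void restore(std::istream& in) {
        in >> str;
    }

    const std::string& data() const { return str; }

private:
    std::string str;
};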

MyClass c;
c.serialize(output);

// later
c.restore(input);

You can also define operator<< and operator>> to work with istream and ostream to serialize and restore your class as well if you want that syntactic sugar.
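
Such operators might look like the sketch below. Note the swap: instead of the plain out << str used above, it uses std::quoted (C++14, from <iomanip>) so that strings containing spaces survive a round trip. Note is a hypothetical stand-in for MyClass.

```cpp
#include <iomanip>
#include <istream>
#include <ostream>
#include <string>
#include <utility>

// Hypothetical stand-in for MyClass with a single string member.
class Note {
public:
    Note() = default;
    explicit Note(std::string text) : str(std::move(text)) {}

    const std::string& data() const { return str; }

    friend std::ostream& operator<<(std::ostream& out, const Note& n) {
        // std::quoted wraps the text in double quotes and escapes any
        // quote or backslash inside it.
        return out << std::quoted(n.str);
    }

    friend std::istream& operator>>(std::istream& in, Note& n) {
        // Reads up to the closing quote, so spaces inside the text are kept.
        return in >> std::quoted(n.str);
    }

private:
    std::string str;
};
```

With these in place, several objects can be chained in the usual way, for example out << a << ' ' << b; and later in >> a >> b;.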

Author by

iwasinnamuknow

Updated on July 22, 2022

Comments

  • iwasinnamuknow
    iwasinnamuknow almost 2 years

    I'm not a c++ expert but I've serialized things a couple of times in the past. Unfortunately this time I'm trying to serialize a class which contains an std::string, which I understand is pretty much like serializing a pointer.

    I can write out the class to a file and read it back in again. All int fields are fine, but the std::string field gives an "address out of bounds" error, presumably because it points to data which is no longer there.

    Is there a standard workaround for this? I don't want to go back to char arrays, but at least I know they work in this situation. I can provide code if necessary, but I'm hoping I've explained my problem well.

    I'm serializing by casting the class to a char* and writing it to a file with std::fstream. Reading of course is just the reverse.

    • RocketR
      RocketR over 12 years
      IMO, you'll have to dump string data manually. Prepare a plain structure which has a char buffer and the string length and serialize it instead of original object.
    • john
      john over 12 years
      It seems to me the only real issue is how you delimit the string, but you would face that issue with a char array as well. I'm not getting where you are having trouble; it seems very easy to serialise a string to me. You should probably post some code.
    • osgx
      osgx over 12 years
      Java has standard serialization (in its standard library). C++ has no such functionality, neither in the language nor in the standard library. There are external libraries that do this, e.g. Boost. Another option is Google's Protocol Buffers.
    • xtofl
      xtofl over 12 years
      Nitpicking: you're serializing an object.
    • iwasinnamuknow
      iwasinnamuknow over 12 years
      An intermediate structure does make sense to me. It does beg the question why I'm bothering with these strings in the first place, it seems to be a false economy in the long run.
    • iwasinnamuknow
      iwasinnamuknow over 12 years
      I will edit the original post with some code shortly. A little further explanation though. When the object is written, the ints are written as numbers, but the string is written as a pointer address instead of characters. Hence unless memory remains unchanged, the string is lost on reading.
    • john
      john over 12 years
      I'm guessing you are doing this out << &str;, that's the wrong way to do it.
    • RocketR
      RocketR over 12 years
      If you're on Linux, another good method is to construct an array of iovec structures and pass it to the writev function (linux.die.net/man/2/writev) to write everything in one shot.
    • Oliver Charlesworth
      Oliver Charlesworth over 12 years
      @iwas: "False economy"? You mean apart from the automatic memory management, integration with streams, etc., etc.?
  • osgx
    osgx over 12 years
    LPTSTR is unportable (windows only).
  • Seth Carnegie
    Seth Carnegie over 12 years
    He didn't want to go back to using arrays though.
  • iwasinnamuknow
    iwasinnamuknow over 12 years
    Wouldn't that assume that the string is separate from anything else? I'm trying to have the entire class and its contents written/read in one go.
  • Benjamin Lindley
    Benjamin Lindley over 12 years
    If that's interspersed with other data, or if the string has spaces in it, the input will not be accurate.
  • 6502
    6502 over 12 years
    Would that work with a string that contains spaces and/or newlines?
  • Oliver Charlesworth
    Oliver Charlesworth over 12 years
    @iwas: You cannot simply reinterpret a class as a char *. In general, serialization of objects requires (semi-)manually serializing each member variable in turn. I'm not quite sure what sort of solution you're looking for!
  • iwasinnamuknow
    iwasinnamuknow over 12 years
    Would the write/read actions act differently if used as member functions? I'm not really understanding how that would write the actual characters instead of the pointer address.
  • Oliver Charlesworth
    Oliver Charlesworth over 12 years
    @John: Good point, no it wouldn't. But then it wouldn't work with raw char *, either.
  • iwasinnamuknow
    iwasinnamuknow over 12 years
    I'm not dead against char arrays, but I've worked hard to use std::strings instead, tired of being told I was old fashioned. If they make things easier then I might have to go back.
  • jp2code
    jp2code over 12 years
    @osgx: I'm not saying store the LPTSTR. I'm saying serialize char[MAX] and read that into your string.
  • john
    john over 12 years
    @iwasinnamuknow: No write and read actions don't act differently when used as member function, what gives you that idea?
  • Seth Carnegie
    Seth Carnegie over 12 years
    @iwasinnamuknow It's using operator<< and >> of (i|o)stream on a string which is defined to write the contents of the string to file. You'd obviously have more data members than one string, so you'd just write them all to the output file and then read them in from the input file in the same order.
  • Seth Carnegie
    Seth Carnegie over 12 years
    @john it was just a quick example.
  • legion
    legion over 12 years
    @RocketR: Did I write union? Well, fixed. You know, it was a quick paste of code portions from some of my old project files.
  • osgx
    osgx over 12 years
    You said it in a very unportable manner. Both TCHAR and LPTSTR are Windows terms, and if the user doesn't know them, you've just recited a voodoo spell.
  • xtofl
    xtofl over 12 years
    Excellent idea. However it looks as if you show an example of the 'latter' - use boost, while you advise the 'former'...
  • john
    john over 12 years
    @Oli: This is the point surely, the OP is claiming that serialising a std::string is somehow harder than serialising a char array. That's the bit I don't get and until he explains himself I don't think we're going to get very far.
  • john
    john over 12 years
    They surely do not make things easier. But why not read into a char array, and then assign your char array to a string? Is that very hard?
  • iwasinnamuknow
    iwasinnamuknow over 12 years
    @oli I'm following one of the multitude of guides that simply suggest casting the class and writing it out. This does work except where a pointer is concerned. And the string is acting like a pointer. A static length char array works fine as well in my quick test. I'm trimming up some code now.
  • jp2code
    jp2code over 12 years
    There. I think that's portable. I'm not trying to tell the guy how to write robust code, I'm just throwing out an idea that could solve his problem.
  • iwasinnamuknow
    iwasinnamuknow over 12 years
    I haven't dug in to boost before but I'll check it out next. thanks
  • Oliver Charlesworth
    Oliver Charlesworth over 12 years
    @iwas: No, there are many more cases where that won't work. It can only possibly work for PODs (plain-old data structures).
  • iwasinnamuknow
    iwasinnamuknow over 12 years
    Looks interesting and not too disruptive to the existing code/workflow. I'll have a play. thanks
  • iwasinnamuknow
    iwasinnamuknow over 12 years
    @Oli I understand, I'm checking out the boost serialization now in addition to the other answers.
  • Benjamin Lindley
    Benjamin Lindley over 12 years
    (1) Your streams need to be passed by reference, istream and ostream copy constructors are disabled. (2) width and height and the size of the string will be concatenated together on output, so reading them back in will result in a single number.
  • john
    john over 12 years
    in.read(&name[0], len); that is surely wrong. You cannot treat a string like a vector. And even as a vector it would fail if len == 0.
  • Rudy Velthuis
    Rudy Velthuis over 12 years
    @john: agreed. An intermediate char *nameValue = new char[len + 1]; seems to be required.
  • Mooing Duck
    Mooing Duck over 12 years
    @Benjamin Lindley: whoops, I forgot to make them by reference. My bad.
  • Mooing Duck
    Mooing Duck over 12 years
    @John and Rudy: nope. std::string (non-const) operator[] returns a char&, so the address of that is a char* into the string. Since I've resized to the exact length, this is all defined behavior, and works. (Although it would have failed if len was zero)
  • 6502
    6502 over 12 years
    Writing using c_str() is going to create problems if the string has embedded NUL chars (\0), because fewer characters will be written. You should either write the correct number of characters by looping over the string, or write strlen(c_str()) instead of size if you want to drop characters after the first NUL. Writing size and c_str will make you read corrupted data if a NUL is stored in the string.
  • Mooing Duck
    Mooing Duck over 12 years
    @John and Rudy: I stand corrected, std::string is not guaranteed to be contiguous. Temporary is required.
  • Mooing Duck
    Mooing Duck over 12 years
    Wish I could reset the votes and uncheck accepted answer, this code has changed quite a bit since it was accepted :(
  • Benjamin Lindley
    Benjamin Lindley over 12 years
    auto_ptr should not be used for arrays. It only calls delete, not delete[]. Try a vector<char>; you can read directly into it from a file using &v[0].
  • Costantino Grana
    Costantino Grana about 2 years
    This would stop reading at the first space in the string.