Saving an Object (Data persistence)


Solution 1

You could use the pickle module in the standard library. Here's an elementary application of it to your example:

import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as outp:
    company1 = Company('banana', 40)
    pickle.dump(company1, outp, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, outp, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as inp:
    company1 = pickle.load(inp)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(inp)
    print(company2.name) # -> spam
    print(company2.value)  # -> 42

You could also define your own simple utility like the following which opens a file and writes a single object to it:

def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')
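A matching reader is just as short. The following is a minimal sketch: load_object is a hypothetical helper name (not part of pickle), and a plain dict written to a temporary directory stands in for the Company instance.

```python
import os
import pickle
import tempfile

def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

def load_object(filename):
    """Hypothetical counterpart to save_object: read back a single pickled object."""
    with open(filename, 'rb') as inp:
        return pickle.load(inp)

# sample usage (a dict stands in for a Company instance)
path = os.path.join(tempfile.mkdtemp(), 'company1.pkl')
save_object({'name': 'banana', 'value': 40}, path)
restored = load_object(path)
print(restored['name'], restored['value'])
```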

Update

Since this is such a popular answer, I'd like to touch on a few slightly advanced usage topics.

cPickle (or _pickle) vs pickle

In Python 2, it's almost always preferable to use the cPickle module rather than pickle because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they're equivalent and the C version will provide greatly superior performance. Switching to it couldn't be easier; just change the import statement to this:

import cPickle as pickle

In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary since the pickle module now does it automatically—see What difference between pickle and _pickle in python 3?.

The rundown is you could use something like the following to ensure that your code will always use the C version when it's available in both Python 2 and 3:

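try:
    import cPickle as pickle  # Python 2: use the C implementation
except ImportError:  # Python 3 (ModuleNotFoundError is a subclass and only exists in 3.6+)
    import pickle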

Data stream formats (protocols)

pickle can read and write files in several different Python-specific formats, called protocols, as described in the documentation. "Protocol version 0" is ASCII and therefore "human-readable". Versions > 0 are binary, and the highest one available depends on what version of Python is being used. The default also depends on the Python version: in Python 2 the default was protocol version 0, but in Python 3.8.1 it's protocol version 4. In Python 3.x the module gained a pickle.DEFAULT_PROTOCOL attribute, which doesn't exist in Python 2.
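To see the difference between the text and binary protocols, here is a small sketch, assuming Python 3 and using a plain dict as the sample data:

```python
import pickle

record = {'name': 'banana', 'value': 40}

text_form = pickle.dumps(record, protocol=0)
binary_form = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

print(text_form.decode('ascii'))  # protocol 0 output decodes cleanly as ASCII
print(binary_form[:2])            # binary protocols start with a non-ASCII marker byte

# Both values depend on the Python version in use.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Either form restores the same object.
round_trip_ok = pickle.loads(text_form) == pickle.loads(binary_form) == record
```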

Fortunately there's shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that's what you want, and you usually do): just use the literal number -1, similar to referencing the last element of a sequence via a negative index. So, instead of writing:

pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

You can just write:

pickle.dump(obj, outp, -1)

Either way, you'd only have to specify the protocol once if you created a Pickler object for use in multiple pickle operations:

pickler = pickle.Pickler(outp, -1)
pickler.dump(obj1)
pickler.dump(obj2)
# etc...
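A runnable version of that idea might look like the sketch below. It assumes Python 3 and uses an in-memory io.BytesIO buffer in place of a real file. Because a Pickler remembers objects it has already written, the stream is read back here with a single Unpickler rather than with separate pickle.load() calls.

```python
import io
import pickle

buffer = io.BytesIO()                 # in-memory stand-in for a file opened with 'wb'
pickler = pickle.Pickler(buffer, -1)  # -1 selects pickle.HIGHEST_PROTOCOL
pickler.dump({'name': 'banana', 'value': 40})
pickler.dump({'name': 'spam', 'value': 42})

buffer.seek(0)                        # rewind, as if reopening the file with 'rb'
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
print(first, second)
```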

Note: If you're in an environment running different versions of Python, then you'll probably want to explicitly use (i.e. hardcode) a specific protocol number that all of them can read (later versions can generally read files produced by earlier ones).
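As a sketch of what pinning a protocol could look like (PINNED_PROTOCOL is a hypothetical constant name, and the choice of 2 is only an assumption about the oldest interpreter involved):

```python
import pickle

# Assumption: every Python version that must read this data supports protocol 2.
PINNED_PROTOCOL = 2

data = pickle.dumps({'name': 'banana', 'value': 40}, PINNED_PROTOCOL)

# For protocol 2 and later, the stream starts with the PROTO opcode (0x80)
# followed by the protocol number, so the reader knows which format was used.
print(data[0] == 0x80, data[1])

restored = pickle.loads(data)  # no protocol argument is needed when loading
print(restored)
```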

Multiple Objects

While a pickle file can contain any number of pickled objects, as shown in the above samples, when there's an unknown number of them, it's often easier to store them all in some sort of variably-sized container, like a list, tuple, or dict and write them all to the file in a single call:

tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')

and restore the list and everything in it later with:

with open('tech_companies.pkl', 'rb') as inp:
    tech_companies = pickle.load(inp)

The major advantage is you don't need to know how many object instances are saved in order to load them back later (although doing so without that information is possible, it requires some slightly specialized code). See the answers to the related question Saving and loading multiple objects in pickle file? for details on different ways to do this. Personally I liked @Lutz Prechelt's answer the best, so that's the approach used in the sample code below:

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickle_loader(filename):
    """ Deserialize a file of pickled objects. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

print('Companies in pickle file:')
for company in pickle_loader('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))
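The same loader also works when objects are added to the file one at a time. The sketch below assumes Python 3; append_object is a hypothetical helper name, and tuples written to a temporary file stand in for Company instances.

```python
import os
import pickle
import tempfile

def append_object(obj, filename):
    """Hypothetical helper: add one more pickled object to the end of a file."""
    with open(filename, 'ab') as outp:  # 'ab' appends instead of overwriting
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

def pickle_loader(filename):
    """Yield every pickled object in the file until it is exhausted."""
    with open(filename, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

path = os.path.join(tempfile.mkdtemp(), 'records.pkl')
for record in [('banana', 40), ('spam', 42), ('eggs', 7)]:
    append_object(record, path)

loaded = list(pickle_loader(path))
print(loaded)
```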

Solution 2

I think it's a pretty strong assumption to assume that the object is a class. What if it's not a class? There's also the assumption that the object was not defined in the interpreter. What if it was defined in the interpreter? Also, what if the attributes were added dynamically? When some python objects have attributes added to their __dict__ after creation, pickle doesn't respect the addition of those attributes (i.e. it 'forgets' they were added -- because pickle serializes by reference to the object definition).

In all these cases, pickle and cPickle can fail you horribly.
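One of these failure modes is easy to reproduce. The following sketch assumes Python 3 and the standard pickle module only; the exact exception type may differ between Python versions, so both likely candidates are caught.

```python
import pickle

company2 = lambda x: x
company2.name = 'rhubarb'

try:
    pickle.dumps(company2)
    outcome = 'pickled'
except (pickle.PicklingError, AttributeError) as exc:
    # pickle looks the function up by name, and a lambda has no importable name.
    outcome = 'failed: {}'.format(type(exc).__name__)

print(outcome)
```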

If you are looking to save an object (arbitrarily created), where you have attributes (either added in the object definition, or afterward)… your best bet is to use dill, which can serialize almost anything in python.

We start with a class…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
... 
>>> 

Now shut down, and restart...

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>> 

Oops… pickle can't handle it. Let's try dill. We'll throw in another object type (a lambda) for good measure.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill       
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
... 
>>> 

And now read the file.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
... 
>>> company1 
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>    

It works. The reason pickle fails, and dill doesn't, is that dill treats __main__ like a module (for the most part), and also can pickle class definitions instead of pickling by reference (like pickle does). The reason dill can pickle a lambda is that it gives it a name… then pickling magic can happen.
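"Pickling by reference" can be observed directly in the bytes that the standard pickle module produces. This sketch assumes Python 3 and uses fractions.Fraction from the standard library as a stand-in class; the stream names the module and the class, but contains no class definition.

```python
import pickle
from fractions import Fraction

data = pickle.dumps(Fraction(1, 2), protocol=2)
print(data)

# Only the module name and class name are stored, so loading the data
# requires that the class can be imported under that same name.
print(b'fractions' in data, b'Fraction' in data)

restored = pickle.loads(data)
print(restored)
```

This is why the plain pickle example above fails after a restart: the new session's __main__ has no Company class to look up.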

Actually, there's an easier way to save all these objects, especially if you have a lot of objects you've created. Just dump the whole python session, and come back to it later.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> dill.dump_session('dill.pkl')
>>> 

Now shut down your computer, go enjoy an espresso or whatever, and come back later...

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>

The only major drawback is that dill is not part of the python standard library. So if you can't install a python package on your server, then you can't use it.

However, if you are able to install Python packages on your system, you can get the latest development version of dill with pip install git+https://github.com/uqfoundation/dill.git@master#egg=dill, and the latest released version with pip install dill.

Solution 3

Quick example using company1 from your question, with python3.

import pickle

# Save the file (the with block closes it even if pickling fails)
with open("company1.pickle", "wb") as f:
    pickle.dump(company1, f)

# Reload the file
with open("company1.pickle", "rb") as f:
    company1_reloaded = pickle.load(f)

However, as this answer noted, pickle fails in some cases, for example with lambdas or with classes defined in an interactive session. In those cases, use dill.

import dill

# Save the file (the with block closes it even if pickling fails)
with open("company1.pickle", "wb") as f:
    dill.dump(company1, f)

# Reload the file
with open("company1.pickle", "rb") as f:
    company1_reloaded = dill.load(f)

Solution 4

You can use anycache to do the job for you. It considers all the details:

  • It uses dill as its backend, which extends the Python pickle module to handle lambdas and other Python features.
  • It stores different objects in different files and reloads them properly.
  • It limits the cache size.
  • It allows the cache to be cleared.
  • It allows objects to be shared between multiple runs.
  • It can take into account input files that influence the result.

Assuming you have a function myfunc which creates the instance:

from anycache import anycache

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

@anycache(cachedir='/path/to/your/cache')
def myfunc(name, value):
    return Company(name, value)

Anycache calls myfunc the first time and pickles the result to a file in cachedir, using a unique identifier (depending on the function name and its arguments) as the filename. On any subsequent call, the pickled object is loaded instead. If the cachedir is preserved between Python runs, the pickled object is taken from the previous run.
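To illustrate the general idea, here is a rough sketch of that kind of caching built only on the standard library. It is not anycache's implementation: disk_cache, the key scheme, and the file naming are all assumptions made for the example, and it uses plain pickle rather than dill.

```python
import functools
import hashlib
import os
import pickle
import tempfile

def disk_cache(cachedir):
    """Hypothetical decorator: cache a function's result as a pickle file."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build an identifier from the function name and its arguments.
            key = repr((func.__qualname__, args, sorted(kwargs.items())))
            ident = hashlib.sha256(key.encode('utf-8')).hexdigest()
            path = os.path.join(cachedir, ident + '.pkl')
            if os.path.exists(path):          # later calls: load the stored result
                with open(path, 'rb') as inp:
                    return pickle.load(inp)
            result = func(*args, **kwargs)    # first call: compute and store
            os.makedirs(cachedir, exist_ok=True)
            with open(path, 'wb') as outp:
                pickle.dump(result, outp, pickle.HIGHEST_PROTOCOL)
            return result
        return wrapper
    return decorator

calls = []

@disk_cache(tempfile.mkdtemp())
def make_company(name, value):
    calls.append(name)                        # records how often the body really runs
    return {'name': name, 'value': value}

first = make_company('banana', 40)
second = make_company('banana', 40)
print(first == second, len(calls))
```

The real package handles considerably more than this (cache size limits, clearing, input-file dependencies), so treat the sketch only as a picture of the mechanism.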

For any further details see the documentation

Solution 5

Newer versions of pandas also provide functionality for saving pickles.

I find it easier, e.g.:

import pandas as pd
pd.to_pickle(object_to_save, '/temp/saved_pkl.pickle')
Author: Peterstone


Updated on January 20, 2022

Comments

  • Peterstone
    Peterstone over 2 years

    I've created an object like this:

    company1.name = 'banana' 
    company1.value = 40
    

    I would like to save this object. How can I do that?

  • Peterstone
    Peterstone over 13 years
    This seems strange to me because I imagined there would be an easier way to save an object... Something like 'saveobject(company1,c:\mypythonobjects)'
  • martineau
    martineau over 13 years
    @Peterstone: If you only wanted to store one object you would only need about half as much code as in my example -- I purposefully wrote it the way I did to show how more than one object could be saved into (and later read back from) the same file.
  • Harald Scheirich
    Harald Scheirich over 13 years
    @Peterstone, there is a very good reason for the separation of responsibilities. This way there is no limitation on how the data from the pickling process is used. You can store it to disk or you could send it across a network connection.
  • martineau
    martineau over 13 years
    @Harald Scheirich: Could you please elaborate on what you mean about a "separation of responsibilities" -- I'm not exactly sure to what you are referring.
  • Harald Scheirich
    Harald Scheirich over 13 years
    @martineau, this was in response to Peterstone's remark that there should be just one function to save an object to disk. Pickle's responsibility is only to turn an object into data that can be handled as a chunk. Writing things to a file is the file object's responsibility. By keeping things separate one enables higher reuse, e.g. being able to send the pickled data across a network connection or store it in a database, all responsibilities separate from the actual data<->object conversion.
  • martineau
    martineau over 13 years
    @Harald Scheirich: Ah, I see...and agree ;-) but doubt Peterstone is concerned nor able appreciate the finer points of the high level design of the various Python modules and how they fit together at this stage.
  • MikeiLL
    MikeiLL over 9 years
    I'm getting a TypeError: __new__() takes at least 2 arguments (1 given) when trying to use dill (which looks promising) with a rather complex object that includes an audio file.
  • Mike McKerns
    Mike McKerns over 9 years
    @MikeiLL: You are getting a TypeError when you do what, exactly? That's usually a sign of having the wrong number of arguments when instantiating a class instance. If this is not part of the workflow of the above question, could you post it as another question, submit it to me over email, or add it as an issue on the dill github page?
  • martineau
    martineau over 9 years
    For anyone following along, here's the related question @MikeLL posted -- from the answer, it apparently wasn't a dill issue.
  • Mike McKerns
    Mike McKerns over 8 years
    You delete company1 and company2. Why don't you also delete Company and show what happens?
  • martineau
    martineau about 8 years
    @Mike: Primarily because doing so has little bearing on the question at hand, nor is it something most folks reading the thread are likely to ever want to do. But also because I strongly suspect the primary motivation for asking it is nothing more than a veiled attempt to publicize the dill module you wrote — which certainly sounds like it might come in handy in some situations.
  • Mike McKerns
    Mike McKerns about 8 years
    @martineau: it's not a veiled attempt... when you have a hammer, you find you see that there are nails everywhere. Anyway... it's a valid point for me to raise… and it has been asked several times on SO.
  • martineau
    martineau about 8 years
    @Mike: Sorry, I don't think this question is the right kind of nail. FWIW, I think an effective way to promote dill would be to more clearly state what it can do that pickle can't on its download page, rather than proposing its use to solve issues unrelated to the problem at hand in various SO posts. If there's a consensus that it adequately addresses serious deficiencies folks are commonly encountering while trying to use pickle, perhaps it should be made part of the standard library.
  • Mike McKerns
    Mike McKerns about 8 years
    @martineau: I disagree. Look at the question. It's a pretty strong assumption to assume that the object the OP is asking about is a class. If it's not a class, then pickle will fail you in most cases. My answer is a more general one, thus the right type of nail. I was picking at a scab a bit in the comments of your answer, but it's still in my opinion a valid point. You have every right to disagree. …and I'm not really trying to promote dill, I'm just answering the question with all the tools I have in my toolbox.
  • Varlor
    Varlor almost 7 years
    What if I create my objects in a for loop (because of different parameter combinations)? They have to be named differently so I can access them later, don't they? And if so, how could I create different names for my objects in a for loop?
  • martineau
    martineau almost 7 years
    @Varlor: You could create different names by defining a counter that is initialized just before entering the for loop, and then inside it to generate each filename. i.e. filename = 'myname{}.pkl'.format(count), followed by count += 1. However it would probably be better to put all the objects in a list and pickle.dump() it, which will save all of them at once in a single file and preserve the order they appear in the list. You can also put them in a dictionary and dump() that.
  • martineau
    martineau almost 6 years
    How would one use anycache to save more than one instance of, say, a class or container such as a list (that wasn't the result of calling a function)?
  • Sandro
    Sandro almost 5 years
    @martineau: You promote the idea to use pickle.HIGHEST_PROTOCOL. But isn't that a bit risky in the longer term? If you store your object in format version x from Python 3.6 but way later switch to a newer version of Python where pickle.HIGHEST_PROTOCOL refers to format version x+1 then you might not be able anymore to load your data, right? This is why I think it would be better to consciously choose the currently highest version. Then you can update Python and still migrate the format later.
  • Sandro
    Sandro almost 5 years
    Or is the version used to store the object(s) part of the data so the reading process knows the version and can always load the data with the correct version?
  • martineau
    martineau almost 5 years
    @Sandro: Yes, the protocol used is stored in the file. Later versions should be able to read files produced by previous versions. In an environment where a mixture of Python versions is being used, what you say makes sense. I "promoted" the use of HIGHEST_PROTOCOL because it generally is going to produce the smallest (and probably the fastest) results.
  • Sandro
    Sandro almost 5 years
    @martineau Thanks for the answer, that's very good to know! Others might wonder about that aspect as well so you could touch on that in your answer if you want.
  • martineau
    martineau almost 5 years
    @Sandro: Added a little something about it to the Data stream formats section.
  • Farid Alijani
    Farid Alijani over 4 years
    dill gives me MemoryError though! so does cPickle, pickle and hickle.
  • Ma0
    Ma0 over 4 years
    Thanks for the very informative answer! Just one thing; what is the save_object function you are using?
  • martineau
    martineau over 4 years
    @Ev.Kounis: It's defined near the beginning of the answer.
  • alper
    alper over 2 years
    During read operation I am getting following error for dill RecursionError: maximum recursion depth exceeded would it be possible to over come this?
  • Mike McKerns
    Mike McKerns over 2 years
    @alper: you are getting a RecursionError for the above example, or for something else? Best to post a new question and refer to this one if the code is related.
  • alper
    alper over 2 years
    @MikeMcKerns I was getting it for much complex object
  • Mike McKerns
    Mike McKerns over 2 years
    A RecursionError is a known result for some objects. Maybe you've found one of those cases. Post a new SO question, or open a GitHub issue if you want help.