Common use-cases for pickle in Python

Solution 1

Some uses that I have come across:

1) saving a program's state data to disk so that it can carry on where it left off when restarted (persistence)

2) sending Python data over a TCP connection in a multi-core or distributed system (marshalling)

3) storing Python objects in a database

4) converting an arbitrary Python object to a string so that it can be used as a dictionary key (e.g. for caching & memoization).

There are some issues with the last one: two equal objects can be pickled and result in different strings, and even the same object pickled twice can have different representations. This is because the output can depend on details other than the object's value, such as reference counts (which can influence what the pickler memoizes) or the insertion order of a dict.
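To make the caveat about use case 4 concrete, here is a minimal sketch, assuming Python 3.7+ (where dicts preserve insertion order). The names `d1`, `d2` and `cache_key` are made up for illustration and are not from the answer. Two dicts that compare equal pickle to different bytes, so a key taken from the raw pickle would miss the cache. Sorting the items first is one possible workaround for flat dicts with sortable keys, not a general fix.

```python
import pickle

# Two dicts that compare equal but were built in a different order.
d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}

same_value = d1 == d2                                # equal as Python objects
same_pickle = pickle.dumps(d1) == pickle.dumps(d2)   # but the byte strings differ


def cache_key(mapping):
    """Hypothetical helper: pickle a flat dict's items in sorted order."""
    return pickle.dumps(sorted(mapping.items()))


same_key = cache_key(d1) == cache_key(d2)

print(same_value, same_pickle, same_key)
```

For nested or unsortable data this normalisation does not work as written, and an explicit key function is usually the safer choice.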

To emphasise @lunaryorn's comment - you should never unpickle a string from an untrusted source, since a carefully crafted pickle could execute arbitrary code on your system. For example see https://blog.nelhage.com/2011/03/exploiting-pickle/
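One common way to guard against tampering is to sign the pickled bytes with an HMAC and verify the signature before unpickling. The sketch below uses only the standard library. `SECRET_KEY`, `dumps_signed` and `loads_signed` are hypothetical names, and it assumes both ends share the key out of band. It only shows that the data came from someone holding the key; it does not make a pickle from an untrusted party safe.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"   # assumption: shared out of band
DIGEST_SIZE = hashlib.sha256().digest_size   # length of the signature prefix


def dumps_signed(obj):
    """Pickle obj and prepend an HMAC-SHA256 signature of the payload."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob):
    """Verify the signature first; unpickle only if it matches."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = dumps_signed({"user": "alice", "score": 3})
restored = loads_signed(blob)
```

Flipping even a single byte of `blob` makes `loads_signed` raise `ValueError` instead of calling `pickle.loads`.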

Solution 2

Minimal round-trip example:

>>> import pickle
>>> class Anon(object):
...     pass
...
>>> a = Anon()
>>> a.foo = 'bar'
>>> pickled = pickle.dumps(a)
>>> unpickled = pickle.loads(pickled)
>>> unpickled.foo
'bar'

Edit: but as for the question of real-world examples of pickling, perhaps the most advanced use of pickling (you'd have to dig quite deep into the source) is ZODB: http://svn.zope.org/

Otherwise, PyPI mentions several: http://pypi.python.org/pypi?:action=search&term=pickle&submit=search

I have personally seen several examples of pickled objects being sent over the network as an easy-to-use network transfer protocol.
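As a rough illustration of that pattern, the sketch below sends one pickled object over a socket with a 4-byte length prefix. The helper names (`send_obj`, `recv_exact`, `recv_obj`) and the framing are assumptions made for the example, not a standard protocol. It uses `socket.socketpair()` so it runs in a single process. It is only appropriate between peers that trust each other, because the receiver unpickles whatever arrives.

```python
import pickle
import socket
import struct


def send_obj(sock, obj):
    """Send one object as: 4-byte big-endian length, then the pickle bytes."""
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_obj(sock):
    """Receive one length-prefixed pickle and rebuild the object."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, length))


# Two connected sockets in one process stand in for a real TCP connection.
left, right = socket.socketpair()
send_obj(left, {"job": 7, "args": [1, 2, 3]})
received = recv_obj(right)
left.close()
right.close()
```

The standard library's `multiprocessing.connection` module provides a ready-made version of this idea, with optional authentication.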

Solution 3

I have used it in one of my projects. If the app was terminated in the middle of its work (a lengthy task that processed lots of data), I needed to save the whole data structure and reload it when the app was run again. I used cPickle for this, as speed was crucial and the data was really big.
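A checkpoint like this is only useful if an interrupted save cannot corrupt it. One possible approach, sketched below, is to write to a temporary file and then rename it over the old checkpoint. The function names, the file name and the shape of the state are all hypothetical. The sketch uses Python 3's `pickle`, which picks up the C implementation automatically.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write the pickle to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_path, path)  # the old file stays intact until this step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)     # do not leave a partial temp file behind
        raise


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "state.pkl")
    first_run = load_checkpoint(path, default={"done": 0})
    save_checkpoint({"done": 1500, "pending": [7, 8, 9]}, path)
    resumed = load_checkpoint(path, default={"done": 0})
```

Writing to a temporary file first means that a crash in the middle of a save leaves the previous checkpoint untouched.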

Solution 4

Pickling is absolutely necessary for distributed and parallel computing.

Say you wanted to do a parallel map-reduce with multiprocessing (or across cluster nodes with pyina), then you need to make sure the function you want to have mapped across the parallel resources will pickle. If it doesn't pickle, you can't send it to the other resources on another process, computer, etc. Also see here for a good example.
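A quick way to see this requirement is to try pickling the callable before handing it to a pool. The standard `pickle` module stores functions by reference (module plus qualified name), so a lambda has nothing it can be looked up by and fails. `can_pickle` is a hypothetical helper written for this sketch. It uses only the standard library and no actual worker processes.

```python
import math
import pickle


def can_pickle(obj):
    """Hypothetical helper: report whether the stdlib pickler accepts obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# An importable function is pickled by reference to its module and name.
builtin_ok = can_pickle(math.sqrt)

# A lambda has no importable name, so the stdlib pickler rejects it.
lambda_ok = can_pickle(lambda x: x * x)

print(builtin_ok, lambda_ok)
```

A check like this can catch a function that will not pickle before it reaches `multiprocessing` and fails there.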

To do this, I use dill, which can serialize almost anything in python. Dill also has some good tools for helping you understand what is causing your pickling to fail when your code fails.

And, yes, people use pickling to save the state of a calculation, or your ipython session, or whatever.

Solution 5

Pickle is like "Save As.." and "Open.." for your data structures and classes. Let's say I want to save my data structures so that they persist between program runs.

Saving:

import pickle
# Serialize myStuff to disk; pickle files must be opened in binary mode.
with open("save.p", "wb") as f:
    pickle.dump(myStuff, f)

Loading:

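import pickle
from collections import defaultdict

try:
    with open("save.p", "rb") as f:
        myStuff = pickle.load(f)
except (OSError, EOFError, pickle.UnpicklingError):
    # No usable save file yet, so start from an empty structure.
    myStuff = defaultdict(dict)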

Now I don't have to build myStuff from scratch all over again, and I can just pick(le) up from where I left off.

Author by

satoru

Curious programmer.

Updated on July 14, 2022

Comments

  • satoru
    satoru almost 2 years

    I've looked at the pickle documentation, but I don't understand where pickle is useful.

    What are some common use-cases for pickle?

  • lunaryorn
    lunaryorn almost 14 years
    One should not transfer pickled objects over network or other untrusted channels, unless the pickled data is carefully secured against manipulation. The pickle documentation explicitly warns to never unpickle data from untrusted or unauthenticated sources.
  • Dave Kirby
    Dave Kirby almost 14 years
    @lunaryorn: good point. If you are going to transfer pickled data between machines then use a secure channel such as SSL or SSH tunnelling.
  • L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳
    L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ over 13 years
    Then you are still trusting the endpoint not to exploit you, which may or may not be okay, depending on context.
  • Mike McKerns
    Mike McKerns about 9 years
    your "answer" is not an answer, it's more of a comment. The OP's question is "What are some common use-cases for pickle?". Do you feel you have answered that question in any way?
  • Bad
    Bad about 9 years
    Well, I feel that I have answered the question, because I also had difficulty understanding the common uses of pickle when I tried to read about this module here, here and here. Mostly they begin by explaining what pickle does, assuming that you know the motivation behind the whole concept of serialization. After I read a simple wiki article on serialization I grasped the general idea as well as the "common cases". Maybe it'll help somebody...
  • Mike McKerns
    Mike McKerns about 9 years
    and some of those common cases are…? If there are some that are not listed here in other answers… adding them to your answer would be very appropriate.
  • Pardeep Sharma
    Pardeep Sharma about 6 years
    @lunaryorn - good point, but in that case how can we encrypt data in the public domain? Do we have to use some other Python library, or not use pickle at all?
  • salotz
    salotz over 5 years
    Is point 4) true? I found this which has some (old) evidence that wouldn't work here.
  • Dave Kirby
    Dave Kirby over 5 years
    @salotz see the following paragraph, where I note that pickling the same data structure twice may result in different strings. Whether this is an issue or not depends on the context. If you are using it for a cache key to improve performance then the occasional cache miss may not be significant. YMMV.
  • salotz
    salotz over 5 years
    Are you talking about this? I am not looking to have the cache you describe I am looking to hash them as identifiers, and I am trying to figure out if and how that can be done. I'm guessing if there are no references then it should be okay. I just don't see this clearly documented anywhere, how did you figure this out?
  • DeusXMachina
    DeusXMachina over 2 years
    Using pickle to get an identifier as in 4) is almost always the wrong pattern. The folks who do this legitimately already know exactly what they are doing with pickle and why. As Dave Kirby mentions, this is fine when you can accept a cache miss and your pickling time is fast enough that it's worth the tradeoff. Most of the time (e.g. for novice to intermediate Python devs) you want to implement __hash__() instead, which allows the object to be used as a key in a dict or set.
  • DeusXMachina
    DeusXMachina over 2 years
    3) is not a great example, IMHO. One should think long and hard about why they would do this, and whether they can't just convert it to a database row, or json-encoded string. It can be a complete pain to deal with pickles stored long-term in a database. Basically willingness to use pickle should be inverse to the expected lifetime of the data.
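The `__hash__()` suggestion from the comments above can be sketched as follows. `Point` is a made-up example class. The idea is to define equality and hashing from the same tuple of fields, so that equal objects always land on the same dictionary key, with no pickling involved.

```python
class Point:
    """Hypothetical value class that can be used as a dict or set key."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def _key(self):
        # Equality and hashing are both derived from this one tuple.
        return (self.x, self.y)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


cache = {Point(1, 2): "expensive result"}
hit = cache.get(Point(1, 2))    # an equal object finds the cached entry
miss = cache.get(Point(2, 1))   # a different value does not
```

This only behaves well if the fields used in `_key()` are not changed while the object is stored as a key.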