Python serialization - Why pickle?

Solution 1

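Pickling is a way to convert a Python object (list, dict, etc.) into a byte stream. The idea is that this byte stream contains all the information necessary to reconstruct the object in another Python script. (For instances of your own classes, the class definition itself is not stored, so it must be importable in the script that loads the object.)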
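
As a minimal in-memory sketch of that round trip, the standard library's pickle.dumps turns an object into a bytes object and pickle.loads rebuilds it from those bytes. The variable names here are only illustrative:

```python
import pickle

# An ordinary Python object: a dict holding a list and a tuple
original = {"name": "example", "values": [1, 2, 3], "pair": (4.5, "x")}

# Serialize the object into a bytes object (the "stream")
data = pickle.dumps(original)
print(type(data))  # <class 'bytes'>

# Rebuild an equal, but separate, object from those bytes
restored = pickle.loads(data)
print(restored == original)  # True
print(restored is original)  # False
```

Note that the tuple comes back as a tuple and the list as a list; the types are preserved, not just the values.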

As for where the pickled information is stored, usually one would do:

import pickle

var = {1: 'a', 2: 'b'}
with open('filename', 'wb') as f:
    pickle.dump(var, f)

That would store the pickled version of our var dict in the 'filename' file. Then, in another script, you could load from this file into a variable and the dictionary would be recreated:

import pickle

with open('filename', 'rb') as f:
    var = pickle.load(f)

Another use for pickling is transmitting this dictionary over a network (for example, through a socket). You first convert it into a byte stream, then you send that over the socket connection.
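
A minimal sketch of sending a pickled dictionary over a socket follows. Pickle itself does not mark where one message ends on a stream connection, so this sketch assumes a simple framing convention of its own (a 4-byte length prefix). The helper names send_obj, recv_exact and recv_obj are hypothetical, and socket.socketpair stands in for a real network connection. Only unpickle data from peers you trust: the pickle documentation warns that loading a malicious pickle can execute arbitrary code.

```python
import pickle
import socket
import struct

def send_obj(sock, obj):
    """Pickle obj and send it with a 4-byte big-endian length prefix (hypothetical framing)."""
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the message was complete")
        buf += chunk
    return buf

def recv_obj(sock):
    """Read one length-prefixed message and unpickle it (trusted peers only)."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, length))

# A connected pair of local sockets stands in for a real network link
a, b = socket.socketpair()
try:
    send_obj(a, {1: "a", 2: "b"})
    received = recv_obj(b)
finally:
    a.close()
    b.close()

print(received)  # {1: 'a', 2: 'b'}
```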

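Also, there is no compression involved: pickling just converts the object from one representation (in memory) to another (a stream of bytes). That stream is plain ASCII "text" only with the old protocol 0; the default protocols in Python 3 are binary.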
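
To make the "text" point concrete, here is a small sketch comparing protocol 0 (the original ASCII-based protocol) with Python 3's default binary protocol, using the same var dict as above. Neither one compresses anything; they are just different encodings of the same object:

```python
import pickle

var = {1: "a", 2: "b"}

# Protocol 0 is the original human-readable (ASCII) protocol
text_like = pickle.dumps(var, protocol=0)
# The default protocol in Python 3 is a binary format
binary = pickle.dumps(var)

print(text_like.decode("ascii"))  # readable, though not meant to be edited by hand
print(binary[:2])                 # b'\x80' followed by the protocol number

# Both encodings load back to the same dict
assert pickle.loads(text_like) == pickle.loads(binary) == var
```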

About.com has a nice introduction to pickling here.

Solution 2

Pickling is absolutely necessary for distributed and parallel computing.

Say you want to do a parallel map-reduce with multiprocessing (or across cluster nodes with pyina); then you need to make sure that the function you want to map across the parallel resources can be pickled. If it can't be pickled, you can't send it to the other resources in another process, on another computer, etc. Also see here for a good example.
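
One rough way to check ahead of time whether something will survive the trip to another process is simply to try pickling it. This is a sketch; is_picklable is a hypothetical helper, not part of pickle or multiprocessing:

```python
import operator
import pickle

def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except Exception:  # e.g. PicklingError, TypeError, AttributeError
        return False
    return True

# Module-level functions are pickled by reference (module + name), so they work
print(is_picklable(operator.add))     # True
# Lambdas have no importable name, so the standard pickle module rejects them
print(is_picklable(lambda x: x * 2))  # False
```

multiprocessing's Pool.map pickles the function it is given in order to send tasks to the worker processes, so a function that fails this check, such as a lambda, will typically fail there too. Lambdas are one of the cases dill is able to handle.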

To do this, I use dill, which can serialize almost anything in Python. Dill also has some good tools to help you understand what is causing pickling to fail when your code fails.

And, yes, people use pickling to save the state of a calculation, your IPython session, or whatever. You can also extend pickle's Pickler and Unpickler to do compression with bz2 or gzip if you'd like.
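
Here is a minimal sketch of saving a calculation's state with compression. Instead of subclassing Pickler and Unpickler as suggested above, it takes the simpler route of handing pickle a file object opened with gzip.open (bz2.open can be swapped in the same way). The function names and the contents of state are hypothetical:

```python
import gzip
import os
import pickle
import tempfile

def save_checkpoint(state, path):
    """Pickle state into a gzip-compressed file (hypothetical helper)."""
    with gzip.open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(path):
    """Load a state object written by save_checkpoint (hypothetical helper)."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical calculation state: an iteration counter plus partial results
state = {"iteration": 1000, "partial_sums": list(range(1000))}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pkl.gz")
    save_checkpoint(state, path)
    restored = load_checkpoint(path)

print(restored == state)  # True
```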

Author by

kiriloff

AI, natural language processing

Updated on March 14, 2020

Comments

  • kiriloff
    kiriloff about 4 years

    I understand that Python pickling is a way to 'store' a Python object in a way that respects object-oriented programming, unlike output written to a txt file or a DB.

    Do you have more details or references on the following points:

    • where are pickled objects 'stored'?
    • why does pickling preserve the object representation better than, say, storing it in a DB?
    • can I retrieve pickled objects from one Python shell session to another?
    • do you have significant examples of when serialization is useful?
    • does serialization with pickle imply data 'compression'?

    In other words, I am looking for documentation on pickling: the Python docs explain how to use pickle but do not seem to go into detail about when and why serialization is necessary.

  • moooeeeep
    moooeeeep over 12 years
    usually one would do with open('filename') as f: ...
  • Tim Pietzcker
    Tim Pietzcker over 12 years
    Also, you would need to do with open(filename, 'wb') as f: ... or you wouldn't be able to write to the file.
  • kiriloff
    kiriloff over 12 years
    Thanks!! This one on Python persistence management is nice, here
  • jfs
    jfs over 12 years
    In general it is not a very good idea to use pickle to transmit a dictionary over a network (json could be better here). Though in rare cases it might be useful e.g., multiprocessing module.
  • jfs
    jfs over 12 years
    @Tim Pietzcker: protocol=0 (default on Python2.x) can be used with files opened in text mode.
  • Tim Pietzcker
    Tim Pietzcker over 12 years
    @J.F.Sebastian: OK, but he opened the file for reading, not for writing.
  • austin1howard
    austin1howard over 12 years
    Geez, this is what happens when you write the code here absentmindedly and don't actually debug. =O
  • FullMetalScientist
    FullMetalScientist over 2 years
    In Python 2, cPickle is faster than pickle, but in Python 3 cPickle has been integrated into pickle, so we can just use pickle, I think
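
Following up on the comment that JSON may be a better choice than pickle for sending a dictionary over a network, here is a small sketch of one practical difference, using the var dict from the first answer. JSON only allows string keys in objects, so the integer keys come back as strings, whereas pickle reproduces the original Python types. The trade-off is that JSON is language-neutral and safe to parse from untrusted sources, while pickle is Python-specific and should only be loaded from trusted ones.

```python
import json
import pickle

var = {1: "a", 2: "b"}

# JSON object keys must be strings, so the int keys are converted
via_json = json.loads(json.dumps(var))
print(via_json)           # {'1': 'a', '2': 'b'}

# pickle reproduces the original Python types, including the int keys
via_pickle = pickle.loads(pickle.dumps(var))
print(via_pickle == var)  # True
```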