How do I read binary pickle data first, then unpickle it?

python serialization pickle

21,306

Solution 1

pickle.load(file) expects a file-like object, but you are passing it the raw contents of the file. To unpickle data that is already in memory, use:

pickle.loads(string)

Read a pickled object hierarchy from a string. Characters in the string past the pickled object’s representation are ignored.
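As a minimal sketch of the read-first, unpickle-second approach: the snippet below writes a small pickle to a temporary file so that it runs on its own, then reads the bytes and passes them to pickle.loads. The `sample` dictionary and the temporary file are stand-ins for the question's large NetworkX graph, and plain `pickle` is used so the code runs on both Python 2.6+ and Python 3 (on Python 2 you could swap in `cPickle`, as in the question).

```python
import os
import pickle
import tempfile

# Stand-in object (assumption); the question's data is a large NetworkX graph.
sample = {"nodes": [1, 2, 3], "edges": [(1, 2), (2, 3)]}

# Write a small pickle file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".pickle")
with os.fdopen(fd, "wb") as f:
    pickle.dump(sample, f, protocol=2)

# Read the whole file into memory first...
with open(path, "rb") as f:
    bin_data = f.read()

# ...then unpickle directly from the in-memory data.
restored = pickle.loads(bin_data)

os.remove(path)  # clean up the temporary file
```

Whether this is actually faster than pickle.load on the open file depends on the Python version and the data, so it is worth timing both on the real file.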

Solution 2

The documentation mentions StringIO, which I think is one possible solution.

Try:

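import cPickle as pickle
from cStringIO import StringIO

f = open("big_networkx_graph.pickle", "rb")
bin_data = f.read()  # read the whole file into memory
f.close()
sio = StringIO(bin_data)  # wrap the data in a file-like object
graph_data = pickle.load(sio)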
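For a version that is not tied to Python 2, the same idea can be sketched with io.BytesIO, which is available from Python 2.6 onward and in Python 3 (where StringIO no longer accepts binary data). The `sample` object and the temporary file below are assumptions made so the snippet runs on its own; they stand in for the real graph file.

```python
import io
import os
import pickle
import tempfile

# Stand-in object (assumption); replace with the real pickled graph file.
sample = {"nodes": [1, 2, 3], "edges": [(1, 2), (2, 3)]}

# Create a small pickle file to read back.
fd, path = tempfile.mkstemp(suffix=".pickle")
with os.fdopen(fd, "wb") as f:
    pickle.dump(sample, f, protocol=2)

# Read the binary data into memory in one call.
with open(path, "rb") as f:
    bin_data = f.read()

# Wrap the in-memory data in a file-like object so pickle.load accepts it.
buffer = io.BytesIO(bin_data)
restored = pickle.load(buffer)

os.remove(path)  # clean up the temporary file
```

The buffer provides the `read` and `readline` methods that pickle.load requires, which is why this avoids the TypeError raised when the raw data is passed directly.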


Author by

conradlee


Updated on July 09, 2022

Comments

conradlee almost 2 years
I'm unpickling a NetworkX object that's about 1 GB on disk. Although I saved it in the binary format (using protocol 2), unpickling it takes a very long time: at least half an hour. The machine has plenty of memory (128 GB), so that's not the bottleneck.

I've read here that unpickling can be sped up by first reading the entire file into memory and then unpickling it (that thread refers to Python 3.0, which I'm not using, but the point should still hold in Python 2.6).

How do I first read the binary file, and then unpickle it? I have tried:
```
import cPickle as pickle
f = open("big_networkx_graph.pickle","rb")
bin_data = f.read()
graph_data = pickle.load(bin_data)
```
But this returns:
```
TypeError: argument must have 'read' and 'readline' attributes
```
Any ideas?