How to pickle or store Jupyter (IPython) notebook session for later


Solution 1

I think Dill answers your question well.

pip install dill

Save a Notebook session:

import dill
dill.dump_session('notebook_env.db')

Restore a Notebook session:

import dill
dill.load_session('notebook_env.db')
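
Note that dump_session can only handle picklable objects: open database connections, file handles, or generators left in the namespace will make it fail. A minimal, hypothetical sketch of clearing such an object before saving (conn stands for an imaginary pyodbc connection opened earlier in the notebook):

import dill

# conn is a hypothetical open pyodbc connection; unpicklable objects like this
# make dump_session raise a TypeError, so close and drop them before saving.
conn.close()
del conn

dill.dump_session('notebook_env.db')  # everything still in the namespace is saved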

Source

Solution 2

(I'd rather comment than offer this as an actual answer, but I need more reputation to comment.)

You can store most data-like variables in a systematic way. What I usually do is store all dataframes, arrays, etc. in a pandas.HDFStore. At the beginning of the notebook, declare

backup = pd.HDFStore('backup.h5')

and then store any new variables as you produce them

backup['var1'] = var1

At the end, it's probably a good idea to do

backup.close()

before turning off the server. The next time you want to continue with the notebook:

backup = pd.HDFStore('backup.h5')
var1 = backup['var1']

Truth be told, I'd prefer built-in functionality in ipython notebook, too. You can't save everything this way (e.g. arbitrary objects, open connections), and it's hard to keep the notebook organized with so much boilerplate code.
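
Putting those pieces together, a minimal sketch of the whole workflow might look like the following (the file name, variable names, and sample dataframe are placeholders; writing HDF5 files also requires the tables package to be installed):

import numpy as np
import pandas as pd

# Start of the notebook: open (or create) the backup store.
backup = pd.HDFStore('backup.h5')

# ... expensive computations ...
var1 = pd.DataFrame(np.random.randn(1000, 3), columns=['a', 'b', 'c'])
backup['var1'] = var1  # persist each result as it is produced

# Before shutting down the server.
backup.close()

# Next session: reopen the store and read the variables back in.
backup = pd.HDFStore('backup.h5')
var1 = backup['var1']
backup.close()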

Solution 3

This question is related to: How to cache in IPython Notebook?

To save the results of individual cells, the %%cache cell magic comes in handy.

%%cache longcalc.pkl var1 var2 var3
var1 = longcalculation()
....

When rerunning the notebook, the contents of this cell are loaded from the cache.
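
For completeness, if the magic in question is the one provided by the ipycache extension, the setup would look roughly like this (a hedged sketch: the extension must be installed separately, and the function names are placeholders):

# Cell 1 - load the extension (assumes ipycache is installed, e.g. pip install ipycache)
%load_ext ipycache

%%cache longcalc.pkl var1 var2
# Cell 2 - the first run executes this body and pickles var1/var2 into longcalc.pkl;
# rerunning the notebook (even with a fresh kernel) loads them from the file instead.
var1 = long_calculation()   # placeholder for the expensive step
var2 = post_process(var1)   # placeholder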

This is not exactly answering your question, but it might be enough when the results of all the lengthy calculations can be recovered quickly. This, in combination with hitting the run-all button at the top of the notebook, is a workable solution for me.

The cache magic cannot save the state of a whole notebook yet. To my knowledge, there is no other system yet that can resume a "notebook"; this would require saving the entire history of the Python kernel and reloading it after the notebook is reopened and connected to a kernel.


Comments

  • redacted
    redacted almost 4 years

    Let's say I am doing a large data analysis in a Jupyter/IPython notebook with lots of time-consuming computations already done. Then, for some reason, I have to shut down the local Jupyter server, but I would like to return to the analysis later without having to go through all the time-consuming computations again.


    What I would love to do is pickle or store the whole Jupyter session (all pandas dataframes, np.arrays, variables, ...) so I can safely shut down the server knowing I can return to my session in exactly the same state as before.

    Is it even technically possible? Is there built-in functionality I have overlooked?


    EDIT: Based on this answer, there is a %store magic which should be a "lightweight pickle". However, you have to store the variables manually, like so:

    # inside an ipython/nb session
    foo = "A dummy string"
    %store foo
    # close the session, restart the kernel
    %store -r foo  # r for refresh
    print(foo)  # "A dummy string"

    which is fairly close to what I would want, but having to do it manually and being unable to distinguish between different sessions makes it less useful.

  • redacted
    redacted over 8 years
    This is a very interesting workaround, but I can literally feel the pain associated with maintaining such a system. Thanks for the tip tho :)
  • redacted
    redacted almost 6 years
    fails when there are generators (which kind of makes sense when I think about it), but it seems that this is as close as we can hope for!
  • Michael Szczepaniak
    Michael Szczepaniak over 4 years
    Worked great for me. A couple of things to keep in mind: First, if you have pyodbc connection objects hanging around, you'll need to close them and then set them all to None; otherwise, you get a "TypeError: can't pickle pyodbc.Connection objects" error. Second, the notebook state does not include graphs that were generated by your code, so you'll need to rerun the cells to bring those back.
  • Jaya A
    Jaya A about 4 years
    But it doesn't work when I use the saved file on another machine.
  • cheznead
    cheznead over 3 years
    Installed dill. Do I run import dill; dill.dump_session('notebook_env.db') from the command line?
  • BoreBoar
    BoreBoar over 3 years
    No, you'll need to do it while running the Jupyter notebook. Both dump_session and load_session should be called from within the notebook: load_session can be at the start of the notebook, and dump_session at the very end.
  • Orhan Solak
    Orhan Solak over 3 years
    I do not recommend this approach. Once you have dumped the session, you are not able to recover it, even if you uninstall the dill library. Basically, avoid using this library.
  • Meet
    Meet about 3 years
    This is a good workaround. Just to put it out there, this solution will probably require installing the tables module to be able to create the backup file.