Jupyter (IPython) notebook: Convert an HTML notebook to ipynb


I recently used BeautifulSoup and JSON to convert an HTML notebook to ipynb. The trick is to look at the JSON schema of a notebook and emulate it. The code selects only input code cells and markdown cells, so outputs are not carried over.
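
For reference, here is a minimal sketch of what that schema asks for in nbformat 4. It assumes nbformat_minor 1, as in the code below, which avoids the per-cell id field that minor version 5 made mandatory. The top level needs nbformat, nbformat_minor, metadata and cells. A markdown cell needs cell_type, metadata and source, and a code cell additionally needs outputs and execution_count. The helper names are illustrative only, not part of any library.

```python
import json

def markdown_cell(text):
    # Markdown cells: cell_type, metadata and source are required
    return {"cell_type": "markdown", "metadata": {}, "source": [text]}

def code_cell(code):
    # Code cells additionally need outputs and execution_count
    return {
        "cell_type": "code",
        "metadata": {},
        "source": [code],
        "outputs": [],
        "execution_count": None,
    }

def skeleton(cells):
    # nbformat_minor 1 avoids the per-cell "id" field required from 4.5 on
    return {"nbformat": 4, "nbformat_minor": 1, "metadata": {}, "cells": cells}

nb = skeleton([markdown_cell("# Title"), code_cell("print('hello')")])
print(json.dumps(nb, indent=1))
```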

Here is my code:

from bs4 import BeautifulSoup  # pip install beautifulsoup4 lxml
import json
import urllib.request
url = 'http://nbviewer.jupyter.org/url/jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb'
response = urllib.request.urlopen(url)
# for a local html file, use this line instead:
# response = open("/Users/note/jupyter/notebook.html", "rb")
text = response.read()

soup = BeautifulSoup(text, 'lxml')
# see some of the html
print(soup.div)
dictionary = {'nbformat': 4, 'nbformat_minor': 1, 'cells': [], 'metadata': {}}
for d in soup.find_all("div"):
    classes = d.get("class", [])
    if "input_area" in classes:
        # code cell: keep only the plain-text source; outputs are dropped
        cell = {}
        cell['metadata'] = {}
        cell['outputs'] = []
        cell['source'] = [d.get_text()]
        cell['execution_count'] = None
        cell['cell_type'] = 'code'
        dictionary['cells'].append(cell)
    elif "text_cell_render" in classes:
        # markdown cell: the rendered HTML is stored as-is,
        # which Jupyter can display inside a markdown cell
        cell = {}
        cell['metadata'] = {}
        cell['source'] = [d.decode_contents()]
        cell['cell_type'] = 'markdown'
        dictionary['cells'].append(cell)
open('notebook.ipynb', 'w').write(json.dumps(dictionary))
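
Before opening the result in Jupyter, it can help to re-load it and check that every cell carries the keys the nbformat 4 schema requires. The thorough way is nbformat.validate from the nbformat package. The functions below are only a lightweight standard-library stand-in, and the default file name assumes the notebook.ipynb written by the script.

```python
import json

# Required keys per cell type in nbformat 4 (minor versions below 5)
REQUIRED = {
    "markdown": {"cell_type", "metadata", "source"},
    "code": {"cell_type", "metadata", "source", "outputs", "execution_count"},
}

def check_notebook(nb):
    """Return a list of problems; an empty list means the basic shape is fine."""
    problems = []
    for key in ("nbformat", "nbformat_minor", "metadata", "cells"):
        if key not in nb:
            problems.append(f"missing top-level key {key!r}")
    for i, cell in enumerate(nb.get("cells", [])):
        kind = cell.get("cell_type")
        if kind not in REQUIRED:
            problems.append(f"cell {i}: unexpected cell_type {kind!r}")
            continue
        missing = REQUIRED[kind] - set(cell)
        if missing:
            problems.append(f"cell {i}: missing {sorted(missing)}")
    return problems

def load_and_check(path="notebook.ipynb"):
    # Re-read the file written by the script and run the checks on it
    with open(path, encoding="utf-8") as f:
        return check_notebook(json.load(f))
```

Calling print(load_and_check()) after the conversion prints an empty list when every cell has the required keys.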

Here is part of the print(soup.div) output (the first div on the page is nbviewer's navigation bar, not a notebook cell):

<div class="container">
<div class="navbar-header">
<button class="navbar-toggle collapsed" data-target=".navbar-collapse" data-toggle="collapse" type="button">
<span class="sr-only">Toggle navigation</span>
<i class="fa fa-bars"></i>
</button>
<a class="navbar-brand" href="/">
<img src="/static/img/nav_logo.svg?v=479cefe8d932fb14a67b93911b97d70f" width="159"/>
</a>
</div>
<div class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li>
<a class="active" href="http://jupyter.org">JUPYTER</a>
</li>
<li>
<a href="/faq" title="FAQ">
<span>FAQ</span>

[Screenshot: the resulting ipynb file, loaded in my local Jupyter after running all the cells]

Author: foglerit

Updated on February 11, 2022

Comments

  • foglerit about 2 years

    I have converted a Jupyter/IPython notebook to HTML format and subsequently lost the original ipynb file.

    Is there a simple way to generate the original notebook file from the converted HTML file?

  • foglerit over 6 years
    That's great. Thanks for sharing.
  • mdev almost 5 years
    Works like a charm! I just had to install lxml (pip install lxml) and the ipynb was created!
  • drpawelo almost 4 years
    ❤️ Extra-basic how-to steps: 1. Create a new file intonotebook.py and open it in a code editor (not in Word). 2. Copy-paste the first block of code from this answer. 3. Change line 4 to the URL of your file on the web. If the file is on your computer instead, put # in front of lines 4 and 5 and remove the # before line 7, then change line 7 to the path of your HTML file (# marks a comment). Make sure there are no spaces at the beginning of the lines you edited, and save the file. 4. Open a terminal, go to the folder where you created the file, and type python intonotebook.py. 5. To change the name of the output file, change the last line.
  • THN over 2 years
    Is it possible to keep the cell's output in the converted .ipynb file?
  • sgDysregulation over 2 years
    removing the line cell['outputs'] = [] should allow for the output to be kept
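
Two of the comments concern setup: lxml has to be installed, and switching between a URL and a local file means editing lines 4 to 7. Here is a small sketch that addresses both. load_html and pick_parser are hypothetical helper names, not part of BeautifulSoup. The only BeautifulSoup fact relied on is that it accepts "html.parser" (Python's built-in parser) as an alternative to "lxml".

```python
import importlib.util
import urllib.request
from pathlib import Path

def pick_parser():
    # Use lxml when it is installed; otherwise fall back to Python's built-in
    # parser, which BeautifulSoup accepts under the name "html.parser"
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"

def load_html(source):
    """Return raw bytes from a URL (http/https) or a local file path."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            return response.read()
    # Bytes let BeautifulSoup detect the page encoding itself
    return Path(source).read_bytes()
```

In the script, the url, response and text = response.read() lines (4 to 8) could then become text = load_html(sys.argv[1]) (with import sys at the top), and the parse line becomes soup = BeautifulSoup(text, pick_parser()). The script is then run as python intonotebook.py notebook.html, or with a URL instead of a file name. The two parsers can build slightly different trees from malformed HTML, so results may differ a little between them.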
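
On keeping outputs: deleting cell['outputs'] = [] by itself does not bring outputs back. The nbformat 4 schema requires an outputs list on every code cell, and the script only reads the input_area divs, so the output HTML is never collected. Outputs would have to be rebuilt from the page. The sketch below assumes each output's rendered HTML has already been extracted. In nbconvert's classic HTML template, outputs sit in divs with the class output_area, but check your own file. The sketch then wraps that HTML as display_data outputs. The helper names are hypothetical.

```python
import json

def html_output(html):
    # display_data outputs require output_type, data and metadata in nbformat 4
    return {"output_type": "display_data", "data": {"text/html": [html]}, "metadata": {}}

def code_cell_with_outputs(code, output_htmls):
    # output_htmls: list of HTML strings already pulled out of the page
    # (assumption: taken from the output divs that follow each input_area)
    return {
        "cell_type": "code",
        "metadata": {},
        "source": [code],
        "outputs": [html_output(h) for h in output_htmls],
        "execution_count": None,
    }

cell = code_cell_with_outputs("1 + 1", ["<pre>2</pre>"])
print(json.dumps(cell, indent=1))
```

display_data is used rather than execute_result because the latter also requires an execution_count. The restored outputs are static HTML; re-running the cell replaces them with fresh ones.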