Jupyter (IPython) notebook: Convert an HTML notebook to ipynb
I recently used BeautifulSoup and JSON to convert an HTML notebook to ipynb. The trick is to look at the JSON schema of a notebook and emulate it. The code selects only input code cells and Markdown cells.
Here is my code:
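To make the schema concrete, here is a minimal sketch of the structure being emulated: an nbformat 4 notebook with one Markdown cell and one code cell. The cell contents are made-up placeholders, and only the standard `json` module is used.

```python
import json

# Hypothetical minimal notebook following the nbformat 4 layout.
# The cell sources below are placeholders, not taken from a real notebook.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 1,
    "metadata": {},
    "cells": [
        {
            # Markdown cells need only cell_type, metadata and source
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A title"],
        },
        {
            # Code cells additionally carry execution_count and outputs
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

# Serialize to the text that would be written to an .ipynb file
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Writing `notebook_json` to a file ending in `.ipynb` should give something Jupyter can open, assuming the installed version accepts nbformat 4.1.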
from bs4 import BeautifulSoup
import json
import urllib.request
url = 'http://nbviewer.jupyter.org/url/jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb'
response = urllib.request.urlopen(url)
# for local html file
# response = open("/Users/note/jupyter/notebook.html")
text = response.read()
soup = BeautifulSoup(text, 'lxml')
# see some of the html
print(soup.div)
# skeleton of an nbformat 4 notebook; cells are appended below
dictionary = {'nbformat': 4, 'nbformat_minor': 1, 'cells': [], 'metadata': {}}
for d in soup.find_all("div"):
    if 'class' in d.attrs.keys():
        for clas in d.attrs["class"]:
            if clas in ["text_cell_render", "input_area"]:
                if clas == "input_area":
                    # code cell
                    cell = {}
                    cell['metadata'] = {}
                    cell['outputs'] = []
                    cell['source'] = [d.get_text()]
                    cell['execution_count'] = None
                    cell['cell_type'] = 'code'
                    dictionary['cells'].append(cell)
                else:
                    # markdown cell (the rendered HTML is kept as the source)
                    cell = {}
                    cell['metadata'] = {}
                    cell['source'] = [d.decode_contents()]
                    cell['cell_type'] = 'markdown'
                    dictionary['cells'].append(cell)
open('notebook.ipynb', 'w').write(json.dumps(dictionary))
Here is part of the output of print(soup.div):
<div class="container">
<div class="navbar-header">
<button class="navbar-toggle collapsed" data-target=".navbar-collapse" data-toggle="collapse" type="button">
<span class="sr-only">Toggle navigation</span>
<i class="fa fa-bars"></i>
</button>
<a class="navbar-brand" href="/">
<img src="/static/img/nav_logo.svg?v=479cefe8d932fb14a67b93911b97d70f" width="159"/>
</a>
</div>
<div class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li>
<a class="active" href="http://jupyter.org">JUPYTER</a>
</li>
<li>
<a href="/faq" title="FAQ">
<span>FAQ</span>
[Screenshot: the resulting ipynb file, loaded in my local Jupyter after running all the cells]
Author: foglerit
Updated on February 11, 2022
Comments
-
foglerit about 2 years ago:
I have converted a Jupyter/IPython notebook to HTML format and subsequently lost the original ipynb file.
Is there a simple way to generate the original notebook file from the converted HTML file?
-
foglerit over 6 years ago: That's great. Thanks for sharing.
-
mdev almost 5 years ago: Works like a charm! I just had to install lxml (pip install lxml) and the ipynb was created!
-
drpawelo almost 4 years ago: ❤️ Extra basic how-to steps:
1. Create a new file intonotebook.py and open it in a code editor (not in Word).
2. Copy-paste the first block of code from this answer.
3. Change line 4 to the address of your file on the web. If the file is on your computer, put # in front of lines 4 and 5 and remove the # before line 7, then change line 7 to where your HTML file is (# means a 'comment'). Make sure there are no spaces at the beginning of the lines you edited. Save the file.
4. Open a terminal, go to the folder where you created the file, and type python intonotebook.py.
5. To change the name of the output file, change the last line.
-
THN over 2 years ago: Is it possible to keep the cell's output in the converted .ipynb file?
-
sgDysregulation over 2 years ago: Removing the line cell['outputs'] = [] should allow for the output to be kept.
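For reference, in the nbformat 4 schema a code cell carries its results as entries in the `outputs` list, so keeping output would likely mean filling that list rather than dropping the key. The sketch below shows one way such a cell could be built. It assumes the printed text has already been extracted from the HTML (that step is not shown and is not part of the answer's script), and `make_code_cell` is a hypothetical helper name.

```python
import json

def make_code_cell(source, stdout_text=None):
    """Build an nbformat 4 code cell, optionally with one stdout stream output.

    Hypothetical helper: stdout_text is assumed to have been pulled
    out of the HTML separately.
    """
    outputs = []
    if stdout_text:
        outputs.append({
            "output_type": "stream",
            "name": "stdout",
            # nbformat stores multi-line text as a list of lines
            "text": stdout_text.splitlines(keepends=True),
        })
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": outputs,
        "source": [source],
    }

cell = make_code_cell("print('hi')", stdout_text="hi\n")
print(json.dumps(cell, indent=1))
```

Only plain text written to stdout is covered here. Images, tables and error tracebacks use other output types and would need their own handling.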