Jupyter notebooks are excellent for interactive programming and combining documentation with code. However, sharing Jupyter notebooks can be challenging. For instance, some people may not have a Python environment installed, making it impossible for them to open and read the notebook. An alternative is to export notebooks to PDF documents, as they are universally accessible. However, this process becomes complicated if you have more than a dozen notebooks to share. Most existing tools focus on creating websites, and there are no dedicated tools for creating PDF books from multiple Jupyter notebooks. Upon closer examination of nbconvert, I found that creating a simple PDF from notebooks doesn't require a specialized tool.
When I refer to a simple PDF, I mean one that doesn't require advanced formatting, referencing, or citations. In such cases, you can effectively compile LaTeX content generated by nbconvert for each notebook. All you need are:
- chap(\d+).tex: LaTeX content, i.e., everything between \begin{document} and \end{document}, generated by nbconvert.
- preamble.tex: LaTeX preamble generated by nbconvert.
- main.tex: A file to manually combine everything.
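To make the split between preamble and content concrete, here is a minimal sketch that cuts a complete LaTeX document at \begin{document} and \end{document}. It is only an illustration of the two pieces; the templates described in the next section produce them directly, and the helper name split_latex_document is hypothetical.

```python
from __future__ import annotations

BEGIN = r"\begin{document}"
END = r"\end{document}"


def split_latex_document(tex: str) -> tuple[str, str]:
    # Hypothetical helper: returns (preamble, content) of a full LaTeX document.
    # Here the preamble is everything before \begin{document}, including \documentclass.
    head, sep, rest = tex.partition(BEGIN)
    if not sep:
        raise ValueError(r"no \begin{document} found")
    content, sep, _ = rest.rpartition(END)
    if not sep:
        raise ValueError(r"no \end{document} found")
    return head.strip(), content.strip()


sample = "\n".join([
    r"\documentclass{article}",
    r"\usepackage{graphicx}",
    r"\begin{document}",
    r"Hello, notebook.",
    r"\end{document}",
])
preamble, content = split_latex_document(sample)
print(preamble)
print(content)
```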
Writing Templates to Extract Contents and Preamble
To extract LaTeX content and preamble, you need two customized templates (content and header) for nbconvert. A template is a directory containing conf.json and index.tex.j2:
templates/
├── content
│   ├── conf.json
│   └── index.tex.j2
└── header
    ├── conf.json
    └── index.tex.j2
The conf.json file in both templates contains metadata for the template:
{
  "base_template": "latex",
  "mimetypes": {
    "text/latex": true,
    "text/tex": true,
    "application/pdf": true
  }
}
Here, we specify that the base_template is latex, the default template for LaTeX and PDF files in nbconvert.
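If you prefer to create the directory layout programmatically, a sketch along these lines could scaffold both templates with the conf.json shown above. The function name scaffold_templates is an assumption for illustration, and the index.tex.j2 files are created as empty placeholders that you still need to fill with the template bodies.

```python
import json
import tempfile
from pathlib import Path

# Same metadata as the conf.json shown above.
CONF = {
    "base_template": "latex",
    "mimetypes": {"text/latex": True, "text/tex": True, "application/pdf": True},
}


def scaffold_templates(root: Path, names=("content", "header")) -> list:
    # Hypothetical helper: create <root>/<name>/conf.json and an empty index.tex.j2.
    created = []
    for name in names:
        template_dir = root / name
        template_dir.mkdir(parents=True, exist_ok=True)
        (template_dir / "conf.json").write_text(
            json.dumps(CONF, indent=2) + "\n", encoding="utf8"
        )
        # Placeholder only; the real template body goes here.
        (template_dir / "index.tex.j2").touch(exist_ok=True)
        created.append(template_dir)
    return created


# Demo in a temporary directory so nothing is written to the project.
with tempfile.TemporaryDirectory() as tmp:
    scaffold_templates(Path(tmp) / "templates")
    layout = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()
    )
    loaded = json.loads(
        (Path(tmp) / "templates" / "content" / "conf.json").read_text(encoding="utf8")
    )
print(layout)
```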
To extract LaTeX content, place the following template into templates/content/index.tex.j2:
((*- extends 'latex/document_contents.tex.j2' -*))
% Render code cells
((* block input scoped *))
\begin{Verbatim}[commandchars=\\\{\}, fontsize=\small]
((( cell.source | highlight_code(strip_verbatim=True, metadata=cell.metadata) )))
\end{Verbatim}
((* endblock input *))
% Render markdown without citation and auto identifiers
((* block markdowncell scoped *))
((( cell.source | convert_pandoc('markdown+tex_math_double_backslash-auto_identifiers', 'latex') )))
((* endblock markdowncell *))
In this template, extends 'latex/document_contents.tex.j2' means this template inherits the document_contents.tex.j2 file in the base template (latex), which renders the body of the LaTeX file. The document_contents.tex.j2 file does not contain the template for code cell rendering, as this logic is handled by:
- style_jupyter.tex.j2: Jupyter style code rendering, similar to the Jupyter web UI.
- style_python.tex.j2: Python style code rendering, without In [] indicators or gray boxes.
- style_ipython.tex.j2: IPython style code rendering, without gray boxes but with In [] indicators.
I copied the code from style_python.tex.j2, which I find to be the simplest and neatest. I also customized the markdown rendering by removing unnecessary logic and disabling the auto_identifiers feature. With auto_identifiers enabled, a label is generated for every section heading, so notebooks that share a heading produce duplicated labels once their content is combined into one document, which can cause issues.
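To see whether duplicated labels are actually a problem in your own chapters, a small check like the following can help. It is a sketch that assumes the labels appear as plain \label{...} commands in the generated .tex text; the function name find_duplicate_labels and the sample chapter strings are made up for illustration.

```python
import re
from collections import defaultdict

LABEL_RE = re.compile(r"\\label\{([^}]*)\}")


def find_duplicate_labels(chapters: dict) -> dict:
    # Hypothetical helper: map each label used more than once to the files using it.
    # `chapters` maps a file name to its LaTeX text.
    seen = defaultdict(list)
    for name, tex in chapters.items():
        for label in LABEL_RE.findall(tex):
            seen[label].append(name)
    return {label: names for label, names in seen.items() if len(names) > 1}


# Illustrative input: two chapters that both start with an "Introduction" section.
chapters = {
    "chap1.tex": r"\section{Introduction}\label{introduction}",
    "chap2.tex": r"\section{Introduction}\label{introduction} \section{Results}\label{results}",
}
duplicates = find_duplicate_labels(chapters)
print(duplicates)
```

An empty result for the files produced by the content template would suggest that disabling auto_identifiers had the intended effect.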
Tip for using Jupyter style: You don't need to copy the macro draw_cell; just import it like this:
((*- from 'latex/style_jupyter.tex.j2' import draw_cell with context -*))
To extract the LaTeX preamble, place the following template into templates/header/index.tex.j2:
((*- extends 'latex/style_python.tex.j2' -*))
((*- block docclass -*))
((*- endblock docclass -*))
((*- block body -*))
((* endblock body *))
This code is straightforward: we override the docclass and body blocks with empty ones, leaving only the LaTeX preamble. Again, I'm using the Python code rendering style here; if you chose a different style for the content template, extend the matching style file instead.
Writing Code for Conversion
With the templates ready, the next step is to create a build script that uses nbconvert to perform the conversion. Here is a simple example:
import shutil
import subprocess
from pathlib import Path

from nbconvert import LatexExporter
from traitlets.config import Config

SOURCE_DIR = Path(__file__).parent / "notebooks"
TARGET_DIR = Path(__file__).parent / "tex"
TEMPLATE_DIR = Path(__file__).parent / "templates"


def sync_tex(ipynb_file: Path, tex_exporter: LatexExporter):
    """Convert one notebook to a .tex file, mirroring the source directory layout."""
    tex_dir = TARGET_DIR / ipynb_file.parent.relative_to(SOURCE_DIR)
    tex_target = tex_dir / (ipynb_file.stem + ".tex")
    tex_dir.mkdir(parents=True, exist_ok=True)
    body, resources = tex_exporter.from_filename(ipynb_file)
    # Extracted outputs (e.g. images) are written to TARGET_DIR, where latexmk runs.
    for resource_file, resource_content in resources["outputs"].items():
        with (TARGET_DIR / resource_file).open("wb") as f:
            f.write(resource_content)
    with tex_target.open("w", encoding="utf8") as f:
        f.write(body)


if __name__ == "__main__":
    TARGET_DIR.mkdir(parents=True, exist_ok=True)
    c = Config()
    c.TemplateExporter.extra_template_basedirs = [str(TEMPLATE_DIR)]

    # Pass 1: export the body of every notebook with the "content" template.
    c.TemplateExporter.template_name = "content"
    tex_exporter = LatexExporter(config=c)
    notebooks = sorted(
        p for p in SOURCE_DIR.rglob("*.ipynb")
        if p.parent.name != ".ipynb_checkpoints"
    )
    for ipynb_file in notebooks:
        print(f"Processing {ipynb_file.relative_to(SOURCE_DIR)}")
        sync_tex(ipynb_file, tex_exporter)

    # Pass 2: export the preamble once with the "header" template.
    # Any notebook works as input; the last one is used here.
    if notebooks:
        c.TemplateExporter.template_name = "header"
        header_exporter = LatexExporter(config=c)
        body, _ = header_exporter.from_filename(notebooks[-1])
        with (TARGET_DIR / "preamble.tex").open("w", encoding="utf8") as f:
            f.write(body)

    # Compile main.tex if latexmk is available.
    if shutil.which("latexmk"):
        subprocess.run(["latexmk", "main.tex"], cwd=TARGET_DIR)
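The above script writes every extracted output file (the resources returned by the body, resources = tex_exporter.from_filename(ipynb_file) line) into TARGET_DIR, so it does not handle image resources with the same name: images from different notebooks overwrite one another. You may need to modify it if your notebooks output many images.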
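One possible way to avoid such collisions is to prefix each output file name with something unique to the notebook and rewrite the references in the LaTeX body accordingly. The sketch below is an assumption about how this could be done, not part of the original script: namespace_outputs is a hypothetical helper, and it assumes the output names are flat file names that appear verbatim in the body.

```python
import re


def namespace_outputs(body: str, outputs: dict, prefix: str):
    # Hypothetical helper: rename every output file to "<prefix>_<name>" and
    # update the matching references in the LaTeX body in a single pass.
    if not outputs:
        return body, {}
    renamed = {name: f"{prefix}_{name}" for name in outputs}
    # Longest names first, so a name that is a prefix of another cannot shadow it.
    names = sorted(renamed, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    new_body = pattern.sub(lambda m: renamed[m.group(0)], body)
    new_outputs = {renamed[name]: data for name, data in outputs.items()}
    return new_body, new_outputs


# Illustrative data shaped like the (body, resources["outputs"]) pair in the script.
body = r"\includegraphics{output_3_0.png} and \includegraphics{output_3_1.png}"
outputs = {"output_3_0.png": b"first", "output_3_1.png": b"second"}
new_body, new_outputs = namespace_outputs(body, outputs, prefix="intro")
print(new_body)
print(sorted(new_outputs))
```

In sync_tex, such a helper would be called right after from_filename and before the files are written, for example with the notebook stem as the prefix (stems must then be unique across directories).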
Gluing Everything Together
The final step is to create a main.tex. Here is a simple example:
\documentclass[12pt]{report}
\input{preamble}
% Add any customizations you like...
\begin{document}
\tableofcontents
\chapter{Chapter}
\include{chap1}
\end{document}
You can customize main.tex to include fancy title pages, style the pages, etc. Note that you may want to place your own preamble customizations below the \input{preamble} line, as preamble.tex modifies the page geometry and would otherwise override them.
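With more than a handful of chapters, writing main.tex by hand gets tedious. The following sketch generates a main.tex like the one above from a list of file names matching chap(\d+).tex, sorted by chapter number. The function name render_main and the placeholder chapter titles are assumptions for illustration; adapt the titles and layout to your book.

```python
import re

CHAPTER_RE = re.compile(r"chap(\d+)\.tex")


def render_main(filenames) -> str:
    # Hypothetical helper: build main.tex text from chapter file names.
    chapters = []
    for name in filenames:
        match = CHAPTER_RE.fullmatch(name)
        if match:
            # Keep the number for sorting and drop the ".tex" extension.
            chapters.append((int(match.group(1)), name[: -len(".tex")]))
    chapters.sort()  # numeric order, so chap10 comes after chap2

    lines = [
        r"\documentclass[12pt]{report}",
        r"\input{preamble}",
        r"% Add any customizations you like...",
        r"\begin{document}",
        r"\tableofcontents",
    ]
    for number, stem in chapters:
        lines.append(rf"\chapter{{Chapter {number}}}")  # placeholder title
        lines.append(rf"\include{{{stem}}}")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


main_tex = render_main(["chap10.tex", "chap2.tex", "preamble.tex", "chap1.tex"])
print(main_tex)
```

In the build script, the file names could come from listing TARGET_DIR, and the returned text would be written to main.tex before latexmk runs.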
In conclusion, creating a simple PDF book from multiple Jupyter notebooks is a feasible task that doesn't require advanced tools. By leveraging nbconvert with custom templates, you can efficiently extract and compile LaTeX content into a nice PDF document.