Simple PDF Book from Multiple Jupyter Notebooks

If you want to create a simple PDF book from Jupyter notebooks, you probably don't need advanced tools like Quarto or Jupyter Book.

Jupyter notebooks are excellent for interactive programming and for combining documentation with code, but sharing them can be challenging. For instance, some readers may not have a Python environment installed, which makes it hard for them to open and read a notebook. An alternative is to export notebooks to PDF, a format almost everyone can open. However, exporting becomes tedious once you have more than a dozen notebooks to share. Most existing tools focus on building websites, and I could not find a dedicated tool for creating a PDF book from multiple Jupyter notebooks. After a closer look at nbconvert, I found that creating a simple PDF from notebooks doesn't require a specialized tool.

By a simple PDF, I mean one that doesn't require advanced formatting, cross-referencing, or citations. In such cases, you can take the LaTeX that nbconvert generates for each notebook and compile it into a single document. All you need are:

chap(\d+).tex: LaTeX content, i.e., everything between \begin{document} and \end{document} generated by nbconvert.
preamble.tex: LaTeX preamble generated by nbconvert.
main.tex: A file to manually combine everything.
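To make the split concrete, here is a minimal sketch of which part of a full nbconvert LaTeX document ends up in which file. It is only an illustration: `split_latex` is a hypothetical helper, not part of nbconvert, and the post itself extracts the two parts with templates rather than by string slicing. Note that the part before `\begin{document}` still contains the `\documentclass` line here.

```python
def split_latex(source: str) -> tuple[str, str]:
    """Split a complete LaTeX document into (preamble, body).

    The preamble is everything before \\begin{document}; the body is
    everything between \\begin{document} and the last \\end{document}.
    """
    begin, end = r"\begin{document}", r"\end{document}"
    start = source.index(begin)   # raises ValueError if the marker is missing
    stop = source.rindex(end)
    preamble = source[:start]
    body = source[start + len(begin):stop]
    return preamble, body


if __name__ == "__main__":
    demo = (
        "\\documentclass{article}\n"
        "\\usepackage{graphicx}\n"
        "\\begin{document}\n"
        "Hello from a notebook.\n"
        "\\end{document}\n"
    )
    preamble, body = split_latex(demo)
    print("preamble:", preamble.strip())
    print("body:", body.strip())
```

The body corresponds to a chap(\d+).tex file and the preamble (minus the document class) to preamble.tex.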

Writing Templates to Extract Contents and Preamble

To extract LaTeX content and preamble, you need two customized templates (content and header) for nbconvert. A template is a directory containing conf.json and index.tex.j2:

templates/
├── content
│   ├── conf.json
│   └── index.tex.j2
└── header
    ├── conf.json
    └── index.tex.j2

The conf.json file in both templates contains metadata for the template:

{
  "base_template": "latex",
  "mimetypes": {
    "text/latex": true,
    "text/tex": true,
    "application/pdf": true
  }
}

Here, we specify that the base_template is latex, the default template for LaTeX and PDF files in nbconvert.
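If you prefer not to create the directories by hand, the layout can be scaffolded with a few lines of Python. This is a minimal sketch under the assumptions above: `scaffold_templates` is a hypothetical helper name, and the `index.tex.j2` files are created empty so you can paste the template code into them afterwards.

```python
import json
from pathlib import Path

# Same metadata as the conf.json shown above.
CONF = {
    "base_template": "latex",
    "mimetypes": {
        "text/latex": True,
        "text/tex": True,
        "application/pdf": True,
    },
}


def scaffold_templates(root: Path, names=("content", "header")) -> list:
    """Create <root>/<name>/conf.json and an index.tex.j2 for each name."""
    created = []
    for name in names:
        template_dir = root / name
        template_dir.mkdir(parents=True, exist_ok=True)
        conf_path = template_dir / "conf.json"
        conf_path.write_text(json.dumps(CONF, indent=2) + "\n", encoding="utf8")
        # touch() leaves an existing index.tex.j2 untouched.
        (template_dir / "index.tex.j2").touch(exist_ok=True)
        created.append(template_dir)
    return created


if __name__ == "__main__":
    for path in scaffold_templates(Path("templates")):
        print("created", path)
```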

To extract LaTeX content, place the following template into templates/content/index.tex.j2:

((*- extends 'latex/document_contents.tex.j2' -*))

% Render code cells
((* block input scoped *))
    \begin{Verbatim}[commandchars=\\\{\}, fontsize=\small]
((( cell.source | highlight_code(strip_verbatim=True, metadata=cell.metadata) )))
    \end{Verbatim}
((* endblock input *))

% Render markdown without citation and auto identifiers
((* block markdowncell scoped *))
    ((( cell.source | convert_pandoc('markdown+tex_math_double_backslash-auto_identifiers', 'latex') )))
((* endblock markdowncell *))

In this template, extends 'latex/document_contents.tex.j2' means this template inherits the document_contents.tex.j2 file in the base template (latex), which renders the body of the LaTeX file. The document_contents.tex.j2 file does not contain the template for code cell rendering, as this logic is handled by:

style_jupyter.tex.j2: Jupyter style code rendering, similar to the Jupyter web UI.
style_python.tex.j2: Python style code rendering, without In [] indicators or gray boxes.
style_ipython.tex.j2: IPython style code rendering, without gray boxes but with In [] indicators.

I copied the code from style_python.tex.j2, which I find to be the simplest and neatest. I also simplified the markdown rendering: it drops the citation handling and disables pandoc's auto_identifiers extension. With auto_identifiers enabled, identically titled sections in different notebooks get the same auto-generated label, and LaTeX then complains about multiply defined labels once the chapters are combined.
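If you want to check whether duplicated labels actually occur in your generated chapters, a small scan is enough. The sketch below assumes that auto-identified headings show up as `\label{...}` in the LaTeX output, which is what the pandoc versions I have seen produce; `duplicate_labels` is a hypothetical helper, not an nbconvert or pandoc function.

```python
import re
from collections import Counter

# Matches \label{name} and captures the name.
LABEL_RE = re.compile(r"\\label\{([^}]*)\}")


def duplicate_labels(tex_sources: dict) -> dict:
    """Return {label: count} for labels defined more than once.

    tex_sources maps a file name to that file's LaTeX text.
    """
    counts = Counter()
    for text in tex_sources.values():
        counts.update(LABEL_RE.findall(text))
    return {label: n for label, n in counts.items() if n > 1}


if __name__ == "__main__":
    sources = {
        "chap1.tex": r"\section{Introduction}\label{introduction}",
        "chap2.tex": r"\section{Introduction}\label{introduction}",
    }
    print(duplicate_labels(sources))
```

In a real build you would fill `tex_sources` by reading the chapter files from the output directory.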

Tip for using Jupyter style: You don't need to copy the macro draw_cell; just import it like this:

((*- from 'latex/style_jupyter.tex.j2' import draw_cell with context -*))

To extract the LaTeX preamble, place the following template into templates/header/index.tex.j2:

((*- extends 'latex/style_python.tex.j2' -*))

((*- block docclass -*))
((*- endblock docclass -*))

((*- block body -*))
((* endblock body *))

This code is straightforward: we empty the docclass and body blocks, leaving only the LaTeX preamble. Again, I'm using the Python code rendering style here; if your content template uses a different style, extend the matching style file (for example latex/style_jupyter.tex.j2) instead.

Writing Code for Conversion

With the templates ready, the next step is to create a build script that uses nbconvert to perform the conversion. Here is a simple example:

import shutil
import subprocess
import sys
from pathlib import Path

from nbconvert import LatexExporter
from traitlets.config import Config

SOURCE_DIR = Path(__file__).parent / "notebooks"
TARGET_DIR = Path(__file__).parent / "tex"
TEMPLATE_DIR = Path(__file__).parent / "templates"


def sync_tex(ipynb_file: Path, tex_exporter: LatexExporter):
    tex_dir = TARGET_DIR / ipynb_file.parent.relative_to(SOURCE_DIR)
    tex_target = tex_dir / (ipynb_file.stem + ".tex")
    tex_dir.mkdir(parents=True, exist_ok=True)
    body, resources = tex_exporter.from_filename(ipynb_file)
    for resource_file, resource_content in resources["outputs"].items():
        with (TARGET_DIR / resource_file).open("wb") as f:
            f.write(resource_content)
    with tex_target.open("w", encoding="utf8") as f:
        f.write(body)


if __name__ == "__main__":
    TARGET_DIR.mkdir(parents=True, exist_ok=True)

    # Point nbconvert at the custom templates and select the "content" one.
    c = Config()
    c.TemplateExporter.extra_template_basedirs = [str(TEMPLATE_DIR)]
    c.TemplateExporter.template_name = "content"
    tex_exporter = LatexExporter(config=c)

    # Skip Jupyter's checkpoint copies; sort for a reproducible order.
    notebooks = sorted(
        p for p in SOURCE_DIR.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )
    if not notebooks:
        sys.exit(f"No notebooks found in {SOURCE_DIR}")

    for ipynb_file in notebooks:
        print(f"Processing {ipynb_file.relative_to(SOURCE_DIR)}")
        sync_tex(ipynb_file, tex_exporter)

    # The preamble is rendered from a single notebook; the first one is used here.
    c.TemplateExporter.template_name = "header"
    header_exporter = LatexExporter(config=c)
    body, _ = header_exporter.from_filename(notebooks[0])
    with (TARGET_DIR / "preamble.tex").open("w", encoding="utf8") as f:
        f.write(body)

    # Compile only if latexmk is installed; main.tex must already exist in TARGET_DIR.
    if shutil.which("latexmk"):
        subprocess.run(["latexmk", "main.tex"], cwd=TARGET_DIR)

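The script above does not handle output resources (such as images) that share a file name: the loop over resources["outputs"] in sync_tex writes every file directly into TARGET_DIR, so a later file silently overwrites an earlier one with the same name. You may need to modify it if your notebooks output many images.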
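One possible way around this is to give every notebook its own output directory. The sketch below is an assumption-laden starting point, not a tested extension of the script: `output_dir_for` and `write_outputs` are hypothetical helpers, and the approach relies on nbconvert prefixing extracted file names with `resources["output_files_dir"]` when that key is set, which you should verify against your nbconvert version.

```python
from pathlib import Path


def output_dir_for(ipynb_file: Path, source_dir: Path) -> str:
    """Per-notebook output directory, relative to the build directory.

    For example notebooks/part1/chap1.ipynb -> 'part1/chap1_files'.
    """
    rel = ipynb_file.relative_to(source_dir).with_suffix("")
    return (rel.parent / f"{rel.name}_files").as_posix()


def write_outputs(outputs: dict, target_dir: Path) -> list:
    """Write {relative name: bytes} under target_dir, creating subdirectories."""
    written = []
    for name, content in outputs.items():
        path = target_dir / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(content)
        written.append(path)
    return written


if __name__ == "__main__":
    print(output_dir_for(Path("notebooks/part1/chap1.ipynb"), Path("notebooks")))
```

Inside sync_tex you could then call `tex_exporter.from_filename(ipynb_file, resources={"output_files_dir": output_dir_for(ipynb_file, SOURCE_DIR)})` and replace the writing loop with `write_outputs(resources["outputs"], TARGET_DIR)`. Because the paths stay relative to TARGET_DIR, they should still resolve when main.tex is compiled from that directory.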

Gluing Everything Together

The final step is to create a main.tex. Here is a simple example:

\documentclass[12pt]{report}
\input{preamble}
% Add any customizations you like...
\begin{document}
\tableofcontents
\chapter{Chapter}
% \input avoids the page break that \include would force right after the chapter title
\input{chap1}
\end{document}

You can customize main.tex to include a fancy title page, style the pages, and so on. Note that you may want to place your own preamble customizations below the \input{preamble} line, because preamble.tex modifies the page geometry and would otherwise override them.
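With many chapters, writing main.tex by hand gets repetitive, so it can also be generated. The following is a minimal sketch, assuming the chapter files follow the chap(\d+).tex naming from the start of this post; `render_main` is a hypothetical helper, the chapter titles are placeholders, and it pulls chapters in with \input so that each chapter's text follows its heading on the same page.

```python
import re
from pathlib import Path

CHAPTER_RE = re.compile(r"chap(\d+)\.tex")


def render_main(tex_dir: Path) -> str:
    """Build the text of main.tex from the chapN.tex files in tex_dir."""
    chapters = []
    for path in tex_dir.iterdir():
        match = CHAPTER_RE.fullmatch(path.name)
        if match:
            chapters.append((int(match.group(1)), path.stem))

    lines = [
        r"\documentclass[12pt]{report}",
        r"\input{preamble}",
        r"% Add any customizations you like...",
        r"\begin{document}",
        r"\tableofcontents",
    ]
    # Sort numerically so that chap10 comes after chap2.
    for number, stem in sorted(chapters):
        lines.append(rf"\chapter{{Chapter {number}}}")
        lines.append(rf"\input{{{stem}}}")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    tex_dir = Path("tex")
    if tex_dir.is_dir():
        (tex_dir / "main.tex").write_text(render_main(tex_dir), encoding="utf8")
```

You would run this after the conversion step and before latexmk, and then edit the placeholder chapter titles as needed.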

In conclusion, creating a simple PDF book from multiple Jupyter notebooks is a feasible task that doesn't require advanced tools. By leveraging nbconvert with custom templates, you can efficiently extract and compile LaTeX content into a nice PDF document.