Building docs for different targets with Sphinx (with bonus Unicode support)

Let’s say you want to build your docs in different formats. I already alluded to html versus dirhtml, but the distinction between those is relatively small; it’s just a matter of directory structure. What if you want to build your docs as PDFs or ePubs or something else? Well, Sphinx can do it.

By default, sphinx-build supports a number of outputs. The ones I’m most likely to use are the following:

html: to make standalone HTML files
dirhtml: to make HTML files named index.html in directories
singlehtml: to make a single large HTML file
epub: to make an epub
latexpdf: to make LaTeX and PDF files (default pdflatex)
text: to make text files
man: to make manual pages
gettext: to make PO message catalogs
doctest: to run all doctests embedded in the documentation (if enabled)
coverage: to run coverage check of the documentation (if enabled)

And of those, let’s face it, it’s dirhtml, epub, latexpdf. If you’re building towards a constrained set of targets, it’s usually easier to work on styling them and to use the particular affordances those formats offer.
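If you build the same few targets over and over, a tiny script can save some typing. Here is a minimal sketch, assuming the docs live in a directory called docs and build into docs/_build (both names are my placeholders, not anything Sphinx requires). It leans on sphinx-build's -M option, which takes the same target names as the generated Makefile.

```python
import subprocess

# The three targets singled out above; adjust to taste.
TARGETS = ["dirhtml", "epub", "latexpdf"]


def build_command(target, source_dir="docs", build_dir="docs/_build"):
    """Return the sphinx-build invocation for one target.

    -M is sphinx-build's "make mode": it accepts the same target names
    as the generated Makefile, including latexpdf. The directory names
    are placeholders for wherever your docs actually live.
    """
    return ["sphinx-build", "-M", target, source_dir, build_dir]


def build_all(targets=TARGETS, source_dir="docs", build_dir="docs/_build"):
    """Build each target in turn, stopping at the first failure."""
    for target in targets:
        subprocess.run(build_command(target, source_dir, build_dir), check=True)
```

Calling build_all() from the project root should then run the three builds one after another, provided Sphinx and a TeX distribution are installed.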

But I’ve gotta tell you something more than just make dirhtml and make latexpdf, right? Styling and themes are for another post (and especially with PDF, are a hairy topic), but let’s talk about character encoding here. Being who I am, of course what I write uses a lot more of Unicode than just the ASCII-overlapping space. By default, Sphinx builds PDFs with pdflatex, whose Unicode support is limited: it can trip over characters not far beyond accented Latin letters, let alone alchemical symbols.

So you have to use a stronger (more modern? more Unicode-capable? more culturally aware?) TeX engine. This involves editing conf.py to set the following:

latex_engine = 'xelatex'

I like xelatex. You could also use lualatex. If you know enough to care about which you choose, I trust you can make that choice well, and if you don’t, either will do!

That’ll get you most of the way there. You can include UTF-8 characters in your source and they should get built correctly when you make a PDF. You can muck around with LaTeX packages to further control things via mechanisms like the latex_elements['preamble'] value in conf.py, too:

latex_elements = {
    # For example:
    'preamble': r'''
\usepackage{fontspec}
''',
}

But maybe you do want to use those alchemical symbols I was talking about, not just the occasional “é”. And you don’t wanna type them into your reStructuredText source all the time; maybe your text editor doesn’t even display them well, but you want the resulting HTML or PDF to display them correctly.

Let’s use rst_epilog and substitution definitions. In conf.py:

rst_epilog = """
.. |hg-sub| unicode:: U+1F712 .. mercury sublimate
"""

Let’s talk about that line. .. indicates that this line is special to reStructuredText; based on what follows, it knows it’s a substitution definition. The stuff between | is what to look for in the source, and substitute with what follows. Then there’s unicode:: which says “use the following Unicode codepoint”, in this case U+1F712. Then there’s another .. which indicates the definition is done, and what follows is a human-readable reminder of what that codepoint is.
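Once you have more than a couple of these, you may prefer to generate the lines rather than type them. This is only a sketch of one way to do it in conf.py: the SUBSTITUTIONS name and the two dash entries are my own additions for illustration, and the output follows the same pattern as the line dissected above.

```python
# conf.py (fragment)

# Substitution name -> (codepoint, human-readable reminder).
# SUBSTITUTIONS is a hypothetical helper name, not a Sphinx setting.
SUBSTITUTIONS = {
    "hg-sub": (0x1F712, "mercury sublimate"),
    "ndash": (0x2013, "en dash"),
    "mdash": (0x2014, "em dash"),
}

# One substitution definition per line, in the form
# .. |name| unicode:: U+XXXX .. reminder
rst_epilog = "\n".join(
    f".. |{name}| unicode:: U+{codepoint:04X} .. {reminder}"
    for name, (codepoint, reminder) in SUBSTITUTIONS.items()
) + "\n"
```

Keeping the table as a dict means a new symbol is a one-line change, and the reminder text travels with the codepoint.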

With the rst_epilog set, Sphinx will append that block to each .rst file it builds, making those definitions available in that source. Now, in my reStructuredText source, every time I need to write “🜒”, I can write |hg-sub|, and the build process will substitute the correct Unicode in, and then xelatex will build that TeX source correctly. Be sure LaTeX is using a font that includes the glyphs you’re using though; it won’t automatically fall back to a font that contains them.
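Since there’s no automatic fallback, one approach is to tell LaTeX which font to use for a particular character. The sketch below does that with the newunicodechar package in the preamble. Treat it as an assumption-laden example, not a recipe: the font name Symbola and the \symbolfallback command are placeholders I picked, and you’d need to confirm that whatever font you name is installed and really contains the glyph.

```python
# conf.py (fragment)
latex_engine = "xelatex"

# The character behind |hg-sub|, built from its codepoint so this file
# stays readable in editors that can't display it.
HG_SUB = chr(0x1F712)

latex_elements = {
    "preamble": "\n".join([
        r"\usepackage{fontspec}",
        r"\usepackage{newunicodechar}",
        # Assumption: a font named Symbola is installed and has the glyph.
        r"\newfontfamily\symbolfallback{Symbola}",
        # Typeset this one character in the fallback font wherever it appears.
        r"\newunicodechar{" + HG_SUB + r"}{{\symbolfallback " + HG_SUB + "}}",
    ]),
}
```

Each extra symbol needs its own \newunicodechar line, so this suits a handful of special characters better than a whole script’s worth.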

Maybe you don’t need alchemical symbols, but you need em- and en-dashes. The documentation has you covered.

Oh, also: Read the Docs doesn’t yet support xelatex or lualatex, and so this won’t work if you’re building your docs there. But they’re working on it, and I hope will support it soon!