I am writing a book on Julia programming which contains a number of pictures, illustrations, mathematical equations and code samples.
I ran into so many problems along the way that I want to share some of my experiences and give a guide to anyone planning to do the same.
My first attempt was using Apple Pages. I gave up on that and switched to Julia’s own documentation system, built on the Documenter package. It is based on Markdown, which I use a lot and am quite fond of.
For example, this story was written in Markdown using iA Writer.
Finally, I ended up converting my Julia Markdown to Pandoc-style Markdown.
As you can see, I went through quite a lot of choices, and I considered and discarded many other approaches along the way.
I made a number of detours and mistakes which is why I want to write down what I have learned here.
Fancy “Time Saving” Solutions
One of the appeals of using Julia’s Documenter is that you can write code snippets in the text, which get executed and have their results automatically put into the book.
You can even create tests around the code, as well as link code examples together. So for instance, if one code sample says:
foo = 3
Then the next code example would not work unless it was linked to the first.
bar = foo*10
I thought this looked really neat initially. I figured it would save me time getting all my code examples correct.
Except I found it to be basically a huge waste of time. All too often I needed flexibility in what code I showed and how. Getting all the code samples to link together so that everything would work became very tedious busy work.
And in the end the output often didn’t turn out the way I wanted. This has been my general experience:
Fancy, supposedly time-saving solutions usually don’t save you time. Keep things simple. Use simple technology solutions.
For instance, if I have an example creating a Knight object in example ex1 on the Julia REPL, I would end up with output like this:
julia> black = Knight(35)
But I would normally not want the ex1 prefix to show, as it is just an internal ID for my examples. This is only one example of the kind of annoyances that pop up when you try to get fancy.
Avoid Web Service Solutions
For a period I tried to write in the Markua format, because that is sort of the officially supported format of Leanpub.
However, this was a really frustrating experience. I had to either upload my source through Git or write in a web-based editor. Whichever option I picked, a web service would generate my book.
This was too slow and tedious a way to work. If you have formatting issues, it is too tedious to make multiple changes and test whether they solve the problem, since every time you make a change you have to wait for a web service.
In the end I discovered that Markua simply did not have good support for LaTeX, and even if it did, I could not easily get hold of proper LaTeX error messages.
Pandoc to the Rescue
So in the end I went for Pandoc. It allowed me to generate epub, pdf, HTML and many other types of output from markdown input.
The Pandoc manual is extensive, but it doesn’t really cover what a realistic eBook project needs to do. Pandoc lets you do things in many different ways, which can easily leave you confused about the best way of doing things.
E.g. initially I did everything on the command line like this:
$ pandoc -s --pdf-engine=xelatex -o julia-beginners.pdf \
preface.md intro.md overview.md \
Or rather I essentially constructed this command line argument in a Julia script and ran that. Some people create this command line in a bash shell.
Yet there is no need to do it this way, and it isn’t a very organized way of doing it.
Use Default Files
Instead you should use defaults files. A defaults file is just a YAML text file listing all the options you are using, and the Pandoc manual gives you a full overview of what you can put in there.
I have a defaults file called epub-settings.txt, which I use to generate my epub files from Markdown files. To do that, I write the following on the command line:
$ pandoc --defaults epub-settings.txt
The file lists the custom CSS files I use, the format I am converting to, and the input Markdown files.
The property names are mostly the same as the long option names. So if you can write pandoc --css awesome.css, then you can also put css: awesome.css in the defaults file.
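As a rough illustration, a defaults file along those lines could look like the one written out below. It is a sketch, not the exact file used for this book. The CSS name, the chapter files and front-matter.md are hypothetical. The heredoc is only there so the whole thing can be pasted into a terminal.

```shell
# Write a sketch of an epub defaults file (YAML). All file names are hypothetical.
cat > epub-settings.txt <<'EOF'
from: markdown
to: epub3
output-file: julia-beginners.epub
css:
  - epub-style.css        # same as --css epub-style.css
input-files:
  - front-matter.md
  - preface.md
  - intro.md
  - overview.md
EOF
```

With the file in place, pandoc --defaults epub-settings.txt does the same job as the long command line.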
How to Setup PDF Generation
Generating PDF files correctly is a bit more involved than generating epub files, because it involves LaTeX, which is a huge chunk of machinery that is easy to get lost in.
I made numerous mistakes here, which you ought to be aware of.
Pandoc PDF Pitfalls
Here are mistakes I made:
- Did not use xelatex. The default pdflatex cannot handle Unicode correctly.
- Wrongly assumed that any font that works in epub would also work in LaTeX.
- Used an old, outdated LaTeX version. Make sure you have the latest.
- Thought that I didn’t have to know or understand anything about LaTeX to make a PDF using Pandoc.
Let us get more into the details.
There are basically three LaTeX engines in use today: pdflatex, xelatex and lualatex. If you download e.g. the MacTeX distribution for macOS, you will get all of them.
pdflatex you can forget about, because it doesn't support Unicode properly. That means Greek letters, smileys etc. will not work. So if you want to write π, you cannot put the Unicode π in your source; you have to write the corresponding LaTeX command instead.
That may be fine if you are only writing in LaTeX, but my source markdown file contained code examples in Julia with variables named π.
Hence when generating PDF files, you need pdf-engine: xelatex in your defaults file, or --pdf-engine=xelatex on the command line.
I could have picked lualatex as well, but that is mainly for people who want extensive customization and automation through Lua code. At the moment, xelatex seems to generate slightly higher quality documents.
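For PDF the same defaults-file idea applies. Here is a minimal sketch of a PDF defaults file using xelatex. The file name pdf-settings.txt and the chapter and front-matter files are hypothetical. No to: key is set because Pandoc infers PDF output from the .pdf extension of output-file.

```shell
# Write a sketch of a PDF defaults file (YAML). File names are hypothetical.
cat > pdf-settings.txt <<'EOF'
from: markdown
output-file: julia-beginners.pdf
# Same as --pdf-engine=xelatex on the command line
pdf-engine: xelatex
# Gets put in the LaTeX header before the beginning of the document
include-in-header:
  - header.tex
input-files:
  - front-matter.md
  - preface.md
  - intro.md
  - overview.md
EOF
```

The PDF is then built with pandoc --defaults pdf-settings.txt.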
Be Aware that LaTeX Fonts Are a Pain
To get fonts to work in pretty much any Mac, Windows or Linux software, a whole bunch of complex stuff happens behind the scenes.
If a font family lacks a glyph for a particular character, the OS tries a fallback font instead.
This is used all the time, because almost no font family contains glyphs for every possible Unicode character.
E.g. if I make a long arrow like this ⟶ while using "Avenir Next", that font family may not actually contain a glyph for this character. But that is fine; a fallback font will be used.
LaTeX, however, does not have this kind of fallback mechanism, so you have to set it up yourself. But where do you do all this custom setup, you ask?
Well look at my defaults file for PDF:
# Gets put in LaTeX header before
# beginning of document
# Needed to support unicode
Notice the include-in-header property. This allows me to specify a .tex file with extra options I want to use together with the default LaTeX template used by Pandoc. Every type of document Pandoc generates has a default template. You can use this command to see what default template Pandoc uses for LaTeX:
$ pandoc -t latex --print-default-template=latex
You could take this and use it as the foundation for a new template you edit by hand. You then specify your own custom template with the --template option (template in a defaults file).
However I find that messy. You end up with a much bigger file to edit by hand.
Instead I prefer to just add a few LaTeX instructions to the header using the include-in-header property.
Handle Fallback for Unicode Symbols
So to handle Unicode symbols, I add a few LaTeX commands to my header.tex file. Note that the file can be named anything; header.tex is just the name I gave it.
What all these LaTeX commands do is handle the fallback.
But this also sets my main font to be George. Also notice that to provide Emoji icons I use the Symbola font.
This font is actually not included by default with macOS. Instead Apple uses the “Apple Color Emoji” font. However, this is deceptive and will not work with LaTeX. It is a .ttc (TrueType Collection) font file rather than a .ttf (TrueType Font) file.
Apple’s Emoji are not really fonts but bitmap graphics. It is a trick to get colored little icons in the middle of your text. However, LaTeX has no understanding of this; it is made for fonts. So you need a real .ttf font for smileys. That is why you need to download the Symbola font and install it (drag and drop it onto Font Book).
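For a concrete starting point, here is one way to set up a per-character fallback with xelatex. It is a sketch under assumptions, not the contents of my header.tex. It assumes Pandoc's xelatex template has already loaded fontspec (which provides \newfontfamily) and that Symbola is installed. It uses the newunicodechar package to route each listed character to Symbola explicitly. Unlike the OS mechanism nothing happens automatically: every character you need gets its own line, and the two shown are only examples.

```shell
# Write a sketch of header.tex; assumes pdf-engine: xelatex and an installed Symbola font.
cat > header.tex <<'EOF'
% newunicodechar maps a single Unicode character to arbitrary LaTeX code
\usepackage{newunicodechar}
% fontspec font family used only for glyphs the main font lacks
\newfontfamily{\fallbackfont}{Symbola}
\DeclareTextFontCommand{\textfallback}{\fallbackfont}
% One line per character that should come from the fallback font
\newunicodechar{⟶}{\textfallback{⟶}}
\newunicodechar{☺}{\textfallback{☺}}
EOF
```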
To configure things such as the title, author etc. of the book, you provide that info in the front matter. You could put that in any
.md file, but I have put it in a separate file called
It can literally be put in the middle of the file, as long as it starts with a line of three hyphens (---) and ends with a line of three hyphens or three dots (...), and is preceded by a blank line if it is not at the very top. This is a convention of the YAML format. Static site generators often use the same approach. E.g. when I write blog posts in the Hugo static site generator, I also put this "front matter" in my Markdown blog posts, to add metadata such as category, tag, date, author etc.
So Pandoc using this as well is, I suppose, just following established conventions in the web/Markdown world.
title: Julia for Beginners
subtitle: From Romans to Rockets
author: Erik Engheim
rights: All rights reserved
In an ideal world, this is all you need, and you don’t have to provide the header.tex file. A number of these options directly configure your LaTeX template; they are used to insert things like the paper size and document class properties used by LaTeX.
When epub files are generated, these LaTeX-specific options are ignored.
mainfont is used for your regular text, while monofont is used for the source code samples. However, both settings get overridden by what I do in my header.tex to set up fallback fonts for Unicode.
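Putting the title block and these template variables together, a separate front-matter file might look like this. The file name front-matter.md and the documentclass, papersize and font values are hypothetical. The block opens with a line of three hyphens and closes with three dots.

```shell
# Write a sketch of a front-matter file; everything after "rights" is a hypothetical choice.
cat > front-matter.md <<'EOF'
---
title: Julia for Beginners
subtitle: From Romans to Rockets
author: Erik Engheim
rights: All rights reserved
# LaTeX template variables, ignored when generating epub
documentclass: book
papersize: a5
mainfont: Avenir Next
monofont: Menlo
...
EOF
```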
Generating a proper epub file is not without challenges either. For example, I could not get my source code to show up if I specified a particular language for syntax highlighting; it only worked occasionally.
For your epub to work on different devices, it should pass EPUB 3.2 validation. You can do this through a web service, but that is just annoying. Get a local solution.
You can get the EPUBCheck software published by W3C. It is Java software you can download and run on any platform.
Here is how I check my epub file:
$ java -jar epubcheck.jar julia-beginners.epub
This uncovered a whole bunch of problems. For example, my SVG files used encodings other than UTF-8, which EPUBCheck doesn’t like. Many of my SVG files had a first line looking like this:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
But EPUB 3 wants them to look like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
But before you make that change, you have to check whether you need to convert the text file. In my case all these ISO-8859-1 files turned out to be valid UTF-8 files, but you have to verify that first. On macOS you can use the file command:
$ file -I *.svg
If the files are reported as ASCII or UTF-8, you are fine. Otherwise you may have to run the iconv command to perform a text encoding conversion:
iconv -f ISO-8859-1 -t UTF-8 oldfile.svg > newfile.svg
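With many SVG files, the check and the conversion can be scripted. The sketch below uses a hypothetical helper, fix_svg_encoding. It relies on iconv failing on bytes that are not valid UTF-8. Files that pass keep their bytes and only get a corrected declaration. Files that fail are assumed to really be ISO-8859-1 and are converted. It only touches the declaration on the first line, as in the example above.

```shell
# Hypothetical helper: make one SVG file UTF-8 and say so in its XML declaration.
fix_svg_encoding() {
  svg=$1
  if iconv -f UTF-8 -t UTF-8 "$svg" > /dev/null 2>&1; then
    # Bytes are already valid UTF-8 (plain ASCII passes too): keep them as they are.
    cp "$svg" "$svg.tmp"
  else
    # Not valid UTF-8: assume ISO-8859-1, as the declaration claims, and convert.
    iconv -f ISO-8859-1 -t UTF-8 "$svg" > "$svg.tmp"
  fi
  # Rewrite the encoding in the XML declaration on line 1 only.
  sed '1s/encoding="ISO-8859-1"/encoding="UTF-8"/' "$svg.tmp" > "$svg"
  rm -f "$svg.tmp"
}

# Apply it to every SVG in the current directory (does nothing if there are none).
for f in *.svg; do
  if [ -e "$f" ]; then fix_svg_encoding "$f"; fi
done
```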
The next problem was external references, which the EPUB 3 validator did not like. Lines like this one refer to a DTD file online, but epub files should be self-contained:
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
This was solved by simply deleting that line.
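Deleting the line by hand gets old quickly with dozens of files. Here is a sketch with a hypothetical strip_svg_doctype helper. It assumes the DOCTYPE sits on a single line, as in the example above. Some editors split it over two lines, which this does not handle.

```shell
# Hypothetical helper: delete a single-line external SVG DOCTYPE that EPUBCheck rejects.
strip_svg_doctype() {
  sed '/^<!DOCTYPE svg /d' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Apply it to every SVG in the current directory (does nothing if there are none).
for f in *.svg; do
  if [ -e "$f" ]; then strip_svg_doctype "$f"; fi
done
```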
Diagnosing Problems with Calibre
When EPUBCheck reports all sorts of errors in your epub file, you need a simple way of locating them. You could just unpack the epub file, since epub files are just zip archives, and look inside.
However I find it more pragmatic to use the open source Calibre software. It allows you to edit an epub file and look inside it. It also offers a handy preview for each file you edit. The errors from EPUBCheck look like this:
ERROR(RSC-005): julia-beginners.epub/EPUB/media/file48.svg(27,31): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
This tells you which file inside your epub zip archive the error is in. In this particular example, the error is in file48.svg at line 27, column 31.
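If you prefer the unpacking route to Calibre, the location in an EPUBCheck message can be pulled out with sed. This is a sketch with a hypothetical epubcheck_location helper. The pattern assumes messages laid out like the one above, and it prints nothing for messages without a (line,column) pair.

```shell
# Hypothetical helper: turn an EPUBCheck message into "path-inside-epub line column".
# Assumes the layout CODE(ID): book.epub/inner/path(line,col): message
epubcheck_location() {
  printf '%s\n' "$1" |
    sed -n 's|^[A-Z]*([A-Z0-9-]*): [^/]*\.epub/\([^(]*\)(\([0-9]*\),\([0-9]*\)).*|\1 \2 \3|p'
}
```

For the message above this yields EPUB/media/file48.svg 27 31. The offending line can then be printed straight out of the archive with unzip -p julia-beginners.epub EPUB/media/file48.svg | sed -n '27p', since unzip -p extracts a member to standard output.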
In retrospect I should probably have simply stayed with Apple Pages. Once you have to deal with LaTeX, for example, you really get pulled down a rabbit hole and can spend hours googling to figure things out.
In the end you need to make a decision about your time and effort. If you are an academic and use LaTeX a lot anyway then the route I took may have been worth it.
But I normally spend most of my time writing in Markdown. I do have an affinity for LaTeX, but it is cumbersome to use for web content such as blog posts, and it does not help you much with epub files.
And it has an enormous legacy to drag around. You can do a lot as a LaTeX wizard, but becoming one requires a substantial investment of time. If you don’t know whether you will use LaTeX that much, I am not sure it is worth it.
Also generating an epub file that looks good is not trivial.
It is easy to think that the more programmer-heavy tools will save you time in the long run. However, in the end I wasted a lot of time trying to get line numbers and syntax highlighting working, only to decide against them anyway.
My idea was that by putting everything in plain text as part of a more programmer-oriented pipeline, I could more easily do things like test that my source code examples still work, and more easily modify a style across the whole document.
However that never worked out anyway. It proved too cumbersome to link up various code examples.
Instead you should simply keep a separate code repository with example code and tests, preferably containing more complete solutions rather than the small code snippets used in the book, which would be cumbersome to keep in sync.
How to use Apple Pages Correctly
Some of my reservations against Apple Pages came from not using it correctly. To use it effectively you need to create a proper Template.
A template in Pages is a collection of master pages. When you create a new document, it is based on a Pages template.
However, within the document each page you add is based on a master page. A master is basically a template for an individual page in your document.
Shared across the masters, you have paragraph styles, which say what color, background and font should be used for a paragraph.
Within a paragraph, individual words or letters can belong to a character style.
We can illustrate the hierarchy that defines your document template like this:
Template
└── Master Pages
    └── Paragraph Styles
        └── Character Styles
Templates come in two flavors:
- Word processing
- Page layout
You should choose the latter. This is what gives you desktop publishing style layout. You organize all your text in boxes. These boxes can be linked so text flows from one box to another.
The reason I think this is worth selecting is that it is what really gives you the strength of a visual editing tool. It allows you to create more enticing layouts. E.g. you can put source code on one side of the page and a description of the code on the other. The EPUB 3 format supports this kind of page-oriented layout.
When setting this up, familiarize yourself with the Format > Advanced > Define as Media Placeholder and Format > Advanced > Define as Placeholder Text menu entries.
These are for parts of your page which you want to be replaceable by the user.
Also be aware that text boxes and images can have tags. You see that under the style properties.
These are for things such as heading, subheading, body (main text) and caption (text under images).
I use this for marking particular words as referring to source code identifiers, such as function names or type names.
I might also use particular character styles for letters used in math equations.