Docbook
Docbook is a typesetting and layout tool for authors. Specifically, it is an XML schema intended to produce both print and electronic copies of text. Combined with any one of the many powerful XML processors available, it achieves the goal of letting an author write once and publish to anything. It is widely used in technical documentation (such as the first few editions of Slackermedia itself), but also for works of fiction, academia, and more.
Strengths
Familiar
XML is not HTML, but the concept of marking up text with nested tags is very similar. If you are good with HTML, then XML will feel like the “pro” version of what you already know.
Strict
XML is rigid in what its processors accept, so there is an absolutism to how you structure your documents. This may help you organise information better, and it guarantees predictable output in the end. You won't spend time shifting indents and special meta-characters around in your text editor; you will spend time writing content within a well-structured framework.
Documented
XML is a long-standing format, and Docbook is a well-respected schema. Docbook is well documented on http://docbook.org and XML is so well-known that you can take classes on the subject.
[Ex]Portable
Once your text is in XML format, it is structured and predictable. This probably means that if there is another format (html, epub, pdf, ps, plain text, rtf, odt, and so on) that you want to output to, you can convert to it from XML. There just isn't any ambiguity about XML, and there are heaps of post-processors.
Weaknesses
Complex
The process of creating well-formed XML is not simple. It is a very verbose format, it fails at the smallest error, it enforces a strict hierarchy of nested elements, and it requires one or more post-processors to convert it out of XML into a readable format.
Strict
Unlike Markdown or HTML, XML is intolerant of any deviation from its defined schema. Something as simple as a missing closing tag will break the processor. There are tools, such as xmllint, to help ensure well-formed XML, but it is not uncommon to need at least three build attempts before one succeeds.
Style
The look of documents output from Docbook is clean and professional, but to change the look and feel of your output, you probably need to learn XSL. XSL can be complex, especially if you have only just learnt XML and how to process it.
Install
Docbook is not an application, but a schema, meaning that it is nothing more than a set of rules that you follow whilst writing text in any plain text editor of your choice. If you have ever used HTML, it's a little like that; you don't install HTML, you just write it, and other programmes bear the burden of interpreting it and processing it into a form for public consumption.
The Docbook schema, along with a number of XML tools, comes pre-installed on Slackware.
To find where your schemas are located, use the find command:
find / -iname "*docbook*dtd*"
This reveals that the schemas are located in /usr/share/xml/docbook/xml-dtd-X.Y (where X.Y is a version number).
Quickstart
The best quickstart guide to Docbook is a short work by David Rugge, Mark Galassi, and Eric Bischoff, located at http://xml.web.cern.ch/XML/goossens/dbatcern/dbatcern.html.
Here is a basic summary, featuring a severely limited set of functionality:
Docbook Header
The Docbook header is a line of text at the top of a Docbook file which identifies the file as being an XML document following the Docbook schema, and points to where the schema's rules are located on your computer (or a networked location, if you have confidence in your network environment).
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">
Docbook can format articles or books; so your header should match what you intend to write:
- If writing a book, the header should include:
DOCTYPE book PUBLIC
- If you are writing an article, then:
DOCTYPE article PUBLIC
You may never commit the header to memory unless you type it daily, so keep it someplace handy.
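One way to keep the header handy is a small shell function that prints it on demand. The following is only a minimal sketch and not part of Docbook itself: the function name docbook_header and the DTD_PATH variable are hypothetical, and the default path assumes the 4.5 DTD sits in the location that the find command reported, so adjust it for your system.

```shell
# Hypothetical helper: print a Docbook 4.5 header for a book or an article.
# DTD_PATH is an assumption; point it at wherever docbookx.dtd lives on your system.
DTD_PATH="${DTD_PATH:-/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd}"

docbook_header() {
    root="${1:-book}"    # root element: book (default) or article
    case "$root" in
        book|article) ;;
        *) echo "usage: docbook_header book|article" >&2; return 1 ;;
    esac
    printf '%s\n' "<?xml version='1.0' encoding='utf-8' ?>"
    printf '<!DOCTYPE %s PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"\n' "$root"
    printf '    "%s">\n' "$DTD_PATH"
}
```

For example, docbook_header article > 00.xml would start a new article file with the matching header.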
The structure of a Docbook file is inflexible. If you are writing an article, then the order of tags is defined by the article schema, and if you are writing a book then the order of tags is defined by the book schema. Any deviation from the schema rules results in invalid XML, which an XML processor will usually reject.
The easiest way to learn what a particular schema demands is straight from the Docbook documentation, available online at http://www.docbook.org/tdgXY/en/html/docbook.html (where tdgXY is the version of Docbook that you are using).
A simple example of a Docbook file using the book schema:
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "/usr/share/xml/docbook/xml-dtd-4.5/docbookx.dtd">
<book>
  <title>Texto Ekzempla</title>
  <chapter>
    <title>Kiel Libera Programmaro Intigas Arton</title>
    <para>
      Elsxuti <ulink url="http://slackware.com">Linukso</ulink>, kaj provu gxin.
    </para>
  </chapter>
</book>
To some degree, it is intuitive as long as you know the tags that you have to work with, and the order of the basic skeleton structure. In the case of a book, the basic structure is:
- book
- book title
- chapter
- chapter title
- paragraphs
- (close chapter tag)
- (close book tag)
This file, saved, is a valid Docbook file, but it is essentially source code.
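Before handing the file to a processor, it can save a failed build or two to check it with xmllint first. The following is a minimal sketch, assuming xmllint (from libxml2) is installed; the function name check_wellformed is hypothetical. The --noout option suppresses the parsed output so that only errors are printed.

```shell
# Hypothetical helper: report whether an XML file is well-formed.
# Returns 0 if well-formed, non-zero on a parse error, 2 if xmllint is missing.
check_wellformed() {
    file="$1"
    if ! command -v xmllint >/dev/null 2>&1; then
        echo "check_wellformed: xmllint not found in PATH" >&2
        return 2
    fi
    # --noout: print nothing except error messages
    xmllint --noout "$file"
}
```

Run it as check_wellformed book.xml. To go further and validate against the DTD named in the header, xmllint --noout --valid book.xml should work, provided the DTD path in the DOCTYPE resolves on your system.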
Processing XML
The easiest XML processor to use is xmlto, an application run in the shell with the sole purpose of translating XML into any number of other formats.
For example, to convert the book into an html file:
$ mkdir html
$ xmlto html book.xml -o ./html
Or into plain text (the -o option names an output directory, not a file, so without it xmlto writes book.txt to the current directory):
$ xmlto txt book.xml
There are several ways to generate a PDF, but the most reliable tends to be the Apache Software Foundation's fop tool. Fop is written in Java, so it does require that the Java runtime is installed.
Apache fop
Install Java (called jdk; the Java Development Kit) from either the /extra directory on your install media, or from http://slackbuilds.org/repository/X.Y/development/jdk/ (where X.Y is the version of Slackware that you are running). As Java is currently owned and maintained by Oracle, it requires agreeing to a EULA, which can only be done through a GUI. You must therefore download the Java installer manually, but you can then run the SlackBuild separately so that Java is properly logged in /var/log/packages.
Once JDK is installed, install fop from http://slackbuilds.org/repository/14.1/office/fop/?search=fop.
To create a PDF, first translate your Docbook file to fo with xmlto, and then convert the fo file to PDF with fop:
$ mkdir ./pdf
$ xmlto fo book.xml -o ./pdf
$ fop -fo ./pdf/book.fo -pdf ./pdf/book.pdf
Wait for the document to process, and at the end you have, in the pdf directory that you created, a PDF file. Even the hyperlink is fully functional, just like in a “real” PDF (because it is a real PDF, fully compliant with the spec).
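The two PDF steps can be wrapped in one shell function so that they always run together. This is a minimal sketch under stated assumptions, not a definitive build script: the function name build_pdf is hypothetical, it assumes both xmlto and fop are on your PATH, and it assumes xmlto names the fo file after the input file (book.xml becomes book.fo).

```shell
# Hypothetical helper: build a PDF from a Docbook file in two steps,
# XML -> FO with xmlto, then FO -> PDF with fop.
build_pdf() {
    src="$1"                        # input file, for example book.xml
    outdir="${2:-./pdf}"            # output directory, created if missing
    base=$(basename "$src" .xml)    # book.xml -> book

    # Stop early with a clear message if either tool is missing.
    for tool in xmlto fop; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "build_pdf: $tool not found in PATH" >&2
            return 127
        }
    done

    mkdir -p "$outdir" || return 1
    xmlto -o "$outdir" fo "$src" || return 1
    fop -fo "$outdir/$base.fo" -pdf "$outdir/$base.pdf"
}
```

Calling build_pdf book.xml ./pdf should leave book.fo and book.pdf in the pdf directory.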
Advanced Techniques
In large works, there is an advantage to keeping your writing modular. Structuring your work such that each chapter is an individual file allows your text editors to load and work on them faster, and makes rearranging the order of the chapters trivial.
If you choose to work modularly, you only need your Docbook header in the first file, and you should only close your <book> tag in the final file. For example:
- 00.xml: Docbook header + <book><title></title>
- 01.xml: <chapter><title></title><para></para><para>..</para></chapter> (and so on)
- 02.xml: <chapter><title></title><para></para><para>..</para></chapter> (and so on)
- end.xml: </book>
The file end.xml can literally have nothing but one tag in it: </book>. Or, if you have a colophon or appendix, you can place them in your end matter. The point is to wrap the modular files in two extremities (00.xml and end.xml, for example) to ensure that the outermost Docbook tags are included once and only once.
Concatenating and Building
Once you have all of your files ready, there is just one additional step to what you already know: you must concatenate all of your files into a temporary master file.
Assuming you have all of your files in a directory called xml:
$ cd ~/mybook/xml
$ cat 00.xml 01.xml 02.xml end.xml > tmp.xml
$ mkdir pdf
$ xmlto fo tmp.xml -o ./pdf
$ fop -fo ./pdf/tmp.fo -pdf ./pdf/mybook.pdf
$ trash tmp.xml
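As the number of chapters grows, listing every file by hand becomes error-prone. The following is a minimal sketch of the concatenation step, assuming the naming scheme used here (two-digit chapter files plus end.xml); the function name assemble_book is hypothetical. It relies on the shell expanding the glob in sorted order, so 00.xml always comes first.

```shell
# Hypothetical helper: concatenate modular Docbook files into one master file.
# Assumes two-digit file names (00.xml, 01.xml, ...) and a closing end.xml.
assemble_book() {
    srcdir="$1"    # directory holding the modular files
    out="$2"       # master file to write, for example tmp.xml

    [ -f "$srcdir/00.xml" ] || { echo "missing $srcdir/00.xml (header)" >&2; return 1; }
    [ -f "$srcdir/end.xml" ] || { echo "missing $srcdir/end.xml (closing tag)" >&2; return 1; }

    # The glob matches only two-digit names, so neither end.xml nor
    # tmp.xml is picked up twice; end.xml is appended explicitly at the end.
    cat "$srcdir"/[0-9][0-9].xml "$srcdir/end.xml" > "$out"
}
```

For example, assemble_book ~/mybook/xml tmp.xml writes the master file into the current directory, ready for xmlto.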
Makefiles
The entire build process can be automated with a Makefile, a kind of script for GNU make.
Makefiles have a specific syntax: keyword → colon → required files → instruction block
This becomes executable by typing make keyword.
A specific example:
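# Makefile by myName
# html and pdf are task names, not files that make should look for
.PHONY: html pdf
html:
	mkdir -p html
	cat xml/??.xml xml/end.xml > tmp.xml
	xmlto html tmp.xml -o ./html
pdf:
	mkdir -p pdf
	cat xml/??.xml xml/end.xml > tmp.xml
	xmlto fo tmp.xml -o ./pdf
	fop -fo ./pdf/tmp.fo -pdf ./pdf/mybook.pdf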
Each block, somewhat translated:
[keyword]: [files that must exist or be built before proceeding, if any]
[tab] The command to run.
[tab] Another command to run.
To use it, run make from the directory where your Makefile exists:
$ cd mybook
$ ls -1
Makefile
xml
$ mkdir -p html
$ make html
By scripting with make, the build process is simpler than building manually, and does not require you to remember commands or syntax.
See Also
Sphinx
Fountain
Screenwriter-mode