Matteo Franchin's corner

16 Jun 2012: Dox, the documentation system

Dox, the documentation system of Box, is a sort of Doxygen for Box: a system which scans a set of Box source files and inspects their comments, looking for documentation. The documentation is gathered and displayed in a suitable format. In particular, Dox provides the data that populates the documentation browser. If you don’t know what the documentation browser is, try typing CTRL+H in Boxer (if you don’t know what Boxer is, try You should get this:


Screenshot of the documentation browser of Boxer.


Dox was written in a few nights and started as a quick hack. It was originally a bunch of Python source files, kept separate from the main GUI. It ended up becoming a program doing something similar to what Doxygen does: it scanned a directory containing Box source code, extracted documentation from the comments and put it together in HTML pages. With time, however, I realised that it could be used to generate interactive documentation. I probably had in mind the - honestly pretty ugly - OCaml documentation browser (ocamlbrowser). I then added it to the Boxer GUI and gave it its own separate window. The documentation browser of Boxer was born.

Enough about history. The documentation browser is now a very important part of the project. It is the door through which the user approaches the libraries of the language. It is already quite useful and will become central in future releases of the project. For this reason I rewrote it from scratch. The main issue here was to make it more methodical and to decide how to organise the work. I have copied below some notes on how this is done.

How the documentation system works

During the Dox parse process, the documentation in a selected set of Box source files is read and the documentation tree is generated. The documentation tree (DoxTree in file dox/ is an intermediate data structure which contains all the documentation information, organised in a convenient way. The tree can then be used to generate the final output. The entire documentation process in Dox can thus be subdivided into two phases:

  1. a set of files is read and parsed and a documentation tree is generated,
  2. the documentation tree is translated into one of the available output formats.
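The point of this split is that every output format reads the same tree. Here is a minimal Python sketch of the idea, not the actual Dox code: the tree is reduced to a plain dictionary, and the names Tree, WRITERS, write_text, write_html and generate are all hypothetical.

```python
import html
from typing import Callable, Dict

# Hypothetical stand-in for the documentation tree: documented name -> text.
# The real DoxTree is richer; this only illustrates the two-phase split.
Tree = Dict[str, str]


def write_text(tree: Tree) -> str:
    """Phase 2, plain-text flavour: one line per documented name."""
    return "\n".join(f"{name}: {doc}" for name, doc in sorted(tree.items()))


def write_html(tree: Tree) -> str:
    """Phase 2, HTML flavour: a definition list, with escaped content."""
    items = "".join(
        f"<dt>{html.escape(name)}</dt><dd>{html.escape(doc)}</dd>"
        for name, doc in sorted(tree.items())
    )
    return f"<dl>{items}</dl>"


# Hypothetical registry of output formats; adding a format means adding
# one writer here, with no change to the parsing phase.
WRITERS: Dict[str, Callable[[Tree], str]] = {
    "text": write_text,
    "html": write_html,
}


def generate(tree: Tree, fmt: str) -> str:
    """Translate an already built tree into the requested format."""
    return WRITERS[fmt](tree)
```

Calling generate(tree, "html") and generate(tree, "text") on the same tree gives two renderings without parsing the sources twice.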

Below I give a brief summary of how phase 1 above is carried out. Phase 2 is somewhat simpler and less interesting. I’ll skip it for now (or leave it for another post).

PHASE 1: Parsing of documentation and generation of the parse tree

The parsing itself is subdivided as follows:

  1. An empty DoxTree object is created,
  2. Each file in the set of input source files is opened in sequence,
  3. The file is subdivided into “portions” of different types. I call these portions “text slices”. A typical file is made of source slices (Box source code) interleaved with comment slices (regular comments) and documentation slices. Documentation slices are special comments which carry the documentation (they start with (** or ///),
  4. The slices are concatenated in an ordered list. Each text slice is made aware of which text slices precede and follow it,
  5. Slices are interpreted. Documentation slices are mapped to DoxBlock objects. Each block is linked to its originating text slice,
  6. Each documentation block is associated with a target (i.e. the “thing” the block is documenting). Blocks may themselves be documentation targets for other blocks.
  7. Each block is given an opportunity to inspect the current context and modify it, if necessary. The concept of context is mainly used to organize nodes and put them into sections. The current context contains the active section of documentation, the name of the file which is being analysed, etc.
  8. DoxBlock objects are given an opportunity to generate a node of the tree. Basically, the method DoxBlock.add_node is called and the tree is passed as an argument. The block can thus manipulate the tree. Nodes are linked to their blocks and blocks are linked to their nodes.
  9. After doing all this for each source file, the DoxTree.process method is called. This method does global processing of the tree (while what we have seen so far was per-file processing). Missing types are detected, subtypes are associated with their parent types, etc.
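Steps 3 and 4 above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the real Dox code: the class TextSlice, the function slice_text and the kind names are hypothetical, and I assume that regular comments are written as (* ... *) or // ..., while (** and /// mark documentation as described above. Nested comments and comment markers inside string literals are ignored.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical slice kinds; the names used by the real Dox may differ.
SOURCE, COMMENT, DOC = "source", "comment", "doc"


@dataclass(eq=False)
class TextSlice:
    kind: str
    text: str
    # Links to the neighbouring slices, filled in by slice_text (step 4).
    prev: Optional["TextSlice"] = field(default=None, repr=False)
    next: Optional["TextSlice"] = field(default=None, repr=False)


# Assumed comment syntax: (* ... *) blocks and // line comments.
_COMMENT_RE = re.compile(r"\(\*.*?\*\)|//[^\n]*", re.DOTALL)


def slice_text(text: str) -> List[TextSlice]:
    """Split a file's text into source, comment and documentation slices."""
    slices: List[TextSlice] = []
    pos = 0
    for match in _COMMENT_RE.finditer(text):
        if match.start() > pos:
            # Whatever lies between two comments is source code.
            slices.append(TextSlice(SOURCE, text[pos:match.start()]))
        body = match.group(0)
        is_doc = body.startswith("(**") or body.startswith("///")
        slices.append(TextSlice(DOC if is_doc else COMMENT, body))
        pos = match.end()
    if pos < len(text):
        slices.append(TextSlice(SOURCE, text[pos:]))
    # Make each slice aware of its neighbours.
    for left, right in zip(slices, slices[1:]):
        left.next, right.prev = right, left
    return slices
```

Since every character of the file ends up in exactly one slice, joining the slice texts gives back the original file. This makes it easy for a documentation slice to look at the source slice that follows it when searching for its target.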

The tree is now complete. It contains all the information we need to write an HTML file with the documentation or to populate the controls of the Dox browser.
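To make steps 5 to 9 more concrete, here is a rough Python sketch. Only the names DoxTree, DoxBlock, DoxBlock.add_node and DoxTree.process come from the notes above. Everything else (Node, SectionBlock, TypeBlock, update_context, add_file, the dictionary used as context, and the convention that a subtype is named "Parent.Child") is a hypothetical choice made for the example, and the real classes certainly do more than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    name: str
    section: Optional[str]
    doc: str
    block: "DoxBlock" = field(repr=False)          # node -> block link
    parent: Optional["Node"] = field(default=None, repr=False)


class DoxTree:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.missing: List[str] = []

    def process(self) -> None:
        """Global pass: link subtypes to parents, record missing parents."""
        for name, node in self.nodes.items():
            if "." in name:  # assumed naming convention for subtypes
                parent_name = name.rsplit(".", 1)[0]
                node.parent = self.nodes.get(parent_name)
                if node.parent is None and parent_name not in self.missing:
                    self.missing.append(parent_name)


class DoxBlock:
    def __init__(self, text: str, target: Optional[str] = None) -> None:
        self.text = text
        self.target = target            # the "thing" being documented
        self.node: Optional[Node] = None

    def update_context(self, context: dict) -> None:
        """Step 7: inspect or modify the current context (default: no-op)."""

    def add_node(self, tree: DoxTree) -> None:
        """Step 8: optionally add a node to the tree (default: no-op)."""


class SectionBlock(DoxBlock):
    def update_context(self, context: dict) -> None:
        context["section"] = self.text  # following blocks go in this section


class TypeBlock(DoxBlock):
    def update_context(self, context: dict) -> None:
        self.section = context["section"]  # remember the active section

    def add_node(self, tree: DoxTree) -> None:
        self.node = Node(self.target, self.section, self.text, block=self)
        tree.nodes[self.target] = self.node  # block -> node link as well


def add_file(tree: DoxTree, filename: str, blocks: List[DoxBlock]) -> None:
    """Per-file processing: run steps 7 and 8 over the blocks of one file."""
    context = {"file": filename, "section": None}
    for block in blocks:
        block.update_context(context)
    for block in blocks:
        block.add_node(tree)
```

A driver would call add_file once per source file and then tree.process() once at the end. In this sketch, a documented "Point.X" gets the node of "Point" as its parent, while a documented "Line.Style" with no documented "Line" leaves "Line" in tree.missing.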