MBD: The Presentation Phase

Rich Morin, rdm@cfcl.com

The concepts page presents a data flow "model" for a Model-based Documentation (MBD) suite. In this model, the presentation phase is responsible for generating various output formats, using assorted tools (e.g., markup languages, documentation generators).

Output Formats

Most documentation needs can be readily met by one of two output formats: PDF or HTML (and friends). Although there is some overlap, it is usually easy to decide which format is appropriate.

  • Adobe's Portable Document Format (PDF) is generally used for read-only distribution of "formal" documents (e.g., technical papers) in which precise layout is important. Although it is possible to "reflow" some PDF documents, formatting is generally left alone: documents are either printed or viewed as page images.

    Most PDF viewers provide little interactive capability. Keyword search is usually supported, as is text clipping, etc. Hyperlinks are possible (in recent versions of PDF), but are rarely used in practice. (See the Extraction page for pointers to PDF-specific tools, reference material, etc.)

  • Hypertext Markup Language (HTML) tends to be used for less formal content. It supports a variety of interactive mechanisms, ranging from simple links to image maps, tooltips, and Dynamic HTML (DHTML), as discussed below.

There are many ways to generate these formats, but only a few of them are relevant to mechanized production. Consequently, we'll focus our attention on this subset. We'll start with markup languages, then move on to other tools and some web-specific technologies.

Markup Languages

Markup languages such as HTML allow text to be "marked up" with formatting and other ancillary notations. An appropriate tool (e.g., a web browser) can interpret the markup, producing formatted text, hypertext links, etc. Using markup languages, MBD systems can generate attractive PDF documents, web pages, and more.

    Note: Standard Generalized Markup Language (SGML) and Extensible Markup Language (XML) aren't really markup languages in the same sense that HTML is. Rather, they are "metalanguages" which can be used to create markup languages. YAML Ain't Markup Language (YAML), as its name indicates, is a data serialization language, not a markup language at all.

If you're just producing web pages, HTML is a direct and obvious choice. Even with Cascading Style Sheets (CSS), HTML doesn't allow precise control of formatting, but this is seldom needed.

Unfortunately, HTML can't generate arbitrary PDF documents. Many other markup languages are available, however, offering a wide range of features. Here are three popular "families" of markup languages, to get you started...

  • DocBook

    DocBook is a presentation-neutral markup language for technical documentation: it can generate PDF files, web pages, and assorted other formats, using a single set of input files. DocBook is well suited to books, manual sets, and other large-scale documentation projects.

    On the other hand, DocBook has a substantial learning curve, involving large amounts of SGML and/or XML arcana and complicated tool chains. See the DocBook Project for more information.

  • TeX

    The TeX suite (including LaTeX and other associated packages) is very powerful, documented by dozens of books, and supported by an active user community. Building and configuring a distribution can be daunting, however. Do yourself a favor and start with a "canned" distribution (e.g., TeX Live, from the TeX Users Group (TUG)).

    Texinfo is the documentation format of the GNU Project. Although Texinfo predates the Web, it has built-in hypertext capabilities. It can be used to generate HTML pages and many other output formats, including DVI, PDF, and XML.

  • Troff

    Troff has been a part of Unix since the earliest days. It is still used for the Unix "man pages", but its role in document production has largely been taken over by other programs. In addition, it provides little support for hyperlinks, image maps, and other web-related features.

    Nonetheless, the Troff suite (including pre-processors such as eqn, grap, pic, and tbl) is still very handy for mechanized document production. It is available in most Unixish distributions, either as a variation on the original Bell Labs version or as Groff (the GNU Project's re-implementation).

Documentation Generators

Documentation generators are actually specialized MBD suites, documenting software-related entities such as data structures, functions, and modules. The results are generally published as web pages, but other formats are sometimes available. In particular, collected information may be available in a form (e.g., XML) that is suitable for follow-on analysis.

Most documentation generators perform a variant of screen scraping, parsing the source code (and specially-formatted comments). Consequently, they are specific to particular programming languages. A few, however, work strictly from comments or binary files. If you are documenting a software project, be sure to investigate this class of tools.
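
To make the idea concrete, here is a minimal sketch of the "parse the source and its specially-formatted comments" approach, using Python's standard ast module on Python source. The sample source and the function name extract_docs are illustrative assumptions, not part of any particular documentation generator.

```python
import ast

# Hypothetical source text to be documented.
SAMPLE_SOURCE = '''
def area(width, height):
    """Return the area of a rectangle."""
    return width * height

def undocumented():
    pass
'''


def extract_docs(source):
    """Map each top-level function or class name to its docstring (or None)."""
    tree = ast.parse(source)
    docs = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            docs[node.name] = ast.get_docstring(node)
    return docs


if __name__ == "__main__":
    for name, doc in extract_docs(SAMPLE_SOURCE).items():
        print(f"{name}: {doc or '(no documentation)'}")
```

The resulting dictionary is the kind of collected information that could be rendered as web pages or saved for follow-on analysis. Note that it is tied to one programming language, as described above.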

Eye Candy

There are a number of tools that can generate "eye candy" (e.g., diagrams, images, plots) for documents. The trick, of course, is to generate useful eye candy. Here, in any case, are some potentially useful tools.

  • GIMP

    The GNU Image Manipulation Program (GIMP) is generally used interactively, but it can also be used in batch mode, by means of scripting extensions such as Script-Fu.

  • Gnuplot

    Gnuplot is a very portable and versatile tool for plotting numerical data. It works well with GNU Octave, a system for performing numerical (e.g., scientific) computations.

  • Graphviz

    The Graphviz suite contains several useful tools for "graph visualization". I've made extensive use of the dot program, which lays out diagrams of directed graphs. It can be used to generate data flow diagrams, entity-relationship diagrams, and more.

  • ImageMagick

    ImageMagick is an extremely capable suite (read, "Swiss Army Knife") of image manipulation tools. It can be used from the command line or as a library for any of several programming languages.

  • Troff pre-processors

    As mentioned above, a number of pre-processors have been written for Troff. Two of these (grap and pic) can be useful for generating images: grap does simple data graphing; pic does constraint-based diagram layout. Although grap is not part of groff, an Open Source version is available.
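
Because dot reads plain text, a generating script needs no special library to drive it. Here is a minimal sketch, in Python, that turns a list of edges into DOT source. The function name to_dot, the graph name, and the sample phase ordering are illustrative assumptions; node names are quoted but not otherwise escaped.

```python
def to_dot(edges, name="flow"):
    """Return DOT source for a directed graph given (source, target) pairs."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for src, dst in edges:
        # Quote node names so that spaces are accepted by dot.
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Assumed ordering of the MBD phases, used only as sample data.
PHASES = [("Extraction", "Analysis"), ("Analysis", "Presentation")]

if __name__ == "__main__":
    print(to_dot(PHASES))
```

Saving the output as flow.dot and running "dot -Tpng flow.dot -o flow.png" should produce a left-to-right diagram of the three phases.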

HTML, redux

There are many ways to generate content for web pages. HTML can be edited by hand, generated directly by a script, or translated from a markup language. Eye candy (e.g., images) has a similar wealth of sources. Now, let's talk about pulling it all together.

Modern web servers and browsers are capable of handling far more than simple HTML. Here are some variations and enhancements which are worth considering for use in MBD:

  • Cascading Style Sheets

    Many webmasters use Cascading Style Sheets (CSS) to give web pages a consistent "look", while simplifying their definitions. The same principle can be applied to mechanically-generated pages, where the simplification also extends to the generating programs.

  • Downloadable Files

    If you've gone to the trouble of collecting a bunch of data, why not make it available for easy follow-on analysis? Comma-separated Value (CSV) files, for example, are trivial to generate and can be loaded into a wide variety of spreadsheet programs. With a bit more work, you can generate spreadsheet files, complete with fancy formatting, headers, formulas, etc.

    More generally, try to think of useful files that you can offer for downloading. Most web servers can handle almost any sort of data: bits are bits, after all...

  • Image Maps

    Some programs can generate client-side image maps, allowing images to serve as navigational aids. I have made heavy use of dot to generate context diagrams for web pages. The diagrams show related pages (and their inter-relationships), provide informative "pop-up" messages (i.e., tooltips), and ease inter-page navigation.

  • Java, JavaScript, etc.

    I have deep reservations about Java, JavaScript, and other imperative client-side programming languages. Any given program may be error-ridden or even actively malicious. So, I disable these languages in my browser, enabling them only when (a) they are needed and (b) I'm willing to trust the originating site.

    On the other hand, client-side processing can do many splendid things. Google Maps and other AJAX applications clearly demonstrate this power. So, it's certainly worth keeping these languages in mind. Just don't use them unless there's a good reason.

  • Movies

    If you have an image-generation tool that produces consistent and controllable layout, it isn't that big a step to produce stop-motion animations. QuickTime movies, for example, can be used to clarify the sequence of data flow diagrams.

  • Scalable Vector Graphics

    Scalable Vector Graphics (SVG) is an extremely appealing technology for client-side image generation, etc. When combined with AJAX, it should be able to do some really spectacular things. I expect to make extensive use of it, just as soon as more browsers get around to supporting it.

  • Search Engines

    No matter how well you design your inter-page navigation, there will be times when it won't fit the user's needs. Fortunately, it's trivial to add a search engine, such as Simple Web Indexing System for Humans - Enhanced (Swish-e), which can index web pages, PDF files, and more. As a side benefit, the indexing process can detect broken links!

  • Server-side Processing

    Server Side Includes (SSIs) provide an easy way to "include" standardized bits of HTML (e.g., headers, footers) into static web pages. In MBD, they can be used to fold mechanically-produced HTML into manually-edited web pages (and vice versa). SSIs can also resolve content generation conflicts. For example, one script's images and image maps might be needed by an earlier script's web pages.

    There are many other ways to do server-side processing, including Common Gateway Interface (CGI), script-based template engines (e.g., Mason, PHP, Ruby on Rails), and XML-based systems such as Apache Cocoon. Selecting and integrating these sorts of tools, however, should be approached with care. If an approach doesn't scale well or restricts your data model, it may be a poor choice.

    It's also worthwhile to think about cost/performance trade-offs. Disk storage and offline computer time are very inexpensive, in most situations. By generating much of your web content in advance, you can trade cheap resources for crisp interactive performance.
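
As an illustration of the "Downloadable Files" suggestion above, here is a minimal Python sketch that serializes collected data as CSV, using the standard csv module. The function name to_csv and the sample field names are assumptions made for the example.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Serialize a list of dicts as CSV text, starting with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical entities collected by an earlier phase.
    rows = [
        {"name": "parser", "kind": "module"},
        {"name": "lexer", "kind": "module"},
    ]
    print(to_csv(rows, ["name", "kind"]), end="")
```

In keeping with the advice on generating content in advance, a script like this could write the file once, offline, and leave the web server to deliver it as static data.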

Page Types

Although MBD-based web sites can have totally arbitrary content and format, certain types of web pages are likely to be useful.

"Entity" pages

Most MBD-generated web pages describe entities, so let's consider the kinds of information that the user should see on such a page:

  • What characteristics distinguish this entity?

  • What class(es) does this entity belong to?

  • What other entities is it related to, and how?

Because these pages are mechanically generated, it is trivial to provide rich cross-linking between pages, add explanatory descriptions and tooltips, etc. Image-mapped "context diagrams" can be generated to show "close relatives", etc.

Help and tutorial pages can also be linked in, explaining sections, pages, or even sets of pages. Finally, if the user can't find what s/he is looking for, mailto links and a search facility are obvious, trivial, and very useful amenities to add...
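
Here is a minimal sketch of how an entity page with cross-links and tooltips might be generated in Python. The function name entity_page, the "name.html" file naming scheme, and the sample data are all assumptions made for the example; a real suite would draw these from its own model.

```python
from html import escape


def entity_page(name, classes, related):
    """Render a minimal HTML page for one entity.

    related is a list of (relationship, target_name, tooltip) tuples.
    """
    items = []
    for relationship, target, tooltip in related:
        # The title attribute is shown by most browsers as a tooltip.
        items.append(
            f"<li>{escape(relationship)}: "
            f'<a href="{escape(target)}.html" '
            f'title="{escape(tooltip)}">{escape(target)}</a></li>'
        )
    return "\n".join([
        f"<html><head><title>{escape(name)}</title></head><body>",
        f"<h1>{escape(name)}</h1>",
        f"<p>Classes: {escape(', '.join(classes))}</p>",
        "<ul>",
        *items,
        "</ul>",
        "</body></html>",
    ])


if __name__ == "__main__":
    print(entity_page("parser", ["module"],
                      [("calls", "lexer", "Splits input into tokens")]))
```

The page answers the three questions listed above: the heading identifies the entity, the "Classes" line gives its classes, and the list shows related entities along with the nature of each relationship.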

"Index" pages

No single index can meet the needs of every user at every time. A user may only be interested in a selected subset of the entities, need them sorted in a particular order, etc. The number of available views (i.e., combinations) can grow very rapidly with the number of options allowed. Nonetheless, it is still a manageable problem.

Mechanical generation of index pages can allow any number of "views" to be shown. With proper planning, navigation between the views can be very simple. For example, the user might navigate between views by clicking in one or more "link tables". Alternatively, in a forms-based interface, checkboxes, menus, and/or radio buttons can be used to good effect.
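
The following Python sketch shows one way to enumerate index "views" mechanically, as every combination of a filter and a sort order. The entity records, the "all" pseudo-kind, and the function name index_views are assumptions made for the example.

```python
from itertools import product

# Hypothetical entity records produced by an earlier phase.
ENTITIES = [
    {"name": "parser", "kind": "module", "size": 120},
    {"name": "lexer", "kind": "module", "size": 80},
    {"name": "token", "kind": "struct", "size": 12},
]


def index_views(entities, kinds, sort_keys):
    """Return {(kind, sort_key): [names]} for every filter/sort combination."""
    views = {}
    for kind, key in product(kinds, sort_keys):
        subset = [e for e in entities if kind == "all" or e["kind"] == kind]
        ordered = sorted(subset, key=lambda e: e[key])
        views[(kind, key)] = [e["name"] for e in ordered]
    return views


if __name__ == "__main__":
    views = index_views(ENTITIES, ["all", "module", "struct"], ["name", "size"])
    for (kind, key), names in views.items():
        print(f"{kind} by {key}: {', '.join(names)}")
```

Three filters and two sort orders already yield six views; each added option multiplies the total. Since each view is keyed by its option values, a "link table" only needs to map those values to page names.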

"Tutorial" pages

To the extent that the presented model matches the organization of the system being documented, the same descriptive material can be used to cover both. That is, understanding the system helps in understanding the web site, and vice versa.

Image-mapped diagrams can be used to good effect, allowing the user to examine and "explore" the presented model. Animated diagrams, showing control or data flow, can also be useful. In both cases, mechanized generation techniques can ease the burden of generating the diagrams.
