MBD (Model-based Documentation)
is an integrated approach to the design and development
of semi-automated document production systems.
Specifically, MBD uses a consistent "system model"
(at various levels of abstraction)
to provide conceptual clarity, ease navigation,
and guide the development process.
MBD bridges the gap
between traditional documentation and report generation techniques,
leveraging the strengths of each.
It works well for generating timely, integrated,
and detailed documentation for large systems.
At the same time,
it facilitates the rapid prototyping
of specialized documents and reports.
Although I have only used MBD in a software development context,
I believe it to be applicable to any substantial system
that depends extensively on computers.
This page introduces some key concepts in MBD,
discussing the role of models in intelligence, communication,
and documentation.
It then presents an abstract data flow model for MBD processing.

Models

In the popular MVC (Model-View-Controller) architecture,
the model stores the data that underlies the application.
My use of the term is a bit broader,
including "mental models", etc.
Here are some applicable definitions:
A small object ...
that represents in detail another, often larger object.
A schematic description of a system ...
that accounts for its known or inferred properties
and may be used for further study of its characteristics ...
Either of these definitions could,
with a bit of stretching, describe documentation.
More to the point,
documentation which is based on these sorts of models
has a built-in conceptual and organizational framework.
This framework can assist both the documenter and the reader,
as discussed below.
More generally,
because modeling is fundamental to both intelligence and communication,
thinking about the underlying model(s)
is an inseparable part of good documentation design.
Let's explore this connection a bit further...

Intelligence

Jeff Hawkins
("On Intelligence")
contends that intelligence and memory
are byproducts of modeling activity in the
neocortex.
Specifically, our brains record and recognize "invariant patterns"
in received data,
then make and test predictions about other, related patterns.
After hearing a few notes from a known song,
we can recognize a pattern and predict the following notes.
If the prediction fails,
our attention will be drawn to the disparity
and our brains will try to resolve it.
Is this a stylized rendition, a different song, or what?
Using billions of interconnected neurons,
the neocortex can recognize and predict
many levels and variations of patterns.
The incoming flow of information interacts with the stored patterns,
causing them to be interlinked, modified, reinforced, etc.
Collectively, the patterns form a "model" of perceived reality.
We may think that we are interacting with "reality",
or at least what our senses report about it,
but we are actually dealing with this internal model.

Communication

Communication, similarly,
is based on the transmission and sharing of models.
The sender offers assertions, qualifications, connections, etc.
If the communication is successful,
these will be incorporated into the receiver's models.
Modeling also plays a significant role in message preparation.
A "mental model" of readers' typical backgrounds
(or really, of their mental models)
influences the sender's choice of material,
presentation style and order, etc.
In preparing these pages,
I thought about the key concepts I wanted to present
and the likely backgrounds of my readers.
Using familiar concepts (e.g., brain, memory) as a base,
I presented new concepts (e.g., invariant patterns).
Because "neocortex" might be an unfamiliar concept,
I defined it by context and added an HTML link.
As I got further into the material,
I was able to rely on concepts already presented.
Thus, my model of the readers' models
was predicated on their ongoing comprehension
of the material being presented.
This sort of model-based "second guessing"
is very common in human communication.
Different forms of communication
(e.g., articles, conversations, documentation,
formal and informal presentations)
employ it to different degrees, however.
In casual conversation and informal presentations,
the speaker may not pay much attention to organizing the presentation.
After all, the listener(s) can easily ask
about anything that isn't understood.
Thus, interactivity reduces the need for organization.
Formal presentations and most forms of written communication
do not allow for easy interaction,
so authors tend to spend significant effort on organization.
They write outlines, move material around,
and generally polish the text until the ideas "flow" smoothly.
Some trickery may be required,
because many topics don't "serialize" cleanly.
For example, the author may have to use a "placeholder" definition,
because a full definition would interrupt the flow
and/or rely on material not yet presented.

Documentation

Documentation differs
from other forms of written communication
in several ways, including
complexity, scale, information level, and typical usage.
These differences affect the way that documentation is
(or at least should be :-) designed.
In summary, documentation creation presents unique challenges.
The writer must present vast amounts of information,
with very little knowledge of the readers' mental models.
About all that the writer can predict is that the reader
will not approach the material in a sequential fashion.

Navigation

Because the writer can't predict the reader's background,
other means must be used to supply context.
In practice, this means that the author
must provide ways for the reader
to find (i.e., navigate to) any needed background information.
Navigation can be supported by various mechanisms:
indexes, links, search engines, etc.
Using mechanized techniques,
it's quite easy to generate web pages
full of links, pop-ups, clickable diagrams, etc.
The tricky part is to make the result
comprehensible and useful.
MBD addresses this problem, where possible,
by basing the site's organization and navigation
on an abstract model
of the documented system's entities and relationships.
Because the model is consistent,
readers soon learn how to navigate around the documents.
Overview text and diagrams can ease and expedite this process.
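As a minimal sketch of model-driven navigation, the following Python fragment derives each page's links from a tiny entity-relationship model. The entity names, the relationship labels, and the helpers `page_name`, `nav_links`, and `render_nav` are all hypothetical, chosen only for illustration; a real MBD suite would extract its model from the documented system.

```python
from collections import defaultdict

# Hypothetical model: (source entity, relationship, target entity).
RELATIONSHIPS = [
    ("program:report", "reads", "table:bugs"),
    ("program:report", "writes", "file:summary"),
    ("program:loader", "writes", "table:bugs"),
]

# Labels used when a relationship is viewed from the target's side.
INVERSE = {"reads": "read by", "writes": "written by"}


def page_name(entity):
    """Map an entity ID to a page file name (assumed naming scheme)."""
    return entity.replace(":", "_") + ".html"


def nav_links(relationships):
    """Build each entity's outgoing and incoming links from the model."""
    links = defaultdict(list)
    for source, relation, target in relationships:
        links[source].append((relation, target))
        links[target].append((INVERSE[relation], source))
    return links


def render_nav(entity, links):
    """Render one entity's navigation block as an HTML list."""
    items = [
        f'<li>{relation}: <a href="{page_name(other)}">{other}</a></li>'
        for relation, other in sorted(links[entity])
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_nav("table:bugs", nav_links(RELATIONSHIPS)))
```

Because every page's links come from the same model, each kind of entity gets the same navigation layout, which is the consistency the text relies on.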

Modeling MBD

Typically, MBD uses a "suite" of cooperating programs,
mixing general-purpose utilities (e.g., text formatters)
with special-purpose programs (e.g., data filters).
Taking our own advice, let's model a typical MBD suite.
The following diagram shows (very abstractly!)
how the suite fits into the overall documentation data flow:
Our goal is to document a system,
using a combination of its own Data (e.g., databases, files)
and any related Info (e.g., documentation, institutional memory).
If an existing document is useful and presentable,
we'll simply add it to our collection of Documents.
Alternatively, information can be extracted
from Info and/or Data sources, analyzed, and presented in Documents.
The dotted line connecting the Info and Data boxes
indicates that they are closely related.
For example, a system's documentation should
describe the nature of its data.
This relationship isn't part of the MBD suite,
but it forms an important part of the suite's working environment.

Processing Phases

An MBD suite can be considered as having three processing phases:
Extraction, Analysis, and Presentation.
Data extraction (i.e., input filtering) is usually straightforward.
Although some data sources use complex, undocumented formats,
most do not.
In addition, tools and libraries are often available
to access complex formats.
Analysis may range from trivial (e.g., tallying bug reports)
to extremely difficult
(e.g., extracting information from unstructured text).
Generating documents is seldom challenging,
once the content and layout have been determined.
However, selecting and learning how
to configure and use various tools
(e.g., Cocoon, dot, Rails, troff)
can require significant start-up effort.
A trivial MBD application might combine all three phases
in a single program.
For example, a Perl script might access Bugzilla's MySQL database,
tally selected bug reports,
and generate a web page.
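A single-program version of the three phases might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the script described above: it uses Python rather than Perl, an in-memory SQLite database stands in for the MySQL database, and the `bugs` table, its columns, and its rows are made up (they are not Bugzilla's actual schema).

```python
import sqlite3
from collections import Counter
from html import escape


def extract(conn):
    """Extraction: pull raw (id, status) rows from the data source."""
    return conn.execute("SELECT id, status FROM bugs").fetchall()


def analyze(rows):
    """Analysis: tally bug reports by status."""
    return Counter(status for _bug_id, status in rows)


def present(tally):
    """Presentation: render the tally as a small HTML page."""
    rows = "\n".join(
        f"<tr><td>{escape(status)}</td><td>{count}</td></tr>"
        for status, count in sorted(tally.items())
    )
    return f"<html><body><table>\n{rows}\n</table></body></html>"


def demo_database():
    """Create an in-memory stand-in for a bug database (made-up schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bugs (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO bugs (status) VALUES (?)",
        [("NEW",), ("NEW",), ("RESOLVED",)],
    )
    return conn


if __name__ == "__main__":
    print(present(analyze(extract(demo_database()))))
```

Keeping each phase in its own function, even within one script, makes it easier to split the phases into separate programs later.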
As more types of data sources and generated documents come into play,
however, a single program will become unwieldy.
Real-world MBD applications may have dozens of data sources,
generate many kinds of reports, etc.
Consequently, they will have many instances
of Extraction, Analysis, and Presentation programs.
There may also be cases where multiple levels of analysis are needed.

Data Flow

Here is a slightly less abstract diagram,
showing the flow of data through these (sets of) programs:
As the diagram indicates,
the data traverses a
directed graph.
With a bit of care,
the graph can be constrained to be
acyclic (no program can modify data that affects its own input).
This may sound quite theoretical,
but it has some useful consequences.
In particular, it means that dependency-based tools such as
make (or even parallel make)
can be used to control the suite's operation.
Better yet, we can run the entire suite under
cron and have an automated "service"
that updates the relevant output documents
whenever an underlying data source changes.
The following diagram shows the data flow
for a smallish MBD suite.
If data source S1 changes, programs E1, A1, and P1 will be run,
producing an updated version of document D1.
Programs A2, P2, and P3 will also be run,
updating all of the other documents.
Changing S3 or S4, in contrast,
would only cause D3 and D4 to be updated.
To be sure, this description leaves out many implementation details.
For example, it says nothing
about information storage and transmission.
Are we using files, a database, or what?
We also need to ensure that each program's input information
is complete and consistent while it is processing.
Given that the input could be several thousand files,
a bit of trickery may be needed.
Nonetheless, this approach is not science fiction.
I recently created a
web site that contains tens of thousands of web pages, image-mapped diagrams, etc.
Because the pages are heavily cross-linked,
the site contains hundreds of thousands of links and tooltips.
Even so, the site is easy to navigate, understand, etc.
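To make the data-flow description above concrete, here is a minimal Python sketch that checks a suite's graph for cycles, derives a valid run order, and works out what must be re-run when a source changes. The wiring in `DEPENDS_ON` is hypothetical: it is one arrangement consistent with the S1, S3, and S4 behavior described in the text, and the original diagram may differ. The helper names `run_order` and `affected_by` are likewise made up. The sketch assumes Python 3.9 or later, which provides the standard `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical wiring: each node maps to the nodes it reads from.
# S = data source, E = extraction, A = analysis, P = presentation, D = document.
DEPENDS_ON = {
    "E1": {"S1"}, "E2": {"S2"}, "E3": {"S3"}, "E4": {"S4"},
    "A1": {"E1"}, "A2": {"E1", "E2"},
    "P1": {"A1"}, "P2": {"A2"}, "P3": {"A2", "E3", "E4"},
    "D1": {"P1"}, "D2": {"P2"}, "D3": {"P3"}, "D4": {"P3"},
}


def run_order(depends_on):
    """Return a valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        raise ValueError(f"data flow is not acyclic: {err.args[1]}") from err


def affected_by(changed, depends_on):
    """Return every node downstream of `changed`, i.e. what must be re-run."""
    affected = set()
    frontier = [changed]
    while frontier:
        node = frontier.pop()
        for target, inputs in depends_on.items():
            if node in inputs and target not in affected:
                affected.add(target)
                frontier.append(target)
    return affected


if __name__ == "__main__":
    print(run_order(DEPENDS_ON))
    print(sorted(affected_by("S3", DEPENDS_ON)))
```

With this wiring, a change to S1 reaches all four documents, while a change to S3 reaches only D3 and D4, matching the behavior described above.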