MBD: Advice

Rich Morin, rdm@cfcl.com

Although some projects actually use documentation as a design tool, most documentation is created "after the fact". So, it's likely that you're dealing with an existing system. Your task is to enhance the current documentation, producing a consistent, integrated result. Using MBD's phases as an outline, let's look at some ways to prepare for this:

  • Extraction - obtaining and storing the desired data

    Start an informal "data catalog", capturing critical information (e.g., content, creator, format, name, owner, users) about existing data sources, documentation, etc. Be sure to keep track of any "domain experts" you encounter in this process!

  • Analysis - combining or tallying data, discerning patterns, etc.

    Create some informal sketches of the system's fundamental entities and relationships. Try to understand the system's goals, constraints, design, etc. Start thinking about how this "model" (or some variation thereof) could be used to organize the documentation.

  • Presentation - generating diagrams, indexes, plots, tables, etc.

    Start a documentation "wish list", containing descriptions of topics the documentation should cover, questions it should be able to answer, etc. Which documents should be added, improved, or replaced? What navigational aids (e.g., diagrams, indexes, search tools) would be useful?

Data Catalogs

Information on a system's data flow (e.g., creation, storage, use) is a useful and frequently overlooked form of documentation. In addition, knowing about available data sources is very helpful in mechanized document generation. Finally, researching these issues will uncover many useful tidbits about the system.

It isn't necessary to create a single, formal "data catalog". In fact, the effort to create one may be counter-productive, because many pieces of information won't "fit" a rigid format. Instead, create an informal web page for each "lump" of data, capturing critical information (e.g., format, name, owner, users), as well as informal notes, open questions, etc.
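One way to keep such pages informal yet consistent is to generate a starting stub from a handful of fields. The following Python sketch is only an illustration: the `CatalogEntry` class, its field names, and the page layout are hypothetical, chosen to mirror the items listed above. The output assumes MediaWiki-style headings and bullets.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One informal "lump" of data (all field names are illustrative)."""
    name: str
    format: str = "unknown"
    owner: str = "unknown"
    users: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def render_page(entry: CatalogEntry) -> str:
    """Render an entry as a wiki page stub, leaving room for free-form notes."""
    lines = [
        f"== {entry.name} ==",
        "",
        f"* Format: {entry.format}",
        f"* Owner: {entry.owner}",
        f"* Users: {', '.join(entry.users) or 'unknown'}",
        "",
        "=== Notes ===",
        "",
    ]
    lines += [f"* {note}" for note in entry.notes]
    lines += ["", "=== Open questions ===", ""]
    lines += [f"* {question}" for question in entry.open_questions]
    return "\n".join(lines) + "\n"
```

Unknown fields default to "unknown" rather than blocking the page, which fits the idea that capturing something now matters more than filling in every slot.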

Don't be afraid to catalog "oddball" data sources. If a ReadMe file, spreadsheet, or PDF document contains interesting information, list it. Figuring out how to capture the data in a convenient format can come later...

Once you have some pages started, invite others to report errors, omissions, etc. In time, you may have enough information to create formal pages, data flow diagrams, etc. In the meantime, you'll have a very handy set of notes!

The Project Wiki

Start up a wiki for the project. Take some care in selecting the wiki software. You'll want it to be comfortable, popular, robust, scalable, and rich in useful features. It should also be undergoing active development in areas such as dynamic content, semantic wikis, etc.

My current choice, FWIW, is MediaWiki, the technology underlying Wikipedia. It is written in PHP, so it installs easily on a wide range of platforms. It has extensive documentation, a variety of mailing lists, etc. Although MediaWiki's principal focus is supporting Wikipedia, some of its development goes beyond this purpose.

Adding mechanically-generated content is quite feasible. MediaWiki uses an RDBMS (typically MySQL), so one option is a careful set of direct table updates. Indeed, the MediaWiki source code can be co-opted to help in this. Alternatively, page edits can be done over HTTP, using a library such as Perl's LWP.
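As a rough illustration of the HTTP route, here is a Python sketch (standard library only) that prepares, but does not send, an edit request. It assumes a MediaWiki release that provides the Action API (`api.php` with `action=edit`), which may not match the installation you have. The wiki URL, page title, and token value are placeholders, and a real script would first log in and fetch a CSRF token.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_edit_request(api_url: str, title: str, text: str, token: str) -> Request:
    """Prepare a POST that replaces a page's text (assumes the Action API)."""
    params = {
        "action": "edit",
        "title": title,
        "text": text,
        "summary": "Mechanically generated update",
        "format": "json",
        "token": token,  # CSRF token, obtained after logging in
    }
    return Request(api_url, data=urlencode(params).encode("utf-8"), method="POST")


# Placeholder values; nothing is sent until urllib.request.urlopen(req) is called.
req = build_edit_request(
    "https://wiki.example.org/w/api.php",
    "Generated/Bug counts",
    "* Open bugs: 42\n",
    "PLACEHOLDER_TOKEN",
)
```

Going through the wiki's own edit path, rather than the database tables, means the wiki records the change in the page history like any other edit.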

Because MediaWiki has support for Transclusion, generated content can be written into its own pages and "included" into human-edited pages. This limits the chance for collisions, editing errors, etc.
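A sketch of that arrangement, in Python, might look as follows. The page name and the sample counts are hypothetical; the `{{:Page name}}` form is MediaWiki's markup for transcluding a page from the main namespace.

```python
GENERATED_PAGE = "Generated/Bug counts"  # hypothetical page, written only by scripts


def generated_page_text(counts: dict[str, int]) -> str:
    """Build the body of the script-owned page as a wikitext bullet list."""
    warning = "<!-- Generated page: manual edits will be overwritten. -->"
    rows = [f"* {stage}: {count}" for stage, count in sorted(counts.items())]
    return "\n".join([warning, *rows]) + "\n"


def include_markup(page: str) -> str:
    """Return the markup a human-edited page uses to pull in the generated page."""
    return "{{:" + page + "}}"
```

A human-edited page would then contain only the one-line result of `include_markup(GENERATED_PAGE)`, so the script never has to touch hand-written text.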

Dynamically-generated content is also possible. Thomas Gries has extended the InterWiki facility to perform transclusion from arbitrary web servers. His WikiMania 2004 slide set, Getlets: extending the interwiki concept beyond any limits, describes the extension's motivation, capabilities, architecture, and syntax.

Once you have the wiki set up, you'll find that it is a useful place to "publish" ideas, try out experiments, etc. One obvious notion (and my current activity :-) is to use the wiki as a place to publish "system sketches". For example, see my Unix Ontology page.

System Sketches

Any significant system is going to be very complicated. Fortunately, a model doesn't have to capture everything in order to be useful. In fact, it's in the very nature of models to be incomplete. The trick lies in deciding which aspects of the system the model needs to represent.

Think about the types of entities and relationships that underlie the system. Which entities deserve web pages? Which relationships deserve links? As you proceed, publish "snapshots" of your current thinking (e.g., textual descriptions, diagrams) on the wiki.

Diagrams

Diagrams can be very useful, but Humpty Dumpty's advice applies:

    "The question is", said Alice,
    "whether you can make words mean so many different things".

    "The question is", said Humpty Dumpty,
    "which is to be master - that's all."

    - Lewis Carroll, "Through the Looking-Glass" http://www.sundials.org/about/humpty.htm

So, sketch out and study some informal diagrams, using whatever tools and notations you find convenient. Diagrams, like words, should not be the master.

There are many formal and precise tools for diagramming systems, including Conceptual Graphs, Entity-relationship diagrams, Object-Role Modeling, and Unified Modeling Language diagrams. The problem with these tools, for this exercise, is that their very formality and precision may get in the way.

Use a simple graph notation (e.g., circles for entities, lines for relationships). A diagramming tool (e.g., Dia, Graphviz, OmniGraffle, Visio) can help you to produce "pretty" diagrams, but paper or a whiteboard may be more comfortable for initial and/or collaborative exploration.

Brainstorming

Once you have the appropriate tools, start brainstorming about things that you might want to document. Software systems typically include items such as databases, functions, libraries, modules, and programs. Software development efforts have bug reports, tests, etc. Corporate settings have office locations, reporting structures, etc.

As you add items, consider which relationships might be of interest. A program's web page might link to relevant databases, functions, libraries, programmers, test suites, etc. Don't try for perfection; just try to get the main items and relationships recorded...
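Once a sketch settles down, the same items and relationships can be recorded as data and turned into a diagram mechanically. The Python sketch below emits Graphviz DOT text for an undirected graph. The entity names and relationship labels are invented for illustration, and the code assumes that names contain no double quotes.

```python
def to_dot(relationships: list[tuple[str, str, str]]) -> str:
    """Emit an undirected Graphviz graph from (entity, relation, entity) triples."""
    lines = ["graph sketch {", "    node [shape=ellipse];"]
    for source, relation, target in relationships:
        lines.append(f'    "{source}" -- "{target}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines) + "\n"


# Invented example entries
relationships = [
    ("report_gen", "reads", "bugs_db"),
    ("report_gen", "calls", "libplot"),
    ("report_gen", "tested by", "report_suite"),
]
dot_text = to_dot(relationships)
```

Saving `dot_text` to a file such as `sketch.dot` and running `dot -Tpng sketch.dot -o sketch.png` should produce an image, assuming Graphviz is installed.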

Report Summaries

While you're waiting for help with the data catalogs, start drafting a set of report summaries. The initial purpose of these summaries is to define the objectives and strategies for each "report" (i.e., plot, table, web page) you'll be generating. Each summary should contain, at least:

  • A description of the general question

    "Do we have any quality assurance problems?"

  • Examples of specific questions

    "How many bugs are getting past development and testing?"

  • The content and format of the report(s)

    "Generate a table showing the number of bugs found at various stages of each release cycle. Show the percentage of bugs found after the release."

  • Data source(s) and processing

    "Using Bugzilla's MySQL database, extract and tally bug creation dates."

Once the documentation is generated, the summary should be brought up to date, expanded, and made available (e.g., as a "help page") to both developers and users.
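The example summary above could be prototyped along the following lines. This Python sketch uses an in-memory SQLite table as a stand-in for the bug database. The table layout, the stage names, and the sample rows are all invented for illustration and do not reflect Bugzilla's actual schema; in practice, the stage would probably be derived by comparing bug creation dates with release dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bugs (id INTEGER PRIMARY KEY, version TEXT, stage TEXT)")
conn.executemany(
    "INSERT INTO bugs (version, stage) VALUES (?, ?)",
    [
        ("1.0", "development"), ("1.0", "development"), ("1.0", "testing"),
        ("1.0", "post-release"),
        ("1.1", "development"), ("1.1", "testing"), ("1.1", "testing"),
        ("1.1", "testing"), ("1.1", "post-release"),
    ],
)


def tally(conn: sqlite3.Connection) -> dict[str, dict[str, int]]:
    """Count bugs per release version and stage."""
    counts: dict[str, dict[str, int]] = {}
    query = "SELECT version, stage, COUNT(*) FROM bugs GROUP BY version, stage"
    for version, stage, n in conn.execute(query):
        counts.setdefault(version, {})[stage] = n
    return counts


def post_release_percent(stages: dict[str, int]) -> float:
    """Percentage of a release's bugs that were found after the release."""
    total = sum(stages.values())
    return 100.0 * stages.get("post-release", 0) / total if total else 0.0


counts = tally(conn)
for version in sorted(counts):
    stages = counts[version]
    print(f"{version:8} {sum(stages.values()):3d} bugs  "
          f"{post_release_percent(stages):5.1f}% found after release")
```

Keeping the tally and the percentage calculation in separate functions makes it easier to quote each one in the report's "help page" later.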

Coding Practices

    Document my code?
    Why do you think they call it "code"?

    - T-shirt slogan, from Computer Gear

Most programmers agree that code should be well commented, have a clear and consistent style, etc. They understand that these practices improve readability, ease debugging, and reduce errors.

The same line of reasoning can (and should) be applied to mechanically-generated data files. By spending a little extra effort on the generating code, you can make textual output files easy (nay, pleasant) to read.

  • Each file should have an informative header, indicating the file's format, origins, purpose, etc. If the "template" for this text appears in the generating program, the information can actually serve double duty!

  • Section comments are a Good Idea; they can provide useful context, help in navigating through the file, etc.

  • Don't be afraid to include some "extra" newlines and enough spaces for useful indentation. C'mon, folks, white space is your friend...
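As a small illustration of these points, the Python sketch below writes a text data file with a header, section comments, and some breathing room. The `#` comment convention, the column layout, the file and program names, and the sample data are assumptions made for the example, not a prescribed format.

```python
import io
from datetime import date

# The header template lives in the generating program, so it also documents the code.
HEADER_TEMPLATE = """\
# File:     {filename}
# Format:   one "name count" pair per line; lines starting with '#' are comments
# Origin:   generated by {program} on {when}
# Purpose:  {purpose}
"""


def write_report(out, filename, program, purpose, sections, when=None):
    """Write a commented, indented data file to the text stream `out`."""
    when = when or date.today().isoformat()
    out.write(HEADER_TEMPLATE.format(
        filename=filename, program=program, when=when, purpose=purpose))
    for title, rows in sections.items():
        out.write(f"\n# --- {title} ---\n\n")          # section comment
        for name, count in rows:
            out.write(f"    {name:<16}{count:>6}\n")   # indented, aligned columns


buffer = io.StringIO()
write_report(
    buffer, "bug_counts.txt", "tally_bugs.py", "bug counts per stage",
    {"Release 1.0": [("development", 2), ("testing", 1)]},
    when="2005-01-01",
)
text = buffer.getvalue()
```

Because comment lines share a single prefix, a program reading the file can skip them with one test, so the extra readability costs consumers very little.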