TIP 40: Documentation Generator for Tcl Scripts

Author:         Arjen Markus <arjen.markus@wldelft.nl>
Author:         Donal K. Fellows <fellowsd@cs.man.ac.uk>
State:          Withdrawn
Type:           Project
Vote:           Pending
Created:        04-Jul-2001
Keywords:       documentation,automatic generation,HTML,reference
Tcl-Version:    8.0


This TIP proposes the adoption of a standard documentation format for Tcl scripts and the implementation of a simple tool that will extract this documentation from the source code so that it may be turned into a programmer's guide. This is in essence akin to documentation tools like the well-known javadoc utility for Java programs and Eiffel's short utility.


The style guide by Ray Johnson presents a documentation standard that is easy to use and is in fact adopted in the Tcl/Tk distribution. Beyond that, the standard has been neither enforced nor encouraged, and (as far as I know) no tool exists that generates good-looking documents from it. As a consequence, documentation styles vary widely, and at times it is necessary to read the source code (looking for descriptions) to understand how to use a script.

The availability of such a tool may encourage people to use the standard, as the cost of generating the documentation is relatively low. The tool must accommodate variations and therefore be flexible, for instance by providing customisable procedures to support the user's preferred header format.

The tool also needs to distinguish between types of output: in many cases HTML output is desirable because it looks good and provides hypertext facilities; in other cases the tool should produce plain text, formatted so that it can be read in any ordinary text editor or printed quickly.
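The two output types could be served by one rendering procedure that switches on the requested format. The following is only a minimal sketch: the procedure name renderEntry and its arguments are assumptions made for illustration, not part of this proposal.

```tcl
# Hypothetical helper: render one documented procedure as plain text or HTML.
# The argument layout (format, name, description) is an assumption.
proc renderEntry {format name description} {
    switch -exact -- $format {
        text {
            # Plain text: the name on one line, the description indented below.
            return "$name\n    $description\n"
        }
        html {
            # Escape the characters that are special in HTML.
            set map {& &amp; < &lt; > &gt;}
            set name [string map $map $name]
            set description [string map $map $description]
            return "<h3>$name</h3>\n<p>$description</p>\n"
        }
        default {
            error "unknown output format \"$format\""
        }
    }
}
```

Keeping the format decision in one place means that a further output type only needs an extra branch.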

Parallel to the development of such a tool, a standard or checklist should be assembled of the information a programmer ought to provide: the version of Tcl/Tk, the extensions that need to be present, what functionality is offered, and so on.


Automatic documentation generation has two goals: improving usability and improving maintainability. The first means that pleasant-looking documentation, produced at low cost to the author, makes a script easy to use without having to read the source code. It also ensures that the documentation looks homogeneous.

Maintainability improves when the more or less technical documentation lives near the code. There is then no need for separate documents, which increase the risk of discrepancies. Remember the DRY principle: Don't Repeat Yourself.

What information

A user clearly needs different information than a maintainer. For the user it is important to know what functionality is provided, what other packages or extensions are needed, which (public) procedures are available and how to use them.

For the maintainer, an overview of the source files helps in finding the procedures. Part of this information can be extracted directly from the source (for example by inspecting the proc, package and namespace statements).
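As an illustration of inspecting those statements, the sketch below scans a script line by line with regular expressions. It assumes Tcl 8.5 or later (for the dict command), and the procedure name scanSource is hypothetical. It only recognises statements that start a line and is not a full Tcl parser.

```tcl
# Hypothetical scanner: collect the names introduced by "proc",
# "package provide", "package require" and "namespace eval" statements.
# Only statements at the start of a line are recognised.
proc scanSource {source} {
    set result [dict create procs {} provides {} requires {} namespaces {}]
    foreach line [split $source \n] {
        if {[regexp {^\s*proc\s+(\S+)} $line -> name]} {
            dict lappend result procs $name
        } elseif {[regexp {^\s*package\s+provide\s+(\S+)} $line -> name]} {
            dict lappend result provides $name
        } elseif {[regexp {^\s*package\s+require\s+(?:-exact\s+)?(\S+)} \
                $line -> name]} {
            dict lappend result requires $name
        } elseif {[regexp {^\s*namespace\s+eval\s+(\S+)} $line -> name]} {
            dict lappend result namespaces $name
        }
    }
    return $result
}
```

Calling scanSource on the contents of a file returns a dictionary whose procs, provides, requires and namespaces entries can feed the overview described above.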


Use the format proposed by the style guide as a guideline (certainly for the reference implementation):

 # pkg_compareExtension --
 #  Used internally by pkg_mkIndex to compare the extension of a file to
 #  a given extension. On Windows, it uses a case-insensitive comparison
 #  because the file system can be file insensitive.
 # Arguments:
 #  fileName     name of a file whose extension is compared
 #  ext          (optional) The extension to compare against; you must
 #               provide the starting dot.
 #               Defaults to [info sharedlibextension]
 # Results:
 #  Returns 1 if the extension matches, 0 otherwise

(This comes from the "package.tcl" script file that came with Tcl 8.3.1; it is consistent with the Tcl style guide by Ray Johnson.)
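A header in this layout can be split into its parts with a small line-oriented parser. The sketch below is one possible approach, not the reference implementation: the name parseHeader, the dictionary keys, and the rule that a more deeply indented line continues the previous argument are all assumptions. It needs Tcl 8.5 or later for dict.

```tcl
# Hypothetical parser for one header comment in the style-guide layout.
# Returns a dict with the keys name, description, arguments and results.
proc parseHeader {comment} {
    set doc [dict create name "" description "" arguments {} results ""]
    set section description
    set argIndent -1     ;# indentation of the first argument line
    set current ""       ;# argument currently being described
    foreach line [split $comment \n] {
        # Keep only comment lines and drop the leading "#".
        if {![regexp {^\s*#(.*)$} $line -> text]} continue
        set trimmed [string trim $text]
        if {$trimmed eq ""} continue
        if {[regexp {^(\S+)\s+--$} $trimmed -> name]} {
            dict set doc name $name
        } elseif {$trimmed eq "Arguments:"} {
            set section arguments
        } elseif {$trimmed eq "Results:"} {
            set section results
        } elseif {$section eq "arguments"} {
            regexp {^(\s*)} $text -> lead
            set indent [string length $lead]
            if {$argIndent < 0} { set argIndent $indent }
            if {$indent <= $argIndent
                    && [regexp {^(\S+)\s+(.*)$} $trimmed -> current rest]} {
                # Same indentation as the first argument: a new argument.
                dict set doc arguments $current $rest
            } elseif {$current ne ""} {
                # Deeper indentation: continuation of the previous argument.
                set sofar [dict get $doc arguments $current]
                dict set doc arguments $current "$sofar $trimmed"
            }
        } else {
            # Description or results text: join the lines with spaces.
            set sofar [dict get $doc $section]
            dict set doc $section [string trim "$sofar $trimmed"]
        }
    }
    return $doc
}
```

Applied to the header above, [dict get $doc name] would yield pkg_compareExtension and [dict get $doc arguments] would hold one entry each for fileName and ext.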


The requirements are simple to describe:

Summary of reactions

The replies to the first version of this TIP were quite positive: both Donal Porter and Cameron Laird think it is a good idea. Juan Gil gave a very extensive reaction, describing a more general framework that would eventually result in a system for generating all kinds of output from Tcl scripts, TIPs and so on.

To do him more justice without repeating his entire document: he proposes the use of XML as an intermediate format holding the structure of the information. The advantage is the possibility of reusing all existing tools and (de facto) standards, notably DocBook, in this context.

Even though I share some of Juan's enthusiasm, I am a bit awed by it: the original idea of this TIP is not so much to create a publication system as to provide an easy-to-use tool for automatically extracting useful information in a nice shape. Eventually it could develop into something of the kind Juan describes, but that should not be the first goal.

The technique he proposes for representing the information structure is quite usable (and akin to the rendering process of TIPs):

The first problem to solve is then finding a suitable structure for the information we need to extract. This is the subject of the next section.

Will Duquette and Andreas Kupries mentioned the frequent use of specialised commands that introduce new commands (rather than a straightforward call to "proc"). This feature will have to be looked into, because if you only look for lines like "proc some-command { ... } {", you might well miss the essentials of such applications.
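One way to cope with such applications would be to let the user tell the tool which commands define new commands. The sketch below assumes a hypothetical procedure findDefinitions that takes a list of defining command names; the name defcommand in the usage note is likewise invented for illustration.

```tcl
# Hypothetical helper: find lines whose first word is one of the
# "defining" commands and report the command name that follows it.
# By default only plain "proc" is recognised.
proc findDefinitions {source {definers {proc}}} {
    set found {}
    foreach line [split $source \n] {
        # Split the line into whitespace-separated words.
        set words [regexp -all -inline {\S+} $line]
        if {[llength $words] >= 2
                && [lsearch -exact $definers [lindex $words 0]] >= 0} {
            lappend found [list [lindex $words 0] [lindex $words 1]]
        }
    }
    return $found
}
```

With the definers list set to {proc defcommand}, a line such as "defcommand b {x} {...}" would be reported alongside ordinary proc definitions.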

Information in a Tcl script

Tcl scripts are organised in three essentially unrelated ways:

In practice these methods are used together, but there is no guarantee, for instance, that a source file contains only one package, and programs will quite often use more than one package.

On a smaller scale, the following items are of importance:

For a user it will be important to know what a program or package has to offer and how to get this functionality:

For a maintainer of the code, additional information would include:

A possible structure in which all this information can be stored and retrieved is sketched below:

    source files (list of)
    packages (list of)
    procedures (list of)
    command-line arguments:
       description of each option and the values (if any)
       associated with it
       description of other arguments (such as file names)
 source files:
    packages (list of)
    procedures (list of)
    package A:
          packages that are required
       local and global variables:
          variable D:
             used by which procedures?
          variable E:
             used by which procedures?
          procedure F:
             exported or not
                argument G:
                argument H:
          procedure I:

Except for the descriptions, all these items can, at least in principle, be extracted automatically. So, even if the programmer has been too lazy to describe his/her procedures in detail, some information can still be retrieved, for instance about the use of global data and the complexity of the argument lists.
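For procedures that are already defined in an interpreter, Tcl's own introspection commands (info args, info default and info body) provide part of this information. The following sketch uses them. The name describeProc and the shape of its result are assumptions, and the detection of global variables is a rough heuristic that only looks for "global" at the start of a statement.

```tcl
# Hypothetical helper: describe an already-defined procedure using
# the introspection commands info args, info default and info body.
proc describeProc {name} {
    set arguments {}
    foreach arg [info args $name] {
        if {[info default $name $arg value]} {
            lappend arguments [list $arg optional $value]
        } else {
            lappend arguments [list $arg required]
        }
    }
    # Rough heuristic: collect the names that follow a "global" statement.
    set globals {}
    foreach statement [split [info body $name] "\n;"] {
        if {[regexp {^\s*global\s+(.+)$} $statement -> names]} {
            foreach g [regexp -all -inline {\S+} $names] {
                if {[lsearch -exact $globals $g] < 0} { lappend globals $g }
            }
        }
    }
    return [list arguments $arguments globals $globals]
}
```

The result is a key/value list, so it can be read with dict get in Tcl 8.5 or later, or converted with array set in older versions.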

Note that as far as the three methods of organisation are concerned, there is no attempt to define a practical relationship between them.

To refer to the example above:

 procedure: pkg_compareExtension
       "Used internally by pkg_mkIndex to compare the extension of a file to
       a given extension. On Windows, it uses a case-insensitive comparison
       because the file system can be file insensitive."
       argument fileName:
             "name of a file whose extension is compared"
       argument ext:
             "(optional) The extension to compare against; you must
             provide the starting dot.
             Defaults to [info sharedlibextension]"
          "Returns 1 if the extension matches, 0 otherwise"

By adopting a tree structure to represent the information extracted from the source code, one can be as flexible as is likely to be needed. For instance, suppose one would like to extract certain metrics, like the number of lines or the cyclomatic complexity. This could then be an additional node in the subtree for the command procedure, besides the list of arguments, the result and the description.
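A nested dictionary is one possible way to hold such a tree. The sketch below assumes Tcl 8.5 or later, and the procedure names newProcNode and addLineCount as well as the key names are invented for illustration. It adds only a simple line count as a metric; cyclomatic complexity would need a real parser.

```tcl
# Hypothetical constructor for the subtree of one procedure.
proc newProcNode {name description arguments result} {
    return [dict create name $name description $description \
                arguments $arguments result $result]
}

# Add a "metrics" node holding the number of non-blank body lines.
# The existing nodes (arguments, result, description) are left untouched.
proc addLineCount {node body} {
    set count 0
    foreach line [split $body \n] {
        if {[string trim $line] ne ""} { incr count }
    }
    dict set node metrics lines $count
    return $node
}
```

Because the metric lives in its own node, code that only reads the description or the arguments does not need to change when further metrics are added.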


This document is placed in the public domain.