Tcl Source Code

Ticket UUID: 3600058
Title: Doctools nroff/groff output not supported by tcltk-man2html
Type: Bug
Version: current: 8.6.0
Submitter: twylite
Created on: 2013-01-09 09:20:41
Subsystem: 55. Other Tools
Assigned To: dkf
Priority: 8
Severity: Minor
Status: Open
Last Modified: 2015-05-01 14:28:55
Resolution: None
Closed By: nobody
Closed on:
Description:
Tcl's tools/tcltk-man2html is used to generate HTML documentation from nroff/groff sources on multiple platforms.  It supports a subset of nroff that is used by the man pages in the Tcl core and bundled packages.

Doctools, as used for documentation in Tcllib, produces nroff output that is not supported by tcltk-man2html.  This means that the Tcl core utilities for HTML doc generation cannot also generate Tcllib documentation in the same style.  It would be desirable to extend tcltk-man2html to support doctools output.

There are a number of issues, and complete support will entail changes to tcltk-man2html and fixes to doctools.

PATCH

For all issues below with proposed fixes, the fix has been implemented and can be found on the branch bug-3600058-td.

BACKGROUND

Tcllib nroff documentation is generated using 'tclsh sak.tcl doc nroff' which places the .n files in doc/nroff.  The files are then moved to be directly under doc/ so that they can be found by tcltk-man2html, which is invoked as 'tclsh tcltk-man2html.tcl --tcl --pkgdir=..\..\ --verbose=1'

The tcltk-man2html source was modified to provide more detailed output in some cases.


ISSUE #1: Section name with no description crashes

-----
scanning page C:/User/Tcl_BUILD/tcl-8.6.0/tcllib/doc/pkg/doc/coro_auto.n
coro_auto: NAME: output-name: bad section name: coroutine::auto -
can't read "head": no such variable
can't read "head": no such variable
    while executing
"man-puts "$head — $tail""
    (procedure "output-name" line 8)
-----

Proposed fix: Adjust the regex in output-name to not require a space after the dash (consume 1 optional whitespace).
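
A minimal sketch of the idea, assuming a hypothetical helper split-name-line (this is not the actual output-name code, and it locates the separator by index rather than with a single capturing regex).  The dash may be written "-" or "\-" and may be followed by whitespace or by the end of the line, so an empty description no longer fails:

```tcl
# Hypothetical helper, not taken from tcltk-man2html.
# Splits a NAME line "name - description" into {name description}.
proc split-name-line {line} {
    # Find the first dash that is preceded by whitespace and followed by
    # whitespace or end of line; the backslash of nroff's "\-" is optional.
    if {![regexp -indices {\s+\\?-(?:\s|$)} $line sep]} {
        return -code error "bad section name: $line"
    }
    lassign $sep first last
    set head [string trim [string range $line 0 [expr {$first - 1}]]]
    set tail [string trim [string range $line [expr {$last + 1}] end]]
    return [list $head $tail]
}

puts [split-name-line {coroutine::auto \-}]
# prints: coroutine::auto {}
puts [split-name-line {struct::graph v1 - Create and manipulate directed graph objects}]
# prints: {struct::graph v1} {Create and manipulate directed graph objects}
```

With an empty description the caller can then decide whether to warn instead of failing on an unset variable.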


ISSUE #2: Doctools escapes single quote at start of line

-----
scanning page C:/User/Tcl_BUILD/tcl-8.6.0/tcllib/doc/pkg/doc/csv.n
csv: process-text: uncaught backslash:
IN {Takes a \fImatrix\fR object following the API specified for the
struct::matrix package and returns a string in CSV format containing
these values. The separator character can be defined by the caller,
but this is optional. The default is ",". The quoting character
can be defined by the caller, but this is optional. The default is
\'"'. Each row of the matrix is considered a record, these are
separated by newlines in the result. The elements of each record are
formatted as usual (via \fB::csv::join\fR).}
OUT {Takes a <I>matrix</I> object following the API specified for the
struct::matrix package and returns a string in CSV format containing
these values. The separator character can be defined by the caller,
but this is optional. The default is ",". The quoting character
can be defined by the caller, but this is optional. The default is
\'"'. Each row of the matrix is considered a record, these are
separated by newlines in the result. The elements of each record are
formatted as usual (via \fB::csv::join\fR).}
-----

csv.man contains: 
-----
Takes a list of values and returns a string in CSV format containing
these values. The separator character can be defined by the caller,
but this is optional. The default is ",". The quoting character can
be defined by the caller, but this is optional. The default is '"'.
-----

csv.n contains: 
-----
Takes a \fImatrix\fR object following the API specified for the
struct::matrix package and returns a string in CSV format containing
these values. The separator character can be defined by the caller,
but this is optional. The default is ",". The quoting character
can be defined by the caller, but this is optional. The default is
\'"'. Each row of the matrix is considered a record, these are
separated by newlines in the result. The elements of each record are
formatted as usual (via \fB::csv::join\fR).
-----

Proposed fix: add pair ( {\'} "'" ) to charmap in process-text.
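
As an illustration only, the effect of such a pair can be shown with [string map].  The helper name unescape-nroff and the two-entry map are assumptions; the real charmap in process-text contains many more entries:

```tcl
# Hypothetical helper with an illustrative two-entry map.
proc unescape-nroff {text} {
    set charmap [list {\'} "'" {\-} "-"]
    return [string map $charmap $text]
}

puts [unescape-nroff {The default is \'"'.}]
# prints: The default is '"'.
```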


ISSUE #3: Doctools may generate redundant font changes, which are ignored but result in an unhandled backslash error

-----
scanning page C:/User/Tcl_BUILD/tcl-8.6.0/tcllib/doc/pkg/doc/docidx.n
docidx: process-text: impotent font change: If not, the list of per-object search paths is searched. For each
directory in the list the package checks if that directory contains a
file "\fIidx.\fIfoo\fR\fR". If yes, then that file is taken as the
implementation.
docidx: process-text: uncaught backslash: IN {If not, the list of per-object search paths is searched. For each
directory in the list the package checks if that directory contains a
file "\fIidx.\fIfoo\fR\fR". If yes, then that file is taken as the
implementation.} OUT {If not, the list of per-object search paths is searched. For each
directory in the list the package checks if that directory contains a
file "<I>idx.foo</I>\fR". If yes, then that file is taken as the
implementation.}
-----

Proposed fix: doctools is arguably generating a redundant \fR, but we shouldn't crash because of it.  The font handling logic could be rewritten so that every span of text terminated by a \fx or \f(xy escape gets its own opening and closing font tags, which handles nested and redundant font changes generically.


ISSUE #4: oops: Copyright (c)

There are a number of these due to the variety of copyright statement styles in Tcllib.

Proposed fix: Support common styles "Copyright (c) YEAR, ..." and "Copyright (c) YEAR,YEAR,YEAR,... ...", and possibly "YEAR-YEAR,YEAR,...".  Less common styles should be handled by warning (as currently) and fixing the source in Tcllib.
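
One possible shape for this, sketched as a two-stage match: first split the line into a year list and a holder, then extract the individual years.  The proc name parse-copyright, its return format, and the sample holder "Example Author" are all made up for illustration; this is not the parser in tcltk-man2html:

```tcl
# Hypothetical helper.  Returns {firstYear lastYear holder}, or an empty
# list when the line is not in one of the supported styles.
proc parse-copyright {line} {
    # Stage 1: "Copyright (c) <years> <holder>", where <years> is a run of
    # four-digit years separated by commas or dashes.
    set yearlist {\d{4}(?:\s*[-,]\s*\d{4})*}
    set re "^\\s*Copyright\\s+\\(c\\)\\s+($yearlist),?\\s+(.*)\$"
    if {![regexp -nocase $re $line -> years holder]} {
        return {}
    }
    # Stage 2: pull out every year and keep the earliest and latest.
    set all [lsort -integer [regexp -all -inline {\d{4}} $years]]
    return [list [lindex $all 0] [lindex $all end] [string trim $holder]]
}

puts [parse-copyright {Copyright (c) 2003, Example Author}]
puts [parse-copyright {Copyright (c) 2002,2003,2005 Example Author}]
puts [parse-copyright {Copyright (c) 1998-2000, 2004 Example Author}]
```

A line that returns the empty list would be reported with the existing warning and fixed in the Tcllib source.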


ISSUE #5: Crash from treating string as list

-----
scanning page C:/User/Tcl_BUILD/tcl-8.6.0/tcllib/doc/pkg/doc/snitfaq.n
list element in quotes followed by "INFO"" instead of space
list element in quotes followed by "INFO"" instead of space
    while executing
"llength $rest"
    (procedure "make-manpage-section" line 100)
    invoked from within
"make-manpage-section $html $arg"
    (procedure "make-man-pages" line 35)
    invoked from within
"make-man-pages $webdir  [list $tcltkdir/{$appdir}/doc/*.1 "$tcltkdesc Applications" UserCmd  "The interpreters which implement $cmd
esc."]  [plus-base ..."
    ("try" body line 94)
-----

Proposed fix: In make-manpage-section the .SS case does a check {[llength $rest] == 0}, but rest is a string (not necessarily a valid list).  Check should be {$rest eq {}}.
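
The difference can be reproduced in isolation.  The sample string below is invented to trigger the same class of error, and is-blank is a hypothetical helper using the whitespace-tolerant form of the check:

```tcl
# Hypothetical helper: true when the string is empty or only whitespace.
proc is-blank {rest} {
    return [expr {[string trim $rest] eq {}}]
}

# Arbitrary heading text is not necessarily a well-formed Tcl list.
set rest {log "INFO"" messages}

puts [catch {llength $rest} msg]   ;# prints 1: llength raises an error
puts [is-blank $rest]              ;# prints 0
puts [is-blank "   "]              ;# prints 1
```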


ISSUE #6: unrecognised format directive in .CS block

-----
scanning page C:/User/Tcl_BUILD/tcl-8.6.0/tcllib/doc/pkg/doc/docidx_lang_intro.n
docidx_lang_intro: make-manpage-section: unrecognized format directive: ...
docidx_lang_intro: make-manpage-section: unrecognized format directive: ...
docidx_lang_intro: ADVANCED STRUCTURE: output-directive: unexpected .CS format:
[<B>include FILE</B>]
[<B>vset VAR VALUE</B>]
[index_begin GROUPTITLE TITLE]

[index_end]

docidx_lang_intro: ADVANCED STRUCTURE: output-directive: unexpected .CE
-----

docidx_lang_intro.n contains:
-----
.CS
[\fBinclude FILE\fR]
[\fBvset VAR VALUE\fR]
[index_begin GROUPTITLE TITLE]
...
[index_end]
.CE
-----

Proposed fix: Various other pages have example lines starting with a period (some are widget names, e.g. .text in snitfaq.n, .plot in statistics.n).  This is (to my knowledge) invalid output generated by doctools, and should be fixed in doctools.  As a workaround, the handling of an unrecognised directive should be adjusted to treat the line as text if it occurs in a .CS block (a common case).  Outside a .CS block the current behaviour (ignore line) should be maintained.

The reason for the workaround in the .CS case is that an unrecognised directive in the middle of text will cause the line buffer to contain ".CS text text .CE" instead of ".CS text .CE" which causes the .CS/.CE output processing to fail.
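
The workaround can be sketched with a much-simplified line classifier.  The proc classify-lines, its {kind line} output format, and the warning texts are assumptions made for illustration; the real logic lives in make-manpage-section and handles many more directives:

```tcl
# Hypothetical sketch.  "known" is the list of recognised directive names.
# Returns a list of {kind line} pairs, kind being "text" or "directive".
proc classify-lines {lines known} {
    set inCode 0
    set out {}
    foreach line $lines {
        # Lines that do not start with a period are ordinary text.
        if {![regexp {^\.(\S*)} $line -> name]} {
            lappend out [list text $line]
            continue
        }
        if {$name in $known} {
            if {$name eq "CS"} { set inCode 1 }
            if {$name eq "CE"} { set inCode 0 }
            lappend out [list directive $line]
        } elseif {$inCode} {
            # Unrecognised "directive" inside .CS/.CE: warn, keep as text.
            puts stderr "warning: treating '$line' as text inside .CS"
            lappend out [list text $line]
        } else {
            # Outside a code block: warn and ignore the line.
            puts stderr "warning: unrecognized format directive: $line"
        }
    }
    return $out
}

set page [list .CS {[index_begin GROUPTITLE TITLE]} ... {[index_end]} .CE]
foreach item [classify-lines $page {CS CE}] { puts $item }
```

Because the "..." line is kept as text, the .CS block still has the shape that the later .CS/.CE output stage expects.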


ISSUE #7: Empty .CS block is unsupported

While processing pki.n:
-----
pki: EXAMPLES: output-directive: unexpected .CS format:
.CE

.CS
pki: EXAMPLES: output-directive: unexpected .CE
-----

pki.n contains:
-----
.SH EXAMPLES
.CS



.CE
.CS



.CE
-----

Proposed fix: This occurs because the empty lines cause the line buffer to contain ".CS .CE" instead of ".CS text .CE", which causes the .CS/.CE output processing to fail.  Permit an empty code section by having the .CE handler force the line buffer to flush, even if it is empty.


ISSUE #8: Unsupported .TP \fB...\fR

-----
scanning page C:/User/Tcl_BUILD/tcl-8.6.0/tcllib/doc/pkg/doc/struct_list.n
struct_list: make-manpage-section: ignoring .TP after .TP
struct_list: make-manpage-section: ignoring .TP after .TP
-----

struct_list.n contains:
-----
.TP \fB...\fR
.TP
\fBi\fR
Application of the command to the result of the last call and the
\fBi\fR'th element of the list.
.TP \fB...\fR
.TP
\fBend\fR
Application of the command to the result of the last call and the last
element of the list. The result of this call is returned as the result
of the subcommand.
-----

Proposed fix: None.  This appears to be invalid markup generated by doctools, and should be fixed there.


ISSUE #9: Spaces in package names

-----
scanning page C:/User/Tcl_BUILD/tcl-8.6.0/tcllib/doc/pkg/doc/graph1.n
graph1: NAME: output-name: name has a space: {struct::graph v1}
from: struct::graph v1 - Create and manipulate directed graph objects
...
scanning page C:/User/Tcl_BUILD/tcl-8.6.0/tcllib/doc/pkg/doc/matrix1.n
matrix1: NAME: output-name: name has a space: {struct::matrix v1}
from: struct::matrix v1 - Create and manipulate matrix objects
...
scanning page C:/User/Tcl_BUILD/tcl-8.6.0/tcllib/doc/pkg/doc/struct_tree1.n
struct_tree1: NAME: output-name: name has a space: {struct::tree v1}
from: struct::tree v1 - Create and manipulate tree objects
-----

Proposed fix: Unknown.

User Comments:

dgp added on 2015-05-01 14:28:55:
status?  dkf?

twylite added on 2014-04-28 19:45:31:
Final updates are in branch [bug-3600058-td], commit [cdb671dd50].  Tcllib fixes have been incorporated into the trunk of that project.

This branch DOES NOT merge cleanly onto trunk: 3 minor conflicts have to be resolved.  It does merge cleanly onto core-8-6-0 (which is the baseline I'm working against).

Summary of all changes in branch:
  - Add ".SH CATEGORY" to all Tcl doc files (.1, .3 and .n)
  - Introduce an option "--pkgdir" (default $tcldir/pkgs) to tcltk-man2html.tcl. The tool will search this folder for subfolders containing 'configure.in', 'configure.ac' or 'unix/configure.in', and if one is found, treat the subfolder as a package.  The tool will also look in --pkgdir *in addition to $tcldir/pkgs* for a packages.list.txt file.
  - Separated status (stdout) and error (stderr) outputs in the tool.
  - Fixed handling of nroff markup to accommodate output from Tcllib doctools.
  - Added support for category headers (.SH CATEGORY) in man pages.  A categorised table of commands is displayed on the package contents page below the alphabetical list of commands.  The alphabetical list is now rendered using CSS columns (used to be a table).
  - More forgiving parsing of Copyright messages, to accommodate copyrights in Tcllib and other packages.

The two problems noted in the previous comment on this bug (multiple definitions and \fP) have not been resolved - this is bad data coming from Tcllib and needs to be addressed there.

I believe the bug is now addressed; any further issues can be raised independently.

Please review this change and let me know whether to commit to trunk, or what to fix.  Thanks.

twylite added on 2013-01-18 23:28:21:
Updates pushed to branch bug-3600058-td.  See Tcllib bug 3601370 and (tcllib) branch bug-3601370 for corresponding fixes to Tcllib.

There are still some multiple cross reference / multiple definition problems to be addressed, e.g.

  try is defined in more than one file: TclCmd/try TcllibCmd/tcllib_try
  multiple cross reference to interp in TclCmd/interp TcllibCmd/tcllib_interp from TkCmd/clipboard.n

I haven't looked at the \fP issue mentioned in the last comment.

Otherwise most issues appear to have been fixed.  I'll provide a summary in a future comment.

dkf added on 2013-01-15 21:15:00:
Doctools shouldn't generate \fP at all, but we would have enough information to be able to handle it correctly where it comes up in manually-written nroff.

twylite added on 2013-01-15 16:37:48:
For #1, #6 and #7 (not entirely appropriate nroff output by doctools) my intention was to warn (but not crash) and be able to continue sanely.  This is the fastest route to working doc generation for tcllib, and gives time for fixes upstream.

For #3 (doctools generates redundant font changes), note that the "previous font" is a state, not a stack, so doctools cannot safely generate \fB...\fP if there is a chance that the contents contain a font change.  The specific case here is that docidx.man contains '...contains a file [file idx.[term foo]]. If yes...', which doctools transforms into '...contains a file "\fIidx.\fIfoo\fR\fR". If yes...'.  If [file] and [term] produced \fI...\fP instead, the result would be '...contains a file "\fIidx.\fIfoo\fP\fP". If yes...', which (because font previous doesn't stack, per nroff docs) would render 'If yes...' in italics.

The logic of the change is to say that if you have an nroff paragraph with a series of font changes 'text1\fItext2\fBtext3\fR...' then you can transform this into a logical structure { regular => "text1", italics => "text2", bold => "text3", regular => "..." } and from there generate HTML start and end tags for each text fragment (regular needs no tags): text1<i>text2</i><b>text3</b>.
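
A minimal sketch of that transformation, assuming a hypothetical proc fonts-to-html that recognises only \fB, \fI, \fR and \fP and does no HTML escaping (it is not the code in process-text).  The "previous font" is kept as a single slot rather than a stack:

```tcl
# Hypothetical sketch of span-based font handling.
proc fonts-to-html {text} {
    array set tag {B b I i R {}}
    set font R   ;# font of the fragment being emitted
    set prev R   ;# single "previous font" slot used by \fP
    set out ""
    # Mark each font escape so that split yields text, letter, text, ...
    set sep \u0001
    set pieces [split [regsub -all {\\f([BIRP])} $text "$sep\\1$sep"] $sep]
    foreach {frag letter} $pieces {
        # Every non-empty fragment gets its own open and close tags
        # (roman needs none), so nested or redundant changes are harmless.
        if {$frag ne ""} {
            if {$tag($font) eq ""} {
                append out $frag
            } else {
                append out "<$tag($font)>$frag</$tag($font)>"
            }
        }
        if {$letter eq ""} break
        if {$letter eq "P"} {
            # \fP swaps the current and previous fonts.
            lassign [list $prev $font] font prev
        } else {
            set prev $font
            set font $letter
        }
    }
    return $out
}

puts [fonts-to-html {text1\fItext2\fBtext3\fR...}]
# prints: text1<i>text2</i><b>text3</b>...
puts [fonts-to-html {file "\fIidx.\fIfoo\fR\fR". If yes}]
# prints: file "<i>idx.</i><i>foo</i>". If yes
```

Feeding the same sketch '\fIidx.\fIfoo\fP\fP. If yes' leaves the trailing text in italics, which matches the non-stacking behaviour described above.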

I'll look at the \fP hack you mention - I didn't notice it.

For #4 (copyright) I don't think tcltk-man2html should handle every conceivable way of declaring copyright.  Handling a reasonable subset and warning for the rest is good enough, and these can be fixed upstream.

Thanks for the clarification on #9.  I agree that #8 (.TP) and #9 (NAME with spaces) should be fixed upstream, and no workaround introduced to tcltk-man2html.

I will update the code for #5 as suggested.

dkf added on 2013-01-15 16:14:56:
Going through these issues:

#1: Strictly not a bug in Tcl, as the line after .SH NAME *must* consist of “blah \- blah blah” (with some scope for flexibility in the “blah”s) or the external tool mkdirhier won't work. This isn't our restriction! OTOH, there's no reason for us to fail to accept it in our own conversion code.

#2: Reasonable.

#3: Still trying to wrap my head around the change here, but if we're moving to a state-maintaining approach for conversions within a paragraph, we can move to supporting \fP properly (it should mean “previous font” and not “roman font” so the current hack higher up is wrong).

#4: The important thing is to get the first and especially last year. It might be worth using a two-stage match for this rather than trying to do everything in the one RE.

#5: Reasonable, but recommend using {[string trim $rest] eq {}} just in case.

#6: This is a doctools bug, but if we can still make this a warning it would be a help.

#7: Why are there empty examples? That would seem to me to be the real problem here. (Again, happy to support if we can spit out a warning.)

#8: That's just wrong. Doctools bug. The argument to .TP (as opposed to the following line) should be the indent level to use. Either generate .IP (with "quoted" leading material) or do .TP right.

#9: That's also incorrect output by doctools; NAME sections have very restricted formats (not because of anything done by Tcl, but because of the ways manpages are processed by other parts of the Unix ecosystem) and the version number doesn't belong there.