manServer - convert manual pages to HTML for viewing with a web browser
manServer [ -s [ port ] ] [ filename ]
manServer is a troff (or nroff) to HTML interpreter written in Perl. It is designed specifically to convert manual pages written using the man(7) troff macros into HTML for display and navigation with a web browser. To this end it includes direct support for man macros (also limited support for tbl, eqn and the doc macros) as well as the most common troff directives.
It differs from programs like man2html which merely take rather ugly nroff output and put a thin HTML wrapper around it.
The following macros and directives are supported by manServer, with varying degrees of correspondence to their troff counterparts. Where there are significant differences these are usually due to the fact that troff tends to be layout-oriented while HTML is more content-oriented and does not provide the same kind of detailed layout control.
Standard man paragraph macros are quite well implemented, using tables to control indenting. These include:
.TP .HP .IP      Hanging paragraph types
.LP .PP .P       Flush left paragraphs
.RS .RE          Indented blocks
.SH .SS          Section and subsection headings
.br .sp          Line break and spacing
Tabs and tab stops (.ta, .DT) work, but only when using a constant width font. In practice this means within a no-fill block (.nf) that doesn't include any font style or size changes.
Temporary indents (.ti) and spacing control (.sp, paragraph spacing with .PD) don't work terribly well because HTML does not provide appropriate control.
Standard troff directives (including in-line directives) are reasonably well implemented, including:
\fB \f2 \fC \fP etc.    Font control
.B .I .R                Single word in alternate fonts
.RI .BI .BR etc.        Alternating roman/italic/bold text styles
\s±n \s0                Point size control
.ft .ps .SM             Font & size control
.nf .fi                 No-fill block (eg. code examples)
\u \d                   Super/sub-scripting
Only one level of super or subscript within a line is supported when using \u or \d or eqn sup and sub tags.
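As an illustration of how in-line font escapes map onto HTML, here is a minimal Python sketch. It is not manServer's own code (manServer is written in Perl); the function name `fonts_to_html` and the choice of `<b>`, `<i>` and `<tt>` tags are assumptions made for the example. It handles only single-character font names, and it treats \fP as a return to roman.

```python
import html
import re

# Hypothetical mapping from troff font names to HTML tags.
# Roman (R, 1), \fP and any unknown font fall through to "no tag".
FONT_TAGS = {"B": "b", "3": "b", "I": "i", "2": "i", "C": "tt"}

def fonts_to_html(line):
    """Translate single-character \\fX escapes in one line into HTML tags."""
    out = []
    open_tag = None   # tag currently open, if any
    pos = 0
    for m in re.finditer(r"\\f(.)", line):
        # Emit the text before this escape, with HTML specials escaped.
        out.append(html.escape(line[pos:m.start()]))
        pos = m.end()
        if open_tag:
            out.append(f"</{open_tag}>")
        open_tag = FONT_TAGS.get(m.group(1))
        if open_tag:
            out.append(f"<{open_tag}>")
    out.append(html.escape(line[pos:]))
    if open_tag:
        out.append(f"</{open_tag}>")   # never leave a tag open at end of line
    return "".join(out)

print(fonts_to_html(r"use \fBls\fP with \fI-l\fR now"))
```

A fuller version would keep a stack of previous fonts so that \fP restores the font that was actually in effect before the last change.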
Special characters in troff are geared towards typesetting mathematical equations, whereas the iso8859 character set supported by most browsers primarily includes foreign accented characters. Special characters are implemented wherever possible, but a large number of troff characters (such as Greek letters, square root signs and so on) have no equivalent in HTML. Conversely, HTML supports many more accented characters than troff, and these can be included by inserting HTML character entities such as &THORN; (Þ) directly in the troff source.
Overstriking (\z, \o) and local motion (\h, \v) directives have no equivalent in HTML and are filtered out. A handful of overstrike combinations (combining a colon or apostrophe with a vowel to produce an accented character) are recognised and converted however.
A key benefit of using HTML is being able to turn the manual pages into hypertext for convenience of browsing:
- Items that look like a man page reference, especially following a .BR tag, are turned into links. (This only applies if manServer is running as a server, not from the command line).
- URLs (http, ftp and mailto) are converted to links.
- Included text (using a .so directive) is implemented as a link to a separate page.
- A table of contents with links to each section (.SH or .SS) is generated at the start of the page.
- Text in bold which starts with a capital letter and matches the name of a section is turned into a link to that section.
- An index of man page sections and the contents of each section are produced to allow browsing throughout the man pages.
- As well as browsing through the index the name of a man page can be entered in a search dialog. Whenever a search is ambiguous a choice of matching pages is given.
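The first two kinds of link in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not manServer's actual heuristics: the regular expressions, the function name `linkify` and the `/name.section` link target are all assumptions. Trailing punctuation is kept out of the URL, in the spirit of the 1.04 fix noted in the change history.

```python
import re

# One pattern with two alternatives: a URL, or something like "name(1)".
# Both expressions are illustrative guesses, not manServer's own rules.
PATTERN = re.compile(
    r"(?P<url>(?:https?://|ftp://|mailto:)[^\s<>\"]+)"
    r"|(?P<name>\b[A-Za-z][\w.+-]*)\((?P<sec>\d[a-z]*)\)"
)

def linkify(text):
    """Wrap URLs and man page references in <a> tags."""
    def repl(m):
        if m.group("url"):
            url = m.group("url")
            # Keep sentence punctuation outside the link.
            stripped = url.rstrip(".,;:!?)")
            trailing = url[len(stripped):]
            return f'<a href="{stripped}">{stripped}</a>{trailing}'
        name, sec = m.group("name"), m.group("sec")
        # Hypothetical link target; the real URL scheme is not documented here.
        return f'<a href="/{name}.{sec}">{name}({sec})</a>'
    return PATTERN.sub(repl, text)

print(linkify("See tbl(1) or http://www.squarebox.co.uk/."))
```

The sketch assumes its input has not yet been HTML-escaped; a real converter would have to order escaping and link detection carefully.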
The following are well implemented:
- Strings can be defined and interpolated (.ds).
- Simple macros can be defined (.de) and used, and also renamed or removed (.rn, .rm).
- Ignore blocks (.ig) and comments (.\") are propagated through as HTML comments.
The following features are not implemented as fully as they might be.
- Conditional expressions (.if, .ie, .el) are barely implemented.
- Number registers and interpolation are partly implemented, but don't support auto-increment or formatting styles.
There's no way of measuring the width of text when a proportional font is used for rendering, so a very approximate guess is calculated when using the \w directive (normal characters count as 1, whereas spaces and punctuation count as half a character).
A number of features are not yet implemented though arguably they could or should be.
- Text diversions and input traps.
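The approximate width rule used for the \w directive (normal characters count as 1, spaces and punctuation as half) is simple enough to sketch in Python. The function name `approx_width` is hypothetical, and treating every character in `string.punctuation` as punctuation is an assumption about where the line is drawn.

```python
import string

def approx_width(text):
    """Estimate the width of text in character units for a proportional font."""
    width = 0.0
    for ch in text:
        if ch.isspace() or ch in string.punctuation:
            width += 0.5   # spaces and punctuation count as half a character
        else:
            width += 1.0   # everything else counts as a full character
    return width

print(approx_width("ls -l"))
```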
Tables using the tbl(1) preprocessor (between .TS and .TE macros) are rendered quite well, the main deficiency being that HTML tables do not give you control over whether individual cells have a border around them or not.
A number of heuristics are applied to try to determine when a row is actually a continuation of the previous row so the data can be merged.
A half-hearted attempt at interpreting eqn(1) tags (between .EQ and .EN and inline delimiters) is made and copes reasonably well with simple expressions including things like super and subscripts. It makes no attempt to implement features that would result in more than one line of output however.
Manual pages written using the Berkeley doc macros are recognised and implemented to about the same extent as the corresponding man macros. (If anything, doc macros are easier to implement because they are slightly more content-oriented, if a little odd).
Supported doc macros include:
.Dt .Sh .Ss           Title and section headings
.Nd .Os .Dd           Document name etc.
.Bd .Ed               Fill block
.Bl .El .It           Lists
.Xr .Sx               Cross references
.Op .Fl .Pa .Ns .No .Ad .Em .Fa
.Ft .Ic .Cm .Va .Sy .Nm .Li .Dv
.Ev .Tn .Dl .Bq .Qq .Qo .Qc etc.
                      Assorted content-based style tags (option, flag, etc.)
Synopsis: manServer [ -s [ port ] ] [ [-ddebuglevel] filename ]
manServer works in one of three modes, depending on how it is invoked:
o If given a filename the file is processed and the HTML generated is written to standard out.
o If invoked with the -s option manServer runs as a standalone HTTP server, directly responding to requests from a web browser, either on port 8888 or the port specified. manServer enters a loop and continues processing requests until it crashes, is killed, or the universe ends. Note that for speed and simplicity, a new process is not forked to process each request, so if the server dies while processing a request you will have to restart it.
o If invoked with no arguments and the GATEWAY_INTERFACE environment variable is set it is assumed to be running as a CGI script invoked by a standalone web server such as CERN HTTPD and a single request is serviced. (If no arguments are specified and GATEWAY_INTERFACE is not set then manServer enters its HTTP server mode.)
manServer processes requests for individual man pages, usually specified as just the base name (eg. ls) or as a name plus section number (eg. ls.1) in which case they are searched for in the normal MANPATH hierarchy. They may also be fully qualified (eg. /usr/man/man1/ls.1) and this form is necessary if the page appears in more than one manual hierarchy.
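The choice between the three modes described above can be summarised as a small decision function. The following Python sketch is a reading of that description, not a transcription of manServer's Perl code; the name `choose_mode` and the tuple it returns are hypothetical, and the debug option shown in the synopsis is ignored.

```python
DEFAULT_PORT = 8888   # default server port given in the text

def choose_mode(args, env):
    """Return (mode, detail) for the given arguments (program name excluded)."""
    if not args:
        # No arguments: CGI if the web server set GATEWAY_INTERFACE,
        # otherwise fall back to standalone HTTP server mode.
        if "GATEWAY_INTERFACE" in env:
            return ("cgi", None)
        return ("server", DEFAULT_PORT)
    if args[0] == "-s":
        # Optional port after -s; default port otherwise.
        if len(args) > 1 and args[1].isdigit():
            return ("server", int(args[1]))
        return ("server", DEFAULT_PORT)
    # Anything else is taken as a filename to convert to standard output.
    return ("file", args[0])

print(choose_mode(["-s", "8080"], {}))
```

Passing the argument list and environment in explicitly, rather than reading `sys.argv` and `os.environ` inside the function, keeps the sketch easy to exercise on its own.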
gzip'd manual pages (ending in a .gz suffix) are automatically expanded with zcat; .bz2 compressed pages are handled similarly.
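The same suffix-based dispatch can be sketched in Python. Note the swap in technique: manServer pipes compressed pages through an external program such as zcat, whereas this example uses Python's standard gzip and bz2 modules. The function name `read_page` and the Latin-1 encoding are assumptions.

```python
import bz2
import gzip

def read_page(path):
    """Return the text of a manual page, decompressing by file suffix."""
    if path.endswith(".gz"):
        opener = gzip.open
    elif path.endswith(".bz2"):
        opener = bz2.open
    else:
        opener = open
    # Latin-1 is assumed here; it accepts any byte sequence without error.
    with opener(path, "rt", encoding="latin-1") as f:
        return f.read()
```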
MANPATH
    Determines which manual pages are available and where to find them. If not specified /usr/man/man* is used.
GATEWAY_INTERFACE
    Determines whether manServer runs as a CGI script.
SCRIPT_NAME
    CGI parameter.
PATH_INFO
    CGI parameter.
QUERY_STRING
    CGI parameter.
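To make the role of MANPATH concrete, the lookup of a request such as ls or ls.1 might work roughly as follows. This Python sketch is an assumption about the general approach, not manServer's actual search code: the names `split_request` and `find_pages` are hypothetical, sections are assumed to start with a digit, and only man* subdirectories of each MANPATH entry are scanned. More than one result corresponds to an ambiguous request.

```python
import os
import re

COMPRESSED = (".gz", ".bz2")

def split_request(request):
    """Split 'ls.1' into ('ls', '1'); a bare name gives (name, None)."""
    m = re.fullmatch(r"(.+)\.(\d\w*)", request)
    return (m.group(1), m.group(2)) if m else (request, None)

def find_pages(request, manpath):
    """Return every page file matching the request under a colon-separated MANPATH."""
    if os.path.isabs(request):
        # Fully qualified requests bypass the search entirely.
        return [request] if os.path.isfile(request) else []
    name, section = split_request(request)
    matches = []
    for root in manpath.split(":"):
        if not os.path.isdir(root):
            continue
        for sub in sorted(os.listdir(root)):
            subdir = os.path.join(root, sub)
            if not (sub.startswith("man") and os.path.isdir(subdir)):
                continue
            for fname in sorted(os.listdir(subdir)):
                base = fname
                for suffix in COMPRESSED:
                    if base.endswith(suffix):
                        base = base[: -len(suffix)]
                page, _, sec = base.rpartition(".")
                if page == name and section in (None, sec):
                    matches.append(os.path.join(subdir, fname))
    return matches
```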
    Definition of MANPATH under Linux.
/usr/lib/tmac/tmac.an
    Man macro definitions, used to get definitions of section and referred document abbreviations on various platforms.
/tmp/manServer.log
    Log file that may or may not contain some debug information.
man(1), man(7), nroff(1), eqn(1), tbl(1)
The program is available at http://www.squarebox.co.uk/download/manServer_107.pl
Rolf Howarth (email@example.com)
16 July 2001 (1.07)
    Added support for bz2 compressed pages. Fixed .so inclusion from other directories. Changes for compatibility with RedHat. Added support for removing and renaming macros. Allow use of \& to suppress URL expansion. Fixed bug when $ used as a table delimiter. Fixed various mdoc bugs for compatibility with Mac OS X. With thanks to the following for suggesting patches: Marache Mathieu, Hans Kristian Fjeld, Carl Mascott, Simon Lai, Martin Kraemer, Dan Terpstra, Kathryn Andersen, Eric S. Raymond, and Jayan Arayanan.
30 November 1999 (1.06)
    Fixed a font style bug. Minor tidying of contents. Support preformatted text as well as nroff pages. Convert special Perl references like perlfunc to links.
4 August 1999 (1.05)
    Fixed URL now that my website is at www.squarebox.co.uk instead of www.parallax.co.uk. Fixed a taint problem and a minor Netscape table oddity (specifying cols=2).
12 Jan 1999 (1.04)
    Use readdir instead of shell expansion to fix tainting problem under perl 5.005. Fix to table of contents generation (prevent redefinition of SH tags). Fix to URL detection so punctuation characters aren't treated as part of link.
10 Nov 1997 (1.03)
    Use Socket.pm to get socket definitions. Fixed a tainting problem under perl 5.004. Fixed problem with spurious line breaks when using doc macros.
30 Sept 1997 (1.02)
    Amended some of the character entities. Fixed a couple of standalone server regression bugs. Script now runs with -T taint checking.
22 Sept 1997 (1.01)
    Introduction of version numbers. Table spanning fixes. No longer introduces broken links when invoked on a file from the command line.
11 Sept 1997 (1.00)
    First release.
May - Aug 1997
    Initial development
Use of newer HTML features (especially style sheets) would improve the look of the pages produced.
manServer gets confused by some tags, particularly within poorly structured manual pages. Some people write some very peculiar manual pages and working out what semantic layout they actually meant from the inconsistent layout tags they used becomes more and more of a lost cause...
Searching man pages (either a free text search or in the manner of apropos(1)) is not yet implemented.
The order of parsing and processing troff input (especially things like special character codes) differs from troff and there may still be one or two bugs in certain situations, eg. where text is parsed twice, contains backslashes and/or angle brackets, or where it interacts with eqn or tbl emulation.
Tags like .B without an argument should apply to the next input line.
\fP should restore previous font, not always revert to roman.
Styles that span more than one tbl table cell need to be set and reset for each HTML table cell.
Various additional directives, including setting indent (.in), multi-line conditionals, renaming registers and macros, and diverts could and should all be implemented.
Different Unix platforms may define their own man macro package with variations from those implemented here. manServer has been tested with pages under SunOS, Solaris and Debian Linux.
v1.07          MANSERVER (l)          16 July 2001