MANSERVER(l)          MISC. REFERENCE MANUAL PAGES          MANSERVER(l)

NAME
     manServer - convert manual pages to HTML for viewing with a web
     browser

SYNOPSIS
     manServer [ -s [ port ] ] [ filename ]

DESCRIPTION
     manServer is a troff (or nroff) to HTML interpreter written in
     Perl. It is designed specifically to convert manual pages written
     using the man(7) troff macros into HTML for display and navigation
     with a web browser. To this end it includes direct support for the
     man macros (and limited support for tbl, eqn and the doc macros),
     as well as the most common troff directives.

     The following macros and directives are supported, with varying
     degrees of correspondence to their troff counterparts. Where there
     are significant differences, these are usually due to the fact
     that troff tends to be layout-oriented, while HTML is more
     content-oriented and does not provide the same kind of detailed
     layout control.

   Structure and layout macros
     The standard man paragraph macros are quite well implemented,
     using tables to control indenting. These include:

     .TP .HP .IP           Hanging paragraph types
     .LP .PP .P            Flush left paragraphs
     .RS .RE               Indented blocks
     .SH .SS               Section and subsection headings
     .br .sp               Line break and spacing

     Tabs and tab stops work, but only when a constant width font is
     used. In practice this means within a no-fill block (.nf) that
     doesn't include any font style or size changes.

     Temporary indents (.ti) and spacing control (.sp, paragraph
     spacing with .PD) don't work terribly well, because HTML does not
     provide the appropriate control.

   Text style support
     Standard troff directives (including in-line directives) are
     reasonably well implemented, including:

     \fB \f2 \fP etc.      Font control
     .B .I .R              Single word in alternate fonts
     .RI .BI .BR etc.      Alternating roman/italic/bold text styles
     \s+-n \s0             Point size control
     .ft .ps .SM           Font & size control
     .nf .fi               No-fill block (e.g. code examples)
     \u \d                 Super/sub-scripting

     Only one level of superscript or subscript within a line is
     supported when using \u or \d or the eqn sup and sub tags.

     Special characters in troff are geared towards typesetting
     mathematical equations, whereas the ISO 8859-1 character set
     supported by most browsers consists mainly of accented letters
     for Western European languages. Special characters are
     implemented wherever possible, but many of them (such as Greek
     letters and square root signs) have no equivalent.

     Overstriking (\z, \o) and local motion (\h, \v) directives have no
     equivalent in HTML and are filtered out. A handful of overstrike
     combinations (combining a colon or apostrophe with a vowel to
     produce an accented character) are recognised and converted,
     however.

   Hyperlinks
     A key benefit of using HTML is being able to turn the manual pages
     into hypertext for convenient browsing:

     - Items that look like a man page reference, especially following
       a .BR tag, are turned into links.

     - URLs (http, ftp and mailto) are converted to links.

     - Included text (using a .so directive) is implemented as a link
       to a separate page.

     - A table of contents with links to each section (.SH or .SS) is
       generated at the start of the page.

     - Text in bold that starts with a capital letter and matches the
       name of a section is turned into a link to that section.

     - An index of man page sections and of the contents of each
       section is produced, to allow browsing throughout the man pages.

     - As well as browsing through the index, the name of a man page
       can be entered in a search dialog. Whenever a search is
       ambiguous, a choice of matching pages is given.

   Troff emulation
     The following are well implemented:

     - Strings can be defined (.ds) and interpolated.

     - Simple macros can be defined (.de) and used.

     - Ignore blocks (.ig) and comments (.\") are passed through as
       HTML comments.

     The following features are not implemented as fully as they might
     be:

     - Conditional expressions (.if, .ie, .el) are barely implemented.

     - Number registers and interpolation are partly implemented, but
       don't support auto-increment or formatting styles.

     A number of features are not yet implemented, though arguably
     they could or should be:

     - Text diversions and input traps.

     There's no way of measuring the width of text when a proportional
     font is used for rendering, so a very approximate guess is
     calculated when using the \w directive (normal characters count
     as 1, whereas spaces and punctuation count as half a character).

   Tbl support
     Tables using the tbl(1) preprocessor (between the .TS and .TE
     macros) are rendered quite well. The main deficiency is that HTML
     tables do not give you control over whether individual cells have
     a border around them. A number of heuristics are applied to try
     to determine when a row is actually a continuation of the
     previous row, so that the data can be merged.

   Eqn support
     A half-hearted attempt is made at interpreting eqn(1) tags
     (between .EQ and .EN, and inline delimiters). It copes reasonably
     well with simple expressions, including things like superscripts
     and subscripts. It makes no attempt, however, to implement
     features that would result in more than one line of output.

   Doc macros
     Manual pages written using the Berkeley doc macros are recognised
     and implemented to about the same extent as the corresponding man
     macros. (If anything, the doc macros are easier to implement
     because they are slightly more content-oriented, if a little
     odd.)

OPERATION
     Synopsis: manServer [ -s [ port ] ] [ filename ]

     manServer works in one of three modes, depending on how it is
     invoked:

     + If given a filename, manServer processes the file and writes
       the generated HTML to standard output.

     + If invoked with the -s option, manServer runs as a standalone
       HTTP server, directly responding to requests from a web browser
       on port 8888 or on the port specified.
       manServer enters a loop and continues processing requests until
       it crashes, is killed, or the universe ends. Note that for speed
       and simplicity a new process is not forked to handle each
       request, so if the server dies while processing a request you
       will have to restart it.

     + If invoked with no arguments and the GATEWAY_INTERFACE
       environment variable is set, manServer assumes it is running as
       a CGI script invoked by a standalone web server such as CERN
       HTTPD, and services a single request. (If no arguments are
       specified and GATEWAY_INTERFACE is not set, manServer enters its
       HTTP server mode.)

     manServer processes requests for individual man pages. A page is
     usually specified either as just the base name (e.g. 'ls') or as
     a name plus section number (e.g. 'ls.1'). In both cases it is
     searched for in the normal MANPATH hierarchy. Pages may also be
     fully qualified (e.g. '/usr/man/man1/ls.1'). This form is
     necessary if the page appears in more than one manual hierarchy.

     gzip'd manual pages (ending in a .gz suffix) are automatically
     expanded with zcat.

ENVIRONMENT
     MANPATH
          Determines which manual pages are available and where to
          find them. If not specified, /usr/man/man* is used.

     GATEWAY_INTERFACE
          Determines whether manServer runs as a CGI script.

     SCRIPT_NAME
          CGI parameter.

     PATH_INFO
          CGI parameter.

     QUERY_STRING
          CGI parameter.

FILES
     /etc/manpath.config
          Definition of MANPATH under Linux.

     /usr/lib/tmac/tmac.an
     /usr/lib/tmac/an
     /usr/lib/tmac/tz.map
     /usr/lib/groff/tmac/tmac.an
          Man macro definitions, used to get definitions of section and
          referred document abbreviations on various platforms.

SEE ALSO
     man(1), man(7), nroff(1), eqn(1), tbl(1)

     The program is available at
     http://www.parallax.co.uk/~rolf/download/manServer.pl

AUTHOR
     Rolf Howarth (rolf@insect.demon.co.uk)

BUGS
     manServer gets confused by some tags, particularly within poorly
     structured manual pages.
     Some people write some very peculiar manual pages, and working out
     what semantic layout they actually meant from the inconsistent
     layout tags they used becomes more and more of a lost cause...

     Searching man pages (either as a free text search or in the
     manner of apropos(1)) is not yet implemented.

     The order of parsing and processing troff input (especially
     things like special character codes) differs from troff. There
     may still be one or two bugs in certain situations, e.g. where
     text is parsed twice, contains backslashes and/or angle brackets,
     or interacts with the eqn or tbl emulation.

     Tags like .B without an argument should apply to the next input
     line.

     \fP should restore the previous font, not always revert to roman.

     Pages whose extension does not match the section they are in are
     not found.

     Styles that span more than one tbl table cell need to be set and
     reset for each HTML table cell.

     When processing a file from the command line, links to other
     pages need to be modified (made relative to the current location,
     with a .html suffix).

     Various additional directives could and should all be
     implemented, including setting the indent (.in), multi-line
     conditionals, renaming registers and macros, and diversions.

     Different Unix platforms may define their own man macro package
     with variations from those implemented here. manServer has been
     tested with pages under SunOS, Solaris and Debian Linux.

Last change: 11 September 1997
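     The approximate width guess used for the \w directive (described
     under Troff emulation) can be illustrated with a short sketch.
     This is a minimal Python illustration, not manServer's own Perl
     code. The function name approx_text_width is hypothetical. Two
     choices are assumptions: any whitespace character counts as a
     "space", and Python's string.punctuation is taken as the set of
     "punctuation" characters.

```python
import string


def approx_text_width(text: str) -> float:
    """Estimate the rendered width of text in character units.

    Hypothetical sketch of the heuristic described for the troff \\w
    directive: ordinary characters count as 1, while spaces and
    punctuation count as half a character. Treating any whitespace as
    a space, and string.punctuation as the punctuation set, are
    assumptions made for this illustration.
    """
    width = 0.0
    for ch in text:
        if ch.isspace() or ch in string.punctuation:
            width += 0.5  # spaces and punctuation: half a character
        else:
            width += 1.0  # everything else: one full character
    return width


if __name__ == "__main__":
    print(approx_text_width("ls -l"))  # 4.0
```

     For example, "ls -l" comes out as 4.0. The three letters count as
     one character each, and the space and the hyphen count as half a
     character each.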