BUILDING AN EFFICIENT HELP SYSTEM

Knowing how help files and the hypertext engine interact is essential

Leo Notenboom and Michael Vose

Leo, a software development engineer and manager for Microsoft, is the designer of the Microsoft Advisor. Michael is a coeditor of OS/2 Report newsletter.

On-screen documentation has become ubiquitous in contemporary software, from business applications to programming tools. Like anything else, the quality and usefulness of this aid varies from package to package. The technology of hypertext, however, promises to solve many of the organizational and speed problems currently found in some on-screen help systems.

On-screen documentation is useless unless it makes finding information easy, provides that information quickly, and delivers all the information users need. Some kinds of information -- command references, API references, example code, and the like -- lend themselves more readily to on-screen help than do others. Hypertext can equip a help-system programmer with the tools needed to create on-screen documentation that is fast, easy to navigate (even through voluminous text), and comprehensive. Hypertext isn't magic; it's actually quite a simple idea. In a help system, it's the association of a programmed action with an area of viewed text.

This article explains how help files and the hypertext help engine of a typical hypertext-based on-screen help system fit together, using the Microsoft Advisor as the example. With an understanding of how this technology works, you can use it and associated tools to add on-screen documentation to your libraries and programs or access on-screen help provided by any other program that uses similar technologies.

Hypertext in Context

As already defined, hypertext is simply the association of an action with an area on the screen. Thus a "button," which is just a visible region on a screen, might have associated with it the action of looking up help text on a predefined string of characters. When a user selects the button, the associated help text is located and displayed.

In practice, this movement between help screens via buttons or an action such as pressing a help key is generally referred to as a link or cross-reference. There are two primary types of links:

Implicit Links When a user selects a word on the screen and requests help by pressing F1 or clicking a mouse, the application looks up help on that word and displays the resulting help text. Within the help text itself, a user can place the cursor on any word and press F1 again to get further help. This second lookup operation is made possible by implicit links.

Explicit Links Encoded by the help file author, explicit links specify a region within the text (often represented by a button) and the help text to which that region refers. For example, a help screen may present an area labeled <Example Code> that was defined by the help author to be linked to the phrase abs.example. When the user selects that button, the application looks up the phrase abs.example, and displays the resulting help screen.

Explicit links come in two flavors -- normal and local. Normal explicit links operate exactly as in the previous example. They differ from local explicit links in that they can reference different help files from the one actually containing the button. Local explicit links, on the other hand, are restricted to linking to help text present in the same help file as that containing the button. As we'll see later, local links have advantages and disadvantages.
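The two kinds of links can be sketched as one lookup: given a cursor position, an application first checks whether the cursor sits on an author-defined region (an explicit link) and otherwise falls back to the word under the cursor (an implicit link). The HotSpot record and the context_at function below are hypothetical names invented for this illustration; they are not part of the Advisor API.

```python
import re
from dataclasses import dataclass


@dataclass
class HotSpot:
    """Hypothetical record for one explicit link on a line of help text."""
    start: int   # first column of the button
    end: int     # one past the last column of the button
    target: str  # context string chosen by the help author


def context_at(line, column, hot_spots):
    """Return the context string to look up for a cursor at `column`, or None."""
    # Explicit link: the cursor is inside a region the author defined.
    for spot in hot_spots:
        if spot.start <= column < spot.end:
            return spot.target
    # Implicit link: use whatever word the cursor is resting on.
    for match in re.finditer(r"\w+", line):
        if match.start() <= column < match.end():
            return match.group(0)
    return None


line = "See abs or <Example Code>"
spots = [HotSpot(line.index("<"), len(line), "abs.example")]
print(context_at(line, 5, spots))   # cursor on the word "abs"
print(context_at(line, 15, spots))  # cursor inside the button
```

Checking explicit regions first is a design choice of this sketch; it keeps a button such as <Example Code> from being treated as the separate words "Example" and "Code."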

To implement these hypertext links to assist users in navigating help text, a help-system author can benefit from understanding the data structure known as a help database.

Help Files and Databases

Making on-line help fast requires a way to quickly map individual words or phrases to some textual or graphic information. Associating the explanatory text or graphic information with more than one word or phrase may also be necessary. The help file contains the information that allows help retrieval to be both fast and flexible. A help file comprises one or more help databases. A help database contains the words that help is available for and the information associated with those words.

The words and phrases for which help can be requested, called "context strings," and their associated help text, referred to as "topic text," are authored using a text editor or word processor and are compiled into help files using the HELPMAKE utility. HELPMAKE creates the links between the context strings and topic text and also compresses the topic text. HELPMAKE can also decompress, or "uncompile," an entire help file into its original, editable form. (See the accompanying text box, "Turning the Tables," for information about the structure of a help database and the techniques used to compress it.)

During help-file compilation, the actual connections between context strings and topic text are made through a series of three lookup tables used by the Advisor when a help request is made (see Figure 1). The context strings supported by the help file are stored as a simple array in the first table, and the help text contained in the file can also be viewed as an array of topics. The job at lookup time, then, is to map the given context string to the correct piece of topic text to which it refers.

The first table, the context-string table, is just a simple array of context strings. The index of a string in this table becomes its "context number." The second table maps the context number to its corresponding topic. The entry indexed by the context number in this "context map" contains the corresponding "topic number." The last table maps the topic number to the actual position within the help file that contains the topic text. From this last table, the compressed size of the topic can be calculated (the difference between two successive file offsets; this number can then be used for memory allocation) along with the predicted end of the help file (the file offset of the last-plus-one help topic used for concatenated help files, described later).
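The three-step mapping just described can be sketched with ordinary lists. The table contents and file offsets below are made-up sample values, and the function names are hypothetical; the sketch models only the logic of the lookup, not the on-disk layout that HELPMAKE produces.

```python
# Sample tables (invented values). Two context strings share topic 0,
# showing that more than one word or phrase can map to the same text.
context_strings = ["abs", "abs.example", "printf", "fabs"]  # index = context number
context_map = [0, 1, 2, 0]        # context number -> topic number
topic_map = [120, 310, 455, 700]  # topic number -> file offset, plus one end entry


def lookup(context_string):
    """Return (topic_number, file_offset, compressed_size), or None if no help exists."""
    try:
        context_number = context_strings.index(context_string)
    except ValueError:
        return None
    topic_number = context_map[context_number]
    offset = topic_map[topic_number]
    # Compressed size is the difference between two successive file offsets.
    size = topic_map[topic_number + 1] - offset
    return topic_number, offset, size


def predicted_end():
    """The last-plus-one entry gives the predicted end of this database."""
    return topic_map[-1]


print(lookup("fabs"))
print(predicted_end())
```

Because the topic map carries one more entry than there are topics, the size calculation needs no special case for the last topic.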

The Advisor optimizes this lookup process by loading the mapping tables (and the decompression tables described in the text box) into memory once, wherever possible, and leaving them there until the application instructs it to discard them.

Because a context string is transformed into a number representing its position in the context-string table, an application can discard this string once the transformation has been made. In fact, this context number is returned to the application in a form that also identifies the specific help database to which it applies when more than one help database is being used. This context number can be used to map backward if necessary to determine what string was originally looked up and what database it came from.
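One way to picture a context number that also identifies its database is a single integer with the database index in the upper bits. The 16-bit split used here is purely an assumption made for illustration; the article does not describe the form the Advisor actually returns, and the function names are hypothetical.

```python
CONTEXT_BITS = 16  # assumed field width, chosen only for this sketch


def make_handle(database_index, context_number):
    """Pack a database index and a context number into one integer."""
    return (database_index << CONTEXT_BITS) | context_number


def split_handle(handle):
    """Recover (database_index, context_number) from a packed handle."""
    return handle >> CONTEXT_BITS, handle & ((1 << CONTEXT_BITS) - 1)


def original_string(handle, databases):
    """Map backward from a handle to the context string that was looked up.

    `databases` is a list of context-string tables, one per open database.
    """
    database_index, context_number = split_handle(handle)
    return databases[database_index][context_number]


databases = [["abs", "fabs"], ["OPEN", "CLOSE", "PRINT"]]
handle = make_handle(1, 2)
print(split_handle(handle), original_string(handle, databases))
```

Once the application holds the handle, the original string is no longer needed, which is the point the paragraph above makes.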

One special type of context number provided by the Advisor for the application is called a "local context number," or simply "local context." Local contexts are not associated with a context string; rather, they are directly encoded with the desired topic number.

Local contexts are identified by the help-file author at the time the file is written and are compiled specially by the HELPMAKE utility. Explicit links often use local contexts instead of normal contexts because the 2-byte topic number with which a local context is encoded is much smaller than the string normally coded into an explicit link. The only restriction on local contexts is that explicit links that reference local contexts must have the topic text associated with each local context present in the same help database.
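The space saving is easy to see in a sketch. Here a normal explicit link is assumed to store its context string with a terminating zero byte, and a local link is assumed to store the topic number as a 2-byte little-endian value; both encodings, the sample tables, and all function names are assumptions for illustration rather than the real HELPMAKE format.

```python
import struct

# Sample tables (invented values) for one help database.
context_strings = ["abs", "abs.example"]
context_map = [0, 1]  # context number -> topic number


def encode_normal_link(context_string):
    """Assumed encoding: the ASCII context string plus a zero terminator."""
    return context_string.encode("ascii") + b"\0"


def encode_local_link(topic_number):
    """Assumed encoding: the topic number as 2 bytes, little-endian."""
    return struct.pack("<H", topic_number)


def resolve_normal(payload):
    """A normal link goes through the context-string table and context map."""
    context_string = payload.rstrip(b"\0").decode("ascii")
    return context_map[context_strings.index(context_string)]


def resolve_local(payload):
    """A local link already holds the topic number; no string search is needed."""
    return struct.unpack("<H", payload)[0]


normal = encode_normal_link("abs.example")
local = encode_local_link(1)
print(len(normal), len(local), resolve_normal(normal), resolve_local(local))
```

The local form is smaller and skips the string search, but the topic number means something only inside the database that defines it, which is the restriction noted above.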

Multiple help databases can be concatenated to form a single help file for a given application. When a help file is opened, the hypertext system, using the three tables previously mentioned, examines what should be the end of the database in that file. If instead it finds the beginning of another help database, it opens that as well and repeats the end-of-database examination. Because the system repeats this operation, multiple help files can be combined simply by using the MS-DOS Copy command, and even applications that support only one help file can be "fooled" into using several help files disguised as one.
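The end-of-database check can be sketched as a loop that jumps from one database to the next. The header used here (a 2-byte signature followed by a 4-byte total size) is a simplified stand-in invented for this example; in the real format the predicted end comes from the last entry of the topic map, and the signature value is a placeholder.

```python
import struct

SIGNATURE = b"HD"               # placeholder signature, not the real identifier
HEADER = struct.Struct("<2sI")  # assumed header: signature, total size in bytes


def make_database(payload):
    """Build a toy database: the assumed header followed by opaque payload bytes."""
    return HEADER.pack(SIGNATURE, HEADER.size + len(payload)) + payload


def find_databases(data):
    """Return the starting offset of every database stored back to back in `data`."""
    starts = []
    position = 0
    while position + HEADER.size <= len(data):
        signature, size = HEADER.unpack_from(data, position)
        if signature != SIGNATURE or size < HEADER.size:
            break  # not the beginning of another database
        starts.append(position)
        position += size  # jump to the predicted end of this database
    return starts


first = make_database(b"abc")
second = make_database(b"hello")
print(find_databases(first + second))  # same bytes a file copy would produce
```

Plain byte concatenation is all that is needed to combine the two toy databases, which mirrors the use of the Copy command described above.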

The Help Engine

The help engine is nothing more than a data retrieval tool. The engine takes a string and maps it to the appropriate topic text. It simply searches the data structure that is a help file and makes the desired information available to an application.

The Advisor help engine knows very little about its environment -- except for the OS/2 version, which does make some assumptions about memory management and file I/O. The MS-DOS version of the help engine relies on the application to handle memory management and file I/O.

The engine enforces the help-file format and its data structures but leaves determinations about how to use retrieved data to an application. The help engine has to be told what help files to deal with, and an application can specify multiple help files.

In addition to routines to query for and retrieve help text, the help engine provides routines that the application must use to perform decompression after the topic text has been located. The help engine also has routines that an application can call to discover if a topic has cross-references or to read only the color or control information in a topic.

The help engine uses application-provided handles for memory management and for file I/O and can therefore deal with many different memory and I/O schemes.

The Application's Job

The application program that uses the help system must perform several important functions of its own (see Figure 2), including providing the user interface to the help system and supplying the interface to its environment.

The most important of these functions is the user interface for displaying help information and interacting with the user and the help screens. The application defines what text gets parsed into a context string when a help request is made, and it then passes that string to the help engine, which mounts the search for that string. For example, if the cursor resides on a menu or an open dialog box, the application's parser must be able to decide if help is appropriate for that object. The application also interprets all control and cross-reference commands and handles multiple help files. For example, if an application has five open help files and a user asks for help on a string, the application must use the Advisor to query each help file in turn to find the desired string match.

The application must also handle failures and be able to display a message or beep the speaker when no help is available. Conversely, the application must determine what action to take if there is help on the same context string in more than one help database.
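The application's side of this search, including both the "no help" case and the "help in several databases" case, might look like the following sketch. Each open help database is modeled as a plain dictionary, and the names find_help and show_help are hypothetical; a real application would call the help engine for each lookup.

```python
def find_help(context_string, databases):
    """Query each open database in turn and collect every match, in search order.

    `databases` is a list of (name, {context string: topic text}) pairs that
    stands in for the open help files.
    """
    matches = []
    for name, topics in databases:
        if context_string in topics:
            matches.append((name, topics[context_string]))
    return matches


def show_help(context_string, databases):
    """Return the text to display, applying one possible application policy."""
    matches = find_help(context_string, databases)
    if not matches:
        # Failure case: the application decides how to report it.
        return "No help available for '%s'" % context_string
    # Policy assumed for this sketch: the first database searched wins.
    name, text = matches[0]
    return text


open_files = [
    ("c.hlp", {"abs": "C: int abs(int n)"}),
    ("basic.hlp", {"abs": "BASIC: ABS(x)", "print": "BASIC: PRINT"}),
]
print(show_help("abs", open_files))
print(show_help("goto", open_files))
```

Other policies are equally plausible, such as offering the user a choice among all matches; the help engine leaves that decision to the application.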

In addition, the application must process control information. The Advisor offers a history function that lets a user move backward through the 20 most recently viewed help screens. Because the context numbers returned by the hypertext system uniquely identify each help database and context string, the history function simply keeps a circular queue of the 20 most recently accessed context numbers. The application must provide a keystroke or clickable screen button to engage this history function. The application parses a history request and then calls the help engine's history function to retrieve and display previous help screens.

Under MS-DOS, the application must also provide the interface between the Advisor and its environment, including memory management and all direct file I/O support.

Taking Advantage

The hypertext technology described here can be leveraged in two ways: by supplying supplemental help files for use by applications using the hypertext system and by incorporating the system into new applications. Supplemental help files offer the ability to customize and expand on-line help to fill a variety of needs. New applications being written can benefit from the enforced consistency in file format in that all help files can be used by any application that also uses the Advisor. This technology also provides a simple framework for implementing hypertext in any application and provides a painless way to compress text without any significant access-speed penalties.

Searching for information was the primary application of computers from the beginning. Today's hypertext-based on-screen help systems can begin to make the information that people need the most more readily accessible than ever before.

Topic text itself contains more than just the displayed text -- it also carries control, color, and cross-reference data. The control information includes a number representing the size of the uncompressed topic in bytes. Each subsequent line of text consists of the ASCII text to be displayed; attribute/color information, such as bold or underline, that an application can then map to colors if desired; and explicit link information, which specifies the areas within the line that are hot spots, and the context string or local context number with which the explicit links cross-reference.

When a help file is compiled, the HELPMAKE utility makes three compression passes on the topic text. Run-length, keyword, and Huffman compression are each applied to the text in turn.

Run-length encoding is the replacement of runs of characters (three or more of the same character) with a special token signifying the run, the character that is to be repeated, and the number of times it is to be repeated. Encoded this way, any run of 4 to 255 characters can be replaced by 3 bytes.
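A minimal run-length coder along these lines follows. The marker value 0x1B is an assumption (the article does not give the real token), and this sketch encodes only runs of four or more, since a 3-byte replacement saves nothing on shorter runs. A literal marker byte in the input is always written as a run so that decoding stays unambiguous.

```python
RUN = 0x1B  # assumed marker byte; the actual token value is not given here


def rle_encode(data: bytes) -> bytes:
    """Replace runs of 4 to 255 identical bytes with (marker, byte, count)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == byte and run < 255:
            run += 1
        if run >= 4 or byte == RUN:
            out += bytes((RUN, byte, run))  # 3 bytes stand in for the whole run
        else:
            out += data[i:i + run]          # short runs are copied unchanged
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Expand every (marker, byte, count) triple back into its run."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == RUN:
            out += bytes((data[i + 1],)) * data[i + 2]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


sample = b"ab" + b"-" * 40 + b"cd"
packed = rle_encode(sample)
print(len(sample), len(packed), rle_decode(packed) == sample)
```

Runs longer than 255 are simply split into several triples, because the count has to fit in one byte.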

Keyword encoding is the replacement of commonly occurring words or phrases with a 2-byte token. The text is analyzed for the words occurring most frequently, and the most frequent words are collected into a key-phrase table. Occurrences of these words in the text are then replaced by a token whose value is an index into this key-phrase table. For example, the word "the" is used frequently in normal English text. If you remove this word and replace it with a 2-byte token, the savings of 1 byte per occurrence adds up quickly.
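The keyword pass can be sketched as follows. The token is assumed to be a marker byte (0x1A here) followed by a one-byte table index, which limits this toy version to 256 key phrases; it handles single words only and assumes the marker byte never appears in the input. None of these details come from the real HELPMAKE format.

```python
import re
from collections import Counter

KEY = 0x1A  # assumed marker byte for a keyword token


def build_table(text: bytes, size: int = 4):
    """Collect the most frequent words of three or more letters."""
    words = re.findall(rb"[A-Za-z]{3,}", text)
    return [word for word, _ in Counter(words).most_common(size)]


def keyword_encode(text: bytes, table) -> bytes:
    """Replace each whole word found in the table with a 2-byte token."""
    index = {word: i for i, word in enumerate(table)}

    def substitute(match):
        word = match.group(0)
        return bytes((KEY, index[word])) if word in index else word

    return re.sub(rb"[A-Za-z]+", substitute, text)


def keyword_decode(data: bytes, table) -> bytes:
    """Expand every token back into the word stored in the key-phrase table."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == KEY:
            out += table[data[i + 1]]
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


text = b"the cat and the dog and the bird"
table = build_table(text, 2)
packed = keyword_encode(text, table)
print(table, len(text) - len(packed))
```

Every three-letter keyword saves 1 byte per occurrence, matching the arithmetic in the paragraph above; longer keywords save proportionally more.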

Huffman encoding is simply a bit-for-byte replacement. The text is again analyzed for frequently occurring bytes, and the most frequent bytes are replaced with a shorter bit pattern. For example, if you can replace 1000 8-bit bytes with a 2-bit pattern, you save 6000 bits, or 750 bytes. To restore a Huffman-compressed file requires a table that a decompression routine can use to restore the 2-bit pattern to its original 8-bit value. The side effect of Huffman compression is that less frequently occurring bytes often get replaced by a bit pattern longer than 8 bits. However, the net gain due to the frequency of the smaller bit patterns quickly outweighs the loss due to the growth of the longer ones.
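The arithmetic in the example is 1000 x (8 - 2) = 6000 bits, and 6000 / 8 = 750 bytes. A small, textbook Huffman coder makes the idea concrete. It is a generic sketch, not the decode-tree format HELPMAKE writes, and it keeps the bits as a string of "0" and "1" characters for readability rather than packing them into bytes.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict:
    """Map each byte value to a bit string; frequent bytes get shorter codes."""
    counts = Counter(data)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {byte: code built so far}).
    heap = [(weight, byte, {byte: ""}) for byte, weight in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, t1, left = heapq.heappop(heap)   # the two lightest subtrees
        w2, t2, right = heapq.heappop(heap)
        merged = {byte: "0" + code for byte, code in left.items()}
        merged.update({byte: "1" + code for byte, code in right.items()})
        heapq.heappush(heap, (w1 + w2, min(t1, t2), merged))
    return heap[0][2]


def huffman_encode(data: bytes, codes: dict) -> str:
    return "".join(codes[byte] for byte in data)


def huffman_decode(bits: str, codes: dict) -> bytes:
    """Walk the bit string, emitting a byte whenever a complete code is seen."""
    inverse = {code: byte for byte, code in codes.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)


data = b"e" * 1000 + b"t" * 300 + b"q" * 5 + b"z" * 2
codes = huffman_codes(data)
bits = huffman_encode(data, codes)
print(len(data) * 8, len(bits), huffman_decode(bits, codes) == data)
```

With only four distinct byte values, no code in this sample grows past 8 bits; in real text with many distinct bytes, the rare ones can receive longer codes, as the paragraph above notes.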

Microsoft offers a "Programmer's Workbench Toolkit" that contains the Advisor API Library and Documentation. Call 1-800-426-9400 to obtain details. --L.N., M.V.

Turning the Tables

The tables that connect the different parts of a help database are created by the HELPMAKE utility and are stored along with a header at the beginning of every help file (see Figure 3). Every help file has the following structure: a header that contains the identifier and version of the file, location of the subsequent tables, and some additional information; the topic map, an array of file positions of each of the topics in the help file, plus an extra entry to mark the end of the file; the context map, which matches a context number to a topic number; the context string table, which defines the strings for which help exists and determines their context numbers; an optional keyword compression table, which contains the keywords removed from the topic text during compression; an optional Huffman compression table, which contains the decode tree created during compression; and the topic text.

Figure 3: The components of a help database

                                  Header
          Identifier, version number, location of tables information

                                 Topic map
  An array of file positions of topics plus an entry to mark the end of the file

                                 Context map
            A table to match a context number to a topic number

                             Context string table
              A table that defines the strings for which help exists

                        Keyword compression table
    Contains the key words removed from the text during compression

                        Huffman compression table
            Contains the decode tree created during compression

                                 Topic text