Hypertext, one of those terms that was once fresh and exciting, has acquired a patina of age, even though the field has yet to come into its own. Last winter's Hypertext '93, subtitled "The Fifth ACM Conference on Hypertext" and held in Seattle, provided an ideal opportunity to get a reading on where the field stands and on its prospects for the future.
The conference is sponsored by three ACM special-interest groups (SIGs): SIGLINK (Hypertext), SIGIR (Information Retrieval), and SIGOIS (Office Information Systems). The roughly 800 attendees consisted of university researchers, professors, students, publishers, librarians, and a contingent of "hypertext authors"--writers who are using hypertext systems to produce creative literary works. Initially, you might wonder whether hypertext is still a viable field. After all, "hypertext" was coined as far back as 1965, by Ted Nelson, and some of the most visible projects, such as Nelson's Xanadu, seem destined to remain unfinished after more than 20 years of work. (Autodesk, a recent funder of the Xanadu project, has pulled the plug on its involvement. Efforts from other, more mainstream, vendors seem equally stagnant.)
So you might legitimately conclude that the time for hypertext has passed. But this would, of course, be a mistake. The deployment of hypertext has become pervasive in certain areas, in a form so mundane and prosaic that it has become invisible to even some of its fiercest advocates. Possibly the most widespread use of hypertext is Windows HLP files, which are found on over 50 million PCs today. If you're a software developer on the DOS/Windows platform, chances are you use a hypertext system every day: the online reference manuals you use to look up details of Windows API calls, which replace the bulky and almost unusable printed volumes. And, of course, more and more publications are appearing on CD-ROM--not as individual issues, but as complete sets that allow the user to browse and query through years of issues.
However, conference sessions did not focus on these everyday uses of hypertext, but on concepts and technologies for next-generation systems such as the automatic generation of links. In the classic scenario of creating a hypertext document, the document author places each link with the same deliberate care used in choosing each word. This classic definition still holds true in the case of literary works, such as those from the so-called "Eastgate School," a crew of poets and writers spun out of Brown University's IRIS project. But in industrial-strength uses involving large amounts of information (one project consists of 3.5 gigabytes of text) from a variety of sources, it's no longer feasible to manually place each link.
One presentation, by Gerard Salton (professor of computer science at Cornell and known as "the father of information retrieval" for having worked on these issues since the mid-sixties), focused on the problem of automatically establishing links when you have a large heterogeneous collection of text from a variety of sources and you want to integrate this corpus into an information database. Salton's approach generates "term vectors" for each text segment, which characterize the content of each section of text and which can be mathematically compared using a vector-similarity function. In this approach, there is no specific link when the user clicks on a paragraph; instead, the system displays a list of paragraphs whose content most resembles the selected text.
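The flavor of this approach can be sketched in a few lines of Python. This is an illustration of the general term-vector idea (raw term frequencies compared with the classic cosine measure), not Salton's actual weighting scheme; the function names are our own:

```python
import math
from collections import Counter

def term_vector(text):
    """Build a simple term-frequency vector for a text segment."""
    words = [w.lower().strip(".,;:!?") for w in text.split()]
    return Counter(w for w in words if w)

def cosine_similarity(v1, v2):
    """Compare two term vectors: the cosine of the angle between them."""
    dot = sum(v1[t] * v2[t] for t in set(v1) & set(v2))
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)

def most_similar(selected, corpus, top_n=3):
    """Instead of following a fixed link, rank the corpus's
    paragraphs by similarity to the selected paragraph."""
    sv = term_vector(selected)
    return sorted(corpus,
                  key=lambda p: cosine_similarity(sv, term_vector(p)),
                  reverse=True)[:top_n]
```

When the user clicks a paragraph, `most_similar` supplies the candidate destinations; no link ever had to be placed by hand.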
Another presentation described the approach used at the University of Waterloo's Centre for Text Research. Their system is architected around multiple concurrently executing processes known as "link-resolving components"--servers that provide different kinds of information simultaneously (each in its own window) when the user clicks on a portion of a document. One window might respond with a dictionary definition of the selected word, while another might display a relevant graphic, and yet another might show a portion of a related document. The user can add or remove these link-resolving applications long after the original document has been created, tailoring a working set of information servers to fit his or her needs.
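The architecture boils down to a dispatcher that fans a selection out to whatever resolvers are currently attached. A minimal single-process sketch (in Waterloo's system the resolvers are separate concurrent processes, each driving its own window; the class and method names here are hypothetical):

```python
class LinkResolver:
    """One link-resolving component: responds to a selection in its own way."""
    def __init__(self, name, resolve_fn):
        self.name = name
        self.resolve = resolve_fn

class HypertextSession:
    """Dispatches the user's selection to every attached resolver."""
    def __init__(self):
        self.resolvers = []

    def attach(self, resolver):
        self.resolvers.append(resolver)

    def detach(self, name):
        # Resolvers can be removed long after the document was created.
        self.resolvers = [r for r in self.resolvers if r.name != name]

    def select(self, text):
        # Each resolver's answer would appear in its own window.
        return {r.name: r.resolve(text) for r in self.resolvers}

session = HypertextSession()
session.attach(LinkResolver("dictionary", lambda w: f"definition of {w!r}"))
session.attach(LinkResolver("graphics", lambda w: f"image related to {w!r}"))
```

The key design point is that the document itself stays untouched; the working set of information servers is a property of the user's session, not of the text.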
A topic of perennial interest is that of navigating hypertext links. As hyperdocuments become larger and more complex, users find themselves "lost in hyperspace," disoriented after having followed one link too many. A number of techniques address this problem, mostly involving graphical representations of the web of links in the information space. Emmanuel Noik of the University of Toronto presented a paper on "fisheye views" of hypertext documents, which can display both local and global context by varying the magnification around points of interest in a document's structure.
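Fisheye techniques generally build on Furnas's "degree of interest" formulation: a node's visibility is its a priori importance minus its distance from the current focus. A toy sketch of that selection rule (the data and threshold are illustrative, not from Noik's paper):

```python
def fisheye_view(nodes, threshold=0):
    """nodes: {name: (a_priori_importance, distance_from_focus)}.
    Degree of interest = importance - distance; nodes that clear the
    threshold stay visible, ordered most interesting first."""
    doi = {n: imp - dist for n, (imp, dist) in nodes.items()}
    return sorted((n for n in doi if doi[n] >= threshold),
                  key=lambda n: -doi[n])

# The root of a large document stays visible even when it is far from
# the focus, because its a priori importance is high.
nodes = {"root": (5, 3), "focus": (1, 0), "sibling": (1, 1), "leaf": (0, 4)}
```

Varying the magnification around each visible node then gives the local-plus-global display: nearby structure is shown large, distant but important structure small.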
One reason why the term hypertext has become slightly quaint is its suffix, "text." Many, if not most, systems described at the conference now call themselves "hypermedia" rather than hypertext and allow incorporation of graphics, images, and (often) sound, plus animation. This leads to the question of how to query and browse such systems. The Miyabi system developed at NEC in Japan uses an interesting graphical query technique. The user draws a rough sketch of the desired image and the system matches this sketch (which can be distorted, incomplete, or not quite correct) against its library and comes up with an appropriate set of hits.
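One way such tolerant matching can work (this is a generic illustration, not NEC's actual algorithm) is to reduce both the sketch and each library image to a coarse grid of ink densities, so that small distortions and omissions wash out:

```python
def grid_signature(bitmap, grid=2):
    """Reduce a binary bitmap (list of rows of 0/1) to a coarse grid of
    ink densities -- a crude, distortion-tolerant shape signature."""
    h, w = len(bitmap), len(bitmap[0])
    sig = []
    for gy in range(grid):
        for gx in range(grid):
            cells = [bitmap[y][x]
                     for y in range(gy * h // grid, (gy + 1) * h // grid)
                     for x in range(gx * w // grid, (gx + 1) * w // grid)]
            sig.append(sum(cells) / len(cells))
    return sig

def sketch_match(sketch, library, grid=2):
    """Rank (name, image) pairs by signature distance from the sketch."""
    s = grid_signature(sketch, grid)
    def dist(entry):
        t = grid_signature(entry[1], grid)
        return sum((a - b) ** 2 for a, b in zip(s, t))
    return sorted(library, key=dist)
```

Because only coarse densities are compared, a sketch that is incomplete or not quite correct still lands nearest the intended image.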
Despite these and many other interesting papers, the largest crowds and the most dynamic discussions seemed to center around the World-Wide Web project and its graphical interface, known as "Mosaic." The Web originated at CERN, the European physics-research facility in Switzerland. Because CERN's members are located in numerous countries, the Web is used to communicate information and ideas among them, using distributed hyperdocuments. The system's architecture consists of information servers residing on Internet nodes accessed by clients (front ends also known as "browsers") which speak the HTTP communications protocol and understand the HTML format for hypertext documents. The most popular browser is Mosaic, developed by the National Center for Supercomputing Applications (NCSA) at the University of Illinois. The Mosaic browser, which initially ran on X Window System workstations, is now available for Macintosh and Microsoft Windows platforms.
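At bottom, a browser's job is strikingly simple: speak HTTP to a server, retrieve an HTML document, and find the anchors that make it hypertext. A bare-bones sketch of those two steps (early HTTP really is this simple; the host name and helper names here are illustrative):

```python
import re
import socket

def fetch(host, path="/", port=80):
    """Retrieve a document the way an early Web browser does:
    one TCP connection, one GET request, read until close."""
    with socket.create_connection((host, port)) as s:
        s.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())
        chunks = []
        while data := s.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

def extract_links(html):
    """Pull the href targets out of HTML anchor tags -- the hypertext links."""
    return re.findall(r'<a\s+[^>]*href="([^"]+)"', html, re.IGNORECASE)
```

Everything else a browser does, from rendering to history lists, is layered on top of this request-and-follow-links loop.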
Although Internet usage has been increasing dramatically, traffic on the CERN Web Server is increasing at a rate twice that of Internet expansion. It helps that much of this software is available free (you can get Mosaic via ftp from NCSA at ftp.ncsa.uiuc.edu).
As you may have noticed from this report, hypertext researchers are located all over the globe. This yearly series of ACM conferences alternates between the U.S. and Europe. The next one will be held in Edinburgh in September 1994.