XLink: The XML Linking Language

Dr. Dobb's Journal December 1998

Enhancing hypertext linking

By Sean McGrath

Sean, chief technical officer and cofounder of Digitome Electronic Publishing (http://www.digitome.com/), is a member of the World Wide Web Consortium's XML Special Interest Group and the Python Software Activity (PSA). He is also the author of XML by Example: Building E-commerce Applications (Prentice Hall, 1998). Sean can be reached at sean@digitome.com.

In HTML, the time-honored <A> and <IMG> tags are used to achieve simple hypertext-linking effects in and between resources referenced by URLs. As a model of hypertext, HTML is very simple -- so simple, in fact, that hypertext theorists have been somewhat surprised at the level of user acceptance it has achieved. However, the increasing need for high-volume hypertext creation and management is showing up the limitations of HTML's simple hypertext model. Like many successful technologies before it, the simplicity of HTML hypertext is at once its greatest strength and greatest weakness.

As XML (eXtensible Markup Language) takes hold on the Web for creation and management of structured data, the limitations of the HTML approach to hypertext become all the more apparent. XML is all about structuring data with meaningful component names so that you can make use of the structure and the descriptive names in building web applications. Layering hypertext onto such structures is clearly desirable and holds out many exciting possibilities for innovative use of the Web.

The XML Linking Language (XLink) is a draft proposal from the World Wide Web Consortium (W3C) that addresses the shortcomings of HTML's simple hypertext model and allows the rich structure of XML documents to be fully utilized in hypertext creation and management. XLink has been under development since January 1996 and has been known by a variety of names, most notably "XLL." Indeed, XLink is still in its gestation period within the W3C process and may change its name again. That XLink is a moving target should be borne in mind when reading this article.

XLink can be thought of as one third of a triumvirate of W3C initiatives:

XML (eXtensible Markup Language) is a W3C recommendation for capturing the structure and content of information in a plain text form. (See my article "XML Programming with Python," DDJ, February 1998.)
XSL (eXtensible Style Language) is a W3C draft proposal for an XML rendering and transformation language. (See my "Rendering XML Documents using XSL," DDJ, July 1998.)
XLink is a W3C draft proposal for adding rich hypertext functionality to XML documents.

Although no such beast exists at this time, future web browsers will be able to download XML, create richly formatted renderings with XSL, and manipulate the rich hypertext created with XLink.

A Motivating Example

To demonstrate XLink, I'll use a simple XML document. As an homage to a writer who knew a thing or two about hypertext, I've chosen a snippet of James Joyce's Finnegans Wake (see Figure 1); Figure 2 shows the Document Type Definition (DTD) for the document.

XML excels at facilitating the description and control of hierarchical structures. Visualizing XML documents as hierarchies is especially useful when thinking about hypertext. Figure 3 is a simple hierarchical view of this document.

Some hypertext linking effects you may wish to achieve for this document are:

Create a link to the last "p" element in episode 1 of book IV.
Create a link to the first occurrence of the word "shore."
Create a link allowing the user to select from a list of all the episode elements.
Make the word "riverrun" a link to the word "the" at the very end of the book. You would like to do this without changing the document in any way. Moreover, you would like users to be able to navigate the link both ways; that is, from the word "riverrun" at the start of the document to the word "the" at the end of the document and vice versa. (This is more than just a fanciful example. The first sentence in Finnegans Wake really does begin at the end of the novel, creating a loop.)
Make the phrase "Howth Castle" a link to a graphic so that when the user clicks it, a picture of Howth Castle appears inline with the text at that point.

These forms of hypertext illustrate some of the limitations of HTML's simple point-and-shoot model of hypertext. The principal limitations are:

You can only link to a file or a named point within a file.
You must activate links by hand.
Links always cause the current document to be replaced by the linked-to document.
Links must be explicitly added to documents; they cannot be layered onto documents from the outside.
Links only go one way.
Links are one to one. You cannot link X to a user selection of A, B, or C.

There are, of course, numerous ways around these limitations using today's sophisticated document serving technology and client-side scripting tools. The intent of XLink is to provide a standard way of doing this sort of thing so that you do not need to resort to exotic (and often proprietary) programming to achieve these effects.

The XLink Model of Hypertext

XLink splits the overall issue of hypertext functionality along three main axes.

Link types. In XLink, links are classified as being either inline or out-of-line. An inline link is a link in which the linking machinery (the tagging required) is stored as part of the resources being linked. Both the <A> and <IMG> elements of HTML are thus inline link types, because the link machinery (principally the name and href attributes of the A element type) is stored within the linked documents themselves. An out-of-line link, by contrast, is stored separately from the resources it links. In other words, out-of-line linking lets you layer hypertext functionality on top of documents without changing the documents. This is a powerful concept, of which Finnegans Wake scholars will undoubtedly make extensive use. It also has enormous scope in fields of endeavor where changing the document is not an option; for example, annotating medical records or legal documents.
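
The out-of-line idea can be illustrated with a rough Python sketch in which the links live in a collection of their own and the linked document is never touched. The Link class, the linkbase list, the file name fw.xml, and the fragment names are all hypothetical, chosen only for this illustration; none of them is XLink syntax.

```python
from dataclasses import dataclass

# Hypothetical in-memory model of out-of-line links (not XLink markup).
@dataclass(frozen=True)
class Link:
    source: str   # address where traversal starts
    target: str   # address where traversal ends

# The links are kept apart from the document they describe.
# Two one-way entries together give a link that can be followed both ways.
linkbase = [
    Link("fw.xml#first-word", "fw.xml#last-word"),
    Link("fw.xml#last-word", "fw.xml#first-word"),
]

def links_from(address, links):
    """Return every target reachable from the given address."""
    return [link.target for link in links if link.source == address]

print(links_from("fw.xml#first-word", linkbase))
```

Because the linkbase is separate data, adding or removing a link never requires write access to fw.xml itself.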

Link behavior. In HTML, the behavior of a link is fixed by the processing software, typically the browser. Following a link will replace the currently viewed document with the linked-to document. Moreover, browser users must activate links by hand by clicking on them.

XLink generalizes link behavior along two axes -- the show axis and actuate axis. With the show axis, the display behavior of a link in XLink can be specified as:

embed, indicating that the resource retrieved by activating a link should be spliced into the resource that started the link.
replace, indicating that the resource retrieved by activating the link should replace the resource that started the link.
new, indicating that the resource retrieved by activating the link should get its own display context and not affect the resource that started the link. For an XLink-aware browser, this would typically mean creating a new window for the retrieved resource.

With the actuate axis, the mechanism by which a link is activated can be specified as:

auto, indicating that the link should be traversed automatically when the containing resource is loaded.
user, indicating that the link should not be activated until the user requests it, typically by clicking on one of the link ends.

Using this classification system, the A element of HTML can be classified as having the show value "replace" and the actuate value "user" (users click and the content of the current window is replaced with the new resource). The IMG element of HTML can be classified as having a show value of "embed" and an actuate value of "auto" (images are automatically retrieved and spliced into the document content).

The permitted values of these two XLink specification axes lead to six combinations (three show values times two actuate values). Examples (with their interpretations) from an XLink-aware browser's perspective include:

show = new and actuate = user. When the user activates the link, open the linked resource in a new window.
show = new and actuate = auto. Load the linked resource automatically into a new window.
show = embed and actuate = user. When the user activates the link, retrieve the linked-to resource and splice it inline into the current resource.
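
The six combinations can be enumerated with a few lines of Python. This is only a sketch of the classification described above: the describe function and the wording of its output are my own, and the HTML_EQUIVALENTS table simply records the A and IMG classifications given earlier.

```python
from itertools import product

SHOW = ("embed", "replace", "new")
ACTUATE = ("auto", "user")

# HTML's built-in behaviors expressed as (show, actuate) pairs.
HTML_EQUIVALENTS = {
    ("replace", "user"): "A",
    ("embed", "auto"): "IMG",
}

def describe(show, actuate):
    """Return a one-line reading of a (show, actuate) pair."""
    when = {"auto": "on load", "user": "on user request"}[actuate]
    what = {
        "embed": "splice the resource into the current one",
        "replace": "replace the current resource",
        "new": "present the resource in a new context",
    }[show]
    return f"{when}: {what}"

combinations = list(product(SHOW, ACTUATE))
for show, actuate in combinations:
    html = HTML_EQUIVALENTS.get((show, actuate), "-")
    print(f"show={show:<8} actuate={actuate:<5} HTML={html:<4} {describe(show, actuate)}")
```

Only two of the six rows have an HTML counterpart; the other four are behaviors that HTML can reach only through scripting.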

Link complexity. The final axis is a convenient segregation of links into two camps -- simple links and extended links. Simple links are essentially enhanced HTML-style links. They are usually inline and always unidirectional. Extended links, on the other hand, are much more general. They can involve any number of resources and can be traversed in multiple directions. Of particular interest in simple links and extended links alike is the powerful addressing language XPointer. It can be used to specify addresses of objects that take part in links. A treatment of the full power of extended links is beyond the scope of this article.

Adding Hypertext to XML Documents

In HTML, elements with hypertext semantics are wired directly into the language. Simply using the magic element names A and IMG in your documents serves to signal to the HTML-aware processor that hypertext is present. With XML, there are no such magic element types. Users of XML are free to create elements with whatever names they feel are most appropriate to the underlying data: <invoice>, <annotation>, <molecule>, and so on. This raises the question: How does the XLink processor recognize the presence of hypertext link semantics on such elements?

In XLink, hypertext is recognized by means of certain reserved attribute names. Principal among these is the xml:link attribute, which is used to specify whether a link is simple or extended. Setting this attribute to simple causes the XLink-aware processor to look for an href attribute to use as the address of the linked resource. Thus, an XLink-compatible version of HTML's A element might look like Example 1(a).

It is important to note that it is the xml:link attribute -- not the element type name -- that signals the presence of hypertext. For example, the document snippet in Example 1(b) has exactly the same hypertext semantics.
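
To make the point concrete, here is a minimal Python sketch of how a processor might recognize simple links by attribute rather than by element name. The sample document and the citation element are hypothetical stand-ins, not the article's Example 1, and the simple_links function is my own illustration rather than part of any XLink toolkit. It relies on the fact that Python's ElementTree reports attributes carrying the reserved xml prefix under the XML namespace.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix is always bound to this namespace, so
# ElementTree reports the attribute xml:link under this qualified name.
XML_LINK = "{http://www.w3.org/XML/1998/namespace}link"

def simple_links(xml_text):
    """Return (element name, href) for every element marked xml:link="simple"."""
    root = ET.fromstring(xml_text)
    return [
        (elem.tag, elem.get("href"))
        for elem in root.iter()
        if elem.get(XML_LINK) == "simple"
    ]

# Hypothetical sample: two differently named elements with identical
# link semantics, plus one element that is not a link at all.
sample = """
<doc>
  <A xml:link="simple" href="http://www.w3.org/">W3C</A>
  <citation xml:link="simple" href="http://www.w3.org/">W3C</citation>
  <note>No link here.</note>
</doc>
"""

print(simple_links(sample))
```

The function never inspects the element name when deciding whether something is a link, which is exactly the behavior the attribute-based approach calls for.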

You can simplify the markup required to add XLink hypertext semantics to elements using the XML concept of fixed attributes. For example, if you add an xref element for simple linking in Finnegans Wake, you might create a simple unidirectional link from the last word back to the first paragraph; see Figure 4.

You can remove the need to specify the xml:link attribute every time you use xref by declaring it in the DTD. Figure 5 is the modified version of the DTD in Figure 2. You can now dispense with the xml:link attribute; see Figure 6.

When this document is processed by a DTD-aware XML processor, all occurrences of the xref element will be automatically assigned an attribute xml:link with the value "simple."

In XML, DTDs are optional and may or may not be present when a document is processed. It is possible to get the same markup simplification by including the fixed attribute declarations at the top of the XML document in a portion known as the "internal declaration subset"; see Figure 7. With this declaration subset in place, the DTD does not need to be present in order for an XLink-aware processor to correctly interpret the hypertext.
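
The effect of a fixed attribute in the internal declaration subset can be observed with Python's expat bindings, which, as I understand their default settings, report attribute values supplied by declarations as well as those written in the document. The documents below are hypothetical miniatures rather than the article's Figure 7, and link_types is a helper invented for this sketch.

```python
from xml.parsers import expat

def link_types(xml_text):
    """Map each element name to the xml:link value the parser reports for it."""
    found = {}

    def start(name, attrs):
        if "xml:link" in attrs:
            found[name] = attrs["xml:link"]

    parser = expat.ParserCreate()        # no namespace processing: names stay as written
    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return found

# Hypothetical document whose internal subset fixes xml:link on xref.
with_subset = """<!DOCTYPE book [
  <!ATTLIST xref xml:link CDATA #FIXED "simple">
]>
<book>
  <p id="first">riverrun</p>
  <p><xref href="#first">the</xref></p>
</book>"""

# The same markup without the declaration carries no link semantics.
without_subset = "<book><p><xref href='#first'>the</xref></p></book>"

print(link_types(with_subset))
print(link_types(without_subset))
```

In both documents the xref start tag is written without xml:link; only the declaration in the first one makes the parser supply it.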

XPointer

The primary weapon for addressing objects with XLink is the Uniform Resource Identifier (URI) -- the familiar URL being a particular form of URI. XLink supports the concept of a fragment identifier from HTML so that "foo.htm#bar" is a valid way of addressing the location bar within the resource foo.htm. However, XLink goes significantly beyond this by providing an entire little language for addressing fragments known as the XML Pointer Language (XPointer).

For many years now, an initiative known as the Text Encoding Initiative (TEI) has been underway capturing literary documents of all forms -- including Finnegans Wake -- in Standard Generalized Markup Language (SGML). Over the years, the developers of TEI have developed a powerful addressing mechanism for SGML documents known as "Extended Pointers." This work has heavily influenced the design of XLink's XPointer language -- suitably simplified for the subset of SGML that is XML.

An XPointer is a string consisting of a series of terms, known as "location terms," separated by "." characters. Terms come in two flavors -- absolute and relative. An XPointer will typically start with an absolute term and use relative terms thereafter to gradually zoom in on the object it is addressing.

Table 1 provides some XPointer examples. Refer to the tree diagram in Figure 3 to get a feel for how the addressing mechanism works.
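
The zooming-in behavior of location terms can be sketched in Python. This is a deliberately partial illustration, assuming only the draft term forms root(), child(n,type), and descendant(n,type,attribute,value); it ignores every other term, and its naive parsing would break on values containing commas or parentheses. The split_terms and resolve functions and the sample tree are my own and do not reproduce Figure 3 or Table 1.

```python
import re
import xml.etree.ElementTree as ET

TERM = re.compile(r'(\w+)\(([^()]*)\)')

def split_terms(xpointer):
    """Split 'root().child(2,p)' into [('root', []), ('child', ['2', 'p'])]."""
    terms = []
    for keyword, arglist in TERM.findall(xpointer):
        args = [a.strip().strip('"') for a in arglist.split(",")] if arglist.strip() else []
        terms.append((keyword, args))
    return terms

def resolve(xpointer, root):
    """Follow each location term in turn, starting from the root element."""
    node = root
    for keyword, args in split_terms(xpointer):
        if keyword == "root":            # absolute term: restart at the root
            node = root
            continue
        if keyword not in ("child", "descendant"):
            raise ValueError(f"term not covered by this sketch: {keyword}")
        index, name = int(args[0]), args[1]
        # Direct children, or all descendants in document order (excluding node).
        candidates = list(node) if keyword == "child" else list(node.iter())[1:]
        matches = [e for e in candidates if e.tag == name]
        if len(args) == 4:               # optional attribute name and value
            matches = [e for e in matches if e.get(args[2]) == args[3]]
        node = matches[index - 1]        # instance numbers count from 1
    return node

# Hypothetical miniature document tree.
sample = ET.fromstring(
    "<book><episode><p>one</p><p>two</p></episode>"
    "<episode><p id='x'>three</p></episode></book>"
)

print(resolve("root().child(2,episode).child(1,p)", sample).text)
print(resolve("root().descendant(2,p)", sample).text)
print(resolve('root().descendant(1,p,id,"x")', sample).text)
```

Each relative term is evaluated against the node selected by the term before it, which is what lets a short string address a location deep inside the hierarchy.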

XPointers can also be used to link into locations in HTML documents. In HTML, the syntax "foo.htm#bar" is interpreted to mean "the first A element within the resource foo.htm with a name attribute set to bar." In XPointer syntax, this can be expressed as: foo.htm#root().descendant(1,A,NAME,"bar"). XPointer includes a convenient shorthand for this so that foo.htm#root().html(bar) is equivalent.

In fact, this can be further shortened. Unless told otherwise, an XPointer string is presumed to begin with the root() term, so foo.htm#html(bar) will also suffice.
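
The equivalence of the three spellings can be checked with a small Python sketch that rewrites the shorthand forms into the longhand one. The expand_html_shorthand function is hypothetical and handles only the forms quoted above; it is not a general XPointer normalizer and would mishandle fragments that begin with some other absolute term.

```python
def expand_html_shorthand(fragment):
    """Expand 'html(bar)' or 'root().html(bar)' to the longhand form in the text."""
    if fragment.startswith("root()."):
        fragment = fragment[len("root()."):]     # root() is implied when omitted
    if fragment.startswith("html(") and fragment.endswith(")"):
        name = fragment[len("html("):-1]
        fragment = f'descendant(1,A,NAME,"{name}")'
    return "root()." + fragment

def split_uri(uri):
    """Separate the resource address from its fragment identifier."""
    resource, _, fragment = uri.partition("#")
    return resource, fragment

for uri in (
    "foo.htm#html(bar)",
    "foo.htm#root().html(bar)",
    'foo.htm#root().descendant(1,A,NAME,"bar")',
):
    resource, fragment = split_uri(uri)
    print(resource, expand_html_shorthand(fragment))
```

All three inputs produce the same longhand pointer against the same resource, foo.htm.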

XLink and Existing Standards

XLink's design has been heavily influenced by a variety of existing hypertext systems and standards, primarily HTML, HyTime, and TEI.

HTML. XLink has inherited from HTML the principle that a lot can be achieved with a little. Also, the fragment identification mechanism of HTML has been carried over, thus facilitating linking from XML documents into HTML.

HyTime. The ISO international standard for hypertext (HyTime) is a general hypertext standard that goes significantly beyond XLink, allowing linking between anything in any place at any time. The attribute approach to recognizing the presence of hypertext semantics on arbitrary elements used in XLink is inherited from HyTime. More generally, the attribute-based approach is an example of an even more general and powerful technique known as Document Architectures. For more information, see http://www.hytime.org/.

Text Encoding Initiative. The XPointer addressing language in XLink is heavily based on TEI's Extended Pointers for SGML addressing. Also see http://www.uic.edu/orgs/tei/.

Conclusion

The core XLink documents can be found at http://www.w3.org/TR/WD-xlink, http://www.w3.org/TR/WD-xptr, and http://www.w3.org/TR/NOTE-xlink-principles.

Although it will be some time before XLink Version 1.0 becomes an official recommendation, that does not mean that there are no tools that allow us to play with the concepts. XLink draws heavily on concepts from HyTime, SGML, and TEI. You can look to tools in this space for an idea of what XLink implementations will look like. MultiDoc Pro and SoftQuad Panorama are examples of simple, but powerful, SGML/XML viewers that incorporate useful parts of TEI's Extended Pointers and HyTime. A freely downloadable evaluation edition of MultiDoc Pro is at http://www.citec.fi/.

Lars Marius Garshol's implementation of XPointers for the Python programming language can be found at http://www.stud.ifi.uio.no/~larsga/download/python/xml/xptr.html.

For information about URIs and how they relate to URLs, see http://www.ics.uci.edu/pub/ietf/uri/.

Eliot Kimber, a prominent figure in the world of XML and hypertext, has developed a Visual Basic application for illustrating and experimenting with HyTime and Document Architecture concepts; see http://www.phylis.com/.

Dave Megginson has implemented a Document Architecture processor in Java. It sits on top of the SAX API (also designed by Dave Megginson) for XML processors; see http://www.meggison.com/.

Geir Ove Grönmo of Step Infotek has developed a Document Architecture processor in Python; see http://www.infotek.no/~grove/software/xmlarch/xmlarch.html.

An experimental and partial web version of Finnegans Wake (Finnegan's Web) is at http://www.trentu.ca/faculty/jjoyce/fw.htm.
