Dr. Dobb's Journal July 1998
Central to the eXtensible Markup Language (XML) philosophy is that the structure and content of information should be captured without concern for how the information will be rendered, whether on a computer display, on paper, through voice synthesis, or by some other means. Responsibility for rendering XML has been delegated to a sister standard known as eXtensible Style Language (XSL). (For more information on XML, see my article "XML Programming in Python," DDJ, February 1998.)
Like XML, XSL is a World Wide Web Consortium (W3C) initiative. In August of 1997, a draft proposal for XSL was made available as a discussion document by the W3C (http://www.w3.org/TR/NOTE-XSL.html). Although the XSL proposal is still only a draft, a number of XSL applications have already appeared. In particular, Microsoft has released MSXSL, a "technology preview" implementation that is freely available at http://www.microsoft.com/xml/. In this article, I will present an overview of XSL and illustrate how it can be used with MSXSL.
As Figure 1 illustrates, the XSL philosophy can be summed up as "late binding of presentation semantics." In simple English, the idea is that information about how a document should look when rendered (presentation semantics) is separated from the document content and housed in a stylesheet. The process of creating a rendition of the content happens late -- preferably right at the point that someone wants to view it (hence, late binding).
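The late-binding idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample content, the two rendering functions, and the names view and STYLESHEETS are hypothetical and are not part of XSL or MSXSL.

```python
import xml.etree.ElementTree as ET

# Hypothetical content. The element names are assumptions for illustration.
CONTENT = "<Car><Maker>Toyota</Maker><Color>Red</Color></Car>"


def render_html(car):
    """One possible presentation: an HTML paragraph."""
    return f"<P>{car.findtext('Maker')} ({car.findtext('Color')})</P>"


def render_text(car):
    """Another presentation of the same content: plain text."""
    return f"{car.findtext('Maker')}, {car.findtext('Color')}"


# The "stylesheets" live apart from the content and are chosen at viewing time.
STYLESHEETS = {"html": render_html, "text": render_text}


def view(content, style):
    """Bind content to a presentation only when someone asks to see it."""
    return STYLESHEETS[style](ET.fromstring(content))


html_view = view(CONTENT, "html")
text_view = view(CONTENT, "text")
print(html_view)
print(text_view)
```

The content string never changes; only the choice made at viewing time does.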
This late binding approach has some significant benefits:
There are a number of core concepts that are central to XSL, including:
Flow Objects. In XSL, the process of transforming an XML document into a notation such as RTF, HTML, or PostScript is expressed in terms of the construction of flow objects, which are pages, columns, paragraphs, table cells, and so on.
Platform-Independent Flow Objects. XSL specifies a set of standard flow objects such as paragraph, page sequence, table, and the like. Using these platform-independent flow objects lets you create multiple output notations with a single XSL stylesheet. The types of notations that can be created are limited only by the back-end notations supported by the XSL processor. Strong candidates for XSL back ends include RTF, FrameMaker MIF, and TeX.
HTML-Specific Flow Objects. To facilitate the use of XSL stylesheets to generate HTML, XSL provides a set of HTML-specific flow objects. Given the vast amount of HTML-aware software in existence, it makes sense to use this software, while simultaneously retaining the advantages of XML over HTML as a data representation.
Construction Rules. Flow-object construction in XSL is controlled by rules in the XSL stylesheet. These rules specify what flow objects are to be created and what they should contain. Flow objects can be thought of as containers for document content and/or other flow objects, creating a tree-like hierarchy known as a "flow-object tree." Flow-object construction rules take the form of a pattern and an action. The pattern part specifies the conditions under which the rule triggers. The action part specifies what flow objects to construct.
Characteristics. Flow objects can have associated characteristics that differ depending on the type of flow object being constructed. A paragraph flow object, for example, might have margin and tab characteristics. A table cell might have border and spanning characteristics. The characteristics to be applied to flow objects can be controlled in the XSL stylesheet by means of style rules. Style rules take the same general form as construction rules, and consist of pattern and action components.
Scripting. No stylesheet language that provides a fixed set of rendering capabilities can provide all the processing power needed. There comes a point where a "Turing Complete" programming language is the best way to get the job done. The XSL draft specifies ECMAScript (a standardized version of JavaScript -- ECMA 262) as a built-in scripting language. A number of mechanisms are provided in XSL for escaping to ECMAScript to perform calculations, define functions, and so on.
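The pattern/action form of construction rules described above can be approximated in Python. This is a rough sketch, not how an XSL processor is implemented: the sample document (including the root element name CarsForSale) is assumed, and the helpers apply_rules and process_children are hypothetical names that loosely mirror rule matching and the children element.

```python
import xml.etree.ElementTree as ET

# Assumed sample document. The root element name is hypothetical.
SOURCE = """<CarsForSale>
  <Car><Maker>Toyota</Maker><Color>Red</Color></Car>
  <Car><Maker>Ford</Maker><Color>White</Color></Car>
</CarsForSale>"""


def process_children(element, rules):
    """Emit the element's own text, then process each child in turn."""
    parts = []
    if element.text and element.text.strip():
        parts.append(element.text.strip())
    for child in element:
        parts.append(apply_rules(child, rules))
    return "".join(parts)


def apply_rules(element, rules):
    """Run the action of the first rule whose pattern matches."""
    for pattern, action in rules:
        if pattern(element):
            return action(element, rules)
    # No rule matched: fall back to processing the children.
    return process_children(element, rules)


def car_action(element, rules):
    return process_children(element, rules) + "<HR>"


def maker_action(element, rules):
    return "<P>Make of Car: " + process_children(element, rules) + "</P>"


def color_action(element, rules):
    return "<P>" + process_children(element, rules) + "</P>"


# Each rule pairs a pattern (a predicate) with an action (a builder).
RULES = [
    (lambda e: e.tag == "Car", car_action),
    (lambda e: e.tag == "Maker", maker_action),
    (lambda e: e.tag == "Color", color_action),
]

html = apply_rules(ET.fromstring(SOURCE), RULES)
print(html)
```

Each action returns a string here. A real processor would build a tree of flow objects and hand it to a back end.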
MSXSL is Microsoft's technology preview implementation of the XSL draft specification. Don't confuse it with MSXML, which is Microsoft's implementation of an XML parser. Indeed, MSXSL uses MSXML to parse and load XSL stylesheets.
MSXSL focuses on creating HTML from XML and, for the time being, only supports HTML flow objects. The simplest way to use MSXSL is via the provided command-line utility that takes the input XML file (-i), input stylesheet file (-s), and output HTML file (-o). For example, the command C>msxsl -i foo.xml -s foo.xsl -o foo.htm processes the foo.xml file with respect to the foo.xsl stylesheet specification, then generates the foo.htm output file.
To illustrate how to use XSL and MSXSL, I'll return to the XML document (see Figure 2) presented in my February 1998 article.
Listing One(a) is a simple stylesheet that converts the XML document in Figure 2 to HTML. Some things to note about this stylesheet:
Listing One(b) is the result of processing the XML document with this stylesheet. While it's hardly the world's most exciting HTML file, there are some important things to note:
Listing Two(a) adds a few more construction rules to create slightly more pleasing HTML output, while Listing Two(b) presents the result of processing the XML document with this stylesheet. Things to note about this stylesheet and the resultant HTML include:
The Car elements in Figure 2 have attributes for price and currency information. These can be accessed in XSL by escaping to the ECMAScript scripting language. XSL provides an eval element that can be used to house script code. In Listing Three(a), the rule is modified to let Car elements access the attribute information. Listing Three(b) is the HTML from this modified stylesheet.
The CDATA section in the eval element is an XML construct that shields text from the attentions of the XML parser. The CDATA section begins with the "<![CDATA[" string and ends at the "]]>" string. It is a good idea to use CDATA sections to shield script code, since characters such as "<" and "&" have special meanings to an XML parser.
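The effect of CDATA shielding can be checked with any XML parser. The sketch below uses the parser in Python's standard library rather than MSXML, and the script fragment inside the eval element is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same made-up script fragment, with and without CDATA shielding.
SHIELDED = "<eval><![CDATA[ if (a < b && c) x = 1; ]]></eval>"
UNSHIELDED = "<eval> if (a < b && c) x = 1; </eval>"

# Inside CDATA, "<" and "&" reach the application as ordinary characters.
script = ET.fromstring(SHIELDED).text

# Without CDATA, the parser treats "<" and "&" as markup and rejects the input.
try:
    ET.fromstring(UNSHIELDED)
    unshielded_ok = True
except ET.ParseError:
    unshielded_ok = False

print(repr(script))
print(unshielded_ok)
```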
Listing Four(a) is a stylesheet creating a simple HTML table layout of "car for sale" information. Listing Four(b) is the result of applying this stylesheet to the XML file.
Figure 3 shows what the generated HTML file looks like in Internet Explorer 4.0.
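For comparison, the same table rows can be produced procedurally. The Python sketch below assumes a source document shaped like the one the listings imply (Car elements with Price and Units attributes, an empty Condition element with a Type attribute). The root element name and the helper car_row are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed source document. The root element name is hypothetical.
SOURCE = """<CarsForSale>
  <Car Price="10000" Units="Dollars">
    <Maker>Toyota</Maker><Condition Type="Good"/><Color>Red</Color>
  </Car>
  <Car Price="20000" Units="Irish Punts">
    <Maker>Ford</Maker><Condition Type="Good"/><Color>White</Color>
  </Car>
</CarsForSale>"""


def car_row(number, car):
    """Build one table row: position, price with units, then the child data."""
    cells = [
        str(number),
        f'{car.get("Price")} {car.get("Units")}',
        car.findtext("Maker"),
        car.find("Condition").get("Type"),
        car.findtext("Color"),
    ]
    return "<TR>" + "".join(f"<TD>{cell}</TD>" for cell in cells) + "</TR>"


root = ET.fromstring(SOURCE)
# enumerate(..., start=1) stands in for the automatic row numbering.
rows = [car_row(n, car) for n, car in enumerate(root.findall("Car"), start=1)]
for row in rows:
    print(row)
```

The stylesheet version expresses the same mapping declaratively, one rule per element type, instead of as a single function.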
With XSL, it is possible to exert control over the order in which elements in the source document are processed. This allows document content to be selected and rearranged prior to creating the output. In Listing Five(a), a table of car maker names is created; Listing Five(b) shows the result of applying this stylesheet to the XML file.
Only the Maker element data appears in the output. This is because the select-elements element indicates that only the Maker children of Car elements are to be processed. By default, select-elements looks at the children of the current element to find matches. It is also possible to arrange for select-elements to search all descendants by specifying the value "Descendants" for the optional from attribute: <select-elements from = "Descendants">. The <children/> element used in previous examples is shorthand for <select-elements from = "Children">.
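The difference between searching children and searching descendants can be seen with Python's ElementTree, whose findall and iter methods behave analogously. This is an analogy only: the sample document is assumed, and the Dealer element is hypothetical, added to give one Car a deeper nesting level.

```python
import xml.etree.ElementTree as ET

# Assumed document: one Car is a direct child, the other is nested deeper.
SOURCE = """<CarsForSale>
  <Car><Maker>Toyota</Maker></Car>
  <Dealer><Car><Maker>Ford</Maker></Car></Dealer>
</CarsForSale>"""

root = ET.fromstring(SOURCE)

# Children only: direct Car children of the current element.
from_children = [car.findtext("Maker") for car in root.findall("Car")]

# Descendants: every Car element anywhere below the current element.
from_descendants = [car.findtext("Maker") for car in root.iter("Car")]

print(from_children)
print(from_descendants)
```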
As a final example, the stylesheet in Listing Six(a) uses ECMAScript to present all prices in Irish Punts; Listing Six(b) shows the generated HTML. The define-script element is used to create global variables and functions, and the eval element is used to invoke functions and access global variables.
Although Cascading Style Sheets (CSS) can be used to render XML documents, XSL provides many more capabilities than CSS. With CSS, the document structure is essentially fixed and is simply mapped onto the available flow objects. With XSL, the document structure can be rearranged and can be processed multiple times. For example, with XSL it is possible to perform one traversal to generate a table of contents, then perform a second traversal to render the content proper. Also, XSL is programmable via ECMAScript, thus providing a Turing Complete environment in which to create rendering effects.
(On the other hand, CSS is simple and familiar to many HTML users, and work is underway at Hewlett-Packard to create an extended implementation of CSS, known as "Spice," which makes up for some of these deficiencies.)
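The two-traversal idea mentioned above (a table of contents first, then the content proper) is easy to picture in procedural terms. The Python sketch below uses a made-up document with hypothetical Doc, Section, Title, and Para elements. It illustrates processing the same source twice and is not XSL syntax.

```python
import xml.etree.ElementTree as ET

# Made-up document. All element names are assumptions for illustration.
SOURCE = """<Doc>
  <Section><Title>Overview</Title><Para>Style is kept apart from content.</Para></Section>
  <Section><Title>Rules</Title><Para>Rules pair a pattern with an action.</Para></Section>
</Doc>"""

sections = ET.fromstring(SOURCE).findall("Section")

# First traversal: collect only the titles to build a table of contents.
toc = "<UL>" + "".join(f"<LI>{s.findtext('Title')}</LI>" for s in sections) + "</UL>"

# Second traversal: render the content proper from the same source.
body = "".join(
    f"<H2>{s.findtext('Title')}</H2><P>{s.findtext('Para')}</P>" for s in sections
)

html = toc + body
print(html)
```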
XSL draws heavily on the concepts used in the Document Style Semantics and Specification Language (DSSSL, ISO/IEC 10179) Standard for SGML rendering. The DSSSL Standard uses a programming language based on Scheme as its expression language. Many of the DSSSL designers have been instrumental in the design of XSL, and work is underway to make the XML-based expression language of XSL formally a part of the DSSSL international Standard. Henry Thompson of the University of Edinburgh has developed the XSLJ conversion utility that converts XSL specifications into DSSSL specifications. These can then be used with implementations of the DSSSL Standard such as Jade (http://www.jclark.com/).
XSL is an important part of the overall vision of XML. The core XML effort has three separate strands -- XML itself, XML rendering (XSL), and XML hypertext linking (XLL). The speed with which companies such as Microsoft have moved to implement XSL has come as something of a surprise to many. Even in its current basic state, MSXSL is capable of real work and provides a glimpse of the capabilities you can expect in the next generation of web browsers, which should process and render XML directly.
DDJ
Listing One
(a)
C>type cfs1.xsl
<!-- Ultra simple XSL stylesheet -->
<xsl>
  <rule>
    <!-- Pattern -->
    <root/>
    <!-- Action -->
    <HTML>
      <HEAD>
        <TITLE>Cars for sale - Example 1</TITLE>
      </HEAD>
      <BODY>
        <children/>
      </BODY>
    </HTML>
  </rule>
</xsl>
(b)
C>msxsl -i cfs.xml -s cfs1.xsl -o cfs1.htm
C>type cfs1.htm
<HTML>
<HEAD>
<TITLE>Cars for sale - Example 1</TITLE>
</HEAD>
<BODY>
ToyotaRedFordWhite
</BODY>
</HTML>
Listing Two
(a)
<!-- Process Car elements by processing all children
     and then adding a horizontal rule -->
<rule>
  <!-- Pattern -->
  <target-element type = "Car"/>
  <!-- Action -->
  <children/>
  <HR/>
</rule>
<!-- Process Maker elements by prefixing some literal text
     and then processing all children -->
<rule>
  <!-- Pattern -->
  <target-element type = "Maker"/>
  <!-- Action -->
  <P>
    Make of Car: <children/>
  </P>
</rule>
<!-- Process both Condition and Color elements in the same
     way--simply create HTML paragraphs -->
<rule>
  <!-- Pattern -->
  <target-element type = "Condition"/>
  <target-element type = "Color"/>
  <!-- Action -->
  <P>
    <children/>
  </P>
</rule>
(b)
C>msxsl -i cfs.xml -s cfs2.xsl -o cfs2.htm
C>type cfs2.htm
<HTML>
<HEAD>
<TITLE>Cars for sale - Example 2</TITLE>
</HEAD>
<BODY>
<P> Make of Car: Toyota
</P><P>
</P><P>
Red
</P><HR><P> Make of Car: Ford
</P><P>
</P><P>
White
</P><HR>
</BODY>
</HTML>
Listing Three
(a)
<rule>
  <target-element type = "Car"/>
  <P>
    Price = <eval><![CDATA[
      getAttribute("Price") + " " + getAttribute("Units")
    ]]></eval>
  </P>
  <children/>
  <HR/>
</rule>
(b)
C>msxsl -i cfs.xml -s cfs3.xsl -o cfs3.htm
C>type cfs3.htm
<HTML>
<HEAD>
<TITLE>Cars for sale - Example 3</TITLE>
</HEAD>
<BODY>
<P> Price = 10000 Dollars
</P><P> Make of Car: Toyota
</P><P>
</P><P>
Red
</P><HR><P> Price = 20000 Irish Punts
</P><P> Make of Car: Ford
</P><P>
</P><P>
White
</P><HR>
</BODY>
</HTML>
Listing Four
(a)
C>type cfs4.xsl
<xsl>
  <rule>
    <!-- Pattern -->
    <root/>
    <!-- Action -->
    <HTML>
      <HEAD>
        <TITLE>Cars for sale - Example 4</TITLE>
      </HEAD>
      <BODY>
        <TABLE BORDER="1">
          <TR>
            <TD>Number</TD>
            <TD>Price</TD>
            <TD>Maker</TD>
            <TD>Condition</TD>
            <TD>Color</TD>
          </TR>
          <children/>
        </TABLE>
      </BODY>
    </HTML>
  </rule>
  <rule>
    <!-- Pattern -->
    <target-element type = "Car"/>
    <!-- Action -->
    <TR>
      <!-- Automatically number the table rows -->
      <TD><eval>childNumber(this)</eval></TD>
      <TD>
        <eval>getAttribute("Price") + " " + getAttribute("Units")</eval>
      </TD>
      <children/>
    </TR>
  </rule>
  <rule>
    <!-- Pattern -->
    <target-element type = "Maker"/>
    <target-element type = "Color"/>
    <!-- Action -->
    <TD>
      <children/>
    </TD>
  </rule>
  <rule>
    <!-- Pattern -->
    <target-element type = "Condition"/>
    <!-- Action -->
    <TD>
      <eval><![CDATA[
        getAttribute("Type")
      ]]></eval>
    </TD>
  </rule>
</xsl>
(b)
C>msxsl -i cfs.xml -s cfs4.xsl -o cfs4.htm
C>type cfs4.htm
<HTML>
<HEAD>
<TITLE>Cars for sale - Example 4</TITLE>
</HEAD>
<BODY>
<TABLE BORDER="1">
<TR>
<TD>Number</TD><TD>Price</TD><TD>Maker</TD><TD>Condition</TD>
<TD>Color</TD></TR>
<TR><TD>1</TD><TD>10000 Dollars</TD><TD>Toyota</TD><TD>Good</TD><TD>Red</TD>
</TR>
<TR><TD>2</TD><TD>20000 Irish Punts</TD><TD>Ford</TD><TD>Good</TD><TD>
White</TD></TR>
</TABLE>
</BODY>
</HTML>
Listing Five
(a)
C>type cfs5.xsl
<xsl>
  <rule>
    <root/>
    <HTML>
      <HEAD>
        <TITLE>Cars for sale - Example 5</TITLE>
      </HEAD>
      <BODY>
        <TABLE BORDER="1">
          <children/>
        </TABLE>
      </BODY>
    </HTML>
  </rule>
  <rule>
    <target-element type = "Car"/>
    <TR>
      <select-elements>
        <target-element type = "Maker"/>
      </select-elements>
    </TR>
  </rule>
  <rule>
    <target-element type = "Maker"/>
    <TD>
      <children/>
    </TD>
  </rule>
</xsl>
(b)
C>msxsl -i cfs.xml -s cfs5.xsl -o cfs5.htm
C>type cfs5.htm
<HTML>
<HEAD>
<TITLE>Cars for sale - Example 5</TITLE>
</HEAD>
<BODY>
<TABLE BORDER="1">
<TR>
<TD>
Toyota
</TD>
</TR><TR>
<TD>
Ford
</TD>
</TR>
</TABLE>
</BODY>
</HTML>
Listing Six
(a)
C>type cfs8.xsl
<xsl>
  <define-script><![CDATA[
    // 1.5 Dollars to every Irish Pound
    var ExchangeRate = 1.5;
    // Convert price into Irish Pounds based on the ExchangeRate variable
    // if units is Dollars
    function getPriceInIrishPunts(price,units)
    {
      if (units == "Dollars")
        return price * ExchangeRate + " Irish Pounds";
      else
        return price + " Irish Pounds";
    }
  ]]></define-script>
  <rule>
    <!-- Pattern -->
    <root/>
    <!-- Action -->
    <HTML>
      <HEAD>
        <TITLE>Cars for sale - Example 8</TITLE>
      </HEAD>
      <BODY>
        <P><B>
          Note: Exchange Rate Used <eval>ExchangeRate+" Dollars per Irish Pound"</eval>
        </B></P>
        <children/>
      </BODY>
    </HTML>
  </rule>
  <rule>
    <!-- Pattern -->
    <target-element type = "Car"/>
    <!-- Action -->
    <P>
      Price in Irish Punts= <eval><![CDATA[
        getPriceInIrishPunts(getAttribute("Price"),getAttribute("Units"))
      ]]></eval>
    </P>
    <children/>
  </rule>
  <rule>
    <!-- Pattern -->
    <target-element type = "Make"/>
    <target-element type = "Color"/>
    <!-- Action -->
    <P>
      <children/>
    </P>
  </rule>
</xsl>
(b)
C>msxsl -i cfs.xml -s cfs8.xsl -o cfs8.htm
C>type cfs8.htm
<HTML>
<HEAD>
<TITLE>Cars for sale - Example 8</TITLE>
</HEAD>
<BODY>
<P><B> Note : Exchange Rate Used 1.5 Dollars per Irish Pound
</B></P>
<P> Price in Irish Punts= 15000 Irish Pounds
</P>Toyota<P>
Red
</P><P> Price in Irish Punts= 20000 Irish Pounds
</P>Ford<P>
White
</P>
</BODY>
</HTML>