Kent Dahlgren is the strategic planner for graphics products at Paradise Systems, a manufacturer of graphics devices. He can be reached at 800 E. Middlefield Rd., Mountain View, CA 94043.
Inspired by the success of the Apple Macintosh family with its windowed user interface, the personal computer industry has adopted graphical interfaces as one of its dominant trends. The market is rapidly advancing to a point where advanced graphics will be expected of all new applications. Unfortunately, the proliferation of windowing standards, graphics libraries, and printer interfaces has created confusion among both hardware and software developers. A wide range of interfaces is available, with an equally wide range of functionality. In addition, there are compatibility issues between the various subroutine libraries and windowing interfaces. This article provides programmers with an overview of some of the more important graphical interfaces, as well as considerations in selecting one.
Consider a model representing the components that might be present in a graphical interface package. This will provide a common frame of reference for the purposes of discussion, the same way that the OSI networking model serves as a means of describing networking interfaces. Figure 1, page 33, shows the components of the model, as well as their relationships.
Figure 1: Components of the graphical interface model. User -> Application/User Interface -> Window Manager -> [API] -> Display List Manager -> Mapping/Translation Layer -> [VDI] -> Rendering Interface -> Drivers (A, B) -> Devices (A, B)
In addition to the components themselves, consider the terminology describing two of the key interfaces in most graphics system implementations. The Applications Programmer's Interface (or API) is a set of routines that allows an application programmer to communicate with both the Window Manager and the Graphics Engine. Systems programmers who are porting new graphical interfaces to a machine, as well as hardware vendors who are interfacing graphics hardware, are concerned with the Virtual Device Interface (VDI). These represent the front end and back end of a computer graphics system.
In windowing environments, the user interface is commonly referred to as the "look and feel." This is the channel through which the user communicates with the Window Manager. This channel allows the user to alter the size, shape, and arrangement of windows, as well as open and close them. The user interface also handles pull-down and pop-up menus, dialog boxes, and other graphical elements of communication.
Note that the user interface is, for the most part, separate from the remainder of the windowing system. In many systems, the differentiation is purely conceptual. On the other hand, some systems (such as X Windows) don't define any user interface as part of the specification. In those cases, the implementer must decide how screen actions will affect the system state.
As a result of this separation between the User Interface and the Window Manager, you can map a common look and feel onto differing windowing systems. Or, you can map several user interfaces onto a single windowing system. If done properly, such differences between user interfaces are transparent to applications.
The recent popularity of X Windows in the Unix community has led to the development of the Open Look user interface, championed by Sun Microsystems and AT&T. The rival OSF Unix camp has been examining contenders for its own X Windows user interface. (There is considerable speculation that OSF is considering the IBM/Microsoft Presentation Manager as its standard user interface.) The ability to tailor a distinctive user interface in order to differentiate products in the marketplace is one reason why X Windows is popular with Unix systems vendors.
The Window Manager is responsible for the abstraction of a bit-mapped display image to multiple, virtual display surfaces. It maintains the system's data structures and informs both the user interface and the applications about the size, shape, and visibility of the various windows displayed on the screen. In the case of Microsoft Windows, this is the portion of the system that layers multitasking capabilities on top of DOS.
Two classes of interactions occur between applications and the Window Manager. Applications request the manager's services through a library of subroutines that handle such nondrawing activities as the opening and closing of windows. The Window Manager, in turn, responds to user-generated events by sending messages to the affected applications. Clicking on a window's close box, for example, sends a close message to the application that owns that window; applications owning windows uncovered by the closing window receive messages indicating what needs to be redrawn.
This type of asynchronous, event-driven environment requires program structures closer to that of real-time control environments than to typical applications programs. Figure 2, page 33, shows the basic flow of a typical windowing application as expressed in pseudocode.
BEGIN
    Initialize data structures
    Set up menus
    Open main window
    WHILE (not done)
    BEGIN
        Get next message
        CASE (message type) OF
            type_A_message : execute_A_handler( );
            type_B_message : execute_B_handler( );
            type_C_message : execute_C_handler( );
            type_D_message : done = TRUE;
            default        : execute_default_handler( );
        END
    END
    Clean up environment
END
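The structure above can be sketched as runnable code. The message names and handler functions here are hypothetical illustrations, not part of any particular windowing API.

```python
# A minimal event-driven application skeleton, mirroring the pseudocode above.
# Message names and handlers are invented for illustration.

def run_event_loop(get_message, handlers):
    """Dispatch incoming messages to handlers until a QUIT message arrives."""
    done = False
    while not done:
        msg = get_message()          # block until the window manager sends a message
        if msg == "QUIT":
            done = True              # the type_D case in the pseudocode: exit the loop
        else:
            handler = handlers.get(msg, lambda: None)  # default handler ignores unknown messages
            handler()
    # clean up the environment here (close windows, free resources)

# Usage: drive the loop with a scripted message stream.
events = iter(["REDRAW", "KEYPRESS", "QUIT"])
log = []
run_event_loop(
    get_message=lambda: next(events),
    handlers={"REDRAW": lambda: log.append("redrew"),
              "KEYPRESS": lambda: log.append("key")},
)
```

Note that the application never decides *when* things happen; it only decides *what* to do when told, which is the inversion of control the article describes.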
Although the structure of the program requires rethinking, the windowing system itself typically offloads many common chores from the application. For example, a single function call within the Macintosh programming environment initiates the opening of a window that allows the user to select a file. Putting this level of functionality in one call not only simplifies programming, but also ensures that the user will encounter the same menu structure in different applications. This uniformity allows Macintosh users to quickly learn new applications.
The Display List Manager, Mapping/Translation Layer, and Rendering Interface are collectively referred to as the Graphics Engine. Through its drivers the Graphics Engine handles the task of putting graphical objects on the screen. Applications use the Graphics Engine to display objects within their windows. The user interface uses it to construct window borders, menus, screen backgrounds, and other visual elements.
In nonwindowed graphical systems, the User Interface and the Window Manager are not present, and the entire interface consists of components of the Graphics Engine. The Graphics Engine itself is also considerably simplified in these cases since there is no requirement to map multiple logical screens to the physical screen.
The Display List Manager decouples the application's generation of drawing requests from the hardware (or software) that performs the rendering. This requires some form of buffering, which could be as simple as a queue or as sophisticated as a hierarchical object-oriented database.
Even a simple queuing arrangement can be beneficial when a graphics coprocessor performs the rendering. A display list queue eliminates the need for the host CPU to wait for completion of one drawing command before issuing the next. Applications tend to issue drawing commands in bursts; the display list queue evens out the workload.
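The queuing arrangement can be sketched in a few lines; the command names and the simulated coprocessor are illustrative only.

```python
from collections import deque

# A display list as a simple FIFO queue of drawing commands. The host CPU
# enqueues commands in bursts; the (simulated) coprocessor drains them at
# its own pace. Command names are invented for illustration.

class DisplayList:
    def __init__(self):
        self.queue = deque()

    def issue(self, command, *args):
        """Host side: returns immediately instead of waiting for the renderer."""
        self.queue.append((command, args))

    def drain(self, render):
        """Coprocessor side: execute queued commands in order."""
        while self.queue:
            command, args = self.queue.popleft()
            render(command, args)

dl = DisplayList()
dl.issue("line", 0, 0, 100, 100)   # a burst of commands from the application
dl.issue("rect", 10, 10, 50, 50)
rendered = []
dl.drain(lambda cmd, args: rendered.append(cmd))
```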
More sophisticated drawing interfaces (such as GKS, PHIGS, and HOOPS) allow the application to group drawing primitives and manipulate them as single objects. PHIGS and HOOPS carry this idea one step further by arranging these display list groupings into hierarchies that allow children to inherit characteristics from their parents. Inheritance is particularly important in 3-D graphics, where the inheritance allows the programmer to manipulate one component, several related elements, or an entire complex object, all with equal ease.
An example is the image of a robot arm, which might consist of a base, an upper arm, a lower arm, and a hand. All of these parts share a positional relationship to each other. If one rotates the base, all the other components remain fixed with respect to one another and move as a unit. Therefore, the base is the parent node of the hierarchy, and all other parts inherit the attribute of position from the base. Each child component also has some freedom of movement, which affects its children but not its parents. If you move the lower arm, for example, the hand must go with it--obeying the law of inheritance--but the upper arm and base are unaffected.
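The robot-arm hierarchy can be modeled directly. For brevity, the sketch below inherits only a 2-D translation; a real PHIGS- or HOOPS-style system would compose full transformation matrices.

```python
# A toy display-list hierarchy in which children inherit position from their
# parents. Only translation is inherited here, to keep the example short.

class Node:
    def __init__(self, name, offset, parent=None):
        self.name = name
        self.offset = list(offset)   # position relative to the parent
        self.parent = parent

    def world_position(self):
        """Compose offsets up the hierarchy: a child moves with its parents."""
        x, y = self.offset
        if self.parent is not None:
            px, py = self.parent.world_position()
            x, y = x + px, y + py
        return (x, y)

# The robot arm from the text, as a chain of parent-child links.
base = Node("base", (0, 0))
upper = Node("upper_arm", (0, 10), base)
lower = Node("lower_arm", (0, 8), upper)
hand = Node("hand", (0, 4), lower)

base.offset[0] += 5    # moving the base carries every descendant with it
lower.offset[0] += 2   # moving the lower arm moves the hand, not the upper arm
```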
Most graphical interfaces map drawing primitives to the display through layers of coordinate transformations. Figure 3, page 35, shows how GKS implements coordinate transformations in a two-dimensional space. The coordinates that the application uses to describe objects are referred to as the World Coordinate (WC) system and use a floating-point representation. GKS maps these points onto an internal abstract display using what are called Normalized Device Coordinates (NDC). These coordinates are unsigned values normalized between 0 and 1 in both the X and Y directions. The rendering interface then maps NDC to the actual Device Coordinates (DC) of the output medium. This two-stage mapping allows GKS applications to zoom or pan the viewing area over the database simply by changing transformation parameters.
Figure 3: 2-D coordinate transformations in GKS
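The two-stage mapping can be sketched as follows; the window and device dimensions are arbitrary example values, not part of GKS itself.

```python
# A simplified GKS-style two-stage mapping: world coordinates (WC) are first
# normalized to [0, 1] (NDC) against an application-chosen window, then scaled
# to integer device coordinates (DC). Panning or zooming only requires
# changing the window parameters, not the application's data.

def wc_to_ndc(x, y, window):
    """window = (xmin, xmax, ymin, ymax) in world coordinates."""
    xmin, xmax, ymin, ymax = window
    return ((x - xmin) / (xmax - xmin), (y - ymin) / (ymax - ymin))

def ndc_to_dc(nx, ny, width, height):
    """Map normalized coordinates onto a width-by-height pixel device."""
    return (round(nx * (width - 1)), round(ny * (height - 1)))

window = (0.0, 200.0, 0.0, 100.0)        # the world-space viewing window
nx, ny = wc_to_ndc(100.0, 50.0, window)  # the center of the window
dx, dy = ndc_to_dc(nx, ny, 640, 480)     # ...lands at the center of the screen
```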
While the bulk of the older interfaces supports only two-dimensional drawing spaces, the increasing interest in CAD-type software has created a demand for three-dimensional graphics capabilities in new interfaces. The development of personal computers with the power of a workstation has only recently made three-dimensional graphics practical on small systems.
Handling three-dimensional display lists is not the challenge. Rather, the problem is mapping the data to the screen and rendering it in the display buffer. To give the WC system sufficient dynamic range, floating-point coordinates are normally used. The transformation of each point requires at least one matrix multiplication (and usually several), and hidden-line removal and surface shading incur significant additional overhead. Programmers who use 3-D graphics on the present generation of personal computers must either be content with non-real-time image creation or invest in expensive special-purpose hardware.
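The per-point cost is easy to see in code: transforming a single point by a 4x4 homogeneous matrix takes 16 multiplications and 12 additions, and a complex scene contains thousands of points.

```python
# Transforming one 3-D point by a 4x4 homogeneous matrix.

def transform(m, p):
    """Apply a 4x4 row-major matrix m to the point p = (x, y, z, 1)."""
    return tuple(sum(m[r][c] * p[c] for c in range(4)) for r in range(4))

# A translation by (10, 20, 30) expressed as a homogeneous matrix.
translate = [[1, 0, 0, 10],
             [0, 1, 0, 20],
             [0, 0, 1, 30],
             [0, 0, 0, 1]]
moved = transform(translate, (1, 2, 3, 1))
```

In practice several such matrices (modeling, viewing, projection) are composed per point, which is why dedicated transformation hardware pays off.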
The-dimensional graphics standards (whether formal or vendor-specified) have several attractive features. For one, the programmer does not have to deal with the complex issues of transforming internal representations of objects into visual images. These tasks are done by making simple API subroutines, passing the names of an object to be manipulated, and various control parameters. For another, the level of data abstraction allows you to achieve performance improvements transparently as new and faster hardware becomes available--assuming, of course, that the new platform supports the graphical system under which the application and its data were developed.
Two methods exist for displaying text in graphics systems. The most common on personal computers is raster text. Fonts are stored in memory as arrays of bitmaps indexed by character value and placed in the desired position on the screen with a copy operation. If the graphics hardware includes logic for performing these memory transfers, the operation occurs quickly. The primary problem with this representation is the difficulty of rotating the character images and scaling them to different point sizes.
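A raster-text draw is essentially a block copy, as the sketch below shows; the 3x3 glyph is toy data invented for illustration.

```python
# Raster text: a font is an array of bitmaps indexed by character code, and
# drawing a character is a block copy into the frame buffer.

FONT = {"I": [[0, 1, 0],
              [0, 1, 0],
              [0, 1, 0]]}   # a toy 3x3 glyph

def draw_char(framebuffer, ch, x, y):
    """Copy the glyph bitmap for ch into the frame buffer at (x, y)."""
    glyph = FONT[ch]
    for row, bits in enumerate(glyph):
        for col, bit in enumerate(bits):
            if bit:
                framebuffer[y + row][x + col] = 1

fb = [[0] * 8 for _ in range(4)]   # a tiny monochrome frame buffer
draw_char(fb, "I", 2, 0)
```

Note that nothing in this scheme knows about the *shape* of the character, which is exactly why rotation and rescaling are hard: there is only a grid of bits at one fixed size.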
More sophisticated environments use stroke or vector fonts instead of bitmaps. Here the outline of a character is formed from straight and curved line segments and then filled. The beauty of this approach is that, unlike raster fonts, the characters can be rotated and scaled cleanly and the fonts are not intimately linked to the resolution of the device.
Metafiles provide a means of archiving collections of graphic primitives, either for future use by the same application or for use by other applications (image libraries). Since collections of primitives are inherently device-independent, image libraries can be used by other systems that support the same graphical interface. The key international standard for this capability is the Computer Graphics Metafile (CGM), a specification approved by both ISO (ISO 8632) and ANSI (ANSI/X3.122).
Having covered the general characteristics of the components and issues surrounding graphical interfaces, let's survey the most significant standards in use today.
GKS was the first formally approved two-dimensional graphics interface standard (ISO 7942, ANSI/X3.124). It provides a rich set of graphics primitives and a flexible mapping scheme. It also includes facilities for applying geometric transformations to primitives or groups of primitives. Beyond these fundamental concepts, GKS supports a limited form of object-oriented graphics programming: a series of calls can be recorded in a structure referred to as a segment, which can then be replayed to reproduce a complex series of operations.
This facility has some serious limitations. Once a segment has been defined, you have no way to edit it. If changes have to be made, you must delete it and rebuild it from scratch. Another limitation is that segments cannot be defined hierarchically, which means that segments cannot contain references to other segments. Later standards (such as PHIGS) remove these limitations.
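The segment facility and its limitations can be sketched as follows. All names here are illustrative, not the actual GKS language binding.

```python
# GKS-style segments sketched as recorded call sequences: once closed, a
# segment can be replayed or deleted, but not edited in place.

class SegmentStore:
    def __init__(self):
        self.segments = {}
        self.open_segment = None

    def create_segment(self, seg_id):
        self.open_segment = seg_id
        self.segments[seg_id] = []

    def record(self, call, *args):
        """While a segment is open, drawing calls are recorded into it."""
        self.segments[self.open_segment].append((call, args))

    def close_segment(self):
        self.open_segment = None

    def replay(self, seg_id, draw):
        """Reproduce the recorded operations in order."""
        for call, args in self.segments[seg_id]:
            draw(call, args)

    def delete_segment(self, seg_id):
        del self.segments[seg_id]   # the only way to change a segment: delete and rebuild

store = SegmentStore()
store.create_segment("house")
store.record("polyline", [(0, 0), (10, 0), (10, 10)])
store.record("fill_area", [(2, 2), (8, 2), (8, 8)])
store.close_segment()
replayed = []
store.replay("house", lambda call, args: replayed.append(call))
```

Notice there is no way for one segment to refer to another; the flat namespace is precisely the non-hierarchical limitation that PHIGS structures remove.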
PHIGS is a draft standard (ANSI/X3.144) developed to fill the need for a three-dimensional graphical interface standard, as well as to correct some of the deficiencies of GKS. Its drawing model was derived from GKS and the syntax of the function calls is similar, but PHIGS goes far beyond the capabilities of its predecessor.
The core of the PHIGS Display List Manager is the Centralized Structure Store (CSS). The CSS is a hierarchical database for maintaining models of graphical objects. The segments used by GKS are referred to as structures in PHIGS. Not only has the name changed, but structures can be edited and can refer to other structures. The only limitation on structure references is that recursion is not permitted (that is, an object cannot be defined in terms of itself). PHIGS has the additional capability of using "generalized structure elements," which allow the inclusion of implementation-dependent extensions within the database.
Such tight coupling of the graphical database and the drawing components greatly simplifies the development of applications that must deal with complex objects. For example, you can construct an image of a complete jet aircraft from the specifications of its individual parts. The entire image can be rotated as an entity by using one system call, with PHIGS responsible for translating all of the components. The application is only required to convert the data format of the parts into a PHIGS representation and define their interrelationships.
The downside, of course, is the amount of horsepower required to support such comprehensive functionality. Running PHIGS on anything less than a 68020- or 80386-based platform results in unacceptable performance.
A set of proposed extensions to PHIGS is currently referred to collectively as PHIGS+. The chief thrust of these extensions is the addition of shading capabilities. The shading algorithms include both Gouraud and Phong, but not ray tracing. This is consistent with the philosophy of keeping PHIGS an interactive standard, inasmuch as the gigaflops required to produce ray-traced images in real time are not likely to be readily available for several years.
PostScript was developed in 1982 by Adobe Systems. It is two-dimensional in nature, with an imaging model based on concepts derived from the graphic arts. It is entirely output-oriented and currently has no constructs for user interaction. It is an interpreted language with a Forth-like syntax.
Currently the bulk of PostScript implementations are printer-based. It is a testimony to the language's power and elegance that it can be found in everything from Apple's LaserWriter to Linotronic typesetting equipment. A screen-based derivative is used by Sun Microsystems as the drawing interface for its NeWS windowing environment. The official Adobe version, Display PostScript, is also the graphical interface on the recently released NeXT personal computer.
PostScript's approach to handling fonts represents one of the most sophisticated text-rendering interfaces developed to date. Characters are described by Bezier cubic splines, which allow very precise images of complex shapes to be specified with a few control points. In addition to this parametric information, PostScript font files contain heuristic rules for transforming characters through rotation and scaling. These rules allow the interpreter to correct anomalies that may creep in as a result of rasterization.
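The underlying curve mathematics is standard: four control points define a cubic Bezier segment, evaluated below in Bernstein form. The square-ish control polygon is example data, not taken from any real font.

```python
# A cubic Bezier spline of the kind used for outline character descriptions:
# four control points define the curve.

def bezier(p0, p1, p2, p3, t):
    """Return the point on the cubic Bezier at parameter t in [0, 1]."""
    u = 1.0 - t
    b = (u**3, 3 * u**2 * t, 3 * u * t**2, t**3)   # Bernstein basis weights
    return (sum(w * p[0] for w, p in zip(b, (p0, p1, p2, p3))),
            sum(w * p[1] for w, p in zip(b, (p0, p1, p2, p3))))

# Endpoints are interpolated exactly; interior control points shape the curve.
start = bezier((0, 0), (0, 1), (1, 1), (1, 0), 0.0)
mid = bezier((0, 0), (0, 1), (1, 1), (1, 0), 0.5)
end = bezier((0, 0), (0, 1), (1, 1), (1, 0), 1.0)
```

Because the description is parametric, rotating or scaling a character means transforming a handful of control points rather than resampling a bitmap, which is why outline fonts survive transformation cleanly.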
The primary factor limiting the growth of PostScript is that, until recently, Adobe Systems was the sole source for ports to new hardware. Several software houses (including Phoenix Technologies) have released or will soon release compatible implementations.
CGI is the ISO/ANSI draft of a VDI specification (ISO DP9636, ANSI/X3.122). It was designed to be the VDI for GKS, and its functionality is tailored for that purpose, although considerable effort was expended to make the interface useful to other higher-level standards. CGI currently supports a rich set of drawing primitives, as well as attribute and segment manipulation functions.
CGI has influenced the development of interfaces from Digital Research, Microsoft, GSS, Nova Graphics, and others. Although these interfaces had their origins in CGI, most have diverged from it. One reason is that few developers are able to wait for a proposed standard to follow the long and winding road to official adoption. Another is that many developers want to strip out functionality so that the implementation runs efficiently on the lowest common denominator (a 4.77-MHz 8088). At this point, the $64,000 question is whether these specifications will move closer to CGI as the standard is finalized and higher-performance hardware becomes the norm.
Pixar's RenderMan rendering interface was designed to address the needs of applications that want to present a photorealistic representation of objects. Unlike DGIS and CGI, RenderMan is specifically designed to operate in a 3-D graphics environment. In many areas, RenderMan and the proposed PHIGS+ extensions overlap, but overall RenderMan is far more sophisticated in its imaging model, as is apparent in its support for ray tracing and numerous shading models. RenderMan's emphasis is on image quality rather than on real-time interactivity, which is the focus of PHIGS+. Furthermore, RenderMan makes no attempt to be a complete interactive graphics environment: it does not support user input, text, or nonsurface primitives (such as lines and curves).
The Macintosh Finder is the user interface for the Apple Macintosh family of computers. It consists of a set of modules that includes the Window Manager, Resource Manager, Font Manager, Control Manager, Menu Manager, and so on. The Finder handles graphics through a 2-D interface called QuickDraw, which (like the bulk of the system software) resides in ROM. The amount of ROM-based firmware (256K in the Mac II) is one of the reasons no one has cloned the Macintosh. The Finder allows multiple overlapping windows, and the recently introduced MultiFinder allows limited multitasking. Graphical information may be transferred between windows through the clipboard, which can be thought of as a graphical paste buffer.
With the Mac II, Apple introduced an enhanced version of the Rendering Interface called Color QuickDraw. As the name suggests, the major improvement is enhanced color support. The original QuickDraw interface supported only eight colors, which was seldom a limitation since earlier Macs had a monochrome display. With Color QuickDraw, applications specify colors from a 48-bit color space. The system firmware is responsible for mapping these logical colors to physical colors on the screen.
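The idea of a logical color space mapped to whatever the device can actually show can be illustrated by nearest-palette-entry matching. This is a generic sketch, not Color QuickDraw's actual algorithm.

```python
# Mapping a logical color (three 16-bit components, as in a 48-bit color
# space) onto the nearest color a device actually supports, here by finding
# the closest entry in a small device palette.

def nearest(palette, rgb):
    """Return the index of the palette entry closest in RGB distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, rgb))
    return min(range(len(palette)), key=lambda i: dist(palette[i]))

# A 4-entry device palette, components on the 16-bit 0..65535 scale.
palette = [(0, 0, 0),              # black
           (65535, 0, 0),          # red
           (0, 65535, 0),          # green
           (65535, 65535, 65535)]  # white
idx = nearest(palette, (60000, 5000, 5000))   # a "nearly pure red" request
```

Because the application deals only in logical colors, the same request renders sensibly on an 8-color display, a 256-color display, or a monochrome one; only this mapping step changes.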
Another feature of the Mac II drawing environment is seamless support for multiple-display adaptors. All of the screens attached to the system are logically concatenated to form a single display space. The user can drag windows between displays and can even have windows straddle screens. The logical color space makes differences in the color depth of the physical displays transparent to the application. From the application standpoint, the elegance and simplicity of this approach is only matched by its complexity from the implementation viewpoint.
After the Macintosh Finder, the most prevalent windowing interface for personal computers is Microsoft Windows. Unlike the Finder, Windows has had nonpreemptive multitasking capabilities built into it from the beginning. Although its user interface intentionally differs from Finder in numerous ways, an experienced Macintosh user can quickly become comfortable in Windows. In fact, several applications (including Microsoft Excel and Aldus PageMaker) are nearly identical under the two interfaces.
To complement its multitasking capabilities, Windows supports an inter-process communications protocol called the Dynamic Data Exchange. This protocol is like X Windows in that it is based on a client-server model. It differs from the X protocol in that it is used for both graphical data and general-purpose interprocess communication. The reason for such a difference is that Windows runs under DOS, a single-tasking operating system. On the other hand, X Windows was designed to be hosted by the Unix system, which has a complete set of interprocess communications utilities already in place.
The Windows Graphics Interface is known as the Graphics Device Interface (GDI). It is a 2-D interface that uses a one-stage logical-to-device coordinate conversion. Like QuickDraw, Windows has a somewhat limited set of drawing primitives, but it does feature very powerful and flexible raster operations.
The most noteworthy aspect of Windows is its driver interface. The designers of Windows were faced with supporting devices as simple as the CGA and as complex as the PGA with its graphics coprocessor. The challenge was to design a driver interface that would allow the vendors of simple display controllers to write simple drivers with a few capabilities (since every function has to be performed in software), while at the same time be able to take advantage of highly sophisticated graphics devices.
The solution was to create an interface that requires any driver to perform only a small number of essential functions. Beyond this core is a wide range of functions that the driver can optionally support. When the GDI wants to perform an operation, it checks a data structure that indicates which services the driver supports. If the particular operation is supported, the GDI calls the driver directly; otherwise, it emulates the function through calls to other services that the driver does furnish. For example, if the driver doesn't support circles, the GDI constructs a circle via multiple calls to the line-drawing functions (which every driver is required to support).
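This capability-check-with-fallback dispatch can be sketched as follows. The function names, the capability table, and the polygonal circle emulation are all illustrative, not Windows' actual internals.

```python
import math

# A GDI-style dispatch: the engine checks the driver's capability table and
# calls the driver directly when it supports an operation; otherwise it
# emulates the operation using the small required core (here, draw_line).

class Engine:
    def __init__(self, driver):
        self.driver = driver   # operation name -> function; 'draw_line' is required

    def circle(self, cx, cy, r, segments=16):
        if "circle" in self.driver:
            self.driver["circle"](cx, cy, r)   # hardware-assisted path
            return
        # Software fallback: approximate the circle with line segments.
        pts = [(cx + r * math.cos(2 * math.pi * i / segments),
                cy + r * math.sin(2 * math.pi * i / segments))
               for i in range(segments + 1)]
        for a, b in zip(pts, pts[1:]):
            self.driver["draw_line"](a, b)

calls = []
simple_driver = {"draw_line": lambda a, b: calls.append("line")}
Engine(simple_driver).circle(0, 0, 10)   # emulated: many line calls
smart_driver = {"draw_line": lambda a, b: calls.append("line"),
                "circle": lambda cx, cy, r: calls.append("hw_circle")}
Engine(smart_driver).circle(0, 0, 10)    # one hardware call
```

The application code is identical in both cases; only the driver's advertised capabilities determine whether the fast path is taken.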
X Windows originated as part of Project Athena at MIT. It was developed with the help of several computer manufacturers and has become the windowing interface of choice for Unix workstations. It is supported (in one form or another) on systems produced by DEC, HP, Apollo, Sun, and others. Because the drawing interface for X is fairly minimal, most implementers have extended it, usually by the addition of a supplementary interface.
One of the most significant contributions of X is its client-server architecture, which enables nodes to transparently interchange graphical information over a network. This is one of the architectural features that differentiates the X Window System from Microsoft Windows and the Macintosh environments. When an application opens a window under X, it specifies the network address of the server that will do the actual rendering, and the system automatically routes the application's graphics output calls to that node. Of course, a high-performance network is required to handle all the traffic resulting from such a facility. For this reason, such a feature is not likely to be added to any of the PC-based windowing environments in the immediate future.
The choice of a graphics system depends on a number of factors, the goal being to match capabilities as closely as possible with known and probable requirements. For example, if the primary purpose is to provide a consistent user interface through the use of windows and menus, Microsoft Windows or Finder is more appropriate than, say, PostScript or HALO, which have no explicit user interface support. On the other hand, if highly realistic three-dimensional graphics is the chief goal, then something like PHIGS or RenderMan is a better choice. Table 1, this page, lists the major features of several commercial graphics interface systems.
Table 1: Major features of several graphic rendering packages
Support for various hardware/software platforms and data portability among them might also be an important consideration in selecting a graphics system. For example, a work group might use both 386-based PCs running DOS and Sun workstations running Unix, and need to share data and programs between them. In that case, Finder is definitely out of the running, since it is only available on Macintosh machines, but something like the widely implemented HOOPS might be ideal.
Another very important factor is ease of programming and the quality of documentation. One of the chief objectives in using a packaged graphics interface is productivity: relieving programmers of the complex tedium required to get images onto the screen or manage the user interface (or both). A well-designed API allows the programmer to concentrate on the purpose of the application rather than on display management, thus achieving the goal of increased productivity. One that is poorly designed--or, even worse, badly documented--merely replaces one set of complexities with another.
Graphical interfaces are powerful and complicated toolsets. The key to selecting a graphics package and using it effectively is knowing what components it has and how they interact to solve your programming problems.