Optimizing the Performance of VRML Worlds

Creating high-performance 3-D images

David R. Nadeau, Andrea L. Ames, and John L. Moreland

Dave and John are staff at the San Diego Supercomputer Center (SDSC) and can be reached at nadeau@sdsc.edu and moreland@sdsc.edu, respectively. Andrea is an SDSC affiliate and works for Informix Software. She can be reached at andreaa@informix.com. They are coauthors of The VRML Sourcebook (John Wiley & Sons, 1995).

Virtual Reality Modeling Language (VRML) is a file format used to describe 3-D shapes and scenery that can be served on the World Wide Web. VRML is platform-neutral, enabling authors to create 3-D worlds that can be viewed on PCs as well as high-end 3-D graphics workstations. Performance differences between these platforms, however, require that VRML authors consider performance issues, as well as aesthetic issues, during the creation of VRML worlds.

From a URL to pixels, a VRML file's scene content is handled by three main software and hardware components:

The browser application.
The 3-D graphics library.
The 3-D graphics hardware (if any).

The browser's role is to download a VRML file from a Web server, check it for syntax errors, and create an internal representation of the world for use by the 3-D graphics library. Thereafter, the VRML browser handles user interaction, including interaction with user-interface widgets and movement within the VRML world.

The 3-D graphics library uses the internal representation of the VRML world to draw it as quickly as possible each time the user moves. Graphics-library drawing takes advantage of 3-D graphics hardware, when available, to speed up some or all of the stages necessary to turn 3-D shapes into pixels.

Drawing performance is largely dictated by the 3-D graphics library and hardware-not by the browser. Understanding how 3-D graphics software and hardware work provides important clues to optimizing VRML worlds for better performance.

The 3-D Graphics Pipeline

The task of converting 3-D shapes into pixels is composed of several standard algorithm stages, fit together in a 3-D graphics pipeline. Three-dimensional polygons are poured into the top of the pipeline by the VRML browser, and pixels pour out the bottom and onto the screen. The pipeline itself is implemented by software, hardware, or a mixture of both.

Table 1 describes the stages of a 3-D graphics pipeline. Each of these stages can be characterized by the type of data on which it operates, shown in the third column of Table 1.

Graphics-pipeline designs are tuned so that the computation cost of the stages are each approximately equal for a typical world. This enables each stage to work efficiently in parallel-when in graphics hardware-so that just as one stage finishes its task and outputs its result, output from the previous pipeline stage becomes available as new input. For example, while the Transform stage is working on polygon N, the Backface Cull stage is working on the transformed results of polygon N-1, the Compute Intensity stage is working on polygon N-2 (if it wasn't culled), and so on down the pipeline. When the Compute Intensity stage finishes polygon N-2, it starts on the Backface Cull stage's output of polygon N-1, the Backface Cull stage starts on the Transform stage's output of polygon N, and the Transform stage starts work on polygon N+1.

The computation cost of most pipeline stages varies from polygon to polygon. For instance, a large polygon takes longer to draw in the Rasterize stage than a small polygon. The Transform stage, however, has the same computation cost regardless of polygon size. This computation cost variation can cause a pipeline stage to block or starve, if the next or previous stage isn't keeping up with it.

When tuning a pipeline design, the hardware/pipeline designer must consider a benchmark world to determine the average computation costs in the different stages and decide where to allocate more hardware logic, or where to focus efforts on software optimization. Unfortunately, every VRML world is different and rarely conforms to the attributes of a benchmark world. No matter how well designed a graphics pipeline is, there are always situations where a world can cause pipeline stages to block or starve, thereby reducing drawing speed. The trick in authoring VRML worlds is to recognize when these block or starve situations are occurring and alter the VRML-world design or implementation to minimize them.

The performance of a 3-D VRML world can be characterized by the type of data that is causing reduced performance during one or more pipeline stages. Table 2 identifies typical performance problems, along with descriptions of the data types causing reduced performance.

Since drawing usually involves far more pixels than vertices, drawing speed is primarily affected by the speed of the Rasterize stage. Worlds that block the Rasterize stage are pixel bound. PC graphics-accelerator cards, therefore, focus on hardware features to speed the generation of pixels in the Rasterize stage of the pipeline. This can shift the performance bottleneck to another pipeline stage, typically resulting in worlds that are vertex bound. If the graphics accelerator does not support texture mapping, worlds can also become texture bound.

Transform-bound, state-bound, and light-source-bound worlds are uncommon. Most optimization efforts, therefore, focus on problems with vertex-bound, texture-bound, and pixel-bound worlds.

In all cases, a world's performance problems will shift during a user's exploration of that world. Consider, for instance, a detailed fractal terrain for a planet. When the planet is a distant dot, far from the user, the number of pixels drawn for the planet's geometry is very low, but the number of vertices in use to define that geometry is high. From such a distant vantage point, the world will be vertex bound. As the user moves closer, an increasing number of screen pixels are covered by the planet, and the performance problem may shift to being pixel bound.

More expensive graphics cards, such as those using 3-D Labs' GLINT or PERMEDIA chips, include hardware support for all stages of the graphics pipeline. High-end graphics workstations, such as those from Silicon Graphics, include highly sophisticated hardware that can achieve speeds 20 times greater than the fastest PC graphics card. In any case, the amount of hardware devoted to each of the 3-D graphics pipeline stages can shift the performance bottleneck from pixels to textures, vertices, transforms, light sources, or elsewhere.

Determining the Performance Bottleneck

Determining the performance bottleneck for a VRML world involves the following steps:

1. Display the VRML world in a full-screen VRML browser window. Move about within the world, and observe how quickly or slowly the screen updates.

2. Reduce the size of the browser's window to one-quarter its full-screen size. Move about again, and observe the screen update rate. If the update rate is now quicker, the world is pixel bound. Reducing the window size reduces the number of pixels to be drawn.

3. If the VRML world uses texture mapping, turn textures off either by editing a temporary copy of the world or disabling textures via a menu option found in many VRML browsers. Move about again, and observe the update rate. If the scene is now updating quicker than when textures were on, then the world is texture bound.

4. If neither steps 2 nor 3 affects the update rate of the world, then the world is transform, state, light-source, or vertex bound. Transform- and state-bound worlds are uncommon and can be safely removed from consideration. To determine if the world is vertex or light-source bound, edit a temporary copy of the world and delete all light sources from the world. Move about again in the darkened world and observe the update rate. If the scene is now updating quicker than when light sources were included, then the world is light-source bound.

5. If neither steps 2, 3, nor 4 affect the update rate of the world, then the world is vertex bound.

VRML worlds displayed with PC graphics software and hardware are typically pixel bound. Worlds viewed with mid-range UNIX workstation graphics systems-from Sun and Silicon Graphics, for example-are typically texture or vertex bound. Worlds viewed with high-end workstations from Silicon Graphics are typically vertex bound.

Optimizing VRML Worlds for Better Performance

For light-source bound worlds, the solution is to either remove light sources or scope them so that, for any shape in the world, there are only a few light sources contributing to its vertices' intensity calculations. While VRML supports light-source scoping within Separator groups, many graphics libraries used by VRML browsers do not support this feature. Without scoping, the author with a light-source bound world must remove light sources to achieve better performance.

For pixel-, texture-, or vertex-bound worlds, the solution is always the same: Reduce scene content. The difference is only in what scene content to reduce first.

In a vertex-bound world, focus on removing small detail that requires many vertices but has relatively little impact on the scene. Remove a few knick-knacks from the shelves of a virtual living room, for instance.

In a texture-bound world, focus on reducing the number of textured shapes, particularly if they fill a lot of pixels on the screen. Textured-background skylines, for instance, can sometimes be replaced with cheaper polygon skylines-or removed altogether-without severely damaging the impact of the VRML world.

In a pixel-bound world, focus on removing large shapes that fill a lot of screen pixels. Because large shapes such as walls and floors tend to be important, when optimizing pixel-bound worlds, you may have to focus instead on removing lots of smaller bits and pieces that collectively result in filling a lot of screen pixels.

Regardless of the performance bottleneck, removing cherished scene content is painful. While these techniques may help, avoiding the problem entirely is best.

Even with good designs, there are bad implementations where performance optimization is necessary. The following techniques will help you recognize and deal with common problems:

Use simple designs. The best optimization technique is always to use a better design in the first place. Programmer culture is filled with pithy aphorisms like "KISS: Keep It Simple Stupid," and "garbage in, garbage out." The same aphorisms apply to VRML world design. Simpler designs can reduce vertices, pixels, textures, and other performance-relevant features of your world and significantly affect drawing speed.

Figure 1(a) shows the design for an arched doorway in a dungeon room. (Source code that generates this and subsequent figures is available electronically.) The arch, room, and door require 195 vertices. Figure 1(b) shows a simpler, flat-top doorway design with a similar impact, but requiring only 126 vertices.

Remove unseen polygons. Shapes designed independent of their use frequently have extra detail that is unseen when the shapes are placed in an environment. Removing this unseen detail reduces the number of vertices, pixels, and textures to be drawn and increases drawing speed.

Figure 2 shows a dungeon room with a treasure chest. The chest, designed separately, has four sides, a bottom, an arched top, brass bands, and a lock, and requires 348 vertices. In the dungeon context, the bottom, left side, and back of the chest will never be seen and should be removed. The optimized chest uses only 288 vertices.

Remove internal polygons. A quick way of building complex shapes is to build them up from simpler component shapes. Later, the component shapes are positioned on top of each other to build the final complex shape. This kind of approach can create polygons internal to the complex shape. The internal polygons are drawn by the browser, even though they are never seen. Removing these internal polygons reduces the number of vertices, pixels, and textures to be drawn and increases drawing speed.

The internal polygon problem is similar to the unseen polygon problem discussed earlier. Internal polygon problems, however, are an artifact of how you design a shape, while unseen polygons typically result from the way you use a shape. In either case, removing the polygons increases drawing speed.

Figure 3 shows a segment of a castle wall. The wall is built from four Cube nodes: one for the main wall, and one for each of the three blocks sitting on top of the wall. Though this wall implementation uses only 144 vertices, the approach creates three internal faces-one under each of the three blocks on top. Additionally, three portions of the top face of the main-wall Cube node also are internal because they are obscured by the three blocks on top.

In an alternate implementation, the front, back, and edges of the wall are defined using indexed face sets and require 126 vertices. There are no internal faces in this implementation.

Cull backfacing polygons. A backfacing polygon is simply a polygon that faces away from the viewer. On solid shapes, such as a sphere or cube, the front of the shape always obscures the back of the shape. If the VRML browser knows a shape is solid, it can automatically skip, or cull, backfacing polygons and increase drawing speed. By default, however, the browser assumes that all shapes are not solid unless the author indicates otherwise by using the ShapeHints node.

Figure 4 shows the castle wall implementation from Figure 3, but all front faces are temporarily removed so that the back faces are clearly visible. Without backface culling enabled, the browser will draw each of these back faces, as well as the front faces. This wastes valuable drawing time. To turn on backface culling, set two fields in a ShapeHints node positioned before the shape to be drawn (see Example 1).

Use textures instead of detail. Adding lots of detail to a world increases its impact and realism, but also increases the number of vertices, pixels, and textures drawn, reducing drawing speed. For many types of detail, a texture image of that detail is nearly as good as the original, detailed shape. The texture image may require only a few vertices, while the detailed VRML shape may require dozens or hundreds. Converting detailed shapes to textures can significantly reduce the number of vertices in a world and increase drawing speed without compromising realism.

Figure 5(a) shows a control panel using 3375 vertices. While the control panel uses 3-D shapes for the lever, wheel, and buttons, the panel as a whole is essentially flat. Figure 5(b) shows a straight-on view of the control panel captured as a texture image file. Using the texture image, the visual effect is virtually the same but requires only six vertices.

Control the level of detail. As a viewer moves away from a shape in a world, the shape occupies less screen space, and its detail becomes less apparent. The drawing time spent in the Rasterize stage of the pipeline decreases as the shape gets smaller, but the time spent processing the shape's vertices remains constant in the Transform and Viewing Clip and Cull stages. To decrease the drawing time spent on a shape as it grows distant, the detail on the shape should also decrease.

VRML's LOD (Level Of Detail) node lets you create multiple implementations of the same shape, each with a different amount of detail. The browser uses the distance between the viewer and the LOD node to select which version of the shape to display. As the viewer becomes more distant, lower-detail shapes are drawn, and drawing speed increases.

Figure 6 shows three implementations of a torch for a dungeon. The left torch is intended for use when the viewer is close by. It requires 621 vertices for all detail. The center torch eliminates some of the detail, uses 363 vertices, and is intended for use at medium distances. The right torch is stripped of all but the most essential components, uses only 18 vertices, and is intended for use when the viewer is far away. When the torches are placed on either side of a dungeon doorway, they increase and decrease in detail as the viewer moves toward or away from the doorway.

Beware the use of VRML primitives. VRML's Cone, Cube, Cylinder, and Sphere nodes are convenient tools for building complex shapes quickly. Because they are generically implemented by the VRML browser, they may include more polygons and vertices than needed in a particular application. The extra polygons and vertices often end up internal to the final, complex shape and, as discussed earlier, result in wasted drawing time to process polygons that are never seen. To avoid this hidden cost of using VRML primitives, it is usually more efficient to generate your own shapes using Perl or C programs and VRML's IndexedFaceSet node. The resulting custom primitives use fewer vertices and speed up drawing.

Figure 7 shows a gadget to be found in a wizard's dungeon laboratory. The gadget uses a sphere node for a central blue globe and smaller sphere nodes for red jewels around the rim. Cylinder nodes, cube nodes, and a cone node make up the remainder of the shape. The gadget requires 6120 vertices.

In the design, the top hemisphere of the central blue globe is intended to be seen, but the lower hemisphere is internal to the shape. All polygons composing this lower half of the VRML primitive are wasted and make for slower drawing. Similarly, the lower hemispheres of the red spheres on the rim are also internal and waste drawing time.

When each of the sphere nodes is replaced with custom hemisphere shapes built using IndexedFaceSet nodes, the internal polygon problem is eliminated. As a bonus, the hemispheres build the shape more coarsely than the VRML primitives, further reducing the polygon and vertex count of the gadget. The optimized gadget requires 2232 vertices.

Beware the use of VRML text. Using a text string, the AsciiText node creates filled polygons for each character in the string. Once again, this may be convenient, but the number of vertices required can be very high. A single character, for instance, may use 30 vertices or more. Extended over a large block of text, a simple note left on a table in a virtual room can cost more to draw than the rest of the entire room. Words are usually drawn more efficiently by replacing the text with a texture image.

Figure 8 shows a scroll with text implemented using the AsciiText node. The scroll with VRML text requires 25,590 vertices! A revised scroll with texture-mapped text requires only 234 vertices.

Putting it all together

Before optimizing the dungeon room in Figure 9 (page 17), the image required 36,249 vertices. When we applied only one optimization technique to each object in the room, the room required 3567 vertices. The fully optimized room only requires 1737 vertices.

To put these optimizations in context, a typical software implementation of the graphics pipeline for a Pentium PC can draw about 90,000 triangles per second. At a 60-Hz video refresh rate, the software can draw about 1500 triangles per refresh. Before optimizing, the room contains 12,083 triangles and can be drawn at an update rate of once every 9 refreshes (about 1/7 of a second). This rate feels sluggish to the user. A partially optimized room contains only 1189 triangles and can be drawn at an update rate of once every refresh. This rate feels smooth and interactive to a user. The fully optimized room contains 579 triangles and can be drawn at an update rate faster than once every refresh. This extra time allows you the design freedom to add more scene detail and still retain a smooth, interactive experience.

Each technique can reduce the number of vertices, pixels, and textures that need to be drawn by the appropriate stages of the graphics pipeline. Avoiding the conditions that cause the graphics pipeline to become starved or blocked helps your worlds display more smoothly and efficiently, making your games more enjoyable for users.

Table 1: Stages comprising the 3-D graphics pipeline.

    Stage       Description                    Data Types

    Transform   Transform model-space          Vertices, normals,
                triangle vertices into         transformation
                eye-space vertices.            matrices

    Backface    Skip triangles facing away     Normals
    Cull        from the eye.                  Normals

    Compute     Compute vertex intensities     Normals, colors,
    Intensity   based on light-source          light sources
                locations.

    Texture     Transform texture vertices to  Texture vertices,
    Transform   texture space.                 transformation
                                               matrices

    View Cull   Skip triangles outside         Vertices, normals,
    and Clip    the viewing volume, and        colors
                clip those that intersect it.

    View        Project eye-space triangle     Vertices
    Project     vertices into screen space.

    Rasterize   Fill 2-D, screen-space         Vertices, texture
                triangles, interpolating       vertices, colors,
                colors and textures.           texture pixels, pixels

Table 2: Typical VRML performance problems.

Problem Description

Vertex-Bound World The computations involving vertices are the dominant problem. Decreasing the number of vertices
reduces the number of vertices to transform, backface check, clip, texture map, and light during the
Transform through View Project stages and speeds drawing. This problem is common.

Texture-Bound World The computations involving texture-coordinate calculation during the Texture Transform stage and
texture-pixel lookup during the Rasterize stage are the dominant problem. Reducing the number
and screen size of textured shapes speeds drawing. This problem is common.

Pixel-Bound World The computations involving filling screen pixels during the Rasterize stage are the dominant problem.
Reducing the application window size, the screen size of shapes, or the number of overlapping
shapes speeds drawing. This problem is common.

Transform-Bound World The computations involved in transform concatenation during the Transform stage are the
dominant problem. Decreasing the number of transforms used reduces the number of transform
concatenations done and speeds drawing. This problem is rare.

State-Bound World The overhead in saving and restoring state (VRML properties) is the dominant problem. Decreasing
the number of Separator nodes used reduces the number of state-save and restore operations and
speeds drawing. This problem is rare.

Light-Source-Bound World The computation of vertex intensities during the Compute Intensity stage is the dominant problem.
Reducing the number of light sources reduces the number of intensities per vertex that must be
computed and speeds drawing. This problem is rare.

Example 1: Turning on backface culling.

ShapeHints {
   vertexOrdering COUNTERCLOCKWISE
   shapeType SOLID
}