DEMYSTIFYING 16-BIT VGA

There's more to VGA than meets the eye

Michael Abrash

Michael works on high-performance graphics software at Metagraphics in Scotts Valley, Calif. He is also the author of Zen Assembly Language published by Scott, Foresman & Co., and Power Graphics Programming, from Que.


A year or two ago, a friend in the industry made the mistake of mentioning to a headhunter (excuse me, an employment recruiter) that he was interested in hiring someone with object-oriented programming skills, which were rare at the time. From that moment forth, every resume that came through that particular agency boasted of object-oriented programming experience. My friend would ask each candidate if he or she had any experience with object-oriented programming, and each one would answer yes. Then my friend would ask exactly what object-oriented programming is. Not a one of them had a clue.

Which brings us, in a slightly round-about way, to 16-bit VGA.

What 16-Bit VGA Really Is

16-bit VGA. Those seductive words, promising what every PC user craves -- performance -- are everywhere. "16-bit VGA," ads shout. "16-bit VGA: Does It Matter?" articles and reviews ask. "Do I need a 16-bit VGA?" every power user wonders, and, "Can I afford one?"

It seems as if 16-bit VGAs have been with us forever, or at least since IBM came down from the mount with the PC, but in fact they're a relatively new development, dating back only a year or so. (IBM's Display Adapter, the PC-bus version of the VGA, was -- and still is -- an 8-bit adapter.) As such, they're a lot like object-oriented programming was when my friend was looking to hire a programmer: Widely used as a buzzword, claimed by many, and not particularly well understood by the reviewers who write about them, the users who buy them, or the developers who must deal with them. Most significantly, 16-bit VGA isn't a standard, but rather a catchall name for a variety of VGA enhancements. Any VGA with a 16-bit bus interface (that is, with two connectors that plug into the AT bus) is bound to be advertised as a 16-bit VGA, but not all 16-bit VGAs are created equal, and the value of a given 16-bit VGA varies greatly depending both on the sorts of 16-bit operations it offers and on how you use it.

DDJ's readers, who are both developers and users (and sometimes reviewers as well), would surely benefit from a solid understanding of what 16-bit VGA really means and what benefits the various types of 16-bit VGA offer. It's from that perspective that I'll attempt to clear up some misconceptions and confusion about 16-bit VGA in this article; most importantly, we'll see why (happily and contrary to some reports) programmers need not treat 8- and 16-bit VGAs differently.

Performance

Let's begin by placing the one and only reason for the existence of 16-bit VGAs -- performance -- in context. 16-bit VGAs can allow screen-oriented programs to run faster than they do on other VGAs, but it's not really correct to say that 16-bit VGAs run those programs faster; more accurately, they slow programs down less.

What's the distinction? More powerful graphics adapters, such as the 8514/A and adapters built around the TI 34010 graphics chip, have dedicated processors and specialized hardware that allow them to offload work the CPU would otherwise have to do and to perform that work very rapidly, so they really can run screen-oriented programs faster than if the CPU were required to do all the work itself.

In contrast, the best a VGA can do is get in the processor's way less. You see, all VGA-based graphics operations are performed directly by the CPU -- the 8088, 80286, 80386, or whatever processor happens to be in a given PC. The VGA has only a bit of hardware assist on board, and has no independent processing ability at all. The VGA is basically a set of I/O ports and a memory map to be manipulated directly, and at a very low level, by the CPU. Given that, the only way a VGA can contribute to improved performance is by not slowing the CPU, that is, by allowing the CPU rapid access to I/O ports and memory. Ideally, the CPU would be able to make every access to VGA memory as rapidly as to system memory, and likewise for I/O ports.

That's the ideal, but it's far from the reality. To understand why, we must first understand how the AT bus handles 8-bit adapters. That discussion has two facets: The splitting up of 16-bit accesses to 8-bit adapters, and the automatic slowing down of all accesses to 8-bit adapters. Before we can cover those topics, however, we must talk about wait states.

Wait States in the AT

Wait states are cycles during which the CPU does nothing because the bus or some memory or I/O device tells it to wait. Put another way, they're states that are thrown away by the CPU at the request of external circuitry. While wait states aren't desirable because they reduce performance, they're necessary because they allow slower memory and I/O devices to function properly with a fast CPU. For example, the 80286 is capable of performing a memory or I/O access in just two cycles. However, in a standard AT the bus inserts one wait state on each access to most 16-bit devices, including system memory, as shown in Figure 1. This increases access time to three cycles, reducing overall performance but allowing the use of slower, cheaper chips.

Wait states are also inserted by an adapter whenever the adapter can't respond at the maximum speed of the bus or processor. As we'll see, some VGAs insert additional wait states, while others avoid additional wait states at least some of the time.

16-Bit Accesses to 8-Bit Adapters

There are two fundamental classes of adapters that may be plugged into the AT bus: 8-bit adapters and 16-bit adapters. The two are distinguished by the extra bus connector that appears only on 16-bit adapters; in addition, 16-bit adapters must announce to the bus that they are indeed capable of handling 16-bit accesses, by raising a particular bus line on the 16-bit connector early on during each access.

What happens if an adapter doesn't have the 16-bit connector, or if it doesn't announce that it's a 16-bit device? Why, then the AT's bus does two things. First, the bus splits each word-sized access to that adapter into two byte-sized accesses, sending the adapter first one byte and then the other. That's not all the bus does, though: During each of those byte-sized accesses to an 8-bit adapter, the AT bus inserts three extra wait states (in addition to the one wait state that's routinely inserted), effectively doubling the access time per byte of such adapters to six cycles, as shown in Figure 2. These extra wait states, which I'll refer to as 8-bit-device wait states, form a pivotal and little-understood element of 16-bit VGA. Together with the splitting of word-sized accesses into two byte-sized accesses, 8-bit-device wait states can quadruple the access time per word of 8-bit adapters; instead of accessing one word every three cycles, as is possible with 16-bit adapters, the AT can access only one byte every six cycles when working with 8-bit adapters.

Three extra wait states are inserted on accesses to 8-bit adapters because the first 8-bit adapters were designed for the PC's 4.77-MHz bus, not the AT's 8-MHz bus. In order to ensure that PC adapters worked reliably in ATs, the designers of the AT decided to slow accesses to 8-bit adapters to PC speeds by inserting wait states to double the access time. Modern adapters, such as the VGA, can easily be designed to run at AT speeds or faster, whether they're 8- or 16-bit devices -- but the AT bus has no way of knowing this, and insists on slowing them down -- just in case. It should be obvious that true 16-bit operation, where an adapter responds as a 16-bit device and handles a word at a time, is most desirable. Not at all obvious is that it's also desirable that an adapter respond as a 16-bit device even if it can internally handle only a byte at a time. In this mode, an inherently 8-bit adapter announces to the bus that it's a 16-bit device; on writes, it accepts a word from the bus and then performs two 8-bit writes internally, and on reads, it performs two 8-bit reads internally and then sends a word to the bus. From the perspective of the bus, each word-sized operation seems to be a 16-bit operation performed by a true 16-bit adapter, but in truth two accesses are performed internally, so the operation takes twice as long as it would if the adapter were a 16-bit device internally.

Why bother? The advantage of having an 8-bit adapter respond as if it were a 16-bit adapter is this: The bus is fooled into thinking the adapter is a 16-bit device, so it doesn't assume that the adapter must run at PC speeds and doesn't insert three extra wait states per byte. From now on, I'll use the word "emulated" to describe the mode of operation in which an adapter that's internally an 8-bit device responds as a 16-bit adapter; this mode contrasts with the true 16-bit operation offered by adapters that not only respond as 16-bit devices but are 16-bit devices internally. AT plug-in memory adapters, for example, are true 16-bit adapters. 16-bit VGAs, on the other hand, may be either true or emulated 16-bit adapters; in fact, as we'll see, a single VGA may operate as either one, depending on the mode it's in.

Emulated 16-bit operation is at heart nothing more than a means of announcing to the AT bus that an inherently 8-bit adapter can run at AT speeds, thereby making the three 8-bit-device wait states vanish. While word-sized accesses to emulated 16-bit adapters can take up to twice as long as accesses to true 16-bit adapters (they must still be performed a byte at a time internally), emulated 16-bit operation can double the performance of an inherently 8-bit adapter that is otherwise capable of responding instantly, by cutting access time from six to three cycles per byte.
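To put numbers on all this, here's a minimal sketch in C (purely illustrative; the cycle counts are the AT-bus figures discussed above) that tabulates the cost of transferring one word under each scheme:

    /* Illustrative AT-bus arithmetic: cycles to transfer one 16-bit word.
       Base access = 2 CPU cycles; the AT bus adds 1 routine wait state,
       plus 3 extra wait states per byte-sized access to an 8-bit device. */
    #include <stdio.h>

    int main(void)
    {
        int base = 2, routine_ws = 1, eight_bit_ws = 3;

        int true16    = base + routine_ws;                      /* one word access       */
        int emulated  = 2 * (base + routine_ws);                /* two internal byte ops */
        int eight_bit = 2 * (base + routine_ws + eight_bit_ws); /* two slowed byte ops   */

        printf("true 16-bit:     %2d cycles per word\n", true16);    /*  3 */
        printf("emulated 16-bit: %2d cycles per word\n", emulated);  /*  6 */
        printf("8-bit:           %2d cycles per word\n", eight_bit); /* 12 */
        return 0;
    }

The 12-versus-3 ratio is where the quadrupling described above comes from; emulated operation splits the difference.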

8-Bit and 16-Bit Adapters Don't Mix

If there is one 8-bit display adapter in an AT, all display adapters in that AT must be 8-bit devices. Consequently, all 16-bit VGAs automatically convert to 8-bit operation if an 8-bit adapter is present. If you put a monochrome adapter in your AT along with your expensive 16-bit VGA, what you'll get is 8-bit VGA performance.

Why this happens is a function of the addressing information available to an adapter at the time it has to announce it is a 16-bit device; I lack both the expertise and the space to explain it in detail. The phenomenon does exist, however, and the conclusion is simple: Don't bother getting a 16-bit VGA if you're going to put an 8-bit display adapter in your system as well.

Wait States in Other AT-Bus Computers

All AT-bus 80386-based computers slow down both 8- and 16-bit adapters considerably. (Obviously, 16-bit VGAs are wasted in 8-bit PCs, in which they operate as 8-bit devices.) AT-bus 80386 computers insert wait states -- often a great many wait states -- on accesses to 16-bit devices in order to slow the bus down to approximately the AT bus's 375-nanosecond access time, so that AT plug-in adapters will work reliably. A 33-MHz 80386 is capable of accessing memory once every 60 nanoseconds (two cycles); about ten wait states must be inserted to stretch accesses out to the 375-nanosecond access time of a standard AT. Clearly, memory on 16-bit plug-in adapters responds considerably more slowly than 32-bit memory in 80386 computers; the 80386 in the above example is idle more than 80 percent of the time when accessing plug-in 16-bit memory. Because of this, you can expect to see VGAs built onto the motherboards of most high-performance computers in the future, thereby completely bypassing the many wait states inserted by the AT bus.

In many 80386 computers, 8-bit adapters fare worse still. Some 80386 motherboards slow accesses to 8-bit adapters down to about the PC's bus speed of 838 nanoseconds per access, which can mean roughly 25 wait states in the above example. Others, however, slow both 8- and 16-bit adapters down to AT speeds; in those computers, the performance distinction between 8- and 16-bit adapters vanishes.
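A similar back-of-the-envelope sketch (again illustrative C, using the 33-MHz figures above) shows why so many wait states pile up in fast 80386 machines:

    /* Illustrative: wait states needed to stretch a 2-cycle access on a
       33-MHz 80386 (about 30 ns per cycle) out to AT and PC bus speeds. */
    #include <stdio.h>

    int main(void)
    {
        double ns_per_cycle = 1000.0 / 33.0;  /* ~30.3 ns at 33 MHz */
        int    base_cycles  = 2;

        int ws_16bit = (int)(375.0 / ns_per_cycle + 0.5) - base_cycles;
        int ws_8bit  = (int)(838.0 / ns_per_cycle + 0.5) - base_cycles;

        printf("16-bit adapter (375 ns): ~%d wait states\n", ws_16bit); /* ~10 */
        printf("8-bit adapter  (838 ns): ~%d wait states\n", ws_8bit);  /* ~26 */
        return 0;
    }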

What of Micro Channel computers? They don't distinguish between 8- and 16-bit devices -- but that's a moot point, because Micro Channel computers have VGAs built right onto the motherboard. For the remainder of this article, I'll talk only about AT-bus VGAs.

To summarize, byte-sized accesses to an 8-bit adapter take twice as long on ATs and many 80386 computers as accesses to a 16-bit adapter if both adapters are otherwise capable of responding instantly; word-sized accesses take twice as long again on 8-bit adapters. Most VGAs aren't capable of responding instantly, though, and in memory and I/O response time lies another part of the 16-bit VGA performance tale.

VGA Memory

Before we can look at the response time of VGA memory, we must clarify exactly what sort of VGA memory we're talking about. For practical purposes, there are three types of VGA memory: ROM, text-mode memory, and graphics-mode memory. A 16-bit VGA might actually provide 16-bit access in one, two, or all three areas, and 16-bit access might be either true or emulated for ROM and text-mode memory. In addition, 16-bit access provides different benefits in each area. Next, we'll look at 16-bit operation in each VGA memory area, and at 16-bit I/O as well.

ROM

It's easy to provide true 16-bit access to a VGA's ROM, and most 16-bit VGAs do so. If a 16-bit VGA has two ROM chips, it's a pretty safe bet that it offers true 16-bit ROM operation, which in turn translates into a performance improvement of close to four times for VGA ROM code. Just what does that massive speedup do for us?

The VGA's ROM contains an extended video BIOS that's responsible for the text placed on the screen by DOS and BIOS functions, a category that includes the DOS prompt, directory listings, and text drawn with printf and writeln, but not the text drawn by virtually any major word processor, text editor, or other program that offers a full-screen interface. The VGA's BIOS also provides functions for drawing and reading dots in graphics mode at relatively low speeds; again, most graphics programs ignore these functions and access the video hardware directly. Finally, the VGA's BIOS supports miscellaneous functions such as setting the color palette and returning configuration information.

Consequently, the primary benefit of 16-bit ROM access is speeding up directory listings, the TYPE command, program output sent to the standard output device, and the like, which can make the computer feel sprightlier and more responsive. On the other hand, most manufacturers provide RAM-loadable BIOSes, which are generally faster than even 16-bit ROM BIOSes, especially in fast 80386 computers, because they run from system memory. Also, many computers can copy the VGA's BIOS into shadow RAM, non-DOS system memory reserved especially for the purpose of replacing ROM with fast RAM.

On balance, 16-bit access to the VGA's ROM BIOS is nice to have when a RAM or shadow RAM BIOS is not in use, because it speeds up certain common operations. 16-bit ROM access does not, however, affect the performance of programs that do direct screen output, including most commercial PC software.
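For concreteness, this is the kind of BIOS-mediated output that 16-bit ROM access speeds up -- a minimal sketch using the int86() call found in the Borland and Microsoft DOS compilers to invoke the video BIOS teletype service (INT 10h, function 0Eh):

    /* Write a string via the video BIOS teletype function (INT 10h/AH=0Eh).
       DOS-compiler-specific: int86() and union REGS live in <dos.h>. */
    #include <dos.h>

    void bios_print(const char *s)
    {
        union REGS r;

        while (*s) {
            r.h.ah = 0x0E;       /* teletype output     */
            r.h.al = *s++;       /* character to write  */
            r.h.bh = 0;          /* display page 0      */
            int86(0x10, &r, &r); /* call the video BIOS */
        }
    }

Every character printed this way executes a fair amount of BIOS code out of the VGA's ROM, which is exactly why the width of the ROM's bus interface shows up in directory listings and the like.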

A Brief Aside on Benchmarks

It's not easy to benchmark VGAs in ways that correspond to meaningful user-performance improvements. A number of programs used to test VGAs are actually BIOS tests, because it's easy to exercise the various BIOS functions; alas, that falls far short of fully exercising the VGA standard, for many programs ignore the BIOS altogether and access the VGA's hardware directly. Other benchmarks just measure raw memory access speed; they measure ideal conditions that are rarely achieved in the real world. Raw memory access speed doesn't necessarily map well to the sorts of operations -- bitblts, line draws, scrolling, fills, and the like -- that performance-sensitive screen-oriented programs actually perform.

There are two types of meaningful benchmarks. Benchmarks that measure actual programs doing useful, time-consuming work (redrawing screens in AutoCAD or scrolling through a document in a Windows-based word processor, for example) are certainly relevant. Better yet are benchmarks you perform yourself with the VGA software you plan to use. Performance is not absolute: It is a relative measure that is meaningful only in context. What good will it do you to buy the VGA that runs AutoCAD fastest if you do all your drawing work in FastCAD? There's no guarantee that a VGA that performs well with one program will perform well with another, if for no other reason than that the performance of driver-based programs varies as much with driver quality as with hardware speed, and each VGA manufacturer provides its own set of drivers.

Use your own software to test drive any VGA you're considering buying. You'll be glad you did.
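If you want a quick, crude measurement of your own, even a simple timing loop around the operation you actually care about beats a raw-access figure. A minimal harness (ANSI C; older DOS compilers spell CLOCKS_PER_SEC as CLK_TCK, and draw_screen() is a hypothetical placeholder for the redraw, scroll, or fill your own software performs):

    /* Generic timing harness: run the operation under test many times and
       divide. draw_screen() is a hypothetical placeholder -- link in the
       redraw, scroll, or fill your own software actually performs. */
    #include <stdio.h>
    #include <time.h>

    extern void draw_screen(void);  /* hypothetical operation under test */

    int main(void)
    {
        long    i, reps = 1000;
        clock_t start = clock();

        for (i = 0; i < reps; i++)
            draw_screen();

        printf("%.4f seconds per repetition\n",
               (double)(clock() - start) / CLOCKS_PER_SEC / (double)reps);
        return 0;
    }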

Text-Mode Memory

The second type of VGA memory is text-mode memory. Some VGAs provide true 16-bit access to text-mode memory, while others provide emulated 16-bit access. True 16-bit VGA is clearly the superior of the two in this context; word-sized writes actually happen quite often in text mode, as both a character and its attribute are frequently written to memory with a single instruction. Emulated 16-bit access alone may or may not improve performance, depending on whether or not a given VGA can respond fast enough to take advantage of the three cycles per byte-sized access that 16-bit emulation saves. The end result is that emulated 16-bit text-mode memory can as much as double the speed of access to text-mode memory, while true 16-bit text-mode memory improves performance by up to four times over 8-bit VGA.
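A minimal sketch of that word-at-a-time case (real-mode DOS C with far pointers; MK_FP is the Borland macro from dos.h, and B800h is the color text screen segment -- use B000h for monochrome):

    /* Store one character and its attribute in text-mode display memory
       with a single word-sized write. Real-mode far pointer to the color
       text screen at B800:0000. */
    #include <dos.h>   /* MK_FP in Borland C; Microsoft C is similar */

    void put_char_attr(int row, int col, char ch, unsigned char attr)
    {
        unsigned short far *screen =
            (unsigned short far *)MK_FP(0xB800, 0);

        /* low byte = character, high byte = attribute; one 16-bit store */
        screen[row * 80 + col] =
            ((unsigned short)attr << 8) | (unsigned char)ch;
    }

A compiler turns that assignment into a single word-sized MOV, which a true 16-bit VGA can accept in one access.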

A two- to four-fold increase in text-mode memory access speed certainly makes for snappy response in text-mode programs that go directly to display memory, including most spreadsheets and word processors. However, because relatively few bytes of display memory control an entire screen, display memory access speed is rarely the primary limiting factor in the performance of text-mode programs, so the perceptible effect of 16-bit text-mode VGA is not as dramatic as the numbers might suggest. All in all, 16-bit access to text-mode memory, like 16-bit access to the BIOS ROM, can make for an enjoyable, if not stunning, increase in the responsiveness of certain programs.

Graphics-Mode Memory

That leaves us with just one VGA memory area to check out: graphics-mode memory. While this is the area in which 16-bit VGA can do the least to increase memory access speed, because true 16-bit accesses can't be supported within the VGA standard, it is ironically also the area that brings out the best in 16-bit VGA -- given the right circumstances.

As any VGA user knows, it's in graphics mode that the VGA feels slowest; that's becoming all the more apparent with the rise of graphical interfaces such as Windows and Presentation Manager. Making a non-VGA-compatible display adapter that supports faster graphics than a VGA is not difficult at all; the aforementioned 8514/A and 34010-based adapters fit that description, for example. The trick is to improve graphics performance without losing VGA compatibility, so that the improved performance automatically benefits every one of the hundreds of programs that support the IBM VGA.

Display Memory Access Speed in Graphics Mode

As I mentioned earlier, the best a VGA can do is get out of the CPU's way to the greatest possible extent. To see why this is most important in graphics mode, let's look at a few numbers. The VGA has up to 150K of graphics memory per screen, and needs to scan about nine million bytes of video data onto the screen every second. That alone takes close to 80 percent of all available memory accesses on a standard VGA. VGA memory must also be accessed many times by the CPU in order to draw any sizable image, because there are so many pixels on the screen, in so many colors; those accesses must be shoehorned in between the video data reads just described. (For further information about the conflicting demands on display memory, see my article "Display Adapter Bottleneck," PC Tech Journal, January 1987.)
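Running the refresh arithmetic yourself is instructive; a quick illustrative sketch, assuming the 640 x 480 16-color mode refreshed 60 times per second:

    /* Illustrative refresh-bandwidth arithmetic for 640x480, 16 colors. */
    #include <stdio.h>

    int main(void)
    {
        long bytes_per_screen = 640L * 480L / 2;  /* 4 bits/pixel = 153,600 */
        long refresh          = 60;               /* frames per second      */
        long video_bytes_per_sec = bytes_per_screen * refresh;

        printf("%ld bytes per screen\n", bytes_per_screen);
        printf("%ld bytes per second for refresh alone\n",
               video_bytes_per_sec);               /* ~9.2 million */
        return 0;
    }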

How to resolve these heavy dual demands on display memory? One choice is to give priority to the video data and make the CPU wait frequently. This is simple and inexpensive to implement; the only drawback is that CPU performance suffers.

There are a variety of other approaches to VGA graphics-mode memory design, all of which improve performance to some degree. Some VGAs use faster memory than IBM's VGA does, freeing up more display memory accesses for the CPU. Other VGAs use different memory architectures, such as paged-mode or video RAM, that reduce the overhead of supplying video data and allow the VGA to service the CPU faster. There are a number of other performance-enhancing techniques in use; the point is that there are a variety of means by which fully IBM-compatible VGAs can insert fewer wait states and slow the CPU less, improving overall graphics performance.

Interesting, but what does it have to do with 16-bit VGA? Simply this: Only emulated 16-bit VGA can be implemented in graphics mode; true 16-bit VGA is a physical impossibility, as we'll see shortly. Emulated 16-bit VGA matters only because it eliminates wait states: three wait states per byte-sized access to display memory on an AT, and often more in an 80386-based machine. If a VGA inserts more than three wait states per access anyway, because the memory is inherently slow or because the CPU must wait while video data is fetched, then emulated 16-bit VGA won't make a blessed bit of difference. If, on the other hand, a VGA is inherently capable of responding as quickly as normal AT system memory (as is theoretically the case with a VGA built around 120-nanosecond VRAM), then the 8-bit-device wait states spell the difference between a VGA that responds in three cycles and one that responds in six.

In a nutshell, the faster a VGA's memory architecture, the more the 16-bit interface matters. The 16-bit interface allows VGAs with inherently fast memory access times to respond up to twice as fast as they otherwise would, slowing the CPU less and allowing higher graphics performance overall.

Of course, not all 16-bit VGAs are twice as fast as 8-bit VGAs; for instance, VGAs that provide slow memory access won't benefit from the 16-bit interface at all. In addition, the overall performance improvement experienced by graphics software on even the fastest 16-bit VGA depends on the frequency with which that software accesses display memory. Plotting software that performs several floating-point calculations for each point drawn is not going to be measurably affected by VGA speed, while software that spends most of its time copying blocks of display memory around (scrolling or updating large areas of the screen, for example) may indeed run nearly twice as fast on a 16-bit VGA as on an equivalent 8-bit VGA, and the advantage over slower VGAs may be greater still. Drivers weigh heavily in the performance equation as well, as noted above.

The Myth of 16-Bit Operations

There's a myth that 16-bit VGA improves graphics performance only when 16-bit accesses to memory are used; on the basis of this myth, many people have concluded that because graphics software written for the IBM VGA generally performs 8-bit operations, it won't benefit from 16-bit VGA. Not true. As we've just seen, in graphics mode the great virtue of 16-bit VGA has nothing to do with 16-bit operations; rather, it is that the 16-bit interface serves to fool the AT bus into not inserting 8-bit-device wait states.

In fact, 16-bit VGAs must operate as 8-bit devices internally in graphics mode, offering emulated but not true 16-bit operation. This is unavoidable because the VGA architecture is an 8-bit architecture, with 8-bit internal latches, 8-bit data masks, and so on. While VGA designers could certainly create 16-bit latches and the like, standard VGA software, which expects the normal 8-bit setup, wouldn't work properly anymore -- and running standard VGA software faster is the object of 16-bit VGA. Consequently, 16-bit VGAs actually break each 16-bit access into two 8-bit accesses internally, just as the AT bus does during 16-bit accesses to 8-bit devices, but without the three wait states the AT bus inserts. (As I noted earlier, some VGAs really do support single 16-bit accesses in text mode; this is possible because in text mode display memory appears to the CPU to be a single, linear plane of memory, and none of the VGA's 8-bit hardware assist features come into play.)
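To see why standard code is already doing the right thing, consider the sort of word-sized copy a scrolling routine might perform. This sketch assumes the VGA's 256-color mode 13h, where display memory appears to the CPU as a linear 64,000-byte region at segment A000h (MK_FP as before); the same code runs unchanged on 8-bit, emulated 16-bit, and true 16-bit boards, with the bus and adapter splitting words as needed:

    /* Word-at-a-time copy within mode 13h display memory (A000:0000, a
       linear 64,000-byte region). Standard VGA code; on a 16-bit VGA the
       only difference is that the bus skips the 8-bit-device wait states. */
    #include <dos.h>   /* MK_FP */

    void copy_band(unsigned src_off, unsigned dst_off, unsigned words)
    {
        unsigned short far *src =
            (unsigned short far *)MK_FP(0xA000, src_off);
        unsigned short far *dst =
            (unsigned short far *)MK_FP(0xA000, dst_off);

        while (words--)        /* the moral equivalent of REP MOVSW */
            *dst++ = *src++;
    }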

What does all this mean? It means that programmers need not worry about altering or fine-tuning graphics code for 16-bit VGAs; standard VGA code will run fine (but faster), and 16-bit operations are no more desirable on 16-bit VGAs than on 8-bit VGAs. Hallelujah!

I/O Access Speed

Finally, we come to the last aspect of 16-bit VGA performance: I/O. I/O to the VGA ports is performed frequently in graphics mode in order to set the bit mask, the map mask, the set/reset color, and so on. These I/O accesses are subject to the same 8-bit-device wait states as memory accesses, so it's desirable that VGAs respond as 16-bit devices to I/O as well as memory accesses. I/O is less critical in text mode, where it is used primarily to move the cursor, but 16-bit I/O can help there, too.
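The VGA's indexed registers lend themselves to exactly this sort of access. A sketch (Borland C's outportb()/outport(); Microsoft's outp()/outpw() are equivalent) that sets the Graphics Controller's Bit Mask register (index 8 at port 3CEh, data at 3CFh) both ways:

    /* Set the VGA Graphics Controller Bit Mask register two ways: as two
       byte OUTs, or as one word OUT that delivers index and data together
       (low byte to 3CEh, high byte to 3CFh). */
    #include <dos.h>   /* outportb()/outport() in Borland C */

    #define GC_INDEX 0x3CE
    #define GC_DATA  0x3CF
    #define BIT_MASK 8

    void set_bit_mask_bytes(unsigned char mask)
    {
        outportb(GC_INDEX, BIT_MASK);  /* select Bit Mask register */
        outportb(GC_DATA, mask);       /* write the mask           */
    }

    void set_bit_mask_word(unsigned char mask)
    {
        /* one 16-bit OUT: index in the low byte, data in the high byte */
        outport(GC_INDEX, ((unsigned)mask << 8) | BIT_MASK);
    }

On a VGA that supports 16-bit I/O, the second version completes in a single bus operation.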

As it happens, not all 16-bit VGAs do support 16-bit I/O, so this is yet another area in which 16-bit VGAs can differ widely, and yet another feature for a VGA purchaser to check out.

Conclusion

What's the bottom line on 16-bit VGA? First and most important, 16-bit VGA is a user issue, not a programming issue; developers need not spend time worrying about separate drivers or code optimized for 16-bit VGAs. 16-bit VGAs may have extended modes or special features that require or benefit from custom code, but that has nothing to do with 16-bit VGA itself, which is primarily a way to trick the AT bus into not inserting wait states, and sometimes a way to provide true 16-bit access to ROM and text-mode memory, as well.

Second, while 16-bit VGA can make text-mode operation more responsive, it produces the most visible and sorely needed improvement in graphics mode, but only for VGAs that provide memory-access times close to those of system memory. In those cases, however, 16-bit VGA can provide an appreciable performance boost, as much as doubling the execution speed of graphics software over 8-bit VGA, although the improvement depends heavily on the frequency with which the software accesses display memory and the VGA's I/O ports.

In summary, the speedup from 16-bit VGA is incremental, not revolutionary, but is significant nonetheless. The VGA is the last gasp of directly CPU-controlled, bit-mapped graphics, and 16-bit VGA squeezes the last ounce of performance from that old standard. Whether you need that extra edge depends on the software you use, but at least now you understand the many facets of 16-bit VGA and can better match your needs to the features of the many 16-bit VGAs on the market.