
The truth is there is no single ideal architectural model for the "perfect" DSP. There are, rather, architectural characteristics that make a certain DSP an optimal performer within a particular application class. Longer instruction and data word lengths make sense sometimes, and deep pipelines improve performance sometimes, but neither is appropriate all the time. To use an automotive metaphor, nothing beats the "architecture" of a drag racer when used in a drag race "application."
However, that kind of car will never be the best for an Indy 500 long-distance race. Conversely, while Indy cars are "architected" for long-distance "applications," they could never compete in situations that require short bursts of super-fast acceleration.
VLIW and other long-word architectures provide fine-grained parallelism that raises the aggregate MIPS rating, and indeed the raw performance, of a microprocessor, all else being equal. For some DSP applications, however, long words can be overkill, providing an advantage only where multiple on-chip ALUs and other functions must be supported in parallel. The main disadvantage of longer data and instruction words is increased system-level cost. At the system level, a processor must typically be supported by one bit of RAM for every bit of instruction-word length to take advantage of the full word length. Consequently, the cost of the memory portion of a DSP system rises linearly with the length of the instruction word.
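The linear memory-cost relationship can be sketched numerically. A minimal model, assuming an illustrative program depth of 64K instructions (a number chosen for the example, not taken from the article):

```python
# First-order sketch: program RAM scales linearly with instruction-word
# length, since each instruction bit needs one bit of RAM behind it.
# The 64K-word program depth is an illustrative assumption.

def program_ram_bits(word_length_bits, depth_words=64 * 1024):
    """Bits of RAM needed to hold `depth_words` instructions."""
    return word_length_bits * depth_words

for bits in (16, 32, 48, 64):
    kb = program_ram_bits(bits) / 8 / 1024
    print(f"{bits}-bit instructions -> {kb:.0f} KB of program RAM")
```

Doubling the instruction word from 16 to 32 bits doubles the program-memory bill, exactly the linear relationship described above.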
Likewise, while 24-bit data words can be extremely valuable in applications that require higher dynamic range, if you only need 16 bits, and you're developing a consumer product that has you counting penny profit margins, those extra bits of data can be a cost burden. Data word length also has a direct effect on power consumption, the most important metric of all in many consumer-oriented portable applications, including hot growth markets like wireless communications. A DSP with a 24-bit data word length requires a 50 percent bigger ALU register file, more control logic, and a multiplier roughly double the size of a 16-bit processor's. It also requires 50 percent more on-chip memory, and since die size is mostly driven by memory size, a 24-bit DSP, as a first-order approximation, consumes 50 percent more power than a 16-bit chip.
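These first-order scaling claims follow from simple width ratios. A hedged sketch, assuming register file, memory, and power scale with operand width while an array multiplier scales roughly with its square (standard rules of thumb, not figures from the article):

```python
# First-order cost ratios for a 24-bit vs. a 16-bit DSP data path.
# Assumptions: register file, on-chip memory, and power scale with
# word width; array-multiplier area scales with width squared.
# Illustrative scaling laws only, not measured die data.

def relative_cost(wide_bits=24, narrow_bits=16):
    ratio = wide_bits / narrow_bits
    return {
        "register_file": ratio,      # 1.5x -> "50 percent bigger"
        "multiplier": ratio ** 2,    # 2.25x -> "roughly double"
        "on_chip_memory": ratio,     # 1.5x more memory
        "power_first_order": ratio,  # tracks memory-dominated die size
    }

for part, factor in relative_cost().items():
    print(f"{part}: {factor:.2f}x")
```

The 24/16 = 1.5 ratio reproduces the "50 percent" figures, and the squared term (2.25x) matches the "roughly double" multiplier estimate.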
If pipelining a few instructions makes a processor execute faster, then pipelining lots of instructions should make it go even faster, right? Not necessarily. Pipelines can be like fertilizer. Just the right amount for the right plant and you've got tremendous results. Too much and you've got a dead plant. Problems can arise for DSP customers when a processor with a deep pipeline gets benchmarked using code that likes a deep pipeline. The results are great if your application uses the same class of code. However, if your code is not pipeline friendly, you may be extremely disappointed to find that your deep pipeline slows your code down!
While deep instruction pipelines are great performance enhancers for highly linear, in-line code, asynchronous interrupt-driven branching and real-time context switching can severely degrade pipeline utility, forcing frequent, high-overhead pipeline refreshes and state storage.
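A minimal model of this effect (my own illustrative numbers, not from the article): assume each flush-causing event costs roughly depth - 1 cycles to refill the pipe, and give the deep-pipeline part a faster clock, as deep pipelining typically allows.

```python
# Why a deep pipeline can lose on branchy, interrupt-driven code.
# Assumption: every flush-causing instruction costs (depth - 1)
# refill cycles. Clock rates and flush fractions are hypothetical.

def effective_throughput(clock_mhz, depth, flush_fraction):
    """Approximate MIPS, given the fraction of instructions
    that force a full pipeline flush."""
    cycles_per_instr = 1 + flush_fraction * (depth - 1)
    return clock_mhz / cycles_per_instr

# Deep 10-stage pipe at 150 MHz vs. shallow 3-stage pipe at 100 MHz:
for label, flushes in (("linear code", 0.02), ("branchy code", 0.20)):
    deep = effective_throughput(150, depth=10, flush_fraction=flushes)
    shallow = effective_throughput(100, depth=3, flush_fraction=flushes)
    print(f"{label}: deep={deep:.0f} MIPS, shallow={shallow:.0f} MIPS")
```

On the linear workload the deep pipeline's clock advantage dominates; on the branchy workload the refill penalty inverts the ranking, which is exactly the disappointment the article warns about.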
The utility of various DSP architectural innovations varies greatly across the application domain, sometimes providing remarkable performance enhancement, and sometimes causing disappointing performance degradation. Because of this, it becomes crucial for system designers to scrutinize processor specifications and topology, making certain to match architectural features with intended use. All such features can be truly powerful within certain application contexts, but you have to make sure those contexts closely match those of your application.
Excerpted from the January/February issue of DSP & Multimedia Technology, April 1996, vol. 13, no. 3.