
The truth is there is no single ideal architectural model for the "perfect" DSP. There are, rather, architectural characteristics that make a certain DSP an optimal performer within a particular application class. Longer instruction and data word lengths make sense sometimes, and deep pipelines improve performance sometimes, but neither is appropriate all the time. To use an automotive metaphor, nothing beats the "architecture" of a drag racer when used in a drag race "application."
However, that kind of car will never be the best for an Indy 500 long-distance race. Conversely, while Indy cars are "architected" for long-distance "applications," they could never compete in situations that require short bursts of super-fast acceleration.
VLIW and other long-word architectures provide fine-grained parallelism that raises the aggregate MIPS rating, and indeed the raw performance, of a microprocessor, all else being equal. For some DSP applications, however, long words can be overkill, providing an advantage only where multiple on-chip ALUs and other functions must be supported in parallel. The main disadvantage of longer data and instruction words is increased system-level cost. At the system level, a processor must typically be supported by one bit of RAM for every bit of instruction-word length to take advantage of the full word length. Consequently, the cost of the memory portion of a DSP system rises linearly with the length of the instruction word.
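The linear memory-cost relationship can be sketched numerically. A minimal model, assuming an illustrative program depth of 64K instructions (a number chosen for the example, not taken from the article):

```python
# First-order sketch: program RAM scales linearly with instruction-word
# length, since each instruction bit needs one bit of RAM behind it.
# The 64K-word program depth is an illustrative assumption.

def program_ram_bits(word_length_bits, depth_words=64 * 1024):
    """Bits of RAM needed to hold `depth_words` instructions."""
    return word_length_bits * depth_words

for bits in (16, 32, 48, 64):
    kb = program_ram_bits(bits) / 8 / 1024
    print(f"{bits}-bit instructions -> {kb:.0f} KB of program RAM")
```

Doubling the instruction word from 16 to 32 bits doubles the program-memory bill, exactly the linear relationship described above.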
Likewise, while 24-bit data words can be extremely valuable in applications that require higher dynamic range, if you only need 16 bits, and you're developing a consumer product that has you counting penny profit margins, those extra bits of data can be a cost burden. Data word length also has a direct effect on power consumption, the most important metric of all in many consumer-oriented portable applications, including hot growth markets like wireless communications. A DSP with a 24-bit data word length requires a 50 percent bigger ALU register file, more control logic, and a multiplier roughly double the size of a 16-bit processor's. It also requires 50 percent more on-chip memory, and since die size is mostly driven by memory size, a 24-bit DSP, as a first-order approximation, consumes 50 percent more power than a 16-bit chip.
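These first-order scaling claims follow from simple width ratios. A hedged sketch, assuming register file, memory, and power scale with operand width while an array multiplier scales roughly with its square (standard rules of thumb, not figures from the article):

```python
# First-order cost ratios for a 24-bit vs. a 16-bit DSP data path.
# Assumptions: register file, on-chip memory, and power scale with
# word width; array-multiplier area scales with width squared.
# Illustrative scaling laws only, not measured die data.

def relative_cost(wide_bits=24, narrow_bits=16):
    ratio = wide_bits / narrow_bits
    return {
        "register_file": ratio,      # 1.5x -> "50 percent bigger"
        "multiplier": ratio ** 2,    # 2.25x -> "roughly double"
        "on_chip_memory": ratio,     # 1.5x more memory
        "power_first_order": ratio,  # tracks memory-dominated die size
    }

for part, factor in relative_cost().items():
    print(f"{part}: {factor:.2f}x")
```

The 24/16 = 1.5 ratio reproduces the "50 percent" figures, and the squared term (2.25x) matches the "roughly double" multiplier estimate.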
If pipelining a few instructions makes a processor execute faster, then pipelining lots of instructions should make it go even faster, right? Not necessarily. Pipelines can be like fertilizer. Just the right amount for the right plant and you've got tremendous results. Too much and you've got a dead plant. Problems can arise for DSP customers when a processor with a deep pipeline gets benchmarked using code that likes a deep pipeline. The results are great if your application uses the same class of code. However, if your code is not pipeline friendly, you may be extremely disappointed to find that your deep pipeline slows your code down!
While deep instruction pipelines are great performance enhancers for highly linear, in-line code, asynchronous interrupt-driven branching and real-time context switching can severely degrade pipeline utility, forcing frequent, high-overhead pipeline refreshes and state storage.
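A minimal model of this effect (my own illustrative numbers, not from the article): assume each flush-causing event costs roughly depth - 1 cycles to refill the pipe, and give the deep-pipeline part a faster clock, as deep pipelining typically allows.

```python
# Why a deep pipeline can lose on branchy, interrupt-driven code.
# Assumption: every flush-causing instruction costs (depth - 1)
# refill cycles. Clock rates and flush fractions are hypothetical.

def effective_throughput(clock_mhz, depth, flush_fraction):
    """Approximate MIPS, given the fraction of instructions
    that force a full pipeline flush."""
    cycles_per_instr = 1 + flush_fraction * (depth - 1)
    return clock_mhz / cycles_per_instr

# Deep 10-stage pipe at 150 MHz vs. shallow 3-stage pipe at 100 MHz:
for label, flushes in (("linear code", 0.02), ("branchy code", 0.20)):
    deep = effective_throughput(150, depth=10, flush_fraction=flushes)
    shallow = effective_throughput(100, depth=3, flush_fraction=flushes)
    print(f"{label}: deep={deep:.0f} MIPS, shallow={shallow:.0f} MIPS")
```

On the linear workload the deep pipeline's clock advantage dominates; on the branchy workload the refill penalty inverts the ranking, which is exactly the disappointment the article warns about.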
The utility of various DSP architectural innovations varies greatly across the application domain, sometimes providing remarkable performance enhancement, and sometimes causing disappointing performance degradation. Because of this, it becomes crucial for system designers to scrutinize processor specifications and topology, making certain to match architectural features with intended use. All such features can be truly powerful within certain application contexts, but you have to make sure those contexts closely match those of your application.
Excerpted from the January/February issue of DSP & Multimedia Technology, April 1996, vol. 13, no. 3.