Editorial Backgrounder - TI Extends Industry's Highest-Performing DSP Architecture To Offer Greater Range of Price/Performance Options

Texas Instruments launched the world's most powerful digital signal processor in February 1997. The TMS320C6201 DSP brought a new design paradigm to high-performance systems. Capable of executing up to 1600 million instructions per second (MIPS), the 'C6201 offered 10 times the performance of previously available fixed-point DSPs. In addition, the newly introduced DSP featured an advanced Very-Long-Instruction-Word (VLIW) architecture and an ultra-efficient C compiler that shifted a portion of the design complexity from hardware to software for greater ease of programming and faster development. Since its introduction, the 'C6201 has been successful beyond all expectations, with design-ins in a wide range of telecommunications, networking, imaging and other high-performance applications.

Today, TI is extending the range of its 'C6000 DSP platform with new devices that complement the groundbreaking 'C6201. At the high end, the new TMS320C6202 pushes operation to 250 megahertz (MHz) and performance to 2000 MIPS, or 500 million multiply-accumulate operations per second (MMACS). By providing twice the performance density of the industry-leading 'C6201, the 'C6202 will enable wireless base stations, voice-over-IP gateways, remote access servers and other multichannel communications systems to pack more channels and features onto existing board space.

The TMS320C6211 brings the power of the 'C6000 advanced VLIW architecture to a $25 DSP solution, opening new possibilities for designers of cost-sensitive equipment, including digital subscriber line (DSL) client units, high-speed data transmission functions in switches and routers, wireless data clients, imaging, biometrics, remote medical diagnostics, automotive vehicle and drive train control, and security systems.

TI First to Use Advanced VLIW Architecture for Maximum Performance

All of the 'C62x devices are based on the same CPU core, featuring VelociTI™, an advanced VLIW architecture designed to achieve high performance through increased instruction-level parallelism. Surpassing the throughput of traditional superscalar designs, VelociTI provides eight execution units, including two multipliers and six arithmetic logic units (ALUs). These units operate in parallel and can together execute up to eight instructions in a single clock cycle -- up to 2000 MIPS at 250 MHz.
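The peak figures quoted for the 'C62x core follow directly from the number of execution units and the clock rate. The short Python sketch below reproduces that arithmetic; the function name and parameters are illustrative only and do not come from any TI tool, and the results are theoretical peaks that assume every unit is kept busy on every cycle.

```python
def peak_rates(clock_mhz, execution_units=8, multipliers=2):
    """Return (peak MIPS, peak MMACS) for a VLIW core.

    Assumes one instruction per execution unit per clock cycle and one
    multiply-accumulate per multiplier per clock cycle (theoretical peaks).
    """
    peak_mips = clock_mhz * execution_units
    peak_mmacs = clock_mhz * multipliers
    return peak_mips, peak_mmacs


print(peak_rates(250))  # (2000, 500)
```

At 250 MHz the sketch gives 2000 MIPS and 500 MMACS, matching the figures quoted for the 'C6202.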

VelociTI's advanced features include instruction packing, conditional branching, variable-width instructions, and pre-fetched branching, all of which eliminate problems that were previously associated with VLIW implementations. The architecture is highly deterministic, with few restrictions on how or when instructions are fetched, executed or stored. This architectural flexibility is key to the breakthrough efficiency levels of the 'C6000 compiler.

Each of the 'C62x devices -- the 'C6201, 'C6202 and 'C6211 -- combines the advanced CPU with a selection of on-chip memory and peripherals that is tailored to meet a unique price/performance point.

New Architectural Enhancements Double System Input/Output

The 'C6202 builds on the successful design of the 'C6201 to offer twice the system input/output (I/O) and 25 percent higher fixed-point performance, measured in MIPS. In addition to the higher clock rate of 250 MHz, two key factors in the 'C6202 architecture are responsible for the device's superior performance.

First, with three Mbits of on-chip RAM, the 'C6202 triples the available on-chip memory of the 'C6201. The 'C6202 divides its data memory into two blocks, either of which may be used by the CPU for processing while the other is being filled with data from an external source. Similarly, the 'C6202 provides two 128-Kbyte blocks of program memory to allow background direct memory access (DMA) of code while the CPU runs at full speed from the other block.
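The two-block arrangement described above is commonly called ping-pong (or double) buffering. The following Python sketch illustrates only the scheduling idea: one block is filled while the other is processed, and the roles swap each frame. The function and variable names are hypothetical, and a sequential Python loop merely stands in for what the hardware would do concurrently with DMA.

```python
def run_ping_pong(frames, process):
    """Simulate double buffering over a sequence of input frames.

    On each step, one block is filled with the incoming frame (the role a
    background DMA transfer would play) while the other block, filled on
    the previous step, is handed to `process` (the role of the CPU).
    """
    blocks = [None, None]
    fill = 0  # index of the block currently being filled
    results = []
    for frame in frames:
        blocks[fill] = list(frame)            # background transfer fills this block
        other = blocks[1 - fill]
        if other is not None:
            results.append(process(other))    # CPU works on the other block
        fill = 1 - fill                       # swap roles for the next frame
    # Drain: the most recently filled block has not been processed yet.
    last = blocks[1 - fill]
    if last is not None:
        results.append(process(last))
    return results


print(run_ping_pong([[1, 2], [3, 4], [5, 6]], sum))  # [3, 7, 11]
```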

The second key architectural enhancement of the 'C6202 is its 32-bit expansion bus. Replacing the 'C6201's Host Port Interface (HPI), the expansion bus is designed to complement the primary, synchronous external memory interface (EMIF) by removing I/O burdens and by servicing host functions.

When it is used for data I/O, the expansion bus can be programmed to perform accesses to asynchronous I/O devices, effectively doubling the data input from memory. It can also serve as a glueless interface to high-speed synchronous FIFO memories that might be used, for example, as buffers from T1/E1 telecom inputs.

When it is used for servicing host functions, the expansion bus can operate as a 32-bit version of the 'C6201 HPI, doubling bandwidth for host-to-DSP communications. Operating in a master-slave synchronous host mode, the expansion bus can gluelessly interface at up to 60 MHz to a wide variety of hosts, including PCI bridge devices and high-speed industry-standard reduced instruction set computing (RISC) CPUs.

New Memory Organization Brings Affordability in Line With High Performance

The 'C6211 is the first DSP to use a 2-level cache architecture. This unique memory organization brings the power of the 'C62x generation to more cost-sensitive solutions. The 'C6211 cache makes external memory seem like a large amount of on-chip memory, allowing system designers to use slower, less expensive external memory for greater storage at lower cost without sacrificing performance. In addition, a cache helps programmers to achieve their performance goals faster, shortening code development and accelerating time-to-market.

The on-chip memory is organized to allow greater design flexibility and ensure efficient memory usage. Of the 72 Kbytes of on-chip memory, eight Kbytes serve as a level-1 (L1) cache that the CPU can directly access. The L1 cache is divided into four Kbytes of program and four Kbytes of data memory. The program cache is direct-mapped, so that each memory address maps to exactly one location in the cache. The data cache is two-way set associative, so that each address can be held in either of two locations, allowing two blocks of data whose addresses would otherwise conflict to reside in the cache at the same time.
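The difference between the two L1 organizations can be seen in how an address selects a cache location. The Python sketch below uses the four-Kbyte cache size given above; the 32-byte line size is an assumption made purely for illustration (the line size is not stated here), and the function names are hypothetical.

```python
L1_BYTES = 4 * 1024   # four Kbytes each for program and data, as described above
LINE_BYTES = 32       # ASSUMED line size, for illustration only


def direct_mapped_slot(address, cache_bytes=L1_BYTES, line_bytes=LINE_BYTES):
    """Direct-mapped: each address maps to exactly one cache line."""
    num_lines = cache_bytes // line_bytes
    return (address // line_bytes) % num_lines


def two_way_set(address, cache_bytes=L1_BYTES, line_bytes=LINE_BYTES, ways=2):
    """Two-way set associative: each address maps to one set of two lines."""
    num_sets = cache_bytes // (line_bytes * ways)
    return (address // line_bytes) % num_sets


# Addresses 0 and 4096 compete for the same single line in a direct-mapped
# cache, but can occupy the two ways of one set in a two-way cache.
print(direct_mapped_slot(0), direct_mapped_slot(4096))
print(two_way_set(0), two_way_set(4096))
```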

The L1 data cache, which uses a least-recently-used replacement scheme, is dual-ported for simultaneous access from both CPU data ports, so that the CPU can load or store two 32-bit values in a single L1 data cycle. The dual-port feature, which would be expensive to implement in a larger memory, is highly cost-effective in the 'C6211 because the L1 cache is small.
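A least-recently-used replacement scheme in a two-way set-associative cache can be modeled in a few lines. The Python sketch below is a behavioral illustration only, not a description of the 'C6211 hardware: the class name is hypothetical, the 32-byte line size is an assumption, and dual-port timing is not modeled.

```python
class TwoWayLRUCache:
    """Behavioral model of a two-way set-associative cache with LRU replacement."""

    def __init__(self, cache_bytes=4096, line_bytes=32):  # line size is ASSUMED
        self.line_bytes = line_bytes
        self.num_sets = cache_bytes // (line_bytes * 2)
        # Each set holds up to two tags, ordered least- to most-recently used.
        self.sets = [[] for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one access; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(ways) == 2:
            ways.pop(0)           # evict the least recently used line
        ways.append(tag)
        self.misses += 1
        return False


cache = TwoWayLRUCache()
for addr in (0, 2048, 0, 4096, 0, 2048):
    cache.access(addr)
print(cache.hits, cache.misses)  # 2 4
```

In this trace, addresses 0, 2048 and 4096 all fall in the same set, so the third distinct address evicts whichever of the other two was used least recently.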

The remaining 64 Kbytes of on-chip memory can serve as a level-2 (L2) cache, as memory-mapped RAM, or as a combination of the two. The L2 memory is divided into four 16-Kbyte banks, each of which can be programmed either as one way of a cache that is unified for both data and program storage, or as memory-mapped RAM that can lock critical pieces of data or code into internal memory.
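The trade-off among L2 configurations can be tabulated with a small model. The Python sketch below is purely illustrative: the enum, the function and the dictionary keys are hypothetical names, not register or API names from any TI documentation. It shows only how the four 16-Kbyte banks divide between cache and addressable RAM.

```python
from enum import Enum


class BankMode(Enum):
    CACHE_WAY = "cache way"     # bank acts as one way of the unified L2 cache
    MAPPED_RAM = "mapped RAM"   # bank is addressable on-chip RAM

BANK_BYTES = 16 * 1024
NUM_BANKS = 4


def summarize_l2(modes):
    """Summarize an L2 configuration given one BankMode per bank."""
    if len(modes) != NUM_BANKS:
        raise ValueError("expected one mode per bank (four banks)")
    ways = sum(1 for mode in modes if mode is BankMode.CACHE_WAY)
    return {
        "cache_ways": ways,
        "cache_bytes": ways * BANK_BYTES,
        "mapped_bytes": (NUM_BANKS - ways) * BANK_BYTES,
    }


# Example: two banks as cache ways, two banks locked as addressable RAM.
config = [BankMode.CACHE_WAY, BankMode.CACHE_WAY,
          BankMode.MAPPED_RAM, BankMode.MAPPED_RAM]
print(summarize_l2(config))
```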

A programmable direct memory access (DMA) controller allows designers to optimize L2 memory usage for their systems. With the capability of accessing any addressable location in the L2 memory, the enhanced DMA controller can handle multiple transfers simultaneously and can interleave bursts. Thus, the DMA controller can be writing to one or more banks while a different bank is serving the L1 cache and CPU, providing greater flexibility and significantly increased overall bandwidth for the device.

L2 Cache Provides Flexibility and Performance

The ability to map L2 blocks as addressable locations differentiates the 'C6211 cache from its general-purpose counterparts. This feature allows designers to lock critical code and data into L2, and it provides an on-chip location for DMA storage.

TI has tested the 'C6211 to determine how it performs with an enhanced full-rate Global System for Mobile Communications (GSM) vocoder, system-level applications in asymmetric digital subscriber line (ADSL), key routines in multichannel modems, and other commonly used algorithms. For both data and program, TI's tests indicate L1 cache hit rates greater than 98 percent. In other words, fewer than one instruction or data word in fifty needs to be fetched from L2 or system memory.
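Converting a hit rate into a "one access in N" miss ratio is simple arithmetic. The Python sketch below (the function name is hypothetical) shows the conversion: a 98 percent hit rate leaves a 2 percent miss rate, which is one access in fifty.

```python
def one_in_n_misses(hit_rate_percent):
    """Return N such that, on average, one access in N misses the cache."""
    miss_percent = 100 - hit_rate_percent
    if miss_percent <= 0:
        raise ValueError("a 100 percent hit rate has no misses")
    return 100 / miss_percent


print(one_in_n_misses(98))  # 50.0
```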

The high L1 hit rate, combined with the flexibility of L2 memory organization, means that the 'C6211 can operate at more than 80 percent of the cycle performance of a device with a more expensive memory organization, where all system memory is on the chip. This high degree of efficiency allows systems such as DSL client units to rely on inexpensive external memory for program and data storage, while at the same time performing high-speed number crunching routines in real time.

To achieve absolute determinism -- that is, absolutely none of the slight variances between external and internal memory contents that caches allow -- it is possible to program all of the L2 banks as memory mapped. However, most targeted applications do not require this extreme level of determinism and can benefit from using some L2 banks as cache in order to significantly reduce the L1 miss penalty. With its configurable L2 memory, the 'C6211 enables designers to achieve an appropriate optimization during development.

TI Leads Industry In Performance With 'C6000 Platform

With the release of the 'C6202 and 'C6211, the industry's most powerful DSP generation extends its leadership into new application spaces. In addition, innovative memory and peripheral enhancements give designers several price/performance options. Instruction compatibility among the devices enables easy migration of code from design to design, so that developers can make the most of their intellectual property across a range of equipment. As the industry increasingly turns to high-performance signal processing, the 'C6000 generation is ready to provide advanced DSP solutions that meet the needs of the information age.

# # #

Trademark:
VelociTI is a trademark of Texas Instruments Incorporated.