Texas Instruments

TI's TMS320C82 DSP:

Integration Marries High Performance With Low Cost

The TMS320C82 digital signal processor (DSP) from Texas Instruments combines high performance with a price of less than $100, enabling many applications to be produced in high volumes for the first time. Among these applications are desktop videoconferencing and videophones, 3-D virtual reality graphics for games, training and other uses, and high-performance digital switching in equipment such as cellular telephone base stations.

The 'C82 is the newest member of TI's TMS320C8x generation of DSPs, the industry's most highly integrated and highest performing DSPs. TI's latest DSP solution* combines on the same silicon two 32-bit advanced DSPs, a high-performance reduced instruction set computing (RISC) master processor with a 100-MFLOP floating-point unit, a transfer controller with a 400-Mbyte/s off-chip transfer rate, a crossbar switch and 44 Kbytes of on-chip SRAM. This high level of integration makes the 'C82 capable of performing more than 1.5 billion operations per second (BOPS), equivalent to 10 times the performance of the highest-speed general-purpose microprocessors available in the industry.

The 'C82 represents a cost/performance optimization based on TI's TMS320C80 DSP, the first device in the industry to combine DSP and RISC technology on the same silicon. Since its introduction in 1994, the 'C80 has been designed into a variety of high-performance applications, including high-definition imaging and graphics, full-room videoconferencing, high-speed digital switching, and others. With fewer on-chip processors and a simplified system interface, the 'C82 brings the performance equivalent of a 'C80 to high-volume products in which low cost and space savings are important design constraints. Although many applications will still need the additional DSPs and dedicated video controllers of the 'C80, targeted applications of the 'C82 will be able to realize 'C80-comparable performance at a much lower price.

More than 1.5 BOPS Performance

The high level of integration in the 'C82 gives it the capability of performing more than 1.5 BOPS, equivalent to the performance of six typical 16-bit DSPs. In targeted imaging and video applications, the 'C82 will be equivalent to twice this number of 16-bit DSPs. This processing power will enable compression/decompression and 3-D graphics rendering in real time for next-generation desktop videoconferencing and virtual reality products. Used as either a stand-alone processor in a settop box, or as an accelerator processor as part of a larger system, the 'C82 replaces as many as 10 processors and additional support chips in one device.

Priced for High-Volume Applications

Planned pricing for the 'C82 in 1996 is $82 in quantities of 25,000 -- a drop of more than 90 percent in price in less than three years from the original 'C80 pricing. The new low price will make possible high-quality desktop videoconferencing systems at less than $500, a price commonly believed in the PC industry to be the breakpoint for high-volume demand. Affordable games systems will have workstation processing power, allowing developers to mix video and 3-D graphics on-screen simultaneously. Makers of digital switching equipment such as cellular telephone base stations will be able to fit several times the number of channels on-board in the same space, allowing cellular providers to offer more services to their customers at competitive prices.

DSP + RISC

TI took a leap forward in technology with its 'C8x family when it combined the best features of two of the industry's leading architectures: DSP and RISC. While multiple DSPs provide the raw horsepower and mathematical processing capabilities (coded in either assembly or C code) required for multimedia and digital switching applications, the embedded RISC machine orchestrates the parallel DSP processing through its ability to control the 'C82's multiple tasks via efficient C code execution. This combination of fixed-point DSPs and floating-point RISC technologies brings workstation processing power to video settop boxes and PC subsystems for the first time at an affordable cost.

Full Programmability

Since the 'C82 is fully programmable, it provides the highly adaptive processing needed to implement a wide variety of industry-standard software algorithms for audio, video, imaging, fax/modem and telecommunication functions. The device can be programmed to implement standards such as H.320 videoconferencing (including H.261); G.728 audio; JPEG, MPEG-1 and MPEG-2 audio/video compression; graphic Windows™ accelerators such as X11R6; and many others. These standards are supported by TI and its third-party DSP developers. Designers can also blend standard algorithms with their own custom algorithms to provide added value.

Parallel Processing Simplified

The 'C82, like other members of the 'C8x DSP family, addresses the challenges of parallel processing by integrating parallelism onto one monolithic silicon substrate. Integration makes parallel processing much easier to handle, since the system designer does not have to manage communications among separate processors. The 'C82's three on-chip processors can execute instructions independently of each other in multiple-instruction, multiple-data (MIMD) parallel processing. A crossbar switch matrix supports the sharing of memory among all of the processors, providing the capability to support many different parallel processing programming models.

Advanced DSPs

The 'C82's two on-chip advanced DSPs provide the parallel processing throughput required for video, 3-D graphics and digital switching applications. The processors have 64-bit large instruction words (LIWs) that support many parallel operations in a single cycle. Compared with non-'C8x DSPs, the 'C82's advanced DSPs provide a higher degree of parallel processing and are much better suited for bit-field and pixel-size data structures. Each advanced DSP is fully programmable in either assembly or high-level language and can run either independently or tightly coupled via the crossbar-shared memory. Each DSP also has 4 Kbytes of instruction cache and can access 12 Kbytes of on-chip shared memory via the crossbar.

"The 64-bit large instruction word specifies more parallel operations and improves performance in every application we have examined," said Karl Guttag, TI fellow and architect of the 'C8x DSP family.

Within each advanced DSP, there are four major functional units: the program flow controller (PFC), the data unit and two address units. The PFC controls all instruction execution, including program counter incrementation, branches and interrupts. It contains three sets of zero overhead loop controllers that keep track of loop counts and starting and ending loop addresses without incurring any overhead. The loop controllers are prioritized to support up to three nested loops within a common loop address.
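The loop-controller behavior described above can be modeled in a few lines of Python. This is a behavioral sketch only, not TI's hardware or instruction set: the `LoopController` class, its field names, the priority order (innermost loop first) and the rule that a finished loop re-arms itself are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoopController:
    """Hypothetical model of one zero-overhead loop controller."""
    start: int           # address of the first instruction in the loop body
    end: int             # address of the last instruction in the loop body
    repeats: int         # extra passes through the body after the first
    remaining: int = -1  # passes still owed; set from `repeats` below

    def __post_init__(self) -> None:
        self.remaining = self.repeats


def run(program_length: int, loops: list) -> list:
    """Return the sequence of instruction addresses fetched.

    `loops` is ordered by priority, innermost loop first. The program holds
    no branch or counter instructions; the controllers redirect the program
    counter on their own, which is what "zero overhead" means in this model.
    """
    trace = []
    pc = 0
    while pc < program_length:
        trace.append(pc)
        next_pc = pc + 1
        for loop in loops:
            if pc != loop.end:
                continue
            if loop.remaining > 0:
                loop.remaining -= 1
                next_pc = loop.start
                break
            # Loop finished: re-arm it so an enclosing loop can re-enter it.
            loop.remaining = loop.repeats
        pc = next_pc
    return trace


# Two nested loops that share a common end address (2).
inner = LoopController(start=1, end=2, repeats=1)
outer = LoopController(start=0, end=2, repeats=1)
trace = run(4, [inner, outer])
```

In this model `trace` comes out as `[0, 1, 2, 1, 2, 0, 1, 2, 1, 2, 3]`: the inner body (addresses 1 and 2) runs twice per outer pass, the outer body runs twice, and no fetch is spent on loop bookkeeping.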

The data unit hardware within each DSP enables one 16-by-16 bit or two 8-by-8 bit single-cycle multiplies in parallel with arithmetic logic unit (ALU) data path operations. The 32-bit ALU can be split into two 16-bit ALUs or four 8-bit ALUs to perform more parallel operations on lower-precision data. This improves performance in classical algorithms utilized in imaging applications.
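The effect of splitting the ALU can be illustrated in software. The following Python sketch emulates a 32-bit add performed as one, two or four independent lanes; the function name `split_add` and the choice of unsigned wrap-around arithmetic are assumptions for illustration, not a description of the 'C82 instruction set.

```python
MASK32 = 0xFFFFFFFF


def split_add(a: int, b: int, lanes: int) -> int:
    """Add two 32-bit words as `lanes` independent unsigned lanes.

    With lanes=1 this is an ordinary 32-bit add. With lanes=2 or lanes=4,
    carries are blocked at the 16-bit or 8-bit lane boundaries, so each
    lane wraps around on its own without disturbing its neighbor.
    """
    if lanes not in (1, 2, 4):
        raise ValueError("lanes must be 1, 2 or 4")
    width = 32 // lanes
    lane_mask = (1 << width) - 1
    result = 0
    for i in range(lanes):
        shift = i * width
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane_sum & lane_mask) << shift
    return result & MASK32


# Four 8-bit pixels brightened by 1 in a single operation.
pixels = 0x01FF02FE
brightened = split_add(pixels, 0x01010101, lanes=4)
```

Here `brightened` is `0x020003FF`: the 0xFF pixel wraps to 0x00 without carrying into the pixel above it, whereas a plain 32-bit add of the same operands gives `0x030003FF`.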

The data path feeding the ALU supports multiple-bit and pixel field processing with a number of architectural features not found in most other processors. The "expander unit" can replicate one 1-bit value 32 times, two 1-bit values 16 times each, or four 1-bit values 8 times each to fill out a 32-bit word. This expansion feature is helpful in many different graphics and image processing operations.
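A short Python sketch shows how such an expansion can be used. It assumes that every mode fills exactly 32 bits (so the four-way mode replicates each of four 1-bit values 8 times) and that bit 0 of the input maps to the least significant field of the output; the names `expand` and `select` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def expand(bits: int, count: int) -> int:
    """Replicate each of the low `count` bits of `bits` 32 // count times."""
    if count not in (1, 2, 4):
        raise ValueError("count must be 1, 2 or 4")
    reps = 32 // count
    field = (1 << reps) - 1          # a run of `reps` ones
    word = 0
    for i in range(count):
        if (bits >> i) & 1:
            word |= field << (i * reps)
    return word


def select(mask: int, a: int, b: int) -> int:
    """Take bits of `a` where `mask` is 1 and bits of `b` where it is 0."""
    return ((a & mask) | (b & ~mask)) & MASK32


# Four per-pixel flags (0b0101) become a byte-wide mask, which then picks
# pixels 0 and 2 from the foreground word and pixels 1 and 3 from the background.
mask = expand(0b0101, 4)
merged = select(mask, 0x11223344, 0xAABBCCDD)
```

With these inputs `mask` is `0x00FF00FF` and `merged` is `0xAA22CC44`, which is the kind of per-pixel transparency or color-expansion step the text refers to.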

A bit detection unit can determine the leftmost or rightmost one or bit change, a necessary function in most image and data compression algorithms. A mask generator unit takes a 5-bit number and generates a mask with the specified number of right-justified ones. The mask generator value can be used in masking and merging operations required for graphics and imaging. Each DSP also has two address units that can perform load or store operations independently of the data and PFC units. In addition, the address units can be used to perform data computations in place of memory operations.
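The bit detection and mask generation functions have simple software equivalents, sketched below in Python. The function names, the bit numbering (bit 0 is the least significant) and the return value of -1 for an all-zero word are assumptions for illustration; the sketch covers leftmost and rightmost one detection only, not bit-change detection.

```python
MASK32 = 0xFFFFFFFF


def leftmost_one(word: int) -> int:
    """Bit position of the most significant 1 in a 32-bit word, or -1 if none."""
    word &= MASK32
    return word.bit_length() - 1


def rightmost_one(word: int) -> int:
    """Bit position of the least significant 1 in a 32-bit word, or -1 if none."""
    word &= MASK32
    if word == 0:
        return -1
    return (word & -word).bit_length() - 1   # isolate the lowest set bit


def mask_gen(n: int) -> int:
    """Mask with `n` right-justified ones, where `n` is a 5-bit number (0..31)."""
    if not 0 <= n <= 31:
        raise ValueError("n must fit in 5 bits")
    return (1 << n) - 1


def merge_field(src: int, dst: int, mask: int) -> int:
    """Write the masked bits of `src` into `dst`, leaving other bits untouched."""
    return ((src & mask) | (dst & ~mask)) & MASK32


# Replace only the low 8 bits of a destination word.
merged = merge_field(0xABCD, 0xFFFF0000, mask_gen(8))
```

In this example `mask_gen(8)` is `0xFF` and `merged` is `0xFFFF00CD`. Leading-one detection of this kind is commonly used to normalize values or find run lengths in compression code.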

Exclusive Crossbar Architecture

The 44 Kbytes of total on-chip SRAM within the 'C82 are partitioned into small increments to support multiple parallel independent accesses to memory via a unique crossbar switch architecture. With the crossbar switch, the master processor and the parallel processors can share all the on-chip RAM, except for the cache RAMs. If more than one processor tries to access the same RAM in the same cycle, a hardware-controlled round robin prioritization allows only one processor to have access at a time. This high-speed parallel access to the 'C82's memory increases system performance while minimizing system cost because external SRAM cache memory for each processor is not necessary.
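Round-robin prioritization of this kind can be sketched in Python as follows. The sketch models contention for a single RAM bank among three requesters; the function names, the requester numbering and the rule that the most recent winner gets the lowest priority in the next cycle are assumptions for illustration rather than a description of the 'C82's arbitration logic.

```python
from typing import List, Optional


def arbitrate(requests: List[bool], last_granted: int) -> Optional[int]:
    """Pick one requester for this cycle, or None if nobody is asking.

    The search starts just after the previous winner, so a processor that
    was granted access most recently has the lowest priority next time.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None


def simulate(requests_per_cycle: List[List[bool]]) -> List[Optional[int]]:
    """Run the arbiter over several cycles and return the winner of each."""
    grants = []
    last = -1   # so that requester 0 is checked first in the first cycle
    for requests in requests_per_cycle:
        winner = arbitrate(requests, last)
        grants.append(winner)
        if winner is not None:
            last = winner
    return grants


# Requester 0 = master processor, 1 and 2 = the two DSPs (hypothetical numbering).
# All three contend for the same RAM bank for four consecutive cycles.
grants = simulate([[True, True, True]] * 4)
```

Under sustained contention the grants rotate (`[0, 1, 2, 0]` here), so no processor is starved. Accesses to different RAM banks never reach the arbiter in this model, which is why partitioning the SRAM into small increments keeps conflicts rare.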

"Crossbar shared memory is clearly the best way to support parallel processing, and integrating it on the same chip as the processors makes it practical," said Guttag.

Transfer Control

The 'C82's transfer controller (TC) is a highly intelligent direct memory access (DMA) controller that prioritizes and performs all parallel accesses between the 'C82 and off-chip memory without having to interrupt the processors. The TC enables linear and X,Y addressing for 2-D and 3-D graphics processing. In addition to its other uses, the TC is a highly efficient bit block translator (BLTer) for writing pixel data to frame memory.
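The difference between linear and X,Y addressing can be shown with a small Python sketch of a block transfer into frame memory. It assumes 8-bit pixels and a frame buffer stored row by row; the names `block_addresses`, `blit` and `pitch` (bytes per frame row) are hypothetical and do not correspond to TC parameters.

```python
from typing import Iterator


def block_addresses(base: int, pitch: int, width: int, height: int) -> Iterator[int]:
    """Yield the byte addresses of a width x height block in row-major order.

    Within a row the addresses are linear; between rows the address jumps
    by `pitch`, which is what distinguishes X,Y from purely linear addressing.
    """
    for row in range(height):
        row_start = base + row * pitch
        for col in range(width):
            yield row_start + col


def blit(src: bytes, dst: bytearray, dst_pitch: int,
         x: int, y: int, width: int, height: int) -> None:
    """Copy a packed block of 8-bit pixels into `dst` at position (x, y)."""
    base = y * dst_pitch + x
    addresses = block_addresses(base, dst_pitch, width, height)
    for pixel, address in zip(src, addresses):
        dst[address] = pixel


# A 4 x 3 frame receives a 2 x 2 block at position (1, 1).
frame = bytearray(4 * 3)
blit(bytes([1, 2, 3, 4]), frame, dst_pitch=4, x=1, y=1, width=2, height=2)
```

After the call the frame holds the rows `0 0 0 0`, `0 1 2 0` and `0 3 4 0`. A controller that generates these addresses itself frees the processors from computing them pixel by pixel.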

The TC includes a direct interface to synchronous DRAM (SDRAM), EDO DRAM and EPROM, providing the versatility to operate with a variety of memory types. Also included is a priority task scheduler, complemented by a dynamic bus sizing feature that can handle 8- to 64-bit data transfers at a rate of 400 Mbytes/s. This flexible bus bandwidth is needed to run broadband applications such as videoconferencing without dedicating the entire bus to that function.
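Some back-of-the-envelope arithmetic puts the 400-Mbytes/s figure in context. The Python sketch below rests on several assumptions that are not stated in the text: that 1 Mbyte means 10^6 bytes, that the peak rate applies to the full 64-bit bus width, that narrower widths (taken here as 8, 16 and 32 bits) move proportionally less data per transfer at the same transfer rate, and that the video stream is uncompressed CIF (352 x 288, 4:2:0 sampling, 30 frames per second), a format used with H.261.

```python
PEAK_BYTES_PER_SECOND = 400_000_000   # "400 Mbytes/s", assuming 1 Mbyte = 10**6 bytes
FULL_BUS_BITS = 64

# Transfers per second implied by the peak rate on a 64-bit bus.
TRANSFERS_PER_SECOND = PEAK_BYTES_PER_SECOND // (FULL_BUS_BITS // 8)


def peak_rate(bus_bits: int) -> int:
    """Peak bytes per second at a given bus width (assumed widths only)."""
    if bus_bits not in (8, 16, 32, 64):
        raise ValueError("bus width must be 8, 16, 32 or 64 bits")
    return TRANSFERS_PER_SECOND * (bus_bits // 8)


def video_rate(width: int, height: int, fps: int) -> int:
    """Bytes per second of uncompressed 4:2:0 video (1.5 bytes per pixel)."""
    bytes_per_frame = width * height * 3 // 2
    return bytes_per_frame * fps


cif_rate = video_rate(352, 288, 30)
cif_share = cif_rate / PEAK_BYTES_PER_SECOND
```

Under these assumptions the bus performs 50 million transfers per second, and moving one uncompressed CIF stream once takes about 4.6 Mbytes/s, or roughly 1 percent of the peak. A real videoconferencing pipeline moves each frame several times, but the estimate suggests why the bus need not be dedicated to video alone.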

Master Processor

As its name implies, the master processor (MP) is intended to perform the overall management of tasks run on the 'C82, as well as communicate with any other processors in a system. The MP consists of a 32-bit RISC processor and an integral 100-MFLOP IEEE-754 floating-point unit, with a 4-Kbyte instruction cache and a 4-Kbyte data cache. The MP can perform up to one 64-bit data access and one 32-bit instruction access per cycle. The MP also adds some features not found in basic RISC designs, including integral floating-point instructions and a special set of vector floating-point instructions to speed image processing, audio processing and 3-D graphics applications.

Emulation and Testability Support

A full line of DSP development tools supports the 'C82, allowing system designers to begin development before they have 'C82 samples in hand. Among these tools are debuggers, a simulator and a full-scan emulation environment. The 'C82 tools share a common user interface with existing TMS320 DSP development tools to reduce the learning curve for system designers and speed up their development times.

Approximately nine percent of the 'C82's silicon is dedicated to embedded emulator control features, which coincides with the industry's recommendation for creating a highly integrated debug environment. Emulation is accessed via the JTAG/IEEE 1149.1 boundary scan port, which is also used for internal device testing as well as board-level testing.

Process and Packaging

The 'C82 operates at 3.3 volts and is implemented using TI's 0.5-micron CMOS process. A simplified system interface in the 'C82 allows TI to offer the device in a low-cost 240-pin plastic quad flat pack (PQFP) package. PQFPs are designed for the surface mounting assembly typical of high-volume electronics manufacture.

New Possibilities

The 'C82 brings high performance and flexibility together at a price predicted to spark a revolution in high-volume applications. Multimedia developers, as well as developers of more traditional DSP-based equipment, will take advantage of the low-priced power of this chip to bring end users new electronic applications affordably.

# # #

*DSP Solution Definition: a digital signal processing solution is a combination of DSPs, mixed-signal and complementary ICs, development tools, system software and support. This allows customers to create new products faster and with lower system costs.

Trademark:
Windows is a trademark of Microsoft Corporation.


(c) Copyright 1996 Texas Instruments Incorporated. All rights reserved.