New TMS320C27x Architecture Unites the Best Features of DSPs and Microcontrollers for Real-Time Embedded Systems
Embedded computing systems are everywhere. Once employed almost exclusively in expensive capital equipment, embedded systems today are bringing steadily greater intelligence to office machines, consumer electronics, automobiles and a wide variety of other everyday applications.
Although the range of embedded applications is almost boundless, a sampling includes mass storage systems such as hard disk drives (HDDs) and digital video disks (DVDs); network and computer peripherals such as printers, faxes, scanners and copiers; consumer products such as digital cameras and camcorders; communications systems such as feature phones; industrial systems such as digital motor control and robotics; and automotive systems such as drive train, active suspension and collision avoidance. Many of these applications operate in real time, meaning that each task must be completed within a deadline set by events in the real world. Real-time operation demands high processing performance, and as applications become more sophisticated, these demands grow correspondingly.
A Need for Both Performance and Control
Because embedded systems have proven so successful, designers now face the challenge of extracting still higher performance from their central processing units (CPUs), especially when their designs must operate in real time. The number crunching that is characteristic of digital signal processors (DSPs) must be combined with the control functions performed by microcontroller units (MCUs). Cost pressures dictate that both types of functions be performed by the same processor, since a single device holds down chip count and shortens the time spent coding and debugging.
MCU suppliers have responded to the need for signal processing by adding multiply-and-accumulate (MAC) hardware to perform faster number crunching. Even with integrated MAC hardware, though, MCUs are constrained by an inadequate architecture when it comes to performing signal processing tasks. Since the most demanding part of the job is in signal processing -- not control -- the better solution is to add MCU structures to DSPs in order to create a processor that performs well at number crunching and is efficient to use for control.
The new TMS320C27x DSP architecture from Texas Instruments (TI) provides the high performance needed by real-time embedded systems, and at the same time offers the flexibility, ease of use and cost efficiency of MCUs. This combination of DSP and MCU strengths makes the 'C27x architecture a better choice for performing real-time embedded control tasks than any MCU available today. In addition, the outstanding performance of the 'C27x gives designers plenty of headroom for building tomorrow's applications today.
MCUs -- The Workhorses of Embedded Systems
Traditionally, designers have relied on MCUs as their CPUs for embedded systems because MCUs tend to be more multifunctional than other types of microprocessors. While the architectures of general-purpose microprocessors emphasize fast data processing and DSPs push processing speeds to the limit, MCUs are designed to support the handling of multiple tasks efficiently. Typical MCU features that support rapid task switching include efficient interrupt handling and the ability to read from and write to many registers quickly through specialized input/output (I/O) instructions.
Code density is another aspect of embedded system design that affects the choice of CPU type. Memory in these systems has traditionally been limited because of its expense. Memory is less expensive today than it once was, but applications are growing larger and more complex, so designers must still make code as compact as possible to fit more software into program memory and leave more room for data memory. Two key elements in achieving high code density are an instruction set whose task-specific instructions minimize the number of steps per operation, and highly efficient compilers for high-level languages such as C.
Traditional complex-instruction-set computing (CISC) MCU instruction sets have allowed software developers to maximize assembly code density. Recently developed MCU families rely on reduced-instruction-set computing (RISC) architectures to achieve high performance through optimization for high-level languages. These languages, the most widely used of which are C and C++, are well suited to writing the control tasks that often make up the majority of code in a system. However, RISC architectures are not optimized for real-time tasks such as signal processing and servo functions. Such real-time operations may account for far less code than control tasks, yet they often represent most of the processing load as measured in millions of instructions per second (MIPS). While MCU code is compact and efficient for control tasks, it is generally inefficient for real-time operations, wasting MIPS and increasing power dissipation. Today, as more demands are placed on embedded systems, CPUs need greater real-time performance than conventional MCUs provide.
DSPs -- The High-Speed Number Crunchers
DSPs, on the other hand, are optimized for performing the high-speed number crunching that is essential to real-time tasks. DSP architectures feature separate instruction and data buses, allowing the processor to fetch multiple data operands simultaneously with a program instruction. This bus structure, combined with one-step MAC instructions and other features, makes it possible for a DSP to perform advanced math functions at extremely high speeds. MCU vendors have tried to provide DSP functionality by adding limited hardware plus MAC instructions for faster mathematical operations. But since MCUs are based on a single-bus architecture, the increase in performance is limited. For real-time operations, DSPs are essential.
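The following minimal C sketch makes the MAC idea concrete. It shows a direct-form FIR filter, the kind of inner loop that dominates signal-processing workloads. The sketch assumes Q15 fixed-point data, and the name `fir_q15` is hypothetical. This is not TI library code. Each loop iteration is one multiply-accumulate. On a DSP with separate buses, the coefficient, the sample and the next instruction can be fetched together, so each tap can often be reduced to a single MAC instruction. A single-bus MCU must instead serialize those fetches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direct-form FIR filter: y[n] = sum over k of h[k] * x[n-k].
 * x_hist[k] holds x[n-k], so the newest sample is x_hist[0].
 * Coefficients and samples are assumed to be Q15 fixed point. */
int64_t fir_q15(const int16_t *h, const int16_t *x_hist, size_t ntaps)
{
    assert(h != NULL && x_hist != NULL);
    int64_t acc = 0;  /* wide accumulator; stands in for a DSP's guard bits */
    for (size_t k = 0; k < ntaps; ++k) {
        /* One multiply-accumulate per tap. On a DSP, both operands can be
         * fetched in the same cycle as the MAC instruction itself. */
        acc += (int32_t)h[k] * x_hist[k];
    }
    return acc;       /* Q30 result; shift right by 15 to return to Q15 */
}
```

The accumulator is kept wider than the individual products. This mirrors the guard bits in DSP accumulators, which let long sums run without intermediate overflow.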
Mass storage systems and digital cameras provide two good examples of real-time embedded systems that depend on DSP performance. In recent years, rising media density and I/O speeds have forced HDD designers to move servo control from MCUs to DSPs, and DSPs have been an important enabler of the vast increase in overall HDD density this decade. In digital cameras, where users expect high photo resolution, high-performance DSPs are needed to smooth the effects of the imager, balance colors, add special visual effects and control the user interface. DSPs can also perform a high level of compression quickly in order to store more photos in memory and shorten the time between snapshots. Like HDDs and digital cameras, other real-time applications also benefit from DSP performance for tasks that outrun the capabilities of MCUs.
The same specialization that makes DSPs outstanding in performance has also caused them to be perceived as inappropriate replacements for multifunction MCUs. DSP assembly code is considered more difficult to write than MCU assembly, since the instructions are not optimized for control functions. Register I/O operations often require more steps, so DSP assembly code requires more memory than comparable MCU assembly code. DSP C compilers have also been less efficient than those for RISC MCUs, since they must target an assembly instruction set that is relatively inefficient for control operations.
It is important to remember that "inefficient" in this context relates to memory space and expense, not performance; a DSP is usually able to perform a multi-step operation in less time than an MCU needs to perform an equivalent single-step operation. Nonetheless, there has not been a DSP that provides the code efficiency of an MCU -- until now.
TMS320C27x -- Uniting the Strengths of DSPs and MCUs
As the performance requirements of real-time embedded systems increase, designers clearly need DSPs that provide the flexibility and ease of use of MCUs. The 'C27x DSP architecture from TI fulfills this need, bringing together the advantages of both DSPs and MCUs for future-ready embedded designs. Using 0.25-micron CMOS process technology, the 'C27x architecture is capable of operating at speeds over 100 MHz. The resulting performance, five to ten times the speed of competitive MCUs, allows a single 'C27x DSP to replace several components in a system. For example, a single 'C27x could handle copier control functions such as paper handling, scanning, printing and the user interface, which are currently distributed among several lower-performance processors.
The 'C27x architecture is designed to handle both DSP and MCU operations efficiently, for more flexible, cost-effective and easy-to-use embedded systems. Highly compact native assembly and compiled C code minimize program memory requirements. Low power consumption per MIPS results from a small die size, an efficient design and high system-level integration, including memory and peripherals. A simplified instruction syntax is more consistent with the code MCU programmers are accustomed to, shortening the learning curve and reducing time to market. Advanced development support includes in-circuit emulation with a real-time window into processor operations for simplified debugging and testing. In addition, a Windows-based integrated development environment (IDE) provides a familiar look and feel to MCU programmers.
A DSP Architecture with MCU-Like Instructions
The 'C27x architecture is highly unusual among DSPs, and even among RISC architectures, in its support for traditional CISC MCU-like functions such as memory-to-memory operations, along with efficient byte packing/unpacking and I/O operations. One example of how the 'C27x is optimized for control is a single instruction that directs the processor to read data from memory, modify it and write it back to memory. By comparison, virtually all DSPs, and even RISC-based MCUs, require three steps (load, modify and store) to perform this operation, which is common in embedded I/O code. A unique feature of the 'C27x that greatly enhances its flexibility is that these read-modify-write instructions can operate on any location in memory instead of only on registers.
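A single-instruction read-modify-write matters most when an interrupt arrives in the middle of a three-step load/modify/store sequence. The host-runnable C sketch below models this case. The names `ctrl_reg` and `isr_sets_bit0` are hypothetical, as is the `irq` callback, which simulates an interrupt taken at a chosen point. Whether a C expression such as `ctrl_reg |= mask` compiles to one indivisible instruction depends on the target and compiler. Here it simply stands in for a single-step instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory-mapped control register, modeled as a plain
 * variable so the sketch runs on a host PC. */
static volatile uint16_t ctrl_reg;

/* Hypothetical interrupt service routine that sets bit 0 of the same register. */
static void isr_sets_bit0(void) { ctrl_reg |= 0x0001u; }

/* Load/store style: three separate steps. The irq callback simulates an
 * interrupt taken between the load and the store. */
static void rmw_three_step(uint16_t mask, void (*irq)(void))
{
    uint16_t tmp = ctrl_reg;  /* step 1: read memory into a register */
    if (irq) irq();           /* interrupt arrives mid-sequence */
    tmp |= mask;              /* step 2: modify the register copy */
    ctrl_reg = tmp;           /* step 3: write back, overwriting the ISR's change */
}

/* Single-step style: the read, modify and write cannot be split, so a
 * pending interrupt is taken only after the whole operation completes. */
static void rmw_single_step(uint16_t mask, void (*irq)(void))
{
    ctrl_reg |= mask;         /* models one indivisible read-modify-write */
    if (irq) irq();           /* interrupt taken after the instruction */
}
```

In both runs, `ctrl_reg` starts at 0 and the mask is `0x0100`. The three-step version ends with `0x0100`, because the stale write-back erases the interrupt's bit 0. The single-step version ends with `0x0101`, keeping both updates.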
With the inclusion of such MCU-like I/O instructions, the 'C27x code required for control tasks is vastly reduced from that of other DSP architectures. Reduced code means that less program memory is needed, that more tasks can be performed using the same amount of memory, or that the system can use slower, less expensive memory. 'C27x code is also more consistent with what MCU programmers are accustomed to, helping to minimize learning and development times, simplify code maintenance and reduce the likelihood of programming errors.
Ease of C Programming Without Code Penalty
The 'C27x architecture also features an efficient C compiler that can produce compiled code as dense as or denser than that of MCUs. By way of comparison, 'C27x compiled code has been demonstrated to be up to 45% denser than compiled code for widely used MCUs such as the MC68HC16, 80196 and ARM7TDMI. Approaching the density of native 'C27x assembly code, compiled code for the architecture requires roughly half as much memory as equivalent compiled code for earlier DSPs. For developers, this gain in density can double programming productivity as well as halve the cost of program memory in the system.
The efficiency of the compiler also supports the desire of many programmers to use C in embedded systems, since C code is easier to write, more readily ported to new designs and more maintainable than assembly. The advantages of writing C routines for the 'C27x will also support the development of third-party plug-in code modules for common tasks or industry standards such as SCSI, IEEE 1394, adaptive control, voice recognition, modems and fax. The availability of these modules from independent sources will help save development time for system manufacturers. Support tools allow programmers to mix C and 'C27x assembly, helping them to optimize high-performance routines while writing system-level code quickly.
Another MCU-like feature of the 'C27x architecture is its efficient interrupt handling. A full context switch, which lets an application move quickly from a control task to a signal processing routine, takes 160 ns, an order of magnitude less time than on popular MCUs. The 'C27x core automatically saves critical registers onto the software stack upon recognition of an interrupt. This reduces code requirements, frees the user to concentrate on writing the interrupt service routines themselves and speeds transitions between system tasks. Single-step operations such as the read-modify-write instruction also reduce the risk of data being corrupted by interrupts, leading to greater system reliability during operation.
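The automatic context save can be pictured with the host-side C model below. It is only an illustration. The register names in `cpu_regs` are hypothetical and do not describe the actual 'C27x register set, and `take_interrupt` stands in for work the core does in hardware. The point is that the service routine `filter_isr` contains no save or restore code of its own, yet the interrupted task's registers come back intact. For scale, the quoted 160-ns context switch would correspond to 16 cycles if it were measured at a 100-MHz clock (10 ns per cycle). That clock rate is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file used only for this model. */
typedef struct {
    int32_t  acc;     /* accumulator */
    int32_t  p;       /* product register */
    uint16_t status;  /* status flags */
} cpu_regs;

#define STACK_DEPTH 8

/* Software stack that receives the saved register frames. */
typedef struct {
    cpu_regs frames[STACK_DEPTH];
    size_t   top;
} ctx_stack;

/* Models interrupt entry and exit: push the critical registers, run the
 * service routine, then pop them back. The ISR has no save/restore code. */
static void take_interrupt(cpu_regs *cpu, ctx_stack *s, void (*isr)(cpu_regs *))
{
    assert(s->top < STACK_DEPTH);
    s->frames[s->top++] = *cpu;   /* automatic save on interrupt recognition */
    isr(cpu);                     /* ISR may freely clobber the registers */
    *cpu = s->frames[--s->top];   /* automatic restore on return */
}

/* Example ISR that uses the accumulator and flags for its own work. */
static void filter_isr(cpu_regs *cpu)
{
    cpu->acc = 12345;
    cpu->status |= 0x1u;
}
```

After `take_interrupt(&cpu, &s, filter_isr)` returns, `cpu` holds exactly the values it had before the interrupt, and the stack is empty again.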
An Architecture Optimized for Real-Time Systems
The 'C27x architecture is optimized to provide the best ratio of price, performance and power in real-time embedded applications. The design is based on the premise that, for a given task, the smallest code size requires the fewest cycles. Fewer cycles help minimize clock speed and power consumption, and they let designers either use slower, cheaper memories or free bandwidth for additional tasks. In other words, the architecture's highly compressed code maximizes the useful work obtained per MIPS, achieving more work from less memory or the same work from slower memories.
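Simple arithmetic shows why fewer cycles per task permit a lower clock. In the sketch below, the cycle counts and the 20-kHz loop rate are hypothetical figures chosen for illustration. The sketch also assumes one instruction cycle per clock and no other system load. Under those assumptions, a task that needs 1,000 cycles per iteration at 20 kHz requires 20 MHz. A less compact version needing 1,500 cycles requires 30 MHz. On a 100-MHz device, that is the difference between 80% and 70% headroom.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum clock rate, in Hz, for a periodic task that takes cycles_per_run
 * cycles and runs runs_per_sec times per second. Assumes one instruction
 * cycle per clock and ignores all other system load. */
static uint64_t required_clock_hz(uint64_t cycles_per_run, uint64_t runs_per_sec)
{
    return cycles_per_run * runs_per_sec;
}

/* Percentage of a given clock left over for other tasks (rounded down). */
static unsigned headroom_percent(uint64_t used_hz, uint64_t clock_hz)
{
    assert(clock_hz > 0 && used_hz <= clock_hz);
    return (unsigned)(100u * (clock_hz - used_hz) / clock_hz);
}
```

Because the required clock scales linearly with cycles per run, any reduction in cycle count for a fixed task translates directly into a lower clock or more spare capacity.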
In designing the 'C27x architecture, TI leveraged the expertise it has gained as a leading supplier of DSP solutions for HDDs, expertise that extends directly to DVDs and removable magnetic storage. In these and other mass storage applications, the 'C27x's MAC and I/O operations are balanced between the heavily used servo and interface functions. The architecture has similar advantages in other embedded applications where MCUs have difficulty performing the high-speed number crunching needed. Examples include digital motor control, automotive systems, robotic and industrial control, smart cards, feature phones, printers, faxes, scanners, digital cameras and camcorders, pagers, personal digital assistants (PDAs) and many other applications, including future high-performance systems employing distributed network intelligence.
To simplify designing 'C27x devices for a wide range of embedded systems, the architecture's core bus can interface directly to external buses, allowing the customization of signals and timing to application needs. For instance, devices using the 'C27x core can be designed to interface directly with any PC peripheral, a feature that will support mass storage systems well. Additionally, the high-performance architecture can handle the processing protocol required for the 400- and 800-megabit-per-second (Mbps) IEEE 1394 high-speed serial bus standards that are increasingly used for digital consumer products.
A Development Environment that Speeds Products to Market
Comprehensive development support for the 'C27x architecture is designed to rival and in some cases surpass the high level of support that designers are accustomed to having with MCUs.
A key advantage to developers is Real-Time Data Exchange (RTDX), TI's innovative analysis technology that allows developers to monitor, analyze and modify code executing at full 100-MHz speed without impacting results or halting applications. RTDX is incorporated in the 'C27x along with a JTAG-based visibility port. As an example of how this on-chip capability can speed development, a motion control system designer can modify registers and instructions, stream key data variables or set and execute hardware breakpoints without impacting operation of the motor.
In addition, RTDX allows designers of 'C27x-based systems to isolate software bugs quickly and produce higher quality code in a shorter time. The ability to perform non-intrusive, real-time debug at 100 MHz is a first in the industry, demonstrating TI's commitment to providing the most advanced development environment available.
The 'C27x takes RTDX technology to new heights by significantly increasing the bandwidth of data transfer between the host and DSP. The 'C27x can forward more than 300 kilobytes per second to data visualization tools, allowing sophisticated real-time display of the system operation. The visibility port also supports at-speed testing during product manufacture, a feature especially useful for real-time embedded control.
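The following sketch puts the quoted bandwidth in perspective. It estimates how many variables could be streamed to the host on every iteration of a control loop. It treats "more than 300 kilobytes per second" as 300,000 bytes per second and ignores protocol overhead, so the result is an upper bound. The function name is hypothetical, and the sketch does not call any RTDX API.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the number of variables of bytes_per_var bytes that can
 * be streamed on every iteration of a loop running at loop_rate_hz over a
 * link of link_bytes_per_sec. Protocol overhead is ignored (assumption). */
static uint32_t max_streamed_vars(uint32_t link_bytes_per_sec,
                                  uint32_t loop_rate_hz,
                                  uint32_t bytes_per_var)
{
    assert(loop_rate_hz > 0 && bytes_per_var > 0);
    return link_bytes_per_sec / (loop_rate_hz * bytes_per_var);
}
```

For example, a 10-kHz loop logging 16-bit values could stream up to 15 variables per iteration. At 1 kHz, that rises to 150.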
In addition to the C compiler and emulation capability, available tools include an instruction set simulator, a debugger, evaluation modules and simulation models for ASIC design. A Windows-based IDE integrates editing, compiling and debug functions in a manner similar to the C++ development environments from Microsoft (Visual C++) and Borland. The 'C27x is also supported by TI's extensive third-party network, with a wide range of software and hardware tools from both the DSP and MCU industries.
Ready for the Future and Available Today
The TMS320C27x DSP core is available today running at 75 MHz using a 0.35-micron process technology. Core versions designed using a 0.25-micron CMOS process operating at speeds of 100 MHz are scheduled to be available in the second quarter of 1998. In 1999, with a roadmap for migration to TI's 0.18-micron TImeline technology, the 'C27x will reach 150 MIPS. The core is optimized for 3.3-V low-power operation and is designed to draw very little power per MIPS. Fully static registers and a low-power mode help increase efficiency, and a small 3 mm² core (in 0.25-micron technology) helps keep design sizes to a minimum. The core can be used in customizable DSP (cDSP) designs with TI's library of ASIC functions, including random access memory (RAM), read only memory (ROM), flash, embedded DRAM (EDRAM) and standard peripherals. Built-in self-test (BIST) is provided in the core and memory for greater reliability.
Code written for TI's TMS320C2xx DSPs can be translated to 'C27x code, providing an upgrade path for TI's existing base of embedded DSP customers. TI will provide a translation tool to facilitate migrating to this next generation platform. Standard products based on this new core architecture are scheduled for availability in the first quarter of 1999.
As real-time embedded applications become increasingly sophisticated, they require ever-greater levels of performance from CPUs. MCUs, for all their advantages, are beginning to fall behind the performance demands of real-time operation. TI's 'C27x DSP architecture combines the flexibility, ease of use and cost efficiency of MCUs with the high performance of DSPs. The 'C27x architecture gives designers the performance headroom they need to create tomorrow's real-time embedded systems today.