DSP Hotline TechBits: TMS320C40 Silicon Update (rev 1.x and 2.x)

                                                                        RMP
                      TMS320C40 Silicon Errors
                       Rev 1.x, 2.x Silicon (Document revision 3.0)
                       Last Modified: 7/1/93



Revision 2.3 Silicon
====================
Revision 2.3 silicon started shipping to customers on November 2, 1992.
Revision 2.3 silicon has the numbers "22" or higher as the first two digits
in the seven digit lot number on the device (22xxxxx).  All silicon shipping
today is Rev 2.3.

Revision 2.0 Silicon
====================
Rev 2.0 silicon started shipping to all customers on August 3, 1992.  Rev 2.x
silicon can be identified in the lot code on the device by the letters "EA"
as the first two characters of the date code (EAxxxx).

Revision 1.0 Silicon
====================
Rev 1 silicon shipped from July 1991 through July 1992.

*****************************************************************************

The following lot numbers had problems with the boot loader. However, these
devices will not produce erratic results; they will not boot up at all.
 
2064070
2064071
2151191
2152265
2160979
2196891
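
The lot-number rules above can be checked on a host machine when sorting
inventory. The following Python sketch is not a TI tool. The function names
are hypothetical, and it encodes only what this document states: the six
boot loader lots listed above and the "22 or higher" rule for Rev 2.3.

```python
# Lot numbers listed in this document as having boot loader problems.
BOOT_LOADER_PROBLEM_LOTS = {
    "2064070", "2064071", "2151191", "2152265", "2160979", "2196891",
}

def has_boot_loader_problem(lot):
    """True if the lot number is one of the six lots listed above."""
    return lot.strip() in BOOT_LOADER_PROBLEM_LOTS

def is_rev_2_3_lot(lot):
    """True if a seven-digit lot number starts with "22" or higher."""
    lot = lot.strip()
    return len(lot) == 7 and lot.isdigit() and int(lot[:2]) >= 22
```

A lot that fails both checks is not necessarily good silicon; it simply is
not covered by these two rules.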

*****************************************************************************

ERROR 1.  Fetch control logic

PLANNED TO BE FIXED IN PG 3.0

A branch type instruction normally disables instruction fetches as soon as the
branch is decoded. On Rev 2.x and Rev 1.x, a hold-everything pipeline conflict
delays the signal which disables instruction fetches. After the fix, the
instruction fetches will be disabled as soon as the branch is decoded. This will
prevent the double fetch of the instruction after a branch.

If all five of the following conditions are true, there is a possibility that
a program fetch will be corrupted. If any one of these conditions is not met,
there will NOT be a problem.
 
1) The external ready is pulled high (NOT ready) while the instruction after
   a branch is being re-fetched, and the bus data changes while the bus is
   tri-stated with ( ^CE(0-1), ^DE, and ^AE). The C40 is different from the
   C30, in that the external ready must be used to prevent reads or writes
   from completing while the C40 is tri-stated.
2) The cache must be enabled.
3) A branch type instruction (branch, trap, call, return, RPTB, RPTS) is used
   (Delayed branches are not affected).  The instruction following the branch
   may be corrupted.
4) The problem may occur if the instructions are being fetched from either the
   local or global port (the peripheral bus is not affected) and a pipeline
   conflict occurs at one of the other ports. Below are 3 conditions which may
   cause a pipeline conflict:
   o  If the instruction before a branch attempts to read from a port (other
      than the port used to fetch instructions) and the port is not ready.
   o  If either of the 2 instructions before a branch type instruction does a
      multi-cycle fetch from a port (other than the port used to fetch
      instructions).
   o  If either of the 2 instructions before a branch type instruction attempts
      to store to a port (other than the port used to fetch instructions) and
      that port is not ready.
5) The timing of ready going high, the timing of ( ^CE(0-1), ^DE, or ^AE) going
   high, and the timing of the pipeline conflict on one of the other ports,
   all have to happen at specific times to cause this problem. These timings
   may be hard to control.
 
This is a problem even with single cycle reads.
 
WORK AROUND:
1) Insert 1 or 2 NOPs before the branch type instruction to avoid this
   problem. Instructions which do not fetch or store data to ports can also be
   used.
 

ERROR 2.  RETIUD instruction

PLANNED TO BE FIXED IN PG 3.3
 
A problem occurs when a register ready conflict occurs the cycle after a
delayed RETIUD. The RETIUD executes but the stack pointer is not decremented.

WORK AROUND:
Eliminating the register conflict that occurs the cycle after the RETIUD
instruction eliminates the problem.  The following sequences need to be changed:
 
 These sequences need to be changed      |   Fix              Fix
---------------------------------------------------------------------------
 RETIUD                                  |  RETIUD           RETIUD
 STI  ARn,*AR1 ;store auxreg(n)          |  NOP              STI  AR2,*AR1
 STI  AR0,*ARn ;auxreg(n) used as address|  STI  AR2,*AR1    NOP
 NOP                                     |  STI  AR0,*AR2    STI  AR0,*AR2
                                         |
                                         |
 RETIUD                                  |  RETIUD           RETIUD
 LDI  R0,ARn   ;load auxreg(n)           |  NOP              LDI  R0,AR2
 STI  AR0,*ARn ;auxreg(n) used as address|  LDI  R0,AR2      NOP
 NOP                                     |  STI  AR0,*AR2    STI  AR0,*AR2
                                         |
                                         |
 RETIUD                                  |  RETIUD           RETIUD
 STI  DP,*AR1   ;store data page pointer |  NOP              STI  DP,*AR1
 LDI  @data,R0  ;DP used in address      |  STI  DP,*AR1     NOP
 NOP                                     |  LDI @data,R0     LDI @data,R0
                                         |
                                         |
 RETIUD                                  |  RETIUD           RETIUD
 LDI  R0,DP    ;load data page pointer   |  NOP              LDI  R0,DP
 LDI @data,R0  ;DP used in address       |  LDI  R0,DP       NOP
 NOP                                     |  LDI @data,R0     LDI @data,R0
                                         |
                                         |
 RETIUD                                  |  RETIUD           RETIUD
 LDI  1, IR0        ;load IR0            |  NOP              LDI   1, IR0
 STI  R7, *AR2(IR0) ;IR0 used in address |  LDI 1,IR0        NOP
 NOP                                     |  STI R7,*AR2(IR0) STI   R7,*AR2(IR0)
                                         |
                                         |
 RETIUD                                  |  RETIUD           RETIUD
 STI  IR0,*AR1      ;store IR0           |  NOP              STI  IR0,*AR1
 STI  R7, *AR2(IR0) ;IR0 used in address |  STI IR0,*AR1     NOP
 NOP                                     |  STI R7,*AR2(IR0) STI   R7,*AR2(IR0)
                                         |
                                         |
 RETIUD                                  |  RETIUD           RETIUD
 LDI  100, BK     ;load BK               |  NOP              LDI   100, BK
 STI  R7, *++AR2% ;BK used in address    |  LDI 100,BK       NOP
 NOP                                     |  STI R7,*++AR2%   STI   R7,*++AR2%
                                         |
                                         |
 RETIUD                                  |  RETIUD           RETIUD
 STI  BK,*AR1      ;store BK             |  NOP              STI  BK,*AR1
 STI  R7, *++AR2%  ;BK used in address   |  STI BK,*AR1      NOP
 NOP                                     |  STI R7,*++AR2%   STI   R7,*++AR2%
 
 
ERROR 3.  Cache Update Logic

PLANNED TO BE FIXED IN PG 3.3

Under a very specific condition, the TMS320C40 device will execute an incorrect
opcode due to a bug in the cache update logic. This bug exists only on PG 3.2
and earlier versions of C40 silicon. It will be fixed in later versions.
 
When the C40 cache is enabled, the cache freeze bit, CF, in the status register,
ST, is used to freeze (CF = 1) and unfreeze (CF = 0) the cache update. This
cache problem will occur under the following conditions:
 
     1. The value of the CF bit goes through a 1-0-1 sequence, and
     2. During the CF = 0 period, no instruction fetch is started and a
        multi-cycle instruction fetch is in progress (a single-cycle program
        fetch will not have the problem).
 
When the above conditions are met, the opcode of the multi-cycle fetch
instruction will be put into the cache without updating the cache segment
address register. Therefore, if this corrupted cache segment is executed again
before it is updated with instructions from another address, the incorrect
opcode will be executed.
 
An interrupt service routine (ISR) is the most likely place to find this
problem.  Usually, the ST value is saved on the stack at the beginning of the
ISR and restored before the RETI or RETID instruction. Therefore, if there
is an interrupt pending before returning from the ISR and the instruction after
returning from the ISR is a multi-cycle fetch instruction, the above condition
can occur. If this corrupted cache segment is executed again before it is
updated with another address (for instance, because the cache is frozen with
CF = 1), the system will behave unexpectedly.  An example that might cause a
problem is shown below:
            :         :
            :         :           ; CF = 0
       Interrupt occurs           ; CF set to 1 and PCF set to 0
          PUSH       ST
            :         :
          ANDN       0800H,ST     ; CF = 0
            :         :           ; 4 segment cache filled
            :         :           ; with these instructions
            :         :
          POP        ST           ; CF set to 1 and PCF set to 0
          RETI                    ; CF set to 0
       Interrupt occurs again     ; CF set to 1 and PCF set to 0
       Repeat above sequence
 
Although the cache is unfrozen in the above interrupt routine, this does not
mean the problem cannot occur if the cache remains frozen in the ISR. However,
the device will be less likely to run the corrupted cache segment again before
the faulty cache segment address is updated if the cache is frozen in the ISR.
Normally after the next RETI, instructions will be fetched and the cache error
is cleared.
 
WORK AROUND:
The workaround for this problem is to force the CF value to equal the PCF value
before saving the ST register or before RETI/RETID. Examples are shown below:
 
Example 1:
            :         :
            :         :           ; CF = 0
       Interrupt occurs           ; CF set to 1 and PCF set to 0
          ANDN       0800H,ST     ; Set CF = 0 (PCF = 0)
          PUSH       ST           ; Save the ST value
            :         :
          ANDN       0800H,ST     ; CF = 0
            :         :
            :         :
            :         :
          POP        ST           ; CF set to 0 and PCF set to 0
          RETI                    ; CF set to 0
 
Example 2:
            :         :
            :         :           ; CF = 0
       Interrupt occurs           ; CF set to 1 and PCF set to 0
          PUSH       ST
            :         :
          ANDN       0800H,ST     ; CF = 0
            :         :
            :         :
            :         :
          POP        ST           ; CF set to 1 and PCF set to 0
          ANDN       0800H,ST     ; Set CF = 0 (PCF = 0)
          RETI                    ; CF set to 0
 

ERROR 4.  TOIEEE instruction

FIXED IN PG 2.3

If the floating-point number is a negative power of two (-2.0, -4.0, -8.0,
etc.; in other words s=1 AND f=0), the TOIEEE instruction will convert the
number to an incorrect IEEE number.

The IEEE result will be 1/4 of the value of the C40 floating-point number.

For example: If the input data is -32.0 (=0x04800000) in C40 format, the
output from TOIEEE instruction will be -8.0 (=0xC1000000) in IEEE format.
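
The example above can be checked on a host machine. The following Python
sketch decodes both bit patterns; it is an illustration, not TI code. It
assumes the C4x single-precision layout of an 8-bit two's complement exponent
in bits 31-24, the sign in bit 23, and a 23-bit fraction in bits 22-0, with
exponent -128 meaning zero. The function names are hypothetical.

```python
import struct

def c40_to_float(word):
    """Decode a 32-bit C4x single-precision word (assumed layout, see text)."""
    exp = (word >> 24) & 0xFF
    if exp >= 0x80:
        exp -= 0x100                      # two's complement exponent
    if exp == -128:
        return 0.0                        # reserved encoding for zero
    sign = (word >> 23) & 1
    frac = (word & 0x7FFFFF) / float(1 << 23)
    mantissa = (-2.0 + frac) if sign else (1.0 + frac)
    return mantissa * 2.0 ** exp

def ieee_to_float(word):
    """Decode a 32-bit IEEE 754 single-precision word."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def toieee_input_is_affected(word):
    """True for the s=1, f=0 case described above (negative powers of two)."""
    nonzero = ((word >> 24) & 0xFF) != 0x80
    return nonzero and ((word >> 23) & 1) == 1 and (word & 0x7FFFFF) == 0
```

With these helpers, c40_to_float(0x04800000) gives -32.0 and
ieee_to_float(0xC1000000) gives -8.0, which is the factor of 1/4 described
above.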



ERROR 5.  BcondAF/BcondAT instruction

FIXED IN PG 2.3

If the first instruction after BcondAF or BcondAT is a multi-cycle memory
read, the 3 instructions after the branch may not be annulled.

Beware that even if the memory is normally single-cycle, it may be multi-cycle
when changing pages, or for a read immediately following a store, etc.

Inserting a NOP after BcondAT or BcondAF will solve this problem.

NOTE : Neither instruction is used by the C Compiler.



ERROR 6.  Return PC address corruption on interrupts

FIXED IN PG 2.0

The C40 CPU contains a four-deep pipeline: fetch, decode, read, and execute.
When the interrupt signal is recognized, the C40 will flush the pipeline
before serving the interrupt.  The C40 will complete the instructions at the
"read" and "decode" stages of the pipeline and store the program counter (PC)
of the instruction at the "fetch" stage of the pipeline onto the stack,
allowing the C40 to return to the original program location after the
interrupt.

However, if the interrupt signal is recognized in the "read" or "decode" phase
of a particular instruction, such as, load/store the stack pointer (SP),
parallel store, and store immediate value (STIK), then the wrong return
address from an interrupt service routine may be stored onto the stack.

Specifically:

a) if the instruction before the interrupt is in the "decode" phase and is an
   instruction that reads (loads from) the stack pointer (SP), such as:
          1) LDI  SP,AR1
          2) LDA  SP,AR3
          3) STI  SP,*AR1
          4) PUSH SP
   or if the instruction in the "decode" or "read" phase is an instruction
   that writes (stores to) the stack pointer, such as:
          1) SUBI  2,SP
          2) LDI   IR0,SP
          3) ADDI3 3,R0,SP
          4) POP   SP

   Note: PUSHing or POPing other CPU registers will not cause the problem.

b) if the instruction in the "decode" phase is a parallel store instruction
   such as:
          1)   STI  R1,*+AR2(1)  ||  STI  R3,*AR4
          2)   STF  R5,*AR3++(1) ||  STF  R2,*-AR5(1)

c) if the instruction in the "decode" phase is a store immediate instruction
   with the immediate value equal to -12 or the five least significant bits
   of the destination address equal to 10100b:
          1)   STIK  -12,*+AR3(4)
          2)   STIK  -12,@1000h
          3)   STIK  0,@F914h

   Note: STIK 1,@F913h or STIK -1,@F915h WON'T cause the problem.
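
Condition (c) can be expressed as a small host-side check. The following
Python sketch is an illustration only, and the function name is hypothetical.
It tests the two triggers stated above: an immediate of -12, or a destination
address whose five least significant bits are 10100b.

```python
STIK_PATTERN = 0b10100  # five-bit pattern named in condition (c)

def stik_is_affected(immediate, dest_address=None):
    """True if a STIK matches either trigger in condition (c).

    dest_address may be None when the address is not known statically
    (for example, an indirect destination such as *+AR3(4)).
    """
    if immediate == -12:
        return True
    if dest_address is None:
        return False
    return (dest_address & 0x1F) == STIK_PATTERN
```

One observation, offered without any claim about the underlying cause: -12
masked to five bits is also 10100b (in Python, -12 & 0x1F == 0b10100), so both
triggers involve the same bit pattern.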


WORK AROUND:
The delayed unconditional branch instruction (BUD) can be used to frame those
instructions since the interrupt cannot occur in the decode/read/execute
phases of a delayed branch instruction.
                              .
                              .
                              NOP
                              LDI  125,AR2
                              PUSH AR3
 Add this to prevent ---->    BUD  $+4
 the interrupt occurring at   LDA SP,AR3
 next three instructions      LDI  @V_ADDR,AR1
                              NOP
                              .
                              .

NOTE: The  "BUD" instruction  will shield the "decode" and "read" phases of
the next instruction and only shield the "decode" phase of the second
instruction after it.  Therefore if the instruction is storing data to SP
register, it can only be protected in the first instruction after the
"BUD $+4" instruction.  Other cases can be protected in the first and second
instruction after the "BUD $+4" instruction.  Some examples are shown below:

               Example 1:                         Example 2:
               .                                  .
               .                                  .
               NOP                                NOP
               LDI    125,AR2                     LDI   25,AR2
               PUSH   AR3                         PUSH  AR3
               BUD    $+4                         BUD   $+4
               NOP                                LDI   @V_ADDR,AR1
Shielded--->   ADDI3  3,SP,AR3                    NOP
               LDI    @V_ADDR,AR1       NOT---->  LDA   SP,AR3
              .                         Shielded
              .                                   .


              Example 3:                          Example 4:
              .                                   .
              .                                   .
              NOP                                NOP
              LDI    125,AR2                     LDI   125,AR2
              PUSH   AR3                         PUSH  AR3
              BUD    $+4                         BUD   $+4
Shielded--->  SUBI   3,SP                        NOP
              LDI    @V_ADDR,AR1       NOT---->  LDI   R5,SP
              NOP                      Shielded  LDI   @V_ADDR,AR1
              .                                 .
              .                                 .


              Example 5:                        Example 6:
              .                                 .
              .                                 .
              MPYF   *AR1,R1                    MPYI  *AR1,R1
              BUD    $+4                        BUD   $+4
              ADDF   R1,R2                      ADDI  R1,R2
 Shielded---> STF    R2,*AR3                    LDI   *AR2++,R3
           || STF    R1,*AR1++        NOT ----> STI   R2,*AR3
              LDF    *AR2++,R3     Shielded  || STI   R1,*AR1++
              .                                 .
              .                                 .


              Example 7:                        Example 8:
              .                                 .
              .                                 .
              LDHI   010H,AR6                   LDPK  02FH
              BUD    $+4                        BUD   $+4
              OR     041H,AR6                   LDI   @F800,R2
 Shielded---> STIK   -12,*AR5                   ADDI  *AR2,R2
              LDI    *AR6,R1           NOT----> STIK  0,@FB14H
              .                        Shielded .
              .                                 .


ERROR 7.   Multi-cycle external memory read

FIXED IN PG 2.0

When the CPU performs two indirect reads in the same cycle (resulting from a
3-operand or parallel instruction), and the operand decoded from the "source 1"
field requires more than 1 wait state, the wrong value may be read.

In all of the parallel instructions and some 3-operand instructions, two of the
operands are read from memory locations by indirect addressing. If these two
memory locations are from different ports, then the TMS320C40 CPU can perform
two data reads in the same cycle (it doesn't mean that the data read will be
completed in the same cycle).  There is a problem when source 1 (src1) is a
three or more cycle load (due to wait states or RDY input) and source 2 (src2)
is from another port.  The src1 load completes in 2 cycles, even though it
should wait for 3 or more cycles.

For example, if the configuration of the external buses is

            AR0 points to -->  Internal bus
            AR1 points to -->  Local bus    - 0 wait states
            AR2 points to -->  Local bus    - 2 wait states
            AR3 points to -->  Global bus   - 2 wait states

All 3-operand and parallel instructions that have two indirect addressing
data loads from different ports will have this problem.
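
The failing condition can be written as a predicate over the bus configuration
listed above. The following Python sketch is an illustration only; the names
are hypothetical. It takes the src1 and src2 pointer registers explicitly,
because this document does not spell out which assembler operand maps to the
"source 1" field. It also assumes that the internal bus counts as a different
port from the local and global buses.

```python
# Bus configuration from the example above: register -> (port, wait states)
BUS_CONFIG = {
    "AR0": ("internal", 0),
    "AR1": ("local", 0),
    "AR2": ("local", 2),
    "AR3": ("global", 2),
}

def dual_read_is_affected(src1_reg, src2_reg, config=BUS_CONFIG):
    """True if two same-cycle indirect reads match the failing condition.

    The condition is: src1 needs more than 1 wait state (a load of three
    or more cycles) and src2 is read from a different port.
    """
    src1_port, src1_wait_states = config[src1_reg]
    src2_port, _ = config[src2_reg]
    return src1_wait_states > 1 and src1_port != src2_port
```

For instance, a src1 read through AR3 (global bus, 2 wait states) paired with
a src2 read through AR1 (local bus) matches the condition, while two reads
through the same port do not.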

WORK AROUND:
Exchanging the src1 and src2 data fields or changing to single data accesses
can get rid of the problem. For the above examples, the workarounds are:

1.         LDI    *AR3,R1            OR        LDI    *AR1,R0
        || LDI    *AR1,R0                   || LDI    *AR3,R1

2.         ADDF3  *+AR3(5),*AR0,R1   OR        LDF    *+AR3(5),R2
                                               ADDF3  R2,*AR0,R1

3.         CMPI3  *+AR3(3),*+AR1(2)  OR        LDI    *+AR3(3),R1
                                            || CMPI3  *+AR1(2),R1

This problem DOES NOT affect DMA loads, parallel loads to the same port, or
multi-cycle stores.  The following examples are thus NOT AFFECTED:

                                        src1            src2
1.          LDI    *AR2,R0              3 cycles       3 cycles
         || LDI    *AR2,R1

2.          SUBI3  *AR3,*+AR3(5),R1     3 cycles       3 cycles


ERROR 8.   Pipeline conflict on the DBcond & DBcondD instructions

FIXED IN PG 2.0

Since the TMS320C40 auxiliary register is modified in the decode phase when it
is used as the pointer in the indirect addressing mode, the TMS320C40 has
pipeline protection to ensure its auxiliary register update sequence.  However,
this pipeline protection fails on the decrement and delayed branch instructions
(DBcond and DBcondD) with zero wait state program memory.

For example, the following program will cause the problem:

                               .       .
                               .       .
     AR5 is decremented----->  ADDI     1,AR5,R2
     before the ADDI           DBU(D)   AR5,LOOP1
                               .       .
                               .       .

When the "DBU" instruction is in the "decode" phase and the "ADDI" instruction
is in the "read" phase, AR5 has already been decremented by one, so the "ADDI"
reads the decremented value.
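
The effect on the result can be shown with a small worked example. The
following Python sketch is a simplified model, not a pipeline simulation. The
function name is hypothetical, and register width and wrap-around are ignored.

```python
def addi_then_dbu(ar5, erratum=False):
    """Model ADDI 1,AR5,R2 followed by DBU AR5,LOOP1.

    Returns (R2, AR5 after the DBU). With erratum=True the ADDI sees the
    value of AR5 after the DBU has already decremented it.
    """
    ar5_after = ar5 - 1                       # DBU decrements AR5 by one
    seen_by_addi = ar5_after if erratum else ar5
    r2 = seen_by_addi + 1                     # ADDI 1,AR5,R2
    return r2, ar5_after
```

With AR5 = 10, the intended result is R2 = 11, but on affected silicon the
model gives R2 = 10. In both cases AR5 ends up at 9.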

WORK AROUND:
Simply adding one instruction between "ADDI" and "DBU" will solve the problem.
For example:
                     .
                     .
                     ADDI    1,AR5,R2
                     NOP
                     DBU(D)  AR5,LOOP1
                     .
                     .

A related error is as follows (see also ERROR 5).  If the first instruction
after BcondAF or BcondAT is a multi-cycle memory read, then the 3 instructions
after the branch may not be annulled.

Even if the memory is normally single-cycle, it may be multi-cycle when
changing pages, or for a read immediately following a store, etc.


ERROR 9.   DMA errors

FIXED IN PG 2.0

There are three known errors on the C40 DMA logic:

1) ERROR A - In the first cycle after the completion of an auto-init,
   the previous value of the control register is used.

WORK AROUND - Use the same configuration of the DMA functions in the
autoinitialization.

2) ERROR B - After reset or in the DMA fixed priority mode, DMA3 has highest
   priority instead of DMA0.

WORK AROUND - Change priority scheme accordingly.

3) ERROR C - The Autoinit Sync. bit will not disable the auto-init
synchronization requirements.  Whenever the DMA is in Synchronous transfer
mode, the autoinit will be in the same synchronous mode.

WORK AROUND - Avoid using this mode.

Version 4.40 of the Floating Point C Compiler has the option of implementing
these workarounds into the code during compilation. Check the C Compiler
user's guide for details.

Device: TMS320C4x
Category: Device Information
Title: C4x Silicon Errata
Source: TI Apps
Date: 1/4/98
GenId: c40r12se