'C3x/'C4x C Compiler: Tips for Efficient Code Generation

The efficiency of the code generated by the floating-point compiler depends largely on how well you take advantage of the compiler's strengths when writing your C code. Certain constructs can greatly improve the compiler's effectiveness.

  1. Use register variables for frequently used variables. This is particularly true for pointer variables (4 registers are allocated for pointer register variables). For example, the following code fragment exchanges the contents of two blocks of memory, one object per iteration:
    do
    {
        temp  = *++src;
        *src  = *++dest;
        *dest = temp;
    }
    while (--n);

    Without register variables, this code takes 12 instructions and 19n cycles. With register variables, it takes only 4 instructions and 7n cycles.
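
    For reference, the fragment above might be wrapped in a complete function as follows. This is a minimal sketch: the function name exchange, the float element type, and the parameter names are assumptions for illustration and are not part of the original case. Because both pointers are pre-incremented, they must point one element before the data to be exchanged, and n must be at least 1.

```c
/* Hypothetical helper, not from the original case.
   Exchanges the n objects following *src with the n objects
   following *dest. All working variables are declared register. */
void exchange(register float *src, register float *dest, register unsigned n)
{
    register float temp;

    do
    {
        temp  = *++src;     /* advance, then read the source object   */
        *src  = *++dest;    /* advance, then copy destination to source */
        *dest = temp;       /* complete the exchange                  */
    }
    while (--n);
}
```

    The sketch uses two pointer register variables, which is within the four pointer registers mentioned above.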

  2. Avoid integer multiplies (or use -m). The MPYI instruction on the C30 uses 24-bit operands, forcing the compiler to use runtime support to do full 32-bit arithmetic. If you know that 24-bit multiplies are sufficient for your application, the -m option forces the compiler to use MPYI.
  3. Pre-compute subexpressions, especially array references in loops. Assign commonly used expressions to register variables where possible.
  4. Use pointers to step through arrays, rather than using an index to recalculate the address each time through a loop.
    As an example of the previous 2 points, consider the following loops:
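    main()
    {
        float a[10], b[10];
        int i;

        for (i = 0; i < 10; ++i)
            a[i] = (a[i] * 20) + b[i];
    }

    main()
    {
        float a[10], b[10];
        int i;
        register float *p = a, *q = b;

        /* p is advanced in the loop header so that it is not read and
           modified within the same expression */
        for (i = 0; i < 10; ++i, ++p)
            *p = (*p * 20) + *q++;
    }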

    The first loop executes in 19 cycles; the equivalent second loop executes in 12.
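
    Tip 3 can be sketched in the same way. In the hypothetical function below (the name scale_row, the COLS constant, and the parameters are assumptions for illustration), the row address m[row] and the product gain * offset do not change inside the loop, so each is computed once and held in a register variable. Whether the compiler would hoist them on its own depends on the optimization level, which the original case does not discuss.

```c
#define COLS 8

/* Hypothetical example: scales one row of a matrix in place.
   Straightforward form, for comparison:
       for (j = 0; j < COLS; ++j)
           m[row][j] = m[row][j] * (gain * offset);               */
void scale_row(float m[][COLS], int row, float gain, float offset)
{
    register float *p = m[row];         /* array reference computed once */
    register float k = gain * offset;   /* loop-invariant subexpression  */
    register int j;

    for (j = 0; j < COLS; ++j, ++p)
        *p = *p * k;
}
```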

  5. Use structure assignments to copy blocks of data. The compiler generates very efficient code for structure assignments, so nest objects within structures and use simple assignments to copy them.
  6. Avoid large local frames, and declare the most frequently used local variables first. The compiler uses indirect addressing with an 8-bit offset to access local data. To access objects on the local frame with offsets greater than 255, the compiler must first load the offset into an index register. This costs 1 extra instruction and incurs 2 cycles of pipeline delay.
  7. Avoid the large model. The large model is inefficient because the compiler reloads the data page pointer (DP) before each access to a global or static variable. If you have large array objects, use "malloc()" to allocate them dynamically and access them through pointers rather than declaring them globally. For example:
    int a[100000];                   /* BAD */
    
    int *a = (int *)malloc(100000 * sizeof(int));  /* GOOD */
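
    Tip 5 can also be illustrated with a short sketch. The type name Block, the length BLOCK_LEN, and the function copy_block are hypothetical; the point is only that nesting an array inside a structure lets a single assignment copy the whole block instead of an element-by-element loop.

```c
#define BLOCK_LEN 16

/* Hypothetical wrapper type: the array is nested in a structure so that
   it can be copied with a simple assignment. */
typedef struct
{
    float data[BLOCK_LEN];
} Block;

void copy_block(Block *to, const Block *from)
{
    *to = *from;    /* structure assignment copies all BLOCK_LEN elements */
}
```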

Device: TMS320C3x
Category: TI Tools
Detail: Code Generation Tools
Detail2: Compiler
Title: 'C3x/'C4x Compiler: Tips for Efficient Code Generation
Source: Case from TMS320 Hotline
Date: 8/1/97
GenId: 0300011

© Copyright 1998 Texas Instruments Incorporated. All rights reserved.