'C3x/'C4x C Compiler: Tips for Efficient Code Generation
The efficiency of the code generated by the floating-point compiler depends to a large
extent on how well you take advantage of the compiler's strengths when writing your C code.
The following constructs can significantly improve the compiler's effectiveness.
- Use register variables for frequently used variables. This is
particularly important for pointer variables (four registers are allocated for pointer
register variables). For example, the following code fragment exchanges one object in
memory with another:
do
{
    temp = *++src;
    *src = *++dest;
    *dest = temp;
}
while (--n);
Without register variables, this code takes 12 instructions and 19n cycles. With
register variables, it takes only 4 instructions and 7n cycles.
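A minimal sketch of how the fragment above might be wrapped with explicit register declarations. The function name exchange_words and its parameters are hypothetical. The sketch uses post-increment so that the pointers start at the first element rather than one word before it, and the instruction and cycle counts quoted above are not claimed for this variant.

```c
#include <stddef.h>

/* Hypothetical helper: exchanges count words between two buffers. */
void exchange_words(int *src_buf, int *dest_buf, size_t count)
{
    /* Two of the pointer registers are requested here. */
    register int *src = src_buf;
    register int *dest = dest_buf;
    register size_t n = count;
    register int temp;

    if (n == 0)
        return;             /* the do/while body must run at least once */

    do
    {
        temp = *src;
        *src++ = *dest;
        *dest++ = temp;
    }
    while (--n);
}
```

The register keyword is only a request, so the benefit depends on the compiler actually having registers available for all four variables.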
- Avoid integer multiplies (or use -m). The MPYI instruction
on the C30 uses 24-bit operands, forcing the compiler to call a runtime-support routine to
do full 32-bit arithmetic. If you know that 24-bit multiplies are sufficient for your
application, the -m option forces the compiler to use MPYI instead.
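Whether 24-bit multiplies are sufficient depends on the range of the operands. The sketch below assumes that MPYI treats its operands as signed 24-bit values. It shows a hypothetical range check (fits_in_24_bits and mpyi_operands_ok are illustrative names, not part of any TI library) that could be used in test builds to confirm that a given pair of operands is safe.

```c
/* Assumed signed 24-bit operand range: -2^23 .. 2^23 - 1. */
#define MPYI_MIN (-8388608L)
#define MPYI_MAX (8388607L)

/* Returns nonzero if v is representable in the assumed 24-bit range. */
int fits_in_24_bits(long v)
{
    return v >= MPYI_MIN && v <= MPYI_MAX;
}

/* Returns nonzero if both multiply operands are within range. */
int mpyi_operands_ok(long a, long b)
{
    return fits_in_24_bits(a) && fits_in_24_bits(b);
}
```

A check like this belongs in debug or test code only, since calling it on every multiply would cost more than the runtime-support routine it is meant to avoid.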
- Pre-compute subexpressions, especially array references in loops. Assign
commonly used expressions to register variables where possible.
- Use pointers to step through arrays, rather than using an
index to recalculate the address each time through a loop.
As an example of the previous two points, consider the following two versions of the same loop:
main()
{
    float a[10], b[10];
    int i;
    for (i = 0; i < 10; ++i)
        a[i] = (a[i] * 20) + b[i];
}
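main()
{
    float a[10], b[10];
    int i;
    register float *p = a, *q = b;
    /* p is advanced in the loop header so that it is not both read and
       modified within a single expression */
    for (i = 0; i < 10; ++i, ++p)
        *p = (*p * 20) + *q++;
}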
The first loop, which indexes the arrays, executes in 19 cycles; the equivalent second
loop, which steps through them with register pointers, executes in 12.
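The loops above mainly illustrate pointer stepping. The sketch below illustrates the other point, pre-computing subexpressions: the row address and a loop-invariant product are each computed once, before the loop, and held in register variables. The function scale_row, the COLS constant and the parameter names are hypothetical, and no cycle counts are claimed for it.

```c
#define COLS 8   /* hypothetical row length */

/* Multiplies every element of row r by gain * bias. */
void scale_row(float m[][COLS], int r, float gain, float bias)
{
    register float *p = m[r];        /* row address computed once */
    register float k = gain * bias;  /* invariant product computed once */
    register int i;

    for (i = 0; i < COLS; ++i, ++p)
        *p = *p * k;
}
```

Written naively as m[r][i] = m[r][i] * gain * bias, the loop would leave it to the compiler to notice that the row address and the product do not change between iterations.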
- Use structure assignments to copy blocks of data. The
compiler generates very efficient code for structure assignments, so nest objects within
structures and use simple assignments to copy them.
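As a sketch of this tip, an array can be nested inside a structure so that a single assignment copies the whole block. The names struct block, BLOCK_WORDS and copy_block are hypothetical.

```c
#define BLOCK_WORDS 16   /* hypothetical block size */

/* Wrapping the array in a structure makes it assignable as a unit. */
struct block
{
    int data[BLOCK_WORDS];
};

/* Copies one block to another with a single structure assignment. */
void copy_block(struct block *dst, const struct block *src)
{
    *dst = *src;
}
```

The alternative, an explicit element-by-element loop over data[], expresses the same copy but does not give the compiler the structure assignment it handles well.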
- Avoid large local frames, and declare the most often used local
variables first. The compiler uses indirect addressing with an 8-bit offset to
access local data. To access objects on the local frame with offsets greater than 255, the
compiler must first load the offset into an index register. This costs one extra
instruction and two cycles of pipeline delay.
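A sketch of the declaration order this tip suggests, assuming (as the tip implies) that locals are placed on the frame in declaration order. The function sum_of_squares and the SCRATCH_WORDS constant are hypothetical.

```c
#define SCRATCH_WORDS 300   /* hypothetical size, larger than 255 words */

/* Sums the squares of up to SCRATCH_WORDS input values. */
float sum_of_squares(const float *in, int n)
{
    /* Frequently used scalars first: assumed to get small frame offsets. */
    float acc = 0.0f;
    int i;
    /* Large buffer last: it pushes offsets past 255, so nothing that is
       accessed often is declared after it. */
    float scratch[SCRATCH_WORDS];

    if (n > SCRATCH_WORDS)
        n = SCRATCH_WORDS;

    for (i = 0; i < n; ++i)
        scratch[i] = in[i] * in[i];
    for (i = 0; i < n; ++i)
        acc += scratch[i];

    return acc;
}
```

If the buffer were declared first, acc and i would sit beyond offset 255 and every access to them would need the extra index-register load described above.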
- Avoid the large model. The large model is inefficient
because the compiler reloads the data page pointer (DP) before each access to a global or
static variable. If you have large array objects, use "malloc()" to allocate them
dynamically and access them via pointers rather than declaring them globally. For
example:
int a[100000];                                /* BAD: large global array */
int *a = (int *)malloc(100000 * sizeof(int)); /* GOOD: allocated at run time, inside a function */
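A slightly fuller sketch of the same idea follows. Because a call to malloc() cannot appear in a file-scope initializer, the allocation is done in an initialization function, and the result is checked before use. The names table, TABLE_LEN, table_init and table_free are hypothetical.

```c
#include <stdlib.h>

#define TABLE_LEN 100000   /* element count from the example above */

int *table;                /* only the pointer is a global object */

/* Allocates the table; returns nonzero on success, zero on failure. */
int table_init(void)
{
    table = (int *)malloc(TABLE_LEN * sizeof *table);
    return table != NULL;
}

/* Releases the table and clears the pointer. */
void table_free(void)
{
    free(table);
    table = NULL;
}
```

Code that uses the table heavily can copy the global pointer into a register pointer variable first, in line with the register-variable tip above.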