

BIOSuite Technical Overview -
System Performance

A key design objective of DSP/BIOS has been to furnish mainstream DSP developers with a fundamental set of software building blocks that must at once support a wide range of embedded applications and yet incur only minimal time/space overhead within the system as a whole. Through a calculated partitioning of functionality between the DSP/BIOS kernel and its attendant host tools for static configuration and dynamic analysis, run-time demands for MIPS as well as memory within the target system can be reduced to unprecedented levels when benchmarked against other real-time kernels.

To quantify its time/space overhead, consider a simple and somewhat generic application of the DSP/BIOS kernel within a basic telecommunications system that transforms an 8 kHz input (voice) stream into a 1 kHz output (data) stream using an 8:1 compression algorithm operating on 64-point data frames. Figure 8 illustrates the high-level organization of the target application program around DSP/BIOS kernel objects.

[Figure 8]
Figure 8. Basic telecommunications example transforming a voice stream into a data stream.


The compression algorithm at the heart of this particular example executes once every 8 ms within the context of a single DSP/BIOS signal object, triggered when the next full input frame and empty output frame are ready for processing. This particular implementation relies upon statically-configured pipe callbacks to the kernel function SIG_andn, which clear individual bits in the signal mailbox representing this pair of triggering conditions. Once dispatched, the signal handler resets the mailbox to its initial non-zero value before retrieving descriptors for the next set of frames and then invoking the algorithm itself. (See the following code sample.)
compressionSignal(inputPipe, outputPipe) {
    PIP_get(inputPipe);             /* dequeue full frame */
    PIP_alloc(outputPipe);          /* dequeue empty frame */

    `call compression algorithm`    /* read/write data frames */

    PIP_free(inputPipe);            /* recycle input frame */
    PIP_put(outputPipe);            /* enqueue full output frame */

    return;
}
At the opposite ends of the input and output pipes from the compression signal lie a pair of interrupt service routines which manage the underlying hardware peripherals that ultimately produce and consume the data streams processed by the algorithm. Notwithstanding differences in the implementation of these routines reflective of the underlying peripherals - say, whether a hardware FIFO or DMA controller can mitigate a less efficient interrupt-per-point approach in favor of a single interrupt on frame boundaries - these interrupt threads invariably exchange full and empty data frames with the signal thread through analogous pairings of PIP operations. Since PIP_free and PIP_put implicitly call back to SIG_andn, which in turn will post the compression signal when its mailbox converges to 0, this segment of each interrupt routine must be bracketed with the HWI_enter and HWI_exit macros to ensure the kernel gains control upon return and performs the necessary context switch. (See the code sample below.)
inputInterrupt() {
    `service hardware`
    if (`current frame is full`) {
        HWI_enter;
        PIP_put(inputPipe);         /* enqueue current full frame */
        PIP_alloc(inputPipe);       /* dequeue next empty frame */
        HWI_exit;                   /* dispatch pending signals */
    }
}

outputInterrupt() {
    `service hardware`
    if (`current frame is empty`) {
        HWI_enter;
        PIP_free(outputPipe);       /* enqueue current empty frame */
        PIP_get(outputPipe);        /* dequeue next full frame */
        HWI_exit;                   /* dispatch pending signals */
    }
}
Table II quantifies the overall performance of the DSP/BIOS kernel within this sample application, focusing exclusively on the MIPS and memory overhead incurred through usage of kernel objects and APIs under various execution scenarios. All MIPS and memory figures are given for a TMS320C54x DSP, assuming code and data reside in on-chip SARAM. The first row delineates a reference point for the remainder of the table, and represents the overhead introduced by the DSP/BIOS kernel objects and APIs depicted in the earlier program examples. Note that the 936 words of program ROM encompass all kernel functions but do not include the application program itself (which could be arbitrarily large). Likewise, the data RAM figures in the table include only the kernel's internal memory needs plus those of any pre-configured program objects - signals, pipes, accumulators, etc. - and do not account for space for tables or arrays already required by the application independent of DSP/BIOS. These figures also exclude the internal message buffer associated with a special system LOG object included in the baseline, since the size of this buffer is ultimately a configurable parameter.

                                     Program ROM  System ROM  Scratch RAM  Software Stack  Object RAM  'C54x Cycles
baseline system                        936 words      1 page      4 words        50 words   104 words      .09 MIPS
1 millisecond timer/real-time clock            -           -            -               -           -      .08 MIPS
4 millisecond periodic function                -           -            -        33 words    25 words      .12 MIPS
statistics accumulation                        -           -            -               -           -      .06 MIPS
event logging                                  -           -            -               -           -      .06 MIPS
host probe on input pipe                       -           -            -               -   155 words      .41 MIPS


Table II. TMS320C54x performance numbers.

The processor utilization figure of .09 MIPS sums all of the kernel functions directly or indirectly invoked by the sample application during its basic 8 ms processing cycle, and includes the total number of instructions required to enqueue/dequeue data frame descriptors from the pair of program pipe objects as well as to switch context to the foreground compression signal upon return from the posting interrupt routine. For consistency, this figure does not reflect the MIPS consumed by the compression algorithm itself along with any hardware-specific processing required in the interrupt routines - cycles intrinsic to the application itself and its underlying hardware platform.

The second row introduces real-time clock support through a 1 ms timer interrupt controlled by the CLK module, used subsequently for statistics accumulation as well as an underlying time base for driving PRD_tick, which periodically executes another application-level function (here, at twice the rate of the 8 ms compression signal), as reflected in the third row of the table. Not surprisingly, introduction of the 4 ms periodic function requires extra stack space to accommodate the additional level of signal preemption.

The next two rows quantify the overhead introduced by enabling automatic statistics accumulation and event logging within the DSP/BIOS kernel, utilizing extant STS accumulator objects already associated with the compression signal and periodic function threads as well as a system LOG object pre-allocated as part of the baseline configuration. The .06 MIPS consumed by statistics accumulation results from tallying each thread's execution latency with internally paired calls to the kernel functions STS_set and STS_delta using the high-resolution clock value returned by CLK_gethtime. The .06 MIPS consumed by event logging similarly results from a trio of internal LOG_event calls as each of the two threads is triggered, dispatched, and terminated.

The last row of the table summarizes the time/space overhead that ensues from introducing an HST data channel object used, in this case, to probe the application's input pipe and stream its contents to a host file. While seemingly large, the 155 words of supplementary RAM required to implement this capability are dominated by the pair of 64-point frames that hold the data itself. Similarly, the figure of .41 MIPS not only includes an internal quartet of PIP operations invoked on frame boundaries but also folds in a worst-case scenario in which the real-time host/target link generates an interrupt per point at the underlying 8 kHz rate.

At less than 1 MIPS of total overhead with all modes of automatic instrumentation enabled, the DSP/BIOS kernel can find a place in even the most resource-constrained of DSP applications. Besides furnishing standard run-time services for structuring an application program in the manner illustrated earlier - with baseline kernel overhead of only .09 MIPS - the true value demonstrated by this example is the small incremental cost of the instrumentation itself, giving weight to the claim that DSP/BIOS can serve as essential infrastructure for a broad system test and diagnostic strategy, in the field and the factory alike.

© Copyright 1998 Texas Instruments Incorporated. All rights reserved.