Designing the legOS Multitasking Operating System

Dr. Dobb's Journal November 1999

An OS for the Hitachi H8 microcontroller

By Markus L. Noga

Markus is a student of computer science at the University of Karlsruhe, Germany. He can be contacted at noga@inrialpes.fr.

Mindstorms is a system developed by Lego and the Massachusetts Institute of Technology (MIT) that lets you design, build, and program robots using standard Lego pieces, a handful of sensors, and a microcontroller that is programmable via an infrared link with a PC. More specifically, the Mindstorms Robotics Invention System (see http://www.legomindstorms .com/) consists of more than 700 Lego pieces, two motors, gears, two touch sensors, a light sensor, the microcontroller, an infrared transmitter, and a PC cable to connect it to one of your COM ports. The packaged software currently supports Windows 95/98/NT.

Constructing Mindstorms robots involves combining normal Lego pieces with sensors (see Figure 1) that can handle data input and enable the robot to respond to external situations. You program the Robotic Command Explorer (RCX) computer (Figure 2) in a graphical language called "RCX Code." Finally, as Figure 3 illustrates, the code is then downloaded from your PC to the RCX via an infrared link.

Although you typically program the RCX using RCX Code, you can also use SPIRIT.OCX (provided in the Mindstorms SDK) to control the unit using Visual Basic, Visual C++, Delphi, or the like. Figure 4 illustrates the RCX software system, which consists of a graphical programming environment, OCX control, and bytecode Virtual Machine (VM). Although powerful, each of these layers introduces certain limitations. The graphical environment limits variable usage to one counter, rendering it unusable for any but the simplest applications. The event-driven programming model is severely hampered by the lack of reentrancy and task synchronization. And finally, like the OCX control, the environment is Windows only.

The VM, on the other hand, provides 32 global variables, a write-only datalog, eight concurrent tasks, and direct control from a PC. Still, the VM also suffers from drawbacks, lacking, for instance, a stack, a real heap, and the concept of priorities.

Alternative Software

Because of these and other limitations in Mindstorms software, a number of freeware/shareware alternatives have become available, since Kekoa Proudfoot reverse engineered the bytecode instruction set (http://graphics.stanford .edu/~kekoa/). Dave Baum conceived of Not-Quite-C (NQC), a C dialect suitable for the bytecode VMs. This implies no local variables, compound expressions, reentrancy, arrays, or user-defined data types. Baum's NQCC compiler, now integrated in his RCX Command Center, runs on a wide variety of host platforms and is still the most widely used bytecode cross-compiler. Other tools include RCX Creator (Barry Ruffner), BotCode (Marty Balash), and Bot-Kit (based on Dolphin Smalltalk), along with tools and techniques for programming the RCX with VBScript or JavaScript.

Programming environments centering around NQCC and other compilers targeting the RCX VM followed, and high-level wrappers for the OCX control provided by Lego abound. However, all these approaches share the limitations imposed by the VM. Frustrated with these shortcomings and the unavailability of so many crucial features, I launched the legOS project as an alternative to the RCX firmware. My goal was to design a small embedded operating system, implemented in standardized languages and native assembler instead of exotic dialects and bytecodes. In other words, I wanted to tear down the padded VM environment while providing similar or superior system services. legOS offers a number of features, including preemptive multitasking, energy saving, dynamic memory management, POSIX semaphores, native access to display, buttons, infrared (IR) communication, motors and sensors, and a new kernel and library. legOS can be programmed with standard C or assembler. It is available from DDJ (see "Resource Center," page 5) and at http://www.multimania .com/legos/.

The Real Machine

The heart of the Mindstorms RCX is a 16-MHz Hitachi H8/3292 series microcontroller. This processor line sports an 8-channel 10-bit A/D converter, serial communications interface, several timers, and multiple digital I/O ports among its on-chip modules. The version inside the Mindstorms offers 16 KB of ROM and 32 KB of RAM. Fortunately, the H8 instruction set is a supported target platform for several flavors of the freely available GNU C compiler. I used binutils-2.9.1 and gcc-2.8.1 under RedHat Linux 5.1. Pre-1.1.1. egcs and other gcc versions might need minor patches to cross-compile correctly, so stick with gcc-2.8.1 if possible.

These tools are available from any number of sources, including the GNU ftp site mirrors and most Linux distributions. Their installation on a UNIX-like platform is a near-classic case of "make ; make install;" see Example 1. Installation under Windows is equally straightforward with the aid of cygwin32, a free UNIX compatibility package by Cygnus Support (http:// www.cygnus.org/).

Design Decisions

I chose to comply with standard UNIX interfaces wherever possible. Most C programmers are familiar with puts(), sleep(), malloc(), and exec(), and I wanted to ease their transition. However, this is also a question of personal taste, because I dislike the abbreviated, technical naming conventions in other embedded operating systems that might have served as role models.

Outside the VM sandbox, there is no memory protection whatsoever; like many embedded systems, the H8 simply lacks the concept. Malformed or malicious user applications can always overwrite kernel code and data. Consequently, I sacrificed security for speed and simplicity.

To minimize memory footprint, I wanted to give programmers full and fine-grained control over which functionality to include in the kernel. As in most embedded operating systems, this can easily be achieved with conditional defines.The full system's memory requirements are currently a third of the original firmware's.

The kernel was to be statically linked to user applications, forming a binary downloadable with standard firmware tools. This was the only initially feasible method, as an IR driver didn't exist. Also, it emphasizes independent operation of the RCX. Dynamic linking of user applications against a kernel symbol table on a host PC will be a future option for task download via IR.

I decided against a filesystem, since there's no mass storage on the RCX. And although user-mode hardware access via special device files is an elegant method on systems with memory protection, the operating security gained on the H8 would be illusory. Currently, I like to view the hardware devices as OS-provided objects with a thin public interface mostly consisting of inline functions. Device files might be considered in the list of possible future extensions for the sake of elegance. Then again, when did ioctl()s become elegant?

By textbook definition, the three basic tasks of an operating systems are:

Memory management.
Resource management.
Task management.

Lacking a complete emulator for the RCX hardware, resource access was clearly the first milestone to tackle. Without I/O, it's difficult to see if your programs work. As tasks need stacks, memory management became the second milestone and task management the third. Thus, the course of development was set.

Resource Management

Due to space constraints, I'll only address the console driver -- that is, the LCD display and button control -- in this article. Although most of the other drivers are fairly straightforward derivations of the H8 hardware manual (see http://www.multimania .com/legos/), nearly every single one contains RCX-specific twists. For example, the IR driver needs to enable carrier frequency generation exclusively when sending, and the A/D driver has to provide pulsed alimentation for active sensors such as light and rotation.

The starting point for the console driver involved the ROM functions to display symbols and numbers on the LCD and refresh the display and query buttons. Their addresses and some of their parameters had been located by Kekoa, who used these functions in his first assembler firmware.

In an early version of legOS, I relied exclusively upon ROM to address the LCD. Examination of the ROM disassembly narrowed the region of possible control codes. I then wrote experimental programs that stepped through these ranges and noted the display changes affected.

ROM functionality includes 4- and 1-digit decimal number display with an optional selectable decimal point. This is how ROM and firmware display sensor values, program numbers, and time. There are also codes to set and unset individual graphics symbols such as the IR transfer indicator. legOS makes these capabilities accessible via symbolic constants and inline wrappers in rom/lcd.h.

In-depth analysis of the ROM disassembly revealed further potential for enhancements. All ROM display functions employ the memory area 0xef43-0xef4b as a bit field indicating segment states. From this buffer, the display is updated by the lcd_refresh() function. Recognizing the rare possibility to improve both footprint and execution speed, I added direct-lcd.h to access the data area directly, bypassing ROM for all but display refresh (see Example 2 for a comparison). Gradually bypassing ROM completely is a general pattern in legOS development.

Notice how arguments to ROM functions need to be recopied. ROM calling conventions differ from gcc's -- 16-bit arguments and return values are passed in r6, not r0. Unfortunately, gcc's excellent support for assembler instructions with C operands only offers constraints for register classes, not individual registers (see Example 3 for an example of how button states are read). gcc also treats r0..r3 as clobbered by function calls, although ROM preserves them. Trivial changes to the H8 machine definition file would have accomplished seamless compatibility with ROM, but I chose not to provide compiler patches for every gcc version or binaries for every platform.

Direct access to the LCD bit field also expanded the display capabilities. Although the memory layout of every 7-segment digit is different, driver functions with a uniform interface could be implemented in conio.c with minimum footprint, due to the H8's powerful bit-manipulation functions. Providing softfonts to display hexadecimal or maimed ASCII characters paved the way to display hexwords and strings with cputw(unsigned) and cputs(char*).

Memory Management

The choice of a memory-management algorithm was dictated by circumstances. Total system memory of slightly less than 32 KB and an overall design goal of small footprint ruled out page-oriented allocation schemes immediately; on average, they waste half a page per allocated block. Also, the smaller the blocks, the bigger their relative administrative overhead.

This left continuous allocation. Of the possible variants, I decided to use a straightforward scheme (see Table 1) that is optimal in terms of space allocation and linear with a low constant in the time domain. Why space optimal? For each allocated block, you need to store its length somewhere if you ever intend to free it. With a 32-KB address space and a processor in need of word alignment for all but character data, this translates to 14 bits. Furthermore, you need to track a block's status with a minimum of three categories -- free, allocated, and reserved (for memory-mapped I/O areas). This would fit in the remaining two bits.

For multitasking systems, however, it is desirable to track block owners, if only to clean up after task death. With the constraint of word alignment, this translates to another two bytes of administrative data. Free and reserved blocks are efficiently implemented as special owner PIDs 0x0000 and 0xffff, respectively.

To minimize fragmentation, freed blocks must be united with neighboring free blocks. This, however, means traversing the block list from the beginning to locate the predecessor. malloc() performs this traversal in any case, so unification was integrated there, leaving free() an atomic O(1) operation. Since the priority scheduler has to free process data and stack areas, this property is handy.

The memory-managed area starts immediately after the last byte of static data. Fortunately, the GNU linker provides a method of inserting symbols at precise locations in the memory map (see Example 4). This spares you from clumsy double compilation or compile-and-patch approaches. As a final common optimization, a pointer to the first free block is maintained.

Task Management

My next goal was preemptive multitasking. The subgoals involved in changing to a new task context can be clearly separated in a task switcher (which performs the actual context switch) and task scheduler (which decides where to spend the next timeslice). The latter should ideally be totally platform independent, whereas the former is machine-specific by definition.

To keep the scheduler interface lean and mean, I saved all context information on the stack frame. The only argument to the scheduler is the current task's stack address, and the only return value is the next task's. Consequently, the switcher is ignorant of process data structures. Starting a new task became the question of assembling a stack from which the switcher would return to the desired start address and having the scheduler return this stack address.

Initially, I selected a barebones circular scheduling scheme for maximum stability. The harder part was getting the switcher to work. I started with a cooperative approach, forcing tasks to yield explicitly. This gave me a semistable environment wherein I could control the exact moment a task switch would occur.

Trying to stick with UNIX standards, I examined various methods of task startup and shutdown. A fork seemed infeasible, as it requires a full copy of both data and stack segments. It is also overkill for most embedded applications. Finally, I extended the exec family with execi(). This function inserts a new process into the scheduler and allocates stack space for it. execi() is nonstandard in that it doesn't replace the old process, but simply adds a new one. In more orderly fashion, exit() ends a process' life, and kill() allows it to prey on others.

After this system proved reasonably stable, I plunged into the joys of IRQ handling. As Table 2 shows, Hitachi placed the H8's IRQ vector table at 0x0000 as usual -- a ROM address. However, in accord with the scheme of downloadable firmware, Lego's engineers displayed some excellent software engineering. They created a secondary IRQ table in RAM and a primary ROM wrapper for every IRQ (see Example 5). The secondary IRQ handlers can be easily overridden by new firmware. Replacement handlers have to return with a normal "subroutine return" instruction, and they can rely on r6 being saved and restored by the wrapper.

The H8/3929 series provides one 16-bit and two 8-bit on-chip timers, each with several associated interrupts for compare-to-constants and overflow events. Fractions of the system clock are available as internal clock frequencies, as well as various external trigger modes. The 16-bit timer can generate two PCM waveforms, the 8-bit timers provide one each. Additionally, an on-chip watchdog timer is available.

Experimentation with the timers quickly indicated that the first 8-bit timer's output was wired to the speaker, ruling it out for task switching. As the second 8-bit timer appeared to be safe, I employed it for my first scheduler. Later, I discovered it was generating the IR carrier frequency, so I moved the task switcher to the 16-bit timer.

It was somewhat disappointing to notice that some IRQ handlers may crash the system. For IRQs associated with certain status bit changes, the handlers have to reset the status bit before returning or the system will freeze. You should always mask out IRQs instead of providing empty handlers. The otherwise comprehensive, freely available hardware manual from Hitachi (link on http:// www.multimania .com/legos/) fails to address this fact.

The current scheduling algorithm is priority based. Per default, only pairwise different priorities are admitted, as this allows for considerably faster scheduling. Support for equal priority levels by priority rotation within a level can be compiled in.

Processes may wait for arbitrary event functions to become nonzero -- this is how sleep() is, and select() may be implemented. To conserve battery power, I introduced a special idle task of lowest priority. It switches the processor to sleep mode until the next interrupt occurs in case all regular processes are waiting for events.

Additional Features

The combination of memory and task management prompted a means of synchronizing access to critical sections. POSIX semaphores are an accepted solution to this problem, and the POSIX standardized interface was easy to implement.

Writing an IR driver was frustrating until I had a flash of inspiration. No matter what settings I tried for the H8's on-chip serial module, the PC wouldn't receive anything. Then I realized the carrier frequency had to be generated on-chip. The IR driver supports half-duplex operation at arbitrary baud rates and collision detection. Hardware constraints limit usable transmission speeds to 4800 baud, but this allows over four-times faster communication than the original firmware for limited distances. For the near future, there is talk of writing packet networking code.

About that time, another programmer discovered the motor driver I/O address to be 0xf000. That same day, I wrote a first PCM driver. A later discussion on the Mindstorms mailing list led to an advanced PCM driver derived from the Bresenham line-drawing algorithm. It offers 256 speed levels instead of the eight available with the standard firmware.

Support for basic resistive sensors (that is, the touch and light sensors packaged with the Mindstorms set) was straightforward to add with code taken almost literally from the H8/3292 hardware manual. Guessing conservatively, a 10-KHz sampling rate can be achieved.

Making the light sensor measure reflectivity instead of just ambient light was largely trial-and-error. Power for the LED had to come from some specific output pin, and there were only a handful to try. As the cable to the sensor has only two wires, alimentation and A/D conversion are mutually exclusive. They have to be synchronized. Rotation sensors additionally require a state machine to decode.

Examinations of the RCX hardware indicate that two additional A/D pins might be usable. Should this be confirmed, legOS will support two additional sensors.

Conclusion

There's plenty of room for improvement in legOS. Apart from C++ support, device files, and enhanced stability, most items on the wish list depend on IR packet networking.

A shell task could provide a clean method of loading applications via IR as previously outlined, saving application developers from downloading the kernel every time. Together with enhanced IR speed, this could potentially divide program download times by four or more.

A gateway application on the host computer could provide TCP/IP connectivity, enabling services such as local and remote joystick control and status display. Commercial products such as EMIT 3.0 (http:// www.emit.com/), available with an H8 evaluation board, might serve as an inspiration.

Packet networking would also vastly extend the possibilities of inter-RCX communication, facilitating the development of systems of interacting agents.

legOS already fills a technically interesting niche for development targeting the Lego Mindstorms. With some of the aforementioned extensions, it might become a feasible platform for courses in introductory robotics, mobile agent systems, and certain forms of rapid prototyping.

DDJ