Trends in Operating System Design

Will we gain portability at the expense of performance?

Peter D. Varhol

Peter is chair of the graduate computer science department at Rivier College in New Hampshire. He can be contacted at varholp@alpha.acast.nova.edu.


Over the past several years, we've witnessed a number of trends affecting operating-system design, foremost among them a move to modularity. Operating systems such as Microsoft's Windows NT, IBM's OS/2, and others are split into discrete components, each having a small, well-defined interface, and each communicating with the others via intertask message passing. The lowest level is the microkernel, which provides only essential OS services, such as context switching. Windows NT, for example, also includes a hardware-abstraction layer beneath its microkernel that enables the rest of the OS to run irrespective of the processor underneath. This high level of OS portability is a primary driving force behind the modular, microkernel-based push.

For an example of a modular operating-system architecture, there's no better place to look than QNX Software's QNX operating system. QNX is a real-time OS with a UNIX-like command language. QNX consists of a tiny (around 8-Kbyte) microkernel that handles only process scheduling and dispatch, interprocess communication, interrupt handling, and low-level network services, all of which are accessible through 14 kernel calls. The size and simplicity of the kernel allows it to fit entirely in the internal cache of processors such as the 80486.

A minimal QNX system can be built by adding a process-manager module, which creates and manages processes and process memory. To use a QNX system outside an embedded or diskless system, a file system and device manager can be added. These managers run outside kernel space, so the kernel remains small. For more details, see the accompanying text box entitled "QNX: A Scalable, Microkernel-Based Operating System," as well as "A Message-Passing Operating System," by Dan Hildebrand (DDJ, September 1988).

Likewise, IBM's Workplace operating system (see Figure 1) is based on the Mach 3.0 microkernel, although IBM-specific extensions (developed with the OSF Research Institute) support parallel processors and real-time operations. This implementation counts five sets of features in its core design: interprocess communication (IPC), virtual-memory support, processes and threads, host and processor sets, and I/O and interrupt support.

Process dispatch is in the microkernel, but process scheduling is not. The design goal behind this distinction is to separate policy from mechanism. In this case, dispatch is a core mechanism that need never change, but scheduling is a policy that might. This lets you swap the default scheduler for one that provides stronger support for real time, for example, or for a specialized scheduling policy for nonstandard uses.
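The policy/mechanism split described above can be sketched in a few lines of Python. This is a toy illustration with invented names, not IBM's code: the dispatch loop is the fixed mechanism, and the scheduling policy is just a function that can be swapped out.

```python
# Toy illustration of separating policy from mechanism (invented names):
# the dispatch mechanism never changes; only the policy function does.

from collections import deque

def fifo_policy(ready):
    """Policy: run processes in arrival order."""
    return ready.popleft()

def priority_policy(ready):
    """Alternative policy: always run the highest-priority process."""
    best = max(ready, key=lambda p: p["priority"])
    ready.remove(best)
    return best

def dispatch_all(processes, policy):
    """Mechanism: repeatedly hand the CPU to whichever process the
    policy selects, until the ready queue is empty."""
    ready = deque(processes)
    order = []
    while ready:
        proc = policy(ready)
        order.append(proc["name"])   # "run" the process
    return order

procs = [{"name": "editor", "priority": 1},
         {"name": "pager",  "priority": 3},
         {"name": "logger", "priority": 2}]

print(dispatch_all(list(procs), fifo_policy))      # ['editor', 'pager', 'logger']
print(dispatch_all(list(procs), priority_policy))  # ['pager', 'logger', 'editor']
```

Swapping in a real-time discipline would mean writing one new policy function; the dispatch mechanism is untouched, which is the point of the design.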

Above the microkernel, IBM implements personality-neutral services (PNSs) that implement a policy rather than a mechanism, and run outside kernel space. Memory management, for instance, is divided between the microkernel and a PNS. The kernel itself operates the paging functions of the CPU. The pager, operating outside the kernel, determines the page-replacement strategy--that is, which pages will be removed from memory to accommodate a page brought in as a result of a page fault. The pager implements a policy, and the policy can be changed through the use of an alternative pager. IBM is providing a default pager to boot Workplace OS, but the primary paging mechanism is actually the file system, which provides memory-mapped file I/O, caching, and virtual-memory policies, combined.
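The same split applies to paging: the fault-handling mechanism stays fixed, while the page-replacement policy lives in a replaceable pager. The Python sketch below uses invented structures (it is not Workplace OS code) to show two interchangeable policies driving one mechanism.

```python
# Toy sketch of a replaceable pager (invented names, not Workplace OS):
# the mechanism handles faults; the victim-selection policy is pluggable.

class FIFOPager:
    """Policy: evict the page that has been resident longest."""
    def choose_victim(self, resident):
        return next(iter(resident))   # dicts preserve insertion order

class LRUPager:
    """Alternative policy: evict the least recently used page."""
    def choose_victim(self, resident):
        return min(resident, key=resident.get)

def access(page, resident, frames, pager, clock):
    """Mechanism: on a fault, ask the pager which page to evict."""
    faults = 0
    if page not in resident:
        faults = 1
        if len(resident) >= frames:
            victim = pager.choose_victim(resident)
            del resident[victim]       # page out the victim
    resident[page] = clock             # record time of (re)use
    return faults

def run(trace, frames, pager):
    """Count page faults for a reference trace under a given pager."""
    resident, total = {}, 0
    for clock, page in enumerate(trace):
        total += access(page, resident, frames, pager, clock)
    return total

trace = ["A", "B", "C", "A", "D", "A", "B"]
print(run(trace, 3, FIFOPager()))   # 6 faults
print(run(trace, 3, LRUPager()))    # 5 faults
```

Changing the system's paging behavior means substituting a different pager object; the fault-handling mechanism is never recompiled.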

PNSs include not only traditional OS services (such as the file system and device drivers), but also networking and even database engines. Behind this strategy is IBM's belief that placing application-oriented services such as these close to the microkernel can improve the efficiency of data transfers and queries. Third-party database vendors such as Oracle can then embed database engines as PNSs to improve performance or make more-direct use of kernel services.

The third layer of modules, closest to the user, is composed of individual personalities. A "personality" is the appearance and behavior of an operating system from the standpoint of the end user. OS/2 can be one personality, Windows another, UNIX a third. The personality looks like the operating system, and system services behave in the expected manner, but many of the services are actually implemented at the PNS level, differently than in the original OS. IBM has demonstrated a UNIX personality, which was simply the entire OSF/1 image running on top of Mach.

Objects and Distributed Computing

Another major trend is objects finding their way into operating systems. The primary characteristic of objects that makes them worth using in an operating system is encapsulation. This makes possible, for example, object-embedding technologies such as Microsoft's object linking and embedding (OLE) that would have been difficult (if not impossible) using a file-based data paradigm.

Objects and message passing go hand in hand. In a classic object-oriented system, messages carry data objects along with instructions on what to do with that data. In an OS, message passing helps modularize the operating-system architecture, since the transfer of data is not dependent upon having a function to call.
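The idea that a message carries both data and an instruction about what to do with it can be shown in miniature. All names in this Python sketch are invented; the point is only that the sender needs no function to call, just a selector and a payload.

```python
# Minimal sketch of message passing to an object (invented names):
# the message names an operation; the receiver decides how to act.

class FileObject:
    def __init__(self, text=""):
        self.text = text

    def handle(self, selector, payload):
        """Dispatch on the selector carried inside the message."""
        handlers = {"append": self._append, "length": self._length}
        return handlers[selector](payload)

    def _append(self, payload):
        self.text += payload

    def _length(self, _):
        return len(self.text)

def send(obj, selector, payload=None):
    """Deliver a message; no direct function call crosses the boundary."""
    return obj.handle(selector, payload)

doc = FileObject()
send(doc, "append", "hello ")
send(doc, "append", "world")
print(send(doc, "length"))   # 11
```

Because the sender and receiver share only the message format, either side can be replaced, or moved to another process or machine, without the other noticing.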

Operating systems such as QNX and Windows NT already use message passing, at least to some extent. Message passing in NT supports networking as well as security. For example, the security gateways check every system message to ensure that the user has the privileges to send that message. Consequently, data and instructions are under better control than in a traditional OS.

Among the emerging object technologies are IBM's System Object Model (SOM) and Distributed System Object Model (DSOM), Microsoft's Component Object Model (COM), the Object Management Group's Common Object Request Broker Architecture (CORBA), NeXT's Portable Distributed Objects (PDO), and Taligent's Taligent Operating Environment (TOE).

Performance is an Issue

One question that's hounded message-based operating systems from the start is performance. Does communicating with different components through message passing--as opposed to straight function calls--hurt performance? It clearly can (although QNX claims that its message-passing architecture offers performance comparable to that of traditional architectures). In object-oriented languages such as Smalltalk, vendors claim decent but hardly stellar performance for message passing. Whether the OS queues messages, or whether a message blocks until the recipient executes a receive (as in QNX), it is easy to see that this mechanism can be slower than a function call.

These new operating systems use a variety of techniques to improve message-passing performance. One common approach, used by IBM, is a shared memory space, so that data doesn't have to be copied from one memory address to another. However, the two processes must still establish a connection before the shared-memory approach can work. Because the exchange remains a two-step process (connect, then exchange), it is still more time consuming than a straight procedure call.
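The copy-versus-share distinction can be modeled in a few lines of Python. This is a deliberately simplified illustration with invented names: copying duplicates the payload on every send, while shared memory pays a one-time connection cost and thereafter both sides reference the same buffer.

```python
# Toy model of message copying vs. shared memory (invented names).

def send_by_copy(payload):
    """Each message is duplicated into the receiver's address space."""
    return bytes(payload)            # a fresh copy on every send

class SharedRegion:
    """Step 1: both processes connect to a single region."""
    def __init__(self, size):
        self.buf = bytearray(size)

def send_by_sharing(region, payload):
    """Step 2: the sender writes in place; nothing is copied out."""
    region.buf[:len(payload)] = payload
    return region.buf                # receiver sees the same memory

msg = bytearray(b"page of data")
copied = send_by_copy(msg)
region = SharedRegion(64)
shared = send_by_sharing(region, msg)

print(copied is msg)        # False: a second copy of the data exists
print(shared is region.buf) # True: one buffer, two views
```

The `SharedRegion` setup is the "connect" step the text describes; once it exists, repeated exchanges avoid the per-message copy, which is where the savings come from.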

Windows NT takes this one step further, with a special implementation of the local procedure call for Win32 applications called the "quick LPC." This technique opens one port to establish the connection between processes, then passes multiple messages through a shared memory space without the need to send additional messages through the port. However, there is a trade-off: NT assigns a thread to every instance of the quick LPC, and this uses up system resources.

Another performance issue revolves around memory utilization. If higher-level OS services run in user space, as they do with QNX and Workplace OS, there's a trade-off between memory efficiency and speed. Kernel processes cannot be swapped out to disk, while user processes can. This means that an OS that relies on user processes may run in less memory, at the expense of speed. One solution to this is a configurable kernel. The next release of OSF/1 will let system administrators determine whether to run large parts of the OS in kernel or user space. Thus, you'll be able to tune the OS for specific needs.

Overall, reduced performance may be a consequence of the direction operating systems are taking, as has been true over the last few years with windowed systems. One alternative is to give priority to a particular operation at the expense of others, as Windows NT does with I/O and context switching.

Conclusion

The modularity of emerging operating systems will not be very noticeable to application programmers in the near future. Most of us will be programming on commercial versions with most of the major building blocks built in. There will probably be about the same number of APIs, although it may be important to know which module a particular API applies to.

The benefits will be primarily indirect. The unified approaches to OSs, for example, mean that porting applications will be easier. Particularly with IBM's multiple personalities, the OS issue may not even matter, as long as the CPU is the same.

The big change will come with objects. Both OLE and Apple's OpenDoc (a compound-document architecture designed for sharing text, graphics, and video objects across operating systems) will require that developers understand and adhere to the underlying object model so that they can take advantage of hot links between the data objects in compound documents. Applications will have access to OS services that will fundamentally change how we view data.

The bad news is that there are competing object models. OpenDoc includes support for Microsoft's OLE 2.0 spec, so an OLE application should work with an OpenDoc operating system, but not vice versa. Other object models will have their own ways of doing things. Multiple-personality systems such as the Workplace OS will ease some of the learning curve, but versatile programmers will have to know not only C++ and objects, but how multiple operating systems use them.

Figure 1: IBM's Workplace operating system is based on the Mach 3.0 microkernel architecture.

QNX: A Scalable, Microkernel-Based Operating System

The operating system of the future may best be modeled by QNX Software's QNX, a 32-bit multitasking OS that utilizes a tiny microkernel. QNX takes a modular approach to services that lets you choose only those services necessary for a particular use. QNX is not an implementation of UNIX, despite its UNIX-like command language and POSIX compliance. It is a separate and distinct operating system from the ground up, and it uses technologies just now starting to come into the mainstream.

The heart of QNX is its microkernel, which implements interprocess communication, low-level network services, process scheduling, and interrupt dispatching; see Figure 2. Process scheduling is real time with preemption, and scheduling is prioritized with round-robin, FIFO, and adaptive-scheduling disciplines. All kernel services are available through 14 APIs, so the ways to access the kernel services are limited.

QNX is a message-passing operating system that utilizes blocking versions of Send, Receive, and Reply function calls. Messages don't queue--the message facility is a process-to-process copy, which QNX claims provides performance comparable to function calls. You can construct your own message queues using built-in messaging primitives.
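The blocking Send/Receive/Reply rendezvous can be sketched with Python threads. The class and method names below are hypothetical, not the actual QNX API: the sender blocks until the server has both received the message and replied, and nothing is queued on its behalf.

```python
# Thread-based sketch of QNX-style synchronous messaging (hypothetical
# names): Send blocks until Reply; the kernel queues nothing.

import threading
import queue

class Channel:
    def __init__(self):
        # A one-slot rendezvous point, not a message queue.
        self._msgs = queue.Queue(maxsize=1)

    def send(self, msg):
        """Client side: blocks until the server replies (SEND/REPLY-blocked)."""
        done = threading.Event()
        slot = {"msg": msg, "done": done, "reply": None}
        self._msgs.put(slot)   # blocks if the server hasn't Received yet
        done.wait()            # remains blocked until the server Replies
        return slot["reply"]

    def receive(self):
        """Server side: blocks until a client sends (RECEIVE-blocked)."""
        return self._msgs.get()

    def reply(self, slot, answer):
        """Server side: deliver the answer and unblock the sender."""
        slot["reply"] = answer
        slot["done"].set()

ch = Channel()

def server():
    slot = ch.receive()
    ch.reply(slot, slot["msg"].upper())   # a trivial echo service

threading.Thread(target=server).start()
print(ch.send("hello"))   # HELLO
```

Because the exchange is a direct process-to-process handoff, a message queue, when needed, is something you build on top of these primitives rather than something the kernel maintains for you.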

However, the microkernel does not include a process manager, device managers, or a file system. The process manager, Proc, provides services such as process creation and accounting, memory management, inheritance, and pathname-space management. Together, the kernel and Proc provide the features necessary to implement a bare-bones operating system. Fsys (the file-system manager) and Dev (the device manager) can be added for more robustness. Like other QNX processes, device drivers run in user space, but use a specific API to access a kernel interrupt vector.

The networking manager (Net) is an optional component tied directly into the microkernel. There is a private interface between the kernel and the network manager, so that any messages passed from a local to a remote process are queued to the network manager. Net manages the sending and receiving of messages, essentially merging the microkernels on different nodes into a single, virtual microkernel.

The message-passing architecture, combined with networking services, produces a seamless, distributed system. From the standpoint of user processes, there is no difference between a local call and a call across the network. Likewise, all services above the microkernel are transparently accessible to all processes, whether or not they are local. For data acquisition, QNX can use a private connection between microkernels on a network. This lets you mirror a data-acquisition process without generating traffic on a network being used for other activities.

QNX can be extended. New modules can be developed in user space and debugged at the source level while still providing services normally associated with the kernel. QNX claims that customized OS services can be easily developed by application programmers. Because of the small number of APIs in the kernel and the limited number in the other QNX-provided components, the QNX learning curve isn't as steep as UNIX's.

The QNX microkernel consists of 605 lines of source code. A complete implementation of all the services necessary for process management, device management, the file system, and networking is under 16,000 lines. QNX also conforms to POSIX 1003.1, 1003.2 (shell and utilities), and 1003.4 (real time). With POSIX compliance and a similar command-line interface, is it possible to use QNX in place of UNIX? From my own experiments, the answer appears to be yes, at least in some circumstances. QNX Software is not positioning QNX as a general-purpose operating system, but there's no reason why it can't be used for almost any purpose.

--P.D.V.

An Interview with Linus Torvalds, Creator of Linux

Sing Li

Sing, a products architect with microWonders in Toronto, specializes in embedded-systems development, GUI portability, UNIX system programming, and device drivers. You can contact him on CompuServe at 70214,3466.


Linus Torvalds is a student at the University of Helsinki (Finland) working towards a master's degree in computer science. In 1990, he took an operating-systems course on UNIX and C and became hooked on OS design. Linus wanted to make his 386 PC function like the Sun workstations at the university. What started out as a protected-mode utility posted on the Internet eventually resulted in Linux, a widely popular 32-bit, protected-mode, preemptive multitasking operating system that runs on 386 PCs.

The Linux project now involves hundreds of programmers worldwide. It is available at ftp sites around the world, the most popular distributions being the MCC (Manchester Computer Center) in England and SLS (Softlanding Linux System) in Canada. The full distribution consists of kernel sources, C and C++ compilers, man pages, basic utilities, networking support, the X Window System, XView/OpenLook, DOS emulators, and much more. A comprehensive list of Linux distribution sites for downloading, as well as related information, is available electronically (see "Availability," page 3).

Linux supports an unlimited number of concurrent users. Each application runs in its own protected address space, greatly reducing the chance of system crashes brought on by ill-behaved applications. Applications on Linux can make use of either static or dynamically linked libraries.

Virtual memory is supported through demand paging, and up to a total of 256 Mbytes of usable swap space can be configured. Executables are demand loaded, which ensures efficient memory usage as well as better system performance. The memory manager supports shared executable pages with copy-on-write. There is a common memory-cache pool for both system and application use, which ensures that memory is best utilized wherever it is needed.
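Copy-on-write, mentioned above, is easy to model. In this Python sketch (invented structures, and a real kernel would track sharing with reference counts rather than the comparison used here), a fork shares all page frames, and a frame is duplicated only when one side writes to it.

```python
# Toy model of copy-on-write page sharing after a fork (invented names).

class Frame:
    """A physical page frame holding some data."""
    def __init__(self, data):
        self.data = data

class AddressSpace:
    def __init__(self, frames):
        self.pages = list(frames)   # page table: page number -> Frame

    def fork(self):
        """Share every frame with the child; copy nothing up front."""
        return AddressSpace(self.pages)

    def write(self, page, data, other):
        """Copy the frame first if it is still shared with `other`.
        (A real kernel uses reference counts; this is a simplification.)"""
        frame = self.pages[page]
        if frame is other.pages[page]:
            frame = Frame(frame.data)    # the copy happens only now
            self.pages[page] = frame
        frame.data = data

parent = AddressSpace([Frame("code"), Frame("heap")])
child = parent.fork()
print(parent.pages[0] is child.pages[0])   # True: shared after fork

child.write(1, "child heap", parent)
print(parent.pages[1] is child.pages[1])   # False: copied on write
print(parent.pages[1].data)                # heap (parent unchanged)
```

The payoff is that a fork costs almost nothing, and pages that are never written (such as program code) are never duplicated at all.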

Kernel support of networking is included for TCP/IP, both over standard Ethernet hardware and over asynchronous lines (via SLIP, serial-line Internet protocol). The operating system supports various national or customized keyboards. The PC console can act as multiple virtual terminals under Linux, using hot-key switching. Each virtual terminal acts independently and can be in either graphic or character mode.

I recently linked up with Linus over the Internet and asked him about the history (and future) of Linux.

SL: What was your motivation behind building Linux?

LT: I bought my first PC clone in early '91, and while I didn't want to run MS-DOS on it, I couldn't afford a real OS for it either. I ended up buying Minix, which I knew of from an OS course, and while it wasn't really what I hoped for, I still had a reasonable UNIX clone on my desk.

LT: Linux didn't start out to be an operating system: I just played around with the hardware to learn about the new machine, and found the memory management and process switching of the 386 especially interesting. After tinkering for a few months, my small project eventually became something that looked more and more like an OS. So I decided I wanted to create something that I could use instead of [running] Minix on my machine.

When I decided to create my own OS, compatibility became a major factor. I wanted to write just the OS: I didn't want to rewrite every program under the sun. That is still very much true, and Linux seems to be one of the easier UNIXs to port things to--it's a good mix of POSIX/SysV/BSD/SunOS4. My search for the POSIX documentation also got me in touch with arl, who was later to create the Linux directory on nic.funet.fi, the site where I still release my kernels.

SL: With commercial flavors of UNIX, standards are a hotly debated topic. What's your viewpoint on standards compliance?

LT: Simple adherence to standards isn't the Linux way. I (and others) have tried to make the system as usable as possible, and added some features just because they were interesting. I've strived for a simple and clean design within those constraints--at least as long as it's efficient. (I hate inefficient code and still fall back to checking the compiler output every now and then.)

SL: Since Linux seems to run almost everything that plain-vanilla UNIX will, exactly how different is the internal architecture between commercial UNIX and Linux?

LT: Well, the basic design has similarities: The kernel is monolithic, and processes aren't forcibly preempted while in kernel mode. So the architecture per se doesn't necessarily differ too wildly, but the actual code is likely to be rather different.

SL: Tell us about your programming style when dealing with developing a multitasking OS which runs a wide variety of software on a variety of hardware configurations.

LT: I try to avoid subtle code: If it isn't obvious what a routine does, it's likely to be buggy (or become so after a few changes). The way the scheduling works is rather hard to follow at times, and some of the file-system checks can seem incomprehensible unless you know what is happening. (I dislike locking, so the file-system code has to be very careful in order to avoid race conditions.) One of my personal favorites may be the select() code, which is definitely not obvious, but avoids races in interesting ways.

One of the most challenging aspects has been the wide variety of PC hardware: Drivers which work on most machines can fail subtly on others. Linux has good support for different kinds of hardware, but it has in some cases been a real trial to get it all to work, and there are still occasionally reports of machines that simply don't work correctly with Linux. It can be rather frustrating at times.

SL: What's in the future for Linux?

LT: I expect to continue working on it the same way I have so far: no real long-term planning, only a general idea about what I want to have. I, personally, have been handling only the actual kernel for a long time now, and I expect to continue with that: I hope others will find interesting projects in Linux (both in the kernel and in user space), as they have so far. I hope the Windows-emulation project will work out, along with the iBCS2 ("real i386 unix" binary compatibility) project: Those will open up new user areas when they arrive.

Figure 2: The QNX microkernel.

A Conversation with E. Douglas Jensen

Michael Floyd

Doug Jensen, technical director for real-time computer systems at Digital Equipment Corporation (DEC), has had a long career developing real-time systems. While an associate professor at Carnegie-Mellon University (CMU), Jensen developed the notion of a decentralized OS and created the Alpha OS kernel. Jensen's technology is now incorporated in DEC's Libra OS kernel. I recently spoke with Jensen about the use of microkernel technology in real-time operating environments, its benefits, and its future.

DDJ: How much of your work on the Alpha OS kernel is embodied in the Libra OS architecture?

EDJ: The Libra OS architecture embodies my understanding and experience from over 27 years of research and advanced-technology development in real-time computers and operating systems. My Alpha OS kernel at CMU is one of the primary intellectual progenitors of the Libra OS. Another is the Mach 3 kernel, which forms the commercial and standards context for the Alpha and other new real-time OS technologies in Libra. The concepts of distributed threads, time-value functions, and best-effort scheduling are based directly on extensions of Alpha kernel functionality.

DDJ: You say you've created a new paradigm for resource management in real-time systems. Describe this paradigm and tell us its relevance to other microkernels.

EDJ: The Libra OS architecture reflects the important expansion of real-time computing from its roots in small scale, centralized, low-level, sampled-data subsystems. Many real-time computing systems are becoming more complex and decentralized as they move up in the application-control hierarchy. But most traditional real-time concepts and techniques don't scale up. These small-scale ideas include hard deadlines as the only kind of computation-completion timeliness constraint, the requirement for application programmers to somehow map computation-completion time constraints onto fixed priorities, and the limitation of the real-time OS's responsibility for computational timeliness to starting the highest-priority computation as quickly as possible. These notions all are based on the pretense that a system can be deterministic, which is an oversimplification that usually works adequately in small scale but not in the large--much as Newton's "law" of gravity was revealed by relativistic physics to be a small-scale simplification of space-time curvature.

Libra's real-time paradigm is a generalization of the traditional concepts and techniques which allows the domain of real-time computing to encompass larger-scale, more-dynamic, more-decentralized applications. For example, time constraints can be expressed in terms of the benefit a computation provides, as a function of the time that computation completes execution. Libra OSs accept responsibility for adaptively managing resources according to those time constraints, to attain the best system timeliness possible under the current conditions. And Libra does this on an end-to-end basis across physically dispersed computing nodes. In contrast, commercial real-time OS and executive products are centralized--"distributed real-time" systems are actually non-real-time networks of centralized real-time nodes, without OS-enforced end-to-end timeliness.
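The time-value idea Jensen describes can be sketched concretely. In this Python illustration (all numbers and names invented, and exhaustive search standing in for a real best-effort algorithm), each job's benefit is a function of its completion time, a hard deadline is just a step function, and the scheduler picks the ordering that accrues the most total value.

```python
# Toy sketch of time-value-function scheduling (invented numbers):
# benefit depends on completion time, not just on meeting a deadline.

from itertools import permutations

def step_value(deadline, value):
    """A hard deadline as a special-case time-value function."""
    return lambda t: value if t <= deadline else 0.0

def decaying_value(soft_deadline, value, decay):
    """A softer constraint: value tapers off past the soft deadline."""
    return lambda t: value if t <= soft_deadline else max(
        0.0, value - decay * (t - soft_deadline))

def accrued_value(jobs, order):
    """Total benefit when jobs run back-to-back in `order`."""
    t, total = 0.0, 0.0
    for i in order:
        t += jobs[i]["cost"]        # execution time elapses
        total += jobs[i]["tvf"](t)  # benefit depends on completion time
    return total

def best_effort_schedule(jobs):
    """Exhaustive search over orderings (fine for a toy example;
    a real scheduler would use a heuristic)."""
    return max(permutations(range(len(jobs))),
               key=lambda order: accrued_value(jobs, order))

jobs = [
    {"cost": 2.0, "tvf": step_value(deadline=2.0, value=10.0)},
    {"cost": 1.0, "tvf": decaying_value(3.0, 6.0, decay=2.0)},
    {"cost": 3.0, "tvf": decaying_value(4.0, 8.0, decay=1.0)},
]
order = best_effort_schedule(jobs)
print(order, accrued_value(jobs, order))   # (0, 1, 2) 22.0
```

Under this formulation, a fixed-priority scheduler is just one degenerate policy; the scheduler above instead maximizes accrued benefit, which is the "best effort" notion the interview refers to.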

DDJ: What benefits do microkernels currently offer, and how will they evolve over, say, the next six years?

EDJ: The real-time application domain implies that it is no longer possible for one or two kinds of real-time OSs--a small real-time executive and a full-function real-time UNIX, for instance--to meet user needs. It even appears that a general-purpose, real-time distributed OS may be theoretically impossible. The only feasible solution may be a modular OS that can be configured to meet the needs of particular real-time applications; microkernels will facilitate this structure. The classical layered organization of OSs and system and application software will relax to more of a "depends on" hierarchy of distributed objects. A modular OS is more than an unconstrained collection of building blocks--for manageability, it requires an OS architecture specification which all these different configurations comply with. First-generation microkernels exist today, but this kind of modular OS--real-time or not--is still in the research stage.


Copyright © 1994, Dr. Dobb's Journal