Ray Duncan is a software developer for Laboratory Microsystems. You can reach him at 12555 W. Jefferson Blvd, Ste. 202, Los Angeles, CA 90066.
Regardless of whatever other nits you might pick with OS/2, you could never complain that its array of interprocess communications (IPC) facilities is too austere. In fact, OS/2's IPC support could aptly be termed an "embarrassment of riches" (the title of a recent book, by the way, which has nothing whatsoever to do with protected mode programming). Browsing through the reference manual, one gets the distinct impression that the IBM/Microsoft IPC Design Subcommittee couldn't agree on anything, so they threw in everything! OS/2 offers all of the following classic IPC mechanisms: semaphores, pipes, shared memory, queues, and signals. When the LAN Manager is running, OS/2 also supports an IPC mechanism called mailslots (which will not be mentioned further in this article).
Regrettably, while the IBM and Microsoft manuals are passably complete on the "how to" for each individual IPC method, they are remarkably stingy with the "which to," "why to," and "when to." In this article, I'll try to provide you with a somewhat more cosmic overview of OS/2 IPC, including some ballpark comparisons of capability, performance, and throughput.
Most of the IPC mechanisms listed above rely on named, global objects or data structures that are controlled and maintained by the operating system. The names are said to be in the "file system name space"; that is, they have the general format of filenames, with the same elements and delimiters, and are subject to the same constraints on length and valid characters. The names of IPC objects are distinguished from the names of true files by a reserved path (such as \SHAREMEM\, \PIPE\, or \QUEUES\).
The resemblance between IPC objects and files does not end with their naming. To gain access to most types of IPC objects, a program must first "open" or "create" the object in a manner analogous to opening or creating a file. OS/2 then returns a token (a selector or an arbitrary "handle"), which the process uses to manipulate the IPC object: reading, writing, querying the number of waiting messages, and so on.
In order to understand OS/2 IPC, it's also crucial that you grasp two essential OS/2 terms: processes and threads. In its simplest form, a process is conceptually equivalent to a program loaded for execution under MS-DOS. OS/2 creates a process by allocating memory to hold its code, data, and stack, and by initializing the memory from the contents of a program (.EXE) file. Once it is running, a process can obtain additional resources --such as memory and access to files -- with appropriate system function calls.
The OS/2 module that oversees multitasking, however -- the scheduler -- cares nothing for processes; it deals with entities called threads. A thread consists of a set of register contents, a stack, an execution point, a priority, and a state: executing, ready to execute, or waiting for some event ("blocking"). Each process starts life with a primary thread, whose execution begins at the entry point designated in the .EXE file header, but that thread can start other threads within the same process, all of which execute asynchronously and share ownership of the process's resources.
Here's why the distinction between processes and threads is important when discussing IPC. When a process opens or creates a semaphore, pipe, queue, or shared memory segment, OS/2 returns a handle that can be used by any thread within that process. But when a thread issues an OS/2 function call that blocks on (waits for) an IPC event -- such as the clearing of a semaphore or the availability of a queue message -- the other threads in the same process continue to run unhindered.
Semaphores are simple IPC objects with two states. These two states can, in turn, be viewed in two different ways, depending on how a semaphore is being used. When a semaphore is being used for signalling between threads or processes, it is said to be either "set" or "clear." Typically, one thread sets the semaphore, and then clears it upon the occurrence of some event; other threads, which wish to be notified of the same event, "block on" the semaphore by issuing a "semaphore wait" function call that does not complete until either the semaphore is cleared or a designated timeout interval has elapsed.
When a semaphore is being used for mutual exclusion, it is said to be either "owned" or "available." In this model, the semaphore symbolizes a resource (such as a file or a data structure) that would be corrupted if it was manipulated by more than one thread or process at a time. To prevent such damage, threads or processes cooperate by refraining from accessing the resource unless they have acquired ownership of the corresponding semaphore with an OS/2 function call.
Aside from the two ways in which they may be used, OS/2 semaphores come in three flavors: system semaphores, RAM semaphores, and Fast-Safe RAM semaphores. System semaphores are named, global objects that reside outside every process's memory space and are completely under the control of the operating system. They must be "opened" or "created" with a system call before they can be used. System semaphores support "counting"; that is, a process can make "nested" requests for ownership of the semaphore, and the semaphore will not become available again until a corresponding number of "release" calls have been issued. OS/2 also provides cleanup support for system semaphores; if a process dies owning a semaphore that another process is waiting for, that other process will be notified with a unique error code.
RAM semaphores, on the other hand, reside in memory controlled by a process. They consist of an arbitrary, but properly initialized, doubleword of memory in the application's address space, and the "handle" for a RAM semaphore is just its address (selector and offset) -- no "open" or "create" operation is required. The number of RAM semaphores that a process may use is limited only by the amount of virtual memory it can allocate. RAM semaphores are used to communicate between threads, but since memory segments can be shared, they can also be used to communicate between processes. In the latter case, OS/2 does not provide any assistance if a process dies owning a RAM semaphore and another process is waiting for the same semaphore.
The so-called Fast-Safe RAM semaphores, which were added to OS/2 in Version 1.1, combine characteristics of both system semaphores and RAM semaphores. They are implemented as 14-byte structures in a process's own memory space; so, like plain vanilla RAM semaphores, the number of Fast-Safe RAM semaphores that a process can use is huge. Like system semaphores, Fast-Safe RAM semaphores support "counting" and are also endowed with a certain amount of clean-up assistance by the operating system. Unfortunately, Fast-Safe RAM semaphores must be manipulated with special-purpose function calls --the general purpose set, request, wait, and clear functions employed for both system and RAM semaphores cannot be used --and they support only the "owned/available" model for mutual exclusion.
Pipes, which were first popularized under Unix, are basically conduits for byte streams. In OS/2, processes refer to pipes with handles that are allocated out of the same sequence as file handles, and they read and write pipes with the same function calls as are used for files. The transfer of information through a pipe is much faster than it would be through an intermediary file, however, because the ring buffer that implements a pipe is always kept resident in memory.
OS/2, Version 1.1, supports two different species of pipes: anonymous pipes and named pipes. When a process creates an anonymous pipe, no global name is involved; the system merely returns read and write handles. These handles can be inherited by child processes, which is what enables anonymous pipes to be used for IPC. However, because a child has no way to predict what handle should be used for what, a common practice is for a parent process to redirect the child's standard input and standard output handles to pipe handles, so that the child unknowingly communicates with the parent rather than with the keyboard and display. The corollary handicap of anonymous pipes is that processes which are not direct descendants of a pipe's creator cannot inherit handles for the pipe and thus have no way to access it.
Named pipes, on the other hand, are global objects, and any process --related or unrelated to the pipe's creator --can open the pipe by name to obtain handles for reading and writing. Another important feature of named pipes is that they can be used in either byte stream mode or message mode. In byte stream mode, a named pipe behaves like an anonymous pipe --the exact number of bytes requested is always read or written. In message mode, a named pipe acts more like a first-in-first-out (FIFO) queue: the length of each message written into the pipe is encoded in the pipe, and a read operation returns at most one message at a time regardless of the number of bytes requested. Last but not least, named pipes can be used to communicate between processes running on two different nodes of a network, simply by prefixing the name of the pipe with the name of the target machine.
Shared memory segments are potentially the most efficient of all OS/2's IPC mechanisms. If two or more processes have addressability for the same segment, they can theoretically pass data back and forth at speeds limited only by the CPU's ability to copy bytes from one place to another, with no need for additional calls to the operating system. Of course, the threads and processes using a shared segment are responsible for synchronizing any changes to the segment's content, and this synchronization is often most convenient to accomplish with semaphores (requiring system calls after all).
OS/2 supports two distinct methods by which processes can share memory: creation of named segments, and giving and getting of selectors for anonymous segments. Each method offers different advantages for security and speed of access. Named segments are restricted to a maximum size of 64K bytes; once a named segment is created, any process that knows the name of the segment can "open" it to obtain a selector with which it can read or write the segment. The segment persists until all the processes that have valid selectors for it have either released the selector or terminated.
Anonymous segments, on the other hand, can be any size at all (huge segments, consisting of logically contiguous 64-Kbyte segments, can be as large as available virtual memory), but sharing is more difficult to arrange. The selectors for such shared segments must be explicitly made addressable for each process that needs them, and passed between the processes by some other means of IPC. One technique, called segment giving, requires the process that created a segment to request an additional selector for use by a specific other process, and then to send the selector to that process.
The other technique, segment getting, requires the creating process to pass its own selector for the segment to the other process by some IPC mechanism. The other process then gains addressability to the shared segment by issuing a function call that makes the selector valid. Segment getting allows far pointers to be passed around freely, but it is correspondingly less secure than the use of giveable selectors.
Queues are the most powerful IPC mechanism in OS/2, and inevitably are also the most complex to use. Queues are named global objects, and any process which knows a queue's name can "open" it and write records into it, although only the process which created the queue can read messages from it or destroy it.
In essence, an OS/2 queue is an ordered list of shared memory segments; the operating system maintains and searches the list on behalf of the communicating processes. Data in the queue is not copied from place to place; instead, pointers are passed from the queue writer to the queue reader (the operating system also provides the queue reader with supplementary information such as the process ID of the queue writer). The items in a queue can be ordered in several different ways: first-in-first-out (FIFO), last-in-first-out (LIFO), or by a priority in the range 0 through 15. Moreover, the queue reader has the freedom to inspect and remove queue messages in any arbitrary order, if it needs to.
Writing a message into a queue is a relatively complicated process. First, the queue writer must allocate a "giveable" memory segment and build the queue message in it. Next, the writer must obtain a giveable selector for the segment that is valid for the queue reader. Finally, the writer must request the queue write, passing the giveable selector, and release its own original selector for the segment. Thus, a minimum of four system calls are typically required at the queue writer's end for each queue transaction. At the queue reader's end, luckily, only two system calls are usually required: one to read the message (obtain a pointer to the message and its length), and one to release the selector for the segment containing the message after it has been processed.
Signals, which (like pipes) have their conceptual origin in Unix, are analogous to a hardware interrupt. They are unique among OS/2's IPC mechanisms in that the time of a signal's arrival is not completely under the control of the receiving process. OS/2 supports two classes of signals. The first class, which consists of signals generated by the operating system, includes the following:
SIGINTR        a Ctrl-C was detected
SIGBREAK       a Ctrl-Break was detected
SIGTERM        the process is being terminated
SIGBROKENPIPE  a pipe read or write failed
Signals in the second class are explicitly sent by one process to another. These are known as event flags, and three types are available (each of which may have a distinct handler): Flags A, B, and C. Event flag signals may be accompanied by an arbitrary word (16 bits) of data.
For each signal type, a process may either register its own handler, instruct the system to ignore the signal, or allow the system's default handler to take its usual action. If a particular signal occurs and the process has previously indicated its desire to service that signal type, the primary thread of the process is transferred forcibly to the routine designated as the signal handler. When the handler completes its processing, control is restored to the point of interruption.
The system's default handling of the different signal types varies. SIGTERM terminates the target process. SIGINTR and SIGBREAK are fielded by the ancestor process which has registered an appropriate handler; if this ancestor is CMD.EXE or the Presentation Manager shell, SIGBREAK and SIGINTR are translated to SIGTERM. SIGBROKENPIPE and the Event Flag signals, on the other hand, are by default discarded.
From the preceding discussion and the summary in Table 1, it is clear that the characteristics of OS/2's various IPC facilities vary drastically. Yet, at least several of them can be made to do essentially the same job. How does one assess their relative performance and suitability for a specific application? The OS/2 documentation gives little guidance here, except to note in passing that RAM semaphores are faster than system semaphores, semaphores in general are faster than everything else, and pipes are faster than queues.
Table 1: Characteristics of the OS/2 IPC mechanisms.

IPC Mechanism             Global Name Form   Resident/        Maximum Data Held
                                             Swappable
-------------------------------------------------------------------------------
RAM Semaphore             not applicable     Swappable        set/clear or
                                                              owned/available
Fast-Safe RAM Semaphore   not applicable     Swappable        owned/available
System Semaphore          \SEM\name          Resident         set/clear or
                                                              owned/available
Anonymous Pipe            not applicable     Resident         64 Kbyte
Named Pipe                \PIPE\name         Resident         64 Kbyte
Anonymous Shared Memory   not applicable     Swappable        limited only by
                                                              virtual memory
Named Shared Memory       \SHAREMEM\name     Swappable        64 Kbyte per
                                                              named segment
Queue                     \QUEUES\name       Swappable        limited only by
                                                              virtual memory
Signal (Event Flag)       not applicable     not applicable   16 bits passed
                                                              with signal
In order to get a feel for these issues, I carried out some simple timings on the most commonly used IPC methods, which I will describe shortly. The timings were obtained on an IBM PS/2 Model 80 at 16 MHz with 4 Mbytes of RAM, running under IBM's OS/2 Standard Edition, Version 1.1. The relevant CONFIG.SYS parameters were:
BUFFERS=30
BREAK=OFF
DISKCACHE=64
IOPL=YES
MAXWAIT=3
MEMMAN=SWAP,MOVE
PROTECTONLY=NO
RMSIZE=640
THREADS=128
The only significant processes that were running during the timings were the Presentation Manager shell and two instances of LMI UR/FORTH in PM windows. I judged the system to be lightly loaded, a conclusion supported by my observation that no swapping occurred during the timings (as evidenced by the fixed disk light) and by the fact that the DosMemAvail function returned the size of the largest block of available physical memory as 1,367,520 bytes.
The programs used to obtain the timings were written in LMI UR/FORTH, my own company's protected mode Forth interpreter/compiler for OS/2. Forth is an ideal language for this sort of system probing because it is fast enough for real-time work, yet it affords interactive, direct access to all operating system functions.
Let's look first at the semaphore family. To appraise the relative speeds of system, RAM, and Fast-Safe RAM semaphores for both the "signalling" and "mutual exclusion" models, I timed 100,000 request/release cycles and set/clear cycles for each semaphore type (Table 2). The tare time for the loop was determined by substituting a dummy function for each system call that simply returned a success status; this time was then subtracted from the total before calculating the cycles per second.
Table 2: Semaphore performance on a lightly loaded system.

Semaphore Type            Request/Release     Set/Clear
                          Cycles per Second   Cycles per Second
------------------------------------------------------------------
RAM Semaphore                  16,507             17,156
Fast-Safe RAM Semaphore        17,066             not applicable
System Semaphore                7,464              7,532
As you can see from Table 2, the difference between the performance of system and RAM semaphores is not nearly as great as you might expect from reading the OS/2 technical manuals. Your selection of system, RAM, or Fast-Safe RAM semaphores should really be made on other grounds. I have already mentioned some of the important differences (counting and cleanup), but there are additional subtle differences that might prove important in a real-life project.
First, the apparent performance advantage of RAM semaphores in a lightly loaded system cannot be generalized to a heavily loaded system. System semaphores are implemented in fixed, non-swappable memory owned by the operating system; the access time to a system semaphore will always be consistent. In contrast, RAM semaphores are located in memory owned by a process -- which is by default moveable and swappable. If the segment containing a RAM semaphore has been swapped out to disk, a reference to the semaphore could be delayed for an unpredictable length of time (on the order of tens or even hundreds of milliseconds) until the virtual memory manager can roll the segment back into physical memory.
Another important aspect of system semaphores is that they are implemented in memory below the 640K-byte boundary, so that they can be addressed in either real mode or protected mode. This is vital if you wish to use semaphores to communicate between a closely coupled process and a device driver, and the driver might need to manipulate the semaphore while servicing a hardware interrupt, because the CPU mode at the time of an interrupt cannot be predicted.
Finally, we should note that the location of system semaphores in physical memory severely constrains the number that OS/2 can make available. The memory below the 640K-byte boundary is dear, because it must be conserved for the execution of real-mode programs in the DOS Compatibility Environment. Consequently, the maximum number of system semaphores is 128 in OS/2, Version 1.0, and 256 in OS/2, Version 1.1, and many of these are used up by the operating system itself. If you need large numbers of semaphores in your application, you will have to use RAM or Fast-Safe RAM semaphores and simply work around their other limitations.
As I thought about assessing the relative throughput of message passing using shared memory, pipes, and queues, I realized that simplistic timings of system calls would not be very helpful. The amount of tangential work that is associated with the use of these IPC mechanisms can be fairly extensive (allocating and deallocating memory segments, setting and clearing semaphores to control access to shared segments, copying data to and from local buffers, and so on).
Eventually, I settled upon a timing model which, I think, is at least reasonably parallel to the IPC performed by real applications. I obtained each set of timings by running two processes, a parent and a child. The parent's only function was to launch the child, then serve as a message turnaround point. As the parent received each message from its child via the IPC mechanism under test, it would simply do whatever was necessary to ship the message back to the child again (a more detailed sketch of the timing procedure for each IPC method can be found in Figure 1, Figure 2, and Figure 3). A consistent message size of 512 bytes was used.
The results, which are reported in Table 3, are based on 100,000 message round-trips --from child to parent and back again. The tare times were found and subtracted using equivalent loops where the system calls had been replaced with dummy functions that returned a success status or other reasonable result.
Table 3: Message-passing throughput for 512-byte messages.

IPC Method          Message Round-Trips
                    per Second
------------------------------------------------------------
Shared Memory               661
Anonymous Pipe              346
Queue                        76
IPC performance via shared memory segments, even with the overhead of system calls to set and clear the RAM semaphores that synchronize access to the segments, is seen to be far faster than either pipes or queues. In fact, because processes can easily simulate the behavior of a pipe by explicitly controlling a ring buffer in a shared segment, the use of pipes for any reason other than "transparent" communication with an oblivious child process is probably ill-advised.
Communication by queues turns out, as expected, to be the slowest method. It is an order of magnitude slower than IPC using shared memory, and two orders of magnitude slower than signalling with system semaphores. It seems clear that IPC with queues should be reserved for those occasions where message prioritizing and selective message scanning and extraction are really needed. The complexity of queue manipulation, the number of system calls involved, and the relatively heavy demand for system resources, such as sharable selectors, should deter you from casual use of queues.
As with the semaphores, these comparisons on a lightly-loaded system could turn out quite differently on a heavily-loaded system, where applications have over-committed virtual memory and the virtual memory manager and swapper are constantly busy. Pipe performance should be relatively consistent, because the system buffers used by pipes are not swappable. On the other hand, named shared memory segments, and the giveable shared segments used in queue messages, are swappable, so IPC performance via shared memory or a queue could be quite erratic depending on swapper activity, thread priorities, and so on.
Although OS/2 has gotten off to a slow start, its eventual importance in the desktop computer world can no longer be doubted. I feel strongly that the appearance of the high-performance file system (HPFS) and 80386-specific versions over the next year or so will make it the platform of choice for software developers. Users will migrate more slowly (we have the history of the Macintosh to guide us here), but the benefits of OS/2's multitasking, virtual memory, and graphical user interface will eventually draw them in.
With such a complex system, though, the ad hoc design methods we all used in the CP/M and MS-DOS days will no longer cut the mustard. We need detailed and reliable metrics that can help us make tradeoffs between code size, code complexity, and code performance at every level of an application --in short, we need an understanding of the operating system's overall behavior that has never before been necessary in the microcomputer world. The timings presented in this article are crude and their scope is narrow, but perhaps (with luck) they will inspire successor articles by wiser and more experienced DDJ readers!
Copyright © 1989, Dr. Dobb's Journal