GETTING A HANDLE ON VIRTUAL MEMORY

Virtual memory management should be supported by the compiler

Walter Bright

Walter is the director of the compiler development division at Zortech Ltd. He has a degree from Cal Tech and can be reached at 4819 118th Ave. N.E., Kirkland, WA 98033.


Programmers are well aware that there are serious memory limitations when programming under MS-DOS on the PC. These problems are variously called "ram cram," "the 640K barrier," "brain-damaged 8086" -- and occasionally even more colorful terms. There are several different strategies for dealing with this memory limitation problem, including the implementation of virtual memory managers.

Most existing software-based virtual memory managers are clumsy to use: They're prone to obscure, and severe bugs that can make regular pointers in C look simple. Furthermore, the conventions of these packages had to be rigorously adhered to. Even if the program using a virtual memory manager finally got debugged, the results were inefficient and the syntax was aesthetically ugly. Ugly syntax is a sure sign of a poor solution.

A better solution to virtual memory is to have the compiler directly support it. A C or C++ compiler could support a new pointer type called a "handle." The syntax for accessing memory referenced by the handle would then be taken care of by the compiler. This article describes the implementation of handles for expanded memory under MS-DOS using Zortech C/C++.

Handle Pointers

The handle refers to dynamically allocated data, serving the same general purpose as a regular pointer. The difference is that while data pointed to by a regular pointer is allocated on the heap, the data pointed to by a handle can reside in expanded memory, extended memory, or a disk sector.

To actually refer to the data, the handle must be converted into a pointer by a function. The function extracts a logical page number and an offset from the handle; the logical page is then swapped into a physical page, and the offset is added to the physical page address. The result is returned as a pointer. This process simulates virtual memory swapping in software.

For handles to be useful, they must adhere to the following criteria:

How will handles look in source form? With PC C compilers, it is already common to support multiple pointer types with the syntax shown in Figure 1.

Figure 1: Typical pointer types used in most PC implementations of C.

  void *p;         /* pointer type is default for the memory model */
  char far *po;    /* far (segment and offset pair) pointer        */
  Int near *p;     /* near (offset only, segment is assumed)       */
Supporting handles in the same style, then, suggests the following syntax:
  long _handle *h; /*h is a handle to a long */

The keyword _handle was chosen because it is compatible with name space rules for extensions to ANSI C. The underscore convention is used because, as it turns out, the identifier handle is used by a lot of existing code. Finally, the keyword virtual would conflict with C++ use of that keyword.

Implementation

A handle is a 32-bit type. The high 16 bits refer to the page, in a manner defined by the library implementation. The low 16-bits form the offset into that page. Obviously, this restricts the page size to less than 64K in length. Handles are unique; no two handles can refer to the same location in handle space.

Pointer arithmetic with handles follows the same rules and behaviors as does far pointer arithmetic (that is, only the 16-bit offset is manipulated). Comparisons are also handled the same way as for far pointers, that is, for < <= >= and > the comparison is done for the 16-bit offset only. For = = and !=, comparison involves the full 32-bit value.

The high 16 bits of a handle refer to the page where the data is stored. Because it's desirable to avoid a performance penalty if expanded memory is absent, the format of the page reference is a bit tricky. If expanded memory is not present, handles will point into real memory. The best way to do that is with a far pointer, the handle format must be able to distinguish a handle from a regular far pointer.

The 8086 has a 20-bit address space, of which the high 16 bits form the segment. All possible 16-bit segment values are valid segment values, so there isn't a simple way to distinguish a handle from a far pointer. On the PC, the ROM BIOS occupies the high end of the address space. Programs almost never access that part of the address space (and if they do, they can use a regular far pointer). Valid handle values are only manipulated by library functions, and are only created for dynamically allocated data.

Segment values from 0xFE00 to 0xFFFF can therefore be defined as handles. (This is controlled and is adjustable by the library portion of the handle support.) This yields 256 pages. Because this is the number of pages that will be allocated in expanded memory and each expanded memory page has 16K bytes in it, 256 pages make for s total of 4 Mbytes of handle memory space This ought to be sufficient space for most applications, and you can extend the range of handles to more than 0xFE00 to 0xFFFF.

It's the library implementation's job to distinguish a handle from a far pointer. The segment portion is compared against 0xFE00. If the 16-bit segment is less than 0xFE00, the handle is a regular far pointer. Otherwise the upper 16 bits of the handle refer to logical page number (segment-0xFF00).

Conversions from handles to far pointers occur whenever a handle pointer is dereferenced or when a handle is cast to a far pointer. The conversion process is done by a library routine that swaps the logical page into the address space and returns a far pointer into it Figure 2 shows examples that do handle conversions to far pointers. Conversions from far pointers to handles are just a type of point that is, no change in the bit pattern.

Figure 2: Converting pointers from handle to far

  int handle *h;
  struct A_handle *h2;
  int far *t;
  int i;
  extern void func(int far *pi);

  f = h;
  *h - i;
  h[3] = *t;
  i = *(h + 6);
  h2->b = i;
  func(h);
  h = (int far *) h;

The optimizer is aware that handles are a special type. Its main job is to determine when a new handle dereference is necessary and when a previous conversion can be used instead. Consider the example in Figure 3. Clearly, h only needs to be converted to a far pointer once. It converts the code to that shown in the second portion of Figure 3.

Figure 3: Example showing that the optimizer is handle aware

        struct {int a,b;} handle *h;
        h->a = 1;
        h->b = 2;
  /* Converted code */
        struct {int a,b;} _handle *h, far "p;
        p = h;
        p->=a=1;
        p->=b=2;

The result of a conversion in Figure 3 cannot be used if:

    1. The value of the handle might have changed.

    2. A handle dereference was done on another handle (thus possibly over writing the previous page with an other).

    3. A function was called (because that function may convert other handles, resulting in case 2).

Because handle memory is larger than physical memory, pages from handle space are swapped into buffers in physical space. Each handle conversion may result in a new page from handle space overwriting a previous page. Invalidating any pointers into that page. This explains the reasoning behind assuming that any handle conversion invalidates any previous conversions.

Of course, you can always convert a handle to a pointer yourself. If you know that a function call does not convert any handles, then the conversion is still valid after the function call. Also, if the library implementation of handle conversions uses more than one physical page (the expanded memory version described shortly uses four), you can rely on at least that many conversions being valid simultaneously.

Handles and Expanded Memory

Let's look at how Zortech C and C++ implement handles in turns out that expanded memory is particularly well suited for implementing handles. It is fast, swapping a logical page into a physical page usually requires a simple write into an I/O register. Expanded memory can be efficiently emulated on 386 machines by using the memory mapping lectures of the CPU. (QEM 386 from Quarter Deck is an example of this type of utility.) There are even emulators that fake expanded memory by using extended memory (memory above 1 Mbyte) or even a hard disk. A final impetus for using expanded memory is that it is part of DOS 4.0.

The expanded memory concept is that one or more logical pages of bank-switched memory are mapped into real memory (physical pages) as needed. This corresponds directly to the handle concept of a large amount of memory (handle space) of which a subset is mapped into physical memory buffers. The mapping of a logical page to a physical page occurs when a handle is dereferenced.

Expanded memory makes it possible to have two physical pages actually map onto the same logical page. In other words, expanded memory allows two different addresses to refer to the same memory location! This implementation of handles carefully avoids depending on this "feature," as that capability is impossible to emulate with disk or extended memory implementations of expanded memory.

Initialization is handled automatically by the C run-time startup, so no extra work is necessary. Termination is a bit trickier. When a program terminates and returns to DOS, DOS automatically frees up any memory used by the program and makes it available for the next one. Unfortunately, this does not happen with expanded memory. A program must always explicitly free its expanded memory pages or they will be unavailable for use by other programs until the machine is rebooted. All exit paths from the program must be covered. Even Ctrl-break must be intercepted.

Storage allocation in handle space is done by handle_malloc( ), handle_realloc( ), and related functions. Each 16K page is converted into a heap, with the usual free list data structure. There is a special data structure in page 0 that contains the size of the maximum available free block in each page, so the storage allocator can directly swap in the page it needs to allocate from, instead of sequentially swapping in each page until the allocation succeeds. This is of critical importance if a disk-swapping expanded memory emulator is used! If there is insufficient free expanded memory space to satisfy an allocation request, or if the size of a request exceeds 16K, the routines fall back to allocating from conventional DOS memory.

The Expanded Memory 3.2 subset of EM 4.0 is used to provide maximum portability; in fact, the more exotic features of 4.0 aren't needed for handle support. It's possible to use expanded memory in addition to using handles, though the two uses should be kept independent to avoid conflicts.

When to Use Handles

The overhead of dereferencing handles is much higher than the overhead of dereferencing pointers, both in program size and speed. Therefore, the best candidates for handles are those data structures that are infrequently accessed both in the number of times the dereference statically occurs in the program and in the number of times the dereferences are executed.

Handles give a sharp increase in the amount of memory available. The proper data structure for something that will reside in handle space is one that favors speed over memory compactness. Locality of data is also important (locality means that related data should be clustered into the same page, increasing the likelihood that the desired page is already swapped in). For example, a bubble sort across data in handle space is to be avoided. If your expanded memory implementation swaps to disk, the swap file on disk may get read and written in its entirety several times during such a sort!

If database-wide searches are necessary, try putting the access structure in conventional memory and put the leaves in handles. This makes the lookup fast and efficient, and the end data is paged in only when it is needed.

Listing One (page 110) shows a version of the classic Unix wc (word count) program, converted from using the standard pointer and dynamic storage allocation to using handles and handle storage allocation. The primary advantage of using handles in a program such as wc is that they increase the program's capacity without a significant performance penalty. The handle version of wc is just as fast as the original version, which used regular pointers. Files with several megabytes of text can be processed by the handle version (assuming that expanded memory is available).

Portability is achieved simply: The_handle keyword is #defined to nothing, so all handles revert to being regular pointers. Second, the handle storage allocation routines can be #defined to be the standard malloc, free, and so on, routines. Thus, handle-specific code can quickly be made portable to other compilers and environments. The Zortech compiler provides a simple mechanism for eliminating handles from a program: Simply add the following line to any program before it #includes handle.h: #define NO_HANDLE 1

Debugging

Debugging C applications that use a lot of dynamically allocated memory is notoriously difficult. Experienced C programmers have reluctantly learned to live with this. Unfortunately, handle-based applications have all the liabilities of pointers, plus a few more. The worst problem is when a stray pointer places unwanted data into a page that has been swapped out. Such a bug will only rarely exhibit symptoms, and therefore can be difficult to pin down. Another problem occurs when more than four dereferenced handles are used simultaneously. The bug will only show up if they happen to all fall on different logical pages. Here are some techniques you can use to deal with these problems:

When converting code from using malloc to handle_malloc, watch out for converting from: p = (char *) malloc(n); to: h = (char *) handle_malloc(n); instead of the correct: h = (char _handle *) handle_malloc(n); The cast to (char *) dereferenced the handle and stored a far pointer into h instead of the handle!

Other Uses

Handle space could be a disk file. The disk file is divided up into blocks of equal size (16K seems a good number). These are paged in and out of a small number of physical buffers (>=2). An advantage of a disk file implementation is that the limit on the size of handle space is equal to the size of the disk. Also, the data can persist from one invocation of the program to another. Disk paging would, however, be rather slow.

Extended memory can be used for handles space. Extended memory is the memory above 1 Mbyte on 286- and 386-based computers. It is only accessible via protected mode (DOS runs in real mode). To use extended memory, the CPU must switch into protected mode, copy the extended memory pages to and from real-mode buffers, and switch back to real mode. It's not as fast as expanded memory hardware, but it's much faster than paging to disk.

The minimum number of simultaneous accesses is determined by the number of physical pages available simultaneously. With expanded memory, this is four. It is recommended that at least two be available, otherwise functions such as memcpy( ) could not be used to directly copy from one handle to another; instead, a temporary buffer in real memory would have to be used. Programs written for Microsoft Windows use a handle-like storage allocation scheme. The library implementation of the handles could be written as a shell around the Windows functions, thus easing one of the more frustrating things about programming for Windows.

Handles in VM Operating Systems

Handle pointers are useful for other applications besides extending the memory space available. Under virtual memory operating systems, such as Unix or OS/2, there is no need for more memory. There is also no hardware support for bank-switched memory like expanded memory.

Alas, one fundamental problem always remains. There is usually a difference in how a data structure is stored in memory and how it is stored on disk. Pieces of a data structure in memory are typically connected to one another by pointers. Memory pointers written to disk have no meaning when read back off a disk, so the data structure must be translated when written to disk. The memory pointers are converted into symbolic references when written out; when the data structure is read back from the disk, the symbolic references are converted back to real pointers.

Handle pointers can solve this problem. Recall that a handle consists of a logical page number and an offset into that page. The logical pages can simply be represented by disk blocks. The library implementation of handle pointers would be rewritten to read and write pages from a specified disk file rather than bank-switching expanded memory. The in-memory data structure is exactly the same as the disk file structure. Writing out the data file becomes a call to a handle function that flushes any in-memory pages to disk. Reading in the data structure becomes the simple task specifying which file to use! Dereferencing a handle will cause its page to be read in from disk. As an additional benefit, disk I/O will be minimized. Two major applications that might benefit from this approach are databases and applications that consist of multiple programs that share data via files.

Wrapping Up

Handle pointers are an elegant solution to an entire class of programming problems. The first (and what the Zortech implementation initially addresses) is extending the address space available. The second is in simplifying and speeding up writing a data structure to disk and reading it back again. The third is in defining a file format by which data can be easily read and written to files accessed by multiple programs.

_GETTING A HANDLE ON VIRTUAL MEMORY_ by Walter Bright [LISTING ONE]



/* Compile with: ZTC wc -ml                                */
/* (Use large model so strcmp() can handle far pointers.)  */

#include        <stdio.h>
#include        <stdlib.h>
#include        <string.h>
#include        <ctype.h>
#include        <handle.h>

struct tree
    {
        char __handle *word;
        int count;
        struct tree __handle *left;
        struct tree __handle *right;
    };
int readword(char *w, int nbytes)
    {
    int c;
    do  {
        c = getchar();
                if (c == EOF)
                        return 0;
        }
    while (!isalpha(c));
    do  {
        if (nbytes > 1)
            {
            *w++ = c;
                        nbytes--;
            }
        c = getchar();
        }
    while (isalnum(c));
        *w = 0;
        return 1;
    }
void tree_insert(struct tree __handle * __handle *pt, char *w)
    {
        int cmp;
        struct tree __handle *p;
        while ((p = *pt) != NULL)
        {
        if ((cmp = strcmp(w,p->word)) == 0)
            goto gotit;
                pt = (cmp < 0) ? &p->left : &p->right;
        }
    p = (struct tree __handle *) handle_calloc(sizeof(struct tree));
        if (!p || (p->word = handle_strdup(w)) == NULL)
        {
        printf("Out of memory\n");
                exit(EXIT_FAILURE);
        }
    *pt = p;
    gotit:
        p->count++;
    }
tree_print(struct tree __handle *p)
    {
        while (p)
        {
        tree_print(p->left);
        printf("%5d %s\n",p->count,(char far *)p->word);
                p = p->right;
        }
    }
tree_free(struct tree __handle *p)
    {
    struct tree __handle *pn;
        while (p)
        {
        handle_free(p->word);
                tree_free(p->left);
                pn = p->right;
                handle_free(p);
                p = pn;
        }
    }
main()
    {
        struct tree __handle *root = NULL;
        char word[32];
        while (readword(word,sizeof(word)))
        tree_insert(&root, word);
    tree_print(root);
    tree_free(root);
        return EXIT_SUCCESS;
}


[Figure 1: Typical pointer types used in most PC implementations of C]


void *p;        /* pointer type is default for the memory model */
char far *pc;   /* far (segment and offset pair) pointer        */
int near *p;    /* near (offset only, segment is assumed)       */

[Figure 2: Converting pointers from handle to far]


R 65,T 5
        int __handle *h;
        struct A __handle *h2;
        int far *f;
        int i;
        extern void func(int far *pi);

        f = h;
        *h = i;
        h[3] = *f;
        i = *(h + 6);
        h2->b = i;
        func(h);
        h = (int far *) h;




[Figure 3: Example showing that the optimizer is handle aware]

   struct { int a,b; } __handle *h;
   h->a = 1;
   h->b = 2;
/* Converted code */
        struct { int a,b; } __handle *h, far *p;
        p = h;
        p->a = 1;
    p->b = 2;