GETTING A HANDLE ON VIRTUAL MEMORY

  void *p;         /* pointer type is default for the memory model */
  char far *po;    /* far (segment and offset pair) pointer        */
  Int near *p;     /* near (offset only, segment is assumed)       */

Supporting handles in the same style, then, suggests the following syntax:

  long _handle *h; /*h is a handle to a long */

The keyword _handle was chosen because it is compatible with name space rules for extensions to ANSI C. The underscore convention is used because, as it turns out, the identifier handle is used by a lot of existing code. Finally, the keyword virtual would conflict with C++ use of that keyword.

Implementation

A handle is a 32-bit type. The high 16 bits refer to the page, in a manner defined by the library implementation. The low 16-bits form the offset into that page. Obviously, this restricts the page size to less than 64K in length. Handles are unique; no two handles can refer to the same location in handle space.

Pointer arithmetic with handles follows the same rules and behaviors as does far pointer arithmetic (that is, only the 16-bit offset is manipulated). Comparisons are also handled the same way as for far pointers, that is, for < <= >= and > the comparison is done for the 16-bit offset only. For = = and !=, comparison involves the full 32-bit value.

The high 16 bits of a handle refer to the page where the data is stored. Because it's desirable to avoid a performance penalty if expanded memory is absent, the format of the page reference is a bit tricky. If expanded memory is not present, handles will point into real memory. The best way to do that is with a far pointer, the handle format must be able to distinguish a handle from a regular far pointer.

The 8086 has a 20-bit address space, of which the high 16 bits form the segment. All possible 16-bit segment values are valid segment values, so there isn't a simple way to distinguish a handle from a far pointer. On the PC, the ROM BIOS occupies the high end of the address space. Programs almost never access that part of the address space (and if they do, they can use a regular far pointer). Valid handle values are only manipulated by library functions, and are only created for dynamically allocated data.

Segment values from 0xFE00 to 0xFFFF can therefore be defined as handles. (This is controlled and is adjustable by the library portion of the handle support.) This yields 256 pages. Because this is the number of pages that will be allocated in expanded memory and each expanded memory page has 16K bytes in it, 256 pages make for s total of 4 Mbytes of handle memory space This ought to be sufficient space for most applications, and you can extend the range of handles to more than 0xFE00 to 0xFFFF.

It's the library implementation's job to distinguish a handle from a far pointer. The segment portion is compared against 0xFE00. If the 16-bit segment is less than 0xFE00, the handle is a regular far pointer. Otherwise the upper 16 bits of the handle refer to logical page number (segment-0xFF00).

Conversions from handles to far pointers occur whenever a handle pointer is dereferenced or when a handle is cast to a far pointer. The conversion process is done by a library routine that swaps the logical page into the address space and returns a far pointer into it Figure 2 shows examples that do handle conversions to far pointers. Conversions from far pointers to handles are just a type of point that is, no change in the bit pattern.

Figure 2: Converting pointers from handle to far

  int handle *h;
  struct A_handle *h2;
  int far *t;
  int i;
  extern void func(int far *pi);

  f = h;
  *h - i;
  h[3] = *t;
  i = *(h + 6);
  h2->b = i;
  func(h);
  h = (int far *) h;

The optimizer is aware that handles are a special type. Its main job is to determine when a new handle dereference is necessary and when a previous conversion can be used instead. Consider the example in Figure 3. Clearly, h only needs to be converted to a far pointer once. It converts the code to that shown in the second portion of Figure 3.

Figure 3: Example showing that the optimizer is handle aware

        struct {int a,b;} handle *h;
        h->a = 1;
        h->b = 2;
  /* Converted code */
        struct {int a,b;} _handle *h, far "p;
        p = h;
        p->=a=1;
        p->=b=2;

The result of a conversion in Figure 3 cannot be used if:

1. The value of the handle might have changed.

2. A handle dereference was done on another handle (thus possibly over writing the previous page with an other).

3. A function was called (because that function may convert other handles, resulting in case 2).

Because handle memory is larger than physical memory, pages from handle space are swapped into buffers in physical space. Each handle conversion may result in a new page from handle space overwriting a previous page. Invalidating any pointers into that page. This explains the reasoning behind assuming that any handle conversion invalidates any previous conversions.

Of course, you can always convert a handle to a pointer yourself. If you know that a function call does not convert any handles, then the conversion is still valid after the function call. Also, if the library implementation of handle conversions uses more than one physical page (the expanded memory version described shortly uses four), you can rely on at least that many conversions being valid simultaneously.

Handles and Expanded Memory

Let's look at how Zortech C and C++ implement handles in turns out that expanded memory is particularly well suited for implementing handles. It is fast, swapping a logical page into a physical page usually requires a simple write into an I/O register. Expanded memory can be efficiently emulated on 386 machines by using the memory mapping lectures of the CPU. (QEM 386 from Quarter Deck is an example of this type of utility.) There are even emulators that fake expanded memory by using extended memory (memory above 1 Mbyte) or even a hard disk. A final impetus for using expanded memory is that it is part of DOS 4.0.

The expanded memory concept is that one or more logical pages of bank-switched memory are mapped into real memory (physical pages) as needed. This corresponds directly to the handle concept of a large amount of memory (handle space) of which a subset is mapped into physical memory buffers. The mapping of a logical page to a physical page occurs when a handle is dereferenced.

Expanded memory makes it possible to have two physical pages actually map onto the same logical page. In other words, expanded memory allows two different addresses to refer to the same memory location! This implementation of handles carefully avoids depending on this "feature," as that capability is impossible to emulate with disk or extended memory implementations of expanded memory.

Initialization is handled automatically by the C run-time startup, so no extra work is necessary. Termination is a bit trickier. When a program terminates and returns to DOS, DOS automatically frees up any memory used by the program and makes it available for the next one. Unfortunately, this does not happen with expanded memory. A program must always explicitly free its expanded memory pages or they will be unavailable for use by other programs until the machine is rebooted. All exit paths from the program must be covered. Even Ctrl-break must be intercepted.

Storage allocation in handle space is done by handle_malloc( ), handle_realloc( ), and related functions. Each 16K page is converted into a heap, with the usual free list data structure. There is a special data structure in page 0 that contains the size of the maximum available free block in each page, so the storage allocator can directly swap in the page it needs to allocate from, instead of sequentially swapping in each page until the allocation succeeds. This is of critical importance if a disk-swapping expanded memory emulator is used! If there is insufficient free expanded memory space to satisfy an allocation request, or if the size of a request exceeds 16K, the routines fall back to allocating from conventional DOS memory.

The Expanded Memory 3.2 subset of EM 4.0 is used to provide maximum portability; in fact, the more exotic features of 4.0 aren't needed for handle support. It's possible to use expanded memory in addition to using handles, though the two uses should be kept independent to avoid conflicts.

When to Use Handles

The overhead of dereferencing handles is much higher than the overhead of dereferencing pointers, both in program size and speed. Therefore, the best candidates for handles are those data structures that are infrequently accessed both in the number of times the dereference statically occurs in the program and in the number of times the dereferences are executed.

Handles give a sharp increase in the amount of memory available. The proper data structure for something that will reside in handle space is one that favors speed over memory compactness. Locality of data is also important (locality means that related data should be clustered into the same page, increasing the likelihood that the desired page is already swapped in). For example, a bubble sort across data in handle space is to be avoided. If your expanded memory implementation swaps to disk, the swap file on disk may get read and written in its entirety several times during such a sort!

If database-wide searches are necessary, try putting the access structure in conventional memory and put the leaves in handles. This makes the lookup fast and efficient, and the end data is paged in only when it is needed.

Listing One (page 110) shows a version of the classic Unix wc (word count) program, converted from using the standard pointer and dynamic storage allocation to using handles and handle storage allocation. The primary advantage of using handles in a program such as wc is that they increase the program's capacity without a significant performance penalty. The handle version of wc is just as fast as the original version, which used regular pointers. Files with several megabytes of text can be processed by the handle version (assuming that expanded memory is available).

Portability is achieved simply: The_handle keyword is #defined to nothing, so all handles revert to being regular pointers. Second, the handle storage allocation routines can be #defined to be the standard malloc, free, and so on, routines. Thus, handle-specific code can quickly be made portable to other compilers and environments. The Zortech compiler provides a simple mechanism for eliminating handles from a program: Simply add the following line to any program before it #includes handle.h: #define NO_HANDLE 1



/* Compile with: ZTC wc -ml                                */
/* (Use large model so strcmp() can handle far pointers.)  */

#include        <stdio.h>
#include        <stdlib.h>
#include        <string.h>
#include        <ctype.h>
#include        <handle.h>

struct tree
    {
        char __handle *word;
        int count;
        struct tree __handle *left;
        struct tree __handle *right;
    };
int readword(char *w, int nbytes)
    {
    int c;
    do  {
        c = getchar();
                if (c == EOF)
                        return 0;
        }
    while (!isalpha(c));
    do  {
        if (nbytes > 1)
            {
            *w++ = c;
                        nbytes--;
            }
        c = getchar();
        }
    while (isalnum(c));
        *w = 0;
        return 1;
    }
void tree_insert(struct tree __handle * __handle *pt, char *w)
    {
        int cmp;
        struct tree __handle *p;
        while ((p = *pt) != NULL)
        {
        if ((cmp = strcmp(w,p->word)) == 0)
            goto gotit;
                pt = (cmp < 0) ? &p->left : &p->right;
        }
    p = (struct tree __handle *) handle_calloc(sizeof(struct tree));
        if (!p || (p->word = handle_strdup(w)) == NULL)
        {
        printf("Out of memory\n");
                exit(EXIT_FAILURE);
        }
    *pt = p;
    gotit:
        p->count++;
    }
tree_print(struct tree __handle *p)
    {
        while (p)
        {
        tree_print(p->left);
        printf("%5d %s\n",p->count,(char far *)p->word);
                p = p->right;
        }
    }
tree_free(struct tree __handle *p)
    {
    struct tree __handle *pn;
        while (p)
        {
        handle_free(p->word);
                tree_free(p->left);
                pn = p->right;
                handle_free(p);
                p = pn;
        }
    }
main()
    {
        struct tree __handle *root = NULL;
        char word[32];
        while (readword(word,sizeof(word)))
        tree_insert(&root, word);
    tree_print(root);
    tree_free(root);
        return EXIT_SUCCESS;
}


[Figure 1: Typical pointer types used in most PC implementations of C]


void *p;        /* pointer type is default for the memory model */
char far *pc;   /* far (segment and offset pair) pointer        */
int near *p;    /* near (offset only, segment is assumed)       */

[Figure 2: Converting pointers from handle to far]


R 65,T 5
        int __handle *h;
        struct A __handle *h2;
        int far *f;
        int i;
        extern void func(int far *pi);

        f = h;
        *h = i;
        h[3] = *f;
        i = *(h + 6);
        h2->b = i;
        func(h);
        h = (int far *) h;




[Figure 3: Example showing that the optimizer is handle aware]

   struct { int a,b; } __handle *h;
   h->a = 1;
   h->b = 2;
/* Converted code */
        struct { int a,b; } __handle *h, far *p;
        p = h;
        p->a = 1;
    p->b = 2;