GENERATION SCAVENGING

An efficient, unobtrusive, portable garbage collector

Frank Jackson

Frank is a member of the technical staff at ParcPlace Systems and has spent much of his time there designing, building, and evaluating various forms of automatic memory management. He can be reached at ParcPlace Systems, 1550 Plymouth St., Mountain View, CA 94043.


Nobody likes to take out the trash, and programmers are no exception. Wouldn't it be nice if someone else would gather up all the garbage that we create and dispose of it for us? It should come as a welcome relief, then, to discover that the run-time systems of programming languages such as Lisp, Smalltalk, and Prolog generally provide facilities that do exactly that. With the advent of powerful computer workstations, automatic garbage collection has become an important component of many modern interactive programming environments as well as the applications that are built using such environments.

Although traditional programming languages such as C, Fortran, and Pascal do not require the programmer to expend any effort managing the memory occupied by either the data that is allocated on the system's run-time stack or the data that is statically allocated, they do require the programmer to manage any data that is dynamically allocated on the heap. Programmers who use such languages are forced to litter their programs with explicit free statements if they wish to recycle the storage consumed by heap-allocated data that is no longer useful. By having the language's run-time system collect such garbage automatically, a certain class of well-known bugs is eliminated. For example, storage leaks cannot occur in such a system, so valuable memory is not wasted if the programmer neglects to free data that is no longer accessible. Even more important, data cannot be prematurely freed, avoiding the chaos that can result when an application tries to access data that was mistakenly recycled. Finally, the programmer is relieved of the burden of having to explicitly manage the heap, which saves development time and results in less complex code.

Given these benefits, you might expect heap-based garbage collection to be an integral component of most present-day language implementations. This is not the case, however. Most traditional programming languages were not designed with garbage collection in mind, and it is generally difficult to retrofit existing language implementations with an automatic garbage collector. Further, there are a number of serious drawbacks to the classical garbage collection algorithms. These drawbacks include:

    1. High run-time overhead, which slows the overall computation.

    2. Lengthy, unpredictable pauses while the garbage is being reclaimed.

    3. Poor virtual memory behavior, because the collector may end up touching every object in the system.

These drawbacks become even more apparent when the algorithms are deployed in modern interactive environments, given the stringent response-time requirements of these environments.

Significant progress has been made in the past decade, however, and new garbage collection techniques have been developed that all but eliminate the above drawbacks. One of these techniques -- generation scavenging -- not only addresses each of the problems just listed, but it requires no hardware support, making it portable across a wide variety of personal computers and engineering workstations. In this article, I'll discuss some of the historical events that led to the development of the original generation-scavenging algorithm. In addition, I'll describe some of the more recent refinements that significantly enhance the performance of the basic generation scavenger. In particular, the scavenging algorithm described later can be tuned so that the average pause time and the total overhead for collecting garbage can be reduced to an acceptably low level.

Classical Garbage Collection Algorithms

Automatic garbage collection has roughly a 30-year history, starting with the near-simultaneous invention of the two classical approaches to garbage collection from which most modern collection schemes are derived, namely the mark-and-sweep collection algorithm and the reference-counting algorithm. In 1960, Collins2 introduced the notion of using reference counts to determine whether a piece of data could be safely reclaimed. Each piece of data, which I shall refer to hereafter as an object, has associated with it a count of the number of other objects that reference it. When this count drops to zero, the object can be reclaimed automatically by the run-time system.

The primary advantage of the reference-counting approach is that the pauses required for such reclamations are generally imperceptible to the user, because these object reclamations can easily be distributed across the computation. In addition, the space occupied by the garbage objects can be recycled immediately, thereby reducing the total amount of memory required to complete a given computation, although this space advantage is reduced somewhat by the storage required to hold the per-object reference counts.

There are some significant disadvantages to the reference-counting approach, however. Most importantly, it can't reclaim circular garbage, because any object that is indirectly self-referential will never have a reference count of zero, even if the object is no longer accessible to those objects that are still involved in the computation. In addition, special provisions have to be made to reclaim those objects whose reference-count fields have overflowed. Accordingly, most systems that employ reference counting attempt to prevent these storage leaks either by providing a backup garbage collection system that utilizes a different collection algorithm or by incurring the expense of additional recursive scanning.

Reference counting also has high overhead because each store requires the run-time system to decrement the reference count of the object whose reference is being overwritten and to increment the reference count of the object whose reference is being stored. In addition, when an object's reference count drops to zero, the run-time system has to decrement the reference count of every object pointed to by the dying object, possibly causing the reference counts of these objects to drop to zero, forcing the system to decrement the reference counts of still more objects. Finally, additional overhead is engendered by the necessity of recycling the storage occupied by these dead objects in order to avoid running short of memory.
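
To make this bookkeeping concrete, the following minimal C sketch shows the store operation just described; the object layout, the fixed field count, and all names are illustrative assumptions rather than details taken from any particular system.

    #include <stddef.h>
    #include <stdlib.h>

    typedef struct Object {
        size_t refcount;            /* count of incoming references */
        struct Object *fields[4];   /* outgoing references (a fixed
                                       arity keeps the sketch short) */
    } Object;

    static void release(Object *obj);

    /* Every store of a reference into the heap adjusts two counts:
       the object being stored gains a reference, and the object
       whose reference is overwritten loses one. */
    static void store_reference(Object **slot, Object *newref)
    {
        Object *old = *slot;
        if (newref != NULL)
            newref->refcount++;          /* target gains a reference */
        *slot = newref;
        if (old != NULL && --old->refcount == 0)
            release(old);                /* overwritten target died  */
    }

    /* A dying object drops its own references in turn, which may
       cascade through still more objects. */
    static void release(Object *obj)
    {
        for (size_t i = 0; i < 4; i++) {
            Object *child = obj->fields[i];
            if (child != NULL && --child->refcount == 0)
                release(child);
        }
        free(obj);
    }

Note that the new reference is counted before the old one is released, so storing an object into a slot it already occupies cannot free that object prematurely.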

Subsequent refinements to the basic reference-counting algorithm by Deutsch and Bobrow4 in 1976 have succeeded in reducing the total temporal overhead to approximately ten percent, which is still a relatively high price to pay. Consequently, reference counting is no longer widely used in commercially available language implementations. The fact that reference counting permits object reclamation based strictly on local information, however, has made it relevant to systems that must operate in a distributed computing environment.

The other classical garbage collection technique, also proposed in 1960, is McCarthy's7 mark-and-sweep algorithm. Unlike reference counting, which uses local information to make its decisions, the mark-and-sweep algorithm relies on a global traversal of all live objects to decide which objects can be reclaimed. The basic mark-and-sweep algorithm works as follows:

    1. Mark all objects reachable from the system's roots as being live objects.

    2. Sweep memory, unmarking live objects and reclaiming dead objects, possibly performing a simultaneous or subsequent memory compaction.
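
For readers who want to see the shape of these two phases in code, here is a minimal C sketch. It assumes a heap in which every allocated object is chained onto a single allocation list so that the sweep can visit all of memory; that layout, the fixed field count, and all names are illustrative.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>

    typedef struct Object {
        bool marked;
        struct Object *next_alloc;   /* chains every allocated object */
        struct Object *fields[2];    /* outgoing references           */
    } Object;

    static Object *all_objects;      /* head of the allocation chain  */
    static Object *roots[16];        /* the system's root references  */
    static size_t num_roots;

    /* Phase 1: mark everything reachable from the roots. */
    static void mark(Object *obj)
    {
        if (obj == NULL || obj->marked)
            return;
        obj->marked = true;
        for (size_t i = 0; i < 2; i++)
            mark(obj->fields[i]);
    }

    /* Phase 2: sweep all of memory, unmarking the survivors and
       reclaiming everything the mark phase never reached. */
    static void sweep(void)
    {
        Object **link = &all_objects;
        while (*link != NULL) {
            Object *obj = *link;
            if (obj->marked) {
                obj->marked = false;     /* reset for next collection */
                link = &obj->next_alloc;
            } else {
                *link = obj->next_alloc; /* unlink and reclaim        */
                free(obj);
            }
        }
    }

    static void collect(void)
    {
        for (size_t i = 0; i < num_roots; i++)
            mark(roots[i]);
        sweep();
    }

The sketch makes the cost structure plain: the mark phase visits only live objects, but the sweep phase must touch every object ever allocated, live or dead.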

Although the mark-and-sweep algorithm does reclaim circular garbage, it, too, has serious drawbacks:

    1. The entire computation must be halted while the collector runs, and the resulting pauses can be lengthy and disruptive.

    2. The collector's cost is proportional to the size of the entire memory, because the sweep phase must touch every object, live or dead.

In a virtual memory system, these pauses can be exacerbated by the paging overhead required to touch each object in the entire system. The mark phase can cause especially bad paging behavior, because it typically exhibits what is essentially random page-referencing behavior.

Modern Garbage Collection Algorithms

The high cost in time associated with the classical garbage collection algorithms was reduced somewhat by the development of the copying garbage collectors, such as that described in 1969 by Fenichel and Yochelson.5 In its simplest form, a copying collector works in the following manner: The data heap is divided into two semispaces, and object allocations are restricted to a single semispace. When that semispace fills up, the computation is paused, and the garbage collector then traces the system's roots, copying all live objects to the other semispace. Once all of the live objects have been copied, the computation can continue with new objects being allocated in the same semispace as the live objects.

Such a collector is potentially faster than the traditional mark-and-sweep collector because it touches only live objects, making its pause time proportional to the total size of the live objects rather than the size of allocated memory. Dead objects are reclaimed by virtue of the fact that they are not copied to the other semispace. The problem with this approach is twofold. First, dividing the heap into semispaces wastes space, because the computation can utilize only half of the heap at any given time. Second, the pauses required to copy all of the live objects can still be quite lengthy.
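
The following C sketch fleshes out this description using a breadth-first copying traversal in the style usually attributed to Cheney (a detail the discussion above leaves open); the heap layout, sizes, and names are illustrative, and alignment and overflow checks are omitted for brevity.

    #include <stddef.h>
    #include <string.h>

    #define SEMISPACE_BYTES (1 << 20)

    typedef struct Object {
        struct Object *forward;     /* forwarding pointer once copied */
        size_t size;                /* total size in bytes            */
        size_t num_fields;
        struct Object *fields[];    /* outgoing references            */
    } Object;

    static char space_a[SEMISPACE_BYTES], space_b[SEMISPACE_BYTES];
    static char *tospace = space_a;    /* objects are allocated here  */
    static char *fromspace = space_b;  /* idle until the next flip    */
    static char *alloc_ptr = space_a;  /* next free byte in tospace   */

    static Object *allocate(size_t num_fields)
    {
        size_t size = sizeof(Object) + num_fields * sizeof(Object *);
        Object *obj = (Object *)alloc_ptr; /* "space full" check omitted */
        alloc_ptr += size;
        obj->forward = NULL;
        obj->size = size;
        obj->num_fields = num_fields;
        memset(obj->fields, 0, num_fields * sizeof(Object *));
        return obj;
    }

    /* Copy one object to tospace (unless already copied), leaving a
       forwarding pointer at its old address. */
    static Object *copy(Object *obj)
    {
        if (obj == NULL)
            return NULL;
        if (obj->forward != NULL)
            return obj->forward;        /* already evacuated          */
        Object *new_obj = (Object *)alloc_ptr;
        alloc_ptr += obj->size;
        memcpy(new_obj, obj, obj->size);
        new_obj->forward = NULL;
        obj->forward = new_obj;
        return new_obj;
    }

    /* Flip the semispaces, copy the roots, then scan the copied
       objects in order, copying everything they reference; the
       copied region itself serves as the work queue. Garbage is
       simply abandoned in fromspace. */
    static void collect(Object **roots, size_t num_roots)
    {
        char *tmp = fromspace;
        fromspace = tospace;
        tospace = tmp;
        alloc_ptr = tospace;
        char *scan = alloc_ptr;

        for (size_t i = 0; i < num_roots; i++)
            roots[i] = copy(roots[i]);

        while (scan < alloc_ptr) {
            Object *obj = (Object *)scan;
            for (size_t i = 0; i < obj->num_fields; i++)
                obj->fields[i] = copy(obj->fields[i]);
            scan += obj->size;
        }
    }

The forwarding pointer left at an object's old address serves double duty: it marks the object as having been copied, and it records the new location so that every other reference to the object can be updated as the scan encounters it.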

To eliminate the disruptive pauses caused by this sort of stop-and-copy collector, Baker1 proposed an incremental copying collector in 1978. In this approach, the act of copying live objects from one semispace to the other was interleaved with the actual computation. This algorithm imposed some additional forms of overhead on the computation, however, including the need for the computation to monitor every read and write to the heap in order to correctly follow the forwarding pointers that were placed at an object's old address when it was copied to the other semispace. This additional overhead was typically overcome by using hardware support, such as that available on the MIT Lisp machines. Even so, the Baker collector was just as spatially inefficient as the other copying collectors, because it also divided the heap into a pair of semispaces. In addition, the total time required to copy all of the live objects with each collection was still unreasonably long.
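
A rough idea of the per-access cost can be conveyed by the following C fragment, a sketch of the check that must guard every load from the heap in a Baker-style system. The layout and names are illustrative, and Baker's actual scheme goes further, copying a fromspace object to the other semispace the moment the computation first touches it.

    typedef struct Object {
        struct Object *forward;  /* non-NULL once the object has moved */
        /* ...the object's own fields... */
    } Object;

    /* Every load of a reference must check for, and follow, a
       forwarding pointer, because the mutator runs while objects
       are still being copied. */
    static Object *load_reference(Object **slot)
    {
        Object *ref = *slot;
        if (ref != NULL && ref->forward != NULL) {
            ref = ref->forward;  /* follow the forwarding pointer    */
            *slot = ref;         /* repair the slot for future loads */
        }
        return ref;
    }

Executed in software on every heap access, even a check this small adds up quickly, which is why such collectors leaned so heavily on hardware support.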

Generation-Based Garbage Collectors

To alleviate the problems associated with these early copying collectors, language implementors began to exploit the empirical properties of data. Researchers observed that young objects tended to die while still young, whereas older objects were more likely to live on indefinitely. It made sense, then, to devote more effort to collecting young objects, where the return on the system's copying investment was likely to be high, instead of repeatedly copying older objects that simply refused to die.

To make such an approach possible, Lieberman and Hewitt6 proposed in 1983 that objects be segregated into multiple generations, with each generation containing objects of roughly the same age. In addition, they proposed that the system be designed so that each generation could be collected independently. Younger generations could then be collected more often, generally using some sort of copying algorithm. If an object survived enough collections, then it would be promoted to an older generation, where it would be collected with less frequency. This approach saved time because there were fewer objects to copy in the young generations, given the high mortality rate of young objects; it also saved space because in principle the system needed to maintain only enough free space at any given time to copy the live objects in a single generation rather than enough free space to copy all of the objects in the entire system.

The feasibility of the generation-based approach depended in part on the speed with which the garbage collector could identify the live objects in a given generation. Consequently, the implementors of the generation-based systems developed various methods for keeping track of the roots for each generation. This task was typically simplified by keeping track only of the objects in older generations that pointed directly to objects in the younger generations, and being careful to collect all of the younger generations when collecting an older generation. Nevertheless, the task of keeping track of each generation's roots added overhead to the computation. For example, each store into the heap had to be monitored to see if it created the sort of intergenerational reference that increased the number of roots for a given generation. However, the overhead required to monitor each store in such a manner was generally less than the store overhead required by the reference-counting approach. At any rate, the generation-based approach allows the language implementor to stake out an intermediate position between those taken by the reference-counting approach, which relies solely on local knowledge to reclaim garbage, and the mark-and-sweep approach, which has to traverse the entire system to reclaim garbage.
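
The store monitoring described above might look something like the following C sketch. The address-range tests and the flat remembered-set array are simplifying assumptions; real systems use a variety of more compact encodings for both.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct Object Object;

    /* Bounds of the generations, set when the spaces are created. */
    static char *old_base, *old_limit;      /* older generation   */
    static char *young_base, *young_limit;  /* younger generation */

    static Object *remembered_set[1024];  /* older objects that hold    */
    static size_t remembered_count;       /* references to younger ones */

    static bool in_old(void *p)
    {
        return (char *)p >= old_base && (char *)p < old_limit;
    }

    static bool in_young(void *p)
    {
        return (char *)p >= young_base && (char *)p < young_limit;
    }

    /* Every store of a reference into the heap passes through this
       check. Only the stores that create an old-to-young reference
       pay the cost of recording the holder as a root for the next
       collection (duplicate and overflow checks omitted). */
    static void store_reference(Object *holder, Object **slot, Object *value)
    {
        *slot = value;
        if (value != NULL && in_old(holder) && in_young(value))
            remembered_set[remembered_count++] = holder;
    }

In the common case the store costs only a few comparisons; the slow path is taken only for the comparatively rare stores that actually create an intergenerational reference.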

Even more overhead is incurred by those generation-based systems that reclaim garbage incrementally (as noted in the discussion of the Baker1 algorithm). Such systems, most notably Moon's8 ephemeral garbage collector for Lisp, generally hide much of this overhead by taking advantage of the hardware support provided by modern Lisp machines. Such systems are still in use today, and the language implementors on these machines continue to find ways to further utilize these hardware capabilities (for example, Courts3 has described a way to reduce page faults by dynamically improving the locality of reference of those objects housed on the heap).

Generation Scavenging

Many language implementors, however, especially those developing third-party software that must be deployed on a variety of hardware platforms and operating systems, can't count on having any hardware support for garbage collection at their disposal. Without hardware support, it is quite difficult to implement an efficient incremental garbage collector, primarily because of the extra overhead required to follow forwarding pointers when reading and writing to the heap. The generation-scavenging algorithm was developed to provide language implementors with a garbage collector that, like the incremental collectors, was unobtrusive but, unlike the incremental collectors, did not require hardware support in order to be reasonably efficient.

The basic algorithm, as described by Ungar10 in 1984, requires that the heap be divided into two spaces -- oldspace and newspace. oldspace is generally much larger than newspace, because oldspace is used as the repository for objects that are considered permanent. As such, oldspace is collected infrequently, typically by using a global mark-and-sweep collector. Instead, most reclamation attempts are focused on the objects in newspace. In Ungar's10 original algorithm, newspace is divided into three zones -- a creation zone and two survivor zones. New objects are allocated in the creation zone. Whenever the creation zone fills up, the computation is halted, and the scavenging mechanism copies all live objects in the creation zone to one of the survivor zones. A forwarding pointer is left behind for each object that is copied, and any other objects that reference the old location of a copied object will have these references updated when the scavenger scans them in its search for additional survivors. Once the scavenge is complete, the creation zone will be empty and can be reused. Subsequent scavenges will continue to copy those live objects that can be found in the creation zone and the occupied survivor zone to the empty survivor zone. In this scheme, then, objects are born in the creation zone and thereafter bounce back and forth between the two survivor zones until they are deemed old enough to be promoted to oldspace. Ungar11 refers to this process of promoting an object to oldspace as "tenuring."
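
The zone structure and the evacuation decision just described can be sketched in C as follows. The zone sizes, the fixed tenure threshold, and the bump-allocated oldspace are illustrative assumptions; tracing the roots and scanning the survivors for further live objects would follow the same breadth-first pattern as the copying collector shown earlier.

    #include <stddef.h>
    #include <string.h>

    #define CREATION_BYTES (400 * 1024)
    #define SURVIVOR_BYTES ( 64 * 1024)
    #define OLDSPACE_BYTES (  4 * 1024 * 1024)

    typedef struct Object {
        struct Object *forward;
        unsigned age;               /* scavenges survived so far */
        size_t size;
        size_t num_fields;
        struct Object *fields[];
    } Object;

    static char creation_zone[CREATION_BYTES];
    static char survivor_a[SURVIVOR_BYTES], survivor_b[SURVIVOR_BYTES];
    static char *from_survivor = survivor_a;  /* currently occupied  */
    static char *to_survivor = survivor_b;    /* empty; copy target  */
    static char *survivor_alloc;

    static char oldspace[OLDSPACE_BYTES];
    static char *old_alloc = oldspace;

    static unsigned tenure_threshold = 4; /* scavenges before promotion */

    /* Promote an object to oldspace. A real implementation must also
       remember the promoted object as a newspace root if it still
       references young objects. */
    static Object *tenure(Object *obj)
    {
        Object *new_obj = (Object *)old_alloc;
        old_alloc += obj->size;
        memcpy(new_obj, obj, obj->size);
        new_obj->forward = NULL;
        return new_obj;
    }

    /* Evacuate one live object: objects old enough are tenured;
       the rest are copied to the empty survivor zone. */
    static Object *evacuate(Object *obj)
    {
        if (obj == NULL)
            return NULL;
        if (obj->forward != NULL)
            return obj->forward;          /* already evacuated */

        Object *new_obj;
        if (++obj->age >= tenure_threshold) {
            new_obj = tenure(obj);
        } else {
            new_obj = (Object *)survivor_alloc;
            survivor_alloc += obj->size;
            memcpy(new_obj, obj, obj->size);
            new_obj->forward = NULL;
        }
        obj->forward = new_obj;           /* leave forwarding pointer */
        return new_obj;
    }

    static void begin_scavenge(void)
    {
        survivor_alloc = to_survivor;     /* copy into the empty zone */
    }

    /* Once every live object has been evacuated, the survivor zones
       swap roles, and the creation zone is empty and reusable. */
    static void finish_scavenge(void)
    {
        char *tmp = from_survivor;
        from_survivor = to_survivor;
        to_survivor = tmp;
    }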

Because generation scavenging is a stop-and-copy algorithm, as opposed to being incremental, there is no need for special hardware to follow forwarding pointers, because all references to these forwarding pointers are automatically updated during the scavenge. Like the other generation-based algorithms, however, the system must maintain a list of roots for newspace: It must keep a list of those objects in oldspace that contain references to objects in newspace. Maintaining this list imposes some extra overhead on every store into the heap. Even without special-purpose hardware, however, this overhead doesn't appear to be onerous for the following empirical reasons:

    1. Stores that create references from oldspace into newspace turn out to be relatively rare; the overwhelming majority of stores involve objects that already reside in newspace.

    2. The check required to detect such stores can be performed in just a few instructions, a cost that is modest compared with the per-store cost of reference counting.

In addition, the system must be able to discern the age of an object so that the scavenger can decide when the object should be tenured to oldspace. In Ungar's10 initial implementation, each object had an age field that the scavenger incremented periodically. As we shall see shortly, the requirement that the system keep track of each object's age need not impose any additional storage overhead. Generation scavenging, then, can be viewed as a two-generation system, where oldspace and newspace are the two generations, or a multi-generation system with objects of different generations (that is, different ages) being housed together in newspace.

Since the generation-scavenging algorithm was first published, many variations have been both proposed and implemented. In some systems, newspace is composed of two zones instead of three. Other implementations allow the scavenger to scavenge multiple spaces instead of restricting its purview to a single space. These spaces are sometimes arranged as pairs of semispaces and sometimes as a bucket brigade of consecutive spaces through which the surviving objects are promoted. Some systems eliminate the need for an age field by spatially segregating objects of the same age. (See Wilson and Moher13 for an example of a system that obviates the need for an age field by organizing its spaces into a bucket brigade.) Finally, various schemes have been proposed for efficiently identifying the roots of newspace. Shaw,9 for example, recently suggested combining the store check with the virtual memory mechanism that marks hardware pages as being dirty and then scanning these dirty pages for actual roots at scavenge time.

Tenure Policies

One of the key decisions that a generation scavenger must make is when to tenure an object to oldspace. Early scavengers generally employed a simple fixed-age tenure threshold: They tenured any object that had survived for a fixed amount of time or a fixed number of scavenges. Studies that Ungar and I conducted12 show that such tenure policies are not particularly effective in minimizing the amount of tenured garbage (that is, objects that die after being tenured) or in controlling the length of the pauses required to perform the scavenge. Different applications cause objects to survive for different amounts of time, so no single tenure threshold will perform optimally in all circumstances. If the tenure threshold is set too low, then oldspace will be flooded with objects that die shortly thereafter. (This problem will be further exacerbated by the effects of nepotism. See Figure 1.) And if the tenure threshold is set too high, then the scavenge pauses can easily become disruptive.

Both of the above problems can be solved by employing a tenure policy that modifies its tenure threshold dynamically according to the demographics of the object population currently housed in newspace. I will now describe how one might go about designing such a tenure policy. Because stop-and-copy collectors traditionally have problems with distracting pauses, it is important to provide the scavenger with the means to control the length of its pauses. Assuming that we have determined the maximum pause time that the scavenger can be permitted to take without being considered disruptive, we need to measure how many bytes of surviving objects the scavenger can copy in that amount of time. The scavenger can then control the length of its pauses by using this number as a watermark in the survivor zones. If the aggregate size of the objects in the survivor zone is less than this watermark, then the scavenger doesn't need to tenure any objects during the upcoming scavenge, because the pause required to scavenge these survivors will probably be acceptably brief. If, however, the size of these survivors actually exceeds this watermark, then the scavenger should tenure some objects during the next scavenge to keep the pause times from becoming disruptive.

Because the scavenger tenures objects to keep the duration of its pauses under control, we need to provide it with the means to minimize the amount of tenured garbage that it creates. Rather than tenure objects randomly, which could result in young objects that are unworthy of promotion being tenured, the scavenger uses demographic information to select a tenure threshold that will result in the desired amount of the oldest objects in newspace being tenured. The necessary demographic information can be kept in a table indexed by age that contains the number of data bytes in newspace for each age. This table can either be maintained by the scavenger as a matter of course, or it can be created on the fly by a quick scan of the occupied survivor zone. By scanning this table backwards, the scavenger can then set the appropriate tenure threshold for the ensuing scavenge. These measures permit the scavenger to tenure the minimal number of objects, choosing those with the highest likelihood of surviving (see Figure 2).
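
In C, the threshold selection might look like the sketch below, where bytes_by_age[a] holds the number of newspace data bytes of age a (the demographic table described above); the table size and all names are illustrative.

    #include <stddef.h>

    #define MAX_AGE 32

    static size_t bytes_by_age[MAX_AGE]; /* the demographic table       */
    static size_t survivor_bytes;        /* bytes now in survivor zone  */
    static size_t watermark;             /* most bytes the scavenger can
                                            copy in an acceptable pause */

    /* Choose the tenure threshold for the next scavenge. If the
       survivors already fit under the watermark, tenure nothing.
       Otherwise, scan the table from oldest to youngest, accumulating
       bytes until tenuring everything at or above the resulting age
       would bring the survivors back under the watermark. */
    static unsigned compute_tenure_threshold(void)
    {
        if (survivor_bytes <= watermark)
            return MAX_AGE;                   /* no tenuring needed */

        size_t excess = survivor_bytes - watermark;
        size_t accumulated = 0;
        unsigned age = MAX_AGE;
        while (age > 0 && accumulated < excess) {
            age--;
            accumulated += bytes_by_age[age]; /* oldest bytes first */
        }
        return age;       /* tenure objects whose age >= this value */
    }

Scanning from the oldest age downward ensures that the objects selected for tenuring are exactly those that have already survived the longest, and are therefore the most likely to keep surviving.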

Thus far, we've described a scavenger that can easily be made nondisruptive, even in the face of the varying object demographics, but what about the total scavenging overhead? Given a maximal acceptable pause time of, say, 100 milliseconds, we can drive the total scavenge overhead reasonably low by sizing the creation zone appropriately (assuming that the overhead required to perform the store checks is as low as recent studies seem to suggest). That is, if we were to size the creation zone such that it filled up once per second (and, hence, a scavenge was performed every second or so), then the total overhead for scavenging would be around ten percent. If, however, we sized the creation zone so that it filled up every three to four seconds, then the scavenge overhead would be less than three percent.
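
The arithmetic behind these figures is simply the ratio of pause time to the length of the whole scavenge cycle. The short program below reproduces the example numbers used above; it illustrates the reasoning rather than measuring anything.

    #include <stdio.h>

    int main(void)
    {
        double pause = 0.100;               /* 100-ms scavenge pause    */
        double fill_times[] = { 1.0, 3.5 }; /* seconds to fill the
                                               creation zone (examples) */
        for (int i = 0; i < 2; i++) {
            /* fraction of total time spent scavenging */
            double overhead = pause / (fill_times[i] + pause);
            printf("fill every %.1fs -> %.1f%% scavenge overhead\n",
                   fill_times[i], 100.0 * overhead);
        }
        return 0;
    }

With a one-second fill time the overhead works out to roughly 9 percent, and with a three-and-a-half-second fill time to roughly 2.8 percent, matching the figures quoted above.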

Thus, the generation scavenger described can easily be tuned in two respects: The average pause time required to perform a scavenge and the total scavenge overhead can be controlled by setting the watermark in survivor space and the size of the creation zone, respectively. Of course, the cost for reducing both pause times and scavenge overhead is paid in memory, in terms of both the memory required to size the creation zone appropriately and the space taken up by tenured garbage resulting from the need to keep the total size of the scavenge survivors less than the survivor zone watermark. For example, current Smalltalk implementations on stock hardware that utilize this particular type of scavenger typically have survivor zone watermarks that vary between 50K and 120K, resulting in worst-case pause times of 100 milliseconds, and creation zones between 400K and 800K, resulting in a scavenge overhead of less than three percent.

Because it requires neither hardware assistance nor special operating-system support, this scavenger has been successfully deployed as a component of Objectworks for Smalltalk-80 fielded by ParcPlace Systems. This particular implementation, coded entirely in C, has been ported to a wide variety of personal computers and workstations, including the Apple Macintosh family, most 386-based DOS PCs, and the workstations sold by Sun, Digital Equipment, and Hewlett-Packard.

Trouble in Paradise

Generation scavenging is not without its shortcomings, however. Certain space-consumptive programs may produce so many live objects that their sheer volume simply overwhelms the capacity of the newspace architecture described earlier. These programs typically produce substantial amounts of tenured garbage that can result in wasted memory, poor paging behavior, and lengthy interruptions required to reclaim this garbage. These problems can be mitigated somewhat by utilizing several large additional generations, but not without incurring noticeable pauses when these additional generations are scavenged with the current generation of stop-and-copy algorithms. Finding an efficient way to collect these additional generations without perceptible pauses on stock hardware will require further research.

Furthermore, real-time programs frequently require drastically shorter scavenge pauses than normal interactive programs. The measures required to reduce these pauses can also result in a significant increase in the amount of tenured garbage.

Conclusions

Generation scavenging has proven to be an efficient, unobtrusive technique for reclaiming storage among an object population where deaths outnumber survivors. In addition, it has proven to be popular among language implementors. For example, all of the commercially available Smalltalk implementations use variants of generation scavenging as their primary reclamation systems. This popularity can be attributed both to the algorithm's simplicity and to the fact that it requires no special hardware support. Finally, I expect future developments in the area of generation-based garbage collection to proceed along the following lines:

    1. Finding efficient ways to collect the older generations without perceptible pauses on stock hardware.

    2. Reducing scavenge pause times far enough to satisfy real-time applications without generating excessive amounts of tenured garbage.

    3. Improving the virtual memory behavior of garbage-collected systems, for example by increasing the locality of reference of the objects housed on the heap.

References

    1. Baker, Henry G., Jr. "List Processing in Real Time on a Serial Computer." Communications of the ACM 21(4) (April 1978).

    2. Collins, George E. "A Method for Overlapping and Erasure of Lists." Communications of the ACM 3(12) (December 1960).

    3. Courts, Robert. "Improving Locality of Reference in a Garbage-Collecting Memory Management System." Communications of the ACM 31(9) (September 1988).

    4. Deutsch, L. Peter and Bobrow, Daniel G. "An Efficient, Incremental, Automatic Garbage Collector." Communications of the ACM 19(9) (September 1976).

    5. Fenichel, Robert R. and Yochelson, Jerome C. "A Lisp Garbage Collector for Virtual Memory Computer Systems." Communications of the ACM 12(11) (November 1969).

    6. Lieberman, Henry and Hewitt, Carl. "A Real-Time Garbage Collector Based on the Lifetimes of Objects." Communications of the ACM 26(6) (June 1983).

    7. McCarthy, John. "Recursive Functions of Symbolic Expressions and Their Computation by Machine," part I. Communications of the ACM 3(4) (April 1960).

    8. Moon, David A. "Garbage Collection in a Large Lisp System." Conference Record of the 1984 ACM Symposium on LISP and Functional Programming, pages 235-246, Austin, Texas, August 1984.

    9. Shaw, Robert A. Improving Garbage Collector Performance in Virtual Memory. Technical Report CSL-TR-87-323. Stanford: Stanford University, March 1987.

    10. Ungar, David. "Generation Scavenging: A Non-disruptive High-Performance Storage Reclamation Algorithm." Proceedings of the ACM Symposium on Practical Software Development Environments, Pittsburgh, Penn., April 1984.

    11. Ungar, David. The Design and Evaluation of a High-Performance Smalltalk System. ACM 1986 Distinguished Dissertation. Cambridge, Mass.: MIT Press, 1987.

    12. Ungar, David and Jackson, Frank. "Tenuring Policies for Generation-based Storage Reclamation." OOPSLA'88 Conference Proceedings. ACM, September 1988.

    13. Wilson, Paul R. and Moher, Thomas G. "Design of the Opportunistic Garbage Collector." OOPSLA'89 Conference Proceedings, ACM, October 1989.

Figures 1 and 2 originally appeared in the OOPSLA'88 Conference Proceedings, ACM, September 1988.