PROGRAMMING PARADIGMS

Getting CLOS

Michael Swaine

What makes Lisp relevant today is that it is converging, in terms of features and performance, with other development environments for large software projects. When Guy Steele published Common Lisp: The Language (Digital Press, 1984), he codified what quickly became the de facto standard for Lisp; now the ANSI subcommittee X3J13 has nearly completed a draft standard for Common Lisp that includes the Common Lisp Object System (CLOS), an object-oriented extension to the language. I had this column half written when the second edition of Steele's book arrived, containing much new material, including an entirely new chapter on CLOS. It forced me to go back and rewrite several things; this column also corrects some things I said last month that are now out of date. Steele's treatment of CLOS is essentially the ANSI committee's treatment, and should be very close to the final draft standard, due out this year.

This convergence, though, is turning Lisp into something new. At last year's OOPSLA meeting, Bjarne Stroustrup summed up CLOS by calling it a multi-paradigm language. The circumstances (the developer of C++ being asked to deliver a lecture on the virtues of CLOS) left it unclear whether he meant it as a term of opprobrium or as a compliment.

This column's beat is paradigms, and it seemed worthwhile to take a look at how one paradigm (functional programming) is extended to another (object-oriented programming). In January we looked at "pure" Lisp; in February we saw how this pure functional paradigm has evolved with the widespread acceptance of Common Lisp, and this month we'll take a look at the objectification of Lisp in the form of the Common Lisp Object System. We'll examine two themes: How the Common Lisp data-type system underlies the CLOS class system, and how the basic concept of a function, a key aspect of Common Lisp as well as of "pure" Lisp, has been extended to the object world.

Typing Tutor

Some of the things I said last month have been superseded by the new edition of Steele's book, and this edition makes some things more official than they were previously. Because of these things and also because CLOS classes map into the Common Lisp hierarchy, I'll spell out the Common Lisp data type relationships in some detail.

To begin with, it's not really a hierarchy, but an overlapping structure that Rosemary Simpson, in her Common Lisp: The Index (Coral Software and Franz, Inc., 1987) calls a "heterarchy." Two types stand at the very top and bottom of the Common Lisp data type heterarchy. t is a supertype of every other type, and nil is a subtype of every other type. No object is of type nil. Every object is of type t.

The following subtypes of type t are of interest because X3J13 has defined them to be pairwise disjoint: character, number, symbol, cons, array, random-state, hash-table, read-table, package, pathname, and stream. A Common Lisp object cannot belong to more than one of these types, although it need not belong to any of them.

In addition to these types, any data type created by the defstruct or defclass macros (a user-defined structure or a CLOS class, respectively) is also disjoint from any of the above types. Any two user-defined structures are disjoint from one another unless defined otherwise, and the same goes for classes. Classes, though, are always defined in terms of other classes. I won't say much about structures here, and I'll discuss classes later.

Functions are data objects, too, and the data type function is disjoint from some of the above types, specifically from character, number, symbol, cons, and array. The types character, number, symbol, cons, array, and function are worthy of some elaboration.

Lisp Has Character

First, I'll discuss characters and numbers, correcting some outdated info from last month.

X3J13 redefined the character subtypes that were given in the first edition of Steele's book. Now the base-character and extended-character subtypes form an exhaustive partition of the type character. All characters are one or the other of these types. Base-character is implementation-defined, but must be a supertype of standard char, which is a set of 96 characters that any Lisp implementation must support; the extended-charactertype seems to be X3J13's way of dodging the confusion of bit and font attributes prevalent in Lisp.

Formerly, the data type number contained three disjoint subtypes, rational, float, and complex. Now a new type, real, has been introduced. The hierarchy runs like this: Types real and complex are disjoint subtypes of type number, other subtypes of type number can be defined. Each of these two subtypes also has two disjoint subtypes. Type real has the disjoint subtypes rational and float; it's possible to define other real subtypes. Type rational has the disjoint subtypes integer and ratio; other rational types can be defined.

However, type integer has exactly two subtypes, and Common Lisp does not allow other subtypes of integer to be defined. The two integer subtypes are fixnum and bignum. The fixnum data type is a conventional fixed-word-length integer, the word length being implementation-dependent. bignums are "true" integers, their size dependent only on storage limits, not on word length. fixnums are more efficient than bignums, and are used where efficiency is more important than being able to represent precisely the number of grains of sand required to fill the universe. For example, fixnum is the required data type for array indices.

An object of type ratio represents the ratio of two integers. The Lisp system is required to reduce all ratios to the lowest terms, representing a ratio as an integer if that is possible.

Common Lisp defines four subtypes of type float, but an implementation need not have all four as distinct types. Types short-float, single-float, double-float, and long-float, in nondecreasing order of word length, all must be supplied, but any adjacent pair or triplet of these may be identical. Any float subtypes that are not identical must be disjoint.

An object of type complex represents a complex number in Cartesian form, as a pair of numbers. The two numbers must be of type real, and both must be rational or both must be of the same floating-point type.

Everything in Lisp is a List

Characters and numbers are straight-forward data types, but symbols and lists are trickier. Symbols are named data objects. Type symbol includes among its subtypes one peculiar subtype: type null. null is the type of exactly one Lisp data object: the object nil. The status of type null is one reason that the type relationships of Common Lisp form a heterarchy rather than a hierarchy. null is a subtype of two types, neither of which is a subtype of the other: symbol and list. nil is the only object that is both a list and a symbol.

Actually, at another level, all symbols have a list-like structure. Each symbol has an associated data structure called a "property list," a list of pairs, the first elements being (typically) symbols, and the second elements being any Lisp data objects. The purpose of the property list of a symbol has evolved over time; in Common Lisp it is less important than in earlier Lisps, being used now for data not needed frequently, such as debugging, documentation, or compiler information. Neither a property list nor a symbol is of type list, but somehow everything in Lisp is a list of some sort. (Viewed another way, almost everything in Lisp is a function, as we'll see shortly.)

The data type list, though, is not regarded as being as basic as type cons. These are alternate ways of viewing the same thing. A list is recursively defined to be either the object nil or a cons whose second component is a list. A cons is a data structure with two components, which can be pretty much anything; usually, though, the second component of a cons is a list (or nil, the empty list). The first components of the conses making up a list are the elements of the list.

The data type cons, then, is the type of the basic data structure used to build lists. Any object that is a cons is also a list, so list is a supertype of cons. The data type list has exactly two subtypes, and they are disjoint: cons and null. In this sense, null is the (type of the) empty list. list itself is a subtype of the data type sequence, which has one other subtype: vector. vector and list are disjoint.

Vectors, and arrays generally, can be rather complex. Arrays can be complex, with the ability to share data with other arrays, be dynamically sized, and have fill pointers. An array that has none of these features is called a "simple array." Vectors are one-dimensional arrays; they differ from lists in performance characteristics. Accessing an element of a list is, on average, a linear function of list length, while the time to access an element of a vector is constant. When it comes to adding an element to the beginning of a list or vector, though, the relationship is reversed: constant for the list, and a linear function of vector length for the vector.

One of vector's more interesting subtypes is type string. Type string is the union of one or more vector types with the characteristic that the types of the vector's elements are subtypes of type character.

According to X3J13, the data type function is strictly disjoint from data types cons and symbol. But lists and symbols are the only tools available for referring to functions, or for invoking them. This is probably a use-mention distinction, but in any case, when a list or symbol is used in this way it is automatically coerced to type function. As we'll see shortly, there's some truth to the exaggeration that everything in Lisp is a function.

Lisp Has Class

CLOS is an object-oriented extension to CL, adding four kinds of objects to CL: classes, instances, generic functions, and methods. The key aspects are generic functions, multiple inheritance, declarative method combination, and a metaobject protocol. Classes and instances are tied to data types, generic functions to functions. I'll say only a little bit here about the metaobject protocol, which is not yet officially a part of CLOS.

The Common Lisp Object System maps classes into the data types just described. Many Common Lisp types have corresponding classes with the same names, but not all. Normally, a class has a corresponding type with the same name.

Because the types do not form a simple tree, and a type can be a subtype of two types neither of which is a subtype of the other, you might expect CLOS to support multiple inheritance, in which a class can inherit from more than one superclass. In fact, this is the case. The heterarchical structure of types is mirrored in the inheritance structure of classes, but CLOS requires that more structure be added to establish a clear precedence order for inheritance. For example, the class vector has superclasses sequence and array, just as the type vector has supertypes sequence and array, but from which superclass does vector inherit what?

CLOS resolves questions such as this by requiring that you specify an ordering of direct superclasses when you define a class (and by supplying this ordering for predefined classes). The business of deriving a full precedence order is fairly complex, but the CLOS class precedence order for predefined classes resolves such issues. In particular, the precedence order for the class null is null, symbol, list, sequence, t; and the precedence order for the class string is string, vector, array, sequence, t. By implication, the precedence order for the class vector is vector, array, sequence, t; so array methods have precedence over sequence methods when class vector is inheriting methods.

Everything in Lisp is a Function

The simplifying generalization is that everything in Lisp is a function. It's nearly true; any data object can be treated as a function, or rather, as a form. A form is simply a data object treated as a function. You treat a data object as a function when you hand it to the evaluator, which is the mechanism that executes Lisp programs. The evaluator accepts a form and does whatever computation the form specifies.

The evaluator can be implemented in various ways, such as by an interpreter that traverses the form recursively, performing the required calculations along the way; or as a pure compiler; or by some mixed form. Common Lisp requires that correct programs produce the same results, regardless of the method of implementation. The evaluator is available to the user via the function eval, and also the special form eval-when, which allows specifying that a form should be evaluated, say, only at compile time.

Not every data object specifies a meaningful function, but most do. To the evaluator, there are three kinds of forms, corresponding to three nearly disjoint data types. There are symbols, lists, and self-evaluating forms (per X3J13, all standard Common Lisp objects, except symbols and lists, are self-evaluating forms).

Self-evaluating forms are taken literally by the evaluator; they return themselves on evaluation.

Symbols name variables, constants, keywords and functions. They evaluate to whatever they name; for example, what they are bound to or what they are set to.

Lists, from the viewpoint of the evaluator, come in three varieties: special forms, macro calls, and function calls. Note that while a function is not a list, a function call is.

Special forms are structural elements of the language that don't fit the functional paradigm well, such as the if-then-else structure. These deviations from the purity of the paradigm have been a part of Lisp since the beginning, and new special forms have been added over the years, but in Common Lisp the set of special forms is fixed and cannot be extended by the programmer. A macro is a function from forms to forms, much as in other languages. A macro call, when evaluated, is said to be expanded. Programmers can extend the set of macros. Despite the fact that they are not true functions, special forms look like functions syntactically, as do macros. The consequence of this is that when you are sitting at the keyboard typing in Lisp code, it feels like you are dealing with one kind of construct: A parenthesized list that represents a function and its arguments.

A form that is a function call consists of a list whose first element is a function name. The other elements of the list, if any, are treated by the evaluator as forms to be evaluated to provide the function with arguments. There are two levels of evaluation that take place whenever the evaluator deals with a function call: The arguments get evaluated, then the function is evaluated with these arguments. Typically, the evaluation of the function produces a value, which becomes the value of the original form.

There are two ways in which the first element of a form can name a function, one involving a symbol and the other involving a list. Because symbols are used to name functions, this is the most direct and obvious way. The other way involves the use of a lambda expression. A lambda expression is technically not a form, and cannot be evaluated. It is a list, the first element being the word lambda. The second element is a list of parameters, and this is followed by some number of forms to be evaluated, which can use the parameters. When the function that the lambda expression names is applied to arguments, the parameters are bound to the arguments and the forms are executed with these bindings.

Using a lambda expression as a function name is like slipping physical actions into your speech, as you would be doing if you referred to what comes at the end of a joke by making a punching motion, then saying the word "line." Lambda expressions see their main use in defining functions, roughly like this:

defun <fn-name> <lambda-list> <forms>

CLOS adds generic functions to Lisp. Because the evaluation of functions is central to Lisp, the extension of functions to generic functions has a lot to say about how it feels to program in CLOS.

A generic function is a true Lisp function, is called with the same syntax, and can be used in the same contexts in which a Lisp function can be used.

Defining a generic function object is similar to defining a function. You use the defgeneric macro, basically like this:

defgeneric <fn-name> <lambda-list> <methods>

The difference is that, rather than a fixed set of forms to be evaluated, the generic function has a collection of method descriptions, each of which may consist of a number of forms. The method descriptions have their own lambda lists that must be congruent with the main lambda list. Texas Instruments has implemented generic functions in its TICLOS as normal compiled functions with pointers to data structures containing their slots. When the function is called, it is up to the object system to select the appropriate method from its methods. Actually, not select; the technique is more general than this, and is called "method combination." The code eventually executed is called the "effective method."

The selection/combination has three stages: select applicable methods, order them by precedence, and apply method combination. The method combination, defined in the definition of the generic function, can be as simple as using the most specific method, or it can be some function of some of the applicable methods. Some built-in method combination types are +, and, or, append, max, and min, which perform the corresponding functions on the applicable methods to produce the effective method.

Some of the most interesting CLOS functions are those that allow customization of the object system itself, by manipulating metaobjects and metaclasses. Unfortunately, these have not yet been approved by X3J13 for inclusion in the standard. They do, however, support the original spirit of Lisp as an introspective language, with all the strangeness that Douglas Hofstadter suggested when I quoted him last month, a quote that I here double-quote:

"A ... double-entendre can happen with LISP programs that are designed to reach in and change their own structure. If you look at them on the LISP level, you will say that they change themselves; but if you shift levels, and think of LISP programs as data to the LISP interpreter ... then in fact the sole program that is running is the interpreter, and the changes being made are merely changes in pieces of data."

Editor's Note: For a general discussion of functional programming, see "Functional Programming and FPCA '89" by Ronald Fischer, DDJ, December 1989. Also, see "A Neural Networks Instantiation Environment" by Andrew J. Czuchry, Jr. in next month's DDJ for more information on programming in Lisp.