PROGRAMMING PARADIGMS

Four Hundred and Eighty-seven

Michael Swaine

A man is sent to prison, and during his first supper inside, a fellow prisoner stands up and says loudly, "Five thousand nine hundred and thirty-three." Everybody laughs. The next night, a different convict jumps up and shouts, "Three thousand and ten," and the crowd breaks up. This continues: The next day in the yard, a prisoner mutters, "Two sixteen," and the prisoners near him snicker and snort. Puzzled, the man finally asks his cell mate what's going on.

"There's one joke book in the prison library," his cell mate explains, "with the ten thousand best jokes, all numbered. Well, a lot of us have been here long enough to have memorized the jokes, so when we want to tell one, we just use its number. My favorite is four hundred and eighty-seven." And he chuckles softly to himself.

The next night at supper, the man stands up and shouts, "Four hundred and eighty-seven." His fellow prisoners stare at him in silence. Crushed, he sits back down and tries to finish his meal.

"I thought you said four hundred and eighty-seven was a good joke," he snarls at his cell mate later that night.

"It's one of the best," the cell mate drawls.

"Then why didn't anybody laugh?"

"You didn't tell it right."

If you're wondering right now what the number of the joke I just told is, you have the right mind-set for a discussion of representation in Lisp. If, furthermore, you think that the number of that joke ought to be four hundred and eighty-seven, and if you've thought about what a different joke it would be if that were the case, you must be a Lisp hacker.

Lisp hackers often point to certain features of the language in explaining why they choose to program in Lisp. These include recursion, the ability to operate on code as data, and extensibility. None of these features is unique to Lisp, though, and none of them gets across why Lisp is unique.

In this month's column, we'll take a look at representation in Lisp and try to get some sense of the power inherent in Lisp's representation scheme. We'll also examine the wide range of data structures that the Common Lisp standard supports, all built on this representation scheme.

Notes on Notation

"Author: The idea is to imitate Gödel's self-referential construction, which as you know is INDIRECT, and depends on the isomorphism set up by Gödel-numbering.

"Crab: Oh. Well, in the programming language Lisp, you can talk about your own program directly ... because programs and data have exactly the same form. Gödel should have just thought up Lisp...." -- Douglas Hofstadter, a fictional character, and Crab, a crab, in Gödel, Escher, Bach, by Douglas Hofstadter.

John McCarthy thought up Lisp 30 years ago as a tool for manipulating symbolic expressions, which is essential for tasks like symbolic integration. But the real point was to make it possible for a program to talk about a program, with an eye to developing provably correct programs. Toward this end, McCarthy used a class of expressions called "s-expressions," or symbolic expressions, based on Alonzo Church's work on the lambda calculus. S-expressions permit programs and data in Lisp to have the same form: Both are coded as s-expressions.

What, exactly, are s-expressions? That's really two questions: What do they look like and what do they do, or what's the notation and what's the interpretation? Both questions are fruitful.

The notation McCarthy used for s-expressions was list notation, and the name Lisp is an acronym for LISt Processing. Roughly, s-expressions are lists. But the matter of s-expression notation goes a little deeper than this.

Formally, an s-expression is either an atomic symbol (atom) or a list of s-expressions:

<s-expr> ::= <atom> | <list-of-s-expressions>

What's an atomic symbol? An atomic symbol, or atom, is, formally, just a string of characters subject to certain constraints. It's a name, a symbol, such as a name for a variable or constant or function in any language. Or, as we shall see, not quite like a name in any other language.

And the notation for lists? A list can be defined recursively as either the empty list (which is sometimes written using the atom NIL) or a parenthesized pair made up of an s-expression followed by a list, as in the right-hand alternative below:

  <list> ::= NIL | ( <s-expr> <list> )

In other words, all Lisp lists are binary trees with all their (non-NIL) atoms hanging off their left terminal branches, and all right terminal branches containing NILs. NIL is a primitive Lisp object (atom) used for several purposes, including representing Boolean false and the empty list.
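The recursive definition is easy to model outside Lisp, too. What follows is only a rough Python sketch, not a picture of how any real Lisp is implemented: The names Pair, NIL, make_list, and is_list are hypothetical, and atoms are stood in for by Python strings.

```python
from dataclasses import dataclass
from typing import Union

NIL = "NIL"  # assumption: the atom NIL is modeled as the string "NIL"


@dataclass(frozen=True)
class Pair:
    """One node of the binary tree: a left branch and a right branch."""
    left: "SExpr"
    right: "SExpr"


SExpr = Union[str, Pair]  # an s-expression is an atom (str) or a Pair


def make_list(*items: SExpr) -> SExpr:
    """Build a list by hanging each item off a left branch, last item first."""
    result: SExpr = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result


def is_list(x: SExpr) -> bool:
    """<list> ::= NIL | ( <s-expr> <list> ): follow right branches to the end."""
    while isinstance(x, Pair):
        x = x.right
    return x == NIL


two = make_list("A", "B")
assert two == Pair("A", Pair("B", NIL))
assert is_list(two) and is_list(NIL)
assert not is_list(Pair("A", "B"))  # the right terminal branch is not NIL
```

Note that is_list only walks the right branches: Whatever hangs off a left branch may be an atom or another tree, and the structure is a list either way.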

An alternative notation for lists used in Lisp is called "dot" notation. In dot notation, a (dotted) list can be defined thus:

<dotted-list> ::= ( <s-expr> . <s-expr> )

Some examples of Lisp s-expressions are:

  
  A
  (A . B)
  (A . NIL)
  (A . (B . NIL))
  (A . (B . ((C . (D . NIL)) . NIL)))

Here are prose descriptions of the five s-expressions above:

A is not a list, nor even a dotted list, but is both an atomic symbol and an s-expression.

(A . B) is a dotted list, but not a true list, because it has a non-NIL right terminal branch.

(A . NIL) is the list containing one element: the atomic symbol A.

(A . (B . NIL)) is the list containing the two elements A and B.

(A . (B . ((C . (D . NIL)) . NIL))) is the list containing three elements: A, B, and the list containing the two elements C and D.
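To check these descriptions mechanically, the five examples can be built as nested pairs and classified. This is a minimal Python sketch under stated assumptions: Pair, to_dotted, and classify are hypothetical names, atoms are modeled as Python strings, and NIL on its own is simply reported as an atom even though it also stands for the empty list.

```python
from dataclasses import dataclass
from typing import Union

NIL = "NIL"  # assumption: atoms are modeled as Python strings


@dataclass(frozen=True)
class Pair:
    """A dotted pair: (left . right)."""
    left: "SExpr"
    right: "SExpr"


SExpr = Union[str, Pair]


def to_dotted(x: SExpr) -> str:
    """Render an s-expression in fully parenthesized dot notation."""
    if isinstance(x, Pair):
        return f"({to_dotted(x.left)} . {to_dotted(x.right)})"
    return x


def classify(x: SExpr) -> str:
    """Name the category the text assigns to an s-expression."""
    if not isinstance(x, Pair):
        return "atom"  # includes NIL itself in this sketch
    while isinstance(x, Pair):
        x = x.right  # walk down the right branches
    return "list" if x == NIL else "dotted list"


examples = [
    "A",
    Pair("A", "B"),
    Pair("A", NIL),
    Pair("A", Pair("B", NIL)),
    Pair("A", Pair("B", Pair(Pair("C", Pair("D", NIL)), NIL))),
]

for e in examples:
    print(f"{to_dotted(e):40} {classify(e)}")
```

Only the second example ends its chain of right branches in something other than NIL, which is exactly what keeps it from being a true list.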

Figure 1 shows pictures of the binary trees that the s-expressions represent. There are reasons for using dot notation in low-level programming, but for most purposes the simpler list notation is used. In this shorthand form, the above forms are written:

(No list representation for A, since A is not a list.)
(No list representation for (A . B), since (A . B) is not a "true" list.)
(A)
(A B)
(A B (C D))

Lists in this notation can have any number of elements, but note that the underlying representation is still the binary tree described by the dotted pairs of the corresponding dot notation. This (non-dotted) notation is the form in which all Lisp programs are normally written. Every Lisp program or subprogram begins and ends with a matched pair of parentheses, usually with more pairs inside and with atomic symbols inside them. Every data object is also expressed as one of these parenthesized lists if it is not represented as an atomic symbol.
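The translation from the underlying pairs to the shorthand is mechanical: Walk down the right branches, printing each left branch in turn, and stop at NIL. Here is a hedged Python sketch of that walk. The names Pair and to_list_notation are hypothetical, atoms are modeled as Python strings, and falling back to a dot for a non-NIL tail is a choice made for this sketch (it mirrors the way Lisp systems conventionally print such structures).

```python
from dataclasses import dataclass
from typing import Union

NIL = "NIL"  # assumption: atoms are modeled as Python strings


@dataclass(frozen=True)
class Pair:
    """A dotted pair: (left . right)."""
    left: "SExpr"
    right: "SExpr"


SExpr = Union[str, Pair]


def to_list_notation(x: SExpr) -> str:
    """Render nested pairs in the shorthand list notation."""
    if not isinstance(x, Pair):
        return x  # an atom prints as itself
    parts = []
    while isinstance(x, Pair):
        parts.append(to_list_notation(x.left))  # each left branch is an element
        x = x.right                             # the right branch is the rest
    if x != NIL:
        parts.extend([".", x])  # not a true list: keep the dot for the tail
    return "(" + " ".join(parts) + ")"


nested = Pair("A", Pair("B", Pair(Pair("C", Pair("D", NIL)), NIL)))
print(to_list_notation(nested))          # a true list with a nested list
print(to_list_notation(Pair("A", "B")))  # stays dotted
```

The printer never needs to know how long a list is in advance. The parentheses in the shorthand hide the pairs, but the pairs are still what it is walking.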

Figure 1: Binary trees that s-expressions represent

  A:

      A

  (A . B):

      *
    /   \
   /     \
  A       B

  (A . NIL):

      *
    /   \
   /     \
  A       NIL

  (A . (B . NIL)):

      *
    /   \
   /     \
  A       *
        /   \
       /     \
      B       NIL

  (A . (B . ((C . (D . NIL)) . NIL))):

      *
    /   \
   /     \
  A       *
        /   \
       /     \
      B       *
            /   \
           /     \
          *       NIL
        /   \
       /     \
      C       *
            /   \
           /     \
          D       NIL

Lisp students early on decided that the acronym Lisp stood for "Lots of Insipid, Stupid Parentheses." Little wonder.