PROGRAMMING PARADIGMS

A Decade Later

Michael Swaine

Each year, The Association for Computing Machinery (ACM) gives its Turing Award to a computer scientist who has earned distinction for technical contributions of lasting and major importance to the field of computer science. The recipients through 1985 were Charles Bachman, John Backus, E.F. Codd, Stephen Cook, Edsger Dijkstra, Robert Floyd, R.W. Hamming, C.A.R. Hoare, Kenneth Iverson, Richard Karp, Donald Knuth, John McCarthy, Marvin Minsky, Allen Newell, Alan J. Perlis, Michael O. Rabin, Dennis Ritchie, Dana S. Scott, Herbert Simon, Ken Thompson, Maurice V. Wilkes, J. H. Wilkinson, and Niklaus Wirth.

I list them all here not only because they all deserve recognition, but also to give a sense of the range of contributions honored, from Minsky's AI work to Codd's creation of the relational database model, from Hoare's foundational work in axiomatic semantics to Ritchie and Thompson's creation of C and Unix.

Clearly, recognition is not solely restricted to abstract, academic achievements. And in fact the Turing Award lectures, delivered by the recipients at annual conferences of the ACM, usually stress-broad issues of concern to all programmers. These lectures have been collected in a great book ACM Turing Award Lectures: The First Twenty Years: 1966-1985, Addison Wesley, 1987.

The 1978 Turing Award went to Robert Floyd for his work in the foundations of the theory of parsing, programming language semantics, the automation of program synthesis and verification, and the analysis of algorithms.

It was Floyd's Turing Award lecture, The Paradigms of Programming, that inspired this column. In that lecture, he cited inadequacies in our store of programming paradigms, in our knowledge of existing paradigms, in the way we teach paradigms, and in the way our languages support the paradigms of their user communities. Recently I talked with Professor Floyd about his thinking in the area of programming paradigms a decade after the lecture.

Swaine: In this column, I've talked about paradigms generally at the level of the programming language implementation. But you're really not working with different paradigms at that level, are you?

Floyd: No, I do most of my programming in Pascal. There are paradigms on all scales.

On one end of the scale you have the paradigm that's embodied in a language: object-oriented programming and functional programming and logic programming and all these things. At the other extreme there are things that would be substantial conveniences right down at the micro level, like having parallel assignment. Then there's the intermediate level, the paragraph of programming.

Swaine: How would you characterize your work today?

Floyd: My own programming efforts for a very long time have been concerned with programming in the small rather than in the large. The programs I write myself seldom go more than four pages of Pascal. Typically, I'm concerned with getting the very good ten-to-fifty-line program for a mathematically sharply-defined task. I spend a lot of my time deducing the mathematical characteristics that the solution has to have and substantially less time working out how the mathematical objects will be represented as data structures and how the coding will go. I don't face the software engineering problems of programming in the large.

Swaine: You work more at the paragraph level? Can you cite some examples of these paragraph-level paradigms?

Floyd: A couple of those that are reasonably well known that I can mention as typical are dynamic programming and branch-and-bound algorithms. Then I use various subject-matter oriented paradigms. Statistical sampling is an example of that.

Swaine: I think you'll have to explain how statistical sampling can be a programming paradigm.

Floyd: There are problems that are not in themselves statistical problems but where you can improve the performance of an algorithm by taking a random sample of the data set and pre-computing something for that, in effect, gives you an estimate of what your final result is going to be.

The algorithm that Ron Rivest and I developed for calculating the median and other quantiles is an example of that. Random sampling is in no way required, but it turns out to be the way that we got an algorithm that appears to be asymptotically close to the maximum efficiency, and I don't think anybody has discovered a non-sampling algorithm that can accomplish that.

Here's another example. I have a mathematical sample that is quite tightly clustered, where the mean is very large compared to the standard deviation, and I want to calculate the standard deviation very precisely. The obvious ways of calculating standard deviation are numerically rather dangerous. Not only can they be very inaccurate, but the cumulative effect of rounding errors can very easily lead to your taking the square root of a negative number. One way to attack that problem would be to take a random subset of the data, using that get an estimate of the mean, then to run through the entire data set, subtracting that estimate off all the data. That doesn't change the standard deviation, but it does scale the data down in magnitude so that the variations among them are now large compared to the data themselves. And that makes the problem numerically well conditioned.

Swaine: So in one case you've used sampling to produce an efficient algorithm, and in another case to improve accuracy?

Floyd: Yes, that kind of use of sampling is a general paradigm. I've used it in lots of areas that are not necessarily terribly close to one another. It's obviously not something that's going to be put forward as a way to solve all the world's problems, the way that functional programming and such are presented as the universal solvent. But I think that the high-level programmer is going to need to know a lot of these paradigms on the intermediate scale, the scale of the sentence and the paragraph rather than the scale of the page and the book.

Swaine: One paradigm that some have put forth as the universal solvent is structured programming. You describe it in your Turing Award lecture as "the dominant paradigm in most current treatments of programming methodology." But it has its detractors: Rich Gabriel at Software Development 88 last year said something to the effect that structured programming had done more damage to programmers' thought processes than the GOTO statement.

Floyd: I don't know why Rich would have said that.

Swaine: Don't take the quote too seriously; I'm misquoting from memory. But where do you stand on structured programming today?

Floyd: As I said in the lecture: "Its firmest advocates would acknowledge that it does not by itself suffice to make all hard problems easy." An example where the top-down structured programming approach doesn't work very effectively is the eight queens problem. Wirth tackled that in Program Development by Stepwise Refinement, and he got an answer, but it's a real kludge. There's an earlier paper of mine that solved it using a non-deterministic paradigm. You write a program for an imaginary non-deterministic computer that makes arbitrary choices in deciding where to put the queens on the chessboard, but also makes tests to see if it's still on track, to see if it's violated any rules yet, and aborts if it has made any mistakes. Then that can be macro-expanded into a backtracking program that backs up wherever the original program would have aborted, and systematically searches the entire space of choices.

The program that you write as a non-deterministic program is very simple and is itself within the classical top-down discipline, but the macro expansion from it produces a program in which the loops don't nest. If you looked directly at the result from the macro expansion you'd say that there's no structure here, this is a kludge, this is hacking. But there is a conceptual structure in the sense that it's the result of a simple transformation from a simple formulation.

That's the kind of thing that makes me think of the top-down viewpoint as excessively limited. It has its uses, but within that framework there are some thoughts that are too complicated to think.

Swaine: In your lecture, you dismiss the idea that automatic programming will solve the so-called software crisis. Is there any more promise there today?

Floyd: I don't think that there is going to be any universal solvent for programming. Programming is hard.

There has been some interesting work on automatic program generation. Cordell Green has concentrated on algorithms for sorting and searching. He estimates that a program to come close to human performance in this area has to simply know a lot of facts that a skilled programmer in this area would know. The number of facts is on the order of five hundred.

So one can't expect to march up to a completely naive computer and give it a problem and expect it to come out with a really neat program. Because really net programs are based on a lot of world knowledge.

I am sure that what he found in his subject matter you would find in any subject matter that people want to apply programming to. I know that if you want to do sophisticated graphic programming, there's a lot that you have to know about computational geometry and appropriate representation and algorithms for triangulation.

These things don't invoke a software crisis. These things are problems typically at the scale of the programs that I have been talking about, that fit on a page or so. But they are hard. You have to know a lot and be able to use that knowledge in order to be able to do these things well.

Was it Euclid who said "There is no royal road to geometry?" There is no royal road to programming. Programming is hard. On the other hand, there are problems that look very hard, but if you have the appropriate paradigm they become easy.

Swaine: For example?

Floyd: In the context of a spelling checker, I've got a misspelled word, and I want to compare it to a word from the dictionary, to measure how different they are. I'm looking for the most similar word in the dictionary; which is to say I want to know the minimum number of editing operations, from a limited set of editing operations, that will convert the one word into the other.

Years ago I gave that as a final project in a programming course in which I hadn't talked about dynamic programming, and nobody got anywhere with it. Those who did it ended up limiting themselves very severely, restricting themselves to looking at one or two editing operations. But if you formulate it as a dynamic programming problem where the subproblems are to measure the difference between every prefix of the one word and every prefix of the other, and you go through those in increasing length of these prefixes, it's a very straight-forward problem. Ten or twenty lines does it. I could write it in half an hour and have time left over to trim my nails.

Swaine: What general paradigm do you find most useful in your work?

Floyd: In my programming, I try to find mathematical characterizations of what I'm looking for, to limit my search space.

You're familiar with loops that you have to go through n plus one-half times? For example, you're processing a data set and one of the data is a sentinel. You have to read all the n+1 data, but the processing part omits the sentinel. That can be very difficult to explain to a novice programmer.

Swaine: And yet it's something that novice programmers run into immediately.

Floyd: Very definitely. Well, they can deduce from the statement of the problem some things that tell them that their programs are going to need some things that they might not guess. It's plausible that the program is going to use some kind of iteration, and that the processing step is going to be inside the loop so that they can do it as many times as they need to. But they're only going to process n items, while they need to read n+1 items. From this they can deduce that there has to be a read outside the loop or they have to have some conditional branching inside the loop. And when they argue that the test that controls the loop is going to have to always have one datum available to it, that tells them that they need one read outside the loop, and in particular before the loop. And if they're going to alternate between reading and processing, that tells them that inside the loop, they need first to process and then to read, which is contrary to what they'd expect.

So by just some logical reasoning you characterize the kind of program you need well enough that your search space is starting to get quite reasonable, even if you are a novice.

Swaine: Speaking of novices, one of the emphases of your Turing Award lecture was on the way in which we educate, people about programming. It strikes me that American education generally shortchanges students on the paradigm level when dealing with technical and quantitative matters. Most people don't even know how to check results for plausibility.

Floyd: Right. They come out of school never having learned to subject their answers to plausibility checks. A good many schools seem to adopt an approach to mathematical problem solving that shows the student how to turn himself into a machine. And machines don't worry about the plausibility of their results.

Swaine: I suppose that implementing plausibility checks in a programming language would be something of a challenge.

Floyd: Well, you could have a language in which every datum had dimension, and the type checking included dimension checking, so that you couldn't add feet to square feet and if you added feet to yards an automatic conversion took place. It would be fairly straightforward to incorporate that in a programming language if you wanted to do it.

Or you could write a programming language for banks. Banks neither create nor destroy money, and their programs should embody that fact. The programmer should not be able to make money disappear, because if there's money disappearing in the program, the chances are there's real money disappearing.

Swaine: That sounds useful. But to get back to education, where do paradigms fit into the computer science curriculum?

Floyd: Paradigms are not something you could have a separate course about. They fit in by getting professors to talk explicitly about the paradigms that they use in the particular subject discipline in which they're lecturing.

Swaine: It's appealing to believe that we could educate and excite new programmers about programming through the concepts and paradigms of programming as effectively as we now do through that sense of power or mastery or relief or whatever it is that they get when they finally make the machine do what they want. I assume you're doing that in your courses. How do you go about it?

Floyd: Right now I'm teaching a course in searching and sorting out of Knuth Volume 3, which is a huge compendium of great algorithms in that area and analyses of their performance, but which to my mind lacks a unifying viewpoint. What I use as the unifying viewpoint is the idea that every problem in that domain has a certain inherent difficulty that I call its entropy; the formula from information theory and thermodynamics that gives you that. So I use an information-and-entropy paradigm to get an overall viewpoint.

Swaine: How do you do that?

Floyd: In these sorting and searching programs, you are extracting information from data sets. The program makes various queries of the data, and each query gathers a certain amount of information, and you can derive a formula for that. Then you can characterize the algorithms by the rate at which they gather relevant information, and divide that into the entropy of the original problem to get the time that it's going to take.

A well known case is that you're sorting a data set and the incoming order of the data is random. Then each of the possible reorderings is equally likely and has probability 1/n! and the entropy of that problem turns out to be log of n! and, applying Sterling's approximation you end up with n log n. This tells you that any method that you apply that only extracts one bit of information per step will necessarily need time proportional to n log n to determine the right permutation to reorder the data.

They go up in sophistication from there, but one can view most sorting and searching problems, and some other problems like problems of large-scale movement of information in memory, from an information-and-entropy viewpoint. And you'll end up not only having some insight into whether an algorithm is efficient as a whole, but you'll also have a microscope with which you can look at each phase of an algorithm and ask, What's its information-theoretic efficiency? Is there room in there for potential improvement?

Swaine: Are there books on paradigms or problem solving that you admire?

Floyd: In one of your columns you cite George Polya's How to Solve It, but to my mind the great one is his two-volume Patterns of Plausible Inference. In it, he deals with several tough problems from the history of mathematics, such as the isoperimetric inequality. When you read Polya you not only learn about paradigms, you learn something about the subject matter.

Educating programmers is only part of the business of educating people about programming. It strikes me as plausible that the level of paradigms might be the appropriate level for talking to intelligent non-programmers about what programming is about. But that doesn't seem to happen at all.

Because of the way in which programmers acquire what they know, they often don't know what they know; they can't articulate it in any general terms.

Swaine: I just read an article that addresses exactly that. It was the public lecture that Tony Hoare gave on his induction to one of the professorships he held. He was describing adapting the Quicksort algorithm to find a particular quantile, and he illustrated it by hand using cards with numbers printed on them. The whole thing was nonmathematical and directed at an audience that was not at all familiar with the shoptalk of computing. It's in a book that just came out: Lectures in Computer Science, Hoare et al, from Prentice-Hall, 1989.

Floyd: There have only been a few good essays of that sort directed at the layperson. All too often, if they don't think that the computer is an electronic brain, they think that it's a glorified adding machine.

Swaine: You certainly staked out the high ground in your Turing Award lecture. What do you think the impact of the speech has been?

Floyd: The lecture had very little observable effect. It's been cited very little in the literature; I've checked that through science citation indices, and most of the citation has been peripheral: citation of specific examples or problems, rather than concerned with the thesis. My overall feeling is that the world has not advanced very much in that direction.

Swaine: Perhaps this interview will give the world a tiny nudge in that direction. It occurs to me that, although we've used the word paradigm liberally, we haven't really defined it except through example. Maybe that's the best way to define it. But could you give one more example of a paradigm?

Floyd: In high school you probably learned to solve quadratic equations by completing the square. The theorem says, You can complete the square. The paradigm says, In a lot of situations you ought to complete the square.