PROGRAMMING PARADIGMS

On the Paradigm's Beat at Neural Nets

Michael Swaine

Ron, Kent, and I spent the end of July in San Diego at the 1988 IEEE International Conference on Neural Networks (IEEE ICNN-88). Many luminaries of neural-network research were there, including Bart Kosko (USC), Robert Hecht-Nielsen (HNC Inc. and UCSD), and Teuvo Kohonen (Helsinki University of Technology), all of whom were on the IEEE ICNN-88 conference committee. Marvin Minsky gave an opening talk in which he acknowledged what many neural-network people feel to be the case: that his critique of early work in the area derailed neural-net research for a decade. He hadn't intended his analysis to be prescriptive, he told the crowd. He was just trying to explain why people seemed to be leaving the field, not to encourage others to join them. His talk seemed well received.

We attended the lectures (some of them), walked the exhibit floor, scoured the press room for press releases, and bought the proceedings. We walked up and down the poster aisles, mystified. Anyway, I was.

The poster sessions were a strange sight: rows of 4 x 8 boards set up at blackboard height, each covered with scribbled notes or typeset copy, or sequences of drawings, or photographs. In front of each board stood a speaker, the author of whatever was on the board, animatedly explaining the notes or copy or pictures to a small group of people, or waiting silently for an audience that might never form.

At the end of one poster aisle a priest stood in front of an empty board, talking in Italian to an audience of one. To say that it was incongruous would be to claim a better grasp of the congruity of the conference than I was ever able to secure, but it did seem odd.

Too Narrow, Too Soon

The truth is, a lot of the poster sessions and other technical presentations went over my head. The conference had the feeling of an academic event. Some of the sessions could have been comprehensible and interesting only to a narrow subset of the already narrow set of attendees. Some presentations, though, took a broader view. There were dozens of presentations of algorithms for learning, an important issue in neural networks, and some of these offered real perspective. There was, for example, an overview of the back propagation learning technique by Paul Werbos, whose neuron-based back propagation approach was turned down as a thesis topic at Harvard in 1972. There were dozens of talks on network architectures.

There were also some applications talks. Kent kept asking the pragmatist's question: What can you do with this stuff? The answer that kept coming back was: Solve tomorrow's problems. Certainly a paradigm that requires a parallel architecture is not going to solve many of today's problems for the vast majority of computer users. Among tomorrow's practical problems that people at the conference were attacking with neural networks were image processing, handwriting analysis, and speech recognition. The general level of all this work appears to be rudimentary.

There were also a dozen or so talks specifically classified as associative memory talks. Associative memory is a rich area of neural-network research.

And there were some presentations that made the attempt to map the frontier. Bart Kosko, in particular, presented a real map, a diagram showing the relationships among several of these memory models. It clears away a lot of underbrush to see that Hopfield circuits and Brain-State-in-a-Box models are both additive, as opposed to multiplicative (a.k.a. shunting), models, and that these are all special cases of bidirectional associative memory (BAM) models, which are special cases of adaptive bidirectional associative memory (ABAM) models. As Ron pointed out to me, Kosko's style, in writing and speaking, is exceptionally concise and to the point.
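The flavor of the BAM end of Kosko's map is easy to convey with a few lines of code. What follows is my own minimal sketch, assuming bipolar (+1/-1) patterns and Hebbian outer-product storage; the toy pattern pairs are mine, not anything from the conference floor.

```python
# Minimal bidirectional associative memory (BAM) sketch: store vector
# pairs in a single weight matrix, then recall by bouncing a cue back
# and forth between the two layers until it settles.

def sign(v):
    return [1 if s >= 0 else -1 for s in v]

def store(pairs, n, m):
    # W[j][k] accumulates x[j] * y[k] over all stored pairs (outer products).
    W = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for j in range(n):
            for k in range(m):
                W[j][k] += x[j] * y[k]
    return W

def recall(W, x, steps=5):
    # Bidirectional search: x forward through W gives y; y backward through
    # W's transpose gives x; repeated until the pair settles (CAM behavior).
    for _ in range(steps):
        y = sign([sum(x[j] * W[j][k] for j in range(len(x)))
                  for k in range(len(W[0]))])
        x = sign([sum(W[j][k] * y[k] for k in range(len(y)))
                  for j in range(len(x))])
    return x, y

pairs = [([1, -1, 1, -1], [1, 1, -1]),
         ([1, 1, -1, -1], [-1, 1, 1])]
W = store(pairs, 4, 3)
print(recall(W, [1, -1, 1, -1]))  # settles on the first stored pair
```

A Hopfield circuit drops out as the special case in which the two layers are one and the same, which is the sense in which it sits below BAM on the map.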

Some of the presentations that were at the opposite end of the clarity scale from Kosko's may have suffered from the usual problem of focusing on results to the exclusion of a specification of assumptions. This is particularly a problem in a young paradigm whose assumptions and terminology may be rather slippery.

Of course, that's what the conference was for: to foster communication among neural-network researchers and developers, to expand the base of common understanding that makes a paradigm a paradigm. On the basis of conversations overheard, a journalistic sense of how the attendees seemed to feel about the conference, and what I was able to get out of the conference myself, I'd say it was successful. Still, I had the sense that I was a graduate student again, puzzling over experimental designs and equations, and I wonder if that's not how a lot of the attendees felt (even if they didn't have to struggle as hard). And if so, I wonder if that's the right place for neural networks research to be. Is it getting too narrow too soon?

Semantic Vacuousness

The emphasis on technique also raises the question: What does it all mean? Neural networks are both an approach to modeling the mind and a parallel-processing paradigm for programming. Psychological models need to refer to something psychological, but programming paradigms viewed just as programming paradigms don't have to mean anything. But I'm going to stick my neck out and say, without any attempt to justify it, that I think the effectiveness of neural-network models, even viewed purely as a programming paradigm, will hinge in part on the semantics of the networks. What kind of semantics can we expect from neural networks?

Here's what Jerry Fodor, one of the leading thinkers on the philosophical foundations of cognitive research, has to say: "Rumor has it that, in semantics, AI is where the action is. But alas, soberly considered, computer models provide no semantic theory at all, if what you mean by a semantic theory is an account of the relation between language and the world."

A computer system--neural network, expert system, database, or whatever--that contains the element BOISE and the element CITY and the element IS-A may be able to construct something that we are pleased to call a representation of the fact that Boise is a city, but nothing internal to the system connects any part of it to the actual city Boise, or to the real-world fact. It looks entirely possible that we won't get a scientific semantics even in psychology; even less in programming. What we can get is not even an approximation to this; it's strictly symbols defined in terms of other symbols. What is that worth? How does it limit us? These seem to me to be hard questions that somebody ought to be asking.
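The point is easy to make concrete. Here's a toy system of my own construction (not anyone's exhibit) that is "pleased to call" its contents a representation of the fact that Boise is a city:

```python
# A toy "representation": the system answers queries about BOISE, but
# BOISE here is just a string related to other strings. Nothing in the
# program touches the actual city.

facts = {("BOISE", "IS-A", "CITY"),
         ("CITY", "IS-A", "PLACE")}

def is_a(x, y):
    # Symbols defined purely in terms of other symbols, with chaining.
    if (x, "IS-A", y) in facts:
        return True
    return any(is_a(mid, y) for (a, rel, mid) in facts
               if a == x and rel == "IS-A")

print(is_a("BOISE", "CITY"))   # True
print(is_a("BOISE", "PLACE"))  # True, by chaining symbols to symbols
```

The queries succeed, but every answer bottoms out in more symbols; swap "BOISE" for "XYZZY" throughout and the system behaves identically.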

A Semi-facetious Glossary

A conscientious student of neural networks with time to digest all the material from the conference would do well to present a glossary of neural-network terms. There are a lot of them, and the relationships among them are not always clear even when the terms have been defined. I am wrapping this up a couple of days after the conference, scrambling to meet a superstretched deadline, writing to fit. And I'm a dilettante, really. I can only, under the circumstances, give you a kind of cut-and-paste glossary, a list of some of the oft-heard terms, defined just about as the presenters defined them. What such a glossary shows has more to do with the state of communication in a developing discipline than with the useful clarification of fundamental terms.

The brief glossary that follows is not really intended to be useful. For that, it would have to be much longer, for one thing. And taking people's words out of context like this pretty much guarantees that the words will confuse rather than edify. My purpose in juxtaposing these definitions, with all the gaps in definition that they imply, is to demonstrate how people communicate in a developing paradigm. I'm acting here as a kind of anthropologist of programming.

Learning--Any change in any synapse. Some neural networks learn, some stabilize. Few do both.

Stability--The content addressable memory (CAM) property of neural networks, the input ball rolling down the nearest attractor basin, dissipating energy as it rolls.

Backpropagation--A supervised learning algorithm, complete with a particular choice of unit transfer functions (e.g., semilinear with logistic squashing), error function (e.g., mean-square error), and weight-update rule (e.g., using momentum).

Boltzmann machine (BM)--A parallel computing network consisting of simple processing units connected by bidirectional links.

Associative memory (AM)--A process dual to self-organized pattern formation in nonlinear dynamical systems.

Bidirectional associative memory (BAM)--An associative memory that uses forward and backward bidirectional search to recall an associated bipolar vector pair from an input pair.

Bidirectional associative memory (BAM)--A minimal two-layer non-linear feedback network.
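One of those entries, at least, can be made concrete. Here is my own minimal sketch of the back propagation recipe as the glossary entry describes it: logistic squashing, mean-square error, and a momentum term in the weight update. The 2-2-1 network, the learning rate, and the XOR training data are my illustrative choices, not anything presented at the conference.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# 2-2-1 network. Each weight row carries a trailing bias weight;
# the v_* lists hold the momentum terms for the weight updates.
w_hid = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
v_hid = [[0.0] * 3 for _ in range(2)]
v_out = [0.0] * 3

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hid]
    o = sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + w_out[2])
    return h, o

def train_step(x, t, lr=0.3, mom=0.8):
    h, o = forward(x)
    # Error gradients through the logistic squashing function.
    d_out = (o - t) * o * (1 - o)
    d_hid = [d_out * w_out[i] * h[i] * (1 - h[i]) for i in range(2)]
    for j, inp in enumerate(h + [1.0]):       # output weights (bias last)
        v_out[j] = mom * v_out[j] - lr * d_out * inp
        w_out[j] += v_out[j]
    for i in range(2):                        # hidden weights
        for j, inp in enumerate(x + [1.0]):
            v_hid[i][j] = mom * v_hid[i][j] - lr * d_hid[i] * inp
            w_hid[i][j] += v_hid[i][j]

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR

def mse():
    return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

before = mse()
for _ in range(3000):
    for x, t in data:
        train_step(x, t)
print(round(before, 3), "->", round(mse(), 3))
```

The mean-square error falls as the weights adapt, which is all the glossary entry promises; everything beyond that (architecture, rates, stopping rule) is the sort of particular choice the entry says comes with the algorithm.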