PROGRAMMING PARADIGMS

Neural Nets: A Cautionary View

Michael Swaine

The September issue of Byte magazine brought together some computer industry experts to discuss, among other things, the staying power of some new or newly popular technologies. Expert opinion was divided on whether neural nets are a flash in the pan. With all due respect to the experts, neural nets have already demonstrated their usefulness in several areas. If nothing else, they have a place in multidimensional pattern recognition.

Neural nets are a useful tool for solving certain kinds of problems. This column is about what they are not.

The New Connectionism

One thing neural nets are not is just another programming methodology. Neural nets, along with Parallel Distributed Processing, are part of a movement in cognitive and computer science called the "New Connectionism."

Parallel Distributed Processing (PDP) is on the cognitive side of the fence. According to David Rumelhart and James McClelland, whose book Parallel Distributed Processing (MIT Press, 1986) defined the discipline, PDP models "assume that information processing takes place through the interactions of a large number of simple processing elements called units, each sending excitatory and inhibitory signals to other units. In some cases, the units stand for possible hypotheses about such things as the letters in a particular display or the syntactic roles of the words in a particular sentence."

That sounds like a description of neural nets, and in fact neural nets more or less represent the computer science and engineering side of the New Connectionism. Generally, a neural net implementation for, say, picking tanks out of the foliage in grainy photographs, looks like a PDP system, but the goals of the implementors are different. Neural nets are built to get something practical done, rather than to model the mind.

There exists today an intricate weaving between cognitive and computer science. This column attempts to follow some of the threads of that common fabric.

The New Connectionism is a recent revival, with differences, of an old idea. Connectionism in this new form has a strong attraction for many computer and cognitive scientists. Jerry Fodor and Zenon Pylyshyn examine both the attractions and the assumptions of the New Connectionism in "Connectionism and Cognitive Architecture: A Critical Analysis" in Connections and Symbols, eds. Steven Pinker and Jacques Mehler (MIT Press, 1988). "On the computer science side," they say, "connectionism appeals to theorists who think that serial machines are too weak and must be replaced with radically new parallel machines, while on the biological side it appeals to those who believe that cognition can only be understood if we study it as neuroscience.... It also appeals to many young cognitive scientists who view the approach as not only anti-establishment (and therefore desirable) but also rigorous and mathematical."

I intend to present here Fodor and Pylyshyn's critique of the New Connectionism. Their critique evaluates the New Connectionism as a model of cognitive architecture; that is their interest, and the perspective of philosophy, psychology, and linguistics. Why should this interest us as programmers or computer scientists? Because it is very relevant to understanding the potential of neural nets as a programming tool. If the New Connectionism is fundamentally incapable of modeling cognitive architecture, then neural nets are far less powerful than many neural net proponents believe.

It may not be obvious how close cognitive science and computer science have grown in recent years, but even their rivalries now line up: The chief competitor of the PDP model in cognitive science and the chief competitor of neural nets in computer science are one and the same. The Classical models that Fodor and Pylyshyn set against the New Connectionism were derived from the structure of Turing and von Neumann machines.

Psychological theory today shapes up largely as a battle between cognitive neural nets and cognitive Turing machines.

Rocks Regarded as Real

Fodor and Pylyshyn begin by showing that both Connectionist models and Classical models want to operate at the same level of explanation. This is not a trivial point in their domain. Psychology has a long tradition of reductionism that has spawned several distinct schools.

One such school, Behaviorism, was founded in 1913 in a fiery essay by a relatively unknown young psychologist named John Watson. Watson called for a purely objective science of psychology, jettisoning all the fuzzy-headed introspection of the day. There was a lot to jettison, and so welcome was his argument that not long afterward Watson was elected president of the American Psychological Association. Watson subsequently left academia to become an advertising agency executive, perhaps perceiving better than his followers the true mission of Behaviorism.

Behaviorism's most charismatic modern spokesperson was B.F. Skinner. In his autobiography, The Shaping of a Behaviorist, Skinner characterized his interest as radical behaviorism, in which the existence of subjective entities is denied. Skinner died this year, and just weeks before his death he told a reporter that his greatest regret was that he was not understood by his contemporaries. It's true; psychology has moved away from Behaviorist models, and the focus is now on cognitive models that do take things such as thoughts and ideas and mental representations seriously. The reductionism of the Behaviorist school, which may have served a purpose in bringing some rigor to the field 70-odd years ago, now looks deliberately obtuse to most psychologists.

Another reductionist trend of psychology has focused on neural connections. The mind, it maintains, is to be understood in terms of what neurons do: Psychology is neurology, period. This is also not a powerful force in psychology today, and it is important to realize that the New Connectionist models do not, by and large, subscribe to this view. Rumelhart and McClelland, in particular, say that new and useful concepts emerge at different levels of organization.

Fodor and Pylyshyn contend that neither the New Connectionist models nor the Classical models are reductionist, and that both want to work at a cognitive level of organization. They then explain what they mean by cognitive.

The world, they argue, has a causal structure at many levels of analysis. There is a scientific story to be told about quarks, and there is a scientific story to be told about atoms, and about molecules. There is a legitimate science of geology, which legitimately considers such entities as rocks and tectonic plates. While we certainly hope that all these stories will be consistent, we don't call quantum physics a new theory of geology, or deny geologists the reality of rocks. Different models of explanation are appropriate at different levels of observation.

Fodor and Pylyshyn maintain, convincingly and apparently uncontroversially, that the appropriate level of explanation for any account of cognitive architecture (such as the Connectionist or Classical models) is the representational states of the organism.

In other words, symbols.

This is an important point for cognitive science because it defines the goals of these two approaches and gives them common observations to examine. In computer science the same point defines techniques, which makes it just as important but far less contentious. Everyone would agree that you could make a large system maximally fast and efficient by treating it as one entity and coding it in machine language. No one would work this way. Divide-and-conquer is one of the most fundamental paradigms of programming. Large programs need intermediate structure, and we would not generally consider it an improvement to strip the objects out of an object-oriented design, the structure from a structured program, or the subroutines from a system.

So Connectionist and Classical cognitive science agree about the desired level of explanation; and developers of neural nets and Turing machines agree about the need for intermediate structures. All disputants agree on the need for symbolic processing, although we haven't yet defined what symbols are and how they are to be processed.

So what is the nature of the disagreement between the Classical and Connectionist approaches, which Fodor and Pylyshyn say is serious?

It's Not Who You Know, It's How You Know Them

The difference is in what symbols are and in how the system is allowed to operate on them.

For Classical mental models, semantic content is assigned to expressions. For Connectionist models, it's assigned to nodes. These are the symbols, the things that represent something, in the two approaches.

The two approaches also differ in the kinds of primitive operations that can be applied to these content-bearing entities. Connectionist models allow only causal connections as primitive relations among the nodes: When you know how activation and inhibition flow among them, you know everything there is to know about how the nodes in a network are related, claim Fodor and Pylyshyn.
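
This primitive is easy to make concrete. Here is a minimal sketch in Python (mine, not Fodor and Pylyshyn's; the node names, weights, and update rule are all illustrative assumptions). The only relation among nodes is a weighted causal link, and a single update step simply lets activation and inhibition flow along those links:

    # Illustrative sketch: causal connections are the only primitive relation.
    weights = {                    # (source, target) -> connection strength
        ("A&B", "A"): 1.0,         # excitatory link
        ("A&B", "B"): 1.0,         # excitatory link
        ("A&B", "C"): -1.0,        # inhibitory link
    }

    def propagate(activation):
        """One synchronous update: each node sums its weighted inputs."""
        incoming = {}
        for (src, dst), w in weights.items():
            incoming[dst] = incoming.get(dst, 0.0) + w * activation.get(src, 0.0)
        return incoming

    print(propagate({"A&B": 1.0}))   # {'A': 1.0, 'B': 1.0, 'C': -1.0}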

Classical models, on the other hand, allow various relations among their content-bearing entities, including, particularly, the relation of constituency. Here is what that implies: Classical models are committed to what Fodor and Pylyshyn call "symbol structures." That is, not all symbols are atomic symbols; some are made up of other symbols. As they put it, some content-bearing entities must have constituents that are also content-bearing, and the content of the composite entity must be a function of the contents of the constituents.

This is crucial to the Classical approach; in particular, it allows the processes of a Classical model to operate on an entity in terms of its structure, so that the same process that converts (P & Q) into P can also convert ((X & Y & Z) & (A & B & C)) into (X & Y & Z).
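
A minimal Python sketch (my illustration, not anything from Fodor and Pylyshyn; the nested-tuple encoding of expressions is an assumption) shows what such structure-sensitive processing looks like. One rule, keyed only to the shape of the expression, handles both conversions:

    # Conjunctions are nested tuples ("&", left, right); constituents may
    # themselves be structured. The rule cares only about the structure.
    def simplify_left(expr):
        """Rewrite (P & Q) to P, for any constituents P and Q."""
        if isinstance(expr, tuple) and expr[0] == "&":
            return expr[1]         # the left constituent, however complex
        return expr

    print(simplify_left(("&", "P", "Q")))
    # 'P'
    print(simplify_left(("&", ("&", "X", ("&", "Y", "Z")),
                               ("&", "A", ("&", "B", "C")))))
    # ('&', 'X', ('&', 'Y', 'Z'))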

Consider Figure 1(a) and how a Connectionist machine such as a neural net might interpret it. To the Connectionist machine, the paths in the diagram indicate the possible paths along which excitation and inhibition can flow. When the Connectionist machine draws the inference from (A & B) to A, what happens is that node (A & B) being excited causes node A to be excited.

Figure 1: Connectionist vs. Turing machine

 (a) (A & B)
      /   \
     A     B

 (b) Tape:    (A & B)
     Program: [(P & Q) --> P; (P & Q) --> Q]

Now consider a Turing machine drawing the inference from (A & B) to A; see Figure 1(b). The Turing machine contains a program that lets it replace any (P & Q) that it finds on its tape with a corresponding P. It reads (A & B), interprets that as a (P & Q) instance, extracts the P part, which is the A, and puts it on the tape.
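
The same tape operation can be suggested in a few lines of Python (again my sketch; a regular expression stands in for the actual Turing machine program, and handles only the simple, unnested case):

    import re

    tape = "(A & B)"
    # Match anything of the form (P & Q) and write back the P part.
    tape = re.sub(r"\((.+?) & (.+?)\)", r"\1", tape)
    print(tape)                    # 'A'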

Both approaches involve the use of symbols. In the Connectionist machine, the nodes (A & B) and A can represent propositions such as "Bill loves Mary and Mary drives a Jeep" and "Bill loves Mary," respectively. In the Turing machine, the expressions on the tape can represent the same propositions. But the symbols in the Connectionist machine are all atomic, while the symbols in the Turing machine can have structure.

So the architectural difference between the models is this: In the Classical machine, the objects to which the content A & B is ascribed literally contain, as proper parts, objects to which the content A is ascribed. Real-world constituency is modeled in the constituency relations of the Classical machine's objects. But in the Connectionist machine, none of this is true; the object to which the content A & B is ascribed is causally connected to the object to which the content A is ascribed; but there is no structural (part/whole) relation that holds between them. Although the label attached to the node (A & B) makes it look like it has structure, it does not.

Here's how Fodor and Pylyshyn characterize the disagreement: Classical and Connectionist theories disagree about the nature of mental representations; for the former, but not for the latter, mental representations characteristically exhibit a combinatorial constituent structure and combinatorial semantics. Classical and Connectionist theories also disagree about the nature of mental processes; for the former, but not for the latter, mental processes are characteristically sensitive to the combinatorial structure of the representations on which they operate.

Fodor and Pylyshyn claim that Connectionist models are wrong on both counts.

A Competency Hearing for Connectionism

Fodor and Pylyshyn argue in psychological terms, but there are different ways to test theories and models against human behavior: You can look at actual performance, or you can look at competence. Fodor and Pylyshyn argue in terms of the latter, in terms of human capacities. The form of the argument is: For any system to be able to do such-and-such a thing that people are able to do, it must have such-and-such a form. There is an analogous argument on the computer science side: Any system that doesn't have such-and-such a form can't do such-and-such interesting things.

Fodor and Pylyshyn argue in terms of the productivity of thought, the systematicity and compositionality of cognitive representations, and the systematicity of inference.

Productivity. There is a classic argument, most notably articulated by Noam Chomsky, that purports to prove that certain kinds of mental models can't account for human linguistic competence. Fodor and Pylyshyn extend the argument from language to thought. Their argument runs something like this: Human beings are capable of thinking an unbounded variety of thoughts. This unbounded competence must be produced by finite means; there are only a finite number of neurons in the brain. To get unbounded competence by finite means, you need to treat the system of representations as consisting of expressions belonging to a [recursively] generated set. This works only when an unbounded number of the expressions are nonatomic. And this is just what can't happen in a Connectionist model. So, Fodor and Pylyshyn conclude, the mind cannot be a PDP system.
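
The "unbounded competence from finite means" point is easy to demonstrate. In this Python sketch (my example, not the authors'), a two-symbol vocabulary and a single combination rule generate a set of distinct, increasingly nonatomic expressions that grows without bound as the nesting depth increases:

    def sentences(depth):
        """Enumerate conjunctive expressions up to a given nesting depth."""
        if depth == 0:
            return ["A", "B"]      # the finite atomic vocabulary
        smaller = sentences(depth - 1)
        return smaller + [f"({p} & {q})" for p in smaller for q in smaller]

    print(len(sentences(1)))       # 6
    print(len(sentences(2)))       # 42, and growing without bound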

There is a counterargument to the productivity argument: That humans can't really think infinitely many thoughts. It's unconvincing, but hard to refute. The argument from the systematicity and compositionality of cognitive representations makes it unnecessary to refute it. That argument goes like this:

The ability to think certain thoughts is intrinsically related to the ability to think certain other thoughts; anyone who can think the thought that John loves Mary, for example, can also think the thought that Mary loves John. This makes sense only if the thoughts are made up of the same parts. It's not that you couldn't train a neural net to make the right associations between thoughts; it's just that there is nothing in connectionist structure that supports these associations. These associations are crucial; they are the very stuff of which thought is made, and Connectionist models have no explanation for them.

Finally, there's the argument from the systematicity of inference. A neural net model can be constructed to draw the inference A from (A & B), and to draw the inference B from (A & B). But a neural net can just as easily be constructed to make one of these inferences but not the other. We never see such lopsided mental ability in human beings. Why not? Connectionist models don't know.
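
The lopsided net is trivial to construct. In this sketch (an illustrative toy, using the same weighted-link scheme as before), the wiring draws A from (A & B) but can never draw B, and nothing in the Connectionist architecture rules it out:

    weights = {
        ("A&B", "A"): 1.0,   # excitation flows to A...
        ("A&B", "B"): 0.0,   # ...but never to B; nothing forbids this wiring
    }

    def infer(source):
        """Return every conclusion the net actually draws from a node."""
        return {dst: w for (src, dst), w in weights.items()
                if src == source and w > 0.0}

    print(infer("A&B"))      # {'A': 1.0}; B is never inferred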

This would seem to imply the following: If you want to build an inference engine that reasons properly, it can't be merely Connectionist.

Fodor and Pylyshyn conclude from this sequence of arguments that something is deeply wrong with Connectionist architecture. This is what's wrong: Because Connectionist architecture denies syntactic and semantic structure in mental representations, it is forced to accept, as possible minds, systems that are arbitrarily unsystematic. This is blatantly contrary to observation. Consequently, Connectionist architecture is inadequate to explain the basic data in its domain.

Furthermore, it's not enough for a Connectionist to agree that all minds are systematic; he must also explain how nature contrives to produce only systematic minds. That, apparently, the Connectionist can't do without recourse to the only existing approach that does predict pervasive systematicity: The Classical approach.

This seems to me a fairly damning critique of Connectionism as a cognitive architecture, and it seems to have some implications for Connectionist computer programming as well. For neural nets as programming tools, Fodor and Pylyshyn's conclusions would appear to imply at least the following limitation: A neural net, viewed theoretically as a computational system, is the equal of a Turing machine. Both are general-purpose computing machines, and both can achieve anything a general-purpose computing machine can achieve. But for tasks requiring operations on symbol structures, neural nets alone are apparently not enough; for that, you need something like a Turing machine. Neural nets may suffice for the low-level implementation of an inference engine, but only if you use the neural net to implement a Turing machine or other conventional architecture, and implement the inference engine using that.