Course Handout - An Introduction to Connectionism
Copyright Notice: This material was written and published in Wales by Derek J. Smith (Chartered Engineer). It forms part of a multifile e-learning resource, and subject only to acknowledging Derek J. Smith's rights under international copyright law to be identified as author may be freely downloaded and printed off in single complete copies solely for the purposes of private study and/or review. Commercial exploitation rights are reserved. The remote hyperlinks have been selected for the academic appropriacy of their contents; they were free of offensive and litigious content when selected, and will be periodically checked to have remained so. Copyright © 2010, High Tower Consultants Limited.
First published online 12:30 BST 1st July 2003, Copyright Derek J. Smith (Chartered Engineer). This version [HT.1 - transfer of copyright] dated 18:00 14th January 2010
Earlier versions of this material appeared in Smith (1996; Chapters 4 and 6). It is repeated here with minor amendments and supported with hyperlinks.
Readers should first read the associated paper on Hebbian Cell Assembly Theory.
1 - From Cell Assembly to Neural Network
We begin by revisiting the notion of the cell assembly, to see how the concept has evolved since Hebb's day. What we find is that there have been regular minor upgrades to the concept, but that it still remains by far the best candidate for the job of engram. Moreover - and the significance of this cannot be overstated - the concept is now inextricably merged with an area of study known as neural networks, part of the science of artificial intelligence. In other words, psychology, biology, computing, electronics, and robotics now share a single common goal, namely unravelling the processes of memory and cognition; it is just that some workers prefer to study men and women directly, whilst others prefer things they can more easily dissect, such as goldfish, worms, and printed circuit boards.
The first phase was the so-called "Perceptron" studies, that is to say, laboratory experimentation with seeing machines, an area of study which goes back about half a century. The first sketches of what artificial neurons might look like were drawn up by Warren McCulloch and Walter Pitts (in various papers from 1943 onwards). They called their artificial neurons neurodes, and each was a simple arrangement of electronic components designed to do two main things, namely (a) to output a signal similar to the bursts of action potentials output by biological neurons, and (b) to see to it that this output was modulatable, that is to say, sensitive to what other neurodes in the vicinity were telling it. In this way, quite simple combinations of neurodes were capable of performing the logical operations AND, OR, and NOT to much the same end effect as the electronic circuitry then being developed for use in computers. The next step was to take many such neurodes, wire them together, and see what happened. McCulloch and Pitts themselves designed a pattern recognition network in 1947, and Marvin Minsky built one in 1951 (Minsky, 1994).

However, the most famous of the early machines was built by Frank Rosenblatt in the mid-'fifties, and called a perceptron. The essence of Rosenblatt's design was that the neurodes were arranged into two banks, or "layers". One of these was connected to an artificial eye and called an input layer, and the other was connected to the chosen output mechanism (usually an array of flashing lights) and called an output layer. Each point in the input layer was wired to each point in the output layer, and the effective strength of these connections was specifically designed to be variable. Learning was then a process of changing the strength of the connections, a process known as weighting. Another early pattern recognition machine, Oliver Selfridge's (1959) Pandemonium, was also based on a two-layer design. However, early perceptrons were intrinsically very simple machines.
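The two ideas in this passage, a unit whose output depends on weighted inputs and learning as the adjustment of those weights, can be sketched in a few lines of Python. This is a minimal sketch, not a reconstruction of any historical machine: it assumes the usual textbook simplification of a neurode as a binary threshold unit that fires (outputs 1) when the weighted sum of its inputs reaches a threshold, and the weights, thresholds, and function names (threshold_unit, train_perceptron, and so on) are our own illustrative choices. The training function uses the error-correction rule commonly taught as the perceptron learning rule.

```python
def threshold_unit(inputs, weights, threshold):
    """Binary threshold 'neurode': fires (1) if the weighted input sum reaches threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Hand-chosen (illustrative) weights and thresholds giving the three logic operations.
def AND(a, b):
    return threshold_unit([a, b], [1, 1], 2)

def OR(a, b):
    return threshold_unit([a, b], [1, 1], 1)

def NOT(a):
    # An inhibitory (negative) weight: the unit fires only when its input is silent.
    return threshold_unit([a], [-1], 0)

def perceptron_output(inputs, weights, bias):
    # A bias is just a threshold moved to the other side: fire if total + bias >= 0.
    return threshold_unit(inputs, weights, -bias)

def train_perceptron(samples, epochs=20, rate=1.0):
    """Learning by weighting: nudge each weight by rate * (target - output) * input."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - perceptron_output(inputs, weights, bias)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND_SAMPLES)
print([perceptron_output(x, w, b) for x, _ in AND_SAMPLES])  # [0, 0, 0, 1]
```

Note that AND, OR, and NOT differ only in their weights and thresholds, and that the trained unit arrives at working AND weights without their being set by hand.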
Two-layer perceptrons did not work that well, and in 1969 Marvin Minsky and Seymour Papert of the Artificial Intelligence Laboratory at MIT published a book entitled "Perceptrons" in which they threw serious doubt on how much such devices would ever be able to achieve.
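The example most often used to illustrate the Minsky-Papert critique is that a single layer of adjustable weights can only divide its input patterns with a straight line (in general, a flat plane), and so cannot compute exclusive-OR (XOR), which outputs 1 when exactly one input is on. The sketch below assumes the same simple error-correction rule as a textbook perceptron; the function name and the epoch limit are arbitrary illustrative choices. It learns AND within a few passes but never gets all four XOR cases right, however many passes it is given.

```python
def step(total):
    return 1 if total >= 0 else 0

def train_until_correct(samples, max_epochs=100, rate=1.0):
    """Perceptron error-correction training on two inputs. Returns the first epoch
    in which every sample was classified correctly, or None if that never happens."""
    weights = [0.0, 0.0]
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for (x1, x2), target in samples:
            error = target - step(weights[0] * x1 + weights[1] * x2 + bias)
            if error:
                mistakes += 1
                weights[0] += rate * error * x1
                weights[1] += rate * error * x2
                bias += rate * error
        if mistakes == 0:
            return epoch
    return None

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(train_until_correct(AND))  # a small epoch number: AND is learnable
print(train_until_correct(XOR))  # None: XOR is not linearly separable
```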
The Minsky-Papert critique brought about a period of soul-searching, prompting the development of more powerful methods, and culminating in the modern neural network. Unlike the early perceptrons, modern neural networks typically have three layers of artificial neurons. Now there is an input layer, an output layer, and - sandwiched between these - an "invisible", or "hidden", layer. All communication between the input and output layers has to go via the invisible layer, giving you two sets of connection weightings to play with, instead of one.
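The value of the extra layer can be seen by revisiting exclusive-OR (XOR), the standard example of a problem a single layer of weights cannot solve. In the sketch below, two hidden threshold units compute OR and AND of the inputs, and the output unit fires for "OR but not AND", which is exactly XOR. The weights are set by hand purely for illustration (how a network might learn such weights is a separate question not addressed here), and the two tables HIDDEN_WEIGHTS and OUTPUT_WEIGHTS correspond to the two sets of connection weightings mentioned above.

```python
def step(total):
    return 1 if total >= 0 else 0

# First set of weightings: input layer -> hidden layer.
# Each entry is (weight from x1, weight from x2, bias); values chosen by hand.
HIDDEN_WEIGHTS = [
    (1.0, 1.0, -0.5),  # hidden unit 1 behaves like OR
    (1.0, 1.0, -1.5),  # hidden unit 2 behaves like AND
]
# Second set of weightings: hidden layer -> output unit.
OUTPUT_WEIGHTS = (1.0, -1.0, -0.5)  # fires for "OR but not AND"

def forward(x1, x2):
    # All input-to-output communication passes through the hidden layer.
    hidden = [step(w1 * x1 + w2 * x2 + bias) for w1, w2, bias in HIDDEN_WEIGHTS]
    v1, v2, out_bias = OUTPUT_WEIGHTS
    return step(v1 * hidden[0] + v2 * hidden[1] + out_bias)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, forward(a, b))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```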
These three-layered neural networks soon proved far more competent than the older two-layered ones. The pathfinders in this exciting and rapidly moving area were:
(i) David Marr: In a series of papers (eg. Marr, 1969, 1976, 1982) the British psychologist David Marr analysed the neural microstructure of the cerebellum and cerebrum, and described how biological neural circuits might be modelled as electronic devices.
(ii) James A. Anderson and Geoffrey Hinton: In 1981, Geoffrey Hinton and James A. Anderson edited "Parallel Models of Associative Memory", the first widely read account of neural network (NN) modelling. Hinton has since gone on to develop NNs capable of translating simple sign language into the equivalent spoken words.
(iii) Teuvo Kohonen: In the early 1980s, Teuvo Kohonen of the Helsinki University of Technology helped develop algorithms capable of recalculating network weights more rapidly after each learning trial (eg. Kohonen, 1982, 1984).
(iv) Terrence Sejnowski: During the 1980s, Sejnowski and Rosenberg (eg. 1986) developed a connectionist model called NETtalk, which was able to learn by example how to pronounce simple English sentences from written input.
The beauty of neural networks is that each electronic synapse is purpose-built to take care of its own weighting. The network, in other words, is free to acquire its knowledge in much the same way that biological brains acquire theirs - by learning from experience. In computing terms, they are important because they need no (very expensive) task-specific programming. As a result, neural networks are now being put to a growing variety of commercial applications, such as telling coin from coin in vending machines (they "recognise" the different bounces), checking your credit card spending pattern, and steering robots. And, reasonably enough, they have also had a major impact on cognitive theory. Many neuroscientists now see neural networks as validating the traditional cell assembly concept: the neural network's weighted connections are simply the electronic equivalent of biology's variable synaptic strengths. Indeed, McNaughton and Nadel (1990) rate neural networks as such an important development of this concept that they term all such networks Hebb-Marr networks. Other authors prefer the term matrix memory to convey the same core concept. And the new science even has a new name - connectionism.
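A matrix memory can be illustrated with the simplest such device, a linear associator. In this sketch (our own illustrative assumption, not a model taken from McNaughton and Nadel), each stored association pairs a key pattern x with a value pattern y, both coded as +1/-1, and storage adds the outer product of y and x into a weight matrix. This is a Hebbian rule: a connection is strengthened when the units at both ends are active together, and weakened when they disagree. Recall presents a key and thresholds the weighted sums. The keys are deliberately chosen to be orthogonal so that recall is exact; with overlapping keys, stored associations interfere with one another.

```python
def hebbian_store(pairs, n_in, n_out):
    """Build a weight matrix W (n_out rows x n_in columns) by summing the outer
    products y x^T over all (x, y) pairs."""
    W = [[0.0] * n_in for _ in range(n_out)]
    for x, y in pairs:
        for i in range(n_out):
            for j in range(n_in):
                W[i][j] += y[i] * x[j]
    return W

def recall(W, x):
    """Present a key pattern, compute W x, and threshold each sum to +1 or -1."""
    out = []
    for row in W:
        total = sum(w * xj for w, xj in zip(row, x))
        out.append(1 if total >= 0 else -1)
    return out

# Hypothetical example patterns; the two keys are orthogonal (dot product 0).
KEY_1, VALUE_1 = [1, 1, -1, -1], [1, -1, 1]
KEY_2, VALUE_2 = [1, -1, 1, -1], [-1, -1, 1]
W = hebbian_store([(KEY_1, VALUE_1), (KEY_2, VALUE_2)], n_in=4, n_out=3)
print(recall(W, KEY_1))  # [1, -1, 1]
print(recall(W, KEY_2))  # [-1, -1, 1]
```

Both associations live in the same nine-by-four... rather, the same three-by-four set of weights; neither is stored in any one place.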
But the real importance of connectionist modelling is that it directly addresses the big issue of memory science, namely representation. Neural networks, in other words, have something to say about what the engram might look like. McClelland (1988) puts it this way:
"Representations in connectionist models are patterns of activation over the units in the network." (Op cit, p109.)
"In the distributed model of memory, each conceptual object is thought of as a pattern of activation over a number of simple processing units." (Ibid, p110.)
"Processing in connectionist models occurs through the evolution of patterns of activation over time." (Ibid, p110.)
"In connectionist terms, the knowledge is stored in the connections. [And if] knowledge is in the connection weights, learning must occur through the adjustment of those weights." (Ibid, p111; emphasis added.)
2 - Parallel Distributed Processing
The 1981 Hinton-Anderson volume also popularised the term "parallel" as a description of cognitive processing. What they meant by this was any processing architecture which allowed more than one set of decisions to be made simultaneously (hence "in parallel"), and therefore at more than one location (hence "distributed"). And the point about distributed information processing is that the necessary memory stores also have to be distributed. Not long afterwards, McClelland and Rumelhart (1985) adopted these ideas, and set up a large inter-university research team known as the Parallel Distributed Processing Research Group, whose basic beliefs are brought out in the following:
"..... the processing system is assumed to consist of a highly interconnected network of units that take on activation values and communicate with other units by sending signals modulated by weights associated with the connections between the units ....." (McClelland and Rumelhart, 1985, p173.)
Parallel distributed processing (or PDP, for short) is thus a fairly standard neural network approach. And yet the PDP approach fully recognises that a single neural network is never going to be enough. It recognises instead that you need many separate neural networks, and, moreover, that you need to connect them up in a very precise fashion, giving you, of course, a network of neural networks [readers may find the Norris (1991) review particularly informative in this respect]. With PDP, therefore, you need to know (a) the internal architecture of each individual network (how many layers, what weighting algorithms, etc), and (b) the architecture of the higher-order network as well (much as with the biological brain, where you need to know both microanatomy and macroanatomy at the same time).
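The two levels of architecture described above can be sketched as data. In this minimal sketch, which is purely illustrative and assumes nothing about any actual PDP model, each Module holds its own internal architecture as a list of weight matrices (one per set of connections between successive layers), while a separate wiring table records which modules feed which. The module names A, B, and C, their sizes, their weights, and the logistic activation function are all hypothetical choices.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

class Module:
    """One neural network. Its internal architecture is a list of weight
    matrices; each matrix has one row per unit in the next layer."""
    def __init__(self, layers):
        self.layers = layers

    def run(self, inputs):
        activations = list(inputs)
        for matrix in self.layers:
            activations = [logistic(sum(w * a for w, a in zip(row, activations)))
                           for row in matrix]
        return activations

def run_network_of_networks(modules, wiring, external_inputs, order):
    """Higher-order architecture: `wiring` lists each module's source modules.
    A module with no sources takes external input; otherwise it takes the
    concatenated outputs of its sources. `order` must list sources first."""
    outputs = {}
    for name in order:
        sources = wiring.get(name, [])
        if sources:
            inputs = [x for src in sources for x in outputs[src]]
        else:
            inputs = external_inputs[name]
        outputs[name] = modules[name].run(inputs)
    return outputs

modules = {
    "A": Module([[[1.0, -1.0], [0.5, 0.5]]]),          # 2 inputs -> 2 outputs
    "B": Module([[[2.0]]]),                            # 1 input -> 1 output
    "C": Module([[[1.0, 1.0, 1.0], [-1.0, 0.0, 1.0]],  # 3 inputs -> 2 hidden
                 [[1.0, -1.0]]]),                       # 2 hidden -> 1 output
}
wiring = {"C": ["A", "B"]}
result = run_network_of_networks(modules, wiring, {"A": [1.0, 0.0], "B": [0.5]},
                                 order=["A", "B", "C"])
print(len(result["C"]))  # 1
```

Keeping the wiring table separate from the modules mirrors the distinction between (a) and (b): either level can be changed without touching the other.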
3 - References