Lecturer's Précis - Hinton, Plaut, and Shallice (1993)
"Simulating Brain Damage"
Copyright Notice: This material was written and published in Wales by Derek J. Smith (Chartered Engineer). It forms part of a multifile e-learning resource, and subject only to acknowledging Derek J. Smith's rights under international copyright law to be identified as author may be freely downloaded and printed off in single complete copies solely for the purposes of private study and/or review. Commercial exploitation rights are reserved. The remote hyperlinks have been selected for the academic appropriacy of their contents; they were free of offensive and litigious content when selected, and will be periodically checked to have remained so. Copyright © 2010, High Tower Consultants Limited.
First published online 07:40 BST 6th May 2004, Copyright Derek J. Smith (Chartered Engineer). This version [HT.1 - transfer of copyright] dated 18:00 14th January 2010
Readers unfamiliar with the basic concepts of the science of neural networks should pre-read our e-handout on "Connectionism", noting carefully the difference between "Perceptrons" and "neural networks". There is a considerably more detailed history of the more general science of artificial intelligence in Parts 4 to 6 of our e-paper on "Short-Term Memory Subtypes in Computing and Artificial Intelligence", if interested. Readers unfamiliar with the basic concepts of modular psycholinguistics should pre-read our e-handout on "The Transcoding Model Series". Alternatively, use the [glossary] links as and when you come to them.
1 - Introduction
The Hinton, Plaut, and Shallice (1993) paper is historically important, in this reviewer's opinion, because it marks a major step forward in connectionist science, namely the move away from monolithic computing architectures to more modular ones [in which respect it draws much the same conclusions as Norris (1991)]. The first author, Geoffrey E. Hinton, is Professor of Machine Learning at the University of Toronto, and shares with a number of other workers the credit for having developed the use of "back propagation algorithms" in the early 1980s. The second author, David C. Plaut, having studied under Hinton, is now Professor of Psychology at Carnegie Mellon University, and specialises in "parallel distributed processing", the branch of cognitive science which models cognition as the joint activity of large networks of simple interconnected units. And the third author, Tim Shallice, is Professor of Neuropsychology at the Institute of Cognitive Neuroscience at University College, London, and specialises in the mind's executive functions [glossary]. He is particularly well known (a) for having helped develop a number of assessments of human executive cognition [see under Multiple Errands Test and Six Elements Test in our Memory Glossary], and (b) for having teamed up with America's Donald Norman in the early 1980s to formulate their theory of the Supervisory Attentional System (SAS) [for more on which see Section 12 of our e-paper on "From Frontal Lobe Syndrome to Dysexecutive Syndrome"].
2 - The Model
The paper begins by highlighting the neuropsychological condition known as "acquired dyslexia" [glossary], in which reading problems occur in previously competent adults following brain injury or disease. The lead theorists in the modern history of this topic were Marshall and Newcombe (1966, 1973), and the essence of their explanation is (a) that normal reading ability reflects the co-ordinated simultaneous functioning of a number of discrete cognitive processes [see, for example, Marshall and Newcombe (1973)], and (b) that damage either to one of the processes or to the channels of their co-ordination will not just impair reading but impair it in a diagnostically characteristic way. Here are the modules proposed in the current paper .....
"Grapheme" Units: A grapheme [glossary] is a single functional unit within a written word, such as the word-initial u in "unit", the word-initial un in "unacceptable", or the word-final ing in "eating". This makes the grapheme the basic building block of the "orthography" [glossary] of a language. Thus when you learn to write, you are learning (a) the graphemes available to you, and (b) how to put them together to make words. The module here proposed is an array of artificial grapheme storage units, in which each artificial unit represents "a particular letter in a specific position within the word" (p60).
"Phoneme" Units: A phoneme [glossary] is a single functional sound unit within a spoken word, such as the word-initial /k/ in "cat". This makes the phoneme the basic building block of the "phonology" [glossary] of a language. Thus when you learn to speak, you are learning (a) the phonemes available to you, and (b) how to put them together to make words. The module here proposed is an array of artificial phoneme storage units, in which each artificial unit represents a particular sound.
"Sememe" Units: Likewise, a sememe is a memory unit capable of representing "the meanings of words" by coding them in terms of the "semantic features that describe the thing in question" (p60), and the sememe module here proposed is an array of artificial sememe storage units. The network was constructed to cope with 68 semantic features, and worked as follows .....
"The sememe units do not correspond directly to individual word meanings but rather to semantic features that describe the thing in question. The word cat activates such units as 'mammal', 'has legs', 'soft', and 'fierce'. Units representing such semantic features as 'transparent', 'tastes strong', 'part of limb', or 'made of wood', remain quiescent. Our network has 68 sememe units representing both physical and functional attributes of a word's definition. Each word that we chose was represented by a different combination of active and inactive sememe units." (Hinton, Plaut, and Shallice, 1993, p60; italics original.)
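To make this coding scheme concrete, here is a minimal sketch of a sememe array as a binary feature vector. The eight feature names are taken from the quotation above, but the tiny array size is an invention for this handout (the real network used 68 features), and the code is an illustration, not the authors' implementation .....

```python
# Illustrative sketch only: a miniature sememe array in the spirit of the
# paper's 68-unit layer. Feature names come from the quoted passage; the
# 8-unit size is invented for this handout.

FEATURES = ["mammal", "has-legs", "soft", "fierce",
            "transparent", "tastes-strong", "part-of-limb", "made-of-wood"]

def encode(active_features):
    """Return a binary vector: 1 where a feature is active, 0 where quiescent."""
    return [1 if f in active_features else 0 for f in FEATURES]

# "cat" activates the first four units and leaves the rest quiescent .....
cat = encode({"mammal", "has-legs", "soft", "fierce"})
```

Each word is thus represented not by a single dedicated unit but by a distinctive combination of active and inactive units across the whole array, exactly as described in the quotation.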
Intermediate Units: The intermediate units, as their name suggests, sit between the grapheme units and the sememe units. They contain artificial memory nodes whose sole function is to learn the association strengths between key patterns in other arrays. They store nothing substantive in their own right, and are thus fully in accordance with the "hidden layer" type of neural network which became popular within connectionism during the 1980s [e-handout]. The learned associations on this occasion allow the system as a whole to learn what letter shapes associate with what meanings, and what meanings associate with what sounds.
"Cleanup" Units: These are additional units designed to improve the performance of the sememe and phoneme arrays. They are discussed in greater detail below.
Key Concept - "Cleanup" Memory: Dealing predominantly with the sememe array, the authors argue forcefully that one of the necessary aspects of semantic encoding is the need to minimise confusion between similar word meanings, and they postulate an array of "cleanup" memory attached to, and supporting, the main semantic store. This ancillary resource is automatically invoked whenever two semantically similar input items excite identical or near-identical sets of units within the main coding array [as might happen, for example, when coding STONE and PEBBLE], and it allows sufficient additional encoding to take place for the final memory traces to be told apart. This is a neat solution to the confusability problem, but it does mean that the resulting system has more components to go wrong; specifically, when this auxiliary memory is damaged, the accuracy of indexing the main store degrades even though the main store's own content remains intact.
The final ensemble of interlinked resources serves to provide the mind with a "semantic space" .....
Key Concept - Semantic Space: A "semantic space" is an abstract (i.e. platform-independent) conceptualisation of a large number of interlaced dimensions of semantic encoding (such as the 68 "semantic features" mentioned above). It is a construct of the imagining mind, in which we may record the distinctive features of something new by associating it with the distinguishing features of everything else we have ever experienced. It is a way of using a list of attributes to define an entity (just as our cat was coded as a soft, fierce mammal with legs). It is a "hypervolume", that is to say, a space of more dimensions than can exist in the real world, and which is therefore usually simplified (e.g. for purposes of conference presentation or textbook illustration) as a two- or three-dimensional Cartesian space. It is hence a powerful tool for metaphorically visualising a multi-dimensional coding system, such as the biological mind or a computer database. The basic network node is the sememe (as defined above), and the basis of each sememe's encoding is that it scores a certain value (albeit often zero) on every available dimension. Thus "thunder" would score zero on the "tastes strong" feature, but highly on "loud". Similarly, "knee" would score zero on "transparent" but would get top marks on "part of limb". [For a longer discussion, see Lowe (2004 online).]
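The notion of "distance" within such a hypervolume can be sketched numerically. The four-dimensional vectors and feature interpretations below are invented for this handout, and are merely meant to show why STONE and PEBBLE end up close together in semantic space while THUNDER ends up far away .....

```python
import math

# Illustrative sketch only: word meanings as points in a tiny semantic space.
# The vectors and their feature interpretations are invented for this handout.
space = {
    "stone":   [1, 1, 0, 0],   # e.g. hard, natural-object, small, loud
    "pebble":  [1, 1, 1, 0],   # differs from "stone" only on the 'small' unit
    "thunder": [0, 0, 0, 1],   # loud, but neither hard, natural-object, nor small
}

def distance(a, b):
    """Euclidean distance between two points in the semantic hypervolume."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

On this toy coding, distance(stone, pebble) is much smaller than distance(stone, thunder), which is precisely the sort of proximity that makes the cleanup mechanism necessary.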
Here is how the authors see a semantic space being provided within their network .....
"The first three layers of the network [.....] take a word-form and convert it to a position somewhere in semantic space. Activity in the cleanup layer then draws the output of the network to the point corresponding to the closest meaning. The region around each word is what physicists and mathematicians know as a point attractor - whenever the network's initial output appears within a certain region, the network's state will inexorably be drawn to one position within the region. This notion of a semantic space dotted with attractors representing the meanings of words has proved valuable for understanding how our network operates and how it can make the same semantic errors that dyslexics do." (p62; bold emphasis added.)
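The point-attractor idea can be sketched as follows. A single nearest-neighbour step here stands in for the network's iterative settling process, and the three stored meanings and their coordinates are inventions for this handout .....

```python
# Illustrative sketch of a point attractor: the cleanup stage draws whatever
# pattern the earlier layers produce towards the nearest stored word meaning.
# The words and coordinates are invented for this handout.

ATTRACTORS = {
    "cat": (1.0, 1.0, 0.0),
    "dog": (1.0, 0.0, 1.0),
    "log": (0.0, 0.0, 1.0),
}

def settle(point):
    """Return the word whose attractor basin the initial output falls into."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(ATTRACTORS, key=lambda word: dist2(point, ATTRACTORS[word]))

# A noisy initial output near the "cat" attractor is drawn to the cat meaning.
nearest = settle((0.9, 0.8, 0.2))
```

Whenever the initial output appears anywhere within a basin, it is "inexorably drawn" to that basin's single point, just as the quotation describes.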
And this is how they see the various modules interacting .....
Figure 1 - A Network for Reading Out Loud: Here we see an eight-module architecture capable of rudimentary reading out loud. Information about the written stimulus word - in this case "CAT" - enters the grapheme unit module [lower green panel] at the bottom of the diagram, and "lights up" (i.e. excites and activates in some way) the grapheme units corresponding to the graphemes |C|, |A|, and |T|. This pattern is then passed forward to the first of the intermediate unit arrays [lower white panel], where it activates the "hidden layer" of "complex association" units. This pattern is then passed forward in turn to the main array of sememe units [lower blue panel], where the cat-appropriate semantic features are switched on [remember that it takes time and training for the intermediate units to "learn" what weightings to apply when linking the graphemes and the corresponding sememes]. At this juncture, the first of the cleanup arrays [lower pink panel] is recruited to assist the discrimination of naturally similar items. Output from the sememe unit array is then passed forward to the second of the intermediate unit arrays [upper white panel], where it activates the hidden layer which has previously been trained to associate intended semantic utterances with the sounds necessary to synthesise their physical production. This pattern is then passed forward to the main array of phoneme units [upper blue panel], where the /kat/ phonetic features are switched on. The second cleanup array [upper pink panel] assists discrimination at this stage. And finally, output from the phoneme unit array is passed to an electronic speech synthesiser to produce audible sound. << AUTHOR'S NOTE: Information passes through this network in much the same way as it does on any of the large modern transcoding models, being constantly coded and recoded along the way. Taking Ellis (1982) as prototypical, start at the upper right input arrow of his diagram (i.e. 
the one representing textual input) and proceed diagonally down to the lower left output arrow (speech production). It is also much the same as can be traced out on the historical antecedents of such models, namely the diagrams produced by the 19th century aphasiologists - try the same tracking exercise on Kussmaul (1878) or Grashey (1885) to see how little has really changed. >>
Copyright © 2004, Derek J. Smith. Redrawn without the three-dimensional effect from Hinton, Plaut, and Shallice (1993, p64; unnumbered figure).
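The flow of activation described in Figure 1 can be sketched, in much simplified form, as a chain of four layer-to-layer mappings. The layer sizes, the logistic activation, and the untrained random weights below are assumptions for this handout (and the two cleanup arrays are omitted altogether); this is a structural illustration, not the authors' trained network .....

```python
import math
import random

random.seed(0)

def layer(n_in, n_out):
    """Build an untrained fully connected layer of logistic units.
    (Random weights stand in for the weights that training would supply.)"""
    w = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    def forward(x):
        return [1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(row, x))))
                for row in w]
    return forward

# Layer sizes are invented for this handout; the mappings follow Figure 1
# in order (cleanup arrays omitted for brevity):
grapheme_to_hidden = layer(28, 10)   # grapheme units -> first intermediate array
hidden_to_sememe   = layer(10, 68)   # first intermediate array -> sememe units
sememe_to_hidden   = layer(68, 10)   # sememe units -> second intermediate array
hidden_to_phoneme  = layer(10, 33)   # second intermediate array -> phoneme units

def read_aloud(graphemes):
    """Pass a grapheme activation pattern through all four mappings in turn."""
    return hidden_to_phoneme(
        sememe_to_hidden(hidden_to_sememe(grapheme_to_hidden(graphemes))))

# A toy "CAT" input pattern: three active grapheme units, the rest quiescent.
phonemes = read_aloud([1.0] * 3 + [0.0] * 25)
```

The point of the sketch is simply that the stimulus is "constantly coded and recoded along the way": a pattern over grapheme units ends up as a pattern over phoneme units only by passing through the semantic layers in between.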
Finally, here is what the authors see happening to network performance when damaged .....
"If we damage the network by randomly changing the weights in the cleanup mechanism, for example, the boundaries of the attractor for each word will change. As a result, if the network is in a region of semantic space where it was previously drawn to one word, it may now be drawn to a semantically related one instead. Alternatively, if we disrupt the pathway coming from the input, the network's initial output may be closer to the meaning of a semantically related word than to the meaning of the word originally presented. This result clears up one of the first puzzles presented by deep dyslexia [glossary]: why damage to any part of the brain's semantic route produces an essentially similar pattern of misreading. [.....] According to our models, these errors arise naturally as the cleanup neurons use semantic information to try to make sense of the output of the damaged earlier stages." (p62.)
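The lesioning procedure the authors describe - randomly changing weights - can be sketched as follows. The proportion of weights disturbed and the Gaussian noise used to disturb them are assumptions for this handout, not the authors' published parameters .....

```python
import random

random.seed(1)

def lesion(weights, proportion=0.2, scale=1.0):
    """Simulate damage by randomly perturbing a proportion of the weights,
    in the spirit of the paper's 'randomly changing the weights in the
    cleanup mechanism'. Perturbation settings are illustrative assumptions."""
    damaged = []
    for row in weights:
        damaged.append([w + random.gauss(0, scale)
                        if random.random() < proportion else w
                        for w in row])
    return damaged
```

Applied to the cleanup weights of a network like the one sketched earlier, such a perturbation shifts the boundaries of the attractor basins, so that an initial output which previously settled on one word may now settle on a semantically related neighbour - the characteristic semantic misreading of deep dyslexia.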
3 - Key Points and Evaluation
The research described in this paper may be seen as ushering in the third age of connectionist research. The first age was based upon internally unsophisticated systems such as Rosenblatt's Perceptron. The second age was based upon non-modular neural networks of the sort developed by Anderson, Hinton, Kohonen, and Sejnowski. And the third (and current) age is characterised by having a superordinate modular architecture involving many subcomponent neural networks, along the lines of the one described above. It is this processing modularity which is important, because not only can modular neural networks learn more complex mappings than their monolithic predecessors, but they offer a convergence of explanation between connectionism and neuropsychological theory. Indeed, neuropsychology has been arguing for superordinate architectures of its own since the mid-19th century [timeline], and has recently become very interested in knowledge "domains" [see, for example, Allport (1985)] because they help to explain the category-specific nature of many clinical disorders [see, for example, Warrington and McCarthy (1987)]. What makes the paper additionally significant in our view is the fact that it also proposes a semantic network in at least one of its modules, in which respect we may view it as continuing the associationist tradition of knowledge representation [glossary].
Here are the key arguments put forward in this paper, in revision point format .....
4 - References