Lecturer's Précis - Coltheart, Curtis, Atkins, and Haller (1993)
"Models of Reading Aloud: Dual-Route and Parallel-Distributed-Processing Approaches"
Copyright Notice: This material was written and published in Wales by Derek J. Smith (Chartered Engineer). It forms part of a multifile e-learning resource, and subject only to acknowledging Derek J. Smith's rights under international copyright law to be identified as author may be freely downloaded and printed off in single complete copies solely for the purposes of private study and/or review. Commercial exploitation rights are reserved. The remote hyperlinks have been selected for the academic appropriacy of their contents; they were free of offensive and litigious content when selected, and will be periodically checked to have remained so. Copyright © 2010, High Tower Consultants Limited.
First published online 14:15 BST 7th May 2002, Copyright Derek J. Smith (Chartered Engineer). This version [HT.1 - transfer of copyright] dated 18:00 14th January 2010
An earlier version of this material appeared in Smith (1998; Chapter 5). It is repeated here with additional detail and supported with hyperlinks. Although the resulting paper is reasonably self-contained, it is primarily designed to be read as a subordinate file to our e-paper on "The History of the Psycholinguistic Flow Model". Readers new to dyslexia studies may prefer to familiarise themselves with the Marshall and Newcombe (1973) paper before proceeding. Alternatively, use the [glossary] links as and when you come to them. Readers unfamiliar with the basic concepts of the science of neural networks should also pre-read our e-handout on "Connectionism", noting carefully the difference between "Perceptrons" and "neural networks".
1 - Introduction
Following a seminal paper by Marshall and Newcombe (1973), it became fashionable (a) to regard all complex cognitive abilities, especially those involved in communication, as "co-operative processing" - or "modular" - architectures, (b) to attribute individual differences in the residual skills of neurologically compromised patients to individual differences in the brain injuries they had suffered, and (c) to use box-and-arrow flow diagrams to keep track of information in transit through the system. Models of this sort solved many theoretical and clinical problems, especially when it came to identifying which subset of modules was needed at a particular phase of a particular cognitive activity. One such focus of enquiry was the processing required during reading, where the key issue is what goes on when a line of text needs to be looked up in the mental "lexicon" [glossary]. Some interesting phenomena are readily demonstrable .....
Exercise 1 - An Example of Differential Processing Requirements
1 Consider the difference between reciting the words of the national anthem and explaining the rules of the game of soccer. ANSWER: They both require speech output, but differ greatly in the nature of the processing needed to initiate that speech output. To recite the words of the national anthem requires the activation of a rote-learned auditory memory trace, such that each spoken word or short phrase somehow triggers the next in line, and little or no thought needs to be given to what the words are actually saying. To explain the rules of soccer, on the other hand, requires the activation of a large number of inter-associated images, episodic memories [glossary], procedural memories [glossary], and semantic memories [glossary], followed by careful sentence construction and word selection, and all subject to a higher-order narrative plan of some sort. [For (a lot) more on the sequence of events during speech production, see our e-paper on "Speech Errors, Speech Production Models, and Speech Pathology", if interested.]
Exercise 2 - Another Example of Differential Processing Requirements
1 Consider the difference between reading out loud the word "pedal" and the nonword "ledap". ANSWER: They both require much the same sound set to be produced, but differ greatly in the nature of the processing needed to initiate that output. The word "pedal" is probably perceived as a known linguistic whole, whereupon it activates (a) the appropriate central semantic node(s), and (b) the speech production mechanism. This approach cannot be taken with the nonword, because nonwords do not exist in the mind's dictionary of known linguistic wholes (or "visual input lexicon" [glossary]), nor can a nonword have any central semantic referent [glossary], because there is no such thing as a ledap. The fact that most of us can quite happily read such made-up words therefore indicates the existence of an alternative processing pathway, specifically one which can "sound out" words which fail to get through the first pathway. It is the nature of these two pathways - or the "dual routes" as they are commonly known - which is under discussion in this paper.
In "Models of Reading Aloud", Coltheart et al continue in the Marshall and Newcombe tradition. Specifically, they consider a number of "dual route" (DR) models of reading proposed in the period 1973 to 1993, with a view to improving their explanatory power. In Sections 2 to 4 we summarise the DR position, and in Sections 5 and 6 we present and evaluate the authors' improved model.
2 - The Dual Route Theory of Reading
We have already seen from Exercises 1 and 2 that simply by varying the nature of a task we can recruit or discard the services of particular cognitive modules on demand [in Exercise 1 we switched the main semantic system in or out, and in Exercise 2 we did the same with the visual input lexicon]. Given the choice, however, the mind seems to like using its mental dictionary whenever it can [the "lexical route" on the DR model is the real word pathway in Exercise 2], but is usually quite adept at doing without it when it has no alternative [the "nonlexical route" on the DR model is the nonword pathway in Exercise 2]. Here is how the authors summarise the thinking behind the DR proposal .....
"By this view, any word the reader has learned is represented as an entry in a mental dictionary or internal lexicon, and such words can be read aloud by accessing the word's lexical entry from its printed form and retrieving from that entry the word's pronunciation. This is generally described as the lexical route for reading aloud. Readers can, of course, read aloud pronounceable letter strings that they have never seen before: nonwords, for example. Nonwords do not possess lexical entries. Therefore, dual-route theorists claim, the reader must also have available a nonlexical route for reading out loud: a system of rules specifying the relationships between letters and sounds in English. This nonlexical route allows the correct reading aloud of pronounceable nonwords and of words that obey the spelling-sound rules of English, but it delivers incorrect translations of the 'exception' or 'irregular' words of English, words like pint or colonel, that disobey the rules. In summary then, the lexical route will succeed when the input string is a word but will deliver no output when it is a nonword, whereas the nonlexical route will deliver correct output when the input string is a nonword or a regular word and will deliver an incorrect output (a 'regularisation error') when the input string is an exception word." (p589; italics original; bold emphasis added.)
..... and here are glossary explanations of the five key concepts identified therein .....
Key Concept - The Lexicon: See the Psycholinguistics Glossary.
Key Concept - To "Lexicalise": See the Psycholinguistics Glossary.
Key Concept - The Lexical Route: See the Psycholinguistics Glossary.
Key Concept - The Nonlexical Route: See the Psycholinguistics Glossary.
Key Concept - Dual Route Theory: See the Psycholinguistics Glossary.
3 - The Assessment of Acquired Dyslexia
A dyslexia may be classified as "acquired" if it developed in a previously normal reader following head injury or brain disease. Its principal diagnostic sign is, necessarily, some form of reading difficulty, but further diagnostic tests must then be used to determine which of Marshall and Newcombe's dyslexia subtypes one is dealing with. Each test varies a different stimulus dimension, such as regularity (e.g. pint/mint), concreteness-abstractness (e.g. table/truth), and wordness-nonwordness (e.g. pedal/ledap), and different tests probe the integrity of different underlying processes. It follows that the tests and the explanatory model must be co-engineered: there should be enough tests to probe every part of the model; that is to say, no theoretical assertion should be left untested.
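The logic of such a test battery can be illustrated with a toy dual-route reader. What follows is a minimal sketch under loud assumptions: the three-word lexicon, the single-letter grapheme-phoneme rules, the stimulus lists and the notation (/i/ for the vowel of "mint", /I/ for the vowel of "pint", following the p599 quotation in Section 5) are all hypothetical simplifications, and a "lesion" here removes a route entirely rather than merely impairing it. Varying regularity (mint versus pint) and lexicality (mint versus fint) then dissociates the two routes in the way the dual route position in Section 2 predicts.

```python
"""Toy dual-route reader with lesionable routes (illustrative only)."""

# Hypothetical toy lexicon: spelling -> stored pronunciation.
LEXICON = {"mint": "mint", "hint": "hint", "pint": "pInt"}

# Hypothetical single-letter grapheme-phoneme rules (real GPC rules are far richer).
GPC_RULES = {"m": "m", "i": "i", "n": "n", "t": "t", "h": "h", "p": "p", "f": "f"}

# Target pronunciations for every test item, words and nonwords alike.
TARGETS = {**LEXICON, "fint": "fint", "nimp": "nimp"}

# Hypothetical test battery varying regularity (mint/pint) and lexicality (mint/fint).
BATTERY = {
    "regular": ["mint", "hint"],
    "exception": ["pint"],
    "nonword": ["fint", "nimp"],
}


def read_aloud(letters, lexical_ok=True, nonlexical_ok=True):
    """Return a pronunciation, or None if neither available route yields one."""
    if lexical_ok and letters in LEXICON:
        return LEXICON[letters]  # lexical route: dictionary lookup
    if nonlexical_ok:
        # Nonlexical route: letter-by-letter rule application, which
        # regularises exception words (pint -> /pint/).
        return "".join(GPC_RULES[letter] for letter in letters)
    return None  # no route available: no output at all


def accuracy_profile(lexical_ok=True, nonlexical_ok=True):
    """Proportion correct on each stimulus class for a given lesion pattern."""
    return {
        condition: sum(
            read_aloud(item, lexical_ok, nonlexical_ok) == TARGETS[item]
            for item in items
        ) / len(items)
        for condition, items in BATTERY.items()
    }


if __name__ == "__main__":
    print("intact:           ", accuracy_profile())
    print("lexical lesion:   ", accuracy_profile(lexical_ok=False))
    print("nonlexical lesion:", accuracy_profile(nonlexical_ok=False))
```

With the lexical route removed, the exception word is regularised while regular words and nonwords survive (the surface dyslexia pattern); with the nonlexical route removed, nonwords fail while all known words survive (the phonological dyslexia pattern).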
Coltheart et al then identify six traditionally difficult issues which candidate reading models need to address. These are .....
Question 1 - Does the proposed model adequately explain how skilled readers read exception words aloud?
Question 2 - Does the proposed model adequately explain how skilled readers read nonwords aloud?
Question 3 - Does the proposed model adequately explain how the visual lexical decision task [glossary] is performed?
Question 4 - Does the proposed model adequately explain how surface dyslexia [glossary] arises?
Question 5 - Does the proposed model adequately explain how phonological dyslexia [glossary] arises?
Question 6 - Does the proposed model adequately explain how developmental dyslexia [glossary] arises?
Here are some further exercises to consolidate your understanding of the above .....
Exercise 3 - More Examples of the Different Processing Routes
1 Read this sentence out loud. Now take a print of the Ellis (1982) transcoding model, and with a pair of scissors cut off all processing boxes NOT involved in reading out loud.
2 If you have done activity #1 properly, then the boxes which remain are those which (a) see the words, (b) understand them, (c) prepare a behavioural response to them [i.e. form an intention to say an answer out loud], (d) initiate that response, (e) select the words necessary for its execution, and then (f) operate the muscles to make all the necessary noises. [EXTENDED ANSWER AT END.]
3 Read this sentence silently to yourself. Take another print of the Ellis (1982) transcoding model, and with a pair of scissors cut off all processing boxes NOT involved in reading silently.
4 If you have done activity #3 properly, then the boxes which remain are those which (a) see the words, and (b) understand them. [EXTENDED ANSWER AT END.]
5 Read the word "mint" out loud. Take another print of the Ellis (1982) transcoding model, and with a pair of scissors cut off all processing boxes NOT needed when reading regular words out loud. As with activity #1, the boxes which remain are those which turn a visual perception into an act of speech production. But are you free to cut away the grapheme-phoneme conversion pathway? [ANSWER AT END.]
6 Now read the irregular word "pint" out loud. Does the skeleton model from activity #5 still cope? [ANSWER AT END.]
7 Now read the nonword "fint" out loud. Does the skeleton model from activity #5 now need changing? [ANSWER AT END.]
Exercise 4 - Dyslexia Subtypes
1 Take a fresh print of the Ellis (1982) transcoding model, and with a pair of scissors cut off the visual analysis module. Now imagine that a previously skilled adult reader had suffered a brain lesion with the same effect. What would be the nature and extent of his reading deficit?
2 Take a fresh print of the Ellis (1982) transcoding model, and with a pair of scissors cut off the grapheme-phoneme conversion module. What would be the nature of the reading deficit in this case?
3 Take a fresh print of the Ellis (1982) transcoding model, and with a pair of scissors cut the (red) flowline linking the visual input logogen system and the speech output logogen system. What would be the nature of the reading deficit in this case?
NOTE: The Ellis model contains 21 processing modules and 34 information flowlines. We could therefore play this type of game for a very long time before we exhausted all the permutations - indeed, if the model were perfect this would be a good thing to do. However, the model does have known weaknesses, so as a brief familiarisation exercise we shall make do with the three instances shown above.
4 - The Substantive Issue
The authors then draw attention to a number of "connectionist" models developed during the late 1980s. They select Seidenberg and McClelland's (1989) "PDP Model" as the main object of debate [in fact, they describe it in such detail that we have presented it in a separate précis to save space here]. Here are some introductory notes about PDP .....
Key Concept - "Parallel Distributed Processing" (PDP): As the demand for ever more powerful computers grew during the 1950s and 1960s, electronics engineers tried getting their programs to run simultaneously on more than one processor. This allowed more than one set of decisions to be made simultaneously (hence "in parallel"), and at more than one location (hence "distributed"). When this type of computing architecture worked well (which was not always), it delivered substantial increases in processing speeds, and it was not long before cognitive theorists began to argue that this was how biological cognition had been working all along. As a result, some very influential modular [i.e. parallel and distributed] diagrams started to appear throughout the 1960s and 1970s [in our telling of the story we select Morton (1964), the Marshall and Newcombe (1973) paper already mentioned, Morton (1979), and Ellis (1982) as a reasonably indicative line of evolution]. Things were then given a slightly more "computational" emphasis when the connectionists McClelland and Rumelhart (1985) set up a large inter-University research team known as the Parallel Distributed Processing Research Group, whose basic beliefs are brought out in the following: "..... the processing system is assumed to consist of a highly interconnected network of units that take on activation values and communicate with other units by sending signals modulated by weights associated with the connections between the units ....." (McClelland and Rumelhart, 1985, p173). <<AUTHOR'S NOTE: The 1964 "supercomputer", the CDC 6600 [details], is an excellent example of one of these early parallel architectures, and - coincidentally or not - is of the same order of internal modular complexity as the Ellis (1982) model.>>
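The quoted description of a PDP network can be made concrete with a single layer of units. The sketch below assumes, as is common in PDP work but not stated in the quotation, that a unit's net input is the weighted sum of the activations arriving from other units plus a bias, squashed by a logistic function into an activation value between 0 and 1; the weights and inputs are arbitrary illustrative values, not taken from any published model.

```python
import math


def net_input(activations, weights, bias=0.0):
    """Sum of incoming signals, each modulated by its connection weight."""
    if len(activations) != len(weights):
        raise ValueError("need one weight per incoming connection")
    return bias + sum(a * w for a, w in zip(activations, weights))


def logistic(x):
    """Squash net input into an activation value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_activations(activations, weight_matrix, biases):
    """Activate a layer of units; weight_matrix[j] holds unit j's incoming weights."""
    return [
        logistic(net_input(activations, weights, bias))
        for weights, bias in zip(weight_matrix, biases)
    ]


# Two sending units feeding two receiving units (arbitrary example values).
inputs = [1.0, 0.0]
weights = [[2.0, -1.0],   # receiving unit 0: excited by sending unit 0
           [-2.0, 1.0]]   # receiving unit 1: inhibited by sending unit 0
print(layer_activations(inputs, weights, biases=[0.0, 0.0]))
```

Positive weights push a receiving unit above its resting value of 0.5 and negative weights push it below, which is all that "signals modulated by weights" needs to mean at this level of description.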
Now the reason Coltheart et al singled out Seidenberg and McClelland's PDP model for attention was that Seidenberg and McClelland had already accused an earlier DR model (Coltheart, 1978) of inadequacy. The thrust of this criticism was that DR models were not very "connectionist" and could not learn from experience. In common with processing models going all the way back to Wernicke (1874) and Kussmaul (1878), they showed a static array of modules of known functionality and constant capacity. What went on inside these modules could be named, but not computationally described, and full adult levels of performance were presumed throughout.
Coltheart et al therefore looked carefully at the PDP model's ability to cope with the data on dyslexia, and the essence of the resulting critique was that Seidenberg and McClelland were proposing only a single high-functionality processing route rather than the dual routes so far described. It was good connectionism, but bad psycholinguistics and even worse cognitive neuropsychology. The purpose of the present paper was to show just how poorly that model would satisfy the six criteria of competence listed above, and the issue in a nutshell boils down to how the models handle (or mishandle) irregular words and nonwords, as now summarised .....
"Does the human reading system contain a processing procedure that can correctly translate both exception words and nonwords from print to phonology? Seidenberg and McClelland (1989) say yes; dual route modellers say no." (p590.)
Here is Coltheart et al's final judgement .....
"We have now considered all six of the questions about reading posed earlier in this article. The dual-route model offers a satisfactory answer to all six questions; the Seidenberg-McClelland model offers a satisfactory answer to only one. Their model, after training to asymptote, reads exception words well; however, it cannot read nonwords as well as people can, it cannot perform lexical decisions as well as people can, it cannot be used to explain acquired dyslexias (surface dyslexia and phonological dyslexia), and it cannot be used to explain developmental dyslexias (surface dyslexia and phonological dyslexia)." (p597.)
Nevertheless, Coltheart et al had to concede two specific advantages of the PDP approach, namely (a) that it uses neural networks to simulate cognitive processing modules, and can therefore tell us something about what is going on at the algorithmic level of explanation, and (b) that it learns from experience. The authors responded by overhauling the DR model in these two important respects .....
5 - The Improved Model
Coltheart et al called the improved DR model the "dual route cascaded" (DRC) model of reading .....
Key Concept - The Information "Cascade": The authors are here taking the everyday usage of the word "cascade" - as a waterfall of successive lesser falls - and using it metaphorically to describe an information processing architecture which passes information onwards along a chain of processes as soon as it can. Specifically, "as soon as there is any activation at the letter level, activation will be passed on to the word level" (p604). They select the word to contrast with the "thresholded" nature of the earlier models, where positive identifications needed to be made before onward transmission.
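The difference between cascaded and thresholded transmission can be illustrated with a two-level toy simulation. The update rule, rate and threshold below are hypothetical choices made for illustration, not the DRC model's actual equations or parameters: letter-level activation rises towards 1 on each processing cycle, and the word level is driven by the letter level either on every cycle (cascaded) or only once the letter level has crossed a threshold (thresholded).

```python
def simulate(cascaded, cycles=20, rate=0.2, threshold=0.9):
    """Return word-level activation after each cycle of a two-level toy model."""
    letter, word = 0.0, 0.0
    history = []
    for _ in range(cycles):
        letter += rate * (1.0 - letter)  # letter level climbs towards 1
        if cascaded or letter >= threshold:
            # Cascaded: pass on whatever activation exists, however partial.
            # Thresholded: pass nothing on until the letter level is "identified".
            word += rate * (letter - word)
        history.append(word)
    return history


def first_active_cycle(history):
    """Index of the first cycle at which the word level shows any activation."""
    return next(i for i, activation in enumerate(history) if activation > 0)


if __name__ == "__main__":
    print("cascaded:   ", first_active_cycle(simulate(cascaded=True)))
    print("thresholded:", first_active_cycle(simulate(cascaded=False)))
```

With these settings the word level starts to respond on the very first cycle under cascading, but not until the eleventh cycle (index 10) under thresholding, which is the head start the cascade metaphor is meant to capture.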
The implication of the cascade is that the lexical and nonlexical routes need to be regarded as competing in parallel to find the right perceptual identification of a given textual stimulus - it is just that when the stimulus is a regular known word the lexical route is likely to get to the answer first. On other occasions, such as when dealing with nonwords, the grapheme-phoneme conversion (GPC) module will already be part-way through its processing by the time the lexical route finds it has no match, and will finish the job off. With exception words the competition must be won by the lexical route, since the GPC module on its own would deliver a regularised pronunciation. The authors see the GPC module as working this way .....
"The GPC algorithm learns as follows. Words are presented in a random order, the spelling of each word being presented jointly with its phonetic transcription. For each word, the algorithm attempts to infer all the GPC rules that describe the relationship between that word's spelling and its pronunciation, and the inferred rules are used to update the rules in the current rule base (or to add new rules to this rule base if [necessary]). So, for a word like mint with its pronunciation /mint/, the rules m>/m/, i>/i/, n>/n/, and t>/t/ would be inferred, and the frequencies of these rules in the rule base would be incremented [.....] In the case of an irregular word like pint, a different rule for the grapheme i, the rule i>/I/, would be inferred and added to the rule base." (p599; italics original.)
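The frequency-counting idea in this passage can be sketched in a few lines. The sketch makes a major simplifying assumption that the real algorithm does not: every letter is treated as a grapheme aligned one-to-one with a single phoneme symbol, so multi-letter graphemes such as "th" or "ee" are not handled. The tiny training corpus, its fixed (rather than random) order, and the convention of applying the most frequent rule for each grapheme are likewise illustrative assumptions, written in the notation of the quoted passage (/i/ for the vowel of mint, /I/ for the vowel of pint).

```python
from collections import Counter, defaultdict


def learn_gpc_rules(training_pairs):
    """Infer grapheme->phoneme rules from (spelling, transcription) pairs,
    counting how often each rule is encountered."""
    rule_base = defaultdict(Counter)
    for spelling, phonemes in training_pairs:
        if len(spelling) != len(phonemes):
            raise ValueError(f"cannot align {spelling!r} with /{phonemes}/")
        for grapheme, phoneme in zip(spelling, phonemes):
            rule_base[grapheme][phoneme] += 1  # add the rule or increment its frequency
    return rule_base


def apply_gpc_rules(rule_base, letter_string):
    """Pronounce a letter string using the most frequent rule for each grapheme."""
    output = []
    for grapheme in letter_string:
        if grapheme not in rule_base:
            raise KeyError(f"no rule learned for grapheme {grapheme!r}")
        output.append(rule_base[grapheme].most_common(1)[0][0])
    return "".join(output)


# Words presented with their transcriptions (a tiny illustrative corpus).
corpus = [("mint", "mint"), ("hint", "hint"), ("tint", "tint"),
          ("fin", "fin"), ("pint", "pInt")]
rules = learn_gpc_rules(corpus)
print(dict(rules["i"]))                # both rules for "i", with their frequencies
print(apply_gpc_rules(rules, "fint"))  # a nonword read by rule
print(apply_gpc_rules(rules, "pint"))  # an exception word, regularised
```

Because i>/i/ is encountered four times and i>/I/ only once, the rule system reads "pint" as /pint/: the regularisation error that the dual route account attributes to the nonlexical route.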
Coltheart et al then show these suggestions in diagrammatic form, as now reproduced .....
A "High Bandwidth" Model of Reading: Here is Coltheart, Curtis, Atkins, and Haller's (1993) "dual route cascaded" model of reading. The lexical route (the entire left-hand side) caters for a relatively straightforward "dictionary lookup procedure" (p589), and includes such hypothetical information processing modules as Visual Word Detectors, the Semantic System (equivalent to the TPWSGWTAU in Gough, 1972), and a Phonological Output Lexicon. The non-lexical route (right-hand side) caters for a "letter-to-sound rule procedure" (p589), and includes a Grapheme-Phoneme Rule System (equivalent to Gough's Code Book). The lexical route can read words but not nonwords, and the non-lexical route can read nonwords and regular words, but misreads exception words by "regularising" them (p590). The model is therefore similar in most respects to the models of Ellis and Young (1988) and Kay, Lesser, and Coltheart (1992). However, note the use of four parallel flow lines between most of the processing modules. This allows for a two-way flow of both excitatory and inhibitory information (excitatory along the arrows, and inhibitory along the blobs), and is a major innovation in cognitive modelling because it forces theorists to look in more detail at the underlying nature of neural communication channels. When the visual lexicon has a visual match for a word and wants its meaning identified, it activates likely candidates in the Semantic System along the down arrow linking the two processing boxes and inhibits unlikely candidates over the descending inhibitory pathway. Equally, the Semantic System can "prime" the peripheral process to expect contextually likely word forms and not unlikely ones (ascending excitatory and inhibitory pathways respectively).
Redrawn from Coltheart, Curtis, Atkins, and Haller (1993, p598). This version Copyright © 2002, Derek J. Smith.
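The excitatory and inhibitory pathways in the figure can be illustrated at the level of the letter-to-word connections. The sketch below is a hypothetical simplification in the spirit of the figure, not the DRC model's published equations: each position-specific letter unit excites word detectors that contain that letter in that position and inhibits those that do not, with arbitrary weights of 1.0 and 0.5. Top-down priming from the Semantic System is not modelled.

```python
def word_detector_input(letters, word, excitation=1.0, inhibition=0.5):
    """Net input to one word detector from position-specific letter units."""
    net = 0.0
    for position, letter in enumerate(letters):
        if position < len(word) and word[position] == letter:
            net += excitation  # consistent letter: excitatory link (arrow)
        else:
            net -= inhibition  # inconsistent letter: inhibitory link (blob)
    return net


def rank_candidates(letters, vocabulary):
    """Score every word detector and return (word, net input) pairs, best first."""
    scores = {word: word_detector_input(letters, word) for word in vocabulary}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rank_candidates("pint", ["mint", "pint", "pent", "dent"]))
```

The printed input "pint" drives its own detector hardest, while near neighbours such as "mint" and "pent" are partly excited and partly inhibited, so the likely candidate is promoted and the unlikely ones are held back.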
ASIDE: In fact, the four-wire arrangement shown above strongly echoes what goes on in telecommunications systems, where "simplex" channels carry information in one direction only, "half duplex" channels allow two-way exchange but in only one direction at a time, and "full duplex" channels allow the sort of continuous two-way flow which rapid conversation requires; in long-distance telephony, full duplex working has classically been achieved over "four-wire" circuits, with a separate pair of wires for each direction. We refer interested students to the website of Applied Signal Technology, Inc., of Sunnyvale, California, where they will find a paper by Treichler, Larimore, and Johnson (1999/2003 online). This includes two very helpful diagrams (specifically, Figure 1 and Figure 8) showing how "four wire trunks" are used in connecting geographically separate computer CPUs, and our strong suspicion is that the technical requirements of efficiently connecting up anatomically separate brain areas are often very similar.
6 - Evaluation
Structurally speaking, the DRC model here presented is a standard presentation of the cognitive modules believed to be involved in human language processing, and falls entirely within the spirit and conventions of the psycholinguistic modelling tradition [see generally our e-paper on "The History of the Psycholinguistic Flow Model", and more specifically the models of Morton (1981) and Ellis (1982)]. However, its unique selling proposition is that it is now as "computational" as models developed wholly within the PDP tradition. Specifically, it allows activation within one module to be cascaded onwards, while still perhaps incomplete, and it also offers better connectionist credentials in that it focuses on the algorithms at work within the modules and their ability to learn from experience. It thus answers criticism from the PDP camp without abandoning the dual route principle. Moreover, the fact that the model depicts several "four-wire" information flow pathways makes it additionally valuable, because such considerations are extremely rare in cognitive modelling.
Here are the key arguments put forward in this paper, in revision point format .....
7 - References
Exercise 3.2: If you have cut off the auditory input processing leg (top left) as well, then you are guilty of over-enthusiasm. Read the sentence again, and note how you are listening to your own speech output as you do so.
Exercise 3.4: Again you may need to leave elements of the auditory input processing leg in place, to account for our ability to hear ourselves reading, even when we are not reading aloud.
Exercise 3.5: The answer to this question is no, because the lexical and nonlexical routes should be treated as working in parallel [see Section 5]. It is just that on this occasion the lexical route will find the answer first.
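Exercise 3.6: The answer to this question is yes, because the lexical route supplies the correct form /pInt/; the grapheme-phoneme conversion module, however, will be the source of a competing regularised form /pint/ (rhyming with "mint"), which the lexical route's output must override.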
Exercise 3.7: The answer to this question is no, because the lexical and nonlexical routes should be treated as working in parallel, even though on this occasion the former cannot contribute at all.