Lecturer's Précis - Coltheart, Curtis, Atkins, and Haller (1993)
"Models of Reading Aloud: Dual-Route and Parallel-Distributed-Processing Approaches"
Copyright Notice: This material was
written and published in Wales by Derek J. Smith (Chartered Engineer). It forms
part of a multifile e-learning resource, and subject only to acknowledging
Derek J. Smith's rights under international copyright law to be identified as
author may be freely downloaded and printed off in single complete copies
solely for the purposes of private study and/or review. Commercial exploitation
rights are reserved. The remote hyperlinks have been selected for the academic
appropriacy of their contents; they were free of offensive and litigious
content when selected, and will be periodically checked to have remained so. Copyright © 2002-2018, Derek J. Smith.
First published online 14:15 BST 7th May 2002, Copyright Derek J. Smith (Chartered Engineer). This version [2.0 - copyright] 09:00 BST 2nd July 2018.
An earlier version of this material appeared in Smith (1998; Chapter 5). It is repeated here with additional detail and supported with hyperlinks. Although the resulting paper is reasonably self-contained, it is primarily designed to be read as a subordinate file to our e-paper on "The History of the Psycholinguistic Flow Model". Readers new to dyslexia studies may prefer to familiarise themselves with the Marshall and Newcombe (1973) paper before proceeding. Alternatively, use the [glossary] links as and when you come to them. Readers unfamiliar with the basic concepts of the science of neural networks should also pre-read our e-handout on "Connectionism", noting carefully the difference between "Perceptrons" and "neural networks".
1 - Introduction
Following a seminal paper by Marshall and Newcombe (1973), it became fashionable (a) to regard all complex cognitive abilities, especially those involved in communication, as "co-operative processing" - or "modular" - architectures, (b) to attribute individual differences in the residual skills of neurologically compromised patients to individual differences in the brain injuries they had suffered, and (c) to use box-and-arrow flow diagrams to keep track of information in transit through the system. Models of this sort solved many theoretical and clinical problems, especially when it came to identifying which subset of modules was needed at a particular phase of a particular cognitive activity. One such focus of enquiry was the processing required during reading, where the key issue is what goes on when a line of text needs to be looked up in the mind's mental "lexicon" [glossary]. Some interesting phenomena are readily demonstrable .....
Exercise 1 - An Example of Differential Processing Requirements
1 Consider the difference between reciting the words of the national anthem and explaining the rules of the game of soccer. ANSWER: They both require speech output, but differ greatly in the nature of the processing needed to initiate that speech output. To recite the words of the national anthem requires the activation of a rote-learned auditory memory trace, such that each spoken word or short phrase somehow triggers the next in line, and little or no thought needs to be given to what the words are actually saying. To explain the rules of soccer, on the other hand, requires the activation of a large number of inter-associated images, episodic memories [glossary], procedural memories [glossary], and semantic memories [glossary], followed by careful sentence construction and word selection, and all subject to a higher-order narrative plan of some sort. [For (a lot) more on the sequence of events during speech production, see our e-paper on "Speech Errors, Speech Production Models, and Speech Pathology", if interested.]
Exercise 2 - Another Example of Differential Processing Requirements
1 Consider the difference between reading out loud the word "pedal" and the nonword "ledap". ANSWER: They both require much the same sound set to be produced, but differ greatly in the nature of the processing needed to initiate that output. The word "pedal" is probably perceived as a known linguistic whole, whereupon it activates (a) the appropriate central semantic node(s), and (b) the speech production mechanism. This approach cannot be taken with the nonword, because nonwords do not exist in the mind's dictionary of known linguistic wholes (or "visual input lexicon" [glossary]), nor can they have any central semantic referent [glossary] because there is no such thing as a ledap. The fact that most of us can quite happily read such made-up words therefore indicates the existence of an alternative processing pathway, specifically one which can "sound out" words which fail to get through the first pathway. It is the nature of these two pathways - or the "dual routes" as they are commonly known - which is under discussion in this paper.
In "Models of Reading Aloud", Coltheart et al continue in the Marshall and Newcombe tradition. Specifically, they consider a number of "dual route" (DR) models of reading proposed in the period 1973 to 1993, with a view to improving their explanatory power. In Sections 2 to 4 we summarise the DR position, and in Sections 5 and 6 we present and evaluate the authors' improved model.
2 - The Dual Route Theory of Reading
We have already seen from Exercises 1 and 2 that simply by varying the nature of a task we can recruit or discard the services of particular cognitive modules on demand [in Exercise 1 we switched the main semantic system in or out, and in Exercise 2 we did the same with the visual input lexicon]. Given the choice, however, the mind seems to like using its mental dictionary whenever it can [the "lexical route" on the DR model is the real word pathway in Exercise 2], but is usually quite adept at doing without it when it has no alternative [the "nonlexical route" on the DR model is the nonword pathway in Exercise 2]. Here is how the authors summarise the thinking behind the DR proposal .....
"By
this view, any word the reader has learned is represented as an entry in a
mental dictionary or internal lexicon, and such words can be read aloud
by accessing the word's lexical entry from its printed form and retrieving from
that entry the word's pronunciation. This is generally described as the lexical
route for reading aloud. Readers can, of course, read aloud pronounceable
letter strings that they have never seen before: nonwords,
for example. Nonwords do not possess lexical entries.
Therefore, dual-route theorists claim, the reader must also have
available a nonlexical route for
reading out loud: a system of rules specifying the relationships between
letters and sounds in English. This nonlexical route
allows the correct reading aloud of pronounceable nonwords
and of words that obey the spelling-sound rules of English, but it delivers
incorrect translations of the 'exception' or 'irregular' words of English,
words like pint or colonel, that disobey the rules. In summary
then, the lexical route will succeed when the input string is a word but will
deliver no output when it is a nonword, whereas the nonlexical route will deliver correct output when the input
string is a nonword or a regular word and will
deliver an incorrect output (a 'regularisation error') when the input string is
an exception word." (p589; italics original; bold emphasis added.)
..... and here are glossary explanations of the five key concepts identified therein .....
Key Concept - The Lexicon: See the Psycholinguistics Glossary.
Key Concept - To "Lexicalise": See the Psycholinguistics Glossary.
Key Concept - The Lexical Route: See the Psycholinguistics Glossary.
Key Concept - The Nonlexical Route: See the Psycholinguistics Glossary.
Key Concept - Dual Route Theory: See the Psycholinguistics Glossary.
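As a rough illustration of the division of labour described in the quotation above, the two routes can be sketched in a few lines of Python. This is a toy of our own devising, not the authors' implementation: the two-word lexicon, the single-letter rule table, and the function names are all assumptions, and the pronunciations borrow the informal notation used later in this paper (/i/ as in mint, /I/ as in pint).

```python
# Toy sketch of the dual-route idea. All data and names are hypothetical.

# Lexical route: a tiny "mental dictionary" of known words and their pronunciations.
LEXICON = {"mint": "mint", "pint": "pInt"}

# Nonlexical route: one-letter spelling-sound rules (a deliberate simplification).
GPC_RULES = {"m": "m", "i": "i", "n": "n", "t": "t", "p": "p", "f": "f"}


def lexical_route(letters):
    """Look the whole string up; return None when it is not a known word."""
    return LEXICON.get(letters)


def nonlexical_route(letters):
    """Assemble a pronunciation letter by letter from the rule table."""
    return "".join(GPC_RULES[letter] for letter in letters)


for item in ("mint", "pint", "fint"):
    print(item, "lexical:", lexical_route(item), "nonlexical:", nonlexical_route(item))
```

On this toy, the lexical route has nothing to offer for the nonword "fint", while the nonlexical route handles "mint" and "fint" but turns "pint" into a rhyme for "mint", which is the regularisation error the authors describe.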
3 - The Assessment of Acquired Dyslexia
A dyslexia may be classified as "acquired" if it developed in a previously normal reader following head injury or brain disease. Its principal diagnostic sign is, necessarily, some form of reading difficulty, but further diagnostic tests must then be used to determine which of Marshall and Newcombe's dyslexia subtypes one is dealing with. Each test varies a different stimulus dimension, such as regularity (e.g. pint/mint), concreteness-abstractness (e.g. table/truth), and wordness-nonwordness (e.g. pedal/ledap), and different tests probe the integrity of different underlying processes. It follows that the tests and the explanatory model must be adequately co-engineered - there should be just enough tests to reflect every part of the model, that is to say, no theoretical assertion should be left untested.
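The link between tests and model can be made concrete with a small simulation. The sketch below is our own toy, under stated assumptions: a three-item test battery (one regular word, one exception word, one nonword), a two-word lexicon, single-letter rules, and a hypothetical read_aloud function whose two routes can be switched off to mimic a lesion. It is not a clinical instrument and not the authors' code.

```python
# Toy "lesion" study. All items, rules, and names are hypothetical.

# Test battery: stimulus -> pronunciation a skilled reader would give.
TARGETS = {"mint": "mint", "pint": "pInt", "fint": "fint"}

LEXICON = {"mint": "mint", "pint": "pInt"}          # known words only
RULES = {"m": "m", "i": "i", "n": "n", "t": "t", "p": "p", "f": "f"}


def read_aloud(letters, lexical_intact=True, nonlexical_intact=True):
    """Prefer the lexical route; fall back on the rules; else give no response."""
    if lexical_intact and letters in LEXICON:
        return LEXICON[letters]
    if nonlexical_intact:
        return "".join(RULES[letter] for letter in letters)
    return None


def run_battery(**lesion):
    """Return, for each test item, whether the (possibly lesioned) reader is correct."""
    return {item: read_aloud(item, **lesion) == target
            for item, target in TARGETS.items()}


print("intact reader     :", run_battery())
print("lexical route cut :", run_battery(lexical_intact=False))
print("rule route cut    :", run_battery(nonlexical_intact=False))
```

Cutting the lexical route leaves regular words and nonwords intact but produces a regularisation error on "pint", which is the pattern usually associated with surface dyslexia. Cutting the rule route spares both real words but abolishes nonword reading, the pattern usually associated with phonological dyslexia. Each test item therefore earns its place in the battery by probing a different part of the model.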
Coltheart et al then identify six traditionally difficult issues which candidate reading models need to address. These are .....
Question 1 - Does the proposed model adequately explain how skilled readers read exception words aloud?
Question 2 - Does the proposed model adequately explain how skilled readers read nonwords aloud?
Question 3 - Does the proposed model adequately explain how the visual lexical decision task [glossary] is performed?
Question 4 - Does the proposed model adequately explain how surface dyslexia [glossary] arises?
Question 5 - Does the proposed model adequately explain how phonological dyslexia [glossary] arises?
Question 6 - Does the proposed model adequately explain how developmental dyslexia [glossary] arises?
Here are some further exercises to consolidate your understanding of the above .....
Exercise 3 - More Examples of the Different Processing Routes
1 Read this sentence out loud. Now take a print of the Ellis (1982) transcoding model, and with a pair of scissors cut off all processing boxes NOT involved in reading out loud.
2 If you have done activity #1 properly, then the boxes which remain are those which (a) see the words, (b) understand them, (c) prepare a behavioural response to them [i.e. form an intention to say an answer out loud], (d) initiate that response, (e) select the words necessary for its execution, and then (f) operate the muscles to make all the necessary noises. [EXTENDED ANSWER AT END.]
3 Read this sentence silently to yourself. Take another print of the Ellis (1982) transcoding model, and with a pair of scissors cut off all processing boxes NOT involved in reading silently.
4 If you have done activity #3 properly, then the boxes which remain are those which (a) see the words, and (b) understand them. [EXTENDED ANSWER AT END.]
5 Read the word "mint" out loud. Take another print of the Ellis (1982) transcoding model, and with a pair of scissors cut off all processing boxes NOT needed when reading regular words out loud. As with activity #1, the boxes which remain are those which turn a visual perception into an act of speech production. But are you free to cut away the grapheme-phoneme conversion pathway? [ANSWER AT END.]
6 Now read the irregular word "pint" out loud. Does the skeleton model from activity #5 still cope? [ANSWER AT END.]
7 Now read the nonword "fint" out loud. Does the skeleton model from activity #5 now need changing? [ANSWER AT END.]
Exercise 4 - Dyslexia Subtypes
1 Take a fresh print of the Ellis (1982) transcoding model, and with a pair of scissors cut off the visual analysis module. Now imagine that a previously skilled adult reader had suffered a brain lesion with the same effect. What would be the nature and extent of his reading deficit?
2 Take a fresh print of the Ellis (1982) transcoding model, and with a pair of scissors cut off the grapheme-phoneme conversion module. What would be the nature of the reading deficit in this case?
3 Take a fresh print of the Ellis (1982) transcoding model, and with a pair of scissors cut the (red) flowline linking the visual input logogen system and the speech output logogen system. What would be the nature of the reading deficit in this case?
NOTE: The Ellis model contains 21 processing modules and 34 information flowlines. We could therefore play this type of game for a very long time before we exhausted all the permutations - indeed, if the model were perfect this would be a good thing to do. However, the model does have known weaknesses, so as an exercise in brief familiarisation we shall make do with the three instances shown above.
4 - The Substantive Issue
The authors then draw attention to a number of "connectionist" models developed during the late 1980s. They select Seidenberg and McClelland's (1989) "PDP Model" as the main object of debate [in fact, they describe it in such detail that we have presented it in a separate précis to save space here]. Here are some introductory notes about PDP .....
Key Concept - "Parallel Distributed Processing"
(PDP): As the demand for
ever more powerful computers grew during the 1950s and 1960s, electronics
engineers tried getting their programs to run simultaneously on more than one
processor. This allowed more than one set of decisions to be made
simultaneously (hence "in parallel"), and at more than one
location (hence "distributed"). When this type of computing
architecture worked well (which was not always), it delivered substantial
increases in processing speeds, and it was not long before cognitive theorists
realised that this was how biological cognition had been working all along.
As a result, some very influential modular [i.e. parallel and distributed]
diagrams started to appear throughout the 1960s and 1970s [in our telling of
the story we select Morton (1964), the Marshall and Newcombe (1973) paper already mentioned, Morton (1979), and Ellis (1982) as a
reasonably indicative line of evolution]. Things were then given a slightly
more "computational" emphasis when the connectionists McClelland and Rumelhart (1985) set up a large inter-University research
team known as the Parallel Distributed Processing Research Group,
whose basic beliefs are brought out in the following: "..... the processing system is assumed to consist of a highly
interconnected network of units that take on activation values and communicate
with other units by sending signals modulated by weights associated with the
connections between the units ....." (McClelland and Rumelhart, 1985, p173). <<AUTHOR'S NOTE: The 1964
"supercomputer", the CDC 6600 [details], is an excellent example of one of these early parallel
architectures, and - coincidentally or not - is of the same order of internal
modular complexity as the Ellis (1982) model.>>
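The McClelland and Rumelhart description quoted above, of units that "take on activation values" and exchange "signals modulated by weights", can be sketched for a single receiving unit as follows. The logistic squashing function, the bias term, and the function name are our assumptions for illustration; the quotation itself does not commit to any particular activation rule.

```python
import math

# Toy sketch of one connectionist unit. The logistic rule is an assumption.


def unit_activation(incoming, weights, bias=0.0):
    """Sum the weighted incoming signals, then squash the total into the range 0..1."""
    net_input = sum(a * w for a, w in zip(incoming, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net_input))


senders = [1.0, 1.0]                               # two fully active sending units
print(unit_activation(senders, [2.0, 1.0]))        # positive weights excite the unit
print(unit_activation(senders, [-2.0, -1.0]))      # negative weights inhibit it
print(unit_activation([0.0, 0.0], [2.0, 1.0]))     # silent senders leave it at rest (0.5)
```

Learning, on this picture, amounts to adjusting the weights, which is why such networks can be said to learn from experience in a way the static box-and-arrow diagrams cannot.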
Now the reason Coltheart et al singled out Seidenberg and McClelland's PDP model for attention was that Seidenberg and McClelland had already accused an earlier DR model (Coltheart, 1978) of inadequacy. The thrust of this criticism was that DR models were not very "connectionist" and could not learn from experience. In common with processing models going all the way back to Wernicke (1874) and Kussmaul (1878), they showed a static array of modules of known functionality and constant capacity. What went on inside these modules could be named, but not computationally described, and full adult levels of performance were presumed throughout.
Coltheart et al therefore looked carefully at the PDP model's ability to cope with the data on dyslexia, and the essence of the resulting critique was that Seidenberg and McClelland were proposing only a single high-functionality processing route rather than the dual routes so far described. It was good connectionism, but bad psycholinguistics and even worse cognitive neuropsychology. The purpose of the present paper was to show just how bad that model would be at satisfying the six criteria of competence listed above, and the issue in a nutshell boils down to how the models handle (or mishandle) irregular words and nonwords, as now summarised .....
"Does
the human reading system contain a processing procedure that can correctly
translate both exception words and nonwords from
print to phonology? Seidenberg and McClelland (1989) say yes; dual route
modellers say no." (p590.)
Here is Coltheart et al's final judgement .....
"We
have now considered all six of the questions about reading posed earlier in
this article. The dual-route model offers a satisfactory answer to all six
questions; the Seidenberg-McClelland model offers a satisfactory answer to only
one. Their model, after training to asymptote, reads exception words well;
however, it cannot read nonwords as well as people
can, it cannot perform lexical decisions as well as people can, it cannot be
used to explain acquired dyslexias (surface dyslexia
and phonological dyslexia), and it cannot be used to explain developmental dyslexias (surface dyslexia and phonological
dyslexia)." (p597.)
Nevertheless, Coltheart et al had to concede the two specific advantages of the PDP approach, namely (a) that it uses neural networks to simulate cognitive processing modules, and can therefore inform you about what is going on at the algorithmic level of explanation, and (b) that it learns from experience. The authors responded to this criticism by overhauling the DR model in these two important respects .....
5 - The Improved Model
Coltheart et al called the improved DR model the "dual route cascaded" (DRC) model of reading .....
Key Concept - The Information "Cascade": The authors are here taking the everyday
usage of the word "cascade" - as a waterfall of successive lesser
falls - and using it metaphorically to describe an information processing
architecture which passes information onwards along a chain of processes as
soon as it can. Specifically, "as soon as there is any
activation at the letter level, activation will be passed on to the word
level" (p604). They select the word to contrast with the "thresholded" nature of the earlier models, where
positive identifications needed to be made before onward transmission.
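The contrast between thresholded and cascaded transmission can be sketched numerically. The figures below are invented for illustration, as are the function names, the threshold of 1.0, and the gain of 0.5. The sketch shows only the principle that a cascaded stage passes on partial activation at once, and it makes no claim about the parameter values used in the authors' model.

```python
# Toy contrast between thresholded and cascaded transmission. All values are hypothetical.


def thresholded(letter_activations, threshold=1.0):
    """Pass nothing to the word level until letter activation reaches the threshold."""
    return [a if a >= threshold else 0.0 for a in letter_activations]


def cascaded(letter_activations, gain=0.5):
    """Pass a share of whatever letter activation exists on every processing cycle."""
    return [gain * a for a in letter_activations]


# Letter-level activation building up over four successive processing cycles.
letter_level = [0.25, 0.5, 0.75, 1.0]

print("thresholded:", thresholded(letter_level))
print("cascaded   :", cascaded(letter_level))
```

In the thresholded case the word level hears nothing until the fourth cycle, whereas in the cascaded case it starts receiving input on the first.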
The implication of the cascade is that the lexical and nonlexical routes need to be regarded as competing in parallel to find the right perceptual identification of a given textual stimulus - it is just that when the stimulus is a regular known word the lexical route is likely to get to the answer first. On other occasions, such as when dealing with irregular words or nonwords, the grapheme-phoneme conversion (GPC) module will already be part-way through its processing by the time the lexical route decides it has no match, and will finish the job off. The authors see the GPC module as working this way .....
"The
GPC algorithm learns as follows. Words are presented in a random order, the
spelling of each word being presented jointly with its phonetic transcription.
For each word, the algorithm attempts to infer all the GPC rules that describe
the relationship between that word's spelling and its pronunciation, and the
inferred rules are used to update the rules in the current rule base (or to add
new rules to this rule base if [necessary]). So, for a word like mint with
its pronunciation /mint/, the rules m>/m/, i>/i/, n>/n/, and t>/t/
would be inferred, and the frequencies of these rules in the rule base would be
incremented [.....] In the case of an irregular word like pint, a
different rule for the grapheme i, the rule i>/I/, would be inferred and added to the rule
base." (p599; italics original.)
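The learning procedure quoted above can be sketched as a frequency count over inferred rules. The sketch is heavily simplified and should not be read as the authors' algorithm: it assumes that every grapheme is a single letter aligned one-to-one with a single phoneme symbol, the three-word training corpus (including "tint") is hypothetical, and the idea of preferring the most frequent rule when rules conflict is our assumption rather than something stated in the quotation.

```python
from collections import Counter

# Toy sketch of GPC rule learning. One letter = one phoneme is an assumption.


def infer_rules(spelling, phonemes):
    """Pair each letter with the phoneme symbol in the same position."""
    if len(spelling) != len(phonemes):
        raise ValueError("this toy only handles one-letter-to-one-phoneme words")
    return list(zip(spelling, phonemes))


def train(corpus):
    """Build a rule base mapping (grapheme, phoneme) rules to their frequencies."""
    rule_base = Counter()
    for spelling, phonemes in corpus:
        rule_base.update(infer_rules(spelling, phonemes))
    return rule_base


def preferred_phoneme(rule_base, grapheme):
    """Return the phoneme most often paired with this grapheme (assumed tie-break)."""
    candidates = {p: n for (g, p), n in rule_base.items() if g == grapheme}
    return max(candidates, key=candidates.get)


rule_base = train([("mint", "mint"), ("pint", "pInt"), ("tint", "tint")])
print(rule_base[("i", "i")], rule_base[("i", "I")])   # competing rules for the letter i
print(preferred_phoneme(rule_base, "i"))
```

After training on this corpus the rule base holds both i>/i/ and i>/I/, with the former the more frequent, which is why a rule system of this kind would tend to regularise "pint".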
Coltheart et al then show these suggestions in diagrammatic form, as now reproduced .....
A "High
Bandwidth" Model of Reading:
Here is Coltheart, Curtis, Atkins, and Haller's (1993) "dual route
cascaded" model of reading. The lexical route (the entire left-hand
side) caters for a relatively straightforward "dictionary lookup
procedure" (p589), and includes such hypothetical information processing
modules as Visual Word Detectors, the Semantic System
(equivalent to the TPWSGWTAU in Gough, 1972), and a Phonological
Output Lexicon. The non-lexical route (right-hand side) caters for a
"letter-to-sound rule procedure" (p589), and includes a Grapheme-Phoneme
Rule System (equivalent to Gough's Code Book). The lexical route
can read words but not nonwords, and the
non-lexical route can read nonwords and regular
words, but misreads exception words by "regularising" them (p590).
The model is therefore similar in most respects to the models of Ellis and Young (1988) and Kay, Lesser, and Coltheart (1992). However, note the use of four parallel flow
lines between most of the processing modules. This allows for a two-way
flow of both excitatory and inhibitory information (excitatory along the
arrows, and inhibitory along the blobs), and is a major innovation in
cognitive modelling because it forces theorists to look in more detail at the
underlying nature of neural communication channels. When the visual lexicon
has a visual match for a word and wants its meaning identified, it activates
likely candidates in the Semantic System along the down arrow linking the two
processing boxes and inhibits unlikely candidates over the descending
inhibitory pathway. Equally, the Semantic System can "prime"
the peripheral process to expect contextually likely word forms and not
unlikely ones (ascending excitatory and inhibitory pathways respectively).
Redrawn from Coltheart, Curtis, Atkins, and Haller (1993, p598). This version Copyright © 2002, Derek J. Smith.
ASIDE: In fact, the four-wire
arrangement shown above strongly echoes what goes on in telecommunications
systems, where single wire ("simplex") communication channels allow only
the most rudimentary exchange of information, two-wire ("half
duplex") systems allow intermittent two-way exchange, and only four-wire
("full duplex") systems come close to allowing the sort of continuous
two-way flow which rapid conversation requires. We recommend interested
students to the website of Applied Signal Technology, Inc., of Sunnyvale,
California, where they will find a paper by Treichler,
Larimore, and Johnson (1999/2003 online). This includes
two very helpful diagrams (specifically, Figure 1 and Figure 8) to show how
"four wire trunks" are used in connecting geographically separate
computer CPUs, and our strong suspicion is that the technical requirements of
efficiently connecting up anatomically separate brain areas are often very
similar.
6 - Evaluation
Structurally speaking, the DRC model here presented is a standard presentation of the cognitive modules believed to be involved in human language processing, and falls entirely within the spirit and conventions of the psycholinguistic modelling tradition [see generally our e-paper on "The History of the Psycholinguistic Flow Model", and more specifically the models of Morton (1981) and Ellis (1982)]. However, its unique selling proposition is that it is now as "computational" as models developed wholly within the PDP tradition. Specifically, it allows activation within one module to be cascaded onwards, while still perhaps incomplete, and it also offers better connectionist credentials in that it focuses on the algorithms at work within the modules and their ability to learn from experience. It thus answers criticism from the PDP camp without abandoning the dual route principle. Moreover, the fact that the model depicts several "four-wire" information flow pathways makes it additionally valuable, because such considerations are extremely rare in cognitive modelling.
Here are the key arguments put forward in this paper, in revision point format .....
7 - References
See the Master References List
ANSWERS
Exercise 3.2: If you have cut off the auditory input processing leg (top left) as well, then you are guilty of over-enthusiasm. Read the sentence again, and note how you are listening to your own speech output as you do so.
Exercise 3.4: Again you may need to leave elements of the auditory input processing leg in place, to account for our ability to hear ourselves reading, even when not doing it out loud.
Exercise 3.5: The answer to this question is no, because the lexical and nonlexical routes should be treated as working in parallel [see Sections 5 and 6]. It is just that on this occasion the lexical route will find the answer first.
Exercise 3.6: The answer to this question is yes, although the correct irregular form /pInt/ must now come from the lexical route, because the grapheme-phoneme conversion module working alone would regularise the word to rhyme with "mint".
Exercise 3.7: The answer to this question is no, because the lexical and nonlexical routes should be treated as working in parallel, even though on this occasion the former cannot contribute at all.