Speech Errors, Speech Production Models, and Speech Pathology
Copyright Notice: This material was written and published in Wales by Derek J. Smith (Chartered Engineer). It forms part of a multifile e-learning resource, and subject only to acknowledging Derek J. Smith's rights under international copyright law to be identified as author may be freely downloaded and printed off in single complete copies solely for the purposes of private study and/or review. Commercial exploitation rights are reserved. The remote hyperlinks have been selected for the academic appropriacy of their contents; they were free of offensive and litigious content when selected, and will be periodically checked to have remained so. Copyright © 2003-2018, Derek J. Smith.
First published 08:00 GMT 29th October 2003; this version [2.0 - copyright] 11:00 BST 9th July 2018.
Some of this material appeared in Smith (1997). It has here been considerably expanded and supported with hyperlinks. Speech and Language Therapy students will probably benefit from refreshing their memories on the difference between segmental and suprasegmental phonology [glossary] before proceeding.
When the language production system is working correctly, it is easy to underestimate its complexity. Every now and then, however, the system slips up and produces an error, and errors in any system can have a tremendous explanatory value. They can tell us, for example, whether apparently separate functions fail separately or together, and thus whether they probably derive from one or more modular processes. With further analysis, they can also tell us which modules communicate with which other modules, what form of encoding is being passed back and forth, and how well protected the communication links are against damage or interference. In this section, we look at the commonest types of everyday speech error.
1.1 Slips of the Tongue
In an early study of the sort of errors we all make in our everyday speech, Boomer and Laver (1968) judged that the phrase was one of the main units of speech production. They based this judgement on the empirical observation that errors rarely transcended phrase boundaries. Boomer and Laver's study prompted a wave of interest in this topic area, and culminated in some powerful new theories. Error corpus data was used, for example, by Gary S. Dell of the Beckman Institute, University of Illinois, to develop his "Spreading Activation Theory" of lexical access [to be discussed in detail in Section 3.1]. Dell (1986) identifies three levels of slip of the tongue error, as follows .....
(a) Sound Errors: These are accidental interchanges of sounds between words. Thus "snow flurries" might become "flow snurries". (Boomer and Laver had already claimed that segmental errors such as these account for about 60% of all errors.)
(b) Morpheme Errors: These are accidental interchanges of morphemes between words. Thus "self-destruct instruction" might become "self-instruct destruction".
(c) Word Errors: These are accidental transpositions of words. Thus "Writing a letter to my mother" might become "Writing a mother to my letter".
Additionally, each of these three levels of error may take various forms .....
(b) Perseverations: Where a later output item is corrupted by an element belonging to an earlier one. Thus "waking rabbits" - "waking wabbits".
(c) Deletions: Where an output element is somehow totally lost. Thus "same state" - "same sate".
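The examples quoted above can be reproduced mechanically, which helps to show how little needs to go wrong for each error type to appear. The following is a minimal sketch, not Dell's model: it works on spelling rather than phonemes, it treats everything before the first vowel letter as the word onset, and the function names are hypothetical.

```python
VOWELS = set("aeiou")  # orthographic stand-in for vowel phonemes (an assumption)

def split_onset(word):
    """Split a word into (onset, rest) at its first vowel letter."""
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            return word[:i], word[i:]
    return word, ""  # no vowel letter: treat the whole word as onset

def exchange_onsets(phrase):
    """Sound exchange: swap the onsets of the first and last words."""
    words = phrase.split()
    (o1, r1), (o2, r2) = split_onset(words[0]), split_onset(words[-1])
    words[0], words[-1] = o2 + r1, o1 + r2
    return " ".join(words)

def perseverate_onset(phrase):
    """Perseveration: the first word's onset replaces the last word's onset."""
    words = phrase.split()
    o1, _ = split_onset(words[0])
    _, r2 = split_onset(words[-1])
    words[-1] = o1 + r2
    return " ".join(words)

def delete_from_onset(phrase):
    """Deletion: lose the final consonant of the last word's onset cluster."""
    words = phrase.split()
    o2, r2 = split_onset(words[-1])
    words[-1] = o2[:-1] + r2
    return " ".join(words)

def exchange_words(sentence, i, j):
    """Word exchange: transpose the words at positions i and j (0-based)."""
    words = sentence.split()
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

print(exchange_onsets("snow flurries"))
print(perseverate_onset("waking rabbits"))
print(delete_from_onset("same state"))
print(exchange_words("Writing a letter to my mother", 2, 5))
```

Note that each function moves or removes a unit of one category only (an onset, or a whole word), which is the same-category constraint that the error data show.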
Dell then points out that there is a clear same-category pattern to most error occurrences. Thus initial consonants will interact predominantly with other initial consonants, prefixes with other prefixes, and nouns with nouns. This is consistent with verbal storage and retrieval processes also being organised on some sort of same-category basis. Dell also points to the phenomenon of "accommodation", the fact that an error at an early output stage can nevertheless proceed quite happily through all the remaining output stages, without detection, and with subsequent syntactic and morphological changes being correctly but incongruously applied.
Exercise 1 - Speech Errors
1 Block copy the immediately preceding paragraph into a temporary word processing file, and edit it to contain examples of the following types of error .....
Anticipation, perseveration, transposition, and deletion of a sound; Anticipation, perseveration, transposition, and deletion of a morpheme; Word transpositions within a clause; Word transpositions between adjacent clauses; Word transpositions between non-adjacent clauses; Word transpositions between adjacent sentences; Word transpositions between non-adjacent sentences.
2 Read the resulting text out loud. Which types of error are unlikely, in your judgement, to happen in practice? Why? Do you agree with Boomer and Laver's observation that errors typically take place within clauses, not between them?
MIT's Stephanie Shattuck-Hufnagel has focussed on the role of word-onset consonants in speech production planning (Shattuck-Hufnagel, 1987). She found that what she called "sublexical" errors - speech errors in the delivery of an otherwise properly selected real word - tended mainly to affect word-onset consonants. She explained this using a "slot-and-filler" model of word construction, which proposes separate representations of (a) the segments (or "fillers"), and (b) a framework (or "frame") of "slots" which locks those segments into position. The full model runs as follows .....
"Step 1. Selection of a set of candidate open class or content words from the lexicon [glossary - note that there are two substantially different usages of this word, and that Shattuck-Hufnagel here appears to be using the psycholinguistic one - Ed.]. Selection can be accomplished by transferring lexical items to a short-term processing store, or by marking their lexical representations temporarily [see NB below]. These candidate lexical items provide the set of phonemic segments among which final selection for the utterance will be made, and among which interaction errors can occur. The form of each lexical item specifies its segments and their serial order. Step 2. Construction of syllabic structure and other apparatus for associating main lexical stress to the open class lexical items. These processes incorporate the rest of the word minus the onset; word onset consonants are ignored until later. Step 3. Transfer or association of the non-onset portions of content morphemes, now organised into the metrical structures that govern lexical stress, to the emerging phrasal framework. While the hierarchical structure of the phrasal frames that receive the non-onset portions of the content words is not fully specified in this model, we propose that among other things they define two classes of components for open class items: word onset locations (which at this point in the processing remain empty), and locations for the rest of each word (which have now been filled). These two structural components are present for every content word in the phrasal frame, even for vowel-initial words whose word-onset consonant component will not be filled. Step 4. Eventual transfer or association of word-onset consonants into the word-onset locations for content words in the phrasal frame. All segments of the content words of the phrase are now in place. Step 5. 
Transformation of this representation with its accompanying hierarchical organisation into a complete string of discrete fully specified segmental elements, including those of grammatical morphemes, and subsequently into a pattern of motor commands characterised by substantial temporal overlap in the effects of adjacent segments. This process presumably involves many steps, among them one that is subject to single-segment errors at any position in the word. The influence of suprasegmental structure on interaction errors at this point in the processing is not clear, but non-interaction errors, which are distributed more evenly across word positions, may occur here." (Shattuck-Hufnagel, 1987, p47; bold emphasis added)
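The separation of frame and filler in Steps 1 to 4 can be illustrated with a small data structure. This is a hedged sketch only: it uses spelling in place of phonemic segments, assumes that every input word is a content word, and all class and function names are hypothetical rather than Shattuck-Hufnagel's. The point it tries to show is that if onsets are held apart and slotted in late, a mis-ordering at Step 4 disturbs word onsets and nothing else.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

VOWELS = set("aeiou")  # orthographic approximation (an assumption)

def split_onset(word: str) -> Tuple[str, str]:
    """Split a word into (onset, rest) at its first vowel letter."""
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            return word[:i], word[i:]
    return word, ""

@dataclass
class WordFrame:
    onset_slot: Optional[str]  # stays empty (None) until Step 4
    rest_slot: str             # filled at Step 3

def build_frames(words: List[str]) -> Tuple[List[str], List[WordFrame]]:
    """Steps 1-3: hold the onsets in a candidate pool and fill only the rest slots."""
    pool, frames = [], []
    for w in words:
        onset, rest = split_onset(w)
        pool.append(onset)
        frames.append(WordFrame(onset_slot=None, rest_slot=rest))
    return pool, frames

def fill_onsets(pool: List[str], frames: List[WordFrame],
                order: Optional[List[int]] = None) -> str:
    """Step 4: transfer onsets into the onset slots.

    `order` says which pooled onset goes to which frame; the default is the
    correct serial order, and any other ordering models an interaction error.
    """
    order = list(range(len(frames))) if order is None else order
    for frame, src in zip(frames, order):
        frame.onset_slot = pool[src]
    return " ".join(f.onset_slot + f.rest_slot for f in frames)

pool, frames = build_frames(["snow", "flurries"])
print(fill_onsets(pool, frames))                # correct transfer
print(fill_onsets(pool, frames, order=[1, 0]))  # onsets exchanged at Step 4
```

A vowel-initial word simply contributes an empty onset to the pool, so its onset slot exists but is never filled with a consonant, as the quoted Step 3 requires.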
An interesting subtype of sound errors is the "Spoonerism". Spoonerisms are an often amusing cluster of word-initial transposition errors, and are named after the Oxford academic, the Reverend W.A. Spooner (1844-1930), in whom the affliction - it is safe to believe - occurred naturally (at least to begin with). Here are some examples of Spooner's own isms .....
Two qualities distinguish Spoonerisms from ordinary sound errors. The first is that the sound transposition generates two proper words, and the second is that the two new words themselves make some different sense together. However, close study of the natural history of the phenomenon raises other interesting observations. Here is an early sceptic, who clearly believed that Spooner was milking his defect for all it was worth .....
"Curiously enough the Spoonerism is named after a man who rarely made Spoonerisms as dictionaries define them. A recent study [Robbins (1966)] indicates that Spooner's Spoonerisms were rather carefully planned - high level humour rather than unintentional error ....." (Mackay, 1970, p323)
Mackay analysed previously published lists of Spoonerisms, carefully rejecting any which could be judged as intentionally humorous or otherwise spurious. Detailed analysis of the 179 examples which remained drew him to a number of conclusions concerning the likely units of speech production. Here are Mackay's specific observations .....
"1. Repeated phonemes usually occurred before and after the reversed phonemes. 2. Reversals before repeated phonemes were as common as reversals after repeated phonemes, contradicting chain association theories. 3. The syllabic position of reversed phonemes was almost invariably identical, indicating that syllables must be a unit in speech production. 4. Consonants in the initial position of syllables were more frequently reversed than would be expected by chance [.....] 5. Significantly more reversals involved the initial phoneme of words than would be expected by chance, indicating a lexical factor in Spoonerisms. 6. Distinctive features of reversed phonemes were usually similar except for place of articulation [.....]. This suggested the possibility of two distinct types of mechanism in speech production: one for Form of Articulation, including voicing, nasality, and openness, and another for Place of Articulation. 7. Consonants were more frequently transposed than vowels. 8. Reversed phonemes occurred closer together in words and sentences than could be expected by chance. 9. [.....] Spoonerisms in German and English were shown to be quantitatively similar [as were] Spoonerisms in Latin, Croatian, Greek, and French, suggesting that phoneme reversals may result from universal underlying mechanisms common to all speakers. 10. No support was found for chain association explanations of Spoonerisms ....." (Mackay, 1970, p347)
Mackay then suggested the existence of some sort of "buffer system", that is to say, a temporary memory store situated part-way down the motor hierarchy and working on a "store-and-forward" basis.
ASIDE: This usage of the term "buffer" evolved in computer science, where it allows a sending module and a receiving module to work at slightly different speeds if necessary. Output from the sending module is passed to the buffer store at a speed the sending module finds convenient, and read from the buffer store by the receiving module at a speed the receiving module finds convenient. Since minor processing delays can now be absorbed within the individual modules, the operation of the system as a whole can, for a small investment in the intervening resource, be significantly improved.
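The computing sense of "buffer" described in the aside can be made concrete with a short simulation. This is a sketch under stated assumptions: time is divided into discrete ticks, the sender emits items in bursts, the receiver reads at a fixed rate, and items arriving at a full buffer are lost. The function name and parameters are hypothetical.

```python
from collections import deque

def run(sender_bursts, receiver_rate, buffer_size):
    """Simulate a store-and-forward buffer between two modules.

    sender_bursts[t] is the number of items the sender emits at tick t.
    The receiver removes up to receiver_rate items per tick.
    Returns (delivered_items, dropped_count).
    """
    if receiver_rate < 1:
        raise ValueError("receiver_rate must be at least 1")
    buffer = deque()
    delivered, dropped = [], 0
    item_id, tick = 0, 0
    # Keep going until the sender has finished and the buffer has drained.
    while tick < len(sender_bursts) or buffer:
        produced = sender_bursts[tick] if tick < len(sender_bursts) else 0
        for _ in range(produced):
            if len(buffer) < buffer_size:
                buffer.append(item_id)   # store
            else:
                dropped += 1             # no room: the item is lost
            item_id += 1
        for _ in range(min(receiver_rate, len(buffer))):
            delivered.append(buffer.popleft())  # forward, in arrival order
        tick += 1
    return delivered, dropped

bursty_sender = [3, 0, 0, 3, 0, 0]
print(run(bursty_sender, receiver_rate=1, buffer_size=3))
print(run(bursty_sender, receiver_rate=1, buffer_size=1))
```

With room for three items the slow receiver eventually gets everything in the right order, whereas with room for only one item most of each burst is lost. This is the sense in which a small intervening store lets two modules run at different speeds.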
Here is how Mackay saw the mental buffer system operating .....
"In the present study the reversed phonemes always originated in the same phrase, which further suggests that the buffer system displays no more than one phrase at a time. The syllable must be another unit since reversed phonemes tend to maintain the same syllabic position. However [.....] the fact that in Spoonerisms a unit smaller than the syllable crosses syllable boundaries, suggests the existence of smaller units. The question now arises as to whether phonemes are a unit in this hierarchy. [.....] Another set of questions relates to the buffer system. How much is specified in the buffer system? In the present model, for example, duration of phonemes is left unspecified, but phonemes, syllables, and stress are marked. In what form are the units in the buffer specified? Are articulatory goals or targets represented in the buffer rather than phonemes? Is stress independent of the elements that are stressed? How are syllables coded - in abstract form independent of the phonetic elements comprising them?" (Mackay, 1970, pp341-346)
This sort of progressive encoding and recoding, of course, is classic material for a box-and-arrow explanatory diagram, and, sure enough, Mackay obliges .....
Figure 1 - Mackay's (1970) Speech Production Model: Here is a relatively straightforward four-box model of the lower levels of the speech motor hierarchy. Lexical selection has already taken place, and the selected items "displayed" in the buffer system, "abstractly represented in correct serial order". Here is what happens next .....
"When this buffer system contains a word the corresponding phonemic units at the Individual Phoneme Level [topmost green box] become partially activated, along with a set of programs for modifying these phonemes at the Contextual Integration Level [middle green box]. These levels in turn feed into the Motor Unit Level [lower green box], where reciprocal inhibition is assumed to occur. These motor units code the contextual variants of phonemes, [.....] The units at the Individual Phoneme Level are unordered, and are activated in correct serial order through scanning of the buffer system." (Mackay, 1970, p348)
Redrawn from a black-and-white original in Mackay (1970; Figure 7). This graphic Copyright © 2003, Derek J. Smith.
1.3 The "Tip of the Tongue" Phenomenon
This is the name given to the relatively common everyday experience where we more or less know the word we want to say next, but are unable to bring it all the way to consciousness. The phenomenon has been known about for some time, but recent interest is normally dated to Brown and McNeill (1966), who carried out psycholinguistic research on 56 American undergraduates. They selected 49 low-frequency words (such as apse, nepotism, cloaca, ambergris, and sampan) and prepared brief dictionary definitions of each. Subjects were given a response sheet (similar to that used in Exercise 2 below), and were then presented with each definition (just like opening a dictionary at random, reading an entry, and then trying to guess the word to which it refers). Where subjects either knew or did not know the target word, no response was required, but on approximately 8.5% of trials, they experienced a tip-of-the-tongue (TOT) state - their lexicon had nearly delivered them up the target word, but not quite. On these occasions, they were required to guess at the missing word's first or last letters, the number of syllables it contained, and which syllable they thought carried the primary stress. However, before we discuss their results, here is an opportunity for you to experience the phenomenon for yourself .....
Brown and McNeill's subjects experienced a total of 360 TOT states, of which 233 were "positive TOTs", that is to say, TOTs "for which the data obtained could be scored as accurate or inaccurate" (p280), and the remainder were "negative TOTs", that is to say, TOTs "for which the subject judged the word read out not to have been his target and, in addition, one in which the subject proved unable to recall his own functional target" (p281). The trials were also scored for whether TOTs were similar in sound (Saipan, perhaps, for sampan) or meaning (houseboat, perhaps, for sampan) to the target. There were 224 similar-sound (SS) TOTs, and 95 similar-meaning (SM) TOTs. Of the SS items, 48% had the same number of syllables as the target, compared to only 20% of the SM words. These data were then modelled as though the human word stores were organised like a dictionary, albeit a very complicated one .....
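The counts quoted above can be cross-checked with some simple arithmetic. The sketch below assumes that every one of the 56 subjects was presented with all 49 definitions, which the description implies but does not state outright; the variable names are illustrative only.

```python
subjects, words = 56, 49
trials = subjects * words          # total definition presentations (assumed design)

total_tots = 360
positive_tots = 233
negative_tots = total_tots - positive_tots

def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

print(trials)                        # 2744
print(negative_tots)                 # 127
print(pct(positive_tots, trials))    # 8.5
print(pct(total_tots, trials))       # 13.1
print(pct(positive_tots, total_tots))  # 64.7
```

On this assumption the "approximately 8.5% of trials" figure corresponds to the 233 positive TOTs, while all 360 reported TOT states would amount to about 13% of trials.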
"In real dictionaries, those that are books, entries are ordered alphabetically and bound in place. Such an arrangement is too simple and too inflexible to serve as a model for a mental dictionary. We will suppose that words are entered on keysort cards instead of pages and that the cards are punched for various features of the words entered. With real cards, paper ones, it is possible to retrieve from the total deck any subset punched for a common feature by putting a metal rod through the proper hole. We will suppose that there is in the mind some speedier equivalent of this retrieval technique. The model will be described in terms of a single example. When the target word was sextant, subjects heard the definition: 'A navigational instrument used in measuring angular distances, especially the altitude of sun, moon, and starts at sea'. This definition precipitated a TOT state in 9 subjects of the total 56. [.....] The problem begins with a definition rather than a word and so the subject must enter his dictionary backwards, or in a way that would be [.....] quite impossible for the dictionary that is a book. It is not impossible with keysort cards, providing we suppose that the cards are punched for some set of semantic features. [..... However] in the TOT case the [retrieval] must include a card with the definition of sextant entered on it but with the word itself incompletely entered." (Brown and McNeill, 1966, pp292-293; italics original)
Brown and McNeill then discuss at length exactly how this incomplete lexical entry might be coded. The most obvious suggestion was that it was coded by its first and last letters, so that saucepan, spaceman, and stamen would all be clustered together in some way - hence the Saipan-sampan confusion. But this was dismissed as a touch too simplistic. Instead, they preferred "something more like Sex_tanT" (p295), where not just the first and last letters, but also elements of the first and last syllables, played a part. Brown and McNeill named this type of recall by common feature "generic recall", and saw it as reflecting the coding systems used in verbal memory. This makes the TOT phenomenon itself, as well as the techniques of experimenting with it, relevant across a wide spectrum of communicative cognition, including speech perception, sentence production, and reading, as now demonstrated .....
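The keysort metaphor translates quite directly into a feature-indexed lookup. What follows is a minimal sketch of that idea and not Brown and McNeill's own formalism: the semantic features, the extra word compass, and all function names are hypothetical, and only the simpler first-letter and last-letter coding is modelled (not the fuller "Sex_tanT" coding).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Card:
    word: str
    semantic: frozenset  # semantic features punched on the card (invented here)

def punches(card):
    """All holes on a card: semantic features plus first- and last-letter features."""
    w = card.word.lower()
    return card.semantic | {f"first:{w[0]}", f"last:{w[-1]}"}

def rod(deck, *features):
    """Pass a 'rod' through the named holes: keep the cards punched for all of them."""
    wanted = set(features)
    return [c.word for c in deck if wanted <= punches(c)]

DECK = [
    Card("sextant", frozenset({"navigation", "instrument"})),
    Card("compass", frozenset({"navigation", "instrument"})),
    Card("sampan", frozenset({"boat"})),
    Card("Saipan", frozenset({"place"})),
    Card("houseboat", frozenset({"boat"})),
    Card("saucepan", frozenset({"kitchen"})),
]

# Entering the dictionary "backwards", from a definition to candidate words.
print(rod(DECK, "navigation", "instrument"))
# Generic recall from partial form alone: similar-sound candidates.
print(rod(DECK, "first:s", "last:n"))
# Similar-meaning candidates.
print(rod(DECK, "boat"))
# Meaning and partial form together narrow the deck to the target.
print(rod(DECK, "navigation", "first:s", "last:t"))
```

Retrieval by sound features alone returns the sampan, Saipan, saucepan cluster, and retrieval by meaning alone returns sampan and houseboat, which mirrors the similar-sound and similar-meaning responses described in the text.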
More recently, Jones and Langford (1987) and Maylor (1990) have looked at how different types of distractor word can interfere with the TOT phenomenon. Maylor, for example, presented TOT items to 15 subjects in their 'fifties, 17 in their 'sixties, and 17 in their 'eighties. A distractor word was presented immediately after each target definition, separated only by a short bleep. These distractors had been carefully chosen to fall into one of four conditions. In Condition P the distractor was phonologically related to the target word (e.g. baulk for braise), in Condition S it was semantically related (e.g. incubus for banshee), in Condition U it was not related in either way (e.g. fossilise for hospice), and in Condition PS it was simultaneously phonologically and semantically related (e.g. abnormality for anachronism). Maylor distinguishes subjective TOT, where the subject reported the TOT state but could not retrieve any concrete facts about the word, from objective TOT, where a letter or letters could be identified and a syllable count or stress location given. The results indicated that both states occurred more frequently when the distractor word was phonologically related to the target word than when it was phonologically unrelated.
And more recently still, Harley and Bown (1998) varied the frequency and phonological distinctiveness of the target words and found "that TOTs are more likely to arise on low-frequency words that have few close phonological neighbours" (p151). They use their data to reflect upon the broader process of "lexicalisation", which they define as "the process of phonological retrieval in speech production given a semantic input" (p152), and they opt for a "two-stage" explanatory model of lexical access, that is to say, a model which strictly separates each word's semantic and phonological representations. TOTs can therefore be seen as arising "when the first stage of lexical access is completed successfully, but not the second" (pp152-153). However, the critical point as far as Harley and Bown are concerned is as follows .....
"Our central result is that phonological neighbours contribute to, rather than hinder, phonological retrieval in speech production. [.....] A TOT occurs when the semantic specification successfully accesses the abstract lemma. This causes the 'feeling of knowing' the word. Nevertheless, the lemma is then unable to pass sufficient activation onto and thereby access the corresponding phonological word form. [.....] There are two possible reasons for failure at this stage. Either the connections between the lemma and the phonological forms might be weakened, or the phonological forms might themselves be weakly represented for these items." (Harley and Bown, 1998, p162)
Finally, although this section is primarily concerned with speech errors in normals, the similarity between the TOT phenomenon and the clinical sign known as "anomia" is too glaring not to get a comment. Goodglass, Kaplan, Weintraub, and Ackerman (1976) studied the confrontational naming ability of a population of aphasics, and began by pointing out this very similarity .....
"The designation of a patient as 'anomic' indicates that his access to lexical terms is poor in relation to the fluency of his articulation and grammar" (Goodglass et al, p145).
Goodglass et al then looked at patients' "tacit knowledge" of the first letter of, and number of syllables in, the words which they were failing to retrieve. In fact, they dated this sort of research to Weisenburg and McBride (1935), who formally recorded how many syllables anomics thought were in the lost names (a test known as the Proust-Lichtheim Test of Inner Speech, in which the patients show by raising an appropriate number of fingers how many syllables they believe are in the word they are having trouble with). In their own research, Goodglass et al tested 42 male aphasics, classified by the Boston Diagnostic Aphasia Test as 13 Broca's type, 8 Wernicke's type, 12 conduction type, and 9 anomic type. Each was shown 48 line-drawing stimulus cards of objects whose names were of intermediate word frequency in English, and contained one, two, three, or four-or-five syllables (for example, clamp, walrus, violin, and refrigerator). The authors conclude .....
"The results indicated a clear cut superiority on the part of conduction aphasics, as compared to Wernicke's and anomic subjects. Conduction aphasics identified both first letter and syllabic length of one third of the words which they could not name. Anomic aphasics succeeded in fewer than one of ten instances and Wernicke's aphasics were not much more successful. Broca's aphasics were correct in one try out of five, and could not be differentiated statistically from either the conduction aphasics on the one hand or the Wernicke's aphasics on the other." (Goodglass et al, 1976, p151)
As to why this should be, the authors looked at the sequential nature of the word production process .....
"..... it appears that word finding is usually an 'all-or-none' process for Wernicke and anomic patients, in the sense that they either recover a name well enough to produce it or they can give little evidence of partial knowledge. Words which are failed then seem to be totally unavailable, as far as recall processes are concerned. However the near perfect multiple choice selections by all subjects indicate that this is a one-way disorder involving recall, but not recognition, [//] In the case of the conduction aphasics the evidence of tacit partial knowledge of many words may indicate a breakdown at a later stage in the naming process. An inner auditory representation may be present but is prevented from setting into motion the final neural events which activate the articulatory system. Either the auditory model is incomplete or, as the disconnection hypothesis suggests, its route to the motor speech area is not consistently available. [//] The failure of Broca's aphasics to match the performance of the conduction aphasics is surprising, since it contradicts the traditional notion that their word finding difficulty is purely at the motor speech level." (Goodglass et al, 1976, p152)
Further differences between the conduction and transcortical motor types of aphasia are discussed in McCarthy and Warrington (1984).
Malapropisms are another phenomenon where the empirical data challenge one's preferred model of lexical organisation, thus .....
"From a collection of over 2000 errors in speech compiled by the first author, we initially selected all errors that involved word substitution (397). From this initial list we eliminated all errors that could have arisen from [other sources]. The remaining corpus comprised 183 errors. These errors, the malapropisms, have some interesting properties. First, the target and the error are of the same grammatical category in 99% of the cases. Second, the target and the error frequently have the same number of syllables (87% agreement in our list). Third, they almost always have the same stress pattern (98% agreement)." (Fay and Cutler, 1977, pp507-508).
Fay and Cutler continue .....
"At a certain point in the production of a sentence a grammatical structure must be framed to carry the meaning that the speaker intends to convey. This structure can be thought of as incorporating both the syntactic properties of the impending utterance (in the form, say, of a phrase structure), and the meanings of the words to be used. What is not in the structure initially is any specification of the phonological characteristics of the chosen words. For these the speech production device must look into its mental dictionary to find a particular entry whose meaning and syntactic category match the specifics embodied in the grammatical structure. [.....] What is this mental dictionary, or lexicon, like? We can conceive of it as similar to a printed dictionary, that is, as consisting of pairings of meanings with sound representations. A printed dictionary has listed at each entry a pronunciation of the word and its definition in terms of other words. In a similar fashion, the mental lexicon must represent at least some aspects of the meaning of the word, although surely not in the same way as does a printed dictionary; likewise, it must include information about the pronunciation of the word although, again, probably not in the same form as an ordinary dictionary." (Fay and Cutler, 1977, pp508-509; bold emphasis added; note that these authors are using the linguistic definition of lexicon, not the psycholinguistic - see glossary)
2 - Hesitations as Indicators of Thinking Time
"'Time is the measure of all things', not least mental activities; and time when people appear to be doing nothing is the kind of time psychologists most like to measure." (Butterworth, 1980, p155)
The idea that hesitation phenomena might indicate psychological processing time goes back to Donders' work in the 1860s, but modern interest is best dated to the work of Frieda Goldman-Eisler (various from 1951). In one of her early studies (Goldman-Eisler, 1958), she demonstrated that hesitation pauses preceded phrases rich in new information. She was then followed by Donald S. Boomer, who studied the relationship between both filled and silent pauses and their position within the grammatical clause (e.g. Boomer, 1965). Boomer tape-recorded spontaneous speech from 16 male American students, and analysed the transcripts for silences longer than 200 milliseconds, filled pauses, and the suprasegmental "phonemic clause" boundaries .....
Key Concept - Phonemic Clause: "A phonemic clause is an intonational unit consisting of a single intonation contour, one primary stress and a terminal juncture, and is also called a 'tone group'" (Butterworth, 1980, pp156-157). Alternatively, it is "a grammatical structure produced within a single intonation contour, and bounded by junctures [silences, or significant changes in phonetic pitch, stress, or duration]." (Crystal, 2003, p348)
Boomer then numbered the successive word boundaries in clauses, on the assumption that they presented "an ordered series of opportunities for hesitation" (p162). For example .....
(1) and (2) the (3) weather (4) was (5) hot
Here is his argument .....
"In general, there will be as many possible locations as words in the clause, each location being labelled with the ordinal number of the word it precedes. Occasional arbitrary exceptions were made in this study for multiple-element proper nouns such as Bill Smith and San Francisco, for combinatory groups like thank you and what-you-may-call-it, and for certain 'tags' such as you know and you see. These were counted as single words, as were syntactically superfluous repetitions of words, as in I took the ... the train. Filled pauses themselves and word-fragments were also excluded from the count. [//] The corpus contained a total of 1593 phonemic clauses of which 713 contained one or more hesitations. Hesitations totalled 1127, 749 unfilled pauses and 378 filled pauses. [.....] Results. The hypothesis that hesitations tend to occur at the beginning of phonemic clauses was strongly supported [although] the greatest frequency of hesitations is not at the outset but at position 2, after the first word of the clause [see the column of data highlighted in red in the table below - Ed.]. This is true for all nine of the array distributions representing clause lengths from two to ten words." (Boomer, 1965, pp162-163; italics original; bold emphasis added)
For interested readers, Rochester (1973) provides a handy review of the early literature on hesitations.
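Boomer's position-numbering scheme is easy to express as a tallying procedure. The sketch below is illustrative only: the transcript markup (`<sil>` for a silent pause, `<uh>` for a filled pause) is a hypothetical convention, the example clauses are invented, and none of Boomer's exceptions for multi-word units or repetitions are handled.

```python
from collections import Counter

HESITATION_MARKS = {"<sil>", "<uh>"}  # hypothetical transcript markup

def label_boundaries(clause):
    """Number each word boundary with the ordinal of the word it precedes."""
    words = clause.split()
    return " ".join(f"({i}) {w}" for i, w in enumerate(words, start=1))

def hesitation_positions(tokens):
    """Return, for each hesitation, the number of the word it precedes."""
    positions, words_seen = [], 0
    for tok in tokens:
        if tok in HESITATION_MARKS:
            positions.append(words_seen + 1)  # hesitations are not counted as words
        else:
            words_seen += 1
    return positions

def tally(clauses):
    """Count hesitations by boundary position across a set of phonemic clauses."""
    counts = Counter()
    for clause in clauses:
        counts.update(hesitation_positions(clause.split()))
    return counts

print(label_boundaries("and the weather was hot"))
print(tally(["and <uh> the weather was hot", "but <sil> it <uh> rained"]))
```

Applied to a real corpus, a tally of this kind would give the frequency-by-position distribution on which Boomer's position 2 finding rests. The quoted totals are also internally consistent, since 749 unfilled plus 378 filled pauses gives the stated 1127 hesitations.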
2.1 Lindsley's Work
Lindsley (1975) designed a study to determine how many sentence units are planned in advance of speech initiation. Noting that the main verb, following the subject, introduces the part of the sentence known as the predicate, he identified three possible planning strategies, as follows .....
Pre-Predicate Model: This model "characterises a speaker who initiates his utterance as soon as he has completed selection of the subject" (Lindsley, 1975, p3).
Post-Predicate Model: This model "characterises a speaker who delays initiating his utterance until after he has completed selecting the verb as well as the subject" (Lindsley, 1975, p3).
Semi-Predicate Model: This is a compromise model which "characterises a speaker who delays initiating his utterance until after he has completed some selection of the verb as well as selection of the subject" (Lindsley, 1975, pp3-4).
Now the point about the three explanatory models is that they make different predictions on initiation latency. The pre-predicate model predicts that the to-be-selected verb contributes nothing to any speech initiation latency, and, by implication, that "the speaker responds as though he were treating the subject and verb as independent responses" (p3). The post-predicate model predicts that verb selection does contribute to the initiation latency, and, by implication, that "the speaker responds as though he were treating the subject and verb as interdependent aspects of a larger response unit: the sentence as a whole" (p3). And the semi-predicate model falls between these two extremes, accepting some contribution to latency from verb selection.
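The three sets of predictions can be placed on a single scale by asking what share of verb selection is completed before speech begins. This is a deliberately simplified sketch and not Lindsley's own formulation: it assumes that subject and verb selection run strictly in series, the millisecond values are invented, and the 0.5 used for the semi-predicate model is an arbitrary placeholder for "some, but not all".

```python
# Share of verb selection completed before the utterance is initiated.
MODELS = {
    "pre-predicate": 0.0,    # verb contributes nothing to initiation latency
    "semi-predicate": 0.5,   # some verb selection (arbitrary illustrative value)
    "post-predicate": 1.0,   # verb selection fully completed first
}

def predicted_latency(t_subject, t_verb, verb_share):
    """Initiation latency when a share of verb selection follows subject selection."""
    if not 0.0 <= verb_share <= 1.0:
        raise ValueError("verb_share must lie between 0 and 1")
    return t_subject + verb_share * t_verb

# Hypothetical selection times in milliseconds.
t_subject, t_verb = 600, 400
for name, share in MODELS.items():
    print(name, predicted_latency(t_subject, t_verb, share))
```

On these assumptions the pre-predicate model predicts the same latency for an S-V sentence as for naming the subject alone, the post-predicate model predicts the full cost of both selections, and the semi-predicate model predicts something in between. That ordering is what allows latency data to discriminate between the three.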
To test which model might be operating, Lindsley devised a picture-description task to generate response latency data. He presented subjects with pictures containing an actor (a man, woman, girl, or boy) engaged in a specific action (touching, kicking, greeting, etc.), and compared response latencies when producing utterances of different length and grammatical form .....
S-Only Sentences: Here the subject had to name the actor depicted on the card. Example: "The girl".
V-Only Sentences: Here the subject had to name the action depicted on the card. Example: "Greeting".
S-V Sentences: Here the subject had to name both the actor and the action in sentence form. Example: "The girl is greeting".
S-V-O Sentences: Here the subject had to name actor, action, and object in sentence form. Example: "The girl is greeting the boy".
The amount of lexical decision making was also varied by holding either S or V constant across a number of cards, so that (a) it would always be the girl, say, who was doing something (S constant), or (b) the action would always be greeting, say, regardless of which actor was depicted (V constant). Where S or V could differ from card to card, Lindsley codes the element as dS or dV; where it was held constant, he codes it as cS or cV. Data were then obtained for all permutations of c and d and sentence type, including mixed sentences such as cS + dV. These data indicated (a) that it takes longer to initiate an S-V utterance than an S-only utterance, thus arguing against the pre-predicate model, and (b) that S-V utterances were initiated more quickly than V-only naming when the actor was already known, thus arguing against the post-predicate model. Lindsley therefore concluded .....
"..... it seems most likely that the speakers of S-V sentences, represented by dS + dV and cS + dV do employ consistently a specific speech strategy characterised by the semi-predicate model. This speech strategy entails an initial portion of verb selection being deliberately performed before the initiation of the utterance. Whenever this initial portion of verb selection occurs in series with subject selection or takes longer than any parallel stages of subject selection, it delays the initiation of the utterance until after it has been completed. [.....] The results of this research, singling out the Semi-predicate model, are consistent with those of the hesitation studies [citations] in demonstrating that speech is initiated before all information about an utterance has been processed or linguistically coded." (Lindsley, 1975, pp10-19)
Or to put it simply, we typically start talking as soon as we have decided on the subject and have some idea of the action we wish to describe.
2.2 Butterworth's Work
Butterworth (1975) studied not the individual pauses - what he termed "the microstructure of hesitation" - but rather the overall proportion of pausings to speech - "the macrostructure of hesitation" (Butterworth, 1975, p75). He adopted Henderson, Goldman-Eisler, and Skarbek's (1966) concept of the "temporal cycles" of speech, that is to say, alternating periods of hesitancy and fluency, and collected speech samples from eight male subjects. He then analysed transcripts of these samples for the demarcation points of both the cycles and the Ideas. Here are his conclusions .....
"Clause boundaries appear to be a necessary but not sufficient condition for the onset of both cycles and new Ideas, in that the vast majority of cycles and the Idea divisions given by any subject coincided with clause boundaries but a very substantial number of clause boundaries were not coincident with either cycles or Ideas. There was somewhat better match between sentences, Ideas, and cycles. Taking Ideas and sentences first, of 35 criterial Ideas boundaries - ie. where more than half the subjects agreed on the location of an Idea division - all but four coincided with sentence boundaries [.....]. Thus, of clause types, the relevant kind for Ideas seems to be sentences; but Ideas may consist of more than one sentence. [//] With regard to cycles, about half coincided with sentence starts and three-fourths with all kinds of clause boundaries. This left cycles consisting of more than one sentence in some cases and parts of sentences in about half the cases [.....//] The results presented here are consistent with the hypothesis that the cycles represent integral planning units for the speaker, and shed light on what these planning units consist of linguistically. First, the speaker tends to plan ahead in terms of well-understood linguistic units - namely clauses and sentences. Second, he appears to have the ability to chunk together several clauses or sentences as one superordinate planned structure integrated by some kind of semantic unity. [.....] If this is correct, then serious qualifications are required of Boomer's thesis that the main unit of planning is the phonemic clause (Boomer, 1965; Boomer and Laver, 1968). If speakers do encode speech into phonemic clause units, then this will occur well down the hierarchy of encoding processes and will be a process of a quite different kind from the planning of cyclic segments." (Butterworth, 1975, pp83-84)
In a later paper, Butterworth (1980) revisited the importance of hesitation data. He began by pointing out that sentence boundary pauses probably help listeners as much as speakers, because they give them time to consolidate their understanding of the message just received. He then drew again on the cycles described by Henderson, Goldman-Eisler, and Skarbek (1966) .....
"Typically, cycles last about 18 s, but some as long as 30 s, which means that they will contain, on average, five to eight clauses, ie. generally two or more sentences [citations]. Since, as we have seen, semantic factors were responsible for pause time variations, we should look for semantic rather than syntactic units. [//] I therefore asked independent judges to divide transcripts of speech into 'Ideas' [.....]. Taking those points in the texts where more than half the judges agreed that one Idea ended and the next began, and comparing these with cycle boundaries, a significant correspondence between Idea and cycle boundaries was found. Although the correspondence was reliable it was not complete. Some cycles did not begin at an Idea boundary, and some Ideas did not coincide with cycles. Why these discrepancies should occur is not clear. [.....] One thing is established, however: both Idea and cycle boundaries almost invariably coincide with clause boundaries." (Butterworth, 1980, p165; bold emphasis added)
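The "more than half the judges" criterion and the subsequent comparison with cycle boundaries lend themselves to a short sketch. Everything below is hypothetical: the judges' markings, the cycle onsets, and the function name are invented for illustration and are not Butterworth's data or procedure in detail.

```python
def criterial_boundaries(judgements):
    """Return the positions marked as an Idea boundary by more than half the judges.

    judgements holds one list per judge, giving the clause indices at which
    that judge thought a new Idea began.
    """
    counts = {}
    for marks in judgements:
        for position in set(marks):          # count each judge once per position
            counts[position] = counts.get(position, 0) + 1
    return {pos for pos, n in counts.items() if n > len(judgements) / 2}


# Hypothetical markings from four judges, and hypothetical cycle onsets,
# all expressed as clause indices within the same transcript.
judges = [[0, 5, 11], [0, 5, 12], [0, 6, 11], [0, 5, 11, 15]]
cycle_bounds = {0, 5, 9, 11}

idea_bounds = criterial_boundaries(judges)
shared = idea_bounds & cycle_bounds            # boundaries that coincide
cycles_off_idea = cycle_bounds - idea_bounds   # cycles not starting at an Idea
```

In this invented example every criterial Idea boundary coincides with a cycle onset, but one cycle begins where no Idea boundary was agreed, which is the kind of incomplete correspondence the quotation describes.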
Butterworth is another who accepts the buffer system concept, and, indeed, points out that there may well be more than one such buffer to take into account .....
"Several authors, most notably Morton (1970), have argued for a 'Response Buffer' which can hold a string of words for output following lexical selection. This buffer is held to operate in both speech production and short-term memory tasks. [However,] Shallice and Butterworth (1977) reported one case of severe impairment of auditory-verbal STM, without a concomitant increase in the hesitancy of speech. The most plausible interpretation of these results is that, contra Morton, the buffer used in STM tasks is not used in speech." (Butterworth, 1980, pp165-166; italics original; bold emphasis added)
2.3 Developmental Data
Finally, MacWhinney and Osser (1977) give some developmental data. They studied 20 British five-year-olds, and analysed their hesitation behaviour by sex and social class. They concluded as follows .....
"The first major result of this study has been the identification of three major planning functions: preplanning, coplanning, and avoidance of superfluous vocalisation. The styles in verbal planning reflect basic differences in cognitive processing. Underlying all three planning functions, however, is one central commonality - verbal planning takes time. While the speaker is trying to figure out what to say and how to say it, the conversation moves on. Given this inevitable forward movement in time and his own problems in formulating his utterance, the speaker may do one of two things. He may attempt to fully formulate what he is going to say before he says it. Alternatively, he may start talking and hope to be able to figure out his utterance in medias res. Whether he pauses initially or attempts to patch together an ongoing sentence, he has a further option. He may either use superfluous verbalisation to cover his pauses and errors or he may simply remain silent. The principal components analysis in this experiment indicates that the 13 hesitation phenomena examined in this study can be grouped into these three functional categories: coplanning, preplanning, and avoidance of superfluous verbalisation. [.....] The second major result of this study has been the finding that, for 5-year-olds, differences in verbal planning functions are more related to sex than to social class. Boys were found to do more coplanning, while girls made greater use of preplanning. Moreover, boys showed more use of superfluous verbalisations than girls." (MacWhinney and Osser, 1977, p984)
3 - Lexical Structure Models
In this section, we look in more detail at the ins and outs of lexical retrieval .....
Having considered the many threads of evidence then available to him, Dell (1986) proposed a "Spreading-Activation Theory" of lexical retrieval. Here is the main thread of his argument .....
"The principal assumption of the theory is that at each level a representation of the sentence-to-be-spoken is constructed. Thus, a planned utterance will exist at various times as a semantic, syntactic, morphological, and a phonological representation. The theory describes the construction of the latter three representations. [.....] The construction of a representation at each level goes on simultaneously with that of the other levels [ie. parallel processing - Ed.], with the rate of processing depending on factors intrinsic to the level and on the rate of processing of the level immediately above it. [.....] The basic idea of the theory is that the tagged nodes constituting a higher representation activate nodes that may be used for the immediately lower representation through a spreading-activation mechanism. The lower representation is constructed as the generative rules associated with that level build a frame, or ordered set of categories, and the insertion rules fill in the slots of the frame [.....]. When an item is selected for a slot, it becomes part of the developing lower representation and so it receives its tag. Thus, a principal mechanism for the translation of information from one representation to another is spreading activation through the lexicon. [.....] When a node has an activation level greater than zero, it sends some proportion of its activation level to all nodes connected to it (spreading). This proportion is not necessarily the same for each connection. When the activation [reaches] its destination node, it adds to that node's current activation level (summation). [.....] Activation is assumed to decay exponentially [over time] towards zero. These operations, spreading, summation, and decay, apply to all of the nodes in the lexical network at all times, regardless of whether the node is part of a representation (tagged or not). [.....] 
One of the important assumptions regarding spreading activation in the theory is that all connections are two way. If Node A connects to B, then B connects to A. Given the nature of the connections and the hierarchical structure of the lexical network, each connection can be classified as either excitatory top-down, or excitatory bottom-up. For each top-down connection, such as that from a particular morpheme to a particular phoneme, there is a bottom-up connection in the reverse direction. These bottom-up connections deliver positive feedback from later to earlier levels and play a critical role in the theory. Their presence makes processing in the network highly interactive [and] generates some nonobvious predictions. [.....] Constructing Representations: In this section I outline how a lower representation is constructed given a higher representation. The first important concept is that of the current node. It is that item of the higher level representation that is in the process of being transferred into corresponding items at the immediately lower level. [.....] When the construction of a lower representation begins, the current node is that node of the higher representation that is tagged as first. The initial step in the translation is the activation of the current node." (Dell, 1986, pp287-288; italics original; bold emphasis added)
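The three operations Dell describes (spreading, summation, and decay) and the two-way connections can be illustrated with a very small network. This is a simplified sketch, not Dell's implementation: the mini-lexicon, the ad hoc phoneme labels, the parameter values, and the exact order in which decay and spreading are applied are all assumptions made for the example.

```python
# Hypothetical mini-lexicon; the phoneme labels are ad hoc, not IPA.
LINKS = {
    "close": ["k", "l", "oh", "z"],
    "open": ["oh", "p", "n"],
}
TOP_DOWN, BOTTOM_UP, DECAY = 0.3, 0.1, 0.4   # assumed parameter values


def build_weights(links):
    """Make every connection two-way: morpheme to phoneme and back again."""
    weights = {}
    for morpheme, phonemes in links.items():
        for phoneme in phonemes:
            weights[(morpheme, phoneme)] = TOP_DOWN
            weights[(phoneme, morpheme)] = BOTTOM_UP
    return weights


def step(activation, weights):
    """Apply decay, spreading, and summation once to every node."""
    new = {node: level * (1.0 - DECAY) for node, level in activation.items()}
    for (src, dst), proportion in weights.items():
        if activation[src] > 0:
            new[dst] += activation[src] * proportion   # summation at destination
    return new


weights = build_weights(LINKS)
nodes = {node for pair in weights for node in pair}
activation = dict.fromkeys(nodes, 0.0)
activation["close"] = 1.0        # the current node receives the initial activation

for _ in range(2):
    activation = step(activation, weights)
```

After two steps the node for "open" has picked up a little activation purely through the phoneme it shares with "close", because the bottom-up link carries activation back up the network. This is the sort of interactivity that the positive feedback connections are said to produce.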
The big question, of course, is which processes go wrong to produce a given error, and Dell's simple answer was that "no one process is at fault" (p289). Speech errors are simply natural consequences of the way the mind is organised. Thus .....
"For example, in the planning of an utterance many concepts would legitimately become activated that would not actually appear in the utterance. This background activation might include activation from concepts that were either presuppositions or inferences that were necessary in the semantic and pragmatic planning of the utterance. For example, if one were to say Could you close the door?, one would certainly have processed the presupposition that the door was open. As a result the concept for open might be active, and because of the spreading activation, the word, morpheme, and phoneme nodes associated with [it] would become activated, perhaps resulting in the slip Could you open, I mean close ....." (Dell, 1986, p291; italics original)
The error corpus data on the location of errors are also important .....
"In general, items are more likely to move short distances. Misordered sounds and morphemes tend to move to adjacent content words that are in the same phrase (Boomer and Laver, 1968; Garrett, 1975; MacKay, 1970). Misordered words move greater distances, possibly because the planning chunks at the syntactic level are larger, or because words can only move to appropriate syntactic slots (Garrett, 1975)." (Dell, 1986, p293)
4 - Modular Speech Production Models
Dell's theory deals primarily with word retrieval at a micro level. It accepts the basic two stage theory and identifies four levels of representation within those stages, but although it has a lot to say about what might be going on at neural levels, it does not fully address the modularity of the processing. Other workers have taken a more macroscopic view, and have not only tried to map the modules and processes involved in speech production, but have started to push upwards into the realms of pragmatics.
ASIDE: Praxis and pragmatics actually share the same linguistic root, namely the Greek word prassein = "to do", via its derivations praxis ("doing") and pragma ("deed"). Defects of praxis are known as dyspraxias. We have already dealt at length with the motor hierarchy under a number of separate headings. For example, it is the output leg on the standard A-shaped control hierarchy model: for specific examples, see Craik (1945), Frank (1963), and Norman (1990); for the history of the motor hierarchy in general, see our e-paper on "The Motor Hierarchy"; and for an introduction to theories of biological motor programming, see our e-paper on "Motor Programming". The motor hierarchy for speech, in turn, is one of the output legs on the standard X-shaped psycholinguistic transcoding model. See, for example, Ellis (1982), Ellis and Young (1988), and Kay, Lesser, and Coltheart (1992). For the history of this model layout, see our e-paper on "Transcoding Models".
In this section, we shall look at some of the most influential of these modular models.
4.1 Lordat's Very Early Speech Production Model
This is the subject of a dedicated separate paper. See Lordat (1843).
4.2 Other Early Speech Production Models
Lordat's was the first of many 19th century speech production models, of which the following may be worth a quick browse, if interested in the historical aspects of the subject .....
Lichtheim (1885) (still the standard model for 21st Century medical training)
4.3 Shannon's Idealised Communication Model
The late 1940s saw a wave of interest in the engineering aspects of communication. This led to telecommunications experts borrowing freely from the aphasiology literature, and, in turn, to psycholinguistics borrowing back much of the resulting vocabulary during the ensuing decade [specifically, words such as encoding, information, working memory, signal-to-noise ratio, and feedback]. The engineer who did most to systematise the way we look at communication and its failures was Claude Shannon, then with the Bell Telephone Laboratories [fuller story in our e-paper on "Shannonian Communication Theory", if interested].
4.4 Fromkin's "Utterance Generator" Model
The 1950s saw increasing interest in psycholinguistic experimentation, with major works by George Miller (Miller, 1951), Colin Cherry (Cherry, 1957), and Roger Brown (Brown, 1958). This experimentation was complemented by an expanding literature on the psycholinguistic impact of brain damage led by the likes of Harold Goodglass (1920-2002) and Norman Geschwind (1926-1984). This groundwork then generated a number of explanatory models across psychology as a whole. Donald Broadbent became one of the lead-theorists for selective attention, John Morton did the same for modular language processing, Atkinson and Shiffrin did the same for short-term memory, and Alan Baddeley added working memory. The pivotal work in the field of speech production was Fromkin's (1971, 1973) "Utterance Generator" model, which largely resurrected Lordat's 19th century scheme of things with the following six-stage explanatory analysis .....
Stage 1 Processing - Semantic System: This is where the meaning to be conveyed is first generated.
Stage 2 Processing - Syntactic System: This is where an appropriate syntactic "slot" structure is decided upon.
Stage 3 Processing - Lexical System: This is where content words are extracted from the lexicon to help give shape to the developing sentence.
Stage 4 Processing - Prosodic System: This is where an appropriate intonation pattern is decided upon.
Stage 5 Processing - Phonological Assembly: This is where function words [glossary] are inserted at key points in the emerging sentence structure, and then abstract sounds are attached to the words and morphemes as they fall into position within each clause.
Stage 6 Processing - Phonetic System: This is where concrete sounds are attached to the abstract sounds, and muscle activation commences.
Exercise 4 - So is it Six Stages or Three?
Most of the 19th century models settled for three or four stages of speech production, and Requin, Riehle, and Seal (1988) have argued that three hierarchical processing levels is nature's norm for biological motor behaviour. Yet most of the models mentioned in this section end up with five or six. Suggest how this apparent disagreement might be explained.
4.5 Garrett's Speech Production Model
This is the subject of a dedicated separate paper. See Garrett (1990). [At two points in his model, Garrett shows two processes dealing with different aspects of the processing simultaneously - thus going some way towards fitting six processes into only three processing modules.]
4.6 Butterworth's Modern Speech Production Model
Drawing on both Fromkin and Garrett, Butterworth (1985) offers a flow diagram similar to Garrett's, in which the following "processing systems, or modules" are identified .....
Semantic System: This is Fromkin's "Stage 1 Processing" as defined above. Butterworth regards it as passing information to "the next three systems in parallel" (p68).
Syntactic System: This is Fromkin's "Stage 2 Processing" as defined above. Butterworth regards it as receiving the first of the three streams of information coming out of the semantic system, and as using this information to set up appropriate sentence and clause constructions.
Lexical System: This is Fromkin's "Stage 3 Processing" as defined above. Butterworth regards it as receiving the second stream of information coming out of the semantic system, and as using this information to select suitable words "from an inventory - lexicon - of word forms" (p69).
Prosodic System: This is Fromkin's "Stage 4 Processing" as defined above. Butterworth regards it as receiving the third stream of information coming out of the semantic system, and as using this information to choose "an appropriate intonation contour" (p69).
Phonological Assembly System: This is Fromkin's "Stage 5 Processing" as defined above. Butterworth regards it as setting up a "phonemic string with syntactic bracketing".
Phonetic System: This is Fromkin's "Stage 6 Processing" as defined above. Butterworth regards it as taking the output from the Phonological Assembly System, and as then generating suitable motor commands. This is the point at which the abstract phonemes begin to turn into concrete phones [glossary]. It is also the point at which coarticulation takes place.
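The six modules just listed, with the semantic system feeding three of them in parallel, can be written down as a small directed graph. The module names follow the list above, but the wiring from the three parallel systems into phonological assembly is an assumption made for illustration, as are the variable and function names.

```python
# Which module passes information to which. Only the three parallel streams
# out of the semantic system and the final assembly-to-phonetic link are
# stated in the text; the remaining links are assumed.
FEEDS = {
    "Semantic": ["Syntactic", "Lexical", "Prosodic"],
    "Syntactic": ["Phonological Assembly"],
    "Lexical": ["Phonological Assembly"],
    "Prosodic": ["Phonological Assembly"],
    "Phonological Assembly": ["Phonetic"],
    "Phonetic": [],
}


def processing_levels(feeds, root="Semantic"):
    """Assign each module the length of the longest path from the root."""
    depth = {root: 0}
    queue = [root]
    while queue:
        node = queue.pop(0)
        for successor in feeds[node]:
            if depth.get(successor, -1) < depth[node] + 1:
                depth[successor] = depth[node] + 1
                queue.append(successor)
    return depth


levels = processing_levels(FEEDS)
n_modules = len(FEEDS)
n_levels = max(levels.values()) + 1
```

Under these assumptions the six modules occupy only four successive levels, because three of them work side by side. This may be worth bearing in mind when attempting Exercise 4.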
Figure 2 - Butterworth's (1985) Speech Production Model: This diagram lays out the modules described above in the now-familiar general layout. We shall therefore comment only on the model's uniquenesses .....
4.7 Levelt School Models
Levelt (1989), head of the Max Planck Institute for Psycholinguistics, published a major monograph on speech production under the title "Speaking: From Intention to Articulation". One of his major points concerned the difference between "lexical encoding", the retrieval (and creation if necessary) of words to express ideas, and "syntactic encoding", the retrieval and sequencing of words to express ideas .....
"But languages differ enormously in the degree to which they exploit [lexical encoding]. While a Turkish speaker's grammatical encoding consists for the most part of such lexical encoding, an English speaker is extremely 'conservative' in the sense that he normally uses words he has heard often in the past. For the English speaker, lexical encoding plays a minor role in grammatical encoding; the action is in syntactic encoding. A theory of the speaker should, of course, encompass both kinds of grammatical encoding. As a matter of fact, however, almost nothing is known about the psychology of lexical encoding." (Levelt, 1989, p186)
In an attempt to cast some light on the processes of lexical encoding, Levelt did much to popularise the use of the term "lemma" [see earlier discussion, Section 1.3]. Thus .....
"..... from the point of view of language production a lexical entry can be split up into two parts: its lemma and its form information. This theoretical distinction can be extended to the mental lexicon as a whole. Lemmas can be said to be 'in the lemma lexicon', and morpho-phonological forms to be 'in the form lexicon'. Each lemma 'points' to its corresponding form. [.....] The semantic information in a lemma specifies what conceptual conditions have to be fulfilled in the message for the lemma to be activated; it is the lemma's meaning. These conditions can be stated in the same propositional format as messages. [.....] A lemma's syntactic information specifies the item's syntactic category, its assignment of grammatical functions, and a set of diacritic feature variables or parameters." (Levelt, 1989, pp187-190)
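The split between a lemma lexicon and a form lexicon, with each lemma "pointing" to its form, maps naturally onto a pair of look-up tables. The sketch below is only one possible rendering: the class names, field names, the example entry for "give", and its ad hoc phoneme labels are all hypothetical and are not taken from Levelt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    """Morpho-phonological information held in the form lexicon."""
    morphemes: tuple
    phonemes: tuple


@dataclass(frozen=True)
class Lemma:
    """Semantic and syntactic information held in the lemma lexicon."""
    meaning: str        # conceptual conditions for activating the lemma
    category: str       # syntactic category
    functions: tuple    # grammatical functions the item assigns
    diacritics: tuple   # diacritic feature variables still to be set
    form_key: str       # the 'pointer' to the corresponding form


# Hypothetical one-entry lexicons.
form_lexicon = {"give": Form(morphemes=("give",), phonemes=("g", "i", "v"))}
lemma_lexicon = {
    "GIVE": Lemma(
        meaning="transfer of possession",
        category="verb",
        functions=("subject", "direct object", "indirect object"),
        diacritics=("tense", "person", "number"),
        form_key="give",
    )
}


def retrieve_form(lemma_name):
    """Follow a lemma's pointer into the form lexicon."""
    lemma = lemma_lexicon[lemma_name]
    return form_lexicon[lemma.form_key]
```

Keeping the two tables separate means that a lemma can be selected successfully while the look-up of its form fails, which is one way of picturing the tip-of-the-tongue state.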
Further down the system, Levelt sees the process of phonological encoding as working this way .....
"Phonological encoding is a process by which the phonological specifications of lexical items are retrieved and mapped onto a fluently pronounceable string of syllables. Unpacking a word's phonological specifications and using them to retrieve the appropriate syllable programs involves various levels of processing. Studies of the tip-of-the-tongue phenomenon in which this process of phonological unpacking is blocked or slowed, support this view." (Levelt, 1989, pp361-362)
Two years later, Donald (1991) drew on Levelt's work in his own "evolutionary" theory of the speech motor hierarchy .....
Figure 3 - Donald's (1991) Speech Production Model: This model was developed from earlier models by Butterworth (1980, 1985) and Levelt (1989). It places a Linguistic Controller L at the top of a "vertically integrated" speech system. L then creates "narrative models" out of ideas released to it (a) from episodic memory as the result of current stimulation, and (b) from a mental structure he calls the Mimetic Controller, a hypothetical mechanism believed to be responsible for the production of "conscious, self-initiated, representational acts that are intentional but not linguistic" (Donald, 1991, p168) [this being nothing less than the evolutionary advance which brought about the emergence of the modern human]. The lower processes are jointly responsible for the "lexical assembly" of the final utterance. This brings in subprocesses for selecting, sequencing, and determining the correct form of the words to be produced. The Phonetic Plan "maps the assembled utterance onto neuromotor paths and, ultimately, the vocal musculature". All this makes for the generally familiar layout, so again we only need to point out the model's uniquenesses .....
Redrawn from a black-and-white original in Donald (1991, p260; Figure 7.2). This graphic Copyright © 2003, Derek J. Smith.
Finally, Levelt, Roelofs, and Meyer (1999) is typical of the latest offerings from Levelt's research unit .....
Figure 4 - Levelt, Roelofs, and Meyer's (1999) Speech Production Model: This diagram, too, adopts the now familiar general layout, so again we shall note only its points of uniqueness .....
Redrawn from a black-and-white original in Levelt, Roelofs, and Meyer (1999, p3; Figure 1). This graphic Copyright © 2003, Derek J. Smith.
5 - Feedback in Speech Production Models
The topic of feedback was introduced in our e-paper on "The Basics of Cybernetics", and is especially important to speech production theory. Gracco and Abbs (1987) are among many to point out that continuous speech involves continuous feedback, that is to say, that the continuous execution of a motor program requires an equally continuous stream of sensory information from muscle and cutaneous senses throughout the respiratory, laryngeal, and orofacial regions. Similarly, but at a higher level of analysis, Levelt (1989) devotes an entire chapter to the topic of self-monitoring and self-repair. Among the types of feedback Levelt deals with are .....
* Am I saying what I meant to say?
* Is this the way I meant to say it?
* Is what I am saying socially appropriate?
* Am I selecting the right words?
* Am I using the right syntax and morphology?
* Am I making any phonological errors?
* Is my articulation at the right speed and pitch?
Successful speech production, in other words, is a constant battle against error, and those errors can pop up anywhere. The phrases we then use to interrupt and correct ourselves (phrases such as "sorry", "I mean", "let me put that another way", etc.) are known generically as "editing expressions" (Hockett, 1967). Levelt (1989) summarised the issue thus .....
"The major feature of editor theories [of monitoring] is that production results are fed back through a device that is external to the production system. Such a device is called an editor or a monitor. This device can be distributed in the sense that it can check in-between results at different levels of processing. The editor may, for instance, monitor the construction of the preverbal message, the appropriateness of lexical access, the well-formedness of syntax, or the flawlessness of phonological-form access. There is, so to speak, a watchful little homunculus connected to each processor." (Levelt, 1989, pp467-468; italics original; bold emphasis added)
In this section, we look at how feedback and editing have been studied objectively .....
5.1 Early Studies
Lee (1951) pioneered a technique of replaying a person's speech to that person's own ears, subject to a variable time delay. Here is how he profiles his method .....
"In order to produce delayed speech feedback, it is necessary to return the speaker's speech to his own ears approximately one quarter second after he has spoken. This is best accomplished by means of a magnetic tape [machine]. The subject reads a moderately difficult text into the recording microphone with the playback gain control to the telephone headset turned off, and a normal reading pattern is established. The playback gain to the earphones is then advanced until the subject's speech is disturbed." (Lee, 1951, p53)
Using this experimental set-up, Lee found that there were two types of common effect. Subjects either (a) slowed down and raised their voices, or else (b) began to speak haltingly, repeating syllables in a form of "artificial stutter". The same phenomenon emerged with skilled tympanists reading a drum-beat, and for the key presses of skilled Morse Code operators. Lee gives the following specific examples .....
aluminum..... degrades to aluminum-num.....
ten-nine-eight-seven.... degrades to ten-nine-nine-eight-seven.....
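In signal terms, Lee's tape arrangement is a delay line: whatever goes into the microphone comes out of the headset a fixed time later. The sketch below models this on a list of sample values. It is an illustration only, with a hypothetical function name and an unrealistically low sample rate chosen to keep the numbers small.

```python
from collections import deque


def delayed_feedback(samples, sample_rate_hz, delay_s=0.25):
    """Return what the speaker hears: the input shifted later by delay_s seconds."""
    delay = int(round(sample_rate_hz * delay_s))
    line = deque([0.0] * delay)       # silence until the delay has elapsed
    heard = []
    for sample in samples:
        line.append(sample)           # speech enters the delay line
        heard.append(line.popleft())  # and emerges delay samples later
    return heard


spoken = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
heard = delayed_feedback(spoken, sample_rate_hz=8)   # a quarter second = 2 samples
```

At every moment the speaker is therefore hearing what was said a quarter of a second earlier rather than what is being said now, and it is this mismatch which disturbs the monitoring of ongoing speech.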
Lee interpreted these findings as evidence of a multiple loop control hierarchy, with four levels of feedback, as follows .....
The "Thought Loop": The top control level releases individual thoughts for action, and then monitors that action for successful progress and completion. The highest level feedback loop then monitors the output for what would nowadays be termed its pragmatic appropriacy [strictly speaking, its "perlocutionary effect"].
The "Word Loop": The second highest loop monitors speech production for word selection accuracy.
The "Voice Loop": The third highest loop monitors speech production at whole-syllable level for morphological accuracy.
The "Articulating Loop": Finally, the lowest loop monitors speech production, checking that the right phonemes have been used within each syllable.
It is confusion at the hand-over between the second and the third level which presumably causes the aluminum-num syllable repetition. There were no single-phoneme repetitions. Here is Lee's own conclusion .....
"The satisfaction at each stage by a monitoring system is required; otherwise the machine halts, repeats, or repeats corrected. Repetition of sentences and words is volitional for emphasis, increased clarity, or correction of gross errors. Repetition of syllables is probably involuntary, or reflex, and it is at this stage that artificial stutter is manifested. Repetition of phonemes has not been artificially induced by delayed speech feedback in [our] observation ....." (Lee, 1951, p54)
So compelling were these early studies that Mysak (1966) explicitly put cybernetics and speech pathology in bed together in his book "Speech Pathology and Feedback Theory".
5.2 Editing and Editing Expressions
Motley, Camden, and Baars (1982) argued for the existence of a function of "prearticulatory editing", as follows .....
"Editing has been described as a phase of speech production which occurs after the phonological phase (ie. after the impending message has evolved its phonological representation) but before the articulatory phase, and which operates to test or check the linguistic integrity of the incoming phoneme strings. The edit presumably approves for subsequent articulation those phoneme strings which are linguistically appropriate; but vetoes and attempts to replace those which are linguistically anomalous, thus preventing their articulation." (Motley, Camden, and Baars, 1982, p578)
Motley and colleagues carried out a dozen or so studies in the late 1970s and early 1980s on cleverly induced Spoonerisms. They called their method SLIP - for "Spoonerisms of Laboratory-Induced Predisposition", explaining it as follows .....
"Subjects are instructed to read silently a series of tachistoscopically-presented word pairs, speaking aloud certain cued 'target' pairs. Unbeknownst to the subject, these target word pairs are immediately preceded in the series by 'interference' pairs designed to phonologically resemble the spoonerised version of the intended target. For example, the subject might read silently the interference items barred dorm and bought dog immediately before seeing and attempting to articulate the target darn bore. About 30% of subjects' attempted target utterances result in a spoonerism - barn door in this example. Our most typical design has been to compare the frequency of anomalous versus legitimate error utterances; anomaly being defined according to various linguistic and quasi-linguistic criteria." (Motley, Camden, and Baars, 1982, p579; italics original)
They then compared what they call the "slip-rate differential" between "legitimate" Spoonerisms and "anomalous" ones .....
"Our most typical result is that legitimate errors far outnumber anomalous ones. For example, a lexically legitimate SLIP spoonerisms like darn boor > barn door will occur much more frequently than a similar but lexically anomalous one like dart board > bart doard. [//] This slip-rate differential [has] been the primary form of evidence for prearticulatory editing. That is to say, the above example [.....] can be taken as evidence that when the SLIP subject constructs a lexically legitimate phoneme string, the string is allowed to be output; whereas when the subject constructs lexically anomalous potential output, it is vetoed (by the edit), and its articulation is disallowed." (Motley, Camden, and Baars, 1982, p579; italics original)
As to the underlying neural mechanisms, Crosson (1985) offers a view of speech production involving Broca's area, Wernicke's area, and various substructures of the thalamus and basal ganglia, all interlinked by circulating and re-circulating white matter tracts, and delivering both semantic and phonological monitoring. For details, see the separate paper, Crosson (1985).
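The logic behind Motley et al's slip-rate differential can be sketched in a few lines of Python. This is only a toy illustration of the editing idea, not their experimental procedure. The function names spoonerise and prearticulatory_edit, the rough phonemic respellings ("bor" for bore, "dor" for door, "bord" for board), and the six-item lexicon are all assumptions made for the example. A real edit would presumably lower the probability of an anomalous slip rather than veto it outright.

```python
# Toy lexicon in a rough phonemic respelling (an assumption for illustration):
# "bor" = bore, "dor" = door, "bord" = board.
TOY_LEXICON = {"darn", "bor", "barn", "dor", "dart", "bord"}


def spoonerise(first, second):
    """Exchange the initial sounds of two words (single-letter onsets only)."""
    return second[0] + first[1:], first[0] + second[1:]


def prearticulatory_edit(candidate, lexicon=TOY_LEXICON):
    """Approve a candidate string only if every word in it is a known word."""
    return all(word in lexicon for word in candidate)


for target in [("darn", "bor"), ("dart", "bord")]:
    slip = spoonerise(*target)
    verdict = "allowed" if prearticulatory_edit(slip) else "vetoed"
    print(target, "->", slip, verdict)
```

The first target yields a pair of real words and is allowed through; the second yields two non-words and is vetoed, which is the pattern the slip-rate differential is taken to reflect.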
Exercise 5 - Improving the Diagrams
Levelt's model has lost Butterworth's two-part higher functions system, so it fails to separate semantics and pragmatics. It has also lost Garrett's sentence type and clause structure frames, does not deal at all well with parallel processing [we criticise merely the diagram here, which does not reflect the full richness of the Levelt School's broader theory], and has only one "up arrow" when there are potentially many. Use your diagramming skills to produce a bigger, better model [in other words, add in Levelt's "watchful little homunculus", if you can, and wherever you can].
6 - Pathological States Attributable to Defective Biological Control Systems
Now the reason box-and-arrow models are so important to clinicians is that there are a number of very well known communication pathologies - not least, stuttering, dyspraxia, and dyslexia - which are actually cybernetic problems at heart. We have already covered stammering in Section 5.1, so here are some of the others .....
6.1 Dyslexia Resulting from Poor Head/Eye Muscle Control
Dyslexia is an inability to process visually presented text efficiently. Whilst this is at first sight a perceptual problem, the very complexity of the oculomotor control system makes it a motor problem as well. You cannot read if you cannot control the movement of your eyes. When reading this text, for example, your eyes will be fixating after every eight characters (about every one and a half words) (Rayner and Pollatsek, 1989), and many authorities (typically Pavlidis, 1981/1985) believe that developmental dyslexia can be explained by defects in sequencing these fixations for maximum information uptake. Developmental dyslexics do appear to have eye movement patterns which differ from those of normal readers (Rayner and Pollatsek, 1989). However, this factor per se has not been strongly confirmed. Indeed, Rayner and Pollatsek set greater store by Stein and Fowler's (1982, 1984) findings of "vergence control" problems in dyslexics. Vergence movements are those which keep both eyes pointing at the same centre of attention. In normal readers, the two eyes move "conjugately", that is to say, they track at the same speed and in the same direction. Stein and Fowler's data suggest that about one in six cases of developmental dyslexia can improve reading performance with treatment of this problem in isolation.
As to the cybernetics of eye control, the oculomotor control system serves a variety of biologically essential behaviours such as food search and predator avoidance (Galiana, 1990). It therefore needs to be every bit as functionally sophisticated as the skeletomuscular system it is helping to guide. This functionality is provided by having a complex of feedforward, predictive, and feedback control loops at work. To start with, there are mechanisms controlling the automatic focussing of the lens, binocular vergence, and the automatic stopping down of pupillary aperture. There are then additional mechanisms to control the automatic positioning of the eye relative to the head as the head moves relative to both the body and the external world. These latter mechanisms place heavy information processing demands on the vestibular system, the system which processes the information provided by the semicircular canals of the inner ear (the "labyrinth"), the body's balance detectors. Information from the semicircular canals travels to the brainstem down the vestibular branch of the vestibulocochlear nerve (CN VIII). Here it links in via the vestibular nuclei of the lower pons to the cerebellum and a host of other components of the extrapyramidal system. Good reviews of this subject area can be found in Peterson and Richmond (1988), Galiana (1990), and Berthoz, Graf, and Vidal (1992).
6.2 Parkinson's Disease
The motor disorders which characterise Parkinson's disease are conventionally attributed to disorders of muscle control circuitry. Wiener himself likened Parkinsonian tremor to the oscillations of under-"damped" control loops (Wiener, 1950); Flowers (1978) blames lack of prediction; Harrington and Haaland (1991) blame "central processing deficits"; and Dinnerstein, Frigyesi, and Lowenthal (1962) blame slower than normal proprioceptive feedback for a variety of the standard Parkinsonian symptoms, such as rigidity, slowness, and lack of coordination.
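Wiener's analogy can be made concrete with a small simulation. The sketch below is not a model of Parkinsonian tremor. It simply assumes a generic second-order control loop driving a position from 0.0 towards a target of 1.0, and reports the furthest point reached. The function name peak_position, the natural frequency of 1 Hz, and the two damping ratios are illustrative assumptions.

```python
import math


def peak_position(damping_ratio, natural_freq_hz=1.0, seconds=5.0, dt=0.001):
    """Simulate x'' = w^2 * (1 - x) - 2 * zeta * w * x' by explicit Euler steps
    and return the largest position reached on the way to the target 1.0."""
    w = 2.0 * math.pi * natural_freq_hz
    x, v = 0.0, 0.0            # start at rest, one unit short of the target
    peak = x
    for _ in range(int(seconds / dt)):
        a = w * w * (1.0 - x) - 2.0 * damping_ratio * w * v
        x, v = x + dt * v, v + dt * a
        peak = max(peak, x)
    return peak


under = peak_position(0.1)   # lightly damped: overshoots the target and "hunts"
over = peak_position(1.5)    # heavily damped: approaches the target from below
print(f"under-damped peak = {under:.2f}, over-damped peak = {over:.2f}")
```

With light damping the loop overshoots the target and oscillates around it before settling, whereas with heavy damping it creeps up to the target without ever passing it. It is the former behaviour which Wiener had in mind.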
6.3 Learning Difficulties
Many categories of learning difficulty present with an inability (amongst other things) to communicate effectively at a pragmatic level. This can be alleviated to a greater or lesser extent by training at what Williamson (1992) describes as "backchannel" skills. These include a wide variety of both vocal and nonvocal responses, such as nods, shakes, grunts, facial expressions, etc., whose function is to feed back to a speaker the extent to which his/her utterances are being understood. [This, by definition, must be working to Lee's highest level feedback loop - the "thought" loop.]
6.4 Anomia
Anomia is an inability to find the name-word for something which is otherwise perfectly well understood. It is a very common clinical sign, and can arise from a variety of disease processes, both focal and diffuse, although it is particularly associated with injuries to the angular gyrus (Marshall, 1980). In its simplest form, anomia presents as difficulty with confrontational naming tasks, although the ability to describe a concept tangentially in the hope that this will compensate for the absence of its proper name is frequently preserved. Thus a patient might say "you cut your food with it" if s/he could not access the word "knife". This stratagem is known as circumlocution. It is even possible for the lost target word to be included in the circumlocution even though it had been unavailable in isolation, as with "I'd use it to comb my hair" as a substitution for the word "comb" (Benson, 1979). Marshall (1980, p62) passes on a nice example of word finding difficulties in a patient describing a picture .....
"That's the ..... you know, the ..... very much like they got on the ..... on something very much. I don't say that it's the proper one but it's like er er ..... I can't say it but I can just ..... yes, that could be it, could be a bit like that, yes. [etc.]" (Marshall, 1977)
As far as the explanatory models are concerned, we are fortunate that they were originally drawn up with anomia in mind. Driven by the mass of clinical data accumulated since the 1860s, all modern models consistently separate ideation from word selection, that is to say, they separate the semantic system from the output lexicons [this being the crucial difference between the linguistic and psycholinguistic usage of the word lexicon]. Gnosis is what the semantic system does, and naming is what the speech output lexicon does. Morton himself relates anomic aphasia to problems moving outwards from the semantic system to the output lexicon, just as did the nineteenth-century diagram makers before him. He then contrasts this with optic aphasia where there are problems moving inwards towards the semantic system from the visual input lexicons (Morton, 1985).
But anomic aphasia is not the only condition in which word finding difficulty is found. Benson (1979) distinguishes no fewer than nine subtypes, of which the following five are to some extent aphasic .....
(a) Word Production Anomia: This is a confrontational naming defect, but one which is resolvable upon phonemic cueing (or "prompting"). If the patient is given the first letter of the target word, the whole word suddenly becomes available. Patients can appear to "know" the target name, but either cannot initiate its production at all, or else produce a neologism instead. (This is therefore a condition analogous to the "tip of the tongue" phenomenon discussed in Section 1.3 above.)
(b) Word Selection Anomia: This is another confrontational naming defect, but this time it is not usually resolvable by cueing. Gnosis is intact (because patients can immediately point to the object in question if told its name), and conversational speech is otherwise fluent and effortless.
(c) Semantic Anomia: This is an inability to use an object's name as a mental symbol. It is superficially similar to (b), but patients cannot point to the object in question if told its name.
(d) Category-Specific Anomia: This is an anomia for a particular conceptual class of objects. It is quite rare, but it has nevertheless prompted authors such as Baron (1976) and Allport (1985) to describe the semantic lexicon as having various regions (or "zones", or "domains", etc.), each dealing with a particular class of attributes. Thus an object's pictorial attributes, colour attributes, positional attributes, "eye-head-body movement" attributes, and even smell and taste attributes, are regarded as being stored in separate parts of one large distributed engram system.
(e) Modality-Specific Anomia: This is an anomia for objects presented in one modality but not another (visual, for example, but not auditory). However, it is probably best treated as an optic (or auditory) aphasia, rather than as an anomia as such.
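The distinguishing features of subtypes (a) to (e) can be summarised as a short decision sketch. This is a teaching aid only, and certainly not a diagnostic instrument. The function name classify_naming_failure, the reduction of each observation to a simple yes/no, and the order in which the checks are made are all assumptions of ours rather than anything proposed by Benson.

```python
def classify_naming_failure(cue_helps, can_point_when_named,
                            one_category_only=False, one_modality_only=False):
    """Map a few yes/no observations onto the labels (a) to (e) used above."""
    if one_modality_only:            # fails in, say, vision but not audition
        return "modality-specific anomia"
    if one_category_only:            # fails for one conceptual class only
        return "category-specific anomia"
    if not can_point_when_named:     # the name no longer works as a symbol
        return "semantic anomia"
    if cue_helps:                    # phonemic cueing releases the word
        return "word production anomia"
    return "word selection anomia"   # gnosis intact, cueing no help


print(classify_naming_failure(cue_helps=True, can_point_when_named=True))
```

Note that the contrast between (b) and (c) rests entirely on the pointing test, and that between (a) and (b) on the response to cueing.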
6.5 Agrammatism
The term "agrammatism" derives from Kussmaul (1877) and refers to a Broca's-type aphasic condition characterised by sentence foreshortening and word morphology problems. The foreshortening is not haphazard, however, for it involves omitting many/all of a sentence's function words (articles, conjunctions, pronouns, prepositions, and auxiliary verbs) and inflectional word endings (-s, -ed, -ing). The end result is what is known as telegraphic speech, a word sequence built up mainly of nouns, but broken up by the occasional verb and qualifier (Goodglass and Menn, 1985). The conjunction "and" is often spared, although this may evidence a repair strategy more than a true cognitive ability. Here are some examples .....
"First morning, drink coffee, and sweep and go field, afternoon such a pill, one and go field ....." (Heilbronner, 1906, cited in McCarthy and Warrington, 1990.)
"Cinderella ... poor ... um 'dopted her ... scrubbed floor, um, tidy ... poor, um ... 'dopted ... Si-sisters and mother ... ball. Ball, prince, um, shoe ...[prompt to continue] Scrubbed and uh washed and uh ... tidy, uh, sisters and mother, prince, no, prince, yes. Cinderella hooked prince. (laughs). Um, um shoe, um, twelve o'clock, ball /pInaSt/, finished." (Schwartz, Linebarger, and Saffran, 1985, p84; Patient "ME".)
Further examples can be found in McCarthy and Warrington (1985).
As far as the underlying anatomy is concerned, McCarthy and Warrington (1990, p185) conclude that "the [Broca's symptom complex] is often associated with relatively widespread lesions affecting both anterior language areas (frontal lobe), deeper structures (insula), as well as anterior temporal lobe damage". As far as the underlying processing is concerned, Kolk, Van Grunsven, and Keyser (1985) and Caplan (1985) have explicitly linked agrammatic conditions to Garrett's model (which, it will be recalled, was originally developed from speech error data from normal subjects). They conclude that internal language is inherently telegraphic at the best of times, at least at all stages prior to Garrett's functional stage.
A similar line of argument has been developed more recently by Grodzinsky (1990), who has approached agrammatism as a linguist. He describes surface speech as lacking both non-lexical terminals and governed prepositions. Indeed, in stark contrast to the anomias, the only thing agrammatic patients are left with is a naming ability! However, it is unlikely that a final answer will be possible until more is known about normal speech production, that is to say, until we have better speech production models to work with. (And, specifically, models which can link the hard facts of linguistic theory to the more advanced theories of semantic memory structure.)
Exercise 6 - Agrammatism Simulated
1 Rewrite the preceding paragraph to exclude all articles, conjunctions, pronouns, prepositions, and auxiliary verbs, and all noun- and verb-root endings. Read the residual text out loud.
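Students who prefer to let a computer do the crossing out may find the following sketch useful. It is a crude approximation to the exercise, not a model of agrammatic speech. The function-word list is a small hand-picked sample, the endings are stripped mechanically (so some words will inevitably be mangled), and the names FUNCTION_WORDS, strip_ending, and telegraphic are our own.

```python
# A deliberately small, hand-picked function-word list (an assumption;
# a fuller list would cover all articles, conjunctions, pronouns,
# prepositions, and auxiliary verbs).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "but", "or", "he", "she", "it", "they",
    "in", "on", "at", "to", "of", "with", "is", "are", "was", "were",
    "has", "have", "had", "will", "be", "been",
}
ENDINGS = ("ing", "ed", "s")   # checked in this order


def strip_ending(word):
    """Remove one inflectional ending, keeping a stem of at least three letters."""
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[: -len(ending)]
    return word


def telegraphic(sentence):
    """Drop function words, then strip endings from what is left."""
    words = [w.strip(".,;:!?\"'()").lower() for w in sentence.split()]
    kept = [w for w in words if w and w not in FUNCTION_WORDS]
    return " ".join(strip_ending(w) for w in kept)


print(telegraphic("The boys were kicking the balls and they shouted."))
# prints: boy kick ball shout
```

Reading the output aloud gives a fair impression of the telegraphic quality described above, although real agrammatic speech is of course far less tidy.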
6.6 Jargon Aphasia
The term "jargon aphasia" derives from Alajouanine, Sabouraud, and Ribaucourt (1952), and is "a rare and spectacular manifestation of an aphasic condition" (Butterworth, 1985, p61). By contrast with agrammatism, the phonology and prosody of the host language are retained, as are many of the rules of morphology (the nonsense is often appropriately matched for number, case, and gender). In addition, the patient is often blissfully unaware of the impairment. Three different syndromes have been identified (Butterworth, 1985) .....
(a) Semantic Jargon: This is where "the words employed, although real, are semantically inappropriate, sometimes to the extent of seeming stripped of their normal meaning" (Butterworth, 1985, p63). Here is a specimen: "Experimenter: What does 'strike while the iron is hot' mean? Patient: Better to be good and to Post Office and to Pillar Box and to distribution to mail and survey and headmaster. Southern Railways very good and London and Scotland" (Kinsbourne and Warrington, 1963; Patient "EF", cited in Buckingham, 1985).
(b) Neologistic Jargon: This is where speech includes made-up words - words not found in the dictionary. Butterworth (1979) reports that neologisms were used as nouns (61%), verbs (20%), or adjectives (15%) - the categories known as content words, where each word must be chosen from a large number of options. Neologisms were rare in function word context (4% in total). Here is a specimen: "A man is asked the question, 'Who is running the store now?' He replies, 'I don't know. Yes the bick, uh, yes I would say that the mick daysys nosis or chpickters. Course, I have also missed on the carfter teck. Do you know what that is? I've, uh, token to ingish. They have been toast sosilly. They'd have been put to myafa and made palis and, uh, myadakal senda you. That is me alordisdus. That makes anacronous senda'" (Buckingham and Kertesz, 1976, cited in Marshall, 1980, p62).
(c) Phonemic Jargon: This is where speech degenerates into a succession of meaningless sounds, so that it becomes impossible to identify word boundaries. Some phonotactic rules remain obeyed, as with the clusters "tr", "nkr", "str", and "mbr" in the following specimen: "When asked to read the sentence It shall be in the power of the College to examine or not examine any licentiate, previously to his admission to a fellowship, as they shall think fit, he produced the following: A the be what in the temother of the trothotodoo to majorum or that emidrate ein einkrastrai mestreit to ketra totombreidei to ra fromtreido as that kekritest." (Perecman and Brown, 1981, p178; italics original.)
Exercise 7 - The Three Jargon Types Simulated
1 Rewrite the next paragraph, replacing every second noun by a semantically random word or short phrase (picked from a dictionary "with a pin"). Read the resulting "semantic jargon" out loud.
2 Repeat (1), but this time replacing every fourth word with a made-up (nonsense) word. Read the resulting "neologistic jargon" out loud.
3 Just read the following "phonemic jargon" out loud: "temother of the trothotodoo to majorum or that emidrate".
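Step 2 of the exercise is mechanical enough to be sketched in code (step 1 is not, because it requires knowing which words are nouns). The sketch below simply replaces every fourth word with a two-syllable nonsense word. The syllable list and the names SYLLABLES and neologise are invented for the illustration, and nothing here is claimed about how real neologisms are formed.

```python
from itertools import cycle

# Made-up syllables for building nonsense words (purely illustrative).
SYLLABLES = ["tro", "mef", "ka", "dri", "bo", "nast", "il", "vum"]


def neologise(text, every=4):
    """Replace every fourth word (by default) with a two-syllable nonsense word."""
    supply = cycle(SYLLABLES)
    out = []
    for position, word in enumerate(text.split(), start=1):
        if position % every == 0:
            out.append(next(supply) + next(supply))
        else:
            out.append(word)
    return " ".join(out)


print(neologise("the patient is often blissfully unaware of the impairment"))
# prints: the patient is tromef blissfully unaware of kadri impairment
```

Because the replacement is blind to word class, function words are hit as often as content words, which is precisely what Butterworth's (1979) figures suggest does not happen in real neologistic jargon.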
As far as the underlying anatomy is concerned, Kertesz (1981) reviewed ten cases of neologistic jargon in detail and found a significant pattern to the underlying lesions. He concluded that "the most consistently affected regions are the supramarginal gyrus, the posterior parietal operculum, the inferior parietal lobule, the first portion of the first temporal gyrus, the posterior temporal operculum (planum temporale), and the angular gyrus" (Kertesz, 1981, p100).
6.7 Dyspraxia of Speech
Given our earlier definition of praxis, it follows that the essence of a dyspraxia has to be an impaired ability to initiate voluntary movements - an inability to move the tongue to lick the lips when commanded, for example.
ASIDE - PRAXIS VS REFLEX: It is important to realise that the defect is solely one of initiating the movement, and that the muscles and motor systems themselves are intact. If the initiation is reflex or unwilled in any way - licking honey from the lips, perhaps - the information comes across the standard A-shaped processing hierarchy rather than down it, and the movement can be performed perfectly well!
Dyspraxic defects were first formally described by the German aphasiologist Liepmann (1900, 1905), and his explanation stuck closely to the speech production stages paradigm we saw so much of in Sections 3 and 4 above. Patients who cannot mentally conceive of having the required movement are deemed to have an ideational apraxia ('ideatorische apraxie'), patients who can have the idea, but not communicate that idea to the appropriate motor systems, are deemed to have an ideokinetic apraxia ('ideo-kinetische apraxie'), and patients whose motor systems cannot cope properly with the instructions sent to them are deemed to have a kinetic apraxia ('kinetische apraxie'). Subsequently, the German psychiatrist Kleist (1912) described the deficit of constructional apraxia, in which the ability to organise actions in space is affected. He regarded this as yet another disconnection syndrome, this time of the ability to transmit information between the processes of spatial analysis and those of voluntary action. A vivid description of dyspraxic speech is given by Darley, Aronson, and Brown .....
"As they speak, they struggle to position their articulators correctly. They visibly and audibly grope as they struggle to produce correct articulatory postures and to accomplish a sequence of these postures in forming words. Their articulation is frequently off target. They often recognise that they are off target and effortfully try to correct the error. Their errors recur, nonetheless, but they are not always the same; the errors on a series of trials are highly variable. As patients struggle to avoid articulatory error by careful programming of muscle movements, they slow down, space their words and syllables evenly, and stress them equally. Thus the prosody of their speech is altered as well as their articulation." (Darley, Aronson, and Brown, 1975, p250)
Phoneme substitutions, additions, repetitions, and prolongations are common. Thus .....
"I am looking an a drawring or a-a pec-picture of what is apparently a tor-nuh-ner-nor-tornatiuhd blew-brewing in the c-countryside. This is having an nuh-nuhmediate and frightening ef-f-ff-fuh-feck on a fairm famerly num-ber-ing - - - six uh humans and af-ff-sss-uh-sh-suh-sorted farm uh animals. There are quick-uh-ly going into a-a sss-sor-sormb uh cellar with fright in their ar-uh-eyes and in their every - movement." (Darley, Aronson, and Brown, 1975, p250. Underlines indicate errors and hyphens indicate hesitation.)
In the language of control theory, the suspicion is that a major feedforward mechanism is failing. Indeed, this underlies the distinction between planning and executive apraxias adopted by such authorities as Michael Crary of the University of Florida Health Science Centre, Gainesville (Crary, 1993, with due acknowledgement to an earlier paper by Roy, 1978). The Roy-Crary scheme identifies four subtypes of the disorder .....
(a) Primary Planning Dyspraxia: This is a high level planning defect, and is usually associated with frontal lobe pathology.
(b) Secondary Planning Dyspraxia: This is a lower level planning defect - a defect of spatial organisation, perhaps, rather than an inability to think ahead in its broadest sense. It is usually associated with parietal-occipital pathology.
(c) Unit Dyspraxia: This is a defect which affects one particular muscle system more than the remainder.
(d) Executive Dyspraxia: This is an inability to execute movements which have successfully passed the planning stage. It is usually associated with premotor area pathology. According to Roy: "..... although the patient is able to plan the motor activity as the frontal areas are intact and the pathways from the frontal to the premotor area are undisturbed, he is unable to execute the movement sequence properly due to damage to the area responsible for outputting the planned motor sequence" (Roy, 1978, p197).
Also from the University of Florida, Rothi and Heilman (1996) bring the story full circle by explicitly cross-mapping Liepmann's early descriptions onto a modern cognitive neuropsychological model (not dissimilar to our old friend, the PALPA model). The meaning of actions - that is to say, their utility to the organism in drawing up its plans - is seen as being mediated by a central "action semantics" store, whilst the supporting physical attributes are mediated by a secondary array of "action lexicons". It is a significant vindication of the transcoding school to find that their models - derived as they were from dyslexias and the like - can be applied just as effectively to human movement in its broadest form!
6.8 Dysarthria of Speech
Before proceeding, students should refresh their memories concerning the essentials of the motor system [revise it now]. Brainstem and cranial nerve anatomy [revise it now] is particularly important, as is the differential layout of the pyramidal and extrapyramidal systems!
Dysarthria of speech is the name given to speech difficulties arising from problems controlling the musculature involved. Some idea of the precision required can be gained from the fact that the tongue and lip movements in normal speech need to be coordinated to within 1/100th of a second and made to an accuracy of 1 mm (Netsell, 1986). So what happens when this accuracy of placing and timing starts to fall away? Well to start with .....
"In dysarthria one finds evidence of slowness, weakness, incoordination, or change of tone of the speech musculature. [All] basic motor processes - respiration, phonation, resonance, articulation, and prosody - are variably involved. [The] most characteristic error made by dysarthric patients is imprecise production of consonants, usually in the form of distortions and omissions." (Darley, Aronson, and Brown, 1975, p251.)
The situation is seriously complicated, however, because the motor mechanisms mentioned above (respiration, phonation, resonance, articulation, and prosody) are controlled by a minutely intricate arrangement of both spinal and cranial nerves. There are, as a result, several clinically distinct subtypes of dysarthria. Darley, Aronson, and Brown identify no fewer than six, as now described .....
(a) Flaccid Dysarthria: This type of dysarthria arises from damage at the level of the lower motor neuron. Two clusters of lower motor neurons are significant. Firstly, there are spinal nerve outflows from the cervical, thoracic, and lumbar regions. These innervate the diaphragm (via the phrenic nerve, C3-C5) and the intercostal and abdominal muscles, and thereby control the respiratory aspect of phonation. Secondly, there are cranial nerve outflows from the pons and the medulla. Of these, CN V (trigeminal), CN VII (facial), CN X (vagus), and CN XII (hypoglossal) between them serve the jaw, face, pharynx, larynx, and tongue. Clinically, the salient features are muscle weakness and hypotonia, and reduced or absent stretch reflexes. The condition is found in the various subtypes of bulbar palsy (depending on which of the cranial nerves is damaged). Speech presents as "slow, hypernasal, and breathy, with reduced loudness and reduced pitch variability" (Netsell, 1986, p41).
(b) Spastic Dysarthria: This type of dysarthria arises from damage at the level of the upper motor neuron. As a result, lateralised lesions to corticospinal tracts will present as contralateral defects, whilst those to corticobulbar tracts (not fully decussating) will present as milder bilateral weaknesses of tongue, lips, or face. The condition is found in what is known as pseudobulbar palsy (bilateral corticobulbar damage at the level of the upper motor neuron) and spastic hemiplegia. Speech presents as slow, indistinct, and effortful, as though produced against resistance. The descriptions "rasping" and "dragging" may also be encountered.
(c) Ataxic Dysarthria: This type of dysarthria arises from damage to the cerebellar system, most frequently bilaterally. The condition is found in, for example, Friedreich's ataxia. The salient features are inaccuracy and clumsiness of articulation, alternating with some staccato speech periods and unexpected changes of pitch.
(d) Hypokinetic Dysarthria: This type of dysarthria arises from damage to the extrapyramidal system, thus implicating structures such as the basal ganglia. Clinically, the salient features are slowness of movement, limited range, rigidity, and tremor at rest. The condition is found in Parkinson's disease and some Huntington's disease. Speech is soft due to poor coordination of respiration and phonation, and wavering due to inefficient vocal fold "hunting". When laryngeal closure eventually fails totally, then all voicing is lost and only whispering is possible.
(e) Hyperkinetic Dysarthria: This type of dysarthria also arises from damage to the extrapyramidal system, but with a different set of salient features, namely jerks, tics, chorea, ballism, and athetosis. The condition is found in some Huntington's disease, Tourette's syndrome, and Sydenham's chorea (more commonly known as St Vitus' dance). Speech presents as hesitant and randomly discontinuous due to disruption of orderly phonation.
(f) Mixed Dysarthrias: This type of dysarthria arises from multifocal or diffuse lesions, that is to say, those which damage more than one part of the motor system. The condition is found in multiple sclerosis, and the symptoms are highly varied as a result of the multifocal nature of the underlying pathology.
7 - Remarks
Finally, it is worth looking again at the speech models identified in Section 4 in the light of the speech production defects listed above. The anomias seem to be defects at or around linguistic control level, either with having an idea in the first place, or else with extracting an item from the semantic lexicon with which to express that idea. The agrammatisms seem to be defects at or around the lexical assembly area: the small clause as it is produced by the linguistic controller is assumed to be inherently telegraphic at the best of times, but in this particular type of defect is then denied further expansion by defective morphological and syntactic assembly processes. The jargons are defects across a somewhat broader spectrum: in all three subtypes, it is as if (a) ideation itself is impaired, or (b) the linguistic controller and sentence assembly processes are so impaired as to make it seem as though ideation is impaired. The dyspraxias, too, can show themselves at a number of levels along the planning-execution dimension, from ideation at the top to phonetic planning at the bottom. And the dysarthrias are defects in execution, in either its feedforward or feedback aspects. Note how difficult it is to locate the defect accurately in most cases.