Course Handout - Speech Errors, Speech Production Models, and Speech Pathology
Copyright Notice: This material was
written and published in Wales by Derek J. Smith (Chartered Engineer). It forms
part of a multifile e-learning resource, and subject
only to acknowledging Derek J. Smith's rights under international copyright law
to be identified as author may be freely downloaded and printed off in single
complete copies solely for the purposes of private study and/or review.
Commercial exploitation rights are reserved. The remote hyperlinks have been
selected for the academic appropriacy of their
contents; they were free of offensive and litigious content when selected, and
will be periodically checked to have remained so. Copyright
© 2010, High Tower Consultants Limited.
First published online 08:00 GMT 29th October 2003, Copyright Derek J.
Smith (Chartered Engineer). This version [HT.1 - transfer of copyright] dated 18:00 14th January 2010
Some of this material appeared in Smith (1997). It has here been considerably expanded and supported with hyperlinks.
Speech and Language Therapy students will probably benefit from refreshing
their memories on the difference between segmental and suprasegmental
phonology [glossary]
before proceeding.
When the language production system is working correctly, it is easy to underestimate its complexity. Every now and then, however, the system slips up and produces an error, and errors in any system can have a tremendous explanatory value. They can tell us, for example, whether apparently separate functions fail separately or together, and thus whether they probably derive from one or more modular processes. With further analysis, they can also tell us which modules communicate with which other modules, what form of encoding is being passed back and forth, and how well protected the communication links are against damage or interference. In this section, we look at the commonest types of everyday speech error. But firstly, should your linguistics be a tad rusty, some vocabulary .....
Click here
for the "Psycholinguistics" Glossary. Alternatively, wait
until a word catches you out, and providing it has been hyperlinked thus - [glossary]
- click on that instead.
1.1 Slips of the Tongue
In an early study of the sort of errors we all make in our everyday speech, Boomer and Laver (1968) judged that the phrase [glossary] was one of the main units of speech production. They based this judgement on the empirical observation that errors rarely transcended phrase boundaries. Boomer and Laver's study prompted a wave of interest in this topic area, and culminated in some powerful new theories. Error corpus data was used, for example, by Gary S. Dell of the Beckman Institute, University of Illinois, to develop his "Spreading Activation Theory" of lexical access [to be discussed in detail in Section 3.1]. Dell (1986) identifies three levels of slip of the tongue error, as follows .....
(a) Sound Errors: These are
accidental interchanges of sounds between words [glossary]. Thus "snow flurries" might become
"flow snurries". (Boomer and Laver had
already claimed that segmental errors such as these account for about
60% of all errors.)
(b) Morpheme Errors: These are
accidental interchanges of morphemes [glossary] between words. Thus "self-destruct
instruction" might become "self-instruct destruction".
(c) Word Errors: These are
accidental transpositions of words. Thus "Writing a letter to my
mother" might become "Writing a mother to my letter".
Additionally, each of these three levels of error may take various forms .....
(a) Anticipations: Where an early
output item is corrupted by an element belonging to a later one. Thus "reading
list" - "leading list".
(b) Perseverations: Where a later
output item is corrupted by an element belonging to an earlier one. Thus
"waking rabbits" - "waking wabbits".
(c) Deletions: Where an output
element is somehow totally lost. Thus "same state" - "same
sate".
To support this analysis, Dell (1986, p291) gives the following examples of the different ways in which the sentence "Some swimmers sink" might fail .....
Error | Type
Sim swimmers sink | phoneme anticipation
swum simmers sink | phoneme shift
Some simmers sink | phoneme deletion
Sim swummers sink | phoneme exchange
Some swummers sink | phoneme perseveration
Some swinkers sink | cluster anticipation
Some sinkers swim | stem exchange
Some swimmers swim | stem perseveration or word substitution
Some swimmers drown | word substitution
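The error forms described above (exchange, anticipation, perseveration, and deletion) can be made concrete with a small sketch. This is only an illustration, not Dell's model: it assumes that ordinary spelling can stand in for phonemes, that a word's "onset" is everything before its first vowel letter, and all of the function names are invented for the example.

```python
VOWELS = set("aeiou")


def split_onset(word):
    """Split a word into (onset, rest) at its first vowel letter (spelling-based assumption)."""
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            return word[:i], word[i:]
    return word, ""


def exchange(first, second):
    """Sound exchange: the two words swap their onsets."""
    o1, r1 = split_onset(first)
    o2, r2 = split_onset(second)
    return o2 + r1, o1 + r2


def anticipation(first, second):
    """The earlier word is corrupted by the onset of the later word."""
    _, r1 = split_onset(first)
    o2, _ = split_onset(second)
    return o2 + r1, second


def perseveration(first, second):
    """The later word is corrupted by the onset of the earlier word."""
    o1, _ = split_onset(first)
    _, r2 = split_onset(second)
    return first, o1 + r2


def deletion(word, position):
    """One element (here, one letter at the given index) is lost from the word."""
    return word[:position] + word[position + 1:]


print(exchange("snow", "flurries"))
print(anticipation("reading", "list"))
print(perseveration("waking", "rabbits"))
print(deletion("state", 1))
```

Run on the examples given earlier in this section, the sketch reproduces "flow snurries", "leading list", "waking wabbits", and "sate". Because it works on letters rather than sounds, it will go wrong wherever English spelling and pronunciation part company.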
Dell then points out that there is a clear same-category pattern to most error occurrences. Thus initial consonants will interact predominantly with other initial consonants, prefixes with other prefixes, and nouns [glossary] with nouns. This is consistent with verbal storage and retrieval processes also being organised on some sort of same-category basis. Dell also points to the phenomenon of "accommodation", the fact that an error at an early output stage can nevertheless proceed quite happily through all the remaining output stages, without detection, and with subsequent syntactic and morphological changes being correctly but incongruously applied.
Exercise 1 - Speech Errors
1 Block copy the immediately preceding paragraph into a temporary word processing file, and edit it to contain examples of the following types of error .....
Anticipation, perseveration, transposition, and deletion of a sound;
Anticipation, perseveration, transposition, and deletion of a morpheme;
Word transpositions within a clause [glossary];
Word transpositions between adjacent clauses;
Word transpositions between non-adjacent clauses;
Word transpositions between adjacent sentences [glossary];
Word transpositions between non-adjacent sentences.
2 Read the resulting text out loud. Which types of error are unlikely, in your judgement, to happen in practice? Why? Do you agree with Boomer and Laver's observation that errors typically take place within clauses, not between them?
MIT's Stephanie Shattuck-Hufnagel has focussed on the role of word-onset consonants in speech production planning (Shattuck-Hufnagel, 1987). She found that what she called "sublexical" errors - speech errors in the delivery of an otherwise properly selected real word - tended mainly to affect word-onset consonants. She explained this using a "slot-and-filler" model of word construction, which proposes separate representations of (a) the segments (or "fillers"), and (b) a framework (or "frame") of "slots" [glossary] to lock those segments into position. The full model runs as follows .....
"Step 1. Selection of a set of candidate open class or content words from the lexicon [glossary - note that there are two substantially different usages of this word, and that Shattuck-Hufnagel here appears to be using the psycholinguistic one]. Selection can be accomplished by transferring lexical items to a short-term processing store, or by marking their lexical representations temporarily [see NB below]. These candidate lexical items provide the set of phonemic segments among which final selection for the utterance will be made, and among which interaction errors can occur. The form of each lexical item specifies its segments and their serial order.
Step 2. Construction of syllabic structure and other apparatus for associating main lexical stress to the open class [glossary] lexical items. These processes incorporate the rest of the word minus the onset; word onset consonants are ignored until later.
Step 3. Transfer or association of the non-onset portions of content morphemes, now organised into the metrical structures that govern lexical stress, to the emerging phrasal framework. While the hierarchical structure of the phrasal frames that receive the non-onset portions of the content words is not fully specified in this model, we propose that among other things they define two classes of components for open class items: word onset locations (which at this point in the processing remain empty), and locations for the rest of each word (which have now been filled). These two structural components are present for every content word in the phrasal frame, even for vowel-initial words whose word-onset consonant component will not be filled.
Step 4. Eventual transfer or association of word-onset consonants into the word-onset locations for content words in the phrasal frame. All segments of the content words of the phrase are now in place.
Step 5. Transformation of this representation with its accompanying hierarchical organisation into a complete string of discrete fully specified segmental elements, including those of grammatical morphemes, and subsequently into a pattern of motor commands characterised by substantial temporal overlap in the effects of adjacent segments. This process presumably involves many steps, among them one that is subject to single-segment errors at any position in the word. The influence of suprasegmental structure on interaction errors at this point in the processing is not clear, but non-interaction errors, which are distributed more evenly across word positions, may occur here." (Shattuck-Hufnagel, 1987, p47; bold emphasis added)
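The separation of frame and fillers in Steps 1 to 4 can be sketched in a few lines of code. This is a toy illustration under stated assumptions, not Shattuck-Hufnagel's own formalism: spelling stands in for phonemic segments, the onset is taken to be everything before the first vowel letter, and the names build_frame, fill_onsets, and misorder are hypothetical.

```python
VOWELS = set("aeiou")


def split_onset(word):
    """Split a word into (onset, rest) at its first vowel letter (spelling-based assumption)."""
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            return word[:i], word[i:]
    return word, ""


def build_frame(content_words):
    """Steps 1-3: fill the 'rest of word' location for each word, leaving every onset slot empty."""
    frame, onset_fillers = [], []
    for word in content_words:
        onset, rest = split_onset(word)
        frame.append({"onset": None, "rest": rest})
        onset_fillers.append(onset)  # onsets wait in a separate store until Step 4
    return frame, onset_fillers


def fill_onsets(frame, onset_fillers, misorder=None):
    """Step 4: copy the waiting onsets into the onset slots.

    misorder=(i, j) simulates an interaction error in which two onset
    fillers are delivered to each other's slots.
    """
    fillers = list(onset_fillers)
    if misorder is not None:
        i, j = misorder
        fillers[i], fillers[j] = fillers[j], fillers[i]
    for slot, filler in zip(frame, fillers):
        slot["onset"] = filler
    return [slot["onset"] + slot["rest"] for slot in frame]


frame, fillers = build_frame(["snow", "flurries"])
print(fill_onsets(frame, fillers))                   # error-free delivery
print(fill_onsets(frame, fillers, misorder=(0, 1)))  # onset exchange
```

Because the onsets are held apart from the rest of each word until late in the process, a single misdelivery at Step 4 is enough to produce an exchange such as "flow snurries" while leaving the remainder of both words intact, which is the pattern the model was designed to explain.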
1.2 Spoonerisms
An interesting subtype of sound errors is the "Spoonerism". Spoonerisms are an often amusing cluster of word-initial transposition errors, and are named after the Oxford academic, the Reverend W.A. Spooner (1844-1930), in whom the affliction - it is safe to believe - occurred naturally (at least to begin with). Here are some examples of Spooner's own isms .....
Two qualities distinguish Spoonerisms from ordinary sound errors. The first is that the sound transposition generates two proper words [unlike Dell's "flow snurries" case, for example, where the word "snurries" is a non-word], and the second is that the two new words themselves make some different sense together. However, close study of the natural history of the phenomenon raises other interesting observations. Here is an early sceptic, who clearly believed that Spooner was milking his defect for all it was worth .....
"Curiously
enough the Spoonerism is named after a man who rarely made Spoonerisms as
dictionaries define them. A recent study [Robbins (1966)] indicates that
Spooner's Spoonerisms were rather carefully planned - high level humour rather
than unintentional error ....." (Mackay, 1970, p323)
Mackay analysed previously published lists of Spoonerisms, carefully rejecting any which could be judged as intentionally humorous or otherwise spurious. Detailed analysis of the 179 examples which remained drew him to a number of conclusions concerning the likely units of speech production. Here are Mackay's specific observations .....
"1. Repeated phonemes usually occurred before and after the reversed phonemes [glossary].
2. Reversals before repeated phonemes were as common as reversals after repeated phonemes, contradicting chain association theories.
3. The syllabic position of reversed phonemes was almost invariably identical, indicating that syllables [glossary] must be a unit in speech production.
4. Consonants in the initial position of syllables were more frequently reversed than would be expected by chance [.....]
5. Significantly more reversals involved the initial phoneme of words than would be expected by chance, indicating a lexical factor in Spoonerisms.
6. Distinctive features of reversed phonemes were usually similar except for place of articulation [.....]. This suggested the possibility of two distinct types of mechanism in speech production: one for Form of Articulation, including voicing, nasality, and openness, and another for Place of Articulation.
7. Consonants were more frequently transposed than vowels.
8. Reversed phonemes occurred closer together in words and sentences than could be expected by chance.
9. [.....] Spoonerisms in German and English were shown to be quantitatively similar [as were] Spoonerisms in Latin, Croatian, Greek, and French, suggesting that phoneme reversals may result from universal underlying mechanisms common to all speakers.
10. No support was found for chain association explanations of Spoonerisms ....." (Mackay, 1970, p347)
Mackay then suggested the existence of some sort of "buffer system", that is to say, a temporary memory store situated part-way down the motor hierarchy [reminder] and working on a "store-and-forward" basis.
ASIDE: This usage of the term
"buffer" evolved in computer science, where it allows a sending
module and a receiving module to work at slightly different speeds if
necessary. Output from the sending module is passed to the buffer store at a
speed the sending module finds convenient, and read from the buffer store by
the receiving module at a speed the receiving module finds convenient. Since
minor processing delays can now be absorbed within the individual modules, the
operation of the system as a whole can, for a small investment in the
intervening resource, be significantly improved.
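The store-and-forward idea in this aside can be shown with a tick-by-tick simulation. It is a minimal sketch of the computing concept only, not a claim about how Mackay's mental buffer works; the function name run, the schedule format, and the timings are all invented for the illustration.

```python
from collections import deque


def run(sender_schedule, receiver_period, ticks):
    """Simulate a first-in-first-out buffer between two modules.

    sender_schedule maps a tick number to the items the sender emits on that tick
    (the sender works in bursts); the receiver removes one item every
    receiver_period ticks (the receiver works at a steady pace).
    Returns the items in the order received and the largest buffer occupancy seen.
    """
    buffer = deque()
    received = []
    peak = 0
    for tick in range(ticks):
        buffer.extend(sender_schedule.get(tick, []))  # sender deposits at its own speed
        peak = max(peak, len(buffer))
        if tick % receiver_period == 0 and buffer:    # receiver collects at its own speed
            received.append(buffer.popleft())
    return received, peak


schedule = {0: list("some"), 5: list("sink")}  # two bursts of four items each
received, peak = run(schedule, receiver_period=2, ticks=20)
print("".join(received), peak)
```

The sender never has to wait for the receiver, the receiver never sees the bursts, and serial order is preserved throughout. The price is the storage itself, which in this run never needs to hold more than five items.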
Here is how Mackay saw the mental buffer system
operating .....
"In
the present study the reversed phonemes always originated in the same phrase,
which further suggests that the buffer system displays no more than one phrase
at a time. The syllable must be another unit since reversed phonemes tend to
maintain the same syllabic position. However [.....] the fact that in
Spoonerisms a unit smaller than the syllable crosses syllable boundaries,
suggests the existence of smaller units. The question now arises as to whether
phonemes are a unit in this hierarchy. [.....] Another set of questions relates
to the buffer system. How much is specified in the buffer system? In the
present model, for example, duration of phonemes is left unspecified, but
phonemes, syllables, and stress are marked. In what form are the units in the
buffer specified? Are articulatory goals or targets
represented in the buffer rather than phonemes? Is stress independent of the
elements that are stressed? How are syllables coded - in abstract form
independent of the phonetic elements comprising them?" (Mackay, 1970, pp341-346)
This sort of progressive encoding and recoding, of course, is classic material for a box-and-arrow explanatory diagram, and, sure enough, Mackay obliges .....
Figure 1 - Mackay's (1970) Speech Production Model: Here is a relatively straightforward four-box model of the lower levels of the speech motor hierarchy. Lexical selection has already taken place, and the selected items "displayed" in the buffer system, "abstractly represented in correct serial order". Here is what happens next .....
"When this buffer system contains a word the corresponding phonemic units at the Individual Phoneme Level [topmost green box] become partially activated, along with a set of programs for modifying these phonemes at the Contextual Integration Level [middle green box]. These levels in turn feed into the Motor Unit Level [lower green box], where reciprocal inhibition is assumed to occur. These motor units code the contextual variants of phonemes, [.....] The units at the Individual Phoneme Level are unordered, and are activated in correct serial order through scanning of the buffer system." (Mackay, 1970, p348)
Redrawn from a black-and-white original in Mackay (1970; Figure 7). This graphic Copyright © 2003, Derek J. Smith.
1.3 The "Tip of the Tongue" Phenomenon
This is the name given to the relatively common everyday experience where we more or less know the word we want to say next, but are unable to bring it all the way to consciousness. The phenomenon has been known about for some time, but recent interest is normally dated to Brown and McNeill (1966), who carried out psycholinguistic research on 56 American undergraduates. They selected 49 low-frequency words (such as apse, nepotism, cloaca, ambergris, and sampan) and prepared brief dictionary definitions of each. Subjects were given a response sheet (similar to that used in Exercise 2 below), and were then presented with each definition (just like opening a dictionary at random, reading an entry, and then trying to guess the word to which it refers). Where subjects either knew or did not know the target word, no response was required, but on approximately 8.5% of trials, they experienced a tip-of-the-tongue (TOT) state - their lexicon had nearly delivered them up the target word, but not quite. On these occasions, they were required to guess at the missing word's first or last letters, the number of syllables it contained, and which syllable they thought carried the primary stress. However, before we discuss their results, here is an opportunity for you to experience the phenomenon for yourself .....
Exercise 2 - The Tip of the Tongue Phenomenon
Brown and McNeill's subjects experienced a total of 360 TOT states, of which 233 were "positive TOTs", that is to say, TOTs "for which the data obtained could be scored as accurate or inaccurate" (p280), and the remainder were "negative TOTs", that is to say, TOTs "for which the subject judged the word read out not to have been his target and, in addition, one in which the subject proved unable to recall his own functional target" (p281). The trials were also scored for whether TOTs were similar in sound (Saipan, perhaps, for sampan) or meaning (houseboat, perhaps, for sampan) to the target. There were 224 similar-sound (SS) TOTs, and 95 similar-meaning (SM) TOTs. Of the SS items, 48% had the same number of syllables as the target, compared to only 20% of the SM words. These data were then modelled as though the human word stores were organised like a dictionary, albeit a very complicated one .....
"In
real dictionaries, those that are books, entries are ordered alphabetically and
bound in place. Such an arrangement is too simple and too inflexible to serve
as a model for a mental dictionary. We will suppose that words are entered on keysort cards instead of pages and that the cards are
punched for various features of the words entered. With real cards, paper ones,
it is possible to retrieve from the total deck any subset punched for a common
feature by putting a metal rod through the proper hole. We will suppose that
there is in the mind some speedier equivalent of this retrieval
technique."
ASIDE: This is precisely the
sort of game played by the Cambridge psycholinguist Richard H. Richens, who used punched cards in the late 1940s for automatic translation [for the fuller story of
which, see our e-paper on "Short-Term Memory Subtypes in
Computing and Artificial Intelligence", Part 4 (Section 4.1), if interested].
"The
model will be described in terms of a single example. When the target word was sextant,
subjects heard the definition: 'A navigational instrument used in measuring
angular distances, especially the altitude of sun, moon, and stars at sea'.
This definition precipitated a TOT state in 9 subjects of the total 56. [.....]
The problem begins with a definition rather than a word and so the subject must
enter his dictionary backwards, or in a way that would be [.....] quite
impossible for the dictionary that is a book. It is not impossible with keysort cards, providing we suppose that the cards are punched
for some set of semantic features. [..... However] in the TOT case the
[retrieval] must include a card with the definition of sextant entered
on it but with the word itself incompletely entered." (Brown and McNeill,
1966, pp292-293; italics original)
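The keysort analogy amounts to retrieval by shared features rather than by alphabetical position, and it can be sketched as a simple filter over a deck of cards. Everything specific here is an assumption made for illustration: the Card fields, the pull function, the six-word deck, and the syllable counts and semantic tags assigned to each word are not taken from Brown and McNeill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    word: str
    syllables: int
    semantic: frozenset  # the semantic features "punched" on this card


# A hypothetical six-card deck, for illustration only.
DECK = [
    Card("sextant", 2, frozenset({"navigation", "instrument"})),
    Card("secant", 2, frozenset({"geometry"})),
    Card("sexton", 2, frozenset({"church"})),
    Card("compass", 2, frozenset({"navigation", "instrument"})),
    Card("astrolabe", 3, frozenset({"navigation", "instrument"})),
    Card("sampan", 2, frozenset({"boat"})),
]


def pull(deck, first_letter=None, last_letter=None, syllables=None, semantic=None):
    """Return the words on every card matching all of the requested features.

    Each keyword argument plays the part of one metal rod pushed through
    one hole; arguments left as None are simply not tested.
    """
    selected = []
    for card in deck:
        if first_letter is not None and not card.word.startswith(first_letter):
            continue
        if last_letter is not None and not card.word.endswith(last_letter):
            continue
        if syllables is not None and card.syllables != syllables:
            continue
        if semantic is not None and semantic not in card.semantic:
            continue
        selected.append(card.word)
    return selected


print(pull(DECK, semantic="navigation"))                          # entering the dictionary "backwards"
print(pull(DECK, first_letter="s", last_letter="t", syllables=2)) # retrieval by partial form
```

Entering by meaning pulls out similar-meaning candidates, while entering by first letter, last letter, and syllable count pulls out similar-sounding ones. In both cases the target arrives in the company of near neighbours, which is the situation the TOT state seems to reflect.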
Brown and McNeill then discuss at length exactly how this incomplete lexical entry might be coded. The most obvious suggestion was that it was coded by its first and last letters, so that saucepan, spaceman, and stamen would all be clustered together in some way - hence the Saipan-sampan confusion. But this was dismissed as a touch too simplistic. Instead, they preferred "something more like Sex_tanT" (p295), where not just the first and last letters, but also elements of the first and last syllables, played a part. Brown and McNeill named this type of recall by common feature "generic recall", and saw it as reflecting the coding systems used in verbal memory. This makes the TOT phenomenon itself, as well as the techniques of experimenting with it, relevant across a wide spectrum of communicative cognition, including speech perception, sentence production, and reading, as now demonstrated .....
Exercise 3 - Accessing the Lexicon using First and Last Letters Only
More recently, Jones and Langford (1987) and Maylor (1990) have looked at how different types of distractor word can interfere with the TOT phenomenon. Maylor, for example, presented TOT items to 15 subjects in their fifties, 17 in their sixties, and 17 in their eighties. A distractor word was presented immediately after each target definition, separated only by a short bleep. These distractors had been carefully chosen to fall into one of four conditions. In Condition P the distractor was phonologically related to the target word (eg. baulk for braise), in Condition S it was semantically related (eg. incubus for banshee), in Condition U it was not related in either way (eg. fossilise for hospice), and in Condition PS it was simultaneously phonologically and semantically related (eg. abnormality for anachronism). They distinguished subjective TOT, where the subject reported the TOT state but could not retrieve any concrete facts about it, from objective TOT, where a letter or letters could be identified and a syllable count or stress location given. Their results indicated that both states occurred more frequently when the distractor word was phonologically related to the target word than when it was phonologically unrelated.
And more recently still, Harley and Bown (1998) varied the frequency and phonological distinctiveness of the target words and found "that TOTs are more likely to arise on low-frequency words that have few close phonological neighbours" (p151). They use their data to reflect upon the broader process of "lexicalisation"[glossary], which they define as "the process of phonological retrieval in speech production given a semantic input" (p152), and they opt for a "two-stage" explanatory model of lexical access, that is to say, a model which strictly separates each word's semantic and phonological representations. TOTs can therefore be seen as arising "when the first stage of lexical access is completed successfully, but not the second" (pp152-153). However, the critical point as far as Harley and Bown are concerned is as follows .....
"Our
central result is that phonological neighbours contribute to, rather than
hinder, phonological retrieval in speech production. [.....] A TOT occurs when
the semantic specification successfully accesses the abstract lemma [glossary].
This causes the 'feeling of knowing' the word. Nevertheless, the lemma is then
unable to pass sufficient activation onto and thereby access the corresponding
phonological word form. [.....] There are two possible reasons for failure at
this stage. Either the connections between the lemma and the phonological forms
might be weakened, or the phonological forms might themselves be weakly
represented for these items." (Harley and Bown,
1998, p162)
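The two-stage account in this quotation can be caricatured in a few lines of code. This is a deliberately crude sketch and not Harley and Bown's model: the additive activation rule, the neighbour_boost and threshold values, and the function name lexicalise are all assumptions chosen only to show how the pieces might fit together.

```python
def lexicalise(lemma_found, link_strength, neighbour_count,
               neighbour_boost=0.05, threshold=0.5):
    """Toy two-stage lexical access.

    Stage 1: the semantic specification either reaches the lemma or it does not.
    Stage 2: the lemma passes activation to the phonological word form; each
    close phonological neighbour is assumed to add a little support.
    All numerical values are hypothetical.
    """
    if not lemma_found:
        return "don't know"   # stage 1 failed: no feeling of knowing
    activation = link_strength + neighbour_boost * neighbour_count
    if activation >= threshold:
        return "retrieved"    # both stages succeeded
    return "TOT"              # stage 1 succeeded, stage 2 did not


print(lexicalise(True, link_strength=0.3, neighbour_count=0))  # weak link, no neighbours
print(lexicalise(True, link_strength=0.3, neighbour_count=6))  # weak link, many neighbours
print(lexicalise(True, link_strength=0.8, neighbour_count=0))  # strong link
```

On these made-up numbers, a weakly connected word with no close neighbours ends in a TOT state, whereas the same word with six neighbours is retrieved. This mirrors the direction of the reported finding, with neighbours helping rather than hindering, but says nothing about its size.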
Finally, although this section is primarily concerned with speech errors in normals, the similarity between the TOT phenomenon and the clinical sign known as "anomia" is too glaring not to get a comment. Goodglass, Kaplan, Weintraub, and Ackerman (1976) studied the confrontational naming ability of a population of aphasics, and began by pointing out this very similarity .....
"The
designation of a patient as 'anomic' indicates that his access to lexical terms
is poor in relation to the fluency of his articulation and grammar" (Goodglass et al, p145).
Goodglass et al then looked at patients' "tacit knowledge" of the first letter of, and number of syllables in, the words which they were failing to retrieve. In fact, they dated this sort of research to Weisenburg and McBride (1935), who formally recorded how many syllables anomics thought were in the lost names (a test known as the Proust-Lichtheim Test of Inner Speech, in which the patients show by raising an appropriate number of fingers how many syllables they believe are in the word they are having trouble with). In their own research, Goodglass et al tested 42 male aphasics, classified by the Boston Diagnostic Aphasia Test as 13 Broca's type, 8 Wernicke's type, 12 conduction type, and 9 anomic type [readers unfamiliar with these typings will find pen pictures of each in our Neuro-Glossary]. Each was shown 48 line-drawing stimulus cards of objects whose name was of intermediate word frequency in English, and containing one, two, three, or four-or-five syllables (for example, clamp, walrus, violin, and refrigerator). The authors conclude .....
"The
results indicated a clear cut superiority on the part of conduction aphasics,
as compared to Wernicke's and anomic subjects.
Conduction aphasics identified both first letter and syllabic length of one
third of the words which they could not name. Anomic aphasics succeeded in
fewer than one of ten instances and Wernicke's
aphasics were not much more successful. Broca's
aphasics were correct in one try out of five, and could not be differentiated
statistically from either the conduction aphasics on the one hand or the Wernicke's aphasics on the other." (Goodglass et al, 1976, p151)
As to why this should be, the authors looked at the sequential nature of the word production process .....
".....
it appears that word finding is usually an 'all-or-none' process for Wernicke and anomic patients, in the sense that they either
recover a name well enough to produce it or they can give little evidence of
partial knowledge. Words which are failed then seem to be totally unavailable,
as far as recall processes are concerned. However the near perfect multiple
choice selections by all subjects indicate that this is a one-way disorder
involving recall, but not recognition. [//] In the case of the conduction
aphasics the evidence of tacit partial knowledge of many words may indicate a
breakdown at a later stage in the naming process. An inner auditory
representation may be present but is prevented from setting into motion the
final neural events which activate the articulatory
system. Either the auditory model is incomplete or, as the disconnection
hypothesis suggests, its route to the motor speech area is not consistently
available. [//] The failure of Broca's aphasics to
match the performance of the conduction aphasics is surprising, since it
contradicts the traditional notion that their word finding difficulty is
purely at the motor speech level." (Goodglass et
al, 1976, p152)
Further differences between the conduction and transcortical motor types of aphasia are discussed in McCarthy and Warrington (1984).
1.4 Malapropisms
One final form of everyday speech error is the malapropism. Malapropisms are characterised by the "ludicrous misuse of a word, especially by confusion with one of similar sound" [source]. "The term derives from Mrs. Malaprop in R. B. Sheridan's The Rivals, who erred humorously with phrases like 'as headstrong as an allegory [alligator] on the banks of the Nile'" [citation]. Malapropisms are another phenomenon where the empirical data challenge one's preferred model of lexical organisation, thus .....
"From
a collection of over 2000 errors in speech compiled by the first author, we
initially selected all errors that involved word substitution (397). From this
initial list we eliminated all errors that could have arisen from [other
sources]. The remaining corpus comprised 183 errors. These errors, the
malapropisms, have some interesting properties. First, the target and the error
are of the same grammatical category in 99% of the cases. Second, the target
and the error frequently have the same number of syllables (87% agreement in
our list). Third, they almost always have the same stress pattern (98%
agreement)." (Fay and Cutler, 1977, pp507-508).
Fay and Cutler continue .....
"At a certain point in the production of a sentence a grammatical structure must be framed to carry the meaning that the speaker intends to convey. This structure can be thought of as incorporating both the syntactic properties of the impending utterance (in the form, say, of a phrase structure[glossary]), and the meanings of the words to be used. What is not in the structure initially is any specification of the phonological characteristics of the chosen words. For these the speech production device must look into its mental dictionary to find a particular entry whose meaning and syntactic category match the specifics embodied in the grammatical structure. [.....] What is this mental dictionary, or lexicon, like? We can conceive of it as similar to a printed dictionary, that is, as consisting of pairings of meanings with sound representations. A printed dictionary has listed at each entry a pronunciation of the word and its definition in terms of other words. In a similar fashion, the mental lexicon must represent at least some aspects of the meaning of the word, although surely not in the same way as does a printed dictionary; likewise, it must include information about the pronunciation of the word although, again, probably not in the same form as an ordinary dictionary." (Fay and Cutler, 1977, pp508-509; bold emphasis added; note that these authors are using the linguistic definition of lexicon, not the psycholinguistic - see glossary)
2 - Hesitations as Indicators of Thinking Time
"'Time
is the measure of all things', not least mental activities; and time when
people appear to be doing nothing is the kind of time psychologists most like
to measure." (Butterworth, 1980, p155)
The idea that hesitation phenomena might indicate psychological processing time goes back to Donders' work in the 1860s, but modern interest is best placed with the work of Frieda Goldman-Eisler (various from 1951). In one of her early studies (Goldman-Eisler, 1958), she demonstrated that hesitation pauses preceded phrases rich in new information. She was then followed by Donald S. Boomer, who studied the relationship between both filled and silent pauses and their position within the grammatical clause (eg. Boomer, 1965). Boomer tape recorded spontaneous speech from 16 male American students, and analysed the transcripts for silences longer than 200 milliseconds, filled pauses, and the suprasegmental "phonemic clause" boundaries .....
Key Concept - Phonemic Clause: "A phonemic clause is an intonational unit consisting of a single intonation
contour, one primary stress and a terminal juncture, and is also called a 'tone
group'" (Butterworth, 1980, pp156-157).
Alternatively, it is "a grammatical structure produced within a single
intonation contour, and bounded by junctures [silences, or significant changes
in phonetic pitch, stress, or duration]." (Crystal, 2003, p348)
Boomer then numbered the successive word boundaries in clauses, on the assumption that they presented "an ordered series of opportunities for hesitation" (p162). For example .....
[1] and [2] the [3] weather [4] was [5] hot
Here is his argument .....
"In
general, there will be as many possible locations as words in the clause, each
location being labelled with the ordinal number of the word it precedes.
Occasional arbitrary exceptions were made in this study for multiple-element
proper nouns such as Bill Smith and San Francisco, for
combinatory groups like thank you and what-you-may-call-it, and
for certain 'tags' such as you know and you see. These were
counted as single words, as were syntactically superfluous repetitions of
words, as in I took the ... the train. Filled pauses themselves and
word-fragments were also excluded from the count. [//] The corpus contained a
total of 1593 phonemic clauses of which 713 contained one or more hesitations.
Hesitations totalled 1127, 749 unfilled pauses and 378 filled pauses. [.....]
Results. The hypothesis that hesitations tend to occur at the beginning of
phonemic clauses was strongly supported [although] the greatest frequency
of hesitations is not at the outset but at position 2, after the first word of
the clause [see the column for position 2 in the table below -
Ed.]. This is true for all nine of the array distributions representing clause
lengths from two to ten words." (Boomer, 1965, pp162-163;
italics original; bold emphasis added)
And here is the supporting summary data table, with the columns showing the hesitations at successive boundary locations, and the rows showing the number of words in the clause .....
| Words in clause | 1   | 2   | 3   | 4   | 5  | 6  | 7  | 8  | 9  | 10 | All |
|-----------------|-----|-----|-----|-----|----|----|----|----|----|----|-----|
| 2               | 18  | 28  |     |     |    |    |    |    |    |    | 46  |
| 3               | 16  | 68  | 29  |     |    |    |    |    |    |    | 113 |
| 4               | 34  | 81  | 39  | 27  |    |    |    |    |    |    | 181 |
| 5               | 28  | 80  | 37  | 22  | 21 |    |    |    |    |    | 188 |
| 6               | 21  | 65  | 22  | 26  | 16 | 17 |    |    |    |    | 167 |
| 7               | 14  | 67  | 25  | 20  | 13 | 19 | 8  |    |    |    | 166 |
| 8               | 9   | 38  | 18  | 11  | 11 | 14 | 13 | 7  |    |    | 121 |
| 9               | 7   | 19  | 13  | 15  | 4  | 6  | 6  | 4  | 13 |    | 87  |
| 10              | 4   | 15  | 4   | 8   | 5  | 7  | 5  | 3  | 4  | 3  | 58  |
| All             | 151 | 461 | 187 | 129 | 70 | 63 | 32 | 14 | 17 | 3  |     |
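For readers who like to check the arithmetic for themselves, here is a minimal Python sketch which re-keys the counts from the table above and confirms Boomer's headline claim, namely that position 2 is the commonest hesitation point at every clause length. The variable and function names are our own choices for illustration, and are not Boomer's.

```python
# Hesitation counts transcribed from the table above (Boomer, 1965).
# Keys are clause lengths in words; each list gives the counts at
# boundary positions 1, 2, 3, ... for clauses of that length.
counts = {
    2: [18, 28],
    3: [16, 68, 29],
    4: [34, 81, 39, 27],
    5: [28, 80, 37, 22, 21],
    6: [21, 65, 22, 26, 16, 17],
    7: [14, 67, 25, 20, 13, 19, 8],
    8: [9, 38, 18, 11, 11, 14, 13, 7],
    9: [7, 19, 13, 15, 4, 6, 6, 4, 13],
    10: [4, 15, 4, 8, 5, 7, 5, 3, 4, 3],
}


def peak_position(row):
    """Return the 1-based boundary position with the most hesitations."""
    return row.index(max(row)) + 1


def column_totals(table):
    """Sum the counts at each boundary position across all clause lengths."""
    width = max(len(row) for row in table.values())
    totals = [0] * width
    for row in table.values():
        for position, n in enumerate(row):
            totals[position] += n
    return totals


totals = column_totals(counts)
grand_total = sum(totals)
peaks = {length: peak_position(row) for length, row in counts.items()}
share_at_2 = totals[1] / grand_total

print("Column totals:", totals)
print("Grand total:", grand_total)
print("Peak position by clause length:", peaks)
print(f"Share of all hesitations at position 2: {share_at_2:.1%}")
```

The grand total comes out at 1127, which agrees with the figure Boomer gives in the passage quoted above, and roughly two in every five hesitations fall at position 2.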
Rochester (1973) provides a handy review of the early literature on hesitations, if interested.
2.1 Lindsley's Work
Lindsley (1975) designed a study to determine how many sentence units are planned in advance of speech initiation. Noting that the main verb [glossary], coming after the subject [glossary], introduces the part of the sentence [glossary] known as the predicate [glossary], he identified three possible planning strategies, as follows .....
Pre-Predicate Model: This
model "characterises a speaker who initiates his utterance as soon as he
has completed selection of the subject" (Lindsley,
1975, p3).
Post-Predicate Model: This
model "characterises a speaker who delays initiating his utterance until
after he has completed selecting the verb as well as the subject" (Lindsley, 1975, p3).
Semi-Predicate Model: This is
a compromise model which "characterises a speaker who delays initiating his
utterance until after he has completed some selection of the verb as well as
selection of the subject" (Lindsley, 1975, pp3-4).
Now the point about the three explanatory models is that they make different predictions on initiation latency. The pre-predicate model predicts that the to-be-selected verb contributes nothing to any speech initiation latency, and, by implication, that "the speaker responds as though he were treating the subject and verb as independent responses" (p3). The post-predicate model predicts that verb selection does contribute to the initiation latency, and, by implication, that "the speaker responds as though he were treating the subject and verb as interdependent aspects of a larger response unit: the sentence as a whole" (p3). And the semi-predicate model falls between these two extremes, accepting some contribution to latency from verb selection.
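The contrast between the three predictions can be sketched in a few lines of Python. This is only a toy illustration under loudly stated assumptions: we assume that subject and verb selection run strictly in series and simply add together, the millisecond values are invented, and the verb_fraction parameter (the share of verb selection completed before speaking) is a hypothetical device of our own. None of these figures comes from Lindsley.

```python
def predicted_latency(model, subject_ms, verb_ms, verb_fraction=0.5):
    """Predicted speech initiation latency in milliseconds.

    Assumes serial, additive selection stages (an illustrative
    simplification, not a claim made by Lindsley).
    """
    if model == "pre-predicate":
        # Speech starts once the subject has been selected.
        return subject_ms
    if model == "post-predicate":
        # Speech waits for the whole of verb selection as well.
        return subject_ms + verb_ms
    if model == "semi-predicate":
        # Speech waits for only part of verb selection.
        return subject_ms + verb_fraction * verb_ms
    raise ValueError(f"unknown model: {model}")


# Invented selection times, for illustration only.
for model in ("pre-predicate", "semi-predicate", "post-predicate"):
    print(model, predicted_latency(model, subject_ms=400, verb_ms=300))
```

Whatever values are plugged in, the ordering is what matters: the pre-predicate latency ignores the verb altogether, the post-predicate latency carries its full cost, and the semi-predicate latency lies somewhere in between.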
To test which model might be operating, Lindsley devised a picture-description task to generate response latency data. He presented subjects with pictures containing an actor (a man, woman, girl, or boy) engaged in a specific action (touching, kicking, greeting, etc.), and compared response latencies when producing utterances of different length and grammatical form .....
S-Only Sentences: Here the
subject had to name the actor depicted on the card. Example:
"The girl".
V-Only Sentences: Here the
subject had to name the action depicted on the card. Example:
"Greeting".
S-V Sentences: Here the
subject had to name both the actor and the action in sentence form. Example:
"The girl is greeting".
S-V-O Sentences: Here the
subject had to name actor, action, and object in sentence form. Example:
"The girl is greeting the boy".
The amount of lexical decision making was also varied by holding either S or V constant across a number of cards, so that (a) it would always be the girl, say, who was doing something (S constant), or (b) the action would always be greeting, say, regardless of which actor was depicted (V constant). Where the S or V could differ from card to card, Lindsley codes the sentences as dS, dS + dV, and dV; where the S or V was held constant, he codes them as cS, cV, etc. Data were then obtained for all permutations of c and d and sentence type, including mixed sentences such as cS + dV. These data indicated (a) that it takes longer to initiate an S-V utterance than an S-only utterance, thus arguing against the pre-predicate model, and (b) that initiation latencies for S-V utterances were shorter than for S-only naming when the actor was already known, thus arguing against the post-predicate model. Lindsley therefore concluded .....
".....
it seems most likely that the speakers of S-V sentences, represented by dS + dV and cS
+ dV do employ consistently a specific speech
strategy characterised by the semi-predicate model. This speech strategy
entails an initial portion of verb selection being deliberately performed
before the initiation of the utterance. Whenever this initial portion of verb
selection occurs in series with subject selection or takes longer than any
parallel stages of subject selection, it delays the initiation of the utterance
until after it has been completed. [.....] The results of this research,
singling out the Semi-predicate model, are consistent with those of the
hesitation studies [citations] in demonstrating that speech is initiated before
all information about an utterance has been processed or linguistically
coded." (Lindsley, 1975, pp10-19)
Or to put it simply, we typically start talking to an idea as soon as we have decided on the subject and have some idea of the action we wish to describe.
2.2 Butterworth's Work
Butterworth (1975) studied not the individual pauses - what he termed "the microstructure of hesitation" - but rather the overall proportion of pausings to speech - "the macrostructure of hesitation" (Butterworth, 1975, p75). He adopted Henderson, Goldman-Eisler, and Skarbek's (1966) concept of the "temporal cycles" of speech, that is to say, alternating periods of hesitancy and fluency, and collected speech samples from eight male subjects. He then analysed transcripts of these samples for the demarcation points of both the cycles and the Ideas. Here are his conclusions .....
"Clause
boundaries appear to be a necessary but not sufficient condition for the onset
of both cycles and new Ideas, in that the vast majority of cycles and the Idea
divisions given by any subject coincided with clause boundaries but a very
substantial number of clause boundaries were not coincident with either cycles
or Ideas. There was somewhat better match between sentences, Ideas, and cycles.
Taking Ideas and sentences first, of 35 criterial
Ideas boundaries - ie. where more than half the
subjects agreed on the location of an Idea division - all but four coincided
with sentence boundaries [.....]. Thus, of clause types, the relevant kind for
Ideas seems to be sentences; but Ideas may consist of more than one sentence.
[//] With regard to cycles, about half coincided with sentence starts and
three-fourths with all kinds of clause boundaries. This left cycles consisting
of more than one sentence in some cases and parts of sentences in about half
the cases [.....//] The results presented here are consistent with the
hypothesis that the cycles represent integral planning units for the speaker,
and shed light on what these planning units consist of linguistically. First,
the speaker tends to plan ahead in terms of well-understood linguistic units -
namely clauses and sentences. Second, he appears to have the ability to chunk
together several clauses or sentences as one superordinate
planned structure integrated by some kind of semantic unity. [.....] If this is
correct, then serious qualifications are required of Boomer's thesis that the
main unit of planning is the phonemic clause (Boomer, 1965; Boomer and Laver,
1968). If speakers do encode speech into phonemic clause units, then this will
occur well down the hierarchy of encoding processes and will be a process of a
quite different kind from the planning of cyclic segments." (Butterworth,
1975, pp83-84)
In a later paper, Butterworth (1980) revisited the importance of hesitation data. He began by pointing out that sentence boundary pauses probably help listeners as much as speakers, because they give them time to consolidate their understanding of the message just received. He then drew again on the cycles described by Henderson, Goldman-Eisler, and Skarbek (1966) .....
"Typically,
cycles last about 18 s, but some as long as 30 s, which means that they will
contain, on average, five to eight clauses, ie.
generally two or more sentences [citations]. Since, as we have seen, semantic
factors were responsible for pause time variations, we should look for semantic
rather than syntactic units. [//] I therefore asked independent judges to
divide transcripts of speech into 'Ideas' [.....]. Taking those points in the
texts where more than half the judges agreed that one Idea ended and the next
began, and comparing these with cycle boundaries, a significant correspondence
between Idea and cycle boundaries was found. Although the correspondence was
reliable it was not complete. Some cycles did not begin at an Idea boundary,
and some Ideas did not coincide with cycles. Why these discrepancies should
occur is not clear. [.....] One thing is established, however: both Idea and
cycle boundaries almost invariably coincide with clause boundaries."
(Butterworth, 1980, p165; bold emphasis added)
Butterworth is another to buy into the buffer system concept and, indeed,
points out that there may well be more than one such buffer to worry about .....
"Several
authors, most notably Morton (1970), have argued for a 'Response Buffer' which
can hold a string of words for output following lexical selection. This buffer
is held to operate in both speech production and short-term memory tasks. [However,]
Shallice and Butterworth (1977) reported one case of
severe impairment of auditory-verbal STM, without a
concomitant increase in the hesitancy of speech. The most plausible
interpretation of these results is that, contra Morton, the buffer
used in STM tasks is not used in speech."
(Butterworth, 1980, pp165-166; italics original; bold
emphasis added)
2.3 Developmental Data
Finally, MacWhinney and Osser (1977) give some developmental data. They studied 20 British five-year-olds, and analysed their hesitation behaviour by sex and social class. They concluded as follows .....
"The
first major result of this study has been the identification of three major
planning functions: preplanning, coplanning, and
avoidance of superfluous vocalisation. The styles in verbal planning reflect
basic differences in cognitive processing. Underlying all three planning
functions, however, is one central commonality - verbal planning takes time.
While the speaker is trying to figure out what to say and how to say it, the conversation
moves on. Given this inevitable forward movement in time and his own problems
in formulating his utterance, the speaker may do one of two things. He may
attempt to fully formulate what he is going to say before he says it.
Alternatively, he may start talking and hope to be able to figure out his
utterance in medias res. Whether he pauses initially or attempts to patch
together an ongoing sentence, he has a further option. He may either use
superfluous verbalisation to cover his pauses and errors or he may simply
remain silent. The principal components analysis in this experiment indicates
that the 13 hesitation phenomena examined in this study can be grouped into
these three functional categories: coplanning,
preplanning, and avoidance of superfluous verbalisation. [.....] The second
major result of this study has been the finding that, for 5-year-olds,
differences in verbal planning functions are more related to sex than to social
class. Boys were found to do more coplanning, while
girls made greater use of preplanning. Moreover, boys showed more use of
superfluous verbalisations than girls." (MacWhinney
and Osser, 1977, p984)
3 - Lexical Structure Models
In this section, we look in more detail at the ins and outs of lexical retrieval .....
3.1 The Spreading-Activation Theory of Lexical Retrieval
Having considered the many threads of evidence then available to him, Dell (1986) proposed a "Spreading-Activation Theory" of lexical retrieval. Here is the main thread of his argument .....
"The
principal assumption of the theory is that at each level a representation of
the sentence-to-be-spoken is constructed. Thus, a planned utterance will exist
at various times as a semantic, syntactic, morphological, and a phonological
representation. The theory describes the construction of the latter three
representations. [.....] The construction of a representation at each level
goes on simultaneously with that of the other levels [ie.
parallel processing - Ed.], with the rate of processing depending on factors intrinsic
to the level and on the rate of processing of the level immediately above it.
[.....] The basic idea of the theory is that the tagged nodes constituting a
higher representation activate nodes that may be used for the immediately lower
representation through a spreading-activation mechanism. The lower
representation is constructed as the generative rules associated with that
level build a frame, or ordered set of categories, and the insertion rules fill
in the slots of the frame [.....]. When an item is selected for a slot, it
becomes part of the developing lower representation and so it receives its tag.
Thus, a principal mechanism for the translation of information from one
representation to another is spreading activation through the lexicon.
[.....] When a node has an activation level greater than zero, it sends some
proportion of its activation level to all nodes connected to it (spreading).
This proportion is not necessarily the same for each connection. When the
activation [reaches] its destination node, it adds to that node's current
activation level (summation). [.....] Activation is assumed to decay
exponentially [over time] towards zero. These operations, spreading, summation,
and decay, apply to all of the nodes in the lexical network at all times,
regardless of whether the node is part of a representation (tagged or not).
[.....] One of the important assumptions regarding spreading activation in the
theory is that all connections are two way. If Node A connects to B, then B
connects to A. Given the nature of the connections and the hierarchical
structure of the lexical network, each connection can be classified as either
excitatory top-down, or excitatory bottom-up. For each top-down connection,
such as that from a particular morpheme to a particular phoneme, there is a
bottom-up connection in the reverse direction. These bottom-up connections
deliver positive feedback from later to earlier levels and play a critical role
in the theory. Their presence makes processing in the network highly interactive
[and] generates some nonobvious predictions. [.....] Constructing
Representations: In this section I outline how a lower representation is
constructed given a higher representation. The first important concept is
that of the current node. It is that item of the higher level
representation that is in the process of being transferred into corresponding
items at the immediately lower level. [.....] When the construction of a lower
representation begins, the current node is that node of the higher representation
that is tagged as first. The initial step in the translation is the
activation of the current node." (Dell, 1986, pp287-288;
italics original; bold emphasis added)
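The three operations named in the passage above (spreading, summation, and decay) lend themselves to a short Python sketch. What follows is a toy network under stated assumptions and is not Dell's own implementation: the node names ("cat", "cab", and their phonemes), the connection proportions, the decay rate, and the LexicalNetwork class itself are all hypothetical values and names chosen for illustration, and decay is modelled as a constant proportional loss per time step.

```python
class LexicalNetwork:
    """Toy spreading-activation network in which every link is two-way."""

    def __init__(self, decay=0.4):
        self.decay = decay      # hypothetical proportion lost per step
        self.activation = {}    # node name -> current activation level
        self.links = {}         # node name -> {neighbour: proportion sent}

    def connect(self, upper, lower, top_down=0.3, bottom_up=0.2):
        """Add an excitatory top-down link and its bottom-up partner."""
        for node in (upper, lower):
            self.activation.setdefault(node, 0.0)
            self.links.setdefault(node, {})
        self.links[upper][lower] = top_down
        self.links[lower][upper] = bottom_up

    def step(self):
        """Apply spreading, summation, and decay to every node once."""
        incoming = dict.fromkeys(self.activation, 0.0)
        # Spreading: each active node sends a proportion of its level
        # along each of its connections.
        for node, level in self.activation.items():
            if level > 0:
                for neighbour, proportion in self.links[node].items():
                    incoming[neighbour] += proportion * level
        # Summation, then decay towards zero.
        for node in self.activation:
            total = self.activation[node] + incoming[node]
            self.activation[node] = total * (1 - self.decay)


net = LexicalNetwork()
for phoneme in ("k", "ae", "t"):
    net.connect("cat", phoneme)
for phoneme in ("k", "ae", "b"):
    net.connect("cab", phoneme)

net.activation["cat"] = 1.0   # treat "cat" as the current node
net.step()
net.step()

for node, level in sorted(net.activation.items()):
    print(f"{node:>3}: {level:.4f}")
```

After two steps the unintended word "cab" has picked up a small amount of activation, purely through bottom-up feedback from the phonemes it shares with "cat". This is the sort of interactive effect which the two-way connections make possible, although the particular numbers here mean nothing in themselves.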
The big question, of course, is which processes go wrong to produce a given error, and Dell's simple answer was that "no one process is at fault" (p289). Speech errors are simply natural consequences of the way the mind is organised. Thus .....
"For
example, in the planning of an utterance many concepts would legitimately
become activated that would not actually appear in the utterance. This
background activation might include activation from concepts that were either
presuppositions or inferences that were necessary in the semantic and pragmatic
planning of the utterance. For example, if one were to say Could you close
the door?, one would certainly have processed the presupposition that the
door was open. As a result the concept for open might be active, and
because of the spreading activation, the word, morpheme, and phoneme nodes associated
with [it] would become activated, perhaps resulting in the slip Could you
open, I mean close ....." (Dell, 1986, p291;
italics original)
The error corpus data on the location of errors are also important .....
"In
general, items are more likely to move short distances. Misordered
sounds and morphemes tend to move to adjacent content words that are in the
same phrase (Boomer and Laver, 1968; Garrett, 1975; MacKay, 1970). Misordered words move greater distances, possibly because
the planning chunks at the syntactic level are larger, or because words can
only move to appropriate syntactic slots (Garrett, 1975)." (Dell, 1986, p293)
3.2 The Role of Gesture
SECTION TO FOLLOW, OCTOBER 2004
4 - Modular Speech Production Models
Dell's theory deals primarily with word retrieval at a micro level. It accepts the basic two-stage theory and identifies four levels of representation within those stages, but although it has a lot to say about what might be going on at neural levels, it does not fully address the modularity of the processing. Other workers have taken a more macroscopic view, and have not only tried to map the modules and processes involved in speech production, but have started to push upwards into the realms of pragmatics.
ASIDE: Praxis and pragmatics
actually share the same linguistic root, namely the Greek word prassein = "to do", via its derivations praxis
("doing") and pragma
("deed"). Defects of praxis are known as dyspraxias.
We have already dealt at length with the motor hierarchy under a number of
separate headings. For example, it is the output leg on the standard A-shaped
control hierarchy model. For specific examples see Craik
(1945), Frank (1963), and Norman (1990), for the
history of the motor hierarchy in general see our e-paper on "The Motor Hierarchy", and for an introduction to theories of biological
motor programming, see our e-paper on "Motor Programming". And the motor hierarchy for speech is one of the
output legs on the standard X-shaped psycholinguistic transcoding
model. See, for example, Ellis (1982), Ellis and Young (1988), and Kay, Lesser, and Coltheart (1992). For the history of this model layout, see our e-paper on "Transcoding
Models".
In this section, we shall look at some of the most influential of these modular models.
4.1 Lordat's Very Early Speech Production Model
This is the subject of a dedicated separate paper. See Lordat (1843).
4.2 Other Early Speech Production Models
Lordat's was the first of many 19th century speech production models, of which the following may be worth a quick browse, if interested in the historical aspects of the subject .....
4.3 Shannon's Idealised Communication Model
The late 1940s saw a wave of interest in the engineering aspects of communication. This led to telecommunications experts borrowing freely from the aphasiology literature, and, in turn, to psycholinguistics borrowing back much of the resulting vocabulary during the ensuing decade [specifically, words such as encoding, information, working memory, signal-to-noise ratio, and feedback]. The engineer who did most to systematise the way we look at communication and its failures was Claude Shannon, then with Bell Telephone Laboratories [fuller story in our e-paper on "Shannonian Communication Theory", if interested].
4.4 Fromkin's "Utterance Generator" Model
The 1950s saw increasing interest in psycholinguistic experimentation, with major works by George Miller (Miller, 1951), Colin Cherry (Cherry, 1957), and Roger Brown (Brown, 1958). This experimentation was complemented by an expanding literature on the psycholinguistic impact of brain damage, led by the likes of Harold Goodglass (1920-2002) and Norman Geschwind (1926-1984). This groundwork then generated a number of explanatory models across psychology as a whole. Donald Broadbent became one of the lead theorists for selective attention, John Morton did the same for modular language processing, Atkinson and Shiffrin did the same for short-term memory, and Alan Baddeley added working memory. The pivotal work in the field of speech production was Fromkin's (1971, 1973) "Utterance Generator" model, which largely resurrected Lordat's 19th century scheme of things with the following six-stage explanatory analysis .....
Stage 1 Processing - Semantic System: This is where the meaning to be conveyed is first generated.
Stage 2 Processing - Syntactic System: This is where an appropriate syntactic
"slot" [glossary]
structure is decided upon.
Stage 3 Processing - Lexical System: This is where content words [glossary] are
extracted from the lexicon to help give shape to the developing sentence.
Stage 4 Processing - Prosodic System: This is where an appropriate intonation pattern is decided upon.
Stage 5 Processing - Phonological Assembly: This is where function words [glossary]
are inserted at key points in the emerging sentence structure, and then
abstract sounds attached to the words and morphemes as they fall into position
within each clause.
Stage 6 Processing - Phonetic System: This is where concrete sounds are attached to the abstract sounds, and
muscle activation commences.
Exercise 4 - So is it Six Stages or Three? Most of the 19th century models settled for three or four stages of speech production, and Requin, Riehle, and Seal (1988) have argued that three hierarchical processing levels is nature's norm for biological motor behaviour. Yet most of the models mentioned in this section end up with five or six. Suggest how this apparent disagreement might be explained.
4.5 Garrett's Speech Production Model
This is the subject of a dedicated separate paper. See Garrett (1990). [At two points in his model, Garrett shows two processes dealing with different aspects of the processing simultaneously - thus going some way towards fitting six processes into only three processing modules.]
4.6 Butterworth's Modern Speech Production Model
Drawing on both Fromkin and Garrett, Butterworth (1985) offers a flow diagram similar to Garrett's, in which the following "processing systems, or modules" are identified .....
Semantic System: This is Fromkin's "Stage 1 Processing" as defined above.
Butterworth regards it as passing information to "the next three systems
in parallel" (p68).
Syntactic System: This is Fromkin's "Stage 2 Processing" as defined above.
Butterworth regards it as receiving the first of the three streams of
information coming out of the semantic system, and as using this information to
set up appropriate sentence and clause constructions [glossary].
Lexical System: This is Fromkin's "Stage 3 Processing" as defined above.
Butterworth regards it as receiving the second stream of information coming out
of the semantic system, and as using this information to select suitable words
"from an inventory - lexicon - of word forms" (p69).
Prosodic System: This is Fromkin's "Stage 4 Processing" as defined above.
Butterworth regards it as receiving the third stream of information coming out
of the semantic system, and as using this information to choose "an
appropriate intonation contour" (p69).
Phonological Assembly System: This
is Fromkin's "Stage 5 Processing" as
defined above. Butterworth regards it as setting up a "phonemic string
with syntactic bracketing".
Phonetic System: This is Fromkin's "Stage 6 Processing" as defined above.
Butterworth regards it as taking the output from the Phonological Assembly
System, and as then generating suitable motor commands. This is the point at
which the abstract phonemes [glossary]
begin to turn into concrete phones [glossary].
It is also the point at which coarticulation [glossary]
takes place.
Figure 2 - Butterworth's (1985) Speech Production Model: This diagram lays out the modules described above in the now-familiar general layout. We shall therefore comment only on the model's unique features .....
Redrawn from a black-and-white original in Butterworth (1985, p68; Figure 3.1). This graphic Copyright © 2003, Derek J. Smith.
4.7 Levelt School Models
Levelt (1989), writing as head of the Max Planck Institute for Psycholinguistics, published a major monograph on speech production under the title "Speaking: From Intention to Articulation". One of his major points was to consider the difference between "lexical encoding", the retrieval (and creation if necessary) of words to express ideas, and "syntactic encoding", the retrieval and sequencing of words to express ideas .....
"But
languages differ enormously in the degree to which they exploit [lexical
encoding]. While a Turkish speaker's grammatical encoding consists for the most
part of such lexical encoding, an English speaker is extremely 'conservative'
in the sense that he normally uses words he has heard often in the past. For
the English speaker, lexical encoding plays a minor role in grammatical
encoding; the action is in syntactic encoding. A theory of the speaker should,
of course, encompass both kinds of grammatical encoding. As a matter of fact,
however, almost nothing is known about the psychology of lexical
encoding." (Levelt, 1989, p186)
In an attempt to cast some light on the processes of lexical encoding, Levelt did much to popularise the use of the term "lemma" [see earlier discussion, Section 1.3]. Thus .....
".....
from the point of view of language production a lexical entry can be split up
into two parts: its lemma and its form information []. This theoretical
distinction can be extended to the mental lexicon as a whole. Lemmas can be
said to be 'in the lemma lexicon', and morpho-phonological
forms to be 'in the form lexicon'. Each lemma 'points' to its corresponding
form. [.....] The semantic information in a lemma specifies what
conceptual conditions have to be fulfilled in the message for the lemma to be
activated; it is the lemma's meaning. These conditions can be stated in
the same propositional format as messages. [.....] A lemma's syntactic
information specifies the item's syntactic category, its assignment of
grammatical functions, and a set of diacritic feature variables or
parameters." (Levelt, 1989, pp187-190)
Further down the system, Levelt sees the process of phonological encoding as working this way .....
"Phonological
encoding is a process by which the phonological specifications of lexical items
are retrieved and mapped onto a fluently pronounceable string of syllables.
Unpacking a word's phonological specifications and using them to retrieve the
appropriate syllable programs involves various levels of processing. Studies of
the tip-of-the-tongue phenomenon in which this process of phonological
unpacking is blocked or slowed, support this view." (Levelt,
1989, pp361-362)
Two years later, Donald (1991) drew on Levelt's work in his own "evolutionary" theory of the speech motor hierarchy .....
Figure 3 - Donald's (1991) Speech Production Model: This model was developed from earlier models by Butterworth (1980, 1985) and Levelt (1989). It places a Linguistic Controller L at the top of a "vertically integrated" speech system. L then creates "narrative models" out of ideas released to it (a) from episodic memory as the result of current stimulation, and (b) from a mental structure Donald calls the Mimetic Controller, a hypothetical mechanism believed to be responsible for the production of "conscious, self-initiated, representational acts that are intentional but not linguistic" (Donald, 1991, p168) [this being nothing less than the evolutionary advance which brought about the emergence of the modern human]. The lower processes are jointly responsible for the "lexical assembly" of the final utterance. This brings in subprocesses for selecting, sequencing, and determining the correct form of the words to be produced. The Phonetic Plan "maps the assembled utterance onto neuromotor paths and, ultimately, the vocal musculature". All this makes for the generally familiar layout, so again we only need to point out the model's unique features .....
Redrawn from a black-and-white original in Donald (1991, p260; Figure 7.2). This graphic Copyright © 2003, Derek J. Smith.
Finally, Levelt, Roelofs, and Meyer (1999/2003 online) are typical of the latest offering from Levelt's research unit .....
Figure 4 - Levelt, Roelofs, and Meyer's (1999) Speech Production Model: This diagram, too, adopts the now familiar general layout, so again we shall note only its points of uniqueness .....
Redrawn from a black-and-white original in Levelt, Roelofs, and Meyer (1999, p3; Figure 1). This graphic Copyright © 2003, Derek J. Smith.
5 - Feedback in Speech Production Models
The topic of feedback was introduced in our e-paper on "The Basics of Cybernetics", and is especially important to speech production theory. Gracco and Abbs (1987) are among many to point out that continuous speech involves continuous feedback, that is to say, that the continuous execution of a motor program requires an equally continuous stream of sensory information from muscle and cutaneous senses throughout the respiratory, laryngeal, and orofacial regions. Similarly, but at a higher level of analysis, Levelt (1989) devotes an entire chapter to the topic of self-monitoring and self-repair. Among the types of feedback Levelt deals with are .....
* Am I saying what I meant to say?
* Is this the way I meant to say it?
* Is what I am saying socially appropriate?
* Am I selecting the right words?
* Am I using the right syntax and
morphology?
* Am I making any phonological
errors?
* Is my articulation at the right
speed and pitch?
Successful speech production, in other words, is a constant battle against error, and those errors can pop up anywhere. The phrases we then use to interrupt and correct ourselves (phrases such as "sorry", "I mean", "let me put that another way", etc.) are known generically as "editing expressions" (Hockett, 1967). Levelt (1989) summarised the issue thus .....
"The
major feature of editor theories [of monitoring] is that production results are
fed back through a device that is external to the production system.
Such a device is called an editor or a monitor. This device can be
distributed in the sense that it can check in-between results at different
levels of processing. The editor may, for instance, monitor the construction of
the preverbal message, the appropriateness of lexical access, the well-formedness of syntax, or the flawlessness of
phonological-form access. There is, so to speak, a watchful little homunculus connected to each processor." (Levelt, 1989, pp467-468; italics
original; bold emphasis added)
In this section, we look at how feedback and editing have been studied objectively .....
5.1 Early Studies
Lee (1951) pioneered a technique of replaying a person's speech to that person's own ears, subject to a variable time delay. Here is how he profiles his method .....
"In
order to produce delayed speech feedback, it is necessary to return the
speaker's speech to his own ears approximately one quarter second after he has
spoken. This is best accomplished by means of a magnetic tape [machine]. The
subject reads a moderately difficult text into the recording microphone with
the playback gain control to the telephone headset turned off, and a normal
reading pattern is established. The playback gain to the earphones is then
advanced until the subject's speech is disturbed." (Lee, 1951, p53)
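The delay Lee describes can be pictured as a simple delay line. What follows is a minimal sketch rather than a reconstruction of Lee's tape apparatus: the function name, the sample rate, and the gain parameter are assumptions made for illustration, and only the quarter-second delay comes from the quotation above.

```python
def delayed_feedback(samples, sample_rate_hz=8000, delay_s=0.25, gain=1.0):
    """Return what the headset would play: the input delayed by delay_s.

    sample_rate_hz and gain are illustrative assumptions; delay_s follows
    Lee's "approximately one quarter second".
    """
    delay_samples = int(round(sample_rate_hz * delay_s))
    # Silence until the first delayed sample arrives, then the scaled input.
    padded = [0.0] * delay_samples + [gain * s for s in samples]
    # Keep the output the same length as the input.
    return padded[:len(samples)]


# A one-second "utterance" consisting of a single click at time zero.
speech = [1.0] + [0.0] * 7999
heard = delayed_feedback(speech)
click_at = heard.index(1.0)  # sample index at which the speaker hears the click
```

Setting gain to zero corresponds to the baseline condition in which the playback control is turned off and a normal reading pattern is established.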
Using this experimental set-up, Lee found that there were two types of common effect. Subjects either (a) slowed down and raised their voices, or else (b) began to speak haltingly, repeating syllables in a form of "artificial stutter". The same phenomenon emerged with skilled tympanists reading a drum-beat, and for the key presses of skilled Morse Code operators. Lee gives the following specific examples .....
aluminum..... degrades to aluminum-num.....
ten-nine-eight-seven..... degrades to ten-nine-nine-eight-seven.....
Lee interpreted these findings as evidence of a multiple loop control hierarchy, with four levels of feedback, as follows .....
The "Thought Loop":
The top control level releases individual thoughts for action, and then
monitors that action for successful progress and completion. The highest level
feedback loop then monitors the output for what would nowadays be termed its
pragmatic appropriacy [strictly speaking, its "perlocutionary effect" - glossary].
The "Word Loop":
The second highest loop monitors speech production for word selection accuracy.
The "Voice Loop":
The third highest loop monitors speech production at whole-syllable level for
morphological accuracy.
The "Articulating Loop":
Finally, the lowest loop monitors speech
production checking that the right phonemes
have been used within each syllable.
It is confusion at the hand-over between the second and the third level which presumably causes the aluminum-num syllable repetition. There were no single-phoneme repetitions. Here is Lee's own conclusion .....
"The
satisfaction at each stage by a monitoring system is required; otherwise the
machine halts, repeats, or repeats corrected. Repetition of sentences and words
is volitional for emphasis, increased clarity, or correction of gross errors.
Repetition of syllables is probably involuntary, or reflex, and it is at this
stage that artificial stutter is manifested. Repetition of phonemes has
not been artificially induced by delayed speech feedback in [our] observation
....." (Lee, 1951, p54)
So compelling were these early studies that Mysak (1966) explicitly put cybernetics and speech pathology in bed together in his book "Speech Pathology and Feedback Theory".
5.2 Editing and Editing Expressions
Motley, Camden, and Baars (1982) argued for the existence of a function of "prearticulatory editing", as follows .....
"Editing
has been described as a phase of speech production which occurs after the
phonological phase (ie. after the impending message
has evolved its phonological representation) but before the articulatory
phase, and which operates to test or check the linguistic integrity of the
incoming phoneme strings. The edit presumably approves for subsequent
articulation those phoneme strings which are linguistically appropriate; but
vetoes and attempts to replace those which are linguistically anomalous, thus
preventing their articulation." (Motley, Camden, and Baars,
1982, p578)
Motley et al. then carried out a dozen or so studies in the late 1970s and early 1980s on cleverly induced Spoonerisms. They called their method SLIP - for "Spoonerisms of Laboratory-Induced Predisposition", explaining it as follows .....
"Subjects
are instructed to read silently a series of tachistoscopically-presented
word pairs, speaking aloud certain cued 'target' pairs. Unbeknownst to the
subject, these target word pairs are immediately preceded in the series by
'interference' pairs designed to phonologically resemble the spoonerised version of the intended target. For example,
the subject might read silently the interference items barred dorm and bought
dog immediately before seeing and attempting to articulate the target darn
bore. About 30% of subjects' attempted target utterances result in a
spoonerism - barn door in this example. Our most typical design has been
to compare the frequency of anomalous versus legitimate error utterances;
anomaly being defined according to various linguistic and quasi-linguistic
criteria." (Motley, Camden, and Baars, 1982, p579; italics original)
They then compared what they call the "slip-rate differential" between "legitimate" Spoonerisms and "anomalous" ones .....
"Our
most typical result is that legitimate errors far outnumber anomalous ones. For
example, a lexically legitimate SLIP spoonerism like darn bore > barn
door will occur much more frequently than a similar but lexically anomalous
one like dart board > bart doard.
[//] This slip-rate differential [has] been the primary form of evidence for prearticulatory editing. That is to say, the above example
[.....] can be taken as evidence that when the SLIP subject constructs a
lexically legitimate phoneme string, the string is allowed to be output;
whereas when the subject constructs lexically anomalous potential output, it is
vetoed (by the edit), and its articulation is disallowed." (Motley,
Camden, and Baars, 1982, p579;
italics original)
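The logic of the lexical edit can be sketched in a few lines of code. This is a toy illustration under stated assumptions, not Motley et al's procedure: the function names, the tiny lexicon, and the rough (non-rhotic) phonemic transcriptions are all invented for the example. Note also that the sketch vetoes every anomalous string outright, whereas the experimental finding is only that anomalous slips are much rarer than legitimate ones.

```python
# Toy lexicon of rough phonemic forms mapped to spellings. The transcriptions
# are illustrative assumptions, not data taken from the study.
LEXICON = {
    "dɑːn": "darn", "bɔː": "bore", "bɑːn": "barn", "dɔː": "door",
    "dɑːt": "dart", "bɔːd": "board",
}


def spoonerise(first, second):
    """Swap the initial consonants of two words given as (onset, rime) pairs."""
    (onset1, rime1), (onset2, rime2) = first, second
    return onset2 + rime1, onset1 + rime2


def prearticulatory_edit(candidate):
    """Approve a candidate slip only if both resulting strings are real words."""
    return all(form in LEXICON for form in candidate)


legit = spoonerise(("d", "ɑːn"), ("b", "ɔː"))       # darn bore
anomalous = spoonerise(("d", "ɑːt"), ("b", "ɔːd"))  # dart board

legit_passes = prearticulatory_edit(legit)          # barn door: allowed out
anomalous_passes = prearticulatory_edit(anomalous)  # bart doard: vetoed
```

A more faithful model would make the veto probabilistic, so that some anomalous strings still slip through at a lower rate.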
As to the underlying neural mechanisms, Crosson (1985) offers a view of speech production involving Broca's area, Wernicke's area, and various substructures of the thalamus and basal ganglia, all interlinked by circulating and re-circulating white matter tracts, and delivering both semantic and phonological monitoring. For details, see the separate paper, Crosson (1985).
Exercise 5 - Improving the Diagrams
Levelt's model has lost Butterworth's two-part higher functions system, so it fails to separate semantics and pragmatics. It has also lost Garrett's sentence type and clause structure frames, does not deal at all well with parallel processing [we criticise merely the diagram here, which does not reflect the full richness of the Levelt School's broader theory], and has only one "up arrow" when there are potentially many. Use your diagramming skills to produce a bigger, better, model [in other words, add in Levelt's "watchful little homunculus", if you can, and wherever you can].
6 - Pathological States Attributable to Defective Biological Control Systems
Now the reason box-and-arrow models are so important to clinicians is that there are a number of very well known communication pathologies - not least, stuttering, dyspraxia, and dyslexia - which are actually cybernetic problems at heart. We have already covered stammering in Section 5.1, so here are some of the others .....
6.1 Dyslexia Resulting from Poor Head/Eye Muscle Control
Dyslexia is an inability to process visually presented text efficiently. Whilst this is at first sight a perceptual problem, the very complexity of the oculomotor control system makes it a motor problem as well. You cannot read if you cannot control the movement of your eyes. When reading this text, for example, your eyes will be fixating after every eight characters (about every one and a half words) (Rayner and Pollatsek, 1989), and many authorities (typically Pavlidis, 1981/1985) believe that developmental dyslexia can be explained by defects in sequencing these fixations for maximum information uptake. Developmental dyslexics do appear to have eye movement patterns which differ from those of normal readers (Rayner and Pollatsek, 1989). However, this factor per se has not been strongly confirmed. Indeed, Rayner and Pollatsek place greater store in Stein and Fowler's (1982, 1984) findings of "vergence control" problems in dyslexics. Vergence movements are those which keep both eyes pointing at the same centre of attention. In normal readers, the two eyes move "conjugately", that is to say, they track at the same speed and in the same direction. Stein and Fowler's data suggest that about one in six cases of developmental dyslexia can improve reading performance with treatment of this problem in isolation.
As to the cybernetics of eye control, the oculomotor control system serves a variety of biologically essential behaviours such as food search and predator avoidance (Galiana, 1990). It therefore needs to be every bit as functionally sophisticated as the skeletomuscular system it is helping to guide. This functionality is provided by having a complex of feedforward, predictive, and feedback control loops at work. To start with, there are mechanisms controlling the automatic focussing of the lens, binocular vergence, and the automatic stopping down of pupillary aperture. There are then additional mechanisms to control the automatic positioning of the eye relative to the head as the head moves relative to both the body and the external world. These latter mechanisms place heavy information processing demands on the vestibular system, the system which processes the information provided by the semicircular canals of the inner ear (the "labyrinth"), the body's balance detectors. Information from the semicircular canals travels to the brainstem down the vestibular branch of the vestibulocochlear nerve (CN VIII). Here it links in via the vestibular nuclei of the lower pons to the cerebellum and a host of other components of the extrapyramidal system. Good reviews of this subject area can be found in Peterson and Richmond (1988), Galiana (1990), and Berthoz, Graf, and Vidal (1992).
6.2 Parkinsonism
The motor disorders which characterise Parkinson's disease are conventionally attributed to disorders of muscle control circuitry. Wiener himself likened Parkinsonian tremor to the oscillations of under-"damped" control loops (Wiener, 1950); Flowers (1978) blames lack of prediction; Harrington and Haaland (1991) blame "central processing deficits"; and Dinnerstein, Frigyesi, and Lowenthal (1962) blame slower than normal proprioceptive feedback for a variety of the standard Parkinsonian symptoms, such as rigidity, slowness, and lack of coordination.
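Wiener's damping analogy is easy to demonstrate numerically. The following is a generic textbook second-order control loop, offered as a hedged sketch and not as a model of any actual motor circuit: the unit-mass "limb", the stiffness and damping values, and the function names are all assumptions chosen for illustration.

```python
def step_response(damping, stiffness=100.0, dt=0.001, duration=3.0):
    """Simulate a unit-mass limb driven towards target position 1.0.

    The controller applies a restoring force proportional to the position
    error (stiffness) and a braking force proportional to velocity (damping).
    Returns the list of positions over time.
    """
    position, velocity, target = 0.0, 0.0, 1.0
    trace = []
    for _ in range(round(duration / dt)):
        acceleration = stiffness * (target - position) - damping * velocity
        velocity += acceleration * dt   # semi-implicit Euler: velocity first,
        position += velocity * dt       # then position using the new velocity
        trace.append(position)
    return trace


def count_target_crossings(trace, target=1.0):
    """Count how often the trace passes through the target position."""
    above = [p > target for p in trace]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)


under_damped = step_response(damping=1.0)   # far below critical damping
well_damped = step_response(damping=20.0)   # 2 * sqrt(stiffness): critical

oscillations = count_target_crossings(under_damped)
settles_cleanly = count_target_crossings(well_damped) == 0
```

With too little damping the simulated limb repeatedly overshoots and swings back through its target, which is the tremor-like behaviour of the analogy; with adequate damping it approaches the target once and stays there.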
6.3 Learning Difficulties
Many categories of learning difficulty present with an inability (amongst other things) to communicate effectively at a pragmatic level. This can be alleviated to a greater or lesser extent by training at what Williamson (1992) describes as "backchannel" skills. These include a wide variety of both vocal and nonvocal responses, such as nods, shakes, grunts, facial expressions, etc., whose function is to feed back to a speaker the extent to which his/her utterances are being understood. [This, by definition, must be working to Lee's highest level feedback loop - the "thought" loop.]
6.4 Anomia
Anomia [glossary] is an inability to find the name-word for something which is otherwise perfectly well understood. It is a very common clinical sign, and can arise from a variety of disease processes, both focal and diffuse, although it is particularly associated with injuries to the angular gyrus (Marshall, 1980). In its simplest form, anomia presents as difficulty with confrontational naming tasks [glossary], although the ability to describe a concept tangentially in the hope that this will compensate for the absence of its proper name is frequently preserved. Thus a patient might say "you cut your food with it" if s/he could not access the word "knife". This stratagem is known as circumlocution [glossary]. It is even possible for the lost target word to be included in the circumlocution even though it had been unavailable in isolation, as with "I'd use it to comb my hair" as a substitution for the word "comb" (Benson, 1979). Marshall (1980, p62) passes on a nice example of word finding difficulties in a patient describing a picture .....
"'That's
the ..... you know, the ..... very much like they got on the ..... on something
very much. I don't say that it's the proper one but it's like er er ..... I can't say it but I
can just ..... yes, that could be it, could be a bit like that, yes. [etc.]
(Marshall, 1977)."
As far as the explanatory models are concerned, we are fortunate that they were originally drawn up with anomia in mind. Driven by the mass of clinical data accumulated since the 1860s, all modern models consistently separate ideation from word selection, that is to say, they separate the semantic system from the output lexicons [this being the crucial difference between the linguistic and psycholinguistic usage of the word lexicon - see glossary]. Gnosis [glossary] is what the semantic system does, and naming is what the speech output lexicon does. Morton himself relates anomic aphasia to problems moving outwards from the semantic system to the output lexicon, just as did the nineteenth century diagram makers before him. He then contrasts this with optic aphasia where there are problems moving inwards towards the semantic system from the visual input lexicons (Morton, 1985).
But anomic aphasia is not the only condition in which word finding difficulty is found. Benson (1979) distinguishes no fewer than nine subtypes, of which the following five are to some extent aphasic .....
(a) Word Production Anomia: This is a
confrontational naming defect, but one which is resolvable upon phonemic
cueing (or "prompting"). If the patient is given the first
letter of the target word, the whole word suddenly becomes available. Patients
can appear to "know" the target name, but either cannot initiate its
production at all, or else produce a neologism instead. (This is therefore a
condition analogous to the "tip of the tongue" phenomenon discussed
in Section 1.3 above.)
(b) Word Selection Anomia: This is
another confrontational naming defect, but this time it is not usually
resolvable by cueing. Gnosis is intact (because patients can immediately point
to the object in question if told its name), and conversational speech is
otherwise fluent and effortless.
(c) Semantic Anomia: This is
an inability to use an object's name as a mental
symbol. It is superficially similar to (b), but patients cannot point to the
object in question if told its name.
(d) Category-Specific Anomia: This is
an anomia for a particular conceptual class of
objects. It is quite rare, nevertheless it has prompted authors such as Baron
(1976) and Allport (1985) to
describe the semantic lexicon as having various regions (or "zones",
or "domains", etc.), each dealing with a particular class of
attributes. Thus an object's pictorial attributes, colour attributes,
positional attributes, "eye-head-body movement" attributes, and even
smell and taste attributes, are regarded as being stored in separate parts of
one large distributed engram system.
(e) Modality-Specific Anomia: This is
an anomia for objects presented in one modality but
not another (visual, for example, but not auditory). However, it is probably
best treated as an optic (or auditory) aphasia, rather than as an anomia as such.
6.5 Agrammatism
This term derives from Kussmaul (1877) and refers to a Broca's-type aphasic condition characterised by sentence foreshortening and word morphology problems. The foreshortening is not haphazard, however, for it involves omitting many/all of a sentence's function words (articles, conjunctions, pronouns, prepositions, and auxiliary verbs) and inflectional word endings (-s, -ed, -ing). The end result is what is known as telegraphic speech, a word sequence built up mainly of nouns, but broken up by the occasional verb and qualifier (Goodglass and Menn, 1985). The conjunction "and" is often spared, although this may evidence a repair strategy more than a true cognitive ability. Here are some examples .....
"First
morning, drink coffee, and sweep and go field, afternoon such a pill, one and
go field ....." (Heilbronner, 1906, cited in
McCarthy and Warrington, 1990.)
"Cinderella
... poor ... um 'dopted her ... scrubbed floor, um,
tidy ... poor, um ... 'dopted ... Si-sisters and
mother ... ball. Ball, prince, um, shoe ...[prompt to continue] Scrubbed and uh
washed and uh ... tidy, uh, sisters and mother, prince, no, prince, yes. Cinderella
hooked prince. (laughs). Um, um shoe, um, twelve o'clock, ball /pInaSt/, finished." (Schwartz, Linebarger,
and Saffran, 1985, p84;
Patient "ME".)
Further
examples in McCarthy and Warrington (1985).
As far as the underlying anatomy is concerned, McCarthy and Warrington (1990, p185) conclude that "the [Broca's symptom complex] is often associated with relatively widespread lesions affecting both anterior language areas (frontal lobe), deeper structures (insula), as well as anterior temporal lobe damage". As far as the underlying processing is concerned, Kolk, Van Grunsven, and Keyser (1985) and Caplan (1985) have explicitly linked agrammatic conditions to Garrett's model (which, it will be recalled, was originally developed from speech error data from normal subjects). They conclude that internal language is inherently telegraphic at the best of times, at least at all stages prior to Garrett's functional stage.
A similar line of argument has been developed more recently by Grodzinsky (1990), who has approached agrammatism as a linguist. He describes surface speech as lacking both non-lexical terminals and governed prepositions. Indeed, in stark contrast to the anomias, the only thing agrammatic patients are left with is a naming ability! However, it is unlikely that a final answer will be possible until more is known about normal speech production, that is to say, until we have better speech production models to work with. (And, specifically, models which can link the hard facts of linguistic theory to the more advanced theories of semantic memory structure.)
Exercise 6 - Agrammatism Simulated
1 Rewrite the preceding paragraph to exclude all articles, conjunctions, pronouns, prepositions, and auxiliary verbs, and all noun- and verb-root endings. Read the residual text out loud.
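For those who would rather automate the exercise, the deletion rule can be roughed out in code. This is a crude sketch under loud assumptions: the function-word list is a small hand-picked sample and not a full inventory, the ending-stripper is naive (it will mangle words such as "class"), and the test sentence is made up for illustration and is not patient data.

```python
import re

# Small illustrative closed-class word list (an assumption for this sketch;
# a real analysis would need a much fuller inventory).
FUNCTION_WORDS = {
    "a", "an", "the",                                    # articles
    "and", "but", "or", "that", "until",                 # conjunctions
    "i", "he", "she", "it", "we", "they", "who",         # pronouns
    "as", "at", "by", "in", "of", "on", "to", "with",    # prepositions
    "is", "are", "was", "be", "been", "has", "have", "will",  # auxiliaries
}
INFLECTIONS = ("ing", "ed", "s")


def strip_inflection(word):
    """Crudely remove one inflectional ending, keeping at least three letters."""
    for ending in INFLECTIONS:
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[:-len(ending)]
    return word


def telegraphic(text):
    """Drop function words and strip endings from the words that remain."""
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(strip_inflection(w) for w in words if w not in FUNCTION_WORDS)


# Invented test sentence, not a clinical specimen.
sample = telegraphic("The sisters washed the floor and walked to the ball")
```

The output is only a caricature of telegraphic speech, since real agrammatic output also shows the hesitations, restarts, and spared "and" seen in the specimens above.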
6.6 Jargon Aphasia
The term "jargon aphasia" derives from Alajouanine, Sabouraud, and Ribaucourt (1952), and is "a rare and spectacular manifestation of an aphasic condition" (Butterworth, 1985, p61). By contrast with agrammatism, the phonology and prosody of the host language are retained, as are many of the rules of morphology (the nonsense is often appropriately matched for number, case, and gender). In addition, the patient is often blissfully unaware of the impairment. Three different syndromes have been identified (Butterworth, 1985) .....
(a) Semantic Jargon: This is where "the words employed, although real,
are semantically inappropriate, sometimes to the extent of seeming stripped of
their normal meaning" (Butterworth, 1985, p63).
Here is a specimen: "Experimenter: What does 'strike while the iron
is hot' mean? Patient: Better to be good and to Post Office and to
Pillar Box and to distribution to mail and survey and headmaster. Southern
Railways very good and London and Scotland" (Kinsbourne
and Warrington, 1963; Patient "EF", cited
in Buckingham, 1985).
(b) Neologistic
Jargon: This is where speech includes
made-up words - words not found in the dictionary. Butterworth (1979) reports
that neologisms were used as nouns (61%), verbs (20%), or adjectives (15%) -
the categories known as content words, where each word must be chosen from a
large number of options. Neologisms were rare in function word context (4% in
total). Here is a specimen: "A man is asked the question, 'Who is running
the store now?' He replies, 'I don't know. Yes the bick,
uh, yes I would say that the mick daysys
nosis or chpickters.
Course, I have also missed on the carfter teck. Do you know what that is? I've, uh, token to ingish. They have been toast sosilly.
They'd have been put to myafa and made palis and, uh, myadakal senda you. That is me alordisdus.
That makes anacronous senda'"
(Buckingham and Kertesz, 1976, cited in Marshall,
1980, p62).
(c) Phonemic Jargon: This is where speech degenerates into a succession of
meaningless sounds, so that it becomes impossible to identify word boundaries.
Some phonotactic rules remain obeyed, as with the
clusters "tr", "nkr",
"str", and "mbr"
in the following specimen: "When asked to read the sentence It shall be
in the power of the College to examine or not examine any licentiate,
previously to his admission to a fellowship, as they shall think fit, he
produced the following: A the be what in the temother
of the trothotodoo to majorum
or that emidrate ein einkrastrai mestreit to ketra totombreidei to ra fromtreido as that kekritest." (Perecman
and Brown, 1981, p178; italics original.)
Exercise 7 - The Three Jargon Types Simulated
1 Rewrite the next paragraph, replacing every second noun by a semantically random word or short phrase (picked from a dictionary "with a pin"). Read the resulting "semantic jargon" out loud.
2 Repeat (1), but this time replacing every fourth word with a made-up (nonsense) word. Read the resulting "neologistic jargon" out loud.
3 Just read the following "phonemic jargon" out loud: "temother of the trothotodoo to majorum or that emidrate".
As far as the underlying anatomy is concerned, Kertesz (1981) reviewed ten cases of neologistic jargon in detail and found a significant pattern to the underlying lesions. He concluded that "the most consistently affected regions are the supramarginal gyrus, the posterior parietal operculum, the inferior parietal lobule, the first portion of the first temporal gyrus, the posterior temporal operculum (planum temporale), and the angular gyrus" (Kertesz, 1981, p100).
6.7 Dyspraxia of Speech
Given our earlier definition of praxis, it follows that the essence of a dyspraxia has to be an impaired ability to initiate voluntary movements - an inability to move the tongue to lick the lips when commanded, for example.
ASIDE - PRAXIS VS REFLEX: It
is important to realise that the defect is solely one of initiating the
movement, and that the muscles and motor systems themselves are intact. If the
initiation is reflex or unwilled in any way - licking honey from the lips,
perhaps - the information comes across the standard A-shaped processing
hierarchy rather than down it, and the movement can be performed perfectly
well!
Dyspraxic defects were first formally described by the German aphasiologist Liepmann (1900, 1905), and his explanation stuck closely to the speech production stages paradigm we saw so much of in Sections 3 and 4 above. Patients who cannot mentally conceive of having the required movement are deemed to have an ideational apraxia ('ideatorische apraxie'), patients who can have the idea, but not communicate that idea to the appropriate motor systems, are deemed to have an ideokinetic apraxia ('ideo-kinetische apraxie'), and patients whose motor systems cannot cope properly with the instructions sent to them are deemed to have a kinetic apraxia ('kinetische apraxie'). Subsequently, the German psychiatrist Kleist (1912) described the deficit of constructional apraxia, in which the ability to organise actions in space is affected. He regarded this as yet another disconnection syndrome, this time of the ability to transmit information between the processes of spatial analysis and those of voluntary action. A vivid description of dyspraxic speech is given by Darley, Aronson, and Brown .....
"As
they speak, they struggle to position their articulators correctly. They
visibly and audibly grope as they struggle to produce correct articulatory postures and to accomplish a sequence of these
postures in forming words. Their articulation is frequently off target. They
often recognise that they are off target and effortfully
try to correct the error. Their errors recur, nonetheless, but they are not
always the same; the errors on a series of trials are highly variable. As
patients struggle to avoid articulatory error by
careful programming of muscle movements, they slow down, space their words and
syllables evenly, and stress them equally. Thus the prosody of their speech is
altered as well as their articulation." (Darley, Aronson, and Brown, 1975,
p250)
Phoneme substitutions, additions, repetitions, and prolongations are common. Thus .....
"I
am looking an a drawring
or a-a pec-picture of what is
apparently a tor-nuh-ner-nor-tornatiuhd blew-brewing in the c-countryside.
This is having an nuh-nuhmediate and
frightening ef-f-ff-fuh-feck on a fairm famerly num-ber-ing - - - six uh humans and af-ff-sss-uh-sh-suh-sorted
farm uh animals. There are quick-uh-ly
going into a-a sss-sor-sormb uh cellar
with fright in their ar-uh-eyes and in
their every - movement." (Darley, Aronson, and Brown, 1975, p250.
Underlines indicate errors and hyphens indicate hesitation.)
In the language of control theory, the suspicion is that a major feedforward mechanism is failing. Indeed, this underlies the distinction between planning and executive apraxias adopted by such authorities as Michael Crary of the University of Florida Health Science Centre, Gainesville (Crary, 1993, with due acknowledgement to an earlier paper by Roy, 1978). The Roy-Crary scheme identifies four subtypes of the disorder .....
(a) Primary Planning
Dyspraxia: This is a high level
planning defect, and is usually associated with frontal lobe pathology.
(b) Secondary Planning
Dyspraxia: This is a lower level
planning defect - a defect of spatial organisation, perhaps, rather than an
inability to think ahead in its broadest sense. It is usually associated with
parietal-occipital pathology.
(c) Unit Dyspraxia: This is a defect which affects one particular muscle
system more than the remainder.
(d) Executive Dyspraxia:
This is an inability to execute
movements which have successfully passed the planning stage. It is usually
associated with premotor area pathology. According to
Roy: "..... although the patient is able to plan the motor activity as the
frontal areas are intact and the pathways from the frontal to the premotor area are undisturbed, he is unable to execute the
movement sequence properly due to damage to the area responsible for outputting
the planned motor sequence" (Roy, 1978, p197).
Also from the University of Florida, Rothi and Heilman (1996) bring the story full circle by explicitly cross-mapping Liepmann's early descriptions onto a modern cognitive neuropsychological model (not dissimilar to our old friend, the PALPA model). The meaning of actions - that is to say, their utility to the organism in drawing up its plans - is seen as being mediated by a central "action semantics" store, whilst the supporting physical attributes are mediated by a secondary array of "action lexicons". It is a significant vindication of the transcoding school to find that their models - derived as they were from dyslexias and the like - can be applied just as effectively to human movement in its broadest form!
6.8 Dysarthria of Speech
Before proceeding, students should refresh their memories concerning the essentials of the motor system [revise it now]. Brainstem and cranial nerve anatomy [revise it now] is particularly important, as is the differential layout of the pyramidal and extrapyramidal systems!
Dysarthria of speech is the name given to speech difficulties arising from problems controlling the musculature involved. Some idea of the precision required can be gained from the fact that the tongue and lip movements in normal speech need to be coordinated to within 1/100th of a second and made to an accuracy of 1 mm (Netsell, 1986). So what happens when this accuracy of placing and timing starts to fall away? Well, to start with .....
"In
dysarthria one finds evidence of slowness, weakness, incoordination, or change of tone of the speech
musculature. [All] basic motor processes - respiration, phonation, resonance,
articulation, and prosody - are variably involved. [The] most characteristic
error made by dysarthric patients is imprecise
production of consonants, usually in the form of distortions and
omissions." (Darley, Aronson, and Brown, 1975, p251.)
The situation is seriously complicated, however, because the motor mechanisms mentioned above (respiration, phonation, resonance, articulation, and prosody) are controlled by a minutely intricate arrangement of both spinal and cranial nerves. There are, as a result, several clinically distinct subtypes of dysarthria. Darley, Aronson, and Brown identify no fewer than six, as now described .....
(a) Flaccid Dysarthria: This
type of dysarthria arises from damage at the level of
the lower motor neuron. Two clusters of lower motor neurons are
significant. Firstly, there are spinal nerve outflows from both thoracic
and lumbar regions. These innervate abdominal and intercostal
muscles, including the diaphragm, and thereby control the respiratory aspect of
phonation. Secondly, there are cranial nerve outflows from the lower pons and the medulla. Of these, CN
V (trigeminal), CN VII (facial), CN
X (vagus), and CN XII
(hypoglossal) have sensory and motor functions in pharynx and jaw. Clinically,
the salient features are muscle weakness and hypotonia,
and reduced or absent stretch reflexes. The condition is found in the various
subtypes of bulbar palsy (depending on which of the cranial nerves is
damaged). Speech presents as "slow, hypernasal,
and breathy, with reduced loudness and reduced pitch variability" (Netsell, 1986, p41).
(b) Spastic Dysarthria: This
type of dysarthria arises from damage at the level of
the upper motor neuron. As a result, lateralised lesions to corticospinal tracts will present as contralateral
defects, whilst those to corticobulbar tracts (not
fully decussating) will present as milder bilateral weaknesses of tongue, lips,
or face. The condition is found in what is known as pseudobulbar
palsy (bilateral corticobulbar damage at the
level of the upper motor neuron) and spastic hemiplegia.
Speech presents as slow, indistinct, and effortful, as though produced against
resistance. The descriptions "rasping" and "dragging" may
also be encountered.
(c) Ataxic Dysarthria: This
type of dysarthria arises from damage to the cerebellar system, most frequently bilaterally. The
condition is found in, for example, Friedreich's
ataxia. The salient features are inaccuracy and clumsiness of articulation,
alternating with some staccato speech periods and unexpected changes of pitch.
(d) Hypokinetic
Dysarthria: This
type of dysarthria arises from damage to the extrapyramidal system, thus implicating structures such as
the basal ganglia. Clinically, the salient features are slowness of movement,
limited range, rigidity, and tremor at rest. The condition is found in Parkinson's
disease and some Huntington's disease. Speech is soft due to poor
coordination of respiration and phonation, and wavering due to inefficient
vocal fold "hunting" [glossary].
When laryngeal closure eventually fails totally, then all voicing is lost and
only whispering is possible.
(e) Hyperkinetic Dysarthria: This
type of dysarthria also arises from damage to the extrapyramidal system, but with a different set of salient
features, namely jerks, tics, chorea [glossary], ballism [glossary], and athetosis [glossary]. The condition is
found in some Huntington's disease, Tourette's
syndrome, and Sydenham's chorea (more commonly known as St Vitus' dance). Speech presents as hesitant and randomly
discontinuous due to disruption of orderly phonation.
(f) Mixed Dysarthrias: This
type of dysarthria arises from multifocal or diffuse
lesions, that is to say, those which damage more than one part of the motor
system. The condition is found in multiple sclerosis, and the symptoms
are highly varied as a result of the multifocal nature of the underlying
pathology.
7 - Remarks
Finally, it is worth looking again at the speech
models identified in Section 4 in the light of the speech production defects
listed above. The anomias seem to be defects at or
around linguistic control level, either with having an idea in the first place,
or else with extracting an item from the semantic lexicon with which to express
that idea. The agrammatisms seem to be defects at or
around the lexical assembly area: the small clause [glossary]
as it is produced by the linguistic controller is assumed to be inherently
telegraphic at the best of times, but in this particular type of defect is then
denied further expansion by defective morphological and syntactic assembly
processes. The jargons are defects across a somewhat broader spectrum: in all
three subtypes, it is as if (a) ideation itself is impaired, or (b) the
linguistic controller and sentence assembly processes are so impaired as to
make it seem as though ideation is impaired. The dyspraxias,
too, can show themselves at a number of levels along the planning-execution
dimension, from ideation at the top to phonetic planning at the bottom. And the
dysarthrias are defects in execution, in either its feedforward or feedback aspects. Note how difficult
it is to locate the defect accurately in most cases.
References
See the Master References List