Lecturer's Précis - Cherry (1953)

"Some experiments on the recognition of speech, with one and with two ears"

Copyright Notice: This material was written and published in Wales by Derek J. Smith (Chartered Engineer). It forms part of a multifile e-learning resource, and subject only to acknowledging Derek J. Smith's rights under international copyright law to be identified as author may be freely downloaded and printed off in single complete copies solely for the purposes of private study and/or review. Commercial exploitation rights are reserved. The remote hyperlinks have been selected for the academic appropriacy of their contents; they were free of offensive and litigious content when selected, and will be periodically checked to have remained so. Copyright © 2002-2018, Derek J. Smith.

First published online 08:11 BST 24th June 2002, Copyright Derek J. Smith (Chartered Engineer). This version [2.0 - copyright] 09:00 BST 4th July 2018.

Cherry's (1953) Dichotic Listening Research

This was a classic study into the cognitive system's ability to deal with competing auditory inputs. The following definitions and distinctions are important:

Monaural vs Binaural: To hear with one or two ears respectively. In normal circumstances, all sound sources are processed binaurally, and the auditory system is very good at using the minute differences between the two ears in time of arrival (it can detect differences down to 30 millionths of a second) and in sound intensity to compute the direction from which a sound is coming.
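To give a feel for how small these timing cues are, the sketch below uses a deliberately simplified model. This is an assumption for illustration, not something taken from Cherry's paper: the ears are treated as two points a fixed distance apart, so a distant source at azimuth θ reaches the far ear later by Δt = d·sin θ / c. The ear separation (0.2 m) and speed of sound (343 m/s) are illustrative values.

```python
import math

# Illustrative assumptions (not from Cherry, 1953):
EAR_SEPARATION_M = 0.2       # approximate distance between the two ears
SPEED_OF_SOUND_M_S = 343.0   # speed of sound in air at about 20 degrees C

def interaural_time_difference(azimuth_deg: float) -> float:
    """Arrival-time difference (seconds) for a distant source at the given azimuth,
    using the simple path-difference model d * sin(theta) / c."""
    return EAR_SEPARATION_M * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND_M_S

def azimuth_from_itd(itd_s: float) -> float:
    """Invert the same model: recover the azimuth (degrees) from a time difference."""
    ratio = itd_s * SPEED_OF_SOUND_M_S / EAR_SEPARATION_M
    ratio = max(-1.0, min(1.0, ratio))  # clamp to the valid asin domain
    return math.degrees(math.asin(ratio))

if __name__ == "__main__":
    print(f"Largest ITD (source at 90 deg): {interaural_time_difference(90) * 1e6:.0f} microseconds")
    print(f"Azimuth for a 30 microsecond difference: {azimuth_from_itd(30e-6):.1f} deg")
```

Under these assumptions the largest possible difference (a source directly to one side) is only about 583 microseconds, and a 30-microsecond difference corresponds to a shift of roughly 3 degrees from straight ahead.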

Diotic vs Dichotic: Diotic is the same as binaural, and means hearing one, two, or more sound sources with two ears, and then making sense of where the sound or sounds are coming from. Dichotic, on the other hand, refers to the artificially generated state of hearing a different sound with each ear, as when separate messages are presented through earphones (Trimble, 1931). The dichotic listening paradigm described below is therefore a good way of pushing the perceptual system beyond its natural limitations in an attempt to see more clearly how it is put together.

Against this background, Cherry (1953) conducted six sets of experiments, as follows:

· The Basic "Mixed Message" Paradigm: In the first two series of experiments, Cherry investigated how we recognise what one person is saying when others are speaking at the same time, a situation he described as "the 'cocktail party problem'" (p976). Subjects were presented with two different spoken messages, both recorded by the same speaker onto a single audiotape (ie. "mixed", in a tape editing sense) and played back via headphones. Both messages were therefore simultaneously and equally available to both ears, approximating real-life competing conversation. Subjects were instructed to repeat one of the messages word by word or phrase by phrase. Cherry's observations were (a) that subjects reproduced the message at phrase level rather than word level, and (b) that there were extremely few transpositions of material from the to-be-rejected message, except where the competing sentence structures happened to offer a highly probable transition from the attended message into the rejected one. Subjects generally reported great difficulty with the task, but this eased appreciably if they were allowed to make written notes.

· Predictability: In this series of experiments, Cherry arranged for the mixed material to be full of clichés, that is to say, "highly probable phrases" such as "the time has come to stop beating around the bush". His observation was that output tended to consist of whole clichés, and that recognition of just the first one or two words of a stock phrase would typically prompt the entire phrase. Successful message separation, however, was "impossible", and a cliché from one message would as likely as not be followed by one from the other message. [Bear in mind that the strings of clichés within each message did not create a particularly sensible overall message, so there was no strong narrative theme to hold the parts together.]

· The Basic "Unmixed Message" Paradigm: In the remaining sets of experiments, subjects were presented with two different spoken messages, recorded by the same speaker onto separate audiotapes (ie. "unmixed", in a tape editing sense) and played back through headphones, one message to each earpiece. Unlike in the mixed message paradigm, each ear now heard only one message. Again, subjects were instructed to repeat one of the messages (always the right ear message) as accurately as possible. Cherry's general observations were (a) that subjects could switch between messages at will, (b) that they could repeat the selected message easily and accurately, though with a slight delay, (c) that their speaking voice became monotonous, with "little emotional content or stressing of the words", (d) that they remained unaware of this, (e) that they "may have very little idea" what the message was about, and (f) that they took in very little of the content of the rejected message. Indeed, if the language of the unattended message was changed from English to German a few seconds into the trial, once shadowing of the target message had been successfully established, the change usually went undetected. This observation prompted further investigation of what sort of information, if any, was available from the rejected message.
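The difference between the mixed and unmixed presentations can be expressed in simple signal terms. In the sketch below, each message is a list of audio samples; "mixing" onto one track is modelled as averaging the two signals sample by sample and sending the result to both earphones, while the unmixed (dichotic) condition routes each message to its own earpiece. The function names and the averaging step are illustrative assumptions, not a description of Cherry's actual equipment.

```python
from typing import List, Tuple

Signal = List[float]
Stereo = Tuple[Signal, Signal]  # (left earphone, right earphone)

def mixed_presentation(message_a: Signal, message_b: Signal) -> Stereo:
    """Both messages combined onto one track (averaged to avoid clipping),
    so each ear receives the same mixture. zip() stops at the shorter message."""
    mixture = [(a + b) / 2 for a, b in zip(message_a, message_b)]
    return mixture, list(mixture)

def unmixed_presentation(left_message: Signal, right_message: Signal) -> Stereo:
    """Each message goes to its own earpiece; neither ear hears the other message."""
    return list(left_message), list(right_message)

if __name__ == "__main__":
    a, b = [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]
    print(mixed_presentation(a, b))    # identical mixture in both ears
    print(unmixed_presentation(a, b))  # a to the left ear only, b to the right ear only
```

In the mixed case the two channels are identical, so the ears carry no cue that separates the messages; in the unmixed case the separation is built into which ear receives which signal.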

· Penetration of the Rejected Message: In this series of experiments, Cherry looked at what information, if any, remained available to the listener from an otherwise unattended message. He arranged for the unattended left ear message to change from its normal form (English spoken by a male voice) once the trial was under way. His observations were (a) that a change from forward to backward speech (same sound profile, but no lexical or semantic content) was noticed as having "something queer about it" by some subjects but not noticed at all by others, (b) that a change from male to female voice was "nearly always" identified, (c) that a change to a 400 Hz tone was always noticed, and (d) that subjects could not say with certainty what language was being used.

· Same Message, Time Delayed: In this series of experiments, Cherry wished to investigate the mechanisms by which the brain decides whether the messages arriving at the two ears come from a single source, a state of affairs he referred to as "correlated". The point is that when two inputs are correlated, they need to be merged internally despite naturally occurring ear-to-ear differences in intensity and arrival time, whereas when they come from different sources one of them needs to be rejected internally. He therefore presented an identical message to each ear, but with the left-ear message (to be rejected) delayed relative to the right-ear message (to be shadowed). This was achieved by running a single length of pre-recorded audiotape through two physically separated tape players. The second tape player was then gradually moved closer to the first, thus reducing the playback delay. Cherry's observation was that "nearly all" subjects eventually recognised words or phrases in the rejected message as matching those in the attended ear. Cherry remarks that this is actually quite surprising, given that when different messages are used little or nothing of the content is available from the rejected ear. The delay at which such recognition took place varied considerably between subjects, but was typically 2-6 seconds.
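The delay in this arrangement depends only on how much tape lies between the two playback heads and how fast the tape moves, so moving the machines closer together shortens the delay in direct proportion. The sketch below assumes a tape speed of 7.5 inches per second (19.05 cm/s), a common speed of the period; the speed Cherry actually used is not stated here, so the tape lengths printed are illustrative only.

```python
# Assumed tape speed: 7.5 inches per second (not taken from Cherry, 1953).
TAPE_SPEED_CM_S = 7.5 * 2.54  # about 19.05 cm/s

def playback_delay_s(tape_between_heads_cm: float, speed_cm_s: float = TAPE_SPEED_CM_S) -> float:
    """Time for a point on the tape to travel from the first playback head to the second."""
    return tape_between_heads_cm / speed_cm_s

def tape_length_for_delay_cm(delay_s: float, speed_cm_s: float = TAPE_SPEED_CM_S) -> float:
    """Length of tape that must lie between the heads to produce a given delay."""
    return delay_s * speed_cm_s

if __name__ == "__main__":
    for delay in (2.0, 6.0):  # the typical recognition range reported above
        print(f"{delay:.0f} s delay needs about {tape_length_for_delay_cm(delay):.0f} cm of tape")
```

At this assumed speed, the reported 2 to 6 second range corresponds to roughly 38 to 114 cm of tape between the heads.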

· Same Message, Alternating Ear: This series of experiments was prompted by the observation that it took a finite amount of time to switch attention from one ear to the other. Cherry recorded long samples of speech and switched them between his subjects' ears either (a) randomly or (b) periodically. When this switching was slow (say once a second), subjects could shadow with 100% accuracy. When it was fast (say 20-50 times a second), most subjects could shadow "the majority" of the words, reporting that "they listened as though to both ears simultaneously" (p979). However, at intermediate rates of around six or seven switches a second, accuracy fell. To investigate this critical rate in more detail, Cherry introduced short periods of silence into the message. When played to one ear this meant hearing about 150 msec of message, followed by 10 msec of silence, followed by the next message block, followed by the next silence, and so on (equivalent to six or seven cycles per second). Accuracy in this condition was 95-100%. When successive message blocks were switched to alternate ears, however, accuracy fell to less than 20%. Cherry concluded that this particular switching rate coincided with the very short time interval required to transfer attention from one ear to the other, so that by the time attention had been switched it needed to be switched back again.
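The timing of the final condition can be checked with simple arithmetic: one cycle of 150 msec of speech plus 10 msec of silence lasts 160 msec, giving 1000 / 160 = 6.25 blocks per second, which is the "six or seven" figure quoted. The sketch below reproduces that calculation and lays out the block schedule for the alternating-ear version, in which successive speech blocks go to opposite ears. The function names and the choice to start with the right ear are illustrative assumptions.

```python
from typing import List, Tuple

def blocks_per_second(speech_ms: float, silence_ms: float) -> float:
    """Number of speech-plus-silence cycles completed in one second."""
    return 1000.0 / (speech_ms + silence_ms)

def alternating_schedule(n_blocks: int, speech_ms: float = 150.0,
                         silence_ms: float = 10.0) -> List[Tuple[float, float, str]]:
    """(start_ms, end_ms, ear) for each speech block; ears alternate block by block.
    The silent gaps fall between the listed intervals."""
    schedule = []
    cycle_ms = speech_ms + silence_ms
    for i in range(n_blocks):
        start = i * cycle_ms
        ear = "right" if i % 2 == 0 else "left"  # assumed starting ear
        schedule.append((start, start + speech_ms, ear))
    return schedule

if __name__ == "__main__":
    print(blocks_per_second(150, 10))  # 6.25
    for block in alternating_schedule(4):
        print(block)
```

In this schedule each ear receives 150 msec of speech in every 320 msec, so attention would have to cross from one ear to the other about six times a second to follow the message.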

References

Cherry, E.C. (1953). Some experiments on the recognition of speech, with one and with two ears. Journal of the Acoustical Society of America, 25(5):975-979.

Trimble, O.C. (1931). Concerning the meaning of the terms diotic and dichotic. American Journal of Psychology, 43:144.