The movement of facial skin and muscles around the mouth plays an important role not only in the way the sounds of speech are made, but also in the way they are heard… “How your own face is moving makes a difference in how you ‘hear’ what you hear,” said first author Takayuki Ito, a senior scientist at Haskins.
Note that this sentence says that facial movement doesn’t affect what you hear; it only affects how you “hear” what you hear. More on this below.
When Ito and his colleagues used a robotic device to stretch the facial skin of “listeners” in a way that would normally accompany speech production, they found it affected the way the subjects heard the speech sounds.
The subjects listened to words one at a time that were taken from a computer-produced continuum between the words “head” and “had.” When the robot stretched the listener’s facial skin upward, words sounded more like “head.” With downward stretch, words sounded more like “had.” A backward stretch had no perceptual effect.
And timing of the skin stretch was critical: perceptual changes were only observed when the stretch was similar to what occurs during speech production.
These effects of facial skin stretch indicate the involvement of the somatosensory system in the neural processing of speech sounds. This finding contributes in an important way to our understanding of the relationship between speech perception and production. It shows that there is a broad, non-auditory basis for “hearing” and that speech perception has important neural links to the mechanisms of speech production.
“Listeners,” “hearing”… Why do I worry so much about these damn quotation marks? Because they point out an assumption we tend to make about perception: that there are objective sense data out there in the world, ready to be accessed through our senses. Within this model, secondary effects (caused by face-pulling robots) are seen as tricks played on our minds. But this is backwards. The astounding implication of this research is that our minds are composed of these tricks; the tricks are what produce a stable reality that meets our expectations.
For example, when the researchers were listening to recordings of the words “had” and “head” in order to design their experiment, the shape of their faces must have affected their hearing. (At least, that’s what their research seems to imply.) So who can listen without “listening”? Who determines whether the word is really “had” or “head”—someone without any facial expression at all?
The paper itself, which I haven’t read, can be purchased here.