Dr. Theodore has received a 3-year grant from the Division of Behavioral and Cognitive Sciences of the National Science Foundation titled “Collaborative research: An integrated model of phonetic analysis and lexical access based on individual acoustic cues to features.” The work will be carried out by teams at UConn (with Dr. James Magnuson and Dr. Paul Allopenna) and at MIT (Dr. Stefanie Shattuck-Hufnagel and Dr. Elizabeth Choi). The public abstract is shown below.
Abstract: One of the greatest mysteries in the cognitive and neural sciences is how humans achieve robust speech perception given extreme variation in the precise acoustics produced for any given speech sound or word. For example, different speakers can produce different acoustics for the same vowel sound, while in other cases the acoustics for two different vowels may be nearly identical. The acoustic patterns also change depending on the rate at which the sounds are spoken. Listeners may even perceive a sound that was never actually produced, because pronunciations are often massively reduced (e.g., the “t” and “y” sounds in “don’t you” are often reduced to “doncha”). Most theories assume that listeners recognize words in continuous speech by extracting consonants and vowels in a strictly sequential order. However, previous research has failed to find robust, invariant information in the acoustic signal that would allow listeners to do so.
This project uses a new tool for the study of language processing, LEXI (for Linguistic-Event EXtraction and Interpretation), to test the hypothesis that individual acoustic cues for consonants and vowels can be extracted from the signal and used to determine the speaker’s intended words. When some acoustic cues for speech sounds are modified or missing, LEXI can detect the remaining cues and interpret them as evidence for the intended sounds and words. This research has potentially broad societal benefits, including optimization of machine-human interactions to accommodate atypical speech patterns seen in speech disorders or accented speech. The project supports training of 1-2 doctoral students and 8-10 undergraduate students through hands-on experience in experimental and computational research. All data, including code for computational models, the LEXI system, and speech databases labeled for acoustic cues, will be publicly available through the Open Science Framework; preprints of all publications will be publicly available at PsyArXiv and NSF-PAR.
This interdisciplinary project unites signal analysis, psycholinguistic experimentation, and computational modeling to (1) survey the ways that acoustic cues vary in different contexts, (2) experimentally test how listeners use these cues through distributional learning for speech, and (3) use computational modeling to evaluate competing theories of how listeners recognize spoken words. The work will identify cue patterns in the signal that listeners use to recognize massive reductions in pronunciation and will experimentally test how listeners keep track of this systematic variation. This knowledge will be used to model how listeners “tune in” to the different ways speakers produce speech sounds. By using cues detected by LEXI as input to competing models of word recognition, the work provides an opportunity to examine the fine-grained time course of human speech recognition with large sets of spoken words; this is an important innovation because most cognitive models of speech do not work with speech input directly. Theoretical benefits include a strong test of the cue-based model of word recognition and the development of tools to allow virtually any model of speech recognition to work on real speech input, with practical implications for optimizing automatic speech recognition.
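To make the idea of cue-based word recognition more concrete, here is a minimal, purely illustrative sketch. It is not the LEXI system or any of the project's actual models; the cue labels, the mini-lexicon, and the simple scoring rule are all hypothetical placeholders. The point it illustrates is the one made in the abstract: discrete acoustic-cue events detected in the signal can incrementally accumulate evidence for candidate words, so that even a reduced pronunciation with a missing cue (like the dropped “t” in “doncha”) still activates the intended word.

```python
"""Toy sketch of cue-based incremental word recognition.

Hypothetical throughout: cue labels, lexicon entries, and the scoring rule
are invented placeholders, not the project's LEXI cues or models.
"""

from dataclasses import dataclass


@dataclass
class CueEvent:
    time_ms: int  # when the cue was detected in the signal
    label: str    # e.g., a burst, formant, or nasal-murmur label


# Hypothetical mini-lexicon: each word listed with the cue labels that
# would count as evidence for it, in rough temporal order.
LEXICON = {
    "dont": ["stop-burst-d", "vowel-o", "nasal-n", "stop-burst-t"],
    "doncha": ["stop-burst-d", "vowel-o", "nasal-n", "affricate-ch", "vowel-a"],
    "dot": ["stop-burst-d", "vowel-o", "stop-burst-t"],
}


def incremental_scores(cues: list[CueEvent]) -> dict[str, list[float]]:
    """Return, for each word, its match score after each successive cue.

    The score is simply the fraction of the word's expected cues observed
    so far, so partial or reduced pronunciations still accumulate evidence
    (a stand-in for graded, cue-based lexical activation over time).
    """
    history: dict[str, list[float]] = {word: [] for word in LEXICON}
    observed: list[str] = []
    for cue in sorted(cues, key=lambda c: c.time_ms):
        observed.append(cue.label)
        for word, expected in LEXICON.items():
            matched = sum(1 for label in expected if label in observed)
            history[word].append(matched / len(expected))
    return history


if __name__ == "__main__":
    # A reduced "don't you" token: the final /t/ burst is missing, but the
    # remaining cues still favor the intended words over a competitor.
    detected = [
        CueEvent(10, "stop-burst-d"),
        CueEvent(60, "vowel-o"),
        CueEvent(120, "nasal-n"),
        CueEvent(180, "affricate-ch"),
    ]
    for word, trajectory in incremental_scores(detected).items():
        print(word, [round(score, 2) for score in trajectory])
```

Printing the score trajectories shows evidence building cue by cue, a crude analogue of the fine-grained time course of recognition that the project will examine with models that take real speech input.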