Research Interests

Of particular interest to me at the moment are how we process auditory signals and extract semantic information from them, and how to democratize access to linguistic information, research techniques, and speech technologies, which have in some cases been locked into black-box proprietary software or esoteric web interfaces. To those ends, I am currently engaged in a few research projects.

The Role of Acoustic Distance in Spoken Word Recognition

Descriptions of spoken word recognition frequently invoke the notion of similarity between words. Similar-sounding words, for example, are said to compete for activation during spoken word recognition. Yet there has been little research on what it means for words to sound similar; most studies resort to comparing phoneme strings to determine similarity. This research project seeks to mathematically define a cognitively relevant measure of acoustic distance to undergird these descriptions of sound similarity between words.
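
As a rough illustration of the kind of measure involved, the sketch below computes a dynamic time warping distance over MFCC features for two word recordings, so that similarity is grounded in the acoustics rather than in phoneme strings. The feature choice, the librosa-based extraction, and the file names are illustrative assumptions, not a settled part of the project.

```python
import numpy as np
import librosa

def mfcc_features(path, n_mfcc=12):
    """Extract an MFCC matrix (frames x coefficients) from a sound file."""
    signal, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature matrices,
    using Euclidean local cost and a standard step pattern."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
    return cost[n, m] / (n + m)   # length-normalized distance

# Hypothetical usage: smaller values indicate more acoustically similar tokens.
# dist = dtw_distance(mfcc_features("cat.wav"), mfcc_features("cap.wav"))
```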

New Approaches to Forced Alignment

One of the tools we use in phonetic research is forced alignment, which automatically labels the word-level and phone-level segments in a piece of speech. Many of the freely available tools that perform this task do an adequate job, but they rely on the Hidden Markov Model Toolkit (HTK), which can be cumbersome to install and get working correctly. Neural networks, and especially deep networks, have seen a resurgence in machine learning and now represent the state of the art in acoustic modeling, so they seem worth investigating as the backend of a forced aligner. Such a tool may provide better alignment results than the HTK-based aligners and could leverage more easily accessible and more freely licensed software libraries.
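
Whatever the acoustic model, the core of forced alignment is a dynamic programming search that assigns each frame of speech to the known sequence of phones. The sketch below shows a minimal version of that step, assuming the per-frame log-probabilities come from some acoustic model (neural or otherwise); frame indices would then be converted to time stamps using the model's frame rate.

```python
import numpy as np

def force_align(log_probs, phone_ids):
    """Align a known phone sequence to per-frame log-probabilities
    (frames x phone inventory) from an acoustic model, returning the
    index within the phone sequence assigned to each frame.

    A minimal monotonic alignment: each frame belongs to one phone in
    the sequence, and the alignment can only stay put or advance by one.
    """
    T, S = len(log_probs), len(phone_ids)
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0, 0] = log_probs[0, phone_ids[0]]
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1, s]
            advance = score[t - 1, s - 1] if s > 0 else -np.inf
            back[t, s] = s if stay >= advance else s - 1
            score[t, s] = max(stay, advance) + log_probs[t, phone_ids[s]]
    # Trace back from the final phone at the final frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return list(reversed(path))  # phone index per frame
```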

Auditory Nonword Processing

One of the more interesting abilities in our speech perception arsenal is determining whether a "word" we hear is actually part of our language. But what properties of such a stimulus signal this to us? And what makes this determination more difficult? Answering these questions will provide a clearer window into the way we process auditory signals and extract semantic information from them. One of the ways I would like to examine this is through phonotactic probability, the likelihood that a given sequence of phones occurs in a language. This would shed light on whether there is a point, before we finish perceiving a word, at which we determine its meaning (or lack thereof), and whether certain segment sequences can be considered characteristic of a language we speak.
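
Phonotactic probability can be operationalized in several ways; as one simple illustration, the sketch below scores a phone string under a bigram model estimated from a phone-transcribed lexicon. The lexicon format, the add-one smoothing, and the example nonword are assumptions made for the sake of the sketch.

```python
import math
from collections import Counter

def bigram_model(lexicon):
    """Estimate phone bigram probabilities (with add-one smoothing) from a
    list of phone-transcribed words, e.g. [["k", "ae", "t"], ...], and return
    a function that scores a phone sequence by its log-probability."""
    unigrams, bigrams = Counter(), Counter()
    for phones in lexicon:
        padded = ["#"] + phones + ["#"]          # word boundaries
        unigrams.update(padded[:-1])             # contexts
        bigrams.update(zip(padded[:-1], padded[1:]))
    inventory = len(unigrams)

    def log_prob(phones):
        padded = ["#"] + phones + ["#"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + inventory))
            for a, b in zip(padded[:-1], padded[1:])
        )

    return log_prob

# Hypothetical usage: nonwords with higher (less negative) scores contain
# more language-typical phone sequences.
# score = bigram_model(lexicon)(["b", "l", "ih", "k"])
```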

Vowel Merger Assessment

A reliable way to quantify vowel merger has implications for a number of fields: dialectology, for studying mergers in progress; second language speech learning, to help learners and users more closely match the vowel targets of their target language; and sociophonetics, to facilitate examining variation in vowel production among different speakers of a language. To that end, one of the projects I'm working on, in collaboration with Dr. Benjamin V. Tucker, analyzes and compares current vowel overlap measures. These measures use the formant values, and optionally duration, in a dataset of vowel tokens to compare vowel categories and determine the extent to which they overlap. The problem is that there is no consensus on which of the many proposed measures, if any, provides the most accurate and precise results while also accounting for the density of the data points. The project, then, focuses on finding which of these measures is the most suitable for the field to converge on for general-purpose application.
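
As an example of one measure in this family, the sketch below computes the Bhattacharyya coefficient between two vowel categories, treating each category as a multivariate normal distribution over its measurements (e.g. F1 and F2, optionally duration). The Gaussian assumption, the data layout, and the example variable names are mine for illustration; the project itself compares several such measures.

```python
import numpy as np

def bhattacharyya_coefficient(a, b):
    """Bhattacharyya coefficient between two vowel categories, each given as
    an (n_tokens x n_dims) array of measurements, assuming each category is
    multivariate normal. Values near 1 indicate heavy overlap; values near 0
    indicate well-separated categories."""
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.cov(a, rowvar=False)
    cov_b = np.cov(b, rowvar=False)
    cov = (cov_a + cov_b) / 2.0
    diff = mu_a - mu_b
    # Bhattacharyya distance for two Gaussians, then convert to a coefficient.
    dist = (diff @ np.linalg.solve(cov, diff) / 8.0
            + 0.5 * np.log(np.linalg.det(cov)
                           / np.sqrt(np.linalg.det(cov_a) * np.linalg.det(cov_b))))
    return float(np.exp(-dist))

# Hypothetical usage with two (n x 2) arrays of F1/F2 values per category:
# overlap = bhattacharyya_coefficient(pin_tokens, pen_tokens)
```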