Project Overview
Summary
Current state-of-the-art speech recognition systems generally
use Hidden Markov Models (HMMs) with frame-based spectral measures
(often cepstral coefficients) as the primary features. Traditional
spectral analysis techniques have been used for many years,
with progress in recognition accuracy over the last 10-15 years
being primarily incremental. This research project focuses on
the development of a significantly different approach to characterizing
speech signals, based on state-of-the-art techniques for time-series
modeling. These time-series techniques combine state-space embedding
methods and learning algorithms to create highly accurate non-linear
models of a system's state. This research integrates a dynamical
systems approach with a continuous speech recognition system,
changing the analytical focus from the frequency domain to the
time domain. The time-delay embedding technique, taken from
dynamical systems theory, is used to reconstruct the state spaces
of the speech waveforms. The resulting state spaces are then
characterized to generate a set of features, which are evaluated
with respect to their ability to differentiate the individual
phonemes that are the building blocks of speech.
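As a rough illustration of the time-delay embedding step described above, the sketch below reconstructs state-space points from a one-dimensional signal. The function name `delay_embed`, the embedding dimension, the lag, and the sinusoid standing in for a speech frame are all assumptions made for this example and are not taken from the project.

```python
import math

def delay_embed(signal, dim, lag):
    """Return the time-delay embedding of a 1-D signal.

    Each reconstructed state is the vector
    (x[n], x[n - lag], ..., x[n - (dim - 1) * lag]).
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    start = (dim - 1) * lag  # first index that has a full history
    return [
        tuple(signal[n - k * lag] for k in range(dim))
        for n in range(start, len(signal))
    ]

# Toy stand-in for a speech frame (assumption): a sampled sinusoid.
samples = [math.sin(2 * math.pi * n / 16) for n in range(64)]

# Hypothetical settings: 3-dimensional embedding with a lag of 4 samples.
points = delay_embed(samples, dim=3, lag=4)
```

Each element of `points` is one point in the reconstructed state space. Features would then be computed from the geometry or density of these points rather than from a spectrum.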
Objectives
The focus of this project is to use time-domain analysis of
speech to create new modeling techniques and to gain a better
understanding of speech signals, leading to a subsequent improvement
in speech recognition accuracy. To achieve this, the primary
research objectives include the application of the time-domain
embedding approach to the characterization of speech signals,
the development of an effective model for measuring differences
between the signals, and the integration of this model with
an HMM-based speech recognition system. The speech tasks used
for implementation of these objectives include both isolated
phoneme recognition and continuous word recognition experiments.
Methods
Successful achievement of the above objectives requires the
development of several new technologies. For the characterization
of speech signals in the time domain, the Time Series Data Mining
approach, which has been successfully applied to event prediction,
is modified for application to speech waveforms, including the
development of techniques for identifying optimal lag times
for the time-domain embedding process. Stochastic methods, including
various clustering techniques for learning parametric densities
such as Gaussian Mixture Models, are used for identifying appropriate
feature representations of the embedded waveforms. For integrating
these features with a recognition system, an HMM-based speech
system is modified to use the new time-domain features for computing
state occupancy likelihoods within the training and recognition
algorithms.
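The text above does not say how lag times are chosen. Purely as an illustration, the sketch below uses one common heuristic from the dynamical systems literature: take the first lag at which the signal's autocorrelation is no longer positive. This is an assumed stand-in, not the project's technique, and the function names are hypothetical.

```python
import math

def autocorrelation(signal, lag):
    """Normalized autocorrelation of a mean-removed signal at one lag."""
    n = len(signal)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(signal)")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    denom = sum(c * c for c in centered)
    if denom == 0.0:
        raise ValueError("signal is constant")
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return num / denom

def first_zero_crossing_lag(signal, max_lag):
    """Smallest lag in 1..max_lag with non-positive autocorrelation."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(signal, lag) <= 0.0:
            return lag
    return None  # no crossing found within max_lag

# Toy stand-in for a speech frame (assumption): a sampled sinusoid.
samples = [math.sin(2 * math.pi * n / 16) for n in range(160)]
lag = first_zero_crossing_lag(samples, max_lag=40)
```

The returned lag could then be passed to the embedding step. The first minimum of the average mutual information is another widely used criterion and may behave better on strongly non-linear signals.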
Impact
The impact of these new technologies and their application to
the speech recognition task extends into both the machine learning
and signal processing communities. The development of time-domain
characterization methods is directly applicable to many problems
of interest in the chaos and non-linear modeling domains. These
new methods are able to concretely measure differences between
the phase-space representations of dynamical systems. The application
to the speech recognition task is particularly appropriate for
this research, since it is a novel approach in a field where
traditional linear systems approaches have been unable to achieve
fully satisfactory results. The experiments are expected to
significantly advance the fundamental understanding of the
characteristics of speech signals and of how to analyze them,
with potential long-term application to other areas of speech
processing such as speech coding and synthesis.
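One way the difference between phase-space representations might be quantified, assuming the Gaussian Mixture Model densities mentioned under Methods, is to compare how well competing models explain a set of embedded points. The sketch below is illustrative only: the model parameters are hand-picked rather than learned, and all names are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """Log density of a diagonal GMM at x, using log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def average_log_likelihood(points, gmm):
    """Mean per-point log-likelihood of embedded points under one model."""
    return sum(gmm_log_likelihood(p, *gmm) for p in points) / len(points)

# Hypothetical, hand-picked models for two sound classes in a 2-D
# reconstructed state space: (weights, means, variances).
model_a = ([0.5, 0.5], [(-1.0, -1.0), (1.0, 1.0)], [(0.2, 0.2), (0.2, 0.2)])
model_b = ([0.5, 0.5], [(-1.0, 1.0), (1.0, -1.0)], [(0.2, 0.2), (0.2, 0.2)])

# Made-up embedded points lying near the main diagonal.
points = [(-1.1, -0.9), (0.9, 1.0), (1.0, 1.2), (-0.8, -1.0)]

score_a = average_log_likelihood(points, model_a)
score_b = average_log_likelihood(points, model_b)
best = "a" if score_a > score_b else "b"
```

The gap between `score_a` and `score_b` acts as a simple measure of how differently the two models describe the same reconstructed state space. In an HMM-based recognizer, per-point log-likelihoods of this kind could play the role of state output scores.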
Demonstration videos (in AVI format)
Video 1
Video 2
Video 3
Video 4
Contact information
Knowledge and Information Discovery Laboratory
Olin Hall of Engineering
Marquette University
P.O. Box 1881
Milwaukee, WI 53201-1881
Faculty offices: Olin 523, (414) 288-6046; Haggerty 224, (414) 288-7088
Grad student offices: Olin 523, (414) 288-6046
Computer and research lab: Olin 523, (414) 288-3503
Speech and Signal Processing Laboratory
Olin Hall of Engineering
Marquette University
P.O. Box 1881
Milwaukee, WI 53201-1881
Faculty offices: Olin 518D, (414) 288-1608; Haggerty 214, (414) 288-0631
Grad student offices: Olin 518, (414) 288-7451
Computer and research lab: Olin 518A, (414) 288-3503
Data collection lab: Olin 518B, (414) 288-3503