Speech processing

Accent conversion: Learners of a second language practice
their pronunciation by listening to and imitating utterances from
native speakers.
Recent research has shown that choosing a well-matched native speaker
to imitate can have a positive impact on pronunciation training.
Here we propose a speech-modification technique that can generate
utterances with the vocal properties of the learner and the accent
of a native speaker. This is accomplished by altering both prosodic
and segmental characteristics of speech. Our results indicate that
the technique can reduce foreign accentedness without significantly
altering the voice-quality properties of the foreign speaker.
[Figure panels: /r/+α(/r/-/l/), /r/, ½(/r/+/l/), /l/, /l/+α(/l/-/r/)]
Speech exaggeration: The
objective of this work is to develop speech processing tools that
can highlight spectral differences between similar phonemes.
Such tools have application in second-language (L2) learning.
Japanese speakers, for example, have trouble
discriminating the English phones /r/ and /l/ (e.g., 'rock'
vs. 'lock') because this contrast does not exist in
Japanese. Emphasizing the spectral differences
between these phonemes can help the L2 learner focus on
those spectral cues that carry discriminatory information.
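The labels /r/+α(/r/-/l/), ½(/r/+/l/) and /l/+α(/l/-/r/) that accompany this project suggest a linear extrapolation between two phoneme representations. A minimal sketch of that idea follows, assuming each phoneme is summarized by a spectral-envelope vector. The function name `exaggerate`, the toy envelope values, and the choice of α are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def exaggerate(target, other, alpha):
    """Push `target` away from `other`: target + alpha * (target - other)."""
    target = np.asarray(target, dtype=float)
    other = np.asarray(other, dtype=float)
    return target + alpha * (target - other)

# Toy log-magnitude envelopes (hypothetical values, one per frequency band).
r_env = np.array([1.0, 0.8, 0.3, 0.6])
l_env = np.array([1.0, 0.7, 0.9, 0.5])

alpha = 0.5                                       # assumed exaggeration factor
r_exaggerated = exaggerate(r_env, l_env, alpha)   # /r/ + alpha(/r/ - /l/)
l_exaggerated = exaggerate(l_env, r_env, alpha)   # /l/ + alpha(/l/ - /r/)
midpoint = 0.5 * (r_env + l_env)                  # 1/2 (/r/ + /l/)
```

Bands where the two envelopes agree are left unchanged, while bands where they differ are pushed further apart, which is the sense in which the discriminatory cues are emphasized.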
Speech-driven facial animation: In
collaboration with Professors Oscar Garcia and Ardy Goshtasbi (Wright
State University), Anna Esposito (Second University of Naples, Italy),
and Isaac Rudomin (Tec de Monterrey, Mexico), we developed a "talking
head" (a.k.a. voice puppet), a three-dimensional animation of
a human head driven by a speech signal. The figure shows the basic
building blocks of the system: audio processing, video tracking,
audio-visual prediction, and facial animation. For more information
on this project, visit the SDFA
webpage maintained by Praveen Kakumanu at Wright State.
Face recognition

The objective of this research is to determine
the extent to which caricatures improve face recognition
by humans. To achieve this objective,
we are developing algorithms that generate 3D caricatures
of an individual face from a single frontal image.
We also conduct perceptual
studies to determine whether exaggerating distinctive
facial features helps people memorize and recognize faces. A second objective
of this research is to study the application of caricaturization
algorithms to machine-learning problems, specifically in the
area of automatic face recognition.
Wearable sensors
The objective of this work is to develop a wearable/wireless sensor system
that allows us to monitor the physiology, activity, and
context of a user on a 24-hour basis. Our hardware is
small and portable, so users can carry out their normal daily activities. Data
processing is based on a combination of nonlinear system identification
and statistical pattern recognition techniques. Our long-term goal
is to provide users with information and visualizations that
give them a better understanding
of their behavior and habits. The figure shows the power spectral
density of a user's respiration signal for a period of 50 minutes;
in this example, increases in respiration rate correlate with physical
exertion (the user drove, parked his car, walked to
the office, and worked on the computer).
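As a rough illustration of how a respiration rate can be read off a power spectrum, the sketch below estimates the dominant frequency of a short window with a windowed FFT. The sampling rate, window length, synthetic sine signals, and the helper name `dominant_rate_bpm` are all assumptions made for the example; the project's actual processing chain is not described here.

```python
import numpy as np

def dominant_rate_bpm(signal, fs):
    """Return the dominant frequency of `signal` in breaths per minute."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()              # remove the DC offset
    window = np.hanning(signal.size)             # reduce spectral leakage
    psd = np.abs(np.fft.rfft(signal * window)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    return 60.0 * freqs[np.argmax(psd)]

fs = 10.0                                        # assumed sampling rate, Hz
t = np.arange(0, 60, 1.0 / fs)                   # one-minute analysis window
rest = np.sin(2 * np.pi * 0.25 * t)              # synthetic: 0.25 Hz breathing
walking = np.sin(2 * np.pi * 0.40 * t)           # synthetic: 0.40 Hz breathing

rest_bpm = dominant_rate_bpm(rest, fs)
walking_bpm = dominant_rate_bpm(walking, fs)
```

Sliding this window along a longer recording and stacking the spectra would give a time-frequency picture in which a rising respiration rate appears as an upward shift of the spectral peak.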
Machine Olfaction
We also work on several aspects of machine olfaction, ranging
from instrumentation for metal-oxide chemoresistors to spatio-temporal
coding in neuromorphic models of the olfactory pathway.
Infrared spectroscopy: The objective of this research is to develop
a low-cost infrared (IR) absorption spectroscope based
on linear variable filters (LVFs). This instrument is
an alternative to electronic-nose devices based on cross-selective
gas sensor arrays. Rather than an array of physical sensors, it uses
the concept of computational “pseudosensors,” where
spectral lines in an analytical instrument are clustered
into groups and used as independent variables. At
the core of our system is an IR detector that combines an LVF
and an array of 64 pyroelectric detectors (IR Microsystems). The
LVF is a wedge-shaped interference filter, which provides
a bank of transmission wavelengths from the thinnest
end (short wavelengths) to the thickest (long wavelengths).
The LVF sits atop the 64-pixel pyroelectric detector array,
which produces a low-resolution spectrum of the transmitted radiation.
Because of its particular bonds and molecular structures, each chemical
species produces a unique
IR spectrum, which can be used for analytical purposes.
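The pseudosensor idea can be sketched as a reduction from 64 detector channels to a handful of grouped variables. The example below simply averages contiguous, near-equal bands. That grouping rule, the helper name `pseudosensors`, and the synthetic spectrum are assumptions for illustration, since the clustering procedure actually used is not specified here.

```python
import numpy as np

def pseudosensors(spectrum, n_groups):
    """Average adjacent detector channels into `n_groups` pseudosensor values."""
    spectrum = np.asarray(spectrum, dtype=float)
    groups = np.array_split(spectrum, n_groups)  # contiguous, near-equal bands
    return np.array([g.mean() for g in groups])

# Synthetic 64-channel transmission spectrum with one absorption band
# centered on channel 20 (made-up values).
channels = np.arange(64)
spectrum = 1.0 - 0.6 * np.exp(-0.5 * ((channels - 20) / 3.0) ** 2)

features = pseudosensors(spectrum, n_groups=8)   # 8 variables instead of 64
```

A data-driven grouping (for instance, clustering channels whose responses are highly correlated across samples) could replace the fixed bands without changing the rest of the pipeline.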
Instrumentation/signal processing:
We also investigate temperature-modulation procedures to improve
the sensitivity and selectivity of metal-oxide chemoresistors. The figure
shows the chemical transient of a TGS sensor driven by a temperature
ramp in the presence of allyl alcohol, tert-butanol, and benzene at
different concentrations. As shown in the figure, the shape of the
transient response and the location of the conductance maxima contain
information about the identity of the analyte,
whereas the transient amplitude can be used to estimate its concentration.
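The separation between identity cues (shape and peak location) and concentration cues (amplitude) can be illustrated with a simple feature extractor. This is a sketch on synthetic data: the helper name `transient_features`, the Gaussian-shaped transient, and the scale factors are assumptions, and the function expects a non-flat transient.

```python
import numpy as np

def transient_features(conductance):
    """Split a transient into amplitude, peak location, and normalized shape."""
    g = np.asarray(conductance, dtype=float)
    amplitude = g.max() - g.min()                # proxy for concentration
    peak_index = int(np.argmax(g))               # location of the conductance maximum
    shape = (g - g.min()) / amplitude            # amplitude-free profile
    return amplitude, peak_index, shape

# Two synthetic transients with the same profile at different scales,
# standing in for one analyte at two concentrations.
t = np.linspace(0.0, 1.0, 101)
profile = np.exp(-0.5 * ((t - 0.3) / 0.1) ** 2)
low = 1.0 * profile
high = 3.0 * profile

amp_low, peak_low, shape_low = transient_features(low)
amp_high, peak_high, shape_high = transient_features(high)
```

In this toy case the normalized shapes and peak locations coincide while the amplitudes differ, mirroring the division of labor described above. Real sensor responses are generally not a pure rescaling with concentration.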
Sensory analysis: This is the grand
challenge for machine olfaction: how to correlate
instrumental data
with the "gold standard," sensory analysis
by a trained human panel. In collaboration with the University
of Valladolid, we are developing pattern recognition methods to
predict the organoleptic properties of Spanish red wines from gas,
liquid, and color sensor arrays.
In collaboration with Duke University and NC State University, we have also used
chemical sensors to evaluate biomaterials for odor abatement in
swine facilities. The figure shows the correlation coefficients
between human
scores for irritation and pleasantness of biofiltered hog odors
and their leave-one-out predictions from sensor-array data.
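Leave-one-out evaluation of this kind can be sketched as follows: each sample is predicted by a model fitted to all the others, and the held-out predictions are then correlated with the panel scores. Ordinary least squares is used here only as a stand-in, since the regression model actually used is not stated. The data are synthetic and the helper name `loo_predictions` is hypothetical.

```python
import numpy as np

def loo_predictions(X, y):
    """Leave-one-out predictions from an ordinary least-squares model."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])  # add intercept
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i            # hold out sample i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        preds[i] = X[i] @ coef                   # predict the held-out sample
    return preds

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))                     # synthetic sensor-array features
true_w = np.array([1.0, -0.5, 0.0, 2.0])         # made-up ground-truth weights
y = X @ true_w + 0.1 * rng.normal(size=30)       # synthetic panel scores

y_hat = loo_predictions(X, y)
r = np.corrcoef(y, y_hat)[0, 1]                  # correlation with held-out predictions
```

Because every prediction comes from a model that never saw the corresponding sample, the resulting correlation is a less optimistic estimate than one computed on the training fit.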
Neuromorphic models: What are the key signal processing
mechanisms in the olfactory system, and how can they be used to process
data from chemical
sensor arrays? Our investigations to date have explored
spatial coding at the glomerular (GL) layer through chemotopic
convergence of olfactory receptor neurons (ORNs), and
pattern completion through phase coding in the KIII neurodynamics
system. The figure shows the spatial patterns of a 20x20 GL layer
receiving projections from a population of 400,000 ORNs. The top
ten images represent the GL pattern for ten different odors at
a fixed concentration, whereas the lower ten images show the patterns
for odor L1 and the mixture L1+L2 at five increasing concentrations.
According to the model (and also to experimental results from neurobiology),
odor quality is encoded by a unique spatial pattern across the GL layer, whereas
odor intensity is captured by the intensity
and the spread of this pattern.
Mobile robotics
Heterogeneous mobots: With funding from NSF
and a donation from Applied Materials, PRISM is sponsoring a number
of senior design
projects in the area of mobile robotics. These projects address issues
related to sensing, such as sensor fusion for dead reckoning, acoustic
navigation, odor plume tracking, and multi-robot sensor networks.
The figure shows a small robot homing in on a light source.
The vehicle is able to communicate its findings to other robots
in the network through an RF link. More information about these
projects can be obtained from the CPSC 483 class webpage.
Omnidirectional imaging: We have developed a computational range sensor based
on omnidirectional vision. The device employs a structure-from-motion
strategy, where depth information is extracted from optical flow.
These range estimates are then used to build a probabilistic
map by means of certainty grids.
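One common way to maintain a certainty grid is a per-cell Bayesian update in log-odds form. The sketch below assumes that formulation, a uniform 0.5 prior, and made-up sensor probabilities. The function names are hypothetical, and this is not necessarily the update rule used in this project.

```python
import numpy as np

def update_cell(log_odds, p_occupied_given_reading):
    """Fuse one range reading into a cell using a log-odds Bayes update."""
    p = p_occupied_given_reading
    return log_odds + np.log(p / (1.0 - p))

def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 / (1.0 + np.exp(-log_odds))

grid = np.zeros((5, 5))                          # log-odds 0 means p = 0.5 (unknown)
for _ in range(3):                               # three readings suggest (2, 3) is occupied
    grid[2, 3] = update_cell(grid[2, 3], 0.7)
grid[1, 1] = update_cell(grid[1, 1], 0.2)        # one reading suggests (1, 1) is free

occupancy = probability(grid)
```

Working in log-odds turns the repeated fusion of readings into simple addition, so noisy range estimates, such as those derived from optical flow, accumulate evidence gradually rather than overwriting the map.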
Prior work: Global self-localization using sonar maps, Kohonen maps, and
multilayer perceptrons. Probabilistic models for sonar transducers
using multilayer perceptrons. Probabilistic navigation using Bayesian inference
and Partially Observable Markov Decision Processes. Perception
and navigation using certainty grids. Low-level navigation
and obstacle avoidance using potential fields. The figures
show a global and a local map constructed from a sonar ring
with the certainty grid sensor fusion approach of Moravec
and Elfes.
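Potential-field obstacle avoidance, mentioned above, is usually formulated as an attractive force toward the goal plus a repulsive force from nearby obstacles. The following is a minimal sketch of one step of the textbook formulation. The gains `k_att` and `k_rep`, the influence radius `d0`, the step size, and the function name are illustrative assumptions, not parameters from the prior work.

```python
import numpy as np

def potential_field_step(pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0, step=0.1):
    """Move one step along the sum of attractive and repulsive forces."""
    pos = np.asarray(pos, dtype=float)
    goal = np.asarray(goal, dtype=float)
    force = k_att * (goal - pos)                 # attraction toward the goal
    for obs in obstacles:
        diff = pos - np.asarray(obs, dtype=float)
        d = np.linalg.norm(diff)
        if 0.0 < d < d0:                         # repulsion acts only within range d0
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    return pos + step * force

start, goal = (0.0, 0.0), (5.0, 0.0)
free_step = potential_field_step(start, goal, obstacles=[])
deflected_step = potential_field_step(start, goal, obstacles=[(0.5, 0.2)])
```

With no obstacles the robot heads straight for the goal. With an obstacle slightly above its path, the same step is bent downward and shortened. Potential fields can trap a robot in local minima, which is one reason they are typically paired with a higher-level planner.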