Title: Creating a Speech Enabled Avatar from a Single Photograph
Dmitri Bitouk, Shree K. Nayar
Columbia University
Speech Enabled Avatar
[Figure: Input photograph; Avatar]
- Applications
  - mobile messaging and video conferencing
  - news reporting and information kiosks
  - novel user interfaces
Facial Motion Synthesis Challenges
- Mapping phonemes to static mouth shapes produces unrealistic, jerky animations
- Co-articulation: facial articulations can be dominated by the preceding as well as the upcoming phonemes
- Asynchrony: facial motion may precede the corresponding sound
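To make the first challenge concrete, here is a minimal sketch of the static approach, assuming hypothetical per-phoneme mouth-opening values on a 0-to-1 scale (illustrative numbers, not data from this work). Each phoneme maps to one fixed value regardless of its neighbors, so the trajectory is piecewise constant and jumps at every phoneme boundary.

```python
# Illustrative sketch: names and values are hypothetical, not from the original system.
# Hypothetical mouth-opening values per phoneme (0 = closed, 1 = wide open).
STATIC_MOUTH_OPENING = {"B": 0.0, "K": 0.4, "AA": 0.9, "IY": 0.3}


def static_track(phonemes, frames_per_phoneme):
    """Static mapping: every phoneme gets one fixed value, whatever its neighbors."""
    track = []
    for phoneme in phonemes:
        track.extend([STATIC_MOUTH_OPENING[phoneme]] * frames_per_phoneme)
    return track


def max_jump(track):
    """Largest frame-to-frame change; big jumps are perceived as jerky motion."""
    return max(abs(b - a) for a, b in zip(track, track[1:]))


track = static_track(["B", "AA", "K", "IY"], frames_per_phoneme=3)
print(track)            # four flat runs of 3 frames each
print(max_jump(track))  # 0.9, the instantaneous jump from /B/ to /AA/
```

Because /B/ receives the same value between /AA/ vowels as between /IY/ vowels, such a mapping also cannot represent co-articulation, and tying each mouth shape to its phoneme's audio interval rules out asynchrony as well.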
Related Work
- Avatars from video sequences
  - Bregler et al. 1997, Ezzat et al. 2002, etc.
- 2D Avatars from photographs
  - Blanz et al. 2003, CrazyTalk™, MotionPortrait™
Generic Facial Motion Model
[Figure: Prototype Surface; Deformed Surface]
- Facial motion parameters
Bitouk 2006
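One simple way to picture a prototype surface driven by a small set of facial motion parameters is a linear model in which each parameter scales a fixed per-vertex displacement field. This is an assumption for illustration: the slide does not state that the Bitouk 2006 model is linear, and the mesh, displacement fields, and function names below are hypothetical.

```python
# Illustrative sketch: a hypothetical linear deformation model, not the published one.
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def deform(prototype: List[Vec3], basis: List[List[Vec3]], params: List[float]) -> List[Vec3]:
    """Hypothetical linear model: deformed = prototype + sum_k params[k] * basis[k]."""
    if len(basis) != len(params):
        raise ValueError("need exactly one parameter per displacement field")
    deformed = []
    for i, vertex in enumerate(prototype):
        # Add each parameter-weighted displacement to the x, y and z coordinates.
        deformed.append(tuple(
            vertex[axis] + sum(a * field[i][axis] for a, field in zip(params, basis))
            for axis in range(3)
        ))
    return deformed


# Toy two-vertex "lip" and two hypothetical displacement fields.
prototype = [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
basis = [
    [(0.0, 0.0, 0.0), (0.0, -1.0, 0.0)],  # "jaw opening": lower vertex moves down
    [(0.5, 0.0, 0.0), (0.5, 0.0, 0.0)],   # "lip spread": both vertices move sideways
]
print(deform(prototype, basis, [0.2, 0.0]))
```

Setting all parameters to zero returns the prototype surface unchanged, which is the neutral face in this toy model.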
Facial Motion Transfer
[Figure: Prototype Face; Novel Faces]
Bitouk 2006
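A minimal way to sketch motion transfer, assuming the prototype and a novel face are already in vertex-to-vertex correspondence, is to add the prototype's displacements to the novel face's neutral shape, rescaled by the ratio of face sizes. The uniform scale factor, the 2D coordinates, and the helper names are hypothetical simplifications; a real transfer would likely need locally varying adjustments.

```python
# Illustrative sketch: a hypothetical transfer scheme; all names are invented here.
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]


def face_scale(points: List[Vec2]) -> float:
    """RMS distance of the vertices from their centroid, a crude face-size measure."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))


def transfer(proto_neutral: List[Vec2], proto_deformed: List[Vec2],
             novel_neutral: List[Vec2]) -> List[Vec2]:
    """Add size-scaled prototype displacements to the novel face.

    Assumes vertex i of every list refers to the same facial point.
    """
    s = face_scale(novel_neutral) / face_scale(proto_neutral)
    return [
        (nx + s * (dx - px), ny + s * (dy - py))
        for (px, py), (dx, dy), (nx, ny) in zip(proto_neutral, proto_deformed, novel_neutral)
    ]


proto_neutral = [(0.0, 1.0), (0.0, -1.0)]
proto_deformed = [(0.0, 1.0), (0.0, -1.5)]   # prototype's lower lip drops by 0.5
novel_neutral = [(10.0, 12.0), (10.0, 8.0)]  # a novel face twice the prototype's size
print(transfer(proto_neutral, proto_deformed, novel_neutral))  # [(10.0, 12.0), (10.0, 7.0)]
```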
Hidden Markov Models
- Phonemes: /B/, /K/, /AA/, /IY/, etc.
- Phonemes with lexical stress: /B/, /K/, /AA0/, /AA1/, /IY0/, /IY1/, etc.
- Triphones
[Figure: Facial motion parameters]
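The progression from phonemes to stressed phonemes to triphones can be illustrated by expanding a phoneme sequence into context-dependent labels. The sketch below assumes the `left-center+right` triphone notation used by HTK and CMU-dictionary-style stress digits (0 unstressed, 1 primary, 2 secondary); the slide does not say which notation this system used, and the function names are hypothetical.

```python
# Illustrative sketch: hypothetical helpers for stress and triphone labels.
import re
from typing import List


def strip_stress(phone: str) -> str:
    """Drop a trailing CMU-style stress digit, e.g. 'AA1' -> 'AA'."""
    return re.sub(r"[0-2]$", "", phone)


def triphones(phones: List[str]) -> List[str]:
    """Expand a phoneme sequence into left-center+right triphone labels.

    The first phone has no left context and the last has no right context.
    """
    labels = []
    for i, center in enumerate(phones):
        label = center
        if i > 0:
            label = f"{phones[i - 1]}-{label}"
        if i < len(phones) - 1:
            label = f"{label}+{phones[i + 1]}"
        labels.append(label)
    return labels


word = ["K", "AA1", "B"]  # example stressed phoneme sequence
print([strip_stress(p) for p in word])  # ['K', 'AA', 'B']
print(triphones(word))                  # ['K+AA1', 'K-AA1+B', 'AA1-B']
```

Each triphone gets its own HMM, so the model for /AA1/ between /K/ and /B/ can learn a different mouth trajectory than /AA1/ in other contexts, which is how context dependence addresses co-articulation.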
Training Hidden Markov Models
- Training set consists of motion capture data
- Baum-Welch embedded re-estimation
- Cluster triphone states to predict triphones not seen in the training set
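Baum-Welch re-estimation alternates two steps: the forward-backward algorithm computes how likely each state is at each frame, and the model parameters are then re-estimated as averages weighted by those probabilities. The sketch below performs one update of the state means for a hypothetical two-state, left-to-right HMM whose output is a single facial motion parameter modeled by a 1D Gaussian. It simplifies heavily under stated assumptions: one training sequence, fixed transitions and variance, and a single model rather than the concatenated per-phoneme models that embedded re-estimation trains jointly.

```python
# Illustrative sketch: one Baum-Welch mean update for a hypothetical toy HMM.
import math


def gauss(x, mean, var):
    """1D Gaussian density."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def state_posteriors(obs, start, trans, means, var):
    """Scaled forward-backward: gamma[t][j] = P(state j at frame t | obs)."""
    T, N = len(obs), len(start)
    alpha, scale = [], []
    for t in range(T):
        row = []
        for j in range(N):
            if t == 0:
                prior = start[j]
            else:
                prior = sum(alpha[t - 1][i] * trans[i][j] for i in range(N))
            row.append(prior * gauss(obs[t], means[j], var))
        c = sum(row)  # scaling factor keeps alpha from underflowing
        scale.append(c)
        alpha.append([a / c for a in row])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(
                trans[i][j] * gauss(obs[t + 1], means[j], var) * beta[t + 1][j]
                for j in range(N)
            ) / scale[t + 1]
    gamma = []
    for t in range(T):
        row = [alpha[t][j] * beta[t][j] for j in range(N)]
        z = sum(row)
        gamma.append([g / z for g in row])
    return gamma


def reestimate_means(obs, gamma):
    """M-step for the means: occupancy-weighted average of the observations."""
    n_states = len(gamma[0])
    return [
        sum(g[j] * x for g, x in zip(gamma, obs)) / sum(g[j] for g in gamma)
        for j in range(n_states)
    ]


# Hypothetical left-to-right model and a short motion-parameter track.
obs = [0.1, 0.0, 0.9, 1.1]
start = [1.0, 0.0]
trans = [[0.7, 0.3], [0.0, 1.0]]
gamma = state_posteriors(obs, start, trans, means=[0.2, 0.8], var=0.1)
print(reestimate_means(obs, gamma))
```

Iterating these two steps until the likelihood stops improving gives the full re-estimation loop; variances and transition probabilities are updated from the same posteriors in a complete implementation.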
Facial Motion Synthesis from Text
[Figure: Time-labeled phonemes]
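Once text has been converted into time-labeled phonemes, a synthesizer needs to know which model state is active at each video frame. A minimal sketch, assuming a fixed frame rate (30 fps here) and that each phoneme's interval is split evenly among a hypothetical three states per phoneme model, is shown below; the slide does not give the actual frame rate or state durations.

```python
# Illustrative sketch: frame rate, state count and names are hypothetical.
from typing import List, Tuple


def frame_states(segments: List[Tuple[str, float, float]], fps: float = 30.0,
                 n_states: int = 3) -> List[Tuple[str, int]]:
    """Map (phoneme, start_s, end_s) segments to one (phoneme, state) pair per frame.

    Assumes contiguous segments starting at 0 s and equal-length states per phoneme.
    """
    n_frames = int(round(segments[-1][2] * fps))
    labels = []
    for f in range(n_frames):
        t = (f + 0.5) / fps  # sample at the frame center to avoid boundary ties
        for phone, start, end in segments:
            if start <= t < end:
                state = min(int((t - start) / (end - start) * n_states), n_states - 1)
                labels.append((phone, state))
                break
        else:
            raise ValueError(f"no segment covers t = {t:.3f} s")
    return labels


segments = [("sil", 0.0, 0.1), ("B", 0.1, 0.2), ("AA1", 0.2, 0.4)]
for label in frame_states(segments):
    print(label)
```

Sampling at frame centers rather than frame starts means no frame lands exactly on a phoneme boundary, so each frame is assigned unambiguously.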
Fitting the Prototype Model to an Image
[Figure: 2D Prototype Face; Photograph]
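Since automatic facial feature detection is listed as future work, fitting presumably starts from feature points marked on the photograph. As one possible first step, assuming a few corresponding landmarks on the 2D prototype face and the photograph, the sketch below finds the least-squares similarity transform (rotation, uniform scale, translation) that aligns them, treating points as complex numbers. This global alignment is an assumed simplification with hypothetical names; the slides do not specify the fitting method, and a non-rigid warp would still be needed to match individual features.

```python
# Illustrative sketch: hypothetical landmarks and helper names.
from typing import List, Tuple

Point = Tuple[float, float]


def fit_similarity(src: List[Point], dst: List[Point]) -> Tuple[complex, complex]:
    """Least-squares similarity transform w ~ a*z + b between 2D landmark sets.

    |a| is the scale, arg(a) the rotation, and b the translation.
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    n = len(z)
    mz, mw = sum(z) / n, sum(w) / n
    num = sum((wi - mw) * (zi - mz).conjugate() for zi, wi in zip(z, w))
    den = sum(abs(zi - mz) ** 2 for zi in z)
    a = num / den
    return a, mw - a * mz


def apply_similarity(a: complex, b: complex, points: List[Point]) -> List[Point]:
    """Map prototype points into photograph coordinates."""
    mapped = [a * complex(x, y) + b for x, y in points]
    return [(p.real, p.imag) for p in mapped]


# Hypothetical landmarks: two eye centers and the mouth center.
prototype = [(0.0, 0.0), (2.0, 0.0), (1.0, -2.0)]
photo = [(5.0, 5.0), (5.0, 9.0), (9.0, 7.0)]  # prototype rotated 90 deg, scaled 2x, shifted
a, b = fit_similarity(prototype, photo)
print(abs(a), b)
```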
Facial Motion Synthesis
Eye Motion Synthesis
Eyeball Texture Synthesis
[Figure: Eye Image; Synthesized Eyeball Texture]
Eye Motion Synthesis
[Figure: Eye Motion Geometry]
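A simple geometric model for eye motion, assumed here rather than taken from the slides, treats the eyeball as a sphere of radius r rotating about its center. Under orthographic projection, pitching the gaze by φ and then turning it by yaw θ moves the iris center in the image by (r sin θ cos φ, r sin φ). The sketch computes that offset, which could be used to shift a synthesized eyeball texture behind the eyelids; the function name and radius are hypothetical.

```python
# Illustrative sketch: an assumed spherical-eyeball model with hypothetical names.
import math
from typing import Tuple


def iris_offset(radius: float, yaw_deg: float, pitch_deg: float) -> Tuple[float, float]:
    """Image-plane offset of the iris center for a spherical eyeball.

    The eye looks along +z (toward the camera) at rest; pitch tilts the gaze up
    about the horizontal axis, then yaw turns it about the vertical axis.
    Orthographic projection drops the z component. y points up here, so flip
    its sign for image coordinates whose y axis points down.
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    gaze = (math.sin(yaw) * math.cos(pitch), math.sin(pitch), math.cos(yaw) * math.cos(pitch))
    return radius * gaze[0], radius * gaze[1]


# Hypothetical eyeball radius of 12 pixels, looking 30 degrees to the side.
print(iris_offset(12.0, 30.0, 0.0))
```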
Eye Motion and Blinking
Visual Text-to-Speech Synthesis
Facial Motion Synthesis from Speech
[Figure: Time-labeled phonemes]
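When the input is recorded speech rather than text, the time-labeled phonemes must come from aligning or recognizing the audio. Assuming the aligner writes HTK-style label files, in which each line gives a start time and an end time (both in units of 100 ns) followed by a label, the sketch below converts them into (phoneme, start, end) tuples in seconds. The file format and function name are assumptions; the slides do not name the recognition tool used.

```python
# Illustrative sketch: assumes HTK-style label text; the parser name is hypothetical.
from typing import List, Tuple

HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in 100 ns units


def parse_htk_labels(text: str) -> List[Tuple[str, float, float]]:
    """Parse 'start end label [extra fields]' lines into (label, start_s, end_s)."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or time-less lines
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        segments.append((label, start / HTK_UNITS_PER_SECOND, end / HTK_UNITS_PER_SECOND))
    return segments


example = """0 1000000 sil
1000000 2000000 B
2000000 4000000 AA1
"""
print(parse_htk_labels(example))  # [('sil', 0.0, 0.1), ('B', 0.1, 0.2), ('AA1', 0.2, 0.4)]
```

Once parsed, the speech-driven path yields the same kind of time-labeled phoneme sequence as the text-driven path, so the same HMM-based motion synthesis can follow.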
3D Avatars
[Figure: Captured Stereo Image (Mirror View, Direct View)]
Gluckman and Nayar, 2001
3D Avatars
[Figure: Rectified Images (Mirror View, Direct View); 3D Model]
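After rectification, corresponding points in the direct and mirror views lie on the same image row, so depth follows from horizontal disparity by standard triangulation, Z = f·B/d, where f is the focal length in pixels, B the baseline between the real camera and its virtual (mirror) counterpart, and d the disparity in pixels. The sketch below applies this to matched points, assuming the mirror view has already been flipped and rectified; the focal length, baseline, coordinates, and function name are hypothetical, and in a planar-mirror setup the effective baseline would come from the mirror geometry.

```python
# Illustrative sketch: hypothetical camera parameters and helper name.
from typing import List, Tuple


def triangulate(matches: List[Tuple[float, float, float]], f: float, baseline: float,
                cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Convert rectified matches (x_left, x_right, y) to 3D points (X, Y, Z).

    Assumes the principal point (cx, cy) is shared by both rectified views,
    f is in pixels, and the baseline is in scene units (e.g. millimeters).
    """
    points = []
    for x_left, x_right, y in matches:
        d = x_left - x_right  # horizontal disparity in pixels
        if d <= 0:
            continue  # no valid depth for zero or negative disparity
        z = f * baseline / d
        points.append(((x_left - cx) * z / f, (y - cy) * z / f, z))
    return points


# Hypothetical setup: f = 800 px, 100 mm baseline, principal point at (320, 240).
print(triangulate([(420.0, 380.0, 240.0)], 800.0, 100.0, 320.0, 240.0))  # [(250.0, 0.0, 2000.0)]
```

Applying this to every matched pixel produces a point cloud, which can then be meshed into a 3D face model.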
3D Avatars
[Figure: Point cloud engraved inside a glass cube; Digital projector]
Nayar and Anand, 2007
Limitations and Future Work
- Automatic facial feature detection
- Synthesis of rigid head motion
- Expressive speech
- Web demo of our system will be available in early April
  - www.cs.columbia.edu/CAVE/
The End