Title: Geometric and Articulatory Models in Audiovisual Speech Technology
Tiina Karsikas
Audiovisual Speech in Speech Technology (course: Audiovisuelle Sprache in der Sprachtechnologie)
Summer semester 2002
Papers
Parke, F.I. (1982). A Parameterized Model for Facial Animation. IEEE Computer Graphics and Applications 2(9), pp. 61-70.
Magnenat-Thalmann, N., E. Primeau and D. Thalmann (1988). Abstract Muscle Action Procedures for Human Face Animation. The Visual Computer 3(5), pp. 290-297.
Internet sources
- LCE: http://www.lce.hut.fi/research/face/index.html
- KTH: http://www.speech.kth.se/multimodal
- PSL: http://mambo.ucsc.edu/index.html
- MIRALab: http://www.miralab.unige.ch
Outline
- Application Areas
- Geometric Models
- - Introduction
- - Parke's Model
- - Parke's descendants (Talking Heads in action)
- Articulatory Models
- - Introduction
- - Magnenat-Thalmann et al.
- - Rendez-vous à Montréal
Application Areas for Visual Speech Synthesis
- Basic research on audiovisual speech perception
- Multimedia
- - e.g. synthetic news reader / storyteller
- Information systems in public and noisy environments
- - airports
- - train stations
- - shopping centres
- - etc.
- Aids for the hearing impaired
- - tool for interactive training of speechreading
- - visualizing tongue positions in speech training for deaf children
- - telephone communication aid with a synthetic face
Other Potential Applications (Parke 1982)
- Previews of the effects of corrective surgical or dental procedures on given faces
- - conformation changes or changes in the range of possible expressions
- Data compression for image transmission
- - transmitting facial images simply by sending the appropriate parameter values
- Forensic art
- - a crime victim could interactively modify parameters to obtain a 3D approximation of the face of an assailant
Identi-KIT 2000: witnesses are shown a whole face within the basic group matching their description, after which they point out features that aren't quite right
Geometric (or Parametric) Models
Attempt to reproduce speech signals and facial deformations in geometrical terms (i.e. without trying to understand the underlying physiological mechanisms that produce them)
- 3D structure of the face which can be modified and deformed by the action of parameters
- defined by a set of 3D meshes that describe the surface geometry of the various organs (e.g. skin, teeth, eyes) that are generally involved in speech production
- typically a few hundred 3D vertices (and the polygons they form) are moved by control parameters on the face (e.g. rotation)
- control parameters can control a single point or more complex articulatory gestures or facial expressions
- facial animation is created by changing the values of the control parameters and redrawing the face with the new values
- the approach has the advantage of being quite simple and efficient, as it requires little data storage
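As a minimal sketch of how a single control parameter can move a group of mesh vertices, the example below rotates a hypothetical set of "jaw" vertices about a pivot. The four-vertex mesh, the pivot and the vertex indices are illustrative assumptions, not data from any of the models discussed here.

```python
import math

def rotate_about_x(vertex, pivot, angle_deg):
    """Rotate one (x, y, z) vertex about an axis parallel to x through pivot."""
    x, y, z = vertex
    _, py, pz = pivot
    a = math.radians(angle_deg)
    dy, dz = y - py, z - pz
    return (x,
            py + dy * math.cos(a) - dz * math.sin(a),
            pz + dy * math.sin(a) + dz * math.cos(a))

def apply_jaw_rotation(vertices, jaw_indices, pivot, angle_deg):
    """Return a new vertex list in which only the jaw vertices are rotated."""
    jaw = set(jaw_indices)
    return [rotate_about_x(v, pivot, angle_deg) if i in jaw else v
            for i, v in enumerate(vertices)]

# Hypothetical four-vertex "face": indices 2 and 3 belong to the jaw.
face = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.0, -1.0, 1.0), (1.0, -1.0, 1.0)]
jaw_pivot = (0.0, 0.0, 0.0)

# Animation = change the parameter value, recompute the vertices, redraw.
frames = [apply_jaw_rotation(face, [2, 3], jaw_pivot, angle)
          for angle in (0.0, 5.0, 10.0)]
```

Only the parameter value (the angle) changes between frames, which is why such models need so little stored data.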
Parke's Model
Parke's 1982 parametric face model is what most present audiovisual speech synthesizers are based on
Mesh of about 800 polygons that approximates the surface of a human face, including the eyes, the eyebrows, the lips, and the teeth
Two main approaches
1. key frame animation - a number of facial images are specified as key frames and a computer algorithm is used to generate the in-between frames
2. parameterized facial models - an animator creates any facial image by specifying the appropriate parameter values
→ Parametric models are better for 3D (as the number of key frames becomes too high)
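Key frame animation, the first approach above, can be sketched as follows. Linear blending is assumed here purely for illustration, since the slides do not say which in-betweening algorithm is meant; the two-vertex key frames are hypothetical.

```python
def inbetween(key_a, key_b, t):
    """Linearly blend two key frames (lists of (x, y, z) vertices), 0 <= t <= 1."""
    return [tuple(a + t * (b - a) for a, b in zip(va, vb))
            for va, vb in zip(key_a, key_b)]

def generate_frames(key_a, key_b, n_inbetween):
    """Return key_a, then n_inbetween generated frames, then key_b."""
    steps = n_inbetween + 1
    return [inbetween(key_a, key_b, i / steps) for i in range(steps + 1)]

# Hypothetical key frames: two lower-lip vertices, mouth closed and mouth open.
mouth_closed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mouth_open = [(0.0, -1.0, 0.0), (1.0, -1.0, 0.0)]

frames = generate_frames(mouth_closed, mouth_open, 3)
```

Every key frame stores a full copy of the vertex data, so the amount of data grows with each new pose. This is the cost that the parameterized approach avoids.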
Parke's Model (cont.)
- Creation of Parameterized Models
- 1. underlying concept of parameterization and the development of appropriate parameter sets
- - parameter values can be thought of as criteria values describing or specifying individual members in any given class
- - a complete set of parameters allows every member of the class to be specified just by selecting appropriate parameter values (and every possible unique member can be described by a unique n-tuple of parameter values)
- - if certain members of the class cannot be uniquely described by parameter values, the parameter set is not complete
- - e.g. cubes
- 2. graphic image synthesizers producing images based on some defined parameterization
[Figure: block diagram with the labels MODEL DESIGNER, ANIMATOR, PARAMETERS, PARAMETRIC MODEL (procedures, functions, data, etc.), GRAPHIC DESCRIPTORS, GRAPHIC ROUTINES (shading, rendering, etc.), FACIAL IMAGES]
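The cube example mentioned above can be made concrete with a small sketch. The chosen parameter set (edge length and centre) is an assumption for illustration, not necessarily the one Parke uses.

```python
from itertools import product

def cube_vertices(edge, center):
    """Build the 8 corners of an axis-aligned cube from the tuple (edge, center)."""
    h = edge / 2.0
    cx, cy, cz = center
    return [(cx + sx * h, cy + sy * h, cz + sz * h)
            for sx, sy, sz in product((-1, 1), repeat=3)]

# Each member of the class is named by one parameter tuple.
unit_cube = cube_vertices(1.0, (0.0, 0.0, 0.0))
big_cube = cube_vertices(2.0, (5.0, 0.0, 0.0))
```

This parameter set is complete for axis-aligned cubes, but not for the class of all cubes: a rotated cube has no parameter tuple, so an orientation parameter would have to be added.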
Parke's Model (cont.)
Developing parameter sets
1. by observing surface properties of faces and developing ad hoc sets allowing the observed characteristics to be specified parametrically
2. by studying the underlying structures, or facial anatomy
3. by blending the two: parameters based on structural understanding, supplemented by parameters based on observation
- Based on Ekman's Facial Action Coding System (FACS) (uses about 50 facial actions)
Two broad categories of parameters
Expression parameters - controlling expression or emotional content
e.g. eyes: eyelid opening, eyebrow position, looking direction...; mouth: jaw rotation, width, expression of smile, frown...
Conformation parameters - controlling the structure of an individual face
e.g. color of skin, eyes, lips, etc.; nose, chin, forehead shape...
Effects of the Expression Parameters
[Figure: example faces labelled sadness, surprise, disgust, neutral, happiness, anger, fear]
Parke's Model (cont.)
- The Model
- - parameter set includes both expression and conformation parameters
- - derived from earlier, more general versions
Polygon Topology
- facial mask, each eyeball, and teeth are all separate polygon networks (connected networks)
- the 3D position of each polygon vertex varies according to the parameter values, eye orientation, and face orientation
- the polygons are sized and positioned to match the features of the face
Operations - five kinds of operations determine vertex positions from the parameter values
1. Procedural construction models the eyes
2. Interpolation is used for those portions of the face that change shape
3. Rotation is used to open the mouth
4. Scaling controls the relative size of facial features
5. Position offset controls the length of the nose, the corners of the mouth, and the raising of the upper lip
The advantage of parameterized models to the animator is that he/she need only manipulate a limited amount of information (the parameters) to create a sequence of images
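Three of the five operations listed above (interpolation, scaling and position offset) reduce to one-line vertex computations. The sketch below shows them on hypothetical vertices; the coordinates and names are assumptions for illustration and are not taken from Parke's data.

```python
def interpolate(extreme_a, extreme_b, p):
    """Blend one vertex between two extreme positions; p in [0, 1]."""
    return tuple(a + p * (b - a) for a, b in zip(extreme_a, extreme_b))

def scale_about(vertex, origin, factor):
    """Scale one vertex relative to an origin point."""
    return tuple(o + factor * (v - o) for v, o in zip(vertex, origin))

def offset(vertex, delta):
    """Translate one vertex by a fixed offset vector."""
    return tuple(v + d for v, d in zip(vertex, delta))

# Hypothetical data: one eyelid vertex with "open" and "closed" extremes,
# and one nose-tip vertex with a reference point at the base of the nose.
eyelid_open, eyelid_closed = (0.3, 0.5, 0.1), (0.3, 0.4, 0.1)
nose_tip, nose_base = (0.0, 0.0, 1.0), (0.0, 0.0, 0.5)

eyelid_half = interpolate(eyelid_open, eyelid_closed, 0.5)
scaled_nose_tip = scale_about(nose_tip, nose_base, 2.0)
longer_nose_tip = offset(nose_tip, (0.0, 0.0, 0.25))
```

In each case the animator supplies only a scalar or a small vector, and the model turns it into new vertex positions.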
Some of Parke's Descendants
- Contain a number of modifications to Parke's model to improve it and to make it more suitable for synthesized speech
- → usually a set of rules for generating facial control parameter trajectories from phonetic text, and a simple tongue model (neither of which was included in the original Parke model)
- Finnish Talking Head - Laboratory of Computational Engineering at Helsinki University of Technology
- Controlled using 49 parameters, of which 12 are used in visual speech (lip formation and jaw rotation)
- Use of eyes and eyebrows
- Future work: improving coarticulation modelling and adding a tongue to the model
- - Self-critical drawback: the current quality of the speech synthesizer is far from that of a natural voice
- KTH Talking Heads - Centre for Speech Technology at KTH Royal Institute of Technology
- Flexible architecture
- → allows the creation of new characters either by adopting a static wireframe model and specifying the required deformation parameters for that model, or by sculpting and reshaping an already parameterized model
- Use of eyes, eyebrows, and tongue; also several different expressions
- - Currently working on improving dynamic articulation modelling
[Figure: KTH talking heads labelled Kattis, Holger, Sven, Gunnar, Urban, Olga, August, Gustav]
Mr. Smoketoomuch
Baldi - UCSC Perceptual Science Lab
- Can be aligned with synthetic or natural speech
- Main use as a language tutor for children with hearing problems
- → help with pronunciation and real-time feedback
- → children can easily practice lip reading, as Baldi produces probably the most accurate automatic generation of visible speech in the world (according to Dr. Ron Cole, University of Colorado, Boulder)
- → in addition, the children wear headphones over an acoustic nerve implant (inserted behind the ear during a three-hour operation) that converts sound into electrical signals that can be relayed to the brain
- - Also used for autistic people and people with reading disabilities
Articulatory Models
a.k.a. Pseudomuscle-Based Models
- aim not to simulate the detailed facial anatomy exactly but to develop models with only a few control parameters that emulate the basic face muscle actions
- based on an abstraction of muscle actions, where deformation operators define muscle activities (ignoring the tissue structures and the exact muscle structure)
Platt & Badler (1981): a mass-spring model to simulate muscles
- uses Ekman's Facial Action Coding System (FACS)
- based on the underlying facial structure
- facial points are simulated in the skin, the muscles and the bones by 3D networks
- skin is defined by a set of 3D points defining a modifiable surface
- bones represent an initial unmovable level
- between the two levels are muscles, as groups of points with elastic arcs
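The three-level structure described above (fixed bone, muscle points, skin surface, joined by elastic arcs) can be illustrated with a very small spring network. This is a static relaxation under Hooke's law written for illustration only, not Platt and Badler's actual formulation; the points, stiffness values and the contraction force are all hypothetical.

```python
import math

def spring_force(p, q, rest_length, stiffness):
    """Hooke's-law force on point p from a spring ("elastic arc") joining p and q."""
    d = [qi - pi for pi, qi in zip(p, q)]
    length = math.sqrt(sum(c * c for c in d))
    stretch = length - rest_length
    return [stiffness * stretch * c / length for c in d]

def relax(points, fixed, springs, external, step=0.01, iterations=5000):
    """Move the free points along the net force until the network settles."""
    pts = [list(p) for p in points]
    for _ in range(iterations):
        forces = [list(external.get(i, (0.0, 0.0, 0.0))) for i in range(len(pts))]
        for i, j, rest, k in springs:
            f = spring_force(pts[i], pts[j], rest, k)
            for axis in range(3):
                forces[i][axis] += f[axis]   # force on point i
                forces[j][axis] -= f[axis]   # equal and opposite force on point j
        for i, p in enumerate(pts):
            if i not in fixed:               # bone points never move
                for axis in range(3):
                    p[axis] += step * forces[i][axis]
    return [tuple(p) for p in pts]

# Hypothetical three-level column: bone (fixed), muscle point, skin point.
points = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]
springs = [(0, 1, 1.0, 10.0), (1, 2, 1.0, 10.0)]   # (i, j, rest length, stiffness)
contraction = {1: (0.0, 0.0, -2.0)}                # pull the muscle point toward the bone

settled = relax(points, fixed={0}, springs=springs, external=contraction)
```

The force is applied only to the muscle point, yet the skin point follows it through the connecting spring, which is the effect the layered model relies on.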
Articulatory Models (cont.)
N. Magnenat-Thalmann, E. Primeau and D. Thalmann: Abstract Muscle Action (AMA) procedures
- more complex than the single parameter approach and a general muscle approach
- work on specific regions of the human face (which must be defined when the face is constructed)
- each AMA procedure approximates the action of a single muscle or a group of closely related muscles
e.g. the vertical jaw action responsible for opening the mouth → the single procedure is composed of several motions (lowering the corners of the mouth, lowering the lip and parts of the upper lip, and rounding the overall lip shape)
- the order of the actions is extremely important, as the AMA procedures are not independent
→ each AMA procedure is responsible for a facial parameter corresponding approximately to a muscle
- similar to, but not the same as, FACS action units
- Weakness: dependence on the order of the actions, as the muscles are not independent of each other
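The order dependence noted above arises whenever a procedure selects the vertices it moves by their current position. The two procedures below are hypothetical 2D stand-ins, not the published AMA procedures; they only show that applying the same two actions in a different order gives a different face.

```python
def lower_jaw(vertices, amount):
    """Hypothetical jaw action: move every vertex below y = 0 down by amount."""
    return [(x, y - amount) if y < 0.0 else (x, y) for x, y in vertices]

def raise_lip(vertices, amount):
    """Hypothetical lip action: raise every vertex within 0.15 of the line y = 0."""
    return [(x, y + amount) if abs(y) < 0.15 else (x, y) for x, y in vertices]

# Three (x, y) points on a lip profile: upper lip, lower lip, chin.
lip_profile = [(0.0, 0.1), (0.0, -0.1), (0.0, -0.5)]

jaw_then_lip = raise_lip(lower_jaw(lip_profile, 0.2), 0.2)
lip_then_jaw = lower_jaw(raise_lip(lip_profile, 0.2), 0.2)
```

When the jaw is lowered first, the lower-lip point leaves the region the lip action works on and ends near y = -0.3. When the lip is raised first, the same point crosses y = 0, escapes the jaw action and ends near y = 0.1.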
LIP AMA procedures
- human lips are very complex (they may take almost any shape)
- the goal of muscle simulation for lip control is to provide the illusion of generating the same motion as human lips without imitating their complexity
→ complex lip motions are decomposed into several simple motions (each simpler motion produced by an AMA procedure)
- Vertical Jaw → opening the mouth; the jaw is the only movable bone in the head; composed of a series of successive small motions
- Close_Upper_Lip and Close_Lower_Lip → close the lips when they are open; may be manipulated separately
- Left_Lip_Raiser and Right_Lip_Raiser → control the raising of the upper lip by the lip raiser muscle on the side of the nose
- Compressed_Lip → muscle used in kissing
- Mouth_Beak → pushes the lips out, similarly to a bird's beak
- Left_Zygomatic and Right_Zygomatic → muscle responsible for smiling; raises the commissure in the vertical direction
- Right_Risorius and Left_Risorius → pull the commissure in the horizontal direction
Two higher levels in order to improve the user interface
The Expression Level
- facial expressions: groups of facial parameter values
1. Phonemes: facial expressions which only use mouth motion and directly contribute to speech
- combination of several mouth motions corresponding to specific sounds useful for speaking
- each phoneme corresponds to a lip motion and a tongue position
2. Emotions: facial expressions acting on any part of the face (crying, smile, laughter)
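Since an expression is described above as a group of facial parameter values, it can be pictured as a named table of values that is expanded into individual parameters. In the sketch below the procedure names come from the list above, but the numeric values, the `intensity` scaling and the override rule in `combine` are assumptions made for illustration.

```python
# Hypothetical parameter values; only the procedure names come from the slides.
EXPRESSIONS = {
    "phoneme_o": {"Vertical Jaw": 0.4, "Mouth_Beak": 0.8},
    "smile": {"Left_Zygomatic": 0.7, "Right_Zygomatic": 0.7},
}

def expression_parameters(name, intensity=1.0):
    """Expand a named expression into facial parameter values, scaled by intensity."""
    return {param: value * intensity for param, value in EXPRESSIONS[name].items()}

def combine(*parameter_sets):
    """Merge parameter sets; later sets override earlier ones on shared parameters."""
    merged = {}
    for params in parameter_sets:
        merged.update(params)
    return merged

# A half-strength smile (emotion) combined with a phoneme.
face = combine(expression_parameters("smile", 0.5), expression_parameters("phoneme_o"))
```

The animator then works with two names instead of four separate parameter values, which is the user-interface gain the expression level is meant to provide.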
The Script Level
- a collection of multiple tracks (a track is a chronological sequence of key frames)
- one track per facial parameter (or AMA procedure) and two tracks for facial expressions
- a track for a facial parameter can at any time be modified and mixed with the facial expression
- the animation itself is performed by spline interpolation
Constraints on human faces
1. the human face is assumed to be approximately symmetric
→ only half of the face is entered into the computer
→ AMA procedures assume complete symmetry, and thus results can be strange in asymmetrical cases
2. AMA procedures are translation-independent, but can be scale-dependent
→ parameters of the procedures may need to be scaled by a factor F when the face is scaled by F
3. Division of the human face into specific regions (skin, teeth, etc.)
→ the order of the parts is significant
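A track, as described above, is a time-ordered list of key frames for one facial parameter, and the frames in between are obtained by spline interpolation. The slides do not say which spline is used, so the sketch below assumes a uniform Catmull-Rom spline; the track values and the 25 frames per second rate are hypothetical.

```python
import bisect

def catmull_rom(p0, p1, p2, p3, u):
    """Uniform Catmull-Rom spline value between p1 and p2 for u in [0, 1]."""
    return 0.5 * ((2.0 * p1)
                  + (-p0 + p2) * u
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * u * u
                  + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * u * u * u)

def sample_track(track, t):
    """Evaluate a track, a time-sorted list of (time, value) key frames, at time t."""
    times = [time for time, _ in track]
    values = [value for _, value in track]
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect.bisect_right(times, t) - 1          # segment [times[i], times[i + 1]]
    u = (t - times[i]) / (times[i + 1] - times[i])
    p0 = values[max(i - 1, 0)]                     # repeat end values at the borders
    p3 = values[min(i + 2, len(values) - 1)]
    return catmull_rom(p0, values[i], values[i + 1], p3, u)

# Hypothetical track for one facial parameter: (time in seconds, value).
jaw_track = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.2), (1.5, 0.0)]
samples = [sample_track(jaw_track, frame / 25.0) for frame in range(38)]
```

The curve passes through every key frame, so the animator's key values are reproduced exactly and only the motion between them is generated.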
Script Example: Rendez-vous à Montréal
Humphrey Bogart: "Here's looking at you, kid."
Marilyn Monroe: "Oh, play it again, (Sam)."
Complete version at http://www.miralab.unige.ch/newMIRA/Multimedia/films/FilmsroomMOVIES.htm
Animation of human faces based on AMA procedures is part of the HUMAN FACTORY system
→ a system for directing synthetic actors in their environment
- complete animation of the human body and complex hand animation
Conclusion: the use of AMA procedures creates fairly realistic results