Title: Face Animation Overview with Shameless Bias Toward MPEG-4 Face Animation Tools
1. Face Animation Overview with Shameless Bias Toward MPEG-4 Face Animation Tools
Dr. Eric Petajan, Chief Scientist and Founder, face2face animation, inc. (eric_at_f2f-inc.com)
2. Computer-generated Face Animation Methods
- Morph targets/key frames (traditional)
- Speech articulation model (TTS)
- Facial Action Coding System (FACS)
- Physics-based (skin and muscle models)
- Marker-based (dots glued to face)
- Video-based (surface features)
3. Morph targets/key frames
- Advantages
- Complete manual control of each frame
- Good for exaggerated expressions
- Disadvantages
- Hard to achieve good lip sync without manual tweaking
- Morph targets must be downloaded to the terminal for streaming animation (delay)
4. Speech articulation model
- Advantages
- High level control of face
- Enables TTS
- Disadvantages
- Robotic character
- Hard to sync with real voice
5. Facial Action Coding System
- Advantages
- Very high level control of face
- Maps to morph targets
- Explicit specification of emotional states
- Disadvantages
- Not good for speech
- Not quantified
6. Physics-based
- Advantages
- Good for realistic skin, muscle and fat
- Collision detection
- Disadvantages
- High complexity
- Must be driven by high-level articulation parameters (TTS)
- Hard to drive with motion capture data
7. Marker-based
- Advantages
- Can provide accurate motion data from most of the face
- Face models can be animated directly from surface feature point motion
- Disadvantages
- Dots glued to face
- Dots must be manually registered
- Not good for accurate inner lip contour or eyelid tracking
8. Video-based
- Advantages
- Simple to capture video of face
- Face models can be animated directly from surface feature motion
- Disadvantages
- Must have good view of face
9. What is MPEG-4 Multimedia?
- Natural audio and video objects
- 2D and 3D graphics (based on VRML)
- Animation (virtual humans)
- Synthetic speech and audio
10. Samples versus Objects
- Traditional video coding is sample-based (blocks of pixels are compressed)
- MPEG-4 provides visual object representation for better compression and new functionalities
- Objects are rendered in the terminal after decoding object descriptors
11. Object-based Functionalities
- User can choose display of content layers
- Individual objects (text, models) can be searched or stored for later use
- Content is independent of display resolution
- Content can be easily repurposed by the provider for different networks and users
12. MPEG-4 Object Composition
- Objects are organized in a scene graph
- Scene graphs are specified using a binary format called BIFS (based on VRML)
- Both 2D and 3D objects, properties and transforms are specified in BIFS
- BIFS allows objects to be transmitted once and instanced repeatedly in the scene after transformations (see the sketch below)
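A minimal sketch of the transmit-once/instance-repeatedly idea, using a plain Python structure in place of real BIFS nodes (the node names and fields here are illustrative, not the BIFS encoding):

```python
# One subtree is defined once and referenced twice, analogous to
# VRML/BIFS DEF/USE: the geometry is transmitted a single time and
# composed into the scene under two different transforms.
chair = {"node": "Shape", "geometry": "chair_mesh"}

scene = {"node": "Group", "children": [
    {"node": "Transform", "translation": (0.0, 0.0, 0.0), "children": [chair]},
    {"node": "Transform", "translation": (2.0, 0.0, 0.0), "children": [chair]},
]}

# Both transforms point at the same object in memory, mirroring how a
# BIFS scene graph reuses a transmitted node rather than resending it.
assert scene["children"][0]["children"][0] is scene["children"][1]["children"][0]
```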
13. MPEG-4 Operation Sequence
15. Faces are Special
- Humans are hard-wired to respond to faces
- The face is the primary communication interface
- Human faces can be automatically analyzed and parameterized for a wide variety of applications
16. MPEG-4 Face and Body Animation Coding
- Face animation is in MPEG-4 version 1
- Body animation is in MPEG-4 version 2
- Face animation parameters displace feature points from neutral position
- Body animation parameters are joint angles
- Face and body animation parameter sequences are compressed to low bitrates
17. Neutral Face Definition
- Head axes parallel to the world axes
- Gaze is in the direction of the Z axis
- Eyelids tangent to the iris
- Pupil diameter is one third of iris diameter
- Mouth is closed and the upper and lower teeth are touching
- Tongue is flat and horizontal, with the tip of the tongue touching the boundary between the upper and lower teeth (two of these conditions are checked in the sketch below)
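A minimal sketch that checks two of the quantitative neutral-face conditions above; the measurement record and its field names are hypothetical, not part of the MPEG-4 spec:

```python
from dataclasses import dataclass

@dataclass
class FaceMeasurements:
    pupil_diameter: float
    iris_diameter: float
    upper_teeth_y: float   # vertical position of the upper-teeth edge
    lower_teeth_y: float   # vertical position of the lower-teeth edge

def is_neutral(face: FaceMeasurements, tol: float = 1e-3) -> bool:
    # Pupil diameter must be one third of the iris diameter.
    pupil_ok = abs(face.pupil_diameter - face.iris_diameter / 3.0) <= tol
    # Upper and lower teeth must be touching (mouth closed).
    teeth_ok = abs(face.upper_teeth_y - face.lower_teeth_y) <= tol
    return pupil_ok and teeth_ok
```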
18. Face Feature Points
[Diagram: annotated face and teeth showing the MPEG-4 feature points; the legend distinguishes feature points affected by FAPs from other feature points]
19. Face Animation Parameter Normalization
- Face Animation Parameters (FAPs) are normalized to facial dimensions
- Each FAP is measured as a fraction of neutral-face mouth width, mouth-nose distance, eye separation, or iris diameter (see the sketch below)
- 3 head and 2 eyeball rotation FAPs are Euler angles
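A minimal sketch of this normalization, assuming the common FAP-unit (FAPU) convention of 1/1024 of each neutral-face distance; the class and field names are mine, not the spec's:

```python
from dataclasses import dataclass

@dataclass
class NeutralFace:
    mouth_width: float      # distance between mouth corners (MW)
    mouth_nose: float       # mouth-to-nose distance (MNS)
    eye_separation: float   # distance between pupil centers (ES)
    iris_diameter: float    # iris diameter (IRISD)

    def fapu(self, unit: str) -> float:
        # One FAP unit is assumed here to be 1/1024 of the
        # corresponding neutral-face distance.
        base = {"MW": self.mouth_width, "MNS": self.mouth_nose,
                "ES": self.eye_separation, "IRISD": self.iris_diameter}
        return base[unit] / 1024.0

def normalize_fap(displacement: float, face: NeutralFace, unit: str) -> float:
    """Express a feature-point displacement (model units) in FAPUs, so the
    value is independent of the absolute size of the face model."""
    return displacement / face.fapu(unit)

# Example: a 0.6-unit lower-lip drop on a face with a 6.0-unit-wide
# neutral mouth is 0.6 / (6.0 / 1024), about 102.4 FAPUs.
face = NeutralFace(6.0, 1.8, 6.2, 1.2)
print(normalize_fap(0.6, face, "MW"))
```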
20. Neutral Face Dimensions for FAP Normalization
21. FAP Groups
22. Lip FAPs
- Mouth is closed if the sum of the upper and lower lip FAPs is 0 (see the sketch below)
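One plausible reading of that condition as code, pairing each upper-lip FAP with its opposing lower-lip FAP (the pairing itself is an assumption here):

```python
def mouth_closed(upper_lip_faps, lower_lip_faps, tol=1e-6):
    # Mouth is closed when each opposing upper/lower lip FAP pair
    # sums to zero, i.e. the displacements cancel and the lips meet.
    return all(abs(u + l) <= tol
               for u, l in zip(upper_lip_faps, lower_lip_faps))
```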
23. Face Model Independence
- FAPs are always normalized for model independence
- FAPs (and BAPs) can be used without MPEG-4 systems/BIFS
- Private face models can be accurately animated with FAPs
- Face models can be simple or complex depending on terminal resources
24. MPEG-4 BIFS Face Node
- Face node contains FAP node, Face scene graph, Face Definition Parameters (FDP), FIT, and FAT
- FIT (Face Interpolation Table) specifies interpolation of FAPs in the terminal
- FAT (Face Animation Table) maps FAPs to face model deformation
- FDP information includes face feature point positions and texture map
25. Face Model Download
- 3D graphical models (e.g. faces) can be downloaded to the terminal with MPEG-4
- 3D model specification is based on VRML
- Face Animation Table (FAT) maps FAPs to face model vertex displacements (see the sketch below)
- Appearance and animation of downloaded face models are exactly predictable
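A minimal sketch of a FAT-style lookup, simplified to one linear segment per vertex (the real table allows piecewise-linear intervals, and the FAP id and displacement values here are hypothetical):

```python
import numpy as np

# For each FAP id, a list of (vertex_index, displacement_per_FAPU) pairs.
FAT = {
    51: [(120, np.array([0.0, -0.004, 0.0])),   # hypothetical lower-lip vertices
         (121, np.array([0.0, -0.003, 0.0]))],
}

def apply_faps(neutral_vertices: np.ndarray, faps: dict) -> np.ndarray:
    """Displace the neutral-face mesh by the active FAPs: each vertex
    moves by (FAP value) x (its per-FAPU displacement from the table)."""
    out = neutral_vertices.copy()
    for fap_id, value in faps.items():
        for vertex_index, per_unit in FAT.get(fap_id, []):
            out[vertex_index] += value * per_unit
    return out
```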
26. FAP Compression
- FAPs are adaptively quantized to desired quality level
- Quantized FAPs are differentially coded
- Adaptive arithmetic coding further reduces bitrate
- Typical compressed FAP bitrate is less than 2 kilobits/second
27. FAP Predictive Coding
[Block diagram: FAP(t) minus the frame-delayed reconstruction is quantized (Q) and arithmetic coded into the bitstream; an inverse quantizer (Q-1) and frame delay close the prediction loop. A code sketch of this loop follows.]
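A minimal sketch of the quantize-and-predict loop from the diagram, omitting the arithmetic coder (the quantized symbols would be entropy-coded in the real codec); the step size and values are illustrative:

```python
def encode_faps(fap_sequence, step):
    """DPCM encoder: quantize the difference between each FAP value and
    the reconstruction of the previous frame, as in the diagram."""
    symbols = []
    recon_prev = 0.0
    for x in fap_sequence:
        q = round((x - recon_prev) / step)  # quantized prediction error
        symbols.append(q)                   # would feed the arithmetic coder
        recon_prev += q * step              # decoder-matched reconstruction
    return symbols

def decode_faps(symbols, step):
    out, recon = [], 0.0
    for q in symbols:
        recon += q * step
        out.append(recon)
    return out

values = [0.0, 0.75, 1.75, 2.0, 1.25]
assert decode_faps(encode_faps(values, 0.25), 0.25) == values
```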
28. Face Analysis System
- MPEG-4 does not specify analysis systems
- The face2face face analysis system tracks nostrils for robust operation
- Inner lip contour is estimated using adaptive color thresholding and lip modeling (see the sketch below)
- Eyelids, eyebrows and gaze direction are also estimated
29. Nostril Tracking
30. Inner Lip Contour Estimation
31. FAP Estimation Algorithm
- Head scale is normalized based on neutral mouth (closed mouth) width
- Head pitch is approximated based on vertical nostril deviation from neutral head position
- Head roll is computed from smoothed eye or nostril orientation, depending on availability (see the sketch below)
- Inner lip FAPs are measured directly from the inner lip contour as deviations from the neutral lip position (closed mouth)
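A minimal sketch of the roll step only, assuming 2D image coordinates for a tracked left/right point pair (eye centers or nostrils); the smoothing and the eye/nostril fallback logic are omitted:

```python
import math

def head_roll_degrees(left_pt, right_pt):
    """Roll angle of the line through two tracked feature points;
    zero when the line is horizontal in the image."""
    dx = right_pt[0] - left_pt[0]
    dy = right_pt[1] - left_pt[1]
    return math.degrees(math.atan2(dy, dx))

print(head_roll_degrees((100, 200), (160, 210)))  # about 9.5 degrees
```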
32. FAP Sequence Smoothing
33. MPEG-4 Visemes and Expressions
- A weighted combination of 2 visemes and 2 facial expressions for each frame (see the sketch below)
- Decoder is free to interpret the effect of visemes and expressions after FAPs are applied
- Definitions of visemes and expressions using FAPs can also be downloaded
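A minimal sketch of the per-frame weighted combination; the two example viseme FAP vectors are made up for illustration:

```python
def blend(fap_vectors, weights):
    """Weighted combination of FAP vectors, e.g. two visemes (or two
    facial expressions) blended for a single frame."""
    out = [0.0] * len(fap_vectors[0])
    for vec, w in zip(fap_vectors, weights):
        for i, v in enumerate(vec):
            out[i] += w * v
    return out

viseme_pbm = [0.0, -40.0, 35.0]   # hypothetical mouth FAPs for /p, b, m/
viseme_fv  = [5.0, -10.0, 60.0]   # hypothetical mouth FAPs for /f, v/
print(blend([viseme_pbm, viseme_fv], [0.7, 0.3]))  # [1.5, -31.0, 42.5]
```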
34. Visemes
35. Facial Expressions
36. Free Face Model Software
- Wireface is an OpenGL-based, MPEG-4 compliant face model
- Good starting point for building high-quality face models for web applications
- Reads a FAP file and a raw audio file
- Renders face and audio in real time
- Wireface source is freely available
37. Body Animation
- Harmonized with the VRML H-Anim spec
- Body Animation Parameters (BAPs) are humanoid skeleton joint Euler angles
- Body Animation Table (BAT) can be downloaded to map BAPs to skin deformation
- BAPs can be highly compressed for streaming
38. Body Animation Parameters (BAPs)
- 186 humanoid skeleton Euler angles (see the sketch below)
- 110 free parameters for use with a downloaded body surface mesh
- Coded using the same codecs as FAPs
- Typical bitrate for coded BAPs is 5-10 kbits/second
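A minimal sketch of turning three joint Euler angles into a rotation matrix, using a Z-Y-X convention chosen for illustration (the standard fixes the actual axis order per joint):

```python
import math

def rotation_zyx(yaw, pitch, roll):
    """Rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in
    radians; one joint's BAP triple would feed a function like this."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]
```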
39. Body Definition Parameters (BDPs)
- Humanoid joint center positions
- Names and hierarchy harmonized with the VRML/Web3D H-Anim working group
- Default positions in the standard for broadcast applications
- Download just the BDPs to accurately animate an unknown body model
40. Faces Enhance the User Experience
- Virtual call center agents
- News readers (e.g. Ananova)
- Story tellers for the child in all of us
- eLearning
- Program guide
- Multilingual (same face different voice)
- Entertainment animation
- Multiplayer games
41. Visual Content for the Practical Internet
- Broadband deployment is happening slowly
- DSL availability is limited and cable is shared
- Talking heads need high frame-rate
- Consumer graphics hardware is cheap and powerful
- MPEG-4 SNHC/FBA tools are matched to available bandwidth and terminals
42. Visual Speech Processing
- FAPs can be used to improve speech recognition accuracy
- Text-to-speech systems can use FAPs to animate face models
- FAPs can be used in computer-human dialogue systems to communicate emotions, intentions and speech, especially in noisy environments
43. Video-driven Face Animation
- Facial expressions, lip movements and head motion transferred to face model
- FAPs extracted from talking head video with special computer vision system
- No face markers or lipstick is required
- Normal lighting is used
- Communicates lip movements and facial expressions with visual anonymity
44. Automatic Face Animation Demonstration
- FAPs extracted from camcorder video
- FAPs compressed to less than 2 kbits/sec
- 30 frames/sec animation generated automatically
- Face models animated with bones rig or fixed deformable mesh (real-time)
46. What is easy, solved, or almost solved
- Can we do photorealistic non-animated face models? YES
- Can we do near-real-time lip-syncing that is indistinguishable from a human? NO
47. What is really hard
- Synthesizing human speech and facial expressions
- Hair
48. What we have assumed someone else is solving
- Graphics acceleration
- Video camera cost and resolution
- Multimedia communication infrastructure
49. Where we need help
- We have a face with 68 parameters, but we need the psychologists to tell us how to drive it autonomously
- We need to embody our agents into graphical models that have a couple of thousand parameters to control gaze, gesture, body language, and do collision detection -> NEED MORE SPEED
50. Core functionality of the face
- Speech
- Lips, teeth, tongue
- Emotional expressions
- Gaze, eyebrow, eyelids, head pose
- Non-verbal communication
- Sensory responsivity
- Technical requirements
- Framerate
- Synchronization
- Latency
- Bitrate
- Spatial resolution
- Complexity
- Common framework with body
- Interaction
- Different faces should respond similarly to common commands
- Accessible to everyone
51. Interaction with other components
- Language and discourse
- Phoneme-to-viseme mapping (see the sketch after this list)
- Given/new
- Action in the environment
- Global information
- Emotional state
- Personality
- Culture
- World knowledge
- Central time-base and timestamps
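A minimal sketch of a phoneme-to-viseme lookup. MPEG-4 defines 14 visemes plus a neutral value; the grouping shown here is a small illustrative subset, not the normative table:

```python
# Many phonemes map to one viseme because they look alike on the lips.
PHONEME_TO_VISEME = {
    "p": "pbm", "b": "pbm", "m": "pbm",   # bilabials share one mouth shape
    "f": "fv",  "v": "fv",                # labiodentals
    "s": "sz",  "z": "sz",                # alveolar fricatives
}

def to_visemes(phonemes):
    """Map a phoneme sequence to viseme labels, defaulting to neutral."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

print(to_visemes(["h", "e", "l", "p"]))  # ['neutral', 'neutral', 'neutral', 'pbm']
```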
52. Open questions
- Central vs. peripheral functionality
- Degree of interface commonality
- Degree of agent autonomy
- What should the virtual human (VH) be capable of?