Title: Animating Virtual Humans in Intelligent Multimedia Storytelling
1 Animating Virtual Humans in Intelligent Multimedia Storytelling
- Minhua Eunice Ma and Paul Mc Kevitt
- School of Computing and Intelligent Systems
- Faculty of Engineering
- University of Ulster, Magee
- Derry, Northern Ireland
2 Outline
- State-of-the-art virtual human animation standards
  - VRML/X3D and MPEG-4 for object modelling
  - H-Anim and MPEG-4 SNHC for humanoid modelling
  - VHML and STEP for human animation modelling
  - Natural language to 3D animation
- Language visualisation (animation) in the intelligent multimodal storytelling system CONFUCIUS
- Humanoid animation in CONFUCIUS
  - Multiple animation channels
  - Space sites of virtual humans
  - Virtual object manipulation
- Conclusion and future work
3 Four levels of virtual human representation
Current virtual human representation languages can be classified into four groups according to their level of abstraction, from 3D geometry modelling up to language animation.
(high level animation)
Level 4 Natural language to animation: CONFUCIUS, AnimNL
Level 3 Human animation modelling: VHML (BAML), XML-based; STEP, script-based
Level 2 3D human modelling: MPEG-4 SNHC, H-Anim
Level 1 3D object modelling: VRML (X3D), MPEG-4
(low level animation)
4 Level 1: 3D object modelling
- VRML (Virtual Reality Modelling Language) is a hierarchical scene description language that defines the geometry and behaviour of a 3D scene. X3D is the successor to VRML.
- MPEG-4 uses BIFS (Binary Format for Scenes) for real-time streaming. BIFS borrows many concepts from VRML; BIFS and VRML can be seen as different representations of the same data.
5 Level 2: 3D human modelling
- H-Anim is a standard VRML97 representation for humanoids. It defines standard human joint articulations, segment dimensions, and sites for end effectors and attachment points for clothing.
- MPEG-4 SNHC (Synthetic/Natural Hybrid Coding) incorporates H-Anim. It provides an efficient way to animate virtual humans, and tools for efficient compression of the animation parameters associated with the H-Anim human model.
6 H-Anim joint-segment hierarchy
- An H-Anim file contains a joint-segment hierarchy.
- Each joint node may contain other joint nodes and a segment node that describes the body part associated with the joint.
- Each segment is a normal VRML transform node describing the body part's geometry and texture.
- H-Anim humanoids can be animated using keyframing, inverse kinematics, and other animation techniques.
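The joint-segment hierarchy can be sketched as a small tree. This is an illustrative Python model rather than H-Anim's VRML syntax: the `Joint` and `Segment` classes and the `walk_joints` helper are hypothetical names introduced here, geometry and texture are left out, and only a fragment of the lower body is shown, using joint and segment names from the H-Anim specification.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Segment:
    """Body part attached to a joint (geometry and texture omitted here)."""
    name: str


@dataclass
class Joint:
    """A joint holds at most one segment and any number of child joints."""
    name: str
    segment: Optional[Segment] = None
    children: List["Joint"] = field(default_factory=list)


def walk_joints(joint: Joint, depth: int = 0) -> Iterator[Tuple[int, str, Optional[str]]]:
    """Yield (depth, joint name, segment name) in depth-first order."""
    yield depth, joint.name, joint.segment.name if joint.segment else None
    for child in joint.children:
        yield from walk_joints(child, depth + 1)


# Fragment of a humanoid: root -> sacroiliac -> both hips -> left knee.
l_knee = Joint("l_knee", Segment("l_calf"))
l_hip = Joint("l_hip", Segment("l_thigh"), [l_knee])
r_hip = Joint("r_hip", Segment("r_thigh"))
sacroiliac = Joint("sacroiliac", Segment("pelvis"), [l_hip, r_hip])
root = Joint("HumanoidRoot", Segment("sacrum"), [sacroiliac])

for depth, joint_name, segment_name in walk_joints(root):
    print("  " * depth + f"{joint_name} ({segment_name})")
```

Because each joint owns its children, rotating a joint such as `l_hip` would carry the whole subtree (thigh, knee, calf) with it, which is what makes keyframing and inverse kinematics on such a hierarchy practical.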
7 H-Anim models on the Web
Virtual human models    Authors             URL
Nancy                   Cindy Ballreich     [1]
Baxter, Nana            Christian Babski    [2]
Y.T., Hiro              Matt Beitler        [3]
Dilbert                 Matt Beitler        [3]
Max                     Matt Beitler        [3]
Jake                    Matt Beitler        [3]
Dork                    Michael Miller      [4]
URLs
[1] http://www.ballreich.net/vrml/h-anim/nancy_h-anim.wrl
[2] http://ligwww.epfl.ch/babski/StandardBody
[3] http://www.cis.upenn.edu/beitler/H-Anim/Models/H-Anim1.1/
[4] http://students.cs.tamu.edu/mmiller/hanim/proto/dork-proto.wrl
8 Level 3: Human animation modelling
- VHML (Virtual Human Mark-up Language) is an XML-based language that provides an intuitive way to define virtual human animation. It is composed of several sub-languages: DMML, FAML, BAML, SML, and EML.
- STEP is a scripting language for human actions. It has a Prolog-like syntax, which makes it compatible with most standard logic programming languages.
9 VHML and STEP examples
A. A VHML example
  <left-calf-flex amount="medium">
  <right-calf-flex amount="medium">
  <left-arm-front amount="medium">
  <right-arm-front amount="medium">
  Standing on my knees I beg you pardon
  </right-arm-front></left-arm-front>
  </right-calf-flex></left-calf-flex>
B. A STEP example
  script(walk_forward_step(Agent), ActionList) :-
    ActionList = parallel([script_action(walk_pose(Agent)),
                           move(Agent, front, fast)]).
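To show how the nesting in the VHML example might be read programmatically, the following sketch parses the fragment with Python's standard `xml.etree.ElementTree`. It assumes the tags are spelled `<left-calf-flex amount="medium">` and so on, as the slide appears to intend. `collect_actions` is a hypothetical helper, and treating every enclosing element as active while the text is spoken is an interpretation, not something the slide states.

```python
import xml.etree.ElementTree as ET

# The VHML fragment from the slide, indented for readability.
VHML_FRAGMENT = """
<left-calf-flex amount="medium">
  <right-calf-flex amount="medium">
    <left-arm-front amount="medium">
      <right-arm-front amount="medium">
        Standing on my knees I beg you pardon
      </right-arm-front>
    </left-arm-front>
  </right-calf-flex>
</left-calf-flex>
"""


def collect_actions(element, enclosing=()):
    """Return (text, enclosing body actions) pairs for each element's text.

    Only text directly inside an element is collected; text that follows
    a closing tag (the element's tail) is ignored in this sketch.
    """
    here = enclosing + ((element.tag, element.get("amount")),)
    found = []
    text = (element.text or "").strip()
    if text:
        found.append((text, here))
    for child in element:
        found.extend(collect_actions(child, here))
    return found


root = ET.fromstring(VHML_FRAGMENT.strip())
for text, actions in collect_actions(root):
    print(text)
    for tag, amount in actions:
        print(f"  {tag}: {amount}")
```

Under this reading, the sentence is spoken while all four body actions (both calves flexed, both arms raised to the front) are held, which corresponds to a parallel composition in STEP terms.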
10 Level 4: Natural language to animation
- High-level animation applications convert natural language to virtual human animation. Little research on virtual human animation focuses on this level.
- The AnimNL project aims to enable people to use natural language instructions to tell virtual humans what to do.
- CONFUCIUS also deals with language animation.
- Research at this level will lead to powerful web-based applications.
11 Architecture of CONFUCIUS
[Figure: architecture diagram of CONFUCIUS. Components shown:]
- Input: natural language sentences
- Knowledge base: language knowledge (WordNet, LCS database, FDG parser); visual/audio knowledge (3D models, animations, audio encapsulated in graphic models); 3D authoring tools, existing 3D models, virtual human models
- Processing modules: surface transformer, natural language processing, media allocator, text-to-speech, animation engine (with nonspeech audio), presentation agent (Merlin the Narrator), synchronizing, narration integration
- Other labels: mapping, semantic representation
- Output: 3D virtual world with speech in VRML; multimodal presentation
12 Humanoid animation in CONFUCIUS
[Figure: flowchart of humanoid animation in CONFUCIUS. Stages shown:]
- Input: semantic representation
- Animation controller: checks whether the event predicate matches a basic human motion in the animation library (Y/N); the diagram also shows a user interaction step
- Motion instantiation: either loads a precreated keyframe animation or provides an animation specification for animation generation
- Environment placement: applies spatial information to place OBJ/HUMAN into a specified environment
- Camera controller: automatic camera placement, applying cinematic rules
- Output: VRML file of the virtual story world
13 Multiple animation channels
- Level 3 human animation modelling languages (VHML, STEP) provide a facility for specifying both sequential and parallel temporal relations.
- Simultaneous animations cause a Dining Philosophers problem for higher-level animation that uses predefined animation data: multiple animations may request access to the same body parts at the same time.
- Multiple animation channels allow a character to run several animations at once, e.g. walking with the lower body while waving with the upper body.
- Multi-channel systems often disable one channel while a specific animation is playing on another channel, to avoid conflicts between animations.
Joints involved in each animation
Animation      sacroiliac  l_hip  r_hip  r_shoulder
walk           2           2      2      1
jump           2           2      2      1
wave           0           0      0      2
run            2           2      2      1
scratch head   0           0      0      2
sit            2           2      2      1
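One way to use such a table to detect competing requests for the same body parts is sketched below. The slide does not define the values 0, 1 and 2; this sketch assumes that 0 means the joint is unused, 1 means the animation uses the joint but can yield it, and 2 means the animation needs exclusive control. The names `JOINT_USE`, `conflicts` and `joint_owner` are hypothetical.

```python
JOINTS = ("sacroiliac", "l_hip", "r_hip", "r_shoulder")

# Values copied from the table; their interpretation is an assumption:
# 0 = joint unused, 1 = used but can yield, 2 = needs exclusive control.
JOINT_USE = {
    "walk": (2, 2, 2, 1),
    "jump": (2, 2, 2, 1),
    "wave": (0, 0, 0, 2),
    "run": (2, 2, 2, 1),
    "scratch head": (0, 0, 0, 2),
    "sit": (2, 2, 2, 1),
}


def conflicts(a, b):
    """Return the joints on which both animations need exclusive control."""
    return [joint for joint, ua, ub in zip(JOINTS, JOINT_USE[a], JOINT_USE[b])
            if ua == 2 and ub == 2]


def joint_owner(a, b):
    """Assign each used joint to the animation with the stronger claim.

    Returns None if the two animations conflict on any joint.
    """
    if conflicts(a, b):
        return None
    owners = {}
    for joint, ua, ub in zip(JOINTS, JOINT_USE[a], JOINT_USE[b]):
        if ua or ub:
            owners[joint] = a if ua >= ub else b
    return owners


print(joint_owner("walk", "wave"))
print(conflicts("walk", "run"))
print(conflicts("wave", "scratch head"))
```

Under these assumptions, walk and wave can share a character (walk keeps the hips and sacroiliac, wave takes the right shoulder), whereas walk and run, or wave and scratch head, compete for the same joints and cannot play on separate channels at once.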
14 Space sites of virtual humans
- Types of virtual objects
  - Small props, manipulated by hands or feet, e.g. cup, hat, ball
  - Big props, sources or targets of actions, e.g. table, chair, tree
  - Stage props, which have internal structure, e.g. house, restaurant, chapel
- Site tags of virtual humans
  - For manipulating small props: 6 sites on the hands (three for each hand), one site on the head (skull_tip), and one site on each foot tip
  - For placing big props: 5 sites indicating five directions around the human body: x_front, x_back, x_left, x_right, x_bottom. Big props such as a table or chairs are usually placed at these positions.
  - For setting stage props: 5 more space tags indicating more distant places: far_front, far_back, far_left, far_right, far_top. Stage props (e.g. a house) are often located at these far sites.
[Figure labels: grip, pincer grip, pushing, pointing]
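The mapping from prop type to candidate site tags can be written down as a small lookup. The head, near and far tags come from the slide; the hand and foot tags below (`l_hand_1`, `l_foot_tip`, and so on) are placeholders, because the slide gives only their counts and not their names. `candidate_sites` is a hypothetical helper.

```python
# Site tags named on the slide.
NEAR_SITES = ["x_front", "x_back", "x_left", "x_right", "x_bottom"]
FAR_SITES = ["far_front", "far_back", "far_left", "far_right", "far_top"]

# Placeholder tags: the slide gives only the counts (three sites per hand,
# one per foot tip), not the names.
HAND_SITES = [f"{side}_hand_{i}" for side in ("l", "r") for i in (1, 2, 3)]
FOOT_SITES = ["l_foot_tip", "r_foot_tip"]

SITES_BY_PROP_TYPE = {
    "small": HAND_SITES + ["skull_tip"] + FOOT_SITES,
    "big": NEAR_SITES,
    "stage": FAR_SITES,
}


def candidate_sites(prop_type):
    """Return the site tags at which a prop of this type may be placed."""
    try:
        return SITES_BY_PROP_TYPE[prop_type]
    except KeyError:
        raise ValueError(f"unknown prop type: {prop_type!r}") from None


print(candidate_sites("big"))
print(len(candidate_sites("small")))
```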
15 Virtual object manipulation
Two approaches to organizing the knowledge required for successful grasping:
- Store the applicable objects in the animation file of an action, and use lexical knowledge of nouns to infer hypernymy relations between objects.
- Include the manipulation hand postures and movements within the object description, besides its intrinsic object properties. Such objects can describe in detail their functionality and their possible interactions with virtual humans.
4 stored hand postures for interacting with 3D objects:
- grip (hold a cup handle, a knob, a bottle)
- index pointing (press a button)
- pincer grip (use thumb and index finger to pick up small objects)
- palm push (push a piece of furniture)
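The first approach, inferring applicable objects through hypernymy, can be illustrated with a toy lookup. The hypernym table and the posture-to-object assignments below are hand-written for this example and are not taken from WordNet or from CONFUCIUS; `HYPERNYM`, `POSTURE_APPLIES_TO`, `hypernym_chain` and `posture_for` are hypothetical names.

```python
# Toy hypernym links (child -> parent), hand-written for illustration.
HYPERNYM = {
    "mug": "cup",
    "cup": "container",
    "bottle": "container",
    "doorbell": "button",
    "sofa": "furniture",
    "table": "furniture",
}

# Object classes each stored hand posture is assumed to apply to.
POSTURE_APPLIES_TO = {
    "grip": {"container", "knob"},
    "index pointing": {"button"},
    "palm push": {"furniture"},
}


def hypernym_chain(noun):
    """Return the noun followed by its ancestors in the toy hierarchy."""
    chain = [noun]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def posture_for(noun):
    """Pick the first posture whose applicable classes include the noun
    or one of its hypernyms; return None if nothing applies."""
    for candidate in hypernym_chain(noun):
        for posture, classes in POSTURE_APPLIES_TO.items():
            if candidate in classes:
                return posture
    return None


print(posture_for("mug"))
print(posture_for("sofa"))
print(posture_for("cloud"))
```

The point of the design is that an action's animation file only needs to list general classes such as "container"; a specific noun like "mug" is resolved by walking up its hypernym chain.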
16 Conclusion
- Classified virtual human representation languages into four levels of abstraction.
- CONFUCIUS is an overall framework for intelligent multimedia storytelling that combines 3D modelling/animation techniques with natural language understanding technologies to achieve higher-level virtual human animation.
- A number of projects are currently based on virtual human animation, working in various application domains. Few of them take the modern NLP approach on which a high-level human animation system should be based.
- The value of CONFUCIUS lies in generating 3D animation from natural language by automating the processes of language parsing, semantic representation and animation production.
- Potential application areas: computer games, animation production and direction, multimedia presentation, shared virtual worlds.
- Future work: coordination and synchronization of multiple virtual humans.