modeling individual and group actions in meetings with layered HMMs

1
modeling individual and group actions in meetings
with layered HMMs
dong zhang, daniel gatica-perez, samy bengio, iain mccowan, guillaume lathoud
idiap research institute, martigny, switzerland
2
meetings as sequences of actions
  • human interaction
  • similar/complementary roles
  • individuals constrained by group
  • agenda: a prior sequence of
    • discussion points
    • presentations
    • decisions to be made
  • minutes: a posterior sequence of
    • key phases
    • summarized discussions
    • decisions made

3
the goal: recognizing sequences of meeting actions
[figure: a meeting timeline annotated at several views — group task (information sharing, decision making), group interest level (high, neutral), and discussion phase/topic (presentation, group discussion; weather, budget); group-level actions provide these meeting views]
4
our work: two-layer HMMs
  • decompose the recognition problem
  • both layers use HMMs
  • individual action layer (I-HMM): various models
  • group action layer (G-HMM)

5
our work in detail
  1. definition of meeting actions
  2. audio-visual observations
  3. action recognition
  4. results

D. Zhang et al., Modeling Individual and Group Actions in Meetings with Layered HMMs, IEEE CVPR Workshop on Event Mining, 2004.
I. McCowan et al., ICASSP 2003; PAMI 2005.
N. Oliver et al., ICMI 2002.
6
1. defining meeting actions
  • multiple parallel views
  • tech-based: what can we recognize?
  • application-based: respond to user needs
  • psychology-based: coding schemes from social psychology

7
multi-modal turn-taking
  • describes the group discussion state
  • group actions:
    • discussion,
    • monologue (x4),
    • white-board,
    • presentation,
    • note-taking,
    • monologue + note-taking (x4),
    • white-board + note-taking,
    • presentation + note-taking
  • individual actions:
    • speaking,
    • writing,
    • idle
  • actions are multi-modal in nature (the full vocabulary is written out in the sketch below)

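For concreteness, the action vocabulary above can be written down as plain constants. A minimal sketch in Python; the identifier names are mine, not the paper's:

    # group actions: the multi-modal turn-taking vocabulary from this slide
    # (monologue and monologue + note-taking are one class per participant)
    GROUP_ACTIONS = (
        ["discussion", "white-board", "presentation", "note-taking",
         "white-board + note-taking", "presentation + note-taking"]
        + [f"monologue{i}" for i in range(1, 5)]
        + [f"monologue{i} + note-taking" for i in range(1, 5)]
    )

    # individual actions: one label per participant per frame
    INDIVIDUAL_ACTIONS = ["speaking", "writing", "idle"]

    assert len(GROUP_ACTIONS) == 14
    assert len(INDIVIDUAL_ACTIONS) == 3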
8
example
[figure: example timeline — per-person individual actions (S = speaking, W = writing), plus whiteboard and presentation usage, aligned with the group action sequence: monologue1 + note-taking, discussion, presentation + note-taking, whiteboard + note-taking]
9
2. audio-visual observations
  • audio:
    • 12 channels, 48 kHz
    • 4 lapel microphones
    • 1 microphone array
  • video:
    • 3 CCTV cameras
  • all synchronized

10
multimodal feature extraction: audio
  • microphone array:
    • speech activity (SRP-PHAT) at seats and presentation/whiteboard area
    • speech/silence segmentation
  • lapel microphones (a minimal sketch follows below):
    • speech pitch
    • speech energy
    • speaking rate

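A minimal sketch of the lapel-microphone prosodic features (energy and pitch) in Python with librosa; SRP-PHAT localization and speaking-rate estimation are omitted, the 5 f/s rate follows the experiment setup later in the deck, and the function name is mine:

    import librosa
    import numpy as np

    def lapel_features(wav_path, frame_rate=5):
        """Per-frame speech energy and pitch from one lapel channel."""
        y, sr = librosa.load(wav_path, sr=None, mono=True)
        hop = sr // frame_rate                 # 5 feature frames per second
        frame = 2 * hop

        # short-time energy (RMS) per frame
        energy = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)[0]

        # fundamental frequency via the YIN estimator, limited to speech range
        pitch = librosa.yin(y, fmin=60, fmax=400, sr=sr,
                            frame_length=frame, hop_length=hop)

        n = min(len(energy), len(pitch))
        return np.column_stack([energy[:n], pitch[:n]])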
11
multimodal feature extraction: video
  • head and hands blobs (see the sketch below):
    • skin colour models (GMM)
    • head position
    • hands position features (eccentricity, size, orientation)
  • head and hands blob motion:
    • moving blobs from background subtraction

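A minimal sketch of the skin-colour GMM step using scikit-learn: fit a mixture on labelled skin pixels, then classify each pixel of a frame by thresholding its log-likelihood. Blob grouping, the shape features and the background-subtraction motion cue are left out; names and the threshold value are illustrative:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_skin_model(skin_pixels, n_components=5):
        """Fit a colour GMM on an (N, 3) array of known skin pixels."""
        gmm = GaussianMixture(n_components=n_components, covariance_type="full")
        gmm.fit(skin_pixels)
        return gmm

    def skin_mask(frame, gmm, threshold=-10.0):
        """Binary skin mask for an (H, W, 3) frame via per-pixel log-likelihood."""
        h, w, _ = frame.shape
        ll = gmm.score_samples(frame.reshape(-1, 3).astype(float))
        return (ll > threshold).reshape(h, w)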
12
3. recognition with two-layer HMM
  • each layer trained independently (illustrative sketch below)
  • trained as in ASR (using Torch)
  • simultaneous segmentation and recognition

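The paper trained with Torch; purely as an illustration, here is how the two independently trained layers could be wired together in Python with hmmlearn — one Gaussian HMM per individual action, whose soft outputs become observations for the group layer. The windowed scoring, the helper names and hmmlearn itself are my assumptions, not the original implementation:

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    def train_action_models(segments_by_action, n_states=3):
        """Layer 1: one I-HMM per individual action, trained independently."""
        models = {}
        for action, segments in segments_by_action.items():
            X = np.vstack(segments)
            lengths = [len(s) for s in segments]
            models[action] = GaussianHMM(n_components=n_states).fit(X, lengths)
        return models

    def i_hmm_outputs(models, features, win=10):
        """Per-window soft decision: normalized likelihoods of the I-HMMs."""
        out = []
        for t in range(0, len(features) - win + 1, win):
            ll = np.array([m.score(features[t:t + win]) for m in models.values()])
            ll -= ll.max()                     # stabilize before exponentiating
            p = np.exp(ll)
            out.append(p / p.sum())
        return np.array(out)

    # layer 2: a G-HMM is then trained, also independently, on the
    # concatenated soft-decision vectors of all participants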
13
models for I-HMM
  • early integration
    • all observations concatenated
    • captures correlation between streams
    • requires frame-synchronous streams
  • multi-stream (Dupont, TMM 2000)
    • one HMM per stream (audio or visual), trained independently
    • decoding: weighted likelihoods combined at each frame (sketch below)
    • allows little inter-stream asynchrony
    • used in multi-band and audio-visual ASR
  • asynchronous (Bengio, NIPS 2002)
    • audio and visual streams share a single state sequence
    • states emit on one or both streams, given a synchronization variable
    • allows inter-stream asynchrony

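A minimal sketch of the multi-stream combination rule: per-stream HMM log-likelihoods are weighted and summed at each frame. The function name is mine, and the default (0.8, 0.2) audio/video weighting mirrors the figure residue on the individual-action results slide, so treat it as illustrative:

    import numpy as np

    def multistream_loglik(ll_audio, ll_video, w_audio=0.8, w_video=0.2):
        """Combine frame-level log-likelihoods of independently trained
        audio and video HMMs with fixed stream weights; the streams stay
        frame-synchronous, so no inter-stream asynchrony is modeled."""
        return w_audio * np.asarray(ll_audio) + w_video * np.asarray(ll_video)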
14
linking the two layers
  • hard decision: the i-action model with the highest probability outputs 1, all other models output 0
  • soft decision: each individual action model outputs its probability
  • (both schemes are sketched below)

[figure: audio-visual features feed the I-HMMs, whose outputs link to the G-HMM; hard decision e.g. (1, 0, 0) vs. soft decision e.g. (0.9, 0.05, 0.05)]
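A minimal sketch of the two linking schemes, assuming the I-HMM layer yields one log-likelihood per individual action model (function names are mine):

    import numpy as np

    def soft_decision(log_liks):
        """Normalized probabilities over the action models,
        e.g. (0.9, 0.05, 0.05)."""
        z = np.asarray(log_liks) - np.max(log_liks)
        p = np.exp(z)
        return p / p.sum()

    def hard_decision(log_liks):
        """One-hot vector for the most likely model, e.g. (1, 0, 0)."""
        out = np.zeros(len(log_liks))
        out[int(np.argmax(log_liks))] = 1.0
        return out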
15
4. experiments: data setup
  • 59 meetings (30/29 train/test split)
  • four people, five minutes each
  • scripted:
    • schedule of actions
    • natural behavior
  • features at 5 f/s

mmm.idiap.ch
16
performance measures
  • individual actions: frame error rate (FER)
  • group actions: action error rate (AER), computed as in the sketch below
    • Subs: number of substituted actions
    • Del: number of deleted actions
    • Ins: number of inserted actions
    • Total: number of target actions

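By analogy with word error rate, AER = (Subs + Del + Ins) / Total × 100. A minimal sketch computing it from a minimum-edit-distance alignment of the recognized action sequence against the target sequence (the function name is mine):

    def action_error_rate(target, recognized):
        """AER (%) = (substitutions + deletions + insertions) / len(target) * 100,
        obtained from a Levenshtein alignment of the two action sequences."""
        n, m = len(target), len(recognized)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            d[i][0] = i                        # i deletions
        for j in range(m + 1):
            d[0][j] = j                        # j insertions
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = d[i - 1][j - 1] + (target[i - 1] != recognized[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return 100.0 * d[n][m] / n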
17
results: individual actions
[table: frame error rates of the I-HMM models on 43000 test frames; stream weights (0.8, 0.2); asynchrony range (0.2-2.2 s)]
18
results: group actions
  • multi-modality outperforms single modalities
  • the two-layer HMM outperforms a single-layer HMM for audio-only, visual-only and audio-visual
  • best model: the asynchronous HMM
  • soft decision slightly better than hard decision

19
action-based meeting structuring
20
conclusions
  • structuring meetings as sequences of meeting actions
  • layered HMMs successful for recognition
  • turn-taking patterns useful for browsing
  • public dataset, standard evaluation procedures
  • open issues:
    • less training data (unsupervised; ACM MM 2004)
    • other relevant actions (interest level; ICASSP 2005)
    • other features (words, emotions)
    • efficient models for many interacting streams

21
Linking Two Layers (1)
22
Linking Two Layers (2)
Please refer to D. Zhang et al., Modeling Individual and Group Actions in Meetings: A Two-Layer HMM Framework, IEEE CVPR Workshop on Event Mining, 2004.