Title: CAMEO: Year 1 Progress and Year 2 Goals
1. CAMEO: Year 1 Progress and Year 2 Goals
Manuela Veloso, Takeo Kanade, Fernando de la Torre, Paul Rybski, Brett Browning, Raju Patil, Carlos Vallespi, Betsy Ricker
2. CAMEO Internals
3. CAMEO's Connection to other CALO Agents
CAMEO is an example of a physical event capture
system. Systems such as these transmit state
information about people to the CALO timeline
server.
Individualized CALO agents can access this
information to obtain updates about their
individual users.
4. Inferring Meeting State with CAMEO: Overview
- CAMEO observes the activities of people in a meeting
- Raw visual motion is segmented into discrete actions
- High-level meeting state is inferred from the aggregate actions of the group
5. Training CAMEO to Recognize Human Actions
6. Action Recognition
Person action sequences are represented as a simple finite state machine.
[Figure: Person Action State Machine]
State transitions are encoded in a dynamic Bayesian network, which infers the current person state as a function of the observed human activity and the previous state.
[Figure: Dynamic Bayesian Network]
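A minimal sketch of this kind of inference, assuming an HMM-style forward update over four illustrative person states; the transition and observation probabilities below are invented for illustration, not CAMEO's learned parameters.

```python
import numpy as np

# Illustrative person states and hand-picked parameters (not CAMEO's learned values).
STATES = ["sitting", "standing up", "standing", "sitting down"]

# P(next state | current state): mostly self-transitions, with the
# sit -> standing up -> standing -> sitting down -> sit cycle allowed.
TRANSITIONS = np.array([
    [0.90, 0.10, 0.00, 0.00],
    [0.00, 0.60, 0.40, 0.00],
    [0.00, 0.00, 0.90, 0.10],
    [0.40, 0.00, 0.00, 0.60],
])

# P(observed vertical motion | state): raw tracker output discretized into
# "none", "up", "down".
OBS = {"none": np.array([0.8, 0.1, 0.8, 0.1]),
       "up":   np.array([0.1, 0.8, 0.1, 0.1]),
       "down": np.array([0.1, 0.1, 0.1, 0.8])}

def forward_step(belief, observation):
    """One step of forward filtering: predict with the transition model,
    then weight by the likelihood of the observed motion."""
    predicted = TRANSITIONS.T @ belief
    updated = predicted * OBS[observation]
    return updated / updated.sum()

belief = np.array([1.0, 0.0, 0.0, 0.0])      # start out sitting
for obs in ["none", "up", "up", "none", "none"]:
    belief = forward_step(belief, obs)
    print(obs, STATES[int(np.argmax(belief))], np.round(belief, 2))
```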
7. Classification of Person State in a Meeting
Example of person state classification: here, the states of a person are correctly classified from the Bayesian network. The parameters for the activity data are learned from previously-recorded meeting data.
[Plot: classified person state (Sit, Sitting, Stand, Standing) vs. time in seconds]
8. Classification of the Meeting State
Global meeting state is defined by the aggregate
activities of every person attending the meeting.
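As an illustration of the idea (the rules below are invented, not the deck's actual classifier), the aggregate person states could be mapped to a coarse meeting state roughly like this:

```python
from collections import Counter

def meeting_state(person_states):
    """Map the per-person states to a coarse meeting state.
    The rules below are illustrative, not CAMEO's actual model."""
    counts = Counter(person_states)
    standing = counts["standing"] + counts["walking"]
    if not person_states:
        return "No Meeting"
    if standing == 1 and counts["sitting"] >= 2:
        return "Presentation"        # one speaker up front, audience seated
    if standing == 0:
        return "General Discussion"  # everyone seated
    return "Transition"              # people moving around, e.g. start/end

print(meeting_state(["sitting", "sitting", "standing"]))   # Presentation
print(meeting_state(["sitting", "sitting", "sitting"]))    # General Discussion
```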
9. Generating Meeting Summary
- Meeting event log becomes summary
- Low and high-level events can be organized into a hierarchy
- Meeting can be viewed at any requested level of detail, from summary to captured video (and eventually audio)
- 2004-02-03 Project Status Report
- 130405 Meeting Start
- 131212 General Discussion
- 131945 Presentation
- 132423 General Discussion
- 132929 Meeting End
10. Generating Meeting Summary
- Meeting event log becomes summary
- Low and high-level events can be organized into a hierarchy (see the sketch after the log below)
- Meeting can be viewed at any requested level of detail, from summary to captured video (and eventually audio)
- 2004-02-03 Project Status Report
- 130405 Meeting Start
- 131212 General Discussion
- 131945 Presentation
- 131945 Jim stands
- 131950 Jim walks to podium
- 132000 Jim speaks
- 132204 Unknown speaks
- 132245 Jim speaks
- 133023 Wendy stands
- 133037 Wendy walks to podium
- 133042 Wendy speaks
- 133304 Wendy sits down
- 133304 Jim speaks
- 133850 Jim sits down
- 134023 General Discussion
- 135029 Meeting End
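A sketch of how such a two-level log can be rendered at a requested level of detail; the event tuples mirror a few entries from the log above, and the rendering rule (print everything at or above the requested level) is an assumption.

```python
# Events as (HHMMSS timestamp, level, description); level 0 = meeting segment,
# level 1 = individual person event. Timestamps follow the log above.
EVENTS = [
    ("130405", 0, "Meeting Start"),
    ("131212", 0, "General Discussion"),
    ("131945", 0, "Presentation"),
    ("131945", 1, "Jim stands"),
    ("131950", 1, "Jim walks to podium"),
    ("132000", 1, "Jim speaks"),
    ("133850", 1, "Jim sits down"),
    ("134023", 0, "General Discussion"),
    ("135029", 0, "Meeting End"),
]

def summarize(events, max_level):
    """Print the log down to the requested level of detail:
    max_level=0 gives the summary, max_level=1 adds person events."""
    for ts, level, text in events:
        if level <= max_level:
            print("  " * level + f"{ts} {text}")

summarize(EVENTS, max_level=0)   # summary view (slide 9)
summarize(EVENTS, max_level=1)   # detailed view (slide 10)
```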
11. Protecting Individuals: Privacy Issues
- Recognition is voluntary: CAMEO only recognizes people it has registered.
- We can digitally represent video logs so that faces are distorted or represented only as shapes (see the sketch below)
[Figure: Raw video with tracking information]
[Figure: Stored video log after privacy filtering]
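One plausible way to implement the privacy filtering, sketched with OpenCV; it assumes the face bounding boxes are already available from CAMEO's tracker and simply blurs or masks those regions before the frame is written to the stored log.

```python
import cv2
import numpy as np

def privacy_filter(frame, face_boxes, mode="blur"):
    """Distort or mask tracked face regions before logging a frame.
    face_boxes: list of (x, y, w, h) from the tracker (assumed given)."""
    out = frame.copy()
    for (x, y, w, h) in face_boxes:
        roi = out[y:y + h, x:x + w]
        if mode == "blur":
            out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (31, 31), 0)
        else:  # represent the face only as a flat shape
            out[y:y + h, x:x + w] = (128, 128, 128)
    return out

# Example on a synthetic frame with one "face" box.
frame = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
logged = privacy_filter(frame, [(100, 60, 64, 64)], mode="blur")
```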
12. Some ways CALO Agents could use CAMEO Data
- What meetings happened when?
- Who was at the meeting?
- Who was sitting, standing, or speaking?
- Where were people looking?
- Who was talking?
- What were people doing?
- Who was pointing at what?
- What happened during the formal presentation?
- What happened during the general discussion?
- What is a general/detailed summary of the meeting?
- What did person 'x' contribute to the meeting?
- How to replay a meeting from a specific point in time?
- How to replay specific parts of the meeting?
13. Some ways CALO Agents could use CAMEO Data
- What meetings happened when?
- When a meeting starts, CAMEO can post an event to the timeline server indicating the start time of the meeting. By querying the timeline server for events with the appropriate tag, CALO agents could determine when the various meetings started and obtain other information about them, such as what each meeting was about.
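The CALO timeline server's actual interface is not described here, so the sketch below uses a hypothetical in-memory stand-in just to show the post-then-query-by-tag pattern; the class and method names are invented.

```python
class TimelineServer:
    """Hypothetical stand-in for the CALO timeline server (not its real API)."""

    def __init__(self):
        self.events = []    # (timestamp, tag, data) tuples

    def post_event(self, timestamp, tag, data):
        self.events.append((timestamp, tag, data))

    def query(self, tag, start=None, end=None):
        hits = [e for e in self.events
                if e[1] == tag
                and (start is None or e[0] >= start)
                and (end is None or e[0] <= end)]
        return sorted(hits, key=lambda e: e[0])

server = TimelineServer()
server.post_event("2004-02-03 13:04:05", "meeting-start",
                  {"title": "Project Status Report"})
# A CALO agent looking for meetings:
for ts, tag, data in server.query("meeting-start"):
    print(ts, data["title"])
```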
14. Some ways CALO Agents could use CAMEO Data
- Who was at the meeting?
- Face recognition is required. This can be done by applying various kinds of image-matching algorithms (SVD, template matching, etc.) to measure how close a given face is to the entries in a database of saved faces, which must be available to work from.
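A compact eigenfaces-style sketch of that matching step (SVD over the registered-face database, nearest neighbour in the projected space); the image size, number of components, and rejection threshold are assumptions, and the random arrays stand in for real registered face crops.

```python
import numpy as np

def build_face_space(db_faces, k=8):
    """db_faces: (N, H*W) array of registered faces, one flattened crop per row."""
    mean = db_faces.mean(axis=0)
    U, S, Vt = np.linalg.svd(db_faces - mean, full_matrices=False)
    basis = Vt[:k]                                   # top-k eigenfaces
    coords = (db_faces - mean) @ basis.T             # database in face space
    return mean, basis, coords

def identify(query, mean, basis, coords, names, threshold=5.0):
    """Project a flattened query face and return the closest registered person,
    or None if nobody is close enough (person not registered with CAMEO)."""
    q = (query - mean) @ basis.T
    dists = np.linalg.norm(coords - q, axis=1)
    i = int(np.argmin(dists))
    return (names[i], dists[i]) if dists[i] < threshold else (None, dists[i])

# Toy example with random "faces" (stand-ins for 32x32 registered crops).
rng = np.random.default_rng(0)
db = rng.random((5, 32 * 32))
mean, basis, coords = build_face_space(db, k=3)
print(identify(db[2] + 0.01 * rng.random(32 * 32), mean, basis, coords,
               ["A", "B", "C", "D", "E"]))
```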
15. Some ways CALO Agents could use CAMEO Data
- Who is sitting, standing, or speaking?
- By tracking the positions of people as they move around, we should be able to tell who is sitting and who is standing. Depending on how animated the faces are in that state, we should also be able to tell who is speaking by how much their heads bob around.
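A heuristic version of both checks, assuming the tracker already reports a head height and a per-frame vertical face position for each person; the thresholds are placeholders, not calibrated values.

```python
import numpy as np

def posture(head_height, standing_threshold=1.3):
    """Classify sitting vs. standing from tracked head height (metres, assumed)."""
    return "standing" if head_height > standing_threshold else "sitting"

def is_speaking(head_y_track, window=30, motion_threshold=2.0):
    """Rough speaking cue: how much the face 'bobs' vertically (pixels)
    over the last `window` frames."""
    recent = np.asarray(head_y_track[-window:])
    return float(recent.std()) > motion_threshold

print(posture(1.6))                                        # standing
print(is_speaking(list(120 + 4 * np.sin(np.arange(30)))))  # True: lots of motion
```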
16. Some ways CALO Agents could use CAMEO Data
- Where are people looking?
- In order to determine where people are looking, a profile face detector is needed. With one, we should be able to tell which direction each person is facing and correlate this with the other faces in the image to figure out where people are likely to be looking.
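A rough one-dimensional illustration of the correlation step, assuming the profile detector reports each head's horizontal image position and a coarse facing direction (left or right); the geometry is simplified to picking the nearest other face on that side.

```python
def likely_gaze_target(observer_x, facing, others):
    """others: list of (name, x) for the other tracked faces in the image.
    Return the nearest face on the side the observer is facing, if any."""
    if facing == "right":
        candidates = [(x - observer_x, name) for name, x in others if x > observer_x]
    else:  # facing == "left"
        candidates = [(observer_x - x, name) for name, x in others if x < observer_x]
    return min(candidates)[1] if candidates else None

people = [("Jim", 120), ("Wendy", 400), ("Raju", 520)]
print(likely_gaze_target(300, "right", people))   # Wendy
print(likely_gaze_target(300, "left", people))    # Jim
```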
17. Some ways CALO Agents could use CAMEO Data
- Who was talking?
- Besides tracking the face movements, audio data could be recorded by instrumenting CAMEO or the meeting attendees with microphones (e.g., in collaboration with Alex Rudnicky). With multiple microphones in the room, sound localization techniques would be required.
18. Some ways CALO Agents could use CAMEO Data
- What were people doing?
- Besides the relative positions of people's bodies in the room, more detailed information could be obtained with a full-body tracker. Including information about the room itself, such as what else is in it (tables, whiteboards, or chairs), would let CAMEO report more detailed information.
19. Some ways CALO Agents could use CAMEO Data
- Who was pointing at what?
- We need even more detailed full-body tracking. By tracking arms and arm positions with a stereo camera (e.g., Trevor Darrell), we should be able to figure out where the person is pointing. By putting a stereo head on a panning mount, a lot of information about the environment could be obtained very easily. Even extending the 2D tracker so that it identifies arms as being attached to bodies might be enough to get this information. However, this only works well when the person is pointing in a direction perpendicular to CAMEO's line of sight. Having two CAMEOs would be a good way to solve this problem.
20. Some ways CALO Agents could use CAMEO Data
- What happened during the formal presentation?
- Information has to be collated and merged in such a way that the speaker is identified and the information regarding the speech and PowerPoint presentation is processed (CALO-MMD group).
21. Some ways CALO Agents could use CAMEO Data
- What happened during the general discussion?
- Information has to be collated and merged in such a way that the speakers are identified and the information regarding the speech is processed (CALO-MMD group).
22. Some ways CALO Agents could use CAMEO Data
- What is a general/detailed summary of the meeting?
- Given a state machine that describes the most common things in a meeting, we could cluster the individual events into larger states that indicate the various sections of the meeting based on a generic agenda (intro, formal presentation, questions, open discussion, wrap-up), or even a specific agenda that is provided to CAMEO ahead of time. People print out agendas and often bring them to formal meetings so that everyone can follow along.
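One simple way to get from the per-frame meeting states (slide 8) to agenda-level sections is to collapse runs of identical states into timed segments; the state sequence below is made up for illustration.

```python
from itertools import groupby

def segment_meeting(timed_states):
    """timed_states: list of (timestamp, meeting_state), in time order.
    Collapse consecutive identical states into (start, end, state) sections."""
    sections = []
    for state, run in groupby(timed_states, key=lambda ts: ts[1]):
        run = list(run)
        sections.append((run[0][0], run[-1][0], state))
    return sections

states = [("130405", "Transition"), ("131212", "General Discussion"),
          ("131500", "General Discussion"), ("131945", "Presentation"),
          ("133850", "Presentation"), ("134023", "General Discussion")]
for start, end, label in segment_meeting(states):
    print(start, end, label)
```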
23. Some ways CALO Agents could use CAMEO Data
- What did person 'x' contribute to the meeting?
- Tracking an individual person's speech and gestures allows the events posted to the timeline server to be gathered and clustered into a personalized kind of state machine that can be viewed at a very fine level of detail (individual gestures and actions) or as a high-level description such as "person x didn't talk very much", etc.
24. Some ways CALO Agents could use CAMEO Data
- How to replay a meeting from a specific point in time?
- The raw movie files are available. Once the individual person events are classified, their timestamps can be extracted from the timeline server and the video can be replayed from that location.
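A sketch of the replay step with OpenCV, assuming the recorded meeting video and the offset (in milliseconds from the start of recording, derived from the timeline-server timestamp) are known; the file name and offset are placeholders.

```python
import cv2

def replay_from(video_path, offset_ms):
    """Open the recorded meeting video and play it back starting at offset_ms."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, offset_ms)   # seek to the requested time
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("meeting replay", frame)
        if cv2.waitKey(33) & 0xFF == ord("q"):  # ~30 fps playback, 'q' to quit
            break
    cap.release()
    cv2.destroyAllWindows()

# e.g. replay from the start of the presentation segment (placeholder path/offset)
# replay_from("meeting_2004-02-03.avi", offset_ms=15 * 60 * 1000)
```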
25. Some ways CALO Agents could use CAMEO Data
- How to replay specific parts of the meeting, i.e., introductions, discussion after the presentation, wrap-up?
- We need to create a probabilistic meeting ontology that we can use to parse and tag the meeting, identifying its parts with different probabilities. We can learn models of different types of meetings by learning the probabilistic parameters of the ontology, or the Bayesian dependencies from meeting type, people, and purpose to the format of the meeting.