Title: Monitoring Human Behavior in an Office Environment
1. Monitoring Human Behavior in an Office Environment
Douglas Ayers and Mubarak Shah
Research conducted at the University of Central Florida
Now at Harris Corporation
2. Goals of the System
- Recognize human actions in a room for which prior knowledge is available
- Handle multiple people
- Provide a textual description of each action
- Extract key frames for each action
- Recognize all actions simultaneously for a given field of view
3. Possible Actions
- Enter
- Leave
- Sitting or Standing
- Picking Up Object
- Put Down Object
- Remove Object
- Open or Close
- Use Terminal
4. Prior Knowledge
- Spatial layout of the scene
- Location of entrances and exits
- Location of objects and some information about how they are used
- Context can then be used to improve recognition and save computation (a layout sketch follows below)
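This prior knowledge can be pictured as a small lookup structure. Below is a minimal Python sketch, assuming axis-aligned image regions; the region names and coordinates are illustrative, not taken from the actual scenes.

```python
# A minimal sketch of the prior knowledge about a scene, assuming
# axis-aligned boxes in image coordinates (x0, y0, x1, y1); the names
# and numbers are illustrative placeholders.
SCENE_LAYOUT = {
    "entrances": [{"name": "door", "box": (0, 100, 40, 220)}],
    "objects": {
        "phone":    {"box": (300, 180, 340, 210), "movable": True},
        "cabinet":  {"box": (420, 60, 560, 200),  "movable": False},
        "terminal": {"box": (200, 150, 280, 220), "movable": False},
    },
}

def near(point, box, margin=20):
    """True if an (x, y) point lies within `margin` pixels of a box."""
    x, y = point
    x0, y0, x1, y1 = box
    return x0 - margin <= x <= x1 + margin and y0 - margin <= y <= y1 + margin
```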
5. Layout of Scene 1
6. Layout of Scene 2
7. Layout of Scene 3
8. Layout of Scene 4
9. Major Components
- Skin Detection
- Tracking
- Scene Change Detection
- Action Recognition
10. Flow of the System
Skin Detection → Track People and Objects for This Frame → Determine Possible Interactions Between People and Objects → Scene Change Detection → Update States, Output Text, Output Key Frames
11. Skin Detection
- Uses a modified version of the Kjeldsen and Kender method
- Uses the Y'CbCr color space instead of HSV
- 3D histogram-based approach (sketched below)
- Training phase and detection phase
- Tracking is initialized with a window containing the largest connected skin region
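To make the histogram approach concrete, here is a minimal Python sketch, assuming 8-bit Y'CbCr input, a fixed bin count, and a simple probability threshold; the actual Kjeldsen and Kender modification may differ in binning and smoothing.

```python
# A minimal sketch of 3D histogram-based skin detection in Y'CbCr,
# with an assumed bin count and threshold (not the paper's values).
import numpy as np

BINS = 32  # bins per channel of the 3D histogram (illustrative)

def train_histogram(skin_pixels):
    """skin_pixels: (N, 3) array of Y'CbCr values from labeled skin."""
    hist, _ = np.histogramdd(skin_pixels, bins=BINS, range=[(0, 256)] * 3)
    return hist / hist.sum()  # normalize to a probability distribution

def detect_skin(image_ycbcr, hist, threshold=1e-4):
    """Return a boolean mask of likely skin pixels."""
    idx = (image_ycbcr // (256 // BINS)).astype(int)  # bin index per channel
    probs = hist[idx[..., 0], idx[..., 1], idx[..., 2]]
    return probs > threshold
```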
12. Tracking
- Tracking is done on heads and movable objects
- Code provided by Fieguth and Terzopoulos
- Uses the Y'CbCr color space
- Tracks multiple people and objects
- Simple non-linear estimator
- Goodness-of-fit equation (sketched below)
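The goodness-of-fit measure can be sketched as follows, assuming the channel-wise ratio test from Fieguth and Terzopoulos' color tracker, where values near 1 indicate a good match; the exact form used in this system is not shown on the slide.

```python
# A minimal sketch of a Fieguth-Terzopoulos style goodness of fit,
# assumed here to be psi = mean over channels of
# max(target / measured, measured / target), so psi >= 1 and values
# near 1 indicate a good color match.
import numpy as np

def goodness_of_fit(target, measured, eps=1e-6):
    """target, measured: mean Y'CbCr color vectors of the tracked region."""
    t = np.asarray(target, dtype=float) + eps
    m = np.asarray(measured, dtype=float) + eps
    return np.mean(np.maximum(t / m, m / t))

# Example: accept a candidate window only when the fit is close to 1.
print(goodness_of_fit([120, 140, 150], [118, 143, 149]) < 1.05)
```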
13. Scene Change Detection
- Similar to the method used by Sandy Pentland's group in Pfinder
- Uses brightness-normalized chrominance
- Takes a 0.5-second clip to build the background model
- Determines change based on the Mahalanobis distance (sketched below)
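Here is a minimal sketch of this style of change detection, assuming a per-pixel Gaussian over brightness-normalized chrominance (r, g) = (R, G) / (R + G + B) with a diagonal covariance; Pfinder itself estimates richer per-pixel statistics.

```python
# A minimal sketch of Pfinder-style change detection using
# brightness-normalized chrominance and a per-pixel Mahalanobis test.
import numpy as np

def normalized_chrominance(frames_rgb):
    """frames_rgb: (T, H, W, 3) float array; returns (T, H, W, 2)."""
    s = frames_rgb.sum(axis=-1, keepdims=True) + 1e-6
    return (frames_rgb / s)[..., :2]  # keep normalized r and g

def build_model(clip_rgb):
    """Fit a per-pixel mean/variance from a short background clip."""
    chroma = normalized_chrominance(clip_rgb)
    return chroma.mean(axis=0), chroma.var(axis=0) + 1e-6

def changed_pixels(frame_rgb, mean, var, threshold=3.0):
    """Flag pixels whose Mahalanobis distance exceeds the threshold."""
    chroma = normalized_chrominance(frame_rgb[None])[0]
    d2 = ((chroma - mean) ** 2 / var).sum(axis=-1)  # squared distance
    return d2 > threshold ** 2
```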
14. Possible Actions
- Enter: a significant skin region is detected for several frames
- Leave: the tracking box is close to the edge of an entrance area
- Sitting or Standing: a significant increase or decrease in the tracking box's y-component (sketched after the next slide)
- Picking Up Object: after the person has moved away from the object's initial position, a scene change is detected
15. Possible Actions
- Put Down Object: the person moves away from the object
- Remove Object: the object is close to the edge of an entrance area
- Open or Close: an increase or decrease in the amount of scene change
- Use Terminal: the person is sitting and a scene change is detected near the mouse, but not behind it
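As an example of these rules, the Sitting/Standing heuristic can be sketched as a comparison of the tracking box's y-component against its recent history; the threshold and window used here are illustrative assumptions, not the system's actual values.

```python
# A minimal sketch of the sit/stand heuristic: compare the tracking
# box's y-component against its recent history. The threshold is an
# illustrative assumption.
def detect_sit_stand(y_history, min_change=40):
    """y_history: recent top-of-box y coordinates (image rows grow down)."""
    if len(y_history) < 2:
        return None
    delta = y_history[-1] - y_history[0]
    if delta > min_change:
        return "sit"    # box moved down the image: person sat
    if delta < -min_change:
        return "stand"  # box moved up the image: person stood
    return None
```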
16. State Model for Action Recognition
[State diagram: Start → Enter → Standing; Standing ⇄ Sitting via Stand/Sit transitions (each emitting no output, "/ 0"); Near Phone, Near Cabinet, and Near Terminal lead via Pick Up Phone, Open/Close Cabinet, and Use Terminal into the states Talking on Phone, Opening/Closing Cabinet, and Using Terminal, with Put Down Phone / Hanging Up Phone returning from the phone state; Leave → End. A transition-table sketch follows.]
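The diagram can be approximated in code as a plain transition table keyed by (state, event); the states and events below are a simplified subset of the full model, and the event names are illustrative.

```python
# A minimal sketch of the action-recognition state machine as a
# transition table; unknown events leave the state unchanged.
TRANSITIONS = {
    ("start", "enter"): "standing",
    ("standing", "sit"): "sitting",
    ("sitting", "stand"): "standing",
    ("standing", "pick_up_phone"): "talking_on_phone",
    ("talking_on_phone", "put_down_phone"): "standing",
    ("standing", "open_close_cabinet"): "opening_closing_cabinet",
    ("opening_closing_cabinet", "done"): "standing",
    ("sitting", "use_terminal"): "using_terminal",
    ("using_terminal", "done"): "sitting",
    ("standing", "leave"): "end",
}

def step(state, event):
    """Advance one state; unrecognized events keep the current state."""
    return TRANSITIONS.get((state, event), state)

# Example run: enter, sit, use the terminal, stand, and leave.
state = "start"
for event in ["enter", "sit", "use_terminal", "done", "stand", "leave"]:
    state = step(state, event)
print(state)  # -> "end"
```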
17. Key Frames
- Why get key frames?
  - Key frames take less space to store
  - Key frames take less time to transmit
  - Key frames can be viewed more quickly
- We use heuristics to determine when key frames are taken
  - Some are taken before the action occurs
  - Some are taken after the action occurs
18. Key Frames
- Enter: key frames as the person leaves the entrance/exit area
- Leave: key frames as the person enters the entrance/exit area
- Standing/Sitting: key frames after the tracking box has stopped moving up or down, respectively
- Open/Close: key frames when the number of changed pixels stabilizes (sketched below)
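The Open/Close rule can be sketched as follows, assuming "stabilizes" means the changed-pixel count stays within a small tolerance over a short window; the window and tolerance values are illustrative.

```python
# A minimal sketch of the "changed pixels stabilize" key-frame rule,
# with an assumed stability window and tolerance.
def is_stable(counts, window=10, tolerance=50):
    """counts: per-frame numbers of changed pixels (most recent last)."""
    if len(counts) < window:
        return False
    recent = counts[-window:]
    return max(recent) - min(recent) <= tolerance

# Capture a key frame at the first frame where the count settles.
counts, key_frame = [], None
for frame_idx, n_changed in enumerate([0, 400, 900, 980, 990] + [995] * 10):
    counts.append(n_changed)
    if key_frame is None and is_stable(counts):
        key_frame = frame_idx
print(key_frame)
```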
19. Demo Sequences
- Two people opening a cabinet, closing a cabinet, sitting, and standing
- One person picking up a phone and putting it down
- One person moving a file folder
- One person sitting and using a terminal
20. Sequence 1
21. Key Frames: Sequence 1 (350 frames), Part 1
22. Key Frames: Sequence 1 (350 frames), Part 2
23. Sequence 2
24. Key Frames: Sequence 2 (200 frames)
25. Sequence 3
26. Key Frames: Sequence 3 (200 frames)
27. Sequence 4
28. Key Frames: Sequence 4 (399 frames), Part 1
29. Key Frames: Sequence 4 (399 frames), Part 2
30. Sequence 5
31. Sequence 6
32. Summary
- Successfully recognizes a set of human actions
- Uses prior knowledge about scene layout
- Handles multiple people simultaneously
- Generates key frames and textual descriptions
33. Limitations
- Limited by:
  - Skin detection
  - Tracking
  - Scene change detection
- Some actions difficult to model
- Limited by prior knowledge
- Limited by field of view
34. Future Work
- Increase the action vocabulary
- Determine people's identities
- Improve the field of view
  - Wide-angle lens
  - Multiple cameras
- Implement in real time