Title: Object Labelling from Human Action Recognition
1. Object Labelling from Human Action Recognition
1st IEEE Conference on Pervasive Computing and Communications, 2003, Fort Worth, Texas
Contributors: Patrick Peursum, Svetha Venkatesh, Geoff West, Hai Hung Bui (Presenter)
School of Computing, Curtin University of Technology, Perth, Western Australia
2. Introduction
- Aim: infer object identity from human action
- Indirectly recognise an object by detecting the signature human action needed to use the object
- Monitoring human activity in the home has certain problems and opportunities for this
  - ✓ Frequent and repeated human activity
  - ✓ Indoor scenes
  - ✓ Objects are often directly used, e.g. appliances
  - ✗ Wide-angle views, cluttered environment
  - ✗ Scene and object locations change over time
3. Objectives
- Evidence-based approach to labelling
  - Label objects in a scene based on repeated human interactions
  - Accumulation of evidence over time
- Flexible and robust to noise and errors
- Potentially adaptable to changes in the scene
- Independent of an object's physical structure
- Learn the location of chairs and floor areas
- Initial study into the potential of the approach
4. Related Work
- Traditional object recognition
  - Function-based variant (Stark and Bowyer, 1991)
  - Inherent difficulty in recognition using physical structure
- Human activity / action recognition
  - Focus is mainly on detecting anomalous activities (e.g. surveillance applications)
- Human-object interaction recognition
  - Work on the use of occlusion to estimate object positions and sizes (Grimson et al., 1998)
  - Top-down view of desk scenes, using hand movements for action recognition (Moore, 1999)
5. Method - Overview
1. Raw Video
2. Person Segmentation and Tracking
3. Activity Segmentation
4. Scene Labelling
6. Method - Person Segmentation
- Raw video
  - Four ceiling-mounted cameras, 25 fps
  - Monitor a single scene with overlapping fields of view
- Person segmentation and tracking (sketched below)
  - Gaussian mixture-model background subtraction (Stauffer et al., 2000) to find the person
  - Bounding box used to outline the person
  - Tracking via a Kalman filter on the box centroid
  - Views calibrated to a world coordinate system using Tsai's algorithm (Tsai, 1986)
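A minimal sketch of one view's segmentation and tracking, using OpenCV's Gaussian mixture-model background subtractor as a stand-in for the Stauffer-Grimson method; the video path and the filter parameters are hypothetical, and calibration to world coordinates is omitted.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("nw_view.avi")               # hypothetical camera file
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

# Kalman filter on the bounding-box centroid: state (x, y, dx, dy), measurement (x, y)
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                          # foreground mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        continue
    # Bounding box of the largest foreground blob, assumed to be the person
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    kf.predict()                                    # predict centroid motion
    kf.correct(np.array([[x + w / 2.0], [y + h / 2.0]], np.float32))
```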
7. Method - Activity Segmentation
- HMMs used to segment four activities
  - Walking, sitting down, seated, standing up
  - Sitting/Standing: strict left-right HMMs (10 states)
  - Walk/Seated: standard HMMs (5 and 3 states)
  - Walking → floor interaction
  - Others → chair interaction
- Training (sketched below)
  - 24 sequences of a person sitting into a chair
  - Each sequence manually segmented into the four different activities and used for HMM training
- Training features (from bounding box)
  - World height (mm), change in height/width, velocity
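A minimal sketch of training the four activity HMMs, assuming the hmmlearn library (which postdates the original work); the sequence variables in the usage comment are hypothetical, and EM fitting replaces whatever training regime the authors used.

```python
import numpy as np
from hmmlearn import hmm

def left_right_transmat(n_states):
    """Strict left-right transitions: each state may stay or advance by one."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = A[i, i + 1] = 0.5
    A[-1, -1] = 1.0
    return A

def train_activity_hmm(sequences, n_states, left_right=False):
    """sequences: list of (T_i, n_features) arrays of per-frame features
    (world height, change in height/width, velocity) for one activity."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    if left_right:
        # Freeze the transition structure; EM fits only means/covariances.
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                                init_params="mc", params="mc")
        model.startprob_ = np.eye(n_states)[0]       # always start in state 0
        model.transmat_ = left_right_transmat(n_states)
    else:
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
    model.fit(X, lengths)
    return model

# models = {"sit": train_activity_hmm(sit_seqs, 10, left_right=True),
#           "stand": train_activity_hmm(stand_seqs, 10, left_right=True),
#           "walk": train_activity_hmm(walk_seqs, 5),
#           "seated": train_activity_hmm(seated_seqs, 3)}
```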
8. Method - Activity Segmentation (2)
- Activity segmentation window (sketched below)
  - Fixed-size moving window (30 frames)
  - Window moves forward one frame at a time
  - Frames within the window used to calculate log-likelihoods of all four HMMs
- Best HMM taken as the activity for the window
  - Best HMM must significantly outperform the other HMMs
  - Minimises short-lived false positives
  - Last activity re-instated if there is no significantly best HMM
- Voting between views to elect the activity
- Activity estimated to begin halfway through the window
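A minimal sketch of the 30-frame sliding window over one view: each window is scored under all four trained HMMs, and the best model is adopted only if it clearly beats the runner-up. The log-likelihood margin is a hypothetical stand-in for the paper's "significantly outperforms" test.

```python
import numpy as np

def segment_activities(features, models, win=30, margin=20.0):
    """features: (T, n_features) array of per-frame features;
    models: dict name -> fitted HMM with a .score(X) log-likelihood method.
    Returns {frame: activity}, with each label placed at its window midpoint."""
    labels = {}
    last = None
    for t in range(len(features) - win + 1):
        window = features[t:t + win]
        scores = sorted(((m.score(window), name) for name, m in models.items()),
                        reverse=True)
        (best_ll, best_name), (second_ll, _) = scores[0], scores[1]
        if best_ll - second_ll >= margin:
            last = best_name           # clear winner: adopt the new activity
        # otherwise the last activity is re-instated (last stays unchanged)
        labels[t + win // 2] = last    # activity begins halfway through window
    return labels

# Per-view label streams would then be combined by voting across the four views.
```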
9. Method - Scene Labelling
- Objects labelled according to activity
  - Labelled area depends on the activity / object
  - Sit: chairs labelled using the person's fitted ellipse
  - Walk: floors labelled using the lower 5% of the fitted ellipse
- Labels are weights that are updated via an exponential-forgetting function (sketched below):
  - w_L^{t+1}(x,y) = w_L^t(x,y) · (1 − α) + (α · δ)
  - δ = 1 if L is the detected object, 0 otherwise
  - w_L^t(x,y) is the weight of the Lth label (chair or floor) at time t, pixel (x,y)
  - α is the learning rate for label updating
  - δ controls which label is strengthened
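A minimal sketch of the exponential-forgetting update above, applied to the pixels of the activity's region (the fitted ellipse for sitting, its lower part for walking); the learning rate of 0.05 is a hypothetical value.

```python
import numpy as np

def update_label_weights(weights, detected, region_mask, alpha=0.05):
    """weights: dict label -> (H, W) float array of w_L values;
    detected: 'chair' or 'floor'; region_mask: (H, W) bool array of the
    pixels covered by the activity's region."""
    for label, w in weights.items():
        delta = 1.0 if label == detected else 0.0   # delta from the slide
        # w_L^{t+1} = w_L^t * (1 - alpha) + alpha * delta
        w[region_mask] = w[region_mask] * (1.0 - alpha) + alpha * delta
    return weights
```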
10. Method - Use of Occlusion
- Partial occlusions used to refine labelling (sketched below)
  - Person is occluded when walking behind a chair
  - Bounding box used to judge the occluded area
  - Can cause over-estimation of the occluded area
- Chair labels are erased in the unoccluded area
  - ...since occlusion is a strong indicator of the chair's bounds
  - Learning rate for chair labels in the area is retarded by a factor of 4
- Feeds occlusion evidence back into the labelling process
- Floor labelling is unaffected
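A minimal sketch of the occlusion feedback: within the walking person's bounding box, the part where the person remains hidden marks the chair's extent, so chair labels in the unoccluded remainder are erased and the chair learning rate in the area is retarded by a factor of 4. The function name and both masks are hypothetical.

```python
import numpy as np

def apply_occlusion_evidence(chair_weights, box_mask, occluded_mask, alpha):
    """chair_weights: (H, W) float array of chair label weights;
    box_mask: person's bounding box; occluded_mask: portion of the box where
    the person is hidden by the chair (both (H, W) bool arrays)."""
    unoccluded = box_mask & ~occluded_mask
    chair_weights[unoccluded] = 0.0   # occlusion bounds the chair: erase the rest
    return alpha / 4.0                # retarded chair learning rate in the area
```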
11. Experiments
- Three video sequences (2000 frames each)
- Four camera views per sequence
- Activity segmentation and label weighting on each view
- Strongest label for each pixel assigned as the pixel's label (sketched below)
- Threshold then applied to eliminate weak labels
- Labelling analysed by overlaying manually-defined edges
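A minimal sketch of the final per-pixel assignment: the strongest label wins each pixel, then labels whose weight falls below a threshold are eliminated; the 0.5 cutoff is a hypothetical value.

```python
import numpy as np

def assign_pixel_labels(weights, threshold=0.5):
    """weights: dict label -> (H, W) float array; returns an (H, W) array of
    indices into `names`, with -1 meaning unlabelled."""
    names = list(weights)
    stack = np.stack([weights[n] for n in names])   # (n_labels, H, W)
    best = stack.argmax(axis=0)                     # strongest label per pixel
    best[stack.max(axis=0) < threshold] = -1        # eliminate weak labels
    return best, names
```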
Figure 1. Sample labelling with chair and floor
edges
12. Camera Views
Figure 2. NW, NE, SW and SE Views of Lab
13. Demonstration Video
(Video: NW, NE, SW and SE views)
14. Results - Activity Segmentation
- Activity segmentation evaluation
  - Ground truth estimated manually, with an uncertainty of ±5 frames
Table 1. Error means and variances for activity
segmentation
15. Analysis - Activity Segmentation
- Sit / Walk segmentation
  - Highly accurate given the uncertainty of ±5 frames
  - Sit found late, Walk found early
  - Conservatively estimates the sitting action, which improves robustness
- Seated lost, Stand found far too early
  - The problems are related: the end of Sit is misinterpreted as the start of Stand
  - Can be solved with termination probabilities (Al-Ohali et al., 2002)
- Loss of Seated not critical
  - Later instances of sitting offset the loss of evidence
16. Results - Scene Labelling
- Labelling accuracy
  - Chair area includes the space between chair legs
  - "Other" covers all non-chair, non-floor pixels
- Chair precision of 49.07% seems quite poor
- Floor recall seems low, but this is misleading

Table 2. Confusion matrix for labelling (all image pixels)
17. Analysis - Scene Labelling
- Table ignores unseen pixels (i.e. "Other")
- Chair precision better, but still low
  - Not unexpected: use of the fitted ellipse causes over-labelling
  - Occlusion helps, but there were not many instances of occlusion
- Floor recall much higher (93.6%, up from 66.7%)
- Not all of the floor area was visited, hence many "Other" misclassifications

Table 3. Confusion matrix for labelling (labelled pixels only)
18. Conclusions and Future Work
- Action-based approach to object labelling
  - Advantage of evidence accumulation
  - Robust to noise: false positives have minimal impact
- No use of background image information
  - Accuracy would be improved by including image information (e.g. regions) as secondary evidence
- Must increase variation in objects and situations
- Will require addressing limitations, including:
  - Finer measurements of the human to separate subtler actions
  - More information on object labels (e.g. object position)
  - Experiments with shifting objects around
19. References
- Y. Al-Ohali, M. Cheriet and C. Suen. Introducing termination probabilities to HMM. ICPR 2002.
- W. Grimson, C. Stauffer, R. Romano and L. Lee. Using adaptive tracking to classify and monitor activities in a site. CVPR 1998, pages 22-29.
- D. Moore, I. Essa and M. Hayes. Exploiting Human Actions and Object Context for Recognition Tasks. ICCV 1999.
- L. Stark and K. Bowyer. Achieving generalized object recognition through reasoning about association of function to structure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10):1097-1104, October 1991.
- C. Stauffer and W. Grimson. Learning patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):747-757, August 2000.
- R. Tsai. An efficient and accurate camera calibration technique for 3D machine vision. CVPR 1986, pages 364-374.