1
Object Labelling from Human Action Recognition
1st IEEE Conference on Pervasive Computing
and Communications, 2003, Fort Worth, Texas

Contributors: Patrick Peursum, Svetha Venkatesh,
Geoff West, Hai Hung Bui (Presenter)
School of Computing, Curtin University of
Technology, Perth, Western Australia
2
Introduction
  • Aim: infer object identity from human action
  • Indirectly recognise an object by detecting the signature human action needed to use it
  • Monitoring human activity in the home offers both opportunities (✓) and problems (✗) for this:
  • ✓ Frequent and repeated human activity
  • ✓ Indoor scenes
  • ✓ Objects are often directly used, e.g. appliances
  • ✗ Wide-angle views, cluttered environment
  • ✗ Scene and object locations change over time

3
Objectives
  • Evidence-based approach to labelling
  • Label objects in a scene based on repeated human interactions
  • Accumulate evidence over time
  • Flexible and robust to noise and errors
  • Potentially adaptable to changes in the scene
  • Independent of objects' physical structure
  • Learn the locations of chairs and floor areas
  • Initial study into the potential of the approach

4
Related Work
  • Traditional object recognition
  • Function-based variant (Stark and Bowyer, 1991)
  • Inherent difficulty in recognition using physical structure alone
  • Human activity / action recognition
  • Focus is mainly on detecting anomalous activities (e.g. surveillance applications)
  • Human-object interaction recognition
  • Use of occlusion to estimate object positions and sizes (Grimson et al., 1998)
  • Top-down view of desk scenes, using hand movements for action recognition (Moore et al., 1999)

5
Method - Overview
1. Raw Video
2. Person Segmentation and Tracking
3. Activity Segmentation
4. Scene Labelling
6
Method - Person Segmentation
  • Raw video
  • Four ceiling-mounted cameras, 25 fps
  • Monitor a single scene with overlapping fields of view
  • Person segmentation and tracking (see the sketch below)
  • Gaussian mixture-model background subtraction (Stauffer and Grimson, 2000) to find the person
  • Bounding box used to outline the person
  • Tracking via a Kalman filter on the box centroid
  • Views calibrated to a world coordinate system using Tsai's algorithm (Tsai, 1986)
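
As an illustration only, a minimal Python/OpenCV sketch of this stage. OpenCV's MOG2 subtractor is a modern descendant of the Stauffer-Grimson mixture model cited above; the input file name and all parameters are hypothetical, and the Tsai calibration step is omitted.

    import cv2
    import numpy as np

    # Adaptive mixture-of-Gaussians background subtraction (MOG2 descends
    # from the Stauffer-Grimson model referenced on this slide).
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500)

    # Constant-velocity Kalman filter on the bounding-box centroid:
    # state = (x, y, dx, dy), measurement = (x, y).
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = 1e-2 * np.eye(4, dtype=np.float32)
    kf.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)

    cap = cv2.VideoCapture("view_nw.avi")       # hypothetical input video
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fg = subtractor.apply(frame)            # foreground mask
        centroid = kf.predict()[:2]             # predicted person position
        contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            # Largest foreground blob taken to be the person; its bounding
            # box centroid is the measurement for the Kalman correction.
            x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
            kf.correct(np.array([[x + w / 2.0], [y + h / 2.0]], np.float32))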

7
Method - Activity Segmentation
  • HMMs used to segment four activities
  • Walking, sitting down, seated, standing up
  • Sitting/Standing: strict left-right HMMs (10 states)
  • Walk/Seated: standard HMMs (5 and 3 states)
  • Walking → floor interaction
  • Others → chair interaction
  • Training (see the sketch below)
  • 24 sequences of a person sitting into a chair
  • Each sequence manually segmented into the four activities and used for HMM training
  • Training features (from the bounding box)
  • World height (mm), change in height/width, velocity
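
A hedged sketch of this training step, assuming hmmlearn's GaussianHMM (the slides do not name a library). The strict left-right topology is enforced by fixing the start probabilities and masking the transition matrix; `sit_sequences` is a hypothetical list of per-frame feature arrays taken from the manually segmented training data.

    import numpy as np
    from hmmlearn import hmm

    def left_right_hmm(n_states):
        # Gaussian HMM whose transitions only allow staying put or stepping
        # one state to the right; zeros in transmat_ survive Baum-Welch.
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                                init_params="mc", params="mct")
        model.startprob_ = np.eye(n_states)[0]       # always start in state 0
        trans = 0.5 * (np.eye(n_states) + np.eye(n_states, k=1))
        trans[-1, -1] = 1.0                          # absorbing final state
        model.transmat_ = trans
        return model

    # Per-frame features: world height (mm), change in height/width, velocity.
    X = np.vstack(sit_sequences)                     # hypothetical training data
    lengths = [len(s) for s in sit_sequences]
    sit_model = left_right_hmm(10).fit(X, lengths)   # 10-state "sitting" HMM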

8
Method - Activity Segmentation (2)
  • Activity segmentation window (see the sketch below)
  • Fixed-size moving window (30 frames)
  • Window moves forward one frame at a time
  • Frames within the window used to calculate log-likelihoods under all four HMMs
  • Best HMM taken as the activity for the window
  • Best HMM must significantly outperform the other HMMs
  • Minimises short-lived false positives
  • The last activity is re-instated if no HMM is significantly best
  • Voting between views to elect the activity
  • Activity estimated to begin halfway through the window
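
A sketch of this windowing logic, assuming hmmlearn-style score() methods returning log-likelihoods; the significance margin is a hypothetical value, and the per-view voting step is omitted.

    WINDOW = 30     # frames per window (from the slide)
    MARGIN = 5.0    # hypothetical log-likelihood margin for "significantly best"

    def segment_activities(features, models):
        # features: (n_frames, n_features) array; models: dict name -> HMM
        labels, last = [], None
        for start in range(len(features) - WINDOW + 1):
            window = features[start:start + WINDOW]
            scores = {name: m.score(window) for name, m in models.items()}
            ranked = sorted(scores, key=scores.get, reverse=True)
            if scores[ranked[0]] - scores[ranked[1]] >= MARGIN:
                last = ranked[0]        # significantly best HMM wins the window
            labels.append(last)         # otherwise the last activity is kept
        return labels                   # activity taken to start mid-window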

9
Method - Scene Labelling
  • Objects labelled according to activity
  • Labelled area depends on the activity / object
  • Sit: chairs labelled using the person's fitted ellipse
  • Walk: floors labelled using the lower 5% of the fitted ellipse
  • Labels are weights that are updated via an exponential-forgetting function (see the sketch below):
  • w_L^(t+1)(x, y) = w_L^t(x, y) · (1 − α) + (δ · α)
  • δ = 1 if L is the detected object's label, 0 otherwise
  • w_L^t(x, y) is the weight of the Lth label (chair or floor) at time t, pixel (x, y)
  • α is the learning rate for label updating
  • δ controls which label is strengthened
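
A minimal sketch of this update, under two assumptions the slide leaves open: weights are stored as one 2-D map per label, and only pixels inside the current interaction area are updated.

    ALPHA = 0.05    # hypothetical learning rate

    def update_labels(weights, detected, area, alpha=ALPHA):
        # weights: dict label -> 2-D float array of per-pixel label weights
        # detected: label implied by the recognised activity ("chair"/"floor")
        # area: boolean mask of the fitted ellipse (or its lower 5% for Walk)
        for label, w in weights.items():
            delta = 1.0 if label == detected else 0.0   # delta from the slide
            w[area] = w[area] * (1.0 - alpha) + delta * alpha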

10
Method - Use of Occlusion
  • Partial occlusions used to refine labelling (see the sketch below)
  • Person is occluded when walking behind a chair
  • Bounding box used to judge the occluded area
  • Can cause over-estimation of the occluded area
  • Chair labels are erased in the unoccluded area...
  • ...since occlusion is a strong indicator of the chair's bounds
  • Learning rate for chair labels in this area is retarded by a factor of 4
  • Feeds occlusion evidence back into the labelling process
  • Floor labelling is unaffected
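
A hedged sketch of this feedback step; the two masks, and how the retarded rate is consumed downstream, are assumptions about details the slide does not specify.

    def occlusion_feedback(weights, box_mask, person_mask, alpha=ALPHA):
        # box_mask: bounding-box pixels while the person walks behind the chair
        # person_mask: foreground pixels where the person is actually visible
        visible = box_mask & person_mask      # unoccluded, so no chair here
        weights["chair"][visible] = 0.0       # erase chair labels there
        occluded = box_mask & ~person_mask    # strong evidence of chair bounds
        return occluded, alpha / 4.0          # retarded chair rate for this area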

11
Experiments
  • Three video sequences (2000 frames each)
  • Four camera views per sequence
  • Activity segmentation and label weighting run on each view
  • Strongest label for each pixel assigned as the pixel's label (see the sketch below)
  • A threshold then applied to eliminate weak labels
  • Labelling analysed by overlaying manually defined object edges

Figure 1. Sample labelling with chair and floor
edges
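
A sketch of this per-pixel assignment, with a hypothetical threshold value.

    import numpy as np

    THRESHOLD = 0.3     # hypothetical cut-off for weak labels

    def final_labels(weights):
        # weights: dict label -> 2-D weight map accumulated for one view
        names = sorted(weights)
        stack = np.stack([weights[n] for n in names])   # (n_labels, H, W)
        best = stack.argmax(axis=0)                     # strongest label wins
        weak = stack.max(axis=0) < THRESHOLD            # eliminate weak labels
        return np.where(weak, -1, best), names          # -1 means unlabelled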
12
Camera Views
Figure 2. NW, NE, SW and SE Views of Lab
13
Demonstration Video
(Video: NW, NE, SW and SE camera views)
14
Results - Activity Segmentation
  • Activity segmentation evaluation
  • Ground truth estimated manually, with an uncertainty of ±5 frames

Table 1. Error means and variances for activity
segmentation
15
Analysis - Activity Segmentation
  • Sit / Walk segmentation
  • Highly accurate given the ±5-frame uncertainty
  • Sit found late, Walk found early
  • Conservatively estimates the sitting action, which improves robustness
  • Seated lost, Stand found far too early
  • These problems are related
  • The end of Sit is misinterpreted as the start of Stand
  • Can be solved with termination probabilities (Al-Ohali et al., 2002)
  • Loss of Seated is not critical
  • Later instances of sitting offset the loss of evidence

16
Results - Scene Labelling
  • Labelling accuracy
  • Chair area includes the space between chair legs
  • Other covers all non-chair, non-floor pixels
  • Chair precision of 49.07% seems quite poor
  • Floor recall seems low, but this is misleading

Table 2. Confusion matrix for labelling (all
image pixels)
17
Analysis - Scene Labelling
  • Table 3 ignores unseen pixels (i.e. Other)
  • Chair precision is better, but still low
  • Not unexpected: labelling with the fitted ellipse causes over-labelling
  • Occlusion helps, but there were not many instances of occlusion
  • Floor recall is much higher (93.6%, up from 66.7%)
  • Not all of the floor area was visited, hence many misclassifications as Other

Table 3. Confusion matrix for labelling (labelled
pixels only)
18
Conclusions and Future Work
  • Action-based approach to object labelling
  • Advantage of evidence accumulation
  • Robust to noise: false positives have minimal impact
  • No use of background image information
  • Accuracy would be improved by including image information (e.g. regions) as secondary evidence
  • Must increase the variation in objects and situations
  • Will require addressing limitations, including:
  • Finer measurements of the human to separate subtler actions
  • More information on object labels (e.g. object position)
  • Experiments with shifting objects around

19
References
  • Y. Al-Ohali, M. Cheriet and C. Suen. Introducing termination probabilities to HMM. ICPR 2002.
  • W. Grimson, C. Stauffer, R. Romano and L. Lee. Using adaptive tracking to classify and monitor activities in a site. CVPR 1998, pages 22-29.
  • D. Moore, I. Essa and M. Hayes. Exploiting human actions and object context for recognition tasks. ICCV 1999.
  • L. Stark and K. Bowyer. Achieving generalized object recognition through reasoning about association of function to structure. Pattern Analysis and Machine Intelligence, 13(10):1097-1104, October 1991.
  • C. Stauffer and W. Grimson. Learning patterns of activity using real-time tracking. Pattern Analysis and Machine Intelligence, 22(8):747-757, August 2000.
  • R. Tsai. An efficient and accurate camera calibration technique for 3D machine vision. CVPR 1986, pages 364-374.
