Title: Identifying Human-Object Interaction in Range and Video Data
Ben Packer, Varun Ganapathi, Suchi Saria, and Daphne Koller
Aim: Understand and classify human actions while simultaneously tracking the objects of interaction.
First Stage
- Capture initial depth with no foreground
- Capture video/depth of the action involving the object
- Pose tracker runs simultaneously in real time
- Every pixel is either background (same depth as the initial image), pose, or a possible object (see the sketch below)
- Train a visual object detector from the most confident candidate objects
- Use the (smoothed) detector on the full sequence
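As a rough illustration of the pixel-labeling step (not the authors' implementation; the function name, the 5 cm depth tolerance, and the label encoding are assumptions), a minimal sketch in Python:

```python
import numpy as np

# Minimal sketch of the first-stage pixel labeling; names and the
# depth tolerance are illustrative assumptions only.
def label_pixels(depth_frame, background_depth, pose_mask, depth_tol=0.05):
    """Label each pixel as 0 = background, 1 = pose, 2 = candidate object."""
    labels = np.zeros(depth_frame.shape, dtype=np.uint8)
    # Foreground: depth differs from the empty-scene capture by more than the tolerance.
    foreground = np.abs(depth_frame - background_depth) > depth_tol
    labels[foreground] = 2                 # possible object
    labels[foreground & pose_mask] = 1     # pixels explained by the tracked pose
    return labels
```

Connected components of the object-labeled pixels would then form the candidate objects, with the most confident candidates used to train the visual detector that is afterwards smoothed over the full sequence.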
Kinect Video Data: video with tracked pose and the corresponding depth image (figure).
Tasks: What action is being performed? Where is the manipulated object?
In the accompanying figure, colored blobs indicate candidate objects, ranging from red (least likely) to yellow (most likely).
Why is this easy?
- Depth sensor allows us to easily detect foreground/background
- An existing pose tracker accurately finds the human
- Extremely efficient, runs in real time, so a large amount of data can be easily collected
Full Model of Action and Interaction
- Knowing the action will help track the object
- Use spatio-temporal interaction primitives, e.g. moving away from the foot, in the hand
- Model each action as an HMM over primitives
- Allows for simple learning and inference (an illustrative sketch follows below)
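As an illustrative sketch of the "HMM over primitives" idea (not the paper's exact inference; all names, and the use of scipy, are assumptions), the forward algorithm can score a sequence of per-frame primitive likelihoods under each action's HMM and pick the best-scoring action:

```python
import numpy as np
from scipy.special import logsumexp

# Hedged sketch: forward algorithm over interaction primitives, used to
# score a sequence under one action's HMM. All names are illustrative.
def hmm_log_likelihood(log_pi, log_trans, log_obs):
    """log_pi: (K,) initial log-probabilities over primitives
    log_trans: (K, K) primitive transition log-probabilities
    log_obs: (T, K) per-frame log-likelihood of the evidence under each primitive
    """
    alpha = log_pi + log_obs[0]
    for t in range(1, len(log_obs)):
        alpha = log_obs[t] + logsumexp(alpha[:, None] + log_trans, axis=0)
    return logsumexp(alpha)

def classify_action(action_hmms, log_obs):
    # Choose the action whose primitive HMM best explains the sequence.
    scores = {a: hmm_log_likelihood(m["log_pi"], m["log_trans"], log_obs)
              for a, m in action_hmms.items()}
    return max(scores, key=scores.get)
```

In the full model, the per-frame evidence would combine the observed candidate objects and joint positions, so that recognizing the action and tracking the object reinforce each other.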
Why is this hard?
- Even with background subtraction and pose estimation, the object may still be in many places
- Generic object tracking can help locate the object, but often fails
- Action recognition involving human-object interaction is largely unsolved
Full Model variables:
- C, F: candidate object positions and appearance (observed)
- J: human joint positions (observed)
- A: action of the entire sequence
- S: state
- O: object position
- P: active primitive
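For concreteness only, the variables above might be bundled per frame as below; the field types and the convention that O indexes into the candidate list are assumptions, not the paper's representation.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

# Hypothetical containers mirroring the poster's variable names;
# types are illustrative assumptions only.
@dataclass
class FrameVariables:
    C: np.ndarray   # observed candidate object positions, shape (M, 2)
    F: np.ndarray   # observed candidate appearance features, shape (M, d)
    J: np.ndarray   # observed human joint positions
    S: int          # latent state
    O: int          # latent object position (e.g. an index into C)
    P: int          # latent active interaction primitive

@dataclass
class SequenceVariables:
    A: str                                          # latent action label for the whole sequence
    frames: List[FrameVariables] = field(default_factory=list)
```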
Results of Full Model: action classification comparing the Base Model and the Full Model across the Pick Up, Put Down, Drop, Kick, and Toss actions (figure).