Human-Assisted Motion Annotation
Ce Liu, William T. Freeman, and Edward H. Adelson (Massachusetts Institute of Technology)
Yair Weiss (The Hebrew University of Jerusalem)
  • Motivations
  • Existing motion databases are either synthetic or
    limited to indoor, experimental setups [1]. Can
    we obtain ground-truth motion for arbitrary,
    real-world videos?
  • Humans are experts at segmenting moving objects
    and perceiving the differences between two frames.
    Can we build a computer vision system that
    quantifies human perception of motion and generates
    ground truth for motion analysis?
  • Several issues need to be addressed:
  • Is human labeling reliable (compared to the
    veridical ground truth) and consistent (across
    subjects)?
  • How can we efficiently label every pixel in every
    frame for hundreds of real-world videos?

Figure 1. The graphical user interface (GUI) of
our system: (a) main window for labeling contours
and feature points; (b) depth controller to
change depth values; (c) magnifier; (d) optical
flow viewer; (e) control panel.
  • Our work
  • We designed a human-in-the-loop system to annotate
    motion for real-world videos [2].
  • Semiautomatic layer segmentation: The user labels
    contours using polygons, and the system
    automatically propagates the contours to other
    frames. The system also propagates the user's
    corrections across frames.
  • Automatic layer-wise optical flow: The system
    automatically computes dense optical flow fields
    for every layer at every frame using
    user-specified parameters. For each layer, the
    user picks the flow that yields the correct
    matching and agrees with the smoothness and
    discontinuities of the image.
  • Semiautomatic motion labeling: When flow
    estimation fails, the user can label sparse
    correspondences between two frames, and the
    system automatically interpolates them into a
    dense flow field (see the interpolation sketch
    after this list).
  • Automatic full-frame motion composition.
  • Our methodology is validated by comparison with
    veridical ground-truth data and by user studies.
  • We created a ground-truth motion database
    consisting of 10 real-world video sequences
    (still growing). This database can be used for
    evaluating motion analysis algorithms as well as
    other vision and graphics applications.
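The sparse-to-dense interpolation step can be pictured with a minimal
sketch. This is not the system's actual solver (which uses a robust,
regularized formulation implemented in C); here, purely for illustration,
user-labeled correspondences are spread over a layer with Gaussian
radial-basis weights, and the function name sparse_to_dense_flow and the
sigma parameter are hypothetical.

    # Minimal sketch (not the authors' implementation): interpolate sparse,
    # user-labeled correspondences into a dense flow field with Gaussian
    # radial-basis weighting. The real system uses a robust, regularized
    # interpolation, but the inputs and outputs have the same shape.
    import numpy as np

    def sparse_to_dense_flow(points, flows, height, width, sigma=15.0):
        """points: (N, 2) array of (x, y) pixel locations labeled by the user.
        flows:  (N, 2) array of (u, v) displacements at those points.
        Returns a dense (height, width, 2) flow field."""
        ys, xs = np.mgrid[0:height, 0:width]
        grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (H*W, 2)
        # Squared distance from every pixel to every labeled point.
        d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)  # (H*W, N)
        w = np.exp(-d2 / (2.0 * sigma ** 2))
        w /= w.sum(axis=1, keepdims=True) + 1e-12
        dense = w @ flows                                                # (H*W, 2)
        return dense.reshape(height, width, 2)

    # Example: four labeled correspondences on a 100x100 layer.
    pts = np.array([[10, 10], [80, 15], [20, 85], [75, 70]], dtype=float)
    uv  = np.array([[1.0, 0.0], [1.2, 0.1], [0.8, -0.1], [1.1, 0.0]])
    flow = sparse_to_dense_flow(pts, uv, 100, 100)
    print(flow.shape)  # (100, 100, 2)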
  • Experiment
  • We applied our system to annotating a veridical
    example from [1] (Figure 3). Our annotation is
    very close to theirs (3.21° AAE, 0.104 AEP); the
    main difference is on the occluding boundary
    (the two error measures are sketched after this
    list).
  • We tested the consistency of human annotation
    (Figure 2). The mean error is 0.989° AAE, 0.112
    AEP. The error magnitude correlates with the
    blurriness of the image.
  • We created a ground-truth motion database
    containing 10 real-world videos with 341 frames
    (Figure 5, Table 1), covering both indoor and
    outdoor scenes. The statistics of the ground-truth
    motion are plotted in Figure 4.
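The AAE and AEP numbers above follow the standard error measures for
optical flow evaluation (as in [1]): the average angular error treats each
flow vector (u, v) as the 3-D vector (u, v, 1) and averages the angle
between estimate and ground truth, while the average endpoint error
averages the Euclidean distance between flow vectors. A minimal sketch
(our own illustration, not the authors' evaluation code):

    # Standard flow error measures: AAE in degrees, AEP in pixels.
    import numpy as np

    def flow_errors(flow_est, flow_gt):
        """flow_est, flow_gt: (H, W, 2) arrays of (u, v). Returns (AAE_deg, AEP)."""
        u1, v1 = flow_est[..., 0], flow_est[..., 1]
        u2, v2 = flow_gt[..., 0], flow_gt[..., 1]
        # Average angular error between (u, v, 1) vectors, in degrees.
        num = u1 * u2 + v1 * v2 + 1.0
        den = np.sqrt((u1**2 + v1**2 + 1.0) * (u2**2 + v2**2 + 1.0))
        aae = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0))).mean()
        # Average endpoint error: mean Euclidean distance between flow vectors.
        aep = np.sqrt((u1 - u2)**2 + (v1 - v2)**2).mean()
        return aae, aep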

Figure 2. The consistency of nine subjects'
annotations. Clockwise from top left: the image
frame, mean labeled motion, mean absolute error
(red: higher error; white: lower error), and
error histogram.
Figure 5. Some frames of the ground-truth motion
database we created. We obtained ground-truth
flow fields that are consistent with object
boundaries, as shown in columns (3) and (4). In
comparison, the output of an optical flow
algorithm [3] is shown in column (5). From Table
1, the performance of this algorithm on our
database is worse than its performance on the
Yosemite sequence (1.723° AAE, 0.071 AEP).
  • System Features
  • We used state-of-the-art computer vision
    algorithms to design our system. Many of the
    objective functions in contour tracking, flow
    estimation, and flow interpolation use L1 norms
    for robustness. Techniques such as iteratively
    reweighted least squares (IRLS), pyramid-based
    coarse-to-fine search, and occlusion/outlier
    detection were used intensively to optimize
    these nonlinear objective functions (a toy IRLS
    example follows this list).
  • The system was written in C, and Qt 4.3 was
    used for the GUI design (Figure 1). Our system has
    all the components to make annotation simple and
    easy, and it also gives the user full freedom to
    label motion manually.
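As a concrete, toy illustration of the IRLS machinery mentioned above (not
the system's contour or flow solver): an L1-style objective
sum_i |a_i^T x - b_i| can be minimized by repeatedly solving a weighted
least-squares problem whose weights are w_i = 1 / sqrt(r_i^2 + eps), where
r_i is the current residual. The example below fits a line robustly to
data with one outlier; all names and constants are illustrative.

    # Toy IRLS example: minimize sum_i |A[i] @ x - b[i]| approximately.
    import numpy as np

    def irls(A, b, n_iters=50, eps=1e-6):
        x = np.linalg.lstsq(A, b, rcond=None)[0]   # least-squares initialization
        for _ in range(n_iters):
            r = A @ x - b                          # current residuals
            w = 1.0 / np.sqrt(r**2 + eps)          # robust (L1-style) reweighting
            W = np.diag(w)
            x = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
        return x

    # Fit a line to data with one gross outlier.
    t = np.linspace(0, 1, 20)
    y = 2.0 * t + 1.0
    y[5] += 10.0                                   # outlier
    A = np.stack([t, np.ones_like(t)], axis=1)
    print(irls(A, y))                              # close to [2.0, 1.0]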

Table 1. The performance of an optical flow algorithm [3] on our database.

            (a)      (b)      (c)      (d)      (e)      (f)      (g)      (h)
AAE (deg)   8.996    58.905   2.573    5.313    1.924    5.689    5.243    13.306
AEP (px)    0.976    4.181    0.456    0.346    0.085    0.196    0.385    1.567
References
[1] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and R. Szeliski. A database and evaluation methodology for optical flow. In Proc. ICCV, 2007.
[2] C. Liu, W. T. Freeman, E. H. Adelson, and Y. Weiss. Human-assisted motion annotation. Submitted to CVPR 2008.
[3] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. IJCV, 61(3):211-231, 2005.
Figure 4. The marginal ((a)-(h)) and joint
((i)-(n)) statistics of the ground-truth motion
from the database we created (log histograms).
Symbols u and v denote horizontal and vertical
motion, respectively. From these statistics it is
evident that horizontal motion dominates
vertical motion; vertical motion is sparser than
horizontal motion; flow fields are sparser than
natural images; and spatial derivatives are
sparser than temporal derivatives.
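A sketch of how such marginal log-histograms could be computed from a
ground-truth flow field follows; the bin ranges, function names, and
derivative choices are our own assumptions for illustration, not the exact
statistics pipeline behind Figure 4.

    # Illustrative sketch: marginal log-histograms of u, v and their
    # spatial derivatives over a ground-truth flow field.
    import numpy as np

    def log_histogram(values, bins=101, value_range=(-10, 10)):
        """Return bin centers and log-counts of the values."""
        counts, edges = np.histogram(values.ravel(), bins=bins, range=value_range)
        centers = 0.5 * (edges[:-1] + edges[1:])
        return centers, np.log(counts + 1)         # +1 avoids log(0)

    def flow_statistics(flow):
        """flow: (H, W, 2) ground-truth field. Returns marginal log-histograms
        of u, v and their horizontal/vertical finite differences."""
        u, v = flow[..., 0], flow[..., 1]
        stats = {}
        for name, f in (("u", u), ("v", v)):
            stats[name] = log_histogram(f)
            stats[name + "_dx"] = log_histogram(np.diff(f, axis=1))
            stats[name + "_dy"] = log_histogram(np.diff(f, axis=0))
        return stats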