1. Multi-Concept Multi-Modality Active Learning for Interactive Video Annotation
- Meng Wang, Xian-Sheng Hua, Yan Song, Jinhui Tang, Li-Rong Dai
- University of Science and Technology of China / Microsoft Research Asia
ICSC 2007
2. Outline
- Motivation
- Solution
- Evaluation
- Discussion
- Conclusion
4. Video Annotation to Bridge the Semantic Gap
- Video annotation splits the semantic gap between low-level features and user information needs into two, hopefully smaller, gaps: (a) mapping the low-level features to intermediate semantic concepts, and (b) mapping these concepts to user needs (Hauptmann, CIVR 2005).
- Manual annotation is labor-intensive and time-consuming.
- A large training set is usually needed to guarantee annotation accuracy.
- Methods that reduce human effort are therefore highly desirable.
5. Active Learning to Reduce Human Effort
- Active learning is an effective approach to reducing human effort: it builds a more effective training set by iteratively selecting the most informative samples for manual annotation.
6. Limitations of Existing Active Learning-Based Methods
- Multiple concepts are usually learned sequentially: each concept is annotated with a fixed number of samples, i.e., exhaustively annotated before proceeding to the next.
- The context of multi-modality is neglected: only a single modality is applied.
- An existing multi-modality active learning method selects a certain number of samples according to each sub-model (Chen et al., AAAI 2005), but it takes no account of the discriminative abilities of the different modalities.
8. Incorporating Multiple Concepts into Active Learning
- The existing sequential learning method cannot assign labeling effort suitably. For example, some concepts are difficult to learn with the existing features, and others already have accurate models; labeling more samples for either kind can hardly improve performance, so it is more rational to dedicate the annotation effort to other concepts.
- We propose to select, in each round, the concept that is expected to yield the highest performance gain: a greedy strategy for optimizing the average performance.
9. Incorporating Multiple Modalities into Active Learning
- The discriminative abilities of the different modalities must be taken into account: some features may not be discriminative enough for the concept being annotated, so the active learning process can attain only limited improvements for the corresponding sub-models.
- We adapt the numbers of samples selected for the different modalities so that they are proportional to the performance variations of the sub-models.
10. The Scheme of Multi-Concept Multi-Modality Active Learning
- Based on these ideas, we construct the multi-concept multi-modality active learning scheme.
- As the underlying learning method we adopt Manifold-Ranking, a semi-supervised algorithm that further exploits unlabeled data (He et al., ACM MM 2004; Yuan et al., ACM MM 2006).
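A common closed form for graph-based manifold ranking computes f* = (1 - α)(I - αS)^{-1} y, where S is the symmetrically normalized affinity matrix. The minimal NumPy sketch below follows that formulation; the Gaussian affinity and the scale σ are assumptions, and the cited papers use variants tuned for large video collections.

```python
import numpy as np

def manifold_ranking(X, y, alpha=0.9, sigma=1.0):
    """Closed-form manifold ranking sketch.
    X: (n, d) feature matrix.
    y: (n,) initial scores (+1 labeled positive, -1 labeled
       negative, 0 unlabeled).
    Returns ranking scores over all n samples."""
    # Gaussian affinity with zeroed diagonal
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    dinv = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = W * dinv[:, None] * dinv[None, :]
    # f* = (1 - alpha) (I - alpha S)^{-1} y
    n = len(y)
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, y)
```

Scores propagate along the data manifold, so unlabeled samples near the labeled positive receive higher scores than those near the labeled negative.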
11. The Proposed Active Learning Process
- Input:
  - Li /* labeled training set for the i-th concept, 1 ≤ i ≤ c */
  - Ui = {x1, x2, ..., xn} /* unlabeled set for the i-th concept, 1 ≤ i ≤ c */
  - AT /* number of active learning iterations */
  - h /* batch size for sample selection */
  - C /* concept set */
- Output:
  - fi /* annotation results for the i-th concept, 1 ≤ i ≤ c */
- Begin
  - for t = 1, 2, ..., AT
    - k = ConceptSelection(C) /* select a concept */
    - S = SampleSelection(Lk, Uk, h) /* select a set of samples for this concept */
    - Manually label the samples in S, and move S from Uk to Lk
    - fk = Manifold-Ranking(Lk, Uk) /* obtain the annotation results for this concept */
  - end
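The loop above can be sketched in Python. Every helper here (`oracle`, `train`, `estimate_gain`, `select_samples`) is a hypothetical stand-in for the components named on the slide, supplied by the caller:

```python
def active_annotation_loop(concepts, labeled, unlabeled, oracle,
                           train, estimate_gain, select_samples,
                           rounds=10, batch=5):
    """Skeleton of the proposed process.
    oracle(x, k)        -- plays the human annotator for concept k
    train(L, U)         -- stands in for Manifold-Ranking
    estimate_gain(k)    -- expected performance gain of concept k
    select_samples(...) -- picks a batch from the unlabeled pool"""
    models = {}
    for _ in range(rounds):
        # ConceptSelection: concept with the highest expected gain
        k = max(concepts, key=estimate_gain)
        # SampleSelection: a batch of informative samples for k
        batch_ids = select_samples(k, unlabeled[k], batch)
        for x in batch_ids:                      # manual labeling step
            labeled[k].append((x, oracle(x, k)))
            unlabeled[k].remove(x)
        # re-learn this concept from the enlarged labeled set
        models[k] = train(labeled[k], unlabeled[k])
    return models
```

With toy stubs plugged in, the loop moves exactly `rounds * batch` samples from the unlabeled pools to the labeled sets.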
12. The Concept Selection Strategy
- First we establish the performance evaluation criterion for multi-concept annotation. Here we adopt the most straightforward one, the average over concepts, i.e., perf = (1/c) Σ_{i=1}^{c} perf_i, where perf_i is the performance of the i-th concept.
- A greedy strategy then leads us to select the concept that is expected to yield the highest performance gain. The expected gain for each concept is approximated by the performance variation between its latest two learning iterations.
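A minimal sketch of this greedy rule, assuming each concept carries a history of past performance scores (e.g., average precision per learning iteration):

```python
def select_concept(perf_history):
    """Greedy concept selection: pick the concept whose performance
    improved most between its latest two learning iterations.
    perf_history[k] is a list of past scores for concept k and must
    hold at least two entries (hence the two-iteration init stage)."""
    def latest_gain(k):
        h = perf_history[k]
        return h[-1] - h[-2]
    return max(perf_history, key=latest_gain)
```

This is why the scheme needs the initialization stage described on the next slide: without two iterations per concept, the latest gain is undefined.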
13. The Concept Selection Strategy (cont.)
- More sophisticated performance measures can also be applied, e.g., a weighted average perf = Σ_{i=1}^{c} w_i · perf_i, so that the annotation accuracies of the concepts with large weights are guaranteed.
- The method needs an initial stage in which the performance gains of all concepts are initialized. In our implementation, each concept is annotated for two iterations in this stage, after which all performance gains are available.
14. The Sample Selection Strategy
- For sample selection within an individual modality, we adopt three criteria:
  - Informativeness
  - Diversity
  - Density
- For sample selection across multiple modalities, the numbers of samples selected for the different modalities are adapted according to their performance variations.
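The slides do not give the formulas behind the three criteria, so the sketch below uses common stand-ins, all of which are assumptions: informativeness as closeness of the ranking score to the decision boundary, density as mean similarity to the pool, and diversity as a greedy penalty on similarity to already-selected samples.

```python
import numpy as np

def select_batch(scores, sim, h, lam=0.5):
    """Hypothetical batch selection combining the three criteria.
    scores: (n,) ranking scores in [-1, 1] from the current model.
    sim:    (n, n) pairwise similarity matrix.
    h:      batch size; lam weights the diversity penalty."""
    informativeness = 1.0 - np.abs(scores)   # near the boundary
    density = sim.mean(axis=1)               # in a dense region
    base = informativeness * density         # combination is an assumption
    chosen = []
    for _ in range(h):
        eff = base.copy()
        eff[chosen] = -np.inf                # never re-pick a sample
        if chosen:                           # diversity: avoid near-duplicates
            eff -= lam * sim[:, chosen].max(axis=1)
        chosen.append(int(np.argmax(eff)))
    return chosen
```

Confidently-scored samples (|score| near 1) get a low effectiveness value and are skipped in favor of uncertain, representative, mutually dissimilar ones.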
15. Sample Selection Criteria
- The computation of the effectiveness score.
16. Multi-Modality Sample Selection
- We construct the sample selection strategy from the performance gains of the modalities. Denote by Δperf_m the performance gain of the m-th modality. We then let the numbers of selected samples be proportional to the performance gains of the modalities, i.e., h_m = h · Δperf_m / Σ_{j=1}^{M} Δperf_j.
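The proportional allocation of the batch across modalities can be sketched as follows; the clipping of negative gains and the handling of rounding leftovers are assumptions, since the slides only state proportionality:

```python
def allocate_batch(gains, h):
    """Split batch size h across modalities in proportion to their
    latest performance gains. Negative gains are clipped to zero;
    integer rounding leftovers go to the strongest modality."""
    g = [max(0.0, x) for x in gains]
    total = sum(g)
    if total == 0.0:                         # no signal: split evenly
        base, rem = divmod(h, len(g))
        return [base + (i < rem) for i in range(len(g))]
    counts = [int(h * x / total) for x in g]
    counts[g.index(max(g))] += h - sum(counts)   # hand leftovers to best
    return counts
```

For example, gains of (0.2, 0.1, 0.1) with h = 8 yield 4, 2, and 2 samples for the three modalities.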
18. Experimental Results
- Experiments on the TRECVID 2005 dataset:
  - 61,901 sub-shots for training, 64,256 sub-shots for testing
  - Six modalities
  - Ten concepts: Walking/Running, Explosion/Fire, Maps, Flag-US, Building, Waterscape/Waterfront, Mountain, Prisoner, Sports, and Car
19. The Effectiveness of Sample Selection
- We compare the proposed method with four other schemes:
  - Scheme 1: integrate a global effectiveness measure, effectiveness(x_i) = Σ_m perf_m · effectiveness(x_i^m), and select the h samples scoring highest under it.
  - Scheme 2: select an equal number of samples (i.e., h/M) for each modality.
  - Scheme 3: define the effectiveness measure as a linear combination of the informativeness, density, and diversity measures.
  - Scheme 4: randomly select samples.
20. Experimental Results
21. The Effectiveness of Concept Selection
- We compare the proposed method with two other schemes:
  - Scheme 1: sequential annotation, i.e., manually labeling s/c samples for each concept.
  - Scheme 2: random concept selection, i.e., in each round a concept is selected at random.
22. Experimental Results
24. Discussion
- We have assumed that the effort of labeling a sample with a concept is fixed. However, the effort may vary across concepts and samples:
  - Different concepts lead to different average annotation times (Volkmer et al., ACM MM 2005).
  - Annotating different samples may cost different effort even for the same concept.
- If the costs for different samples and concepts can be obtained, the sample selection and concept selection methods in the proposed scheme can easily be adapted to take these costs into account.
26. Conclusion and Future Work
- We presented an interactive video annotation framework based on multi-concept multi-modality active learning.
- Future work:
  - A more comprehensive evaluation of the proposed scheme (e.g., with more concepts).
  - Further improvement by learning multiple concepts jointly rather than separately.
27. References
- [1] A. G. Hauptmann, "Lessons for the Future from a Decade of Informedia Video Analysis Research," in Proceedings of the ACM International Conference on Image and Video Retrieval (CIVR), 2005.
- [2] M. Chen and A. Hauptmann, "Active Learning in Multiple Modalities for Semantic Feature Extraction from Video," in Proceedings of the AAAI Workshop on Learning in Computer Vision, 2005.
- [3] J. R. He, M. J. Li, H. J. Zhang, H. H. Tong, and C. S. Zhang, "Manifold-Ranking Based Image Retrieval," in Proceedings of ACM Multimedia, 2004.
- [4] X. Yuan, X. S. Hua, M. Wang, and X. Wu, "Manifold-Ranking Based Video Concept Detection on Large Database and Feature Pool," in Proceedings of ACM Multimedia, 2006.
- [5] T. Volkmer, J. R. Smith, and A. Natsev, "A Web-Based System for Collaborative Annotation of Large Image and Video Collections," in Proceedings of ACM Multimedia, 2005.
28. Thanks!