1. Multi-Concept Multi-Modality Active Learning for Interactive Video Annotation
- Meng Wang, Xian-Sheng Hua, Yan Song, Jinhui Tang, Li-Rong Dai
- University of Science and Technology of China / Microsoft Research Asia
ICSC 2007
2. Outline
- Motivation
- Solution
- Evaluation
- Discussion
- Conclusion
4. Video Annotation to Bridge the Semantic Gap
- Video annotation splits the semantic gap between low-level features and user information needs into two, hopefully smaller, gaps: (a) mapping the low-level features to intermediate semantic concepts, and (b) mapping these concepts to user needs (Hauptmann, CIVR 2005).
- Manual annotation is labor-intensive and time-consuming.
- A large training set is usually needed to guarantee annotation accuracy.
- Methods that reduce human effort are therefore highly desirable.
5. Active Learning to Reduce Human Effort
- Active learning is an effective approach to reducing human effort: it builds a more effective training set by iteratively selecting the most informative samples for manual annotation.
6. Limitations of Existing Active Learning-Based Methods
- Multiple concepts are usually learned sequentially: each concept is annotated with a fixed number of samples, i.e., exhaustively annotated before proceeding to the next.
- The context of multi-modality is neglected: only a single modality is applied.
- An existing multi-modality active learning method selects a certain number of samples according to each sub-model (Chen et al., AAAI 2005), but it takes no account of the discriminative abilities of the different modalities.
8. Incorporating Multiple Concepts into Active Learning
- The existing sequential learning method cannot assign labeling effort suitably. For example, some concepts are difficult to learn with the existing features, and others already have accurate models; labeling more samples for either kind can hardly improve performance, so it is more rational to dedicate the annotation effort to other concepts.
- We propose to select, in each round, the concept that is expected to yield the highest performance gain: a greedy strategy for optimizing the average performance.
9. Incorporating Multiple Modalities into Active Learning
- The discriminative abilities of the different modalities must be taken into account: some features may not be discriminative enough for the concept being annotated, so the active learning process can attain only limited improvements for the corresponding sub-models.
- We adapt the numbers of samples selected for the different modalities so that they are proportional to the performance variations of the sub-models.
10. The Scheme of Multi-Concept Multi-Modality Active Learning
- Based on these ideas, we construct the multi-concept multi-modality active learning scheme.
- As the underlying learning method we adopt Manifold-Ranking, a semi-supervised algorithm that further exploits unlabeled data (He et al., ACM MM 2004; Yuan et al., ACM MM 2006).
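A common closed form for graph-based manifold ranking computes f* = (1 - α)(I - αS)^{-1} y, where S is the symmetrically normalized affinity matrix. The minimal NumPy sketch below follows that formulation; the Gaussian affinity and the scale σ are assumptions, and the cited papers use variants tuned for large video collections.

```python
import numpy as np

def manifold_ranking(X, y, alpha=0.9, sigma=1.0):
    """Closed-form manifold ranking sketch.
    X: (n, d) feature matrix.
    y: (n,) initial scores (+1 labeled positive, -1 labeled
       negative, 0 unlabeled).
    Returns ranking scores over all n samples."""
    # Gaussian affinity with zeroed diagonal
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    dinv = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = W * dinv[:, None] * dinv[None, :]
    # f* = (1 - alpha) (I - alpha S)^{-1} y
    n = len(y)
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, y)
```

Scores propagate along the data manifold, so unlabeled samples near the labeled positive receive higher scores than those near the labeled negative.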
11. The Proposed Active Learning Process
- Input:
  - Li /* labeled training set for the i-th concept, 1 ≤ i ≤ c */
  - Ui = {x1, x2, ..., xn} /* unlabeled set for the i-th concept, 1 ≤ i ≤ c */
  - AT /* number of active learning iterations */
  - h /* batch size for sample selection */
  - C /* concept set */
- Output:
  - fi /* annotation results for the i-th concept, 1 ≤ i ≤ c */
- Begin
  - for t = 1, 2, ..., AT
    - k = ConceptSelection(C) /* select a concept */
    - S = SampleSelection(Lk, Uk, h) /* select a set of samples for this concept */
    - Manually label the samples in S, and move S from Uk to Lk
    - fk = Manifold-Ranking(Lk, Uk) /* obtain the annotation results for this concept */
  - end
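The loop above can be sketched in Python. Every helper here (`oracle`, `train`, `estimate_gain`, `select_samples`) is a hypothetical stand-in for the components named on the slide, supplied by the caller:

```python
def active_annotation_loop(concepts, labeled, unlabeled, oracle,
                           train, estimate_gain, select_samples,
                           rounds=10, batch=5):
    """Skeleton of the proposed process.
    oracle(x, k)        -- plays the human annotator for concept k
    train(L, U)         -- stands in for Manifold-Ranking
    estimate_gain(k)    -- expected performance gain of concept k
    select_samples(...) -- picks a batch from the unlabeled pool"""
    models = {}
    for _ in range(rounds):
        # ConceptSelection: concept with the highest expected gain
        k = max(concepts, key=estimate_gain)
        # SampleSelection: a batch of informative samples for k
        batch_ids = select_samples(k, unlabeled[k], batch)
        for x in batch_ids:                      # manual labeling step
            labeled[k].append((x, oracle(x, k)))
            unlabeled[k].remove(x)
        # re-learn this concept from the enlarged labeled set
        models[k] = train(labeled[k], unlabeled[k])
    return models
```

With toy stubs plugged in, the loop moves exactly `rounds * batch` samples from the unlabeled pools to the labeled sets.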
12. The Concept Selection Strategy
- First we establish the performance evaluation criterion for multi-concept annotation. Here we adopt the most straightforward one, the average over concepts, i.e., perf = (1/c) Σ_{i=1}^{c} perf_i, where perf_i is the performance of the i-th concept.
- A greedy strategy then leads us to select the concept that is expected to yield the highest performance gain. The expected gain for each concept is approximated by the performance variation between its latest two learning iterations.
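A minimal sketch of this greedy rule, assuming each concept carries a history of past performance scores (e.g., average precision per learning iteration):

```python
def select_concept(perf_history):
    """Greedy concept selection: pick the concept whose performance
    improved most between its latest two learning iterations.
    perf_history[k] is a list of past scores for concept k and must
    hold at least two entries (hence the two-iteration init stage)."""
    def latest_gain(k):
        h = perf_history[k]
        return h[-1] - h[-2]
    return max(perf_history, key=latest_gain)
```

This is why the scheme needs the initialization stage described on the next slide: without two iterations per concept, the latest gain is undefined.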
13. The Concept Selection Strategy (cont.)
- More sophisticated performance measures can also be applied, e.g., a weighted average perf = Σ_{i=1}^{c} w_i · perf_i, so that the annotation accuracies of the concepts with large weights are guaranteed.
- The method needs an initial stage in which the performance gains of all concepts are initialized. In our implementation, each concept is annotated for two iterations in this stage, after which all performance gains are available.
14. The Sample Selection Strategy
- For sample selection within an individual modality, we adopt three criteria:
  - Informativeness
  - Diversity
  - Density
- For sample selection across multiple modalities, the numbers of samples selected for the different modalities are adapted according to their performance variations.
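The slides do not give the formulas behind the three criteria, so the sketch below uses common stand-ins, all of which are assumptions: informativeness as closeness of the ranking score to the decision boundary, density as mean similarity to the pool, and diversity as a greedy penalty on similarity to already-selected samples.

```python
import numpy as np

def select_batch(scores, sim, h, lam=0.5):
    """Hypothetical batch selection combining the three criteria.
    scores: (n,) ranking scores in [-1, 1] from the current model.
    sim:    (n, n) pairwise similarity matrix.
    h:      batch size; lam weights the diversity penalty."""
    informativeness = 1.0 - np.abs(scores)   # near the boundary
    density = sim.mean(axis=1)               # in a dense region
    base = informativeness * density         # combination is an assumption
    chosen = []
    for _ in range(h):
        eff = base.copy()
        eff[chosen] = -np.inf                # never re-pick a sample
        if chosen:                           # diversity: avoid near-duplicates
            eff -= lam * sim[:, chosen].max(axis=1)
        chosen.append(int(np.argmax(eff)))
    return chosen
```

Confidently-scored samples (|score| near 1) get a low effectiveness value and are skipped in favor of uncertain, representative, mutually dissimilar ones.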
15. Sample Selection Criteria
- The computation of the effectiveness score.
16. Multi-Modality Sample Selection
- We construct the sample selection strategy from the performance gains of the modalities. Denote by Δperf_m the performance gain of the m-th modality. We then let the numbers of selected samples be proportional to the performance gains of the modalities, i.e., h_m = h · Δperf_m / Σ_{j=1}^{M} Δperf_j.
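The proportional allocation of the batch across modalities can be sketched as follows; the clipping of negative gains and the handling of rounding leftovers are assumptions, since the slides only state proportionality:

```python
def allocate_batch(gains, h):
    """Split batch size h across modalities in proportion to their
    latest performance gains. Negative gains are clipped to zero;
    integer rounding leftovers go to the strongest modality."""
    g = [max(0.0, x) for x in gains]
    total = sum(g)
    if total == 0.0:                         # no signal: split evenly
        base, rem = divmod(h, len(g))
        return [base + (i < rem) for i in range(len(g))]
    counts = [int(h * x / total) for x in g]
    counts[g.index(max(g))] += h - sum(counts)   # hand leftovers to best
    return counts
```

For example, gains of (0.2, 0.1, 0.1) with h = 8 yield 4, 2, and 2 samples for the three modalities.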
18. Experimental Results
- Experiments on the TRECVID 2005 dataset:
  - 61,901 sub-shots for training, 64,256 sub-shots for testing
  - Six modalities
  - Ten concepts: Walking/Running, Explosion/Fire, Maps, Flag-US, Building, Waterscape/Waterfront, Mountain, Prisoner, Sports, and Car
19. The Effectiveness of Sample Selection
- We compare the proposed method with four other schemes:
  - Scheme 1: integrate a global effectiveness measure, effectiveness(x_i) = Σ_m perf_m · effectiveness(x_i^m), and select the h samples scoring highest under it.
  - Scheme 2: select an equal number of samples (i.e., h/M) for each modality.
  - Scheme 3: define the effectiveness measure as a linear combination of the informativeness, density, and diversity measures.
  - Scheme 4: randomly select samples.
20. Experimental Results
21. The Effectiveness of Concept Selection
- We compare the proposed method with two other schemes:
  - Scheme 1: sequential annotation, i.e., manually labeling s/c samples for each concept.
  - Scheme 2: random concept selection, i.e., in each round a concept is selected at random.
22. Experimental Results
24. Discussion
- We have assumed that the effort of labeling a sample with a concept is fixed. However, the effort may vary across concepts and samples:
  - Different concepts lead to different average annotation times (Volkmer et al., ACM MM 2005).
  - Annotating different samples may cost different effort even for the same concept.
- If the costs for different samples and concepts can be obtained, the sample selection and concept selection methods in the proposed scheme can easily be adapted to take these costs into account.
26. Conclusion and Future Work
- We presented an interactive video annotation framework based on multi-concept multi-modality active learning.
- Future work:
  - A more comprehensive evaluation of the proposed scheme (e.g., with more concepts).
  - Further improvement by learning multiple concepts jointly rather than separately.
27. References
- [1] A. G. Hauptmann, "Lessons for the Future from a Decade of Informedia Video Analysis Research," in Proceedings of the ACM International Conference on Image and Video Retrieval (CIVR), 2005.
- [2] M. Chen and A. Hauptmann, "Active Learning in Multiple Modalities for Semantic Feature Extraction from Video," in Proceedings of the AAAI Workshop on Learning in Computer Vision, 2005.
- [3] J. R. He, M. J. Li, H. J. Zhang, H. H. Tong, and C. S. Zhang, "Manifold-Ranking Based Image Retrieval," in Proceedings of ACM Multimedia, 2004.
- [4] X. Yuan, X. S. Hua, M. Wang, and X. Wu, "Manifold-Ranking Based Video Concept Detection on Large Database and Feature Pool," in Proceedings of ACM Multimedia, 2006.
- [5] T. Volkmer, J. R. Smith, and A. Natsev, "A Web-Based System for Collaborative Annotation of Large Image and Video Collections," in Proceedings of ACM Multimedia, 2005.
28. Thanks!