Title: Image Annotation: Trends and Directions for Real-World Image Annotation

1. Image Annotation: Trends and Directions for Real-World Image Annotation
Yohan Jin, Latifur Khan, et al., "Image Annotations by Combining Multiple Evidence & WordNet," in Proc. of the 13th Annual ACM Multimedia Conference (ACM MM '05), Singapore, November 2005, pp. 706-715.
2. Image Annotation?
Example annotations: soldiers, clock, door, wall, mirror
yohan jin, military officer, military friends
3. Problem Statement
- What is Automatic Image Annotation (AIA)?
- Example: Tiger Woods (people), palm (tree), building, grass, golf, sky
- Automatically annotate images, then retrieve them based on the textual annotations.
4. Problem Statement (contd.)
- Why do so many multimedia researchers work on it?
- Big burst of multimedia content
- E.g., Flickr, YouTube
- Interdisciplinary area
- Computer vision, machine learning, information retrieval, NLP, and so on.
5. Overall Procedure
- Problem Statement
- Motivation for KBIAR
- Approach
- Semantic-Similarity (WordNet)
- Several Similarity Measures
- Combining Semantic Evidence Model
- Results
- Contributions
- Future Work as Ongoing Project
6. Motivation
- How do we retrieve images/videos?
- CBIR is based on similarity search over visual features
- Doesn't support textual queries
- Doesn't capture semantics
- Automatically annotate images, then retrieve based on the textual annotations.
- Example annotations: tiger, grass.
7. Motivation for KBIAR
- There is a gap between the perceptual level and the conceptual level.
- Semantic gap: it is hard to represent semantic meaning using low-level image features such as color, texture, and shape.
- E.g., a purely visual system may answer the query "red ball" with an image of a red rose.
8. Motivation
- Most current automatic image annotation and retrieval approaches consider:
- Keywords
- Low-level image features for visual tokens/regions/objects
- Correspondence between keywords and visual tokens
- Our goal is to develop automated image annotation techniques with better accuracy.
9. Annotation
10. Annotation
- Major steps:
- Segmentation into regions
- Clustering to construct blob-tokens
- Analyzing correspondence between keywords and blob-tokens
- Auto-annotation
11. Annotation: Segmentation & Clustering
- Images → Segments → Blob-tokens
12. Annotation: Correspondence/Linking
- Our purpose is to find correspondences between words and blob-tokens.
- E.g., P(Tiger | V1), P(V2 | grass)
13. Auto Annotation
14. Co-Occurrence Models
- Mori et al., 1999
- Create a co-occurrence table from a training set of annotated images
- Tends to annotate with high-frequency words
- Context is ignored
- Needs joint probability models
- Example: P(w1 | v1) ≈ 0.8, P(v3 | w2) ≈ 0.12 (relative co-occurrence frequencies)
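The co-occurrence counting above can be sketched as follows; the training images, keywords, and blob names are invented for illustration, not taken from the paper's data:

```python
from collections import Counter

# Toy training set: each image is (keywords, blob-tokens).
images = [
    ({"tiger", "grass"}, {"v1", "v2"}),
    ({"tiger", "sky"},   {"v1", "v3"}),
    ({"grass", "sky"},   {"v2", "v3"}),
]

# Count co-occurrences of every (word, blob) pair.
pair = Counter()
blob_total = Counter()
for words, blobs in images:
    for w in words:
        for v in blobs:
            pair[(w, v)] += 1
            blob_total[v] += 1

def p_word_given_blob(w, v):
    """Relative co-occurrence frequency P(w | v)."""
    return pair[(w, v)] / blob_total[v]

print(p_word_given_blob("tiger", "v1"))  # 2 of the 4 pairs involving v1
```

Because the estimate is a plain relative frequency, frequent words dominate every blob's distribution, which is exactly the weakness the slide points out.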
15. Correspondence: Translation Model (TM)
16. Translation Models
- Duygulu et al., 2002
- Use classical IBM machine translation models to translate visterms (blob-tokens) into words
- IBM machine translation models need a bilingual corpus for training
- MT example: "Mary no daba una bofetada a la bruja verde" ↔ "Mary did not slap the green witch"
- Annotation analogue: blob-tokens {V1, V34, V321, V21} ↔ words {tiger, grass, sky}; {V2, V4, V6} ↔ {Maui, people, dance}
17. Correspondence (TM)
[Figure: the word matrix (W) and blob matrix (B) built over the N training images]
18. Correspondence (TM)
[Figure: linking word Wi with blob Bj across the N training images]
19. Correspondence (TM)
- Cosine Method (CSM)
- Apply the cosine measure to build a W × B matrix in which the element at the ith row and jth column is the cosine between the ith row of the word matrix and the jth column of the blob matrix.
20. Correspondence (TM)
- EM algorithm
- The EM algorithm can be used to estimate a set of parameters θ that describe a hidden probability distribution.
- IBM Model 2 for translation
- Tries to maximize the likelihood
21. Correspondence (TM)
- EM algorithm
- Calculate correspondences based on an estimate of the probability table, then use the correspondences to update the estimate of the probability table.
- Two constraints
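The alternation described above (estimate correspondences from the current table, then re-estimate the table) can be sketched in the style of IBM Model 1; the image/keyword data are invented, and the paper's actual system uses IBM Model 2, which additionally learns alignment-position probabilities:

```python
from collections import defaultdict

# Toy "parallel corpus": keywords paired with blob-tokens per image.
pairs = [
    (["tiger", "grass"], ["v1", "v2"]),
    (["tiger", "sky"],   ["v1", "v3"]),
    (["grass", "sky"],   ["v2", "v3"]),
]

words = {w for ws, _ in pairs for w in ws}
# Uniform initialization of the translation table t(w | v).
t = defaultdict(lambda: 1.0 / len(words))

for _ in range(20):  # EM iterations
    count = defaultdict(float)   # expected counts c(w, v)
    total = defaultdict(float)   # expected counts c(v)
    # E-step: distribute each word's count over the blobs in its image,
    # in proportion to the current t(w | v) estimates.
    for ws, vs in pairs:
        for w in ws:
            z = sum(t[(w, v)] for v in vs)
            for v in vs:
                delta = t[(w, v)] / z
                count[(w, v)] += delta
                total[v] += delta
    # M-step: re-normalize so that sum_w t(w | v) = 1 (one of the
    # model's constraints).
    for (w, v), c in count.items():
        t[(w, v)] = c / total[v]

print(round(t[("tiger", "v1")], 3))
```

After a few iterations the mass for v1 concentrates on "tiger", which co-occurs with it in every image, illustrating how EM sharpens the initially uniform table.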
22. Correspondence (TM)
23. Correspondence (TM)
24. Motivation (contd.)
- Annotation results on images:
- fish, building, sea, coral, flower
- sky, mountain, city
- Some of these are noisy keywords.
25. Motivation (contd.)
- Choose the relevant keywords: fish, sea, coral
- Remove the noisy ones: building, flower
26. Motivation (contd.)
- Humans can do semantic grouping. What about computers?
- Semantic similarity: how similar is fish–sea compared with fish–building?
- WordNet
27. Approach
- WordNet
- A lexical database for the English language
- English nouns, verbs, adjectives, and adverbs are organized into synonym sets (synsets)
- Each synset represents one underlying lexical concept
28. WordNet
- Car: a motor vehicle with four wheels, usually propelled by an internal combustion engine
- Hypernym chain: car → self-propelled vehicle → wheeled vehicle → vehicle → instrumentation → artifact → object → entity
29. Detect Concept
30. Semantic Similarity
- Goal: remove noisy annotation words in each image.
- Three different approaches: node-based, distance-based, and gloss-based.
- We use WordNet 2.0 and SemCor 2.0 as the knowledge base.
31. Node-Based Approaches
- Resnik Measure (RIK), 1995
- Jiang and Conrath Measure (JNC), 1997
- Lin Measure (LIN), 1998
32. Resnik Measure (1)
- First introduces the notion of Information Content (IC)
- Uses a corpus (SemCor 2.0)
- A concept with a high IC value is more specific, i.e., it carries more detailed information
- E.g., IC(cable-television) > IC(television)
33. Resnik Measure (2)
- In the corpus, calculate the frequency of each word, then obtain the probability by relative frequency: p(c) = freq(c) / N, IC(c) = -log p(c)
- words(c): the set of words subsumed by concept c
34. Resnik Measure (3)
- Determine the lcs (lowest common subsumer) between two words, e.g., hotel and door
- The IC value of the lcs is the semantic similarity value
- Example hierarchy: artifact → structure → building → hotel, and structure → door
35. Resnik Measure (4)
- After detecting the lcs between two words, sim(w1, w2) = IC(lcs(w1, w2)).
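A minimal sketch of the Resnik measure on the slide's hotel/door hierarchy; the frequency counts are invented for illustration (the paper derives them from SemCor over WordNet):

```python
import math

# Hypothetical corpus frequencies for a tiny IS-A taxonomy (values invented).
freq = {"artifact": 1000, "structure": 400, "building": 150,
        "hotel": 40, "door": 60}
parent = {"structure": "artifact", "building": "structure",
          "hotel": "building", "door": "structure"}

N = freq["artifact"]  # total count at the root

def ancestors(c):
    chain = [c]
    while c in parent:
        c = parent[c]
        chain.append(c)
    return chain

def ic(c):
    # Information content: IC(c) = -log p(c), with p(c) by relative frequency.
    return -math.log(freq[c] / N)

def lcs(c1, c2):
    # Lowest common subsumer: first shared concept walking upward.
    a1 = ancestors(c1)
    return next(a for a in ancestors(c2) if a in a1)

def resnik(c1, c2):
    return ic(lcs(c1, c2))

print(lcs("hotel", "door"))   # structure, as in the slide's hierarchy
print(round(resnik("hotel", "door"), 3))
```

Note that any pair whose lcs is "structure" gets the same score, which is precisely the weakness the next slide raises.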
36. Resnik Measure (4): Weakness
- If the lcs is the same, the score is the same: soil–rock → material (4.82) and sand–rock → material (4.82)
- The semantic similarity values are identical, so there is no way to discriminate between the pairs.
37. Jiang-Conrath Measure
- Also uses the Information Content (IC) notion
- Considers the IC value of the lcs of the two words and the IC values of the two words themselves: dist(w1, w2) = IC(w1) + IC(w2) - 2·IC(lcs(w1, w2))
38. Lin Measure
- Uses the ratio of commonality to the information amount of each word: sim(w1, w2) = 2·IC(lcs(w1, w2)) / (IC(w1) + IC(w2))
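Both measures can be sketched with the soil/sand/rock example from the previous slide; only IC(material) = 4.82 comes from the slides, the IC values for soil, sand, and rock are invented:

```python
# Invented IC values for illustration; in the paper these come from SemCor.
ic = {"soil": 6.1, "sand": 5.8, "rock": 5.5, "material": 4.82}
lcs = {("soil", "rock"): "material", ("sand", "rock"): "material"}

def jnc_dist(c1, c2):
    # Jiang-Conrath distance: IC(c1) + IC(c2) - 2*IC(lcs)
    return ic[c1] + ic[c2] - 2 * ic[lcs[(c1, c2)]]

def lin_sim(c1, c2):
    # Lin similarity: shared IC relative to the total IC of both words
    return 2 * ic[lcs[(c1, c2)]] / (ic[c1] + ic[c2])

# Unlike Resnik, both measures now distinguish the two pairs even
# though they share the same lcs 'material' (IC = 4.82).
print(jnc_dist("soil", "rock"), jnc_dist("sand", "rock"))
print(lin_sim("soil", "rock"), lin_sim("sand", "rock"))
```

Because the words' own IC values enter the formulas, pairs with a common lcs get different scores, at the cost of the corpus sensitivity noted later in the limitations slide.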
39. Distance-Based Measure
- Leacock and Chodorow Measure (LNC), 1998
40. Leacock and Chodorow Measure (LNC)
- Measures similarity by following the IS-A relation in WordNet.
- Computes the shortest path length (number of intermediate nodes) between the two concepts.
41. Leacock and Chodorow Measure (2) (LNC)
- Example: ShortestLen(professional golf, baseball game) = 5, with overall taxonomy depth D = 17
- sim(w1, w2) = -log(ShortestLen(w1, w2) / (2·D))
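Plugging the slide's numbers into the Leacock-Chodorow formula, assuming the standard form -log(len / 2D):

```python
import math

def lch_sim(shortest_len, depth):
    # Leacock-Chodorow: -log(length / (2 * D)), where D is the
    # overall depth of the taxonomy.
    return -math.log(shortest_len / (2.0 * depth))

# Slide example: ShortestLen(professional golf, baseball game) = 5, D = 17
print(round(lch_sim(5, 17), 3))
```

A shorter path yields a larger similarity; the 2D normalization keeps the argument of the log below 1 for any path in the taxonomy.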
42. Gloss-Based Measure
- Banerjee and Pedersen Measure (BNP)
43. Banerjee and Pedersen Measure (BNP)
- Uses gloss overlap to compute similarity
- The more two glosses share, the more related the concepts are
- Considers all relations: hypernym, hyponym, meronym, holonym, ...
44. Banerjee and Pedersen Measure (BNP)
- Gathers all glosses of A and B through all relations in WordNet.
- Related pairs: (gloss, gloss), (hype, hype), (hypo, hypo), (hype, gloss), (gloss, hype)
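A deliberately simplified sketch of gloss overlap: it only counts content words shared by two glosses, whereas the full Banerjee-Pedersen measure scores consecutive multi-word overlaps and sums them over all related-gloss pairs. The glosses and stopword list here are illustrative:

```python
def gloss_overlap(gloss_a, gloss_b):
    """Count content words shared by two glosses (a simplified
    stand-in for Banerjee-Pedersen extended gloss overlaps)."""
    stop = {"a", "an", "the", "of", "or", "and", "with", "by", "in", "for"}
    wa = set(gloss_a.lower().split()) - stop
    wb = set(gloss_b.lower().split()) - stop
    return len(wa & wb)

car = "a motor vehicle with four wheels propelled by an engine"
truck = "an automotive vehicle for hauling"
print(gloss_overlap(car, truck))  # the glosses share 'vehicle'
```

When no content word is shared, the score is 0, which is exactly the failure mode (e.g., jet vs. sky) described in the limitations slide.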
45. Limitations of Measures (1)
- RIK measure: cannot differentiate two word pairs that share the same lcs.
- JNC, LIN measures: provide a way to differentiate words that share the same lcs, but are therefore sensitive to the corpus.
46. Limitations of Measures (2)
- LNC measure: the shortest distance in WordNet may not reflect the true distance.
- E.g., furniture vs. sky → 8, and furniture vs. door → 8; however, furniture is definitely closer to door.
47. Limitations of Measures (3)
- BNP measure: relies heavily on shared glosses.
- If there is no common word in the glosses across all relations in WordNet, there is no way to get a distance.
- E.g., jet vs. sky → no shared word → score 0.
48. TMHD Model
- TMHD: applying semantic similarity measures on top of the TM model.
- We choose the JNC, LIN, and BNP measures, which outperform the other measures.
- Combine these scores for each keyword using Dempster-Shafer theory.
49. Dempster-Shafer Basics
- Frame of discernment (Θ): a set of mutually exclusive elements
- Power set (2^Θ): all subsets of Θ
- Its elements serve as propositions/hypotheses
50. Dempster-Shafer Basics
- Basic probability assignment, or mass function (m)
- m(A): the portion of the total belief assigned to A
- The mass left unassigned captures the uncertainty of the evidence
51. Dempster-Shafer Example
- What is the next page a user will visit, given a web site of three pages B, C, D?
- Θ = {B, C, D}
- 2^Θ = {{B}, {C}, {D}, {B, C}, {B, D}, {C, D}, {B, C, D}, ∅}
- We have evidence that: m({B}) = 0.3, m({B, D}) = 0.1, m({D}) = 0.2
- Uncertainty: m(Θ) = 1 - 0.3 - 0.1 - 0.2 = 0.4
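The web-page example above in code: subsets of the frame are represented as frozensets, and whatever mass the evidence does not assign goes to the whole frame Θ as uncertainty:

```python
# Mass assignments from the web-navigation example.
m = {frozenset({"B"}): 0.3,
     frozenset({"B", "D"}): 0.1,
     frozenset({"D"}): 0.2}

theta = frozenset({"B", "C", "D"})  # frame of discernment
# The remaining belief is assigned to the whole frame: the uncertainty.
m[theta] = 1.0 - sum(m.values())
print(m[theta])
```

Assigning the leftover mass to Θ (rather than distributing it over singletons) is what distinguishes a mass function from an ordinary probability distribution.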
52. Dempster's Rule
53. Dempster's Rule in the TM Model
- Why? JNC, LIN, and BNP give better results, so we combine the three measures into one by giving them different weights.
- The importance of each measure differs locally, from image to image.
- The TMHD model can combine the three evidences dynamically using Dempster's rule.
54. Dempster's Rule in the TM Model (cont.)
- H (hypothesis): the assignment of a similarity value between annotated keywords.
- E.g., hypothesis: the semantic dominance of "sky" in one image, i.e., the semantic similarity of "sky" with the other keywords in that particular image.
55. Dempster's Rule in the TM Model (cont.)
- Dempster's rule (with 2 evidences): m12(H) = (Σ_{A∩B=H} m1(A)·m2(B)) / (1 - Σ_{A∩B=∅} m1(A)·m2(B))
- H is a member of the power set of the frame of discernment.
56. Dempster's Rule in the TM Model (cont.)
- Enhanced Dempster's rule (with 3 evidences)
- m1(A) is the portion of belief assigned to A by m1
- m123(A) is Dempster's combined probability for a hypothesis
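Dempster's rule can be sketched over mass functions stored as dicts keyed by frozensets; the keyword hypotheses and mass values below are invented for illustration:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: m12(A) is proportional to the sum of
    m1(B)*m2(C) over all B, C with B∩C = A, normalized by 1 - K,
    where K is the mass of conflict (pairs with B∩C = ∅)."""
    raw = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            raw[inter] = raw.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc
    return {a: v / (1.0 - conflict) for a, v in raw.items()}

theta = frozenset({"sky", "water", "sun"})
# Hypothetical single-keyword beliefs from two measures (values invented);
# the rest of each evidence's mass sits on the whole frame as uncertainty.
m_jnc = {frozenset({"sky"}): 0.5, theta: 0.5}
m_lin = {frozenset({"sky"}): 0.4, frozenset({"water"}): 0.2, theta: 0.4}

m12 = combine(m_jnc, m_lin)
print(m12[frozenset({"sky"})])
```

Three evidences combine by applying the rule twice, e.g. `combine(combine(m_jnc, m_lin), m_bnp)`, which matches the enhanced three-evidence form on the slide.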
57. Dempster's Rule in the TM Model (cont.)
- Example 1: an image has three annotation words (A, B, C)
- The hypotheses are the proper subsets of the frame of discernment {A, B, C}.
58. Dempster's Rule in the TM Model (cont.)
- In many cases, basic probabilities for every proper subset of the frame may not be available.
- If there is no belief for a subset, its mass is zero.
59. Dempster's Rule in the TM Model (cont.)
- We expect each evidence (JNC, LIN, BNP) to evaluate the semantic dominance of only one keyword at a time.
- So we have positive evidence for the singleton hypotheses only.
60. Dempster's Rule in the TM Model (cont.)
- The uncertainty of the evidence in our case is:
- The TMHD model predicts the semantic similarity by combining the evidences with Dempster's rule in this way.
61. Dempster's Rule in the TM Model (cont.)
62. Dempster's Rule in the TM Model (cont.)
- The simplified Dempster's rule, after eliminating zero terms
63. Dempster's Rule in the TM Model (cont.)
- The uncertainty in the bodies of evidence is based on the TSD (Total Semantic Distance): the sum of the semantic distances between a word and every other word in an image.
64. Dempster's Rule in the TM Model (cont.)
- Uncertainty is based on the TSD
- TSD(JNC) = the sum of JNC distances over pairwise keywords
65. Dempster's Rule in the TM Model (cont.)
- Example 2: apply Dempster's rule to remove the noisy word from the TM model's output
- TM's annotation result: sun, water, field, pillar
66. Dempster's Rule in the TM Model (cont.)
- Within this image: TSD(JNC) = 2.2087, TSD(LIN) = 2.2875, TSD(BNP) = 5.69211
- From the TSDs we obtain the uncertainty values and basic probabilities.
67. Dempster's Rule in the TM Model (cont.)
- Get the final combination result and normalize the values for ranking.
- The lowest-ranked keyword is removed.
68. Results (1)
- Dataset: Corel Stock Photo CDs
- 600 CDs, each consisting of 100 images on the same topic
- We select 5,000 images (4,500 training, 500 testing); each image has a manual annotation
- 374 words and 500 blobs
- Example annotations: "sun city sky mountain"; "grizzly bear meadow water"
69. Results (2)
70. Results (3)
71. Overall Procedure
- Problem Statement
- Motivation for KBIAR
- Approach
- Semantic-Similarity (WordNet)
- Several Similarity Measures
- Combining Semantic Evidence Model
- Results
- Contributions
- Future Work as Ongoing Project
72. Contributions
- Opens a new branch of the content-based image retrieval area: context
- A refinement process is required for much multimedia data (image and video retrieval)
- Web-image annotation
73. Major Follow-Up Works in Knowledge-Based Image Annotation Refinement
- C. Wang, F. Jing, L. Zhang, H.-J. Zhang, "Image Annotation Refinement Using Random Walk with Restarts," Proc. of the 14th Annual ACM International Conference on Multimedia, 2006.
- J. Liu, M. Li, W.-Y. Ma, Q. Liu, H. Lu, "An Adaptive Graph Model for Automatic Image Annotation," Proc. of the 8th ACM International Workshop on Multimedia Information Retrieval (MIR '06).
- X. Rui, N. Yu, T. Wang, M. Li, "A Search-Based Web Image Annotation Method," IEEE International Conference on Multimedia and Expo, 2007.
- B. Wang, Z. Li, M. Li, "Automatic Refinement of Keyword Annotations for Web Image Search," Springer (MMM '07).
- Y. Wang, S. Gong, "Refining Image Annotation Using Contextual Relations Between Words," Proc. of the 6th ACM International Conference on Image and Video Retrieval (CIVR '07).
- C. Wang, F. Jing, L. Zhang, H.-J. Zhang, "A Content-Based Image Annotation Refinement," IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07).
- J. Liu, M. Li, Q. Liu, H. Lu, S. Ma, "Image Annotation Refinement Using NSC-Based Word Correlation," IEEE International Conference on Multimedia and Expo, 2007.
- "Automatic Image Annotation by an Iterative Approach Incorporating Keyword Correlations and Region Matching," CIVR '07.
74. Smoothing Effect
- Low-level features → classification
- Wait a minute: we have the time and resources to bridge this semantic gap!
75. Smoothing Effect
- Re-ranking graph algorithms
- Graph random walks (Wang et al., ACM MM '06)
- Adaptive graph model (Liu et al., MIR '06)
- Iterative annotation graph model (Zhou et al., ACM CIVR '07)
- Web-image annotation based on search
- Web-keyword refinement (Wang et al., MMM '07)
- Bipartite graph model for web-image annotation (Rui et al., ACM MM '07)
76. Refining Decision Table
- Inputs for deciding on candidate keywords: global textual relations, web-search results, a knowledge base, local textual co-occurrence, and visual similarity
- Decision methods: graph-heuristic algorithms, machine-learning algorithms
- The decision table is too heavy.
77. Problem Reduction
- Image annotation refinement → weighted maximum cut problem
- Build a graph G(V, E): the vertices V are the candidate keywords (woods, building, fish, sky, desktop, politics) and the edges E are weighted by semantic distance
78. Complexity of the Image Annotation Problem
- The original weighted maximum cut problem is NP-complete, so the image annotation problem in KBIAR (Knowledge-Based Image Annotation Refinement) is NP-complete as well.
79. Optimal Solution of Weighted Maximum Cut
- Assign each keyword a side of the cut: building (-1), woods (-1), sky (-1); desktop (+1), fish (+1), politics (+1)
- We need to check another guess!
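For a handful of candidate keywords, "checking another guess" can be done exhaustively; the pairwise semantic distances below are invented to make the slide's six-keyword example concrete:

```python
from itertools import combinations

# Hypothetical pairwise semantic distances between candidate keywords
# (higher = less related; values invented for illustration).
words = ["building", "woods", "sky", "fish", "desktop", "politics"]
related = {"building", "woods", "sky"}  # toy: one coherent group
dist = {}
for a, b in combinations(words, 2):
    # Small distance inside the coherent group, large across it.
    dist[(a, b)] = 1.0 if (a in related) == (b in related) else 5.0

def cut_weight(side):
    side = set(side)
    return sum(w for (a, b), w in dist.items() if (a in side) != (b in side))

# Exhaustive weighted max-cut: fine for six keywords, but the general
# problem is NP-complete, which motivates the SDP relaxation that follows.
best = max((frozenset(s) for r in range(1, len(words))
            for s in combinations(words, r)), key=cut_weight)
print(sorted(best))
```

The maximizing cut separates the coherent keyword group from the noisy ones, which is how the refinement step reads off which annotations to keep.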
80. Approximation Algorithm for WMC Using Randomization (Goemans et al., '95)
- Relaxation process 1: relax each 1-dimensional variable of unit norm to a vector in a 2-dimensional vector space of unit norm
- Define the WMC 2-relaxation problem (WMC-2VQP)
81. Approximation Algorithm for WMC Using Randomization (Goemans et al., '95)
- Relaxation process 2: relax each 2-dimensional variable of unit norm to a vector in an n-dimensional vector space of unit norm
- Construct a positive semidefinite matrix M and formulate WMC-SDP (a semidefinite program)
82. Randomized Approximation Scheme (2-way)
83. Relaxation Effect on Image Annotation Refinement
[Figure: keyword graphs (building, crystal, anemone, palace, reef, people) before and after relaxation, with edge values and node variables]
84. RMCA for Image Annotation Refinement
85. RMCA for Image Annotation Refinement
86. RMCA for Image Annotation Refinement
87. RMCA for Image Annotation Refinement
88. 2-Dimensional Random Hyperplane for Decision
89. 2-Dimensional Random Hyperplane for Decision
90. Another Semantic Distance: Normalized Google Distance (NGD)
- Dynamic
- Diversity
- Example
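Assuming the standard NGD formula of Cilibrasi and Vitányi, the distance is computed from search-engine hit counts; the counts below are illustrative, of the order of the well-known horse/rider example:

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from page-hit counts:
    (max(log fx, log fy) - log fxy) / (log N - min(log fx, log fy)),
    where fx, fy are the hit counts of the two terms, fxy the count of
    pages containing both, and N the total number of indexed pages."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Illustrative counts: two terms that co-occur often are "close".
print(round(ngd(46_700_000, 12_200_000, 2_630_000, 8_058_044_651), 3))
```

Because the counts come from a live index, the distance is dynamic and covers diverse vocabulary, the two properties the slide highlights over static WordNet measures.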
91. Result with Corel Image Set
92. Google Image Labeler
93. Context, Rather Than Concept
94. Web? There is an answer... but
"Woods, hoping to extend winning streak, charges to lead at Dubai Desert Classic" (AP, published January 30, 2008, Dubai, United Arab Emirates): Tiger Woods picked up right where he left off last week - at the top of the leaderboard. Woods, who won the Buick Invitational on Sunday by eight strokes, shot a 7-under 65 Thursday to take a two-shot lead after the first round of the Dubai Desert Classic. "I played well today, just a bunch of good golf shots," Woods said after his bogey-free round at the Emirates Golf Club. Eleven players, including Miguel Angel Jimenez and Abu Dhabi Golf Championship winner Martin Kaymer, were tied for second at 67. Ernie Els, Sergio Garcia and defending champion Henrik Stenson were tied with 10 others another stroke back. Woods said he played better in Dubai than he did last week at Torrey Pines. "I had two good days of practice the last couple days and started to hit the ball a lot better than I did last week," said Woods, who won the Dubai tournament in 2006.
Extracted keywords: woods, Tiger Woods, golf shot, stroke
95. Result with Web-Images
96. Decision: 2D Hyperplane
[Figure (a): keywords from the golf article scattered around the random hyperplane r: Tiger Woods, woods, golf shot, shot, stroke, eight strokes, two-shot, Golf Club, Golf Championship, Buick Invitational, Classic, Dubai, Abu Dhabi, Emirates, desert, first round, round, second, winner, winning streak, leaderboard, eleven players, Jimenez, Ernie Els, Sergio Garcia, Kaymer, Henrik Stenson, defending champion, lead, top, Sunday, Thursday, week, today, others]
97. 2nd-Round Decision
[Figure (b): keywords from a news story scattered around the hyperplane r: Bethlehem, West Bank, local, Israeli civilian, militant group, sources, militant, town, night, leader, Palestinian, arrest, Israeli forces, Jihad, terror, home, attack, terrorist, forces, soldiers, hours]
[Figure (c): keywords retained after the second-round cut: Israeli civilian, Palestinian, militant, Bethlehem, Jihad, arrest, town, attacks, night, soldiers, terror, forces, terrorist]
98. Discussion & Future Work
- Evaluation scheme
- More refined web data (e.g., Wikipedia)
- Vision vs. semantic measures
- Video refinement