1 A System for Large-scale, Content-based Web
Image Retrieval - and the Semantics within
Till Quack
2 Task
- Create a content-based image retrieval system for the WWW
- Large-scale: one order of magnitude larger than existing systems, i.e. O(10^6) items
- Relevance Feedback
- Explore and exploit the semantics within
- Take large-scale, content-based image retrieval one step closer to commercial applications
3 Outline
- Content-based Image Retrieval on the WWW
- PART I: A System for Image Retrieval on the WWW
- Features
- Retrieval
- Relevance Feedback
- Software Design
- PART II: The Semantics within
- Identifying a Method to find Semantics
- Data Mining for Semantic Clues
- Frequent Itemset Mining and Association Rules
- The Visual Link
- Discussion & Demonstration
- Conclusions & Outlook
4 Content-based Image Retrieval on the WWW
- Characteristics of the data repository
- Size: 4.2 billion documents in Google's index
- Diversity: documents in any context, language
- Control: anybody can publish anything
- Dynamics: ever changing
- System Requirements
- FAST
- SCALABLE
- Make use of all the information available
- Motivation for a new system
- Existing systems
- Either pure text (Google)
- Or pure content-based
- Large-Scale
5 PART I: A System for Large-scale, Content-based
Image Retrieval on the WWW
Ullrich Moenich, Till Quack, Lars Thiele
6 System Overview
7 Visual Features describe the Images
- Global Features from MPEG-7 Standard
- Currently no Segmentation
- Reasons: scalability and the diversity of the data
- Texture Features
- Edge Histogram Descriptor (EHD)
- Histogram of quantized edge directions. 80 dimensions
- Homogeneous Texture Descriptor (HTD)
- Output of Gabor filter-bank. 62 dimensions
- Color Features
- Scalable Color Descriptor (SCD)
- Color histogram. 256, 128, 64 or 32 dimensions
- Dominant Color Descriptor (DCD)
- Up to 8 dominant colors (3D color space) and their percentages
- 32 dimensions
- Bins defined for each image
8 Collateral Text as an additional Feature
- ALT Tag and Collateral Text around images
- VERY uncontrolled annotation
- Stemming: Porter Stemmer
- Example: training -> train
- More matching terms for boolean queries
- But also some new ambiguities
- train: "to train" (verb) / "the train" (noun)
9 Retrieval in 2 Steps
1. Text Retrieval
2. Visual Nearest Neighbor Search
10 Retrieval: Text
- Options
- Boolean query on inverted index
- Vector Space Model
- LSI etc.
- Choice
- Ranked boolean queries on inverted index
- Ranking: tf-idf
- Reasons
- Speed
- Sparsity of data
- 600 000 keywords in total
- 1 document: 10-50 words
Keyword  ImageId  tf
shoe     1233     1
sport    1233     1
red      1233     1
banana   1234     1
fruit    1234     2
Order    1234     1
Keyid    ImageId  tf
124      1233     1
341      1233     1
345      1233     1
445      1234     1
75       1234     2
875      1234     1
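A minimal sketch of a ranked boolean query on an inverted index, using the rows of the example table. The function names, the AND semantics, and the smoothed idf term log(1 + N/df) are assumptions made for illustration; the slides only state that ranking uses tf-idf.

```python
import math
from collections import defaultdict

# (keyword, image_id, tf) rows, copied from the example table.
ROWS = [
    ("shoe", 1233, 1), ("sport", 1233, 1), ("red", 1233, 1),
    ("banana", 1234, 1), ("fruit", 1234, 2), ("Order", 1234, 1),
]

def build_index(rows):
    """Map each keyword to its postings {image_id: tf}."""
    index = defaultdict(dict)
    for keyword, image_id, tf in rows:
        index[keyword][image_id] = tf
    return index

def ranked_boolean_query(index, terms, n_docs):
    """AND query: keep images containing every term, rank by summed tf-idf."""
    postings = [index.get(t, {}) for t in terms]
    if not postings:
        return []
    hits = set(postings[0])
    for p in postings[1:]:
        hits &= set(p)
    scores = {}
    for image_id in hits:
        # Assumed weighting: tf * log(1 + N / df), summed over query terms.
        scores[image_id] = sum(
            p[image_id] * math.log(1 + n_docs / len(p)) for p in postings
        )
    return sorted(scores.items(), key=lambda kv: -kv[1])

index = build_index(ROWS)
result = ranked_boolean_query(index, ["fruit", "banana"], n_docs=2)
```

Because each document contributes only 10-50 words, the postings per keyword stay short, which is what makes this lookup fast compared with a dense vector space model.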
11 Retrieval: Visual Features (MPEG-7)
- K-Nearest Neighbor search (K-NN)
- Find the K closest candidates c_i to query image q in a vector space
- Distance: Minkowski metrics for distance d(c_i, q), namely L1 and L2 norms
- Most MPEG-7 descriptors are high-dimensional vectors
- The curse of dimensionality applies
- High-dimensional spaces behave counter-intuitively
- In particular, the distances become less meaningful
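As an illustration of the K-NN search described here, a brute-force sketch with a Minkowski distance of order p (p=1 gives L1, p=2 gives L2). The dictionary layout for the candidates and the function names are hypothetical.

```python
import heapq

def minkowski(a, b, p):
    """Minkowski distance of order p (p=1: L1 norm, p=2: L2 norm)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn(query, candidates, k, p=2):
    """Return the k (distance, image_id) pairs closest to the query.

    candidates: dict image_id -> feature vector (hypothetical layout).
    """
    return heapq.nsmallest(
        k,
        ((minkowski(vec, query, p), image_id)
         for image_id, vec in candidates.items()),
    )

candidates = {1: [0.0, 0.0], 2: [1.0, 1.0], 3: [3.0, 4.0]}
nearest = knn([0.0, 0.0], candidates, k=2, p=2)
```

This linear scan compares the query with every entry, which is exactly the cost a large database cannot afford.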
12 Retrieval: Challenges for Visual Features
- We have several (visual) feature types. How can we combine them?
- Our database is very large. How can we search it fast enough?
- i.e. how can we avoid comparing the query vector with each database entry?
13 A Combined Distance for the MPEG-7 Features
- We use a combined distance of all the visual feature types
- The individual distances occupy different ranges in different distributions
- The distributions were transformed to a normal distribution in the range [0,1]
- The distances are then combined linearly
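One way to realize such a combination is sketched below. The slides do not give the transformation itself, so the 3-sigma Gaussian normalization used here is an assumption (a common choice in the CBIR literature), as are the function names and the sample values.

```python
from statistics import mean, pstdev

def gaussian_normalizer(sample_distances):
    """Build a function mapping raw distances of one feature into [0, 1].

    Assumed normalization (not taken from the slides): (d - mu) / (3 * sigma)
    lies in [-1, 1] for about 99% of normally distributed values; it is then
    shifted to [0, 1] and clipped.
    """
    mu = mean(sample_distances)
    sigma = pstdev(sample_distances) or 1.0  # avoid division by zero
    def normalize(d):
        z = (d - mu) / (3.0 * sigma)
        return min(1.0, max(0.0, (z + 1.0) / 2.0))
    return normalize

def combined_distance(distances, normalizers, weights):
    """Weighted linear combination of the normalized per-feature distances."""
    total = sum(weights[f] for f in distances)
    return sum(
        weights[f] * normalizers[f](distances[f]) for f in distances
    ) / total

# Hypothetical raw distance samples per feature type.
samples = {"ehd": [10.0, 20.0, 30.0], "scd": [100.0, 300.0, 500.0]}
normalizers = {f: gaussian_normalizer(s) for f, s in samples.items()}
weights = {"ehd": 1.0, "scd": 1.0}
d = combined_distance({"ehd": 20.0, "scd": 300.0}, normalizers, weights)
```

The weights are the place where relevance feedback can later shift importance between feature types.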
14 Clustering speeds up the search
- Problem
- Millions of items in DB
- Linear search over the whole dataset too slow
- Looking only for the K nearest neighbors anyway
- (One) Solution
- Partition the data into clusters, each identified by a representative, the centroid
- Only search the cluster whose centroid is closest to query q
- K-Means clustering algorithm
- Not the best, in particular in high-dimensional spaces
- But fast!
- Problem with Clustering
- A query at the border of a cell does not find all the nearest neighbors
- Simple solution: overlapping clusters
- Problem: scalability
- Original data: 7 GB
- Overlapping data: 50 GB
Imageid  Primary Descriptor  Secondary Descriptor 1  Secondary Descriptor 2  Secondary Descriptor 3
122      ehd                 htd                     scd                     dcd
45233    ehd                 htd                     scd                     dcd
6688     ehd                 htd                     scd                     dcd
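The cluster-pruned search can be sketched as follows: a plain Lloyd-style k-means partitions the data, and a query is compared only with the members of the cluster whose centroid is closest. This is a toy in-memory version with hypothetical names and data, not the system's implementation.

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest(point, centroids):
    """Index of the centroid nearest to point."""
    return min(range(len(centroids)), key=lambda c: l2(point, centroids[c]))

def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        assign = [closest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [closest(p, centroids) for p in points]

def cluster_search(query, points, centroids, assign, k_nn):
    """K-NN restricted to the cluster whose centroid is closest to the query."""
    c = closest(query, centroids)
    members = [i for i, a in enumerate(assign) if a == c]
    return sorted(members, key=lambda i: l2(points[i], query))[:k_nn]

points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 5.0], [5.0, 5.2]]
centroids, assign = kmeans(points, k=2)
hits = cluster_search([0.0, 0.0], points, centroids, assign, k_nn=2)
```

A query close to a cell border would miss neighbors stored in the adjacent cluster, which is the problem the overlapping clusters try to address.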
15 Relevance Feedback Improves the Results
- Relevance feedback: user input to improve search results, iteration by iteration
- i.e. the user selects good matches
- We obtain the following information
- A new query vector which is a combination of the relevant images (Query Vector Movement)
- The ratios for the combination of the feature types
16 Relevance Feedback: Query Vector Movement
- Construct the query vector q_n from the images selected in iteration n
- k: vector component; f: feature type (EHD, SCD, HTD); i = 1...M: relevant images
- The final, new query vector is
- q = 0.75 q_n + 0.25 q_{n-1}
- i.e. move from the old query vector towards the new vector
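The update can be sketched in a few lines. The 0.75/0.25 mix is from the slide; taking q_n as the component-wise mean of the M relevant images per feature type is an assumption, since the construction formula is not preserved here. All names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of equally long vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def move_query(q_prev, relevant, alpha=0.75):
    """Return alpha * q_n + (1 - alpha) * q_{n-1} for every feature type.

    q_prev:   dict feature type -> previous query vector q_{n-1}
    relevant: list of dicts (feature type -> vector), the M images the
              user marked as relevant in iteration n
    q_n is taken as the mean of the relevant vectors (an assumption).
    """
    q_new = {}
    for f, old in q_prev.items():
        q_n = mean_vector([img[f] for img in relevant])
        q_new[f] = [alpha * a + (1 - alpha) * b for a, b in zip(q_n, old)]
    return q_new

q_prev = {"ehd": [0.0, 0.0], "scd": [1.0, 1.0]}
relevant = [
    {"ehd": [1.0, 0.0], "scd": [1.0, 3.0]},
    {"ehd": [3.0, 4.0], "scd": [1.0, 1.0]},
]
q = move_query(q_prev, relevant)
```

Keeping a quarter of the old vector damps the movement, so a single odd selection does not throw the query far off.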
17 Relevance Feedback: Weight Adaptation
- Which feature is most important for the given query?
- The one for which all the relevant images are closest
- Determine the ratios for the combination based on the average distance, e.g. for the EHD, and set the feature weights accordingly
18 Implementation: Software and Hardware
- Languages: C++ and Perl
- Inline::CPP to connect layers
- WWW: Apache and CGI
- Relational DB: MySQL
- Operating System: OS X
- Hardware
- Dual 2 GHz Apple G5, 2 GB RAM
- Teran Terabyte Disk Array
19 Part II: The Semantics Within
20 Semantics: Combining Text and Visual Features
- Our dataset is multi-modal
- Keywords and several visual features
- Not only valid for WWW data
- Video: image + speech
- Bio-imagery: image + microscope setting, cell coloring fluid
- Goal: try to jointly use the different modes
- Do semantic relations between the modes exist?
- Learn something about these semantic relations
- Improve the retrieval precision based on them
- Challenges in our project
- Large-scale
- Noisy and uncontrolled data
- Only global visual features
21 Identifying a Method to find the Semantics
- Related work
- Latent Semantic Indexing (LSI) [Westerveld 2000]
- Problem: O(N^2 m^3), N = documents + terms, m = concept space
- Statistical models [Barnard, Forsyth 2001-2004]
- Problem: several hours for several thousand images
- Problem: it is a (rather strict, hierarchical) model
- Others
- Neural networks (SOM etc.)
- Hidden Markov Models
- Often classification
- We don't know our classes, or there are just too many
- We can't train them either (data too diverse and noisy)
- Most of the methods above were only tested on relatively small, supervised datasets
- There is one more option
22 Method: Data Mining for Semantic Clues
- Mine the data for patterns
- Find them only where they exist
- Deduce rules from them
- Scalable methods available
- Frequent Itemset Mining and Association Rules
- Classic application: market baskets, census data
- Some works on multimedia data
- [Zaïane 98]: datacubes with appended keywords
- [Tešic et al. 03]: perceptual associations (texture) within images
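23 Frequent Itemsets and Association Rules
- Itemset I: the set of all items
- Transaction T: a set of items, T ⊆ I
- Database D: a set of transactions
- Support of itemset A: supp(A) = |{T ∈ D : A ⊆ T}| / |D|
- A is called frequent if supp(A) > minsupp
- Rule: A -> B, with A, B ⊆ I and A ∩ B = ∅
- Support of a rule: supp(A -> B) = supp(A ∪ B)
- Statistical significance
- Confidence of a rule: conf(A -> B) = supp(A ∪ B) / supp(A)
- Strength of implication
- Maximum likelihood estimate that B is true given that A is true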
24 Example & Advantages
- Example: market baskets
- Rule: {Diaper, Milk} -> Beer
- Advantages
- Human readable
- Can be edited
- Fast algorithms available
- Note: associations are not correlations
- The same concept, just simpler
- Associations and correlations: [Brin, Motwani, Silverstein 98]
TID  Items
1    Bread, Milk
2    Beer, Diaper, Bread, Eggs
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Bread, Diaper, Milk
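As a worked check of the example rule on the five baskets, the sketch below computes support as the fraction of transactions containing an itemset, and confidence as the support of the whole rule divided by the support of its left-hand side. The function names are illustrative.

```python
TRANSACTIONS = {
    1: {"Bread", "Milk"},
    2: {"Beer", "Diaper", "Bread", "Eggs"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Bread", "Diaper", "Milk"},
}

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """supp(A and B together) / supp(A) for the rule A -> B."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

rule_support = support({"Diaper", "Milk", "Beer"}, TRANSACTIONS)
rule_confidence = confidence({"Diaper", "Milk"}, {"Beer"}, TRANSACTIONS)
```

Here the rule holds in baskets 3 and 4 (support 2/5), while Diaper and Milk occur together in baskets 3, 4 and 5, giving a confidence of 2/3.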
25 Using FIMI to find the itemsets
- Frequent Itemset Mining (FIMI)
- Find frequent itemsets with support > minsupp
- Minimal support minsupp given by an expert
- First algorithm: APriori [Agrawal et al. 93]
- Basic idea: if an itemset is frequent, all its subsets must be frequent (monotonicity)
- k passes over the dataset for itemsets of length k
- O(knp): n transactions, p items, itemsets of length k
- Today's algorithms
- Rely on the same basic principle
- But much faster (main reason: data structures)
- Usually only 2 database passes
- Linear runtime
- State-of-the-art algorithm overview: [FIMI03]
- We used fpmax [Grahne, Zhu, Nov 03]
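To make the basic idea concrete, here is a compact level-wise Apriori sketch. It illustrates the monotonicity principle only; it is not fpmax, the algorithm actually used, and it makes no attempt at the data structures that make modern miners fast. The basket data is the market-basket example.

```python
from itertools import combinations

def apriori(transactions, minsupp):
    """Return {itemset: support} for all itemsets with support > minsupp."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while level:
        # One pass over the data per itemset length k.
        current = {}
        for candidate in level:
            s = supp(candidate)
            if s > minsupp:
                current[candidate] = s
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-candidates, then prune every
        # candidate that has an infrequent k-subset (monotonicity).
        k += 1
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = {
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent

baskets = [
    {"Bread", "Milk"},
    {"Beer", "Diaper", "Bread", "Eggs"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Bread", "Diaper", "Milk"},
]
frequent = apriori(baskets, minsupp=0.5)
```

With minsupp = 0.5 the candidate {Beer, Bread, Diaper} is never counted, because its subset {Beer, Bread} is already known to be infrequent.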
26 Diapers and Beer !!?
- Application to the domain of multimedia data
- Formulate images as transactions
- Low-level clusters serve as a dimensionality reduction for the visual features
- We find associations of visual features (clusters) and keywords
- From these associations we deduce semantic rules
- Advantages
- Comparatively low computational complexity
- Other data sources can be integrated in the same manner (e.g. long-term relevance feedback)
- Challenges
- Noisy, uncontrolled data
- Associations within keywords much stronger than associations between keywords and visual features
- Uneven distribution of cluster sizes (K-Means problem)
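A sketch of how one image could be turned into a transaction: its keywords become items, and each visual feature vector is replaced by the id of its nearest cluster centroid. The item naming scheme ("kw:", "ehd:3") and the data layout are hypothetical.

```python
def image_transaction(keywords, vectors, centroids):
    """Build one transaction: keyword items plus one cluster item per feature.

    vectors:   dict feature type -> feature vector of the image
    centroids: dict feature type -> list of cluster centroids
    """
    def nearest(vec, cents):
        # Index of the centroid with the smallest squared L2 distance.
        return min(
            range(len(cents)),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(vec, cents[c])),
        )

    items = {f"kw:{w}" for w in keywords}
    for feature, vec in vectors.items():
        items.add(f"{feature}:{nearest(vec, centroids[feature])}")
    return items

centroids = {"ehd": [[0.0, 0.0], [1.0, 1.0]], "scd": [[0.0], [10.0]]}
t = image_transaction(
    ["shoe", "red"], {"ehd": [0.9, 1.2], "scd": [1.0]}, centroids
)
```

Replacing an 80-dimensional vector by a single cluster id is the dimensionality reduction mentioned above; it also means the quality of the mined rules depends on the quality of the clustering.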
27 Characteristics of the Itemsets and Rules
- There are associations
- Within text: shoe -> walk
- Within visual clusters: EHD 14 -> SCD 12
- Between text and visual clusters: shoe -> EHD 14
- Measure for interestingness or choice of rules from FI
- Confidence?
- Statistical criteria?
- Background knowledge? (Example: pregnant -> woman, 100% confidence)
- Our background knowledge: rules that connect keywords and low-level features are more interesting
- Since this is known, the mining can be adapted and made even faster
28 Exploiting the Itemsets and Rules
29 Selecting Interesting Low-Level Clusters based on Rules
- Clusters were introduced to partition the visual feature vector data and search only on certain clusters
- Problem: we miss certain nearest neighbors if images for a concept are spread over several clusters
- Unsatisfactory solution: overlapping clusters
- But association rules might find and solve this situation
- Clusters are re-united
- If the number of images for the concept in both clusters is > minsupp
- Example: shirt -> {ehd249, ehd310} reunites these clusters for the initial keyword query shirt
- This is scalable, unlike overlapping clusters
- Another benefit is that more images labeled with the original keyword are injected into the results of K-NN search
- Currently: one keyword as high-level semantic concept
- Future: find high-level semantic concepts by mining associations within text first
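The re-uniting step might look like the sketch below: mined rules map a keyword to the clusters in which its images are frequent, and the K-NN candidates are drawn from the union of those clusters. The rule storage, the cluster membership table, and the function names are all hypothetical.

```python
def clusters_for_keyword(keyword, rules, default_cluster):
    """Clusters to search for a keyword query.

    rules: dict keyword -> set of cluster ids taken from mined rules such
           as shirt -> {ehd249, ehd310} (hypothetical storage layout).
    The cluster closest to the query (default_cluster) is always included.
    """
    return set(rules.get(keyword, ())) | {default_cluster}

def candidate_images(clusters, cluster_members):
    """Union of the image ids stored in the selected clusters."""
    images = set()
    for c in clusters:
        images |= cluster_members.get(c, set())
    return images

rules = {"shirt": {"ehd249", "ehd310"}}
cluster_members = {"ehd249": {1, 2}, "ehd310": {3}, "ehd7": {4}}
selected = clusters_for_keyword("shirt", rules, default_cluster="ehd249")
images = candidate_images(selected, cluster_members)
```

Unlike overlapping clusters, nothing is stored twice: only the small rule table grows.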
30 The Visual Link
- Another contribution, NOT related to Frequent Itemset Mining and Association Rules
- Since the search concept suggests visual nearest neighbor search with relevance feedback after the initial keyword search, it would be nice to have a diverse selection of images for a given keyword on the first page of results
- Images sorted not only by keyword ranking, but also based on visual feature information
- Basic idea: for a given keyword query, build groups of images that are visually close
- Larger groups are more important
- Show only one representative per group
31 The Visual Link: A Graph-Based Approach
- Let I(Q) be the set of images matching a keyword query Q
- Define a graph G(V,E) with the images in I(Q) as vertices V
- i.e. images are visually linked (by an edge in E) if the distance between them is lower than a given threshold
- Do a connected component analysis to find connected components C
- For each component C find the best representative r_C
- Re-rank results based on representatives r_C
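The graph-based grouping can be sketched with a union-find structure. The threshold value, the use of the L2 distance, and picking the representative as the member with the smallest summed distance to the rest of its group are assumptions; the slides do not specify how the best representative is chosen.

```python
import math
from collections import defaultdict

def visual_link_groups(vectors, threshold):
    """Connected components of the visual-link graph, largest first.

    vectors: dict image_id -> feature vector; two images are linked when
    their L2 distance is below threshold.
    """
    parent = {i: i for i in vectors}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    ids = list(vectors)
    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if math.dist(vectors[a], vectors[b]) < threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in ids:
        groups[find(i)].append(i)
    # Larger groups first, since they are considered more important.
    return sorted(groups.values(), key=len, reverse=True)

def representative(group, vectors):
    """Member with the smallest summed distance to the others (assumed rule)."""
    return min(
        group,
        key=lambda i: sum(math.dist(vectors[i], vectors[j]) for j in group),
    )

vectors = {1: [0.0, 0.0], 2: [0.5, 0.0], 3: [1.0, 0.0], 4: [9.0, 9.0]}
groups = visual_link_groups(vectors, threshold=0.6)
reps = [representative(g, vectors) for g in groups]
```

Images 1 and 3 end up in the same group although they are not directly linked, because both are linked to image 2. The pairwise distance loop is quadratic in the number of matching images.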
32 The Visual Link: An Example
33 The Visual Link: An Approximation
- Problem: distance calculations for the graph take too long
- Clusters cannot be used
- Loading individual vectors takes a lot of time
- Solution
- Approximate distance
- Idea: if images are in the same cluster and in the same distance range to the centroid -> the probability that they are close is high
- New definition for visually linked
- If in same cluster and same range of relative distance to its centroid
- Can be encoded in relational DB! And comes at nearly no extra cost in creation
Imageid  Clusterid  2ndClusterid  Reldist
1        221        122           0.6
2        342        345           0.8
3        223        42            0.2
4        12         126           0.4
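A sketch of the approximate definition: images are grouped by cluster id and by the range their relative distance falls into, so no feature vectors need to be loaded. The number of equal-width ranges (n_ranges) is an assumed parameter, and the two extra rows are hypothetical data added so that a group with more than one member appears.

```python
from collections import defaultdict

# (imageid, clusterid, second_clusterid, reldist) rows as in the table.
ROWS = [(1, 221, 122, 0.6), (2, 342, 345, 0.8), (3, 223, 42, 0.2), (4, 12, 126, 0.4)]

def approx_link_groups(rows, n_ranges=4):
    """Group images sharing a cluster and a relative-distance range.

    n_ranges (number of equal-width bins over [0, 1]) is an assumption;
    the slides do not state how the ranges are chosen.
    """
    groups = defaultdict(list)
    for image_id, cluster_id, _second, reldist in rows:
        bin_id = min(int(reldist * n_ranges), n_ranges - 1)
        groups[(cluster_id, bin_id)].append(image_id)
    return dict(groups)

# Two extra hypothetical rows that also fall into cluster 221.
rows = ROWS + [(5, 221, 10, 0.65), (6, 221, 10, 0.1)]
groups = approx_link_groups(rows)
```

In a relational database the same grouping amounts to grouping rows by cluster id and distance range, which is why it comes at almost no extra cost.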
34 Discussion: Demo
35 Discussion: Precision
- Measuring the quality of such a large-scale system is difficult
- Precision/recall measure not possible: ground truth not known
- C: correct results
- D: desired results
- A: actual results
- Precision = |C| / |A|, recall = |C| / |D|
- We measure the precision based on user questioning
36 Before we continue: some numbers
- Number of images: 3 006 660
- Size of image data: 111 GB
- Feature extraction: 15 days (dual 2 GHz CPU, 2 GB RAM)
- Number of distinct keywords: 680 256
- Size of inverted keyword index table: 50 260 345 lines
- MySQL database size: 23 GB
37 And now the moment you've all been waiting for
38 Conclusions
- A system with over 3 million items was implemented
- Probably the largest CBIR system to date?
- A retrieval concept was introduced
- A keyword query followed by relevance feedback and visual nearest neighbour search
- Superior to existing retrieval concepts (query by keyword or query by example)
- Data mining to explore and exploit semantics in large-scale systems was introduced
39 Questions
40 Outlook
- Many extensions and improvements possible
- Segmentation
- Or maybe rather some simple tiling
- Indexing
- K-Means should be replaced
- Suggestion: VA-File based approach [Manjunath, Tesic 03]
- Association Rule Mining
- Multilevel approach
- First keywords for high-level semantic concepts
- Then visual features
41 Thanks
- Ullrich Moenich and Lars Thiele
42 Which Rules are of Interest?
- There are associations
- Within text: shoe -> walk
- Within visual clusters: EHD 14 -> SCD 12
- Between text and visual clusters: shoe -> EHD 14, SCD 12
- There are long and short rules
- Short rules have higher support by the nature of the problem
- Long rules contain more (precise) information about the semantics
- Measure for interestingness or choice of rules from FI
- Confidence?
- Statistical criteria?
- Background knowledge? (Example: pregnant -> woman)
43 Characteristics and Challenges
- Chosen criteria
- Mainly interested in rules keywords -> visual feature clusters (our background knowledge)
- Support, confidence
- Mine long and short rules
- Restriction of the problem: mine for frequent itemsets per keyword
- i.e. all images (= transactions) for a given keyword
- This means
- We avoid being distracted by associations within keywords
- The method is made even more scalable
- The keyword as a placeholder for a semantic concept
- A keyword does not always stand for a single semantic concept
- Proposal for future versions: multi-level approach
- First keywords -> keywords rules to identify real semantic concepts
- Then itemset mining per identified concept
44 Characteristics of the Itemsets and Rules - Overall
45 Why keyword filtering of the results does not work
46 Proposal: Semantic Clusters
- Ultimate goal: search some kind of semantic clusters instead of visual feature clusters
- Proposal based on the approach from [Ester et al. 2002, 2003]
- Clustering based on frequent itemsets, originally for text
- Clustering criterion: minimize overlap