A System for Large-scale, - PowerPoint PPT Presentation

Provided by: tqu7
Learn more at: https://web.ece.ucsb.edu

1
A System for Large-scale, Content-based Web
Image Retrieval - and the Semantics within
Till Quack
2
Task
  • Create a content-based image retrieval system for
    the WWW
  • Large-scale, one order of magnitude larger than
    existing systems. That means O(10^6) items
  • Relevance Feedback
  • Explore and exploit the semantics within
  • Take large-scale, content-based image retrieval
    one step closer to commercial applications

3
Outline
  • Content-based Image Retrieval on the WWW
  • PART I A System for Image Retrieval on the WWW
  • Features
  • Retrieval
  • Relevance Feedback
  • Software Design
  • PART II The Semantics within
  • Identifying a Method to find Semantics
  • Data Mining for Semantic Clues
  • Frequent Itemset Mining and Association Rules
  • The Visual Link
  • Discussion & Demonstration
  • Conclusions & Outlook

4
Content-based Image Retrieval on the WWW
  • Characteristics of the data repository
  • Size: 4.2 billion documents in Google's index
  • Diversity: documents in any context, language
  • Control: anybody can publish anything
  • Dynamics: ever changing
  • System Requirements
  • FAST
  • SCALABLE
  • Make use of all the information available
  • Motivation for a new system
  • Existing systems
  • Either pure text (Google)
  • Or pure content-based
  • Large-Scale

5
PART I A System for Large-scale, Content-based
Image Retrieval on the WWW
Ullrich Moenich, Till Quack, Lars Thiele
6
System Overview
7
Visual Features describe the Images
  • Global Features from MPEG-7 Standard
  • Currently no Segmentation
  • Reasons: scalability and the diversity of the
    data
  • Texture Features
  • Edge Histogram Descriptor (EHD)
  • Histogram of quantized edge directions. 80
    dimensions
  • Homogeneous Texture Descriptor (HTD)
  • Output of Gabor filter-bank. 62 dimensions.
  • Color Features
  • Scalable Color Descriptor (SCD)
  • Color Histogram. 256, 128, 64 or 32 dimensions
  • Dominant Color Descriptor (DCD)
  • Up to 8 dominant colors (3d color-space) and
    their percentages
  • 32 dimensions
  • Bins defined for each image

8
Collateral Text as an additional Feature
  • ALT tag and collateral text around images
  • VERY uncontrolled annotation
  • Stemming: Porter Stemmer
  • Example: training -> train
  • More matching terms for boolean queries
  • But also some new ambiguities
  • train: "to train" (verb) / "the train" (noun)

9
Retrieval in 2 Steps
1. Text Retrieval
2. Visual Nearest Neighbor Search
10
Retrieval: Text
  • Options
  • Boolean query on inverted index
  • Vector Space Model
  • LSI etc.
  • Choice
  • Ranked boolean queries on inverted index
  • Ranking: tf-idf
  • Reasons
  • Speed
  • Sparsity of data
  • 600 000 keywords in total
  • 1 document: 10-50 words

Keyword   ImageId   tf
shoe      1233      1
sport     1233      1
red       1233      1
banana    1234      1
fruit     1234      2
Order     1234      1

Keyid     ImageId   tf
124       1233      1
341       1233      1
345       1233      1
445       1234      1
75        1234      2
875       1234      1
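
A ranked boolean query over such an inverted index could be sketched as follows. This is a minimal illustration, not the system's actual code: the postings are copied from the example table, while the collection size `NUM_IMAGES`, the function name and the smoothed idf formula `log(1 + N/df)` are assumptions.

```python
import math
from collections import defaultdict

# Postings copied from the example table: keyword -> {image_id: tf}
index = {
    "shoe": {1233: 1},
    "sport": {1233: 1},
    "red": {1233: 1},
    "banana": {1234: 1},
    "fruit": {1234: 2},
    "order": {1234: 1},
}
NUM_IMAGES = 2  # hypothetical collection size for this toy example


def ranked_boolean_query(terms, index, num_docs):
    """AND query: keep images containing every term, rank by summed tf*idf."""
    postings = [index.get(t, {}) for t in terms]
    if not postings:
        return []
    # Boolean AND: intersect the posting lists of all query terms.
    matches = set(postings[0])
    for p in postings[1:]:
        matches &= set(p)
    scores = defaultdict(float)
    for p in postings:
        # Smoothed idf (an assumption; the slides only say "tf-idf").
        idf = math.log(1 + num_docs / len(p)) if p else 0.0
        for image_id in matches:
            scores[image_id] += p[image_id] * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(ranked_boolean_query(["banana", "fruit"], index, NUM_IMAGES))
```

Because each image carries only 10-50 words, the posting lists stay short and the intersection is cheap, which matches the speed and sparsity reasons given above.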
11
Retrieval: Visual Features (MPEG-7)
  • K-Nearest Neighbor search (K-NN)
  • Find the K closest candidates c_i to query image q in
    a vector space
  • Distance: Minkowski metrics for the distance d(c_i, q),
    namely the L1 and L2 norms
  • Most MPEG-7 descriptors are high-dimensional
    vectors
  • The curse of dimensionality applies
  • High-dimensional spaces behave weirdly
  • In particular, the distances are not too
    meaningful
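
A brute-force version of this K-NN search, using the Minkowski distance with p = 1 (L1) or p = 2 (L2), might look like the sketch below. The function names and the toy vectors are illustrative assumptions; real MPEG-7 descriptors have far more dimensions.

```python
import heapq


def minkowski(a, b, p):
    """Minkowski distance of order p (p=1: L1 norm, p=2: L2 norm)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn(query, candidates, k, p=2):
    """Return the k (distance, image_id) pairs closest to the query."""
    scored = ((minkowski(vec, query, p), image_id)
              for image_id, vec in candidates.items())
    return heapq.nsmallest(k, scored)


# Hypothetical 2-dimensional feature vectors keyed by image id.
candidates = {1: [0.0, 0.0], 2: [3.0, 4.0], 3: [1.0, 1.0]}
print(knn([0.0, 0.0], candidates, k=2))
```

This linear scan compares the query with every database entry, which is exactly the cost the following slides try to avoid.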

12
Retrieval: Challenges for Visual Features
  • We have several (visual) feature types. How can
    we combine them?
  • Our database is very large. How can we search it
    fast enough?
  • i.e. how can we avoid comparing the query vector
    with each database entry?

13
A Combined Distance for the MPEG-7 Features
  • We use a combined distance of all the visual
    feature types
  • The individual distances occupy different ranges
    in different distributions
  • The distributions were transformed to a normal
    distribution in the range [0,1]
  • The distances are then combined linearly
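
One way to realize such a combination is sketched below. It assumes a 3-sigma Gaussian normalization, which is a common choice in content-based retrieval but is not confirmed by the slides; the per-feature statistics and the weights are hypothetical values.

```python
def gaussian_normalize(d, mu, sigma):
    """Map a raw distance into [0, 1] with the 3-sigma rule (an assumption)."""
    scaled = (d - mu) / (3.0 * sigma)  # most values now fall into [-1, 1]
    return min(1.0, max(0.0, (scaled + 1.0) / 2.0))


def combined_distance(raw, stats, weights):
    """Weighted linear combination of the normalized per-feature distances."""
    total = sum(weights[f] for f in raw)
    return sum(weights[f] * gaussian_normalize(raw[f], *stats[f])
               for f in raw) / total


# Hypothetical (mean, standard deviation) of each feature's distance distribution.
stats = {"ehd": (10.0, 2.0), "scd": (100.0, 20.0)}
weights = {"ehd": 1.0, "scd": 1.0}
print(combined_distance({"ehd": 10.0, "scd": 100.0}, stats, weights))
```

After normalization the EHD and SCD distances live on the same scale, so a linear combination no longer lets the feature with the largest raw range dominate.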

14
Clustering speeds up the search
  • Problem
  • Millions of items in DB
  • Linear search over the whole dataset too slow
  • Looking only for the K nearest neighbors anyway
  • (One) Solution
  • Partition the data into clusters, each identified by
    a representative, the centroid
  • Only search the cluster whose centroid is closest
    to query q
  • K-Means clustering algorithm
  • Not the best, in particular in HD spaces
  • But fast!
  • Problem with Clustering
  • Query at the border of a cell does not find all
    the nearest neighbors
  • Simple solution: overlapping clusters
  • Problem: scalability
  • Original data: 7 GB
  • Overlapping data: 50 GB

Imageid   Primary Descriptor   Secondary Descriptor 1   Secondary Descriptor 2   Secondary Descriptor 3
122       ehd                  htd                      scd                      dcd
45233     ehd                  htd                      scd                      dcd
6688      ehd                  htd                      scd                      dcd
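
The cluster-restricted search and its border problem can be illustrated with a small sketch. The centroids and member vectors are made-up one-feature data (only the image ids are borrowed from the table above), and the function names are assumptions.

```python
def l2(a, b):
    """Euclidean (L2) distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_centroid(query, centroids):
    """Id of the cluster whose centroid is closest to the query."""
    return min(centroids, key=lambda cid: l2(query, centroids[cid]))


def cluster_knn(query, centroids, members, k):
    """Search only the cluster whose centroid is closest to the query."""
    cid = nearest_centroid(query, centroids)
    ranked = sorted(members[cid], key=lambda item: l2(query, item[1]))
    return cid, [image_id for image_id, _ in ranked[:k]]


# Hypothetical data: two clusters, query close to the border between them.
centroids = {0: [0.0, 0.0], 1: [10.0, 0.0]}
members = {
    0: [(122, [1.0, 0.0]), (6688, [4.0, 0.0])],
    1: [(45233, [6.0, 0.0])],
}
print(cluster_knn([4.9, 0.0], centroids, members, k=2))
```

For the query at 4.9, image 45233 (distance 1.1) is a closer neighbor than image 122 (distance 3.9), but it is never inspected because it belongs to the other cluster. This is the border effect that overlapping clusters try to repair.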
15
Relevance Feedback Improves the Results
  • Relevance feedback: user input to improve search
    results, iteration by iteration
  • i.e. the user selects good matches
  • We obtain the following information
  1. A new query vector which is a combination of the
    relevant images (Query Vector Movement)
  2. The ratios for the combination of the feature
    types

16
Relevance Feedback: Query Vector Movement
  • Construct the query vector q_n of images selected
    in iteration n
  • Vector component k, feature type f (EHD, SCD, HTD),
    i = 1...M relevant images
  • The final, new query vector is
  • q = 0.75 q_n + 0.25 q_{n-1}
  • i.e. move from the old query vector towards the
    new vector
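
The update step can be sketched as follows. The 0.75 / 0.25 mix is taken from the slide; building q_n as the component-wise mean of the M relevant images is an assumption, since the transcript does not spell out how q_n is constructed.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equally long vectors."""
    m = len(vectors)
    return [sum(column) / m for column in zip(*vectors)]


def move_query(relevant, previous_query, alpha=0.75):
    """q = alpha * q_n + (1 - alpha) * q_{n-1} for one feature type."""
    q_n = mean_vector(relevant)  # assumption: q_n is the mean of the selected images
    return [alpha * new + (1.0 - alpha) * old
            for new, old in zip(q_n, previous_query)]


# Two hypothetical relevant images and the query vector of the previous iteration.
print(move_query([[4.0, 0.0], [8.0, 4.0]], [2.0, 2.0]))
```

The same update would be applied separately for each feature type f, so every descriptor keeps its own query vector.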

17
Relevance Feedback: Weight Adaptation
  • Which feature is most important for the given
    query?
  • The one for which all the relevant images are
    closest
  • Determine the ratios for the combination based on
    the average distance, e.g. for the EHD
  • and set
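
One plausible reading of this step, given here purely as an assumption, is to make each feature's weight inversely proportional to the average distance of the relevant images under that feature and then normalize the weights to sum to one. The function name and the numbers are hypothetical.

```python
def adapt_weights(avg_distance, eps=1e-9):
    """Weights inversely proportional to the average distance per feature.

    Assumption: the slides only state that the feature under which the
    relevant images are closest should count most; the exact formula used
    by the system is not given in this transcript.
    """
    inverse = {f: 1.0 / (d + eps) for f, d in avg_distance.items()}
    total = sum(inverse.values())
    return {f: value / total for f, value in inverse.items()}


# Hypothetical average normalized distances of the relevant images.
print(adapt_weights({"ehd": 0.1, "scd": 0.2, "htd": 0.4}))
```

With these numbers the EHD receives the largest weight, because the images the user marked as relevant lie closest together in EHD space.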

18
Implementation: Software and Hardware
  • Languages: C++ and Perl
  • Inline::CPP to connect layers
  • WWW: Apache and CGI
  • Relational DB: MySQL
  • Operating system: OS X
  • Hardware
  • Dual 2 GHz Apple G5, 2 GB RAM
  • Teran Terabyte Disk Array

19
Part II The Semantics Within
20
Semantics: Combining Text and Visual Features
  • Our dataset is multi-modal
  • Keywords and several visual features
  • Not only valid for WWW data
  • Video: image + speech
  • Bio-imagery: image + microscope setting, cell
    coloring fluid
  • Goal: try to jointly use the different modes
  • Do semantic relations between the modes exist?
  • Learn something about these semantic relations
  • Improve the retrieval precision based on them
  • Challenges in our project
  • Large-scale
  • Noisy and uncontrolled data
  • Only global visual features

21
Identifying a Method to find the Semantics
  • Related work
  • Latent Semantic Indexing (LSI) [Westerveld 2000]
  • Problem: O(N^2 m^3), N = documents + terms, m = dimensionality
    of the concept space
  • Statistical models [Barnard, Forsyth 2001-2004]
  • Problem: on the order of several hours for several thousand
    images
  • Problem: it is a (rather strict, hierarchical)
    model
  • Others
  • Neural networks (SOM etc.)
  • Hidden Markov Models
  • Often Classification
  • We don't know our classes, or there are just too
    many
  • We can't train them either (data too diverse and
    noisy)
  • Most of the methods above only tested on
    relatively small, supervised datasets
  • There is one more option

22
Method: Data Mining for Semantic Clues
  • Mine the data for patterns
  • Find them only where they exist
  • Deduce rules from them
  • Scalable methods available
  • Frequent Itemset Mining and Association Rules
  • Classic applications: market baskets, census data
  • Some works on multimedia data
  • [Zaïane 98]: datacubes with appended keywords
  • [Tešic et al. 03]: perceptual associations
    (texture) within images

23
Frequent Itemsets and Association Rules
  • Itemset I
  • Transaction T
  • Database D
  • Support of Itemset A
  • A is called frequent if
  • Rule
  • Support of a Rule
  • Statistical significance
  • Confidence of a Rule
  • Strength of implication
  • Maximum likelihood estimate that B is true given
    that A is true
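
For reference, these notions are commonly defined as follows in the association rule literature. The notation is an assumption and may differ from the original slide; the strict inequality follows the "support > minsupp" convention used in this talk.

```latex
\begin{align*}
I &= \{i_1, \dots, i_m\} && \text{set of all items} \\
T &\subseteq I && \text{a transaction} \\
D &= \{T_1, \dots, T_n\} && \text{the database of transactions} \\
\operatorname{supp}(A) &= \frac{|\{T \in D : A \subseteq T\}|}{|D|} && \text{support of an itemset } A \subseteq I \\
A \text{ is frequent} &\iff \operatorname{supp}(A) > \mathit{minsupp} \\
A \rightarrow B, &\quad A, B \subseteq I,\ A \cap B = \emptyset && \text{a rule} \\
\operatorname{supp}(A \rightarrow B) &= \operatorname{supp}(A \cup B) && \text{support of a rule} \\
\operatorname{conf}(A \rightarrow B) &= \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)} && \text{confidence of a rule}
\end{align*}
```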

24
Example & Advantages
  • Example: Market Baskets
  • Rule: {Diaper, Milk} → Beer
  • Advantages
  • Human readable
  • Can be edited
  • Fast algorithms available
  • Note: associations are not correlations
  • The same concept, just simpler
  • Associations and correlations: [Brin, Motwani,
    Silverstein 98]

TID   Items
1     Bread, Milk
2     Beer, Diaper, Bread, Eggs
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Bread, Diaper, Milk
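
Support and confidence of the example rule {Diaper, Milk} → Beer can be checked directly on the five transactions above. The sketch uses the usual definitions (fraction of transactions containing an itemset, and the ratio of two supports); the function names are illustrative.

```python
# The five market baskets from the table, keyed by transaction id (TID).
baskets = {
    1: {"Bread", "Milk"},
    2: {"Beer", "Diaper", "Bread", "Eggs"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Bread", "Diaper", "Milk"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) as a ratio of supports."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"Diaper", "Milk", "Beer"}, baskets))       # rule support
print(confidence({"Diaper", "Milk"}, {"Beer"}, baskets))  # rule confidence
```

Three baskets contain both Diaper and Milk, and two of those also contain Beer, so the rule has support 2/5 and confidence 2/3.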
25
Using FIMI to find the itemsets
  • Frequent Itemset Mining (FIMI)
  • Find frequent itemsets with support > minsupp
  • Minimal support minsupp given by an expert
  • First algorithm: Apriori [Agrawal et al. 93]
  • Basic idea: if an itemset is frequent, all its
    subsets must be frequent (monotonicity)
  • k passes over the dataset for itemsets of length k
  • O(knp): n transactions, p items, itemsets of
    length k
  • Today's algorithms
  • Rely on the same basic principle
  • But much faster (main reason: data structures)
  • Usually only 2 database passes
  • Linear runtime
  • State-of-the-art algorithm overview: FIMI'03
  • We used fpmax* [Grahne, Zhu: Nov 03]
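
The monotonicity principle is easiest to see in a compact Apriori sketch. Note that this is the classic level-wise algorithm, shown only for illustration; it is not fpmax*, which relies on much more efficient data structures. The function name and the toy threshold are assumptions.

```python
from itertools import combinations


def apriori(transactions, minsupp):
    """Return {itemset: support} for all itemsets with support > minsupp."""
    n = len(transactions)

    def supp(candidate):
        return sum(1 for t in transactions if candidate <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        s = supp(frozenset([item]))
        if s > minsupp:
            level[frozenset([item])] = s
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: unite frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        level = {}
        for cand in candidates:
            # Prune step (monotonicity): every (k-1)-subset must be frequent.
            if all(frozenset(sub) in prev for sub in combinations(cand, k - 1)):
                s = supp(cand)  # one pass over the data per level
                if s > minsupp:
                    level[cand] = s
        frequent.update(level)
        k += 1
    return frequent


baskets = [
    {"Bread", "Milk"},
    {"Beer", "Diaper", "Bread", "Eggs"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Bread", "Diaper", "Milk"},
]
for itemset, s in sorted(apriori(baskets, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

On the market basket example with minsupp = 0.5, the candidate {Beer, Bread, Diaper} is discarded without counting, because its subset {Beer, Bread} is already known to be infrequent.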

26
Diapers and Beer !!?
  • Application to the domain of Multimedia data
  • Formulate images as transactions
  • Low-level clusters serve as a dimensionality
    reduction for the visual features
  • We find associations of visual features
    (clusters) and keywords
  • From these associations we deduce semantic rules
  • Advantages
  • Comparatively low computational complexity
  • Other data sources can be integrated in the same
    manner (e.g. long-term relevance feedback)
  • Challenges
  • Noisy, uncontrolled data
  • Associations within keywords much stronger than
    associations between keywords and visual features
  • Uneven distribution of cluster sizes (K-Means
    problem)
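
Formulating an image as a transaction could look like the sketch below: the stemmed keywords and the ids of the low-level clusters the image falls into become items of one set. The `kw:` and descriptor prefixes, the function name and the example values are hypothetical conventions chosen for illustration.

```python
def image_to_transaction(keywords, cluster_ids):
    """Merge keywords and per-descriptor cluster labels into one itemset."""
    items = {f"kw:{word}" for word in keywords}
    # The cluster id replaces the high-dimensional feature vector.
    items |= {f"{descriptor}:{cid}" for descriptor, cid in cluster_ids.items()}
    return frozenset(items)


# Hypothetical image with three keywords, assigned to EHD cluster 14 and SCD cluster 12.
transaction = image_to_transaction(["shoe", "sport", "red"], {"ehd": 14, "scd": 12})
print(sorted(transaction))
```

Once every image is encoded this way, any frequent itemset miner can be run on the resulting database without knowing that some items are words and others are visual clusters.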

27
Characteristics of the Itemsets and Rules
  • There are associations
  • Within text: shoe → walk
  • Within visual clusters: EHD 14 → SCD 12
  • Between text and visual clusters: shoe → EHD
    14
  • Measure for interestingness or choice of rules
    from the frequent itemsets (FI)
  • Confidence?
  • Statistical criteria?
  • Background knowledge? (Example: pregnant ->
    woman, 100% confidence)
  • Our background knowledge: rules that connect
    keywords and low-level features are more
    interesting
  • Since this is known, the mining can be adapted
    and made even faster

28
Exploiting the Itemsets and Rules
29
Selecting Interesting Low-Level Clusters based on
Rules
  • Clusters were introduced to partition the visual
    feature vector data and search only on certain
    clusters
  • Problem: we miss certain nearest neighbors if
    images for a concept are spread over several
    clusters
  • Unsatisfactory solution: overlapping clusters
  • But association rules might find and solve this
    situation
  • Clusters are re-united
  • If the number of images for the concept in both clusters
    is > minsupp
  • Example: shirt -> {ehd249, ehd310} reunites
    these clusters for the initial keyword query
    "shirt"!
  • This is scalable, unlike overlapping clusters
  • Another benefit is that more images labeled with
    the original keyword are injected into the
    results of K-NN search
  • Currently: one keyword as high-level semantic
    concept
  • Future: find high-level semantic concepts by
    mining associations within text first
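
The reuniting step can be sketched for the simplest case of single clusters per keyword. The record layout, the function name and the relative minsupp threshold are assumptions; the cluster ids 249 and 310 come from the example above, while cluster 17 and the image counts are made up.

```python
from collections import Counter


def clusters_for_keyword(keyword, images, minsupp):
    """EHD clusters that hold more than minsupp of the keyword's images."""
    matching = [img for img in images if keyword in img["keywords"]]
    if not matching:
        return set()
    counts = Counter(img["ehd_cluster"] for img in matching)
    # Support is measured relative to the transactions of this keyword only.
    return {cid for cid, n in counts.items() if n / len(matching) > minsupp}


# Hypothetical database: six "shirt" images spread over three EHD clusters.
images = (
    [{"keywords": {"shirt"}, "ehd_cluster": 249}] * 3
    + [{"keywords": {"shirt"}, "ehd_cluster": 310}] * 2
    + [{"keywords": {"shirt"}, "ehd_cluster": 17}]
    + [{"keywords": {"shoe"}, "ehd_cluster": 17}] * 4
)
print(clusters_for_keyword("shirt", images, minsupp=0.25))
```

For the keyword "shirt", clusters 249 and 310 both exceed the threshold and would be searched together, while cluster 17 contributes too few shirt images to be included.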

30
The Visual Link
  • Another contribution, NOT related to Frequent
    Itemset Mining and Association Rules
  • Since the search concept suggests visual nearest
    neighbor search with relevance feedback after the
    initial keyword search
  • It would be nice to have a diverse selection of
    images for a given keyword on the first page of
    results
  • Images sorted not only by keyword ranking, but
    also based on visual feature information
  • Basic idea: for a given keyword query, build
    groups of images that are visually close.
  • Larger groups are more important
  • Show only one representative per group

31
The Visual Link: A Graph-Based Approach
  • Let I(Q) be a set of images matching a keyword
    query Q
  • Define a graph G(V,E)
  • i.e. images are visually linked if the distance
    between them is lower than a given threshold
  • Do a connected component analysis to find
    connected components C
  • For each component C find the best
    representative rC
  • Re-rank results based on representatives rC
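
The steps above can be sketched with a small union-find based connected component analysis. The threshold, the toy vectors and the function names are assumptions, and picking the first member of each group (that is, the best keyword-ranked one) as representative is only one possible reading of "best representative".

```python
def visual_link_groups(vectors, threshold):
    """Connected components of the graph linking images closer than threshold."""
    ids = list(vectors)  # assumed to be in keyword-ranking order
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # An edge (visual link) exists if the distance is below the threshold.
    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if dist(vectors[a], vectors[b]) < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Larger groups are more important, so they come first.
    return sorted(groups.values(), key=len, reverse=True)


def representatives(groups):
    """One image per group; here simply its first member (an assumption)."""
    return [group[0] for group in groups]


# Hypothetical 2-dimensional feature vectors of five images matching a query.
vectors = {1: [0.0, 0.0], 2: [0.5, 0.0], 3: [1.0, 0.0],
           4: [10.0, 10.0], 5: [10.2, 10.0]}
groups = visual_link_groups(vectors, threshold=0.6)
print(groups, representatives(groups))
```

Images 1 and 3 are not directly linked, but both are linked to image 2, so all three end up in one component. The pairwise distance loop is quadratic in the number of matching images, which motivates the approximation on the next slides.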

32
The Visual Link: An Example
33
The Visual Link: An Approximation
  • Problem: distance calculations for the graph take too
    long
  • Clusters cannot be used
  • Loading individual vectors takes a lot of time
  • Solution
  • Approximate distance
  • Idea: if images are in the same cluster and in the same
    distance range to the centroid → the probability that
    they are close is high
  • New definition for "visually linked"
  • If in the same cluster and the same range of relative
    distance to its centroid
  • Can be encoded in relational DB! And comes at
    nearly no extra cost in creation

Imageid   Clusterid   2ndClusterid   Reldist
1         221         122            0.6
2         342         345            0.8
3         223         42             0.2
4         12          126            0.4
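
With such a table, the approximate visual link reduces to grouping rows by cluster id and distance range. In the sketch below the first four rows are taken from the table, the fifth row is a hypothetical addition to show a link, and the number of distance ranges is an assumed parameter.

```python
from collections import defaultdict

# (imageid, clusterid, reldist); the last row is hypothetical.
rows = [
    (1, 221, 0.6),
    (2, 342, 0.8),
    (3, 223, 0.2),
    (4, 12, 0.4),
    (5, 221, 0.65),
]


def approximate_groups(rows, num_ranges=4):
    """Group images by (cluster id, range of relative distance to centroid)."""
    groups = defaultdict(list)
    for image_id, cluster_id, reldist in rows:
        # Quantize the relative distance into num_ranges equal-width ranges.
        bucket = min(int(reldist * num_ranges), num_ranges - 1)
        groups[(cluster_id, bucket)].append(image_id)
    return dict(groups)


print(approximate_groups(rows))
```

Images 1 and 5 share cluster 221 and the same distance range, so they count as visually linked without loading a single feature vector. In a relational database the same grouping could be expressed as a GROUP BY over the cluster id and the quantized distance.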
34
Discussion & Demo
35
Discussion: Precision
  • Measuring the quality of such a large-scale
    system is difficult
  • Precision/recall measure not possible: ground
    truth not known
  • C: correct results
  • D: desired results
  • A: actual results
  • We measure the precision based on user questioning
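
For reference, with C, D and A read as sets, precision and recall are usually defined as below. This is the textbook definition, given here as an assumption about what the slide showed. Recall needs the full set D of desired results, which is exactly the unknown ground truth, whereas precision can be estimated by asking users to judge the returned set A.

```latex
\[
C = A \cap D, \qquad
\text{precision} = \frac{|C|}{|A|}, \qquad
\text{recall} = \frac{|C|}{|D|}
\]
```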

36
Before we continue: some numbers
  • Number of images: 3 006 660
  • Size of image data: 111 GB
  • Feature extraction: 15 days (dual 2 GHz CPU, 2 GB
    RAM)
  • Number of distinct keywords: 680 256
  • Size of inverted keyword index table: 50 260 345
    lines
  • MySQL database size: 23 GB

37
And now the moment you've all been waiting for
  • The Demo of Cortina

38
Conclusions
  • A system with over 3 Million items was
    implemented
  • Probably the largest CBIR System to date?
  • A retrieval concept was introduced
  • a keyword query followed by relevance feedback
    and visual nearest neighbour search
  • Superior to existing retrieval concepts (query by
    keyword or query by example)
  • Data mining to explore and exploit semantics in
    large-scale systems was introduced

39
Questions
40
Outlook
  • Many extensions and improvements possible
  • Segmentation
  • Or maybe rather some simple tiling
  • Indexing
  • K-Means should be replaced
  • Suggestion: VA-File based approach
    [Manjunath, Tesic 03]
  • Association Rule Mining
  • Multilevel Approach
  • First keywords for high level semantic concepts
  • Then visual features

41
Thanks
  • Ullrich Moenich and Lars Thiele

42
Which Rules are of Interest?
  • There are associations
  • Within text: shoe → walk
  • Within visual clusters: EHD 14 → SCD 12
  • Between text and visual clusters: shoe → EHD
    14, SCD 12
  • There are long and short rules
  • Short rules have higher support by the nature of
    the problem
  • Long rules contain more (precise) information
    about the semantics
  • Measure for interestingness or choice of rules
    from the frequent itemsets (FI)
  • Confidence?
  • Statistical criteria?
  • Background knowledge? (Example: pregnant -> woman)

43
Characteristics and Challenges
  • Chosen criteria
  • Mainly interested in rules keywords → visual
    feature clusters (our background knowledge)
  • Support, confidence
  • Mine long and short rules
  • Restriction of the problem: mine for frequent
    itemsets per keyword
  • i.e. all images = transactions for a given keyword
  • This means
  • We avoid being distracted by associations within
    keywords
  • The method is made even more scalable
  • The keyword as a placeholder for a semantic
    concept
  • A keyword does not always stand for a single
    semantic concept
  • Proposal for future versions: multi-level
    approach
  • First keywords → keywords rules to identify
    real semantic concepts
  • Then itemset mining per identified concept

44
Characteristics of the Itemsets and Rules -
Overall
45
Why keyword filtering of the results does not work
46
Proposal: Semantic Clusters
  • Ultimate goal: search some kind of semantic
    clusters instead of visual feature clusters
  • Proposal based on the approach from [Ester et al.
    2002, 2003]
  • Clustering based on frequent itemsets, originally
    for text
  • Clustering criterion: minimize overlap