Experiments on Using Semantic Distances Between Words in Image Caption Retrieval

1
Experiments on Using Semantic Distances Between Words in Image Caption Retrieval
Alan F. Smeaton and Ian Quigley
School of Computer Applications, Dublin City University
  • Presenter: Cosmin Adrian Bejan

2
IR implementation - traditional approach
  • Represent
      • a user query as a bag of query terms
      • a document as a bag of index terms
  • Compute
      • a degree of similarity between a document and a
        query, based on the overlap, i.e. the number of
        query terms the two have in common.
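
The traditional approach above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the whitespace tokenizer, the function names, and the example strings are all assumptions made for the sketch.

```python
from collections import Counter


def bag_of_terms(text):
    """Lowercase and split on whitespace to get a bag (multiset) of terms."""
    return Counter(text.lower().split())


def overlap_similarity(query, document):
    """Count the query terms that also occur in the document (with multiplicity)."""
    q, d = bag_of_terms(query), bag_of_terms(document)
    # Counter intersection keeps the minimum count of each shared term.
    return sum((q & d).values())


# Hypothetical query/document pairs.
score_match = overlap_similarity("boat on river", "a small boat on the river bank")
score_miss = overlap_similarity("belly ache", "child with stomach pain")
print(score_match, score_miss)
```

The second pair shares no strings, so it scores zero even though both texts describe the same thing, which is the weakness the following slide discusses.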

3
Problems in IR implementation
  • Caused by
      • the same word describing different things (bar,
        bank)
      • different words describing the same thing (stomach
        pain, belly ache)
      • natural language being fraught with ambiguities at
        all levels, leading to multiple interpretations of
        words, phrases, etc.
  • Common way to address these problems: query
    expansion
  • The approach in this paper: when computing the
    degree of similarity between query and document,
    instead of basing similarity on the terms the two
    have in common, incorporate a quantitative measure
    of the semantic similarity between index terms
    into the measure.
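
To make the contrast concrete, the sketch below compares an exact-overlap count with a match that uses word-word similarity. It is only an illustration of the general idea: the similarity values in TOY_SIM are invented, and the aggregation (average of each query term's best match) is an assumption, not one of the paper's retrieval runs.

```python
# Hypothetical word-word similarity values in [0, 1]; illustrative only.
TOY_SIM = {
    frozenset(("belly", "stomach")): 0.9,
    frozenset(("ache", "pain")): 0.8,
}


def sim(t1, t2):
    """Similarity of two terms: 1 for identical strings, else a table lookup."""
    if t1 == t2:
        return 1.0
    return TOY_SIM.get(frozenset((t1, t2)), 0.0)


def soft_match(query_terms, doc_terms):
    """Average, over query terms, of the best similarity to any document term."""
    if not query_terms or not doc_terms:
        return 0.0
    best = [max(sim(q, d) for d in doc_terms) for q in query_terms]
    return sum(best) / len(best)


query = ["belly", "ache"]
doc = ["stomach", "pain"]

exact = len(set(query) & set(doc))  # terms in common: none
soft = soft_match(query, doc)       # roughly 0.85 with the toy values above
print(exact, soft)
```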

4
Measuring semantic distance between words
  • Knowledge base: hierarchical concept graphs
    (HCGs) automatically constructed from WordNet.
  • The similarity of two classes (synsets) is defined
    in terms of (formula not preserved in this
    transcript)
      • the information content of the class ci
      • P(ci), the class probability of class ci
  • Computing the similarity between two word senses
    (nouns) can only be done if both are in the same
    HCG; otherwise they are regarded as being
    dissimilar.
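
The slide's formula is missing, so the sketch below rests on an assumption: that the measure has the information-content form associated with Resnik, where the information content of a class is -log P(ci) and the similarity of two classes is the largest information content among the classes that subsume both. The toy hierarchy, the class names, and the probabilities are all made up for illustration.

```python
import math

# Hypothetical toy hierarchies (child -> parent); two separate HCG roots.
PARENT = {
    "dog": "animal", "cat": "animal", "animal": "entity",
    "car": "artifact", "artifact": "entity", "entity": None,
    "square": "shape", "shape": None,
}

# Made-up class probabilities P(c); a class is at least as probable as its children.
PROB = {
    "dog": 0.05, "cat": 0.05, "animal": 0.2,
    "car": 0.1, "artifact": 0.3, "entity": 0.8,
    "square": 0.05, "shape": 0.2,
}


def information_content(c):
    """Information content of class c, assumed here to be -log P(c)."""
    return -math.log(PROB[c])


def subsumers(c):
    """The class itself plus all of its ancestors up to the HCG root."""
    out = set()
    while c is not None:
        out.add(c)
        c = PARENT[c]
    return out


def class_similarity(c1, c2):
    """Highest information content among common subsumers; 0 across HCGs."""
    common = subsumers(c1) & subsumers(c2)
    if not common:
        return 0.0  # different HCGs: regarded as dissimilar
    return max(information_content(c) for c in common)


print(class_similarity("dog", "cat"))     # shared subsumer "animal"
print(class_similarity("dog", "car"))     # only the root "entity" is shared
print(class_similarity("dog", "square"))  # different HCGs
```

Under this assumed form, two senses that meet low in the hierarchy (at a rare, specific class) score higher than two that meet only at the root.
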
5
Experimental Set-up
  • Hand-captioned 2714 images
  • Manually disambiguated polysemous words in the
    captions
  • Manually built a collection of 60 queries
  • Computed various query-caption similarity measures
    using word-word semantic distances.

6
Retrieval Strategies 1-2
  • Notation
      • query Q = q1, q2, ..., qm
      • caption C = c1, c2, ..., cn, where a qi or a cj is
        the original term, used only as a representation
        for its synset.
      • Sim(ti, tj) is the similarity between the
        sense-disambiguated forms of two terms ti and tj.
  • Run1: a straightforward, statistically-based tf-IDF
    match between the word forms or strings, i.e. not
    using word-sense-disambiguated captions or queries.
  • Run2: the terms in the caption and in the query are
    both expanded to include the other word strings
    from their sense-disambiguated synsets (query
    expansion).
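
The difference between the first two runs can be sketched as follows. This is a hedged illustration only: the captions, the SYNSET table, and the particular tf-IDF weighting (term frequency times log(N/df)) are assumptions, since the slide's own formulas are not available.

```python
import math
from collections import Counter

# Hypothetical caption collection.
CAPTIONS = {
    "c1": "child with stomach pain",
    "c2": "boat on the river",
    "c3": "child in a boat",
}

# Hypothetical synsets: each term maps to the word strings of its
# (already disambiguated) sense.
SYNSET = {
    "belly": {"belly", "stomach", "abdomen"},
    "stomach": {"stomach", "belly", "abdomen"},
    "ache": {"ache", "pain"},
    "pain": {"pain", "ache"},
}


def expand(terms):
    """Replace each term by all word strings of its synset (Run2-style)."""
    out = []
    for t in terms:
        out.extend(sorted(SYNSET.get(t, {t})))
    return out


def idf(term, docs):
    """Inverse document frequency; 0 for terms that occur in no document."""
    df = sum(1 for d in docs.values() if term in d)
    return math.log(len(docs) / df) if df else 0.0


def tfidf_score(query_terms, doc_terms, docs):
    """Sum of tf * idf over the distinct query terms."""
    tf = Counter(doc_terms)
    return sum(tf[t] * idf(t, docs) for t in set(query_terms))


def rank(query, expand_terms=False):
    """Score every caption against the query, with or without expansion."""
    prep = expand if expand_terms else list
    docs = {name: prep(text.split()) for name, text in CAPTIONS.items()}
    q = prep(query.split())
    return {name: tfidf_score(q, terms, docs) for name, terms in docs.items()}


run1 = rank("belly ache")                     # string match only
run2 = rank("belly ache", expand_terms=True)  # both sides expanded
print(run1)
print(run2)
```

With plain string matching the query finds nothing, while expansion on both sides lets it reach the "stomach pain" caption.
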
7
Retrieval Strategies 3-5
  • Run3
  • Run4
  • Run5

Different threshold values are considered for each
HCG, given that there is a concentration of usage of
concepts from some HCGs (like entity) and hardly any
use of others (like shape).
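
One way to picture per-HCG thresholds is as a cut-off below which a word-word similarity is treated as zero. That reading is an assumption, as are the threshold values and the names HCG_THRESHOLD, DEFAULT_THRESHOLD and thresholded_sim; the slide does not say how the thresholds enter the run formulas.

```python
# Hypothetical per-HCG cut-offs; unlisted HCGs use DEFAULT_THRESHOLD.
HCG_THRESHOLD = {"entity": 1.5, "shape": 0.5}
DEFAULT_THRESHOLD = 1.0


def thresholded_sim(raw_sim, hcg):
    """Keep a word-word similarity only if it reaches the cut-off of its HCG."""
    cutoff = HCG_THRESHOLD.get(hcg, DEFAULT_THRESHOLD)
    return raw_sim if raw_sim >= cutoff else 0.0


# The same raw similarity is dropped in one HCG and kept in another.
print(thresholded_sim(1.2, "entity"))
print(thresholded_sim(1.2, "shape"))
```

Keeping the cut-offs in a table keyed by HCG makes it easy to tune a heavily used graph separately from a rarely used one.
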
8
Retrieval Strategies 6-8
  • Run6
  • Run7
  • Run8

9
Experimental Results