Video Indexing and Retrieval using an MPEG7 Based Inference Network

1
Video Indexing and Retrieval using an MPEG7 Based
Inference Network
  • Andrew Graves (andrew@dcs.qmul.ac.uk)

2
Contents
  • Introduction
  • Aims
  • Background, Positioning
  • MPEG7 Collection
  • The Model
  • Experiments
  • Concluding Remarks

3
Introduction
4
Project Aims
  • Metadata based retrieval using MPEG7
  • Assume we have the metadata
  • Build a modular retrieval system: Video analysis
    -> MPEG7 -> Video retrieval
  • Exploit MPEG7 structure, context and concepts

5
Background
  • Information Retrieval
  • IR models: the Inference Network model
  • Text retrieval: indexing, retrieval, term
    statistics
  • Structured Information Retrieval at QM and
    Dortmund
  • Multimedia
  • MPEG 1/2/4/7
  • MPEG7, the Multimedia Content Description
    Interface
  • Video Indexing and Retrieval
  • Annotation-, content- and metadata-based
    approaches
  • Assume we have the metadata: shot/scene
    detection
  • Feature extraction / acquisition of semantics

6
MPEG7
  • Description Definition Language (DDL),
    Descriptor (D) and Description Schemes (DS)
  • Just another XML format

7
Inference Network Model
  • Probabilistic Framework for IR that uses a
    Bayesian Network (so based on proven statistical
    theory)
  • Complete Network = Document Network + Query
    Network, joined by attachment and then evaluated
  • Complete Network used to estimate the
    probability of relevance for each Document Node
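
As a minimal sketch of that estimation (not the author's implementation): toy conditional probabilities stand in for the network's link matrices, each document node is "turned on" in turn, and belief is propagated through the concept nodes to a single query node.

# P(concept | document): toy numbers standing in for the link matrices
# of a real Bayesian network.
p_concept_given_doc = {
    "doc1": {"basil": 0.8, "hotel": 0.3},
    "doc2": {"basil": 0.1, "hotel": 0.9},
}

def p_query(doc, query_terms):
    """Belief of a 'sum' query node: the mean of its parents' beliefs."""
    beliefs = [p_concept_given_doc[doc].get(t, 0.0) for t in query_terms]
    return sum(beliefs) / len(beliefs)

# Rank documents by their probability of relevance to the query.
ranking = sorted(p_concept_given_doc,
                 key=lambda d: p_query(d, ["basil", "hotel"]),
                 reverse=True)
print(ranking)  # ['doc1', 'doc2']: 0.55 vs 0.50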

8
Positioning
Model = Inference Network + MPEG7
  • Inference Network
  • Allows a combination of evidence
  • Allows hierarchical document nodes (structure)
  • MPEG7
  • Structural, conceptual and contextual info
  • So, we process DSs and Ds to form the IN

9
In Other Words...
  • Build a Document Network that represents all of
    the Ds (concepts) and DSs (structure)
  • Attach a Query Network and evaluate

10
MPEG7 Collection
11
Collection
  • 3 MPEG1 files generated
  • 2 comedies (Fawlty Towers)
  • 1 film (A Room With A View)
  • MPEG7 files then generated
  • Automatic shot detection
  • Shots manually grouped to form scenes

12
Annotations
  • Abstract taken from the box
  • StructuredAnnotation for each scene to specify
    exactly the participants and location
  • FreeTextAnnotation to describe action
  • FreeTextAnnotation with speech extracts

13
MPEG7 Excerpt 1
<AudioVisual id="Communication Problems">
  <MediaInformation/>
  <MediaProfile/>
  <CreationInformation>
    <Creation>
      <Title>Communication Problems</Title>
      <Abstract>
        <FreeTextAnnotation>
          It's not a wise man who entrusts his furtive winnings on the
          horses to a geriatric Major, but Basil was never known for that
          quality. Parting with those ill-gotten gains was Basil's first
          mistake; his second was to tangle with the intermittently deaf
          Mrs Richards.
        </FreeTextAnnotation>
      </Abstract>
      <Creator>BBC</Creator>
    </Creation>
    <Classification>
      <Genre>Comedy</Genre>
      <Language>English</Language>
    </Classification>
  </CreationInformation>

Excerpt shows concepts pertaining to the whole
video
14
MPEG7 Excerpt 2
<SegmentDecomposition decompositionType="temporal" gap="true"
                      id="TableOfContent" overlap="false">
  <Segment id="A satisfied customer" xsi:type="AudioVisualSegmentType">
    <TextAnnotation>
      <FreeTextAnnotation>Basil receives a tip on a horse from a
        customer. Sybil warns Basil not to bet. Basil says Sybil is a
        dragon to Polly.</FreeTextAnnotation>
      <StructuredAnnotation>
        <Who>Basil,Sybil,Major,Polly</Who>
        <Where>Lobby</Where>
      </StructuredAnnotation>
    </TextAnnotation>
    <SegmentDecomposition decompositionType="temporal" gap="true"
                          overlap="false">
      <Segment id="Shot_1" xsi:type="AudioVisualSegmentType">
        <TextAnnotation><FreeTextAnnotation>
          Glad you enjoyed it. Polly, will you get Mr Firkins' bill
          please.
        </FreeTextAnnotation></TextAnnotation>
        <MediaTime><MediaIncrDuration
          timeUnit="PT1N25F">86</MediaIncrDuration></MediaTime>
      </Segment>
    </SegmentDecomposition>
    <MediaTime>

Excerpt shows the video decomposition and the distribution
of evidence (i.e. concepts) throughout the MPEG7 file
15
Model
16
Model Overview
  • Document Network (built during indexing)
  • Static, contains information about the collection
  • Query Network (built during retrieval)
  • Query Language based upon INQUERY
  • Statistical operators (and approximations of
    Boolean)
  • Attachment process
  • Builds the Complete Network
  • Create DN->QN links where concepts are the same
  • Evaluation process
  • Calculate probability of relevance for each
    element

17
Document Network
Document Nodes / Context Nodes / Concept Nodes
  • Document Node layer. Created from MPEG7
    structural aspects
  • Context Node layer. Provides contextual
    information
  • Concept Node layer. Contains all the concepts
    present in the collection
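
A sketch of these three layers; the class names and fields below are hypothetical, since the slides do not specify the node types (Python is used throughout for illustration).

from dataclasses import dataclass, field

@dataclass
class ConceptNode:        # concept layer: one node per indexed term
    term: str

@dataclass
class ContextNode:        # context layer: e.g. CreationInformation
    name: str
    concepts: list = field(default_factory=list)

@dataclass
class DocumentNode:       # document layer: Video / Scene / Shot hierarchy
    node_id: str
    children: list = field(default_factory=list)  # nested DocumentNodes
    contexts: list = field(default_factory=list)  # attached ContextNodes

# A fragment mirroring the collection: a video containing one scene.
scene = DocumentNode("Scene1", contexts=[
    ContextNode("TextAnnotation", concepts=[ConceptNode("basil")])])
video = DocumentNode("Video1", children=[scene], contexts=[
    ContextNode("CreationInformation", concepts=[ConceptNode("bbc")])])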

18
Query Network 1
  • Query text is parsed to produce Query tree
  • Inverted DAG with a single final node
  • Terms + Operators
  • Boolean operators: and, or, not
  • Statistical operators: sum, wsum, max
  • Constraints: constraint, tree

and("BBC" "Basil")   constraint(Creation,
"BBC")   and(constraint(Creation, "BBC")
"Basil")
19
Query Network 2
  • Not just simple constraints: complex constraints
  • Combining constraint and tree

and(tree(CreationInformation,
         constraint(Classification/Genre, "comedy"),
         constraint(Creation, "BBC")),
    basil)
  • Trying to exploit the contextual information

20
Attachment
  • Attachment creates DN->QN links (at concept
    level)
  • Find candidate links then consider constraints
  • Strength of link can be determined by closeness
    of match
  • Perform Tree Matching to find Edit Distance
    (ED)
  • Use ED by a) testing against a threshold, or b)
    reducing the link weight
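
A sketch of both uses of ED, under assumptions: constraints are compared as '/'-separated paths with a component-level Levenshtein distance, and the ED-to-weight mapping 1/(1+ED) is invented here for illustration.

def edit_distance(a, b):
    """Levenshtein distance between two sequences of path components."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (x != y))  # substitution
    return dp[len(b)]

def attach(constraint_path, context_path, threshold=1):
    ed = edit_distance(constraint_path.split("/"), context_path.split("/"))
    tmt = ed <= threshold    # a) TMT: attach only if ED is under threshold
    tmw = 1.0 / (1.0 + ed)   # b) TMW: use ED to reduce the link weight
    return tmt, tmw

print(attach("Classification/Genre",
             "CreationInformation/Classification/Genre"))
# (True, 0.5): one inserted component, so ED = 1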

21
Evaluation
  • After attachment we have formed the Complete
    Network
  • This is evaluated for every Document Node and
    resultant probabilities are used for ranking
  • All nodes required are evaluated using 1) the
    values of parent nodes and 2) conditional probabilities
  • Nodes may inherit parental contexts (Link
    Inheritance)
  • The parents outside the constraint may be ignored
    (Path Cropping)
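
A sketch of how a query tree could be evaluated over concept beliefs. The slides name the operators but not their formulas, so the belief functions below are the usual INQUERY ones (product for and, complement product for or, mean for sum), not necessarily the author's.

from functools import reduce

def evaluate(node, belief):
    op, args = node
    if op == "term":
        return belief.get(args, 0.0)    # P(concept | document node)
    vals = [evaluate(a, belief) for a in args]
    if op == "and":                     # product of parent beliefs
        return reduce(lambda x, y: x * y, vals, 1.0)
    if op == "or":                      # 1 - prod(1 - p)
        return 1.0 - reduce(lambda x, y: x * (1.0 - y), vals, 1.0)
    if op == "not":
        return 1.0 - vals[0]
    if op == "sum":                     # wsum would add per-child weights
        return sum(vals) / len(vals)
    if op == "max":
        return max(vals)
    raise ValueError(f"unknown operator {op}")

query = ("and", [("term", "bbc"),
                 ("or", [("term", "basil"), ("term", "sybil")])])
print(evaluate(query, {"bbc": 0.9, "basil": 0.5}))  # 0.9 * 0.5 = 0.45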

22
Extraction
  • Structural Extraction. About the hierarchical
    makeup.
  • Attribute Extraction. Data about the structural
    elements.
  • Concept Extraction. Obtain the concepts that
    appear.
  • Text preprocessing
  • Luhn's analysis, term statistics

<context attribute="value"> free text
  <subcontext> more free text </subcontext>
</context>

<TextAnnotation>
  <FreeTextAnnotation>
    Basil attempts to fix the car
  </FreeTextAnnotation>
  <StructuredAnnotation>
    <Who>Basil</Who>
    <WhatObject>Car</WhatObject>
    <WhatAction>Fix</WhatAction>
    <Where>Carpark</Where>
  </StructuredAnnotation>
</TextAnnotation>
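
A sketch of concept extraction from annotations like the second snippet above; the crude Luhn-style step (lowercasing, stopword removal) stands in for the real text preprocessing, which the slides do not detail.

import re
import xml.etree.ElementTree as ET

STOPWORDS = {"the", "to", "a", "an", "of", "and"}

def extract_concepts(xml_text):
    root = ET.fromstring(xml_text)
    concepts = []
    for elem in root.iter():
        if elem.text and elem.text.strip():
            for term in re.findall(r"[a-z]+", elem.text.lower()):
                if term not in STOPWORDS:
                    concepts.append((elem.tag, term))  # concept + context
    return concepts

xml = ("<TextAnnotation>"
       "<FreeTextAnnotation>Basil attempts to fix the car"
       "</FreeTextAnnotation>"
       "<StructuredAnnotation><Who>Basil</Who><Where>Carpark</Where>"
       "</StructuredAnnotation></TextAnnotation>")
print(extract_concepts(xml))
# [('FreeTextAnnotation', 'basil'), ('FreeTextAnnotation', 'attempts'),
#  ('FreeTextAnnotation', 'fix'), ('FreeTextAnnotation', 'car'),
#  ('Who', 'basil'), ('Where', 'carpark')]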
23
Probability Estimation
  • Probability document is relevant to the query
  • Conditional probabilities between the nodes
  • Context->Context (e.g. Video->Scene)
  • Context->Concept
  • We use
  • Term Statistics
  • Number of Siblings
  • Duration ratios
  • Alternatives
  • Size of context
  • Number of concepts
  • Number of occurrences
  • Perceived goodness
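
The exact estimators are not given, but the weights in the Experiment 1 network later in this deck are consistent with a belief of 0.5 plus half the child's share of its parent (the duration ratio where durations exist, 1/siblings otherwise). A sketch of that inferred rule:

def context_probability(child_duration=None, parent_duration=None,
                        siblings=1):
    """Inferred Context->Context estimator; treat as a guess."""
    if child_duration is not None and parent_duration:
        ratio = child_duration / parent_duration   # duration ratio
    else:
        ratio = 1.0 / siblings                     # fall back to siblings
    return 0.5 + 0.5 * ratio

print(context_probability(300, 400))    # 0.875, cf. Shot2 in Experiment 1
print(context_probability(siblings=2))  # 0.75, cf. CreationInformation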

24
Experiments
25
Experiment Overview
  • Software written in C on NT
  • Not using INQUERY
  • 1. Basic. Does the model work at all?
  • 2. Real Data. Does the model work with our real
    metadata collection?
  • 3. Metrics. What are the precision/recall metrics?

26
Remember...
  • Link Inheritance (LI)
  • Link Degradation (LID)
  • Tree Matching (TM)
  • Threshold (TMT) The attachment is made only if
    the constraint is met, and if the Edit Distance
    is below the specified threshold.
  • Weighted (TMW) The attachment is made if the
    constraint is met. The Edit Distance is used as
    a weight upon the DN->QN link.
  • Path Cropping (PC)

27
Representations
We use XML for DN and QN representation
<Root>
  <Operator Type="WSUM">
    <Concept weight="0.2">breakfast</Concept>
    <Concept weight="0.8">view</Concept>
  </Operator>
</Root>

<Root>
  <Operator Type="AND">
    <Concept>
      <Text>BBC</Text>
      <Constraint>Creation</Constraint>
    </Concept>
    <Concept>Basil</Concept>
  </Operator>
</Root>

28
Experiment 1
<Root>
  <Video id='Video1' Duration='1000' weight='1.000000'>
    <CreationInformation weight='0.750000'>
      <Creation weight='1.000000'>
        <Concept weight='0.8' cid='1'>banana</Concept>
      </Creation>
    </CreationInformation>
    <MediaInformation weight='0.750000'/>
    <Scene id='Scene1' KeyFrame='none.jpg' Duration='400'
           weight='0.700000'>
      <Shot id='Shot1' KeyFrame='none.jpg' Duration='100'
            weight='0.625000'/>
      <Shot id='Shot2' KeyFrame='none.jpg' Duration='300'
            weight='0.875000'>
        <Concept weight='0.7' cid='1'>banana</Concept>
      </Shot>
    </Scene>
    <Scene id='Scene2' Duration='600' weight='0.800000'/>
  </Video>
  <Video id='Video2' weight='1.000000'/>
  <Video id='Video3' weight='1.000000'/>
</Root>

Remember: probabilities come from the duration ratio and
the number of context siblings
Two influences: 1. Creation->banana  2. Shot2->banana
29
Experiment 1
Run 1: no parameters
  001 0.364000 Video Video1
  002 0.245000 Shot  Shot2
  003 0.227500 Scene Scene1
  004 0.105000 Shot  Shot1
  004 0.105000 Scene Scene2
  004 0.105000 Video Video2
  004 0.105000 Video Video3

Run 7: LI + LID + TMW
  001 0.288750 Shot  Shot2
  002 0.280313 Scene Scene1
  003 0.273000 Video Video1
  004 0.129375 Scene Scene2
  005 0.123750 Shot  Shot1
  006 0.078750 Video Video2
  006 0.078750 Video Video3
  • Model works
  • Different levels of document granularity
    (Video/Scene/Shot) retrieved in same list
  • Parameters work but unclear if they help

30
Experiment 2
or("Mushroom" "Mushrooms")
or(chips" and("salad" "cream"))
constraint(Classification, "Comedy")
  • Model worked with real collection to produce real
    results
  • Results were as expected given knowledge of
    material

31
Experiment 3
  • Recall/Precision metrics calculated
  • Rank in results list (not result rank) used for
    analysis
  • Ten best Video/Scene/Shot chosen by author
  • Ranking seems good
  • 6/10 required in top 10
  • All 10 within top 93 (out of 362 in total)
  • Figures suggest that the model is working
    effectively, although this is not conclusive
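
A sketch of the metric computation, assuming the standard cutoff definitions: precision at k is the fraction of the top k that is relevant, and recall at k is the fraction of all relevant items retrieved in the top k (so 6 of 10 relevant items in the top 10 gives precision of 0.6).

def precision_recall_at_k(ranked_ids, relevant_ids, k):
    hits = sum(1 for r in ranked_ids[:k] if r in relevant_ids)
    return hits / k, hits / len(relevant_ids)

ranked = ["Shot2", "Scene1", "Video1", "Scene2", "Shot1"]
relevant = {"Shot2", "Video1"}
print(precision_recall_at_k(ranked, relevant, k=3))  # (0.666..., 1.0)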

32
Discussion
  • Size of collection too small to produce
    significant results. No known MPEG7 collections.
  • No independent queries with relevance assessments
    exist (obviously)
  • Software efficiency crucial - simplifying
    assumptions can be made to ensure that the IN is
    computationally viable. Size of computation is
    not proportional to size of collection.

33
Concluding Remarks
34
Concluding Remarks
  • MPEG7 was found to contain useful tools
  • Model for VIR developed
  • Based on the Inference Network, built from MPEG7
    files
  • Indexing captures structure, context and concepts
  • Retrieval done using Terms, Operators and
    Constraints
  • Model parameters devised
  • Results suggest the approach taken is well
    founded, although the lack of data is problematic

35
Next...
  • Build an independent MPEG7 collection with
    relevance assessments etc!
  • Automatic methods for generating metadata
  • Eliminate bias, Increase consistency, Improve
    quality
  • Feature extraction etc. to produce Simple
    Semantics
  • Solve the Semantic Gap issue
  • Build metadata based models that exploit
    contextual information
  • Assume contextual information can help retrieval
  • Assume we have good metadata
  • Efficiency of the evaluation vital

36
The End