Title: Video Indexing and Retrieval using an MPEG7 Based Inference Network
Video Indexing and Retrieval using an MPEG7 Based Inference Network
- Andrew Graves (andrew@dcs.qmul.ac.uk)
Contents
- Introduction
- Aims
- Background, Positioning
- MPEG7 Collection
- The Model
- Experiments
- Concluding Remarks
Introduction
Project Aims
- Metadata based retrieval using MPEG7
- Assume we have the metadata
- Build a modular retrieval system: Video analysis -> MPEG7 -> Video retrieval
- Exploit MPEG7 structure, context and concepts
Background
- Information Retrieval
  - IR Models: the Inference Network model
  - Text retrieval: indexing, retrieval, term statistics
  - Structured Information Retrieval at QM, Dortmund
- Multimedia
  - MPEG 1/2/4/7
  - MPEG7, the Multimedia Content Description Interface
- Video Indexing and Retrieval
  - Annotation-, content- and metadata-based approaches
  - Assume we have the metadata
  - Shot/scene detection
  - Feature extraction / acquisition of semantics
MPEG7
- Description Definition Language (DDL), Descriptors (D) and Description Schemes (DS)
- Just another XML format
Inference Network Model
- Probabilistic framework for IR that uses a Bayesian Network (so based on proven statistical theory)
- Complete Network = Document Network + Query Network, joined by Attachment and then Evaluation
- The Complete Network is used to estimate the probability of relevance for each Document Node (a sketch of operator evaluation follows)
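As a rough illustration (not the project's own code), the closed-form operator evaluations used by INQUERY-style inference networks can be sketched in Python; the parent beliefs below are made-up values:

# Sketch of INQUERY-style closed forms for combining parent beliefs.
# The input beliefs are illustrative, not taken from the experiments.
from functools import reduce

def p_and(ps):        # every parent must hold
    return reduce(lambda a, b: a * b, ps, 1.0)

def p_or(ps):         # at least one parent holds
    return 1.0 - reduce(lambda a, b: a * (1.0 - b), ps, 1.0)

def p_sum(ps):        # unweighted average
    return sum(ps) / len(ps)

def p_wsum(ps, ws):   # weighted average
    return sum(w * p for w, p in zip(ws, ps)) / sum(ws)

beliefs = [0.7, 0.9]                 # beliefs of two parent nodes
print(p_and(beliefs))                # ~0.63
print(p_or(beliefs))                 # ~0.97
print(p_sum(beliefs))                # ~0.8
print(p_wsum(beliefs, [0.2, 0.8]))   # ~0.86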
Positioning
Model = Inference Network + MPEG7
- Inference Network
  - Allows a combination of evidence
  - Allows hierarchical document nodes (structure)
- MPEG7
  - Structural, conceptual and contextual info
- So, we process DSs and Ds to form the IN
In Other Words...
- Build a Document Network that represents all of the Ds (concepts) and DSs (structure)
- Attach a Query Network and evaluate
MPEG7 Collection
Collection
- 3 MPEG1 files generated
- 2 comedies (Fawlty Towers)
- 1 film (A Room With A View)
- MPEG7 files then generated
- Automatic shot detection
- Shots manually grouped to form scenes
Annotations
- Abstract taken from the box cover
- StructuredAnnotation for each scene to specify exactly the participants and location
- FreeTextAnnotation to describe the action
- FreeTextAnnotation with speech extracts
MPEG7 Excerpt 1
<AudioVisual id="Communication Problems">
  <MediaInformation/>
  <MediaProfile/>
  <CreationInformation>
    <Creation>
      <Title>Communication Problems</Title>
      <Abstract>
        <FreeTextAnnotation>
          It's not a wise man who entrusts his furtive winnings on the
          horses to a geriatric Major, but Basil was never known for that
          quality. Parting with those ill-gotten gains was Basil's first
          mistake; his second was to tangle with the intermittently deaf
          Mrs Richards.
        </FreeTextAnnotation>
      </Abstract>
      <Creator>BBC</Creator>
    </Creation>
    <Classification>
      <Genre>Comedy</Genre>
      <Language>English</Language>
    </Classification>
  </CreationInformation>
</AudioVisual>

The excerpt shows concepts pertaining to the whole video.
MPEG7 Excerpt 2
<SegmentDecomposition decompositionType="temporal" gap="true" id="TableOfContent" overlap="false">
  <Segment id="A satisfied customer" xsi:type="AudioVisualSegmentType">
    <TextAnnotation>
      <FreeTextAnnotation>
        Basil receives a tip on a horse from a customer. Sybil warns Basil
        not to bet. Basil says Sybil is a dragon to Polly.
      </FreeTextAnnotation>
      <StructuredAnnotation>
        <Who>Basil,Sybil,Major,Polly</Who>
        <Where>Lobby</Where>
      </StructuredAnnotation>
    </TextAnnotation>
    <SegmentDecomposition decompositionType="temporal" gap="true" overlap="false">
      <Segment id="Shot_1" xsi:type="AudioVisualSegmentType">
        <TextAnnotation>
          <FreeTextAnnotation>
            Glad you enjoyed it. Polly will you get Mr Firkins' bill please.
          </FreeTextAnnotation>
        </TextAnnotation>
        <MediaTime>
          <MediaIncrDuration timeUnit="PT1N25F">86</MediaIncrDuration>
        </MediaTime>
      </Segment>
    </SegmentDecomposition>
    <MediaTime>

The excerpt shows the video decomposition and the distribution of evidence (i.e. concepts) throughout the MPEG7 file.
Model
Model Overview
- Document Network (built during indexing)
  - Static, contains information about the collection
- Query Network (built during retrieval)
  - Query language based upon INQUERY
  - Statistical operators (and approximations of Boolean)
- Attachment process
  - Builds the Complete Network
  - Creates DN->QN links where the concepts are the same
- Evaluation process
  - Calculates the probability of relevance for each element
Document Network
Layers: Document Nodes, Context Nodes, Concept Nodes (see the sketch below)
- Document Node layer: created from MPEG7 structural aspects
- Context Node layer: provides contextual information
- Concept Node layer: contains all the concepts present in the collection
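For concreteness, a minimal sketch of deriving the three layers from an MPEG7-like fragment; the traversal and the choice of which tags become Document Nodes are my assumptions, not the deck's specification:

# Sketch: deriving the three DN layers from an MPEG7-like fragment.
import xml.etree.ElementTree as ET

DOCUMENT_TAGS = {"AudioVisual", "Segment"}   # structural aspects -> Document Nodes

def build_layers(elem, doc_nodes, ctx_nodes, concepts, context=()):
    path = context + (elem.tag,)
    if elem.tag in DOCUMENT_TAGS:
        doc_nodes.append(elem.get("id", elem.tag))      # Document Node layer
    else:
        ctx_nodes.append("/".join(path))                # Context Node layer
    if elem.text and elem.text.strip():
        for term in elem.text.split():                  # Concept Node layer
            concepts.setdefault(term.lower(), []).append(path)
    for child in elem:
        build_layers(child, doc_nodes, ctx_nodes, concepts, path)

xml = "<AudioVisual id='Ep1'><Creation><Title>Communication Problems</Title></Creation></AudioVisual>"
docs, ctxs, cons = [], [], {}
build_layers(ET.fromstring(xml), docs, ctxs, cons)
print(docs)        # ['Ep1']
print(ctxs)        # ['AudioVisual/Creation', 'AudioVisual/Creation/Title']
print(list(cons))  # ['communication', 'problems']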
Query Network 1
- Query text is parsed to produce a Query tree
- Inverted DAG with a single final node
- Terms and Operators
  - Boolean operators: and, or, not
  - Statistical operators: sum, wsum, max
  - Constraints: constraint, tree
- Examples (a parsing sketch follows):

  and("BBC" "Basil")
  constraint(Creation, "BBC")
  and(constraint(Creation, "BBC") "Basil")
Query Network 2
- Constraints may be simple or complex: constraint and tree can be nested

  and(tree(CreationInformation,
        constraint(Classification/Genre, "comedy"),
        constraint(Creation, "BBC")) basil)

- Trying to exploit the contextual information
Attachment
- Attachment creates DN->QN links (at the concept level)
- Find candidate links, then consider the constraints
- Strength of a link can be determined by the closeness of the match
- Perform Tree Matching to find the Edit Distance (ED), as sketched below
- Use the ED by a) testing against a threshold, or b) reducing the link weight
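One way to realise Tree Matching is an edit distance over context paths; the deck does not define the exact procedure, so the following is only a stand-in:

# Sketch: Levenshtein distance over path components, used two ways as in
# the slides: a) TMT threshold test, b) TMW weight degradation.
def edit_distance(a, b):
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1): d[i][0] = i
    for j in range(n + 1): d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

query_path = "CreationInformation/Creation".split("/")
dn_path = "AudioVisual/CreationInformation/Creation".split("/")
ed = edit_distance(query_path, dn_path)
attach = ed <= 1              # a) TMT: attach only if ED is under the threshold
weight = 1.0 / (1.0 + ed)     # b) TMW: degrade the DN->QN link weight with ED
print(ed, attach, weight)     # 1 True 0.5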
Evaluation
- After attachment we have formed the Complete Network
- This is evaluated for every Document Node, and the resulting probabilities are used for ranking
- All required nodes are evaluated using 1) the values of the parent nodes and 2) conditional probabilities (see the sketch below)
- Nodes may inherit parental contexts (Link Inheritance)
- Parents outside the constraint may be ignored (Path Cropping)
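A minimal propagation sketch using a noisy-OR closed form and link probabilities borrowed from Experiment 1; Link Inheritance and Path Cropping are not modelled, and the intermediate CreationInformation node is collapsed for brevity:

# Sketch: belief propagation from a Document Node down to a concept node.
from functools import lru_cache

# node -> list of (parent, P(node | parent)); values echo Experiment 1
PARENTS = {
    "Video1":   [],
    "Scene1":   [("Video1", 0.7)],
    "Shot2":    [("Scene1", 0.875)],
    "Creation": [("Video1", 0.75)],
    "banana":   [("Shot2", 0.7), ("Creation", 0.8)],   # the two influences
}
PRIOR = {"Video1": 1.0}   # root document node is instantiated

@lru_cache(maxsize=None)
def belief(node):
    parents = PARENTS[node]
    if not parents:
        return PRIOR.get(node, 0.0)
    prod = 1.0
    for parent, p in parents:   # noisy-OR: each parent contributes independently
        prod *= 1.0 - p * belief(parent)
    return 1.0 - prod

print(belief("banana"))   # ~0.77, combining Shot2 and Creation evidence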
Extraction
- Structural Extraction: about the hierarchical makeup
- Attribute Extraction: data about the structural elements
- Concept Extraction: obtain the concepts that appear (sketched after the examples below)
- Text preprocessing
  - Luhn's analysis, term statistics

<context attribute="value">
  free text
  <subcontext>more free text</subcontext>
</context>

<TextAnnotation>
  <FreeTextAnnotation>
    Basil attempts to fix the car
  </FreeTextAnnotation>
  <StructuredAnnotation>
    <Who>Basil</Who>
    <WhatObject>Car</WhatObject>
    <WhatAction>Fix</WhatAction>
    <Where>Carpark</Where>
  </StructuredAnnotation>
</TextAnnotation>
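Concept Extraction with Luhn-style preprocessing might look like this sketch (the stopword list and tokenisation are illustrative):

# Sketch: pull concepts out of a TextAnnotation and count term occurrences.
import xml.etree.ElementTree as ET
from collections import Counter

STOPWORDS = {"the", "to", "a", "an", "of", "and"}

xml = """<TextAnnotation>
  <FreeTextAnnotation>Basil attempts to fix the car</FreeTextAnnotation>
  <StructuredAnnotation>
    <Who>Basil</Who><WhatObject>Car</WhatObject>
    <WhatAction>Fix</WhatAction><Where>Carpark</Where>
  </StructuredAnnotation>
</TextAnnotation>"""

counts = Counter()
for elem in ET.fromstring(xml).iter():
    if elem.text and elem.text.strip():
        for term in elem.text.lower().split():
            if term not in STOPWORDS:       # crude Luhn-style filtering
                counts[term] += 1

print(counts)   # basil, car and fix each occur twice (free text + structured)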
Probability Estimation
- Probability that a document is relevant to the query
- Conditional probabilities between the nodes
  - Context->Context (e.g. Video->Scene)
  - Context->Concept
- We use (see the sketch below):
  - Term statistics
  - Number of siblings
  - Duration ratios
- Alternatives:
  - Size of context
  - Number of concepts
  - Number of occurrences
  - Perceived goodness
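The following estimators are hypothetical reconstructions that happen to be consistent with the weights shown later in Experiment 1; they are not formulas stated in the deck:

# Sketch: candidate conditional-probability estimators.
import math

def from_duration_ratio(child_dur, parent_dur, floor=0.5):
    # Context->Context: longer children receive more of the parent's belief.
    return floor + (1.0 - floor) * (child_dur / parent_dur)

def from_siblings(n_siblings, floor=0.5):
    # Context->Context: fewer siblings -> each child carries more weight.
    return floor + (1.0 - floor) / n_siblings

def from_term_statistics(tf, ctx_len, df, n_docs):
    # Context->Concept: a simple tf-idf style estimate, clamped to [0, 1].
    return min(1.0, (tf / ctx_len) * math.log(1 + n_docs / df))

print(from_duration_ratio(300, 400))   # 0.875, cf. Shot2 in Experiment 1
print(from_siblings(2))                # 0.75, cf. CreationInformation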
Experiments
Experiment Overview
- Software written in C, on NT
- Not using INQUERY
- 1. Basic: does the model work at all?
- 2. Real Data: does the model work with our real metadata collection?
- 3. Metrics: what are the precision/recall figures?
Remember...
- Link Inheritance (LI)
- Link Degradation (LID)
- Tree Matching (TM)
  - Threshold (TMT): the attachment is made only if the constraint is met and the Edit Distance is below the specified threshold
  - Weighted (TMW): the attachment is made if the constraint is met; the Edit Distance is used as a weight upon the DN->QN link
- Path Cropping (PC)
Representations
We use XML for DN and QN representation (an evaluation sketch follows the examples):

<Root>
  <Operator Type="WSUM">
    <Concept weight="0.2">breakfast</Concept>
    <Concept weight="0.8">view</Concept>
  </Operator>
</Root>

<Root>
  <Operator Type="AND">
    <Concept>
      <Text>BBC</Text>
      <Constraint>Creation</Constraint>
    </Concept>
    <Concept>Basil</Concept>
  </Operator>
</Root>
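Loading the WSUM query network above and evaluating it against assumed concept beliefs (the 0.4/0.7 values are illustrative, not experiment data):

# Sketch: evaluate the WSUM operator of the QN XML shown above.
import xml.etree.ElementTree as ET

qn = """<Root><Operator Type="WSUM">
  <Concept weight="0.2">breakfast</Concept>
  <Concept weight="0.8">view</Concept>
</Operator></Root>"""

beliefs = {"breakfast": 0.4, "view": 0.7}   # assumed beliefs for one document

op = ET.fromstring(qn).find("Operator")
if op.get("Type") == "WSUM":
    total = sum(float(c.get("weight")) * beliefs[c.text.strip()] for c in op)
    norm = sum(float(c.get("weight")) for c in op)
    print(total / norm)   # ~0.64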
Experiment 1
<Root>
  <Video id='Video1' Duration='1000' weight='1.000000'>
    <CreationInformation weight='0.750000'>
      <Creation weight='1.000000'>
        <Concept weight='0.8' cid='1'>banana</Concept>
      </Creation>
    </CreationInformation>
    <MediaInformation weight='0.750000'/>
    <Scene id='Scene1' KeyFrame='none.jpg' Duration='400' weight='0.700000'>
      <Shot id='Shot1' KeyFrame='none.jpg' Duration='100' weight='0.625000'/>
      <Shot id='Shot2' KeyFrame='none.jpg' Duration='300' weight='0.875000'>
        <Concept weight='0.7' cid='1'>banana</Concept>
      </Shot>
    </Scene>
    <Scene id='Scene2' Duration='600' weight='0.800000'/>
  </Video>
  <Video id='Video2' weight='1.000000'/>
  <Video id='Video3' weight='1.000000'/>
</Root>
Remember: the probabilities come from the duration ratio and the number of context siblings (checked in the sketch below).
Two influences: 1. Creation->banana, 2. Shot2->banana
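As a check (my reading of the numbers, not a formula stated in the deck), the weights in this Document Network are consistent with the floor-plus-ratio estimators sketched earlier:

# Sketch: the DN weights above match 0.5 + 0.5 * (duration or sibling ratio).
def from_duration_ratio(child, parent, floor=0.5):
    return floor + (1.0 - floor) * (child / parent)

def from_siblings(n, floor=0.5):
    return floor + (1.0 - floor) / n

checks = [
    (from_duration_ratio(400, 1000), 0.700),   # Scene1
    (from_duration_ratio(600, 1000), 0.800),   # Scene2
    (from_duration_ratio(100, 400), 0.625),    # Shot1
    (from_duration_ratio(300, 400), 0.875),    # Shot2
    (from_siblings(2), 0.750),                 # CreationInformation, MediaInformation
]
for got, want in checks:
    assert round(got, 6) == want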
Experiment 1
Run 1, No Parameters:
  001  0.364000  Video  Video1
  002  0.245000  Shot   Shot2
  003  0.227500  Scene  Scene1
  004  0.105000  Shot   Shot1
  004  0.105000  Scene  Scene2
  004  0.105000  Video  Video2
  004  0.105000  Video  Video3

Run 7, LI + LID + TMW:
  001  0.288750  Shot   Shot2
  002  0.280313  Scene  Scene1
  003  0.273000  Video  Video1
  004  0.129375  Scene  Scene2
  005  0.123750  Shot   Shot1
  006  0.078750  Video  Video2
  006  0.078750  Video  Video3
- Model works
- Different levels of document granularity (Video/Scene/Shot) retrieved in the same list
- Parameters work, but it is unclear whether they help
Experiment 2
or("Mushroom" "Mushrooms")
or(chips" and("salad" "cream"))
constraint(Classification, "Comedy")
- Model worked with real collection to produce real
results - Results were as expected given knowledge of
material
Experiment 3
- Recall/precision metrics calculated
- Rank in the results list (not result rank) used for analysis
- Ten best Video/Scene/Shot items chosen by the author
- Ranking seems good
  - 6/10 required items in the top 10
  - All 10 within the top 93 (out of 362 in total)
- Figures suggest that the model is working effectively, although this is not conclusive
Discussion
- Size of the collection is too small to produce significant results; no known MPEG7 collections exist
- No independent queries with relevance assessments exist (obviously)
- Software efficiency is crucial: simplifying assumptions can be made to ensure that the IN is computationally viable; the size of the computation is not proportional to the size of the collection
Concluding Remarks
- MPEG7 was found to contain useful tools
- Model for VIR developed
  - Based on an Inference Network, built from MPEG7 files
  - Indexing captures structure, context and concepts
  - Retrieval done using terms, operators and constraints
- Model parameters devised
- Results suggest that the approach taken is well founded, although the lack of data is problematic
Next...
- Build an independent MPEG7 collection with relevance assessments etc.!
- Automatic methods for generating metadata
  - Eliminate bias, increase consistency, improve quality
  - Feature extraction etc. to produce simple semantics
  - Solve the Semantic Gap issue
- Build metadata-based models that exploit contextual information
  - Assume contextual information can help retrieval
  - Assume we have good metadata
- Efficiency of the evaluation is vital
The End