Title: CSM06 Information Retrieval
CSM06 Information Retrieval
- LECTURE 9 Tuesday 25th November
- Dr Andrew Salway
- a.salway_at_surrey.ac.uk
Lecture 9: Video Retrieval
- Part 1: Modelling Video Data
- Part 2: Querying Video Data
- Part 3: Automatic Video Processing
- Part 4: Systems
Moving images include:
- Television
- Cinema
- Meteorological images
- Surveillance cameras
- Medical images
- Biomechanical images
- Dance
Digital Video Data is…
- A sequence of frames, where each frame is an image (typically 25 frames per second)
- Moving images depict objects, people, actions, events
- May include a soundtrack: speech (commentary / monologue / dialogue), music, sound effects
- May include text, i.e. subtitles / closed captions
- Video data has temporal aspects as well as spatial aspects: the temporal organisation of the moving images conveys information in its own right
- Cinematic techniques (pan, zoom, etc.) and editing effects can also convey information
Example queries to a video database
- May wish to retrieve a whole video or only parts,
i.e. intervals / regions
PART 1: Video Data Models
- Must decide how to structure (i.e. model) video data so that metadata can be attached appropriately; must consider potential user information needs
- Models of video data include:
- BLObs (Binary Large Objects)
- Frames
- Intervals (discrete, hierarchical, overlapping); temporal logic can be used to express relationships between intervals
- Object-based schemes (e.g. MPEG-4's audio-visual objects)
Treating Video Data as a BLOb
- Metadata may be associated with the video data as a whole
- The kinds of metadata for visual information discussed in Lecture 8 apply equally well to moving images; but note, ideally only metadata that is true for the whole video data file should be associated with a BLOb
Treating Video Data as Frames
- An exhaustive metadata description of a video data file would include details for each and every frame (remember: each frame is a still image)
- However, with 25-30 fps, the cost of this is usually prohibitive and there are few applications where it would be beneficial
Treating Video Data as Intervals
- It is more usual to model video data as "meaningful" intervals on a timeline, where "meaningful" depends on the particular domain and application
- The intervals may be discrete or overlapping
- The intervals may be arranged in a hierarchy so that metadata descriptions can be inherited
- Temporal relationships between intervals may be described, e.g. for more complex queries
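The hierarchical interval model with inherited metadata can be sketched as follows. The class and field names are illustrative, not from any particular system:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Interval:
    """A meaningful interval [start, end) on a video timeline.

    Metadata not set on an interval is inherited from its parent,
    mirroring the hierarchical interval model described above."""
    start: int                                  # start frame
    end: int                                    # end frame (exclusive)
    metadata: dict = field(default_factory=dict)
    parent: Optional["Interval"] = None

    def lookup(self, key):
        """Return metadata for key, inheriting from ancestor intervals."""
        node = self
        while node is not None:
            if key in node.metadata:
                return node.metadata[key]
            node = node.parent
        return None

# A scene interval inherits programme-level metadata from its parent.
programme = Interval(0, 135000, {"title": "News"})
scene = Interval(500, 1200, {"topic": "weather"}, parent=programme)

print(scene.lookup("title"))   # "News", inherited from the parent interval
```

Inheritance means a description attached once at the programme level need not be repeated on every scene within it.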
Temporal Relationships between Intervals
- The work of Allen (1983) is often discussed in the video database literature (and in other computing disciplines)
- Allen described 13 temporal relationships that can hold between intervals
- A transitivity table allows a system to infer the relationship A r C, if A r B and B r C are known
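Allen's 13 relations can be computed directly from interval endpoints. The sketch below classifies the relation of interval a to interval b (relation names follow Allen 1983; the function itself is illustrative):

```python
def allen_relation(a, b):
    """Classify the Allen (1983) relation of interval a to interval b.

    Intervals are (start, end) pairs with start < end. Returns one of the
    13 relation names: 6 basic relations, their 6 inverses, and 'equals'.
    """
    (as_, ae), (bs, be) = a, b
    if as_ == bs and ae == be:
        return "equals"
    if ae < bs:
        return "before"
    if be < as_:
        return "after"                  # inverse of before
    if ae == bs:
        return "meets"
    if be == as_:
        return "met-by"
    if as_ == bs:                       # same start, different ends
        return "starts" if ae < be else "started-by"
    if ae == be:                        # same end, different starts
        return "finishes" if as_ > bs else "finished-by"
    if bs < as_ and ae < be:
        return "during"
    if as_ < bs and be < ae:
        return "contains"
    return "overlaps" if as_ < bs else "overlapped-by"

print(allen_relation((1, 5), (3, 7)))   # "overlaps"
```

A transitivity table over these 13 names then lets a system infer A r C from A r B and B r C without re-examining the timeline (see Allen 1983, Figure 4).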
Modelling the Relationships between Entities and Events in Film
- Roth (1999) proposed the use of a semantic network to represent the relationships between entities and events in a movie
- The user can then browse between scenes in a movie, e.g. if they are watching the scene of an explosion, they may browse, via the semantic network, to the scene in which a bomb was planted
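A semantic network of this kind is essentially a labelled graph over scenes. The tiny sketch below is in the spirit of Roth (1999) only; the scene names and relation labels are invented for illustration:

```python
# Scenes are nodes; labelled edges link causally or thematically
# related scenes (hypothetical example, not Roth's actual data).
network = {
    "explosion_scene":     [("caused_by", "bomb_planting_scene")],
    "bomb_planting_scene": [("performed_by", "villain_intro_scene")],
}

def browse(scene, relation):
    """Follow a labelled link from one scene to a related scene, if any."""
    for rel, target in network.get(scene, []):
        if rel == relation:
            return target
    return None

print(browse("explosion_scene", "caused_by"))   # "bomb_planting_scene"
```

Browsing then amounts to edge traversal: from the explosion the viewer follows the "caused_by" link back to the bomb-planting scene.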
Exercise
- Describe different ways you could model the following video data files:
- The 10 o'clock news
- A movie
- A football match
Further Reading
- Subrahmanian, Principles of Multimedia Database Systems, Chapter 7
- Allen (1983). J. F. Allen, "Maintaining Knowledge About Temporal Intervals". Communications of the ACM 26 (11), pp. 832-843. See especially Figure 2 for the 13 relationships and Figure 4 for the full transitivity table.
- Roth (1999). Volker Roth, "Content-based retrieval from digital video". Image and Vision Computing 17, pp. 531-540.
PART 2: Querying Video Content
- Broadly speaking, video content can be said to comprise:
- Objects (including people) with properties
- Activities (actions, events) involving 0 or more objects
- Recall that descriptions of content may be attached to frames, intervals, or whole videos; intervals may be discrete / overlapping / hierarchical, and related by the 13 temporal relationships
- How to express and process queries?
Querying Video Content
- Four kinds of retrieval:
- Segment Retrieval: find all video segments where an exchange of a briefcase took place at John's house
- Object Retrieval: find all the people in the video sequence (v,s,e)
- Activity Retrieval: what was happening in the video sequence (v,s,e)?
- Property-based Retrieval: find all segments where somebody is wearing a blue shirt
Querying Video Content
- Subrahmanian proposes an extension to SQL in order to express a user's information need when querying a video database
- Based on video functions
- Recall that SQL is a database query language for relational databases; queries are expressed in terms of:
- SELECT (which attributes)
- FROM (which table)
- WHERE (these conditions hold)
Video Functions
- FindVideoWithObject(o)
- FindVideoWithActivity(a)
- FindVideoWithActivityandProp(a,p,z)
- FindVideoWithObjectandProp(o,p,z)
- FindObjectsInVideo(v,s,e)
- FindActivitiesInVideo(v,s,e)
- FindActivitiesAndPropsInVideo(v,s,e)
- FindObjectsAndPropsInVideo(v,s,e)
A Query Language for Video
- SELECT may contain:
- Vid_Id: s, e
- FROM may contain:
- video <source>
- WHERE condition allows statements like:
- term IN func_call
- (term can be a variable, object, activity or property value; func_call is a video function)
EXAMPLE 1
- Find all video sequences from the library CrimeVidLib1 that contain Denis Dopeman
- SELECT vid: s,e
- FROM video: CrimeVidLib1
- WHERE
- (vid,s,e) IN FindVideoWithObject(Denis Dopeman)
EXAMPLE 2
- Find all video sequences from the library CrimeVidLib1 that show Jane Shady giving Denis Dopeman a suitcase
EXAMPLE 2
- SELECT vid: s,e
- FROM video: CrimeVidLib1
- WHERE
- (vid,s,e) IN FindVideoWithObject(Denis Dopeman) AND
- (vid,s,e) IN FindVideoWithObject(Jane Shady) AND
- (vid,s,e) IN FindVideoWithActivityandProp(ExchangeObject, Item, Briefcase) AND
- (vid,s,e) IN FindVideoWithActivityandProp(ExchangeObject, Giver, Jane Shady) AND
- (vid,s,e) IN FindVideoWithActivityandProp(ExchangeObject, Receiver, Denis Dopeman)
EXAMPLE 3
- Which people have been seen with Denis Dopeman in CrimeVidLib1?
EXAMPLE 3
- SELECT vid: s,e, Object
- FROM video: CrimeVidLib1
- WHERE
- (vid,s,e) IN FindVideoWithObject(Denis Dopeman) AND
- Object IN FindObjectsInVideo(vid,s,e) AND
- Object != Denis Dopeman AND
- typeof(Object, Person)
EXERCISE
- Express the following in Subrahmanian's Video SQL:
- Find all the sequences showing Tony Blair
- Find all the sequences showing Tony Blair eating a donut
- Find all the sequences showing Tony Blair with Edwina Currie, wearing a black shirt in a nightclub
Further Reading
- Subrahmanian, Principles of Multimedia Database Systems, pp. 191-195
PART 3: Automatic Video Content Analysis
- Can a machine understand the content of a video data stream?
- Similar challenges/limitations as for still images
- However, systems must also track objects between frames (this might provide extra information for object segmentation / identification)
- Also need to recognise events in terms of the actions of several objects
The Scope of Automatic Video Content Analysis
- What can be automated?
- Region (object) segmentation within frames
- Interval segmentation
- Recognition of camera actions and editing techniques
- Extraction of representative key-frames
- Extraction of visual features for indexing-retrieval: visual features of frames, intervals and/or regions (objects)
Video Segmentation
- Intervals within some kinds of moving images can be automatically detected
- Algorithms look for sudden changes in visual features between successive frames, e.g. a sudden change in colour as the background to a scene change, or a sudden change in motion from a car chase to a conversation
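A minimal version of this idea flags a shot boundary wherever the intensity histogram of consecutive frames changes sharply. Frames here are plain lists of pixel intensities, and the bin count and threshold are assumed tuning parameters:

```python
def histogram(frame, bins=8, max_val=256):
    """Count pixel intensities of one frame into coarse bins."""
    h = [0] * bins
    for p in frame:
        h[p * bins // max_val] += 1
    return h

def shot_boundaries(frames, threshold=0.5):
    """Return indices i where frame i appears to start a new shot.

    Flags a cut when the per-pixel histogram difference between
    successive frames exceeds the (assumed) threshold.
    """
    cuts = []
    for i in range(1, len(frames)):
        h1, h2 = histogram(frames[i - 1]), histogram(frames[i])
        diff = sum(abs(a - b) for a, b in zip(h1, h2)) / len(frames[i])
        if diff > threshold:
            cuts.append(i)
    return cuts

dark, bright = [10] * 100, [240] * 100
print(shot_boundaries([dark, dark, bright, bright]))   # [2]
```

Real systems compare colour histograms and motion features and must also distinguish hard cuts from gradual transitions (fades, dissolves), which a single-threshold detector like this one misses.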
Recognising Camera Actions and Editing Techniques
- Shots in films may be characterised in terms of camera actions and editing techniques, e.g. the pan or zoom, the slow fade or dissolve
- Researchers have developed mathematical models of how visual features change for these techniques and editing effects, and so in some cases they can be recognised automatically
- This may give some insight into the mood or genre of a film
Extraction of Key-frames
- In order to produce video summaries it may be useful to automatically extract representative key-frames from longer sequences
- Key-frames should have visual features typical of the sequence
- The number of key-frames required depends on how much the visual content varies within the sequence
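One simple selection strategy keeps a new key-frame whenever the visual features drift far enough from the last kept frame, so stable sequences yield few key-frames and varied ones yield more. The sketch below reduces each frame to a single feature value (e.g. mean intensity); the drift threshold is an assumed parameter:

```python
def key_frames(features, threshold=30):
    """Return indices of representative frames for a 1-D feature sequence.

    Always keeps frame 0, then keeps any frame whose feature differs from
    the last kept key-frame by more than the (assumed) threshold.
    """
    if not features:
        return []
    keep = [0]
    for i, f in enumerate(features[1:], start=1):
        if abs(f - features[keep[-1]]) > threshold:
            keep.append(i)
    return keep

# A steady dark sequence followed by a bright one yields two key-frames.
print(key_frames([10, 12, 11, 80, 82, 81]))   # [0, 3]
```

Note how the number of key-frames adapts to content variation, exactly as the slide describes: a static shot collapses to one frame, while a visually varied sequence retains several.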
PART 4: Video Retrieval Systems
- Video retrieval systems may be based on
- Visual features
- Manually annotated keywords
- Keywords extracted from collateral text
- A combination of these
VideoQ (Columbia University, New York)
- Indexing based on automatically extracted visual features, including colour, shape and motion; these features are generated for regions/objects in sequences within video data streams.
- Sketch-based queries can specify colour and shape, as well as motion over a number of frames and spatial relationships with other objects/regions.
- Success depends on how well information needs can be expressed as visual features, which may not capture all the semantics of a video sequence
VideoQ sketch-based query (screenshot)
Annotating Video Data
- Content-descriptive metadata for video often needs to be manually annotated; this will need to be more than just keywords to capture video content
- In some cases video annotation can be automated by processing collateral texts: cross-modal information retrieval
Informedia (Carnegie Mellon University)
- Indexing based on keywords extracted from the audio stream and/or subtitles of news and documentary programmes.
- Also combines visual and linguistic features to classify video segments
- Success depends on how closely the spoken words of the presenters relate to what can be seen
Further Reading
- More on VideoQ at http://www.ctr.columbia.edu/videoq/.index.html
- More on Informedia at http://www.informedia.cs.cmu.edu/
- Commercial video retrieval systems:
- http://www.virage.com/index.cfm
- http://www.dremedia.com
- Currently a set of major US research projects in the area of Video Analysis and Content Exploitation (VACE): http://www.ic-arda.org/InfoExploit/vace/