Title: CSM06 Information Retrieval
CSM06 Information Retrieval
- LECTURE 9 Tuesday 25th November
- Dr Andrew Salway
- a.salway_at_surrey.ac.uk
Lecture 9: Video Retrieval
- Part 1: Modelling Video Data
- Part 2: Querying Video Data
- Part 3: Automatic Video Processing
- Part 4: Systems
Moving images include:
- Television
- Cinema
- Meteorological images
- Surveillance cameras
- Medical images
- Biomechanical images
- Dance
Digital Video Data is…
- A sequence of frames, where each frame is an image (typically 25 frames per second)
- Moving images depict objects, people, actions, events
- May include a soundtrack: speech (commentary / monologue / dialogue), music, sound effects
- May include text, i.e. subtitles / closed captions
- Video data has temporal aspects as well as spatial aspects: the temporal organisation of the moving images conveys information in its own right
- Cinematic techniques (pan, zoom, etc.) and editing effects can also convey information
Example queries to a video database
- May wish to retrieve a whole video or only parts,
i.e. intervals / regions
PART 1: Video Data Models
- Must decide how to structure (i.e. model) video data so that metadata can be attached appropriately; must consider potential user information needs
- Models of video data include:
- BLObs (Binary Large Objects)
- Frames
- Intervals (discrete, hierarchical, overlapping); temporal logic can be used to express relationships between intervals
- Object-based schemes (e.g. MPEG-4's audio-visual objects)
Treating Video Data as a BLOb
- Metadata may be associated with the video data as a whole
- The kinds of metadata for visual information discussed in Lecture 8 apply equally well to moving images; but note, ideally only metadata that is true for the whole video data file should be associated with a BLOb
Treating Video Data as Frames
- An exhaustive metadata description of a video data file would include details for each and every frame (remember: each frame is a still image)
- However, with 25-30 fps, the cost of this is usually prohibitive and there are few applications where it would be beneficial
Treating Video Data as Intervals
- It is more usual to model video data as "meaningful" intervals on a timeline, where "meaningful" depends on the particular domain and application
- The intervals may be discrete or overlapping
- The intervals may be arranged in a hierarchy so that metadata descriptions can be inherited
- Temporal relationships between intervals may be described, e.g. for more complex queries
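The hierarchical interval model with inherited metadata can be sketched as follows. The class and field names are illustrative, not from any particular system:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Interval:
    """A meaningful interval [start, end) on a video timeline.

    Metadata not set on an interval is inherited from its parent,
    mirroring the hierarchical interval model described above."""
    start: int                                  # start frame
    end: int                                    # end frame (exclusive)
    metadata: dict = field(default_factory=dict)
    parent: Optional["Interval"] = None

    def lookup(self, key):
        """Return metadata for key, inheriting from ancestor intervals."""
        node = self
        while node is not None:
            if key in node.metadata:
                return node.metadata[key]
            node = node.parent
        return None

# A scene interval inherits programme-level metadata from its parent.
programme = Interval(0, 135000, {"title": "News"})
scene = Interval(500, 1200, {"topic": "weather"}, parent=programme)

print(scene.lookup("title"))   # "News", inherited from the parent interval
```

Inheritance means a description attached once at the programme level need not be repeated on every scene within it.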
Temporal Relationships between Intervals
- The work of Allen (1983) is often discussed in the video database literature (and in other computing disciplines)
- Allen described 13 temporal relationships that can hold between intervals
- A transitivity table allows a system to infer the relationship A r C, if A r B and B r C are known
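Allen's 13 relations can be computed directly from interval endpoints. The sketch below classifies the relation of interval a to interval b (relation names follow Allen 1983; the function itself is illustrative):

```python
def allen_relation(a, b):
    """Classify the Allen (1983) relation of interval a to interval b.

    Intervals are (start, end) pairs with start < end. Returns one of the
    13 relation names: 6 basic relations, their 6 inverses, and 'equals'.
    """
    (as_, ae), (bs, be) = a, b
    if as_ == bs and ae == be:
        return "equals"
    if ae < bs:
        return "before"
    if be < as_:
        return "after"                  # inverse of before
    if ae == bs:
        return "meets"
    if be == as_:
        return "met-by"
    if as_ == bs:                       # same start, different ends
        return "starts" if ae < be else "started-by"
    if ae == be:                        # same end, different starts
        return "finishes" if as_ > bs else "finished-by"
    if bs < as_ and ae < be:
        return "during"
    if as_ < bs and be < ae:
        return "contains"
    return "overlaps" if as_ < bs else "overlapped-by"

print(allen_relation((1, 5), (3, 7)))   # "overlaps"
```

A transitivity table over these 13 names then lets a system infer A r C from A r B and B r C without re-examining the timeline (see Allen 1983, Figure 4).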
Modelling the Relationships between Entities and Events in Film
- Roth (1999) proposed the use of a semantic network to represent the relationships between entities and events in a movie
- The user can then browse between scenes in a movie, e.g. if they are watching the scene of an explosion, they may browse, via the semantic network, to the scene in which a bomb was planted
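A semantic network of this kind is essentially a labelled graph over scenes. The tiny sketch below is in the spirit of Roth (1999) only; the scene names and relation labels are invented for illustration:

```python
# Scenes are nodes; labelled edges link causally or thematically
# related scenes (hypothetical example, not Roth's actual data).
network = {
    "explosion_scene":     [("caused_by", "bomb_planting_scene")],
    "bomb_planting_scene": [("performed_by", "villain_intro_scene")],
}

def browse(scene, relation):
    """Follow a labelled link from one scene to a related scene, if any."""
    for rel, target in network.get(scene, []):
        if rel == relation:
            return target
    return None

print(browse("explosion_scene", "caused_by"))   # "bomb_planting_scene"
```

Browsing then amounts to edge traversal: from the explosion the viewer follows the "caused_by" link back to the bomb-planting scene.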
Exercise
- Describe different ways you could model the following video data files:
- The 10 o'clock news
- A movie
- A football match
Further Reading
- Subrahmanian, Principles of Multimedia Database Systems, Chapter 7
- Allen (1983). J. F. Allen, "Maintaining Knowledge About Temporal Intervals". Communications of the ACM 26 (11), pp. 832-843. See especially Figure 2 for the 13 relationships and Figure 4 for the full transitivity table.
- Roth (1999). Volker Roth, "Content-based retrieval from digital video". Image and Vision Computing 17, pp. 531-540.
PART 2: Querying Video Content
- Broadly speaking, video content can be said to comprise:
- Objects (including people) with properties
- Activities (actions, events) involving 0 or more objects
- Recall that descriptions of content may be attached to frames, intervals, or whole videos; intervals may be discrete / overlapping / hierarchical, and related by the 13 temporal relationships
- How to express and process queries?
Querying Video Content
- Four kinds of retrieval:
- Segment Retrieval: find all video segments where an exchange of a briefcase took place at John's house
- Object Retrieval: find all the people in the video sequence (v,s,e)
- Activity Retrieval: what was happening in the video sequence (v,s,e)?
- Property-based Retrieval: find all segments where somebody is wearing a blue shirt
Querying Video Content
- Subrahmanian proposes an extension to SQL in order to express a user's information need when querying a video database
- Based on video functions
- Recall that SQL is a database query language for relational databases; queries are expressed in terms of:
- SELECT (which attributes)
- FROM (which table)
- WHERE (these conditions hold)
Video Functions
- FindVideoWithObject(o)
- FindVideoWithActivity(a)
- FindVideoWithActivityandProp(a,p,z)
- FindVideoWithObjectandProp(o,p,z)
- FindObjectsInVideo(v,s,e)
- FindActivitiesInVideo(v,s,e)
- FindActivitiesAndPropsInVideo(v,s,e)
- FindObjectsAndPropsInVideo(v,s,e)
A Query Language for Video
- SELECT may contain:
- Vid_Id: s, e
- FROM may contain:
- video <source>
- WHERE condition allows statements like:
- term IN func_call
- (term can be a variable, object, activity or property value; func_call is a video function)
EXAMPLE 1
- Find all video sequences from the library CrimeVidLib1 that contain Denis Dopeman
- SELECT vid: s,e
- FROM video: CrimeVidLib1
- WHERE
- (vid,s,e) IN FindVideoWithObject(Denis Dopeman)
EXAMPLE 2
- Find all video sequences from the library CrimeVidLib1 that show Jane Shady giving Denis Dopeman a suitcase
EXAMPLE 2
- SELECT vid: s,e
- FROM video: CrimeVidLib1
- WHERE
- (vid,s,e) IN FindVideoWithObject(Denis Dopeman) AND
- (vid,s,e) IN FindVideoWithObject(Jane Shady) AND
- (vid,s,e) IN FindVideoWithActivityandProp(ExchangeObject, Item, Briefcase) AND
- (vid,s,e) IN FindVideoWithActivityandProp(ExchangeObject, Giver, Jane Shady) AND
- (vid,s,e) IN FindVideoWithActivityandProp(ExchangeObject, Receiver, Denis Dopeman)
EXAMPLE 3
- Which people have been seen with Denis Dopeman in CrimeVidLib1?
EXAMPLE 3
- SELECT vid: s,e, Object
- FROM video: CrimeVidLib1
- WHERE
- (vid,s,e) IN FindVideoWithObject(Denis Dopeman) AND
- Object IN FindObjectsInVideo(vid,s,e) AND
- Object != Denis Dopeman AND
- typeof(Object, Person)
EXERCISE
- Express the following in Subrahmanian's Video SQL:
- Find all the sequences showing Tony Blair
- Find all the sequences showing Tony Blair eating a donut
- Find all the sequences showing Tony Blair with Edwina Currie, wearing a black shirt in a nightclub
Further Reading
- Subrahmanian, Principles of Multimedia Database Systems, pp. 191-195
PART 3: Automatic Video Content Analysis
- Can a machine understand the content of a video data stream?
- Similar challenges/limitations as for still images
- However, systems must also track objects between frames (this might provide extra information for object segmentation / identification)
- Also need to recognise events in terms of the actions of several objects
The Scope of Automatic Video Content Analysis
- What can be automated?
- Region (object) segmentation within frames
- Interval segmentation
- Recognition of camera actions and editing techniques
- Extraction of representative key-frames
- Extraction of visual features for indexing-retrieval: visual features of frames, intervals and/or regions (objects)
Video Segmentation
- Intervals within some kinds of moving images can be automatically detected
- Algorithms look for sudden changes in visual features between successive frames, e.g. a sudden change in colour as the background to a scene change, or a sudden change in motion from a car chase to a conversation
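A minimal version of this idea flags a shot boundary wherever the intensity histogram of consecutive frames changes sharply. Frames here are plain lists of pixel intensities, and the bin count and threshold are assumed tuning parameters:

```python
def histogram(frame, bins=8, max_val=256):
    """Count pixel intensities of one frame into coarse bins."""
    h = [0] * bins
    for p in frame:
        h[p * bins // max_val] += 1
    return h

def shot_boundaries(frames, threshold=0.5):
    """Return indices i where frame i appears to start a new shot.

    Flags a cut when the per-pixel histogram difference between
    successive frames exceeds the (assumed) threshold.
    """
    cuts = []
    for i in range(1, len(frames)):
        h1, h2 = histogram(frames[i - 1]), histogram(frames[i])
        diff = sum(abs(a - b) for a, b in zip(h1, h2)) / len(frames[i])
        if diff > threshold:
            cuts.append(i)
    return cuts

dark, bright = [10] * 100, [240] * 100
print(shot_boundaries([dark, dark, bright, bright]))   # [2]
```

Real systems compare colour histograms and motion features and must also distinguish hard cuts from gradual transitions (fades, dissolves), which a single-threshold detector like this one misses.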
Recognising Camera Actions and Editing Techniques
- Shots in films may be characterised in terms of camera actions and editing techniques, e.g. the pan or zoom, the slow fade or dissolve
- Researchers have developed mathematical models of how visual features change for these techniques and editing effects, and so in some cases they can be recognised automatically
- This may give some insight into the mood or genre of a film
Extraction of Key-frames
- In order to produce video summaries it may be useful to automatically extract representative key-frames from longer sequences
- Key-frames should have visual features typical of the sequence
- The number of key-frames required depends on how much the visual content varies within the sequence
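One simple selection strategy keeps a new key-frame whenever the visual features drift far enough from the last kept frame, so stable sequences yield few key-frames and varied ones yield more. The sketch below reduces each frame to a single feature value (e.g. mean intensity); the drift threshold is an assumed parameter:

```python
def key_frames(features, threshold=30):
    """Return indices of representative frames for a 1-D feature sequence.

    Always keeps frame 0, then keeps any frame whose feature differs from
    the last kept key-frame by more than the (assumed) threshold.
    """
    if not features:
        return []
    keep = [0]
    for i, f in enumerate(features[1:], start=1):
        if abs(f - features[keep[-1]]) > threshold:
            keep.append(i)
    return keep

# A steady dark sequence followed by a bright one yields two key-frames.
print(key_frames([10, 12, 11, 80, 82, 81]))   # [0, 3]
```

Note how the number of key-frames adapts to content variation, exactly as the slide describes: a static shot collapses to one frame, while a visually varied sequence retains several.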
PART 4: Video Retrieval Systems
- Video retrieval systems may be based on
- Visual features
- Manually annotated keywords
- Keywords extracted from collateral text
- A combination of these
VideoQ (Columbia University, New York)
- Indexing based on automatically extracted visual features, including colour, shape and motion; these features are generated for regions/objects in sequences within video data streams.
- Sketch-based queries can specify colour and shape, as well as motion over a number of frames and spatial relationships with other objects/regions.
- Success depends on how well information needs can be expressed as visual features, which may not capture all the semantics of a video sequence
VideoQ sketch-based query (screenshot)
Annotating Video Data
- Content-descriptive metadata for video often needs to be manually annotated; this will need to be more than just keywords to capture video content
- In some cases video annotation can be automated by processing collateral texts: cross-modal information retrieval
Informedia (Carnegie Mellon University)
- Indexing based on keywords extracted from the audio stream and/or subtitles of news and documentary programmes.
- Also combines visual and linguistic features to classify video segments
- Success depends on how closely the spoken words of the presenters relate to what can be seen
Further Reading
- More on VideoQ at http://www.ctr.columbia.edu/videoq/.index.html
- More on Informedia at http://www.informedia.cs.cmu.edu/
- Commercial video retrieval systems:
- http://www.virage.com/index.cfm
- http://www.dremedia.com
- Currently a set of major US research projects in the area of Video Analysis and Content Exploitation (VACE): http://www.ic-arda.org/InfoExploit/vace/