CSM06 Information Retrieval - PowerPoint PPT Presentation
1
CSM06 Information Retrieval
  • LECTURE 9 Tuesday 25th November
  • Dr Andrew Salway
  • a.salway@surrey.ac.uk

2
Lecture 9: Video Retrieval
  • Part 1: Modelling Video Data
  • Part 2: Querying Video Data
  • Part 3: Automatic Video Processing
  • Part 4: Systems

3
Moving Images include
  • Television
  • Cinema
  • Meteorological images
  • Surveillance cameras
  • Medical images
  • Biomechanical images
  • Dance

4
Digital Video Data is
  • A sequence of frames, where each frame is an
    image (typically 25 frames per second)
  • Moving images depict objects, people, actions,
    events
  • May include a soundtrack: speech (commentary /
    monologue / dialogue), music, sound effects
  • May include text, i.e. subtitles / closed
    captions
  • Video data has temporal aspects as well as
    spatial aspects: the temporal organisation of the
    moving images conveys information in its own
    right
  • Cinematic techniques (pan, zoom, etc.) and
    editing effects can also convey information

5
Example queries to a video database
  • May wish to retrieve a whole video or only parts,
    i.e. intervals / regions

6
PART 1: Video Data Models
  • Must decide how to structure (i.e. model) video
    data so that metadata can be attached
    appropriately; must consider potential user
    information needs
  • Models of video data include
  • BLObs (Binary Large Objects)
  • Frames
  • Intervals (discrete, hierarchical, overlapping)
    temporal logic can be used to express
    relationships between intervals
  • Object-based schemes (e.g. MPEG-4's audio-visual
    objects)

7
Treating Video Data as a BLOb
  • Metadata may be associated with the video data as
    a whole
  • The kinds of metadata for visual information
    discussed in Lecture 8 apply equally well to
    moving images; but note that ideally only
    metadata that is true for the whole video data
    file should be associated with a BLOb

8
Treating Video Data as Frames
  • An exhaustive metadata description of a video
    data file would include details for each and
    every frame (remember each frame is a still
    image)
  • However, with 25-30 fps, the cost of this is
    usually prohibitive and there are few
    applications where it would be beneficial

9
Treating Video Data as Intervals
  • It is more usual to model video data as
    meaningful intervals on a timeline, where
    "meaningful" depends on the particular domain and
    application
  • The intervals may be discrete or overlapping
  • The intervals may be arranged in a hierarchy so
    that metadata descriptions can be inherited
  • Temporal relationships between intervals may be
    described, e.g. for more complex queries
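The interval model above can be sketched in a few lines of code. This is a minimal illustration, not a scheme from the literature; the class and field names are hypothetical:

```python
class Interval:
    """A named interval [start, end) on a video timeline, with metadata
    inherited from its parent interval (e.g. scene -> shot)."""
    def __init__(self, start, end, metadata=None, parent=None):
        self.start, self.end = start, end
        self.local = metadata or {}
        self.parent = parent

    def describe(self):
        # Child metadata overrides inherited parent metadata.
        inherited = self.parent.describe() if self.parent else {}
        return {**inherited, **self.local}

    def overlaps(self, other):
        return self.start < other.end and other.start < self.end

# A scene interval and a shot nested inside it
scene = Interval(0, 300, {"location": "airport"})
shot = Interval(120, 150, {"camera": "close-up"}, parent=scene)
print(shot.describe())  # {'location': 'airport', 'camera': 'close-up'}
```

The hierarchy means a shot need only store what is locally true; anything true of the whole scene is inherited.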

10
Temporal Relationships between Intervals
  • The work of Allen (1983) is often discussed in
    the video database literature (and in other
    computing disciplines)
  • Allen described 13 temporal relationships that
    can hold between intervals
  • A transitivity table allows a system to infer
    the relationship A r C, given that A r B and
    B r C are known
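For closed numeric intervals, Allen's 13 relations can be computed directly from the endpoints. A sketch (relation names follow Allen's terminology; the function itself is illustrative):

```python
def allen_relation(a, b):
    """Classify the relation between intervals a=(start, end) and
    b=(start, end) into one of Allen's 13 relations (Allen, 1983)."""
    (as_, ae), (bs, be) = a, b
    if ae < bs: return "before"
    if be < as_: return "after"
    if ae == bs: return "meets"
    if be == as_: return "met-by"
    if as_ == bs and ae == be: return "equals"
    if as_ == bs: return "starts" if ae < be else "started-by"
    if ae == be: return "finishes" if as_ > bs else "finished-by"
    if bs < as_ and ae < be: return "during"
    if as_ < bs and be < ae: return "contains"
    return "overlaps" if as_ < bs else "overlapped-by"

print(allen_relation((0, 5), (5, 9)))  # meets
print(allen_relation((2, 4), (1, 8)))  # during
```

The transitivity table then maps pairs of these relation names to the set of relations possible between A and C.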

11
Modelling the Relationships between Entities and
Events in Film
  • Roth (1999) proposed the use of a semantic
    network to represent the relationships between
    entities and events in a movie
  • The user can then browse between scenes in a
    movie, e.g. if they are watching the scene of an
    explosion, they may browse to the scene in which
    a bomb was planted, via the semantic network

12
Exercise
  • Describe different ways you could model the
    following video data files
  • The 10 o'clock news
  • A movie
  • A football match

13
Further Reading
  • Subrahmanian, Principles of Multimedia Database
    Systems. Chapter 7
  • Allen (1983). J. F. Allen, "Maintaining
    Knowledge About Temporal Intervals".
    Communications of the ACM 26(11), pp. 832-843.
    Especially Figure 2 for the 13 relationships and
    Figure 4 for the full transitivity table.
  • Roth (1999). Volker Roth, "Content-based
    retrieval from digital video". Image and Vision
    Computing 17, pp. 531-540.

14
PART 2: Querying Video Content
  • Broadly speaking video content can be said to
    comprise
  • Objects (including people) with properties
  • Activities (actions, events) involving 0 or more
    objects
  • Recall that descriptions of content may be
    attached to frames, intervals, or whole videos;
    intervals may be discrete / overlapping /
    hierarchical, related by the 13 temporal
    relationships
  • How to express and process queries?

15
Querying Video Content
  • Four kinds of retrieval:
  • Segment Retrieval: find all video segments where
    an exchange of a briefcase took place at John's
    house
  • Object Retrieval: find all the people in the
    video sequence (v,s,e)
  • Activity Retrieval: what was happening in the
    video sequence (v,s,e)
  • Property-based Retrieval: find all segments
    where somebody is wearing a blue shirt

16
Querying Video Content
  • Subrahmanian proposes an extension to SQL in
    order to express a user's information need when
    querying a video database
  • Based on video functions
  • Recall that SQL is a database query language for
    relational databases; queries are expressed in
    terms of
  • SELECT (which attributes)
  • FROM (which table)
  • WHERE (these conditions hold)

17
Video Functions
  • FindVideoWithObject(o)
  • FindVideoWithActivity(a)
  • FindVideoWithActivityandProp(a,p,z)
  • FindVideoWithObjectandProp(o,p,z)
  • FindObjectsInVideo(v,s,e)
  • FindActivitiesInVideo(v,s,e)
  • FindActivitiesAndPropsInVideo(v,s,e)
  • FindObjectsAndPropsInVideo(v,s,e)

18
A Query Language for Video
  • SELECT may contain
  • Vid_Id s,e
  • FROM may contain
  • video <source>
  • WHERE condition allows statements like
  • term IN func_call
  • (term can be variable, object, activity or
    property value
  • func_call is a video function)
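One way to picture how such video functions could be evaluated is against a table of interval annotations. The storage scheme and data below are a hypothetical illustration; only the function names come from the slides:

```python
# A toy annotation store: each row is
# (video_id, start, end, objects, activities).
ANNOTATIONS = [
    ("vid1", 0, 120, {"Denis Dopeman"}, {"Exchange Object"}),
    ("vid1", 120, 300, {"Jane Shady"}, set()),
    ("vid2", 0, 60, {"Denis Dopeman", "Jane Shady"}, {"Exchange Object"}),
]

def FindVideoWithObject(o):
    """All (video, start, end) triples whose annotations contain object o."""
    return {(v, s, e) for (v, s, e, objs, _) in ANNOTATIONS if o in objs}

def FindObjectsInVideo(v, s, e):
    """All objects annotated on the exact sequence (v, s, e)."""
    return set().union(*[objs for (v2, s2, e2, objs, _) in ANNOTATIONS
                         if (v2, s2, e2) == (v, s, e)])

# EXAMPLE 1 in plain Python: sequences containing Denis Dopeman
print(sorted(FindVideoWithObject("Denis Dopeman")))
```

A WHERE clause like `(vid,s,e) IN func_call` then corresponds to membership tests over these returned sets, with AND as set intersection.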

19
EXAMPLE 1
  • Find all video sequences from the library
    CrimeVidLib1 that contain Denis Dopeman
  • ?
  • SELECT vid s,e
  • FROM video CrimeVidLib1
  • WHERE
  • (vid,s,e) IN FindVideoWithObject(Denis Dopeman)

20
EXAMPLE 2
  • Find all video sequences from the library
    CrimeVidLib1 that show Jane Shady giving Denis
    Dopeman a suitcase

21
EXAMPLE 2
  • SELECT vid s,e
  • FROM video CrimeVidLib1
  • WHERE
  • (vid,s,e) IN FindVideoWithObject(Denis Dopeman)
    AND
  • (vid,s,e) IN FindVideoWithObject(Jane Shady) AND
  • (vid,s,e) IN FindVideoWithActivityandProp(Exchange
    Object, Item, Briefcase) AND
  • (vid,s,e) IN FindVideoWithActivityandProp(Exchange
    Object, Giver, Jane Shady) AND
  • (vid,s,e) IN FindVideoWithActivityandProp(Exchange
    Object, Receiver, Denis Dopeman)

22
EXAMPLE 3
  • Which people have been seen with Denis Dopeman
    in CrimeVidLib1

23
EXAMPLE 3
  • SELECT vid s,e, Object
  • FROM video CrimeVidLib1
  • WHERE
  • (vid,s,e) IN FindVideoWithObject(Denis Dopeman)
    AND
  • Object IN FindObjectsInVideo(vid,s,e) AND
  • Object ≠ Denis Dopeman AND
  • typeof(Object, Person)

24
EXERCISE
  • Express the following in Subrahmanian's Video SQL
  • Find all the sequences showing Tony Blair
  • Find all the sequences showing Tony Blair eating
    a donut
  • Find all the sequences showing Tony Blair with
    Edwina Currie, wearing a black shirt in a
    nightclub
  • Further Reading
  • Subrahmanian, Principles of Multimedia Database
    Systems, pp. 191-195

25
PART 3: Automatic Video Content Analysis
  • Can a machine understand the content of a video
    data stream?
  • Similar challenges/limitations as for still
    images
  • However, systems must also track objects between
    frames (this might provide extra information for
    object segmentation / identification)
  • Also need to recognise events in terms of the
    actions of several objects

26
The Scope of Automatic Video Content Analysis
  • What can be automated?
  • Region (object) segmentation within frames
  • Interval segmentation
  • Recognition of camera actions and editing
    techniques
  • Extraction of representative key-frames
  • Extraction of visual features for
    indexing-retrieval visual features of frames,
    intervals and / or regions (objects)

27
Video Segmentation
  • Intervals within some kinds of moving images can
    be automatically detected
  • Algorithms look for sudden changes in visual
    features between successive frames, e.g. a
    sudden change in colour as the background changes
    between scenes, or a sudden change in motion from
    a car chase to a conversation
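A minimal sketch of this idea, assuming each frame has already been reduced to a normalised colour histogram (the threshold value is illustrative):

```python
def shot_boundaries(frame_histograms, threshold=0.5):
    """Flag a shot boundary wherever the colour histogram changes
    sharply between successive frames. Histograms are assumed
    normalised (values sum to 1)."""
    boundaries = []
    for i in range(1, len(frame_histograms)):
        prev, curr = frame_histograms[i - 1], frame_histograms[i]
        # L1 distance between successive histograms
        diff = sum(abs(p - c) for p, c in zip(prev, curr))
        if diff > threshold:
            boundaries.append(i)
    return boundaries

# Two "shots": dark frames then bright frames, with a cut at frame 3
hists = [[0.9, 0.1], [0.85, 0.15], [0.9, 0.1], [0.1, 0.9], [0.15, 0.85]]
print(shot_boundaries(hists))  # [3]
```

Real systems use richer features (motion vectors, edge maps) and adaptive thresholds, but the pairwise-difference structure is the same.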

28
Recognising Camera Actions and Editing Techniques
  • Shots in films may be characterised in terms of
    camera actions and editing techniques, e.g. the
    pan or zoom, slow fade or dissolve
  • Researchers have developed mathematical models of
    how visual features change for these camera
    actions and editing effects, and so in some cases
    they can be recognised automatically
  • This may give some insight into the "mood" or
    "genre" of a film?

29
Extraction of Key-frames
  • In order to produce video summaries it may be
    useful to automatically extract representative
    key-frames from longer sequences
  • Key-frames should have visual features typical
    of the sequence
  • The number of key-frames required depends on how
    much the visual content varies within the sequence
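One simple way to realise both points is greedy selection: keep the first frame, then keep any frame whose visual features differ sufficiently from the last selected key-frame, so more varied sequences yield more key-frames. The feature vectors and threshold below are illustrative:

```python
def key_frames(frame_features, min_dist=0.4):
    """Greedy key-frame selection: keep frame 0, then any frame whose
    feature vector differs by more than min_dist (L1 distance) from
    the most recently selected key-frame."""
    if not frame_features:
        return []
    selected = [0]
    for i in range(1, len(frame_features)):
        last = frame_features[selected[-1]]
        dist = sum(abs(a - b) for a, b in zip(frame_features[i], last))
        if dist > min_dist:
            selected.append(i)
    return selected

# Visually static frames 0-1, then a change of content at frame 2
feats = [[0.9, 0.1], [0.88, 0.12], [0.2, 0.8], [0.22, 0.78]]
print(key_frames(feats))  # [0, 2]
```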

30
PART 4: Video Retrieval Systems
  • Video retrieval systems may be based on:
  • Visual features
  • Manually annotated keywords
  • Keywords extracted from collateral text
  • A combination of these

31
VideoQ (Columbia University, New York)
  • Indexing based on automatically extracted visual
    features, including colour, shape and motion;
    these features are generated for regions/objects
    in sequences within video data streams.
  • Sketch-based queries can specify colour and
    shape, as well as motion over a number of frames
    and spatial relationships with other
    objects/regions.
  • Success depends on how well information needs can
    be expressed as visual features; these may not
    capture all the semantics of a video sequence
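To illustrate the motion part of such sketch-based matching: a sketched trajectory can be compared against stored object trajectories by average point-wise distance. This is a simplified stand-in, not VideoQ's actual matching algorithm:

```python
def trajectory_distance(query, candidate):
    """Average Euclidean distance between corresponding points of a
    sketched motion trajectory and a stored object trajectory
    (both lists of (x, y) points of equal length)."""
    assert len(query) == len(candidate)
    total = 0.0
    for (qx, qy), (cx, cy) in zip(query, candidate):
        total += ((qx - cx) ** 2 + (qy - cy) ** 2) ** 0.5
    return total / len(query)

# Sketched query: object moves left to right.
# Candidate A moves the same way; candidate B moves upward.
sketch = [(0, 0), (1, 0), (2, 0)]
obj_a = [(0, 0.1), (1, 0.0), (2, 0.1)]
obj_b = [(0, 0), (0, 1), (0, 2)]
print(trajectory_distance(sketch, obj_a) < trajectory_distance(sketch, obj_b))  # True
```

Ranking candidates by this distance returns the sequences whose object motion best matches the sketch.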

32
VideoQ sketch-based query
33
Annotating Video Data
  • Content-descriptive metadata for video often
    needs to be manually annotated; this will need
    to be more than just keywords to capture video
    content
  • In some cases video annotation can be automated
    by processing collateral texts ("cross-modal
    information retrieval")

34
Informedia (Carnegie Mellon University)
  • Indexing based on keywords extracted from the
    audio stream and/or subtitles of news and
    documentary programmes.
  • Also combines visual and linguistic features to
    classify video segments 
  • Success depends on how closely the spoken words
    of the presenters relate to what can be seen

35
(No Transcript)
36
Further Reading
  • More on VideoQ at
  • http://www.ctr.columbia.edu/videoq/.index.html
  • More on Informedia at
  • http://www.informedia.cs.cmu.edu/
  • Commercial video retrieval systems:
  • http://www.virage.com/index.cfm
  • http://www.dremedia.com
  • Currently a set of major US research projects in
    the area of Video Analysis and Content
    Exploitation (VACE):
  • http://www.ic-arda.org/InfoExploit/vace/