Video Search: What's New


Transcript and Presenter's Notes

Title: Video Search: What's New


1
Video Search: What's New
  • Gloria Rohmann
  • NYU Libraries
  • October 14, 2005

2
The problem: "I know it's in there somewhere"
  • Gist (what it's about)
  • Genre
  • Style
  • Scenes
  • People
  • Objects
  • Dialogue
  • Soundtrack

3
Video Search: How does it work?
  • Conventional methods: catalogs, databases and
    analog previewing
  • Why digitize?
  • Discovering video structure
  • Automatic and manual indexing
  • Data models, user interfaces
  • Prospects for the future: mobile and web services

4
Conventional Methods: Browse and Search
  • Structured databases
  • AV cataloging (AACR2, MARC 21)
  • Shot lists
  • Asset management systems
  • Pathfinders (librarians, archivists)
  • Embedded markers: hints, chapters, scenes (DVD)
  • Video logging systems
  • Hardware browse/skim: FF, slow-mo, etc.

5
Video Search in Libraries
  • Mainly MARC
  • 245 Title (usually main entry)
  • 300 Description (physical piece)
  • 505 Contents
  • 508 Credits
  • 511 Performer note
  • 520 Summary
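
A minimal sketch of what searching these fields amounts to. The tags and labels are the ones listed above; the flat dictionary record and its contents are hypothetical (a real MARC record also carries indicators and subfields, which are ignored here).

```python
# Tags and labels as listed on the slide.
MARC_VIDEO_FIELDS = {
    "245": "Title",
    "300": "Description",
    "505": "Contents",
    "508": "Credits",
    "511": "Performer note",
    "520": "Summary",
}

def search_record(record, term):
    """Return the labels of the fields whose text contains `term` (case-insensitive)."""
    term = term.lower()
    return [
        MARC_VIDEO_FIELDS.get(tag, tag)
        for tag, value in record.items()
        if term in value.lower()
    ]

# Hypothetical record, simplified to one string per tag.
record = {
    "245": "The science of chocolate",
    "300": "1 videocassette (30 min.)",
    "508": "Presenters: A. Example, B. Example",
    "520": "Two presenters explore how chocolate is made.",
}

print(search_record(record, "chocolate"))
```

A match tells the searcher only that the term occurs somewhere in the record's description, not where in the video it appears.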

6
Sample screen from BobCat video record
505 Contents
520 Summary
650 Subject headings
7
Enhanced metadata: shot lists, transcripts (Open
University video collection)
8
Footage - Opening credits. Chocolate factory
workers. Alan Coxon and Kathy Sykes preparing
food. Man biting into chocolate bar
(0'00"-0'50"). Alan opening fridge and walking over
to Kathy at table. Kathy grating orange. Alan
showing ingredients for cheesecake. Cooking
chocolate. Alan and Kathy breaking chocolate and
smelling it. Breaking chocolate. Kathy tasting
chocolate (0'51"-2'00").
9
Video Pathfinders
10
Asset Management Systems
  • Building digital library collections
  • What metadata (METS, MPEG-21, etc.)?
  • Distribution standards required
  • Not born digital: ingest problem
  • DRM: what drives commercial distribution?

11
Browse and skim: Analog Control (VCRs)
  • Pause, FF, rewind (all VCRs)
  • Some VCRs:
  • Pause and frame-by-frame
  • High-speed picture search (AKA FF)
  • Variable speed picture search
  • Index recording: VCR marks beginning of each
    recording on a tape.

12
Browse and skim DVDs: Digital Advantages
  • Pause, FF, rewind
  • Navigate: frame-by-frame, menus, chapters or tracks
  • Insert markers, repeat play
  • Change audio, subtitle languages, show closed
    captioning
  • Shuttle/scrub onscreen

13
Browse and Skim: Media Players
DVD player clones can be enhanced with SDKs
  • Media Players are DECODERS
  • Pause, FF, rewind
  • Variable speed
  • Navigate: menus, chapters, tracks
  • Insert markers
  • Change audio, subtitles
  • Show closed captioning
  • Shuttle/scrub

14
Media Player Example
DVD player clones can be enhanced with SDKs
File markers added by end-user
Play speed settings: 0.5 >> 3X
Start, stop, pause, rewind to beginning, FF to end, advance by frame
15
What Is Video?
  • Authored video has:
  • Series of still images @ 25-30 fps
  • Structure: frames >> shots >> scenes
  • MODALITIES:
  • (Audio tracks)
  • (Text: captioning, subtitles, etc.)
  • (Graphics: logos, running tickers, etc.)
  • Production metadata: timestamp, datestamp, flash
    on/off

16
Advantages of Digital Video
  • Store and deliver over networks
  • Allow analysis by computers
  • Allow automatic and manual indexing
  • USING:
  • Image processing
  • Signal processing
  • Information visualization

17
Why Compress Video?
  • 1 frame (@ TV brightness): 0.9 megabytes (MB) of
    storage
  • At 29 fps, each second: 26.1 MB of storage
  • 30 minute film: 53 gigabytes (GB) of storage
  • OBJECT: Make file smaller; retain as much
    information as possible
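
The arithmetic on this slide can be checked with a short sketch. The per-frame size and frame rate are the slide's figures; the use of decimal gigabytes (1 GB = 1000 MB) is an assumption.

```python
# Figures taken from the slide; everything else is derived from them.
MB_PER_FRAME = 0.9        # one uncompressed frame, per the slide
FRAMES_PER_SECOND = 29    # frame rate used on the slide
MB_PER_GB = 1000          # assumption: decimal gigabytes

def uncompressed_size_mb(seconds):
    """Storage needed for `seconds` of uncompressed video, in MB."""
    return MB_PER_FRAME * FRAMES_PER_SECOND * seconds

per_second_mb = uncompressed_size_mb(1)
half_hour_gb = uncompressed_size_mb(30 * 60) / MB_PER_GB

print(f"1 second:   {per_second_mb:.1f} MB")
print(f"30 minutes: {half_hour_gb:.1f} GB")
```

With these inputs one second comes to 26.1 MB, matching the slide, while 30 minutes comes to about 47 GB. That is somewhat below the 53 GB quoted, which may reflect rounding or a slightly larger per-frame size in the original calculation. Either way the conclusion is unchanged: uncompressed video is far too large to store or deliver routinely.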

18
Encoding Formats
  • These formats use some kind of compression:
    similar encoding methods, many CODECS, some
    lossy, others lossless
  • AVI: Audio Video Interleave
  • QuickTime
  • MPEG family: MPEG-1, 2, 4
  • H.261: for video conferencing
  • New: H.264, JPEG 2000

19
CODECS
  • Compressor/Decompressor, or Coder/Decoder
  • Produce and work with encoding formats.
  • Central to compression and encoding: perform
    signal and image processing tasks
  • Examples: Cinepak, Indeo, Windows Media Video.
  • MPEG-4: DivX, Xvid, 3ivX are implementations of
    certain compression recommendations of MPEG-4.

20
How Do CODECS Work?
  • Movement creates temporal aliasing: human
    eye/brain fills in the gaps
  • Blurring produced by camera shutter softens edges
  • Modeled by CODECS and algorithms
  • Goal: acceptable facsimile of moving scene

21
Configuring CODECS for analysis
Psychovisual enhancements
Maximum Keyframe Interval
22
What looks best to you?
Segmentation method B
Segmentation method A
Original image
Jermyn, I. Psychovisual Evaluation of Image
Database Retrieval and Image Segmentation
23
Encoding Methods: predictive
  • Sampling: value of function @ regular intervals
    (example: brightness of pixels)
  • Quantization: frequency of sampling (1 in 10 vs.
    1 in 100 frames)
  • Discrete cosine transforms (DCT): an array of data
    (not just one pixel) is transformed into another
    set of values.
  • Inter-frame vs. Intra-frame encoding
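
The DCT bullet can be illustrated with a small sketch: a whole row of brightness samples goes in, and a same-sized set of coefficients comes out. This is an unnormalized 1-D DCT-II written out by hand; video codecs typically apply a 2-D transform to blocks of pixels, so treat this as a simplification. The sample rows are hypothetical.

```python
import math

def dct_ii(samples):
    """Unnormalized 1-D DCT-II of a sequence of numbers."""
    n = len(samples)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(samples))
        for k in range(n)
    ]

# Hypothetical rows of eight pixel brightness values.
flat_row = [10] * 8                              # no detail at all
textured_row = [52, 55, 61, 66, 70, 61, 64, 73]  # some variation

flat_coefficients = dct_ii(flat_row)
textured_coefficients = dct_ii(textured_row)

print([round(abs(c), 3) for c in flat_coefficients])
print([round(abs(c), 3) for c in textured_coefficients])
```

For the flat row, everything except the first coefficient is effectively zero. Smooth image regions therefore need very few values to describe them, which is what makes the transform useful for compression.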

24
Video Structure
  • Video
  • Scene
  • Shot
  • Frame

25
Using Encoding Methods to Discover Structure
26
Shot Boundary Detection
  • Algorithms that compare the similarities between
    nearby frames. When the similarity falls below a
    pre-determined level, the limit of a shot is
    automatically defined.
  • Edge detection
  • Compare color histograms
  • Compare motion vectors
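
The histogram comparison above can be sketched as follows. Frames are simplified to flat lists of 0-255 brightness values; the bin count, the threshold and the use of histogram intersection as the similarity measure are illustrative assumptions, not the method of any particular product.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized brightness histogram of a frame (flat list of 0-255 values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]

def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def shot_boundaries(frames, threshold=0.5):
    """Indices i at which frame i starts a new shot."""
    hists = [histogram(f) for f in frames]
    return [
        i for i in range(1, len(hists))
        if similarity(hists[i - 1], hists[i]) < threshold
    ]

# Toy "video": three dark frames followed by three bright frames.
dark = [20, 25, 30, 22]
bright = [220, 230, 225, 240]
frames = [dark, dark, dark, bright, bright, bright]

print(shot_boundaries(frames))  # [3]
```

A hard cut shows up as a sudden drop in similarity between neighboring frames. Gradual transitions such as fades and dissolves would need a comparison over a longer window.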

27
Revealing Video Structure with Non-linear
Editors
  • Clips are basis for video editing
  • Non-linear editors (like iMovie, Windows Movie
    Maker) can create clips based on keyframes and
    shot boundary detection
  • NLEs can also isolate frames
  • Video logging software works the same way
    (Virage, Scenalyzer Live)

28
Clip Creation with NLEs
29
Spatial and Temporal Segmentation
  • 1. Use shot boundary detection and keyframes to
    define shots; choose representative frames
  • 2. Use CBIR (Content-based Image Retrieval)
    techniques to reveal features in representative
    frames
  • (shapes, colors, textures)

30
CBIR Techniques
  • Images (frames) have no inherent semantic
    meaning: only arrays of pixel intensities
  • Color Retrieval: compare histograms
  • Texture Retrieval: relative brightness of pixel
    pairs
  • Shape Retrieval: Humans recognize objects
    primarily by their shape
  • Retrieval by position within the image
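
As one illustration of the texture bullet, the sketch below scores an image by the average brightness difference between horizontally adjacent pixel pairs, then ranks candidates by how close their score is to the query's. The measure, the function names and the tiny sample images are all assumptions chosen for brevity; real CBIR systems use richer texture descriptors.

```python
def texture_contrast(image):
    """Mean absolute brightness difference between horizontally adjacent pixels.

    `image` is a list of rows, each a list of 0-255 brightness values.
    """
    diffs = [
        abs(row[x] - row[x + 1])
        for row in image
        for x in range(len(row) - 1)
    ]
    return sum(diffs) / len(diffs)

def rank_by_texture(query, candidates):
    """Candidate names ordered from most to least similar texture."""
    target = texture_contrast(query)
    return sorted(
        candidates,
        key=lambda name: abs(texture_contrast(candidates[name]) - target),
    )

# Hypothetical 2x4 frames: a flat region, a checker pattern, a soft gradient.
candidates = {
    "flat": [[100, 100, 100, 100], [100, 100, 100, 100]],
    "checker": [[0, 255, 0, 255], [255, 0, 255, 0]],
    "gradient": [[10, 20, 30, 40], [10, 20, 30, 40]],
}
query = [[50, 50, 52, 50], [50, 51, 50, 50]]  # nearly flat

print(rank_by_texture(query, candidates))  # ['flat', 'gradient', 'checker']
```

The ranking involves no notion of what the images depict. It only compares numbers derived from pixel intensities, which is the point of the first bullet.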

31
MPEG-4: Content-based Encoding
  • Encodes objects that can be tracked from frame to
    frame.
  • Video frames are layers of video object planes
    (VOP).
  • Each VOP is segmented and coded separately
    throughout the shot
  • Background encoded only once.
  • Objects are not defined as to what they
    represent, only their motion, shapes, colors and
    textures, allowing them to be tracked through
    time.
  • Objects and their backgrounds are brought
    together again by the decoder.

32
MPEG-4: Content-based encoding
Video object planes (VOP)
Background encoded only once
Ghanbari, M. (1999) Video Coding: An Introduction
to Standard Codecs
33
AMOS: Tracking Objects Beyond the Frame
http://www.ctr.columbia.edu/dzhong/rtrack/demo.htm
34
Are We Doing Multimedia? Multimodal Indexing
  • Ramesh Jain: To solve multimedia problems, we
    should use as much context as we can.
  • Visual (frames, shots, scenes)
  • Audio (soundtrack, speech recognition)
  • Text (closed captions, subtitles)
  • Context: hyperlinks, etc.
  • IEEE Multimedia. Oct-Nov. 2003
    http://jain.faculty.gatech.edu/media_vision/doing_mm.pdf

35
Multimodal Indexing
Settings, Objects, People
Modalities: Video, audio, text
Snoek, C., Worring, M. Multimodal Indexing: A
Review of the State-of-the-art. Multimedia Tools
and Applications. January 2005
36
Building Video Indexes
  • Same as any indexing process: decide
  • What to index: granularity
  • How to index: modalities (images, audio, etc.)
  • Which features?
  • Discover spatial and temporal structure:
    deconstructing the authoring process
  • Construct data models for access

37
Building Video Indexes: Structured modeling
  • Predict relationship between shots
  • Pattern recognition
  • Hidden Markov Models
  • SVM (support vector machines)
  • Neural networks
  • Relevance feedback via machine learning

38
Data Models for Video IR
  • Based on text (DBMS, MARC)
  • Semi-structured (video XML or hypertext):
    MPEG-7, SMIL
  • Based on context: Yahoo Video, Blinkx, Truveo
  • Multimodal: MARVEL, Virage

39
Virage VideoLogger™
Mark and annotate clips
SMPTE timecode
Keyframes
Text or audio extracted automatically
40
Annotation: Metadata Schemes
  • MPEG-7
  • MPEG-21
  • METS
  • SMIL

41
IBM MPEG-7 Annotation Tool
42
MPEG-7 Output from IBM Annotation Tool
Duration of shot in frames
<MediaTime>
  <MediaTimePoint>T00:00:27:20830F30000</MediaTimePoint>
  <MediaIncrDuration mediaTimeUnit="PT1001N30000F">248</MediaIncrDuration>
</MediaTime>
<TemporalDecomposition>
  <VideoSegment>
    <MediaTime>
      <MediaTimePoint>T00:00:31:23953F30000</MediaTimePoint>
    </MediaTime>
    <SpatioTemporalDecomposition>
      <StillRegion>
        <TextAnnotation>
          <FreeTextAnnotation>Indoors</FreeTextAnnotation>
        </TextAnnotation>
        <SpatialLocator>
          <Box mpeg7:dim="2 2">14 15 351 238</Box>
        </SpatialLocator>
      </StillRegion>
Location and dimension of spatial locator in
pixels
Annotation
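
A short sketch of how the time values in this output can be read. It assumes the time point is written as T00:00:27:20830F30000 (hours, minutes, seconds, then a count of 1/30000-second fractions) and that the unit PT1001N30000F means 1001/30000 of a second per increment, roughly one frame at 29.97 fps. The function names are illustrative.

```python
import re
from fractions import Fraction

TIME_POINT = re.compile(r"T(\d+):(\d+):(\d+):(\d+)F(\d+)")
TIME_UNIT = re.compile(r"PT(\d+)N(\d+)F")

def time_point_seconds(text):
    """Convert 'Thh:mm:ss:nnnFNNN' to seconds (nnn counts 1/NNN-second fractions)."""
    h, m, s, n, per_second = map(int, TIME_POINT.fullmatch(text).groups())
    return h * 3600 + m * 60 + s + Fraction(n, per_second)

def duration_seconds(count, unit):
    """Convert `count` increments of a 'PTnNmF' unit (n/m seconds each) to seconds."""
    n, per_second = map(int, TIME_UNIT.fullmatch(unit).groups())
    return count * Fraction(n, per_second)

start = time_point_seconds("T00:00:27:20830F30000")
length = duration_seconds(248, "PT1001N30000F")

print(f"shot starts at {float(start):.3f} s and lasts {float(length):.3f} s")
```

Under these assumptions the shot begins about 27.7 seconds into the video and its 248 frames last a little over 8 seconds.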
43
Browse Video Surrogates
44
SMIL: Hypertext and Hypermedia
<window type="generic" duration="13000" height="480" width="320" underline_hyperlinks="true" />
<font face="arial" size="2">
<ol>
<li><a href="command:seek(00)" target="_player">Intro</a></li>
<br/>
<li><a href="command:seek(210)" target="_player">Q1 to Kerry</a>,
<a href="command:seek(426)" target="_player">Bush rebuttal</a></li>
45
Scholarly Primitives
  • Low-level methods for higher-level research
  • Discovering
  • Annotating
  • Comparing
  • Referring
  • Sampling
  • Illustrating
  • Representing

Unsworth, John. (2000) Scholarly Primitives:
what methods do humanities researchers have in
common, and how might our tools reflect this?
46
User Interfaces for Video IR
  • Discovering
  • Annotating
  • Comparing
  • Referring
  • Sampling
  • Illustrating
  • Representing
  • Browse, query text
  • Browse surrogates
  • Interactive filtering: dynamic query based on
    visual aspects
  • Interactive zooming
  • Interactive distortion
  • Compare results for feedback
  • Annotate results

47
(No Transcript)
48
IBM Research: MARVEL
  • MPEG-7 video search engine
  • Manual annotations are used for machine learning
  • Automatic multimodal indexing
  • Image processing
  • Automatic speech recognition
  • Structured modeling: clustering by comparing
    features

http://www.research.ibm.com/marvel
49
MARVEL demo
50
Video Search on the Web: Yahoo
  • Uses existing (text) metadata
  • Does not analyze content of media stream
  • Horowitz: Web pages are self-describing
  • Analyze the web page around the link
  • Analyze the metadata included in video file
  • Media RSS: publishers can add links to multimedia
    within feed

51
(No Transcript)
52
Video Search on the Web: Google
  • Using metadata in the video stream
  • Almost all broadcast news video is closed
    captioned
  • Google ingests video with closed captioning
  • Transcripts are created and linked to time-code
  • Transcripts are indexed
  • Thumbnails grabbed at time intervals
  • Still text-based: thumbnails provide visual
    surrogate
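
The general idea of indexing a time-coded transcript can be sketched as below. This is not a description of Google's system; the caption lines, function names and word-splitting rule are all hypothetical.

```python
import re
from collections import defaultdict

def build_caption_index(captions):
    """Map each lowercase word to the sorted time-codes (seconds) where it occurs.

    `captions` is a list of (start_seconds, text) pairs.
    """
    index = defaultdict(set)
    for start, text in captions:
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(start)
    return {word: sorted(times) for word, times in index.items()}

def search(index, query):
    """Time-codes whose caption line contains every word of the query."""
    words = re.findall(r"[a-z']+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)

# Hypothetical caption lines; not taken from any real broadcast.
captions = [
    (0, "Good evening and welcome"),
    (12, "Tonight we look at Social Security reform"),
    (47, "Critics say security at the border is the priority"),
    (95, "The Social Security trust fund"),
]
index = build_caption_index(captions)

print(search(index, "social security"))  # [12, 95]
```

Each hit is a time-code, so a result can link directly to the moment in the video where the words are spoken, with a thumbnail from that point serving as the visual surrogate.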

53
Results of Google Video Search: "social security"
54
Results of Google Search: screen 2
55
(No Transcript)
56
Opportunities for Research
  • User needs
  • User interfaces
  • Classification and description
  • Metadata: whither standards?