Title: Lecture 10: Metadata for Media
1Lecture 10: Metadata for Media
SIMS 202 Information Organization and Retrieval
- Prof. Ray Larson and Prof. Marc Davis
- UC Berkeley SIMS
- Tuesday and Thursday 10:30 am - 12:00 pm
- Fall 2003
- http://www.sims.berkeley.edu/academics/courses/is202/f03/
2Today's Agenda
- Review of Last Time
- Metadata for Motion Pictures
- Representing Video
- Current Approaches
- Media Streams
- Discussion Questions
- Action Items for Next Time
4The Media Opportunity
- Vastly more media will be produced
- Without ways to manage it (metadata creation and use) we lose the advantages of digital media
- Most current approaches are insufficient and perhaps misguided
- Great opportunity for innovation and invention
- Need interdisciplinary approaches to the problem
5What is the Problem?
- Today people cannot easily find, edit, share, and reuse media
- Computers don't understand media content
- Media is opaque and data rich
- We lack structured representations
- Without content representation (metadata), manipulating digital media will remain like word-processing with bitmaps
6Traditional Media Production Chain
[Figure: Metadata-Centric Production Chain, with the labels METADATA, PRE-PRODUCTION, PRODUCTION, POST-PRODUCTION, and DISTRIBUTION]
7Automated Media Production Process
8Technology Summary
- Media Streams provides a framework for creating metadata throughout the media production cycle to make media assets searchable and reusable
- Active Capture automates direction and cinematography using real-time audio-video analysis in an interactive control loop to create reusable media assets
- Adaptive Media uses adaptive media templates and automatic editing functions to mass customize and personalize media and thereby eliminate the need for editing on the part of end users
- Together, these technologies will automate, personalize, and speed up media production, distribution, and reuse
9Active Capture
10Active Capture: Reusable Shots
11Marc Davis in Godzilla Scene
12Evolution of Media Production
- Customized production
  - Skilled creation of one media product
- Mass production
  - Automatic replication of one media product
- Mass customization
  - Skilled creation of adaptive media templates
  - Automatic production of customized media
13Central Idea: Movies as Programs
- Movies change from being static data to programs
- Shots are inputs to a program that computes new media based on content representation and functional dependency (US Patents 6,243,087 and 5,969,716)
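One way to read "movies as programs" is that a movie becomes a function from annotated shots to an edited sequence. The sketch below illustrates that reading only; it is not the method described in the patents. The `Shot` record, the descriptor strings, and the `make_movie` template function are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """A hypothetical annotated shot: an identifier plus content descriptors."""
    shot_id: str
    descriptors: frozenset


def make_movie(template, shots):
    """Fill each template slot with the first shot whose descriptors satisfy it.

    `template` is a list of descriptor sets, one per slot. The output depends
    on which shots are supplied, so the same template yields a different
    movie for different inputs.
    """
    movie = []
    for required in template:
        match = next((s for s in shots if required <= s.descriptors), None)
        if match is None:
            raise LookupError(f"no shot satisfies {sorted(required)}")
        movie.append(match.shot_id)
    return movie


# Assumed example data, not taken from the lecture.
template = [frozenset({"scream"}), frozenset({"monster", "approach"})]
shots_a = [
    Shot("marc_scream", frozenset({"scream", "person:marc"})),
    Shot("godzilla_walk", frozenset({"monster", "approach"})),
]
shots_b = [
    Shot("ray_scream", frozenset({"scream", "person:ray"})),
    Shot("godzilla_walk", frozenset({"monster", "approach"})),
]
movie_a = make_movie(template, shots_a)
movie_b = make_movie(template, shots_b)
```

The template never names a particular shot. It names the content a slot needs, which is what lets one template be reused with different people's footage.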
15Representing Video
- Streams vs. Clips
- Video syntax and semantics
- Ontological issues in video representation
16Video is Temporal
17Streams vs. Clips
18Stream-Based Representation
- Makes annotation pay off
  - The richer the annotation, the more numerous the possible segmentations of the video stream
- Clips
  - Change from being fixed segmentations of the video stream, to being the results of retrieval queries based on annotations of the video stream
- Annotations
  - Create representations which make clips, not representations of clips
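The idea that clips are query results rather than stored segments can be sketched with time-interval annotations. This is a minimal illustration, assuming each annotation is a descriptor plus a start and end time in seconds; the `Annotation` class, the `clips_for` function, and the sample data are hypothetical and do not come from Media Streams itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A descriptor attached to a time span (in seconds) of the video stream."""
    descriptor: str
    start: float
    end: float


def clips_for(query, annotations):
    """Return (start, end) spans during which every queried descriptor holds.

    Clips are computed by intersecting annotation intervals, so no fixed
    segmentation of the stream is stored anywhere.
    """
    spans = None
    for descriptor in query:
        current = [(a.start, a.end) for a in annotations
                   if a.descriptor == descriptor]
        if spans is None:
            spans = current
        else:
            spans = [(max(s1, s2), min(e1, e2))
                     for s1, e1 in spans for s2, e2 in current
                     if max(s1, s2) < min(e1, e2)]
    return sorted(spans or [])


# Assumed annotations on a single 40-second stream.
stream = [
    Annotation("Jack", 0, 40),
    Annotation("walking", 5, 30),
    Annotation("waving", 10, 20),
    Annotation("waving", 25, 35),
]
walking_and_waving = clips_for(["walking", "waving"], stream)
```

Adding one more annotation layer to `stream` creates new possible clips without touching any existing one, which is the sense in which richer annotation yields more segmentations.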
19Video Syntax and Semantics
- The Kuleshov Effect
- Video has a dual semantics
  - Sequence-independent invariant semantics of shots
  - Sequence-dependent variable semantics of shots
20Ontological Issues for Video
- Video plays with rules for identity and continuity
  - Space
  - Time
  - Person
  - Action
21Space and Time: Actual vs. Inferable
- Actual Recorded Space and Time
  - GPS
  - Studio space and time
- Inferable Space and Time
  - Establishing shots
  - Cues and clues
22Time: Temporal Durations
- Story (Fabula) Duration
  - Example: Brushing teeth in story world (5 minutes)
- Plot (Syuzhet) Duration
  - Example: Brushing teeth in plot world (1 minute: 6 steps of 10 seconds each)
- Screen Duration
  - Example: Brushing teeth (10 seconds: 2 shots of 5 seconds each)
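The three durations in the brushing-teeth example can be kept side by side in one record. The sketch below only restates the slide's arithmetic; the `ActionDurations` class and its `screen_compression` method are hypothetical names, not part of any described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionDurations:
    """Three durations, in seconds, for one depicted action."""
    story: float   # how long the action takes in the story world (fabula)
    plot: float    # how long the plot (syuzhet) accounts for
    screen: float  # how long it actually runs on screen

    def screen_compression(self):
        """Story seconds represented by each second of screen time."""
        return self.story / self.screen


brushing_teeth = ActionDurations(
    story=5 * 60,   # 5 minutes
    plot=6 * 10,    # 6 steps of 10 seconds each = 1 minute
    screen=2 * 5,   # 2 shots of 5 seconds each = 10 seconds
)
```

In this example one second of screen time stands for thirty seconds of story time, so an annotation that stores only the screen duration would lose most of what a viewer infers.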
23Character and Continuity
- Identity of character is constructed through
  - Continuity of actor
  - Continuity of role
- Alternative continuities
  - Continuity of actor only
  - Continuity of role only
24Representing Action
- Physically-based description for sequence-independent action semantics
  - Abstract vs. conventionalized descriptions
  - Temporally and spatially decomposable actions and subactions
- Issues in describing sequence-dependent action semantics
  - Mental states (emotions vs. expressions)
  - Cultural differences (e.g., bowing vs. greeting)
25Cinematic Actions
- Cinematic actions support the basic narrative structure of cinema
  - Reactions/Proactions
    - Nodding, screaming, laughing, etc.
  - Focus of Attention
    - Gazing, head turning, pointing, etc.
  - Locomotion
    - Walking, running, etc.
- Cinematic actions can occur
  - Within the frame/shot boundary
  - Across the frame boundary
  - Across shot boundaries
27The Search for Solutions
- Current approaches to creating metadata don't work
  - Signal-based analysis
  - Keywords
  - Natural language
- Need standardized metadata framework
  - Designed for video and rich media data
  - Human and machine readable and writable
  - Standardized and scalable
  - Integrated into media capture, archiving, editing, distribution, and reuse
28Signal-Based Parsing
- Practical problem
  - Parsing unstructured, unknown video is very, very hard
- Theoretical problem
  - Mismatch between percepts and concepts
29Perceptual/Conceptual Issue
Similar Percepts / Dissimilar Concepts
- Clown nose vs. red sun
30Perceptual/Conceptual Issue
Dissimilar Percepts / Similar Concepts
- John Dillinger's car vs. Timothy McVeigh's car
31Signal-Based Parsing
- Effective and useful automatic parsing
  - Video
    - Shot boundary detection
    - Camera motion analysis
    - Low level visual similarity
    - Feature tracking
    - Face detection
  - Audio
    - Pause detection
    - Audio pattern matching
    - Simple speech recognition
    - Speech vs. music detection
- Approaches to automated parsing
  - At the point of capture, integrate the recording device, the environment, and agents in the environment into an interactive system
  - After capture, use human-in-the-loop algorithms to leverage human and machine intelligence
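Shot boundary detection, the first item in the video list above, is commonly illustrated by comparing gray-level histograms of consecutive frames. The sketch below shows that generic textbook approach, not a method named in the lecture. Frames are assumed to be equal-length lists of 0-255 grayscale values, and the bin count and threshold are arbitrary, untuned choices.

```python
def histogram(frame, bins=4, max_value=256):
    """Count grayscale pixel values (0..max_value-1) into equal-width bins."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return counts


def shot_boundaries(frames, threshold=0.5):
    """Return indices i where frame i is judged to start a new shot.

    A boundary is declared when the normalized histogram difference between
    consecutive frames exceeds `threshold` (an assumed, untuned value).
    """
    boundaries = []
    for i in range(1, len(frames)):
        prev, curr = histogram(frames[i - 1]), histogram(frames[i])
        diff = sum(abs(p - c) for p, c in zip(prev, curr))
        # For equal-sized frames, dividing by 2 * pixel count maps the
        # difference into the range [0, 1].
        if diff / (2 * len(frames[i])) > threshold:
            boundaries.append(i)
    return boundaries


# Toy "video": three dark frames followed by two bright frames.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
frames = [dark, dark, dark, bright, bright]
cuts = shot_boundaries(frames)
```

The detector only reports where the signal changes. It says nothing about what either shot depicts, which is the percept/concept gap raised earlier.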
32Keywords vs. Semantic Descriptors
dog, biting, Steve
34Why Keywords Don't Work
- Are not a semantic representation
- Do not describe relations between descriptors
- Do not describe temporal structure
- Do not converge
- Do not scale
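The "dog, biting, Steve" keywords show the missing relations directly. In the small sketch below, the tuple layout (subject, action, object, start, end) and the time values are hypothetical, chosen only to make the contrast concrete.

```python
# Two different events described with the same keywords.
keywords_dog_bites_steve = {"dog", "biting", "Steve"}
keywords_steve_bites_dog = {"Steve", "biting", "dog"}

# A bag of keywords cannot tell the two events apart.
same_keywords = keywords_dog_bites_steve == keywords_steve_bites_dog

# A relational descriptor records who does what to whom, and when.
# The (subject, action, object, start, end) layout is a hypothetical format.
relation_dog_bites_steve = ("dog", "biting", "Steve", 12.0, 14.5)
relation_steve_bites_dog = ("Steve", "biting", "dog", 12.0, 14.5)
same_relations = relation_dog_bites_steve == relation_steve_bites_dog
```

Here `same_keywords` is true while `same_relations` is false: only the relational form distinguishes the dog biting Steve from Steve biting the dog.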
35Natural Language vs. Visual Language
Jack, an adult male police officer, while walking
to the left, starts waving with his left arm, and
then has a puzzled look on his face as he turns
his head to the right; he then drops his facial
expression and stops turning his head,
immediately looks up, and then stops looking up
after he stops waving but before he stops
walking.
37Notation for Time-Based Media: Music
38Visual Language Advantages
- A language designed as an accurate and readable representation of time-based media
- For video, especially important for actions, expressions, and spatial relations
- Enables Gestalt view and quick recognition of descriptors due to designed visual similarities
- Supports global use of annotations
40After Capture: Media Streams
41Media Streams Features
- Key features
  - Stream-based representation (better segmentation)
  - Semantic indexing (what things are similar to)
  - Relational indexing (who is doing what to whom)
  - Temporal indexing (when things happen)
  - Iconic interface (designed visual language)
  - Universal annotation (standardized markup schema)
- Key benefits
  - More accurate annotation and retrieval
  - Global usability and standardization
  - Reuse of rich media according to content and structure
42Media Streams GUI Components
- Media Time Line
- Icon Space
- Icon Workshop
- Icon Palette
43Media Time Line
- Visualize video at multiple time scales
- Write and read multi-layered iconic annotations
- One interface for annotation, query, and
composition
44Media Time Line
45Icon Space
- Icon Workshop
  - Utilize categories of video representation
  - Create iconic descriptors by compounding iconic primitives
  - Extend set of iconic descriptors
- Icon Palette
  - Dynamically group related sets of iconic descriptors
  - Reuse descriptive effort of others
  - View and use query results
46Icon Space
47Icon Space: Icon Workshop
- General to specific (horizontal)
  - Cascading hierarchy of icons with increasing specificity on subordinate levels
- Combinatorial (vertical)
  - Compounding of hierarchically organized icons across multiple axes of description
48Icon Space: Icon Workshop Detail
49Icon Space: Icon Palette
- Dynamically group related sets of iconic descriptors
- Collect icon sentences
- Reuse descriptive effort of others
50Icon Space: Icon Palette Detail
51Video Retrieval In Media Streams
- Same interface for annotation and retrieval
- Assembles responses to queries as well as finds them
- Query responses use semantics to degrade gracefully
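Graceful degradation can be sketched as generalizing a query step by step through a descriptor hierarchy when no exact match exists. The hierarchy, the `ancestors` and `retrieve` functions, and the shot data below are hypothetical; the slides do not specify how Media Streams actually ranks results.

```python
# Hypothetical general-to-specific hierarchy: child descriptor -> parent.
PARENT = {
    "walking": "locomotion",
    "running": "locomotion",
    "locomotion": "action",
    "waving": "gesture",
    "gesture": "action",
}


def ancestors(descriptor):
    """Return the descriptor followed by its increasingly general parents."""
    chain = [descriptor]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def retrieve(query, annotated_shots):
    """Return (shot_id, generalization_steps) pairs, best matches first.

    A shot matches at step k if the query generalized k times equals the
    shot's descriptor or one of its ancestors. Exact matches have k == 0.
    """
    results = []
    for shot_id, descriptor in annotated_shots.items():
        shot_chain = ancestors(descriptor)
        for steps, general in enumerate(ancestors(query)):
            if general in shot_chain:
                results.append((shot_id, steps))
                break
    return sorted(results, key=lambda pair: pair[1])


# Assumed annotations: one descriptor per shot, for simplicity.
shots = {"s1": "walking", "s2": "waving", "s3": "running"}
ranked = retrieve("running", shots)
```

A query for "running" returns the running shot first, then the walking shot (which shares "locomotion"), then the waving shot (which shares only "action"), instead of returning nothing when the exact descriptor is absent.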
52Media Streams Technologies
- Minimal video representation distinguishing syntax and semantics
- Iconic visual language for annotating and retrieving video content
- Retrieval-by-composition methods for repurposing video
53Non-Technical Challenges
- Standardization of media metadata (MPEG-7)
- Broadband infrastructure and deployment
- Intellectual property and economic models for
sharing and reuse of media assets
55Discussion Questions (Davis)
- John Snydal on Media Streams
  - What is the target audience of users (annotators/retrievers) for Media Streams? In the article the following groups are mentioned:
    - Content providers
    - Video editors
    - News teams
    - Documentary film makers
    - Film archives
    - Stock photo houses
    - Video archivists
    - Video producers
    - (international audience)
    - (illiterate and preliterate people)
  - Is it possible that Media Streams could satisfy the needs, goals and requirements of all of these groups, or would it be more appropriate to develop separate, tailored applications for the unique needs of each group?
56Discussion Questions (Davis)
- danah boyd on Media Streams
- Icons require visual literacy. Icons are also
culturally constructed. Thus, for them to work as
an information access bit, people must learn the
visual language; it is not inherent. What are
the social consequences of a system dependent on
unfamiliar cues?
57Discussion Questions (Davis)
- danah boyd on Media Streams
- Films are constructed narratives. But most
commonplace storytelling is not. Even in a
creative form, people often piece together found
objects instead of finding objects to fit their
story. (Think teenage girls making collages out
of the latest YM.) Storytelling also happens
around media far more than through media (i.e.
telling a story about a picture rather than using
a collection of pictures to tell a story). My
guess is that this social phenomenon goes beyond
the retrieval issues. Do you think that Media
Streams would encourage new behavior regarding
storytelling or will it only be useful for those
with a constructed narrative in mind? Why (not)?
58Discussion Questions (Davis)
- Jesse Mendelsohn on Media Streams
- Media Streams does not allow iconic descriptions
of emotion or scene-interpretation. How would
someone searching stock footage for a
suspenseful scene of two men beating each other
go about doing it? The actual sense of suspense
and the act of beating cannot be iconified.
Does this limit Media Streams' ability or is
there a way around it within its capabilities as
described?
59Discussion Questions (Davis)
- Jesse Mendelsohn on Media Streams
- In order for Media Streams to work well it relies
on the availability of a very large and
extensive resource of well-annotated video. Is
the current annotation process too primitive
and/or time consuming to allow Media Streams to
work to its full potential? Will changing how
Media Streams can be used to annotate video or
changing video annotation methods in general make
Media Streams more effective?
61Assignment 4.1
- Assignment 4.1
- Phone Metadata Design - Part 1
- Due Oct 2
62Next Time
- Database Design (RRL)
- Readings
  - Handouts in Class
    - Database Modeling and Design -- Ch. 2 The ER Model - Basic Concepts (Teorey, T.J.)
    - Logical Database Design and the Relational Model (F. R. McFadden, J. A. Hoffer)