Lecture 10: Metadata for Media

2003.09.23. IS 202, Fall 2003.

1
Lecture 10: Metadata for Media
SIMS 202: Information Organization and Retrieval

Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2003
http://www.sims.berkeley.edu/academics/courses/is202/f03/

2
Today's Agenda

Review of Last Time
Metadata for Motion Pictures
Representing Video
Current Approaches
Media Streams
Discussion Questions
Action Items for Next Time

3
Today's Agenda

Review of Last Time
Metadata for Motion Pictures
Representing Video
Current Approaches
Media Streams
Discussion Questions
Action Items for Next Time

4
The Media Opportunity

Vastly more media will be produced
Without ways to manage it (metadata creation and
use) we lose the advantages of digital media
Most current approaches are insufficient and
perhaps misguided
Great opportunity for innovation and invention
Need interdisciplinary approaches to the problem

5
What is the Problem?

Today people cannot easily find, edit, share, and
reuse media
Computers don't understand media content
Media is opaque and data rich
We lack structured representations
Without content representation (metadata),
manipulating digital media will remain like
word-processing with bitmaps

6
Traditional Media Production Chain
Metadata-Centric Production Chain
[Diagram labels: PRE-PRODUCTION, PRODUCTION, POST-PRODUCTION, DISTRIBUTION; METADATA]
7
Automated Media Production Process
8
Technology Summary

Media Streams provides a framework for creating
metadata throughout the media production cycle to
make media assets searchable and reusable
Active Capture automates direction and
cinematography using real-time audio-video
analysis in an interactive control loop to create
reusable media assets
Adaptive Media uses adaptive media templates and
automatic editing functions to mass customize and
personalize media and thereby eliminate the need
for editing on the part of end users
Together, these technologies will automate,
personalize, and speed up media production,
distribution, and reuse

9
Active Capture
10
Active Capture: Reusable Shots
11
Marc Davis in Godzilla Scene
12
Evolution of Media Production

Customized production
Skilled creation of one media product
Mass production
Automatic replication of one media product
Mass customization
Skilled creation of adaptive media templates
Automatic production of customized media

13
Central Idea: Movies as Programs

Movies change from being static data to programs
Shots are inputs to a program that computes new
media based on content representation and
functional dependency (US Patents 6,243,087 and
5,969,716)

14
Today's Agenda

Review of Last Time
Metadata for Motion Pictures
Representing Video
Current Approaches
Media Streams
Discussion Questions
Action Items for Next Time

15
Representing Video

Streams vs. Clips
Video syntax and semantics
Ontological issues in video representation

16
Video is Temporal
17
Streams vs. Clips
18
Stream-Based Representation

Makes annotation pay off
The richer the annotation, the more numerous the
possible segmentations of the video stream
Clips
Change from being fixed segmentations of the
video stream, to being the results of retrieval
queries based on annotations of the video stream
Annotations
Create representations which make clips, not
representations of clips
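
The idea that clips are query results rather than fixed segments can be sketched in a few lines. This is a minimal illustration, not the Media Streams implementation: the `Annotation` class, the `find_clips` function, and the sample descriptors and times are all hypothetical. Each annotation marks a time interval on the stream, and a query returns the maximal intervals during which every requested descriptor is present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One descriptor attached to a time interval of the stream (hypothetical model)."""
    descriptor: str
    start: int  # seconds from the start of the stream
    end: int


def find_clips(annotations, required):
    """Return merged (start, end) intervals where all required descriptors overlap."""
    required = set(required)
    # Every annotation boundary is a point where the set of active descriptors may change.
    points = sorted({t for a in annotations for t in (a.start, a.end)})
    clips = []
    for lo, hi in zip(points, points[1:]):
        active = {a.descriptor for a in annotations if a.start <= lo and a.end >= hi}
        if required <= active:
            if clips and clips[-1][1] == lo:
                clips[-1] = (clips[-1][0], hi)  # extend the previous clip
            else:
                clips.append((lo, hi))
    return clips


# Hypothetical annotations layered on one 50-second stream.
stream = [
    Annotation("dog", 0, 30),
    Annotation("beach", 10, 50),
    Annotation("Steve", 20, 40),
]

dog_on_beach = find_clips(stream, ["dog", "beach"])
steve_only = find_clips(stream, ["Steve"])
all_three = find_clips(stream, ["dog", "beach", "Steve"])
```

Three annotations already yield three different clips from the same footage; adding a fourth descriptor would add further possible segmentations without re-cutting anything.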

19
Video Syntax and Semantics

The Kuleshov Effect
Video has a dual semantics
Sequence-independent invariant semantics of shots
Sequence-dependent variable semantics of shots

20
Ontological Issues for Video

Video plays with rules for identity and
continuity
Space
Time
Person
Action

21
Space and Time: Actual vs. Inferable

Actual Recorded Space and Time
GPS
Studio space and time
Inferable Space and Time
Establishing shots
Cues and clues

22
Time: Temporal Durations

Story (Fabula) Duration
Example: Brushing teeth in story world (5
minutes)
Plot (Syuzhet) Duration
Example: Brushing teeth in plot world (1 minute:
6 steps of 10 seconds each)
Screen Duration
Example: Brushing teeth (10 seconds: 2 shots of 5
seconds each)
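
The three durations in the brushing-teeth example can be recorded side by side and compared. The sketch below is only an illustration of the arithmetic; the `EventDurations` class and its method names are hypothetical, while the numbers come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventDurations:
    """Story, plot, and screen durations of one event, in seconds (hypothetical model)."""
    story_seconds: int
    plot_seconds: int
    screen_seconds: int

    def story_to_screen(self) -> float:
        """How many seconds of story time each second of screen time stands for."""
        return self.story_seconds / self.screen_seconds

    def plot_to_screen(self) -> float:
        """How many seconds of plot time each second of screen time stands for."""
        return self.plot_seconds / self.screen_seconds


brushing_teeth = EventDurations(
    story_seconds=5 * 60,  # 5 minutes in the story world
    plot_seconds=6 * 10,   # 6 steps of 10 seconds each
    screen_seconds=2 * 5,  # 2 shots of 5 seconds each
)
```

With these figures, ten seconds on screen stand for one minute of plot and five minutes of story, so an annotation scheme needs to say which of the three durations a time value refers to.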

23
Character and Continuity

Identity of character is constructed through
Continuity of actor
Continuity of role
Alternative continuities
Continuity of actor only
Continuity of role only

24
Representing Action

Physically-based description for
sequence-independent action semantics
Abstract vs. conventionalized descriptions
Temporally and spatially decomposable actions and
subactions
Issues in describing sequence-dependent action
semantics
Mental states (emotions vs. expressions)
Cultural differences (e.g., bowing vs. greeting)

25
Cinematic Actions

Cinematic actions support the basic narrative
structure of cinema
Reactions/Proactions
Nodding, screaming, laughing, etc.
Focus of Attention
Gazing, head turning, pointing, etc.
Locomotion
Walking, running, etc.
Cinematic actions can occur
Within the frame/shot boundary
Across the frame boundary
Across shot boundaries

26
Today's Agenda

Review of Last Time
Metadata for Motion Pictures
Representing Video
Current Approaches
Media Streams
Discussion Questions
Action Items for Next Time

27
The Search for Solutions

Current approaches to creating metadata don't
work
Signal-based analysis
Keywords
Natural language
Need standardized metadata framework
Designed for video and rich media data
Human and machine readable and writable
Standardized and scalable
Integrated into media capture, archiving,
editing, distribution, and reuse

28
Signal-Based Parsing

Practical problem
Parsing unstructured, unknown video is very, very
hard
Theoretical problem
Mismatch between percepts and concepts

29
Perceptual/Conceptual Issue
Similar Percepts / Dissimilar Concepts
Clown Nose
Red Sun
30
Perceptual/Conceptual Issue
Dissimilar Percepts / Similar Concepts
John Dillinger's Car
Timothy McVeigh's Car
31
Signal-Based Parsing

Effective and useful automatic parsing
Video
Shot boundary detection
Camera motion analysis
Low level visual similarity
Feature tracking
Face detection
Audio
Pause detection
Audio pattern matching
Simple speech recognition
Speech vs. music detection

Approaches to automated parsing
At the point of capture, integrate the recording
device, the environment, and agents in the
environment into an interactive system
After capture, use human-in-the-loop algorithms
to leverage human and machine intelligence
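
Shot boundary detection, the first item in the video list above, is often introduced through histogram differencing. The sketch below uses that technique as an assumption; the slides do not say which method the lecture had in mind. The function names, the 8-bin histogram, and the 0.5 threshold are all illustrative choices, and frames are modeled as non-empty lists of grayscale values from 0 to 255.

```python
def histogram(frame, bins=8):
    """Normalized grayscale histogram of one frame (a non-empty list of 0-255 values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def shot_boundaries(frames, threshold=0.5, bins=8):
    """Return indices of frames whose histogram differs sharply from the previous frame."""
    cuts = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame, bins)
        if previous is not None:
            # Half the L1 distance between two normalized histograms lies in [0, 1].
            difference = sum(abs(a - b) for a, b in zip(current, previous)) / 2
            if difference > threshold:
                cuts.append(index)
        previous = current
    return cuts


# Toy "video": three dark frames, two bright frames, one dark frame.
dark = [10, 20, 15, 12]
bright = [240, 250, 245, 230]
video = [dark, dark, dark, bright, bright, dark]

cuts = shot_boundaries(video)
```

This finds where the signal changes abruptly but says nothing about what the shots depict, which is the percept/concept mismatch described on the earlier slides.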

32
Keywords vs. Semantic Descriptors
dog, biting, Steve
33
Keywords vs. Semantic Descriptors
dog, biting, Steve
34
Why Keywords Don't Work

Are not a semantic representation
Do not describe relations between descriptors
Do not describe temporal structure
Do not converge
Do not scale
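
The "dog, biting, Steve" example shows the second point concretely: a bag of keywords cannot say who is biting whom. A small sketch, using a hypothetical `Relation` tuple as the simplest possible relational descriptor:

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A minimal relational descriptor: who does what to whom (hypothetical model)."""
    subject: str
    action: str
    obj: str


def keywords(relation):
    """Flatten a relation into an unordered keyword set, discarding the roles."""
    return frozenset(relation)


dog_bites_steve = Relation("dog", "biting", "Steve")
steve_bites_dog = Relation("Steve", "biting", "dog")

same_keywords = keywords(dog_bites_steve) == keywords(steve_bites_dog)
same_relation = dog_bites_steve == steve_bites_dog
```

Here `same_keywords` is true while `same_relation` is false: the two events are indistinguishable as keyword sets and distinct as relations.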

35
Natural Language vs. Visual Language
Jack, an adult male police officer, while walking
to the left, starts waving with his left arm, and
then has a puzzled look on his face as he turns
his head to the right; he then drops his facial
expression and stops turning his head,
immediately looks up, and then stops looking up
after he stops waving but before he stops
walking.
36
Natural Language vs. Visual Language
Jack, an adult male police officer, while walking
to the left, starts waving with his left arm, and
then has a puzzled look on his face as he turns
his head to the right; he then drops his facial
expression and stops turning his head,
immediately looks up, and then stops looking up
after he stops waving but before he stops
walking.
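
Much of the difficulty in the sentence above comes from packing overlapping time intervals into linear prose. As a rough sketch, the same content can be written as intervals plus a handful of ordering checks. The `Interval` class is hypothetical, and the second values are invented purely so that the relations in the sentence hold; the slide gives no timings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time span in seconds on the video stream (hypothetical model)."""
    start: float
    end: float

    def starts_during(self, other):
        return other.start <= self.start <= other.end

    def ends_before(self, other):
        return self.end < other.end


# Invented timings, chosen only to satisfy the relations in the sentence.
walking = Interval(0, 12)
waving = Interval(2, 8)
turning_head_right = Interval(4, 6)
puzzled_look = Interval(4, 6)
looking_up = Interval(6, 10)

constraints = {
    "starts waving while walking": waving.starts_during(walking),
    "puzzled look comes after he starts waving": waving.start < puzzled_look.start,
    "puzzled look as he turns his head": puzzled_look == turning_head_right,
    "immediately looks up": looking_up.start == turning_head_right.end,
    "stops looking up after he stops waving": waving.ends_before(looking_up),
    "stops looking up before he stops walking": looking_up.ends_before(walking),
}
all_hold = all(constraints.values())
```

Laid out on parallel tracks, these five intervals are what a time line notation shows at a glance.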
37
Notation for Time-Based Media: Music
38
Visual Language Advantages

A language designed as an accurate and readable
representation of time-based media
For video, especially important for actions,
expressions, and spatial relations
Enables Gestalt view and quick recognition of
descriptors due to designed visual similarities
Supports global use of annotations

39
Today's Agenda

Review of Last Time
Metadata for Motion Pictures
Representing Video
Current Approaches
Media Streams
Discussion Questions
Action Items for Next Time

40
After Capture: Media Streams
41
Media Streams Features

Key features
Stream-based representation (better segmentation)
Semantic indexing (what things are similar to)
Relational indexing (who is doing what to whom)
Temporal indexing (when things happen)
Iconic interface (designed visual language)
Universal annotation (standardized markup schema)
Key benefits
More accurate annotation and retrieval
Global usability and standardization
Reuse of rich media according to content and
structure

42
Media Streams GUI Components

Media Time Line
Icon Space
Icon Workshop
Icon Palette

43
Media Time Line

Visualize video at multiple time scales
Write and read multi-layered iconic annotations
One interface for annotation, query, and
composition

44
Media Time Line
45
Icon Space

Icon Workshop
Utilize categories of video representation
Create iconic descriptors by compounding iconic
primitives
Extend set of iconic descriptors
Icon Palette
Dynamically group related sets of iconic
descriptors
Reuse descriptive effort of others
View and use query results

46
Icon Space
47
Icon Space: Icon Workshop

General to specific (horizontal)
Cascading hierarchy of icons with increasing
specificity on subordinate levels
Combinatorial (vertical)
Compounding of hierarchically organized icons
across multiple axes of description
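
The two organizing principles above can be mimicked with nested dictionaries: each axis of description is a hierarchy running from general to specific, and a compound descriptor picks one path per axis. This is only a sketch; the axes, the descriptor names, and the functions `paths` and `compound` are hypothetical and do not reproduce the actual Media Streams icon set.

```python
from itertools import product

# Hypothetical hierarchies, one per axis of description.
HIERARCHIES = {
    "character": {"adult": {"male": {}, "female": {}}, "child": {}},
    "action": {"locomotion": {"walking": {}, "running": {}}},
}


def paths(tree, prefix=()):
    """Yield every general-to-specific path in a nested-dict hierarchy."""
    for name, subtree in tree.items():
        path = prefix + (name,)
        yield path
        yield from paths(subtree, path)


def compound(**choices):
    """Build a compound descriptor from one valid path per axis."""
    for axis, path in choices.items():
        if tuple(path) not in set(paths(HIERARCHIES[axis])):
            raise ValueError(f"{path!r} is not a descriptor on axis {axis!r}")
    return {axis: tuple(path) for axis, path in choices.items()}


character_paths = list(paths(HIERARCHIES["character"]))
action_paths = list(paths(HIERARCHIES["action"]))
possible_compounds = len(list(product(character_paths, action_paths)))

walking_man = compound(character=("adult", "male"), action=("locomotion", "walking"))
```

Seven primitive paths across two axes already give twelve compounds, which is the sense in which compounding extends the descriptor set without adding new primitives.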

48
Icon Space: Icon Workshop Detail
49
Icon Space: Icon Palette

Dynamically group related sets of iconic
descriptors
Collect icon sentences
Reuse descriptive effort of others

50
Icon Space: Icon Palette Detail
51
Video Retrieval In Media Streams

Same interface for annotation and retrieval
Assembles responses to queries as well as finds
them
Query responses use semantics to degrade
gracefully
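
One way to read "degrade gracefully" is that a query with no exact match still returns the semantically nearest clips, ranked by closeness. The slides do not describe the actual retrieval algorithm, so the sketch below is an assumption: descriptors are general-to-specific paths, closeness is the length of the shared path prefix, and the clip ids and descriptor names are invented.

```python
def shared_prefix_length(a, b):
    """Count how many leading levels two descriptor paths have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def rank(query, annotated_clips):
    """Return clip ids ordered from closest to most distant match; unrelated clips are dropped."""
    scored = [
        (shared_prefix_length(query, path), clip_id)
        for clip_id, path in annotated_clips.items()
    ]
    ordered = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [clip_id for score, clip_id in ordered if score > 0]


# Hypothetical annotated clips, each described by one path.
clips = {
    "clip_a": ("animal", "mammal", "dog", "terrier"),
    "clip_b": ("animal", "mammal", "dog", "poodle"),
    "clip_c": ("animal", "mammal", "cat"),
    "clip_d": ("vehicle", "car"),
}

results = rank(("animal", "mammal", "dog", "terrier"), clips)
fallback = rank(("animal", "mammal", "dog", "beagle"), clips)
```

A query for a beagle finds no exact match but still returns the two dog clips first and the cat clip last, rather than returning nothing.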

52
Media Streams Technologies

Minimal video representation distinguishing
syntax and semantics
Iconic visual language for annotating and
retrieving video content
Retrieval-by-composition methods for repurposing
video

53
Non-Technical Challenges

Standardization of media metadata (MPEG-7)
Broadband infrastructure and deployment
Intellectual property and economic models for
sharing and reuse of media assets

54
Today's Agenda

Review of Last Time
Metadata for Motion Pictures
Representing Video
Current Approaches
Media Streams
Discussion Questions
Action Items for Next Time

55
Discussion Questions (Davis)

John Snydal on Media Streams
What is the target audience of users
(annotators/retrievers) for Media Streams? In the
article, the following groups are mentioned:
Content providers
Video editors
News teams
Documentary film makers
Film archives
Stock photo houses
Video archivists
Video producers
(international audience)
(illiterate and preliterate people)
Is it possible that Media Streams could satisfy
the needs, goals and requirements of all of these
groups, or would it be more appropriate to
develop separate, tailored applications for the
unique needs of each group?

56
Discussion Questions (Davis)

danah boyd on Media Streams
Icons require visual literacy. Icons are also
culturally constructed. Thus, for them to work as
an information access bit, people must learn the
visual language; it is not inherent. What are
the social consequences of a system dependent on
unfamiliar cues?

57
Discussion Questions (Davis)

danah boyd on Media Streams
Films are constructed narratives. But most
commonplace storytelling is not. Even in a
creative form, people often piece together found
objects instead of finding objects to fit their
story. (Think teenage girls making collages out
of the latest YM.) Storytelling also happens
around media far more than through media (i.e.
telling a story about a picture rather than using
a collection of pictures to tell a story). My
guess is that this social phenomenon goes beyond
the retrieval issues. Do you think that Media
Streams would encourage new behavior regarding
storytelling or will it only be useful for those
with a constructed narrative in mind? Why (not)?

58
Discussion Questions (Davis)

Jesse Mendelsohn on Media Streams
Media Streams does not allow iconic descriptions
of emotion or scene-interpretation. How would
someone searching stock footage for a
suspenseful scene of two men beating each other
go about doing it? The actual sense of suspense
and the act of beating cannot be iconified.
Does this limit Media Streams' ability or is
there a way around it within its capabilities as
described?

59
Discussion Questions (Davis)

Jesse Mendelsohn on Media Streams
In order for Media Streams to work well, it relies
on the availability of a very large and
extensive resource of well-annotated video. Is
the current annotation process too primitive
and/or time consuming to allow Media Streams to
work to its full potential? Will changing how
Media Streams can be used to annotate video or
changing video annotation methods in general make
Media Streams more effective?

60
Today's Agenda

Review of Last Time
Metadata for Motion Pictures
Representing Video
Current Approaches
Media Streams
Discussion Questions
Action Items for Next Time

61
Assignment 4.1

Assignment 4.1
Phone Metadata Design - Part 1
Due Oct 2

62
Next Time

Database Design (RRL)
Readings
Handouts in Class
Database Modeling and Design -- Ch. 2 The ER
Model - Basic Concepts (Teorey, T.J.)
Logical Database Design and the Relational Model
(F. R. McFadden, J. A. Hoffer)
