1 Saras: Shareable Rich Media Learning Object Repositories and Management for e-Learning
- Chitra Dorai
- IBM T.J. Watson Research Center
- New York
- dorai_at_us.ibm.com
- (Saras(wati), a Sanskrit word for flow of knowledge / Goddess of Learning)
2 Overview of e-Learning Content Management Research
- Manage learning assets of various types
- Middleware for shareable learning object repositories
- Metadata model creation from XML schema
- E-learning media semantic analysis for metadata generation
- SCORM- and MPEG-7-conformant asset metadata model
- Search and browse client interfaces
[Architecture diagram: a Learning Management System, Learning Authoring Tool, and Search/Browse Client ingest learning objects into a Content Manager with a SCORM/MPEG-7 data model, backed by asset repositories (course catalogs, student assessments; text, images; audio, video); an E-Learning Media Analyzer generates metadata.]
[Diagram: multimodal narrative structure analysis for partitioning of instructional media — video (dialog, interviews, raw footage, text, ...) is partitioned into narration sections, which branch into on-screen narration (discussion sections DD, direct narration DN, assistive narration AN), voice over (uninterrupted UV, interrupted IV), and linkage sections (LF).]
3 Project Goals
- Develop SCORM support technologies
- Enable generic content repositories (CM v8 and DB2) to support standards-compliant e-learning and transform them into shareable and interoperable learning object repositories
- Analyze instructional media for automated SCORM/MPEG-7-compliant metadata generation
4 E-Learning and Standards
- The Department of Defense (DoD) established the Advanced Distributed Learning (ADL) initiative in 1997.
- ADL develops strategy for using learning and information technologies to modernize education and training on the Web, and to promote e-learning standardization.
- SCORM (Sharable Content Object Reference Model): the ADL reference model for sharable learning content objects that enables interoperability, accessibility, and reusability of Web-based learning content.
  - Content Aggregation Model: LO Metadata, Content Packaging
- SCORM builds on many e-learning standardization efforts: AICC, IMS, IEEE LOM (became a standard in 06/02), ARIADNE.
5 SCORM LOM Overview
- Nine learning object metadata categories from the IEEE LOM specification:
  - General, Lifecycle, Meta-metadata, Technical, Educational, Rights, Relation, Annotation, and Classification
- IMS's XML binding specification for metadata representation
- Describes three content model components:
  - Asset, Sharable Content Object (SCO), Content Aggregation
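The nine-category record above can be sketched as a minimal metadata builder. This is illustrative only: the element names loosely follow the shape of the IMS XML binding, but namespaces, vocabularies, and required sub-elements of the real schema are omitted, and `build_lom` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def build_lom(title, description, fmt):
    """Build a minimal, schema-inexact LOM-like record covering two of the
    nine categories (General and Technical)."""
    lom = ET.Element("lom")
    general = ET.SubElement(lom, "general")          # General category
    ET.SubElement(general, "title").text = title
    ET.SubElement(general, "description").text = description
    technical = ET.SubElement(lom, "technical")      # Technical category
    ET.SubElement(technical, "format").text = fmt    # e.g. a MIME type
    return ET.tostring(lom, encoding="unicode")

xml = build_lom("Intro to SCORM", "Overview lecture video", "video/mpeg")
```

In the real binding each category would carry its controlled vocabularies and language-tagged strings; the sketch only shows where asset properties would be slotted.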
6 Enabling Content Repositories for e-Learning
- Objective
  - Develop middleware tools to enable content management products (IBM CM v8) and databases (DB2) for standards-based e-learning archival and for supporting SCORM-compliant learning object metadata.
- Creation of a SCORM-compliant learning object metadata model on a repository
- Automated storage of learning objects and their metadata in the content repository
- Search and retrieval of learning objects based on their metadata
7 E-Learning Content Management with Content Manager
8 Metadata Generation Pages
9 Automated Instructional Media Analysis
- Objectives
  - Develop technologies for standards-based e-learning content tagging, supporting shareable and searchable learning object repositories with rich media.
  - Rich instructional media analysis for automated extraction of learning objects and their metadata from media, for content-based search and browse
10 Problems with the State of the Art
- The user seeks semantic similarity, but the multimedia database can only provide similarity based on low-level data processing
- Existing content annotation/management systems cannot ensure reliable content location and access
- They fall far short of user expectations: the semantic gap
- Generic, low-level annotations deal only with characterizing perceived content, not its meaning
- Lack of structure in content organization for non-linear navigation
11 Our Approach to Media Semantics Analysis
- New Research Approach
  - Computational Media Aesthetics is the algorithmic study of visual and aural elements in media, and the associated analysis of the principles that underlie their manipulation in the creative art of clarifying and interpreting some event for an audience.
  - The best semantic grid for media interpretation is the one within which its creators work: derive meaning from the production grammar and the aesthetic conventions used.
  - Create tools for understanding high-level semantic constructs in a domain by interpreting the data with its maker's eye, exploiting media production methods for their perceptual and interpretive guidance.
- Focus Areas
  - Motion picture analysis for affect and story essence using film grammar (recognized with best paper awards)
  - E-learning: multimodal algorithms to parse and structure audiovisual content in media for content distillation and nonlinear browsing
  - Multigranular media narrative segmentation to generate and annotate reusable assets
[Diagram: a Media Semantic Analyzer produces metadata for the content repository.]
[Example 1: multimodal analysis for extracting a hierarchy of narrative structures in education/training video — narration sections (dialog, interviews, raw footage, text, ...) branch into on-screen narration (DD, DN, AN), voice over (UV, IV), and linkage sections (LF).]
[Example 2: Titanic movie analysis for tempo — tempo ebb and flow and the associated story elements and events are automatically deconstructed.]
12 Example: Narrative Structure Based Segmentation of Education and Training Videos
- Problem statement: automatically structure instructional media through high-level, semantics-based video partitioning and content tagging, for effective segment search, access, and browse services in e-learning content management systems
- Joint work with Dinh Q. Phung and Svetha Venkatesh, Curtin University of Technology, W. Australia
13 Narrative Structures Hierarchy
- Raw footage, text, dialog, interviews, ...
  - Narration Sections
    - On-screen Narration
      - Discussion Sections (DD)
      - Direct Narration (DN)
      - Assistive Narration (AN)
    - Voice Over
      - Un-interrupted VO (UV)
      - Interrupted VO (IV)
    - Linkage Sections (LF)
14 Narrative Structures Hierarchy: Discussion Sections
[Same hierarchy diagram as slide 13, highlighting Discussion Sections]
15 Narrative Structures Hierarchy: On-Screen Narration
[Same hierarchy diagram as slide 13, highlighting On-screen Narration]
16 Narrative Structures Hierarchy: Voice Overs
- The audio track is dominated by the voice of the narrator, but without their appearance (no faces)
- Voice overs are either smooth and continuous (un-interrupted) or interrupted
[Same hierarchy diagram as slide 13, highlighting Voice Over]
17 Narrative Structures Hierarchy: Linkage Sections
[Same hierarchy diagram as slide 13, highlighting Linkage Sections]
18 Visual Processing
- S = {f1, f2, ..., fN}: sequence of frames from shots in a video, used for face detection
- Two frame sequences from a shot are used: a uniformly sampled sequence and a key-frame sequence
- Detect faces in frames using CMU's face detector software
- Feature 1 (how many faces): how many frames contain faces, as a proportion of the total frames in a shot?
- Feature 2 (average face area): if there is a face, how big is it?
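The two shot-level visual features can be sketched as follows. Here `face_areas` is a hypothetical per-frame list of detected face areas, as a face detector such as CMU's would produce; the exact detector output format is an assumption.

```python
def face_features(face_areas):
    """face_areas: one entry per frame in the shot; each entry is a list of
    detected face areas (e.g. in pixels), empty if no face was found.

    Returns (feature1, feature2):
      feature1 -- fraction of frames in the shot that contain a face
      feature2 -- average face area over the frames that contain faces
    """
    n_frames = len(face_areas)
    frames_with_faces = [areas for areas in face_areas if areas]
    feature1 = len(frames_with_faces) / n_frames if n_frames else 0.0
    if frames_with_faces:
        # Average the per-frame mean face area across face-bearing frames.
        feature2 = sum(sum(a) / len(a) for a in frames_with_faces) / len(frames_with_faces)
    else:
        feature2 = 0.0
    return feature1, feature2
```

For example, a four-frame shot with faces of area 900 and 1100 in two frames yields feature1 = 0.5 and feature2 = 1000.0.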
19 Audio Processing
- Characterize the dominance of speech in the audio tracks of shots
- Cluster audio clips into two classes and assume the larger cluster is the one of speech-dominated clips
- N: total number of audio clips within a shot; Nv: number of clips classified as voice-dominated; voice activity Va = Nv / N
- Classify shot audio into voice (V), no-voice (N), or a mixture of the two (M)
- Is the voice consistently delivered?
  - New voice connectivity feature: number of contiguous speech-dominant clips, normalized by the shot length.
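The two audio features can be sketched as below, assuming the clip-level voice/no-voice labels have already been produced by the clustering step, and interpreting "contiguous speech-dominant clips" as the longest run of voice-dominated clips (the slide does not pin this down, so that reading is an assumption).

```python
def audio_features(clip_is_voice):
    """clip_is_voice: per-clip booleans for one shot, True if the clip was
    clustered as voice-dominated.

    Returns (va, connectivity):
      va           -- voice activity, Nv / N
      connectivity -- longest run of contiguous voice-dominated clips,
                      normalized by the shot length in clips
    """
    n = len(clip_is_voice)
    if n == 0:
        return 0.0, 0.0
    nv = sum(clip_is_voice)          # Nv: voice-dominated clips
    va = nv / n                      # Va = Nv / N
    longest = run = 0
    for is_voice in clip_is_voice:
        run = run + 1 if is_voice else 0
        longest = max(longest, run)
    return va, longest / n
```

A shot labeled [voice, voice, no-voice, voice] thus gets Va = 0.75 but connectivity only 0.5, capturing that the narration is interrupted.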
20 Classification
- Decision trees as machine learning classifiers for final labeling of narrative structures
- C4.5 algorithm to train and test decision trees
- First learn all six classes at the first children level and test labeling accuracy
- Propose a two-level decision tree for improved performance
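The two-level scheme can be sketched with scikit-learn's entropy-based trees as a stand-in for C4.5 (CART with entropy splitting is related but not identical to C4.5). The class names and the grouping of UV, IV, and LF into a group G follow the slides; the feature vectors in the demo are made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

GROUP_G = {"UV", "IV", "LF"}  # classes merged at the first level

class TwoLevelTree:
    """First tree separates {DD, DN, AN, G}; second tree resolves G."""
    def __init__(self):
        self.top = DecisionTreeClassifier(criterion="entropy")
        self.sub = DecisionTreeClassifier(criterion="entropy")

    def fit(self, X, y):
        y = np.asarray(y)
        coarse = np.where(np.isin(y, list(GROUP_G)), "G", y)
        self.top.fit(X, coarse)               # level 1: six -> four labels
        mask = np.isin(y, list(GROUP_G))
        self.sub.fit(X[mask], y[mask])        # level 2: UV vs IV vs LF
        return self

    def predict(self, X):
        out = self.top.predict(X)
        mask = out == "G"
        if mask.any():
            out[mask] = self.sub.predict(X[mask])
        return out

# Tiny illustrative fit on hypothetical (face proportion, voice activity) pairs:
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
              [0.1, 0.9], [0.2, 0.8], [0.15, 0.7]])
y = ["DD", "DN", "AN", "UV", "IV", "LF"]
preds = TwoLevelTree().fit(X, y).predict(X)
```

Splitting the problem this way lets the second tree specialize on the voice-over/linkage classes that the single six-way tree confuses.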
21 Experimental Results: Confusion Matrix for Six Classes
[Confusion matrix table omitted]
- The average classification result is high: 91.6%
22 Experimental Results (cont.)
- VO with the presence of many faces (meetings, parties, ...) accounts for most of the misclassification
- Results are very good for classes DD, DN, AN, and UV, but poor for classes IV and LF
- Solution: group IV, LF, and UV into a group G and study it separately
23 Experimental Results (cont.)
24 Experimental Results (cont.)
- Over-fitting is the problem identified in G, due to UV instances outnumbering IV and LF
- To mitigate this, reduce the number of UV instances so that the numbers of IV, UV, and LF instances are approximately the same, then retrain with C4.5
25 Conclusion
- Novel narrative structure based analysis for segmentation of education and training videos
- The hierarchical decision-tree classification system achieves an overall accuracy of 84.7%
- Focus on higher-level semantics, such as segmentation by topic
- Work is underway:
  - Map media objects to LOs
  - Algorithms to support both SCORM- and MPEG-7-compliant XML metadata
26 Acknowledgements
- Team
- Geetika Tewari (IBM TJW, currently at Harvard U)
- Norman Haas (IBM TJW)
- Austin Schilling (IBM SWG)