1 Saras: Shareable Rich Media Learning Object Repositories and Management for e-Learning
- Chitra Dorai
- IBM T.J. Watson Research Center
- New York
- dorai_at_us.ibm.com
- (Saras(wati), a Sanskrit word for flow of knowledge / Goddess of Learning)
2 Overview of e-Learning Content Management Research
- Manage learning assets of various types
- Middleware for shareable learning object repositories
- Metadata model creation from XML schema
- E-learning media semantic analysis for metadata generation
- SCORM- and MPEG-7-conformant asset metadata model
- Search and browse client interfaces
[Architecture diagram: a Learning Management System, Learning Authoring Tool, and Search/Browse Client ingest learning objects into a Content Manager with a SCORM/MPEG-7 data model, backed by asset repositories (course catalogs, student assessments; text, images; audio, video); an E-Learning Media Analyzer generates metadata.]
[Diagram: multimodal narrative structure analysis for partitioning of instructional media — video (dialog, interviews, raw footage, text, ...) is partitioned into narration sections, which branch into on-screen narration (discussion sections DD, direct narration DN, assistive narration AN), voice over (uninterrupted UV, interrupted IV), and linkage sections (LF).]
3 Project Goals
- Develop SCORM support technologies
- Enable generic content repositories (CM v8 and DB2) to support standards-compliant e-learning and transform them into shareable and interoperable learning object repositories
- Analyze instructional media for automated SCORM/MPEG-7-compliant metadata generation
4 E-Learning and Standards
- The Department of Defense (DoD) established the Advanced Distributed Learning (ADL) initiative in 1997.
- ADL develops strategy for using learning and information technologies to modernize education and training on the Web, and to promote e-learning standardization.
- SCORM (Sharable Content Object Reference Model): the ADL reference model for sharable learning content objects that enables interoperability, accessibility, and reusability of Web-based learning content.
  - Content Aggregation Model: LO Metadata, Content Packaging
- SCORM builds on many e-learning standardization efforts: AICC, IMS, IEEE LOM (became a standard in 06/02), ARIADNE.
5 SCORM LOM Overview
- Nine learning object metadata categories from the IEEE LOM specification:
  - General, Lifecycle, Meta-metadata, Technical, Educational, Rights, Relation, Annotation, and Classification
- IMS's XML binding specification for metadata representation
- Describes three content model components:
  - Asset, Sharable Content Object (SCO), Content Aggregation
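The nine-category record above can be sketched as a minimal metadata builder. This is illustrative only: the element names loosely follow the shape of the IMS XML binding, but namespaces, vocabularies, and required sub-elements of the real schema are omitted, and `build_lom` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def build_lom(title, description, fmt):
    """Build a minimal, schema-inexact LOM-like record covering two of the
    nine categories (General and Technical)."""
    lom = ET.Element("lom")
    general = ET.SubElement(lom, "general")          # General category
    ET.SubElement(general, "title").text = title
    ET.SubElement(general, "description").text = description
    technical = ET.SubElement(lom, "technical")      # Technical category
    ET.SubElement(technical, "format").text = fmt    # e.g. a MIME type
    return ET.tostring(lom, encoding="unicode")

xml = build_lom("Intro to SCORM", "Overview lecture video", "video/mpeg")
```

In the real binding each category would carry its controlled vocabularies and language-tagged strings; the sketch only shows where asset properties would be slotted.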
6 Enabling Content Repositories for e-Learning
- Objective
  - Develop middleware tools to enable content management products (IBM CM v8) and databases (DB2) for standards-based e-learning archival and for supporting SCORM-compliant learning object metadata.
- Creation of a SCORM-compliant learning object metadata model on a repository
- Automated storage of learning objects and their metadata in the content repository
- Search and retrieval of learning objects based on their metadata
7 E-Learning Content Management with Content Manager
8 Metadata Generation Pages
9 Automated Instructional Media Analysis
- Objectives
  - Develop technologies for standards-based e-learning content tagging, supporting shareable and searchable learning object repositories with rich media.
  - Rich instructional media analysis for automated extraction of learning objects and their metadata from media, for content-based search and browse
10 Problems with the State of the Art
- The user seeks semantic similarity, but the multimedia database can only provide similarity based on low-level data processing
- Existing content annotation/management systems cannot ensure reliable content location and access
- They fall far short of user expectations: the semantic gap
- Generic, low-level annotations deal only with characterizing perceived content, not its meaning
- Lack of structure in content organization for non-linear navigation
11 Our Approach to Media Semantics Analysis
- New Research Approach
  - Computational Media Aesthetics is the algorithmic study of visual and aural elements in media, and the associated analysis of the principles that underlie their manipulation in the creative art of clarifying and interpreting some event for an audience.
  - The best semantic grid for media interpretation is the one within which its creators work: derive meaning from the production grammar and the aesthetic conventions used.
  - Create tools for understanding high-level semantic constructs in a domain by interpreting the data with its maker's eye, exploiting media production methods for their perceptual and interpretive guidance.
- Focus Areas
  - Motion picture analysis for affect and story essence using film grammar (recognized with best paper awards)
  - E-learning: multimodal algorithms to parse and structure audiovisual content in media for content distillation and nonlinear browsing
  - Multigranular media narrative segmentation to generate and annotate reusable assets
[Diagram: a Media Semantic Analyzer produces metadata for the content repository.]
[Example 1: multimodal analysis for extracting a hierarchy of narrative structures in education/training video — narration sections (dialog, interviews, raw footage, text, ...) branch into on-screen narration (DD, DN, AN), voice over (UV, IV), and linkage sections (LF).]
[Example 2: Titanic movie analysis for tempo — tempo ebb and flow and the associated story elements and events are automatically deconstructed.]
12 Example: Narrative Structure Based Segmentation of Education and Training Videos
- Problem statement: automatically structure instructional media through high-level, semantics-based video partitioning and content tagging, for effective segment search, access, and browse services in e-learning content management systems
- Joint work with Dinh Q. Phung and Svetha Venkatesh, Curtin University of Technology, W. Australia
13 Narrative Structures Hierarchy
- Raw footage, text, dialog, interviews, ...
  - Narration Sections
    - On-screen Narration
      - Discussion Sections (DD)
      - Direct Narration (DN)
      - Assistive Narration (AN)
    - Voice Over
      - Un-interrupted VO (UV)
      - Interrupted VO (IV)
    - Linkage Sections (LF)
14 Narrative Structures Hierarchy: Discussion Sections
[Same hierarchy diagram as slide 13, highlighting Discussion Sections]
15 Narrative Structures Hierarchy: On-Screen Narration
[Same hierarchy diagram as slide 13, highlighting On-screen Narration]
16 Narrative Structures Hierarchy: Voice Overs
- The audio track is dominated by the voice of the narrator, but without their appearance (no faces)
- Voice overs are either smooth and continuous (un-interrupted) or interrupted
[Same hierarchy diagram as slide 13, highlighting Voice Over]
17 Narrative Structures Hierarchy: Linkage Sections
[Same hierarchy diagram as slide 13, highlighting Linkage Sections]
18 Visual Processing
- S = {f1, f2, ..., fN}: sequence of frames from shots in a video, used for face detection
- Two frame sequences from a shot are used: a uniformly sampled sequence and a key-frame sequence
- Detect faces in frames using CMU's face detector software
- Feature 1 (how many faces): how many frames contain faces, as a proportion of the total frames in a shot?
- Feature 2 (average face area): if there is a face, how big is it?
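The two shot-level visual features can be sketched as follows. Here `face_areas` is a hypothetical per-frame list of detected face areas, as a face detector such as CMU's would produce; the exact detector output format is an assumption.

```python
def face_features(face_areas):
    """face_areas: one entry per frame in the shot; each entry is a list of
    detected face areas (e.g. in pixels), empty if no face was found.

    Returns (feature1, feature2):
      feature1 -- fraction of frames in the shot that contain a face
      feature2 -- average face area over the frames that contain faces
    """
    n_frames = len(face_areas)
    frames_with_faces = [areas for areas in face_areas if areas]
    feature1 = len(frames_with_faces) / n_frames if n_frames else 0.0
    if frames_with_faces:
        # Average the per-frame mean face area across face-bearing frames.
        feature2 = sum(sum(a) / len(a) for a in frames_with_faces) / len(frames_with_faces)
    else:
        feature2 = 0.0
    return feature1, feature2
```

For example, a four-frame shot with faces of area 900 and 1100 in two frames yields feature1 = 0.5 and feature2 = 1000.0.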
19 Audio Processing
- Characterize the dominance of speech in the audio tracks of shots
- Cluster audio clips into two classes and assume the larger cluster is the one of speech-dominated clips
- N: total number of audio clips within a shot; Nv: number of clips classified as voice-dominated; voice activity Va = Nv / N
- Classify shot audio into voice (V), no-voice (N), or a mixture of the two (M)
- Is the voice consistently delivered?
  - New voice connectivity feature: number of contiguous speech-dominant clips, normalized by the shot length.
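The two audio features can be sketched as below, assuming the clip-level voice/no-voice labels have already been produced by the clustering step, and interpreting "contiguous speech-dominant clips" as the longest run of voice-dominated clips (the slide does not pin this down, so that reading is an assumption).

```python
def audio_features(clip_is_voice):
    """clip_is_voice: per-clip booleans for one shot, True if the clip was
    clustered as voice-dominated.

    Returns (va, connectivity):
      va           -- voice activity, Nv / N
      connectivity -- longest run of contiguous voice-dominated clips,
                      normalized by the shot length in clips
    """
    n = len(clip_is_voice)
    if n == 0:
        return 0.0, 0.0
    nv = sum(clip_is_voice)          # Nv: voice-dominated clips
    va = nv / n                      # Va = Nv / N
    longest = run = 0
    for is_voice in clip_is_voice:
        run = run + 1 if is_voice else 0
        longest = max(longest, run)
    return va, longest / n
```

A shot labeled [voice, voice, no-voice, voice] thus gets Va = 0.75 but connectivity only 0.5, capturing that the narration is interrupted.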
20 Classification
- Decision trees as machine learning classifiers for final labeling of narrative structures
- C4.5 algorithm to train and test decision trees
- First learn all six classes at the first children level and test labeling accuracy
- Propose a two-level decision tree for improved performance
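The two-level scheme can be sketched with scikit-learn's entropy-based trees as a stand-in for C4.5 (CART with entropy splitting is related but not identical to C4.5). The class names and the grouping of UV, IV, and LF into a group G follow the slides; the feature vectors in the demo are made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

GROUP_G = {"UV", "IV", "LF"}  # classes merged at the first level

class TwoLevelTree:
    """First tree separates {DD, DN, AN, G}; second tree resolves G."""
    def __init__(self):
        self.top = DecisionTreeClassifier(criterion="entropy")
        self.sub = DecisionTreeClassifier(criterion="entropy")

    def fit(self, X, y):
        y = np.asarray(y)
        coarse = np.where(np.isin(y, list(GROUP_G)), "G", y)
        self.top.fit(X, coarse)               # level 1: six -> four labels
        mask = np.isin(y, list(GROUP_G))
        self.sub.fit(X[mask], y[mask])        # level 2: UV vs IV vs LF
        return self

    def predict(self, X):
        out = self.top.predict(X)
        mask = out == "G"
        if mask.any():
            out[mask] = self.sub.predict(X[mask])
        return out

# Tiny illustrative fit on hypothetical (face proportion, voice activity) pairs:
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
              [0.1, 0.9], [0.2, 0.8], [0.15, 0.7]])
y = ["DD", "DN", "AN", "UV", "IV", "LF"]
preds = TwoLevelTree().fit(X, y).predict(X)
```

Splitting the problem this way lets the second tree specialize on the voice-over/linkage classes that the single six-way tree confuses.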
21 Experimental Results: Confusion Matrix for Six Classes
[Confusion matrix table omitted]
- The average classification result is high: 91.6%
22 Experimental Results (cont.)
- VO with the presence of many faces (meetings, parties, ...) accounts for most of the misclassification
- Results are very good for classes DD, DN, AN, and UV, but poor for classes IV and LF
- Solution: group IV, LF, and UV into a group G and study it separately
23 Experimental Results (cont.)
24 Experimental Results (cont.)
- Over-fitting is the problem identified in G, due to UV instances outnumbering IV and LF
- To mitigate this, reduce the number of UV instances so that the numbers of IV, UV, and LF instances are approximately the same, then retrain with C4.5
25 Conclusion
- Novel narrative structure based analysis for segmentation of education and training videos
- The hierarchical decision-tree classification system achieves an overall accuracy of 84.7%
- Focus on higher-level semantics, such as segmentation by topic
- Work is underway:
  - Map media objects to LOs
  - Algorithms to support both SCORM- and MPEG-7-compliant XML metadata
26 Acknowledgements
- Team
- Geetika Tewari (IBM TJW, currently at Harvard U)
- Norman Haas (IBM TJW)
- Austin Schilling (IBM SWG)