P1253814526wCsfi - PowerPoint PPT Presentation

1
Saras: Shareable Rich Media Learning Object
Repositories and Management for e-Learning
  • Chitra Dorai
  • IBM T.J. Watson Research Center
  • New York
  • dorai_at_us.ibm.com
  • (Saras(wati), a Sanskrit word for flow of
    knowledge/Goddess of Learning)

2
Overview of e-Learning Content Management Research
  • Manage learning assets of various types
  • Middleware for shareable learning object
    repositories
  • Metadata model creation from XML schema
  • E-learning media semantic analysis for metadata
    generation
  • SCORM and MPEG-7 conformant asset metadata model
  • Search and browse client interfaces
  • Multimodal narrative structure analysis for
    partitioning of instructional media
[Architecture diagram: Learning Management System, Learning Authoring Tool, Search/Browse Client and LO ingest; E-Learning Media Analyzer; Content Manager with SCORM / MPEG-7 Data Model and Metadata; three Asset Repositories. Asset types shown: text, images; course catalogs, student assessments; audio, video]
[Diagram: narrative structures hierarchy for video, with classes Discussion (DD), Direct Narration (DN), Assistive Narration (AN), Uninterrupted Voice Over (UV), Interrupted Voice Over (IV) and Linkage Sections (LF)]

3
Project Goals
  • Develop SCORM support technologies
  • Enable generic content repositories (CMv8 and
    DB2) to support standards-compliant e-learning
    and transform them into shareable and
    interoperable learning object repositories
  • Analyze instructional media for automated
    SCORM/MPEG-7 compliant metadata generation

4
E-Learning and Standards
  • The Department of Defense (DoD) established the
    Advanced Distributed Learning (ADL) initiative in
    1997.
  • ADL develops strategy for using learning and
    information technologies to modernize education
    and training on the Web, and to promote
    e-learning standardization.
  • SCORM (Sharable Content Object Reference Model):
    the ADL reference model for shareable learning
    content objects that enable interoperability,
    accessibility and reusability of Web-based
    learning content.
  • Content Aggregation Model: LO Metadata, Content
    Packaging
  • SCORM builds on several e-learning standardization
    efforts: AICC, IMS, IEEE LOM (which became a
    standard in June 2002), and ARIADNE.

5
SCORM LOM Overview
  • Nine learning object metadata categories from
    the IEEE LOM specification: General, Lifecycle,
    Meta-metadata, Technical, Educational, Rights,
    Relation, Annotation, and Classification
  • IMS's XML binding specification for metadata
    representation
  • Describes three content model components: Asset,
    Sharable Content Object (SCO), and Content
    Aggregation
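
To make the nine-category structure concrete, here is a minimal sketch that builds an empty metadata record with one child element per category. The tag names are derived mechanically from the category names on the slide, and the `tag_for` and `build_lom_skeleton` helpers and the placement of `title` under General are assumptions for illustration; none of this has been checked against the IMS XML binding schema.

```python
import xml.etree.ElementTree as ET

# The nine IEEE LOM categories named on the slide.
LOM_CATEGORIES = [
    "General", "Lifecycle", "Meta-metadata", "Technical", "Educational",
    "Rights", "Relation", "Annotation", "Classification",
]


def tag_for(category):
    """Derive a placeholder XML tag from a category name (illustrative only)."""
    return category.lower().replace("-", "")


def build_lom_skeleton(title):
    """Return an <lom> element with one empty child per category."""
    root = ET.Element("lom")
    for category in LOM_CATEGORIES:
        ET.SubElement(root, tag_for(category))
    # Hypothetical placement of a title under the General category.
    ET.SubElement(root.find("general"), "title").text = title
    return root


record = build_lom_skeleton("Sample learning object")
print(ET.tostring(record, encoding="unicode"))
```

A real repository would populate each category from the asset and validate the result against the binding schema before ingest.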

6
Enabling Content Repositories for e-Learning
  • Objective: develop middleware tools to enable
    content management products (IBM CM v8) and
    databases (DB2) for standards-based e-Learning
    archival and for supporting SCORM-compliant
    learning object metadata.
  • Creation of a SCORM-compliant learning object
    metadata model on a repository
  • Automated storage of learning objects and their
    metadata in the content repository
  • Search and retrieval of learning objects based
    on their metadata

7
E-Learning Content Management with Content Manager
8
Meta-data Generation Pages
9
Automated Instructional Media Analysis
  • Objectives
  • Develop technologies for standards-based
    e-learning content tagging, supporting shareable
    and searchable learning object repositories with
    rich media.
  • Rich instructional media analysis for automated
    extraction of learning objects and their metadata
    from media for content-based search and browse

10
Problem with the State of the Art
  • The user seeks semantic similarity, but the
    multimedia database can only provide similarity
    based on data processing
  • Existing content annotation/management systems
    cannot ensure reliable content location and
    access
  • They fall far short of users' expectations:
    the semantic gap
  • Generic, low-level annotations deal only with
    characterizing perceived content, not its
    meaning
  • Lack of structure in content organization for
    non-linear navigation

11
Our Approach to Media Semantics Analysis
  • New Research Approach
  • Computational Media Aesthetics is the
    algorithmic study of visual and aural elements in
    media and associated analysis of the principles
    that underlie their manipulation in the creative
    art of clarifying and interpreting some event for
    an audience.
  • The best semantic grid for media interpretation
    is the one within which its creators work: derive
    meaning from the production grammar and the
    aesthetic conventions used
  • Create tools for understanding high-level
    semantic constructs in a domain by interpreting
    the data with its maker's eye, exploiting media
    production methods for their perceptual and
    interpretive guidance.
  • Focus Areas
  • Motion picture analysis for affect and story
    essence using film grammar (recognized with best
    paper awards)
  • E-learning: multimodal algorithms to parse and
    structure audiovisual content in media for
    content distillation and nonlinear browsing
  • Multigranular media narrative segmentation to
    generate and annotate reusable assets

Example 1 - Multimodal analysis for extracting
hierarchy of narrative structures in
education/training video
[Figure: narrative structures hierarchy for video, with classes DD, DN, AN, UV, IV and LF]
Example 2 - Titanic Movie Analysis for Tempo
Tempo in Titanic: tempo ebb and flow and
associated story elements and events
automatically deconstructed
[Figure: Media Semantic Analyzer, Metadata, Content Repository]
12
Example: Narrative Structure Based Segmentation of
Education and Training Videos
  • Problem statement: automatically structure
    instructional media through high-level
    semantics-based video partitioning and content
    tagging for effective segment search, access,
    and browse services in e-learning content
    management systems
  • Joint work with Dinh Q. Phung and Svetha
    Venkatesh, Curtin University of Technology, W.
    Australia
13
Narrative Structures Hierarchy
  • Narration Sections: On-screen Narration, Voice
    Over
  • On-screen Narration: Direct Narration, Assistive
    Narration
  • Voice Over: Un-interrupted VO, Interrupted VO
  • Dialog, interviews, ...: Discussion sections
  • Raw footage, text, ...: Linkage Sections
14
Narrative Structures Hierarchy: Discussion Sections
[Figure: narrative structures hierarchy, repeated from slide 13]
15
Narrative Structures Hierarchy: On-Screen Narration
[Figure: narrative structures hierarchy, repeated from slide 13]
16
Narrative Structures Hierarchy: Voice Overs
  • The audio track is dominated by the voice of the
    narrator, but the narrator does not appear on
    screen (no faces)
  • Un-interrupted VO: smooth and continuous
  • Interrupted VO: interrupted
[Figure: narrative structures hierarchy, repeated from slide 13]
17
Narrative Structures Hierarchy: Linkage Sections
[Figure: narrative structures hierarchy, repeated from slide 13]
18
Visual Processing
  • S = {f1, f2, ..., fN}: sequence of frames from
    shots in a video, used for face detection
  • Two frame sequences from each shot are used: a
    uniformly sampled sequence and a key-frame
    sequence
  • Detect faces in frames using CMU's face
    detector software
  • Feature 1, how many faces: the number of frames
    that contain faces as a proportion of the total
    frames in a shot
  • Feature 2, average face area: if there is a
    face, how big is it?
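
The two visual features can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a face detector has already returned a list of (width, height) boxes for each sampled frame, and it normalizes face area by frame area, which the slide does not specify. The `face_features` name and its input layout are hypothetical.

```python
def face_features(frame_faces, frame_area):
    """Compute the two per-shot face features.

    frame_faces: one entry per sampled frame, each a list of (width, height)
                 face boxes (assumed output of some face detector).
    frame_area:  frame width * height, used to normalize face area
                 (normalization is an assumption, not stated in the slides).
    Returns (face_ratio, avg_face_area).
    """
    n_frames = len(frame_faces)
    if n_frames == 0:
        return 0.0, 0.0
    # Feature 1: proportion of frames that contain at least one face.
    frames_with_faces = sum(1 for faces in frame_faces if faces)
    face_ratio = frames_with_faces / n_frames
    # Feature 2: mean area of the detected faces, as a fraction of the frame.
    areas = [w * h / frame_area for faces in frame_faces for (w, h) in faces]
    avg_face_area = sum(areas) / len(areas) if areas else 0.0
    return face_ratio, avg_face_area


# Synthetic shot: four sampled frames, two of which contain faces.
shot = [[(40, 50)], [], [(20, 20), (40, 50)], []]
ratio, avg_area = face_features(shot, frame_area=320 * 240)
print(ratio, avg_area)
```

Running the same function over both the uniformly sampled sequence and the key-frame sequence would give two feature pairs per shot.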
19
Audio Processing
  • Characterize the dominance of speech in the audio
    tracks of shots
  • Cluster audio clips into two classes and assume
    that the larger cluster contains the
    speech-dominated clips
  • N: total number of audio clips within a shot
  • Nv: number of clips classified as voice-dominated
  • Va: voice activity, Va = Nv/N
  • Classify shot audio into voice (V), no-voice
    (N) or a mixture of the two (M)
  • Is the voice consistently delivered?
  • New voice connectivity feature: number of
    contiguous speech-dominant clips normalized by
    the shot length.
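
A minimal sketch of the audio features above, assuming each clip in a shot has already been labeled as voice-dominated or not. Voice activity follows Va = Nv/N directly. For voice connectivity the slide leaves the exact definition open, so this sketch takes one reading: the longest run of contiguous voice-dominated clips divided by the number of clips in the shot. The V/N/M thresholds are hypothetical placeholders.

```python
def voice_activity(clip_is_voice):
    """Va = Nv / N for a shot, given one boolean per audio clip."""
    n = len(clip_is_voice)
    return sum(clip_is_voice) / n if n else 0.0


def voice_connectivity(clip_is_voice):
    """Longest run of contiguous voice-dominated clips over the clip count.

    This is one possible reading of the connectivity feature; the slides
    do not give the exact formula.
    """
    longest = current = 0
    for is_voice in clip_is_voice:
        current = current + 1 if is_voice else 0
        longest = max(longest, current)
    return longest / len(clip_is_voice) if clip_is_voice else 0.0


def audio_label(va, low=0.2, high=0.8):
    """Map voice activity to V, N or M using hypothetical thresholds."""
    if va >= high:
        return "V"
    if va <= low:
        return "N"
    return "M"


# Synthetic shot of eight clips.
clips = [True, True, False, True, True, True, False, False]
va = voice_activity(clips)
vc = voice_connectivity(clips)
print(va, vc, audio_label(va))
```

A shot with high voice activity but low connectivity would suggest speech that is repeatedly broken up, which is the distinction the connectivity feature is meant to capture.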

20
Classification
  • Decision trees as machine learning classifiers
    for final labeling of narrative structures
  • C4.5 algorithm to train and test decision trees
  • First learn all six classes at the first level
    of children and test the accuracy of labeling
  • Propose a two-level decision tree for improved
    performance
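
The two-level idea can be sketched as a simple routing step, using the grouping of IV, LF and UV into a group G that the results slides describe. The first classifier chooses among DD, DN, AN and G; a second classifier resolves G. The stub rules and feature names below are hypothetical stand-ins for trained C4.5 trees and are not taken from the presentation.

```python
GROUP_G = "G"  # merged label for IV, LF and UV


def two_level_predict(features, level1, level2):
    """Route a shot through the first-level tree, then the second if needed."""
    label = level1(features)
    if label == GROUP_G:
        return level2(features)
    return label


def level1_stub(f):
    # Hypothetical rule standing in for a trained first-level tree.
    if f["face_ratio"] < 0.1:
        return GROUP_G
    return "DN" if f["voice_activity"] > 0.5 else "DD"


def level2_stub(f):
    # Hypothetical rule standing in for a trained second-level tree over G.
    if f["voice_connectivity"] > 0.7:
        return "UV"
    return "IV" if f["voice_activity"] > 0.3 else "LF"


no_face_shot = {"face_ratio": 0.0, "voice_activity": 0.9, "voice_connectivity": 0.9}
face_shot = {"face_ratio": 0.8, "voice_activity": 0.9, "voice_connectivity": 0.9}
print(two_level_predict(no_face_shot, level1_stub, level2_stub))
print(two_level_predict(face_shot, level1_stub, level2_stub))
```

Keeping the two levels as separate callables means the second tree can be retrained on a rebalanced subset without touching the first.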

21
Experimental Results: Confusion Matrix for Six
Classes
  • Average classification accuracy is high: 91.6%
22
Exp. Results (cont.)
  • VO in the presence of many faces (meetings,
    parties, ...) accounts for most of the
    misclassification
  • Results are very good for classes DD, DN, AN
    and UV, but poor for classes IV and LF
  • Solution: group IV, LF and UV into a group G
    and study it separately
23
Exp. Results (cont.)
24
Exp. Results (cont.)
  • Over-fitting is the problem identified in G,
    because UV instances outnumber IV and LF
    instances
  • To mitigate the problem, reduce the number of UV
    instances so that the numbers of IV, UV and LF
    instances are approximately the same, and train
    with C4.5
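
The rebalancing step can be sketched as random undersampling of the majority class. This is an assumption about how the reduction might be done; the slides do not say how UV instances were selected. The `undersample` helper and the synthetic data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, majority="UV", seed=0):
    """Trim the majority class to the size of the largest other class.

    samples: list of (features, label) pairs.
    Returns a new list; the input is not modified.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    other_sizes = [len(v) for k, v in by_label.items() if k != majority]
    if not other_sizes or majority not in by_label:
        return list(samples)
    target = max(other_sizes)
    kept = by_label[majority]
    if len(kept) > target:
        # Seeded so the selection is reproducible.
        kept = random.Random(seed).sample(kept, target)
    rest = [s for k, v in by_label.items() if k != majority for s in v]
    return rest + kept


# Synthetic group G: UV heavily outnumbers IV and LF.
data = [(i, "UV") for i in range(10)] + [(i, "IV") for i in range(3)] + [(i, "LF") for i in range(2)]
balanced = undersample(data)
print(Counter(label for _, label in balanced))
```

The balanced list would then be passed to the tree learner in place of the original training set for group G.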
25
Conclusion
  • Novel narrative-structure-based analysis for
    segmentation of education and training videos
  • Hierarchical decision-tree classification system
    achieves an overall accuracy of 84.7%
  • Focus on higher-level semantics such as
    segmentation of topics
  • Work is underway to:
  • Map media objects to LOs
  • Develop algorithms for support of both SCORM and
    MPEG-7 compliant XML metadata

26
Acknowledgements
  • Team
  • Geetika Tewari (IBM TJW, currently at Harvard U)
  • Norman Haas (IBM TJW)
  • Austin Schilling (IBM SWG)