Title: LYU0102 XML for Interoperable Digital Video Library
Department of Computer Science and Engineering
The Chinese University of Hong Kong
LYU0102 XML for Interoperable Digital Video Library
- Supervised by Prof. LYU, Rung Tsong Michael
Prepared by Chan Pik Wah, Pat Ngai Cheuk Han,
Table
Outline
- Introduction to XVIP
- Overview of Project
- Extraction Techniques
  - Face Detection
  - Speech Recognition
- Multimedia Transformation & Presentation
  - XSL
  - SMIL
  - Transformation
- Problems & Solutions
- Conclusion
Motivations
- Rapid increase in the usage of multimedia information
- New approach: DIGITAL VIDEO LIBRARY
Project Outline
Motivations
- Little attention paid to video information extraction and storage
- Scalability of the system in terms of adding new extraction components
- Lack of a generic framework for presentation and visualization of video information
Overview of XVIP
Achievements in Last Semester
- 2 extraction techniques
  - Scene Change
  - VOCR
- Integrate data into XML
- XML Editor
- Knowledge Enrichment
Achievements in This Semester
- 2 more extraction techniques
  - Face Detection
  - Speech Recognition
- New data integrated into XML
- XML to SMIL Transformer
Extraction Techniques
[Diagram: Video is processed by Scene Change, VOCD, Face Detection, and Speech Recognition, and the results go into XML]
Extraction Techniques
Face Detection
- Object-presence detection is also an important technique.
- Identify and index features to support image similarity matching. Face detection is a good example.
Face Detection
- Names of people appearing in the video
- How they are interacting with the environment
- More searchable
Face Detection
- Neural Network-Based Algorithm
- The basic algorithm used for face detection
Face Detection
- Face Recognition
- Facial Expression Analysis
- Enrich the XML
- Easier for user to search the content of video
Speech Recognition
- Speech recognition technology can make any spoken data useful for library indexing and retrieval
Speech Recognition Engine
Speech Recognition
- ViaVoice
- Error rate > 50%
Usage of XML
- Indexing & Searching
- Combine with other XML for Knowledge Enrichment
- Exchange data with different applications
- Presentation
Presentation of the video data
- XML is not presentable without processing
- HTML with images, but is static
- SMIL is good for multimedia presentation
- No existing tools for integrating different XML data into a SMIL presentation
- Current transformation languages have a lot of limitations in transforming XML to SMIL
SMIL
SMIL
- SMIL stands for Synchronized Multimedia Integration Language and is currently a W3C Recommendation.
- It is a markup language that can synchronize and integrate multimedia.
- It enables authors to specify when and what should be presented.
- RealPlayer, QuickTime, IE support
Advantages
- SMIL is text-based
  - Easy to develop with a text editor
- Generate customized presentations
  - Generate customized SMIL file based on preferences recorded in the visitor's browser
- SMIL effort is led by the W3C
  - W3C tries to shape a specification that is beneficial to all parties involved.
- Avoid using container formats.
  - SMIL can stream many media formats, no need to merge clips into a single streaming file.
Timing and Synchronization
- Parallel element

    <par>
      <text src="text/transcript.rt" region="transcript"/>
      <text src="text/mapdetail.rt" region="mapdetail"/>
      <video src="news.mpg" region="video" fill="freeze"/>
      ...
    </par>

- Sequence element

    <seq>
      <img src="pix/0.jpg" dur="15" region="scene"/>
      <img src="pix/15.jpg" dur="5" region="scene"/>
      <img src="pix/20.jpg" dur="7" region="scene"/>
      <img src="pix/27.jpg" dur="4" region="scene"/>
      ...
    </seq>
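
The sequence element above could be generated from scene-change timestamps. The following is a minimal Python sketch, not the project's actual transformer: the function name `build_scene_seq`, its parameters, and the `pix/<start>.jpg` naming rule are assumptions inferred from the example on this slide.

```python
import xml.etree.ElementTree as ET


def build_scene_seq(scene_starts, video_end, region="scene"):
    """Return a <seq> element holding one <img> per scene.

    scene_starts: ascending scene-change times in seconds (hypothetical input).
    video_end:    time in seconds at which the last scene ends (hypothetical).
    """
    seq = ET.Element("seq")
    boundaries = list(scene_starts) + [video_end]
    # Pair each scene start with the next boundary to get its duration.
    for start, end in zip(boundaries, boundaries[1:]):
        ET.SubElement(seq, "img", {
            "src": f"pix/{start}.jpg",   # assumed key-frame naming rule
            "dur": str(end - start),
            "region": region,
        })
    return seq


# Scene starts taken from the slide; the end time 31 is implied by 27 + 4.
seq = build_scene_seq([0, 15, 20, 27], video_end=31)
print(ET.tostring(seq, encoding="unicode"))
```

Because each duration is derived from consecutive scene boundaries, the key frames stay aligned with the video when both are played inside the same parallel element.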
XSL
- Stands for Extensible Stylesheet Language
- XSL is the language defined by the W3C to add formatting information to XML data.
- XSLT -- most commonly used XSL standard
  - Transforms one XML document into another.
  - Used in our FYP.
XSL
Working Principle
[Diagram: the XSL Stylesheet is applied to the Source Tree to produce the Output]
Transformation Process
- Input files
  - XML file generated by XVIP
  - XML files of additional information
- Output files
  - A SMIL file
  - Some RealText files
Transformation
Design 1
- Built solely with VC++
  - Read all the input files and get the information
  - Create the output files for the SMIL presentation.
- Disadvantages
  - Layout of the SMIL presentation needs to be hard-coded in the VC++ program.
  - The layout becomes hard to change and the transformer becomes hard to extend.
Design 1 with modification
- Modification
  - Provide an additional file or interface as a template for the user to define the layout of the SMIL presentation.
- Disadvantage
  - The flexibility provided is still limited.
  - Not a standard way to define a template.
Design 2
- Use XSLT to assist the transformation. Users can define their own templates with XSL.
- Advantages
  - Program-independent
  - Extensible
  - Standard templates
- Limitations of XSLT
  - It can only read one input data file and one XSL file, then generate one output.
  - It cannot combine data from multiple files.
Design 2
- Solutions
  - Knowledge Enrichment
    - Combine additional information with the XML file from XVIP before converting to SMIL
  - Creating output files
    - Use separate XSL files to generate RealText files
    - Use separate XSL files to generate the layout of the presentation and the displaying order of objects in different regions, then combine them into a SMIL file
Knowledge Enrichment
[Diagram: Information of major cities and the XML file from XVIP are merged into the Combined XML file]
Combined XML file
- XML file contains information of major cities that are related to the video.

    <COMBINE>
      <TIME begin="10" dur="11">
        <NAME>??</NAME>
        <DETAIL>??????????</DETAIL>
        <AREA>China</AREA>
      </TIME>
      <TIME begin="21" dur="20">
        <NAME>??</NAME>
        <DETAIL>??????????</DETAIL>
        <AREA>America</AREA>
      </TIME>
    </COMBINE>
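
A combined file of this shape could be produced by joining the XVIP output with the city information before any XSLT step runs. The sketch below is only an illustration in Python: the input schemas (`VIDEO`/`SCENE`/`CITY` and `CITIES`/`CITY`), the sample city data, and the function name `combine` are all assumptions, since the slides show only the combined output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the XVIP output: each scene names a city.
XVIP_XML = """
<VIDEO>
  <SCENE begin="10" dur="11"><CITY>Beijing</CITY></SCENE>
  <SCENE begin="21" dur="20"><CITY>New York</CITY></SCENE>
</VIDEO>
"""

# Hypothetical additional-information file about major cities.
CITY_XML = """
<CITIES>
  <CITY name="Beijing"><DETAIL>Sample detail A</DETAIL><AREA>China</AREA></CITY>
  <CITY name="New York"><DETAIL>Sample detail B</DETAIL><AREA>America</AREA></CITY>
</CITIES>
"""


def combine(xvip_xml, city_xml):
    """Merge scene timing with city details into one <COMBINE> tree."""
    cities = {c.get("name"): c for c in ET.fromstring(city_xml).iter("CITY")}
    combined = ET.Element("COMBINE")
    for scene in ET.fromstring(xvip_xml).iter("SCENE"):
        name = scene.findtext("CITY")
        info = cities.get(name)
        if info is None:
            continue  # no enrichment data for this scene; skip it
        time = ET.SubElement(
            combined, "TIME",
            {"begin": scene.get("begin"), "dur": scene.get("dur")},
        )
        ET.SubElement(time, "NAME").text = name
        ET.SubElement(time, "DETAIL").text = info.findtext("DETAIL")
        ET.SubElement(time, "AREA").text = info.findtext("AREA")
    return combined


combined = combine(XVIP_XML, CITY_XML)
print(ET.tostring(combined, encoding="unicode"))
```

Doing the join up front leaves a single XML input, which fits the one-input limitation of XSLT described under Design 2.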
Create RealText files
- Geographical Information
- Biographical Information
- Video Transcript
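
As a rough illustration of how one of these RealText files might be derived from the combined XML, the Python sketch below emits timed captions. The project itself used XSL files for this step, so this is a swapped-in technique. The function name `to_realtext`, the caption wording, and the placeholder city names are assumptions; the `<window>`, `<time>`, and `<clear/>` tags follow RealText markup as best recalled and should be checked against the RealText documentation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Same structure as the combined XML file; names and details are placeholders.
COMBINED_XML = """
<COMBINE>
  <TIME begin="10" dur="11">
    <NAME>City A</NAME><DETAIL>Placeholder detail A</DETAIL><AREA>China</AREA>
  </TIME>
  <TIME begin="21" dur="20">
    <NAME>City B</NAME><DETAIL>Placeholder detail B</DETAIL><AREA>America</AREA>
  </TIME>
</COMBINE>
"""


def to_realtext(combined_xml):
    """Build RealText-style markup with one timed caption per <TIME> entry."""
    root = ET.fromstring(combined_xml)
    lines = []
    total = 0
    for t in root.iter("TIME"):
        begin = int(t.get("begin"))
        end = begin + int(t.get("dur"))
        total = max(total, end)  # window must last until the final caption ends
        caption = (
            f"{t.findtext('NAME')} ({t.findtext('AREA')}): "
            f"{t.findtext('DETAIL')}"
        )
        # escape() protects &, < and > inside the caption text.
        lines.append(
            f'<time begin="{begin}" end="{end}"/><clear/>{escape(caption)}'
        )
    header = f'<window type="generic" duration="{total}">'
    return "\n".join([header, *lines, "</window>"])


rt = to_realtext(COMBINED_XML)
print(rt)
```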
Create SMIL file
Layout
Displaying order
Create SMIL file
SMIL Presentation
Combining the temporary files
Problems & Solutions
- Problem 1
  - The result from the XSLT processor is in UTF-8 encoding, but SMIL needs ANSI format.
- Solution
  - Write a function UTF8toANSI for conversion.
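
The project's UTF8toANSI function is not shown in the slides. As a hedged sketch of the idea in Python: decode the UTF-8 bytes, then re-encode them in a legacy Windows code page. "ANSI" on Windows means the system's active code page, which the slides do not name, so the `codepage` parameter and its `cp1252` default are assumptions; `cp950` (Big5) would be a plausible choice for Traditional Chinese text.

```python
def utf8_to_ansi(data: bytes, codepage: str = "cp1252") -> bytes:
    """Re-encode UTF-8 bytes in a legacy ("ANSI") Windows code page.

    codepage is an assumed parameter; the slides do not say which code page
    the project targeted. Characters missing from the target code page are
    replaced with "?" instead of raising an error.
    """
    # "utf-8-sig" also strips a leading byte-order mark if one is present.
    text = data.decode("utf-8-sig")
    return text.encode(codepage, errors="replace")


# Usage: convert XSLT output bytes before handing them to the player.
western = utf8_to_ansi("café".encode("utf-8"))
chinese = utf8_to_ansi("中文".encode("utf-8"), codepage="cp950")
print(western, chinese)
```

If the converted file carries an XML declaration with `encoding="UTF-8"`, that declaration would also need to be updated to match the new encoding.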
Problems & Solutions
Problems & Solutions
- Problem 2
  - XSLT has a limitation: it can only read one XML file and one XSL file and generate one output file.
  - Our transformation process has more than one input file.
- Solution
  - Do knowledge enrichment and produce a combined XML result file before creating the output files.
Conclusion
- XVIP contains
  - Four video information modalities
    - Scene change detection
    - VOCD
    - Speech recognition
    - Face detection
  - Information integration module with XML
    - For storing the extracted video data in XML format
Conclusion
Conclusion
- XML editor
  - For editing the generated XML file
- Knowledge enrichment component
  - For adding additional information to the XML-based video data
- XML to SMIL transformer
  - For converting the XML-based video data into a SMIL presentation
Conclusion
- XVIP
  - provides multiple functions for extracting video information
  - stores video information in a flexible and scalable way
  - comprises a transformer to generate presentations of the information
- The paper "XVIP: An XML-Based Video Information Processing System" by Michael Lyu, Edward Yau, C.H. Ngai, and P.W. Chan was accepted by COMPSAC 2002.
Q & A