Title: An XML Object Database: Design, Implementation, and Applications
1An XML Object Database: Design, Implementation, and Applications
- Ching-Long Yeh
- Department of Computer Science and Engineering
- Tatung University
- Taipei 104, Taiwan
- ROC
2Introduction
- XML improves upon HTML in
- capturing the meaning of a document and
- extending the tag set.
- At the same time, it also reduces the complexity of SGML.
- It is believed that XML will soon be the standard for data exchange on the Web.
3Introduction
- Because files lack indices, we cannot make full use of the meaning (or metadata) in an XML document if it is stored in a file.
- Since an XML document can easily be viewed according to the object-oriented model, a promising solution is to employ object database technology to manage access to XML documents.
4Introduction
- In this talk, I will present our research work in
- the design and implementation of an XML object DB,
- an extensible template-based query interface for accessing the XML object database, and
- the applications implemented on the XML object database:
- a content-based video query system, and
- electronic commerce
5The Remainder of the Talk
- An Introduction to XML
- Design and Implementation of an XML Object Database
- An Extensible Template-based Interface
- A Content-Based Query Interface to Video Database
- XML Object Database and Electronic Commerce
6An Introduction to XML
7HyperText Markup Language
- HTML is a language used to create hyperlinked text on the WWW.
- The text is presented according to a set of predefined tags.
- The definition of tags is based on the Document Type Definition (DTD) of SGML.
- In other words, HTML is an application of SGML on the WWW.
8Standard Generalized Markup Language
- Central to SGML is the concept that documents have structure, content, and format.
- These three ingredients combine to form a document.
9Content
- What is Content?
- Content is the actual data within a document.
- The words and illustrations that make up a
bicycle assembly manual are its contents.
10Format
- What is Format?
- Format is how the words, sentences, and paragraphs are visually presented and distinguished from one another within a document.
- Boldface for titles, italics for special terms, and blank lines between sections are examples of document formats.
- People often confuse format with structure.
11Structure
Recipe
  Title: Coconut Pudding
  Ingredient List
    Ingredient
  Instruction List
    Step
12Document Type Definition
- Defining structures in SGML
- The structure of a document (its type) is defined by a document type definition, or DTD.
- The DTD lays out the rules for a document through the use of elements, attributes, and entities.
13Document Type Definition
<!ELEMENT recipe - - (title, ingredientList, instructionList)>
<!ELEMENT title - - (#PCDATA)>
<!ELEMENT ingredientList - - (ingredient)>
<!ELEMENT instructionList - - (step)>
<!ELEMENT ingredient - - (#PCDATA)>
<!ELEMENT step - - (#PCDATA)>
14Document Instance
<!DOCTYPE RECIPE PUBLIC "recipe" "recipe">
<RECIPE>
<TITLE>Coconut Pudding</TITLE>
<INGREDIENTLIST>
<INGREDIENT>12 ounces coconut milk</INGREDIENT>
<INGREDIENT>4 to 6 tablespoons sugar</INGREDIENT>
<INGREDIENT>4 to 6 tablespoons cornstarch</INGREDIENT>
<INGREDIENT>3/4 cup water</INGREDIENT>
</INGREDIENTLIST>
<INSTRUCTIONLIST>
<STEP>Pour coconut milk into saucepan.</STEP>
<STEP>Combine sugar and cornstarch; stir in water and blend well.</STEP>
<STEP>Stir sugar mixture into coconut milk; cook and stir over low heat until thickened.</STEP>
</INSTRUCTIONLIST>
</RECIPE>
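As a minimal sketch of how a program can exploit this structure, the following Python snippet parses an XML-style version of the recipe instance with the standard library's xml.etree.ElementTree and extracts the title and the ingredients. The tag names are taken from the instance above; the variable names and the shortened instruction list are illustrative assumptions, not part of the original system.

```python
import xml.etree.ElementTree as ET

# XML-style version of the recipe instance (DOCTYPE omitted, one step only).
RECIPE_XML = """
<RECIPE>
  <TITLE>Coconut Pudding</TITLE>
  <INGREDIENTLIST>
    <INGREDIENT>12 ounces coconut milk</INGREDIENT>
    <INGREDIENT>4 to 6 tablespoons sugar</INGREDIENT>
    <INGREDIENT>4 to 6 tablespoons cornstarch</INGREDIENT>
    <INGREDIENT>3/4 cup water</INGREDIENT>
  </INGREDIENTLIST>
  <INSTRUCTIONLIST>
    <STEP>Pour coconut milk into saucepan.</STEP>
  </INSTRUCTIONLIST>
</RECIPE>
"""

root = ET.fromstring(RECIPE_XML.strip())

# Structure tells us where to look; content is the text inside the tags.
title = root.findtext("TITLE")
ingredients = [elem.text.strip() for elem in root.iter("INGREDIENT")]
steps = [elem.text.strip() for elem in root.iter("STEP")]

print(title)
for item in ingredients:
    print("-", item)
```

Because the markup names each part of the document, the program can ask for "all ingredients" directly instead of guessing from the visual layout.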
15HTML, SGML, XML
- HTML helped establish the Internet by providing a universal way to present information.
- However, HTML only addresses the presentation of data.
- Using SGML, users can add structure along with the content of a document.
- However, SGML has proven too heavyweight for the Internet.
16Extensible Markup Language
- XML is a simple dialect of SGML.
- HTML is sufficient for sending web pages that are viewed by human beings.
- XML, however, adds the tags that enable computers to understand, act on, or process the information.
- XML has been designed for ease of implementation and for interoperability with both SGML and HTML.
17XML Application Profile
- Electronic commerce
- Electronic data interchange (EDI)
- Fine-grain content publishing
- Internet search engines
- Distributed application design
- etc.
18Data Type Requirements of Documents
- HTML
- One file per page
- Simple uni-directional linking
- XML
- Tens, hundreds or even thousands of objects per page
- Multiple DTDs
- Hierarchical structure and rich linking
- Query and navigation capabilities required
- Agents and business rules interact with the data
19Data Types of Storage File System
- File system
- Store monolithic stuff.
- Folder system on top of them
- Good at storing multimedia data
20Data Types of Storage Relational DB
- Relational database
- Tabular in nature
- Good at storing rows and columns of data like
spreadsheets and data from forms like invoices.
21Data Types of Storage Object-Oriented DB
- Object-oriented database
- Good at managing structured, hierarchical, richly linked information.
- That's exactly what XML is.
- XML is the object representation of data.
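To illustrate the idea that an XML document maps naturally onto objects, here is a small Python sketch that turns a parsed element tree into plain objects. The class name ElementObject and the function to_object are hypothetical names chosen for this illustration; they are not the schema used by the system described in this talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import xml.etree.ElementTree as ET


@dataclass
class ElementObject:
    """Hypothetical object view of one XML element."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    text: str = ""
    children: List["ElementObject"] = field(default_factory=list)


def to_object(elem: ET.Element) -> ElementObject:
    """Recursively convert an element and its subtree into objects."""
    return ElementObject(
        name=elem.tag,
        attributes=dict(elem.attrib),
        text=(elem.text or "").strip(),
        children=[to_object(child) for child in elem],
    )


doc = ET.fromstring(
    "<recipe><title>Coconut Pudding</title>"
    "<ingredientList><ingredient>3/4 cup water</ingredient></ingredientList>"
    "</recipe>"
)
obj = to_object(doc)
print(obj.name, "->", [child.name for child in obj.children])
```

Each element becomes one object whose children are again objects, which is the hierarchical, linked shape that an object-oriented database is designed to store.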
22Design and Implementation of an XML Object
Database
23System Architecture
24DTD Parser
25Parsing Result
26Schema Generation
27DI Parser
28DI Parser Generation
for each contentModel(ElementName, ContentStructure) do
    generate the rule head for ElementName
    generate the start tag for ElementName
    generate the rule body for ContentStructure
    generate the end tag for ElementName
    generate the semantic action
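The generation loop can be sketched in Python as follows. This is only an illustration under stated assumptions: the prototype itself was written in Prolog, and the DCG-like rule notation, the predicate names start_tag, end_tag, and pcdata, and the function generate_rule are all hypothetical.

```python
def generate_rule(element_name, content_structure):
    """Return one grammar rule (as text) for an element's content model.

    content_structure is either the string "#PCDATA" or a list of child
    element names. The DCG-like output format is an assumption.
    """
    head = f"{element_name}(Node)"                      # rule head
    start_tag = f"start_tag({element_name})"            # start tag
    if content_structure == "#PCDATA":
        body = ["pcdata(Text)"]                         # rule body
        action = f"{{ Node = element({element_name}, Text) }}"
    else:
        variables = [f"C{i}" for i in range(1, len(content_structure) + 1)]
        body = [f"{child}({var})" for child, var in zip(content_structure, variables)]
        action = f"{{ Node = element({element_name}, [{', '.join(variables)}]) }}"
    end_tag = f"end_tag({element_name})"                # end tag
    parts = [start_tag, *body, end_tag, action]         # semantic action last
    return f"{head} --> " + ", ".join(parts) + "."


content_models = [
    ("recipe", ["title", "ingredientList", "instructionList"]),
    ("title", "#PCDATA"),
]
rules = [generate_rule(name, structure) for name, structure in content_models]
for rule in rules:
    print(rule)
```

Each content model yields exactly one rule, in the same order as the five generation steps listed above.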
29Implementation
- We have built a prototype of the system using LPA Win-Prolog V3.5 on a personal computer.
- It consists of a DTD parser, a schema generator, and a DI parser generator.
- After creating the physical store and class family for XML documents, we can build the database schema for a DTD by executing the ODQL code generated by the DTD schema generator.
32An Extensible Query-By-Template Interface for Accessing an XML Document Database
33Motivation
- Vastness of search results on current WWW search engines
- A text-based query language with a simple English-like syntax is inconvenient for the user.
- Current user interfaces primarily use form-based queries.
34Goal
- The goal is to design a convenient interface that lets users access XML documents without knowing the document types.
- The interface will relieve users from typing complex queries.
- The interface should be web-based and platform-independent.
35System Architecture
Visual Query Interface
36Related Knowledge
- XML (eXtensible Markup Language)
- Jasmine Database
- Structured Document Database
- Visual Query Facility
- Java Language
37 XML
- XML has the potential to be the future standard for WWW documents and electronic data interchange.
- In XML, the structure of the document is defined using a Document Type Definition (DTD).
38Recipe DTD
39Recipe Document
40Jasmine Database
- Jasmine is a multimedia object-oriented database management system (DBMS) with built-in Web connectivity.
- It provides a powerful Object Data Query Language (ODQL), which is very similar to the ODMG 2.0 (ODL, OML, OQL) standard.
- It provides an extensive array of application development tools, including JADE, ActiveX, C-API, J-API, and Weblink.
41 Structured Document Database
- Combine structured document with OODB technology
- VERSO project at INRIA
- News-On-Demand Application
- Document Database from GMD-IPSI
- Other Related Document Database
- Open Text
- DocBase
- The Poet XML Repository
- ODI's eXcelon
42Visual Query Facility
- Query By Example (QBE)
- The interface is composed of tabular skeletons representing tables in the database.
- Query By Forms (QBF)
- The interface presents a list of searchable fields, each with an entry area that can be used to indicate the search string.
- Query By Template (QBT)
- The interface displays a template for a representative entry of the database. Users express their queries by indicating the search keywords in the appropriate regions of the template.
43Document Database Schema
44Building an XML Document Database
45 Example of Image-based QBT
46Limits of Image-based QBT
- The image template is divided into regions, each of which corresponds to an element in the document structure.
- Associated with each region is a query action.
- Its significant drawback is the lack of flexibility in template creation.
- It is difficult to automate the reconfiguration of the query actions associated with a new template.
- A single interface template for all types of documents is probably not a good idea.
47Concept of eXtensible QBT (XQBT)
- The environment provides a template creator, which consists of a DTD schema browser and a scene for presentation design.
- The environment aims at providing automatic configuration of the query actions associated with the presentation of a template.
- The design of the template presentation must be tightly coupled with the arrangement of document data stored in the repository.
- Each component in the presentation design must be properly associated with the corresponding nodes in the object database schema.
48Environment for XQBT
49Template Creator
- The template creator consists of a DTD schema browser, a scene for the template draft, and a functional area.
- The template creator relies mainly on the DTD schema browser, which corresponds to the database schema.
- The scene is a visual display area where the designer can organize a template draft for a particular purpose.
- The content of the template draft is exported to a file, which contains the template presentation and additional information.
50Template Creator
Functional Area
51Exported File
- The file contains the template presentation properties associated with each element.
- Each element is annotated with its path in the database schema, so that the template executor can use this information to carry out query actions.
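The role of the path information can be sketched as follows. The talk does not give the exported file format, so the JSON layout, the field names (label, path, x, y), and the function build_conditions are assumptions made purely for illustration.

```python
import json

# Hypothetical exported template: one entry per template element, holding its
# presentation properties and its path in the database schema.
EXPORTED = json.dumps([
    {"label": "Title", "path": "recipe/title", "x": 10, "y": 10},
    {"label": "Ingredient", "path": "recipe/ingredientList/ingredient", "x": 10, "y": 40},
])


def build_conditions(exported_text, user_input):
    """Pair each filled-in template region with its schema path."""
    template = json.loads(exported_text)
    return [
        (entry["path"], user_input[entry["label"]])
        for entry in template
        if user_input.get(entry["label"])  # skip regions left empty
    ]


conditions = build_conditions(EXPORTED, {"Title": "Pudding", "Ingredient": ""})
print(conditions)
```

Because every region carries its own schema path, the executor needs no hand-written query code per template; it only translates the (path, keyword) pairs into the database's query language.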
52Template Executor
- The template executor loads the exported file and presents the template as it was originally designed in the template creator.
- The path of each node in the DTD schema browser is used to carry out the query action required by the user.
53Comparison between Image-based QBT and XQBT
Image-based QBT
- The template is an image obtained by taking a photograph or by scanning existing pages.
- The query action associated with each region is hand-coded.
- Whether planar or nested, the template is limited to a region level that is not very deep.
XQBT
- The template is generated for a representative document.
- The associated query actions can be generated automatically for the interface program.
- The designer can change the template to meet the requirements of various region levels.
54Implementation
- Java Proxies (Jp) for Jasmine
- Jp allows developers to build their applications with J-API and take advantage of the Jasmine class libraries.
55The interface for our XML document database
ingredient
Ingredient name
Ingredient step
56Query Formulation
- Such searches are performed by simply entering
the search string in the corresponding region of
the template.
57Query Formulation (cont.)
58Query Formulation (cont.)
- Multiple conditions are specified in different regions and combined using logical connectives (such as AND, OR, NOT).
- The logical expression is derived from its graphical representation using the default precedence.
- Users can insert parentheses as necessary in the condition box, as is done in the QBE interface.
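A minimal sketch of how combined conditions might be evaluated is given below. The talk does not state what the default precedence is; this example assumes the conventional order NOT, then AND, then OR, with parentheses overriding it. The function evaluate and the region names are hypothetical.

```python
def evaluate(tokens, truth):
    """Evaluate a token list such as ["a", "OR", "b", "AND", "c"].

    truth maps each region name to whether its condition matched.
    Assumed precedence: NOT binds tightest, then AND, then OR.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()      # always parse, so no token is skipped
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "NOT":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            take()                 # consume the closing ")"
            return value
        return truth[token]

    return parse_or()


truth = {"title": True, "ingredient": False, "step": False}
default = evaluate(["title", "OR", "ingredient", "AND", "step"], truth)
grouped = evaluate(["(", "title", "OR", "ingredient", ")", "AND", "step"], truth)
print(default, grouped)
```

The two calls differ only in the parentheses, which is exactly the situation in which the user would need the condition box to override the default grouping.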
59The results of the query formulation
60Template Creator
61Template Executor
62Future Works
- A first step towards enhancement is to improve the template's capabilities in order to support more complex query facilities.
- An enhancement of the template creator would be to provide more sophisticated facilities for managing the template, such as layout, size, color, position, etc.
- We will try to include other document types to test the applicability of XQBT.
63A Video Content Query System Based on an OODB
64Introduction
- We store the content description of video in an OODB (Jasmine CA) so that the user can query video segments according to the video content.
- A VDBMS needs to address the following important issues:
- Video data modeling
- Video data insertion
- Video data indexing
- Video data query and retrieval
65Introduction(cont.)
- Overview of System Architecture
66Related Research
- R. Jain and A. Hampapur, "Metadata in Video Databases."
- Application of Video
- Query Dimensionality
- M. Carrer, L. Ligresti, G. Ahanger, and T.D.C. Little, "An Annotation Engine for Supporting Video Database Population."
- Video Segmentation
- A Newscast Video Data Model
67Related Research
- E. Hwang and V. S. Subrahmanian, "Querying Video Libraries."
- A Formal Model of Video Data Structures
- R. Hjelsvold and R. Midtstraum, "Modelling and Querying Video Data."
- Structure of the Generic Video Data Model
68Related Research
- Dublin Core-based Video Description Scheme
- Hunter proposes extending some of the Dublin Core elements, i.e., Type, Description, Format, Relation, and Coverage, to cope with video content metadata requirements.
- Hunter breaks film and video documents into the following hierarchical segments:
Sequence, Scene, Shot, Frame, Object/Actor/Person
69Research Issues
- Video Data Modeling
- Characteristics of Video Data
- Video Logical Structure
- Content of Video Data
70Research Issues
- Video data insertion
- Extract key information
- Break the given video stream into a set of basic units.
- Manually or semi-automatically annotate the video units.
- Index and store video data into the video database
71Research Issues
- Video Data Indexing
- Annotation-Based Indexing
- Feature-Based Indexing
- Domain-Specific Indexing
72Research Issues
- Video Data Query
- Query content
- Query matching type
- Query granularity
- Query behavior
- Query specification
73Research Issues
- Content Description Language
- Every subset of a video has a set of associated objects and associated activities, which are what we may describe.
- A content description language is used to describe the video content and video structure.
- An example is to describe video content by applying qualified Dublin Core to a hierarchically segmented video structure.
74System Architecture
76System Architecture
- Video Content Annotating Program
- Output is in two formats: one is XML and the other is an object database language, the Object Data Query Language (ODQL) of Jasmine.
- The video content annotating program employs a bottom-up approach to guide the human annotator in describing the content of video segments.
77System Architecture
Bottom-Up
78System Architecture
- Query Interface
- The matching type used in the query interface is keyword match, which is a kind of exact match.
- The user specifies the keyword he or she wishes to find in the value field of each attribute; the interface then searches the content description of each SHOT in the object database to find the ones that satisfy the query.
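The keyword match can be sketched in a few lines of Python. The shot records, their attribute values, and the function find_shots are hypothetical and stand in for the content descriptions held in the object database; the real interface issues queries against Jasmine rather than filtering an in-memory list.

```python
# Hypothetical content descriptions, one per SHOT.
SHOTS = [
    {"shotnumber": 0, "name": "Shot_0", "clothes": {"type": "Women", "color": "red"}},
    {"shotnumber": 1, "name": "Shot_1", "clothes": {"type": "Men", "color": "black"}},
    {"shotnumber": 2, "name": "Shot_2", "clothes": {"type": "Women", "color": "black"}},
]


def find_shots(shots, criteria):
    """Return the shots whose attributes exactly equal every filled-in keyword."""
    active = {name: value for name, value in criteria.items() if value}
    return [
        shot
        for shot in shots
        if all(shot["clothes"].get(name) == value for name, value in active.items())
    ]


hits = find_shots(SHOTS, {"type": "Women", "color": "black"})
print([shot["name"] for shot in hits])
```

Value fields left empty are ignored, so the user constrains only the attributes he or she cares about.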
79Implementation
- The Jasmine Object-Oriented Database
- We use the Jasmine OODB (CA) to store the video content description.
- Jasmine provides ODQL to define, manipulate, and query the data in the OODB.
- An ODQL program:
SHOT shot0 shot0SHOT.new(shotnumber0,name"Shot_0",filename"F\\G.ARMANI\\Shot1.MPG",starttime"0000",stoptime"0013",commentcomment0)
80Implementation
- The Content Description of Fashion Show Video
<!ELEMENT VIDEO (ABSTRACT?,SEQUENCE)>
<!ELEMENT SEQUENCE (DESIGNER,SCENE)>
<!ELEMENT SCENE (TOPIC,BACKGROUND,SHOT)>
<!ELEMENT SHOT (CLOTHES,ACCESSORY)>
<!ELEMENT ABSTRACT (#PCDATA)>
<!ELEMENT BACKGROUND EMPTY>
<!ELEMENT CLOTHES EMPTY>
<!ELEMENT ACCESSORY EMPTY>
<!ELEMENT DESIGNER EMPTY>
<!ELEMENT TOPIC EMPTY>
81Implementation
<!ATTLIST SCENE
  scenenumber NUMBER #REQUIRED
  name CDATA #IMPLIED
  starttime CDATA #REQUIRED
  endtime CDATA #REQUIRED>
<!ATTLIST CLOTHES
  name CDATA #REQUIRED
  department CDATA #REQUIRED
  type (Men|Women|Children) "Men"
  color CDATA #REQUIRED
  season (Arbitrary|Spring|Summer|Fall|Winter) "Arbitrary"
  fabric CDATA #IMPLIED
  narrative CDATA #IMPLIED>
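As a hedged illustration of what the CLOTHES attribute list constrains, the sketch below checks one annotation and fills in defaults. It assumes the enumerations read Men, Women, Children and Arbitrary, Spring, Summer, Fall, Winter, with Men and Arbitrary as the defaults; the function normalize_clothes and the sample values are hypothetical.

```python
# Constraints transcribed from the CLOTHES attribute list; the reading of the
# enumerations and their defaults is an assumption.
REQUIRED = ("name", "department", "color")
ENUMERATED = {
    "type": (("Men", "Women", "Children"), "Men"),
    "season": (("Arbitrary", "Spring", "Summer", "Fall", "Winter"), "Arbitrary"),
}
OPTIONAL = ("fabric", "narrative")


def normalize_clothes(attrs):
    """Check one CLOTHES annotation and fill in enumerated defaults."""
    allowed_names = set(REQUIRED) | set(ENUMERATED) | set(OPTIONAL)
    unknown = sorted(set(attrs) - allowed_names)
    if unknown:
        raise ValueError(f"unknown attributes: {unknown}")
    missing = [name for name in REQUIRED if name not in attrs]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    result = dict(attrs)
    for name, (allowed, default) in ENUMERATED.items():
        value = result.setdefault(name, default)
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
    return result


clothes = normalize_clothes(
    {"name": "coat", "department": "outerwear", "color": "black", "season": "Winter"}
)
print(clothes["type"], clothes["season"])
```

An annotation tool could run a check of this kind before writing a shot description to the database, so that only values the DTD permits are stored.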
83Implementation
- Video Content Annotation Program
84Implementation
88Video Content Query Interface
89Conclusion
- A video content-based query system
- A hierarchical scheme for the content descriptions of fashion show video
- The content description of video is defined using an XML DTD.
- The annotations of video content based on the scheme are stored in an object database.
- We then build a form-based query interface on the object database.
90Future Work
- The problem of video transmission.
- How detailed should the description of the video content be?
- In the future we will develop a Query-By-Template interface with key frame images, and an iterative query interface that allows users to incrementally refine their queries until a satisfying result is obtained.
- The study of how to produce the query interface automatically for different kinds of video.
91An Agent-Based EC System Based on an OODB
92Background
93Software Agent
- Properties
- Autonomous, Reactive, Goal-driven, Persistent, Social, Intelligent, Mobile
- Agency
- A collection of software agents that communicate and cooperate with each other is called an agency.
94XML
- XML (eXtensible Markup Language) is a description language for structured documents; it is a markup language, but unlike HTML it does not have a fixed set of tags.
95KQML
- KQML (Knowledge Query and Manipulation Language) is a language and protocol for exchanging information and knowledge. KQML is both a message format and a message-handling protocol to support run-time knowledge sharing among agents.
- A KQML message is called a performative, in that the message is intended to perform some action by virtue of being sent.
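A performative is written as a parenthesized list: the performative name followed by keyword parameters such as :sender, :receiver, and :content. The sketch below builds and parses such a message in Python. The helper names, the agent names buyer and seller, and the content string are hypothetical, and the quoting is deliberately simplified: it assumes parameter values contain no double quotes, backslashes, or nested parentheses.

```python
import shlex


def build_performative(name, **params):
    """Format a performative as "(name :key value ...)"."""
    parts = [name]
    for key, value in params.items():
        parts.append(f":{key.replace('_', '-')}")
        # Quote values that contain spaces so they stay one token.
        parts.append(f'"{value}"' if " " in value else value)
    return "(" + " ".join(parts) + ")"


def parse_performative(message):
    """Split a flat performative back into its name and parameters."""
    tokens = shlex.split(message.strip()[1:-1])   # drop the outer parentheses
    name, rest = tokens[0], tokens[1:]
    params = {rest[i].lstrip(":"): rest[i + 1] for i in range(0, len(rest), 2)}
    return name, params


message = build_performative(
    "ask-one", sender="buyer", receiver="seller", content="price-of widget"
)
parsed_name, parsed_params = parse_performative(message)
print(message)
```

A message handler in an agency would perform the equivalent of parse_performative on every incoming message and dispatch on the performative name.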
96System Architecture
97Agency
- Agent
- Facilitator
- Authentication
- Message Handler
- Reasoning
- Document Handler
- Resource Manager
- KQML Interpreter
98Analysis of System Components
- Facilitator
- authentication, create agents, delete agents, sleep agents, resume agents, and communicate with others.
- Agent
- action handling, display result, and communicate with others.
- Message Handler
- send message, receive message, and parse message.
99Future Work
- How the schema hierarchy of the XML object database affects the performance of accessing documents.
- Extensible template-based query interface
- Applications of the XML object database in electronic commerce