Title: The Mediation of Information using XML project
1 The Mediation of Information using XML project
- By Amir Atauna, Michael Brautbar
2 What is a Mediator and Why is it Needed?
- There is a huge quantity of information on the web.
- Users want to find information on the web that is related to their problem.
- Problem: the information is distributed across many sources; each source provides a different interface and exports its data in a different format.
3
- Mediator systems assist users by providing them with integrated views of the data they are interested in.
- Example: a Web-shopping mediator provides the Web value-shopper with a view listing the lowest price for each product.
- The goal of MIX is to facilitate the development of such mediators.
4 Is the mediator concept new?
- No. The TSIMMIS mediator uses the semistructured data model OEM (Object Exchange Model).
- Wrappers export the source data translated to OEM.
- The mediator exports an integrated view of the wrapper data, based on a view definition provided by the administrator.
5
- The view definition is expressed in the Mediator Specification Language (MSL).
- At runtime the mediator receives queries, which refer to the view objects and are also expressed in MSL.
- First, the incoming query is combined with the view definition into a query that refers directly to source data.
- Then the optimizer finds a plan to execute the latter query by sending queries to the wrappers and combining their results in the mediator.
6
- The wrappers translate the queries they receive into queries understood by the sources.
- The MSL specifications can be very loose about how much information they give on the structures they describe.
- This is a valuable feature when working with dynamic semistructured sources.
- There are two weak points:
- - First, the user does not know the structure of the underlying data, and this impedes efforts to formulate reasonable queries.
7
- - Second, the mediator may have incomplete or no information about the metadata and structure of each source, and this leads to a heavy loss of performance.
- MIX solves these problems with DTDs.
8 The Philosophy of MIX: The Web as a Distributed Database
- The developers of this system strongly believe that the Web will emerge as a distributed database and that XML (or some extension/modification of XML) will be the data model of this huge database.
- The MIX mediator views XML as a database model and uses the mediator concept as known in the DB area.
10
- Sources will export an XML view of their data along with semantic descriptions of the content (source DTDs) and descriptions of the interfaces (XML queries) that may be used for accessing the data.
- Users and applications will then be able to query these view documents using some XML query language.
- The MIX mediator uses the source DTDs to assist the user in query formulation and to help the query processor run queries more efficiently.
11
- MIX evaluates queries lazily (on demand), i.e. XML queries (expressed in XMAS) are unfolded and rewritten at runtime.
- In the other approach, the eager one (warehousing), the data integration occurs in a separate materialization step, before the actual user queries.
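The contrast between the two approaches can be illustrated with a small Python sketch. It is not MIX code: `make_source`, `eager_view`, `lazy_view` and `first_match` are hypothetical names, each source is a plain function standing in for a wrapper, and the sketch shows only the on-demand aspect of lazy evaluation, not query unfolding or rewriting.

```python
from typing import Callable, Iterable, Iterator, List, Optional

# A source is a function that returns records when contacted (a stand-in for a wrapper).
Source = Callable[[], Iterable[dict]]


def make_source(name: str, records: List[dict], log: List[str]) -> Source:
    """Build a hypothetical source that notes in `log` every time it is contacted."""
    def fetch() -> Iterable[dict]:
        log.append(name)
        return records
    return fetch


def eager_view(sources: List[Source]) -> List[dict]:
    """Warehousing: contact every source and materialize the view up front."""
    return [record for fetch in sources for record in fetch()]


def lazy_view(sources: List[Source]) -> Iterator[dict]:
    """On demand: a source is contacted only when the consumer reaches it."""
    for fetch in sources:
        yield from fetch()


def first_match(view: Iterable[dict], wanted_zip: str) -> Optional[dict]:
    """Return the first record with the given zip, stopping as soon as it is found."""
    for record in view:
        if record["zip"] == wanted_zip:
            return record
    return None
```

With two sources, looking up a record held by the first one contacts only that source in the lazy case, while the eager view contacts both before the lookup starts.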
12
- Conventional data repositories are not expected to be converted to XML.
- Wrapper technologies allow us to logically view an information source (which may be a relational database, a collection of HTML pages, or even a legacy information system) as a large XML source.
- The wrappers are able to translate XMAS queries into queries or commands that the underlying source understands.
- They are also able to translate the result of the source into XML.
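The second wrapper task, turning a source result into XML, can be sketched in Python for a relational source. This is only an illustration under stated assumptions: `rows_to_xml` is a hypothetical helper, the table and tag names are made up, and the translation of XMAS queries into source queries is not covered.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn: sqlite3.Connection, table: str,
                root_tag: str, row_tag: str) -> ET.Element:
    """Export every row of `table` as one <row_tag> element under <root_tag>.

    `table` is interpolated into the SQL text, so it must be a trusted name.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_elem = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            # Each column becomes a subelement holding the value as text.
            ET.SubElement(row_elem, column).text = str(value)
    return root
```

For a table `neighborhood(name, population)` with one row, `ET.tostring(rows_to_xml(conn, "neighborhood", "neighborhoods", "neighborhood"), encoding="unicode")` yields a `neighborhoods` document with one `neighborhood` element per row.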
13 Creating Mediated Views Using the MIX mediator and Querying them with BBQ
- The XML documents have to be integrated.
- One goal of MIX is to make the development of integrated views fast.
- For this the developers use XMAS as the view definition language.
15
- The BBQ (Blended Browsing and Querying) user interface enables users to formulate XMAS queries using a GUI reminiscent of query-by-example interfaces in relational databases.
16 The MIX Architecture
17
- The graphical user interface BBQ allows the construction of queries.
- In order to accomplish the integration, the MIX mediator comprises several modules.
- - Its main inputs are XMAS queries generated by BBQ, and the mediator view definition (also in XMAS) for the integrated view.
- - The resolution module resolves the user query with the mediator view definition, resulting in a set of unfolded XML queries that refer to the wrapper views.
18
- - The simplification module is used to further simplify the XML queries based on the underlying XML DTDs.
- - The DTD inference module can be used to automatically derive view DTDs from source DTDs and queries, supporting the integration task of the mediation engineer (this is done off-line).
- - The translation module maps the simplified queries into the XMAS algebra.
19
- - The optimization module can be used to further optimize the XMAS queries.
- - The execution engine issues XMAS queries against the wrappers, and returns the requested XML data to the user, after integrating the retrieved data according to the mediator view.
- The wrappers are used to export data in a uniform format to the mediator.
20 The XMAS Language
- The data model of the sources of the MIX mediator is valid XML documents.
- We need a way to formulate queries that can relate data in multiple XML documents.
- An XML document may be as tightly structured as a relational database, or have no structure at all.
21 The XMAS Language Cont.
- So we need a query language that is as expressive as relational algebra.
- Preferable features of the language:
- - Simple formulation of queries
- - Queries logically describe what we want to say
22 Solution: XMAS
- XMAS stands for XML Matching And Structuring language.
- It is a declarative, high-level language.
- It builds upon ideas of languages like XML-QL and MSL.
23 General Structure of an XMAS Query
- CONSTRUCT head
WHERE
body1 IN source1
(AND | OR | NOT) body2 IN source2
(AND | OR | NOT) body3 IN source3
...
(AND | OR | NOT) bodyn IN sourcen
(AND | OR) predicate
24
- Body (the WHERE clause): specifies the data which is to be extracted from the XML sources.
- Head (the CONSTRUCT clause): describes how the extracted data is arranged into a new answer XML document. In this part we may use the collection operator and the ordering operator (explained later on).
- (Head and body roughly resemble SELECT and WHERE in SQL.)
25
- Predicate: defines conditions on the variables occurring in the sources.
- Let's look at an example DTD (element declarations only partly preserved):
... population)
type (pcdata) ... (pcdata)
26 For Example, We Can Have the Following XML Doc for That DTD
- (element tags not preserved) ip91901 alpine rural/town 13238n p91903 alpine rural/town 4783
27 Query Example
- Suppose we want to retrieve the names of all big neighborhoods, say those where the population is greater than 30000.
- In XMAS we can write the following query:
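28
CONSTRUCT
<big_neighborhoods>
  <big_neighborhood>
    <name> $N </name>
  </big_neighborhood> {$N}
</big_neighborhoods>
WHERE
<big_neighborhoods>
  <big_neighborhood>
    <name> $N </name>
    <population> $P </population>
  </big_neighborhood>
</big_neighborhoods>
IN "http://www.npaci.edu/DICE/MIX/tutorial/neighborhoods.xml"
AND $P > 30000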
29 How Does It Work
- Let's look at the body of the query above. This tree pattern mimics the tree structure of the input XML document.
- The variables $N and $P are used to get hold of the data at the corresponding locations in the tree structure representing the input XML doc. In other words, the tree pattern specifies that the root element of the XML doc is of type big_neighborhoods.
30
- Within big_neighborhoods there must be some big_neighborhood subelement, which itself contains name and population subelements.
- In this way, the tree pattern specifies a list of pairs of variable bindings for $N and $P.
- From this list we want to select only those which satisfy the condition $P > 30000.
- To summarize, the body defines a list (n1, p1) ... (nk, pk) of all variable bindings for ($N, $P) which match (or satisfy) the body.
31
- The head consists of an XML tree pattern which contains some or all of the variables of the body.
- In the example above, the head defines a root element big_neighborhoods with a big_neighborhood subelement, which in turn has a name subelement. The latter is used to hold the bindings for $N which have been obtained through the body.
- Using {$N} expresses that we want to have only one big_neighborhoods element that has a number of big_neighborhood subelements (one for each name $N obtained from the body).
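The body, predicate and head steps described for this query can be mimicked in a few lines of Python. This is a hand-written sketch, not the MIX engine: `evaluate` is a hypothetical function, the element names follow the description in these slides, and the sample data used to try it must be made up, since both records shown earlier have populations below 30000. Producing one element per distinct name is an assumption about how collection on N behaves.

```python
import xml.etree.ElementTree as ET


def evaluate(source_xml: str, threshold: int = 30000) -> ET.Element:
    """Match the body, apply the predicate, then build the head."""
    root = ET.fromstring(source_xml)

    # Body: one (N, P) binding per big_neighborhood that has both
    # a name and a population subelement under the expected root.
    bindings = []
    if root.tag == "big_neighborhoods":
        for nb in root.findall("big_neighborhood"):
            name, population = nb.find("name"), nb.find("population")
            if name is not None and population is not None:
                bindings.append(((name.text or "").strip(), int(population.text)))

    # Predicate: keep bindings with P > threshold; collect distinct N in order.
    names = list(dict.fromkeys(n for n, p in bindings if p > threshold))

    # Head: a single root with one big_neighborhood element per collected N.
    answer = ET.Element("big_neighborhoods")
    for n in names:
        item = ET.SubElement(answer, "big_neighborhood")
        ET.SubElement(item, "name").text = n
    return answer
```

The intermediate `bindings` list plays the role of the list (n1, p1) ... (nk, pk); the final loop is the collection step that places one subelement per binding under the single root.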
32 The Collection Operator
- It is used to collect all bindings of a subelement and put them under the parent element.
- It has two kinds: implicit and explicit.
- The explicit version is written {$N}, where $N is a free variable at that level.
- For an example of the explicit usage, consider the previous example.
33 The Collection Operator Cont.
- We create exactly one big_neighborhood element for each binding n1 ... nk of $N (thereby binding the value of $N within the big_neighborhood element to one ni), and all these elements are collected as subelements of the parent element.
34 The Collection Operator Cont.
- For elements in the head which do not have an explicit collection label, an implicit collection label may be used.
- The implicit collection variables of an element E are those which are free in E.
- The usage for the explicit version is ... where is before the beginning of the section and is at its end
35 The Collection Operator Cont.
- For example, consider the following code:
A
B C
- The above corresponds to a nested loop structure.
36 The Ordering Operator
- The bindings of all subelements may be ordered by a given order.
- If no order is specified, a default order is used (based on the order in which the data was found).
- Example: consider the following DTD and the query given after it.
37
- (DTD only partly preserved) pcdata required
- And the query is:
CONSTRUCT
H order by H.Price
WHERE
H
IN "http://www.Mine.Xml"
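The effect of the ordering operator can be sketched in Python. Since the DTD on this slide did not survive, the element names here are assumptions: items are hypothetical `house` elements with a required `price` subelement, and `collect_ordered` is a made-up helper. Without a sort, `findall` returns items in document order, which plays the role of the default order.

```python
import xml.etree.ElementTree as ET


def collect_ordered(source_xml: str, item_tag: str = "house",
                    key_tag: str = "price") -> ET.Element:
    """Collect the matching items under a new root, ordered by a numeric key."""
    root = ET.fromstring(source_xml)
    items = root.findall(item_tag)  # document order, i.e. the default order
    # sorted() is stable, so items with equal keys keep their document order.
    # Every item is assumed to carry the key subelement.
    ordered = sorted(items, key=lambda item: float(item.findtext(key_tag)))
    answer = ET.Element("answer")
    answer.extend(ordered)
    return answer
```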
38 So, Mmm, Is XMAS So Powerful?
- Home buyer's scenario: a user who wants to buy a home wants to make use of information available from the web to guide this decision. A possible query that the user may issue is: "Find all houses with 3 bedrooms, 2 baths, interior area at least 1600 sq. ft., priced between 250k and 350k, in regions where the school rating is at least 70 (out of 100) and the crime rate is no more than 15 incidents per year. Group the answers by region and order them by price. For each home also show the nearby schools."
40 Strong As Relational Algebra
- As mentioned before, one of the features of XMAS is that it is as expressive as relational algebra. Some examples:
- Selection: selection on a variable is made in the predicate part of the query.
- Projection: write in the head just those variables that you want to project.
41
- A natural join can be obtained by equating variables in the body.
- A Cartesian product may also be expressed easily.
42
CONSTRUCT
$N $S
$N, $S
WHERE
$N
$Z
IN "http://www.npaci.edu/DICE/MIX/tutorial/neighborhoods.xml"
AND
$S
$Z1
IN "http://www.npaci.edu/DICE/MIX/tutorial/schools.xml"
AND $Z = $Z1
- A Cartesian product is easily expressed by removing the condition $Z = $Z1.
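The relation between the join and the Cartesian product can be shown with plain Python tuples. The data layout is an assumption made for the illustration: each neighborhood is a (name, zip) pair and each school a (school, zip) pair, and `cartesian` and `join_on_zip` are hypothetical names.

```python
from itertools import product
from typing import List, Tuple

Pair = Tuple[str, str]


def cartesian(neighborhoods: List[Pair], schools: List[Pair]) -> List[Pair]:
    """Every (N, S) combination: the query without the equality condition."""
    return [(n, s) for (n, _z), (s, _z1) in product(neighborhoods, schools)]


def join_on_zip(neighborhoods: List[Pair], schools: List[Pair]) -> List[Pair]:
    """Only the (N, S) combinations whose zip codes agree, i.e. Z = Z1."""
    return [(n, s)
            for (n, z), (s, z1) in product(neighborhoods, schools)
            if z == z1]
```

The two functions differ only in the final `if`, which mirrors how removing the equality condition from the query turns the join into a Cartesian product.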
43 Merry XMAS
44 DTD Inference
45 The MIX mediator and the advantages of living with DTD-provided structure
- The MIX mediator employs DTDs to assist the user in information discovery and query formulation, and to allow the query processor to derive more efficient plans.
- The view DTD inference module derives the view DTD given the source DTDs and the view.
46
- The view DTD is passed to the DTD-based query interface to enable query formulation.
- A DTD inference algorithm has been developed for a limited class of XMAS queries/views:
- - pick-element XMAS queries, i.e., queries whose SELECT clause has a single variable, called the pick-variable, that binds to elements, and whose WHERE clause consists of a single condition that is applied to only one source.
47
- It is easy to compute a loose DTD for a view, but it is critical for the query interface and the query processor to get one that describes the view as precisely as possible.
48
- Precise view DTDs may also have applications other than ours; for example, they may be used in a toolkit for generating XSL style sheets for presentation of the view.
- A criterion for judging the precision of a view DTD is tightness.
- A DTD d1 is tighter than a DTD d2 if every document described by d1 is also described by d2.
- The tightness criterion can be a benchmark for other powerful view definition languages and view inference algorithms.
49
- So the view DTD inference algorithm attempts to derive the tightest DTD that contains all the possible documents that may appear as the content of the view.
- Unfortunately, even the tightest view DTD describes structures that can never appear as the view's content.
- For this reason the view DTD inference algorithm derives an extended form of DTDs, known as specialized DTDs, that typically does not have non-tightness problems.
50 Model and Query Language Framework
- The focus is on XML documents that meet the following requirements:
- - The XML is always valid, i.e. it has a DTD.
- - There are no attributes other than the ID attribute, and all elements have an ID attribute.
- - There are no empty elements, but elements with empty content are allowed.
- - Mixed-content elements, i.e. elements whose content mixes strings with elements, are not allowed.
51
- Definition (element): An element e is a triplet consisting of a name, name(e), a unique ID, and content, content(e), which is a sequence of elements or a PCDATA value.
- Definition (DTD): A DTD is a set {(n, type(n)) | n ∈ N}, where N is the set of names and type(n) is either a regular expression over N or PCDATA.
- L(r) is the regular language described by r.
52
- Definition: An element e satisfies a DTD D, written e ⊨ D, if the following conditions hold:
- - name(e) is in N, where N is the set of element names;
- - if content(e) = e1, e2, ..., em, then name(e1) ... name(em) is in L(type(name(e))) and ei ⊨ D for every 1 ≤ i ≤ m;
- - else, if content(e) is a string, then type(name(e)) = PCDATA.
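The satisfaction check can be turned into a short recursive function. The Python sketch below is an illustration under assumptions, not the authors' code: a DTD is a dict from element names to either the marker `PCDATA` or a type written in DTD-like syntax (`,` for sequence, `|`, `*`, `+`, `?`), the helper names are hypothetical, and ID attributes are ignored. Each type is translated to a Python regular expression over name tokens terminated by `;`.

```python
import re
import xml.etree.ElementTree as ET

PCDATA = "PCDATA"
_TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[()|*+?,]")


def compile_type(type_expr: str) -> "re.Pattern[str]":
    """Translate a type such as '(name, population)' into a Python regex."""
    parts = []
    for token in _TOKEN.findall(type_expr):
        if token == ",":
            continue  # a sequence is plain concatenation in a Python regex
        if token in "()|*+?":
            parts.append("(?:" if token == "(" else token)
        else:
            parts.append("(?:" + re.escape(token) + ";)")  # one name token
    return re.compile("".join(parts))


def satisfies(elem: ET.Element, dtd: dict) -> bool:
    """Return True if elem satisfies dtd in the sense of the definition above."""
    if elem.tag not in dtd:                  # name(e) must be in N
        return False
    elem_type = dtd[elem.tag]
    children = list(elem)
    if elem_type == PCDATA:                  # string content, no subelements
        return not children
    if (elem.text or "").strip():            # strings require the PCDATA type
        return False
    names = "".join(child.tag + ";" for child in children)
    return (compile_type(elem_type).fullmatch(names) is not None
            and all(satisfies(child, dtd) for child in children))
```

For example, with `{"neighborhoods": "neighborhood*", "neighborhood": "(name, population)", "name": PCDATA, "population": PCDATA}`, a document whose `neighborhood` lists `population` before `name` is rejected, because the sequence of child names is not in the language of the type.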
53 Soundness and Tightness
- Definition: A view DTD DV is sound if, given source DTDs D1, D2, ..., Dn and a view definition V, for every tuple (d1, d2, ..., dn) of n documents such that d1 ⊨ D1, d2 ⊨ D2, ..., dn ⊨ Dn, the view document V(d1, d2, ..., dn) ⊨ DV.
- Definition: A DTD D is tighter than a DTD D' if every document satisfying D satisfies D'.
- A type r is tighter than a type r' if L(r) is contained in L(r').
54
- Definition: A DTD DV is a tightest view DTD for given source DTDs D1, D2, ..., Dn and a view definition V if there is no other view DTD DV' such that DV' is tighter than DV.
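Tightness between two types is language containment, which can be explored with a bounded check in Python. This is a sketch with loud assumptions: each element name is abbreviated to a single character so that types can be written directly as Python regular expressions, `language` and `tighter_up_to` are hypothetical helpers, and the check only compares strings up to a fixed length, so a positive answer is evidence rather than a proof.

```python
import re
from itertools import product
from typing import Set


def language(regex: str, alphabet: str, max_len: int) -> Set[str]:
    """All strings over `alphabet` of length <= max_len that the type accepts."""
    pattern = re.compile(regex)
    return {
        "".join(word)
        for length in range(max_len + 1)
        for word in product(alphabet, repeat=length)
        if pattern.fullmatch("".join(word))
    }


def tighter_up_to(r1: str, r2: str, alphabet: str, max_len: int = 4) -> bool:
    """Bounded test of L(r1) being contained in L(r2)."""
    return language(r1, alphabet, max_len) <= language(r2, alphabet, max_len)
```

With `p` and `g` standing for two element names, `tighter_up_to("p+", "p*", "pg")` holds, while the reverse does not, because the empty sequence belongs to L(p*) but not to L(p+).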
55 Structural Tightness
- In many practical cases even the tightest view
DTDs describe view document structures that
cannot be produced by the view. - This information loss phenomenon is formalized
by introducing the structural tightness property
of view DTDs.
57
- Definition: A structural class of documents is a set of documents such that for every two documents d1, d2 in the class there is a mapping that maps
- - every string of d1 onto a string of d2 and vice versa,
- - every ID of d1 onto an ID of d2 and vice versa,
- - so that, if the mappings are applied to d1, d1 becomes identical to d2, and vice versa.
58
- Definition: A structural class of documents satisfies a DTD D if the documents of the class satisfy D.
- Definition: Given a set of source DTDs D1, ..., Dn and a view V, a DTD DV is structurally tight if
- - it is the tightest DTD of the view given the source DTDs, and
- - for every structural class S that satisfies DV there is a view document I in S that satisfies DV, and there are also source documents I1, ..., In satisfying D1, ..., Dn such that I = V(I1, ..., In).
59 Specialized DTDs
- Specialized DTDs resolve the inherent non-tightness problems of DTDs.
- Query: find all the professor and grad sub-elements of department with one journal publication.
60 How are specialized DTDs computed?
- The DTD tightening algorithm recursively tightens each type of the initial DTD by means of the type refinement algorithm.
- Definition: The type refinement refine(r, n) of a regular expression r given a name n is the regular expression r' that describes all strings in L(r) that contain at least one instance of n.
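One way to compute such a refinement is by structural recursion over the regular expression. The Python sketch below uses this textbook-style construction; it is not claimed to be the algorithm from the cited paper. Regular expressions are small tuples, `None` stands for the empty language, names are single characters for brevity, and all function names are hypothetical.

```python
import re

# AST: ("sym", a) | ("cat", r1, r2) | ("alt", r1, r2) | ("star", r)
# None represents the empty language (no string at all).


def cat(a, b):
    """Concatenation; empty if either side is empty."""
    return None if a is None or b is None else ("cat", a, b)


def alt(a, b):
    """Union; the empty language is the neutral element."""
    if a is None:
        return b
    if b is None:
        return a
    return ("alt", a, b)


def refine(r, n):
    """Expression for the strings of L(r) containing at least one n."""
    kind = r[0]
    if kind == "sym":
        return r if r[1] == n else None
    if kind == "cat":    # n occurs in the left part or in the right part
        return alt(cat(refine(r[1], n), r[2]), cat(r[1], refine(r[2], n)))
    if kind == "alt":
        return alt(refine(r[1], n), refine(r[2], n))
    if kind == "star":   # some iteration contains n: r* refine(r) r*
        return cat(r, cat(refine(r[1], n), r))
    raise ValueError(f"unknown node {kind!r}")


def to_regex(r) -> str:
    """Render an AST as a Python regular expression string."""
    kind = r[0]
    if kind == "sym":
        return re.escape(r[1])
    if kind == "cat":
        return to_regex(r[1]) + to_regex(r[2])
    if kind == "alt":
        return "(?:" + to_regex(r[1]) + "|" + to_regex(r[2]) + ")"
    return "(?:" + to_regex(r[1]) + ")*"
```

For instance, with `j` for a journal publication and `c` for any other publication, refining `(j|c)*` by `j` gives `(j|c)* j (j|c)*`, i.e. publication lists that contain at least one journal entry.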
61 Converting s-DTDs to DTDs
- First we obtain the images of all types of the s-DTD.
- Then we merge all images that have the same name.
62 Schema Inference Algorithm
- Refinement
- - Tightens individual types.
- Specialization
- - Uses the refinement algorithm and tightens the whole input document.
- Result List Type Inference
- - Discovers the names and order of the types that appear in the result.
63 Future Work
- Powerful query languages
- - group-by, nest, navigation using recursive paths in the vertical and horizontal direction, check order, manipulate order.
- More powerful/flexible schema descriptions
- - XML-Data, DCDs, many academic proposals
- Conditions for existence of tight/tightest DTDs.
- Other quality metrics for a view DTD.
64 The BBQ application: introduction
- BBQ stands for Blended Browsing and Querying, a graphical user interface for browsing and querying XML data sources.
- There are very few visual interfaces for querying and browsing semistructured data, and fewer for XML.
65 Introduction cont.
- BBQ supports query refinement by having query
results be sources used in subsequent queries.
Users can construct a query result document
(essentially a virtual view) and that document
becomes a first-class data source within BBQ,
meaning it can be browsed, queried, or used to
construct another query result document.
66 Introduction cont.
- This is quite useful if the user does not know in advance exactly what he is looking for.
- The interface allows users to quickly create complex queries without writing XMAS syntax by hand.
- BBQ displays the structure of multiple data sources using a paradigm that resembles drilling down in Windows directory structures.
67 [Diagram labels: MIX Mediator; Wrapper; Wrapper; XML Data Source; Data Source; Computational Source]
68 The BBQ interface
- BBQ, which is XML driven, uses a set of DTDs exported by the MIX mediator. They will be referred to from now on as base DTDs.
- The BBQ interface consists of one main window and zero or more floating windows. The main window contains a toolbar, a split pane, and a message console, while the floating windows contain a toolbar and a split pane only.
70
- From now on we will use the following DTDs, which represent the base DTDs (declarations only partly preserved).
- CSEStudents (CSEStudent) ... degree) ... (PCDATA)
71
- Interns (Intern) (name, supervisor, sponsor)
72 BBQ power: selecting and browsing XML source DTD and data
- The DTDs are represented as trees in the obvious hierarchical manner: an element name is a parent node, and that element's sub-elements are its children.
- BBQ features special tree nodes to represent the structural operators of XML DTDs, such as the choice and the seq(uence).
73
- These special tree nodes give the user a more accurate view of the DTD's structure than other semistructured-data viewing systems, and they also facilitate more complex queries.
- For example, a default order constraint is introduced, namely the one that corresponds to the order in which elements are listed on the screen.
74
- XML data corresponding to a given DTD is represented as a directory tree.
- The XML data is materialized on demand from the source.
- The buttons labeled next and previous in the XML panel retrieve the next and previous n instances, respectively.
76 BBQ power cont.: Creating XMAS Queries with BBQ
- A query session is the set of events that occur while BBQ is connected to the mediator.
- Each query session consists of one or more query cycles. A query cycle is the set of events that starts with the user constructing a query and ends with the user browsing the query result.
77
- The basic BBQ query cycle takes place in four steps:
- First, constraints are set on the data sources.
- Second, a tree representing the query result schema is created by dragging and dropping elements.
- Third, the XMAS query is generated and submitted to the mediator.
- Fourth, a DTD is generated for the query result, and the query result schema and data are displayed.
78 First step: setting constraints
- Constraints can be set on the leaf nodes of the DTD tree or XML tree. Constraints cannot be set on non-leaf nodes.
- The operators are a basic set of comparators, including substr.
79 Example
- The user right-clicks the degree element and selects "View/Edit Constraint..." from the popup menu. This action brings up the "View/Edit Constraint" dialog box, where the operator is selected and PhD is typed in as the operand. At this point, the user clicks OK.
81
- Joins can take place within a data source or across data sources. Creating a join in BBQ is as simple as selecting one leaf element and dragging and dropping it onto another leaf element.
- Suppose the user is interested in CSEStudents who are also interns, and whose advisor is also their supervisor.
83 Second step: construct the head
- Construct a tree that the answer document(s) must conform to, called the head or query result tree. The right panel of BBQ's main window is where the head is built.
- The head is composed of elements (and their sub-trees) dragged from source DTDs, and tags created on the spot with the Create New Child popup menu item.
- Ordering and group-by operators are also used in the creation of the head.
85 Third and fourth steps
- BBQ converts the visual layout into the XMAS query language, contacts the MIX mediator and submits the query.
- Finally, BBQ generates a DTD for the query result, and it is displayed with the corresponding data.
86 [Diagram labels: BBQ Interface; Query in XMAS; XML result, DTD; MIX mediator; wrapper; wrapper; OODB Database]
87 Important things to remember about BBQ
- Enables the query creator to construct queries in an easy, graphical way.
- Graphically supports all the features of the XMAS query language.
- Supports blended browsing and querying.
- Gives an accurate representation of DTDs and XML data.
88
- Allows graphical representation of the query result as well.
- The DTD for the result XML page of a given query is created by the DTD-inference mechanism.
- Because of that, we may treat the query result as any other XML source we use (so we may use this result as one of the sources used to build new queries).
89
- This is usually the case when we want to get some information from the internet. We don't know exactly what we are looking for, and the results of the first queries guide us towards the goal of our search.
90 Selected bibliography
- Enhancing Semistructured Data Mediators with Document Type Definitions, by Yannis Papakonstantinou, Pavel Velikhov
- BBQ: A Visual Interface for Integrated Browsing and Querying of XML, by Kevin D. Munroe, Yannis Papakonstantinou
- XML-Based Information Mediation with MIX, by Chaitanya Baru, Amarnath Gupta, Bertram Ludascher
- Introduction to XMAS, by the XMAS sub-group of MIX