Title: Historical XML Databases
1Historical XML Databases
- Fusheng Wang and Carlo Zaniolo
- University of California, Los Angeles
2Overview
- State of the art
- Two scenarios of archiving history
- Publishing relational database history in XML
- Temporal queries with XQuery
- Historical database architecture
- Efficient query support for temporal queries
- Conclusion
3State of the Art
- Publishing Relational DBs as XML Documents
- as actual documents to be processed using the
very rich XML tool set (XSLT, DOM) - as views to be queried by languages such as
XPath or XQuery. Queries against these views are
then mapped into SQL queries on the DB - DB vendors are very active in this area, e.g.
- SQLX and SQL functions for XML publishing
- XTables ( XPeranto ) as a middleware
4Our Proposal Publish the History of Relational
DBs as XML Documents
- Publish the history of relational DBs as XML
documents - Natural ways to represent such history in XML
- Historical queries can be expressed in XQuery as
isno extensions to the data model or query
language required for temporal queries - Approach amenable to efficient implementation
query and storage efficiency of alternative
approaches - Gain Temporal applications are very important
and are not supported well by current databases
5Two Basic Scenarios
- XML Data Warehouses archive the history
- change can be detected by current database update
logs - or compute the delta between the published XML
document snapshots of the new version and old
version - Traditional version management (RCS, SCCS). More
recent techniques (UBCC, RBVM) used for XML and
complex queries - RDBMSs archive the history
- XML history is a view---and historical queries
are mapped back into relational ones (e.g., using
XTables)
6A Short History of Timein Databases
- Between 33 and 48 proposals counted
- A perennial struggle to get around the
limitations of relational (flat) tables and a
rigid query language (SQL) - Clifford, Croker, Grandi, and Tuzhilin in their
On Temporal Grouping paper show that the
temporal-grouped models are more natural and
powerful Temp DB workshop, 1995 - But it is hard to fit temporally grouped models
and query languages into SQLan infinite morass
7Temporal Grouping in XML
- XML makes it possible to express and support
temporal grouping - The history of a relational DB can be viewed as
an XML document, using such representation - Then, powerful temporal queries can be specified
without requiring the introduction of new
constructs in the language - There are many ways to publish DBs using XMLand
not all will do
8History of Tables Transaction-Time Relational
Tables
- Timestamped tuple snapshots
- Temporally grouped history of employees
9Publishing DB History in XML Many Alternatives
- Each table as XML document columns as attributes
- Flat structure that corresponds to the tuple
snapshots (employees2.xml) - Each table as an XML document columns as
elements - A natural structure Clifford et al. which
simplifies many queries (employees.xml,
depts.xml) - Multiple tables as a single XML document flat
structure - Good for some join queries but not for
others(company.xml) - Multiple tables as a single XML document
hierarchy - similar but more intuitive than previous
(depts3.xml) - Multiple tables as an XML document flat
structure with IDs - Can simplify join queries with IDs and IDREFs
(company2.xml)
10XML Representation of DB HistoryTable Columns as
XML Elements
ltemployees tstart"1995-01-01"
tend"1996-12-31"gt ltemployee tstart"1995-01-01"
tend"1996-12-31"gt ltempno tstart"1995-01-01"
tend"1996-12-31"gt10003lt/empnogt ltname
tstart"1995-01-01" tend"1996-12-31"gtBoblt/namegt
ltsalary tstart"1995-01-01" tend"1995-05-31"gt60
000lt/salarygt ltsalary tstart"1995-06-01"
tend"1996-12-31"gt70000lt/salarygt lttitle
tstart"1995-01-01" tend"1995-09-30"gtEngineerlt/ti
tlegt lttitle tstart"1995-10-01"
tend"1996-01-31"gtSr Engineerlt/titlegt lttitle
tstart"1996-02-01" tend"1996-12-31"gtTech
Leaderlt/titlegt ltdept tstart"1995-01-01"
tend"1995-09-30"gtQAlt/deptgt ltdept
tstart"1995-10-01" tend"1996-12-31"gtRDlt/deptgt
ltDOB tstart"1995-01-01" tend"1996-12-31"gt1945-0
4-09lt/DOBgt lt/employeegt lt!-- More
--gt lt/employeesgt
11XML Representation of DB History(contd)
- Historical data is represented in an XML document
- Two attributes tstart and tend are used to
represent the time interval - The value now is used to denote the
ever-increasing current time - Node updates
- delete tend is updated to the current timestamp
- insert a new node is appended with tend set as
now - update delete followed by an insert
12Schema of the XML Representation
- The document has a well-defined schema derived
from the snapshot document
- lt!ELEMENT employees (employee)gt
- lt!ATTLIST employees tstart CDATA REQUIRED tend
CDATA REQUIREDgt - lt!ELEMENT employee (empno, name, salary,
title, dept, DOB)gt - lt!ATTLIST employee tstart CDATA REQUIRED tend
CDATA REQUIREDgt - lt!ELEMENT empno (PCDATA)gt
- lt!ATTLIST empno tstart CDATA REQUIRED tend CDATA
REQUIREDgt - lt!ELEMENT salary (PCDATA)gt
- lt!ATTLIST salary tstart CDATA REQUIRED tend
CDATA REQUIREDgt - lt!-- --gt
13Temporal Queries with XQuery
- Temporal projection retrieve the salary history
of Bob
element salary_history for s in
document("employees.xml")/
employees/employee/nameBob/salary return
s
- Snapshot queries retrieve the departments on
1996-01-31
for d in document("depts.xml")/depts/dept tstart
(.) lt "1996-01-31" and tend(.) gt
"1996-01-31" let n d/nametstart(.)lt"1996-0
1-31" and tend(.)gt"1996-01-31" let m
d/managertstart(.)lt"1996-01-31" and
tend(.)gt "1996-01-31" return( element
deptn,m )
14Temporal Queries with XQuery (contd)
- Interval Queries. Find employee(s) who worked in
the QA department throughout the history of
that department
for d in document("depts.xml")/depts/deptdeptnam
e'QA'/deptno for e in document("employees.xml")
/employees/employeedeptnod where
tstart(e/deptno)tstart(d) and
tend(e/deptno)tend(d) return e/name
15Complex Temporal Queries with XQuery
- A Since B. Find the employee who has been the
manager of the dept since he/she joined the dept
d007
for e in document("employees.xml")/employees/empl
oyee let m e/titletitle"Manager" and
tend(.)current-date() let d
e/deptnodeptno "d007" and tcontains(m, .)
where not empty(d) and not empty(m) return
ltemployeegt e/empno, e/firstname,
e/lastnamelt/employeegt
16Complex Temporal Queries with XQuery (contd)
- Period Containment. Find employees with same
history as employee 10112, i.e., they worked in
the same dept(s) as employee 10112 and exactly
for the same periods
for e1 in document("employees.xml")/employees/emp
loyee empno '10112' for e2 in
document("employees.xml")/employees/employee
empno ! '10112' where every d1 in e1/deptno
satisfies some d2 in e2/deptno
satisfies(string(d1) string( d2 ) and
tequals(d2, d1)) and every d2 in
e2/deptno satisfies some d1 in e1/deptno
satisfies (string(d2) string( d1 ) and
tequals(d1, d2)) return ltemployeegte2/empnolt/e
mployeegt
17User-Defined Temporal Functions
- Shield the user from the low-level details used
in representing time, e.g., now - Eliminate the need for the user to write complex
functions, e.g., coalescing and diff - Predefined functions
- History functions history(e,Ts, Te),
snapshot(e, T), invariance(e, Ts, Te) - Restructuring functions coalese(l)
- Interval functions toverlaps, tprecedes,
tcontains, tequals, tmeets - Duration and date/time functions timespan(e),
tstart(e), tend(e), tinterval(e), telement(Ts,
Te), getdbnow(), rtend(e), external(e)
18Support for now
- now no change until now. Values of tuples are
still current at the time the query is asked - Internally, end of time values are used to
denote now, e.g., 9999-12-31 - Intervals are only accessed through built-in
functions tstart() returns the start of an
interval, tend() returns the end or CURRENT_DATE
if its different from 9999-12-31 - In the output, tend value can be
- 9999-12-31,
- CURRENT_DATE (through rtend() ), or
- now (through externalnow() )
19Historical Database Architecture
XML Data
XML Queries
XML Publishing XML Views
Historical XML Data
Temporal XML Queries
Historical Database
20Historical XML Database ArchitectureTwo
Approaches
- XML-enabled RDBMS
- Historical view decomposed into relational
databases as binary tables - Historical data can then be published as XML
document through SQL/XML publishing functions or
queried through a middleware as XML views - Native XML databases
- Historical data are stored in native XML database
- XML queries can be specified directly upon the
database - Native XML databases SoftwareAGs Tamino,
eXcelons XIS
21Relational Storage of Temporal Relational Data
- Assumptions
- Each entity or relation has a unique key ( or
composite keys) to identify it which will not
change along the history. e.g., employee empno - Relational schema
- employee(empno, firstname, lastname, sex, DOB,
deptno, salary, title) - The historical XML documents are decomposed into
tables
22Relational Storage of Temporal Relational Data
(contd)
- Key table for keys
- employee_id(id, tstart, tend), where id empno
- For composite keys, the table will be like
- lineitem_id(id, supplierno, itemno, tstart, tend)
- Attribute history tables
- employee_lastname(id, lastname, tstart, tend)
-
- employee_salary(id, salary, tstart, tend)
- employee_title(id, title, tstart, tend)
-
- Global relation table keep all the relations
history - relations(name, tstart, tend)
-
23Relational Storage of Temporal Relational Data
(contd)
- Sample contents of employee_salary
ID SALARY TSTART TEND
100022 58805
02/04/1985 02/04/1986 100022 61118
02/05/1986 02/04/1987 100022 65103
02/05/1987 02/04/1988 100022 64712
02/05/1988 02/03/1989 100022 65245
02/04/1989 02/03/1990 100023 43162
07/13/1988 07/13/1989 ...
24XML publishing and XML Queries
- A middleware (XPERANTO/XTABLES) can be used to
publish and query historical tables as XML
documents - Create XML views over relational data
- Each database has a default XML view
- The temporal XML document representation can be
reconstructed with user-defined XML views with
XQuery, and be queried with XQuery - Query upon XML views with XQuery
- Only the desired relational data items are
materialized - Most computation pushed down to relational engine
25Automatic Archiving
- Statement
- CREATE HISTORICAL VIEW viewname AS
- SELECT col1, col2,
- FROM tablename USING KEY coli, Colj,
- Results
- Historical tables are created for each attribute
of the current table - Temporal XML views are created with XPERANTO
- The historical tables are initialized with the
snapshot of the current table - Active rules are started to trace any changes and
archive into the historical tables - Temporal XQuery can be specified on the XML views
26Implementation Comparisons
- A temporal data simulation program automatically
generates the historical data in XML - Total number of employees 300,024
- Database systems and major supported query
languages for comparison - Relational DB2. SQL
- Native
- SoftwareAGs Tamino (text-based storage). XPath
- eXcelons XIS (XML Information Server)
(OODBMS-based storage). XQuery
27Performance Comparisons
Storage Size
28Performance Comparisons (contd)
Query Performance of DB2 and Tamino
Q2 history query Q4,Q6 snapshot queries Q3,Q5
interval queries Q1 scan of databases Q7 join
29Performance Comparisons (contd)
Query Performance of Tamino and XIS (1/3 data
size)
30Efficient Query Support for Temporal Queries
- H-document is first clustered by document
structure, and then by the change history - Tamino will preserve the clustering structure
thus retrieving the history of a node can be
efficient - In RDBMS approach, tuples are stored in the order
of updates, neither temporarily clustered nor
clustered by objects - Traditional B Tree index will not help on
interval-related temporal queries - A segment-based archiving scheme was used in this
project
31Segment-based Archiving Scheme (contd)
Segment1 (01/01/1985 - 10/17/1991) ID
SALARY TSTART TEND 100002
40000 02/20/1988 02/19/1989 100002 42010
02/20/1989 02/19/1990 100002 42525
02/20/1990 02/19/1991 100002 42727
02/20/1991 12/31/9999 ... Segment2
(10/18/1991 - 07/08/1995) ID SALARY
TSTART TEND 100002 42727
02/20/1991 02/19/1992 100002 45237
02/20/1992 02/18/1993 100002 46465
02/19/1993 02/18/1994 100002 47418
02/19/1994 02/18/1995 100002 47273
02/19/1995 12/31/9999 ... segment3
(07/09/1995 - 01/08/1999) ...
32Query Performance
Query Performance with different usefulness
Q1,Q3 snapshot queries Q5 interval queries Q2,
Q4 history queries
33Conclusion
- XML can be used to support a temporally grouped
data model, and represent temporal relational
data - The framework supports complex temporal queries
with XQuery, without extension to XQuery - The XML-viewed history of database tables can be
stored using a native XML database or using a
RDBMS - RDBMS has significant query performance compared
to native XML database, while the latter can be
more effective in terms of storage due to
compression techniques - A segment-based archiving scheme based on
usefulness can significantly boost the
performance on most temporal queries
34History of XML Documents
- The temporal representation in XML not only
applies to historical relational data, but also
historical XML documents
ltdocument tstart"2002-01-01" tend"now"gt
ltchapter tstart"2002-01-01" tend"now"gt
ltno isAttr"yes" tstart"2002-01-01"
tend"now"gt1lt/nogt lttitle
tstart"2002-01-01" tend"2002-01-01"gtIntroduction
lt/titlegt lttitle tstart"2002-01-02"
tend"now"gtIntroduction and Overviewlt/titlegt
ltsection tstart"2002-01-01" tend"now"gt
lttitle tstart"2002-01-01"
tend"now"gtBackgroundlt/titlegt
ltsubsection tstart"2002-01-01" tend"now"gt
lttitle tstart"2002-01-01"
tend"now"gtPrevious Worklt/titlegt
ltcontent tstart"2002-01-01" tend"now"gt...lt/conte
ntgt lt/subsectiongt
lt/sectiongt lt/chaptergt lt!-- ...
--gt lt/documentgt