Historical XML Databases - PowerPoint PPT Presentation

About This Presentation
Title:

Historical XML Databases

Description:

Publishing Relational DBs as XML Documents: ... Publishing DB History in XML. Many Alternatives. Each table as XML document: columns as attributes ... – PowerPoint PPT presentation

Number of Views:57
Avg rating:3.0/5.0
Slides: 35
Provided by: Fushen6
Learn more at: http://web.cs.ucla.edu
Category:

less

Transcript and Presenter's Notes

Title: Historical XML Databases


1
Historical XML Databases
  • Fusheng Wang and Carlo Zaniolo
  • University of California, Los Angeles

2
Overview
  • State of the art
  • Two scenarios of archiving history
  • Publishing relational database history in XML
  • Temporal queries with XQuery
  • Historical database architecture
  • Efficient query support for temporal queries
  • Conclusion

3
State of the Art
  • Publishing Relational DBs as XML Documents
  • as actual documents to be processed using the
    very rich XML tool set (XSLT, DOM)
  • as views to be queried by languages such as
    XPath or XQuery. Queries against these views are
    then mapped into SQL queries on the DB
  • DB vendors are very active in this area, e.g.
  • SQLX and SQL functions for XML publishing
  • XTables ( XPeranto ) as a middleware

4
Our Proposal Publish the History of Relational
DBs as XML Documents
  • Publish the history of relational DBs as XML
    documents
  • Natural ways to represent such history in XML
  • Historical queries can be expressed in XQuery as
    isno extensions to the data model or query
    language required for temporal queries
  • Approach amenable to efficient implementation
    query and storage efficiency of alternative
    approaches
  • Gain Temporal applications are very important
    and are not supported well by current databases

5
Two Basic Scenarios
  • XML Data Warehouses archive the history
  • change can be detected by current database update
    logs
  • or compute the delta between the published XML
    document snapshots of the new version and old
    version
  • Traditional version management (RCS, SCCS). More
    recent techniques (UBCC, RBVM) used for XML and
    complex queries
  • RDBMSs archive the history
  • XML history is a view---and historical queries
    are mapped back into relational ones (e.g., using
    XTables)

6
A Short History of Timein Databases
  • Between 33 and 48 proposals counted
  • A perennial struggle to get around the
    limitations of relational (flat) tables and a
    rigid query language (SQL)
  • Clifford, Croker, Grandi, and Tuzhilin in their
    On Temporal Grouping paper show that the
    temporal-grouped models are more natural and
    powerful Temp DB workshop, 1995
  • But it is hard to fit temporally grouped models
    and query languages into SQLan infinite morass

7
Temporal Grouping in XML
  • XML makes it possible to express and support
    temporal grouping
  • The history of a relational DB can be viewed as
    an XML document, using such representation
  • Then, powerful temporal queries can be specified
    without requiring the introduction of new
    constructs in the language
  • There are many ways to publish DBs using XMLand
    not all will do

8
History of Tables Transaction-Time Relational
Tables
  • Timestamped tuple snapshots
  • Temporally grouped history of employees

9
Publishing DB History in XML Many Alternatives
  • Each table as XML document columns as attributes
  • Flat structure that corresponds to the tuple
    snapshots (employees2.xml)
  • Each table as an XML document columns as
    elements
  • A natural structure Clifford et al. which
    simplifies many queries (employees.xml,
    depts.xml)
  • Multiple tables as a single XML document flat
    structure
  • Good for some join queries but not for
    others(company.xml)
  • Multiple tables as a single XML document
    hierarchy
  • similar but more intuitive than previous
    (depts3.xml)
  • Multiple tables as an XML document flat
    structure with IDs
  • Can simplify join queries with IDs and IDREFs
    (company2.xml)

10
XML Representation of DB HistoryTable Columns as
XML Elements
ltemployees tstart"1995-01-01"
tend"1996-12-31"gt ltemployee tstart"1995-01-01"
tend"1996-12-31"gt ltempno tstart"1995-01-01"
tend"1996-12-31"gt10003lt/empnogt ltname
tstart"1995-01-01" tend"1996-12-31"gtBoblt/namegt
ltsalary tstart"1995-01-01" tend"1995-05-31"gt60
000lt/salarygt ltsalary tstart"1995-06-01"
tend"1996-12-31"gt70000lt/salarygt lttitle
tstart"1995-01-01" tend"1995-09-30"gtEngineerlt/ti
tlegt lttitle tstart"1995-10-01"
tend"1996-01-31"gtSr Engineerlt/titlegt lttitle
tstart"1996-02-01" tend"1996-12-31"gtTech
Leaderlt/titlegt ltdept tstart"1995-01-01"
tend"1995-09-30"gtQAlt/deptgt ltdept
tstart"1995-10-01" tend"1996-12-31"gtRDlt/deptgt
ltDOB tstart"1995-01-01" tend"1996-12-31"gt1945-0
4-09lt/DOBgt lt/employeegt lt!-- More
--gt lt/employeesgt
11
XML Representation of DB History(contd)
  • Historical data is represented in an XML document
  • Two attributes tstart and tend are used to
    represent the time interval
  • The value now is used to denote the
    ever-increasing current time
  • Node updates
  • delete tend is updated to the current timestamp
  • insert a new node is appended with tend set as
    now
  • update delete followed by an insert

12
Schema of the XML Representation
  • The document has a well-defined schema derived
    from the snapshot document
  • lt!ELEMENT employees (employee)gt
  • lt!ATTLIST employees tstart CDATA REQUIRED tend
    CDATA REQUIREDgt
  • lt!ELEMENT employee (empno, name, salary,
    title, dept, DOB)gt
  • lt!ATTLIST employee tstart CDATA REQUIRED tend
    CDATA REQUIREDgt
  • lt!ELEMENT empno (PCDATA)gt
  • lt!ATTLIST empno tstart CDATA REQUIRED tend CDATA
    REQUIREDgt
  • lt!ELEMENT salary (PCDATA)gt
  • lt!ATTLIST salary tstart CDATA REQUIRED tend
    CDATA REQUIREDgt
  • lt!-- --gt

13
Temporal Queries with XQuery
  • Temporal projection retrieve the salary history
    of Bob

element salary_history for s in
document("employees.xml")/
employees/employee/nameBob/salary return
s
  • Snapshot queries retrieve the departments on
    1996-01-31

for d in document("depts.xml")/depts/dept tstart
(.) lt "1996-01-31" and tend(.) gt
"1996-01-31" let n d/nametstart(.)lt"1996-0
1-31" and tend(.)gt"1996-01-31" let m
d/managertstart(.)lt"1996-01-31" and
tend(.)gt "1996-01-31" return( element
deptn,m )
14
Temporal Queries with XQuery (contd)
  • Interval Queries. Find employee(s) who worked in
    the QA department throughout the history of
    that department

for d in document("depts.xml")/depts/deptdeptnam
e'QA'/deptno for e in document("employees.xml")
/employees/employeedeptnod where
tstart(e/deptno)tstart(d) and
tend(e/deptno)tend(d) return e/name

15
Complex Temporal Queries with XQuery
  • A Since B. Find the employee who has been the
    manager of the dept since he/she joined the dept
    d007

for e in document("employees.xml")/employees/empl
oyee let m e/titletitle"Manager" and
tend(.)current-date() let d
e/deptnodeptno "d007" and tcontains(m, .)
where not empty(d) and not empty(m) return
ltemployeegt e/empno, e/firstname,
e/lastnamelt/employeegt
16
Complex Temporal Queries with XQuery (contd)
  • Period Containment. Find employees with same
    history as employee 10112, i.e., they worked in
    the same dept(s) as employee 10112 and exactly
    for the same periods

for e1 in document("employees.xml")/employees/emp
loyee empno '10112' for e2 in
document("employees.xml")/employees/employee
empno ! '10112' where every d1 in e1/deptno
satisfies some d2 in e2/deptno
satisfies(string(d1) string( d2 ) and
tequals(d2, d1)) and every d2 in
e2/deptno satisfies some d1 in e1/deptno
satisfies (string(d2) string( d1 ) and
tequals(d1, d2)) return ltemployeegte2/empnolt/e
mployeegt
17
User-Defined Temporal Functions
  • Shield the user from the low-level details used
    in representing time, e.g., now
  • Eliminate the need for the user to write complex
    functions, e.g., coalescing and diff
  • Predefined functions
  • History functions history(e,Ts, Te),
    snapshot(e, T), invariance(e, Ts, Te)
  • Restructuring functions coalese(l)
  • Interval functions toverlaps, tprecedes,
    tcontains, tequals, tmeets
  • Duration and date/time functions timespan(e),
    tstart(e), tend(e), tinterval(e), telement(Ts,
    Te), getdbnow(), rtend(e), external(e)

18
Support for now
  • now no change until now. Values of tuples are
    still current at the time the query is asked
  • Internally, end of time values are used to
    denote now, e.g., 9999-12-31
  • Intervals are only accessed through built-in
    functions tstart() returns the start of an
    interval, tend() returns the end or CURRENT_DATE
    if its different from 9999-12-31
  • In the output, tend value can be
  • 9999-12-31,
  • CURRENT_DATE (through rtend() ), or
  • now (through externalnow() )

19
Historical Database Architecture
XML Data
XML Queries
XML Publishing XML Views
Historical XML Data
Temporal XML Queries
Historical Database
20
Historical XML Database ArchitectureTwo
Approaches
  • XML-enabled RDBMS
  • Historical view decomposed into relational
    databases as binary tables
  • Historical data can then be published as XML
    document through SQL/XML publishing functions or
    queried through a middleware as XML views
  • Native XML databases
  • Historical data are stored in native XML database
  • XML queries can be specified directly upon the
    database
  • Native XML databases SoftwareAGs Tamino,
    eXcelons XIS

21
Relational Storage of Temporal Relational Data
  • Assumptions
  • Each entity or relation has a unique key ( or
    composite keys) to identify it which will not
    change along the history. e.g., employee empno
  • Relational schema
  • employee(empno, firstname, lastname, sex, DOB,
    deptno, salary, title)
  • The historical XML documents are decomposed into
    tables

22
Relational Storage of Temporal Relational Data
(contd)
  • Key table for keys
  • employee_id(id, tstart, tend), where id empno
  • For composite keys, the table will be like
  • lineitem_id(id, supplierno, itemno, tstart, tend)
  • Attribute history tables
  • employee_lastname(id, lastname, tstart, tend)
  • employee_salary(id, salary, tstart, tend)
  • employee_title(id, title, tstart, tend)
  • Global relation table keep all the relations
    history
  • relations(name, tstart, tend)

23
Relational Storage of Temporal Relational Data
(contd)
  • Sample contents of employee_salary

ID SALARY TSTART TEND
100022 58805
02/04/1985 02/04/1986 100022 61118
02/05/1986 02/04/1987 100022 65103
02/05/1987 02/04/1988 100022 64712
02/05/1988 02/03/1989 100022 65245
02/04/1989 02/03/1990 100023 43162
07/13/1988 07/13/1989 ...
24
XML publishing and XML Queries
  • A middleware (XPERANTO/XTABLES) can be used to
    publish and query historical tables as XML
    documents
  • Create XML views over relational data
  • Each database has a default XML view
  • The temporal XML document representation can be
    reconstructed with user-defined XML views with
    XQuery, and be queried with XQuery
  • Query upon XML views with XQuery
  • Only the desired relational data items are
    materialized
  • Most computation pushed down to relational engine

25
Automatic Archiving
  • Statement
  • CREATE HISTORICAL VIEW viewname AS
  • SELECT col1, col2,
  • FROM tablename USING KEY coli, Colj,
  • Results
  • Historical tables are created for each attribute
    of the current table
  • Temporal XML views are created with XPERANTO
  • The historical tables are initialized with the
    snapshot of the current table
  • Active rules are started to trace any changes and
    archive into the historical tables
  • Temporal XQuery can be specified on the XML views

26
Implementation Comparisons
  • A temporal data simulation program automatically
    generates the historical data in XML
  • Total number of employees 300,024
  • Database systems and major supported query
    languages for comparison
  • Relational DB2. SQL
  • Native
  • SoftwareAGs Tamino (text-based storage). XPath
  • eXcelons XIS (XML Information Server)
    (OODBMS-based storage). XQuery

27
Performance Comparisons
Storage Size
28
Performance Comparisons (contd)
Query Performance of DB2 and Tamino
Q2 history query Q4,Q6 snapshot queries Q3,Q5
interval queries Q1 scan of databases Q7 join
29
Performance Comparisons (contd)
Query Performance of Tamino and XIS (1/3 data
size)
30
Efficient Query Support for Temporal Queries
  • H-document is first clustered by document
    structure, and then by the change history
  • Tamino will preserve the clustering structure
    thus retrieving the history of a node can be
    efficient
  • In RDBMS approach, tuples are stored in the order
    of updates, neither temporarily clustered nor
    clustered by objects
  • Traditional B Tree index will not help on
    interval-related temporal queries
  • A segment-based archiving scheme was used in this
    project

31
Segment-based Archiving Scheme (contd)
  • Sample segments

Segment1 (01/01/1985 - 10/17/1991) ID
SALARY TSTART TEND 100002
40000 02/20/1988 02/19/1989 100002 42010
02/20/1989 02/19/1990 100002 42525
02/20/1990 02/19/1991 100002 42727
02/20/1991 12/31/9999 ... Segment2
(10/18/1991 - 07/08/1995) ID SALARY
TSTART TEND 100002 42727
02/20/1991 02/19/1992 100002 45237
02/20/1992 02/18/1993 100002 46465
02/19/1993 02/18/1994 100002 47418
02/19/1994 02/18/1995 100002 47273
02/19/1995 12/31/9999 ... segment3
(07/09/1995 - 01/08/1999) ...
32
Query Performance
Query Performance with different usefulness
Q1,Q3 snapshot queries Q5 interval queries Q2,
Q4 history queries
33
Conclusion
  • XML can be used to support a temporally grouped
    data model, and represent temporal relational
    data
  • The framework supports complex temporal queries
    with XQuery, without extension to XQuery
  • The XML-viewed history of database tables can be
    stored using a native XML database or using a
    RDBMS
  • RDBMS has significant query performance compared
    to native XML database, while the latter can be
    more effective in terms of storage due to
    compression techniques
  • A segment-based archiving scheme based on
    usefulness can significantly boost the
    performance on most temporal queries

34
History of XML Documents
  • The temporal representation in XML not only
    applies to historical relational data, but also
    historical XML documents

ltdocument tstart"2002-01-01" tend"now"gt
ltchapter tstart"2002-01-01" tend"now"gt
ltno isAttr"yes" tstart"2002-01-01"
tend"now"gt1lt/nogt lttitle
tstart"2002-01-01" tend"2002-01-01"gtIntroduction
lt/titlegt lttitle tstart"2002-01-02"
tend"now"gtIntroduction and Overviewlt/titlegt
ltsection tstart"2002-01-01" tend"now"gt
lttitle tstart"2002-01-01"
tend"now"gtBackgroundlt/titlegt
ltsubsection tstart"2002-01-01" tend"now"gt
lttitle tstart"2002-01-01"
tend"now"gtPrevious Worklt/titlegt
ltcontent tstart"2002-01-01" tend"now"gt...lt/conte
ntgt lt/subsectiongt
lt/sectiongt lt/chaptergt lt!-- ...
--gt lt/documentgt
Write a Comment
User Comments (0)
About PowerShow.com