Managing and Querying Transactiontime Databases under Schema Evolution - PowerPoint PPT Presentation

1 / 24
About This Presentation
Title:

Managing and Querying Transactiontime Databases under Schema Evolution

Description:

MARS for a query rewriting engine [VLDB03] Input: XQuery, XICs ... 8 queries fail to finish (all bars touching the highest grid indicate out-of-memory error) ... – PowerPoint PPT presentation

Number of Views:23
Avg rating:3.0/5.0
Slides: 25
Provided by: yellowsto
Category:

less

Transcript and Presenter's Notes

Title: Managing and Querying Transactiontime Databases under Schema Evolution


1
Managing and Querying Transaction-time Databases
under Schema Evolution
  • Hyun J. Moon (UCLA)
  • Carlo Curino (Politecnico di Milano)
  • Alin Deutsch (UC San Diego)
  • Chien-Yi Hou (UC San Diego)
  • Carlo Zaniolo (UCLA)

Teradata
10/8/2008
2
Transaction-time DBs
  • Transaction-time DBs
  • History of data, automatically recorded by
    systems
  • Benefits History analysis, fault recovery,
    support for auditing
  • Temporally ungrouped model
  • a.k.a. tuple-level timestamping
  • Problems redundancy, need for coalesce at
    projection
  • Temporally grouped model
  • a.k.a. attribute-level timestamping
  • Better than ungrouped Clifford95
  • Problem not easily represented by relational
    model
  • Solution XML!

empno
3
XML for Temporal Data
  • V-Document
  • Naturally represent the temporally grouped model
    VLDBJ08

Table
ltdb tsT1 tenowgt ltempacct tsT1 tenowgt
ltrow tsT1 tenowgt ltempno tsT1
tenowgt1001lt/empnogt lttitle tsT1
teT3gtEngineerlt/titlegt lttitle tsT3
teT4gtSr Engineerlt/titlegt lttitle tsT4
tenowgtTech Leaderlt/titlegt ltdeptno tsT1
teT3gtd01lt/titlegt ltdeptno tsT3
tenowgtd02lt/titlegt lt/rowgt lt/empacctgt lt/dbgt
Columns
4
XQuery for Temporal Query
  • XQuery is a good temporal query language
  • Complex temporal queries are easily written
  • Turing-complete language
  • Current XML standards no extension required
  • This is the logical viewthat is then shredded
    into the flat tables (and SQL/XML) for better
    performance VLDBJ08

Query 1 Temporal projection. Retrieve the title
history of employee Bob for t in
doc(emp.xml)/db/empacct/row
nameBob/title return t
Query 2 Temporal Snapshot. Retrieve all the
titles at 1990-07-01 for t in
doc(emp.xml)/db/empacct/row/title
_at_tslt1990-07-01 and _at_tegt1990-07-01 return t
5
Outline
  • Background Transaction-time DBs with Fixed
    Schemas
  • Challenges with Schema Evolution
  • How to Archive?
  • How to Query?
  • Optimization

6
Schema Evolution
  • Schemas change. Why?
  • Data reorganization
  • Better logical design, performance
  • Changes in the modeled reality
  • New data collection, MA and restructuring,
    government regulations
  • Is it real?
  • Wikipedia DB 171 schema versions over 4.5 years
    ICEIS08
  • More case studies Marche93 Sjoberg93

7
Schema Modification Operators (SMOs)
  • Language for schema change
  • Procedural fashion
  • Do this, do that, Easier for regular DBAs
  • Similar to primitives in Bernstein06

SMOs describing Wikipedia DB Schema Evolution
ICEIS08
8
With Schema Changes Life is much harder.
Problem1 How to Archive!
Schema time
Schema time
TDB1
TDB2
TDB1 TDB2
S2
S2
T
T
TDB1
TDB1
S1
S1
Data time
T
Data time
T
  • Current-Schema Archival
  • History migrated into the current schema
  • Problem Some history lost e.g. DROP
    COLUMN
  • Original-Schema Archival
  • History stored under the original schema
  • Lossless!

9
Archiving, with Schema Changes?
  • Previous work relational approaches
  • Single-pool and Multi-pool DeCastro97
  • Problem temporally ungrouped, i.e. redundant,
    expensive coalescing
  • Our approach MV-Document
  • Extending V-Document for evolving schemas
  • Temporally grouped!

10
MV-Document Example
V1 T1T5
ltdb tsT1 tenowgt ltempacct tsT1 tenowgt
ltrow tsT1 tenowgt ltempno tsT1
tenowgt1001lt/empnogt lttitle tsT1
teT3gtEngineerlt/titlegt lttitle tsT3
teT4gtSr Engineerlt/titlegt lttitle tsT4
tenowgtTech Leaderlt/titlegt ltdeptno tsT1
teT3gtd01lt/titlegt ltdeptno tsT3
tenowgtd02lt/titlegt ltsalary tsT5
tenowgt70000lt/salarygt lt/rowgt lt/empacctgt
ltjob tsT1 teT5gt ltrow tsT1 teT5gt
lttitle tsT1 teT5gtTech Leaderlt/titlegt
ltsalary tsT1 teT5gt70000lt/salarygt lt/rowgt
lt/jobgt lt/dbgt
SMOs MOVE COLUMN salary FROM job INTO empacct
WHERE
empacct.titlejob.title DROP TABLE job
V2 T5now
  • Temporally grouped
  • No history duplicated storage-efficient,
    schema changes applied quickly

11
Problem 2 How to Query w/ Schema Changes?
  • Manual Querying
  • Write one query per version
  • Doesnt scale 100s of versions
  • Schema Versioning by Data Translation
  • Translate data into the queried version. Survey
    Roddick95
  • Inefficient!
  • Implementation by Query Rewriting
  • Use the above method as the semantics definition
  • Rewrite the input query into source versions
  • Efficient!

Q find salary history over last 20 years
Q
Q
Q
Q
Q
V2
V3
V4
V5
V1
time
12
Query Rewriting
Input XQuery
Rewritten XQuery
  • MARS for a query rewriting engine VLDB03
  • Input XQuery, XICs
  • Output XQuery
  • Chase input query to find an equivalent query
    modulo XICs
  • PRIMA translate SMO into XICs
  • XML Integrity Constraints
  • ICs for XML
  • First-order logic with XPath
  • Simple case MERGE TABLE S, T into R (from v1 to
    v2)

SMOs Schema History
SMOs
SMO2XIC
PRIMA
XICs
MARS
/v1db/S(x1), ./_at_ts(x1,s), ./_at_te(x1,e),
./row(x1, x2) ??y1 /v2db/R(y1),
./_at_ts(y1,s), ./_at_te(y1,e), ./row(y1,x2) /v1d
b/T(x1), ./_at_ts(x1,s), ./_at_te(x1,e),
./row(x1, x2) ??y1 /v2db/R(y1),
./_at_ts(y1,s), ./_at_te(y1,e), ./row(y1,x2)
13
Query Rewriting A Simple Example
  • engineerpersonnel(empno, name, hiredate, title,
    deptname)
  • otherpersonnel(empno, name, hiredate, title,
    deptname)
  • MERGE TABLE engineerpersonnel, otherpersonnel
    into empacct
  • empacct(empno, name, hiredate, title, deptname)

Query 1 Retrieve the title history of employee
Bob
for t in doc(emp.xml)/db/empacct/rownameBob
/title return t
rewrite
for t in doc(emp.xml)/db/empacct/rownameBob
/title return t union for t in
doc(emp.xml)/db/engineerpersonnel/rownameBob
/title return t union for t in
doc(emp.xml)/db/otherpersonnel/rownameBob/t
itle return t
See paper for more. VLDB08b
14
Rewriting Optimization
  • Problem Rewriting with many constraints is
    expensive
  • Goal minimize constraints used in rewriting
  • SMO Prune
  • use SMOs that affect the tables/columns used in
    the input query
  • SMO Compress
  • group SMOs of a same type and produce a single
    mapping

15
Query Optimization
  • Problem query rewriting is for non-temporal
    queries
  • Goal exploit queries temporal semantics to
    optimize
  • Minimal Source-version Detection
  • E.g. Find Joes salary at 2007-01-01
  • We dont need to query other schema versions
  • Temporal Join Detection
  • Temporal join a join with validity overlap
  • We dont need to perform temporal joins across
    versions

R1
R2
R
Union ? Join
S1
S2
S
R1
R2
R
Join ? Union
S1
S2
S
Some joins (R1-S2, R2-S1) pruned!
16
Experiment Results (1/2)
  • Query rewriting scales with many schema versions?
  • Wikipedias 171 schema versions between Apr 2003
    and Nov 2007
  • Top 20 queries from Wikipedias real query
    workload

gt100s
Scales well up to 170 versions
Improvement More than two orders (By SMO prune)
3.0s
1.5s
lt0.5s
17
Experiment Results (2/2)
  • Temporal query optimization effective?
  • Synthetic employee data 642KB of MV-document in
    MonetDB/XQuery
  • Five schema versions, 12 representative temporal
    queries
  • Without MSD (MinSourceDetect), 8 queries fail to
    finish (all bars touching the highest grid
    indicate out-of-memory error)

The only two cases with temporal joins
18
Challenges of Schema Evolution Panta Rhei
Framework
  • Pain in DB migration and query adaptation
  • Our approach PRISM workbench VLDB08a
  • High-level language for schema modification,
    automatic data translation, quasi-automatic
    legacy query rewriting
  • Archiving and querying of DB history made
    difficult
  • Our approach PRIMA system VLDB08b
  • Standard XML, query rewriting, optimizations
    using temporal semantics
  • Management of schema history itself
  • Our approach Historical Metadata Manager
    ECDM08
  • Using PRIMAs technology to preserve and query
    the evolution history generated by PRISM

19
Conclusions
  • Transaction-time DB is a great thing
  • But, it doesnt support evolving schemas
  • We propose an effective and efficient solution
  • Temporal data model for schema evolution,
  • Efficient query answering, by query rewriting
  • Optimization for rewriting and queries

20
  • Thank you!

Question or Comments?
21
Extra Slides
22
Presentation in a Nutshell
  • Problem
  • Transaction-time DBs have strong applications
  • Study cases reveal that schema evolution is a
    real problem
  • How to support evolving schemas in
    transaction-time DB?
  • Challenges and our contributions
  • Data model how to archive, with evolving
    schemas?
  • Previous approach relational-based approaches,
    which are temporally ungrouped
  • Our contribution XML-based temporally grouped
    data model
  • Query support how to query evolving data with
    evolving schemas?
  • Previous approach schema versioning, based on
    data translation
  • Our contribution schema versioning based on
    query rewriting
  • Solution scalability how to rewrite complex
    temporal queries over hundreds of schema versions
  • Our contribution mapping pruning, rewriting
    using minimal source versions and more

23
PRIMA in Schema Versioning Taxonomy
  • Schema versioning supported
  • Schema modification allow schema changes
  • Schema evolution above data preserved after
    schema changes
  • Schema versioning above any schema version
    can be used for querying/updating
  • Single-pool supported
  • Single-pool one big table for all schema
    versions, using completed schema, single line of
    history
  • Multi-pool one table for each schema, multiple
    lines of history
  • Synchronous management of versioned data and
    schemata
  • Synchronous temporal data conform to schema of
    same time
  • Asynchronous temporal data may not conform to
    schema of same time
  • Partial schema versioning
  • Partial schema versioning supports querying,
    retrospectively and prospectively. Update allowed
    only on a designated (usually current) schema
    version
  • Full schema versioning supports both querying
    and update retrospectively and prospectively

24
Bibliography
Bernstein06 P.A. Bernstein, P.A., T.J. Green,
S. Melnik, A. Nash, Implementing Mapping
Composition, VLDB 2006 Clifford95 J. Clifford,
A. Croker, F. Grandi, and A. Tuzhilin. On
temporal grouping. In Recent Advances in Temporal
Databases, pages 194213. Springer Verlag,
1995. DeCastro97 Cristina De Castro, Fabio
Grandi, Maria Rita Scalas. Schema Versioning for
Multitemporal Relational Databases. Inf. Syst.
22(5) 249-290 (1997) ECDM08 C. Curino, H. J.
Moon, and C. Zaniolo. Managing the history of
metadata in support for db archiving and schema
evolution. In ECDM, 2008. ICEIS08 Carlo A.
Curino, Hyun J. Moon, Letizia Tanca, Carlo
Zaniolo. Schema Evolution in Wikipedia toward a
Web Information System Benchmark, International
Conference on Enterprise Information Systems
(ICEIS) 2008 Marche93 S. Marche. Measuring the
stability of data models, European Journal of
Information Systems, 2(1)37-47,
1993. Roddick95 J. Roddick. A Survey of Schema
Versioning Issues for Database Systems.
Information and Software Technology,
37(7)383393, 1995. Sjoberg93 D.I. Sjoberg.
Quantifying schema evolution, Information and
Software Technology, 35(1)35-44, 1993. VLDB03
A. Deutsch and V. Tannen. MARS A system for
publishing XML from mixed and redundant storage.
In VLDB, 2003. VLDB08a Carlo A. Curino, Hyun J.
Moon, and Carlo Zaniolo. "Graceful database
schema evolution the PRISM workbench". VLDB,
2008. VLDB08b Hyun J. Moon, Carlo A. Curino,
Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo.
"Managing and querying transaction-time databases
under schema evolution". VLDB, 2008. VLDBJ08
Fusheng Wang, Carlo Zaniolo, Xin Zhou. ArchIS
An XML-based approach to transaction-time
temporal database systems Accepted for
publication in VLDB Journal. Zhou06 Xin Zhou,
Fusheng Wang, Carlo Zaniolo. Efficient Temporal
Coalescing Query Support in Relational Database
Systems DEXA 2006
Write a Comment
User Comments (0)
About PowerShow.com