Title: Managing and Querying Transactiontime Databases under Schema Evolution
1Managing and Querying Transaction-time Databases
under Schema Evolution
- Hyun J. Moon (UCLA)
- Carlo Curino (Politecnico di Milano)
- Alin Deutsch (UC San Diego)
- Chien-Yi Hou (UC San Diego)
- Carlo Zaniolo (UCLA)
Teradata
10/8/2008
2Transaction-time DBs
- Transaction-time DBs
- History of data, automatically recorded by
systems - Benefits History analysis, fault recovery,
support for auditing - Temporally ungrouped model
- a.k.a. tuple-level timestamping
- Problems redundancy, need for coalesce at
projection - Temporally grouped model
- a.k.a. attribute-level timestamping
- Better than ungrouped Clifford95
- Problem not easily represented by relational
model - Solution XML!
empno
3XML for Temporal Data
- V-Document
- Naturally represent the temporally grouped model
VLDBJ08
Table
ltdb tsT1 tenowgt ltempacct tsT1 tenowgt
ltrow tsT1 tenowgt ltempno tsT1
tenowgt1001lt/empnogt lttitle tsT1
teT3gtEngineerlt/titlegt lttitle tsT3
teT4gtSr Engineerlt/titlegt lttitle tsT4
tenowgtTech Leaderlt/titlegt ltdeptno tsT1
teT3gtd01lt/titlegt ltdeptno tsT3
tenowgtd02lt/titlegt lt/rowgt lt/empacctgt lt/dbgt
Columns
4XQuery for Temporal Query
- XQuery is a good temporal query language
- Complex temporal queries are easily written
- Turing-complete language
- Current XML standards no extension required
- This is the logical viewthat is then shredded
into the flat tables (and SQL/XML) for better
performance VLDBJ08
Query 1 Temporal projection. Retrieve the title
history of employee Bob for t in
doc(emp.xml)/db/empacct/row
nameBob/title return t
Query 2 Temporal Snapshot. Retrieve all the
titles at 1990-07-01 for t in
doc(emp.xml)/db/empacct/row/title
_at_tslt1990-07-01 and _at_tegt1990-07-01 return t
5Outline
- Background Transaction-time DBs with Fixed
Schemas - Challenges with Schema Evolution
- How to Archive?
- How to Query?
- Optimization
6Schema Evolution
- Schemas change. Why?
- Data reorganization
- Better logical design, performance
- Changes in the modeled reality
- New data collection, MA and restructuring,
government regulations - Is it real?
- Wikipedia DB 171 schema versions over 4.5 years
ICEIS08 - More case studies Marche93 Sjoberg93
7Schema Modification Operators (SMOs)
- Language for schema change
- Procedural fashion
- Do this, do that, Easier for regular DBAs
- Similar to primitives in Bernstein06
SMOs describing Wikipedia DB Schema Evolution
ICEIS08
8With Schema Changes Life is much harder.
Problem1 How to Archive!
Schema time
Schema time
TDB1
TDB2
TDB1 TDB2
S2
S2
T
T
TDB1
TDB1
S1
S1
Data time
T
Data time
T
- Current-Schema Archival
- History migrated into the current schema
- Problem Some history lost e.g. DROP
COLUMN
- Original-Schema Archival
- History stored under the original schema
- Lossless!
9Archiving, with Schema Changes?
- Previous work relational approaches
- Single-pool and Multi-pool DeCastro97
- Problem temporally ungrouped, i.e. redundant,
expensive coalescing - Our approach MV-Document
- Extending V-Document for evolving schemas
- Temporally grouped!
10MV-Document Example
V1 T1T5
ltdb tsT1 tenowgt ltempacct tsT1 tenowgt
ltrow tsT1 tenowgt ltempno tsT1
tenowgt1001lt/empnogt lttitle tsT1
teT3gtEngineerlt/titlegt lttitle tsT3
teT4gtSr Engineerlt/titlegt lttitle tsT4
tenowgtTech Leaderlt/titlegt ltdeptno tsT1
teT3gtd01lt/titlegt ltdeptno tsT3
tenowgtd02lt/titlegt ltsalary tsT5
tenowgt70000lt/salarygt lt/rowgt lt/empacctgt
ltjob tsT1 teT5gt ltrow tsT1 teT5gt
lttitle tsT1 teT5gtTech Leaderlt/titlegt
ltsalary tsT1 teT5gt70000lt/salarygt lt/rowgt
lt/jobgt lt/dbgt
SMOs MOVE COLUMN salary FROM job INTO empacct
WHERE
empacct.titlejob.title DROP TABLE job
V2 T5now
- Temporally grouped
- No history duplicated storage-efficient,
schema changes applied quickly
11Problem 2 How to Query w/ Schema Changes?
- Manual Querying
- Write one query per version
- Doesnt scale 100s of versions
- Schema Versioning by Data Translation
- Translate data into the queried version. Survey
Roddick95 - Inefficient!
- Implementation by Query Rewriting
- Use the above method as the semantics definition
- Rewrite the input query into source versions
- Efficient!
Q find salary history over last 20 years
Q
Q
Q
Q
Q
V2
V3
V4
V5
V1
time
12Query Rewriting
Input XQuery
Rewritten XQuery
- MARS for a query rewriting engine VLDB03
- Input XQuery, XICs
- Output XQuery
- Chase input query to find an equivalent query
modulo XICs - PRIMA translate SMO into XICs
- XML Integrity Constraints
- ICs for XML
- First-order logic with XPath
- Simple case MERGE TABLE S, T into R (from v1 to
v2)
SMOs Schema History
SMOs
SMO2XIC
PRIMA
XICs
MARS
/v1db/S(x1), ./_at_ts(x1,s), ./_at_te(x1,e),
./row(x1, x2) ??y1 /v2db/R(y1),
./_at_ts(y1,s), ./_at_te(y1,e), ./row(y1,x2) /v1d
b/T(x1), ./_at_ts(x1,s), ./_at_te(x1,e),
./row(x1, x2) ??y1 /v2db/R(y1),
./_at_ts(y1,s), ./_at_te(y1,e), ./row(y1,x2)
13Query Rewriting A Simple Example
- engineerpersonnel(empno, name, hiredate, title,
deptname) - otherpersonnel(empno, name, hiredate, title,
deptname) - MERGE TABLE engineerpersonnel, otherpersonnel
into empacct - empacct(empno, name, hiredate, title, deptname)
Query 1 Retrieve the title history of employee
Bob
for t in doc(emp.xml)/db/empacct/rownameBob
/title return t
rewrite
for t in doc(emp.xml)/db/empacct/rownameBob
/title return t union for t in
doc(emp.xml)/db/engineerpersonnel/rownameBob
/title return t union for t in
doc(emp.xml)/db/otherpersonnel/rownameBob/t
itle return t
See paper for more. VLDB08b
14Rewriting Optimization
- Problem Rewriting with many constraints is
expensive - Goal minimize constraints used in rewriting
- SMO Prune
- use SMOs that affect the tables/columns used in
the input query - SMO Compress
- group SMOs of a same type and produce a single
mapping
15Query Optimization
- Problem query rewriting is for non-temporal
queries - Goal exploit queries temporal semantics to
optimize - Minimal Source-version Detection
- E.g. Find Joes salary at 2007-01-01
- We dont need to query other schema versions
- Temporal Join Detection
- Temporal join a join with validity overlap
- We dont need to perform temporal joins across
versions
R1
R2
R
Union ? Join
S1
S2
S
R1
R2
R
Join ? Union
S1
S2
S
Some joins (R1-S2, R2-S1) pruned!
16Experiment Results (1/2)
- Query rewriting scales with many schema versions?
- Wikipedias 171 schema versions between Apr 2003
and Nov 2007 - Top 20 queries from Wikipedias real query
workload
gt100s
Scales well up to 170 versions
Improvement More than two orders (By SMO prune)
3.0s
1.5s
lt0.5s
17Experiment Results (2/2)
- Temporal query optimization effective?
- Synthetic employee data 642KB of MV-document in
MonetDB/XQuery - Five schema versions, 12 representative temporal
queries - Without MSD (MinSourceDetect), 8 queries fail to
finish (all bars touching the highest grid
indicate out-of-memory error)
The only two cases with temporal joins
18Challenges of Schema Evolution Panta Rhei
Framework
- Pain in DB migration and query adaptation
- Our approach PRISM workbench VLDB08a
- High-level language for schema modification,
automatic data translation, quasi-automatic
legacy query rewriting - Archiving and querying of DB history made
difficult - Our approach PRIMA system VLDB08b
- Standard XML, query rewriting, optimizations
using temporal semantics - Management of schema history itself
- Our approach Historical Metadata Manager
ECDM08 - Using PRIMAs technology to preserve and query
the evolution history generated by PRISM
19Conclusions
- Transaction-time DB is a great thing
- But, it doesnt support evolving schemas
- We propose an effective and efficient solution
- Temporal data model for schema evolution,
- Efficient query answering, by query rewriting
- Optimization for rewriting and queries
20Question or Comments?
21Extra Slides
22Presentation in a Nutshell
- Problem
- Transaction-time DBs have strong applications
- Study cases reveal that schema evolution is a
real problem - How to support evolving schemas in
transaction-time DB? - Challenges and our contributions
- Data model how to archive, with evolving
schemas? - Previous approach relational-based approaches,
which are temporally ungrouped - Our contribution XML-based temporally grouped
data model - Query support how to query evolving data with
evolving schemas? - Previous approach schema versioning, based on
data translation - Our contribution schema versioning based on
query rewriting - Solution scalability how to rewrite complex
temporal queries over hundreds of schema versions - Our contribution mapping pruning, rewriting
using minimal source versions and more
23PRIMA in Schema Versioning Taxonomy
- Schema versioning supported
- Schema modification allow schema changes
- Schema evolution above data preserved after
schema changes - Schema versioning above any schema version
can be used for querying/updating - Single-pool supported
- Single-pool one big table for all schema
versions, using completed schema, single line of
history - Multi-pool one table for each schema, multiple
lines of history - Synchronous management of versioned data and
schemata - Synchronous temporal data conform to schema of
same time - Asynchronous temporal data may not conform to
schema of same time - Partial schema versioning
- Partial schema versioning supports querying,
retrospectively and prospectively. Update allowed
only on a designated (usually current) schema
version - Full schema versioning supports both querying
and update retrospectively and prospectively
24Bibliography
Bernstein06 P.A. Bernstein, P.A., T.J. Green,
S. Melnik, A. Nash, Implementing Mapping
Composition, VLDB 2006 Clifford95 J. Clifford,
A. Croker, F. Grandi, and A. Tuzhilin. On
temporal grouping. In Recent Advances in Temporal
Databases, pages 194213. Springer Verlag,
1995. DeCastro97 Cristina De Castro, Fabio
Grandi, Maria Rita Scalas. Schema Versioning for
Multitemporal Relational Databases. Inf. Syst.
22(5) 249-290 (1997) ECDM08 C. Curino, H. J.
Moon, and C. Zaniolo. Managing the history of
metadata in support for db archiving and schema
evolution. In ECDM, 2008. ICEIS08 Carlo A.
Curino, Hyun J. Moon, Letizia Tanca, Carlo
Zaniolo. Schema Evolution in Wikipedia toward a
Web Information System Benchmark, International
Conference on Enterprise Information Systems
(ICEIS) 2008 Marche93 S. Marche. Measuring the
stability of data models, European Journal of
Information Systems, 2(1)37-47,
1993. Roddick95 J. Roddick. A Survey of Schema
Versioning Issues for Database Systems.
Information and Software Technology,
37(7)383393, 1995. Sjoberg93 D.I. Sjoberg.
Quantifying schema evolution, Information and
Software Technology, 35(1)35-44, 1993. VLDB03
A. Deutsch and V. Tannen. MARS A system for
publishing XML from mixed and redundant storage.
In VLDB, 2003. VLDB08a Carlo A. Curino, Hyun J.
Moon, and Carlo Zaniolo. "Graceful database
schema evolution the PRISM workbench". VLDB,
2008. VLDB08b Hyun J. Moon, Carlo A. Curino,
Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo.
"Managing and querying transaction-time databases
under schema evolution". VLDB, 2008. VLDBJ08
Fusheng Wang, Carlo Zaniolo, Xin Zhou. ArchIS
An XML-based approach to transaction-time
temporal database systems Accepted for
publication in VLDB Journal. Zhou06 Xin Zhou,
Fusheng Wang, Carlo Zaniolo. Efficient Temporal
Coalescing Query Support in Relational Database
Systems DEXA 2006