Title: Change-Centric Management of Versions in an XML Warehouse
1Change-Centric Management of Versions in an XML Warehouse
- Amélie Marian
- Columbia University
- Serge Abiteboul, Grégory Cobéna, Laurent Mignet
- INRIA-Rocquencourt
2Overview
- The Xyleme Project
- Change Management
- Version Management
- XIDs
- XML Diff
- Deltas
- Storage of XML document versions
- Implementation and experiments
3The Xyleme Project
- A dynamic XML Data Warehouse with high-level services
- User-friendly Query Engine
- Semantic Data Integration
- Version Management
- Query Subscription, Change Monitoring services
- Xyleme project is now finished
- Start-up also called Xyleme
4Change Management
- Version Management
- Learning about Changes
- Monitoring Changes: Query Subscription
- Querying the Past: Temporal Queries
5Version Management
- Our Requirements
- Obtain the current version
- Get the modifications since time t
- Subscribe to change notifications, query changes
- Compute temporal queries
- Rebuild the version Vi of a document at time ti
6Getting the Documents
- XML documents are fetched from the web
- We only have snapshots of the documents
7XIDs
- Unique identifiers needed to track XML nodes through time
- Track changes on a specific node (e.g., a product in a catalog)
- Reconstruct the history of a node
- But physically adding an ID attribute to each node is expensive storage-wise
- → XIDs allow persistent IDs to be attached to every node in a storage-efficient manner
8XIDs
- XIDs stored separately as a list (XID-map)
- List of the node IDs in a postorder traversal of the tree
- XIDnext gives the next available XID
- Compact Representation
- Document is not modified
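The XID-map idea can be sketched in a few lines. This is a minimal illustration, not the Xyleme implementation: it assumes that text nodes receive XIDs like elements do, that numbering starts at 1, and that the compact form is a parenthesized list of ranges. The helper names `postorder_labels` and `compress` are hypothetical.

```python
import xml.etree.ElementTree as ET

def postorder_labels(elem):
    """Yield one label per node in postorder; non-empty text counts as a node.
    (Assumption: mixed-content tails are ignored in this sketch.)"""
    if elem.text and elem.text.strip():
        yield f"text:{elem.text.strip()}"
    for child in elem:
        yield from postorder_labels(child)
    yield elem.tag

def compress(xids):
    """Encode a list of XIDs as ranges, e.g. [6, 7, 8, 17] -> '(6-8,17)'."""
    runs = []
    for x in xids:
        if runs and x == runs[-1][1] + 1:
            runs[-1][1] = x          # extend the current run
        else:
            runs.append([x, x])      # start a new run
    parts = [f"{a}-{b}" if a != b else str(a) for a, b in runs]
    return "(" + ",".join(parts) + ")"

doc = ET.fromstring(
    "<Catalog><Product><Name>Camera</Name><Price>300</Price></Product></Catalog>"
)
labels = list(postorder_labels(doc))
xids = list(range(1, len(labels) + 1))   # first version: XIDs are simply 1..n
xid_map = compress(xids)                 # kept next to the document, not inside it
xid_next = len(labels) + 1               # next available XID
```

The document itself carries no ID attributes; position i in the postorder traversal is matched with entry i of the XID-map.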
9XML Diff
- We implemented an XML diff algorithm to compute changes between two versions of a document
- Use of XML structure for matching
- Content matching
- Linear in the size of the document
- XML diff has two roles
- Match nodes
- Build the delta
- Ongoing work on improving the XML diff
10Node Matching using a Diff Algorithm
Diff(V1,V2): delete(5), update(13,150), insert(16,2,(17-21))
New XID-map: (6-10,17-21,11-16), XIDnext = 22
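The effect of this diff on the XID-map can be replayed on a toy tree. The sketch below assumes a shape for V1 that the slide does not show: a root with XID 16 holding three five-node Product subtrees (XIDs 1-5, 6-10, 11-15), with XIDnext = 17 before the diff. All helper names are hypothetical, and update(13,150) is left out because it changes a value, not the set or order of nodes.

```python
def node(xid, *children):
    return {"xid": xid, "children": list(children)}

def product(first):
    """Product subtree using XIDs first..first+4 in postorder (assumed shape)."""
    return node(first + 4,
                node(first + 1, node(first)),        # Name and its text
                node(first + 3, node(first + 2)))    # Price and its text

def postorder(n):
    for child in n["children"]:
        yield from postorder(child)
    yield n["xid"]

def find(n, xid):
    if n["xid"] == xid:
        return n
    for child in n["children"]:
        hit = find(child, xid)
        if hit is not None:
            return hit
    return None

def delete(n, xid):
    """Remove the subtree rooted at xid; return True once it has been removed."""
    for child in n["children"]:
        if child["xid"] == xid:
            n["children"].remove(child)
            return True
        if delete(child, xid):
            return True
    return False

def compress(xids):
    runs = []
    for x in xids:
        if runs and x == runs[-1][1] + 1:
            runs[-1][1] = x
        else:
            runs.append([x, x])
    return "(" + ",".join(f"{a}-{b}" if a != b else str(a) for a, b in runs) + ")"

v = node(16, product(1), product(6), product(11))   # assumed V1, XID-map (1-16)
xid_next = 17

delete(v, 5)                                        # delete(5)
new_subtree = product(xid_next)                     # inserted nodes get XIDs 17-21
find(v, 16)["children"].insert(2 - 1, new_subtree)  # insert(16, 2, ...), 1-based position
xid_next += len(list(postorder(new_subtree)))

new_map = compress(postorder(v))
```

Under these assumptions the traversal yields the ranges 6-10, 17-21, 11-16, and the next available XID becomes 22.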
11Edit-Scripts: SEQUENCE
- Sequences of basic operations over XML trees
- Delete(n)
- Update(n, v)
- Insert(m,k,T)
- Move(n,k,m)
- An Edit Script can be applied to a document D if its operations are consistent with D
- An Edit Script applied to a document D will result in a unique document D'
- Several Edit Scripts applied to a document D can result in the same document D'
12Deltas (Δ): SET
- We introduce an alternative way of representing changes: Deltas
- Δi,j (unit delta) contains the Set of operations needed to go from Vi to Vj (Diff(Vi,Vj))
- A Delta (Δ) over a document D is the sequence of unit deltas over D
- Δ = Δ1,2, ..., Δk-1,k
- There is an (almost) unique delta from Vi to Vj
- We represent Deltas as XML documents
13Shortcomings of Deltas
- Deltas are not reversible and cannot be composed (information on position is missing)
- Storage Policies
- a) V1, Δ1,2, ..., Δnow-1,now
- b) Δ2,1, ..., Δnow,now-1, Vnow
- c) V1, Δ2,1, ..., Δnow,now-1
- d) Δ1,2, ..., Δnow-1,now, Vnow
- Only a) and b) are lossless
- But we would like to have fast access to
- Vnow
- Δi,now
14Completed Deltas (Δ)
- Completed deltas contain more information
- Delete(m,k,T)
- Update(n, ov, nv)
- Insert(m,k,T)
- Move(n,k,m,p,q)
- Completed Deltas can be reversed and composed
- Completed Deltas are in the spirit of some logs in DB systems
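One way to see why the extra fields make a delta reversible is to model each completed operation as a record with an inverse. This is a sketch only: the field names and the reading of Move(n,k,m,p,q) as old and new parent/position are assumptions, and subtrees are kept as plain strings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Delete:
    parent: int
    position: int
    subtree: str
    def inverse(self):
        return Insert(self.parent, self.position, self.subtree)

@dataclass(frozen=True)
class Insert:
    parent: int
    position: int
    subtree: str
    def inverse(self):
        return Delete(self.parent, self.position, self.subtree)

@dataclass(frozen=True)
class Update:
    node: int
    old_value: str
    new_value: str
    def inverse(self):
        return Update(self.node, self.new_value, self.old_value)

@dataclass(frozen=True)
class Move:
    node: int
    old_parent: int       # assumed meaning of the extra Move arguments
    old_position: int
    new_parent: int
    new_position: int
    def inverse(self):
        return Move(self.node, self.new_parent, self.new_position,
                    self.old_parent, self.old_position)

def reverse(delta):
    """Reverse a unit delta (a set of operations) by inverting every operation."""
    return frozenset(op.inverse() for op in delta)

delta_1_2 = frozenset({
    Delete(16, 1, "<Product>Camera</Product>"),
    Update(13, "200", "150"),
    Insert(16, 2, "<Product>DVD</Product>"),
})
delta_2_1 = reverse(delta_1_2)
```

A plain Delete(n) or Update(n, v) could not be inverted this way, because the deleted subtree, its position, and the old value would be unknown.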
15Example of XML Δ
<delta>
  <unit_delta>
    ...
  </unit_delta>
  <unit_delta>
    <time from="1" to="2"/>
    <delete parent="16" position="1" xid-map="(1-5)">
      <Product>
        <Name>Camera</Name>
        <Price>300</Price>
      </Product>
    </delete>
    <update xid="13" new_value="150" old_value="200"/>
    <insert parent="16" position="2" xid-map="(17-21)">
      <Product>
        <Name>DVD</Name>
        <Price>500</Price>
      </Product>
    </insert>
  </unit_delta>
</delta>
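Because deltas are themselves XML documents, they can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a delta shaped like the example; the attribute quoting and the closing tags are assumptions about the original markup, and `summarize` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DELTA = """<delta>
  <unit_delta>
    <time from="1" to="2"/>
    <delete parent="16" position="1" xid-map="(1-5)">
      <Product><Name>Camera</Name><Price>300</Price></Product>
    </delete>
    <update xid="13" new_value="150" old_value="200"/>
    <insert parent="16" position="2" xid-map="(17-21)">
      <Product><Name>DVD</Name><Price>500</Price></Product>
    </insert>
  </unit_delta>
</delta>"""

def summarize(delta_xml):
    """Return (version span, operation name, attributes) for every operation."""
    root = ET.fromstring(delta_xml)
    ops = []
    for unit in root.findall("unit_delta"):
        time = unit.find("time")
        span = (int(time.get("from")), int(time.get("to")))
        for op in unit:
            if op.tag == "time":
                continue                      # metadata, not an operation
            ops.append((span, op.tag, dict(op.attrib)))
    return ops

operations = summarize(DELTA)
for span, name, attrs in operations:
    print(span, name, attrs)
```

Representing deltas this way means the same query machinery used for documents can be used to query changes.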
16Operations on Deltas
- Compute with version
- Vi o Δi,j = Vj
- Reverse: (Δi,j)-1 = Δj,i
- Compose: Δi,j Δj,k = Δi,k
- Simplify: Δi,j → Δ'i,j
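These operations can be illustrated on a deliberately restricted case: deltas that contain only completed updates. In this sketch a version is a dict from XID to value and a unit delta is a dict from XID to an (old value, new value) pair; this flat model and all function names are assumptions made for the example, not the tree-based operations of the system.

```python
def apply(version, delta):
    """Vi o Δi,j = Vj (fails if the delta is not consistent with the version)."""
    out = dict(version)
    for xid, (old, new) in delta.items():
        if out.get(xid) != old:
            raise ValueError(f"delta not consistent with version at XID {xid}")
        out[xid] = new
    return out

def reverse(delta):
    """(Δi,j)^-1 = Δj,i: swap old and new values."""
    return {xid: (new, old) for xid, (old, new) in delta.items()}

def compose(d_ij, d_jk):
    """Δi,j Δj,k = Δi,k: keep the earliest old value and the latest new value."""
    out = dict(d_ij)
    for xid, (old, new) in d_jk.items():
        first_old = out[xid][0] if xid in out else old
        out[xid] = (first_old, new)
    return out

def simplify(delta):
    """Drop updates that end where they started."""
    return {xid: (old, new) for xid, (old, new) in delta.items() if old != new}

v1 = {13: "200", 3: "300"}
d12 = {13: ("200", "150")}
d23 = {13: ("150", "200"), 3: ("300", "350")}

v2 = apply(v1, d12)
v3 = apply(v2, d23)
d13 = simplify(compose(d12, d23))
```

Here the price at XID 13 goes from 200 to 150 and back, so the simplified composition keeps only the change at XID 3. None of the three functions would be definable if the old values were not stored in the delta.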
17Storage of Versions
- For a document D (or a query result Q), we store
- Current Version Vk
- XID-map (as text) of Vk
- Current Δ = Δ1,2, ..., Δk-1,k
- When a new version k+1 arrives
- Compute XML diff between k and k+1, compute Δk,k+1
- Replace current version with Vk+1
- Replace XID-map
- Append Δk,k+1 to Δ
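The storage cycle above can be sketched with a small class. It is a simplification under stated assumptions: a version is a flat dict from XID to value rather than an XML tree, the hypothetical `diff` compares dicts instead of running an XML diff, `None` marks an absent node, and the XID-map is not stored separately.

```python
def diff(old, new):
    """Completed unit delta between two flat versions: XID -> (old value, new value)."""
    delta = {}
    for xid in old.keys() | new.keys():
        before, after = old.get(xid), new.get(xid)   # None means "node absent"
        if before != after:
            delta[xid] = (before, after)
    return delta

class VersionStore:
    def __init__(self, first_version):
        self.current = dict(first_version)   # Vk, always directly available
        self.deltas = []                     # deltas[i] goes from version i+1 to i+2

    def add_version(self, new_version):
        self.deltas.append(diff(self.current, new_version))   # append Δk,k+1
        self.current = dict(new_version)                      # replace Vk by Vk+1

    def rebuild(self, i):
        """Rebuild version i (1-based) by undoing deltas backward from Vk."""
        version = dict(self.current)
        for delta in reversed(self.deltas[i - 1:]):
            for xid, (old, _new) in delta.items():
                if old is None:
                    del version[xid]         # node was inserted later: remove it
                else:
                    version[xid] = old       # restore the old value
        return version

store = VersionStore({1: "Camera", 3: "300"})
store.add_version({1: "Camera", 3: "250"})
store.add_version({1: "Camera", 3: "250", 6: "DVD"})
```

The current version is read without any computation, while older versions cost one backward pass over the deltas; this only works because the stored deltas are completed, so each one can be undone.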
18Levels of Versioning
- Full versioning is expensive, so we support different levels of versioning
- Full Versioning: Vnow + Δ
- Partial Versioning: Vnow + a partial Δ
- Last Version Update: Vnow + Δnow-1,now
- Change Support: Vnow + XML diff computed for Query Subscription
- Not Versioned: Vnow
19Implementation
- Version Manager and XML diff implemented in C++
- A change simulator was implemented for tests
- A GUI was implemented
20GUI Interface
21Deltas Statistics
- Reasonable when there are not many modifications
- Relatively expensive for small documents
- Depends on the quality of the diff
22Deltas Statistics (2)
- 30% of modifications on the document
- From left to right
- Snapshots
- Completed Deltas
- Deltas: composition and previous version reconstruction are not possible
- Composed Completed Deltas: advantages of Completed Deltas, but coarser granularity and higher cost
23Conclusion
- Management of Versions based on Change Representation
- Representation in tree data (XML)
- Study of storage policies
- Implementation of running prototypes
- Completed Deltas: a Set of Modifications
- Mathematical properties on completed deltas (algebraic group)
- Current work on Query Subscription, Continuous Queries and Changes over Collections of Documents
24References
- Version Management
- Chien, Tsotras and Zaniolo. Efficient Management of Multiversion Documents by Object Referencing. VLDB 2001.
- Chawathe, Abiteboul and Widom. Managing Historical Semistructured Data. TAPOS 1999.
- Cellary and Jomier. Consistency of Versions in Object-Oriented Databases. VLDB 1990.
- Adiba and Lindsay. Database Snapshots. VLDB 1980.
- Diff Algorithms
- Chawathe and Garcia-Molina. Meaningful Change Detection in Structured Data. SIGMOD 1997.
- Cobena, Abiteboul and Marian. Detecting Changes in XML Documents. Technical report INRIA.
- Xyleme
- Cluet, Veltri and Vodislav. Views in a Large Scale XML Repository. VLDB 2001.
- Nguyen, Abiteboul, Cobena and Preda. Monitoring XML data on the Web. SIGMOD 2001.
25Example: Edit-Scripts vs. Deltas
- A Possible Edit-Script
- Insert(B,1,P)
- Insert(C,1,P)
- The Delta
- Insert(B,2,P)
- Insert(C,1,P)
- Version 0: P with a single child A
- Edit-Scripts: relative position (at time of operation)
- Deltas: absolute position (final)
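The difference between the two position conventions can be checked on this example. The sketch models only the child list of P and only insertions; the function names are hypothetical, positions are 1-based, and applying a delta by sorting on final position is one possible strategy, not necessarily the one used in the system.

```python
def apply_edit_script(children, script):
    """Apply inserts in sequence; each position refers to the list at that moment."""
    out = list(children)
    for label, position in script:
        out.insert(position - 1, label)
    return out

def apply_delta(children, delta):
    """Apply a set of inserts whose positions are final (absolute) positions.
    Inserting in increasing position order puts every node at its final index."""
    out = list(children)
    for label, position in sorted(delta, key=lambda op: op[1]):
        out.insert(position - 1, label)
    return out

version0 = ["A"]                                   # children of P in Version 0

edit_script = [("B", 1), ("C", 1)]                 # Insert(B,1,P) then Insert(C,1,P)
other_script = [("C", 1), ("B", 2)]                # a different, equivalent script
delta = {("B", 2), ("C", 1)}                       # unordered set, final positions

from_script = apply_edit_script(version0, edit_script)
from_other = apply_edit_script(version0, other_script)
from_delta = apply_delta(version0, delta)
```

Both edit scripts and the delta produce the children C, B, A. The scripts depend on their order of application, whereas the delta is a set whose content does not depend on any ordering.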
26Example: Missing Information for Delta Composition (Δ(0,2))
- Δ(0,1): Insert(B,2,P)
- Δ(1,2): Delete(C), Insert(D,2,P)
- Completed Δ(1,2): Delete(C,1,P), Insert(D,2,P)
- Deltas do not give information on parents and positions of deleted elements
- Positions of inserted elements in composition cannot be computed