Title: Schema Mediated Exchange of Temporal XML Data
1 Schema Mediated Exchange of Temporal XML Data
- Curtis Dyreson, Washington State University
- Richard T. Snodgrass, University of Arizona
- Sabah Currim, University of Arizona
- Faiz Currim, University of Iowa
2 Scenario
- Genomic data from NCBI
- Data collection is growing/changing
- Want data and data provenance (who, what, when, ...)
3 Obtaining Web Data
Overwrite D
Write D
4 Data Evolves
- Download XML formatted data (as of January 1)
  <gene name="TRY4">
    <desc>trypsin 4</desc>
    <ontology ref="MGI" function="unknown"/>
  </gene>
- Download again (as of March 6)
  <gene name="TRY4">
    <desc>trypsin 4, beta-cell receptor</desc>
    <ontology ref="MGI" function="synthesizes trypsinogen"/>
  </gene>
5 Refreshing the Data (using SDOs)
- What about versions between D and D_old?
- Did I download valid data?
- My DB is pretty big
6 Did I Download the Right Data?
Schema
Namespace
Validating Parser
XML Data
7 Fragment of the Genomic Schema
<element name="gene">
  <complexType>
    <attribute name="name" type="text" use="required"/>
    <sequence>
      <element name="desc" type="string"/>
      <element ref="ontology" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</element>
8 Uses of an XML Schema
- Validation
- XML editors
- Guides query formulation
- Query optimization
- Provides a web service binding
9 A Temporal Data Collection
- Validate the delta with the temporal schema
- Cost is proportional to the size of the change
10 Outline
- Motivation
- tXSchema
- Architecture
- Summary
11 Goals for a Temporal Schema
- Make it easy to create a schema for temporal data
- Identify which data is temporal
- Upwards compatibility
- Minimal extensions of XML Schema
- Reuse off-the-shelf parsers/tools
- Support
- Valid and transaction time
- Data (element) versioning
- Schema versioning
- Logical/physical independence
- Flexible timestamp representation and location
12 Persistent Elements
- An item is an element that persists across snapshots.
- Item identifier (like a temporally-invariant key)
  <txs:itemIdentifier>
    <txs:field path="@name"/>
  </txs:itemIdentifier>
January snapshot
March snapshot
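A minimal sketch of how an item identifier could pair up the same item across the January and March snapshots. This is an illustration, not the tXSchema implementation: the helper names `item_key` and `index_items` are hypothetical, the `<data>` wrapper element is assumed, and only attribute paths of the form `@name` are handled.

```python
import xml.etree.ElementTree as ET

# The two downloads from the "Data Evolves" slide, wrapped in an assumed <data> root.
JANUARY = """<data>
  <gene name="TRY4">
    <desc>trypsin 4</desc>
    <ontology ref="MGI" function="unknown"/>
  </gene>
</data>"""

MARCH = """<data>
  <gene name="TRY4">
    <desc>trypsin 4, beta-cell receptor</desc>
    <ontology ref="MGI" function="synthesizes trypsinogen"/>
  </gene>
</data>"""


def item_key(element, field_paths):
    """Evaluate each identifier field on the element (only '@attr' paths are supported here)."""
    key = []
    for path in field_paths:
        if not path.startswith("@"):
            raise ValueError(f"unsupported field path: {path}")
        key.append(element.get(path[1:]))
    return tuple(key)


def index_items(snapshot_xml, tag, field_paths):
    """Map item key -> element for every element with the given tag in one snapshot."""
    root = ET.fromstring(snapshot_xml)
    return {item_key(e, field_paths): e for e in root.iter(tag)}


jan = index_items(JANUARY, "gene", ["@name"])
mar = index_items(MARCH, "gene", ["@name"])

# Keys present in both snapshots identify the same item, even though its content differs.
same_items = sorted(set(jan) & set(mar))
print(same_items)
```

The key depends only on the `name` attribute, so the gene is recognized as one persistent item although its description and ontology changed between downloads.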
13 Extend a Snapshot Schema
- Specify which elements are temporal
- Temporal elements have
- Item identifiers
- Simple constraints (state/event, existence/content-varying)
  <element name="gene">
    <txs:temporal>
      <txs:itemIdentifier>
        <txs:field path="@name"/>
      </txs:itemIdentifier>
      <txs:transactionTime kind="state" contentVarying="true" existenceVarying="no gaps"/>
    </txs:temporal>
    ... definition of gene from the snapshot schema omitted for space
  </element>
14 Versions
- A version is a change in an item.
- DOM inequivalence
January snapshot
March snapshot
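One way to read "DOM inequivalence" is sketched below: two occurrences of an item are different versions when their element trees differ in tag, attributes, text, or children. The functions `canonical` and `is_new_version` are hypothetical names, and the comparison is a simplification (it ignores whitespace-only differences and tail text in mixed content); the actual equivalence test used by the tools may differ.

```python
import xml.etree.ElementTree as ET


def canonical(element):
    """Reduce an element to a nested tuple; attribute order and surrounding whitespace are ignored."""
    return (
        element.tag,
        tuple(sorted(element.attrib.items())),
        (element.text or "").strip(),
        tuple(canonical(child) for child in element),
    )


def is_new_version(old, new):
    """True when the two element trees are not equivalent under canonical()."""
    return canonical(old) != canonical(new)


jan = ET.fromstring(
    '<gene name="TRY4"><desc>trypsin 4</desc>'
    '<ontology ref="MGI" function="unknown"/></gene>'
)
mar = ET.fromstring(
    '<gene name="TRY4"><desc>trypsin 4, beta-cell receptor</desc>'
    '<ontology ref="MGI" function="synthesizes trypsinogen"/></gene>'
)
# Same content as January, but pretty-printed and with attributes reordered.
reformatted_jan = ET.fromstring("""<gene name="TRY4">
  <desc>trypsin 4</desc>
  <ontology function="unknown" ref="MGI"/>
</gene>""")

print(is_new_version(jan, mar))              # True: the content changed
print(is_new_version(jan, reformatted_jan))  # False: only the formatting changed
```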
15 Temporal Genomic Data
<dataTemporal>
  <data><geneTemporal itemRef="1"/></data>
  <geneItem itemId="1">
    <geneVersion>
      <time start="2005-01-01" end="2005-03-05"/>
      <gene name="TRY4">
        <desc>trypsin 4</desc>
        <ontologyTemporal itemRef="2"/>
      </gene>
    </geneVersion>
    <geneVersion>
      <time start="2005-03-06" end="now"/>
      ... next version of gene ...
    </geneVersion>
  </geneItem>
  <ontologyItem itemId="2"> ... ontology item ...
</dataTemporal>
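A rough sketch of how a snapshot could be pulled out of an item-based temporal document like the one above. The parts the slide elides are filled in here as assumptions: the second gene version and the `ontologyItem` contents are taken from the January and March downloads, and the `ontologyVersion` element name simply mirrors `geneVersion`. The function `snapshot_at` is a hypothetical helper, not the project's UNSQUASH tool.

```python
import xml.etree.ElementTree as ET

# Temporal document in the style of the slide; elided parts are filled in as assumptions.
TEMPORAL = """<dataTemporal>
  <data><geneTemporal itemRef="1"/></data>
  <geneItem itemId="1">
    <geneVersion>
      <time start="2005-01-01" end="2005-03-05"/>
      <gene name="TRY4"><desc>trypsin 4</desc><ontologyTemporal itemRef="2"/></gene>
    </geneVersion>
    <geneVersion>
      <time start="2005-03-06" end="now"/>
      <gene name="TRY4"><desc>trypsin 4, beta-cell receptor</desc><ontologyTemporal itemRef="2"/></gene>
    </geneVersion>
  </geneItem>
  <ontologyItem itemId="2">
    <ontologyVersion>
      <time start="2005-01-01" end="2005-03-05"/>
      <ontology ref="MGI" function="unknown"/>
    </ontologyVersion>
    <ontologyVersion>
      <time start="2005-03-06" end="now"/>
      <ontology ref="MGI" function="synthesizes trypsinogen"/>
    </ontologyVersion>
  </ontologyItem>
</dataTemporal>"""


def snapshot_at(temporal_xml, day):
    """Return the <data> snapshot valid on `day` (an ISO date string such as '2005-02-01')."""
    root = ET.fromstring(temporal_xml)
    items = {e.get("itemId"): e for e in root if e.tag.endswith("Item")}

    def covers(time_el):
        # ISO dates compare correctly as strings; "now" means the version is still current.
        end = time_el.get("end")
        return time_el.get("start") <= day and (end == "now" or day <= end)

    def resolve(element):
        """Copy an element, replacing each ...Temporal placeholder with the version valid on `day`."""
        out = ET.Element(element.tag, dict(element.attrib))
        out.text = element.text
        for child in element:
            if child.tag.endswith("Temporal"):
                for version in items[child.get("itemRef")]:
                    if covers(version.find("time")):
                        for payload in version:
                            if payload.tag != "time":
                                out.append(resolve(payload))
            else:
                out.append(resolve(child))
        return out

    return resolve(root.find("data"))


january = snapshot_at(TEMPORAL, "2005-02-01")
march = snapshot_at(TEMPORAL, "2005-03-06")
print(ET.tostring(january, encoding="unicode"))
print(ET.tostring(march, encoding="unicode"))
```

Each extracted snapshot is ordinary XML again, so it can be checked against the snapshot schema with a conventional validating parser.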
16 Outline
- Motivation
- tXSchema
- Architecture
- Summary
17 Validating Temporal Data
- Snapshot data validated with a snapshot schema
- Construct a representational schema (details in paper)
- Can also validate the delta
Snapshot Schema
Namespace
Validating Parser
XML Data
18 Property of a Good Construction
- Every snapshot must conform to the snapshot schema
Temporal Schema
(Temporal) Validating Parser
Valid
Temporal data
Valid
19 Outline
- Motivation
- tXSchema
- Architecture
- Summary
20 Related Work: Temporal XML
- Change detection and management
- Nguyen, Abiteboul, Cobena, Preda, SIGMOD 2001
- Xyleme's Alerter, described in Data Engineering Bulletin, 2001
- Dyreson, Lin, Wang, WWW 2004
- Leonardi, Bhowmick, ER 2006
- Representing time-varying XML documents (versioning)
- Chawathe, Abiteboul, Widom, ICDE 1998
- Dyreson, Böhlen, Jensen, VLDB 1999
- Chien, Tsotras, Zaniolo, VLDB 2000
- Marian, Abiteboul, Cobena, Mignet, VLDB 2001
- Buneman, Khanna, Tajima, Tan, SIGMOD 2002, TODS 2004
- Rosado, Marquez, Gonzalez, ECDM 2006
- XML Versioning Use Cases (W3C)
21 Related Work: XML Schemas
- XML Schema languages
- Many, but XML Schema is backed by the W3C
- Incremental XML validation
- Bouchou, Halfeld-Ferrari, DBPL 2003
- Papakonstantinou, Vianu, ICDT 2003
- Barbosa, Mendelzon, Libkin, Mignet, Arenas, ICDE 2004
- Temporal XML schemas
- Currim, Currim, Dyreson, Snodgrass, EDBT 2004
- Dyreson, Snodgrass, Currim, Currim, Joshi, XSDM 2006
22 An Overarching Vision
- Aspect-oriented programming
- Cross-cutting concerns
- Augment behavior without changing the code
- Example aspects: logging, garbage collection
Program .java
Aspect .java
23 Aspects for Data?
- What are cross-cutting concerns?
- Milieu of metadata
- Time is an aspect
24 Aspects in Schema Design
- Schema for aspect + schema for data
- Our paper describes the plumbing for a temporal aspect
data (snapshot) schema
aspect schema
schema tapestry
schema weaver
25 Our Contributions
- Temporal schema specification
- What is time-varying
- Some simple constraints
- Validate temporal data
- ΔDt-now cost
- Upwards compatible with XML Schema
- Handle schema evolution (Dyreson et al., XSDM '06)
- Suite of tools
- Reuse and extend existing tools
- www.cs.arizona.edu/tau
26 tXSchema Project Tools (Beta)
- tVALIDATOR: Validating temporal XML document for conventional and temporal constraints
- SQUASH: Generating a temporal document from a sequence of snapshot documents
- UNSQUASH: Extracting snapshot documents from a temporal document
- RESQUASH: Changing a document representation to be consistent with the new physical annotation.
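To make the SQUASH idea concrete, here is a small sketch that folds a dated sequence of snapshots into an item/version document. It is an assumption-laden illustration rather than the project's tool: the function `squash` is hypothetical, only one element type (`gene`, keyed by its `name` attribute) is treated as temporal, a version is assumed to begin on the date of the snapshot in which it is first seen, and the output element names follow the temporal genomic data example.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta


def canonical(e):
    """Nested-tuple form of an element, used to decide whether its content changed."""
    return (e.tag, tuple(sorted(e.attrib.items())), (e.text or "").strip(),
            tuple(canonical(c) for c in e))


def day_before(day):
    """ISO date string for the day preceding `day`."""
    return (date.fromisoformat(day) - timedelta(days=1)).isoformat()


def squash(snapshots, tag="gene", key_attr="name"):
    """snapshots: list of (ISO date, XML string) pairs in date order. Returns a temporal root element."""
    root = ET.Element("dataTemporal")
    data = ET.SubElement(root, "data")
    items = {}         # item key -> <geneItem> element
    open_version = {}  # item key -> (<time> element of the current version, canonical form)
    for day, xml_text in snapshots:
        seen = set()
        for elem in ET.fromstring(xml_text).iter(tag):
            key = elem.get(key_attr)
            seen.add(key)
            if key not in items:
                item_id = str(len(items) + 1)
                ET.SubElement(data, tag + "Temporal", itemRef=item_id)
                items[key] = ET.SubElement(root, tag + "Item", itemId=item_id)
            form = canonical(elem)
            if key in open_version:
                if open_version[key][1] == form:
                    continue  # unchanged: the current version stays open
                open_version[key][0].set("end", day_before(day))
            version = ET.SubElement(items[key], tag + "Version")
            time_el = ET.SubElement(version, "time", start=day, end="now")
            version.append(elem)
            open_version[key] = (time_el, form)
        # Items absent from this snapshot stop existing the day before it.
        for key in set(open_version) - seen:
            open_version[key][0].set("end", day_before(day))
            del open_version[key]
    return root


JAN = '<data><gene name="TRY4"><desc>trypsin 4</desc></gene></data>'
MAR = '<data><gene name="TRY4"><desc>trypsin 4, beta-cell receptor</desc></gene></data>'

temporal = squash([("2005-01-01", JAN), ("2005-03-06", MAR)])
print(ET.tostring(temporal, encoding="unicode"))
```

Consecutive snapshots with equivalent content collapse into a single version, so the temporal document grows with the amount of change rather than with the number of downloads.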