Title: Update Exchange with Provenance
Todd J. Green, Grigoris Karvounarakis, Nicholas E. Taylor, Olivier Biton, Zachary G. Ives, Val Tannen
University of Pennsylvania
Facilitating Collaborative Data Sharing

Update Exchange with Provenance
- Schemas are related by GLAV schema mappings (tgds):
  - M4: Domain_Ref(SrcID, 'Interpro', ITAcc), Entry2Meth(ITAcc, DBAcc, DB) → Domain_Ref(SrcID, DB, DBAcc)
  - M5: Domain_Ref(SrcID, 'Interpro', ITAcc), Interpro2Go(ITAcc, goID), Term(_, goName, goID) → GoTerm(goID, goName)
- Provenance encodes all derivations of each tuple through the schema mappings
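
To make the idea of "all derivations of each tuple" concrete, here is a minimal sketch that applies mapping M4 once as a join and records, for every derived Domain_Ref tuple, the set of source-tuple pairs that produced it. The function name `apply_m4`, the representation of provenance as a dictionary of derivation sets, and the sample accession values are assumptions made for illustration; they are not taken from the poster or from the ORCHESTRA implementation.

```python
from collections import defaultdict

def apply_m4(domain_ref, entry2meth):
    """Apply M4 once and return {derived tuple: set of derivations}.

    M4: Domain_Ref(SrcID, 'Interpro', ITAcc), Entry2Meth(ITAcc, DBAcc, DB)
        -> Domain_Ref(SrcID, DB, DBAcc)
    A derivation is the pair of source tuples joined by the mapping.
    """
    provenance = defaultdict(set)
    for (src_id, db, itacc) in domain_ref:
        if db != 'Interpro':
            continue  # M4 only fires on Interpro references
        for (e_itacc, dbacc, e_db) in entry2meth:
            if e_itacc == itacc:
                derived = (src_id, e_db, dbacc)
                provenance[derived].add((
                    ('Domain_Ref', (src_id, db, itacc)),
                    ('Entry2Meth', (e_itacc, dbacc, e_db)),
                ))
    return dict(provenance)

# Hypothetical sample data: two Interpro entries map to the same method.
domain_ref = {('P1', 'Interpro', 'IPR001'), ('P1', 'Interpro', 'IPR002')}
entry2meth = {('IPR001', 'PF0001', 'Pfam'), ('IPR002', 'PF0001', 'Pfam')}

prov = apply_m4(domain_ref, entry2meth)
for tup, derivations in prov.items():
    print(tup, 'has', len(derivations), 'derivation(s)')
```

In this sample the single derived tuple has two derivations, one through each Interpro entry, which is the information a provenance record needs to keep if one of the source tuples is later removed.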

Application
- Scientific research groups (e.g., biologists) maintain independent but related warehouses, where they store and continuously revise their data
- Each group wants to incorporate and curate all relevant data from other warehouses that they trust, requiring:
  - Translation of data between different schemas, as it is updated
  - Reconciliation of conflicts among data from different sources
- Collaborative data sharing addresses these needs, facilitating exchange of data among autonomous sites.

Setting
- Each participant has a local database instance that they query and update, as well as:
  - Schema mappings specifying data correspondences
  - Trust conditions over sources and mappings, specifying how to filter and prioritize others' updates

Orchestra Operation
- A site periodically publishes the updates it wants to share
- The site may also import updates from elsewhere, using:
  1. Update exchange, which converts all trusted updates into the requestor's schema, using schema mappings
  2. Reconciliation, which resolves any conflicts among the trusted updates using trust priorities, resulting in a consistent instance
- As part of the process, sites maintain provenance information, and use it to:
  - Evaluate trust conditions and priorities
  - Maintain local instances incrementally
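
The two uses of provenance listed above can be sketched as follows. This is a simplified illustration, assuming each tuple's provenance is stored as a set of derivations and each derivation is a set of labels for the source tuples it uses; the names `is_trusted` and `delete_source` and the labels `a`, `b`, `c`, `t1`, `t2` are hypothetical, and recursive mappings are ignored.

```python
def is_trusted(derivations, trusted):
    """A tuple is trusted if at least one derivation uses only trusted sources."""
    return any(all(part in trusted for part in d) for d in derivations)

def delete_source(provenance, deleted):
    """Incremental deletion: drop derivations that use `deleted`,
    and drop derived tuples that have no derivation left."""
    remaining = {}
    for tup, derivations in provenance.items():
        kept = {d for d in derivations if deleted not in d}
        if kept:
            remaining[tup] = kept
    return remaining

# Hypothetical provenance: t1 is derivable from (a and b) or from c alone;
# t2 is derivable only from a.
provenance = {
    't1': {frozenset({'a', 'b'}), frozenset({'c'})},
    't2': {frozenset({'a'})},
}

print(is_trusted(provenance['t1'], {'b', 'c'}))  # via the derivation from c
print(is_trusted(provenance['t2'], {'b', 'c'}))  # a is not trusted
print(delete_source(provenance, 'a'))            # t1 survives through c
```

Because the derivations are kept explicitly, neither check requires re-running the mappings over the full source data.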

Citations
- T.J. Green, G. Karvounarakis, Z. Ives, V. Tannen, Update Exchange with Mappings and Provenance, submitted for publication, 2007.
- T.J. Green, G. Karvounarakis, V. Tannen, Provenance Semirings, PODS 2007.
- N. Taylor and Z. Ives, Reconciling while Tolerating Disagreement in Collaborative Data Sharing, SIGMOD 2006.
- Ives et al., ORCHESTRA: Rapid, Collaborative Sharing of Dynamic Data, CIDR 2005.

Acknowledgments
Work funded in part by NSF IIS-0477972 and IIS-0513778.

For More Information
http://www.cis.upenn.edu/zives/orchestra

Reconciliation
[Figure: example timeline of transactions (time runs earlier → later), each marked Accepted, Deferred, or Rejected.]
- Updates shown: (A,1) → (Q,1); (Q,5); (Z,4); (A,1), (B,2); (B,3); (A,1) → (R,1); (R,1) → (S,1)
- Annotations in the figure:
  - Participant reconciles at this point
  - Conflict: different changes to same tuple
  - Transaction modifies data from an earlier transaction
  - Conflicts with accepted transaction
  - Conflicts with deferred transaction
  - Depends on deferred transaction
- All transactions are trusted at the same priority. First attribute is a key for the relation.
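
The outcomes named in the reconciliation figure (accepted, deferred, rejected) can be illustrated with a small sketch. It is a deliberate simplification and not the algorithm from the cited SIGMOD 2006 paper: it assumes all transactions share one priority, that two transactions conflict when they write different values for the same key and neither depends on the other, and that each transaction lists the earlier transactions it depends on. The `Txn` class, the `reconcile` function, and the transaction ids are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    tid: str
    writes: dict                                   # key -> new value
    depends_on: set = field(default_factory=set)   # ids of earlier transactions

def conflict(a, b):
    """Different values for the same key, unless one revises the other."""
    if a.tid in b.depends_on or b.tid in a.depends_on:
        return False
    return any(k in b.writes and b.writes[k] != v for k, v in a.writes.items())

def reconcile(accepted, candidates):
    """Classify candidates (given in time order) at a single trust priority."""
    status = {}
    # Anything that conflicts with an already accepted transaction is rejected.
    for t in candidates:
        if any(conflict(t, a) for a in accepted):
            status[t.tid] = 'rejected'
    pending = [t for t in candidates if t.tid not in status]
    for t in pending:
        rivals = [o for o in pending if o is not t and conflict(t, o)]
        if any(status.get(d) == 'rejected' for d in t.depends_on):
            status[t.tid] = 'rejected'
        elif rivals or any(status.get(d) == 'deferred' for d in t.depends_on):
            status[t.tid] = 'deferred'   # equal priority: leave for the user
        else:
            status[t.tid] = 'accepted'
    return status

accepted = [Txn('T0', {'A': 1})]
candidates = [
    Txn('T1', {'B': 2}),
    Txn('T2', {'B': 3}),            # conflicts with T1
    Txn('T3', {'C': 7}, {'T2'}),    # depends on a deferred transaction
    Txn('T4', {'A': 9}),            # conflicts with accepted T0
    Txn('T5', {'A': 5}, {'T0'}),    # revises T0, so not a conflict
]
result = reconcile(accepted, candidates)
print(result)
```

Deferring both sides of an equal-priority conflict, rather than picking a winner, keeps the local instance consistent while leaving the decision open.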
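
[Figure: schema diagram of the participating databases and the mappings between them.]
- Databases shown: PCBI PLASMODB, GO GOTERM, EBI INTERPRO
- Relations shown:
  - Domain_Ref(src_id, db, dbacc)
  - GoTerm(goID, go_name)
  - Term(id, name, acc)
  - Interpro2Go(itacc, goid)
  - Entry2Meth(entry_ac, method_ac, db)
- Mappings shown: M4, M5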
nodes indicate updates made by the local site