Title: On Explicit Provenance Management in RDF/S Graphs
1 On Explicit Provenance Management in RDF/S Graphs
Panagiotis Pediaditis, Giorgos Flouris, Irini Fundulaki, Vassilis Christophides
{pped, fgeo, fundul, christop}@ics.forth.gr
Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
2 Provenance Management in RDF/S
- Provenance management problem
- Mostly addressed in the database context
- We are dealing with why-provenance in RDF/S graphs
- Why-provenance: identifying the source data that had some influence on the existence of the target data
- Three main characteristics (peculiarities of RDF/S)
- Triple-based representation
- Use quadruples to talk about triples' provenance
- Inference
- Assign provenance information to implicit data
- Coherence semantics (in updates)
- Implicit data is a first-class citizen and should be retained during change, along with its provenance information
3 Characteristic 1: Triple-based Representation
4 RDF Graphs
RDF graph: a set of RDF triples
Define classes:
Paper rdf:type rdfs:Class
PaperTAPP rdf:type rdfs:Class
Person rdf:type rdfs:Class
Author rdf:type rdfs:Class
Define properties:
writes rdf:type rdf:Property
writes rdfs:domain Author
writes rdfs:range Paper
Instantiate (and define) individuals:
Paper10 rdf:type PaperTAPP
Giorgos rdf:type Author
Giorgos writes Paper10
Define hierarchies:
PaperTAPP rdfs:subClassOf Paper
Author rdfs:subClassOf Person
And other stuff
[Figure: example RDF graph with nodes Paper, Person, PaperTAPP, Author, Paper10, Giorgos and the property writes]
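The "set of triples" view of the slide above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are encoded as plain (subject, predicate, object) tuples of strings, prefixed names are written with a colon (rdf:type, rdfs:Class), and the helper `instances_of` is a hypothetical name, not part of any RDF library.

```python
# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
RDF_TYPE = "rdf:type"

graph = {
    # Classes
    ("Paper", RDF_TYPE, "rdfs:Class"),
    ("PaperTAPP", RDF_TYPE, "rdfs:Class"),
    ("Person", RDF_TYPE, "rdfs:Class"),
    ("Author", RDF_TYPE, "rdfs:Class"),
    # Properties
    ("writes", RDF_TYPE, "rdf:Property"),
    ("writes", "rdfs:domain", "Author"),
    ("writes", "rdfs:range", "Paper"),
    # Individuals
    ("Paper10", RDF_TYPE, "PaperTAPP"),
    ("Giorgos", RDF_TYPE, "Author"),
    ("Giorgos", "writes", "Paper10"),
    # Hierarchies
    ("PaperTAPP", "rdfs:subClassOf", "Paper"),
    ("Author", "rdfs:subClassOf", "Person"),
}


def instances_of(triples, cls):
    """Return the subjects explicitly typed as `cls` (no inference applied)."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}
```

Because the graph is a set, adding the same triple twice has no effect, which matches the set-based definition on the slide.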
5 Provenance in RDF Graphs
Each triple is labeled with its source (PUB or TAPP):
PUB: Paper rdf:type rdfs:Class
TAPP: PaperTAPP rdf:type rdfs:Class
PUB: Person rdf:type rdfs:Class
PUB: Author rdf:type rdfs:Class
PUB: writes rdf:type rdf:Property
PUB: writes rdfs:domain Author
PUB: writes rdfs:range Paper
TAPP: Paper10 rdf:type PaperTAPP
TAPP: Giorgos rdf:type Author
TAPP: Giorgos writes Paper10
TAPP: PaperTAPP rdfs:subClassOf Paper
PUB: Author rdfs:subClassOf Person
6 Named Graphs and Provenance
- Create two named graphs and assign an ID (URI) to each
- Publications graph (URI: PUB)
- TAPP graph (URI: TAPP)
- Each named graph corresponds to a different source
- Need some method to associate named graphs with triples
- Triples become quadruples
- Fourth element is the URI of the named graph (origin)
7 Quadruples for Provenance
Paper rdf:type rdfs:Class PUB
PaperTAPP rdf:type rdfs:Class TAPP
Person rdf:type rdfs:Class PUB
Author rdf:type rdfs:Class PUB
writes rdf:type rdf:Property PUB
writes rdfs:domain Author PUB
writes rdfs:range Paper PUB
Paper10 rdf:type PaperTAPP TAPP
Giorgos rdf:type Author TAPP
Giorgos writes Paper10 TAPP
PaperTAPP rdfs:subClassOf Paper TAPP
Author rdfs:subClassOf Person PUB
All quadruples of the form s p o PUB originate from named graph PUB (Publications graph)
All quadruples of the form s p o TAPP originate from named graph TAPP (TAPP graph)
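As a rough sketch of how the fourth element records origin, the quadruples listed on this slide can be stored as 4-tuples and regrouped into their named graphs. The tuple encoding and the function name `named_graphs` are assumptions made for illustration only.

```python
from collections import defaultdict

# Assumption: a quadruple is a (subject, predicate, object, graph URI) tuple.
quads = [
    ("Paper", "rdf:type", "rdfs:Class", "PUB"),
    ("PaperTAPP", "rdf:type", "rdfs:Class", "TAPP"),
    ("Person", "rdf:type", "rdfs:Class", "PUB"),
    ("Author", "rdf:type", "rdfs:Class", "PUB"),
    ("writes", "rdf:type", "rdf:Property", "PUB"),
    ("writes", "rdfs:domain", "Author", "PUB"),
    ("writes", "rdfs:range", "Paper", "PUB"),
    ("Paper10", "rdf:type", "PaperTAPP", "TAPP"),
    ("Giorgos", "rdf:type", "Author", "TAPP"),
    ("Giorgos", "writes", "Paper10", "TAPP"),
    ("PaperTAPP", "rdfs:subClassOf", "Paper", "TAPP"),
    ("Author", "rdfs:subClassOf", "Person", "PUB"),
]


def named_graphs(quadruples):
    """Group triples by the URI of the named graph they originate from."""
    graphs = defaultdict(set)
    for s, p, o, g in quadruples:
        graphs[g].add((s, p, o))
    return dict(graphs)


by_source = named_graphs(quads)
```

Here `by_source["PUB"]` holds the triples contributed by the Publications graph and `by_source["TAPP"]` those contributed by the TAPP graph.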
8 Properties of Named Graphs
- The named graph URI can be used to refer to the named graph
- Can be used for assignment of metadata: TAPP hasAuthor JamesCheney G
- Granularity of provenance
- A triple is the smallest bit of information
- The granularity of provenance achieved by named graphs is at the triple level
- Flexible
- A named graph can contain 0, 1, or many triples
- A triple can belong to 0, 1, or many named graphs
9 Characteristic 2: Inference
10 RDF/S Graphs
- RDF Schema: add-on to RDF
- RDFS adds inference semantics
- Transitivity of subclass/subproperty
- Implicit instantiations
- Example
- Giorgos rdf:type Author
- Author rdfs:subClassOf Person
- Inference: Giorgos rdf:type Person
- Inferred knowledge is implicit
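The two inference patterns named above (subclass transitivity and implicit instantiation) can be sketched as a small fixed-point computation. This covers only those two rules, not the full RDFS entailment regime, and `rdfs_closure` is a hypothetical helper name.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply two rules until nothing new is derived:
    (a subClassOf b), (b subClassOf c)  =>  (a subClassOf c)
    (x type b),       (b subClassOf c)  =>  (x type c)
    """
    closure = set(triples)
    while True:
        derived = set()
        for (a, p1, b) in closure:
            if p1 not in (TYPE, SUBCLASS):
                continue
            for (c, p2, d) in closure:
                if p2 == SUBCLASS and b == c:
                    derived.add((a, p1, d))
        if derived <= closure:
            return closure
        closure |= derived


explicit = {
    ("Giorgos", TYPE, "Author"),
    ("Author", SUBCLASS, "Person"),
}
# Implicit knowledge is whatever the closure adds to the explicit triples.
implicit = rdfs_closure(explicit) - explicit
```

For the example on the slide, `implicit` contains exactly the inferred triple Giorgos rdf:type Person.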
11 Provenance and Inference
- Quadruples
- Giorgos rdf:type Author PUB
- Author rdfs:subClassOf Person TAPP
- Giorgos rdf:type Person ???
- Needs
- Shared ownership
- A more sophisticated, compound structure
- Keeping the connection with the components
- Composition operator (PT = composition of PUB and TAPP)
- Giorgos rdf:type Person PT
- OK, but see characteristic 3
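One way to picture shared ownership is to let an inferred triple carry the combined origins of the triples it was derived from. The sketch below assumes that the composition is a plain set union of source URIs; the slides do not specify the operator, so this choice, the frozenset encoding, and the name `infer_with_provenance` are all illustrative assumptions.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer_with_provenance(quads):
    """Derive implicit triples, tagging each with the union of its premises' sources.

    Assumption: composition of origins is modeled as set union.
    """
    known = {(s, p, o, frozenset([g])) for (s, p, o, g) in quads}
    changed = True
    while changed:
        changed = False
        for (a, p1, b, g1) in list(known):
            if p1 not in (TYPE, SUBCLASS):
                continue
            for (c, p2, d, g2) in list(known):
                if p2 == SUBCLASS and b == c:
                    derived = (a, p1, d, g1 | g2)
                    if derived not in known:
                        known.add(derived)
                        changed = True
    return known


# The two quadruples from the slide.
result = infer_with_provenance([
    ("Giorgos", TYPE, "Author", "PUB"),
    ("Author", SUBCLASS, "Person", "TAPP"),
])
```

The inferred triple Giorgos rdf:type Person ends up with the origin {PUB, TAPP}, which plays the role of PT on the slide.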
12 Characteristic 3: Coherence Semantics (in Updates)
13 Foundational Semantics
- Foundational viewpoint (pyramid)
- Knowledge consists of the explicitly represented knowledge
- Only explicit knowledge can be changed
- Implicit knowledge is affected indirectly, through the changes in the explicit knowledge (so that the resulting pyramid is stable)
- Explicit knowledge is more important than implicit knowledge
[Figure: pyramid with the labels Supported Knowledge, Implicit Knowledge, Explicit Knowledge, Basic Knowledge]
14 Coherence Semantics
- Coherence viewpoint (raft)
- No discrimination between explicit and implicit knowledge
- Both explicit and implicit knowledge can be changed
- Changes should be made coherently in order for the resulting knowledge to make sense (so that the raft is stable)
- Explicit and implicit knowledge are of the same value
[Figure: raft labeled Knowledge (includes both implicit and explicit knowledge)]
15 Deletes
- Under coherence semantics
- Inferred knowledge needs to be made explicit (when in danger of being lost)
- Explicit assignment of shared origin to triples
- Explicit shared origin assignment
- Cannot use any composition operator
- Must be a first-class construct (autonomous)
- Retain the connection with its constituents
- A need, but also a useful feature
16 RDF/S Graphsets
- Graphsets are like named graphs
- Have IDs (URIs)
- Used in quadruples
- Association of triples with graphsets: Giorgos rdf:type Person PT
- Can be referred to (metadata): PT rdf:type Confidential G
- Encode origin or shared origin
- Giorgos rdf:type Person PT
- URI association (via Skolem function)
- PT is the URI of {PUB, TAPP}
- PUB is the URI of {PUB}
- A named graph is a graphset
- PUB corresponds to {PUB}
[Figure: the example RDF graph, annotated with graphset PT]
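The URI association can be pictured as a function from a set of named graph URIs to a single graphset URI that still remembers its constituents. The sketch below is hypothetical: the class `GraphsetRegistry`, the `gs:` prefix, and the sorted-join naming scheme are invented for illustration and are not the Skolem function used by the authors. The only properties taken from the slide are that a singleton maps to the named graph's own URI and that a graphset keeps the link to its constituents.

```python
class GraphsetRegistry:
    """Hypothetical registry mapping sets of named graphs to graphset URIs."""

    def __init__(self):
        self._members = {}

    def uri_for(self, named_graph_uris):
        """Return a deterministic URI for the given set of named graphs."""
        members = frozenset(named_graph_uris)
        if not members:
            raise ValueError("a graphset needs at least one named graph")
        if len(members) == 1:
            # A named graph is itself a graphset: reuse its URI.
            uri = next(iter(members))
        else:
            # Assumption: any injective, order-independent naming would do.
            uri = "gs:" + "+".join(sorted(members))
        self._members[uri] = members
        return uri

    def constituents(self, uri):
        """Return the named graphs a graphset URI stands for."""
        return self._members[uri]


registry = GraphsetRegistry()
pt = registry.uri_for({"PUB", "TAPP"})
pub = registry.uri_for({"PUB"})
```

Because the graphset is registered under its own URI, it can appear as the fourth element of a quadruple or as the subject of metadata, while `constituents` retains the connection to PUB and TAPP.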
17 Querying With RDF/S Graphsets
- Standard queries (original RQL)
- Give me the Persons: Giorgos
- Provenance queries (extended RQL)
- Give me the Persons per PUB
- Give me the Persons per {TAPP, PUB}: Giorgos
- Give me the sources per which Author is a subclass of Person: PUB
- Give me all the individual sources: TAPP, PUB
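The intent of these queries can be approximated in plain Python over a small quadruple store. This is not RQL syntax; the data, the `graphsets` mapping, and the function names are assumptions chosen to mirror the example, with PT standing for the graphset {PUB, TAPP}.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical store: the fourth element of each quadruple is a graphset URI.
quads = {
    ("Giorgos", TYPE, "Author", "TAPP"),
    ("Author", SUBCLASS, "Person", "PUB"),
    ("Giorgos", TYPE, "Person", "PT"),
}

# Each graphset URI maps to the named graphs it stands for.
graphsets = {
    "PUB": frozenset({"PUB"}),
    "TAPP": frozenset({"TAPP"}),
    "PT": frozenset({"PUB", "TAPP"}),
}


def instances(cls, per=None):
    """Instances of `cls`; if `per` is given, only those whose origin equals it."""
    wanted = None if per is None else frozenset(per)
    return {
        s for (s, p, o, g) in quads
        if p == TYPE and o == cls and (wanted is None or graphsets[g] == wanted)
    }


def sources_of(s, p, o):
    """The graphsets (as sets of named graphs) per which the triple holds."""
    return {graphsets[g] for (s2, p2, o2, g) in quads if (s2, p2, o2) == (s, p, o)}


def individual_sources():
    """All named graphs that contribute to any quadruple."""
    return set().union(*(graphsets[g] for (_, _, _, g) in quads))
```

With this data, `instances("Person")` and `instances("Person", per={"TAPP", "PUB"})` both return Giorgos, while `instances("Person", per={"PUB"})` is empty in this sketch because no triple typing Giorgos as a Person originates from PUB alone.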
18 Validity and Redundancy Elimination
- Two invariants for RDF/S graphs
- Valid (per some validity rules)
- Redundancy-free (space considerations)
- The invariants allow optimized execution of queries
- These invariants are imposed during change
- Improve query speed, but make updates more difficult
- Trade-off between having query overhead or update overhead
19 Updating With RDF/S Graphsets
- Updates supported through an extended version of RUL
- INSERT and DELETE
- Only for data (class and property instances)
- Implicit or explicit knowledge
- Take into account and update graphset (provenance) information
- Main considerations
- Apply the change (INSERT or DELETE)
- Respect invariants
- Non-redundancy (INSERT) and validity (DELETE)
- Make minimal changes (under coherence viewpoint)
- No unnecessary loss of information
- Take into account and preserve graphset (provenance) information
- Applicable upon quadruples
20 Conclusion
- Objective: assign provenance information to RDF/S graphs to capture why-provenance
- Triple-based representation
- Turned triples into quadruples and used named graphs to record the origin
- Inference (per RDFS)
- Composed named graphs
- Coherence semantics in updates (deletes)
- Used graphsets for composed named graphs (cannot use an operator)
- Proposed query and update languages for graphsets
- Based on RQL, RUL
- Can be used to query/update provenance information
- Provided syntax and semantics, as well as an implementation
- Demo at http://139.91.183.30:3026/RULdemo/named_graph_demo/
21 Thank You
22 EXTRA SLIDES
23 RDF/S Graphset Properties
- Three types of triples in a graphset
- Explicitly assigned triples
- Implicitly assigned triples (from the constituent named graphs)
- Implications of the above (per RDFS)
[Figure: the example RDF graph, with two elements labeled PT]
24 Inserts and Deletes: General Process
- INSERT
- Validity respected
- Must verify non-redundancy
- Process
- If INSERT is redundant, ignore it
- Remove all redundant information (after insert)
- DELETE
- Must verify validity
- Non-redundancy respected
- Issues with inference and the coherence viewpoint
- Process
- If DELETE is void, ignore it
- Make explicit all originally redundant information that will be lost otherwise
- Restore validity by removing property instances if necessary
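The INSERT and DELETE processes above can be sketched for the simplest case. The following is a minimal illustration under several assumptions, not the extended RUL semantics: it handles class instances only (rdf:type triples), assumes an acyclic class hierarchy given as (subclass, superclass) pairs, ignores provenance, and omits the final validity-restoration step for property instances. All function names are hypothetical.

```python
TYPE = "rdf:type"


def superclasses(cls, subclass_of):
    """All strict superclasses of `cls` (transitive), given (sub, sup) pairs."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def implies(cls, target, subclass_of):
    """True if being an instance of `cls` entails being an instance of `target`."""
    return cls == target or target in superclasses(cls, subclass_of)


def implied_types(explicit, subclass_of):
    """Explicit rdf:type triples plus those inferred through the hierarchy."""
    closure = set(explicit)
    for (x, _, c) in explicit:
        closure |= {(x, TYPE, sup) for sup in superclasses(c, subclass_of)}
    return closure


def remove_redundant(explicit, subclass_of):
    """Drop every explicit triple that another explicit triple already implies."""
    return {
        t for t in explicit
        if not any(o != t and t in implied_types({o}, subclass_of) for o in explicit)
    }


def insert(explicit, subclass_of, triple):
    if triple in implied_types(explicit, subclass_of):
        return set(explicit)  # redundant INSERT: ignore it
    return remove_redundant(set(explicit) | {triple}, subclass_of)


def delete(explicit, subclass_of, triple):
    if triple not in implied_types(explicit, subclass_of):
        return set(explicit)  # void DELETE: ignore it
    x, _, c = triple
    # Explicit triples that state or imply the deleted triple must be removed.
    doomed = {t for t in explicit if t[0] == x and implies(t[2], c, subclass_of)}
    # Make explicit the consequences that would otherwise be lost,
    # skipping anything that would re-derive the deleted triple.
    rescued = {
        t for t in implied_types(doomed, subclass_of)
        if not implies(t[2], c, subclass_of)
    }
    return remove_redundant((set(explicit) - doomed) | rescued, subclass_of)


hierarchy = {("Author", "Person")}
store = {("Giorgos", TYPE, "Author")}
after_delete = delete(store, hierarchy, ("Giorgos", TYPE, "Author"))
```

Deleting Giorgos rdf:type Author leaves Giorgos rdf:type Person as an explicit triple, which is the coherence behavior described above: the implicit fact survives the removal of the explicit one that supported it.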