Title: Provenance: overview


1
Provenance overview
  • Professor Luc Moreau
  • L.Moreau_at_ecs.soton.ac.uk
  • University of Southampton
  • www.ecs.soton.ac.uk/lavm

2
Provenance PASOA Teams
  • University of Southampton
  • Luc Moreau, Paul Groth, Simon Miles, Victor Tan,
    Miguel Branco, Sofia Tsasakou, Sheng Jiang, Steve
    Munroe, Zheng Chen
  • IBM UK (EU Project Coordinator)
  • John Ibbotson, Neil Hardman, Alexis Biller
  • University of Wales, Cardiff
  • Omer Rana, Arnaud Contes, Vikas Deora, Ian
    Wootten, Shrija Rajbhandari
  • Universitat Politècnica de Catalunya (UPC)
  • Steven Willmott, Javier Vazquez
  • SZTAKI
  • Laszlo Varga, Arpad Andics, Tamas Kifor
  • German Aerospace
  • Andreas Schreiber, Guy Kloss, Frank Danneman

3
Contents
  • Motivation
  • Provenance Concepts
  • Provenance Architecture
  • Standardisation
  • Conclusions

4
Motivation
5
Scientific Research
Academic Peer Review
6
Business Regulations
Accounting
Banking
7
Health Care Management
European Recommendation R(97)5 on the protection
of medical data
8
e-Science datasets
  • How to undertake peer-reviewing and validation of
    e-Scientific results?

9
Compliance with Regulations
  • The next-compliance problem
  • Can we be certain that, by ensuring compliance
    with a new regulation, we do not break previous
    compliance?

10
Current Solutions
  • Proprietary, Monolithic
  • Silos, Closed
  • Do not inter-operate with other applications
  • Not adaptable to new regulations

11
Provenance
  • Oxford English Dictionary
  • the fact of coming from some particular source or
    quarter; origin, derivation
  • the history or pedigree of a work of art,
    manuscript, rare book, etc.
  • concretely, a record of the passage of an item
    through its various owners.
  • Concept vs representation

12
Provenance in Computer Systems
  • Our definition of provenance, in the context of
    applications for which process matters to end
    users:
  • The provenance of a piece of data is the
    process that led to that piece of data
  • Our aim is to conceive a computer-based
    representation of provenance that allows us to
    perform useful analysis and reasoning to support
    our use cases

13
Our Approach
  • Define core concepts pertaining to provenance
  • Specify functionality required to become
    provenance-aware
  • Define open data models and protocols that allow
    systems to inter-operate
  • Standardise data models and protocols
  • Provide a reference implementation
  • Provide reasoning capability

14
Context (1)
  • Aerospace engineering: maintain a historical
    record of design processes, for up to 99 years.
  • Organ transplant management: tracking of previous
    decisions, crucial to maximise the efficiency in
    matching and recovery rate of patients
15
Context (2)
  • Bioinformatics: verification and auditing of
    experiments (e.g. for drug approval)
  • High Energy Physics: tracking, analysing and
    verifying data sets in the ATLAS Experiment of
    the Large Hadron Collider (CERN)
16
Provenance Concepts
17
Provenance Lifecycle
[Diagram labels: Core Interfaces to Provenance Store; Provenance Store; Query and Reason over Provenance of Data; Administer Store and its contents]
18
Nature of Documentation
  • We represent the provenance of some data by
    documenting the process that led to the data
  • documentation can be complete or partial
  • it can be accurate or inaccurate
  • it can present conflicting or consensual views of
    the actors involved
  • it can provide operational details of execution
    or it can be abstract.

19
p-assertion
  • A given element of process documentation will be
    referred to as a p-assertion
  • A p-assertion is an assertion that is made by an
    actor and pertains to a process.

20
Service Oriented Architecture
  • Broad definition of a service as a component
    that takes some inputs and produces some outputs.
  • Services are brought together to solve a given
    problem, typically via a workflow definition that
    specifies their composition.
  • Interactions with services take place through
    messages that are constructed according to the
    services' interface specifications.
  • The term actor denotes either a client or a
    service in a SOA.
  • A process is defined as the execution of a
    workflow.

21
Process Documentation (1)
From these p-assertions, we can derive that M3
was sent by Actor 1 and received by Actor 2 (and
likewise for M4)
[Diagram: Actor 1 and Actor 2 exchanging messages M1, M2, M3 and M4]
If actors are black boxes, these assertions are
not very useful because we do not know the
dependencies between messages
22
Process Documentation (2)
These assertions help identify the order of
messages, but not how data was computed
[Diagram: Actor 1 and Actor 2 exchanging messages M1, M2, M3 and M4]
23
Process Documentation (3)
These assertions help identify how data is
computed, but provide no information about
non-functional characteristics of the
computation (time, resources used, etc.)
[Diagram: Actor 1 and Actor 2 exchanging messages M1, M2, M3 and M4]
24
Process Documentation (4)
[Diagram: Actor 1 and Actor 2 exchanging messages M1, M2, M3 and M4]
25
Types of p-assertions (1)
  • An interaction p-assertion is an assertion of the
    contents of a message by an actor that has sent
    or received that message

26
Types of p-assertions (2)
  • A relationship p-assertion is an assertion, made
    by an actor, that describes how the actor
    obtained an output message sent in an
    interaction by applying some function to input
    messages from other interactions (likewise for
    data)

27
Types of p-assertions (3)
  • An actor state p-assertion is an assertion made
    by an actor about its internal state in the
    context of a specific interaction
  • Examples: "I used a SPARC processor"; "I used
    algorithm x, version x.y.z"
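
The three kinds of p-assertion described on the last three slides could be modelled roughly as follows. This is only a sketch: the class names, field names, actor names ("client", "service") and interaction identifiers ("i1", "i2") are illustrative assumptions, not the PASOA schemas.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InteractionPAssertion:
    asserter: str        # actor that sent or received the message
    interaction_id: str  # identifies one message exchange (hypothetical id)
    content: str         # the message contents as seen by the asserter


@dataclass(frozen=True)
class RelationshipPAssertion:
    asserter: str
    output_interaction: str              # interaction carrying the output
    input_interactions: Tuple[str, ...]  # interactions carrying the inputs
    function: str                        # function applied to the inputs


@dataclass(frozen=True)
class ActorStatePAssertion:
    asserter: str
    interaction_id: str
    state: str  # e.g. "algorithm x, version x.y.z"


# Hypothetical documentation of one request/response exchange.
documentation = [
    InteractionPAssertion("client", "i1", "request"),
    InteractionPAssertion("service", "i1", "request"),
    InteractionPAssertion("service", "i2", "response"),
    RelationshipPAssertion("service", "i2", ("i1",), "compute"),
    ActorStatePAssertion("service", "i2", "algorithm x, version x.y.z"),
]


def asserters_of(interaction_id, assertions):
    """Actors that asserted the contents of the given interaction."""
    return sorted(
        a.asserter
        for a in assertions
        if isinstance(a, InteractionPAssertion)
        and a.interaction_id == interaction_id
    )
```

Calling `asserters_of("i1", documentation)` lists both parties to the exchange, which is the kind of derivation the Process Documentation slides describe.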
28
Data flow
  • Interaction p-assertions allow us to specify a
    flow of data between actors
  • Relationship p-assertions allow us to
    characterise the flow of data inside an actor
  • Overall data flow (internal and external)
    constitutes a DAG, which characterises the
    process that led to a result
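
As a rough illustration of the DAG mentioned above, the sketch below builds a dependency graph from relationship p-assertions and walks it backwards from a result. The data item names ("result", "aligned", "params", "raw") and the pair-based encoding of a relationship are assumptions made for the example only.

```python
from collections import defaultdict

# Hypothetical relationship p-assertions: (output, inputs it was derived from).
relationships = [
    ("result", ["aligned", "params"]),
    ("aligned", ["raw"]),
]


def build_dag(relationships):
    """Map each data item to the items it was directly derived from."""
    parents = defaultdict(set)
    for output, inputs in relationships:
        parents[output].update(inputs)
    return parents


def ancestors(item, parents):
    """All items that the given item transitively depends on."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Here `ancestors("result", build_dag(relationships))` collects every item that contributed to the result, which is one simple reading of "the process that led to a result".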

29
Provenance Architecture
30
Interfaces to Provenance Store
[Diagram labels: Provenance Store; Query and Reason over Provenance of Data; Administer Store and its contents]
31
(No Transcript)
32
P-Assertion schemas
33
The p-structure
  • The p-structure is a common logical structure of
    the provenance store shared by all asserting and
    querying actors
  • Hierarchical
  • Indexed by interactions (interaction = 1 message
    exchange)
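
A hierarchical structure indexed by interactions might look like the sketch below. The second level (one "view" per asserting actor) and the method names are assumptions for illustration; the slide only states that the structure is hierarchical and indexed by interaction.

```python
from collections import defaultdict


class PStructure:
    """Toy store: interaction key -> asserting actor's view -> p-assertions."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(list))

    def record(self, interaction_key, view, p_assertion):
        """File a p-assertion under its interaction and the asserter's view."""
        self._index[interaction_key][view].append(p_assertion)

    def lookup(self, interaction_key, view=None):
        """Return p-assertions for an interaction, optionally for one view."""
        views = self._index.get(interaction_key, {})
        if view is not None:
            return list(views.get(view, []))
        return [p for assertions in views.values() for p in assertions]


store = PStructure()
store.record("i1", "sender", "client asserts: sent request")
store.record("i1", "receiver", "service asserts: received request")
store.record("i2", "sender", "service asserts: sent response")
```

Because every asserting and querying actor shares the same index, a query for interaction "i1" sees both the sender's and the receiver's documentation side by side.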

34
Recording Protocol (Groth04-06)
  • Abstract machines
  • DS properties: termination, liveness, safety,
    statelessness
  • Documentation properties: immutability,
    attribution, datatype safety
  • Foundation for adding necessary cryptographic
    techniques
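
One informal reading of the documentation properties listed above is sketched below: a store that refuses to overwrite a recorded p-assertion (immutability), requires every p-assertion to name its asserter (attribution), and checks the type of the content (a stand-in for datatype safety). This is not the formal protocol from the cited papers; the class, method and identifier names are hypothetical.

```python
class RecordingError(Exception):
    """Raised when a recording request violates one of the properties."""


class AppendOnlyStore:
    def __init__(self):
        self._records = {}

    def record(self, asserter, local_id, content):
        # Attribution: every p-assertion must name the actor that made it.
        if not asserter:
            raise RecordingError("p-assertion must name its asserter")
        # Stand-in for datatype safety: only string content in this sketch.
        if not isinstance(content, str):
            raise RecordingError("content must be a string in this sketch")
        # Immutability: a recorded p-assertion can never be replaced.
        key = (asserter, local_id)
        if key in self._records:
            raise RecordingError("p-assertion already recorded")
        self._records[key] = content

    def get(self, asserter, local_id):
        return self._records[(asserter, local_id)]
```

Cryptographic techniques such as signatures could then be layered on top of the attribution check, as the last bullet suggests.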

35
Querying Functionality (Miles06)
  • Process Documentation Query Interface allows for
    navigation of the documentation of execution
  • Allows us to view the provenance store (i.e. the
    p-structure) as if containing XML data structures
  • Independent of technology used for running
    application and internal store representation
  • Seamless navigation of application-dependent and
    application-independent process documentation
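
Viewing the p-structure "as if containing XML data structures" suggests path-style navigation. The sketch below uses Python's standard `xml.etree.ElementTree` with its limited XPath support; the element and attribute names are invented for the example and are not the actual PASOA schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML view of a tiny p-structure (element names are assumptions).
P_STRUCTURE_XML = """
<pstructure>
  <interaction id="i1">
    <view actor="client">
      <interactionPAssertion>request</interactionPAssertion>
    </view>
    <view actor="service">
      <interactionPAssertion>request</interactionPAssertion>
      <actorStatePAssertion>algorithm x, version x.y.z</actorStatePAssertion>
    </view>
  </interaction>
</pstructure>
"""

root = ET.fromstring(P_STRUCTURE_XML)

# Navigate to the actor state recorded by the service for interaction i1.
path = "./interaction[@id='i1']/view[@actor='service']/actorStatePAssertion"
states = [element.text for element in root.findall(path)]

# Count every interaction p-assertion, whichever actor asserted it.
interaction_count = len(root.findall(".//interactionPAssertion"))
```

The same path works regardless of how the store keeps its data internally, which is the independence the slide refers to.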

36
Querying Functionality (Miles06)
  • Provenance Query Interface allows us to obtain
    the provenance of some specific data
  • A recognition that there is not one provenance
    for a piece of data, but there may be several,
    depending on the end user's interest
  • Hence, provenance is seen as the result of a
    query
  • Identify a piece of data at a specific execution
    point
  • Scope of the process of interest
  • Filter in/out p-assertions according to actors,
    process, types of relationships, etc
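
The idea that provenance is the result of a query can be sketched as a backwards traversal with a scope limit and a filter. The `Relationship` record, the `keep` and `max_depth` parameters, and the sample data are all assumptions made for this example, not the Provenance Query Interface itself.

```python
from collections import deque
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Relationship:
    asserter: str
    relation: str
    output: str
    inputs: Tuple[str, ...]


def provenance_query(start, relationships, keep=lambda r: True, max_depth=None):
    """Collect relationships reachable backwards from `start`.

    `keep` filters p-assertions in or out; `max_depth` bounds the scope.
    """
    result = []
    visited = {start}
    frontier = deque([(start, 0)])  # breadth-first, so depth is minimal
    while frontier:
        item, depth = frontier.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for rel in relationships:
            if rel.output != item or not keep(rel):
                continue
            result.append(rel)
            for parent in rel.inputs:
                if parent not in visited:
                    visited.add(parent)
                    frontier.append((parent, depth + 1))
    return result


# Hypothetical process documentation.
docs = [
    Relationship("lab", "measuredFrom", "result", ("sample",)),
    Relationship("lab", "calibratedWith", "result", ("calibration",)),
    Relationship("hospital", "collectedFrom", "sample", ("donor",)),
]

full = provenance_query("result", docs)
lab_only = provenance_query("result", docs, keep=lambda r: r.asserter == "lab")
one_step = provenance_query("result", docs, max_depth=1)
```

The three calls return different answers for the same data item, which illustrates why the slide speaks of several provenances depending on the end user's interest.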

37
Standardisation
38
Standardisation Options
39
Purpose of Standardisation
Application
Application
Provenance Stores
Allow for multiple applications to document their
execution. Applications may be running in
different institutions.
40
Purpose of Standardisation
Application
Provenance Store
Provenance Store
Provenance Store
Allow for multiple stores from multiple IT
providers
41
Purpose of Standardisation
Provenance Store
Provenance Store
Query Provenance of Data
Allow for multiple stores from multiple IT
providers
42
Purpose of Standardisation
Convert into standard data format
Allow for legacy, monolithic applications to
expose their contents (according to standard
schema)
43
Purpose of Standardisation
Application
Allow third parties to host provenance stores,
which are trusted by application owners but also
auditors
44
Compliance Oriented Architectures
  • Separate execution documentation from compliance
    verification
  • Allows for multiple compliance verifications
  • Allows for validation to take place across
    multiple applications, possibly run by different
    institutions (in particular, allows for
    outsourcing and subcontracting).
  • Approach is suitable for e-scientific
    peer-reviewing and business compliance
    verification
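
Separating execution documentation from compliance verification could look like the sketch below: the documentation is recorded once, and any number of independent checks run over it afterwards. The record layout, the actor and action names, and both rules are hypothetical and chosen only to echo the organ transplant scenario.

```python
# Hypothetical execution documentation, recorded once by the applications.
documentation = [
    {"step": 1, "actor": "hospital", "action": "record_consent"},
    {"step": 2, "actor": "testing_lab", "action": "run_test"},
    {"step": 3, "actor": "hospital", "action": "decide_donation"},
]


def consent_precedes_decision(records):
    """Illustrative rule: consent must be recorded before the decision."""
    steps = {r["action"]: r["step"] for r in records}
    return (
        "record_consent" in steps
        and "decide_donation" in steps
        and steps["record_consent"] < steps["decide_donation"]
    )


def only_approved_actors(records, approved=frozenset({"hospital", "testing_lab"})):
    """Illustrative rule: every step was performed by an approved actor."""
    return all(r["actor"] in approved for r in records)


def verify(records, checks):
    """Run each compliance check over the same documentation."""
    return {check.__name__: check(records) for check in checks}


report = verify(documentation, [consent_precedes_decision, only_approved_actors])
```

A new regulation then becomes one more function passed to `verify`, without touching the applications that produced the documentation.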

45
Organ Transplant Scenario
Hospital
Electronic Healthcare Management Service
Testing Lab
46
Hospital Actors
User Interface
Donor Data Collector
Brain Death Manager
47
What's on the CD
  • PReServ (Paul Groth, Simon Miles)
  • Offers recording and querying interfaces
  • Available from www.pasoa.org
  • An OGSA-DAI based version will soon be available
    from www.gridprovenance.org
  • Is being used in a bioinformatics application
    (cf. hpdc05, iswc05)

48
Conclusions
49
To Sum Up
Standardising the documentation of Business
Processes
  • Finance, Distribution, Aerospace, Healthcare,
    Automobile, Pharmaceutical
  • Query
  • Compliance check
  • Rerun/Reproduce
  • Analyse

Slide from John Ibbotson
50
Conclusions
  • Crucial topic for many applications
  • Full architectural specification
  • An implementation available for download
  • Methodology to make applications provenance-aware
  • www.pasoa.org
  • www.gridprovenance.org

51
Publications
  • Paul Groth, Simon Miles, Weijian Fang, Sylvia C.
    Wong, Klaus-Peter Zauner, and Luc Moreau.
    Recording and Using Provenance in a Protein
    Compressibility Experiment. In Proceedings of the
    14th IEEE International Symposium on High
    Performance Distributed Computing (HPDC'05), July
    2005.
  • Paul Groth, Michael Luck, and Luc Moreau. A
    protocol for recording provenance in
    service-oriented Grids. In Proceedings of the 8th
    International Conference on Principles of
    Distributed Systems (OPODIS'04), Grenoble,
    France, December 2004.
  • Paul Groth, Michael Luck, and Luc Moreau.
    Formalising a protocol for recording provenance
    in Grids. In Proceedings of the UK OST e-Science
    second All Hands Meeting 2004 (AHM'04),
    Nottingham, UK, September 2004.
  • Simon Miles, Paul Groth, Miguel Branco, and Luc
    Moreau. The requirements of recording and using
    provenance in e-Science experiments. Technical
    report, University of Southampton, 2005.
  • Luc Moreau, Syd Chapman, Andreas Schreiber, Rolf
    Hempel, Omer Rana, Laszlo Varga, Ulises Cortes,
    and Steven Willmott. Provenance-based Trust for
    Grid Computing: Position Paper. 2003.
  • Paul Townend, Paul Groth, and Jie Xu. A
    Provenance-Aware Weighted Fault Tolerance Scheme
    for Service-Based Applications. In Proc. of the
    8th IEEE International Symposium on
    Object-oriented Real-time distributed Computing
    (ISORC 2005), May 2005.
  • Paul Groth, Simon Miles, Victor Tan, and Luc
    Moreau. Architecture for Provenance Systems.
    Technical report, University of Southampton,
    October 2005.

52
Questions