Provenance: Problem, Architectural Issues, Towards Trust

1
Provenance: Problem, Architectural Issues,
Towards Trust
  • Luc Moreau
  • L.Moreau_at_ecs.soton.ac.uk
  • University of Southampton

2
Contents
  • A definition of provenance
  • Example 1: Aerospace engineering
  • Example 2: Organ transplant management
  • Example 3: Bioinformatics grid
  • Provenance architecture
  • Towards Trust
  • Conclusion

3
The Grid and Virtual Organisations
  • The Grid problem is defined as coordinated
    resource sharing and problem solving in dynamic,
    multi-institutional virtual organisations
    [FKT01].
  • Effort is required to allow users to place their
    trust in the data produced by such virtual
    organisations.
  • Understanding how a given service is likely to
    modify data flowing into it, and how this data
    has been generated is crucial.

4
Provenance and Virtual Organisations
  • Given a set of services in an open grid
    environment that decide to form a virtual
    organisation with the aim of producing a given
    result:
  • How can we determine the process that
    generated the result, especially after the
    virtual organisation has been disbanded?
  • The lack of information about the origin of
    results does not help users to trust such open
    environments.

5
Provenance and Workflows
  • Workflow enactment has become popular in the Grid
    and Web Services communities
  • Workflow enactment can be seen as a scripted form
    of virtual organisation.
  • The problem is similar: how can we determine the
    origin of enactment results?

6
Provenance Definition
  • Provenance is an annotation able to explain how a
    particular result has been derived.
  • In a service-oriented architecture, provenance
    identifies what data is passed between services,
    what services are available, and what results are
    generated for particular sets of input values,
    etc.
  • Using provenance, a user can trace the process
    that led to the aggregation of services producing
    a particular output.
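
As an illustration of tracing the process behind an output, here is a minimal sketch in Python. It assumes a deliberately simple record shape (service name, input identifiers, output identifier); the names `InvocationRecord` and `trace` and the example services are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InvocationRecord:
    """One service invocation: which service ran, on which inputs, producing which output."""
    service: str
    inputs: tuple   # identifiers of the input data items
    output: str     # identifier of the data item produced

def trace(output_id, records):
    """Return the invocations that led to output_id, earliest first."""
    by_output = {r.output: r for r in records}
    ordered, seen = [], set()

    def visit(data_id):
        rec = by_output.get(data_id)
        if rec is None or data_id in seen:   # raw input, or already handled
            return
        seen.add(data_id)
        for parent in rec.inputs:
            visit(parent)
        ordered.append(rec)

    visit(output_id)
    return ordered

# Hypothetical records, stored in arbitrary order.
records = [
    InvocationRecord("align", ("seq_a", "seq_b"), "alignment"),
    InvocationRecord("fetch", ("query",), "seq_a"),
    InvocationRecord("score", ("alignment",), "report"),
]
lineage = [r.service for r in trace("report", records)]
```

Walking backwards from the output through the recorded inputs reconstructs the chain of services even when the records arrive out of order.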

7
Provenance in Aerospace Engineering
  • Provenance requirement: maintain a historical
    record of outputs from each sub-system involved
    in simulations.
  • Aircraft provenance data needs to be kept for up
    to 99 years when aircraft are sold to some
    countries.
  • Currently, little direct support is available for
    this.

8
Provenance in Organ Transplant Management
  • Decision support systems for organ and tissue
    transplant rely on a wide range of data sources,
    patient data, and doctors' and surgeons'
    knowledge.
  • Heavily regulated domain: European, national,
    regional and site-specific rules govern how
    decisions are made.
  • Application of these rules must be ensured and
    auditable, and the rules may change over time.
  • Provenance allows tracking previous decisions,
    which is crucial in maximising the efficiency of
    matching and the recovery rate of patients.

9
Provenance in a Bioinformatics Grid (myGrid)
  • myGrid builds a personalised problem-solving
    environment that helps bioinformaticians find,
    adapt, construct and execute in silico
    experiments
  • Keep the scientist informed as to the provenance
    of data relevant to their experiment space
  • Provenance in the drug discovery process: FDA
    requirement on drug companies to keep a record
    of the provenance of drug discovery for as long
    as the drug is in use (up to 50 years
    sometimes).

10
What is the problem?
  • Provenance recording should be part of the
    infrastructure, so that users can elect to enable
    it when they execute their complex tasks over the
    Grid or in Web Services environments.
  • Currently, the Web Services protocol stack and
    the Open Grid Services Architecture do not
    provide any support for recording provenance.

11
Architectural Vision
12
Architectural Vision
  • Provenance gathering is a collaborative process
    that involves multiple entities, including the
    workflow enactment engine, the enactment engine's
    client, the service directory, and the invoked
    services.
  • Provenance data will be submitted to one or more
    provenance repositories acting as storage for
    provenance data.
  • Upon user request, some analysis, navigation
    and reasoning over provenance data can be
    undertaken.

13
Architectural Vision
  • Storage could be achieved by a provenance
    service.
  • The provenance service would provide support for
    analysis, navigation or reasoning over provenance
    data.
  • Client-side support for submitting provenance
    data to the provenance service.
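
The division of labour described above (participants submit, a service stores, users query) can be sketched as a small in-memory store. This is only a sketch: the class `ProvenanceStore`, its `submit` and `query` methods, and the actor names are assumptions made for illustration, not the interface of the prototype.

```python
from collections import defaultdict

class ProvenanceStore:
    """In-memory stand-in for a provenance service (hypothetical interface)."""

    def __init__(self):
        self._by_workflow = defaultdict(list)

    def submit(self, workflow_id, actor, event):
        """Called by any participant: enactment engine, client, directory or invoked service."""
        self._by_workflow[workflow_id].append({**event, "actor": actor})

    def query(self, workflow_id, actor=None):
        """Return the events recorded for one workflow run, optionally filtered by actor."""
        events = self._by_workflow.get(workflow_id, [])
        return [e for e in events if actor is None or e["actor"] == actor]

store = ProvenanceStore()
store.submit("run-1", "enactment-engine", {"action": "start"})
store.submit("run-1", "blast-service", {"action": "invoke", "output": "hits"})
```

Keying the store by workflow run keeps the record available for later analysis even after the participants that produced it are gone.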

14
A First Prototype (Szomszor, Moreau 03)
  • A service-oriented architecture for provenance
    support in Grid and Web Services environments,
    based on the idea of a provenance service
  • A client-side API for recording provenance data
    for Web Service invocation
  • A data model for storing provenance data
  • A server-side interface for querying provenance
    data
  • Two components making use of provenance:
    provenance browsing and provenance validation.
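
A rough sketch of what client-side recording and a simple validation pass might look like follows. The decorator `record_provenance`, the function `validate`, and the reading of "validation" as re-running logged calls and comparing outputs are all assumptions for illustration; the slides do not spell out how the prototype implements either component.

```python
import functools

def record_provenance(log, service_name):
    """Decorator: append one entry to `log` for every call to the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            result = func(*args)
            log.append({"service": service_name, "inputs": args, "output": result})
            return result
        return wrapper
    return decorator

def validate(log, services):
    """Re-run each logged call and return the entries whose output no longer matches."""
    return [e for e in log if services[e["service"]](*e["inputs"]) != e["output"]]

log = []

@record_provenance(log, "reverse")
def reverse(sequence):
    # Hypothetical service: reverse a sequence string.
    return sequence[::-1]

reverse("GATTACA")
# Validate against an unwrapped implementation so the check itself is not logged.
mismatches = validate(log, {"reverse": lambda s: s[::-1]})
```

An empty `mismatches` list means every recorded output could be reproduced from its recorded inputs.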

15
Prototype Overview
16
Prototype Sequence Diagram
17
Prototype Provenance Data Model
18
Prototype Provenance Browser
19
Discussion
  • In order for provenance data to be useful, we
    expect the protocol for submitting it to support
    some classical properties of distributed
    algorithms.
  • Using mutual authentication, an invoked service
    can ensure that it submits data to a specific
    provenance server, and vice-versa, a provenance
    server can ensure that it receives data from a
    given service.
  • With non-repudiation, we can retain evidence of
    the fact that a service has committed to
    executing a particular invocation and has
    produced a given result.
  • We anticipate that cryptographic techniques will
    be useful to ensure such properties.
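
To make the role of cryptography concrete, the sketch below tags a provenance record with an HMAC from the Python standard library, so a provenance server holding the same key can check that the record came from the expected service and was not altered. The key, record fields and function names are assumptions. Note the limit of this sketch: a shared-key HMAC authenticates the origin of a record but does not give non-repudiation, which would need asymmetric digital signatures.

```python
import hashlib
import hmac
import json

def sign_record(record, key):
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_record(record, tag, key):
    """Check the tag in constant time; False if the record or the key differs."""
    return hmac.compare_digest(sign_record(record, key), tag)

key = b"demo-shared-secret"   # hypothetical key shared by service and provenance server
record = {"service": "align", "input": "seq_a", "output": "alignment"}
tag = sign_record(record, key)
tampered = dict(record, output="forged")
```

Verification succeeds for `record` with the original key and fails for `tampered` or for any other key.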

20
Towards Trust
21
Towards Trust
  • Using the provenance of data, trust metrics for
    the data can be derived from:
  • Trust the user places in invoked services
  • Trust the user places in the input data
  • Trust the user places in the enacted workflow
  • Trust the user places in the provenance service
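
The slides do not say how these four sources of trust are combined. Purely as an illustration, the sketch below assumes each is a score in [0, 1] and multiplies them, so that low trust in any one contributor lowers trust in the result; the function name `data_trust` and the multiplicative rule are assumptions, not the authors' metric.

```python
from math import prod

def data_trust(services, inputs, workflow, provenance_service):
    """Combine per-source trust scores in [0, 1] into one score (assumed rule: product)."""
    scores = [*services, *inputs, workflow, provenance_service]
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("trust scores must lie in [0, 1]")
    return prod(scores)

# Hypothetical scores: two invoked services, one input data item.
score = data_trust(services=[0.9, 1.0], inputs=[0.8], workflow=1.0, provenance_service=0.5)
```

Because the provenance record is what identifies the services, inputs and workflow involved, trust in the provenance service itself appears as one of the factors.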

22
  • The purpose of the PASOA project is to
    investigate provenance in Grid architectures
  • Funded by EPSRC under the fundamental computer
    science for e-Science call
  • In collaboration with Cardiff
  • www.pasoa.org

23
Conclusion
  • Provenance is a rather unexplored domain.
  • Strategic for bringing trust to open
    environments.
  • Necessity to design a configurable architecture
    capable of supporting multiple requirements from
    very different application domains.
  • Need to further investigate the algorithmic
    foundations of provenance, which will lead to
    scalable and secure industrial solutions.

24
Publications
  • [SM03] Martin Szomszor and Luc Moreau. Recording
    and reasoning over data provenance in web and
    grid services. In International Conference on
    Ontologies, Databases and Applications of
    SEmantics (ODBASE'03), volume 2888 of Lecture
    Notes in Computer Science, pages 603-620,
    Catania, Sicily, Italy, November 2003.
  • [MCS03] Luc Moreau, Syd Chapman, Andreas
    Schreiber, Rolf Hempel, Omer Rana, Laszlo Varga,
    Ulises Cortes, and Steven Willmott.
    Provenance-based trust for grid computing -
    position paper. 2003.

25
Acknowledgements
  • Martin Szomszor, Southampton
  • Syd Chapman, IBM
  • Omer Rana, Cardiff
  • Andreas Schreiber and Rolf Hempel, DLR
  • Laszlo Varga, SZTAKI
  • Ulises Cortes and Steven Willmott, UPC
  • Mark Greenwood, Carole Goble, Manchester