Title: Capturing provenance data


1
Capturing provenance data
  • Dr Alison McKay (in place of Dr Richard Bagshaw)
  • University of Leeds, School of Mechanical
    Engineering

2
Purpose of presentation
  • to present the DAME provenance research
  • to discuss the experiences of deploying this
    technology in Grid-based systems

3
Outline of presentation
  • What do we mean by provenance data?
  • What are we aiming for?
  • What does achieving this goal entail?
  • What progress has been made to date?
  • What remains to be done?

4
Provenance data
  • Recording the history of data and its place of
    origin

5
DAME Provenance Architecture
Workflow Definition (BPEL)
6
Outline of presentation
  • What do we mean by provenance data?
  • What are we aiming for?
  • What does achieving this goal entail?
  • What progress has been made to date?
  • What remains to be done?

7
RR Integrated Product Development process

8
DAME provenance data users
Legal Implications
Contractual Obligations
Audit Trail
Troubleshooting
Re-run diagnosis
Provenance Requirement
9
Potential benefits
  • failure mode curves
  • Position and shape depend on:
    • engine type (from PDM/SDM)
    • engine state (e.g. age)
    • events (e.g. from QUOTE data)

[Figure: failure mode curve plotted against time. The curve shows when
failure occurs; its position and shape depend on the operating environment.]
10
Specific tasks to be supported
  • Create an audit trail (Who, What, Where, Why,
    When, Which, hoW)
  • Re-execute a workflow process
  • Repeat a workflow process (same Grid resources,
    services, sequence and data)
  • Rerun a workflow process (same Grid resources,
    services and sequence, on different data)
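
The difference between repeating and rerunning a workflow can be illustrated with a minimal Python sketch. Nothing here is taken from the DAME implementation: `WorkflowRecord`, `repeat`, `rerun` and the two toy services are hypothetical names, and the sketch assumes only that a captured record holds the ordered services and the data they were given.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical record of one executed workflow: the ordered services
# (modelled here as plain functions) and the input data they received.
@dataclass(frozen=True)
class WorkflowRecord:
    services: Sequence[Callable[[float], float]]
    data: float

def execute(services, data):
    """Apply each service to the data in the recorded sequence."""
    result = data
    for service in services:
        result = service(result)
    return result

def repeat(record):
    """Repeat: same services, same sequence, same data."""
    return execute(record.services, record.data)

def rerun(record, new_data):
    """Rerun: same services and sequence, different data."""
    return execute(record.services, new_data)

# Toy stand-ins for Grid services (assumptions for illustration only).
def double(x):
    return 2 * x

def add_one(x):
    return x + 1

record = WorkflowRecord(services=(double, add_one), data=3.0)
print(repeat(record))       # 7.0
print(rerun(record, 10.0))  # 21.0
```

Both operations rely on the same captured record; only the source of the input data differs.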

11
Outline of presentation
  • What are we aiming for?
  • What does achieving this goal entail?
  • What progress has been made to date?
  • What remains to be done?

12
Initial requirements
  • Support the re-execution of workflows with new
    data
  • Provide provenance data for the Workflow
    Advisor
  • Provide a viewer for captured provenance data

(Re-execution with new data is distinct from repeating a given workflow
using the same data and resources.)
13
DSS perspective on requirements
  • Origin of data fully traceable
  • (Including time and date stamps)
  • Processed data traceable through application
    software
  • Any human interaction/annotations must be
    captured
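
One way to picture these traceability requirements is a chain of provenance entries, each pointing back to the data it was derived from. The following Python sketch is an assumption-laden illustration, not the DSS design: `ProvenanceEntry`, `trace_to_origin`, the data identifiers and the software names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional

@dataclass
class ProvenanceEntry:
    data_id: str
    derived_from: Optional[str]   # None marks the origin of the data
    software: str                 # application that produced this data
    timestamp: datetime           # time and date stamp
    annotations: List[str] = field(default_factory=list)  # human input

def trace_to_origin(store: Dict[str, ProvenanceEntry], data_id: str) -> List[str]:
    """Walk back from a data item to its origin, returning the ids visited."""
    chain = []
    current = data_id
    while current is not None:
        entry = store[current]
        chain.append(entry.data_id)
        current = entry.derived_from
    return chain

now = datetime.now(timezone.utc)
store = {
    "raw": ProvenanceEntry("raw", None, "download tool", now),
    "filtered": ProvenanceEntry("filtered", "raw", "filter tool", now,
                                ["engineer: removed sensor spike"]),
    "diagnosis": ProvenanceEntry("diagnosis", "filtered", "pattern matcher", now),
}
print(trace_to_origin(store, "diagnosis"))  # ['diagnosis', 'filtered', 'raw']
```

Storing the annotation alongside the processing step keeps human interaction in the same trail as the automated steps.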

14
Research issues
           Specify   Define                           Execute / deploy
Product    -         Product Data Management system   Service Data Manager
Process    -         Workflow process definition      Workflow execution data
15
Process definition (as defined)
[Diagram: data model of a process definition. Entities shown: GRID
resource, GRID resource usage, process definition, process element,
process relationship, composition relationship, process element
relationship, connection relationship. Attribute and relationship labels
shown: id, name, description, callee, caller, start, end, date_and_time,
resource, process, outcome, why_used, executed_by, of, related, relating.]
16
Process definition (as executed)
  • Case: Case_id, User_id, Open_date, Close_date,
    Flight_start_date, Deadline_date, Tail_number, Airline,
    Airport, Stand, Quote_diagnosis, Quote_status, Engineer,
    Engineer_active, Engineer_why, Analyst, Analyst_active,
    Analyst_why, Expert, Expert_active, Expert_why
  • Workflow: Workflow_sequence_number, Workflow_id,
    Workflow_author_id, Workflow_name, Workflow_description,
    Workflow_start_date, Workflow_end_date,
    Workflow_ip_data_type, Workflow_op_data_type,
    Workflow_diagnosis, Workflow_status
  • Resource: Resource_sequence_number, Resource_id,
    Resource_name, Resource_type, Resource_description,
    Resource_start_time, Resource_end_time, Resource_location,
    Resource_configuration, Resource_version_number,
    Resource_status, Resource_req_no_of_processors,
    Resource_req_memory, Resource_req_operating_system,
    Resource_req_op_sys_ver_number
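
The Case, Workflow and Resource records listed on this slide could be modelled roughly as follows. This is a sketch under assumptions: only a few of the listed fields are included, the field types are guesses, and the nesting (a case holding workflows, a workflow holding resources) is inferred from the sequence-number fields rather than stated in the slides. `resources_in_order` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Resource:
    resource_sequence_number: int
    resource_id: str
    resource_name: str
    resource_status: str

@dataclass
class Workflow:
    workflow_sequence_number: int
    workflow_id: str
    workflow_name: str
    workflow_status: str
    resources: List[Resource] = field(default_factory=list)  # assumed nesting

@dataclass
class Case:
    case_id: str
    user_id: str
    tail_number: str
    workflows: List[Workflow] = field(default_factory=list)  # assumed nesting

def resources_in_order(case: Case) -> List[Tuple[str, str]]:
    """Return (workflow_id, resource_id) pairs in recorded execution order."""
    pairs = []
    for wf in sorted(case.workflows, key=lambda w: w.workflow_sequence_number):
        for res in sorted(wf.resources, key=lambda r: r.resource_sequence_number):
            pairs.append((wf.workflow_id, res.resource_id))
    return pairs

# Illustrative values only; none of them come from the presentation.
case = Case("case-1", "user-1", "tail-1", workflows=[
    Workflow(2, "wf-b", "second workflow", "complete",
             [Resource(1, "res-3", "viewer", "complete")]),
    Workflow(1, "wf-a", "first workflow", "complete",
             [Resource(2, "res-2", "analysis", "complete"),
              Resource(1, "res-1", "extraction", "complete")]),
])
print(resources_in_order(case))
# [('wf-a', 'res-1'), ('wf-a', 'res-2'), ('wf-b', 'res-3')]
```

Ordering by the sequence numbers is what would allow the executed process to be replayed in the order it originally ran.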
17
MyGrid Workflow Provenance
  • Workflow instance capture
  • Workflow overview
    • Workflow ID, Status, Start Time, End Time, overall
      inputs and outputs, Service List
  • Service invocations
    • Status, Start Time, End Time, WSDL URI, DataSets x 2
  • Inputs and outputs
    • ID, Name, Type, Value
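
Capturing the per-invocation fields listed above (status, start time, end time, inputs and outputs) can be sketched as a thin wrapper around each service call. This is not the MyGrid API: `invoke_with_provenance`, the log layout and the status strings "COMPLETE" and "FAILED" are assumptions made for illustration.

```python
from datetime import datetime, timezone

def invoke_with_provenance(service, name, value, log):
    """Call one service and append a provenance entry to `log`."""
    start = datetime.now(timezone.utc)
    status, output = "COMPLETE", None
    try:
        output = service(value)
    except Exception as exc:
        # Record the failure instead of losing it; the status values
        # used here are hypothetical.
        status = "FAILED"
        output = repr(exc)
    log.append({
        "service": name,
        "status": status,
        "start_time": start,
        "end_time": datetime.now(timezone.utc),
        "input": value,
        "output": output,
    })
    return output

log = []
invoke_with_provenance(lambda x: x * 2, "scale", 21, log)
invoke_with_provenance(lambda x: 1 / x, "invert", 0, log)
for entry in log:
    print(entry["service"], entry["status"])
```

Because the wrapper records failed invocations as well as successful ones, the log can serve both troubleshooting and audit purposes.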

18
Outline of presentation
  • What do we mean by provenance data?
  • What are we aiming for?
  • What does achieving this goal entail?
  • What progress has been made to date?
  • What remains to be done?

19
Data interface GRID resource
20
BOM data viewer
21
Outline of presentation
  • What do we mean by provenance data?
  • What are we aiming for?
  • What does achieving this goal entail?
  • What progress has been made to date?
  • What remains to be done?

22
Remaining tasks
  • Support the re-execution of workflows with new
    data
  • Provide provenance data for the Workflow Advisor
  • Provide a viewer for captured provenance data
  • Provide audit trail for accountability purposes

23
Provenance research issues
  • Provenance requirements and scope
  • Provenance data security
  • Data storage format
  • Centralised provenance data
  • Stop points for audit trails
  • Repeatability of GRID resources

24
Longer term research
           Specify                          Define                           Execute / deploy
Product    Requirements definition          Product Data Management system   Service Data Manager
Process    Workflow process specification   Workflow process definition      Workflow execution data