Title: Capturing provenance data
1Capturing provenance data
- Dr Alison McKay (in place of Dr Richard Bagshaw)
- University of Leeds, School of Mechanical
Engineering
2Purpose of presentation
- to present the DAME provenance research
- to discuss the experience of deploying this
technology in Grid-based systems
3Outline of presentation
- What do we mean by provenance data?
- What are we aiming for?
- What does achieving this goal entail?
- What progress has been made to date?
- What remains to be done?
4Provenance data
- Recording the history of data and its place of
origin
5DAME Provenance Architecture
Workflow Definition (BPEL)
6Outline of presentation
- What do we mean by provenance data?
- What are we aiming for?
- What does achieving this goal entail?
- What progress has been made to date?
- What remains to be done?
7RR Integrated Product Development process
8DAME provenance data users
Legal Implications
Contractual Obligations
Audit Trail
Troubleshooting
Re-run diagnosis
Provenance Requirement
9Potential benefits
- failure mode curves
- Position and shape depend on
- engine type (from PDM/SDM)
- engine state (e.g., age)
- events (e.g., from QUOTE data)
[Figure: failure mode curve plotted against time. The line shows when failure occurs; its position and shape depend on the operating environment.]
10Specific tasks to be supported
- Create an audit trail (Who, What, Where, Why, When, Which, hoW)
- Re-execute a workflow process
- repeat a workflow process (same Grid resources, services, sequence and data)
- rerun a workflow process (same Grid resources, services and sequence on different data)
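An audit trail entry that answers the seven questions above could be recorded along the following lines. This is a minimal sketch, not DAME code: the `AuditEntry` type, its field names, the `record` helper and all example values are hypothetical.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEntry:
    """One audit-trail record; all field names are illustrative assumptions."""
    who: str        # user or agent that acted
    what: str       # action performed
    where: str      # Grid resource or location where it ran
    why: str        # stated reason for the action
    when: datetime  # time and date stamp
    which: str      # data set or item acted upon
    how: str        # method, tool or settings used


audit_trail: List[AuditEntry] = []


def record(entry: AuditEntry) -> None:
    """Append an entry; existing entries are never modified."""
    audit_trail.append(entry)


# All values below are placeholders for illustration only.
record(AuditEntry(
    who="engineer_01",
    what="ran diagnosis workflow",
    where="grid_node_a",
    why="investigate reported anomaly",
    when=datetime(2004, 1, 1, 12, 0, tzinfo=timezone.utc),
    which="dataset_001",
    how="workflow_v1, default settings",
))

fields = list(asdict(audit_trail[0]).keys())
```

Making each entry immutable and only ever appending to the trail is one simple way to keep the record usable for accountability purposes.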
11Outline of presentation
- What are we aiming for?
- What does achieving this goal entail?
- What progress has been made to date?
- What remains to be done?
12Initial requirements
- Support the re-execution of workflows with new data
(as opposed to repeating a given workflow using the same data and resources)
- Provide provenance data for the Workflow Advisor
- Provide a viewer for captured provenance data
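The difference between repeating a workflow and re-executing it with new data can be sketched as follows. The `WorkflowRun` type, the `repeat` and `rerun` functions, and the service and data set names are hypothetical and only illustrate the distinction.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class WorkflowRun:
    """Hypothetical record of one executed workflow (not a DAME type)."""
    resources: Tuple[str, ...]   # Grid resources / services used
    sequence: Tuple[str, ...]    # order in which the services were invoked
    data: str                    # identifier of the input data set


def repeat(run: WorkflowRun) -> WorkflowRun:
    """Repeat: same resources, same sequence, same data."""
    return replace(run)


def rerun(run: WorkflowRun, new_data: str) -> WorkflowRun:
    """Rerun: same resources and sequence, applied to different data."""
    return replace(run, data=new_data)


original = WorkflowRun(
    resources=("service_a", "service_b"),
    sequence=("service_a", "service_b"),
    data="dataset_001",
)
repeated = repeat(original)
rerun_result = rerun(original, "dataset_002")
```

Both operations rely on the captured provenance holding enough detail (resources, sequence and data) to reconstruct the run.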
13DSS perspective on requirements
- Origin of data fully traceable (including time and date stamps)
- Processed data traceable through application software
- Any human interaction/annotations must be captured
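One way to picture these three requirements together is a data item that carries its own time stamp, the software that produced it, a link to the item it was derived from, and any human annotations. The sketch below is an assumption for illustration only; `DataItem`, `trace` and the example names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class DataItem:
    """Hypothetical provenance-carrying data item."""
    name: str
    created: datetime                       # time and date stamp
    produced_by: str                        # origin or application software
    derived_from: Optional["DataItem"] = None
    annotations: List[str] = field(default_factory=list)  # human notes


def trace(item: DataItem) -> List[str]:
    """Walk back from an item to its origin, most recent step first."""
    steps = []
    current: Optional[DataItem] = item
    while current is not None:
        steps.append(f"{current.name} <- {current.produced_by}")
        current = current.derived_from
    return steps


stamp = datetime(2004, 1, 1, tzinfo=timezone.utc)  # placeholder time stamp
raw = DataItem("raw_signal", stamp, "engine sensor download")
features = DataItem("features", stamp, "feature_extractor v1", derived_from=raw)
features.annotations.append("analyst: spike at start looks like sensor noise")

lineage = trace(features)
```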
14Research issues
Stages: Specify, Define, Execute / deploy
Product
- Define: Product Data Management system
- Execute / deploy: Service Data Manager
Process
- Define: Workflow process definition
- Execute / deploy: Workflow execution data
15Process definition (as defined)
[Diagram: data model for the process definition as defined. Entities: GRID resource; GRID resource usage; process definition; process element; process relationship; composition relationship; process element relationship; connection relationship. Attribute and relationship labels: id, name, description, caller, callee, start, end, date_and_time, resource, process, outcome, why_used, executed_by, of, relating, related.]
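As a rough illustration of how connection relationships between process elements could define a workflow, the sketch below treats each relationship as a directed link and derives an execution order from the links. The reading of "relating" as the earlier element and "related" as the later one is an assumption, as are the class names and the example elements; the diagram itself does not confirm them.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import List


@dataclass(frozen=True)
class ProcessElement:
    id: str
    description: str


@dataclass(frozen=True)
class ConnectionRelationship:
    relating: ProcessElement   # element that runs first (assumed reading)
    related: ProcessElement    # element that follows it (assumed reading)


def execution_order(connections: List[ConnectionRelationship]) -> List[str]:
    """Order element ids so that every 'relating' precedes its 'related'."""
    sorter = TopologicalSorter()
    for c in connections:
        # Register 'relating' as a predecessor of 'related'.
        sorter.add(c.related.id, c.relating.id)
    return list(sorter.static_order())


# Hypothetical three-step process definition.
acquire = ProcessElement("acquire", "obtain input data")
extract = ProcessElement("extract", "extract features")
classify = ProcessElement("classify", "classify features")

order = execution_order([
    ConnectionRelationship(relating=extract, related=classify),
    ConnectionRelationship(relating=acquire, related=extract),
])
```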
16Process definition (as executed)
Case: Case_id, User_id, Open_date, Close_date, Flight_start_date, Deadline_date, Tail_number, Airline, Airport, Stand, Quote_diagnosis, Quote_status, Engineer, Engineer_active, Engineer_why, Analyst, Analyst_active, Analyst_why, Expert, Expert_active, Expert_why
Workflow: Workflow_sequence_number, Workflow_id, Workflow_author_id, Workflow_name, Workflow_description, Workflow_start_date, Workflow_end_date, Workflow_ip_data_type, Workflow_op_data_type, Workflow_diagnosis, Workflow_status
Resource: Resource_sequence_number, Resource_id, Resource_name, Resource_type, Resource_description, Resource_start_time, Resource_end_time, Resource_location, Resource_configuration, Resource_version_number, Resource_status, Resource_req_no_of_processors, Resource_req_memory, Resource_req_operating_system, Resource_req_op_sys_ver_number
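To show how execution records of this kind might be stored and queried, the sketch below puts a small subset of the listed fields into three SQLite tables. The table names, the choice of fields, the links between tables (a `Case_id` column on the workflow table and a `Workflow_id` column on the resource table) and all inserted values are assumptions for illustration; the slide does not state how the three record types are connected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE case_record (
    Case_id TEXT PRIMARY KEY,
    User_id TEXT,
    Open_date TEXT,
    Tail_number TEXT
);
CREATE TABLE workflow (
    Workflow_id TEXT PRIMARY KEY,
    Case_id TEXT REFERENCES case_record(Case_id),       -- assumed link
    Workflow_sequence_number INTEGER,
    Workflow_name TEXT,
    Workflow_status TEXT
);
CREATE TABLE resource (
    Resource_id TEXT,
    Workflow_id TEXT REFERENCES workflow(Workflow_id),  -- assumed link
    Resource_sequence_number INTEGER,
    Resource_name TEXT,
    Resource_version_number TEXT
);
""")

# Placeholder rows, inserted out of order on purpose.
conn.execute("INSERT INTO case_record VALUES ('C1', 'U1', '2004-01-01', 'TAIL-1')")
conn.execute("INSERT INTO workflow VALUES ('W1', 'C1', 1, 'diagnosis', 'complete')")
conn.executemany(
    "INSERT INTO resource VALUES (?, 'W1', ?, ?, ?)",
    [("R2", 2, "classifier", "1.0"), ("R1", 1, "extractor", "1.0")],
)

# Recover the order in which resources were used for one case.
rows = conn.execute("""
    SELECT r.Resource_name
    FROM resource r
    JOIN workflow w ON w.Workflow_id = r.Workflow_id
    WHERE w.Case_id = ?
    ORDER BY w.Workflow_sequence_number, r.Resource_sequence_number
""", ("C1",)).fetchall()
resource_order = [name for (name,) in rows]
```

The sequence-number fields are what allow the original order of execution to be reconstructed, which is a precondition for repeating or rerunning a workflow.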
17MyGrid Workflow Provenance
- Workflow instance capture
- Workflow overview
- Workflow ID, Status, Start Time, End Time, overall inputs and outputs, Service List
- Service Invocations
- Status, Start Time, End Time, WSDLURI, DataSets x 2
- Inputs and Outputs
- ID, Name, Type, Value
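The three levels of capture listed above (workflow overview, service invocations, inputs and outputs) can be pictured as nested records. The sketch below is not the myGrid API; the class names are hypothetical, the reading of "DataSets x 2" as one input set and one output set per invocation is an assumption, and the URI and values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class DataValue:
    id: str
    name: str
    type: str
    value: Any


@dataclass
class ServiceInvocation:
    status: str
    start_time: str
    end_time: str
    wsdl_uri: str
    inputs: List[DataValue] = field(default_factory=list)   # data set 1 (assumed)
    outputs: List[DataValue] = field(default_factory=list)  # data set 2 (assumed)


@dataclass
class WorkflowInstance:
    workflow_id: str
    status: str
    start_time: str
    end_time: str
    inputs: List[DataValue] = field(default_factory=list)   # overall inputs
    outputs: List[DataValue] = field(default_factory=list)  # overall outputs
    invocations: List[ServiceInvocation] = field(default_factory=list)

    def service_list(self) -> List[str]:
        """Services used, in invocation order, identified by WSDL URI."""
        return [inv.wsdl_uri for inv in self.invocations]


signal = DataValue("d1", "signal", "text/plain", "raw")
features = DataValue("d2", "features", "text/plain", "processed")
step = ServiceInvocation(
    status="COMPLETE",
    start_time="2004-01-01T12:00:00Z",
    end_time="2004-01-01T12:00:05Z",
    wsdl_uri="http://example.org/extractor?wsdl",  # placeholder URI
    inputs=[signal],
    outputs=[features],
)
instance = WorkflowInstance(
    workflow_id="wf-1",
    status="COMPLETE",
    start_time="2004-01-01T12:00:00Z",
    end_time="2004-01-01T12:00:05Z",
    inputs=[signal],
    outputs=[features],
    invocations=[step],
)
```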
18Outline of presentation
- What do we mean by provenance data?
- What are we aiming for?
- What does achieving this goal entail?
- What progress has been made to date?
- What remains to be done?
19Data interface GRID resource
20BOM data viewer
21Outline of presentation
- What do we mean by provenance data?
- What are we aiming for?
- What does achieving this goal entail?
- What progress has been made to date?
- What remains to be done?
22Remaining tasks
- Support the re-execution of workflows with new data
- Provide provenance data for the Workflow Advisor
- Provide a viewer for captured provenance data
- Provide an audit trail for accountability purposes
23Provenance research issues
- Provenance requirements and scope
- Provenance data security
- Data storage format
- Centralised provenance data
- Stop points for audit trails
- Repeatability of GRID resources
24Longer term research
Stages: Specify, Define, Execute / deploy
Product
- Specify: Requirements definition
- Define: Product Data Management system
- Execute / deploy: Service Data Manager
Process
- Specify: Workflow process specification
- Define: Workflow process definition
- Execute / deploy: Workflow execution data