Title: Introduction to Kepler
1. Introduction to Kepler
- Deana Pennington
- University of New Mexico
- February 4, 2005
2. Scientific Workflows
- Model the way scientists work with their data now
- Mentally coordinate export and import of data among software systems
- Workflows emphasize data flow
- Metadata-driven data ingestion
- Output generation includes creating appropriate metadata
Archive output to EcoGrid with workflow metadata
Query EcoGrid to find data
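As a rough illustration of metadata-driven ingestion, the Python sketch below parses a small table using a metadata record that names each column and its type, then writes metadata for the derived output. The metadata layout, the column names, and the `ingest` helper are assumptions made for this example; they are not EML and not part of Kepler.

```python
import csv
import io

# Hypothetical metadata record: column names and types (not real EML).
METADATA = {
    "columns": [
        {"name": "site", "type": str},
        {"name": "biomass", "type": float},
    ]
}

def ingest(text, metadata):
    """Parse delimited text, converting each column as the metadata directs."""
    columns = metadata["columns"]
    rows = []
    for raw in csv.reader(io.StringIO(text)):
        rows.append({c["name"]: c["type"](v) for c, v in zip(columns, raw)})
    return rows

rows = ingest("A,1.5\nB,2.5\n", METADATA)
mean_biomass = sum(r["biomass"] for r in rows) / len(rows)

# Output generation also creates metadata describing the derived value.
output_metadata = {"name": "mean_biomass", "type": "float", "derived_from": "biomass"}
print(mean_biomass, output_metadata)
```

The point of the design is that the parsing code never hard-codes column names or types; changing the metadata record is enough to ingest a differently shaped table.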
3. Scientific Workflows
- Are not linear
- Involve multiple data sets
- Involve multiple analytical steps
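One way to picture a non-linear workflow is as a dependency graph rather than a single chain. The sketch below, using Python's standard `graphlib` module, orders the steps of a made-up workflow with two data sets and several analytical steps. The step names are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "load_biomass": set(),
    "load_temperature": set(),
    "reproject": {"load_temperature"},
    "merge": {"load_biomass", "reproject"},
    "model": {"merge"},
    "plot": {"model"},
    "archive": {"model"},
}

# Any valid order runs every step after the steps it depends on.
order = list(TopologicalSorter(workflow).static_order())
print(order)
```

Several orders are valid here (for example, the two load steps can run in either order), which is exactly what distinguishes a branching workflow from a linear script.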
4. Productivity Example
[Figure labels: Biomass, Temp, Soil, et al.; f( ); C = Concept]
5. Technology-enabled
[Figure: Semantic Mediation System (Ontologies; C = Concept) and Kepler Workflow System (Executable Workflow). Panel labels: Workflow design; Seamless execution; Automate; Semi-automatic data integration. Step legend: TS = Transformation Step, SS = Sharing Step; steps labeled DS and AS also appear.]
6. Automated Workflows
- Scripts: single platform
- Visual modeling: single environment
- Workflows
- Cross-platform
- Cross-environment
- Distributed data analyses
7. Kepler today
- Supports scientific workflows
- Ecology, molecular bio, geology, ...
- Variety of analytical components (including spatial data transformations)
- Support for R scripts and Matlab scripts
- Real-time data access via Antelope ORB
- EcoGrid access to heterogeneous data
- EML Data support
- Experimental data, survey data, spatial raster and vector data, etc.
- DarwinCore Data support
- Museum collections
- EcoGrid registry to discover data sources
- Ontology-based browsing for analytical components
- Exploit semantics to improve the user experience
- Demonstration workflows
- Ecological Niche Modeling
- Promoter Identification Workflow
- Geologic Map Information Integration
- Real-time Revelle example of data access
8. Kepler next year
- Usability engineering
- Full evaluation and user-oriented customization of all UI components
- Distributed computing/grid computing
- Large jobs, lots of machines
- Detached execution
- Smart data and component discovery
- Support annotating data sources
- Component repository / downloadable components
- Automated data and service integration and transformation using ontologies
- Complete EcoGrid access
- Full EML support
- Support for large data and 3rd-party transfer
- More data sources and types of data sources (e.g., JDBC, GEON data)
9. Starting point: Ptolemy II
- Electrical engineering community
- Large mathematical library
Source: Edward Lee et al., http://ptolemy.eecs.berkeley.edu/ptolemyII/
10. Kepler Contributors, Projects, Sponsors
- Ilkay Altintas (SDM)
- Chad Berkley (SEEK)
- Shawn Bowers (SEEK)
- Tobin Fricke (ROADNet)
- Jeffrey Grethe (BIRN)
- Christopher H. Brooks (Ptolemy II)
- Zhengang Cheng (SDM)
- Dan Higgins (SEEK)
- Efrat Jaeger (GEON)
- Matt Jones (SEEK)
- Edward A. Lee (Ptolemy II)
- Kai Lin (GEON)
- Ashraf Memon (GEON)
- Bertram Ludaescher (BIRN, GEON, SDM, SEEK)
- Steve Mock (NMI)
- Steve Neuendorffer (Ptolemy II)
- Jing Tao (SEEK)
- Mladen Vouk (SDM)
- Xiaowen Xin (SDM)
E-Science Link-up Project
11. Grid-enabled data queries
Grid get
- Grid-enabled data
- Any registered node
- Metadata driven
- Ontology-based
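A minimal sketch of how a metadata-driven, ontology-based query over registered nodes might behave is shown below. The registry contents, the ontology fragment, and the `grid_get` function (named after the "Grid get" label on the slide) are all hypothetical; the sketch does not use or describe the real EcoGrid interface.

```python
# Hypothetical registry: node name -> list of dataset metadata records.
REGISTRY = {
    "node-a": [{"id": "a1", "keywords": {"biomass"}}],
    "node-b": [
        {"id": "b1", "keywords": {"air temperature"}},
        {"id": "b2", "keywords": {"soil moisture"}},
    ],
}

# Hypothetical ontology fragment: a concept and its narrower terms.
ONTOLOGY = {"temperature": {"air temperature", "soil temperature"}}

def grid_get(term):
    """Return (node, dataset id) pairs matching the term or a narrower term."""
    terms = {term} | ONTOLOGY.get(term, set())
    hits = []
    for node, datasets in REGISTRY.items():
        for ds in datasets:
            if ds["keywords"] & terms:
                hits.append((node, ds["id"]))
    return hits

print(grid_get("temperature"))
```

The search for "temperature" finds a data set tagged only "air temperature", because the ontology supplies the narrower terms that plain keyword matching would miss.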
12. EcoGrid Sources
13. EML Metadata in Kepler
14. Kepler Workflow System
- Grid-enabled analyses
- Any registered node
- Any platform (Unix, Windows, Mac)
- Any environment (C, SAS, GIS)
- Local programs
- Web application
- Web service
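To make the "local programs" case concrete, the sketch below wraps an external program as a workflow step using Python's standard `subprocess` module. The `run_local_program` helper is a hypothetical name, and a Python one-liner stands in for the external analysis program so that the example runs anywhere; a real step would call whatever executable the analysis requires.

```python
import subprocess
import sys

def run_local_program(args, input_text):
    """Run an external program, feed it text on stdin, and return its stdout."""
    result = subprocess.run(
        args, input=input_text, capture_output=True, text=True, check=True
    )
    return result.stdout

# Stand-in for an external analysis program: sums the numbers on stdin.
program = [
    sys.executable,
    "-c",
    "import sys; print(sum(float(x) for x in sys.stdin.read().split()))",
]

total = run_local_program(program, "1.5 2.5 3.0")
print(total.strip())
```

Because the step only exchanges text with the program, the program could be written in any language available on the node.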
15. Biodiversity Indices in Kepler
16. R in Kepler
Source: Dan Higgins, Kepler/SEEK
18. Director/Actor Metaphor
[Figure: a Director with three Actors]
Actors know HOW to act: they know their part.
Directors know WHEN the actors should act.
Examples:
- Process Network: procedural, single point in time
- Synchronous Data Flow: subset of Process Network
- Continuous Time: all points in time
- Models of computation
- Behavioral polymorphism
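The separation between actors and directors can be sketched in a few lines of Python. The classes below are a toy model written for this illustration: they are not Ptolemy II or Kepler classes, and the two directors are deliberately simple schedules rather than the Process Network or Continuous Time models named above.

```python
class Actor:
    """An actor knows HOW to act: it wraps one function."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def fire(self, value):
        return self.func(value)


class SequentialDirector:
    """A director knows WHEN actors act: here, in the order given."""

    def run(self, actors, value):
        for actor in actors:
            value = actor.fire(value)
        return value


class ReverseDirector:
    """Same actors, different schedule: fire them in reverse order."""

    def run(self, actors, value):
        for actor in reversed(actors):
            value = actor.fire(value)
        return value


actors = [Actor("add_one", lambda x: x + 1), Actor("double", lambda x: x * 2)]
forward = SequentialDirector().run(actors, 3)  # (3 + 1) * 2
backward = ReverseDirector().run(actors, 3)    # (3 * 2) + 1
print(forward, backward)
```

The actors are unchanged between the two runs; only the director differs, yet the results differ. That is the sense in which the same components can behave differently under different models of computation.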
19. Actors
[Figure: anatomy of an actor, labeled with actor name, data, parameters, ports, input data, and output data]
20. Actors
[Figure: the same actor anatomy, shown with 2 output ports]
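The parts of an actor named in these figures (name, parameters, input and output ports) can be mirrored in a small Python data class. `SplitActor`, its port names, and its `threshold` parameter are hypothetical and exist only to show an actor with one input port and two output ports.

```python
from dataclasses import dataclass, field


@dataclass
class SplitActor:
    """Hypothetical actor: one input port, two output ports, one parameter."""

    name: str
    parameters: dict = field(default_factory=lambda: {"threshold": 0.0})
    input_ports: tuple = ("values",)
    output_ports: tuple = ("below", "at_or_above")

    def fire(self, inputs):
        """Consume data on the input port; return data keyed by output port."""
        threshold = self.parameters["threshold"]
        values = inputs["values"]
        return {
            "below": [v for v in values if v < threshold],
            "at_or_above": [v for v in values if v >= threshold],
        }


actor = SplitActor("split", parameters={"threshold": 10})
outputs = actor.fire({"values": [3, 12, 10, 7]})
print(outputs)
```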
21. Right-click menu
22. Editing parameters
Double-click or right-click
0 to many
23. Configuring Ports
Right-click
String Int Double array
User-defined
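The idea of typed ports, including user-defined types, can be sketched as a table of validity checks. The sketch reads the list above as four types (string, int, double, array), which is an assumption about the slide's layout; the mapping to Python checks and the `latlon` type are also invented for this example and are not Kepler's type system.

```python
# Port types mapped to Python validity checks (an assumed mapping).
PORT_TYPES = {
    "string": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, float),
    "array": lambda v: isinstance(v, list),
}


def register_type(name, check):
    """Add a user-defined port type."""
    PORT_TYPES[name] = check


def accepts(port_type, value):
    """Return True if the value is valid for a port of the given type."""
    return PORT_TYPES[port_type](value)


# A user-defined type: a (latitude, longitude) pair within valid ranges.
register_type(
    "latlon",
    lambda v: isinstance(v, tuple)
    and len(v) == 2
    and -90 <= v[0] <= 90
    and -180 <= v[1] <= 180,
)

print(accepts("int", 3), accepts("double", 3), accepts("latlon", (35.1, -106.6)))
```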
24. Procedure
- Open a new workflow
- Add a director
- Search for data (optional)
- Add data source (optional)
- Add an actor
- Edit parameters
- Add ports (if needed)
- Configure ports
- Add another actor
- Hook up input/output ports
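The procedure above is carried out in Kepler's graphical interface, but its shape can be sketched in code. The `Workflow` class and `chain_director` function below are hypothetical stand-ins written for this illustration, with one comment per step; they are not Kepler's API, and the toy director handles only a simple chain of actors.

```python
class Workflow:
    """Hypothetical container mirroring the steps listed above."""

    def __init__(self):
        self.director = None
        self.actors = {}  # name -> (function, parameters)
        self.links = []   # (source actor, destination actor)

    def add_actor(self, name, func, **parameters):
        self.actors[name] = (func, parameters)

    def set_parameter(self, name, key, value):
        self.actors[name][1][key] = value

    def connect(self, source, destination):
        self.links.append((source, destination))

    def run(self):
        return self.director(self)


def chain_director(wf):
    """Fire actors by following links from the one actor that has no input."""
    destinations = {d for _, d in wf.links}
    current = next(n for n in wf.actors if n not in destinations)
    value = None
    while current is not None:
        func, params = wf.actors[current]
        value = func(value, **params)
        current = next((d for s, d in wf.links if s == current), None)
    return value


wf = Workflow()                                   # open a new workflow
wf.director = chain_director                      # add a director
wf.add_actor("source", lambda _: [1, 2, 3])       # add a data source
wf.add_actor(                                     # add an actor
    "scale", lambda xs, factor: [x * factor for x in xs], factor=1
)
wf.set_parameter("scale", "factor", 10)           # edit parameters
wf.add_actor("total", lambda xs: sum(xs))         # add another actor
wf.connect("source", "scale")                     # hook up input/output ports
wf.connect("scale", "total")
result = wf.run()
print(result)
```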
26. Acknowledgements
This material is based upon work supported by the National Science Foundation under awards 0225676 for SEEK and 0225673 (AWSFL008-DS3) for GEON, by the Department of Energy under Contract No. DE-FC02-01ER25486 for SciDAC/SDM, and by DARPA under Contract No. F33615-00-C-1703 for Ptolemy. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).
The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.
The Andrew W. Mellon Foundation.
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON