Title: e-Science and Grid: The VL-e approach
1 e-Science and Grid: The VL-e approach
L.O. (Bob) Hertzberger
Computer Architecture and Parallel Systems Group
Department of Computer Science
Universiteit van Amsterdam
bob_at_science.uva.nl
2 Background information: experimental sciences
- Experiments become increasingly complex
- Driven by detector developments
- Resolution increases
- Automation and robotization increase
- This results in an increase in the amount and complexity of data
- Something has to be done to harness this development
- Virtualization of experimental resources: e-Science
3 The application data crisis
- Scientific experiments start to generate lots of data
- Medical imaging (fMRI): 1 GByte per measurement (day)
- Bio-informatics queries: 500 GByte per database
- Satellite world imagery: 5 TByte per year
- Current particle physics: 1 PByte per year
- LHC physics (2007): 10-30 PByte per year
- Data is often very distributed
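To put the yearly volumes above into perspective, they can be converted into the average network rate needed to move the data as it is produced. The following is a minimal sketch, assuming decimal units (1 PByte = 10^15 bytes) and a 365-day year; the function and variable names are illustrative and do not come from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

# Assumption: decimal (SI) units, e.g. 1 TByte = 10**12 bytes
UNIT_BYTES = {"GByte": 10**9, "TByte": 10**12, "PByte": 10**15}


def sustained_rate_mbit_s(amount, unit, seconds=SECONDS_PER_YEAR):
    """Average rate in Mbit/s needed to move `amount` `unit` within `seconds`."""
    total_bits = amount * UNIT_BYTES[unit] * 8
    return total_bits / seconds / 1e6


# Yearly volumes taken from the slide; LHC is given as a range, so both ends are listed
yearly_volumes = [
    ("Satellite world imagery", 5, "TByte"),
    ("Current particle physics", 1, "PByte"),
    ("LHC physics (low estimate)", 10, "PByte"),
    ("LHC physics (high estimate)", 30, "PByte"),
]

for name, amount, unit in yearly_volumes:
    rate = sustained_rate_mbit_s(amount, unit)
    print(f"{name}: {amount} {unit}/year = {rate:,.1f} Mbit/s sustained")
```

Under these assumptions, 1 PByte per year corresponds to roughly 250 Mbit/s averaged over the whole year, before any replication between sites is counted.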
4 Paradigm shift in life science
- Past experiments were hypothesis driven
- Evaluate hypothesis
- Complement existing knowledge
- Present experiments are data driven
- Discover knowledge from large amounts of data
- Apply statistical techniques
5 The what of e-Science
- e-Science is the application domain "Science" of Grid and Web
- More than only coping with the data explosion
- A multi-disciplinary activity combining human expertise and knowledge between:
- A particular domain scientist
- ICT scientist
- e-Science demands a different approach to experimentation because the computer is an integrated part of the experiment
- The consequence is a radical change in design for experimentation
- e-Science should apply and integrate Web/Grid methods where and whenever possible
6 Grid and Web Services convergence
Grid
Web
The definition of the Web Service Resource Framework (WSRF) makes an explicit distinction between the service and the stateful entities that the service acts upon, i.e. the resources. This means that the Grid and Web communities can move forward on a common base.
Ref: Foster
7 Grid service offerings
- Capability to run programs and scripts on remote sites on demand
- Ability to exchange and replicate large bulk-data sets
- Replica location services for files based on logical names
- Job monitoring using a distributed relational information system
- Resource brokering and transparent access to remote facilities
- Management of user groups, roles and access rights
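The idea of a replica location service based on logical names can be illustrated with a toy catalog that maps one logical file name to the physical copies held at different sites. This is a minimal in-memory sketch, not the API of any real grid middleware; the class `ReplicaCatalog`, its methods, the site names and the URLs are all hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy replica catalog: logical file name -> list of (site, physical URL).

    Hypothetical illustration only; real grid replica services differ.
    """

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, site, physical_url):
        """Record that a physical copy of `logical_name` exists at `site`."""
        entry = (site, physical_url)
        if entry not in self._replicas[logical_name]:
            self._replicas[logical_name].append(entry)

    def lookup(self, logical_name):
        """Return all known replicas, or raise KeyError if none are registered."""
        if logical_name not in self._replicas:
            raise KeyError(f"no replica registered for {logical_name!r}")
        return list(self._replicas[logical_name])

    def select(self, logical_name, preferred_site):
        """Prefer a replica at `preferred_site`; otherwise fall back to the first one."""
        replicas = self.lookup(logical_name)
        for site, url in replicas:
            if site == preferred_site:
                return url
        return replicas[0][1]


# Hypothetical logical name, sites and URLs
catalog = ReplicaCatalog()
catalog.register("lfn:/experiment/run42.dat", "site-a", "gsiftp://se.site-a.example/data/run42.dat")
catalog.register("lfn:/experiment/run42.dat", "site-b", "gsiftp://se.site-b.example/store/run42.dat")

print(catalog.select("lfn:/experiment/run42.dat", preferred_site="site-b"))
```

A job only needs the logical name; which physical copy it reads can then be decided at run time, for example by a resource broker that knows where the job has been scheduled.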
8 Relation to European Grid infrastructures
- Common European e-Infrastructure middleware (EGEE) for core grid services
- Based on the successful EU DataGrid, CrossGrid, and LCG software suite
- Already deployed worldwide on an O(100)-site production facility
- Support through the EGEE Regional Operations Centre (SARA and NIKHEF)
- EGEE: Enabling Grids for E-science in Europe (EU FP6)
9 Levels of Grid abstraction
Semantic/Knowledge Web/Grid
Information Web/Grid
Data Grid
Computational Grid
10 e-Science objectives
- It should enhance the scientific process by
- Stimulating collaboration by sharing data and information
- Improving re-use of data and information
- Combining data and information from different modalities
- Sensor data and information fusion
- Realizing the combination of real-life and (model-based) simulation experiments
- It should result in
- Computer-aided support for rapid prototyping of ideas
- Stimulating the creativity process
- It should realize that by creating and applying
- New computing methodologies and an infrastructure stimulating this
- We try to do this via the Virtual Lab for e-Science (VL-e) project
11 Virtual Lab for e-Science research philosophy
- Multidisciplinary research and development of related ICT infrastructure
- Generic application support
- Application cases are drivers for computer and computational science and engineering research
12 VL-e project
Data Intensive Science / HEP
Bio-Informatics
Medical Diagnosis and Imaging
Bio-Diversity
Food Informatics
Dutch Telescience
VL-e Application Oriented Services
Management of comm. and computing
13 Virtual Lab for e-Science research philosophy
- Multidisciplinary research and development of related ICT infrastructure
- Generic application support
- Application cases are drivers for computer and computational science and engineering research
- Problem solving partly generic and partly specific
- Re-use of components via generic solutions whenever possible
14 [Figure; labels: Application Specific Part (x3), Potential Generic Part (x3), Virtual Laboratory Application Oriented Services, Management of comm. and computing]
15 Generic e-Science aspects
- Virtual Reality, visualization and user interfaces
- Imaging
- Modeling and simulation
- Interactive problem solving
- Data and information management
- Data modeling
- Dynamic workflow management
- Content (knowledge) management
- Semantic aspects
- Meta data modeling
- Ontologies
- Wrapper technology
- Design for experimentation
16 Virtual Lab for e-Science research philosophy
- Multidisciplinary research and development of related ICT infrastructure
- Generic application support
- Application cases are drivers for computer and computational science and engineering research
- Problem solving partly generic and partly specific
- Re-use of components via generic solutions whenever possible
- Rationalization of the experimental process, among others the experimental pipeline
- Reproducible and comparable
17 Issues for a reproducible scientific experiment
[Figure; labels: experiment, interpretation]
Much of this is lost when an experiment is completed.
18 Scientific Workflow Management Systems in an e-Science environment
- Functionalities
- Automating experiment routines
- Rapid prototyping of experimental computing systems
- Hiding integration details between resources
- Managing the experiment lifecycle
- Crossing different layers of middleware for managing
- Data
- Computing
- Information
- Knowledge
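Two of the functionalities listed above, automating experiment routines and managing the experiment lifecycle, can be sketched as a small pipeline runner that also records what went into and came out of each step, so that a run can later be compared or reproduced. This is an illustrative sketch only, not how any VL-e component is implemented; `run_workflow`, `fingerprint`, and the three example steps are hypothetical names.

```python
import hashlib
import json


def fingerprint(value):
    """Stable SHA-256 digest of a JSON-serializable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def run_workflow(steps, data):
    """Run (name, function) steps in order.

    Returns the final result and a provenance log with one entry per step,
    holding digests of that step's input and output.
    """
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append(
            {"step": name, "input": fingerprint(data), "output": fingerprint(result)}
        )
        data = result
    return data, provenance


# Hypothetical three-step experiment routine on a list of measurements
steps = [
    ("calibrate", lambda xs: [x - 1.0 for x in xs]),
    ("filter", lambda xs: [x for x in xs if x >= 0]),
    ("summarize", lambda xs: {"n": len(xs), "mean": sum(xs) / len(xs)}),
]

result, log = run_workflow(steps, [0.5, 2.0, 4.0])
print(result)
for entry in log:
    print(entry["step"], entry["input"][:8], "->", entry["output"][:8])
```

Because every step's output digest equals the next step's input digest, a rerun that produces a different log points at the first step where the two runs diverged.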
19 Virtual Lab for e-Science research philosophy
- Multidisciplinary research and development of related ICT infrastructure
- Generic application support
- Application cases are drivers for computer and computational science and engineering research
- Problem solving partly generic and partly specific
- Re-use of components via generic solutions whenever possible
- Rationalization of the experimental process
- Reproducible and comparable
- Two research experimentation environments
- Proof of concept for application experimentation
- Rapid prototyping for computer and computational science experimentation
20 The VL-e infrastructure
Application specific service
Medical Application
Telescience
Bio ASP
Application Potential Generic service
Virtual Lab. services
Virtual Lab. rapid prototyping (interactive simulation)
Test and Cert. VL-software
Virtual Laboratory
Additional Grid Services (OGSA services)
Test and Cert. Grid Middleware
Grid Middleware
Grid Network Services
Network Service (lambda networking)
SURFnet
Test and Cert. Compatibility
VL-e Experimental Environment
VL-e Certification Environment
VL-e Proof of Concept Environment
21 Infrastructure for applications
- Applications are a driving force of the PoC
- Experience shows applications value stability
- Foster two-way interaction to make this happen
22 VL-e PoC environment
- Latest certified stable software environment of core grid and VL-e services
- Core infrastructure built around clusters and storage at SARA and NIKHEF (production quality)
- Good basis for Tier-1
- Controlled extension to other platforms and distributions
- On the user end, install the needed servers: user interface systems, storage elements for data disclosure, grid-secured DB access
- Focus on stability and scalability
23 Hosted services for VL-e
- Key services and resources are offered centrally for all applications in VL-e
- Mass data and number crunching on the large resources at SARA
- Storage for data replication and distribution
- Persistent strategic storage on tape
- Resource brokers, resource discovery, user group management
24 Why such a complex scheme?
- Software is part of the infrastructure
- Stability of the core software is needed to develop the new scientific applications
- Enable distributed systems management (who runs what version when?)
The grid is one big error amplifier: computers make mistakes like humans, only much, much faster.
25 Building a scalable infrastructure
- With good code, stable releases and support, you can build large working systems, useful to science
26 Conclusions
- e-Science is a lot more than trying to cope with the data explosion alone
- Implementation of e-Science systems requires further rationalization and standardization of the experimentation process
- e-Science success demands the realization of an environment allowing
- Application-driven experimentation
- Rapid dissemination and feedback of these new methods
- We try to do that via development of a Proof of Concept
- Good basis for HEP Tier-1