Title: Grid computing and e-Science
1Grid computing and e-Science
- Lecturer: Pham Tran Vu, PhD
- Presenters: Phan Quang Thien, Tran Phuoc Hiep,
Nguyen Minh Nhat
2Outline
- What is e-Science?
- New modes of scientific inquiry
- Fault diagnosis and prognostic system
- Grid service for diagnostic problem
- Distributed Aircraft Maintenance Environment
(DAME) project
- Conclusion
3What is e-Science?
- "e-Science is about global collaboration in key
areas of science, and the next generation of
infrastructure that will enable it." (John Taylor,
Director General of Research Councils, Office of
Science and Technology)
- The purpose of the UK e-Science initiative is to
allow scientists to do faster, better, or
different research
4Cyberinfrastructure/e-Infrastructure and the
Grid (from NSF)
- "At the heart of the cyberinfrastructure vision
is the development of a cultural community that
supports peer-to-peer collaboration and new modes
of education based upon broad and open access to
leadership computing; data and information
resources; online instruments and observatories;
and visualization and collaboration services."
(Dr. Arden L. Bement, Jr., Director of the National
Science Foundation)
- Includes not only computers but also data storage
resources and specialized facilities
- The long-term goal is to develop the middleware
services that allow scientists to routinely build
the infrastructure for their Virtual
Organisations
5New Modes of Scientific Inquiry
- Data-intensive science
- Simulation-Based Science
- Remote Access to Experimental Apparatus
6Data-intensive science
- Worldwide, scientists and engineers are
producing, accessing, analyzing, integrating, and
storing terabytes of digital data daily through
experimentation, observation, and simulation
- This vast amount of data needs to be
preprocessed and distributed for further
analysis
7Data-intensive science (Cont)
- Annual data storage: 12-14 petabytes/year
- Each of the four LHC experiments will
generate several petabytes of experimental data
per year
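To give the 12-14 petabytes/year figure a more tangible scale, the short sketch below converts it to an average sustained data rate. It assumes decimal units (1 PB = 1e15 bytes, 1 MB = 1e6 bytes) and a steady rate over a non-leap year; the function name is illustrative only, and real experiments produce data in bursts rather than evenly.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year (assumption)


def average_rate_mb_per_s(petabytes_per_year):
    """Average rate in MB/s, using decimal units (1 PB = 1e15 B, 1 MB = 1e6 B)."""
    return petabytes_per_year * 1e15 / SECONDS_PER_YEAR / 1e6


low = average_rate_mb_per_s(12)
high = average_rate_mb_per_s(14)
print(f"{low:.0f} to {high:.0f} MB/s on average")
```

Under these assumptions the storage figure corresponds to a few hundred megabytes per second written continuously, all year.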
8Simulation-Based Science
- In 2003, the Japanese Earth Simulator was running
numerical simulations of Earth's climate at a
sustained rate of 40 teraflops
- The U.S. Encyclopedia of Life (EOL) project:
http://www.eol.org/
- The UK Comb-e-Chem project: the goal of this
project is to synthesize large numbers of new
compounds by high-throughput combinatorial methods
and then map their structure and properties
- (Diagram labels: Structure, Properties,
Knowledge, Prediction)
9Remote Access to Experimental Apparatus
- The advance of technology is also producing
revolutionary new experimental apparatus
- Allow remote participants to design, execute, and
monitor experiments
10Remote Access to Experimental Apparatus (Cont)
- Sharing engineering research equipment, data
resources, and leading-edge computing resources
- Remote access to perform teleobservation and
teleoperation of experiments
11Virtual organizations for distributed communities
- The convergence of information, grid, and
networking technologies with contemporary
communications now enables science and
engineering communities to pursue their research
and learning goals in real time and without
regard to geography
- The size and/or complexity of the problem
requires that people in several organizations
collaborate and share computing resources, data,
and instruments
- Virtual organization (VO): a set of individuals
and/or institutions defined by such sharing rules
- In other words, VOs are dynamic
federations of heterogeneous organizational
entities sharing data, metadata, processing, and
security infrastructure
12Framing New Infrastructures
- If you need huge computing power and/or data
storage
- If you do not have a supercomputer in your
institution
- If you have access to a reasonable network
connection
- Then the Grid (distributed computing) could be a
good solution
13Client-server ad hoc model
Scientist
14The Grid Model - Information Utilities
Middleware
Scientist
15Scientists
Need something here
16use Web 2.0 here
Grid
17The social process of science
Undergraduate Students
Digital Libraries
scientists
Graduate Students
experimentation
Data, Metadata, Provenance, Workflows, Ontologies
18An e-Science Grid Framework
19Scientific Workflows
- Capture individual data transformation and
analysis steps
- Large monolithic applications are broken down
into smaller jobs
- Smaller jobs can be independent or connected by
control-flow or data-flow dependencies
- Usually expressed as a Directed Acyclic Graph
(DAG) of tasks
- Allow scientists to modularize their
applications
- Execution scales up over several computational
resources
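As a minimal sketch of the DAG idea above, the following example describes a small workflow as a dependency graph and groups its tasks into "waves" that could run in parallel. The task names are hypothetical and chosen only for illustration; the ordering uses Python's standard `graphlib` module (Python 3.9+), not any Grid workflow system.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "fetch_raw": set(),
    "calibrate": {"fetch_raw"},
    "filter_noise": {"fetch_raw"},
    "merge": {"calibrate", "filter_noise"},
    "plot": {"merge"},
}


def execution_waves(dag):
    """Group tasks into waves; tasks in one wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


waves = execution_waves(workflow)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Here `calibrate` and `filter_noise` land in the same wave, which is the point where a scheduler could spread work over several computational resources.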
20Workflow
- Workflows orchestrate processes on the Grid
- A workflow is a processing model that
incorporates tasks, data, and rules
- Workflow management systems execute tasks on the
Grid using data once each task's dependencies are
satisfied, based on rules
21Workflow (cont)
- A decision system that develops strategies for
reliable and efficient execution in a variety of
environments
- Reliable and scalable execution of
dependent tasks
- Reliable, scalable execution of independent tasks
(locally, across the network), priorities,
scheduling
- Cyberinfrastructure: local machine, cluster, PBS
(Condor) pool, Grid
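One common ingredient of reliable execution is retrying tasks that fail for transient reasons. The sketch below is an assumption about how that could look on a single machine: it runs independent tasks in a thread pool and retries each one a bounded number of times. The helper names and the simulated failures are hypothetical and do not represent the behaviour of any particular workflow manager.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_retries(task, max_attempts=3):
    """Call task() until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except Exception as exc:  # a real system would catch narrower errors
            last_error = exc
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def make_flaky_task(name, failures_before_success):
    """Hypothetical task that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise IOError(f"{name}: simulated transient failure")
        return f"{name}: done"

    return task


tasks = [make_flaky_task("job-a", 0), make_flaky_task("job-b", 2)]

# Independent tasks can run side by side; map() keeps the input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_with_retries, tasks))

for outcome, attempts in results:
    print(outcome, "after", attempts, "attempt(s)")
```

On a cluster or Grid the same idea applies, with the pool replaced by a scheduler and the retry possibly redirected to a different site.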
22Execution Environment
- Globus and Condor services for job scheduling
- Globus services for data transfer and cataloging
- Information services: information about data
location and about the execution sites
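To illustrate how information about data location and execution sites can feed a scheduling decision, here is a small sketch. The catalog contents, site names, and the `choose_site` function are all hypothetical; plain dictionaries stand in for the information services, and this is not the Globus or Condor API.

```python
# Hypothetical catalog: logical file name -> sites that hold a replica.
REPLICA_CATALOG = {
    "engine-run-001.dat": ["site-eu", "site-us"],
    "engine-run-002.dat": ["site-us"],
}

# Hypothetical site information: free CPU slots at each execution site.
SITE_INFO = {
    "site-eu": {"free_slots": 4},
    "site-us": {"free_slots": 10},
    "site-asia": {"free_slots": 50},
}


def choose_site(logical_name, catalog=REPLICA_CATALOG, sites=SITE_INFO):
    """Pick the site that already holds the data and has the most free slots."""
    candidates = [
        site
        for site in catalog.get(logical_name, [])
        if sites.get(site, {}).get("free_slots", 0) > 0
    ]
    if not candidates:
        return None  # the caller would have to stage the data somewhere first
    return max(candidates, key=lambda site: sites[site]["free_slots"])


print(choose_site("engine-run-001.dat"))
print(choose_site("missing-file.dat"))
```

The sketch deliberately ignores `site-asia` for the first file even though it has the most free slots, because moving the computation to the data is assumed to be cheaper than moving the data.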
23The Grid Problem
- Everyday researchers doing everyday research
- BUT heroic Grid infrastructure is not being
adopted
- A data-centric perspective, like researchers
- BUT Grid gives APIs to computation, not data
- Collaborative and participatory
- BUT Grid has a deeply rooted service-provider
mindset
- Better, not perfect
- BUT Grid aims to provide a well-engineered,
perfect solution
- Giving autonomy to researchers
- BUT Grid imposes institutional control (at this
time)
- About pervasive computing
- BUT Grid is about portals, not the next
generation of users
24Summary
- e-Science is about doing new science
- Grid is just one part of the solution
- Users are not just consumers of infrastructure.
Empower them.
- Think Web 2.0 on top of Grid and other services
- Workflows make e-Science easier, and Web 2.0
makes workflows easier.
25Diagnosis and prognosis system
- Computer-based fault diagnosis and prognosis
(DP)
- Arises in many domains: medicine, engineering,
transport, and aerospace
26Operational Scenario
Engine flight data
London Airport
Airline office
New York Airport
Grid
Diagnostics Centre
Maintenance Centre
American data center
European data center
27Diagnosis and prognosis (DP) System
- Data-centric
- Require complex interactions among agents
- Distributed
- Need to provide supporting and qualifying
evidence for the DP offered
- Safety- and business-critical, with high
dependability requirements
28Data Centricity
- Integrating data from several different systems
for root cause determination
- Require vast data repositories
- The types of data can also be highly diverse
- Not only sensor data but also non-declarative
knowledge
- The interpretation of the knowledge can vary
among the entities
29Data Centricity
- Grid computing
- Knowledge and semantics (chapter 23)
- Solutions for the management and archiving of
large data repositories
- Remote collection and distribution of data
- Coherent integration of information from diverse
databases (chapter 22)
30Multiple stakeholders
- Involve a number of stakeholders
- The system owner
- Experts
- The commercial service provider
- ...
- Grid computing
- Interaction of diverse parts is inherent within
the Grid computing model
31Distribution
- Data storage, data mining, and fault diagnosis
may take place at different locations
- Across diverse IT systems
- The system can also be highly dynamic, involving
a number of disparate entities (virtual, changing
often)
32Distribution
- Grid computing
- The standardization of communication and
application protocols in the Grid paradigm
- Grid portals support effective interactions with
users
33Data Provenance
- Transparency and trust in results
- Steps to arrive at a decision
- Grid computing
- Develop open data communication protocols
- Meta-labeling schemes
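One possible way to label each step on the path to a decision is a provenance log in which every entry carries metadata about its inputs and output and is hash-linked to the previous entry. The sketch below is an illustration under that assumption, not a description of how any Grid toolkit or the DAME project records provenance; the step names and values are hypothetical.

```python
import hashlib
import json


def record_step(log, step, inputs, output):
    """Append a provenance entry that is hash-linked to the previous one."""
    previous = log[-1]["digest"] if log else ""
    entry = {"step": step, "inputs": inputs, "output": output, "previous": previous}
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["digest"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify(log):
    """Recompute every digest and check the links between entries."""
    previous = ""
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "digest"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        expected = hashlib.sha256(payload).hexdigest()
        if entry["previous"] != previous or entry["digest"] != expected:
            return False
        previous = entry["digest"]
    return True


log = []
record_step(log, "load", ["engine-run-001.dat"], "raw_signal")
record_step(log, "detect_anomaly", ["raw_signal"], "anomaly_at_t=42")
record_step(log, "diagnose", ["anomaly_at_t=42"], "suspected bearing wear")
print(verify(log))
```

Because each digest covers the previous digest, changing any earlier step invalidates the chain, which lets a reviewer check that the recorded steps are the ones that actually led to the result.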
34Dependability
- Guaranteed service availability
- Data security
- System security
35Dependability
- Grid computing
- Offer a security model to secure distributed
computing (chapter 21)
- Address data access and data confidentiality
- The concept of guaranteed service and
quality-of-service (chapter 18)
36The aero-engine DP problem
- Modern aero-engines must operate with extremely
high reliability
- Combine advanced mechanical engineering systems
with electronic control systems
- Use engine sensors
- Prognostic applications
37DAME project
Engine flight data
London Airport
Airline office
New York Airport
Grid
Diagnostics Centre
Maintenance Centre
American data center
European data center
38DAME project
- Principal challenges
- Vast data repositories
- Advanced pattern-matching and data-mining
methods with suitable response times
- Collaboration among a number of diverse actors
39DAME service
(Figure: DAME service architecture. Labels recoverable from the
diagram: DAME Diagnostics Portal; Case Based Reasoning; Modelling;
Decision Support; QUOTE; Grid Services Management; Simulation; Novel;
Data; Data-Mining; The Grid. Engine data channels: Vibration, Shaft
Speed, Fuel Flow.)
40Core services and tools
- Engine data service
- Data storage and mining service
- Engine modeling service
- Case-based reasoning support
- Maintenance interface service
41Engine data service
- Control the interaction between the QUOTE system
and its communication with the ground station
- Establish the link to the Grid data repositories
- Many replicas of this service exist, and they are
highly transient
42Data storage and mining service
- Consists of the AURA pattern-matching engine
system
- Use specialized methods to rapidly search both
raw and archived engine data
- Resemble a data-mining service
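To make the idea of searching engine data for a known signature concrete, the sketch below slides a query pattern over a time series and returns the closest window by Euclidean distance. This is a deliberately simple stand-in and not the AURA method; the vibration values and the spike pattern are invented for illustration.

```python
import math


def best_match(series, pattern):
    """Return (start_index, distance) of the window closest to pattern."""
    if not pattern or len(pattern) > len(series):
        raise ValueError("pattern must be non-empty and no longer than series")
    best_index, best_distance = 0, math.inf
    for start in range(len(series) - len(pattern) + 1):
        window = series[start:start + len(pattern)]
        distance = math.sqrt(sum((w - p) ** 2 for w, p in zip(window, pattern)))
        if distance < best_distance:
            best_index, best_distance = start, distance
    return best_index, best_distance


# Hypothetical vibration readings and a hypothetical fault signature.
vibration = [0.1, 0.1, 0.2, 0.9, 1.4, 0.8, 0.2, 0.1]
spike = [1.0, 1.5, 1.0]

index, distance = best_match(vibration, spike)
print(index, round(distance, 3))
```

A linear scan like this costs time proportional to the series length times the pattern length, which is why searching vast repositories with suitable response times calls for specialized methods.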
43Engine modeling service
- Infer the current state of the engine
- Perform model-based data fusion
44Case-based reasoning support
- Use case-based reasoning to improve the knowledge
base
- Capture fault DP methods in a procedural way
- Manage workflows associated with DP operations
- Build and maintain the DAME knowledge base
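The retrieve-and-retain cycle behind case-based reasoning can be sketched in a few lines. The example below assumes cases are sets of symptom labels compared by Jaccard similarity; the cases, symptom names, and diagnoses are hypothetical and are not taken from the DAME knowledge base.

```python
def similarity(a, b):
    """Jaccard similarity: shared symptoms divided by all symptoms seen."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve(case_base, symptoms):
    """Return the stored case most similar to the observed symptoms."""
    return max(case_base, key=lambda case: similarity(case["symptoms"], symptoms))


def retain(case_base, symptoms, diagnosis):
    """Add a confirmed case so that later retrievals can reuse it."""
    case_base.append({"symptoms": set(symptoms), "diagnosis": diagnosis})


# Hypothetical cases for illustration only.
case_base = [
    {"symptoms": {"high_vibration", "shaft_speed_drop"}, "diagnosis": "bearing wear"},
    {"symptoms": {"high_fuel_flow", "high_temperature"}, "diagnosis": "combustor fault"},
]

observed = {"high_vibration", "shaft_speed_drop", "noise"}
match = retrieve(case_base, observed)
print(match["diagnosis"])

# Once an engineer confirms the diagnosis, the new case joins the knowledge base.
retain(case_base, observed, match["diagnosis"])
```

Retaining confirmed cases is what lets the knowledge base improve over time: the next engine showing the same three symptoms would match the new case exactly.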
45Maintenance interface service
- Organize all interaction with stakeholders
involved in taking remedial actions
- Capture information that helps validate or refine
the output from the preceding DP processes
48Conclusion
- Ambitious vision for the future of science and
engineering
- The realization of this vision will require
long-term investments of financial resources
- The difficulty of the technical challenges to be
overcome before the vision is realized should not
be underestimated
- The realization of these goals is extremely
important for the future of science and
engineering
49Q & A
50References
- I. Foster and C. Kesselman, The Grid 2:
Blueprint for a New Computing Infrastructure,
2nd ed. Morgan Kaufmann Publishers, 2004.
- Cyberinfrastructure Vision for 21st Century
Discovery (NSF)
- National e-Science Centre:
http://www.nesc.ac.uk/action/esi/
- DAME homepage: http://www.cs.york.ac.uk/dame/