Title: Grid and e-Science Technologies
1 Grid and e-Science Technologies
- Simon Cox, Technical Director, Southampton Regional e-Science Centre
2 Summary
- The Grid problem: resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
- Grid architecture: protocol and service definition for interoperability and resource sharing
- Grid middleware
  - Globus Toolkit: a source of protocol and API definitions and reference implementations
  - Open Grid Services Architecture: represents the next step in evolution
  - Condor: high-throughput computing
- Web Services: W3C, leveraging e-business
- e-Science projects: applying Grid concepts to applications
3 Grid Computing
4 The Grid Problem
- "Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources"
  - From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" by Foster, Kesselman and Tuecke
- Enable communities (virtual organizations) to share geographically distributed resources as they pursue common goals, assuming the absence of:
  - central location
  - central control
  - omniscience
  - existing trust
5 Why Grids? (1) e-Science
- A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- 1,000 physicists worldwide pool resources for peta-op analyses of petabytes of data
- Civil engineers collaborate to design, execute, and analyze shake table experiments
- Climate scientists visualize, annotate, and analyze terabyte simulation datasets
- An emergency response team couples real-time data, weather model, and population data
6 Grid Communities and Applications: Data Grids for High Energy Physics
www.griphyn.org www.ppdg.net www.eu-datagrid.org
7 Network for Earthquake Engineering Simulation
- NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, and each other
- On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
8 Online Access to Scientific Instruments
[Figure labels: Advanced Photon Source; wide-area dissemination; desktop and VR clients with shared controls; real-time collection; archival storage; tomographic reconstruction]
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
9 Why Grids? (2) e-Business
- Engineers at a multinational company collaborate on the design of a new product
- A multidisciplinary analysis in aerospace couples code and data in four companies
- An insurance company mines data from partner hospitals for fraud detection
- An application service provider offloads excess load to a compute cycle provider
- An enterprise configures internal and external resources to support e-Business workload
11 Grids: Why Now?
- Moore's law ⇒ highly functional end-systems
- Ubiquitous Internet ⇒ universal connectivity
- Network exponentials produce dramatic changes in geometry and geography
  - 9-month doubling: double Moore's law!
  - 1986-2001: x340,000; 2001-2010: x4000?
- New modes of working and problem solving emphasize teamwork, computation
- New business models and technologies facilitate outsourcing
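The network growth figures on this slide can be checked with a little arithmetic. The sketch below is illustrative only: it assumes an 18-month doubling period for Moore's law (implied by the slide's remark that a 9-month doubling is double Moore's law), and the function names are hypothetical.

```python
import math


def growth_factor(years: float, doubling_months: float) -> float:
    """Growth multiple after `years` when capacity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


def implied_doubling_months(years: float, factor: float) -> float:
    """Doubling period (in months) that would produce `factor` over `years`."""
    return years * 12 / math.log2(factor)


# 2001-2010 is 9 years: 12 network doublings at 9 months each.
network_2001_2010 = growth_factor(9, 9)   # 2**12 = 4096, close to the slide's "x4000"

# Assumption: Moore's law taken as an 18-month doubling period.
moore_2001_2010 = growth_factor(9, 18)    # 2**6 = 64

# The 1986-2001 figure (x340,000 over 15 years) implies a doubling period
# of a little under 10 months.
historic_doubling = implied_doubling_months(15, 340_000)

print(network_2001_2010, moore_2001_2010, round(historic_doubling, 1))
```

Under these assumptions, network capacity grows by three orders of magnitude over a period in which end-system performance grows by less than two, which is the gap the slide is pointing at.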
12 Elements of the Problem
- Resource sharing
  - Computers, storage, sensors, networks, ...
  - Heterogeneity of device, mechanism, policy
  - Sharing conditional: negotiation, payment, ...
- Coordinated problem solving
  - Integration of distributed resources
  - Compound quality of service requirements
- Dynamic, multi-institutional virtual organisations
  - Dynamic overlays on classic org structures
  - Map to underlying control mechanisms
http://www.globus.org/research/papers/anatomy.pdf
13 The Grid World: Current Status
- Dozens of major Grid projects in scientific and technical computing/research and education
  - Deployment, application, technology
- Some consensus on key concepts and technologies
  - Open source Globus Toolkit: a de facto standard for major protocols and services
  - Far from complete or perfect, but out there, evolving rapidly, and large tool/user base
- Global Grid Forum: a significant force
- Industrial interest emerging rapidly
http://www.gridforum.org
14 Grid Middleware
- (coordinate and authenticate use of grid services)
- Globus (and GGF grid-computing protocols)
  - Security Infrastructure (GSI)
  - Resource Allocation Mechanism (GRAM)
  - Resource Information System (GRIS)
  - Index Information Service (GIIS)
  - Grid-FTP
  - Metadirectory service (MDS 2.0) coupled to LDAP server
- Condor (distributed high-throughput computing system)
  - Condor-G allows us to handle dispatching jobs to our Globus system
  - Active collaboration with the Condor development team at University of Wisconsin (Miron Livny)
15 The Globus Project: Making Grid Computing a Reality
- Close collaboration with real Grid projects in science and industry
- Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
- Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
- The Globus Toolkit: open source, reference software base for building grid infrastructure and applications
- Global Grid Forum: development of standard protocols and APIs for Grid computing
http://www.gridforum.org http://www.globus.org
16 Four Key Protocols
- The Globus Toolkit centers around four key protocols
- Connectivity layer
  - Security: Grid Security Infrastructure (GSI)
- Resource layer
  - Resource Management: Grid Resource Allocation Management (GRAM)
  - Information Services: Grid Resource Information Protocol (GRIP)
  - Data Transfer: Grid File Transfer Protocol (GridFTP)
17 The Globus Toolkit in One Slide
- Grid protocols (GSI, GRAM, ...) enable resource sharing within virtual orgs; toolkit provides reference implementation (Globus Toolkit services)
- Protocols (and APIs) enable other tools and services for membership, discovery, data mgmt, workflow, ...
18 Globus Toolkit Evaluation (+)
- Good technical solutions for key problems, e.g.
  - Authentication and authorization
  - Resource discovery and monitoring
  - Reliable remote service invocation
  - High-performance remote data access
- This good engineering is enabling progress
  - Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support
  - Growing community code base built on tools
19 Globus Toolkit Evaluation (-)
- Protocol deficiencies, e.g.
  - Heterogeneous basis: HTTP, LDAP, FTP
  - No standard means of invocation, notification, error propagation, authorization, termination, ...
- Significant missing functionality, e.g.
  - Databases, sensors, instruments, workflow, ...
  - Virtualization of end systems (hosting envs.)
- Little work on total system properties, e.g.
  - Dependability, end-to-end QoS, ...
  - Reasoning about system properties
21 What is Condor?
- Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing facility.
- Condor uses ClassAd Matchmaking to make sure that everyone is happy.
- Features
  - Unix and NT
  - Operational since 1986
  - Manages more than 1300 CPUs at UW-Madison
  - Software available free on the web
  - More than 150 Condor installations worldwide in academia and industry
  - Non-dedicated resources
  - Job checkpoint and migration
22 What is High-Throughput Computing?
- High-performance: CPU cycles/second under ideal circumstances.
  - "How fast can I run simulation X on this machine?"
- High-throughput: CPU cycles/day (week, month, year?) under non-ideal circumstances.
  - "How many times can I run simulation X in the next month using all available machines?"
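The difference between the two questions can be made concrete with a toy calculation. This is a minimal sketch under stated assumptions: the machine names, run times and availability figures are invented for illustration, and only whole runs are counted (checkpointing is not modelled).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Machine:
    name: str
    minutes_per_run: int      # time for one run of simulation X on this machine
    free_hours_per_day: int   # hours per day the owner makes the machine available


def runs_in_period(machines: List[Machine], days: int) -> int:
    """Count whole runs completed across all machines in `days` days."""
    total = 0
    for m in machines:
        free_minutes = m.free_hours_per_day * 60 * days
        total += free_minutes // m.minutes_per_run
    return total


# Hypothetical pool: one dedicated node plus two desktops free part of the day.
pool = [
    Machine("cluster-node", minutes_per_run=30, free_hours_per_day=24),
    Machine("desktop-a", minutes_per_run=120, free_hours_per_day=14),
    Machine("desktop-b", minutes_per_run=120, free_hours_per_day=12),
]

fastest_only = runs_in_period(pool[:1], days=30)   # high-performance view: best machine alone
whole_pool = runs_in_period(pool, days=30)         # high-throughput view: everything available
print(fastest_only, whole_pool)
```

Even slow, part-time desktops add to the monthly total, which is the quantity high-throughput computing tries to maximise.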
23 Some HTC Challenges
- Condor does whatever it takes to run your jobs, even if some machines
  - Crash (or are disconnected)
  - Run out of disk space
  - Don't have your software installed
  - Are frequently needed by others
  - Are far away and managed by someone else
24 What is ClassAd Matchmaking?
- Condor uses ClassAd Matchmaking to make sure that work gets done within the constraints of both users and owners.
- Users (jobs) have constraints
  - "I need an Alpha with 256 MB RAM"
- Owners (machines) have constraints
  - "Only run jobs when I am away from my desk and never run jobs owned by Bob."
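The two-sided nature of matchmaking can be sketched in a few lines. This is not Condor's ClassAd language or API: the `Ad` class, the attribute names (`Arch`, `MemoryMB`, `OwnerAtDesk`, `Owner`) and the helper functions are all hypothetical, chosen only to mirror the two example constraints on this slide.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Attrs = Dict[str, Any]


@dataclass
class Ad:
    """A toy advertisement: attributes plus a requirements predicate.

    The predicate receives (own attributes, other party's attributes).
    """
    attrs: Attrs
    requirements: Callable[[Attrs, Attrs], bool]


def is_match(job: Ad, machine: Ad) -> bool:
    """A match needs BOTH the job's and the machine's requirements to hold."""
    return (job.requirements(job.attrs, machine.attrs)
            and machine.requirements(machine.attrs, job.attrs))


def find_machine(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Return the first machine that matches the job, or None."""
    for machine in machines:
        if is_match(job, machine):
            return machine
    return None


def needs_alpha_256(me: Attrs, machine: Attrs) -> bool:
    # "I need an Alpha with 256 MB RAM"
    return machine["Arch"] == "ALPHA" and machine["MemoryMB"] >= 256


def owner_policy(me: Attrs, job: Attrs) -> bool:
    # "Only run jobs when I am away from my desk and never run jobs owned by Bob."
    return not me["OwnerAtDesk"] and job["Owner"] != "bob"


machines = [
    Ad({"Name": "intel1", "Arch": "INTEL", "MemoryMB": 512, "OwnerAtDesk": False}, owner_policy),
    Ad({"Name": "alpha1", "Arch": "ALPHA", "MemoryMB": 512, "OwnerAtDesk": True}, owner_policy),
    Ad({"Name": "alpha2", "Arch": "ALPHA", "MemoryMB": 256, "OwnerAtDesk": False}, owner_policy),
]

alice_match = find_machine(Ad({"Owner": "alice"}, needs_alpha_256), machines)
bob_match = find_machine(Ad({"Owner": "bob"}, needs_alpha_256), machines)
```

In this sketch Alice's job lands on the only idle Alpha, while Bob's job matches nothing because every owner policy excludes him. Neither side can force a placement the other has ruled out.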
25 Condor Pool Architecture
26 Mathematicians Solve NUG30
- Looking for the solution to the NUG30 quadratic assignment problem
- An informal collaboration of mathematicians and computer scientists
- Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)
14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23
MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
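To put the NUG30 figures in perspective, the delivered CPU time can be converted into an average number of busy processors. The sketch below assumes the "7 days" is exactly 7 x 24 hours of wall-clock time; the variable names are illustrative.

```python
cpu_seconds = 3.46e8              # CPU time delivered by Condor-G (from the slide)
wall_seconds = 7 * 24 * 3600      # assumption: exactly seven days of wall-clock time
peak_processors = 1009            # peak processor count (from the slide)

# Average number of processors kept busy over the whole run (roughly 572).
average_processors = cpu_seconds / wall_seconds

# Sustained usage as a fraction of the peak.
fraction_of_peak = average_processors / peak_processors

print(round(average_processors), round(fraction_of_peak, 2))
```

Under that assumption the run averaged well over half of its peak processor count for a full week, on machines spread across eight sites.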
27 What Is Condor-G?
- Enhanced version of Condor that provides robust job management for Globus Toolkit
  - Robust replacement for globusrun
  - Provides extensive fault-tolerance
  - Brings Condor's job management features to Globus jobs
- Two parts
  - Globus Universe
  - GlideIn
- Excellent example of applying the general purpose Globus Toolkit to solve a particular problem (i.e. high-throughput computing) on the Grid
28 Why Use Condor-G
- Condor
  - Designed to run jobs within a single administrative domain
- Globus Toolkit
  - Designed to run jobs across many administrative domains
- Condor-G
  - Combine the strengths of both
29 Web Services
- Increasingly popular standards-based framework for accessing network applications
  - W3C standardization; Microsoft, IBM, Sun, others
- XML and XML Schema
  - Representing data in a portable format
- WSDL: Web Services Description Language
  - Interface Definition Language for Web services
- SOAP: Simple Object Access Protocol
  - XML-based RPC protocol; common WSDL target
- WSIL / WS-Inspection
  - Conventions for locating service descriptions
- UDDI: Universal Description, Discovery, and Integration
  - Directory for Web services
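As a rough illustration of SOAP as an XML-based RPC protocol, the sketch below builds and parses a SOAP 1.1 envelope with Python's standard library. The envelope namespace is the SOAP 1.1 one; the application namespace `urn:example:simulation`, the `submitJob` operation and its parameters are hypothetical and do not describe any real service.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:simulation"                        # hypothetical service namespace


def build_request(operation: str, params: Dict[str, object]) -> bytes:
    """Wrap an RPC-style call in a SOAP Envelope/Body."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.split("}", 1)[1] if tag.startswith("{") else tag


def parse_request(message: bytes) -> Tuple[str, Dict[str, str]]:
    """Recover the operation name and parameters from a SOAP message."""
    root = ET.fromstring(message)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("message has no SOAP Body with a call element")
    call = body[0]
    params = {local_name(child.tag): (child.text or "") for child in call}
    return local_name(call.tag), params


message = build_request("submitJob", {"executable": "sim_x", "count": 3})
operation, params = parse_request(message)
```

In a real deployment the operation and parameter names would come from the service's WSDL description rather than being hard-coded as they are here.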
31 New Globus: Open Grid Services Architecture (OGSA)
- Service orientation to virtualize resources
- From Web services
  - Standard interface definition mechanisms: multiple protocol bindings, multiple implementations, local/remote transparency
- Building on Globus Toolkit
  - Grid service: semantics for service interactions
  - Management of transient instances (and state)
  - Factory, Registry, Discovery, other services
  - Reliable and secure transport
- Multiple hosting targets: J2EE, .NET, C, ...
http://www.globus.org/research/papers/ogsa.pdf http://www.globus.org/research/papers/gsspec.pdf
32 OGSA Service Model
- System comprises (typically a few) persistent services and (potentially many) transient services
- All services adhere to specified Grid service interfaces and behaviours
  - Reliable invocation, lifetime management, discovery, authorization, notification, upgradeability, concurrency, manageability
- Interfaces for managing Grid service instances
  - Factory, registry, discovery, lifetime, etc.
- => Reliable, secure management of distributed state
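The interplay of factory, registry and lifetime management can be sketched as follows. This is a toy model, not the OGSA specification or any Globus API: the class names, method names and handle format are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ServiceInstance:
    """A transient service instance with a handle, a lifetime and some state."""
    handle: str
    expires_at: float
    state: dict = field(default_factory=dict)


class Registry:
    """Persistent service: maps handles to live transient instances."""

    def __init__(self) -> None:
        self._instances: Dict[str, ServiceInstance] = {}

    def register(self, instance: ServiceInstance) -> None:
        self._instances[instance.handle] = instance

    def lookup(self, handle: str) -> Optional[ServiceInstance]:
        return self._instances.get(handle)

    def sweep(self, now: float) -> List[str]:
        """Remove instances whose lifetime has lapsed; return their handles."""
        expired = [h for h, inst in self._instances.items() if inst.expires_at <= now]
        for handle in expired:
            del self._instances[handle]
        return expired


class Factory:
    """Persistent service: creates transient instances with a bounded lifetime."""

    def __init__(self, name: str, registry: Registry) -> None:
        self._name = name
        self._registry = registry
        self._ids = itertools.count(1)

    def create(self, now: float, lifetime: float) -> ServiceInstance:
        handle = f"{self._name}/{next(self._ids)}"
        instance = ServiceInstance(handle, expires_at=now + lifetime)
        self._registry.register(instance)
        return instance


def extend_lifetime(instance: ServiceInstance, now: float, lifetime: float) -> None:
    """Client keep-alive: push the expiry forward, never backward."""
    instance.expires_at = max(instance.expires_at, now + lifetime)


registry = Registry()
factory = Factory("simulation", registry)
job_a = factory.create(now=0, lifetime=10)
job_b = factory.create(now=0, lifetime=10)
extend_lifetime(job_b, now=8, lifetime=10)   # only job_b is kept alive
removed = registry.sweep(now=12)             # job_a has lapsed and is reclaimed
```

Expiring instances that nobody renews is one simple way to keep distributed state manageable when clients disappear without cleaning up.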
33 Using OGSA to Construct Grid Environments
In each case, Registry handle is effectively the unique name for the virtual organization.
34 Evolution of Globus
- Initial exploration (1996-1999; Globus 1.0)
  - Extensive application experiments; core protocols
- Data Grids (1999-??; Globus 2.0)
  - Large-scale data management and analysis
- Open Grid Services Architecture (2001-??; Globus 3.0)
  - Integration with Web services, hosting environments, resource virtualization
  - Databases, higher-level services
- Radically scalable systems (2003-??)
  - Sensors, wireless, ubiquitous computing
35 Summary
- The Grid problem: resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
- Grid architecture: protocol and service definition for interoperability and resource sharing
- Grid middleware
  - Globus Toolkit: a source of protocol and API definitions and reference implementations
  - Open Grid Services Architecture: represents the next step in evolution
  - Condor: high-throughput computing
- Web Services: W3C, leveraging e-business
- e-Science projects: applying Grid concepts to applications