Title: The Anatomy of the Grid: Enabling Scalable Virtual Organizations
1 The Anatomy of the Grid: Enabling Scalable Virtual Organizations
- Ian Foster
- Mathematics and Computer Science Division
- Argonne National Laboratory
- and
- Department of Computer Science
- The University of Chicago
- http://www.mcs.anl.gov/foster
2 Grids are hot
Computational, Data, Information, Access, Knowledge
APGrid
TeraGrid
but what are they really about?
3 Issues I Propose to Address
- Problem statement
- Architecture
- Globus Toolkit
- Futures
4 The Grid Problem
- Resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
5 Elements of the Problem
- Resource sharing
  - Computers, storage, sensors, networks, ...
  - Sharing always conditional: issues of trust, policy, negotiation, payment, ...
- Coordinated problem solving
  - Beyond client-server: distributed data analysis, computation, collaboration, ...
- Dynamic, multi-institutional virtual orgs
  - Community overlays on classic org structures
  - Large or small, static or dynamic
6 Grid Communities and Applications: Data Grids for High Energy Physics
Image courtesy Harvey Newman, Caltech
7 Grid Communities and Applications: Network for Earthquake Eng. Simulation
- NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, and each other
- On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
8 Grid Communities and Applications: Mathematicians Solve NUG30
- Community: an informal collaboration of mathematicians and computer scientists
- Condor-G delivers 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)
- Solves NUG30 quadratic assignment problem
14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23
MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
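As a rough sanity check on the NUG30 figures, the sketch below converts the reported CPU time into an average processor count and confirms that the listed assignment is a permutation of 1..30. The variable names are illustrative and not from the talk; the only inputs are the numbers quoted on the slide.

```python
# Figures quoted on the slide.
cpu_seconds = 3.46e8
wall_days = 7
peak_processors = 1009

# Average number of processors kept busy over the whole run.
wall_seconds = wall_days * 24 * 60 * 60
avg_processors = cpu_seconds / wall_seconds

# The reported NUG30 solution, as listed on the slide.
assignment = [14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25, 22,
              13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23]

# A quadratic assignment solution must use each of the 30 locations once.
is_permutation = sorted(assignment) == list(range(1, 31))

print(f"average processors: {avg_processors:.0f} (peak {peak_processors})")
print(f"valid permutation of 1..30: {is_permutation}")
```

Under these numbers the run averaged roughly 572 busy processors, a little over half of the reported peak.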
9 Grid Communities and Applications: Home Computers Evaluate AIDS Drugs
- Community
  - 1000s of home computer users
  - Philanthropic computing vendor (Entropia)
  - Research group (Scripps)
- Common goal: advance AIDS research
10 Grid Architecture
11 Why Discuss Architecture?
- Descriptive
  - Provide a common vocabulary for use when describing Grid systems
- Guidance
  - Identify key areas in which services are required
- Prescriptive
  - Define standard Intergrid protocols and APIs to facilitate creation of interoperable Grid systems and portable applications
12 What Sorts of Standards?
- Need for interoperability when different groups want to share resources
  - E.g., IP lets me talk to your computer, but how do we establish and maintain sharing?
  - How do I discover, authenticate, authorize, describe what I want to do, etc., etc.?
- Need for shared infrastructure services to avoid repeated development and installation, e.g.
  - One port/service for remote access to computing, not one per tool/application
  - X.509 enables sharing of Certificate Authorities
13 So, in Defining Grid Architecture, We Must Address
- Development of Grid protocols and services
  - Protocol-mediated access to remote resources
  - New services: e.g., resource brokering
  - "On the Grid" = speak Intergrid protocols
  - Mostly (extensions to) existing protocols
- Development of Grid APIs and SDKs
  - Facilitate application development by supplying higher-level abstractions
- The (hugely successful) model is the Internet
- The Grid is not a distributed OS!
14 The Role of Grid Services (aka Middleware) and Tools
15 Layered Grid Architecture (By Analogy to Internet Architecture)
16 Protocols, Services, and Interfaces Occur at Each Level
Applications
Languages/Frameworks
Collective Service APIs and SDKs
Collective Service Protocols
Collective Services
Resource APIs and SDKs
Resource Service Protocols
Resource Services
Connectivity APIs
Connectivity Protocols
Local Access APIs and Protocols
Fabric Layer
17 Where Are We With Architecture?
- No official standards exist
  - Nor is it clear what this would mean
- But
  - Globus Toolkit has emerged as the de facto standard for several important Connectivity, Resource, and Collective protocols
  - GGF has an architecture working group
  - Technical specifications are being developed for architecture elements: e.g., security, data, resource management, information
18 The Globus Toolkit
19 Grid Services Architecture (1): Fabric Layer
- Just what you would expect: the diverse mix of resources that may be shared
  - Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc.
- Few constraints on low-level technology: connectivity and resource level protocols form the "neck in the hourglass"
- Globus toolkit provides a few selected components (e.g., bandwidth broker)
20 Grid Services Architecture (2): Connectivity Layer Protocols and Services
- Communication
  - Internet protocols: IP, DNS, routing, etc.
- Security: Grid Security Infrastructure (GSI)
  - Uniform authentication and authorization mechanisms in multi-institutional setting
  - Single sign-on, delegation, identity mapping
  - Public key technology, SSL, X.509, GSS-API
  - Supporting infrastructure: Certificate Authorities, key management, etc.
21 [Figure: GSI credentials in use across sites. Recoverable labels:]
- Single sign-on via "grid-id"
- Assignment of credentials to "user proxies"
- Mutual user-resource authentication
- Authenticated interprocess communication
- Authorization
- Mapping to local ids
[Other labels in the figure: Globus Credential, Certificate, Site 2]
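The "mapping to local ids" step can be pictured as a per-site lookup from a global Grid identity to a local account. The sketch below is a simplified illustration, not GSI code: the subject string, the account names, and the `map_to_local_id` helper are all hypothetical, and authentication is assumed to have already succeeded.

```python
# Hypothetical per-site tables: global certificate subject -> local account.
SITE_MAPS = {
    "site1": {"/O=Grid/CN=Alice Example": "alice"},
    "site2": {"/O=Grid/CN=Alice Example": "u4711"},
}

def map_to_local_id(site, subject):
    """Return the local account for an authenticated subject, or None.

    Proving that the caller really holds the credential is assumed to
    have happened already; this models only the mapping step.
    """
    return SITE_MAPS.get(site, {}).get(subject)

# One global identity, a different local account at each site.
subject = "/O=Grid/CN=Alice Example"
local_ids = {site: map_to_local_id(site, subject) for site in SITE_MAPS}
print(local_ids)
```

The point of the design is that the user signs on once with a single identity, while each site keeps control over which local account (if any) that identity maps to.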
22 GSI Futures
- Scalability in numbers of users and resources
  - Credential management
  - Online credential repositories (MyProxy)
  - Account management
- Authorization
  - Policy languages
  - Community authorization
- Protection against compromised resources
  - Restricted delegation, smartcards
23 GSI Futures: Community Authorization
1. CAS request, with resource names and operations (User to CAS)
   - CAS asks: does the collective policy authorize this request for this user?
   - CAS holds: user/group membership, resource/collective membership, collective policy information
2. CAS reply, with capability and resource CA info (CAS to User)
3. Resource request, authenticated with capability (User to Resource)
   - Resource asks: is this request authorized by the capability? Is this request authorized for the CAS?
   - Resource holds: local policy information
4. Resource reply (Resource to User)
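The four-step exchange can be sketched as a toy model in which the CAS applies the collective policy and the resource then applies its own local policy. This is an illustration under stated assumptions, not the CAS design: the `Capability`, `CAS`, and `Resource` classes and all names in the example are hypothetical, and real capabilities would be cryptographically signed, which is omitted here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """Hypothetical capability: what the CAS vouches for."""
    issuer: str
    user: str
    resource: str
    operation: str

class CAS:
    """Community Authorization Service holding the collective policy."""
    def __init__(self, name, members, policy):
        self.name = name
        self.members = members    # users in the community
        self.policy = policy      # set of (resource, operation) pairs

    def request(self, user, resource, operation):
        # Steps 1-2: does the collective policy authorize this request
        # for this user? If so, reply with a capability.
        if user in self.members and (resource, operation) in self.policy:
            return Capability(self.name, user, resource, operation)
        return None

class Resource:
    """A resource that keeps its own local policy."""
    def __init__(self, name, trusted_cas, local_policy):
        self.name = name
        self.trusted_cas = trusted_cas      # CAS names this site accepts
        self.local_policy = local_policy    # CAS name -> allowed operations

    def handle(self, cap, operation):
        # Step 3: is the request authorized by the capability?
        by_capability = (cap is not None and cap.resource == self.name
                         and cap.operation == operation)
        # Step 3: is the request authorized for the CAS under local policy?
        for_cas = (cap is not None and cap.issuer in self.trusted_cas
                   and operation in self.local_policy.get(cap.issuer, set()))
        # Step 4: resource reply.
        return "granted" if by_capability and for_cas else "denied"

cas = CAS("physics-cas", {"alice"},
          {("storage1", "read"), ("storage1", "write")})
storage = Resource("storage1", {"physics-cas"}, {"physics-cas": {"read"}})

read_reply = storage.handle(cas.request("alice", "storage1", "read"), "read")
write_reply = storage.handle(cas.request("alice", "storage1", "write"), "write")
outsider_reply = storage.handle(cas.request("bob", "storage1", "read"), "read")
print(read_reply, write_reply, outsider_reply)
```

In this example the community policy allows writes but the site's local policy does not, so the write is refused: the resource owner keeps the final say even though the community manages membership.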
24 Grid Services Architecture (3): Resource Layer Protocols and Services
- Resource management: GRAM
  - Remote allocation, reservation, monitoring, control of compute resources
- Data access: GridFTP
  - High-performance data access and transport
- Information: MDS (GRRP, GRIP)
  - Access to structure and state information
- Others emerging: catalog access, code repository access, accounting, ...
- All integrated with GSI
25 GRAM Resource Management Protocol
- Grid Resource Allocation and Management
  - Allocation, monitoring, control of computations
- Simple HTTP-based RPC
  - Job request
    - Returns a "job contact": opaque string that can be passed between clients, for access to job
  - Job cancel, job status, job signal
  - Event notification (callbacks) for state changes
    - Pending, active, done, failed, suspended
- Servers for most schedulers; C and Java APIs
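The job states and callback notifications can be illustrated with a small state machine. This is a sketch, not the GRAM API: the state names come from the slide, but the allowed transitions, the `Job` class, and the contact string are assumptions made for the example.

```python
# Job states named on the slide; the allowed transitions are an assumption.
TRANSITIONS = {
    "pending": {"active", "failed"},
    "active": {"suspended", "done", "failed"},
    "suspended": {"active", "failed"},
    "done": set(),
    "failed": set(),
}

class Job:
    """Toy job record that notifies callbacks on every state change."""
    def __init__(self, contact):
        self.contact = contact      # opaque string identifying the job
        self.state = "pending"
        self.callbacks = []

    def on_state_change(self, callback):
        self.callbacks.append(callback)

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        old_state, self.state = self.state, new_state
        for callback in self.callbacks:
            callback(self.contact, old_state, new_state)

events = []
job = Job("job-contact-0001")   # hypothetical contact string
job.on_state_change(lambda contact, old, new: events.append((old, new)))
for state in ("active", "suspended", "active", "done"):
    job.move_to(state)
print(events)
```

Because the contact is just a string, any client that holds it could register a callback or query status, which is the property the slide highlights.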
26 Resource Management Futures
- GRAM-2 protocol (ETA late 2001)
  - Advance reservations and multiple resource types
  - Recoverable requests, timeout, etc.
  - Use of SOAP (RPC using HTTP + XML)
  - Policy evaluation points for restricted proxies
27 Data Access and Transfer
- GridFTP: extended version of popular FTP protocol for Grid data access and transfer
- Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.
  - Third-party data transfers, partial file transfers
  - Parallelism, striping (e.g., on PVFS)
  - Reliable, recoverable data transfers
- Reference implementations
  - Existing clients and servers: wuftpd, nicftp
  - Flexible, extensible libraries
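Partial file transfers and parallelism both come down to dividing a byte range among data streams. The sketch below shows only that bookkeeping; it is not the GridFTP wire protocol, and the `split_byte_range` helper and its parameters are hypothetical.

```python
def split_byte_range(offset, length, streams):
    """Split [offset, offset + length) into contiguous per-stream chunks.

    Returns a list of (start, size) pairs. The first chunks absorb any
    remainder, so chunk sizes differ by at most one byte.
    """
    if length < 0 or streams < 1:
        raise ValueError("length must be >= 0 and streams >= 1")
    base, extra = divmod(length, streams)
    chunks = []
    start = offset
    for i in range(streams):
        size = base + (1 if i < extra else 0)
        if size:
            chunks.append((start, size))
        start += size
    return chunks

# A 10-byte partial transfer starting at byte 1000, over 4 streams.
chunks = split_byte_range(offset=1000, length=10, streams=4)
print(chunks)
```

Each chunk can then be fetched independently, and a failed chunk can be retried on its own, which is one way to read "reliable, recoverable data transfers".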
28 Grid Services Architecture (4): Collective Layer Protocols and Services
- Index servers aka metadirectory services
  - Custom views on dynamic resource collections assembled by a community
- Resource brokers (e.g., Condor Matchmaker)
  - Resource discovery and allocation
- Replica management and replica selection
  - Optimize aggregate data access performance
- Co-reservation and co-allocation services
  - End-to-end performance
- Etc., etc.
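A resource broker's discovery and allocation steps can be sketched as filtering advertised resources by a requirement and then ranking the survivors. This is loosely inspired by matchmaking in general and is not the Condor Matchmaker interface: the advertisement fields, the `match` helper, and the request format are hypothetical.

```python
# Hypothetical resource advertisements, as a broker might collect them.
resources = [
    {"name": "clusterA", "cpus": 64, "memory_gb": 128, "load": 0.7},
    {"name": "clusterB", "cpus": 16, "memory_gb": 32, "load": 0.1},
    {"name": "clusterC", "cpus": 128, "memory_gb": 256, "load": 0.4},
]

def match(request, ads):
    """Return the best ad that satisfies the request, or None."""
    candidates = [ad for ad in ads if request["requirements"](ad)]
    if not candidates:
        return None
    return max(candidates, key=request["rank"])

request = {
    # Discovery: which resources are acceptable at all?
    "requirements": lambda ad: ad["cpus"] >= 32 and ad["memory_gb"] >= 64,
    # Allocation choice: prefer the least loaded acceptable resource.
    "rank": lambda ad: -ad["load"],
}

chosen = match(request, resources)
print(chosen["name"])
```

Separating "requirements" from "rank" keeps the hard constraints apart from the preference, so a community can change how it ranks resources without touching what counts as usable.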
29 The Grid Information Problem
- Large numbers of distributed "sensors" with different properties
- Need for different "views" of this information, depending on community membership, security constraints, intended purpose, sensor type
30 The Globus Toolkit Solution: MDS-2
- Registration and enquiry protocols, information models, query languages
  - Provides standard interfaces to sensors
  - Supports different "directory" structures supporting various discovery/access strategies
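The split between registration and enquiry can be illustrated with a toy directory in which sensors register themselves and clients query for the view they need. This is not MDS-2 code: the `Directory` class, the attribute names, and the idea that registrations lapse after a time-to-live are assumptions made for the sketch.

```python
class Directory:
    """Toy index: providers register with a lifetime, clients query."""
    def __init__(self):
        self.entries = {}   # name -> (attributes, expiry time)

    def register(self, name, attributes, now, ttl):
        # Registration: lapses at now + ttl unless it is refreshed.
        self.entries[name] = (dict(attributes), now + ttl)

    def query(self, now, **wanted):
        # Enquiry: return live entries whose attributes match the filter.
        return sorted(
            name for name, (attrs, expiry) in self.entries.items()
            if expiry > now
            and all(attrs.get(k) == v for k, v in wanted.items())
        )

index = Directory()
index.register("cpu-sensor@site1", {"type": "cpu", "site": "site1"}, now=0, ttl=60)
index.register("net-sensor@site1", {"type": "network", "site": "site1"}, now=0, ttl=10)
index.register("cpu-sensor@site2", {"type": "cpu", "site": "site2"}, now=0, ttl=60)

cpu_view = index.query(now=5, type="cpu")        # view by sensor type
site1_early = index.query(now=5, site="site1")   # view by site
site1_late = index.query(now=30, site="site1")   # after one entry lapsed
print(cpu_view, site1_early, site1_late)
```

Different communities could run their own directory with their own filters, which is one way to obtain the differing "views" the information problem calls for.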
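31 Resource Management Architecture
[Figure: the Application's RSL undergoes RSL specialization, using Queries and Info exchanged with the Information Service, to yield ground RSL; DUROC and MPICH-G2 turn this into simple ground RSL for individual GRAM instances, which sit in front of local resource managers (LSF, Condor, NQE).]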
32 Data Grid Architecture (See talk by Sudharshan Vazhkudai)
[Figure: the Application sends an attribute specification to the Metadata Catalog and obtains a logical collection and logical file name; the Replica Catalog resolves these to multiple locations; Replica Selection, using performance information and predictions from MDS and NWS, picks the selected replica; GridFTP commands then fetch it from one of Replica Locations 1-3 (disk caches, tape library, disk array).]
- Virtual data: transparency wrt location and materialization (www.griphyn.org)
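Replica selection can be sketched as a lookup from a logical file name to its physical copies, followed by a choice based on predicted performance. Everything concrete here is hypothetical: the catalog contents, the `lfn://` naming, the host names, and the bandwidth predictions stand in for what a replica catalog and a forecasting service would supply.

```python
# Hypothetical replica catalog: logical file name -> physical locations.
replica_catalog = {
    "lfn://run42/events.dat": [
        "gsiftp://site1.example.org/disk/events.dat",
        "gsiftp://site2.example.org/tape/events.dat",
        "gsiftp://site3.example.org/array/events.dat",
    ],
}

# Hypothetical predicted bandwidth to each host, in MB/s.
predicted_mb_per_s = {
    "site1.example.org": 12.0,
    "site2.example.org": 1.5,
    "site3.example.org": 40.0,
}

def host_of(url):
    """Extract the host part of a scheme://host/path URL."""
    return url.split("/")[2]

def select_replica(lfn, size_mb):
    """Pick the replica with the shortest predicted transfer time."""
    locations = replica_catalog[lfn]
    best = min(locations,
               key=lambda url: size_mb / predicted_mb_per_s[host_of(url)])
    return best, size_mb / predicted_mb_per_s[host_of(best)]

replica, seconds = select_replica("lfn://run42/events.dat", size_mb=2000)
print(replica, seconds)
```

The application only ever names the logical file; which physical copy is used is decided at access time, which is the location transparency the slide refers to.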
33 Grid Futures
34 Large Grid Projects Are in Place
- DOE ASCI DISCOM
- DOE Particle Physics Data Grid
- DOE Earth Systems Grid
- DOE Science Grid
- DOE Fusion Collaboratory
- European Data Grid
- Egrid (see talk by G. Allen et al.)
- NASA Information Power Grid
- NSF National Technology Grid
- NSF Network for Earthquake Eng Simulation
- NSF Grid Application Development Software
- NSF Grid Physics Network
35 Problem Evolution
- Past-present: O(10^2) high-end systems; Mb/s networks; centralized (or entirely local) control
  - I-WAY (1995): 17 sites, week-long; 155 Mb/s
  - GUSTO (1998): 80 sites, long-term experiment
  - NASA IPG, NSF NTG: O(10) sites, production
- Present: O(10^4-10^6) data systems, computers; Gb/s networks; scaling, decentralized control
  - Scalable resource discovery; restricted delegation; community policy
  - GriPhyN Data Grid: 100s of sites, O(10^4) computers; complex policies
- Future: O(10^6-10^9) data, sensors, computers; Tb/s networks; highly flexible policy, control
36 The Future: All Software is Network-Centric
- We don't build or buy "computers" anymore, we borrow or lease required resources
  - When I walk into a room, need to solve a problem, need to communicate
- A "computer" is a dynamically, often collaboratively constructed collection of processors, data sources, sensors, networks
  - Similar observations apply for software
37 And Thus
- Reduced barriers to access mean that we do much more computing, and more interesting computing, than today => Many more components (and services); massive parallelism
- All resources are owned by others => Sharing (for fun or profit) is fundamental; trust, policy, negotiation, payment
- All computing is performed on unfamiliar systems => Dynamic behaviors, discovery, adaptivity, failure
38 Summary
- The Grid problem: Resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
- Grid architecture: Emphasize protocol and service definition to enable interoperability and resource sharing
- Globus Toolkit as a source of protocol and API definitions, reference implementations
- For more info: www.globus.org, www.griphyn.org, www.gridforum.org