Title: Grid Computing with the Globus Toolkit 2.2
1 Grid Computing with the Globus Toolkit 2.2
- Jennifer Schopf
- for the Globus Project
- Argonne National Laboratory
- USC Information Sciences Institute
- www.globus.org
2 Overview
- Introduction to Grids
- Why Grids and Globus
- Some definitions
- The Globus Toolkit Core Services
- Grid security infrastructure
- Resource management
- Information infrastructure
- Data management services
- Recap and conclusions
3 Grid Scenarios
- A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- 1,000 physicists worldwide pool resources for peta-op analyses of petabytes of data
- An emergency response team couples real-time data, weather model, population data
- Engineers at a multinational company collaborate on the design of a new product
- An application service provider offloads excess load to a compute cycle provider
4 The Grid Problem
- Resource sharing and coordinated problem solving in dynamic, multi-institutional settings
5 The Fundamental Concept
- Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, trust relationships
6 Globus Toolkit
- Globus Toolkit is the source of many of the protocols described in Grid architecture
- Adopted by almost all major Grid projects worldwide as a source of infrastructure
- Open source, open architecture framework encourages community development
- Active R&D program continues to move technology forward
- Developers at ANL, USC/ISI, NCSA, LBNL, and other institutions
www.globus.org
7 The Globus Toolkit
- Tools enabling resource sharing
- GSI (Grid Security Infrastructure)
- Authentication based on Grid-wide credential
- Single sign-on, delegation
- Authorization
- GRAM (Grid Resource Allocation and Management)
- Tool for remote job and resource management
- MDS (Monitoring and Discovery Service)
- Grid-wide information on the state of resources
- Data Services
- GASS
- GridFTP
- Replica Management
- Protocols and APIs
8 Globus Applications and Deployments
- Application projects include
- GriPhyN, PPDG, NEES, EU DataGrid, ESG, Fusion Collaboratory, etc.
- Infrastructure deployments include
- DISCOM, NASA IPG, NSF TeraGrid, DOE Science Grid, EU DataGrid, etc.
- UK Grid Center, U.S. GRIDS Center
- Technology projects include
- Data Grids, Access Grid, Portals, CORBA, MPICH-G2, Condor-G, GrADS, etc.
9 Grid Communities and Applications: Data Grids for High Energy Physics
Image courtesy Harvey Newman, Caltech
10 Globus Toolkit v2.2
- GT2.0 released in April 2002
- GridFTP replica management additions
- Repackaged
- GRAM reliability improvements
- Numerous small changes and bug fixes
- We follow the Linux versioning model
- Even number releases are stable releases
- Odd number releases are experimental
- This talk covers v2.2
11 Some Important Definitions
- Resource
- Network protocol
- Application Programmer Interface (API)
12 Resource
- An entity that is to be shared
- E.g., computers, storage, data, software
- Does not have to be a physical entity
- E.g., Condor pool, distributed file system, ...
- Defined in terms of interfaces, not devices
- E.g., schedulers such as LSF and PBS define a compute resource
- Open/close/read/write define access to a distributed file system, e.g., NFS, AFS, DFS
13 Network Protocol
- A formal description of message formats and a set of rules for message exchange
- Rules may define sequence of message exchanges
- Protocol may define state-change in endpoint, e.g., file system state change
- Good protocols are designed to do one thing
- Protocols can be layered
- Examples of protocols
- IP, TCP, TLS (was SSL), HTTP, Kerberos
14 Application Programmer Interface
- A specification for a set of routines to facilitate application development
- Refers to definition, not implementation
- E.g., there are many MPI implementations
- Spec often language-specific (or IDL)
- Routine name, number, order and type of arguments; mapping to language constructs
- Behavior or function of routine
- Examples
- GSS API (security), MPI (message passing)
15 A Protocol can have Multiple APIs, e.g., TCP/IP
- TCP/IP APIs include BSD sockets, Winsock, System V streams, ...
- The protocol provides interoperability: programs using different APIs can exchange information
- I don't need to know the remote user's API
[Diagram: two applications, one using the WinSock API and one using the Berkeley Sockets API, communicating over the TCP/IP protocol (reliable byte streams)]
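The point of this slide can be sketched in a few lines of Python. This is only an illustration, not Globus code: the server side uses raw socket `recv`/`sendall` calls, the client side uses a file-like stream layered on the socket, and the two interoperate because both speak TCP. The function names (`echo_upper_server`, `request_via_file_api`, `demo`) and the upper-casing "service" are assumptions made up for the example.

```python
import socket
import threading


def echo_upper_server(listener: socket.socket) -> None:
    """Serve one connection with the low-level socket API."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        # Read until a full line has arrived; TCP delivers a byte stream,
        # so one recv() call is not guaranteed to return the whole message.
        while not data.endswith(b"\n"):
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        conn.sendall(data.upper())


def request_via_file_api(port: int, line: str) -> str:
    """Talk to the same server through a file-like API on top of the socket."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        with sock.makefile("rwb") as stream:
            stream.write(line.encode("ascii"))
            stream.flush()
            return stream.readline().decode("ascii")


def demo() -> str:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=echo_upper_server, args=(listener,))
    worker.start()
    reply = request_via_file_api(port, "hello grid\n")
    worker.join()
    listener.close()
    return reply
```

Neither side needs to know which API the other used; only the bytes on the wire matter.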
16 An API can have Multiple Protocols, e.g., Message Passing Interface
- MPI provides portability: any correct program compiles and runs on a platform
- Does not provide interoperability: all processes must link against same SDK
- E.g., MPICH and LAM versions of MPI
17 Overview
- Introduction to Grids
- The opportunity
- Some definitions
- The Globus Toolkit Core Services
- Grid security infrastructure
- Resource management
- Information infrastructure
- Data management services
- Recap and conclusions
18 Security Terminology
- Authentication
- Establishing identity (who are you?)
- Authorization
- Establishing permissions (what can you do?)
- Accounting
- What resources have you used?
- Certificate Authority (CA)
- Who says you are who you are?
19 Why Grid Security is Hard
- Resources being used may be extremely valuable and the problems being solved extremely sensitive
- Resources are often located in distinct administrative domains
- Each resource may have its own policies and procedures
- The set of resources used by a single computation may be large, dynamic, and/or unpredictable
- Not just client/server
- It must be broadly available and applicable
- Standard, well-tested, well-understood protocols
- Integration with wide variety of tools
20 Globus Security in a Nutshell
- Authentication based on Grid-wide credential
- X.509 certificates
- Single sign-on, delegation
- Proxies
- Authorization
- Gridmap file to map certificate to local account with defined permissions
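The gridmap idea can be sketched as follows. The sketch assumes the common grid-mapfile layout of one entry per line, a quoted certificate subject followed by one or more comma-separated local account names; the subject names and accounts used with it are hypothetical, and `parse_gridmap` and `local_account` are illustrative helpers, not Globus Toolkit functions.

```python
import shlex
from typing import Dict, List, Optional


def parse_gridmap(text: str) -> Dict[str, List[str]]:
    """Parse gridmap-style lines: "<certificate subject>" account[,account...]."""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the subject
        if len(parts) != 2:
            raise ValueError(f"malformed gridmap line: {raw!r}")
        subject, accounts = parts
        mapping[subject] = accounts.split(",")
    return mapping


def local_account(mapping: Dict[str, List[str]], subject: str) -> Optional[str]:
    """Return the first local account for an authenticated subject, or None."""
    accounts = mapping.get(subject)
    return accounts[0] if accounts else None
```

Authentication (proving who holds the certificate) happens before this lookup; the map only answers the authorization question of which local account, if any, that identity may use.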
21 General Approach
- Define Grid security protocols and APIs
- Protocol-mediated access to remote resources
- Integrate and extend existing standards
- "On the Grid" = speak Grid protocols = speak GSI
- Develop a reference implementation
- Open source Globus Toolkit
- Client and server SDKs, services, tools
- Grid-enable wide variety of tools
- FTP, SSH, Condor, Globus Toolkit, SRB, MPI, CVS, ...
- Learn through deployment and applications
22 GSI Today
- GSI successfully addresses wide variety of Grid security issues
- Broad acceptance, deployment, integration with tools
- GSI adopted by 100s of sites, 1000s of users
- Globus CA has issued >6000 certs (user and host), with >1500 currently active
- Other CAs now in existence
- NCSA, NPACI, NASA IPG, CERN/HEP
- Standardization on-going in Grid Forum, IETF
- For more information
- www.gridforum.org/security
- Grid Security Infrastructure (GSI) Roadmap
23 Current and Future Work
- Ease of use
- CA operation, credential mgt, account mgt, proxy refresh (with Condor)
- Authorization
- Policy languages, community authorization
- Protection (despite compromised resources)
- Restricted delegation, smartcards
- Flexible communication support
- GSS-API extensions
- Independent Data Units (UDP, IP multicast)
24 Overview
- Introduction to Grids
- The opportunity
- Some definitions
- The Globus Toolkit Core Services
- Grid security infrastructure
- Resource management
- Information infrastructure
- Data management services
- Recap and conclusions
25 Resource Management Problem
- Enabling secure, controlled remote access to computational resources and management of remote computation
- Authentication and authorization
- Resource discovery and characterization
- Reservation and allocation
- Computation monitoring and control
- Addressed by new protocols and services
- GRAM protocol as a basic building block
- Resource brokering and co-allocation services
- GSI for security, MDS for discovery
26 GRAM Components
[Diagram: the client uses MDS client API calls to locate resources (MDS Grid Index Info Server) and to get resource info (MDS Grid Resource Info Server, which queries the current status of the resource). It then uses GRAM client API calls to request resource allocation and process creation, and receives GRAM client API state change callbacks. Across the site boundary, behind the Grid Security Infrastructure, the Gatekeeper creates a Job Manager; the Job Manager parses the request with the RSL Library, asks the Local Resource Manager to allocate and create processes, and monitors and controls them.]
27 GRAM Protocol
- Simple HTTP-based RPC
- Job request
- Returns a "job contact": opaque string that can be passed between clients, for access to job
- Job cancel
- Job status
- Job signal
- Event notification (callbacks) for state changes
- Pending, active, done, failed, suspended
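The state-change callbacks listed above can be sketched as a small state machine. This is a toy model, not the GRAM client API: the five state names come from the slide, but the table of allowed transitions, the `Job` class, and its method names are assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, Dict, List, Set


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DONE = "done"
    FAILED = "failed"


# Assumed transition table; the slide only names the states.
ALLOWED: Dict[JobState, Set[JobState]] = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.SUSPENDED, JobState.DONE, JobState.FAILED},
    JobState.SUSPENDED: {JobState.ACTIVE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    """A job identified by an opaque contact string, with state callbacks."""

    def __init__(self, contact: str) -> None:
        self.contact = contact  # opaque: clients pass it around, never parse it
        self.state = JobState.PENDING
        self._callbacks: List[Callable[[str, JobState], None]] = []

    def on_state_change(self, callback: Callable[[str, JobState], None]) -> None:
        self._callbacks.append(callback)

    def transition(self, new_state: JobState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
        for callback in self._callbacks:
            callback(self.contact, new_state)  # event notification
```

Because the contact is just a string, any client that holds it could register a callback or ask for status, which is the property the slide highlights.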
28 Resource Specification Language
- Common notation for exchange of information between components
- Syntax similar to MDS/LDAP filters
- RSL provides two types of information
- Resource requirements: machine type, number of nodes, memory, etc.
- Job configuration: directory, executable, args, environment
- Globus Toolkit provides an API/SDK for manipulating RSL
29 Resource Specification Language
- Much of the power of GRAM is in the RSL
- Common language for specifying job requests
- A conjunction of (attribute=value) pairs
- GRAM understands a well-defined set of attributes
30 Some RSL Attributes For GRAM
- (executable=string)
- Program to run
- A file path (absolute or relative) or URL
- (directory=string)
- Directory in which to run (default is $HOME)
- (arguments=arg1 arg2 arg3...)
- List of string arguments to program
- (environment=(E1 v1)(E2 v2))
- List of environment variable name/value pairs
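As a rough illustration of how these attributes combine into a single request, the following sketch assembles a conjunction of (attribute=value) relations. `build_rsl` is a hypothetical helper, not the toolkit's RSL API, and it deliberately ignores RSL quoting rules, so it is only suitable for simple values without spaces or special characters.

```python
from typing import Mapping, Optional, Sequence


def build_rsl(
    executable: str,
    directory: Optional[str] = None,
    arguments: Sequence[str] = (),
    environment: Optional[Mapping[str, str]] = None,
) -> str:
    """Build a conjunction ("&") of (attribute=value) relations.

    Simplification: values are inserted verbatim, with no quoting or escaping.
    """
    relations = [f"(executable={executable})"]
    if directory is not None:
        relations.append(f"(directory={directory})")
    if arguments:
        relations.append("(arguments=" + " ".join(arguments) + ")")
    if environment:
        pairs = "".join(f"({name} {value})" for name, value in environment.items())
        relations.append(f"(environment={pairs})")
    return "&" + "".join(relations)


# Hypothetical job description.
request = build_rsl(
    "/bin/echo",
    directory="/tmp",
    arguments=["hello", "grid"],
    environment={"E1": "v1", "E2": "v2"},
)
```

Here `request` holds the string `&(executable=/bin/echo)(directory=/tmp)(arguments=hello grid)(environment=(E1 v1)(E2 v2))`.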
31 Job Submission Interfaces
- Globus Toolkit includes several command line programs for job submission
- globus-job-run: interactive jobs
- globus-job-submit: batch/offline jobs
- globusrun: flexible scripting infrastructure
- Others are building better interfaces
- General purpose
- Condor-G, PBS, GRD, Hotpage, etc.
- Application specific
- ECCE, Cactus, Web portals
32 Globus Toolkit Implementation
- Gatekeeper
- Single point of entry
- Authenticates user, maps to local security environment, runs service
- In essence, a "secure inetd"
- Job manager
- A gatekeeper service
- Layers on top of local resource management system (e.g., PBS, LSF, etc.)
- Handles remote interaction with the job
33 Overview
- Introduction to Grids
- The opportunity
- Some definitions
- The Globus Toolkit Core Services
- Grid security infrastructure
- Resource management
- Information infrastructure
- Data management services
- Recap and conclusions
34 Grid Information Services
35 Information Services: Facts of Life
- Information is always old
- Time in flight, changing system state
- Need to provide quality metrics
- Distributed system state is hard to obtain
- Complexity of global snapshot
- Components will fail
- Scalability and overhead
- Many different usage scenarios
- Heterogeneous policy, different information, organizations, ...
36 Basic Grid Questions
- Resource Discovery
- What resources are relevant?
- Bootstraps selection process
- Resource Status Query
- How do resources compare (now)?
- Refines selection knowledge
- Resource Control
- Did I acquire the resources?
- Not an information service task
37 Globus Information Service: Monitoring and Discovery Service (MDS)
- MDS includes
- Registration and enquiry protocols
- Information models
- Provides or supports
- Standard interfaces to sensors
- Different directory structures
- Various discovery/access strategies
38 MDS-2 Base Features
- Virtual organizations (VOs)
- Collaboration between individuals and institutions
- Enable sharing, community-wide goals
- Support community-specific discovery
- Dynamic in nature
- Scalability
- Many resources, people, VOs
- Independence
- Resources, VOs shouldn't affect one another
- Graceful degradation of service
- Tolerate partitions, prune failures
39 Information Service Approach
- Define basic classes of information service
- Resource description services
- Aggregate directory services
- Provide basic protocols for interoperability
- Resource inquiry protocol
- Resource registration protocol
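40 MDS-2 Architecture
[Diagram: users query customized aggregate directories (D) through the inquiry protocol; standard resource description services (R) register with the directories through the registration protocol and answer inquiries through the inquiry protocol.]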
41 Two Types of Information Service
- Resource description services
- Supplies information about a specific resource (GRIS: Grid Resource Information Service)
- Aggregate directory service
- Supplies collection of information gathered from multiple description servers (GIIS: Grid Index Information Service)
- Customized naming and indexing
- Support VO concept
42 Two Classes of Protocols
- Grid Resource Inquiry Protocol (GRIP)
- Used to query and respond to information requests
- Grid Resource Registration Protocol (GRRP)
- Soft-state protocol used to notify the existence of a service
43 GRIP: Resource Inquiry Protocol
- Obtain information about resource
- Define data model for information, request and response formats
- Request may be general query (search)
- Can use different protocols for resource description and aggregate directory
- Advantageous to have uniform protocol
- Take a subtree and use it as any other resource description service
44 GRRP: Resource Registration Protocol
- Soft-state protocol
- Periodic notification
- Service/resource is available
- Granularity metadata
- Automatic extension
- Add new resources to directories
- Invite resource to join new directory
- Self-cleaning
- Reduce occurrence of dead references
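The soft-state idea (periodic notification plus self-cleaning) can be sketched as a directory that forgets any service whose registration has not been refreshed within a time-to-live. This is a conceptual model only, not the GRRP wire protocol; the class name, the TTL parameter, and the explicit `now` argument (used instead of a real clock so the behavior is easy to follow) are assumptions.

```python
from typing import Dict, List


class SoftStateDirectory:
    """Directory whose entries expire unless they are periodically refreshed."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._expires_at: Dict[str, float] = {}

    def register(self, service: str, now: float) -> None:
        # A repeated notification simply pushes the expiry time forward.
        self._expires_at[service] = now + self.ttl

    def live_services(self, now: float) -> List[str]:
        # Self-cleaning: entries that were not refreshed in time disappear,
        # which reduces dead references without any explicit "unregister".
        self._expires_at = {
            service: expiry
            for service, expiry in self._expires_at.items()
            if expiry > now
        }
        return sorted(self._expires_at)
```

A resource that crashes never has to say goodbye: it stops sending notifications and drops out of the directory after one TTL.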
45 MDS-2 Implementation
- Grid Resource Information Service (GRIS)
- Provides resource description
- Modular content gateway
- Grid Index Information Service (GIIS)
- Provides aggregate directory
- Hierarchical groups of resources
- Lightweight Directory Access Protocol (LDAP)
- Standard with many client implementations
- Used for GRIP (and GRRP currently)
46 Stock MDS-2.1 GRIS Providers
- globus-version: reports Globus software
- grid-info-host: reports host OS info
- grid-info-host-interfaces: reports host NICs
- grid-info-host-load: reports host CPU status
- grid-info-host-filesystem: reports host disk status
- globus-gram-reporter: reports Globus job status
- Also information from Ganglia (cluster monitoring software), GridFTP server data, software install data, and more
47 Extensible GIIS Framework
- Modular registration actions
- 1) Re-use registration protocol decoding
- 2) Specialize directory update (e.g., prefetch indexed data)
- Modular query actions
- 1) Re-use query protocol decoding
- 2) Specialize query handling (e.g., utilize precomputed indices)
- Provide caching proxy as part of release
- Send a request to index, collect info and cache it locally so the next response is faster
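The caching behavior described in the last bullet can be sketched as a thin proxy in front of a slower lookup. This is a conceptual sketch rather than the GIIS implementation: `CachingIndexProxy`, the `fetch` callable, and the `max_age` parameter are all illustrative names, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Callable, Dict, Tuple


class CachingIndexProxy:
    """Answer repeated queries from a local cache until the entry is too old."""

    def __init__(self, fetch: Callable[[str], str], max_age: float) -> None:
        self._fetch = fetch      # the slow path: ask the underlying service
        self._max_age = max_age  # how stale an answer may be, in seconds
        self._cache: Dict[str, Tuple[float, str]] = {}

    def query(self, resource: str, now: float) -> str:
        cached = self._cache.get(resource)
        if cached is not None and now - cached[0] <= self._max_age:
            return cached[1]     # fast path: serve the cached answer
        value = self._fetch(resource)
        self._cache[resource] = (now, value)
        return value
```

The trade-off is the one noted earlier in the talk: cached information is always somewhat old, so `max_age` is effectively a quality bound on the answer.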
48 Globus MDS-2
- Service scales with Grid growth
- Loose consistency model tolerates failures
- Interoperability by protocols
49 More Information
- MDS-2
- Distributed information service
- HPDC 2001 paper: "Grid Information Services for Distributed Resource Sharing"
- MDS 2.2
- Refined protocols, security
- Fully extensible implementation
- http://www.globus.org/mds
50 Overview
- Introduction to Grids
- The opportunity
- Some definitions
- The Globus Toolkit Core Services
- Grid security infrastructure
- Resource management
- Information infrastructure
- Data management services
- Recap and conclusions
51 Data Management Services
- Data transfer and access
- GASS: provides services mainly intended for use with GRAM (file staging, I/O redirection)
- GridFTP: provides high-performance, reliable data transfer for modern WANs
- Higher-level data services (not today)
- Replica Location Service: provides a distributed catalog service for keeping track of replicated datasets (joint work with EDG)
- Replica Management: provides services for creating and managing replicated datasets
- Chimera Virtual Data Service: keeps track of provenance of data sets
52 GASS: Remote I/O and Staging
- Tell GRAM to pull executable from remote location
- Access files from a remote location
- stdin/stdout/stderr from a remote location
53 What is GASS? Global Access to Secondary Storage
- (a) GASS file access API
- Replace open/close with globus_gass_open/close; read/write calls can then proceed directly
- (b) RSL extensions
- URLs used to name executables, stdout, stderr
- (c) Remote cache management utility
- (d) Low-level APIs for specialized behaviors
54 Example GASS Applications
- On-demand, transparent loading of data sets
- Caching of (small) data sets
- Automatic staging of code and data to remote supercomputers
- GridFTP better suited to staging of large data sets
- (Near) real-time logging of application output to remote server
55 GASS Summary
- Simple service for small file transfers
- Used by globusrun for automatic staging of code and data to remote supercomputers
- (Near) real-time logging of application output to remote server
- GridFTP better suited to staging of large data sets
56 GridFTP: Basic Approach
- FTP is defined by several IETF RFCs
- Start with most commonly used subset
- Standard FTP: get/put etc., 3rd-party transfer
- Implement standard but often unused features
- GSS binding, extended directory listing, simple restart
- Extend in various ways, while preserving interoperability with existing servers
- Parameter set/negotiate, parallel transfers (multiple TCP streams), striped transfers (multiple hosts), partial file transfers, automatic and manual TCP buffer setting, progress monitoring, extended restart (via plug-ins)
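The combination of partial file transfers and parallel streams can be sketched conceptually: split the file into byte ranges, fetch each range on its own worker, and reassemble in order. This is not GridFTP code; threads stand in for TCP streams, and `read_range` is a hypothetical callable that returns the bytes in `[start, end)` from some source.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(size: int, streams: int) -> List[Tuple[int, int]]:
    """Split [0, size) into at most `streams` contiguous half-open ranges."""
    if size <= 0 or streams <= 0:
        return []
    chunk = -(-size // streams)  # ceiling division
    return [(start, min(start + chunk, size)) for start in range(0, size, chunk)]


def parallel_fetch(
    read_range: Callable[[int, int], bytes], size: int, streams: int
) -> bytes:
    """Fetch every range concurrently and join the pieces in range order."""
    ranges = split_ranges(size, streams)
    if not ranges:
        return b""
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # pool.map preserves input order, so the parts line up for the join.
        parts = list(pool.map(lambda r: read_range(r[0], r[1]), ranges))
    return b"".join(parts)
```

The same range bookkeeping is what makes restart cheap: after a failure, only the ranges that did not complete need to be requested again.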
57 GridFTP: Implementation Status
- Modified wu-ftpd server
- Most features
- Modified ncftp client
- Security, TCP buffer setting
- Modified HPSS and Unitree ftpd servers
- Security
- Globus Toolkit client and server SDKs, and command line tools
- Most features
- Prototype striped FTP server (aka DPSS2)
58 GridFTP at SC2000: Long-Running Dallas-Chicago Transfer
[Plot annotations: SciNet power failure; other demos starting up (congestion); parallelism increases (demos); DNS problems; backbone problems on the SC floor; transition between files (not zero due to averaging)]
59 A Model Architecture for Data Grids
[Diagram: the application sends an attribute specification to the metadata catalog and obtains a logical collection and logical file name; the replica catalog maps these to multiple locations; replica selection uses performance information and predictions from MDS and NWS to choose the selected replica, which is then retrieved with GridFTP commands from one of Replica Location 1, 2, or 3 (disk caches, a tape library, a disk array).]
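The replica-selection step in this architecture can be sketched as a lookup followed by a ranking. The sketch assumes the replica catalog is a mapping from logical file names to replica URLs and that performance predictions arrive as a per-host bandwidth forecast; both data structures, the function name, and the host names used with it are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlparse


def select_replica(
    logical_name: str,
    replica_catalog: Dict[str, List[str]],
    predicted_mbps: Dict[str, float],
) -> str:
    """Return the replica URL whose host has the best predicted bandwidth."""
    locations = replica_catalog.get(logical_name)
    if not locations:
        raise LookupError(f"no replicas registered for {logical_name!r}")

    def predicted(url: str) -> float:
        # Hosts without a forecast rank last rather than causing an error.
        return predicted_mbps.get(urlparse(url).hostname or "", 0.0)

    return max(locations, key=predicted)
```

The application only ever names the logical file; which physical copy is read is decided at request time from whatever predictions are current.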
60 Data Management Summary
- Data transfer and access
- GASS: provides services mainly intended for use with GRAM (file staging, I/O redirection)
- GridFTP: provides high-performance, reliable data transfer for modern WANs
- Higher-level replica and data tracking services
- Current and upcoming work
- Reliable file transfer
61 The Globus Toolkit v2 in One Slide
- Grid protocols (GSI, GRAM, ...) enable resource sharing within virtual orgs; toolkit provides reference implementation (= Globus Toolkit services)
- Protocols (and APIs) enable other tools and services for membership, discovery, data mgmt, workflow, ...
62 Recap and Conclusions
63 Globus Toolkit: Components Include
- Core protocols and services
- Grid Security Infrastructure
- Grid Resource Access and Management
- MDS: information and monitoring
- GridFTP: data access and transfer
64 The Grid World: Current Status
- Dozens of major Grid projects in scientific and technical computing/research and education
- Considerable consensus on key concepts and technologies
- Open source Globus Toolkit a de facto standard for major protocols and services
- Far from complete or perfect, but out there, evolving rapidly, and large tool/user base
- Industrial interest emerging rapidly
- Opportunity: convergence of eScience and eBusiness requirements and technologies
65 Globus Toolkit
- Globus Toolkit is the source of many of the protocols described in Grid architecture
- Adopted by almost all major Grid projects worldwide as a source of infrastructure
- Open source, open architecture framework encourages community development
- Active R&D program continues to move technology forward
- Developers at ANL, USC/ISI, NCSA, LBNL, and other institutions
www.globus.org
66 Globus Toolkit 2 Evaluation (+)
- Good technical solutions for key problems, e.g.
- Authentication and authorization
- Resource discovery and monitoring
- Reliable remote service invocation
- High-performance remote data access
- This good engineering is enabling progress
- Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support
- Growing community code base built on tools
67 Globus Toolkit 2 Evaluation (-)
- Protocol deficiencies, e.g.
- Heterogeneous basis: HTTP, LDAP, FTP
- No standard means of invocation, notification, error propagation, authorization, termination, ...
- Significant missing functionality, e.g.
- Databases, sensors, instruments, workflow, ...
- Virtualization of end systems (hosting envs.)
- Little work on total system properties, e.g.
- Dependability, end-to-end QoS, ...
- Reasoning about system properties
68 Globus Toolkit v3
- GT3 provides online negotiation of access to services in a standard way, based on OGSA specification for grid services
- GT3 enables the creation of dynamic, extensible systems
- GT3 embraces state-of-the-art protocols and leverages community standards
- GT3 is evolutionary, not revolutionary
- Not changing higher-level functionality, changing protocols
69 Acknowledgments
- Globus Project
- Ian Foster, Steve Tuecke @ ANL
- Carl Kesselman @ USC/ISI
- The talented team of scientists and engineers at ANL, USC/ISI, elsewhere (see http://www.globus.org)
- Support from DOE, NASA, NSF, IBM, Microsoft
70 Further Information
- My email: Jennifer Schopf (jms@mcs.anl.gov)
- GT2
- General information at www.globus.org
- Technical discussion list: discuss@globus.org
- Related Publications
- "The Grid: A New Infrastructure for 21st Century Science"
- "Anatomy of the Grid": Foster, Kesselman, Tuecke
- Proposes an abstract architecture in which intergrid protocols enable interoperability among different grids
- "Physiology of the Grid": Foster, Kesselman, Nick, Tuecke
- Introduces the concept of an Open Grid Services Architecture
- Technical Papers
- www.globus.org/research/papers.html