Grid Computing with the Globus Toolkit 2.2

1
Grid Computing with the Globus Toolkit 2.2
  • Jennifer Schopf
  • for The Globus Project
  • Argonne National Laboratory
  • USC Information Sciences Institute
  • www.globus.org

2
Overview
  • Introduction to Grids
  • Why Grids and Globus
  • Some definitions
  • The Globus Toolkit Core Services
  • Grid security infrastructure
  • Resource management
  • Information infrastructure
  • Data management services
  • Recap and conclusions

3
Grid Scenarios
  • A biochemist exploits 10,000 computers to screen
    100,000 compounds in an hour
  • 1,000 physicists worldwide pool resources for
    peta-op analyses of petabytes of data
  • An emergency response team couples real time
    data, weather model, population data
  • Engineers at a multinational company collaborate
    on the design of a new product
  • An application service provider offloads excess
    load to a compute cycle provider

4
The Grid Problem
  • Resource sharing and coordinated problem solving
    in dynamic, multi-institutional settings

5
The Fundamental Concept
  • Enable communities ("virtual organizations")
    to share geographically distributed resources as
    they pursue common goals, in the absence of
    central control, omniscience, trust relationships

6
Globus Toolkit
  • Globus Toolkit is the source of many of the
    protocols described in Grid architecture
  • Adopted by almost all major Grid projects
    worldwide as a source of infrastructure
  • Open source, open architecture framework
    encourages community development
  • Active R&D program continues to move technology
    forward
  • Developers at ANL, USC/ISI, NCSA, LBNL, and other
    institutions

www.globus.org
7
The Globus Toolkit
  • Tools enabling resource sharing
  • GSI (Grid Security Infrastructure)
  • Authentication based on Grid-wide credential
  • Single sign-on, delegation
  • Authorization
  • GRAM (Grid Resource Allocation and Management)
  • Tool for remote job and resource management
  • MDS (Monitoring and Discovery Service)
  • Grid-wide information on the state of resources
  • Data Services
  • GASS
  • GridFTP
  • Replica Management
  • Protocols and APIs

8
Globus Applications and Deployments
  • Application projects include
  • GriPhyN, PPDG, NEES, EU DataGrid, ESG, Fusion
    Collaboratory, etc., etc.
  • Infrastructure deployments include
  • DISCOM, NASA IPG, NSF TeraGrid, DOE Science Grid,
    EU DataGrid, etc., etc.
  • UK Grid Center, U.S. GRIDS Center
  • Technology projects include
  • Data Grids, Access Grid, Portals, CORBA,
    MPICH-G2, Condor-G, GrADS, etc., etc.

9
Grid Communities & Applications: Data Grids for
High Energy Physics
[Figure omitted. Image courtesy Harvey Newman, Caltech]
10
Globus Toolkit v2.2
  • GT2.0 released in April 2002
  • GridFTP replica management additions
  • Repackaged
  • GRAM reliability improvements
  • Numerous small changes and bug fixes
  • We follow the Linux versioning model
  • Even number releases are stable releases
  • Odd number releases are experimental
  • This talk covers v2.2

11
Some Important Definitions
  • Resource
  • Network protocol
  • Application Programmer Interface (API)

12
Resource
  • An entity that is to be shared
  • E.g., computers, storage, data, software
  • Does not have to be a physical entity
  • E.g., Condor pool, distributed file system, ...
  • Defined in terms of interfaces, not devices
  • E.g., schedulers such as LSF and PBS define a
    compute resource
  • Open/close/read/write define access to a
    distributed file system, e.g. NFS, AFS, DFS

13
Network Protocol
  • A formal description of message formats and a set
    of rules for message exchange
  • Rules may define sequence of message exchanges
  • Protocol may define state-change in endpoint,
    e.g., file system state change
  • Good protocols are designed to do one thing
  • Protocols can be layered
  • Examples of protocols
  • IP, TCP, TLS (was SSL), HTTP, Kerberos

14
Application Programmer Interface
  • A specification for a set of routines to
    facilitate application development
  • Refers to definition, not implementation
  • E.g., there are many MPI implementations
  • Spec often language-specific (or IDL)
  • Routine name; number, order, and type of
    arguments; mapping to language constructs
  • Behavior or function of routine
  • Examples
  • GSS API (security), MPI (message passing)

15
A Protocol can have Multiple APIs: E.g., TCP/IP
  • TCP/IP APIs include BSD sockets, Winsock, System
    V streams, ...
  • The protocol provides interoperability: programs
    using different APIs can exchange information
  • I don't need to know the remote user's API

[Figure: two applications, one using the WinSock API and one using the
Berkeley Sockets API, exchange data over the TCP/IP protocol (reliable
byte streams)]
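
A minimal sketch of the interoperability point, assuming Python's standard `socket` module, which wraps whatever sockets API the platform provides. The client only has to speak TCP; it never learns which API the server was written against. The loopback address, the OS-chosen port, and the echo behaviour are illustrative assumptions, not part of the talk.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection and echo back whatever bytes arrive."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def echo_roundtrip(message: bytes) -> bytes:
    """Start a local TCP echo server, send one message, return the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()

    # Client side: any TCP implementation, built on any API, could stand in here.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)

    server.join()
    listener.close()
    return reply
```

Calling `echo_roundtrip(b"hello grid")` exercises both ends over the loopback interface.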
16
An API can have Multiple Protocols: E.g., Message
Passing Interface
  • MPI provides portability: any correct program
    compiles and runs on a platform
  • Does not provide interoperability: all processes
    must link against same SDK
  • E.g., MPICH and LAM versions of MPI

17
Overview
  • Introduction to Grids
  • The opportunity
  • Some definitions
  • The Globus Toolkit Core Services
  • Grid security infrastructure
  • Resource management
  • Information infrastructure
  • Data management services
  • Recap and conclusions

18
Security Terminology
  • Authentication
  • Establishing identity (who are you?)
  • Authorization
  • Establishing permissions (what can you do?)
  • Accounting
  • What resources have you used?
  • Certificate Authority (CA)
  • Who says you are who you are?

19
Why Grid Security is Hard
  • Resources being used may be extremely valuable and
    the problems being solved extremely sensitive
  • Resources are often located in distinct
    administrative domains
  • Each resource may have its own policies and procedures
  • The set of resources used by a single computation
    may be large, dynamic, and/or unpredictable
  • Not just client/server
  • It must be broadly available and applicable
  • Standard, well-tested, well-understood protocols
  • Integration with wide variety of tools

20
Globus Security in a Nutshell
  • Authentication based on Grid-wide credential
  • X.509 certificates
  • Single sign-on, delegation
  • proxies
  • Authorization
  • Gridmap file to map certificate to local account
    with defined permissions
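
A minimal sketch of the authorization step, assuming a gridmap file in which each line holds a double-quoted certificate subject name followed by one local account name. The subject names and accounts below are hypothetical, and real files may carry more than this sketch handles.

```python
import shlex


def parse_gridmap(text: str) -> dict[str, str]:
    """Map certificate subject names to local account names."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex honours the double quotes around the subject name
        parts = shlex.split(line)
        if len(parts) != 2:
            raise ValueError(f"malformed gridmap line: {line!r}")
        subject, account = parts
        mapping[subject] = account
    return mapping


# Hypothetical entries, for illustration only.
EXAMPLE = '''
# certificate subject                              local account
"/O=Grid/O=Example/OU=example.org/CN=Jane Doe" jdoe
"/O=Grid/O=Example/OU=example.org/CN=John Roe" jroe
'''

accounts = parse_gridmap(EXAMPLE)
```

A service would authenticate the certificate first, then look the subject up in `accounts` to decide which local account (and therefore which local permissions) the request runs under.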

21
General Approach
  • Define Grid security protocols and APIs
  • Protocol-mediated access to remote resources
  • Integrate and extend existing standards
  • "On the Grid" means speaking Grid protocols and speaking GSI
  • Develop a reference implementation
  • Open source Globus Toolkit
  • Client and server SDKs, services, tools
  • Grid-enable wide variety of tools
  • FTP, SSH, Condor, Globus Toolkit, SRB, MPI, CVS, ...
  • Learn through deployment and applications

22
GSI Today
  • GSI successfully addresses wide variety of Grid
    security issues
  • Broad acceptance, deployment, integration with
    tools
  • GSI adopted by 100s of sites, 1000s of users
  • Globus CA has issued >6000 certs (user & host),
    with >1500 currently active
  • Other CAs now in existence
  • NCSA, NPACI, NASA IPG, CERN/HEP
  • Standardization on-going in Grid Forum, IETF
  • For more information
  • www.gridforum.org/security
  • Grid Security Infrastructure (GSI) Roadmap

23
Current and Future Work
  • Ease of use
  • CA operation, credential mgt, account mgt, proxy
    refresh (with Condor)
  • Authorization
  • Policy languages, community authorization
  • Protection (despite compromised resources)
  • Restricted delegation, smartcards
  • Flexible communication support
  • GSS-API extensions
  • Independent Data Units (UDP, IP multicast)

24
Overview
  • Introduction to Grids
  • The opportunity
  • Some definitions
  • The Globus Toolkit Core Services
  • Grid security infrastructure
  • Resource management
  • Information infrastructure
  • Data management services
  • Recap and conclusions

25
Resource Management Problem
  • Enabling secure, controlled remote access to
    computational resources and management of remote
    computation
  • Authentication and authorization
  • Resource discovery and characterization
  • Reservation and allocation
  • Computation monitoring and control
  • Addressed by new protocols and services
  • GRAM protocol as a basic building block
  • Resource brokering and co-allocation services
  • GSI for security, MDS for discovery

26
GRAM Components
[Figure: GRAM components. The client makes MDS client API calls to the Grid
Index Info Server to locate resources and to the Grid Resource Info Server to
get resource info, then makes GRAM client API calls to request resource
allocation and process creation. Inside the site boundary, behind the Grid
Security Infrastructure, the Gatekeeper creates a Job Manager. The Job Manager
parses the request with the RSL Library, asks the Local Resource Manager to
allocate and create processes, monitors and controls those processes, and
sends GRAM state change callbacks to the client. The Grid Resource Info Server
queries the current status of the resource.]
27
GRAM Protocol
  • Simple HTTP-based RPC
  • Job request
  • Returns a "job contact": opaque string that can
    be passed between clients, for access to job
  • Job cancel
  • Job status
  • Job signal
  • Event notification (callbacks) for state changes
  • Pending, active, done, failed, suspended
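
A minimal sketch of how a client might track the state-change callbacks listed above. The five state names come from the slide; the `JobWatcher` class, its method names, and the job contact value are hypothetical illustrations, not the GRAM client API.

```python
from enum import Enum
from typing import Callable


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"
    SUSPENDED = "suspended"


class JobWatcher:
    """Hypothetical client-side helper that records state changes for one job."""

    def __init__(self, job_contact: str) -> None:
        self.job_contact = job_contact  # opaque string, never parsed
        self.state = JobState.PENDING
        self._callbacks: list[Callable[[str, JobState], None]] = []

    def on_change(self, callback: Callable[[str, JobState], None]) -> None:
        """Register a function to call on every state change."""
        self._callbacks.append(callback)

    def notify(self, new_state: JobState) -> None:
        """Handle a state-change event; repeated states are ignored."""
        if new_state is self.state:
            return
        self.state = new_state
        for callback in self._callbacks:
            callback(self.job_contact, new_state)

    def finished(self) -> bool:
        """True once the job has reached a terminal state."""
        return self.state in (JobState.DONE, JobState.FAILED)


seen = []
watcher = JobWatcher("opaque-job-contact-123")  # made-up contact string
watcher.on_change(lambda contact, state: seen.append(state))
watcher.notify(JobState.ACTIVE)
watcher.notify(JobState.ACTIVE)  # duplicate event, no callback
watcher.notify(JobState.DONE)
```

Because the job contact is opaque, the helper only stores and forwards it; any client holding the same string could attach its own watcher.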

28
Resource Specification Language
  • Common notation for exchange of information
    between components
  • Syntax similar to MDS/LDAP filters
  • RSL provides two types of information
  • Resource requirements: Machine type, number of
    nodes, memory, etc.
  • Job configuration: Directory, executable, args,
    environment
  • Globus Toolkit provides an API/SDK for
    manipulating RSL

29
Resource Specification Language
  • Much of the power of GRAM is in the RSL
  • Common language for specifying job requests
  • A conjunction of (attribute=value) pairs
  • GRAM understands a well defined set of attributes

30
Some RSL Attributes For GRAM
  • (executable=string)
  • Program to run
  • A file path (absolute or relative) or URL
  • (directory=string)
  • Directory in which to run (default is $HOME)
  • (arguments=arg1 arg2 arg3...)
  • List of string arguments to program
  • (environment=(E1 v1)(E2 v2))
  • List of environment variable name/value pairs
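
A minimal sketch that assembles a job request from the four attributes above. The leading `&` as the conjunction operator and the rule of doubling embedded quotes are my assumptions about the concrete syntax; `build_rsl` and `rsl_value` are hypothetical helpers, not part of the toolkit's RSL SDK.

```python
def rsl_value(value: str) -> str:
    """Quote a value unless it consists only of plainly safe characters."""
    if value and all(ch.isalnum() or ch in "/._-:" for ch in value):
        return value
    # Assumed escaping rule: a double quote inside a quoted value is doubled.
    return '"' + value.replace('"', '""') + '"'


def build_rsl(executable, directory=None, arguments=(), environment=None):
    """Build a conjunction of (attribute=value) relations as one string."""
    relations = [f"(executable={rsl_value(executable)})"]
    if directory is not None:
        relations.append(f"(directory={rsl_value(directory)})")
    if arguments:
        args = " ".join(rsl_value(a) for a in arguments)
        relations.append(f"(arguments={args})")
    if environment:
        pairs = "".join(
            f"({name} {rsl_value(val)})" for name, val in environment.items()
        )
        relations.append(f"(environment={pairs})")
    return "&" + "".join(relations)


request = build_rsl(
    "/bin/echo",
    directory="/tmp",
    arguments=["hello", "grid world"],
    environment={"E1": "v1", "E2": "v2"},
)
print(request)
```

Building the string from structured arguments keeps quoting in one place, which matters because the same request text is passed unchanged between client, gatekeeper, and job manager.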

31
Job Submission Interfaces
  • Globus Toolkit includes several command line
    programs for job submission
  • globus-job-run: Interactive jobs
  • globus-job-submit: Batch/offline jobs
  • globusrun: Flexible scripting infrastructure
  • Others are building better interfaces
  • General purpose
  • Condor-G, PBS, GRD, Hotpage, etc
  • Application specific
  • ECCE, Cactus, Web portals
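
A minimal sketch of how a script might assemble command lines for the three programs named above, without running them. The argument order (resource contact, then executable and arguments) and the `-r` option of `globusrun` reflect my understanding of the GT2 tools and should be checked against the installed programs' help output; the host name is hypothetical.

```python
import shlex


def job_run_argv(contact: str, executable: str, *args: str) -> list[str]:
    """Interactive job: the caller waits and sees the job's output."""
    return ["globus-job-run", contact, executable, *args]


def job_submit_argv(contact: str, executable: str, *args: str) -> list[str]:
    """Batch job: the command returns a job contact for later queries."""
    return ["globus-job-submit", contact, executable, *args]


def globusrun_argv(contact: str, rsl: str) -> list[str]:
    """Scripting interface: the whole request is given as one RSL string."""
    return ["globusrun", "-r", contact, rsl]


contact = "gatekeeper.example.org"  # hypothetical resource contact
commands = [
    job_run_argv(contact, "/bin/date"),
    job_submit_argv(contact, "/bin/sleep", "60"),
    globusrun_argv(contact, "&(executable=/bin/date)"),
]
for argv in commands:
    # shlex.join shows how each list would look when typed in a shell
    print(shlex.join(argv))
```

Keeping each command as an argument list rather than a single string avoids shell-quoting mistakes when the RSL text contains parentheses or spaces.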

32
Globus Toolkit Implementation
  • Gatekeeper
  • Single point of entry
  • Authenticates user, maps to local security
    environment, runs service
  • In essence, a secure inetd
  • Job manager
  • A gatekeeper service
  • Layers on top of local resource management system
    (e.g., PBS, LSF, etc.)
  • Handles remote interaction with the job

33
Overview
  • Introduction to Grids
  • The opportunity
  • Some definitions
  • The Globus Toolkit Core Services
  • Grid security infrastructure
  • Resource management
  • Information infrastructure
  • Data management services
  • Recap and conclusions

34
Grid Information Services
35
Information Services Facts of Life
  • Information is always old
  • Time in flight, changing system state
  • Need to provide quality metrics
  • Distributed system state is hard to obtain
  • Complexity of global snapshot
  • Components will fail
  • Scalability and overhead
  • Many different usage scenarios
  • Heterogeneous policy, different information,
    organizations, ...

36
Basic Grid Questions
  • Resource Discovery
  • What resources are relevant?
  • Bootstraps selection process
  • Resource Status Query
  • How do resources compare (now)?
  • Refines selection knowledge
  • Resource Control
  • Did I acquire the resources?
  • Not an information service task

37
Globus Information Service: Monitoring and
Discovery Service (MDS)
  • MDS includes
  • Registration and enquiry protocols
  • Information models
  • Provides or supports
  • Standard interfaces to sensors
  • Different directory structures
  • Various discovery/access strategies

38
MDS-2 Base Features
  • Virtual organizations (VOs)
  • Collaboration between individuals and institutions
  • Enable sharing, community wide goals
  • Support community-specific discovery
  • Dynamic in nature
  • Scalability
  • Many resources, people, VOs
  • Independence
  • Resources, VOs shouldn't affect one another
  • Graceful degradation of service
  • Tolerate partitions, prune failures

39
Information Service Approach
  • Define basic classes of information service
  • Resource description services
  • Aggregate directory services
  • Provide basic protocols for interoperability
  • Resource inquiry protocol
  • Resource registration protocol

40
MDS-2 Architecture
[Figure: users query customized aggregate directories (D) through the inquiry
protocol; standard resource description services (R) announce themselves to
the directories through the registration protocol]
41
Two Types of Information Service
  • Resource description services
  • Supplies information about a specific resource
    (GRIS: Grid Resource Information Service)
  • Aggregate directory service
  • Supplies collection of information gathered from
    multiple description servers (GIIS: Grid Index
    Information Service)
  • Customized naming and indexing
  • Support VO concept

42
Two Classes of Protocols
  • Grid resource inquiry protocol (GRIP)
  • Used to query and respond to information requests
  • Grid Resource Registration Protocol (GRRP)
  • Soft-state protocol used to notify the existence
    of a service

43
GRIP: Resource Inquiry Protocol
  • Obtain information about resource
  • Define data model for information, request and
    response formats
  • Request may be general query (search)
  • Can use different protocols for resource
    description and aggregate directory
  • Advantageous to have uniform protocol
  • Take a subtree and use it as any other resource
    description service

44
GRRP: Resource Registration Protocol
  • Soft-state protocol
  • Periodic notification
  • Service/resource is available
  • Granularity metadata
  • Automatic extension
  • Add new resources to directories
  • Invite resource to join new directory
  • Self-cleaning
  • Reduce occurrence of dead references
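
A minimal sketch of the soft-state idea: a registration is only a promise that lasts for a limited time, so a resource that stops sending periodic notifications disappears from the directory on its own. The class, its method names, and the host names are hypothetical; this shows the expiry logic only, not the GRRP message format.

```python
class SoftStateDirectory:
    """Toy aggregate directory in which registrations expire unless refreshed."""

    def __init__(self) -> None:
        self._expires_at: dict[str, float] = {}

    def register(self, service: str, now: float, ttl: float) -> None:
        """Record (or refresh) a registration valid until now + ttl."""
        self._expires_at[service] = now + ttl

    def live_services(self, now: float) -> list[str]:
        """Drop expired entries, then list what is still registered."""
        self._expires_at = {
            name: deadline
            for name, deadline in self._expires_at.items()
            if deadline > now
        }
        return sorted(self._expires_at)


directory = SoftStateDirectory()
directory.register("gris.site-a.example.org", now=0.0, ttl=30.0)
directory.register("gris.site-b.example.org", now=0.0, ttl=30.0)
# Only site A sends its periodic notification again.
directory.register("gris.site-a.example.org", now=25.0, ttl=30.0)
still_alive = directory.live_services(now=40.0)
```

No explicit "unregister" message is needed: site B simply ages out, which is what makes the directory self-cleaning.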

45
MDS-2 Implementation
  • Grid Resource Information Service (GRIS)
  • Provides resource description
  • Modular content gateway
  • Grid Index Information Service (GIIS)
  • Provides aggregate directory
  • Hierarchical groups of resources
  • Lightweight Directory Access Protocol (LDAP)
  • Standard with many client implementations
  • Used for GRIP (and GRRP currently)

46
Stock MDS-2.1 GRIS Providers
  • globus-version reports Globus software
  • grid-info-host reports host OS info
  • grid-info-host-interfaces reports host NICs
  • grid-info-host-load reports host CPU status
  • grid-info-host-filesystem reports host disk
    status
  • globus-gram-reporter reports Globus job status
  • Also information from Ganglia (cluster monitoring
    software), GridFTP Server data, software install
    data, and more

47
Extensible GIIS Framework
  • Modular registration actions
  • 1) Re-use registration protocol decoding
  • 2) Specialize directory update (e.g. prefetch
    indexed data)
  • Modular query actions
  • 1) Re-use query protocol decoding
  • 2) Specialize query handling (e.g. utilize
    precomputed indices)
  • Provide caching proxy as part of release
  • Send a request to index, collect info and cache
    it locally so next time a faster response

48
Globus MDS-2
  • Service scales with Grid growth
  • Loose consistency model tolerates failures
  • Interoperability by protocols

49
More Information
  • MDS-2
  • Distributed information service
  • HPDC 2001 paper: "Grid Information Services for
    Distributed Resource Sharing"
  • MDS 2.2
  • Refined protocols, security
  • Fully extensible implementation
  • http://www.globus.org/mds

50
Overview
  • Introduction to Grids
  • The opportunity
  • Some definitions
  • The Globus Toolkit Core Services
  • Grid security infrastructure
  • Resource management
  • Information infrastructure
  • Data management services
  • Recap and conclusions

51
Data Management Services
  • Data transfer and access
  • GASS: Provides services mainly intended for use
    with GRAM (file staging, I/O redirection)
  • GridFTP: Provides high-performance, reliable data
    transfer for modern WANs
  • Higher Level Data Services (not today)
  • Replica Location Service: Provides a distributed
    catalog service for keeping track of replicated
    datasets (Joint work with EDG)
  • Replica Management: Provides services for
    creating and managing replicated datasets
  • Chimera Virtual Data Service: keeps track of
    provenance of data sets

52
GASS: Remote I/O and Staging
  • Tell GRAM to pull executable from remote location
  • Access files from a remote location
  • stdin/stdout/stderr from a remote location

53
What is GASS? Global Access to Secondary Storage
  • (a) GASS file access API
  • Replace open/close with globus_gass_open/close;
    read/write calls can then proceed directly
  • (b) RSL extensions
  • URLs used to name executables, stdout, stderr
  • (c) Remote cache management utility
  • (d) Low-level APIs for specialized behaviors

54
Example GASS Applications
  • On-demand, transparent loading of data sets
  • Caching of (small) data sets
  • Automatic staging of code and data to remote
    supercomputers
  • GridFTP better suited to staging of large data
    sets
  • (Near) real-time logging of application output to
    remote server

55
GASS summary
  • Simple service for small file transfers
  • Used by globusrun for automatic staging of code
    and data to remote supercomputers
  • (Near) real-time logging of application output to
    remote server
  • GridFTP better suited to staging of large data
    sets

56
GridFTP Basic Approach
  • FTP is defined by several IETF RFCs
  • Start with most commonly used subset
  • Standard FTP get/put etc., 3rd-party transfer
  • Implement standard but often unused features
  • GSS binding, extended directory listing, simple
    restart
  • Extend in various ways, while preserving
    interoperability with existing servers
  • Parameter set/negotiate, parallel transfers
    (multiple TCP streams), striped transfers
    (multiple hosts), partial file transfers,
    automatic and manual TCP buffer setting, progress
    monitoring, extended restart (via plug-ins)
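
A minimal sketch of the arithmetic behind parallel and partial transfers: a file is divided into byte ranges so that each stream (or each partial request) carries one piece. `split_ranges` is a hypothetical helper; how GridFTP actually distributes blocks across streams on the wire is not shown here.

```python
def split_ranges(file_size: int, streams: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover the file exactly once."""
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams >= 1")
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams each carry one additional byte.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


pieces = split_ranges(10, 4)
```

Each `(offset, length)` pair is also exactly what a partial file transfer needs, so the same helper describes both features.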

57
GridFTP Implementation Status
  • Modified wu-ftpd server
  • Most features
  • Modified ncftp client
  • Security, TCP buffer setting
  • Modified HPSS Unitree ftpd server
  • Security
  • Globus Toolkit client and server SDKs, and
    command line tools
  • Most features
  • Prototype striped FTP server (aka DPSS2)

58
GridFTP at SC2000: Long-Running Dallas-Chicago
Transfer
[Figure: transfer-rate plot over time, annotated with: SciNet power failure;
other demos starting up (congestion); parallelism increases (demos); DNS
problems; backbone problems on the SC floor; transitions between files (not
zero due to averaging)]
59
A Model Architecture for Data Grids
[Figure: the application sends an attribute specification to the Metadata
Catalog and gets back a logical collection and logical file name. The Replica
Catalog resolves that name to multiple locations. Replica Selection uses
performance information and predictions from MDS and NWS to choose the
selected replica, which the application fetches with GridFTP commands from
one of Replica Locations 1 to 3 (disk caches, a tape library, a disk array).]
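
A minimal sketch of the replica-selection step in this architecture: given the locations a catalog returns for a logical file name and a per-host bandwidth prediction, pick the location expected to be fastest. The catalog contents, host names, `gsiftp://` URLs, and prediction values are all hypothetical, and "highest predicted bandwidth wins" is one simple policy I chose for illustration.

```python
from urllib.parse import urlparse


def select_replica(
    replica_catalog: dict[str, list[str]],
    predicted_mbps: dict[str, float],
    logical_name: str,
) -> str:
    """Return the replica URL whose host has the best predicted bandwidth."""
    locations = replica_catalog.get(logical_name, [])
    if not locations:
        raise LookupError(f"no replicas registered for {logical_name!r}")

    def score(url: str) -> float:
        host = urlparse(url).hostname or ""
        return predicted_mbps.get(host, 0.0)  # unknown hosts rank last

    return max(locations, key=score)


# Hypothetical catalog and predictions.
catalog = {
    "run42/events.dat": [
        "gsiftp://cache.site-a.example.org/data/events.dat",
        "gsiftp://tape.site-b.example.org/archive/events.dat",
        "gsiftp://array.site-c.example.org/store/events.dat",
    ]
}
predictions = {
    "cache.site-a.example.org": 40.0,
    "tape.site-b.example.org": 5.0,
    "array.site-c.example.org": 85.0,
}
chosen = select_replica(catalog, predictions, "run42/events.dat")
```

Separating the catalog lookup from the scoring function mirrors the figure: the catalog only knows where copies are, while the information services supply the performance data used to rank them.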
60
Data Management Summary
  • Data transfer and access
  • GASS: Provides services mainly intended for use
    with GRAM (file staging, I/O redirection)
  • GridFTP: Provides high-performance, reliable data
    transfer for modern WANs
  • Higher level replica and data tracking services
  • Current and upcoming work
  • Reliable file transfer

61
The Globus Toolkit v2 in One Slide
  • Grid protocols (GSI, GRAM, ...) enable resource
    sharing within virtual orgs; toolkit provides
    reference implementation (Globus Toolkit
    services)
  • Protocols (and APIs) enable other tools and
    services for membership, discovery, data mgmt,
    workflow, ...

62
Recap and Conclusions
63
Globus Toolkit: Components Include
  • Core protocols and services
  • Grid Security Infrastructure
  • Grid Resource Allocation and Management
  • MDS information and monitoring
  • GridFTP data access and transfer

64
The Grid World: Current Status
  • Dozens of major Grid projects in scientific and
    technical computing/research and education
  • Considerable consensus on key concepts and
    technologies
  • Open source Globus Toolkit is a de facto standard
    for major protocols and services
  • Far from complete or perfect, but out there,
    evolving rapidly, and large tool/user base
  • Industrial interest emerging rapidly
  • Opportunity: convergence of eScience and
    eBusiness requirements and technologies

65
Globus Toolkit
  • Globus Toolkit is the source of many of the
    protocols described in Grid architecture
  • Adopted by almost all major Grid projects
    worldwide as a source of infrastructure
  • Open source, open architecture framework
    encourages community development
  • Active R&D program continues to move technology
    forward
  • Developers at ANL, USC/ISI, NCSA, LBNL, and other
    institutions

www.globus.org
66
Globus Toolkit 2 Evaluation (+)
  • Good technical solutions for key problems, e.g.
  • Authentication and authorization
  • Resource discovery and monitoring
  • Reliable remote service invocation
  • High-performance remote data access
  • This good engineering is enabling progress
  • Good quality reference implementation,
    multi-language support, interfaces to many
    systems, large user base, industrial support
  • Growing community code base built on tools

67
Globus Toolkit 2 Evaluation (-)
  • Protocol deficiencies, e.g.
  • Heterogeneous basis: HTTP, LDAP, FTP
  • No standard means of invocation, notification,
    error propagation, authorization, termination, ...
  • Significant missing functionality, e.g.
  • Databases, sensors, instruments, workflow, ...
  • Virtualization of end systems (hosting envs.)
  • Little work on total system properties, e.g.
  • Dependability, end-to-end QoS, ...
  • Reasoning about system properties

68
Globus Toolkit v3
  • GT3 provides online negotiation of access to
    services in a standard way, based on OGSA
    specification for grid services
  • GT3 enables the creation of dynamic, extensible
    systems
  • GT3 embraces state-of-the art protocols and
    leverages community standards
  • GT3 is evolutionary, not revolutionary
  • Not changing higher-level functionality, changing
    protocols

69
Acknowledgments
  • Globus Project
  • Ian Foster, Steve Tuecke @ ANL
  • Carl Kesselman @ USC/ISI
  • The talented team of scientists and engineers at
    ANL, USC/ISI, elsewhere (see http://www.globus.org)
  • Support from DOE, NASA, NSF, IBM, Microsoft

70
Further Information
  • My email: Jennifer Schopf (jms@mcs.anl.gov)
  • GT2
  • General information at www.globus.org
  • Technical discussion list: discuss@globus.org
  • Related Publications
  • The Grid: A New Infrastructure for 21st Century
    Science
  • Anatomy of the Grid: Foster, Kesselman, Tuecke
  • Proposes an abstract architecture in which
    intergrid protocols enable interoperability among
    different grids
  • Physiology of the Grid: Foster, Kesselman, Nick,
    Tuecke
  • Introduces the concept of an Open Grid Services
    Architecture
  • Technical Papers
  • www.globus.org/research/papers.html