Title: Grid Technologies Research and Development
1 Grid Technologies Research and Development
- Ian Foster
- Argonne National Laboratory
- The University of Chicago
2 Credits
- Globus project Co-PI: Carl Kesselman, USC
- Globus researchers and developers at ANL, USC/ISI, NCSA, and elsewhere
- Steve Tuecke, Randy Butler, Steve Fitzgerald, Brian Toonen, Gregor von Laszewski, and many others
- Research supported by DARPA, DOE, NSF, NASA; equipment from Cisco Systems
3 Grid Services Architecture: An Emerging Grid Computing Framework
[Figure: layered architecture, top to bottom]
- Applications: a rich variety of applications ...
- Application Toolkits: remote data toolkit, remote sensors toolkit, async. collab. toolkit, remote viz toolkit, remote comp. toolkit, ...
- Grid Services: protocols, authentication, policy, resource management, instrumentation, discovery, etc.
- Grid Fabric: archives, networks, computers, display devices, etc.; associated local services
4 Overview
- Why Grid Services?
- Review of existing Grid services
- Security
- Information/directory
- Resource management
- Data access
- Our current research focus areas
- Grid Forum and b-Grid project
5 Creating a Usable Grid: Grid Services (Middleware)
- Standard grid services that
- Provide uniform, high-level access to a wide range of resources (including networks)
- Address interdomain issues of security, policy, etc.
- Permit application-level management and monitoring of end-to-end performance
- Middleware-level and higher-level APIs and tools targeted at application programmers
- Map between application and Grid
6 The Challenge of Heterogeneity
- Group
- Institutions, people, policies
- Resources
- Hardware: computers, archives, networks, ...
- Interface
- Software, mechanisms
- Distance
- Local, campus, metropolitan, wide area
- Scale
- Single CPU, cluster, supercomputer, ...
7 Grid Services Approach
- Define and deploy standard Grid services that encapsulate heterogeneity
- Simple: Cost of joining Grid is low
- Noncoercive: Sites retain local control
- Uniform: Cost of using Grid is low
- Use a Grid information service to represent structure and status of Grid elements
- Resource discovery
- Application configuration and optimization
- Build Grid-enabled tools to enable applications
8 Grid Services
- Security: authentication, authorization
- Information: publication, delivery
- Resource management: reservation, allocation, monitoring, control
- Data: data access, replica management, metadata access
- Fault detection, executable management, accounting, others
9 Grid Services (1): Grid Security Infrastructure
- Define uniform authentication and authorization mechanisms that allow cooperating sites to accept credentials while retaining local control
- Benefit: Only one A/A infrastructure needs to be maintained at each site; enables inter-site resource sharing and interoperability
- Requires
- Authentication/authorization standards
- Certification authority policies
10 Authentication
- Grid Security Infrastructure
- Single sign-on via global credential, PKI mechanisms, mapping to local credentials
- Delegation
- No plaintext passwords
- Retains local control over policy
- Deployed across PACI and NASA sites
- GSS-API binding, used by ssh, SecureCRT, gsiftp, Globus, Condor, others
- GAA (Generic Authorization and Access Control) interface provides hooks for policy
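The "mapping to local credentials" step can be pictured with a minimal sketch. It assumes the global identity has already been authenticated and only shows the site-controlled lookup that follows. The subject names, account names, and the function `map_to_local_account` are hypothetical and are not part of any Globus API.

```python
# Hypothetical site-local map from global (certificate) identities to local accounts.
# The subject names and account names below are made up for illustration.
LOCAL_ACCOUNT_MAP = {
    "/O=Grid/OU=Example/CN=Alice Example": "alice",
    "/O=Grid/OU=Example/CN=Bob Example": "bobx",
}


def map_to_local_account(global_subject, account_map=LOCAL_ACCOUNT_MAP):
    """Return the local account for an already-authenticated global identity.

    Raises PermissionError when the site has not authorized the identity,
    which is how the site keeps local control over who may use it.
    """
    try:
        return account_map[global_subject]
    except KeyError:
        raise PermissionError(f"no local account mapped for {global_subject!r}")
```

Because the map lives at the site, removing an entry revokes access there without touching the user's global credential.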
11 Security Architecture
[Figure: security architecture; entities shown: User, User Proxy, Host; credentials labeled Crp and Cp]
- Protocol 1: user proxy creation
- Protocol 2: resource allocation
- Protocol 3: resource allocation from a process
12 Grid Services (2): Grid Information Service
- Effective resource use predicated on knowledge of system components
- Publish structure and state info, dynamic performance info, software info, etc.
- Selection and scheduling of resources
- Resource discovery: "find me an X with property Y available at time T"
- Auto-configuration: "tell me what I need to know to use A efficiently/securely/..."
- Gateways to other data sources required
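The discovery query "find me an X with property Y available at time T" can be sketched against an in-memory directory. This is only an illustration of the query shape: the `Resource` record, its fields, and the sample entries are assumptions, and a real information service would answer the query over a network protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    kind: str                                   # the "X", e.g. "computer"
    properties: dict = field(default_factory=dict)
    busy: list = field(default_factory=list)    # (start, end) intervals, half-open


def is_free(resource, t):
    """True if no busy interval [start, end) contains time t."""
    return all(not (start <= t < end) for start, end in resource.busy)


def discover(directory, kind, prop, value, t):
    """Names of resources of the given kind whose property equals value and that are free at t."""
    return [
        r.name
        for r in directory
        if r.kind == kind and r.properties.get(prop) == value and is_free(r, t)
    ]


# Hypothetical directory contents.
directory = [
    Resource("hostA", "computer", {"os": "linux"}, busy=[(10, 20)]),
    Resource("hostB", "computer", {"os": "linux"}),
    Resource("archive1", "archive", {"os": "linux"}),
]
```

For example, `discover(directory, "computer", "os", "linux", 15)` skips `hostA` because it is busy at time 15, and skips `archive1` because it is not a computer.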
13 Information Services: Technical Approaches
- Infrastructure based on common protocols
- LDAP as unifying communication protocol
- Gateways to alternative information sources and organization
- Research questions include
- Unifying metadata representation
- How to support range of access modes
- Scalability of collection and publication methods
- Index methods and discovery
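With LDAP as the common protocol, a discovery request becomes an LDAP search filter. The sketch below builds a conjunctive filter string in the standard parenthesized prefix form and escapes special characters in values. The helper names and the sample class name `computeResource` are hypothetical; the slides do not specify a schema.

```python
def escape_filter_value(value):
    """Escape characters that are special inside an LDAP search filter value."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in str(value))


def and_filter(**equals):
    """Build an LDAP AND filter from attribute=value equality tests."""
    clauses = "".join(
        f"({attr}={escape_filter_value(val)})" for attr, val in equals.items()
    )
    return f"(&{clauses})"
```

A call such as `and_filter(objectClass="computeResource", os="linux")` yields a single string that any LDAP client could pass as its search filter.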
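14 Distributed Information Services
[Figure: hierarchy of distributed information servers; labels as follows]
- Root servers; referral server; replicated servers (mds.globus.org:389)
- Organization servers: NCSA, NASA, DOE, NPACI
- Information sources: Remos, NWS, SNMP
- Index server(s)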
16 Grid Services (3): Resource Management
- Issues include
- Locating and selecting resources
- Allocating resources
- Authentication, process creation
- Other activities required to prepare a resource for use; monitoring, control
- End-to-end management/co-allocation
- Diverse resources: CPU, disk, network
- Reservation
17 Resource Management Services
- Globus Resource Allocation Manager (GRAM)
- Uniform interface to resource management
- Integration with security, policy
- Co-allocation services
- Coordinated allocation across multiple resources
- Globus Architecture for Reservation and Allocation (GARA)
- Network and CPU quality of service
- Immediate and advance reservations
- Resource brokers, e.g., Condor
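Coordinated allocation across multiple resources is often treated as all-or-nothing: either every resource is granted or none is held. The sketch below shows that idea with a toy `Pool` class. It is an assumed illustration, not the Globus co-allocation service, and it ignores concurrency and timeouts.

```python
class Pool:
    """A toy resource with a fixed number of free units (hypothetical, not GRAM)."""

    def __init__(self, name, free):
        self.name = name
        self.free = free

    def allocate(self, count):
        """Grant count units if available; report success as a boolean."""
        if count > self.free:
            return False
        self.free -= count
        return True

    def release(self, count):
        self.free += count


def co_allocate(requests):
    """Allocate every (pool, count) request, or none of them.

    If any single allocation fails, everything granted so far is released.
    """
    granted = []
    for pool, count in requests:
        if pool.allocate(count):
            granted.append((pool, count))
        else:
            for done_pool, done_count in granted:
                done_pool.release(done_count)
            return False
    return True
```

Rolling back on the first failure keeps a partially satisfied request from holding nodes at one site while waiting indefinitely for another.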
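18 Resource Management Architecture
[Figure: resource management architecture; labels as follows]
- Info service (location, selection): Metacomputing Directory Service
- Resource Broker: "What computers?" "What speed?" "When available?"
- Example requests: "50 processors + storage from 10:20 to 10:40 pm"; "20 Mb/sec"
- GRAM: Globus Resource Allocation Managers
- Local resource managers: Fork, LSF, EASY-LL, Condor, etc.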
19 Local Resource Management
[Figure: GRAM components on either side of the site boundary; labels as follows]
- GRAM Client: MDS client API calls to locate resources; GRAM client API calls to request resource allocation and process creation
- MDS: updated with resource state information
- GRAM Reporter: queries current status of resource
- Gatekeeper: receives request; authentication via Globus Security Infrastructure; creates Job Manager
- Job Manager: parses request using RSL Library; monitors and controls processes
- Local Resource Manager: allocates and creates processes
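The "Parse" step turns a request string into attribute/value pairs before anything is allocated. The sketch below assumes a flat conjunction of the form `&(attribute=value)(attribute=value)`; the slide does not show the request syntax, so this form and the function name `parse_simple_request` are assumptions, and the code is not the RSL Library itself.

```python
import re

# One "(attribute=value)" relation; values may not contain parentheses here.
_RELATION = r"\(\s*([^=()\s]+)\s*=\s*([^()]*?)\s*\)"


def parse_simple_request(text):
    """Parse '&(executable=/bin/hostname)(count=4)' into a dict of strings.

    Only flat attribute=value relations are handled; nested or quoted
    forms are out of scope for this sketch.
    """
    text = text.strip()
    if not text.startswith("&"):
        raise ValueError("expected a conjunction starting with '&'")
    body = text[1:]
    relations = re.findall(_RELATION, body)
    # Anything left after removing well-formed relations is a syntax error.
    leftover = re.sub(_RELATION, "", body).strip()
    if leftover or not relations:
        raise ValueError(f"cannot parse request: {text!r}")
    return dict(relations)
```

Values are returned as strings; converting `count` to an integer or validating the executable path would be a job for the component that consumes the parsed request.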
20 Advanced Resource Management
- Provide end-to-end Quality of Service to applications. This requires
- Discovery and selection of resources
- Allocation of resources
- Advance reservation of resources
[Figure: workstations and supercomputers]
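Advance reservation amounts to an admission test: a request for some amount of a resource over a future time window is accepted only if capacity is never exceeded during that window. The sketch below is a toy single-resource model under that assumption; the class `ReservationTable` is hypothetical and is not the GARA interface.

```python
class ReservationTable:
    """Tracks reservations on one resource with a fixed capacity (toy model, not GARA)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reservations = []  # (start, end, amount), half-open [start, end)

    def peak_usage(self, start, end):
        """Highest total amount reserved at any instant in [start, end)."""
        # Usage only rises at reservation start times, so checking the window
        # start and each reservation start inside the window is sufficient.
        points = [start] + [s for s, _, _ in self.reservations if start < s < end]
        return max(
            sum(a for s, e, a in self.reservations if s <= p < e) for p in points
        )

    def reserve(self, start, end, amount):
        """Admit the reservation only if capacity holds over the whole window."""
        if start >= end or amount <= 0:
            raise ValueError("need start < end and a positive amount")
        if self.peak_usage(start, end) + amount > self.capacity:
            return False
        self.reservations.append((start, end, amount))
        return True
```

An immediate reservation is the special case where `start` is the current time, so the same test covers both kinds.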
21 GARA and Differentiated Services
[Figure labels: Server; Client (GARA API); two Diffserv Resource Managers]
22 Scheduling Bulk Transfer and High-Priority Transfers
23 Integrated Policy Management
- Required to control reservation and scheduling
- Determine who can do what to whom
- Integral part of resource management
- Resource -> application, application -> resource
- Next step after authentication
- Need to integrate with and augment existing approaches
- Access control lists, capabilities, usage certificates
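"Who can do what to whom" maps naturally onto an access control list of (subject, action, resource) entries. The sketch below shows a default-deny check over such a list. All names in it are hypothetical, and capabilities and usage certificates are not modeled.

```python
# Each entry grants one subject one action on one resource.
# All names are hypothetical examples.
ACL = {
    ("alice", "reserve", "cluster1"),
    ("alice", "submit", "cluster1"),
    ("bob", "submit", "cluster1"),
}


def is_permitted(subject, action, resource, acl=ACL):
    """Default-deny check: allowed only if an explicit entry exists."""
    return (subject, action, resource) in acl
```

The check runs after authentication: the subject name passed in is assumed to be an identity the security layer has already verified.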
24 Policy: Technical Approaches
- Single API to alternative mechanisms
- Similar to security infrastructure
- Integration with Globus security model and Globus resource management components
- Basic policy mechanism in current system
- Research questions
- Reusable policy structures for resource specification/management
- Policy-aware resource discovery/scheduling
25 Grid Services (4): Storage and I/O Services
- Access to remote data (GASS)
- Uniform access to diverse storage management systems
- Cache management
- High-speed, secure transport: gsiftp
- Integration with metadata storage systems
- Communication (Nexus, GlobusIO)
- Application-level interfaces to comm services
- Multiple methods: reliable/unreliable, IP/other, unicast/multicast
- Quality of service interfaces
27 Current Technology Focus Areas
- Advanced resource management techniques
- GARA: Globus Architecture for Reservation and Allocation
- High-end data-intensive applications
- Data Grid
- Interfaces to commodity technologies
- CoG Kit: Commodity Grid Toolkits
- Distance visualization
- NOVA: Network Optimized Visualization Arch.
- With supporting work on info/instr., policy, accounting, authentication/authorization, etc.
28 The Grid Forum: http://www.gridforum.org
- IETF-like community forum for discussion and definition of Grid infrastructure
- First two meetings (June 16-18, Oct 18-20) attracted 150 people
- 9 working groups established in security, information infrastructure, resource management, accounting, etc.
- Next mtg: San Diego, March 22-24, 2000
- See also European Grid Forum
- www.egrid.org
29 b-Grid (Broadband Experimental Terascale Access)
- A proposal to NSF to plan (and build) a national infrastructure for computer systems research
- dedicated to research
- of a scale that permits realistic experimentation
- of a scale that encourages participation by adventurous applications groups
- a place for computer and application scientists to tackle problems together
- Initial plan is for O(20) Linux clusters, each with O(30) nodes, O(2 TB) disk, Gb/s network
- http://dsl.cs.uchicago.edu/beta
30 Summary: Where We Are
- Solid technology base for security, resource management, information services
- Globus v1.1 completed, with all core services complete, robust, and documented
- Many tool projects are leveraging this considerable investment in infrastructure
- Substantial deployment activities and application experiments
- New R&D in commodity grids, resource management, distance viz, data grids
- http://www.globus.org
31 Case Study 1: Online Instrumentation
[Figure labels: Advanced Photon Source; wide-area dissemination; desktop VR clients with shared controls; real-time collection; archival storage; tomographic reconstruction]
DOE X-ray source grand challenge: ANL, USC/ISI, NIST, U.Chicago
32 Case Study 2: Distributed Supercomputing
- Starting point: SF-Express parallel simulation code
- Globus mechanisms for
- Resource allocation
- Distributed startup
- I/O and configuration
- Fault detection
- 100K vehicles (2002 goal) using 13 computers, 1386 nodes, 9 sites
[Figure labels: NCSA Origin; Caltech Exemplar; CEWES SP; Maui SP]
SF-Express Distributed Interactive Simulation: Caltech, USC/ISI
33 OVERFLOW with latency-tolerant algorithms
- MPICH-G: Grid-enabled message passing
- Globus services: security, directory, scheduling, process mgmt, communication
- ARC SGI O2000 (California)
- Argonne SGI O2000 (Illinois)
OVERFLOW simulation: NASA Ames
34 Case Study 3: Collaborative Engineering
- Manipulate shared virtual space, with
- Simulation components
- Multiple flows: Control, Text, Video, Audio, Database, Simulation, Tracking, Haptics, Rendering
- Uses Globus comms: (un)reliable uni/multicast
- Future: Security, QoS, allocation, reservation
CAVERNsoft: UIC Electronic Visualization Laboratory
35 Case Study 4: High-Throughput Computing
- Schedule many independent tasks (e.g., parameter study)
- Uses Globus security, discovery, data access, scheduling
- Future: Reservation, accounting, code management, etc.
[Figure labels: Deadline; Cost; Available Machines]
Nimrod-G: Monash University
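A parameter study expands a set of parameter ranges into many independent tasks, which can then be spread over whatever machines are available. The sketch below shows that expansion with a simple round-robin assignment. It is an assumed illustration and does not reproduce Nimrod-G's deadline- and cost-driven scheduling; the function names and sample parameters are hypothetical.

```python
from itertools import product


def parameter_study(parameters):
    """Expand {name: [values]} into one independent task per combination."""
    names = sorted(parameters)
    return [
        dict(zip(names, combo))
        for combo in product(*(parameters[n] for n in names))
    ]


def assign_round_robin(tasks, machines):
    """Spread independent tasks across the available machines in turn."""
    if not machines:
        raise ValueError("no machines available")
    schedule = {m: [] for m in machines}
    for i, task in enumerate(tasks):
        schedule[machines[i % len(machines)]].append(task)
    return schedule
```

Because the tasks share no state, any assignment is valid; a deadline- or cost-aware scheduler would only change which machine each task lands on.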
36 Case Study 5: Problem Solving Environment
- Problem solving environment for comp. chemistry
- Globus services used for authentication, remote job submission, monitoring, and control
- Future: distributed data archive, resource discovery, charging
ECCE: Pacific Northwest National Laboratory