Title: Globus and PPDG
1 Globus and PPDG
- Jennifer Schopf
- Argonne National Lab
2 Outline
- Overview of what Globus is about
- What's new
  - New users (many!), packaging, Condor-G, GridFTP, replica catalog, replica management
- Review of PPDG-related activities
- Future plans
  - CAS, GRAM-2
3 The Globus Project in a Nutshell
- Core group at ANL, UC, USC/ISI
- Close collaborators who contribute code
  - Condor @ UW, LBNL, NCSA
- Large community who submit ports, bug fixes, etc., and use the code
- Support: DARPA, DOE, NASA, NSF, Microsoft
- We do:
  - Research (design, prototype, evaluate new Grid tech)
  - Development (protocols, services, APIs, SDKs, tools)
  - Deployment (work with groups to deploy Grid infrastructure)
  - Applications (the driver for all of the above)
4 Layered Grid Architecture (by Analogy to Internet Architecture)
5 Grid Services Architecture: Fabric Layer Protocols and Services
- Just what you would expect: the diverse mix of resources that may be shared
  - Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc.
- Few constraints on low-level technology: connectivity and resource-level protocols form the neck in the hourglass
- Defined by interfaces, not physical characteristics
6 Grid Services Architecture: Connectivity Layer Protocols and Services
- Communication
  - Internet protocols: IP, DNS, routing, etc.
- Security: Grid Security Infrastructure (GSI)
  - Uniform authentication and authorization mechanisms in a multi-institutional setting
  - Single sign-on, delegation, identity mapping
  - Public key technology, SSL, X.509, GSS-API
  - Supporting infrastructure: Certificate Authorities, key management, etc.
GSI: www.globus.org
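Identity mapping ties a Grid identity (the subject DN of an X.509 certificate) to a local account. A minimal sketch, assuming the usual Globus grid-mapfile layout of one quoted DN followed by one or more comma-separated local account names per line, and assuming the first listed account is the default. The DNs and account names are hypothetical, and real GSI also verifies the certificate chain, which is not shown.

```python
import shlex
from typing import Optional


def parse_grid_mapfile(text: str) -> dict[str, list[str]]:
    """Map certificate subject DNs to local account names (assumed grid-mapfile layout)."""
    mapping: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex keeps a double-quoted DN containing spaces as a single token
        tokens = shlex.split(line)
        if len(tokens) < 2:
            continue  # malformed entry: DN without a local account
        dn = tokens[0]
        accounts = [a.strip() for a in ",".join(tokens[1:]).split(",") if a.strip()]
        mapping[dn] = accounts
    return mapping


def map_identity(mapping: dict[str, list[str]], dn: str) -> Optional[str]:
    """Return the default (first-listed) local account for a DN, or None if unmapped."""
    accounts = mapping.get(dn)
    return accounts[0] if accounts else None
```

Mapping happens after authentication, so a site can keep its existing Unix accounts and file permissions while users sign on once with a single Grid credential.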
7 Grid Services Architecture: Resource Layer Protocols and Services
- Grid Resource Access and Management (GRAM)
  - Remote allocation, reservation, monitoring, and control of compute resources
- GridFTP protocol (FTP extensions)
  - High-performance data access and transport
- Grid Resource Information Service (GRIS)
  - Access to structure and state information
- Network reservation, monitoring, control
- All integrated with GSI: authentication, authorization, policy, delegation
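GRAM requests are written in the Resource Specification Language (RSL). The sketch below only assembles a request string from keyword arguments; `executable`, `arguments`, and `count` are standard GRAM attributes, while the helper names, the choice to quote every value, and the doubled-quote escaping rule are assumptions of this example. Submitting the request (for example with `globusrun`) is not shown.

```python
from typing import Sequence, Union

Value = Union[str, int, Sequence[Union[str, int]]]


def rsl_quote(value: object) -> str:
    """Quote one RSL value; an embedded double quote is written twice (assumed rule)."""
    text = str(value)
    return '"' + text.replace('"', '""') + '"'


def build_rsl(**attributes: Value) -> str:
    """Build a conjunctive RSL request such as &(executable="/bin/date")(count="1")."""
    relations = []
    for name, value in attributes.items():
        # A list/tuple becomes a space-separated sequence of quoted values
        values = value if isinstance(value, (list, tuple)) else [value]
        relations.append(f"({name}={' '.join(rsl_quote(v) for v in values)})")
    return "&" + "".join(relations)
```

For example, `build_rsl(executable="/bin/ls", arguments=["-l", "/tmp"], count=2)` yields `&(executable="/bin/ls")(arguments="-l" "/tmp")(count="2")`.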
8 Grid Services Architecture: Collective Layer Protocols and Services
- Index servers, aka metadirectory services
  - Custom views on dynamic resource collections assembled by a community
- Resource brokers (e.g., Condor Matchmaker)
  - Resource discovery and allocation
- Replica catalogs
- Co-reservation and co-allocation services
- Etc., etc.
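A resource broker in the style of the Condor Matchmaker pairs requests with resources by evaluating each side's requirements against the other and preferring the best-ranked candidate. A minimal sketch of that idea, assuming plain Python dicts and callables as a stand-in for ClassAd expressions (the ClassAd language itself is not modeled); attribute names such as `os`, `memory`, and `mips` are hypothetical.

```python
from typing import Any, Optional

# An "ad" is a dict of attributes plus two callables (a stand-in for ClassAd expressions):
#   "requirements": other_ad -> bool   (must hold for a match)
#   "rank":         other_ad -> float  (higher is preferred)
Ad = dict[str, Any]


def compatible(a: Ad, b: Ad) -> bool:
    """Matchmaking is symmetric: each side's requirements must accept the other."""
    return bool(a["requirements"](b)) and bool(b["requirements"](a))


def best_match(job: Ad, machines: list[Ad]) -> Optional[Ad]:
    """Return the compatible machine the job ranks highest, or None if none match."""
    candidates = [m for m in machines if compatible(job, m)]
    return max(candidates, key=job["rank"], default=None)
```

The symmetric check matters: a machine owner's policy (for example, accepting only small jobs) can veto a match even when the job would happily run there.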
9 Globus Toolkit Components
- Core protocols and services
  - Grid Security Infrastructure
  - Grid Resource Access and Management
  - MDS information and monitoring
  - GridFTP data access and transfer
- Other services
  - Community Authorization Service
  - DUROC co-allocation service
- Other Data Grid technologies
  - Replica catalog, replica management service
10 What's New
- Lots of new users
- Condor-G (next talk)
- Packaging
- GridFTP
- Replica catalog
- Replica manager
11 Globus Applications and Deployments
- Application projects include
  - GriPhyN, PPDG, NEES, EU DataGrid, ESG, Fusion Collaboratory, etc., etc.
- Infrastructure deployments include
  - DISCOM, NASA IPG, NSF TeraGrid, DOE Science Grid, EU DataGrid, etc., etc.
  - UK Grid Center, U.S. GRIDS Center
- Technology projects include
  - Data Grids, Access Grid, Portals, CORBA, MPICH-G2, Condor-G, GrADS, etc., etc.
12 Packaging Work
- Enables modular binary and source distributions
- NCSA (Bletzinger, Blau) has developed the Grid Packaging Toolkit (GPT) for packaging Grid software
  - Set of scripts and Perl modules which manage packaging metadata
  - Includes scripts that package software
  - A package: a set of files and metadata that allows the files to be managed as a unit
  - Metadata stored in ASCII files encoded via XML
- ANL and ISI have worked closely with NCSA on applying GPT to the Globus Toolkit
  - A (very large) set of Globus Toolkit packages
  - Convenience scripts for building and installing
  - Alpha 1, 2, and 3 users have tested it and provided feedback
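GPT's actual file format is not reproduced here; the element names (`package`, `depends`, `file`) and the package names are hypothetical. The sketch only illustrates the general idea of XML-encoded package metadata: each package lists its files and, as an assumption of this example, the packages it depends on, so a convenience script can derive a build or install order.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # Python 3.9+


def parse_package(xml_text: str) -> dict:
    """Read one package's metadata from a hypothetical XML layout."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "version": root.get("version"),
        "depends": [d.get("on") for d in root.findall("depends")],
        "files": [(f.text or "").strip() for f in root.findall("file")],
    }


def build_order(packages: list[dict]) -> list[str]:
    """Order packages so that every dependency is built before its dependents."""
    graph = {p["name"]: set(p["depends"]) for p in packages}
    return list(TopologicalSorter(graph).static_order())
```

Keeping the file list with the package is what lets a tool install, upgrade, or remove the files as a unit.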
13 GridFTP Basic Approach
- FTP is defined by several IETF RFCs
- Start with the most commonly used subset
  - Standard FTP: get/put etc., 3rd-party transfer
- Implement standard but often unused features
  - GSS binding, extended directory listing, simple restart
- Extend in various ways, while preserving interoperability with existing servers
  - Parameter set/negotiate, parallel transfers (multiple TCP streams), striped transfers (multiple hosts), partial file transfers, automatic and manual TCP buffer setting, progress monitoring, extended restart (via plug-ins)
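GridFTP implements these extensions in the protocol itself (for example, parallel streams via extended block mode, and restart markers); the sketch does not reproduce those wire formats. It only illustrates the bookkeeping behind three of the listed features: splitting a file into byte ranges for parallel or striped streams, working out which ranges a restart must re-fetch after a partial transfer, and sizing TCP buffers from the bandwidth-delay product. Dividing that product evenly across streams is a simple heuristic assumed here, not GridFTP's tuning algorithm.

```python
import math


def partition(size: int, streams: int) -> list[tuple[int, int]]:
    """Split bytes [0, size) into up to `streams` contiguous half-open ranges."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(size, streams)
    ranges, start = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        if length:
            ranges.append((start, start + length))
        start += length
    return ranges


def missing_ranges(size: int, received: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Given byte ranges already received, return what a restart still has to fetch."""
    gaps, cursor = [], 0
    for start, end in sorted(received):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < size:
        gaps.append((cursor, size))
    return gaps


def tcp_buffer_bytes(bandwidth_bps: float, rtt_s: float, streams: int = 1) -> int:
    """Per-stream buffer from the bandwidth-delay product (heuristic: split evenly)."""
    return math.ceil(bandwidth_bps * rtt_s / 8 / streams)
```

For a 100 Mb/s path with a 50 ms round trip, `tcp_buffer_bytes(100e6, 0.05)` gives 625000 bytes for one stream, or 156250 bytes each across four streams.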
14 Our Approach to Replica Management
- Identify replica cataloging and reliable replication as two fundamental services
  - Layer on other Grid services: GSI, transport, information service
  - Use LDAP as catalog format and protocol, for consistency
  - Use as a building block for other tools
- Advantage
  - These services can be used in a wide variety of situations
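Using LDAP as the catalog protocol means lookups are expressed as LDAP search filters, and special characters in a file name must be escaped: RFC 2254 (later RFC 4515) writes `*`, `(`, `)`, the backslash, and NUL as a backslash plus two hex digits. The sketch builds such a filter; the object class `replicaLogicalFile` and the attribute `filename` are hypothetical, not the Globus replica catalog schema.

```python
# Characters that must be escaped inside an LDAP filter assertion value
_FILTER_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape one assertion value, character by character (so nothing is escaped twice)."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def logical_file_filter(filename: str) -> str:
    """Filter for one logical file entry (hypothetical object class and attribute names)."""
    return f"(&(objectClass=replicaLogicalFile)(filename={escape_filter_value(filename)}))"
```

Without escaping, a file name such as `run(1)*.dat` would change the structure of the filter, and the `*` would silently turn an exact lookup into a wildcard search.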
15 Replica Manager Components
- Replica catalog definition
  - LDAP object classes for representing logical-to-physical mappings in an LDAP catalog
- Low-level replica catalog API
  - globus_replica_catalog library
  - Manipulates replica catalog: add, delete, etc.
- High-level reliable replication API
  - globus_replica_manager library
  - Combines calls to file transfer operations and calls to low-level API functions: create, destroy, etc.
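The split between a low-level catalog API and a high-level replication API can be illustrated in a few lines. This is a minimal sketch and does not reproduce the functions of the C libraries `globus_replica_catalog` or `globus_replica_manager`: an in-memory dictionary stands in for the LDAP catalog, every class and function name is hypothetical, and the transfer step is a caller-supplied function (a real one would drive GridFTP). The model assumes each location holds a URL prefix plus the set of file names present there.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Location:
    """One storage site holding copies of some files in a logical collection."""
    url_prefix: str                                  # e.g. "gsiftp://siteA.example.org/data"
    filenames: set[str] = field(default_factory=set)


@dataclass
class ReplicaCatalog:
    """In-memory stand-in for an LDAP-backed catalog (hypothetical API)."""
    locations: dict[str, Location] = field(default_factory=dict)

    def add_location(self, name: str, url_prefix: str) -> None:
        self.locations[name] = Location(url_prefix)

    def add_filename(self, location: str, filename: str) -> None:
        self.locations[location].filenames.add(filename)

    def delete_filename(self, location: str, filename: str) -> None:
        self.locations[location].filenames.discard(filename)

    def url(self, location: str, filename: str) -> str:
        return f"{self.locations[location].url_prefix}/{filename}"

    def physical_urls(self, filename: str) -> list[str]:
        """Logical-to-physical mapping: every URL where `filename` has a replica."""
        return sorted(self.url(name, filename)
                      for name, loc in self.locations.items() if filename in loc.filenames)


# (source_url, destination_url) -> True on success
Transfer = Callable[[str, str], bool]


def replicate(catalog: ReplicaCatalog, filename: str, src: str, dst: str,
              transfer: Transfer) -> bool:
    """Copy first, register second, so the catalog never lists a replica that does not exist."""
    if filename not in catalog.locations[src].filenames:
        raise KeyError(f"{filename} is not registered at {src}")
    if not transfer(catalog.url(src, filename), catalog.url(dst, filename)):
        return False  # leave the catalog unchanged on failure
    catalog.add_filename(dst, filename)
    return True
```

Ordering the transfer before the catalog update is the design choice that makes the high-level call "reliable" in this sketch: a failed copy leaves no dangling catalog entry.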
16 Activities in Progress with PPDG
- GDMP knowledge building
- Replica Catalog Performance Analysis
- SC2001 Demo Coordination
- PPDG and GriPhyN Coordination
17 GDMP/Globus Coordination
- Studying
  - How to support GDMP deployment in ATLAS testgrid, and RC integration
  - How to integrate with demos and PPDG deployments
  - How to integrate into DGRA
  - How to integrate with reliable file transfer
- Now running on two test machines at ANL
18 Replica Catalog Performance Analysis
- Measuring performance at large capacities
  - Measuring OpenLDAP, Netscape 4.13, Oracle ODS
  - Will measure direct relational mapping
- Initial results show problems with current schema
  - Will examine remedies in LDAP and SQL
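A measurement of this kind needs, at minimum, a harness that times lookups at several catalog sizes. A minimal sketch, assuming each backend can be wrapped as a function that populates a catalog of `n` entries and returns a lookup callable; the directory servers named on this slide are not modeled, the `lfnNNNNNNNN` key scheme is hypothetical, and the in-memory dictionary backend exists only to exercise the harness.

```python
import time
from typing import Callable

# build(n) populates a catalog with n logical file names and returns a lookup function.
Builder = Callable[[int], Callable[[str], object]]


def mean_lookup_seconds(build: Builder, sizes: list[int], probes: int = 1000) -> dict[int, float]:
    """Average time per lookup at each catalog size, sampling keys across the catalog."""
    results: dict[int, float] = {}
    for n in sizes:
        if n < 1:
            raise ValueError("catalog size must be positive")
        lookup = build(n)
        step = max(1, n // probes)
        keys = [f"lfn{i:08d}" for i in range(0, n, step)][:probes]
        start = time.perf_counter()
        for key in keys:
            lookup(key)
        results[n] = (time.perf_counter() - start) / len(keys)
    return results


def dict_backend(n: int) -> Callable[[str], object]:
    """Trivial in-memory backend for trying the harness; swap in a real LDAP or SQL client."""
    table = {f"lfn{i:08d}": f"gsiftp://site.example.org/data/lfn{i:08d}" for i in range(n)}
    return table.__getitem__
```

Sampling keys evenly across the whole catalog, rather than repeating one key, avoids measuring only a server's cache.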
19 SC2001 Demo Coordination
- Help in coordinating sites
  - Adding resources from ANL MCS
  - Adding visualization facilities
  - Adding Grid Info Services
- Coordinating themes and messages between PPDG and GriPhyN
20 PPDG and GriPhyN Coordination
- Focal point is replica catalog work
- Joint effort to understand and catalog data
  - Define data file types, quantities, usage
  - Need to record the process of data production to facilitate data reproduction
- Coordination of monitoring and resource management work
- Coordination of research and production integration: get virtual data on the PPDG horizon
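Recording the production process is what makes virtual data possible: if each derived file carries a record of the program, parameters, and inputs that produced it, the file can be regenerated on demand instead of only stored or fetched. A minimal sketch, assuming a simple in-memory log; the record fields, program names, and file names are illustrative, not a PPDG or GriPhyN schema.

```python
from dataclasses import dataclass, field


@dataclass
class Derivation:
    """How one logical file was produced (all names here are hypothetical)."""
    transformation: str                      # program and version, e.g. "reco-v3"
    parameters: dict = field(default_factory=dict)
    inputs: tuple[str, ...] = ()             # logical file names consumed
    output: str = ""                         # logical file name produced


class ProvenanceLog:
    """Records derivations so a data product can be regenerated on demand."""

    def __init__(self) -> None:
        self._by_output: dict[str, Derivation] = {}

    def record(self, d: Derivation) -> None:
        self._by_output[d.output] = d

    def recipe(self, lfn: str) -> list[Derivation]:
        """Derivations needed to rebuild `lfn`, inputs first; raw files need no step."""
        steps: list[Derivation] = []
        seen: set[str] = set()

        def visit(name: str) -> None:
            d = self._by_output.get(name)
            if d is None or name in seen:
                return  # raw data, or already scheduled
            seen.add(name)
            for inp in d.inputs:
                visit(inp)
            steps.append(d)

        visit(lfn)
        return steps
```

Walking inputs before appending a step yields an execution order in which every intermediate file is rebuilt before anything that consumes it.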
21 Globus Toolkit Status
- New release coming soon
  - In alpha3 now, beta this month, 4Q01 release
  - New packaging enables modular binary and source distributions
  - GRAM 1.5: enhanced robustness
  - MDS-2.1: security, better performance, etc.
  - GridFTP, Replica Mgt: data management
- GSI extensions, Community Authorization Svc
  - Prototype now, 4Q01 alpha, 1Q02 release
- Java, other Commodity Grid toolkits
22 Data Services to Come
- More flexible catalog
- Arbitrary file mapping
- Hierarchical catalog
- Reliable File Transfer
- Performance
- Ease of installation and use
- Data replica selection
- Automatic data propagation
23 Replica Catalog Services as Building Blocks: Examples
- Combine with information service to build replica selection services
  - E.g., find the best replica using performance info from NWS and MDS
  - Use of LDAP as a common protocol for info and replica services makes this easier
- Combine with application managers to build data distribution services
  - E.g., build new replicas in response to frequent accesses
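Both examples reduce to small policies layered on catalog lookups. A minimal sketch, assuming bandwidth and latency forecasts have already been obtained (for instance from NWS measurements published through MDS; the query itself is not shown): rank replicas by a latency-plus-size-over-bandwidth estimate, and flag a file for replication once one site has read it a threshold number of times. The transfer-time model, the threshold rule, and all names are illustrative choices, not Globus components.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Replica:
    url: str
    bandwidth_mbps: float  # forecast bandwidth to the client (assumed to come from NWS/MDS)
    latency_s: float       # forecast latency to the client


def estimated_seconds(r: Replica, size_bytes: int) -> float:
    """Latency plus size over bandwidth: a simple transfer-time model."""
    return r.latency_s + size_bytes * 8 / (r.bandwidth_mbps * 1e6)


def best_replica(replicas: list[Replica], size_bytes: int) -> Replica:
    return min(replicas, key=lambda r: estimated_seconds(r, size_bytes))


class AccessTracker:
    """Signals when a file is read often enough from one site to justify a local replica."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()

    def access(self, lfn: str, site: str) -> bool:
        self.counts[(lfn, site)] += 1
        return self.counts[(lfn, site)] == self.threshold  # fires exactly once
```

With a 100 Mb/s replica at 0.1 s latency and a 50 Mb/s replica at 0.01 s latency, a 1 GB file goes to the faster link (about 80 s versus 160 s), while a 1 KB file goes to the low-latency one, because latency dominates small transfers.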
24 GSI Futures: Community Authorization
Message flow among User, CAS, and Resource:
1. CAS request (User to CAS), with resource names and operations
   - CAS check: does the collective policy authorize this request for this user? (The CAS consults user/group membership, resource/collective membership, and collective policy information.)
2. CAS reply (CAS to User), with capability and resource CA info
3. Resource request (User to Resource), authenticated with the capability
   - Resource checks: is this request authorized by the capability? Is this request authorized for the CAS? (The resource consults its local policy information.)
4. Resource reply (Resource to User)
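The two-level check in this flow can be sketched as follows: the CAS narrows a user's request to what community policy allows and issues a capability, and the resource then requires both that the capability covers the request and that its own local policy trusts the CAS for that operation. A minimal sketch under stated assumptions: the group, policy, and capability structures and all names are hypothetical, and the authentication and X.509 signing that real CAS capabilities rely on are omitted entirely.

```python
from dataclasses import dataclass
from typing import Optional

Right = tuple[str, str]  # (resource name, operation)


@dataclass(frozen=True)
class Capability:
    issuer: str              # CAS that issued it (signing is omitted in this sketch)
    user: str
    rights: frozenset        # frozenset of Right


class CommunityAuthorizationService:
    def __init__(self, name: str, members: dict[str, set[str]], policy: dict[str, set[Right]]):
        self.name = name
        self.members = members   # user -> groups (user/group membership)
        self.policy = policy     # group -> rights (collective policy)

    def request(self, user: str, wanted: list[Right]) -> Optional[Capability]:
        """Steps 1-2: grant only the requested rights the collective policy allows this user."""
        allowed: set[Right] = set()
        for group in self.members.get(user, set()):
            allowed |= self.policy.get(group, set())
        granted = frozenset(wanted) & allowed
        return Capability(self.name, user, granted) if granted else None


class Resource:
    def __init__(self, name: str, local_policy: dict[str, set[str]]):
        self.name = name
        self.local_policy = local_policy  # CAS name -> operations this site lets it grant

    def handle(self, cap: Capability, operation: str) -> bool:
        """Steps 3-4: allowed only if the capability AND local policy for the CAS agree."""
        by_capability = (self.name, operation) in cap.rights
        by_local_policy = operation in self.local_policy.get(cap.issuer, set())
        return by_capability and by_local_policy
```

The second check is what lets a site delegate only part of its authority to a community: even a valid capability cannot unlock an operation the site never granted to that CAS.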
25 Globus Current Research Includes
- Resource management
  - GRAM-2, reservation, co-allocation, policy, etc.
- Security
  - (Restricted) delegation, CAS, PKI usability (online credential repositories, etc.)
- Data Grid technologies
  - Replication, robust/fast transport, etc.
- Tools
  - Workflow, resource discovery and selection
26 More Information
- Globus
  - http://www.globus.org
- Globus data grid work in general
  - http://www.globus.org/datagrid/
- Interfacing GDMP and Globus
  - http://www.globus.org/research/papers/gdmp_hpdc_final_version.pdf
- Getting started with the Globus Replica Catalog
  - http://www.globus.org/datagrid/deliverables/replicaGettingStarted.pdf