Title: Grid Computing and the Globus Toolkit
1. Grid Computing and the Globus Toolkit
- Jennifer M. Schopf
- Argonne National Lab
- National eScience Centre
- http://www.mcs.anl.gov/jms/Talks/
2. What is a Grid?
- Resource sharing
- Computers, storage, sensors, networks, ...
- Sharing is always conditional: issues of trust, policy, negotiation, payment, ...
- Coordinated problem solving
- Beyond client-server: distributed data analysis, computation, collaboration, ...
- Dynamic, multi-institutional virtual orgs
- Community overlays on classic org structures
- Large or small, static or dynamic
3. Why Is this Hard or Different?
- Lack of central control
- Where things run
- When they run
- Shared resources
- Contention, variability
- Communication
- Different sites imply different sys admins, users, institutional goals, and often strong personalities
4. So Why Do It?
- Computations that need to be done within a time limit
- Data that can't fit on one site
- Data owned by multiple sites
- Applications that need to be run bigger, faster, more
5. What Kinds of Applications?
- Computation intensive
- Interactive simulation (climate modeling)
- Large-scale simulation and analysis (galaxy formation, gravity waves, event simulation)
- Engineering (parameter studies, linked models)
- Data intensive
- Experimental data analysis (e.g., physics)
- Image and sensor analysis (astronomy, climate)
- Distributed collaboration
- Online instrumentation (microscopes, x-ray)
- Remote visualization (climate studies, biology)
- Engineering (large-scale structural testing)
6. Key Common Feature
- The size and/or complexity of the problem requires that people in several organizations collaborate and share computing resources, data, instruments
7. The Globus Approach
8. The Role of the Globus Toolkit
- A collection of solutions to problems that come up frequently when building collaborative distributed applications
- Heterogeneity
- A focus, in particular, on overcoming heterogeneity for application developers
- Standards
- We capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF)
- GT also includes reference implementations of new/proposed standards in these organizations
9. Globus is an Hour Glass
- Local sites have their own policies and installs: heterogeneity!
- Queuing systems, monitors, network protocols, etc.
- Globus unifies
- Build on Web services
- Use WS-RF, WS-Notification to represent/access state
- Common management abstractions and interfaces
[Figure: hourglass diagram labeled "Higher-Level Services and Users", "Standard GT4 Interfaces", and "Local heterogeneity"]
10. On April 29, 2005 the Globus Alliance released the finest version of the Globus Toolkit to date!
Don't take our word for it! Read the UK eScience Evaluation of GT4: www.nesc.ac.uk/technical_papers/UKeS-2005-03.pdf (reachable from www.globus.org, under News)
11. How it Really Happens
- Implementations are provided by a mix of
- Application-specific code
- Off-the-shelf tools and services
- Tools and services from the Globus Toolkit
- Tools and services from the Grid community (compatible with GT)
- Glued together by
- Application development
- System integration
12. A Typical eScience Use of Globus: Network for Earthquake Eng. Simulation
Links instruments, data, computers, people
13. Without the Globus Toolkit
[Figure: component diagram. Labels: Compute Server (A), Simulation Tool, Compute Server (B), Web Browser, Web Portal, Registration Service, Camera (two), Telepresence Monitor, Data Viewer Tool, Database service (C), Chat Tool, Data Catalog, Database service (D), Credential Repository, Database service (E), Certificate authority]
- Resources implement standard access and management interfaces
- Collective services aggregate and/or virtualize resources
- Users work with client applications
- Application services organize VOs and enable access to other services
14. With the Globus Toolkit
[Figure: component diagram. Labels: Compute Server with Globus GRAM (two), Simulation Tool, Web Browser, CHEF, Globus Index Service, Camera (two), Telepresence Monitor, Data Viewer Tool, Database service with Globus DAI (three), CHEF Chat Teamlet, Globus MCS/RLS, MyProxy, Certificate Authority]
- Resources implement standard access and management interfaces
- Collective services aggregate and/or virtualize resources
- Users work with client applications
- Application services organize VOs and enable access to other services
15. Globus is Grid Infrastructure
- Software for Grid infrastructure
- Service-enable new and existing resources
- E.g., GRAM on computer, GridFTP on storage system, custom application service
- Uniform abstractions and mechanisms
- Tools to build applications that exploit Grid infrastructure
- Registries, security, data management, ...
- Open source and open standards
- Each empowers the other
- Enabler of a rich tool and service ecosystem
16. The Globus Toolkit: Standard Plumbing for the Grid
- Not turnkey solutions, but building blocks and tools for application developers and system integrators
- Some components (e.g., file transfer) go farther than others (e.g., remote job submission) toward end-user relevance
- Easier to reuse than to reinvent
- Compatibility with other Grid systems comes for free
- Today the majority of the GT public interfaces are usable by application developers and system integrators
- Relatively few end-user interfaces
- In general, not intended for direct use by end users (scientists, engineers, marketing specialists)
17. Globus is a Building Block
- Basic components for grid functionality
- Highest-level services are often application specific; we let applications concentrate there
- Easier to reuse than to reinvent
- Compatibility with other Grid systems comes for free
- We provide basic infrastructure to get you one step closer
18. Standards
19. Leveraging Existing and Proposed Standards
- WSRF and WS-N (GGF, OASIS)
- WS-Agreement, WSDL 2.0, WSDM
- GridFTP v1.0 (GGF)
- OGSI v1.0 (GGF)
- SSL/TLS v1 (from OpenSSL) (IETF)
- X.509 Proxy Certificates (IETF)
- SAML, XACML
20. GT Protocols
- Web service protocols
- WSDL, SOAP
- WS Addressing, WSRF, WSN
- WS Security, SAML, XACML
- WS-Interoperability profile
- Non Web service protocols
- Standards-based, such as GridFTP
- Custom
21. WSRF and WS-Notification
- Naming and bindings (basis for virtualization)
- Every resource can be uniquely referenced, and has one or more associated services for interacting with it
- Lifecycle (basis for fault-resilient state management)
- Resources created by services following the factory pattern
- Resources destroyed immediately or scheduled
- Information model (basis for monitoring and discovery)
- Resource properties associated with resources
- Operations for querying and setting this info
- Asynchronous notification of changes to properties
- Service groups (basis for registries and collective services)
- Group membership rules and membership management
- Base fault type
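The naming, lifecycle, information-model, and notification ideas listed on this slide can be sketched in a few lines of plain Python. This is only an in-memory illustration of the pattern, not the WSRF wire protocol or any Globus API; the class names `Resource` and `ResourceFactory` and the key format are hypothetical.

```python
import itertools


class Resource:
    """A stateful resource: unique key, properties, optional termination time."""

    def __init__(self, key, properties):
        self.key = key                      # naming: unique reference
        self.properties = dict(properties)  # information model
        self.termination_time = None        # None means no scheduled destruction
        self.subscribers = []               # callbacks used for notification

    def set_property(self, name, value):
        """Update a property and notify every subscriber of the change."""
        self.properties[name] = value
        for callback in self.subscribers:
            callback(self.key, name, value)


class ResourceFactory:
    """Creates resources (factory pattern) and manages their lifetime."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.resources = {}

    def create(self, **properties):
        key = f"resource-{next(self._ids)}"
        self.resources[key] = Resource(key, properties)
        return key

    def destroy(self, key):
        """Immediate destruction."""
        self.resources.pop(key, None)

    def set_termination_time(self, key, when):
        """Scheduled destruction: the resource expires at time `when`."""
        self.resources[key].termination_time = when

    def sweep(self, now):
        """Destroy every resource whose termination time has passed."""
        expired = [key for key, res in self.resources.items()
                   if res.termination_time is not None
                   and res.termination_time <= now]
        for key in expired:
            self.destroy(key)
        return expired
```

A caller would create a resource through the factory, append a callback to `subscribers` to receive property changes, and rely on `sweep` to clean up resources whose clients have disappeared, which is the fault-resilience argument made for scheduled destruction.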
22. WS Core Enables Frameworks: E.g., Resource Management
[Figure: layered stack, listed here from top to bottom]
- Applications of the framework (compute, network, storage provisioning, job reservation and submission, data management, application service QoS, ...)
- WS-Agreement (agreement negotiation)
- WS Distributed Management (lifecycle, monitoring, ...)
- WS-Resource Framework and WS-Notification* (resource identity, lifetime, inspection, subscription, ...)
- Web services (WSDL, SOAP, WS-Security, WS-ReliableMessaging, ...)
* An evolution of Open Grid Services Infrastructure (OGSI)
23. WSRF vs XML/SOAP
- The definition of WSRF means that the Grid and Web services communities can move forward on a common base
- Why not just use XML/SOAP?
- WSRF and WS-N are just XML and SOAP
- WSRF and WS-N are just Web services
- Benefits of following the specs
- These patterns represent best practices that have been learned in many Grid applications
- There is a community behind them
- Why reinvent the wheel?
- Standards facilitate interoperability
24. Globus is a Tool
- A Grid development environment
- Develop new OGSA-compliant Web services
- Develop applications using Java or C/C++ Grid APIs
- Secure applications using basic security mechanisms
- A set of basic Grid functionality
- Services and clients
- Libraries
- Development tools and examples
- The prerequisites for many Grid community tools
25. GT Domain Areas
- Core runtime
- Infrastructure for building new services
- Security
- Apply uniform policy across distinct systems
- Execution management
- Provision, deploy, and manage services
- Data management
- Discover, transfer, and access large data
- Monitoring
- Discover and monitor dynamic services
26. Globus Toolkit: Open Source Grid Infrastructure
Globus Toolkit v4, www.globus.org
[Figure: GT4 component diagram, grouped by domain area]
- Security: Authentication and Authorization, Community Authorization, Delegation, Credential Mgmt
- Data Mgmt: GridFTP, Reliable File Transfer, Replica Location, Data Replication, Data Access and Integration
- Execution Mgmt: Grid Resource Allocation and Management, Community Scheduling Framework, Workspace Management, Grid Telecontrol Protocol
- Info Services: Index, Trigger, WebMDS
- Common Runtime: Java Runtime, C Runtime, Python Runtime
27. Our Goals for GT4
- Usability, reliability, scalability, ...
- Web service components have quality equal or superior to pre-WS components
- Documentation at acceptable quality level
- Consistency with latest standards (WS-*, WSRF, WS-N, etc.) and Apache platform
- WS-I Basic Profile compliant
- WS-I Basic Security Profile compliant
- New components, platforms, languages
- And links to larger Globus ecosystem
29. GT2 Evolution To GT4
- ALL of GT2 functionality is in GT4
- What happened to the GT2 key protocols?
- Security: adapted X.509 proxy certs to integrate with emerging WS standards
- GRAM: took ad hoc protocols away and now use WS-RF standards (ManagedJobFactory and related service definitions)
- GridFTP: using the GridFTP standard from GGF; Reliable File Transfer (RFT) supplies the WSRF-compliant interface
- MDS/LDAP: replaced LDAP extensions with WSRF standards for notification, subscription, and registration
30. GT2 vs GT4
- Pre-WS Globus is in the GT4 release
- Both WS and pre-WS components (à la 2.4.3) are shipped
- These do NOT interact, but both can run on the same resource independently
- Basic functionality is the same
- Run a job
- Transfer a file
- Monitoring
- Security
- Code base is completely different
31. Why Use GT4?
- Performance and reliability
- Literally millions of tests and queries run against GT4 services
- Scalability
- Many lessons learned from GT2 have been addressed in GT4
- Support
- This is our active code base, so it gets much more attention
- Additional functionality
- New features are here
- Additional GRAM interfaces to schedulers, MDS Trigger service, GridFTP protocol interfaces, etc.
- Easier to contribute to
32. 4.0 is not a typical .0 release, but the culmination of months of testing
[Figure: release timeline. Legend: CVS trunk, stable release branch, development release, stable release. Releases shown: 3.0.0, 3.0.1, 3.0.2, 3.2.0, 3.2.1, 3.3.0, 3.9.0, 3.9.1, 3.9.2, 3.9.3, 3.9.4, 3.9.5, 4.0.0, 4.0.1, 4.0.2]
33. Versioning and Support
- Versioning
- Evens are production (4.0.x, 4.2.x)
- Odds are development (4.1.x)
- We support this version and the one previous
- Currently we're at 4.0.2, so we support 3.2 and 4.0
- There is also a 4.1.0 development release
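The even/odd rule and the "this version and the one previous" support policy are simple enough to express directly. The following is a minimal sketch; the function names are hypothetical and the list of production series passed in is supplied by the caller, not taken from any Globus tool.

```python
def parse_version(version):
    """Split a dotted version such as '4.0.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def release_kind(version):
    """Even minor numbers are production, odd minor numbers are development."""
    minor = parse_version(version)[1]
    return "production" if minor % 2 == 0 else "development"


def supported_series(production_series, current):
    """Return the current production series and the one just before it."""
    ordered = sorted(production_series, key=parse_version)
    index = ordered.index(current)
    return ordered[max(index - 1, 0):index + 1]
```

For example, `release_kind("4.1.0")` classifies the release named on the slide as a development release, and `supported_series(["3.0", "3.2", "4.0"], "4.0")` yields the 3.2 and 4.0 series.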
34. Several Possible Next Versions
- 4.0.3: stable release
- 100% same interfaces, bug fixes only
- Perhaps in the fall?
- 4.1.1: development release
- New functionality
- Likely 6-10 weeks?
- 4.2: stable release
- When 4.1 has enough new functionality, and is stable
- 5.0: substantial code base change
- With any luck, not for years :)
35. Testing Overview
- Nightly builds and tests
- TestGrid at USC/ISI
- Stand up services for several weeks
- Perform stress tests
- TestGrid at LBNL
- Focus on WS Core performance and interoperability tests
- Performance and reliability testing is a major focus
- Component-specific approaches mostly
- Calls for Community Testing near release time: we welcome new testing help!
36. Tested Platforms
- Debian
- Fedora Core
- FreeBSD
- HP/UX
- IBM AIX
- Red Hat
- Sun Solaris
- SGI Altix (IA64 running Red Hat)
- SuSE Linux
- Tru64 Unix
- Apple MacOS X (no binaries)
- Windows (Java components only)
- List of binaries and known platform-specific install bugs at
- http://www.globus.org/toolkit/docs/4.0/admin/docbook/ch03.html
37. Documentation Overview
- Current documentation is significantly more detailed than earlier versions
- http://www.globus.org/toolkit/docs/4.0/
- Tutorials available for those of you building a new service
- http://www-unix.globus.org/toolkit/tutorials/BAS/
- Globus Toolkit 4: Programming Java Services (The Morgan Kaufmann Series in Networking), by Borja Sotomayor and Lisa Childers (available through Amazon, 19.99 or 20)
38. Grid Packaging Technology (GPT)
- Collection of XML-based packaging tools
- Straightforward definition of complex dependency and compatibility relationships between packages
- Way for developers to define the packaging data and include it as part of their source code distribution
- Automatic generation of binary packages
- Developer tools
- Convert a source distribution into a GPT package
- Patch-n-build capability similar to RPM spec files, so developers can retain their own build system if needed
- User tools
- Enable collections of packages to be built and/or installed
- Package manager for those systems that don't have one
- Developed at NCSA
39. Virtual Data Toolkit (VDT)
- Grid middleware distribution focused on ease of use (and installation)
- Contents
- Globus Toolkit, Condor, Condor-G
- Virtual Data Tools (Chimera, Pegasus, RLS)
- Utilities (GSI-OpenSSH, MyProxy, MonaLisa, NetLogger, KX.509, etc.)
- GriPhyN Virtual Data System (containing Chimera and Pegasus)
- Uses PACMAN for distribution, install, configuration
- Used by GriPhyN, iVDGL, Open Science Grid, LCG, UK NGS, and others
40. Installation in a nutshell
- Quickstart guide is very useful
- http://www.globus.org/toolkit/docs/4.0/admin/docbook/quickstart.html
- Verify your prereqs!
- Security: check spellings and permissions
- Globus is system software: plan accordingly
41. Now that you've done your installation... Let's talk about what you get!
43. GT4 Web Services Runtime
- Supports both GT (GRAM, RFT, Delegation, etc.) and user-developed services
- Redesign to enhance scalability, modularity, performance, usability
- Leverages existing WS standards
- WS-I Basic Profile: WSDL, SOAP, etc.
- WS-Security, WS-Addressing
- Adds support for emerging WS standards
- WS-Resource Framework, WS-Notification
- Java, Python, C hosting environments
- Java is standard Apache
44. What does Core give you?
- Reference implementation of WSRF and WS-N functions
- Naming and bindings (basis for virtualization)
- Every resource can be uniquely referenced and has one or more associated services for interacting with it
- Lifecycle (basis for resilient state management)
- Resources created by services following a factory pattern
- Resources destroyed immediately or scheduled
- Information model (basis for monitoring and discovery)
- Resource properties associated with resources
- Operations for querying and setting this info
- Asynchronous notification of changes to properties
- Service groups (basis for registries and collective services)
- Group membership rules and membership management
- Base fault type
45. Apache Axis Web Services Container
- Good news for Java WS developers: GT4.0 works with standard Axis and Tomcat (modulo Axis and Tomcat release cycle issues)
- GT provides Axis-loadable libraries, handlers
- Includes useful behaviors such as inspection, notification, lifetime mgmt (WSRF)
- Others implement GRAM, etc.
- Major Globus contributions to Apache
- 50% of WS-Addressing code
- 15% of WS-Security code
- Many bug fixes
- WSRF code a possible next contribution
[Figure: container diagram labeled "GT bits", "App bits", "Security", "Addressing", and "Axis"]
46. GT4 Web Services Runtime
48. WSRF/WSNs Compared (Humphrey et al., HPDC 2005)
49. GetRP Test
- Distributed client and service on same LAN
- (times in milliseconds)
149.67
No Security
X509 Signing
HTTPS
25.57
181.96
17.1
140.5
55.6
81.39
10.05
8.23
N/A
2.34
14.8
11.46
12.91
2.85
50. GT4 WS Core Performance
(1) Message-level security (times in milliseconds)
(2) Transport-level security (times in milliseconds)
WSRF/WSNs Compared, HPDC 2005.
52. Globus Security
- Control access to shared services
- Address autonomous management, e.g., different policy in different work-groups
- Support multi-user collaborations
- Federate through mutually trusted services
- Local policy authorities rule
- Allow users and application communities to set up dynamic trust domains
- Personal/VO collection of resources working together based on trust of user/VO
53. Virtual Organization (VO) Concept
- VO for each application or workload
- Carve out and configure resources for a particular use and set of users
54. GT4 Security
[Figure: diagram; surviving label: Users]
55. GT Authorization Framework
56. GT4 Security
- Public-key-based authentication
- Transport- and message-level authentication
- Extensible authorization framework based on Web services standards
- SAML-based authorization callout
- Integrated policy decision engine
- XACML policy language, per-operation policies, pluggable
- Credential management service
- MyProxy (one-time password support)
- Community Authorization Service
- Standalone delegation service
- Ability to map between Grid and local identity
57. Security Tools
- Basic Grid Security Mechanisms
- Certificate Generation Tools
- Certificate Management Tools
- Getting users registered to use a Grid
- Getting Grid credentials to wherever they're needed in the system
- Authorization/Access Control Tools
- Storing and providing access to system-wide authorization information
58. Other Security Services Include
- MyProxy
- Simplified credential management
- Web portal integration
- Single-sign-on support
- KCA and kx.509
- Bridging into/out-of Kerberos domains
- SimpleCA
- Online credential generation
- PERMIS
- Authorization service callout
59. A Cautionary Note
- Grid security mechanisms are tedious to set up
- If exposed to users, hand-holding is usually required
- These mechanisms can be hidden entirely from end users, but still used behind the scenes
- These mechanisms exist for good reasons
- Many useful things can't be done without Grid security
- It is unlikely that an ambitious project could go into production operation without security like this
- Most successful projects end up using Grid security, but using it in ways that end users don't see much
60. GT4's Use of Security Standards
[Figure: table of three security options, annotated "Supported, but slow", "Supported, but insecure", and "Fastest, so default"]
61. GT-XACML Integration
- eXtensible Access Control Markup Language
- OASIS standard, open source implementations
- XACML: sophisticated policy language
- Globus Toolkit ships with XACML runtime
- Included in every client and server built on GT
- Turned on through configuration
- ... that can be called transparently from runtime and/or explicitly from application
- ... and we use the XACML model for our Authz Processing Framework
62. Globus Certificate Service
- An online service that issues low-quality GSI certificates
- Intended for people who want to experiment with Grid components that require certificates but do not have any other means of acquiring certificates
- These certificates are not to be used on production systems
- Not a true Certificate Authority (CA)
- No revoking or reissuing certificates
- No verification of identities
- The service itself is not especially secure
63. Simple CA
- A convenient method of setting up a certificate authority (CA)
- The Certificate Authority can then be used to issue certificates for users and services that work with GSI and WS-Security
- Simple CA is intended for operators of small Grid testing environments and users who are not part of a larger Grid
- Most production Grids will not accept certificates that are not signed by a well-known CA, so the certificates generated by Simple CA will usually not be sufficient to gain access to production services
64. MyProxy
- MyProxy is a remote service that stores user credentials
- Users can request proxies for local use on any system on the network
- Web portals can request user proxies for use with back-end Grid services
- Grid administrators can pre-load credentials in the server for users to retrieve when needed
- Greatly simplifies certificate management!
65. CAS: Community Authorization Service
- CAS allows resource providers to specify coarse-grained access control policies in terms of communities as a whole
- Fine-grained access control is delegated to the community
- Resource providers maintain ultimate authority over their resources (including per-user control and auditing) but are spared most day-to-day policy administration tasks
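The split between coarse-grained provider policy and fine-grained community policy can be illustrated with a two-level check. This is a sketch of the idea only, not how CAS is implemented; the policy tables, the community name, and the function name are all hypothetical.

```python
def is_authorized(user, community, action,
                  resource_policy, community_policy,
                  denied_users=frozenset()):
    """Two-level check: the provider decides per community, the community per user."""
    # The provider keeps ultimate authority, including per-user control.
    if user in denied_users:
        return False
    # Coarse-grained: what the provider grants the community as a whole.
    if action not in resource_policy.get(community, set()):
        return False
    # Fine-grained: what the community grants this particular member.
    return action in community_policy.get(community, {}).get(user, set())


# Hypothetical example data.
RESOURCE_POLICY = {"climate-vo": {"read", "write"}}
COMMUNITY_POLICY = {"climate-vo": {"alice": {"read", "write"}, "bob": {"read"}}}
```

With these tables, `bob` may read but not write, and the provider can still block any individual by adding them to `denied_users` without touching the community's policy.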
66. VOMS
- A community-level group membership system
- Database of user roles
- Administrative tools
- Client interface
- voms-proxy-init
- Uses client interface to produce an attribute certificate (instead of proxy) that includes roles and capabilities signed by the VOMS server
- Works with non-VOMS services, but gives more info to VOMS-aware services
- Allows VOs to centrally manage user roles
68. Execution Management (GRAM)
- Common WS interface to schedulers
- Unix, Condor, LSF, PBS, SGE, ...
- More generally: interface for process execution management
- Lay down execution environment
- Stage data
- Monitor and manage lifecycle
- Kill it, clean up
- A basis for application-driven provisioning
69. GRAM: Basic Job Submission and Control Service
- A uniform service interface for remote job submission and control
- Includes file staging and I/O management
- Includes reliability features
- Supports basic Grid security mechanisms
- Available in Pre-WS and WS
- GRAM is not a scheduler
- No scheduling
- No metascheduling/brokering
- Often used as a front-end to schedulers, and often used to simplify metaschedulers/brokers
70. GT4 WS GRAM
- 2nd-generation WS implementation optimized for performance, flexibility, stability, scalability
- Streamlined critical path
- Use only what you need
- Flexible credential management
- Credential cache and delegation service
- GridFTP and RFT used for data operations
- Data staging and streaming output
- Eliminates redundant GASS code
71. GRAM
- Intended for jobs where arbitrary programs, stateful monitoring, credential management, and file staging are important
- If the application is lightweight, with modest input/output, it may be a better candidate for hosting directly as a WSRF service
72. GT4 WS GRAM Architecture
[Figure: WS GRAM architecture diagram. Labels: Client; Service host(s) and compute element(s); GT4 Java Container (GRAM services, Delegation, RFT File Transfer); sudo; GRAM adapter; Local job control; Local scheduler; User job; SEG; Job events; Job functions; Delegate; Transfer request; GridFTP; FTP control; FTP data; Remote storage element(s) (GridFTP)]
73. GT4 WS GRAM Architecture
[Figure: the WS GRAM architecture diagram from slide 72]
Delegated credential can be made available to the application
74. GT4 WS GRAM Architecture
[Figure: the WS GRAM architecture diagram from slide 72]
Delegated credential can be used to authenticate with RFT
75. GT4 WS GRAM Architecture
[Figure: the WS GRAM architecture diagram from slide 72]
Delegated credential can be used to authenticate with GridFTP
76. Submitting a Sample Job
- Specify a remote host with -F
- globusrun-ws -submit -F host2 -c /bin/true
- The return code will be the job's exit code if supported by the scheduler
77. Data Staging and Streaming
- Simplest stage-in/stage-out example is stdout/stderr
- globusrun-ws -S -s -c /bin/date
- -S is short for -submit
- -s is short for -streaming
- The output will be sent back to the terminal; control will not return until the job is done
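When these submissions are scripted, it can help to assemble the command line programmatically. The sketch below only builds the argument list and does not run anything; it uses just the options named on these slides (`-submit`, `-F`, `-s`, `-c`), written with the leading dashes that the transcript dropped, and the helper name `submit_command` is hypothetical.

```python
import shlex


def submit_command(executable, args=(), host=None, streaming=False):
    """Build a globusrun-ws argument list from the options shown on the slides."""
    command = ["globusrun-ws", "-submit"]
    if host is not None:
        command += ["-F", host]      # remote host
    if streaming:
        command.append("-s")         # stream stdout/stderr back to the terminal
    # -c goes last: everything after it is the program and its arguments.
    command += ["-c", executable, *args]
    return command


def as_shell(command):
    """Render the argument list as a safely quoted shell string."""
    return shlex.join(command)
```

Keeping the command as a list until the last moment avoids quoting mistakes when an argument contains spaces; the list can be handed to `subprocess.run` on a machine where the client is installed.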
78. Resource Specification Language
- For more complicated jobs, we'll use RSL to specify the job
    <job>
      <executable>/bin/echo</executable>
      <argument>this is an example_string </argument>
      <argument>Globus was here</argument>
      <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
      <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
    </job>
79. Resource Specification Language
    <job>
      <executable>/bin/echo</executable>
      <directory>/tmp</directory>
      <argument>12</argument>
      <environment><name>PI</name><value>3.141</value></environment>
      <stdin>/dev/null</stdin>
      <stdout>stdout</stdout>
      <stderr>stderr</stderr>
    </job>
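Job descriptions like the one on this slide are plain XML, so they can be generated rather than typed by hand. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element names are the ones shown on the slides, while the helper names `build_job` and `to_xml` are hypothetical. It checks nothing against the GRAM schema, so validating the result with the client before submitting is still advisable.

```python
import xml.etree.ElementTree as ET


def build_job(executable, arguments=(), directory=None, environment=None,
              stdin=None, stdout=None, stderr=None):
    """Assemble a <job> element in the element order used on the slides."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    if directory:
        ET.SubElement(job, "directory").text = directory
    for argument in arguments:
        ET.SubElement(job, "argument").text = str(argument)
    for name, value in (environment or {}).items():
        env = ET.SubElement(job, "environment")
        ET.SubElement(env, "name").text = name
        ET.SubElement(env, "value").text = str(value)
    for tag, value in (("stdin", stdin), ("stdout", stdout), ("stderr", stderr)):
        if value:
            ET.SubElement(job, tag).text = value
    return job


def to_xml(job):
    """Serialize the element tree to a string; special characters are escaped."""
    return ET.tostring(job, encoding="unicode")
```

Calling `to_xml(build_job("/bin/echo", ["12"], directory="/tmp", environment={"PI": "3.141"}, stdin="/dev/null", stdout="stdout", stderr="stderr"))` reproduces the second example above as a single line of XML.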
81. Submitting Using XML
- Create the file containing the RSL
- You may validate the RSL ahead of time
- globusrun-ws -validate -f rslfile.xml
- If the file validates, submit using -submit
82. At Most Once Submission
- You may specify a UUID with your job submission
- If you're not sure the submission worked, you may submit the job again with the same UUID
- If the job has already been submitted, the new submission will have no effect
- If you do not specify a UUID, one will be generated for you
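The at-most-once behaviour described here is an idempotency-key pattern: the service remembers the identifier and ignores repeats. The sketch below imitates that behaviour with an in-memory stand-in; `JobService` is a hypothetical class for illustration, not the GRAM service interface.

```python
import uuid


class JobService:
    """In-memory stand-in for a service that deduplicates on a submission ID."""

    def __init__(self):
        self.jobs = {}

    def submit(self, description, submission_id=None):
        """Record the job once per ID; a repeated ID has no further effect."""
        if submission_id is None:
            # No ID supplied by the client: generate one.
            submission_id = str(uuid.uuid4())
        if submission_id not in self.jobs:
            self.jobs[submission_id] = description
        return submission_id
```

A client that is unsure whether its first request arrived can simply call `submit` again with the same identifier; the service ends up with exactly one job either way.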
83. Staging Data
- GRAM's RSL allows many fileStageIn/fileStageOut directives
- The transfers will be executed by RFT
- May specify additional RFT options using the RFTOptions tag
- There is no GASS cache staging option anymore
84. Staging Data: Stage In
- GRAM's RSL allows many fileStageIn/fileStageOut directives
    <fileStageIn>
      <transfer>
        <sourceUrl>gsiftp://job.submitting.host:2811/bin/echo</sourceUrl>
        <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
      </transfer>
    </fileStageIn>
85. Staging Data: Stage Out
    <fileStageOut>
      <transfer>
        <sourceUrl>file:///${GLOBUS_USER_HOME}/stdout</sourceUrl>
        <destinationUrl>gsiftp://job.submitting.host:2811/tmp/stdout</destinationUrl>
      </transfer>
    </fileStageOut>
86. Staging Data: Cleanup
    <fileCleanUp>
      <deletion>
        <file>file:///${GLOBUS_USER_HOME}/my_echo</file>
      </deletion>
    </fileCleanUp>
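The three staging directives share a simple shape: stage-in and stage-out hold source/destination pairs, and cleanup holds a list of files. A small generator makes that structure explicit. This is a sketch with hypothetical helper names, using only the element names shown on the slides; the URLs in any call are supplied by the caller.

```python
import xml.etree.ElementTree as ET


def transfer_block(tag, pairs):
    """Build <fileStageIn> or <fileStageOut> from (source, destination) pairs."""
    block = ET.Element(tag)
    for source, destination in pairs:
        transfer = ET.SubElement(block, "transfer")
        ET.SubElement(transfer, "sourceUrl").text = source
        ET.SubElement(transfer, "destinationUrl").text = destination
    return block


def cleanup_block(files):
    """Build <fileCleanUp> with one <deletion> entry per file URL."""
    block = ET.Element("fileCleanUp")
    for url in files:
        deletion = ET.SubElement(block, "deletion")
        ET.SubElement(deletion, "file").text = url
    return block
```

Because a directive may carry many transfers, passing a list of pairs keeps one stage-in block per job rather than one per file.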
87. Staging Data: Credentials
- The GridFTP servers you're using may require different credentials than the GRAM service you're submitting to
- The RSL allows you to specify separate credentials for the executable and staging components of the job
88. RSL Substitutions
- GRAM will perform some variable substitutions for you
- ${GLOBUS_USER_HOME}
- ${GLOBUS_USER_NAME}
- ${GLOBUS_SCRATCH_DIR}
- ${GLOBUS_LOCATION}
- ${GLOBUS_SCRATCH_DIR} will be compute-node local high-speed storage if defined, or ${GLOBUS_USER_HOME} if not
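The substitution and the scratch-directory fallback can be mimicked with Python's `string.Template`, which happens to use the same `${NAME}` spelling. This is a sketch of the behaviour described on the slide, assuming variables are written as `${NAME}` in the RSL; the function name and the example paths are hypothetical, and the real substitution is done by GRAM on the service side.

```python
from string import Template


def substitute(text, user_home, user_name, globus_location, scratch_dir=None):
    """Expand the four substitution variables in an RSL string."""
    values = {
        "GLOBUS_USER_HOME": user_home,
        "GLOBUS_USER_NAME": user_name,
        "GLOBUS_LOCATION": globus_location,
        # Fall back to the home directory when no scratch area is defined.
        "GLOBUS_SCRATCH_DIR": scratch_dir if scratch_dir else user_home,
    }
    return Template(text).substitute(values)
```

`Template.substitute` raises `KeyError` for an unknown variable, which is a convenient way to catch a misspelled name before a job is submitted.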
89. Batch Submission
- Your client does not have to stay attached to the execution of the job
- -batch will disconnect from the job and output an EPR
- You may redirect the EPR to a file with -o
- Use the EPR file with -monitor or -status
- You may also kill the job using -kill
90. Specifying Scheduler Options
- RSL lets you specify various scheduler options
- what queue to submit to
- which project to select for accounting
- max CPU and wallclock time to spend
- min/max memory required
- All defined online under the schema document for
GRAM
91. Choosing User Accounts
- You may be authorized to use more than one account at the remote site
- By default, the first listed in the grid-mapfile will be used
- You may request a specific user account using the <localUserId> element
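The default-account rule can be sketched as a lookup over a parsed grid-mapfile. The parser below assumes each entry is a quoted certificate subject followed by a comma-separated list of local accounts; that layout, the example subject, and both function names are assumptions made for illustration, and the real mapping is performed by the service.

```python
import shlex


def parse_gridmap(text):
    """Parse lines of the assumed form: "subject DN" account1,account2"""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, accounts = shlex.split(line)
        mapping[subject] = accounts.split(",")
    return mapping


def choose_account(mapping, subject, requested=None):
    """Default to the first listed account; honor a request only if it is listed."""
    accounts = mapping[subject]
    if requested is None:
        return accounts[0]
    if requested not in accounts:
        raise PermissionError(f"{requested} is not mapped for {subject}")
    return requested
```

The `requested` argument plays the role of the `<localUserId>` element: it selects among accounts the user is already mapped to and cannot grant access to any other account.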
92. Multijobs
- You may specify more than one <job> element in a <multijob>
- At that point, you want to specify the <factoryEndpoint> in the RSL rather than on the command line
- Will be used by MPICH-G to support MPI jobs
93. WS GRAM Performance
- Time to submit a basic GRAM job
- Pre-WS GRAM: < 1 second
- WS GRAM: 2 seconds
- Concurrent jobs
- Pre-WS GRAM: 300 jobs
- WS GRAM: 32,000 jobs
94. Condor-G
- The Condor project has produced a helper front-end to GRAM
- Managing sets of subtasks
- Reliable front-end to GRAM to manage computational resources
- Note: this is not Condor, which promotes high-throughput computing and use of idle resources
95. Chimera Virtual Data
- Captures both logical and physical steps in a data analysis process
- Transformations (logical)
- Derivations (physical)
- Builds a catalog
- Results can be used to replay analysis
- Generation of DAG (via Pegasus)
- Execution on Grid
- Catalog allows introspection of analysis process
[Figure: Sloan Survey data; galaxy cluster size distribution]
96. Pegasus Workflow Transformation
- Converts Abstract Workflow (AW) into Concrete Workflow (CW)
- Uses metadata to convert user request to logical data sources
- Obtains AW from Chimera
- Uses replication data to locate physical files
- Delivers CW to DAGman
- Executes using Condor
- Publishes new replication and derivation data in RLS and Chimera (optional)
[Figure: data-flow diagram. Labels: Chimera Virtual Data Catalog, Metadata Catalog, Replica Location Service, DAGman, Condor, Compute Servers, Storage Systems]
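The step from abstract to concrete workflow, where logical file names are replaced by physical locations found in a replica catalog, can be illustrated with a toy mapping. This is a deliberately simplified sketch and not Pegasus's planner: the dictionary-based workflow format, the catalog contents, the site name, and the "take the first replica" policy are all assumptions.

```python
def concretize(abstract_jobs, replica_catalog, site):
    """Bind each job to a site and resolve its logical inputs to physical URLs."""
    concrete = []
    for job in abstract_jobs:
        physical_inputs = []
        for logical_name in job["inputs"]:
            replicas = replica_catalog.get(logical_name)
            if not replicas:
                raise LookupError(f"no replica registered for {logical_name}")
            # Simplest possible policy: use the first registered replica.
            physical_inputs.append(replicas[0])
        concrete.append({"name": job["name"], "site": site,
                         "inputs": physical_inputs})
    return concrete
```

A real planner would also choose among sites, add transfer and registration jobs, and account for files produced by earlier jobs; the sketch shows only the replica lookup that the bullet "uses replication data to locate physical files" refers to.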
99GT4 Data Management
- Stage/move large data to/from nodes
- GridFTP, Reliable File Transfer (RFT)
- Alone, and integrated with GRAM
- Locate data of interest
- Replica Location Service (RLS)
- Replicate data for performance/reliability
- Distributed Replication Service (DRS)
- Provide access to diverse data sources
- File systems, parallel file systems, hierarchical storage: GridFTP
- Databases: OGSA-DAI
100GridFTP
- A high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks
- FTP with well-defined extensions
- Uses basic Grid security (control and data channels)
- Multiple data channels for parallel transfers
- Partial file transfers
- Third-party (direct server-to-server) transfers
- Reusable data channels
- Command pipelining
- GGF recommendation GFD.20
101GridFTP in GT4
Disk-to-disk on TeraGrid
- 100% Globus code
- No licensing issues
- Stable, extensible
- IPv6 Support
- XIO for different transports
- Striping -> multi-Gb/sec wide area transport
- Pluggable
- Front-end: e.g., future WS control channel
- Back-end: e.g., HPSS, cluster file systems
- Transfer: e.g., UDP, NetBLT transport
102Striped Server
- Multiple nodes work together and act as a single GridFTP server
- An underlying parallel file system allows all nodes to see the same file system and must deliver good performance (usually the limiting factor in transfer speed)
- I.e., NFS does not cut it
- Each node then moves (reads or writes) only the pieces of the file that it is responsible for.
- This allows multiple levels of parallelism, CPU, bus, NIC, disk, etc.
- Critical if you want to achieve better than 1 Gb/s without breaking the bank
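The idea that each node moves only its own pieces of a file can be sketched as a round-robin block assignment. This is a minimal illustration, not the GridFTP server's actual layout logic: the function name, the block size, and the round-robin policy are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def assign_blocks(file_size: int, block_size: int, num_nodes: int) -> Dict[int, List[Tuple[int, int]]]:
    """Map each node index to the (offset, length) pieces it would move.

    Blocks are dealt out round-robin, which is an assumed layout chosen
    only to illustrate striping.
    """
    if file_size < 0 or block_size <= 0 or num_nodes <= 0:
        raise ValueError("file_size must be >= 0; block_size and num_nodes must be > 0")
    plan: Dict[int, List[Tuple[int, int]]] = {node: [] for node in range(num_nodes)}
    offset = 0
    block_index = 0
    while offset < file_size:
        # The final block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        plan[block_index % num_nodes].append((offset, length))
        offset += length
        block_index += 1
    return plan
```

Each node can then read or write its own list of pieces independently, which is where the CPU, bus, NIC, and disk parallelism comes from.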
103Striped GridFTP Service
- A distributed GridFTP service that runs on a storage cluster
- Every node of the cluster is used to transfer data into/out of the cluster
- Head node coordinates transfers
- Multiple NICs/internal busses lead to very high performance
- Maximizes use of Gbit WANs
105Typical Approach (without XIO)
[Diagram: the application reaches each network protocol through a different interface: POSIX IO, a protocol API, or a proprietary API]
106Globus XIO Approach
[Diagram: the application calls Globus XIO, which reaches each network protocol through its own driver]
107Drivers
- Make 1 API do many types of IO
- Specific drivers for specific protocols/devices
- Transform
- Manipulate or examine data
- Do not move data outside of process space
- Compression, Security, Logging
- Transport
- Moves data across a wire
- TCP, UDP, File IO, Device IO
- Typically move data outside of process space
108Stack
Example Driver Stack
- Transport
- Exactly one per stack
- Must be on the bottom
- Transform
- Zero or many per stack
- Control flows from user to the top of the stack,
to the transport driver.
[Example stack, top to bottom: Compression, Logging, TCP]
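The stacking rules (zero or more transform drivers on top, exactly one transport driver at the bottom) can be modelled in a few lines. This is a toy model, not the Globus XIO API, which is a C library; every class and method name here is hypothetical, and an in-memory list stands in for the TCP transport.

```python
import zlib
from typing import List


class Driver:
    """Base class for the toy model; 'kind' is 'transform' or 'transport'."""
    kind = "transform"

    def write(self, data: bytes) -> bytes:
        return data


class CompressionDriver(Driver):
    def write(self, data: bytes) -> bytes:
        return zlib.compress(data)


class LoggingDriver(Driver):
    def __init__(self) -> None:
        self.log: List[int] = []

    def write(self, data: bytes) -> bytes:
        self.log.append(len(data))  # examine the data, do not change it
        return data


class MemoryTransport(Driver):
    """Stand-in for a real transport such as TCP (assumption for the demo)."""
    kind = "transport"

    def __init__(self) -> None:
        self.wire: List[bytes] = []

    def write(self, data: bytes) -> bytes:
        self.wire.append(data)
        return data


class Stack:
    def __init__(self, drivers: List[Driver]) -> None:
        # Exactly one transport, and it must be on the bottom.
        if not drivers or drivers[-1].kind != "transport":
            raise ValueError("the bottom driver must be a transport")
        if any(d.kind == "transport" for d in drivers[:-1]):
            raise ValueError("only one transport driver is allowed per stack")
        self.drivers = list(drivers)

    def write(self, data: bytes) -> bytes:
        # Control flows from the top of the stack down to the transport.
        for driver in self.drivers:
            data = driver.write(data)
        return data
```

Building `Stack([CompressionDriver(), LoggingDriver(), MemoryTransport()])` mirrors the Compression, Logging, TCP example: the logger sees the compressed bytes, and only the transport moves them out.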
109Copying Files (in a nutshell)
- globus-url-copy [options] srcURL dstURL
- guc gsiftp://localhost/foo file:///bar
- Client/server, using FTP stream mode
- guc -vb -dbg -tcp-bs 1048576 -p 8 gsiftp://localhost/foo gsiftp://localhost/bar
- 3rd party transfer, MODE E
- guc https://host.domain.edu/foo ftp://host.domain.gov/bar
- From secure http to ftp server
110The Options Improving Performance
- -p (parallelism or number of streams)
- rule of thumb 4-8, start with 4
- -tcp-bs (TCP buffer size)
- use either ping or traceroute to determine the RTT between hosts
- buffer size = BW (Mb/s) * RTT (ms) * 1000 / 8 / (parallelism value - 1)
- If that is still too complicated use 2MB
- -vb if you want performance feedback
- -dbg if you have trouble
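The buffer-size rule of thumb is easy to get wrong by a unit, so a small worked version may help. This sketch reads the divisor as (parallelism value - 1), which is an interpretation of the slide; the function name is hypothetical, and clamping the divisor to at least 1 for one or two streams is an added assumption to avoid dividing by zero.

```python
def tcp_buffer_size_bytes(bandwidth_mbps: float, rtt_ms: float, parallelism: int = 1) -> int:
    """Estimate a -tcp-bs value in bytes from bandwidth and round-trip time.

    Mb/s * ms gives kilobits; * 1000 gives bits; / 8 gives bytes.
    """
    if bandwidth_mbps <= 0 or rtt_ms <= 0 or parallelism < 1:
        raise ValueError("bandwidth and RTT must be positive; parallelism must be >= 1")
    bdp_bytes = bandwidth_mbps * rtt_ms * 1000 / 8
    # Assumption: with 1 or 2 streams, do not shrink the buffer at all.
    divisor = max(parallelism - 1, 1)
    return int(bdp_bytes / divisor)
```

For a 1000 Mb/s path with a 50 ms RTT and 4 streams this gives roughly 2 MB per stream, which lines up with the 2MB fallback on the slide.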
111Tuning GridFTP
- Many ways you can tune the performance
- Two sources of data are
- http://www.globus.org/toolkit/docs/4.0/data/gridftp/rn01re01.html
- http://www.nsf-middleware.org/OnTheGrid/2004-09-MaxGridFTP.pdf
112Exercise Simple File Movement
- grid-proxy-init
- echo test > /tmp/test
- look at servers started for each
- guc gsiftp://hostname/tmp/test file:///tmp/test2
- get (from server to client)
- guc file:///tmp/test2 gsiftp://hostname/tmp/test3
- put (from client to server)
- guc gsiftp://hostname1/tmp/test3 gsiftp://hostname2/tmp/test4
- Third party transfer (between two servers)
- guc -dcpriv gsiftp://localhost/dev/zero gsiftp://localhost/dev/null
- transfer with encryption on data channel
113Troubleshooting
- Can I get connected?
- telnet to the port: telnet hostname port
- 2811 is the default port
- You should get something like this
- 220 GridFTP Server gridftp.mcs.anl.gov 0.17 (gcc32dbg, 1108765962-1) ready. Development Release
- If not, you have firewall problems, or xinetd config problems. You are never even starting the server.
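The telnet check can also be scripted. The sketch below opens a TCP connection, reads whatever greeting the server sends, and tests for the FTP "220" (service ready) reply code. The function names are hypothetical and the timeout is an arbitrary choice; it checks reachability only and performs no Grid authentication.

```python
import socket


def read_banner(host: str, port: int = 2811, timeout: float = 5.0) -> str:
    """Connect to host:port and return the first chunk the server sends.

    Raises OSError (for example a refused connection or a timeout) when
    the port cannot be reached, which points at firewall or xinetd
    configuration problems.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.recv(1024).decode("ascii", errors="replace").strip()


def looks_ready(banner: str) -> bool:
    """True when the greeting starts with the FTP 'service ready' code."""
    return banner.startswith("220")
```

A call such as `looks_ready(read_banner("hostname"))` plays the same role as reading the telnet output by eye.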
114Troubleshooting
- no proxy
- grid-proxy-destroy
- guc gsiftp://localhost/dev/zero file:///dev/null
- add -dbg
- grid-proxy-init
- guc gsiftp://localhost/dev/zero file:///dev/null
- add -dbg
115Troubleshooting
- Bad source file
- grid-proxy-init
- guc gsiftp://localhost:2812/tmp/junk file:///tmp/empty
- junk does not exist
- Note that an empty file named empty is created
- We need to fix this in globus-url-copy, but for now it is there
116RFT - File Transfer Queuing
- A WSRF service for queuing file transfer requests
- Server-to-server transfers
- Checkpointing for restarts
- Database back-end for failovers
- Allows clients to request transfers and then disappear
- No need to manage the transfer
- Status monitoring available if desired
117Reliable File Transfer: Third Party Transfer
- Fire-and-forget transfer
- Web services interface
- Many files & directories
- Integrated failure recovery
- Has transferred 900K files
[Diagram labels: RFT Client, SOAP Messages, Notifications (Optional), RFT Service, two GridFTP Servers]
118Replica Location Service
- Identify location of files via logical to physical name map
- Distributed indexing of names, fault tolerant update protocols
- GT4 version scalable & stable
- Managing 40 million files across 10 sites
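The logical-to-physical name map can be pictured as two layers: per-site catalogs that hold the actual mappings, and an index that only records which sites know a given logical name. The sketch below is a toy in-memory model under that assumption; the class names, method names, and example URLs are hypothetical and are not the RLS client API.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LocalReplicaCatalog:
    """Per-site map from logical file name to physical locations."""

    def __init__(self, site: str) -> None:
        self.site = site
        self._map: Dict[str, Set[str]] = defaultdict(set)

    def register(self, logical_name: str, physical_name: str) -> None:
        self._map[logical_name].add(physical_name)

    def lookup(self, logical_name: str) -> Set[str]:
        return set(self._map.get(logical_name, ()))

    def logical_names(self) -> Set[str]:
        return set(self._map)


class ReplicaLocationIndex:
    """Records which sites hold a logical name, not where on the site."""

    def __init__(self) -> None:
        self._sites: Dict[str, Set[str]] = defaultdict(set)

    def update_from(self, catalog: LocalReplicaCatalog) -> None:
        for name in catalog.logical_names():
            self._sites[name].add(catalog.site)

    def sites_for(self, logical_name: str) -> Set[str]:
        return set(self._sites.get(logical_name, ()))


def locate(logical_name: str, index: ReplicaLocationIndex,
           catalogs: Dict[str, LocalReplicaCatalog]) -> List[str]:
    """Ask the index for candidate sites, then each site for physical names."""
    found: List[str] = []
    for site in sorted(index.sites_for(logical_name)):
        found.extend(sorted(catalogs[site].lookup(logical_name)))
    return found
```

Keeping the index coarse is what lets it be distributed and rebuilt from periodic catalog updates.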
119Reliable Wide Area Data Replication
LIGO Gravitational Wave Observatory
Birmingham
Replicating >1 Terabyte/day to 8 sites; >30 million replicas so far; MTBF = 1 month
www.globus.org/solutions
120OGSA-DAI
- Grid Interfaces to Databases
- Data access
- Relational & XML databases, semi-structured files
- Data integration
- Multiple data delivery mechanisms, data translation
- Extensible & efficient framework
- Request documents contain multiple tasks
- A task = execution of an activity
- Group work to enable efficient operation
- Extensible set of activities
- > 30 predefined, framework for writing your own
- Moves computation to data
- Pipelined and streaming evaluation
- Concurrent task evaluation
121OGSA-DAI
- Provide service-based access to structured data resources as part of Globus
- Specify a selection of interfaces tailored to various styles of data access, starting with relational and XML
122The OGSA-DAI Framework
[Diagram labels: Application; Client Toolkit; OGSA-DAI service; Engine; Activities (SQLQuery, GZip, GridFTP, XPath, readFile, XSLT); Data Resources (JDBC, XMLDB, File); Databases (MySQL, DB2, XIndice, SWISS PROT, SQL Server)]
123Extensibility Example
[Diagram labels: OGSA-DAI service, Engine, SQLQuery (twice), Multiple SQL GDS, JDBC, MySQL]
124OGSA-DAI A Framework for Building Applications
- Supports data access, insert and update
- Relational: MySQL, Oracle, DB2, SQL Server, Postgres
- XML: Xindice, eXist
- Files: CSV, BinX, EMBL, OMIM, SWISSPROT, ...
- Supports data delivery
- SOAP over HTTP
- FTP, GridFTP
- E-mail
- Inter-service
- Supports data transformation
- XSLT
- ZIP, GZIP
- Supports security
- X.509 certificate based security
125OGSA-DAI Other Features
- A framework for building data clients
- Client toolkit library for application developers
- A framework for developing functionality
- Extend existing activities, or implement your own
- Mix and match activities to provide functionality you need
- Highly extensible
- Customise our out-of-the-box product
- Provide your own services, client-side support,
and data-related functionality
126Data Replication Service (tech preview)
- Pull missing files to local site
[Diagram labels: Site A and Site B each run a Data Replication Service, a Reliable File Transfer Service, GridFTP, a Local Replica Catalog, and a Replica Location Index; the input is a list of required files]
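"Pull missing files to local site" comes down to a set difference followed by a transfer plan. The sketch below assumes the required list, the local catalog contents, and the remote replica locations are already in hand; the function name, the argument shapes, and the choice of the first listed replica as the source are all assumptions for illustration, not the Data Replication Service interface.

```python
from typing import Dict, Iterable, List, Set, Tuple


def plan_pulls(required: Iterable[str],
               local_names: Set[str],
               remote_replicas: Dict[str, List[str]],
               local_prefix: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (transfers, unresolved).

    transfers is a list of (source URL, destination URL) pairs for files
    that are required but not yet local; unresolved lists required files
    for which no remote replica is known.
    """
    transfers: List[Tuple[str, str]] = []
    unresolved: List[str] = []
    for name in required:
        if name in local_names:
            continue  # already present locally, nothing to pull
        sources = remote_replicas.get(name)
        if sources:
            # Assumption: take the first known replica as the source.
            transfers.append((sources[0], local_prefix + name))
        else:
            unresolved.append(name)
    return transfers, unresolved
```

In the pictured design the resulting pairs would be handed to the Reliable File Transfer Service, and the local catalog updated once each transfer completes.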
127MCS - Metadata Catalog Service
- A stand-alone metadata catalog service
- WSRF service interface
- Stores system-defined and user-defined attributes for logical files/objects
- Supports manipulation and query
- Integrated with OGSA-DAI
- OGSA-DAI provides metadata storage
- When run with OGSA-DAI, basic Grid authentication
mechanisms are available
128Globus Toolkit Open Source Grid Infrastructure
Globus Toolkit v4 www.globus.org
- Security: Authentication & Authorization, Community Authorization, Delegation, Credential Mgmt
- Data Mgmt: GridFTP, Reliable File Transfer, Replica Location, Data Replication, Data Access & Integration
- Execution Mgmt: Grid Resource Allocation & Management, Workspace Management, Community Scheduling Framework, Grid Telecontrol Protocol
- Info Services: Index, Trigger, WebMDS
- Common Runtime: Java Runtime, C Runtime, Python Runtime
129Monitoring and Discovery System (MDS4)
- Grid-level monitoring system
- Aid user/agent to identify host(s) on which to run an application
- Warn on errors
- Uses standard interfaces to provide publishing of data, discovery, and data access, including subscription/notification
- WS-ResourceProperties, WS-BaseNotification, WS-ServiceGroup
- Functions as an hourglass to provide a common interface to lower-level monitoring tools
130Information Users: Schedulers, Portals, Warning Systems, etc.
WS standard interfaces for subscription, registration, notification
Standard Schemas (e.g., GLUE schema)
131MDS4 Components
- Information providers
- Monitoring is a part of every WSRF service
- Non-WS services can also be used
- Higher level services
- Index Service: a way to aggregate data
- Trigger Service: a way to be notified of changes
- Both built on common aggregator framework
- Clients
- WebMDS
- All of the tools are schema-agnostic, but interoperability needs a well-understood common language
132Information Providers
- Data sources for the higher-level services
- Some are built into services
- Any WSRF-compliant service publishes some data automatically
- WS-RF gives us standard Query/Subscribe/Notify interfaces
- GT4 services: ServiceMetaDataInfo element includes start time, version, and service type name
- Most of them also publish additional useful information as resource properties
133Information Providers (2)
- Other sources of data
- Any executables
- Other (non-WS) services
- Interface to another archive or data store
- File scraping
- Just need to produce a valid XML document
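Since an information provider only has to emit a valid XML document, a provider can be a very small script. The sketch below builds such a document with the Python standard library. The element names are invented for the example and do not follow the GLUE schema or any MDS4 schema; a real deployment would agree on a common schema for interoperability.

```python
import xml.etree.ElementTree as ET


def host_report(hostname: str, load_1min: float, free_memory_mb: int) -> str:
    """Return a small, well-formed XML document describing one host.

    Element names are hypothetical placeholders, not a standard schema.
    """
    root = ET.Element("HostReport")
    ET.SubElement(root, "Name").text = hostname
    ET.SubElement(root, "LoadAverage1Min").text = f"{load_1min:.2f}"
    ET.SubElement(root, "FreeMemoryMB").text = str(free_memory_mb)
    # encoding="unicode" makes tostring return str rather than bytes.
    return ET.tostring(root, encoding="unicode")
```

An executable wrapper would print this string to standard output for a higher-level service to collect.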
134Information Providers: GT4 Services
- Reliable File Transfer Service (RFT)
- Service status data, number of active transfers, transfer status, information about the resource running the service
- Community Authorization Service (CAS)
- Identifies the VO served by the service instance
- Replica Location Service (RLS)
- Note: not a WS
- Location of replicas on physical storage systems (based on user registrations) for later queries
135Information Providers: Cluster and Queue Data
- Interfaces to Hawkeye, Ganglia, CluMon, Nagios
- Basic host data (name, ID), processor information, memory size, OS name and version, file system data, processor load data
- Some Condor/cluster specific data
- This can also be done for sub-clusters, not just at the host level
- Interfaces to PBS, Torque, LSF
- Queue information, number of CPUs available and free, job count information, some memory statistics and host info for head node of cluster
136Higher-Level Services
- Index Service
- Caching registry
- Trigger Service
- Warn on error conditions
- Archive Service
- Database store for history (in development)
- All of these have common needs, and are built on
a common framework
137Common Aggregator Framework
- Basic framework for higher-level functions
- Subscribe to Information Provider(s)
- Do some action
- Present standard interfaces
138Aggregator Framework Features
- 1) Common configuration mechanism
- Specify what data to get, and from where
- 2) Self cleaning
- Services have lifetimes that must be refreshed
- 3) Soft consistency model
- Published information is recent, but not guaranteed to be the absolute latest
- 4) Schema Neutral
- Valid XML document needed only
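The self-cleaning and soft-consistency features can be illustrated with a registry whose entries expire unless refreshed. This is a minimal sketch of the idea only: the class, its methods, and the injectable clock are assumptions for the example and are not part of the aggregator framework's interface.

```python
import time
from typing import Callable, Dict, List, Tuple


class SoftStateRegistry:
    """Entries carry a lifetime; anything not refreshed in time is dropped."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # name -> (XML document as text, expiry time)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def register(self, name: str, document: str, lifetime_s: float) -> None:
        """Add an entry, or refresh it by registering again."""
        self._entries[name] = (document, self._clock() + lifetime_s)

    def sweep(self) -> List[str]:
        """Remove expired entries and return their names."""
        now = self._clock()
        expired = sorted(n for n, (_, expiry) in self._entries.items() if expiry <= now)
        for name in expired:
            del self._entries[name]
        return expired

    def names(self) -> List[str]:
        return sorted(self._entries)
```

Between sweeps a reader may still see an entry whose source has gone away, which is the sense in which the published information is recent but not guaranteed to be the latest.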
139MDS4 Index Service
- Index Service is both registry and cache
- Datatype and data provider info, like a registry (UDDI)
- Last value of data, like a cache
- In-memory default approach
- DB backing store currently being developed to allow for very large indexes
- Can be set up for a site or set of sites, a specific set of project data, or for user-specific data only
- Can be a multi-rooted hierarchy
- No global index
140MDS4 Trigger Service
- Subscribe to a set of resource properties
- Evaluate that data against a set of pre-configured conditions (triggers)
- When a condition matches, action occurs
- Email is sent to pre-defined address
- Website updated
- Similar functionality in Hawkeye
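The evaluate-then-act loop of a trigger can be sketched as follows. The names, the shape of the property dictionary, and the use of plain Python callables for conditions and actions are assumptions for illustration; the actual Trigger Service is configured rather than programmed this way.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Properties = Dict[str, float]


@dataclass
class Trigger:
    name: str
    condition: Callable[[Properties], bool]      # pre-configured condition
    action: Callable[[str, Properties], None]    # e.g. send mail, update a page


def evaluate(triggers: List[Trigger], properties: Properties) -> List[str]:
    """Run every trigger against the latest property values.

    Returns the names of the triggers whose condition matched and whose
    action was therefore invoked.
    """
    fired: List[str] = []
    for trigger in triggers:
        if trigger.condition(properties):
            trigger.action(trigger.name, properties)
            fired.append(trigger.name)
    return fired
```

In use, `evaluate` would be called each time a subscription delivers new resource property values, with an action that sends email or updates a web page.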
141WebMDS User Interface
- Web-based interface to WSRF resource property information
- User-friendly front-end to Index Service
- Uses standard resource property requests to query resource property data
- XSLT transforms to format and display them
- Customized pages are made simply by using HTML form options and creating your own XSLT transforms
- Sample page
- http://mds.globus.org:8080/webmds/webmds?info=indexinfo&xsl=servicegroupxsl
143Working with TeraGrid
- Large US project across 9 different sites
- Different hardware, queuing systems and lower level monitoring packages
- Starting to explore MetaScheduling approaches
- GRMS (Poznan)
- W. Smith (TACC)
- K. Yoshimoto (SDSC)
- User Portal
- Need a common source of data with a standard
interface for basic scheduling info
144Data Collected
- Provide data at the subcluster level
- Sys admin defines a subcluster, we query one node of it to dynamically retrieve relevant data
- Can also list per-host details
- Interfaces to Ganglia, Hawkeye, CluMon, and Nagios available now
- Other cluster monitoring systems can write into a .html file that we then scrape
- Also collect basic queuing data, some TeraGrid specific attributes
146Scalability Experiments