Grid Computing and the Globus Toolkit - PowerPoint PPT Presentation

About This Presentation
Title:

Grid Computing and the Globus Toolkit

Slides: 151
Provided by: jennife62

Transcript and Presenter's Notes

Title: Grid Computing and the Globus Toolkit


1
Grid Computing and the Globus Toolkit
  • Jennifer M. Schopf
  • Argonne National Lab
  • National eScience Centre
  • http://www.mcs.anl.gov/jms/Talks/

2
What is a Grid?
  • Resource sharing
  • Computers, storage, sensors, networks, ...
  • Sharing always conditional: issues of trust,
    policy, negotiation, payment, ...
  • Coordinated problem solving
  • Beyond client-server: distributed data analysis,
    computation, collaboration, ...
  • Dynamic, multi-institutional virtual orgs
  • Community overlays on classic org structures
  • Large or small, static or dynamic

3
Why Is this Hard or Different?
  • Lack of central control
  • Where things run
  • When they run
  • Shared resources
  • Contention, variability
  • Communication
  • Different sites imply different sys admins,
    users, institutional goals, and often strong
    personalities

4
So Why Do It?
  • Computations that need to be done with a time
    limit
  • Data that can't fit on one site
  • Data owned by multiple sites
  • Applications that need to be run bigger, faster,
    more

5
What Kinds of Applications?
  • Computation intensive
  • Interactive simulation (climate modeling)
  • Large-scale simulation and analysis (galaxy
    formation, gravity waves, event simulation)
  • Engineering (parameter studies, linked models)
  • Data intensive
  • Experimental data analysis (e.g., physics)
  • Image sensor analysis (astronomy, climate)
  • Distributed collaboration
  • Online instrumentation (microscopes, x-ray)
  • Remote visualization (climate studies, biology)
  • Engineering (large-scale structural testing)

6
Key Common Feature
  • The size and/or complexity of the problem
    requires that people in several organizations
    collaborate and share computing resources, data,
    instruments

7
The Globus Approach
8
The Role of the Globus Toolkit
  • A collection of solutions to problems that come
    up frequently when building collaborative
    distributed applications
  • Heterogeneity
  • A focus, in particular, on overcoming
    heterogeneity for application developers
  • Standards
  • We capitalize on and encourage use of existing
    standards (IETF, W3C, OASIS, GGF)
  • GT also includes reference implementations of
    new/proposed standards in these organizations

9
Globus is an Hourglass
Higher-Level Services and Users
  • Local sites have their own policies and installs:
    heterogeneity!
  • Queuing systems, monitors, network protocols, etc.
  • Globus unifies
  • Build on Web services
  • Use WS-RF, WS-Notification to represent/access
    state
  • Common management abstractions and interfaces

Standard GT4 Interfaces
Local heterogeneity
10
On April 29, 2005 the Globus Alliance
released the finest version of the Globus Toolkit
to date!
Don't take our word for it! Read the UK eScience
Evaluation of GT4: www.nesc.ac.uk/technical_papers/UKeS-2005-03.pdf
(reachable from www.globus.org, under News)
11
How it Really Happens
  • Implementations are provided by a mix of
  • Application-specific code
  • Off-the-shelf tools and services
  • Tools and services from the Globus Toolkit
  • Tools and services from the Grid community
    (compatible with GT)
  • Glued together by
  • Application development
  • System integration

12
A Typical eScience Use of Globus: Network for
Earthquake Eng. Simulation
Links instruments, data, computers, people
13
Without the Globus Toolkit
[Diagram: web browser, web portal, registration service, simulation tool,
data viewer tool, chat tool, telepresence monitor, two cameras, data
catalog, credential repository, certificate authority, compute servers A
and B, and database services C, D, and E]
Resources implement standard access and management
interfaces
Collective services aggregate and/or virtualize
resources
Users work with client applications
Application services organize VOs and enable access
to other services
14
With the Globus Toolkit
[Diagram: web browser with CHEF, simulation tool, data viewer tool, CHEF
chat teamlet, telepresence monitor, two cameras, Globus Index Service,
Globus MCS/RLS, MyProxy, certificate authority, compute servers running
Globus GRAM, and database services running Globus DAI]
Resources implement standard access and management
interfaces
Collective services aggregate and/or virtualize
resources
Users work with client applications
Application services organize VOs and enable access
to other services
15
Globus is Grid Infrastructure
  • Software for Grid infrastructure
  • Service-enable new and existing resources
  • E.g., GRAM on computer, GridFTP on storage
    system, custom application service
  • Uniform abstractions and mechanisms
  • Tools to build applications that exploit Grid
    infrastructure
  • Registries, security, data management, ...
  • Open source and open standards
  • Each empowers the other
  • Enabler of a rich tool and service ecosystem

16
The Globus Toolkit: Standard Plumbing for the
Grid
  • Not turnkey solutions, but building blocks and
    tools for application developers and system
    integrators
  • Some components (e.g., file transfer) go farther
    than others (e.g., remote job submission) toward
    end-user relevance
  • Easier to reuse than to reinvent
  • Compatibility with other Grid systems comes for
    free
  • Today the majority of the GT public interfaces
    are usable by application developers and system
    integrators
  • Relatively few end-user interfaces
  • In general, not intended for direct use by end
    users (scientists, engineers, marketing
    specialists)

17
Globus is a Building Block
  • Basic components for grid functionality
  • Highest-level services are often application
    specific; we let applications concentrate there
  • Easier to reuse than to reinvent
  • Compatibility with other Grid systems comes for
    free
  • We provide basic infrastructure to get you one
    step closer

18
Standards
19
Leveraging Existing and Proposed Standards
  • WSRF and WS-N (GGF, OASIS)
  • WS-Agreement, WSDL 2.0, WSDM
  • GridFTP v1.0 (GGF)
  • OGSI v1.0 (GGF)
  • SSL/TLS v1 (from OpenSSL) (IETF)
  • X.509 Proxy Certificates (IETF)
  • SAML, XACML

20
GT Protocols
  • Web service protocols
  • WSDL, SOAP
  • WS Addressing, WSRF, WSN
  • WS Security, SAML, XACML
  • WS-Interoperability profile
  • Non Web service protocols
  • Standards-based, such as GridFTP
  • Custom

21
WSRF and WS-Notification
  • Naming and bindings (basis for virtualization)
  • Every resource can be uniquely referenced, and
    has one or more associated services for
    interacting with it
  • Lifecycle (basis for fault-resilient state
    management)
  • Resources created by services following factory
    pattern
  • Resources destroyed immediately or scheduled
  • Information model (basis for monitoring and
    discovery)
  • Resource properties associated with resources
  • Operations for querying and setting this info
  • Asynchronous notification of changes to
    properties
  • Service Groups (basis for registries and
    collective svcs)
  • Group membership rules and membership management
  • Base Fault type
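The WSRF/WS-N capabilities above can be modeled in a few dozen lines. What follows is a minimal in-memory sketch, not the GT4 WS Core API: the class and method names (ResourceHome, create, subscribe, sweep) are hypothetical. It shows how they fit together: a factory creates uniquely keyed resources, clients read and set resource properties, listeners are notified of changes, and destruction happens either immediately or at a scheduled termination time. Service groups and faults are left out.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Listener = Callable[[str, object], None]


@dataclass
class WsResource:
    key: str                                   # unique reference to this resource
    properties: Dict[str, object]              # its "resource properties"
    termination_time: Optional[float] = None   # None = no scheduled destruction
    listeners: List[Listener] = field(default_factory=list)


class ResourceHome:
    """Hypothetical in-memory home for stateful resources (not the GT4 API)."""

    def __init__(self) -> None:
        self._resources: Dict[str, WsResource] = {}
        self._ids = itertools.count(1)

    def create(self, **props: object) -> str:
        # Factory pattern: every call yields a new, uniquely keyed resource.
        key = f"resource-{next(self._ids)}"
        self._resources[key] = WsResource(key, dict(props))
        return key

    def exists(self, key: str) -> bool:
        return key in self._resources

    def get_property(self, key: str, name: str) -> object:
        return self._resources[key].properties[name]

    def set_property(self, key: str, name: str, value: object) -> None:
        res = self._resources[key]
        res.properties[name] = value
        # WS-Notification delivers changes asynchronously; this sketch calls listeners inline.
        for listener in res.listeners:
            listener(name, value)

    def subscribe(self, key: str, listener: Listener) -> None:
        self._resources[key].listeners.append(listener)

    def destroy(self, key: str) -> None:
        # Immediate destruction.
        del self._resources[key]

    def set_termination_time(self, key: str, when: float) -> None:
        # Scheduled destruction, enforced later by sweep().
        self._resources[key].termination_time = when

    def sweep(self, now: float) -> None:
        expired = [k for k, r in self._resources.items()
                   if r.termination_time is not None and r.termination_time <= now]
        for k in expired:
            del self._resources[k]
```

For example, a job-management service built this way could create one resource per job and let clients subscribe to its status property instead of polling.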

22
WS Core Enables Frameworks: E.g., Resource
Management
  • Applications of the framework (compute, network,
    storage provisioning, job reservation and
    submission, data management, application service
    QoS, ...)
  • WS-Agreement (agreement negotiation)
  • WS Distributed Management (lifecycle, monitoring,
    ...)
  • WS-Resource Framework and WS-Notification
    (resource identity, lifetime, inspection,
    subscription, ...)
  • Web services (WSDL, SOAP, WS-Security,
    WS-ReliableMessaging, ...)
An evolution of Open Grid Services
Infrastructure (OGSI)
23
WSRF vs XML/SOAP
  • The definition of WSRF means that the Grid and
    Web services communities can move forward on a
    common base
  • Why Not Just Use XML/SOAP?
  • WSRF and WS-N are just XML and SOAP
  • WSRF and WS-N are just Web services
  • Benefits of following the specs
  • These patterns represent best practices that have
    been learned in many Grid applications
  • There is a community behind them
  • Why reinvent the wheel?
  • Standards facilitate interoperability

24
Globus is a Tool
  • A Grid development environment
  • Develop new OGSA-compliant Web Services
  • Develop applications using Java or C/C++ Grid
    APIs
  • Secure applications using basic security
    mechanisms
  • A set of basic Grid functionality
  • Services and clients
  • Libraries
  • Development tools and examples
  • The prerequisites for many Grid community tools

25
GT Domain Areas
  • Core runtime
  • Infrastructure for building new services
  • Security
  • Apply uniform policy across distinct systems
  • Execution management
  • Provision, deploy, manage services
  • Data management
  • Discover, transfer, access large data
  • Monitoring
  • Discover and monitor dynamic services

26
Globus Toolkit Open Source Grid Infrastructure
Globus Toolkit v4 www.globus.org
  • Security: Authentication and Authorization,
    Community Authorization, Delegation,
    Credential Mgmt
  • Data Mgmt: GridFTP, Reliable File Transfer,
    Replica Location, Data Replication, Data Access
    and Integration
  • Execution Mgmt: Grid Resource Allocation and
    Management, Community Scheduling Framework,
    Workspace Management, Grid Telecontrol Protocol
  • Info Services: Index, Trigger, WebMDS
  • Common Runtime: Java Runtime, C Runtime,
    Python Runtime
27
Our Goals for GT4
  • Usability, reliability, scalability,
  • Web service components have quality equal or
    superior to pre-WS components
  • Documentation at acceptable quality level
  • Consistency with latest standards (WS-*, WSRF,
    WS-N, etc.) and Apache platform
  • WS-I Basic Profile compliant
  • WS-I Basic Security Profile compliant
  • New components, platforms, languages
  • And links to larger Globus ecosystem

29
GT2 Evolution To GT4
  • ALL of GT2 functionality is in GT4
  • What happened to the GT2 key protocols?
  • Security: adapted X.509 proxy certs to integrate
    with emerging WS standards
  • GRAM: took ad hoc protocols away and now uses
    WS-RF standards (ManagedJobFactory and related
    service definitions)
  • GridFTP: uses the GridFTP standard from GGF;
    Reliable File Transfer (RFT) supplies the
    WSRF-compliant interface
  • MDS/LDAP: replaced LDAP extensions with WSRF
    standards for notification, subscription, and
    registration

30
GT2 vs GT4
  • Pre-WS Globus is in GT4 release
  • Both WS and pre-WS components (à la 2.4.3) are
    shipped
  • These do NOT interact, but both can run on the
    same resource independently
  • Basic functionality is the same
  • Run a job
  • Transfer a file
  • Monitoring
  • Security
  • Code base is completely different

31
Why Use GT4?
  • Performance and reliability
  • Literally millions of tests and queries run
    against GT4 services
  • Scalability
  • Many lessons learned from GT2 have been addressed
    in GT4
  • Support
  • This is our active code base; it gets much more attention
  • Additional functionality
  • New features are here
  • Additional GRAM interfaces to schedulers, MDS
    Trigger service, GridFTP protocol interfaces, etc
  • Easier to contribute to

32
4.0 is not a typical .0 release, but the
culmination of months of testing
[Figure: release timeline from the CVS trunk, showing stable release
branches, development releases (3.3.0, 3.9.0 through 3.9.5), and stable
releases (3.0.0 through 3.0.2, 3.2.0, 3.2.1, 4.0.0 through 4.0.2)]
33
Versioning and Support
  • Versioning
  • Evens are production (4.0.x, 4.2.x),
  • Odds are development (4.1.x)
  • We support this version and the one previous
  • Currently we're at 4.0.2, so we support 3.2 and
    4.0
  • There is also a 4.1.0 development release
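The even/odd numbering rule and the "this version and the one previous" support policy can be stated precisely. Below is a small sketch of both. The function names are hypothetical. One assumption matters: the list of stable series is supplied by the caller, because the previous stable series (3.2 before 4.0) cannot be computed from the version number alone.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_production(version: str) -> bool:
    # Even minor number (4.0.x, 4.2.x): production; odd (4.1.x): development.
    return parse_version(version)[1] % 2 == 0


def supported_series(current: str, stable_series: List[str]) -> List[str]:
    """Return the current stable series plus the one before it.

    stable_series lists production series in release order, e.g. ["3.0", "3.2", "4.0"].
    """
    if not is_production(current):
        raise ValueError(f"{current} is a development release")
    major, minor, _ = parse_version(current)
    idx = stable_series.index(f"{major}.{minor}")
    return stable_series[max(0, idx - 1): idx + 1]
```

With the current release at 4.0.2, `supported_series("4.0.2", ["3.0", "3.2", "4.0"])` gives `["3.2", "4.0"]`, matching the slide.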

34
Several Possible Next Versions
  • 4.0.3 stable release
  • 100% same interfaces, bug fixes only
  • Perhaps in the fall?
  • 4.1.1 development release
  • New functionality
  • Likely 6-10 weeks?
  • 4.2 - stable release
  • When 4.1 has enough new functionality, and is
    stable
  • 5.0 substantial code base change
  • With any luck, not for years :)

35
Testing Overview
  • Nightly builds and tests
  • TestGrid at USC/ISI
  • Stand up services for several weeks
  • Perform stress tests
  • TestGrid at LBNL
  • Focus on WS Core performance and interoperability
    tests
  • Performance and reliability testing is a major
    focus
  • Component-specific approaches mostly
  • Calls for Community Testing near release time -
    we welcome new testing help!

36
Tested Platforms
  • Debian
  • Fedora Core
  • FreeBSD
  • HP/UX
  • IBM AIX
  • Red Hat
  • Sun Solaris
  • SGI Altix (IA64 running Red Hat)
  • SuSE Linux
  • Tru64 Unix
  • Apple MacOS X (no binaries)
  • Windows: Java components only
  • List of binaries and known platform-specific
    install bugs at
  • http://www.globus.org/toolkit/docs/4.0/admin/docbook/ch03.html

37
Documentation Overview
  • Current documentation is significantly more
    detailed than for earlier versions
  • http://www.globus.org/toolkit/docs/4.0/
  • Tutorials available for those of you building a
    new service
  • http://www-unix.globus.org/toolkit/tutorials/BAS/
  • Globus Toolkit 4 Programming Java Services (The
    Morgan Kaufmann Series in Networking), by Borja
    Sotomayor, Lisa Childers (Available through
    Amazon, 19.99 or 20)

38
Grid Packaging Technology (GPT)
  • Collection of XML-based packaging tools
  • Straightforward definition of complex dependency
    and compatibility relationships between packages
  • Way for developers to define the packaging data
    and include it as part of their source code
    distribution
  • Automatic generation of binary packages
  • Developer tools
  • Convert a source distribution into a GPT package
  • Patch-n-build capability similar to RPM spec
    files, so developers can retain their own build
    system if needed
  • User Tools
  • Enable collections of packages to be built and/or
    installed
  • Package manager for those systems that don't have
    one
  • Developed at NCSA

39
Virtual Data Toolkit (VDT)
  • Grid middleware distribution focused on ease of
    use (and installation)
  • Contents
  • Globus Toolkit, Condor, Condor-G
  • Virtual Data Tools (Chimera, Pegasus, RLS)
  • Utilities (GSI-OpenSSH, MyProxy, MonaLisa,
    NetLogger, KX.509, etc.)
  • GriPhyN Virtual Data System (containing Chimera
    and Pegasus)
  • Uses PACMAN for distribution, install,
    configuration.
  • Used by GriPhyN, iVDGL, Open Science Grid, LCG,
    UK NGS, and others

40
Installation in a nutshell
  • Quickstart guide is very useful
  • http://www.globus.org/toolkit/docs/4.0/admin/docbook/quickstart.html
  • Verify your prereqs!
  • Security: check spellings and permissions
  • Globus is system software: plan accordingly

41
Now that you've done your installation, let's
talk about what you get!
42
Globus Toolkit Open Source Grid Infrastructure
Globus Toolkit v4 www.globus.org
43
GT4 Web Services Runtime
  • Supports both GT (GRAM, RFT, Delegation, etc.)
    and user-developed services
  • Redesign to enhance scalability, modularity,
    performance, usability
  • Leverages existing WS standards
  • WS-I Basic Profile: WSDL, SOAP, etc.
  • WS-Security, WS-Addressing
  • Adds support for emerging WS standards
  • WS-Resource Framework, WS-Notification
  • Java, Python, C hosting environments
  • Java is standard Apache

44
What does Core give you?
  • Reference implementation of WSRF and WS-N
    functions
  • Naming and bindings (basis for virtualization)
  • Every resource can be uniquely referenced and has
    one or more associated services for interacting
  • Lifecycle (basis for resilient state management)
  • Resources created by svcs following a factory
    pattern
  • Resource destroyed immediately or scheduled
  • Information model (basis for monitoring and
    discovery)
  • Resource properties associated with resources
  • Operations for querying and setting this info
  • Asynchronous notification of changes to
    properties
  • Service groups (basis for registries and
    collective svcs)
  • Group membership rules and membership management
  • Base fault type

45
Apache Axis Web Services Container
  • Good news for Java WS developers: GT 4.0 works
    with standard Axis and Tomcat
  • GT provides Axis-loadable libraries, handlers
  • Includes useful behaviors such as inspection,
    notification, lifetime mgmt (WSRF)
  • Others implement GRAM, etc.
  • Major Globus contributions to Apache
  • 50% of WS-Addressing code
  • 15% of WS-Security code
  • Many bug fixes
  • WSRF code a possible next contribution

[Diagram: GT bits and app bits layered over Security and Addressing, on top of Axis]
Modulo Axis and Tomcat release cycle issues
46
GT4 Web Services Runtime
48
WSRF/WSNs Compared (Humphrey et al., HPDC 2005)
49
GetRP Test
  • Distributed client and service on same LAN
  • (times in milliseconds)

[Table: GetRP times in milliseconds for the compared implementations,
with No Security, X509 Signing, and HTTPS]
50
GT4 WS Core Performance
[Charts: (1) message-level security and (2) transport-level security,
times in milliseconds; source: WSRF/WSNs Compared, HPDC 2005]
51
Globus Toolkit Open Source Grid Infrastructure
Globus Toolkit v4 www.globus.org
52
Globus Security
  • Control access to shared services
  • Address autonomous management, e.g., different
    policy in different work-groups
  • Support multi-user collaborations
  • Federate through mutually trusted services
  • Local policy authorities rule
  • Allow users and application communities to set up
    dynamic trust domains
  • Personal/VO collection of resources working
    together based on trust of user/VO

53
Virtual Organization (VO) Concept
  • VO for each application or workload
  • Carve out and configure resources for a
    particular use and set of users

54
GT4 Security
Users
55
GT Authorization Framework
56
GT4 Security
  • Public-key-based authentication
  • Transport- and message-level authentication
  • Extensible authorization framework based on Web
    services standards
  • SAML-based authorization callout
  • Integrated policy decision engine
  • XACML policy language, per-operation policies,
    pluggable
  • Credential management service
  • MyProxy (one-time password support)
  • Community Authorization Service
  • Standalone delegation service
  • Ability to map between Grid and local identity

57
Security Tools
  • Basic Grid Security Mechanisms
  • Certificate Generation Tools
  • Certificate Management Tools
  • Getting users registered to use a Grid
  • Getting Grid credentials to wherever they're
    needed in the system
  • Authorization/Access Control Tools
  • Storing and providing access to system-wide
    authorization information

58
Other Security Services Include
  • MyProxy
  • Simplified credential management
  • Web portal integration
  • Single-sign-on support
  • KCA and kx.509
  • Bridging into/out-of Kerberos domains
  • SimpleCA
  • Online credential generation
  • PERMIS
  • Authorization service callout

59
A Cautionary Note
  • Grid security mechanisms are tedious to set up
  • If exposed to users, hand-holding is usually
    required
  • These mechanisms can be hidden entirely from end
    users, but still used behind the scenes
  • These mechanisms exist for good reasons.
  • Many useful things can't be done without Grid
    security
  • It is unlikely that an ambitious project could go
    into production operation without security like
    this
  • Most successful projects end up using Grid
    security, but using it in ways that end users
    don't see much

60
GT4's Use of Security Standards
[Table: GT4 security options, annotated "Supported, but slow",
"Supported, but insecure", and "Fastest, so default"]
61
GT-XACML Integration
  • eXtensible Access Control Markup Language
  • OASIS standard, open source implementations
  • XACML sophisticated policy language
  • Globus Toolkit ships with XACML runtime
  • Included in every client and server built on GT
  • Turned-on through configuration
  • that can be called transparently from runtime
    and/or explicitly from application
  • and we use the XACML-model for our Authz
    Processing Framework

62
Globus Certificate Service
  • An online service that issues low-quality GSI
    certificates
  • Intended for people who want to experiment with
    Grid components that require certificates but do
    not have any other means of acquiring
    certificates.
  • These certificates are not to be used on
    production systems.
  • Not a true Certificate Authority (CA)
  • No revoking or reissuing certificates
  • No verification of identities
  • The service itself is not especially secure.

63
Simple CA
  • A convenient method of setting up a certificate
    authority (CA).
  • The Certificate Authority can then be used to
    issue certificates for users and services that
    work with GSI and WS-Security.
  • Simple CA is intended for operators of small Grid
    testing environments and users who are not part
    of a larger Grid.
  • Most production Grids will not accept
    certificates that are not signed by a well-known
    CA, so the certificates generated by Simple CA
    will usually not be sufficient to gain access to
    production services.

64
MyProxy
  • MyProxy is a remote service that stores user
    credentials.
  • Users can request proxies for local use on any
    system on the network.
  • Web Portals can request user proxies for use with
    back-end Grid services.
  • Grid administrators can pre-load credentials in
    the server for users to retrieve when needed.
  • Greatly simplifies certificate management!
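The MyProxy workflow can be illustrated with a toy credential repository. In the sketch below, a user or administrator stores a passphrase-protected entry. A portal or user later retrieves a short-lived grant with that passphrase, and the repository caps the grant's lifetime. Several things are assumptions, not MyProxy's actual protocol or API: the class and method names, the PBKDF2 passphrase check, and the 12-hour default cap. Real MyProxy issues signed X.509 proxy certificates, which this sketch does not attempt.

```python
import hashlib
import hmac
import os
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ProxyGrant:
    username: str
    expires_at: float  # epoch seconds


class CredentialRepository:
    """Hypothetical MyProxy-style store; real MyProxy issues signed X.509 proxies."""

    def __init__(self, max_lifetime_s: float = 12 * 3600) -> None:
        self._store: Dict[str, Tuple[bytes, bytes]] = {}  # user -> (salt, digest)
        self.max_lifetime_s = max_lifetime_s

    @staticmethod
    def _digest(passphrase: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000)

    def store(self, username: str, passphrase: str) -> None:
        # A user or Grid administrator pre-loads a passphrase-protected credential.
        salt = os.urandom(16)
        self._store[username] = (salt, self._digest(passphrase, salt))

    def get_proxy(self, username: str, passphrase: str, lifetime_s: float,
                  now: Optional[float] = None) -> ProxyGrant:
        salt, expected = self._store[username]  # KeyError for unknown users
        if not hmac.compare_digest(expected, self._digest(passphrase, salt)):
            raise PermissionError("passphrase does not match")
        now = time.time() if now is None else now
        # Hand out a short-lived grant whose lifetime the repository caps.
        return ProxyGrant(username, now + min(lifetime_s, self.max_lifetime_s))
```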

65
CAS Community Authorization Service
  • CAS allows resource providers to specify
    coarse-grained access control policies in terms
    of communities as a whole.
  • Fine-grained access control is delegated to the
    community.
  • Resource providers maintain ultimate authority
    over their resources (including per-user control
    and auditing) but are spared most day-to-day
    policy administration tasks.
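The two-level CAS decision described above comes down to a conjunction. The provider's coarse-grained grant to the community and the community's fine-grained grant to the user must both hold, and the provider can still deny an individual user. The sketch below shows that logic with plain dictionaries. The function name, policy shapes, and resource names are hypothetical. Real CAS carries the community's assertions in signed credentials rather than a shared dictionary.

```python
from typing import AbstractSet, Dict, Set, Tuple

Grant = Tuple[str, str]  # (resource, action)


def cas_authorized(user: str, community: str, resource: str, action: str,
                   provider_policy: Dict[str, Set[Grant]],
                   community_policy: Dict[str, Set[Grant]],
                   provider_banned: AbstractSet[str] = frozenset()) -> bool:
    """Allow only if the provider grants the community AND the community grants the user."""
    if user in provider_banned:
        # The resource provider keeps ultimate per-user control.
        return False
    grant = (resource, action)
    community_ok = grant in provider_policy.get(community, set())  # coarse-grained
    user_ok = grant in community_policy.get(user, set())           # fine-grained
    return community_ok and user_ok
```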

66
VOMS
  • A community-level group membership system
  • Database of user roles
  • Administrative tools
  • Client interface
  • voms-proxy-init
  • Uses client interface to produce an attribute
    certificate (instead of proxy) that includes
    roles and capabilities signed by VOMS server
  • Works with non-VOMS services, but gives more info
    to VOMS-aware services
  • Allows VOs to centrally manage user roles

67
Globus Toolkit Open Source Grid Infrastructure
Globus Toolkit v4 www.globus.org
68
Execution Management (GRAM)
  • Common WS interface to schedulers
  • Unix, Condor, LSF, PBS, SGE, ...
  • More generally, an interface for process
    execution management
  • Lay down execution environment
  • Stage data
  • Monitor and manage lifecycle
  • Kill it, clean up
  • A basis for application-driven provisioning

69
GRAM - Basic Job Submission and Control Service
  • A uniform service interface for remote job
    submission and control
  • Includes file staging and I/O management
  • Includes reliability features
  • Supports basic Grid security mechanisms
  • Available in Pre-WS and WS
  • GRAM is not a scheduler.
  • No scheduling
  • No metascheduling/brokering
  • Often used as a front-end to schedulers, and
    often used to simplify metaschedulers/brokers

70
GT4 WS GRAM
  • 2nd-generation WS implementation optimized for
    performance, flexibility, stability, scalability
  • Streamlined critical path
  • Use only what you need
  • Flexible credential management
  • Credential cache and delegation service
  • GridFTP and RFT used for data operations
  • Data staging and streaming output
  • Eliminates redundant GASS code

71
GRAM
  • Intended for jobs where arbitrary programs,
    stateful monitoring, credential management, and
    file staging are important
  • If the application is lightweight, with modest
    input/output, it may be a better candidate for
    hosting directly as a WSRF service

72
GT4 WS GRAM Architecture
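[Diagram: on the service host(s) and compute element(s), a GT4 Java
Container hosts the Delegation service, the GRAM services, and RFT. The
client delegates credentials to the Delegation service. The GRAM services
use sudo and a GRAM adapter for local job control through the local
scheduler, and the SEG reports job events. RFT sends transfer requests to
GridFTP servers on the compute element and on remote storage element(s)
over FTP control and data channels. The user job runs on the compute
element.]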
73
GT4 WS GRAM Architecture
Delegated credential can be made available to
the application
74
GT4 WS GRAM Architecture
Delegated credential can be used to authenticate
with RFT
75
GT4 WS GRAM Architecture
Delegated credential can be used to authenticate
with GridFTP
76
Submitting a Sample Job
  • Specify a remote host with -F
  • globusrun-ws -submit -F host2 -c /bin/true
  • The return code will be the job's exit code, if
    supported by the scheduler

77
Data Staging and Streaming
  • Simplest stage-in/stage-out example is
    stdout/stderr
  • globusrun-ws -S -s -c /bin/date
  • -S is short for -submit
  • -s is short for -streaming
  • The output will be sent back to the terminal,
    control will not return until the job is done

78
Resource Specification Language
  • For more complicated jobs, we'll use RSL to
    specify the job:
    <job>
      <executable>/bin/echo</executable>
      <argument>this is an example_string </argument>
      <argument>Globus was here</argument>
      <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
      <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
    </job>

79
Resource Specification Language
    <job>
      <executable>/bin/echo</executable>
      <directory>/tmp</directory>
      <argument>12</argument>
      <environment>
        <name>PI</name>
        <value>3.141</value>
      </environment>
      <stdin>/dev/null</stdin>
      <stdout>stdout</stdout>
      <stderr>stderr</stderr>
    </job>
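Writing RSL by hand gets error-prone once jobs have many arguments or environment variables. The sketch below generates the job description from the previous slide with Python's standard xml.etree.ElementTree. The build_job helper is hypothetical. Its child-element order mirrors the slide, and the assumption is that this order is acceptable to the GRAM schema. Run `globusrun-ws -validate -f rslfile.xml` on the result before submitting, as the next slide suggests.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Sequence


def build_job(executable: str, args: Sequence[str] = (), directory: Optional[str] = None,
              env: Optional[Dict[str, str]] = None, stdin: Optional[str] = None,
              stdout: Optional[str] = None, stderr: Optional[str] = None) -> ET.Element:
    """Build a <job> element in the same element order as the slide's RSL."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    if directory:
        ET.SubElement(job, "directory").text = directory
    for arg in args:
        ET.SubElement(job, "argument").text = arg
    for name, value in (env or {}).items():
        var = ET.SubElement(job, "environment")
        ET.SubElement(var, "name").text = name
        ET.SubElement(var, "value").text = value
    for tag, path in (("stdin", stdin), ("stdout", stdout), ("stderr", stderr)):
        if path:
            ET.SubElement(job, tag).text = path
    return job


job = build_job("/bin/echo", args=["12"], directory="/tmp", env={"PI": "3.141"},
                stdin="/dev/null", stdout="stdout", stderr="stderr")
# ET.ElementTree(job).write("rslfile.xml") would save it for globusrun-ws.
```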

81
Submitting Using XML
  • Create the file containing the RSL
  • You may validate the RSL ahead of time
  • globusrun-ws -validate -f rslfile.xml
  • If the file validates, submit using
    -submit

82
At Most Once Submission
  • You may specify a UUID with your job submission
  • If you're not sure the submission worked, you may
    submit the job again with the same UUID
  • If the job has already been submitted, the new
    submission will have no effect
  • If you do not specify a UUID, one will be
    generated for you
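The at-most-once rule above is idempotent submission keyed by a client-chosen UUID. The following sketch shows the server-side logic. The JobFactory class and its fields are hypothetical stand-ins, not GRAM's implementation. A retry with the same ID returns the existing submission instead of creating a second job, and a missing ID is replaced by a freshly generated one.

```python
import uuid
from typing import Dict, Optional


class JobFactory:
    """Hypothetical factory honoring at-most-once submission by UUID."""

    def __init__(self) -> None:
        self._jobs: Dict[str, dict] = {}
        self.created = 0  # how many jobs were actually created

    def submit(self, job_description: dict, submission_id: Optional[str] = None) -> str:
        # Generate an ID if the client did not supply one.
        sid = submission_id or str(uuid.uuid4())
        # A repeated ID has no effect: the original submission stands.
        if sid not in self._jobs:
            self._jobs[sid] = dict(job_description)
            self.created += 1
        return sid
```

If the client is unsure whether the first call reached the service, it can resend with the same ID without risking a duplicate job.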

83
Staging Data
  • GRAM's RSL allows many fileStageIn/fileStageOut
    directives
  • The transfers will be executed by RFT
  • May specify additional RFT options using the
    RFTOptions tag
  • There is no GASS cache staging option anymore

84
Staging Data Stage In
  • GRAM's RSL allows many fileStageIn/fileStageOut
    directives
    <fileStageIn>
      <transfer>
        <sourceUrl>gsiftp://job.submitting.host:2811/bin/echo</sourceUrl>
        <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
      </transfer>
    </fileStageIn>

85
Staging Data Stage Out
    <fileStageOut>
      <transfer>
        <sourceUrl>file://${GLOBUS_USER_HOME}/stdout</sourceUrl>
        <destinationUrl>gsiftp://job.submitting.host:2811/tmp/stdout</destinationUrl>
      </transfer>
    </fileStageOut>

86
Staging Data - Cleanup
    <fileCleanUp>
      <deletion>
        <file>file://${GLOBUS_USER_HOME}/my_echo</file>
      </deletion>
    </fileCleanUp>
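The stage-in, stage-out, and cleanup fragments above share one shape: a section element holding one entry per file. The sketch below appends them to a job element with the standard library. The helper names are hypothetical. Two points are assumptions: that each section may hold several transfer/deletion entries (the slides say "many" directives are allowed), and that appending these sections after the other job elements matches the schema's ordering.

```python
import xml.etree.ElementTree as ET


def _section(job: ET.Element, tag: str) -> ET.Element:
    # Reuse an existing <fileStageIn>/<fileStageOut>/<fileCleanUp>, or create it.
    section = job.find(tag)
    if section is None:
        section = ET.SubElement(job, tag)
    return section


def add_transfer(job: ET.Element, tag: str, source_url: str, destination_url: str) -> None:
    transfer = ET.SubElement(_section(job, tag), "transfer")
    ET.SubElement(transfer, "sourceUrl").text = source_url
    ET.SubElement(transfer, "destinationUrl").text = destination_url


def add_stage_in(job: ET.Element, source_url: str, destination_url: str) -> None:
    add_transfer(job, "fileStageIn", source_url, destination_url)


def add_stage_out(job: ET.Element, source_url: str, destination_url: str) -> None:
    add_transfer(job, "fileStageOut", source_url, destination_url)


def add_cleanup(job: ET.Element, file_url: str) -> None:
    deletion = ET.SubElement(_section(job, "fileCleanUp"), "deletion")
    ET.SubElement(deletion, "file").text = file_url
```

The URLs passed in are left untouched. GRAM expands variables such as ${GLOBUS_USER_HOME} on the service side.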

87
Staging Data - Credentials
  • The GridFTP servers you're using may require
    different credentials than the GRAM service
    you're submitting to
  • The RSL allows you to specify separate
    credentials for the executable and staging
    components of the job

88
RSL Substitutions
  • GRAM will perform some variable substitutions for
    you
  • ${GLOBUS_USER_HOME}
  • ${GLOBUS_USER_NAME}
  • ${GLOBUS_SCRATCH_DIR}
  • ${GLOBUS_LOCATION}
  • ${GLOBUS_SCRATCH_DIR} will be compute-node-local
    high-speed storage if defined, or
    ${GLOBUS_USER_HOME} if not
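The substitution rule, including the scratch-directory fallback, can be mimicked with Python's string.Template, which happens to use the same ${NAME} syntax. The sketch below is only an illustration of the rule. GRAM performs the real substitution server-side. The helper names and the sample paths are hypothetical.

```python
from string import Template
from typing import Dict, Optional


def rsl_variables(user_home: str, user_name: str, globus_location: str,
                  scratch_dir: Optional[str] = None) -> Dict[str, str]:
    return {
        "GLOBUS_USER_HOME": user_home,
        "GLOBUS_USER_NAME": user_name,
        "GLOBUS_LOCATION": globus_location,
        # Use node-local scratch if defined, otherwise fall back to the home directory.
        "GLOBUS_SCRATCH_DIR": scratch_dir or user_home,
    }


def substitute(text: str, variables: Dict[str, str]) -> str:
    # Template.substitute raises KeyError for an undefined ${NAME}.
    return Template(text).substitute(variables)
```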

89
Batch Submission
  • Your client does not have to stay attached to the
    execution of the job
  • -batch will disconnect from the job and output an
    EPR
  • You may redirect the EPR to a file with -o
  • Use the EPR file with -monitor or -status
  • You may also kill the job using -kill

90
Specifying Scheduler Options
  • RSL lets you specify various scheduler options
  • what queue to submit to
  • which project to select for accounting
  • max CPU and wallclock time to spend
  • min/max memory required
  • All are defined online in the GRAM job
    description schema document
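As a sketch of how these options might appear in a job description, the Python helper below appends only the options actually supplied. The element names (queue, project, maxCpuTime, maxWallTime, minMemory, maxMemory) are modeled on the GT4 job description schema as we understand it, and the units (minutes, megabytes) are assumptions; verify both against the schema document before use. The helper itself is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def scheduler_options(job: ET.Element, queue: Optional[str] = None,
                      project: Optional[str] = None,
                      max_cpu_minutes: Optional[int] = None,
                      max_wall_minutes: Optional[int] = None,
                      min_memory_mb: Optional[int] = None,
                      max_memory_mb: Optional[int] = None) -> ET.Element:
    """Append only the scheduler options that were actually given."""
    options = [
        ("queue", queue),                  # which queue to submit to
        ("project", project),              # accounting project
        ("maxCpuTime", max_cpu_minutes),   # assumed unit: minutes
        ("maxWallTime", max_wall_minutes), # assumed unit: minutes
        ("minMemory", min_memory_mb),      # assumed unit: MB
        ("maxMemory", max_memory_mb),      # assumed unit: MB
    ]
    for tag, value in options:
        if value is not None:
            ET.SubElement(job, tag).text = str(value)
    return job
```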

91
Choosing User Accounts
  • You may be authorized to use more than one
    account at the remote site
  • By default, the first listed in the grid-mapfile
    will be used
  • You may request a specific user account using the
    <localUserId> element

92
Multijobs
  • You may specify more than one <job> element in a
    <multijob>
  • In that case, specify the <factoryEndpoint> in
    the RSL rather than on the command line
  • Will be used by MPICH-G to support MPI jobs

93
WS GRAM Performance
  • Time to submit a basic GRAM job
  • Pre-WS GRAM: < 1 second
  • WS GRAM: 2 seconds
  • Concurrent jobs
  • Pre-WS GRAM: 300 jobs
  • WS GRAM: 32,000 jobs

94
Condor-G
  • The Condor project has produced a helper
    front end to GRAM
  • Managing sets of subtasks
  • Reliable front end to GRAM to manage
    computational resources
  • Note: this is not Condor itself, which promotes
    high-throughput computing and the use of idle
    resources

95
Chimera Virtual Data
  • Captures both logical and physical steps in a
    data analysis process.
  • Transformations (logical)
  • Derivations (physical)
  • Builds a catalog.
  • Results can be used to replay analysis.
  • Generation of DAG (via Pegasus)
  • Execution on Grid
  • Catalog allows introspection of analysis process.

[Figure: Sloan Survey data, galaxy cluster size distribution]
96
Pegasus Workflow Transformation
  • Converts Abstract Workflow (AW) into Concrete
    Workflow (CW).
  • Uses Metadata to convert user request to logical
    data sources
  • Obtains AW from Chimera
  • Uses replication data to locate physical files
  • Delivers CW to DAGMan
  • Executes using Condor
  • Publishes new replication and derivation data in
    RLS and Chimera (optional)

[Figure: Pegasus drawing on the Chimera Virtual Data Catalog, Metadata Catalog, and Replica Location Service, with DAGMan and Condor running the workflow on distributed compute servers and storage systems]
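The abstract-to-concrete step can be illustrated with a toy model: jobs name only logical files, a replica map supplies physical locations, and data dependencies determine execution order. This is a sketch of the idea, not Pegasus's actual algorithm or API; it simply picks the first replica, the job and file names are hypothetical, and it needs Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# Abstract workflow: job name -> (logical input files, logical output files).
AbstractWorkflow = Dict[str, Tuple[List[str], List[str]]]


def concretize(aw: AbstractWorkflow,
               replicas: Dict[str, List[str]]) -> List[Tuple[str, Dict[str, str]]]:
    """Order jobs by data dependencies and bind pre-existing inputs to physical files."""
    producer = {lfn: job for job, (_, outs) in aw.items() for lfn in outs}
    # Each job depends on the jobs that produce its inputs.
    graph = {job: {producer[lfn] for lfn in ins if lfn in producer}
             for job, (ins, _) in aw.items()}
    plan = []
    for job in TopologicalSorter(graph).static_order():
        ins, _ = aw[job]
        # Inputs made upstream need no lookup; others come from the replica map
        # (a missing replica raises KeyError).
        bound = {lfn: replicas[lfn][0] for lfn in ins if lfn not in producer}
        plan.append((job, bound))
    return plan
```

A real planner would also choose among replicas by site and network cost and add explicit transfer and registration jobs, which this sketch leaves out.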
97
(No Transcript)
98
Globus Toolkit: Open Source Grid Infrastructure
Globus Toolkit v4 (www.globus.org)
  • Security: Authentication & Authorization,
    Delegation, Community Authorization,
    Credential Mgmt
  • Data Mgmt: GridFTP, Reliable File Transfer,
    Replica Location, Data Replication,
    Data Access & Integration
  • Execution Mgmt: Grid Resource Allocation &
    Management, Community Scheduling Framework,
    Workspace Management, Grid Telecontrol Protocol
  • Info Services: Index, Trigger, WebMDS
  • Common Runtime: Java Runtime, C Runtime,
    Python Runtime
99
GT4 Data Management
  • Stage/move large data to/from nodes
  • GridFTP, Reliable File Transfer (RFT)
  • Alone, and integrated with GRAM
  • Locate data of interest
  • Replica Location Service (RLS)
  • Replicate data for performance/reliability
  • Distributed Replication Service (DRS)
  • Provide access to diverse data sources
  • File systems, parallel file systems, hierarchical
    storage: GridFTP
  • Databases: OGSA-DAI

100
GridFTP
  • A high-performance, secure, reliable data
    transfer protocol optimized for high-bandwidth
    wide-area networks
  • FTP with well-defined extensions
  • Uses basic Grid security (control and data
    channels)
  • Multiple data channels for parallel transfers
  • Partial file transfers
  • Third-party (direct server-to-server) transfers
  • Reusable data channels
  • Command pipelining
  • GGF recommendation: GFD.20

101
GridFTP in GT4
[Figure: disk-to-disk transfers on TeraGrid]
  • 100% Globus code
  • No licensing issues
  • Stable, extensible
  • IPv6 support
  • XIO for different transports
  • Striping → multi-Gb/sec wide-area transport
  • Pluggable
  • Front end: e.g., future WS control channel
  • Back end: e.g., HPSS, cluster file systems
  • Transfer: e.g., UDP, NetBLT transport

102
Striped Server
  • Multiple nodes work together and act as a single
    GridFTP server
  • An underlying parallel file system allows all
    nodes to see the same file system and must
    deliver good performance (usually the limiting
    factor in transfer speed)
  • I.e., NFS does not cut it
  • Each node then moves (reads or writes) only the
    pieces of the file that it is responsible for.
  • This allows multiple levels of parallelism: CPU,
    bus, NIC, disk, etc.
  • Critical if you want to achieve better than 1 Gb/s
    without breaking the bank
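The "each node moves only its pieces" idea can be shown with a block round-robin layout. This Python sketch is an illustration under that assumption; the actual block size and layout are negotiated by the GridFTP client and server, and stripe_layout and the node names are hypothetical.

```python
from typing import Dict, List, Tuple


def stripe_layout(file_size: int, block_size: int,
                  nodes: List[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Assign fixed-size blocks round-robin; each node gets its (offset, length) pieces."""
    if block_size <= 0 or not nodes:
        raise ValueError("need a positive block size and at least one node")
    layout: Dict[str, List[Tuple[int, int]]] = {node: [] for node in nodes}
    for index, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)  # last block may be short
        layout[nodes[index % len(nodes)]].append((offset, length))
    return layout
```

Because every node reads or writes only its own offsets on the shared parallel file system, the nodes never coordinate per byte, which is why the file system rather than the network often becomes the bottleneck.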

103
Striped GridFTP Service
  • A distributed GridFTP service that runs on a
    storage cluster
  • Every node of the cluster is used to transfer
    data into/out of the cluster
  • Head node coordinates transfers
  • Multiple NICs/internal busses lead to very high
    performance
  • Maximizes use of Gbit WANs

104
(No Transcript)
105
Typical Approach (without XIO)
[Figure: without XIO, the application reaches each network protocol through a different interface: POSIX IO, a protocol API, or a proprietary API]
106
Globus XIO Approach
[Figure: with Globus XIO, the application uses a single API, and Globus XIO reaches each network protocol through a driver]
107
Drivers
  • Make 1 API do many types of IO
  • Specific drivers for specific protocols/devices
  • Transform
  • Manipulate or examine data
  • Do not move data outside of process space
  • Compression, Security, Logging
  • Transport
  • Moves data across a wire
  • TCP, UDP, File IO, Device IO
  • Typically move data outside of process space

108
Stack
  • Transport
  • Exactly one per stack
  • Must be on the bottom
  • Transform
  • Zero or many per stack
  • Control flows from the user to the top of the
    stack, then down through each driver to the
    transport driver

[Figure: example driver stack, top to bottom: Compression, Logging, TCP]
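The driver-stack rules (zero or more transform drivers, exactly one transport at the bottom) can be modeled in a few lines of Python. This is an analogy only, not the Globus XIO C API: the class names are hypothetical, and an in-memory loopback stands in for a real TCP transport driver.

```python
import zlib
from typing import List


class TransformDriver:
    """Transform driver: manipulates data but never moves it out of the process."""

    def on_write(self, data: bytes) -> bytes:
        return data

    def on_read(self, data: bytes) -> bytes:
        return data


class CompressionDriver(TransformDriver):
    def on_write(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def on_read(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class LoggingDriver(TransformDriver):
    def __init__(self) -> None:
        self.log: List[str] = []

    def on_write(self, data: bytes) -> bytes:
        self.log.append(f"write {len(data)} bytes")
        return data

    def on_read(self, data: bytes) -> bytes:
        self.log.append(f"read {len(data)} bytes")
        return data


class LoopbackTransport:
    """Stand-in for a transport driver such as TCP: the only layer that moves data."""

    def __init__(self) -> None:
        self.wire: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.wire.append(data)

    def recv(self) -> bytes:
        return self.wire.pop(0)


class DriverStack:
    def __init__(self, transforms: List[TransformDriver],
                 transport: LoopbackTransport) -> None:
        self.transforms = transforms  # zero or more, listed top to bottom
        self.transport = transport    # exactly one, always at the bottom

    def write(self, data: bytes) -> None:
        # Writes flow from the user through each transform, top to bottom.
        for driver in self.transforms:
            data = driver.on_write(data)
        self.transport.send(data)

    def read(self) -> bytes:
        # Reads come up from the transport through the transforms in reverse order.
        data = self.transport.recv()
        for driver in reversed(self.transforms):
            data = driver.on_read(data)
        return data
```

Placing Logging below Compression, as in the example stack, means it records compressed sizes; swapping the two would log the application's original sizes instead.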
109
Copying Files (in a nutshell)
  • globus-url-copy [options] srcURL dstURL
  • guc gsiftp://localhost/foo file:///bar
  • Client/server, using FTP stream mode
  • guc -vb -dbg -tcp-bs 1048576 -p 8
    gsiftp://localhost/foo gsiftp://localhost/bar
  • 3rd-party transfer, MODE E
  • guc https://host.domain.edu/foo
    ftp://host.domain.gov/bar
  • From a secure HTTP server to an FTP server

110
The Options: Improving Performance
  • -p (parallelism or number of streams)
  • Rule of thumb: 4-8, start with 4
  • -tcp-bs (TCP buffer size)
  • Use either ping or traceroute to determine the
    RTT between hosts
  • Buffer size = BW (Mb/s) × RTT (ms) × 1000/8 /
    (parallelism value + 1)
  • If that is still too complicated, use 2 MB
  • -vb if you want performance feedback
  • -dbg if you have trouble
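The buffer-size rule of thumb translates directly into a helper. This Python sketch follows the slide's formula as given, including the division by (parallelism + 1); the function name is hypothetical, and reading "2 MB" as 2 MiB is an assumption.

```python
def tcp_buffer_size(bandwidth_mbps: float, rtt_ms: float, parallelism: int) -> int:
    """TCP buffer size in bytes for globus-url-copy -tcp-bs, per the slide's rule of thumb."""
    if bandwidth_mbps <= 0 or rtt_ms <= 0 or parallelism < 1:
        raise ValueError("bandwidth and RTT must be positive; parallelism >= 1")
    # Mb/s x ms x 1000 / 8 = bytes in flight (the bandwidth-delay product),
    # then split across (parallelism + 1) as the slide prescribes.
    return int(bandwidth_mbps * rtt_ms * 1000 / 8 / (parallelism + 1))


# Fallback from the slide ("use 2MB"), assumed here to mean 2 MiB.
FALLBACK_BUFFER_BYTES = 2 * 1024 * 1024
```

For a 1000 Mb/s path with a 50 ms RTT and -p 4, the helper returns 1,250,000 bytes, which would be passed as -tcp-bs 1250000 -p 4 on the globus-url-copy command line.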

111
Tuning GridFTP
  • Many ways you can tune the performance
  • Two sources of information are:
  • http://www.globus.org/toolkit/docs/4.0/data/gridftp/rn01re01.html
  • http://www.nsf-middleware.org/OnTheGrid/2004-09-MaxGridFTP.pdf

112
Exercise: Simple File Movement
  • grid-proxy-init
  • echo test > /tmp/test
  • look at servers started for each
  • guc gsiftp://hostname/tmp/test file:///tmp/test2
  • get (from server to client)
  • guc file:///tmp/test2 gsiftp://hostname/tmp/test3
  • put (from client to server)
  • guc gsiftp://hostname1/tmp/test3
    gsiftp://hostname2/tmp/test4
  • Third-party transfer (between two servers)
  • guc -dcpriv gsiftp://localhost/dev/zero
    gsiftp://localhost/dev/null
  • Transfer with encryption on the data channel

113
Troubleshooting
  • Can I get connected?
  • Telnet to the port: telnet hostname port
  • 2811 is the default port
  • You should get something like this:
  • 220 GridFTP Server gridftp.mcs.anl.gov 0.17
    (gcc32dbg, 1108765962-1) ready. Development
    Release
  • If not, you have firewall problems or xinetd
    configuration problems: the server is never
    even starting.
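The telnet check can also be scripted. This Python sketch opens a TCP connection and reads the first line the server sends; it relies on the FTP convention that reply code 220 means "service ready", which the banner above follows. The function names and the example host below are hypothetical.

```python
import socket


def is_ready_banner(line: str) -> bool:
    """FTP reply code 220 means 'service ready'; GridFTP uses FTP on its control channel."""
    return line.startswith("220")


def probe_gridftp(host: str, port: int = 2811, timeout: float = 5.0) -> str:
    """Connect like 'telnet host port' and return the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="ascii", errors="replace") as reply:
            return reply.readline().strip()
```

A caller would wrap probe_gridftp("gridftp.example.org") in try/except OSError: a refused or timed-out connection points to the firewall or xinetd problems described above, while a non-220 reply means the server started but is unhappy.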

114
Troubleshooting
  • No proxy
  • grid-proxy-destroy
  • guc gsiftp://localhost/dev/zero file:///dev/null
  • Add -dbg
  • grid-proxy-init
  • guc gsiftp://localhost/dev/zero file:///dev/null
  • Add -dbg

115
Troubleshooting
  • Bad source file
  • grid-proxy-init
  • guc gsiftp://localhost:2812/tmp/junk
    file:///tmp/empty
  • junk does not exist
  • Note that an empty file named empty is still
    created
  • We need to fix this in globus-url-copy, but for
    now the behavior remains

116
RFT - File Transfer Queuing
  • A WSRF service for queuing file transfer requests
  • Server-to-server transfers
  • Checkpointing for restarts
  • Database back-end for failovers
  • Allows clients to request transfers and then
    disappear
  • No need to manage the transfer
  • Status monitoring available if desired
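The queue-plus-checkpoint idea behind RFT can be modeled with Python's built-in sqlite3. This is a toy sketch, not RFT's implementation or interface: RFT is a WSRF service with its own database back end, and the copy callable here is a hypothetical stand-in for a GridFTP third-party transfer.

```python
import sqlite3
from typing import Callable, List, Tuple


class TransferQueue:
    """Toy persistent queue: requests survive a restart because state lives in a database."""

    def __init__(self, db_path: str) -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS transfers ("
            " id INTEGER PRIMARY KEY, src TEXT, dst TEXT,"
            " status TEXT NOT NULL DEFAULT 'pending')")
        self.db.commit()

    def submit(self, src: str, dst: str) -> int:
        """Record a request and return its id; the client may now disconnect."""
        cur = self.db.execute(
            "INSERT INTO transfers (src, dst) VALUES (?, ?)", (src, dst))
        self.db.commit()
        return cur.lastrowid

    def run_pending(self, copy: Callable[[str, str], None]) -> None:
        """Work through unfinished requests, checkpointing each result."""
        rows = self.db.execute(
            "SELECT id, src, dst FROM transfers"
            " WHERE status = 'pending' ORDER BY id").fetchall()
        for tid, src, dst in rows:
            try:
                copy(src, dst)
                status = "done"
            except OSError:
                status = "failed"
            self.db.execute(
                "UPDATE transfers SET status = ? WHERE id = ?", (status, tid))
            self.db.commit()  # checkpoint: after a restart, finished work is skipped

    def status(self) -> List[Tuple[int, str]]:
        """Optional status monitoring: (id, status) for every request."""
        return self.db.execute(
            "SELECT id, status FROM transfers ORDER BY id").fetchall()
```

Using a file path instead of ":memory:" is what gives restart behavior: a new TransferQueue on the same file picks up only the rows still marked pending.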

117
Reliable File Transfer: Third-Party Transfer
  • Fire-and-forget transfer
  • Web services interface
  • Many files & directories
  • Integrated failure recovery
  • Has transferred 900K files

[Figure: an RFT client sends SOAP messages to the RFT service and optionally receives notifications; the RFT service drives the transfer between two GridFTP servers]
118
Replica Location Service
  • Identify location of files via logical to
    physical name map
  • Distributed indexing of names, fault tolerant
    update protocols
  • GT4 version is scalable and stable
  • Managing 40 million files across 10 sites

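The two-level logical-to-physical mapping can be sketched as a set of per-site local catalogs plus an index that records only which catalogs know a name. This Python model illustrates the idea and is not the RLS client API; the class names, sites, and URLs are hypothetical, and the index here is refreshed with full lists rather than RLS's own update protocols.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LocalReplicaCatalog:
    """Per-site map from logical file name (LFN) to physical file names (PFNs)."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.mappings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, lfn: str, pfn: str) -> None:
        self.mappings[lfn].add(pfn)


class ReplicaLocationIndex:
    """Knows which catalogs hold an LFN, not the physical names themselves."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = defaultdict(set)
        self.catalogs: Dict[str, LocalReplicaCatalog] = {}

    def update_from(self, lrc: LocalReplicaCatalog) -> None:
        # The catalog pushes its current list of LFNs to the index.
        self.catalogs[lrc.site] = lrc
        for lfn in lrc.mappings:
            self.index[lfn].add(lrc.site)

    def locate(self, lfn: str) -> List[str]:
        """Two-step lookup: index -> catalogs -> physical names."""
        return sorted(pfn for site in self.index.get(lfn, set())
                      for pfn in self.catalogs[site].mappings[lfn])
```

Splitting the index from the catalogs is what lets the index stay small and be replicated for fault tolerance while each site remains authoritative for its own files.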
119
Reliable Wide Area Data Replication
LIGO Gravitational Wave Observatory
Replicating > 1 Terabyte/day to 8 sites; > 30
million replicas so far; MTBF = 1 month
www.globus.org/solutions
120
OGSA-DAI
  • Grid Interfaces to Databases
  • Data access
  • Relational & XML databases, semi-structured files
  • Data integration
  • Multiple data delivery mechanisms, data
    translation
  • Extensible & efficient framework
  • Request documents contain multiple tasks
  • A task = execution of an activity
  • Group work to enable efficient operation
  • Extensible set of activities
  • > 30 predefined, framework for writing your own
  • Moves computation to data
  • Pipelined and streaming evaluation
  • Concurrent task evaluation
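The "request document of pipelined activities" model can be approximated in Python by chaining generators, so each activity consumes the previous one's output stream. This is an analogy, not the OGSA-DAI client toolkit (which is Java); the activity functions are hypothetical, an in-memory SQLite table stands in for a relational resource, and the final compression step buffers its input rather than streaming.

```python
import gzip
import sqlite3
from typing import Callable, Iterable, Iterator, List

Activity = Callable[[Iterable], Iterator]


def sql_query(conn: sqlite3.Connection, sql: str) -> Activity:
    """Source activity: ignores its input and streams query rows."""
    def run(_: Iterable) -> Iterator:
        yield from conn.execute(sql)  # rows are produced one at a time
    return run


def to_csv_lines(rows: Iterable) -> Iterator[str]:
    """Transformation activity: format each row as a CSV line."""
    for row in rows:
        yield ",".join(str(v) for v in row) + "\n"


def gzip_bytes(lines: Iterable[str]) -> Iterator[bytes]:
    """Delivery-side activity: compress everything received (buffers its input)."""
    yield gzip.compress("".join(lines).encode("utf-8"))


def perform(request: List[Activity]) -> List:
    """Chain the activities so each consumes the previous one's output stream."""
    stream: Iterable = iter(())
    for activity in request:
        stream = activity(stream)
    return list(stream)
```

Running the query, formatting, and compression next to the data and shipping only the compressed result is the "moves computation to data" point from the slide.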

121
OGSA-DAI
  • Provide service-based access to structured data
    resources as part of Globus
  • Specify a selection of interfaces tailored to
    various styles of data access, starting with
    relational and XML

122
The OGSA-DAI Framework
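[Figure: an application uses the client toolkit to reach an OGSA-DAI service, whose engine runs activities (SQLQuery, XPath, XSLT, readFile, GZip, GridFTP) against data resources accessed via JDBC, XMLDB, or files, backed by databases such as MySQL, DB2, SQL Server, XIndice, and SWISS PROT]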
123
Extensibility Example
[Figure: an OGSA-DAI service engine running multiple SQLQuery activities (multiple SQL GDS) against MySQL via JDBC]
124
OGSA-DAI: A Framework for Building Applications
  • Supports data access, insert and update
  • Relational: MySQL, Oracle, DB2, SQL Server,
    Postgres
  • XML: Xindice, eXist
  • Files: CSV, BinX, EMBL, OMIM, SWISSPROT, …
  • Supports data delivery
  • SOAP over HTTP
  • FTP, GridFTP
  • E-mail
  • Inter-service
  • Supports data transformation
  • XSLT
  • ZIP, GZIP
  • Supports security
  • X.509 certificate based security

125
OGSA-DAI Other Features
  • A framework for building data clients
  • Client toolkit library for application developers
  • A framework for developing functionality
  • Extend existing activities, or implement your own
  • Mix and match activities to provide functionality
    you need
  • Highly extensible
  • Customise our out-of-the-box product
  • Provide your own services, client-side support,
    and data-related functionality

126
Data Replication Service (tech preview)
  • Pull missing files to local site

[Figure: Site A and Site B, each running a Data Replication Service, Reliable File Transfer Service, GridFTP server, Local Replica Catalog, and Replica Location Index; a list of required files drives the replication]
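The "pull missing files to local site" step can be sketched as a planning function: compare the required list with what the local catalog already holds, look up sources for the rest, and produce transfer requests. This Python sketch is hypothetical and covers only planning; submitting the transfers to RFT and registering the new replicas in the local catalog are not shown, and it simply takes the first known source.

```python
from typing import Dict, List, Set, Tuple


def plan_replication(required: List[str], local: Set[str],
                     remote_index: Dict[str, List[str]],
                     local_dir: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (transfers to request, files with no known replica) for files not held locally."""
    transfers: List[Tuple[str, str]] = []
    unresolved: List[str] = []
    for lfn in required:
        if lfn in local:
            continue  # already at this site: nothing to pull
        sources = remote_index.get(lfn, [])
        if not sources:
            unresolved.append(lfn)
            continue
        name = lfn.rsplit("/", 1)[-1]
        transfers.append((sources[0], f"{local_dir.rstrip('/')}/{name}"))
    return transfers, unresolved
```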
127
MCS - Metadata Catalog Service
  • A stand-alone metadata catalog service
  • WSRF service interface
  • Stores system-defined and user-defined attributes
    for logical files/objects
  • Supports manipulation and query
  • Integrated with OGSA-DAI
  • OGSA-DAI provides metadata storage
  • When run with OGSA-DAI, basic Grid authentication
    mechanisms are available

129
Monitoring and Discovery System (MDS4)
  • Grid-level monitoring system
  • Helps users/agents identify the host(s) on which
    to run an application
  • Warn on errors
  • Uses standard interfaces to provide publishing of
    data, discovery, and data access, including
    subscription/notification
  • WS-ResourceProperties, WS-BaseNotification,
    WS-ServiceGroup
  • Functions as an hourglass to provide a common
    interface to lower-level monitoring tools

130
[Figure: information users (schedulers, portals, warning systems, etc.) on top of WS standard interfaces for subscription, registration, and notification, using standard schemas (e.g., the GLUE schema)]
131
MDS4 Components
  • Information providers
  • Monitoring is a part of every WSRF service
  • Non-WS services can also be used
  • Higher-level services
  • Index Service: a way to aggregate data
  • Trigger Service: a way to be notified of changes
  • Both built on a common aggregator framework
  • Clients
  • WebMDS
  • All of the tools are schema-agnostic, but
    interoperability needs a well-understood common
    language

132
Information Providers
  • Data sources for the higher-level services
  • Some are built into services
  • Any WSRF-compliant service publishes some data
    automatically
  • WS-RF gives us standard Query/Subscribe/Notify
    interfaces
  • GT4 services: the ServiceMetaDataInfo element
    includes start time, version, and service type
    name
  • Most of them also publish additional useful
    information as resource properties

133
Information Providers (2)
  • Other sources of data
  • Any executables
  • Other (non-WS) services
  • Interface to another archive or data store
  • File scraping
  • Just need to produce a valid XML document
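Since an executable information provider just needs to emit a valid XML document, a minimal one can be a short script. This Python sketch reports a few basic host facts using only the standard library; the element names are hypothetical (an interoperable provider would follow an agreed schema such as GLUE), and how the script is registered with MDS4 is not shown.

```python
import os
import platform
import xml.etree.ElementTree as ET


def host_info_xml() -> str:
    """Emit a small XML document describing this host (element names are hypothetical)."""
    root = ET.Element("HostInfo")
    ET.SubElement(root, "Name").text = platform.node()
    ET.SubElement(root, "OperatingSystem",
                  name=platform.system(), release=platform.release())
    # os.cpu_count() can return None when the count is unknown.
    ET.SubElement(root, "ProcessorCount").text = str(os.cpu_count() or 0)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(host_info_xml())
```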

134
Information Providers: GT4 Services
  • Reliable File Transfer Service (RFT)
  • Service status data, number of active transfers,
    transfer status, information about the resource
    running the service
  • Community Authorization Service (CAS)
  • Identifies the VO served by the service instance
  • Replica Location Service (RLS)
  • Note: not a WS
  • Location of replicas on physical storage systems
    (based on user registrations) for later queries

135
Information Providers: Cluster and Queue Data
  • Interfaces to Hawkeye, Ganglia, CluMon, Nagios
  • Basic host data (name, ID), processor
    information, memory size, OS name and version,
    file system data, processor load data
  • Some Condor/cluster-specific data
  • This can also be done for sub-clusters, not just
    at the host level
  • Interfaces to PBS, Torque, LSF
  • Queue information, number of CPUs available and
    free, job count information, some memory
    statistics and host info for head node of cluster

136
Higher-Level Services
  • Index Service
  • Caching registry
  • Trigger Service
  • Warn on error conditions
  • Archive Service
  • Database store for history (in development)
  • All of these have common needs, and are built on
    a common framework

137
Common Aggregator Framework
  • Basic framework for higher-level functions
  • Subscribe to Information Provider(s)
  • Do some action
  • Present standard interfaces

138
Aggregator Framework Features
  • 1) Common configuration mechanism
  • Specify what data to get, and from where
  • 2) Self cleaning
  • Services have lifetimes that must be refreshed
  • 3) Soft consistency model
  • Published information is recent, but not
    guaranteed to be the absolute latest
  • 4) Schema Neutral
  • Valid XML document needed only
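Three of the framework features (self-cleaning lifetimes, soft consistency, schema neutrality) can be shown together in a small registry-plus-cache model. This Python sketch is a toy, not the MDS4 aggregator code: names are hypothetical, the clock is injectable so expiry can be tested, and "schema neutral" is modeled as accepting any well-formed XML.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional, Tuple


class Aggregator:
    """Toy registry plus cache whose entries expire unless their lifetime is refreshed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        # source name -> (last XML document received, expiry time)
        self.entries: Dict[str, Tuple[str, float]] = {}

    def register(self, source: str, xml_doc: str, lifetime_s: float) -> None:
        """Publish or refresh a source's latest document."""
        ET.fromstring(xml_doc)  # schema neutral: only well-formed XML is required
        self.entries[source] = (xml_doc, self.clock() + lifetime_s)

    def sweep(self) -> None:
        """Self-cleaning: drop registrations whose lifetime has run out."""
        now = self.clock()
        self.entries = {s: e for s, e in self.entries.items() if e[1] > now}

    def lookup(self, source: str) -> Optional[str]:
        """Return the last value received: recent, but not guaranteed to be the latest."""
        self.sweep()
        entry = self.entries.get(source)
        return entry[0] if entry else None
```

A source that stops refreshing simply disappears after its lifetime ends, so the aggregator never needs an explicit deregistration message from a crashed service.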

139
MDS4 Index Service
  • Index Service is both registry and cache
  • Datatype and data provider info, like a registry
    (UDDI)
  • Last value of data, like a cache
  • In-memory is the default approach
  • DB backing store currently being developed to
    allow for very large indexes
  • Can be set up for a site or set of sites, a
    specific set of project data, or for
    user-specific data only
  • Can be a multi-rooted hierarchy
  • No global index

140
MDS4 Trigger Service
  • Subscribe to a set of resource properties
  • Evaluate that data against a set of
    pre-configured conditions (triggers)
  • When a condition matches, action occurs
  • Email is sent to pre-defined address
  • Website updated
  • Similar functionality in Hawkeye
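The subscribe, evaluate, act loop can be sketched in Python as a list of condition/action pairs evaluated against each fresh set of resource-property values. This is an illustration only, not the Trigger Service's configuration format; the property names (freeCPUs, load) and trigger names are hypothetical, and the action would in practice send e-mail or update a web page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trigger:
    name: str
    condition: Callable[[Dict[str, float]], bool]  # evaluated on each update
    action: Callable[[str], None]                  # e.g., send mail, update a web page


def evaluate(triggers: List[Trigger], properties: Dict[str, float]) -> List[str]:
    """Check fresh resource-property values against every trigger; fire matching actions."""
    fired = []
    for trig in triggers:
        if trig.condition(properties):
            trig.action(f"{trig.name}: {properties}")
            fired.append(trig.name)
    return fired
```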

141
WebMDS User Interface
  • Web-based interface to WSRF resource property
    information
  • User-friendly front-end to Index Service
  • Uses standard resource property requests to query
    resource property data
  • XSLT transforms to format and display them
  • Customized pages can be created simply by using
    HTML form options and writing your own XSLT
    transforms
  • Sample page
  • http://mds.globus.org:8080/webmds/webmds?info=indexinfo&xsl=servicegroupxsl

142
(No Transcript)
143
Working with TeraGrid
  • Large US project across 9 different sites
  • Different hardware, queuing systems and lower
    level monitoring packages
  • Starting to explore MetaScheduling approaches
  • GRMS (Poznan)
  • W. Smith (TACC)
  • K. Yashimoto (SDSC)
  • User Portal
  • Need a common source of data with a standard
    interface for basic scheduling info

144
Data Collected
  • Provide data at the subcluster level
  • Sys admin defines a subcluster, we query one node
    of it to dynamically retrieve relevant data
  • Can also list per-host details
  • Interfaces to Ganglia, Hawkeye, CluMon, and
    Nagios available now
  • Other cluster monitoring systems can write into a
    .html file that we then scrape
  • Also collect basic queuing data, some TeraGrid
    specific attributes

145
(No Transcript)
146
Scalability Experiments
  • MDS index
  • Dual