Title: Status Report
1OGSA/GT3 evaluation
- Status Report
- D. Foster et al.
2Table of Contents
- Introduction
- GT3 ToolKit Experience
- GT3 Performance studies
- Integration of Existing Codes/Services
- Summary, Conclusions, and Outlook
3OGSA/GT3 activity: Introduction
4GT3 Activities
- Motivation
- The promise of the web services framework.
- Rigorous structure for interface specification and communication semantics.
- Basic framework becoming widely used.
- Much activity in defining required interfaces for the Grid (OGSA)
- First release of the new Globus toolkit in May 2003.
- OGSI framework and some grid services
- GT3 out July the 1st
- To provide input to the EGEE middleware activity.
- Input to the strategic planning and architecture activity
- Initial objectives
- Three primary objectives
- Understand the GT3 offering and its quality.
- Test the available hosting environments.
- Learn how to create new services in this framework.
- Study how to leverage existing developments (AliEn) in an OGSA context.
5OGSA Engineering Group
- Proposed to the LCG referees (May 2003)
- Started in June 2003
- Massimo Lamanna: Overall Coordination (CERN)
- Ricardo Brito Da Rocha: Test Service Development (EDG)
- Alexander Kryukov: Test Service Development (MSU)
- Andrey Demichev: Testbed Setup (MSU)
- Volodia Kalyaev: Test Service Development (MSU/CERN Summer Student)
- Viktor Pose: Performance and Testing (JINR, Dubna)
- Claude Wang: AliEn (Academia Sinica, Taipei)
- Most people at CERN only for short periods.
- What will be presented represents what we have been able to do in the short period since starting.
6Overall Approach
- Create a simple testbed with the GT3 toolkit.
- First release was June 30th
- Create some new simple services.
- learn by doing
- Demonstrate the results and measure performance.
- Start to work with the AliEn components to understand them.
- Can we envisage one framework and competitive services?
- Report on the activity after a few months (mid-Sept was the target).
- Plan the next 6 months.
7OGSA/OGSI Overview
- After the first generation of Grid toolkits and middleware, emphasis on
- Agree on standards
- Build an open system
- Open Grid Services Architecture (OGSA)
- Component model and high-level design
- High-level services
- Open Grid Services Infrastructure (OGSI)
- Conventions
- Detailed implementation issues
- Actual implementation: Globus Toolkit 3
- Other implementations could coexist
- The whole idea behind OGSA/OGSI does require heterogeneity
- Standard components from the Web world
- SOAP (Simple Object Access Protocol) to convey messages (XML payloads)
- WSDL (Web Services Description Language) to describe interfaces
- They are hosted in specific environments
- Standalone container, Tomcat, IBM WebSphere
- .NET
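To make the SOAP bullet concrete, here is a minimal Python sketch that builds a SOAP 1.1 envelope whose body carries an XML payload. The `echo` operation, its `message` argument and the `urn:example:dummy` namespace are hypothetical names chosen for the illustration; they are not taken from GT3 or from any service in this report.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
EXAMPLE_NS = "urn:example:dummy"  # hypothetical service namespace (assumption)


def build_envelope(operation, arguments):
    """Return a SOAP 1.1 envelope, as a string, that calls `operation`."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element and its children form the XML payload.
    call = ET.SubElement(body, f"{{{EXAMPLE_NS}}}{operation}")
    for name, value in arguments.items():
        ET.SubElement(call, f"{{{EXAMPLE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_envelope("echo", {"message": "hello"})
print(message)
```

In a real deployment the WSDL description tells the client which operations and argument types are legal, and generated stubs produce such messages on the developer's behalf.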
8What does GT3 offer? (now)
- The first OGSI implementation (July 2003)
- The toolkit itself
- Build new services and extend existing ones
- Security Infrastructure
- GSI
- Services
- GRAM (GT2 implementation wrapped up as a Grid service)
- IS (new GT3 implementation)
- RFT (Globus FTP)
- RLS (GT2 implementation as a Grid service)
9TestBeds
- First hand experience on Globus Toolkit 3
- This can be achieved only by using it!
- Prototypes, with the following common features
- Small
- Working (with limited functionality)
- No architectural ambition
- Engineering approach
- Mapping of functionality to prototype functions
10TestBeds
- GT3 TestBed
- 4 CERN machines, 1 in Moscow
- Focus on GT3 basic functionality and performance
- AliEn TestBed
- 3 CERN TestBed machines
- Future ARDA TestBed
- Focus on the complexity of future possible architectures
11GT3 TestBed
12GT3 TestBed
- Simple system to distribute jobs and retrieve output
- No security (for most services)
- The user asks the Resource Broker (RB) to select the best Computing Element (CE)
- The user submits the job to the CE
- The Information and the Logging and Bookkeeping services exchange information mainly with the RB
- Why did we do it this way?
- Simple scheme
- As already mentioned no architectural ambitions
- Learn by doing!
- What did we learn out of it?
- See next slides
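The two-step job flow of this testbed (ask the RB for a CE, then submit to that CE directly) can be sketched in a few lines of Python. This is an illustration only: the class names, the host names, and the ranking rule ("best" means most free slots) are assumptions, since the slides do not say how the RB ranks Computing Elements.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    name: str
    free_slots: int

    def submit(self, job):
        # Stand-in for a real job submission; returns a fake job identifier.
        return f"{self.name}:{job}"


class ResourceBroker:
    def __init__(self, elements):
        self.elements = list(elements)
        self.log = []  # stand-in for the Logging and Bookkeeping service

    def select_best(self):
        # Assumption: the "best" CE is the one with the most free slots.
        best = max(self.elements, key=lambda ce: ce.free_slots)
        self.log.append(f"selected {best.name}")
        return best


broker = ResourceBroker([
    ComputingElement("ce-cern-1", 2),
    ComputingElement("ce-cern-2", 5),
    ComputingElement("ce-moscow", 3),
])
chosen = broker.select_best()    # step 1: the user asks the RB
job_id = chosen.submit("hello")  # step 2: the user submits to the CE
print(job_id)
```

Note that the broker only advises; the submission itself goes from the user to the CE, as in the scheme described above.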
13GT3 TestBed
- Resource broker and LB (Custom service)
- Surprisingly fast to set-up
- A few computing elements (GT3-GRAM, with modifications)
- 2 PC boxes in the CERN Computing Centre
- In a second phase, one PC located in Moscow was added
- Some problems (solved) in data stage-in/stage-out
- See GRAM comments in the performance part
- Information service (GT3-IS)
- Native GT3 service
- In this TestBed it talks only with other services
14GT3 TestBed coverage
[Figure labels: pull data access; push data access; "Every service must implement this PortType"]
15First summary
- GT3 is the first OGSI 1.0 implementation
- Main focus of all activity so far
- GT3 (ToolKit and doc) is in a state that allows a quick start
- Not everything is perfect, but GT3 is more mature than expected
- Development experience and quantitative measurements are in the next section of the presentation
- GT3 provides a few OGSA services by now
- GRAM and RLS (GT2)
- IS (Information Service)
- RFT (Reliable File Transfer, GridFTP based)
- GT3 encourages the creation of custom services
- The OGSI system provides the building blocks to provide a variety of services
16GT3 ToolKit Experience
17Grid Service Development
- Grid Services
- Extended Web Services complying with the OGSI specification
- Core Architecture
[Diagram labels: HOSTING ENVIRONMENT; GRID CONTAINER; GRID SERVICES; COMPLEMENTARY; OGSI IMPL.; WEB SERVICES ENGINE]
18Grid Service Development
- What we get
- From Web Services
- Interoperability
- standard for message creation and definition: XML
- standard for protocol-independent message passing: SOAP
- standard for service definition: WSDL
- result: choice of hosting environment is left to the service provider
- Service Oriented Design approach
- From OGSI
- Stateful Services (Service Data)
- Other common features on independent services
- Different from GT2, where nothing is common between services apart from GSI
- Straightforward development: common framework for service usage and management
19Grid Service Development
- What we get
- From the Globus Toolkit 3
- Security Infrastructure
- Authentication, authorization, delegation, message integrity and encryption
- Higher-Level Services
- Information Services: Index Service
- Data Management: RLS and RFT
- Master Managed Job Factory: GT3 interface for GRAM
- In summary
- Interoperable and environment independent services
20Grid Service Development
- Current options
- Hosting Environments
- J2EE Application Server: Jakarta Tomcat, GT3 Standalone Container, Websphere, ...
- Microsoft .NET Platform
- OGSI implementations
- J2EE Servers: Globus Toolkit 3
- Microsoft .NET: OGSI.NET (Virginia Univ.), MS.NETGrid (EPCC)
- Others are appearing
- Any environment with an existing implementation of a Web Services engine is one single step away from providing Grid Services
- Ex: OGSILite (Perl), pyGridWare (Python)
21Designing Grid Services
- Important concepts when designing Grid Services
- Factories and Instances
[Diagram: CLIENT, FACTORY and INSTANCE, with interactions numbered 1 and 2]
- Factories create instances and respond to instance creation requests by clients
- Instances respond to clients' service specific interaction requests
- Advantages
- Workload balancing between pools of instances
- User dependent instances
- Disadvantages
- Instance creation overhead
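The factory/instance split can be illustrated with a small in-process Python sketch. It is a generic rendering of the pattern, not GT3 code: `CounterFactory`, `CounterInstance` and the handle format are hypothetical names, and a real factory would return a network-addressable handle rather than a dictionary key.

```python
import itertools


class CounterInstance:
    """A stateful service instance created for one client."""

    def __init__(self, handle, owner):
        self.handle = handle
        self.owner = owner
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


class CounterFactory:
    """Answers creation requests; the instances answer everything else."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}

    def create(self, owner):
        handle = f"counter-{next(self._ids)}"
        self.instances[handle] = CounterInstance(handle, owner)
        return handle

    def lookup(self, handle):
        return self.instances[handle]

    def destroy(self, handle):
        del self.instances[handle]


factory = CounterFactory()
alice = factory.create("alice")  # interaction with the factory
bob = factory.create("bob")
factory.lookup(alice).increment()  # interactions with the instances
factory.lookup(alice).increment()
factory.lookup(bob).increment()
print(factory.lookup(alice).count, factory.lookup(bob).count)  # prints "2 1"
```

Each user gets private state (the "user dependent instances" advantage), at the price of one extra creation call per client, which is the overhead listed as the disadvantage.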
22Designing Grid Services
- Approach
- Service Data, Subscriptions and Notifications
[Diagram: GRID SERVICE A (SDE A1, SDE A2) and GRID SERVICE B (SDE B1); 1: SUBSCRIPTION; 2,..: NOTIFICATIONS]
- Each Grid Service has its own Service Data Set: a collection of Service Data Elements (SDEs)
- Every SDE has a set of associated values concerning its validity in time: goodFrom, goodUntil, availableUntil
- A service or client may declare interest in an SDE by issuing a Subscription
- Service Data flows by means of Notifications, normally when a change occurs or the value lifetime has expired
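A minimal Python sketch of the subscription and notification mechanism follows. It is not the GT3 API: the class and method names are hypothetical, times are plain numbers, and the reading of the three lifetime attributes (value valid from `good_from` to `good_until`, element expected to exist until `available_until`) is an assumption based on their names. Only notification on change is modelled; notification on lifetime expiry is left out.

```python
from dataclasses import dataclass


@dataclass
class ServiceDataElement:
    name: str
    value: object
    good_from: float        # value is valid from this time
    good_until: float       # value is valid until this time
    available_until: float  # element is expected to exist until this time

    def is_valid(self, now):
        return self.good_from <= now <= self.good_until


class GridService:
    def __init__(self):
        self.service_data = {}  # the Service Data Set
        self.subscribers = {}   # SDE name -> list of callbacks

    def subscribe(self, sde_name, callback):
        self.subscribers.setdefault(sde_name, []).append(callback)

    def set_service_data(self, sde):
        # Store the new value, then notify every subscriber of that SDE.
        self.service_data[sde.name] = sde
        for callback in self.subscribers.get(sde.name, []):
            callback(sde)


received = []
service_a = GridService()
service_a.subscribe("Host", received.append)  # 1: subscription
service_a.set_service_data(ServiceDataElement("Host", "load=0.3", 0.0, 60.0, 3600.0))
service_a.set_service_data(ServiceDataElement("Host", "load=0.7", 60.0, 120.0, 3600.0))
service_a.set_service_data(ServiceDataElement("Disk", "free=10GB", 0.0, 60.0, 3600.0))
print([sde.value for sde in received])  # only the two "Host" updates arrive
```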
23Writing Grid Services in GT3
- You need
- A service interface: GWSDL (WSDL extended)
- manually written or generated from existing Java code
- The service implementation
- directly extending a basic Grid Service or using Operation Providers (delegation) in Java
- A deployment descriptor
- defined using WSDD (Web Service Deployment Descriptor)
- A build file
- For use by the Jakarta Ant build tool
- RESULT: a JAR file to use for deployment (GAR)
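The difference between extending a base service and delegating to operation providers is a general design choice, and the delegation variant can be sketched in Python as follows. This shows the pattern only; it does not reproduce the GT3 Java interfaces, and `DelegatingService`, `EchoProvider` and `ClockProvider` are hypothetical names.

```python
import time


class EchoProvider:
    operations = ("echo",)

    def echo(self, text):
        return text


class ClockProvider:
    operations = ("getTime",)

    def getTime(self):
        return time.time()


class DelegatingService:
    """Holds no operation logic itself; forwards each call to a provider."""

    def __init__(self, providers):
        self._dispatch = {}
        for provider in providers:
            for op in provider.operations:
                self._dispatch[op] = getattr(provider, op)

    def invoke(self, operation, *args):
        if operation not in self._dispatch:
            raise LookupError(f"unknown operation: {operation}")
        return self._dispatch[operation](*args)


service = DelegatingService([EchoProvider(), ClockProvider()])
print(service.invoke("echo", "hello"))
```

With delegation, existing classes can be reused as providers without forcing them into the service's inheritance hierarchy, which matters when wrapping code that already has a base class.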
24Using Grid Services
[Diagram: a HOSTING ENVIRONMENT containing a GRID CONTAINER with the SERVICE IMPLEMENTATION and its STUBS; a WSDL DESCRIPTION; a CLIENT with STUBS and APPLICATION; interactions numbered 1 and 2]
25Work Summary
- Performance Prototypes
- Dummy Service
- Dummy Secure Service
- Dummy Service with Service Data
- Dummy Service with Notifications
- Dummy Service Index Service
- Index Listener
- Higher Level Prototyping
- File Catalog Service
- Metadata Catalog Service
- Storage Element Service
- Workload Management Service
- Computing Element
- Authentication and Authorization
26Globus Toolkit 3 Overview
- The Globus Toolkit 3 is a complete implementation of the OGSI specification
- The development process is much easier when compared with previous versions of the toolkit
- Some additional components to what is in OGSI proved essential to achieve this
- Security Infrastructure
- GSI3 is an easy to use security provider, shielding the developer from the major issues it deals with
- Deployment Tools
- By using Ant and providing sample build files for service deployment, the developer can focus most of his time on the implementation of the service features
- Backward compatibility
- All GT2 components are shipped with the GT3 full bundle
- Some services remain usable: those where only an OGSI-compliant interface was provided (e.g. GRAM)
- Others are completely independent implementations (e.g. MDS2 and MDS3)
- A large user community is being built
27Globus Toolkit 3 Overview
- Steep learning curve: it represents a new approach to service design and implementation (many small details that take time)
- Incomplete documentation: this is a real problem being faced by developers at this time
- Several bugs found in these exercises
- Core implementation related: due to the framework's short lifetime
- From tools deployed with the framework: hard to solve (e.g. Axis)
- From the outside: easy to solve (e.g. Tomcat)
- Resource Management services still based on GRAM, with an OGSI-compliant but complex architecture behind
- Good resources for documentation and good interaction for problem solving
- OGSI 1.0 Specification
- GT3 Tutorial: http://www.casa-sotomayor.net/gt3-tutorial/
- Globus Discuss: discuss@globus.org
- Globus Bugzilla
28GT3 Performance measurements
29Overview
- Goal
- explore GT3 under heavy load/concurrency
- maximal throughput/rate of GT3 services
- see the limiting factors
- GT3 grid services measured
- GRAM
- DummyService
- IndexService
30GT3 GRAM performance
- Setup
- GRAM in GT3 standalone container
- managed-job-globusrun clients, started simultaneously on up to 32 client nodes (lxplus) in non-batch mode, used to submit jobs to GT3 GRAM
- GRAM hardware: 2 Intel Pentium III 600MHz processors, 256MB RAM
- Note: 1 managed-job-globusrun client is capable of submitting 1 job
31GT3 GRAM performance
- Results: service node
- Saturation throughput for job submission on the service node: 3.8 jobs/minute, with an average CPU user+system usage of 62%
- Comments
- scalability issue for heavily used servers
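To put the saturation figure in perspective, the short calculation below converts 3.8 jobs/minute into a per-job service time and into hourly and daily ceilings. It uses only the number quoted above and assumes the rate could be sustained around the clock, which the measurements do not claim.

```python
jobs_per_minute = 3.8  # saturation throughput quoted above

seconds_per_job = 60.0 / jobs_per_minute
jobs_per_hour = jobs_per_minute * 60
jobs_per_day = jobs_per_hour * 24

print(f"{seconds_per_job:.1f} s per job")
print(f"{jobs_per_hour:.0f} jobs/hour, {jobs_per_day:.0f} jobs/day")
```

That is roughly 16 seconds per submission and about 5,500 submissions per day for a single GRAM node of this class, which is why the comment flags scalability for heavily used servers.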
32GT3 GRAM performance
- Results: client node
- on a client node with 2 Intel Pentium III 600MHz processors and 256MB RAM, a managed-job-globusrun client consumes on average 16 seconds of CPU user+system time (on both CPUs) for the submission of 1 job
- Comment
- lightweight clients (e.g. written in C) needed
33DummyService performance
- Setup (1)
- each DummyService client executes the following steps
- 1. calls DummyServiceFactory to create a DummyService instance
- 2. executes 2 simple methods (echo and getTime) on the DummyService instance
- 3. calls DummyService instance to destroy itself
- up to 1000 clients talking to the DummyService were run simultaneously on up to 45 client nodes (lxplus)
- with and without authentication via GSI message level security, used according to guides and tutorials at www.globus.org
- grid service node hardware: 2 Intel Pentium III 600MHz processors, 256MB RAM
- Setup (2)
- same as Setup (1), but step 2 consists of 100 cycles, each of them calling the 2 simple methods (echo and getTime) on the DummyService instance
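The client cycle used in both setups can be sketched as below. This is an in-process Python stand-in, not the actual test client: the real clients were separate processes talking to a GT3 container over SOAP, so the timings this sketch produces say nothing about GT3. Only the names `DummyServiceFactory`, `DummyService`, `echo` and `getTime` come from the setup description; `create`, `destroy` and `run_client` are hypothetical.

```python
import time


class DummyService:
    def __init__(self):
        self.destroyed = False

    def echo(self, text):
        return text

    def getTime(self):
        return time.time()

    def destroy(self):
        self.destroyed = True


class DummyServiceFactory:
    def __init__(self):
        self.created = 0

    def create(self):
        self.created += 1
        return DummyService()


def run_client(factory, cycles=1):
    """One client run: create, call echo/getTime `cycles` times, destroy."""
    start = time.perf_counter()
    instance = factory.create()   # step 1: create an instance
    calls = 0
    for _ in range(cycles):       # step 2: call the two simple methods
        instance.echo("ping")
        instance.getTime()
        calls += 2
    instance.destroy()            # step 3: destroy the instance
    return calls, time.perf_counter() - start


factory = DummyServiceFactory()
calls_1, elapsed_1 = run_client(factory)              # Setup (1)
calls_2, elapsed_2 = run_client(factory, cycles=100)  # Setup (2)
print(calls_1, calls_2, factory.created)  # prints "2 200 2"
```

Setup (2) amortises the single create/destroy pair over 200 method calls, so comparing the two setups separates instance creation cost from per-call cost.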
34DummyService performance
- Preliminary Results
- security overhead needs further investigation
- cross check of our implementation/setup with the Globus team is foreseen
35DummyService performance
- Conclusions
- security overhead needs further investigation
- more tests on more powerful machines
- container comparison: depending on the setup, the Tomcat container may be a bit slower or up to 50% faster compared to the standalone container
- Notes
- in the results table above the top saturation rates are given
- with a varying number of clients, throughput goes down by up to 30% and the average CPU user+system usage varies accordingly
37- the first time the client contacts the DummyServiceFactory, creates a DummyService instance, and calls the first method, it takes about 10s to accomplish it
- the following times these actions take about 1s
38IndexService performance
- Setup (1): IndexService acting as a notification source (pushing data)
- multiple notification sinks subscribe to the IndexService "Host" Service Data Element (SDE), and are notified about each update of the "Host" SDE, happening at a fixed rate
- no security
- grid service node hardware: 2 Intel Pentium III 600MHz processors, 256MB RAM
- Setup (2): IndexService responding to findServiceData requests (pulling data)
- multiple ogsi-find-service-data clients are run sequentially and in parallel, asking for the IndexService "Host" Service Data Element
- no security
- grid service node hardware: 2 Intel Pentium III 600MHz processors, 256MB RAM
39IndexService performance
40IndexService performance
- Comments
- saturation throughput with findServiceData is about 13-20 times higher than with the notification approach
- Setup (1) measurement using Tomcat failed due to a bug concerning threads; the bug is fixed, and the fix is announced to appear in the next (4.1.28) Tomcat version
- preliminary measurement with a faster service node (2 Intel(R) Xeon(TM) 2.40GHz processors, 1GB RAM)
- saturation throughput for setup (1) was about 32 notifications/s for 800 listeners, compared to 10 notifications/s with 400 listeners on the 2 x 600MHz machine: not quite 4 times faster
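The "not quite 4 times faster" remark can be checked with a short calculation that compares the throughput gain with the clock-frequency gain. The figures are the ones quoted above; treating clock frequency as the only difference between the two nodes is a simplifying assumption, since the listener counts (800 versus 400), the RAM and the processor generation also differ.

```python
slow_rate = 10.0         # notifications/s on the 2 x 600 MHz node
fast_rate = 32.0         # notifications/s on the 2 x 2.40 GHz node
slow_clock_mhz = 600.0
fast_clock_mhz = 2400.0

throughput_ratio = fast_rate / slow_rate
clock_ratio = fast_clock_mhz / slow_clock_mhz
scaling_efficiency = throughput_ratio / clock_ratio

print(throughput_ratio, clock_ratio, scaling_efficiency)
```

The throughput grows by a factor of 3.2 against a clock ratio of 4, so about 80% of the nominal clock gain shows up as notification throughput.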
43Reliable File Transfer Service
- To complement their GT3 testbed activity, Andrey Demichev and Alexander Kryukov from SINP MSU Moscow are doing RFT tests
- reliability means that problems like e.g.
- dropped connections,
- machine reboots,
- temporary network outages, etc
- should be handled automatically (usually via retry) until the transfers either resume or meet some "ultimate failure" condition
- preliminary tests (up to 3 clients, each transferring 100 Mb of data) show that the service works perfectly, but further and more comprehensive tests are needed
- Note: data on requests, transfers etc. are recorded in a PostgreSQL DB
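The retry-until-ultimate-failure behaviour described above can be sketched generically in Python. This is not how RFT is implemented: the exception names, the attempt limit used as the "ultimate failure" condition, and the simulated flaky transfer are all assumptions made for the illustration. A real service would also wait between attempts and persist its state (RFT records it in PostgreSQL), both omitted here.

```python
class TransientError(Exception):
    """A recoverable problem, e.g. a dropped connection."""


class UltimateFailure(Exception):
    """Raised when the retry budget is exhausted."""


def transfer_with_retry(transfer, max_attempts=5):
    """Call `transfer` until it succeeds or `max_attempts` is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return transfer(), attempt
        except TransientError as error:
            last_error = error  # remember the cause and try again
    raise UltimateFailure(f"gave up after {max_attempts} attempts") from last_error


def make_flaky_transfer(failures):
    """Build a fake transfer that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def transfer():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientError("dropped connection")
        return "done"

    return transfer


result, attempts = transfer_with_retry(make_flaky_transfer(2))
print(result, attempts)  # prints "done 3"
```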
44Current and next actions
- performance measurements
- further investigation of security overhead
- IndexService acting as a notification sink
- secure IndexService
- continue RFT tests
- redo the measurements on faster hardware (2 Intel(R) Xeon(TM) 2.40GHz processors, 1GB RAM) and possibly nodes with more than 2 CPUs
45Integration of Existing Codes/Services
46Why Integration?
- The GRID is mainly concerned with interoperability among heterogeneous grid components
- Heterogeneous Grid environments
- AliEn (Alice Environment)
- Web service oriented
- Heterogeneous Grid technologies
- Globus Toolkit 3
- OGSI .NET, MS .NETGrid (.NET environment)
- Unicore, others
47AliEn (Alice Environment) Highlights
- The AliEn framework is a lightweight, but functionally equivalent, alternative to a full GRID, based on the SOAP standard of Web Services.
- Authentication module which supports various authentication methods (Globus/GSI)
- Distributed file catalogue built on top of an RDBMS, with a user interface that provides file system functionality
- Secure file transport and replication Service
- Task queue which holds commands to be executed in the system and Resource Broker
- Computing and Storage elements
- Metadata catalogue
- Configuration and Information Service
- Monitoring framework
48- Installation of AliEn testbed
- tbed0132 (AliEn Core Service)
- tbed0134 (CE, one WN, PBS, ClusterMonitor)
- tbed0135 (SE, file:///, FTD)
- Small Tests
- Job submission
- File catalog
- Continue to use the TestBed in future
49 .NET
- The service layer constructed on the .NET Web services platform
- SOAP handling
- Security
- WSDL self service description
- Application wrapping-up technologies
- .NET: The Common Language Runtime (CLR)
- Java: Java Native Interface (JNI)
51System Design Investigation
- Investigation of the service structure of the system
- Combining heterogeneous implementations of systems
- Understand engineering design options
- E.g. User Proxy Service
[Diagram: User Proxy Service, Authentication Service, Factory, Information Index, UI; legend: Grid Service using 1st Technology, Grid Service using 2nd Technology, 1 Grid Environment]
52Conclusions and Outlook
53Conclusions
- GT3 is the natural partner for new middleware initiatives
- Encouraging results so far
- Requires experience of large scale deployment
- OGSI seems to be already the lingua franca in this field
- OGSA should provide attractive high level services
- HEP-specific services will be missing
- A convincing backbone of services should materialise (also with HEP contribution)
- The OGSA/OGSI concept will be validated when serious challengers are deployed on a large scale
54Evaluation
- Effort to try to distill this experience
- Verify how much of the toolkits was actually experienced
- Effort to have this procedure reproducible
- Other ToolKits in the near future?
- Preliminary tests -> Test/Performance suites
- Contacts with embryonic EGEE teams
- F. Hemmer, B. Jones, E. Laure
- Use GT2 experience (weak and strong points) from EDG, LCG, etc. in the GT3 evaluation
- Deployment experience (weak and strong points) from EDG testbeds and LCG
55Outlook
- Much progress has been made in a short time: just a few months
- Generally impressed with GT3 and the overall concept
- Some major issues around the performance of the hosting environments and the factories
- Continue closely working with Globus
- Continue to validate this approach and prototype interfaces and services in a GT3 context
- Continue closely working with EGEE
- SC2 RTAG-11 (ARDA) has recently taken an approach to identify the infrastructure needs through grid service decomposition
56Special thanks
- TestBed support
- B. Panzer, T. Smith, T. Kleiworth
- Windows Servers support
- A. Lossent
- Globus community
- Many many
- Special thanks to B. Sotomayor
- And many others!