Title: Grid Monitoring with Globus

1
Grid Monitoring with Globus

Jennifer M. Schopf
Argonne National Lab
April 2003

2
How MDS can destroy your world Martin
Stoufer

Jennifer M. Schopf
Argonne National Lab
April 2003

3
Why Information Infrastructure?

Distributed, often complex, performance-critical
nature of Grids & apps demands tools for
Discovering available resources
Discovering available sensors
Integrating information from multiple sources
Archiving and replaying historical information
These and other functions are provided by an
information infrastructure
Many projects are concerned with design,
deployment, evaluation, and application

4
Topics

Evaluation of information infrastructures
Globus Toolkit MDS, R-GMA, Hawkeye
Throughput, response time, load
Insights into performance issues
What monitoring and discovery could be
Next-generation information architecture
Open Grid Services Architecture mechanisms
Integrated monitoring & discovery arch for GT3

5
Globus Monitoring and Discovery Service (MDS2)

Part of Globus Toolkit, compatible with other
elements
Used most often for resource selection
aid user/agent to identify host(s) on which to
run an application
Standard mechanism for publishing and discovery
Decentralized, hierarchical structure
Soft-state protocols
Caching
Grid Security Infrastructure credentials
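
A soft-state protocol means a registration simply expires unless the registrant keeps refreshing it. The following is a minimal Python sketch of that idea only; the class name, the `ttl` parameter and the injectable `clock` are assumptions for illustration and are not part of MDS2.

```python
import time


class SoftStateRegistry:
    """Hypothetical registry: entries vanish unless refreshed within `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock          # injectable so the behaviour can be tested
        self._expiry = {}           # resource name -> time at which it expires

    def register(self, name):
        # In a soft-state scheme a first registration and a refresh
        # are the same message: both just push the expiry time forward.
        self._expiry[name] = self.clock() + self.ttl

    def live(self):
        now = self.clock()
        # Drop entries whose owner stopped refreshing them.
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)
```

No explicit "unregister" message is needed: a host that crashes or loses its network connection simply drops out after `ttl` seconds.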

6
MDS2 Architecture
7
Relational Grid Monitoring Architecture (R-GMA)

Implementation of the Grid Monitoring
Architecture (GMA) defined within the Global Grid
Forum (GGF)
Three components
Consumers
Producers
Registry
GMA as defined currently does not specify the
protocols or the underlying data model to be
used.
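
The division of labour among the three GMA components can be sketched in a few lines of Python. This is an illustration under assumptions: the class and method names (`advertise`, `lookup`, `query`, `collect`) are invented here, and, as noted above, GMA itself does not prescribe a protocol or data model.

```python
class Registry:
    """Knows which producers exist; holds no monitoring data itself."""

    def __init__(self):
        self._producers = {}  # event type -> list of producers

    def advertise(self, event_type, producer):
        self._producers.setdefault(event_type, []).append(producer)

    def lookup(self, event_type):
        return list(self._producers.get(event_type, []))


class Producer:
    """Holds the actual data and answers consumers directly."""

    def __init__(self, name, readings):
        self.name = name
        self._readings = readings

    def query(self, event_type):
        return self._readings.get(event_type)


class Consumer:
    """Finds producers through the registry, then contacts them directly."""

    def __init__(self, registry):
        self.registry = registry

    def collect(self, event_type):
        producers = self.registry.lookup(event_type)
        return {p.name: p.query(event_type) for p in producers}
```

The point of the split is that the registry only handles discovery; the data path runs between consumer and producer.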

8
GGF Grid Monitoring Architecture
9
R-GMA

Monitoring used in the EU Datagrid Project
Steve Fisher, RAL, and James Magowan, IBM-UK
Based on the relational data model
Uses Java Servlet technologies
Focus on notification of events
User can subscribe to a flow of data with
specific properties directly from a data source

10
R-GMA Architecture
11
Hawkeye

Developed by Condor Group
Focus: automatic problem detection
Underlying infrastructure builds on the Condor
and ClassAd technologies
Condor ClassAd Language to identify resources in
a pool
ClassAd Matchmaking to execute jobs based on
attribute values of resources to identify
problems in a pool
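
The matchmaking idea, selecting resources whose advertised attributes satisfy an expression, can be approximated in plain Python. This is not ClassAd syntax and does not use any Condor API; the attribute names and thresholds are made up for illustration.

```python
# Each dict stands in for one resource advertisement (hypothetical attributes).
machines = [
    {"Name": "node1", "LoadAvg": 0.2, "DiskFreeMB": 12000},
    {"Name": "node2", "LoadAvg": 4.7, "DiskFreeMB": 300},
    {"Name": "node3", "LoadAvg": 0.9, "DiskFreeMB": 450},
]


def match(ads, requirement):
    """Return the names of all advertisements that satisfy `requirement`."""
    return [ad["Name"] for ad in ads if requirement(ad)]


def looks_unhealthy(ad):
    # A "problem" expression: overloaded or nearly out of disk.
    return ad["LoadAvg"] > 4.0 or ad["DiskFreeMB"] < 500


problem_nodes = match(machines, looks_unhealthy)
```

In this sketch the same matching step that would normally pick a machine for a job is used to pick out machines that need attention.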

12
Hawkeye Architecture
13
Comparing Information Systems

14
Some Architecture Considerations

Similar functional components
Grid-wide for MDS, R-GMA; Pool for Hawkeye
Global schema
Different use cases will lead to different
strengths
GIIS for decentralized registry; no standard
protocol to distribute multiple R-GMA registries
R-GMA meant for streaming data, currently used
for network data; Hawkeye and MDS for single queries
Push vs Pull
MDS is PULL only
R-GMA allows push and pull
Hawkeye allows triggers (push model)
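
The push/pull distinction can be made concrete with a small Python sketch. The `Source` class and its method names are assumptions for illustration and do not correspond to the interface of MDS, R-GMA or Hawkeye.

```python
class Source:
    """Hypothetical data source that supports both access styles."""

    def __init__(self):
        self._value = None
        self._subscribers = []

    def get(self):
        # Pull: the client asks whenever it wants the current value.
        return self._value

    def subscribe(self, callback):
        # Push: the client registers once and is called on every update.
        self._subscribers.append(callback)

    def publish(self, value):
        self._value = value
        for callback in self._subscribers:
            callback(value)
```

A pull-only client has to poll to notice changes; a subscriber sees each update as it is published.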

15
Experiments

How many users can query an information server at
a time?
How many users can query a directory server?
How does an information server scale with the
amount of data in it?
How does an aggregator scale with the number of
information servers registered to it?

16
Testbed

Lucky cluster at Argonne
7 nodes, each has two 1133 MHz Intel PIII CPUs
(with a 512 KB cache) and 512 MB main memory
Users simulated at the UC nodes
20 P3 Linux nodes, mostly 1.1 GHz
R-GMA has an issue with the shared file system,
so we also simulated users on Lucky nodes
All figures are 10 minute averages
Queries happening with a one second wait between
each query (think synchronous send with a 1
second wait)
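
The query pattern described above (synchronous query, then a one second wait) might be simulated along these lines. This is a sketch under assumptions: `query` is a placeholder for whatever call contacts the server, and the function name and return format are invented here.

```python
import time


def simulate_user(query, n_queries, wait=1.0):
    """Issue queries one at a time, sleeping `wait` seconds after each reply.

    Returns a list of (start, end) timestamps, one pair per query.
    """
    records = []
    for _ in range(n_queries):
        start = time.monotonic()
        query()                      # blocks until the server has answered
        end = time.monotonic()
        records.append((start, end))
        time.sleep(wait)             # think time before the next query
    return records
```

Running many such loops concurrently, for example one per thread or per client node, would stand in for many simultaneous users.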

17
Metrics

Throughput
Number of requests processed per second
Response time
Average amount of time (in sec) to handle a
request
Load
percentage of CPU cycles spent in user mode and
system mode, recorded by Ganglia
High when running a small number of compute-intensive
apps
Load1
average number of processes in the ready queue
waiting to run, 1 minute average, from Ganglia
High when a large number of apps are blocking on I/O
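
As a worked example, the first two metrics can be computed from per-request timestamps as follows. The record format, a list of (start, end) pairs in seconds, is an assumption made for this sketch; the load figures came from Ganglia and are not reproduced here.

```python
def throughput(records, window_seconds):
    """Requests processed per second over the measurement window."""
    return len(records) / window_seconds


def mean_response_time(records):
    """Average time in seconds between sending a request and its reply."""
    return sum(end - start for start, end in records) / len(records)


# Three requests observed during a 6 second window (made-up numbers).
sample = [(0.0, 0.5), (1.5, 2.5), (3.5, 4.0)]
sample_throughput = throughput(sample, 6.0)        # 3 requests / 6 s
sample_response = mean_response_time(sample)       # (0.5 + 1.0 + 0.5) / 3
```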

18
Performance of GIS Information Servers vs. Number
of Users
19
Performance of GIS Information Servers vs. Number
of Users
20
Performance of GIS Information Servers vs. Number
of Users
21
Performance of GIS Information Servers vs. Number
of Users
22
Experiment 1 Summary

Caching can significantly improve performance of
the information server
Particularly desirable if one wishes the server
to scale well with an increasing number of users
When setting up an information server, care
should be taken to make sure the server is on a
well-connected machine
Network behavior plays a larger role than
expected
If this is not an option, thought should be given
to duplicating the server if more than 200 users
are expected to query it
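
Why caching helps the server scale with the number of users can be seen in a minimal Python sketch: with a cache, the expensive information provider runs once per time-to-live rather than once per query. The class name, `ttl` and `clock` are assumptions for illustration, not the MDS2 implementation.

```python
import time


class CachingServer:
    """Hypothetical information server that caches its provider's output."""

    def __init__(self, provider, ttl, clock=time.monotonic):
        self.provider = provider     # expensive call that gathers the data
        self.ttl = ttl               # seconds a cached answer stays valid
        self.clock = clock
        self.provider_calls = 0
        self._cached = None
        self._expires = None         # None means nothing has been cached yet

    def query(self):
        now = self.clock()
        if self._expires is None or now >= self._expires:
            # Cache miss or stale entry: run the provider and remember the answer.
            self._cached = self.provider()
            self.provider_calls += 1
            self._expires = now + self.ttl
        return self._cached
```

With 200 users querying inside one TTL window, the provider in this sketch runs once instead of 200 times, at the cost of answers being up to `ttl` seconds old.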

23
Directory Server Scalability
24
Directory Server Scalability
25
Directory Server Scalability
26
Directory Server Scalability
27
Experiment 2 Summary

Because of the network contention issues, the
placement of a directory server on a highly
connected machine will play a large role in the
scalability as the number of users grows
Because significant loads are seen even with only a
few users, it will be important that this service be
run on a dedicated machine, or that it be
duplicated as the number of users grows.

28
Overall Results

Performance can be a matter of deployment
Effect of background load
Effect of network bandwidth
Performance can be affected by underlying
infrastructure
LDAP/Java strengths and weaknesses
Performance can be improved using standard
techniques
Caching, multi-threading, etc.

29
So what could monitoring be?

Basic functionality
Push and pull (subscription and notification)
Aggregation and Caching
More information available
More higher-level services
Triggers like Hawkeye
Viz of archive data like Ganglia
Plug and Play
Well defined protocols, interfaces and schemas

30
Topics

Evaluation of information infrastructures
Globus Toolkit MDS, R-GMA, Hawkeye
Throughput, response time, load
Insights into performance issues
What monitoring and discovery could be
Next-generation information architecture
Open Grid Services Architecture mechanisms
Integrated monitoring & discovery arch for GT3

31
Open Grid Services Architecture (OGSA)

Defines standard interfaces and behaviors for
distributed system integration, especially:
Standard XML-based service information model
Standard interfaces for push and pull mode access
to service data
Notification and subscription

32
Service Data

Every service has its own service data
OGSA has a common mechanism to expose a service
instance's state data to service requestors for
query, update and change notification
Monitoring data is baked right in

33
Example: Reliable File Transfer Service
File Transfer
Internal State
Data transfer operations
34
MDS-3 Monitoring and Discovery System

Consists of various components
Core functionality
Information providers
Higher level services
Clients

35
Core Functionality

XPath support
XPath is a language that describes a way to
locate and process items in XML docs by using an
addressing syntax based on a path through the
document's logical structure or hierarchy
Xindice support (native XML database)
Registry support
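
To show what path-based addressing looks like in practice, here is a short Python sketch using the standard library's `xml.etree.ElementTree`, which supports a limited subset of XPath. The XML document, its element names and its values are invented for illustration and are not an actual service data schema.

```python
import xml.etree.ElementTree as ET

# Made-up service data document.
doc = ET.fromstring("""
<serviceData>
  <host name="lucky1"><cpu load="0.4"/><disk freeMB="1200"/></host>
  <host name="lucky2"><cpu load="3.1"/><disk freeMB="80"/></host>
</serviceData>
""")

# Path through the hierarchy: every <host> directly under the root.
names = [host.get("name") for host in doc.findall("./host")]

# Path with a predicate: the <cpu> element of the host named lucky2.
lucky2_load = doc.find("./host[@name='lucky2']/cpu").get("load")
```

A full XPath engine offers more (functions, axes, numeric comparisons), but the addressing principle is the same.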

36
Schema Issues

Need to keep track of service data schema
Avoid conflicts
Find the data easier
Should really have unified naming approach
That's why Ben and I are here today :)

37
MDS-3 Information Providers in June Release

All the data currently in core MDS-2
Full data in the GLUE schema for compute elements
(CE)
Ganglia information provider for cluster data
will also be available from Ganglia folks (with
luck)
Service data from RFT, RLS, GRAM
GT2 to GT3 work
GridFTP server data
Software version and path data
Documentation for translating your GT2
information provider to a GT3 information provider

38
MDS-3 Higher Level Products

Higher-level services can perform actions on
service data collected from other services
Part of this functionality can be provided by a
set of supplied building blocks
Provider interface: GRIS-style API for writing
information providers
Service Data Aggregator: sets up subscriptions to
data for other services, and publishes it as a
single data stream
Hierarchy Builder: allows for a hierarchy of
aggregators
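
The aggregator and hierarchy building blocks might be pictured as follows. This is a rough Python sketch with invented class and method names, not the MDS-3 interfaces: an aggregator subscribes to several sources and republishes their data as one stream, and because it is itself a source, aggregators can be stacked.

```python
class Source:
    """Hypothetical service that pushes (origin, key, value) updates."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, key, value):
        for callback in self._subscribers:
            callback(self.name, key, value)


class Aggregator(Source):
    """Subscribes to other sources and republishes them as a single stream."""

    def add(self, source):
        source.subscribe(self._forward)

    def _forward(self, origin, key, value):
        # Prefix the key with its origin so data from different
        # services stays distinguishable in the merged stream.
        self.publish(f"{origin}/{key}", value)
```

Since `Aggregator` is a `Source`, a site-level aggregator can be added to a top-level one, which gives the hierarchy.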

39
MDS-3 Index Server

Simplest higher-level service is the caching
index service
Much like the GIIS in MDS 2.x
Will have configurability like a GIIS hierarchy
Will also have PHP-style scripts, much as
available today

41
Client findServiceData command

Can be used in three modes
To return a list of available services from a
Registry configured by your administrator.
To return all service data elements for an
individual service instance.
To return a specific service data element for an
individual service instance.
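
The three modes can be mimicked with a toy Python function over an in-memory dictionary. This is only a stand-in to show the three levels of lookup; the function signature, the service names and the data are all hypothetical and say nothing about the real command's arguments or output.

```python
# Made-up registry contents: service instance -> its service data elements.
REGISTRY = {
    "rft-1": {"transferCount": 4, "status": "active"},
    "gram-1": {"queuedJobs": 2},
}


def find_service_data(registry, service=None, element=None):
    if service is None:
        # Mode 1: list the services known to the registry.
        return sorted(registry)
    if element is None:
        # Mode 2: all service data elements of one service instance.
        return dict(registry[service])
    # Mode 3: one specific service data element of one instance.
    return registry[service][element]
```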

42
Clients

C bindings and the findServiceData command line
client were released as part of GT3 alpha-2
Core C bindings provide findServiceData C
function
findServiceData command line client gives an
example of using it to parse out information (in
this case, registry contents)

43
Service Data Browser

GUI client to display service data from any
service
Extensible for data-specific visualization
A version was released with GT3 alpha
http://www.globus.org/ogsa/releases/alpha/docs/infosvcs/sdbquickstart.html

44
We Need More Basic Information

Interfaces to other sources of data
GPT data
Other monitoring systems
Others?
Service data from other components
Every service has service data
OGSA-DAI
Will need to interface on schema

45
We Will Need More GUIs and Clients

Additional GUI visualizers may be implemented to
display service data specific to a particular
port type
Additional Client interfaces possibly?

46
We Need More Higher Level Services

We have a couple planned
Archiving service
Trigger template

47
Post-3.0 release: Archiving Service

Will allow subscription to service data
Logging in a flexible way
Well defined interfaces for mining
Currently investigating other implementations to
build on, e.g., NetLogger logging, Ganglia
RRDB

48
Post-3.0 release: Trigger Template

Will provide a template to allow subscription to
data, reasoning about that data, and a course of
action to take place
Essentially, a gateway service between OGSA
Notifications and some other notification
framework, with filtering of notifications
Example: subscribe to disk space information,
send mail to sys admin when it reaches 90% full
Needed: trigger template and several small
examples of common triggers, and documentation
for how users could extend them or write new
ones.
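
The disk space example can be sketched as a three-part template: a data subscription, a condition, and an action. The Python below is an illustration under assumptions; `make_trigger` and its arguments are invented names, the action here just records the value instead of sending mail, and `shutil.disk_usage` from the standard library stands in for the subscribed data.

```python
import shutil


def disk_fraction_used(path="/"):
    """Fraction of the filesystem at `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def make_trigger(condition, action):
    """Build a notification handler that runs `action` when `condition` holds."""

    def on_notification(value):
        if condition(value):
            action(value)

    return on_notification


# Hypothetical wiring: "notify" the admin at 90% full by recording the value.
alerts = []
disk_trigger = make_trigger(lambda fraction: fraction >= 0.90, alerts.append)
```

A user extending the template would swap in a different condition or action, for example one that sends mail, without touching the subscription logic.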

49
Other Possible Higher Level Services

Site Validation Service
Job Tracking Service
Interfacing to NetLogger?

50
Summary

Current monitoring systems
Insights into performance issues
What we really want for monitoring and discovery
is a combination of all the current systems
Next-generation information architecture
Open Grid Services Architecture mechanisms
MDS3 plans
Additional work needed!

51
Thanks

Testbed/Experiment support and comments
John Mcgee, ISI; James Magowan, IBM-UK; Alain Roy
and Nick LeRoy at University of Wisconsin,
Madison; Scott Gose and Charles Bacon, ANL; Steve
Fisher, RAL; Brian Tierney and Dan Gunter, LBNL.
MDS3 planning is by Ben Clifford, Karl
Czajkowski, and others
This work was supported in part by the
Mathematical, Information, and Computational
Sciences Division subprogram of the Office of
Advanced Scientific Computing Research, U.S.
Department of Energy, under contract
W-31-109-Eng-38. This work was also supported by the
DOESG SciDAC Grant, iVDGL from NSF, and others.

52
Additional Information

Contact
Jennifer Schopf (jms@mcs.anl.gov)
Zhang, Freschl and Schopf, A Performance Study
of Monitoring and Information Services for
Distributed Systems, to appear in HPDC 2003
In the meantime, just ask me
MDS-3 information
Soon at www.globus.org/mds
