Grid Monitoring with Globus


1
Grid Monitoring with Globus
  • Jennifer M. Schopf
  • Argonne National Lab
  • April 2003

2
How MDS can destroy your world Martin
Stoufer
  • Jennifer M. Schopf
  • Argonne National Lab
  • April 2003

3
Why Information Infrastructure?
  • Distributed, often complex, performance-critical
    nature of Grids and apps demands tools for
  • Discovering available resources
  • Discovering available sensors
  • Integrating information from multiple sources
  • Archiving and replaying historical information
  • These and other functions are provided by an
    information infrastructure
  • Many projects are concerned with design,
    deployment, evaluation, and application

4
Topics
  • Evaluation of information infrastructures
  • Globus Toolkit MDS, R-GMA, Hawkeye
  • Throughput, response time, load
  • Insights into performance issues
  • What monitoring and discovery could be
  • Next-generation information architecture
  • Open Grid Services Architecture mechanisms
  • Integrated monitoring and discovery architecture for GT3

5
Globus Monitoring and Discovery Service (MDS2)
  • Part of Globus Toolkit, compatible with other
    elements
  • Used most often for resource selection
  • aid user/agent to identify host(s) on which to
    run an application
  • Standard mechanism for publishing and discovery
  • Decentralized, hierarchical structure
  • Soft-state protocols
  • Caching
  • Grid Security Infrastructure credentials
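
The soft-state idea listed above can be illustrated with a small sketch. This is not MDS2 code: the class `SoftStateRegistry`, its method names, and the time-to-live values are hypothetical, chosen only to show that a registration disappears unless it is refreshed.

```python
class SoftStateRegistry:
    """Toy soft-state registry (hypothetical, not the MDS2 API)."""

    def __init__(self):
        self._expiry = {}  # resource name -> absolute expiry time in seconds

    def register(self, name, ttl, now):
        # Registering again simply pushes the expiry time forward.
        self._expiry[name] = now + ttl

    def live(self, now):
        # Drop entries whose time-to-live has elapsed, then list the survivors.
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)


registry = SoftStateRegistry()
registry.register("hostA", ttl=30, now=0)
registry.register("hostB", ttl=30, now=0)
registry.register("hostA", ttl=30, now=20)  # hostA refreshes, hostB stays silent
print(registry.live(now=25))  # both entries are still within their TTL
print(registry.live(now=40))  # only the refreshed entry remains
```

No explicit "unregister" message is needed: a resource that crashes or leaves simply stops refreshing and ages out.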

6
MDS2 Architecture
7
Relational Grid Monitoring Architecture (R-GMA)
  • Implementation of the Grid Monitoring
    Architecture (GMA) defined within the Global Grid
    Forum (GGF)
  • Three components
  • Consumers
  • Producers
  • Registry
  • GMA as defined currently does not specify the
    protocols or the underlying data model to be
    used.
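
The interaction between the three components can be sketched as follows. Because GMA leaves protocols and the data model open, everything here (class names, method names, the dictionary records) is an assumption made for illustration, not part of any GMA or R-GMA implementation.

```python
class Registry:
    """Directory mapping a kind of data to the producers that offer it."""

    def __init__(self):
        self._producers = {}

    def advertise(self, kind, producer):
        self._producers.setdefault(kind, []).append(producer)

    def lookup(self, kind):
        return list(self._producers.get(kind, []))


class Producer:
    """Holds one measurement and answers queries for it."""

    def __init__(self, name, value):
        self.name = name
        self._value = value

    def query(self):
        return {"source": self.name, "value": self._value}


class Consumer:
    """Finds producers through the registry, then contacts them directly."""

    def __init__(self, registry):
        self._registry = registry

    def collect(self, kind):
        return [p.query() for p in self._registry.lookup(kind)]


registry = Registry()
registry.advertise("cpu_load", Producer("node1", 0.25))
registry.advertise("cpu_load", Producer("node2", 1.5))
print(Consumer(registry).collect("cpu_load"))
```

The registry only answers "who has this data"; the data itself flows directly between producer and consumer.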

8
GGF Grid Monitoring Architecture
9
R-GMA
  • Monitoring used in the EU Datagrid Project
  • Steve Fisher, RAL, and James Magowan, IBM-UK
  • Based on the relational data model
  • Used Java Servlet technologies
  • Focus on notification of events
  • User can subscribe to a flow of data with
    specific properties directly from a data source

10
R-GMA Architecture
11
Hawkeye
  • Developed by Condor Group
  • Focus on automatic problem detection
  • Underlying infrastructure builds on the Condor
    and ClassAd technologies
  • Condor ClassAd Language to identify resources in
    a pool
  • ClassAd Matchmaking to execute jobs based on
    attribute values of resources to identify
    problems in a pool
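
Attribute-based matching of the kind described above can be approximated in a few lines. This sketch uses plain Python dictionaries and predicates in place of ClassAds; it is not ClassAd syntax, and the attribute names and values are made up.

```python
# Each dictionary stands in for the attributes a machine would advertise.
machines = [
    {"Name": "node1", "LoadAvg": 0.3, "DiskFreeMB": 12000},
    {"Name": "node2", "LoadAvg": 0.1, "DiskFreeMB": 300},
    {"Name": "node3", "LoadAvg": 4.2, "DiskFreeMB": 8000},
]


def match(ads, requirement):
    """Return the names of all machines whose attributes satisfy `requirement`."""
    return [ad["Name"] for ad in ads if requirement(ad)]


def low_disk(ad):
    # A "problem" expressed as a condition on advertised attributes.
    return ad["DiskFreeMB"] < 1000


print(match(machines, low_disk))
```

In this reading, a problem detector is just a requirement that matches unhealthy machines instead of suitable ones.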

12
Hawkeye Architecture
13
Comparing Information Systems
 
 
14
Some Architecture Considerations
  • Similar functional components
  • Grid-wide for MDS, R-GMA; Pool for Hawkeye
  • Global schema
  • Different use cases will lead to different
    strengths
  • GIIS for decentralized registry; no standard
    protocol to distribute multiple R-GMA registries
  • R-GMA meant for streaming data, currently used
    for network data; Hawkeye and MDS for single queries
  • Push vs Pull
  • MDS is PULL only
  • R-GMA allows push and pull
  • Hawkeye allows triggers (push model)
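
The difference between the two access modes can be shown with one toy data source that supports both. The `Sensor` class and its methods are hypothetical and do not correspond to the interface of MDS, R-GMA, or Hawkeye.

```python
class Sensor:
    """Toy data source offering both pull and push access."""

    def __init__(self):
        self._latest = None
        self._subscribers = []  # (predicate, callback) pairs

    def query(self):
        # Pull: the client asks and receives the most recent value.
        return self._latest

    def subscribe(self, predicate, callback):
        # Push: the client registers interest once and is called back later.
        self._subscribers.append((predicate, callback))

    def publish(self, value):
        self._latest = value
        for predicate, callback in self._subscribers:
            if predicate(value):
                callback(value)


sensor = Sensor()
received = []
sensor.subscribe(lambda load: load > 0.8, received.append)
sensor.publish(0.5)
sensor.publish(0.9)
print(sensor.query())  # pull returns the latest value
print(received)        # push delivered only the value that passed the filter
```

With pull, the client pays for every query whether or not anything changed; with push, the source does the filtering and the client hears only about matching updates.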

15
Experiments
  • How many users can query an information server at
    a time?
  • How many users can query a directory server?
  • How does an information server scale with the
    amount of data in it?
  • How does an aggregator scale with the number of
    information servers registered to it?

16
Testbed
  • Lucky cluster at Argonne
  • 7 nodes, each has two 1133 MHz Intel PIII CPUs
    (with a 512 KB cache) and 512 MB main memory
  • Users simulated at the UC nodes
  • 20 P3 Linux nodes, mostly 1.1 GHz
  • R-GMA has an issue with the shared file system,
    so we also simulated users on Lucky nodes
  • All figures are 10 minute averages
  • Queries happening with a one second wait between
    each query (think synchronous send with a 1
    second wait)
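
A simulated user of the kind described above might look like the following sketch. The function name, its parameters, and the stand-in query are assumptions; the actual test harness used in the study is not shown in the slides.

```python
import time


def simulate_user(send_query, n_queries, wait=1.0,
                  sleep=time.sleep, clock=time.perf_counter):
    """Send queries one at a time, pausing `wait` seconds after each reply."""
    response_times = []
    for _ in range(n_queries):
        start = clock()
        send_query()                      # blocks until the server replies
        response_times.append(clock() - start)
        sleep(wait)                       # the fixed wait between queries
    return response_times


# Stand-in for a real query; the short wait keeps the demonstration quick.
times = simulate_user(lambda: None, n_queries=3, wait=0.01)
print(len(times), "queries sent")
```

Because each user waits for a reply before sleeping, a slow server lowers the rate at which that user generates load.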

17
Metrics
  • Throughput
  • Number of requests processed per second
  • Response time
  • Average amount of time (in sec) to handle a
    request
  • Load
  • percentage of CPU cycles spent in user mode and
    system mode, recorded by Ganglia
  • High when running a small number of
    compute-intensive apps
  • Load1
  • average number of processes in the ready queue
    waiting to run, 1 minute average, from Ganglia
  • High when a large number of apps are blocking on I/O
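
The first two metrics can be computed from a log of request start and end times, as in this sketch. The function name and the sample timestamps are made up; the load figures in the study came from Ganglia and are not reproduced here.

```python
def summarize(requests, window_seconds):
    """Return (throughput, mean response time) for completed requests.

    `requests` is a list of (start, end) timestamps in seconds.
    """
    if not requests:
        return 0.0, 0.0
    throughput = len(requests) / window_seconds
    mean_response = sum(end - start for start, end in requests) / len(requests)
    return throughput, mean_response


# Four made-up requests observed over an 8 second window.
sample = [(0.0, 0.5), (1.5, 2.5), (3.5, 4.0), (5.0, 7.0)]
throughput, mean_response = summarize(sample, window_seconds=8.0)
print(throughput, mean_response)  # 0.5 requests/s, 1.0 s per request
```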

18
Performance of GIS Information Servers vs. Number
of Users
19
Performance of GIS Information Servers vs. Number
of Users
20
Performance of GIS Information Servers vs. Number
of Users
21
Performance of GIS Information Servers vs. Number
of Users
22
Experiment 1 Summary
  • Caching can significantly improve performance of
    the information server
  • Particularly desirable if one wishes the server
    to scale well with an increasing number of users
  • When setting up an information server, care
    should be taken to make sure the server is on a
    well-connected machine
  • Network behavior plays a larger role than
    expected
  • If this is not an option, thought should be given
    to duplicating the server if more than 200 users
    are expected to query it
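
Why caching helps the server scale can be seen in a minimal sketch: repeated queries inside the cache lifetime never reach the underlying information provider. The class `CachingServer`, the lifetime of 30 seconds, and the provider function are all hypothetical.

```python
class CachingServer:
    """Toy information server that caches one provider result for `ttl` seconds."""

    def __init__(self, provider, ttl):
        self._provider = provider
        self._ttl = ttl
        self._value = None
        self._fetched_at = None
        self.provider_calls = 0

    def query(self, now):
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if stale:
            # Only a stale cache triggers the (expensive) provider.
            self._value = self._provider()
            self._fetched_at = now
            self.provider_calls += 1
        return self._value


server = CachingServer(provider=lambda: {"free_cpus": 4}, ttl=30)
for t in (0, 1, 2, 3, 31):
    server.query(now=t)
print(server.provider_calls)  # five queries, two provider invocations
```

The cost is staleness: answers may be up to `ttl` seconds old, so the lifetime has to suit how quickly the underlying data changes.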

23
Directory Server Scalability
24
Directory Server Scalability
25
Directory Server Scalability
26
Directory Server Scalability
27
Experiment 2 Summary
  • Because of the network contention issues, the
    placement of a directory server on a highly
    connected machine will play a large role in the
    scalability as the number of users grows
  • Because significant loads are seen even with only
    a few users, it will be important that this
    service be run on a dedicated machine, or that it
    be duplicated as the number of users grows.

28
Overall Results
  • Performance can be a matter of deployment
  • Effect of background load
  • Effect of network bandwidth
  • Performance can be affected by underlying
    infrastructure
  • LDAP/Java strengths and weaknesses
  • Performance can be improved using standard
    techniques
  • Caching, multi-threading, etc.

29
So what could monitoring be?
  • Basic functionality
  • Push and pull (subscription and notification)
  • Aggregation and Caching
  • More information available
  • More higher-level services
  • Triggers like Hawkeye
  • Viz of archive data like Ganglia
  • Plug and Play
  • Well defined protocols, interfaces and schemas

30
Topics
  • Evaluation of information infrastructures
  • Globus Toolkit MDS, R-GMA, Hawkeye
  • Throughput, response time, load
  • Insights into performance issues
  • What monitoring and discovery could be
  • Next-generation information architecture
  • Open Grid Services Architecture mechanisms
  • Integrated monitoring and discovery architecture for GT3

31
Open Grid Services Architecture (OGSA)
  • Defines standard interfaces and behaviors for
    distributed system integration, especially:
  • Standard XML-based service information model
  • Standard interfaces for push and pull mode access
    to service data
  • Notification and subscription

32
Service Data
  • Every service has its own service data
  • OGSA has a common mechanism to expose a service
    instance's state data to service requestors for
    query, update and change notification
  • Monitoring data is baked right in

33
Example: Reliable File Transfer Service
File Transfer
Internal State
Data transfer operations
34
MDS-3 Monitoring and Discovery System
  • Consists of various components
  • Core functionality
  • Information providers
  • Higher level services
  • Clients

35
Core Functionality
  • XPath support
  • XPath is a language that describes a way to
    locate and process items in XML docs by using an
    addressing syntax based on a path through the
    document's logical structure or hierarchy
  • Xindice support (native XML database)
  • Registry support
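
Path-based addressing into an XML document can be tried with Python's standard library, which implements a limited subset of XPath. The XML below is an invented stand-in for service data; its element and attribute names are not taken from any MDS-3 schema.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<serviceData>
  <host name="node1"><cpu load="0.42"/><disk freeMB="12000"/></host>
  <host name="node2"><cpu load="1.87"/><disk freeMB="300"/></host>
</serviceData>
""")

# Path through the hierarchy: every <cpu> element directly under a <host>.
loads = [float(cpu.get("load")) for cpu in doc.findall("./host/cpu")]

# Predicate on an attribute: the <host> whose name is "node2".
node2 = doc.find("./host[@name='node2']")
free_mb = int(node2.find("disk").get("freeMB"))

print(loads, free_mb)
```

The query names a location in the document's structure, so the client does not need to walk the tree by hand.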

36
Schema Issues
  • Need to keep track of service data schema
  • Avoid conflicts
  • Find the data more easily
  • Should really have unified naming approach
  • That's why Ben and I are here today :)

37
MDS-3 Information Providers in June Release
  • All the data currently in core MDS-2
  • Full data in the GLUE schema for compute elements
    (CE)
  • Ganglia information provider for cluster data
    will also be available from Ganglia folks (with
    luck)
  • Service data from RFT, RLS, GRAM
  • GT2 to GT3 work
  • GridFTP server data
  • Software version and path data
  • Documentation for translating your GT2
    information provider to a GT3 information provider

38
MDS-3 Higher Level Products
  • Higher-level services can perform actions on
    service data collected from other services
  • Part of this functionality can be provided by a
    set of building blocks provided
  • Provider interface: GRIS-style API for writing
    information providers
  • Service Data Aggregator: set up subscriptions to
    data for other services, and publish it as a
    single data stream
  • Hierarchy Builder: allow for hierarchy of
    aggregators
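
The aggregator and hierarchy ideas above can be sketched together. The `Source` and `Aggregator` classes are hypothetical building blocks written for illustration; they are not the MDS-3 components, and the slash-separated path used to tag each update is an assumption.

```python
class Source:
    """Toy service that pushes updates to its subscribers."""

    def __init__(self, name):
        self.name = name
        self._callbacks = []

    def subscribe(self, callback):
        self._callbacks.append(callback)

    def publish(self, value):
        for callback in self._callbacks:
            callback(self.name, value)


class Aggregator(Source):
    """Subscribes to several sources and republishes them as one stream."""

    def __init__(self, name, sources):
        super().__init__(name)
        for source in sources:
            source.subscribe(self._forward)

    def _forward(self, path, value):
        # Prefix our own name so consumers can tell where an update came from.
        for callback in self._callbacks:
            callback(f"{self.name}/{path}", value)


rft, gram = Source("rft"), Source("gram")
site = Aggregator("siteA", [rft, gram])
grid = Aggregator("grid", [site])       # an aggregator of aggregators

grid.subscribe(lambda path, value: print(path, value))
rft.publish({"activeTransfers": 2})
gram.publish({"queuedJobs": 5})
```

Since an aggregator is itself a source, a hierarchy falls out of plain composition.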

39
MDS-3 Index Server
  • Simplest higher-level service is the caching
    index service
  • Much like the GIIS in MDS 2.x
  • Will have configurability like a GIIS hierarchy
  • Will also have PHP-style scripts, much as
    available today

40
(No Transcript)
41
Client findServiceData command
  • Can be used in three modes
  • To return a list of available services from a
    Registry configured by your administrator.
  • To return all service data elements for an
    individual service instance.
  • To return a specific service data element for an
    individual service instance.
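
The three modes can be mimicked by a small Python function over an in-memory registry. This is only a model of the behavior described above: the function signature, the registry layout, and the service data values are invented, and none of it reflects the real command's arguments or output format.

```python
# Invented registry contents: service instance -> its service data elements.
REGISTRY = {
    "rft-1": {"transferCount": 12, "status": "active"},
    "gram-1": {"queuedJobs": 3},
}


def find_service_data(registry, service=None, element=None):
    """Python stand-in for the three query modes (not the GT3 client)."""
    if service is None:
        return sorted(registry)              # mode 1: list available services
    if element is None:
        return dict(registry[service])       # mode 2: all elements of one service
    return registry[service][element]        # mode 3: one specific element


print(find_service_data(REGISTRY))
print(find_service_data(REGISTRY, "rft-1"))
print(find_service_data(REGISTRY, "gram-1", "queuedJobs"))
```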

42
Clients
  • C bindings and the findServiceData command line
    client were released as part of GT3 alpha-2
  • Core C bindings provide findServiceData C
    function
  • findServiceData command line client gives an
    example of using it to parse out information (in
    this case, registry contents)

43
Service Data Browser
  • GUI client to display service data from any
    service
  • Extensible for data-specific visualization
  • A version was released with GT3 alpha
  • http://www.globus.org/ogsa/releases/alpha/docs/infosvcs/sdbquickstart.html

44
We Need More Basic Information
  • Interfaces to other sources of data
  • GPT data
  • Other monitoring systems
  • Others?
  • Service data from other components
  • Every service has service data
  • OGSA-DAI
  • Will need to interface on schema

45
We Will Need More GUIs and Clients
  • Additional GUI visualizers may be implemented to
    display service data specific to a particular
    port type
  • Additional Client interfaces possibly?

46
We Need More Higher Level Services
  • We have a couple planned
  • Archiving service
  • Trigger template

47
Post-3.0 release: Archiving Service
  • Will allow subscription to service data
  • Logging in a flexible way
  • Well defined interfaces for mining
  • Currently investigating other implementations to
    leverage, e.g., NetLogger logging, Ganglia
    RRDB

48
Post-3.0 release: Trigger Template
  • Will provide a template to allow subscription to
    data, reasoning about that data, and a course of
    action to take place
  • Essentially, a gateway service between OGSA
    Notifications and some other notification
    framework, with filtering of notifications
  • Example: Subscribe to disk space information,
    send mail to sys admin when it reaches 90% full
  • Needed: trigger template and several small
    examples of common triggers, and documentation
    for how users could extend them or write new
    ones.
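
The disk space example can be sketched as a generic trigger built from a condition and an action. This is an illustration under stated assumptions, not the planned template: `make_trigger` and `disk_fraction_used` are hypothetical names, and the "send mail" action is replaced by appending to a list so the sketch has no side effects.

```python
import shutil


def disk_fraction_used(path="/"):
    """Fraction of the file system at `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def make_trigger(condition, action):
    """Build a notification handler: reason about the value, then maybe act."""
    def on_notification(value):
        if condition(value):
            action(value)
            return True
        return False
    return on_notification


alerts = []  # stand-in for mail sent to the sys admin
trigger = make_trigger(
    condition=lambda used: used >= 0.90,
    action=lambda used: alerts.append(f"disk {used:.0%} full"),
)

trigger(0.42)   # below the threshold: nothing happens
trigger(0.93)   # above the threshold: the action fires
print(alerts)
```

A real deployment would feed `trigger` from a subscription, for example with readings such as `disk_fraction_used()`, and swap the list for an actual notification channel.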

49
Other Possible Higher Level Services
  • Site Validation Service
  • Job Tracking Service
  • Interfacing to Netlogger?

50
Summary
  • Current monitoring systems
  • Insights into performance issues
  • What we really want for monitoring and discovery
    is a combination of all the current systems
  • Next-generation information architecture
  • Open Grid Services Architecture mechanisms
  • MDS3 plans
  • Additional work needed!

51
Thanks
  • Testbed/Experiment support and comments
  • John Mcgee, ISI; James Magowan, IBM-UK; Alain Roy
    and Nick LeRoy at University of Wisconsin,
    Madison; Scott Gose and Charles Bacon, ANL; Steve
    Fisher, RAL; Brian Tierney and Dan Gunter, LBNL.
  • MDS3 planning is by Ben Clifford, Karl
    Czajkowski, and others
  • This work was supported in part by the
    Mathematical, Information, and Computational
    Sciences Division subprogram of the Office of
    Advanced Scientific Computing Research, U.S.
    Department of Energy, under contract
    W-31-109-Eng-38. This work was also supported by
    DOESG SciDAC Grant, iVDGL from NSF, and others.

52
Additional Information
  • Contact
  • Jennifer Schopf (jms_at_mcs.anl.gov)
  • Zhang, Freschl and Schopf, A Performance Study
    of Monitoring and Information Services for
    Distributed Systems, to appear in HPDC 2003
  • In the meanwhile just ask me
  • MDS-3 information
  • Soon at www.globus.org/mds