Title: CrossGrid Task 3.3 Grid Monitoring
1. CrossGrid Task 3.3 Grid Monitoring
- Trinity College Dublin (TCD, AC14 CR11)
- Brian Coghlan, Stuart Kenny, David O'Callaghan
- CYFRONET Academic Computer Center, Krakow (CYFRO, C01)
- Bartosz Balis, Slawomir Zielinski, Kazimierz Balos
- ICM, University of Warsaw (ICM, AC2 C01)
- Krzysztof Nawrocki, Adam Padee
2. CrossGrid Task 3.3 Grid Monitoring
- Provides monitoring information from four main sources
- - Applications (OCM-G)
- - gathers performance data from an executing application
- - used by application developers to understand an application's behavior and improve its performance.
- - Infrastructure (JIMS)
- - gathers and exposes information concerning the state of devices used to build a grid environment
- - notifies the user not only about simple events but about derived ones as well
- - takes managerial actions in cases of failure.
- - Instruments/Networks (SANTA-G)
- - allows information captured by external monitoring instruments to be introduced into the Grid information system.
- - used in validation and calibration of both intrusive monitoring systems and systemic models, and also for performance analysis.
3. CrossGrid Task 3.3 Grid Monitoring
- - Derived Results
- - gathering information from other monitoring tools and creation of one consistent user interface.
- - generation of forecasts of future grid state using Kalman filters and neural networks.
4. Grid Monitoring System
[Diagram labels: Infrastructure monitoring; R-GMA/OGSA info; Application monitoring]
5. Task 3.3.1 OCM-G, Current State
- OCM-G integrated with GT.
- Secure communication based on globus_io between components (authentication, possibly encryption).
- Service Managers run on a "well-known" port (3331, configurable).
- Configuration via local config files (user home dir or /opt/cg/etc)
- No longer any need for a shared file system!
- Still one central Service Manager
- can handle multi-site applications unless firewalls block communication
- Registration of application processes improved
- Locks to get rid of a race condition while forking LMs
- Support for user-defined events (probes) added.
- CVS status
- code up to date.
- building with autobuild, on RH6.2.
- need to make changes to comply with the developers' guide.
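The slides do not say how the locks around forking LMs are implemented. As a minimal sketch of the general idea, assuming an advisory file lock (the lock path and the `start_monitor` callback are hypothetical, not OCM-G names), several application processes can race to start a monitor while only the lock winner actually does so:

```python
import fcntl
import os
import tempfile


def try_start_local_monitor(lock_path, start_monitor):
    """Call start_monitor() only if this caller wins an exclusive lock.

    Returns the open lock file on success (keep it open to hold the lock),
    or None if another holder already has the lock.
    """
    lock_file = open(lock_path, "a")
    try:
        # Non-blocking exclusive lock: fails immediately if already held.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    start_monitor()
    return lock_file


# Hypothetical lock location; a real deployment would pick a fixed path.
lock_path = os.path.join(tempfile.mkdtemp(), "lm.lock")
started = []
first = try_start_local_monitor(lock_path, lambda: started.append("LM"))
second = try_start_local_monitor(lock_path, lambda: started.append("LM"))
```

Here the second attempt finds the lock taken and backs off, so the monitor is started once. This relies on POSIX `flock` semantics and is only an illustration of the race being closed.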
6. Task 3.3.1 OCM-G, Task Contacts
- Task 2.4 - G-PM fully integrated with OCM-G in its current functionality.
- G-PM now needs user certificate to connect to the OCM-G.
7. Task 3.3.1 OCM-G, Integration
- Smooth integration with G-PM.
- Communication based on globus_io.
- No dependencies on other Globus / EDG components.
8. Task 3.3.1 OCM-G, Problems and Issues
- Building under RH7.3: problems with the globus_io development package.
- Interface to Grid Benchmarks should be defined.
9. Task 3.3.2 SANTA-G, Current State
- Improve the schema of information available
- - Done, still more to do
- Add more SQL parsing support
- - Done, added more WHERE predicates
- - Supports > and < comparisons in queries
- Add on-line data acquisition
- - Sensor now starts/stops tcpdump at startup/shutdown
- - Allows querying of dynamically generated network traffic
- Integrate Sensor and QueryEngine components
- - Sensor now contacts QueryEngine at startup
- - informs it when a new log file is generated, informs QE of shutdown
- Enhance Viewer functionality
- - Improved Viewer GUI.
- - Graphical packet display, displays timestamps in correct format, automatically resolves IP addresses
- - Query Builder added to allow user to construct complex queries
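To make the WHERE-predicate support concrete, here is a minimal sketch of evaluating a single comparison predicate against captured packet records. It is not the SANTA-G QueryEngine: the field names, the sample records, and the inclusion of `=` alongside `>` and `<` are assumptions made for illustration.

```python
import operator
import re

# Comparison operators this sketch understands ("=" is an assumption).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}
PREDICATE = re.compile(r"^\s*(\w+)\s*(=|>|<)\s*(\S+)\s*$")


def parse_predicate(text):
    """Split 'field op value' into (field, comparison function, value)."""
    match = PREDICATE.match(text)
    if match is None:
        raise ValueError(f"unsupported predicate: {text!r}")
    field, op, raw = match.groups()
    # Integers are compared numerically; anything else as a string.
    value = int(raw) if raw.lstrip("-").isdigit() else raw.strip("'")
    return field, OPS[op], value


def select(rows, where):
    """Return the rows for which the WHERE predicate holds."""
    field, compare, value = parse_predicate(where)
    return [row for row in rows if compare(row[field], value)]


# Invented packet records standing in for parsed capture data.
packets = [
    {"packet_id": 1, "source_ip": "10.0.0.1", "length": 60},
    {"packet_id": 2, "source_ip": "10.0.0.2", "length": 1500},
    {"packet_id": 3, "source_ip": "10.0.0.1", "length": 512},
]

large = select(packets, "length > 100")
from_host = select(packets, "source_ip = '10.0.0.2'")
```

A real query path would go through R-GMA and a full SQL parser; the sketch only shows how a parsed predicate reduces to a filter over packet rows.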
10. Task 3.3.2 SANTA-G, Task Contacts
- EDG WP3
- - SANTA-G makes use of the EDG R-GMA.
- - has also contributed to it, CanonicalProducer was an extension to the EDG R-GMA developed as part of Task 3.3.2.
- Task 3.3.3 JIMS
- - integration with this task has begun
- - work should be completed by the end of the summer (see next slide).
11. Task 3.3.2 SANTA-G, Integration
12. Task 3.3.2 SANTA-G, Integration
13. Task 3.3.2 SANTA-G, Problems and Issues
- Need the most recent EDG R-GMA RPMs
- - CanonicalProducer is not in the earlier release!
- R-GMA RPMs are for Red Hat 7.3 only!
- Still To Do
- - Expand schema of available information
- - Improve SQL support
- - Complete SANTA-G/JMX integration
- - Testing
- - Investigate security
14. Task 3.3.3 JIMS, Current State
- JIRO-based Infrastructure Monitoring System (JIMS)
- ported from JDMK to pure JMX reference implementation
- host monitoring module, ready.
- SNMP is in progress
- SOAP Gateway for integration with other CG tasks
- exposes Web Services based interface
- makes integration with OGSA (Open Grid Services Architecture) easier
- Web Services Gateway module
- simple SOAP client for testing purposes
15. JIMS, SOAP Gateway architecture
16. JIMS, SOAP Gateway Facilities
- Web Services Gateway serves as a mediator between MBean Servers in monitored stations and external applications
- Place for registering active monitored stations and removing non-existent ones
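The registration role of the gateway can be sketched as a small registry that records when each station last announced itself and drops stations that have gone quiet. The slides do not describe how JIMS decides a station is non-existent, so the timeout rule, the class name, and the station identifiers below are all assumptions.

```python
import time


class StationRegistry:
    """Tracks monitored stations and purges those not seen recently."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock          # injectable so the example is deterministic
        self._last_seen = {}         # station id -> time of last registration

    def register(self, station_id):
        """Register a station, or refresh it if already known."""
        self._last_seen[station_id] = self._clock()

    def unregister(self, station_id):
        """Explicitly remove a station; unknown ids are ignored."""
        self._last_seen.pop(station_id, None)

    def purge_stale(self):
        """Remove and return stations silent for longer than the timeout."""
        now = self._clock()
        stale = [s for s, seen in self._last_seen.items()
                 if now - seen > self._timeout]
        for station_id in stale:
            del self._last_seen[station_id]
        return stale

    def active(self):
        return sorted(self._last_seen)


# Deterministic walk-through using a fake clock.
fake_now = [0.0]
registry = StationRegistry(timeout_seconds=30.0, clock=lambda: fake_now[0])
registry.register("node-a")
fake_now[0] = 20.0
registry.register("node-b")
fake_now[0] = 40.0
removed = registry.purge_stale()
```

In this walk-through `node-a` was last seen 40 seconds ago and is purged, while `node-b` (20 seconds) stays registered.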
17. JIMS, Test SOAP Client Interface
18. Task 3.3.3 JIMS, Problems and Issues
- What is done
- Host monitoring system (JIMS): ready
- SOAP Gateway: before deadline
- Open (not commercial) implementation of discovery services: before deadline
- To do
- integration with CVS and autobuild process, by the end of this week
- Simplifying installation process
- Adding functionality
- other mechanisms for unregistering monitored stations
- security when connecting modules via Web Services (SOAP/XML)
19. Task 3.3.4 PostProcessing, Current State
- Forecaster based on linear Kalman filter implemented and available as RPM.
- More work needed to put it in CVS, will be done during the meeting.
- Current solution for real monitoring data from clusters is VO-Centric Ganglia
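The slides give no details of the forecaster's state model. As a minimal sketch of forecasting with a linear Kalman filter, assume a scalar random-walk model (the state carries over between steps, and measurements are the state plus noise); the noise variances and the sample series are illustrative values, not CrossGrid parameters.

```python
def kalman_forecast(measurements, process_var=1e-3, measurement_var=0.1,
                    initial_estimate=0.0, initial_var=1.0):
    """One-step-ahead forecasts from a scalar random-walk Kalman filter.

    Returns (forecasts, final_estimate), where forecasts[k] is the value
    predicted for step k before measurement k was seen.
    """
    estimate, variance = initial_estimate, initial_var
    forecasts = []
    for z in measurements:
        # Predict: the state estimate is unchanged, uncertainty grows.
        variance += process_var
        forecasts.append(estimate)
        # Update: blend prediction and measurement by the Kalman gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= (1.0 - gain)
    return forecasts, estimate


# Illustrative input: a steady load average of 0.5 observed 50 times.
load_samples = [0.5] * 50
forecasts, final_estimate = kalman_forecast(load_samples)
```

With a constant input the estimate converges toward the observed value; the final estimate also serves as the forecast for the next, unseen step.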
20. Task 3.3.4 PostProcessing, Integration
- Will provide 2 RPMs for the integration meeting
- ganglia-monitor-core-mcastmin-2.4.1-1.i386.rpm
- serves as monitoring daemon on worker nodes
- gmmetad-2.2-1.i386.rpm
- located on cluster CE. Gathers information from monitoring daemons and passes it to central monitoring host
- RPM slightly altered with respect to the original
- Would like to install these on X clusters for testing during integration meeting.
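Ganglia daemons publish cluster state as an XML dump, which is what a gathering component on the CE would read. The sketch below extracts one metric per host from such a dump. The element and attribute names follow the stock Ganglia XML layout as commonly documented; the sample document, host names, and values are invented, and the altered RPMs mentioned above may differ.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample in the style of a Ganglia XML dump.
SAMPLE_XML = """<GANGLIA_XML VERSION="2.5.0" SOURCE="gmond">
  <CLUSTER NAME="example-cluster" LOCALTIME="0">
    <HOST NAME="wn01.example.org" IP="192.0.2.1" REPORTED="0">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="mem_free" VAL="204800" TYPE="uint32" UNITS="KB"/>
    </HOST>
    <HOST NAME="wn02.example.org" IP="192.0.2.2" REPORTED="0">
      <METRIC NAME="load_one" VAL="1.75" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metrics_by_host(xml_text, metric_name):
    """Map each host name to the numeric value of one named metric."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                result[host.get("NAME")] = float(metric.get("VAL"))
    return result


loads = metrics_by_host(SAMPLE_XML, "load_one")
```

Values gathered this way are the kind of per-node time series a forecaster could consume.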
21. Task 3.3.4 PostProcessing, Problems and Issues
- 3rd part which binds forecaster and data sources under development
- Not ready for integration meeting.