Title: GMDAT Grid Monitoring and Data Analysis Tool
1. GMDAT: Grid Monitoring and Data Analysis Tool
K. Nawrocki, A. Padee, K. Wawrzyniak
Interdisciplinary Centre for Mathematical and Computational Modeling, Warsaw University
2. GMDAT main goals
- create a tool that provides the scheduler with data about the Grid status at a given moment and, if possible, with predictions of its state in the near future, i.e.:
  - predict values of parameters such as memory usage, available CPU resources at a given Grid cluster, available bandwidth between pairs of Grid clusters, etc.
- provide a convenient API for other possible consumers (e.g. Data Access Optimization) and a GUI for Grid users
- keep the monitoring system lightweight and able to work on a 24h/day basis
3. GMDAT architecture overview
- Existing solutions used and extended wherever possible:
  - Ganglia monitoring toolkit components used as the transport layer (gmond, gmetad)
  - Historical information stored in RRD databases
  - SOAP interface for other CrossGrid tools
- New suite of tools developed:
  - Sensors for estimating available bandwidth and round-trip times, and for measuring transfers between clusters
  - Predict, for generating forecasts
  - Client libraries and database tools for integration with other middleware components
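RRD (round-robin database) files keep a fixed amount of history: once an archive is full, each new sample overwrites the oldest one, so storage stays constant on a system that runs 24h/day. The C++ sketch below illustrates only that round-robin principle with a hypothetical `RoundRobinArchive` class; it is not the RRDtool API, and real RRD files additionally consolidate samples (AVERAGE, MIN, MAX, LAST) into several archives of different resolution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size round-robin archive (illustration only, not RRDtool).
// Only the newest `capacity` samples are kept; older ones are overwritten.
class RoundRobinArchive {
public:
    explicit RoundRobinArchive(std::size_t capacity)
        : buf_(capacity), head_(0), count_(0) {
        assert(capacity > 0);
    }

    // Store a sample, overwriting the oldest one once the buffer is full.
    void push(double value) {
        buf_[head_] = value;
        head_ = (head_ + 1) % buf_.size();
        if (count_ < buf_.size()) ++count_;
    }

    std::size_t size() const { return count_; }

    // i = 0 is the oldest retained sample, i = size() - 1 the newest.
    double at(std::size_t i) const {
        assert(i < count_);
        std::size_t start = (head_ + buf_.size() - count_) % buf_.size();
        return buf_[(start + i) % buf_.size()];
    }

    // Mean of the retained samples (similar in spirit to an AVERAGE consolidation).
    double average() const {
        assert(count_ > 0);
        double sum = 0.0;
        for (std::size_t i = 0; i < count_; ++i) sum += at(i);
        return sum / static_cast<double>(count_);
    }

private:
    std::vector<double> buf_;
    std::size_t head_;   // index where the next sample will be written
    std::size_t count_;  // number of valid samples stored so far
};
```

With a capacity of 3, pushing 1, 2, 3, 4 leaves 2, 3, 4 in the archive: the first sample has been overwritten and the file size never grew.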
4. GMDAT architecture diagram
5. GMDAT implementation structure
6. GMDAT: the predict tool
The model
Infinite Impulse Response filter
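The slide does not spell out the filter coefficients that predict uses, so the following is only a generic illustration of an Infinite Impulse Response filter, written as a direct-form I recursion in C++. The class name `IirFilter` and its constructor arguments are hypothetical; the defining property is the feedback term, which makes each output depend on all earlier outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Generic direct-form I IIR filter (hypothetical, not predict's own code):
//   y[n] = sum_{k=0}^{M} b[k]*x[n-k] - sum_{k=1}^{N} a[k]*y[n-k]
// a[0] is normalised to 1, so the constructor takes a[1..N] only.
class IirFilter {
public:
    IirFilter(std::vector<double> b, std::vector<double> aTail)
        : b_(std::move(b)), a_(std::move(aTail)) {
        assert(!b_.empty());
    }

    // Feed one input sample x[n] and return the output y[n].
    double step(double x) {
        xs_.push_front(x);                            // xs_[k] = x[n-k]
        if (xs_.size() > b_.size()) xs_.pop_back();
        double y = 0.0;
        for (std::size_t k = 0; k < xs_.size(); ++k) y += b_[k] * xs_[k];
        for (std::size_t k = 0; k < ys_.size(); ++k) y -= a_[k] * ys_[k];  // ys_[k] = y[n-1-k]
        ys_.push_front(y);
        if (ys_.size() > a_.size()) ys_.pop_back();
        return y;
    }

private:
    std::vector<double> b_;   // feed-forward coefficients b[0..M]
    std::vector<double> a_;   // feedback coefficients a[1..N]
    std::deque<double> xs_;   // recent inputs, newest first
    std::deque<double> ys_;   // recent outputs, newest first
};
```

With `b = {alpha}` and `a = {alpha - 1}` the recursion reduces to exponential smoothing, y[n] = alpha*x[n] + (1 - alpha)*y[n-1]; for alpha = 0.5 and a constant input of 1 the outputs are 0.5, 0.75, 0.875, ...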
Finite Impulse Response filter
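For contrast, a Finite Impulse Response filter has no feedback: each output is a weighted sum of only the last M+1 inputs. The sketch below is a generic C++ illustration with a hypothetical `FirFilter` class; the actual taps used by predict are not given on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Generic FIR filter (hypothetical): y[n] = sum_{k=0}^{M} h[k] * x[n-k].
class FirFilter {
public:
    explicit FirFilter(std::vector<double> h) : h_(std::move(h)) {
        assert(!h_.empty());
    }

    // Feed one input sample x[n] and return the output y[n].
    double step(double x) {
        window_.push_front(x);                        // window_[k] = x[n-k]
        if (window_.size() > h_.size()) window_.pop_back();
        double y = 0.0;
        for (std::size_t k = 0; k < window_.size(); ++k) y += h_[k] * window_[k];
        return y;
    }

private:
    std::vector<double> h_;       // filter taps h[0..M]
    std::deque<double> window_;   // last M+1 inputs, newest first
};
```

An N-tap moving average is the special case h[k] = 1/N; because old inputs fall out of the window, the filter's memory is strictly bounded.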
Kalman Filter loop
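The Kalman filter loop alternates a time update (predict the state and its variance) with a measurement update (correct the prediction using the new sample). The state-space model behind predict is not recoverable from this slide, so the C++ sketch below assumes the simplest scalar case: a random-walk state observed with additive noise. The class `ScalarKalman` and the noise variances `q` and `r` are hypothetical.

```cpp
#include <cassert>

// Scalar Kalman filter for an assumed random-walk model:
//   state:       x[k] = x[k-1] + w,  w ~ N(0, q)
//   measurement: z[k] = x[k]   + v,  v ~ N(0, r)
class ScalarKalman {
public:
    ScalarKalman(double x0, double p0, double q, double r)
        : x_(x0), p_(p0), q_(q), r_(r) {
        assert(p0 >= 0.0 && q >= 0.0 && r > 0.0);
    }

    // Time update: the random walk leaves the estimate unchanged
    // but its variance grows by the process noise q.
    double predict() {
        p_ += q_;
        return x_;
    }

    // Measurement update: blend the prediction with measurement z.
    void update(double z) {
        double k = p_ / (p_ + r_);   // Kalman gain
        x_ += k * (z - x_);
        p_ *= (1.0 - k);
    }

    double estimate() const { return x_; }
    double variance() const { return p_; }

private:
    double x_, p_, q_, r_;
};
```

In the loop, `predict()` is called before each new sample and `update(z)` after it arrives; a larger q relative to r makes the estimate follow new samples more quickly.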
7. predict in action
cpu_user prediction for one step
- cpu_user
- prediction for 100 steps (starting at sample 1000)
- prediction for 50 steps (starting at sample 1115)
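A forecast many steps ahead can be obtained from a one-step predictor by feeding each prediction back in as if it were an observation. How predict produces its 100- and 50-step forecasts is not stated here, so the C++ sketch below assumes an autoregressive AR(p) one-step predictor with known coefficients; the function `forecastSteps` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical multi-step forecast by iterating a one-step AR(p) predictor:
//   xhat[n+1] = c + sum_{k=0}^{p-1} phi[k] * x[n-k]
// Each forecast is appended to the series and used for the next step.
std::vector<double> forecastSteps(const std::vector<double>& history,
                                  const std::vector<double>& phi,
                                  double c, std::size_t steps) {
    assert(history.size() >= phi.size());
    std::vector<double> series(history);   // observed data followed by forecasts
    std::vector<double> out;
    out.reserve(steps);
    for (std::size_t s = 0; s < steps; ++s) {
        double next = c;
        const std::size_t n = series.size();
        for (std::size_t k = 0; k < phi.size(); ++k)
            next += phi[k] * series[n - 1 - k];   // phi[k] weights the (k+1)-th latest value
        series.push_back(next);
        out.push_back(next);
    }
    return out;
}
```

Because later steps are built on earlier predictions rather than data, forecast errors generally accumulate as the horizon grows.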
8. GMDAT in action
- C API
- XML over SOAP
- Web interface (user/admin)
9. GMDAT in action (examples)
Web interface
C API
// calculate the minimum value of 'idle_bogomips' for the defined set of clusters
// for time = 0 (real data)
cout << "MIN " << myXML.getClusterParameter("min", "idle_bogomips", clusters, 0) << endl;

// calculate the maximum value of 'idle_bogomips' for the defined set of clusters
// for time = 0..6 min (i.e. data and predictions for the next 6 minutes for each
// cluster from the defined set are averaged, and the maximum of these averages
// over the clusters is returned)
cout << "MAX " << myXML.getClusterParameter("max", "idle_bogomips", clusters, 6) << endl;

// calculate the maximum value of 'bandwidth' between all pairs of clusters
// in the defined set for time = 0 (i.e. real data)
cout << "MAX " << myXML.getNetParameter("max", "bandwidth", clusters, 0) << endl;
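The comments in the C API example describe a two-stage aggregation: average each cluster's values over the time window, then take the minimum or maximum across clusters. The C++ sketch below restates that rule with a hypothetical function `aggregateClusters`; it assumes one value per minute per cluster (index 0 being real data) and is not the GMDAT client library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical re-statement of the getClusterParameter aggregation.
// values[c][t] = metric for cluster c at t minutes from now (t = 0 is real data,
// t > 0 are predictions; one value per minute is an assumption).
double aggregateClusters(const std::string& op,
                         const std::vector<std::vector<double>>& values,
                         std::size_t horizon) {
    assert(!values.empty());
    std::vector<double> perCluster;
    for (const auto& series : values) {
        assert(series.size() > horizon);
        auto last = series.begin() + static_cast<std::ptrdiff_t>(horizon + 1);
        double sum = std::accumulate(series.begin(), last, 0.0);
        perCluster.push_back(sum / static_cast<double>(horizon + 1));  // time average
    }
    if (op == "min") return *std::min_element(perCluster.begin(), perCluster.end());
    if (op == "max") return *std::max_element(perCluster.begin(), perCluster.end());
    throw std::invalid_argument("unsupported operation: " + op);
}
```

With horizon 0 only real data enters the result, matching the `time = 0` calls; a larger horizon smooths each cluster over its predicted values before the cross-cluster comparison.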
XML over SOAP
<GRID NAME="CrossGrid" AUTHORITY="http://grid.fuw.edu.pl/ganglia-webfrontend/">
  <GRID NAME="fuw.edu.pl" AUTHORITY="http://cms.fuw.edu.pl/ganglia/" LOCALTIME="1103720907">
    <CLUSTER NAME="fuw.edu.pl cluster" LOCALTIME="1103720851" OWNER="unspecified" LATLONG="unspecified" URL="unspecified">
      <HOST NAME="cms2.fuw.edu.pl" IP="193.0.84.232" REPORTED="1103720840" TN="11" TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1103111048">
        <METRIC NAME="out_traffic_195.134.67.0_24" VAL="1297" TYPE="uint32" UNITS="KB" TN="151" TMAX="60" DMAX="0" SLOPE="both" SOURCE="gmetric"/>
        <METRIC NAME="load_five" VAL="0.01" TYPE="float" UNITS="" TN="80" TMAX="325" DMAX="0" SLOPE="both" SOURCE="gmond"/>
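Each metric in this XML is a single element whose attributes carry the name, value, type and units. The C++ sketch below extracts those attributes from one `<METRIC .../>` element with `std::regex`; the helper names `parseAttributes` and `metricValue` are hypothetical, and this is not a general XML parser (no entity decoding, no nesting), so a production client would use a proper XML library.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Extract NAME="value" attribute pairs from a single Ganglia-style element.
std::map<std::string, std::string> parseAttributes(const std::string& element) {
    static const std::regex attr(R"re(([A-Z_]+)="([^"]*)")re");
    std::map<std::string, std::string> attrs;
    for (std::sregex_iterator it(element.begin(), element.end(), attr), end;
         it != end; ++it) {
        attrs[(*it)[1].str()] = (*it)[2].str();   // group 1 = name, group 2 = value
    }
    return attrs;
}

// Numeric value of the VAL attribute (assumes the element has one).
double metricValue(const std::string& element) {
    auto attrs = parseAttributes(element);
    auto it = attrs.find("VAL");
    assert(it != attrs.end());
    return std::stod(it->second);
}
```

Applied to the `load_five` element above, `parseAttributes` yields NAME = "load_five", UNITS = "" and SOURCE = "gmond", and `metricValue` returns 0.01.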