Title: Information Modeling and Monitoring in Grid Systems
1Information Modeling and Monitoring in Grid Systems
Sergio Andreozzi INFN-CNAF Bologna
(Italy) sergio.andreozzi_at_cnaf.infn.it
2OUTLINE
- Problem Statement
- Information Modeling of Grid resources
- GLUE Schema
- Computing Resources
- Storage Resources
- Network Resources
- Common Information Model (CIM)
- Monitoring a Grid
- GridICE
3PART I
4Grid basic principles
- Grid systems make it possible to
- Share resources across administrative domains
(e.g., computing power, storage space, databases)
- Shared resources are
- geographically dispersed
- heterogeneous
- owned by different administrative domains
- dynamically composed
- remotely accessible by users
5Grid basic principles
- Virtualization of users and resources
- mapping from virtual resources to physical resources
- mapping from virtual users to physical users
Grid system
1
6Problem Statement: Information Modeling
- Resources available in Grid systems must be
described in a precise and systematic manner so
that they can be discovered for subsequent
management or use
- A shared description allows multiple experts to
contribute to the problem and serves as a
means of communication between different knowledge
domains
7Problem Statement: Grid Monitoring
- How do we measure significant parameters to analyze
the usage, behavior and performance of a Grid system?
- How do we detect and notify fault situations,
contract violations, and user-defined events?
8PART II
- Information Modeling and the GLUE Schema
9Information Model: definition
- Abstraction of the real world into constructs that
can be represented in computer systems (e.g.,
objects, properties, behavior, and relationships)
- Not tied to any particular implementation
- Used to exchange information among different
domains
10Problem Statement: Information Modeling
- Main Use Cases
- Discovery for brokering and access
- which Computing Elements are available to the
VO CMS and offer the SL3 operating system with
the CMKIN software package installed?
- which Storage Elements offer 20 gigabytes of
disk space to the VO ATLAS?
- Discovery for monitoring
- how many CPUs is the site XYZ offering to the
EGEE Grid?
- what is the success rate of submitted jobs per
site?
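The first discovery question above can be sketched as a simple filter over resource descriptions. This is only an illustration: the `ComputingElement` class, its field names, and the host names are hypothetical and are not taken from the GLUE Schema.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    """Hypothetical, simplified description of a Computing Element."""
    ce_id: str
    supported_vos: set
    operating_system: str
    software: set = field(default_factory=set)


def find_computing_elements(ces, vo, operating_system, package):
    """Return the IDs of CEs open to `vo` that offer the OS and the package."""
    return [
        ce.ce_id
        for ce in ces
        if vo in ce.supported_vos
        and ce.operating_system == operating_system
        and package in ce.software
    ]


# Made-up catalog of three Computing Elements.
catalog = [
    ComputingElement("ce01.example.org", {"CMS", "ATLAS"}, "SL3", {"CMKIN"}),
    ComputingElement("ce02.example.org", {"CMS"}, "SL3", set()),
    ComputingElement("ce03.example.org", {"ATLAS"}, "SL3", {"CMKIN"}),
]

matches = find_computing_elements(catalog, "CMS", "SL3", "CMKIN")
print(matches)
```

Only a description shared by all sites makes such a query meaningful: every Computing Element has to publish the same attributes with the same semantics.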
11Information Model: how it can be represented
- Typically, graphical languages are preferred
- Several solutions are available
- We have selected the Unified Modeling Language
(UML)
- It is a widely accepted international standard
(Object Management Group, OMG)
- It is often used for information and conceptual
modeling
- It has become well established in many
communities with extensive tool support from both
commercial and open source vendors
12Unified Modeling Language (UML)
- The Unified Modeling Language (UML) is a
graphical language for visualizing, specifying,
constructing, and documenting the artifacts of a
software-intensive system.
- The UML offers a standard way to write a system's
blueprints, including conceptual things such as
business processes and system functions as well
as concrete things such as programming language
statements, database schemas, and reusable
software components.
- (Object Management Group)
13Unified Modeling Language
- First Specification in 1997
- Current Specification: version 1.5 (12 different
diagrams)
- Finalizing Specification: version 2.0 (13
different diagrams)
- Each diagram type has
- Semantics: what does the diagram type do?
- Notation: what graphical symbols can the diagram
type contain?
- Diagram groups
- Structural: model the static aspects of a system
- Behavioral: model the behavior of a system
(dynamic model)
- We use Class diagrams: they show the static
structure of the model, in particular, the things
that exist (such as classes and types), their
internal structure, and their relationships to
other things
14UML Class Diagram elements
- Class: represents a concept within the system
being modeled. It has data structure, behavior
and relationships to other elements
- Generalization: taxonomic relationship between a
more general element (the parent) and a more
specific element (the child) that is fully
consistent with the first element and that adds
additional information. It is used for classes,
packages, use cases, and other elements
15UML Class Diagram elements
- Binary association: an association among exactly
two classes (possibly from a class to itself)
- Aggregation: denotes weak ownership (i.e., the
part may be included in several aggregates, and
its owner may also change over time). Deleting the
referencing aggregate does not imply deletion of
the parts
- Composition: strong form of aggregation; a part
instance may be included in at most one composite
at a time, and the composite object has sole
responsibility for the disposition of its parts
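As a rough analogy, the class diagram elements above can be mirrored in code. The sketch below is not part of the slides; all class names (`Resource`, `WorkerNode`, `Queue`, `Disk`, `Partition`) are hypothetical and were chosen only to show generalization, aggregation, and composition side by side.

```python
class Resource:
    """General element (the parent in a generalization)."""

    def __init__(self, name):
        self.name = name


class WorkerNode(Resource):
    """Generalization: a more specific Resource that adds information."""

    def __init__(self, name, cpu_count):
        super().__init__(name)
        self.cpu_count = cpu_count


class Queue:
    """Aggregation: the queue references nodes but does not own them."""

    def __init__(self, name, nodes):
        self.name = name
        self.nodes = list(nodes)


class Partition:
    """Part of a composition; only created through its owning Disk."""

    def __init__(self, mount_point):
        self.mount_point = mount_point


class Disk:
    """Composition: the disk creates and solely owns its partitions."""

    def __init__(self, mount_points):
        self.partitions = [Partition(m) for m in mount_points]


node = WorkerNode("wn01", cpu_count=2)
fast_queue = Queue("fast", [node])
background_queue = Queue("background", [node])  # same part, second aggregate

del fast_queue  # deleting one aggregate leaves the shared part untouched
disk = Disk(["/", "/scratch"])
```

The shared `node` survives the deletion of `fast_queue`, which is the weak ownership of aggregation. The partitions, by contrast, exist only inside their `Disk`.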
16GLUE Schema
2
- Approach to the information modeling of Grid
resources started in April 2002 by the DataTAG
and iVDGL projects
- Contributions from DataGrid, Globus, PPDG, GriPhyN
[Figure labels: DataGrid Schema (LDAP); Globus Schema (LDAP); GLUE Schema (UML); GLUE Schema (Relational), R-GMA; GLUE Schema (XML), GT MDS 4; GLUE Schema (LDAP), GT MDS 2]
17GLUE Schema: modeling guidelines
- Focus on the virtual abstraction given by the
Grid paradigm
- Virtual pool of resources
- Generalization
- capture common aspects for different entities
providing the same functionality (e.g., uniform
view over different batch services)
- Deal with both monitoring needs and discovery
needs
- Monitoring concerns those attributes that are
meaningful to describe the status of resources
(e.g., useful to detect fault situations)
- Discovery concerns those attributes that are
meaningful for locating resources on the basis of a
set of preferences/constraints (e.g., useful
during the matchmaking process)
18GLUE Computing resources: warm up
- What is the core offered functionality?
- Computing power
- What do I need to know in order to use it?
- Offered execution environment (e.g., OS type,
available software libraries)
- Offered Quality of Service (e.g., estimated
response time)
- Status (e.g., number of running jobs)
- Policy (e.g., max execution time, assigned CPUs)
- Access rights (e.g., can I use it?)
- Location (e.g., Uniform Resource Locator or URL)
19GLUE Computing resources: some more thoughts
about the service
- The computing power is typically offered by
cluster systems
- Requests are typically staged into queues for
efficient system usage
- Queue policies enable service differentiation
(e.g., dedicated CPUs vs. shared CPUs assignment,
differentiated max CPU time, differentiated queue
service strategy)
- A service has quality aspects
20GLUE Schema: an example
- Site A has 6 worker nodes (WNs)
- (3 brand new and fast, 3 old and slow)
- The farm is configured as follows
- a high-end queue mapped to the 3 fast WNs
- a slow queue mapped to the 3 slow WNs
- a background queue mapped to all 6 WNs (lower priority)
[Figure labels: Queue; Host - Fast; Host - Slow; Cluster]
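The Site A configuration can be written down as data to make the queue-to-host mapping explicit. This is a minimal sketch: the `Host` and `Queue` classes, the host names, and the numeric priorities are assumptions made for illustration (the slide only says that the background queue has lower priority).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    speed: str  # "fast" or "slow"


@dataclass
class Queue:
    name: str
    priority: int  # assumption: a larger number means higher priority
    hosts: list


# Six worker nodes: three fast, three slow (names are made up).
fast_hosts = [Host(f"wn-fast-{i}", "fast") for i in range(1, 4)]
slow_hosts = [Host(f"wn-slow-{i}", "slow") for i in range(1, 4)]
cluster = fast_hosts + slow_hosts

queues = [
    Queue("high-end", priority=10, hosts=fast_hosts),
    Queue("slow", priority=10, hosts=slow_hosts),
    Queue("background", priority=1, hosts=cluster),
]


def queues_for(host):
    """Return the names of the queues that can dispatch jobs to `host`."""
    return [q.name for q in queues if host in q.hosts]


print(queues_for(fast_hosts[0]))
print(queues_for(slow_hosts[0]))
```

Every host is reachable from two queues, so a description of the site has to keep queue policies separate from host characteristics.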
21GLUE Computing Element (NG)
- Possible evolution of GLUE Schema
- CE is a site cluster
- Queues are used to differentiate the service
- The service offers access and management of
available execution environments
- ServiceClass
- HighEnd, Slow, Background
- ExecutionEnvironment
- characteristics of node A, characteristics of node B
22GLUE Schema CE - example
23GLUE Storage resources: warm up
- What is the core offered functionality?
- Storage space usage
- What do I need to know in order to use it?
- Storage Service manager type (e.g., srmv2)
- Available data access protocols (e.g., gridftp,
rfio)
- Offered Quality of Service (e.g., availability,
reliability)
- State (e.g., available space)
- Policy (e.g., file lifetime, MaxFileSize)
- Access rights (e.g., can I use it?)
- Location (e.g., Uniform Resource Locator or URL)
24GLUE Storage Element
- Storage Element
- Refers to a group of services responsible for the
management of storage areas and access to them
- Storage resources contributed to a Grid system
can vary from simple disk servers to complex
massive storage systems
25GLUE Storage Space
- Storage Space: portion of a logical storage
extent that
- is assigned to Grid users (e.g., a VO, a group of
a VO)
- is associated with a directory of the underlying
file system (e.g., /permanent/CMS)
- has a set of policies (MaxFileSize, MinFileSize,
MaxData, MaxNumFiles, MaxPinDuration, Quota, ACL)
- has a state (available space, used space)
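A Storage Space therefore combines an assignment, a directory, policies, and state. The sketch below shows how a client might check a file against a subset of those policies. It is only an illustration: the `StorageSpace` class, its field names, the byte units, and the `accepts` check are assumptions, not the GLUE Schema definition.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class StorageSpace:
    """Hypothetical, simplified Storage Space (sizes in bytes)."""
    path: str             # directory of the underlying file system
    vo: str               # Grid users the space is assigned to
    min_file_size: int    # policy
    max_file_size: int    # policy
    available_space: int  # state
    used_space: int       # state

    def accepts(self, vo, file_size):
        """True if `vo` may store a file of `file_size` bytes here."""
        return (
            vo == self.vo
            and self.min_file_size <= file_size <= self.max_file_size
            and file_size <= self.available_space
        )


space = StorageSpace(
    path="/permanent/CMS",
    vo="CMS",
    min_file_size=1,
    max_file_size=2 * GIB,
    available_space=20 * GIB,
    used_space=5 * GIB,
)

print(space.accepts("CMS", 1 * GIB))
print(space.accepts("ATLAS", 1 * GIB))
print(space.accepts("CMS", 3 * GIB))
```

The policy values are static limits, whereas the state values change as files are written, which is why the two are modeled separately.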
26GLUE Storage Element
27Expressing relationships among Computing and
Storage Services
- A typical job execution request involves certain
properties for the computing element and for a
permanent storage area
- Site administrators may want to specify preferences on
which Storage Areas should be used by jobs
executed by certain computing elements
- Possible mount point information and a weight for
choosing among the different alternatives are
provided
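A binding between a computing element and a storage area can be pictured as a small record carrying the optional mount point and the weight. In the sketch below the `StorageBinding` class and the identifiers are hypothetical, and the rule that a larger weight means a stronger preference is an assumption that the slides do not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageBinding:
    """Hypothetical preference of a computing element for a storage area."""
    ce_id: str
    se_id: str
    mount_point: Optional[str]  # None when the area is not mounted on the CE
    weight: int                 # assumption: larger means more preferred


def preferred_storage(bindings, ce_id):
    """Return the binding with the largest weight for `ce_id`, or None."""
    candidates = [b for b in bindings if b.ce_id == ce_id]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b.weight)


bindings = [
    StorageBinding("ce01", "se-local", "/storage/cms", weight=10),
    StorageBinding("ce01", "se-remote", None, weight=3),
    StorageBinding("ce02", "se-remote", None, weight=5),
]

best = preferred_storage(bindings, "ce01")
print(best.se_id, best.mount_point)
```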
28Network Resources
9
- (not yet in the GLUE Schema)
- Definition of a network model that enables an
efficient and scalable way of representing the
communication capabilities between grid services
- Partitioning the Grid into Domains and limiting the
monitoring activity to the observation of
Domain-to-Domain paths
- Communication characteristics measured within the
boundaries of a single Domain (D1 or D2) are
negligible with respect to the same characteristics
measured between the boundaries of D1 and D2.
29Partitioning the Grid into Domains
- A Domain is a set of elements identified by URIs
(referred to in the model as Edge Services)
- Connectivity is a metric that reflects the
quality of communication through a link between
two Edge Services
- A Domain communicates with other Domains using
Network Services
- A Network Service offers a unidirectional
communication service between two Domains
- Each Domain has a Theodolite Service that gathers
Network Service related metrics towards other
Domains
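The scalability argument behind the Domain partitioning can be made concrete by counting unidirectional paths. The numbers below are made up for illustration; the only fact used is that n distinct endpoints have n * (n - 1) ordered (source, destination) pairs.

```python
def directed_paths(endpoints):
    """Number of ordered (source, destination) pairs among distinct endpoints."""
    return endpoints * (endpoints - 1)


edge_services = 1000  # hypothetical number of Edge Services in the Grid
domains = 20          # hypothetical number of Domains they are grouped into

per_service = directed_paths(edge_services)
per_domain = directed_paths(domains)

print(per_service)  # 999000 paths if every pair of Edge Services is observed
print(per_domain)   # 380 paths if only Domain-to-Domain paths are observed
```

Under these assumed figures the monitoring load drops by more than three orders of magnitude, at the cost of treating all Edge Services inside a Domain as equivalent.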
30GLUE Network Element
31GLUE Network an example scenario
32Common Information Model
8,10
- CIM: Common Information Model
- Conceptual view of the managed environment for IT
resources that attempts to unify and extend the
existing instrumentation and management standards
- Targeted at management of resources, where
management is defined as the active process of
monitoring, modifying, and making decisions about
a resource
- Maintained by the Distributed Management Task Force
(DMTF), a worldwide industry organization
- It uses UML Class Diagrams as a modeling language
33CIM related activities at GGF
- CIM Grid Schema WG (CGS WG)
- Started at GGF 5
- Goal: define CIM extensions for the Job
Submission Service Model, i.e.,
- managed objects and their relationships for
managing the execution and monitoring of batch
jobs in a grid environment
- Defined extensions will be submitted to DMTF for
inclusion in the official CIM standard
- Common Resource Model WG (CRM WG)
- BOF at GGF 7
- Goal: define CIM extensions to describe manageable
resources as OGSA services
34PART III
35Grid Monitoring Definition
- Measuring the activity of significant Grid
resource-related parameters to analyze usage,
behavior and performance of a Grid system, and to
detect and notify fault situations, contract
violations, and user-defined events
36Grid Monitoring Requirements
- dynamically partition resources and service usage
using three criteria: site ownership, operations
domain, and virtual organization accessibility
- collect data in order to enable retrospective
analysis
- deal with a large volume of data by carefully
introducing reduction mechanisms
- collect both fine-grained and coarse-grained
monitoring data
37Grid Monitoring Requirements
- help to detect fault situations and possibly
prevent them
- provide general visualization and analysis
functionalities
- rely on a common information model of the Grid
resources
- adopt interfaces and protocols that are standard
within the Grid community
- integrate with local monitoring systems, when
available
- track which machines are running the VO
applications, the status and behavior of each
machine, and the behavior of the software
38GridICE architectural view
10
[Figure: GridICE components: Presentation Service; Detection and Notification; Data Collector Service (new resources detector, scheduler, persistent storage); Information Service; Measurement Service]
39Measurement Service
- Service able to probe resources for certain
parameters
- Grid Service
- gatekeeper
- gsiftp
- gris
- workload-mgr
-
- VO view - aggregation
- number of total CPUs
- number of free CPUs
- number of running jobs
- number of waiting jobs
- SE free disk space
- Machine
- CPU load
- memory usage
- disk usage (per partition)
- network activity
- number of processes
- system uptime
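The VO view is an aggregation over individual measurements. A minimal sketch of that aggregation follows, assuming that each Computing Element publishes its counters together with the set of VOs it serves; the record layout and the numbers are hypothetical and are not GridICE's actual data format.

```python
def vo_view(ce_records, vo):
    """Sum the per-CE counters of every CE that serves the given VO."""
    counters = ("total_cpus", "free_cpus", "running_jobs", "waiting_jobs")
    view = dict.fromkeys(counters, 0)
    for record in ce_records:
        if vo in record["vos"]:
            for key in counters:
                view[key] += record[key]
    return view


# Made-up measurements for three Computing Elements.
ce_records = [
    {"ce": "ce01", "vos": {"cms", "atlas"}, "total_cpus": 40,
     "free_cpus": 10, "running_jobs": 30, "waiting_jobs": 5},
    {"ce": "ce02", "vos": {"cms"}, "total_cpus": 16,
     "free_cpus": 0, "running_jobs": 16, "waiting_jobs": 12},
    {"ce": "ce03", "vos": {"atlas"}, "total_cpus": 8,
     "free_cpus": 8, "running_jobs": 0, "waiting_jobs": 0},
]

cms_view = vo_view(ce_records, "cms")
print(cms_view)
```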
40Information Service
- Service that offers measured values to potential
consumers
- Relying on GIS (MDS 2.x)
- Decoupling wide-discoverable information from
periodical mo[...] but not all nodes have direct access
(writing) to it
- Using the GLUE Schema mapping to LDAP data model
7
6
41Grid Information Services overview
3
4
5
42Discovery with MDS 2
CLIENT
- Considering a specific customization (LCG
project)
- Hierarchical structure of aggregators
- Asynchronous
- The BDII (root of the information service tree)
contains ALL the information published by the
resource information services
BDII
GRIS
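Since MDS 2 exposes the information through LDAP, a discovery request reduces to an LDAP search filter sent to the BDII or to a GRIS. The sketch below only builds the filter string and does not contact any server. The attribute names `GlueCE` and `GlueCEAccessControlBaseRule` and the value `VO:cms` are assumed here to follow the LDAP mapping of the GLUE Schema and should be checked against the deployed schema; the helper functions are hypothetical.

```python
def escape_filter_value(value):
    """Escape the characters that are special in LDAP search filters."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def ldap_equals(attribute, value):
    """Build a single equality condition, e.g. (objectClass=GlueCE)."""
    return f"({attribute}={escape_filter_value(value)})"


def ldap_and(*conditions):
    """Combine conditions so that all of them must match."""
    return "(&" + "".join(conditions) + ")"


# Assumed attribute names; verify them against the published schema.
ce_filter = ldap_and(
    ldap_equals("objectClass", "GlueCE"),
    ldap_equals("GlueCEAccessControlBaseRule", "VO:cms"),
)
print(ce_filter)
```

The resulting string can be passed to any LDAP client as the search filter.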
43MDS2-based Information Service example
44Data Collector Service
- Service allowing the collection of historical
monitoring data
- It consists of three main components
- new resources detection service: scans the GIS in
order to detect the new sources of
monitoring data that should be observed
- scheduler: fires periodical observations whose
task is to discover which resource metrics are
being offered and to store the values read in the
persistent storage
- persistent storage
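The interplay of the three components can be sketched in a few lines. This is not GridICE code: the information service is replaced by a plain dictionary, the scheduler by two explicit calls, and the persistent storage by an in-memory SQLite table; all names are hypothetical.

```python
import sqlite3


def detect_new_resources(gis_snapshot, known):
    """Return resource IDs published in the snapshot but not yet observed."""
    return sorted(set(gis_snapshot) - known)


def observe(store, gis_snapshot, known, timestamp):
    """One scheduled observation: register new sources, then store metrics."""
    known.update(detect_new_resources(gis_snapshot, known))
    for resource in sorted(known):
        metrics = gis_snapshot.get(resource)
        if metrics is None:
            continue  # known resource that is not currently published
        for name, value in metrics.items():
            store.execute(
                "INSERT INTO samples VALUES (?, ?, ?, ?)",
                (timestamp, resource, name, value),
            )
    store.commit()


# Persistent storage, kept in memory for the sake of the example.
store = sqlite3.connect(":memory:")
store.execute(
    "CREATE TABLE samples (ts INTEGER, resource TEXT, metric TEXT, value REAL)"
)

known = set()
# A real scheduler would fire these observations periodically.
observe(store, {"ce01": {"free_cpus": 4}}, known, timestamp=0)
observe(store, {"ce01": {"free_cpus": 2}, "se01": {"free_gb": 120}},
        known, timestamp=60)

rows = store.execute(
    "SELECT ts, resource, metric, value FROM samples ORDER BY ts, resource"
).fetchall()
print(rows)
```

The second observation picks up `se01` without any manual configuration, which is the purpose of the new resources detection step.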
45Data Collector Service (2)
- From the viewpoint of observability of Grid
resources, we distinguish the following states
46Data Analyzer
- Service providing performance analysis, usage
level and general reports and statistics
- It can be configured to generate and send
periodical reports of the Grid activity, and also
of the Grid structure
47Detection and Notification Service
- Service providing a flexible and configurable
means for event detection and notification
actions.
- It provides for timely notification of the
configured events.
- When fault situations are detected, the right
person or group of persons must be notified.
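Configurable detection and notification can be pictured as a list of rules, each pairing a condition on the measured values with the people to notify. The sketch below is a simplification under stated assumptions: the `Rule` class, the metric names, the thresholds, and the addresses are hypothetical, and the notification is reduced to returning (rule, recipient) pairs instead of sending messages.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical event definition: a condition plus who to notify."""
    name: str
    condition: Callable[[dict], bool]
    recipients: list


def detect(rules, sample):
    """Return (rule name, recipient) pairs for every rule that fires."""
    return [
        (rule.name, recipient)
        for rule in rules
        if rule.condition(sample)
        for recipient in rule.recipients
    ]


rules = [
    Rule("disk-almost-full",
         lambda s: s["disk_usage"] > 0.9,
         ["site-admin@example.org"]),
    Rule("gatekeeper-down",
         lambda s: not s["gatekeeper_up"],
         ["site-admin@example.org", "vo-manager@example.org"]),
]

notifications = detect(rules, {"disk_usage": 0.95, "gatekeeper_up": True})
print(notifications)
```

Keeping the recipients inside each rule is one way of making sure that the right person or group is notified for each kind of event.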
48Presentation Service
- Service providing the user interface to the
monitoring information and control
- designed on a role-based strategy, providing
different views depending on the type of
consumer
- Physical view, whose target is the whole set of
grid resources being managed
- Virtual Organization view, whose target is the
whole set of grid resources that can be accessed
by users of a certain Virtual Organization
49Conclusion
- Information Modeling of Grid resources
- Characteristics of Grid systems require a shared
information model of resources to be used as a
base for the Information Service
- An important approach to the information modeling
of Grid resources has been presented
- Monitoring a Grid
- Requirements of Grid monitoring have been
presented
- GridICE has been presented as an example of a
monitoring tool for a Grid system
50REFERENCES
- [1] Németh Z., Sunderam V. Characterizing Grids:
Attributes, Definitions, and Formalisms, Journal
of Grid Computing, 2003, volume 1, number 1,
pages 9-23
- http://ipsapp009.kluweronline.com/IPS/frames/toc.aspx?J6160I1
- [2] GLUE Schema Official documents
- http://www.cnaf.infn.it/sergio/datatag/glue
- [3] Globus Toolkit Monitoring and Discovery
Service 2
- http://www.globus.org/mds/mds2/
- [4] Globus Toolkit Monitoring and Discovery
Service 4
- http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/info/WSMDSFacts.html
- [5] R-GMA Relational Grid Monitoring Service
- http://www.r-gma.org
- [6] S. Andreozzi, GLUE Schema implementation for
the LDAP model, INFN Technical report
- http://www.cnaf.infn.it/sergio/publications/Glue4LDAP.pdf
51REFERENCES
- [7] K. Czajkowski, S. Fitzgerald, I. Foster,
and C. Kesselman. Grid Information Services for
Distributed Resource Sharing. In Proceedings of the
10th IEEE International Symposium on
High-Performance Distributed Computing (HPDC-10)
- http://www.globus.org/research/papers.htmlMDS-HPDC
- [8] GGF CIM Grid Schema WG
- https://forge.gridforum.org/projects/cgs-wg/
- [9] S. Andreozzi, A. Ciuffoletti, A. Ghiselli, C.
Vistoli. Monitoring the Connectivity of a Grid.
In Proc. of the 2nd International Workshop
on Middleware for Grid Computing (MGC 2004) in
conjunction with the 5th ACM/IFIP/USENIX
International Middleware Conference, Toronto,
Canada, October 2004
- http://www.cnaf.infn.it/sergio/publications/MGC2004.pdf
- [10] GridICE Homepage
- http://grid.infn.it/gridice
- [11] Common Information Model (CIM)
- http://www.dmtf.org
52Q & A
- or Querying the MDS 2 or GridICE demo