Title: Experiment and Analysis towards effective cluster management system

1
Experiment and Analysis towards effective cluster
management system

Chokchai Box Leangsuksun (Louisiana Tech U)
Tirumala Rao Balumuri
Jie XU
Stephen Scott (Oak Ridge National Lab)
Richard Libby (Intel)

2
Introduction

Management poses several challenges in cluster
environments:
Performance, Reliability, Availability,
Serviceability
Typical two-phase process: monitor and control
Existing open source monitoring and management
standards lead the next phase of cluster
management

3
Related research

Ganglia and Clumon are widely used cluster
monitoring tools with similar architectures
and slightly varying functionality and
characteristics

Clumon Architecture
1-level Ganglia Architecture
4
Related research

Other monitoring frameworks like dproc, supermon
and bigbrother face similar critical issues in
cluster monitoring:
scalability and lack of management capability
Ganglia proposed the N-level architecture to meet
the challenges of massive clusters and grids while
maintaining low overhead
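
As a rough illustration of why a multi-level layout keeps overhead low, the sketch below has every level pass only a small summary upward instead of per-host data. It is a conceptual sketch only, not Ganglia's code; the `Node` class, the `load` metric and the `summary` method are names assumed for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A host (leaf) or an aggregation point (inner node) in the hierarchy."""
    name: str
    load: float = 0.0  # metric value, meaningful for leaf hosts only
    children: List["Node"] = field(default_factory=list)

    def summary(self):
        """Return (host_count, total_load) for the subtree rooted here."""
        if not self.children:
            return 1, self.load
        count, total = 0, 0.0
        for child in self.children:
            child_count, child_total = child.summary()
            count += child_count
            total += child_total
        return count, total


if __name__ == "__main__":
    cluster_a = Node("cluster-a", children=[Node("a1", 0.5), Node("a2", 0.25)])
    cluster_b = Node("cluster-b", children=[Node("b1", 0.75)])
    grid = Node("grid", children=[cluster_a, cluster_b])
    print(grid.summary())
```

Each aggregation point hands its parent two numbers regardless of how many hosts sit below it, which is the property that lets additional levels absorb growth.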

5
Related research

IPMI: Intelligent Platform Management Interface
IPMI defines the interface to communicate
sensor values and control hardware components.
Power up/down, reset, reading temperatures etc.
IPMI provides a vendor-independent interface with
monitoring and management capabilities to the
cluster environment
Open source projects: OpenIPMI, FreeIPMI,
dcpclient etc.

6
Implementation and Experimental

Studied existing monitoring tools like Ganglia
and Clumon, and the IPMI management framework
Prototyped an experimental management tool by
enhancing the existing monitoring tools, Ganglia
and Clumon, with hardware metrics (IPMI)
Benchmarked our prototypes and analyzed their
scalability against our requirements

7
Implementation and Experimental

Studied how enhancing Ganglia and Clumon with
hardware capability influences cluster
monitoring characteristics like scalability,
fault tolerance and resource usage
Considered what level of monitoring detail is
required

8
Implementation and Experimental
9
Experimental results and Analysis

9 IPMI-enabled Intel dual 1.2 GHz Xeon server
systems, each with 512 MB memory and a 100 Mbit/s
Ethernet port.
The cluster was built with OSCAR 3.0 and Red Hat
Linux 9.0
Resource overhead comparisons were made between
the enhanced Ganglia and Clumon

10
Screen samples of experiments
11
Experimental results and Analysis
In the Ganglia environment the CPU usage increased
at the rate of 0.026 for each node added to the
cluster
In Clumon the CPU usage increased at the rate of
0.03 for each node
12
Experimental results and Analysis
Fig 4.3. IPMI-Enabled Ganglia and Clumon Network
Traffic Comparison

In the Ganglia environment the network traffic
increases at the rate of 3.2 for every node
added to the cluster
In the Clumon environment the network traffic
increases at the rate of 4 for every node
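
The per-node rates reported for CPU usage and network traffic can be turned into a rough estimate for other cluster sizes. The sketch below assumes the growth stays linear beyond the 9 nodes that were measured, which the slides do not claim, and it carries the numbers without units because the slides do not state them. The names `RATES` and `added_overhead` are illustrative.

```python
# Per-node growth rates as reported on the slides (units not stated there).
RATES = {
    "ganglia": {"cpu": 0.026, "network": 3.2},
    "clumon": {"cpu": 0.03, "network": 4.0},
}


def added_overhead(tool: str, metric: str, extra_nodes: int) -> float:
    """Overhead added by `extra_nodes` more nodes, assuming linear growth."""
    if extra_nodes < 0:
        raise ValueError("extra_nodes must be non-negative")
    return RATES[tool][metric] * extra_nodes


if __name__ == "__main__":
    for tool in RATES:
        for metric in ("cpu", "network"):
            print(tool, metric, round(added_overhead(tool, metric, 100), 3))
```

Under this assumption the gap between the two tools widens with cluster size, since Clumon's per-node rate is higher for both metrics.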

13
Experimental results and Analysis

Studied a set of other IPMI management
capabilities in our effort to build management
functions into the monitoring tools
Measured a set of management operations such as
power on/off, reboot, sensor query, ID on/off and
SEL clear

Time taken to issue IPMI commands to a remote
node, collected with dpccli
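
A measurement of this kind can be approximated by timing a command-line invocation from the management host. The sketch below is not the harness used in the experiments. `time_command` is a hypothetical helper, and the stand-in command exists only so the example runs on a machine without any IPMI tooling.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run `argv` `repeats` times and return the mean wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True, capture_output=True)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the real
    # management CLI invocation for the node being measured.
    stand_in = [sys.executable, "-c", "pass"]
    print(f"mean response time: {time_command(stand_in):.4f} s")
```

Wall-clock timing like this includes process start-up and network latency, so it compares tools as an operator experiences them, not the raw speed of the protocol.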
14
Experimental results and Analysis
Results obtained from dpccli
15
Survey and experiments of IPMI tools
16
Experimental results and Analysis

IPMItool response time was close to OpenIPMI and
they have similar features
Both were better than dpccli in response time
FreeIPMI's ipmipower utility is much faster than
all of these tools, but in our tests it provided
a poor authentication layer

17
Effective management: Hardware perspective

Explored applying IPMI capabilities to the
cluster framework
IPMI PEF (Platform Event Filtering) support can
reduce the load of analyzing the large number of
events across the cluster
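
The idea of filtering events before they reach the analysis stage can be sketched as a lookup from event properties to an action, with unmatched events dropped. This is a conceptual analogue only and does not reproduce the IPMI event filter table layout; the `Event` fields, the `FILTERS` table and the action names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    node: str
    sensor: str    # e.g. "temperature", "fan" (illustrative names)
    severity: str  # "info", "warning" or "critical" (illustrative levels)


# Hypothetical filter table: (sensor, severity) -> action name.
FILTERS = {
    ("temperature", "critical"): "power_down",
    ("temperature", "warning"): "alert",
    ("fan", "critical"): "alert",
}


def match_filter(event: Event) -> Optional[str]:
    """Return the action for a matching filter, or None to drop the event."""
    return FILTERS.get((event.sensor, event.severity))


def filter_events(events):
    """Keep only the (event, action) pairs that some filter matched."""
    matched = []
    for event in events:
        action = match_filter(event)
        if action is not None:
            matched.append((event, action))
    return matched


if __name__ == "__main__":
    sample = [
        Event("node1", "temperature", "info"),
        Event("node2", "temperature", "critical"),
        Event("node3", "fan", "warning"),
    ]
    for event, action in filter_events(sample):
        print(event.node, action)
```

Only events that match a filter move on, so the central analysis sees a fraction of the raw event stream.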

18
Effective management: Hardware perspective
Our observations

IPMI control capabilities can be used to tweak
sensor, power and other IPMI component behavior
on each cluster node
The experimental results provided insight into
some techniques of ensuring effective control
Hardware events gathered at the cluster nodes can
be correlated to predict imminent failures
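
One simple form of such correlation is to flag a node when it logs several hardware events within a short time window. The sketch below shows only that heuristic, not the method used in this work; the window length, the threshold and the function name `nodes_at_risk` are assumptions.

```python
from collections import defaultdict, deque


def nodes_at_risk(events, window=60.0, threshold=3):
    """Flag nodes that log `threshold` or more events within `window` seconds.

    `events` is an iterable of (timestamp_seconds, node_name) pairs sorted by
    timestamp.
    """
    recent = defaultdict(deque)  # node -> timestamps still inside the window
    flagged = set()
    for timestamp, node in events:
        stamps = recent[node]
        stamps.append(timestamp)
        # Discard events that have fallen out of the window.
        while timestamp - stamps[0] > window:
            stamps.popleft()
        if len(stamps) >= threshold:
            flagged.add(node)
    return flagged


if __name__ == "__main__":
    log = [(0, "n1"), (10, "n1"), (20, "n2"), (30, "n1"), (200, "n2")]
    print(sorted(nodes_at_risk(log)))
```

A real predictor would also weigh event type and severity; counting alone is the smallest version of the idea.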

19
Summary: existing monitoring tools

Mature open source cluster monitoring tools exist
Current tools are not well integrated for
complete RAS management
They only present monitoring information, with no
interpretation
They do not assure quick detection of abnormalities
They provide no means for management (monitor only)

20
Intelligent Cluster Monitoring

Monitoring and Control Channel (Management Channel)
→ improve manageability
Local Management Agents
Central Manager
Distributed control, centralized intelligence
management
→ better fault handling

21
Central Manager Function Unit
22
Monitoring Agents
23
Cluster Management Protocol

SNMP → network resource management
CMIP → cluster resource management
Basic Commands
Get Request
Get Response
Set Request
Exec Request
Alert Response
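
The basic commands listed above could be modeled as message types exchanged between the central manager and a local management agent. The slide gives only the command names, so everything else in this sketch is an assumption: the `Message` fields, the `LocalAgent` class, and the choice to answer a Set Request with a Get Response echoing the new value.

```python
from dataclasses import dataclass
from enum import Enum


class Command(Enum):
    GET_REQUEST = "Get Request"
    GET_RESPONSE = "Get Response"
    SET_REQUEST = "Set Request"
    EXEC_REQUEST = "Exec Request"
    ALERT_RESPONSE = "Alert Response"


@dataclass
class Message:
    command: Command
    key: str = ""
    value: str = ""


class LocalAgent:
    """Hypothetical local management agent holding key/value settings."""

    def __init__(self):
        self.settings = {}

    def handle(self, msg: Message) -> Message:
        if msg.command is Command.GET_REQUEST:
            current = self.settings.get(msg.key, "")
            return Message(Command.GET_RESPONSE, msg.key, current)
        if msg.command is Command.SET_REQUEST:
            self.settings[msg.key] = msg.value
            # Assumption: acknowledge a set by echoing the stored value.
            return Message(Command.GET_RESPONSE, msg.key, msg.value)
        # Exec handling is out of scope here; report it back as an alert.
        return Message(Command.ALERT_RESPONSE, msg.key, "unsupported in sketch")


if __name__ == "__main__":
    agent = LocalAgent()
    agent.handle(Message(Command.SET_REQUEST, "fan_threshold", "3000"))
    reply = agent.handle(Message(Command.GET_REQUEST, "fan_threshold"))
    print(reply.command.value, reply.key, reply.value)
```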

24
Conclusion and Future work

We conducted our research from two directions:
the hardware aspect and the software aspect
Investigated how a popular hardware management
platform like IPMI can be incorporated into
existing cluster monitoring tools to provide
valuable hardware information
Proposed an intelligent management framework
Event-based correlation of hardware events
Policy-based hardware monitoring and notification
Studying the deviation patterns from the regular
pattern and cross correlation