Experiment and Analysis towards effective cluster management system

1
Experiment and Analysis towards effective cluster
management system
  • Chokchai Box Leangsuksun (Louisiana Tech U)
  • Tirumala Rao Balumuri
  • Jie XU
  • Stephen Scott (Oak Ridge National Lab)
  • Richard Libby (Intel)

2
Introduction
  • Management poses several challenges in cluster
    environments
  • Performance, Reliability, Availability,
    Serviceability
  • Typical two-phase process: monitor and control
  • Existing open source monitoring tools and
    management standards lead to the next phase of
    cluster management

3
Related research
  • Ganglia and Clumon are widely used cluster
    monitoring tools with similar architectures
    and slightly varying functionality and
    characteristics

Clumon Architecture
1-level Ganglia Architecture
4
Related research
  • Other monitoring frameworks such as dproc,
    supermon, and bigbrother share similar critical
    issues in cluster monitoring
  • Scalability, lack of management capability
  • Ganglia proposed the N-level architecture to meet
    the challenges of massive clusters and grids
    while maintaining low overhead

5
Related research
  • IPMI: Intelligent Platform Management Interface
  • IPMI defines the interface for communicating
    sensor values and controlling hardware components.
  • Power up/down, reset, reading temperatures, etc.
  • IPMI provides a vendor-independent interface with
    monitoring and management capabilities for cluster
    environments
  • Open source projects: openIPMI, freeIPMI,
    dcpclient, etc.
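
The operations listed on this slide can be illustrated with ipmitool, one of the open source IPMI tools surveyed later in the talk. This is a minimal sketch that only builds command lines and does not run them. The host name, user name, and password-file path are placeholders, and the choice of the `lan` interface is an assumption about the test setup, not something stated on the slides.

```python
# Sketch only: builds ipmitool command lines; it does not execute them.
# Host, user, and password-file values are placeholders, not from the slides.

def ipmitool_argv(host, user, password_file, *subcommand):
    """Return the argv list for one remote ipmitool call over the LAN interface."""
    return ["ipmitool", "-I", "lan", "-H", host, "-U", user,
            "-f", password_file, *subcommand]

# Operations named on the slide, mapped to ipmitool subcommands.
OPERATIONS = {
    "power_up": ("chassis", "power", "on"),
    "power_down": ("chassis", "power", "off"),
    "reset": ("chassis", "power", "reset"),
    "read_temperatures": ("sdr", "type", "Temperature"),
}

if __name__ == "__main__":
    for name, sub in OPERATIONS.items():
        argv = ipmitool_argv("node01", "admin", "/etc/ipmi.pass", *sub)
        print(name, "->", " ".join(argv))
```

Keeping the command construction separate from execution makes it easy to log or review what would be sent to each node before anything is powered off.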

6
Implementation and Experimental
  • Studied existing monitoring tools such as
    ganglia and clumon, and the IPMI management
    framework
  • Prototyped an experimental management tool by
    enhancing the existing monitoring tools, ganglia
    and clumon, with hardware metrics (IPMI)
  • Benchmark and scalability analysis of our
    prototypes against our requirements

7
Implementation and Experimental
  • Studied how enhancing ganglia and clumon with
    hardware capability influences cluster
    monitoring characteristics such as scalability,
    fault tolerance and resource usage
  • Considered the level of monitoring detail
    required

8
Implementation and Experimental
9
Experimental results and Analysis
  • 9 IPMI-enabled Intel dual Xeon 1.2 GHz server
    systems, each with 512 MB memory and a
    100 Mbit/s Ethernet port.
  • The cluster was built with OSCAR 3.0 and Red Hat
    Linux 9.0
  • Resource overhead comparisons were made between
    the enhanced ganglia and the enhanced clumon

10
Screen samples of experiments
11
Experimental results and Analysis
  • In the ganglia environment the CPU usage
    increased at the rate of 0.026 for each node
    added to the cluster
  • In clumon the CPU usage increased at the rate
    of 0.03 for each node
12
Experimental results and Analysis
Fig 4.3. IPMI-Enabled Ganglia and Clumon Network
Traffic Comparison
  • In the ganglia environment the network traffic
    increases at the rate of 3.2 for every node
    added to the cluster
  • In the clumon environment the network traffic
    increases at the rate of 4 for every node
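
The per-node rates reported on slides 11 and 12 describe linear growth, so the added overhead for a larger cluster can be estimated by simple multiplication. The sketch below assumes the linear trend continues beyond the 9 nodes measured, which the slides do not claim. The slides also do not state the units or the baseline at zero nodes, so the function returns only the added overhead in the same unspecified units.

```python
# Per-node growth rates as reported on the slides. Units are not stated in
# the slides, so results are in the same unspecified units.
CPU_RATE = {"ganglia": 0.026, "clumon": 0.03}
NET_RATE = {"ganglia": 3.2, "clumon": 4.0}

def added_overhead(rate_per_node, nodes_added):
    """Overhead added by `nodes_added` nodes, assuming linear growth."""
    return rate_per_node * nodes_added

if __name__ == "__main__":
    # 64 nodes is an arbitrary illustration size, not a measured cluster.
    for tool in ("ganglia", "clumon"):
        cpu = added_overhead(CPU_RATE[tool], 64)
        net = added_overhead(NET_RATE[tool], 64)
        print(f"{tool}: CPU +{cpu:.2f}, network +{net:.1f} for 64 added nodes")
```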

13
Experimental results and Analysis
  • Studied a set of other IPMI management
    capabilities in our effort to integrate a
    management system into the monitoring tools
  • Measured a set of management operations such as
    power on/off, reboot, sensor query, ID on/off and
    SEL clear

Time taken to issue IPMI commands to a remote node,
collected with dpccli
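
Response times like these can be collected by wrapping each management command in a wall-clock timer. The sketch below is not the authors' measurement harness. It times an arbitrary command line, and it uses a trivial Python subprocess as a stand-in so that it runs without IPMI hardware; on a real cluster the stand-in would be replaced by the management tool's command for one operation.

```python
import subprocess
import sys
import time

def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return timings

if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere (an assumption, not a
    # command from the slides).
    stand_in = [sys.executable, "-c", "pass"]
    runs = time_command(stand_in)
    print(f"mean {sum(runs) / len(runs):.3f} s over {len(runs)} runs")
```

Repeating each operation several times and reporting the mean helps smooth out network jitter, which matters when comparing tools whose response times are close.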
14
Experimental results and Analysis
Results obtained from dpccli
15
Survey and experiments of IPMI tools
16
Experimental results and Analysis
  • IPMItool response time was close to that of
    OpenIPMI, and they have similar features
  • Both are better than dpccli in response time
  • The Freeipmi ipmipower utility is much faster
    than all of these tools, but testing showed that
    it provides a poor authentication layer

17
Effective management: Hardware perspective
  • Explored applying IPMI capabilities to the
    cluster framework
  • IPMI PEF (Platform Event Filtering) support
    reduces the load of analyzing the large number
    of events around the cluster

18
Effective management: Hardware perspective
Our observations
  • The IPMI control capabilities can be used to
    tweak the sensor, power and any other IPMI
    component behavior for each cluster node
  • The experimental results provided insight into
    some techniques of ensuring effective control
  • Hardware events gathered at the cluster nodes can
    be correlated to predict imminent failures
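
One simple way to picture the correlation idea in the last bullet is a sliding-window count of hardware events per node. This is an illustrative stand-in, not the correlation method used by the authors. The window length, the threshold, and the event tuple layout are all assumptions introduced for the example.

```python
from collections import defaultdict, deque

def flag_suspect_nodes(events, window_s=600, threshold=3):
    """Return nodes with at least `threshold` events inside any `window_s` span.

    events: iterable of (timestamp_s, node, sensor) tuples, sorted by time.
    The tuple layout and default limits are hypothetical.
    """
    recent = defaultdict(deque)   # node -> timestamps still inside the window
    suspects = set()
    for ts, node, _sensor in events:
        window = recent[node]
        window.append(ts)
        # Drop events that have fallen out of the window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold:
            suspects.add(node)
    return suspects

if __name__ == "__main__":
    sample = [
        (0, "node01", "temperature"),
        (50, "node02", "temperature"),
        (100, "node01", "fan"),
        (200, "node01", "temperature"),
        (5000, "node02", "temperature"),
    ]
    print(flag_suspect_nodes(sample))
```

A node that logs several events in a short span is flagged for closer inspection, while isolated events spread far apart are ignored.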

19
Summary: existing monitoring tools
  • Mature open source cluster monitoring tools exist
  • Current tools are not well integrated for
    complete RAS management
  • They only present monitoring information, with no
    interpretation
  • They do not assure quick detection of
    abnormalities
  • They provide no means for management (monitor
    only)

20
Intelligent Cluster Monitoring
  • Monitoring Control Channel (Management Channel)
  • → improve manageability
  • Local Management Agents
  • Central Manager
  • Distributed control, centralized intelligence
    management
  • → better fault handling

21
Central Manager Function Unit
22
Monitoring Agents
23
Cluster Management Protocol
  • SNMP → network resource management
  • CMIP → cluster resource management
  • Basic Commands
  • Get Request
  • Get Response
  • Set Request
  • Exec Request
  • Alert Response
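
The slide gives only the names of the basic commands. To make the request and response pairing concrete, the sketch below models them as message types handled by a toy local agent. Everything beyond the five command names is hypothetical: the message fields, the agent class, the choice to answer a Set Request with a Get Response, and the alert check are assumptions for illustration, not the protocol's actual definition. Exec Request is listed but left unhandled because the slides do not describe its semantics.

```python
from dataclasses import dataclass
from enum import Enum

class Command(Enum):
    # The five basic commands named on the slide.
    GET_REQUEST = "Get Request"
    GET_RESPONSE = "Get Response"
    SET_REQUEST = "Set Request"
    EXEC_REQUEST = "Exec Request"
    ALERT_RESPONSE = "Alert Response"

@dataclass
class Message:
    # Hypothetical message layout; the slides do not define one.
    command: Command
    node: str
    key: str = ""
    value: object = None

class LocalAgent:
    """Toy local management agent holding a dict of metrics and settings."""

    def __init__(self, node, state):
        self.node = node
        self.state = dict(state)

    def handle(self, msg):
        if msg.command is Command.GET_REQUEST:
            return Message(Command.GET_RESPONSE, self.node, msg.key,
                           self.state.get(msg.key))
        if msg.command is Command.SET_REQUEST:
            self.state[msg.key] = msg.value
            # Assumption: a set is acknowledged by echoing the new value.
            return Message(Command.GET_RESPONSE, self.node, msg.key, msg.value)
        raise ValueError(f"unsupported command in this sketch: {msg.command.value}")

    def check_alert(self, key, limit):
        """Return an Alert Response if the metric exceeds `limit`, else None."""
        value = self.state.get(key)
        if value is not None and value > limit:
            return Message(Command.ALERT_RESPONSE, self.node, key, value)
        return None

if __name__ == "__main__":
    agent = LocalAgent("node01", {"cpu_temp_c": 48})
    print(agent.handle(Message(Command.GET_REQUEST, "node01", "cpu_temp_c")))
    print(agent.check_alert("cpu_temp_c", limit=45))
```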

24
Conclusion and Future work
  • We conducted our research from two directions:
    the hardware aspect and the software aspect
  • Investigated how a popular hardware management
    platform like IPMI can be incorporated into
    existing cluster monitoring tools to provide
    valuable hardware information
  • Proposed an intelligent management framework
  • Event-based correlation of hardware events
  • Policy-based hardware monitoring and notification
  • Studying the deviation patterns from the regular
    pattern and cross correlation
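
As a rough illustration of what detecting deviation from a regular pattern could look like, the sketch below flags sensor readings that lie far from a baseline mean, measured in standard deviations. This is one common technique chosen for the example and is not described in the slides. The baseline readings and the limit of 3 standard deviations are made-up values, and cross correlation between sensors is not covered.

```python
from statistics import mean, stdev

def deviations(baseline, samples, z_limit=3.0):
    """Return the samples lying more than z_limit standard deviations
    from the baseline mean. Needs at least two baseline readings."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return [s for s in samples if s != mu]
    return [s for s in samples if abs(s - mu) / sigma > z_limit]

if __name__ == "__main__":
    # Hypothetical temperature readings, for illustration only.
    regular = [40, 41, 39, 40, 42, 40, 41, 39]
    print(deviations(regular, [40, 43, 55]))
```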