Title: PowerPoint Presentation - MonALISA
1 An Agent-Based, Dynamic Service System to
Monitor, Control and Optimize Distributed
Systems
February 2006
Iosif Legrand, California Institute of Technology
2 The MonALISA Framework
- MonALISA is a Dynamic, Distributed Service System capable of collecting any type of information from different systems, analyzing it in near real time, and providing support for automated control decisions and global optimization of workflows in complex grid systems.
- The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing agent-based subsystems which are registered as dynamic services and are able to collaborate and cooperate in performing a wide range of monitoring tasks. These agents can analyze and process the information in a distributed way and provide optimization decisions in large-scale distributed applications.
3 MonALISA is a Dynamic, Distributed Service Architecture
- The framework is based on a hierarchical structure of loosely coupled agents acting as distributed services, which are independent, autonomous entities able to discover one another and to cooperate using a dynamic set of proxies or self-describing protocols.
- An agent-based architecture provides the ability to invest the system with increasing degrees of intelligence, to reduce complexity and make global systems manageable in real time. For an effective use of distributed resources, these services provide adaptability and self-organization.
4 MonALISA Service & Data Handling
[Diagram labels: Lookup Service (two instances); Discovery; Registration; MonALISA Service; Data Cache Service & DB; Data Stores (Postgres, MySQL); WEB Service (WSDL, SOAP); Client (other service), Web client; Client (other service), Java; Predicates & Agents; Applications; Configuration Control (SSL); data; Communications via the ML Proxy; User-defined loadable modules to write/send data]
5 The MonALISA Discovery System & Services
- Fully distributed system with no single point of failure
[Diagram layers, top to bottom:]
- Global Services or Clients: clients, HL services, repositories
- Proxies: dynamic load balancing, scalability, replication, security, AAA for clients
- MonALISA services (AGENTS): distributed system for gathering and analyzing information
- Network of JINI-LUSs (secure, public): distributed dynamic discovery, based on a lease mechanism and REN
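The lease mechanism named on this slide can be illustrated with a minimal sketch. It assumes a single in-memory registry. The class `LeaseRegistry`, its method names, and the attribute keys are hypothetical and are not the JINI or MonALISA API. A registration stays discoverable only while its lease is renewed, so a service that disappears drops out of the lookup results without any explicit cleanup.

```python
import time


class LeaseRegistry:
    """Toy lookup service: a registration is visible only while its lease is live."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock        # injectable, so expiry can be simulated
        self._expiry = {}         # service name -> time at which the lease ends
        self._attributes = {}     # service name -> descriptive attributes

    def register(self, name, attributes):
        """Register a service (or re-register it) and return the lease expiry time."""
        expires_at = self.clock() + self.lease_seconds
        self._expiry[name] = expires_at
        self._attributes[name] = dict(attributes)
        return expires_at

    def renew(self, name):
        """Extend a live lease; return False if the lease has already expired."""
        self._purge()
        if name not in self._expiry:
            return False
        self._expiry[name] = self.clock() + self.lease_seconds
        return True

    def discover(self, **wanted):
        """Return the names of live services whose attributes match all of `wanted`."""
        self._purge()
        return sorted(
            name for name, attrs in self._attributes.items()
            if all(attrs.get(key) == value for key, value in wanted.items())
        )

    def _purge(self):
        """Drop every registration whose lease has run out."""
        now = self.clock()
        for name in [n for n, t in self._expiry.items() if t <= now]:
            del self._expiry[name]
            del self._attributes[name]
```

A service would call `register` once and `renew` periodically. Clients call `discover(group="osg")` or similar to find the services that are currently alive.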
6 Monitoring Internet2 Backbone Network
- Test for a Land Speed Record
- 7 Gb/s in a single TCP stream from Geneva to
Caltech
7 The UltraLight Network
[Figure labels: BNL; ESnet IN/OUT]
8 Monitoring Network Topology, Latency, Routers
9 Monitoring the GLORIAD Ring
10 Monitoring Grid Sites, Running Jobs, Network Traffic, and Connectivity
[Figure panels: Jobs; Topology; Accounting]
11 Monitoring OSG Resources, Jobs, Accounting
[Figure panels: Running Jobs; Accounting]
- 42 sites
- 4,000 nodes (10,000 CPUs)
- Thousands of jobs
- 60,000 parameters
12 FTP Data Transfer between GRID Sites
[Figure: Total FTP Traffic per VO]
13 Bandwidth Challenge at SC2005
- 151 Gb/s
- 500 TB total in 4 h
14 End User / Client Agent: LISA (Localhost Information Service Agent)
- Authorization
- Service discovery
- Local detection of the hardware and software configuration
- Complete end-system monitoring: per-process load, I/O and network throughput, etc.
- End-to-end performance measurements
- Will act as an active listener for all events related to the requests generated by its local applications.
15 Host Monitoring at SC2005
- Many network problems are actually endhost problems: misconfigured or underpowered end-systems.
- The LISA application was designed to monitor the endhost and its view of the network.
- For SC05 we used LISA to gather the relevant host details related to network performance.
- Information on the system, TCP configuration and network device setup was gathered and made accessible from one site.
- Future plans are to coordinate this with LISA and deploy this as part of OSG. The Tier-2 centers are a primary target.
16 Available Bandwidth Measurements
- Embedded Pathload module.
17 Coordination Service for Available Bandwidth Measurements
- Enforces measurement fairness
- Avoids multiple probes on shared network segments
- Dynamic configuration of measurement timing
- Logs events
- Provides service redundancy by using a master-slave model
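As a rough illustration of the first, second and fourth points, the sketch below grants a probe only when none of its network segments is already being measured. It tries the least-served requester first and logs every decision. The class `ProbeCoordinator`, the segment names, and the fairness rule are assumptions made for this example. The slide does not describe the actual algorithm, and the master-slave redundancy is not modelled here.

```python
class ProbeCoordinator:
    """Toy coordinator: at most one active probe per network segment."""

    def __init__(self):
        self._busy = {}      # segment -> id of the probe currently using it
        self._served = {}    # requester -> number of probes granted so far
        self.log = []        # (event, probe_id, conflicting_segments)

    def request(self, probe_id, requester, segments):
        """Grant the probe only if none of its segments carries another probe."""
        conflicts = sorted(s for s in segments if s in self._busy)
        if conflicts:
            self.log.append(("denied", probe_id, tuple(conflicts)))
            return False
        for segment in segments:
            self._busy[segment] = probe_id
        self._served[requester] = self._served.get(requester, 0) + 1
        self.log.append(("granted", probe_id, ()))
        return True

    def release(self, probe_id):
        """Free every segment held by a finished probe."""
        for segment in [s for s, p in self._busy.items() if p == probe_id]:
            del self._busy[segment]
        self.log.append(("released", probe_id, ()))

    def pick(self, pending):
        """Try pending (probe_id, requester, segments) requests, least-served requester first."""
        ordered = sorted(pending, key=lambda r: self._served.get(r[1], 0))
        for probe_id, requester, segments in ordered:
            if self.request(probe_id, requester, segments):
                return probe_id
        return None
```

A denied requester would simply retry later. The measurement timing is left to the caller in this sketch.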
18 Monitoring the Execution of Jobs and the Time Evolution
[Figure labels: Split Jobs; Lifelines for Jobs; Submit a Job; DAG]
19 ApMon: Application Monitoring
- Library of APIs (C, C++, Java, Perl, Python) that can be used to send any information to MonALISA services
- Flexibility, dynamic configuration, high communication performance
- Automated system monitoring
- Accounting information
- ApMon configuration generated automatically by a servlet / CGI script
[Diagram labels: APPLICATION; ApMon; MonALISA Service; MonALISA hosts; Config Servlet; ApMon Config; dynamic reloading; System Monitoring; No lost packets]
Example parameters: load1 0.24, processes 97, pages_in 83
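To make the idea of an application pushing named parameters to a monitoring service concrete, here is a minimal sketch. It is not the ApMon API. The slide does not describe the wire format, so the JSON-in-a-UDP-datagram encoding, the function names, and the cluster and node names are all assumptions chosen for illustration. The parameter values are the ones shown on the slide.

```python
import json
import socket


def send_parameters(sock, destination, cluster, node, params):
    """Send one datagram carrying named parameter values; return the bytes sent."""
    message = {"cluster": cluster, "node": node, "params": params}
    return sock.sendto(json.dumps(message).encode("utf-8"), destination)


def receive_parameters(sock, bufsize=65535):
    """Receive one datagram and decode it back into a dictionary."""
    data, _sender = sock.recvfrom(bufsize)
    return json.loads(data.decode("utf-8"))


def demo():
    """Send the example values to a local stand-in for a monitoring service."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # the OS picks a free port
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        send_parameters(
            sender, receiver.getsockname(), "demo_cluster", "node01",
            {"load1": 0.24, "processes": 97, "pages_in": 83},
        )
        return receive_parameters(receiver)
    finally:
        sender.close()
        receiver.close()
```

Calling `demo()` returns the decoded message, so the receiving side can be checked without any external service. In a real deployment the destination list would come from the automatically generated configuration mentioned above.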
20 MonALISA Agents to Create on Demand an Optical Path or Tree
[Diagram labels: Discovery Secure Connection; ML Demon; Control and Monitor the switch; ML proxy services used in Agent Communication; steps numbered 1 to 4]
- Runs an ML Demon: >ml_path IP1 IP4; copy file IP4
- Time to create a path on demand: < 1 s, independent of the location and the number of connections
21 Monitoring and Controlling Optical Planes
[Figure panels: Controlling; Port power monitoring]
22 Monitoring Optical Switches: Agents to Create on Demand an Optical Path
23 Communities using MonALISA
- Major Communities
- OSG
- CMS
- ALICE
- D0
- STAR
- VRVS
- LGC RUSSIA
- SE Europe GRID
- APAC Grid
- UNAM Grid
- ABILENE
- ULTRALIGHT
- GLORIAD
- LHC Net
- RoEduNET
- Demonstrated at
- SC2003
- Telecom World 2003
- WSIS 2003
- SC 2004
- I2 2005
- TERENA 2005
- IGrid 2005
- SC 2005
- MonALISA
- Running 24 x 7 at 250 sites
- Collecting 250,000 parameters in near real-time
- Update rate of 25,000 parameter updates per second
- Monitoring
- 12,000 computers
- > 100 WAN links
- Thousands of Grid jobs running concurrently
[Figure labels: CMS-DC04; GRID3; VRVS; ALICE]
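The deployment figures quoted on this slide imply some simple averages. The short calculation below derives them, assuming (hypothetically) that parameters and updates are spread evenly across sites, which the slide does not claim.

```python
sites = 250
parameters = 250_000
updates_per_second = 25_000

# Averages under the even-spread assumption stated above.
params_per_site = parameters / sites                  # 1000.0 parameters per site
updates_per_site = updates_per_second / sites         # 100.0 updates per second per site
mean_refresh_seconds = parameters / updates_per_second  # 10.0 s between updates of one parameter
```

On these numbers, an average parameter would be refreshed about once every 10 seconds.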
24 The MonALISA Architecture Provides
- Distributed Registration and Discovery for Services and Applications.
- Monitoring all aspects of complex systems
- System information for computer nodes and clusters
- Network information: WAN and LAN
- Monitoring the performance of Applications, Jobs or services
- The End User Systems and their performance
- Video streaming
- Can interact with any other services to provide in near real-time customized information based on monitoring data
- Secure, remote administration for services and applications
- Agents to supervise applications, trigger alarms, restart or reconfigure them, and to notify other services when certain conditions are detected.
- The MonALISA framework is used to develop higher level decision services, implemented as a distributed network of communicating agents, to perform global optimization tasks.
- Graphical User Interfaces to visualize complex information
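The supervising agents described in this list can be sketched as a small rule: raise an alarm on each failed check, and restart the application and notify other services once failures repeat. The class `SupervisorAgent`, the callbacks, and the failure threshold are hypothetical choices for this example and do not reflect the MonALISA agent interface.

```python
class SupervisorAgent:
    """Toy agent: alarms on each failed check, restarts after repeated failures."""

    def __init__(self, restart, notify, max_failures=3):
        self.restart = restart            # callback: restart(app_name)
        self.notify = notify              # callback: notify(message)
        self.max_failures = max_failures
        self._failures = {}               # app name -> consecutive failed checks

    def observe(self, app, healthy):
        """Process one health check and return the action taken."""
        if healthy:
            self._failures[app] = 0
            return "ok"
        count = self._failures.get(app, 0) + 1
        self._failures[app] = count
        if count < self.max_failures:
            self.notify(f"alarm: {app} failed check {count}")
            return "alarm"
        # Threshold reached: restart, tell other services, and start counting afresh.
        self.restart(app)
        self.notify(f"restarted: {app} after {count} failed checks")
        self._failures[app] = 0
        return "restarted"
```

The health signal itself would come from the monitoring data. Here it is passed in as a boolean, so the decision logic stays independent of how it is measured.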