Title: Virtual Host PMP
1 - Virtual Host PMP
- MONALISA Monitoring Service
- Shawn McKee smckee_at_umich.edu
- Yang Xia yxia_at_hep.caltech.edu
- GGF8 Meeting 2003
2 End-to-End Performance Issues
- Estimating E2E performance requires knowledge of the end systems along each network path, together with the expected E2E performance of the path, taking into account detailed information about the end systems.
- What are the CPU, disks, interfaces and memory of the end hosts, and how do these perform?
- What is the network interface type, firmware, parameters and expected performance, both long-term and based upon recent results?
- System state is critical: what is the CPU load and estimated bus loading?
- What is the upcoming workload in the network, both locally and globally?
3 Adapting Existing Tools for E2E
- Many of the tools being developed for managing data-intensive Grid systems, such as MonALISA, have to address similar issues in monitoring and planning.
- We could use some of the same tools and higher-level services to give end-to-end performance estimates based on the above information and/or recent historical data.
4 Start at the Ends: Hosts
- We must enable data acquisition from the hosts involved.
- We need a system which can dynamically download a data-gathering application that runs on most systems.
- Java seems to be the most likely candidate:
- Pervasive
- Can be cryptographically signed
- Permissions can be fine grained
- Runs on MANY OSs
5 What to acquire from each host?
- Static Info
- GUID
- Operating system and version
- Processor details
- NIC info (firmware, brand, type)
- Memory info
- TCP stack parameters
- Dynamic Info
- Interrupts/sec
- CPU usage
- NIC bandwidth, errors, queue lengths
- Memory usage
- Bus usage
- Test Info
- IP address of the test destination.
- Trace route between host and destination.
- Ping test between host and destination.
- Iperf test between host and destination.
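The three groups of information above could be carried in one structured record per test run. The following is a minimal sketch only, written in Python for brevity (the slides propose Java for the actual applet). All class names, field names, units and example values are assumptions made for illustration; the slides do not define a record format.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class StaticInfo:
    """Details that rarely change between runs (hypothetical field names)."""
    guid: str
    os_name: str
    os_version: str
    processor: str
    nic: str                      # firmware, brand, type as free text
    memory_mb: int
    tcp_params: Dict[str, int] = field(default_factory=dict)


@dataclass
class DynamicInfo:
    """Values sampled while a test is running (hypothetical field names)."""
    interrupts_per_sec: float
    cpu_usage_pct: float
    nic_bandwidth_mbps: float
    nic_errors: int
    nic_queue_length: int
    memory_usage_pct: float
    bus_usage_pct: Optional[float] = None   # may not be measurable on every host


@dataclass
class PathTestInfo:
    """Results of the tests run towards one destination."""
    destination_ip: str
    traceroute_hops: List[str] = field(default_factory=list)
    ping_rtt_ms: Optional[float] = None
    iperf_mbps: Optional[float] = None


@dataclass
class HostReport:
    static: StaticInfo
    dynamic: DynamicInfo
    test: PathTestInfo

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses.
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values only; 192.0.2.x is an address range reserved for documentation.
report = HostReport(
    static=StaticInfo(str(uuid.uuid4()), "Linux", "example-version",
                      "example CPU", "example NIC", 512, {"rcv_buffer_bytes": 65536}),
    dynamic=DynamicInfo(1200.0, 35.0, 80.0, 0, 4, 60.0),
    test=PathTestInfo("192.0.2.10", ["192.0.2.1", "192.0.2.10"], 12.5, 88.0),
)
print(report.to_json())
```

Keeping the static, dynamic and test parts as separate types mirrors the grouping on the slide and lets the static part be stored once per host while the other two are recorded per test.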
6 Host Application
- Having a system accessible through the web and supporting Linux and Windows would give us the broadest initial coverage.
- First-time users connect to an E2E server or peer-to-peer system and download a signed Java applet to their host.
7 Starting the Application
- The user allows the applet to start (security signing).
- The applet starts, creates a GUID for this host and records, in a standard format, the stable host details. Each new invocation of the applet verifies that the stable information is still current.
- The applet can provide the GUID and host details to a registration server, with or without anonymization of identifying details.
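The start-up steps above (create a GUID once, re-check the stable details on each run, optionally anonymize before registering) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the applet's design: the record layout, the use of a SHA-256 fingerprint to detect changes, and the list of "identifying" fields are all hypothetical.

```python
import hashlib
import json
import uuid


def fingerprint(details: dict) -> str:
    """Hash the stable details so a later run can detect changes cheaply."""
    canonical = json.dumps(details, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def start(stored, current_details: dict) -> dict:
    """Return the record to keep for this host.

    `stored` is the record saved by a previous run, or None on first use.
    """
    current_fp = fingerprint(current_details)
    if stored is None:
        # First invocation: create the GUID and record the stable details.
        return {"guid": str(uuid.uuid4()),
                "details": current_details,
                "fingerprint": current_fp}
    if stored["fingerprint"] != current_fp:
        # Stable details changed (for example a new NIC): keep the GUID, refresh the rest.
        return {"guid": stored["guid"],
                "details": current_details,
                "fingerprint": current_fp}
    return stored  # nothing changed, the stored record is still current


def anonymize(record: dict, identifying=("hostname", "ip_address")) -> dict:
    """Copy of the record with identifying fields dropped before registration."""
    details = {k: v for k, v in record["details"].items() if k not in identifying}
    return {"guid": record["guid"], "details": details}


first = start(None, {"hostname": "h1", "os": "Linux", "memory_mb": 512})
upgraded = start(first, {"hostname": "h1", "os": "Linux", "memory_mb": 1024})
print(upgraded["guid"] == first["guid"], anonymize(upgraded)["details"])
```

The GUID survives hardware changes in this sketch so that a host's history stays linked; whether that is desirable is a policy choice the slides leave open.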
8 Path Testing
- The series of tests is run by a Java application and could be a defined set of measurements:
- Ping (reachability/RTT)
- Traceroute (both forward and backward)
- One-way loss (each direction)
- Iperf (bandwidth EACH way, measured simultaneously)
- Missing components are downloaded from servers.
- Bandwidth testing is done both ways simultaneously, to find duplex problems on the path.
- Dynamic host information is recorded at both ends during each sub-test.
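One way to picture the test series is a small runner that executes each sub-test in order and attaches host-state samples to its result. The sketch below is in Python and makes several simplifying assumptions: the sub-tests are passed in as plain callables (stubs here, not real ping or Iperf invocations), host state is sampled before and after each sub-test rather than continuously, and only the local end is covered. The function and key names are hypothetical.

```python
from typing import Callable, Dict, List


def run_series(tests: Dict[str, Callable[[], dict]],
               sample_host_state: Callable[[], dict]) -> List[dict]:
    """Run each sub-test in order and record host state around it."""
    results = []
    for name, test in tests.items():
        before = sample_host_state()
        try:
            outcome = {"ok": True, "data": test()}
        except Exception as exc:
            # A failed or missing tool should not abort the remaining sub-tests.
            outcome = {"ok": False, "error": str(exc)}
        after = sample_host_state()
        results.append({"test": name,
                        "host_before": before,
                        "host_after": after,
                        **outcome})
    return results


# Stub sub-tests standing in for the real measurements.
def fake_ping() -> dict:
    return {"rtt_ms": 12.5}


def failing_iperf() -> dict:
    raise RuntimeError("iperf not installed")


results = run_series({"ping": fake_ping, "iperf": failing_iperf},
                     lambda: {"cpu_usage_pct": 10.0})
for r in results:
    print(r["test"], r["ok"])
```

A failure such as the missing Iperf binary above is the point where a real implementation could download the missing component from a server, as the slide describes.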
9 Test Results
- Test results would be saved locally and a summary given to the user.
- A Java analysis applet could parse the info, looking for common problems.
- Results could be logged with a central service.
- Logged events could be further analyzed by central servers with access to current network details. Users could be referred to the most likely problem domain, with current contact information provided by the central server.
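The "looking for common problems" step could start as a handful of simple rules applied to a saved result. The Python sketch below is purely illustrative: the result keys, the 1% loss threshold and the 2x throughput ratio are assumptions chosen for the example, not values taken from the slides, and real diagnosis would need the wizard knowledge the talk refers to.

```python
def find_common_problems(result: dict,
                         loss_threshold_pct: float = 1.0,
                         asymmetry_ratio: float = 2.0) -> list:
    """Return human-readable findings for one saved test result."""
    problems = []
    if not result.get("ping_reachable", True):
        problems.append("destination unreachable")
    for direction in ("forward", "reverse"):
        loss = result.get(f"loss_{direction}_pct")
        if loss is not None and loss > loss_threshold_pct:
            problems.append(f"packet loss in {direction} direction: {loss:.1f}%")
    fwd = result.get("iperf_forward_mbps")
    rev = result.get("iperf_reverse_mbps")
    if fwd and rev:  # skip when either value is missing or zero
        ratio = max(fwd, rev) / min(fwd, rev)
        if ratio >= asymmetry_ratio:
            problems.append(
                "asymmetric simultaneous throughput (possible duplex mismatch)")
    return problems


saved = {"ping_reachable": True,
         "loss_forward_pct": 0.0,
         "loss_reverse_pct": 3.2,
         "iperf_forward_mbps": 90.0,
         "iperf_reverse_mbps": 8.0}
for finding in find_common_problems(saved):
    print(finding)
```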
10 Some Goals
- Put the wizard knowledge into the applets.
- Enable ordinary users to perform state-of-the-art testing.
- Provide a reference set of network testing applications by host type for users.
- Define a network measurements database for the network users community.
- Interoperate with PMP stations in the network.
- Instrument applications to automatically provide data to the system.
11 MonALISA
MONitoring Agents using a Large Integrated Services Architecture
http://monalisa.cacr.caltech.edu
Principal Author: Iosif Legrand, California Institute of Technology
12 MonALISA Design Considerations
- Act as a true dynamic service and provide the necessary functionality to be used by any other services that require such information (Jini, UDDI - WSDL / SOAP)
- - mechanism to dynamically discover all the "Farm Units" used by a community
- - remote event notification for changes in any system
- - lease mechanism for each registered unit
- Allow dynamic configuration, including the list of monitored parameters.
- Integrate existing monitoring tools (SNMP, LSF, Ganglia, Hawkeye)
- It provides:
- - single-farm values and details for each node
- - network aspect
- - real time information
- - historical data and extracted trend information
- - listener subscription / notification
- - (mobile) agent filters and alarm triggers
- - algorithms for prediction and decision-support
13 JINI Network Services
A Service registers with at least one Lookup Service, using the same ID for each. It provides information about its functionality and the URL address from which interested clients may get the dynamic code to use it. The Service must ask each Lookup Service for a lease and periodically renew it. If a Service fails to renew the lease, it is removed from the Lookup Service Directory. When problems are solved, it can re-register.
[Diagram: Publish the Interface jar (jar files served from Web Servers)]
The lease mechanism allows the Lookup Service to
keep an up to date directory of services and
correctly handle network problems.
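The register / renew / expire cycle described above can be illustrated with a toy directory. This Python sketch is not the Jini API: the class and method names are hypothetical, the lease length is arbitrary, and time is passed in explicitly as `now` so the behaviour is deterministic and easy to follow.

```python
class LookupService:
    """Toy directory that forgets services whose lease was not renewed."""

    def __init__(self, lease_seconds: float):
        self.lease_seconds = lease_seconds
        self._expiry = {}   # service_id -> time at which the lease runs out
        self._info = {}     # service_id -> description and code URL

    def register(self, service_id: str, info: dict, now: float) -> float:
        """Add (or re-add) a service and return when its lease expires."""
        self._info[service_id] = info
        self._expiry[service_id] = now + self.lease_seconds
        return self._expiry[service_id]

    def renew(self, service_id: str, now: float) -> bool:
        """Extend a live lease; an expired or unknown service must re-register."""
        if self._expiry.get(service_id, float("-inf")) <= now:
            self._drop(service_id)
            return False
        self._expiry[service_id] = now + self.lease_seconds
        return True

    def directory(self, now: float) -> dict:
        """Current view of the directory, after purging expired leases."""
        for sid in [s for s, t in self._expiry.items() if t <= now]:
            self._drop(sid)
        return dict(self._info)

    def _drop(self, service_id: str) -> None:
        self._expiry.pop(service_id, None)
        self._info.pop(service_id, None)


lookup = LookupService(lease_seconds=30)
lookup.register("farm-1", {"url": "http://example.org/farm-1.jar"}, now=0)
lookup.renew("farm-1", now=20)            # lease now runs until t=50
print(sorted(lookup.directory(now=49)))   # still listed
print(sorted(lookup.directory(now=50)))   # lease ran out, entry removed
```

Because a crashed or unreachable service simply stops renewing, the directory cleans itself up without the Lookup Service having to probe anyone.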
14 MonALISA Data Collection
[Diagram: Farm Monitor data collection. Labels: Dynamic Thread Pool; Configuration Control; Trap Listener; WEB Server; Dynamic loading of modules or agents]
- PULL: SNMP get and walk, rsh / ssh remote scripts, End-To-End measurements
- PUSH: snmp trap, from a Trap Agent (ucd snmp, perl)
- Other tools (Ganglia, MRT)
15 Data Collection Modules
- MonALISA is a monitoring framework
- SNMP (walk and get) for computing nodes, routers and switches
- Scripts and dedicated applications (programs) which may be invoked on remote systems
- Interface to Ganglia
- Interface to LSF and PBS
- Interface to Hawkeye (Wisconsin)
- Interface to LDAP (MDS) (Florida)
- Interface to IEPM-BW measurements
- Specialized modules for VRVS
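A framework with many interchangeable collection modules usually rests on one small common interface. The Python sketch below shows the general idea only; it is not MonALISA's module API, and the names `MonitoringModule`, `CallableModule`, `FarmMonitor`, `collect` and `poll` are hypothetical. The stand-in callables replace real SNMP queries or remote scripts.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple


class MonitoringModule(ABC):
    """Common interface each data collection module would implement."""
    name = "unnamed"

    @abstractmethod
    def collect(self, node: str) -> Dict[str, float]:
        """Return parameter name -> value for one node."""


class CallableModule(MonitoringModule):
    """Wraps any callable, standing in for an SNMP query or a remote script."""

    def __init__(self, name: str, func: Callable[[str], Dict[str, float]]):
        self.name = name
        self._func = func

    def collect(self, node: str) -> Dict[str, float]:
        return self._func(node)


class FarmMonitor:
    """Polls every loaded module for every node (the PULL style)."""

    def __init__(self) -> None:
        self.modules: List[MonitoringModule] = []

    def load(self, module: MonitoringModule) -> None:
        # Modules can be added while the monitor is running.
        self.modules.append(module)

    def poll(self, nodes: List[str]) -> List[Tuple[str, str, str, float]]:
        rows = []
        for node in nodes:
            for module in self.modules:
                for param, value in module.collect(node).items():
                    rows.append((node, module.name, param, value))
        return rows


monitor = FarmMonitor()
monitor.load(CallableModule("snmp", lambda node: {"if_in_mbps": 42.0}))
monitor.load(CallableModule("script", lambda node: {"load1": 0.5}))
for row in monitor.poll(["node-a", "node-b"]):
    print(row)
```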
16 Global Client / Dynamic Discovery
17 Global Views: CPU, IO, Disk, Internet Traffic
18 Regional Centers Discovery Data access
19 Access to historical and real-time values
Past values are presented first; the GUI then remains a registered listener, and new values are added as they arrive.
Real Time Histograms for various parameters
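The combination of historical and real-time access described above amounts to replaying stored values to a new listener and then keeping it subscribed. A minimal Python sketch of that pattern follows; the class and method names are hypothetical and the sketch ignores threading, persistence and remote calls.

```python
class ParameterFeed:
    """Keeps the history of one parameter and notifies registered listeners."""

    def __init__(self):
        self._history = []     # list of (timestamp, value)
        self._listeners = []

    def publish(self, timestamp, value):
        """Store a new value and push it to every registered listener."""
        self._history.append((timestamp, value))
        for listener in list(self._listeners):
            listener(timestamp, value)

    def subscribe(self, listener, since=None):
        """Replay past values (optionally from `since`), then stay registered."""
        for timestamp, value in self._history:
            if since is None or timestamp >= since:
                listener(timestamp, value)
        self._listeners.append(listener)

    def unsubscribe(self, listener):
        self._listeners.remove(listener)


feed = ParameterFeed()
feed.publish(1, 0.2)
feed.publish(2, 0.4)

seen = []


def on_value(timestamp, value):
    seen.append((timestamp, value))


feed.subscribe(on_value)   # receives the two past values immediately
feed.publish(3, 0.6)       # then receives new values as they arrive
print(seen)
```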
20 Monitoring VRVS Reflectors
21 SUMMARY
- MonALISA is able to dynamically discover all the "Farm Units" used by a community and, through the remote event notification mechanism, keeps an up-to-date state for the entire system.
- Automatic secure code update (services and clients).
- Dynamic configuration for farms / network elements. Secure Admin interface.
- Access to aggregate farm values and all the details for each node.
- Selected real-time / historical data for any subscribed listeners.
- Active filter agents to process the data and provide dedicated / customized information to other services or clients.
- Dynamic proxies and WSDL pages for services.
- Embedded SQL database; can also work with any relational DB. Accepts multiple customized Data Writers (e.g. to LDAP) as dynamically loadable modules.
- Embedded SNMP support and interfaces with other tools (LSF, Ganglia, Hawkeye). Easy to develop user-defined modules to collect data.
- Dedicated pseudo-clients for repository or decision-making units.
- It has proved to be a stable and reliable service.