Title: Local Monitoring Module LMM
1. Local Monitoring Module (LMM)
- Author: Anna Bekkerman
- abekkerm_at_ecs.umass.edu
2. Managing LMM's Setup
- When LMM is started, the following components are created
  - LocalServer and Sender
    - Functionality: communicate with the RAPIDS server
  - CurrentSetup
    - Functionality: contains parameters of the current monitoring setup (metrics, processes, update rate, etc.)
  - DBManager
    - Functionality: concurrent access to the monitoring setup
[Diagram: LMM with its components CurrentSetup, DBManager, LocalServer, and Sender. Labels: "Sets up parameters", "Requests current parameters", "commands", "signals, events", "Sends signals, events".]
3. Managing LMM's Setup
- Requirement: provide control over the level of monitoring intrusiveness
- Solution: dynamic modification of the monitoring setup
- Design
  - During the experiment, the user sends setup commands
  - At the beginning of each collection session, LMM requests the current setup parameters
  - DBManager handles setup modification commands as well as LMM's requests
- Dynamic setup modification has not been implemented in the current version of RAPIDS
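The design above can be illustrated with a minimal sketch. It is not the RAPIDS implementation (the slides note that dynamic modification is not implemented yet); the field names in `CurrentSetup` and the methods `apply_command` and `snapshot` are hypothetical. It only shows one way a DBManager could serialize setup commands and LMM's requests with a lock.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class CurrentSetup:
    """Parameters of the current monitoring setup (field names are illustrative)."""
    metrics: set = field(default_factory=lambda: {"cpu", "memory"})
    processes: set = field(default_factory=set)
    update_rate_s: float = 5.0


class DBManager:
    """Serializes access to CurrentSetup between the command handler and LMM."""

    def __init__(self, setup: CurrentSetup):
        self._setup = setup
        self._lock = threading.Lock()

    def apply_command(self, **changes):
        """Apply a setup modification command, e.g. update_rate_s=1.0."""
        with self._lock:
            for name, value in changes.items():
                if not hasattr(self._setup, name):
                    raise KeyError(f"unknown setup parameter: {name}")
                setattr(self._setup, name, value)

    def snapshot(self) -> CurrentSetup:
        """Return a copy so a collection session sees one consistent setup."""
        with self._lock:
            return CurrentSetup(
                metrics=set(self._setup.metrics),
                processes=set(self._setup.processes),
                update_rate_s=self._setup.update_rate_s,
            )
```

A collection session would call `snapshot()` once at its start, so a command arriving mid-session only takes effect in the next session.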
4. Start-up Procedure
- Start network control application (if needed)
- Initialize RAPIDS Message Queue (RMQ)
- Start heartbeat application
- Launch processes
5. Network Control
- Requirement: simulate packet drop rates and transit delays on links between radar nodes and the SOCC
- Solution: use iptables to forward packets to a user-land application that will delay or drop them
[Diagram: three radar nodes linked to the SOCC, which contains nodes SOCC1 and SOCC2.]
6. Network Control Implementation
- Configure iptables to use the QUEUE target
  - /etc/sysconfig/iptables on emmy5.casa.umass.edu:
      -A INPUT -s 128.119.245.36 -j QUEUE
  - All packets coming from emmy2 will be queued!
[Diagram: three radar nodes and the SOCC (SOCC1, SOCC2), with the hosts emmy2 (emmy2.casa.umass.edu) and emmy5 labeled.]
7. Network Control Implementation
- Start iptables
    > /sbin/service iptables start
    Applying iptables firewall rules: [ OK ]
- Load the ip_queue module, which forwards packets to user space
    > /sbin/modprobe ip_queue
    > /sbin/lsmod
    Module      Size    Used by
    ip_queue    14553   0
- To unload the ip_queue module:
    > /sbin/modprobe -r ip_queue
- To stop iptables:
    > /sbin/service iptables stop
- These commands must be executed as root!
8. Network Control Implementation
- When LMM is started, it launches the simuwan_usr application, which processes the forwarded packets
- Problem: simuwan_usr must be started by root
- Solution: use the sudo utility to start the application
  - The utility runs commands with the root user's privileges
9. How to Set Up the sudo Utility
- On machine X, edit the sudoers file (as root)
    > /usr/sbin/visudo
- Specify the commands that should be executed with root privileges
    someLogin X = NOPASSWD: /usr/share/rapids/bin/simuwan_usr, /usr/bin/kill
- Now, while running on X as someLogin, LMM should be able to start/stop simuwan_usr
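With a sudoers entry like the one above, LMM could start and stop simuwan_usr without a password prompt. The sketch below only builds the argument lists; how LMM actually launches the process is not described in the slides, so the function names are hypothetical, and any arguments simuwan_usr itself may require are omitted. The `-n` flag asks sudo to fail rather than prompt if the NOPASSWD rule does not apply.

```python
SIMUWAN = "/usr/share/rapids/bin/simuwan_usr"  # path from the sudoers entry
KILL = "/usr/bin/kill"                         # second command allowed in sudoers


def start_command() -> list:
    """argv for starting simuwan_usr as root without a password prompt."""
    return ["sudo", "-n", SIMUWAN]


def stop_command(pid: int) -> list:
    """argv for stopping a running simuwan_usr process by its pid."""
    if pid <= 1:
        raise ValueError("refusing to signal pid <= 1")
    return ["sudo", "-n", KILL, str(pid)]
```

These lists could be passed to `subprocess.Popen` and `subprocess.run` respectively; using full paths matters because sudoers matches commands by their absolute path.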
10. More on the simuwan_usr Application
- Two types of action can be applied to the packets
  - Delay: for either a constant or a variable amount of time
  - Drop: according to the specified drop rate
  - Packets that are not dropped can still be delayed
- Uses GLib's event loop to process forwarded packets
  - Each packet is an event
  - Each event should be assigned a verdict: ACCEPT or REJECT
  - Each event can be assigned a timeout before it is dispatched
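The per-packet decision described above can be sketched as follows. This is an illustration, not simuwan_usr's code: the `LinkPolicy` fields, the uniform distribution for variable delays, and the `decide` function are assumptions. The real application would hand the delay to the GLib event loop as a timeout before issuing the verdict.

```python
import random
from dataclasses import dataclass


@dataclass
class LinkPolicy:
    drop_rate: float = 0.0    # probability in [0, 1] of dropping a packet
    delay_min_s: float = 0.0  # lower bound of the delay in seconds
    delay_max_s: float = 0.0  # set equal to delay_min_s for a constant delay


def decide(policy: LinkPolicy, rng: random.Random) -> tuple:
    """Return (verdict, delay_seconds) for one queued packet."""
    if rng.random() < policy.drop_rate:
        return ("REJECT", 0.0)
    # Packets that survive the drop test may still be delayed.
    delay = rng.uniform(policy.delay_min_s, policy.delay_max_s)
    return ("ACCEPT", delay)
```

Passing the random generator in explicitly makes a simulated link reproducible: the same seed yields the same sequence of drops and delays.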
11. RAPIDS Message Queue (RMQ)
- RMQ employs Unix message queues to store Messaging and Application events
- Events are generated
  - Through wrapped library function calls
  - Using the RAPIDS API
- RMQManager
  - Creates/removes RMQ
  - Periodically retrieves events and prepares them for sending to the RAPIDS server
[Diagram: applications place events into the RMQ through function calls; within LMM, RMQManager retrieves events from the RMQ and passes them on to the server.]
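The periodic retrieval step can be sketched as a non-blocking drain of the queue. This is a simplified stand-in: Python's in-process `queue.Queue` replaces the Unix message queue that RMQ actually uses, and the `max_batch` limit and method name are hypothetical.

```python
import queue


class RMQManager:
    """Drains pending events in batches; queue.Queue stands in for the Unix message queue."""

    def __init__(self, max_batch: int = 100):
        self.rmq = queue.Queue()    # created here, mirroring "creates RMQ"
        self.max_batch = max_batch  # cap on events taken per retrieval

    def retrieve_events(self) -> list:
        """Remove up to max_batch pending events without blocking."""
        batch = []
        while len(batch) < self.max_batch:
            try:
                batch.append(self.rmq.get_nowait())
            except queue.Empty:
                break  # nothing left to retrieve in this period
        return batch
```

Capping the batch size keeps one retrieval from running indefinitely when applications generate events faster than they are collected.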
12. Network Monitoring
- Requirement: monitor the status of links between the SOCC and radar nodes
- Solution: send "I'm alive" messages from the radars to the SOCC
- Drawback: false alarms
13. Network Monitoring Implementation
- The heartbeat_socc application is started on the SOCC node
  - If there is more than one node in the SOCC, the first one specified in the configuration file is chosen
- heartbeat_sensor applications are started on the rest of the nodes
- SOCC periodically pings the nodes
[Diagram: the SOCC (SOCC1, SOCC2) runs heartbeat_socc; the radar nodes run heartbeat_sensor.]
14. Network Monitoring Implementation
- If node X replies, SOCC generates the Variable event X=true
- If node X does not reply, SOCC generates the Variable event X=false
- When the RAPIDS server receives a false event for node X, it reports a failure of the connection between SOCC and X
[Diagram: heartbeat_socc on the SOCC (SOCC1, SOCC2) and heartbeat_sensor on the radar nodes; the resulting Variable event goes into the RMQ, and RMQManager in LMM forwards it to the server.]
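The event logic above can be sketched in a few lines. The function names and the `ping` callable are hypothetical; the actual probing mechanism of heartbeat_socc is not described in the slides, so the probe is passed in as a parameter.

```python
from typing import Callable, Dict, Iterable


def poll_nodes(nodes: Iterable[str], ping: Callable[[str], bool]) -> Dict[str, bool]:
    """Probe each node once and return one Variable event per node (name -> replied)."""
    events = {}
    for node in nodes:
        try:
            events[node] = bool(ping(node))
        except OSError:
            # A network error while probing is treated as "no reply".
            events[node] = False
    return events


def failed_links(events: Dict[str, bool], socc: str = "SOCC") -> list:
    """Server side: report a failed connection for every false event."""
    return [(socc, node) for node, alive in events.items() if not alive]
```

A single missed reply produces a false event immediately, which is one source of the false alarms noted earlier; requiring several consecutive misses would be a possible refinement.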
15. Launching Processes
- The user provides commands to start/stop processes in the configuration file
- The RAPIDS server sends these commands to LMMs while setting them up
- LMM writes the commands to a script
  - Scripts are created in the home directory and deleted at the end of the experiment
  - The script name is start_commands/stop_commands followed by the sequence number of the node where the script is executed
    - SOCC nodes have 1-digit sequence numbers: 1, 2, 3
    - Sensor nodes have 3-digit sequence numbers: 100, 101, 102
- The starting script is executed at the beginning of the experiment
- The stopping script is executed when LMM receives the stop signal
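The naming rule can be sketched as below. The exact file name format is an assumption (the sequence number is appended directly, with no separator or extension), as are the function names and the `#!/bin/sh` header of the generated script.

```python
def script_name(kind: str, sequence_number: int) -> str:
    """Build e.g. 'start_commands1' or 'stop_commands100' (exact format assumed)."""
    if kind not in ("start", "stop"):
        raise ValueError("kind must be 'start' or 'stop'")
    return f"{kind}_commands{sequence_number}"


def node_kind(sequence_number: int) -> str:
    """1-digit sequence numbers are SOCC nodes, 3-digit ones are sensor nodes."""
    digits = len(str(sequence_number))
    if digits == 1:
        return "socc"
    if digits == 3:
        return "sensor"
    raise ValueError(f"unexpected sequence number: {sequence_number}")


def render_script(commands: list) -> str:
    """Join the user's commands into the text of a shell script, one per line."""
    return "#!/bin/sh\n" + "".join(cmd + "\n" for cmd in commands)
```

Because the sequence number is part of the name, scripts for several nodes can coexist in a home directory shared between machines.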
16. Collection Session Class Diagram
[Class diagram: CollectionSession, RMQManager, DBManager, SyncBuffer, and Bucket with SystemBucket, ProcessBucket, and EventBucket. Method shown: void start(). Relationship labels: "executes", "creates", "starts".]
17. Collection Session Algorithm
- Create three Buckets for storing system metrics, process metrics, and events
- Generate commands using CommandProvider
  - CommandProvider requests the current set of monitored metrics from DBManager
  - Depending on the current setup, a different set of commands is generated
18. Collection Session Algorithm
- Run CommandExecutor
  - System/process metrics
    - Each command reads the current value of a certain metric
      - For example: CPU utilization, workload, etc.
    - The command writes metric values to a bucket
  - Events
    - RMQManager inserts events into a SyncBuffer
    - A special EventCatcher command retrieves events from the buffer and puts them into a bucket
- Send events/metrics to the RAPIDS server using Sender
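One collection session, as described on the two slides above, can be sketched as follows. This is a simplified illustration, not the RAPIDS code: the function names, the `readers` mapping, and the use of a plain list as the SyncBuffer are assumptions, and process metrics are omitted for brevity, so the process bucket stays empty.

```python
from typing import Callable, Dict, List


class Bucket:
    """Holds the values collected during one session."""

    def __init__(self):
        self.items: List[tuple] = []

    def store(self, name, value):
        self.items.append((name, value))


def generate_commands(monitored: set, readers: Dict[str, Callable[[], float]]) -> list:
    """CommandProvider step: keep only readers for metrics in the current setup."""
    return [(name, readers[name]) for name in sorted(monitored) if name in readers]


def run_session(monitored: set, readers: Dict[str, Callable[[], float]],
                sync_buffer: list) -> dict:
    """Create the buckets, run the metric commands, then drain buffered events."""
    buckets = {"system": Bucket(), "process": Bucket(), "event": Bucket()}
    for name, read in generate_commands(monitored, readers):
        buckets["system"].store(name, read())
    while sync_buffer:  # EventCatcher step: move events from the buffer to a bucket
        buckets["event"].store("event", sync_buffer.pop(0))
    return buckets
```

The returned buckets are what a Sender would then transmit to the server. Because the command list is rebuilt from the monitored set at every session, a changed setup takes effect at the next session without restarting LMM.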
19. Commands Class Diagram
[Class diagram: Command (field: Bucket bucket; methods: void store(), void start()) with the classes CPUUsage, EventsCatcher, Workload, MemoryUsage, and Ps.]
20. Commands Implementation
- CPUUsage
  - Reads values from /proc/stat
- MemoryUsage
  - Reads values from /proc/meminfo
- Workload
  - Reads values from /proc/loadavg
- Ps
  - Looks through all process subdirectories in /proc
  - Reads the filename of the process from /proc/pid/stat
  - Stores information about processes whose names were provided in the RAPIDS configuration file
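The /proc files listed above have simple text formats. The sketch below parses their contents from strings, so it can be tried without a Linux machine. The function names are hypothetical, and which values RAPIDS actually derives from each file is not stated in the slides; the CPU formula shown is one common choice.

```python
def parse_loadavg(text: str) -> tuple:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo key to its numeric value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key.strip()] = int(rest.split()[0])
    return info


def cpu_busy_fraction(before: str, after: str) -> float:
    """Busy share between two samples of the aggregate 'cpu' line of /proc/stat."""
    def totals(line):
        fields = [int(x) for x in line.split()[1:]]
        return sum(fields), fields[3]  # the fourth counter is idle time
    total0, idle0 = totals(before)
    total1, idle1 = totals(after)
    elapsed = total1 - total0
    return 0.0 if elapsed == 0 else 1.0 - (idle1 - idle0) / elapsed


def parse_stat_name(text: str) -> str:
    """Extract the executable name, shown in parentheses in /proc/<pid>/stat."""
    return text[text.index("(") + 1 : text.rindex(")")]
```

CPU usage needs two samples because /proc/stat holds counters accumulated since boot. `parse_stat_name` searches for the last closing parenthesis because a process name may itself contain parentheses.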