Title: Network Monitoring
1Network Monitoring
- J. Won-Ki Hong
- Dept. of Computer Science and Engineering
- POSTECH
- Tel 054-279-2244
- Email jwkhong_at_postech.ac.kr
2Table of Contents
- Introduction
- Monitored Types of Information
- Network Monitoring Configurations
- Network Monitoring Methods
- Performance Monitoring
- Performance Indicators
- Performance Monitoring Functions
- Fault Monitoring
- Problems of Fault Monitoring
- Fault Monitoring Functions
- Accounting Monitoring
3Introduction
- Network monitoring is concerned with observing and analyzing the status and behavior of the end systems, intermediate systems, and subnetworks that make up the network to be managed
- Issues in network monitoring
  - what to monitor?
    - define what is to be monitored
  - how to monitor?
    - how to obtain information from managed resources
  - what to do with the monitored information?
    - how the monitored information is used in various management functional areas
4Monitored Types of Information
- Static information
  - hardly changes
  - current configuration information
  - e.g., the number and identification of ports on a router
- Dynamic information
  - changes frequently
  - information related to events in the network
  - e.g., change of state, transmission/reception of packets
- Statistical information
  - derived from dynamic information
  - e.g., average number of packets transmitted per unit time
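A minimal sketch of how statistical information can be derived from dynamic information, using the slide's own example (average number of packets transmitted per unit time). The `CounterSample` type, the function name, and the sample readings are assumptions made for illustration, not part of any particular management system.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of a dynamic value: a cumulative packet counter."""
    timestamp_s: float  # when the counter was read, in seconds
    packets_sent: int   # cumulative packets transmitted so far


def average_packet_rate(samples: list[CounterSample]) -> float:
    """Average packets transmitted per second over the sampled interval."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    first, last = samples[0], samples[-1]
    elapsed = last.timestamp_s - first.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return (last.packets_sent - first.packets_sent) / elapsed


# Hypothetical readings taken 10 seconds apart.
samples = [
    CounterSample(0.0, 1_000),
    CounterSample(10.0, 1_450),
    CounterSample(20.0, 2_200),
]
rate = average_packet_rate(samples)
print(f"average rate: {rate} packets/s")
```

The counter readings are the dynamic information; the rate is the statistical value computed from them.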
5Organization of a Management Information Base
[Figure: layered organization of the Management Information Base (MIB)]
- Statistical data base (e.g., Call_Blocked, Packet_Loss, Time_Delay, Throughput), obtained by abstraction of state and event variables
- Dynamic data base (State_Variable, Event_Variable), filled by sensor activation and data collection
- Static data base, comprising the sensor data base and the configuration data base
- Other labels in the figure: Status_Sensor, Derived_Status_Sensor, Event_Sensor, Switch_server, Buffer, Source, Server, Station_Info, Switch_Buffer, Switch_Source
6Monitoring System Components
- monitoring application
  - includes the functions of monitoring that are visible to the user
  - e.g., performance, fault, accounting
- manager function
  - performs the basic monitoring function of retrieving information
- agent function
  - gathers and records management information for one or more network elements and delivers the information to the monitor
- managed objects
  - mgmt information that represents resources and their activities
- monitoring agent
  - generates summaries and statistical analysis of mgmt information
7Functional Architecture for Network Monitoring
8Network Monitoring Configurations
9Network Monitoring Methods
- Polling
  - a request-response interaction between a manager and an agent
  - a manager sends a request to an agent, which processes the request and responds with information from its MIB
  - a manager may use polling to
    - learn about the configuration it is managing
    - periodically obtain an update of conditions
    - investigate an area in detail after being alerted to a problem
- Event Reporting
  - information flow is initiated from the agent to the manager
  - an agent may generate a report periodically to give the manager its current status, or whenever a significant event (e.g., change of a state) or an unusual event (e.g., fault) occurs
  - good for detecting problems as soon as they occur
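The two methods can be contrasted in a small in-memory sketch. The `Agent` and `Manager` classes, their method names, and the MIB variables are hypothetical and stand in for a real management protocol; the point is only who initiates the exchange.

```python
class Agent:
    """Holds a local MIB and serves both monitoring methods."""

    def __init__(self, name, mib):
        self.name = name
        self.mib = dict(mib)
        self._report_to = None  # manager callback for event reports

    def handle_request(self, variable):
        """Polling: answer a manager's request from the local MIB."""
        return self.mib.get(variable)

    def register_manager(self, callback):
        """Event reporting: remember where reports should be sent."""
        self._report_to = callback

    def set_value(self, variable, value):
        """Update the MIB and report the change of state, if any."""
        changed = self.mib.get(variable) != value
        self.mib[variable] = value
        if changed and self._report_to is not None:
            self._report_to(self.name, variable, value)


class Manager:
    def __init__(self):
        self.events = []  # reports received from agents

    def poll(self, agent, variable):
        """Manager-initiated request-response interaction."""
        return agent.handle_request(variable)

    def on_event(self, agent_name, variable, value):
        """Agent-initiated report."""
        self.events.append((agent_name, variable, value))


manager = Manager()
agent = Agent("router-1", {"if1_status": "up", "num_ports": 4})
agent.register_manager(manager.on_event)

polled = manager.poll(agent, "num_ports")  # polling
agent.set_value("if1_status", "down")      # triggers an event report
print(polled, manager.events)
```

With polling, the manager learns of the interface failure only at its next request; with event reporting, the agent tells it immediately.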
10Performance Monitoring
- Measuring the performance of the network (performance monitoring) is essential in network management (NM)
  - to detect and fix problems that cause performance degradation
  - to better plan network upgrades
- Problems in selecting and using appropriate indicators (or metrics)
  - too many indicators in use
  - the meaning of most indicators is not yet clearly understood
  - some indicators are supported by some manufacturers only
  - frequently, the indicators are accurately measured but incorrectly interpreted by a human or a mgmt application
  - the calculation of indicators takes too much time
11Network Performance Indicators
- Service-oriented
  - Availability: the percentage of time that a network system, a component, or an application is available for a user
  - Response Time: how long it takes for a response to appear at a user's terminal after a user action calls for it
  - Accuracy: the percentage of time that no errors occur in the transmission and delivery of information
- Efficiency-oriented
  - Throughput: the rate at which application-oriented events (e.g., file transfers) occur
  - Utilization: the percentage of the theoretical capacity of a resource (e.g., transmission line, switch, CPU) that is being used
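As a rough illustration, three of these indicators can be computed directly from their definitions. The function names and all input figures below are assumptions chosen for the example; the line capacity of 1,544,000 bit/s is the nominal T1 rate.

```python
def availability_pct(uptime_s: float, total_s: float) -> float:
    """Percentage of the observation period the resource was available."""
    if total_s <= 0:
        raise ValueError("observation period must be positive")
    return 100.0 * uptime_s / total_s


def utilization_pct(bits_transferred: float, interval_s: float,
                    capacity_bps: float) -> float:
    """Percentage of the theoretical capacity that was actually used."""
    if interval_s <= 0 or capacity_bps <= 0:
        raise ValueError("interval and capacity must be positive")
    return 100.0 * bits_transferred / (interval_s * capacity_bps)


def throughput_per_s(events_completed: int, interval_s: float) -> float:
    """Rate of application-oriented events, e.g. file transfers per second."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return events_completed / interval_s


# Hypothetical measurements.
day_s = 86_400
availability = availability_pct(day_s - 432, day_s)           # 432 s of downtime
utilization = utilization_pct(463_200_000, 600, 1_544_000)    # 10 min on a T1 line
throughput = throughput_per_s(120, 60)                         # 120 transfers per minute
print(availability, utilization, throughput)
```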
12Elements of Response Time
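[Figure: path of a request and its response between a workstation and a server, through a network interface (e.g., router) and the network, annotated with the delay elements TI, WI, SI, CPU, WO, SO, and TO]
RT = TI + WI + SI + CPU + WO + SO + TO
- RT: response time
- TI: inbound terminal delay
- WI: inbound queuing time
- SI: inbound service time
- CPU: CPU processing delay
- WO: outbound queuing time
- SO: outbound service time
- TO: outbound terminal delay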
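A short worked example of the response-time sum, RT = TI + WI + SI + CPU + WO + SO + TO. The class name and the delay values (in milliseconds) are invented for illustration only.

```python
from dataclasses import dataclass, fields


@dataclass
class ResponseTimeElements:
    """Delay elements of one request/response exchange, in milliseconds."""
    ti: float   # inbound terminal delay
    wi: float   # inbound queuing time
    si: float   # inbound service time
    cpu: float  # CPU processing delay
    wo: float   # outbound queuing time
    so: float   # outbound service time
    to: float   # outbound terminal delay

    def total(self) -> float:
        """RT: the sum of all seven elements."""
        return sum(getattr(self, f.name) for f in fields(self))

    def largest_contributor(self) -> str:
        """Name of the element that contributes most to RT."""
        worst = max(fields(self), key=lambda f: getattr(self, f.name))
        return worst.name.upper()


# Hypothetical measurements for a single transaction.
elements = ResponseTimeElements(ti=20, wi=35, si=15, cpu=120, wo=30, so=15, to=20)
rt = elements.total()
print(f"RT = {rt} ms, dominated by {elements.largest_contributor()}")
```

Breaking RT into its elements shows where a slow response comes from; in this made-up case the server CPU, not the network, dominates.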
13Performance Monitoring Functions
- Performance Measurement
  - the actual gathering of statistics about network traffic and timing
  - typically performed by agents within network devices
  - e.g., amount of data in and out of a node, number of connections, traffic per connection
- Performance Analysis
  - analyzing the gathered data and presenting it
  - e.g., total, average, min, max, histogram
- Synthetic Traffic Generation
  - generating artificial traffic load
  - permits the network to be observed under a controlled load
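The performance analysis step can be sketched as a function that reduces gathered measurements to the summaries listed above (total, average, min, max, histogram). The function name, the bucket width, and the traffic figures are assumptions for the example.

```python
from collections import Counter
from statistics import mean


def summarize(values, bucket_width):
    """Reduce raw measurements to total, average, min, max, and a histogram.

    The histogram maps the lower bound of each bucket to the number of
    measurements that fall into it.
    """
    if not values:
        raise ValueError("no measurements to summarize")
    histogram = Counter((v // bucket_width) * bucket_width for v in values)
    return {
        "total": sum(values),
        "average": mean(values),
        "min": min(values),
        "max": max(values),
        "histogram": dict(sorted(histogram.items())),
    }


# Hypothetical traffic per connection, in kilobytes.
traffic_kb = [120, 80, 200, 95, 310, 150]
summary = summarize(traffic_kb, bucket_width=100)
for name, value in summary.items():
    print(f"{name}: {value}")
```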
14Typical Performance-Related Questions
- Performance measurements can be used to answer a number of questions
  - Why is the response so slow? (a very loaded question!)
  - Why is the retransmission rate so high?
  - Is traffic evenly distributed among network users, or are there source-destination pairs with unusually heavy traffic?
  - What is the percentage of each type of packet?
  - What is the channel utilization and throughput?
  - What is the effect of traffic load on utilization, throughput, and time delays?
  - When does traffic load start to degrade system performance?
  - What is the maximum capacity of the channel under normal operating conditions? How many active users are necessary to reach this maximum?
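Two of these questions (the percentage of each packet type, and whether some source-destination pairs carry unusually heavy traffic) can be answered from a simple packet log. The sketch below assumes a hypothetical log of `(source, destination, type)` tuples; the hosts and counts are made up.

```python
from collections import Counter


def packet_type_percentages(packets):
    """Share of each packet type, in percent of all observed packets."""
    counts = Counter(ptype for _, _, ptype in packets)
    total = sum(counts.values())
    return {ptype: 100.0 * n / total for ptype, n in counts.items()}


def heaviest_pairs(packets, top_n=1):
    """Source-destination pairs with the most packets, heaviest first."""
    counts = Counter((src, dst) for src, dst, _ in packets)
    return counts.most_common(top_n)


# Hypothetical packet log: (source, destination, packet type).
packets = (
    [("A", "B", "TCP")] * 4
    + [("A", "C", "UDP")] * 2
    + [("B", "C", "TCP")]
    + [("C", "A", "ICMP")]
)
percentages = packet_type_percentages(packets)
top = heaviest_pairs(packets)
print(percentages, top)
```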
15Fault Monitoring
- To detect faults as quickly as possible after they occur and to identify the cause of the fault so that corrective action may be taken
- Problems of Fault Monitoring
  - Fault Detection Problems
    - Unobservable faults: e.g., deadlock, device not monitorable
    - Partially observable faults: observations insufficient to pinpoint the problem
    - Uncertainty in observation: not clear what the problem is
  - Fault Isolation Problems
    - Multiple potential causes
    - Too many related observations
    - Interference between diagnosis and local recovery procedures
    - Absence of automated testing tools
16What happens when the T1 link fails?
[Figure: heterogeneous network environment with a client, a server, 802.3 and 802.5 LANs, two routers, two MUXes, two PBXs, and a T1 link]
17Propagation of Failures to Higher Layers
18Fault Monitoring Functions
- Logging
  - record important events and errors
  - logs should be accessible by managers (e.g., via polling)
- Event Reporting
  - sending events and errors to managers
  - sending alarms to the manager to warn of possible problems
- Diagnostic Functions
  - connectivity test (e.g., traceroute)
  - response-time test
  - liveness test (e.g., ping)
  - protocol integrity test
  - loopback test
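A minimal sketch of a combined liveness and response-time test. Note the substitution: instead of ICMP ping, which usually needs raw-socket privileges, it times a TCP connection attempt using Python's standard `socket.create_connection`. The function name, the default timeout, and the host and port in the usage section are assumptions.

```python
import socket
import time
from typing import Optional


def tcp_liveness_test(host: str, port: int,
                      timeout_s: float = 1.0) -> Optional[float]:
    """Return the connect time in seconds if host:port accepts a TCP
    connection, or None if the connection fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.monotonic() - start
    except OSError:  # refused, unreachable, or timed out
        return None


if __name__ == "__main__":
    # Hypothetical target; replace with a device you are allowed to probe.
    elapsed = tcp_liveness_test("192.0.2.10", 22)
    if elapsed is None:
        print("target did not answer: raise an alarm or log the fault")
    else:
        print(f"target alive, connect time {elapsed * 1000:.1f} ms")
```

A `None` result only says that this one port did not answer; as the fault detection problems above suggest, it does not by itself identify the cause.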
19Accounting Monitoring
- Keeping track of users' usage of network resources
  - communication facilities
  - computer hardware
  - software and systems
  - services
- Usage may need to be broken down by account, by project, or by individual user for appropriate accounting purposes
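The breakdown by account, project, or individual user can be sketched as a simple aggregation over usage records. The record fields, names, and byte counts are hypothetical.

```python
from collections import defaultdict


def usage_by(records, key):
    """Total bytes used, grouped by the given record field."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["bytes"]
    return dict(totals)


# Hypothetical usage records collected by accounting monitoring.
usage_records = [
    {"user": "alice", "project": "p1", "account": "research", "bytes": 500},
    {"user": "bob",   "project": "p1", "account": "research", "bytes": 300},
    {"user": "alice", "project": "p2", "account": "teaching", "bytes": 200},
    {"user": "carol", "project": "p2", "account": "teaching", "bytes": 100},
]

by_user = usage_by(usage_records, "user")
by_project = usage_by(usage_records, "project")
by_account = usage_by(usage_records, "account")
print(by_user, by_project, by_account)
```

Keeping all three fields on each record lets the same data be reported at whichever level the accounting policy needs.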
20Summary
- Network monitoring is the most basic aspect of NM
- The purpose of network monitoring is to gather information about the status and behavior of network elements
- Information to be gathered includes
  - static, dynamic, and statistical information
- Monitoring methods
  - polling and event reporting
- Monitoring functions
  - performance monitoring
  - fault monitoring
  - accounting monitoring
- READ Chapter 2 of Textbook