Title: Network Management
College of Engineering, The Pennsylvania State University
- Network Management
- Issues, Approach
- A Candidate Solution
William J. Burkhard, Associate Director, The Center for Electronic Design, Communications and Computing
James F. Carras, Network Coordinator, The Center for Electronic Design, Communications and Computing
Thomas G. Long, Senior Systems Analyst, The Center for Electronic Design, Communications and Computing
09 September 2002
Network Management - Topics
- Issues
- Network Management Elements
- Monitoring Resources
- Resource Condition Reporting
- Monitoring Traffic Loads
- Benefits of Monitoring Traffic
- Traffic Analysis and Filtering
- Importance of Traffic Analysis and Filtering
- Resource Control
- Importance of Resource Control
- Network Management Configuration
- Operational Schema
- Concept of Operations Flow Diagram
- Concept of Operations Description
- System Recommendation and Cost
- Summary and Conclusion
Network Management - Issues
- By May 2002, the College's network architecture had migrated from a centrally manageable star configuration to a completely distributed architecture.
- The distributed architecture increased the College's dependency on the backbone architecture of the systems managed by the University's Information Technology Services.
- A single point of failure exists in the new distributed network management schema.
- Loss of network connectivity to the College's server farm in Hammond Building results in reports that all College network resources are in a failure mode.
- Adequately managing the College's new network architecture requires a philosophical change in the approach to network management.
Network Management - Monitoring Resources
- Nagios
- Host and Service Monitoring
- Runs intermittent checks on hosts and services.
- Sends notifications when problems are encountered.
- Web browser accessible hardware status, historical logs and reports.
- Network Services Monitored
- SMTP, POP3, HTTP, NNTP, Ping
- Functionality
- Ability to define network host hierarchy.
- Ability to send contact notification of problems via email and pager.
- Ability to define event handlers.
- Ability to apply on-the-fly command interface modifications.
Network Management - Resource Condition Reporting
- Equipment Operational Status
- Nagios provides the capability to establish control parameters and rules for monitoring and reporting on the up or down status of network closet routers and Ethernet switches.
- When a system's performance falls outside the performance parameters of the rule sets, an alert is generated. The alert can take one or multiple forms: the software can alert one or more individuals via email, pager, cell phone message, instant message, SMS, etc.
- The software's configurability supports a network host hierarchy for detecting and distinguishing between hosts that are down and those that are unreachable.
- The web interface permits the viewing of network element status and facilitates on-the-fly configuration.
- A web accessible external command interface enables the application of system monitoring and notification behaviors; these behaviors can also be configured via third-party applications.
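The distinction between hosts that are down and hosts that are merely unreachable can be illustrated with a minimal sketch. It is not Nagios code: it assumes each host has at most one parent in the hierarchy, and the function name `classify_hosts` and the host names in the usage note are hypothetical.

```python
from typing import Dict, Optional, Set


def classify_hosts(parents: Dict[str, Optional[str]],
                   responding: Set[str]) -> Dict[str, str]:
    """Label each host UP, DOWN, or UNREACHABLE.

    parents maps a host to its single upstream parent (None for the root).
    responding is the set of hosts that answered their last check.
    Simplifying assumption: one parent per host.
    """
    status: Dict[str, str] = {}

    def resolve(host: str) -> str:
        if host in status:
            return status[host]
        if host in responding:
            status[host] = "UP"
        else:
            parent = parents[host]
            # A silent host is DOWN only when the path to it is healthy;
            # otherwise the fault lies upstream and it is UNREACHABLE.
            if parent is None or resolve(parent) == "UP":
                status[host] = "DOWN"
            else:
                status[host] = "UNREACHABLE"
        return status[host]

    for host in parents:
        resolve(host)
    return status
```

For example, with a hierarchy `core-router -> bldg-switch -> lab-server` in which only `core-router` responds, the switch is classified as DOWN and the server behind it as UNREACHABLE, so staff are pointed at the switch rather than at every device behind it.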
Network Management - Monitoring Traffic Loads
- Multi Router Traffic Grapher (MRTG)
- Feature
- Monitors traffic loads on network links.
- Functionality
- Provides live visual representation of port data traffic loads.
- Provides histories of port data traffic loads on a daily, weekly, monthly and annual basis.
- Provides a web interface for viewing port traffic loads.
Network Management - Benefits of Monitoring Traffic
- Aside from a catastrophic network hardware failure reported by the resource monitoring application, traffic monitoring is a critical indicator of when a real or potential problem may exist in the network.
- Monitoring data traffic rates down to the Ethernet switch level shows when rates exceed the expected norm. This is the first indicator that a system is misbehaving, that a system has been hacked and is broadcasting high outbound data rates over the network, or that a subnet is under attack from outside the College's network.
- Indications of high data rates allow the staff to efficiently, effectively and swiftly analyze conditions and respond to these unpredictable but certain events.
- As demands on the network's bandwidth increase, traffic monitoring will also indicate when available bandwidth needs to be increased to support faculty usage demands.
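As a rough illustration of "rates exceeding the expected norm", the sketch below derives a data rate from two successive readings of a port's octet counter and compares it with a baseline. The function names, the 32-bit counter width, and the multiplier of 3 are assumptions made for the example, not values from the proposal.

```python
def rate_bps(prev_octets: int, curr_octets: int, interval_s: float,
             counter_bits: int = 32) -> float:
    """Bits per second between two counter readings.

    The modulo handles a single counter wrap between readings
    (assumed 32-bit counter by default).
    """
    delta = (curr_octets - prev_octets) % (2 ** counter_bits)
    return delta * 8 / interval_s


def exceeds_norm(rate: float, baseline: float, factor: float = 3.0) -> bool:
    """True when the rate is more than `factor` times the expected baseline.

    `factor` is a hypothetical tuning value; a site would choose its own.
    """
    return rate > baseline * factor
```

A port that normally carries about 2 Mbit/s and suddenly reports 10 Mbit/s would be flagged by `exceeds_norm(10_000_000, 2_000_000)`, prompting a look at the traffic analysis data.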
Network Management - Traffic Analysis and Filtering
- Ethereal
- Features
- Network protocol analysis.
- Filters for refining displayed packet summary information.
- Maintaining saved copies of network trace information.
- Functionality
- Live network data capture.
- Editing of capture files.
- Multi-protocol filtering of 289 protocols.
- GUI browsing of network data.
Network Management - Importance of Traffic Analysis
- Traffic analysis is a necessary element in the network management structure because it complements traffic load monitoring.
- Load monitoring indicates that something is potentially wrong; traffic analysis answers the question "What is going wrong?"
- Traffic analysis monitors packet flow into and out of a network segment.
- Traffic analysis alerts the network management team when the packet analysis process identifies data traffic that may match known hacker data signatures.
- The configurability, monitoring flexibility and filtering capabilities all facilitate efficient threat analysis and identification.
- Knowing the nature and identity of an attack enables the network management team to react swiftly and appropriately to these events; the employment of router filters can block subsequent hacker attacks.
- Accurate information about an attack signature can be forwarded to the University Computer Network Security Office for appropriate actions.
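The idea of filtering by protocol and then matching traffic against known signatures can be sketched as follows. This is an illustrative stand-in, not Ethereal's implementation or filter syntax: the `PacketSummary` record, the function `find_suspects`, and the byte patterns used as signatures are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Set, Tuple


class PacketSummary(NamedTuple):
    """Hypothetical, simplified view of one captured packet."""
    src: str
    dst: str
    protocol: str
    payload: bytes


def find_suspects(packets: Iterable[PacketSummary],
                  protocols: Set[str],
                  signatures: List[bytes]) -> List[Tuple[str, int]]:
    """Count signature matches per source address.

    Packets whose protocol is not of interest are skipped first, so
    unrelated traffic does not cloud the analysis. The result is sorted
    with the most frequent offender first.
    """
    hits: Counter = Counter()
    for pkt in packets:
        if pkt.protocol not in protocols:
            continue
        if any(sig in pkt.payload for sig in signatures):
            hits[pkt.src] += 1
    return hits.most_common()
```

The source addresses returned at the top of the list are the candidates for a router filter and for the report to the security office.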
Network Management - Resource Control
- iBoot
- Features
- Web Addressable and Configurable
- Remote Manual Control
- Automatic Failure Detection
- Password Protected
- Output Power Reset
- Switches up to 12 Amps at 115 Volts
- Functionality
- Remote Reboot of any Device
- Automatic Reboot on Loss of Ping Response
Network Management - Importance of Resource Control
- Resource control facilitates monitoring and power control of the most critical networking closet resource: the building router.
- Provides a mechanism to quickly recover from possible hung system conditions.
- In the automatic (ping) mode, the iBoot can detect the router's failure to respond to pings. In this state the iBoot assumes that the router needs to be rebooted and cycles the device's power.
- If router functionality is not restored after an automatic reboot, the network management team can remotely access the iBoot to force another reboot attempt.
- The iBoot can also be used to remotely force a router reboot if the network management team determines that a reboot may clear an anomaly or other router problem.
- Prevents costly down time by possibly eliminating the need for a site visit by the network management team.
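The automatic (ping) mode described above amounts to a small watchdog loop. The sketch below shows one step of such a loop under stated assumptions: it is not the iBoot's firmware logic, and the function name `watchdog_step`, the miss threshold, and the `power_cycle` callback are hypothetical.

```python
from typing import Callable


def watchdog_step(ping_ok: bool, missed: int, threshold: int,
                  power_cycle: Callable[[], None]) -> int:
    """Process one ping result and return the updated missed-ping count.

    A successful ping resets the count. When `threshold` consecutive
    pings have been missed, the power is cycled and the count restarts
    so the device has time to boot before the next decision.
    """
    if ping_ok:
        return 0
    missed += 1
    if missed >= threshold:
        power_cycle()
        return 0
    return missed
```

Requiring several consecutive misses before cycling power is a design choice that avoids rebooting a router over a single dropped ping.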
Network Management - Network Management Configuration
Network Management - Operating Schema
[Diagram. Labels: Web Based Desktop Remote Monitoring Applications; Remote User; Dell Master Management Server; COE Network Connections (Status, Loads, Analysis); Network Operations Control Center, Room 151D Hammond; Building 1 through Building N]
Network Management - Operating Schema Flow Diagram
[Flow diagram. Entry points: Resource Monitoring Alert; Customer Call. Steps: Review Nagios Status; Review MRTG Data Rates; Review Traffic Analysis Data to Find Offending System; Find, Repair or Block System; Review Traffic Analysis Data to Identify Attack Signature Source; Place Filter on Router or Server Firewalls; Gather Forensic Data; Notify PSU Security; Repair. Decisions: Hardware Response Problem (Yes / No); COE or External Problems (COE / External)]
Network Management - Concept of Operation Description
- Data communications problems usually manifest themselves in one of two ways:
- System hardware failures, which can be either user or network based.
- Telephone inquiries by Technical Contacts or users.
- The appropriate immediate response is to determine whether a network failure occurred in a building; this is accomplished by referring to the network status plots provided by Nagios.
- Network system hardware failures are always followed by almost immediate visual and audible alerts from Nagios. Nagios also provides pager and email alerts to notify staff during non-working hours.
- Customer calls usually signify localized problems attributable to computer configuration issues, a computer-to-faceplate connection issue or another non-hardware network system issue.
- The absence of an immediate indication of a hardware problem can indicate that computer systems within or external to the College may be participating in hacker activities such as Denial of Service attacks. Responses to internal attacks differ from responses to external attacks; both types of response rely on the analysis of data traffic.
- A College system identified by traffic analysis as misbehaving on the network is removed or blocked from network access; once a system repair is verified, the system is again given network access privileges.
- A system external to the College caught spamming or creating other problems for the network is filtered at a building router to prevent its access to and disruption of College computing.
- Forensic analysis of attacks is conveyed to the University Network and Computing Security Office for analysis and any appropriate follow-on actions.
- The remaining element not shown on the flow diagram is the iBoot. This device is used to remotely cycle closet router AC power as a step in attempting to remotely clear an observed problem associated with the router.
- The main console in 151D Hammond provides continuous monitoring of all aspects of network management. Initial alerts and corrective actions can be handled locally on a desktop system; however, detailed troubleshooting, analysis and corrective actions are coordinated and accomplished in this facility.
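The operational concept described above can be summarized as a small decision function. This is one reading of the description, offered as a sketch: the function name `triage`, its three boolean inputs, and the wording of the steps are assumptions, and the real procedure relies on staff judgment at each step.

```python
from typing import List


def triage(hardware_alert: bool, abnormal_traffic: bool,
           source_is_college: bool) -> List[str]:
    """Return the ordered response steps for one reported problem."""
    steps = ["Review Nagios status"]
    if hardware_alert:
        # Hardware failure: repair, using the iBoot for a hung router.
        steps.append("Repair the hardware; cycle closet router power "
                     "via iBoot if the router is hung")
        return steps

    steps.append("Review MRTG data rates")
    if not abnormal_traffic:
        steps.append("Treat as a localized problem (computer "
                     "configuration or faceplate connection)")
        return steps

    if source_is_college:
        steps += [
            "Review traffic analysis data to find the offending system",
            "Block the system until its repair is verified",
        ]
    else:
        steps += [
            "Review traffic analysis data to identify the attack "
            "signature and source",
            "Place a filter on the building router or server firewalls",
            "Gather forensic data",
            "Notify the University security office",
        ]
    return steps
```

Calling `triage(False, True, False)` yields the external-attack path: review status and data rates, identify the source, filter it, gather forensic data, and notify the security office.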
Network Management - System Recommendation and Cost
Software applications are currently provided as freeware.
Network Management - Summary and Conclusion
- Summary
- The distribution of COE networking resources requires a change in network management philosophies and architecture.
- Loss of network connectivity to the TNS backbone from Hammond west results in an indication that the entire COE network is down.
- A new COE network management architecture includes hardware and software solutions in all building closets where routers connect to the TNS backbone.
- A minimal hardware architecture design and applications are presented.
- Applications for critical network hardware status monitoring, data traffic and protocol/packet analysis are imperative elements in network management.
- A concept for operational procedures ties together the concept for integrated COE network management.
- Analysis and control are remotely accessible via web or Telnet sessions.
- Conclusion
- The Center is in need of a network management structure that provides accurate early notification of hardware problems and data traffic loads.
- A minimal and cost effective network management architecture is proposed.
- The proposed solution addresses all the basic needs for sound and qualitative network management.
- The proposed network management architecture eliminates a single point of failure.
- The proposed operational concept provides a formal structure for network management.
- The proposed operational structure enables timely and accurate use of personnel resources to resolve problems.
- Cell phones for three primary individuals are preferable to pagers. The initial cost is lower but recurring costs are higher; however, cell phones provide greater versatility.
Network Management
- Addendum
- Additional Reporting Details
16 September 2002
Network Management - Nagios Resource Monitoring Reports
- Tactical Overview of Resource Availability
- Status Reports - Instantaneous and Historical
- Availability Overview
- Summary
- Network Grid Availability
- Network Mapping
- 3-D Mapping
- Troubleshooting Reports
- Identification and Alerting of Service Problems
- Identification and Alerting of Network Outages
- Notification of Data Traffic Trends
- Others
- Availability of each Monitored Element
- History of Alerts Related to each Monitored Element
- Anomaly Notifications
- Host Resource Monitoring: Processor Loads, Disk and Memory Usage, Running Processes and Log Files
- Hierarchical Detection, Notification and Distinction of Services that are Down vs. Unreachable
- Escalation of Host and Services Notifications to Different Contact Groups.
Network Management - MRTG Monitoring Traffic Reports
- The Multi Router Traffic Grapher (MRTG) is a tool to monitor the traffic load on network links. MRTG generates HTML pages containing graphical images which provide a LIVE visual representation of this traffic.
- Instantaneous monitoring of daily port "in" and "out" data traffic loads, viewable with a web interface.
- Histogram reports of cumulative port "in" and "out" data traffic loads, viewable with a web interface: weekly, monthly and yearly.
- MRTG logfile information is available for use in user developed analysis programs.
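A user developed analysis program could start from a parser like the sketch below. It assumes the plain-text log layout described in the MRTG documentation: a first line holding the raw counters, followed by lines of five whitespace-separated integers (timestamp, average in, average out, maximum in, maximum out). The names `Sample`, `parse_mrtg_log`, and `peak_in` are hypothetical, and the layout should be checked against the installed MRTG version.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    timestamp: int  # seconds since the Unix epoch
    avg_in: int     # average incoming rate over the interval
    avg_out: int    # average outgoing rate over the interval
    max_in: int     # maximum incoming rate over the interval
    max_out: int    # maximum outgoing rate over the interval


def parse_mrtg_log(text: str) -> List[Sample]:
    """Parse the history section of an MRTG log (assumed layout)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    samples = []
    for fields in rows[1:]:  # the first line holds raw counters, skip it
        if len(fields) == 5:
            samples.append(Sample(*map(int, fields)))
    return samples


def peak_in(samples: List[Sample]) -> Sample:
    """Return the sample with the highest maximum incoming rate."""
    return max(samples, key=lambda s: s.max_in)
```

With the samples in hand, a site could compute its own baselines per port or list the intervals in which a port's traffic peaked.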
Network Management - Ethereal Traffic Analysis and Filtering Reports
- Ethereal Supports Data Capturing and Analysis of 289 Protocols.
- When Set Up to Capture Packets of Selected Protocols, the Software Captures and Analyzes Content Packet by Packet.
- Captured Data is Retained in a Database for Periodic Review or Tracking Down Protocols that Contain Problem Packets.
- Format and Content of Captured Data Depends on which of the 289 Protocols are being Analyzed by Ethereal.
- Ethereal is a Continuously Running Process Whose Output can be Saved or Printed for Human Analysis of Detected Anomalies.
- Protocols not of Interest can be Filtered to Prevent Capturing Excess Data that Would Cloud the Analysis Process.
- Manual Data Analysis of Packets Tagged as Having Bogus Signatures Leads to Identification of Denial of Service or Hacker System Addresses for Router Filters and Reporting to the University Security Office.