Title: Ten Commandments
1 Ten Commandments Of TCP/IP Performance
Nalini Elkins, Inside Products, Inc. Nalini_elkins_at_inside-products.com
Inside Products, Inc. www.inside-products.com (831) 659-8360 sales_at_inside-products.com
2 How Does TCP/IP Work?
- To find problems in TCP/IP, let's start by thinking about what TCP/IP is.
- TCP/IP helps the network get information (packets) from one place to another through network equipment such as routers and switches.
- What is so hard about this?
(Diagram: Packet 1 and Packet 2 crossing the network.)
3 Networks are Complex
- There are hundreds of thousands, even millions, of connections in a network.
- Finding just which one has problems is a daunting task.
4 Network diagnostics involves decoding multiple layers of protocols.
(Diagram: FTP, TCP, LPR/LPD, IP, LDAP, IPv6, HTTP, UDP, MQSeries, TN3270)
5 Ten Commandments
- Thou shalt monitor thy application backlog queue
- Thou shalt not kill thy network by many short connections
- Thou shalt drop unused connections
- Thou shalt honor thy TCP duplicate ACKs and thy retransmissions
- Thou shalt relate thy TCP resets to the cause
- Thou shalt not fail in watching thy TCP attempt fails
- Thou shalt delve deeply into UDP no ports errors
- Thou shalt address the reason for your IP address errors
- Thou shalt not convert thy applications directly from multi-dropped SDLC
- Thou shalt not use two packets when one will do
6 Thou shalt monitor thy application backlog queue
- How does the backlog queue work? Example below
- Application can have 5 active connections
- 10 connections can be in the backlog queue
- When the 16th connection comes in, it is rejected
- How to monitor? See above
- Portion of Netstat All display
- SNMP MIB will also show queues
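The numbers on this slide (5 active connections, 10 in the backlog, the 16th rejected) can be walked through with a toy model. This is only a sketch of the counting, not how any real TCP/IP stack is implemented; the `Listener` class and its method names are made up for illustration.

```python
from collections import deque


class Listener:
    """Toy model of a listening application with a bounded backlog queue."""

    def __init__(self, max_active, backlog):
        self.max_active = max_active   # connections being served at once
        self.backlog = backlog         # connections allowed to wait
        self.active = []
        self.queued = deque()

    def connect(self, conn_id):
        """Return what happens to a new connection request."""
        if len(self.active) < self.max_active:
            self.active.append(conn_id)
            return "active"
        if len(self.queued) < self.backlog:
            self.queued.append(conn_id)
            return "queued"
        return "rejected"


listener = Listener(max_active=5, backlog=10)
outcomes = [listener.connect(n) for n in range(1, 17)]
print(outcomes[14], outcomes[15])  # the 15th request is queued, the 16th is rejected
```

Watching how often requests land in the "rejected" branch is the toy equivalent of monitoring the backlog counters.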
7 Case Study
8 What is going on?
- User is a U.S. state government.
- Using a CICS accounting application, sometimes the user gets a Connection Refused; other times, the session initiation just hangs.
- Technical support has no idea what to do.
- Users are angry!
- The problem has been escalated to just below the governor's office.
9 Backlog Queues
- We looked at the application backlog queues.
- We see that the backlog queues are being exceeded
and connections are dropped.
10 Connection Refused
- When the backlog queue is exceeded, the users get a Connection Refused message.
- This is actually better than the alternative.
- If the user is stuck in the backlog queue, then they just see an hourglass on the terminal and they appear to be hung!
- This is even more frustrating.
11 The Real Problem
- The installation tried a number of ways to speed up the application.
- The vendor of the application is being contacted for assistance.
12 How to Find This!
- One way is via the z/OS SNMP MIB.
- The counts are also shown per socket on the Netstat All command. If you do a Netstat All (IPAddr 0.0.0.0 then you may see all the listener connections.
- But remember, these are all just snapshots. You should be monitoring these continuously.
13 SOMAXCONN
- SOMAXCONN is used in conjunction with the backlog queue value specified in application programs. When a socket connection request arrives at the TCP/IP stack while the server is busy processing a previous request, the new request is queued, up to the limit specified by the SOMAXCONN parameter.
- When that number is exceeded, TCP/IP connection requests will time out and get refused. The backlog value an application specifies cannot exceed SOMAXCONN. If it does, no error is given, but the value of SOMAXCONN is used instead. SOMAXCONN is set per listener. In other words, the SOMAXCONN value is not cumulative for all listener ports.
- SOMAXCONN: Specifies the maximum length for the connection request queue created by the socket call listen().
- Sample: SOMAXCONN 10
- This is the length of the backlog queue.
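The silent capping described above amounts to taking the smaller of two numbers. A minimal sketch, assuming the application asks for a backlog in its listen() call and the stack limit is the sample value SOMAXCONN 10; the function name is hypothetical.

```python
def effective_backlog(requested, somaxconn):
    """Backlog actually used: the application's request, capped at SOMAXCONN.

    No error is raised when the request is too large; it is simply reduced.
    """
    return min(requested, somaxconn)


# An application asking for 50 queued connections under SOMAXCONN 10
print(effective_backlog(50, 10))
# An application asking for fewer than the limit keeps its own value
print(effective_backlog(5, 10))
```

Because the cap is applied without any error, an application that believes it has a deep queue may in fact have a shallow one.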
14 Thou shalt not kill thy network by many short connections
- Each connection establishment requires a flow of packets.
- If there is a lot of data flow, it is better to have one long connection with many packets than multiple short connections.
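A rough packet count shows why. The sketch below assumes the usual three-packet handshake and a four-packet close for every connection, and an illustrative two packets per request (data plus acknowledgment); the request count and per-request figure are assumptions, not measurements from the deck.

```python
HANDSHAKE_PACKETS = 3  # SYN, SYN-ACK, ACK
TEARDOWN_PACKETS = 4   # FIN, ACK, FIN, ACK (can be 3 when a FIN and ACK are combined)


def total_packets(requests, packets_per_request, connections):
    """Packets on the wire: per-connection control packets plus the data exchange."""
    control = connections * (HANDSHAKE_PACKETS + TEARDOWN_PACKETS)
    return control + requests * packets_per_request


# 1000 requests, each on its own short connection
short_lived = total_packets(requests=1000, packets_per_request=2, connections=1000)
# The same 1000 requests over one long connection
long_lived = total_packets(requests=1000, packets_per_request=2, connections=1)
print(short_lived, long_lived)
```

Under these assumptions the short connections put several times as many packets on the network for the same work.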
15 TCP Many New Connections
- We were led to the problem of possible unneeded sessions by noticing that hundreds of new connections were made and terminated for TCP port 23.
- We investigated the possible sources for this activity.
16 Many Connections for Port 23
- When we look at the IP addresses with connections to port 23, some stand out.
- IP address 10.111.1.190, in particular, was responsible for 53 of the connections.
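Finding the addresses that stand out is a grouping exercise. A minimal sketch, assuming connection records are available as (remote address, local port) pairs; the sample records are invented for illustration, and only 10.111.1.190 comes from the slide.

```python
from collections import Counter


def connections_by_remote(connections, local_port):
    """Count connections to one local port, grouped by remote IP address."""
    return Counter(remote for remote, port in connections if port == local_port)


# Hypothetical sample records: (remote address, local port)
sample = [
    ("10.111.1.190", 23),
    ("10.111.1.190", 23),
    ("10.111.1.190", 23),
    ("10.111.2.7", 23),
    ("10.111.3.15", 21),
]

for address, count in connections_by_remote(sample, 23).most_common():
    print(address, count)
```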
17 Thou shalt drop unused connections
- Many unused connections come from Voice Response Units (VRUs).
- Others may be from scripts which use TN3270.
- Minor modifications may help: fewer connections at longer intervals. (Instead of 100 connections every 3 minutes, 50 connections every 10 minutes.)
- Investigate persistent connections and connection pooling.
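The effect of the example in parentheses is easier to see as an hourly rate. A small sketch of the arithmetic, using only the two figures from the slide; the function name is made up.

```python
def connections_per_hour(batch_size, interval_minutes):
    """Connections opened per hour when batch_size are opened every interval."""
    return batch_size * 60 / interval_minutes


before = connections_per_hour(100, 3)   # 100 connections every 3 minutes
after = connections_per_hour(50, 10)    # 50 connections every 10 minutes
print(before, after)
print(f"reduction: {1 - after / before:.0%}")
```

So the "minor modification" cuts new connections from 2000 to 300 per hour.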
18 Thou shalt honor thy TCP duplicate ACKs and thy retransmissions
- What do the dup ACKs and retransmissions have in common?
- The same subnet
- The same time of day
- The same socket application
- The same route (set of hardware)
(Diagram: ACK 101, ACK 101, ACK 201)
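The ACK sequence on this slide (101, 101, 201) contains one duplicate. A simplified sketch of counting them, assuming only the acknowledgment numbers of one direction of one connection are available; real trace tools also check payload length and window size before calling an ACK a duplicate.

```python
def count_duplicate_acks(ack_numbers):
    """Count ACKs that repeat the acknowledgment number of the previous ACK."""
    duplicates = 0
    previous = None
    for ack in ack_numbers:
        if ack == previous:
            duplicates += 1
        previous = ack
    return duplicates


print(count_duplicate_acks([101, 101, 201]))
```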
19 TCP Retransmits By Port
- Notice that port 23 is responsible for 96% of the retransmits.
- Let's look at the remote addresses.
20 TCP Retransmits By Remote Address
- Five remote addresses are responsible for over 80% of the retransmits.
- Duplicate acknowledgments show a similar pattern to the retransmits.
21 Thou shalt relate thy TCP resets to the cause
- A RESET packet is sent by TCP to abort a connection.
- It may or may not be a problem: closing an idle connection is proper.
- On the other hand, if an application is refusing connections because it is out of resources, then you may see many RESETs.
- Let's look at the next commandment for an example.
22 Thou shalt not fail in watching thy TCP attempt fails
- It may be that there is a TCP application which is not active.
- The packet to the TCP port which is not active will be responded to with a TCP RESET packet.
- The count of TCP Attempt Fails will be incremented.
(Diagram: a packet to TCP port 445 on the host is answered with a TCP Reset packet, and the host's Count of TCP Attempt Fails goes up.)
23 TCP Resets
24 Thou shalt delve deeply into UDP no ports errors
- UDP No Ports is equivalent to TCP Attempt Fails.
- It may be that there is a UDP application which is not active.
- If all UDP sockets are active, then it may be that UDP traffic is coming in at too high a rate for a particular port.
- We have seen this error correlated with the ICMP Destination Unreachable error, subtype Port Unreachable.
(Diagram: a packet to UDP port 161 on the host is answered with ICMP Destination Unreachable, Port Unreachable, and the host's Count of UDP No Ports goes up.)
25 UDP Port Unreachable
- In the case above, no application was listening on port 161, so this generated the ICMP error.
- Since this port happened to be for UDP, it also generated a UDP No Ports error.
- If this is just a mistake and happens thousands of times a day because some application is not properly configured
26 Thou shalt address the reason for your IP address errors
- Many UDP applications send packets to a broadcast address.
- The mainframe does not recognize such addresses.
- The packets are dropped and noted as address errors.
- Such packets may also come from a router if routing is misconfigured.
- We have seen millions per day.
(Diagram: a packet to 1.2.255.255 arrives at the host and increments the Count of Address Errors and the Count of IP Discards.)
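Whether a destination such as 1.2.255.255 is a broadcast address depends on the subnet it belongs to. A minimal sketch using Python's standard ipaddress module; the /16 prefix is an assumption chosen for illustration, since the slide does not give the subnet mask.

```python
import ipaddress


def is_directed_broadcast(address, network):
    """True when address is the broadcast address of the given network."""
    net = ipaddress.ip_network(network)
    return ipaddress.ip_address(address) == net.broadcast_address


# Assumed prefix: 1.2.0.0/16 (not stated on the slide)
print(is_directed_broadcast("1.2.255.255", "1.2.0.0/16"))
print(is_directed_broadcast("1.2.3.4", "1.2.0.0/16"))
```

A check like this, run over the destination addresses in a trace, is one way to separate broadcast traffic from packets that are misrouted for other reasons.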
27 Misdirected Packets
- This analysis only lasted a few minutes.
- Hundreds of packets were sent from many UDP NetBios connections.
- Some were improperly configured SQL servers.
- These packets were dropped by the mainframe.
- Why clog up the network and make the mainframe do extra work for no reason?
28 Thou shalt not convert thy applications directly from multi-dropped SDLC
- Multi-dropped SDLC link: packets for PU 1, PU 2, and PU 3 share the link. It makes sense to have small packets so that no one dominates traffic.
- TCP virtual circuit between the remote host and the local host: small packets mean overhead.
- IP Header 20 bytes, TCP Header 20 bytes, Data 8 bytes
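The header and data sizes on this slide make the overhead easy to quantify. A small sketch of the arithmetic, assuming no IP or TCP options and ignoring link-layer framing; the 1000-byte comparison is an illustrative figure, not one from the deck.

```python
IP_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20


def header_overhead(data_bytes):
    """Fraction of each packet taken up by the IP and TCP headers."""
    headers = IP_HEADER_BYTES + TCP_HEADER_BYTES
    return headers / (headers + data_bytes)


print(f"{header_overhead(8):.1%}")     # the 8-byte payload from the slide
print(f"{header_overhead(1000):.1%}")  # a larger payload, for comparison
```

With 8 bytes of data, more than four fifths of every packet is header.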
29 Sample Application
- This application was converted directly from SDLC.
- Suffered from poor response time.
30 Thou shalt not use two packets when one will do
- PSH bit on in the header indicates that data transmission is complete.
- The PSH bit could have been turned on in Packet 1.
- Packet 2 does not need to be sent.
- A small mistake.
- If you do it a million times a day, it becomes a big mistake.
(Diagram: Packet 1 carries the data. Packet 2 carries 0 bytes of data, with PSH in the TCP header.)
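To put a number on "a million times a day": the sketch below assumes each unnecessary packet carries only a 20-byte IP header and a 20-byte TCP header (the sizes quoted on the SDLC slide), with no options and no link-layer framing counted.

```python
HEADER_BYTES = 20 + 20  # IP header + TCP header, no options (assumed)


def wasted_bytes_per_day(extra_packets_per_day):
    """Header bytes spent on packets that carry no data."""
    return extra_packets_per_day * HEADER_BYTES


print(wasted_bytes_per_day(1_000_000))
```

The byte count understates the cost, since every extra packet must also be processed by each router and by both hosts.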
31 Tuning TCP Saves Money
- Eliminate errors and unneeded traffic and benefit from
- Lower CPU usage
- Less frequent hardware upgrades
- Lower costs for MIPS-based software charges
- Increased bandwidth availability
- Increased technical staff productivity
- Inside the Stack is the only TCP/IP monitor focused on problem solving and tuning.
32
- Data from a recent Network Health Check reveal TCP, UDP, ICMP, and listener errors for both systems.
- Over 2,000 errors per 3-minute interval.
- With tuning these numbers fall significantly.
- Errors contribute to TCP/IP SRB usage.
33
- After a Health Check and tuning efforts lasting 2-3 weeks, the listener and UDP errors for both systems have been completely eliminated.
- The ICMP errors for both systems are nearly eliminated.
- The TCP errors have been cut to 1/4 to 1/3 of what they used to be.
- TCP dropped from 2nd highest user of CPU to 4th highest user of CPU (SRBs).
34 The Silent Killer
- You may not even realize you have problems with TCP/IP.
- Just as cholesterol in the heart can be a silent killer, retransmissions, excessive connections, and unneeded traffic can clog up the network.
- And these problems are preventable!
35 How Can We Help?
- ESAI and Inside Products are TCP/IP specialists.
- We can help you with
- Training
- Tools
- Consulting
36 Inside the Stack
- Inside the Stack provides
- Real time monitoring
- Historical reports
- Alerting
- Connection monitoring
- TCP stack diagnostics
- There are hundreds of reports possible!
37 Inside the Stack
38 TCP Problem Finder
- The product most directed to the serious diagnostician. TCP Problem Finder allows you to
- Find problems in diagnostic traces, which can consist of thousands or hundreds of thousands of packets
- See the exact flow in a connection, from a high-level overview or in detail
- We use this product ourselves in consulting. IBM subcontracts to us to help with TCP problem resolution; we could not do it without TCP Problem Finder!
39 TCP Problem Finder
40 Network Health Check
- When you are serious about tuning your network, our Network Health Check can help to
- Identify response time problems for applications (host or network)
- Identify response time problems for individual connections (host or network)
- Identify congestion or network traffic errors on subnets
- Identify paging, queues, high CPU usage for TCP sockets or TCP address space
- Analyze TCP profile
- Identify paging, queues, high CPU usage for individual FTPs
- Identify routes and applications with packet fragmentation
- Identify excessive idle or hanging connections
- Identify connections in frequent error status
- Identify application configuration problems (keepalive required, etc.)
41 TCP Classes
- We offer many classes in TCP/IP including
- Security, IPSec, Policy Agent
- IPv6 (Addressing, Multi-platform)
- TCP Tuning and Performance Analysis
- Trace Analysis and Diagnostics
42 Contact Us!
- For more information on
- Inside the Stack,
- TCP Problem Finder,
- TCP Response Time Monitor,
- Availability Checker,
- Network Health Check,
- TCP/IP classes
- Coming soon!
- EE Health Check
- Please contact us!
1-831-659-8360 or 1-866-464-3724
sales_at_inside-products.com sales_at_ESAIGroup.com
Australia: Blueline Software; UK: FitzSoftware; BENELUX: Adinsec BvBa