Title: Inter-Domain Traffic
1Inter-Domain Traffic Engineering
Principles, Applications and Case Studies
2Who We Are
- Josh Wepman
- Applications Engineer/Snake Oil Salesman
- Ixia NetOps
- jaw_at_ixiacom.com
- Joe Abley
- Toolmaker/Engineer/Token Canadian
- MFN PAIX
- jabley_at_mfnx.net
3What We Are Talking About
- Inter-domain Measurement, Analysis and Control
- Improving Connectivity
- With whom?
- Where?
- At what speed?
4What we are NOT talking about
- MPLS
- DiffServ
- RSVP
- CR-LDP
- All sorts of other words with lots of capital
letters that have become associated with traffic
engineering
5Goals For The Afternoon
- Methods and Concepts on how to "improve"
inter-domain connectivity
- Depending on who YOU are, "improve" will have
different meanings
- Finding ways to reduce impact of failure in peer
or transit networks
- a.k.a. "increasing reliability"
- WARNING: Some operational complexity may arise!
- Put on your peril-sensitive glasses...
6Presentation Outline
- Inter-Domain TE Goals Definition
- Inter-domain TE Measurement
- Applying Data to Address Your Goals
- Eliciting Control and the Feedback-Loop
- Conceptual Examples
- Who is Doing This Stuff?
- Real_Live_Network Examples
- No Questions? Good!
7Inter-Domain TE Goals Definition
- Iteration-1 Conceptual
- Define Goals, Measure, Analyze, Refine Goals,
Action
- What is it you need to accomplish?
8Examples of Goals
- Need to offload my "NSFnet" peering links
outbound (congestion management)
- Need to expand my inter-domain peering links
cluefully (growth)
- Need to find some people to provide my services
to (sales)
- That's right, I said it: sell stuff!!!
9Adjusting Your Assumptions
- Be prepared to adjust your assumptions based on
measured data!
- What you planned to do, and what you end up doing
may change substantially.
- Do not fear - this is real network data!
- Clue should increase as valid network data
becomes available and is consulted
10Data Needs
- What data sets are required?
- Flow-export data
- BGP routing data
- Active measurement data
- SNMP
- Some public tools available (cflowd, zebra, ping,
scotty, etc.)
- Some commercial products available
11Inter-domain TE Measurement
- Also Known As
- Getting good, problem/goal specific data!
12Assumed Network Model
- Hierarchical Network Model
- Ingress/Egress Network services are separated
from Transit Services
- Works in other network models (as we will show),
but this is what we are focusing on...
13Hierarchical Network Model
[Diagram: hierarchical network model. Labels: Core Network Services, Core1, Core2, Peer1, Peer2, LocalASN, RemoteASN, AS2, AS3, AS4, AS9]
14Types of Data to Measure
- Routing Data
- Focus here is BGP
- Traffic Data
- Flow-export V5 is the focus here
- Active Measurement Performance Data
- Ping/Traceroute/One-way delay/Jitter
15Routing Data
- Routers generally do this well
- Core competency by design (Routers route...)
- Different data sets are available for
measurement
- IBGP (Good if you are looking at the whole
system, looking outbound or using a flat network
model)
- Route-Reflection (Often needed for inbound
analysis, can create some complexity in flat
network models)
- EBGP (Good for seeing your neighbor's view of
you)
- Choose the right one to measure based on your
needs/goals
16Routing Data In/Outbound
[Diagram: IBGP vs. Route-Reflection. Labels: Core Network Services, Core1, Core2, Collector, Peer1, Peer2, Data, Routes, LocalASN, RemoteASN, AS2, AS3, AS4, AS9]
17Routing Data In/Outbound
- When your goal is outbound characterization, and
your measurement point is the exit point for
traffic, IBGP is your guy/girl/other.
- Routes are always external, and thus always
propagated (sans election and policy of course)
- Protocols hate being anthropomorphized
- When your goal is inbound characterization, and
your measurement point is the entry point for
traffic, Route-Reflection must be used.
- Only way to get internal routes cleanly
18Route Data Full Mesh (tangent)
- Value of full mesh monitoring
- Historical route tracking
- Policy benchmarking
- Tracking MED-selection issues
- Identifying disasters the FIRST time cluefully
- Don't just wait for it to happen again!
- PLEASE! For everyone's sake!
- Slightly off topic, but pretty darn important!
19Route Data Full Mesh (pic)
[Diagram: full-mesh route monitoring. Labels: one Collector and multiple Core1/Core2 routers]
20Traffic Accounting Data
- Also Known As
- Flow-export
- NetFlow
- Cflow
- A MAJOR pain in the AS!
21The Quick Skinny on Flow
- Packet and Byte counters per unique set of
traffic attributes
- Measured from strategic routers per input
interface
- Which interfaces depends on your defined
goals/needs...
- Has come a long way in the last few years
- In some respects?
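A minimal sketch of "counters per unique set of traffic attributes", in Python. The key fields (input interface, addresses, protocol, ports), the interface names and the sample packets are illustrative assumptions, not the exact flow-export V5 key.

```python
from collections import defaultdict

# Hypothetical packet observations:
# (input_if, src, dst, proto, sport, dport, length_bytes)
packets = [
    ("if1", "192.0.2.1", "198.51.100.7", 6, 1025, 80, 1500),
    ("if1", "192.0.2.1", "198.51.100.7", 6, 1025, 80, 40),
    ("if2", "203.0.113.9", "198.51.100.7", 17, 53, 53, 120),
]

def build_flows(packets):
    """Return {flow_key: [packet_count, byte_count]}."""
    flows = defaultdict(lambda: [0, 0])
    for *key, length in packets:
        counters = flows[tuple(key)]  # one entry per unique attribute set
        counters[0] += 1              # packet counter
        counters[1] += length         # byte counter
    return dict(flows)

flows = build_flows(packets)
```

The first two packets share every attribute, so they collapse into a single flow entry. The third arrives on a different input interface and is counted separately.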
22Flow Data Inbound - Easy
[Diagram: inbound flow collection. Labels: Core Network Services, Core1, Core2, Collector, Peer1, Peer2, Data, Routes, LocalASN, RemoteASN, AS2, AS3, AS4, AS9]
23Flow Data Outbound - Easy
[Diagram: outbound flow collection. Labels: Core Network Services, Core1, Core2, Collector, Peer1, Peer2, Data, Routes, LocalASN, RemoteASN, AS2, AS3, AS4, AS9]
24Flow Data Outbound - Harder
[Diagram: non-hierarchical topology. Labels: AS2, AS3, AS4, AS6 and five Core routers]
25Flow Data Outbound - Harder
- Since flow-export data is inbound only, all
potential feeder links in a non-hierarchical,
mixed services device must be accounted for in
order to catch all traffic outbound
- Issue: How do you know what data coming in core
link4 is bound for the local external link? Route
Reflection is bad here! Can double-count!
- Problem exacerbated by complex policy
2618 Words or less on flow data
- Micro-management of networks based on flows: BAD
- Macro-management of networks based on flows: GOOD
27Operational Challenges (1)
- Keep this in mind!
- Gilb's Law
- Anything can be measured in a way that is
superior to not measuring it at all.
28Operational Challenges (2)
- ACLs vs. data-export in the great beast!
- Sampled NetFlow on the GSR is usually distributed
to the LCs
- ACL > SNF > PIRC > IP Coloring > BGP Policy
accounting > FR Traffic policing which is not FR
traffic shaping
- Apparently this changes in 12.0(18)S
29Operational Challenges (3)
- Some releases of JUNOS have bugs where only flow
data from the highest-numbered ifIndex gets
exported
- Check for PR20159
30Operational Challenges (4)
- On high-speed interfaces, the best you can
realistically do is sample at some ratio < 1:1
- If you need to count bytes, this will introduce
errors
- If you need to compare samples, make sure the
samples are normalized
- This does NOT mean multiply by interval!
- Lack of current research on statistical validity
of flow data based on samples
- Last research circa 1993
- Research predates substantial HTTP traffic
31Operational Challenges (5)
- The Gilb-Wepman Construct
- The total P.I.T.A. factor experienced through
the process of network measurement is far less
than the total P.I.T.A. factor experienced through
planning and engineering a network without
network measurements.
- P.I.T.A. = Pain In The Ass
- those without customers may be unfamiliar with
this term
32Performance Data
- Active measurement
- Round-trip vs. one-way
- mrtg and link utilization
- Important, but not part of our examples
- Short on time sadly
- Helps in goal selection and re-selection
- Bottom line: is it better or worse?
33Applying Data to your Goals
- What to do with all this data?
- Traffic Accounting Data applied to Routing data?
- Traffic Load per <something>
- attribute or route
- The focus here is on traffic stats (byte and
packet rates) per AS-PATH
34AS-PATH / Traffic-data tables
- Traffic load per AS-PATH creates a tree of
traffic relationships
- (101) X-bits/sec
- (101,1234) Y-bits/sec
- (101,1234,9995) Z-bits/sec
- 101 -> 1234 -> 9995
- X+Y+Z -> Y+Z -> Z
- Addresses the middle mile ASs instead of
traditional first or last ASN.
- Allows "TO" (source/sink) and "THROUGH" (transit)
values instead of just "TO" values.
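A rough Python sketch of turning per-AS-PATH rates into "TO" and "THROUGH" values. The rates and the function name `to_and_through` are made up for illustration. The AS numbers follow the slide.

```python
from collections import defaultdict

# Hypothetical measured rates (bits/sec) keyed by full AS-PATH.
path_rates = {
    (101,): 300,             # X
    (101, 1234): 200,        # Y
    (101, 1234, 9995): 100,  # Z
}

def to_and_through(path_rates):
    """Split traffic per AS into TO (AS is last in path) and THROUGH (AS is transit)."""
    to_rate = defaultdict(int)
    through_rate = defaultdict(int)
    for path, rate in path_rates.items():
        origin = path[-1]
        to_rate[origin] += rate
        # set() so a prepended ASN is counted once per path
        for asn in set(path) - {origin}:
            through_rate[asn] += rate
    return dict(to_rate), dict(through_rate)

to_rate, through_rate = to_and_through(path_rates)
# Total seen at each AS along 101 -> 1234 -> 9995
totals = [to_rate.get(a, 0) + through_rate.get(a, 0) for a in (101, 1234, 9995)]
```

With these numbers `totals` works out to X+Y+Z, Y+Z and Z, matching the tree on the slide. The `through_rate` values are what expose the middle-mile ASs.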
35Data Aggregation - Time
- Aggregate data over timeframes (macro-level view)
- Long term averages
- Short term benchmarks
- Of course, short term means long term.
- Micro-management of networks based on flows
- BAD!
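One way to get the macro-level view, sketched in Python: bucket rate samples into fixed time windows and average, then average the buckets for a long-term figure. The sample data, the one-hour bucket size and the function name are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (unix_timestamp, bits_per_second) samples for one AS-PATH.
samples = [(0, 100), (300, 200), (3600, 400), (3900, 600)]

def bucket_averages(samples, bucket_seconds=3600):
    """Average the samples that fall into each time bucket."""
    buckets = defaultdict(list)
    for ts, rate in samples:
        buckets[ts - ts % bucket_seconds].append(rate)
    return {start: mean(rates) for start, rates in sorted(buckets.items())}

hourly = bucket_averages(samples)          # short-term benchmarks
long_term = mean(list(hourly.values()))    # long-term average
```

Decisions are then made on `hourly` and `long_term`, not on individual flow records.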
36Data Aggregation - Interfaces
- Aggregate across the set of interfaces that
represent your problem statement
- What interfaces am I interested in?
- Can be interface specific (one)
- Can be router specific (many)
- Can be domain wide (all)
- Can be N of M interfaces (some)
- Pretty common
37What to do with all this?
- What does one do once they have all this data?
38Eliciting Control and The Feedback Loop
- Sit down, Josh
- Begone with your Snake Oil
- It's time to beat on some routers
39Assumptions about your Routing Architecture
- Routes to external networks are in BGP
- Your IGP tells you how to find the NEXT_HOP
addresses in BGP
- We select exit points for traffic based on BGP
path selection, not some other weird thing
- If your routing policy differs significantly from
this, you have more problems than measurement can
solve
40Fixing Outbound Traffic
- Mark policy on BGP routes at the place where you
learn them
- General policy -- prefer peering links over
expensive transit links, prefer private peering
links over public peering links
- Specific policy -- temporarily avoid NAP X for
traffic to AS Y, prefer AS C to reach remote
network D
41Tweakable Knobs
- LOCAL_PREF
- MED
- AS_PATH
- Check your vendor's BGP path selection tiebreaker
list, and choose a set of knobs that gives you the
kind of control your policy dictates
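A deliberately simplified Python sketch of how the three knobs interact. Real tiebreaker lists are longer and vendor-specific, and MED is normally only compared between routes from the same neighboring AS. This sketch ignores both points. The `Route` class, exit names and AS numbers (from the documentation range) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Route:
    exit_name: str
    local_pref: int
    as_path: tuple
    med: int = 0

def best_route(routes):
    """Higher LOCAL_PREF wins, then shorter AS_PATH, then lower MED (simplified)."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))

# Hypothetical candidate routes to the same remote prefix.
routes = [
    Route("transit", 100, (64500, 64510)),
    Route("public-peer", 200, (64501, 64502, 64510)),
    Route("private-peer", 200, (64501, 64510), med=10),
]
best = best_route(routes)
```

Here `transit` loses on LOCAL_PREF despite its short path, and `private-peer` beats `public-peer` on AS_PATH length. Ordering matters: a knob lower in the list only counts when everything above it ties.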
42Control of Outbound Traffic
- Danger, Will Robinson!
- Helpdesk phone may ring
- Small change, pause, check, log, pause, breathe,
repeat
- Exit selection is a reasonably precise science
43Fixing Inbound Traffic
- Controlling inbound traffic flow is all about
trying to influence the BGP path selection
decisions which happen in networks you don't
control
- Some of those networks you pay money to. Money is
sometimes an appropriate weapon
- It's nice to buy people drinks at NANOG
44Tweakable Knobs
- Provider-specific knobs
- whois -h whois.ra.net as1755
- CIDR abuse
- Cheap trick
- Longest prefix wins
- AS_PATH stuffing
- AS_PATH pollution
- Another cheap trick
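Why "longest prefix wins" works as a cheap trick, sketched in Python: forwarding picks the most specific matching prefix, whatever the other BGP attributes say. The prefixes, provider names and function name are hypothetical.

```python
import ipaddress

def longest_match(table, address):
    """Return the (network, via) entry with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    matches = [(net, via) for net, via in table if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen, default=None)

# Hypothetical view from a remote network: the aggregate is heard via
# provider A, a more-specific half of it via provider B.
table = [
    (ipaddress.ip_network("198.51.100.0/24"), "provider A"),
    (ipaddress.ip_network("198.51.100.0/25"), "provider B"),
]

low_half = longest_match(table, "198.51.100.10")    # covered by the /25
high_half = longest_match(table, "198.51.100.200")  # only the /24 matches
```

Traffic to the lower half is pulled in via provider B, while the rest still follows the aggregate. Every such more-specific costs a routing table slot in everyone else's routers.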
45Responsible Citizenship
- Some tweakable knobs have an unwelcome impact on
the networks of others
- Have you met my friend, MED?
- Your relationship with your target networks is
symbiotic
- It is inappropriate to make demands of someone
else's routing policy, but asking nicely is OK
46Conceptual Examples (1)
- Who are the top consumers of my network
resources?
- Top sources of traffic
- Top sinks of traffic
- Asymmetry
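A small Python sketch of the three questions above, assuming per-AS byte totals have already been aggregated over the same measurement window. The AS numbers, byte counts and the asymmetry formula are illustrative assumptions.

```python
# Hypothetical byte totals per remote AS.
bytes_from = {64496: 900, 64497: 100, 64498: 500}  # they source traffic toward us
bytes_to = {64496: 100, 64497: 800, 64498: 500}    # they sink traffic from us

def top(counts, n=2):
    """Return the n ASNs with the largest byte counts."""
    return sorted(counts, key=counts.get, reverse=True)[:n]

def asymmetry(asn):
    """+1.0 means all inbound, -1.0 all outbound, 0.0 balanced."""
    inbound, outbound = bytes_from.get(asn, 0), bytes_to.get(asn, 0)
    total = inbound + outbound
    return (inbound - outbound) / total if total else 0.0

top_sources = top(bytes_from)
top_sinks = top(bytes_to)
```

With these numbers AS 64496 is mostly a source, AS 64497 mostly a sink, and AS 64498 is balanced.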
47Conceptual Examples (2)
- Traffic Aggregation Points and Peering
Optimisation
- Appropriate network expansion
- Offloading the expensive peer
- Mitigating settlement fees and traffic ratios
- Mitigating congestion
- Do it without MED selection issues
- Maximize route availability (N>1 copies, not 1 or
0)
48Conceptual Examples (3)
- Theft-over-IP (how to know when peers are
stealing from you)
- Peers dumping traffic at you for routes you
didn't send them
- Rather rude
- Catch them in the act
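One way to catch them in the act, sketched in Python: compare the destinations of flows arriving on a peering interface with the prefixes actually announced to that peer. The prefix list, flow records and function name are hypothetical.

```python
import ipaddress

# Hypothetical set of prefixes we announce to this peer.
announced = [ipaddress.ip_network("192.0.2.0/24")]

# Hypothetical flow records seen inbound on the peering interface:
# (destination_address, bytes)
peer_flows = [("192.0.2.10", 5000), ("203.0.113.5", 70000), ("192.0.2.99", 300)]

def unannounced_bytes(flows, announced):
    """Sum bytes per destination not covered by any prefix sent to the peer."""
    offenders = {}
    for dst, nbytes in flows:
        addr = ipaddress.ip_address(dst)
        if not any(addr in net for net in announced):
            offenders[dst] = offenders.get(dst, 0) + nbytes
    return offenders

suspicious = unannounced_bytes(peer_flows, announced)
```

Anything left in `suspicious` is traffic the peer sent for a destination it never heard from us.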
49Who is doing this stuff?
- Yahoo! - Jeffrey Papen (TUNDRA Tool)
- Peering Analysis, Capacity Planning, Performance
Analysis
- Features
- Custom macros for AS analysis
- Source and Destination AS bandwidth details
- Transit AS (hop counts) bandwidth summary data
- Bandwidth forecasting & peering merit analysis
- Billing formulas for cost/benefit budget analysis
- Also
- Analyze internal usage for Charge Back Billing
- POP-to-POP Network Performance Analysis (latency
/ loss)
- DOS attack detection
50Destination vs. Transit Traffic UUNet (Yahoo
TUNDRA Output)
51Who is doing this stuff?
- MFN
- Lots of people, we think
- Not enough people, we think
52Real Live Network Examples 1
- We peer with a particular large regional ISP in
several places. Due to various familiar reasons,
the demands on the peering circuits approach
supply
- Who are the top talkers and top listeners that we
reach via this peer?
- Maybe we can peer with them directly
- Not just sinks, but traffic aggregation points
(middle mile)
53Network Facts
- Topology is not pure core/edge in some locations,
so we might expect some complexities
- All peering routers happen to be GSR12000s
- Peering circuits are all OC12
- Backbone links are mostly OC48
54Data Collection
- Relative traffic volumes
- Low NetFlow sample ratio is OK
- Turning on ip route-cache flow sampled seems
like it can cause traffic belches
- Turn off all inbound ACLs on peering interfaces
- Turn off all outbound ACLs on peering routers
- Drink from the Hose
- Take off every /var
55Analysis of Data
- Relative byte count through and to networks
reached through the peer in question
- Ranked list of peering candidates
- Absolute numbers don't really matter; we have a
list of people we should be talking to, in order
of how useful they would be to peer with
56SeeASP Output
57Real Live Network Examples 2
- AS R wants to peer
- That's fine, we'll public peer with anybody.
We're easy.
- AS R wants to private peer right away, since they
say we send them 140M of traffic already
- Can we confirm those numbers before we dedicate a
port to them?
58Network Facts
- We currently reach AS R through AS T
- We peer with AS T in six places
- One of the peering routers is a 7500, which
doesn't do SNF
- One of the peering routers is a router which is
also being used to collect data to answer the
previous question
59More Network Facts
- Topology is not edge/core everywhere
- We want numbers out of this, so we need to manage
the SNF ratios
- K1dd13s keep attacking the routers
- Ops folk attack K1dd13s with ACLs
- The ACL attacks the SNF
- The SNF dies!
60Analysis
- We only have traffic samples, but we want
absolute numbers
- We have interface byte and packet counters
- We can take AS R traffic as a proportion of all
AS T traffic, and divide up the mrtg/duck data in
proportion
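The proportional split above, as a minimal Python sketch. The sample counts and the counter rate are made-up numbers, and `estimate_rate` is a hypothetical helper name.

```python
def estimate_rate(sampled_bytes_r, sampled_bytes_t, counter_rate_t):
    """Scale the absolute interface rate toward AS T by AS R's share of the samples."""
    if sampled_bytes_t == 0:
        raise ValueError("no sampled traffic for the transit AS")
    return counter_rate_t * sampled_bytes_r / sampled_bytes_t

# Hypothetical: 250 of every 1000 sampled bytes sent via AS T are destined
# for AS R, and the interface counters show 400 Mbit/s toward AS T.
estimate = estimate_rate(250, 1000, 400e6)
```

The samples only supply the ratio. The absolute number comes from the interface counters, so the sampling rate itself drops out of the calculation as long as it is the same for both sample counts.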
61Summary
- What did we talk about?
- Answering specific, ad-hoc questions by attacking
them with numbers
- Inter-Domain Traffic Engineering is an Iterative
process (lather, rinse, repeat)
- What didn't we talk about?
- Experience exporting from Juniper (and other
non-cisco) routers
- Construction of a full-time, general-purpose
measurement infrastructure
- What if my vendor does not support flow-export
and traffic accounting?
- Questions?
- No? Good.