Title: On The Marginal Utility of Network Topology Measurements
1On The Marginal Utility of Network Topology Measurements
- John Byers
- with Paul Barford (now at Wisconsin), Azer Bestavros, and Mark Crovella
2Measurement Philosophy
- Current dogma: when conducting a wide-area measurement study, more is better.
- More measurements
- More measurement sites
- True, but taking more measurements and deploying more infrastructure is expensive!
- Our focus: how much better is more?
- Even harder: when can we stop measuring?
- Not much work on this topic in our community.
3Problem Instance: Discovering Internet Topology
- Typical goal: discover the router-level Internet graph
- Typical approach: merge lists of known nodes and edges
- Traceroute reports the IP path from A to B
- i.e., how IP paths are overlaid on the router graph
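The "merge lists of known nodes and edges" step can be sketched as follows. This is a minimal illustration, not the study's actual tooling: the function name `merge_traceroutes`, the hop labels, and the choice to treat links as undirected are all assumptions made for the example.

```python
def merge_traceroutes(paths):
    """Merge traceroute paths (lists of hop identifiers) into node and edge sets."""
    nodes, edges = set(), set()
    for path in paths:
        nodes.update(path)
        # Consecutive hops form a link; store each undirected edge in a canonical order.
        for a, b in zip(path, path[1:]):
            if a != b:
                edges.add((a, b) if a <= b else (b, a))
    return nodes, edges


# Hypothetical paths from source A to destinations B and C.
paths = [
    ["A", "r1", "r2", "B"],
    ["A", "r1", "r3", "C"],
]
nodes, edges = merge_traceroutes(paths)
print(len(nodes), "nodes,", len(edges), "edges")
```

The shared first hop (A to r1) is counted once, which is the overlap that makes later measurements less informative than earlier ones.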
4Traceroute studies
- Yield overlays of projections from Ss to Ds
- Sources: active, expensive
- Destinations: passive, cheap
[Figure: two sources (S) tracing routes to five destinations (D)]
5Motivating Questions
- How should we use traceroute and what can it discover?
- Physical topology (nodes, links)?
- IP routing topology?
- What's a good way to organize a collection-of-traceroutes study?
- Many sources?
- Many destinations?
- How much is enough?
6Theoretical Inroads
- Take a graph G = (V, E) and a routing algorithm R.
- Choose j sources and k destinations at random.
- Consider the subgraph G' = (V', E') induced by routes from R between all (S, D) pairs.
- How do the expected values of |V'| and |E'| scale as a function of j and k?
- Chuang-Sirbu scaling law is the special case j = 1.
- Marginal utility of adding the (k+1)-st source or destination is its expected contribution to |V'| or |E'|.
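The experiment described on this slide can be approximated by simulation. The sketch below makes several assumptions that are not in the talk: G is a random tree with extra random edges, R is shortest-path routing computed by breadth-first search with deterministic tie-breaking, and all function names are hypothetical.

```python
import random
from collections import deque


def random_connected_graph(n, extra_edges, rng):
    """Random tree plus extra random edges, returned as an adjacency dict."""
    adj = {v: set() for v in range(n)}
    for v in range(1, n):
        u = rng.randrange(v)  # attach v to an earlier vertex, keeping G connected
        adj[u].add(v)
        adj[v].add(u)
    added = 0
    while added < extra_edges:
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj


def bfs_parents(adj, src):
    """Shortest-path tree rooted at src (stand-in for the routing algorithm R)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):  # sorted for deterministic tie-breaking
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return parent


def induced_subgraph_size(adj, sources, dests):
    """Return (|V'|, |E'|) for the routes between all (S, D) pairs."""
    nodes, edges = set(), set()
    for s in sources:
        parent = bfs_parents(adj, s)
        for d in dests:
            v = d
            nodes.add(v)
            while parent[v] is not None:  # walk the route back toward s
                u = parent[v]
                edges.add((min(u, v), max(u, v)))
                nodes.add(u)
                v = u
    return len(nodes), len(edges)


def expected_sizes(adj, j, k, trials, rng):
    """Monte Carlo estimate of E[|V'|] and E[|E'|] for j sources, k destinations."""
    vertices = list(adj)
    total_v = total_e = 0
    for _ in range(trials):
        chosen = rng.sample(vertices, j + k)
        v, e = induced_subgraph_size(adj, chosen[:j], chosen[j:])
        total_v += v
        total_e += e
    return total_v / trials, total_e / trials


rng = random.Random(0)
adj = random_connected_graph(200, 100, rng)
for k in (1, 2, 4, 8, 16):
    v, e = expected_sizes(adj, 1, k, 50, rng)
    print(f"j=1 k={k:2d}  E|V'|~{v:.1f}  E|E'|~{e:.1f}")
```

The difference between the estimates at k+1 and at k is an estimate of the marginal utility of the (k+1)-st destination on this synthetic graph; nothing here says how the real Internet graph behaves.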
7What might we expect?
- Two extremal cases
- Clique: each new (S, D) discovers a new path
- Star: each new S or D discovers only a small neighborhood
[Figure: two panels, Clique and Star, with destinations marked D]
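The two extremes can be made concrete by counting discovered links under idealized routing. This is an illustrative sketch only: it assumes direct one-hop routes in the clique and routes through a single hub in the star, and the names `discovered_edges`, `clique_route`, and `star_route` are invented for the example.

```python
def discovered_edges(route, sources, dests):
    """Count distinct links seen when every source traces to every destination."""
    edges = set()
    for s in sources:
        for d in dests:
            path = route(s, d)
            edges.update(frozenset(hop) for hop in zip(path, path[1:]))
    return len(edges)


def clique_route(s, d):
    # In a clique every pair is directly connected.
    return [s, d]


def star_route(s, d):
    # In a star every route passes through the single hub.
    return [s, "hub", d]


sources = ["s0", "s1"]
for k in range(1, 6):
    dests = [f"d{i}" for i in range(k)]
    clique = discovered_edges(clique_route, sources, dests)
    star = discovered_edges(star_route, sources, dests)
    print(f"k={k}: clique={clique} links, star={star} links")
```

With j sources and k destinations, the clique yields j*k links while the star yields only j + k, so each added destination contributes j new links in the clique but a single new link in the star.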
8Skitter to the Rescue
- Two datasets from CAIDA
- Small dataset: May 2000
- 8 sources, 1277 destinations, 20K paths
- Sources in New Zealand, Japan, Singapore, San Jose (2), Ottawa, London, Washington
- All sources traced to all destinations
- Large dataset: October 2000, 30 times bigger
- 12 sources, 313709 destinations, 600K paths
- No destination common to all sources, or vice versa
9Interface Disambiguation
- Traceroutes report only on interfaces used
- Routers often have multiple interfaces
- But merging traceroutes requires matching routers
- Solution: probe each interface from some site X
- Routers are supposed to respond on the interface used for routing to X
- Results in set of (probe interface, response interface) pairs
- Each connected component is taken to be a router
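The last two bullets amount to a connected-components computation over (probe interface, response interface) pairs. A minimal sketch using union-find is shown below; the function name `group_interfaces` and the addresses (taken from documentation-reserved ranges) are assumptions for illustration, not data from the study.

```python
def group_interfaces(pairs):
    """Group interfaces into routers: one router per connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for probed, responded in pairs:
        union(probed, responded)

    routers = {}
    for iface in list(parent):
        routers.setdefault(find(iface), set()).add(iface)
    return sorted(sorted(group) for group in routers.values())


# Hypothetical probe results: (interface probed, interface that answered).
pairs = [
    ("192.0.2.1", "192.0.2.1"),
    ("192.0.2.2", "192.0.2.1"),
    ("192.0.2.3", "192.0.2.1"),
    ("198.51.100.7", "198.51.100.9"),
]
for router in group_interfaces(pairs):
    print(router)
```

Interfaces that never respond produce no pairs and so cannot be merged by this method.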
10Classifying Nodes
- Core, border, stub, leaf
- Solely from traceroute information
[Figure: example topology with nodes labeled Leaf, Border, Core, Stub]
11Classification depends on measurements
[Figure: nodes labeled Core, Stub, Border]
12Limitations and Caveats
- Interface disambiguation
- 13% of interfaces never responded
- Node classification
- Identifying a border node requires two paths to it
- Representativeness
- Datasets are small, may not be representative
- Skitter sources not selected at random
- Unknown coverage of true network
- Diminishing returns may not signify good coverage
13Diminishing Returns (Small Dataset)
14Diminishing Returns (Large Dataset)
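The plots for these two slides are not reproduced in this transcript. One way a diminishing-returns curve could be computed from a collection of traced paths is sketched below, assuming destinations are added one at a time in a fixed order and that the quantity of interest is the number of newly seen nodes. The function name and the toy paths are hypothetical.

```python
def marginal_discovery(paths_by_dest, order):
    """For each destination in order, count nodes not seen on earlier paths."""
    seen = set()
    gains = []
    for dest in order:
        new = set()
        for path in paths_by_dest[dest]:
            new.update(path)
        new -= seen  # keep only nodes this destination contributes
        gains.append(len(new))
        seen |= new
    return gains


# Toy example: one source "s", three destinations sharing upstream hops.
paths_by_dest = {
    "d1": [["s", "a", "b", "d1"]],
    "d2": [["s", "a", "c", "d2"]],
    "d3": [["s", "a", "c", "d3"]],
}
gains = marginal_discovery(paths_by_dest, ["d1", "d2", "d3"])
print(gains)
```

In practice the order would be randomized and the gains averaged over many orderings, since a single ordering can be unrepresentative.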
15Diminishing returns by Classification (Small Dataset)
[Figure: panels for Core, Stub, and Border nodes]
16What Does This Suggest?
[Figure: two sources (S) and six destinations (D)]
17Adding Destinations: Nodes
Slope is about 3
18Adding Destinations: Links
Slope is about 4
19Add Sources or Destinations?
Isolines represent constant node discovery, varying Ss or Ds
20Node Degree Distribution
[Figure: node degree distribution, 1 source vs. 8 sources]
21Node Degree Distribution Tail
[Figure: tail of node degree distribution, 8 sources vs. 1 source]
22Degree distribution convergence: RMSE
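The slide does not state exactly which distributions are compared. A plausible reading, sketched here as an assumption, is the root-mean-square error between the normalized degree distribution seen from a subset of sources and the one seen from all sources. Function names and the toy edge lists are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Fraction of nodes having each degree, from an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {d: c / n for d, c in counts.items()}


def rmse(p, q):
    """RMSE between two degree distributions over the union of their degrees."""
    support = sorted(set(p) | set(q))
    if not support:
        return 0.0
    total = sum((p.get(d, 0.0) - q.get(d, 0.0)) ** 2 for d in support)
    return math.sqrt(total / len(support))


# Toy graphs: "full" stands in for all sources, "partial" for fewer sources.
full = degree_distribution([("a", "b"), ("b", "c"), ("b", "d")])
partial = degree_distribution([("a", "b"), ("b", "c")])
print(round(rmse(full, partial), 4))
```

Tracking this value as sources are added gives one scalar measure of how quickly the observed degree distribution settles.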
23Information Theory Plug
- Can compare marginal utility of different processes.
[Figure: panels for Link Discovery and Node Discovery]
24Related Work
- Pansiot & Grad 98
- First multi-traceroute study
- Similar methodology, incl. interface disambiguation
- Chuang & Sirbu 98; Phillips, Shenker & Tangmunarunkit 99
- Single-source case, found sublinear growth of multicast tree with added destinations
- Govindan & Tangmunarunkit 00
- Extensive node discovery, overcoming limitations of traceroute
- Broido & Claffy 01
- Larger datasets, more detailed look at graph structure
25Conclusions
- Rigorous quantification of marginal utility of additional measurements.
- To discover all physical nodes, traceroute is inefficient
- Diminishing returns: many Ss and Ds needed
- Trading off Ss and Ds
- Adding destinations seems more cost-effective
- To discover how typical routes pass through network, traceroute is informative
- Routing core and feeders
- Much of routing core is visible from few Ss (given enough Ds)