AHBHA: Managing Congestion through Adaptive HopByHop Aggregation

Slides: 86

Provided by: peopleCs

Learn more at: http://people.cs.uchicago.edu

Transcript and Presenter's Notes

1
AHBHA: Managing Congestion through Adaptive
Hop-By-Hop Aggregation

Michael Greenwald, University of Pennsylvania

2
What is congestion?

Congestion: Applications/clients present a larger
aggregate load than intermediate nodes in the
network can handle.
Congestion Control: mechanism that ensures the
network remains manageable under overload.
Much more difficult than the related problem of
Flow Control: participants are unaware that they
share a resource

3
What causes congestion? (Isn't bandwidth cheap?)

Persistent congestion: solved by adequate
provisioning (True, bandwidth is cheap).
Cause: Intermittent high load
Intermittent emergency (earthquake, 9/11)
Extreme loads (expected: Mother's Day.
Unexpected: Pathfinder pictures)
Periods of growth
DOS or DDOS attack
Effect Congestion
Bursty traffic (statistical multiplexing)
Many sources converge on a single link
Low capacity link becomes bottleneck
subset of multicast destinations

4
Why must congestion be controlled?

Congestion collapse
Links clogged with useless packets that will be
dropped anyway, or are retransmissions, or are
out-of-date
Long Delay
Relevant mostly for short transactions over long
distances.
Variability in delay (jitter)
Drop rate
Not a problem in itself, since packets are only
dropped if they can't make it through the bottleneck
anyway, but
Use up bandwidth on other links before being
dropped.
Control over which packets get dropped?
Low Utilization (inefficiency)
Fairness

5
How is congestion controlled?

Slow-start/congestion avoidance
Losses-per-epoch/Fast Retransmit
Full Buffers, tail-drop -> RED
Non-compliant flows -> FRED, Penalty box etc.
Pkt drop/buffer size is noisy signal -> Vegas
Adjust parameters -> BLUE, ARED
Avoid packet loss -> ECN
Explicit feedback -> XCP
Fat pipes -> TCP FAST
Fairness, lossy wireless -> TCP Westwood
Mice, Fairness, QOS, bad RTE

6
Response

Concern with robustness, efficiency, and fairness
Control theoretic approach
Stability, convergence
Make world safe for control theory
Controller reacts as quickly as signal changes
Know RTT, react quickly, change slowly
Response to feedback must be predictable
Behavior of aggregate independent of # of flows.
Behavior of client/application/transport
predictable.

7
A Different View

Complexity: Epicycles on epicycles
Fragility
The end-to-end argument misinterpreted
Trapped by success, religious dogma, need for
field testing
Congestion control common to all clients
Don't optimize for a particular application, even
TCP

8
A Different View

View from routers: predictable response to
feedback
View from hosts: delivery fabric with predictable
congestion feedback
Stark contrast with current system
Extreme example: Aggregated small TCP flows do
not exponentially decrease or linearly increase.
1,000,000 flows, so window for each flow is small
(approx. 1)
Congestion notifies 10% of the flows, decrease by
at most 10% of packets.
Regardless, each of 1,000,000 flows increases
cwnd by 1 each RTT, effectively doubling rate.
(alternatively, larger fraction in SlowStart)
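
The arithmetic behind this extreme example can be sketched in a few lines. The function name and the model are assumptions made for illustration: every notified flow gives up its whole one-packet window (the largest possible decrease), and every flow then adds one packet.

```python
def aggregate_load_after_rtt(n_flows, cwnd, notified_percent):
    """Packets per RTT offered by the aggregate after one congestion signal."""
    before = n_flows * cwnd
    # Upper bound on the decrease: notified flows give up their entire window.
    decrease = before * notified_percent // 100
    # Every flow, notified or not, then grows its window by one packet.
    increase = n_flows
    return before - decrease + increase


before = 1_000_000 * 1
after = aggregate_load_after_rtt(1_000_000, 1, 10)
print(after / before)  # ratio of new aggregate load to old aggregate load
```

Even with the most pessimistic decrease, the aggregate nearly doubles its load in one RTT, which is the sense in which it neither decreases exponentially nor increases linearly.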

9
AHBHA: Adaptive Hop-by-Hop Aggregation: A simple
idea
Router architecture
Interconnect

Hop-by-hop feedback and controller at each node.
Why any different than CreditNet (Kung) or HBH
(Kanakia)?
Aggregate flows based on purely local
characteristics (next-hop X TOS X QOS) X input
What about head-of-line blocking? Local vs.
global behavior? Isolation of congestion?
Transitive renaming of congested links
Based on observation that most of the net is
adequately provisioned.

10
Controlling Utilization of a Resource

Consider a Flow Queue with a current queue
length, a known rate capacity, and a set of input
flows
Capacity may be a physical limit for a physical
link, or a rate limit imposed by a neighbor for
finer grained flow.
Input flows may be local flows sharing a single
physical output link, or may be flows coming in
from neighbors.
Queue length > threshold triggers congestion
control (at most once per RTT)
Must determine whether queue growth is due to
burstiness or due to input rate exceeding output
rate
If the former, smooth inputs; if the latter, throttle
neighbors.

11
Controlling Utilization of a Resource

MAIR (Mean Aggregate Input Rate) = Sum (over
inputs) of (smoothed) number of packets /
interval
If MAIR < output capacity, then input flows need
only be smoothed.
Pacing: 1/pkts-per-RTT
If MAIR > output capacity, then input rates must
be reduced
Acceptable rates computed in 2 passes.
BaseAllocation = Capacity/Nflows  // Can use weights per flow instead.
ExcessAllocation = 0; UncontrolledFlows = 0
for flow in inputs
  if 2*flowRate < BaseAllocation then
    ExcessAllocation += BaseAllocation - 2*flowRate
    UncontrolledFlows++
end
FairAllocation = BaseAllocation +
ExcessAllocation/(Nflows - UncontrolledFlows)
Send FairAllocation if input rate >
FairAllocation
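
The two-pass allocation can be written out as a small runnable sketch. The function name, the return shape, and the handling of the case where every flow is uncontrolled are assumptions for illustration; the slide only gives the pseudocode.

```python
def fair_allocation(capacity, flow_rates):
    """Return (FairAllocation, indexes of inputs that should be sent a limit)."""
    if not flow_rates:
        raise ValueError("need at least one input flow")
    n_flows = len(flow_rates)
    base = capacity / n_flows  # BaseAllocation; per-flow weights could replace the equal split
    excess = 0.0
    uncontrolled = 0
    # Pass 1: flows using less than half of their base share stay uncontrolled
    # and donate the share they cannot reach even after doubling.
    for rate in flow_rates:
        if 2 * rate < base:
            excess += base - 2 * rate
            uncontrolled += 1
    controlled = n_flows - uncontrolled
    if controlled == 0:
        return base, []  # assumption: nothing to throttle when no flow is near its share
    fair = base + excess / controlled
    # Pass 2: only inputs currently above FairAllocation are sent the limit.
    throttled = [i for i, rate in enumerate(flow_rates) if rate > fair]
    return fair, throttled


print(fair_allocation(100, [5, 10, 60, 70]))
```

The factor of 2 in the first pass leaves room for an uncontrolled flow to double its rate within the next interval without exceeding its base share.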

Input from fairness controller
Drain queue in 2 RTT
12
Renaming to Isolate Congestion
R1
R2

R1->R2 becomes congested.
QuarantineSet = {N : NextHop(N)@R1 = R2}

13
Renaming to Isolate Congestion
R1
R2
R1

Create artificial node R1'
QuarantineSet = {N : NextHop(N)@R1 = R2}
Routing update to all neighbors of R1 advertising
R1' as best path to N.

14
Renaming to Isolate Congestion
R1
R2
R1

If queue to R1' is congested, recurse: split the
artificial node and advertise to input queues.

15
Releasing control

Record time of transition from uncontrolled to
controlled.
Record time of most recent (last) congestion
event.
NewCongestionInterval = LastCongestionEvent -
FirstCongestionEvent
CongestionInterval = max(NewCongestionInterval,
OldCongestionInterval)
If ((now - LastCongestionEvent) >
CongestionInterval)
  Release control
  OldCongestionInterval = max(OldCongestionInterval/2,
  NewCongestionInterval)
// Release state after 10 * CongestionInterval w/
no congestion
// OldCongestionInterval > 4*max(IRTT,ORTT)
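
The release rule can be sketched as a small state object. The class and method names are hypothetical, times are plain numbers in arbitrary units, and the two commented-out conditions from the slide (state release after 10 intervals, the lower bound on OldCongestionInterval) are left out of this sketch.

```python
class ControlState:
    """Tracks one controlled aggregate and decides when to release control."""

    def __init__(self, first_event, old_interval):
        self.first_congestion_event = first_event   # uncontrolled -> controlled transition
        self.last_congestion_event = first_event
        self.old_congestion_interval = old_interval
        self.controlled = True

    def on_congestion(self, now):
        # Record the most recent congestion event.
        self.last_congestion_event = now

    def maybe_release(self, now):
        new_interval = self.last_congestion_event - self.first_congestion_event
        interval = max(new_interval, self.old_congestion_interval)
        # Release only after a quiet period longer than the congested period.
        if self.controlled and now - self.last_congestion_event > interval:
            self.controlled = False
            self.old_congestion_interval = max(
                self.old_congestion_interval / 2, new_interval
            )
        return not self.controlled


state = ControlState(first_event=0.0, old_interval=4.0)
state.on_congestion(3.0)
print(state.maybe_release(6.0), state.maybe_release(7.5))
```

Halving the remembered interval on release lets the hysteresis decay when congestion episodes get shorter, while the max keeps it from shrinking below the episode just observed.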

16
Miscellaneous Details

Uncontrolled flows
XMIT: If X packets sent in RTT interval t0, then
at most 2X packets in interval t1
Periodic CC packets between immediate neighbors
List controlled flows and rate
Once per Max(RTT/2, 20 PacketTimes)
If no CC pkt from N in RTT interval, then all
flows are controlled at X/2.
CC packets high priority
Compute RTT
Assume max rate to neighbors is known.
Good assumption for dedicated lines
May need to be estimated for Ether/shared channel
or multi-hop neighbors
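
The three numeric rules on this slide are simple enough to state as code. The function names are hypothetical, and the units (seconds for times, packets for counts) are an assumption.

```python
def max_packets_next_interval(sent_last_interval):
    """Uncontrolled flow: at most 2X packets in t1 if X were sent in t0."""
    return 2 * sent_last_interval


def cc_packet_period(rtt, packet_time):
    """Congestion-control packets go out once per Max(RTT/2, 20 PacketTimes)."""
    return max(rtt / 2, 20 * packet_time)


def fallback_rate(x):
    """No CC packet from a neighbor within an RTT: all flows controlled at X/2."""
    return x / 2


print(max_packets_next_interval(8), cc_packet_period(1.0, 0.125), fallback_rate(8))
```

The doubling cap is what bounds the input increase per interval, which the later slides cite as an intuitive reason to expect stability.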

17
Advantages

Works for Mice, Elephants, Non-TCP flows
Long-delay flows ramp up in log(n) round-trips:
high utilization
Doesn't treat loss as congestion signal
Not sensitive to parameters
Fairness decoupled from CC mechanism: agnostic on
policy, or policy delivery mechanism. Can work
with either packet marking (diffserv) or
flow-weighting (periodic packets from src to dst,
providing per-flow weights).
Aggregation + per-hop control makes flows
smoother and less self-similar
Response time to source comparable to e2e packet
loss.

18
Serendipity

Buffer sizes
Per-link rather than cross-network
1 buffer per neighbor, rather than 1 per flow
Broken routers, misbehaving hosts, DOS attacks
Multicast
Simplifies TCP

19
Preliminary Observations

Significantly simpler than current world (let
alone more complex world)
AHBHA comparable in all cases. Never
significantly better. Simulations using ns2
RED (varying capacities loads), Floyd TCP
Friendly, FAST (), TCP WESTWOOD(), XCP (),
AHBHA Regions Legacy, Defective routers, DOS,
Short flows (1 pkt), Mbottle, SimpleTCP, w/ECN
to source
() Compared to results in paper, () needed to
compile separate versions of ns2
Works with non-cooperating clients and routers.

20
Preliminary Non-Results

Stability not proven
Convergence not proven
Convergence-time not established
(On the other hand, intuitive reasons to believe
stable e.g. bounded input increase,
superposition of stable systems, RTTs are equal
(not just by assumption))
If many congestion points, then lose congestion
isolation

21
Unresolved Issues with Naming Controlled
Aggregates

1 bit for renaming next hop? B bits? Exhaustive
list?
Aggregate by next hop? Or 2nd hop (horizon
effect)?
More hysteresis in determining CongestionInterval,
rather than Releasing control after quiet
period.
Right choices depend on patterns of congestion in
real network.
Measurement required.

22
cing: Measuring Network-Internal Delays using
only Existing Infrastructure

Joint work with Kostas Anagnostakis (University
of Pennsylvania) and Raphael Ryger (Yale
University)

23
Remote measurement of per-link delays

Network measurement techniques
Understanding of control mechanisms (such as TCP
congestion control) --- both results and workload
Gain insights into network performance
Fault Isolation, Error reporting
Curiosity switch providers?
Network parameters such as delay, loss, and
throughput are easy to measure end-to-end
Network parameters such as delay, loss, and
throughput are difficult to measure on individual
links inside the network.

RESEARCH
MANAGEMENT
USER
24
Understand your tools: Know yourself

How accurate are the results?
Why do we believe it is accurate?
What are its limitations?
Answering these questions is difficult, sometimes
surprising, and results in a much better tool.

25
Network Delay Tomography: A Brief History
From a remote source, S, estimate the
distribution of link delays
[Figure: example topology with labels S, 1, 2, X1-X3, A, B, C and a1-a5]

Direct measurement, using existing tools (e.g.
pathchar)
<RTT to tail> - <RTT to head> yields RTT on link
(TTL-expired responses)
Only existing infrastructure: measure anywhere
w/o cooperation.
But
ICMP responses representative?
Asymmetric paths: return paths vary, so (tail -
head) may not be meaningful.
Round trip vs. one-way delay?

26
Network Delay Tomography: A Brief History
From a remote source, S, estimate the
distribution of link delays
[Figure: example topology with labels S, 1, 2, X1-X3, A, B, C and a1-a5]

Indirect inference methods (e.g. MINC project)
One packet to multiple receivers, and correlate
behavior on links in the resulting tree
But
Deployability (works best with multicast, need
cooperating rcvrs)
Accuracy (assumes independence of delay, quality
of estimates degrades over longer paths)
Robustness (high variance in error)
Computational complexity
Need for many samples, therefore much time

27
Network Delay Tomography: A Brief History
From a remote source, S, estimate the
distribution of link delays
[Figure: example topology with labels S, 1, 2, X1-X3, A, B, C and a1-a5]

Direct methods (e.g. cing project)
f(<Timestamp to tail>, <Timestamp to head>) yields
delay on link
No infrastructure required, highly accurate,
strong experimental validation
But
Packet pair may not encounter equal queues
ICMP processing may not be representative
Clocks are unsynchronized
Routing irregularity, so not always applicable

28
Network tomography: a direct method

Use router ICMP Timestamp messages and
packet-pair probes to directly estimate queuing
delay

[Figure: probe diagram with labels 1, 2, 3, A, B; delay split into a fixed component (propagation delay) and a variable component (queueing delay)]
Account for the fixed component by subtracting the
min time over the set of observations
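
The minimum-subtraction step can be sketched as follows. The function name is hypothetical, and the sketch assumes each sample is already the per-link delay (timestamp at the tail minus timestamp at the head) in milliseconds, and that at least one probe crossed the link with an empty queue.

```python
def queueing_delays(link_delay_samples):
    """Remove the fixed (propagation) component from per-link delay samples."""
    if not link_delay_samples:
        raise ValueError("need at least one observation")
    fixed = min(link_delay_samples)  # smallest observation approximates the fixed delay
    return [sample - fixed for sample in link_delay_samples]


print(queueing_delays([12, 10, 15, 10]))
```

Because the clocks are unsynchronized, the minimum also absorbs any constant clock offset between the two routers, which is why only differences between observations matter here.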
29
Question your assumptions
[Figure: probe diagram with labels 1, 2, 3, A, B]

Feasibility: basic mechanism supported? Accuracy?
Stability of routing? etc.
Do back-to-back packets really experience the
same delay on their shared path?
Are ICMP processing times indicative of
processing time for normal packets?
How to account for differing offset and skew on
clocks?
Are the paths to adjacent nodes coincident?

30
Back-to-back packets

Do packets arrive back-to-back?
Do back-to-back packets experience identical
queuing delays and process time? (and stay
back-to-back?)
Distinctions are irrelevant to the algorithm: the
issue is simply the difference in timestamped value.
Experiment: Probe routers with varied load and
varied path length from source

7300 routers
This issue common to all algorithms
31
ICMP Processing Time
[Figure: topology with labels S, X1-X3, A, 2 and a1-a5; a cooperating rcvr allows a spoofed src; arrows labeled req, spoof, response]

Send direct first, so queuing delays err
conservatively and overestimate ICMP processing
time.
Median processing time always negligible
95th percentile usually negligible

[Plot legend] Dot: median; box: interquartile range; bars:
5th-95th percentile; dots are outliers.

This issue common to all direct measurements that
use ICMP
Variation in processing time between head & tail
Comparison w/non-ICMP traffic

32
ICMP Processing Time
[Figure: topology with labels S, X1-X3, A, 2 and a1-a5; a cooperating rcvr allows a spoofed src; arrows labeled req, spoof, response]

Spoofing and cooperation limit scope of
experiment (7 targets, 20 routers).
Broader study? If processing delays significant
on head of link, then estimated queuing delay for
link should sometimes be negative.
Occasionally present in 9.9% of sample (1,368)

[Plot legend] Dot: median; box: interquartile range; bars:
5th-95th percentile; dots are outliers.
33
Unsynchronized Clocks
[Figure: probe timeline with labels 1, 2, 3, A, B and clock offsets OA,1, OA,2, OA,2 - OA,1]

Clock offsets may vary because of clock skew or
jumps to adjust for skew.
Both src & dst may jump, and may be skewed in
opposite directions
May distort individual observations, as well as
provide an erroneous minimum for d2prop
Impossible to tell for individual observation
whether Δt due to queuing or clock artifact

34
Unsynchronized Clocks
Local clock

Post processing looks at multiple observations
RTT provides valuable clues: queuing and max
Can recover skew; only care if jump occurred
between request & response
Look for collinear regions; label others "can't
tell"

35
Routing Issues
[Figure: example topology with labels S, 1, 2, 3, X1-X3, A, B, C and a1-a5]
A routing map, R, is regular over a graph G if
Rs(m) Rs(d) for all m in s→d

Reachability: It is easy to see that most links
are not measurable by the direct method from a
single source: many links connect to a node, but
only one is on the path from S.
(In some sense, S is mainly interested in links
reachable from S.)
Regularity: If the path to the head of a link is
not a prefix of the path to the tail of the link,
then we cannot meaningfully subtract the
timestamp responses.
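
The regularity condition amounts to a prefix test on two observed paths. The function name and the node labels in the usage lines are hypothetical; paths are assumed to be lists of router identifiers as seen from the source.

```python
def can_subtract_timestamps(path_to_head, path_to_tail):
    """True if the path to the link's head is a proper prefix of the path to its tail."""
    n = len(path_to_head)
    return len(path_to_tail) > n and list(path_to_tail[:n]) == list(path_to_head)


# Path to B continues the path to A, so the link A-B is measurable from S.
print(can_subtract_timestamps(["S", "X1", "A"], ["S", "X1", "A", "B"]))
# Path to B reaches A through a different router, so subtraction is not meaningful.
print(can_subtract_timestamps(["S", "X1", "A"], ["S", "X3", "A", "B"]))
```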

36
Irregular routing
Internet routing is irregular.

Nevertheless
Coverage for single links ranges from 20% (SRI)
to 53% (LIACS)
Multiple sources increase the likelihood
measurably
Multi-hop segments increase the likelihood of
coverage

37
Simpler approach? TTL vs. Timestamp

Why aren't TTL-limited RTT measurements (a la
pathchar) sufficient?
TTL-limiting removes the routing problem
RTT measurements remove the clock problem.
Accuracy
Asymmetry
One-way vs. round-trip
Not back-to-back on return path

10,931 links
38
A hybrid solution

Indirect inference is accurate for small trees:
look only at small, isolated trees.
Timestamps and TTL-limited probes make every
router a cooperating receiver.
Indirect inference can isolate return-path delays
from forward path delays.
Indirect inference can determine delay
distribution in shared portion of overlapping
segments
Deconvolution

39
Putting it all together

By combination/choice of Timestamps, RTT,
TTL-expiration and using either Indirect methods
of MINC or deconvolution we can cover just about
every link in the Internet, often by many methods
But which methods to use?

40
Putting it all together

Shared link is most accurate for MINC
Deconvolution is only as accurate as the least
accurate segment.
But which methods to use?

41
Relative Accuracy

Estimated vs. Actual Mean delay
2nd row shows effect of divergent paths: 200ms
extra delay on path of 2nd pair

42
Increased Coverage
Multiple sources
Granularity vs. accuracy
43
Collecting vast quantities of data

Individual delay measurements over thousands of
paths, tens of thousands of nodes, millions of
samples
Long running simulation of AHBHA can generate
petabytes of data for moderate size networks.
How can we accurately collect these measurements
without sinking under their weight?

44
Space-Efficient Online Computation of Quantile
Summaries

Joint work with Sanjeev Khanna, University of
Pennsylvania

45
Summarizing extremely large data sets

The problem
Vast quantities of data, perhaps ephemeral
Memory is limited and observations are lost once
observed
Therefore construct a proxy data structure of
manageable size, able to return needed
information
What kind of information do we need?
Distribution of values
Quantile queries: Given a quantile, φ, return the
value whose rank is ⌈φN⌉
e.g. min, max, median, 90th percentile, 99th
percentile
Munro & Paterson 1980 (Pohl 1969): p-pass
algorithm to compute exact quantile requires
Ω(N^(1/p)) space.

46
Trading off accuracy for space

Explicit a priori guarantee on precision of the
approximation, but try to use the smallest memory
footprint possible.
Explicit and tunable a priori guarantee on
maximum memory footprint, and make the
approximation as accurate as possible.

47
Trading off accuracy for space

Explicit a priori guarantee on precision of the
approximation, but try to use the smallest memory
footprint possible.
Explicit and tunable a priori guarantee on
maximum memory footprint, and make the
approximation as accurate as possible.

Histograms
48
Trading off accuracy for space
ε-approximate quantile summary

Explicit a priori guarantee on precision of the
approximation, but try to use the smallest memory
footprint possible.
An ε-approximate quantile summary can answer any
quantile query to within a precision of ε
Given a quantile, φ, return a value whose rank is
guaranteed to be within the interval [(φ - ε)N,
(φ + ε)N]
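
The guarantee can be made concrete with a checker that tests whether a returned value is an acceptable answer. This is only a sketch: the function name is hypothetical, and it assumes the full sorted data set is available for verification, which the online algorithm itself never has.

```python
from bisect import bisect_left, bisect_right


def is_eps_approximate(sorted_data, value, phi, eps):
    """True if some rank of `value` lies within [(phi - eps) N, (phi + eps) N]."""
    n = len(sorted_data)
    lowest_rank = bisect_left(sorted_data, value) + 1   # 1-based rank of first copy
    highest_rank = bisect_right(sorted_data, value)     # 1-based rank of last copy
    if highest_rank < lowest_rank:
        return False  # value does not occur in the data at all
    return lowest_rank <= (phi + eps) * n and highest_rank >= (phi - eps) * n


data = list(range(1, 101))  # value k has rank k
print(is_eps_approximate(data, 47, phi=0.5, eps=0.05))
print(is_eps_approximate(data, 60, phi=0.5, eps=0.05))
```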

49
Requirements

Explicit & tunable a priori guarantees on the
precision of the approximation
As small a memory footprint as possible
Online: Single pass over the data
Data Independent Performance: guarantees should
be unaffected by arrival order, distribution of
values, or cardinality of observations.
Data Independent Setup: no a priori knowledge
required about data set (size, range,
distribution, order).

50
Related Work

Manku, Rajagopalan, and Lindsay generalize a
class of 1-pass algorithms (e.g. Agrawal & Swami
COMAD95; Alsabti, Ranka & Singh VLDB97)
SIGMOD98:
a priori knowledge of size of data set
O((1/ε) log²(εN)) worst case space
does not exploit any structure in observations
SIGMOD99:
Give up deterministic guarantee in exchange for
dropping the requirement of a priori knowledge of
size of data set
Gibbons, Matias, Poosala VLDB97; Chaudhuri,
Motwani, Narasayya SIGMOD98
Multiple passes (CMN: only probabilistic
guarantee)

51
Our epsilon-approximate quantile summary
52
Overview of Summary Data Structure
ε = .01, N = 1750
[Figure: summary entries shown as value (rmin, rmax): 192 (501,503); 201 (529,536); 204 (539,540)]

Keep a data structure that stores v_i, rmin(v_i),
and rmax(v_i) for each observation.
v_i: value of ith observation stored in the
summary
<v_0, v_1, ..., v_i, ..., v_{S-1}>; S can be << N
rmin(v_i): minimum possible rank of v_i
rmax(v_i): maximum possible rank of v_i

53
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1750
[Figure: summary entries shown as value (g, Δ) (rmin, rmax): 192 (15,2) (501,503); 201 (28,7) (529,536); 204 (10,1) (539,540)]

Keep a data structure that stores v_i, rmin(v_i),
and rmax(v_i) for each observation. Tuple: (v_i,
g_i, Δ_i), where g_i = rmin(v_i) - rmin(v_{i-1}) and
Δ_i = rmax(v_i) - rmin(v_i)
Quantile φ = .3? Compute r and choose best v_i

54
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1750
2εN = 35
[Figure: summary entries shown as value (g, Δ) (rmin, rmax): 192 (15,2) (501,503); 201 (28,7) (529,536); 204 (10,1) (539,540)]

Keep a data structure that stores v_i, rmin(v_i),
and rmax(v_i) for each observation. Tuple: (v_i,
g_i, Δ_i), where g_i = rmin(v_i) - rmin(v_{i-1}) and
Δ_i = rmax(v_i) - rmin(v_i)
If (rmax(v_{i+1}) - rmin(v_i) - 1) < 2εN, then
ε-approximate summary.
Our goal: always maintain this property.

55
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1750
2εN = 35
[Figure: summary entries shown as value (g, Δ) (rmin, rmax): 192 (15,2) (501,503); 201 (28,7) (529,536); 204 (10,1) (539,540)]

Keep a data structure that stores v_i, rmin(v_i),
and rmax(v_i) for each observation. Tuple: (v_i,
g_i, Δ_i), where g_i = rmin(v_i) - rmin(v_{i-1}) and
Δ_i = rmax(v_i) - rmin(v_i)
Goal: always maintain ε-approximate summary:
(rmax(v_{i+1}) - rmin(v_i) - 1) = (g_{i+1} + Δ_{i+1} - 1) <
2εN
Insert new observations into summary

56
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1750
2εN = 35
[Figure: new observation 197 with (rmin, rmax) = (502,536) placed among the entries 192 (15,2) (501,503); 201 (28,7) (529,536); 204 (10,1) (539,540)]

Keep a data structure that stores v_i, rmin(v_i),
and rmax(v_i) for each observation. Tuple: (v_i,
g_i, Δ_i), where g_i = rmin(v_i) - rmin(v_{i-1}) and
Δ_i = rmax(v_i) - rmin(v_i)
Goal: always maintain ε-approximate summary:
(rmax(v_{i+1}) - rmin(v_i) - 1) = (g_{i+1} + Δ_{i+1} - 1) <
2εN
Insert new observations into summary

57
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1751
2εN = 35.02
[Figure: summary entries after the insertion, shown as value (g, Δ) (rmin, rmax): 192 (15,2) (501,503); 197 (1,34) (502,536); 201 (28,7) (530,537); 204 (10,1) (540,541)]

Keep a data structure that stores v_i, rmin(v_i),
and rmax(v_i) for each observation. Tuple: (v_i,
g_i, Δ_i), where g_i = rmin(v_i) - rmin(v_{i-1}) and
Δ_i = rmax(v_i) - rmin(v_i)
Goal: always maintain ε-approximate summary:
(rmax(v_{i+1}) - rmin(v_i) - 1) = (g_{i+1} + Δ_{i+1} - 1) <
2εN
Insert new observations into summary

58
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1751
2εN = 35.02
[Figure: summary entries after the insertion, shown as value (g, Δ) (rmin, rmax): 192 (15,2) (501,503); 197 (1,34) (502,536); 201 (28,7) (530,537); 204 (10,1) (540,541)]

Keep a data structure that stores v_i, rmin(v_i),
and rmax(v_i) for each observation. Tuple: (v_i,
g_i, Δ_i), where g_i = rmin(v_i) - rmin(v_{i-1}) and
Δ_i = rmax(v_i) - rmin(v_i)
Goal: always maintain ε-approximate summary:
(rmax(v_{i+1}) - rmin(v_i) - 1) = (g_{i+1} + Δ_{i+1} - 1) <
2εN
Insert new observations into summary
Delete all superfluous entries.

59
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1751
2εN = 35.02
[Figure: summary after entry 197 is deleted, shown as value (rmin, rmax): 192 (501,503); 201 (530,537); 204 (540,541); (g, Δ) labels in the figure: 15,2; 28,7; 1,34; 10,1]

Keep a data structure that stores v_i, rmin(v_i),
and rmax(v_i) for each observation. Tuple: (v_i,
g_i, Δ_i), where g_i = rmin(v_i) - rmin(v_{i-1}) and
Δ_i = rmax(v_i) - rmin(v_i)
Goal: always maintain ε-approximate summary:
(rmax(v_{i+1}) - rmin(v_i) - 1) = (g_{i+1} + Δ_{i+1} - 1) <
2εN
Insert new observations into summary
Delete all superfluous entries.
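
The insert, delete, and query steps walked through on these slides can be sketched as a small class. This is a simplified illustration, not the algorithm as presented: the class and method names are hypothetical, the deletion rule here merges any interior tuple into its right neighbor when the result stays under 2εN, and the band-based COMPRESS rules from the later slides are deliberately omitted.

```python
import bisect
import math


class QuantileSummary:
    """Simplified (v, g, delta) summary; query() requires at least one insert."""

    def __init__(self, eps):
        self.eps = eps
        self.n = 0          # observations seen so far
        self.tuples = []    # [v, g, delta] entries, sorted by v

    def insert(self, v):
        i = bisect.bisect_right([t[0] for t in self.tuples], v)
        if i == 0 or i == len(self.tuples):
            delta = 0       # new minimum or maximum: its rank is known exactly
        else:
            succ = self.tuples[i]
            delta = succ[1] + succ[2] - 1   # rank uncertainty inherited from successor
        self.tuples.insert(i, [v, 1, delta])
        self.n += 1

    def compress(self):
        limit = 2 * self.eps * self.n
        i = len(self.tuples) - 2
        while i >= 1:       # the minimum (index 0) and the maximum are always kept
            g = self.tuples[i][1]
            succ = self.tuples[i + 1]
            if g + succ[1] + succ[2] < limit:
                succ[1] += g            # successor absorbs the deleted tuple's count
                del self.tuples[i]
            i -= 1

    def query(self, phi):
        r = max(1, math.ceil(phi * self.n))
        margin = self.eps * self.n
        rmin = 0
        for v, g, delta in self.tuples:
            rmin += g                   # rmin(v_i) is the running sum of g
            if r - rmin <= margin and (rmin + delta) - r <= margin:
                return v
        return self.tuples[-1][0]


summary = QuantileSummary(eps=0.01)
summary.tuples = [[192, 15, 2], [201, 28, 7], [204, 10, 1]]  # entries from the slide
summary.insert(197)
print(summary.tuples[1])  # the new entry for 197
```

Inserting 197 between 192 and 201 gives the new entry g = 1 and Δ = 28 + 7 - 1 = 34, matching the (1,34) label shown on the slides.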

60
Reducing space requirement of summary

Delete all superfluous entries: What do we mean
by superfluous entries?
Goal: minimizing workspace --- not size of final
summary
Can always reduce the final summary to size
O(1/ε).
Deletion rule (compress) will reduce summary
size, but will take care to keep workspace small
regardless of incoming observations.
To explain COMPRESS operation, we need to develop
some more terminology

61
Terminology

Full tuple: A tuple is full if g_i + Δ_i = 2εN
Full tuple pair: A pair of tuples is full if
deleting the left-hand tuple would overfill the
right one
Capacity: number of observations that can be
counted by g_i before the tuple becomes full
(= 2εN - Δ_i)
We say that t_i and t_j have similar capacities if
log capacity(t_i) ≈ log capacity(t_j) (intuition,
not defn)
Similarity partitions the possible values of Δ
into bands.

62
More Terminology: Tree Representation
Δ-range   Capacity   Band
0-7       8-15       3
8-11      4-7        2
12-13     2-3        1
14        1          0
ε = .001, N = 7,000
2εN = 14
(v_i, g_i, Δ_i)
S

The bands can be used to impose a tree structure
over the tuples.
Group tuples with similar capacities into bands

63
More Terminology: Tree Representation
Δ-range   Capacity   Band
0-7       8-15       3
8-11      4-7        2
12-13     2-3        1
14        1          0
ε = .001, N = 7,000
2εN = 14
S

The bands can be used to impose a tree structure
over the tuples.
Group tuples with similar capacities into bands

64
More Terminology: Tree Representation
Δ-range   Capacity   Band
0-7       8-15       3
8-11      4-7        2
12-13     2-3        1
14        1          0
ε = .001, N = 7,000
2εN = 14

The bands can be used to impose a tree structure
over the tuples.
Group tuples with similar capacities into bands

65
More Terminology: Tree Representation
Δ-range   Capacity   Band
0-7       8-15       3
8-11      4-7        2
12-13     2-3        1
14        1          0
ε = .001, N = 7,000
2εN = 14

The bands can be used to impose a tree structure
over the tuples.
Group tuples with similar capacities into bands
First (least index) node to the right with higher
capacity band becomes parent.

66
More Terminology: Tree Representation
Δ-range   Capacity   Band
0-7       8-15       3
8-11      4-7        2
12-13     2-3        1
14        1          0
ε = .001, N = 7,000
2εN = 14

The bands can be used to impose a tree structure
over the tuples.
Group tuples with similar capacities into bands
First (least index) node to the right with higher
capacity band becomes parent.
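
The band table and the parent rule can be reproduced with a short sketch. It follows the intuition on the slides rather than the formal band definition: the band is taken as the integer part of log2 of the capacity, and capacity is counted as 2εN - Δ + 1 because the table lists capacity 1 for Δ = 14. Function names are hypothetical.

```python
def band(delta, two_eps_n):
    """Band of a tuple, taken as floor(log2(capacity)) as in the slide's table."""
    capacity = two_eps_n - delta + 1    # assumption: matches the table (delta 14 -> capacity 1)
    return capacity.bit_length() - 1    # exact floor(log2) for positive integers


def parents(deltas, two_eps_n):
    """For each tuple, index of the first tuple to its right in a higher band."""
    bands = [band(d, two_eps_n) for d in deltas]
    result = []
    for i, b in enumerate(bands):
        parent = None                   # None stands for the implicit root of the tree
        for j in range(i + 1, len(bands)):
            if bands[j] > b:
                parent = j
                break
        result.append(parent)
    return result


print([band(d, 14) for d in (0, 7, 8, 11, 12, 13, 14)])
print(parents([14, 12, 8, 13, 0], 14))
```

Tuples with small Δ were inserted early and have large capacity, so they end up high in the tree; recent tuples with large Δ sit near the leaves.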

67
COMPRESS operation

General strategy: delete tuples with small
capacity and preserve tuples with large capacity.
1) Deletion cannot leave descendants unmerged ---
it must delete entire subtrees
2) Deletion can only merge a tuple with small
capacity into a tuple with similar or larger
capacity.
3) Deletion cannot create an over-full tuple
(i.e. with g + Δ > floor(2εN))

68
Analysis

Theorem
At any time n, the total number of tuples stored
in S(n) is at most (11/(2ε)) log(2εn)
Sketch of proof
Each tuple requires the support of many
observations in order to survive a COMPRESS
Only n observations
Therefore only a relatively small number of
tuples can survive

69
Useful Lemmas

A tuple that survives insertion at time m must
have Δ = floor(2εm) (else would be immediately
deleted (has no descendants, and if smaller Δ
then parent has capacity to absorb it)).
If Δ_i and Δ_j are ever in the same band, they will
always be in the same band. (Technical details
on defn of band: band boundaries are only
deleted, never created).
The number of observations covered cumulatively
by tuples in bands 0..α is bounded by 2^α/ε

70
Limited number of full tuple pairs in each band

For any given α, at most 4/ε nodes from band α
are right partners in a full tuple pair.
Defn: If neighbors are a full tuple pair, then
g_{j-1} + g_j + Δ_j > 2εn
Assume p pairs exist. Sum over all such
pairs: Σ g_{j-1} + Σ g_j + Σ Δ_j > 2pεn
2 Σ g_j + Σ Δ_j > 2pεn
Σ g_j is bounded by # of observations in bands
0..α (2^α/ε)
Σ Δ_j is bounded by p times the max Δ_j in α, 2εn - 2^(α-1)
2^(α+1)/ε + p(2εn - 2^(α-1)) > 2pεn
4/ε > p
What about non full tuple pairs? At most 1 per
parent.

71
Each parent requires many descendants to survive
COMPRESS
Vi
Vj

At time n, for any α, at most 3/(2ε) nodes have a
child in band α
Choose a parent V_i with a child in band α.
Choose the rightmost child, V_j. Let m_j (< n -
2^(α-1)/(2ε)) be the time V_j was inserted.
Red nodes, descendants of V_j, and anything merged
into V_j, must have arrived after n - 2^(α+1)/(2ε)
Σg in picture: g_i + Δ_i > 2εn; g_i(m_j) + Δ_i <
2εm_j
Σg in picture: g_i (since n - 2^(α+1)/(2ε)) > 2ε(n -
(n - 2^(α-1)/(2ε)))
At most 2^(α+1)/(2ε) observations available; each V_i
needs > 2ε (2^(α-1)/(2ε))
Therefore at most 2/ε parents of nodes in band α
(more complexity needed to get to 3/(2ε))

72
Analysis

Theorem
At any time n, the total number of tuples stored
in S(n) is at most (11/(2ε)) log(2εn)
Combining Lemmas
4/ε pairs per band
At most 3/(2ε) parents of children in band
At most 1 singleton per parent
11/(2ε) tuples per band
At most log(2εn) bands
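
The per-band count can be checked by adding the lemma bounds. The grouping below is one reading of how the slide combines them (right partners of full pairs, plus tuples outside full pairs charged one per parent) and should be taken as an assumption rather than the proof itself.

```latex
\[
\underbrace{\frac{4}{\varepsilon}}_{\text{right partners of full pairs}}
+ \underbrace{\frac{3}{2\varepsilon}}_{\text{at most one singleton per parent}}
= \frac{8}{2\varepsilon} + \frac{3}{2\varepsilon}
= \frac{11}{2\varepsilon}
\quad\Longrightarrow\quad
|S(n)| \le \frac{11}{2\varepsilon}\,\log(2\varepsilon n)
\]
```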

73
Experimental Results

Measurement
S
Observed ε (vs. desired ε): max, avg, and for 16
representative quantiles
Optimal max observed ε
Compared 3 algorithms
MRL
Preallocated (1/3 number of stored observations
as MRL)
Adaptive: allocate a new quantile only when
observed error is about to exceed desired ε
Optimization in algorithm
Keep entries up to high water mark (can only help)

74
Random Input
Space
Error
75
Handling Deletions

Artificial data set
AT&T CDR: median length of active phone call?

76
Summarizing Quantile Summaries

Empirically, behaves very well indeed
On average, for random input, seems to use
constant space
Best-known worst-case guarantees
GK used as a black box to improve other
algorithms
Munro & Paterson's classic p-pass algorithm for
computing median exactly. GK reduces
space/number of passes by a factor of Omega(log
n)
Probabilistic quantile summaries
The basic data structure has applications to
other problems
Order statistics in sensor networks

77
Concluding remarks

AHBHA
seems very promising but a lot of work is
needed to evaluate it properly.
Cing
Identified problems with existing techniques
The hybrid approach was an obvious idea, but
required a lot of work and care to succeed
As accurate as cing, almost universally
applicable.
Quantile Summaries
Exploit as much information as possible.
Proof is unsatisfying & inelegant because of
complexity; notion of bands and COMPRESS
non-intuitive
Result is significant improvement with several
unexpected applications.

78
General remarks

A small shift in view can sometimes yield large
reductions in complexity
Even simple solutions to large scale problems are
extremely difficult to evaluate --- many details,
many cases, unexpected interactions, many
metrics. As a discipline we do not have a good
methodology for evaluation.
Experimental results are surprisingly difficult
to obtain, confirm, and evaluate. It is worth
persevering.
Formal analysis of sub-problems can give us solid
ground to stand on even when large problem is
analytically intractable. It can also yield
significant practical improvements.
Successful systems research needs vision,
experimental technique, and formal analytic skills

79
Ongoing projects

AHBHA: congestion control, network architecture
Cing: network delay tomography, large scale
measurement studies
Streaming data summaries
Sensor networks: balanced power, order
statistics, communication optimizations
Coverage: cooperative virus defense w/ untrusted
peers
EXCHANGE: peer2peer incentives
Harmony: generic, safe, reconciliation of OTS
apps
Canon: consistent security for heterogeneous
systems
NBS: practical non-blocking algs, contention in
distributed algorithms

80
The End
81
Simpler approach? TTL vs. Timestamp

Why aren't TTL-limited RTT measurements (a la
pathchar) sufficient?
TTL-limiting removes the routing problem
RTT measurements remove the clock problem.
Accuracy
Asymmetry
One-way vs. round-trip
Not back-to-back on return path

10,931 links
82
Network Tomography feasibility

TIMESTAMP support? 96% response to TIMESTAMPs
TIMESTAMP indicative of normal packets? within
ms resolution
Clock synchronization? robust post-facto
algorithm
Irregular routing: limits choice of nodes

Example Path structure, Penn to Sprintlabs
Corresponding feasible measurement partitions,
Penn to Sprintlabs
83
Network tomography feasibility (2)

Data: 10k paths from 5 different sources
Metric: fraction of nodes usable for tomography
Results: 50% of nodes are usable; more difficult
as distance from source increases, better when
probing from multiple sources

84
Why is this not ideal? Accepted quibbles

Non-TCP: Assumes everything TCP-Friendly
Packet loss due to errors (e.g. wireless)
considered congestion signal
Bad RTE can also cause false signals
Mice (congestion control only kicks in after 6
packets or so)
Large bandwidth-delay pipes
Self-similarity of traffic (bursty)
Buffer occupancy (high)
RED hard to configure to perform well (different
parameters for different scenarios)
Fairness
QOS