Transcript and Presenter's Notes

Title: AHBHA: Managing Congestion through Adaptive Hop-By-Hop Aggregation


1
AHBHA: Managing Congestion through Adaptive
Hop-By-Hop Aggregation
  • Michael Greenwald, University of Pennsylvania

2
What is congestion?
  • Congestion: applications/clients present a larger
    aggregate load than intermediate nodes in the
    network can handle.
  • Congestion Control: a mechanism that ensures the
    network remains manageable under overload.
  • Much more difficult than the related problem of
    Flow Control: participants are unaware that they
    share a resource.

3
What causes congestion? (Isn't bandwidth cheap?)
  • Persistent congestion is solved by adequate
    provisioning (true, bandwidth is cheap).
  • Cause: intermittent high load
  • Intermittent emergency (earthquake, 9/11)
  • Extreme loads (expected: Mother's Day;
    unexpected: Pathfinder pictures)
  • Periods of growth
  • DOS or DDOS attack
  • Effect: congestion
  • Bursty traffic (statistical multiplexing)
  • Many sources converge on a single link
  • Low capacity link becomes bottleneck
  • subset of multicast destinations

4
Why must congestion be controlled?
  • Congestion collapse
  • Links clogged with useless packets that will be
    dropped anyway, or are retransmissions, or are
    out-of-date
  • Long Delay
  • Relevant mostly for short transactions over long
    distances.
  • Variability in delay (jitter)
  • Drop rate
  • Not a problem in itself, since packets are only
    dropped if they can't make it through the bottleneck
    anyway, but...
  • They use up bandwidth on other links before being
    dropped.
  • Control over which packets get dropped?
  • Low Utilization (inefficiency)
  • Fairness

5
How is congestion controlled?
  • Slow-start/congestion avoidance
  • Losses-per-epoch/Fast Retransmit
  • Full buffers, tail-drop → RED
  • Non-compliant flows → FRED, penalty box, etc.
  • Pkt drop/buffer size is a noisy signal → Vegas
  • Adjust parameters → BLUE, ARED
  • Avoid packet loss → ECN
  • Explicit feedback → XCP
  • Fat pipes → TCP FAST
  • Fairness, lossy wireless → TCP Westwood
  • Mice, fairness, QOS, bad RTE

6
Response
  • Concern with robustness, efficiency, and fairness
  • Control theoretic approach
  • Stability, convergence
  • Make world safe for control theory
  • Controller reacts as quickly as signal changes
  • Know RTT, react quickly, change slowly
  • Response to feedback must be predictable
  • Behavior of aggregate independent of # of flows.
  • Behavior of client/application/transport
    predictable.

7
A Different View
  • Complexity: epicycles on epicycles
  • Fragility
  • The end-to-end argument misinterpreted
  • Trapped by success, religious dogma, need for
    field testing
  • Congestion control common to all clients
  • Don't optimize for a particular application, even
    TCP

8
A Different View
  • View from routers: predictable response to
    feedback
  • View from hosts: a delivery fabric with predictable
    congestion feedback
  • Stark contrast with the current system
  • Extreme example: aggregated small TCP flows do
    not exponentially decrease or linearly increase.
  • 1,000,000 flows, so the window for each flow is small
    (approx 1)
  • Congestion notifies 10% of the flows, which decrease by
    at most 10% of the packets.
  • Regardless, each of the 1,000,000 flows increases its
    cwnd by 1 each RTT, effectively doubling the rate
    (alternatively, a larger fraction for flows in Slow Start);
    the arithmetic is sketched below.
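
A back-of-the-envelope sketch of the arithmetic in the last bullets (the 10% marking fraction and the per-flow window of roughly 1 packet are taken from the slide; everything else is illustrative):

```python
flows = 1_000_000            # aggregated small TCP flows
rate = flows * 1             # aggregate load: ~1 packet per flow per RTT (cwnd ~= 1)

# Congestion feedback marks ~10% of the flows; each can shed at most its single packet.
decrease = int(0.10 * flows)            # <= 100,000 packets/RTT removed

# Additive increase: every flow still adds 1 to its cwnd each RTT.
increase = flows * 1                    # ~1,000,000 packets/RTT added

print(rate - decrease + increase)       # 1,900,000: the aggregate nearly doubles anyway
```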

9
AHBHA (Adaptive Hop-by-Hop Aggregation): a simple idea
[Figure: router architecture / interconnect]
  • Hop-by-hop feedback and controller at each node.
  • Why is this any different from CreditNet (Kung) or
    HBH (Kanakia)?
  • Aggregate flows based on purely local
    characteristics: (next-hop × TOS × QOS) × input
  • What about head-of-line blocking? Local vs.
    global behavior? Isolation of congestion?
  • Transitive renaming of congested links
  • Based on observation that most of the net is
    adequately provisioned.

10
Controlling Utilization of a Resource
  • Consider a Flow Queue with a current queue
    length, a known rate capacity, and a set of input
    flows
  • Capacity may be a physical limit for a physical
    link, or a rate limit imposed by a neighbor for a
    finer-grained flow.
  • Input flows may be local flows sharing a single
    physical output link, or may be flows coming in
    from neighbors.
  • Queue length > threshold triggers congestion
    control (at most once per RTT)
  • Must determine whether queue growth is due to
    burstiness or due to input rate exceeding output
    rate
  • If the former, smooth inputs; if the latter, throttle
    neighbors.

11
Controlling Utilization of a Resource
  • MAIR (Mean Aggregate Input Rate) = sum (over
    inputs) of the (smoothed) number of packets per
    interval
  • If MAIR < output capacity, then input flows need
    only be smoothed.
  • Pacing: 1/(pkts per RTT)
  • If MAIR > output capacity, then input rates must
    be reduced
  • Acceptable rates are computed in 2 passes (sketched
    in code below):
  • BaseAllocation = Capacity/Nflows  // Can use
    weights per flow instead.
  • ExcessAllocation = 0; UncontrolledFlows = 0;
    for flow in inputs: if 2*flowRate < BaseAllocation,
    then ExcessAllocation += BaseAllocation -
    2*flowRate; UncontrolledFlows++; end
  • FairAllocation = BaseAllocation +
    ExcessAllocation/(Nflows - UncontrolledFlows)
  • Send FairAllocation if input rate >
    FairAllocation

[Figure: flow queue with input from the fairness controller; drain the queue in 2 RTT]
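
A minimal Python sketch of the two-pass allocation in the bullets above. The function name and the guard for the case where every input is "uncontrolled" are my additions; the rest follows the slide's pseudocode.

```python
def fair_allocation(capacity, rates):
    """Compute the FairAllocation advertised to inputs of a flow queue.

    capacity: output capacity of the queue (e.g. packets per interval).
    rates:    smoothed input rate measured for each aggregated input flow.
    """
    n = len(rates)
    base = capacity / n                       # BaseAllocation (per-flow weights could replace this)
    excess, uncontrolled = 0.0, 0
    for r in rates:                           # pass 1: flows using well under their share
        if 2 * r < base:
            excess += base - 2 * r            # spare capacity beyond room for them to double
            uncontrolled += 1
    if uncontrolled == n:                     # all inputs under half their share: no throttling needed
        return base
    # pass 2: redistribute the spare capacity among the remaining (controlled) flows
    return base + excess / (n - uncontrolled)

# A rate message carrying FairAllocation is sent only to inputs whose rate exceeds it.
```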
12
Renaming to Isolate Congestion
R1
R2
  • R1→R2 becomes congested.
  • QuarantineSet = { N : NextHop(N)@R1 = R2 }

13
Renaming to Isolate Congestion
R1
R2
R1'
  • Create artificial node R1'
  • QuarantineSet = { N : NextHop(N)@R1 = R2 }
  • Routing update to all neighbors of R1 advertising
    R1' as the best path to N.

14
Renaming to Isolate Congestion
R1
R2
R1'
  • If the queue to R1' is congested, recurse: split the
    artificial node and advertise to the input queues.

15
Releasing control
  • Record time of transition from uncontrolled to
    controlled.
  • Record time of most recent (last) congestion
    event.
  • NewCongestionInterval = LastCongestionEvent -
    FirstCongestionEvent
  • CongestionInterval = max(NewCongestionInterval,
    OldCongestionInterval)
  • If ((now - LastCongestionEvent) >
    CongestionInterval):
  • Release control
  • OldCongestionInterval = max(OldCongestionInterval/2,
    NewCongestionInterval)
  • // Release state after 10 CongestionIntervals w/
    no congestion
  • // OldCongestionInterval >= 4*max(IRTT, ORTT)
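
A sketch of the release logic above as stateful Python (the class wrapper and variable names such as `old_interval` are mine; the update rules follow the bullets):

```python
class CongestionRelease:
    def __init__(self):
        self.first_event = None       # time of transition uncontrolled -> controlled
        self.last_event = None        # time of the most recent congestion event
        self.old_interval = 0.0       # OldCongestionInterval remembered across episodes

    def on_congestion(self, now):
        if self.first_event is None:
            self.first_event = now
        self.last_event = now

    def maybe_release(self, now):
        """Return True (and reset) once the quiet period exceeds the congestion interval."""
        if self.last_event is None:
            return False
        new_interval = self.last_event - self.first_event
        interval = max(new_interval, self.old_interval)
        if now - self.last_event > interval:
            self.old_interval = max(self.old_interval / 2, new_interval)
            self.first_event = self.last_event = None
            return True
        return False
```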

16
Miscellaneous Details
  • Uncontrolled flows
  • XMIT: if X packets are sent in RTT interval t0, then
    at most 2X packets in interval t1 (sketched below)
  • Periodic CC packets between immediate neighbors
  • List controlled flows and rates
  • Once per max(RTT/2, 20 packet times)
  • If no CC pkt arrives from N in an RTT interval, then all
    flows are controlled at X/2.
  • CC packets are high priority
  • Compute RTT
  • Assume the max rate to neighbors is known.
  • Good assumption for dedicated lines
  • May need to be estimated for Ether/shared channels
    or multi-hop neighbors
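
A toy sketch of the XMIT rule for uncontrolled flows mentioned above (the class, names, and the floor of one packet per interval are my assumptions):

```python
class UncontrolledLimiter:
    """If X packets were sent in the previous RTT interval, allow at most 2X in this one."""

    def __init__(self):
        self.prev = 1        # X from the previous interval (start with a floor of 1)
        self.cur = 0         # packets sent so far in the current interval

    def allow(self):
        if self.cur < 2 * self.prev:
            self.cur += 1
            return True
        return False         # defer the packet to the next RTT interval

    def end_interval(self):  # call once per RTT
        self.prev = max(self.cur, 1)
        self.cur = 0
```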

17
Advantages
  • Works for mice, elephants, non-TCP flows
  • Long-delay flows ramp up in log(n) round-trips →
    high utilization
  • Doesn't treat loss as a congestion signal
  • Not sensitive to parameters
  • Fairness decoupled from the CC mechanism: agnostic on
    policy, or policy delivery mechanism. Can work
    with either packet marking (diffserv) or
    flow-weighting (periodic packets from src to dst,
    providing per-flow weights).
  • Aggregation + per-hop control makes flows
    smoother and less self-similar
  • Response time to the source comparable to e2e packet
    loss.

18
Serendipity
  • Buffer sizes
  • Per-link rather than cross-network
  • 1 buffer per neighbor, rather than 1 per flow
  • Broken routers, misbehaving hosts, DOS attacks
  • Multicast
  • Simplifies TCP

19
Preliminary Observations
  • Significantly simpler than the current world (let
    alone a more complex world)
  • AHBHA comparable in all cases; never
    significantly better. Simulations using ns2:
  • RED (varying capacities & loads), Floyd TCP-Friendly,
    FAST, TCP Westwood, XCP; AHBHA regions: legacy,
    defective routers, DOS, short flows (1 pkt),
    Mbottle, SimpleTCP, w/ ECN to source
  • (Some comparisons are against results reported in the
    papers; some required compiling separate versions of ns2.)
  • Works with non-cooperating clients and routers.

20
Preliminary Non-Results
  • Stability not proven
  • Convergence not proven
  • Convergence-time not established
  • (On the other hand, there are intuitive reasons to
    believe it is stable: e.g. bounded input increase,
    superposition of stable systems, RTTs are equal
    (not just by assumption))
  • If many congestion points, then lose congestion
    isolation

21
Unresolved Issues with Naming Controlled
Aggregates
  • 1 bit for renaming next hop? B bits? Exhaustive
    list?
  • Aggregate by next hop? Or 2nd hop (horizon
    effect)?
  • More hysteresis in determining the CongestionInterval,
    rather than releasing control after a quiet
    period.
  • Right choices depend on patterns of congestion in
    real network.
  • Measurement required.

22
cing: Measuring Network-Internal Delays Using
Only Existing Infrastructure
  • Joint work with Kostas Anagnostakis
    (University of Pennsylvania) and Raphael Ryger
    (Yale University)

23
Remote measurement of per-link delays
  • Network measurement techniques
  • Understanding of control mechanisms (such as TCP
    congestion control) --- both results and workload
  • Gain insights into network performance
  • Fault Isolation, Error reporting
  • Curiosity; switch providers?
  • Network parameters such as delay, loss, and
    throughput are easy to measure end-to-end
  • Network parameters such as delay, loss, and
    throughput are difficult to measure on individual
    links inside the network.

[Figure labels: RESEARCH, MANAGEMENT, USER]
24
Understand your tools / Know yourself
  • How accurate are the results?
  • Why do we believe it is accurate?
  • What are its limitations?
  • Answering these questions is difficult, sometimes
    surprising, and results in a much better tool.

25
Network Delay Tomography: A Brief History
From a remote source, S, estimate the
distribution of link delays
[Figure: example topology with source S, routers A, B, C, and links a1-a5]
  • Direct measurement, using existing tools (e.g.
    pathchar)
  • <RTT to tail> - <RTT to head> yields the RTT on a link
    (TTL-expired responses)
  • Only existing infrastructure: measure anywhere
    w/o cooperation.
  • But:
  • Are ICMP responses representative?
  • Asymmetric paths: return paths vary, so (tail -
    head) may not be meaningful.
  • Round trip vs. one-way delay?

26
Network Delay Tomography: A Brief History
From a remote source, S, estimate the
distribution of link delays
[Figure: example topology with source S, routers A, B, C, and links a1-a5]
  • Indirect inference methods (e.g. the MINC project)
  • One packet to multiple receivers; correlate
    behavior on links in the resulting tree
  • But:
  • Deployability (works best with multicast, needs
    cooperating rcvrs)
  • Accuracy (assumes independence of delay; quality
    of estimates degrades over longer paths)
  • Robustness (high variance in error)
  • Computational complexity
  • Need for many samples, therefore much time

27
Network Delay Tomography: A Brief History
From a remote source, S, estimate the
distribution of link delays
[Figure: example topology with source S, routers A, B, C, and links a1-a5]
  • Direct methods (e.g. the cing project)
  • f(<Timestamp to tail>, <Timestamp to head>) yields
    the delay on a link
  • No infrastructure required, highly accurate,
    strong experimental validation
  • But:
  • The packet pair may not encounter equal queues
  • ICMP processing may not be representative
  • Clocks are unsynchronized
  • Routing irregularity, so not always applicable

28
Network tomography: a direct method
  • Use router ICMP Timestamp messages and
    packet-pair probes to directly estimate queuing
    delay

[Figure: packet pair probing routers A and B; the measured delay has a fixed part
(propagation) and a variable part (queueing); account for the fixed part by subtracting
the minimum time over a set of observations, as sketched in code below]
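
A sketch of the estimation step in the figure, assuming each observation is the difference between the two routers' ICMP timestamps for one packet pair (units and sample values are hypothetical):

```python
def queueing_delays(timestamp_diffs):
    """Remove the fixed component (propagation plus clock offset) by subtracting the
    minimum observed difference; what remains is the variable queueing component."""
    fixed = min(timestamp_diffs)
    return [d - fixed for d in timestamp_diffs]

# e.g. queueing_delays([12, 12, 16, 13]) -> [0, 0, 4, 1]   (milliseconds, hypothetical)
```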
29
Question your assumptions
[Figure: packet-pair probe toward routers A and B]
  • Feasibility: basic mechanism supported? Accuracy?
    Stability of routing? etc.
  • Do back-to-back packets really experience the
    same delay on their shared path?
  • Are ICMP processing times indicative of
    processing time for normal packets?
  • How to account for differing offset and skew on
    clocks?
  • Are the paths to adjacent nodes coincident?

30
Back-to-back packets
  • Do packets arrive back-to-back?
  • Do back-to-back packets experience identical
    queuing delays and process time? (and stay
    back-to-back?)
  • Distinctions are irrelevant to the algorithm; the
    issue is simply the difference in timestamped values.
  • Experiment: probe routers with varied load and
    varied path length from the source

7,300 routers
This issue is common to all algorithms
31
ICMP Processing Time
[Figure: a cooperating receiver allows a spoofed source: S sends a request to router 2 with
its source spoofed as A, so the response goes to A]
  • Send direct first, so queuing delays err
    conservatively and overestimate ICMP processing
    time.
  • Median processing time: always negligible
  • 95th percentile: usually negligible

Dot = median, box = interquartile range, bars =
5th-95th percentile, dots are outliers.
  • This issue is common to all direct measurements that
    use ICMP
  • Variation in processing time between head & tail
  • Comparison w/ non-ICMP traffic

32
ICMP Processing Time
[Figure: a cooperating receiver allows a spoofed source: S sends a request to router 2 with
its source spoofed as A, so the response goes to A]
  • Spoofing and cooperation limit the scope of the
    experiment (7 targets, 20 routers).
  • Broader study? If processing delays were significant
    at the head of a link, then the estimated queuing delay
    for the link should sometimes be negative.
  • Occasionally present, in 9.9% of the samples (1,368)

Dot = median, box = interquartile range, bars =
5th-95th percentile, dots are outliers.
33
Unsynchronized Clocks
[Figure: request/response timestamps for routers A and B; clock offsets O_A,1 and O_A,2,
and their difference O_A,2 - O_A,1 over the interval]
  • Clock offsets may vary because of clock skew or
    jumps to adjust for skew.
  • Both src & dst may jump, and may be skewed in
    opposite directions
  • May distort individual observations, as well as
    provide an erroneous minimum for d2prop
  • Impossible to tell, for an individual observation,
    whether Δt is due to queuing or a clock artifact

34
Unsynchronized Clocks
[Figure: observed offsets plotted against the local clock]
  • Post-processing looks at multiple observations
  • RTT provides valuable clues: queuing and max
  • Can recover skew; only care if a jump occurred
    between request & response
  • Look for colinear regions; label the others "can't
    tell"

35
Routing Issues
A routing map, R, is regular over a graph G if
R_s(m) ⊆ R_s(d) for all m in s→d
[Figure: example topology with source S, routers A, B, C, and links a1-a5]
  • Reachability: it is easy to see that most links
    are not measurable by the direct method from a
    single source: many links connect to a node, but
    only one is on the path from S.
  • (In some sense, S is mainly interested in links
    reachable from S.)
  • Regularity: if the path to the head of a link is
    not a prefix of the path to the tail of the link,
    then we cannot meaningfully subtract the
    timestamp responses.

36
Irregular routing
Internet routing is irregular.
  • Nevertheless
  • Coverage for single links ranges from 20% (SRI)
    to 53% (LIACS)
  • Multiple sources increase the likelihood
    measurably
  • Multi-hop segments increase the likelihood of
    coverage

37
Simpler approach? TTL vs. Timestamp
  • Why aren't TTL-limited RTT measurements (à la
    pathchar) sufficient?
  • TTL-limiting removes the routing problem
  • RTT measurements remove the clock problem.
  • Accuracy
  • Asymmetry
  • One-way vs. round-trip
  • Not back-to-back on return path

10,931 links
38
A hybrid solution
  • Indirect inference is accurate for small trees:
    look only at small, isolated trees.
  • Timestamps and TTL-limited probes make every
    router a cooperating receiver.
  • Indirect inference can isolate return-path delays
    from forward path delays.
  • Indirect inference can determine delay
    distribution in shared portion of overlapping
    segments
  • Deconvolution

39
Putting it all together
  • By a combination/choice of timestamps, RTT, and
    TTL-expiration, and using either the indirect methods
    of MINC or deconvolution, we can cover just about
    every link in the Internet, often by many methods
  • But which methods to use?

40
Putting it all together
  • Shared link is most accurate for MINC
  • Deconvolution is only as accurate as the least
    accurate segment.
  • But which methods to use?

41
Relative Accuracy
  • Estimated vs. Actual Mean delay
  • The 2nd row shows the effect of divergent paths: 200ms
    extra delay on the path of the 2nd pair

42
Increased Coverage
Multiple sources
Granularity vs. accuracy
43
Collecting vast quantities of data
  • Individual delay measurements over thousands of
    paths, tens of thousands of nodes, millions of
    samples
  • A long-running simulation of AHBHA can generate
    petabytes of data for moderate-size networks.
  • How can we accurately collect these measurements
    without sinking under their weight?

44
Space-Efficient Online Computation of Quantile
Summaries
  • Joint work with Sanjeev Khanna, University of
    Pennsylvania

45
Summarizing extremely large data sets
  • The problem
  • Vast quantities of data, perhaps ephemeral
  • Memory is limited and observations are lost once
    observed
  • Therefore construct a proxy data structure of
    manageable size, able to return needed
    information
  • What kind of information do we need?
    Distribution of values
  • Quantile queries: given a quantile, φ, return the
    value whose rank is ⌈φN⌉
  • e.g. min, max, median, 90th percentile, 99th
    percentile
  • Munro & Paterson 1980 (Pohl 1969): a p-pass
    algorithm to compute an exact quantile requires
    Ω(N^(1/p)) space.

46
Trading off accuracy for space
  • Explicit a priori guarantee on precision of the
    approximation, but try to use the smallest memory
    footprint possible.
  • Explicit and tunable a priori guarantee on
    maximum memory footprint, and make the
    approximation as accurate as possible.

47
Trading off accuracy for space
  • Explicit a priori guarantee on precision of the
    approximation, but try to use the smallest memory
    footprint possible.
  • Explicit and tunable a priori guarantee on
    maximum memory footprint, and make the
    approximation as accurate as possible.

Histograms
48
Trading off accuracy for space
ε-approximate quantile summary
  • Explicit a priori guarantee on the precision of the
    approximation, but try to use the smallest memory
    footprint possible.
  • An ε-approximate quantile summary can answer any
    quantile query to within a precision of εN
  • Given a quantile, φ, return a value whose rank is
    guaranteed to be within the interval [(φ - ε)N,
    (φ + ε)N]
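
A worked instance of the guarantee, with assumed numbers purely for illustration:

```python
eps, N, phi = 0.01, 1_000_000, 0.5       # hypothetical parameters
lo, hi = (phi - eps) * N, (phi + eps) * N
print(int(lo), int(hi))                  # 490000 510000: any value whose rank falls in
                                         # this interval is an acceptable answer for the median
```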

49
Requirements
  • Explicit & tunable a priori guarantees on the
    precision of the approximation
  • As small a memory footprint as possible
  • Online: a single pass over the data
  • Data-independent performance: guarantees should
    be unaffected by arrival order, distribution of
    values, or cardinality of observations.
  • Data-independent setup: no a priori knowledge
    required about the data set (size, range,
    distribution, order).

50
Related Work
  • Manku, Rajagopalan, and Lindsay generalize a
    class of 1-pass algorithms (e.g. Agrawal & Swami
    COMAD '95; Alsabti, Ranka & Singh VLDB '97)
  • SIGMOD '98:
  • a priori knowledge of the size of the data set
  • O((1/ε) log²(εN)) worst-case space
  • does not exploit any structure in the observations
  • SIGMOD '99:
  • Gives up the deterministic guarantee in exchange for
    dropping the requirement of a priori knowledge of
    the size of the data set
  • Gibbons, Matias & Poosala VLDB '97; Chaudhuri,
    Motwani & Narasayya SIGMOD '98:
  • Multiple passes (CMN: only a probabilistic
    guarantee)

51
Our ε-approximate quantile summary
52
Overview of Summary Data Structure
ε = .01, N = 1750
[Figure: example summary entries with values 192, 201, 204 and (rmin, rmax) ranges
(501,503), (529,536), (539,540)]
  • Keep a data structure that stores v_i, rmin(v_i),
    and rmax(v_i) for each stored observation.
  • v_i = value of the i-th observation stored in the
    summary
  • <v_0, v_1, ..., v_i, ..., v_(|S|-1)>; |S| can be << N
  • rmin(v_i) = minimum possible rank of v_i
  • rmax(v_i) = maximum possible rank of v_i

53
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1750
[Figure: the same example entries, now annotated with (g, Δ) pairs (10,1), (15,2), (28,7)]
  • Keep a data structure that stores v_i, rmin(v_i),
    and rmax(v_i) for each observation. Tuple (v_i,
    g_i, Δ_i), with g_i = rmin(v_i) - rmin(v_(i-1)) and
    Δ_i = rmax(v_i) - rmin(v_i)
  • Quantile φ = .3? Compute r = φN and choose the best
    v_i (see the query sketch below)
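
A sketch of that query step, assuming the summary is kept as a sorted list of (v, g, Δ) triples as defined above (the function name and list representation are mine):

```python
import math

def query(summary, n, eps, phi):
    """Return a value whose rank is within eps*n of the target rank r = ceil(phi*n).

    summary: list of (v, g, delta) triples, sorted by v, with
             g_i = rmin(v_i) - rmin(v_{i-1}) and delta_i = rmax(v_i) - rmin(v_i).
    """
    r = math.ceil(phi * n)               # e.g. phi = .3, n = 1750  ->  r = 525
    rmin = 0
    for v, g, delta in summary:
        rmin += g                        # running rmin(v_i)
        rmax = rmin + delta
        if r - rmin <= eps * n and rmax - r <= eps * n:
            return v                     # both rank bounds are within eps*n of r
    return summary[-1][0]                # fall back to the stored maximum
```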

54
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1750, 2εN = 35
[Figure: the same example summary entries as above]
  • Keep a data structure that stores v_i, rmin(v_i),
    and rmax(v_i) for each observation. Tuple (v_i,
    g_i, Δ_i), with g_i = rmin(v_i) - rmin(v_(i-1)) and
    Δ_i = rmax(v_i) - rmin(v_i)
  • If (rmax(v_(i+1)) - rmin(v_i) - 1) < 2εN, then this is
    an ε-approximate summary.
  • Our goal: always maintain this property.

55
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1750, 2εN = 35
[Figure: the same example summary entries as above]
  • Keep a data structure that stores v_i, rmin(v_i),
    and rmax(v_i) for each observation. Tuple (v_i,
    g_i, Δ_i), with g_i = rmin(v_i) - rmin(v_(i-1)) and
    Δ_i = rmax(v_i) - rmin(v_i)
  • Goal: always maintain an ε-approximate summary:
    (rmax(v_(i+1)) - rmin(v_i) - 1) = (g_(i+1) + Δ_(i+1) - 1)
    < 2εN
  • Insert new observations into the summary

56
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1750, 2εN = 35
[Figure: a new observation, 197, is inserted among 192, 201, 204; its rank range is (502,536)]
  • Keep a data structure that stores v_i, rmin(v_i),
    and rmax(v_i) for each observation. Tuple (v_i,
    g_i, Δ_i), with g_i = rmin(v_i) - rmin(v_(i-1)) and
    Δ_i = rmax(v_i) - rmin(v_i)
  • Goal: always maintain an ε-approximate summary:
    (rmax(v_(i+1)) - rmin(v_i) - 1) = (g_(i+1) + Δ_(i+1) - 1)
    < 2εN
  • Insert new observations into the summary

57
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1751, 2εN = 35.02
[Figure: after inserting 197, its tuple gets (g, Δ) = (1, 34), and later rank ranges shift to
(530,537) and (540,541)]
  • Keep a data structure that stores v_i, rmin(v_i),
    and rmax(v_i) for each observation. Tuple (v_i,
    g_i, Δ_i), with g_i = rmin(v_i) - rmin(v_(i-1)) and
    Δ_i = rmax(v_i) - rmin(v_i)
  • Goal: always maintain an ε-approximate summary:
    (rmax(v_(i+1)) - rmin(v_i) - 1) = (g_(i+1) + Δ_(i+1) - 1)
    < 2εN
  • Insert new observations into the summary

58
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1751, 2εN = 35.02
[Figure: the summary after inserting 197 as (g, Δ) = (1, 34)]
  • Keep a data structure that stores v_i, rmin(v_i),
    and rmax(v_i) for each observation. Tuple (v_i,
    g_i, Δ_i), with g_i = rmin(v_i) - rmin(v_(i-1)) and
    Δ_i = rmax(v_i) - rmin(v_i)
  • Goal: always maintain an ε-approximate summary:
    (rmax(v_(i+1)) - rmin(v_i) - 1) = (g_(i+1) + Δ_(i+1) - 1)
    < 2εN
  • Insert new observations into the summary
  • Delete all superfluous entries.

59
Overview of Summary Data Structure
φ = .3
r = φN = 525
ε = .01, N = 1751, 2εN = 35.02
[Figure: the summary after deleting superfluous entries]
  • Keep a data structure that stores v_i, rmin(v_i),
    and rmax(v_i) for each observation. Tuple (v_i,
    g_i, Δ_i), with g_i = rmin(v_i) - rmin(v_(i-1)) and
    Δ_i = rmax(v_i) - rmin(v_i)
  • Goal: always maintain an ε-approximate summary:
    (rmax(v_(i+1)) - rmin(v_i) - 1) = (g_(i+1) + Δ_(i+1) - 1)
    < 2εN
  • Insert new observations into the summary (insertion
    is sketched below)
  • Delete all superfluous entries.
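
A sketch of the insertion step, under the same (v, g, Δ) list representation as the query sketch above. The choice Δ = ⌊2εn⌋ - 1 for an interior insertion (34 in the running example, where 2εN ≈ 35) matches the figure; the handling of a new minimum or maximum is my assumption.

```python
import math

def insert(summary, n, eps, v):
    """Insert observation v into the summary and return the new count n+1."""
    new_n = n + 1
    i = 0
    while i < len(summary) and summary[i][0] <= v:
        i += 1                                    # first entry strictly larger than v
    if i == 0 or i == len(summary):
        delta = 0                                 # a new min or max has exactly known rank
    else:
        delta = math.floor(2 * eps * new_n) - 1   # e.g. floor(35.02) - 1 = 34
    summary.insert(i, (v, 1, delta))              # the new tuple always carries g = 1
    return new_n
```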

60
Reducing space requirement of summary
  • Delete all superfluous entries. What do we mean
    by superfluous entries?
  • Goal: minimizing workspace --- not the size of the
    final summary
  • Can always reduce the final summary to size
    O(1/ε).
  • The deletion rule (COMPRESS) will reduce summary
    size, but will take care to keep the workspace small
    regardless of incoming observations.
  • To explain the COMPRESS operation, we need to develop
    some more terminology

61
Terminology
  • Full tuple: a tuple is full if g_i + Δ_i = ⌊2εN⌋
  • Full tuple pair: a pair of tuples is full if
    deleting the left-hand tuple would overfill the
    right one
  • Capacity: the number of observations that can be
    counted by g_i before the tuple becomes full
    (= 2εN - Δ_i)
  • We say that t_i and t_j have similar capacities if
    log capacity(t_i) ≈ log capacity(t_j) (intuition,
    not defn)
  • Similarity partitions the possible values of Δ
    into bands.
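
A rough Python rendering of these definitions, assuming the (v, g, Δ) representation used earlier. The band computation is the intuition only: the paper fixes the band boundaries more carefully (so that bands only merge over time), and rounding at the edges may differ.

```python
import math

def capacity(delta, two_eps_n):
    """Observations that g can still absorb before the tuple is full (~ 2*eps*N - delta)."""
    return math.floor(two_eps_n) - delta

def is_full(g, delta, two_eps_n):
    return g + delta >= math.floor(two_eps_n)

def is_full_pair(left, right, two_eps_n):
    """A pair is full if merging the left tuple into the right one would overfill it."""
    (_, g_l, _), (_, g_r, d_r) = left, right
    return g_l + g_r + d_r > math.floor(two_eps_n)

def band(delta, two_eps_n):
    """Tuples whose capacities share the same floor(log2) are treated as 'similar'."""
    c = max(capacity(delta, two_eps_n), 1)
    return int(math.log2(c))
```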

62
More Terminology: Tree Representation
Δ-range   Capacity   Band
0-7       8-15       3
8-11      4-7        2
12-13     2-3        1
14        1          0
ε = .001, N = 7,000, 2εN = 14
[Figure: the tuples (v_i, g_i, Δ_i) of summary S, grouped by band]
  • The bands can be used to impose a tree structure
    over the tuples.
  • Group tuples with similar capacities into bands

63
More Terminology: Tree Representation
[Same Δ-range / Capacity / Band table as above; ε = .001, N = 7,000, 2εN = 14]
[Figure: summary S with tuples grouped into bands]
  • The bands can be used to impose a tree structure
    over the tuples.
  • Group tuples with similar capacities into bands

64
More Terminology: Tree Representation
[Same Δ-range / Capacity / Band table as above; ε = .001, N = 7,000, 2εN = 14]
  • The bands can be used to impose a tree structure
    over the tuples.
  • Group tuples with similar capacities into bands

65
More Terminology: Tree Representation
[Same Δ-range / Capacity / Band table as above; ε = .001, N = 7,000, 2εN = 14]
  • The bands can be used to impose a tree structure
    over the tuples.
  • Group tuples with similar capacities into bands
  • The first (least-index) node to the right in a
    higher-capacity band becomes the parent.

66
More Terminology: Tree Representation
[Same Δ-range / Capacity / Band table as above; ε = .001, N = 7,000, 2εN = 14]
  • The bands can be used to impose a tree structure
    over the tuples.
  • Group tuples with similar capacities into bands
  • The first (least-index) node to the right in a
    higher-capacity band becomes the parent.

67
COMPRESS operation
  • General strategy: delete tuples with small
    capacity and preserve tuples with large capacity.
  • 1) Deletion cannot leave descendants unmerged ---
    it must delete entire subtrees
  • 2) Deletion can only merge a tuple with small
    capacity into a tuple with similar or larger
    capacity.
  • 3) Deletion cannot create an over-full tuple
    (i.e. with g + Δ > ⌊2εN⌋)
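
A deliberately simplified sketch of COMPRESS over the flat (v, g, Δ) list: it applies only rule 3 (never create an over-full tuple) by merging a tuple into its right neighbor when the budget allows, keeping the stored minimum and maximum intact. The subtree and band constraints (rules 1 and 2) are omitted here.

```python
import math

def compress(summary, n, eps):
    budget = math.floor(2 * eps * n)
    i = len(summary) - 2
    while i >= 1:                                  # never touch the first (min) or last (max) entry
        v_l, g_l, d_l = summary[i]
        v_r, g_r, d_r = summary[i + 1]
        if g_l + g_r + d_r <= budget:              # rule 3: merged tuple must not be over-full
            summary[i + 1] = (v_r, g_l + g_r, d_r) # absorb the left tuple's count into the right
            del summary[i]
        i -= 1
    return summary
```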

68
Analysis
  • Theorem
  • At any time n, the total number of tuples stored
    in S(n) is at most (11/(2ε)) log(2εn)
  • Sketch of proof
  • Each tuple requires the support of many
    observations in order to survive a COMPRESS
  • Only n observations
  • Therefore only a relatively small number of
    tuples can survive

69
Useful Lemmas
  • A tuple that survives insertion at time m must
    have Δ ≥ ⌊2εm⌋ - 1 (else it would be immediately
    deleted: it has no descendants, and if Δ were smaller,
    the parent would have capacity to absorb it).
  • If Δ_i and Δ_j are ever in the same band, they will
    always be in the same band. (Technical detail in the
    defn of bands: band boundaries are only deleted,
    never created.)
  • The number of observations covered cumulatively
    by tuples in bands 0..α is bounded by 2^α/ε

70
Limited number of full tuple pairs in each band
  • For any given α, at most 4/ε nodes from band α
    are right partners in a full tuple pair.
  • Defn: if neighbors form a full tuple pair, then
    g_(j-1) + g_j + Δ_j > 2εn
  • Assume p such pairs exist. Summing over all such
    pairs: Σ g_(j-1) + Σ g_j + Σ Δ_j > 2pεn
  • 2 Σ g_j + Σ Δ_j > 2pεn
  • Σ g_j is bounded by the # of observations in bands
    0..α: (2^α)/ε
  • Σ Δ_j is bounded by p times the max Δ_j in band α:
    2εn - 2^(α-1)
  • (2^(α+1))/ε + p(2εn - 2^(α-1)) > 2pεn
  • 4/ε > p
  • What about non-full tuple pairs? At most 1 per
    parent.

71
Each parent requires many descendants to survive
COMPRESS
V_i
V_j
  • At time n, for any α, at most 3/(2ε) nodes have a
    child in band α
  • Choose a parent V_i with a child in band α.
    Choose the rightmost child, V_j. Let m_j (< n -
    2^(α-1)/(2ε)) be the time V_j was inserted.
  • Red nodes, descendants of V_j, and anything merged
    into V_j, must have arrived after n - 2^(α+1)/(2ε)
  • Δg in picture: g_i + Δ_i > 2εn; g_i(m_j) + Δ_i <
    2εm_j
  • Δg in picture: g_i (since n - 2^(α+1)/(2ε)) > 2ε(n -
    (n - 2^(α-1)/(2ε)))
  • At most 2^(α+1)/(2ε) observations available; each V_i
    needs > 2ε (2^(α-1)/(2ε))
  • Therefore at most 2/ε parents of nodes in band α
    (more complexity is needed to get to 3/(2ε))

72
Analysis
  • Theorem
  • At any time n, the total number of tuples stored
    in S(n) is at most (11/(2ε)) log(2εn)
  • Combining lemmas:
  • 4/ε pairs per band
  • At most 3/(2ε) parents of children in the band
  • At most 1 singleton per parent
  • 11/(2ε) tuples per band
  • At most log(2εn) bands

73
Experimental Results
  • Measurement:
  • |S|
  • Observed ε (vs. desired ε): max, avg, and for 16
    representative quantiles
  • Optimal: max observed ε
  • Compared 3 algorithms:
  • MRL
  • Preallocated (1/3 the number of stored observations
    of MRL)
  • Adaptive: allocate a new quantile only when the
    observed error is about to exceed the desired ε
  • Optimization in the algorithm:
  • Keep entries up to the high-water mark (can only help)

74
Random Input
Space
Error
75
Handling Deletions
  • Artificial data set
  • AT&T CDR: median length of an active phone call?

76
Summarizing Quantile Summaries
  • Empirically, behaves very well indeed
  • On average, for random input, seems to use
    constant space
  • Best-known worst-case guarantees
  • GK used as a black box to improve other
    algorithms
  • Munro & Paterson's classic p-pass algorithm for
    computing the median exactly: GK reduces
    space/number of passes by a factor of Ω(log n)
  • Probabilistic quantile summaries
  • The basic data structure has applications to
    other problems
  • Order statistics in sensor networks
77
Concluding remarks
  • AHBHA
  • seems very promising but a lot of work is
    needed to evaluate it properly.
  • Cing
  • Identified problems with existing techniques
  • The hybrid approach was an obvious idea, but
    required a lot of work and care to succeed
  • As accurate as cing, almost universally
    applicable.
  • Quantile Summaries
  • Exploit as much information as possible.
  • The proof is unsatisfying & inelegant because of its
    complexity; the notion of bands and COMPRESS is
    non-intuitive
  • Result is significant improvement with several
    unexpected applications.

78
General remarks
  • A small shift in view can sometimes yield large
    reductions in complexity
  • Even simple solutions to large scale problems are
    extremely difficult to evaluate --- many details,
    many cases, unexpected interactions, many
    metrics. As a discipline we do not have a good
    methodology for evaluation.
  • Experimental results are surprisingly difficult
    to obtain, confirm, and evaluate. It is worth
    persevering.
  • Formal analysis of sub-problems can give us solid
    ground to stand on even when large problem is
    analytically intractable. It can also yield
    significant practical improvements.
  • Successful systems research needs vision,
    experimental technique, and formal analytic skills

79
Ongoing projects
  • AHBHA: congestion control, network architecture
  • Cing: network delay tomography, large-scale
    measurement studies
  • Streaming data summaries
  • Sensor networks: balanced power, order
    statistics, communication optimizations
  • Coverage: cooperative virus defense w/ untrusted
    peers
  • EXCHANGE: peer2peer incentives
  • Harmony: generic, safe reconciliation of OTS
    apps
  • Canon: consistent security for heterogeneous
    systems
  • NBS: practical non-blocking algs, contention in
    distributed algorithms

80
The End
81
Simpler approach? TTL vs. Timestamp
  • Why aren't TTL-limited RTT measurements (à la
    pathchar) sufficient?
  • TTL-limiting removes the routing problem
  • RTT measurements remove the clock problem.
  • Accuracy
  • Asymmetry
  • One-way vs. round-trip
  • Not back-to-back on return path

10,931 links
82
Network Tomography feasibility
  • TIMESTAMP support? 96% response to TIMESTAMPs
  • TIMESTAMP indicative of normal packets? Within
    ms resolution
  • Clock synchronization? Robust post-facto
    algorithm
  • Irregular routing limits the choice of nodes

Example: path structure, Penn to Sprintlabs
Corresponding feasible measurement partitions,
Penn to Sprintlabs
83
Network tomography feasibility (2)
  • Data: 10k paths from 5 different sources
  • Metric: fraction of nodes usable for tomography
  • Results: ~50% of nodes are usable; more difficult
    as distance from the source increases; better when
    probing from multiple sources

84
Why is this not ideal? Accepted quibbles
  • Non-TCP: assumes everything is TCP-friendly
  • Packet loss due to errors (e.g. wireless)
    considered congestion signal
  • Bad RTE can also cause false signals
  • Mice (congestion control only kicks in after 6
    packets or so)
  • Large bandwidth-delay pipes
  • Self-similarity of traffic (bursty)
  • Buffer occupancy (high)
  • RED hard to configure to perform well (different
    parameters for different scenarios)
  • Fairness
  • QOS

85
Conjecture: the Internet is at a local maximum with
very steep slopes
Some locally bad ideas:
  • Hop by hop feedback
  • Head of line blocking
  • Local, so can't achieve global fairness
  • Aggregation
  • Fractal nature of traffic
  • Rate-based congestion control
  • unbounded input, oscillatory
  • Explicit out-of-band congestion notification
    packets
  • adds to load under congestion, wastes bandwidth,
    and is unstable
  • Most new ideas, taken by themselves, make
    matters worse than Standard TCP.