Gossip Algorithms and Emergent Shape - PowerPoint PPT Presentation

1
Gossip Algorithms and Emergent Shape
  • Ken Birman

2
On Gossip and Shape
  • Why is gossip interesting?
  • Scalability.
  • Protocols are highly symmetric
  • Although latency often forces us to bias them
  • Powerful convergence properties
  • Especially in support of epidemics
  • New forms of consistency
  • Probabilistic, but this is often adequate

3
Consistency
  • Captures intuition that if A and B compare their
    states, no contradiction is evident
  • In systems with logical consistency, we say
    things like "A's history and B's are closed under
    causality, and A is a prefix of B"
  • With probabilistic systems we seek rapidly
    decreasing probability (as time elapses) that A
    knows x but B doesn't
  • Hence: probabilistic convergent consistency

4
Exponential convergence
  • A subclass of convergence behaviors
  • Not all gossip protocols offer exponential
    convergence
  • But epidemic protocols do have this property and
    many gossip protocols implement epidemics
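
A minimal sketch of the exponential convergence described above, assuming a synchronous push epidemic in which every informed node tells one uniformly random peer per round. The function name, the seed, and the idealized network are illustrative assumptions, not something taken from the talk.

```python
import math
import random


def rounds_to_infect_all(n: int, seed: int = 0) -> int:
    """Simulate push gossip: each round, every infected node tells one
    uniformly random node. Returns the number of rounds until all n
    nodes are infected."""
    rng = random.Random(seed)
    infected = {0}  # node 0 starts out knowing the rumor
    rounds = 0
    while len(infected) < n:
        # Every currently infected node pushes to one random node.
        targets = {rng.randrange(n) for _ in infected}
        infected |= targets
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        r = rounds_to_infect_all(n)
        print(f"N={n:>6}  rounds={r:>3}  log2(N)={math.log2(n):.1f}")
```

The infected set can at most double per round, so the round count can never fall below log2(N); in this idealized model it typically stays within a small constant factor of that bound.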

5
Value of exponential convergence
  • An exponentially convergent protocol overwhelms
    mishaps and even attacks
  • Requires that new information reach relevant
    nodes with at most log(N) delay
  • Can talk of "probability 1.0" outcomes
  • Even model simplifications (such as an idealized
    network) are washed away!
  • Predictions are rarely off by more than a constant
  • A rarity: a theory relevant to practice!

6
Convergent consistency
  • To illustrate our point, contrast Cornell's
    Kelips system with MIT's Chord
  • Kelips is convergent. Chord isn't

7
Kelips (Linga, Gupta, Birman)
Take a collection of nodes
[Figure: nodes 110, 230, 202, 30]
8
Kelips
Affinity groups: peer membership through consistent hash
Map nodes to affinity groups
[Figure: affinity groups 0, 1, 2; nodes 110, 230, 202, 30; label "members per affinity group"]
9
Kelips
110 knows about other members 230, 30
Affinity groups: peer membership through consistent hash

Affinity group view:
  id   hbeat  rtt
  30   234    90ms
  230  322    30ms

[Figure: affinity groups 0, 1, 2; nodes 110, 230, 202, 30; labels "members per affinity group" and "affinity group pointers"]
10
Kelips
202 is a contact for 110 in group 2
Affinity groups: peer membership through consistent hash

Affinity group view:
  id   hbeat  rtt
  30   234    90ms
  230  322    30ms

Contacts:
  group  contactNode
  2      202

[Figure: affinity groups 0, 1, 2; nodes 110, 230, 202, 30; labels "members per affinity group" and "contact pointers"]
11
Kelips
cnn.com maps to group 2. So 110 tells group 2
to route inquiries about cnn.com to it.
Affinity groups: peer membership through consistent hash
Gossip protocol replicates data cheaply

Affinity group view:
  id   hbeat  rtt
  30   234    90ms
  230  322    30ms

Contacts:
  group  contactNode
  2      202

Resource tuples:
  resource  info
  cnn.com   110

[Figure: affinity groups 0, 1, 2; nodes 110, 230, 202, 30; label "members per affinity group"]
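
The mapping walked through on the Kelips slides can be sketched roughly as follows. This is a toy model under stated assumptions: SHA-1 modulo the number of groups stands in for the consistent hash, and each group's resource-tuple table is a single shared dictionary, which represents the state members would hold once gossip has converged. The class and function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def affinity_group(name: str, num_groups: int) -> int:
    """Hash a node id or resource name onto one of num_groups groups."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_groups


class KelipsSketch:
    """Toy model of affinity groups and replicated resource tuples."""

    def __init__(self, node_ids, num_groups):
        self.num_groups = num_groups
        self.members = defaultdict(list)          # group -> node ids
        self.resource_tuples = defaultdict(dict)  # group -> {resource: node}
        for node_id in node_ids:
            group = affinity_group(node_id, num_groups)
            self.members[group].append(node_id)

    def insert(self, resource, holder):
        """Record in the resource's home group that `holder` serves it."""
        group = affinity_group(resource, self.num_groups)
        self.resource_tuples[group][resource] = holder
        return group

    def lookup(self, resource):
        """Ask the resource's home group who serves it (None if unknown)."""
        group = affinity_group(resource, self.num_groups)
        return self.resource_tuples[group].get(resource)


if __name__ == "__main__":
    kelips = KelipsSketch(["110", "230", "202", "30"], num_groups=3)
    kelips.insert("cnn.com", holder="110")
    print(kelips.lookup("cnn.com"))
```

With this hash the group that cnn.com lands in will generally differ from the group 2 shown on the slide; only the lookup path (hash the name, ask a member of that group) is meant to match.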
12
How it works
  • Kelips is entirely gossip based!
  • Gossip about membership
  • Gossip to replicate and repair data
  • Gossip about "last heard from" time, used to
    discard failed nodes
  • Gossip channel uses fixed bandwidth
  • Fixed rate, packets of limited size

13
How it works
  • Node 175 is a contact for Node 102 in some
    affinity group (RTT 235 ms)
  • "Hmm... Node 19 looks like a much better contact in
    affinity group 2" (RTT 6 ms)
  • [Figure: gossip data stream arriving at Node 102, with nodes 175 and 19]
  • Heuristic: periodically ping contacts to check
    liveness and RTT; swap so-so ones for better ones.
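
The contact-replacement heuristic might look roughly like this. The `Contact` type and the simple "lower RTT wins" rule are assumptions for illustration; the slides do not say how Kelips actually scores candidates.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    node_id: int
    rtt_ms: float
    alive: bool = True


def maybe_replace_contact(current: Contact, candidate: Contact) -> Contact:
    """Keep the current contact unless it is dead or the candidate,
    learned about through gossip, has a lower measured RTT."""
    if not candidate.alive:
        return current
    if not current.alive or candidate.rtt_ms < current.rtt_ms:
        return candidate
    return current


if __name__ == "__main__":
    # Values from the slide: node 175 at 235 ms, node 19 at 6 ms.
    best = maybe_replace_contact(Contact(175, 235.0), Contact(19, 6.0))
    print(best.node_id)
```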

14
Work in progress
  • Prakash Linga is extending Kelips to support
    multi-dimensional indexing, range queries,
    self-rebalancing
  • Kelips has limited incoming info rate
  • Behavior when the limit is continuously exceeded
    is not well understood.
  • Will also study this phenomenon

15
Replication makes it robust
  • Kelips should work even during disruptive
    episodes
  • After all, tuples are replicated to √N nodes
  • Query k nodes concurrently to overcome isolated
    crashes; this also reduces the risk that very recent
    data could be missed
  • We often overlook the importance of showing that
    systems work while recovering from a disruption

16
Chord (MIT group)
  • The McDonald's of DHTs
  • A data structure mapped to a network
  • Ring of nodes (hashed ids)
  • Superimposed binary lookup trees
  • Other cached hints for fast lookups
  • Chord is not convergently consistent

17
Chord picture
[Figure: ring of nodes with hashed ids 0, 30, 64, 108, 123, 177, 199, 202, 241, 248, 255; labels "Finger links" and "Cached link"]
18
Chord picture
[Figure: two copies of the ring, labeled USA and Europe, each with node ids 0, 30, 64, 108, 123, 177, 199, 202, 241, 248, 255]
19
So, who cares?
  • Chord lookups can fail... and it suffers from high
    overheads when nodes churn
  • Loads surge just when things are already
    disrupted... quite often, because of loads
  • And we can't predict how long Chord might remain
    disrupted once it gets that way
  • Worst case scenario: Chord can become
    inconsistent and stay that way

The Fine Print: The scenario you have been shown
is of low probability. In all likelihood, Chord
would repair itself after any partitioning
failure that might really arise. Caveat emptor
and all that.
20
Saved by gossip!
  • Epidemic gossip: remedy for what ails Chord!
  • cf. EpiChord (Liskov), Bamboo
  • Key insight:
  • Gossip-based DHTs, if correctly designed, are
    self-stabilizing!

21
Connection to self-stabilization
  • Self-stabilization theory
  • Describe a system and a desired property
  • Assume a failure in which code remains correct
    but node states are corrupted
  • Proof obligation: show that the property is
    reestablished within bounded time
  • But doesn't bound badness while a transient
    disruption is occurring

22
Beyond self-stabilization
  • Tardos poses a related problem
  • Consider behavior of the system while an endless
    sequence of disruptive events occurs
  • System never reaches a quiescent state
  • Under what conditions will it still behave
    correctly?
  • Results of the form "if disruptions satisfy ? then
    correctness property is continuously satisfied"
  • Hypothesis: with convergent consistency we may be
    able to develop a proof framework for systems
    that are continuously safe.

23
Let's look at a second example
  • Astrolabe system uses a different emergent data
    structure: a tree
  • Nodes are given an initial location: each knows
    its leaf domain
  • Inner nodes are elected using gossip and
    aggregation

24
Astrolabe
  • Intended as help for applications adrift in a sea
    of information
  • Structure emerges from a randomized gossip
    protocol
  • This approach is robust and scalable even under
    stress that cripples traditional systems
  • Developed at RNS, Cornell
  • By Robbert van Renesse, with many others helping
  • Today used extensively within Amazon.com

25
Astrolabe is a flexible monitoring overlay
Periodically, pull data from monitored systems

swift.cs.cornell.edu
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2011  2.0   0          1      6.2
  falcon    1971  1.5   1          0      4.1
  cardinal  2004  4.5   1          0      6.0

  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2271  1.8   0          1      6.2
  falcon    1971  1.5   1          0      4.1
  cardinal  2004  4.5   1          0      6.0

cardinal.cs.cornell.edu
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2003  .67   0          1      6.2
  falcon    1976  2.7   1          0      4.1
  cardinal  2201  3.5   1          1      6.0

  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2003  .67   0          1      6.2
  falcon    1976  2.7   1          0      4.1
  cardinal  2231  1.7   1          1      6.0
26
Astrolabe in a single domain
  • Each node owns a single tuple, like the
    management information base (MIB)
  • Nodes discover one another through a simple
    broadcast scheme ("anyone out there?") and gossip
    about membership
  • Nodes also keep replicas of one another's rows
  • Periodically (uniformly at random) merge your
    state with someone else's

27
State Merge: Core of Astrolabe epidemic

swift.cs.cornell.edu
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2011  2.0   0          1      6.2
  falcon    1971  1.5   1          0      4.1
  cardinal  2004  4.5   1          0      6.0

cardinal.cs.cornell.edu
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2003  .67   0          1      6.2
  falcon    1976  2.7   1          0      4.1
  cardinal  2201  3.5   1          1      6.0
28
State Merge: Core of Astrolabe epidemic

swift.cs.cornell.edu
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2011  2.0   0          1      6.2
  falcon    1971  1.5   1          0      4.1
  cardinal  2004  4.5   1          0      6.0

Rows exchanged in the gossip (Name, Time, Load):
  swift     2011  2.0
  cardinal  2201  3.5

cardinal.cs.cornell.edu
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2003  .67   0          1      6.2
  falcon    1976  2.7   1          0      4.1
  cardinal  2201  3.5   1          1      6.0
29
State Merge: Core of Astrolabe epidemic

swift.cs.cornell.edu
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2011  2.0   0          1      6.2
  falcon    1971  1.5   1          0      4.1
  cardinal  2201  3.5   1          0      6.0

cardinal.cs.cornell.edu
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2011  2.0   0          1      6.2
  falcon    1976  2.7   1          0      4.1
  cardinal  2201  3.5   1          1      6.0
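
A rough sketch of the merge step, assuming the rule is "for each row, keep whichever copy has the larger Time". The dictionary layout and the function name are hypothetical. The slides show only the swift and cardinal rows being exchanged, whereas this sketch merges every row, so falcon is refreshed as well.

```python
def merge(mine: dict, theirs: dict) -> dict:
    """Row-wise merge: for each name keep the row with the larger time."""
    merged = dict(mine)
    for name, row in theirs.items():
        if name not in merged or row["time"] > merged[name]["time"]:
            merged[name] = row
    return merged


# Time and Load columns from the slides, before the exchange.
swift_view = {
    "swift": {"time": 2011, "load": 2.0},
    "falcon": {"time": 1971, "load": 1.5},
    "cardinal": {"time": 2004, "load": 4.5},
}
cardinal_view = {
    "swift": {"time": 2003, "load": 0.67},
    "falcon": {"time": 1976, "load": 2.7},
    "cardinal": {"time": 2201, "load": 3.5},
}

if __name__ == "__main__":
    for name, row in merge(swift_view, cardinal_view).items():
        print(name, row["time"], row["load"])
```

When timestamps never tie, the rule gives the same result whichever side starts the merge, so both nodes end the exchange with identical tables.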
30
Observations
  • Merge protocol has constant cost
  • One message sent, received (on avg) per unit
    time.
  • The data changes slowly, so no need to run it
    quickly; we usually run it every five seconds or
    so
  • Information spreads in O(log N) time
  • But this assumes bounded region size
  • In Astrolabe, we limit them to 50-100 rows

31
Big systems
  • A big system could have many regions
  • Looks like a pile of spreadsheets
  • A node only replicates data from its neighbors
    within its own region

32
Scaling up and up
  • With a stack of domains, we don't want every
    system to see every domain
  • Cost would be huge
  • So instead, we'll see a summary

[Figure: a stack of seven identical region tables at cardinal.cs.cornell.edu, each reading:]
  Name      Time  Load  Weblogic?  SMTP?  Word Version
  swift     2011  2.0   0          1      6.2
  falcon    1976  2.7   1          0      4.1
  cardinal  2201  3.5   1          1      6.0
33
Astrolabe builds a hierarchy using a P2P protocol
that "assembles the puzzle" without any servers
Dynamically changing query output is visible
system-wide
SQL query "summarizes" data

Summary table (inner domain):
  Name   Avg Load  WL contact    SMTP contact
  SF     2.6       123.45.61.3   123.45.61.17
  NJ     1.8       127.16.77.6   127.16.77.11
  Paris  3.1       14.66.71.8    14.66.71.12

San Francisco:
  Name      Load  Weblogic?  SMTP?  Word Version
  swift     2.0   0          1      6.2
  falcon    1.5   1          0      4.1
  cardinal  4.5   1          0      6.0

New Jersey:
  Name     Load  Weblogic?  SMTP?  Word Version
  gazelle  1.7   0          0      4.5
  zebra    3.2   0          1      6.2
  gnu      .5    1          0      6.2

Second set of values shown on the slide:

Summary table (inner domain):
  Name   Avg Load  WL contact    SMTP contact
  SF     2.2       123.45.61.3   123.45.61.17
  NJ     1.6       127.16.77.6   127.16.77.11
  Paris  2.7       14.66.71.8    14.66.71.12

San Francisco:
  Name      Load  Weblogic?  SMTP?  Word Version
  swift     1.7   0          1      6.2
  falcon    2.1   1          0      4.1
  cardinal  3.9   1          0      6.0

New Jersey:
  Name     Load  Weblogic?  SMTP?  Word Version
  gazelle  4.1   0          0      4.5
  zebra    0.9   0          1      6.2
  gnu      2.2   1          0      6.2
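
A sketch of how one leaf region might be summarized into a single row. It assumes the aggregate is "average load, plus the least-loaded node running each service"; the slides use an SQL query and report contacts as IP addresses, so the function name, the field names, and the use of node names as contacts are stand-ins.

```python
from statistics import mean


def summarize(region: str, rows: list) -> dict:
    """Collapse a leaf region's rows into one summary row."""

    def least_loaded(flag: str):
        # Among nodes that run the service, pick the one with lowest load.
        candidates = [r for r in rows if r[flag]]
        if not candidates:
            return None
        return min(candidates, key=lambda r: r["load"])["name"]

    return {
        "name": region,
        "avg_load": round(mean(r["load"] for r in rows), 2),
        "wl_contact": least_loaded("weblogic"),
        "smtp_contact": least_loaded("smtp"),
    }


# Leaf tables from the slide (Load, Weblogic?, SMTP? columns).
SF = [
    {"name": "swift", "load": 2.0, "weblogic": 0, "smtp": 1},
    {"name": "falcon", "load": 1.5, "weblogic": 1, "smtp": 0},
    {"name": "cardinal", "load": 4.5, "weblogic": 1, "smtp": 0},
]
NJ = [
    {"name": "gazelle", "load": 1.7, "weblogic": 0, "smtp": 0},
    {"name": "zebra", "load": 3.2, "weblogic": 0, "smtp": 1},
    {"name": "gnu", "load": 0.5, "weblogic": 1, "smtp": 0},
]

if __name__ == "__main__":
    print(summarize("SF", SF))
    print(summarize("NJ", NJ))
```

For the first set of leaf values this gives averages of 2.67 and 1.8, against the 2.6 and 1.8 shown in the slide's summary table.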
34
Large scale: "fake" regions
  • These are:
  • Computed by queries that summarize a whole region
    as a single row
  • Gossiped in a read-only manner within a leaf
    region
  • But who runs the gossip?
  • Each region elects k members to run gossip at
    the next level up.
  • Can play with selection criteria and k
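
As a small illustration of the election step, the sketch below picks the k least-loaded members of a region. Lowest load is only one possible selection criterion, and the function name and row layout are assumptions.

```python
def elect_representatives(rows: list, k: int) -> list:
    """Return the names of the k least-loaded nodes (ties broken by name)."""
    ranked = sorted(rows, key=lambda r: (r["load"], r["name"]))
    return [r["name"] for r in ranked[:k]]


if __name__ == "__main__":
    region = [
        {"name": "gazelle", "load": 1.7},
        {"name": "zebra", "load": 3.2},
        {"name": "gnu", "load": 0.5},
    ]
    print(elect_representatives(region, k=2))
```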

35
Hierarchy is virtual... data is replicated
Yellow leaf node sees its neighbors and the
domains on the path to the root.

Inner domain:
  Name   Avg Load  WL contact    SMTP contact
  SF     2.6       123.45.61.3   123.45.61.17
  NJ     1.8       127.16.77.6   127.16.77.11
  Paris  3.1       14.66.71.8    14.66.71.12

San Francisco (Falcon runs level 2 epidemic because it has lowest load):
  Name      Load  Weblogic?  SMTP?  Word Version
  swift     2.0   0          1      6.2
  falcon    1.5   1          0      4.1
  cardinal  4.5   1          0      6.0

New Jersey (Gnu runs level 2 epidemic because it has lowest load):
  Name     Load  Weblogic?  SMTP?  Word Version
  gazelle  1.7   0          0      4.5
  zebra    3.2   0          1      6.2
  gnu      .5    1          0      6.2
36
Hierarchy is virtual... data is replicated
Green node sees different leaf domain but has a
consistent view of the inner domain

Inner domain:
  Name   Avg Load  WL contact    SMTP contact
  SF     2.6       123.45.61.3   123.45.61.17
  NJ     1.8       127.16.77.6   127.16.77.11
  Paris  3.1       14.66.71.8    14.66.71.12

San Francisco:
  Name      Load  Weblogic?  SMTP?  Word Version
  swift     2.0   0          1      6.2
  falcon    1.5   1          0      4.1
  cardinal  4.5   1          0      6.0

New Jersey:
  Name     Load  Weblogic?  SMTP?  Word Version
  gazelle  1.7   0          0      4.5
  zebra    3.2   0          1      6.2
  gnu      .5    1          0      6.2
37
Worst case load?
  • A small number of nodes end up participating in
    O(log_fanout N) epidemics
  • Here the fanout is something like 50
  • In each epidemic, a message is sent and received
    roughly every 5 seconds
  • We limit message size so even during periods of
    turbulence, no message can become huge.
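
To make the bound concrete, the sketch below counts how many levels a tree with a given fanout needs to cover N leaves, which is the number of epidemics the busiest node would take part in. The helper name and the example of one million nodes are assumptions; the fanout of 50 and the 5-second period come from the slide.

```python
def levels(n: int, fanout: int = 50) -> int:
    """Smallest depth d such that fanout**d >= n (integer arithmetic)."""
    depth = 1
    while fanout ** depth < n:
        depth += 1
    return depth


if __name__ == "__main__":
    n = 1_000_000
    d = levels(n)
    # One message sent per epidemic roughly every 5 seconds.
    print(f"N={n}: {d} levels, about {d / 5:.1f} messages sent per second")
```

Under these assumptions even a million-node system gives the most heavily loaded node only four epidemics to run, or roughly 0.8 messages sent per second.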

38
Self-stabilization?
  • Like Kelips, it seems that Astrolabe
  • Is convergently consistent, self-stabilizing
  • And would ride out a large class of possible
    failures
  • But Astrolabe would be disrupted by
  • Incorrect aggregation (Byzantine faults)
  • Correlated failure of all representatives of some
    portion of the tree

39
Focus on emergent shape
  • Kelips: nodes start with a priori assignment to
    affinity groups, end up with a superimposed
    pointer structure
  • Astrolabe: nodes start with a priori leaf domain
    assignments, build the tree
  • What other data structures can be constructed
    with emergent protocols?

40
Emergent shape
  • We know a lot about a related question
  • Given a connected graph, cost function
  • Nodes have bounded degree
  • Use a gossip protocol to swap links until some
    desired graph emerges
  • Another related question
  • Given a gossip overlay, improve it by selecting
    better links (usually, lower RTT)

Example: the Anthill framework of Alberto
Montresor, Ozalp Babaoglu, Hein Meling and
Francesco Russo
41
Example of an open problem
  • Given a description of a data structure (for
    example, a balanced tree)
  • design a gossip protocol such that the system
    will rapidly converge towards that structure even
    if disrupted
  • Do it with bounded per-node message rates, sizes
    (network load less important)
  • Use aggregation to test tree quality?

42
Van Renesse's dreadful aggregation tree
[Figure: aggregation tree over leaves A B C D E F G H I J K L M N O P;
inner nodes labeled ?, D, L, B, J, F, N, A, C, E, G, I, K, M, O]
An event e occurs at H
G gossips with H and learns e
P learns O(N) time units later!
43
What went wrong?
  • In Robbert's horrendous tree, each node has equal
    "work to do" but the information-space diameter
    is larger!
  • Astrolabe benefits from "instant" knowledge
    because the epidemic at each level is run by
    someone elected from the level below

44
Insight: Two kinds of shape
  • We've focused on the aggregation tree
  • But in fact should also think about the
    information flow tree

45
Information space perspective
  • Bad aggregation graph: diameter O(n)
  • Astrolabe version: diameter O(log(n))

[Figure: information-flow graphs over nodes A through P]
46
Gossip and bias
  • Often useful to bias gossip, particularly if
    some links are fast and others are very slow

[Figure: nodes A, B, C, D, E, F and X, Y, Z, with one link singled out]
Roughly half the gossip will cross this link!
Demers: shows how to adjust probabilities to even
the load. Ziao later showed that one must also
fine-tune the gossip rate
47
Gossip and bias
  • Idea: shaped gossip probabilities
  • Gravitational Gossip (Jenkins)
  • Groups carry multicast event streams reporting
    evolution of a continuous function (like electric
    power loads)
  • Some nodes want to watch closely while others
    only need part of the data

48
Gravitational Gossip
Jenkins: when a gossips to b, it includes
information about topic t in a way weighted by
b's level of interest in topic t
[Figure: nodes a, b, c]
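
One possible reading of "weighted by b's level of interest" is sketched below: a fixed per-message budget of items is divided among topics in proportion to the receiver's interest weights. This is an illustrative assumption, not Jenkins' actual algorithm, and the function name, the budget parameter, and the topic names are hypothetical.

```python
def allocate_budget(interest: dict, budget: int) -> dict:
    """Split `budget` items across topics in proportion to the receiver's
    interest weights, using largest-remainder rounding."""
    total = sum(interest.values())
    if total == 0 or budget <= 0:
        return {topic: 0 for topic in interest}
    exact = {t: budget * w / total for t, w in interest.items()}
    alloc = {t: int(x) for t, x in exact.items()}
    leftover = budget - sum(alloc.values())
    # Hand remaining slots to the topics with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - alloc[t], reverse=True)
    for topic in by_remainder[:leftover]:
        alloc[topic] += 1
    return alloc


if __name__ == "__main__":
    # b cares three times as much about "power" as about "weather".
    print(allocate_budget({"power": 3, "weather": 1}, budget=8))
```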
49
Gravitational Gossip
50
How does bias impact information-flow graph?
  • Earlier, all links were the same
  • Now, some links carry
  • Less information
  • And may have longer delays
  • (Bounded capacity messages similar to long
    links?)
  • Question: model biased information-flow graphs
    and explore implications

51
Questions about bias
  • When does the biasing of gossip target selection
    break analytic results?
  • Example: Alves and Hopcroft show that with fanout
    too small, gossip epidemics can die out,
    logically partitioning a system
  • Question: can we relate the question to flooding
    on an expander graph?

52
Can an overlay learn its shape?
  • The unifying link across the many examples we've
    cited
  • And needed in real world overlays when these
    adapt to underlying topology
  • For example, to pick super-peers
  • Hints of a theory of adaptive tuning?
  • Related to sketches in databases

53
Can an overlay learn its shape?
  • Suppose we wanted to use gossip to implement
    decentralized balanced overlay trees (nodes
    know locations)
  • Build some sort of overlay
  • Randomly select initial upper-level nodes
  • Jiggle until they are nicely spaced
  • Instance of facility location problem

54
Emergent landmarks
55
Emergent landmarks
Too many
Too few
56
Emergent landmarks
57
Emergent landmarks
58
Emergent landmarks
One per cluster
Centroid
59
2-D scenario: gossip variant of Euclidean
k-medians (a popular STOC topic)
  • Nodes know position on a line, can gossip
  • In log(N) time, seek k equally spaced nodes
  • Later, after disruption, reconverge in log(N)
    time
  • NB: the facility placement problem is just the
    d-dimensional k-centroids problem

[Figure: example with K = 3]
60
Why is this hard?
  • After k rounds, each node has only communicated
    with 2k other nodes
  • In log(N) time, aggregate information has time to
    transit the whole graph, but with bounded
    bandwidth we don't have time to send all data

61
Emergent shape problem
  • Can a gossip system learn its own shape in a
    convergently consistent way?
  • Start with centralized solutions drawn from the
    optimization community.
  • Then seek decentralized gossip variants that
    achieve convergently consistent approximations,
    (re)converge in log(N) rounds, and use bounded
    bandwidth

62
more questions
Optional content (time permitting)
  • Notice that Astrolabe forces participants to
    agree on what the aggregation hierarchy should
    contain
  • In effect, we need to share interest in the
    aggregation hierarchy
  • This allows us to bound the size of messages
    (expected constant) and the rate (expected
    constant per epidemic)

63
The question
  • Could we design a gossip-based system for
    self-centered state monitoring?
  • Each node poses a query, Astrolabe style, on the
    state of the system
  • We dynamically construct an overlay for each of
    these queries
  • The system is the union of these overlays

64
Self-centered monitoring
65
Self-centered queries
  • Offhand, looks like a bad idea
  • If everyone has an independent query
  • And everyone is iid in all the obvious ways
  • Then everyone must invest work proportional to
    the number of nodes monitored by each query
  • If O(n) queries touch O(n) nodes, per-node workload
    grows as O(n) and global load as O(n^2)

66
Aggregation
  • But in practice, it seems unlikely that queries
    would look this way
  • More plausible is something Zipf-like
  • A few queries look at broad state of system
  • Most look at relatively few nodes
  • And a small set of aggregates might be shared by
    the majority of queries
  • Assuming this is so, can one build a scalable
    gossip overlay / monitoring infrastructure?

67
Summary of possible topics
  • Consistency models
  • Necessary conditions for convergent consistency
  • Self-stabilization
  • Implications of bias
  • Emergent structures
  • Self-centered aggregation
  • Bandwidth-limited systems, sketches
  • Using aggregation to fine-tune a shape with
    constant costs