Introduction to Structured Overlay Networks - PowerPoint PPT Presentation

Slides: 64
Provided by: sei114
1
Introduction to Structured Overlay Networks
  • Seif Haridi
  • KTH/SICS

11/14/2009
1
2
Presentation Overview
  • Gentle introduction to Structured Overlay
    Networks and Distributed Hash Tables
  • General use of SONs and DHTs
  • Chord algorithms and others

3
What's a Distributed Hash Table (DHT)?
  • An ordinary hash table, which is distributed
  • Every node provides a lookup operation
  • Provide the value associated with a key
  • Nodes keep routing pointers
  • If item not found, route to another node

4
So what?
Time to find data is logarithmic. Size of routing tables is logarithmic. Example: log2(1000000) ≈ 20. EFFICIENT!
Store number of items proportional to number of nodes. Typically, with D items and n nodes: store D/n items per node, and move D/n items when nodes join/leave/fail. EFFICIENT!
  • Self-management routing info
  • Ensure routing information is up-to-date
  • Self-management of items
  • Ensure that data is always replicated and
    available
  • Characteristic properties
  • Scalability
  • Number of nodes can be huge
  • Number of items can be huge
  • Self-manage in the presence of joins/leaves/failures
  • Routing information
  • Data items
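The two "EFFICIENT!" claims are plain arithmetic. A minimal sketch, assuming uniformly distributed identifiers; the item and node counts used below are hypothetical:

```python
import math

def max_hops(num_nodes):
    # A logarithmic routing table gives lookups of about ceil(log2(n)) hops.
    return math.ceil(math.log2(num_nodes))

def items_per_node(num_items, num_nodes):
    # With uniform hashing each node stores about D/n items, and about
    # D/n items move when a single node joins, leaves or fails.
    return num_items / num_nodes

print(math.log2(1_000_000))                   # about 19.93
print(max_hops(1_000_000))                    # 20
print(items_per_node(10_000_000, 1_000_000))  # 10.0
```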

5
Traditional Motivation (1/2)
  • Peer-to-Peer file sharing is very popular
  • Napster
  • Completely centralized
  • Central server knows who has what
  • Legal problems
  • Gnutella
  • Completely decentralized
  • Ask everyone you know to find data
  • Very inefficient

[Figure: central index (Napster) vs. decentralized index (Gnutella)]
6
Traditional Motivation (2/2)
  • Grand vision of DHTs
  • Provide efficient file sharing
  • Quote from Chord: "In particular, Chord can
    help avoid single points of failure or control
    that systems like Napster possess, and the lack
    of scalability that systems like Gnutella display
    because of their widespread use of broadcasts."
    (Stoica et al. 2001)
  • Hidden assumptions
  • Millions of unreliable nodes
  • User can switch off computer any time
    (leave = failure)
  • Extreme dynamism (nodes joining/leaving/failing)
  • Heterogeneity of computers and latencies
  • Untrusted nodes

7
Motivation: DHT overlay as communication
infrastructure
  • Internet communication
  • IP/port, TCP and UDP
  • Not suited for 21st century computing
  • Firewalls
  • NATs
  • Changing IP addresses

8
Name-based communication
  • DHTs can overcome these
  • How?
  • Use the DHT
  • Map names to locations
  • Bypass firewalls and NATs by routing through
    neighbors

9
Name-based communication
  • What about group communication?
  • IP multicast is not generally enabled on the Internet
  • Use the overlay to broadcast to all nodes
  • Create multiple groups, broadcast within each

10
What's it good for?
  • Let's look at 10 applications built using such
    systems

11
Distributed Authorization
  • Defense project at SICS, Swedish Institute of
    Computer Science
  • Store certificates in the directory
  • No central server
  • Survives even if nodes are attacked

12
Distributed Backup
  • Setup
  • Clients install the backup tool
  • Decide on amount of space to share
  • Choose files for backup
  • Regular backup
  • Data is encrypted
  • Stored in the directory

13
Distributed File System
  • Similar to AFS and NFS
  • Files stored in directory
  • What is new?
  • Application logic self-managed
  • Add/remove servers on the fly
  • Automatically handles failures
  • Automatically load-balances
  • No manual configuration needed

14
P2P Cache
  • A distributed cache
  • Every node in an organization runs a client
  • Want to browse a web page?
  • If it is already cached within the organization -> download it from a peer
  • Otherwise, fetch and cache
  • No central proxy needed

15
P2P Web Servers
  • Distributed Web Server
  • Pages stored in the directory
  • What is new?
  • Application logic self-managed
  • Automatically load-balances
  • Add/remove servers on the fly
  • Automatically handles failures

16
P2P SIP
  • Session Initiation Protocol
  • Used to initiate calls on the Internet
  • Is being standardized
  • Use the directory to find end-hosts
  • Improving Skype

17
Host Identity Protocol (HIP)
  • Uses the directory to provide seamless mobility
  • Unlike Mobile IP
  • No home agent needed
  • Self-managing

18
PIER (databases)
  • A relational view of the directory
  • Use SQL to fetch data
  • Standard operations (projection, selection,
    equi-join)

19
Summary
  • DHT is a useful data structure
  • Assumptions mentioned might not be true
  • Moderate amount of dynamism
  • Leave is not the same thing as failure
  • Dedicated servers
  • Nodes can be trusted
  • Less heterogeneity

20
Chord as an Example of a DHT
21
How to construct a DHT (Chord)?
  • Use a logical name space, called the identifier
    space, consisting of identifiers 0, 1, 2, ..., N-1
  • Identifier space is a logical ring modulo N
  • Every node picks a random identifier through a hash
    function H
  • Example
  • Space N = 16: {0, ..., 15}
  • Five nodes a, b, c, d, e
  • a picks 6
  • b picks 5
  • c picks 0
  • d picks 11
  • e picks 2
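A node (or key) can derive its identifier by hashing a name and reducing the digest modulo N. The sketch below assumes SHA-1 as the hash H and uses the names a to e as stand-ins for node addresses; the identifiers it produces will generally differ from the example values above, which were chosen for illustration, and in a space of only 16 two names may collide.

```python
import hashlib

N = 16  # size of the identifier space from the example, identifiers 0..15

def chord_id(name, space=N):
    """Map a name onto the identifier ring 0..space-1 (SHA-1 is an assumption)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % space

for node in ["a", "b", "c", "d", "e"]:
    print(node, chord_id(node))
```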

22
Definition of Successor
  • The successor of an identifier is the first node
    met going in clockwise direction, starting at the
    identifier
  • Example
  • succ(12) = 14
  • succ(15) = 2
  • succ(6) = 6
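The successor function can be computed with a binary search over the sorted node identifiers, wrapping around the ring. A minimal sketch, assuming the node set {0, 2, 5, 6, 11} of the earlier example (N = 16):

```python
import bisect

def succ(ident, nodes, space):
    """First node met going clockwise from ident (inclusive), modulo space."""
    ring = sorted(nodes)
    i = bisect.bisect_left(ring, ident % space)
    # Past the highest identifier, wrap around to the smallest node.
    return ring[i] if i < len(ring) else ring[0]

nodes = [6, 5, 0, 11, 2]    # a..e from the earlier example, N = 16
print(succ(7, nodes, 16))   # 11
print(succ(6, nodes, 16))   # 6
print(succ(12, nodes, 16))  # 0 (wraps around)
```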

23
Where to store data (Chord)?
  • Use globally known hash function, H
  • Each item <key, value> gets
    identifier H(key) = k
  • Store each item at the successor of its identifier, succ(k)
  • Node n is responsible for item k if n = succ(k)
  • Example
  • H(Marina) = 12
  • H(Peter) = 2
  • H(Seif) = 9
  • H(Stefan) = 14
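Placing items then reduces to one successor computation per key. The sketch below takes the hash values from the example as given instead of computing them, and assumes the node set {0, 2, 5, 6, 11} from the earlier slides; with these nodes both Marina (12) and Stefan (14) wrap around to node 0.

```python
import bisect
from collections import defaultdict

def succ(ident, ring, space):
    # ring must be sorted; wrap around past the highest identifier
    i = bisect.bisect_left(ring, ident % space)
    return ring[i] if i < len(ring) else ring[0]

def place_items(item_ids, nodes, space):
    """Return {node: [keys]}, storing each key at succ(H(key))."""
    ring = sorted(nodes)
    placement = defaultdict(list)
    for key, k in item_ids.items():
        placement[succ(k, ring, space)].append(key)
    return dict(placement)

item_ids = {"Marina": 12, "Peter": 2, "Seif": 9, "Stefan": 14}  # H(key) from the slide
print(place_items(item_ids, [0, 2, 5, 6, 11], 16))
# {0: ['Marina', 'Stefan'], 2: ['Peter'], 11: ['Seif']}
```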

24
Where to point (Chord)?
  • Each node points to its successor
  • The successor of a node n is succ(n+1)
  • Known as a node's succ pointer
  • Each node points to its predecessor
  • First node met in anti-clockwise direction
    starting at n-1
  • Known as a node's pred pointer
  • Example
  • 0's successor is succ(1) = 2
  • 2's successor is succ(3) = 5
  • 5's successor is succ(6) = 6
  • 6's successor is succ(7) = 11
  • 11's successor is succ(12) = 0
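With a complete view of the ring, the succ and pred pointers are simply each node's clockwise and anti-clockwise neighbour. A sketch for the example ring, assuming distinct node identifiers:

```python
def ring_pointers(nodes):
    """Return (succ, pred) maps: each node's clockwise and anti-clockwise neighbour."""
    ring = sorted(nodes)
    n = len(ring)
    succ = {ring[i]: ring[(i + 1) % n] for i in range(n)}
    pred = {ring[i]: ring[(i - 1) % n] for i in range(n)}
    return succ, pred

succ, pred = ring_pointers([0, 2, 5, 6, 11])
print(succ)  # {0: 2, 2: 5, 5: 6, 6: 11, 11: 0}
print(pred)  # {0: 11, 2: 0, 5: 2, 6: 5, 11: 6}
```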

25
DHT Lookup
  • To look up a key k
  • Calculate H(k)
  • Follow succ pointers until item k is found
  • Example
  • Look up Seif at node 2
  • H(Seif) = 9
  • Traverse nodes
  • 2, 5, 6, 11 (BINGO)
  • Return Stockholm to initiator
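This lookup only needs the succ pointers and a test for whether the key falls between a node and its successor. A minimal sketch that replays the example; the store dictionary is a hypothetical stand-in for the data held at node 11.

```python
def between(x, a, b, space):
    """True if x lies in the ring segment (a, b], moving clockwise."""
    if a == b:
        return True  # a single node covers the whole ring
    return 0 < (x - a) % space <= (b - a) % space

def linear_lookup(start, key_id, succ_ptr, space):
    """Follow succ pointers from start; return the list of nodes visited."""
    path = [start]
    node = start
    while not between(key_id, node, succ_ptr[node], space):
        node = succ_ptr[node]
        path.append(node)
    path.append(succ_ptr[node])  # the successor of key_id stores the item
    return path

succ_ptr = {0: 2, 2: 5, 5: 6, 6: 11, 11: 0}   # example ring, N = 16
store = {11: {"Seif": "Stockholm"}}           # H(Seif) = 9, stored at succ(9) = 11
path = linear_lookup(2, 9, succ_ptr, 16)
print(path)                      # [2, 5, 6, 11]
print(store[path[-1]]["Seif"])   # Stockholm
```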

26
DHT Lookup
  • (a, b] denotes the segment of the ring moving clockwise
    from, but not including, a until and including b
  • n.foo(.) denotes an RPC of foo(.) to node n
  • n.bar denotes an RPC to fetch the value of the
    variable bar in node n
  • We call the process of finding the successor of
    an id a LOOKUP
  • // ask node n to find the successor of id
  • procedure n.findSuccessor(id)
  • if predecessor ≠ nil ∧ id ∈ (predecessor, n]
    then return n
  • else if id ∈ (n, successor] then
  • return successor
  • else // forward the query around the circle
  • return successor.findSuccessor(id)
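The pseudocode translates almost line by line. In this sketch, assuming the example ring {0, 2, 5, 6, 11} with N = 16, a local method call stands in for each RPC, and build_ring is a hypothetical helper that wires up correct pointers.

```python
SPACE = 16  # identifier space of the running example

def in_open_closed(x, a, b):
    """x in (a, b] on the ring; the whole ring when a == b."""
    if a == b:
        return True
    return 0 < (x - a) % SPACE <= (b - a) % SPACE

class Node:
    def __init__(self, ident):
        self.ident = ident
        self.successor = self     # a lone node is its own successor
        self.predecessor = None

    def find_successor(self, ident):
        # if predecessor != nil and id in (predecessor, n] then return n
        if self.predecessor is not None and in_open_closed(
                ident, self.predecessor.ident, self.ident):
            return self
        # else if id in (n, successor] then return successor
        if in_open_closed(ident, self.ident, self.successor.ident):
            return self.successor
        # else forward the query around the circle (an RPC in a real system)
        return self.successor.find_successor(ident)

def build_ring(ids):
    """Hypothetical helper: create nodes with correct succ/pred pointers."""
    order = sorted(ids)
    nodes = {i: Node(i) for i in order}
    for k, ident in enumerate(order):
        nodes[ident].successor = nodes[order[(k + 1) % len(order)]]
        nodes[ident].predecessor = nodes[order[k - 1]]
    return nodes

ring = build_ring([0, 2, 5, 6, 11])
print(ring[2].find_successor(9).ident)    # 11
print(ring[11].find_successor(14).ident)  # 0
```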

27
DHT Lookup and Update
  • // ask node n to store or retrieve the value of id
  • procedure n.put(id, value)
  • s := findSuccessor(id)
  • s.store(id, value)
  • procedure n.get(id)
  • s := findSuccessor(id)
  • return s.retrieve(id)
  • PUT and GET are nothing but lookups!!
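PUT and GET can be layered on any findSuccessor. In this sketch, assuming the same example ring and skipping the hashing step (keys are already identifiers, e.g. H(Seif) = 9), a per-node dictionary plays the role of s.store and s.retrieve; the lookup only follows successor pointers to keep the code short.

```python
SPACE = 16

def in_open_closed(x, a, b):
    """x in (a, b] on the ring; the whole ring when a == b."""
    if a == b:
        return True
    return 0 < (x - a) % SPACE <= (b - a) % SPACE

class Node:
    def __init__(self, ident):
        self.ident = ident
        self.successor = self
        self.data = {}                 # items this node is responsible for

    def find_successor(self, ident):
        # Simplified lookup along successor pointers (no predecessor shortcut).
        if in_open_closed(ident, self.ident, self.successor.ident):
            return self.successor
        return self.successor.find_successor(ident)

    def put(self, ident, value):
        s = self.find_successor(ident)
        s.data[ident] = value          # s.store(id, value)

    def get(self, ident):
        s = self.find_successor(ident)
        return s.data.get(ident)       # s.retrieve(id)

ids = [0, 2, 5, 6, 11]
nodes = {i: Node(i) for i in ids}
for a, b in zip(ids, ids[1:] + ids[:1]):
    nodes[a].successor = nodes[b]

nodes[2].put(9, "Stockholm")   # stored at succ(9) = 11
print(nodes[11].data)          # {9: 'Stockholm'}
print(nodes[5].get(9))         # Stockholm
```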

28
Speeding up lookups
  • If only the pointer to succ(n+1) is used
  • Worst-case lookup time is N, for N nodes
  • Improving lookup time (finger/routing table)
  • Point to succ(n+1)
  • Point to succ(n+2)
  • Point to succ(n+4)
  • Point to succ(n+8)
  • ...
  • Point to succ(n+2^(M-1))
  • Distance to the destination is always
    halved
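The finger table of a node follows directly from this rule when the full set of nodes is known. A sketch, assuming M = 4 (N = 16) and the example nodes {0, 2, 5, 6, 11}; a real node would obtain each entry through a lookup rather than from a global list.

```python
import bisect

def succ(ident, ring, space):
    # ring must be sorted; wrap around past the highest identifier
    i = bisect.bisect_left(ring, ident % space)
    return ring[i] if i < len(ring) else ring[0]

def finger_table(n, nodes, m):
    """Finger i (1..m) of node n points to succ(n + 2**(i-1)) on a ring of size 2**m."""
    space = 2 ** m
    ring = sorted(nodes)
    return [succ(n + 2 ** (i - 1), ring, space) for i in range(1, m + 1)]

print(finger_table(0, [0, 2, 5, 6, 11], 4))   # [2, 2, 5, 11]
print(finger_table(11, [0, 2, 5, 6, 11], 4))  # [0, 0, 0, 5]
```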

29
Chord Routing (1/7)
[Figure: identifier ring 0 to 15 with the lookup Get(15)]
  • Routing table size M, where N = 2^M
  • Every node n knows successor(n + 2^(i-1)), for i
    = 1..M
  • Routing entries = log2(N)
  • At most log2(N) hops from any node to any other node
30
Chord Routing (2/7)
[Figure: identifier ring 0 to 15 with the lookup Get(15), continued]
31
Chord Routing (3/7)
[Figure: identifier ring 0 to 15 with the lookup Get(15), continued]
32
Chord Routing (4/7)
[Figure: identifier ring 0 to 15 with the lookup Get(15)]
  • From node 1, only 2 hops to node 0, where item 15
    is stored
  • For an id space of size 16, the maximum is log2(16)
    = 4 hops between any two nodes
  • In fact, if nodes are uniformly distributed, the
    maximum is log2(# of nodes), i.e. log2(8) = 3 hops
    between any two nodes
  • The average complexity is
  • ½ log2(# of nodes)
33
Chord Routing (5/7): Pseudocode
findSuccessor(.)
  • // ask node n to find the successor of id
  • procedure n.findSuccessor(id)
  • if predecessor ≠ nil ∧ id ∈ (predecessor, n]
    then return n
  • if id ∈ (n, successor] then
  • return successor
  • else
  • n' := closestPrecedingNode(id)
  • return n'.findSuccessor(id)
  • // search locally for the highest predecessor of
    id
  • procedure closestPrecedingNode(id)
  • for i := m downto 1 do
  • if finger[i] ∈ (n, id) then return
    finger[i]
  • end
  • return n
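The finger-based routing step can be simulated in a few lines. A sketch, assuming the example ring {0, 2, 5, 6, 11} with m = 4, fingers filled in correctly by a hypothetical build helper, and local calls in place of RPCs; the predecessor shortcut on the first line of findSuccessor is left out for brevity.

```python
SPACE_BITS = 4            # m, so identifiers are 0..15
SPACE = 2 ** SPACE_BITS

def in_oc(x, a, b):
    """x in (a, b] clockwise; the whole ring when a == b."""
    return a == b or 0 < (x - a) % SPACE <= (b - a) % SPACE

def in_oo(x, a, b):
    """x in (a, b) clockwise; everything except a when a == b."""
    if a == b:
        return x != a
    return 0 < (x - a) % SPACE < (b - a) % SPACE

class Node:
    def __init__(self, ident):
        self.ident = ident
        self.finger = []          # finger[0] is finger 1, i.e. the successor

    @property
    def successor(self):
        return self.finger[0]

    def closest_preceding_node(self, ident):
        for f in reversed(self.finger):          # for i := m downto 1
            if in_oo(f.ident, self.ident, ident):
                return f
        return self

    def find_successor(self, ident, hops=0):
        """Return (node responsible for ident, number of forwarding hops)."""
        if in_oc(ident, self.ident, self.successor.ident):
            return self.successor, hops
        nxt = self.closest_preceding_node(ident)
        return nxt.find_successor(ident, hops + 1)

def build(ids):
    """Hypothetical helper: nodes with correct fingers succ(n + 2^(i-1))."""
    ring = sorted(ids)
    nodes = {i: Node(i) for i in ring}
    def succ(x):
        x %= SPACE
        return next((i for i in ring if i >= x), ring[0])
    for n in ring:
        nodes[n].finger = [nodes[succ(n + 2 ** k)] for k in range(SPACE_BITS)]
    return nodes

nodes = build([0, 2, 5, 6, 11])
owner, hops = nodes[2].find_successor(15)
print(owner.ident, hops)   # 0 1
```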

34
Chord Discussion
  • We are basically done
  • But...
  • What about joins and failures/leaves?
  • Nodes come and go as they wish
  • What about data?
  • Should I lose my doc because some kid decided to
    shut down his machine and he happened to store my
    file? What about storing addresses of files
    instead of files?
  • What did we gain compared to Gnutella? Increased
    guarantees and determinism?
  • So actually, we have just started...

35
Agenda
  • Handling successor pointers
  • Joins, Leaves
  • Scalability
  • Routing table: reducing the cost from O(N) to
    O(log N)
  • Failures (for all the above)

36
Handling Successors: Ring maintenance
  • Everything depends on successor pointers, so we
    had better have them right all the time!
  • In Chord, in addition to the successor pointer,
    every node has a predecessor pointer as well for
    ring maintenance

37
Handling Dynamism
  • Periodic stabilization is used to make pointers
    eventually correct
  • Try pointing succ to closest alive successor
  • Try pointing pred to closest alive predecessor
  • When receiving notify(p) at n
  • if pred = nil or p is in (pred, n]
  • set pred := p
  • Periodically at n
  • v := succ.pred
  • if v ≠ nil and v is in (n, succ]
  • set succ := v
  • send a notify(n) to succ

38
Handling joins
  • When n joins
  • Find n's successor with lookup(n)
  • Set succ to n's successor
  • Stabilization fixes the rest

[Figure: ring segment with nodes 11, 13 and 15]
  • Periodically at n
  • set v := succ.pred
  • if v ≠ nil and v is in (n, succ]
  • set succ := v
  • send a notify(n) to succ
  • When receiving notify(p) at n
  • if pred = nil or p is in (pred, n]
  • set pred := p
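The join and the two stabilization rules can be exercised on the figure's nodes. A sketch, assuming N = 16, a ring that initially holds only nodes 11 and 15, node 13 joining through node 11, and local calls in place of RPCs:

```python
SPACE = 16

def in_oc(x, a, b):
    """x in (a, b] clockwise; the whole ring when a == b."""
    return a == b or 0 < (x - a) % SPACE <= (b - a) % SPACE

class Node:
    def __init__(self, ident):
        self.ident = ident
        self.succ = self
        self.pred = None

    def find_successor(self, ident):
        # Linear lookup along succ pointers, enough for this sketch.
        node = self
        while not in_oc(ident, node.ident, node.succ.ident):
            node = node.succ
        return node.succ

    def join(self, existing):
        # When n joins: set succ to the result of lookup(n); pred stays nil.
        self.pred = None
        self.succ = existing.find_successor(self.ident)

    def stabilize(self):
        # Periodically at n: v := succ.pred; adopt v if it lies in (n, succ].
        v = self.succ.pred
        if v is not None and in_oc(v.ident, self.ident, self.succ.ident):
            self.succ = v
        self.succ.notify(self)

    def notify(self, p):
        # When receiving notify(p) at n.
        if self.pred is None or in_oc(p.ident, self.pred.ident, self.ident):
            self.pred = p

n11, n15 = Node(11), Node(15)
n11.succ, n15.succ, n11.pred, n15.pred = n15, n11, n15, n11   # two-node ring
n13 = Node(13)
n13.join(n11)                  # n13.succ = 15, nobody points at 13 yet
for _ in range(2):             # two rounds of periodic stabilization
    for node in (n11, n13, n15):
        node.stabilize()
print([n.succ.ident for n in (n11, n13, n15)])   # [13, 15, 11]
```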

S. Haridi, ID2210, Lecture 02
39
Handling Successors - Chord Algorithm
40
Handling Joins/Leaves for Fingers: Finger
Stabilization (1/5)
  • Periodically refresh finger table entries, and
    store the index of the next finger to fix
  • This is also the initialization procedure for the
    finger table
  • Local variable next, initially 0
  • procedure n.fixFingers()
    next := next + 1
    if next > m then next := 1
    finger[next] := findSuccessor(n + 2^(next-1))
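fixFingers refreshes one entry per call, cycling through indices 1..m. A sketch that replays the N21 scenario of the following slides, assuming m = 6 (identifiers 0..63, so finger 6 of N21 starts at 21 + 32 = 53) and replacing findSuccessor with a hypothetical oracle that returns the correct successor; the oracle therefore skips the intermediate phase in which the lookup still stops at N48.

```python
import bisect

class Node:
    def __init__(self, ident, m, lookup):
        self.ident = ident
        self.m = m
        self.finger = [None] * (m + 1)   # 1-based, finger[1..m]; index 0 unused
        self.next = 0                    # local variable next, initially 0
        self.lookup = lookup             # hypothetical stand-in for findSuccessor

    def fix_fingers(self):
        self.next += 1
        if self.next > self.m:
            self.next = 1
        start = (self.ident + 2 ** (self.next - 1)) % 2 ** self.m
        self.finger[self.next] = self.lookup(start)

def make_lookup(live, space):
    """Hypothetical oracle: the true successor among the currently live nodes."""
    def lookup(start):
        ring = sorted(live)
        i = bisect.bisect_left(ring, start % space)
        return ring[i] if i < len(ring) else ring[0]
    return lookup

M = 6                                  # identifier space 0..63
live = [21, 26, 32, 48, 60]
n21 = Node(21, M, make_lookup(live, 2 ** M))
for _ in range(M):                     # one finger per call; M calls fill the table
    n21.fix_fingers()
print(n21.finger[1:])                  # [26, 26, 26, 32, 48, 60]
live.append(56)                        # N56 joins
for _ in range(M):
    n21.fix_fingers()
print(n21.finger[6])                   # 56
```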

41
Example: finger stabilization (2/5)
  • Current situation: succ(N48) is N60
  • Succ(N21.Finger[j].start) = Succ(53)
    = N21.Finger[j].node = N60

[Figure: ring with N21, N26, N32, N48 and N60; N21.Finger[j].start = 53, N21.Finger[j].node = N60]
42
Example: finger stabilization (3/5)
  • New node N56 joins and stabilizes its successor
    pointer
  • Finger j of node N21 is now wrong
  • N21 eventually tries to fix finger j by looking up
    53, which stops at N48 (whose successor is still
    N60), so nothing changes

[Figure: ring with N21, N26, N32, N48, N56 and N60; finger j of N21 starts at 53]
43
Example: finger stabilization (4/5)
  • N48 will eventually stabilize its successor
  • This means the ring is correct now

[Figure: ring with N21, N26, N32, N48, N56 and N60; finger j of N21 starts at 53]
44
Example: finger stabilization (5/5)
  • When N21 tries to fix finger j again, this time
    the response from N48 will be correct and N21
    corrects the finger

[Figure: ring with N21, N26, N32, N48, N56 and N60; finger j of N21 starts at 53]
45
Agenda
  • Handling successor pointers
  • Joins, Leaves
  • Scalability
  • Routing table: reducing the cost from O(N) to
    O(log N)
  • Failures (for all the above)
  • Handling data
  • Joins, Leaves

46
Handling Failures: Replication of Successors
  • Evidently, the failure of one successor pointer
    means total collapse
  • Solution: a node has a successor list of size
    r containing the immediate r successors
  • How big should r be? log(N) or a large constant
    should be OK
  • Enhance periodic stabilization to handle failures

47
Dealing with failures
  • Each node keeps a successor-list
  • Pointer to r closest successors
  • succ(n+1)
  • succ(succ(n+1)+1)
  • succ(succ(succ(n+1)+1)+1)
  • ...
  • If successor fails
  • Replace with closest alive successor
  • If predecessor fails
  • Set pred to nil
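A sketch of the successor list and the fail-over rule, assuming a global view of the node identifiers and r smaller than the number of nodes (otherwise entries repeat); the identifiers reuse the N21 to N60 nodes of the finger example.

```python
def successor_list(n, ring_ids, r):
    """The r closest successors of node n: succ(n+1), succ(succ(n+1)+1), ..."""
    ring = sorted(ring_ids)
    i = ring.index(n)
    return [ring[(i + k) % len(ring)] for k in range(1, r + 1)]

def closest_alive(succ_list, alive):
    """If the successor fails, replace it with the closest alive entry."""
    for s in succ_list:
        if s in alive:
            return s
    return None   # all r successors failed at once

ring = [21, 26, 32, 48, 60]
lst = successor_list(21, ring, 3)
print(lst)                                    # [26, 32, 48]
print(closest_alive(lst, {21, 32, 48, 60}))   # 32 (N26 failed)
```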

48
Handling leaves
  • When n leaves
  • Just disappear (like failure)
  • When pred detected failed
  • Set pred to nil
  • When succ detected failed
  • Set succ to closest alive in successor list

[Figure: ring segment with nodes 11, 13 and 15]
  • Periodically at n
  • set v := succ.pred
  • if v ≠ nil and v is in (n, succ]
  • set succ := v
  • send a notify(n) to succ
  • When receiving notify(p) at n
  • if pred = nil or p is in (pred, n]
  • set pred := p

49
Handling Failures: Ring (1/5)
  • Maintaining the ring
  • Each node maintains a successor list of length r
  • If a node's immediate successor fails, it uses
    the second entry in its successor list
  • updateSuccessorList copies a successor list from
    s, removing the last entry and prepending s
  • Join a Chord ring containing node n'
  • procedure n.join(n')
    predecessor := nil
    s := n'.findSuccessor(n)
    updateSuccessorList(s.successorList)
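updateSuccessorList and join are small list operations. A sketch, assuming r = 3, the ring N21, N26, N32, N48, N60, a hypothetical node 40 joining, and a lambda standing in for the remote lookup n'.findSuccessor(n):

```python
def update_successor_list(s, s_successor_list):
    """Copy s's successor list, remove its last entry and prepend s."""
    return [s] + s_successor_list[:-1]

def join(n, find_successor, successor_lists):
    """n.join(n'): predecessor := nil; s := n'.findSuccessor(n); copy s's list."""
    predecessor = None
    s = find_successor(n)
    return predecessor, update_successor_list(s, successor_lists[s])

lists = {48: [60, 21, 26]}                            # N48's current successor list
pred, succ_list = join(40, lambda ident: 48, lists)   # lookup(40) -> N48, assumed
print(pred, succ_list)                                # None [48, 60, 21]
```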

50
Handling Failures: Ring (2/5)
  • Check whether predecessor has failed (failure
    detector)
  • procedure n.checkPredecessor()
    if predecessor has failed then predecessor := nil

51
Handling Failures: Ring (3/5)
  • procedure n.stabilize()
  • s := find first alive node in successorList
  • x := s.predecessor
    if x ≠ nil and x ∈ (n, s) then s := x end
    updateSuccessorList(s.successorList)
    s.notify(n)
  • procedure n.notify(n')
    if predecessor = nil or n' ∈ (predecessor, n)
    then predecessor := n'

52
Failure: Ring (4/5) Example: Node failure (N26)
  • Initially

[Figure: N21, N26 and N32 with suc(N21,1), suc(N21,2), suc(N26,1) and pred(N32)]
  • After N21 performed stabilize(), before
    N21.notify(N32)

[Figure: N21, N26 and N32 with suc(N21,1) and pred(N32)]
53
Failure: Ring (5/5) Example: Node failure
(N26)
  • After N21 performed stabilize(), before
    N21.notify(N32)
  • N21.notify(N32) has no effect, since N32's
    predecessor is still N26 and N21 ∉ (N26, N32)

[Figure: N21, N26 and N32 with suc(N21,1) and pred(N32)]
  • After N32.checkPredecessor()

[Figure: N21, N26 and N32 with suc(N21,1)]
  • Next, N21.stabilize() fixes N32's predecessor
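The N26 failure example can be replayed with the ring procedures. A sketch, assuming a ring of only N21, N26 and N32 (identifiers 0..63), successor lists of length r = 2, an alive flag in place of a real failure detector, and one addition to stabilize that is not in the pseudocode: a predecessor already known to be dead is not adopted as successor.

```python
SPACE = 64   # identifiers 0..63 (assumed)

def in_oo(x, a, b):
    """x strictly inside (a, b), moving clockwise."""
    return 0 < (x - a) % SPACE < (b - a) % SPACE

class Node:
    def __init__(self, ident):
        self.ident = ident
        self.alive = True
        self.pred = None
        self.succ_list = []

    def stabilize(self):
        s = next(x for x in self.succ_list if x.alive)   # first alive successor
        x = s.pred
        # Assumption: a predecessor already known to be dead is not adopted.
        if x is not None and x.alive and in_oo(x.ident, self.ident, s.ident):
            s = x
        self.succ_list = [s] + s.succ_list[:-1]          # updateSuccessorList
        s.notify(self)

    def notify(self, p):
        if self.pred is None or in_oo(p.ident, self.pred.ident, self.ident):
            self.pred = p

    def check_predecessor(self):
        # The alive flag stands in for a failure detector.
        if self.pred is not None and not self.pred.alive:
            self.pred = None

n21, n26, n32 = Node(21), Node(26), Node(32)
n21.succ_list, n26.succ_list, n32.succ_list = [n26, n32], [n32, n21], [n21, n26]
n21.pred, n26.pred, n32.pred = n32, n21, n26

n26.alive = False            # N26 fails
n21.stabilize()              # N21 switches to N32; the notify has no effect yet
print(n21.succ_list[0].ident, n32.pred.ident)   # 32 26
n32.check_predecessor()      # N32 detects the failure, pred := nil
n21.stabilize()              # the next notify fixes N32's predecessor
print(n32.pred.ident)        # 21
```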

54
Failure: Lookups (1/5)
  • // ask node n to find the successor of id
  • procedure n.findSuccessor(id)
  • if id ∈ (n, successor] then
  • return successor
  • else
  • n' := closestPrecedingNode(id)
  • return try n'.findSuccessor(id)
    catch failure of n' then
    mark n' in finger[.] as failed
    n.findSuccessor(id)
  • // search locally for the highest predecessor of
    id
  • procedure closestPrecedingNode(id)
  • for i := m downto 1 do
  • if finger[i].node is alive and
    finger[i] ∈ (n, id) then return finger[i]
  • end
  • return n
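The failure-aware lookup can be sketched with an exception standing in for a failed RPC. Assumptions: the example ring {0, 2, 5, 6, 11} (m = 4), node 11 crashed, successor pointers already repaired through the successor lists (only fingers are stale), and a per-node set of failed identifiers instead of flags inside the finger table. The sketch relies on every node's own successor being alive.

```python
SPACE_BITS = 4
SPACE = 2 ** SPACE_BITS   # identifiers 0..15

def in_oc(x, a, b):
    """x in (a, b] clockwise; the whole ring when a == b."""
    return a == b or 0 < (x - a) % SPACE <= (b - a) % SPACE

def in_oo(x, a, b):
    """x in (a, b) clockwise; everything except a when a == b."""
    return x != a if a == b else 0 < (x - a) % SPACE < (b - a) % SPACE

class NodeFailure(Exception):
    """Raised when a call reaches a crashed node (stands in for an RPC timeout)."""

class Node:
    def __init__(self, ident):
        self.ident = ident
        self.alive = True
        self.successor = None
        self.finger = []       # fingers 2..m; finger 1 is the successor itself
        self.failed = set()    # identifiers marked as failed in the finger table

    def closest_preceding_node(self, ident):
        for f in reversed([self.successor] + self.finger):   # i := m downto 1
            if f.ident not in self.failed and in_oo(f.ident, self.ident, ident):
                return f
        return self

    def find_successor(self, ident):
        if not self.alive:
            raise NodeFailure(self.ident)
        if in_oc(ident, self.ident, self.successor.ident):
            return self.successor
        nxt = self.closest_preceding_node(ident)
        try:
            return nxt.find_successor(ident)
        except NodeFailure:
            self.failed.add(nxt.ident)     # mark n' in finger[.] as failed
            return self.find_successor(ident)

ids = [0, 2, 5, 6, 11]
nodes = {i: Node(i) for i in ids}
def succ_of(x):
    x %= SPACE
    return next((i for i in ids if i >= x), ids[0])
for n in ids:
    nodes[n].successor = nodes[succ_of(n + 1)]
    nodes[n].finger = [nodes[succ_of(n + 2 ** k)] for k in range(1, SPACE_BITS)]

nodes[11].alive = False           # node 11 crashes
nodes[6].successor = nodes[0]     # assumed: ring already repaired via successor lists
print(nodes[2].find_successor(15).ident)   # 0
print(nodes[2].failed)                     # {11}
```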

55
Variations of Chord
  • DKS
  • Chord#

56
DKS Routing
  • Generalization of Chord to provide arbitrary
    arity
  • Provides log_k(n) hops per lookup
  • k being a configurable parameter
  • n being the number of nodes
  • Instead of only log2(n)

57
Achieving log_k(n) lookup
  • Each node has log_k(N) = L levels, N = k^L
  • Each level contains k intervals
  • Example: k = 4, N = 64 (4^3), node 0

[Figure: interval boundaries seen from node 0 at 0, 4, 8, 12, 16, 32 and 48]
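The interval structure of the example can be computed level by level. A sketch, assuming N is exactly k^L and that at each level the interval starting at the node itself is the one refined at the next level; this reproduces the boundaries 0, 4, 8, 12, 16, 32 and 48 for node 0. With k = 8 and N = 64 there are only log_8(64) = 2 levels, the kind of 2-hop system mentioned on the arity slide.

```python
def dks_intervals(n, k, levels):
    """Interval start points seen from node n at each level (N = k ** levels).

    Level 1 splits the whole ring into k intervals; each further level
    splits the first interval of the previous level (the one starting at n).
    """
    space = k ** levels
    table = []
    span = space
    for _ in range(levels):
        width = span // k
        table.append([(n + i * width) % space for i in range(k)])
        span = width
    return table

print(dks_intervals(0, 4, 3))        # [[0, 16, 32, 48], [0, 4, 8, 12], [0, 1, 2, 3]]
print(len(dks_intervals(0, 8, 2)))   # 2 levels for N = 64 with k = 8
```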
60
Arity is Important
  • Maximum number of hops can be configured
  • Example: a 2-hop system

61
Chord#
  • The routing table has exponentially increasing
    pointers on the ring (node space) and NOT the
    identifier space (skip-list-like structure)

62
Routing Table of Chord#
  • Building the routing table
  • log2(N) pointers
  • exponentially spaced pointers
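Exponential spacing in node space can be sketched by letting pointer i of a node refer to the node 2^i positions further along the ring, built with the rule pointer(i) = pointer(i-1) of pointer(i-1), so no node needs to know identifier values of distant nodes. This is a simplified, centralized illustration under those assumptions, not the Chord# maintenance protocol itself. For contrast, the Chord fingers of node 0 in the same ring (N = 16) are 2, 2, 5 and 11, which repeat when nodes are unevenly spread.

```python
def node_space_fingers(ring_ids):
    """For each node, pointers to the nodes 1, 2, 4, ... positions ahead on the ring."""
    ring = sorted(ring_ids)
    n = len(ring)
    m = max(1, (n - 1).bit_length())   # about log2(n) pointers per node
    # pointer 0 is the successor
    fingers = {x: [ring[(i + 1) % n]] for i, x in enumerate(ring)}
    for level in range(1, m):
        for x in ring:
            prev = fingers[x][level - 1]
            # pointer(level) = pointer(level-1) of pointer(level-1)
            fingers[x].append(fingers[prev][level - 1])
    return fingers

table = node_space_fingers([0, 2, 5, 6, 11])
print(table[0])   # [2, 5, 11]
print(table[6])   # [11, 0, 5]
```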

63
Chord vs. Chord#
Good for load balancing