Title: Building Very Large Overlay Networks
1 Building Very Large Overlay Networks
Jorg Liebeherr, University of Virginia
2 HyperCast Project
- HyperCast is a set of protocols for large-scale overlay multicasting and peer-to-peer networking
- Research problems
  - Investigate which topologies are best suited for a large number of peers that join and leave sporadically in a network that is dynamically changing
  - Make it easier to write applications for overlays by
    - providing suitable abstractions (overlay socket)
    - shielding the application programmer from overlay topology issues
3 Acknowledgements
- Team
  - Past: Bhupinder Sethi, Tyler Beam, Burton Filstrup, Mike Nahas, Dongwen Wang, Konrad Lorincz, Jean Ablutz, Haiyong Wang, Weisheng Si, Huafeng Lu
  - Current: Jianping Wang, Guimin Zhang, Guangyu Dong, Josh Zaritsky
- This work is supported in part by the National Science Foundation
D E N A L I
4 Applications with many receivers
[Figure: application classes (sensor networks, peer-to-peer applications, games, streaming, collaboration tools, software distribution) plotted by number of senders against number of receivers, both axes ranging from 1 to 1,000,000]
5 Need for Multicasting?
- Maintaining unicast connections is not feasible
- Infrastructure or services need to support a "send to group" operation
6 Problem with Multicasting
[Figure: illustration with multiple NAK messages]
- Feedback implosion: A node is overwhelmed with traffic or state
  - One-to-many multicast with feedback (e.g., reliable multicast)
  - Many-to-one multicast (Incast)
7 Multicast support in the network infrastructure (IP Multicast)
- Reality check
  - Deployment has encountered severe scalability limitations in both the size and number of groups that can be supported
  - IP Multicast is still plagued with concerns pertaining to scalability, network management, deployment, and support for error, flow, and congestion control
8 Overlay Multicasting
- Logical overlay resides on top of the layer-3 network infrastructure (underlay)
- Data is forwarded between neighbors in the overlay
- Transparent to the underlay network
- Overlay topology should match the layer-3 infrastructure
9 Overlay-based approaches for multicasting
- Build an overlay mesh network and embed trees into the mesh
  - Narada (CMU)
  - RMX/Gossamer (UCB)
  - more
- Build a shared tree
  - Yallcast/Yoid (NTT, ACIRI)
  - AMRoute (Telcordia, UMD College Park)
  - Overcast (MIT)
  - Nice (UMD)
  - more
- Build an overlay using distributed hash tables and embed trees
  - Chord (UCB, MIT)
  - CAN (UCB, ACIRI)
  - Pastry/Scribe (Rice, MSR)
  - more
10 HyperCast Overlay Topologies
- Applications organize themselves to form a logical overlay network with a given topology
- Data is forwarded along the edges of the overlay network
11 Programming in HyperCast: Overlay Socket
- Socket-based API
- UDP (unicast or multicast) or TCP
- Supports different semantics for transport of data
- Supports different overlay protocols
- Implementation in Java
- HyperCast Project website: http://www.cs.virginia.edu/hypercast
12 HyperCast Software: Data Exchange
- Each overlay socket has 2 adapters to the underlying network
  - one for the protocol that manages the overlay topology
  - one for data transfer
13 Network of overlay sockets
- An overlay socket is an endpoint of communication in an overlay network
- An overlay network is a collection of overlay sockets
- Overlay sockets in the same overlay network have a common set of attributes
- Each overlay network has an overlay ID, which can be used to access the attributes of the overlay
- Nodes exchange data with neighbors in the overlay topology
14 Unicast and Multicast in overlays
- Unicast and multicast are done along trees that are embedded in the overlay network.
- Requirement: An overlay node must be able to compute its child nodes and parent node with respect to a given root
15 Overlay Message Format
Loosely modeled after IPv6: minimal header with extensions
- Version (4 bit): Version of the protocol (current version is 0x0)
- LAS (2 bit): Size of logical address field
- Dmd (4 bit): Delivery Mode (Multicast, Flood, Unicast, Anycast)
- Traffic Class (8 bit): Specifies Quality of Service class (default 0x00)
- Flow Label (8 bit): Flow identifier
- Next Header (8 bit): Specifies the type of the next header following this header
- OL Message Length (8 bit): Length of the overlay message
- Hop Limit (16 bit): TTL field
- Src LA ((LAS+1)*4 bytes): Logical address of the source
- Dest LA ((LAS+1)*4 bytes): Logical address of the destination
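The address fields are variable in size. The sketch below works through the arithmetic, assuming the size expression on the slide reads (LAS+1)*4 bytes and that LAS is an unsigned 2-bit value. The class and method names are hypothetical and not part of the HyperCast API.

```java
final class LogicalAddressSize {
    // Assumption: LAS is the 2-bit header field (values 0..3) and each
    // logical address occupies (LAS + 1) * 4 bytes.
    static int addressBytes(int las) {
        if (las < 0 || las > 3) {
            throw new IllegalArgumentException("LAS must fit in 2 bits: " + las);
        }
        return (las + 1) * 4;
    }

    public static void main(String[] args) {
        for (int las = 0; las <= 3; las++) {
            System.out.println("LAS=" + las + " -> " + addressBytes(las) + " bytes");
        }
    }
}
```

Under these assumptions a logical address is between 4 and 16 bytes long.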
16 Socket-Based API
- Tries to stay close to the Socket API for UDP Multicast
- Program is independent of overlay topology
    // Generate the configuration object
    OverlayManager om = new OverlayManager("hypercast.prop");
    String overlayID = om.getDefaultProperty("MyOverlayID");
    OverlaySocketConfig config = om.getOverlaySocketConfig(overlayID);
    // Create an overlay socket
    OL_Socket socket = config.createOverlaySocket(callback);
    // Join an overlay
    socket.joinGroup();
    // Create a message from a byte array and its length
    OL_Message msg = socket.createMessage(data, length);
    // Send the message to all members in the overlay network
    socket.sendToAll(msg);
    // Receive a message from the socket
    OL_Message received = socket.receive();
    // Extract the payload
    byte[] payload = received.getPayload();
17 Property File
- Stores attributes that configure the overlay socket (overlay protocol, transport protocol and addresses)
    # This is the Hypercast Configuration File
    # LOG FILE
    LogFileName = stderr
    # ERROR FILE
    ErrorFileName = stderr
    # OVERLAY Server
    OverlayServer =
    # OVERLAY ID
    OverlayID = 224.228.19.78/9472
    KeyAttributes = Socket,Node,SocketAdapter
    # SOCKET
    Socket = HCast2-0
    HCAST2-0.TTL = 255
    # SOCKET ADAPTER
    SocketAdapter = TCP
    SocketAdapter.TCP.MaximumPacketLength = 16384
    SocketAdapter.UDP.MessageBufferSize = 100
    # NODE
    Node = HC2-0
    HC2-0.SleepTime = 400
    HC2-0.MaxAge = 5
    HC2-0.MaxMissingNeighbor = 10
    HC2-0.MaxSuppressJoinBeacon = 3
    # NODE ADAPTER
    NodeAdapter = UDPMulticast
    NodeAdapter.UDP.MaximumPacketLength = 8192
    NodeAdapter.UDP.MessageBufferSize = 18
    NodeAdapter.UDPServer.UdpServer0 = 128.143.71.50:8081
    NodeAdapter.UDPServer.MaxTransmissionTime = 1000
    NodeAdapter.UDPMulticastAddress = 224.242.224.243/2424
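A file of this kind can be read with the standard java.util.Properties class. The following is a minimal sketch, assuming the file uses the usual "key = value" properties syntax; it is not HyperCast's own loader, and the class OverlayConfigSketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class OverlayConfigSketch {
    // Parses property-file text; assumes standard java.util.Properties syntax.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads an integer attribute, returning the fallback when the key is absent.
    static int intProperty(Properties props, String key, int fallback) {
        String value = props.getProperty(key);
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "# NODE",
            "Node = HC2-0",
            "HC2-0.SleepTime = 400",
            "HC2-0.MaxAge = 5");
        Properties props = parse(text);
        // The value of "Node" names the prefix of the node-specific attributes.
        String node = props.getProperty("Node");
        System.out.println(node + " sleep time: " + intProperty(props, node + ".SleepTime", -1));
    }
}
```

The lookup goes through the value of Node because the listing prefixes node attributes with the protocol name (HC2-0). Whether HyperCast resolves attributes the same way is an assumption.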
18 HyperCast Software: Demo Applications
- Distributed Whiteboard
- Multicast file transfer
- Data aggregation in P2P Net: Homework assignment in CS757 (Computer Networks)
19 Application: Emergency Response Network for Arlington County, Virginia
Project directed by B. Horowitz and S. Patek
20 Application of P2P Technology to Advanced Emergency Response Systems
21 Other Features of Overlay Socket
- Current version (2.0)
  - Statistics: Each part of the overlay socket maintains statistics which are accessed using a common interface
  - Monitor and control: XML-based protocol for remote access of statistics and remote control of experiments
  - LoToS: Simulator for testing and visualization of overlay protocols
  - Overlay Manager: Server for storing and downloading overlay configurations
- Next version (parts are done, release in Summer 2004)
  - MessageStore: Enhanced data services, e.g., end-to-end reliability, persistence, streaming, etc.
  - HyperCast for mobile ad-hoc networks on handheld devices
  - Service differentiation for multi-stream delivery
  - Clustering
  - Integrity and privacy with distributed key management
22 HyperCast Approach
- Build overlay network as a graph with known properties
  - N-dimensional (incomplete) hypercube
  - Delaunay triangulation
- Advantages
  - Achieve good load-balancing
  - Exploit symmetry
  - Next-hop routing in the overlay is free
- Claim: Can improve scalability of multicast and peer-to-peer networks by orders of magnitude over existing solutions
23 Delaunay Triangulation Overlays
24 Nodes in a Plane
Nodes are assigned x-y coordinates (e.g., based
on geographic location)
25 Voronoi Regions
The Voronoi region of a node is the region of the
plane that is closer to this node than to any
other node.
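The definition translates directly into a nearest-node lookup: a point belongs to the Voronoi region of whichever node is closest to it. A minimal sketch follows; the class name and the tie-breaking rule (lowest index wins) are assumptions made for illustration.

```java
final class VoronoiSketch {
    // Returns the index of the node whose Voronoi region contains (px, py),
    // i.e. the node nearest to the point. Ties go to the lower index.
    static int regionOwner(double[][] nodes, double px, double py) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < nodes.length; i++) {
            double dx = nodes[i][0] - px;
            double dy = nodes[i][1] - py;
            double dist = dx * dx + dy * dy; // squared distance is enough for comparison
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] nodes = { {4, 9}, {10, 8}, {0, 6}, {5, 2}, {12, 0} };
        int owner = regionOwner(nodes, 8, 4);
        System.out.println("Point (8,4) lies in the region of node index " + owner);
    }
}
```

With the five node positions used in the later slides, the point (8,4) falls in the Voronoi region of node (5,2).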
26 Delaunay Triangulation
The Delaunay triangulation has edges between
nodes in neighboring Voronoi regions.
27 Delaunay Triangulation
An equivalent definition: A triangulation such that, for the circumscribing circle of each triangle formed by three vertices, no other vertex is in the interior of the circle.
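The empty-circle condition can be checked with the standard in-circle determinant from computational geometry. The sketch below is not taken from the HyperCast code base, and the class and method names are hypothetical. It uses plain double arithmetic, so nearly cocircular points may be misclassified.

```java
final class InCircleSketch {
    // Returns true if point d lies strictly inside the circle through a, b, c.
    // Works for either orientation of the triangle (a, b, c).
    static boolean inCircumcircle(double[] a, double[] b, double[] c, double[] d) {
        double ax = a[0] - d[0], ay = a[1] - d[1];
        double bx = b[0] - d[0], by = b[1] - d[1];
        double cx = c[0] - d[0], cy = c[1] - d[1];
        // 3x3 determinant, expanded along the column of squared lengths.
        double det = (ax * ax + ay * ay) * (bx * cy - cx * by)
                   - (bx * bx + by * by) * (ax * cy - cx * ay)
                   + (cx * cx + cy * cy) * (ax * by - bx * ay);
        // The sign of the determinant flips for clockwise triangles.
        double orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
        return orient > 0 ? det > 0 : det < 0;
    }

    public static void main(String[] args) {
        double[] a = {0, 0}, b = {1, 0}, c = {0, 1};
        System.out.println(inCircumcircle(a, b, c, new double[] {0.5, 0.5}));
        System.out.println(inCircumcircle(a, b, c, new double[] {2, 2}));
    }
}
```

A triangle satisfies the definition when this test returns false for every other node.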
28 Locally Equiangular Property
- Sibson 1977: Maximize the minimum angle
- For every convex quadrilateral formed by triangles ACB and ABD that share a common edge AB, the minimum internal angle of triangles ACB and ABD is at least as large as the minimum internal angle of triangles ACD and CBD.
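The property compares the smallest angle under the two possible diagonals of the quadrilateral. The following sketch applies the comparison to one quadrilateral, assuming the four points form a convex quadrilateral with C and D on opposite sides of AB. All names are hypothetical.

```java
final class EquiangularSketch {
    // Interior angle at vertex p of triangle (p, q, r), in radians.
    static double angleAt(double[] p, double[] q, double[] r) {
        double ux = q[0] - p[0], uy = q[1] - p[1];
        double vx = r[0] - p[0], vy = r[1] - p[1];
        return Math.abs(Math.atan2(ux * vy - uy * vx, ux * vx + uy * vy));
    }

    // Smallest interior angle of triangle (a, b, c).
    static double minAngle(double[] a, double[] b, double[] c) {
        return Math.min(angleAt(a, b, c), Math.min(angleAt(b, a, c), angleAt(c, a, b)));
    }

    // True if triangles ACB and ABD (sharing edge AB) have a minimum angle
    // at least as large as triangles ACD and CBD (sharing edge CD).
    static boolean locallyEquiangular(double[] a, double[] b, double[] c, double[] d) {
        double withAB = Math.min(minAngle(a, c, b), minAngle(a, b, d));
        double withCD = Math.min(minAngle(a, c, d), minAngle(c, b, d));
        return withAB >= withCD;
    }

    public static void main(String[] args) {
        double[] a = {0, 0}, b = {4, 0}, c = {2, 1}, d = {2, -1};
        System.out.println("keep edge AB: " + locallyEquiangular(a, b, c, d));
        System.out.println("keep edge CD: " + locallyEquiangular(c, d, a, b));
    }
}
```

In the example the long diagonal AB produces two thin triangles, so the test fails for AB and succeeds for the short diagonal CD.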
29 Next-hop routing with Compass Routing
- A node's parent in a spanning tree is its neighbor which forms the smallest angle with the root.
- A node need only know information on its neighbors; no routing protocol is needed for the overlay.
[Figure: a node, the root, and two neighbors A and B; A forms an angle of 30 degrees with the root and B an angle of 15 degrees, so B is the node's parent]
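The parent rule can be sketched as a search over the neighbor list for the smallest angle, measured at the node, between the direction to the root and the direction to the neighbor. This is an illustration of the rule as stated on the slide, not the HyperCast implementation, and the names are hypothetical.

```java
final class CompassRoutingSketch {
    // Angle at p between the directions p->q and p->r, in radians (0..pi).
    static double angleAt(double[] p, double[] q, double[] r) {
        double ux = q[0] - p[0], uy = q[1] - p[1];
        double vx = r[0] - p[0], vy = r[1] - p[1];
        return Math.abs(Math.atan2(ux * vy - uy * vx, ux * vx + uy * vy));
    }

    // Index of the neighbor that forms the smallest angle with the root,
    // or -1 if the neighbor list is empty.
    static int parentIndex(double[] node, double[] root, double[][] neighbors) {
        int best = -1;
        double bestAngle = Double.POSITIVE_INFINITY;
        for (int i = 0; i < neighbors.length; i++) {
            double angle = angleAt(node, root, neighbors[i]);
            if (angle < bestAngle) {
                bestAngle = angle;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] node = {0, 0}, root = {10, 0};
        double[][] neighbors = {
            {5 * Math.cos(Math.toRadians(30)), 5 * Math.sin(Math.toRadians(30))},  // A
            {5 * Math.cos(Math.toRadians(15)), -5 * Math.sin(Math.toRadians(15))}  // B
        };
        System.out.println("parent index: " + parentIndex(node, root, neighbors));
    }
}
```

The example mirrors the figure: neighbor A is 30 degrees off the direction to the root, B is 15 degrees off, and B (index 1) is selected.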
30 Spanning tree when node (8,4) is root. The tree can be calculated by both parents and children.
[Figure: spanning tree over nodes (4,9), (10,8), (0,6), (8,4), (5,2), (12,0)]
31 Evaluation of Delaunay Triangulation overlays
- Delaunay triangulation can consider location of nodes in an (x,y) plane, but is not aware of the network topology
- Question: How does Delaunay triangulation compare with other overlay topologies?
32 Hierarchical Delaunay Triangulation
- 2-level hierarchy of Delaunay triangulations
- The node with the lowest x-coordinate in a domain DT is a member in 2 triangulations
33 Multipoint Delaunay Triangulation
- Different (implicit) hierarchical organization
- Virtual nodes are positioned to form a bounding box around a cluster of nodes. All traffic to nodes in a cluster goes through one of the virtual nodes
34 Overlay Topologies
- Delaunay Triangulation and variants
  - DT
  - Hierarchical DT
  - Multipoint DT
- Hypercube
- Degree-6 Graph
  - Similar to graphs generated in Narada
- Degree-3 Tree
  - Similar to graphs generated in Yoid
- Logical MST
  - Minimum Spanning Tree
[Slide annotations grouping the list: "overlays used by HyperCast" and "overlays that assume knowledge of network topology"]
35 Transit-Stub Network
- Transit-Stub
  - Georgia Tech topology generator
  - 4 transit domains
  - 4×16 stub domains
  - 1024 total routers
  - 128 hosts on stub domain
36 Evaluation of Overlays
- Simulation
  - Network with 1024 routers (Transit-Stub topology)
  - 2 - 512 hosts
- Performance measures for trees embedded in an overlay network
  - Degree of a node in an embedded tree
  - Stretch: Ratio of delay in overlay to shortest path delay
  - Stress: Number of duplicate transmissions over a physical link
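The two link-level measures can be computed as follows. This is a sketch under simplifying assumptions: delays are given directly, and each overlay hop is described by the list of physical links it traverses. The class name, method names, and link labels such as "r1-r2" are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OverlayMetricsSketch {
    // Stretch: delay along the overlay path divided by the shortest-path delay.
    static double stretch(double overlayDelay, double shortestPathDelay) {
        return overlayDelay / shortestPathDelay;
    }

    // Stress per physical link: how many overlay hops send the same packet over it.
    // Each inner list names the physical links traversed by one overlay hop.
    static Map<String, Integer> stress(List<List<String>> overlayHops) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> hop : overlayHops) {
            for (String link : hop) {
                counts.merge(link, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> hops = Arrays.asList(
            Arrays.asList("r1-r2", "r2-r3"),
            Arrays.asList("r2-r3", "r3-r4"));
        System.out.println("stress: " + stress(hops));
        System.out.println("stretch: " + stretch(30.0, 20.0));
    }
}
```

In the example, link r2-r3 carries the packet twice, so its stress is 2. An overlay delay of 30 against a shortest-path delay of 20 gives a stretch of 1.5.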
37 Illustration of Stress and Stretch
[Figure: example network with hosts A and B]
Stretch for A to B: 1.5
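38 Average Stretch
[Figure: plot of average stretch]
39 90th Percentile of Stretch
[Figure: plot of stretch (90th percentile); the Delaunay triangulation curve is labeled]
40 Average Stress
[Figure: plot of average stress; the Delaunay triangulation curve is labeled]
41 90th Percentile of Stress
[Figure: plot of stress (90th percentile); the Delaunay triangulation curve is labeled]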
42 The DT Protocol
- Protocol which organizes members of a network in a Delaunay triangulation
  - Each member only knows its neighbors
  - Soft-state protocol
- Topics
  - Nodes and Neighbors
  - Example: A node joins
  - State Diagram
  - Rendezvous
  - Measurement Experiments
43 Each node sends Hello messages to its neighbors periodically
[Figure: nodes (4,9), (10,8), (0,6), (5,2), (12,0)]
44
- Each Hello contains the clockwise (CW) and counterclockwise (CCW) neighbors
- Receiver of a Hello runs a Neighbor test (see locally equiangular property)
- CW and CCW are used to detect new neighbors
[Figure: nodes (4,9), (10,8), (0,6), (5,2), (12,0) and the neighborhood table of (10,8) with columns Neighbor, CCW, CW; table entries as extracted: 5,2 12,0 4,9 4,9 5,2 12,0 10,8]
45 A node that wants to join the triangulation contacts a node that is close
[Figure: nodes (4,9), (10,8), (0,6), (5,2), (12,0)]
46 Node (5,2) updates its Voronoi region and the triangulation
47 (5,2) sends a Hello which contains info for contacting its clockwise and counterclockwise neighbors
[Figure: nodes (4,9), (10,8), (0,6), (8,4), (5,2), (12,0)]
48 (8,4) contacts these neighbors ...
[Figure: same six nodes]
49 ... which update their respective Voronoi regions.
[Figure: nodes (4,9) and (12,0)]
50 (4,9) and (12,0) send Hellos and provide info for contacting their respective clockwise and counterclockwise neighbors.
[Figure: same six nodes]
51 (8,4) contacts the new neighbor (10,8) ...
[Figure: same six nodes]
52 ... which updates its Voronoi region ...
[Figure: same six nodes]
53 ... and responds with a Hello
[Figure: same six nodes]
54 This completes the update of the Voronoi regions and the Delaunay Triangulation
[Figure: nodes (4,9) and (12,0)]
55 Rendezvous Methods
- Rendezvous problems
  - How does a new node detect a member of the overlay?
  - How does the overlay repair a partition?
- Three solutions
  - Announcement via broadcast
  - Use of a rendezvous server
  - Use likely members (Buddy List)
56 Rendezvous Method 1: Announcement via broadcast (e.g., using IP Multicast)
[Figure: nodes (4,9), (10,8), (0,6), (5,2), (12,0)]
57 Rendezvous Method 1: A Leader is a node with a Y-coordinate higher than any of its neighbors.
[Figure: the Leader node is labeled]
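The leader rule is a purely local test that each node can run on its own neighbor list. The sketch below is hypothetical: the class name and the neighbor sets in the example are assumptions, and so is the handling of equal Y-coordinates, where a tie means the node is not a leader.

```java
final class LeaderSketch {
    // True if the node's Y-coordinate is strictly higher than that of every neighbor.
    // Coordinates are {x, y} pairs; ties count against the node (an assumption).
    static boolean isLeader(double[] node, double[][] neighbors) {
        for (double[] neighbor : neighbors) {
            if (neighbor[1] >= node[1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Neighbor sets are assumed for illustration; they are not read off the slides.
        double[][] neighborsOf49 = { {10, 8}, {0, 6}, {5, 2} };
        double[][] neighborsOf108 = { {4, 9}, {5, 2}, {12, 0} };
        System.out.println("(4,9) leader: " + isLeader(new double[] {4, 9}, neighborsOf49));
        System.out.println("(10,8) leader: " + isLeader(new double[] {10, 8}, neighborsOf108));
    }
}
```

Among the example nodes, (4,9) has the highest Y-coordinate and therefore passes the test regardless of which nodes are its neighbors.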
58 Rendezvous Method 2: New node and leader contact a rendezvous server. The server keeps a cache of some other nodes.
[Figure: nodes (4,9), (10,8), (0,6), (5,2), (12,0) and the server]
59 Rendezvous Method 3: Each node has a list of likely members of the overlay network
[Figure: nodes (4,9), (10,8), (0,6), (5,2), (12,0)]
60 State Diagram of a Node
61 Sub-states of a Node
- A node is stable when all nodes that appear in
the CW and CCW neighbor columns of the
neighborhood table also appear in the neighbor
column
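The stability condition is a set-containment check on the neighborhood table. In this sketch each row holds a neighbor together with the CCW and CW entries reported for it. Representing nodes as coordinate strings and empty entries as null are assumptions, and the names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class NeighborhoodTableSketch {
    // Each row is {neighbor, ccw, cw}; ccw and cw may be null when unknown.
    // The node is stable if every CCW and CW entry also appears as a neighbor.
    static boolean isStable(String[][] rows) {
        Set<String> neighbors = new HashSet<>();
        for (String[] row : rows) {
            neighbors.add(row[0]);
        }
        for (String[] row : rows) {
            for (int col = 1; col <= 2; col++) {
                if (row[col] != null && !neighbors.contains(row[col])) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[][] table = {
            {"5,2", "12,0", "4,9"},
            {"4,9", "5,2", null},
            {"12,0", null, "5,2"}
        };
        System.out.println("stable: " + isStable(table));
    }
}
```

A table that lists node 8,4 in a CCW or CW column without a matching neighbor row would make the node not stable, which is what prompts it to contact the newly discovered node.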
62 Measurement Experiments
- Experimental Platform: Centurion cluster at UVA (cluster of 300 Linux PCs)
  - 2 to 10,000 overlay members
  - 1-100 members per PC
  - Random coordinate assignments
63 Experiment: Adding Members
How long does it take to add M members to an overlay network of N members?
[Figure: time to complete (sec) versus M+N members]
64 Experiment: Throughput of Multicasting
- 100 MB bulk transfer for N=2-100 members (1 node per PC)
- 10 MB bulk transfer for N=20-1000 members (10 nodes per PC)
[Figure: average throughput (Mbps) versus number of members N, showing measured values and bandwidth bounds (due to stress)]
65 Experiment: Delay
- 100 MB bulk transfer for N=2-100 members (1 node per PC)
- 10 MB bulk transfer for N=20-1000 members (10 nodes per PC)
[Figure: delay of a packet (msec) versus number of nodes N]
66 Wide-area experiments
- PlanetLab: worldwide network of Linux PCs
- Obtain coordinates via triangulation
  - Delay measurements to 3 PCs
  - Global Network Positioning (Eugene Ng, CMU)
67 PlanetLab Video Streaming Demo
68 Summary
- Overlay socket is a general API for programming P2P systems
- Several proof-of-concept applications
- Intensive experimental testing
  - Local PC cluster (on 100 PCs)
    - 10,000 node overlay in less than 1 minute
    - Throughput of 2 Mbps for 1,000 receivers
  - PlanetLab (on 60 machines)
    - Up to 2,000 nodes
    - Video streaming experiment with 80 receivers at 800 kbps had few losses
- HyperCast (Version 2.0) site: http://www.cs.virginia.edu/hypercast
  - Design documents, download software, user manual