Title: Reliable multicast: from end-to-end solutions to active solutions
1Reliable multicast: from end-to-end solutions to active solutions
- C. Pham
- RESO/LIP-Univ. Lyon 1, France
- DEA DIF Nov. 13th, 2002
- ENS-Lyon, France
2QA
- Q1: How many people in the audience have heard about multicast?
- Q2: How many people in the audience know basically what multicast is?
- Q3: How many people in the audience have ever tried multicast technologies?
- Q4: How many people think they need multicast?
3My guess on the answers
- Q1: How many people in the audience have heard about multicast? - 100
- Q2: How many people in the audience know basically what multicast is? - about 40
- Q3: How many people in the audience have ever tried multicast technologies? - 0
- Q4: How many people think they need multicast? - 0
4Purpose of this tutorial
- Provide a comprehensive overview of current multicast technologies
- Show the evolution in multicast technologies
- Achieve 100, 100, 30 and 70 on the previous answers next time!
5How can multicast change the way people use the Internet?
"Everybody's talking about multicast! Really annoying! Why would I need multicast, by the way?"
[Cartoon: a lone user surrounded by speech bubbles all shouting "multicast!"]
6From unicast
- Problem
- Sending the same data to many receivers via unicast is inefficient
- Example
- Popular WWW sites become serious bottlenecks
[Figure: the sender unicasts a separate copy of the data to each receiver]
7...to multicast on the Internet
- Not n unicasts from the sender's perspective
- Efficient one-to-many data distribution
- Towards low latency, high bandwidth
[Figure: the sender emits a single copy of the data; the IP multicast network replicates it towards each receiver]
8New applications for the Internet
Think about
- high-speed www
- video-conferencing
- video-on-demand
- interactive TV programs
- remote archival systems
- tele-medicine, whiteboard
- high-performance computing, grids
- virtual reality, immersion systems
- distributed interactive simulations/gaming
9A whole new world for multicast
10A very simple example
- File replication
- 10 MBytes file
- 1 source, n receivers (replication sites)
- 512 Kbits/s upstream access
- n = 100: Tx ≈ 4.55 hours
- n = 1000: Tx ≈ 1 day 21 hours 30 mins!
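A quick back-of-the-envelope check of these figures (assuming the source pushes the file to the replication sites one after another over its 512 Kbits/s uplink): one transfer takes (10 × 2^20 × 8) bits / (512 × 10^3 bits/s) ≈ 164 s, so n = 100 sequential unicasts take about 16,400 s ≈ 4.55 hours and n = 1000 take about 164,000 s ≈ 45.5 hours, i.e. 1 day 21 hours 30 minutes. With multicast the source sends roughly one copy, independently of n.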
11A real example: the LHC (DataGrid)
[Figure: the LHC/DataGrid tiered computing model (source: DataGrid). 1 TIPS = 25,000 SpecInt95; a 1999 PC ≈ 15 SpecInt95. Bunch crossings every 25 ns yield ~100 triggers per second, each event being ~1 MByte; raw data reach the online system at PBytes/sec. About 100 MBytes/sec flow to the offline farm (~20 TIPS) and the CERN Computer Center (Tier 0, > 20 TIPS); data then go over 622 Mbits/sec links (or air freight) to Tier 1 regional centers (~4 TIPS: Fermilab, France, Italy, UK), over 2.4 Gbits/sec and 622 Mbits/sec links to Tier 2 and Tier 3 institute servers (~0.25 TIPS) with physics data caches, and finally over 100-1000 Mbits/sec to workstations. Each institute has ~10 physicists working on one or more analysis channels; data for these channels should be cached by the institute server.]
12Multicast for computational grids
[Figure: applications and users accessing distributed grid resources]
from Dorian Arnold, Netsolve Happenings
13Some grid applications
- Astrophysics: black holes, neutron stars, supernovae
- Mechanics: fluid dynamics, CAD, simulation
- Distributed interactive simulations: DIS, HLA, training
- Chemistry/biology: molecular simulations, genomic simulations
14Reliable multicast: a big win for grids
- Data replication
- Code and data transfers, interactive job submissions
- Data communications for distributed applications (collective/gather operations, sync. barriers)
- Databases, directory services
[Figure: several supercomputing sites (SDSC IBM SP, ~1020 processors; NCSA Origin array, ~480 processors; CPlant cluster, 256 nodes) all joined to the same multicast address group 224.2.0.1]
15From reliable multicast to Nobel prize!
"OK! Resource Estimator says: need 5 TB, 2 TF. Where can I do this?"
"Resource Broker: LANL is the best match but is down for the moment"
"From: President@earth.org - Congratulations, you have done a great job, it's the discovery of the century!! The phenomenon was short but we managed to react quickly. This would not have been possible without efficient multicast facilities enabling quick reaction and fast distribution of data. The Nobel Prize is on the way :-)"
"Resource Broker: 7 sites OK, but the data need to be sent fast"
16Wide-area interactive simulations
[Figure: wide-area interactive simulation over the Internet: a computer-based submarine simulator, a battlefield simulation display and a human-in-the-loop flight simulator interconnected]
17The challenges of multicast
SCALABILITY
SCALABILITY
SCALABILITY
SCALABILITY
18Part I
19A look back in history of multicast
- History
- Long history of usage on shared medium networks
- Data distribution
- Resource discovery: ARP, BOOTP, DHCP
- Ethernet
- Broadcast (software filtered)
- Multicast (hardware filtered)
- Multiple LAN multicast protocols
- DECnet, AppleTalk, IP
20IP Multicast - Introduction
- Efficient one to many data distribution
- Tree style data distribution
- Packets traverse network links only once
- replication/multicast engine at the network layer
- Location independent addressing
- IP address per multicast group
- Receiver-oriented service model
- Receivers subscribe to any group
- Senders do not know who is listening
- Routers find receivers
- Similar to television model
- Contrasts with telephone network, ATM
21The Internet group model
- multicast/group communications means...
- 1 → n as well as n → m
- a group is identified by a class D IP address (224.0.0.0 to 239.255.255.255)
- an abstract notion that does not identify any host!
from V. Roca
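To make this service model concrete, here is a minimal sender sketch in Java (not from the talk): the sender addresses its datagrams to the class D group address and never learns who the receivers are. The group 224.2.0.1 is the one used later in the slides; the port and payload are arbitrary examples.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Minimal multicast sender sketch: the sender only knows the group
// address (a class D address), never the individual receivers.
public class McastSender {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("224.2.0.1"); // group used in the slides
        DatagramSocket sock = new DatagramSocket();
        byte[] payload = "hello group".getBytes();
        // The network (IGMP + multicast routing) finds the receivers.
        sock.send(new DatagramPacket(payload, payload.length, group, 5000));
        sock.close();
    }
}
```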
22Example: video-conferencing
from UREC, http://www.urec.fr
23The Internet group model... (cont)
- local-area multicast
- uses the broadcast capabilities of the physical layer (e.g. Ethernet)
- efficient and straightforward
- wide-area multicast
- requires going through multicast routers, using IGMP/multicast routing/... (e.g. DVMRP, PIM-DM, PIM-SM, PIM-SSM, MSDP, MBGP, BGMP, etc.)
- routing within the same administrative domain is simple and efficient
- inter-domain routing is complex and not fully operational
from V. Roca
24IP Multicast Architecture
- Service model
- Hosts ↔ routers: host-to-router protocol (IGMP)
- Routers ↔ routers: multicast routing protocols (various)
25Multicast and the TCP/IP layered model
[Figure: multicast in the TCP/IP layered model: the application and higher-level services sit above the socket layer in user space; UDP and TCP run over IP/IP multicast, with ICMP, IGMP and multicast routing alongside, and the device drivers below, in kernel space]
from V. Roca
26Internet Group Management Protocol
- IGMP: a signaling protocol to establish, maintain and remove groups on a subnet
- Objective: keep the router up-to-date with the group membership of the entire LAN
- Routers need not know who all the members are, only that members exist
- Each host keeps track of which multicast groups it is subscribed to
- The socket API informs the IGMP process of all joins (see the sketch below)
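On the host side, the join that triggers this IGMP machinery is a single socket call. A minimal receiver sketch in Java (group and port are arbitrary examples; the IGMP messages themselves are generated by the kernel, not by the application):

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Minimal receiver sketch: joining the group is what makes the kernel
// send an IGMP Report; leaving triggers an IGMP Leave.
public class McastReceiver {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("224.2.0.1");
        MulticastSocket sock = new MulticastSocket(5000);
        sock.joinGroup(group);                 // kernel issues the IGMP join
        byte[] buf = new byte[4096];
        DatagramPacket pkt = new DatagramPacket(buf, buf.length);
        sock.receive(pkt);                     // packets sent to 224.2.0.1:5000
        System.out.println("got " + pkt.getLength() + " bytes");
        sock.leaveGroup(group);                // kernel issues the IGMP leave
        sock.close();
    }
}
```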
27IGMP subscribe to a group (1)
[Figure: a subnet with a multicast router and three hosts; some hosts are subscribed to 224.2.0.1 and 224.5.5.5 while another's list is empty; the router periodically sends an IGMP Query to 224.0.0.1]
- 224.0.0.1 reaches all multicast hosts on the subnet
from UREC
28IGMP subscribe to a group (2)
- somebody has already subscribed to the group
[Figure: in response to the Query, a host sends an IGMP Report for 224.2.0.1; the router records 224.2.0.1 for the subnet]
from UREC
29IGMP subscribe to a group (3)
[Figure: another host sends an IGMP Report for 224.5.5.5; the router now records both 224.2.0.1 and 224.5.5.5]
from UREC
30Data distribution example
[Figure: a data packet addressed to 224.2.0.1 arrives; the router forwards it onto the subnet and only the hosts subscribed to 224.2.0.1 deliver it]
from UREC
31IGMP leave a group (1)
[Figure: a host leaving 224.2.0.1 sends an IGMP Leave for 224.2.0.1 to 224.0.0.2; the router still lists 224.2.0.1 and 224.5.5.5]
- 224.0.0.2 reaches the multicast-enabled routers on the subnet
from UREC
32IGMP leave a group (2)
[Figure: on receiving the Leave, the router sends a group-specific IGMP Query for 224.2.0.1 to check whether any member remains]
from UREC
33IGMP leave a group (3)
- "Hey, I'm still here!"
[Figure: another member answers the group-specific Query with a Report for 224.2.0.1, so the router keeps forwarding that group]
from UREC
34IGMP leave a group (4)
[Figure: a host subscribed to 224.5.5.5 sends an IGMP Leave for 224.5.5.5 to 224.0.0.2]
from UREC
35IGMP leave a group (5)
[Figure: the router sends a group-specific IGMP Query for 224.5.5.5; no Report comes back, so only 224.2.0.1 remains in the router's list]
from UREC
36Part II
37User perspective of the Internet
from UREC, http://www.urec.fr
38Links the basic element in networks
- Backbone links
- optical fibers
- 2.5 to 160 GBits/s with DWDM techniques
- End-user access
- 9.6 Kbits/s (GSM) to 2 Mbits/s (UMTS)
- V.90 56 Kbits/s modem on twisted pair
- 64 Kbits/s to 1930 Kbits/s ISDN access
- 512 Kbits/s to 2 Mbits/s with xDSL modem
- 1 Mbits/s to 10 Mbits/s cable modem
- 155 Mbits/s to 2.5 Gbits/s SONET/SDH
39Routers key elements of internetworking
- Routers
- run routing protocols and build routing tables,
- receive data packets and perform relaying,
- may have to consider Quality of Service constraints when scheduling packets,
- are highly optimized for packet forwarding functions.
40The Wild Wild Web
- heterogeneity, link failures, congested routers: packet losses, packet drops, bit errors
[Figure: will the important data make it through?]
41Multicast difficulties
- At the routing level
- management of the group address (IGMP)
- dynamic nature of the group membership
- construction of the multicast tree (DVMRP, PIM, CBT)
- multicast packet forwarding
- At the transport level
- reliability, loss recovery strategies
- flow control
- congestion avoidance
42Reliability Models
- Reliability => requires redundancy to recover from uncertain losses or other failure modes
- Two types of redundancy:
- Spatial redundancy: independent backup copies, forward error correction (FEC) codes
- Problem: requires a huge overhead; since the FEC is also part of the packet(s), it cannot recover from the erasure of all packets
- Temporal redundancy: retransmit if packets are lost or corrupted
- Lazy: trades off response time for reliability
- The design of status reports and retransmission optimization is important
43Temporal Redundancy Model
- Packets: sequence numbers, CRC or checksum
- Timeout
- Status reports
- Retransmissions
44Part III
45End-to-end solutions for reliability
- Sender-reliable
- The sender detects packet losses by gaps in the ACK sequence
- Easy resource management
- Receiver-reliable
- The receivers detect the packet losses and send NACKs towards the source (see the sketch below)
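A minimal sketch of the receiver-reliable side, assuming per-packet sequence numbers: the receiver spots losses as gaps in the sequence and reports which packets to NACK; the actual sending of NACKs and the retransmission timers are left to the caller.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Receiver-reliable loss detection sketch: losses show up as gaps in the
// sequence numbers of received data packets.
public class LossDetector {
    private long nextExpected = 0;                      // first not-yet-seen in-order seq
    private final SortedSet<Long> missing = new TreeSet<Long>();

    /** Called for every data packet received; returns the seqs to NACK now. */
    public SortedSet<Long> onData(long seq) {
        SortedSet<Long> toNack = new TreeSet<Long>();
        if (seq >= nextExpected) {
            for (long s = nextExpected; s < seq; s++) {
                missing.add(s);                         // a gap: these were skipped
                toNack.add(s);
            }
            nextExpected = seq + 1;
        } else {
            missing.remove(seq);                        // a retransmission filled a hole
        }
        return toNack;                                  // caller NACKs these to the source
    }
}
```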
46Challenge Reliable multicast scalability
- many problems arise with 10,000 receivers...
- Problem 1: scalable control traffic
- ACK each data packet (à la TCP)... oops, 10,000 ACKs per packet!
- NAK (negative ack) only on failure... oops, if a packet is lost close to the source, 10,000 NAKs!
[Figure: every receiver multicasts NACK 4 back towards the source: implosion at the source!]
47Challenge Reliable multicast scalability
- Problem 2: exposure
- receivers may receive the same packet several times
48One example: SRM (Scalable Reliable Multicast)
- Receiver-reliable
- NACK-based
- Not much per-receiver state at the sender
- Every member may multicast NACK or retransmission
49SRM (cont)
- NACK/Retransmission suppression
- Delay before sending
- Based on RTT estimation
- Deterministic + stochastic
- Periodic session messages
- Loss detection via sequence numbers
- Estimation of the distance matrix among members (see the sketch below)
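For the distance estimation, session messages are usually described as carrying timestamps that are echoed back, NTP-style; a sketch of that computation (the exact SRM message format is not shown here):

```java
// Sketch of pairwise distance (one-way delay) estimation from session
// message timestamps, in the timestamp-echo style commonly described for SRM:
// A stamps t1; B echoes t1 together with how long it held it (delta);
// A receives the echo at t3 and, assuming a symmetric path,
// estimates dAB = ((t3 - t1) - delta) / 2.
public class DistanceEstimator {
    /** @param t1    time A sent its session message (ms)
     *  @param delta time B held t1 before echoing it (ms)
     *  @param t3    time A received B's session message (ms) */
    public static double oneWayDelayMs(long t1, long delta, long t3) {
        return ((t3 - t1) - delta) / 2.0;
    }
}
```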
50SRM Request Suppression (animation, slides 50-62)
[Figure: successive frames showing the source Src multicasting data, a loss on one subtree, and the receivers' randomized NACK timers letting a single request and repair suppress the duplicates]
from Haobo Yu, Christos Papadopoulos
63Deterministic Suppression
[Figure: timeline of deterministic suppression on a chain topology: the data packet, the session messages, the requestor's NACK and the repair each take d per hop, giving delays of d, 2d, 3d, 4d at successive receivers]
- Request delay = C1 × dS,R, where dS,R is the estimated distance (one-way delay) from the source S to receiver R
from Haobo Yu, Christos Papadopoulos
64SRM Star Topology
[Figure: a star topology: every receiver is at the same distance d from the source Src]
from Haobo Yu, Christos Papadopoulos
65SRM Stochastic Suppression
[Figure: timeline of stochastic suppression on the star topology: receivers 0-3 are all at distance d from the source; each draws a random NACK delay, so one requestor's NACK and the resulting repair suppress the others]
- Request delay = U[0, C2] × dS,R (a uniform random draw; see the sketch below)
from Haobo Yu, Christos Papadopoulos
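Putting the two timer rules together, a sketch of how a receiver might schedule and suppress its requests (C1 and C2 are protocol constants, d is the estimated distance to the source; the actual timer and transport plumbing is omitted):

```java
import java.util.Random;

// Sketch of SRM-style NACK scheduling with suppression:
// wait a delay drawn from [C1*d, (C1+C2)*d] (d = estimated distance to the
// source); if another member's NACK for the same packet is heard first,
// cancel ours (suppression), otherwise send it when the timer fires.
public class SrmRequestTimer {
    private final Random rng = new Random();
    private final double c1, c2;

    public SrmRequestTimer(double c1, double c2) {
        this.c1 = c1;
        this.c2 = c2;
    }

    /** Delay (same unit as d) before multicasting a NACK for a lost packet. */
    public double requestDelay(double distanceToSource) {
        return (c1 + c2 * rng.nextDouble()) * distanceToSource;
    }

    /** Called when a NACK for seq is heard from someone else before ours fires. */
    public void onForeignNack(long seq, PendingNack pending) {
        if (pending.seq == seq) {
            pending.cancelled = true;   // suppression: do not send our own NACK
        }
    }

    public static class PendingNack {
        public final long seq;
        public boolean cancelled = false;
        public PendingNack(long seq) { this.seq = seq; }
    }
}
```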
66What's missing?
- Losses on link (A,C) cause retransmissions to the whole group
- Only retransmit to those members that lost the packet
- Only request from the nearest responder
[Figure: multicast tree with source S, nodes A and B, and receivers C, D, E, F; the lossy link is (A,C)]
from Haobo Yu , Christos Papadopoulos
67Idea: perform local recovery with scope limitation
- TTL-scoped multicast
- use the TTL field of IP packets to limit the scope of the repair packet (see the sketch below)
[Figure: repair packets sent from Src with TTL = 1, 2, 3 reach progressively larger neighborhoods]
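In Java, scoping a repair this way is just a matter of setting the TTL on the multicast socket before sending; a sketch (group, port and TTL values are arbitrary examples):

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Sketch: multicast a repair packet with a small TTL so it only reaches
// routers/receivers within a limited number of hops.
public class ScopedRepairSender {
    public static void sendScopedRepair(byte[] repair, int ttl) throws Exception {
        InetAddress group = InetAddress.getByName("224.2.0.1");
        MulticastSocket sock = new MulticastSocket();
        sock.setTimeToLive(ttl);   // e.g. 1, 2 or 3 as in the figure above
        sock.send(new DatagramPacket(repair, repair.length, group, 5000));
        sock.close();
    }
}
```

The obvious limitation, noted later in the talk, is that a hop count is only a rough approximation of the set of receivers that actually lost the packet.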
68Example: RMTP
- Reliable Multicast Transport Protocol, by Purdue and AT&T Research Labs
- Designed for file dissemination (single sender)
- Deployed in AT&T's billing network
from Haobo Yu , Christos Papadopoulos
69RMTP Fixed hierarchy
- Receivers are grouped into local regions
- Each receiver unicasts periodic ACKs to its ACK Processor (AP); the AP unicasts its own ACK to its parent
- A receiver dynamically chooses the closest statically configured Designated Receiver (DR) as its AP
[Figure: RMTP hierarchy: source S, routers, receivers R1-R5; ACKs (A) flow from the receivers to their DR/AP and up the fixed tree]
from Haobo Yu , Christos Papadopoulos
70RMTP Error control
- The DR checks retransmission requests periodically
- Multicast or unicast retransmission, based on the percentage of requests
- Scoped multicast for local recovery
from Haobo Yu , Christos Papadopoulos
71RMTP Comments
- (+) Heterogeneity: a lossy link or a slow receiver only affects its local region
- (-) The position of the DRs is critical: the static hierarchy cannot adapt the local recovery zone to the loss points
from Haobo Yu , Christos Papadopoulos
72Summary reliability problems
- What are the problems of loss recovery?
- feedback (ACK or NACK) implosion
- duplicated replies/repairs
- difficult adaptation to dynamic membership changes
- Design goals
- reduce the feedback traffic
- reduce recovery latencies
- improve recovery isolation
73Summary end-to-end solutions
- ACK/NACK aggregation based on timers is approximate!
- TTL-scoped retransmissions are approximate!
- Not really scalable!
- Cannot exploit in-network information.
74Part IV
75What are active networks?
- Programmable nodes/routers
- Customized computations on packets
- Standardized execution environment and programming interface
- However, they add an extra processing cost
76Motivations behind active networking
- user applications can implement and deploy customized services and protocols
- specific data filtering criteria (DIS, HLA)
- fast collective and gather operations
- globally better performance by reducing the amount of traffic
- high throughput
- low end-to-end latency
77Active networks implementations
- Discrete approach (the operator's approach)
- Adds dynamic deployment features to nodes/routers
- New services can be downloaded into the router's kernel
- Integrated approach
- Adds executable code to data packets
- Capsule = data + code
- Granularity is set at the packet level
78The discrete approach
- Separates the injection of programs from the
processing of packets
79The integrated approach
- User packets carry code to be applied to the data part of the packet
- High flexibility to define new services
80An active router
- Some layer for executing code: let's call it the Active Layer
81Where to put active components?
- In the core network?
- routers already have to process millions of packets per second
- gigabit rates make additional processing difficult without a dramatic slowdown
- At the edge?
- to efficiently handle the heterogeneity of user accesses
- to provide QoS and implement intelligent congestion avoidance mechanisms
82Users' accesses
[Figure: typical access networks: residential and office users reach the Internet through PSTN/ADSL/cable, metro rings, network providers, campuses and Internet data centers]
83Solutions
- Traditional
- end-to-end retransmission schemes
- scoped retransmissions with the TTL field
- receiver-based local NACK suppression
- Active contributions
- cache of data to allow local recoveries
- feedback aggregation
- subcast
- early lost packet detection
84The reliable multicast universe
85Router supported, active networking
- Routers have specific functionalities/services for supporting multicast flows.
- Active networking goes a step further by opening routers to dynamic code provided by end-users.
- This opens new perspectives for efficient in-network services and rapid deployment.
86A step toward active services: LBRM
87Active local recovery
- routers cache data packets
- repair packets are sent by the routers, when available (a cache sketch follows below)
[Figure: data packets 1-5 flow from the source through the active routers, each keeping copies in its cache so it can resend them locally on request]
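A minimal sketch of such a per-session repair cache inside an active router (the bounded LinkedHashMap and the class names are illustrative choices, not the structures of any particular protocol):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a bounded per-session cache of recent data packets held by an
// active router so it can answer NACKs locally instead of forwarding them
// to the source.
public class RepairCache {
    private final LinkedHashMap<Long, byte[]> cache;

    public RepairCache(final int capacity) {
        // insertion order, oldest cached packet evicted first
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, false) {
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    public void onData(long seq, byte[] payload) {
        cache.put(seq, payload);
    }

    /** Returns the cached packet for a NACKed seq, or null if the request
     *  must be forwarded further upstream. */
    public byte[] repairFor(long seq) {
        return cache.get(seq);
    }
}
```

The memory such caches consume in routers is exactly the cost that DyRAM, presented below, avoids by electing repliers among the receivers instead.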
88Global NACK suppression
89Local NACK suppression
90Early lost packet detection
- The repair latency can be reduced if the lost packet is requested as soon as possible
[Figure: the router detects the loss from the gap in sequence numbers and requests the packet itself; the receivers' later NACKs are ignored]
91Active subcast features
- Send the repair packet only to the relevant set of receivers
92The DyRAM framework (Dynamic Replier Active Reliable Multicast)
- Motivations for DyRAM
- low recovery latency thanks to local recovery
- low memory usage in routers: local recovery is performed from the receivers (no cache in routers)
- low processing overhead in routers: lightweight active services
93DyRAM's main active services
- DyRAM is NACK-based with
- Global NACK suppression
- Early packet loss detection
- Subcast of repair packets
- Dynamic replier election
94Replier election
- A receiver is elected to be the replier for each lost packet (one recovery tree per packet); see the sketch below
- Load balancing can be taken into account for the replier election
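A sketch of how a router might run this election as NACKs arrive, under the assumption (made here purely for illustration) that the least-loaded downstream link that did not lose the packet is preferred; DyRAM's actual election policy may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of per-lost-packet replier election in the DyRAM spirit: the router
// records which downstream links NACKed a packet, elects one downstream link
// that still has it as the replier, and subcasts the repair only to the
// links that reported the loss.
public class ReplierElection {
    // seq -> downstream links that NACKed this packet (they need the repair)
    private final Map<Long, List<Integer>> nackedLinks = new HashMap<Long, List<Integer>>();
    // link -> number of times it was already elected (crude load balancing)
    private final Map<Integer, Integer> load = new HashMap<Integer, Integer>();

    public void onNack(long seq, int downstreamLink) {
        List<Integer> links = nackedLinks.get(seq);
        if (links == null) {
            links = new ArrayList<Integer>();
            nackedLinks.put(seq, links);
        }
        if (!links.contains(downstreamLink)) {
            links.add(downstreamLink);          // duplicate NACKs are suppressed
        }
    }

    /** Elect, among the downstream links that did NOT lose seq, the least
     *  loaded one; its receiver retransmits to the links recorded above. */
    public int electReplier(long seq, List<Integer> allDownstreamLinks) {
        List<Integer> losers = nackedLinks.get(seq);
        int best = -1, bestLoad = Integer.MAX_VALUE;
        for (int i = 0; i < allDownstreamLinks.size(); i++) {
            Integer link = allDownstreamLinks.get(i);
            if (losers != null && losers.contains(link)) continue;
            Integer l = load.get(link);
            int current = (l == null) ? 0 : l.intValue();
            if (current < bestLoad) { bestLoad = current; best = link.intValue(); }
        }
        if (best != -1) load.put(Integer.valueOf(best), Integer.valueOf(bestLoad + 1));
        return best;   // -1: no local replier, the NACK is forwarded upstream
    }
}
```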
95Replier election and repair subcast
[Figure: DyRAM routers D0 and D1 with receivers R1-R7; a receiver that holds packet 2 is elected replier, and its repair (Repair 2) is subcast only on the links whose downstream receivers lost the packet]
96The DyRAM framework for grids
- The backbone is very fast, so it runs nothing but fast forwarding functions.
- Any receiver can be elected as a replier for a lost packet.
- A hierarchy of active routers can be used to run specific functions at different levels of the hierarchy.
[Figure: the source-side active routers (1000 Base FX) perform NACK suppression, subcast and loss detection; the core network forwards at Gbit/s rates; the receiver-side active routers (100 Base FX) perform NACK suppression, subcast and replier election]
97Network model
[Figure: network model used in the evaluation: a source, routers and groups of receivers; the workload is a 10 MBytes file transfer]
98Local recovery from the receivers
- 4 receivers per group
- Local recovery reduces the end-to-end delay, especially for high loss rates and a large number of receivers.
[Plot: end-to-end delay for grp = 6 to 24 groups, loss probability p = 0.25]
99Local recovery from the receivers
- As the group size increases, doing the recoveries from the receivers greatly reduces the bandwidth consumption
[Plot: bandwidth consumption for 48 receivers distributed in g groups, grp = 2 to 24]
100DyRAM vs ARM
- ARM (Active Reliable Multicast) performs better than DyRAM only for very low loss rates and with considerable caching requirements
101Simulation results
- The simulation results are very close to those of the analytical study
- EPLD (early packet loss detection) is very beneficial to DyRAM
[Plots: 4 receivers per group, grp = 6 to 24, p = 0.25]
102DyRAM implementation
- Preliminary experimental results
103Testbed configuration
- TAMANOIR active execution environment
- Java 1.3.1 and a Linux 2.4 kernel
- A set of PC receivers and 2 PC-based routers (Pentium II 400 MHz, 512 KB cache, 128 MB RAM)
- Data packets of 4 KB
104Packet format: ANEP
- Data/repair packets
- NACK packets
105The data path
106Router data structures
- The track list TL, which maintains for each multicast session:
- lastOrdered: the sequence number of the last in-order received packet
- lastReceived: the sequence number of the last received data packet
- lostList: the list of data packets not received in between
- The NACK structure NS, which keeps for each lost data packet:
- seq: the sequence number of the lost data packet
- subList: the list of IP addresses of the downstream receivers (or active routers) that have lost it
(A code sketch of these structures follows below.)
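These two structures translate almost directly into code; a sketch of how they might look (the field names follow the slide, the surrounding logic is illustrative):

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;

// Sketch of the per-session track list (TL) and the per-lost-packet NACK
// structure (NS) described above; only the fields named on the slide.
public class RouterState {

    /** Track list entry: one per multicast session. */
    public static class TrackList {
        long lastOrdered  = -1;   // seq of the last packet received in order
        long lastReceived = -1;   // seq of the last data packet received
        List<Long> lostList = new ArrayList<Long>();   // gaps in between

        void onData(long seq) {
            if (seq > lastReceived) {
                for (long s = lastReceived + 1; s < seq; s++) {
                    lostList.add(Long.valueOf(s));     // gap detected: these are missing
                }
                lastReceived = seq;
            } else {
                lostList.remove(Long.valueOf(seq));    // a repair filled a hole
            }
            while (lastOrdered < lastReceived
                    && !lostList.contains(Long.valueOf(lastOrdered + 1))) {
                lastOrdered++;                         // advance the in-order prefix
            }
        }
    }

    /** NACK structure: one per lost data packet. */
    public static class NackState {
        long seq;                                      // seq of the lost data packet
        List<InetAddress> subList = new ArrayList<InetAddress>();  // downstream losers

        NackState(long seq) { this.seq = seq; }
        void addLoser(InetAddress downstream) {
            if (!subList.contains(downstream)) subList.add(downstream);
        }
    }
}
```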
107The first configuration
[Figure: first testbed configuration with the machines ike, resamo, resama, resamd and stan]
108Active service costs
- NACK processing: 135 µs
- Data packet (DP): 20 µs if there is no sequence gap, 12-17 ms otherwise; only 256 µs without the timer setting
- Repair: 123 µs
109The second configuration
[Figure: second testbed configuration (ike, resamo) exercising the NACK path]
110The replier election cost
- The election is performed on the fly and depends on the number of downstream links; it ranges from 0.1 to 1 ms for 5 to 25 links per router.
111Conclusions
- Reliability for large-scale multicast is difficult.
- End-to-end solutions have to face many critical problems with approximate solutions.
- Active services can provide more efficient solutions.
- The main DyRAM design goal is to reduce the end-to-end latencies using active services.
- Preliminary results are very encouraging.