Title: Fair Queuing for Aggregated Multiple Links
1Fair Queuing for Aggregated Multiple Links
- Josep M. Blanquer and Banu Özden
- Proceedings of the ACM SIGCOMM, August 2001
2ABSTRACT
- Fair Queuing algorithms
- Proportionally share a single server among competing flows
- Do not address the problem of sharing multiple servers
- Multiserver applications
- Link aggregation
- Multiprocessors
- Multi-path storage I/O
3- We introduce a new service discipline for multi-server systems, MSF2Q, that provides guarantees for competing flows.
- We prove that this new service discipline is a close approximation of the idealized Generalized Processor Sharing (GPS) discipline.
- We calculate its maximum packet delay and service discrepancy with respect to GPS.
41. INTRODUCTION
- A large increase in networked services → a much larger variety of traffic → different network requirements to be met simultaneously over the same links.
- High bandwidth guarantees → backups
- Low jitter guarantees → video streaming
- Low delay guarantees → network data acquisition
- Network resources must be appropriately scheduled.
5- Fair Queuing service disciplines allocate bandwidth fairly among competing traffic.
- Protection from misbehaving traffic
- Effective congestion control
- Better services for rate-adaptive applications
- Strict QoS guarantees, with admission control.
6- Growing demand for bandwidth → incremental scaling techniques → grouping multiple links into a single logical interface [3]
- Implementations
- [1] 3Com's Dynamic Access
- [2] Adaptec Duralink Software Suite
- [12] Hewlett Packard's Auto-Port Aggregation
- [14] Intel Load Balancing
- [6] J. Blanquer et al., Resource Management for QoS in Eclipse/BSD, Proceedings of the First FreeBSD Conference, Berkeley, California, Oct. 1999.
7Adaptec Duralink
8HP Auto Port Aggregation
9Intel Load Balancing
102. BACKGROUND
- GPS (Generalized Processor Sharing)
- Guaranteed fairness
- $W_x(t_1, t_2)$: the amount of traffic of flow x served in the interval $[t_1, t_2]$, for any flow x that is continuously backlogged during $[t_1, t_2]$.
- $\phi_x$: weight of flow x, the proportion of the server bandwidth that flow x receives when it is backlogged.
- Guaranteed rate
- $r_i$: rate of flow i; $r$: server rate
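The formulas that went with these symbols did not survive in the slide text. As a hedged reminder, the standard textbook statement of the GPS fairness and rate guarantees (after Parekh and Gallager, not copied from this slide) can be written with the same symbols, where y ranges over all flows and the sum runs over all flows j:

```latex
\begin{aligned}
\frac{W_x(t_1, t_2)}{W_y(t_1, t_2)} &\ge \frac{\phi_x}{\phi_y}
  && \text{for every flow } y, \text{ if } x \text{ is continuously backlogged during } [t_1, t_2], \\
r_i &= \frac{\phi_i}{\sum_j \phi_j}\, r
  && \text{(guaranteed rate of flow } i \text{ on a server of rate } r\text{)}.
\end{aligned}
```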
11- Generalized Processor Sharing (GPS)
- An idealized system that serves as a reference model for the fair queuing disciplines.
- It assumes that the server can transmit more than one flow simultaneously and that the traffic is infinitely divisible.
- A number of packetized approximations to GPS have been devised.
- WFQ (Weighted Fair Queueing), Demers et al., 1989
- VC (Virtual Clock), Zhang, 1990
- GPS (Generalized Processor Sharing), Parekh et al., 1993
- SCFQ (Self-Clocked Fair Queueing), Golestani, 1994
- WF2Q (Worst-case Fair Weighted Fair Queueing), Bennett et al., 1996
- SFQ (Start Time Fair Queueing), Goyal et al., 1996
12 A New Priority Calculation Method for Sorted-priority Fair Queuing (Liu et al., 2004)
- B. Current packet priority calculation methods
- The three best-known packet priority calculation methods are [9]:
- Smallest Finish time First (SFF)
- Packet selection: $P_i^X(t) + l_i/\phi_i$ ($l_i$: packet length)
- Used by WFQ and SCFQ
- Smallest Start time First (SSF)
- Packet selection: $P_i^X(t)$
- Used by SFQ
- Smallest Eligible Finish time First (SEFF)
- Pre-selection: sessions with session potentials smaller than the system potential.
- Packet selection (SFF): $P_i^X(t) + l_i/\phi_i$
- Used by WF2Q
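The three selection rules above can be sketched as follows. This is a minimal illustration, not code from either paper: the `Session` class and its field names are hypothetical, the session potential is taken as given rather than computed, and eligibility for SEFF is assumed to mean "session potential no larger than the system potential".

```python
from dataclasses import dataclass

@dataclass
class Session:
    name: str
    potential: float   # session potential P_i(t) (assumed to be given)
    head_len: float    # length l_i of the head-of-line packet
    weight: float      # weight phi_i

    @property
    def finish(self) -> float:
        # Finish-time priority: potential plus l_i / phi_i.
        return self.potential + self.head_len / self.weight

def sff(sessions):
    """Smallest Finish time First (the rule used by WFQ and SCFQ)."""
    return min(sessions, key=lambda s: s.finish)

def ssf(sessions):
    """Smallest Start time First (the rule used by SFQ)."""
    return min(sessions, key=lambda s: s.potential)

def seff(sessions, system_potential):
    """Smallest Eligible Finish time First (the rule used by WF2Q)."""
    eligible = [s for s in sessions if s.potential <= system_potential]
    return min(eligible, key=lambda s: s.finish) if eligible else None

sessions = [
    Session("A", potential=0.0, head_len=10.0, weight=0.5),  # finish 20
    Session("B", potential=2.0, head_len=1.0, weight=0.5),   # finish 4
    Session("C", potential=5.0, head_len=1.0, weight=1.0),   # finish 6
]
print(sff(sessions).name, ssf(sessions).name, seff(sessions, 1.0).name)
```

With a system potential of 1.0 only session A is eligible, so SEFF picks A even though B has the smaller finish time; once the system potential reaches 2.0, SEFF picks B.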
133. PROPORTIONAL SHARING OF MULTISERVER SYSTEMS
- Numerous applications use multi-server systems that can benefit from service guarantees
- Network: multiple network adapters to a web or file server
- Storage: multiple I/O channels to a RAID server
14WFQ
15WFQ
163.1 A Packetized Fair Queuing Discipline for Multi-Servers
- MSFQ's scheduling discipline uses GPS as its reference system
- When a server is idle and there is a packet waiting for service, MSFQ schedules the next packet.
- The next packet is defined as the first packet that would complete service in the (GPS, 1, Nr) system if no more packets were to arrive.
- To compare how well an (MSFQ, N, r) system (N servers, each of rate r) approximates a (GPS, 1, Nr) system (one server of rate Nr), calculate
- (i) the worst-case delay
- (ii) the traffic discrepancy
173.2 Preliminary Properties
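- Delay and service properties of MSFQ do not trivially follow from the single-server case, WFQ.
- GPS and MSFQ busy periods do not coincide.
- Figure: a single packet of length L. (GPS, 1, Nr) serves it at rate Nr and finishes at L/(Nr); (MSFQ, N, r) serves it on one server at rate r and finishes at L/r. When GPS finishes, MSFQ still has $L - r \cdot L/(Nr) = L - L/N = (N-1)L/N$ bits left. Axes: t against the cumulative service of GPS and of MSFQ.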
18- When GPS is busy, MSFQ is busy. However, the converse is not true.
- Thus for any t, $W(0, t) \ge \hat{W}(0, t)$ (2), where $W(0, t)$ and $\hat{W}(0, t)$ denote the total number of bits serviced by GPS and MSFQ, respectively, by time t.
- We will use the term busy period to refer to a busy period in the reference (GPS, 1, Nr) system.
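As a worked instance of inequality (2), consider a single packet of length L arriving at t = 0 to an idle system. The names $t_{\mathrm{GPS}}$ and $t_{\mathrm{MSFQ}}$ are introduced here for illustration only, and $\hat{W}$ denotes the MSFQ service:

```latex
\begin{aligned}
t_{\mathrm{GPS}} &= \frac{L}{Nr}, \qquad t_{\mathrm{MSFQ}} = \frac{L}{r}, \\
W(0, t_{\mathrm{GPS}}) &= L, \qquad
\hat{W}(0, t_{\mathrm{GPS}}) = r \cdot \frac{L}{Nr} = \frac{L}{N}, \\
W(0, t_{\mathrm{GPS}}) - \hat{W}(0, t_{\mathrm{GPS}}) &= L - \frac{L}{N} = \frac{(N-1)\,L}{N} \ge 0 .
\end{aligned}
```

For N = 4, three quarters of the packet is still untransmitted under MSFQ when the GPS busy period ends, which is why MSFQ can be busy while GPS is idle.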
19- Work from previous busy periods can accumulate under MSFQ.
- This may happen either at the beginning or in the middle of a busy period.
(Figure labels: arrival time, delayed finish, service time)
20(Figure labels: arrival time, delayed start, service time)
21- Theorem 1: For any t, $W(0, t) - \hat{W}(0, t) \le (N - 1) L_{\max}$, where $L_{\max}$ denotes the maximum packet length.
- Proof
- The slope of $W$ (GPS) alternates between Nr (during a busy period) and 0 (idle, between two consecutive busy periods).
- The slope of $\hat{W}$ (MSFQ) is at most Nr at any given time.
23- Case 1: At most N - 1 MSFQ servers are busy at t
- Since MSFQ is work-conserving, if a server is idle, we know that there is no packet waiting for transmission.
- In the worst case, all the k busy servers have just started transmitting a packet of maximum length ($L_{\max}$).
- $W(0, t) - \hat{W}(0, t) \le k L_{\max}$ (a)
- where $k \le N - 1$
24- Case 2: All MSFQ servers are busy at t
- Let $[t_0, t]$ be the largest interval in which all MSFQ servers are busy.
- Since in $[t_0, t]$ the slope of $\hat{W}$ is Nr and the slope of $W$ is at most Nr, $W(0, t) - \hat{W}(0, t) \le W(0, t_0) - \hat{W}(0, t_0)$ (b)
25- If $t_0 = 0$, then $W(0, t) = \hat{W}(0, t)$. Otherwise, if $t_0 > 0$, we know from (a) that $W(0, t_0) - \hat{W}(0, t_0) \le (N - 1) L_{\max}$ (c)
- From (b) and (c), we have
- $W(0, t) - \hat{W}(0, t) \le (N - 1) L_{\max}$ □
- This theorem implies the need for a buffer space of $(N - 1) L_{\max}$.
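To make the size of this bound concrete, the sketch below evaluates (N - 1) Lmax for a given number of links. The function name is hypothetical, and the 1500-byte figure is only an illustrative maximum packet length (a common Ethernet payload size), not a value from the slides.

```python
def msfq_service_lag_bound(n_servers: int, max_packet_len: int) -> int:
    """Upper bound (N - 1) * Lmax on how far MSFQ can lag GPS, in the
    same unit as max_packet_len."""
    if n_servers < 1:
        raise ValueError("need at least one server")
    return (n_servers - 1) * max_packet_len

# Four aggregated links, assumed 1500-byte maximum packets.
print(msfq_service_lag_bound(4, 1500))
```

With a single server the bound is zero, consistent with W and its MSFQ counterpart coinciding when the two systems have the same single server of rate r.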
26- The discrepancy of packet departure times (i.e. when a packet begins transmitting/servicing) between multi-server and single-server systems
- Let $d_p$ be the time at which packet p departs from the (GPS, 1, Nr) system.
- MSFQ packets may not depart in increasing order of $d_p$.
27- Lemma 1: Packet k will be scheduled no later than
- where $a_k$ and $b_k$ are, respectively, the arrival time and scheduling time of packet k over N servers, each with a rate of r; $P$ is the set of packets scheduled before packet k since time $a_k$, including the packets in service at $a_k$; and $L_i$ is the length of packet i.
28- Proof
- Given a load that must be scheduled before packet k, a work-conserving service discipline schedules packet k latest if the load is equally divided among the N servers such that all of them finish the work at the same time. □
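The bound itself is not preserved in the slide text. Reading it off the proof argument (the whole load of the packets in P split evenly across N servers of rate r), the inequality would take the following form; this is a reconstruction from the proof sketch, not a quotation of the lemma:

```latex
b_k \;\le\; a_k + \frac{1}{N r} \sum_{i \in P} L_i
```

For example, with N = 4 servers and eight equal-length packets of length L ahead of packet k, the right-hand side is $a_k + 2L/r$, that is, two packet transmission times after arrival.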
294. PACKET DELAY
- Theorem 2: For all packets p,
- where $\hat{d}_p$ and $d_p$ are the times at which packet p departs from the (MSFQ, N, r) and (GPS, 1, Nr) systems, respectively.
- Proof
- Skipped
305. SERVICE PER-FLOW
- Theorem 3: For any t,
- $W_i(0, t) - \hat{W}_i(0, t) \le N L_{\max}$
- Proof
- Skipped
316. FAIRNESS
- Example 3
- 4 servers
- 11 flows (fixed packet length)
- F1: weight 0.5, 10 packets at t = 0
- F2 to F11: weight 0.05, each with 1 packet at t = 0
32- GPS Scheduled by WFQ (by finish time)
F1A: 0 + L/0.5 = L/0.5
F1B: F1A + L/0.5 = 2L/0.5
F2: 0 + L/0.05 = L/0.05
F3: 0 + L/0.05 = L/0.05
33- MSFQ Scheduled by WFQ (by finish time)
34- GPS Scheduled by WF2Q (eligible start time, then head-of-line (HOL) finish time)
Not smooth?
35- The direct application of the WF2Q technique to multi-server systems does not fix the undesired burstiness problem; moreover, it makes the discipline non-work-conserving.
Not eligible until the previous packet is scheduled → non-work-conserving
366.1 MSF2Q
- (MSF2Q, N, r)
- A packet is outstanding if it is being transmitted.
- Let $\hat{o}_i(t)$ denote the number of outstanding flow i packets in the MSF2Q system at time t.
- $W_i(t_1, t_2)$: the work completed for flow i under MSF2Q over the interval $[t_1, t_2]$
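37- At time t, when a server is idle and there is a packet waiting for service, MSF2Q considers only the eligible flows (those that satisfy the eligibility conditions given as formulas on the original slide)
- Among these, it schedules the packet that would complete service in the GPS system earliest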
Example 3: F1: $r_1 = 0.5$; F2 to F11: $r_x = 0.05$; $r = 1/4 = 0.25$ → $\hat{o}_1$: $\lceil 0.5/0.25 \rceil = 2$; $\hat{o}_x$: $\lceil 0.05/0.25 \rceil = 1$
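The contrast between MSFQ and a capped variant on Example 3 can be sketched with a small simulation. It is a simplified illustration under stated assumptions, not the paper's algorithm: all packets arrive at t = 0 and have equal length, GPS completion order is taken from virtual finish tags (k-th packet of a flow gets tag k * length / weight), and the only eligibility rule modelled is an assumed cap of ceil(r_i / r) outstanding packets per flow, with a fallback to the whole backlog so that the sketch stays work-conserving. The function and parameter names are hypothetical.

```python
import heapq
from math import ceil

def simulate(flows, n_servers, server_rate, length=1.0, cap_outstanding=False):
    """Schedule packets that all arrive at t = 0 onto n_servers equal servers.

    flows maps a flow name to (weight, packet_count). Returns a list of
    (start_time, flow) pairs in scheduling order.
    """
    total_weight = sum(w for w, _ in flows.values())
    pending = {f: n for f, (_, n) in flows.items()}
    next_seq = {f: 1 for f in flows}
    outstanding = {f: 0 for f in flows}
    # Assumed cap: ceil(r_i / r), where r_i is flow i's share of the rate N*r.
    limit = {f: ceil(round(n_servers * w / total_weight, 9))
             for f, (w, _) in flows.items()}

    def finish_tag(f):
        # Virtual finish tag of the flow's head packet; GPS completes
        # packets in increasing tag order. Ties are broken by flow name.
        return (round(next_seq[f] * length / flows[f][0], 9), f)

    busy = []          # heap of (completion_time, flow)
    free = n_servers
    now = 0.0
    starts = []
    while any(pending.values()):
        while free and any(pending.values()):
            backlog = [f for f in flows if pending[f]]
            eligible = [f for f in backlog if outstanding[f] < limit[f]]
            # Fall back to the whole backlog if the cap excludes everyone.
            pool = eligible if (cap_outstanding and eligible) else backlog
            f = min(pool, key=finish_tag)
            starts.append((now, f))
            next_seq[f] += 1
            pending[f] -= 1
            outstanding[f] += 1
            free -= 1
            heapq.heappush(busy, (now + length / server_rate, f))
        if not busy:
            break
        now = busy[0][0]
        while busy and busy[0][0] == now:  # release every server done at `now`
            _, g = heapq.heappop(busy)
            outstanding[g] -= 1
            free += 1
    return starts

# Example 3: F1 has weight 0.5 and 10 packets; F2..F11 have weight 0.05 and 1 packet.
example3 = {"F1": (0.5, 10)}
example3.update({f"F{i}": (0.05, 1) for i in range(2, 12)})

msfq = simulate(example3, n_servers=4, server_rate=0.25)
capped = simulate(example3, n_servers=4, server_rate=0.25, cap_outstanding=True)
print([f for _, f in msfq[:8]])
print([f for _, f in capped[:8]])
```

In the uncapped run the first eight packets started all belong to F1, so F1 occupies all four servers for two full packet times. In the capped run each scheduling instant starts two F1 packets and two packets from other flows, which matches F1's share of 0.5 of the aggregate rate.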
38- The output of MSF2Q in Example 3
Smooth scheduling
396.2 Properties of MSF2Q
- Theorem 4: Let $L_{i,\max}$ denote the maximum packet length of flow i. For any time t and flow i, the following property holds: (8)
- Proof
- Skipped
407. APPLICATIONS
- Link Aggregation
- Logical grouping of several Ethernet network interfaces to provide cost-effective load balancing, better scalability, and fault tolerance.
- IEEE 802.3ad
- Currently ranges from two to eight Fast/Gigabit Ethernet ports in either servers or switching elements.
41- Access to storage I/O
- Connecting the RAID system to a host (e.g., a Web server) with multiple SCSI or Fibre Channel links to improve I/O performance.
- Load balancing, failover
428. RELATED WORK
439. CONTRIBUTIONS AND FUTURE WORK
- Link aggregation, or the aggregation of multiple interfaces into a single logical link, is becoming the predominant approach for bandwidth scaling.
- Numerous fair queuing results previously obtained for single-server systems do not directly apply to multi-server systems.
44- We first analyzed the cumulative service, packet delay and per-flow cumulative service bounds for Weighted Fair Queuing (WFQ) applied to a multi-server system.
- We then presented a new fair queuing algorithm, MSF2Q, that leads to smooth and fair schedules at finer time scales.
45- Our future plans include
- Investigation of implementation issues
- Quantitative comparison of the approach presented in this paper to the alternative approach of partitioning flows among servers
- Enhancing the algorithms for multiprocessors and clusters of servers
- Hierarchical GPS
- Servers with different rates
- Misordering of packets