1
Experimental Evaluation of Load Balancers in
Packet Processing Systems
  • Taylor L. Riché, Jayaram Mudigonda, and Harrick
    M. Vin

2
Background
  • Packet processing systems must support
  • High-bandwidth links
  • Increasingly complex applications
  • Implication
  • Processing time dominated by memory access
    latency
  • Packet processing time > packet inter-arrival
    time (IAT)
  • Utilize multi-core parallel architectures
  • Load balancers: Key building block
  • Scale the throughput of a system with parallel
    resources

3
Flow-level Load Balancer
  • Maps a flow to a processor (sketched below)
  • Advantage
  • Localization of per flow state to a processor
  • Disadvantages
  • Coarse-grain load balancing ⇒ limits concurrency
  • Non-uniformity in flow characteristics ⇒ load
    imbalance

[Figure: load balancer feeding multiple processors, each with its own
local memory and access to a shared global memory]
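A minimal sketch of the flow-level mapping, assuming a hash over the flow's 5-tuple; the flow_key struct and flow_to_processor function are illustrative names, not taken from the presentation:

```c
#include <stdint.h>

/* Illustrative flow key: the usual IP 5-tuple. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* Fold one 64-bit value into an FNV-1a hash, byte by byte. */
static uint64_t fnv1a64(uint64_t h, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Flow-level mapping: every packet of a flow hashes to the same processor. */
static unsigned flow_to_processor(const struct flow_key *k, unsigned num_procs)
{
    uint64_t h = 1469598103934665603ULL;   /* FNV offset basis */
    h = fnv1a64(h, k->src_ip);
    h = fnv1a64(h, k->dst_ip);
    h = fnv1a64(h, ((uint64_t)k->src_port << 16) | k->dst_port);
    h = fnv1a64(h, k->proto);
    return (unsigned)(h % num_procs);
}
```

Because the mapping depends only on the flow key, all packets of a flow reach the same processor, so its per-flow state can stay in that processor's local memory.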
4
Packet-level Load Balancer
  • Independently maps each packet (sketched below)
  • Advantage
  • Fine-grain load balancing
  • Disadvantage
  • Higher overhead for accessing per-flow state
  • Lock overhead for ensuring mutual exclusion
  • Higher latency to access shared memory levels

[Figure: load balancer feeding multiple processors, each with its own
local memory and access to a shared global memory]
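A minimal sketch of the packet-level alternative, assuming round-robin dispatch and a pthread mutex guarding shared per-flow state; all names here are illustrative:

```c
#include <pthread.h>

/* Per-flow state lives in shared global memory, since any processor
 * may handle any packet of the flow. */
struct flow_state {
    pthread_mutex_t lock;       /* serializes the critical segment */
    unsigned long   pkt_count;  /* example per-flow state */
};

static unsigned next_proc;

/* Packet-level mapping: each packet may go to a different processor. */
static unsigned dispatch_packet(unsigned num_procs)
{
    return next_proc++ % num_procs;
}

/* Critical segment: the lock overhead and shared-memory latency paid here
 * are exactly the costs the flow-level scheme avoids. */
static void critical_segment(struct flow_state *fs)
{
    pthread_mutex_lock(&fs->lock);
    fs->pkt_count++;
    pthread_mutex_unlock(&fs->lock);
}
```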
5
How Does One Select a Load Balancer?
  • Relative performance depends upon
  • System characteristics
  • Memory access latency
  • Application characteristics
  • Application length
  • Length of critical code segment
  • Flow definition
  • Traffic characteristics
  • Inter-arrival time of packets
  • Arrival rate of flows
  • Holding time for flows
  • Size of flows

6
Outline
  • Introduction
  • Methodology
  • Simulation Model
  • Performance Metric
  • Experimental Evaluation
  • Setup
  • Results
  • Concluding Remarks

7
Simulation Model
  • System Model
  • Homogeneous multi-processors
  • Memory system
  • Local memory: single-cycle accesses
  • Shared global memory: various access latencies
  • Application Model
  • A = ⟨A1, A2, A3, ..., An⟩
  • Ai: non-critical segment, Aj: critical segment
  • Ai = (ci, mi), where (see the sketch below)
  • ci = Number of computation instructions
  • mi = Number of memory access instructions
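A small illustration of how one segment of the application model could be encoded; treating a segment's cost as ci plus latency times mi cycles is an assumption of this sketch, consistent with the single-cycle local / multi-cycle shared memory model above:

```c
/* One application segment Ai = (ci, mi); the critical flag marks segments
 * that touch shared per-flow state. */
struct segment {
    unsigned long ci;        /* computation instructions */
    unsigned long mi;        /* memory access instructions */
    int           critical;  /* nonzero for a critical segment */
};

/* Assumed cost model: ci + mem_latency * mi cycles, where mem_latency is
 * 1 for local memory or the shared-memory latency for global memory. */
static unsigned long segment_cycles(const struct segment *a,
                                    unsigned long mem_latency)
{
    return a->ci + mem_latency * a->mi;
}
```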

8
Performance Metric
  • Processor provisioning to meet trace throughput
  • Depends on
  • Processor capacity, Ci(n)
  • Number of packets processed within average IAT
  • Per-processor offered load, Oi(n)
  • Number of packets that arrive at a processor
    within average IAT
  • Formal definition (see the sketch below)
  • Pscheme = min{ n : ∀ i < n, Oi(n) < Ci(n) }
  • Performance metric
  • Processor provisioning ratio: Pflow / Ppkt
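A direct transcription of the formal definition into code; offered_load() and capacity() are placeholders standing in for the per-scheme models Oi(n) and Ci(n), not functions from the presentation:

```c
/* Oi(n) and Ci(n), both in packets per average IAT (placeholders). */
extern double offered_load(unsigned i, unsigned n);
extern double capacity(unsigned i, unsigned n);

/* Pscheme = min{ n : for all i < n, Oi(n) < Ci(n) }. */
static unsigned provision(unsigned max_procs)
{
    for (unsigned n = 1; n <= max_procs; n++) {
        int ok = 1;
        for (unsigned i = 0; i < n; i++) {
            if (offered_load(i, n) >= capacity(i, n)) {
                ok = 0;
                break;
            }
        }
        if (ok)
            return n;   /* smallest provisioning that meets the trace */
    }
    return 0;           /* not achievable within max_procs */
}
```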

9
Outline
  • Introduction
  • Methodology
  • Simulation Model
  • Performance Metric
  • Experimental Evaluation
  • Setup
  • Results
  • Concluding Remarks

10
Experimental Setup
  • System
  • Local memory access: 1 cycle
  • Shared memory access latencies: 50, 100, 200
    cycles
  • Application
  • A = ⟨A1, A2, A3⟩
  • Computation-to-memory-access ratio selected using
    benchmarks
  • Effective Critical Segment (ECS) size = c2 + M · m2
    (see the sketch below)
  • Traces: UNC (edge), MRA (core)
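A short sketch of how the ECS could be computed under this setup, assuming ECS = c2 + M * m2 cycles with M the shared-memory access latency, then expressed as a multiple of the average IAT (both the reading of M and the iat_cycles parameter are assumptions of this sketch):

```c
/* ECS in cycles: critical-segment computation plus shared-memory accesses. */
static double ecs_cycles(unsigned long c2, unsigned long m2,
                         unsigned long shared_latency /* M: 50, 100, or 200 */)
{
    return (double)c2 + (double)shared_latency * (double)m2;
}

/* ECS as a multiple of the average packet inter-arrival time, the unit
 * used in the result figures (e.g., ECS = 1.14 IAT). */
static double ecs_in_iat(double ecs, double iat_cycles)
{
    return ecs / iat_cycles;
}
```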

11
Experimental Results Preview
  • Per processor offered load and capacity
  • How do these quantities change with
  • Number of processors
  • Application length
  • ECS
  • Present guidelines for load balancer selection
  • System Properties
  • Trace Properties
  • Application Properties

12
Processor Capacity (UNC)
For the packet-level scheme, large ECS ⇒ small
processor capacity
For the flow-level scheme, packet processing time
is independent of the number of processors ⇒
processor capacity is constant
For the packet-level scheme, lock overhead
increases with the number of processors (N) ⇒
processor capacity decreases with N
13
Lock Delay vs. Number of Processors
Lock delay increases with number of processors
and ECS
14
Per Processor Offered Load (UNC)
For the flow-level scheme, non-uniformity in flows ⇒
per-processor offered load reduces sub-linearly
For the packet-level scheme, per-processor offered
load reduces linearly with the number of processors
15
Determining Processor Provisioning
[Figure: packets per average IAT (log scale, 0.01 to 1) vs. number of
processors (1 to 32) for the UNC trace, comparing flow-level capacity
and load against packet-level load and packet-level capacity at ECS
values of about 0.14, 1.14, 2.28, 4.30, and 8.59 IAT]
16
Provisioning Ratio for UNC Trace
PC = 0.5 Pkts/IAT
With lower processor capacities, the packet-level
scheme will outperform the flow-level scheme in
more cases.
17
Provisioning Ratio for MRA Trace
For MRA, the cross-over ECS value is lower than
that of the UNC trace for the same processor
capacity
18
Flow Characteristics UNC vs. MRA
  • Key characteristics
  • Per-flow packet inter-arrival time distribution
  • Measure of non-uniformity in flow characteristics
  • Observation
  • MRA flows are more uniform than UNC flows
  • Flow-level scheme does better for MRA than UNC

19
Provisioning Ratio for Flow Types
Coarser flows increase lock delay (affecting the
packet-level scheme) but also create fatter flows
(affecting the flow-level scheme); both changes are
small ⇒ the cross-over point does not change much!
20
Concluding Remarks
  • Load balancers: a key building block
  • Key question
  • How to select between a packet-level and
    flow-level load balancer?
  • Answer depends on system, application, and trace
    characteristics

[Figure: crossover ECS (in multiples of IAT, 0 to 20) vs. processor
capacity (0 to 0.3 packets per average IAT) for the UNC and MRA traces;
the region above each crossover curve is labeled Flow-Level and the
region below is labeled Packet-Level]
21
Thank you!
  • For more information on this work
  • http://www.cs.utexas.edu/users/riche/
  • For more information on Shangri-La
  • http://www.cs.utexas.edu/users/vin/research/shangrila.shtml
  • Questions?

22
Backup Slides
23
Hypothetical Load Balancer
  • Balances load on a packet-by-packet basis.
  • Is not hindered by
  • Global memory access latency
  • Synchronization costs

24
Ratios for Hypothetical System
25
Conclusion 7 (Packet Level)
  • Small capacities
  • Processing time dominated by non-critical segments
  • Performs very similarly to the hypothetical system
  • Large capacities
  • Protected code makes up a greater portion of the
    processing time
  • Increase in processing time is higher

26
Conclusion 7 (Flow Level)
  • Small capacities
  • Can only service a small number of flows
  • Non-uniformity results in large provisioning
  • Large capacities
  • The load balancer has a better opportunity to even
    out imbalance

27
Simulator Design
  • Event-driven model
  • Implemented in C and driven by TCL scripts
  • Four main components (see the skeleton below)
  • Packet reader
  • Load distributor
  • A set of processors
  • Lock manager
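A skeletal event loop in the spirit of those four components; every type and handler here is an illustrative stand-in, not the simulator's actual API:

```c
#include <stdlib.h>

enum ev_type { EV_ARRIVAL, EV_DONE, EV_LOCK_GRANT };

struct event {
    double        time;
    enum ev_type  type;
    void         *data;        /* packet, processor, or lock request */
    struct event *next;
};

static struct event *queue;    /* events kept sorted by time */

static void schedule(struct event *e)
{
    struct event **p = &queue;
    while (*p && (*p)->time <= e->time)
        p = &(*p)->next;
    e->next = *p;
    *p = e;
}

static void run(void)
{
    while (queue) {
        struct event *e = queue;
        queue = e->next;
        switch (e->type) {
        case EV_ARRIVAL:    /* packet reader -> load distributor -> processor */
            break;
        case EV_DONE:       /* a processor finished a packet */
            break;
        case EV_LOCK_GRANT: /* lock manager grants a critical segment's lock */
            break;
        }
        free(e);
    }
}
```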

28
System Performance
  • Performance is based on many parameters
  • System characteristics
  • Memory access latency: global vs. local
  • Application characteristics
  • Application length
  • Length of critical code segment
  • Flow definition
  • Workload characteristics
  • Inter-arrival time of packets
  • Arrival rate of flows
  • Holding time for flows
  • Size of flows

29
Average Packets/Flow in a Window
Only a small number of packets per flow arrive within an ECS
30
Experimental Results
  • Determine which factors affect Pflow/Ppkt
  • I.e., where does one scheme outperform the other?
  • Packet-level capacity is reduced by
  • Lock delay
  • Memory latency
  • Flow-level load does not scale, due to the
    non-uniformity of flows
  • Trace properties directly affect the ratio
  • Show the effect of the flow definition