Title: Experimental Evaluation of Load Balancers in Packet Processing Systems
1. Experimental Evaluation of Load Balancers in Packet Processing Systems
- Taylor L. Riché, Jayaram Mudigonda, and Harrick M. Vin
2. Background
- Packet processing systems must support
- High-bandwidth links
- Increasingly complex applications
- Implications
- Processing time dominated by memory access latency
- Packet processing time > packet inter-arrival time (IAT)
- Utilize multi-core parallel architectures
- Load balancers: a key building block
- Scale the throughput of a system with parallel resources
3. Flow-level Load Balancer
- Maps a flow to a processor
- Advantage
- Localization of per-flow state to a processor
- Disadvantages
- Coarse-grain load balancing ⇒ limits concurrency
- Non-uniformity in flow characteristics ⇒ load imbalance
[Diagram: a load balancer feeds several processors, each with its own local memory, all sharing a global memory]
4. Packet-level Load Balancer
- Independently maps each packet to a processor
- Advantage
- Fine-grain load balancing
- Disadvantages
- Higher overhead for accessing per-flow state
- Lock overhead for ensuring mutual exclusion
- Higher latency to access shared memory levels
[Diagram: a load balancer dispatches individual packets to processors with local memories and a shared global memory]
5. How Does One Select a Load Balancer?
- Relative performance depends upon
- System characteristics
- Memory access latency
- Application characteristics
- Application length
- Length of critical code segment
- Flow definition
- Traffic characteristics
- Inter-arrival time of packets
- Arrival rate of flows
- Holding time for flows
- Size of flows
6. Outline
- Introduction
- Methodology
- Simulation Model
- Performance Metric
- Experimental Evaluation
- Setup
- Results
- Concluding Remarks
7. Simulation Model
- System model
- Homogeneous multi-processors
- Memory system
- Local memory: single-cycle accesses
- Shared global memory: various access latencies
- Application model
- A = ⟨A1, A2, A3, ..., An⟩
- Ai: non-critical segment; Aj: critical segment
- Ai = (ci, mi), where
- ci = number of computation instructions
- mi = number of memory access instructions
8. Performance Metric
- Processor provisioning to meet trace throughput
- Depends on
- Processor capacity Ci(n)
- Number of packets processed within the average IAT
- Per-processor offered load Oi(n)
- Number of packets that arrive at a processor within the average IAT
- Formal definition
- Pscheme = min { n : ∀ i < n, Oi(n) < Ci(n) }
- Performance metric
- Processor provisioning ratio: Pflow / Ppkt
9. Outline
- Introduction
- Methodology
- Simulation Model
- Performance Metric
- Experimental Evaluation
- Setup
- Results
- Concluding Remarks
10. Experimental Setup
- System
- Local memory access: 1 cycle
- Shared memory access latencies: 50, 100, 200 cycles
- Application
- A = ⟨A1, A2, A3⟩
- Computation-to-memory-access ratio selected using benchmarks
- Effective Critical Segment (ECS) size = c2 + M · m2
- Traces: UNC (edge), MRA (core)
11. Experimental Results Preview
- Per processor offered load and capacity
- How do these quantities change with
- Number of processors
- Application length
- ECS
- Present guidelines for load balancer selection
- System Properties
- Trace Properties
- Application Properties
12. Processor Capacity (UNC)
For the packet-level scheme, large ECS ⇒ small processor capacity.
For the flow-level scheme, packet processing time is independent of the number of processors ⇒ processor capacity is constant.
For the packet-level scheme, lock overhead increases with the number of processors (N) ⇒ processor capacity decreases with N.
13. Lock Delay vs. Number of Processors
Lock delay increases with both the number of processors and the ECS.
14. Per-Processor Offered Load (UNC)
For the flow-level scheme, non-uniformity in flows ⇒ per-processor offered load decreases sub-linearly.
For the packet-level scheme, per-processor offered load decreases linearly with the number of processors.
15. Determining Processor Provisioning
[Plot: packets per average IAT (log scale, 0.01–1) vs. number of processors (1–32); curves for FL capacity, FL load, PL load, and PL capacity at ECS = 0.14, 1.14, 2.28, 4.30, and 8.59 IAT]
16. Provisioning Ratio for UNC Trace
[Plot annotation: processor capacity = 0.5 pkts/IAT]
With lower processor capacities, the packet-level scheme outperforms the flow-level scheme in more cases.
17. Provisioning Ratio for MRA Trace
For MRA, the crossover ECS value is lower than that of the UNC trace for the same processor capacity.
18. Flow Characteristics: UNC vs. MRA
- Key characteristics
- Per-flow packet inter-arrival time distribution
- Measure of non-uniformity in flow characteristics
- Observation
- MRA flows are more uniform than UNC flows
- Flow-level scheme does better for MRA than UNC
19. Provisioning Ratio for Flow Types
Coarser flow definitions increase lock delay (hurting the packet-level scheme) but also create fatter flows (hurting the flow-level scheme); both changes are small ⇒ the crossover point does not change much.
20. Concluding Remarks
- Load balancers are a key building block
- Key question
- How to select between a packet-level and a flow-level load balancer?
- Answer depends on system, application, and trace characteristics
[Plot: crossover ECS (in multiples of IAT, 0–20) vs. processor capacity (packets per average IAT, 0–0.3) for the UNC and MRA traces; the flow-level scheme wins above each curve, the packet-level scheme below]
21. Thank You!
- For more information on this work
- http://www.cs.utexas.edu/users/riche/
- For more information on Shangri-La
- http://www.cs.utexas.edu/users/vin/research/shangrila.shtml
- Questions?
22. Backup Slides
23. Hypothetical Load Balancer
- Balances load on a packet-by-packet basis
- Is not hindered by
- Global memory access latency
- Synchronization costs
24. Ratios for Hypothetical System
25. Conclusions (Packet-Level)
- Small capacities
- Processing time dominated by non-critical segments
- Performs very similarly to the hypothetical system
- Large capacities
- Protected code is a greater portion of processing time
- Increase in processing time is higher
26. Conclusions (Flow-Level)
- Small capacities
- Each processor can only service a small number of flows
- Non-uniformity results in large provisioning
- Large capacities
- Load balancer has a better opportunity to even out imbalance
27. Simulator Design
- Event-driven model
- Implemented in C and driven by TCL scripts
- Four main components
- Packet reader
- Load distributor
- A set of processors
- Lock manager
28. System Performance
- Performance is based on many parameters
- System characteristics
- Memory access latency: global vs. local
- Application characteristics
- Application length
- Length of critical code segment
- Flow definition
- Workload characteristics
- Inter-arrival time of packets
- Arrival rate of flows
- Holding time for flows
- Size of flows
29. Average Packets/Flow in a Window
Only a small number of packets per flow arrive within an ECS window.
30. Experimental Results
- Determine which factors affect Pflow/Ppkt
- I.e., where does one scheme outperform the other?
- Packet-level capacity is reduced by
- Lock delay
- Memory latency
- Flow-level load does not scale due to
- The non-uniformity of flows
- Trace properties directly affect the ratio
- Show the effect of the flow definition