CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Zefu Dai, Mark Jarvin and Jianwen Zhu, University of Toronto


1
CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Zefu Dai, Mark Jarvin and Jianwen Zhu

University of Toronto
2
Background

Consumer electronics are part of everyday life!

[Diagram: SoC, memory controller (Mem Contr.), DRAM]
3
Background

A portable media player SoC example

4
Background

A portable media player SoC example

5
Background

A portable media player SoC example

[Figure: SoC block diagram annotated with bandwidth figures in MB/s: 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94]
6
Background

A portable media player SoC example

[Figure: same SoC block diagram and MB/s figures as on slide 5, plus a "1000x" callout]
7
Background

A portable media player SoC example

Speech bubble: "Give me 10 KB in 1 us, please."
[Figure: same SoC block diagram and MB/s figures as on slide 5]
8
Background

A portable media player SoC example

Speech bubble: "Give me 10 KB in 1 us, please."
Speech bubble: "I want the data NOW!!!"
[Figure: same SoC block diagram and MB/s figures as on slide 5]
9
Background

A portable media player SoC example

Speech bubble: "Give me 10 KB in 1 us, please."
Speech bubble: "I want the data NOW!!!"
Speech bubble: "I can only supply a maximum of 6.4 GB every second."
[Figure: same SoC block diagram and MB/s figures as on slide 5]
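The tension between the first and last speech bubbles can be checked with a little arithmetic. The sketch below is only an illustration: it assumes "us" means microsecond and that KB and GB are decimal units, neither of which the slide states.

```python
# Peak rate implied by "10 KB in 1 us" versus the quoted DRAM supply.
# Assumptions (not stated on the slide): 1 KB = 1,000 B, 1 GB = 10**9 B,
# and "us" means microsecond.
request_bytes = 10 * 1_000
window_us = 1
demand_bytes_per_s = request_bytes * 1_000_000 // window_us
supply_bytes_per_s = 6_400_000_000  # 6.4 GB per second, from the slide

print(demand_bytes_per_s / 1e9)                 # demand in GB/s
print(demand_bytes_per_s > supply_bytes_per_s)  # does the burst exceed the supply?
```

Under these assumptions the burst asks for 10 GB/s for a short time, more than the 6.4 GB/s the DRAM can supply.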
10
Challenges

Simultaneously satisfy
Bandwidth requirements
Latency requirements

11
Previous Work

QoS-aware
Bandwidth or latency is heuristically improved
QoS-guaranteed
Guaranteed minimum bandwidth and/or latency

12
Main Ideas

Start with the Bandwidth Guaranteed Prioritized Queuing (BGPQ) algorithm
Bandwidth guarantee
Improve it using the Credit Borrow and Repay (CBR) mechanism
Minimum latency guarantee

13
Bandwidth Guaranteed Prioritized Queuing

Combines the benefits of Priority Queuing and Weighted Fair Queuing
Credit-based Weighted Fair Queuing
Prioritized service for residual bandwidth allocation
Residual bandwidth: the bandwidth assigned to one user that is unused at a specific point in time

14
BGPQ Algorithm

Case 1: all queues are busy
No residual bandwidth
Act as WFQ

BGPQ Scheduler
Initial state: everybody has a credit of zero.
[Diagram: BGPQ scheduler; queues Q0 (50), Q1 (20) and Q2 (30) feed a multiplexer onto the shared resource. Credits shown: 0.0, 0.0, 0.0]
15
BGPQ Algorithm

Case 1: all queues are busy
No residual bandwidth
Act as WFQ

BGPQ Scheduler
Step 1: calculate dynamic credit for each queue.
[Diagram: BGPQ scheduler; queues Q0 (50), Q1 (20) and Q2 (30) feed a multiplexer onto the shared resource. Credits shown: 0.5, 0.3, 0.2]
16
BGPQ Algorithm

Case 1: all queues are busy
No residual bandwidth
Act as WFQ

BGPQ Scheduler
Step 2: turn on switch box and transfer data from granted queue.
[Diagram: BGPQ scheduler; queues Q0 (50), Q1 (20) and Q2 (30) feed a multiplexer onto the shared resource. Credits shown: 0.5, 0.3, 0.2]
17
BGPQ Algorithm

Case 1: all queues are busy
No residual bandwidth
Act as WFQ

BGPQ Scheduler
Step 3: subtract 1 from the credit of granted queue.
One scheduling cycle is done! Sum of credits = 0!
[Diagram: BGPQ scheduler; queues Q0 (50), Q1 (20) and Q2 (30) feed a multiplexer onto the shared resource. Credits shown: 0.3, 0.2, -0.5]
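The three steps of Case 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes the numbers 50, 20 and 30 next to Q0, Q1 and Q2 are percentage shares, it stores credits as integer percentages (so the slides' 0.5 is 50 and "subtract 1" becomes "subtract 100") to avoid floating-point rounding, and the name `wfq_cycle` is hypothetical.

```python
def wfq_cycle(credits, weights):
    """One scheduling cycle when every queue is busy (assumed reading of Case 1)."""
    # Step 1: each queue earns its weight as dynamic credit.
    credits = [c + w for c, w in zip(credits, weights)]
    # Step 2: grant the queue with the highest dynamic credit.
    granted = max(range(len(credits)), key=lambda q: credits[q])
    # Step 3: the granted queue pays for one full scheduling slot (100 percent).
    credits[granted] -= 100
    return granted, credits

weights = [50, 20, 30]  # Q0, Q1, Q2 shares in percent (assumed)
granted, credits = wfq_cycle([0, 0, 0], weights)
print(granted, credits, sum(credits))
```

With all credits starting at zero, Q0 is granted and ends the cycle at -50, while Q1 and Q2 hold 20 and 30. That matches the -0.5, 0.2 and 0.3 shown on the slide, and the credits again sum to zero.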
18
BGPQ Algorithm

Case 2: some queues are empty
Has residual bandwidth
Prioritized service on residual bandwidth

BGPQ Scheduler
Before new scheduling cycle: Q1 is empty.
Priority: Q0 > Q1 > Q2
[Diagram: BGPQ scheduler; queues Q0 (50), Q1 (20) and Q2 (30) feed a multiplexer onto the shared resource. Credits shown: 0.3, 0.2, -0.5]
19
BGPQ Algorithm

Case 2: some queues are empty
Has residual bandwidth
Prioritized service on residual bandwidth

BGPQ Scheduler
Step 1: calculate a dynamic credit for each queue. The credit of an empty queue remains unchanged.
Priority: Q0 > Q1 > Q2
[Diagram: BGPQ scheduler; queues Q0 (50), Q1 (20) and Q2 (30) feed a multiplexer onto the shared resource. Credits shown: 0.6, 0.0, 0.2]
20
BGPQ Algorithm

Case 2: some queues are empty
Has residual bandwidth
Prioritized service on residual bandwidth

BGPQ Scheduler
Step 2: allocate residual bandwidth to the non-empty queue with highest priority.
Priority: Q0 > Q1 > Q2
[Diagram: BGPQ scheduler; queues Q0 (50), Q1 (20) and Q2 (30) feed a multiplexer onto the shared resource. Credits shown: 0.6, 0.2, 0.2]
21
BGPQ Algorithm

Case 2: some queues are empty
Has residual bandwidth
Prioritized service on residual bandwidth

BGPQ Scheduler
Step 3: transfer data from granted queue.
Priority: Q0 > Q1 > Q2
[Diagram: BGPQ scheduler; queues Q0 (50), Q1 (20) and Q2 (30) feed a multiplexer onto the shared resource. Credits shown: 0.6, 0.2, 0.2]
22
BGPQ Algorithm

Case 2: some queues are empty
Has residual bandwidth
Prioritized service on residual bandwidth

BGPQ Scheduler
Step 4: subtract 1 from the credit of granted queue.
Priority: Q0 > Q1 > Q2
One scheduling cycle is done! Sum of credits = 0!
[Diagram: BGPQ scheduler; queues Q0 (50), Q1 (20) and Q2 (30) feed a multiplexer onto the shared resource. Credits shown: 0.2, 0.2, -0.4]
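The four steps of Case 2 can be sketched as follows. As before this is a hedged illustration rather than the presented design: the weights are assumed to be percentage shares, credits are kept as integer percentages (100 stands for the slides' 1), the queue index doubles as the priority (0 is highest), and `bgpq_cycle` is a hypothetical name. The mapping of the slide's credit values to Q0, Q1 and Q2 is inferred from the arithmetic, not stated in the transcript.

```python
def bgpq_cycle(credits, weights, busy):
    """One BGPQ cycle; needs at least one busy queue (assumed reading of Case 2)."""
    credits = list(credits)
    # Step 1: busy queues earn their weight; empty queues keep their credit.
    for q, w in enumerate(weights):
        if busy[q]:
            credits[q] += w
    # Step 2: the share left unused by empty queues (the residual bandwidth)
    # goes to the highest-priority busy queue, i.e. the lowest busy index.
    residual = sum(w for q, w in enumerate(weights) if not busy[q])
    credits[busy.index(True)] += residual
    # Step 3: grant the busy queue with the highest credit.
    granted = max((q for q in range(len(weights)) if busy[q]),
                  key=lambda q: credits[q])
    # Step 4: the granted queue pays for one full slot.
    credits[granted] -= 100
    return granted, credits

# State left by the Case 1 cycle, with Q1 now empty.
granted, credits = bgpq_cycle([-50, 20, 30], [50, 20, 30], [True, False, True])
print(granted, credits, sum(credits))
```

Under this reading Q2 is granted (credit 60 after Step 1), Q0 collects Q1's unused 20, and the cycle ends with credits 20, 20 and -40, which agrees with the 0.2, 0.2 and -0.4 on the slide.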
23
BGPQ Advantages

BGPQ = WFQ + PQ
Bandwidth guarantee
Prioritized access to residual bandwidth
Low implementation cost
3 adders for credit calculation
1 comparator tree to find the highest dynamic credit

24
BGPQ Disadvantage

Class with low latency and low bandwidth requirements
No minimum latency guarantee
Minimum latency: no need to wait for any request that has lower priority

25
Latency Problem of BGPQ

Example
Optimal Scheduling

26
Credit Borrow and Repay Mechanism

Borrow: allow the low-latency requirement class to borrow the scheduling opportunity from other classes
Repay: return the credit later when convenient

27
CBR Mechanism

Case 3: Credit Borrow and Repay
Maintain a debt queue (DebtQ) for Q0: a FIFO of borrowed IDs

CBR Scheduler
Step 1: calculate dynamic credit, and allocate the residual bandwidth.
Priority: Q0 > Q1 > Q2
[Diagram: CBR scheduler with a DebtQ; queues Q0 (10), Q1 (20) and Q2 (70) feed a multiplexer onto the shared resource. Credits shown: 0.7, 0.0, 0.3]
28
CBR Mechanism

Case 3: Credit Borrow and Repay
Maintain a debt queue for Q0

CBR Scheduler
Step 2: re-assign the scheduling opportunity to Q0, and record the borrowed ID.
Priority: Q0 > Q1 > Q2
[Diagram: CBR scheduler with a DebtQ; queues Q0 (10), Q1 (20) and Q2 (70) feed a multiplexer onto the shared resource. Credits shown: 0.7, 0.0, 0.3]
29
CBR Mechanism

Case 3: Credit Borrow and Repay
Maintain a debt queue for Q0

CBR Scheduler
Step 3: transfer data.
Priority: Q0 > Q1 > Q2
[Diagram: CBR scheduler with a DebtQ; queues Q0 (10), Q1 (20) and Q2 (70) feed a multiplexer onto the shared resource. Credits shown: 0.7, 0.0, 0.3]
30
CBR Mechanism

Case 3: Credit Borrow
Maintain a debt queue for Q0

CBR Scheduler
Step 4: subtract 1 from original scheduled queue.
Priority: Q0 > Q1 > Q2
One scheduling cycle is done! Sum of credits = 0!
[Diagram: CBR scheduler with a DebtQ; queues Q0 (10), Q1 (20) and Q2 (70) feed a multiplexer onto the shared resource. Credits shown: 0.0, 0.3, -0.3]
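The borrow step can be sketched as a small extension of the BGPQ cycle. Everything here is an assumed reading of the slides: the cycle starts from zero credits with Q1 empty (which reproduces the 0.7, 0.0, 0.3 shown in Step 1), credits are integer percentages, Q0 is the only class allowed to borrow, and the names `borrow_cycle` and `depth` are hypothetical.

```python
from collections import deque

def borrow_cycle(credits, weights, busy, debt_q, depth=16):
    """One CBR cycle with borrowing only; needs at least one busy queue."""
    credits = list(credits)
    # Step 1: dynamic credits plus residual bandwidth, as in BGPQ.
    for q, w in enumerate(weights):
        if busy[q]:
            credits[q] += w
    residual = sum(w for q, w in enumerate(weights) if not busy[q])
    credits[busy.index(True)] += residual
    winner = max((q for q in range(len(weights)) if busy[q]),
                 key=lambda q: credits[q])
    # Step 2: if another queue won while Q0 has a request waiting, Q0 takes
    # the slot and the winner's ID is recorded in the DebtQ.
    served = winner
    if winner != 0 and busy[0] and len(debt_q) < depth:
        debt_q.append(winner)
        served = 0
    # Step 3 (the data transfer from `served`) is not modelled here.
    # Step 4: the originally scheduled queue still pays for the slot.
    credits[winner] -= 100
    return served, credits

debt_q = deque()
served, credits = borrow_cycle([0, 0, 0], [10, 20, 70], [True, False, True], debt_q)
print(served, credits, list(debt_q))
```

In this run Q2 wins the slot with a credit of 70, Q0 is served instead, and Q2 is charged, leaving credits of 30, 0 and -30 with Q2's ID in the DebtQ. Q0 keeps its credit, which is what later lets it win a slot to give back.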
31
CBR Mechanism

Case 4: Credit Repay
It is time to repay the credit

CBR Scheduler
Initial state: Q0 is empty but has debt. It will appear to be non-empty.
Priority: Q0 > Q1 > Q2
[Diagram: CBR scheduler with a DebtQ; queues Q0 (10), Q1 (20) and Q2 (70) feed a multiplexer onto the shared resource. Credits shown: 0.0, 0.3, -0.3]
32
CBR Mechanism

Case 4: Credit Repay
It is time to repay the credit

CBR Scheduler
Step 1: calculate dynamic credits and allocate the residual bandwidth.
Priority: Q0 > Q1 > Q2
[Diagram: CBR scheduler with a DebtQ; queues Q0 (10), Q1 (20) and Q2 (70) feed a multiplexer onto the shared resource. Credits shown: 0.6, 0.0, 0.4]
33
CBR Mechanism

Case 4: Credit Repay
It is time to repay the credit

CBR Scheduler
Step 2: return the scheduling opportunity and clear the DebtQ.
Priority: Q0 > Q1 > Q2
[Diagram: CBR scheduler with a DebtQ; queues Q0 (10), Q1 (20) and Q2 (70) feed a multiplexer onto the shared resource. Credits shown: 0.6, 0.0, 0.4]
34
CBR Mechanism

Case 4: Credit Repay
It is time to repay the credit

CBR Scheduler
Step 3: transfer data.
Priority: Q0 > Q1 > Q2
[Diagram: CBR scheduler with a DebtQ; queues Q0 (10), Q1 (20) and Q2 (70) feed a multiplexer onto the shared resource. Credits shown: 0.6, 0.0, 0.4]
35
CBR Mechanism

Case 4: Credit Repay
It is time to repay the credit

CBR Scheduler
Step 4: subtract 1 from scheduled queue.
Priority: Q0 > Q1 > Q2
One scheduling cycle is done! Sum of credits = 0!
[Diagram: CBR scheduler with a DebtQ; queues Q0 (10), Q1 (20) and Q2 (70) feed a multiplexer onto the shared resource. Credits shown: 0.0, 0.4, -0.4]
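The repay cycle can be sketched in the same style. This is an assumed reading, not the presented circuit: Q0 counts as non-empty while its DebtQ holds an entry, a slot that Q0 wins while it has no request of its own is handed to the oldest lender, and Q0 is the queue that pays. Credits are integer percentages, `repay_cycle` is a hypothetical name, and the case where the lender itself has nothing to send is not modelled.

```python
from collections import deque

def repay_cycle(credits, weights, busy, debt_q):
    """One CBR cycle in which Q0 may hand a won slot back to a lender."""
    credits = list(credits)
    # Q0 appears non-empty for as long as it still owes slots.
    active = list(busy)
    active[0] = busy[0] or len(debt_q) > 0
    # Step 1: active queues earn their weight; the idle share goes to the
    # highest-priority active queue. Needs at least one active queue.
    for q, w in enumerate(weights):
        if active[q]:
            credits[q] += w
    residual = sum(w for q, w in enumerate(weights) if not active[q])
    credits[active.index(True)] += residual
    winner = max((q for q in range(len(weights)) if active[q]),
                 key=lambda q: credits[q])
    # Step 2: Q0 won but has nothing to send, so the oldest lender is served.
    served = winner
    if winner == 0 and not busy[0] and debt_q:
        served = debt_q.popleft()
    # Step 4: the scheduled queue (the winner) pays for the slot.
    credits[winner] -= 100
    return served, credits

debt_q = deque([2])  # Q0 owes one slot to Q2, as after the borrow cycle
served, credits = repay_cycle([30, 0, -30], [10, 20, 70],
                              [False, False, True], debt_q)
print(served, credits, list(debt_q))
```

Starting from the state after the borrow cycle, Q0 wins with a credit of 60, Q2 is served, the DebtQ is emptied, and the credits end at -40, 0 and 40, in line with the -0.4, 0.0 and 0.4 on the slide.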
36
CBR Mechanism

Minimum Latency Guarantee using CBR
No need to wait for requests in other queues
Worst case: Q0 is not empty while DebtQ is full
No minimum latency guarantee in that case

37
Implementation in FPGA

CBR MPMC top level diagram
Instantiation-time configurable port number
Run-time programmable priority and bandwidth

38
Implementation in FPGA
Credit calculation circuit
Sorting Network and CBR
39
Implementation Cost

8-port CBR-MPMC with a depth-16 DebtQ
Xilinx Virtex-5 XC5VLX50T
Speedy DDR backend memory controller

40
Evaluation

Simulation Framework
Cycle accurate C model of MPMC
Simple close-page DDR memory model
Trace capturing and converting method

41
Evaluation

CPU workload trace file (from B. Jacob)
Cache simulation on standard SPEC2000 integer benchmark

Irregular access pattern with a low bandwidth requirement: 0.4 memory transactions per 1k instructions.
42
Evaluation

Accelerator Workload
ALPBench suite of parallel multimedia applications

43
Evaluation

Accelerator Workload
ALPBench suite of parallel multimedia applications

Periodically repeated access pattern with a high bandwidth requirement: 18.3 memory transactions per 1k instructions.
44
Results

BGPQ Scheduler
Latency: number of clock cycles
Bandwidth: number of memory transactions per 1k clock cycles
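The two metrics defined above could be computed from a per-port trace roughly as follows. This is a sketch with made-up inputs: the `(issue_cycle, done_cycle)` trace format, the function name `port_metrics` and the sample numbers are all hypothetical and are not taken from the presentation's results.

```python
def port_metrics(transactions, total_cycles):
    """transactions: list of (issue_cycle, done_cycle) pairs for one port."""
    # Latency: clock cycles between issuing a request and its completion.
    latencies = [done - issue for issue, done in transactions]
    avg_latency = sum(latencies) / len(latencies)
    # Bandwidth: memory transactions per 1k clock cycles.
    per_1k_cycles = len(transactions) * 1000 / total_cycles
    return avg_latency, max(latencies), per_1k_cycles

# Illustrative trace only; these cycle numbers are invented for the example.
trace = [(0, 12), (40, 50), (90, 104), (200, 220)]
avg_latency, worst_latency, bandwidth = port_metrics(trace, total_cycles=2000)
print(avg_latency, worst_latency, bandwidth)
```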

45
Results

CBR Scheduler with a 16-depth DebtQ

46
Impact of DebtQ Size

Repay conditions
DebtQ is full
Q0 is empty

CBR Scheduler
When DebtQ is full, remaining requests in Q0 will not be served with minimum latency guarantee!
Priority: Q0 > Q1 > Q2
[Diagram: CBR scheduler with a DebtQ; queues Q0 (10), Q1 (20) and Q2 (70) feed a multiplexer onto the shared resource. Credits shown: 0.6, 0.0, 0.4]
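The role of the DebtQ depth can be made concrete with two small predicates. They are one possible reading of the repay conditions listed on this slide, not the authors' logic: the function names are hypothetical, and the slide does not spell out exactly what happens to a slot that Q0 wins while the DebtQ is full.

```python
def can_borrow(debt_len, depth):
    """Q0 may take over another queue's slot only while the DebtQ has room."""
    return debt_len < depth

def must_repay(q0_empty, debt_len, depth):
    """Repay when there is debt and either Q0 is empty or the DebtQ is full."""
    return debt_len > 0 and (q0_empty or debt_len >= depth)

depth = 16  # DebtQ depth quoted for the evaluated configuration
print(can_borrow(15, depth), can_borrow(16, depth))
print(must_repay(False, 3, depth), must_repay(False, 16, depth))
```

A deeper DebtQ lets a longer burst from Q0 keep borrowing before `can_borrow` turns false, which is the point at which the remaining Q0 requests lose the minimum latency guarantee.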
47
Impact of DebtQ Size