Title: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees
- Zefu Dai, Mark Jarvin and Jianwen Zhu
University of Toronto
2Background
- Consumer Electronics is part of everyday life!
[Figure: an SoC connected to DRAM through a memory controller (Mem Contr.)]
3Background
- A portable media player SoC example
[Figure: portable media player SoC, annotated with bandwidth figures of 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7 and 94 MB/s]
[Same figure, with an added "1000x" annotation]
9Background
- A portable media player SoC example
- Speech bubbles on the figure:
  - "Give me 10 KB in 1 us, please."
  - "I want the data NOW!!!"
  - "I can only supply a maximum of 6.4 GB every second."
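The tension between the first and last speech bubbles can be checked with a little arithmetic. The sketch below is only an illustration and assumes decimal units (1 KB = 1000 bytes, 1 GB = 10**9 bytes); the constant names are made up for this example.

```python
# "Give me 10 KB in 1 us": assumed to mean 10 * 1000 bytes within 1000 ns.
BURST_BYTES = 10 * 1000
BURST_WINDOW_NS = 1000

# "6.4 GB every second": assumed to mean 6.4 * 10**9 bytes per second.
SUPPLY_BYTES_S = 6_400_000_000

# Rate implied by the request, in bytes per second (integer arithmetic).
burst_rate_bytes_s = BURST_BYTES * 1_000_000_000 // BURST_WINDOW_NS

# Shortest time in which the memory could deliver the burst, in ns.
min_transfer_ns = BURST_BYTES * 1_000_000_000 / SUPPLY_BYTES_S

print(f"requested rate: {burst_rate_bytes_s / 1e9:.1f} GB/s")
print(f"available rate: {SUPPLY_BYTES_S / 1e9:.1f} GB/s")
print(f"best-case transfer time: {min_transfer_ns:.1f} ns")
```

Under these assumptions the request asks for more than the memory can deliver even when it has the DRAM to itself, so the scheduler has to trade such requests off against each other.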
10Challenges
- Simultaneously satisfy
  - Bandwidth requirements
  - Latency requirements
11Previous Work
- QoS aware
  - Bandwidth or latency is heuristically improved
- QoS guaranteed
  - Guaranteed minimum bandwidth and/or latency
12Main Ideas
- Start with the Bandwidth Guaranteed Prioritized Queuing (BGPQ) algorithm
  - Bandwidth guarantee
- Improve it using the Credit Borrow and Repay (CBR) mechanism
  - Minimum latency guarantee
13Bandwidth Guaranteed Prioritized Queuing
- Combines the benefits of Priority Queuing (PQ) and Weighted Fair Queuing (WFQ)
  - Credit-based Weighted Fair Queuing
  - Prioritized service for residual bandwidth allocation
- Residual bandwidth
  - The bandwidth assigned to one user that is unused at a specific point in time
14BGPQ Algorithm
- Case 1 all queues are busy
- No residual bandwidth
- Act as WFQ
[Figure: BGPQ Scheduler. Queues Q0, Q1 and Q2, with bandwidth shares of 50, 20 and 30, feed a Shared Resource through a Multiplexer.]
- Initial state: everybody has a credit of zero. Credits: Q0 = 0.0, Q1 = 0.0, Q2 = 0.0.
- Step 1: calculate the dynamic credit for each queue. Credits: Q0 = 0.5, Q1 = 0.2, Q2 = 0.3.
- Step 2: turn on the switch box and transfer data from the granted queue.
- Step 3: subtract 1 from the credit of the granted queue. Credits: Q0 = -0.5, Q1 = 0.2, Q2 = 0.3.
One scheduling cycle is done! Sum of credits: 0.
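The Case 1 walkthrough can be replayed in a few lines. This is a minimal sketch, not the authors' implementation: it assumes the shares 50, 20 and 30 are percentages of one scheduling slot, uses exact fractions to avoid rounding noise, and breaks ties in favor of the lower queue index (the slides do not say how ties are resolved in this case). The name `wfq_cycle` is made up for the example.

```python
from fractions import Fraction

# Bandwidth shares for Q0, Q1, Q2, read as fractions of one slot (assumption).
WEIGHTS = [Fraction("0.5"), Fraction("0.2"), Fraction("0.3")]


def wfq_cycle(credits):
    """Run one scheduling cycle with every queue busy; return the granted queue."""
    # Step 1: each queue earns its share as dynamic credit.
    for q, weight in enumerate(WEIGHTS):
        credits[q] += weight
    # Step 2: grant the queue holding the highest credit
    # (ties go to the lowest index, an assumption of this sketch).
    granted = max(range(len(credits)), key=lambda q: (credits[q], -q))
    # Step 3: the granted queue pays one full credit for the slot.
    credits[granted] -= 1
    return granted


credits = [Fraction(0)] * 3
first = wfq_cycle(credits)
first_credits = list(credits)

# Nine more cycles: ten slots in total.
grants = [first] + [wfq_cycle(credits) for _ in range(9)]
share = [grants.count(q) for q in range(3)]
print("first grant:", first, [float(c) for c in first_credits])
print("grants per queue over 10 cycles:", share)
```

Because the shares sum to 1 and exactly one credit is subtracted per cycle, the sum of credits returns to zero after every cycle, and over ten cycles the grants split in the same 5 : 2 : 3 proportion as the shares.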
18BGPQ Algorithm
- Case 2 some queues are empty
- Has residual bandwidth
- Prioritized service on residual bandwidth
[Figure: BGPQ Scheduler. Queues Q0, Q1 and Q2, with bandwidth shares of 50, 20 and 30, feed a Shared Resource through a Multiplexer. Priority: Q0 > Q1 > Q2.]
- Before the new scheduling cycle: Q1 is empty. Credits: Q0 = -0.5, Q1 = 0.2, Q2 = 0.3.
- Step 1: calculate a dynamic credit for each queue. The credit of an empty queue remains unchanged. Credits: Q0 = 0.0, Q1 = 0.2, Q2 = 0.6.
- Step 2: allocate the residual bandwidth to the non-empty queue with the highest priority. Credits: Q0 = 0.2, Q1 = 0.2, Q2 = 0.6.
- Step 3: transfer data from the granted queue.
- Step 4: subtract 1 from the credit of the granted queue. Credits: Q0 = 0.2, Q1 = 0.2, Q2 = -0.4.
One scheduling cycle is done! Sum of credits: 0.
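The four steps of Case 2 can be sketched as a single function. This is an illustrative reading of the slides rather than the reference design: the shares are assumed to be percentages of one slot, ties are broken by priority, and `bgpq_cycle` and its parameters are names invented here.

```python
from fractions import Fraction

WEIGHTS = [Fraction("0.5"), Fraction("0.2"), Fraction("0.3")]  # Q0, Q1, Q2
PRIORITY = [0, 1, 2]  # highest priority first: Q0 > Q1 > Q2


def bgpq_cycle(credits, nonempty):
    """One BGPQ cycle; `nonempty[q]` says whether queue q has a request.

    Assumes at least one queue is non-empty. Returns the granted queue.
    """
    residual = Fraction(0)
    # Step 1: busy queues earn their share; an empty queue's credit is
    # left unchanged and its share becomes residual bandwidth.
    for q, weight in enumerate(WEIGHTS):
        if nonempty[q]:
            credits[q] += weight
        else:
            residual += weight
    # Step 2: the residual goes to the highest-priority non-empty queue.
    receiver = next(q for q in PRIORITY if nonempty[q])
    credits[receiver] += residual
    # Step 3: grant the non-empty queue with the highest credit,
    # breaking ties by priority (a choice made for this sketch).
    busy = [q for q in PRIORITY if nonempty[q]]
    granted = max(busy, key=lambda q: credits[q])
    # Step 4: the granted queue pays one credit.
    credits[granted] -= 1
    return granted


# State before the cycle on the slide: Q1 is empty.
credits = [Fraction("-0.5"), Fraction("0.2"), Fraction("0.3")]
granted = bgpq_cycle(credits, nonempty=[True, False, True])
print(granted, [float(c) for c in credits])  # prints: 2 [0.2, 0.2, -0.4]
```

Q2 still wins the slot because its own credit is highest; the residual only raises Q0's standing for later cycles.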
23BGPQ Advantages
- BGPQ = WFQ + PQ
  - Bandwidth guarantee
  - Prioritized access to residual bandwidth
- Low implementation cost
  - 3 adders for credit calculation
  - 1 comparator tree to find the highest dynamic credit
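As a rough software analogue of the comparator tree, the highest credit can be found by comparing candidates pairwise, level by level, so that N queues need about log2(N) levels. The function below is a hypothetical sketch; its name and tie-breaking rule (the lower index wins) are assumptions, not details from the slides.

```python
def comparator_tree_argmax(credits):
    """Return the index of the highest credit via pairwise comparison levels.

    Assumes `credits` is non-empty.
    """
    candidates = list(range(len(credits)))
    while len(candidates) > 1:
        next_level = []
        for i in range(0, len(candidates) - 1, 2):
            a, b = candidates[i], candidates[i + 1]
            # One two-input comparator; ties keep the lower index.
            next_level.append(a if credits[a] >= credits[b] else b)
        if len(candidates) % 2:
            next_level.append(candidates[-1])  # odd one out advances unchanged
        candidates = next_level
    return candidates[0]


print(comparator_tree_argmax([0.2, 0.2, 0.6]))
```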
24BGPQ Disadvantage
- Low-latency, low-bandwidth requirement class
  - No minimum latency guarantee
- Minimum latency
  - No need to wait for any request that has lower priority
25Latency Problem of BGPQ
- Example
- Optimal Scheduling
26Credit Borrow and Repay Mechanism
- Borrow
  - Allow the low-latency requirement class to borrow the scheduling opportunity from other classes
- Repay
  - Return the credit later when convenient
27CBR Mechanism
- Case 3 Credit Borrow and Repay
- Maintain a debt queue (DebtQ) for Q0: a FIFO of borrowed IDs
[Figure: CBR Scheduler. Queues Q0, Q1 and Q2, with bandwidth shares of 10, 20 and 70, feed a Shared Resource through a Multiplexer; Q0 has a DebtQ. Priority: Q0 > Q1 > Q2.]
- Step 1: calculate the dynamic credit and allocate the residual bandwidth. Credits: Q0 = 0.3, Q1 = 0.0, Q2 = 0.7.
- Step 2: re-assign the scheduling opportunity to Q0 and record the borrowed ID.
- Step 3: transfer data.
- Step 4: subtract 1 from the originally scheduled queue. Credits: Q0 = 0.3, Q1 = 0.0, Q2 = -0.3.
One scheduling cycle is done! Sum of credits: 0.
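The borrow rule on its own is small. The sketch below starts from the credits after Step 1 and applies Steps 2 to 4; `borrow_step` and its parameters are hypothetical names, and the default depth of 16 is taken from the DebtQ depth quoted later in the talk.

```python
from collections import deque
from fractions import Fraction


def borrow_step(credits, winner, q0_has_request, debt_q, depth=16):
    """Apply the borrow rule once credits are updated and `winner` is chosen.

    Returns the queue that actually transfers data this cycle.
    """
    served = winner
    if q0_has_request and winner != 0 and len(debt_q) < depth:
        debt_q.append(winner)  # Step 2: remember whose slot Q0 borrowed
        served = 0             # Q0 uses the slot instead of the winner
    # Step 4: the originally scheduled queue is the one that is charged.
    credits[winner] -= 1
    return served


credits = [Fraction("0.3"), Fraction("0.0"), Fraction("0.7")]  # after Step 1
debt_q = deque()
served = borrow_step(credits, winner=2, q0_has_request=True, debt_q=debt_q)
print(served, list(debt_q), [float(c) for c in credits])
```

Charging the lender rather than Q0 is what keeps the bandwidth accounting intact: Q2 has paid for a slot it has not used yet, and the DebtQ entry records that it is owed one.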
31CBR Mechanism
- Case 4 Credit Repay
- It is time to repay the credit
[Figure: CBR Scheduler. Queues Q0, Q1 and Q2, with bandwidth shares of 10, 20 and 70, feed a Shared Resource through a Multiplexer; Q0 has a DebtQ. Priority: Q0 > Q1 > Q2.]
- Initial state: Q0 is empty but has debt, so it appears to be non-empty. Credits: Q0 = 0.3, Q1 = 0.0, Q2 = -0.3.
- Step 1: calculate the dynamic credits and allocate the residual bandwidth. Credits: Q0 = 0.6, Q1 = 0.0, Q2 = 0.4.
- Step 2: return the scheduling opportunity and clear the DebtQ.
- Step 3: transfer data.
- Step 4: subtract 1 from the scheduled queue. Credits: Q0 = -0.4, Q1 = 0.0, Q2 = 0.4.
One scheduling cycle is done! Sum of credits: 0.
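Borrow and repay can be combined into one cycle function that replays Case 3 followed by Case 4. This is a sketch under several assumptions, not the authors' RTL: shares are percentages of one slot, ties are broken by priority, repay happens when Q0 wins a slot while it is empty or its DebtQ is full, and a repaid slot is simply wasted if the lender has nothing to send. All names are invented for the example.

```python
from collections import deque
from fractions import Fraction

WEIGHTS = [Fraction("0.1"), Fraction("0.2"), Fraction("0.7")]  # Q0, Q1, Q2
PRIORITY = [0, 1, 2]  # Q0 > Q1 > Q2
DEBT_DEPTH = 16       # DebtQ depth, as in the evaluated design


def cbr_cycle(credits, pending, debt_q):
    """Run one CBR cycle; return the queue that gets the slot.

    `pending[q]` is the number of waiting requests in queue q.
    Assumes some queue has a request or Q0 holds debt.
    """
    # A queue takes part if it has requests; Q0 also takes part while in debt.
    active = [pending[q] > 0 or (q == 0 and len(debt_q) > 0) for q in range(3)]
    residual = Fraction(0)
    for q, weight in enumerate(WEIGHTS):
        if active[q]:
            credits[q] += weight
        else:
            residual += weight  # unused share of an inactive queue
    credits[next(q for q in PRIORITY if active[q])] += residual
    winner = max((q for q in PRIORITY if active[q]), key=lambda q: credits[q])
    credits[winner] -= 1  # the scheduled queue pays, whoever is served

    served = winner
    if winner != 0 and pending[0] > 0 and len(debt_q) < DEBT_DEPTH:
        debt_q.append(winner)      # borrow: Q0 takes the winner's slot
        served = 0
    elif winner == 0 and debt_q and (pending[0] == 0 or len(debt_q) == DEBT_DEPTH):
        served = debt_q.popleft()  # repay: hand Q0's slot to the oldest lender
    if pending[served] > 0:
        pending[served] -= 1       # the slot is wasted if the lender is idle
    return served


credits = [Fraction(0)] * 3
pending = [1, 0, 5]  # one Q0 request, Q1 empty, Q2 busy
debt_q = deque()
borrow_served = cbr_cycle(credits, pending, debt_q)  # Case 3
after_borrow = list(credits)
repay_served = cbr_cycle(credits, pending, debt_q)   # Case 4
print(borrow_served, [float(c) for c in after_borrow])
print(repay_served, [float(c) for c in credits])
```

In the first cycle Q2 is scheduled but Q0 is served; in the second Q0 is scheduled but Q2 is served, so over the two cycles each queue is charged for exactly the slots its share entitles it to.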
36CBR Mechanism
- Minimum latency guarantee using CBR
  - No need to wait for requests in other queues
- Worst case: Q0 is not empty while DebtQ is full
  - No minimum latency guarantee in that case
37Implementation in FPGA
- CBR MPMC top level diagram
- Instantiation-time configurable port number
- Run-time programmable priority and bandwidth
38Implementation in FPGA
Credit calculation circuit
Sorting Network and CBR
39Implementation Cost
- 8 port CBR-MPMC with 16-depth DebtQ
- Xilinx Virtex-5 XC5VLX50T
- Speedy DDR backend memory controller
40Evaluation
- Simulation Framework
- Cycle accurate C model of MPMC
- Simple close-page DDR memory model
- Trace capturing and converting method
41Evaluation
- CPU workload trace file (from B. Jacob)
- Cache simulation on standard SPEC2000 integer
benchmark
Irregular access pattern and low bandwidth requirement: 0.4 memory transactions per 1k instructions.
43Evaluation
- Accelerator Workload
- ALPBench suite of parallel multimedia applications
Periodically repeated access pattern and high bandwidth requirement: 18.3 memory transactions per 1k instructions.
44Results
- BGPQ Scheduler
- Latency: number of clock cycles
- Bandwidth: number of memory transactions per 1k clock cycles
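Both metrics can be derived from a per-port list of issue and completion cycles. The sketch below assumes such a trace format, which is hypothetical, and reports both average and worst-case latency because the slide does not say which one is plotted.

```python
def summarize(trace, total_cycles):
    """Summarize one port.

    `trace` is a list of (issue_cycle, done_cycle) pairs (assumed format);
    `total_cycles` is the length of the simulated interval.
    """
    latencies = [done - issue for issue, done in trace]
    avg_latency = sum(latencies) / len(latencies)
    # Completed memory transactions per 1k clock cycles.
    bandwidth = len(trace) * 1000 / total_cycles
    return avg_latency, max(latencies), bandwidth


# Made-up trace, for illustration only.
avg, worst, bw = summarize([(0, 10), (5, 25), (40, 52)], total_cycles=2000)
print(avg, worst, bw)
```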
45Results
- CBR Scheduler with a 16-depth debtQ
46Impact of DebtQ Size
- Repay conditions
  - DebtQ is full
  - Q0 is empty
- When DebtQ is full, the remaining requests in Q0 will not be served with a minimum latency guarantee!
[Figure: CBR Scheduler. Queues Q0, Q1 and Q2, with bandwidth shares of 10, 20 and 70, feed a Shared Resource through a Multiplexer; Q0 has a DebtQ. Priority: Q0 > Q1 > Q2. Credits shown: Q0 = 0.6, Q1 = 0.0, Q2 = 0.4.]
47Impact of DebtQ Size
- How big is enough for DebtQ?
  - Determined by the instantaneous bandwidth requirement
- Irregular access pattern means
  - A large range of DebtQ size requirements
- Tradeoff
  - Resource efficiency vs. performance
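One way to get a feel for the tradeoff is to measure how long a burst of Q0 requests takes to drain for different DebtQ depths. The toy model below is an assumption-heavy sketch, not the evaluation setup from the talk: Q1 is always empty, Q2 always has a request, shares are 10, 20 and 70 percent, and a full DebtQ forces Q0 to repay with its own slot. The name `cycles_to_drain` is made up.

```python
from collections import deque
from fractions import Fraction

WEIGHTS = [Fraction("0.1"), Fraction("0.2"), Fraction("0.7")]  # Q0, Q1, Q2


def cycles_to_drain(burst, depth):
    """Cycles needed to serve `burst` back-to-back Q0 requests.

    Hypothetical workload: Q1 stays empty and Q2 always has a request.
    """
    credits = [Fraction(0)] * 3
    debt_q = deque()
    left, cycles = burst, 0
    while left > 0:
        cycles += 1
        # Q0 and Q2 are active; Q1's unused share goes to Q0 (highest priority).
        credits[0] += WEIGHTS[0] + WEIGHTS[1]
        credits[2] += WEIGHTS[2]
        winner = 0 if credits[0] >= credits[2] else 2
        credits[winner] -= 1
        if winner == 2 and len(debt_q) < depth:
            debt_q.append(2)   # borrow Q2's slot for Q0
            left -= 1
        elif winner == 0 and debt_q and len(debt_q) == depth:
            debt_q.popleft()   # DebtQ full: Q0's own slot repays Q2
        elif winner == 0:
            left -= 1          # Q0 uses its own slot
        # otherwise Q2 keeps its slot and Q0 waits
    return cycles


for depth in (0, 2, 4, 16):
    print(depth, cycles_to_drain(burst=8, depth=depth))
```

With a DebtQ at least as deep as the burst, every Q0 request is served in consecutive cycles; with no DebtQ the model falls back to plain BGPQ and Q0 has to wait for its own share.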
48Results
- Impact of debt queue size
49Conclusions
- The CBR scheduler can provide minimum bandwidth and latency guarantees
- Low implementation cost and power consumption
- We expect its successful use in a wide range of multimedia applications
50Questions?