CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees - PowerPoint PPT Presentation

About This Presentation
Title:

CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Description:

CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees Zefu Dai, Mark Jarvin and Jianwen Zhu University of Toronto – PowerPoint PPT presentation

Slides: 51
Provided by: zdai

Transcript and Presenter's Notes

Title: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees


1
CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees
  • Zefu Dai, Mark Jarvin and Jianwen Zhu

University of Toronto
2
Background
  • Consumer Electronics is part of everyday life!

[Diagram labels: SoC, Mem Contr. (memory controller), DRAM]
3
Background
  • A portable media player SoC example

4
Background
  • A portable media player SoC example

5
Background
  • A portable media player SoC example

[Figure values, in MB/s: 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94]
6
Background
  • A portable media player SoC example

[Figure values, in MB/s: 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94; annotation: 1000x]
7
Background
Give me 10 KB in 1 us, please.
  • A portable media player SoC example

[Figure values, in MB/s: 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94]
8
Background
Give me 10 KB in 1 us, please.
  • A portable media player SoC example

I want the data NOW!!!
[Figure values, in MB/s: 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94]
9
Background
Give me 10 KB in 1 us, please.
  • A portable media player SoC example

I want the data NOW!!!
[Figure values, in MB/s: 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94]
I can only supply a maximum of 6.4 GB every second.
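
The two speech bubbles above can be compared with a little arithmetic. The sketch below is one reading of the slide, assuming decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes); the function name is made up for illustration.

```python
# Assumption: decimal units (1 KB = 1,000 bytes, 1 GB = 1e9 bytes).
PEAK_SUPPLY_GB_S = 6.4  # "I can only supply a maximum of 6.4 GB every second."

def required_rate_gb_s(num_bytes: float, seconds: float) -> float:
    """Average rate needed to move num_bytes within the given time window."""
    return num_bytes / seconds / 1e9

# "Give me 10 KB in 1 us, please."
burst_rate = required_rate_gb_s(10_000, 1e-6)
exceeds_peak = burst_rate > PEAK_SUPPLY_GB_S
print(f"{burst_rate:.1f} GB/s requested, exceeds peak: {exceeds_peak}")
```

Under these assumptions a short, latency-critical burst asks for a higher instantaneous rate than the memory can sustain, even though its average bandwidth need is small.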
10
Challenges
  • Simultaneously satisfy
  • Bandwidth requirements
  • Latency requirements

11
Previous Work
  • QoS aware
  • Bandwidth or latency is heuristically improved
  • QoS guaranteed
  • Guaranteed minimum bandwidth and / or latency

12
Main Ideas
  • Start with Bandwidth Guaranteed Prioritized
    Queuing (BGPQ) algorithm
  • Bandwidth guarantee
  • Improve it using Credit Borrow and Repay (CBR)
    mechanism
  • Minimum latency guarantee

13
Bandwidth Guaranteed Prioritized Queuing
  • Combine the benefits of Priority Queuing (PQ) and
    Weighted Fair Queuing (WFQ)
  • Credit-based Weighted Fair Queuing
  • Prioritized service for residual bandwidth
    allocation
  • Residual bandwidth
  • Bandwidth assigned to a user that the user leaves
    unused at a given point in time

14
BGPQ Algorithm
  • Case 1: all queues are busy
  • No residual bandwidth
  • Acts as WFQ

[Diagram: BGPQ Scheduler with queues Q0 (50), Q1 (20), Q2 (30), a multiplexer, and the shared resource]
Initial state: every queue has a credit of zero.
Credits shown: 0.0, 0.0, 0.0
15
BGPQ Algorithm
  • Case 1: all queues are busy
  • No residual bandwidth
  • Acts as WFQ

[Diagram: BGPQ Scheduler with queues Q0 (50), Q1 (20), Q2 (30), a multiplexer, and the shared resource]
Step 1: calculate the dynamic credit for each queue.
Credits shown: 0.5, 0.3, 0.2
16
BGPQ Algorithm
  • Case 1: all queues are busy
  • No residual bandwidth
  • Acts as WFQ

[Diagram: BGPQ Scheduler with queues Q0 (50), Q1 (20), Q2 (30), a multiplexer, and the shared resource]
Step 2: turn on the switch box and transfer data from the granted queue.
Credits shown: 0.5, 0.3, 0.2
17
BGPQ Algorithm
  • Case 1: all queues are busy
  • No residual bandwidth
  • Acts as WFQ

[Diagram: BGPQ Scheduler with queues Q0 (50), Q1 (20), Q2 (30), a multiplexer, and the shared resource]
Step 3: subtract 1 from the credit of the granted queue.
Credits shown: 0.3, 0.2, -0.5
One scheduling cycle is done! Sum of credits = 0.
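
The three steps of Case 1 can be traced in a few lines. This is a minimal sketch, not the authors' implementation: it assumes the numbers next to Q0, Q1 and Q2 are bandwidth shares in percent, stores credits as integer percent (so 0.5 on the slide is 50 here), and breaks ties in favour of the earlier queue. The function name `wfq_cycle` is hypothetical.

```python
# Assumption: 50 / 20 / 30 are bandwidth shares in percent.
WEIGHTS = {"Q0": 50, "Q1": 20, "Q2": 30}
GRANT_COST = 100  # one grant costs one full credit (1.0 on the slides)

def wfq_cycle(credits, weights=WEIGHTS):
    """Run one scheduling cycle with every queue busy."""
    # Step 1: every queue earns its share as dynamic credit.
    dynamic = {q: credits[q] + weights[q] for q in weights}
    # Step 2: grant the queue with the highest dynamic credit
    # (max() keeps the first queue on a tie; that tie rule is an assumption).
    granted = max(dynamic, key=dynamic.get)
    # Step 3: charge the granted queue one full credit.
    dynamic[granted] -= GRANT_COST
    return granted, dynamic

credits = {q: 0 for q in WEIGHTS}          # initial state: all credits zero
granted, credits = wfq_cycle(credits)
first_cycle = (granted, dict(credits))
print(granted, credits)  # Q0 {'Q0': -50, 'Q1': 20, 'Q2': 30}

# Nine more cycles: grants should follow the 50/20/30 split.
grants = {q: 0 for q in WEIGHTS}
grants[granted] += 1
for _ in range(9):
    granted, credits = wfq_cycle(credits)
    grants[granted] += 1
print(grants)  # {'Q0': 5, 'Q1': 2, 'Q2': 3}
```

The first cycle reproduces the credits on the slide (-0.5, 0.2, 0.3), and because the shares add up to one full credit, the sum of credits returns to zero after every cycle.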
18
BGPQ Algorithm
  • Case 2: some queues are empty
  • Has residual bandwidth
  • Prioritized service on residual bandwidth

[Diagram: BGPQ Scheduler with queues Q0 (50), Q1 (20), Q2 (30), a multiplexer, and the shared resource]
Before the new scheduling cycle: Q1 is empty.
Credits shown: 0.3, 0.2, -0.5
Priority: Q0 > Q1 > Q2
19
BGPQ Algorithm
  • Case 2: some queues are empty
  • Has residual bandwidth
  • Prioritized service on residual bandwidth

[Diagram: BGPQ Scheduler with queues Q0 (50), Q1 (20), Q2 (30), a multiplexer, and the shared resource]
Step 1: calculate a dynamic credit for each queue. The credit of an empty queue remains unchanged.
Credits shown: 0.6, 0.0, 0.2
Priority: Q0 > Q1 > Q2
20
BGPQ Algorithm
  • Case 2: some queues are empty
  • Has residual bandwidth
  • Prioritized service on residual bandwidth

[Diagram: BGPQ Scheduler with queues Q0 (50), Q1 (20), Q2 (30), a multiplexer, and the shared resource]
Step 2: allocate the residual bandwidth to the non-empty queue with the highest priority.
Credits shown: 0.6, 0.2, 0.2
Priority: Q0 > Q1 > Q2
21
BGPQ Algorithm
  • Case 2: some queues are empty
  • Has residual bandwidth
  • Prioritized service on residual bandwidth

[Diagram: BGPQ Scheduler with queues Q0 (50), Q1 (20), Q2 (30), a multiplexer, and the shared resource]
Step 3: transfer data from the granted queue.
Credits shown: 0.6, 0.2, 0.2
Priority: Q0 > Q1 > Q2
22
BGPQ Algorithm
  • Case 2: some queues are empty
  • Has residual bandwidth
  • Prioritized service on residual bandwidth

[Diagram: BGPQ Scheduler with queues Q0 (50), Q1 (20), Q2 (30), a multiplexer, and the shared resource]
Step 4: subtract 1 from the credit of the granted queue.
Credits shown: 0.2, 0.2, -0.4
Priority: Q0 > Q1 > Q2
One scheduling cycle is done! Sum of credits = 0.
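
Case 2 can be traced the same way. The sketch below is a reading of the four steps rather than the authors' code: it assumes the queue numbers are percent shares, stores credits as integer percent, and assumes that only non-empty queues can be granted. `bgpq_cycle` and `busy` are names invented for the example.

```python
# Assumptions: 50 / 20 / 30 are percent shares; credits are stored in percent.
WEIGHTS = {"Q0": 50, "Q1": 20, "Q2": 30}
PRIORITY = ["Q0", "Q1", "Q2"]  # Q0 > Q1 > Q2
GRANT_COST = 100               # 1.0 on the slides

def bgpq_cycle(credits, busy):
    """One BGPQ cycle; `busy` is the (non-empty) set of non-empty queues."""
    dynamic = dict(credits)
    residual = 0
    # Step 1: busy queues earn their share; an empty queue keeps its credit
    # and its share becomes residual bandwidth.
    for q in PRIORITY:
        if q in busy:
            dynamic[q] += WEIGHTS[q]
        else:
            residual += WEIGHTS[q]
    # Step 2: the residual goes to the highest-priority non-empty queue.
    contenders = [q for q in PRIORITY if q in busy]
    dynamic[contenders[0]] += residual
    # Step 3: grant the non-empty queue with the highest dynamic credit.
    granted = max(contenders, key=dynamic.get)
    # Step 4: charge the granted queue one full credit.
    dynamic[granted] -= GRANT_COST
    return granted, dynamic

# State left by Case 1, with Q1 now empty.
credits = {"Q0": -50, "Q1": 20, "Q2": 30}
granted, credits = bgpq_cycle(credits, busy={"Q0", "Q2"})
print(granted, credits)  # Q2 {'Q0': 20, 'Q1': 20, 'Q2': -40}
```

Under these assumptions Q0 receives Q1's unused share, Q2 still wins the grant with 0.6, and the final credits (0.2, 0.2, -0.4) match the slide.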
23
BGPQ Advantages
  • BGPQ = WFQ + PQ
  • bandwidth guarantee
  • prioritized access to residual bandwidth
  • Low implementation cost
  • 3 adders for credit calculation
  • 1 comparator tree to find the highest dynamic
    credit
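
A comparator tree finds the highest dynamic credit by comparing values pairwise, round by round. The software sketch below only illustrates that idea; it is not the hardware described in the talk, and the tie rule (lower index wins) is an assumption.

```python
def comparator_tree_max(credits):
    """Return (index, value) of the largest credit using pairwise rounds.

    Expects a non-empty list. Each pass of the while loop corresponds to
    one level of the tree.
    """
    candidates = list(enumerate(credits))
    while len(candidates) > 1:
        next_round = []
        # Compare neighbours two at a time.
        for i in range(0, len(candidates) - 1, 2):
            a, b = candidates[i], candidates[i + 1]
            next_round.append(a if a[1] >= b[1] else b)  # tie: lower index wins
        # An unpaired last candidate advances without a comparison.
        if len(candidates) % 2:
            next_round.append(candidates[-1])
        candidates = next_round
    return candidates[0]

print(comparator_tree_max([0.5, 0.2, 0.3]))  # (0, 0.5)
```

For N inputs the tree needs about log2(N) levels, which is why the selection logic stays small even for eight ports.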

24
BGPQ Disadvantage
  • Low-latency, low-bandwidth requirement class
  • No minimum latency guarantee
  • Minimum latency
  • A request never has to wait for a request of
    lower priority

25
Latency Problem of BGPQ
  • Example
  • Optimal Scheduling

26
Credit Borrow and Repay Mechanism
  • Borrow
  • Allow the low-latency requirement class to borrow a
    scheduling opportunity from other classes
  • Repay
  • Return the credit later when convenient

27
CBR Mechanism
  • Case 3: Credit Borrow and Repay
  • Maintain a debt queue (DebtQ) for Q0: a FIFO of borrowed IDs

[Diagram: CBR Scheduler with DebtQ, queues Q0 (10), Q1 (20), Q2 (70), a multiplexer, and the shared resource]
Step 1: calculate the dynamic credits and allocate the residual bandwidth.
Credits shown: 0.7, 0.0, 0.3
Priority: Q0 > Q1 > Q2
28
CBR Mechanism
  • Case 3: Credit Borrow and Repay
  • Maintain a debt queue for Q0

[Diagram: CBR Scheduler with DebtQ, queues Q0 (10), Q1 (20), Q2 (70), a multiplexer, and the shared resource]
Step 2: re-assign the scheduling opportunity to Q0 and record the borrowed ID.
Credits shown: 0.7, 0.0, 0.3
Priority: Q0 > Q1 > Q2
29
CBR Mechanism
  • Case 3: Credit Borrow and Repay
  • Maintain a debt queue for Q0

[Diagram: CBR Scheduler with DebtQ, queues Q0 (10), Q1 (20), Q2 (70), a multiplexer, and the shared resource]
Step 3: transfer data.
Credits shown: 0.7, 0.0, 0.3
Priority: Q0 > Q1 > Q2
30
CBR Mechanism
  • Case 3: Credit Borrow
  • Maintain a debt queue for Q0

[Diagram: CBR Scheduler with DebtQ, queues Q0 (10), Q1 (20), Q2 (70), a multiplexer, and the shared resource]
Step 4: subtract 1 from the originally scheduled queue.
Credits shown: 0.0, 0.3, -0.3
Priority: Q0 > Q1 > Q2
One scheduling cycle is done! Sum of credits = 0.
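
The borrow cycle of Case 3 can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the queue numbers are read as percent shares, credits are kept in integer percent, Q1 is taken to be empty (its credit stays at 0.0 on the slide), and the names `cbr_borrow_cycle`, `busy` and `debt_depth` are hypothetical.

```python
from collections import deque

# Assumptions: 10 / 20 / 70 are percent shares; Q0 is the low-latency queue.
WEIGHTS = {"Q0": 10, "Q1": 20, "Q2": 70}
PRIORITY = ["Q0", "Q1", "Q2"]  # Q0 > Q1 > Q2
GRANT_COST = 100               # 1.0 on the slides

def cbr_borrow_cycle(credits, busy, debt_q, debt_depth=16):
    """One cycle in which Q0 may borrow another queue's scheduling slot."""
    dynamic = dict(credits)
    residual = 0
    # Step 1: dynamic credits, then residual to the top-priority busy queue.
    for q in PRIORITY:
        if q in busy:
            dynamic[q] += WEIGHTS[q]
        else:
            residual += WEIGHTS[q]
    contenders = [q for q in PRIORITY if q in busy]
    dynamic[contenders[0]] += residual
    scheduled = max(contenders, key=dynamic.get)
    # Step 2: if another queue won, hand its slot to Q0 and record the lender.
    served = scheduled
    if scheduled != "Q0" and "Q0" in busy and len(debt_q) < debt_depth:
        debt_q.append(scheduled)
        served = "Q0"
    # Step 3 (data transfer from `served`) is omitted in this sketch.
    # Step 4: the originally scheduled queue still pays for the slot.
    dynamic[scheduled] -= GRANT_COST
    return scheduled, served, dynamic

debt_q = deque()
credits = {"Q0": 0, "Q1": 0, "Q2": 0}
scheduled, served, credits = cbr_borrow_cycle(credits, {"Q0", "Q2"}, debt_q)
print(scheduled, served, credits, list(debt_q))
# Q2 Q0 {'Q0': 30, 'Q1': 0, 'Q2': -30} ['Q2']
```

Q2 wins the cycle with 0.7 but Q0 is served, Q2 is charged, and Q0 keeps its 0.3 credit, which is what it later uses to repay the borrowed slot.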
31
CBR Mechanism
  • Case 4: Credit Repay
  • It is time to repay the credit

[Diagram: CBR Scheduler with DebtQ, queues Q0 (10), Q1 (20), Q2 (70), a multiplexer, and the shared resource]
Initial state: Q0 is empty but has debt, so it appears to be non-empty.
Credits shown: 0.0, 0.3, -0.3
Priority: Q0 > Q1 > Q2
32
CBR Mechanism
  • Case 4: Credit Repay
  • It is time to repay the credit

[Diagram: CBR Scheduler with DebtQ, queues Q0 (10), Q1 (20), Q2 (70), a multiplexer, and the shared resource]
Step 1: calculate the dynamic credits and allocate the residual bandwidth.
Credits shown: 0.6, 0.0, 0.4
Priority: Q0 > Q1 > Q2
33
CBR Mechanism
  • Case 4: Credit Repay
  • It is time to repay the credit

[Diagram: CBR Scheduler with DebtQ, queues Q0 (10), Q1 (20), Q2 (70), a multiplexer, and the shared resource]
Step 2: return the scheduling opportunity and clear the DebtQ.
Credits shown: 0.6, 0.0, 0.4
Priority: Q0 > Q1 > Q2
34
CBR Mechanism
  • Case 4: Credit Repay
  • It is time to repay the credit

[Diagram: CBR Scheduler with DebtQ, queues Q0 (10), Q1 (20), Q2 (70), a multiplexer, and the shared resource]
Step 3: transfer data.
Credits shown: 0.6, 0.0, 0.4
Priority: Q0 > Q1 > Q2
35
CBR Mechanism
  • Case 4: Credit Repay
  • It is time to repay the credit

[Diagram: CBR Scheduler with DebtQ, queues Q0 (10), Q1 (20), Q2 (70), a multiplexer, and the shared resource]
Step 4: subtract 1 from the scheduled queue.
Credits shown: 0.0, 0.4, -0.4
Priority: Q0 > Q1 > Q2
One scheduling cycle is done! Sum of credits = 0.
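
The repay cycle of Case 4 can be sketched in the same style. Again this is a reading of the slides under assumptions: percent shares, integer-percent credits, Q1 empty, and one DebtQ entry repaid per cycle (the slides only show a single entry being cleared). What happens if the lender has no pending request is not covered. `cbr_repay_cycle` and `has_requests` are hypothetical names.

```python
from collections import deque

WEIGHTS = {"Q0": 10, "Q1": 20, "Q2": 70}  # assumed percent shares
PRIORITY = ["Q0", "Q1", "Q2"]             # Q0 > Q1 > Q2
GRANT_COST = 100                          # 1.0 on the slides

def cbr_repay_cycle(credits, has_requests, debt_q):
    """One cycle in which an empty Q0 with debt returns a borrowed slot."""
    # Q0 with outstanding debt appears non-empty to the scheduler.
    busy = set(has_requests)
    if debt_q:
        busy.add("Q0")
    dynamic = dict(credits)
    residual = 0
    # Step 1: dynamic credits, then residual to the top-priority busy queue.
    for q in PRIORITY:
        if q in busy:
            dynamic[q] += WEIGHTS[q]
        else:
            residual += WEIGHTS[q]
    contenders = [q for q in PRIORITY if q in busy]
    dynamic[contenders[0]] += residual
    scheduled = max(contenders, key=dynamic.get)
    # Step 2: Q0 won but has nothing to send, so the slot goes back to the
    # oldest lender recorded in the DebtQ.
    served = scheduled
    if scheduled == "Q0" and "Q0" not in has_requests and debt_q:
        served = debt_q.popleft()
    # Step 3 (data transfer from `served`) is omitted in this sketch.
    # Step 4: the scheduled queue (Q0) pays for the slot.
    dynamic[scheduled] -= GRANT_COST
    return scheduled, served, dynamic

# State left by the borrow cycle: Q0 owes Q2 one slot.
debt_q = deque(["Q2"])
credits = {"Q0": 30, "Q1": 0, "Q2": -30}
scheduled, served, credits = cbr_repay_cycle(credits, {"Q2"}, debt_q)
print(scheduled, served, credits, list(debt_q))
# Q0 Q2 {'Q0': -40, 'Q1': 0, 'Q2': 40} []
```

The result matches the slide: Q0 ends at -0.4, Q2 at 0.4, the DebtQ is empty, and the credits still sum to zero.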
36
CBR Mechanism
  • Minimum Latency Guarantee using CBR
  • No need to wait for requests in other queues
  • Worst case: Q0 is not empty while DebtQ is full
  • No minimum latency guarantee in that case

37
Implementation in FPGA
  • CBR MPMC top level diagram
  • Instantiation-time configurable port number
  • Run-time programmable priority and bandwidth

38
Implementation in FPGA
Credit calculation circuit
Sorting Network and CBR
39
Implementation Cost
  • 8 port CBR-MPMC with 16-depth DebtQ
  • Xilinx Virtex-5 XC5VLX50T
  • Speedy DDR backend memory controller

40
Evaluation
  • Simulation Framework
  • Cycle-accurate C model of MPMC
  • Simple closed-page DDR memory model
  • Trace capturing and converting method

41
Evaluation
  • CPU workload trace file (from B. Jacob)
  • Cache simulation on standard SPEC2000 integer
    benchmark

Irregular access pattern and low bandwidth requirement:
0.4 memory transactions per 1k instructions.
42
Evaluation
  • Accelerator Workload
  • ALPBench suite of parallel multimedia applications

43
Evaluation
  • Accelerator Workload
  • ALPBench suite of parallel multimedia applications

Periodically repeated access pattern and high
bandwidth requirement: 18.3 memory transactions
per 1k instructions.
44
Results
  • BGPQ Scheduler
  • Latency: number of clock cycles
  • Bandwidth: number of memory transactions per 1k
    clock cycles
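
As an illustration of the two metrics, the sketch below computes them from a list of (issue cycle, completion cycle) pairs. The trace values are made up for the example and are not results from the talk; averaging the latency is also an assumption, since the slide does not say how latency is aggregated.

```python
def summarize(trace, total_cycles):
    """Return (average latency in cycles, transactions per 1k cycles).

    `trace` is a non-empty list of (issued_cycle, completed_cycle) pairs.
    """
    latencies = [done - issued for issued, done in trace]
    avg_latency = sum(latencies) / len(latencies)
    bandwidth = len(trace) * 1000 / total_cycles
    return avg_latency, bandwidth

# Hypothetical trace: three transactions observed over 2,000 clock cycles.
trace = [(0, 12), (5, 30), (40, 52)]
avg_latency, bandwidth = summarize(trace, total_cycles=2000)
print(f"latency {avg_latency:.2f} cycles, {bandwidth:.1f} transactions per 1k cycles")
```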

45
Results
  • CBR Scheduler with a 16-depth debtQ

46
Impact of DebtQ Size
  • Repay conditions
  • DebtQ is full
  • Q0 is empty

[Diagram: CBR Scheduler with DebtQ, queues Q0 (10), Q1 (20), Q2 (70), a multiplexer, and the shared resource]
Credits shown: 0.6, 0.0, 0.4
Priority: Q0 > Q1 > Q2
When DebtQ is full, the remaining requests in Q0 will not be served with the minimum latency guarantee!
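
The two repay conditions, and the loss of the guarantee when the DebtQ is full, can be written as two small predicates. This is a sketch of one plausible reading; the function and parameter names are invented, and the talk does not spell out the exact control logic.

```python
def may_borrow(q0_pending: int, debt_len: int, depth: int) -> bool:
    """Q0 can take over another queue's slot only while the DebtQ has room."""
    return q0_pending > 0 and debt_len < depth

def must_repay(q0_pending: int, debt_len: int, depth: int) -> bool:
    """Repay when there is debt and either the DebtQ is full or Q0 is empty."""
    return debt_len > 0 and (debt_len >= depth or q0_pending == 0)

DEPTH = 16  # DebtQ depth used in the reported 8-port configuration

print(may_borrow(q0_pending=3, debt_len=15, depth=DEPTH))  # True
print(may_borrow(q0_pending=3, debt_len=16, depth=DEPTH))  # False
print(must_repay(q0_pending=3, debt_len=16, depth=DEPTH))  # True
```

With the DebtQ full, the pending requests in Q0 fall back to ordinary BGPQ scheduling, which is the case where the minimum latency guarantee no longer holds.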
47
Impact of DebtQ Size
  • How big is enough for DebtQ?
  • Determined by the instantaneous bandwidth requirement
  • An irregular access pattern means
  • A wide range of DebtQ size requirements
  • Tradeoff
  • Resource efficiency vs. performance

48
Results
  • Impact of debt queue size

49
Conclusions
  • CBR scheduler can provide minimum bandwidth and
    latency guarantees
  • Low implementation cost and power consumption
  • We expect its successful use in a wide range of
    multimedia applications

50
Questions?
[Diagram: CBR Scheduler with DebtQ, queues Q0 (10), Q1 (20), Q2 (70), a multiplexer, and the shared resource]
Credits shown: 0.0, 0.3, -0.3
Priority: Q0 > Q1 > Q2