Adaptive History-Based Memory Schedulers - PowerPoint PPT Presentation

About This Presentation

Title:

Adaptive History-Based Memory Schedulers

Description:

Memory system performance is not increasing as fast as CPU performance ... previous approaches with data intensive applications: Stream, NAS, and microbenchmarks ... – PowerPoint PPT presentation

Number of Views:15

Avg rating:3.0/5.0

Slides: 30

Provided by: IBMU537

Learn more at: https://microarch.org

Category:

more less

Transcript and Presenter's Notes

Title: Adaptive History-Based Memory Schedulers

1
Adaptive History-Based Memory Schedulers

Ibrahim Hur and Calvin Lin
IBM Austin
The University of Texas at Austin

2
Memory Bottleneck

Memory system performance is not increasing as
fast as CPU performance
Latency Use caches, prefetching,
Bandwidth Use parallelism inside memory system

3
How to Increase Memory Command Parallelism?

Similar to instruction scheduling,
can reorder commands for higher bandwidth

time
4
Inside the Memory System
not FIFO
Read Queue
FIFO
Memory Queue
DRAM
arbiter
caches
Write Queue
Memory Controller
not FIFO
the arbiter schedules memory operations
5
Our Work

Study memory command scheduling
in the context of the IBM Power5
Present new memory arbiters
20 increased bandwidth
Very little cost 0.04 increase in chip area

6
Outline

The Problem
Characteristics of DRAM
Previous Scheduling Methods
Our approach
History-based schedulers
Adaptive history-based schedulers
Results
Conclusions

7
Understanding the ProblemCharacteristics of DRAM

Multi-dimensional structure
Banks, rows, and columns
IBM Power5 ranks and ports as well
Access time is not uniform
Bank-to-Bank conflicts
Read after Write to the same rank conflict
Write after Read to different port conflict

8
Previous Scheduling Approaches FIFO Scheduling
caches
DRAM
Read Queue
caches
arbiter
Memory Queue (FIFO)
Write Queue
9
Memoryless Scheduling
Adapted from Rixner et al, ISCA2000
caches
DRAM
Read Queue
caches
arbiter
Memory Queue (FIFO)
Write Queue
long delay
10
What we really want

Keep the pipeline full dont hold commands in
the reorder queues until conflicts are totally
resolved
Forward them to memory queue in an order to
minimize future conflicts

To do this we need to know history of the
commands

memory queue
Read/Write Queues
arbiter
11
Another Goal Match Applications Memory Command
Behavior

Arbiter should select commands from queues
roughly in the ratio in which the application
generates them
Otherwise, read or write queue may be congested
Command history is useful here too

12
Our Approach History-Based Memory Schedulers

Benefits
Minimize contention costs
Consider multiple constraints
Match applications memory access behavior
2 Reads per Write?
1 Read per Write?
The Result less congested memory system, i.e.
more bandwidth

13
How does it work?

Use a Finite State Machine (FSM)
Each state in the FSM represents one possible
history
Transitions out of a state are prioritized
At any state, scheduler selects the available
command with the highest priority
FSM is generated at design time

14
An Example
available commands in reorder queues
next state
First Preference
current state
Second Preference
Third Preference
Fourth Preference
most appropriate command to memory
15
How to determine priorities?

Two criteria
A Minimize contention costs
B Satisfy programs Read/Write command mix
First Method Use A, break ties with B
Second Method Use B, break ties with A
Which method to use?
Combine two methods probabilistically
(details in the paper)

16
Limitation of the History-Based Approach

Designed for one particular mix of Read/Writes
Solution Adaptive History-Based Schedulers
Create multiple state machines one for each
Read/Write mix
Periodically select most appropriate state
machine

17
Adaptive History-Based Schedulers
2R1W 1R1W 1R2W
18
Evaluation

Used a cycle accurate simulator for the IBM
Power5
1.6 GHz, 266-DDR2, 4-rank, 4-bank, 2-port
Evaluated and compared our approach with previous
approaches with data intensive applications
Stream, NAS, and microbenchmarks

19
The IBM Power5

2 cores on a chip
SMT capability
Large on-chip L2 cache
Hardware prefetching
276 million transistors

Memory Controller
(1.6 of chip area)
20
Results 1 Stream Benchmarks
21
Results 2 NAS Benchmarks
(1 core active)
22
Results 3 Microbenchmarks
23

12 concurrent commands
caches
DRAM
Read Queue
caches
arbiter
Memory Queue (FIFO)
Write Queue
24
DRAM Utilization
Memoryless Approach
Our Approach
Number of Active Commands in DRAM
25
Why does it work?
detailed analysis in the paper
Read Queue
Memory Queue
DRAM
arbiter
caches
Write Queue
Memory Controller
Low Occupancy in Reorder Queues
Full Memory Queue
Busy Memory System
Full Reorder Queues
26
Other Results