Title: Designing Efficient Memory for Future Computing Systems
1 Designing Efficient Memory for Future Computing Systems
Aniruddha N. Udipi, University of Utah
Ph.D. Dissertation Defense, March 7, 2012
Advisor: Rajeev Balasubramonian
www.cs.utah.edu/udipi
2 My other computer is...
3 Scaling server farms
- Facebook: 30,000 servers, 80 billion images stored, serves 600,000 photos a second, logs 25 TB of data per day... the statistics go on
- The primary challenge to scaling: efficient supply of data to thousands of cores
- It's all about the memory!
4 Performance Trends
- Demand-side
  - Multi-socket, multi-core, multi-thread
  - Large datasets: big data analytics, scientific computation models
  - RAMCloud-like designs
  - 1 TB/s per node by 2017
- Supply-side
  - Pin count, per-pin BW, capacity
  - Severely power limited
5 Energy Trends
- Datacenters consume 2% of all power generated in the US
  - Operation + cooling
  - 100 billion kWh, $7.4 billion
- 25-40% of total power in large systems is consumed in memory
- As processors get simpler, this fraction is likely to increase
6 Cost-per-bit
- Traditionally the holy grail of DRAM design
- Operational expenditure over 3 years ≈ capital expenditure in datacenter servers
- Cost-per-bit is less important than before
[Figure: a $3.00 part drawing 13 W vs. a $0.30 part drawing 60 W]
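A hedged back-of-envelope reading of that figure: at an assumed electricity price of $0.10/kWh (an assumption, not from the slides), the cheap 60 W part costs vastly more to operate for three years than to buy:

\[ 60\,\mathrm{W} \times 24 \times 365 \times 3\ \mathrm{h} \approx 1577\ \mathrm{kWh} \;\Rightarrow\; 1577 \times \$0.10 \approx \$158 \gg \$0.30 \]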
7 Complexity Trends
- The job of the memory controller is hard
  - 18 timing parameters for DRAM! (see the sketch after this list)
  - Maintenance operations: refresh, scrub, power down, etc.
- Several DIMM and controller variants
  - Hard to provide interoperability
  - Need processor-side support for new memory features
- Now throw in heterogeneity
  - Memristors, PCM, STT-RAM, etc.
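To make the controller's bookkeeping concrete, here is a minimal Python sketch of timing-constraint checking. Only three of the ~18 JEDEC timing parameters are modeled, and the values are illustrative placeholders, not datasheet numbers.

# Minimal sketch of per-bank timing bookkeeping in a DDRx memory
# controller. Parameter values are placeholders for illustration.

T_RCD = 14  # ACTIVATE -> READ/WRITE delay, in cycles
T_RP  = 14  # PRECHARGE -> next ACTIVATE delay
T_RAS = 34  # ACTIVATE -> PRECHARGE minimum

class Bank:
    def __init__(self):
        self.last_activate  = float("-inf")
        self.last_precharge = float("-inf")

    def can_issue(self, cmd, now):
        # A command is legal only after every timing window it
        # participates in has elapsed.
        if cmd == "ACTIVATE":
            return now - self.last_precharge >= T_RP
        if cmd in ("READ", "WRITE"):
            return now - self.last_activate >= T_RCD
        if cmd == "PRECHARGE":
            return now - self.last_activate >= T_RAS
        return True

    def issue(self, cmd, now):
        assert self.can_issue(cmd, now), f"{cmd} violates timing at cycle {now}"
        if cmd == "ACTIVATE":
            self.last_activate = now
        elif cmd == "PRECHARGE":
            self.last_precharge = now

bank = Bank()
bank.issue("ACTIVATE", 0)
print(bank.can_issue("READ", 10))   # False: tRCD not yet satisfied
print(bank.can_issue("READ", 14))   # True

Multiply this by every bank, rank, and maintenance operation, and the controller's complexity burden becomes clear.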
8 Reliability Trends
- Shrinking feature sizes aren't helping
  - Nor is the scale: 64 x 10^15 DRAM cells in a typical datacenter
- DRAM errors are the #1 reason for servers at Google to enter repair
- Datacenters are the backbone of web-connected infrastructure
  - Reliability is essential
- Server downtime has a huge economic impact
  - Breached SLAs, for example
9 Thesis Statement
- Main memory systems are at an inflection point
  - Convergence of several trends
- A major overhaul is required to achieve a system that is energy-efficient, high-performance, low-complexity, reliable, and cost-effective
- Combination of two things:
  - Prudent application of novel technologies
  - Fundamental rethinking of conventional design decisions
10 Designing Future Memory Systems
[Diagram: CPU and memory controller (MC) connected to a DIMM, with callouts 1-4 marking the four contributions]
1. Memory Chip Architecture: reducing overfetch, increasing parallelism [ISCA '10]
2. Memory Interconnect: prudent use of silicon photonics, without modifying DRAM dies [ISCA '11]
3. Memory Access Protocol: streamlined slot-based interface with semi-autonomous memory [ISCA '11]
4. Memory Reliability: efficient RAID-based high-availability chipkill memory [ISCA '12]
11 PART 1: Memory Chip Organization
12 Key bottleneck
[Diagram: a cache line striped across four DRAM chips; RAS and CAS select one row per chip into its row buffer, and each chip contributes a slice of the line. One bank shown in each chip.]
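To make the waste concrete: with typical DDR3-era numbers (an assumption, not from this slide), one activation pulls an entire multi-kilobyte row into the row buffers across the rank, of which a single 64-byte line may be all that is used:

\[ \text{row-buffer utilization} = \frac{64\ \mathrm{B}}{8\ \mathrm{KB}} \approx 0.8\% \quad \text{(assuming an 8 KB row and no further row-buffer hits)} \]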
13 Why this is a problem
14-15 SSA Architecture
[Diagram: one DRAM chip on the DIMM, organized into banks and subarrays with bitlines and row buffers; the address/command bus selects a single subarray, and the entire 64-byte cache line travels over that chip's 8-bit interface, through the global interconnect to I/O, onto the data bus to the memory controller]
16 SSA Operation
[Diagram: four DRAM chips, each with eight subarrays; the address activates one subarray in one chip, which supplies the entire cache line, while the remaining subarrays and chips stay in sleep mode or serve other parallel accesses]
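A minimal sketch of the SSA idea, assuming a hypothetical geometry (8 chips, 16 subarrays per chip; not the paper's exact parameters): the whole line comes from one subarray of one chip, so consecutive lines spread across chips, and untouched chips can sleep or serve other requests.

# Sketch of SSA-style address mapping: an entire cache line is fetched
# from ONE subarray on ONE chip, instead of being striped across all
# chips in a rank. Geometry below is hypothetical, for illustration.

CHIPS_PER_DIMM     = 8
SUBARRAYS_PER_CHIP = 16

def ssa_map(line_addr):
    """Map a cache-line address to (chip, subarray, row offset)."""
    chip     = line_addr % CHIPS_PER_DIMM             # spread lines across chips
    subarray = (line_addr // CHIPS_PER_DIMM) % SUBARRAYS_PER_CHIP
    offset   = line_addr // (CHIPS_PER_DIMM * SUBARRAYS_PER_CHIP)
    return chip, subarray, offset

# Consecutive lines land on different chips, so independent requests
# proceed in parallel while the other chips stay in sleep mode.
for addr in range(4):
    print(addr, ssa_map(addr))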
17 SSA Impact
- Energy reduction
  - Dynamic: fewer bitlines activated
  - Static: smaller activation footprint, more and longer spells of inactivity, better power-down
- Latency impact
  - Limited pins per cache line: serialization latency
  - Higher bank-level parallelism: shorter queuing delays
- Area increase
  - More peripheral circuitry and I/O at finer granularities: area overhead (< 5%)
18 Key Contributions
- Up to 6X reduction in DRAM chip dynamic energy
- Up to 5X reduction in DRAM chip static energy
- Up to 50% improvement in performance in applications limited by bank contention
- All for a < 5% increase in area
19 PART 2: Memory Interconnect
20 Key Bottleneck
- Fundamental nature of electrical pins
  - Limited pin count, per-pin bandwidth, memory capacity, etc.
- Diverging growth rates of core count and pin count
- Limited by physics, not engineering!
21 Silicon Photonic Interconnects
- We need something that can break the edge-bandwidth bottleneck
- Ring-modulator-based photonics
  - Off-chip light source
  - Indirect modulation using resonant rings
  - Relatively cheap coupling on- and off-chip
- DWDM for high bandwidth density
  - As many as 67 wavelengths possible
  - Limited by free spectral range and coupling losses between rings
Source: Xu et al., Optics Express 16(6), 2008
22 The Questions We're Trying to Answer
How do we make photonics less invasive to memory die design?
What should the role of electrical signaling be?
Should we replace all interconnects with photonics? On-chip too?
What should the role of 3D be in an optically connected memory?
Should we be designing photonic DRAM dies? Stacks? Channels?
23 Design Considerations I
- Photonic interconnects
  - Large static power dissipation: ring tuning
    - Rings are designed to resonate at a specific frequency
    - Processing defects and temperature change this
    - Need to heat the rings to correct for this
  - Much lower dynamic energy consumption, relatively independent of distance (compared numerically below)
- Electrical interconnects
  - Relatively small static power dissipation
  - Large dynamic energy consumption
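A hedged back-of-envelope model of this tradeoff; every number below is an illustrative placeholder, not a measured value. The point is structural: static tuning power is paid continuously, so its per-bit share shrinks as the link gets busier, and photonics wins only at high utilization.

# Toy comparison of photonic vs. electrical link energy per bit.
# All parameter values are assumptions chosen for illustration.

def energy_per_bit(static_power_w, dynamic_pj_per_bit, utilization, peak_gbps):
    """Static power is amortized over however many bits actually move."""
    bits_per_sec = utilization * peak_gbps * 1e9
    static_pj = static_power_w / bits_per_sec * 1e12
    return static_pj + dynamic_pj_per_bit

for util in (0.1, 0.5, 0.9):
    photonic   = energy_per_bit(0.10, 0.1, util, 64)  # high static (ring tuning), low dynamic
    electrical = energy_per_bit(0.01, 5.0, util, 64)  # low static, high dynamic (off-chip I/O)
    print(f"util={util:.0%}  photonic={photonic:.2f} pJ/b  electrical={electrical:.2f} pJ/b")

With these placeholder numbers the electrical link is cheaper at 10% utilization and the photonic link is cheaper at 90%, which motivates the next slide's design rule.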
24 Design Considerations II
- Don't over-provision photonic bandwidth; use it only where necessary
- Use photonics where they're really useful
  - To break the off-chip pin barrier
- Exploit 3D stacking and TSVs
  - High bandwidth, low static power, decouples memory dies
- Exploit low-swing wires
  - Cheap on-chip communication
25 Proposed Design
[Diagram: processor and memory controller connected by a waveguide to the DIMM; DRAM chips are stacked on a photonic interface die]
- ADVANTAGE 1: Increased activity factor, more efficient use of photonics
- ADVANTAGE 2: Rings are co-located, easier to isolate or tune thermally
- ADVANTAGE 3: Not disruptive to the design of commodity memory dies
26 Key Contributions
[Same processor-waveguide-interface-die diagram as the previous slide]
- 23% reduction in energy consumption
- 4X capacity per channel
- Potential for performance improvements due to increased bank count
- Less disruptive to memory die design
- But: makes the job of the memory controller difficult!
27 PART 3: Memory Access Protocol
28 Key Bottleneck
- Large capacity, high bandwidth, and evolving technology trends will increase pressure on the memory interface
- The memory controller micro-manages every operation of the memory system
  - Processor-side support required for every memory innovation
- Several signals between processor and memory
  - Heavy pressure on the address/command bus
  - Worse with several independent banks and large amounts of state
29 Proposed Solution
- Release the MC's tight control; make the memory stack more autonomous
- Move mundane tasks to the interface die
  - Maintenance operations (refresh, scrub, etc.)
  - Routine operations (DRAM precharge, NVM wear leveling)
  - Timing control (18 constraints for DRAM alone)
  - Coding and any other special requirements
- The processor-side controller only schedules requests and controls the data bus
30 Memory Access Operation
[Timing diagram: a request arrives and is issued; the controller starts looking for a data-bus slot ML cycles after issue, reserves the first free slot, and also reserves a later backup slot]
Slot: cache-line data-bus occupancy. X: reserved slot. ML: memory latency = address latency + bank access + data-bus latency.
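A toy sketch of slot reservation under these rules; ML and the backup-slot gap are assumed values for illustration (in the real protocol, ML is communicated by the interface die).

# Sketch of the slot-based interface: the controller reserves the first
# free data-bus slot no earlier than ML slots after command issue, plus
# a later backup slot in case the access misses its primary slot.

ML         = 20   # memory latency advertised by the interface die (assumed)
BACKUP_GAP = 8    # distance to the backup reservation (assumed)

class DataBus:
    def __init__(self):
        self.reserved = set()   # slot indices already promised to requests

    def reserve(self, issue_slot):
        # Start looking ML slots after issue; take the first free slot.
        s = issue_slot + ML
        while s in self.reserved:
            s += 1
        self.reserved.add(s)
        # Also hold a backup slot further out.
        backup = s + BACKUP_GAP
        while backup in self.reserved:
            backup += 1
        self.reserved.add(backup)
        return s, backup

bus = DataBus()
print(bus.reserve(0))   # -> (20, 28)
print(bus.reserve(0))   # -> (21, 29): next free slot, no per-bank micromanagement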
31 Performance Impact: Synthetic Traffic
- < 9% latency impact, even at maximum load
- Virtually no impact on achieved bandwidth
32 Performance Impact: PARSEC/STREAM
- Apps have very low BW requirements
- Scaled-down system shows similar trends
33 Key Contributions
- Plug and play
  - Everything is interchangeable and interoperable
  - Only interface-die support required (communicate ML)
- Better support for heterogeneous systems
  - Easier DRAM-NVM data movement on the same channel
- More innovation in the memory system
  - Without processor-side support constraints
- Fewer commands between processor and memory
  - Energy and performance advantages
34 PART 4: Memory Reliability
35 Key Bottleneck
- Increased access granularity
  - Every data access is spread across 36 DRAM chips
  - DRAM industry standards define a minimum access granularity from each chip
- Massive overfetch of data at multiple levels (see the arithmetic after this list)
  - Wastes energy
  - Wastes bandwidth
  - Occupies ranks/banks for longer, hurting performance
- x4 device-width restriction
  - Fewer ranks for given DIMM real estate
  - x8/x16/x32 are more power-efficient per unit capacity
- Reliability level: 1 failed chip out of 36
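The granularity arithmetic, assuming the standard DDR3 minimum burst length of 8: a 36-chip x4 rank drives a 144-bit bus, so every access moves

\[ 144\ \text{bits} \times 8\ \text{beats} = 1152\ \text{bits} = 144\ \mathrm{B} \]

to deliver one 64-byte cache line, more than twice the useful data.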
36 A new approach: LOT-ECC
- Operate on a single rank of memory: 9 chips
  - ...and support the failure of 1 chip per rank (out of 9)
- Multiple tiers of localized protection
  - Tier 1: Local Error Detection (checksums)
  - Tier 2: Global Error Correction (parity)
  - Tiers 3 and 4 handle specific failure cases
- Error correction data stored in data memory
- Data mapping handled by the memory controller with firmware support
  - Transparent to OS, caches, etc.
37 LOT-ECC Design
38 The Devil is in the Details
- We're borrowing one bit from data + LED to use in the GEC
- Put them all in the same DRAM row
- When a cache line is written:
  - Write data + LED + GEC, all self-contained
  - No read-before-write
  - Guaranteed row-buffer hit
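A toy Python sketch of the two main tiers described above, under simplifying assumptions: byte-granularity chunks, an additive checksum standing in for the real LED code, and GEC parity held separately rather than embedded in data memory as the actual design does.

# Toy model of LOT-ECC's Tier-1 (LED) and Tier-2 (GEC) over a 9-chip rank.

CHIPS = 9

def led(chunk: bytes) -> int:
    """Tier-1 Local Error Detection: per-chip checksum (simplified)."""
    return sum(chunk) & 0xFF

def gec(chunks: list[bytes]) -> bytes:
    """Tier-2 Global Error Correction: byte-wise parity across chips."""
    parity = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            parity[i] ^= b
    return bytes(parity)

def recover(chunks, checksums, parity):
    """LED localizes the failed chip; GEC parity reconstructs its data."""
    for i, c in enumerate(chunks):
        if led(c) != checksums[i]:
            rebuilt = bytearray(parity)
            for j, other in enumerate(chunks):
                if j != i:
                    for k, b in enumerate(other):
                        rebuilt[k] ^= b
            return i, bytes(rebuilt)
    return None, None   # no chip failed

chunks = [bytes([i] * 8) for i in range(CHIPS)]
sums   = [led(c) for c in chunks]
parity = gec(chunks)
chunks[3] = bytes(8)                   # chip 3 fails, returns zeros
print(recover(chunks, sums, parity))   # -> (3, b'\x03\x03...') : chip found, data rebuilt

Because detection is local to each chip, only the failed chip's data need ever be reconstructed from parity, which is what keeps accesses within a single 9-chip rank.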
39 Key Benefits
- Energy efficiency: fewer chips activated per access, reduced access granularity, reduced static energy through better use of low-power modes
- Performance gains: more rank-level parallelism, reduced access granularity
- Improved protection: can handle 1 failed chip out of 9, compared to 1 in 36 currently
- Flexibility: works with a single rank of x4 DRAMs or more efficient wide-I/O x8/x16 DRAMs
- Implementation ease: changes to memory controller and system firmware only; commodity processor/memory/OS
40 Power Results
[Chart: 55% reduction in memory power]
41 Performance Results
[Chart: latency reduction of 43% for LOT-ECC x8, 47% with GEC coalescing, and 57% for an oracular scheme]
42 Exploiting features in SSA
43 Putting it all together
44 Summary
- Tremendous pressure on the memory system
  - Bandwidth, energy, complexity, reliability
- Prudently apply novel technologies
  - Silicon photonics
  - Low-swing wires
  - 3D stacking
- Rethink some fundamental design choices
  - Micromanagement by the memory controller
  - Overfetch in the face of diminishing locality
  - Conventional ECC codes
45 Impact
- Significant static/dynamic energy reduction
  - Memory core, channel, controller, reliability
- Significant performance improvement
  - Bank parallelism, channel bandwidth, reliability
- Significant complexity reduction
  - Memory controller
- Improved reliability
46 Synergies
- SSA + Photonics
- Photonics + Autonomous memory
- SSA + Reliability
- SSA, Photonics, and LOT-ECC provide additive energy benefits
  - Each targets one of three major sources of energy consumption: DRAM array, off-chip channel, reliability
- SSA, Photonics, and LOT-ECC also provide additive performance benefits
  - Each targets one of three major performance bottlenecks: bank contention, off-chip BW, reliability
47 Research Contributions
- Memory reliability (ISCA 2012)
- Memory access protocol (ISCA 2011)
- Memory channel architecture (ISCA 2011)
- Memory chip microarchitecture (ISCA 2010)
- On-chip networks (HPCA 2010)
- Non-uniform power caches (HiPC 2009)
- 3D stacked cache design (HPCA 2009)
48 Future Work
- Future project ideas include:
  - Memory architectures for graphics/throughput-oriented applications
  - Memory optimizations for handheld devices
  - Tightly integrated software support
  - Managing heterogeneity, reconfigurability
  - Novel memory hierarchies
  - Memory autonomy and virtualization
  - Refresh management in DRAM
49 Acknowledgements
- Rajeev
- Naveen
- Committee: Al, Norm, Erik, Ken
- Awesome lab-mates
- Karen, Ann, Emily (front office)
- Parents & family
- Friends