Title: COMP 4300 Computer Architecture Block Placement
Dr. Xiao Qin, Auburn University
http://www.eng.auburn.edu/xqin
xqin_at_auburn.edu
Fall 2008
2. Memory Systems - the Big Picture
- Memory provides the processor with:
  - Instructions
  - Data
- Problem: memory is too slow and too small
3. Who Cares About the Memory Hierarchy?
(Figure: Processor-DRAM Memory Gap (latency))
4. Levels of the Memory Hierarchy
5. Memory Configuration in Current PCs
- Static RAM (SRAM): used for L1, L2 cache
  - Fast: 0.5-25 ns access time (less for on-chip)
  - Larger cells (lower density), more expensive per bit
  - Higher power consumption
- Dynamic RAM (DRAM): used for PC main memory
  - Slower: 80-250 ns access time
  - Smaller cells (higher density), cheaper per bit
  - Lower power consumption
6. System Components
- CPU core: 1 GHz - 3.6 GHz, 4-way superscalar RISC or RISC-core (x86)
  - Deep instruction pipelines
  - Dynamic scheduling
  - Multiple FP and integer FUs
  - Dynamic branch prediction
  - Hardware speculation
- Caches L1, L2, L3 (all non-blocking):
  - L1: 16-128K, 1-2 way set associative (on chip), separate or unified
  - L2: 256K-2M, 4-32 way set associative (on chip), unified
  - L3: 2-16M, 8-32 way set associative (on or off chip), unified
- System bus (front-side bus, FSB), examples:
  - Alpha, AMD K7: EV6, 200-400 MHz
  - Intel PII, PIII: GTL, 133 MHz
  - Intel P4: 800 MHz
- Chipset: North Bridge (possibly on-chip) and South Bridge, with a bus adapter to the main I/O bus
- Memory bus, examples:
  - SDRAM PC100/PC133: 100-133 MHz, 64-128 bits wide, 2-way interleaved, 900 MBytes/sec (64-bit)
  - Double Data Rate (DDR) SDRAM PC3200: 200 MHz DDR, 64-128 bits wide, 4-way interleaved, 3.2 GBytes/sec (64-bit)
  - DDR2 SDRAM: 667 MHz, 816 bit wide
- Main I/O bus, examples:
  - PCI: 33-66 MHz, 32-64 bits wide, 133-528 MB/s
  - PCI-X: 133 MHz, 64 bits wide, 1066 MB/s
- I/O subsystem: I/O controllers and I/O devices (disks, displays, keyboards, networks)

Important issue: which component creates a system performance bottleneck?
7. The Principle of Locality
- Programs access a relatively small portion of the address space at any instant of time.
- Two different types of locality:
  - Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
- For the last 15 years, hardware has relied on locality for speed
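To make the two kinds of locality concrete, here is a minimal sketch that counts hits in a toy direct-mapped cache. The cache geometry (8 blocks of 4 words) and the function name `count_hits` are assumptions chosen for illustration, not parameters from these slides; the point is only that sequential and repeated accesses hit while a large stride does not.

```python
def count_hits(word_addresses, num_blocks=8, words_per_block=4):
    """Count hits in a toy direct-mapped cache (hypothetical geometry)."""
    tags = [None] * num_blocks  # memory block currently held by each cache block
    hits = 0
    for addr in word_addresses:
        block_addr = addr // words_per_block  # memory block containing this word
        index = block_addr % num_blocks  # direct-mapped placement
        if tags[index] == block_addr:
            hits += 1  # block already cached: hit
        else:
            tags[index] = block_addr  # miss: fetch the whole block
    return hits


sequential = list(range(64))  # spatial locality: a[0], a[1], ..., a[63]
loop = list(range(16)) * 2  # temporal locality: the same 16 words reused
strided = list(range(0, 256, 4))  # no reuse: every access touches a new block

print(count_hits(sequential), "hits of", len(sequential))
print(count_hits(loop), "hits of", len(loop))
print(count_hits(strided), "hits of", len(strided))
```

Under these assumptions the sequential walk hits 48 of 64 times (each 4-word block misses once, then hits three times), the reused loop hits 28 of 32 times, and the stride that lands in a new block on every access never hits.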
8. Why Hierarchy Works
- The principle of locality:
  - Programs access a relatively small portion of the address space at any instant of time.
  - Temporal locality: recently accessed data is likely to be used again
  - Spatial locality: data near recently accessed data is likely to be used soon
- Result: the illusion of large, fast memory
9. Cache Operation
- Inserted between the CPU and main memory
- Implemented with fast static RAM (SRAM)
- Holds some of a program's:
  - data
  - instructions
- Operation (diagram): Processor (CPU) <-> Cache Memory <-> DRAM Memory, with address (addr) and data lines on each link
10. Cache Performance Measures
- Hit rate: fraction of accesses found in the cache
  - So high that we usually talk about the miss rate instead: $\text{Miss rate} = 1 - \text{Hit rate}$
- Hit time: time to access the cache
- Miss penalty: time to replace a block from the lower level, including time to replace in the CPU
  - Access time: time to access the lower level
  - Transfer time: time to transfer the block
- Average memory-access time (AMAT), in ns or clocks:
  $$\text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$$
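The AMAT formula is simple arithmetic; a short sketch with made-up numbers (a 1-cycle hit time, a 95% hit rate, and a 100-cycle miss penalty, none of which come from the slides) shows how even a small miss rate dominates the average. The helper name `amat` is hypothetical.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory-access time = hit time + miss rate x miss penalty (units of the inputs)."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be a fraction between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers for illustration only.
hit_rate = 0.95
miss_rate = 1 - hit_rate  # miss rate = 1 - hit rate
print(round(amat(1, miss_rate, 100), 2))  # 6.0 clock cycles
```

With these assumed values, a cache that hits 95% of the time still averages six times its hit time, because each of the 5% misses costs 100 cycles.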
11. Fundamental Questions
- Q1: Where can a block be placed in the upper level? (Block placement)
- Q2: How is a block found if it is in the upper level? (Block identification)
- Q3: Which block should be replaced on a miss? (Block replacement)
- Q4: What happens on a write? (Write strategy)
12. Q1: Block Placement
- Where can a block be placed in the cache?
  - In one predetermined place: direct-mapped
    - Use a fragment of the address to calculate the block location in the cache
    - Compare the tag stored with the cache block to test whether the block is present
  - Anywhere in the cache: fully associative
    - Compare the tag to every block in the cache
  - In a limited set of places: set-associative
    - Use an address fragment to calculate the set (like direct-mapped)
    - Place in any block in the set
    - Compare the tag to every block in the set
    - Hybrid of direct-mapped and fully associative
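All three placement schemes use the same address fragments; they differ only in how many sets the cache is divided into. A minimal sketch, assuming power-of-two sizes and a hypothetical helper `split_address`, splits a byte address into tag, set index, and block offset for a direct-mapped cache (one block per set), a set-associative cache, and a fully associative cache (a single set).

```python
def split_address(byte_address, block_size_bytes, num_sets):
    """Split a byte address into (tag, set index, block offset).

    num_sets == number of blocks -> direct-mapped
    num_sets == 1                -> fully associative (no index bits)
    anything in between          -> set-associative
    """
    block_address = byte_address // block_size_bytes
    offset = byte_address % block_size_bytes  # byte within the block
    index = block_address % num_sets  # set the block must be placed in
    tag = block_address // num_sets  # remaining high bits, compared on lookup
    return tag, index, offset


# Hypothetical cache of 8 blocks, 4 bytes each; address 0x4C is block address 19.
print(split_address(0x4C, 4, 8))  # direct-mapped: 8 sets of 1 block
print(split_address(0x4C, 4, 4))  # 2-way set-associative: 4 sets of 2 blocks
print(split_address(0x4C, 4, 1))  # fully associative: 1 set of 8 blocks
```

The three calls return (2, 3, 0), (4, 3, 0), and (19, 0, 0): as associativity grows, index bits shrink and the tag that must be compared grows, which is why fully associative lookup has to check every block.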
13. Direct-Mapped Block Placement
Each address maps to exactly one block location: (block address) MOD (number of blocks in cache).
14. Example: Accessing a Direct-Mapped Cache
- DM cache contains 4 1-word blocks. Find the misses given this sequence of memory block accesses: 0, 8, 0, 6, 8
- Access 1: block 0, mapping 0 mod 4 = 0. Set 0 is empty: write Mem[0]. Miss.
- Access 2: block 8, mapping 8 mod 4 = 0. Set 0 contains Mem[0]: overwrite with Mem[8]. Miss.
- Access 3: block 0, mapping 0 mod 4 = 0. Set 0 contains Mem[8]: overwrite with Mem[0]. Miss.
- Access 4: block 6, mapping 6 mod 4 = 2. Set 2 is empty: write Mem[6]. Miss.
- Access 5: block 8, mapping 8 mod 4 = 0. Set 0 contains Mem[0]: overwrite with Mem[8]. Miss.
- Result: all 5 accesses miss.
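The direct-mapped walkthrough above can be replayed mechanically. This sketch (the function name `simulate_direct_mapped` is hypothetical) tracks only which memory block each cache block holds, which is enough to reproduce the hit/miss sequence for one-word blocks.

```python
def simulate_direct_mapped(block_accesses, num_blocks):
    """Return (block, 'hit'/'miss') pairs for a direct-mapped cache of one-word blocks."""
    cache = [None] * num_blocks  # cache[i] holds the memory block stored there
    results = []
    for block in block_accesses:
        index = block % num_blocks  # block address MOD blocks in cache
        if cache[index] == block:
            results.append((block, "hit"))
        else:
            cache[index] = block  # empty or conflicting: (over)write Mem[block]
            results.append((block, "miss"))
    return results


trace = simulate_direct_mapped([0, 8, 0, 6, 8], num_blocks=4)
print(trace)
print(sum(outcome == "miss" for _, outcome in trace), "misses")  # 5 misses
```

Blocks 0 and 8 both map to set 0, so they keep evicting each other even though sets 1 and 3 stay empty the whole time.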
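24. Direct-Mapped Cache with n One-Word Blocks
- Pro: finds data fast
- Con: what if we access 00001 and 10001 repeatedly?
  - We always miss: the two addresses have identical low-order index bits, so they map to the same cache block and keep evicting each other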
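The "always miss" case follows from the index bits alone. This sketch assumes a hypothetical n = 8 (any power of two up to 16 behaves the same for these two 5-bit addresses); `NUM_BLOCKS` and `dm_index` are illustrative names.

```python
# Hypothetical n = 8 one-word blocks; any power of two up to 16 gives the same result here.
NUM_BLOCKS = 8


def dm_index(block_address, num_blocks=NUM_BLOCKS):
    """Direct-mapped placement: block address MOD number of blocks."""
    return block_address % num_blocks


a, b = 0b00001, 0b10001
print(dm_index(a), dm_index(b))  # prints "1 1": both addresses want cache block 1

cache = [None] * NUM_BLOCKS  # cache[i] holds the block address stored in block i
misses = 0
for addr in [a, b] * 4:  # access 00001, 10001, 00001, 10001, ...
    i = dm_index(addr)
    if cache[i] != addr:
        misses += 1
        cache[i] = addr  # each miss evicts the other address
print(misses, "misses out of 8 accesses")  # every access misses
```

The other seven cache blocks sit unused, which is exactly the waste that associative placement is meant to avoid.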
25. Fully Associative Block Placement
Arbitrary block mapping: a memory block can be placed in any cache location.
(Figure: memory blocks at addresses 00-4C (hex) mapped into the cache.)
26. Example: Accessing a Fully-Associative Cache
- Fully-associative cache contains 4 1-word blocks. Find the misses given this sequence of memory block accesses: 0, 8, 0, 6, 8
- FA block replacement rule: replace the least recently used (LRU) block in the set
- Access 1: block 0. Set 0 is empty: write Mem[0] to Block 0. Miss.
- Access 2: block 8. Blocks 1-3 are LRU: write Mem[8] to Block 1. Miss.
- Access 3: block 0. Block 0 contains Mem[0]. Hit.
- Access 4: block 6. Blocks 2-3 are LRU: write Mem[6] to Block 2. Miss.
- Access 5: block 8. Block 1 contains Mem[8]. Hit.
- Result: 3 misses, 2 hits.
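The fully-associative walkthrough can be reproduced with an LRU-ordered dictionary. In this sketch (the function name is hypothetical), Python's `collections.OrderedDict` keeps cached blocks ordered from least to most recently used; a dictionary membership test stands in for the hardware's tag comparison against every block.

```python
from collections import OrderedDict


def simulate_fully_associative_lru(block_accesses, num_blocks):
    """Return (block, 'hit'/'miss') pairs for a fully-associative, LRU-replaced cache."""
    cache = OrderedDict()  # cached memory blocks, least recently used first
    results = []
    for block in block_accesses:
        if block in cache:  # stands in for comparing the tag against every block
            cache.move_to_end(block)  # mark as most recently used
            results.append((block, "hit"))
        else:
            if len(cache) == num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True  # any free location will do: no mapping restriction
            results.append((block, "miss"))
    return results


fa_trace = simulate_fully_associative_lru([0, 8, 0, 6, 8], num_blocks=4)
print(fa_trace)  # outcomes: miss, miss, hit, miss, hit
```

Because blocks 0 and 8 no longer compete for one location, the same sequence drops from 5 misses (direct-mapped) to 3.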
36. Fully-Associative Cache Basics
- 1 set, n blocks: no mapping restrictions on how blocks are stored in the cache
- Many replacement policies are possible, e.g., overwrite the least recently used block (LRU)
- Example: 1-set, 8-block FA cache
- Pro: less likely to overwrite needed data
- Con: must search the entire cache for hit/miss
37. Set-Associative Block Placement
Each address maps to a set location: (block address) MOD (number of sets in cache); within that set, the block may occupy an arbitrary location.
(Figure: memory blocks at addresses 00-4C (hex) mapped into a cache organized as Sets 0-3.)
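Set-associative placement sits between the two worked examples. The sketch below is a generalization under stated assumptions, not taken from the slides: it applies LRU replacement within each set (the slides specify LRU only for the fully-associative example), uses a hypothetical function name, and reruns the access sequence 0, 8, 0, 6, 8 on a 4-block cache organized three different ways.

```python
from collections import OrderedDict


def simulate_set_associative_lru(block_accesses, num_sets, ways):
    """Return (block, 'hit'/'miss') pairs for a set-associative cache with per-set LRU."""
    sets = [OrderedDict() for _ in range(num_sets)]  # each set ordered LRU -> MRU
    results = []
    for block in block_accesses:
        s = sets[block % num_sets]  # block address MOD sets in cache
        if block in s:  # compare tag to every block in this set only
            s.move_to_end(block)
            results.append((block, "hit"))
        else:
            if len(s) == ways:
                s.popitem(last=False)  # evict the LRU block within this set
            s[block] = True  # place in any free block of the set
            results.append((block, "miss"))
    return results


for num_sets, ways, label in [(4, 1, "direct-mapped"), (2, 2, "2-way set-associative"), (1, 4, "fully associative")]:
    trace = simulate_set_associative_lru([0, 8, 0, 6, 8], num_sets, ways)
    misses = sum(outcome == "miss" for _, outcome in trace)
    print(f"{label}: {misses} misses")
```

With these parameters the sketch reports 5 misses for direct-mapped (4 sets of 1 block), 4 for 2-way set-associative (2 sets of 2 blocks), and 3 for fully associative (1 set of 4 blocks), matching the two hand-worked examples at the extremes.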