COMP 4300 Computer Architecture Block Placement

1
COMP 4300 Computer Architecture: Block Placement
Dr. Xiao Qin, Auburn University
http://www.eng.auburn.edu/xqin
xqin_at_auburn.edu
Fall 2008
2
Memory Systems - the Big Picture
  • Memory provides the processor with
    • Instructions
    • Data
  • Problem: memory is too slow and too small

3
Who Cares About the Memory Hierarchy?
Processor-DRAM Memory Gap (latency)
4
Levels of Memory Hierarchy
5
Memory Configuration in Current PCs
  • Static RAM (SRAM): used for L1, L2 cache
    • Fast: 0.5-25 ns access time (less for on-chip)
    • Larger cell per bit, more expensive
    • Higher power consumption
  • Dynamic RAM (DRAM): used for PC main memory
    • Slower: 80-250 ns access time
    • Smaller cell per bit, cheaper
    • Lower power consumption

6
System Components
(Diagram: system components, from the CPU down to the I/O devices)
  • CPU core: 1 GHz - 3.6 GHz, 4-way superscalar, RISC or
    RISC-core (x86)
    • Deep instruction pipelines, dynamic scheduling,
      multiple FP and integer FUs, dynamic branch
      prediction, hardware speculation
  • Caches (L1, L2, L3), all non-blocking
    • L1: 16-128K, 1-2 way set associative (on chip),
      separate or unified
    • L2: 256K-2M, 4-32 way set associative (on chip),
      unified
    • L3: 2-16M, 8-32 way set associative (on or off
      chip), unified
  • System bus (FSB), examples:
    • Alpha, AMD K7: EV6, 200-400 MHz
    • Intel PII, PIII: GTL, 133 MHz
    • Intel P4: 800 MHz
  • Memory bus:
    • SDRAM PC100/PC133: 100-133 MHz, 64-128 bits wide,
      2-way interleaved, 900 MB/s (64 bit)
    • Double Data Rate (DDR) SDRAM PC3200: 200 MHz DDR,
      64-128 bits wide, 4-way interleaved, 3.2 GB/s
      (64 bit)
    • DDR2 SDRAM: 667 MHz, 816 bit wide
  • Chipset: North Bridge (possibly on-chip) and South
    Bridge
  • Main I/O bus (via bus adapter), examples:
    • PCI: 33-66 MHz, 32-64 bits wide, 133-528 MB/s
    • PCI-X: 133 MHz, 64 bits wide, 1066 MB/s
  • I/O subsystem: I/O controllers and I/O devices
    (disks, displays, keyboards, networks)
  • Important issue: which component creates a system
    performance bottleneck?
7
The Principle of Locality
  • Programs access a relatively small portion of the
    address space at any instant of time.
  • Two different types of locality:
    • Temporal locality (locality in time): if an item
      is referenced, it will tend to be referenced
      again soon (e.g., loops, reuse)
    • Spatial locality (locality in space): if an item
      is referenced, items whose addresses are close by
      tend to be referenced soon (e.g., straight-line
      code, array access)
  • For the last 15 years, hardware has relied on
    locality for speed

8
Why Hierarchy Works
  • The principle of locality
    • Programs access a relatively small portion of the
      address space at any instant of time.
    • Temporal locality: recently accessed data is
      likely to be used again
    • Spatial locality: data near recently accessed
      data is likely to be used soon
  • Result: the illusion of large, fast memory
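
To make the two kinds of locality concrete, here is a minimal Python sketch. It models a tiny fully associative cache with LRU replacement; the function name `hit_rate`, the cache size, the block size, and the three access traces are all illustrative assumptions, not part of the original slides.

```python
from collections import OrderedDict

def hit_rate(word_addresses, num_blocks=4, words_per_block=1):
    """Hit rate of a small fully associative LRU cache (illustrative model)."""
    cache = OrderedDict()  # block address -> None, ordered oldest to newest
    hits = 0
    for addr in word_addresses:
        block = addr // words_per_block  # neighbouring words share a block
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) == num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return hits / len(word_addresses)

loop_trace = [0, 1, 2, 3] * 10              # temporal locality: same words reused
scan_trace = list(range(16))                # spatial locality: neighbouring words
stride_trace = [i * 4 for i in range(40)]   # neither reuse nor neighbours

print(hit_rate(loop_trace, words_per_block=1))    # 0.9
print(hit_rate(scan_trace, words_per_block=4))    # 0.75
print(hit_rate(stride_trace, words_per_block=4))  # 0.0
```

The loop trace hits because the same four words come back; the scan trace hits because each miss brings in three neighbouring words; the strided trace has neither property, so every access misses.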

9
Cache Operation
  • Inserted between the CPU and main memory
  • Implemented with fast static RAM
  • Holds some of a program's
    • data
    • instructions
  • Operation (diagram): the CPU (processor) exchanges
    addr and data with the cache memory, which in turn
    exchanges addr and data with DRAM memory
10
Cache Performance Measures
  • Hit rate: fraction of accesses found in the cache
    • So high that we usually talk about
      miss rate = 1 - hit rate
  • Hit time: time to access the cache
  • Miss penalty: time to replace a block from the lower
    level, including time to replace in CPU
    • Access time: time to access the lower level
    • Transfer time: time to transfer the block
  • Average memory-access time (AMAT)
    • AMAT = Hit time + Miss rate x Miss penalty (ns or
      clocks)
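
As a worked instance of the AMAT formula, the short Python sketch below plugs in made-up numbers. The function name `amat` and the values (1-clock hit time, 5% miss rate, 100-clock miss penalty) are assumptions chosen only for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory-access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Assumed example values: 1-clock hit, 5% miss rate, 100-clock miss penalty.
average = amat(hit_time=1.0, miss_rate=0.05, miss_penalty=100.0)
print(f"AMAT is about {average:.1f} clocks")
```

Even with a 95% hit rate, the average access costs several times the hit time, because each miss is so expensive.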

11
Fundamental Questions
  • Q1: Where can a block be placed in the upper
    level? (Block placement)
  • Q2: How is a block found if it is in the upper
    level? (Block identification)
  • Q3: Which block should be replaced on a miss?
    (Block replacement)
  • Q4: What happens on a write? (Write strategy)

12
Q1: Block Placement
  • Where can a block be placed in the cache?
  • In one predetermined place: direct-mapped
    • Use a fragment of the address to calculate the
      block location in the cache
    • Compare the tag stored with that cache block
      against the address tag to test whether the
      block is present
  • Anywhere in the cache: fully associative
    • Compare the tag to every block in the cache
  • In a limited set of places: set-associative
    • Use an address fragment to calculate the set
      (like direct-mapped)
    • Place in any block in the set
    • Compare the tag to every block in the set
    • Hybrid of direct-mapped and fully associative
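
The three placement policies can be viewed as one rule with a different number of ways per set. The Python sketch below is an illustration under that assumption; the helper name `candidate_blocks` and the example of block address 12 in an 8-block cache are not taken from the slides.

```python
def candidate_blocks(block_address, num_blocks, ways):
    """Cache block numbers where a memory block may be placed.

    ways == 1           -> direct-mapped (one predetermined place)
    ways == num_blocks  -> fully associative (anywhere)
    otherwise           -> set-associative (any block in one set)
    """
    num_sets = num_blocks // ways
    set_index = block_address % num_sets  # address fragment selects the set
    first = set_index * ways              # first cache block of that set
    return list(range(first, first + ways))

print(candidate_blocks(12, num_blocks=8, ways=1))  # [4]
print(candidate_blocks(12, num_blocks=8, ways=2))  # [0, 1]
print(candidate_blocks(12, num_blocks=8, ways=8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The longer the returned list, the more tags must be compared on a lookup, which is the cost side of higher associativity.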

13
Direct Mapped Block Placement
Address maps to block: location = (block address)
MOD (number of blocks in cache)
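
A minimal Python sketch of this mapping follows. The function name `split_block_address` is hypothetical, and the idea of keeping the quotient as the tag is the usual textbook convention, assumed here rather than stated on the slide.

```python
def split_block_address(block_address, num_blocks):
    """Split a block address into (tag, index) for a direct-mapped cache.

    index = block address MOD number of blocks (where the block goes)
    tag   = the remaining upper part (which memory block is stored there)
    """
    tag, index = divmod(block_address, num_blocks)
    return tag, index

# A 4-block cache: memory blocks 0 and 8 share index 0 but differ in tag.
print(split_block_address(0, 4))  # (0, 0)
print(split_block_address(8, 4))  # (2, 0)
print(split_block_address(6, 4))  # (1, 2)
```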
14
Example: Accessing a Direct-Mapped Cache
  • DM cache contains four 1-word blocks. Find the
    misses for each cache given this sequence of
    memory block accesses: 0, 8, 0, 6, 8
  • DM Memory Access 1: Mapping: 0 mod 4 = 0

15
Example: Accessing a Direct-Mapped Cache
  • DM Memory Access 1, result: Set 0 is empty, so
    write Mem0 (miss)
16
Example: Accessing a Direct-Mapped Cache
  • DM Memory Access 2: Mapping: 8 mod 4 = 0

17
Example: Accessing a Direct-Mapped Cache
  • DM Memory Access 2, result: Set 0 contains Mem0.
    Overwrite with Mem8 (miss)
18
Example: Accessing a Direct-Mapped Cache
  • DM Memory Access 3: Mapping: 0 mod 4 = 0

19
Example: Accessing a Direct-Mapped Cache
  • DM Memory Access 3, result: Set 0 contains Mem8.
    Overwrite with Mem0 (miss)
20
Example: Accessing a Direct-Mapped Cache
  • DM Memory Access 4: Mapping: 6 mod 4 = 2

21
Example: Accessing a Direct-Mapped Cache
  • DM Memory Access 4, result: Set 2 is empty, so
    write Mem6 (miss)
22
Example: Accessing a Direct-Mapped Cache
  • DM Memory Access 5: Mapping: 8 mod 4 = 0

23
Example: Accessing a Direct-Mapped Cache
  • DM Memory Access 5, result: Set 0 contains Mem0.
    Overwrite with Mem8 (miss)
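
The direct-mapped walkthrough can be replayed with a few lines of Python. This is only a sketch: the function name `simulate_direct_mapped` and the use of `None` for an empty block are illustrative choices, while the cache size and access sequence come from the example.

```python
def simulate_direct_mapped(block_accesses, num_blocks=4):
    """Replay memory block accesses on a direct-mapped cache of 1-word blocks."""
    cache = [None] * num_blocks  # cache[i] holds a memory block number or None
    outcomes = []
    for block in block_accesses:
        index = block % num_blocks    # block address MOD blocks in cache
        if cache[index] == block:
            outcomes.append("hit")
        else:
            outcomes.append("miss")
            cache[index] = block      # fill an empty slot or overwrite
    return outcomes, cache

outcomes, final_cache = simulate_direct_mapped([0, 8, 0, 6, 8])
print(outcomes)     # ['miss', 'miss', 'miss', 'miss', 'miss']
print(final_cache)  # [8, None, 6, None]
```

All five accesses miss because blocks 0 and 8 keep evicting each other from index 0, even though two cache blocks are never used.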
24
Direct-Mapped Cache with n one-word blocks
  • Pro: finds data fast
  • Con: what if we access 00001 and 10001 repeatedly?
    • Both addresses map to the same cache block
      (for n up to 16), so we always miss
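
The conflict described above can be checked with a short Python sketch. The helper name `count_misses` and the cache sizes of 8 and 32 blocks are assumptions for illustration; the slide leaves n unspecified.

```python
def count_misses(block_accesses, num_blocks):
    """Count misses in a direct-mapped cache with num_blocks one-word blocks."""
    cache = [None] * num_blocks
    misses = 0
    for block in block_accesses:
        index = block % num_blocks
        if cache[index] != block:
            misses += 1
            cache[index] = block
    return misses

# Alternate between block addresses 00001 and 10001 (binary), four times each.
trace = [0b00001, 0b10001] * 4

print(count_misses(trace, num_blocks=8))   # 8: both map to index 001, always miss
print(count_misses(trace, num_blocks=32))  # 2: distinct indices, only cold misses
```

With 8 blocks the two addresses share the low three bits, so each access evicts the other; only a cache large enough to separate them (here 32 blocks) avoids the conflict.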

25
Fully Associative Block Placement
Arbitrary block mapping: location = any
(Diagram: a cache next to a memory whose blocks sit at
addresses 00, 04, 08, 0C, ..., 4C; any memory block
can be placed in any cache block.)
26
Example: Accessing a Fully-Associative Cache
  • Fully-associative cache contains four 1-word blocks.
    Find the misses for each cache given this
    sequence of memory block accesses: 0, 8, 0, 6, 8
  • FA Memory Access 1
  • FA block replacement rule: replace the least
    recently used block in the set
27
Example: Accessing a Fully-Associative Cache
  • FA Memory Access 1, result: Set 0 is empty, so
    write Mem0 to Block 0 (miss)
28
Example: Accessing a Fully-Associative Cache
  • FA Memory Access 2

29
Example: Accessing a Fully-Associative Cache
  • FA Memory Access 2, result: Blocks 1-3 are LRU, so
    write Mem8 to Block 1 (miss)
30
Example: Accessing a Fully-Associative Cache
  • FA Memory Access 3

31
Example: Accessing a Fully-Associative Cache
  • FA Memory Access 3, result: Block 0 contains Mem0
    (hit)
32
Example: Accessing a Fully-Associative Cache
  • FA Memory Access 4

33
Example: Accessing a Fully-Associative Cache
  • FA Memory Access 4, result: Blocks 2-3 are LRU, so
    write Mem6 to Block 2 (miss)
34
Example: Accessing a Fully-Associative Cache
  • FA Memory Access 5

35
Example: Accessing a Fully-Associative Cache
  • FA Memory Access 5, result: Block 1 contains Mem8
    (hit)
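
The fully associative walkthrough can be replayed in the same way. In this Python sketch the function name `simulate_fully_associative` and the timestamp bookkeeping are illustrative assumptions; the LRU rule, cache size, and access sequence come from the example.

```python
def simulate_fully_associative(block_accesses, num_blocks=4):
    """Replay memory block accesses on a fully associative LRU cache."""
    contents = [None] * num_blocks  # memory block held by each cache block
    last_used = [-1] * num_blocks   # time of last use; -1 means never used
    outcomes = []
    for time, block in enumerate(block_accesses):
        if block in contents:
            slot = contents.index(block)  # tag matched somewhere in the cache
            outcomes.append("hit")
        else:
            # First least recently used block (empty blocks count as oldest).
            slot = last_used.index(min(last_used))
            contents[slot] = block
            outcomes.append("miss")
        last_used[slot] = time
    return outcomes, contents

outcomes, final_contents = simulate_fully_associative([0, 8, 0, 6, 8])
print(outcomes)        # ['miss', 'miss', 'hit', 'miss', 'hit']
print(final_contents)  # [0, 8, 6, None]
```

The same sequence that missed five times in the direct-mapped cache misses only three times here, since Mem0 and Mem8 no longer compete for one location.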
36
Fully-Associative Cache Basics
  • 1 set, n blocks: no mapping restrictions on how
    blocks are stored in the cache
  • Many ways to choose which block is overwritten,
    e.g. the least recently used block (LRU)
  • Example: 1-set, 8-block FA cache
  • Pro: less likely to overwrite needed data
  • Con: must search the entire cache for hit/miss
37
Set-Associative Block Placement
Address maps to set: location = (block address)
MOD (number of sets in cache), arbitrary location
within the set
(Diagram: a cache organized as Set 0 through Set 3,
next to a memory whose blocks sit at addresses
00, 04, 08, 0C, ..., 4C.)
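
To compare with the earlier examples, the Python sketch below replays the access sequence 0, 8, 0, 6, 8 on a set-associative cache. The configuration (2 sets of 2 ways, so 4 blocks in total), the LRU policy within a set, and the function name `simulate_set_associative` are assumptions for illustration; the slides do not work this case.

```python
def simulate_set_associative(block_accesses, num_sets=2, ways=2):
    """Replay memory block accesses on a set-associative cache, LRU per set."""
    # Each set is a list ordered from least to most recently used.
    sets = [[] for _ in range(num_sets)]
    outcomes = []
    for block in block_accesses:
        lines = sets[block % num_sets]  # block address MOD sets in cache
        if block in lines:
            lines.remove(block)         # will be re-appended as most recent
            outcomes.append("hit")
        else:
            if len(lines) == ways:
                lines.pop(0)            # evict the least recently used block
            outcomes.append("miss")
        lines.append(block)
    return outcomes, sets

outcomes, final_sets = simulate_set_associative([0, 8, 0, 6, 8])
print(outcomes)    # ['miss', 'miss', 'hit', 'miss', 'miss']
print(final_sets)  # [[6, 8], []]
```

Under these assumptions the sequence misses four times, between the five misses of the direct-mapped cache and the three of the fully associative one.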