Title: Memory and I/O Systems
1. EE 382N Superscalar Microprocessor Architecture, Chapter 3
- Memory and I/O Systems
- Prof. Lizy Kurian John
2. A Typical Computer System
3. Memory Hierarchy
4. Properties of an Ideal Memory System
- Infinite capacity
- Infinite bandwidth
- Instantaneous or zero latency
- Persistence or non-volatility
- Low implementation cost
5. Memory Hierarchy Components
6. Memory Hierarchy
As we move to deeper (lower) levels, the latency goes up and the price per bit goes down.
7. Memory Hierarchy
- A level closer to the processor must be
  - smaller
  - faster
  - a subset of the lower levels (it contains the most recently used data)
- The lowest level (usually disk) contains all available data
- Other levels?
8. Attributes of Memory Hierarchy Components
9. Why We Use Caches
[Figure: processor-memory performance gap, 1980-2000. CPU performance (per Moore's Law) grows about 60%/year, while DRAM performance grows about 7%/year, so the gap grows about 50%/year.]
- 1989: first Intel CPU with a cache on chip
- 1998: Pentium III has two levels of cache on chip
- 2007: many chips have 3 levels of cache
10. Memory Hierarchy Basis
- Disk contains everything.
- When the processor needs something, bring it into all higher levels of memory.
- Cache contains copies of the memory data that are being used.
- Memory contains copies of the disk data that are being used.
- The entire idea is based on locality.
11. Locality
- Temporal locality: if we use it now, we'll want to use it again soon (a Big Idea)
- Spatial locality: if we use something now, we'll want to use things near it very soon
- Caches contain the hardware mechanisms to capture the temporal and spatial locality in programs
12. Temporal and Spatial Locality
13. Capturing Locality
- Temporal locality: save what you bring in
- Spatial locality: bring in nearby items too, i.e., use large blocks
14. Cache Design
- How do we decide what to bring into the cache?
- How do we decide where to put it?
- How do we know which elements are in the cache?
- How do we quickly locate them?
- When we bring something in and there is no space, how do we make space for it?
15. Cache Design
- Mapping strategies
  - Direct mapped
  - Set associative
  - Fully associative
- Replacement strategies
  - LRU (least recently used)
  - Random, FIFO, OPTIMAL
16. Cache Organization Schemes
(a) Direct Mapped
(b) Fully Associative
(c) Set Associative
17. Cache Mapping Strategies
- Direct mapped: each memory address or block can go into only one specific location in the cache
- Set associative: a block can occupy any position within a set
- Fully associative: a block can be written into any position
18. Direct-Mapped Cache
- Cache location 0 can be occupied by data from
  - memory location 0, 4, 8, ...
  - with 4 blocks: any memory location that is a multiple of 4
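As a quick illustration of this mapping (a minimal sketch, not from the slides), the loop below prints which slot of a 4-block direct-mapped cache each memory block occupies; the block number modulo the number of slots picks the location:

```c
/* Sketch: direct-mapped placement for a 4-block cache.
 * Memory blocks 0, 4, 8, ... all land in cache slot 0. */
#include <stdio.h>

int main(void) {
    const unsigned num_slots = 4;
    for (unsigned block = 0; block <= 9; block++)
        printf("memory block %u -> cache slot %u\n", block, block % num_slots);
    return 0;
}
```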
19. Associative Cache Example
- Here's a simple 2-way set associative cache.
20. Fully Associative Cache
- Any cache location can be occupied by data from any block
21. Tag and Index Bits
- Since multiple memory addresses map to the same cache index, how do we tell which one is in there?
- What if we have a block size > 1 byte?
22. Locating Stuff in Cache
- Index: specifies the cache index (which row of the cache we should look in)
- Offset: once we've found the correct block, specifies which byte within the block we want
- Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location
23. Direct-Mapped Cache Example
- Index (index into an array of blocks)
  - need to specify the correct row in the cache
  - cache contains 16 KB = 2^14 bytes
  - block contains 2^4 bytes (4 words)
  - blocks/cache = (bytes/cache) / (bytes/block)
    = (2^14 bytes/cache) / (2^4 bytes/block)
    = 2^10 blocks/cache
  - need 10 bits to specify this many rows
24. Direct-Mapped Cache Example
- Tag: use the remaining bits as the tag
  - tag length = address length - offset - index = 32 - 4 - 10 = 18 bits
  - so the tag is the leftmost 18 bits of the memory address
- Why not use the full 32-bit address as the tag?
  - All bytes within a block share the same block address, so the 4 offset bits need not be stored
  - The index must be the same for every address within a block, so it is redundant in the tag check and can be left off to save memory (here, 10 bits)
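The field extraction above is a few shifts and masks in C. A minimal sketch under this slide's assumptions (32-bit addresses, 16-byte blocks, 1024 rows, so offset = 4 bits, index = 10, tag = 18), applied to the four addresses used on the next slide:

```c
/* Sketch: splitting a 32-bit address into tag/index/offset for a
 * direct-mapped 16 KB cache with 16-byte blocks. */
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 4   /* log2(16-byte block) */
#define INDEX_BITS  10  /* log2(1024 blocks)   */

int main(void) {
    uint32_t addrs[] = {0x00000014, 0x0000001C, 0x00000034, 0x00008014};
    for (int i = 0; i < 4; i++) {
        uint32_t a = addrs[i];
        uint32_t offset = a & ((1u << OFFSET_BITS) - 1);
        uint32_t index  = (a >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
        uint32_t tag    = a >> (OFFSET_BITS + INDEX_BITS);
        printf("0x%08X -> tag=0x%05X index=%u offset=%u\n",
               (unsigned)a, (unsigned)tag, (unsigned)index, (unsigned)offset);
    }
    return 0;
}
```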
25. Accessing Data in a Direct-Mapped Cache
- 4 addresses:
  - 0x00000014, 0x0000001C, 0x00000034, 0x00008014
- The 4 addresses divided (for convenience) into tag, index, and byte offset fields:

  Address       Tag (18 bits)         Index (10 bits)   Offset (4 bits)
  0x00000014    000000000000000000    0000000001        0100
  0x0000001C    000000000000000000    0000000001        1100
  0x00000034    000000000000000000    0000000011        0100
  0x00008014    000000000000000010    0000000001        0100
26. Fully Associative Cache
- Fully Associative Cache (e.g., 32 B block)
- compare tags in parallel
27. Fully Associative Cache (1/2)
- What does this mean?
  - no rows: any block can go anywhere in the cache
  - must compare with all tags in the entire cache to see if the data is there
- Memory address fields:
  - Tag: same as before
  - Offset: same as before
  - Index: non-existent
28. Fully Associative Cache (2/2)
- Benefit of a fully associative cache:
  - no conflict misses (since data can go anywhere)
- Drawbacks of a fully associative cache:
  - we need a hardware comparator for every single entry; if we have 64 KB of data in a cache with 4 B entries, we need 16K comparators: infeasible
29. Caching Terminology
- When we try to read memory, 3 things can happen:
  - cache hit: the cache block is valid and contains the proper address, so read the desired word
  - cache miss: nothing in the cache's appropriate block, so fetch from memory
  - cache miss, block replacement required: the data is not in the cache and some other data occupies the space; fetch the desired data from memory and replace
30. Block Replacement Policy (1/2)
- Direct mapped: the index completely specifies which position a block goes in on a miss
- N-way set associative: the index specifies a set, but the block can occupy any position within the set on a miss
- Fully associative: the block can be written into any position
- Question: if we have the choice, where should we write an incoming block?
31. Block Replacement Policy (2/2)
- If there are any locations with the valid bit off (empty), then usually write the new block into the first one.
- If all possible locations already have a valid block, we must pick a replacement policy: a rule by which we determine which block gets cached out on a miss.
32. Block Replacement Policy: LRU
- LRU (least recently used)
  - Idea: cache out the block which has been accessed (read or write) least recently
  - Pro: temporal locality → recent past use implies likely future use; in fact, this is a very effective policy
  - Con: with 2-way set associative, it is easy to keep track (one LRU bit); with 4-way or greater, it requires complicated hardware and much time to keep track of this
33. Block Replacement Example
- We have a 2-way set associative cache with a four-word total capacity and one-word blocks. We perform the following word accesses (ignore bytes for this problem):
  0, 2, 0, 1, 4, 0, 2, 3, 5, 4
- How many hits and how many misses will there be with the LRU block replacement policy?
34. Block Replacement Example: LRU
- Addresses: 0, 2, 0, 1, 4, 0, ...
  - 0: miss, bring into set 0 (loc 0)
  - 2: miss, bring into set 0 (loc 1)
  - 0: hit
  - 1: miss, bring into set 1 (loc 0)
  - 4: miss, bring into set 0 (loc 1, replacing 2, the LRU block in set 0)
  - 0: hit
- Finishing the sequence (2, 3, 5, 4 all miss) gives 2 hits and 8 misses in total.
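A minimal simulation can check the count. The sketch below (not from the slides) models this cache with one LRU bit per set and replays the access stream:

```c
/* Sketch: 2-way set associative cache, two sets of one-word blocks,
 * one LRU bit per set. Replays the example's access stream. */
#include <stdio.h>

int main(void) {
    int tag[2][2] = {{-1, -1}, {-1, -1}};  /* [set][way], -1 = invalid   */
    int lru[2] = {0, 0};                   /* way to evict next, per set */
    int stream[] = {0, 2, 0, 1, 4, 0, 2, 3, 5, 4};
    int hits = 0, misses = 0;

    for (int i = 0; i < 10; i++) {
        int addr = stream[i];
        int set = addr % 2;    /* one-word blocks: low bit selects the set */
        int t = addr / 2;      /* remaining bits form the tag              */
        int way;
        if (tag[set][0] == t)      { way = 0; hits++; }
        else if (tag[set][1] == t) { way = 1; hits++; }
        else { way = lru[set]; tag[set][way] = t; misses++; }
        lru[set] = 1 - way;    /* the way just used is MRU; the other is LRU */
    }
    printf("hits=%d misses=%d\n", hits, misses);  /* prints hits=2 misses=8 */
    return 0;
}
```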
35. Block Size Tradeoff Conclusions
36. Block Size Tradeoff (1/3)
- Benefits of larger block size:
  - Spatial locality: if we access a given word, we're likely to access other nearby words soon
  - Very applicable with the stored-program concept: if we execute a given instruction, it's likely that we'll execute the next few as well
  - Works nicely in sequential array accesses too
37. Block Size Tradeoff (2/3)
- Drawbacks of larger block size:
  - A larger block size means a larger miss penalty
    - on a miss, it takes longer to load a new block from the next level
  - If the block size is too big relative to the cache size, then there are too few blocks
    - Result: the miss rate goes up
- In general, minimize Average Memory Access Time (AMAT)
  - AMAT = Hit Time + Miss Penalty × Miss Rate
38. Block Size Tradeoff (3/3)
- Hit Time: time to find and retrieve data from the current-level cache
- Miss Penalty: average time to retrieve data on a current-level miss (includes the possibility of misses on successive levels of the memory hierarchy)
- Hit Rate: % of requests that are found in the current-level cache
- Miss Rate = 1 - Hit Rate
39. Cache Design Parameters
40. What to Do on a Write Hit?
- Write-through
  - update the word in the cache block and the corresponding word in memory
- Write-back
  - update the word in the cache block
  - allow the memory word to be stale
  - write it back later
  - → add a dirty bit to each block, indicating that memory needs to be updated when the block is replaced
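A minimal sketch (the structures are hypothetical, not from the slides) contrasting the two write-hit policies:

```c
/* Sketch: write-through vs. write-back handling of a write hit. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t tag;
    int      valid;
    int      dirty;      /* meaningful only for write-back */
    uint8_t  data[16];
} CacheLine;

/* Write-through: update the cached word and memory together,
 * so memory is never stale. */
void write_hit_wt(CacheLine *line, int off, uint8_t v,
                  uint8_t *mem, uint32_t addr) {
    line->data[off] = v;
    mem[addr] = v;
}

/* Write-back: update only the cache and set the dirty bit; the block
 * is written to memory later, when it is replaced. */
void write_hit_wb(CacheLine *line, int off, uint8_t v) {
    line->data[off] = v;
    line->dirty = 1;     /* memory is stale until eviction */
}

int main(void) {
    static uint8_t mem[64];
    CacheLine line = { .tag = 0, .valid = 1, .dirty = 0, .data = {0} };
    write_hit_wt(&line, 0, 0xAB, mem, 16);
    write_hit_wb(&line, 1, 0xCD);
    printf("mem[16]=0x%02X dirty=%d\n", mem[16], line.dirty); /* 0xAB, 1 */
    return 0;
}
```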
41. Write-Allocate / No-Write-Allocate
- Under a write-through (WT) strategy, what happens on a write miss? Is the block brought into the cache on a write?
  - WTNWA (no-write-allocate): no
  - WTWA (write-allocate): yes
42. Types of Cache Misses (1/2)
- Three Cs model of misses
- 1st C: compulsory misses
  - occur when a program is first started
  - the cache does not contain any of that program's data yet, so misses are bound to occur
  - can't be avoided easily, so we won't focus on these in this course
43. Types of Cache Misses (2/2)
- 2nd C: conflict misses
  - a miss that occurs because two distinct memory addresses map to the same cache location
  - two blocks (which happen to map to the same location) can keep overwriting each other
  - a big problem in direct-mapped caches
  - how do we lessen the effect of these?
- Dealing with conflict misses
  - Solution 1: make the cache size bigger
    - fails at some point
  - Solution 2: let multiple distinct blocks fit in the same cache index?
44. Third Type of Cache Miss
- Capacity misses
  - a miss that occurs because the cache has a limited size
  - a miss that would not occur if we increased the size of the cache
  - a sketchy definition, so just get the general idea
- This is the primary type of miss for fully associative caches.
45. Average Memory Access Time (AMAT)
- AMAT = Hit Time + Miss Penalty × Miss Rate
- CPI = Ideal CPI (core CPI) + MCPI
- MCPI = memory CPI
46. Example
- Assume:
  - hit time = 1 cycle
  - miss rate = 5%
  - miss penalty = 20 cycles
- Calculate AMAT:
  - avg. memory access time = 1 + 0.05 × 20 = 1 + 1 = 2 cycles
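The same arithmetic in code, as a quick check of the example above:

```c
/* Checks the AMAT example: 1 + 0.05 * 20 = 2 cycles. */
#include <stdio.h>

int main(void) {
    double hit_time = 1.0, miss_rate = 0.05, miss_penalty = 20.0;
    printf("AMAT = %.1f cycles\n", hit_time + miss_rate * miss_penalty);
    return 0;
}
```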
47. Cache Area Overhead
- A cache contains useful data plus the tag, valid bit, dirty bit, etc.
- If a cache is described as 16 KB, often 16 KB is the useful data capacity
  - the cache RAM is often 20 or 24 KB
- The amount of area spent on tags depends on the mapping strategy and the block size
  - fully associative means more tag area
  - a small block size means more tag area
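A back-of-the-envelope sketch of where the extra RAM goes, using the earlier 16 KB direct-mapped example (18-bit tags; assuming one valid and one dirty bit per block, which is my assumption, not from the slides):

```c
/* Sketch: total cache RAM for 16 KB of data, direct mapped,
 * 16-byte blocks, 32-bit addresses. */
#include <stdio.h>

int main(void) {
    const int blocks     = 16 * 1024 / 16;  /* 1024 blocks            */
    const int data_bits  = 16 * 8;          /* 128 data bits per block */
    const int tag_bits   = 32 - 10 - 4;     /* 18 tag bits per block   */
    const int state_bits = 2;               /* valid + dirty (assumed) */
    long total = (long)blocks * (data_bits + tag_bits + state_bits);
    printf("data: %d Kbit, total RAM: %ld Kbit (%.1f%% overhead)\n",
           blocks * data_bits / 1024, total / 1024,
           100.0 * (tag_bits + state_bits) / data_bits);
    return 0;
}
```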
48. A Typical Memory Hierarchy
49. A Typical Main Memory Organization
50. DRAM Chip Organization
51. Memory Module Organization
52. Virtual Memory System
53. Another View of the Memory Hierarchy
[Diagram: from the upper level (faster) to the lower level (larger): registers hold instruction operands; the cache and the L2 cache hold blocks; memory holds pages; disk holds files; tape sits below disk.]
54. Memory Hierarchy Requirements
- If the Principle of Locality allows caches to offer (close to) the speed of cache memory with the size of DRAM memory, then why not apply it recursively at the next level to give the speed of DRAM memory with the size of disk?
- While we're at it, what other things do we need from our memory system?
55. Virtual Memory
- Allows the OS to share memory and protect programs from each other
- Today, more important for protection than as just another level of the memory hierarchy
- Each process thinks it has all the memory to itself
- Historically, it predates caches
56. Comparing the Two Levels of Hierarchy

  Cache version                    Virtual memory version
  -------------                    ----------------------
  Block or line                    Page
  Miss                             Page fault
  Block size: 32-64 B              Page size: 4 KB - 8 KB
  Placement: direct mapped or      Placement: fully associative
    N-way set associative
  Replacement: LRU or random       Replacement: LRU
  Write-through or write-back      Write-back
57. Virtual-to-Physical Address Translation
[Diagram: a program operates in its virtual address space; an HW mapping translates each virtual address (instruction fetch, load, store) to a physical address (instruction fetch, load, store) in physical memory, including caches.]
- Each program operates in its own virtual address space, as if it were the only program running
- Each is protected from the others
- The OS can decide where each goes in memory
- Hardware (HW) provides the virtual → physical mapping
58. Mapping Virtual Memory to Physical Memory
- Divide memory into equal-sized chunks, or pages (about 4 KB - 8 KB)
- Any chunk of virtual memory can be assigned to any chunk of physical memory (a page)
[Diagram: a virtual address space, with the stack at the top, mapped chunk by chunk onto a 64 MB physical memory.]
59. Paging Organization (assume 1 KB pages)
The page is the unit of mapping. The page is also the unit of transfer from disk to physical memory.
60.
- Use a table lookup (the "Page Table") for mappings; the virtual page number is the index
- Physical Page Number = PageTable[Virtual Page Number]
- (The P.P.N. is also called the page frame number.)
61. Page Table
- A page table is an operating-system structure which contains the mapping of virtual addresses to physical locations
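The lookup on slide 60 is literally an array index. A minimal sketch, assuming 1 KB pages (as in the paging slide) and a hypothetical page_table array:

```c
/* Sketch: Physical Page Number = page_table[Virtual Page Number];
 * the page offset passes through unchanged. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 10                       /* 1 KB pages */

uint32_t translate(const uint32_t *page_table, uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;           /* virtual page number  */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);
    uint32_t ppn    = page_table[vpn];              /* physical page number */
    return (ppn << PAGE_BITS) | offset;
}

int main(void) {
    uint32_t pt[4] = {7, 2, 5, 0};                  /* hypothetical mappings */
    printf("0x%X -> 0x%X\n", 0x0412u, translate(pt, 0x0412u)); /* VPN 1 -> PPN 2 */
    return 0;
}
```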
62. Address Mapping: Page Table
The page table is located in physical memory.
63. Paging/Virtual Memory with Multiple Processes
[Diagram: User A's and User B's virtual address spaces, each with code, static, heap, and stack segments starting at address 0, map into a single 64 MB physical memory.]
64. Virtual Memory Problem 1
- Mapping every address requires one indirection through the page table in memory per virtual address, so 1 virtual memory access = 2 physical memory accesses: SLOW!
- Observation: since there is locality in the pages of data, there must be locality in the virtual address translations of those pages
- Since small is fast, why not use a small cache of virtual-to-physical address translations to make translation fast?
- For historical reasons, this cache is called a Translation Lookaside Buffer, or TLB
65. Translation Lookaside Buffers (TLBs)
- TLBs are usually small, typically 128-256 entries
- Like any other cache, the TLB can be direct mapped, set associative, or fully associative
[Diagram: the processor sends a VA to the TLB; on a hit, the resulting PA goes to the cache, which returns data on a hit; on a TLB miss, the translation comes from the page table; on a cache miss, the data comes from main memory.]
- On a TLB miss, get the page table entry from main memory
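A minimal sketch of the lookup flow in the diagram: probe a small fully associative TLB first, and fall back to the page table in memory on a miss. The TLB size and the FIFO replacement are illustrative assumptions, not from the slides:

```c
/* Sketch: fully associative TLB backed by a page table in memory. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS   10
#define TLB_ENTRIES 4

typedef struct { uint32_t vpn, ppn; int valid; } TlbEntry;

static TlbEntry tlb[TLB_ENTRIES];
static uint32_t page_table[64];     /* backing page table in memory */
static int next_victim = 0;         /* simple FIFO replacement      */

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn = vaddr >> PAGE_BITS;
    uint32_t off = vaddr & ((1u << PAGE_BITS) - 1);
    for (int i = 0; i < TLB_ENTRIES; i++)           /* compare all entries */
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (tlb[i].ppn << PAGE_BITS) | off; /* TLB hit */
    /* TLB miss: get the page table entry from main memory and cache it. */
    uint32_t ppn = page_table[vpn];
    tlb[next_victim] = (TlbEntry){vpn, ppn, 1};
    next_victim = (next_victim + 1) % TLB_ENTRIES;
    return (ppn << PAGE_BITS) | off;
}

int main(void) {
    page_table[3] = 9;                              /* hypothetical mapping */
    printf("0x%X -> 0x%X (TLB miss)\n", 0xC04u, translate(0xC04u));
    printf("0x%X -> 0x%X (TLB hit)\n",  0xC08u, translate(0xC08u));
    return 0;
}
```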
66. What If It Is Not in the TLB?
- Option 1: hardware checks the page table and loads the new page table entry into the TLB
- Option 2: hardware traps to the OS; it is up to the OS to decide what to do
- MIPS follows Option 2: the hardware knows nothing about the page table
67. What If the Data Is on Disk?
- We load the page off the disk into a free block of memory, using a DMA (Direct Memory Access; very fast!) transfer
- Meanwhile we switch to some other process waiting to be run
- When the DMA is complete, we get an interrupt and update the process's page table
- So when we switch back to the task, the desired data will be in memory
68. What If We Don't Have Enough Memory?
- We choose some other page belonging to a program and transfer it onto the disk if it is dirty
  - if it is clean (the disk copy is up to date), we just overwrite that data in memory
  - we choose the page to evict based on a replacement policy (e.g., LRU)
- We update that program's page table to reflect the fact that its memory moved somewhere else
- Continuously swapping between disk and memory is called thrashing
69. Virtual Memory Overview (1/4)
- User program's view of memory:
  - contiguous
  - starts from some set address
  - infinitely large
  - the only running program
- Reality:
  - non-contiguous
  - starts wherever available memory is
  - finite size
  - many programs running at a time
70. Virtual Memory Overview (2/4)
- Virtual memory provides:
  - the illusion of contiguous memory
  - all programs starting at the same set address
  - the illusion of infinite memory (2^32 or 2^64 bytes)
  - protection
71. Virtual Memory Overview (3/4)
- Implementation:
  - divide memory into chunks (pages)
  - the operating system controls the page table that maps virtual addresses into physical addresses
- Think of memory as a cache for disk
- The TLB is a cache for the page table
72. Virtual Memory Overview (4/4)
- Let's say we're fetching some data:
  - Check the TLB (input: VPN, output: PPN)
    - hit: fetch the translation
    - miss: check the page table (in memory)
      - page table hit: fetch the translation
      - page table miss: page fault; fetch the page from disk into memory, return the translation to the TLB
  - Check the cache (input: PPN, output: data)
    - hit: return the value
    - miss: fetch the value from memory
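A compact end-to-end sketch of this fetch sequence (everything here is hypothetical and sized tiny so the control flow stays visible): TLB, then page table, then page fault for the translation; then cache, then memory for the data:

```c
/* Sketch: TLB -> page table -> page fault, then cache -> memory.
 * One-entry TLB and cache; the page-fault handler just pretends a
 * DMA transfer filled a free physical page. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 10

static uint32_t tlb_vpn, tlb_ppn;  static int tlb_valid;       /* 1-entry TLB  */
static uint32_t pt_ppn[16];        static int pt_present[16];  /* page table   */
static uint32_t c_addr, c_data;    static int c_valid;         /* 1-line cache */
static uint32_t mem[16 << PAGE_BITS];                          /* physical mem */
static uint32_t next_free_ppn = 1;

static uint32_t page_fault(uint32_t vpn) {     /* "fetch page from disk" */
    uint32_t ppn = next_free_ppn++;            /* pretend DMA filled it  */
    pt_ppn[vpn] = ppn; pt_present[vpn] = 1;
    return ppn;
}

uint32_t fetch(uint32_t vaddr) {
    uint32_t vpn = vaddr >> PAGE_BITS, ppn;
    if (tlb_valid && tlb_vpn == vpn)  ppn = tlb_ppn;     /* TLB hit        */
    else {
        ppn = pt_present[vpn] ? pt_ppn[vpn]              /* page table hit */
                              : page_fault(vpn);         /* page fault     */
        tlb_vpn = vpn; tlb_ppn = ppn; tlb_valid = 1;     /* fill the TLB   */
    }
    uint32_t paddr = (ppn << PAGE_BITS) | (vaddr & ((1u << PAGE_BITS) - 1));
    if (c_valid && c_addr == paddr) return c_data;       /* cache hit      */
    c_addr = paddr; c_data = mem[paddr]; c_valid = 1;    /* miss: fill     */
    return c_data;
}

int main(void) {
    mem[(1u << PAGE_BITS) | 4] = 42;   /* value at the page the fault maps in */
    printf("%u\n", fetch(0x0004));     /* page fault + cache miss -> 42       */
    printf("%u\n", fetch(0x0004));     /* TLB hit + cache hit -> 42           */
    return 0;
}
```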
73. Overview of Address Translation
74. Virtual Memory System
75. Handling a Page Fault
76. A Typical Page Table Entry
77. Multilevel Forward Page Table
78. Hashed Page Table
79. Memory Hierarchy Implementation
80. Direct Mapped Cache
(a) Single word per block
(b) Multi-word per block
81. Fully Associative Cache
82. Set Associative Cache
83. Translation of a Virtual Word Address
84. Translation of a Virtual Page Address
85. Direct Mapped TLB
86. Other Configurations of TLB
(a) Set Associative TLB
(b) Fully Associative TLB
87. Interaction Between the TLB and the D-cache
88. Virtually Indexed D-cache
89. Input/Output Systems
90. Disk Drive Structures
91. Striping Data in Disk Arrays
92. Placement of Parity Blocks
93. Bus Design Parameters
94. Time Sharing the CPU