Memory and I/O Systems

Transcript and Presenter's Notes

Title: Memory and IO Systems


1

EE 382N Superscalar Microprocessor
Architecture Chapter 3
  • Memory and I/O Systems
  • Prof. Lizy Kurian John

2
A Typical Computer System
3
Memory Hierarchy
4
Properties of ideal memory system
  • Infinite capacity
  • Infinite bandwidth
  • Instantaneous or zero latency
  • Persistence or non-volatility
  • Low implementation cost

5
Memory Hierarchy Components
6
Memory Hierarchy
As we move to deeper (lower) levels of the hierarchy, latency goes up
and price per bit goes down.
7
Memory Hierarchy
  • If a level is closer to the Processor, it must be
  • smaller
  • faster
  • a subset of lower levels (contains most recently
    used data)
  • The Lowest Level (usually disk) contains all
    available data
  • Other levels?

8
Attributes of memory hierarchy components
9
Why We Use Caches
(Plot: relative performance, 1980-2000, log scale. CPU performance grows
about 60%/year (Moore's Law) while DRAM performance grows about 7%/year,
so the Processor-Memory Performance Gap grows about 50%/year.)
  • 1989: first Intel CPU with cache on chip
  • 1998: Pentium III has two levels of cache on chip
  • 2007: many chips have 3 levels of cache

10
Memory Hierarchy Basis
  • Disk contains everything.
  • When the Processor needs something, bring it into
    all higher levels of memory.
  • Cache contains copies of data in memory that are
    being used.
  • Memory contains copies of data on disk that are
    being used.
  • Entire idea is based on Locality

11
Locality
  • Temporal Locality: if we use it now, we'll want
    to use it again soon (a Big Idea)
  • Spatial Locality: if we use something now, we'll
    want to use things near it very soon
  • Caches contain the hardware mechanisms to capture
    the temporal and spatial locality in programs

12
Temporal and Spatial Locality
13
Capturing Locality
  • Temporal Locality: save what you bring in
  • Spatial Locality: bring in nearby items too, i.e.,
    use large blocks

14
Cache Design
  • How do we decide what to bring into the cache?
  • How do we decide where to put it?
  • How do we know which elements are in cache?
  • How do we quickly locate them?
  • When we bring something in, if there is no space,
    how do we make space for it?

15
Cache design
  • Mapping Strategies
  • Direct mapped
  • Set Associative
  • Fully Associative
  • Replacement Strategies
  • LRU (Least recently used)
  • Random, FIFO, OPTIMAL

16
Cache Organization schemes
(a) Direct Mapped
(b) Fully Associative
(c) Set Associative
17
Cache Mapping Strategies
  • Direct-Mapped Cache: each memory address or block
    can go into only one specific location in the
    cache
  • Set Associative: a block can occupy any position
    within a set
  • Fully Associative: a block can be written into any
    position

18
Direct-Mapped Cache
  • Cache Location 0 can be occupied by data from
  • Memory location 0, 4, 8, ...
  • With 4 blocks → any memory location that is a
    multiple of 4

19
Associative Cache Example
  • Here's a simple 2-way set-associative cache.

20
Fully Associative Cache
  • Any cache location can be occupied by data from
    any block

21
Tag and Index bits
  • Since multiple memory addresses map to the same cache
    index, how do we tell which one is in there?
  • What if we have a block size > 1 byte?

22
Locating Stuff in Cache
  • Index: specifies the cache index (which row of
    the cache we should look in)
  • Offset: once we've found the correct block, specifies
    which byte within the block we want
  • Tag: the remaining bits after offset and index
    are determined; these are used to distinguish
    between all the memory addresses that map to the
    same location

23
Direct-Mapped Cache Example
  • Index (index into an array of blocks)
  • need to specify the correct row in the cache
  • cache contains 16 KB = 2^14 bytes
  • block contains 2^4 bytes (4 words)
  • blocks/cache
    = bytes/cache / bytes/block
    = 2^14 bytes/cache / 2^4 bytes/block
    = 2^10 blocks/cache
  • need 10 bits to specify this many rows

24
Direct-Mapped Cache Example
  • Tag: use the remaining bits as the tag
  • tag length = address length - offset - index
    = 32 - 4 - 10 = 18 bits
  • so the tag is the leftmost 18 bits of the memory address
  • Why not use the full 32-bit address as the tag?
  • All bytes within a block share the same address
    apart from the offset (4 bits)
  • The index must be the same for every address within a
    block, so it is redundant in the tag check and can be
    left out to save memory (here 10 bits)

25
Accessing data in a direct mapped cache
  • 4 Addresses
  • 0x00000014, 0x0000001C, 0x00000034, 0x00008014
  • 4 Addresses divided (for convenience) into Tag,
    Index, Byte Offset fields

    Address      Tag (18 bits)        Index (10 bits)  Offset (4 bits)
    0x00000014   000000000000000000   0000000001       0100
    0x0000001C   000000000000000000   0000000001       1100
    0x00000034   000000000000000000   0000000011       0100
    0x00008014   000000000000000010   0000000001       0100
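A minimal sketch (not from the slides) that reproduces this field split,
assuming the 16 KB direct-mapped cache with 16-byte blocks and 32-bit
addresses from the previous slides:

    # Sketch: split a 32-bit address into tag/index/offset for the assumed
    # 16 KB direct-mapped cache with 16-byte (4-word) blocks.
    OFFSET_BITS = 4    # 2^4 = 16 bytes per block
    INDEX_BITS = 10    # 2^14 bytes / 2^4 bytes per block = 2^10 blocks

    def split(addr):
        offset = addr & ((1 << OFFSET_BITS) - 1)
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        return tag, index, offset

    for a in (0x00000014, 0x0000001C, 0x00000034, 0x00008014):
        tag, index, offset = split(a)
        print(f"{a:#010x}: tag={tag:018b} index={index:010b} offset={offset:04b}")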
26
Fully Associative Cache
  • Fully Associative Cache (e.g., 32 B block)
  • compare tags in parallel

27
Fully Associative Cache (1/2)
  • What does this mean?
  • no rows: any block can go anywhere in the cache
  • must compare against all tags in the entire cache to see
    if the data is there
  • Memory address fields:
  • Tag: same as before
  • Offset: same as before
  • Index: non-existent

28
Fully Associative Cache (2/2)
  • Benefit of a Fully Associative Cache
  • No Conflict Misses (since data can go anywhere)
  • Drawbacks of a Fully Associative Cache
  • Need a hardware comparator for every single entry:
    if we have 64 KB of data in a cache with 4 B
    entries, we need 16K comparators, which is infeasible

29
Caching Terminology
  • When we try to read memory, 3 things can happen:
  • cache hit: the cache block is valid and contains the
    proper address, so read the desired word
  • cache miss: nothing in the cache at the appropriate
    block, so fetch from memory
  • cache miss, block replacement required: the data is not
    in the cache and some other data is in the space; fetch
    the desired data from memory and replace

30
Block Replacement Policy (1/2)
  • Direct-Mapped Cache: the index completely specifies
    which position a block can go into on a miss
  • N-Way Set Associative: the index specifies a set, but the
    block can occupy any position within the set on a miss
  • Fully Associative: the block can be written into any
    position
  • Question: if we have the choice, where should we
    write an incoming block?

31
Block Replacement Policy (2/2)
  • If there are any locations with valid bit off
    (empty), then usually write the new block into
    the first one.
  • If all possible locations already have a valid
    block, we must pick a replacement policy rule by
    which we determine which block gets cached out
    on a miss.

32
Block Replacement Policy LRU
  • LRU (Least Recently Used)
  • Idea: cache out the block which has been accessed
    (read or write) least recently
  • Pro: temporal locality → recent past use implies
    likely future use; in fact, this is a very
    effective policy
  • Con: with 2-way set assoc, easy to keep track
    (one LRU bit); with 4-way or greater, requires
    complicated hardware and much time to keep track
    of this

33
Block Replacement Example
  • We have a 2-way set-associative cache with a four-word
    total capacity and one-word blocks. We
    perform the following word accesses (ignore bytes
    for this problem):
  • 0, 2, 0, 1, 4, 0, 2, 3, 5, 4
  • How many hits and how many misses will there be
    with the LRU block replacement policy?

34
Block Replacement Example LRU
  • Addresses: 0, 2, 0, 1, 4, 0, ...
  • 0: miss, bring into set 0 (loc 0)
  • 2: miss, bring into set 0 (loc 1)
  • 0: hit
  • 1: miss, bring into set 1 (loc 0)
  • 4: miss, bring into set 0 (loc 1, replace 2, the LRU block)
  • 0: hit
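A minimal simulator sketch (not from the slides; it assumes word addresses
and set index = address mod 2) that replays the full access sequence from
the previous slide and counts hits and misses:

    # Sketch: 2-way set-associative cache, four one-word blocks (2 sets),
    # LRU replacement. Replays the example's access sequence.
    accesses = [0, 2, 0, 1, 4, 0, 2, 3, 5, 4]
    sets = {0: [], 1: []}        # each set holds up to 2 addresses, MRU last
    hits = misses = 0

    for addr in accesses:
        s = sets[addr % 2]       # set index = address mod number of sets
        if addr in s:
            hits += 1
            s.remove(addr)       # will be re-appended as most recently used
        else:
            misses += 1
            if len(s) == 2:
                s.pop(0)         # evict the least recently used block
        s.append(addr)

    print(hits, misses)          # 2 hits, 8 misses for this sequence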
35
Block Size Tradeoff Conclusions
36
Block Size Tradeoff (1/3)
  • Benefits of Larger Block Size
  • Spatial Locality: if we access a given word,
    we're likely to access other nearby words soon
  • Very applicable with the Stored-Program Concept: if
    we execute a given instruction, it's likely that
    we'll execute the next few as well
  • Works nicely for sequential array accesses too

37
Block Size Tradeoff (2/3)
  • Drawbacks of Larger Block Size
  • A larger block size means a larger miss penalty
  • on a miss, it takes longer to load a new block
    from the next level
  • If the block size is too big relative to the cache size,
    then there are too few blocks
  • Result: miss rate goes up
  • In general, minimize Average Memory Access Time
    (AMAT)
  • AMAT = Hit Time + Miss Penalty x Miss Rate

38
Block Size Tradeoff (3/3)
  • Hit Time: time to find and retrieve data from the
    current level cache
  • Miss Penalty: average time to retrieve data on a
    current-level miss (includes the possibility of
    misses at successive levels of the memory hierarchy)
  • Hit Rate: % of requests that are found in the
    current level cache
  • Miss Rate = 1 - Hit Rate

39
Cache Design Parameters
40
What to do on a write hit?
  • Write-through
  • update the word in the cache block and the corresponding
    word in memory
  • Write-back
  • update the word in the cache block
  • allow the memory word to be stale
  • write it back later
  • → add a dirty bit to each block indicating that
    memory needs to be updated when the block is replaced
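A minimal sketch contrasting the two write-hit policies (not from the
slides; the Block class and the dict used as memory are illustrative
stand-ins for the hardware):

    # Sketch: handling a write hit under write-through vs. write-back.
    from dataclasses import dataclass

    @dataclass
    class Block:                     # illustrative cache block
        data: list
        dirty: bool = False

    def write_hit_write_through(block, offset, value, memory, addr):
        block.data[offset] = value   # update the cached word
        memory[addr] = value         # and immediately update the memory word

    def write_hit_write_back(block, offset, value):
        block.data[offset] = value   # update only the cached word
        block.dirty = True           # memory is stale; write back on eviction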

41
Write Allocate/No-Write-Allocate
  • With a WT strategy, what happens on a write miss? Is
    the block brought into the cache on a write?
  • WTNWA (write-through, no-write-allocate) - NO
  • WTWA (write-through, write-allocate) - YES

42
Types of Cache Misses (1/2)
  • Three Cs Model of Misses
  • 1st C: Compulsory Misses
  • occur when a program is first started
  • the cache does not contain any of that program's data
    yet, so misses are bound to occur
  • can't be avoided easily, so we won't focus on these
    in this course

43
Types of Cache Misses (2/2)
  • 2nd C: Conflict Misses
  • a miss that occurs because two distinct memory
    addresses map to the same cache location
  • two blocks (which happen to map to the same
    location) can keep overwriting each other
  • a big problem in direct-mapped caches
  • how do we lessen the effect of these?
  • Dealing with Conflict Misses
  • Solution 1: make the cache size bigger
  • fails at some point
  • Solution 2: let multiple distinct blocks fit in
    the same cache index (set associativity)

44
Third Type of Cache Miss
  • Capacity Misses
  • a miss that occurs because the cache has a limited
    size
  • a miss that would not occur if we increased the size
    of the cache
  • a sketchy definition, so just get the general idea
  • This is the primary type of miss for Fully
    Associative caches.

45
  • Average Memory Access Time (AMAT)
    = Hit Time + Miss Penalty x Miss Rate
  • CPI = Ideal CPI (Core CPI) + MCPI
  • MCPI = Memory CPI

46
Example
  • Assume
  • Hit Time = 1 cycle
  • Miss rate = 5%
  • Miss penalty = 20 cycles
  • Calculate AMAT
  • Avg mem access time
    = 1 + 0.05 x 20
    = 1 + 1 cycles
    = 2 cycles
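A small sketch of these formulas (the MCPI expression in terms of memory
accesses per instruction is a common formulation, not spelled out on the
slides):

    # Sketch: AMAT and memory-stall CPI from the formulas above.
    def amat(hit_time, miss_rate, miss_penalty):
        return hit_time + miss_rate * miss_penalty

    print(amat(1, 0.05, 20))   # 1 + 0.05 x 20 = 2 cycles

    def mcpi(accesses_per_instr, miss_rate, miss_penalty):
        # CPI = Core CPI + MCPI
        return accesses_per_instr * miss_rate * miss_penalty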

47
Cache Area Overhead
  • A cache contains useful data plus the Tag, Valid bit,
    dirty bit, etc.
  • If a cache is described as 16K bytes, often
    16 KB is the useful data capacity
  • The cache RAM is often 20 or 24K bytes
  • The amount of area spent on tags depends on the
    mapping strategy and block size
  • Fully associative means more tag area
  • A small block size means more tag area
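A worked sketch of this overhead for an assumed 16 KB direct-mapped cache
with 32-byte blocks, 32-bit addresses, and one valid plus one dirty bit per
block (the parameters are illustrative, not from the slides):

    # Sketch: tag + status-bit overhead for an assumed cache configuration.
    cache_bytes, block_bytes, addr_bits = 16 * 1024, 32, 32
    blocks = cache_bytes // block_bytes                  # 512 blocks
    offset_bits = block_bytes.bit_length() - 1           # 5
    index_bits = blocks.bit_length() - 1                 # 9
    tag_bits = addr_bits - index_bits - offset_bits      # 18
    overhead_bits = blocks * (tag_bits + 2)              # tag + valid + dirty
    print(overhead_bits / 8 / 1024)                      # 1.25 KB beyond the 16 KB of data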

48
A Typical Memory Hierarchy
49
A Typical Main Memory Organization
50
DRAM Chip Organization
51
Memory Module Organization
52
Virtual Memory System
53
Another View of the Memory Hierarchy
(Diagram: levels from the upper (faster) end to the lower (larger) end of
the hierarchy, with the unit transferred between adjacent levels.)
  • Regs ↔ Cache: instructions and operands
  • Cache ↔ L2 Cache: blocks
  • L2 Cache ↔ Memory: blocks
  • Memory ↔ Disk: pages
  • Disk ↔ Tape: files
54
Memory Hierarchy Requirements
  • If the Principle of Locality allows caches to offer
    (close to) the speed of cache memory with the size of
    DRAM memory, then recursively, why not use it at the next
    level to get the speed of DRAM memory with the size of
    disk memory?
  • While we're at it, what other things do we need
    from our memory system?

55
Virtual Memory
  • Allows the OS to share memory and protect programs from
    each other
  • Today, more important for protection than as just
    another level of the memory hierarchy
  • Each process thinks it has all the memory to
    itself
  • Historically, it predates caches

56
Comparing the 2 levels of hierarchy
  Cache Version                            Virtual Memory Version
  Block or Line                            Page
  Miss                                     Page Fault
  Block Size: 32-64 B                      Page Size: 4 KB-8 KB
  Placement: Direct Mapped,                Placement: Fully Associative
    N-way Set Associative
  Replacement: LRU or Random               Replacement: LRU
  Write Thru or Back                       Write Back
57
Virtual to Physical Addr. Translation
(Diagram: virtual addresses from instruction fetches, loads, and stores go
through a HW mapping to physical addresses used by physical memory,
including the caches.)
  • Each program operates in its own virtual address
    space as if it were the only program running
  • Each is protected from the others
  • The OS can decide where each goes in memory
  • Hardware (HW) provides the virtual → physical mapping
58
Mapping Virtual Memory to Physical Memory
  • Divide virtual memory into equal-sized chunks (about 4 KB - 8 KB)
  • Any chunk of Virtual Memory can be assigned to any chunk
    of Physical Memory (a page)
(Diagram: a program's virtual address space, including its stack, mapped
onto a 64 MB physical memory.)
59
Paging Organization (assume 1 KB pages)
Page is unit of mapping
Page also unit of transfer from disk to physical
memory
60
  • Virtual Memory Mapping
  • Use a table lookup (the Page Table) for mappings;
    the virtual page number is the index
  • Physical Page Number = PageTable[Virtual Page Number]
  • (the P.P.N. is also called the Page Frame)
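A minimal sketch of this lookup (assuming the 1 KB pages from the previous
slide and a simple flat page table; the mappings shown are illustrative):

    # Sketch: virtual-to-physical translation via a flat page table.
    PAGE_OFFSET_BITS = 10                      # 1 KB pages

    def translate(virtual_addr, page_table):
        vpn = virtual_addr >> PAGE_OFFSET_BITS               # index into the page table
        offset = virtual_addr & ((1 << PAGE_OFFSET_BITS) - 1)
        ppn = page_table[vpn]                                 # physical page number (frame)
        return (ppn << PAGE_OFFSET_BITS) | offset

    page_table = {0: 3, 1: 7}                  # illustrative VPN -> PPN mappings
    print(hex(translate(0x0404, page_table)))  # VPN 1, offset 4 -> frame 7 -> 0x1c04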

61
Page Table
  • A page table is an operating system structure
    which contains the mapping of virtual addresses
    to physical locations

62
Address Mapping Page Table
Page Table located in physical memory
63
Paging/Virtual Memory Multiple Processes
(Diagram: User A's and User B's virtual address spaces, each containing
Code, Static, Heap, and Stack regions starting at address 0, are mapped
onto disjoint pages of a shared 64 MB physical memory.)
64
Virtual Memory Problem 1
  • Map every address → 1 indirection through the Page Table
    in memory per virtual address → 1 virtual memory
    access = 2 physical memory accesses → SLOW!
  • Observation: since there is locality in pages of data,
    there must be locality in the virtual address
    translations of those pages
  • Since small is fast, why not use a small cache of
    virtual-to-physical address translations to make
    translation fast?
  • For historical reasons, this cache is called a
    Translation Lookaside Buffer, or TLB

65
Translation Look-Aside Buffers (TLBs)
  • TLBs are usually small, typically 128 - 256 entries
  • Like any other cache, the TLB can be direct
    mapped, set associative, or fully associative

(Diagram: the Processor sends a VA to the TLB Lookup. On a hit, the PA goes
to the Cache, which returns data on a hit or fetches it from Main Memory on
a miss. On a TLB miss, the translation hardware gets the page table entry
from main memory.)
66
What if not in TLB?
  • Option 1: Hardware checks the page table and loads
    the new Page Table Entry into the TLB
  • Option 2: Hardware traps to the OS; it is up to the OS to
    decide what to do
  • MIPS follows Option 2: the hardware knows nothing
    about the page table

67
What if the data is on disk?
  • We load the page off the disk into a free block
    of memory, using a DMA (Direct Memory Access,
    very fast!) transfer
  • Meanwhile we switch to some other process waiting
    to be run
  • When the DMA is complete, we get an interrupt and
    update the process's page table
  • So when we switch back to the task, the desired
    data will be in memory

68
What if we dont have enough memory?
  • We choose some other page belonging to a program
    and transfer it to the disk if it is dirty
  • If it is clean (the disk copy is up-to-date), just
    overwrite that data in memory
  • We choose the page to evict based on a replacement
    policy (e.g., LRU)
  • And update that program's page table to reflect
    the fact that its memory moved somewhere else
  • Continuously swapping between disk and memory is
    called Thrashing

69
Virtual Memory Overview (1/4)
  • User program view of memory
  • Contiguous
  • Start from some set address
  • Infinitely large
  • Is the only running program
  • Reality
  • Non-contiguous
  • Start wherever available memory is
  • Finite size
  • Many programs running at a time

70
Virtual Memory Overview (2/4)
  • Virtual memory provides
  • illusion of contiguous memory
  • all programs starting at same set address
  • illusion of infinite memory (2^32 or 2^64 bytes)
  • protection

71
Virtual Memory Overview (3/4)
  • Implementation
  • Divide memory into chunks (pages)
  • Operating system controls page table that maps
    virtual addresses into physical addresses
  • Think of memory as a cache for disk
  • TLB is a cache for the page table

72
Virtual Memory Overview (4/4)
  • Let's say we're fetching some data:
  • Check the TLB (input: VPN; output: PPN)
  • hit: fetch translation
  • miss: check the page table (in memory)
  • Page table hit: fetch translation
  • Page table miss: page fault; fetch the page from disk
    to memory, return the translation to the TLB
  • Check the cache (input: PPN; output: data)
  • hit: return the value
  • miss: fetch the value from memory
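A minimal sketch of this flow (not from the slides; dicts stand in for the
TLB, page table, cache, and memory, and 4 KB pages are assumed):

    # Sketch: TLB -> page table -> cache lookup flow for a data fetch.
    def access(vaddr, tlb, page_table, cache, memory, page_bits=12):
        vpn, offset = vaddr >> page_bits, vaddr & ((1 << page_bits) - 1)

        if vpn in tlb:                           # 1. TLB hit: VPN in, PPN out
            ppn = tlb[vpn]
        elif vpn in page_table:                  # 2. TLB miss: check the page table
            ppn = tlb[vpn] = page_table[vpn]     #    return the translation to the TLB
        else:                                    # 3. Page fault: OS loads the page from disk
            raise RuntimeError("page fault: load page, update page table, retry")

        paddr = (ppn << page_bits) | offset
        if paddr in cache:                       # 4. Cache hit: return the value
            return cache[paddr]
        cache[paddr] = memory[paddr]             #    Cache miss: fetch the value from memory
        return cache[paddr]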

73
Overview of Address Translation
74
Virtual Memory System
75
Handling a Page Fault
76
A Typical Page Table Entry
77
Multilevel Forward Page Table
78
Hashed Page Table
79
Memory Hierarchy Implementation
80
Direct Mapped Cache
(a) Single Word Per Block
(b) Multi-Word Per Block
81
Fully Associative Cache
82
Set Associative Cache
83
Translation of Virtual Word Address
84
Translation of Virtual Page Address
85
Direct Mapped TLB
86
Other configurations of TLB
(a) Set Associative TLB
(b) Fully Associative TLB
87
Interaction between TLB and D-cache
88
Virtually Indexed D-cache
89
Input/Output Systems
90
Disk Drive Structures
91
Striping Data in Disk Arrays
92
Placement of Parity Blocks
93
Bus Design Parameters
94
Time Sharing the CPU