Title: Chapter 6: Memory
Chapter 6: Memory
- Memory is organized into a hierarchy
- Memory near the top of the hierarchy is faster, but also
  more expensive, so we have less of it in the computer.
  This presents a challenge - how do we make use of the
  faster memory without having to go down the hierarchy to
  slower memory?
- The CPU accesses memory at least once per fetch-execute cycle
  - Instruction fetch
  - Possible operand read(s)
  - Possible operand write
- RAM is much slower than the CPU, so we need a compromise - the cache
- We will explore memory here
- RAM, ROM, Cache, Virtual Memory
Types of Memory
- Cache
  - SRAM (static RAM), made up of flip-flops (like registers)
  - Slower than registers because of the added circuits needed to
    find the proper cache location, but much faster than RAM
  - DRAM is 10-100 times slower than SRAM
- ROM
  - Read-only memory - the contents of memory are fused into place
  - Variations
    - PROM - programmable (comes blank and the user can program it once)
    - EPROM - erasable PROM, where the contents of all of the PROM
      can be erased by using ultraviolet light
    - EEPROM - electrical fields can alter parts of the contents, so
      it is selectively erasable; a newer variation, flash memory,
      provides greater speed
- RAM
  - Stands for random access memory because you access into memory
    by supplying the address
  - It should really be called read-write memory (cache and ROMs are
    also random access memories)
  - Actually known as DRAM (dynamic RAM) and built out of capacitors
  - Capacitors lose their charge, so they must be recharged often
    (every couple of milliseconds); they also have destructive reads,
    so they must be recharged after a read
Memory Hierarchy Terms
- The goal of the memory hierarchy is to keep the contents that are
  needed now at or near the top of the hierarchy
- We discuss the performance of the memory hierarchy using the
  following terms
  - Hit - when the datum being accessed is found at the current level
  - Miss - when the datum being accessed is not found and the next
    level of the hierarchy must be examined
  - Hit rate - how many hits out of all memory accesses
  - Miss rate - how many misses out of all memory accesses
  - NOTE: hit rate = 1 - miss rate, miss rate = 1 - hit rate
  - Hit time - time to access this level of the hierarchy
  - Miss penalty - time to access the next level
Effective Access Time Formula
- We want to determine the impact that the memory hierarchy has on
  the CPU
  - In a pipelined machine, we expect 1 instruction to leave the
    pipeline each cycle
  - The system clock is usually set to the speed of the cache
  - But a memory access to DRAM takes more time, so this impacts the
    CPU's performance
- On average, we want to know how long a memory access takes
  (whether it is to cache, DRAM or elsewhere)
  - effective access time = hit time + miss rate * miss penalty
  - That is, our memory access, on average, is the time it takes to
    access the cache, plus, for a miss, how much time it takes to
    access memory
- With a 2-level cache, we can expand our formula
  - average memory access time = hit time_0 + miss rate_0 *
    (hit time_1 + miss rate_1 * miss penalty_1)
- We can expand the formula further to include access to swap space
  (hard disk); a small worked sketch follows this list
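A minimal sketch of both formulas in C. The hit times, miss rates
and miss penalties used here are assumed values for illustration,
not figures from the slides.

    #include <stdio.h>

    int main(void) {
        /* one-level cache: EAT = hit time + miss rate * miss penalty */
        double hit_time     = 5.0;   /* ns, cache hit time (assumed)   */
        double miss_rate    = 0.05;  /* i.e., a 95% hit rate (assumed) */
        double miss_penalty = 60.0;  /* ns, DRAM access time (assumed) */
        double eat1 = hit_time + miss_rate * miss_penalty;
        printf("one-level EAT = %.2f ns\n", eat1);  /* 5 + 0.05*60 = 8.00 ns */

        /* two-level cache:
           EAT = hit time_0 + miss rate_0 * (hit time_1 + miss rate_1 * miss penalty_1) */
        double eat2 = 5.0 + 0.05 * (10.0 + 0.02 * 60.0);
        printf("two-level EAT = %.2f ns\n", eat2);  /* 5 + 0.05*(10 + 1.2) = 5.56 ns */
        return 0;
    }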
Locality of Reference
- The better the hit rate for level 0, the better off we are
- Similarly, if we use 2 caches, we want the hit rate of level 1 to
  be as high as possible
- We want to implement the memory hierarchy to follow locality of
  reference
  - accesses to memory will generally be near recent memory
    accesses, and those in the near future will be around this
    current access
- Three forms of locality (see the short example after this list)
  - Temporal locality - recently accessed items tend to be accessed
    again in the near future (local variables, instructions inside a
    loop)
  - Spatial locality - accesses tend to be clustered (accessing a[i]
    will probably be followed by a[i+1] in the near future)
  - Sequential locality - instructions tend to be accessed
    sequentially
- How do we support locality of reference?
  - If we bring something into cache, bring in its neighbors as well
  - Keep an item in the cache for a while as we hope to keep using it
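As a hypothetical illustration, a simple array-summing loop in C
exhibits all three forms of locality:

    /* Illustrative only: a loop that exhibits all three forms of locality. */
    int sum_array(const int a[], int n) {
        int sum = 0;                   /* temporal locality: sum is reused every iteration    */
        for (int i = 0; i < n; i++) {  /* sequential locality: the loop's instructions repeat */
            sum += a[i];               /* spatial locality: a[i] is followed by a[i+1]        */
        }
        return sum;
    }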
Cache
- Cache is fast memory
- Used to store instructions and data
- It is hoped that what is needed will be in cache and what isn't
  needed will be moved out of cache back to memory
- Issues
  - What size cache? How many caches?
  - How do you access what you need?
    - Since cache only stores part of what is in memory, we need a
      mechanism to map from the memory address to the location in
      cache - this is known as the cache's mapping function
  - If you have to bring in something new, what do you discard?
    - this is known as the replacement strategy
  - What happens if you write a new value to cache?
    - we must update the now obsolete value(s) in memory
Cache and Memory Organization
- Group memory locations into lines (or refill lines)
  - For instance, 1 line might store 16 bytes, or 4 words
  - The line size varies architecture-to-architecture
- All main memory addresses are broken into two parts
  - the line number
  - the location (word) within the line
- If we have 256 Megabytes of word-addressed memory, with 4-byte
  words and 4 words per line, we would have 16,777,216 lines, so our
  26-bit address has 24 bits for the line number and 2 bits for the
  word in the line
- The cache has the same organization, but there are far fewer lines
  (say 1024 lines of 4 words each)
  - So the remainder of the address becomes the tag
  - The tag is used to make sure that the line we want is the line we
    found
  - The valid bit is used to determine if the given line has been
    modified or not (is the line in memory still valid or outdated?)
- The address split is sketched in code after this list
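A minimal sketch of that split in C, assuming the slide's 26-bit
word address with a 24-bit line number and a 2-bit word offset:

    #include <stdint.h>

    typedef struct {
        uint32_t line;   /* 24-bit line number         */
        uint32_t word;   /* 2-bit word within the line */
    } mem_addr_t;

    /* split a 26-bit word address into its line number and word offset */
    mem_addr_t split_address(uint32_t addr26) {
        mem_addr_t a;
        a.word = addr26 & 0x3;   /* low 2 bits: which of the 4 words in the line */
        a.line = addr26 >> 2;    /* remaining 24 bits: the line number           */
        return a;
    }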
Types of Cache
- The mapping function is based on the type of cache
- Direct-mapped - each entry in memory has 1 specific place where it
  can be placed in cache
  - this is a cheap, easy and fast cache to implement, and it needs
    no replacement strategy, but it has the poorest hit rate
- Associative - any memory item can be placed in any cache line
  - this cache uses associative memory so that an entry is searched
    for in parallel; this is expensive and tends to be slower than a
    direct-mapped cache
  - however, because we are free to place an entry anywhere, we can
    use a replacement strategy and thus get the best hit rate
- Set-associative - a compromise between these two extremes
  - lines are grouped into sets so that a line is mapped into a given
    set, but within that set the line can go anywhere
  - a replacement strategy is used to determine which line within a
    set should be used, so this cache improves on the hit rate of the
    direct-mapped cache while not being as expensive or as slow as
    the associative cache
Direct-Mapped Cache
- Assume m refill lines
- A line j in memory will be found in cache at
  location j mod m
- Since each line has 1 and only 1 location in cache, there is no
  need for a replacement strategy
- This yields a poor hit rate but fast (and cheap) performance
- All addresses are broken into 3 parts
  - a line number (to determine the line in cache)
  - a word number
  - the rest is the tag - compare the tag to make sure you have the
    right line
- Assume 24-bit addresses; if the cache has 16384 lines, each storing
  4 words, then the address breaks into an 8-bit tag, a 14-bit line
  number and a 2-bit word number (a lookup using this breakdown is
  sketched below)
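A minimal direct-mapped lookup in C using those parameters; the
cache_line_t layout is an illustrative assumption, not a real
hardware description.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_LINES 16384          /* 2^14 lines of 4 words each */

    typedef struct {
        bool     valid;
        uint8_t  tag;                /* 8-bit tag                  */
        uint32_t words[4];
    } cache_line_t;

    static cache_line_t cache[NUM_LINES];

    /* returns true on a hit and places the word in *value */
    bool dm_lookup(uint32_t addr24, uint32_t *value) {
        uint32_t word = addr24 & 0x3;             /* bits 1..0   : word in line */
        uint32_t line = (addr24 >> 2) & 0x3FFF;   /* bits 15..2  : line number  */
        uint8_t  tag  = (uint8_t)(addr24 >> 16);  /* bits 23..16 : tag          */

        if (cache[line].valid && cache[line].tag == tag) {
            *value = cache[line].words[word];     /* hit */
            return true;
        }
        return false;                             /* miss: go to the next level */
    }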
Associative Cache
- Any line in memory can be placed in any line in cache
- There is no line number portion of the address, just a tag and a
  word within the line
- Because the tag is longer, more tag storage space is needed in the
  cache, so these caches need more space and are more costly
- All tags are searched simultaneously using associative memory to
  find the tag requested
- This is both more expensive and slower than a direct-mapped cache,
  but because there are choices of where to place a new line,
  associative caches require a replacement strategy, which might
  require additional hardware to implement
- Notice how big the tag is - our cache now requires more space just
  to store the tags!
- From our previous example (24-bit addresses, 4 words per line), the
  address is now just a 22-bit tag and a 2-bit word number
Set-Associative Cache
- In order to provide some degree of variability in placement, we
  need more than a direct-mapped cache
- A 2-way set-associative cache provides 2 refill lines for each line
  number
  - Instead of n refill lines, there are now n / 2 sets, each set
    storing 2 refill lines
  - We can think of this as having 2 direct-mapped caches of half the
    size
  - Because there are half as many sets as there were refill lines,
    the set number has 1 fewer bit than the line number and the tag
    has 1 more (a 2-way lookup is sketched below)
- We can expand this to
  - 4-way set-associative
  - 8-way set-associative
  - 16-way set-associative, etc.
- As the number of ways increases, the hit rate improves, but the
  expense also increases and the hit time gets worse
- Eventually we reach an n-way cache, which is a fully associative
  cache
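A sketch of a 2-way set-associative lookup, continuing the same
assumed 24-bit example: 16384 lines grouped into 8192 sets of 2,
giving a 13-bit set number and a 9-bit tag.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_SETS 8192            /* 16384 lines / 2 ways */
    #define WAYS     2

    typedef struct {
        bool     valid;
        uint16_t tag;                /* 9-bit tag            */
        uint32_t words[4];
    } way_t;

    static way_t cache[NUM_SETS][WAYS];

    bool sa_lookup(uint32_t addr24, uint32_t *value) {
        uint32_t word = addr24 & 0x3;              /* bits 1..0   : word in line */
        uint32_t set  = (addr24 >> 2) & 0x1FFF;    /* bits 14..2  : set number   */
        uint16_t tag  = (uint16_t)(addr24 >> 15);  /* bits 23..15 : tag (9 bits) */

        for (int w = 0; w < WAYS; w++) {           /* search both ways of the set */
            if (cache[set][w].valid && cache[set][w].tag == tag) {
                *value = cache[set][w].words[word];
                return true;                       /* hit */
            }
        }
        return false;                              /* miss */
    }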
Replacement and Write Strategies
- When we need to bring in a new line from memory, we will have to
  throw out a line - which one?
  - There is no choice in a direct-mapped cache
  - For associative and set-associative caches, we have choices
- We rely on a replacement strategy to make the best choice
  - it should promote locality of reference
- 3 replacement strategies are (two are sketched in code below)
  - Least recently used (hard to implement - how do we determine
    which line was least recently used?)
  - First-in, first-out (easy to implement, but not very good results)
  - Random
- If we write a datum to cache, what about writing it to memory?
  - Write-through - write to both cache and memory at the same time
    - if we write to several data in the same line, though, this
      becomes inefficient
  - Write-back - wait until the refill line is being discarded and
    write back any changed values to memory at that time
    - this causes stale or dirty values in memory
Virtual Memory
- Just as DRAM acts as a backup for cache, the hard disk (known as
  the swap space) acts as a backup for DRAM
  - This is known as virtual memory
- Virtual memory is necessary because most programs are too large to
  store entirely in memory
- Also, there are parts of a program that are not used very often, so
  why waste the time loading those parts into memory if they won't be
  used?
- Page - a fixed-size unit of memory; all programs and data are
  broken into pages
- Paging - the process of bringing in a page when it is needed (this
  might require throwing a page out of memory, moving it back to the
  swap disk)
- The operating system is in charge of virtual memory for us
  - it moves needed pages into memory from disk and keeps track of
    where a specific page is placed
The Paging Process
- When the CPU generates a memory address, it is a logical (or
  virtual) address
  - The first address of a program is 0, so the logical address is
    merely an offset into the program or into the data segment
  - For instance, address 25 is located 25 locations from the
    beginning of the program
- But 25 is not the physical address in memory, so the logical
  address must be translated (or mapped) into a physical address
- Assume memory is broken into fixed-size units known as frames
  (1 page fits into 1 frame)
- We know the logical address as its page number and the offset into
  the page
- We have to translate the page into the frame (that is, where is
  that particular page currently stored in memory - or is it even in
  memory?)
- Thus, the mapping process for paging means finding the frame and
  replacing the page number with it (a small translation sketch
  follows)
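A minimal sketch of that translation in C; the page size, table
layout and field names are assumptions for illustration.

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_BITS 10                 /* assume 1 KB pages -> 10-bit offset */

    typedef struct {
        bool     present;                /* is this page currently in a frame? */
        uint32_t frame;                  /* frame number, if present           */
    } pte_t;

    /* returns true on success; false means a page fault must be handled */
    bool translate(const pte_t page_table[], uint32_t logical, uint32_t *physical) {
        uint32_t page   = logical >> PAGE_BITS;               /* page number        */
        uint32_t offset = logical & ((1u << PAGE_BITS) - 1);  /* offset within page */

        if (!page_table[page].present)
            return false;                /* page fault: the OS must load the page */

        *physical = (page_table[page].frame << PAGE_BITS) | offset;
        return true;
    }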
Example of Paging
Here, we have a process of 8 pages but only 4 physical frames in
memory; therefore we must place a page into one of the available
frames in memory whenever a page is needed. At this point in time,
pages 0, 3, 4 and 7 have been moved into memory at frames 2, 0, 1 and
3 respectively. This information (of which page is stored in which
frame) is stored in memory in a location known as the page table. The
page table also stores whether the given page has been modified (the
valid bit, much like our cache).
A More Complete Example
Virtual addresses are mapped to physical addresses through the page
table. Address 1010 is page 101, item 0. Page 101 (5) is located in
frame 11 (3), so item 1010 is found at physical address 110. (The
figure shows the logical and physical memory for our program.)
Page Faults
- Just as cache is limited in size, so is main memory - a process is
  usually given a limited number of frames
- What if a referenced page is not currently in memory?
  - The memory reference causes a page fault
  - The page fault requires that the OS handle the problem
- The process status is saved and the CPU switches to the OS
- The OS determines if there is an empty frame for the referenced
  page; if not, then the OS uses a replacement strategy to select a
  page to discard
  - if that page is dirty, then the page must be written to disk
    instead of discarded
- The OS locates the requested page on disk and loads it into the
  appropriate frame in memory
- The page table is modified to reflect the change
- Page faults are time consuming because of the disk access - this
  causes our effective memory access time to deteriorate badly!
Another Paging Example
Here, we have 13 bits for our addresses even though main memory is
only 4K (2^12).
The Full Paging Process
We want to avoid memory accesses (we prefer cache accesses), but if
every memory access now requires first accessing the page table,
which is itself in memory, it slows down our computer. So we move the
most-used portion of the page table into a special cache known as the
Table Lookaside Buffer or Translation Lookaside Buffer, abbreviated
as the TLB. A small sketch of consulting the TLB before the page
table follows.
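The sketch below assumes a small, fully associative TLB searched
entry by entry; the tlb_entry_t layout and TLB_ENTRIES value are
illustrative assumptions (real hardware searches all entries in
parallel).

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 16

    typedef struct {
        bool     valid;
        uint32_t page;    /* virtual page number   */
        uint32_t frame;   /* physical frame number */
    } tlb_entry_t;

    /* returns true on a TLB hit, placing the frame number in *frame */
    bool tlb_lookup(const tlb_entry_t tlb[TLB_ENTRIES], uint32_t page, uint32_t *frame) {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].page == page) {
                *frame = tlb[i].frame;   /* hit: no page-table access needed    */
                return true;
            }
        }
        return false;                    /* miss: walk the page table in memory */
    }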
A Variation: Segmentation
- One flaw of paging is that, because a page is fixed in size, a
  chunk of code might be divided across two or more pages
  - So page faults can occur at any time
  - Consider, as an example, a loop which crosses 2 pages
  - If the OS must remove one of the two pages to load the other,
    then the OS generates 2 page faults for each loop iteration!
- A variation of paging is segmentation
  - instead of fixed-size blocks, programs are divided into
    variable-size procedural units
  - We subdivide programs into procedures
  - We subdivide data into structures (e.g., arrays, structs)
  - We still use the on-demand approach of virtual memory, but when a
    block of code is needed, the entire block is loaded into memory
- Segmentation uses a segment table instead of a page table and works
  similarly, although addresses are put together differently
- But segmentation causes fragmentation - when a segment is discarded
  from memory to make room for a new segment, there may be a chunk of
  memory that goes unused
- One solution to fragmentation is to use paging with segmentation
Effective Access With Paging
- We modify our previous formula to include the impact of paging
  - effective access time = hit time_0 + miss rate_0 * (hit time_1 +
    miss rate_1 * (hit time_2 + miss rate_2 * miss penalty_2))
  - Level 0 is on-chip cache
  - Level 1 is off-chip cache
  - Level 2 is main memory
  - Level 3 is disk (miss penalty_2 is the disk access time, which is
    lengthy)
- Example (worked through in code below)
  - On-chip cache hit rate is 90%, hit time is 5 ns; off-chip cache
    hit rate is 96%, hit time is 10 ns; main memory hit rate is
    99.8%, hit time is 60 ns; memory miss penalty is 10,000 ns
  - the memory miss penalty is the same as the disk hit time, or disk
    access time
  - Access time = 5 ns + .10 * (10 ns + .04 * (60 ns + .002 *
    10,000 ns)) = 6.32 ns
- So our memory hierarchy adds over 20% to our memory access time
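The same numbers plugged into the expanded formula, as a small C
check:

    #include <stdio.h>

    int main(void) {
        double t0 = 5.0,  m0 = 0.10;    /* on-chip cache : 90% hits,    5 ns */
        double t1 = 10.0, m1 = 0.04;    /* off-chip cache: 96% hits,   10 ns */
        double t2 = 60.0, m2 = 0.002;   /* main memory   : 99.8% hits, 60 ns */
        double disk = 10000.0;          /* miss penalty_2: disk access, ns   */

        double eat = t0 + m0 * (t1 + m1 * (t2 + m2 * disk));
        printf("effective access time = %.2f ns\n", eat);   /* prints 6.32 ns */
        return 0;
    }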
Memory Organization
Here we see a typical memory layout: two on-chip caches, one for data
and one for instructions, with part of each cache reserved for a TLB;
one off-chip cache to back up both on-chip caches; and main memory,
backed up by virtual memory.