Title: Chapter 6: Memory
Chapter 6: Memory
- Memory is organized into a hierarchy
- Memory near the top of the hierarchy is faster, but also
  more expensive, so we have less of it in the computer.
  This presents a challenge - how do we make use of the
  faster memory without having to go down the hierarchy to
  slower memory?
- The CPU accesses memory at least once per fetch-execute cycle
  - Instruction fetch
  - Possible operand read(s)
  - Possible operand write
- RAM is much slower than the CPU, so we need a compromise - the cache
- We will explore memory here
- RAM, ROM, Cache, Virtual Memory
Types of Memory
- Cache
  - SRAM (static RAM), made up of flip-flops (like registers)
  - Slower than registers because of the added circuits needed to
    find the proper cache location, but much faster than RAM
  - DRAM is 10-100 times slower than SRAM
- ROM
  - Read-only memory - the contents of memory are fused into place
  - Variations
    - PROM - programmable (comes blank and the user can program it once)
    - EPROM - erasable PROM, where the contents of all of the PROM
      can be erased by using ultraviolet light
    - EEPROM - electrical fields can alter parts of the contents, so
      it is selectively erasable; a newer variation, flash memory,
      provides greater speed
- RAM
  - Stands for random access memory because you access into memory
    by supplying the address
  - It should really be called read-write memory (cache and ROMs are
    also random access memories)
  - Actually known as DRAM (dynamic RAM) and built out of capacitors
  - Capacitors lose their charge, so they must be recharged often
    (every couple of milliseconds); they also have destructive reads,
    so they must be recharged after a read
Memory Hierarchy Terms
- The goal of the memory hierarchy is to keep the contents that are
  needed now at or near the top of the hierarchy
- We discuss the performance of the memory hierarchy using the
  following terms
  - Hit - when the datum being accessed is found at the current level
  - Miss - when the datum being accessed is not found and the next
    level of the hierarchy must be examined
  - Hit rate - how many hits out of all memory accesses
  - Miss rate - how many misses out of all memory accesses
  - NOTE: hit rate = 1 - miss rate, miss rate = 1 - hit rate
  - Hit time - time to access this level of the hierarchy
  - Miss penalty - time to access the next level
Effective Access Time Formula
- We want to determine the impact that the memory hierarchy has on
  the CPU
  - In a pipelined machine, we expect 1 instruction to leave the
    pipeline each cycle
  - The system clock is usually set to the speed of the cache
  - But a memory access to DRAM takes more time, so this impacts the
    CPU's performance
- On average, we want to know how long a memory access takes
  (whether it is to cache, DRAM or elsewhere)
  - effective access time = hit time + miss rate * miss penalty
  - That is, our memory access, on average, is the time it takes to
    access the cache, plus, for a miss, how much time it takes to
    access memory
- With a 2-level cache, we can expand our formula
  - average memory access time = hit time_0 + miss rate_0 *
    (hit time_1 + miss rate_1 * miss penalty_1)
- We can expand the formula further to include access to swap space
  (hard disk); a small worked sketch follows this list
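A minimal sketch of both formulas in C. The hit times, miss rates
and miss penalties used here are assumed values for illustration,
not figures from the slides.

    #include <stdio.h>

    int main(void) {
        /* one-level cache: EAT = hit time + miss rate * miss penalty */
        double hit_time     = 5.0;   /* ns, cache hit time (assumed)   */
        double miss_rate    = 0.05;  /* i.e., a 95% hit rate (assumed) */
        double miss_penalty = 60.0;  /* ns, DRAM access time (assumed) */
        double eat1 = hit_time + miss_rate * miss_penalty;
        printf("one-level EAT = %.2f ns\n", eat1);  /* 5 + 0.05*60 = 8.00 ns */

        /* two-level cache:
           EAT = hit time_0 + miss rate_0 * (hit time_1 + miss rate_1 * miss penalty_1) */
        double eat2 = 5.0 + 0.05 * (10.0 + 0.02 * 60.0);
        printf("two-level EAT = %.2f ns\n", eat2);  /* 5 + 0.05*(10 + 1.2) = 5.56 ns */
        return 0;
    }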
Locality of Reference
- The better the hit rate for level 0, the better off we are
- Similarly, if we use 2 caches, we want the hit rate of level 1 to
  be as high as possible
- We want to implement the memory hierarchy to follow locality of
  reference
  - accesses to memory will generally be near recent memory
    accesses, and those in the near future will be around this
    current access
- Three forms of locality (see the short example after this list)
  - Temporal locality - recently accessed items tend to be accessed
    again in the near future (local variables, instructions inside a
    loop)
  - Spatial locality - accesses tend to be clustered (accessing a[i]
    will probably be followed by a[i+1] in the near future)
  - Sequential locality - instructions tend to be accessed
    sequentially
- How do we support locality of reference?
  - If we bring something into cache, bring in its neighbors as well
  - Keep an item in the cache for a while as we hope to keep using it
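As a hypothetical illustration, a simple array-summing loop in C
exhibits all three forms of locality:

    /* Illustrative only: a loop that exhibits all three forms of locality. */
    int sum_array(const int a[], int n) {
        int sum = 0;                   /* temporal locality: sum is reused every iteration    */
        for (int i = 0; i < n; i++) {  /* sequential locality: the loop's instructions repeat */
            sum += a[i];               /* spatial locality: a[i] is followed by a[i+1]        */
        }
        return sum;
    }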
Cache
- Cache is fast memory
- Used to store instructions and data
- It is hoped that what is needed will be in cache and what isn't
  needed will be moved out of cache back to memory
- Issues
  - What size cache? How many caches?
  - How do you access what you need?
    - Since cache only stores part of what is in memory, we need a
      mechanism to map from the memory address to the location in
      cache - this is known as the cache's mapping function
  - If you have to bring in something new, what do you discard?
    - this is known as the replacement strategy
  - What happens if you write a new value to cache?
    - we must update the now obsolete value(s) in memory
Cache and Memory Organization
- Group memory locations into lines (or refill lines)
  - For instance, 1 line might store 16 bytes, or 4 words
  - The line size varies architecture-to-architecture
- All main memory addresses are broken into two parts
  - the line number
  - the location (word) within the line
- If we have 256 Megabytes of word-addressed memory, with 4-byte
  words and 4 words per line, we would have 16,777,216 lines, so our
  26-bit address has 24 bits for the line number and 2 bits for the
  word in the line
- The cache has the same organization, but there are far fewer lines
  (say 1024 lines of 4 words each)
  - So the remainder of the address becomes the tag
  - The tag is used to make sure that the line we want is the line we
    found
  - The valid bit is used to determine if the given line has been
    modified or not (is the line in memory still valid or outdated?)
- The address split is sketched in code after this list
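A minimal sketch of that split in C, assuming the slide's 26-bit
word address with a 24-bit line number and a 2-bit word offset:

    #include <stdint.h>

    typedef struct {
        uint32_t line;   /* 24-bit line number         */
        uint32_t word;   /* 2-bit word within the line */
    } mem_addr_t;

    /* split a 26-bit word address into its line number and word offset */
    mem_addr_t split_address(uint32_t addr26) {
        mem_addr_t a;
        a.word = addr26 & 0x3;   /* low 2 bits: which of the 4 words in the line */
        a.line = addr26 >> 2;    /* remaining 24 bits: the line number           */
        return a;
    }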
Types of Cache
- The mapping function is based on the type of cache
- Direct-mapped - each entry in memory has 1 specific place where it
  can be placed in cache
  - this is a cheap, easy and fast cache to implement, and it needs
    no replacement strategy, but it has the poorest hit rate
- Associative - any memory item can be placed in any cache line
  - this cache uses associative memory so that an entry is searched
    for in parallel; this is expensive and tends to be slower than a
    direct-mapped cache
  - however, because we are free to place an entry anywhere, we can
    use a replacement strategy and thus get the best hit rate
- Set-associative - a compromise between these two extremes
  - lines are grouped into sets so that a line is mapped into a given
    set, but within that set the line can go anywhere
  - a replacement strategy is used to determine which line within a
    set should be used, so this cache improves on the hit rate of the
    direct-mapped cache while not being as expensive or as slow as
    the associative cache
Direct-Mapped Cache
- Assume m refill lines
- A line j in memory will be found in cache at
  location j mod m
- Since each line has 1 and only 1 location in cache, there is no
  need for a replacement strategy
- This yields a poor hit rate but fast (and cheap) performance
- All addresses are broken into 3 parts
  - a line number (to determine the line in cache)
  - a word number
  - the rest is the tag - compare the tag to make sure you have the
    right line
- Assume 24-bit addresses; if the cache has 16384 lines, each storing
  4 words, then the address breaks into an 8-bit tag, a 14-bit line
  number and a 2-bit word number (a lookup using this breakdown is
  sketched below)
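A minimal direct-mapped lookup in C using those parameters; the
cache_line_t layout is an illustrative assumption, not a real
hardware description.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_LINES 16384          /* 2^14 lines of 4 words each */

    typedef struct {
        bool     valid;
        uint8_t  tag;                /* 8-bit tag                  */
        uint32_t words[4];
    } cache_line_t;

    static cache_line_t cache[NUM_LINES];

    /* returns true on a hit and places the word in *value */
    bool dm_lookup(uint32_t addr24, uint32_t *value) {
        uint32_t word = addr24 & 0x3;             /* bits 1..0   : word in line */
        uint32_t line = (addr24 >> 2) & 0x3FFF;   /* bits 15..2  : line number  */
        uint8_t  tag  = (uint8_t)(addr24 >> 16);  /* bits 23..16 : tag          */

        if (cache[line].valid && cache[line].tag == tag) {
            *value = cache[line].words[word];     /* hit */
            return true;
        }
        return false;                             /* miss: go to the next level */
    }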
Associative Cache
- Any line in memory can be placed in any line in cache
- There is no line number portion of the address, just a tag and a
  word within the line
- Because the tag is longer, more tag storage space is needed in the
  cache, so these caches need more space and are more costly
- All tags are searched simultaneously using associative memory to
  find the tag requested
- This is both more expensive and slower than a direct-mapped cache,
  but because there are choices of where to place a new line,
  associative caches require a replacement strategy, which might
  require additional hardware to implement
- Notice how big the tag is - our cache now requires more space just
  to store the tags!
- From our previous example (24-bit addresses, 4 words per line), the
  address is now just a 22-bit tag and a 2-bit word number
Set-Associative Cache
- In order to provide some degree of variability in placement, we
  need more than a direct-mapped cache
- A 2-way set-associative cache provides 2 refill lines for each line
  number
  - Instead of n refill lines, there are now n / 2 sets, each set
    storing 2 refill lines
  - We can think of this as having 2 direct-mapped caches of half the
    size
  - Because there are half as many sets as there were refill lines,
    the set number has 1 fewer bit than the line number and the tag
    has 1 more (a 2-way lookup is sketched below)
- We can expand this to
  - 4-way set-associative
  - 8-way set-associative
  - 16-way set-associative, etc.
- As the number of ways increases, the hit rate improves, but the
  expense also increases and the hit time gets worse
- Eventually we reach an n-way cache, which is a fully associative
  cache
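A sketch of a 2-way set-associative lookup, continuing the same
assumed 24-bit example: 16384 lines grouped into 8192 sets of 2,
giving a 13-bit set number and a 9-bit tag.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_SETS 8192            /* 16384 lines / 2 ways */
    #define WAYS     2

    typedef struct {
        bool     valid;
        uint16_t tag;                /* 9-bit tag            */
        uint32_t words[4];
    } way_t;

    static way_t cache[NUM_SETS][WAYS];

    bool sa_lookup(uint32_t addr24, uint32_t *value) {
        uint32_t word = addr24 & 0x3;              /* bits 1..0   : word in line */
        uint32_t set  = (addr24 >> 2) & 0x1FFF;    /* bits 14..2  : set number   */
        uint16_t tag  = (uint16_t)(addr24 >> 15);  /* bits 23..15 : tag (9 bits) */

        for (int w = 0; w < WAYS; w++) {           /* search both ways of the set */
            if (cache[set][w].valid && cache[set][w].tag == tag) {
                *value = cache[set][w].words[word];
                return true;                       /* hit */
            }
        }
        return false;                              /* miss */
    }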
Replacement and Write Strategies
- When we need to bring in a new line from memory, we will have to
  throw out a line - which one?
  - There is no choice in a direct-mapped cache
  - For associative and set-associative caches, we have choices
- We rely on a replacement strategy to make the best choice
  - it should promote locality of reference
- 3 replacement strategies are (two are sketched in code below)
  - Least recently used (hard to implement - how do we determine
    which line was least recently used?)
  - First-in, first-out (easy to implement, but not very good results)
  - Random
- If we write a datum to cache, what about writing it to memory?
  - Write-through - write to both cache and memory at the same time
    - if we write to several data in the same line, though, this
      becomes inefficient
  - Write-back - wait until the refill line is being discarded and
    write back any changed values to memory at that time
    - this causes stale or dirty values in memory
Virtual Memory
- Just as DRAM acts as a backup for cache, the hard disk (known as
  the swap space) acts as a backup for DRAM
  - This is known as virtual memory
- Virtual memory is necessary because most programs are too large to
  store entirely in memory
- Also, there are parts of a program that are not used very often, so
  why waste the time loading those parts into memory if they won't be
  used?
- Page - a fixed-size unit of memory; all programs and data are
  broken into pages
- Paging - the process of bringing in a page when it is needed (this
  might require throwing a page out of memory, moving it back to the
  swap disk)
- The operating system is in charge of virtual memory for us
  - it moves needed pages into memory from disk and keeps track of
    where a specific page is placed
The Paging Process
- When the CPU generates a memory address, it is a logical (or
  virtual) address
  - The first address of a program is 0, so the logical address is
    merely an offset into the program or into the data segment
  - For instance, address 25 is located 25 locations from the
    beginning of the program
- But 25 is not the physical address in memory, so the logical
  address must be translated (or mapped) into a physical address
- Assume memory is broken into fixed-size units known as frames
  (1 page fits into 1 frame)
- We know the logical address as its page number and the offset into
  the page
- We have to translate the page into the frame (that is, where is
  that particular page currently stored in memory - or is it even in
  memory?)
- Thus, the mapping process for paging means finding the frame and
  replacing the page number with it (a small translation sketch
  follows)
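A minimal sketch of that translation in C; the page size, table
layout and field names are assumptions for illustration.

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_BITS 10                 /* assume 1 KB pages -> 10-bit offset */

    typedef struct {
        bool     present;                /* is this page currently in a frame? */
        uint32_t frame;                  /* frame number, if present           */
    } pte_t;

    /* returns true on success; false means a page fault must be handled */
    bool translate(const pte_t page_table[], uint32_t logical, uint32_t *physical) {
        uint32_t page   = logical >> PAGE_BITS;               /* page number        */
        uint32_t offset = logical & ((1u << PAGE_BITS) - 1);  /* offset within page */

        if (!page_table[page].present)
            return false;                /* page fault: the OS must load the page */

        *physical = (page_table[page].frame << PAGE_BITS) | offset;
        return true;
    }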
Example of Paging
Here, we have a process of 8 pages but only 4 physical frames in
memory; therefore we must place a page into one of the available
frames in memory whenever a page is needed. At this point in time,
pages 0, 3, 4 and 7 have been moved into memory at frames 2, 0, 1 and
3 respectively. This information (of which page is stored in which
frame) is stored in memory in a location known as the page table. The
page table also stores whether the given page has been modified (the
valid bit, much like our cache).
A More Complete Example
Virtual addresses are mapped to physical addresses through the page
table. Address 1010 is page 101, item 0. Page 101 (5) is located in
frame 11 (3), so item 1010 is found at physical address 110. (The
figure shows the logical and physical memory for our program.)
Page Faults
- Just as cache is limited in size, so is main memory - a process is
  usually given a limited number of frames
- What if a referenced page is not currently in memory?
  - The memory reference causes a page fault
  - The page fault requires that the OS handle the problem
- The process status is saved and the CPU switches to the OS
- The OS determines if there is an empty frame for the referenced
  page; if not, then the OS uses a replacement strategy to select a
  page to discard
  - if that page is dirty, then the page must be written to disk
    instead of discarded
- The OS locates the requested page on disk and loads it into the
  appropriate frame in memory
- The page table is modified to reflect the change
- Page faults are time consuming because of the disk access - this
  causes our effective memory access time to deteriorate badly!
Another Paging Example
Here, we have 13 bits for our addresses even though main memory is
only 4K (2^12).
The Full Paging Process
We want to avoid memory accesses (we prefer cache accesses), but if
every memory access now requires first accessing the page table,
which is itself in memory, it slows down our computer. So we move the
most-used portion of the page table into a special cache known as the
Table Lookaside Buffer or Translation Lookaside Buffer, abbreviated
as the TLB. A small sketch of consulting the TLB before the page
table follows.
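The sketch below assumes a small, fully associative TLB searched
entry by entry; the tlb_entry_t layout and TLB_ENTRIES value are
illustrative assumptions (real hardware searches all entries in
parallel).

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 16

    typedef struct {
        bool     valid;
        uint32_t page;    /* virtual page number   */
        uint32_t frame;   /* physical frame number */
    } tlb_entry_t;

    /* returns true on a TLB hit, placing the frame number in *frame */
    bool tlb_lookup(const tlb_entry_t tlb[TLB_ENTRIES], uint32_t page, uint32_t *frame) {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].page == page) {
                *frame = tlb[i].frame;   /* hit: no page-table access needed    */
                return true;
            }
        }
        return false;                    /* miss: walk the page table in memory */
    }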
A Variation: Segmentation
- One flaw of paging is that, because a page is fixed in size, a
  chunk of code might be divided across two or more pages
  - So page faults can occur at any time
  - Consider, as an example, a loop which crosses 2 pages
  - If the OS must remove one of the two pages to load the other,
    then the OS generates 2 page faults for each loop iteration!
- A variation of paging is segmentation
  - instead of fixed-size blocks, programs are divided into
    variable-size procedural units
  - We subdivide programs into procedures
  - We subdivide data into structures (e.g., arrays, structs)
  - We still use the on-demand approach of virtual memory, but when a
    block of code is needed, the entire block is loaded into memory
- Segmentation uses a segment table instead of a page table and works
  similarly, although addresses are put together differently
- But segmentation causes fragmentation - when a segment is discarded
  from memory to make room for a new segment, there may be a chunk of
  memory that goes unused
- One solution to fragmentation is to use paging with segmentation
Effective Access With Paging
- We modify our previous formula to include the impact of paging
  - effective access time = hit time_0 + miss rate_0 * (hit time_1 +
    miss rate_1 * (hit time_2 + miss rate_2 * miss penalty_2))
  - Level 0 is on-chip cache
  - Level 1 is off-chip cache
  - Level 2 is main memory
  - Level 3 is disk (miss penalty_2 is the disk access time, which is
    lengthy)
- Example (worked through in code below)
  - On-chip cache hit rate is 90%, hit time is 5 ns; off-chip cache
    hit rate is 96%, hit time is 10 ns; main memory hit rate is
    99.8%, hit time is 60 ns; memory miss penalty is 10,000 ns
  - the memory miss penalty is the same as the disk hit time, or disk
    access time
  - Access time = 5 ns + .10 * (10 ns + .04 * (60 ns + .002 *
    10,000 ns)) = 6.32 ns
- So our memory hierarchy adds over 20% to our memory access time
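The same numbers plugged into the expanded formula, as a small C
check:

    #include <stdio.h>

    int main(void) {
        double t0 = 5.0,  m0 = 0.10;    /* on-chip cache : 90% hits,    5 ns */
        double t1 = 10.0, m1 = 0.04;    /* off-chip cache: 96% hits,   10 ns */
        double t2 = 60.0, m2 = 0.002;   /* main memory   : 99.8% hits, 60 ns */
        double disk = 10000.0;          /* miss penalty_2: disk access, ns   */

        double eat = t0 + m0 * (t1 + m1 * (t2 + m2 * disk));
        printf("effective access time = %.2f ns\n", eat);   /* prints 6.32 ns */
        return 0;
    }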
Memory Organization
Here we see a typical memory layout: two on-chip caches, one for data
and one for instructions, with part of each cache reserved for a TLB;
one off-chip cache to back up both on-chip caches; and main memory,
backed up by virtual memory.