Title: Computer Organization
1. Computer Organization
- CSC 405
- Multi-Level Memory
2. Memory Hierarchies
The hierarchy, from the CPU down:
- Registers: very fast SRAM, tiny (128 B-4 KB)
- Cache: very fast SRAM, small (32 KB-4 MB)
- Main Memory: fast DRAM, large (4 MB-512 MB)
- Secondary Storage: slow magnetic/optical, very large (100 MB-16 GB)
The limitations of technology prevent memory from being at once cheap, fast, and large. The fact that memory requests are non-random provides the opportunity to significantly enhance performance with a memory hierarchy.
3. Multi-Level Memory Organization
Multi-level memories are used to improve performance in computer systems. We want large, high-speed memory that is transparent (no user/programmer management required). Very large memory has historically been slower for a number of reasons: the cost of memory increases with speed of access, and the speed of access is a function of the size of the memory. Set-associative memory access is very fast for cache memory mainly because the size of the cache is small compared to main memory (RAM). As machine speeds increase we are quickly approaching a speed limit we will not likely be able to break: the speed of light (3×10^8 m/s). Approximately how far does light travel in a nanosecond (1×10^-9 s)? This distance (30 cm) is the popular notion for the speed of electrons in a wire, but the drift velocity of electrons is only about 1/10 the speed of light. Therefore, we should not expect to obtain access times that would require electrons to move farther than 3 cm in a nanosecond. Currently, the best multi-level memory access times are 2 ns for L1 cache (memory on the CPU), 4 ns for L2 cache, and 10 ns for RAM (due to the system bus). These numbers will be outdated by the time this page is published.
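The 30 cm question above can be checked with a one-line calculation; a minimal Python sketch using the figures quoted in the text:

```python
# How far light travels in one nanosecond, and the ~3 cm bound the
# text derives for electrons moving at roughly 1/10 the speed of light.
c = 3e8    # speed of light in m/s (as quoted in the text)
ns = 1e-9  # one nanosecond in seconds

light_dist = c * ns              # about 0.3 m, i.e. 30 cm
electron_dist = light_dist / 10  # about 3 cm per nanosecond

print(light_dist * 100, "cm")    # about 30 cm
print(electron_dist * 100, "cm") # about 3 cm
```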
4. By now you should be working diligently on the following homework problem. A 450 MHz Pentium with 32 KB L1 cache, 128 MB RAM, and a 133 MHz system bus runs a program with an average working-set size of 80 KB. While in a working set, the program has a 0.9997 probability that the next memory request will be from this working set and a 0.9 probability that the next memory request will be the next instruction/data value in memory (i.e., 10% of the time a request is from a random memory address in the working set). (Note: when the program changes working sets, it will begin making memory requests from the new working set with 0.9997 probability.) Your task:
(1) Determine how much (if any) performance improvement could be achieved by adding a 256 KB L2 cache (access speed 450/2 MHz) to the processor.
(2) Determine what size memory blocks should be moved between cache and RAM.
(3) Give an outline of a memory caching strategy that makes sense.
It is strongly recommended that you take the time to investigate the details of cache memory, especially the operation of the cache controller chip-set. As a guide, try to answer the following questions about the operation of the Pentium II/III and the related cache controller:
- When there is a cache miss, how many words of memory does the CPU need to be able to continue processing?
- When the cache memory is replaced, how many words are transferred? (Assume 4K.)
- What hardware component is responsible for the transfer of blocks of memory to cache?
- Given the stats listed in the problem, what will happen to the hit ratio while a new block of memory is being loaded into cache?
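For part (1), a starting point is to weight each level's latency by its hit probability. The sketch below is only a rough Python model under stated assumptions: one-cycle L1 access at 450 MHz, one-cycle L2 access at 225 MHz, one 133 MHz bus cycle for RAM, the 0.9997 working-set probability used as the L1 hit ratio, and a hypothetical 0.9 L2 hit ratio on L1 misses. None of these figures are given exactly by the problem.

```python
# Rough model: effective latency L_eff = H * L_hit + (1 - H) * L_miss,
# applied level by level. All cycle-time figures are assumptions.
L1 = 1 / 450e6   # assume one CPU cycle at 450 MHz  (~2.2 ns)
L2 = 1 / 225e6   # assume one cycle at 450/2 MHz    (~4.4 ns)
RAM = 1 / 133e6  # assume one bus cycle at 133 MHz  (~7.5 ns)

def effective(hit_ratio, l_hit, l_miss):
    """Hit-probability-weighted average latency for one level."""
    return hit_ratio * l_hit + (1 - hit_ratio) * l_miss

H1 = 0.9997  # working-set probability from the problem, used as L1 hit ratio
H2 = 0.9     # hypothetical fraction of L1 misses caught by the L2

no_l2 = effective(H1, L1, RAM)                       # misses go straight to RAM
with_l2 = effective(H1, L1, effective(H2, L2, RAM))  # misses try L2 first

print(no_l2, with_l2)  # adding the L2 lowers the average miss penalty
```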
5. Effective Latency
A useful performance parameter is the effective latency. If the needed word is found in a level of the hierarchy, the access is a hit; if the request must be sent to the next lower level, the request is said to miss. If the latency in the case of a hit is Lhit and the latency in the case of a miss is Lmiss, the effective latency for that level in the hierarchy can be determined from the hit ratio Hratio:

Leff = Hratio × Lhit + (1 − Hratio) × Lmiss
(Figure: the effective-latency calculation applied at each level — cache, main memory, secondary storage.)
6. Bandwidth
Another measure of performance is bandwidth, which is the rate at which information can be transferred from the memory system. If R is the number of requests that the memory can service simultaneously and each request keeps the memory busy for time Tbusy, then

BW = R / Tbusy

For processors that can produce multiple memory requests, it is important not only to reduce latency but also to increase bandwidth by designing a memory system that is capable of servicing multiple requests simultaneously. Memory hierarchies provide decreased average latency and reduced bandwidth requirements, whereas parallel or interleaved memories provide higher bandwidth.
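As an illustration of the bandwidth measure, a small sketch; the bank count and bank busy time below are made-up figures, not from the text:

```python
# Peak bandwidth of a memory built from R independent banks:
# each bank completes one request per bank-busy time, so
# BW = R / T_busy requests per second.
def bandwidth(num_banks, bank_busy_time):
    """Requests per second serviced by num_banks independent banks."""
    return num_banks / bank_busy_time

single = bandwidth(1, 60e-9)  # one bank with a 60 ns busy time (hypothetical)
eight = bandwidth(8, 60e-9)   # eight-way interleaved version

print(single, eight)  # interleaving multiplies peak bandwidth by R
```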
7. Components of Cache Memory
The basic unit of construction of a semiconductor
memory system is a module or bank. A single bank
can service only one request at a time.
The time that a bank is busy servicing a request
is called the bank busy time. Caches have much
shorter bank busy times than do main memory banks.
Cache contains redundant copies of portions of
the address space which is wholly contained in
the main memory.
Cache often comprises two memories: the data memory and the tag memory, as shown. The address of each cache line in data memory is stored in the tag memory, along with the state.
Cache memory must be content-addressable and all
tags must be compared concurrently in order to
achieve the low latency which is the point of
cache memory.
Cache can be simplified by direct mapping each
memory location to a single location in the
cache. This simplification is achieved at the
cost of a lower hit ratio.
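The tag-memory comparison and the direct-mapped simplification can be sketched in a few lines of Python; the cache geometry below (8 lines of 4 words) is hypothetical:

```python
# Minimal direct-mapped cache model: each memory block maps to exactly
# one cache line (block number mod number of lines). The tag memory
# stores the high-order address bits so a lookup can check whether the
# resident line is the requested one.
NUM_LINES = 8   # hypothetical tiny cache
LINE_SIZE = 4   # words per line

tags = [None] * NUM_LINES  # tag memory
data = [None] * NUM_LINES  # data memory (line payloads elided here)

def lookup(address):
    """Return True on a hit; on a miss, load the line and return False."""
    block = address // LINE_SIZE
    line = block % NUM_LINES   # direct mapping: only one candidate slot
    tag = block // NUM_LINES   # remaining high-order bits
    if tags[line] == tag:
        return True            # hit: tag matches
    tags[line] = tag           # miss: replace whatever line was there
    return False

print(lookup(100))                          # False: first touch misses
print(lookup(100))                          # True: now resident
print(lookup(100 + LINE_SIZE * NUM_LINES))  # False: conflicts on same line
```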
8. Methods of Memory Allocation
Fixed Partition Allocation - each job is allocated the same amount of memory. Jobs larger than the allocated space are split into a number of segments or overlays and brought into real memory as needed during program execution. Jobs smaller than the allocated space leave portions of the memory unused.
Variable Partition Allocation - each job is allocated exactly the amount of space it needs to run to completion. There could still be a limit on size, forcing the use of overlays for large jobs.
9. Memory Fragmentation
Although the use of variable-sized partitions improves memory utilization, memory fragments can still develop as jobs come and go in a variable-partition memory system.
Memory compaction - the process of moving jobs in memory together in order to open contiguous blocks of memory large enough to be useful.
Coalescing holes - scanning memory in order to recognize that adjacent blocks of available memory are contiguous and to merge them into single blocks.
(Figure: compaction and coalescing.)
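Coalescing can be sketched as a single pass over an address-ordered free list; the hole addresses below are made up:

```python
# Coalescing holes: sort free blocks by address and merge any block
# that begins exactly where the previous one ends.
def coalesce(holes):
    """holes: list of (start, size) free blocks; returns merged list."""
    holes = sorted(holes)
    merged = [holes[0]]
    for start, size in holes[1:]:
        prev_start, prev_size = merged[-1]
        if prev_start + prev_size == start:        # adjacent: merge
            merged[-1] = (prev_start, prev_size + size)
        else:                                      # gap: keep separate
            merged.append((start, size))
    return merged

# Two adjacent 100-unit holes become one 200-unit hole.
print(coalesce([(300, 50), (0, 100), (100, 100)]))  # [(0, 200), (300, 50)]
```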
10. Job Placement Strategies
First Fit - place the job in the first available memory location large enough to hold it.
Best Fit - place the job in the available memory block whose size is closest to the job size.
Worst Fit - place the job in the available memory block whose size is the most different (largest).
(Figure: placing a new job under each strategy.)
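The three strategies can be sketched as hole-selection functions; the hole sizes below are made up:

```python
# First/best/worst fit: pick a free block (hole) for a job of a given size.
def first_fit(holes, size):
    """Index of the first hole large enough, else None."""
    for i, hole in enumerate(holes):
        if hole >= size:
            return i
    return None

def best_fit(holes, size):
    """Index of the smallest hole that still fits, else None."""
    fits = [(hole, i) for i, hole in enumerate(holes) if hole >= size]
    return min(fits)[1] if fits else None

def worst_fit(holes, size):
    """Index of the largest hole, provided it fits, else None."""
    fits = [(hole, i) for i, hole in enumerate(holes) if hole >= size]
    return max(fits)[1] if fits else None

holes = [30, 100, 45, 200]   # hypothetical free-block sizes
print(first_fit(holes, 40))  # 1: the first hole >= 40
print(best_fit(holes, 40))   # 2: 45 is closest to 40
print(worst_fit(holes, 40))  # 3: 200 is the largest
```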
11. Basic Concepts of Virtual Storage
In general, virtual memory management is the function of the OS that permits a program which is larger than available memory to operate without additional control or consideration by the user or application programmer. The key to controlling virtual storage is disassociating the addresses referenced in a running process from the addresses available in primary memory.
(Figure: a process in secondary storage is mapped through virtual storage to memory blocks allocated in real storage.)
12. Pure Paging
An OS that supports only fixed block-size virtual memory management is called a pure paging system. A virtual address in a pure paging system is an ordered pair (p, d), where p is the page number and d is the displacement within the page.
(Figure: page translation. The page number p of the virtual address (p, d) indexes the page map table, located at base address b. Each entry holds r, a secondary storage address s, and a page frame p, plus a page reference bit. The real address is the page frame combined with the displacement d.)
How many entries are there in the page map table?
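Pure-paging translation can be sketched directly from the (p, d) split; the 4 KB page size and page-map contents below are assumptions:

```python
# Pure paging: split the virtual address into (p, d), look p up in the
# page map table, and concatenate the page frame with d.
PAGE_BITS = 12                 # assume 4 KB pages
PAGE_SIZE = 1 << PAGE_BITS

page_map = {0: 5, 1: 9, 2: 3}  # hypothetical page -> frame entries

def translate(vaddr):
    p = vaddr >> PAGE_BITS           # page number
    d = vaddr & (PAGE_SIZE - 1)      # displacement within the page
    frame = page_map[p]              # a missing entry would be a page fault
    return (frame << PAGE_BITS) | d  # real address

print(hex(translate(0x1ABC)))  # page 1 -> frame 9, so 0x9abc
```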
13. Paging with Associative Mapping
(Figure: the associative map (AM), implemented in hardware, holds a few page-to-frame translations that are tried first; if the page number p of the virtual address (p, d) misses in the AM, the page map table (PMT) at base address b is used to form the real address.)
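A minimal model of the AM-first lookup, assuming a 32-entry associative map (the same capacity quoted later for the Pentium's translation-lookaside buffer) and a made-up page map table:

```python
# Associative mapping: a small map of recent page -> frame translations
# is tried first; only on a miss is the full page map table consulted.
PAGE_BITS = 12
PMT = {p: p + 100 for p in range(1024)}  # hypothetical page map table
AM = {}                                  # associative map (recently used)
AM_SIZE = 32

def translate(vaddr):
    p = vaddr >> PAGE_BITS
    if p in AM:                      # try these first: AM hit
        frame = AM[p]
    else:                            # miss in AM: use the PMT
        frame = PMT[p]
        if len(AM) >= AM_SIZE:       # crude replacement: evict the oldest
            AM.pop(next(iter(AM)))
        AM[p] = frame
    return (frame << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))

first = translate(0x2ABC)   # PMT walk; entry cached in the AM
second = translate(0x2ABC)  # same address: served from the AM
print(hex(first), first == second)  # 0x66abc True
```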
14. Segmentation
If an OS allocates arbitrary-sized memory blocks to suit the needs of processes, we refer to this as segmentation. In a virtual segmentation system, a virtual address is an ordered pair (s, d), where s is the segment number and d is the displacement within that segment. Only processes with their current segments in primary memory may run.
Since segment sizes are set by the program size, we cannot simply transfer the displacement address bits to the low-order end of the real address. The address of the start of the segment in real memory (s) must be added to the displacement d.
(Figure: the segment number s of the virtual address (s, d) indexes the segment map table at base address b; the segment's start address is added to d to form the real address.)
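Segment translation is an add rather than a bit concatenation; a minimal sketch with made-up segment bases and lengths:

```python
# Segmentation: the base address of segment s is added to the
# displacement d; a bounds check against the segment length catches
# displacements outside the segment.
segment_map = {0: (0x1000, 0x400),   # segment -> (base, length), made up
               1: (0x8000, 0x2000)}

def translate(s, d):
    base, length = segment_map[s]
    if d >= length:
        raise MemoryError("displacement outside segment")
    return base + d                  # real address = base + displacement

print(hex(translate(1, 0x24)))  # 0x8024
```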
15. Pentium-II Memory Management
With 32-bit addressing, the Pentium processor has been equipped with sophisticated memory-management hardware and OS software similar to that provided in larger-scale computer systems.
The Pentium II includes hardware support for both paging and segmentation. These approaches are selectable, allowing the implementation of four different memory-management schemes:
- Unsegmented, unpaged: the virtual address is the same as the physical address. This is useful in low-complexity, high-performance controller applications.
- Unsegmented, paged: memory is viewed as a paged linear address space. This is favored by some operating systems, such as Berkeley UNIX.
- Segmented, unpaged: memory is viewed as a collection of logical address spaces, which guarantees that the segment map table is in-cache with the segment.
- Segmented, paged: segmentation is used to define logical memory partitions, and paging is used to manage memory allocation within segments.
16. Pentium II Segmentation
(Figure: the virtual address comprises a 14-bit segment field, plus 2 protection bits, and a 32-bit displacement; the segment map table at base address b yields the real address.)
Segmentation increases addressable memory from 4 GB (2^32) up to 64 TB (2^46). Virtual address space is divided into two parts: half of virtual memory is global, and half is local and distinct for each process. Each segment is protected by privilege levels and an access attribute. Four levels of privilege are possible, from 00 (most protected) to 11 (least protected).
The access attribute regulates access to data segments by giving read-write or read-only access. For program segments, access is limited to read-execute or read-only.
17. Pentium II Paging
In the Pentium II, the paging mechanism is a two-level table-lookup operation. The first level is a page directory containing up to 1K entries, which partitions the 4 GB memory space into 1K page groups (4 MB in size), each with its own page table. Each PMT contains up to 1024 entries corresponding to 4 KB pages.
(Figure: the virtual address comprises a 10-bit directory index, a 10-bit PMT index, and a 12-bit displacement. The directory, at base address b, selects a page map table; the PMT entry (r, secondary storage address s, page frame p, page reference bit) supplies the page frame for the real address.)
A translation-lookaside buffer holds up to 32 page-table entries, giving faster access to the most recently used addresses.
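The two-level lookup can be modeled with the 10/10/12 split described above; the directory and page-table contents below are made up:

```python
# Pentium II-style two-level translation: 10-bit directory index,
# 10-bit page-table index, 12-bit displacement.
DIR_BITS, TABLE_BITS, OFFSET_BITS = 10, 10, 12

# Hypothetical directory: directory index -> page table, where each
# page table maps a table index -> page frame.
directory = {0: {1: 0x300}, 2: {5: 0x412}}

def translate(vaddr):
    d = vaddr >> (TABLE_BITS + OFFSET_BITS)               # directory index
    t = (vaddr >> OFFSET_BITS) & ((1 << TABLE_BITS) - 1)  # table index
    off = vaddr & ((1 << OFFSET_BITS) - 1)                # displacement
    frame = directory[d][t]   # two lookups; a missing entry = page fault
    return (frame << OFFSET_BITS) | off

print(hex(translate(0x1ABC)))  # dir 0, table 1 -> frame 0x300: 0x300abc
```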