Title: Internal Memory
William Stallings, Computer Organization and Architecture
Chapter 4: Internal Memory
The Four-Level Memory Hierarchy
- Computer memory is organized into a hierarchy.
- Going down the hierarchy: decreasing cost/bit, increasing capacity, slower access time, and decreasing frequency of access of the memory by the processor.
- The cache automatically retains a copy of some of the recently used words from the DRAM.
Memory Hierarchy
- Registers
- In CPU
- Internal or Main memory
- May include one or more levels of cache
- RAM
- External memory
- Backing store
4.1 COMPUTER MEMORY SYSTEM OVERVIEW
- Characteristics of Memory Systems
- Location
- Capacity
- Unit of transfer
- Access method
- Performance
- Physical type
- Physical characteristics
- Organisation
Location
- The term location refers to whether memory is internal or external to the computer.
- CPU
- The processor requires its own local memory, in the form of registers.
- Internal
- Main memory, cache
- External
- Peripheral storage devices, such as disk and tape
Capacity
- Internal memory capacity is typically expressed in terms of bytes (1 byte = 8 bits) or words.
- External memory capacity is expressed in bytes.
- Word
- The natural unit of organisation
- Word length is usually 8, 16, or 32 bits
- The size of the word is typically equal to the number of bits used to represent a number and to the instruction length. Unfortunately, there are many exceptions.
Unit of Transfer
- Internal
- Usually governed by data bus width
- External
- Usually a block which is much larger than a word
- Addressable unit
- Smallest location which can be uniquely addressed
- At the word level or byte level
- In any case, 2^A = N, where
- A is the length in bits of an address
- N is the number of addressable units
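The relation 2^A = N can be checked numerically. This is a minimal illustrative sketch (not part of the slides); the function name `address_bits` is assumed for the example:

```python
def address_bits(n_units: int) -> int:
    """Smallest address width A such that 2**A >= n_units."""
    return (n_units - 1).bit_length()

# A 64-KByte byte-addressable memory needs a 16-bit address.
print(address_bits(64 * 1024))    # 16
# 1M words need a 20-bit address (2^20 = 1M).
print(address_bits(1024 * 1024))  # 20
```

The same calculation recurs throughout the chapter, e.g. 11 pins to address 2048 rows of a DRAM array.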
Access Methods (1)
- Sequential access
- Start at the beginning and read through in order
- Access time depends on location of data and
previous location - variable
- e.g. tape
- Direct access
- Individual blocks have unique address
- Access is by jumping to vicinity plus a sequential search
- Access time depends on location and previous location - variable
- e.g. disk
Access Methods (2)
- Random
- Individual addresses identify locations exactly
- Access time is independent of location or previous access and is constant
- e.g. RAM
- Associative
- Data is located by a comparison with the contents of a portion of the store
- Access time is independent of location or previous access and is constant
- e.g. cache
Performance Parameters
- Access time
- For random-access memory
- The time it takes to perform a read or write operation
- Time between presenting the address to the memory and getting the valid data
- For non-random-access memory
- The time it takes to position the read-write mechanism at the desired location
- Memory cycle time
- Cycle time is access time plus additional time
- Time may be required for the memory to recover before the next access
- Transfer rate
- Rate at which data can be moved
Physical Types
- Semiconductor
- RAM
- Magnetic
- Disk, tape
- Optical
- CD (Compact Disk), DVD (Digital Video Disk)
- Others
- Bubble
- Hologram
Physical Characteristics
- Decay
- Volatility
- In a volatile memory, information decays naturally or is lost when electrical power is switched off.
- In a nonvolatile memory, no electrical power is needed to retain information, e.g. magnetic-surface memory.
- Erasable
- Power consumption
Organisation
- Organisation means the physical arrangement of bits into words
- The obvious arrangement is not always used
The Bottom Line
- The design constraints on a computer's memory
- How much?
- Capacity
- How fast?
- Time is money
- How expensive?
- There is a trade-off among the three key characteristics of memory: cost, capacity, and access time.
Hierarchy List
- Registers
- L1 Cache
- L2 Cache
- Main memory
- Disk cache
- Disk
- Optical
- Tape
Hierarchy List
- Across this spectrum of technologies
- Faster access time, greater cost per bit
- Greater capacity, smaller cost per bit
- Greater capacity, slower access time
- From top to bottom
- Decreasing cost per bit
- Increasing capacity
- Increasing access time
- Decreasing frequency of access of the memory by
the processor
So you want fast?
- It is possible to build a computer which uses only static RAM (see later)
- This would be very fast
- This would need no cache
- How can you cache cache?
- This would cost a very large amount
Locality of Reference
- During the course of the execution of a program, memory references tend to cluster
- e.g. loops and subroutines
- Main memory is usually extended with a higher-speed, smaller cache. The cache is a device for staging the movement of data between main memory and processor registers to improve performance.
- External memory, called secondary or auxiliary memory, is used to store program and data files, and is visible to the programmer only in terms of files and records.
4.2 Semiconductor Main Memory
- Table 4.2 Semiconductor Memory Types
Types of Random-Access Semiconductor Memory
- RAM
- Misnamed, as all semiconductor memory is random access; all of the types listed in the table are random access.
- Read/write
- Volatile
- A RAM must be provided with a constant power supply.
- Temporary storage
- Static or dynamic
Dynamic RAM (DRAM)
- Bits stored as charge in capacitors
- Charges leak
- Need refreshing even when powered
- Simpler construction
- Smaller per bit
- Less expensive
- Need refresh circuits
- Slower
- Main memory
Static RAM (SRAM)
- Bits stored as on/off switches
- No charges to leak
- No refreshing needed when powered
- More complex construction
- Larger per bit
- More expensive
- Does not need refresh circuits
- Faster
- Cache
Read Only Memory (ROM)
- Permanent storage
- Applications
- Microprogramming (see later)
- Library subroutines
- Systems programs (BIOS)
- Function tables
Types of ROM
- Written during manufacture
- Very expensive for small runs
- Programmable (once)
- PROM
- Needs special equipment to program
- Read mostly
- Erasable Programmable (EPROM)
- Erased by UV
- Electrically Erasable (EEPROM)
- Takes much longer to write than read
- Flash memory
- It is intermediate between EPROM and EEPROM in both cost and functionality.
- Erases the whole memory electrically, or erases blocks of memory
Organisation in Detail
- Memory cell
- The basic element of a semiconductor memory
- Two stable states
- Can be written into to set the state, or read to sense the state
- Chip logic
- At one extreme of organization, the physical arrangement of cells in the array is the same as the logical arrangement.
- The array is organized into W words of B bits each.
- e.g. a 16-Mbit chip can be organised as 1M 16-bit words
- One-bit-per-chip, in which data is read/written one bit at a time
- A one-bit-per-chip system has 16 lots of 1-Mbit chips, with bit 1 of each word in chip 1, and so on
Chip Logic
- Typical organization of a 16-Mbit DRAM
- A 16-Mbit chip can be organised as a 2048 x 2048 x 4-bit array
- Reduces the number of address pins
- Multiplex row address and column address
- 11 pins to address (2^11 = 2048) rows
- An additional 11 address lines select one of 2048 columns of 4 bits per column. Four data lines are used for the input and output of 4 bits to and from a data buffer. On write, the bit driver of each bit line is activated for a 1 or 0 according to the value of the corresponding data line. On read, the value of each bit line is presented to the data lines; the row line selects which row of cells is used for reading or writing.
- Adding one more pin devoted to addressing doubles the number of rows and columns, so the size of the chip memory grows by a factor of 4.
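The multiplexed addressing above can be sketched numerically. This is an illustration, not part of the slides; the names `PINS` and `split_address` are assumed for the example:

```python
ROWS = COLS = 2048                 # 2048 x 2048 x 4-bit array
PINS = (ROWS - 1).bit_length()     # 11 address pins, shared by row and column

def split_address(cell: int) -> tuple[int, int]:
    """Return the (row, column) strobed over the same 11 pins (RAS, then CAS)."""
    return cell // COLS, cell % COLS

row, col = split_address(5000)
# 22 bits of cell address are delivered over only 11 physical pins.
```

Multiplexing is why one extra address pin quadruples capacity: it adds a bit to both the row and the column address.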
Typical 16-Mbit DRAM (4M x 4)
Refreshing
- Refresh circuit included on chip
- Disable chip
- Count through rows
- Read and write back
- Takes time
- Slows down apparent performance
Chip Packaging
- EPROM package: one word per chip, an 8-Mbit chip organized as 1M x 8
- Address pins: the address of the word being accessed; for 1M words, a total of 20 pins are needed (2^20 = 1M)
- D0-D7: the data lines
- Vcc: the power supply to the chip
- Vss: a ground pin
- CE (chip enable): indicates whether or not the address is valid for this chip
- Vpp: a program voltage
DRAM Package
- A 16-Mbit chip organized as 4M x 4
- A RAM chip can be updated, so the data pins are input/output, unlike on a ROM chip
- WE: write enable pin
- OE: output enable pin
- RAS: row address select
- CAS: column address select
Module Organisation
- If a RAM chip contains only 1 bit per word, clearly a number of chips equal to the number of bits per word is needed.
- e.g. how could a memory module consisting of 256K 8-bit words be organized?
- 256K = 2^18, so an 18-bit address is needed
- The address is presented to 8 256K x 1-bit chips, each of which provides the input/output of 1 bit.
Figure 4.6 256-KByte Memory Organization
Module Organisation (2)
Figure 4.7 1-MByte Memory Organization
- (1M x 8 bits) / (256K x 8 bits) = 4 = 2^2
- As shown in Figure 4.7, 1M words by 8 bits per word is organized as four columns of chips, each column containing 256K words arranged as in Figure 4.6.
- 1M = 2^20
- For 1M words, 20 address lines are needed.
- The 18 least significant bits are routed to all 32 modules.
- The high-order 2 bits are input to a group select logic module that sends a chip enable signal to one of the four columns of modules.
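The group-select split above can be sketched as code. A minimal illustration (the function name `decode` is assumed, not from the slides):

```python
# 20-bit address for 1M words: the high-order 2 bits pick one of four
# columns of 256K-word modules; the low 18 bits go to every chip.
def decode(addr20: int) -> tuple[int, int]:
    group = addr20 >> 18              # drives the chip-enable of one column
    chip_addr = addr20 & (2**18 - 1)  # routed to all modules
    return group, chip_addr

print(decode(2**18))      # first word of the second column: (1, 0)
```

The last address, 2^20 - 1, decodes to column 3, chip address 2^18 - 1.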
Error Correction
- Hard Failure
- Permanent defect
- Soft Error
- Random, non-destructive
- No permanent damage to memory
- Detected using Hamming error correcting code
Error-Correcting Code Function
- A function f is performed on the data to produce a code. When the previously stored word is read out, the code is used to detect and possibly correct errors.
- A new set of K code bits is generated from the M data bits and compared with the fetched code bits.
Even Parity Bits
Figure 4.9 Hamming Error-Correcting Code
Figure 4.9 uses Venn diagrams to illustrate the use of a Hamming code on 4-bit words (M = 4). With three intersecting circles, there are seven compartments. We assign the 4 data bits to the inner compartments. The remaining compartments are filled with parity bits. Each parity bit is chosen so that the total number of 1s in its circle is even.
Figure 4.8 Error-Correcting Code
- The comparison logic receives as input two K-bit values. A bit-by-bit comparison is done by taking the exclusive-or of the two inputs. The result is called the syndrome word.
- The syndrome word is therefore K bits wide and has a range between 0 and 2^K - 1. The value 0 indicates that no error was detected, leaving 2^K - 1 values to indicate, if there is an error, which bit was in error (the numerical value of the syndrome indicates the position of the bit in error).
- An error could occur on any of the M data bits or K check bits, so
- 2^K - 1 >= M + K
- (This inequality gives the number of check bits needed to correct a single-bit error in a word containing M data bits.)
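The inequality 2^K - 1 >= M + K can be solved by a short search. An illustrative sketch (the function name `check_bits` is assumed):

```python
def check_bits(m: int) -> int:
    """Smallest K with 2**K - 1 >= m + K (single-error-correcting code)."""
    k = 1
    while 2**k - 1 < m + k:
        k += 1
    return k

print(check_bits(8))    # 4 check bits for an 8-bit word
print(check_bits(64))   # 7 check bits for a 64-bit word
```

These values match the single-error-correction column of Table 4.3.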
- Those bit positions whose position numbers are powers of 2 are designated as check bits.
- Each check bit operates on every data bit position whose position number contains a 1 in the corresponding column position.
- Bit position n is checked by those bits Ci such that the sum of the subscripts i equals n (e.g. position 6 is checked by C4 and C2, since 4 + 2 = 6).
C8 C4 C2 C1
Figure 4.10 Layout of Data Bits and Check Bits
The check bits are calculated as follows, where the symbol ⊕ designates the exclusive-or operation:
C1 = M1 ⊕ M2 ⊕ M4 ⊕ M5 ⊕ M7
C2 = M1 ⊕ M3 ⊕ M4 ⊕ M6 ⊕ M7
C4 = M2 ⊕ M3 ⊕ M4 ⊕ M8
C8 = M5 ⊕ M6 ⊕ M7 ⊕ M8
Assume that the 8-bit input word is 00111001, with data bit M1 in the right-most position. The calculations give C1 = 1, C2 = 1, C4 = 1, C8 = 0.
Suppose now that data bit M3 sustains an error and is changed from 0 to 1.
When the new check bits are compared with the old check bits, the syndrome word is formed:
C8: 0 ⊕ 0 = 0; C4: 1 ⊕ 0 = 1; C2: 1 ⊕ 0 = 1; C1: 1 ⊕ 1 = 0
The result is 0110, indicating that bit position 6, which contains data bit M3, is in error.
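The worked example can be replayed in code. This sketch (an illustration, not from the slides) hard-codes the four check-bit equations for M = 8 and reproduces the syndrome 0110 = 6:

```python
def hamming_check_bits(m: dict) -> dict:
    """m[1..8] are data bits M1..M8; returns check bits keyed by C1, C2, C4, C8."""
    return {
        1: m[1] ^ m[2] ^ m[4] ^ m[5] ^ m[7],
        2: m[1] ^ m[3] ^ m[4] ^ m[6] ^ m[7],
        4: m[2] ^ m[3] ^ m[4] ^ m[8],
        8: m[5] ^ m[6] ^ m[7] ^ m[8],
    }

word = "00111001"                           # written M8 ... M1
m = {i: int(b) for i, b in enumerate(reversed(word), start=1)}
stored = hamming_check_bits(m)              # C8 C4 C2 C1 = 0 1 1 1

m[3] ^= 1                                   # data bit M3 (position 6) flips
fetched = hamming_check_bits(m)
syndrome = {k: stored[k] ^ fetched[k] for k in stored}
position = syndrome[8] * 8 + syndrome[4] * 4 + syndrome[2] * 2 + syndrome[1]
print(position)                             # 6: bit position 6 (data bit M3) is in error
```

Flipping bit `position` back would correct the stored word, which is exactly what the SEC hardware does.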
Figure 4.11 Check Bit Generation
This is a single-error-correcting (SEC) code.
More commonly, semiconductor memory is equipped with a single-error-correcting, double-error-detecting (SEC-DED) code. An error-correcting code enhances the reliability of the memory at the cost of added complexity.
Table 4.3 Increase in Word Length with Error Correction
Figure 4.12 Hamming SEC-DED Code
The sequence shows that if two errors occur (Figure 4.12c), the checking procedure goes astray (d) and worsens the problem by creating a third error (e). To overcome the problem, an eighth bit is added that is set so that the total number of 1s in the diagram is even.
4.3 CACHE MEMORY
- Small amount of fast memory
- Sits between normal main memory and CPU
- May be located on CPU chip or module
Cache Operation - Overview
- Figure 4.14 Cache/Main-Memory Structure (P118)
- Cache includes tags to identify which block of main memory is in each cache slot. The tag is usually a portion of the main memory address.
- (a) Cache: C lines (numbered 0 to C-1), each holding a tag and a block of K words
- (b) Main memory: 2^n addressable words (addresses 0 to 2^n - 1), viewed as blocks of K words each
Figure 4.15 Cache Read Operation (P119)
- CPU requests the contents of a memory location
- Check cache for this data
- If present, get it from cache (fast)
- If not present, read the required block from main memory into the cache
- Then deliver from cache to CPU
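The read flow above can be sketched with a dictionary standing in for the cache. A toy illustration, not the hardware mechanism; the names `read` and `BLOCK_SIZE` are assumptions:

```python
BLOCK_SIZE = 4  # words per block, as in the chapter's running example

def read(addr, cache, memory, stats):
    """cache maps block_number -> block data; memory is a list of blocks."""
    block_no = addr // BLOCK_SIZE
    if block_no in cache:                 # hit: deliver from cache
        stats["hits"] += 1
    else:                                 # miss: fetch the whole block first
        stats["misses"] += 1
        cache[block_no] = memory[block_no]
    return cache[block_no][addr % BLOCK_SIZE]

memory = [[b * BLOCK_SIZE + i for i in range(BLOCK_SIZE)] for b in range(16)]
cache, stats = {}, {"hits": 0, "misses": 0}
read(5, cache, memory, stats)   # miss: loads block 1
read(6, cache, memory, stats)   # hit: same block, thanks to locality
```

Note that a miss brings in the whole block, so a later access to a neighbouring word hits.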
Typical Cache Organization
In this organization, the cache connects to the processor via data, control, and address lines. The data and address lines attach to data and address buffers, which attach to a system bus from which main memory is reached. When a cache hit occurs, the data and address buffers are disabled and communication is only between processor and cache, with no system bus traffic. When a cache miss occurs, the desired address is loaded onto the system bus and the data are returned through a data buffer to both the cache and the main memory.
Figure 4.16 Typical Cache Organization
Elements of Cache Design
- Size
- Mapping Function
- Direct
- Associative
- Set Associative
- Replacement Algorithm
- Least recently used (LRU)
- First in first out (FIFO)
- Least frequently used (LFU)
- Random
- Write Policy
- Write through
- Write back
- Write once
- Block Size
- Number of Caches
- Single or two level
- Unified or split
Cache Size
- A trade-off between cost per bit and access time
- Cost
- More cache is expensive
- Speed
- More cache is faster (up to a point)
- Checking cache for data takes time
- Suggested optimum cache sizes are between 1K and 512K words.
Mapping Function
- Three techniques: direct, associative, and set associative
- Elements of the example:
- Cache of 64 KBytes
- Cache block of 4 bytes
- Data is transferred between memory and the cache in blocks of 4 bytes each
- i.e. the cache is 16K (2^14) lines of 4 bytes
- 16 MBytes of main memory
- 24-bit address (2^24 = 16M)
- Main memory has 4M blocks of 4 bytes each
Direct Mapping
- Each block of main memory maps to only one cache line
- i.e. if a block is in cache, it must be in one specific place
- Address is in two parts
- The least significant w bits identify a unique word or byte within a block of main memory
- The most significant s bits specify one memory block
- The MSBs are split into a cache line field of r bits and a tag of s - r bits (most significant)
- The line field of r bits identifies one of the m = 2^r lines of the cache
Direct Mapping Cache Line Table
Every row has the same cache line number; every column has the same tag number.
- Cache line    Main memory blocks assigned
- 0             0, m, 2m, ..., 2^s - m
- 1             1, m+1, 2m+1, ..., 2^s - m + 1
- ...
- m-1           m-1, 2m-1, 3m-1, ..., 2^s - 1
The mapping is expressed as i = j modulo m, where
i = cache line number
j = main memory block number
m = number of lines in the cache
No two blocks in the same line have the same tag field!
Direct Mapping Cache Organization
The r-bit line number is used as an index into the cache to access a particular line. If the (s-r)-bit tag number matches the tag number currently stored in that line, then the w-bit word number is used to select one of the 2^w bytes in that line. Otherwise, the s-bit tag-plus-line field is used to fetch a block from main memory.
Direct Mapping Address Structure

Tag (s-r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits

- 24-bit address
- w = 2-bit word identifier (4-byte block)
- s = 22-bit block identifier
- 8-bit tag (= 22 - 14)
- 14-bit slot or line
- No two blocks in the same line have the same tag field
- Check the contents of the cache by finding the line and checking the tag
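The 8/14/2 field split can be checked in code. An illustrative sketch using the example's widths (the name `direct_map` is an assumption):

```python
# Decompose a 24-bit address: tag = 8 bits, line = 14 bits, word = 2 bits.
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2

def direct_map(addr: int):
    word = addr & (2**WORD_BITS - 1)
    line = (addr >> WORD_BITS) & (2**LINE_BITS - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

# The line field equals the block number modulo 2^14 (i = j mod m):
addr = 0x16339C
tag, line, word = direct_map(addr)
assert line == (addr // 4) % 2**14
```

Two addresses with the same line field but different tags contend for the same cache line.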
Direct Mapping Example
The cache is organized as 16K = 2^14 lines of 4 bytes each. The main memory consists of 16 MBytes, organized as 4M blocks of 4 bytes each.
i = j modulo m, where
i = cache line number
j = main memory block number
m = number of lines in the cache
Note that no two blocks that map into the same line number have the same tag number.
Main Memory Address
Direct Mapping: Pros and Cons
- Advantages
- Simple
- Inexpensive
- Disadvantages
- Fixed location for given block
- If a program accesses 2 blocks that map to the
same line repeatedly, cache misses are very high
Associative Mapping
- A main memory block can load into any line of the cache
- The memory address is interpreted as a tag and a word field
- The tag uniquely identifies a block of memory
- Every line's tag is examined for a match
- Disadvantages of associative mapping
- Cache searching gets expensive
- Complex circuitry is required to examine the tags of all cache lines in parallel
Fully Associative Cache Organization
Associative Mapping Address Structure

Tag: 22 bits | Word: 2 bits

- 22-bit tag stored with each 32-bit (4-byte) block of data
- Compare the tag field with the tag entry in the cache to check for a hit
- The least significant 2 bits of the address identify which byte is required from the 32-bit data block
- e.g.
- Address  Tag     Data      Cache line
- 16339C   058CE7  FEDCBA98  0001
Associative Mapping Example
Main Memory Address
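The tag in the example above is just the address shifted right by the 2-bit word field, which a short sketch can confirm (the name `assoc_tag` is an assumption, not from the slides):

```python
# Fully associative: 24-bit address = 22-bit tag + 2-bit word field.
def assoc_tag(addr: int) -> tuple[int, int]:
    return addr >> 2, addr & 0b11     # (tag, word)

tag, word = assoc_tag(0x16339C)
print(f"{tag:06X}")                   # 058CE7, matching the slide's example line
```

Because any block can go in any line, the full 22-bit tag must be stored and compared; there is no line field to narrow the search.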
Set Associative Mapping
- The cache is divided into a number of sets
- Each set contains a number of lines
- A given block maps to any line in a given set
- e.g. block B can be in any line of set i
- e.g. 2 lines per set
- 2-way associative mapping
- A given block can be in one of 2 lines in only one set
Set Associative Mapping
- In this case, the cache is divided into v sets, each of which consists of k lines.
- The relationships are:
- m = v x k
- i = j modulo v
- where
- i = cache set number
- j = main memory block number
- m = number of lines in the cache
- This is referred to as k-way set associative mapping.
Two-Way Set Associative Cache Organization
The d set bits specify one of v = 2^d sets. The s bits of the tag and set fields specify one of the 2^s blocks of main memory. With k-way set associative mapping, the tag in a memory address is much smaller and is only compared to the k tags within a single set.
Set Associative Mapping Example
- 13-bit set number
- The block number in main memory modulo 2^13 gives the set, so addresses that are equal modulo 2^15 map to the same set
- e.g. 000000, 008000, 010000, 018000 all map to set 0
Set Associative Mapping Address Structure

Tag: 9 bits | Set: 13 bits | Word: 2 bits

- Use the set field to determine which cache set to look in
- The tag and set fields together specify one of the blocks in main memory
- Compare the tag field to see if we have a hit
- e.g.
- Address   Tag  Data      Set number
- 1FF 7FFC  1FF  24682468  1FFF
Two-Way Set Associative Mapping Example
- e.g.
- Address   Tag  Data      Set number
- 1FF 7FFC  1FF  24682468  1FFF
- 02C 0004  02C  11235813  0001
Main Memory Address
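The 9/13/2 split in the examples above can be verified in code. An illustrative sketch (the name `set_assoc` is an assumption); addresses are written on the slide as tag field, then the low 15 bits:

```python
# 9-bit tag, 13-bit set, 2-bit word: the example's field widths.
def set_assoc(addr: int):
    word = addr & 0b11
    s = (addr >> 2) & (2**13 - 1)
    tag = addr >> 15
    return tag, s, word

addr = (0x1FF << 15) | 0x7FFC   # the slide's "1FF 7FFC"
tag, s, word = set_assoc(addr)
print(hex(tag), hex(s))         # 0x1ff 0x1fff
```

The second example line, "02C 0004", decodes the same way to tag 02C, set 0001.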
Replacement Algorithms (1): Direct Mapping
- When a new block is brought into the cache, one of the existing blocks must be replaced.
- Direct mapping
- No choice
- Each block only maps to one line
- Replace that line
Replacement Algorithms (2): Associative and Set Associative
- Hardware-implemented algorithm (for speed)
- Least recently used (LRU)
- Replace the block in the set which has been in the cache longest with no reference to it (hit ratio, time)
- e.g. in 2-way set associative
- Which of the 2 blocks is LRU?
- First in first out (FIFO)
- Replace the block in the set that has been in the cache longest (time)
- Least frequently used (LFU)
- Replace the block in the set which has had the fewest hits (hit ratio)
- Random
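The LRU policy for one set can be sketched with an ordered dictionary. A software illustration of the policy only (the class name `LRUSet` is an assumption; real caches do this in hardware):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set of k lines with least-recently-used replacement."""
    def __init__(self, k: int):
        self.k = k
        self.lines = OrderedDict()          # tag -> block data, oldest first

    def access(self, tag, load_block):
        if tag in self.lines:               # hit: mark most recently used
            self.lines.move_to_end(tag)
        else:                               # miss: evict the LRU line if full
            if len(self.lines) >= self.k:
                self.lines.popitem(last=False)
            self.lines[tag] = load_block(tag)
        return self.lines[tag]

s = LRUSet(k=2)
s.access("A", str.lower)
s.access("B", str.lower)
s.access("A", str.lower)   # "A" becomes most recently used
s.access("C", str.lower)   # evicts "B", the least recently used line
```

For a 2-way set, hardware needs only a single USE bit per set to track which line is LRU.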
Write Policy
- Must not overwrite a cache block unless main memory is up to date
- Problems to contend with:
- More than one device may have access to main memory
- Data inconsistent between memory and cache
- Multiple CPUs may have individual caches
- Data inconsistent among caches
- Write policies:
- Write through
- Write back
- Write once
Write Through
- All writes go to main memory as well as to the cache
- Any other processor-cache can monitor main memory traffic to keep its local (to the CPU) cache updated
- Disadvantages
- Lots of traffic
- Slows down writes
Write Back
- Updates are initially made in the cache only
- An update bit for the cache slot is set when an update occurs
- If a block in the cache is to be replaced, write it to main memory only if the update bit is set
- Other caches get out of sync
- I/O must access main memory through the cache
- Because portions of main memory are invalid
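The update (dirty) bit mechanism can be sketched as follows. An illustration only; the names `WriteBackLine`, `write`, and `evict` are assumptions, not from the text:

```python
class WriteBackLine:
    """A cache line with a dirty (update) bit for write-back policy."""
    def __init__(self, block_no, data):
        self.block_no, self.data, self.dirty = block_no, data, False

def write(line, offset, value):
    line.data[offset] = value
    line.dirty = True                # update bit set; memory not touched yet

def evict(line, memory):
    if line.dirty:                   # write back only if the line was modified
        memory[line.block_no] = line.data

memory = {0: [0, 0, 0, 0]}
line = WriteBackLine(0, list(memory[0]))
write(line, 1, 99)
assert memory[0][1] == 0             # main memory is still stale
evict(line, memory)
assert memory[0][1] == 99            # memory updated only on eviction
```

The window between the write and the eviction is exactly why other caches and I/O can see stale data.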
Approaches to Cache Coherency
- Bus watching with write through
- Each cache controller monitors the address lines to detect write operations to memory by other bus masters
- This strategy depends on the use of a write-through policy by all cache controllers
- Hardware transparency
- Additional hardware is used to ensure that all updates to main memory via cache are reflected in all caches
- Noncachable memory
- Only a portion of main memory is shared by more than one processor
- In such a system, all accesses to shared memory are cache misses, because the shared memory is never copied to the cache
- The noncachable memory can be identified using chip-select logic or high-order address bits
Line Size
- The principle of locality
- Data in the vicinity of a referenced word is likely to be referenced in the near future
- The relationship between block size and hit ratio is complex, depending on the locality characteristics of a particular program, and no definitive optimum value has been found
- A size of from two to eight words seems reasonably close to optimum
Number of Caches
- A single cache
- Multiple caches
- The number of levels of cache
- The use of unified versus split caches
- Split caches: one dedicated to instructions and one dedicated to data
- Key advantage of split caches: eliminates contention for the cache between the instruction processor and the execution unit
- Unified cache: a single cache used to store both data and instructions
- For a given cache size, a unified cache has a higher hit rate than split caches because it balances the load between instruction and data fetches automatically
Number of Caches (2)
- The on-chip cache: cache and processor on the same chip
- When the requested instruction or data is found in the on-chip cache, the bus access is eliminated. Because of the short data paths internal to the processor, on-chip cache accesses will complete appreciably faster than would even zero-wait-state bus cycles.
- Advantages
- Reduces the processor's external bus activity
- Speeds up execution times
- Increases overall system performance
- A two-level cache
- The internal cache is designated level 1 (L1)
- The external cache is designated level 2 (L2)
4.4 Pentium Cache
- Foreground reading
- Find out the details of the Pentium II cache systems
- NOT just from Stallings!
4.5 Newer RAM Technology (1)
- Basic DRAM has been the same since the first RAM chips
- Constraints of the traditional DRAM chip
- Its internal architecture and its interface to the processor's memory bus
- Enhanced DRAM
- Contains a small SRAM as well
- The SRAM holds the last line read
- A comparator stores the 11-bit value of the most recent row address selection
- Cache DRAM (CDRAM)
- Larger SRAM component
- Used as a cache or serial buffer
Newer RAM Technology (2)
- Synchronous DRAM (SDRAM)
- Access is synchronized with an external clock, unlike the asynchronous DRAM
- The address is presented to the RAM
- Since SDRAM moves data in time with the system clock, the CPU knows when the data will be ready
- The CPU does not have to wait; it can do something else
- Burst mode allows the SDRAM to set up a stream of data and fire it out in a block
Internal Logic of the SDRAM
- In burst mode, a series of data bits can be clocked out rapidly after the first bit has been accessed. Burst mode is useful when all the bits to be accessed are in sequence and in the same row of the array as the initial access.
- A dual-bank internal architecture improves opportunities for on-chip parallelism.
- The mode register and associated control logic provide a mechanism to customize the SDRAM to suit specific system needs.
Newer RAM Technology (3)
- Foreground reading
- Check out any other RAM you can find
- See Web site
- The RAM Guide
Exercises
- P143: 4.4, 4.6, 4.7, 4.8
- P145: 4.20
- Deadline