Title: Internal Memory
1William Stallings Computer Organization and
Architecture
Chapter 4 Internal Memory
2Memory
- How much ?
- As much as possible
- How fast ?
- As fast as possible
- How expensive ?
- As cheap as possible
- Fast memory is expensive
- Large memory is expensive
- The larger the memory, the slower the access
3Memory Hierarchy
- CPU Registers
- L1 cache (on chip)
- L2 cache (on board)
- Main memory
- Disk cache
- Disk
- Optical
- Tape
Access time
Size
Access Frequency
Cost per bit
4Characteristics
- Location
- Capacity
- Unit of transfer
- Access method
- Performance
- Physical type
- Physical characteristics
- Organisation
5Location
- CPU
- Registers
- Internal access directly from CPU
- Cache
- RAM
- External access through I/O module
- Disks
- CD-ROM,
6Capacity
- Word size
- The natural unit of organisation
- Usually equal to the number of bits used for representing numbers or instructions
- Typical word size: 8 bits, 16 bits, 32 bits
- Number of words (or Bytes)
- 1 Byte = 8 bits = $2^3$ bits
- 1 KByte = $2^{10}$ Bytes = $2^{10} \times 2^3$ bits = 1024 Bytes (Kilo)
- 1 MByte = $2^{10}$ KBytes = 1024 KBytes (Mega)
- 1 GByte = $2^{10}$ MBytes = $2^{30}$ Bytes (Giga)
- 1 TByte = $2^{10}$ GBytes = 1024 GBytes (Tera)
7Unit of Transfer
- Number of bits that can be read/written at the same time
- Internal
- Usually governed by data bus width
- Bus width may be equal to word size or (often) larger
- Typical bus width: 64, 128, 256 bits
- External
- Usually a block which is much larger than a word
- A related concept: addressable unit
- Smallest location which can be uniquely addressed
- Word internally
- Cluster on disks
8Access Methods (1)
- Sequential
- Start at the beginning and read through in order
- Access time depends on location of data and previous location
- e.g. tape
- Direct
- Individual blocks have unique address
- Access is by jumping to vicinity plus sequential search
- Access time depends on location and previous location
- e.g. disk
9Access Methods (2)
- Random
- Individual addresses identify locations exactly
- Access time is independent of location or previous access
- e.g. RAM
- Associative
- Data is located by a comparison with contents of a portion of the store
- Access time is independent of location or previous access
- e.g. cache
10Performance
- Access time
- Time between presenting the address and getting the valid data
- Memory Cycle time
- Time may be required for the memory to recover before next access
- Cycle time = access time + recovery time
- Transfer Rate
- Rate at which data can be moved
- $T_N = T_A + N/R$
where $N$ = number of bits, $T_A$ = access time, $T_N$ = time needed to read $N$ bits, $R$ = transfer rate
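The transfer-time formula can be checked with a few lines of Python. This is only an illustrative sketch: the function name `transfer_time` and the device figures (10 ms access time, 1 Mbit/s, 4096-bit block) are assumptions, not values from the slides.

```python
def transfer_time(access_time: float, n_bits: int, rate: float) -> float:
    """Time to read n_bits: T_N = T_A + N / R (seconds, bits, bits per second)."""
    return access_time + n_bits / rate


# Hypothetical device: 10 ms access time, 1 Mbit/s transfer rate, 4096-bit block.
t_n = transfer_time(access_time=0.010, n_bits=4096, rate=1_000_000)
print(f"T_N = {t_n * 1000:.3f} ms")
```

For small blocks the access time dominates; the N/R term only matters once N is large.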
11Physical Types
- Semiconductor
- RAM, ROM, EPROM, Cache
- Magnetic
- Disk & Tape
- Optical
- CD & DVD
- Others
- Bubble
- Hologram
12Semiconductor Memory
- RAM (Random Access Memory)
- Misnamed, as all semiconductor memories are random access
- Read/Write
- Volatile
- Temporary storage
- Static or dynamic
- ROM (Read only memory)
- Permanent storage
- Read only
13Dynamic RAM
- Bits stored as charge in capacitors
- Charges leak
- Need refreshing even when powered
- Simpler construction
- Smaller per bit
- Less expensive
- Need refresh circuits
- Slower
- Main memory (static RAM would be too expensive)
14Static RAM
- Bits stored as on/off switches
- No charges to leak
- No refreshing needed when powered
- More complex construction
- Larger per bit
- More expensive
- Does not need refresh circuits
- Faster
- Cache (here the faster the better)
15Read Only Memory (ROM)
- Permanent storage
- Microprogramming (see later)
- Library subroutines
- Systems programs (BIOS)
- Function tables
16Types of ROM
- Written during manufacture
- Very expensive for small runs
- Programmable (once)
- PROM
- Needs special equipment to program
- Read mostly
- Erasable Programmable (EPROM)
- Erased by UV (it can take up to 20 minutes)
- Electrically Erasable (EEPROM)
- Takes much longer to write than read
- a single byte can be erased
- Flash memory
- Erase memory electrically block-at-a-time
17Physical Characteristics
- Decay (refresh time)
- Volatility (needs power source)
- Erasable
- Power consumption
18Organisation
- Physical arrangement of bits into words
- Not always obvious
- e.g. interleaved
19Basic Organization (1)
- Basic element: memory cell
- has 2 stable states: one represents 0, the other 1
- can be written at least once
- can be read
[Figure: a memory cell in Write mode (Select, R/W Control, Input Data) and in Read mode (Select, R/W Control, Output Data)]
20Basic Organization (2)
- Basic organization of a 512x512 bits chip
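[Figure: 512x512 array of memory cells with timing and control; address lines A0-A8 (9 bits) feed the row address decoder, A9-A17 (9 bits) feed the column address decoder; a sense amplifier and I/O gate connects the array to the 1-bit data line D0]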
21Module Organisation
- Basic organization of a 256KB chip
- 8 times a 512x512 bits chip
- For a 1 MB chip replicate 4 times this
organization
22Module Organisation (1 MByte)
23Organisation for larger sizes
- The larger the size, the higher the number of address pins
- For $2^k$ words, $k$ pins are needed
- A solution to reduce the number of address pins
- Multiplex row address and column address
- $k/2$ pins to address $2^k$ Bytes
- Adding one more pin doubles the range of both row and column values, so x4 capacity
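The pin arithmetic can be sketched as follows. The helper names `address_pins` and `words_addressable` are hypothetical, and the sketch assumes the number of locations is a power of two.

```python
def address_pins(n_words: int, multiplexed: bool = False) -> int:
    """Address pins needed for n_words locations (n_words assumed a power of two)."""
    k = (n_words - 1).bit_length()      # k such that 2**k == n_words
    return (k + 1) // 2 if multiplexed else k


def words_addressable(pins: int, multiplexed: bool = False) -> int:
    """Locations reachable with the given number of address pins."""
    return 2 ** (2 * pins) if multiplexed else 2 ** pins


four_m = 4 * 2**20                      # 4M locations, as in a 4M x 4 DRAM
print(address_pins(four_m))             # pins without multiplexing
print(address_pins(four_m, True))       # pins with row/column multiplexing
# Capacity growth when one multiplexed pin is added:
print(words_addressable(12, True) // words_addressable(11, True))
```

With multiplexing each pin is used twice (row, then column), which is why one extra pin multiplies the capacity by four.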
24Typical 16 Mb DRAM (4M x 4)
25Refreshing (Dynamic RAM)
- Refresh circuit included on chip
- Disable chip
- Count through rows
- Read & write back
- Takes time
- Slows down apparent performance
26Packaging
27Error Correction
- Hard Failure
- Permanent defect
- Soft Error
- Random, non-destructive
- No permanent damage to memory
- Detected using Hamming error correcting code
- it is able to detect and correct 1-bit errors
28Error Correcting Code Function
29A simple example of correction (1)
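- Correcting errors in 4-bit words
- 3 control groups (A, B, C)
- In each control group add 1 parity bit
[Figure: Venn diagram of three overlapping control groups A, B, C, shown first with the 4 data bits (1, 1, 1, 0) and then with the 3 parity bits added]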
30A simple example of correction (2)
- One of the bits changes value
- Using the control bits, the right value is restored
[Figure: the same Venn diagram of control groups A, B, C with one bit flipped, and again after the parity checks have restored it]
31Compare Circuit
- it takes two K-length binary strings X, Y as input
- $X = X_K \dots X_1$
- $Y = Y_K \dots Y_1$
- it returns a K-length binary string Z (syndrome)
- $Z = Z_K \dots Z_1$
- $Z_i = X_i \oplus Y_i$ for each $i = 1, \dots, K$
- $Z = 0 \dots 0$ means no error
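The compare circuit is a bitwise XOR of the stored and recomputed control bits. A minimal sketch, assuming the two strings are given as Python strings of '0'/'1' characters (the name `syndrome` is chosen here for illustration):

```python
def syndrome(x: str, y: str) -> str:
    """Bitwise XOR of two equal-length binary strings (Z_i = X_i xor Y_i)."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same length K")
    return "".join("1" if a != b else "0" for a, b in zip(x, y))


print(syndrome("101", "101"))   # identical control bits: no error
print(syndrome("101", "110"))   # differing control bits: non-zero syndrome
```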
32Relation between M and K
- Z may assume $2^K$ values
- the value $Z = 0 \dots 0$ means no error
- the error may be in any bit among the $M + K$ bits
- it must be
$2^K - 1 \ge M + K$
Data bits (M)   Control bits (K)   Additional memory (%)
4               3                  75
8               4                  50
16              5                  31.25
32              6                  18.75
64              7                  10.94
128             8                  6.25
256             9                  3.52
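The table can be reproduced by searching for the smallest K that satisfies the inequality. A short sketch (the function name `min_control_bits` is an assumption made for this example):

```python
def min_control_bits(m: int) -> int:
    """Smallest K with 2**K - 1 >= M + K."""
    k = 1
    while 2**k - 1 < m + k:
        k += 1
    return k


for m in (4, 8, 16, 32, 64, 128, 256):
    k = min_control_bits(m)
    print(f"M={m:3d}  K={k}  overhead={100 * k / m:.2f}%")
```

The relative overhead shrinks as the word gets wider, because K grows only logarithmically with M.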
33How to arrange the MK bits
- the $M + K$ bits are arranged so that
- if Z contains a single bit equal to 1
- the error occurred in the corresponding control bit
- if Z contains more than one bit equal to 1
- the error occurred in the i-th bit, where i is the value (in binary) of Z
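34The case M = 4
bit position      7    6    5    4    3    2    1
position number   111  110  101  100  011  010  001
data bits         D4   D3   D2        D1
control bits                     C4        C2   C1
$C_1 = D_1 \oplus D_2 \oplus D_4$
$C_2 = D_1 \oplus D_3 \oplus D_4$
$C_4 = D_2 \oplus D_3 \oplus D_4$
[Figure: Venn diagram placing D1-D4 and C1, C2, C4 in the three control groups]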
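The M = 4 layout can be tried out in a few lines. This is a sketch under the bit arrangement given in the table (positions 1, 2, 4 hold C1, C2, C4; positions 3, 5, 6, 7 hold D1 to D4); the function names `encode` and `correct` and the list representation are assumptions made for illustration.

```python
def encode(d1: int, d2: int, d3: int, d4: int) -> list:
    """Return the 7-bit word as a list indexed by bit position (index 0 unused)."""
    c1 = d1 ^ d2 ^ d4
    c2 = d1 ^ d3 ^ d4
    c4 = d2 ^ d3 ^ d4
    return [0, c1, c2, d1, c4, d2, d3, d4]


def correct(word: list) -> tuple:
    """Recompute the control bits, build the syndrome Z, and fix a 1-bit error."""
    w = list(word)
    c1 = w[3] ^ w[5] ^ w[7]
    c2 = w[3] ^ w[6] ^ w[7]
    c4 = w[5] ^ w[6] ^ w[7]
    # Z compares the recomputed control bits with the stored ones.
    z = ((c4 ^ w[4]) << 2) | ((c2 ^ w[2]) << 1) | (c1 ^ w[1])
    if z:                       # Z != 0: its value is the position of the bad bit
        w[z] ^= 1
    return w, z


stored = encode(1, 0, 1, 1)
damaged = list(stored)
damaged[6] ^= 1                 # flip D3 (bit position 6)
fixed, z = correct(damaged)
print(z, fixed == stored)
```

A syndrome of 1, 2 or 4 points at a control bit, any other non-zero value at a data bit, matching the arrangement rule on the previous slide.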
35Exercise
- Design a Hamming error correcting code for 8-bit words
- See the textbook for the solution
36Cache
- Small amount of fast memory
- Sits between normal main memory and CPU
- May be located on CPU chip or module
37Cache operation - overview
- CPU requests contents of memory location
- Check cache for this data
- If present (hit), get from cache (fast)
- If not present (miss), read required block from main memory to cache
- Then deliver from cache to CPU
38Cache Performance
- Cache access time $t = 1$
- Memory access time $T = 10$
- Hit probability $H$
- $T_{average\ access} = tH + (T + t)(1 - H) = t + (1 - H)T$
[Figure: plot of average access time against hit probability H]
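The average access time is easy to tabulate. A small sketch using the slide's values t = 1 and T = 10 as defaults (the function name and the sample hit probabilities are chosen here for illustration):

```python
def average_access_time(hit_probability: float, t: float = 1.0, big_t: float = 10.0) -> float:
    """Average access time: t*H + (T + t)*(1 - H), which simplifies to t + (1 - H)*T."""
    return t + (1.0 - hit_probability) * big_t


for h in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(f"H={h:.2f}  average access time={average_access_time(h):.2f}")
```

On a miss the block is first loaded into the cache and then delivered from it, which is why a miss costs T + t rather than T.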
39Locality of Reference (Denning68)
- Spatial Locality
- Memory cells physically close to those just accessed tend to be accessed
- Temporal Locality
- During the execution of a program, accesses to the same memory cells tend to be close in time
- e.g. loops, arrays
40Typical Cache Organization
41Cache Design
- Size
- Mapping Function
- Replacement Algorithm
- Write Policy
- Block Size
- Number of Caches
42Size does matter
- Cost
- More cache is expensive
- Speed
- More cache is faster (up to a point)
- Checking cache for data takes time
43Cache-memory mapping
- Main memory has $M = 2^n / K$ blocks of K words each
- The cache has C lines, with $C \ll M$
- Each block is mapped to a cache line
44Mapping Function
- Word size: 1 Byte
- Cache of 64 KBytes ($2^{16}$ Bytes)
- Cache block of 4 bytes
- 64 KB / 4 = 16K ($2^{14}$) lines of 4 bytes
- 16 MBytes ($2^{24}$) main memory
- $2^{24} / 4$ = 4M ($2^{22}$) blocks in main memory
- Map $2^{22}$ blocks to $2^{14}$ lines of cache
45A simple example of Direct Mapping
[Figure: direct mapping example with 5-bit addresses 00000-11111 split into tag (s-r), line (r) and word (w) fields; Block 0 -> Line 0, Block 1 -> Line 1, Block 2 -> Line 2, Block 3 -> Line 3, Block 4 -> Line 0, ..., Block 15 -> Line 3]
46Direct Mapping (1)
- Each block of main memory is mapped to a specific cache line
- i.e. if a block is in cache, it must be in one specific place
- In a cache of C lines, block j is stored in line i, where $i = j \bmod C$
47Direct Mapping (2)
- Address is in two parts
- w Least Significant Bits (LSB) identify a unique word
- s Most Significant Bits (MSB) specify one memory block
- The MSBs are split into
- a cache line field r (least significant)
- a tag of s-r (most significant)
48Direct Mapping Summarizing
- address length $n = s + w$ bits
- number of addressable units (words) $2^{s+w}$
- block size = cache line size = $2^w$ words
- number of memory blocks $2^{s+w} / 2^w = 2^s$
- number of cache lines $C = 2^r$
- tag length $(s - r)$ bits
49Cache Line Mapping Table
Cache line   Main memory blocks held
0            0, C, 2C, ..., $2^s - C$
1            1, C+1, 2C+1, ..., $2^s - C + 1$
C-1          C-1, 2C-1, 3C-1, ..., $2^s - 1$
50Direct Mapping Address Structure
Tag (s-r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits
- 24 bit address: 16 MBytes ($2^{24}$) main memory
- 2 bit word identifier (4 byte block)
- Cache: 64 KB / 4 = 16K ($2^{14}$) lines of 4 bytes
- 22 bit block identifier
- 8 bit tag (= 22 - 14)
- 14 bit slot or line
- No two blocks mapping to the same line have the same Tag field
- Check contents of cache by finding line and checking Tag
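The 8/14/2 address split can be expressed with shifts and masks. A minimal sketch; the function name `split_direct` and the sample address 0x16339C are assumptions chosen for illustration.

```python
WORD_BITS = 2    # w: byte within the 4-byte block
LINE_BITS = 14   # r: one of 2**14 cache lines
TAG_BITS = 8     # s - r: remaining most significant bits of the 24-bit address


def split_direct(addr: int) -> tuple:
    """Split a 24-bit address into (tag, line, word) for the direct-mapped cache."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_direct(0x16339C)
print(f"tag={tag:#x} line={line:#x} word={word}")

# The line field is the block number modulo the number of lines (i = j mod C).
block = 0x16339C >> WORD_BITS
print(block % 2**LINE_BITS == line)
```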
51Direct Mapping Cache Organization
52Direct Mapping pros & cons
- Simple
- Inexpensive
- Fixed location for given block
- If a program repeatedly accesses 2 distinct
blocks that are mapped to the same line, cache
misses are very high (thrashing)
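Thrashing is easy to reproduce with a toy model. The sketch below assumes a 4-line direct-mapped cache and a trace of block numbers; the function name `direct_mapped_misses` and both traces are made up for this example.

```python
def direct_mapped_misses(block_trace: list, n_lines: int) -> int:
    """Count misses for a trace of block numbers in a direct-mapped cache."""
    lines = [None] * n_lines            # block currently held by each line
    misses = 0
    for j in block_trace:
        i = j % n_lines                 # i = j mod C
        if lines[i] != j:
            misses += 1
            lines[i] = j                # the new block evicts whatever was there
    return misses


# Blocks 0 and 4 both map to line 0 of a 4-line cache: every access misses.
print(direct_mapped_misses([0, 4] * 5, n_lines=4))
# Blocks 0 and 1 map to different lines: only the first two accesses miss.
print(direct_mapped_misses([0, 1] * 5, n_lines=4))
```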
53Associative Mapping
- A main memory block can load into any line of cache
- Memory address is interpreted as tag and word
- Tag uniquely identifies block of memory
- Every line's tag is examined for a match
- Cache searching gets expensive
54A simple example of Associative Mapping
[Figure: associative mapping example with 5-bit addresses 00000-11111 split into tag (s) and word (w) fields; Blocks 0-15 of two words (w0, w1) each can load into any of Lines 0-3, which here hold tags 0011, 0001, 0000, 0100]
Note: a replacement algorithm is needed (see later)
55Associative Mapping Summarizing
- address length $n = s + w$ bits
- number of addressable units (words) $2^{s+w}$
- block size = cache line size = $2^w$ words
- number of memory blocks $2^{s+w} / 2^w = 2^s$
- number of cache lines: not specified
- tag length $s$ bits
56Associative Mapping Address Structure
Tag: 22 bits | Word: 2 bits
- 22 bit tag stored with each 4 byte block of data
- Compare tag field with tag entry in cache to check for hit
- Least significant 2 bits of address identify which byte is required from the 4 byte data block
57Fully Associative Cache Organization
58Set Associative Mapping
- Cache is divided into v sets
- Each set contains k lines
- number of cache lines $C = v \times k$
- A given block maps to any line in a given set
- Block j can be in any line of set i, where $i = j \bmod v$
- There are k lines in a set (k-way set associative mapping)
- $k = 1$: direct mapping; $k = C$: associative mapping
- The most common choice in practice is 2 lines per set
- 2 way associative mapping
- A given block can be in only one set, but in any of its 2 lines
59A simple example of Set Associative Mapping
[Figure: set associative mapping example with 5-bit addresses 00000-11111 split into tag (s-d), set (d) and word (w) fields; blocks of two words (w0, w1) each; Block 0 -> Set 0, Block 1 -> Set 1, Block 2 -> Set 0, Block 3 -> Set 1, Block 4 -> Set 0, ..., Block 15 -> Set 1; Lines 0-3 hold tags 010, 000, 111, 000]
Note: a replacement algorithm is needed (see later)
60Set Associative Mapping
- Address is in two parts
- w Least Significant Bits (LSB) identify a unique word
- s Most Significant Bits (MSB) specify one memory block
- The MSBs are split into
- a cache set field d (least significant)
- a tag of s-d (most significant)
61Set Associative Mapping Summarizing
- address length $n = s + w$ bits
- number of addressable units (words) $2^{s+w}$
- block size = cache line size = $2^w$ words
- number of memory blocks $2^{s+w} / 2^w = 2^s$
- number of lines for each cache set: $k$
- number of sets $v = 2^d$
- number of cache lines $C = k \times v = k \times 2^d$
- tag length $(s - d)$ bits
62Set Associative Mapping Address Structure
Tag: 9 bits | Set: 13 bits | Word: 2 bits
- number of cache lines $2^{14}$
- number of cache sets $2^{13}$
- each cache set has two lines: 2-way set associative mapping
- Use set field to determine cache set to look in
- Compare Tag field with all lines in the set to see if we have a hit
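The 9/13/2 split and the hit check can be sketched in the same way. The names `split_set_associative` and `is_hit`, the dictionary used to model the cache sets, and the sample contents are all assumptions made for illustration.

```python
WORD_BITS = 2    # w: byte within the 4-byte block
SET_BITS = 13    # d: one of 2**13 sets


def split_set_associative(addr: int) -> tuple:
    """Split a 24-bit address into (tag, set index, word)."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_index = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


def is_hit(cache_sets: dict, addr: int) -> bool:
    """Hit if the address tag matches any line of the selected set."""
    tag, set_index, _ = split_set_associative(addr)
    return tag in cache_sets.get(set_index, [])


# Hypothetical cache contents: set 0x0CE7 holds two lines with tags 0x02C and 0x001.
cache_sets = {0x0CE7: [0x02C, 0x001]}
print(split_set_associative(0x16339C))
print(is_hit(cache_sets, 0x16339C), is_hit(cache_sets, 0xFFFFFC))
```

Only the k tags of one set are compared, not every tag in the cache, which is what keeps the lookup cheaper than full associative mapping.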
63Two Way Set Associative Cache Organization
64Replacement Algorithms (1): Direct mapping
- No choice
- Each block only maps to one line
- Replace that line
65Replacement Algorithms (2): Associative & Set Associative
- Hardware implemented algorithm (to obtain speed)
- Least Recently used (LRU)
- e.g. in 2 way set associative
- Which of the 2 blocks is LRU?
- First in first out (FIFO)
- replace block that has been in cache longest
- Least frequently used
- replace block which has had fewest hits
- Random
- Almost as good as LRU
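LRU for a single cache set can be modelled in software as follows. Real caches implement this in hardware (for a 2-way set a single bit per set is enough), so this is only a behavioural sketch; the class name `LRUSet` and the tag values are made up for the example.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set of k lines with least-recently-used replacement."""

    def __init__(self, k: int):
        self.k = k
        self.lines = OrderedDict()      # tags ordered from least to most recently used

    def access(self, tag) -> bool:
        """Return True on a hit; on a miss load the tag, evicting the LRU line if full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)         # mark as most recently used
            return True
        if len(self.lines) >= self.k:
            self.lines.popitem(last=False)      # evict the least recently used tag
        self.lines[tag] = None
        return False


two_way = LRUSet(k=2)
results = [two_way.access(tag) for tag in ["A", "B", "A", "C", "B"]]
print(results, list(two_way.lines))
```

FIFO would differ only in not reordering on a hit: removing the `move_to_end` call turns this sketch into a FIFO set.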
66Write Policy
- Must not overwrite a cache block unless main memory is up to date
- Multiple CPUs may have individual caches
- I/O may address main memory directly
67Write through
- All writes go to main memory as well as cache
- Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
- Lots of traffic
- Slows down writes
68Write back
- Updates initially made in cache only
- Update bit for cache slot is set when update occurs
- If block is to be replaced, write to main memory only if update bit is set
- I/O must access main memory through cache
- N.B. 15% of memory references are writes
- Caches of other devices get out of sync
- Cache coherency problem (a general problem in distributed systems!)
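The role of the update bit can be illustrated with a toy direct-mapped cache that holds one value per block. The class `WriteBackCache`, its fields and the sample accesses are assumptions for this sketch, not a description of any real cache.

```python
class WriteBackCache:
    """Toy direct-mapped write-back cache; main memory is a list of block values."""

    def __init__(self, memory: list, n_lines: int):
        self.memory = memory
        self.n_lines = n_lines
        self.lines = [None] * n_lines   # each line: [block number, value, update bit]
        self.memory_writes = 0

    def _load(self, block: int) -> list:
        i = block % self.n_lines
        line = self.lines[i]
        if line is None or line[0] != block:
            if line is not None and line[2]:
                # Replaced block was updated: write it back to main memory first.
                self.memory[line[0]] = line[1]
                self.memory_writes += 1
            line = [block, self.memory[block], False]
            self.lines[i] = line
        return line

    def read(self, block: int):
        return self._load(block)[1]

    def write(self, block: int, value) -> None:
        line = self._load(block)
        line[1] = value
        line[2] = True                  # set the update bit; main memory is now stale


memory = [0] * 8
cache = WriteBackCache(memory, n_lines=4)
cache.write(0, 11)
cache.write(0, 22)
print(memory[0], cache.memory_writes)   # main memory not yet updated
cache.read(4)                           # block 4 replaces block 0 in line 0
print(memory[0], cache.memory_writes)   # block 0 was written back once
```

Between the write and the replacement, main memory holds a stale value; anything that reads it directly (I/O, another CPU) sees old data, which is the coherency problem named above.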
69Block Size
- Too small
- Locality of reference is not used
- Too large
- Locality of reference is lost
70Number of Caches
- 2 levels of cache
- L1 on chip (since technology allows it)
- L2 on board (to fill the speed gap)
- 2 kinds of cache
- Data cache
- Instruction cache
- Lets instruction fetching and data fetching proceed in parallel without interfering with each other