Title: CSC 345 Computer Architecture
1CSC 345 Computer Architecture
- Jane Huang
- Lecture 5
- Memory Organization
- Error Correction
2Review of cache
- Stallings Question 4.2: For the hex main memory addresses 111111, 666666, BBBBBB, show the following information in hex form.
- Direct mapped cache: 16-Mbyte main memory, with FFFC words of 32 bits each. 16-Kword cache with 3FFF words of 32 bits each.
- Show Tag, Line, and Word values for these addresses.
- Associative cache
3- Direct mapped cache: 16-Mbyte main memory, with FFFC words of 32 bits each. 16-Kword cache with 3FFF words of 32 bits each.
- Show Tag, Line, and Word values for these addresses.
- Specify the following values for hex addresses 111111, 666666, BBBBBB:
- Word
- Line
- Tag
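The Tag/Line/Word split can be checked mechanically. The following is a minimal sketch, assuming the usual layout for this problem: a 24-bit byte address (16 Mbytes), 4-byte lines giving a 2-bit word field, and 16K lines giving a 14-bit line field, which leaves an 8-bit tag. The function name `split_direct_mapped` and the two width constants are illustrative and do not come from the lecture.

```python
WORD_BITS = 2    # assumed: 4 bytes per line
LINE_BITS = 14   # assumed: 16K lines in the cache


def split_direct_mapped(address: int) -> tuple[int, int, int]:
    """Split a 24-bit address into (tag, line, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


for addr in (0x111111, 0x666666, 0xBBBBBB):
    tag, line, word = split_direct_mapped(addr)
    print(f"{addr:06X}: tag={tag:02X} line={line:04X} word={word:X}")
```

Under these assumed field widths, address 111111 splits into tag 11, line 0444, word 1.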
4- Associative Memory
- Address length
- Number of addressable units
- Block size
- Number of blocks in main memory
- Number of lines in cache
- Size of tag.
5- Two-way set associative cache
- Address length
- Number of addressable units
- Block size
- Number of blocks in main memory
- Number of lines in set
- Number of sets
- Number of lines in cache
- Size of tag.
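The quantities listed above all follow from a handful of parameters. The sketch below assumes power-of-two sizes and reuses the numbers of the direct-mapped exercise (24-bit address, 4-byte blocks, 16K cache lines) with two lines per set. The function `set_associative_params` and its dictionary keys are hypothetical names chosen for illustration.

```python
def set_associative_params(address_bits: int, block_bytes: int,
                           cache_lines: int, ways: int) -> dict:
    """Derive the cache parameters; all sizes are assumed to be powers of two."""
    word_bits = block_bytes.bit_length() - 1   # log2(block size)
    block_bits = address_bits - word_bits      # bits that identify a block
    sets = cache_lines // ways
    set_bits = sets.bit_length() - 1           # log2(number of sets)
    return {
        "address_length": address_bits,
        "addressable_units": 2 ** address_bits,
        "block_size": block_bytes,
        "blocks_in_main_memory": 2 ** block_bits,
        "lines_in_set": ways,
        "number_of_sets": sets,
        "lines_in_cache": cache_lines,
        "tag_bits": block_bits - set_bits,
    }


params = set_associative_params(24, 4, 16 * 1024, 2)
for name, value in params.items():
    print(f"{name}: {value}")
```

Setting `ways` equal to `cache_lines` leaves a single set, so the tag grows to the full block number. That is the fully associative case from the previous slide.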
6Semiconductor Main Memory
- Basic element: the memory cell
- Exhibits 2 stable states, used to represent 0 and 1
- Can be written into (at least once)
- Can be read to sense state
- Random Access Memory
- Read and written easily by use of electrical signals
- Volatile: must be provided with a constant electrical supply or else data will be lost (only good for temporary storage).
- DRAM (Dynamic) and SRAM (Static)
7Dynamic RAM (DRAM)
- DRAM is made from cells that store data as charge on capacitors (charge = 1, no charge = 0).
- Capacitors have a tendency to discharge.
- DRAMs need periodic refreshing to maintain data storage.
Static RAM (SRAM)
- SRAM is a digital device.
- Binary values are stored using traditional flip-flop logic gates.
- SRAM holds its value as long as power is supplied.
SRAM vs. DRAM
- Both are volatile.
- DRAM is simpler, smaller, denser, and less expensive, but needs refresh circuitry. (Only worthwhile for larger memories: main memory.)
- SRAM is faster and more expensive, and is therefore usually used for smaller cache memories.
8ROM
- Read-only memory
- Contains a permanent pattern of bits, therefore no power source is needed to maintain bit values.
- Created like any other integrated chip.
- Useful for microprogramming, system programs, function tables, etc.
- Problems
- Large fixed cost incurred whether 1 or 1000s of chips are made.
- No room for error.
- Programmable ROM
- If only a small number of ROMs of one memory content are needed, a good alternative is programmable ROM (PROM).
- PROM can only be written once, but the writing process is performed electronically and need not be done at the time of original chip fabrication.
- Provides flexibility and convenience.
- Read-mostly memory
- EPROM (erasable programmable read-only memory): erases everything.
- EEPROM (electrically erasable programmable read-only memory): erasable at byte level.
- Flash memory: uses electrical technology to flash-erase one section at a time.
9Chip Art Gallery
Chip designers often secretly add artwork to
the chips they design.
Where is Waldo?
We caught this silicon version of Waldo (that is
about 30 microns in size) hiding among caches,
buses, and registers while searching through many
thousands of square microns of complex circuitry
with a high-power optical microscope. Waldo is
the first Silicon Creature that we discovered,
and this led to an exhaustive search for more
creatures and construction of the Silicon Zoo
gallery.
http://www.wired.com/news/print/0,1294,17028,00.html
10Chip Art Gallery
Daffy Duck
As we see it, the engineers that designed this
wireframe version of Daffy Duck must have had a
very interesting sense of humor. We found it
deeply embedded within the circuitry of a RISC
microprocessor, about 1500 microns away from a
similar-style rendition of Waldo. Daffy is about
50 microns in size, making it necessary to use a
high-power (40X to 60X) microscope objective to
photograph the wireframe character.
http://www.wired.com/news/print/0,1294,17028,00.html
1264 bit ROM
13Use of a ROM to implement a conversion from Binary to Gray Code
An example of ROM (a 24-bit ROM consisting of 8 words of 3 bits each)

B2 B1 B0 | G2 G1 G0
0  0  0  | 0  0  0
0  0  1  | 0  0  1
0  1  0  | 0  1  1
0  1  1  | 0  1  0
1  0  0  | 1  1  0
1  0  1  | 1  1  1
1  1  0  | 1  0  1
1  1  1  | 1  0  0

- ROM only performs the read operation.
- A given input always produces the same output.
- Therefore a ROM is just a combinational circuit.
- It can also be viewed as a memory of 2^n words of b bits each, where n is the number of inputs and b is the number of outputs.

[Figure: three-input, eight-output decoder with inputs B2, B1, B0, word lines 000 to 111, and outputs G2, G1, G0.]
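Because a ROM always returns the same output for a given input, it can be modeled as a plain lookup table. The sketch below assumes the standard binary-reflected Gray code (each output is the input XORed with itself shifted right by one bit). The names `GRAY_ROM` and `rom_read` are illustrative.

```python
# Contents of the 8-word x 3-bit ROM: GRAY_ROM[b] holds the Gray code of b.
# Assumption: standard binary-reflected Gray code, g = b XOR (b >> 1).
GRAY_ROM = [b ^ (b >> 1) for b in range(8)]


def rom_read(address: int) -> int:
    """Read one 3-bit word; the 3 address bits play the role of B2 B1 B0."""
    return GRAY_ROM[address & 0b111]


for b in range(8):
    print(f"{b:03b} -> {rom_read(b):03b}")
```

The list has 2^3 = 8 entries of 3 bits each, which is where the 24-bit total comes from.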
14Chip Logic
- Trade-offs in terms of speed, capacity, and cost.
- Physical arrangement of cells matches the logical arrangement: the memory array is organized into W words of B bits each. Example: a 16-Mbit chip organized as 1M 16-bit words.
- One-bit-per-chip organization: data is read/written one bit at a time.
16-Mbit DRAM
15Typical 16 Megabit DRAM (4M X 4)
- 22-bit address multiplexed into the chip, 11 bits at a time.
- Select an entire row using the 11 most significant bits.
- Select a column using the 11 least significant bits.
- Refresh circuitry (DRAM)
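The row/column selection can be sketched as two bit-field extractions. This assumes the full cell address is the concatenation of the 11 row bits and the 11 column bits listed above (22 bits, matching 4M locations). The function name `split_dram_address` is hypothetical.

```python
ROW_BITS = 11   # most significant bits select the row
COL_BITS = 11   # least significant bits select the column


def split_dram_address(address: int) -> tuple[int, int]:
    """Return (row, column) for a 22-bit cell address."""
    row = (address >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = address & ((1 << COL_BITS) - 1)
    return row, col


print(split_dram_address(0x000801))   # row 1, column 1
print(split_dram_address(0x3FFFFF))   # last row, last column
```

On the real chip the two halves travel over the same pins one after the other, which is what "multiplexed" refers to.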
16256-Kbyte Memory Organization
- In this example a RAM chip contains 1 bit per word.
- For 256K 8-bit words we need 8 chips.
- Row address is sent simultaneously to all 8 chips.
- Followed by the column address, sent simultaneously to all 8 chips.
17Group Exercise
- Design a 512K × 4-bit memory using 256 × 256 chips.
- Show how the address would be used to access data.
18Error Correction
- A semiconductor memory system is subject to errors.
- Hard failures: permanent physical defects (environmental abuse, manufacturing defects, wear).
- Soft errors: power supply problems, alpha particles.
- Need logic for detecting and correcting errors.
- Basic technique
- Prior to storing data, a code is generated from the bits in the word.
- Code is stored alongside the word in memory.
- Code is used to identify and correct errors.
- When the word is fetched, a new code is generated and compared to the stored code. Three outcomes are possible:
- No error (normal case).
- A correctable error is detected and corrected.
- A non-fixable error is detected and reported.
19Hamming Code
[Figure: Venn diagram of three intersecting circles labeled A, B, and C.]
20Hamming Code
[Figure: the A, B, C Venn diagram filled in with data bits and parity bits.]
If a bit gets erroneously changed, the parity check fails in every circle that contains that bit. Here errors are found in circles A and C but not in B, so the bit shared only by A and C is the one in error and can be fixed.
21Single Bit Errors in 8-bit words
- 8 data bits
- The code needs to represent the bit position of the error. For example, if bit 2 were in error (10011001 → 10011011) we would like the syndrome word to output a value of 2 (0010). If no errors occurred the code should output 0 (0000).
- Therefore the code length K must be greater than or equal to $\log_2 W + 1$, where W is the word length; i.e., for 8 bits it must be big enough to represent the numbers 0 to 8, therefore 4 bits are needed.
- Syndrome is 0: no errors.
- Syndrome has exactly one bit set to 1: the error occurred in one of the check bits. No action is needed.
- Syndrome has more than one bit set to 1: the numerical value of the syndrome indicates the position of the data bit in error.
22Single Bit Errors in 8-bit words
- Data and check bits are arranged into a 12-bit word (8 data bits, 4 check bits).
- Bit positions are numbered from 1 to 12.
- Bit positions whose position numbers are powers of 2 are designated as check bits.
- Check bits are calculated as follows.
23Calculating check bits
C1 = D1 ⊕ D2 ⊕ D4 ⊕ D5 ⊕ D7
Each check bit operates on every data bit whose position number contains a 1 in the same bit position as the position number of that check bit.
24Example
- Input word: 00111001 (data bit D1 in the rightmost position)
- Calculate check bits:
- C1 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
- C2 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
- C3 = 0 ⊕ 0 ⊕ 1 ⊕ 0 = 1
- C4 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0
- Stored word: 001101001111
- If data bit 3 sustains an error (001101101111), the recalculated check bits are:
- C1 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
- C2 = 1 ⊕ 1 ⊕ 1 ⊕ 1 ⊕ 0 = 0
- C3 = 0 ⊕ 1 ⊕ 1 ⊕ 0 = 0
- C4 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0
- Calculate the syndrome word by XORing the stored check bits with the recalculated ones: 0111 ⊕ 0001 = 0110, i.e. bit position 6.
- D3 resides in bit position 6.
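The example can be reproduced in a few lines. The sketch below assumes the 12-bit layout described on the earlier slide: check bits at positions 1, 2, 4, 8 and data bits D1..D8 at the remaining positions. It relies on the fact that XORing together the position numbers of all data bits that are 1 yields the four check bits at once. The names `check_bits` and `correct` are illustrative.

```python
CHECK_POSITIONS = (1, 2, 4, 8)
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)   # positions of D1..D8


def check_bits(data: int) -> int:
    """Return the check bits as a 4-bit integer (C4 C3 C2 C1 in slide naming)."""
    code = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:      # data bit D(i+1) is 1
            code ^= pos          # it contributes to every check bit in its position number
    return code


def correct(data: int, stored_check: int) -> int:
    """Return the data word after single-error correction."""
    syndrome = stored_check ^ check_bits(data)
    if syndrome == 0 or syndrome in CHECK_POSITIONS:
        return data              # no error, or the error is in a check bit
    if syndrome in DATA_POSITIONS:
        return data ^ (1 << DATA_POSITIONS.index(syndrome))
    raise ValueError(f"uncorrectable error, syndrome {syndrome:04b}")


stored = check_bits(0b00111001)
print(f"{stored:04b}")                         # check bits for the input word
print(f"{correct(0b00111101, stored):08b}")    # word read back with D3 flipped
```

For the input word 00111001 this gives check bits 0111, and the word read back with D3 flipped is restored to 00111001 via syndrome 0110.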
25Double Error Detecting
- The previous example is a single-error-correcting (SEC) code.
- Semiconductor memory is usually equipped with a SEC-DED (single-error-correcting, double-error-detecting) code. SEC-DED requires an extra bit.
26Performance
- Access Time (latency)
- Random access memory: time taken to perform a read or write.
- Non-random-access memory: time to position the read-write mechanism at the desired location.
- Memory Cycle Time
- Access time plus any additional time required before a second access can commence.
- Affected by the behavior of the system bus, not the processor.
- Transfer Rate
- Rate at which data can be transferred into or out of a memory unit.
- For random access memory: 1/(cycle time).
- For non-random-access memory: $T_N = T_A + N/R$
- $T_N$ = average time to read or write N bits
- $T_A$ = average access time
- $N$ = number of bits
- $R$ = transfer rate, in bits per second (bps)
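The non-random-access formula is a one-line computation. In the sketch below the function name and the sample numbers (10 ms access time, 4096 bits, 1 Mbps) are made up purely for illustration.

```python
def average_transfer_time(access_time_s: float, n_bits: int, rate_bps: float) -> float:
    """T_N = T_A + N / R, with times in seconds and R in bits per second."""
    return access_time_s + n_bits / rate_bps


# Hypothetical device: T_A = 10 ms, N = 4096 bits, R = 1,000,000 bps
t_n = average_transfer_time(0.010, 4096, 1_000_000)
print(f"T_N = {t_n * 1000:.3f} ms")
```

With these numbers the access time dominates: positioning takes 10 ms while moving the bits takes only about 4 ms.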
27Magnetic Disks
- Tracks: Hard disk platters arrange data into concentric circles, rather than one large spiral as some other media use. Each circle is called a track.
- Sectors: The smallest addressable unit on a track. Sectors are normally 512 bytes in size, and there can be hundreds of sectors per track, depending on location. (Constant bit density: more sectors on outer tracks.)
- Heads: The devices used to write and read data on each platter.
- Cylinders: Platters on a hard disk are stacked up, and so are the heads. Concentric circles from each parallel platter form a cylinder. (Think Stargate!)
http://www.pcguide.com/ref/hdd/geom/tracksDifference-c.html
28Reading and Writing
- SEEK: Disk controller sends a command to move the arm over the proper track. The time this takes is the seek time.
- Seek time
- Minimum / Maximum
- Average? Sum of all possible seeks divided by the number of possible seeks. What is wrong with this???
- Rotation latency (delay)
- Time for the requested sector to rotate under the head. Average: halfway around the disk (0.5 rotation).
- If a disk rotates at 10,000 RPM: average rotation time = 0.5 / 10,000 RPM = 0.5 / (10,000/60) RPS = 0.0030 sec = 3.0 ms.
- Transfer time
- Time it takes to transfer a block of bits (typically a sector). Function of block size, disk size, rotation speed, recording density, etc.
29Example
- What is the average time to read or write a 512-byte sector for a disk? The advertised average seek time is 5 ms, the transfer rate is 40 MB/sec, it rotates at 10,000 RPM, and the controller overhead is 0.1 ms. Assume the disk is idle so that there is no queueing delay. In addition, calculate the time assuming the advertised seek time is three times longer than the measured seek time.
- Answer
- Average disk access time = average seek time + average rotational delay + transfer time + controller overhead.
- = 5 ms + 0.5 / (10,000 RPM) + 0.5 KB / (40 MB/sec) + 0.1 ms
- = 5 ms + 3.0 ms + 0.013 ms + 0.1 ms = 8.113 ms
- With the measured seek time (5 ms / 3 = 1.67 ms): 1.67 ms + 3.0 ms + 0.013 ms + 0.1 ms = 4.783 ms
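The same calculation can be written as a small function, which makes it easy to plug in other disks. This is a sketch assuming 1 MB = 10^6 bytes and an average rotational delay of half a revolution. The function name `disk_access_ms` is illustrative.

```python
def disk_access_ms(seek_ms: float, rpm: float, sector_bytes: int,
                   rate_bytes_per_s: float, overhead_ms: float) -> float:
    """Average access time in ms: seek + rotational delay + transfer + overhead."""
    rotation_ms = 0.5 * 60_000 / rpm                      # half a revolution
    transfer_ms = sector_bytes / rate_bytes_per_s * 1000  # time to move one sector
    return seek_ms + rotation_ms + transfer_ms + overhead_ms


advertised = disk_access_ms(5.0, 10_000, 512, 40e6, 0.1)
measured = disk_access_ms(5.0 / 3, 10_000, 512, 40e6, 0.1)
print(f"advertised seek: {advertised:.3f} ms")
print(f"measured seek:   {measured:.3f} ms")
```

Because the code does not round the intermediate terms, its results differ from the hand calculation in the last digits (the slide rounds the seek time to 1.67 ms and the transfer time to 0.013 ms).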
30RAID
- Redundant Array of Independent Disks
- Disk storage designers recognized that, since access times etc. can only be improved to a certain extent, additional performance can be gained by introducing multiple disks.
- Introduced the possibility of more errors.
- RAID: improve access time and improve reliability.
- Set of physical disk drives viewed by the operating system as a single logical drive.
- Data are distributed across the drives of an array.
- Redundant disk capacity is used to store parity information, guaranteeing data recoverability in case of a disk failure.
Picture from http://mst2.lcc.whecn.edu/byeager/whitepapers/raid.pdf
31RAID Level 0
- Not a true member of the RAID family, because it does not include redundancy to improve performance.
- User and system data are distributed across all disks in the array in strips.
- Imagine a large logical disk containing ALL data. This is divided into strips that are mapped round robin to the strips in the array.
- If two different I/O requests are pending for two different blocks of data, then there is a good chance that the data will be on different disks and can be serviced in parallel.
- If a single I/O request is for multiple logically contiguous strips, up to n strips can be handled in parallel.
32Data Mapping for RAID Level 0
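Round-robin mapping of logical strips onto the array reduces to a modulo and an integer division. The sketch below assumes strips are numbered from 0 and disks from 0. The function name `locate_strip` is hypothetical.

```python
def locate_strip(logical_strip: int, n_disks: int) -> tuple[int, int]:
    """Return (disk index, strip index on that disk) for a logical strip."""
    return logical_strip % n_disks, logical_strip // n_disks


# Logical strips 0..7 on a 4-disk array
for strip in range(8):
    disk, offset = locate_strip(strip, 4)
    print(f"logical strip {strip} -> disk {disk}, strip {offset}")
```

Any n consecutive logical strips land on n different disks, which is why a request for contiguous strips can be serviced in parallel.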
33RAID Level 1
- Redundancy achieved through duplicating all data.
- Each logical strip is mapped to two physical disks.
- A read request can be serviced from either available disk.
- A write request requires both disks to be updated, but this can be done in parallel. (The slower write dictates overall speed.)
- Recovery from failure is simple!
Picture from http://mst2.lcc.whecn.edu/byeager/whitepapers/raid.pdf
34RAID Level 2
- Utilizes parallel access techniques: all disks participate in the execution of every I/O request.
- Spindles of individual drives are synchronized so that each disk head is in the same position on each disk at any given time.
- Data striping: very small strips (single byte or word).
- Error correcting code calculated across corresponding bits on each disk, and the code bits are stored in corresponding bit positions on multiple parity disks.
- For Hamming code, the number of parity disks is proportional to the log of the number of data disks. The array controller can detect and fix single-bit errors.
- For a write, all disks must be accessed.
- Good choice only for an environment in which many errors occur, therefore not used much.
35RAID Level 3
- Similar to RAID 2: parallel access with data distributed in small strips.
- Only requires a single redundant disk because it uses a single parity bit for the set of individual bits in the same position.
- If drives X0-X3 contain data and X4 contains parity bits:
- X4(i) = X3(i) ⊕ X2(i) ⊕ X1(i) ⊕ X0(i)
- Redundancy: in the case of disk failure, the data can be reconstructed. If drive X1 fails it can be reconstructed as:
- X1(i) = X4(i) ⊕ X3(i) ⊕ X2(i) ⊕ X0(i)
- Performance: can achieve high transfer rates, but only one I/O request can be executed at one time. (Better for large data transfers in non-transaction-oriented environments.)
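The parity and reconstruction equations can be tried on a few bytes. In this sketch the strip contents are made-up values and `xor_strips` is an illustrative helper, not part of any RAID library.

```python
from functools import reduce


def xor_strips(strips: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


# X0..X3 hold data (made-up contents); X4 holds the parity.
data = [b"\x0f\xa0", b"\x33\x55", b"\xff\x00", b"\x12\x34"]
parity = xor_strips(data)

# Drive X1 fails: rebuild it from the parity and the surviving data drives.
rebuilt_x1 = xor_strips([parity, data[3], data[2], data[0]])
print(rebuilt_x1 == data[1])
```

The same XOR rebuilds any single failed drive, including the parity drive itself.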
36RAID Level 4
- Each disk operates independently: separate I/O requests are satisfied in parallel.
- Suitable for applications with high I/O request rates and NOT well suited for those requiring high data transfer rates.
- Data striping. (Strips are larger than in lower RAID levels.)
- Bit-by-bit parity is calculated across corresponding strips on each data disk, and stored in the corresponding strip on the parity disk.
- Performance: write penalty when the I/O request is of small size. A write must update both the user data and the corresponding parity bits.
- X4(i) = X3(i) ⊕ X2(i) ⊕ X1(i) ⊕ X0(i)
- If X1(i) is changed to X1'(i): X4'(i) = X3(i) ⊕ X2(i) ⊕ X1'(i) ⊕ X0(i) = X4(i) ⊕ X1(i) ⊕ X1'(i)
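The small-write shortcut means the new parity can be computed from the old parity, the old data, and the new data, without reading the other data disks. A minimal sketch with made-up 4-bit strip values follows. The function name `update_parity` is illustrative.

```python
def update_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """New parity = old parity XOR old data XOR new data."""
    return old_parity ^ old_data ^ new_data


# Made-up strip contents for X0..X3
x0, x1, x2, x3 = 0b1010, 0b0110, 0b1111, 0b0001
x4 = x3 ^ x2 ^ x1 ^ x0               # parity before the write

new_x1 = 0b1001                      # X1 is overwritten
new_x4 = update_parity(x4, x1, new_x1)

# Matches a full recomputation over all data strips
print(new_x4 == x3 ^ x2 ^ new_x1 ^ x0)
```

Each small write therefore costs two reads (old data, old parity) and two writes (new data, new parity), which is the write penalty noted above.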
37RAID Level 5
- Same as RAID 4 but parity strips are distributed across all disks.
- Typical allocation uses round-robin.
- For an n-disk array, the parity strip is on a different disk for the first n strips.
- Avoids the potential bottleneck found in RAID 4.
38RAID Level 6
- Two different parity calculations are carried out and stored in separate blocks on different disks.
- Example: XOR and a second independent data check algorithm.
- Number of disks required: N + 2 (where N = number of disks required for data).
- Provides HIGH data reliability.
- Incurs a substantial write penalty, as each write affects two parity blocks.
39Homework
- Stallings 5.3: Design a 16-bit memory of total capacity 8192 bits using SRAM chips of size 64 × 1 bit. Give the array configuration of the chips on the memory board showing all required input and output signals for assigning this memory to the lowest address space. The design should allow for both byte and 16-bit word accesses.
- Stallings 5.5: Suppose an 8-bit data word stored in memory is 11000010. Using the Hamming algorithm, determine what check bits would be stored in memory with the data word. Show how you got your answer.
- Stallings 5.6: For the 8-bit word 00111001, the check bits stored with it would be 0111. Suppose when the word is read from memory, the check bits are calculated to be 1101. What is the data word that was read from memory?
- Stallings 6.3 (Question on RAID)
- What is the average time to read or write a 512-byte sector for a disk? The advertised average seek time is 4 ms, the transfer rate is 35 MB/sec, it rotates at 8,000 RPM, and the controller overhead is 0.15 ms. Assume the disk is idle so that there is no queueing delay.
40Challenge Question
- CHALLENGE QUESTION: See handout.