Title: Memory Arrays Contd
1 Memory Arrays Contd
2 Dual-Ported SRAM
- We have considered single-ported SRAM
- One read or one write on each cycle
- Multiported SRAM are needed for register file
- Simple dual-ported SRAM
- Two independent single-ended reads
- Or one differential write
- Do two reads and one write by time multiplexing
- Read during the first ½ cycle, write during the second ½ cycle
3 Multi-Ported SRAM
- Adding more access transistors hurts read stability
- Multiported SRAM isolates reads from the state node
- Bit and INV generate complementary signals for write
- Single-ended design minimizes the number of bitlines
Read:
- If Bit = 0, then Bit_b = 1: both read pass transistors are on => the bitline (e.g., bD) is pulled to ground => bD = 0 = Bit
- If Bit = 1, then Bit_b = 0: one of the two pass transistors is off => the bitline stays charged, so bD = 1 = Bit
4 Serial Access Memories
- Serial access memories do not use an address
- Shift Registers
- Tapped Delay Lines
- Serial In Parallel Out (SIPO)
- Parallel In Serial Out (PISO)
- Queues (FIFO, LIFO)
5 Shift Register
- Shift registers store and delay data
- Simple design: cascade of registers
- Watch your hold times!
6 Denser Shift Registers
- Flip-flops aren't very area-efficient
- For large shift registers, keep data in SRAM instead
- Move the read/write pointers through the RAM rather than moving the data
- Initialize the read address to the first entry and the write address to the last
- Increment both addresses on each cycle
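The pointer scheme above can be sketched behaviorally in Python. This is only an illustration under stated assumptions: the class name `SramShiftRegister`, the read-before-write ordering within a cycle, and the wrap-around of both addresses are choices made for this sketch, not details given on the slide.

```python
class SramShiftRegister:
    """Behavioral model: the data stays put, only the addresses move."""

    def __init__(self, depth):
        self.mem = [0] * depth      # stands in for the SRAM array
        self.rd = 0                 # read address starts at the first entry
        self.wr = depth - 1         # write address starts at the last entry

    def step(self, data_in):
        """One clock cycle: read the oldest slot, write the new value."""
        data_out = self.mem[self.rd]          # assumed: read happens before write
        self.mem[self.wr] = data_in
        # Increment both addresses each cycle, wrapping around the array
        self.rd = (self.rd + 1) % len(self.mem)
        self.wr = (self.wr + 1) % len(self.mem)
        return data_out
```

With a depth of 4 in this model, a value written in one cycle reappears at the output three cycles later, so the array behaves like a short shift register without any data being copied between cells.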
7 Tapped Delay Line
- A tapped delay line is a shift register with a programmable number of stages
- Set the number of stages with delay controls to the muxes
- Ex: 0 to 63 stages of delay
8 Serial In Parallel Out
- 1-bit shift register reads in serial data
- After N steps, presents N-bit parallel output
9 Parallel In Serial Out
- Load all N bits in parallel when shift = 0
- Then shift one bit out per cycle
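A minimal behavioral sketch of the load-then-shift behavior, assuming the serial output is taken from the last register and that zeros are shifted into the first register (both are assumptions of this sketch; the class name `Piso` is hypothetical).

```python
class Piso:
    """Parallel-in serial-out register, modeled one clock edge at a time."""

    def __init__(self, n):
        self.regs = [0] * n

    def clock(self, shift, parallel_in=None):
        if shift == 0:
            # Load all N bits in parallel
            self.regs = list(parallel_in)
        else:
            # Shift one position toward the serial output (assumed: 0 shifts in)
            self.regs = [0] + self.regs[:-1]
        return self.regs[-1]        # serial output after this clock edge


p = Piso(4)
first = p.clock(0, [1, 0, 1, 1])        # load, last bit appears at the output
rest = [p.clock(1) for _ in range(3)]   # remaining bits, one per cycle
```

In this model the bits leave in order from the last register back to the first, so loading `[1, 0, 1, 1]` produces the serial stream 1, 1, 0, 1.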
10 Queues
- Queues allow data to be read and written at different rates
- Read and write each use their own asynchronous clock
- Build with SRAM and read/write counters (pointers)
- Two types: FIFO and LIFO
- FIFO: After reset, both the read and write pointers are initialized to the first location and the FIFO is empty. The write and read pointers increment on write and read commands, respectively. If the write pointer catches up with the read pointer, the FIFO is FULL; if the read pointer catches up with the write pointer, the FIFO is EMPTY.
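The FIFO pointer rules can be sketched as follows. Because equal pointers alone cannot tell FULL from EMPTY, this sketch assumes each pointer carries one extra wrap bit (it counts modulo twice the depth); that is one common way to make the distinction and is an assumption here, as is the class name `Fifo`. Clock-domain crossing between the read and write sides is not modeled.

```python
class Fifo:
    """FIFO built from a memory array and two wrapping pointers."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        self.rd = 0     # both pointers start at the first location
        self.wr = 0

    def empty(self):
        # Read pointer has caught up with the write pointer
        return self.rd == self.wr

    def full(self):
        # Write pointer is exactly one full lap ahead of the read pointer
        return (self.wr - self.rd) % (2 * self.depth) == self.depth

    def write(self, value):
        if self.full():
            raise OverflowError("FIFO is full")
        self.mem[self.wr % self.depth] = value
        self.wr = (self.wr + 1) % (2 * self.depth)

    def read(self):
        if self.empty():
            raise IndexError("FIFO is empty")
        value = self.mem[self.rd % self.depth]
        self.rd = (self.rd + 1) % (2 * self.depth)
        return value
```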
11 Queues
- LIFO: uses a single pointer for read and write
- After reset, the pointer is initialized to the
first location and LIFO is empty. On a write the
pointer is incremented. If it reaches the last
location, the LIFO is FULL. On a read, the
pointer is decremented. If it reaches the first
element, the LIFO is EMPTY again.
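The single-pointer behavior can be sketched in a few lines. The convention that the pointer names the next free location, and the class name `Lifo`, are assumptions of this sketch.

```python
class Lifo:
    """LIFO (stack) using one pointer for both reads and writes."""

    def __init__(self, depth):
        self.mem = [None] * depth
        self.ptr = 0                # assumed: points at the next free location

    def empty(self):
        return self.ptr == 0

    def full(self):
        return self.ptr == len(self.mem)

    def write(self, value):
        if self.full():
            raise OverflowError("LIFO is full")
        self.mem[self.ptr] = value
        self.ptr += 1               # pointer increments on a write

    def read(self):
        if self.empty():
            raise IndexError("LIFO is empty")
        self.ptr -= 1               # pointer decrements on a read
        return self.mem[self.ptr]
```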
12 CAMs, ROMs, and PLAs
13 CAMs
- Extension of ordinary memory (e.g. SRAM)
- Read and write memory as usual
- Also match to see which words contain a key
14 10T CAM Cell
- Add four match transistors to the 6T SRAM cell
- Multiple CAM cells in the same word are tied to the same matchline (precharged high). The key is placed on the bitlines.
- If the key and the value stored in a cell differ, the matchline is pulled down (a).
- If all the key bits match all the stored bits, the matchline remains high (b).
[Figure: 10T CAM cell, panels (a) and (b)]
15 CAM Cell Operation
- Read and write like ordinary SRAM
- For matching:
- Leave wordlines low
- Precharge matchlines using clocked pMOS transistors
- Place key on bitlines
- Matchlines evaluate
- Miss line:
- Pseudo-nMOS NOR of match lines
- Goes high if no words match
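The match operation can be modeled at the logic level as below. This is a sketch only: words and keys are represented as bit strings, and the function name `cam_match` is hypothetical.

```python
def cam_match(words, key):
    """Return (matchlines, miss) for a key applied to every stored word."""
    matchlines = []
    for word in words:
        line = 1                        # matchline is precharged high
        for stored_bit, key_bit in zip(word, key):
            if stored_bit != key_bit:
                line = 0                # any mismatching cell pulls the line low
        matchlines.append(line)
    # Miss line: NOR of all matchlines, high only when no word matches
    miss = int(not any(matchlines))
    return matchlines, miss


words = ["1010", "0110", "1010"]
hit = cam_match(words, "1010")     # ([1, 0, 1], 0)
none = cam_match(words, "1111")    # ([0, 0, 0], 1)
```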
16 Read-Only Memories
- Read-Only Memories are nonvolatile
- Retain their contents when power is removed
- Mask-programmed ROMs use one transistor per bit
- Presence or absence determines 1 or 0
- Programmed with metal contacts
17 ROM Example
- 4-word x 6-bit ROM
- Represented with dot diagram
- Dots indicate 1s in ROM
Word 0: 010101
Word 1: 011001
Word 2: 100101
Word 3: 101010
Looks like six 4-input pseudo-nMOS NORs
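The NOR view of this ROM can be checked with a small logic-level model. It assumes a one-hot wordline decoder and an inverter on each bitline output so that a dot reads back as 1; both are assumptions of this sketch rather than details stated above, and `read_rom` is a hypothetical name.

```python
# Dot diagram from the example: a "1" marks a transistor at that site
ROM = ["010101", "011001", "100101", "101010"]


def read_rom(address):
    """Read one word by evaluating each bitline as a NOR of the wordlines."""
    wordlines = [int(i == address) for i in range(len(ROM))]  # one-hot decode
    out = []
    for col in range(len(ROM[0])):
        # The bitline stays high only if no asserted wordline has a dot here
        pulled_low = any(wl and ROM[w][col] == "1"
                         for w, wl in enumerate(wordlines))
        bitline = int(not pulled_low)
        out.append(str(1 - bitline))    # assumed output inverter
    return "".join(out)
```

Each of the six columns sees all four wordlines as inputs, which is why the array looks like six 4-input NOR gates.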
18 PROMs and EPROMs
- Programmable ROMs (one-time programmable memories)
- Build array with transistors at every site
- Burn out fuses to disable unwanted transistors
- Electrically Programmable ROMs (EPROM, EEPROM, Flash)
- Use floating gate to turn off unwanted transistors
- Applying a high voltage to the upper gate causes electrons to be injected onto the floating gate through the thin oxide
19 ROM Implementation
20 PLAs
- A Programmable Logic Array performs any function in sum-of-products form.
- Literals: inputs and their complements
- Products / Minterms: AND of literals
- Outputs: OR of Minterms
- Example: Full Adder
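As an illustration of the sum-of-products structure, the full adder can be written as an AND plane feeding an OR plane. The particular set of seven product terms below is one possible choice, assumed for this sketch; the function name is hypothetical.

```python
def full_adder_pla(a, b, c):
    """Full adder as a PLA: literals -> AND plane -> OR plane."""
    # Literals: inputs and their complements
    na, nb, nc = 1 - a, 1 - b, 1 - c

    # AND plane: each product is an AND of literals
    products = [
        a & b,              # 0
        b & c,              # 1
        a & c,              # 2
        a & b & c,          # 3
        a & nb & nc,        # 4
        na & b & nc,        # 5
        na & nb & c,        # 6
    ]

    # OR plane: each output is an OR of selected products
    s = products[3] | products[4] | products[5] | products[6]
    cout = products[0] | products[1] | products[2]
    return s, cout
```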
21 NOR-NOR PLAs
- ANDs and ORs are not very efficient in CMOS
- Dynamic or pseudo-nMOS NORs are very efficient since they don't use any series transistors
- Use De Morgan's Law to convert to all NORs
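The conversion can be sketched for a single output, using the carry function ab + bc + ac as an assumed example. By De Morgan's Law, an AND of literals equals a NOR of the complemented literals, and the second NOR plane produces the complemented sum, so an output inverter is assumed here. All names are illustrative.

```python
def nor(*inputs):
    """N-input NOR gate."""
    return int(not any(inputs))


def carry_nor_nor(a, b, c):
    """Carry = ab + bc + ac built only from NOR gates."""
    # Input inverters (a 1-input NOR is an inverter)
    na, nb, nc = nor(a), nor(b), nor(c)

    # First NOR plane: NOR of complemented literals == AND of the literals
    p_ab = nor(na, nb)
    p_bc = nor(nb, nc)
    p_ac = nor(na, nc)

    # Second NOR plane gives the complement of the OR of the products
    cout_b = nor(p_ab, p_bc, p_ac)

    # Assumed output inverter restores the true polarity
    return nor(cout_b)
```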
22 Design for Testability
23 Testing
- Testing is one of the most expensive parts of chips
- Logic verification accounts for > 60% of design effort for many chips
- Debug time after fabrication has enormous opportunity cost
- Less visibility into the inside of the chip
- Shipping defective parts can sink a company
- Example: Intel Pentium bug
- Logic error not caught until > 1M units shipped
- Recall cost $450M (!!!)
24 Logic Verification
- Does the chip simulate correctly?
- Usually done at HDL level
- Verification engineers write self-checking test benches for HDL
- Can't test all cases
- Look for corner cases
- Try to break logic design
- Ex: 32-bit adder
- Test all combinations of corner cases as inputs:
- 0, 1, 2, 2^31 - 1, -1, -2^31, a few random numbers
- Good tests require ingenuity
- Need to perform regression testing
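The adder example can be mimicked with a small self-checking test bench. Here it is written in Python against a software stand-in for the adder rather than in an HDL, which is an assumption of this sketch; `add32`, `corner_cases`, and `run_tests` are hypothetical names, and the random values use a fixed seed so the run is repeatable (useful for regression testing).

```python
import itertools
import random

MASK = 0xFFFFFFFF  # 32-bit two's-complement wrap


def add32(a, b):
    """Stand-in for the design under test: 32-bit adder, carry-out dropped."""
    return (a + b) & MASK


def corner_cases(n_random=3, seed=0):
    """Corner values from the list above plus a few seeded random numbers."""
    rng = random.Random(seed)
    fixed = [0, 1, 2, 2**31 - 1, -1, -2**31]
    extra = [rng.randrange(-2**31, 2**31) for _ in range(n_random)]
    return fixed + extra


def run_tests(dut):
    """Apply all pairs of corner cases; return the list of mismatches."""
    failures = []
    for a, b in itertools.product(corner_cases(), repeat=2):
        expected = (a + b) & MASK                 # reference model
        got = dut(a & MASK, b & MASK)
        if got != expected:
            failures.append((a, b, got, expected))
    return failures
```

An empty list from `run_tests(add32)` means every pair of corner cases matched the reference model; any entry records the inputs, the observed result, and the expected result.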
25 Silicon Debug
- Test the first chips back from fabrication
- If you are lucky, they work the first time
- If not:
- Logic bugs vs. electrical failures
- Most chip failures are logic bugs from inadequate simulation
- Some are electrical failures
- Crosstalk
- Dynamic nodes: leakage, charge sharing
- Ratio failures
- A few are tool or methodology failures (e.g. DRC)
- Fix the bugs and fabricate a corrected chip
26 Manufacturing Test
- A speck of dust on a wafer is sufficient to kill a chip
- Yield of any chip is < 100%
- Must test chips after manufacturing, before delivery to customers, to ship only good parts
- Manufacturing testers are very expensive
- Minimize time on tester
- Careful selection of test vectors
27 Stuck-At Faults
- How does a chip fail?
- Usually failures are shorts between two conductors or opens in a conductor
- This can cause very complicated behavior
- A simpler model: Stuck-At
- Assume all failures cause nodes to be stuck at 0 or 1, i.e. shorted to GND or VDD
- Not quite true, but works well in practice
28 Examples
AND gate output can be 0 or 1
AND gate output stuck at 0
29 Delay Faults
- Some faults do not impact the low-speed functionality of the circuit, but they do impact timing (i.e., they affect the at-speed functionality)
- E.g., a large inverter built from parallel nMOS and pMOS transistors with an open circuit in one of the nMOS transistors => increase in t_pdf
- Another example of a delay fault is a crosstalk-induced timing failure
30 Observability & Controllability
- Observability: ease of observing a node by watching external output pins of the chip
- Controllability: ease of forcing a node to 0 or 1 by driving input pins of the chip
- Make sure all flip-flops can be reset using a global reset signal
- Fault coverage: what percentage of the chip's internal nodes are checked using test vectors (98% or higher is desired)
- Each chip node is stuck at 0 (1) in turn. Test vectors are applied to see if the stuck-at-0 (1) fault is detected. Fault coverage is the percentage of these node faults that the vectors detect.
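The fault-coverage procedure can be sketched on a toy circuit. The circuit y = (a AND b) OR c, its node names, and the function names are all assumptions chosen for illustration; a fault counts as detected when some vector makes the faulty output differ from the fault-free output.

```python
NODES = ["a", "b", "c", "n1", "y"]   # n1 = a AND b, y = n1 OR c


def simulate(a, b, c, fault=None):
    """Evaluate the toy circuit; fault is None or (node_name, stuck_value)."""
    def force(name, value):
        # Override the node's value if it is the one being held stuck
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a = force("a", a)
    b = force("b", b)
    c = force("c", c)
    n1 = force("n1", a & b)
    return force("y", n1 | c)


def fault_coverage(vectors):
    """Fraction of single stuck-at faults detected by the given vectors."""
    faults = [(node, value) for node in NODES for value in (0, 1)]
    detected = 0
    for fault in faults:
        if any(simulate(*v) != simulate(*v, fault=fault) for v in vectors):
            detected += 1
    return detected / len(faults)


low = fault_coverage([(0, 0, 0)])                                       # 0.3
full = fault_coverage([(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)])     # 1.0
```

In this toy case four well-chosen vectors detect all ten stuck-at faults, while the single all-zeros vector detects only three of them.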
31 Test Pattern Generation
- Manufacturing test ideally would check every node in the circuit to prove it is not stuck.
- Apply the smallest sequence of test vectors necessary to prove each node is not stuck.
- Good observability and controllability reduce the number of test vectors required for manufacturing test.
- Reduces the cost of testing
- Motivates design-for-test
- Traditionally, test pattern generation was performed manually using functional test patterns.
- As chips have become more complex, Automatic Test Pattern Generation (ATPG) has become the norm.
- Today a combination of ATPG and functional (often at-speed) test patterns is used.
32 Design for Test
- Design the chip to increase observability and controllability
- If each register could be observed and controlled, the test problem reduces to testing the combinational logic between registers.
- Better yet, logic blocks could enter a test mode where they generate test patterns and report the results automatically.
33 Scan
- Convert each flip-flop to a scan register
- Only costs one extra multiplexer
- Normal mode: flip-flops behave as usual
- Scan mode: flip-flops behave as a shift register
- Contents of flops can be scanned out and new values scanned in
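A behavioral sketch of a scan chain follows, assuming mux-style scan flip-flops in which a scan-enable signal selects between the normal D input and the previous flop's output. The class name `ScanChain` and the choice to return the scan-out value seen before each clock edge are assumptions of this sketch.

```python
class ScanChain:
    """Chain of scan flip-flops sharing one scan-enable signal."""

    def __init__(self, n):
        self.q = [0] * n

    def clock(self, scan_enable, d=None, scan_in=0):
        """One clock edge; returns the scan-out bit seen before the edge."""
        scan_out = self.q[-1]
        if scan_enable:
            # Scan mode: the flops form a shift register from SI to SO
            self.q = [scan_in] + self.q[:-1]
        else:
            # Normal mode: every flop captures its own D input
            self.q = list(d)
        return scan_out


chain = ScanChain(3)
for bit in (1, 0, 1):                     # scan a test pattern in
    chain.clock(1, scan_in=bit)
chain.clock(0, d=[0, 1, 1])               # one normal cycle captures logic outputs
captured = [chain.clock(1) for _ in range(3)]   # scan the result out
```

The values captured in normal mode come out last flop first, so `captured` is `[1, 1, 0]` for the D inputs `[0, 1, 1]`.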
34 Scannable Flip-flops
Using Mux-Flip-Flops
Using Clock Gating
35 Built-in Self-test
- Built-in self-test lets blocks test themselves
- Generate pseudo-random inputs to comb. logic
- Combine outputs into a syndrome
- With high probability, block is fault-free if it produces the expected syndrome
- Another example is MBIST (Memory BIST) for testing memory
- An on-chip controller with an extra logic wrapper around the memory writes and reads all the memory bits
36 PRSG
- Linear Feedback Shift Register (LFSR)
- Shift register with input taken from XOR of state
- Pseudo-Random Sequence Generator
- Also called PRBS (Pseudo-Random Bit Sequence) generator
- E.g., a 2^3 - 1 (7-state) PRSG
- In general, an n-bit LFSR with suitably chosen taps will have 2^n - 1 states before repeating
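A 3-bit LFSR can be stepped in software to see the 7-state cycle. The tap choice below (the new first bit is the XOR of the last two bits) is an assumption of this sketch; other tap choices may give shorter cycles. Function names are hypothetical.

```python
def lfsr3_step(state):
    """Advance a 3-bit LFSR by one clock; state is a tuple (q0, q1, q2)."""
    q0, q1, q2 = state
    # Shift right by one flop; feedback into q0 is the XOR of q1 and q2
    return (q1 ^ q2, q0, q1)


def lfsr_sequence(seed=(1, 1, 1)):
    """List the states visited until the seed state comes around again."""
    states = [seed]
    state = lfsr3_step(seed)
    while state != seed:
        states.append(state)
        state = lfsr3_step(state)
    return states


cycle = lfsr_sequence()     # 7 distinct nonzero states
```

The all-zeros state is excluded from the cycle because XOR feedback of zeros stays at zero, which is why the count is 2^3 - 1 rather than 2^3.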
37 BILBO
- Built-in Logic Block Observer (a BIST structure; BIST = Built-In Self-Test)
- Combine scan with PRSG & signature analysis
Reset mode: all FFs are synchronously reset
Normal mode: DFFs
Scan mode: 3-bit SR between SI and SO
Test mode: PRBS generator (D inputs = 0) or signature analyzer (D inputs = combinational logic output)
38 Boundary Scan
- Testing boards is also difficult
- Need to verify solder joints are good
- Drive a pin to 0, then to 1
- Check that all connected pins get the values
- Build capability of observing and controlling pins into each chip to make board test easier
- IEEE 1149.1 (JTAG) architecture is the standard for Boundary Scan
- All of the I/Os of each IC on the board are connected serially in a scan chain such that every pin can be observed and controlled
- Chip inputs can be copied into the BS registers and shifted out to check input connectivity
- Known patterns can be scanned in and driven to outputs to check output connectivity and the connections between chips
39 Boundary Scan Example
40 Boundary Scan Interface
- Boundary scan is accessed through five pins
- TCK: test clock
- TMS: test mode select
- TDI: test data in
- TDO: test data out
- TRST: test reset (optional)
41 Summary
- Think about testing from the beginning
- Simulate as you go
- Plan for test after fabrication
- If you don't test it, it won't work! (Guaranteed)