SOC Design: From System to Transistor

1
SOC Design: From System to Transistor
Zoran Stamenkovic
2
Outline

Modeling Systems
Simulation and Verification
Analog Integrated Circuits
Digital Integrated Circuits
Embedded Memories
Logic Synthesis
Design for Testability
Layout Generation
Design for Manufacturability
SOC Example

3
Modeling Systems

Domains and Levels
ESL Design
Basics of HDL
Gate Modeling
Delay Modeling
Power Modeling
Effects of Parasitics
Logic Optimization

4
Domains and Levels

Open Systems Interconnection (OSI) model of
network communication
Local area network (LAN) technologies are defined
by standards that describe unique functions at
both the Physical and the Data Link layers

5
Domains and Levels

802.11 Wireless LAN modem
Modulates outgoing digital signals from a
computer or other digital device to an analogue
(radio) signal
Demodulates the incoming analogue (radio) signal
and converts it to a digital signal for the
digital device

6
MIMO and MIMAX WLAN Modems
Domains and Levels
Signal processing performed in the analogue RF domain
Number of digital basebands reduced to a single one
Signal processing performed in the digital baseband
7
Domains and Levels
8
Behavioral Domain
9
Structural Domain
10
Physical Domain
11
Electronic System Level Design

The point of a system level model is to capture
the intent of the design
Design does exactly what it is defined to do, and
the model is the definition of what the design
does
It allows software developers to test their code
on a working model
The value of system level modeling is in helping
us to understand the implications of our intent
To explore responses to the stimulus in a useful
way
ESL Languages
UML, SystemC, SystemVerilog
ESL Verification
"No amount of experimentation can ever prove me
right; a single experiment can prove me wrong."
(Albert Einstein)
The system level testbench languages and
methodologies that exist today are woefully
inadequate
If one tries to capture enough information in ESL
to verify RTL, then one might as well write RTL

12
Electronic System Level Design

The environment that provides models of
memories, connectors, and queues that can be
interconnected with configured processors into an
overall system model
Processor and device interfaces are at the
transaction level
Transaction-level modeling calls for SOC
architecture assembly and simulation tools
If RTL IP blocks are present, HW/SW co-verification
tools are needed

13
Electronic System Level Design
14
Hardware Description Languages

Motivation for HDL
Increased hardware complexity
Design space exploration
Inexpensive alternative to prototyping
General features
Support for describing circuit connectivity
High-level programming language support for
describing behavior
Support for timing information (constraints,
etc.)
Support for concurrency
VHDL
IEEE Standard 1076-1987
IEEE Standard 1076-1993
Extension VHDL-AMS (IEEE Standard 1076.1-1999)
Verilog
IEEE Standard 1364-1995
IEEE Standard 1364-2001

15
Modeling Interfaces

Entity (VHDL) or Module (Verilog) declaration
Describes the input/output ports of a module

16
Modeling Behavior

Architecture Body (VHDL)
Describes an implementation of an entity
May be several per entity
Module (Verilog)
Is unique
Behavioral Architecture
Describes the algorithm performed by the module
Contains
Procedural Statements, each containing
Sequential Statements, including
Assignment Statements and
Wait Statements

17
Behavior Example
VHDL
entity reg3 is
  port ( d0, d1, d2, en, clk : in bit;
         q0, q1, q2 : out bit );
end;

architecture behav of reg3 is
begin
  process ( d0, d1, d2, en, clk )
  begin
    if en = '1' and clk = '1' then
      q0 <= d0 after 5 ns;
      q1 <= d1 after 5 ns;
      q2 <= d2 after 5 ns;
    end if;
  end process;
end;

Verilog
`timescale 1ns/10ps
module reg3 ( d0, d1, d2, en, clk, q0, q1, q2 );
  input d0, d1, d2, en, clk;
  output q0, q1, q2;
  reg q0, q1, q2;
  always @( d0 or d1 or d2 or en or clk )
    if ( en & clk ) begin
      q0 <= #5 d0;
      q1 <= #5 d1;
      q2 <= #5 d2;
    end
endmodule
18
Modeling Structure

Structural Architecture
Implements the module as a composition of
components
Contains
Signal Declarations (entity ports are also
signals)
Declare internal connections
Component Instances
Instantiate previously declared
entity/architecture pairs
Port Maps in component instances
Connect signals to component ports
Wait Statements
Suspend a process or procedure

19
Structure Example
20
Structure Example
21
Structure Example
22
Mixing Behavior and Structure

An architecture can contain both behavioral and
structural parts
Process Statements and Component Instances
Collectively called Concurrent Statements
Processes can read and assign to signals
Example: register-transfer-level model
Data-path described structurally
Control section described behaviorally

23
Mixed Example
24
Mixed Example
entity multiplier is
  port ( clk, reset : in bit;
         multiplicand, multiplier : in integer;
         product : out integer );
end;

architecture mixed of multiplier is
  signal partial_product, full_product : integer;
  signal arith_control, result_en, mult_bit, mult_load : bit;
begin
  arith_unit : entity work.shift_adder(behavior)
    port map ( addend => multiplicand, augend => full_product,
               sum => partial_product,
               add_control => arith_control );
  result : entity work.reg(behavior)
    port map ( d => partial_product, q => full_product,
               en => result_en, reset => reset );
  ...
25
Mixed Example
  multiplier_sr : entity work.shift_reg(behavior)
    port map ( d => multiplier, q => mult_bit,
               load => mult_load, clk => clk );
  product <= full_product;
  control_section : process is
    -- variable declarations for control_section
    -- ...
  begin
    -- sequential statements to assign values to control signals
    -- ...
    wait on clk, reset;
  end process control_section;
end;
26
Logic Functions

Function
$f = \bar{a} b + a \bar{b}$: $a$ is a variable, $a$ and $\bar{a}$ are
literals, $a \bar{b}$ is a term
Irredundant Function
No literal can be removed without changing its
value
Implementing logic functions is non-trivial
No logic gates in the library for all logic
expressions
A logic expression may map into gates that
consume a lot of area, time, or power
A set of functions f1, f2, ... is complete if
every Boolean function can be generated by a
combination of the functions from the set
NAND is a complete set
NOR is a complete set
AND and OR are not complete
Transmission gates are not complete
Incomplete set of logic gates
No way to design arbitrary logic

27
Inverter
28
Inverter
29
Switches

Complementary switch produces full-supply
voltages for both logic 0 and logic 1
n-type transistor conducts logic 0
p-type transistor conducts logic 1

30
NAND Gate
31
NOR Gate
32
AOI/OAI Gates

AOI: and/or/invert
OAI: or/and/invert
Implement larger functions
Pull-up and pull-down networks are compact
Smaller area, higher speed than NAND/NOR network
equivalents
AOI312
And 3 inputs
And 1 input (dummy)
And 2 inputs
Or together these terms
Invert

$\mathrm{out} = \overline{ab + c}$
33
Logic Levels

Solid logic 0/1 defined by VSS/VDD
Inner bounds of logic values VL/VH are not
directly determined by circuit properties, as in
some other logic families

Levels at output of one gate must be sufficient
to drive next gate

34
Inverter Transfer Curve

Choose threshold voltages at points where slope
of transfer curve is -1
Inverter has
High gain between VIL and VIH points
Low gain at outer regions of transfer curve
Note that logic 0 and 1 regions are not equally
sized
In this case, high pull-up resistance leads to
smaller logic 1 range
Noise margins are VDD-VIH and VIL-VSS
Noise must exceed noise margin to make second
gate produce wrong output

35
Inverter Delay

Only one transistor is on at a time
Rise time (pull-up on)
Fall time (pull-up off)

Resistor model of transistor
Ignores saturation region
Mischaracterizes linear region
Gives acceptable results

36
RC Model for Delay

Delay
Time required for gate's output to reach 50% of
final value
Transition time
Time required for gate's output to reach 10%
(logic 0) or 90% (logic 1) of final value
Gate delay based on RC time constant
$V_{out}(t) = V_{DD} \exp\{-t / [(R_n + R_L) C_L]\}$
$t_d = 0.69 R_n C_L$
$t_f = 2.3 R_n C_L$
0.5 µm process
$R_n = 3.9\ \mathrm{k\Omega}$
$C_L = 0.68\ \mathrm{fF}$
$t_d = 0.69 \times 3.9 \times 10^{3} \times 0.68 \times 10^{-15}\ \mathrm{s} = 1.8\ \mathrm{ps}$
$t_f = 2.3 \times 3.9 \times 10^{3} \times 0.68 \times 10^{-15}\ \mathrm{s} = 6.1\ \mathrm{ps}$
For pull-up time, use pull-up resistance
Current source model (in power/delay studies)
$t_f = C_L (V_{DD} - V_{SS}) / [0.5 k' (W/L) (V_{DD} - V_{SS} - V_t)^2]$
Fitted model
Fit curve to measured circuit characteristics
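
The RC estimate on this slide can be checked numerically. The following is a minimal Python sketch, not part of the original deck; the function name rc_delays is hypothetical, and the load resistance R_L is assumed negligible so that only R_n and C_L set the time constant.

```python
import math

def rc_delays(r_n_ohm, c_l_farad):
    """Return (t_d, t_f) in seconds for an RC discharge through r_n_ohm.

    Assumes a single-pole exponential decay from V_DD toward zero.
    """
    tau = r_n_ohm * c_l_farad
    t_d = -math.log(0.5) * tau   # time to fall to 50% (about 0.69 * tau)
    t_f = -math.log(0.1) * tau   # time to fall to 10% (about 2.3 * tau)
    return t_d, t_f

# Values quoted on the slide for the 0.5 um process.
t_d, t_f = rc_delays(3.9e3, 0.68e-15)
print(f"t_d = {t_d * 1e12:.1f} ps, t_f = {t_f * 1e12:.1f} ps")
```

The factors 0.69 and 2.3 are just ln 2 and ln 10, which is why they pair with the 50% and 10% thresholds.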

37
Step Input (VGS = VDD) Approximation
38
Body Effect

Source voltage of gates in middle of network may
not equal substrate voltage
Difference between source and substrate voltages
causes body effect

To minimize body effect
Put early arriving signals at transistors closest
to power supply

39
Power Consumption

Clock frequency
$f = 1/t$
Energy
$E = C_L (V_{DD} - V_{SS})^2$
Power
$P = E \times f = f C_L (V_{DD} - V_{SS})^2$
Almost all power consumption comes from switching
behavior
A single cycle requires one charge and one
discharge of capacitor
Static power dissipation
Comes from leakage currents
Surprising result
Resistance of the pull-up/pull-down transistor
drops out of energy calculation
Power consumption is independent of the sizes of
the pull-up and pull-down transistors
Static CMOS power-delay product is independent of
frequency
Voltage scaling depends on this fact
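
A small Python sketch of the switching-power relation above, assuming the load is fully charged and discharged once per clock cycle. The function names and the example load, supply and frequency values are illustrative assumptions, not figures from the deck.

```python
def switching_energy(c_load, v_dd, v_ss=0.0):
    """Energy drawn per full charge/discharge cycle, in joules."""
    return c_load * (v_dd - v_ss) ** 2

def dynamic_power(c_load, v_dd, f_clk, v_ss=0.0):
    """Average switching power in watts: energy per cycle times frequency."""
    return switching_energy(c_load, v_dd, v_ss) * f_clk

# Assumed example values: 0.68 fF load, 100 MHz clock, two supply voltages.
p_full = dynamic_power(0.68e-15, 3.3, 100e6)
p_half = dynamic_power(0.68e-15, 1.65, 100e6)
ratio = p_full / p_half
print(ratio)
```

No transistor resistance appears anywhere in the calculation, and halving the supply cuts the switching power by roughly a factor of four, which is the motivation for voltage scaling.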

40
Effects of Parasitics

Capacitance on power supply is not bad
Can be good in absence of inductance
Resistance slows down static gates
May cause pseudo-nMOS circuits to fail

Increasing capacitance/resistance
Reduces input slope
Resistance near source is more damaging
It must charge more capacitance

41
Optimal Sizing

Sometimes, large loads must be driven
Off-chip or by long wires on-chip
Sizing up the driver transistors only pushes back
the problem
Driver now presents larger capacitance to earlier
stage
Use a chain of inverters
Each stage has transistors larger than previous
stage
$\alpha$ is the driver size ratio, $C_{big}/C_d = \alpha^n$,
$\ln(C_{big}/C_d) = n \ln\alpha$
Minimize total delay through the driver chain
$t_{tot} = \ln(C_{big}/C_d)\,(\alpha/\ln\alpha)\,t_d$
Optimal driver size ratio is $\alpha_{opt} = e$
Optimal number of stages is $n_{opt} = \ln(C_{big}/C_d)$
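
The driver-chain result can be explored with a short Python sketch. The function names and the load ratio of 1000 are assumptions chosen for illustration; a real design would round the stage count to an integer.

```python
import math

def chain_delay(c_big, c_d, t_d, alpha):
    """Total delay t_tot = ln(C_big/C_d) * (alpha / ln(alpha)) * t_d."""
    return math.log(c_big / c_d) * (alpha / math.log(alpha)) * t_d

def optimal_chain(c_big, c_d):
    """Return (alpha_opt, n_opt); n_opt is real-valued, not yet rounded."""
    return math.e, math.log(c_big / c_d)

# Assumed example: the load is 1000 times the minimum driver capacitance.
alpha_opt, n_opt = optimal_chain(c_big=1000.0, c_d=1.0)
for alpha in (2.0, alpha_opt, 4.0):
    print(f"alpha = {alpha:.3f}: t_tot = {chain_delay(1000.0, 1.0, 1.0, alpha):.2f} t_d")
```

The minimum is shallow: ratios between about 2 and 4 give nearly the same total delay, so a practical chain can use fewer, larger stages at little cost.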

42
Driving Large Fan-Out

Fan-out adds capacitance
Increase sizes of driver transistors
Must take into account rules for driving large
loads
Add intermediate buffers
This may require/allow restructuring of the logic

43
Path Delay

Network delay is measured over paths through
network
Can trace a causality chain from inputs to
worst-case output
Critical path creates longest delay
Can trace transitions which cause delays that are
elements of the critical path delay
To reduce circuit delay, speed up the critical
path
Reducing delay off the path doesn't help
There may be more than one path of the same delay
Must speed up all equivalent paths to speed up
circuit

44
False Paths

Logic gates are not simple nodes
Some input changes don't cause output changes
A false path is a path which cannot be exercised
due to Boolean gate conditions
False paths cause pessimistic delay estimates

45
Logic Transformations

Rewrite by using sub-expressions
Logic rewrites may affect gate placement
Flattening logic
Increases gate fan-in
Logic synthesis programs
Transform Boolean expressions into logic gate
networks in a particular library

Deep Logic
Shallow Logic
46
Logic Optimization

Optimization goals
Minimize area, meet delay constraint
Technology-independent optimization
Works on Boolean expression equivalent
Estimates size based on number of literals
Uses factorization, resubstitution, minimization,
etc.
Uses simple delay models
Technology-dependent optimization
Maps Boolean expressions into a particular cell
library
May perform some optimizations in addition to
simple mapping
Allows more accurate delay models

47
Simulation and Verification

Simulation
Verification
Annotation

48
Simulation

Simulation
Tests the functionality of a design's elaborated
model
Needs a test bench and a simulation tool
Advances in discrete time steps
Test Bench
Includes an instance of the design under test
Applies sequences of test values to inputs
Monitors signal values on outputs using simulator
Simulation Tools
NCSIM (Cadence)
VSIM (Mentor Graphics)
VCS (Synopsys)

49
Event-Driven Simulation

Event-driven simulation is designed for digital
circuit characteristics
Small number of signal values
Relatively sparse activity over time
Event-driven simulators try to update only those
signals which change in order to reduce CPU time
requirements
An event is a change in a signal value
A time-wheel is a queue of events
Simulator traces structure of circuit to
determine causality of events
Event at input of one gate may cause new event at
gate's output
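
The event-driven loop described above can be sketched in a few lines of Python. This is an illustrative toy, not any vendor's simulator: the two-gate netlist, the delays, and the use of a binary heap in place of a true time-wheel are all assumptions.

```python
import heapq

# Hypothetical netlist: output signal -> (function, input signals, delay).
GATES = {
    "n1": (lambda a, b: 1 - (a & b), ("a", "b"), 2),   # NAND, delay 2
    "out": (lambda x: 1 - x, ("n1",), 1),              # inverter, delay 1
}

def simulate(initial, stimuli):
    """Run an event-driven simulation and return the list of value changes."""
    values = dict(initial)
    # Fan-out table: which gates must be re-evaluated when a signal changes.
    fanout = {}
    for out, (_, ins, _) in GATES.items():
        for sig in ins:
            fanout.setdefault(sig, []).append(out)
    queue = []  # event queue ordered by time (stand-in for the time-wheel)
    for seq, (time, sig, val) in enumerate(stimuli):
        heapq.heappush(queue, (time, seq, sig, val))
    seq = len(stimuli)
    changes = []
    while queue:
        time, _, sig, val = heapq.heappop(queue)
        if values[sig] == val:
            continue                      # value unchanged, so no event
        values[sig] = val
        changes.append((time, sig, val))
        for out in fanout.get(sig, []):   # only gates driven by sig are evaluated
            func, ins, delay = GATES[out]
            new_val = func(*(values[i] for i in ins))
            heapq.heappush(queue, (time + delay, seq, out, new_val))
            seq += 1
    return changes

initial = {"a": 0, "b": 1, "n1": 1, "out": 0}
changes = simulate(initial, [(0, "a", 1), (10, "b", 0)])
for change in changes:
    print(change)
```

Between time 3 and time 10 nothing is evaluated at all, which is the sparse-activity saving that event-driven simulation relies on.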

50
Switch Simulation

Special type of event-driven simulation optimized
for MOS transistors
Treats the transistor as a switch
Takes capacitance into account to model charge
sharing
Can also be enhanced to model the transistor as a
resistive switch

51
Test Bench Example
entity test_bench is
end;

architecture test_reg3 of test_bench is
  signal d0, d1, d2, en, clk, q0, q1, q2 : bit;
begin
  dut : entity work.reg3(behav)
    port map ( d0, d1, d2, en, clk, q0, q1, q2 );
  stimulus : process is
  begin
    d0 <= '1'; d1 <= '1'; d2 <= '1';
    wait for 20 ns;
    en <= '0'; clk <= '0';
    wait for 20 ns;
    en <= '1';
    wait for 20 ns;
    clk <= '1';
    wait for 20 ns;
    d0 <= '0'; d1 <= '0'; d2 <= '0';
    wait for 20 ns;
    wait;
  end process stimulus;
end;
52
Verification

To test a refinement of a design
Low-level structural model must be functionally
the same as a corresponding behavioral model
To include two instances of a design in the test
bench
To stimulate both with same test values on inputs
To compare values of outputs for equality
To take account of timing differences
Zero delay
Unit delay
Gate delay
RC delay

53
Verification Example
architecture regression of test_bench is
  signal d0, d1, d2, d3, en, clk : bit;
  signal q0a, q1a, q2a, q3a, q0b, q1b, q2b, q3b : bit;
begin
  dut_a : entity work.reg4(struct)
    port map ( d0, d1, d2, d3, en, clk, q0a, q1a, q2a, q3a );
  dut_b : entity work.reg4(behav)
    port map ( d0, d1, d2, d3, en, clk, q0b, q1b, q2b, q3b );
  stimulus : process is
  begin
    d0 <= '1'; d1 <= '1'; d2 <= '1'; d3 <= '1';
    wait for 20 ns;
    en <= '0'; clk <= '0';
    wait for 20 ns;
    en <= '1';
    wait for 20 ns;
    clk <= '1';
    wait for 20 ns;
    wait;
  end process stimulus;
  ...
54
Verification Example
  verify : process is
  begin
    wait for 10 ns;
    assert q0a = q0b and q1a = q1b and q2a = q2b and q3a = q3b
      report "implementations have different outputs"
      severity error;
    wait on d0, d1, d2, d3, en, clk;
  end process verify;
end architecture regression;
55
Annotation

Standard Delay Format (SDF) annotation
Design timing is stored in an SDF file
Used to iteratively improve design
Updates a more-abstract design with information
from later design stages
Annotation of logic schematic with extracted
parasitic resistances and capacitances
Back annotation requires tools to know more about
each other
Simulation tools
Synthesis tools
Layout tools

56
Standard Delay Format
(DELAYFILE
 (SDFVERSION "OVI 1.0")
 (DESIGN "tcp_1_chip")
 (DATE "Fri Apr 30 09:48:22 2004")
 (VENDOR "cdr3synPwcslV225T125")
 (PROGRAM "Synopsys Design Compiler cmos")
 (VERSION "2003.06")
 (DIVIDER /)
 (VOLTAGE 2.25:2.25:2.25)
 (PROCESS)
 (TEMPERATURE 125.00:125.00:125.00)
 (TIMESCALE 1ns)
 (CELL
  (CELLTYPE "tcp_1_chip")
  (INSTANCE)
  (DELAY
   (ABSOLUTE
    (INTERCONNECT U5/x U81/a (0.000:0.000:0.000))
    (INTERCONNECT U73/x U74/a (0.000:0.000:0.000))
 ...
 (CELL
  (CELLTYPE "exnor2_1")
  (INSTANCE i_aes_wr/U_ALG/U6533)
  (DELAY
   (ABSOLUTE
    (IOPATH a x (0.662:1.045:1.045) (0.682:1.076:1.076))
    (IOPATH b x (1.379:1.416:1.416) (1.454:1.492:1.492))
   )
  )
 )
 ...
 (CELL
  (CELLTYPE "mux2_2")
  (INSTANCE i_mips/u0/ejt_tap\/pa_addr_reg_next\/bit_00i/U1)
  (DELAY
   (ABSOLUTE
    (IOPATH d0 x (0.395:0.395:0.395) (0.464:0.464:0.464))
    (IOPATH d1 x (0.387:0.403:0.403) (0.447:0.477:0.477))
    (IOPATH sl x (1.768:1.781:1.781) (1.879:1.892:1.892))
   )
  )
 )
)

57
Analog Integrated Circuits

Filters
Amplifiers
Phase Lock Loop
Voltage Control Oscillator
Modulator/Demodulator

58
Fairchild Semiconductor µA741 Op-Amp

In 1963, a 26-year-old engineer named Robert
Widlar designed the first monolithic op-amp IC,
the µA702
Price at the beginning was US $300
Fairchild and competitors have sold it in the
hundreds of millions
Now, for US $300 you can get about a thousand of
today's 741 chips

59
Signetics NE555 Timer

A simple IC from 1971 that could function as a
timer or an oscillator
It would become a best seller in analog
semiconductors
Kitchen appliances
Toys
Spacecraft
A few thousand other things
Many billions have been sold

60
Intersil ICL8038 Waveform Generator

A generator of sine, square, triangular,
sawtooth, and pulse waveforms from 1983
Countless applications
Music synthesizers
Blue boxes
Hundreds of millions sold
Intersil discontinued the production in 2002

61
LNA in BiCMOS Technology
62
PLL for 802.11a WLAN
63
Oscillator
64
Modulator
65
Digital Integrated Circuits

Adders
Multipliers
Shifters
Carry Units
Arithmetic-Logic Units

66
Full Adder

Computes one-bit sum and carry
$s_i = a_i \oplus b_i \oplus c_{in}$
$c_{out} = a_i b_i + a_i c_{in} + b_i c_{in}$
Ripple-carry adder n-bit adder built from full
adders
Delay of ripple-carry adder goes through all
carry bits
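
The full-adder equations and the ripple-carry structure can be written out directly. This Python sketch is illustrative only; the function names and the LSB-first bit-list convention are assumptions.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out

def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists given LSB first; return (sum_bits, carry_out)."""
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        # Each stage must wait for the carry of the previous stage.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

# 6 + 9 with 4-bit operands, LSB first.
sum_bits, carry_out = ripple_carry_add([0, 1, 1, 0], [1, 0, 0, 1])
print(sum_bits, carry_out)
```

The loop makes the delay problem visible: the carry variable is threaded through every iteration, so the worst-case delay grows linearly with the word length.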

67
Combinational Multiplier

      0 1 1 0   multiplicand
    x 1 0 0 1   multiplier
      0 1 1 0
    0 0 0 0
    0 0 1 1 0
  0 0 0 0
  0 0 0 1 1 0
0 1 1 0
0 1 1 0 1 1 0
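
The same shift-and-add scheme, sketched in Python for the operands on this slide. The function name partial_products is hypothetical; the code only mirrors the arithmetic, not the hardware structure.

```python
def partial_products(multiplicand, multiplier, width):
    """One partial product per multiplier bit, shifted to its bit position."""
    return [
        (multiplicand << i) if (multiplier >> i) & 1 else 0
        for i in range(width)
    ]

products = partial_products(0b0110, 0b1001, 4)
product = sum(products)
print([format(p, "07b") for p in products])
print(format(product, "07b"))
```

A combinational multiplier forms all of these partial products at once with AND gates and differs from this sketch only in how the additions are arranged.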

68
Array Multiplier

Array multiplier is an efficient layout of a
combinational multiplier
Array multipliers may be pipelined to decrease
clock period at the expense of latency

69
Wallace Tree

Reduces depth of adder chain
Built from carry-save adders
Three inputs a, b, c
Produces two outputs y, z
$y + z = a + b + c$
Carry-save equations
$y_i = \mathrm{parity}(a_i, b_i, c_i)$
$z_i = \mathrm{majority}(a_i, b_i, c_i)$
At each stage, $i$ numbers are combined to form
$\lceil 2i/3 \rceil$ sums
Final adder completes the summation
Wiring is more complex
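
A carry-save adder can be sketched on Python integers, treating each bit position independently. The function name is hypothetical, and one convention is assumed explicitly: the majority word z is shifted left by one position, since a carry out of bit i has the weight of bit i+1.

```python
def carry_save_add(a, b, c):
    """Reduce three numbers to two whose sum is unchanged."""
    y = a ^ b ^ c                               # bitwise parity
    z = ((a & b) | (a & c) | (b & c)) << 1      # bitwise majority, weight 2
    return y, z

y, z = carry_save_add(0b0110, 0b1001, 0b0111)
print(y, z, y + z)
```

No carry travels between bit positions inside carry_save_add, so its delay does not depend on the word length; only the final adder needs a carry chain.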

70
Serial-Parallel Multiplier

Used in serial-arithmetic operations
Multiplicand can be held in place by register
Multiplier is shifted into array

71
Barrel Shifter

Can perform n-bit shifts in a single cycle
Accepts 2n data inputs and n control signals,
producing n data outputs
Selects arbitrary contiguous n bits out of 2n
input bits
Examples
Right shift data into top, 0 into bottom
Left shift 0 into top, data into bottom
Rotate data into top and bottom

72
Barrel Shifter

Two-dimensional array of 2n vertical × n
horizontal cells
Input data travels diagonally upward
Output wires travel horizontally
Control signals run vertically
Exactly one control signal is set to 1, turning
on all transmission gates in that column
Large number of cells, but each one is small
Delay is large, considering long wires and
transmission gates

73
Carry-Lookahead Unit

First computes carry propagate and generate
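$P_i = a_i + b_i$
$G_i = a_i b_i$
Computes sum and carry from P and G
$s_i = c_i \oplus P_i \oplus G_i$
$c_{i+1} = G_i + P_i c_i$
Can recursively expand carry formula
$c_{i+1} = G_i + P_i (G_{i-1} + P_{i-1} c_{i-1})$
$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} (G_{i-2} + P_{i-2} c_{i-2})$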
Expanded formula does not depend on intermediate
carries
Allows carry for each bit to be computed
independently
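
The fully expanded carry formula can be checked against ordinary addition with a short Python sketch. The function name and the LSB-first bit lists are assumptions, and propagate is taken as the OR of the operand bits (P = a + b), which makes s = c XOR P XOR G come out right.

```python
def lookahead_add(a_bits, b_bits, c0=0):
    """Return (sum_bits, carries) using fully expanded lookahead carries."""
    n = len(a_bits)
    p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    carries = [c0]
    for i in range(n):
        # c[i+1] = G_i + P_i G_{i-1} + ... + P_i ... P_1 G_0 + P_i ... P_0 c_0
        terms = []
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            terms.append(term)
        term = c0
        for k in range(i + 1):
            term &= p[k]
        terms.append(term)
        carries.append(int(any(terms)))   # depends only on P, G and c0
    sum_bits = [carries[i] ^ p[i] ^ g[i] for i in range(n)]
    return sum_bits, carries

sum_bits, carries = lookahead_add([0, 1, 1, 0], [1, 1, 0, 1])  # 6 + 11
print(sum_bits, carries)
```

Each carry is computed from P, G and c0 alone, never from another carry, at the price of product terms whose width grows with the bit index.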

74
Depth-4 Carry-Lookahead Unit

Deepest carry expansion requires gates with large
fan-in
Large and slow
Carry-lookahead unit requires complex wiring
between adders and lookahead unit
Values must be routed back from lookahead unit to
adder

75
Carry-Skip Adder

Looks for cases in which carry out of a set of
bits is identical to carry in
Typically organized into m-bit stages
If $a_i \neq b_i$ for every bit in the stage, then bypass
gate sends the stage's carry input directly to carry
output

76
Carry-Select Adder

Computes two results in parallel, each for
different carry input assumptions
Uses actual carry in to select correct result
Reduces delay to multiplexer

77
Manchester Carry Chain

Precharged carry chain which uses P and G signals
Propagate signal connects adjacent carry bits
Generate signal discharges carry bit
Worst-case discharge path goes through entire
carry chain

78
Serial Adder

May be used in signal-processing arithmetic where
fast computation is important but latency is
unimportant
LSB control signal clears the carry shift register

79
Arithmetic-Logic Unit

Computes a variety of logical and arithmetic
functions based on opcode
May offer complete set of functions of two
variables or a subset
Built around adder, since carry chain determines
delay
Function block may be used to compute required
intermediate signals for a full-function ALU
Transmission gates may introduce significant delay

80
Arithmetic-Logic Unit

P and G compute intermediate values from inputs
May not correspond to carry lookahead P and G for
non-addition functions
Add unit is adder of choice
Output unit computes from sum, propagate signal

81
Acorn Computers ARM1 Processor

32-bit RISC microprocessor from 1985
The simplicity made all the difference
Small, low power, and easy to program
ARM architecture has become the dominant embedded
processor
More than 10 billion ARM cores have been used in
all sorts of gadgetry, including the iPhone

82
Computer Cowboys Sh-Boom Processor

In 1988, Russell Fish and Chuck Moore found a way to
have the processor run its own super fast
internal clock while still staying synchronized
with the rest of the computer
In the years since Sh-Boom's invention, the speed
of processors has by far surpassed that of
motherboards, and so practically every maker of
computers and consumer electronics wound up using
the same solution
Since 2006, Patriot Scientific (and Moore) have
reaped over US $125 million in licensing fees
from Intel, AMD, Sony, Olympus, and others

83
8-bit Microprocessors

Microchip Technology PIC16C84 8-bit
microcontroller in 1993
Incorporates EEPROM
Does not need UV light to be erased as EPROM needs

Radiation-hardened RCA CDP 1802 8-bit
microprocessor in 1976
One of the first, if not the first, CMOS
processors
Low power consumption, wide range of operating
voltages and military operating temperature range

84
Embedded Memories

Read-Only Memory
Static Random-Access Memory
Dynamic Random-Access Memory
Memory Generators

85
Memory Architecture

Address is divided into row and column
Row may contain full word or more than one word
Selected row drives/senses bit lines in columns
Amplifiers/drivers read/write bit lines

86
Read-Only Memory (ROM)

ROM core is organized as an array of NOR gates
Pull-down transistors of NOR determine
programming
Erasable ROMs require special processing that is
not typically available
ROMs on digital ICs are generally mask-programmed
Placement of pull-downs determines ROM contents

87
Static Random-Access Memory (SRAM)

Core cell uses six-transistor circuit to store
value
Value is stored symmetrically
Both true and complement are stored on
cross-coupled transistors
SRAM retains value as long as power is applied to
circuit
Read
Precharge bit and bit' (its complement) high
Set select line high from row decoder
One bit line will be pulled down
Write
Set bit/bit' to desired (complementary) values
Set select line high
Drive on bit lines will flip state if necessary

88
SRAM Sense Amplifier

Differential pair
Takes advantage of complementarity of bit lines
One bit line goes low
One arm of diff pair reduces its current, causing
compensating increase in current of another arm
Sense amp can be cross-coupled to increase speed

89
Dynamic Random-Access Memory (DRAM)

Cell can easily be made with a CMOS digital
technology process
Dynamic RAM loses value due to charge leakage
Must be refreshed
Value is stored on gate capacitance of transistor
t1
Read
read = 1, write = 0, read_data is precharged
t1 will pull down read_data if 1 is stored
Write
read = 0, write = 1, write_data = value
Guard transistor writes value onto gate
capacitance
Modern commercial DRAMs use one-transistor cell

90
Toshiba NAND Flash Memory

In 1980, Fujio Masuoka recruited four engineers
to a project aimed at designing a memory chip
that could store lots of data and would be
affordable
Team came up with a variation of EEPROM that
featured a memory cell consisting of a single
transistor (at the time, conventional EEPROM
needed two transistors per cell)
Why is it named flash?
Because of the chip's ultrafast erasing
capability
In 1984 Masuoka presented a paper at the IEEE
International Electron Devices Meeting
In 1988 Intel developed a type of flash based on
NOR logic gates (a 256-kilobit chip)
Toshiba's first NAND flash (greater storage
densities but trickier to manufacture) hit the
market in 1989

91
Memory Generators

A software tool which can create memories (ROM or
RAM blocks) in a range of sizes as needed
The customer usually wants a particular number of
words (depth) and bits (width) for each memory
ordered
Each of the final building blocks (physical
layout) will be implemented as a stand-alone,
densely packed, pitch-matched array
Complex layout generators and state-of-the-art
logic and circuit design techniques offer
Embedded memories of extreme density and
performance
Each memory generator is a set of various,
parameterized generators
Layout generator generates an array of custom,
pitch-matched leaf cells
Schematic generator and Net-lister extracts a
net-list used for both layout vs. schematic and
functional verification
Function and Timing model generators create
models for gate level simulation, dynamic/static
timing analysis and synthesis
Symbol generator generates schematic
Critical Path generator is used for both circuit
design and timing characterization

92
Logic Synthesis

Logic Synthesis Flow
Optimization
Technology Mapping
Low-Power Techniques

93
Logic Synthesis Flow

Goal is to create a logic gate network which
performs a given set of functions
Input is Boolean formulae
Output is gates implementing Boolean functions
Several iterations needed for generation of the
optimized gate-level description
Logic synthesis
Maps onto available gates
Restructures for delay, area, testability, power,
etc.
Automated logic synthesis has enabled
Enormous reduction of the time needed for
conversion of a design from high-level to
gate-level description
Saving of designer resources for architectural
and RTL descriptions, and optimization of the
standard cell library

94
High-Level Synthesis

Scheduling determines
Number of clock cycles required
As-soon-as-possible (ASAP) schedule puts every
operation as early in time as possible
As-late-as-possible (ALAP) schedule puts every
operation as late in schedule as possible
Binding determines
Area and cycle time
Area tradeoffs must consider
Shared function units vs. multiplexers and
control
Delay tradeoffs must consider
Cycle time vs. number of cycles
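
ASAP and ALAP scheduling can be sketched on a small data-flow graph. Everything in this Python example is an assumption made for illustration: the graph, the operation names, unit latency for every operation, and a 3-step schedule length for ALAP.

```python
# Hypothetical data-flow graph: operation -> operations it depends on.
DEPS = {"m1": [], "m2": [], "a1": ["m1", "m2"], "m3": [], "a2": ["a1", "m3"]}

def asap(deps):
    """Earliest control step for each operation (steps start at 1)."""
    step = {}
    def visit(op):
        if op not in step:
            step[op] = 1 + max((visit(d) for d in deps[op]), default=0)
        return step[op]
    for op in deps:
        visit(op)
    return step

def alap(deps, length):
    """Latest control step for each operation within the given schedule length."""
    succs = {op: [] for op in deps}
    for op, ds in deps.items():
        for d in ds:
            succs[d].append(op)
    step = {}
    def visit(op):
        if op not in step:
            step[op] = min((visit(s) for s in succs[op]), default=length + 1) - 1
        return step[op]
    for op in deps:
        visit(op)
    return step

early = asap(DEPS)
late = alap(DEPS, length=max(early.values()))
slack = {op: late[op] - early[op] for op in DEPS}
print(early, late, slack)
```

Operations with zero slack lie on the critical path; operations with slack, such as m3 here, can be moved to another step so that a function unit can be shared during binding.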

95
Logic Synthesis Phases

Technology-independent optimizations
A Boolean network is the main representation of
the logic functions
Each node can be represented as sum-of-products
(or product-of-sums)
Functions in the network need not correspond to
logic gates
Technology mapping (library binding)
Design transformation from technology-independent
to technology-dependent
Technology-dependent optimizations
Work in the available set of logic gates

96
Technology-Independent Optimization

Area is estimated by number of literals
Literal is true or complement form of a variable
Simplification
Rewrites a node to reduce the number of literals
in it
Network restructuring
Introduces new nodes for common factors
Collapses several nodes into one new node
Delay restructuring
Changes factorization to reduce path length

97
Covers and Cubes

Function is defined by
On-set: set of inputs for which output is 1
Off-set: set of inputs for which output is 0
Don't-care set: set of inputs for which output is
don't-care
Each way to write a function as a sum-of-products
is a cover
It covers the on-set
A cover is composed of cubes
Cubes are product terms that define a subspace
cube in the function space

98
Covers and Optimizations

Larger cover
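$\bar{x}_1 \bar{x}_2 \bar{x}_3 + x_1 \bar{x}_2 \bar{x}_3 + \bar{x}_1 x_2 \bar{x}_3 + x_1 x_2 x_3$
Requires four cubes (12 literals)
Smaller cover
$\bar{x}_2 \bar{x}_3 + \bar{x}_1 \bar{x}_3 + x_1 x_2 x_3$
Requires three cubes (7 literals)
$\bar{x}_1 \bar{x}_2 \bar{x}_3$ is covered by two cubes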
Don't-cares
Can be implemented in either on-set or off-set
Provide the greatest opportunities for
minimization in many cases
Espresso
A two-level logic optimizer
Expands, makes irredundant and reduces
Optimization loop refines cover to reduce its size
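
Two covers of the same function can be compared by enumerating their on-sets. In this Python sketch the placement of complemented literals is an assumption: the larger cover is taken as x1'x2'x3' + x1x2'x3' + x1'x2x3' + x1x2x3 and the smaller one as x2'x3' + x1'x3' + x1x2x3, which matches the cube and literal counts on the slide. The data layout and function names are hypothetical.

```python
from itertools import product

# A cube maps variable index -> required value (1 = true form, 0 = complemented).
LARGER = [
    {1: 0, 2: 0, 3: 0},
    {1: 1, 2: 0, 3: 0},
    {1: 0, 2: 1, 3: 0},
    {1: 1, 2: 1, 3: 1},
]
SMALLER = [{2: 0, 3: 0}, {1: 0, 3: 0}, {1: 1, 2: 1, 3: 1}]

def covers(cube, point):
    """True if the input point (x1, x2, x3) lies inside the cube."""
    return all(point[var - 1] == val for var, val in cube.items())

def on_set(cover):
    """All input points for which the sum-of-products evaluates to 1."""
    return {
        point
        for point in product((0, 1), repeat=3)
        if any(covers(cube, point) for cube in cover)
    }

def literal_count(cover):
    return sum(len(cube) for cube in cover)

print(sorted(on_set(LARGER)), literal_count(LARGER), literal_count(SMALLER))
```

Dropping a literal from a cube doubles the region it spans, which is how the smaller cover reaches the same on-set with fewer literals while letting two cubes overlap on one point.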

99
Factorization

Based on division
Formulate candidate divisor
Test how it divides into the function
If g = f/c, we can use c as an intermediate
function
Algebraic division
Doesn't take into account Boolean simplification
Less expensive than Boolean division
Three steps
Generate potential common factors and compute
literal savings if used
Choose factors to substitute into network
Restructure the network to use the new factors
Algebraic/Boolean division is used to implement
first step

100
Technology Mapping

Rewrites Boolean network
In terms of available logic functions
Optimizes for
Area
Delay
Can be viewed as a pattern matching problem
Find pattern match which minimizes area/delay
cost
Procedure
Write Boolean network in canonical NAND form
Write each library gate in canonical NAND form
Assign cost to each library gate
Use dynamic programming to select minimum-cost
cover of network by library gates

101
Breaking into Trees
not optimal, but reasonable cuts usually work well
102
Mapping Example
after three levels of matching
103
Mapping Example
after four levels of matching
104
Low Power Techniques

Architecture-driven supply voltage scaling
Add extra logic to increase parallelism so that
system can run at lower frequency
Power improvement for n parallel units over Vref
$P_n(n) = [1 + C_i(n)/(n C_{ref}) + C_x(n)/C_{ref}]\,(V/V_{ref})^2$
Dynamic voltage and frequency scaling
Supply voltage and clock frequency are decreased for
parts of the circuit where this does not adversely
affect the performance
Dynamic scaling is regulated by software based on
system load
Reducing capacitances
Parasitic capacitances of the transistors
Parasitic capacitances of the wires

105
Low Power Techniques

Reducing switching activity
Deactivate the clock to unused registers (clock
gating)
Deactivate signals if not used (signal gating)
Deactivate VDD for unused hardware blocks (power
gating)

Distributed clocks Globally Asynchronous Locally
Synchronous
Eliminating centrally synchronous clocks and
utilizing local clocks
Distinct local clocks, possibly running at
different frequencies

106
Design for Testability

DFT Methods
Scan Design
Test Pattern Generation
Built-In Self-Test

107
Design for Testability Methods

Make the system as testable as possible
Keep minimum cost in hardware and testing time
Use knowledge of architecture to help in
selection of testability points
Modify architecture to improve testability
DFT for digital circuits
Ad-hoc methods
Avoid asynchronous feedback
Make flip-flops initializable
Avoid redundant gates, large fan-in gates and
gated clocks
Provide test control for difficult-to-control
signals
Consider ATE requirements (tri-states, etc.)
Structured methods
Scan Design
Built-in self-test (BIST)
Boundary scan

108
Scan Design

Circuit is designed using pre-specified design
rules
Test structure (hardware) is added to the
verified design
Add a test control (TC) primary input
Replace flip-flops by scan flip-flops (SFF) and
connect to form one or more shift registers
(scan-chains) in the test mode
Make input/output of each scan-chain
controllable/observable from primary
input/primary output
Use combinational ATPG to obtain tests for all
testable faults in the combinational logic
Add shift register tests and convert ATPG tests
into scan sequences for use in manufacturing test
Full scan is expensive
Must roll out and roll in state many times during
a set of tests
Partial scan selects some registers (not all) for
scanability to reduce the chain length
Analysis is required to choose which registers
are best for scan

109
Scanable Flip-Flop
110
Level-Sensitive Scanable Flip-Flop
111
Scan Structure
112
Combinational Test Vectors
113
Testing Scan Chain

Scan-chain must be tested prior to application of
scan test sequences
A shift sequence 00110011 . . . of length n_sff + 4
in scan mode (TC = 0)
Produces 00, 01, 11 and 10 transitions in all
flip-flops
Observes the result at SCANOUT output
Total scan test length
(n_comb + 2) n_sff + n_comb + 4 clock periods
Example
2,000 scan flip-flops, 500 comb. vectors, total
scan test length of about 10^6 clocks
Multiple scan-chains reduce test length
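
A minimal Python sketch of the two claims on this slide, for a single scan chain. The function names are illustrative and not part of the original slides. The first function evaluates the total scan test length formula; the second shifts the 00110011... sequence through a chain whose power-up state is unknown and records the value transitions each flip-flop goes through.

```python
def scan_test_length(n_comb, n_sff):
    """Clock periods for one scan chain: (n_comb + 2) * n_sff + n_comb + 4."""
    return (n_comb + 2) * n_sff + n_comb + 4


def flush_test_transitions(n_sff):
    """Shift 00110011... (length n_sff + 4) through the chain.

    Returns, for every flip-flop, the set of (previous, next) value pairs
    it went through. Transitions out of the unknown power-up state
    (modeled as None) are not counted.
    """
    pattern = [0, 0, 1, 1]
    stimulus = [pattern[t % 4] for t in range(n_sff + 4)]
    chain = [None] * n_sff
    seen = [set() for _ in range(n_sff)]
    for bit in stimulus:
        new_chain = [bit] + chain[:-1]      # one shift toward SCANOUT
        for k in range(n_sff):
            if chain[k] is not None:
                seen[k].add((chain[k], new_chain[k]))
        chain = new_chain
    return seen


# Slide example: 2,000 scan flip-flops and 500 combinational vectors
print(scan_test_length(500, 2000))   # 1004504, i.e. about 10^6 clocks
```

For a chain of any length, every entry returned by `flush_test_transitions` contains all four pairs (0,0), (0,1), (1,1) and (1,0), which is why the extra four clocks in the sequence length are sufficient.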

114
Testing and Faults

Errors are introduced during manufacturing
Testing weeds out infant mortality
Varieties of testing
Functional testing
Performance testing
Fault model
Possible locations of faults
I/O behavior produced by the fault
With a fault model, we can test the network for
every possible instantiation of that type of
fault
It is difficult to enumerate all types of
manufacturing faults
Testing procedure
Set inputs
Observe output
Compare fault-free and observed output

115
Stuck-At-0/1 Faults

Logic gate output is always stuck at 0 or 1,
independent of input values
Correspondence to manufacturing defects depends
on logic family
Experiments show that 100% stuck-at-0/1 fault
coverage corresponds to high overall fault
coverage
Testing NAND
Three ways to test it for stuck-at-0
Only one way to test it for stuck-at-1
Testing NOR
Three ways to test it for stuck-at-1
Only one way to test it for stuck-at-0

116
Multiple Test Example

Can test both NANDs for stuck-at-0 simultaneously
abc = 000
Cannot test both NANDs for stuck-at-1
simultaneously due to inverter
Must use two vectors
Must also test inverter

117
Stuck-At-Open/Closed Model

Transistors always on/off
t1 is stuck open (switch cannot be closed)
No path from VDD to output capacitance
Testing requires two cycles
Must discharge capacitor
Try to operate t1 to charge capacitor

118
Combinational Testing Example

Two parts of testing
Controlling the inputs of (possibly interior)
gates
Observing the outputs of (possibly interior)
gates
Delay faults
Gate delay model assumes that all delays are
lumped into one gate
Path delay model takes into account the delay of
a path through network
Performance problems
Functional problems in some types of circuits

119
Testing Procedure

Goal
Test gate D for stuck-at-0 fault
First step
Justify 0 values on gate inputs
Work backward from gate to primary inputs
w1 = 0 (A output = 0)
i1 = i2 = 1
Observe the fault at a primary output
o1 gives different values if D is true/faulty
Work forward and backward
F's other input must be 0 to detect true/fault
Justify 0 at E's output
In general, may have to propagate fault through
multiple levels of logic to primary outputs

120
Redundancy and Testing

Redundant logic can mask faults
Testing NOR for SA0 requires setting both inputs
to 0
Network topology ensures that one NOR input (for
instance b) will always be 1
Function reduces to 0
f = ((ab)' + b)' = (ab)b' = 0 (' denotes complement)
Redundant logic can introduce delay faults and
other problems
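
The masking effect can be checked with a truth table. The circuit figure is not part of this transcript, so the topology below is an assumption: a NOR gate whose inputs are the NAND of a and b, and b itself. Under that assumption the NOR inputs are never both 0, so the output is constant 0 and a stuck-at-0 fault on it cannot be observed.

```python
from itertools import product


def redundant_network(a, b):
    """Assumed topology: f = NOR(NAND(a, b), b)."""
    nand_ab = 1 - (a & b)
    nor_inputs = (nand_ab, b)
    f = 1 - (nor_inputs[0] | nor_inputs[1])
    return nor_inputs, f


for a, b in product((0, 1), repeat=2):
    nor_inputs, f = redundant_network(a, b)
    print(a, b, nor_inputs, f)

# No input combination sets both NOR inputs to 0, which the SA0 test requires
sa0_tests = [(a, b) for a, b in product((0, 1), repeat=2)
             if redundant_network(a, b)[0] == (0, 0)]
print(sa0_tests)  # []
```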

121
Sequential Testing

Much harder than combinational testing
Can't set memory element values directly
Must apply sequences
To put machine in proper state for test
To observe value of test
Testing of NAND for stuck-at-1
Set both NAND inputs to 1
Primary input i1 can be controlled directly
Lower input is 1 if ps0/ps1 = 1

122
Time-Frame Expansion

A model for sequential test
Unroll machine in time
A single-stuck-at fault in the sequential machine
appears as a multiple-stuck-at fault in the
unrolled model

123
Test Pattern Generation

Automatic test pattern generator (ATPG) generates
a set of test vectors
Boolean network (combinational ATPG)
Sequential machine (sequential ATPG)
D notation (from Discrepancy) gives a compact way
to write down a fault
D value on a node means that good and faulty
circuits have different values at that point
If a test for a particular fault exists,
D-algorithm will find it by an exhaustive search
of all sensitized paths
Start at the faulty gate
Suppose initially a stuck-at fault on gate output
Primitive D-cube of failure (PDCF) of gate
summarizes minimal assignment of input values to
highlight fault
Propagation D-cube (PDC) has D or D' (D-bar) on
output and on at least one input
Summarizes non-controlling values for other
inputs to allow propagation of D signal

124
PODEM Algorithm

PODEM stands for Path-Oriented DEcision Making
Circuit-based, fault-oriented ATPG algorithm
Goal
Propagate D value to primary outputs
Signal values are explicitly assigned at primary
inputs only
Other values are computed by implication
Backtracking means reassigning primary inputs
when a contradiction occurs
Uses implicit enumeration
Uses five values 0, 1, D, D' (D-bar), and X
Start all values at X
In worst case, must examine all possible inputs
Can be implemented to run quickly
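
The five-valued logic behind implication can be sketched by modeling each value as a (good circuit, faulty circuit) pair, with D = (1, 0) and its complement D' = (0, 1). This is only the value algebra, not a full PODEM implementation; the encoding and function names are illustrative assumptions.

```python
# Each value is a (good, faulty) pair; X is unknown in both circuits.
VALUES = {"0": (0, 0), "1": (1, 1), "D": (1, 0), "D'": (0, 1), "X": (None, None)}
NAMES = {pair: name for name, pair in VALUES.items()}


def and3(a, b):
    """Three-valued AND over 0, 1 and None (unknown)."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return None


def not3(a):
    return None if a is None else 1 - a


def collapse(good, faulty):
    """Any partially unknown pair is reported as X."""
    if good is None or faulty is None:
        return "X"
    return NAMES[(good, faulty)]


def and5(x, y):
    (gx, fx), (gy, fy) = VALUES[x], VALUES[y]
    return collapse(and3(gx, gy), and3(fx, fy))


def not5(x):
    good, faulty = VALUES[x]
    return collapse(not3(good), not3(faulty))


def nand5(x, y):
    return not5(and5(x, y))


print(nand5("D", "1"))  # D'  (non-controlling side input propagates the fault)
print(nand5("D", "0"))  # 1   (controlling side input blocks the fault)
print(nand5("D", "X"))  # X   (side input not yet assigned)
```

The second and third calls show why the side inputs of a gate on the sensitized path must be justified to non-controlling values before the D value can move forward.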

125
Fault Propagation Example
126
Built-In Self-Test (BIST)

Includes on-chip machine responsible for
Generating tests
Evaluating correctness of tests
Allows many tests to be applied
Can't afford large memory for test results
Rely on compression and statistical analysis
Uses a linear-feedback shift register (LFSR) to
generate a pseudo-random sequence of bit vectors

127
BIST Architecture

One LFSR generates test sequence
Another LFSR captures/compresses results
Can store a small number of signatures which
contain expected compressed results for valid
system
Usually used for testing memory blocks
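
A minimal sketch of this architecture, assuming a 4-bit Fibonacci LFSR with feedback from the two lowest bits (one possible maximal-length choice) and a small made-up circuit under test. One LFSR generates the patterns, and the same shift structure with the response XORed in compresses the results into a signature. All names and the example circuit are hypothetical.

```python
def lfsr_step(state):
    """One step of a 4-bit LFSR: feedback is bit0 XOR bit1, shifted in at bit3."""
    feedback = (state ^ (state >> 1)) & 1
    return (state >> 1) | (feedback << 3)


def patterns(seed=1, count=15):
    """Pseudo-random test patterns produced by the generator LFSR."""
    state, out = seed, []
    for _ in range(count):
        out.append(state)
        state = lfsr_step(state)
    return out


def signature(cut, seed=1, count=15):
    """Compress all responses of the circuit under test into one 4-bit word."""
    sig = 0
    for p in patterns(seed, count):
        sig = lfsr_step(sig) ^ cut(p)
    return sig


def good_cut(x):
    """Hypothetical circuit: bit1 = parity of x, bit0 = AND of all four inputs."""
    parity = bin(x).count("1") & 1
    all_ones = 1 if x == 0b1111 else 0
    return (parity << 1) | all_ones


def faulty_cut(x):
    """Same circuit with the AND gate output stuck at 0."""
    parity = bin(x).count("1") & 1
    return parity << 1


expected = signature(good_cut)        # stored on chip as the valid signature
print(sorted(patterns()) == list(range(1, 16)))  # True: all nonzero states visited
print(signature(faulty_cut) != expected)         # True: the fault changes the signature
```

Only the single expected signature has to be stored, not the 15 individual responses. Compression can in general alias (a faulty circuit producing the valid signature), which is why the slides speak of statistical analysis.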

128
Layout Generation

Layout Generation Flow
Design Rules
Layout Tools
Standard Cells
Floorplanning
Placement
Routing
Clock Tree
Pads

129
Layout Generation Flow

Library Exchange Format (LEF) files
To create a library database (standard cells, I/O
cells, and macro blocks)
Timing Library Format (TLF) file
Timing constraints
General Constraints Format (GCF) file
Design constraints
Verilog net-list
To create a design database

130
Layout Generation Flow

Floorplanning
To create a core area with rows (or columns) and
I/O rows around the core area
Power planning and routing
To plan, modify and route power paths, power rings
and power stripes
Placement
An I/O constraints file may be used to place the
I/O pads
Block placement
Cell placement
Size adjustment
To estimate the die size
To resize the design to make it routable

131
Layout Generation Flow

Generating clock trees
The clock buffer space and clock net must be
defined
Generating clock trees is an iterative process
At this point, the physical net-list differs from
the logical (original) net-list
Placement optimization
To resize gates and insert buffers to correct
timing and electrical violations
Routing
To perform both global and final route on a
placed design
Verification
To check for shorts and design rule violations

132
Design Rules

Masks are tools for manufacturing
Manufacturing processes have inherent limitations
in accuracy
Design rules specify geometry of masks which will
provide reasonable yields
Design rules are determined by experience
MOSIS SCMOS
Designed to scale across a wide range of
technologies
Designed to support multiple vendors
Designed for educational use
Fairly conservative
Lambda (λ) design rules
Size of a minimum feature defines λ
Specifying λ particularizes the scalable rules
Parasitics are generally not specified in λ units

133
Wires
134
Transistors
135
Vias

Types of via
Metal1/diff
Metal1/poly
Metal2/metal1
Metal3/metal2
...

Highest via
Cut 3 x 3
Overlap by metal2 1
Minimum spacing 3
Minimum spacing to via1 2

136
Spacings

Diffusion/diffusion 3
Poly/poly 2
Poly/diffusion 1
Via/via 2
Metal1/metal1 3
Metal2/metal2 4
Metal3/metal3 4

137
Overglass

Cut in passivation layer
Connection for bonding wire
Minimum bonding pad 100
Pad overlap of glass opening 6
Minimum pad spacing to unrelated metal2/3 30
Minimum pad spacing to unrelated metal1, poly,
active 15

138
Layout Tools

Layout editors are interactive tools
Design rule checkers identify errors on the
layout
Circuit extractors extract the net-list from the
layout
Connectivity verification systems (CVS) compare
extracted and original net-lists
CADENCE Virtuoso's Layout-versus-Schematic (LVS)
tool
Standard cell layouts are created from
pre-designed cells using the custom routing
Silicon Ensemble (CADENCE)
Encounter (CADENCE)
Physical Compiler (SYNOPSYS)

139
Standard Cell Layout

Layout made of small cells
Gates, flip-flops, etc.
Cells are hand-designed
Assembly of cells is automatic
Cells arranged in rows
Wires routed between and through cells
Pitch is the height of a cell
All cells have same pitch, may have different
widths
VDD/VSS connections are designed to run through
cells
A feedthrough area allows wires to be routed over
the cell

140
Floorplanning Strategy

Floorplanning must take into account
Blocks of varying function, size, and shape
Space allocation
Signal routing
Power supply routing
Clock distribution

141
Floorplanning Tips

Develop a wiring plan
Think about how layers will be used to distribute
important wires
Draw separate wiring plans for power and clocking
These are important design tasks which should be
tackled early
Sweep small components into larger blocks
A floorplan with a single NAND gate in the middle
will be hard to work with
Design wiring that looks simple
If it looks complicated, it is complicated
Design planar wiring
Planarity is the essence of simplicity
Do it where feasible (and where it doesn't
introduce unacceptable delay)

142
Placement Metrics

Placement of components interacts with routing of
wires
Quality metrics for layout
Area and delay
Area and delay determined in part by
Wiring
How do we judge a placement without wiring?
Estimate wire length without actually performing
routing
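
One common way to estimate wire length without routing is the half-perimeter of each net's bounding box. The slides do not name a specific estimator, so this choice, the function names and the example coordinates are assumptions made for illustration.

```python
def half_perimeter(pins):
    """Half-perimeter of the bounding box around a net's pin coordinates."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def estimated_wire_length(placement, nets):
    """Sum of half-perimeter estimates over all nets.

    placement maps a component name to its (x, y) position;
    nets is a list of tuples of component names.
    """
    return sum(half_perimeter([placement[c] for c in net]) for net in nets)


nets = [("a", "b"), ("b", "c"), ("c", "d")]
good = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
bad = {"a": (0, 0), "b": (3, 0), "c": (1, 0), "d": (2, 0)}

print(estimated_wire_length(good, nets))  # 3
print(estimated_wire_length(bad, nets))   # 6
```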

(Figure: bad placement versus good placement)
143
Placement Techniques

To construct an initial solution
To improve an existing solution
Pairwise interchange is a simple improvement
metric
Interchange a pair, keep the swap if it helps
wire length
Heuristic determines which two components to swap
Placement by partitioning
Works well for components of fairly uniform size
Partition net-list to minimize total wire length
using min-cut criterion
Kernighan-Lin Algorithm
Computes min-cut criterion, count total net-cut
change
Exchanges sets of nodes to perform hill-climbing
finding improvements where no single swap will
improve the cut
Recursively subdivide to determine placement
detail
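
Pairwise interchange can be sketched on a simplified problem: components placed in a single row, with wire length measured as the span of each net along the row. The row model, the exhaustive pair selection (in place of a smarter heuristic) and the names are assumptions; Kernighan-Lin partitioning is not shown.

```python
from itertools import combinations


def wire_length(order, nets):
    """Total span of all nets when components sit in a row in the given order."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net) for net in nets)


def pairwise_interchange(order, nets):
    """Swap pairs of components, keeping a swap only if it shortens the wiring."""
    order = list(order)
    best = wire_length(order, nets)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]
            cost = wire_length(order, nets)
            if cost < best:
                best = cost
                improved = True
            else:
                order[i], order[j] = order[j], order[i]   # undo the swap
    return order, best


nets = [("a", "b"), ("b", "c"), ("c", "d")]
start = ["a", "c", "b", "d"]
print(wire_length(start, nets))           # 5
print(pairwise_interchange(start, nets))  # (['d', 'c', 'b', 'a'], 3)
```

Because a swap is kept only when the cost strictly decreases, the loop always terminates, but it can stop in a local minimum. That limitation is what the set exchanges in Kernighan-Lin are meant to address.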

144
Routing

Major phases in routing
Global routing assigns nets to routing areas
Detailed routing designs the routing areas
Net ordering determines quality of result
Net ordering is a heuristic
Blocks and wiring
Blocks divide wiring area into routing channels
Large wiring areas may force rearrangement of
block placement
Channel routing
Channel grows in one dimension to accommodate
wires
Pins generally on only two sides
Switchbox routing
Box cannot grow in any dimension
Pins are on all four sides

145
Routing Channels

Tracks form a grid for routing
Spacing between tracks is center-to-center
distance between wires
Track spacing depends on wire layer used
Density (vertical and horizontal)
Gives the number of wire segments crossing a
vertical/horizontal grid segment
Different layers are used for horizontal and
vertical wires
Horizontal and vertical wires can be routed
relatively independently
Placement of cells determines placement of pins
Pin placement determines difficulty of routing
problem

146
Left-Edge Algorithm

Assumes one horizontal segment per net
Sweep pins from left to right
Assign horizontal segment to lowest available
track
Limitations
Some combinations of nets require more than one
horizontal segment per net (a dog-leg wire)
Aligned pins form vertical constraints
Wire to lower pin must be on lower track
Wire to upper pin must be above lower pin's wire
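
A minimal sketch of the left-edge idea, assuming one horizontal segment per net given as a (left, right) column span. Vertical constraints and dog-legs, listed above as limitations, are deliberately ignored. The function name and example nets are illustrative.

```python
def left_edge(nets):
    """Assign each net's horizontal segment to the lowest free track.

    nets maps a net name to its (left, right) column span.
    Returns a dict mapping net name to track index (0 is the lowest track).
    """
    track_right = []     # rightmost occupied column on each track
    assignment = {}
    # Sweep segments by their left edge, from left to right
    for name, (left, right) in sorted(nets.items(), key=lambda kv: kv[1][0]):
        for track, last in enumerate(track_right):
            if last < left:                  # track is free at this column
                track_right[track] = right
                assignment[name] = track
                break
        else:
            track_right.append(right)        # open a new track
            assignment[name] = len(track_right) - 1
    return assignment


nets = {"A": (1, 4), "B": (2, 6), "C": (5, 8), "D": (7, 9)}
print(left_edge(nets))  # {'A': 0, 'B': 1, 'C': 0, 'D': 1}
```

In this example the four nets fit on two tracks, which matches the channel density: no column is crossed by more than two segments.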

147
Global and Detailed Routing