Title: ECE260B CSE241A Winter 2005 Logic Synthesis
1. ECE260B / CSE241A, Winter 2005: Logic Synthesis
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
Slides courtesy of Dr. Cho Moon
2. Introduction
- Why logic synthesis?
- Ubiquitous: used almost everywhere VLSI is done
- Body of useful and general techniques: the same solutions can be used for different problems
- Foundation for many applications such as
- Formal verification
- ATPG
- Timing analysis
- Sequential optimization
3. RTL Design Flow
HDL → RTL Synthesis (with manual design and module generators) → netlist → Logic Synthesis → netlist → Physical Synthesis → layout
Slide courtesy of Devadas, et al.
4. Logic Synthesis Problem
- Given
- Initial gate-level netlist
- Design constraints: input arrival times, output required times, power consumption, noise immunity, etc.
- Target technology libraries
- Produce
- A smaller, faster, or cooler gate-level netlist that meets the constraints
Very hard optimization problem!
5. Combinational Logic Synthesis
netlist → technology-independent optimization (two-level and multi-level logic optimization) → technology-dependent mapping → netlist
Slide courtesy of Devadas, et al.
6. Outline
- Introduction
- Two-level Logic Synthesis
- Multi-level Logic Synthesis
- Timing Optimization in Synthesis
7. Two-level Logic Synthesis Problem
- Given an arbitrary logic function in two-level form, produce a smaller representation.
- For sum-of-products (SOP) implementation on PLAs, fewer product terms and fewer inputs to each product term mean smaller area.
- Example: F = AB + ABC reduces to F = AB.
8. Boolean Functions
- f(x): B^n → B
- B = {0, 1}, x = (x1, x2, ..., xn)
- x1, x2, ... are variables
- x1, x1', x2, x2', ... are literals
- each vertex of B^n is mapped to 0 or 1
- the onset of f is the set of input values for which f(x) = 1
- the offset of f is the set of input values for which f(x) = 0
9. Logic Functions
Slide courtesy of Devadas, et al.
10. Cube Representation
Slide courtesy of Devadas, et al.
11. Sum-of-Products (SOP)
- A function can be represented by a sum of cubes (products): f = ab + ac + bc
- Since each cube is a product of literals, this is a sum-of-products representation
- An SOP can be thought of as a set of cubes F: F = {ab, ac, bc}
- A set of cubes that represents f is called a cover of f: F = {ab, ac, bc} is a cover of f = ab + ac + bc.
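To make the "cover = set of cubes" view concrete, here is a minimal sketch (not from the slides; the encoding of literals as strings with a trailing apostrophe for complement is an assumption):

    # Sketch: a cube is a frozenset of literals ("a" or "a'"), a cover is a list
    # of cubes; f evaluates to 1 iff at least one cube evaluates to 1.
    def eval_literal(lit, assignment):
        # assignment maps variable names to 0/1; a trailing ' denotes complement
        if lit.endswith("'"):
            return 1 - assignment[lit[:-1]]
        return assignment[lit]

    def eval_cover(cover, assignment):
        return int(any(all(eval_literal(l, assignment) for l in cube)
                       for cube in cover))

    F = [frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"b", "c"})]  # f = ab + ac + bc
    print(eval_cover(F, {"a": 1, "b": 0, "c": 1}))  # 1: the cube ac is satisfied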
12. Prime Cover
- A cube of a function is prime if it is not contained in any other cube (implicant) of the function
- (for example, bc is not prime but b is)
- A cover is prime iff all of its cubes are prime
(Figure: cubes drawn on the Boolean 3-cube with axes a, b, c)
13. Irredundant Cube
- A cube c of a cover C is irredundant if C fails to be a cover when c is dropped from C
- A cover is irredundant iff all of its cubes are irredundant (for example, F = ab + ac + bc)
(Figure: dropping a cube leaves a vertex of the Boolean cube not covered; axes a, b, c)
14. Quine-McCluskey Method
- We want to find a minimum prime and irredundant cover for a given function.
- A prime cover leads to the minimum number of inputs to each product term.
- A minimum irredundant cover leads to the minimum number of product terms.
- The Quine-McCluskey (QM) method (1950s) finds a minimum prime and irredundant cover.
- Step 1: List all minterms of the on-set: O(2^n) for n inputs
- Step 2: Find all primes: O(3^n) for n inputs
- Step 3: Construct the minterms-vs-primes table
- Step 4: Find a minimum set of primes that covers all the minterms: O(2^m) for m primes
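A compact sketch of the four QM steps follows (an illustration, not the course code; implicants are encoded as strings over '0', '1', '-', and the final covering step uses a greedy heuristic where exact QM would branch and bound over the prime implicant table):

    from itertools import combinations

    def merge(a, b):
        """Combine two implicants differing in exactly one specified bit."""
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1 and '-' not in (a[diff[0]], b[diff[0]]):
            return a[:diff[0]] + '-' + a[diff[0] + 1:]
        return None

    def primes(minterms, n):
        """Step 2: repeatedly merge implicants; anything that never merges is prime."""
        level = {format(m, f'0{n}b') for m in minterms}
        all_primes = set()
        while level:
            merged, next_level = set(), set()
            for a, b in combinations(level, 2):
                c = merge(a, b)
                if c:
                    next_level.add(c)
                    merged.update({a, b})
            all_primes |= level - merged
            level = next_level
        return all_primes

    def covers(p, m, n):
        """Does prime p contain minterm m?"""
        bits = format(m, f'0{n}b')
        return all(pc in ('-', bc) for pc, bc in zip(p, bits))

    def qm(minterms, n):
        """Steps 3-4: build the prime-implicant table and cover it greedily."""
        ps = primes(minterms, n)
        uncovered, cover = set(minterms), []
        while uncovered:
            best = max(ps, key=lambda p: sum(covers(p, m, n) for m in uncovered))
            cover.append(best)
            uncovered -= {m for m in uncovered if covers(best, m, n)}
        return cover

    # Hypothetical 3-input example (not the slide's function)
    print(qm([0, 1, 2, 5, 7], 3))   # prints a prime cover (order may vary)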
15. QM Example (Step 1)
- F: the running example, a sum of five minterms over a, b, c
- List all on-set minterms
16. QM Example (Step 2)
- F: the same five-minterm function
- Find all primes.
17. QM Example (Step 3)
- F: the same five-minterm function
- Construct the minterms-vs-primes table (prime implicant table) by determining which cube is contained in which prime. An X at row i, column j means that the cube in row i is contained by the prime in column j.
18. QM Example (Step 4)
- F: the same five-minterm function
- Find a minimum set of primes that covers all the minterms
- This is a minimum column-covering problem
- Essential primes must appear in every cover
19. ESPRESSO Heuristic Minimizer
- Quine-McCluskey gives a minimum solution but is only practical for functions with a small number of inputs (< 10)
- ESPRESSO is a heuristic two-level minimizer that finds a minimal (locally optimal) solution
- ESPRESSO(F)
- do
- reduce(F)
- expand(F)
- irredundant(F)
- while (fewer terms in F)
- verify(F)
20. ESPRESSO Illustrated
(Figure: the Reduce, Expand, and Irredundant steps acting on a cover of cubes)
21. Outline
- Introduction
- Two-level Logic Synthesis
- Multi-level Logic Synthesis
- Timing optimization in Synthesis
22. Multi-level Logic Synthesis
- Two-level logic synthesis is effective and mature
- Two-level logic synthesis is directly applicable to PLAs and PLDs
- But:
- Many functions are too expensive to implement in two-level form (too many product terms!)
- Two-level implementation constrains layout (AND-plane, OR-plane)
- Rule of thumb:
- Two-level logic is good for control logic
- Multi-level logic is good for datapath or random logic
23. Two-Level (PLA) vs. Multi-Level
- Multi-level
- all logic
- general, (partially) automatic
- technology independence coming
- can be high speed (some results)
- PLA
- control logic
- constrained layout
- highly automatic
- technology independent
- multi-valued logic
- slower?
- input, output, state encoding
24. Multi-level Logic Synthesis Problem
- Given
- Initial Boolean network
- Design constraints: arrival times, required times, power consumption, noise immunity, etc.
- Target technology libraries
- Produce
- A minimum-area netlist consisting of gates from the target libraries such that the design constraints are satisfied
25. Modern Approach to Logic Optimization
- Divide logic optimization into two subproblems:
- Technology-independent optimization
- determine overall logic structure
- estimate costs (mostly) independent of technology
- simplified cost modeling
- Technology-dependent optimization (technology mapping)
- binding onto the gates in the library
- detailed technology-specific cost model
- Orchestration of various optimization/transformation techniques for each subproblem
Slide courtesy of Keutzer
26. Optimization Cost Criteria
- The accepted optimization criteria for multi-level logic are to minimize some function of:
- Area occupied by the logic gates and interconnect (approximated by literals = transistors in technology-independent optimization)
- Critical path delay of the longest path through the logic
- Degree of testability of the circuit, measured in terms of the percentage of faults covered by a specified set of test vectors for an approximate fault model (e.g. single or multiple stuck-at faults)
- Power consumed by the logic gates
- Noise immunity
- Wireability
- while simultaneously satisfying upper- or lower-bound constraints placed on these physical quantities
27. Representation: Boolean Network
- Boolean network
- directed acyclic graph (DAG)
- node: logic function representation fj(x, y)
- node variable yj: yj = fj(x, y)
- edge (i, j) if fj depends explicitly on yi
- Inputs: x = (x1, x2, ..., xn)
- Outputs: z = (z1, z2, ..., zp)
Slide courtesy of Brayton
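A data-structure sketch of such a Boolean network (illustrative only; the class name, the SOP encoding, and the assumption that nodes are added in topological order are mine, not the course's):

    # Sketch: each node y_j stores a cover (SOP) over primary inputs and other
    # node variables; edges (i, j) are implied by the variables appearing in f_j.
    class BooleanNetwork:
        def __init__(self):
            self.inputs, self.outputs, self.nodes = [], [], {}   # name -> cover

        def add_node(self, name, cover):
            self.nodes[name] = cover      # cover: list of cubes (sets of literals)

        def fanins(self, name):
            """Support of node `name`: variables its SOP depends on explicitly."""
            return {lit.rstrip("'") for cube in self.nodes[name] for lit in cube}

        def evaluate(self, assignment):
            """Evaluate all nodes given primary-input values (assumes nodes were
            added in topological order, for simplicity)."""
            values = dict(assignment)
            for name, cover in self.nodes.items():
                values[name] = int(any(
                    all((1 - values[l[:-1]]) if l.endswith("'") else values[l]
                        for l in cube)
                    for cube in cover))
            return {z: values[z] for z in self.outputs}

    # y1 = ab + c', z = y1 d
    net = BooleanNetwork()
    net.inputs = ['a', 'b', 'c', 'd']
    net.add_node('y1', [{'a', 'b'}, {"c'"}])
    net.add_node('z',  [{'y1', 'd'}])
    net.outputs = ['z']
    print(net.fanins('y1'))                                  # {'a', 'b', 'c'} (set order may vary)
    print(net.evaluate({'a': 1, 'b': 1, 'c': 1, 'd': 1}))    # {'z': 1}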
28. Network Representation
Boolean network
29. Node Representation: Sum of Products (SOP)
- Example: abc' + a'bd + b'd' + b'e'f (sum of cubes)
- Advantages:
- easy to manipulate and minimize
- many algorithms available (e.g. AND, OR, TAUTOLOGY)
- two-level theory applies
- Disadvantages:
- Not representative of logic complexity. For example, f = ad + ae + bd + be + cd + ce and f' = a'b'c' + d'e' differ in their implementation only by an inverter
- hence not easy to estimate logic; difficult to estimate progress during logic manipulation
30. Factored Forms
- Example: (ad' + b'c)(c + d'(e + ac')) + (d + e)fg
- Advantages:
- good representative of logic complexity: f = ad + ae + bd + be + cd + ce, f' = a'b'c' + d'e' ⇒ f = (a + b + c)(d + e)
- in many designs (e.g. complex-gate CMOS) the implementation of a function corresponds directly to its factored form
- good estimator of logic implementation complexity
- doesn't blow up easily
- Disadvantages:
- not as many algorithms available for manipulation
- hence usually just convert into SOP before manipulation
31. Factored Forms
Note: literal count ≈ transistor count ≈ area (however, area also depends on wiring)
32. Factored Forms
- Definition: a factored form is defined recursively by the following rules. A factored form is either a product or a sum, where:
- a product is either a single literal or a product of factored forms
- a sum is either a single literal or a sum of factored forms
- A factored form is a parenthesized algebraic expression.
- In effect, a factored form is a product of sums of products ... or a sum of products of sums ...
- Any logic function can be represented by a factored form, and any factored form is a representation of some logic function.
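The recursive definition above maps directly onto a tree data structure. Here is a minimal sketch (the class names and API are assumptions for illustration) that also computes the literal count used as an area estimate:

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass(frozen=True)
    class Lit:                      # leaf: a variable or its complement
        name: str
        positive: bool = True
        def eval(self, env):        # env: dict variable name -> 0/1
            v = env[self.name]
            return v if self.positive else not v
        def literals(self):
            return 1

    @dataclass(frozen=True)
    class Node:                     # internal node: '*' (product) or '+' (sum)
        op: str
        kids: Tuple
        def eval(self, env):
            vals = [k.eval(env) for k in self.kids]
            return all(vals) if self.op == '*' else any(vals)
        def literals(self):         # literal count ~ transistor count ~ area
            return sum(k.literals() for k in self.kids)

    # f = (a + b)(c + d) + e  -- 5 literals in factored form
    f = Node('+', (Node('*', (Node('+', (Lit('a'), Lit('b'))),
                              Node('+', (Lit('c'), Lit('d'))))),
                   Lit('e')))
    print(f.literals())                                         # 5
    print(f.eval({'a': 1, 'b': 0, 'c': 0, 'd': 1, 'e': 0}))     # True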
33. Factored Forms
When measured in terms of the number of inputs, there are functions whose size is exponential in the sum-of-products representation but polynomial in factored form. Example: the Achilles' heel function has n literals in its factored form and (n/2)·2^(n/2) literals in its SOP form. Factored forms are useful in estimating area and delay in a multi-level synthesis and optimization system. In many design styles (e.g. complex-gate CMOS design) the implementation of a function corresponds directly to its factored form.
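One standard form of the Achilles' heel function consistent with these counts (stated here as an assumption, since the slide's own formula is not reproduced) is a product of n/2 two-literal sums:

$$ f \;=\; \prod_{i=1}^{n/2} \left( x_{2i-1} + x_{2i} \right) $$

The factored form has n literals; multiplying the product out yields 2^{n/2} cubes of n/2 literals each, i.e. (n/2)·2^{n/2} literals in SOP.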
34. Factored Forms
Factored forms can be graphically represented as labeled trees, called factoring trees, in which each internal node (including the root) is labeled with either + or ×, and each leaf is labeled with a variable or its complement. Example: the factoring tree of ((a + b)cd + e)(a + b)e.
35. Reduced Ordered BDDs
- like factored form, represents both function and complement
- like a network of muxes, but restricted since controlled by primary input variables
- not really a good estimator for implementation complexity
- given an ordering, the reduced BDD is canonical, hence a good replacement for truth tables
- for a good ordering, BDDs remain reasonably small for complicated functions (though not, e.g., multipliers)
- manipulations are well defined and efficient
- true support (dependency) is displayed
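A minimal ROBDD sketch (illustrative only; the node encoding, the unique table, and the apply routine are assumptions, not the course code). It shows the canonicity property: two different ways of building the same function yield the very same shared node:

    # Sketch: terminals are ZERO/ONE, internal nodes are (var_index, low, high).
    # Reduction plus hashing in a unique table make the diagram canonical for a
    # fixed variable order (lower index = closer to the root).
    import operator

    ZERO, ONE = ("0",), ("1",)          # terminal nodes
    _unique = {}                        # (var, id(low), id(high)) -> node

    def mk(v, low, high):
        """Canonical node constructor: applies both BDD reduction rules."""
        if low is high:                 # redundant test: both branches identical
            return low
        key = (v, id(low), id(high))
        if key not in _unique:
            _unique[key] = (v, low, high)
        return _unique[key]

    def bdd_var(i):
        """BDD for the single variable x_i."""
        return mk(i, ZERO, ONE)

    def apply_op(op, f, g, memo=None):
        """Combine two BDDs with a Boolean operator, e.g. operator.and_."""
        memo = {} if memo is None else memo
        key = (id(f), id(g))
        if key in memo:
            return memo[key]
        if f in (ZERO, ONE) and g in (ZERO, ONE):
            res = ONE if op(f == ONE, g == ONE) else ZERO
        else:
            fv = f[0] if f not in (ZERO, ONE) else float("inf")
            gv = g[0] if g not in (ZERO, ONE) else float("inf")
            v = min(fv, gv)             # split on the earliest variable
            f0, f1 = (f[1], f[2]) if fv == v else (f, f)
            g0, g1 = (g[1], g[2]) if gv == v else (g, g)
            res = mk(v, apply_op(op, f0, g0, memo), apply_op(op, f1, g1, memo))
        memo[key] = res
        return res

    a, b, c = bdd_var(0), bdd_var(1), bdd_var(2)
    f = apply_op(operator.or_, apply_op(operator.and_, a, b), c)   # f = ab + c
    g = apply_op(operator.or_, c, apply_op(operator.and_, b, a))   # same function
    print(f is g)   # True: given the ordering, the reduced BDD is canonical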
36. Technology-Independent Optimization
- Technology-independent optimization is a bag of tricks:
- Two-level minimization (also called simplify)
- Constant propagation (also called sweep): f = ab + c, b = 1 ⇒ f = a + c
- Decomposition (single function): f = abc + abd + a'c'd' + b'c'd' ⇒ f = XY + X'Y' with X = ab, Y = c + d
- Extraction (multiple functions): f = (az + bz')cd + e, g = (az + bz')e', h = cde ⇒ f = xy + e, g = xe', h = ye with x = az + bz', y = cd (a worked check follows)
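As a quick check on the decomposition example (using the complement placement assumed above), substitute X = ab and Y = c + d, so that X' = a' + b' and Y' = c'd':

$$ XY + X'Y' \;=\; ab\,(c + d) + (a' + b')\,c'd' \;=\; abc + abd + a'c'd' + b'c'd' \;=\; f $$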
37. More Technology-Independent Optimization
- More technology-independent optimization tricks:
- Substitution: g = a + b, f = a + bc ⇒ f = g(a + c)
- Collapsing (also called elimination): f = ga + g'b, g = c + d ⇒ f = ac + ad + bc'd', g = c + d
- Factoring (series-parallel decomposition): f = ac + ad + bc + bd + e ⇒ f = (a + b)(c + d) + e
38. Summary of a Typical Recipe for TI Optimization
- Propagate constants
- Simplify: two-level minimization at each Boolean network node
- Decomposition
- Local Boolean optimizations
- Boolean techniques exploit Boolean identities (e.g., a · a' = 0)
- Consider f = ab' + ac' + a'b + a'c + bc' + b'c
- Algebraic factorization procedures give f = a(b' + c') + a'(b + c) + bc' + b'c
- Boolean factorization procedures give f = (a + b + c)(a' + b' + c') (verified below)
Slide courtesy of Keutzer
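To see why the Boolean factorization works (again with the complement placement assumed above), expand the product and use the identity a·a' = 0:

$$ (a + b + c)(a' + b' + c') \;=\; ab' + ac' + a'b + a'c + bc' + b'c \;=\; f $$

The cross terms aa', bb', and cc' vanish; an algebraic expansion, which treats a and a' as unrelated variables, would keep all nine terms, which is why only a Boolean procedure reaches this factorization.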
39. Technology-Dependent Optimization
- Technology-dependent optimization consists of:
- Technology mapping: map the Boolean network onto a set of gates from the technology libraries
- Local transformations:
- Discrete resizing
- Cloning
- Fanout optimization (buffering)
- Logic restructuring
Slide courtesy of Keutzer
40. Technology Mapping
- Input
- Technology-independent, optimized logic network
- Description of the gates in the library with their costs
- Output
- Netlist of gates (from the library) that minimizes total cost
- General approach
- Construct a subject DAG for the network
- Represent each gate in the target library by pattern DAGs
- Find a minimum-cost covering of the subject DAG using the collection of pattern DAGs
- Canonical form: 2-input NAND gates and inverters
41. DAG Covering
- DAG covering is an NP-hard problem
- Solve the sub-problem optimally:
- Partition the DAG into a forest of trees
- Solve each tree optimally using tree covering
- Stitch the trees back together
Slide courtesy of Keutzer
42. Tree Covering Algorithm
- Transform the netlist and the library into canonical forms: 2-input NANDs and inverters
- Visit each node in BFS order from inputs to outputs
- Find all candidate matches at each node N
- A match is found by comparing topology only (no need to compare functions)
- Find the optimal match at N by computing the new cost
- New cost = cost of the match at node N + sum of the optimal costs at the inputs (leaves) of the match
- Store the optimal match at node N along with its cost
- An optimal solution is guaranteed if the cost is area
- Complexity: O(n), where n is the number of nodes in the netlist (a sketch follows)
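A dynamic-programming sketch of this algorithm (an illustration, not the course tool; the tree encoding, the pattern representation, and the AOI21 pattern are assumptions, while the INV/NAND2/NAND3/NAND4 areas follow the library on the next slides):

    # Subject-tree nodes: ('INPUT', name) | ('NAND2', left, right) | ('INV', child)
    # Each library cell is (name, area, pattern); patterns mirror the node
    # encoding and 'WILD' marks an input of the pattern.
    LIB = [
        ('INV',   2, ('INV', 'WILD')),
        ('NAND2', 3, ('NAND2', 'WILD', 'WILD')),
        ('NAND3', 4, ('NAND2', ('INV', ('NAND2', 'WILD', 'WILD')), 'WILD')),
        ('NAND4', 5, ('NAND2', ('INV', ('NAND2', 'WILD', 'WILD')),
                               ('INV', ('NAND2', 'WILD', 'WILD')))),
        ('AOI21', 4, ('INV', ('NAND2', ('NAND2', 'WILD', 'WILD'), ('INV', 'WILD')))),
    ]

    def match(pattern, node, leaves):
        """Structural match of a pattern against the subject tree; collects the
        subject sub-trees that align with the pattern's inputs."""
        if pattern == 'WILD':
            leaves.append(node)
            return True
        if node[0] != pattern[0] or len(node) != len(pattern):
            return False
        return all(match(p, c, leaves) for p, c in zip(pattern[1:], node[1:]))

    def best_cover(node, memo=None):
        """Minimum-area cover of the tree rooted at node; returns (area, gate)."""
        memo = {} if memo is None else memo
        if node[0] == 'INPUT':
            return (0, None)
        if id(node) in memo:
            return memo[id(node)]
        best = (float('inf'), None)
        for name, area, pattern in LIB:
            leaves = []
            if match(pattern, node, leaves):
                cost = area + sum(best_cover(l, memo)[0] for l in leaves)
                if cost < best[0]:
                    best = (cost, name)
        memo[id(node)] = best
        return best

    # Example: a NAND3-shaped cone, f = NAND2(INV(NAND2(a, b)), c)
    a, b, c = ('INPUT', 'a'), ('INPUT', 'b'), ('INPUT', 'c')
    tree = ('NAND2', ('INV', ('NAND2', a, b)), c)
    print(best_cover(tree))   # (4, 'NAND3'): cheaper than NAND2 + INV + NAND2 = 8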
43. Tree Covering Example
Find an optimal (in area, delay, power) mapping of this circuit into the technology library (simple example below).
Slide courtesy of Keutzer
44. Elements of a Library (1)
Element / area cost (each with its tree representation in NAND2/INV normal form):
- INVERTER: 2
- NAND2: 3
- NAND3: 4
- NAND4: 5
Slide courtesy of Keutzer
45. Trivial Covering
(Figure: the subject DAG, mapped one gate per node)
7 NAND2 (3 each) = 21, 5 INV (2 each) = 10
Area cost = 31
Can we do better with tree covering?
Slide courtesy of Keutzer
46. Optimal Tree Covering (1)
(Subject tree: the leaf-level matches are labeled with their optimal costs 3, 2, 2, 3)
Slide courtesy of Keutzer
47. Optimal Tree Covering (2)
(Subject tree: node costs so far are 3, 8, 2, 2, 5, 3)
Slide courtesy of Keutzer
48. Optimal Tree Covering (3)
(Subject tree: node costs so far are 3, 8, 13, 2, 2, 5, 3)
Cover with ND2 or ND3?
- 1 NAND2 (3) + subtree (5): area cost 8
- 1 NAND3 (4), whose inputs are leaves: area cost 4 (the better match)
Slide courtesy of Keutzer
49. Optimal Tree Covering (3b)
(Subject tree: node costs so far are 3, 8, 13, 2, 2, 4, 5, 3)
Label the root of the sub-tree with the optimal match and cost.
Slide courtesy of Keutzer
50. Optimal Tree Covering (4)
Cover with INV or AO21?
(Subject tree: node costs so far are 3, 8, 13, 2, 2, 2, 5, 4)
- 1 AO21 (4) + subtree 1 (3) + subtree 2 (2): area cost 9
- 1 Inverter (2) + subtree (13): area cost 15
Slide courtesy of Keutzer
51. Optimal Tree Covering (4b)
(Subject tree: node costs so far are 3, 9, 8, 13, 2, 2, 2, 5, 4)
Label the root of the sub-tree with the optimal match and cost.
Slide courtesy of Keutzer
52. Optimal Tree Covering (5)
Cover with ND2 or ND3?
(Subject tree: relevant node costs are 9, 8, 2, 4)
- 1 NAND3 (4) + subtree 1 (8) + subtree 2 (2) + subtree 3 (4): area cost 18
- 1 NAND2 (3) + subtree 1 (9) + subtree 2 (4): area cost 16
Slide courtesy of Keutzer
53. Optimal Tree Covering (5b)
(Subject tree: relevant node costs are 9, 8, 16, 2, 4)
Label the root of the sub-tree with the optimal match and cost.
Slide courtesy of Keutzer
54. Optimal Tree Covering (6)
Cover with INV or AOI21?
(Subject tree: relevant node costs are 13, 16, 5)
- 1 AOI21 (4) + subtree 1 (13) + subtree 2 (5): area cost 22
- 1 INV (2) + subtree 1 (16): area cost 18
Slide courtesy of Keutzer
55. Optimal Tree Covering (6b)
(Subject tree: relevant node costs are 13, 18, 16, 5)
Label the root of the sub-tree with the optimal match and cost.
Slide courtesy of Keutzer
56. Optimal Tree Covering (7)
Cover with ND2, ND3, or ND4?
(subject tree)
Slide courtesy of Keutzer
57. Cover 1: NAND2
Cover with ND2?
(Relevant node costs: 9, 18, 16, 4)
- 1 NAND2 (3) + subtree 1 (18) + subtree 2 (0): area cost 21
Slide courtesy of Keutzer
58. Cover 2: NAND3
Cover with ND3?
(Relevant node costs: 9, 4)
- 1 NAND3 (4) + subtree 1 (9) + subtree 2 (4) + subtree 3 (0): area cost 17
Slide courtesy of Keutzer
59. Cover 3: NAND4
Cover with ND4?
(Relevant node costs: 8, 2, 4)
- 1 NAND4 (5) + subtree 1 (8) + subtree 2 (2) + subtree 3 (4) + subtree 4 (0): area cost 19
Slide courtesy of Keutzer
60. Optimal Cover was Cover 2
Final cover on the subject tree: 1 ND2, 1 AOI21, 2 ND3, 1 INV
1 INV (2) + 1 ND2 (3) + 2 ND3 (8) + 1 AOI21 (4)
Area cost = 17
Slide courtesy of Keutzer
61. Summary of Technology Mapping
- DAG covering formulation
- Separates library issues from the mapping algorithm (can't do this with rule-based systems)
- Tree covering approximation
- Very efficient (linear time)
- Applicable to a wide range of libraries (std cells, gate arrays) and technologies (FPGAs, CPLDs)
- Weaknesses
- Problems with DAG patterns (multiplexors, full adders, ...)
- Large-input gates lead to a large number of patterns
62. Outline
- Introduction
- Two-level Logic Synthesis
- Multi-level Logic Synthesis
- Timing optimization in Synthesis
63. Timing Optimization in Synthesis
- Factors determining the delay of a circuit:
- Underlying circuit technology
- Circuit type (e.g. domino, static CMOS, etc.)
- Gate type
- Gate size
- Logical structure of circuit
- Length of computation paths
- False paths
- Buffering
- Parasitics
- Wire loads
- Layout
64. Problem Statement
- Given
- Initial circuit function description
- Library of primitive functions
- Performance constraints (arrival/required times)
- Generate
- an implementation of the circuit using the primitive functions, such that
- performance constraints are met
- circuit area is minimized
65. Current Design Process
Behavioral description → behavioral optimization (scheduling) → logic and latches → partitioning (retiming) → logic equations → logic synthesis (technology-independent optimization, then technology mapping, driven by the gate library, performance constraints, and delay models) → gate netlist → timing-driven place and route → layout
66. Synthesis Delay Models
- Why are technology-independent delay reductions hard?
- Lack of fast and accurate delay models
- # of levels: fast but crude
- # of levels + correction term (fanout, wires, ...): a little better, but still crude (what coefficients to use?)
- Technology mapped: reasonable, but very slow
- Place and route: better, but extremely slow
- Silicon: best, but infeasibly slow (except for FPGAs)
- (the models get slower but better down this list)
67. Clustering / Partial Collapse
- Traditional critical-path based methods require:
- A well-defined critical path
- Good delay/slack information
- Problems:
- Good delay information comes from the mapper and layout
- Delay estimates and models are weak
- Possible solutions:
- Better delay modeling at the technology-independent level
- Make speedup insensitive to actual critical paths and mapped delays
68. Overview of Solutions for Delay
- Circuit re-structuring
- Rescheduling operations to reduce the time of computation
- Implementation of function trees (technology mapping)
- Selection of gates from the library
- Minimum delay (load-independent model)
- Minimize delay and area
- Implementation of buffer trees
- Resizing
- The focus here is on circuit re-structuring
69. Circuit Re-structuring
- Approaches
- Local
- Mimic optimization techniques used in adders:
- Carry lookahead (tree height reduction)
- Conditional sum (generalized select transformation)
- Carry bypass (generalized bypass transformation)
- Global
- Reduce the depth of the entire circuit
- Partial collapsing
- Boolean simplification
70. Re-structuring Methods
- Performance measured by:
- # of levels,
- sensitizable paths,
- technology-dependent delays
- Level-based optimizations:
- Tree height reduction (Singh 88)
- Partial collapsing and simplification (Touati 91)
- Generalized select transform (Berman 90)
- Sensitizable paths:
- Generalized bypass transform (McGeer 91)
71. Re-structuring for Delay: Tree-Height Reduction
(Figure: a circuit over inputs a through g with internal nodes h through n and node levels 0 through 6 annotated; the critical region feeding output n is collapsed and re-decomposed, duplicating some logic, to reduce the height of the critical portion.)
72. Re-structuring for Delay: Path Reduction
(Figure: the same circuit after collapsing the critical region and duplicating logic; the annotated levels show the new delay at output n is 5, down from 6.)
73. Generalized Select Transform (GST)
- Late signal feeds a multiplexor
(Figure: the cone computing out from the late input a and inputs b through g is duplicated, once with a = 0 and once with a = 1; the late-arriving a then selects between the two precomputed results through a multiplexor.)
74. Generalized Bypass Transform (GBX)
- Make the critical path false
- Speed up the circuit
- Bypass the logic of the critical path(s)
(Figure: a critical path f_m = f, f_{m+1}, ..., f_n = g is bypassed by a multiplexor controlled by the Boolean difference dg/df; the bypassed path becomes a false path, and the corresponding stuck-at-0 fault is redundant.)
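The Boolean difference that controls the bypass multiplexor is the standard one, stated here for completeness:

$$ \frac{\partial g}{\partial f} \;=\; g\big|_{f=0} \;\oplus\; g\big|_{f=1} $$

When ∂g/∂f = 0, the output g does not depend on f, so the multiplexor selects a value computed without waiting for the critical signal; the long path through f is then no longer sensitizable, which is the sense in which the transform makes the critical path false and the added stuck-at-0 fault redundant.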
75. GST vs. GBX
(Figure: the same late-input example implemented both ways. GBX inserts a multiplexor controlled by the Boolean difference dh/da to bypass h, selecting between the g branches for a = 0/1; GST duplicates the logic cone over b through g for a = 0 and a = 1 and uses a to select the result.)
76. Thank you!