Title: From Combinational to Sequential Circuits to Simple Processors
1. From Combinational to Sequential Circuits to Simple Processors
2. Reminder: Embedded Systems
3. Outline
- Introduction
- Combinational logic
- Sequential logic
- FSM design
- Custom single-purpose processor design
- RT-level custom single-purpose processor design
4. SYNTHESIS METHODOLOGIES
5. Increasing abstraction level in design specification
- Higher abstraction level is the focus of hardware/software design evolution
- Description is smaller and easier to capture
- E.g., one line of sequential program code can translate to 1000 gates
- Many more possible implementations are available
- (a) Like a flashlight: the higher above the ground, the more ground is illuminated
- Sequential program designs may differ in performance/transistor count by orders of magnitude
- Logic-level designs may differ by only a power of 2
- (b) Design process proceeds to lower abstraction levels, narrowing in on a single implementation
6. What is Synthesis
- Automatically converting a system's behavioral description to a structural implementation
- Complex whole formed by parts
- Structural implementation must optimize design metrics
- Synthesis tools are more expensive and more complex than compilers
- Cost: 100s to 10,000s
- User controls 100s of synthesis options
- Optimization critical
- Otherwise could use software
- Optimizations different for each user
- Run time: hours, days
7. Gajski's Y-chart
- Each axis represents a type of description
- Behavioral
- Defines outputs as a function of inputs
- Algorithms but no implementation
- Structural
- Implements behavior by connecting components with known behavior
- Physical
- Gives size/locations of components and wires on chip/board
- Synthesis converts behavior at a given level to structure at the same level or lower
- E.g.,
- FSM → gates, flip-flops (same level)
- FSM → transistors (lower level)
- FSM X registers, FUs (higher level, not allowed)
- FSM X processors, memories (higher level, not allowed)
FU: functional unit; FSM: finite state machine
8. Example of Custom Processor
- Processor
- Digital circuit that performs computation tasks
- Controller and datapath
- General-purpose: variety of computation tasks
- Single-purpose: one particular computation task
- Custom single-purpose: non-standard task
- A custom single-purpose processor may be
- Fast, small, low power
- But: high NRE cost, longer time-to-market, less flexible
9. CMOS transistor on silicon
- Transistor
- The basic electrical component in digital systems
- Acts as an on/off switch
- Voltage at gate controls whether current flows from source to drain
- Don't confuse this gate with a logic gate
10. CMOS transistor implementations
- Complementary Metal Oxide Semiconductor
- We refer to logic levels
- Typically 0 is 0V, 1 is 5V
- Two basic CMOS types
- nMOS conducts if gate=1
- pMOS conducts if gate=0
- Hence "complementary"
- Basic gates
- Inverter, NAND, NOR
11. Basic logic gates
F = x y (AND)
F = x ⊕ y (XOR)
F = x (Driver)
F = x + y (OR)
F = (x y)' (NAND)
F = x' (Inverter)
F = (x + y)' (NOR)
12. Combinational logic design
A) Problem description: y is 1 if a is 1, or if b and c are both 1. z is 1 if b or c is 1, but not both, or if a, b, and c are all 1.
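The problem description above can be turned into a truth table mechanically. The sketch below is one reading of the description, with hypothetical function names (`y_out`, `z_out`, `print_truth_table`) that are not part of the slides.

```c
#include <stdbool.h>
#include <stdio.h>

/* y is 1 if a is 1, or if b and c are both 1. */
bool y_out(bool a, bool b, bool c) { return a || (b && c); }

/* z is 1 if exactly one of b, c is 1, or if a, b and c are all 1. */
bool z_out(bool a, bool b, bool c) { return (b != c) || (a && b && c); }

/* Print one row per input combination (a is the most significant bit). */
void print_truth_table(void) {
    printf("a b c | y z\n");
    for (int i = 0; i < 8; i++) {
        bool a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        printf("%d %d %d | %d %d\n", a, b, c, y_out(a, b, c), z_out(a, b, c));
    }
}
```

Calling `print_truth_table()` lists all eight rows; the output equations and K-maps of the design flow are then derived from this table.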
13. Combinational components
Students should be able to use all kinds of combinational blocks when synthesizing solutions to various problems.
14. Levels of synthesis
- Logic-level behavior to structural implementation
- Logic equations and/or FSM to connected gates
- Combinational logic synthesis
- Two-level minimization (sum of products/product of sums)
- Best possible performance
- Longest path: 2 gates (AND gate + OR gate, or OR gate + AND gate)
- Minimize size
- Minimum cover
- Minimum cover that is prime
- Heuristics
- Multilevel minimization
- Trade performance for size
- Pareto-optimal solution
- Heuristics
- FSM synthesis and Control Unit Synthesis
- State minimization
- State encoding
- State decomposition
- Special architectures
15. Minimum Cover
16. Two-level logic minimization
- Represent logic function as sum of products (or product of sums)
- AND gate for each product
- OR gate for each sum
- Gives best possible performance
- At most 2 gate delays
- Goal: minimize size
- Minimum cover
- Minimum number of AND gates (sum of products)
- Minimum cover that is prime
- Minimum number of inputs to each AND gate (sum of products)
17. Minimum cover
- Minimum number of AND gates (sum of products)
- Literal: a variable or its complement
- a or a', b or b', etc.
- Minterm: product of literals
- Each variable appears exactly once
- abc'd', a'bcd, ab'cd, etc.
- Implicant: product of literals
- Each variable appears no more than once
- abc'd', a'cd, etc.
- Covers 1 or more minterms
- a'cd covers a'bcd and a'b'cd
- Cover: set of implicants that covers all minterms of the function
- Minimum cover: cover with the minimum number of implicants
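The "covers" relation between an implicant and a minterm can be checked with bit masks. This is a minimal sketch for four variables; the `Implicant` type, the bit assignment (a, b, c, d on bits 3 to 0), and the function names are assumptions made for illustration.

```c
#include <stdbool.h>

/* Variables a, b, c, d map to bits 3, 2, 1, 0 of a 4-bit minterm index. */
typedef struct {
    unsigned mask;   /* 1 bits: variables that appear in the product */
    unsigned value;  /* polarity of those variables (1 = uncomplemented) */
} Implicant;

/* An implicant covers a minterm if they agree on every variable it uses. */
bool covers(Implicant p, unsigned minterm) {
    return ((minterm ^ p.value) & p.mask) == 0;
}

/* Number of minterms (out of 16) covered by the implicant. */
int count_covered(Implicant p) {
    int n = 0;
    for (unsigned m = 0; m < 16; m++)
        if (covers(p, m)) n++;
    return n;
}
```

For example, a'cd is `{0xB, 0x3}` (a, c, d used; c and d uncomplemented); it covers a'bcd (index 7) and a'b'cd (index 3), so `count_covered` returns 2.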
18. Minimum cover: K-map approach
- Karnaugh map (K-map)
- 1 represents minterm
- Circle represents implicant
- Minimum cover
- Covering all 1s with the minimum number of circles
- Example: direct vs. min cover
- Fewer gates
- 4 (min cover) vs. 5 (direct)
- Fewer transistors
- 28 (min cover) vs. 40 (direct)
[Figures: K-map for the direct sum of products; K-map for the minimum cover]
Minimum cover: F = abc'd' + a'cd + ab'cd
Minimum cover implementation: two 4-input AND gates, one 3-input AND gate, and one 3-input OR gate (14 gate inputs × 2 transistors each) → 28 transistors
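A quick way to sanity-check a cover is to compare it with the direct sum of minterms on every input combination. The sketch below does this for F = abc'd' + a'cd + ab'cd and recomputes the transistor counts, assuming 2 transistors per gate input; all function names are hypothetical.

```c
#include <stdbool.h>

/* Direct sum of products: one 4-input AND per minterm of F. */
bool f_direct(bool a, bool b, bool c, bool d) {
    return (a && b && !c && !d) || (!a && b && c && d) ||
           (!a && !b && c && d) || (a && !b && c && d);
}

/* Minimum cover: F = abc'd' + a'cd + ab'cd. */
bool f_min_cover(bool a, bool b, bool c, bool d) {
    return (a && b && !c && !d) || (!a && c && d) || (a && !b && c && d);
}

/* True if both forms agree on all 16 input combinations. */
bool covers_agree(void) {
    for (int i = 0; i < 16; i++) {
        bool a = (i >> 3) & 1, b = (i >> 2) & 1, c = (i >> 1) & 1, d = i & 1;
        if (f_direct(a, b, c, d) != f_min_cover(a, b, c, d)) return false;
    }
    return true;
}

/* Rough size estimate, assuming 2 transistors per gate input. */
int transistor_estimate(int gate_inputs) { return 2 * gate_inputs; }
```

With four 4-input ANDs and a 4-input OR the direct form has 20 gate inputs (40 transistors); the minimum cover has 4 + 3 + 4 + 3 = 14 gate inputs (28 transistors).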
19. Minimum cover that is a prime cover
- Minimum number of inputs to AND gates
- Prime implicant
- Implicant not covered by any other implicant
- Max-sized circle in K-map
- Minimum cover that is prime
- Covering with the minimum number of prime implicants
- Minimum number of max-sized circles
- Example: prime cover vs. min cover
- Same number of gates
- 4 vs. 4
- Fewer transistors
- 26 vs. 28
20. Minimum cover: heuristics
- K-maps give optimal solution every time
- Functions with > 6 inputs too complicated
- Use computer-based tabular method
- Finds all prime implicants
- Finds min cover that is prime
- Also optimal solution every time
- Problem: 2^n minterms for n inputs
- 32 inputs: about 4 billion minterms
- Exponential complexity
- Heuristic
- Solution technique where optimal solution is not guaranteed
- Hopefully comes close
21. Heuristics: iterative improvement
- Start with initial solution
- i.e., original logic equation
- Repeatedly make modifications toward better solution
- Common modifications
- Expand
- Replace each nonprime implicant with a prime implicant covering it
- Delete all implicants covered by new prime implicant
- Reduce
- Opposite of expand
- Reshape
- Expands one implicant while reducing another
- Maintains total number of implicants
- Irredundant
- Selects minimum number of implicants that cover from existing implicants
- Synthesis tools differ in modifications used and the order they are used
22. Multilevel logic minimization
- Trade performance for size
- Increase delay for lower number of gates
- Gray area represents all possible solutions
- Circle with X represents ideal solution
- Generally not possible
- 2-level gives best performance
- max delay = 2 gates
- Solve for smallest size
- Multilevel gives Pareto-optimal solution
- Minimum delay for a given size
- Minimum size for a given delay
[Figure: size vs. delay plot marking the 2-level minimization and multi-level minimization solutions]
23. Example of logic factorization
- Minimized 2-level logic function
- F = adef + bdef + cdef + gh
- Requires 5 gates with 18 total gate inputs
- 4 ANDs and 1 OR
- After algebraic manipulation
- F = (a + b + c)def + gh
- Requires only 4 gates with 11 total gate inputs
- 2 ANDs and 2 ORs
- Fewer inputs per gate
- Assume each gate input = 2 transistors
- Reduced by 14 transistors
- 36 (18 × 2) down to 22 (11 × 2)
- Sacrifices performance for size
- Inputs a, b, and c now have 3-gate delay
- Iterative improvement heuristic commonly used
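The factored form can be checked against the two-level form by exhaustive comparison over all 256 input combinations. A minimal sketch; the packing of inputs a through h into bits 0 through 7 and the function names are assumptions for illustration.

```c
#include <stdbool.h>

/* Bit i of `in` holds one input: a = bit 0, b = bit 1, ..., h = bit 7. */
#define BIT(in, i) (((in) >> (i)) & 1u)

/* Two-level form: F = adef + bdef + cdef + gh. */
bool f_two_level(unsigned in) {
    unsigned a = BIT(in, 0), b = BIT(in, 1), c = BIT(in, 2), d = BIT(in, 3);
    unsigned e = BIT(in, 4), f = BIT(in, 5), g = BIT(in, 6), h = BIT(in, 7);
    return (a & d & e & f) | (b & d & e & f) | (c & d & e & f) | (g & h);
}

/* Factored form: F = (a + b + c)def + gh. */
bool f_factored(unsigned in) {
    unsigned a = BIT(in, 0), b = BIT(in, 1), c = BIT(in, 2), d = BIT(in, 3);
    unsigned e = BIT(in, 4), f = BIT(in, 5), g = BIT(in, 6), h = BIT(in, 7);
    return ((a | b | c) & d & e & f) | (g & h);
}

/* True if both forms produce the same output for every input vector. */
bool equivalent(void) {
    for (unsigned in = 0; in < 256; in++)
        if (f_two_level(in) != f_factored(in)) return false;
    return true;
}
```

Exhaustive checking is only practical for small input counts; it is used here because 8 inputs give just 256 cases.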
24. Control automata
[Figure: control automaton Variant 1, Variant 2, and Variant 3. Blocks: counter, register, small FSM, ROM or similar logic. Signals: inputs, address of outputs, page, outputs.]
25. Control automata
[Figure: control automaton Variant 4 and Variant 6. Blocks: small FSM, register/counter, counter, register, ROM or similar logic. Signals: inputs, load new address, load/count, address of outputs, outputs.]
26. FSM synthesis
27. FSM synthesis
- FSM to gates
- State minimization
- Reduce number of states
- Identify and merge equivalent states
- Outputs, next states same for all possible inputs
- Tabular method gives exact solution
- Table of all possible state pairs
- If n states, n^2 table entries
- Thus, heuristics used with large number of states
- State encoding
- Unique bit sequence for each state
- If n states, at least log2(n) bits (rounded up)
- n! possible encodings
- Thus, heuristics common
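The two size estimates above are easy to compute. The helper names below are hypothetical; the sketch only restates the n^2 pair-table size and the rounded-up log2(n) state register width.

```c
/* Smallest number of state-register bits that gives n_states unique codes. */
unsigned min_state_bits(unsigned n_states) {
    unsigned bits = 0;
    while ((1ul << bits) < n_states) bits++;
    return bits;
}

/* Entries in the table of all state pairs used by the tabular method. */
unsigned long pair_table_entries(unsigned long n_states) {
    return n_states * n_states;
}
```

For instance, a controller with 13 states needs `min_state_bits(13)`, that is 4 bits, and its pair table has 169 entries.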
28. Sequential components
- Register: Q = 0 if clear=1; Q = I if load=1 and clock=1; Q = Q(previous) otherwise
- Shift register: Q = lsb; content shifted; I stored in msb
- Counter: Q = 0 if clear=1; Q = Q(prev)+1 if count=1 and clock=1
- Reversible shifter: shifts left and right
- Reversible counter: counts up and down
- Reading (loading a new value) is an operation available in most generalized registers
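The register and counter behavior described above can be modeled as next-state functions evaluated once per clock edge. This is a sketch under assumptions not stated on the slide: clear is treated as synchronous with priority over load/count, the counter wraps at its width, and the names `register_next`, `counter_next`, `up`, and `width` are hypothetical.

```c
/* Register: Q = 0 if clear, I if load, otherwise the previous Q. */
unsigned register_next(unsigned q, unsigned i, int clear, int load) {
    if (clear) return 0;
    if (load) return i;
    return q;
}

/* Reversible counter of `width` bits (width < 32 assumed):
 * counts up when up != 0, down otherwise, and wraps around. */
unsigned counter_next(unsigned q, int clear, int count, int up, unsigned width) {
    unsigned mask = (1u << width) - 1u;
    if (clear) return 0;
    if (!count) return q;
    return (up ? q + 1u : q - 1u) & mask;
}
```

Calling `counter_next(15, 0, 1, 1, 4)` wraps a 4-bit counter from 15 back to 0, and `counter_next(0, 0, 1, 0, 4)` wraps it down to 15.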
29. Sequential logic design
A) Problem description: You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.
- Given this implementation model
- Sequential logic design quickly reduces to
combinational logic design
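The clock divider can be sketched as a 2-bit state register plus two pieces of combinational logic (next state and output). The version below is a Moore-style model with the output asserted in state 3; the type and function names, and the choice of state 3 as the output state, are assumptions for illustration.

```c
/* 2-bit state register: values 0..3. */
typedef struct { unsigned state; } ClockDivider;

/* One rising clock edge: returns the output for the current state,
 * then advances the state register. */
int divider_tick(ClockDivider *d) {
    int x = (d->state == 3);            /* output logic */
    d->state = (d->state + 1u) & 3u;    /* next-state logic */
    return x;
}
```

Starting from state 0, the output is 1 on every fourth call to `divider_tick` and 0 otherwise.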
30. Sequential logic design (cont.)
31. Custom single-purpose processor basic model
32. Example: greatest common divisor
33. Example: greatest common divisor
- First create algorithm
- Convert algorithm to "complex" state machine
- Known as FSMD: finite-state machine with datapath
- Can use templates to perform such conversion
(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
(c) state diagram [figure]
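To see how the algorithm maps onto states, the FSMD can be simulated with one `switch` case per state. This is a simplified sketch, not the slide's exact diagram: states are named after the program line numbers, the join states and the wait on go_i are omitted (go_i is assumed already asserted), and both inputs are assumed to be positive. The function name and the `cycles` parameter are hypothetical.

```c
#include <stddef.h>

typedef enum { S3 = 3, S4, S5, S6, S7, S8, S9 } State;

/* Runs one GCD computation; optionally reports the number of states visited. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i, unsigned *cycles) {
    unsigned x = 0, y = 0, d_o = 0, n = 0;
    State s = S3;
    for (;;) {
        n++;                                    /* one state per clock cycle */
        switch (s) {
        case S3: x = x_i; s = S4; break;        /* 3: x = x_i */
        case S4: y = y_i; s = S5; break;        /* 4: y = y_i */
        case S5: s = (x != y) ? S6 : S9; break; /* 5: while (x != y) */
        case S6: s = (x < y) ? S7 : S8; break;  /* 6: if (x < y) */
        case S7: y = y - x; s = S5; break;      /* 7: y = y - x */
        case S8: x = x - y; s = S5; break;      /* 8: x = x - y */
        case S9:                                /* 9: d_o = x */
            d_o = x;
            if (cycles != NULL) *cycles = n;
            return d_o;
        }
    }
}
```

For example, `gcd_fsmd(42, 8, NULL)` returns 2.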
34. State diagram templates
35. Creating the datapath
- Create a register for any declared variable
- Create a functional unit for each arithmetic operation
- Connect the ports, registers and functional units
- Based on reads and writes
- Use multiplexors for multiple sources
- Create a unique identifier for each datapath component's control input and output
36. Creating the controller's FSM
- Same structure as FSMD
- Replace complex actions/conditions with datapath configurations
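37. Splitting into a controller and datapath
Controller (input: go_i). State encodings, datapath control outputs, and branch conditions recovered from the state diagram:
State  Encoding  Control outputs       Branch conditions
1      0000                            1 / !1
2      0001                            !go_i / !(!go_i)
2-J    0010
3      0011      x_sel = 0, x_ld = 1
4      0100      y_sel = 0, y_ld = 1
5      0101                            x_neq_y = 1 / x_neq_y = 0
6      0110                            x_lt_y = 1 / x_lt_y = 0
7      0111      y_sel = 1, y_ld = 1
8      1000      x_sel = 1, x_ld = 1
6-J    1001
5-J    1010
9      1011      d_ld = 1
1-J    1100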
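The datapath side of the split can be modeled as registers that load a multiplexer-selected value when their load signal is set, plus comparators that feed status bits back to the controller. The signal names (x_sel, x_ld, y_sel, y_ld, d_ld, x_neq_y, x_lt_y) come from the slide; the struct layout and function names are assumptions for illustration.

```c
typedef struct { unsigned x, y, d; } Datapath;
typedef struct { int x_sel, x_ld, y_sel, y_ld, d_ld; } Controls;
typedef struct { int x_neq_y, x_lt_y; } Status;

/* Combinational comparators: status seen by the controller. */
Status datapath_status(const Datapath *dp) {
    Status s = { dp->x != dp->y, dp->x < dp->y };
    return s;
}

/* One clock edge: each register loads its mux output if enabled.
 * sel = 0 picks the external input, sel = 1 picks the subtractor. */
void datapath_clock(Datapath *dp, Controls c, unsigned x_i, unsigned y_i) {
    unsigned x_minus_y = dp->x - dp->y;   /* computed from old register values */
    unsigned y_minus_x = dp->y - dp->x;
    if (c.d_ld) dp->d = dp->x;
    if (c.x_ld) dp->x = c.x_sel ? x_minus_y : x_i;
    if (c.y_ld) dp->y = c.y_sel ? y_minus_x : y_i;
}
```

The controller then only has to drive `Controls` in each state (for example x_sel = 1, x_ld = 1 in state 8) and branch on `Status`.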
38. Controller state table for the GCD example
39. Completing the GCD custom single-purpose processor design
- We finished the datapath
- We have a state table for the next state and control logic
- All that's left is combinational logic design
- This is not an optimized design, but we see the basic steps
You may be asked in homework, exams, or projects to optimize the design with respect to a metric such as area, speed, power, or testability.
40. Example: Bus Bridge Design
41. RT-level custom single-purpose processor design
Example: Bus Bridge
- We often start with a state machine
- Rather than algorithm
- Cycle timing often too central to functionality
- Example
- Bus bridge that converts 4-bit bus to 8-bit bus
- Start with FSMD
- Known as register-transfer (RT) level
- Exercise: complete the design
42. RT-level custom single-purpose processor design (cont.)
Example: Bus Bridge
43. Optimization in Synthesis
44. Optimizing single-purpose processors
- Optimization is the task of making design metric values the best possible
- Optimization opportunities
- original program
- FSMD
- datapath
- FSM
45. Optimizing the original program
- Analyze program attributes and look for areas of possible improvement
- number of computations
- size of variable
- time and space complexity
- operations used
- multiplication and division very expensive
46. Optimizing the original program (cont.)
original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
optimized program
0: int x, y, r;
1: while (1) {
2:   while (!go_i);
     // x must be the larger number
3:   if (x_i >= y_i) {
4:     x = x_i;
5:     y = y_i;
     }
6:   else {
7:     x = y_i;
8:     y = x_i;
     }
9:   while (y != 0) {
10:    r = x % y;
11:    x = y;
12:    y = r;
     }
13:  d_o = x;
   }
Replace the subtraction operation(s) with a modulo operation in order to speed up the program.
Original program, GCD(42, 8): the loop condition is evaluated 9 times, with (x, y) taking the values (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).
Optimized program, GCD(42, 8): the loop condition is evaluated 3 times, with (x, y) taking the values (42, 8), (8, 2), (2, 0).
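The two loop counts for GCD(42, 8) can be reproduced by instrumenting both versions. In this sketch the handshake on go_i is left out, inputs are assumed positive, and the `tests` out-parameter (number of loop-condition evaluations, including the final failing one) is a hypothetical addition.

```c
#include <stddef.h>

/* Subtraction-based GCD, as in the original program. */
unsigned gcd_subtract(unsigned x, unsigned y, unsigned *tests) {
    unsigned n = 1;                 /* counts evaluations of (x != y) */
    while (x != y) {
        if (x < y) y = y - x;
        else       x = x - y;
        n++;
    }
    if (tests != NULL) *tests = n;
    return x;
}

/* Modulo-based GCD, as in the optimized program. */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, unsigned *tests) {
    unsigned x = (x_i >= y_i) ? x_i : y_i;   /* x must be the larger number */
    unsigned y = (x_i >= y_i) ? y_i : x_i;
    unsigned n = 1;                 /* counts evaluations of (y != 0) */
    while (y != 0) {
        unsigned r = x % y;
        x = y;
        y = r;
        n++;
    }
    if (tests != NULL) *tests = n;
    return x;
}
```

Both functions return 2 for the inputs 42 and 8; the first reports 9 condition evaluations and the second 3. The saving comes at the cost of a modulo unit, which is itself an expensive operation in hardware.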
47. Optimizing the FSMD
- Areas of possible improvements
- merge states
- states with constants on transitions can be eliminated, since the transition taken is already known
- states with independent operations can be merged
- separate states
- states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
- scheduling
48. Optimizing the FSMD (cont.)
[Figure: original FSMD (states 1, 2, 2-J, 3, 4, 5, 6, 7, 8, 6-J, 5-J, 9, 1-J) next to the optimized FSMD (states 2, 3, 5, 7, 8, 9)]
Optimizations applied to the original FSMD:
- eliminate state 1: transitions have constant values
- merge state 2 and state 2-J: no loop operation in between them
- merge state 3 and state 4: assignment operations are independent of one another
- merge state 5 and state 6: transitions from state 6 can be done in state 5
- eliminate states 5-J and 6-J: transitions from each state can be done from state 7 and state 8, respectively
- eliminate state 1-J: transition from state 1-J can be done directly from state 9
Optimized FSMD (declares int x, y):
- state 2: stay while !go_i; on go_i go to state 3
- state 3: x = x_i; y = y_i
- state 5: on x < y go to state 7; on x > y go to state 8; otherwise (x = y) go to state 9
- state 7: y = y - x
- state 8: x = x - y
- state 9: d_o = x
49. Optimizing the datapath
- Sharing of functional units
- one-to-one mapping, as done previously, is not necessary
- if same operation occurs in different states, they can share a single functional unit
- Multi-functional units
- ALUs support a variety of operations, so one ALU can be shared among operations occurring in different states
50. Optimizing the FSM
- State encoding
- task of assigning a unique bit pattern to each state in an FSM
- size of state register and combinational logic vary
- can be treated as an ordering problem
- State minimization
- task of merging equivalent states into a single state
- two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state
51. Technology mapping
- Library of gates available for implementation
- Simple
- only 2-input AND, OR gates
- Complex
- various-input AND, OR, NAND, NOR, etc. gates
- Efficiently implemented meta-gates (e.g., AND-OR-INVERT, MUX)
- Final structure consists of specified library's components only
- If technology mapping integrated with logic synthesis
- More efficient circuit
- More complex problem
- Heuristics required
52. Complexity impact on user
- As complexity grows, heuristics used
- Heuristics differ tremendously among synthesis tools
- Computationally expensive
- Higher quality results
- Variable optimization effort settings
- Long run times (hours, days)
- Requires huge amounts of memory
- Typically needs to run on servers, workstations
- Fast heuristics
- Lower quality results
- Shorter run times (minutes, hours)
- Smaller amount of memory required
- Could run on PC
- Super-linear-time (e.g., n^3) heuristics usually used
- User can partition large systems to reduce run times/size
- 100^3 > 50^3 + 50^3 (1,000,000 > 250,000)
53. Integrating logic design and physical design
- Past
- Gate delay much greater than wire delay
- Thus, performance evaluated as number of levels of gates only
- Today
- Gate delay shrinking as feature size shrinking
- Wire delay increasing
- Performance evaluation needs wire length
- Transistor placement (needed for wire length) is the domain of physical design
- Thus, simultaneous logic synthesis and physical design required for efficient circuits
54. Embedded Systems Case Study: Elevator Controller
56. Elevator System
- CRC cards are a well-known method for analyzing a system and developing an architecture.
- CRC
- Classes: logical groupings of data and functionality
- Responsibilities: describe what the classes do
- Collaborators: other classes with which a given class works
- Elevator Control Classes
- Elevator car, Passenger, Floor control, Car control, Car sensors, etc.
- Architectural Classes
- Car state, Floor control reader, Car control reader, Car control sender, Scheduler
57. F floors, N hoistways
62. Elevator System (cont.)
- Classes: logical groupings of data and functionality
- Responsibilities: describe what the classes do
- Collaborators: other classes with which a given class works
- Elevator Control Classes
- Elevator car, Passenger, Floor control, Car control, Car sensors, etc.
- Architectural Classes
- Car state, Floor control reader, Car control reader, Car control sender, Scheduler
- Physical Interfaces
64. Architecture
- Computation and I/O occur at
- Floor control panels/displays
- Elevator cars
- System controller
- Panels controller
- reads buttons and sends events to the system controller
- Car controller
- reads sensor inputs and sends them to the system controller
65. System Controller
- Must take inputs from many sources
- Must control cars to hard real-time deadlines
- User interface, scheduling are soft deadlines
- Testing
- Build an elevator simulator using SystemC, Verilog, VHDL and FPGA
- Simulate multiple elevators
- Simulate real-time control demands
66. Homework
- The simplest possible custom single-purpose processor
- Design a processor to multiply two numbers. The initial data are in registers/counters A and B. The result should be in register/counter C.
- You have only reversible counters (with reading) to be used in the data path.
- The counters perform the following operations
- Add one
- Subtract one
- Read new value
- Invent the algorithm for multiplication. Use the minimum number of counters.
- Design the reversible counter by hand using logic gates and D FFs.
- Design the control unit
- Design the data path
- Draw the timing diagram of the whole system.
- You can use VHDL or Verilog to help you, but I need your design by hand.
67. Summary
- Custom single-purpose processors
- Straightforward design techniques
- Can be built to execute algorithms
- Typically start with FSMD
- CAD tools can be of great assistance
68. Questions to Exams (1)
- What are the main methods of combinational logic design?
- What is a Mealy FSM (Finite State Machine)?
- What is a Moore state machine?
- Think about a robot controller as a sequential logic circuit. What are the blocks and their roles?
- Role of abstraction in FSM design. Give examples.
- Explain the concepts from Gajski's chart in custom single-purpose processor design.
- RT-level custom single-purpose processor design. Explain briefly all design stages from the bottom of the design hierarchy (layout) to the top (system design of a GCD processor as an example).
- List and explain logic gates.
- List and explain combinational blocks.
- List and explain sequential blocks.
- List and explain sensors to be used with embedded systems of FSM type.
- List and explain actuators to be used with such embedded systems.
69. Questions to Exams (2)
- What are the main synthesis processes and CAD tools in combinational logic design?
- What are the methods to solve the covering problem?
- Explain the concept of search and give examples.
- Explain the concept of heuristic in search and give examples. SOP minimization can be very useful. Also ESOP.
- Explain design tradeoffs and Pareto optimization on one practical example.
- Explain in detail, on an example, the basic synthesis method for a Mealy FSM, from specification to a circuit built from D-type flip-flops (FFs) and logic gates.
- Explain and illustrate how D, T and JK flip-flops work.
- What is the difference between
- Register with enable
- Register without enable
- Reversible register
- Draw the schematic of the FSMD.
- Explain Euclid's GCD algorithm on examples.
- Without looking at the slides, convert the GCD algorithm to an FSMD.
- How can we optimize GCD?
- Apply these ideas to the Least Common Multiple algorithm and FSMD for two numbers.
70. Questions to Exams (3)
- The role of GO-TO commands in FSMD design. Are they good or bad? Give examples. The role of structured design of FSMD.
- How is the data path created from the FSMD? This is one of the main topics for this whole class. You have to know it well.
- How is the CU (Control Unit) created from the FSMD? This is one of the main topics for this whole class. You have to know it well.
- Compare state graph, state transition table and flow-chart. Why do we need all of them?
- In this class we are not optimizing combinational logic or FSMs too much. But if you have taken the ECE 572 or ECE 573 classes, you know many methods to optimize on these levels. Can you give practical examples of these optimizations in GCD or another similar system?
- Complete the Bus bridge FSMD that converts 4-bit bus to 8-bit bus and is given in these slides.
- Discuss optimizing single-purpose processors. Give examples. Explain levels of optimization, such as the original program, the FSMD, the data path, the CU, the register, the combinational logic, and finally the technology mapping.
- Design the complete elevator system for a villa of a crazy millionaire artist from Hollywood. Cost does not count. You have to amaze his guests.
71. Sources
- EECE 353-1
- Real-Time Systems
- T. John Koo
- Embedded Computing Systems Laboratory
- Institute for Software Integrated Systems
- Department of Electrical Engineering and Computer Science
- Vanderbilt University
- 5306 Stevenson Center
- January 16, 2006
- john.koo_at_vanderbilt.edu
Slides from S. Mohammadi (Siamak Mohammadi), Vahid, Givargis, and Marwedel
72. What can we cover in the Monday meeting?
- Design of SOP circuits from K-maps. Prime implicants and covering.
- Design of POS circuits from K-maps. Prime implicates and covering.
- Design of ESOP circuits from K-maps. Algebraic rules for AND/EXOR logic.
- Design using NAND and NOR gates. De Morgan rules.
- Factorization.
- Multiplexers.
- Iterative circuits and their types.
- Using state machines to design one-directional iterative circuits
- Predicates
- Oracles
- SAT oracles
- Graph coloring oracles and distributed processors
- SEND+MORE=MONEY problem and its oracle.
- The idea of constraint satisfaction and distributed software/hardware for it.