Title: EEE515J1 ASICs and DIGITAL DESIGN
1 EEE515J1 ASICs and DIGITAL DESIGN, Lecture 6: Data Processors and Control Units
Ian McCrum, Room 5D03B, Tel 90 366364 (voice mail on 6th ring)
Email IJ.McCrum_at_Ulster.ac.uk
Web site http://www.eej.ulst.ac.uk
Last changed 01/11/04 at 18:00
2 Designing Larger Digital Systems
- We have seen how designing finite state machines (FSMs) is relatively straightforward once the state diagram or design specification is drawn.
- Together with combinational logic, these design methods will stand you in good stead.
- Of course there are problems that would be rather large or tedious to solve using these methods, such as a system with a large number of inputs or one with a large variety of actions or steps to be performed.
- We can modify the FSM approach.
- Having one FSM send inputs to and receive outputs from another FSM is a useful technique; such cascaded or coupled FSMs are found in real designs.
- The design techniques used will depend on whether the two FSMs have synchronous clocks.
- If not, then the system is an asynchronous one and will use handshake and control signals to effect synchronisation between the machines.
- We will not dwell (sic) on such machines here, except to note that testing asynchronous systems is difficult and error prone, and can give a design which is difficult to modify late in the design cycle.
3 The Algorithmic State Machine method
- Other modifications to the basic FSM method might add memory such as stack or heap structures and have state machines route data to and from these memory structures.
- A more general approach is described below.
- Another alternative is to use a computer or microprocessor system and write software.
- Actually a computer is just an instance of a digital system, and the stored program concept on which its application is based is similar to the design method below. So it should come as no surprise that if you can master the method below you will understand how computers actually work, and could even design your own CPU.
4 The ASM Method
- Instead of concentrating on simply moving from state to state, we can decompose our problem into a number of sections.
- If we must process input data and can identify simple operations to be performed on the data, then we can sequence and control the flow of data to and from each data processing block using FSM design methods.
- Thus we partition our system into a DATA PROCESSOR and a CONTROL LOGIC section.
- The data processor has functional blocks that do something to the incoming data or to locally generated data, such as a count of items processed.
- A good design rule is that each functional block should do one thing and be easily described. It might be a counter, an adder, a comparator or a shift register. It could even be a complete ALU.
- The Control Logic sends control signals to each block and receives status signals, or information about the data, but not the data itself. Many choices can be made by the designer, but as a rule this partition gives an easily designed, easily tested and easily modified system.
5 The ASM Method
An ALU, or Arithmetic Logic Unit, typically has 2 data inputs and a data output, all 8 or 9 bits wide. It also has 3 or 4 inputs to indicate what to do. The 3-bit binary number 000 to 111 might specify F=A+B, F=A-B, F=B-A, F=A and B, F=A or B, and maybe F=A, F=B and F=11111111.
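As a rough behavioural sketch of such an ALU, the Python below models eight operations selected by a 3-bit code. The particular assignment of codes to operations is an assumption made for illustration (the slide does not give the encoding), as are the names alu, op and width.

```python
def alu(op, a, b, width=8):
    """Behavioural model of a small ALU.

    The mapping of the 3-bit code `op` to operations is hypothetical;
    a real ALU chip defines its own encoding.
    """
    mask = (1 << width) - 1          # keep results to `width` bits
    operations = {
        0b000: a + b,                # F = A + B
        0b001: a - b,                # F = A - B
        0b010: b - a,                # F = B - A
        0b011: a & b,                # F = A and B
        0b100: a | b,                # F = A or B
        0b101: a,                    # F = A
        0b110: b,                    # F = B
        0b111: mask,                 # F = all ones
    }
    return operations[op] & mask     # discard carry/borrow beyond `width`
```

The control logic would drive op; the data itself never passes through the controller.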
6 Example of ASM method
- Averaging 16 numbers, each of 8 bits in size.
- Method 1: use 8 adders to add 8 pairs of numbers; this gives eight 9-bit numbers (worst case).
- Use four 9-bit adders to give four 10-bit answers.
- Use two 10-bit adders to give two 11-bit answers.
- Finally use an 11-bit adder, giving a 12-bit answer. We can use a trick to divide by 16: simply use the 8 leftmost bits of the 12-bit number, akin to shifting right 4 bits; this is division by 2^4.
- This is obviously most wasteful of space, but achieves a reasonably fast answer: 4 add-times.
- Actually adders are slow, though there are a number of special techniques to speed up addition, cf. carry-lookahead adders.
- Clearly a more space-efficient system would be to do the calculation the way humans would do it. Use a running total and add sequentially, i.e. use one adder and pass the data through it one number at a time.
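The two approaches can be compared with a small behavioural sketch in Python. The function names are illustrative only, and the 12-bit accumulator width in the sequential version follows from the worst-case sum of sixteen 8-bit values; neither function models gate delays.

```python
def average_tree(samples):
    """Method 1: pairwise adder tree (8 + 4 + 2 + 1 adders, 4 levels deep)."""
    assert len(samples) == 16
    level = list(samples)
    while len(level) > 1:
        # each pass models one rank of adders working in parallel
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] >> 4             # keep the top 8 of 12 bits: divide by 16


def average_sequential(samples):
    """One adder and a running total, one number per step (16 add-times)."""
    assert len(samples) == 16
    total = 0
    for x in samples:
        total = (total + x) & 0xFFF  # 12-bit accumulator register
    return total >> 4                # same divide-by-16 trick
```

Both give the same (truncated) average; the tree uses fifteen adders and four add-times, the sequential version one adder and sixteen add-times.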
7 Example of ASM method
- State equations
- S0.D = S0./S + S2
- S1.D = S0.S
- S2.D = S5.EQ16
- S3.D = S1 + S6
- S4.D = S3
- S5.D = S4
- S6.D = S5./EQ16
Output equations: CLEAR = S1, ADD = S3, STROBE = S5, COUNT = S6, DATAVALID = S2
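A minimal sketch of this controller in Python, reading each state equation as a one-hot sum of products (AND within a term, OR between terms, "/" as NOT). Only the control logic is modelled: S and EQ16 are treated as inputs supplied by the caller, and the helper names next_state, outputs and one_hot are illustrative.

```python
def one_hot(i):
    """Seven-bit one-hot state vector with bit i set (S0..S6)."""
    return tuple(int(k == i) for k in range(7))


def next_state(state, s, eq16):
    """Evaluate the D inputs of the seven state flip-flops."""
    s0, s1, s2, s3, s4, s5, s6 = state
    not_s, not_eq16 = 1 - s, 1 - eq16
    return (
        (s0 & not_s) | s2,   # S0.D: wait for start, or return after S2
        s0 & s,              # S1.D: start seen
        s5 & eq16,           # S2.D: all 16 numbers processed
        s1 | s6,             # S3.D: begin (or repeat) the add
        s3,                  # S4.D
        s4,                  # S5.D
        s5 & not_eq16,       # S6.D: more numbers to process
    )


def outputs(state):
    """Control signals, each asserted in exactly one state."""
    s0, s1, s2, s3, s4, s5, s6 = state
    return {"CLEAR": s1, "ADD": s3, "STROBE": s5, "COUNT": s6, "DATAVALID": s2}
```

Stepping next_state from one_hot(0) with s=1 walks S0, S1, S3, S4, S5 and then either S6 (loop) or S2 (finish), depending on eq16.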
8 Signals to the outside world
- Several unanswered problems remain with the previous design:
- Exactly when the input arrives.
- The DATAVALID pulse is only available for a short time.
- It would be better (cheaper?) to use a countdown counter.
- Often when doing an initial ASM design, the interface to the outside world (or the next machine in the chain) is not given much attention.
- A typical, useful approach is to provide handshake lines to allow flow control. Thus:
Receiver driven: wait for REQUEST i/p, then o/p data, then o/p DATAVALID, often just a timed low-high-low pulse.
Sender driven: o/p data, then o/p strobe; keep it high until ack is seen from far end.
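The sender-driven scheme can be sketched as a clock-by-clock trace in Python. This is an illustration only: the ack_delay parameter (how many clocks the receiver takes to respond) and the assumption that both strobe and ack drop on the clock after ack is seen are not from the slide.

```python
def sender_driven_transfer(data, ack_delay=2):
    """Return a list of (strobe, ack, received) tuples, one per clock."""
    strobe, ack, received = 1, 0, None   # sender has output data and raised strobe
    trace = []
    waited = 0
    while not ack:
        trace.append((strobe, ack, received))
        waited += 1
        if waited >= ack_delay:          # receiver latches the data and acknowledges
            received = data
            ack = 1
    trace.append((strobe, ack, received))  # sender sees ack while strobe is still high
    strobe, ack = 0, 0                   # assumed: both lines return low afterwards
    trace.append((strobe, ack, received))
    return trace
```

The point of the handshake is visible in the trace: strobe stays high for as long as the receiver needs, rather than for a fixed pulse width.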
9 ASM machines demand synchronous logic
- Even simple latches are best driven in a synchronous manner. Even though applying a latch or strobe signal to the clocks of a register (e.g. 8 D-type flip-flops) will work, a more testable circuit results if the master clock goes to every component.
- Thus the D-types spend most of their time in a held state and only load data when the strobe signal is high.
- This is easily achieved by adding multiplexors.
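A behavioural sketch of the multiplexor arrangement, in Python: every call to clock() is a master clock edge, and a 2-to-1 multiplexor in front of the D inputs chooses between new data and the register's own output. The class and method names are illustrative.

```python
class EnabledRegister:
    """Register clocked by the master clock, with a strobe (load-enable) input."""

    def __init__(self, width=8):
        self.mask = (1 << width) - 1
        self.q = 0

    def clock(self, d, strobe):
        # 2-to-1 multiplexor: strobe high selects new data, low recirculates Q
        mux_out = d if strobe else self.q
        self.q = mux_out & self.mask   # the flip-flops load on every clock edge
        return self.q
```

With strobe low the register reloads its own contents on each clock, which is the "held state" referred to above.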
10 Using a CLOCK
- The role of the clock is very important in the ASM method.
- As has been said before, having everything synchronised to a single clock can ease testing and last-minute design modifications.
- In very large systems you will find systems that use two-phase clocks, where the rising edge is used by one section of a system and the next section uses the falling edge.
- Or latches are provided to isolate adjacent sections.
- Multiphase clocks exist; a 4-phase solution allows the soldiers all to march in step.
- Very large, fast systems will have problems routing a clock signal from one edge of a chip to the other, and several solutions exist to fix this.
- Often the designer will lay down the clock distribution network before adding other gates.
- A matrix of equal-delay buffers may allow distribution with a low timing skew across the chip.
- Also used today is local generation of the clock and a system of phase locking (cf. www.altera.com for a description of their DPLL cells). This can also allow the clock frequency off-chip to be much lower than the clock on the chip; the phase locking can be done at a submultiple of the clock frequency. I first saw this on a Transputer chip, where the chip internally worked at 20 MHz but you only needed to supply the chip with a 5 MHz oscillator. The PCB layout was less critical and the emitted RF noise was much less with this approach. You may be aware it is used a lot in modern PC CPU design; sometimes the internal clocks run at 3.5 times the external clocks! (cf. www.tomshardwareguide.com)
11 Synchronous Control signals
- A key to initial ASM designs is to have very strict synchronisation. This rule has even prompted some TTL companies to bring out two versions of their chips: the 74161 and 74163 counters are identical except that the RESET action is asynchronous on the 74161 but synchronous on the 74163.
- Once you are familiar with the method and have a dozen designs under your belt you may relax this strict rule somewhat.
- Chips such as counters and shift registers can undertake various control actions: the RESET, LOAD, PRESET and DIRECTION controls for a counter are all VERBS of ACTION. An important part of the method is to recognise that whilst your control logic may assert these control inputs, they are NOT acted upon until the next clock pulse. Thus the ACTION is not taken until the clock pulse. This makes the design diagrams easier to follow.
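The idea that control inputs are asserted first and acted upon only at the clock can be sketched in Python as below. The 4-bit width and the priority order RESET over LOAD over COUNT are assumptions for the illustration, not properties of any particular chip.

```python
class SyncCounter:
    """4-bit counter whose control inputs take effect only on clock()."""

    def __init__(self):
        self.q = 0          # counter outputs
        self.d = 0          # parallel data inputs used by LOAD
        self.reset = 0      # control inputs set by the control logic
        self.load = 0
        self.count = 0

    def clock(self):
        # Assumed priority: RESET, then LOAD, then COUNT
        if self.reset:
            self.q = 0
        elif self.load:
            self.q = self.d & 0xF
        elif self.count:
            self.q = (self.q + 1) & 0xF
        return self.q
```

Setting c.reset = 1 changes nothing by itself; q is cleared only when c.clock() is next called. An asynchronous reset would instead clear q at the moment reset is asserted.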
12 The Design Method
- There are two main steps, both graphical in nature: a block diagram of the data processor and the ASM chart describing the sequence of data operations to be performed. Different problems sometimes lend themselves to applying these in different orders. The data processor is a block diagram or circuit diagram where each block is a simple functional circuit. As a guide, each block should be available as a TTL chip, but if you have little experience of the TTL family, a further guide is to ensure that it performs a single, easily explained task. Each block should be simple to design, such as a combinational problem or a very simple FSM.
- All control signals MUST be synchronous. Combinational circuits such as ADDERS might have a synchronous ADD control signal, or you can just assume the answer pops out the bottom of the adder. You must ensure that the propagation delays of each data processor block do not cause problems; if these are all much faster than the clock then there will be no problem. Otherwise it is possible to insert dummy states into the control logic to wait for answers to appear, or we must complicate our system by adding status signals, e.g. ADDER_COMPLETE.
13 The Design Method continued
- The ASM chart is made up of boxes of just three types.
- It superficially resembles a programming flowchart. There is one crucial difference: programming flowcharts are read sequentially from the top of the page to the bottom, and if there is only one CPU then this also represents the time behaviour of the program.
- Obviously in a hardware circuit with a couple of counters, the counting of one counter does not wait for the counting of another. Both pieces of hardware operate at the same time, concurrently.
- In fact the different parts of the Data Processor in an ASM all operate at the same time. If we have a section of an ASM chart where a counter is told to count, an input is tested and an output is generated, then these actions will all be scheduled to happen at the same time.
- Of course it will take the next clock pulse to action the events.
- Each state in an ASM chart has only one output box.
- It may have a number of input testing boxes and output boxes conditional on some inputs, but there must only be one main output box per state.
- All arrows arriving at that state must go through this box.
- We label the state by labelling that output box, but be clear where the dotted lines that form the boundary of our state lie; see Figure 2 overleaf.
14 The Design Method continued
- Note some texts will name the state inside a bubble shown as a dotted circle. Here I have listed the state S0, with a state code of 0001. (I will use one-hot codes for the state code but there is no reason why a more efficient code couldn't be used.)
- When in state zero you are in all boxes inside the dotted line simultaneously, depending on input conditions. Thus the single-bit input E is tested at the same time as the single-bit input F is tested. The PRESET or LOAD_ALL_ONES control signal of the 8-bit register R2 is asserted if E is high; it flickers if E flickers, but of course we should try to use synchronous inputs where possible. The adder (or counter?) A is to increment and the RESET signal of R1 is asserted.
- Maybe you see now why all control signals are only activated on a clock pulse. All these control signals are set or cleared but NO action takes place until the clock pulse arrives that will take the machine to its next state, down one of the three arrows exiting the box.
15 The Design Method continued
- One of the consequences of this method is that if a test is activated instantly on entering a state then it is based on the old values of the inputs.
- If the state alters an input then we must be most careful. If the conditional boxes above tested the counter/adder A then the machine would exit depending on the old value of A, despite A altering as we left the state.
- It is a good idea not to test a signal in the same state as you attempt to alter it.
- It is easy to add dummy states (empty state boxes) to cause a one clock cycle delay, and this can decouple the two effects. It is usually a good idea to avoid two tests within one state.
- These rules or guidelines can be broken, but adherence will increase the likelihood that the system will work!
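The "old value" effect can be seen in a small Python sketch. It models a hypothetical loop that increments a counter A until it reaches a limit, once with the test in the same state as the increment and once with the test moved to a separate state; the state names COUNT and TEST and the function name are made up for the illustration.

```python
def loop_until(limit, test_in_same_state=True):
    """Return (final value of A, clocks used)."""
    a = 0
    clocks = 0
    state = "COUNT"
    while True:
        clocks += 1                      # one pass of the loop is one clock pulse
        if state == "COUNT":
            if test_in_same_state:
                done = (a == limit)      # the test sees the OLD value of A
                a += 1                   # the increment commits on the same edge
                if done:
                    return a, clocks
            else:
                a += 1                   # only alter A in this state
                state = "TEST"
        else:                            # separate state: A has settled
            if a == limit:
                return a, clocks
            state = "COUNT"
```

With limit 3, the combined state exits with A already advanced to 4, while the version with a separate test state exits with A equal to 3 at the cost of extra clocks.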
16 Counting 1s in a 16 bit word.
The previous example was extremely abstract; a more typical application follows. We begin with an English description of the problem: a system is needed that will count the number of ones in a 16-bit word. The design should be easily modified for a 32-bit word. This is a nice example because, as in real life, there are many possible solutions; the good designer will reject all but one of these, and the one that is picked will be picked for a good reason! Here we will adopt an ASM method to illustrate the design method. Speed of response or cost may push a real designer to different conclusions.
17
Solution 1a: The answer will be between zero and 16 inclusive. This needs 5 bits to represent it (00000 to 10000).
Solution 1b: Create a 4-bit cell and iterate the answer. Adders will be needed to combine the four outputs, and this will be a slower but easier-to-design solution.
Solution 2: Use a shift register and counter.
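Solution 1b can be sketched behaviourally in Python: a 4-bit cell is modelled as a 16-entry truth table giving the number of ones in a nibble, and three additions combine the four cell outputs. The names CELL and count_ones_cells are illustrative, and the pairing of the adders is one possible arrangement.

```python
# Truth table of the 4-bit cell: input 0..15, output 0..4 (needs 3 bits)
CELL = [bin(n).count("1") for n in range(16)]


def count_ones_cells(word):
    """Count ones in a 16-bit word using four copies of the 4-bit cell."""
    nibbles = [(word >> shift) & 0xF for shift in (0, 4, 8, 12)]
    partial = [CELL[n] for n in nibbles]          # four cells work in parallel
    # two adders combine pairs, a third adder combines their sums
    return (partial[0] + partial[1]) + (partial[2] + partial[3])
```

Extending to a 32-bit word needs eight cells and one more rank of adders.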
18 Solution 2 Shift Register and adder
- This will demonstrate the ASM method quite nicely. Note that the two solutions trade space and time. The pure combinational approach is fastest but largest.
- We will use a shift register and shift each bit out in turn; if it is a 1 we will increment a counter.
- As is often the case, we need to know when to stop. This could be done by having a loop counter keep track of how many shifts we had done; beginners usually set up a counter to go from zero (or 1) to 16. This may be out by one, and a comparator is needed.
- Experienced ASM designers (and programmers) preload a counter with 15 and decrement to zero, or find an alternative.
- Here we will use a clever trick to save time. By shifting zeros into our word as we shift our data out, we can test for all zeros to exit our loop.
- In the case where there are few ones this may give an impressive speed advantage, at the disadvantage that the execution time of our machine varies according to the input data, which is not always allowed.
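The two stopping rules can be compared with a short Python sketch. Both functions are behavioural illustrations with made-up names; they shift the least significant bit out first, which is an assumption since the slides do not say which end is shifted out.

```python
def count_ones_fixed(word, width=16):
    """Loop counter preloaded with width-1 and decremented to zero."""
    loop = width - 1
    count = shifts = 0
    while True:
        count += word & 1        # bit shifted out this step
        word >>= 1
        shifts += 1
        if loop == 0:            # test for zero needs no comparator
            break
        loop -= 1
    return count, shifts


def count_ones_early_exit(word):
    """Shift zeros in and stop as soon as the register is all zeros."""
    count = shifts = 0
    while word != 0:             # status signal: register is all zeros
        count += word & 1
        word >>= 1               # a zero enters at the top
        shifts += 1
    return count, shifts
```

For the word 0x0003 the fixed version always takes 16 shifts, while the early-exit version stops after 2; for 0x8000 both take 16. That is the data-dependent execution time mentioned above.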
Initial sketch of Data Processor
19 Solution 2 Shift Register and adder
20 Solution 2 Shift Register and adder
The one-hot equations for this machine are as follows:
T0.d = T0./S + T1.Z
T1.d = T3.E + T0.S
T2.d = T1./Z + T3./E
T3.d = T2 (this causes a one clock delay between altering E and testing E)
Also the control signals are: LOAD = T0.S, COUNT = T1, SHIFT = T2
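A minimal clock-by-clock sketch of this machine in Python, reading each equation as a one-hot sum of products. The data processor is not fully specified in the text, so the following are assumptions: R1 is the shift register, E is a flip-flop that catches the bit shifted out, R2 is the counter, LOAD loads R1 and presets R2 to all ones (so the first COUNT brings it to zero), and the least significant bit is shifted out first.

```python
def count_ones_asm(word, width=16):
    """Return (count of ones, clocks used) for the one-hot controller."""
    cmask = (1 << width.bit_length()) - 1   # 5-bit counter for a 16-bit word
    t0, t1, t2, t3 = 1, 0, 0, 0             # one-hot state flip-flops
    r1 = r2 = e = 0
    s = 1                                   # start asserted for the first clock
    clocks = 0
    while True:
        z = int(r1 == 0)                    # status: shift register all zeros
        load, count, shift = t0 & s, t1, t2 # control signals
        n0 = (t0 & (1 - s)) | (t1 & z)      # T0.d
        n1 = (t3 & e) | (t0 & s)            # T1.d
        n2 = (t1 & (1 - z)) | (t3 & (1 - e))  # T2.d
        n3 = t2                             # T3.d
        # clock edge: data processor and controller update together
        if load:
            r1 = word & ((1 << width) - 1)
            r2 = cmask                      # assumed preset to all ones
        if count:
            r2 = (r2 + 1) & cmask
        if shift:
            e = r1 & 1                      # bit shifted out into E
            r1 >>= 1                        # zero shifted in
        t0, t1, t2, t3 = n0, n1, n2, n3
        s = 0
        clocks += 1
        if t0:                              # back in the idle state: finished
            return r2, clocks
```

Calling count_ones_asm(0xFFFF) returns a count of 16, and a word with few ones near the shifted-out end finishes in far fewer clocks than one whose only 1 is at the far end. Passing width=32 models the 32-bit variant.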
21 Try the tut questions!
See the file ASMTUTS.pdf on the website.
The only trick to some of them is the use of a pipeline, a line of registers to allow access to older data.
I'll do a DSP pipeline design on the board; it's not hard. Remember real ADCs will need to be given an SC control signal and will return an EOC status signal. These stand for START_CONVERSION and END_OF_CONVERSION.