Title: Lava II
1. Lava II
- Mary Sheeran, Koen Claessen
- Chalmers University of Technology
- Satnam Singh, Xilinx
2. Style hints
    mycct a b = outs
      where ... and2 (a,b) ...
This is perfectly correct, but it means I have to go looking among the
parameters to see which are the circuit inputs.
GATHER them into a single structure, which should be the last (rightmost)
input. Then I can easily tell what the interface of the circuit is.
    mycct (a,b) = outs
      where ...
3. Style hints
More generally, have circuit parameters as separate inputs, followed by all
circuit inputs in one structure (tuple or list, possibly nested).
    cctName p1 p2 (a,bs) = (ds,e)
      where ...
Here (a,bs) is the input. The parameters p1 and p2 are used to control
generation: for example, an integer to control size, or constants for use in
the circuit (see next example). Usually have zero or one parameters.
4. Register
    reg init (w,din) = dout
      where
        dout = delay init m
        m    = mux (w,(dout,din))
5. Register
In reg init (w,din), init is a parameter.
6. Register
    Main> reg low (high,high)
    orl[andl[high,high],andl[inv[high],delay[low,orl[andl[high,high],
    andl[inv[high],delay[low,orl[andl[high,high],andl[inv[high],delay[low,
    orl[andl[high,high],andl[inv[high],delay[low, ... {Interrupted!}
7. Register
This is why we have a two-stage process. First make an internal
representation in a data type and THEN simulate or generate formats.
    Main> simulateSeq (reg low) [(high,low),(low,high),(high,high),(low,high)]
    [low,low,low,high]
Here (reg low) is the circuit.
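The simulation result on this slide can be reproduced by hand with a small model. The following is only a sketch in plain Haskell, not the Lava library: a signal is modelled as a list of Bool (one value per clock cycle), and the names muxModel and regModel are made up for this illustration. It assumes that mux passes the second data input when the control is high, which is what the simulateSeq output suggests.

```haskell
-- Plain-Haskell model of the register; this is NOT the Lava library.
-- A signal is modelled as a list of Bool values, one per clock cycle.

-- Multiplexer, ASSUMED to select the second data input when sel is high.
muxModel :: (Bool, (Bool, Bool)) -> Bool
muxModel (sel, (x, y)) = if sel then y else x

-- regModel initVal inputs: keep the stored value, or load din when w is high.
regModel :: Bool -> [(Bool, Bool)] -> [Bool]
regModel initVal inputs = take (length inputs) douts
  where
    douts = initVal : ms                 -- dout = delay init m
    ms    = zipWith step inputs douts    -- m    = mux (w,(dout,din))
    step (w, din) dout = muxModel (w, (dout, din))
```

With low as False and high as True, regModel False [(True,False),(False,True),(True,True),(False,True)] gives [False,False,False,True], matching the simulateSeq output. The feedback loop is expressed as two mutually recursive lazy lists, which only works because the delay puts the initial value in front before any of the loop is inspected.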
8. Sequential circuits
MUST be simulated using simulateSeq
    Main> simulate (reg low) (high,high)
    Program error: evaluating a delay component
9. Working on lists
[Figure: parl f g; labels: f, g]
    parl f g = halveList ->- (f -|- g) ->- append
10. two f
[Figure: two f; labels: f, f]
11. two (two f)
12. Many twos
    twoN 0 circ = circ
    twoN n circ = two (twoN (n-1) circ)
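To see what parl, two and twoN do to a list, here is a sketch in plain Haskell where a "circuit" is just a function on lists. This is not the Lava library: serial composition and the parallel operator are replaced by ordinary function application and ++, and the type signatures are assumptions made for the model.

```haskell
-- Plain-Haskell model: circuits are functions on lists; NOT the Lava library.

-- Split a list into two halves (a balanced split needs an even length).
halveList :: [a] -> ([a], [a])
halveList xs = splitAt (length xs `div` 2) xs

-- parl f g: f works on the first half, g on the second half.
parl :: ([a] -> [b]) -> ([a] -> [b]) -> [a] -> [b]
parl f g xs = f as ++ g bs
  where (as, bs) = halveList xs

-- two f: the same circuit on both halves.
two :: ([a] -> [b]) -> [a] -> [b]
two f = parl f f

-- twoN n circ: 2^n copies of circ side by side.
twoN :: Int -> ([a] -> [a]) -> [a] -> [a]
twoN 0 circ = circ
twoN n circ = two (twoN (n - 1) circ)
```

For example, twoN 2 reverse [1..8] applies reverse to four pieces of length two and gives [2,1,4,3,6,5,8,7].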
13. Interleave
[Figure: ilv f; labels: f, f]
    ilv f = unriffle ->- two f ->- riffle
14. Many interleaves
    ilv (ilv (ilv C))
15. Many interleaves
    ilvN 0 circ = circ
    ilvN n circ = ilv (ilvN (n-1) circ)
16. Wiring
[Figure: wiring cells; labels: id2, swap]
17. Butterfly
[Figure: bfly circ]
18. Defining Butterfly
    bfly 0 circ = id
    bfly n circ = ilvN (n-1) circ ->- two (bfly (n-1) circ)
19. Defining Butterfly
In bfly n circ, bfly is a connection pattern, n is a parameter and circ is a
circuit.
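The interleave and butterfly patterns can be tried out on ordinary lists. The sketch below is plain Haskell, not Lava: riffle and unriffle are written out by hand as list permutations (an assumption about their behaviour, consistent with ilv applying f to the even-indexed and odd-indexed elements separately), and sort2 is a made-up two-input comparator used only for the demonstration.

```haskell
-- Plain-Haskell model of the connection patterns; NOT the Lava library.

halve :: [a] -> ([a], [a])
halve xs = splitAt (length xs `div` 2) xs

two :: ([a] -> [b]) -> [a] -> [b]
two f xs = f as ++ f bs
  where (as, bs) = halve xs

-- unriffle: even-indexed elements first, then the odd-indexed ones.
unriffle :: [a] -> [a]
unriffle xs = evens xs ++ odds xs
  where
    evens (y : ys) = y : odds ys
    evens []       = []
    odds (_ : ys)  = evens ys
    odds []        = []

-- riffle: interleave the two halves (undoes unriffle on even lengths).
riffle :: [a] -> [a]
riffle xs = concat (zipWith (\a b -> [a, b]) as bs)
  where (as, bs) = halve xs

ilv :: ([a] -> [a]) -> [a] -> [a]
ilv f = riffle . two f . unriffle

ilvN :: Int -> ([a] -> [a]) -> [a] -> [a]
ilvN 0 circ = circ
ilvN n circ = ilv (ilvN (n - 1) circ)

-- bfly n circ works on lists of length 2^n when circ has two inputs.
bfly :: Int -> ([a] -> [a]) -> [a] -> [a]
bfly 0 _    = id
bfly n circ = two (bfly (n - 1) circ) . ilvN (n - 1) circ

-- Hypothetical two-input comparator, for demonstration only.
sort2 :: Ord a => [a] -> [a]
sort2 [x, y] = [min x y, max x y]
sort2 xs     = xs
```

In this model ilvN (n-1) sort2 compares element i with element i + 2^(n-1), so bfly 2 sort2 [1,3,4,2] gives [1,2,3,4]: the input rises and then falls, and the butterfly of comparators merges it into sorted order.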
20. Another style (matter of taste)
    bflly 0 circ as = as
    bflly n circ as = os
      where
        bs = ilvN (n-1) circ as
        os = two (bflly (n-1) circ) bs
21. Butterfly Layout on an FPGA
22. Mergers and sorters
- Can be made recursively from a butterfly of two-input, two-output
  comparators on (say) binary or complex numbers, or even on bit-serial
  numbers. (Batcher's bitonic sorter)
- Such a sorting network is correct if it sorts BITs (theorem known as the
  0-1 principle)
- Means we can plug in bit-sorters and check the property that the output is
  always sorted using a SAT-solver or SMV. (Another example of a
  non-standard component, and of squeezing a difficult problem (integer
  sorting) into an easier one (bit sorting))
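The bit-sorting check can be imitated by brute force. The sketch below is plain Haskell, not Lava, and it enumerates inputs rather than calling a SAT-solver or SMV. The names cmp, merger, sorter and sortsAllBits are made up for this illustration, and the merger is the butterfly of comparators written directly as a recursion instead of through ilvN. It assumes input lengths that are powers of two.

```haskell
import Control.Monad (replicateM)

-- Plain-Haskell model; NOT the Lava library and NOT a SAT/SMV proof.

-- Two-input, two-output comparator (on Bool this is an and/or gate pair).
type Comparator a = (a, a) -> (a, a)

cmp :: Ord a => Comparator a
cmp (x, y) = (min x y, max x y)

halve :: [a] -> ([a], [a])
halve xs = splitAt (length xs `div` 2) xs

-- Bitonic merger: compare element i with element i + n/2, then recurse.
merger :: Comparator a -> [a] -> [a]
merger _ []  = []
merger _ [x] = [x]
merger c xs  = merger c los ++ merger c his
  where
    (as, bs)   = halve xs
    (los, his) = unzip (zipWith (curry c) as bs)

-- Sorter: sort both halves, reverse one, merge the bitonic result.
sorter :: Comparator a -> [a] -> [a]
sorter _ []  = []
sorter _ [x] = [x]
sorter c xs  = merger c (sorter c as ++ reverse (sorter c bs))
  where (as, bs) = halve xs

isSorted :: Ord a => [a] -> Bool
isSorted ys = and (zipWith (<=) ys (drop 1 ys))

-- Exhaustive check over every bit vector of length 2^n.
sortsAllBits :: Int -> Bool
sortsAllBits n = all (isSorted . sorter cmp) (replicateM (2 ^ n) [False, True])
```

Here sortsAllBits 3 checks 256 inputs of length 8. By the 0-1 principle, passing that check is enough to conclude that the same network sorts 8 integers, which is the point of plugging in bit-sorters.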
23. Note
- Could be viewed as Lustre (or similar) embedded in Haskell
- Generic circuits and connection patterns are easy to describe (the power
  of Haskell)
- Verify FIXED SIZE circuits (squeezing the problem down into an easy enough
  one)
24. Another example
25. Multiplication
           11010
         x 01001
         -------
           11010
          00000
         00000
        11010
       00000
      ----------
      0011101010
26. Multiplication
    msb  1 1 0 1 0
         0 0 0 0 0
         0 0 0 0 0
         1 1 0 1 0
         0 0 0 0 0
27. Multiplication
    lsb  0 1 0 1 1
         0 0 0 0 0
         0 0 0 0 0
         0 1 0 1 1
         0 0 0 0 0
28. Structure of multiplier
29.
    multBin comps (as,bs) = p1:ss
      where
        ([p1]:[p2,p3]:ps) = prods_by_weight (as,bs)
        is                = redArray comps ps
        ss                = binaryAdder ([p2,p3]:is)

    redArray comps ps = is
      where
        (is,[]) = row (compress comps) ([],ps)
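The slides do not show prods_by_weight itself. As a sketch of what such a function could look like, assuming bits are given least significant first and that the result is one list of partial-product bits per weight, the plain-Haskell version below groups a_i AND b_j by i + j. The names prodsByWeight, toBits and columnsValue are made up for this illustration and work on Bool rather than on Lava signals.

```haskell
-- Plain-Haskell sketch; an ASSUMED reading of prods_by_weight, not Lava code.
-- Bits are least-significant first.

-- Column w holds every partial product a_i && b_j with i + j == w.
prodsByWeight :: ([Bool], [Bool]) -> [[Bool]]
prodsByWeight (as, bs) =
  [ [ a && b | (i, a) <- zip [0 ..] as, (j, b) <- zip [0 ..] bs, i + j == w ]
  | w <- [0 .. length as + length bs - 2] ]

-- The n low bits of x, least-significant first.
toBits :: Int -> Int -> [Bool]
toBits n x = [ odd (x `div` 2 ^ k) | k <- [0 .. n - 1] ]

-- Value represented by the columns: each set bit in column w counts 2^w.
columnsValue :: [[Bool]] -> Int
columnsValue cols =
  sum [ 2 ^ w * length (filter id col) | (w, col) <- zip [0 :: Int ..] cols ]
```

For the worked example 11010 x 01001 (26 x 9), prodsByWeight (toBits 5 26, toBits 5 9) has column heights 1,2,3,4,5,4,3,2,1 and columnsValue gives 234, which is 0011101010 in binary. The pattern ([p1]:[p2,p3]:ps) in multBin matches the first two of these columns, whose heights are 1 and 2.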
30. Reduction tree for multiplier
[Figure: reduction tree; labels: 5, 4, 4, 3, 3, carries, 2, Fast Adder]
31.
- Will concentrate on the reduction tree (a row of compress cells)
- Partial products are generated using and gates. May also include recoding
  to reduce the size of the tree (cf. Booth)
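32. Compress (diff = 2)
[Figure; labels: n-2, 2]
33.
[Figure; labels: n, weight w, weight w+1, n-1]
34. diff > 2, diff < 2
[Figure; labels: k, k, wcell, hcell, k+2, k-1]
35.
[Figure; labels: weight w, weight w+1, n-1]
36.
[Figure; labels: n, weight w, n+1]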
37.
    compress comps (as,bs)
      | diff > 2  = (compress comps - hcell comps) (as,bs)
      | diff == 2 = column (fcell comps) (as,bs)
      | diff < 2  = (compress comps - wcell comps) (as,bs)
      where diff = length bs - length as
39. Possible fcell
[Figure: fcell; labels: c, fullAdd, s]
halfAdd cells are similar. Gives the standard array multiplier. Not great!
40. Only need to vary wiring! Make it explicit
[Figure; labels: iC, s3, cc, iS]
41. Dadda-like
[Figure: fcell; labels: c, fullAdd, s]
    toEnd (a,as) = as ++ [a]
Excellent log-depth reduction tree, but known for irregularity and difficult
layout
42. Picture by Henrik Eriksson, Chalmers
43. Regular reduction tree (Eriksson et al. CE)
[Figure: fcell; labels: c, fullAdd, s]
    toEnd (a,as) = as ++ [a]
Nowhere near as good as Dadda, but inspired this work
44. Picture by Henrik Eriksson, CE
45. Back to Dadda
[Figure: fcell; labels: c, fullAdd, s]
    toEnd (a,as) = as ++ [a]
Excellent log-depth reduction tree, but known for irregularity and difficult
layout
46. Simple delay analysis (again)
    fullAddL [a,b,cc] = [s,c]
      where (s,c) = fullAdd (a,(b,cc))

    fAddI (a1s, a2s, a3s, a1c, a2c, a3c) [a1,a2,a3] = [s,cout]
      where
        s    = max (a1s+a1) (max (a2s+a2) (a3s+a3))
        cout = max (a1c+a1) (max (a2c+a2) (a3c+a3))

    fI :: [Signal Int] -> [Signal Int]
    fI as = fAddI (20,20,10,10,10,10) as
(Have changed the full-adder interface to be list to list. Was handier in
this example.)
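A few numbers make the delay model concrete. The sketch below redoes fAddI on plain Int arrival times instead of Signal Int, so it runs without Lava; the name fAddDelay and the Delay synonym are made up for this illustration, and the fallback error case is an addition for totality.

```haskell
-- Plain-Int model of the delay analysis; the slides use Signal Int in Lava.

type Delay = Int

-- Parameters are the input-to-output delays:
-- (a1->s, a2->s, a3->s, a1->cout, a2->cout, a3->cout).
-- The list holds the arrival times of the three inputs.
fAddDelay :: (Delay, Delay, Delay, Delay, Delay, Delay) -> [Delay] -> [Delay]
fAddDelay (a1s, a2s, a3s, a1c, a2c, a3c) [a1, a2, a3] = [s, cout]
  where
    s    = maximum [a1s + a1, a2s + a2, a3s + a3]
    cout = maximum [a1c + a1, a2c + a2, a3c + a3]
fAddDelay _ _ = error "fAddDelay: expected exactly three arrival times"

-- Same delay figures as fI on the slide.
fI :: [Delay] -> [Delay]
fI = fAddDelay (20, 20, 10, 10, 10, 10)
```

With all inputs arriving at time 0, fI [0,0,0] gives [20,10]. A signal arriving at time 20 on the third input gives fI [0,0,20] = [30,30], while the same late signal on the first input gives fI [20,0,0] = [40,30]. Which input a late signal is wired to therefore changes the sum delay, and this is the kind of difference the wiring cells let us exploit.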
47. Checking gate delay
    dDadG n = simulate (redArray (hI,fI,toEnd,toEnd,id,splitAt 2,splitAt 3))
                       (ppzs n)
The argument of redArray is comps, a tuple of building blocks:
- hI, fI: gate delay models
- toEnd, toEnd, id, splitAt 2, splitAt 3: wiring cells (allow later
  inclusion of wiring delay, in next lecture)
(will return to splitAt shortly)
48. Checking gate delay (as before)
    Main> dDadG 16
    0,10,5,20,20,30,30,40,40,50,50,50,50,
    60,60,70,70,70,70,70,70,80,70,80,80,90,
    90,90,90,90,90,90,90,90,90,90,
    80,90,80,80,70,80,70,80,70,70,60,70,60,
    60,50,60,50,50,40,20,0,20
49. Checking gate delay (as before)
    Main> dDadG 54
    0,10,5,20,20,30,30,40,40,50,50,50,50,
    60,60,70,70,70,70,70,70,80,70,80,80,90,
    90,90,90,90,90,90,90,100,90,100,90,100,
    100,110,110,110,110,110,110,110,110,110,
    110,110,110,120,110,120,110,120,110,120,
    120,120,120,130,130,130,130,130,130,130,
    130,130,130,130,130,130,130,130,130,130,
    130,140,130,140,130,140,130,140,130,140,
    140,140,140,140,140,150,150,150,150,150,
    150,150,150,150,150,150,150,150,150,150,
    150,150,150,150,150,150,150,150,150,150,
    150,150,140,140,140,140,140,140,140,140,
    140,140,130,140,130,140,130,140,130,140,
    130,140,130,130,130,130,130,130,130,130,
    130,130,130,130,120,120,120,120,120,120,
    120,120,110,120,110,120,110,120,110,110,
    110,110,110,110,110,110,100,100,100,100,
    100,100,90,100,90,100,90,90,90,90,80,90,
    80,80,70,80,70,80,70,70,60,70,60,60,50,
    60,50,50,40,20,0,20
50. Verifying the multiplier
    multDadda (as,bs) = ps
      where
        ps = multBin (halfAddL,fullAddL,toEnd,toEnd,id,splitAt 2,splitAt 3)
                     (as,bs)

    prop_Equivalent circ1 circ2 a = ok
      where
        out1 = circ1 a
        out2 = circ2 a
        ok   = out1 <==> out2
51. Use of predefined Haskell functions
splitAt is a library function from the standard prelude. See
http://www.haskell.org/definition/haskell98-report.pdf
Reading the standard prelude is a good way to
learn! Saves you from reinventing commonly used
functions (for example on lists). Your code gets
shorter and easier for me to read. (Starting from
scratch will not be penalised, if correct!)
52. An ordinary Haskell function
    Main> :t splitAt
    splitAt :: Int -> [a] -> ([a],[a])
    Main> splitAt 7 [1..10]
    ([1,2,3,4,5,6,7],[8,9,10])
    Main> splitAt 7 [1..3]
    ([1,2,3],[])
    Main> splitAt 2 [1..10]
    ([1,2],[3,4,5,6,7,8,9,10])
53. Verifying the multiplier
    Main> smv (prop_Equivalent multi multDadda)
    ERROR - Unresolved overloading
    Type: Fresh Signal Bool => IO ProofResult
    Expression: smv (prop_Equivalent multi multDadda)
Here multi is the built-in multiplier.
- Doesn't work because we have NOT FIXED the SIZE of the inputs
54.
    prop_mults mymult n =
      forAll (list n) $ \as ->
      forAll (list n) $ \bs ->
        prop_Equivalent multi mymult (as,bs)
OR
    prop_mults mymult n =
      forAll (list n) $ \as ->
      forAll (list n) $ \bs ->
        multi (as,bs) <==> mymult (as,bs)
Now smv (prop_mults multDadda 8) goes through in less than half a second.
But size 16 doesn't. Why? See section 4.2 of the Lava tutorial (replace
verify by smv)
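To get a feel for what fixing the size buys, here is a brute-force stand-in for forAll (list n) in plain Haskell. It is only an analogy: smv works symbolically, while this sketch enumerates every input. None of the names below come from Lava. multRef plays the role of the built-in multi, multShiftAdd is a made-up bit-level multiplier to compare against it, and bits are least significant first.

```haskell
import Control.Monad (replicateM)

-- Brute-force analogy for forAll (list n) + smv; NOT the Lava library.
type Bit = Bool

fromBits :: [Bit] -> Integer            -- least-significant bit first
fromBits = foldr (\b acc -> (if b then 1 else 0) + 2 * acc) 0

toBits :: Int -> Integer -> [Bit]
toBits n x = [ odd (x `div` 2 ^ k) | k <- [0 .. n - 1] ]

-- Reference multiplier, standing in for the built-in multi.
multRef :: ([Bit], [Bit]) -> [Bit]
multRef (as, bs) = toBits (length as + length bs) (fromBits as * fromBits bs)

fullAdd :: (Bit, (Bit, Bit)) -> (Bit, Bit)
fullAdd (a, (b, c)) = ((a /= b) /= c, (a && b) || (c && (a /= b)))

-- Ripple-carry addition of equal-length words; the final carry is dropped.
addBits :: [Bit] -> [Bit] -> [Bit]
addBits xs ys = go False (zip xs ys)
  where
    go _ []              = []
    go c ((x, y) : rest) = let (s, c') = fullAdd (x, (y, c)) in s : go c' rest

-- Hypothetical shift-and-add multiplier built from the adder above.
multShiftAdd :: ([Bit], [Bit]) -> [Bit]
multShiftAdd (as, bs) = foldl addBits zeros rows
  where
    width = length as + length bs
    zeros = replicate width False
    rows  = [ take width (replicate j False ++ map (&& b) as ++ repeat False)
            | (j, b) <- zip [0 ..] bs ]

propEquivalent :: Eq b => (a -> b) -> (a -> b) -> a -> Bool
propEquivalent circ1 circ2 a = circ1 a == circ2 a

-- Fixing n makes the input space finite: 2^n * 2^n input pairs.
propMults :: (([Bit], [Bit]) -> [Bit]) -> Int -> Bool
propMults mymult n =
  and [ propEquivalent multRef mymult (as, bs)
      | as <- replicateM n [False, True]
      , bs <- replicateM n [False, True] ]
```

Without the size argument there is nothing to enumerate, which mirrors the unresolved overloading error above. With it, propMults multShiftAdd 4 checks 256 input pairs; each extra bit of width multiplies that count by four.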
55. The cool thing
- The same description with just some different wiring cells gives a GREAT
  VARIETY of different multipliers
- One begins to see some order in the chaos...
- The key point was finding the right connection pattern
- Ideally, one would like to prove this extremely generic description
  correct! Open research question....
57. Note
- Layout for the Dadda-like tree is no more difficult than for any of the
  others. Important in practice!
- We call it the High Performance Multiplier reduction tree (Henrik, Per,
  Mary)
- Henrik Eriksson, CE, had the first idea, and then my mult. descriptions
  suggested something similar. This led to a layout strategy, which Henrik
  followed.
- Next step is to generate layout from Wired (wire-aware version of Lava)
58. Promising, but we can do better!
- Next lecture: circuits that adapt to their surroundings