Title: Introduction to Silicon Programming in the Tangram/Haste language
1 Introduction to Silicon Programming in the Tangram/Haste language
- Material adapted from lectures by
- Prof.dr.ir. Kees van Berkel
- Dr. Johan Lukkien
- Dr.ir. Ad Peeters
- at the Technical University of Eindhoven, the Netherlands
2 Handshake signaling and data
push channel versus pull channel
[Figure: push channel and pull channel, with the request wire labelled ar]
3 Handshake signaling: push channel
[Timing diagram: request (req) ar, acknowledge (ack) ak, and data ad under the early, broad, and late data-validity schemes]
4 Data bundling
- To maintain event ordering at both sides of a channel, the circuit must satisfy the data bundling constraint:
- for a push channel, the delay along the request wire must exceed the delay of the data wires;
- for a pull channel, the delay along the acknowledge wire must exceed the delay of the data wires.
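The constraint amounts to a simple inequality between wire delays. The following is a minimal sketch in Python, not Tangram; the function name, the "push"/"pull" labels, and the delay parameters are hypothetical and only illustrate the rule stated above.

```python
def bundling_ok(channel_kind, req_delay, ack_delay, data_delay):
    """Return True if the data bundling constraint holds.

    channel_kind is "push" or "pull" (labels chosen for this sketch).
    All delays use the same arbitrary time unit.
    """
    if channel_kind == "push":
        # Data travels with the request, so the request must arrive later.
        return req_delay > data_delay
    if channel_kind == "pull":
        # Data travels with the acknowledge, so the acknowledge must arrive later.
        return ack_delay > data_delay
    raise ValueError("channel_kind must be 'push' or 'pull'")
```

For example, `bundling_ok("push", 3, 1, 2)` holds because the request (3) is slower than the data (2).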
5 Handshake signaling: pull channel
- While the data wires are invalid, multiple and incomplete transitions are allowed.
[Timing diagram: request (req) ar, acknowledge (ack) ak, and data ad under the early, broad, and late data-validity schemes]
6 Tangram assignment x := f(y,z)
[Figure: handshake circuit]
7 Four-phase data transfer
[Figure: four-phase data transfer in five numbered steps (1 to 5), with event labels ?r / br, ba / cr, ca / ?a and data labels bd / cd]
8 Handshake latch
- ? w?? w?? rd wd r??
r?? - 1-bit handshake latch wd ? wr ? rd ?
?wd ? wr ? rd ? wk wr rk
rr
9 N-bit handshake latch
- area, delay, energy
- area: 2(N+1) gate equivalents
- delay per cycle: 4 gate delays
- energy per write cycle: 4 + 0.5*2N transitions, on average
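The three figures can be tabulated for a given width N. The sketch below is a Python illustration that assumes the readings 2(N+1) for the area and 4 + 0.5*2N for the average energy; the plus signs are an assumption, since the operators do not survive in this text. The function and key names are hypothetical.

```python
def latch_costs(n_bits):
    """Cost figures for an N-bit handshake latch, under the assumed formulas."""
    return {
        # Assumed reading: 2(N+1) gate equivalents.
        "area_gate_eqs": 2 * (n_bits + 1),
        # Four gate delays per cycle, independent of N.
        "delay_gate_delays": 4,
        # Assumed reading: 4 + 0.5 * 2N transitions per write, on average.
        "avg_transitions_per_write": 4 + 0.5 * 2 * n_bits,
    }
```

Under these assumptions an 8-bit latch costs 18 gate equivalents and averages 12 transitions per write.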
10 Transferrer
- ? a?? (b?? c??) a??
(b?? cd bd c?? cd ?)
11 Multiplexer
- ? a?? c?? a?? (cd ad c?? cd ?) b?? c?? b?? (cd bd c?? cd ?)
- Restriction ?ar ? ?br must hold at all times!
12 Multiplexer realization
[Figure: control circuit and data circuit]
13 Logic/arithmetic operator
- ? a?? (b?? c??) a?? ((b?? c??) ad f(bd , cd ))
- Cheaper realization (delay sensitive)
- ? a?? (b?? c??) a?? ((b?? c??) ad f(bd , cd )) delay ad ?
14 A one-place fifo buffer
byte = type [0..255]
& BUF1 = main proc(a?chan byte & b!chan byte).
begin x: var byte
| forever do a?x ; b!x od
end
15 A one-place fifo buffer
[Figure: handshake circuit of BUF1, with channels a and b and variable x]
16 2-place buffer
- byte = type [0..255]
- & BUF1 = proc (a?chan byte & b!chan byte).
  begin x: var byte | forever do a?x ; b!x od end
- & BUF2 = main proc (a?chan byte & c!chan byte).
  begin b: chan byte | BUF1(a,b) || BUF1(b,c) end
17 Two-place ripple buffer
18 Two-place wagging buffer
byte = type [0..255]
& wag2 = main proc(a?chan byte & b!chan byte).
begin x,y: var byte
| a?x ; forever do (a?y || b!x) ; (a?x || b!y) od
end
19 Two-place ripple register
begin x0, x1: var byte
| forever do b!x1 ; x1:=x0 ; a?x0 od
end
20 4-place ripple register
- byte = type [0..255]
  & rip4 = main proc (a?chan byte & b!chan byte).
  begin x0, x1, x2, x3: var byte
  | forever do b!x3 ; x3:=x2 ; x2:=x1 ; x1:=x0 ; a?x0 od
  end
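The behaviour of rip4 can be checked with a small sequential model. This is a Python sketch rather than Tangram: it assumes the five steps of the loop body run in sequence (output x3, three shifts, input x0), and it assumes the variables start at 0, which the program itself does not specify.

```python
def rip4(inputs, initial=(0, 0, 0, 0)):
    """Sequential model of one rip4 loop iteration per input value."""
    x0, x1, x2, x3 = initial  # assumed initial contents
    outputs = []
    for value in inputs:
        outputs.append(x3)  # b!x3
        x3 = x2             # x3 := x2
        x2 = x1             # x2 := x1
        x1 = x0             # x1 := x0
        x0 = value          # a?x0
    return outputs
```

In this model every input reappears on the output four iterations later, so the first four outputs are the assumed initial contents.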
21 4-place ripple register
- area: N (Avar + Aseq)
- cycle time: Tc = (N+1) T
- cycle energy: Ec = N E
22 Introducing vacancies
- begin x0, x1, x2, x3, v: var byte
  | forever do (b!x3 ; x3:=x2 ; x2:=v) || (v:=x1 ; x1:=x0 ; a?x0) od
  end
- what is wrong?
23 Introducing vacancies
- forever do ((b!x3 ; x3:=x2) || (v:=x1 ; x1:=x0 ; a?x0)) ; x2:=v od
- or
- forever do ((b!x3 ; x3:=x2) || (v:=x1 ; x1:=x0)) ; (x2:=v || a?x0) od
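One way to compare the first attempt with the two revised loops is to look for variables that one parallel branch writes while the other branch reads or writes them. The sketch below is Python, not Tangram, and rests on assumptions: the two parenthesized groups are taken to be composed in parallel (the operator symbols do not survive in this text), and the read and write sets are derived by hand from that reading. It is offered as one plausible answer to "what is wrong?", not as the lecturers' own.

```python
def conflicts(branch_a, branch_b):
    """Variables written by one branch and read or written by the other.

    Each branch is a (reads, writes) pair of sets of variable names.
    """
    reads_a, writes_a = branch_a
    reads_b, writes_b = branch_b
    return (writes_a & (reads_b | writes_b)) | (writes_b & (reads_a | writes_a))


# First attempt: (b!x3 ; x3:=x2 ; x2:=v) in parallel with (v:=x1 ; x1:=x0 ; a?x0)
first_left = ({"x3", "x2", "v"}, {"x3", "x2"})
first_right = ({"x1", "x0"}, {"v", "x1", "x0"})

# Revised loop: (b!x3 ; x3:=x2) in parallel with (v:=x1 ; x1:=x0 ; a?x0),
# with x2:=v executed afterwards.
second_left = ({"x3", "x2"}, {"x3"})
second_right = ({"x1", "x0"}, {"v", "x1", "x0"})

first_conflicts = conflicts(first_left, first_right)     # contains "v"
second_conflicts = conflicts(second_left, second_right)  # empty
```

Under this reading the first attempt reads v in one branch while the other branch writes it, and moving x2:=v after the parallel part removes the overlap.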
24 Synchronous 4-place ripple register
- forever do (s0:=m0 || s1:=m1 || s2:=m2 || b!m3) ; (a?m0 || m1:=s0 || m2:=s1 || m3:=s2) od
25 4-place wagging register
- forever do b!x1 ; x1:=x0 ; a?x0 ; b!y1 ; y1:=y0 ; a?y0 od
26 8-place register
- 4-way wagging
- forever do b!u1 ; u1:=u0 ; a?u0 ; b!v1 ; v1:=v0 ; a?v0 ; b!x1 ; x1:=x0 ; a?x0 ; b!y1 ; y1:=y0 ; a?y0 od
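The 4-place and 8-place wagging registers follow the same pattern, so both can be modelled by one function with the number of ways as a parameter. This Python sketch assumes each way is a two-place ripple register visited in round-robin order, and that all variables start at 0 (the programs do not say). The names `wagging_register`, `ways`, and `lanes` are hypothetical.

```python
def wagging_register(inputs, ways, initial=0):
    """Sequential model of a k-way wagging register with two places per way."""
    # Each lane holds [first, second], playing the roles of x0 and x1.
    lanes = [[initial, initial] for _ in range(ways)]
    outputs = []
    for i, value in enumerate(inputs):
        lane = lanes[i % ways]   # ways are served in round-robin order
        outputs.append(lane[1])  # b!x1
        lane[1] = lane[0]        # x1 := x0
        lane[0] = value          # a?x0
    return outputs
```

With `ways=2` the model delays its input by four positions and with `ways=4` by eight, while each value is moved only once inside its lane.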
27 Four 8×8 shift registers compared
28 Tangram/Haste
- Purpose: programming language for asynchronous VLSI circuits.
- Creator: Tangram team at Philips Research Labs (proto-Tangram 1986; release 2 in 1998).
- Inspiration: Hoare's CSP, Dijkstra's GCL.
- Lectures: no formal introduction; manual hand-out (learn by example, learn by doing).
- Main tools: compiler, analyzer, simulator, viewer.
29 2-place buffer
- byte = type [0..255]
- & BUF1 = proc (a?chan byte & b!chan byte).
  begin x: var byte | forever do a?x ; b!x od end
- & BUF2 = main proc (a?chan byte & c!chan byte).
  begin b: chan byte | BUF1(a,b) || BUF1(b,c) end
30 Median filter
- median = main proc (a? chan W & b! chan W).
  begin x,y,z: var W & xy, yz, zx: var bool
  | forever do
      ((z:=y ; y:=x) || yz:=xy)
    ; a?x
    ; (xy := x<y || zx := z<x)
    ; if zx=xy then b!x or xy=yz then b!y or yz=zx then b!z fi
    od
  end
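The selection logic is easier to follow when run on numbers. The following Python sketch models the filter under stated assumptions: the three booleans hold x<y, y<z, and z<x, the guards test equality of two booleans (the comparison and assignment symbols do not survive in this text), the window starts at 0, and when several guards hold the first one is taken. None of this is confirmed by the original program text.

```python
def median_filter(samples, initial=0):
    """Median of the last three samples, using three one-bit comparisons."""
    x = y = z = initial          # assumed initial window contents
    xy = yz = zx = False         # xy: x<y, yz: y<z, zx: z<x
    outputs = []
    for sample in samples:
        z, y = y, x              # z := y ; y := x
        yz = xy                  # the old x<y is the new y<z
        x = sample               # a?x
        xy = x < y
        zx = z < x
        if zx == xy:             # z<x<y or y<=x<=z: x is the median
            outputs.append(x)
        elif xy == yz:           # x<y<z or z<=y<=x: y is the median
            outputs.append(y)
        else:                    # remaining case, yz == zx: z is the median
            outputs.append(z)
    return outputs
```

Only two new comparisons are made per sample; the third bit is reused from the previous step, which appears to be the point of the yz:=xy assignment.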
31 Greatest Common Divisor
- gcd = main proc (ab?chan <<byte,byte>> & c!chan byte).
  begin x,y: var byte
  | forever do
      ab?<<x,y>>
    ; do x<y then y:=y-x or x>y then x:=x-y od
    ; c!x
    od
  end
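The inner loop is the subtraction form of Euclid's algorithm. A minimal Python sketch of one pass through the outer loop follows; it assumes both inputs are positive (with a zero input the guarded loop would never terminate), and the function name is hypothetical.

```python
def gcd_by_subtraction(x, y):
    """Model of: do x<y then y:=y-x or x>y then x:=x-y od ; c!x"""
    if x <= 0 or y <= 0:
        raise ValueError("this sketch assumes positive inputs")
    while x < y or x > y:    # the loop ends when no guard holds, i.e. x == y
        if x < y:
            y = y - x
        else:
            x = x - y
    return x                 # value sent on channel c
```

For example, `gcd_by_subtraction(12, 18)` returns 6.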
32 Nacking Arbiter
- nack = main proc (a?chan bool & b!chan bool).
  begin na,nb: var bool
  | <<na,nb>> := <<true,true>>
  ; forever do sel probe(a) then a!nb na nanb or probe(b) then b!na nb nbna les od
  end
33 C: Tangram → handshake circuit
34 C: Tangram → handshake circuit
35 C: Tangram → handshake circuit
C (RS)
36 Tangram Compilation
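37 VLSI programming of asynchronous circuits
[Figure: design flow. A Tangram program is translated by the compiler into a handshake circuit, which the expander turns into an asynchronous circuit (netlist of gates). The simulator provides feedback on behavior, area, time, energy, and test coverage.]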
38 Tangram tool box
- Let Rlin4.tg be a Tangram program
- htcomp -B Rlin4
- compiles Rlin4.tg into Rlin4.hcl, a handshake circuit
- htmap Rlin4
- produces Rlin4.v files, a CMOS standard-cell circuit
- htsim Rlin4 a b
- executes Rlin4.hcl with files a, b for input/output
- htview Rlin4
- provides interactive viewing of simulation results
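39 Tangram program Conway
[Figure: processes P, Q, and R connected in a pipeline by channels a, b, c, and d]
- B1 = type [0..1]
  & B2 = type <<B1,B1>>
  & B3 = type <<B1,B1,B1>>
  & P = ... & Q = ... & R = ...
  & conway = main proc (a?chan B2 & d!chan B3).
  begin b,c: chan B1 | P(a,b) || Q(b,c) || R(c,d) end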
40 Tangram program Conway
- P = proc(a?chan B2 & b!chan B1).
  begin x: var B2 | forever do a?x ; b!x.0 ; b!x.1 od end
- Q = proc(b?chan B1 & c!chan B1).
  begin y: var B1 | forever do b?y ; c!y od end
- R = proc(c?chan B1 & d!chan B3).
  begin x,y,z: var B1 | forever do c?x ; c?y ; c?z ; d!<<x,y,z>> od end
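The data flow through P, Q, and R can be imitated with Python generators. This is a loose sketch and not a model of handshake timing: generators are demand-driven and run one at a time, whereas the Tangram processes run concurrently and synchronise on channels. The function names mirror the process names; everything else is an assumption made for illustration.

```python
def P(a):
    """Unpack each pair from channel a into two values on channel b."""
    for pair in a:
        yield pair[0]   # b!x.0
        yield pair[1]   # b!x.1


def Q(b):
    """Pass each value from channel b on to channel c."""
    for y in b:
        yield y         # c!y


def R(c):
    """Pack every three values from channel c into one triple on channel d."""
    c = iter(c)
    while True:
        try:
            x = next(c)
            y = next(c)
            z = next(c)
        except StopIteration:
            return      # input exhausted; an incomplete triple is dropped
        yield (x, y, z)  # d!<<x,y,z>>


def conway(a):
    """Pipeline P -> Q -> R, as in the conway main procedure."""
    return R(Q(P(a)))
```

For example, `list(conway([(0, 1), (1, 0), (1, 1)]))` regroups six bits from three pairs into two triples.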
41 VLSI programming for
- Low costs
- introduce resource sharing.
- Low delay (high throughput)
- introduce parallelism.
- Low energy (low power)
- reduce activity
42 VLSI programming for low costs
- Keep it simple!!
- Introduce resource sharing: commands, auxiliary variables, expressions, operators.
- Enable resource sharing, by
- reducing parallelism
- making similar commands equal
43 Command sharing
P : proc(). S
P() ; P()
44 Command sharing example
ax : proc(). a?x
ax() ; ax()
45 Procedure definition vs declaration
- Procedure definition P = proc (). S
- provides a textual shorthand (expansion)
- each call generates a copy of the resource, i.e. no sharing
- Procedure declaration P : proc (). S
- defines a sharable resource
- each call generates access to this resource
46 Command sharing
- Applies only to sequentially used commands.
- Saves resources almost always (i.e. when the command is more costly than a mixer).
- Impact on delay and energy is often favorable.
- Introduced by means of procedure declaration.
- Makes a Tangram program less readable. Therefore, apply it after the program is correct and sound.
- Should really be applied by the compiler.
47 Sharing of auxiliary variables
- x:=E is an auto assignment when E depends on x. This is compiled as aux:=E ; x:=aux, where aux is a fresh auxiliary variable.
- With multiple auto assignments to x, as in
- x:=E ; ... ; x:=F
- auxiliary variables can be shared, as in
- aux:=E ; aux2x() ; ... ; aux:=F ; aux2x() with aux2x : proc(). x:=aux
f func(). E xf() a!f()
e0
e1
49 Expression sharing
- Applies only to sequentially used expressions.
- Often saves resources (i.e. when the expression is more costly than the demultiplexer).
- Introduced by means of function declarations.
- Makes a Tangram program less readable. Therefore, apply it after the program is correct and sound.
- Should really be applied by the compiler.
50 Operator sharing
- Consider x0 := y0+z0 ; x1 := y1+z1 .
- Operator + can be shared by introducing
- add : func(a,b? var T): T. a+b
- and applying it as in x0 := add(y0, z0) ; x1 := add(y1,z1) .
51 Operator sharing: the costs
- Operator sharing may introduce multiplexers to (all) inputs of the operator and a demultiplexer to its output.
- This form of sharing only reduces costs when
- operator is expensive,
- some input(s) and/or output are common.
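The trade-off can be made concrete with a toy area comparison. The Python sketch below uses a deliberately simple cost model that is an assumption of this example, not a formula from the lectures: one multiplexer per non-common input, one demultiplexer if the outputs differ, and made-up area numbers.

```python
def unshared_area(op_area, uses):
    """Area when every use of the operator gets its own copy."""
    return uses * op_area


def shared_area(op_area, muxed_inputs, mux_area, demux_area, output_demuxed):
    """Area of one shared operator plus the steering logic it needs.

    muxed_inputs counts the operator inputs that are not common to all uses.
    """
    area = op_area + muxed_inputs * mux_area
    if output_demuxed:
        area += demux_area   # needed only when the results go to different places
    return area
```

With a hypothetical adder of area 10 used twice and multiplexers of area 4, sharing with two muxed inputs and a demultiplexed output costs 22 against 20 unshared, while sharing with one common input and a common output costs 14.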
52 Operator sharing example
- Consider x := y+z0 ; x := y+z1 .
- Operator + can be shared by introducing add2y : proc(b? var T). x:=y+b
- and applying it as in add2y(z0) ; add2y(z1) .
53 Greatest Common Divisor
- gcd = main proc (ab?chan <<byte,byte>> & c!chan byte).
  begin x,y: var byte
  | forever do
      ab?<<x,y>>
    ; do x<y then y:=y-x or x>y then x:=x-y od
    ; c!x
    od
  end
54 Assignment: make GCD smaller
- Both assignments (y:=y-x and x:=x-y) are auto assignments and hence require an auxiliary variable.
- Program requires 4 arithmetic resources (twice < and -).
- Reduce costs of GCD by saving on auxiliary variables and arithmetic resources. (Beware the costs of multiplexing!)
- Use of ff variables not allowed for this exercise.