Title: Introduction to VLSI Programming High Performance DLX
1 Introduction to VLSI Programming: High Performance DLX
- (course 2IN30)
- Prof. dr. ir. Kees van Berkel
2 Recap: Pipelining in Tangram
- Compare three programs:
- P0: a?x0 ; b!f2(f1(f0(x0)))
- P1: a?x0 ; x1 := f0(x0) ; x2 := f1(x1) ; b!f2(x2)
- P2: (a?x0 ; a1!f0(x0)) || (a1?x1 ; a2!f1(x1)) || (a2?x2 ; b!f2(x2))
3 Pipelining in Tangram (cont'd)
- Output sequence b is identical for P0, P1, and P2.
- P0 and P1 have the same communication behavior; P1 is larger, slower, and warmer.
- P2 vs P1: similar in size, energy, and latency, but up to 3 times higher throughput, depending on the (relative) complexity of f0, f1, f2.
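The "up to 3 times" figure can be checked with a back-of-the-envelope model. The sketch below is Python, not Tangram. It assumes each of f0, f1, f2 has a fixed delay (the numbers are made up) and ignores handshake and communication overhead.

```python
def cycle_time_sequential(delays):
    """P0/P1 style: an input is fully processed before the next one is accepted."""
    return sum(delays)


def cycle_time_pipelined(delays):
    """P2 style: the stages overlap, so the slowest stage sets the pace."""
    return max(delays)


def speedup(delays):
    """Throughput gain of the pipelined version over the sequential one."""
    return cycle_time_sequential(delays) / cycle_time_pipelined(delays)


balanced = [10, 10, 10]    # hypothetical delays of f0, f1, f2 in ns
unbalanced = [2, 26, 2]    # one dominant stage

print(speedup(balanced))   # 3.0
print(speedup(unbalanced))
```

With equal delays the gain reaches the factor 3. When one function dominates, the gain drops toward 1, which is the dependence on relative complexity noted above.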
4 Recap: DLX Moore machine (ignoring interrupts)
    <Reg[0], pc> := <0, 0>
    ; do <Mem[Reg[rs1] + immediate], pc, Reg[rd]> :=
         < if SW -> Reg[rd] fi
         , if J -> pc + 4 + offset
           [] BEQZ -> if Reg[rs] = 0 -> pc + 4 + immediate
                      [] Reg[rs] ≠ 0 -> pc + 4
                      fi
           [] else -> pc + 4
           fi
         , if LW -> Mem[Reg[rs1] + immediate]
           [] ADD -> ALU(add, Reg[rs1], Reg[rs2])
           fi
         >
      od
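The state update above can be tried out with a small Python sketch. The dictionary-based instruction format, the 32-entry register file, and the rule that register 0 stays zero are assumptions made for illustration. Word width and overflow are ignored.

```python
def new_state():
    """Initial state: registers and pc start at 0, as in the initialisation step."""
    return {"reg": [0] * 32, "mem": {}, "pc": 0}


def step(state, instr):
    """Apply one instruction; instr is a dict such as {"op": "ADD", "rd": 3, ...}."""
    reg, mem, pc = state["reg"], state["mem"], state["pc"]
    op = instr["op"]
    next_pc = pc + 4                       # default: the 'else' case of the pc update
    if op == "SW":
        mem[reg[instr["rs1"]] + instr["imm"]] = reg[instr["rd"]]
    elif op == "LW":
        reg[instr["rd"]] = mem.get(reg[instr["rs1"]] + instr["imm"], 0)
    elif op == "ADD":
        reg[instr["rd"]] = reg[instr["rs1"]] + reg[instr["rs2"]]
    elif op == "J":
        next_pc = pc + 4 + instr["offset"]
    elif op == "BEQZ":
        if reg[instr["rs"]] == 0:
            next_pc = pc + 4 + instr["imm"]
    else:
        raise ValueError(f"unsupported instruction: {op}")
    reg[0] = 0                             # assumption: register 0 is hardwired to zero
    state["pc"] = next_pc
    return state


s = new_state()
s["reg"][1], s["reg"][2] = 5, 7
step(s, {"op": "ADD", "rd": 3, "rs1": 1, "rs2": 2})
step(s, {"op": "SW", "rd": 3, "rs1": 0, "imm": 100})
step(s, {"op": "LW", "rd": 4, "rs1": 0, "imm": 100})
step(s, {"op": "BEQZ", "rs": 0, "imm": 8})
print(s["reg"][4], s["pc"])
```

Each instruction changes at most one of memory or a register, plus pc, so updating them in sequence gives the same result as the concurrent assignment in the Moore machine.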
5 DLX0 instruction loop
    do -halted then
       ROMaddr!PC
     ; ROMdata?ir
     ; PC := PC+4      (i.e. aux := PC+4 ; PC := aux)
     ; case (ir cast Itype.0)
       is <<t,f,f,f,f,f>> then LW()
       or <<t,f,f,f,f,t>> then SW()
       or <<f,f,f,f,f,f>> then if (ir cast Rtype.4 1) then SLT() fi
       or <<f,t,f,f,f,f>> then BEQZ()
       or <<f,t,f,f,f,t>> then J()
       or <<f,f,t,f,f,f>> then halted := true
       si
    od
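The fetch, increment, and dispatch structure of this loop can be mimicked in Python. This is a sketch only: the bit patterns are copied from the case statement above, while the ROM representation (a dict from byte address to a 6-tuple of booleans) and the handler names as strings are assumptions. Handlers do nothing here, so jumps and branches do not change PC.

```python
T, F = True, False

# Opcode patterns as listed in the case statement, mapped to handler names.
DISPATCH = {
    (T, F, F, F, F, F): "LW",
    (T, F, F, F, F, T): "SW",
    (F, F, F, F, F, F): "RTYPE",   # the loop then inspects an Rtype field before SLT()
    (F, T, F, F, F, F): "BEQZ",
    (F, T, F, F, F, T): "J",
    (F, F, T, F, F, F): "HALT",
}


def decode(opcode_bits):
    """Return the handler name for a 6-tuple of booleans, or None if unlisted."""
    return DISPATCH.get(tuple(opcode_bits))


def run(rom):
    """Fetch and dispatch until the halt pattern is seen; return the handler trace."""
    pc, halted, trace = 0, False, []
    while not halted:
        ir = rom[pc]               # ROMaddr!PC ; ROMdata?ir
        pc += 4                    # PC := PC+4
        name = decode(ir)
        if name == "HALT":
            halted = True
        trace.append(name)
    return trace


rom = {0: (T, F, F, F, F, F), 4: (T, F, F, F, F, T), 8: (F, F, T, F, F, F)}
print(run(rom))
```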
6 DLX0 instruction loop
- Each instruction cycle:
- 4 sequential commands for each instruction type
- 1-3 sequential commands for specific instructions
- 5-7 sequential commands each cycle
- Pipelining:
- split these 5-7 commands over 2 stages, in a (more or less) balanced way.
- is simple when the instruction does not affect PC, but more difficult for jump and branch instructions.
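Finding a "more or less balanced" split can be illustrated with a small search. The Python sketch below assumes each command has a known delay and that commands keep their order. The delay values are hypothetical, not measurements of DLX0.

```python
def best_two_stage_split(delays):
    """Split a command sequence into two consecutive stages.

    Returns (k, stage1_delay, stage2_delay), where the first k commands form
    stage 1 and k is chosen to minimise the slower of the two stages.
    """
    best = None
    for k in range(1, len(delays)):
        s1, s2 = sum(delays[:k]), sum(delays[k:])
        if best is None or max(s1, s2) < max(best[1], best[2]):
            best = (k, s1, s2)
    return best


commands = [3, 2, 2, 4, 3, 2]   # hypothetical delays of 6 sequential commands
print(best_two_stage_split(commands))
```

The cycle time of the 2-stage version is set by the slower stage, so the closer the two sums, the closer the speed-up gets to a factor of 2.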
7 2-stage DLX: example template
8 DLX 3-stage pipelined execution
[Figure: pipeline diagram. Horizontal axis: time, in instruction cycles (1, 2, 3, 4, 5, 6, 7, ...). Vertical axis: program execution, in instructions.]
Stage EX includes memory access and writeback.
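A pipeline diagram of this kind can be regenerated with a few lines of Python. This is a sketch under assumptions: only the stage name EX comes from the slide, the names IF and ID are assumed, and stalls and branches are not modelled.

```python
STAGES = ["IF", "ID", "EX"]   # IF and ID are assumed names; EX is from the slide


def pipeline_table(n_instructions, stages=STAGES):
    """Row i lists the stage occupied by instruction i in each cycle ('' when idle)."""
    n_cycles = n_instructions + len(stages) - 1
    table = []
    for i in range(n_instructions):
        row = [""] * n_cycles
        for s, name in enumerate(stages):
            row[i + s] = name      # instruction i enters stage s in cycle i + s
        table.append(row)
    return table


for i, row in enumerate(pipeline_table(5)):
    print(f"instr {i + 1}: " + " ".join(f"{cell:>2}" for cell in row))
```

Five instructions occupy seven cycles in total. Once the pipeline is full, one instruction completes per cycle.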
9 3-stage DLX: example template
10 Reducing pipeline branch penalties
- Problem:
- which instruction to fetch after a branch instruction?
- Strategies:
- wait until branch address is computed (DLX0)
- predict branch not taken
- predict branch taken
- introduce branch-delay slots (next assignment)
11 Branch delay slots
- Single branch delay slot:
- branch instruction
- branch-delay instruction
- branch target (if taken)
- Branch-delay instruction, various possibilities:
- e.g. the instruction preceding the branch instruction (if the branch condition does not depend on its outcome)
- ... or an instruction succeeding the branch, if
- NOP instruction if no productive alternative is available.
- This constitutes a change in the ISA!
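The effect of a single delay slot can be illustrated with a toy interpreter in Python. Everything here is an assumption made for illustration: the three-instruction toy ISA (addi, bnez, nop, plus halt), the tuple encoding, and the loop itself. It is not DLX assembler, and a branch inside a delay slot is assumed not to occur.

```python
def run_with_delay_slot(program, regs, max_steps=1000):
    """Run a toy program in which the instruction after a branch always executes.

    Returns the number of executed instructions (NOPs included, halt excluded).
    """
    pc, pending_target, executed = 0, None, 0
    while executed < max_steps:
        instr = program[pc]
        if instr[0] == "halt":
            return executed
        executed += 1
        next_pc = pc + 1
        if pending_target is not None:         # this instruction sits in a delay slot
            next_pc, pending_target = pending_target, None
        if instr[0] == "addi":
            regs[instr[1]] += instr[2]
        elif instr[0] == "bnez" and regs[instr[1]] != 0:
            pending_target = instr[2]          # takes effect after the delay slot
        pc = next_pc
    raise RuntimeError("step limit reached")


# Delay slot filled with a NOP.
with_nop = [("addi", "acc", 2), ("addi", "n", -1), ("bnez", "n", 0), ("nop",), ("halt",)]
# Delay slot filled with the instruction that used to precede the branch.
filled = [("addi", "n", -1), ("bnez", "n", 0), ("addi", "acc", 2), ("halt",)]

regs_a = {"n": 3, "acc": 0}
regs_b = {"n": 3, "acc": 0}
print(run_with_delay_slot(with_nop, regs_a), regs_a["acc"])
print(run_with_delay_slot(filled, regs_b), regs_b["acc"])
```

Both versions compute the same result, but the filled version executes fewer instructions because the slot does useful work. Moving `addi acc` into the slot is legal because the branch condition depends only on `n`.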
12 Final Project
- 3-stage DLX, with instruction rate exceeding 80 MIPS when executing GCD (measured over several GCD cycles).
- NB1: exploit branch delay slots. This requires a different version of the assembler text!
- NB2: can be achieved using command-level parallelism and pipelining. (Expression-level parallelism may yield a bonus.)
- NB3: speed up the environment (RAM, ROM) when necessary.
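Checking the 80 MIPS target is a matter of dividing an instruction count by the elapsed time. The Python sketch below uses hypothetical numbers; real values would come from a simulation run covering several GCD cycles.

```python
TARGET_MIPS = 80.0


def mips(instructions, elapsed_ns):
    """Million instructions per second, from an instruction count and a time in ns."""
    return instructions / elapsed_ns * 1e3


# Hypothetical measurement: 400 instructions executed in 4500 ns.
rate = mips(400, 4500)
print(f"{rate:.1f} MIPS, target met: {rate > TARGET_MIPS}")
```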
13 VLSI programming of asynchronous circuits
[Figure: design flow. Tangram program -> compiler -> handshake circuit -> expander -> asynchronous circuit (netlist of gates). A simulator provides feedback on behavior, area, time, energy, and test coverage.]
14 Demonstrator ICs
15 Added value
- 1985: modularity, ease of design (no value added to product!)
- 1990: low power (ESPRIT project ?????)
- 1992: low noise, low EME (Electro-Magnetic Emission)
- 2000: ...
16 Added value: low power (DCC Error Corrector)
17 A sync-async arms race
18 Added value: Low Power
[Figure: Synchronous 80C51 vs. Asynchronous 80C51]
19 Added value: Low EM Emission
20 Roadblock: circuit size (the 80C51 learning curve)
[Figure labels: 1995/6, 1999/4]
21 Just in time processing
22 ADPCM
23 ADPCM
24 ADPCM
25 Industrialization of the Technology
- Philips Semiconductors Zürich (1994 Dec): "We want to set a world record in low power, .. by using asynchronous technology."
- Their choice for a vehicle: the 80C51 micro-controller (used in many consumer products).
- Result: 4× less power, minimal EME.
- Follow-up: pager baseband ICs, ...
- In parallel: transfer and upgrade of tools and design flow
26 Pager Baseband Controller ICs
- Myna pager:
- FLEX protocol
- 32 alphanumeric messages
- a single AAA battery (1 V)
- up to 25 weeks battery life
- Pager baseband controller ICs:
- PCA5007, PCA5010
- http://www.semiconductors.philips.com/pip/PCA5007
- http://www.win.tue.nl/pa/wsinap/async.html
27 1998-Sep: the PCA5007
28 A new generation of pagers: a common platform for all standards
29 EMI: a critical design factor (Electro-Magnetic Interference)
- Antenna signal may be as small as 25 µV.
- Clock harmonics of synchronous micro-controllers interfere with RF (X00 MHz).
- With asynchronous 80C51: signal decoding by means of (standard-specific) software. (This also enables upgrading/downloading!)
- Furthermore: no shielding is required between controller and RF receiver.
30 PCA5007 block diagram
31 Contactless smartcard IC (ESPRIT project DESCALE)
[Figure: block diagram. On chip: power regulator, 80C51 micro-controller, DES engine, UART, RAM, ROM, EEPROM. Radio link: 13.56 MHz clock, power (a few mW), bi-directional communication (106 kbit/s).]
32 Contactless smartcard IC
- Properties:
- a) low average power
- b) lower peak power
- c) speed adaptation
- Merits:
- Maximum speed for received power (a,c)
- Robust operation against voltage drops (c)
- Smaller buffer capacitor (b,c)
33 Conclusion
- First asynchronous VLSI circuits on the market (high volume sales).
- Prospects for more async products look good.
- Added value: low power, EME performance.
- Added costs: test, IC area, being different.
- Asynchronous VLSI technology:
- there is room for it in market niches,
- but it may contribute to main-stream VLSI.
35 Lab-work and report
- You are allowed to team up with a colleague. (Not mandatory.)
- Report: more than a listing of functional Tangram programs:
- analyze the specifications and requirements
- present design options, alternatives, trade-offs
- motivate your design choices
- explain functional correctness of your Tangram programs
- analyze and explain area, time, energy of your programs.