Title: Thanks for the invite!
1Ian G. Clark
IGClark_at_iee.org
http://IanGClark.net/
Thanks for the invite!
2Talk Layout
3The Whole Group
4MOVIE - Model Visualisation for Asynchronous
Circuit Design
The project addresses the development of
theoretical models and an associated set of
algorithms and software tools for graphical
representation and visualisation of highly
complex asynchronous circuit behaviour. New tools
will enable skilled designers to achieve greater
quality and productivity, and greater confidence
in their designs.
A few slides from DATE03
5Visualisation and Resolution of Coding Conflicts
in Asynchronous Circuit Design
- A. Madalinski, V. Khomenko, A. Bystrov and A.
Yakovlev - University of Newcastle upon Tyne
MOVIE Project
6Motivation
- state coding is necessary for implementability
- manual vs. automatic resolution of coding
  conflicts
  - automatic -> can produce sub-optimal solutions
  - manual -> crucial for finding good (low-latency,
    compact, elegant) synthesis solutions
- interactivity is good!
- conflict -> complementary set (i.e. b,a-,b-,a)
  -> called a core
- select cores -> insert a signal to break the
  conflict.
7Core selection Height map
Core map
Height map
8Signal insertion: an example
Phase 1
Phase 2
Core map
Part of the solving process
888 CSC conflicts 4 cores
9BEhavioural Synthesis of Systems with
heterogeneous Timing (BESST) supported by EPSRC
at Newcastle University (project GR/R 16754)
- Aim: The overall strategic goal of the project
  is generic methods and an associated set of
  software tools for synthesis of systems with
  heterogeneous timing, primarily focused on
  self-timed controllers and interfaces.
- Prof. Alex Yakovlev, Dr. Albert Koelmans,
- Dr. Frank Burns, and Mr. Delong Shang
10Design Flow
11System Synthesis Method
- A new method has been proposed. It is not a
  syntax-directed translation. It semantically
  translates a system specification from a high level
  to an intermediate format, LPNs (Labelled Petri
  Nets) and CPNs (Coloured Petri Nets), and then
  directly maps the LPNs and CPNs to an SI (Speed
  Independent) circuit.
- The method has been applied to some examples,
  such as a DMA controller, among others.
12What Has Been Done?
13Current and Future Work.
- Current research focuses on optimization and
  scheduling; it will then move to system-level
  synthesis, for example partitioning and
  communication synthesis.
- More complex examples are being studied.
- Relative Timing (RT) techniques, among others, will
  be introduced to improve performance.
14STELLA Synthesis and Testing of Low-Latency
Asynchronous Circuits
- Prof. A. Yakovlev (PI)
- Dr. A. Bystrov
- Prof. D. Kinniment
- Dr. A. Koelmans
- Dr. G. Russell
- Jan. 2003 -- Dec. 2005
15Aims and Objectives
- Develop the detailed implementation architecture
  of a low-latency controller with techniques for
  automated decomposition, synthesis and timing
  analysis (see e.g. CS-TR-743, CS-TR-754 from
  http://www.cs.ncl.ac.uk/).
- Develop the main supporting structures for
  off-line testing, such as internal scanning, for
  a class of stuck-at, bridging and delay faults
  with minimum speed overheads (see e.g.
  CS-TR-746).
- Develop the detailed architecture for a snooper
  for on-line testing of self-timed structures with
  minimum area and power consumption overheads.
- Develop a demonstrator chip employing the
  testable low-latency methodology; the application
  area will be an on-chip communication adaptor.
16Example of Low-Latency structure
- Output precomputation: Explicit Context Signals
  (ECS)
- Latency reduction: inputs connected to output
  flip-flops
17Interfacing to standard CAD tools
- Maximum reuse of industrial CAD tools
- Providing alternative solutions to the parts of
  the standard design flow
- Compilation of RTL specs and structural Verilog
  netlists into asynchronous designs
- Reuse of test-related standard CAD tools
- Methods developed in the course of work will
  be implemented in software tools and interfaced
  to the industrial CAD toolkits (Cadence), acting
  as a performance and test oriented asynchronous
  design front-end.
18COMFORT - "asynchronous COmmunication Mechanisms
FOr Real-Time systems"
Objectives
- To study a range of asynchronous communication
  mechanisms (ACMs) that can be used in
  constructing (distributed and concurrent) systems
  with heterogeneous timing
- To develop hardware implementations for ACMs
  (including self-timed circuits) for potential
  use in Systems-On-a-Chip (SOCs) and embedded
  (miniature, low power and EMC) applications
19COHERENT - "COmputational HEteRogEneously timed
NeTworks"
Objectives
- Development of a parameterised library of ACMs
- Formal synthesis of multi-slot ACM algorithms
- Develop RTNoC architecture (HETS)
- Develop RTNoC design flow: functional spec,
  design, simulation, analysis, prototyping,
  implementation and testing
- Test RTNoCs on real examples of control or vision
  systems; comparison with existing (centrally
  clocked) solutions
20The Timing Modes Spectrum
Introduction and Background
Multiple clock domains
Heterogeneous
Asynchronous (self-timed)
Single clock synchronous
Parallel
Analogue
GALS
HETS
- Sequential and synchronous easier.
- An intermediate solution: GALS
- Transfer of knowledge from the existing methods
to the new solutions.
21Introduction and Background
- Benefits of Asynchronous processing
  - Improved EMC - dependent on data being
    processed.
  - Lower power - energy only used when work is done.
Example: A to D conversion.
23NoC: Network on Chip
- Large existing knowledge base.
- Philips ethernet on chip.
- Current networks are synchronous and cannot handle
  non-synchronous cores such as self-timed ones.
- Global chip communication increases power
  consumption.
- Good for non-deterministic data communication.
- Side step the synchronization and global clock
  issues.
- Not suitable for Real-Time applications.
24Baseline Architectural aspect
- Real-time networks and MASCOT approach from
  RSRE/Phillips(67), BAe/Simpson(86) for software
  systems
  - high time heterogeneity but relatively low speed
- Globally-Asynchronous-Locally-Synchronous (GALS)
  Chapiro(84), Muttersbach(00), Ginosar(00) for
  VLSI circuits
  - high speed but very limited time heterogeneity
25Heterogeneously Timed Nets (hets) (based on
MASCOT standard symbols)
[Diagram: network of elements A1, A2, A3, A4 and C1, C2, C3]
26Hets
Time/event/data-driven Data processing
elements (active)
[Diagram: network of elements A1, A2, A3, A4 and C1, C2, C3]
27Hets
Data communication elements (passive) - ACMs
[Diagram: network of elements A1, A2, A3, A4 and C1, C2, C3]
28Asynchronous data communications
Processes are single threads of execution.
writer
reader
writer time domain
reader time domain
Level of asynchrony is defined by WRITE and READ
rules
29Classification of ACMs
- Hugo Simpson's classification

                              | Destructive read         | Non-destructive read
                              | (read can be held up)    | (read cannot be held up)
Destructive write             | Signal                   | Pool
(write cannot be held up)     | (event data)             | (reference data)
Non-destructive write         | Channel                  | Constant
(write can be held up)        | (message data)           | (configuration data)

Other ACM classifications, e.g. L. Lamport, 1986
(safe, regular and atomic registers)
30Difficulty with Simpson's classification
- Destructive/Non-destructive does not intuitively
  imply temporal, Wait/No-wait division
  - Destructive write cannot wait
  - Destructive read can wait
- There is symmetry between Pool and Channel but no
  symmetry between Signal and Constant
31Petri net capture of Simpson's protocols
[Diagram: Petri nets for the Signal, Pool, Channel
and Constant protocols, with labels empty, full,
destr write, non-destr write, destr read and
non-destr read]
32Our interpretation
[Diagram: nets for Signal, Pool, Channel and
Message/Command, with labels read, unread, write,
over-write and re-read]
Constant is a special case of Command
33Our interpretation
[Diagram: nets for Signal, Pool, Channel and
Message/Command, with labels read, unread, write,
over-write and re-read]
34Our classification of ACMs

              | Lazy read         | Busy read
Busy write    | BW-LR (Signal)    | BW-BR (Pool)
Lazy write    | LW-LR (Channel)   | LW-BR (Command)

Lazy read: read only previously unread data (read can be held up)
Busy read: may re-read data already read (read cannot be held up)
Busy write: may over-write unread data (write cannot be held up)
Lazy write: write only if the previous data has been read (write can be held up)
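The four classes can be illustrated with a one-item behavioural sketch. This is only an illustration of the busy/lazy rules in the table, not the slot-based algorithms of the talk; the class name `OneItemACM` and the `try_write`/`try_read` methods are hypothetical, and "held up" is modelled by returning a failure flag instead of blocking.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class OneItemACM:
    """One-item model of an ACM; the two flags select the class (assumed names)."""
    busy_write: bool      # True: may over-write unread data
    busy_read: bool       # True: may re-read data already read
    item: Any = None
    unread: bool = False  # True while the current item has not been read

    def try_write(self, item: Any) -> bool:
        # A lazy writer is held up while the previous item is still unread.
        if self.unread and not self.busy_write:
            return False
        self.item = item
        self.unread = True
        return True

    def try_read(self) -> Tuple[bool, Any]:
        # A lazy reader is held up when there is no unread item.
        if not self.unread and not self.busy_read:
            return (False, None)
        self.unread = False
        return (True, self.item)


def signal() -> OneItemACM:   # BW-LR
    return OneItemACM(busy_write=True, busy_read=False)


def pool() -> OneItemACM:     # BW-BR
    return OneItemACM(busy_write=True, busy_read=True)


def channel() -> OneItemACM:  # LW-LR
    return OneItemACM(busy_write=False, busy_read=False)


def command() -> OneItemACM:  # LW-BR
    return OneItemACM(busy_write=False, busy_read=True)
```

In this sketch a Signal accepts two writes in a row but refuses a second read of the same item, while a Channel refuses the second write until the first item has been read.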
35Signal vs Pool
Real time 1 (busy domain)
Real time 2 (busy domain)
Pool
Real time (busy domain)
Data-driven (lazy domain)
Signal
Low Power!
36Sample algorithms
Pool with 3 slots: fully asynchronous
  Writer
    wr: write slot n
    w0: l := n
    w1: n := not(l, r)
  Reader
    r0: r := l
    rd: read slot r
Signal with 2 slots: conditionally asynchronous
  Writer
    wr: write slot w
    w0: w := not r
  Reader
    r0: r := not r
    rd: wait until w != r; read slot r
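A sequential sketch of the three-slot Pool may help when reading the statement labels. It assumes the statements are wr (write slot n), w0 (l := n), w1 (n := any slot other than l and r), r0 (r := l) and rd (read slot r), and that each statement is atomic. The class name, the initial values of n, l and r, and the tie-break in w1 are assumptions made for illustration.

```python
class ThreeSlotPool:
    """Sequential sketch of a three-slot Pool; each method is one atomic step."""

    def __init__(self, initial=None):
        self.slots = [initial] * 3
        self.l = 0  # slot holding the latest complete item (assumed initial value)
        self.r = 0  # slot claimed by the reader (assumed initial value)
        self.n = 1  # slot the writer fills next (assumed initial value)

    # Writer statements
    def wr(self, item):
        self.slots[self.n] = item

    def w0(self):
        self.l = self.n

    def w1(self):
        # n := not(l, r): pick a slot that is neither l nor r.
        # When l == r two slots qualify; the lowest index is an arbitrary choice.
        self.n = next(s for s in range(3) if s not in (self.l, self.r))

    def write(self, item):
        self.wr(item)
        self.w0()
        self.w1()

    # Reader statements
    def r0(self):
        self.r = self.l

    def rd(self):
        return self.slots[self.r]

    def read(self):
        self.r0()
        return self.rd()
```

Calling `r0()`, then several complete `write()` cycles, then `rd()` shows the point of the third slot: the writer keeps moving between the two slots the reader has not claimed, so the reader still gets the coherent (if older) item it claimed.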
37What is a slot?
38Data Properties
39SIGNAL Data latency
If a reader cycle immediately follows a writer
cycle what data does it get?
40SIGNAL Data latency
Write X
post
41SIGNAL Data latency
Writer: write slot w; w := not r
Reader: r := not r; wait until w != r; read slot r
This implies 0 capacity
Trade-off between slots, capacity and latency.
A 3-slot signal has capacity 1 and does not make
the reader wait as it does here.
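The waiting behaviour can be traced with a small sequential sketch of the two-slot Signal. The comparison in the reader's wait condition is not legible in the slide text; the sketch assumes it is `w != r`, which keeps the reader off the slot the writer is using. The class and method names and the initial values of w and r are likewise assumptions.

```python
class TwoSlotSignal:
    """Sequential sketch of a two-slot Signal under the assumptions stated above."""

    def __init__(self):
        self.slots = [None, None]
        self.r = 0  # slot index owned by the reader (assumed initial value)
        self.w = 1  # slot index owned by the writer (assumed initial value)

    def write(self, item):
        self.slots[self.w] = item  # write slot w
        self.w = 1 - self.r        # w := not r

    def start_read(self):
        self.r = 1 - self.r        # r := not r

    def ready(self):
        # Assumed wait condition: the writer has moved off the reader's slot.
        return self.w != self.r

    def finish_read(self):
        if not self.ready():
            raise RuntimeError("reader must wait: writer still owns this slot")
        return self.slots[self.r]  # read slot r
```

With these assumptions, a reader that starts right after one write finds `ready()` false and only proceeds after the next write completes, which is one way to read the "0 capacity" remark.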
42Modeling the algorithms
Example statement: w := not r
(subnet W0 in the Signal)
Non-abstract models for ease of understanding
This is atomic; some statements need to be
2-stage
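One possible way to picture such a subnet is as a safe Petri net whose places encode the variable values, with one transition per combination of values. The sketch below is not the net from the slides: the place names (`w0_enable`, `w0_done`, `r=0`, `w=1`, ...) and the helper functions are hypothetical, chosen only to show an atomic `w := not r` as a token game.

```python
def w0_subnet():
    """Transitions of a hypothetical subnet for the statement w := not r."""
    transitions = {}
    for r in (0, 1):
        for w in (0, 1):
            # Reads r (token is put back), consumes the old value of w,
            # and moves control from the enable place to the done place.
            pre = frozenset({"w0_enable", f"r={r}", f"w={w}"})
            post = frozenset({"w0_done", f"r={r}", f"w={1 - r}"})
            transitions[f"w0[r={r},w={w}]"] = (pre, post)
    return transitions


def enabled(transitions, marking):
    """Names of transitions whose input places are all marked."""
    return sorted(name for name, (pre, _) in transitions.items() if pre <= marking)


def fire(transitions, marking, name):
    """Fire one transition on a safe net; the marking is a set of marked places."""
    pre, post = transitions[name]
    if not pre <= marking:
        raise ValueError(f"{name} is not enabled")
    return (marking - pre) | post
```

Starting from the marking `{"w0_enable", "r=0", "w=0"}`, exactly one transition is enabled, and firing it leaves tokens on `w0_done`, `r=0` and `w=1`.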
43Modeling the algorithms
setting
referencing
44Sub-models and the enable place
45Sub-models and the enable place
46Metastability
47a normal state-transition
48Metastability
49Metastable transients
50Metastability
Keep away from data path!
51Analysis and Some Results
Exhaustive reachability search: all process
interleavings covered.

ACM             Control   Arbiter        Capacity
3 slot pool     1,2,3     Arbiter req.   1delay
4 slot pool     0,1       No arbiter     1
2 slot signal   0,1       No arbiter     01
3 slot signal   1,2,3     No arbiter     1
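The idea of an exhaustive reachability search can be sketched on a small state machine. The example below is not the tool used for the results above: it models a three-slot pool with atomic control statements (l := n, n := not(l, r), r := l) and non-atomic slot accesses, explores every interleaving breadth-first, and reports states where writer and reader access the same slot at once. The state encoding and the initial state are assumptions.

```python
from collections import deque

# State: (wpc, rpc, n, l, r)
# wpc: 0 = before write, 1 = writing slot n, 2 = before l := n, 3 = before n := not(l, r)
# rpc: 0 = before r := l, 1 = before read, 2 = reading slot r


def successors(state):
    wpc, rpc, n, l, r = state
    # Writer steps
    if wpc == 0:
        yield (1, rpc, n, l, r)          # start writing slot n
    elif wpc == 1:
        yield (2, rpc, n, l, r)          # finish writing
    elif wpc == 2:
        yield (3, rpc, n, n, r)          # l := n
    else:
        for free in range(3):            # n := not(l, r), every legal choice
            if free != l and free != r:
                yield (0, rpc, free, l, r)
    # Reader steps
    if rpc == 0:
        yield (wpc, 1, n, l, l)          # r := l
    elif rpc == 1:
        yield (wpc, 2, n, l, r)          # start reading slot r
    else:
        yield (wpc, 0, n, l, r)          # finish reading


def conflict(state):
    wpc, rpc, n, l, r = state
    return wpc == 1 and rpc == 2 and n == r  # same slot written and read at once


def explore(initial=(0, 0, 1, 0, 0)):
    """Breadth-first search over all interleavings; returns (reachable, conflicts)."""
    seen = {initial}
    queue = deque([initial])
    conflicts = []
    while queue:
        state = queue.popleft()
        if conflict(state):
            conflicts.append(state)
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, conflicts
```

Under these assumptions `explore()` finds states where writing and reading overlap in time, but none where they overlap on the same slot. The atomic `n := not(l, r)` step is where an arbiter would be needed in hardware.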
52VLSI design layout (chip fabricated in June 2000 via
EUROPRACTICE)
4-slot Pool ACM
534-slot ACM part
(details on testing in 9th Async UK Forum paper)
54Applications
- Distributed CCTV
  - Advisor EU Project.
- Control systems
  - Broom balancer.
- Sensor networks
  - Condition based maintenance
- In car network
  - simple RC oscillator: vast clock range with
    temperature.
55Conclusion
56Open questions
- Analysis of dynamic systems that contain ACMs.
- Testing intermittent faults, on-line testing (e.g.
  cross talk).
- Folding of Petri Nets
- Synthesis from partial orders.
57Acknowledgements
More info on team and projects
Leader: Alex Yakovlev.
Academics: Graeme Chester, Tony Davies, David
Kinniment, Albert Koelmans, Maciej Koutny, Gordon
Russell, Sergio Velastin.
Collaborators: Eric Campbell, Hugo Simpson, ...
Researchers: Frank Burns, Alex Bystrov, David
Fraser, Marta Pietkiewicz-Koutny, Delong Shang,
Fei Xia.
Students: Fei Hao, Victor Khomenko, Agnes
Madalinski, Danil Sokolov, Maria Valera, ...