Title: 'Data Communication Mechanisms for Systems with Heterogeneous Timing'.
Ian G. Clark
Fei Xia, Alex Yakovlev, Delong Shang
IGClark_at_iee.org
http://IanGClark.net/
http://async.org.uk/
Thanks for the invite!
2Talk layout
3Introduction and Background
- Systems are becoming larger and more complex.
- Large systems are difficult to synchronize.
- Physical material limits the maximum clock speed.
- Power consumption.
- EMC fields.
- Synchronous operation typically means that the
system is running at a speed dictated by the
slowest element.
4Now: Pentium 4 processor, 100 million transistors
[Figure: number of transistors per chip (log) against time; labels: chip complexity, design productivity, verification]
By 2012 the Semiconductor Industry Association (SIA) predicts 1400 million transistors per chip, 3000 GHz clock, 1000 Gbit memory chips and a 1 Volt power supply.
5The delay ratio problem
Introduction and Background (2)
6The Timing Modes Spectrum
Introduction and Background (3)
Multiple clock domains
Heterogeneous
Asynchronous (self-timed)
Single clock synchronous
Parallel
Analogue
GALS
HETS
7Introduction and Background (4)
- Asynchronous processing.
- Improved EMC - dependent on data being processed.
- Lower power - energy only used when work is done.
Example: A to D conversion.
However
8Introduction and Background (5)
- Sequential and synchronous design is easier.
- Most current commercial tools support sequential and synchronous, some parallel, but not asynchronous.
- An intermediate solution: GALS
- Use synchronous and sequential in processing, and asynchronous in communication.
Can there be an easy transfer of knowledge from
the existing methods to the new solutions?
This is not just an academic question!
9Introduction and Background (6)
ITRS (International Technology Roadmap for Semiconductors) (http://public.itrs.net/):
Systems on Chip (SoC) are increasingly becoming heterogeneous in their behaviour, including mixed analogue-discrete components, time-driven and power-saving subsystems.
Wilfred Pinfold of Intel Microprocessor Lab: by the end of this decade, when Intel expects to be producing billion transistor devices, today's essentially homogeneous microprocessor market will have to become more diverse, characterised by multiple heterogeneous designs, each optimised for the requirements of different application segments.
11NoC Network on Chip
- Large existing knowledge base.
- Philips ethernet on chip.
- Current networks are synchronous: they cannot handle non-synchronous cores such as self-timed ones.
- Global chip communication: increased power consumption.
- Good for non-deterministic data communication.
- Side-step the synchronization and global clock issues.
- Not suitable for real-time applications.
12Baseline Architectural aspect
- Real-time networks and MASCOT approach from RSRE/Phillips(67), BAe/Simpson(86) for software systems
  - high time heterogeneity but relatively low speed
- Globally-Asynchronous-Locally-Synchronous (GALS), Chapiro(84), Muttersbach(00), Ginosar(00), for VLSI circuits
  - high speed but very limited time heterogeneity
13Heterogeneously Timed Nets (hets)(based on
MASCOT standard symbols)
[Diagram: net of elements labelled A1, A2, A3, A4 and C1, C2, C3]
14Hets
Time/event/data-driven data processing elements (active)
[Diagram: net of elements labelled A1, A2, A3, A4 and C1, C2, C3]
15Hets
Data communication elements (passive) - ACMs
[Diagram: net of elements labelled A1, A2, A3, A4 and C1, C2, C3]
16Asynchronous data communications
Processes are single threads of execution.
writer
reader
writer time domain
reader time domain
Level of asynchrony is defined by WRITE and READ
rules
17Classification of ACMs
- Hugo Simpson's classification
  | Destructive read (read can be held up) | Non-destructive read (read cannot be held up)
Destructive write (write cannot be held up) | Signal (event data) | Pool (reference data)
Non-destructive write (write can be held up) | Channel (message data) | Constant (configuration data)
Other ACM classifications e.g. L. Lamport, 1986
(safe, regular and atomic registers)
18Difficulty with Simpson's classification
- Destructive/Non-destructive does not intuitively imply the temporal Wait/No-wait division
  - Destructive write cannot wait
  - Destructive read can wait
- There is symmetry between Pool and Channel but no symmetry between Signal and Constant
19Quick aside - Petri Nets
Not to be confused with the Hets / MASCOT symbols
20Petri net capture of Simpson's protocols
[Figure: Petri nets for Signal, Pool, Channel and Constant, with places labelled empty and full and transitions labelled destr write, non-destr write, destr read and non-destr read]
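As a rough illustration of how such a net behaves, here is a minimal token-game sketch for the Channel only. The two-place, two-transition structure is an assumption read off the figure labels (write moves a token from empty to full, read moves it back); the names CHANNEL_NET, enabled and fire are hypothetical and do not come from the talk.

```python
# Hypothetical single-item Channel net, assumed from the figure labels:
# a non-destructive write needs "empty", a destructive read needs "full".
# Each transition is (input places, output places); every arc has weight 1.
CHANNEL_NET = {
    "write": (("empty",), ("full",)),
    "read": (("full",), ("empty",)),
}


def enabled(marking, net, name):
    """A transition is enabled when every input place holds a token."""
    inputs, _ = net[name]
    return all(marking[p] > 0 for p in inputs)


def fire(marking, net, name):
    """Return the marking reached by firing an enabled transition."""
    if not enabled(marking, net, name):
        raise ValueError(f"transition {name!r} is not enabled")
    inputs, outputs = net[name]
    after = dict(marking)
    for p in inputs:
        after[p] -= 1
    for p in outputs:
        after[p] += 1
    return after


m0 = {"empty": 1, "full": 0}       # nothing written yet
m1 = fire(m0, CHANNEL_NET, "write")  # the item is now unread
m2 = fire(m1, CHANNEL_NET, "read")   # reading empties the channel again
```

In marking m1 the write transition is disabled, which is how the net expresses "write can be held up".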
21Our interpretation
[Figure: Petri nets for Signal, Pool, Channel and Message/Command; labels: write, over-write, read, re-read, unread]
Constant is a special case of Command
22Our interpretation
[Figure: Petri nets for Signal, Pool, Channel and Message/Command; labels: write, over-write, read, re-read, unread]
23Our classification of ACMs
  | Lazy read: read only previously unread data (read can be held up) | Busy read: may re-read data already read (read cannot be held up)
Busy write: may over-write unread data (write cannot be held up) | BW-LR (Signal) | BW-BR (Pool)
Lazy write: write only if previous data has been read (write can be held up) | LW-LR (Channel) | LW-BR (Command)
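The classification can be written down as a small lookup, sketched here for a single data item. The names Mode, ACM_TYPES, classify and can_proceed are hypothetical, and the single "unread" flag is a simplifying assumption rather than part of the classification itself.

```python
from enum import Enum


class Mode(Enum):
    BUSY = "busy"  # the access is never held up
    LAZY = "lazy"  # the access may be held up


# (write mode, read mode) -> ACM type, following the table above.
ACM_TYPES = {
    (Mode.BUSY, Mode.LAZY): "Signal",
    (Mode.BUSY, Mode.BUSY): "Pool",
    (Mode.LAZY, Mode.LAZY): "Channel",
    (Mode.LAZY, Mode.BUSY): "Command",
}


def classify(write_mode, read_mode):
    """Name the ACM type for a pair of write and read rules."""
    return ACM_TYPES[(write_mode, read_mode)]


def can_proceed(op, mode, unread):
    """Decide whether a 'write' or 'read' may go ahead right now.

    unread is True when the current item has not been read yet
    (assumed single-item model).
    """
    if mode is Mode.BUSY:
        return True           # busy accesses never wait
    if op == "write":
        return not unread     # lazy write waits for the item to be read
    return unread             # lazy read waits for unread data
```

For example, classify(Mode.BUSY, Mode.LAZY) gives "Signal", and a lazy write is held up while the previous item is still unread.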
24Signal vs Pool
Pool: Real time 1 (busy domain) to Real time 2 (busy domain)
Signal: Real time (busy domain) to Data-driven (lazy domain). Low power!
25Sample algorithms
Pool with 3 slots: fully asynchronous
Writer: wr: write slot n; w0: l := n; w1: n := not(l, r)
Reader: r0: r := l; rd: read slot r
Signal with 2 slots: conditionally asynchronous
Writer: wr: write slot w; w0: w := not r
Reader: r0: r := not r; rd: wait until w != r; read slot r
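A minimal sketch of the 3-slot pool statements, assuming the operators lost from the slide text are assignment and "not" (so w0 is l := n, w1 is n := not(l, r) and r0 is r := l). The class name, the initial values and the rule of picking the lowest free slot are assumptions. The methods are called one after another here, so the sketch shows the bookkeeping only, not concurrent behaviour.

```python
class ThreeSlotPool:
    """Sequential sketch of a 3-slot pool (busy write, busy read)."""

    def __init__(self, initial=None):
        self.slots = [initial, None, None]
        self.l = 0  # slot holding the latest complete item
        self.r = 0  # slot the reader is using
        self.n = 1  # slot the writer will use next

    def write(self, item):
        self.slots[self.n] = item  # wr: write slot n
        self.l = self.n            # w0: l := n
        # w1: n := not(l, r), i.e. a slot that is neither l nor r
        # (assumed tie-break: the lowest such slot).
        self.n = next(s for s in range(3) if s not in (self.l, self.r))

    def read(self):
        self.r = self.l            # r0: r := l
        return self.slots[self.r]  # rd: read slot r


pool = ThreeSlotPool()
pool.write("a")
pool.write("b")       # over-writes nothing the reader holds
first = pool.read()   # latest item
again = pool.read()   # re-read of the same item, never held up
```

Neither method ever waits, which is the sense in which the pool is fully asynchronous: the writer may over-write unread data and the reader may re-read.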
26What is a slot?
27Data Properties
28SIGNAL Data latency
If a reader cycle immediately follows a writer
cycle what data does it get?
29SIGNAL Data latency
Write X
post
30SIGNAL Data latency
Writer: write slot w; w := not r
Reader: r := not r; wait until w != r; read slot r
This implies 0 capacity.
Trade-off between slots, capacity and latency: a 3-slot signal has capacity 1, and does not make the reader wait as here.
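The latency question can be traced with a small sequential sketch of the 2-slot signal. It assumes the lost operators are assignment and "not", and that the reader's guard is "wait until w != r"; the class name, the initial values and the split of the reader into start_read, can_read and read are hypothetical.

```python
class TwoSlotSignal:
    """Sequential sketch of a 2-slot signal (busy write, lazy read)."""

    def __init__(self):
        self.slots = [None, None]
        self.w = 0  # slot the writer uses next
        self.r = 1  # slot the reader used last

    def write(self, item):
        self.slots[self.w] = item  # wr: write slot w
        self.w = 1 - self.r        # w0: w := not r

    def start_read(self):
        self.r = 1 - self.r        # r0: r := not r

    def can_read(self):
        return self.w != self.r    # guard: wait until w != r (assumed)

    def read(self):
        if not self.can_read():
            raise RuntimeError("reader must wait")
        return self.slots[self.r]  # rd: read slot r


sig = TwoSlotSignal()
sig.write("X")                 # a writer cycle completes first
sig.start_read()               # a reader cycle follows immediately
waits = not sig.can_read()     # the reader is held up
sig.write("Y")                 # the next write releases it
got = sig.read()
```

Under these assumptions the reader that follows the write of X is still made to wait and then receives Y, so X is never delivered. That is one way to read "this implies 0 capacity".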
31Modeling the algorithms
Example statement: w := not r (subnet W0 in the Signal)
Non-abstract models for ease of understanding
This statement is atomic; some statements need to be two-stage.
32Modeling the algorithms
setting
referencing
33Sub-models and the enable place
34Sub-models and the enable place
35Metastability
36a normal state-transition
37Metastability
38Metastable transients
39Metastability
Keep away from data path!
40Analysis and Some Results
Exhaustive reachability search: all process interleavings covered.
ACM | Control | Arbiter | Capacity
3 slot pool | 1,2,3 | Arbiter req. | 1delay
4 slot pool | 0,1 | No arbiter | 1
2 slot signal | 0,1 | No arbiter | 01
3 slot signal | 1,2,3 | No arbiter | 1
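To give a feel for what an exhaustive reachability search involves, here is a toy explicit-state search over a 3-slot pool. It is a sketch only, not the Petri net analysis behind the table. It assumes every statement is atomic, which hides the arbitration question; the statements are read as wr: write slot n, w0: l := n, w1: n := not(l, r), r0: r := l, rd: read slot r; and all function names and the initial state are hypothetical.

```python
from collections import deque


def not_of(l, r):
    """Lowest slot that is neither l nor r (assumed tie-break)."""
    return min(s for s in range(3) if s not in (l, r))


def successors(state):
    """All states reachable by one atomic writer or reader step."""
    wpc, rpc, l, r, n = state
    # Writer step: wr -> w0 -> w1 -> wr.
    if wpc == "wr":
        yield ("w0", rpc, l, r, n)             # slot n has been written
    elif wpc == "w0":
        yield ("w1", rpc, n, r, n)             # l := n
    else:
        yield ("wr", rpc, l, r, not_of(l, r))  # n := not(l, r)
    # Reader step: r0 -> rd -> r0.
    if rpc == "r0":
        yield (wpc, "rd", l, l, n)             # r := l
    else:
        yield (wpc, "r0", l, r, n)             # slot r has been read


def conflict(state):
    """Writer and reader are accessing the same slot at the same time."""
    wpc, rpc, l, r, n = state
    return wpc == "wr" and rpc == "rd" and n == r


def explore(initial):
    """Breadth-first search of every interleaving from the initial state."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


reachable = explore(("wr", "r0", 0, 0, 1))
violations = [s for s in reachable if conflict(s)]
```

With atomic statements the list of violations comes out empty: the writer's slot n differs from both l and r whenever a write is in progress. Modelling non-atomic statements would need the two-stage treatment mentioned earlier.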
41VLSI design layout (chip fabricated in June 2000 via EUROPRACTICE)
4-slot Pool ACM
424-slot ACM part
(details on testing in 9th Async UK Forum paper)
43Conclusion
44Current and Future work
Applications: distributed CCTV, control systems.
Modelling of ACMs in system - analysis - (Moses/Metropolis?)
- Open questions
  - Have the best ACM algorithms been found?
  - How should "best" be defined?
45Acknowledgements
More info on team and projects
Other team members
Tony Davies, David Fraser, David Kinniment, Albert Koelmans, Graeme Chester, Fei Hao, Maria Valera, Sergio Velastin
Collaborators
Hugo Simpson, Eric Campbell
Coherent project
Grants GR/32895, GR/32666
http://async.org.uk/coherent/