Operation Chaining Asynchronous Pipelined Circuits - PowerPoint PPT Presentation


About This Presentation


Slides: 31

Provided by: Gir957


Transcript and Presenter's Notes


1
Operation Chaining Asynchronous Pipelined Circuits

Girish Venkataramani
Seth C. Goldstein

2
Introduction
Operation chaining coalesces nodes across time steps
Optimizes energy efficiency in bundled-data asynchronous pipelined circuits
Formulated as a vertex covering problem
Implemented within CASH [IWLS 04]
Average energy-delay improves by 1.4x
[Figure: nodes in time steps t1 and t2 coalesced into time step t1]

3
Outline

Background and Motivation
Problem Formulation and Related Work
Algorithm Overview
Experimental Results
Conclusions

4
Asynchronous Pipelines
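[Figure: two-stage asynchronous pipeline. On the datapath, ALU1 (Stage 1) and ALU2 (Stage 2) each feed a latch; on the control path, each stage has a matched delay and a handshake controller (H/S), linked by Req and Ack signals.]

Self-timed circuits
Each stage is clocked by its handshake controller (H/S)
Protocol-based communication
Dynamically scheduled using Req/Ack signals
Can be clustered into stages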
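The Req/Ack sequencing on one bundled-data channel can be sketched as an event trace. This is a minimal, idealized sketch assuming the standard four-phase (return-to-zero) order; the function name `four_phase_trace` and the event labels are hypothetical and do not come from CASH.

```python
from typing import List


def four_phase_trace(tokens: List[int]) -> List[str]:
    """Idealized event order for sending tokens over a four-phase bundled-data channel."""
    events: List[str] = []
    for value in tokens:
        events.append(f"data={value}")  # sender drives the data bundle first
        events.append("req+")           # req rises after the matched delay: data is stable
        events.append("ack+")           # receiver latches the data and acknowledges
        events.append("req-")           # return-to-zero: sender releases req
        events.append("ack-")           # receiver releases ack; channel is idle again
    return events


if __name__ == "__main__":
    print(four_phase_trace([7]))
```

Every token costs four control transitions, which is part of the control-path activity that op-chaining aims to reduce.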
5
Motivation for Operation Chaining

Energy profile for the gsm_encoder kernel
Generated by CASH [IWLS 04, TCAD 06]
Protocol: four-phase bundled data

[Figure: energy profile; labeled components: C, Delay, H/S, reg]
6
The Op-Chaining Idea
[Figure: pipeline stages built from C, Delay, H/S and reg blocks, before and after op-chaining]

Benefits
Eliminated an H/S controller and register
Eliminated some H/S signals
Faster datapath
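The latency benefit can be estimated with a back-of-the-envelope model. This sketch assumes each stage adds a fixed overhead (latch plus handshake) on top of its computation delay; the delays, the overhead and the function names are hypothetical values chosen for illustration.

```python
from typing import List


def pipelined_latency(op_delays: List[float], stage_overhead: float) -> float:
    """Fully pipelined: every operation is its own stage and pays the stage overhead."""
    return sum(d + stage_overhead for d in op_delays)


def chained_latency(op_delays: List[float], stage_overhead: float) -> float:
    """All operations chained into a single stage: the overhead is paid once."""
    return sum(op_delays) + stage_overhead


if __name__ == "__main__":
    ops = [2.0, 3.0, 1.0]  # hypothetical per-operation delays (ns)
    overhead = 1.5         # hypothetical latch + handshake overhead per stage (ns)
    print(pipelined_latency(ops, overhead))  # 10.5
    print(chained_latency(ops, overhead))    # 7.5
```

The model captures latency only; it ignores the loss of pipeline parallelism (throughput) that chaining also causes.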
7
The Op-Chaining Idea
[Figure: pipeline stages built from C, Delay, H/S and reg blocks]

Objective of Operation Chaining: minimize overall energy without degrading performance compared to a fully pipelined system

Drawbacks
Reduced pipeline parallelism
Increased datapath power
Registers act as glitch filters, so removing them lets glitches propagate
8
Outline

Background and Motivation
Problem Formulation and Related Work
Algorithm Overview
Experimental Results
Conclusions

9
Problem Formulation

Given G(V, E)
Node v is a potential pipeline stage
Edge (u, v) is a potential bundled-data channel
Find a vertex cover, S = {S1, ..., Sn}
Each sub-graph Si is a pipeline stage
For a fully pipelined system, S = V (every node is its own stage)

Constraints
Correctness is preserved
Performance is equal to or better than that of the fully pipelined system
Objective: maximize energy savings
Control-path energy savings > datapath energy increase
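Read literally, the stages S1, ..., Sn cover V. The sketch below additionally assumes they are disjoint, so each node belongs to exactly one stage, and counts how many edges remain bundled-data channels. The function names and the example graph are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def is_partition(nodes: Set[str], stages: List[Set[str]]) -> bool:
    """True if the stages are non-empty, pairwise disjoint and together cover all nodes."""
    seen: Set[str] = set()
    for stage in stages:
        if not stage or seen & stage:
            return False
        seen |= stage
    return seen == nodes


def channel_count(edges: List[Tuple[str, str]], stages: List[Set[str]]) -> int:
    """Edges whose endpoints sit in different stages remain bundled-data channels."""
    stage_of: Dict[str, int] = {v: i for i, stage in enumerate(stages) for v in stage}
    return sum(1 for u, v in edges if stage_of[u] != stage_of[v])


if __name__ == "__main__":
    V = {"a", "b", "c", "d"}
    E = [("a", "b"), ("b", "c"), ("c", "d")]
    fully_pipelined = [{v} for v in sorted(V)]  # S = V: one stage per node
    chained = [{"a"}, {"b", "c"}, {"d"}]        # b and c chained into one stage
    print(is_partition(V, fully_pipelined), channel_count(E, fully_pipelined))  # True 3
    print(is_partition(V, chained), channel_count(E, chained))                  # True 2
```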

10
Related Work

Minimizing registers in asynchronous circuits is NP-complete [Kim, ICCAD 00]
Control path left unchanged
Retiming in the synchronous world
Clock-cycle constraints exist
Number of registers per pipeline loop is constant
This work addresses both pipeline register and handshake controller minimization
NP-hard
Potential for more energy savings

11
Outline

Background and Motivation
Problem Formulation and Related Work
Algorithm Overview
Experimental Results
Conclusions

12
Requirements

Correctness of solution
Output functionality must be preserved
System should be deadlock-free
Predict impact of op-chaining on performance
Use Global Critical Path and global slack [DAC 07]
Predict impact on datapath energy
Bit-operations heuristic
Simplifying constraint: single-output stages

13
Divide-and-Conquer Strategy
Use constraints to partition the graph
Dynamic programming evaluates candidates
Dual problem: find the stages that are assigned registers
14
Correctness Constraints

Pre-assign some nodes to contain registers and
handshake controllers
I/O functionality
Deadlock-free guarantees
Every primary input and output node of G (the set I/O) must contain a handshake controller and a register
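A minimal sketch of how the I/O nodes could be found, assuming primary inputs are nodes without predecessors and primary outputs are nodes without successors (CASH may identify them differently); `primary_io` is a hypothetical helper.

```python
from typing import List, Set, Tuple


def primary_io(nodes: List[str], edges: List[Tuple[str, str]]) -> Set[str]:
    """Nodes that must be pre-assigned a handshake controller and a register."""
    has_pred = {v for _, v in edges}
    has_succ = {u for u, _ in edges}
    inputs = {v for v in nodes if v not in has_pred}   # primary inputs
    outputs = {v for v in nodes if v not in has_succ}  # primary outputs
    return inputs | outputs


if __name__ == "__main__":
    print(sorted(primary_io(["a", "b", "c", "d"], [("a", "c"), ("b", "c"), ("c", "d")])))
```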

15
Deadlock Constraints

Protocol induces pipeline requirements
A loop with d execution threads requires Regs(d) registers
Pre-assign (at least) Regs(d) stages in every pipeline loop to contain registers
Let all pre-assigned stages be
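The loop constraint can be checked by enumerating the loops of a small graph. The sketch assumes a single required count per loop, standing in for Regs(d), whose protocol-dependent value is not given here; `simple_cycles` and `unsafe_loops` are hypothetical names, and the brute-force enumeration only suits small examples.

```python
from typing import Dict, List, Set


def simple_cycles(succ: Dict[str, List[str]]) -> List[List[str]]:
    """All elementary loops, each reported once, starting from its smallest node.

    Brute-force DFS: exponential in general, fine for small examples only.
    """
    cycles: List[List[str]] = []
    for start in sorted(succ):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in succ.get(node, []):
                if nxt == start:
                    cycles.append(path)  # closed a loop back to the start node
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))  # only larger nodes: no duplicates
    return cycles


def unsafe_loops(succ: Dict[str, List[str]], registered: Set[str],
                 required: int) -> List[List[str]]:
    """Loops holding fewer than `required` registered stages.

    `required` stands in for Regs(d); its real value depends on the protocol
    and on the number of execution threads d, and is an assumption here.
    """
    return [loop for loop in simple_cycles(succ)
            if len(registered.intersection(loop)) < required]


if __name__ == "__main__":
    g = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
    print(unsafe_loops(g, {"a"}, 2))  # the loop a -> b -> c needs another register
```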

16
Constraints Partition the Graph
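[Figure: example graph with nodes a, b, c, d, e, h, i, k, m, n, r, s, v, w, x, z, partitioned into candidate sub-graphs by the constraints]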
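One plausible reading of this partitioning step, sketched under the assumption that a pre-assigned (registered) node always ends a stage: cut every edge leaving a pre-assigned node and take the weakly connected components of what remains as candidate sub-graphs. The helper name and the example graph are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def candidate_regions(nodes: List[str], edges: List[Tuple[str, str]],
                      pre_assigned: Set[str]) -> List[Set[str]]:
    """Split the graph into candidate sub-graphs bounded by registered nodes."""
    parent: Dict[str, str] = {v: v for v in nodes}

    def find(v: str) -> str:  # union-find root lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        if u in pre_assigned:
            continue  # a registered output always ends a stage, so cut the edge
        parent[find(u)] = find(v)

    groups: Dict[str, Set[str]] = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
    print(candidate_regions(["a", "b", "c", "d", "e"], chain, {"a", "c", "e"}))
```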
17
Single-Output Constraint
y post-dominates x if all paths from x to a primary output pass through y
Use post-dominator relationships to further partition each candidate sub-graph
[Figure: candidate sub-graph with nodes a, b, c, d, e, n, s, v, w; annotation: "Cannot be within an op-chained sub-graph"]
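Post-dominator sets can be computed with the standard iterative dataflow equation pdom(v) = {v} ∪ ⋂ pdom(s) over the successors s of v. The sketch assumes a virtual exit node fed by every primary output (a node without successors); the names are hypothetical.

```python
from typing import Dict, List, Set

EXIT = "__exit__"  # virtual sink fed by every primary output (an assumption of this sketch)


def post_dominators(succ: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """pdom[v] = nodes lying on every path from v to a primary output (v included)."""
    nodes = set(succ) | {s for targets in succ.values() for s in targets}
    # Primary outputs (no successors) are routed to the virtual exit.
    out = {v: (list(succ.get(v, [])) or [EXIT]) for v in nodes}
    pdom: Dict[str, Set[str]] = {v: nodes | {EXIT} for v in nodes}  # start from "everything"
    pdom[EXIT] = {EXIT}
    changed = True
    while changed:  # iterate the dataflow equation to a fixed point
        changed = False
        for v in nodes:
            new = {v} | set.intersection(*(pdom[s] for s in out[v]))
            if new != pdom[v]:
                pdom[v] = new
                changed = True
    return {v: pdom[v] - {EXIT} for v in nodes}


if __name__ == "__main__":
    g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
    print(sorted(post_dominators(g)["a"]))  # b is not included: the path a -> c -> d avoids it
```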
18
Post-Dominator Trees
[Figure: candidate sub-graph with nodes a, b, c, d, e, n, s, v, w and the post-dominator trees derived from it]
All post-dominator tree roots must contain registers
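Given post-dominator sets, each node's immediate post-dominator is its closest strict post-dominator, i.e. the one whose own post-dominator set is largest, because the strict post-dominators of a node form a chain. This hypothetical helper builds the resulting forest; per the slide, every root it returns must hold a register. The input is a small hand-computed example.

```python
from typing import Dict, List, Set, Tuple


def post_dominator_forest(pdom: Dict[str, Set[str]]) -> Tuple[Dict[str, str], List[str]]:
    """Return (immediate post-dominator of each non-root node, sorted list of tree roots)."""
    ipdom: Dict[str, str] = {}
    roots: List[str] = []
    for v, doms in pdom.items():
        strict = doms - {v}
        if not strict:
            roots.append(v)  # nothing post-dominates v: it roots a tree and needs a register
        else:
            # Strict post-dominators form a chain; the closest one has the largest set.
            ipdom[v] = max(strict, key=lambda p: len(pdom[p]))
    return ipdom, sorted(roots)


if __name__ == "__main__":
    # Hand-computed post-dominator sets for a -> {b, c} -> d -> e.
    sets = {"a": {"a", "d", "e"}, "b": {"b", "d", "e"}, "c": {"c", "d", "e"},
            "d": {"d", "e"}, "e": {"e"}}
    print(post_dominator_forest(sets))
```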
19
Evaluate Single-Output Candidates
x
i
k
w
m
d
c
z
v
n
s
e
r
b
h
a
Each candidate is evaluated independent of each
other. Final step Find best partitions within
each candidate ? Partition the
post-dominator tree of each candidate
20
Performance Cost
Cycle time determines system-level timing
Given by the largest cycle in the underlying Petri net
Global Critical Path [DAC 07]
Compute global slack for each node
How much can a node be delayed without affecting the cycle time?
Timing budget for system-level timing
[Figure: example graph with nodes a, b, c, e, m; annotated node delays D(·) = 1 and D(·) = 3]
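Global slack as defined here comes from cycles of the underlying Petri net, which is beyond a short example. As a simplified stand-in, this sketch computes classic static-timing slack on an acyclic graph: arrival times forward, required times backward from a timing budget, and slack = required - arrival. The budget, delays and names are assumptions; it uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def node_slack(delay: Dict[str, float], succ: Dict[str, List[str]],
               budget: float) -> Dict[str, float]:
    """How much each node can be delayed before some path exceeds `budget`."""
    preds: Dict[str, List[str]] = {v: [] for v in delay}
    for u, targets in succ.items():
        for v in targets:
            preds[v].append(u)
    order = list(TopologicalSorter(preds).static_order())  # predecessors first
    arrival: Dict[str, float] = {}
    for v in order:  # latest time v's result is ready
        arrival[v] = delay[v] + max((arrival[p] for p in preds[v]), default=0.0)
    required: Dict[str, float] = {}
    for v in reversed(order):  # latest time v may finish without violating the budget
        required[v] = min((required[s] - delay[s] for s in succ.get(v, [])), default=budget)
    return {v: required[v] - arrival[v] for v in order}


if __name__ == "__main__":
    d = {"a": 1, "b": 3, "c": 1, "d": 1}
    g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    print(node_slack(d, g, budget=5))  # only c, off the critical path, has slack
```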
21
Performance Cost
For each primary input, there exists an ack signal
Find the increase in response time of each ack at the primary inputs
Compare it with the global slack of each ack, leading to the constraint: increase in ack response time ≤ global slack
[Figure: ack response time before and after op-chaining]
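The resulting timing check is a per-ack comparison. A minimal sketch with hypothetical names and numbers; in practice the response times and slacks would come from timing analysis before and after chaining.

```python
from typing import Dict


def timing_violations(before: Dict[str, float], after: Dict[str, float],
                      slack: Dict[str, float]) -> Dict[str, float]:
    """Acks whose response-time increase exceeds their global slack, with the increase."""
    violations: Dict[str, float] = {}
    for ack, t_before in before.items():
        increase = after[ack] - t_before
        if increase > slack[ack]:  # this delay would no longer be hidden by slack
            violations[ack] = increase
    return violations


if __name__ == "__main__":
    before = {"ack_a": 4.0, "ack_b": 6.0}
    after = {"ack_a": 5.0, "ack_b": 9.0}
    slack = {"ack_a": 2.0, "ack_b": 1.0}
    print(timing_violations(before, after, slack))  # reject this chaining candidate
```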
22
Datapath Power Cost

Glitches (spurious intermediate results) in the datapath cause useless switching
Registers between stages filter these glitches
With op-chaining, glitches increase
Heuristic: bit-operations are an indicator of potential glitching
Roughly, the number of standard-cell gates
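The bit-operations estimate can be sketched as a weighted count per operator. The weights below (linear in bit width for bitwise operators and adders, quadratic for multipliers) are assumptions for illustration, not the values CASH uses; the helper names are hypothetical.

```python
from typing import List, Tuple


def bit_ops(op: str, width: int) -> int:
    """Hypothetical gate-level size of an n-bit operator."""
    if op in ("and", "or", "xor", "not", "add", "sub"):
        return width          # assumed linear in the bit width
    if op == "mul":
        return width * width  # array multiplier: assumed quadratic in the bit width
    raise ValueError(f"unknown operator: {op}")


def stage_bit_ops(ops: List[Tuple[str, int]]) -> int:
    """Total bit-operations of the operations chained into one stage."""
    return sum(bit_ops(op, width) for op, width in ops)


if __name__ == "__main__":
    print(stage_bit_ops([("add", 32), ("mul", 16)]))  # 32 + 256
```

A larger total suggests more gates can glitch once the register between them is removed.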

23
Algorithm Overview
Step 1: Pre-assign nodes
Step 2: Enumerate post-dominator (PD) tree candidates
Step 3: Dynamic programming evaluates timing/power costs
Step 4: Partition each PD tree independently
Requirements addressed: correctness, single-output, timing, power
Complexity: O(|E|·|V|²)
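One way to picture the dynamic-programming step is a bottom-up DP over a single post-dominator tree. The cost model here is an assumption for illustration, not the authors' formulation: a fixed `reg_cost` per register plus handshake controller, a per-node `glitch` penalty standing in for the bit-operations estimate, integer node delays, and a per-stage delay `budget` standing in for global slack. Since PD trees are partitioned independently, the function would run once per tree.

```python
import math
from typing import Dict, List


def min_chaining_cost(children: Dict[str, List[str]], delay: Dict[str, int],
                      glitch: Dict[str, float], reg_cost: float,
                      root: str, budget: int) -> float:
    """Minimum total cost of a hypothetical cost model on one post-dominator tree.

    Each non-root node is either registered (pays reg_cost) or chained into its
    parent's stage (pays glitch[node]). The root is always registered. A stage's
    delay is the longest chained path ending at its registered node and must not
    exceed `budget`.
    """

    def cone(v: str) -> Dict[int, float]:
        # Maps: delay of the open stage ending at v -> cheapest cost of v's subtree,
        # not counting v's own register or glitch cost.
        table: Dict[int, float] = {0: 0.0}  # max delay of chained inputs so far -> cost
        for c in children.get(v, []):
            sub = cone(c)
            cost_if_registered = reg_cost + min(sub.values(), default=math.inf)
            merged: Dict[int, float] = {}
            for t, cost in table.items():
                # Option 1: register c, so it adds no combinational delay to v's stage.
                merged[t] = min(merged.get(t, math.inf), cost + cost_if_registered)
                # Option 2: chain c into v's stage; its delay feeds v's inputs.
                for tc, cc in sub.items():
                    key = max(t, tc)
                    merged[key] = min(merged.get(key, math.inf), cost + cc + glitch[c])
            table = merged
        # Add v's own delay and drop stages that would exceed the timing budget.
        return {t + delay[v]: cost for t, cost in table.items() if t + delay[v] <= budget}

    return reg_cost + min(cone(root).values(), default=math.inf)


if __name__ == "__main__":
    kids = {"r": ["a", "b"], "a": ["c"]}
    d = {"r": 1, "a": 1, "b": 1, "c": 1}
    g = {"a": 1.0, "b": 1.0, "c": 1.0}
    print(min_chaining_cost(kids, d, g, reg_cost=5.0, root="r", budget=2))  # 12.0
    print(min_chaining_cost(kids, d, g, reg_cost=5.0, root="r", budget=3))  # 8.0
```

With a budget of 2 the cheapest solution keeps one register besides the root's; with a budget of 3 every node can be chained into the root's stage. The O(|E|·|V|²) bound on the slide refers to the authors' algorithm; this sketch's cost grows with the number of distinct delay values and is not claimed to match it.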
24
Outline

Background and Motivation
Problem Formulation and Related Work
Algorithm Overview
Experimental Results
Conclusions

25
Experimental Results

Applied op-chaining within CASH, a C-to-asynchronous-circuits compiler
Benchmarks: Mediabench [Lee 97]
Circuits mapped to a 180 nm / 2 V STMicroelectronics standard-cell library
Four data points: Greedy, R1, R2, R3
Range from most aggressive (no timing or power constraints) to most conservative (strict timing and power constraints)
26
H/S and Regs Eliminated
27
Energy Efficiency
Relative to the fully pipelined system
[Charts: energy-delay and performance]
28
Outline