Performance Analysis and Optimization of Latency Insensitive Systems

1
Performance Analysis and Optimization of Latency
Insensitive Systems
  • Luca P. Carloni
  • Alberto L. Sangiovanni-Vincentelli

UC Berkeley
Design Automation Conference, Los Angeles, June
2000
2
Motivation: System-on-a-Chip Design
3
Sequential Modules and RTL Design
[Figure: sequential module with Primary Inputs, Combinational Logic, State Register, Output Register, Primary Outputs]
4
Block Diagram of a MAC Circuit
RTL Design separates functional specification
from performance analysis
5
Intra-Module Delay and Timing Constraints
Once all modules are composed, the overall
system works correctly as long as it runs
with a clock period $T_{clk} \geq \max\{T_1, T_2, T_3, T_4\}$
6
Impact of Inter-Module Path Delays
7
DSM: Percentage of Reachable Die
  • For a 0.06 micron process, a signal can reach only
    5% of the die's length in a clock cycle [D.
    Matzke (TI), 1997]
  • Cause: combination of high frequencies and slower
    wires

8
Need for a New Design Approach
  • To relax timing constraints during the early phases of
    the design, when accurate measures of the
    inter-module path delays are not available
  • To simplify the composition of sequential modules
    in pipeline mode
  • To facilitate the insertion of extra pipeline
    stages between one module and the next, in order
    to buffer signals that
    propagate on long wires

9
Latency Insensitive Design [ICCAD'99]
10
Latency Insensitive Design [ICCAD'99]
[Figure labels: RS]
11
Latency Insensitive Design [ICCAD'99]
[Figure labels: P3, P2, RS, RS]
12
Informative Events and Stalling Events
  • Each relay station introduces 1 stalling event
  • A module receiving a stalling event as input
    emits stalling events on its outputs at the next
    cycle

13
Advantages of LID Methodology
14
Robustness of LID Performance
Performance Loss (after relay station
insertion)
The latency insensitive protocol leaves
performance unaffected only if the design has
no feedback path between the sequential modules
15
Latency Insensitive Systems (LIS) Graph
  • Capture the structure of a Latency Insensitive
    System without getting lost in the details of
    the logic inside each sequential module
  • Focus on communication and synchronization
    properties, while neglecting the particular data
    items exchanged among the modules
  • Model the system performance by enabling
    early exploration as well as late adjustment
    of the latency-throughput trade-offs

16
LIS-Graph for the MAC Circuit

[Figure: LIS-graph of the MAC circuit; vertex labels: REG, Composite, MPY]
17
Weight of LIS-Graph Arcs
The weight of an arc is equal to the number of
relay stations on the corresponding channel
18
Equivalence of LIS-Graphs
19
Progressive Trace of a LIS-Graph Arc
20
Behavior of a LIS-Graph
The notion of LIS-graph behavior captures
the communication and synchronization
properties of a latency insensitive system
[Figure: LIS-graph with vertices S1, S2, S3, S4, S5]
21
Firing Semantics of a LIS-Graph
  • Independence Rule: every vertex $V_j$ fires the
    first informative event (number 1) on each
    outgoing arc $A_i = (V_j, V_k)$. However, if arc $A_i$
    has weight $w(A_i)$, the down-link vertex $V_k$ will
    observe $w(A_i)$ stalling events before seeing the
    first informative event from $V_j$
  • AND-Causality Rule: every vertex $V_j$ fires the
    n-th informative event only after the (n-1)-th
    informative event has appeared on each arc
    entering $V_j$
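
The two firing rules can be read as a recurrence on firing times. The Python sketch below is one possible reading, not the authors' tool. It assumes that each relay station delays an event by one clock cycle, that a vertex fires at most one informative event per cycle, and that the n-th event of a vertex fires one cycle after the (n-1)-th event of every predecessor has crossed the relay stations on its arc. The name `firing_times` and the arc tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (tail, head, weight), where weight = relay stations on the channel
Arc = Tuple[str, str, int]


def firing_times(vertices: List[str], arcs: List[Arc],
                 n_events: int) -> Dict[str, List[int]]:
    """For each vertex, list the clock cycles at which it fires
    informative events 1..n_events (index 0 holds event 1)."""
    # Independence Rule: every vertex fires event 1 right away (cycle 0).
    times = {v: [0] for v in vertices}
    for n in range(1, n_events):
        for v in vertices:
            # Assumption: at most one informative event per cycle.
            earliest = times[v][n - 1] + 1
            for tail, head, w in arcs:
                if head == v:
                    # AND-Causality Rule: the previous event of `tail` must
                    # have crossed the w relay stations before v fires again.
                    earliest = max(earliest, times[tail][n - 1] + w + 1)
            times[v].append(earliest)
    return times


# Two modules in a feedback loop, one relay station on the arc A -> B.
loop = firing_times(["A", "B"], [("A", "B", 1), ("B", "A", 0)], 6)
```

Under these assumptions, vertex A in the loop fires at cycles 0, 1, 3, 4, 6, 7: two informative events every three cycles, so a single relay station on a two-module feedback loop costs one third of the throughput.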

22
Cycle Means and System Throughput
23
Computing the Maximum Cycle Mean
  • Acyclic LIS-Graph (pipelined system with no
    feedback): $Thp(G) = MCM(G) = 1$
  • Cyclic LIS-Graph (1 Strongly Connected Component
    (SCC)): all $K$ cycles can be detected in
    $O((V + A)(K + 1))$
  • Cyclic LIS-Graph (more than 1 SCC): use Tarjan's
    algorithm to detect all SCCs,
    then derive the largest MCM among all SCCs
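
As an illustration of the maximum cycle mean computation, here is a minimal Python sketch. It rests on two assumptions that this transcript does not state explicitly: the mean of a cycle is taken to be (total arc weight + number of vertices) / number of vertices, and the throughput is taken to be 1 / MCM. It also swaps in a brute-force enumeration of simple cycles in place of the O((V + A)(K + 1)) cycle detection and the SCC decomposition named above, so it is only suitable for small graphs. All function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Iterator, List, Tuple

# vertex -> [(successor, arc weight)]; every vertex must appear as a key
Graph = Dict[str, List[Tuple[str, int]]]


def simple_cycles(graph: Graph) -> Iterator[Tuple[List[str], int]]:
    """Yield (vertices, total weight) for every simple cycle (brute force)."""
    order = {v: i for i, v in enumerate(sorted(graph))}

    def dfs(start: str, v: str, path: List[str], weight: int):
        for succ, w in graph[v]:
            if succ == start:
                yield list(path), weight + w
            elif order[succ] > order[start] and succ not in path:
                # Only visit vertices ranked after `start`, so each cycle
                # is reported once, from its lowest-ranked vertex.
                path.append(succ)
                yield from dfs(start, succ, path, weight + w)
                path.pop()

    for start in sorted(graph):
        yield from dfs(start, start, [start], 0)


def max_cycle_mean(graph: Graph) -> Fraction:
    """Assumed cycle mean: (weight + cardinality) / cardinality.
    An acyclic graph gets MCM = 1."""
    best = Fraction(1)
    for cycle, weight in simple_cycles(graph):
        best = max(best, Fraction(weight + len(cycle), len(cycle)))
    return best


def throughput(graph: Graph) -> Fraction:
    """Assumed relation: throughput = 1 / MCM."""
    return 1 / max_cycle_mean(graph)


pipeline = {"A": [("B", 2)], "B": []}          # no feedback
loop = {"A": [("B", 1)], "B": [("A", 0)]}      # one relay station in a loop
```

With these definitions the pipeline keeps a throughput of 1 regardless of its arc weights, while the two-vertex loop with one relay station has MCM 3/2 and throughput 2/3.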

24
Recycling an Illegal LIS-Graph
  • Annotated LIS-Graph: each arc $a_i$ has a length
    $l(a_i)$ that corresponds to the smallest multiple
    of the clock period that is larger than the delay
    of the channel associated with the arc
  • Illegal Arc: iff $w(a_i) < l(a_i) - 1$
  • Illegal LIS-Graph: iff it contains an illegal arc
  • Recycling Operation: legalize a graph by
    increasing the weights of illegal arcs (i.e.
    adding relay stations to the corresponding
    channels)

25
Recycling: Legalization and Equalization
  • Legalization: after deriving the annotated
    LIS-graph $G$, legalize it by augmenting the weight
    of each illegal arc $a_i$ by
    $\Delta w(a_i) = l(a_i) - 1 - w(a_i)$
  • Equalization: compute the max throughput $T_k$
    sustainable by each SCC $S_k$ in the legalized graph
    $G$ and equalize them by distributing $N_k$ extra
    relay stations on the critical cycle $C_k$ of $S_k$
  • Key Point: avoid being forced to augment weights
    of cycles having small cardinality
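
The legalization step can be sketched in a few lines of Python. This is a sketch under assumptions: arc lengths are given directly as integers in units of clock periods, and the names `is_illegal` and `legalize` and the dictionary layout are hypothetical. Equalization is not covered here.

```python
from typing import Dict, Tuple

Arc = Tuple[str, str]  # (tail, head)


def is_illegal(weight: int, length: int) -> bool:
    """An arc is illegal iff w(a) < l(a) - 1."""
    return weight < length - 1


def legalize(weights: Dict[Arc, int], lengths: Dict[Arc, int]) -> Dict[Arc, int]:
    """Return new weights where every illegal arc is raised by
    delta_w(a) = l(a) - 1 - w(a); legal arcs are left untouched."""
    legal = {}
    for arc, w in weights.items():
        if is_illegal(w, lengths[arc]):
            legal[arc] = w + (lengths[arc] - 1 - w)  # add relay stations
        else:
            legal[arc] = w
    return legal


weights = {("A", "B"): 0, ("B", "C"): 2, ("C", "A"): 1}
lengths = {("A", "B"): 3, ("B", "C"): 2, ("C", "A"): 2}
legal_weights = legalize(weights, lengths)
```

In this example only the arc from A to B is illegal (weight 0, length 3), so it receives two relay stations and the other two arcs keep their weights.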

26
Case Study: MPEG-2 Video Encoder
[Figure: block diagram with blocks Input, Preprocessing, Frame Memory, DCT, Quantizer (Q), IDCT, Motion Compensation, Frame Memory, Motion Estimation, Regulator, VLC Encoder, Buffer, Output]
27
LIS-graph of MPEG-2 Video Encoder
[Figure: LIS-graph with vertices S, V1 to V15, and T]
28
Detecting Cycles in MPEG-2 LIS-graph
[Figure: MPEG-2 LIS-graph (vertices S, V1 to V15, T) with cycles C1 to C6 marked]
29
MPEG-2: Throughput Degradation
[Plot; legend: cycle cardinality 3, 4, 5, 8, 9, 10; axis label: cycle weight]
30
Moving Around the Latency - 1
[Figure: MPEG-2 LIS-graph (vertices S, V1 to V15, T) annotated with the critical cycle and thp(G)]
31
Moving Around the Latency - 2
[Figure: MPEG-2 LIS-graph (vertices S, V1 to V15, T) annotated with the critical cycle and thp(G)]
32
Practical Guidelines for LI Design
  • All modules should put comparable timing
    constraints on the global clock
  • Modules whose corresponding LIS-graph nodes
    belong to the same cycle should be kept close
    while deriving the final implementation
  • Relay Station Insertion should be automatically
    performed in a way similar to Buffer Insertion

33
Conclusions
  • LIS-graphs are a formal model to analyze the
    properties of a Latency Insensitive System
  • Recycling is a rigorous method to capture latency
    variations of the communication channels and to
    compute exactly the final throughput of the
    system
  • The MPEG-2 Case Study shows that the present work
    enables the exploration of latency/throughput
    trade-offs at any stage of the design process
    and facilitates the integration of pre-designed IP
    cores on a single chip.
