Lecture 4 Clocked DataFlow Models - PowerPoint PPT Presentation



1
Lecture 4: Clocked Data-Flow Models
  • Forrest Brewer

2
Data Flow Model Hierarchy
  • Kahn Process Networks (KPN) (asynchronous)
  • Dataflow Networks
  • special case of KPN
  • actors, tokens and firings
  • Static Data Flow (Clocked automata assumptions)
  • special case of DN
  • static scheduling
  • code generation
  • buffer sizing (resources!!)
  • Other Data Flow models
  • Boolean Data Flow
  • Dynamic Data Flow
  • Sequence Graphs, Dependency Graphs, Data Flow
    Graphs
  • Control Data Flow

3
Data Flow Models
  • Powerful formalism for data-dominated system
    specification
  • Partially-ordered model (avoids over-specification
    of sequencing)
  • Deterministic execution independent of
    scheduling
  • Used for
  • simulation
  • scheduling
  • memory allocation
  • code generation
  • for Digital Signal Processors (HW and SW)

4
Data Flow Networks
  • A Data Flow Network is a collection of actors
    which are connected and communicate over
    unbounded FIFO queues
  • Actor firing follows firing rules
  • Firing rule: number of required tokens on inputs
  • Function: number of consumed and produced tokens

  • Actors are functional, i.e., have no internal
    state
  • Breaking the processes of KPNs down into smaller
    units of computation makes implementation
    (scheduling) easier
  • Tokens carry values
  • integer, float, audio samples, image of pixels
  • Network state: number of tokens in FIFOs
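The firing discipline above can be sketched in a few lines of Python (an illustrative sketch, not from the slides; the `Actor` class, the rate lists, and the adder example are all invented names):

```python
from collections import deque

class Actor:
    """Functional dataflow actor: fires only when every input FIFO
    holds enough tokens, consumes them, and produces output tokens."""
    def __init__(self, func, inputs, outputs, rates):
        self.func = func        # pure function: token lists -> tuple of output tokens
        self.inputs = inputs    # input FIFO queues (deques)
        self.outputs = outputs  # output FIFO queues
        self.rates = rates      # firing rule: tokens required per input FIFO

    def can_fire(self):
        return all(len(q) >= r for q, r in zip(self.inputs, self.rates))

    def fire(self):
        args = [[q.popleft() for _ in range(r)]
                for q, r in zip(self.inputs, self.rates)]
        for q, token in zip(self.outputs, self.func(*args)):
            q.append(token)

# A two-input adder: consumes one token from each input per firing.
a, b, out = deque([1, 2, 3]), deque([10, 20, 30]), deque()
add = Actor(lambda xs, ys: (xs[0] + ys[0],), [a, b], [out], [1, 1])
while add.can_fire():
    add.fire()
print(list(out))  # -> [11, 22, 33]
```

Because the actor is functional (no internal state), the output is the same no matter how its firings are interleaved with other actors.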

5
Intuitive semantics
  • At each time, one actor is fired
  • More can fire, but firing one at a time is always
    safe (atomic firing)
  • When firing, actors consume input tokens and
    produce output tokens
  • Actors can be fired only if there are enough
    tokens in the input queues

6
Filter example
  • Example FIR filter
  • single input sequence i(n)
  • single output sequence o(n)
  • o(n) = c1·i(n) + c2·i(n-1)

(Figure: dataflow graph of the filter — input i feeds a c1-multiplier directly and a c2-multiplier through a delay queue holding the initial token i(-1); an adder produces o)
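As a sketch of how the filter maps onto this model, the two-tap FIR can be run as a tiny dataflow net in Python (illustrative code, not from the slides; the delay edge is a FIFO pre-loaded with the initial token i(-1)):

```python
from collections import deque

def fir_network(samples, c1, c2, i_minus_1=0):
    """Two-tap FIR o(n) = c1*i(n) + c2*i(n-1) as a dataflow net.
    The delay edge is modeled by a FIFO holding the token i(-1)."""
    delayed = deque([i_minus_1])  # initial token on the i(n-1) edge
    out = []
    for x in samples:             # one loop iteration = one round of firings
        delayed.append(x)         # fork: the sample also feeds the delay queue
        out.append(c1 * x + c2 * delayed.popleft())
    return out

print(fir_network([1, 2, 3], c1=2, c2=1))  # -> [2, 5, 8]
```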
17
Scheduling Data Flow
  • Given a set of Actors and Dependencies
  • How to construct valid execution sequences?
  • Static Scheduling
  • Assume that you can predefine the execution
    sequence
  • FSM Scheduling
  • Sequencing defined as control-dependent FSM
  • Dynamic Scheduling
  • Sequencing determined dynamically (at run time) by
    predefined rules
  • In all cases, need to not violate resource or
    dependency constraints
  • In general, both actors and resources can
    themselves have sequential (FSM) behaviors

18
A RISC Instruction Execution
19
Another RISC Instruction Execution
20
A Complete RISC Behavior Graph
21
Scheduling Best Valid Sequences (cycle 0)
RISC Instruction Task
22
Scheduling Best Valid Sequences (cycle 1)
23
Scheduling Best Valid Sequences (cycle 2)
24
Scheduling Best Valid Sequences (cycle 3)
25
Scheduling Best Valid Sequences (cycle 4)
26
Examples of Data Flow actors
  • SDF: Synchronous (or Static) Data Flow
  • fixed number of input and output tokens per
    invocation
  • BDF: Boolean Data Flow
  • a control token determines the numbers of consumed
    and produced tokens

(Figure: BDF fork and join actors routing tokens to the T or F branch according to a control-token stream T T T F)
27
Examples of Data Flow actors
  • Sequence Graphs, Dependency Graph, Data Flow
    Graph
  • Each edge corresponds to exactly one value
  • No buffering
  • Special Case of SDF
  • CDFG: Control Data Flow Graphs
  • Adds branching (conditionals) and iteration
    constructs
  • Many different models for this

(Figure: dependency graph in which every edge carries exactly one token per invocation)
Typical model in many behavioral/architectural
synthesis tools
28
Synthesis in Temporal Domain
  • Scheduling and binding can be done in different
    orders or together
  • Schedule
  • Mapping of operations to time slots and binding to
    resources
  • A scheduled sequencing graph is a labeled graph

Gupta
29
Operation Types
  • Operations have types
  • Each resource may have several types and timing
    constraints
  • T is a relation that maps an operation to a
    resource by matching types
  • T : V → {1, 2, ..., nres}
  • In general
  • A resource type may implement more than one
    operation type (e.g., an ALU)
  • May have a family of timing constraints
    (data-dependent timing?!)
  • Resource binding
  • Notion of exclusive mapping
  • Pipeline resources or other state?
  • Arbitration
  • Choice linked to complexity of interconnect
    network

30
(No Transcript)
31
Scheduling and Binding
  • Resource constraints
  • Number of resource instances of each type: ak,
    k = 1, 2, ..., nres
  • Link, register, and communication resources
  • Scheduling
  • Timing of operation
  • Binding
  • Location of operation
  • Costs
  • Resources → area (power?)
  • Registers, steering logic (muxes, busses),
    wiring, control unit
  • Metric
  • Start time of the sink node
  • Might be affected by steering logic and schedule
    (control logic): resource-dominated vs.
    control-dominated

32
Architectural Optimization
  • Optimization in view of design space flexibility
  • A multi-criteria optimization problem
  • Determine schedule φ and binding β
  • Given area A, latency λ, and cycle time τ
    objectives
  • Find non-dominated points in solution space
  • Pareto-optimal solutions
  • Solution space tradeoff curves
  • Non-linear, discontinuous
  • Area / latency / cycle time (Power?, Slack?,
    Registers?, Simplicity?)
  • Evaluate (estimate) cost functions
  • Constrained optimization problems for resource
    dominated circuits
  • Min area: solve for minimal binding
  • Min latency: solve for minimum-λ scheduling

Gupta
33
Operation Scheduling
  • Input
  • Sequencing graph G(V, E), with n vertices
  • Cycle time τ
  • Operation delays D = {di : i = 0, ..., n}
  • Output
  • Schedule φ determines the start time ti of each
    operation vi
  • Latency λ = tn − t0
  • Goal determine area / latency tradeoff
  • Classes
  • Unconstrained
  • Latency or Resource constrained
  • Hierarchical (accommodate control transfer!)
  • Loop/Loop Pipelined

Gupta
34
Min Latency Unconstrained Scheduling
  • Simplest case: no constraints, find min latency
  • Given a set of vertices V, delays D and a partial
    order on operations E, find an integer labeling
    of operations φ : V → Z+ such that
  • ti = φ(vi)
  • ti ≥ tj + dj ∀ (vj, vi) ∈ E
  • λ = tn − t0 is minimum
  • Solvable in polynomial time
  • Bounds on latency for resource constrained
    problems

Algorithm?
ASAP algorithm uses topological order
35
ASAP Schedules
  • Schedule v0 at t0 = 0
  • While (vn not scheduled)
  • Select vi with all predecessors scheduled
  • Schedule vi at ti = max {tj + dj}, vj being a
    predecessor of vi
  • Return tn
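The ASAP procedure amounts to a longest-path computation in topological order. A minimal Python sketch (the graph encoding and vertex names are invented for illustration):

```python
def asap(succ, delay):
    """ASAP schedule: each operation starts as soon as all of its
    predecessors have finished. succ maps vertex -> successor list,
    delay maps vertex -> execution delay."""
    pred = {v: [] for v in succ}
    for v, ss in succ.items():
        for s in ss:
            pred[s].append(v)
    t = {}
    ready = [v for v in succ if not pred[v]]   # source vertices
    while ready:
        v = ready.pop()
        # start time = latest finish time over all predecessors
        t[v] = max((t[p] + delay[p] for p in pred[v]), default=0)
        ready += [s for s in succ[v]
                  if s not in t and all(p in t for p in pred[s])]
    return t

# Two multiplies (delay 2) feeding an add (delay 1), with NOP source/sink.
succ = {'v0': ['m1', 'm2'], 'm1': ['add'], 'm2': ['add'], 'add': ['vn'], 'vn': []}
delay = {'v0': 0, 'm1': 2, 'm2': 2, 'add': 1, 'vn': 0}
print(asap(succ, delay))  # sink vn starts at 2 + 1 = 3 (the minimum latency)
```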

36
ALAP Schedules
  • Schedule vn at tn = λ
  • While (v0 not scheduled)
  • Select vi with all successors scheduled
  • Schedule vi at ti = min {tj} − di, vj being a
    successor of vi

(Figure: ALAP schedule of the example sequencing graph, operations placed in steps 1–4 between NOP source and sink nodes)
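ALAP mirrors ASAP, walking the graph backwards from the sink pinned at the target latency λ. A sketch under the same invented graph encoding as before:

```python
def alap(succ, delay, lam):
    """ALAP schedule: each operation starts as late as dependencies
    allow, with the sink fixed at time lam (the target latency)."""
    pred = {v: [] for v in succ}
    for v, ss in succ.items():
        for s in ss:
            pred[s].append(v)
    t = {}
    ready = [v for v in succ if not succ[v]]   # sink vertices
    while ready:
        v = ready.pop()
        # start time = earliest successor start, minus own delay
        t[v] = min((t[s] for s in succ[v]), default=lam + delay[v]) - delay[v]
        ready += [p for p in pred[v]
                  if p not in t and all(s in t for s in succ[p])]
    return t

succ = {'v0': ['m1', 'm2'], 'm1': ['add'], 'm2': ['add'], 'add': ['vn'], 'vn': []}
delay = {'v0': 0, 'm1': 2, 'm2': 2, 'add': 1, 'vn': 0}
late = alap(succ, delay, lam=3)
print(late)  # with lam equal to the ASAP latency, every mobility is zero
```

The difference tiL − tiS between the ALAP and ASAP start times is the mobility used later by force-directed scheduling.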
37
Resource Constraint Scheduling
  • Constrained scheduling
  • General case NP-complete (3 or more resources)
  • Minimize latency given constraints on area or the
    resources (ML-RCS)
  • Minimize resources subject to bound on latency
    (MR-LCS)
  • Exact solution methods
  • ILP Integer Linear Programming (Lin, Gebotys)
  • Symbolic Scheduling (Haynal, Radevojevic)
  • Hu's heuristic algorithm for identical
    processors
  • Heuristics
  • List scheduling
  • Force-directed scheduling
  • Taboo search, Monte-Carlo, many others

38
Simplified ILP Formulation
  • Use binary decision variables
  • i = 0, 1, ..., n
  • l = 1, 2, ..., λ + 1, with λ a given upper bound
    on latency
  • xil = 1 if operation i starts at step l, 0
    otherwise
  • Set of linear inequalities (constraints), and an
    objective function (min latency)
  • Observations
  • ti = Σl l · xil is the start time of op i
  • Is op vi (still) executing at step l?

39
Start Time vs. Execution Time
  • Each operation vi has exactly one start time
  • If di = 1, then the following questions are the
    same
  • Does operation vi start at step l?
  • Is operation vi running at step l?
  • But if di > 1, then the two questions should be
    formulated as
  • Does operation vi start at step l?
  • Does xil = 1 hold?
  • Is operation vi running at step l?
  • Does xi,l + xi,l−1 + ... + xi,l−di+1 = 1 hold?

40
Operation vi Still Running at Step l ?
  • Is v9 running at step 6?
  • Is x9,6 + x9,5 + x9,4 = 1?
  • Note
  • Only one (if any) of the above three cases can
    happen
  • To meet resource constraints, we have to ask the
    same question for ALL steps, and ALL operations
    of that type

41
ILP Formulation of ML-RCS (cont.)
  • Constraints
  • Unique start times: Σl xil = 1 for every
    operation i
  • Sequencing (dependency) relations must be
    satisfied: ti ≥ tj + dj, ∀ (vj, vi) ∈ E
  • Resource constraints
  • Objective: min cTt
  • t = start-times vector, c = cost weight (e.g.,
    [0 0 ... 1])
  • When c = [0 0 ... 1], cTt = tn (the sink start
    time, i.e., the latency)

42
ILP Example
  • First, perform ASAP and ALAP (λ = 4)
  • (we can write the ILP without ASAP and ALAP, but
    using ASAP and ALAP will simplify the
    inequalities)

(Figure: ASAP and ALAP schedules of the example graph, operations v1–v11 plus the sink vn)
43
ILP Example Unique Start Times Constraint
  • Using ASAP and ALAP
  • Without using ASAP and ALAP values

44
ILP Example Dependency Constraints
  • Using ASAP and ALAP, the non-trivial inequalities
    are (assuming unit delay for × and +)

45
ILP Example Resource Constraints
  • Resource constraints (assuming 2 adders and 2
    multipliers)
  • Objective: min xn,4
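For a graph this small, the ILP can be sanity-checked by brute force: enumerate all start-time assignments inside the latency bound and keep the best one meeting the dependency and resource constraints. The graph below is a hypothetical stand-in (three multiplies feeding one add, unit delays), not the slides' v1–v11 example:

```python
from itertools import product

def min_latency_rcs(ops, deps, res_limit, max_step):
    """Exhaustive stand-in for the ML-RCS ILP (unit delays assumed):
    enumerate start times 1..max_step, keep feasible schedules, and
    return (finish_time, start_times) minimizing the finish time."""
    names = list(ops)
    best = None
    for starts in product(range(1, max_step + 1), repeat=len(names)):
        t = dict(zip(names, starts))
        # dependency constraints: a predecessor must finish first
        if any(t[u] + 1 > t[v] for u, v in deps):
            continue
        # resource constraints: per step and type, at most res_limit[k] ops
        if any(sum(1 for v in names if ops[v] == k and t[v] == step) > res_limit[k]
               for step in range(1, max_step + 1) for k in res_limit):
            continue
        finish = max(t[v] + 1 for v in names)
        if best is None or finish < best[0]:
            best = (finish, t)
    return best

ops = {'m1': 'mul', 'm2': 'mul', 'm3': 'mul', 'a1': 'add'}
deps = [('m1', 'a1'), ('m2', 'a1'), ('m3', 'a1')]
best = min_latency_rcs(ops, deps, {'mul': 2, 'add': 1}, max_step=4)
print(best)  # with only 2 multipliers, one multiply slips to step 2
```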

46
ILP Formulation of Resource Minimization
  • Dual problem to Latency Minimization
  • Objective
  • Goal is to optimize total resource usage, a.
  • Objective function is cTa , where entries in
    c are respective area costs of resources
  • Constraints
  • Same as ML-RCS constraints, plus
  • Latency constraint added
  • Note: the unknowns ak appear in the constraints

Gupta
47
Hu's Algorithm
  • Simple case of the scheduling problem
  • All operations have unit delay
  • All operations (and resources) of the same type
  • Graph is a forest
  • Hu's algorithm
  • Greedy
  • Polynomial AND optimal
  • Computes lower bound on number of resources for a
    given latency, OR computes lower bound on latency
    subject to resource constraints

Gupta
48
Basic Idea: Hu's Algorithm
  • Relies on labeling of operations
  • Based on their distance from the sink
  • Label = length of the longest path passing through
    that node
  • Try to schedule nodes with higher labels first
    (i.e., most critical operations have priority)
  • Schedule a nodes at a time
  • a is the number of resources
  • Only schedule nodes that have all their
    parents/predecessors scheduled
  • Each iteration schedules one time step (step 1,
    then step 2, and so on)

Gupta
49
Hu's Algorithm
  • HU (G(V,E), a)
  • Label the vertices // label = length of longest
    path passing through the vertex
  • l = 1
  • repeat
  • U = unscheduled vertices in V whose
    predecessors have been scheduled (or that have no
    predecessors)
  • Select S ⊆ U such that |S| ≤ a and labels in
    S are maximal
  • Schedule the S operations at step l by
    setting ti = l, ∀ vi ∈ S
  • l = l + 1
  • until vn is scheduled
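A compact Python rendering of the algorithm (unit delays, one operation type; the example graph and resource count are invented for illustration):

```python
def hu_schedule(succ, a):
    """Hu's algorithm: label each vertex with the length of the longest
    path from it to a sink, then greedily start up to a ready operations
    per step, highest label (most critical) first."""
    label = {}
    def lab(v):  # memoized longest-path-to-sink label (unit delays)
        if v not in label:
            label[v] = 1 + max((lab(s) for s in succ[v]), default=0)
        return label[v]
    for v in succ:
        lab(v)
    pred = {v: set() for v in succ}
    for v, ss in succ.items():
        for s in ss:
            pred[s].add(v)
    t, step = {}, 1
    while len(t) < len(succ):
        # ready = unscheduled vertices whose predecessors are all scheduled
        ready = sorted((v for v in succ if v not in t and pred[v] <= set(t)),
                       key=lambda v: -label[v])
        for v in ready[:a]:   # schedule at most a operations this step
            t[v] = step
        step += 1
    return t

# Small forest-like example, a = 2 resources.
succ = {'a': ['d'], 'b': ['d'], 'c': ['e'], 'd': ['f'], 'e': ['f'], 'f': []}
sched = hu_schedule(succ, a=2)
print(sched)  # sink 'f' lands in step 4
```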

50
Hu's Algorithm Example
Step 1: Label vertices (assume all operations
have unit delays)
Gupta
51
Hu's Algorithm Example
Find unscheduled vertices with scheduled parents;
pick 3 (the number of resources) that maximize labels
Gupta
52
Hu's Algorithm Example
Repeat until all nodes are scheduled
Gupta
53
List Scheduling
  • Heuristic methods for RCS and LCS
  • Does NOT guarantee optimum solution
  • Similar to Hu's algorithm
  • Greedy strategy
  • Operation selection decided by criticality
  • O(n) time complexity
  • More general input
  • Works on general graphs (unlike Hu's)
  • Resource constraints on different resource types

54
List Scheduling Algorithm ML-RCS
LIST_L (G(V,E), a)
  l = 1
  repeat
    for each resource type k
      Ul,k = available vertices in V
      Tl,k = operations in progress
      Select Sk ⊆ Ul,k such that |Sk| + |Tl,k| ≤ ak
      Schedule the Sk operations at step l
    l = l + 1
  until vn is scheduled
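The same loop in executable form, with per-type limits and multi-cycle operations tracked as "in progress" (a sketch; the graph is invented but matches the resource assumptions of the example that follows, and ready-list priority here is plain FIFO order rather than criticality):

```python
def list_schedule(succ, delay, optype, a):
    """List scheduling for ML-RCS: at each step l and for each resource
    type k, start ready operations while running + started stays within
    the resource bound a[k]."""
    pred = {v: [] for v in succ}
    for v, ss in succ.items():
        for s in ss:
            pred[s].append(v)
    t, l = {}, 1
    while len(t) < len(succ):
        for k in a:
            # T_l,k: operations of type k still executing at step l
            running = [v for v in t if optype[v] == k and t[v] + delay[v] > l]
            # U_l,k: unscheduled ops of type k whose predecessors finished
            ready = [v for v in succ if v not in t and optype[v] == k
                     and all(p in t and t[p] + delay[p] <= l for p in pred[v])]
            for v in ready[:a[k] - len(running)]:
                t[v] = l
        l += 1
    return t

# Four multiplies (latency 2) feeding three adds (latency 1),
# with 3 multipliers and 1 ALU.
succ = {'m1': ['a1'], 'm2': ['a1'], 'm3': ['a2'], 'm4': ['a2'],
        'a1': ['a3'], 'a2': ['a3'], 'a3': []}
delay = {'m1': 2, 'm2': 2, 'm3': 2, 'm4': 2, 'a1': 1, 'a2': 1, 'a3': 1}
optype = {v: ('mul' if v.startswith('m') else 'add') for v in succ}
sched = list_schedule(succ, delay, optype, {'mul': 3, 'add': 1})
print(sched)  # m4 waits for a free multiplier; the final add lands in step 6
```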
55
List Scheduling Example
Assumptions: three multipliers with latency 2; one
ALU with latency 1
Gupta
56
List Scheduling Algorithm MR-LCS
LIST_R (G(V,E), λ)
  a = 1, l = 1
  Compute the ALAP times tL
  if t0L < 0 return (not feasible)
  repeat
    for each resource type k
      Ul,k = available vertices in V
      Compute the slacks si = tiL − l, ∀ vi ∈ Ul,k
      Schedule operations with zero slack, update a
      Schedule additional Sk ⊆ Ul,k under the a constraints
    l = l + 1
  until vn is scheduled
57
Force-Directed Scheduling
  • Paulin and Knight, DAC '87
  • Similar to list scheduling
  • Can handle ML-RCS and MR-LCS
  • For ML-RCS, schedules step-by-step
  • BUT, selection of the operations tries to find
    the globally best set of operations
  • Difference with list scheduling in selecting
    operations
  • Select operations with least force
  • Consider the effect on the type distribution
  • Consider the effect on successor nodes and their
    type distributions
  • Idea
  • Find the mobility μi = tiL − tiS of operations
  • Look at the operation type probability
    distributions
  • Try to flatten the operation type distributions

Gupta
58
Force-Directed Scheduling
  • Rationale
  • Reward uniform distribution of operations across
    schedule steps
  • Force
  • Used as a priority function
  • Related to concurrency: sort operations by
    least force
  • Mechanical analogy: force = constant × displacement
  • Constant: operation-type distribution
  • Displacement: change in probability
  • Definition: operation probability density
  • pi(l) = Pr{ vi starts at step l }
  • Assume a uniform distribution over the mobility
    interval: pi(l) = 1/(μi + 1) for tiS ≤ l ≤ tiL

Gupta
59
Force-Directed Scheduling Definitions
  • Operation-type distribution (NOT normalized to
    1)
  • Operation probabilities summed over control steps:
    qk(l) = Σ pi(l) over operations vi of type k
  • Distribution graph of type k over all steps
  • qk(l) can be thought of as the expected operator
    cost for implementing operations of type k at
    step l

60
Example
61
Forces
  • Self-force
  • Sum of forces to the other steps
  • Self-force for operation vi in step l
  • Successor-force
  • Related to the scheduling of the successor
    operations
  • Delaying an operation may cause the delay of its
    successors

62
Example operation v6
Multiply
Add
  • It can be scheduled in the first two steps
  • p(1) = p(2) = 0.5, p(3) = p(4) = 0
  • Distribution: q(1) = 2.8, q(2) = 2.3
  • Assign v6 to step 1
  • Variation in probability of step 1: 1 − 0.5 = 0.5
  • Variation in probability of step 2: 0 − 0.5 = −0.5
  • Self-force: 2.8 × 0.5 − 2.3 × 0.5 = 0.25

63
Example operation v6
Multiply
Add
  • Assign v6 to step 2
  • Variation in probability of step 1: 0 − 0.5 = −0.5
  • Variation in probability of step 2: 1 − 0.5 = 0.5
  • Self-force: 2.8 × (−0.5) + 2.3 × 0.5 = −0.25

64
Example operation v6
Multiply
Add
  • Successor-force
  • Operation v7 is then assigned to step 3
  • 2.3 × (0 − 0.5) + 0.8 × (1 − 0.5) = −0.75
  • Total force: −0.25 − 0.75 = −1
  • Conclusion
  • Least force is for step 2
  • Assigning v6 to step 2 reduces concurrency
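The arithmetic on the last three slides can be reproduced in a few lines of Python (the force function is a direct transcription of "distribution × change in probability"; v7's feasible window of steps 2–3 is inferred from the slide's numbers, so treat it as an assumption):

```python
def self_force(assigned_step, steps, q):
    """Force of pinning an operation to assigned_step: sum over its
    feasible steps of q(l) times the change in probability at l."""
    old = 1.0 / len(steps)  # uniform probability before assignment
    return sum(q[l] * ((1.0 if l == assigned_step else 0.0) - old)
               for l in steps)

q_mult = {1: 2.8, 2: 2.3, 3: 0.8}   # multiplier distribution graph

f1 = self_force(1, [1, 2], q_mult)  # v6 to step 1: about +0.25
f2 = self_force(2, [1, 2], q_mult)  # v6 to step 2: about -0.25
# v6 at step 2 forces its successor v7 (window: steps 2-3) into step 3:
f_succ = self_force(3, [2, 3], q_mult)  # about -0.75
print(f1, f2, f_succ, f2 + f_succ)      # total force about -1
```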

65
Force Directed Scheduling Algorithm
66
Conclusions
  • ILP optimal, but exponential runtime (often)
  • Hu's
  • Optimal and polynomial
  • Very restricted cases
  • List scheduling
  • Extension of Hu's to the general case
  • Greedy (fast, O(n2)) but suboptimal
  • Force-directed O(n3)
  • A more complicated list-scheduling algorithm
  • Takes into account a more global view of the graph
  • Still suboptimal
  • Next Time: Automata-Based Scheduling