Buffer Sizing for Precedence Graphs on Restricted Multiprocessor Architectures

1
Buffer-sizing for Precedence Graphs on Restricted
Multiprocessor Architectures
  • Thomas Feng
  • Yang Yang
  • Mentors: Qi Zhu, Abhijit Davare

2
Outline
  • Motivation
  • Previous Work
  • Preliminaries
  • Problem Statement
  • Investigative Approach
  • Summary and Conclusion
  • Compare with Prior Work
  • Our Contribution
  • Future Work

3
Outline
  • Motivation
  • Previous Work
  • Preliminaries
  • Problem Statement
  • Investigative Approach
  • Summary and Conclusion
  • Future Work

4
Parallel Heterogeneous Platforms (PHPs)
  • Advantages
  • High computational capability
  • Challenges
  • Exploiting the theoretically high performance

(From Abhijit Davare's Quals Presentation)
5
Project Goals
  • Dataflow Programming Model
  • Infinite Buffers
  • Blocking read, non-blocking write
  • Many scheduling and allocation techniques
  • Multiprocessor Platform
  • Limited connectivity between processors
  • Limited, finite depth FIFOs
  • Low overhead reads and writes to FIFOs

6
Deploying applications on PHPs
  • Computation Synthesis
  • Task Allocation
  • Task Scheduling
  • Communication Synthesis
  • Interconnection Synthesis
  • Buffer sizing (the part we are working on)

7
Buffer Sizing
  • Architectures have bounded buffer resources.
  • If more communication buffer resources are
    utilized, processors may spend less time waiting
    to send/receive data.
  • Additional buffer resources may adversely affect
    communication overhead, achievable clock speed,
    or design closure.

8
Example Flow
[Figure: the function and the architecture, captured as a function model and an architecture model, go through allocation and scheduling, then buffer sizing]
9
Outline
  • Motivation
  • Previous Work
  • Preliminaries
  • Problem Statement
  • Investigative Approach
  • Summary and Conclusion
  • Future Work

10
Previous Work
  • Transformations from various statically schedulable
    dataflow variants into precedence DAGs [1]
  • Survey of allocation and scheduling algorithms
    for precedence DAGs assuming infinite-length
    buffers [2]
  • Minimizing buffer requirements for uniprocessor
    architectures [3]
  • Minimizing multiprocessor buffer sizes for SDF
    applications under conservative
    (non-interleaving) conditions [4]

[1] Software Synthesis and Code Generation for Signal
Processing Systems. S. Bhattacharyya, R. Leupers,
P. Marwedel. IEEE Transactions on Circuits and
Systems, 2000.
[2] Static Scheduling Algorithms for Allocating
Directed Task Graphs to Multiprocessors. Y.-K. Kwok,
I. Ahmad. ACM Computing Surveys, 1999.
[3] Minimizing Buffer Requirements of Synchronous
Dataflow Graphs with Model Checking. M. Geilen,
T. Basten, S. Stuijk. DAC 2005.
[4] Data Memory Minimization for Synchronous Data
Flow Graphs Emulated on DSP-FPGA Targets. M. Adé,
R. Lauwereins, J. A. Peperstraete. DAC 1997.
11
Outline
  • Motivation
  • Previous Work
  • Preliminaries
  • Problem Statement
  • Investigative Approach
  • Summary and Conclusion
  • Future Work

12
Preliminaries
  • Precedence DAG
  • A precedence DAG is a common representation for
    the deployment of an application across multiple
    processors.
  • A precedence DAG can be generated from statically
    schedulable dataflow descriptions, such as
    synchronous dataflow or cyclo-static dataflow.
    This is suitable for most applications in the
    multimedia domain [4, 5].
  • Our synthesis process starts from the precedence DAG.

[4] A Hierarchical Multiprocessor Scheduling System
for DSP Applications. J. L. Pino, E. A. Lee,
S. S. Bhattacharyya. 29th Asilomar Conference on
Signals, Systems and Computers, 1995.
[5] Dataflow Process Networks. E. A. Lee, T. M. Parks.
Proceedings of the IEEE, 1995.
13
Preliminaries
  • Synthesis process
  • Relationships among allocation, scheduling and
    buffer sizing
  • Allocation: assign each node in the precedence DAG to
    a particular processor in the architecture.
  • Scheduling: specify an execution sequence for the
    set of tasks on each processor.
  • Buffer sizing: assign sizes to inter-processor
    communication channels.
  • In our approach, allocation and scheduling are
    done assuming unbounded communication buffer
    size. Buffer sizing is then based on the
    result of allocation and scheduling.

14
Preliminaries
  • Artificial deadlock is deadlock that results when
    the size of buffers between processors is reduced
    from infinity to some finite number [6].
  • In buffer sizing, we want to minimize the
    objective function while avoiding artificial deadlock.
  • (In the following slides, "deadlock" means
    artificial deadlock.)

[6] Requirements on the Execution of Kahn Process
Networks. M. Geilen, T. Basten. Programming Languages
and Systems, 12th European Symposium on Programming
(ESOP), 2003.
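Artificial deadlock can be reproduced with a small token-level simulation. The sketch below is not the authors' code. It makes three simplifying assumptions of ours: every data edge has its own bounded FIFO (the slides size one FIFO per ordered processor pair), a node reads all of its inputs before writing its outputs, and tokens move one at a time, so a writer and an active reader can interleave through a one-place buffer. `Node`, `simulate`, and the channel names are hypothetical. In the example, Pi runs a then b, Pj runs c then d, and the data edges are a->d (2 tokens) and b->c (1 token).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One firing: read every input token, then write every output token."""
    name: str
    reads: List[Tuple[str, int]] = field(default_factory=list)   # (channel, tokens)
    writes: List[Tuple[str, int]] = field(default_factory=list)  # (channel, tokens)


def simulate(schedule: Dict[str, List[Node]],
             capacity: Dict[str, Optional[int]]) -> Dict[str, Tuple[str, str]]:
    """Run each processor's node sequence with blocking reads and writes.

    capacity[ch] is the FIFO depth of channel ch (None = unbounded).
    Returns {processor: (node, "read" | "write")} for every processor that is
    stuck when no token can move any more; {} means all processors finished.
    """
    fill = {ch: 0 for ch in capacity}
    # Per processor: [node index, operation index, tokens moved in that operation].
    cursor = {p: [0, 0, 0] for p in schedule}

    def ops(node: Node) -> List[Tuple[str, str, int]]:
        return ([("read", ch, n) for ch, n in node.reads if n > 0] +
                [("write", ch, n) for ch, n in node.writes if n > 0])

    progress = True
    while progress:
        progress = False
        for p, nodes in schedule.items():
            i, o, moved = cursor[p]
            while i < len(nodes):
                todo = ops(nodes[i])
                if o == len(todo):            # node finished: fire the next one
                    i, o, moved = i + 1, 0, 0
                    continue
                kind, ch, n = todo[o]
                if kind == "read" and fill[ch] > 0:
                    fill[ch] -= 1
                elif kind == "write" and (capacity[ch] is None
                                          or fill[ch] < capacity[ch]):
                    fill[ch] += 1
                else:
                    break                     # blocked on this channel
                progress = True
                moved += 1
                if moved == n:
                    o, moved = o + 1, 0
            cursor[p] = [i, o, moved]

    stuck = {}
    for p, (i, o, _) in cursor.items():
        if i < len(schedule[p]):
            stuck[p] = (schedule[p][i].name, ops(schedule[p][i])[o][0])
    return stuck


prog = {
    "Pi": [Node("a", writes=[("a->d", 2)]), Node("b", writes=[("b->c", 1)])],
    "Pj": [Node("c", reads=[("b->c", 1)]), Node("d", reads=[("a->d", 2)])],
}
print(simulate(prog, {"a->d": None, "b->c": 1}))  # {}
print(simulate(prog, {"a->d": 1, "b->c": 1}))     # {'Pi': ('a', 'write'), 'Pj': ('c', 'read')}
```

With unbounded FIFOs the schedule completes. With a one-place FIFO on a->d, a is write blocked, c is read blocked, and b and d are left scheduling blocked. This is the same schedule, now deadlocked only because the buffer shrank.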
15
Outline
  • Motivation
  • Previous Work
  • Preliminaries
  • Problem Statement
  • Investigative Approach
  • Summary and Conclusion
  • Future Work

16
Problem Statement
[Figure: starting from allocation and scheduling with unbounded communication buffer size, bounding the communication buffer size can cause artificial deadlock, which is resolved either by using internal buffers (P1, P2, P4) or by using communication buffers (P3)]
  • P1: Can we always find a legal schedule with
    one-place communication buffers by using internal
    buffers, assuming we have a legal schedule for
    unbounded communication buffers?
  • P2: Does using internal buffers increase the makespan?
  • P3: Minimize the total (or largest) communication
    buffer size.
  • P4: Give an optimal internal buffer assignment:
    minimize the total (or largest) internal buffer
    size (similar to P3).
17
Assumptions
  • Interleaving communication: for inter-processor
    communication, when the writing and reading tasks
    are both active, they can communicate any amount
    of data through a one-place buffer.

18
Outline
  • Motivation
  • Previous Work
  • Preliminaries
  • Problem Statement
  • Investigative Approach
  • Summary and Conclusion
  • Future Work

19
Classification of Blocked Nodes
  • In a precedence DAG, we classify the nodes that will
    be blocked during execution into 3 kinds.
  • Read blocked node: the node is blocked
    because it cannot read enough tokens.
  • Write blocked node: the node is blocked
    because it cannot finish writing all the
    tokens it produced.
  • Scheduling blocked node: the node cannot be
    fired because its predecessor on the same
    processor has not finished execution.

20
Our Observations of Deadlock
  • We proved that it is impossible to have deadlock
    with only scheduling blocked nodes and read
    blocked nodes.
  • We proved that if a precedence DAG has deadlock,
    then it must contain at least one pattern, called a
    write blocked cycle, in which
  • - all the schedule edges point in the same
    direction;
  • - there are one or more write blocked nodes,
    whose in-degree within the cycle is 0;
  • - there may be read blocked nodes, whose
    in-degree within the cycle is one or more;
  • - reversing the data edges leaving all the
    write blocked nodes turns the pattern into a
    directed cycle.
  • Note: not every write blocked edge lies in a
    write blocked cycle.
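The reversal condition above suggests a direct test. This is a sketch that assumes edges are given as (source, destination) pairs and that the set of write blocked nodes is already known. `has_write_blocked_cycle` is a hypothetical name, and the function checks only the reversal condition, not the other properties listed. Because the precedence DAG itself is acyclic, any directed cycle the search finds must use a reversed data edge, so it passes through a write blocked node.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def has_write_blocked_cycle(schedule_edges: Iterable[Edge],
                            data_edges: Iterable[Edge],
                            write_blocked: Set[str]) -> bool:
    """Reverse the data edges leaving write blocked nodes, then look for a
    directed cycle with a three-color depth-first search."""
    succ = defaultdict(list)
    for u, v in schedule_edges:
        succ[u].append(v)
    for u, v in data_edges:
        if u in write_blocked:
            succ[v].append(u)  # reversed: u cannot finish until consumer v is active
        else:
            succ[u].append(v)

    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE

    def dfs(u: str) -> bool:
        color[u] = GREY
        for v in succ[u]:
            if color[v] == GREY:  # back edge: directed cycle found
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in list(succ))


# Pi runs a then b, Pj runs c then d; data edges a->d and b->c.
sched, data = [("a", "b"), ("c", "d")], [("a", "d"), ("b", "c")]
print(has_write_blocked_cycle(sched, data, {"a"}))  # True: a -> b -> c -> d -> a
print(has_write_blocked_cycle(sched, data, set()))  # False
```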

21
Write Blocked Cycle
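[Figure: a write blocked cycle between processors Pi and Pj, drawn with data edges and schedule edges. Pi runs n_i, n_{i+1}, ..., n_{i+m}; Pj runs n_j, n_{j+1}, ..., n_{j+n} (m > 1, n > 1). The nodes are linked by series of several data/schedule edges, in which the schedule edges point in the same direction as the schedule edge from n_i to n_{i+1}, while the data edges may point in either direction.]

Buffer space from Pi to Pj < token count on the write edge from n_i to n_{j+n}.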
22
Examples of Write Blocked Cycle
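[Fig. 1 and Fig. 2: example write blocked cycles over processors P1, P2, P3 with nodes a to f; data edges are labeled with 2 tokens, and "!" marks the blocked writes]

Buffer(Pi, Pj) = 1 for i, j ∈ {1, 2, 3}.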
23
How to Avoid Deadlock
  • There is no artificial deadlock in a precedence
    DAG if and only if there are no write blocked
    cycles in the graph.
  • Proof by contradiction (omitted here).
  • We can resolve write blocked cycles by providing
    enough communication buffer or internal buffer space.

24
How to Avoid Deadlock
[Figure: Pi runs a then b; Pj runs c then d; a sends Wa tokens to d through FIFO Bij]
  • Wa > Space(Bij), so a is write blocked. The deadlock is
    resolved by
  • - increasing the communication buffer size to hold
    all the Wa tokens,
  • - or increasing the internal buffer size in Pi to hold
    all the Wa tokens, and then writing them to d after
    b and c,
  • - or reading all the tokens before c, and
    increasing the internal buffer size in Pj to hold all
    the Wa tokens.

25
How to Avoid Deadlock
[Figures: two write blocked cycles. First: a on Pi sends Wa tokens through Bij, and e on Pm sends We tokens through Bmn (processors Pi, Pj, Pm, Pn). Second: a and b on Pi send Wa and Wb tokens through Bij to Pj.]
Wa > Space(Bij) and We > Space(Bmn), so a and e are
write blocked. The deadlock is resolved by increasing
the size of Bij to hold all the Wa tokens, by increasing
the size of Bmn to hold all the We tokens, or by
using internal buffers.
Wa + Wb > Space(Bij), so a and b are write
blocked. The deadlock is resolved by increasing the buffer
size to hold all the Wa + Wb tokens, or by
increasing the internal buffer in Pi or Pj.
26
Problem Statement (current focus: P1)
[Figure: problem statement diagram, as on slide 16, with P1 highlighted]
  • P1: Can we always find a legal schedule with
    one-place communication buffers by using internal
    buffers, assuming we have a legal schedule for
    unbounded communication buffers?
  • P2: Does using internal buffers increase the makespan?
  • P3: Minimize the total (or largest) communication
    buffer size.
  • P4: Give an optimal internal buffer assignment:
    minimize the total (or largest) internal buffer
    size (similar to P3).
27
Using Internal Buffers
  • We can always find a legal schedule S1 with one-place
    communication buffers by using internal
    buffers, assuming we have a legal schedule SU
    requiring unbounded communication buffers.
  • Proof
  • Since the schedule SU (a partial order) is
    legal, we can always find a sequential order SS
    to execute the nodes, which is also legal. We can
    get SS by choosing any total order consistent
    with the partial order (a topological order).
    (continued)

28
  • Let the communication buffer between every two
    processors have one place. If there is deadlock in
    SS, the first blocked node must be a write blocked
    node: every node before it in SS has completed and
    written all of its tokens, so it cannot be read
    blocked, and its predecessor on the same processor
    has finished, so it cannot be scheduling blocked.

Simulate this execution. Let x be the first
blocked node, and let it write tokens to nodes y1, ...,
yk. The block can be eliminated by letting x
write all the produced tokens to internal
buffers: for every node yi on processor pi, write
the tokens from x to the internal buffer of pi.
The corresponding read code is inserted right
after the code already executed on pi. Therefore, the
deadlock at x is resolved, and the following execution
is not affected by this change. Repeat
this process until all write blocked nodes are
eliminated. Consequently, a legal schedule is
found by using internal buffers.
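The argument can be illustrated with a simplified sketch. It assumes that every token a node writes is immediately moved by the receiver's inserted read code into an unbounded internal buffer, so a write never waits on the one-place FIFO. Under that assumption, executing the nodes in a topological order SS never blocks. `run_with_internal_buffers` is a hypothetical name, and `graphlib` requires Python 3.9 or later.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def run_with_internal_buffers(proc_of, schedule, data_edges):
    """proc_of: node -> processor; schedule: processor -> ordered node list;
    data_edges: {(src, dst): tokens} between different processors.
    Executes the nodes in one sequential order SS and returns SS."""
    preds = defaultdict(set)
    for nodes in schedule.values():
        for u, v in zip(nodes, nodes[1:]):
            preds[v].add(u)                      # schedule edge
    for u, v in data_edges:
        preds[v].add(u)                          # data edge
    # SS: any total order consistent with the partial order.
    order = list(TopologicalSorter({n: preds[n] for n in proc_of}).static_order())

    internal = defaultdict(int)                  # (processor, edge) -> buffered tokens
    for n in order:
        for (u, v), w in data_edges.items():
            if v == n:                           # read: the producer already ran
                key = (proc_of[n], (u, v))
                assert internal[key] == w, f"{n} would be read blocked"
                internal[key] -= w
        for (u, v), w in data_edges.items():
            if u == n:
                # Each token passes through the one-place FIFO and the read code
                # inserted on proc_of[v] moves it into that processor's internal
                # buffer at once, so the writer never blocks.
                internal[(proc_of[v], (u, v))] += w
    return order


proc_of = {"a": "Pi", "b": "Pi", "c": "Pj", "d": "Pj"}
schedule = {"Pi": ["a", "b"], "Pj": ["c", "d"]}
print(run_with_internal_buffers(proc_of, schedule, {("a", "d"): 2, ("b", "c"): 1}))
# ['a', 'b', 'c', 'd']
```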
29
Problem Statement (current focus: P2)
[Figure: problem statement diagram, as on slide 16, with P2 highlighted]
  • P1: Can we always find a legal schedule with
    one-place communication buffers by using internal
    buffers, assuming we have a legal schedule for
    unbounded communication buffers?
  • P2: Does using internal buffers increase the makespan?
  • P3: Minimize the total (or largest) communication
    buffer size.
  • P4: Give an optimal internal buffer assignment:
    minimize the total (or largest) internal buffer
    size (similar to P3).
30
Makespan
  • Makespan is the maximum completion time over a
    set of processors.
  • Assumptions
  • 1. Interprocessor communication takes place
    through bounded-depth FIFOs with blocking reads
    and writes.
  • 2. Unlimited internal buffer space is available
    on each processor.
  • Conjecture
  • For a task precedence graph, if insufficient
    FIFO depth leads to deadlock, reads and writes
    can be reordered so that deadlock is
    eliminated and the makespan is not affected.
  • Counterexample
  • The example is scheduled so that
    multiple paths are relatively critical.
    Reordering the reads and writes to eliminate the
    deadlock lengthens some of the relatively
    critical paths, extending the makespan, even
    when Tx/Rx time << computation time.

31
[Figure: counterexample schedule on processors P1 to P5 with nodes a to i, annotated with execution and completion times. Communication model: Tx/Rx time 5 units, latency 0 units.]
32
  • In the example, the edges a->d and c->g may be
    blocked due to insufficient FIFO depth. Without
    increasing the FIFO depth, there are 4 ways to
    resolve this:
  • Option 1: move the a->d communication after b
  • Option 2: move the a->d communication before c
  • Option 3: move the c->g communication after d
  • Option 4: move the c->g communication before f
  • Options 1 and 3 delay d and g by a large amount
    and increase the makespan significantly.
  • Options 2 and 4 extend the critical paths that
    end at h and i.
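For a fixed allocation and per-processor order, the makespan is the length of the longest path through the schedule and data edges. The sketch below uses hypothetical names and numbers, not those of the counterexample above. It assumes execution times on nodes, a transfer delay on each data edge, and zero latency. The example shows how moving a communication after a long task stretches the makespan, which is the effect of options 1 and 3.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def makespan(duration, edges):
    """duration: node -> execution time.
    edges: {(u, v): delay}; delay is 0 for schedule edges and the
    Tx/Rx time for data edges (latency assumed 0).
    Returns the latest finish time over all nodes."""
    preds = {n: set() for n in duration}
    for u, v in edges:
        preds[v].add(u)
    finish = {}
    for n in TopologicalSorter(preds).static_order():
        # A node starts once every predecessor has finished and its data arrived.
        start = max((finish[u] + edges[(u, n)] for u in preds[n]), default=0)
        finish[n] = start + duration[n]
    return max(finish.values())


# Hypothetical example: x then y on one processor, z on another, x -> z data edge.
dur = {"x": 10, "y": 200, "z": 50}
base = {("x", "y"): 0, ("x", "z"): 5}
moved = {("x", "y"): 0, ("y", "z"): 5}  # x's transfer to z now happens after y
print(makespan(dur, base))   # 210
print(makespan(dur, moved))  # 265
```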

33
Problem Statement (current focus: P3)
[Figure: problem statement diagram, as on slide 16, with P3 highlighted]
  • P1: Can we always find a legal schedule with
    one-place communication buffers by using internal
    buffers, assuming we have a legal schedule for
    unbounded communication buffers?
  • P2: Does using internal buffers increase the makespan?
  • P3: Minimize the total (or largest) communication
    buffer size.
  • P4: Give an optimal internal buffer assignment:
    minimize the total (or largest) internal buffer
    size (similar to P3).
34
NP-hard Problem
  • Formally, the problem DEADLOCK-FREE-MIN-BUFFER
    (DFMB) is defined as follows: given a precedence
    DAG D, find the minimal total buffer size such
    that there is no deadlock in D.
  • The problem DEADLOCK-FREE-MIN-BUFFER is NP-hard.
  • Proof
  • We prove it by reducing the FEEDBACK ARC SET (FAS)
    problem, which is known to be NP-complete, step by
    step to the DFMB problem.

35
  • The FEEDBACK ARC SET (FAS) problem is the
    following: given a directed graph G(V, E) and a
    positive integer K, does there exist a subset
    $B \subseteq E$ with $|B| \le K$, such that B contains at least one
    edge from every directed cycle in G?
  • This problem is known to be NP-complete [7].

[7] Computers and Intractability. M. R. Garey,
D. S. Johnson. W. H. Freeman and Co., New York, 1979.
36
  • First, we prove that the FAS problem can be reduced to the
    problem below.
  • Problem B: given a directed graph G(V, E) with
    weight w(e) on every edge $e \in E$, find the
    minimal $\sum_{e \in B} w(e)$ such that $B \subseteq E$
    and B contains at least one edge from every
    directed cycle in G.
  • Then we prove that Problem B can be reduced to the DFMB
    problem, by showing that an arbitrary instance X
    of Problem B can be transformed into an instance X'
    of the DFMB problem in polynomial time, and that the
    solution of X is obtained by solving X'.
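Problem B can be made concrete with an exhaustive solver. The solver also shows why an exact search is expensive: it tries every subset B of edges, which is exponential in |E|. This is a minimal sketch for tiny graphs only. It relies on the fact that B contains an edge of every directed cycle exactly when removing B leaves G acyclic. The function names and the example graph are hypothetical. With unit weights, the FAS question can be answered by checking whether the returned total is at most K.

```python
from itertools import combinations
from graphlib import TopologicalSorter, CycleError  # Python 3.9+


def is_acyclic(nodes, edges):
    preds = {n: set() for n in nodes}
    for u, v in edges:
        preds[v].add(u)
    try:
        list(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False


def min_weight_feedback_arc_set(nodes, weight):
    """weight: {(u, v): w(e)}. Returns (total weight, B) for the lightest
    edge set B whose removal leaves the graph acyclic, i.e. B contains at
    least one edge of every directed cycle. Exponential in |E|."""
    edges = list(weight)
    best = None
    for k in range(len(edges) + 1):
        for B in combinations(edges, k):
            rest = [e for e in edges if e not in B]
            if is_acyclic(nodes, rest):
                cost = sum(weight[e] for e in B)
                if best is None or cost < best[0]:
                    best = (cost, set(B))
    return best


w = {("a", "b"): 3, ("b", "c"): 1, ("c", "a"): 2, ("b", "a"): 5}
print(min_weight_feedback_arc_set(["a", "b", "c"], w))  # (3, {('a', 'b')})
```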

37
Instance X (Problem B) → Instance X' (DFMB)
  • A vertex in G(V, E) → the corresponding nodes and
    schedule edge in D(N, E')
  • An edge in G(V, E) → the corresponding data edge
    in D(N, E')
38
  • A directed cycle in G(V, E) → a write blocked cycle
    in D(N, E'), where $E' = DE \cup SE$ (data edges and
    schedule edges)
39
  • Solution to instance X:
    $\min \sum_{e \in B} w(e)$, where $B \subseteq E$ and B
    contains at least one edge from every directed
    cycle in G
  • Solution to instance X':
    $\min \sum_{e \in M} w(e)$, where $M \subseteq DE$ and M
    contains at least one edge from every write
    blocked cycle in D

40
Algorithms
41
Minimizing Maximum FIFO Size (1)
  • Mathematical model
  • $V = \{v_1, v_2, \ldots, v_m\}$: the set of vertices.
  • $P = \{p_1, p_2, \ldots, p_l\}$: the set of processors.
  • $M: V \to P$: mapping from vertices to the processors
    they are scheduled on.
  • $E = \{e_1, e_2, \ldots, e_n\}$: the set of edges.
  • $S = \{e \mid e \in E \wedge M(src(e)) = M(des(e))\}$: the set of
    schedule edges.
  • $D = \{e \mid e \in E \wedge M(src(e)) \ne M(des(e))\}$: the set of
    data edges.
  • $W: D \to \mathbb{R}^{+}$: the weight function.
  • $F: P \times P \to \mathbb{R}$: the function that returns the FIFO
    size. $F(p_i, p_j)$ need not be equal to $F(p_j, p_i)$.
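The model above maps onto a small data structure. This sketch assumes that vertices and processors are named by strings. `ScheduledDAG` and `FifoSizes` are hypothetical names. S and D are derived from M exactly as defined on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Edge = Tuple[str, str]                      # (src, des)
FifoSizes = Dict[Tuple[str, str], float]    # F keyed by ordered pair (pi, pj)


@dataclass
class ScheduledDAG:
    V: List[str]                # vertices
    P: List[str]                # processors
    M: Dict[str, str]           # vertex -> processor it is scheduled on
    E: List[Edge]               # all edges
    W: Dict[Edge, float]        # weight (token count) of each data edge

    def schedule_edges(self) -> List[Edge]:
        """S: edges whose endpoints are on the same processor."""
        return [e for e in self.E if self.M[e[0]] == self.M[e[1]]]

    def data_edges(self) -> List[Edge]:
        """D: edges whose endpoints are on different processors."""
        return [e for e in self.E if self.M[e[0]] != self.M[e[1]]]


g = ScheduledDAG(
    V=["a", "b", "c", "d"], P=["Pi", "Pj"],
    M={"a": "Pi", "b": "Pi", "c": "Pj", "d": "Pj"},
    E=[("a", "b"), ("c", "d"), ("a", "d"), ("b", "c")],
    W={("a", "d"): 2, ("b", "c"): 1},
)
print(g.schedule_edges())  # [('a', 'b'), ('c', 'd')]
print(g.data_edges())      # [('a', 'd'), ('b', 'c')]
F: FifoSizes = {("Pi", "Pj"): 2, ("Pj", "Pi"): 0}  # F(Pi, Pj) need not equal F(Pj, Pi)
```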

42
Minimizing Maximum FIFO Size (2)
  • Formalizing the problems
  • Find an algorithm that, given a schedule
    $\langle V, P, M, E, W \rangle$, finds a valid F function such that
    $\max F(p_i, p_j)$ is minimized (a.k.a. the min max
    problem).
  • With interleaving communication.
  • Without interleaving communication.
  • Find an algorithm that, given a schedule
    $\langle V, P, M, E, W \rangle$, finds a valid F function such that
    $\sum F(p_i, p_j)$ is minimized (a.k.a. the min total
    problem).

43
Min Max Problem (1)
  • Free vertices: the vertices with no incoming
    edges (a and c in the figure).
  • Free edges: the edges starting from free
    vertices (ab, ad, cd, ce in the figure).
  • Our algorithm always deals with free edges. When
    one free edge is resolved by the algorithm,
    other edges may become free.
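Free vertices and free edges follow directly from the remaining edge set. This sketch uses hypothetical function names. The example graph contains the edges named above plus one extra edge, b->e, which is our assumption.

```python
def free_vertices(vertices, edges):
    """Vertices with no incoming edge among the remaining edges."""
    has_incoming = {v for _, v in edges}
    return [v for v in vertices if v not in has_incoming]


def free_edges(vertices, edges):
    """Remaining edges whose source is a free vertex."""
    free = set(free_vertices(vertices, edges))
    return [e for e in edges if e[0] in free]


V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("a", "d"), ("c", "d"), ("c", "e"), ("b", "e")]
print(free_vertices(V, E))   # ['a', 'c']
print(free_edges(V, E))      # [('a', 'b'), ('a', 'd'), ('c', 'd'), ('c', 'e')]
# Once a's edges are resolved and removed, b becomes free, and so does b -> e.
print(free_edges(V, E[2:]))  # [('c', 'd'), ('c', 'e'), ('b', 'e')]
```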

44
Dependency Graph
  • Dependency graph: a graph constructed from the
    precedence DAG by making all the data edges
    bidirectional.
  • A data edge implies a bidirectional dependency
    between its two vertices, while a schedule edge
    remains unidirectional.

45
Min Max Problem (2)
  • 4 types of free edges (priority 1 > 2 > 3 > 4)
  • Type 1: free schedule edge whose source has no
    other outgoing edges.
  • Type 2: free data edge between two free vertices
    (ignoring the incoming data edges to the second
    vertex).
  • Type 3: free data edge that is not of type 2 and
    is not in a dependency cycle.
  • Type 4: free data edge that is not of type 2 and
    is in a dependency cycle.

46
Min Max Problem (3)
  • Type 1: just delete it, because a can finish
    immediately; b becomes free.
  • Type 2: just delete it, because a and c can run
    simultaneously with interleaving communication.

47
Min Max Problem (4)
  • Type 3: just delete it, because d will be ready
    later, and a just needs to wait.
  • Type 4: resolve the blocking before deleting the edge.
    Increase the FIFO size if no space is left; otherwise,
    use the available space first.

48
Choice of Free Edges (Min Max)
  • If edges of type 1, 2 or 3 exist, remove them
    first.
  • If only edges of type 4 are left, choose one of them
    to resolve in a greedy manner:
  • among the edges of type 4, always pick the one e such
    that F(M(src(e)), M(des(e))) is minimal after e
    is resolved.
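The greedy rule can be sketched as a single `min` over the candidate type-4 edges. The slides do not spell out calculate_fifo, so this sketch substitutes a simplified update rule of our own: resolving e requires the FIFO between its processors to hold all W(e) tokens. `pick_type4_edge` and the example values are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def pick_type4_edge(candidates: Iterable[Edge], M: Dict[str, str],
                    F: Dict[Tuple[str, str], int], W: Dict[Edge, int]):
    """Return (edge, new FIFO size) for the type-4 edge e that leaves
    F(M(src(e)), M(des(e))) smallest after e is resolved.

    Assumed update rule (simplified stand-in for calculate_fifo): resolving
    e requires the FIFO between its processors to hold all W(e) tokens."""
    def size_after(e: Edge) -> int:
        pair = (M[e[0]], M[e[1]])
        return max(F.get(pair, 0), W[e])

    best = min(candidates, key=size_after)
    return best, size_after(best)


M = {"a": "P0", "b": "P1", "c": "P1", "d": "P2"}
F = {("P0", "P1"): 2}
W = {("a", "b"): 1, ("c", "d"): 3}
print(pick_type4_edge([("a", "b"), ("c", "d")], M, F, W))  # (('a', 'b'), 2)
```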

49
Min Max Proof of Optimality
  • Induction. G is the complete precedence DAG. At
    step i, $G_i$ is the sub-graph we have solved, $G - G_i$
    is the sub-graph with only the remaining
    edges, and $F_i$ is the F function at step i.
  • Base case: $G_0$ is empty, so $G_0$ is optimal.
  • Induction step: assume $G_k$ is optimal
    ($\max\{F_k(M(src(e)), M(des(e))) \mid e \in G_k\}$ is
    minimal). Prove that $G_{k+1}$ is also optimal.
  • $G_{k+1}$ is obtained either by removing an edge of
    type 1, 2 or 3 (in which case $G_{k+1}$ is
    obviously optimal), or by updating the FIFO for an edge
    of type 4. In the latter case, we always pick an
    edge $e_{k+1}$ such that $F_{k+1}(M(src(e_{k+1})), M(des(e_{k+1})))$
    is minimum among such edges. Then
    $$\max\{F_{k+1}(M(src(e)), M(des(e))) \mid e \in G_{k+1}\} = \max\Big(\max\{F_k(M(src(e)), M(des(e))) \mid e \in G_k\},\; F_{k+1}(M(src(e_{k+1})), M(des(e_{k+1})))\Big)$$
    is also minimal. So $G_{k+1}$ is optimal.
  • This proof does not work for min total.

50
Linear cycle detection algorithm Ac
  • To decide whether the data edge from a to d is in a
    cycle of the dependency graph:
  • Without considering edge (a, d) in the dependency
    graph, can we reach d by traversing the graph from
    a?
  • Without considering edge (d, a) in the dependency
    graph, can we reach a by traversing the graph from
    d?
  • If either is true, return true;
    otherwise, return false.
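This is a sketch of the dependency graph and of Ac, assuming edges are given as (source, destination) pairs. Each test is one breadth-first search, so Ac runs in time linear in the size of the graph. Skipping the tested edge in each direction rules out the trivial two-edge cycle a->d->a that every bidirectional data edge forms. The function names are hypothetical.

```python
from collections import defaultdict, deque


def dependency_graph(schedule_edges, data_edges):
    """Schedule edges stay directed; every data edge becomes bidirectional."""
    adj = defaultdict(set)
    for u, v in schedule_edges:
        adj[u].add(v)
    for u, v in data_edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def reachable(adj, start, goal, skip):
    """Breadth-first search from start to goal that ignores the edge `skip`."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            return True
        for v in adj[u]:
            if (u, v) != skip and v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def in_dependency_cycle(adj, a, d):
    """Ac: is the data edge (a, d) on a cycle of the dependency graph?"""
    return reachable(adj, a, d, skip=(a, d)) or reachable(adj, d, a, skip=(d, a))


adj = dependency_graph([("a", "b"), ("c", "d")], [("a", "d"), ("b", "c")])
print(in_dependency_cycle(adj, "a", "d"))   # True: a -> b -> c -> d
adj2 = dependency_graph([("c", "d")], [("a", "d")])
print(in_dependency_cycle(adj2, "a", "d"))  # False
```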

51
Quadratic Min Max Algorithm Am
    for all i ∈ P, j ∈ P: space[i][j] = 0; fifo[i][j] = 0
    while E is not empty do
        type = 0; sel_src = None; sel_des = None; min_fifo = -1.0
        // type encodes priority: 3 = type-1 edge, 2 = type-2 edge, 1 = type-3/4 edge
        for each edge e = (src, des) do
            if src is free then
                if e is type 1 then
                    type = 3; sel_src = src; sel_des = des
                else if type < 2 and e is type 2 then
                    type = 2; sel_src = src; sel_des = des
                else if type <= 1 then                  // type 3 or 4
                    new_fifo_size = calculate_fifo(src, des)
                    if min_fifo < 0 or min_fifo > new_fifo_size then
                        type = 1; sel_src = src; sel_des = des
                        min_fifo = new_fifo_size
        switch (type)
            case 3: remove(sel_src, sel_des); break
            case 2: remove(sel_src, sel_des); remove(sel_des, sel_src); break
            case 1: if (sel_src, sel_des) is in a cycle according to Ac then
                        ... /* resolve blocking and update fifo and space */
                    remove(sel_src, sel_des); remove(sel_des, sel_src); break

52
Exponential Min Total Algorithm At
  • At is very similar to Am.
  • The difference: if only free edges of type 4 are left, At
    picks them one by one in an arbitrary order, and
    each time recursively computes the FIFO sizes
    based on that choice.
  • After finishing one FIFO assignment, it backtracks
    and picks another such edge to try again.
  • This process ends when all possible sequences of
    choices are exhausted. The FIFO assignment with the
    minimum total size is returned.
  • Some intermediate results can be saved and reused.
  • Because the exact min total problem is NP-hard,
    no polynomial-time exact algorithm is expected
    (unless P = NP), so At takes exponential time in
    the worst case.

53
Lower bound of FIFO size for Non-interleaving
Communication Ln
  • For platforms that do not allow interleaving
    communication, we compute a conservative lower
    bound on the FIFO size.
  • A common assumption in the literature.
  • Let
    $\forall p_1 \in P, p_2 \in P.\ (p_1 = p_2) \Rightarrow L_n(p_1, p_2) = 0$, and
    $(p_1 \ne p_2) \Rightarrow L_n(p_1, p_2) = \max\{W(e) \mid e \in E \wedge M(src(e)) = p_1 \wedge M(des(e)) = p_2\}$
  • Under this assumption, any valid F function must satisfy
    $\forall p_1 \in P, p_2 \in P.\ L_n(p_1, p_2) \le F(p_1, p_2)$
  • $L_n(p_1, p_2) = F(p_1, p_2)$ may not be achievable in
    most non-trivial cases.
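The bound Ln can be computed in one pass over the data edges for each processor pair. This sketch uses hypothetical names and a hypothetical edge set. For processor pairs with no data edge between them, we take the maximum to be 0, which the slide does not specify.

```python
def ln_lower_bound(P, M, W):
    """Ln(p1, p2) = 0 if p1 == p2; otherwise the largest W(e) over data edges
    e from p1 to p2 (0 when there is no such edge, our assumption)."""
    L = {}
    for p1 in P:
        for p2 in P:
            if p1 == p2:
                L[(p1, p2)] = 0
            else:
                L[(p1, p2)] = max((w for (s, d), w in W.items()
                                   if M[s] == p1 and M[d] == p2), default=0)
    return L


P = ["p0", "p1", "p2"]
M = {"v0": "p0", "v1": "p1", "v2": "p2", "v3": "p1", "v4": "p0"}
W = {("v0", "v1"): 2, ("v4", "v1"): 1, ("v0", "v2"): 3, ("v3", "v2"): 3}
L = ln_lower_bound(P, M, W)
print(max(L.values()), sum(L.values()))  # 3 8
```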

54
Example Am, At and Ln
  • Am
  • resolve (v0, v4)
  • resolve (v3, v7)
  • F(p0, p1) = 2, F(p1, p2) = 3
  • max = 3, total = 5
  • At
  • resolve (v3, v7)
  • F(p1, p2) = 3
  • max = 3, total = 3
  • Ln
  • Ln(p0, p1) = 2, Ln(p0, p2) = Ln(p1, p2) = 3
  • max = 3, total = 8

55
Implementation
  • The code is written in C++ and compiled with GCC
    and VC.
  • It uses the BGL (Boost Graph Library).
  • Adjacency list data structure for graphs.
  • Input
  • A text description of the tasks to be scheduled
    and the number of processors, either written by hand
    or generated randomly by TGFF (Task Graphs For Free).
  • Output
  • Screen output of the FIFO sizes between each pair of
    processors, and the time spent in each algorithm.

56
Benchmark of Bigger Tests
57
Outline
  • Motivation
  • Previous Work
  • Preliminaries
  • Problem Statement
  • Investigative Approach
  • Summary and Conclusion
  • Future Work

58
Compare with Prior Work
  • We are focusing on
  • sizing buffers for multiprocessor architectures
    using interleaving communication.
  • - Uniprocessor vs. multiprocessor
  • Much prior work targets uniprocessors.
  • Buffer sizing is more complicated for
    multiprocessors; there is little work on it, and
    none under interleaving communication.
  • - Non-interleaving vs. interleaving communication
    for inter-processor communication
  • Non-interleaving communication: the writing and
    reading tasks cannot exchange data in an
    interleaved way, so the reading task does not
    start reading until all the data are ready to be
    read. This is a conservative approach.
  • Interleaving communication: when the writing and
    reading tasks are both active, they can communicate
    any amount of data through a one-place buffer. This
    approach is more efficient.

59
Our Contribution
  • Theory
  • We proved
  • - Several properties about deadlock in a
    precedence graph.
  • - Given a legal scheduling for unbounded
    communication buffer, we can always find a legal
    scheduling with one-place communication buffer by
    using internal buffers. (P1)
  • - Using internal buffer will affect makespan.
    (P2)
  • - The problem given a Precedence DAG, find the
    minimal total buffer size avoiding deadlock is
    NP-hard. (P3)
  • Implementation
  • We implemented and tested
  • - Algorithms for minimizing maximum FIFO size
    without using internal buffers (P3)
  • - Algorithms for minimizing total FIFO size
    without using internal buffers. (P3)

60
Summary
  • Avoidance of artificial deadlock on architectures
    that support low-overhead interleaving
    communication
  • Show sufficiency of one-place buffers for
    inter-processor communication
  • Possible increase of minimum makespan with
    limited FIFO depth
  • Algorithms for minimizing two cost functions for
    FIFO size without using internal buffers
  • Minimize maximum FIFO size
  • Minimize total FIFO size (proved to be NP-hard)

61
Conclusions
  • Identification of an implementation gap between
    dataflow programming models and architectural
    platforms
  • Must be bridged to enable automated code synthesis

62
Future work
  • Better heuristics for Min-Total
  • Compare with optimal buffer sizing under no
    interleaving communication
  • Industrial case studies

63
Future Work Case Studies
  • Apply to industrial applications
  • JPEG encoder
  • Motion JPEG encoder
  • H.264 encoder
  • Deployment on multiprocessor architectures
  • Intel MXP5800
  • Xilinx Virtex II Pro with Fast Simplex Link (FSL)
    communication

64
Thank you!
65
Background Slides
66
Precedence DAG
  • A precedence directed acyclic graph (DAG) is a
    common representation for the deployment of an
    application across multiple processors.
    Precedence DAGs can be generated from statically
    schedulable dataflow descriptions, such as
    synchronous dataflow or cyclo-static dataflow.
  • The nodes in this graph represent an execution
    (or firing) of a particular task in the
    application. Node weights represent the estimated
    execution times on each of the processors in the
    architecture.
  • The directed edges in the graph represent data
    dependencies between nodes. A node can only be
    activated after data is received from all
    predecessor nodes. Edge weights represent the
    amount of data transferred between nodes.

67
Assumptions
  • Firing a node
  • We abstract the firing of a node (or a task) as
    an execution sequence on a single processor.

[Figure: the execution sequence of a node firing on processor Pi]
68
Assumptions
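  • Two kinds of deadlock are already avoided in the
    scheduling step:
  • 1. Cross
  • 2. Directed cycle

[Figure: (1) a cross between processors Pi and Pj over nodes n_i, n_{i+1}, ..., n_{i+m} and n_j, n_{j+1}, ..., n_{j+n}; (2) a directed cycle over nodes n1, n2, n3, n4, ..., nm. Each edge is a data edge, a schedule edge, or either.]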