Examples of One-Dimensional Systolic Arrays - PowerPoint PPT Presentation

About This Presentation
Description:

Using a systolic array for polynomial evaluation. This pipelined array can produce the value of a polynomial for a new X on every cycle, after an initial delay of 2n stages. ...

Slides: 25
Provided by: MarekPe2
Learn more at: http://web.cecs.pdx.edu
Transcript and Presenter's Notes


1
Examples of One-Dimensional Systolic Arrays
2
Motivation and Introduction
  • We need a high-performance, special-purpose
    computer system to meet specific application
    needs.
  • I/O and computation imbalance is a notable
    problem.
  • The concept of systolic architecture can map
    high-level computation into hardware structures.
  • A systolic system works like an automobile
    assembly line.
  • A systolic system is easy to implement because of
    its regularity, and easy to reconfigure.
  • Systolic architecture can result in
    cost-effective, high-performance special-purpose
    systems for a wide range of problems.

3
Pipelined Computations
4
Pipelined Computations
  • Pipelined program divided into a series of tasks
    that have to be completed one after the other.
  • Each task executed by a separate pipeline stage
  • Data streamed from stage to stage to form
    computation

5
Pipelined Computations
  • Computation consists of data streaming through
    pipeline stages
  • Execution Time = Time to fill pipeline (P-1)
    + Time to run in steady state (N-P+1)
    + Time to empty pipeline (P-1)

P = # of processors, N = # of data items (assume
P < N)
This slide must be explained in all detail. It is
very important.
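
The timing breakdown above can be checked with a small simulation. This is only a sketch under stated assumptions: every stage takes exactly one step, one new item enters per step, and the names `simulate_pipeline` and `formula_time` are hypothetical helpers introduced here, not part of the slides.

```python
def formula_time(n_items, n_stages):
    """Fill + steady state + drain, as on the slide (assumes P < N)."""
    fill = n_stages - 1
    steady = n_items - n_stages + 1
    drain = n_stages - 1
    return fill + steady + drain  # simplifies to N + P - 1


def simulate_pipeline(n_items, n_stages):
    """Count steps until the last item has been processed by the last stage."""
    stages = [None] * n_stages          # stages[k] = item currently in stage k
    waiting = list(range(n_items))      # items not yet fed into the pipeline
    finished = 0
    steps = 0
    while finished < n_items:
        steps += 1
        entering = waiting.pop(0) if waiting else None
        # Every item advances by one stage; a new item enters stage 0.
        stages = [entering] + stages[:-1]
        if stages[-1] is not None:      # last stage worked on an item this step
            finished += 1
    return steps


if __name__ == "__main__":
    print(simulate_pipeline(10, 4), formula_time(10, 4))
```

Under these assumptions the simulated step count agrees with the sum of the three terms, which simplifies to N + P - 1.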
6
Pipelined Example: Sieve of Eratosthenes
  • Goal is to take a list of integers greater than 1
    and produce a list of primes
  • E.g. For input 2 3 4 5 6 7 8 9 10, output is
    2 3 5 7
  • A pipelined approach
  • Processor P_i divides each input by the i-th
    prime
  • If the input is divisible (and not equal to the
    divisor), it is marked (with a negative sign) and
    forwarded
  • If the input is not divisible, it is forwarded
  • Last processor only forwards unmarked (positive)
    data, i.e. the primes

7
Sieve of Eratosthenes Pseudo-Code
  • Code for last processor
  • x = recv(data, P_(i-1))
  • If x > 0 then send(x, OUTPUT)
  • Code for processor P_i (and prime p_i)
  • x = recv(data, P_(i-1))
  • If (x > 0) then
  • If (p_i divides x and p_i ≠ x) then
    send(-x, P_(i+1))
  • If (p_i does not divide x or p_i = x) then
    send(x, P_(i+1))
  • Else
  • send(x, P_(i+1))

Processor P_i divides each input by the i-th prime
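
The pseudo-code above can be sketched in Python as follows. This is a functional sketch, not a timed simulation: each input is pushed through all stages in turn instead of the stages running in parallel, and the names `stage` and `sieve_pipeline` are hypothetical. It also assumes the list of stage primes is supplied by the caller.

```python
def stage(prime, x):
    """Processor P_i: mark x (negate it) if its prime divides x and x != prime."""
    if x > 0 and x % prime == 0 and x != prime:
        return -x          # marked as composite, still forwarded
    return x               # already marked, or not divisible: forward unchanged


def sieve_pipeline(data, primes):
    """Send every input through one stage per prime, then filter at the end."""
    output = []
    for x in data:
        for p in primes:           # stage P_1, P_2, ... in pipeline order
            x = stage(p, x)
        if x > 0:                  # last processor forwards only unmarked data
            output.append(x)
    return output


if __name__ == "__main__":
    print(sieve_pipeline(range(2, 11), [2, 3]))
```

For the slide's input 2 3 4 5 6 7 8 9 10, stages for the primes 2 and 3 are enough, because every composite up to 10 has a factor no larger than 3.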
8
Programming Issues
  • Algorithm will take N+P-1 steps to run, where N
    is the number of data items and P is the number
    of processors.
  • Can also consider just the odd numbers or do some
    initial part separately
  • In the given implementation, the processors
    must store all primes which will appear in the
    sequence
  • Not a scalable approach
  • Can fix this by having each processor do the job
    of multiple primes, i.e. mapping several logical
    processors in the pipeline to each physical
    processor
  • What is the impact of this on performance?

processor does the job of three primes
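
One way to picture the mapping of logical processors onto physical ones is sketched below. The contiguous-block assignment and the names `partition` and `physical_stage` are assumptions made for illustration; the slides do not prescribe a particular mapping.

```python
def partition(primes, n_physical):
    """Assign contiguous blocks of logical stages (primes) to physical processors."""
    if not primes or n_physical < 1:
        raise ValueError("need at least one prime and one physical processor")
    size = -(-len(primes) // n_physical)     # ceiling division
    return [primes[i:i + size] for i in range(0, len(primes), size)]


def physical_stage(my_primes, x):
    """One physical processor doing the job of several logical stages."""
    for p in my_primes:
        if x > 0 and x % p == 0 and x != p:
            return -x                        # marked; later primes need not test it
    return x


def grouped_sieve(data, primes, n_physical):
    groups = partition(primes, n_physical)
    output = []
    for x in data:
        for group in groups:                 # one hop per physical processor
            x = physical_stage(group, x)
        if x > 0:
            output.append(x)
    return output


if __name__ == "__main__":
    print(partition([2, 3, 5, 7, 11, 13], 2))
    print(grouped_sieve(range(2, 51), [2, 3, 5, 7, 11, 13], 2))
```

In this sketch each physical processor may perform up to as many divisions per item as it has primes, so a single pipeline step takes longer while the number of hops shrinks. That trade-off is one way to approach the question on the slide.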
9
Processors for such operation
  • In a pipelined algorithm, data flows through
    the processors in lockstep.
  • The design attempts to balance the work so that
    there is no bottleneck at any processor
  • In the mid-80s, processors were developed to
    support this kind of parallel pipelined
    computation in hardware
  • Two commercial products from Intel
  • Warp (1D array)
  • iWarp (components for 2D array)
  • Warp and iWarp were meant to operate
    synchronously. The Wavefront Array Processor
    (S.Y. Kung) was meant to operate asynchronously,
    i.e. arrival of data would signal that it was
    time to execute

10
Systolic Arrays
11
Example 1: pipelined polynomial evaluation
12
Example 1: pipelined polynomial evaluation
  • Polynomial evaluation is done by using a linear
    array with 2n PEs.
  • Expression
  • Y = ((((a_n x + a_(n-1)) x + a_(n-2)) x + a_(n-3)) ... x + a_1) x + a_0
  • Function of PEs in pairs
  • 1. Multiply input by x
  • 2. Pass result to right.
  • 3. Add a_j to result from left.
  • 4. Pass result to right.

13
Example 1: polynomial evaluation
Y = ((((a_n x + a_(n-1)) x + a_(n-2)) x + a_(n-3)) ... x + a_1) x + a_0
Multiplying processor
X is broadcast
Adding processor
  • Using a systolic array for polynomial evaluation.
  • This pipelined array can produce the value of the
    polynomial for a new X on every cycle, after an
    initial latency of 2n stages.
  • In another variant, you can also calculate
    various polynomials on the same X.
  • This is an example of a deeply pipelined
    computation.
  • The pipeline has 2n stages.
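
The 2n-stage array can be sketched as a cycle-by-cycle simulation. Two assumptions are made here that differ from the figure: each x travels along with its partial result (rather than being broadcast), so that a different x can enter on every cycle, and the coefficient a_n is injected together with x at the first stage. The names `apply_stage` and `pipelined_poly` are hypothetical.

```python
def apply_stage(stage, item):
    """One PE: a multiplier (times x) or an adder (plus a_j)."""
    if item is None:                  # bubble: nothing in this stage this cycle
        return None
    x, acc = item
    op, a = stage
    return (x, acc * x) if op == "mul" else (x, acc + a)


def pipelined_poly(coeffs, xs):
    """coeffs = [a_n, ..., a_1, a_0]; returns a list of (cycle, Y) pairs."""
    if len(coeffs) < 2:
        raise ValueError("sketch assumes degree n >= 1")
    stages = []
    for a in coeffs[1:]:              # one multiply/add pair per coefficient
        stages += [("mul", None), ("add", a)]
    regs = [None] * len(stages)       # output register of each stage
    feed = list(xs)
    results = []
    cycle = 0
    while feed or any(r is not None for r in regs):
        cycle += 1
        entering = (feed.pop(0), coeffs[0]) if feed else None
        inputs = [entering] + regs[:-1]
        regs = [apply_stage(s, i) for s, i in zip(stages, inputs)]
        if regs[-1] is not None:      # a finished Y leaves the array
            results.append((cycle, regs[-1][1]))
    return results


if __name__ == "__main__":
    # 2x^2 + 3x + 5 for x = 0, 1, 2
    print(pipelined_poly([2, 3, 5], [0, 1, 2]))
```

With n = 2 the first result appears at cycle 4 (the 2n-stage latency), and one further result appears on each following cycle.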

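[Figure: linear array of alternating multiplying (X) and adding processors, with inputs x and coefficients a_n, a_(n-1), a_(n-2), ..., a_0]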

14
For you to think about
  1. Pipelined Graph Coloring
  2. Pipelined Satisfiability
  3. Pipelined sorting/absorbing
  4. Pipelined decision function like Petrick
    Function.
  5. Pipelined multiplication.
  6. Pipelined calculation of (A B) (C D) on
    vectors A, B, C, D.

15
Example 2: Matrix Vector Multiplication
16
Example 2: Matrix Vector Multiplication
  • There are many ways to solve matrix problems
    using systolic arrays. Some of the methods are:
  • Triangular array performing Gaussian elimination
    with neighbor pivoting.
  • Triangular array performing orthogonal
    triangularization.
  • Simple matrix multiplication methods are shown in
    the next slides.

17
Example 2: Matrix Vector Multiplication
  • Matrix Vector Multiplication
  • Each cell's function is
  • 1. To multiply the top and bottom inputs.
  • 2. Add the left input to the product just
    obtained.
  • 3. Output the final result to the right.
  • Each cell consists of an adder and a few
    registers (Booth's algorithm for multiplication).
  • Or, a cell can include a hardware multiplier.

18
Matrix Multiplication
Example 2: Matrix Vector Multiplication
  • At time t0 the array receives l, a, p, q, and r
    (the other inputs are all zero).
  • At time t1, the array receives m, d, b, p, q, and
    r, etc.
  • The results emerge after 5 steps.
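
The data movement described above can be sketched as a step-by-step simulation of a linear array. The layout is an assumption inferred from the listed inputs: cell j keeps the vector element x_j on its bottom input (p, q, r), matrix entries arrive from the top skewed by one step per cell, and the left input of the first cell carries an initial value for each row (l, m, ...). The name `systolic_matvec` and the `init` parameter are hypothetical.

```python
def systolic_matvec(A, x, init=None):
    """Compute y = A x + init on a simulated linear systolic array.

    Returns (y, steps). Row i enters cell 0 at step i and reaches cell j
    at step i + j, where it meets matrix entry A[i][j] on the top input.
    """
    rows, cols = len(A), len(x)
    init = init or [0] * rows
    right_out = [None] * cols        # what each cell sent right on the last step
    y = [None] * rows
    t = 0
    while any(v is None for v in y):
        first = (t, init[t]) if t < rows else None
        left_in = [first] + right_out[:-1]
        new_out = []
        for j in range(cols):
            if left_in[j] is None:
                new_out.append(None)             # idle cell this step
            else:
                i, partial = left_in[j]
                # multiply top (A[i][j]) and bottom (x[j]) inputs, add left input
                new_out.append((i, partial + A[i][j] * x[j]))
        right_out = new_out
        if right_out[-1] is not None:            # a finished y_i leaves the array
            i, value = right_out[-1]
            y[i] = value
        t += 1
    return y, t


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(systolic_matvec(A, [1, 0, 2]))
```

For a 3 by 3 matrix this sketch finishes after 5 steps, which agrees with the step count stated on the slide.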

19
  • Explain how to multiply the first row of the
    matrix by the vector, and how data are shifted
    from left to right in the architecture

To visualize how it works, it is good to do a
snapshot animation.
20
Systolic Algorithms and Architectures
21
Systolic Algorithms
  • Systolic arrays were built to support systolic
    algorithms, a hot area of research in the early
    80s
  • Systolic algorithms used pipelining through
    various kinds of arrays to accomplish
    computational goals
  • Some of the data streaming and applications were
    very creative and quite complex
  • CMU was a hotbed of systolic algorithm and array
    research (especially H.T. Kung and his group)

22
Systolic Arrays from Intel
  • Warp and iWarp were examples of systolic arrays
  • Systolic means regular and rhythmic: data was
    supposed to move through pipelined computational
    units in a regular and rhythmic fashion
  • Systolic arrays were meant to be special-purpose
    processors or co-processors.
  • They were very fine-grained
  • Processors, usually called cells, implement a
    limited and very simple computation
  • Communication is very fast; granularity was meant
    to be around one operation per communication!

23
Systolic Processors, versus Cellular Automata
versus Regular Networks of Automata
[Figure labels: four Data Path Blocks (Systolic processor); four Control Blocks (Cellular Automaton)]
These slides are for one-dimensional only
24
Systolic Processors, versus Cellular Automata
versus Regular Networks of Automata
[Figure labels: four Control Blocks and four Data Path Blocks (Regular Network of Automata)]