Examples of One-Dimensional Systolic Arrays - PowerPoint PPT Presentation

About This Presentation

Title:

Examples of One-Dimensional Systolic Arrays

Description:

Using a systolic array for polynomial evaluation. This pipelined array can produce the value of a polynomial for a new x on every cycle, once the 2n stages have filled. ...

Slides: 25

Provided by: MarekPe2

Learn more at: http://web.cecs.pdx.edu

Transcript and Presenter's Notes

1
Examples of One-Dimensional Systolic Arrays
2
Motivation and Introduction

We need a high-performance, special-purpose computer system to meet the needs of a specific application.
I/O and computation imbalance is a notable problem.
The concept of systolic architecture can map high-level computation into hardware structures.
A systolic system works like an automobile assembly line.
A systolic system is easy to implement because of its regularity, and it is easy to reconfigure.
Systolic architecture can result in cost-effective, high-performance special-purpose systems for a wide range of problems.

3
Pipelined Computations
4
Pipelined Computations

A pipelined program is divided into a series of tasks that have to be completed one after the other.
Each task is executed by a separate pipeline stage.
Data is streamed from stage to stage to form the computation.

5
Pipelined Computations

A computation consists of data streaming through pipeline stages.
Execution time = time to fill the pipeline (P - 1) + time to run in steady state (N - P + 1) + time to empty the pipeline (P - 1) = N + P - 1

P = number of processors, N = number of data items (assume P < N)
This slide must be explained in full detail. It is very important.
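
The three phases of the execution time can be checked with a short sketch. The function name `pipeline_time` is an illustrative choice, and each stage is assumed to take exactly one time step per data item.

```python
def pipeline_time(p: int, n: int) -> int:
    """Total time steps for n items through a p-stage pipeline (assumes p <= n)."""
    fill = p - 1          # steps before the last stage receives its first item
    steady = n - p + 1    # steps during which every stage is busy
    drain = p - 1         # steps after the first stage has gone idle
    return fill + steady + drain


# The three phases always add up to n + p - 1.
total = pipeline_time(4, 10)
```

With 4 stages and 10 items, the phases are 3, 7, and 3 steps.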
6
Pipelined Example Sieve of Eratosthenes

The goal is to take a list of integers greater than 1 and produce a list of primes.
E.g. for input 2 3 4 5 6 7 8 9 10, the output is 2 3 5 7.
A pipelined approach:
Processor P_i divides each input by the i-th prime.
If the input is divisible (and not equal to the divisor), it is marked (with a negative sign) and forwarded.
If the input is not divisible, it is forwarded.
The last processor forwards only unmarked (positive) data, i.e. the primes.

7
Sieve of Eratosthenes Pseudo-Code

Code for last processor
x = recv(data, P_(i-1))
if x > 0 then send(x, OUTPUT)

Code for processor P_i (and prime p_i)
x = recv(data, P_(i-1))
if (x > 0) then
  if (p_i divides x and p_i != x) then
    send(-x, P_(i+1))
  if (p_i does not divide x or p_i = x) then
    send(x, P_(i+1))
else
  send(x, P_(i+1))

Processor P_i divides each input by the i-th prime.
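
The pseudo-code above can be tried out with a minimal Python sketch. It models only the data flow (each item visits the stages in order), not the lockstep timing, and the names `sieve_stage` and `pipelined_sieve` are illustrative, not taken from the slides.

```python
def sieve_stage(x: int, p: int) -> int:
    """One stage P_i holding prime p: mark proper multiples of p by negating them."""
    if x > 0 and x % p == 0 and x != p:
        return -x      # divisible and not the prime itself: mark it
    return x           # already marked, not divisible, or equal to p


def pipelined_sieve(data: list[int], primes: list[int]) -> list[int]:
    """Pass each item through all stages; the last processor keeps positives."""
    out = []
    for x in data:
        for p in primes:       # stage order matches processor order P_1, P_2, ...
            x = sieve_stage(x, p)
        if x > 0:              # last processor forwards only unmarked data
            out.append(x)
    return out


result = pipelined_sieve(list(range(2, 11)), [2, 3])
```

For the input 2 through 10, two stages holding the primes 2 and 3 are enough, because every composite number up to 10 has a factor of 2 or 3.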
8
Programming Issues

The algorithm takes N + P - 1 steps to run, where N is the number of data items and P is the number of processors.
One can also consider just the odd numbers, or do some initial part separately.
In the given implementation, the number of processors must be large enough to store all primes that will appear in the sequence.
This is not a scalable approach.
This can be fixed by having each processor do the job of multiple primes, i.e. by mapping several logical processors in the pipeline to each physical processor.
What is the impact of this on performance?

[Figure caption: each processor does the job of three primes.]
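
Mapping several logical sieve stages onto one physical processor could look like the following sketch. A simple block mapping is assumed (consecutive primes go to the same processor), and the names `physical_processor` and `run_on_processor` are hypothetical.

```python
def physical_processor(logical_stage: int, stages_per_processor: int) -> int:
    """Map a 0-based logical pipeline stage to the physical processor hosting it."""
    return logical_stage // stages_per_processor


def run_on_processor(x: int, local_primes: list[int]) -> int:
    """One physical processor applies all of its logical sieve stages in turn."""
    for p in local_primes:
        if x > 0 and x % p == 0 and x != p:
            x = -x     # same marking rule as a single-prime stage
    return x


primes = [2, 3, 5, 7, 11, 13]
k = 3                  # three primes per physical processor
groups = [primes[i:i + k] for i in range(0, len(primes), k)]
```

Under this assumption each physical processor performs up to k divisibility tests per item, so one pipeline step takes longer while the pipeline itself becomes shorter. That trade-off is one way to approach the performance question on the slide.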
9
Processors for such operation

In a pipelined algorithm, data moves through the processors in lockstep.
The design attempts to balance the work so that there is no bottleneck at any processor.
In the mid-1980s, processors were developed to support this kind of parallel pipelined computation in hardware.
Two machines of this kind, developed at CMU with industrial partners (iWarp jointly with Intel):
Warp (1D array)
iWarp (components for 2D array)
Warp and iWarp were meant to operate synchronously.
The Wavefront Array Processor (S.Y. Kung) was meant to operate asynchronously, i.e. the arrival of data would signal that it was time to execute.

10
Systolic Arrays
11
Example 1: Pipelined polynomial evaluation
12
Example 1: Pipelined polynomial evaluation

Polynomial evaluation is done using a linear array of 2n PEs.
Expression:
$Y = (\cdots(((a_n x + a_{n-1})x + a_{n-2})x + a_{n-3})x + \cdots + a_1)x + a_0$
Function of the PEs, in pairs:
1. Multiply input by x.
2. Pass result to right.
3. Add a_j to result from left.
4. Pass result to right.

13
Example 1: Polynomial evaluation
$Y = (\cdots(((a_n x + a_{n-1})x + a_{n-2})x + a_{n-3})x + \cdots + a_1)x + a_0$
Multiplying processor
x is broadcast
Adding processor

Using a systolic array for polynomial evaluation.
This pipelined array can produce the value of the polynomial for a new x on every cycle, once the 2n stages have filled.
In another variant, it can also evaluate different polynomials at the same x.
This is an example of a deeply pipelined computation.
The pipeline has 2n stages.

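[Figure: linear array of multiplying and adding PEs; x is an input to each multiplying PE, and the coefficients a_n, a_{n-1}, a_{n-2}, ..., a_0 are the coefficient inputs.]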
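
The 2n-stage pipeline can be simulated cycle by cycle with the sketch below. It assumes degree n >= 1 and, unlike a broadcast of a single x, lets each x travel through the array together with its partial result, so that several x values can be in flight at once. The names `make_stages` and `evaluate_stream` are illustrative.

```python
def make_stages(coeffs):
    """Build 2n stages for coeffs [a_n, ..., a_0]: (multiply, add a_j) pairs."""
    stages = []
    for a_j in coeffs[1:]:
        stages.append(lambda x, v: v * x)            # multiplying PE
        stages.append(lambda x, v, a=a_j: v + a)     # adding PE
    return stages


def evaluate_stream(coeffs, xs):
    """Feed one x per cycle; return (results, cycles). Assumes degree n >= 1."""
    stages = make_stages(coeffs)
    latches = [None] * len(stages)     # latches[i] holds the output of stage i
    pending = list(xs)
    results, cycles = [], 0
    while pending or any(item is not None for item in latches):
        # A new (x, a_n) pair enters on the left while inputs remain.
        feed = (pending.pop(0), coeffs[0]) if pending else None
        inputs = [feed] + latches[:-1]  # each stage reads its left neighbour
        latches = [None if item is None else (item[0], stage(item[0], item[1]))
                   for stage, item in zip(stages, inputs)]
        if latches[-1] is not None:
            results.append(latches[-1][1])
            latches[-1] = None          # the finished value leaves the array
        cycles += 1
    return results, cycles


# x^2 + 2x + 3 evaluated at x = 2 and x = 3
results, cycles = evaluate_stream([1, 2, 3], [2, 3])
```

For n = 2 the first result appears after 2n = 4 cycles and one further result follows on each later cycle, which matches the N + P - 1 pipeline timing with P = 2n.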

14
For you to think about

Pipelined Graph Coloring
Pipelined Satisfiability
Pipelined sorting/absorbing
Pipelined decision function, like the Petrick function.
Pipelined multiplication.
Pipelined calculation of (A B) (C D) on vectors A, B, C, D.

15
Example 2: Matrix-Vector Multiplication
16
Example 2: Matrix-Vector Multiplication

There are many ways to solve matrix problems using systolic arrays. Some of the methods are:
A triangular array performing Gaussian elimination with neighbor pivoting.
A triangular array performing orthogonal triangularization.
Simple matrix multiplication methods are shown in the next slides.

17
Example 2: Matrix-Vector Multiplication

Matrix-Vector Multiplication
Each cell's function is:
1. Multiply the top and bottom inputs.
2. Add the left input to the product just obtained.
3. Output the final result to the right.
Each cell consists of an adder and a few registers (using the Booth algorithm for multiplication).
Alternatively, a cell can include a hardware multiplier.

18
Matrix Multiplication
Example 2: Matrix-Vector Multiplication

At time t0 the array receives 1, a, p, q, and r (the other inputs are all zero).
At time t1 the array receives m, d, b, p, q, and r, etc.
The results emerge after 5 steps.

Explain how to multiply the first row of the matrix by the vector, and how data are shifted from left to right in the architecture.

To visualize how it works, it is good to do a snapshot animation.
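
A step-by-step snapshot of such an array can be produced with the following sketch. It assumes one particular data arrangement, which may differ from the figure on the slide: the vector elements stay at the bottom inputs, the matrix rows enter from the top skewed by one step per cell, and partial sums move from left to right. The names `cell` and `systolic_matvec` are illustrative.

```python
def cell(top, bottom, left):
    """One cell: multiply the top and bottom inputs, then add the left input."""
    return left + top * bottom


def systolic_matvec(matrix, vector):
    """Time-stepped linear array computing matrix times vector; returns (y, steps)."""
    rows, cols = len(matrix), len(vector)
    right = [0] * cols             # right[j]: value cell j output on the last step
    y = []
    steps = rows + cols - 1        # row i leaves the last cell at step i + cols - 1
    for t in range(steps):
        new_right = []
        for j in range(cols):
            i = t - j              # row whose partial sum is at cell j on this step
            top = matrix[i][j] if 0 <= i < rows else 0   # skewed matrix input
            left = right[j - 1] if j > 0 else 0          # partial sum from the left
            new_right.append(cell(top, vector[j], left))
        right = new_right
        if 0 <= t - (cols - 1) < rows:
            y.append(right[-1])    # a finished dot product leaves the array
    return y, steps


y, steps = systolic_matvec([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [1, 0, 2])
```

For a 3 x 3 matrix this arrangement needs 3 + 3 - 1 = 5 steps, in line with the 5 steps stated above. Printing `right` inside the loop gives the snapshots for an animation.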
20
Systolic Algorithms and Architectures
21
Systolic Algorithms

Systolic arrays were built to support systolic algorithms, a hot area of research in the early 1980s.
Systolic algorithms used pipelining through various kinds of arrays to accomplish computational goals.
Some of the data streaming schemes and applications were very creative and quite complex.
CMU was a hotbed of systolic algorithm and array research (especially H.T. Kung and his group).

22
Systolic Arrays from Intel

Warp and iWarp were examples of systolic arrays.
Systolic means regular and rhythmic: data was supposed to move through pipelined computational units in a regular and rhythmic fashion.
Systolic arrays were meant to be special-purpose processors or co-processors.
They were very fine-grained.
The processors, usually called cells, implement a limited and very simple computation.
Communication is very fast; the granularity was meant to be around one operation per communication.

23
Systolic Processors versus Cellular Automata versus Regular Networks of Automata
[Figure: four Data Path Blocks, labeled "Systolic processor"; four Control Blocks, labeled "Cellular Automaton".]
These slides are for one-dimensional arrays only.
24
Systolic Processors versus Cellular Automata versus Regular Networks of Automata
[Figure: four Control Blocks and four Data Path Blocks, labeled "Regular Network of Automata".]
