1
StreamIt on Raw
  • StreamIt Group
  • Michael Gordon, William Thies, Michal Karczmarek,
    David Maze, Jasper Lin, Jeremy Wong, Andrew Lamb,
    Ali S. Meli, Chris Leger, Sam Larsen, and Saman
    Amarasinghe

MIT Laboratory for Computer Science
MIT Computer Architecture Workshop, September 19, 2002
2
Von Neumann Languages
  • Why did C (FORTRAN, etc.) become so successful?
  • It abstracted out the differences between von Neumann
    machines:
      • Register set structure
      • Functional units and capabilities
      • Pipeline depth/width
      • Memory/cache organization
  • It directly exposed their common properties:
      • Single memory image
      • Single control flow
      • A clear notion of time
  • It could be mapped very efficiently onto a von
    Neumann machine.
  • Today von Neumann languages are a curse!

3
StreamIt: A Spatially-Aware Language
  • A language for streaming applications
  • Provides a high-level stream abstraction
      • A filter is the autonomous unit of computation.
  • Breaks the von Neumann language barrier
      • Each filter has its own PC
      • Each filter has its own address space
      • No global time
      • Explicit data movement between filters

4
The Filter
  • A filter communicates through FIFO channels using
    the following operations:
      • pop(): dequeue the bottom item from the incoming
        channel.
      • peek(index): return the value at position index
        of the incoming channel without dequeuing it.
      • push(value): enqueue value on the outgoing
        channel.
  • The pop, peek, and push rates for each firing of a
    filter must be statically determined.
  • Each filter contains:
      • An initialization function
      • A steady-state work function
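To make the three channel operations concrete, here is a minimal sketch in plain C rather than StreamIt syntax. The Channel ring buffer, its capacity, and the three-tap averaging filter (peek 3, pop 1, push 1) are hypothetical choices for illustration, not the StreamIt runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical FIFO channel backed by a fixed-size ring buffer. */
#define CHAN_CAP 64

typedef struct {
    float items[CHAN_CAP];
    size_t head;  /* index of the oldest ("bottom") item */
    size_t count; /* number of queued items */
} Channel;

static void chan_init(Channel *c) {
    c->head = 0;
    c->count = 0;
}

/* push(value): enqueue value at the tail of the channel. */
static void push(Channel *c, float value) {
    assert(c->count < CHAN_CAP);
    c->items[(c->head + c->count) % CHAN_CAP] = value;
    c->count++;
}

/* peek(index): read the item at position index without dequeuing it. */
static float peek(const Channel *c, size_t index) {
    assert(index < c->count);
    return c->items[(c->head + index) % CHAN_CAP];
}

/* pop(): dequeue and return the oldest item. */
static float pop(Channel *c) {
    float v = peek(c, 0);
    c->head = (c->head + 1) % CHAN_CAP;
    c->count--;
    return v;
}

/* Hypothetical 3-tap averaging filter with static rates. */
enum { AVG_PEEK = 3, AVG_POP = 1, AVG_PUSH = 1 };

typedef struct {
    float weights[AVG_PEEK];
} AvgFilter;

/* Initialization function: runs once before the steady state. */
static void avg_init(AvgFilter *f) {
    for (int i = 0; i < AVG_PEEK; i++)
        f->weights[i] = 1.0f / AVG_PEEK;
}

/* Steady-state work function: each firing peeks AVG_PEEK items,
 * pushes AVG_PUSH result, and pops AVG_POP item. */
static void avg_work(const AvgFilter *f, Channel *in, Channel *out) {
    float sum = 0.0f;
    for (int i = 0; i < AVG_PEEK; i++)
        sum += f->weights[i] * peek(in, (size_t)i);
    push(out, sum);
    for (int i = 0; i < AVG_POP; i++)
        (void)pop(in);
}
```

Because the rates are fixed, a firing is legal exactly when the input channel holds at least AVG_PEEK items, which is what lets a compiler schedule filters statically.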

5
StreamIt Language
  • A collection of filters connected by channels.
  • Structured streams
      • Streaming applications have structure; they are not
        a free-form graph.
      • A few constructs: pipeline, splitjoin, and
        feedback loop
      • Hierarchical composition
      • Intuitive textual representation
      • Greatly simplifies compiler analysis

6
Hierarchical Structures
  • pipeline
      • Sequential composition of streams
  • splitjoin
      • Parallel composition of streams
  • feedback loop
      • Cyclic composition of streams
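As a rough illustration only, the pipeline and splitjoin constructs can be mimicked in C over stateless filters that pop one item and push one item. The function names, the duplicate splitter, and the round-robin joiner are assumptions for this sketch; feedback loops are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical stateless filter that pops 1 item and pushes 1 item. */
typedef int (*Filter1)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }
static int negate(int x) { return -x; }

/* pipeline: sequential composition; the output of stage s feeds stage s+1. */
static void run_pipeline(const Filter1 *stages, size_t n_stages,
                         const int *in, int *out, size_t n_items) {
    for (size_t k = 0; k < n_items; k++) {
        int v = in[k];
        for (size_t s = 0; s < n_stages; s++)
            v = stages[s](v);
        out[k] = v;
    }
}

/* splitjoin: parallel composition with a duplicate splitter (every branch
 * sees every input item) and a round-robin joiner (one output from each
 * branch in turn). out must hold n_items * n_branches values. */
static void run_splitjoin(const Filter1 *branches, size_t n_branches,
                          const int *in, int *out, size_t n_items) {
    assert(n_branches > 0);
    for (size_t k = 0; k < n_items; k++)
        for (size_t b = 0; b < n_branches; b++)
            out[k * n_branches + b] = branches[b](in[k]);
}
```

For example, a pipeline of add_one then twice maps 1, 2, 3 to 4, 6, 8, while a splitjoin of twice and negate maps 1 to the interleaved pair 2, -1.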

7
Compiler Flow Summary
8
Partitioning
  • Goal: the granularity of the stream graph should
    match the target architecture.
      • For Raw, we want the number of filters in the
        stream graph to equal the number of tiles.
      • The final stream graph needs to be load balanced.
  • Partitioning is currently driven by a simple
    greedy algorithm.
  • Two primary transformations:
      • Fission
      • Fusion
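The slides do not spell out the greedy algorithm. As a hypothetical sketch of the fusion half only, assuming a linear pipeline with known per-filter work estimates, one greedy rule is to repeatedly fuse the adjacent pair with the smallest combined work until the filter count matches the tile count. This is not necessarily the heuristic the StreamIt compiler uses.

```c
#include <assert.h>
#include <stddef.h>

/* Greedily fuse adjacent pipeline stages until at most n_tiles remain.
 * work[] holds hypothetical per-stage work estimates; it is updated in
 * place and the new stage count is returned. */
static size_t greedy_fuse(long *work, size_t n, size_t n_tiles) {
    assert(n_tiles > 0);
    while (n > n_tiles) {
        /* Find the adjacent pair whose fused work is smallest. */
        size_t best = 0;
        for (size_t i = 1; i + 1 < n; i++)
            if (work[i] + work[i + 1] < work[best] + work[best + 1])
                best = i;
        /* Fuse stage best+1 into stage best and close the gap. */
        work[best] += work[best + 1];
        for (size_t i = best + 1; i + 1 < n; i++)
            work[i] = work[i + 1];
        n--;
    }
    return n;
}
```

For example, fusing the work estimates {5, 1, 1, 4, 2, 6} down to 4 tiles yields {5, 6, 2, 6}: the cheapest pair is fused at each step, so the heaviest stage grows as little as possible.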

9
Partitioning - Fission
  • Fission: splitting streams
      • Duplicate a filter, placing the duplicates in a
        splitjoin to expose parallelism.
        Figure: a Filter replaced by a Splitter feeding three duplicate Filters into a Joiner
      • Split a filter into a pipeline for load
        balancing.
        Figure: a Filter split into a pipeline Filter0, Filter1, ..., FilterN
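A minimal sketch of the first kind of fission, assuming a stateless filter that pops one item and pushes one item (squaring is an arbitrary stand-in). A round-robin splitter deals items to the duplicates and a round-robin joiner collects them, so the output matches running the original filter sequentially. The fixed array bounds are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COPIES 8
#define MAX_ITEMS 64

/* A hypothetical stateless filter (pop 1, push 1). Fission like this is
 * only safe for filters that keep no mutable state between firings. */
static int square(int x) { return x * x; }

/* Fission of square() into n_copies duplicates: a round-robin splitter
 * deals item k to copy k % n_copies, each copy fires on its own queue,
 * and a round-robin joiner reassembles outputs in the original order. */
static void fissed_square(const int *in, int *out, size_t n_items, size_t n_copies) {
    int queue[MAX_COPIES][MAX_ITEMS];
    size_t len[MAX_COPIES] = {0};
    assert(n_copies > 0 && n_copies <= MAX_COPIES && n_items <= MAX_ITEMS);

    /* Splitter: round-robin distribution. */
    for (size_t k = 0; k < n_items; k++) {
        size_t c = k % n_copies;
        queue[c][len[c]++] = in[k];
    }
    /* Each duplicate fires independently (these could run on separate tiles). */
    for (size_t c = 0; c < n_copies; c++)
        for (size_t i = 0; i < len[c]; i++)
            queue[c][i] = square(queue[c][i]);
    /* Joiner: round-robin collection restores the sequential order. */
    size_t next[MAX_COPIES] = {0};
    for (size_t k = 0; k < n_items; k++) {
        size_t c = k % n_copies;
        out[k] = queue[c][next[c]++];
    }
}
```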
10
Partitioning - Fusion
  • Fusion: merging streams
      • Reduce the number of filters in a construct for
        load balancing and synchronization removal.
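As a hypothetical illustration of fusion, consider an upstream filter A (pop 1, push 1) feeding a downstream filter B (pop 2, push 1). One way to fuse them, sketched below in C, is to fire A twice and B once per fused firing, turning the A-to-B channel into a local array. The filters and rates are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upstream filter A: pop 1, push 1 (scale by 3). */
static int filter_a(int x) { return 3 * x; }

/* Hypothetical downstream filter B: pop 2, push 1 (sum of a pair). */
static int filter_b(const int pair[2]) { return pair[0] + pair[1]; }

/* Fused filter AB: each firing runs A twice and B once, so its rates are
 * pop 2, push 1. The A->B channel becomes the local array, so A and B no
 * longer synchronize through the network. Returns the number of outputs. */
static size_t fused_ab_run(const int *in, size_t n_in, int *out) {
    assert(n_in % 2 == 0); /* whole steady-state firings only */
    size_t n_out = 0;
    for (size_t k = 0; k < n_in; k += 2) {
        int local[2];
        local[0] = filter_a(in[k]);
        local[1] = filter_a(in[k + 1]);
        out[n_out++] = filter_b(local);
    }
    return n_out;
}
```

For inputs 1, 2, 3, 4 the fused filter produces 9 and 21, the same values the unfused A-then-B pair would produce.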

11
Partitioning Example (Sort)
Figure: the Sort stream graph before partitioning (242 filters) and after partitioning (16 filters)
12
Layout
  • Goal: assign each filter to exactly one Raw
    tile.
  • The layout algorithm is implemented using
    simulated annealing.
  • The cost function (energy) tries to measure the
    added synchronization imposed by the layout.
  • We want to avoid:
      • Crossed routes
      • Routes passing through tiles assigned to filters
  • Because of the static properties of StreamIt, the
    exact communication properties of the stream
    graph are known at compile time.
      • The cost function is quite accurate.
      • This leads to excellent layouts.
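The following C sketch shows simulated annealing for a one-to-one assignment of filters to a 4x4 grid of tiles. The cost function is a simplified, hypothetical stand-in (total Manhattan distance between consecutive filters of a 16-filter pipeline), not the synchronization measure described above, which also penalizes crossed routes and routes through occupied tiles. The cooling schedule, random generator, and parameter values are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define GRID 4                   /* 4x4 grid of Raw tiles */
#define N_FILTERS (GRID * GRID)  /* one filter per tile */

/* e^{-x} for x >= 0 without libm: halve x until small, apply a short
 * Taylor series, then square the result back up. */
static double exp_neg(double x) {
    if (x > 700.0) return 0.0;
    int k = 0;
    while (x > 0.0625) { x *= 0.5; k++; }
    double r = 1.0 - x + x * x / 2.0 - x * x * x / 6.0 + x * x * x * x / 24.0;
    while (k-- > 0) r *= r;
    return r;
}

/* Deterministic 64-bit linear congruential generator; returns [0, 1). */
static double rand01(uint64_t *state) {
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) / 9007199254740992.0; /* divide by 2^53 */
}

/* Hop distance between two tiles, indexed row-major as y * GRID + x. */
static int manhattan(int a, int b) {
    int dx = a % GRID - b % GRID, dy = a / GRID - b / GRID;
    return (dx < 0 ? -dx : dx) + (dy < 0 ? -dy : dy);
}

/* Hypothetical energy: total hop distance between consecutive filters
 * of a pipeline. place[f] is the tile assigned to filter f. */
static int layout_cost(const int *place) {
    int c = 0;
    for (int f = 0; f + 1 < N_FILTERS; f++)
        c += manhattan(place[f], place[f + 1]);
    return c;
}

/* Simulated annealing: each move swaps the tiles of two filters, keeping
 * the assignment one-to-one. Improving moves are always accepted; worse
 * moves with probability e^{-delta/T}. The best layout seen is written
 * back to place[] and its cost returned. */
static int anneal_layout(int *place, double t0, double cooling, int iters, uint64_t seed) {
    assert(t0 > 0.0 && cooling > 0.0 && cooling < 1.0);
    int best[N_FILTERS];
    int cost = layout_cost(place), best_cost = cost;
    for (int i = 0; i < N_FILTERS; i++) best[i] = place[i];
    double t = t0;
    for (int it = 0; it < iters; it++, t *= cooling) {
        int a = (int)(rand01(&seed) * N_FILTERS);
        int b = (int)(rand01(&seed) * N_FILTERS);
        if (a == b) continue;
        int tmp = place[a]; place[a] = place[b]; place[b] = tmp;
        int delta = layout_cost(place) - cost;
        if (delta <= 0 || rand01(&seed) < exp_neg(delta / t)) {
            cost += delta;
            if (cost < best_cost) {
                best_cost = cost;
                for (int i = 0; i < N_FILTERS; i++) best[i] = place[i];
            }
        } else {
            tmp = place[a]; place[a] = place[b]; place[b] = tmp; /* undo the swap */
        }
    }
    for (int i = 0; i < N_FILTERS; i++) place[i] = best[i];
    return best_cost;
}
```

Returning the best layout seen, rather than the final state, guarantees the result is never worse than the starting layout; a real cost function would simply replace layout_cost.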

13
Layout Example (FFT)
Figure: the partitioned FFT stream graph and its zero-cost layout
14
Layout Example (Radio)
Figure: the partitioned Radio stream graph and its best layout
15
Routing
  • At this time, data items are routed using a
    simple dimension-ordered router.
  • The router traces the path from source to
    destination by first routing the Y dimension and
    then the X dimension.
  • All items are sent over the first static network.
  • The second static network and the dynamic network
    are unused.
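A minimal sketch of the dimension-ordered router described above, assuming tiles are addressed by (x, y) coordinates: the route first moves along Y until the row matches, then along X. The Tile type and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y; } Tile;

/* Dimension-ordered route: move along Y until the row matches, then along X.
 * Writes the tiles visited (source and destination included) to path[] and
 * returns how many were written. path must hold |dx| + |dy| + 1 entries. */
static size_t route_yx(Tile src, Tile dst, Tile *path) {
    size_t n = 0;
    Tile cur = src;
    path[n++] = cur;
    while (cur.y != dst.y) {
        cur.y += (dst.y > cur.y) ? 1 : -1;
        path[n++] = cur;
    }
    while (cur.x != dst.x) {
        cur.x += (dst.x > cur.x) ? 1 : -1;
        path[n++] = cur;
    }
    assert(cur.x == dst.x && cur.y == dst.y);
    return n;
}
```

From (0, 0) to (2, 3), this visits (0, 1), (0, 2), (0, 3), (1, 3), and (2, 3). Because every route turns at most once and always in the same order, the paths are fully determined at compile time.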

16
Communication Scheduling
  • The communication scheduler maps StreamIt's
    channel abstraction to Raw's static network.
  • The communication scheduler simulates the
    execution of a given schedule, recording the
    communication as it simulates.
      • Assume that each filter fires instantaneously.
      • Record the routing instruction for the source,
        destination, and intermediate hops.
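To sketch how such a simulation could turn sends into switch code, the C example below replays a firing order, treats each firing as instantaneous, and appends one route instruction per hop to each tile's switch program. The port letters ('P' for the tile's processor, N/S/E/W for neighbors, with row 0 at the north edge), the program layout, and the pipeline connectivity are all hypothetical; real Raw switch instructions have their own syntax. Routing follows the Y-then-X order from the Routing slide.

```c
#include <assert.h>
#include <stddef.h>

#define GRID 4
#define MAX_INSTR 64

/* One hypothetical route instruction: take an item from port `in`,
 * forward it to port `out`. 'P' is the tile's own processor. */
typedef struct { char in, out; } RouteInstr;

/* Per-tile switch program recorded during the simulation. */
typedef struct { RouteInstr instr[MAX_INSTR]; size_t len; } SwitchProgram;

/* Port of tile `from` that faces neighboring tile `to` (row-major indices). */
static char dir_between(int from, int to) {
    int dx = to % GRID - from % GRID, dy = to / GRID - from / GRID;
    assert(dx * dx + dy * dy == 1); /* neighbors only */
    if (dx == 1) return 'E';
    if (dx == -1) return 'W';
    return dy == 1 ? 'S' : 'N';
}

/* Record one item sent from tile src to tile dst: every tile on the
 * Y-then-X path gets one route instruction, in path order. */
static void record_send(SwitchProgram *prog, int src, int dst) {
    int path[2 * GRID], n = 0, cur = src;
    assert(src != dst);
    path[n++] = cur;
    while (cur / GRID != dst / GRID) {
        cur += (dst / GRID > cur / GRID) ? GRID : -GRID;
        path[n++] = cur;
    }
    while (cur % GRID != dst % GRID) {
        cur += (dst % GRID > cur % GRID) ? 1 : -1;
        path[n++] = cur;
    }
    for (int i = 0; i < n; i++) {
        RouteInstr r;
        r.in = (i == 0) ? 'P' : dir_between(path[i], path[i - 1]);
        r.out = (i == n - 1) ? 'P' : dir_between(path[i], path[i + 1]);
        SwitchProgram *p = &prog[path[i]];
        assert(p->len < MAX_INSTR);
        p->instr[p->len++] = r;
    }
}

/* Simulate a schedule in which each filter fires instantaneously: firing
 * filter f pushes push_count[f] items to filter consumer[f], and each item
 * is recorded hop by hop. tile_of[] is the layout. */
static void simulate_schedule(SwitchProgram *prog, const int *firing_order,
                              size_t n_firings, const int *tile_of,
                              const int *consumer, const int *push_count) {
    for (size_t k = 0; k < n_firings; k++) {
        int f = firing_order[k];
        for (int i = 0; i < push_count[f]; i++)
            record_send(prog, tile_of[f], tile_of[consumer[f]]);
    }
}
```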

17
Code Generation
  • For the compute-processor, we generate C code
    that is compiled using Raw's GCC port.
  • We introduce an internal buffer for each filter.
      • The buffer is necessary because of the peek
        operation.
      • All items are received into this buffer.
  • The work function loops infinitely in the steady state.
      • Each filter buffers its input until it holds peek
        items (its peek rate), then it fires.
      • pop() and peek(index) are reads from the buffer.
      • A push(value) is a static network send.
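A sketch of the generated steady-state loop, assuming a filter with peek rate 3, pop rate 1, and push rate 1. The static_receive and static_send functions here are hypothetical stubs that read from and write to arrays, not Raw's actual network interface; the power-of-two circular buffer and the bounded loop (so the code can be tested) are also choices made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the static network: the "network" is just
 * an input array and an output array. */
static const float *net_in;
static size_t net_in_pos;
static float *net_out;
static size_t net_out_pos;

static float static_receive(void) { return net_in[net_in_pos++]; }
static void static_send(float v) { net_out[net_out_pos++] = v; }

/* Internal buffer: a circular buffer whose size is a power of two, so
 * indices wrap with a bit mask. */
#define BUF_SIZE 8 /* must be >= the filter's peek rate */
#define BUF_MASK (BUF_SIZE - 1)
static float buf[BUF_SIZE];
static size_t buf_head, buf_count;

static void buf_receive(void) {
    assert(buf_count < BUF_SIZE);
    buf[(buf_head + buf_count) & BUF_MASK] = static_receive();
    buf_count++;
}

static float buf_peek(size_t i) { return buf[(buf_head + i) & BUF_MASK]; }

static float buf_pop(void) {
    assert(buf_count > 0);
    float v = buf[buf_head];
    buf_head = (buf_head + 1) & BUF_MASK;
    buf_count--;
    return v;
}

/* Example peek rate (hypothetical). */
#define PEEK 3

/* Steady-state loop: receive until PEEK items are buffered, then fire.
 * pop/peek read the buffer; push is a network send. Generated code would
 * loop forever; this version stops after n_firings so it can be tested. */
static void run_filter(size_t n_firings) {
    for (size_t k = 0; k < n_firings; k++) {
        while (buf_count < PEEK)
            buf_receive();
        /* Work function body: a 3-tap difference filter. */
        float y = buf_peek(2) - buf_peek(0);
        static_send(y);
        (void)buf_pop();
    }
}
```

After the first firing only one new item is received per firing, since the buffer already holds the two items that the previous firing peeked but did not pop.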

18
Results
  • We have detailed performance measurements over
    our 9 benchmarks in our upcoming ASPLOS paper,
    but we will not give them here.
  • This is our initial implementation and we are
    working on optimizations.
  • But the results show that we are not
    communication limited.
  • We need to focus on optimizing the generated
    compute-processor code.
  • In the following slides we give a comparison of
    StreamIt and C code for our benchmarks.

19
Speedup Over Single Tile
  • For Radio we obtained the C implementation from a
    3rd party
  • For FIR, Sort, FFT, Filterbank, and 3GPP we wrote
    the C implementation following a reference
    algorithm.

20
Intel Xeon™ Comparison
Figure: throughput per cycle, normalized to a Xeon at 2.2 GHz, for a sequential C program on 1 tile and a StreamIt program on 16 tiles, across FIR, Radar, Radio, Sort, FFT, Filterbank, GSM, Vocoder, and 3GPP
  • For Radio, GSM, and Vocoder we obtained the C
    implementation from a 3rd party
  • For FIR, Sort, FFT, Filterbank, Radar, and 3GPP
    we wrote the C implementation following a
    reference algorithm.
  • For Radar, GSM, and Vocoder the C implementation
    did not fit on a single Raw tile.

21
Conclusion
  • First step toward a portable stream language for
    communication-exposed architectures
  • Future work
      • Optimizing the implementation
      • Supporting more features of StreamIt
  • Other cool StreamIt projects
      • New syntax
      • Domain-specific linear dataflow analysis and
        transformation for DSP
      • Constrained scheduling

22
For More Information
StreamIt Homepage
http://cag.lcs.mit.edu/streamit
  • William Thies, Michal Karczmarek, and Saman
    Amarasinghe. StreamIt: A Language for Streaming
    Applications. 2002 International Conference on
    Compiler Construction, Grenoble, France. To
    appear in Springer-Verlag Lecture Notes in
    Computer Science.
  • Michael I. Gordon, William Thies, et al. A
    Stream Compiler for Communication-Exposed
    Architectures. Proceedings of the Tenth
    International Conference on Architectural Support
    for Programming Languages and Operating Systems,
    San Jose, CA, October 2002.
  • Michael I. Gordon. A Stream-Aware Compiler for
    Communication-Exposed Architectures. S.M. Thesis,
    Massachusetts Institute of Technology, August
    2002.