Effective Automatic Parallelization of Stencil Computations*

1
Effective Automatic Parallelization of Stencil Computations*
  • Sriram Krishnamoorthy¹
  • Muthu Baskaran¹, Uday Bondhugula¹, Atanas Rountev¹,
  • J. Ramanujam², P. Sadayappan¹
  • ¹The Ohio State University
  • ²Louisiana State University

* Work supported by NSF
2
Introduction
  • Stencil computations
  • Sweep through large data set
  • Multiple time iterations
  • Simple load balanced schedule
  • Tiling essential to improve data locality
  • Dependences between tiles
  • Pipelined execution
  • Skewed iteration spaces: load imbalance
  • Solution: adjust tiling to re-enable concurrent
    execution

3
Motivation
FOR t = 0 TO T-1
  FOR i = 1 TO N-1
    A[t,i] = (A[t,i-1] + A[t,i] + A[t,i+1]) / 3
[Figure: iteration space with time axis t and space axis i]
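
The loop nest above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes a Jacobi-style reading (each time step reads only the previous step's values) and fixed end cells, neither of which the slide states explicitly. The name `stencil_sweep` is hypothetical.

```python
def stencil_sweep(a, steps):
    """Apply `steps` time iterations of a 3-point averaging stencil.

    Assumptions (not stated on the slide): Jacobi-style update, and the
    two end cells keep their initial values.
    """
    cur = list(a)
    for _ in range(steps):                  # time loop (t)
        nxt = list(cur)                     # end cells are copied unchanged
        for i in range(1, len(cur) - 1):    # space loop (i)
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3
        cur = nxt
    return cur


result = stencil_sweep([0.0, 3.0, 0.0], 1)
```

Under this reading, every point (t, i) depends on (t-1, i-1), (t-1, i) and (t-1, i+1), so all points of one time step can be computed concurrently.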
4
Notation
  • Iteration space B: n-dimensional polyhedron
  • Dependences D: n-dimensional vectors
  • Hyperplanes H: n-dimensional normal vectors
  • Tile: bounded by pairs of hyperplanes

5
Approach
  • Concurrent start in non-tiled iteration space
  • Identify hyperplanes inhibiting concurrent start
    in tiled space
  • Replace one face for each inhibiting pair
  • Overlapped tiling: replace back face
  • Split tiling: replace front face

6
Concurrent Start Before Tiling
Condition: a boundary that does not carry any
dependence
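
One operational way to read this condition is that no point on the boundary has a predecessor inside the iteration space, so all boundary points can start at once. The brute-force check below is a sketch under that reading. The dependence vectors (Jacobi-style), the small space sizes, and the names `has_concurrent_start` and `in_space` are assumptions made for illustration.

```python
T, N = 4, 8                              # small illustrative sizes (assumed)
deps = [(1, -1), (1, 0), (1, 1)]         # (dt, di), assumed Jacobi-style


def in_space(p):
    """Membership test for the rectangular iteration space."""
    t, i = p
    return 0 <= t < T and 0 <= i <= N


def has_concurrent_start(boundary_points, deps):
    """True if no boundary point depends on a point inside the space."""
    for p in boundary_points:
        for d in deps:
            src = (p[0] - d[0], p[1] - d[1])   # point that p would read from
            if in_space(src):
                return False
    return True


time_boundary = [(0, i) for i in range(N + 1)]   # the face t = 0
space_boundary = [(t, 0) for t in range(T)]      # the face i = 0

ok_time = has_concurrent_start(time_boundary, deps)
ok_space = has_concurrent_start(space_boundary, deps)
```

With these assumed dependences the face t = 0 allows concurrent start, while the face i = 0 does not, because (1, 0) reads from (0, 0).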
7
Inter-tile Dependences
  • Shift vectors
  • Tile traversal order
  • Normal to all other hyperplanes
  • Hyperplane carries dependence
  • A dependence pokes through
  • Inter-tile dependence vector
  • Shift vector
  • Corresponding hyperplane carries dependence

8
Concurrent Start Inhibition
  • Concurrent start in original iteration space
    along a boundary
  • But that boundary carries an inter-tile dependence

A boundary has concurrent start
S_j is an inter-tile dependence
That boundary carries the inter-tile dependence
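
The effect can be observed by brute force on a small example. The sketch below assigns each iteration point to a tile and collects the tile-to-tile difference vectors induced by the dependences. The tiling hyperplanes with normals (1, 1) and (1, -1), the tile size, and the Jacobi-style dependences are all assumptions chosen for illustration; the slides do not give them.

```python
def tile_of(p, normals, size):
    """Tile coordinates of p: one floor-divided value per hyperplane family."""
    return tuple((h[0] * p[0] + h[1] * p[1]) // size for h in normals)


def inter_tile_vectors(points, deps, normals, size):
    """Set of tile-coordinate differences over all cross-tile dependences."""
    pts = set(points)
    vectors = set()
    for p in pts:
        for d in deps:
            src = (p[0] - d[0], p[1] - d[1])
            if src in pts:
                a = tile_of(src, normals, size)
                b = tile_of(p, normals, size)
                if a != b:
                    vectors.add((b[0] - a[0], b[1] - a[1]))
    return vectors


T, N, SIZE = 8, 8, 4                       # illustrative sizes (assumed)
points = [(t, i) for t in range(T) for i in range(N + 1)]
deps = [(1, -1), (1, 0), (1, 1)]           # (dt, di), assumed
normals = [(1, 1), (1, -1)]                # assumed skewed tiling
vectors = inter_tile_vectors(points, deps, normals, SIZE)
```

In this example the point (1, 0) in tile (0, 0) reads from (0, 1) in tile (0, -1), and both tiles touch the face t = 0. Tiles along that face therefore cannot all start together, even though the untiled points on it can.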
9
Companion Hyperplane
  • Hyperplane that destroys the inter-tile
    dependence
  • Swivel a hyperplane backward
  • Dependences carried by original hyperplane are
    neutralized
  • Incoming dependences become non-incoming
  • Outgoing dependences become non-outgoing

10
Overlapped Tiling
  • Replace back face with companion hyperplane
  • Additional region is shared with preceding tile
  • Region of preceding tile that caused the
    dependence
  • Each new tile independent of preceding tile
    (do-all parallelism)
  • Increased computation cost and communication volume
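
A minimal sketch of the overlapped idea for a one-dimensional 3-point stencil follows. Each tile reads extra cells on both sides and recomputes them redundantly, so tiles within a time tile do not depend on each other. It assumes a Jacobi-style update with fixed end cells; the function names and parameters are hypothetical, and this is not the authors' implementation.

```python
def reference_sweep(a, steps):
    """Untiled Jacobi-style sweep; the two end cells are held fixed."""
    cur = list(a)
    for _ in range(steps):
        nxt = list(cur)
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3
        cur = nxt
    return cur


def overlapped_tiled_sweep(a, steps, tile, tt):
    """Advance `steps` time steps using time tiles of height `tt`.

    Each space tile of width `tile` loads up to `tt` halo cells per side
    and recomputes them locally. Halo results go stale near the cut
    edges, but the tile's own cells stay correct for `tt` steps.
    """
    cur = list(a)
    n = len(cur)
    done = 0
    while done < steps:
        k = min(tt, steps - done)            # height of this time tile
        nxt = list(cur)
        for lo in range(0, n, tile):         # tiles are mutually independent
            hi = min(lo + tile, n)
            left, right = max(lo - k, 0), min(hi + k, n)
            local = reference_sweep(cur[left:right], k)
            nxt[lo:hi] = local[lo - left:hi - left]   # keep only own cells
        cur = nxt
        done += k
    return cur


data = [float((7 * i) % 11) for i in range(37)]
tiled = overlapped_tiled_sweep(data, 10, 8, 4)
```

The redundant halo work is the increased computation cost noted above; in a distributed setting the halo cells are what would be communicated.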

11
Split Tiling
  • Replace front face with companion hyperplane
  • Tile split into independent and dependent regions
  • Execute independent region followed by dependent
    region
  • Increased communications
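
The two-phase execution can be sketched for a one-dimensional 3-point stencil as below. Phase 1 computes, per tile, the region that needs only the tile's own data; phase 2 fills in the regions around tile boundaries that need results from both neighbours. The sketch assumes a Jacobi-style update with fixed end cells and tiles at least twice as wide as the time tile; names are hypothetical and the region shapes are an illustration, not necessarily those of the original work.

```python
def reference_sweep(a, steps):
    """Untiled Jacobi-style sweep; the two end cells are held fixed."""
    cur = list(a)
    for _ in range(steps):
        nxt = list(cur)
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3
        cur = nxt
    return cur


def split_tiled_sweep(a, steps, tile, tt):
    """Advance `steps` time steps in two phases per time tile."""
    if tile < 2 * tt:
        raise ValueError("this sketch assumes tile >= 2 * tt")
    cur = list(a)
    n = len(cur)
    starts = list(range(0, n, tile))
    done = 0
    while done < steps:
        k = min(tt, steps - done)
        # lev[s][i] is cell i after s steps; None marks "not computed yet".
        lev = [list(cur)] + [[None] * n for _ in range(k)]
        for s in range(1, k + 1):            # fixed end cells
            lev[s][0], lev[s][n - 1] = cur[0], cur[n - 1]

        def update(s, i):
            prev = lev[s - 1]
            lev[s][i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3

        # Phase 1: independent regions, shrinking away from cut edges.
        for lo in starts:
            hi = min(lo + tile, n)
            for s in range(1, k + 1):
                first = 1 if lo == 0 else lo + s
                last = n - 1 if hi == n else hi - s
                for i in range(first, last):
                    update(s, i)
        # Phase 2: dependent regions, growing around interior boundaries.
        for b in starts[1:]:
            for s in range(1, k + 1):
                for i in range(max(b - s, 1), min(b + s, n - 1)):
                    update(s, i)
        cur = lev[k]
        done += k
    return cur


data = [float((7 * i) % 11) for i in range(37)]
split = split_tiled_sweep(data, 10, 8, 4)
```

All phase 1 regions can run concurrently, and so can all phase 2 regions; the only synchronization is between the two phases, where boundary values are exchanged.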

12
Experimental Evaluation
  • Cluster
  • 2.8 GHz dual-processor Opteron 254
  • 1 MB L2 cache, 4 GB RAM
  • Linux 2.6.9, Intel compiler (icc) -O3
  • Comparison
  • Two pipelined schedules along space and time
  • 1000 time steps
  • 1 to 32 processors

13
Pipelined Execution Parameters
64000 elements, 32 processors
Space tile size: 1000; time tile size: 16
14
Performance with Problem Size
15
Weak Scaling
  • Problem size = procs × 20000
  • Horizontal line: linear scaling

16
Conclusion
  • Time tiling of stencils is crucial for data locality
  • Might inhibit concurrent execution
  • Presented two approaches to enabling concurrent
    execution
  • Ongoing work: modeling relative benefits of the
    two approaches

17
Thank You!