Hardware/Compiler Co-Design and Compiler Optimizations

About This Presentation

Slides: 26

Provided by: Intr1

Learn more at: https://www.capsl.udel.edu

Transcript and Presenter's Notes

1
Topic 8
Optimization for Parallel Computation
2
Reading List

Slides: Topic 8x
Other readings as assigned in class or homework

3
Outline

Basic Concepts
Parallelism
Locality
Loop Nest Optimization
Summary

4
Parallelism

What is parallelism?
Parallelism in Computer Architecture
Instruction-Level Parallelism (ILP)
Thread-Level Parallelism (TLP)
Parallelism in Programs/Applications
Statement-Level Parallelism
Loop-Level Parallelism
Task-Level Parallelism
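To make the loop-level entry concrete, here is a minimal C sketch. C is used for all added examples, and the function names are hypothetical rather than taken from the slides. The first loop has independent iterations, which is what loop-level parallelism requires. The second loop carries a dependence from each iteration to the next, so it cannot run in parallel as written.

```c
#include <stddef.h>

/* Loop-level parallelism: iteration i writes only c[i] and reads only
   a[i] and b[i], so no iteration depends on another. The iterations
   could be executed in any order, or concurrently. */
void vec_add(const double *a, const double *b, double *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Loop-carried dependence: iteration i reads x[i - 1], which iteration
   i - 1 has just written, so the iterations must run in order as written. */
void running_sum(double *x, size_t n) {
    for (size_t i = 1; i < n; i++)
        x[i] = x[i] + x[i - 1];
}
```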

5
General Compiler Framework
Design goals:
Good IPO
Good LNO
Good global optimization
Good integration of IPO/LNO/OPT
Smooth information passing between FE and CG
Complete and flexible support of inner-loop scheduling (software pipelining, SWP), instruction scheduling and register allocation

Compiler phases (from the framework diagram):
Source
FE (front end)
ME (middle end): Inter-Procedural Optimization (IPA), Loop Nest Optimization (LNO), Global Optimization (OPT)
BE/CG (back end / code generator), guided by Arch Models: global inst scheduling, innermost loop scheduling, reg alloc, local inst scheduling
Executable
6
A Multiprocessor Architecture

A generic modern multiprocessor:

Node: processor(s), memory system, plus communication assist
Communication assist: network interface and communication controller
Scalable network

7
Locality

Temporal Locality
the same data element is used several times within a short time period
Spatial Locality
data elements located near each other in memory are used within a short time period
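A rough way to see spatial locality is to count how often consecutive accesses fall on different cache lines. The C sketch below assumes 64-byte lines, 8-byte doubles and C's row-major layout. The model, a cache that remembers only the last line touched, and all the names are illustrative assumptions, not a real cache simulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed parameters: 64-byte cache lines, 8-byte doubles. */
#define LINE_BYTES 64
#define ELEM_BYTES 8

/* Count how many accesses land on a different cache line than the access
   immediately before them, for a rows x cols row-major array traversed
   row by row (row_wise != 0) or column by column (row_wise == 0).
   This models a cache that holds only the most recently used line. */
size_t line_switches(size_t rows, size_t cols, int row_wise) {
    size_t switches = 0;
    size_t prev_line = SIZE_MAX;          /* no line touched yet */
    size_t outer = row_wise ? rows : cols;
    size_t inner = row_wise ? cols : rows;
    for (size_t p = 0; p < outer; p++) {
        for (size_t q = 0; q < inner; q++) {
            size_t r = row_wise ? p : q;  /* row index */
            size_t c = row_wise ? q : p;  /* column index */
            size_t line = (r * cols + c) * ELEM_BYTES / LINE_BYTES;
            if (line != prev_line) {
                switches++;
                prev_line = line;
            }
        }
    }
    return switches;
}
```

For a 64 x 64 array, row-wise traversal switches lines 512 times, once per 8 consecutive elements. Column-wise traversal switches on every one of the 4096 accesses.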

8
Loop Nest Transformation and Optimization

Simple Loop Transformation
Unimodular Loop Transformations
Beyond Unimodular Transformations
Combining Loop Transformation
Summary

9
Simple Loop Transformation

Loop unrolling
Loop peeling
...
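A minimal C sketch of the two simple transformations listed above follows, using hypothetical function names. The first is unrolling by a factor of 4, plus a remainder loop for trip counts not divisible by 4. The second is peeling off a first iteration that needs special treatment.

```c
#include <stddef.h>

/* Reference loop: sum of x[0..n-1]. */
double sum_plain(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Unrolled by 4: each trip of the main loop does four iterations' work
   (fewer increments and branches); the remainder loop handles the last
   n % 4 iterations. Additions occur in the same order as sum_plain. */
double sum_unrolled4(const double *x, size_t n) {
    double s = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += x[i];
        s += x[i + 1];
        s += x[i + 2];
        s += x[i + 3];
    }
    for (; i < n; i++)   /* remainder iterations */
        s += x[i];
    return s;
}

/* Peeling: instead of one loop whose body tests i == 0 (no left
   neighbour), the first iteration is pulled out so the remaining
   loop body is uniform. */
void diff_peeled(const double *x, double *d, size_t n) {
    if (n == 0)
        return;
    d[0] = x[0];                 /* peeled first iteration */
    for (size_t i = 1; i < n; i++)
        d[i] = x[i] - x[i - 1];  /* uniform body for the rest */
}
```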

10
Unimodular Loop Transformation

Loop interchange
Loop reversal
Loop skewing

11
Loop Interchange
Why would we wish to perform loop interchange?
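One common reason is locality. The C sketch below uses hypothetical names. Note that C arrays are row-major, the opposite of Fortran. Here, interchanging the loops turns a stride-N inner loop into a stride-1 inner loop while computing exactly the same result.

```c
#include <stddef.h>

#define N 64   /* hypothetical matrix size */

/* Before interchange: the inner loop varies the row index i, so successive
   accesses to a[i][j] are N doubles apart in C's row-major layout. */
void scale_column_order(double a[N][N], double s) {
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            a[i][j] *= s;
}

/* After interchange: the inner loop varies j, so accesses are stride-1.
   Every iteration updates a different element and reads nothing written
   by another iteration, so the interchange cannot change the result. */
void scale_row_order(double a[N][N], double s) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            a[i][j] *= s;
}
```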
12
Safety of Loop Interchange
DO J = 1, M
  DO I = 1, N
    A(I, J+1) = A(I+1, J) + B
  ENDDO
ENDDO
Is it legal to interchange the I and J loops?
13
Legality of Loop Interchange
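DO J = 1, M
  DO I = 1, N
    A(I, J+1) = A(I+1, J) + B
  ENDDO
ENDDO
Note: Interchange here is illegal! The value written to A(I, J+1) in iteration (J, I) is read in iteration (J+1, I-1), a dependence with distance vector (1, -1). Interchanging the loops turns it into (-1, 1), so the read would execute before the write it depends on.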
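The illegality can be checked directly by running both loop orders on a tiny instance. The C sketch below uses hypothetical sizes (N = M = 2), an initial value of 10*I + J for every element, and B = 1. Only the subscripts matter here, not the memory layout.

```c
#define NI 2   /* N: trip count of the I loop (hypothetical) */
#define MJ 2   /* M: trip count of the J loop (hypothetical) */

/* a[I][J] with I in 0..NI+1 and J in 0..MJ+1 keeps A(I+1, J) and
   A(I, J+1) in bounds for 1-based loop indices. */

/* Original order: DO J = 1, M / DO I = 1, N / A(I, J+1) = A(I+1, J) + B */
void original_order(double a[NI + 2][MJ + 2], double b) {
    for (int j = 1; j <= MJ; j++)
        for (int i = 1; i <= NI; i++)
            a[i][j + 1] = a[i + 1][j] + b;
}

/* Interchanged order (I outermost). The iteration that reads A(I, J+1),
   namely (I-1, J+1), now runs before the iteration (I, J) that writes it,
   so it sees the old value. */
void interchanged_order(double a[NI + 2][MJ + 2], double b) {
    for (int i = 1; i <= NI; i++)
        for (int j = 1; j <= MJ; j++)
            a[i][j + 1] = a[i + 1][j] + b;
}
```

With these inputs the original order leaves A(1, 3) = 33, while the interchanged order leaves A(1, 3) = 23, because A(2, 2) is read before it is updated.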
14
Loop Reversal: An Example
15
Loop Reversal: An Example (Contd)
After interchange (the J loop now runs in reverse):
DO J = M, 1, -1
  DO I = 1, N
    A(I+1, J) = A(I, J+1) + B
  ENDDO
ENDDO
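Slide 14's original loop is missing from the transcript. The C sketch below assumes it was DO I = 1, N / DO J = 1, M with the statement A(I+1, J) = A(I, J+1) + B, an assumption consistent with the transformed code above. Under that assumption the dependence has distance (1, -1), so plain interchange is illegal. Reversing J first changes the distance to (1, 1), after which interchange is legal. The sketch checks this on a 2 x 2 instance; its names and sizes are hypothetical.

```c
#define NI 2   /* N: trip count of the I loop (hypothetical) */
#define MJ 2   /* M: trip count of the J loop (hypothetical) */

/* a[I][J] with I in 0..NI+1 and J in 0..MJ+1 keeps A(I+1, J) and
   A(I, J+1) in bounds for 1-based loop indices. */

/* Assumed original: DO I = 1, N / DO J = 1, M / A(I+1, J) = A(I, J+1) + B */
void orig_nest(double a[NI + 2][MJ + 2], double b) {
    for (int i = 1; i <= NI; i++)
        for (int j = 1; j <= MJ; j++)
            a[i + 1][j] = a[i][j + 1] + b;
}

/* Plain interchange (illegal): J outermost, ascending. */
void interchanged_only(double a[NI + 2][MJ + 2], double b) {
    for (int j = 1; j <= MJ; j++)
        for (int i = 1; i <= NI; i++)
            a[i + 1][j] = a[i][j + 1] + b;
}

/* Reverse J, then interchange (legal): matches the slide's loop. */
void reversed_then_interchanged(double a[NI + 2][MJ + 2], double b) {
    for (int j = MJ; j >= 1; j--)
        for (int i = 1; i <= NI; i++)
            a[i + 1][j] = a[i][j + 1] + b;
}
```

With every element initialised to 10*I + J and B = 1, the original and the reversed-then-interchanged nests agree on every element (for example, A(3, 1) = 15). Plain interchange gives A(3, 1) = 23.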
16
Skewing - An Example
17
Skewing - An Example (Contd)
DO j = 2, N+N
  DO I = max(1, j-N), min(N, j-1)
    A(I, j-I) = A(I-1, j-I) + A(I, j-I-1)
  END
END
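The skewed nest can be checked against an unskewed one. Slide 16's original loop is not in the transcript. The C sketch below assumes it was DO I = 1, N / DO J = 1, N with A(I, J) = A(I-1, J) + A(I, J-1), since the skewed statement reduces to this under J = j - I. The names and the size N = 5 are hypothetical.

```c
#define N 5   /* hypothetical problem size */

/* Assumed original nest:
   DO I = 1, N / DO J = 1, N / A(I, J) = A(I-1, J) + A(I, J-1) */
void wavefront_original(double a[N + 1][N + 1]) {
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= N; j++)
            a[i][j] = a[i - 1][j] + a[i][j - 1];
}

static int max_int(int x, int y) { return x > y ? x : y; }
static int min_int(int x, int y) { return x < y ? x : y; }

/* Skewed nest from the slide: the outer index jj = I + J walks the
   anti-diagonals 2..2N and J is recovered as jj - I. Both values an
   iteration reads lie on diagonal jj - 1, so all iterations with the same
   jj are independent and the inner I loop could run in parallel. */
void wavefront_skewed(double a[N + 1][N + 1]) {
    for (int jj = 2; jj <= N + N; jj++)
        for (int i = max_int(1, jj - N); i <= min_int(N, jj - 1); i++)
            a[i][jj - i] = a[i - 1][jj - i] + a[i][jj - i - 1];
}
```

With row 0 and column 0 set to 1 and everything else set to 0, both versions produce binomial coefficients, for example A(5, 5) = C(10, 5) = 252.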
18
Disadvantage of Loop Skewing

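Loop bounds must be recomputed
Loop bounds change: they become functions of the outer loop index (the iteration space is no longer rectangular)
Average vector length changes: the inner-loop trip count varies from one outer iteration to the next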
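The change in vector length can be quantified. In the skewed nest above, the inner I loop runs from max(1, j-N) to min(N, j-1). The C sketch below, with hypothetical function names, computes that trip count and its average over the 2N - 1 outer iterations.

```c
/* Inner-loop trip count of the skewed nest for outer index jj:
   I runs from max(1, jj - n) to min(n, jj - 1). */
int skewed_trip_count(int n, int jj) {
    int lo = jj - n > 1 ? jj - n : 1;
    int hi = jj - 1 < n ? jj - 1 : n;
    return hi >= lo ? hi - lo + 1 : 0;
}

/* Average inner trip count over the 2n - 1 outer iterations. The total
   work is still n * n iterations, so the average is n*n / (2n - 1). */
double skewed_average_trip(int n) {
    long total = 0;
    for (int jj = 2; jj <= 2 * n; jj++)
        total += skewed_trip_count(n, jj);
    return (double)total / (2 * n - 1);
}
```

For N = 4 the trip counts are 1, 2, 3, 4, 3, 2, 1, averaging 16/7 ≈ 2.29. The unskewed loop has a trip count of 4 on every iteration. In general the skewed average is N²/(2N - 1), a little over N/2.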

19
Unimodular Transformations

Motivation
Easy to represent compound transformations: a sequence of unimodular transformations composes into a single unimodular matrix (their product)
Elegant formulation of objective functions under compound loop transformations
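To illustrate the first point, the C sketch below uses hypothetical types and names. It represents loop interchange, reversal of the inner loop and skewing of a two-deep nest as 2 x 2 integer matrices with determinant ±1, and composes them by matrix multiplication. It then applies the standard legality test: every dependence distance vector must remain lexicographically positive after transformation.

```c
#include <stdbool.h>

/* 2x2 integer matrix acting on an (outer, inner) iteration or distance vector. */
typedef struct { int m[2][2]; } Mat2;
typedef struct { int v[2]; } Vec2;

/* Elementary unimodular transformations of a two-deep nest. */
const Mat2 INTERCHANGE   = {{{0, 1}, {1, 0}}};   /* (i, j) -> (j, i)     */
const Mat2 REVERSE_INNER = {{{1, 0}, {0, -1}}};  /* (i, j) -> (i, -j)    */
const Mat2 SKEW_INNER    = {{{1, 0}, {1, 1}}};   /* (i, j) -> (i, i + j) */

/* Composition: applying b first and then a corresponds to the product a * b. */
Mat2 mat_mul(Mat2 a, Mat2 b) {
    Mat2 r;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            r.m[i][j] = a.m[i][0] * b.m[0][j] + a.m[i][1] * b.m[1][j];
    return r;
}

/* Unimodular means integer entries and determinant +1 or -1. */
int det(Mat2 a) {
    return a.m[0][0] * a.m[1][1] - a.m[0][1] * a.m[1][0];
}

Vec2 apply(Mat2 t, Vec2 d) {
    Vec2 r;
    for (int i = 0; i < 2; i++)
        r.v[i] = t.m[i][0] * d.v[0] + t.m[i][1] * d.v[1];
    return r;
}

/* Lexicographically positive: the first nonzero entry is positive. */
bool lex_positive(Vec2 d) {
    return d.v[0] > 0 || (d.v[0] == 0 && d.v[1] > 0);
}

/* Legal if every (loop-carried) dependence distance vector stays
   lexicographically positive after the transformation. */
bool legal(Mat2 t, const Vec2 *deps, int ndeps) {
    for (int k = 0; k < ndeps; k++)
        if (!lex_positive(apply(t, deps[k])))
            return false;
    return true;
}
```

Take the distance vector (1, -1) from the loop-interchange legality slide. Interchange alone yields (-1, 1) and is rejected. Reversing the inner loop and then interchanging (the product INTERCHANGE × REVERSE_INNER, determinant 1) yields (1, 1) and is accepted, matching the loop-reversal example.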

20
Beyond Unimodular Loop Transformation

Loop Strip-Mining
Loop Tiling
Loop Fusion
Loop Fission
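Of the transformations listed, tiling is a common way to improve cache reuse. It can be viewed as strip-mining each loop and then interchanging so that the strip (tile) loops are outermost. The C sketch below tiles a matrix multiply; the sizes N = 64 and TILE = 16 and the function names are hypothetical.

```c
#define N 64      /* hypothetical matrix size */
#define TILE 16   /* hypothetical tile size; this sketch assumes N % TILE == 0 */

/* Untiled i-j-k matrix multiply: c += a * b. */
void matmul_naive(double a[N][N], double b[N][N], double c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Tiled version: each loop is strip-mined into TILE-sized strips and the
   strip loops (ii, jj, kk) are moved outermost, so the inner three loops
   work on TILE x TILE blocks of a, b and c, which are more likely to stay
   in cache while they are reused. */
void matmul_tiled(double a[N][N], double b[N][N], double c[N][N]) {
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int kk = 0; kk < N; kk += TILE)
                for (int i = ii; i < ii + TILE; i++)
                    for (int j = jj; j < jj + TILE; j++)
                        for (int k = kk; k < kk + TILE; k++)
                            c[i][j] += a[i][k] * b[k][j];
}
```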

21
Advanced Topics: Toward a Framework for Combining Loop Transformations
22
An Example

Assume a multi-issue architecture with resource constraints to be considered:
caches,
registers,
instruction scheduling
Question: What loop transformations should be applied, and in what order?
Unimodular (e.g. permutation?)
Loop unrolling?
Both?
Others?

Subroutine nest (a, b, c)
Real*8 a(1000)
Real*8 b(1000, 1000), c(1000)
Do j = 1, 1000
  DO i = 1, 1000
    a(j) = a(j) + b(j, i) * c(j)
  END DO
END DO
end

23
Motivating Example (Contd)

Subroutine nest (a, b, c)
Real*8 a(1000)
Real*8 b(1000, 1000), c(1000)
Do j = 1, 1000
  DO i = 1, 1000
    a(j) = a(j) + b(j, i) * c(j)
  END DO
END DO
end

Do i = 1, 1000, 4
  DO j = 1, 1000
    a(j) = a(j) + b(j, i) * c(j)
    a(j) = a(j) + b(j, i+1) * c(j)
    a(j) = a(j) + b(j, i+2) * c(j)
    a(j) = a(j) + b(j, i+3) * c(j)
  END DO
END DO
end

Question: Is the above a good combination of
loop interchange,
outer loop unrolling, and
inner loop fusion?
Why do this? (Cache effects? Number of loads/stores? Register allocation?)
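One way to see what the transformed code buys is to write both versions in C. The sketch below keeps Fortran's column-major layout for b through a hypothetical B(j, i) macro. As an illustration, not the slide's own code, it holds a(j) and c(j) in local variables inside the fused loop, which is what scalar replacement by a compiler would typically do. Interchange makes the accesses to b stride-1. Unrolling i by 4 and fusing the copies of the j loop then lets one load and one store of a(j), and one load of c(j), serve four updates instead of one.

```c
#include <stddef.h>

#define N 1000   /* as in the slide; must be a multiple of 4 for the unrolled loop */

/* Hypothetical macro: 1-based, column-major element b(j, i) of a flat array. */
#define B(j, i) b[((size_t)(i) - 1) * N + ((size_t)(j) - 1)]

/* Original nest: j outer, i inner. a(j) and c(j) are loop-invariant in the
   inner loop, but b(j, i) is accessed with stride N. */
void nest_original(double *a, const double *b, const double *c) {
    for (int j = 1; j <= N; j++)
        for (int i = 1; i <= N; i++)
            a[j - 1] = a[j - 1] + B(j, i) * c[j - 1];
}

/* Interchanged (i outer, j inner: stride-1 on b), i unrolled by 4 and the
   four copies of the j loop fused into one body. */
void nest_transformed(double *a, const double *b, const double *c) {
    for (int i = 1; i <= N; i += 4)
        for (int j = 1; j <= N; j++) {
            double aj = a[j - 1];   /* one load of a(j) for four updates */
            double cj = c[j - 1];   /* one load of c(j) for four updates */
            aj = aj + B(j, i) * cj;
            aj = aj + B(j, i + 1) * cj;
            aj = aj + B(j, i + 2) * cj;
            aj = aj + B(j, i + 3) * cj;
            a[j - 1] = aj;          /* one store of a(j) */
        }
}
```

Each a(j) still accumulates its terms in increasing order of i, so the transformation does not reorder the floating-point additions.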
24
What We Need?
