Compiling for VIRAM

Slides: 24

Provided by: yel3

Learn more at: http://iram.cs.berkeley.edu

Transcript and Presenter's Notes

1
Compiling for VIRAM

Dave Judd
Kathy Yelick
Rich Fromm
David Martin
Computer Science Division
UC Berkeley

2
VIRAM/Cray Compiler vcc

VIRAM/Cray vectorizing compiler
Production compiler
Used on the T90, C90, as well as the T3D and T3E
Being ported (by SGI/Cray) to the SV2
architecture
Has C, C++, and Fortran front-ends (focus on C)
Extensive vectorization capability
VIRAM code generator based on new SV2 code
generator
SV2 code gen being developed in parallel w/ VIRAM
SV2 vector architecture similar to VIRAM
SV2 scalar similar to T90 with more registers

3
VIRAM/Cray Compiler Status
Frontends: C, C++, Fortran95
Optimizer: PDGCS
Code Generators: T3D/T3E, C90/T90, SV2/VIRAM

MIPS backend developed by last retreat
Compiled code for a commercial test suite
VIRAM vector support developed since last retreat
Compiles and executes a commercial test suite
Compiles and executes a Cray vector test suite
Remaining issues with scheduler and memory
consistency model

4
Codegen/optimizer issues for VIRAM

Variable virtual processor width (VPW)
Variable vector register length (MVL)
Vector flag registers treated as 1 bit wide
vector register
Multiple base, incr, and stride registers, with autoincrement
Fixed point arithmetic (saturating add, etc.)
Memory Consistency
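
As an illustration of the fixed-point item in the list above, a saturating add clamps the result to the representable range instead of wrapping around. The following is a minimal scalar C sketch for 16-bit signed values. The names sat_add16 and vsat_add16 are hypothetical, and the loop only emulates what a vector instruction would do across vl virtual processors; it is not VIRAM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: signed 16-bit add that clamps instead of wrapping. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* exact in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Scalar emulation of an elementwise saturating add over vl elements. */
void vsat_add16(int16_t *dst, const int16_t *a, const int16_t *b, size_t vl)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < vl; i++)
        dst[i] = sat_add16(a[i], b[i]);
}
```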

5
Vector Architectural State

Number of VPs given by the Vector Length register
vl
Width of each VP given by the register vpw
vpw is one of 8b,16b,32b,64b
Maximum vector length is given by a read-only
register mvl
mvl depends on implementation and vpw
In VIRAM-1: NA, 128, 64, 32 for vpw = 8b, 16b, 32b, 64b respectively
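
The relationship between vpw, mvl, and vl on this slide can be sketched as a small lookup in C. The function names viram1_mvl and next_vl are hypothetical. The values are the VIRAM-1 figures listed above, with 0 standing in for the width that has no mvl listed. For the three supported widths, vpw times mvl comes to 2048 bits in each case.

```c
#include <assert.h>

/* Hypothetical helper: maximum vector length (in elements) for a given
 * virtual processor width in bits, using the VIRAM-1 values from the slide.
 * Returns 0 for widths with no mvl listed (8b is "NA" on VIRAM-1). */
int viram1_mvl(int vpw_bits)
{
    switch (vpw_bits) {
    case 16: return 128;
    case 32: return 64;
    case 64: return 32;
    default: return 0;
    }
}

/* Hypothetical helper: the value to place in vl for the next strip of a
 * loop, i.e. the smaller of the remaining iterations and mvl. */
int next_vl(int remaining, int vpw_bits)
{
    int mvl = viram1_mvl(vpw_bits);
    assert(mvl > 0 && remaining >= 0);
    return remaining < mvl ? remaining : mvl;
}
```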

6
Generating Code for Variable VPW

Strategy: vectorizer determines minimum correct
vpw for each loop nest
Vectorizer assumes vpw = 64 initially
At end of vectorization, discard vectorized copy
of loop if greatest width encountered is less
than 64 and start vectorization over with new
vpw.
Code gen checks vpw for each loop nest.
Limitation: a single loop nest will run at the
speed of the widest type.
Reason: simplicity and performance of the common
case
No attempt to split/combine loops based on vpw
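
The width selection described on this slide can be modeled roughly as follows. This is a sketch under assumptions, not the vcc implementation: the function choose_vpw and its representation of a loop nest as an array of operand widths are hypothetical. It returns the narrowest of the four ISA widths (8, 16, 32, 64) that holds the widest operand, which is why one 64-bit operand forces the whole loop nest to vpw = 64.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: operand_bits[] holds the width in bits of every
 * operand type seen in one loop nest. The result is the narrowest
 * vpw in {8, 16, 32, 64} that can hold the widest of them. */
int choose_vpw(const int *operand_bits, size_t n)
{
    int widest = 0;

    assert(operand_bits != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        assert(operand_bits[i] > 0 && operand_bits[i] <= 64);
        if (operand_bits[i] > widest)
            widest = operand_bits[i];
    }

    if (widest <= 8)  return 8;
    if (widest <= 16) return 16;
    if (widest <= 32) return 32;
    return 64;
}
```

For example, a loop nest touching 8-bit, 16-bit, and 32-bit data would run at vpw = 32 under this model.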

7
Generating Code for Variable MVL

Maximum vector length is not specified in IRAM
ISA.
However, compiler assumes mvl at compile time
mvl based on vpw
mvl assumption dependent on VIRAM-1 hardware
implementation
Recompiling required for future hardware versions
if mvl changes
MVL knowledge useful for code gen and vectorizer
register spilling
short loop vectorization
length-dependent vectorization (and may
eliminate safe vector length computation at run
time)
for (i = 0; i < n; i++)
  a[i] = a[i+32];
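
To make the compile-time mvl assumption concrete, here is a sketch of the loop from this slide in strip-mined form, with MVL as a compile-time constant. It is scalar C that emulates a vector load followed by a vector store, not compiler output. The function names are hypothetical, and MVL = 64 assumes 32-bit elements on VIRAM-1. The array must hold at least n + 32 elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MVL 64  /* assumed at compile time: VIRAM-1 mvl for vpw = 32 */

/* Scalar reference version of the loop on the slide. */
void shift_scalar(int32_t *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = a[i + 32];
}

/* Strip-mined version: each pass handles vl <= MVL elements, loading
 * the whole strip into a temporary "vector register" before storing. */
void shift_stripmined(int32_t *a, size_t n)
{
    int32_t vreg[MVL];

    assert(a != NULL);
    for (size_t i = 0; i < n; ) {
        size_t vl = (n - i < MVL) ? (n - i) : MVL;   /* set vl */
        for (size_t k = 0; k < vl; k++)              /* vector load */
            vreg[k] = a[i + 32 + k];
        for (size_t k = 0; k < vl; k++)              /* vector store */
            a[i + k] = vreg[k];
        i += vl;
    }
}
```

Because MVL is a constant here, a loop whose trip count is known to be at most MVL could drop the outer strip loop entirely, which is the kind of short-loop case the slide mentions.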

8
Vector Flag Registers

Vector flag (mask) register treated as vector
register
Bit width of 1
Flag register under control of vector length
Can spill/reload directly to memory
Optimizer and code gen issues to handle correctly

9
Multiple Base, Incr, Stride Registers

Dedicated registers for vector memory references
16 vbase, 8 vinc and 8 vstride registers
optional automatic increment of base register
Vectorizer/Codegen strategy
Changed from computing base address each time
thru loop to incrementing base address by
vl * stride * multiplier
Define compiler temporary for each base address
Teach codegen to assign vbase, vinc and vstride
registers as needed.
Trick code gen into handling multiple results for
single vector load/store instruction.
Results in very clean vector loops with only
vector instructions and the vl computation in
the inner loop.
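
The change in addressing strategy can be illustrated with a sketch that computes the base address of each strip both ways and shows they agree. Everything here is an assumption for illustration: the function names are hypothetical, the slide's multiplier is interpreted as the element size in bytes, and plain integers stand in for the vbase register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old strategy: recompute each strip's base address from the loop index.
 * Writes one address per strip to out[] and returns the strip count. */
size_t bases_recomputed(uintptr_t base, size_t n, size_t mvl,
                        size_t stride, size_t elem_bytes, uintptr_t *out)
{
    size_t strips = 0;

    assert(mvl > 0 && out != NULL);
    for (size_t i = 0; i < n; i += mvl)
        out[strips++] = base + i * stride * elem_bytes;
    return strips;
}

/* New strategy: keep a running base (standing in for a vbase register)
 * and bump it by vl * stride * elem_bytes after every strip. */
size_t bases_incremented(uintptr_t base, size_t n, size_t mvl,
                         size_t stride, size_t elem_bytes, uintptr_t *out)
{
    size_t strips = 0;
    uintptr_t vbase = base;

    assert(mvl > 0 && out != NULL);
    for (size_t remaining = n; remaining > 0; ) {
        size_t vl = remaining < mvl ? remaining : mvl;
        out[strips++] = vbase;
        vbase += vl * stride * elem_bytes;   /* the autoincrement step */
        remaining -= vl;
    }
    return strips;
}
```

With the running base, the per-strip address arithmetic reduces to a single add, which the optional autoincrement of the base register can absorb into the load or store itself.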

10
C Compiler Testing

vector regression test suite (CRAY)
Specifically tests for vectorization
Compares vector and scalar results
Easy to isolate problems
Status
56 of 62 tests pass
Some minor numerical differences
3 failures w/ wrong answers
1 failure causes assembler abort on bad
instruction (caused by vinc autoincrement
feature)

11
C Compiler Testing (cont.)

C regression test suite (industry standard suite)
Scalar emphasis, C conformance
All tests pass except
errors with functions returning a structure
larger than 16 bytes
errors with long double constants (128 bit
floating point)

12
What Essential Features Remain

Finish instruction scheduler
Implement sync strategy
Support -n32 ABI
Double / long double

13
Instruction Scheduler

Instruction scheduler working, but needs
Functional unit layout for VIRAM
Instruction latency and busy time for VIRAM
Support for chaining of vector instructions,
including mask registers
Scheduler responsible for sync processing
vector - vector sync analysis is working
vector - scalar and scalar - vector analysis needed
sync instructions currently appear only as
comments in assembler output

14
Chaining

vadd.s vr1,vr3,vr4
vabs.s vr2,vr1
With chaining
Cycle 0 1 2 3 4 5 6 7 . . .
V1[0] R X X X X W
V1[1] R X X X X W
V1[2] R X X X X W
V1[3] R X X X X W
V2[0] R X X W
V2[1] R X X W
V2[2] R X X W
V2[3] R X X W

15
Convert from -64 to -n32 ABI

Pointer and long type revert to 32 bits from 64
VIRAM tools were n32 originally
Switched to -64 to accommodate compiler, to match
sv2
Revert to n32 to match vector addressing hardware
Change will allow some gather/scatter loops to
execute faster (vpw = 32 instead of vpw = 64)

16
C