Compiling for VIRAM - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Compiling for VIRAM


1
Compiling for VIRAM
  • Dave Judd
  • Kathy Yelick
  • Rich Fromm
  • David Martin
  • Computer Science Division
  • UC Berkeley

2
VIRAM/Cray Compiler vcc
  • VIRAM/Cray vectorizing compiler
  • Production compiler
  • Used on the T90, C90, as well as the T3D and T3E
  • Being ported (by SGI/Cray) to the SV2
    architecture
  • Has C, C++, and Fortran front-ends (focus on C)
  • Extensive vectorization capability
  • VIRAM code generator based on new SV2 code
    generator
  • SV2 code gen being developed in parallel w/ VIRAM
  • SV2 vector architecture similar to VIRAM
  • SV2 scalar similar to T90 with more registers

3
VIRAM/Cray Compiler Status
  • Frontends: C, C++, Fortran95
  • Optimizer: PDGCS
  • Code generators: T3D/T3E, C90/T90, SV2/VIRAM
  • MIPS backend developed by last retreat
  • Compiled code for a commercial test suite
  • VIRAM vector support developed since last retreat
  • Compiles and executes a commercial test suite
  • Compiles and executes a Cray vector test suite
  • Remaining issues with scheduler and memory
    consistency model

4
Codegen/optimizer issues for VIRAM
  • Variable virtual processor width (VPW)
  • Variable vector register length (MVL)
  • Vector flag registers treated as 1-bit-wide
    vector registers
  • Multiple base, increment, and stride registers;
    autoincrement
  • Fixed point arithmetic (saturating add, etc.)
  • Memory Consistency
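
As an illustration of the fixed-point item above, here is a minimal C sketch of what a saturating add computes for 16-bit elements. The function names are hypothetical and this is plain scalar C, not VIRAM code: the point is only that results clamp to the representable range instead of wrapping around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating 16-bit add: compute in a wider type, then clamp. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Element-wise form, the shape of loop a vectorizer would map onto a
 * single saturating vector add (hypothetical helper). */
void sat_add16_array(int16_t *dst, const int16_t *a,
                     const int16_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = sat_add16(a[i], b[i]);
}
```

Standard C has no saturating operator, which is presumably why compiler extensions for the fixed-point hardware come up later in the deck.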

5
Vector Architectural State
  • Number of VPs given by the Vector Length register
    vl
  • Width of each VP given by the register vpw
  • vpw is one of 8b, 16b, 32b, 64b
  • Maximum vector length is given by a read-only
    register mvl
  • mvl depends on implementation and vpw:
    NA, 128, 64, 32 in VIRAM-1 (for vpw of 8b, 16b,
    32b, 64b respectively)
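
The relationship between vpw, mvl, and vl can be sketched in C as follows. This is a hedged illustration, not the hardware interface: `viram1_mvl` and `next_vl` are hypothetical helper names, and the table assumes the VIRAM-1 values on this slide line up with the vpw widths in the order listed.

```c
#include <assert.h>

/* Hypothetical lookup of the VIRAM-1 maximum vector length for a given
 * virtual processor width in bits. Returns 0 for the width the slide
 * marks as NA (8b) and for any invalid width. */
int viram1_mvl(int vpw_bits)
{
    switch (vpw_bits) {
    case 16: return 128;
    case 32: return 64;
    case 64: return 32;
    default: return 0;
    }
}

/* Vector length for the next strip of a loop: the number of remaining
 * elements, capped at mvl. */
int next_vl(int remaining, int vpw_bits)
{
    int mvl = viram1_mvl(vpw_bits);
    assert(mvl > 0 && remaining >= 0);
    return remaining < mvl ? remaining : mvl;
}
```

For example, a 100-element loop at vpw = 32 would run as one strip of 64 elements followed by one of 36.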

6
Generating Code for Variable VPW
  • Strategy: vectorizer determines minimum correct
    vpw for each loop nest
  • Vectorizer assumes vpw = 64 initially
  • At end of vectorization, discard vectorized copy
    of loop if greatest width encountered is less
    than 64 and start vectorization over with new
    vpw
  • Code gen checks vpw for each loop nest
  • Limitation: a single loop nest will run at the
    speed of the widest type
  • Reason: simplicity and performance of the common
    case
  • No attempt to split/combine loops based on vpw
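
The width selection described above amounts to taking the widest operand type seen in the loop nest. A minimal sketch, assuming the operand widths have already been collected into an array (`choose_vpw` is a hypothetical name, not part of vcc):

```c
#include <assert.h>
#include <stddef.h>

/* Pick a single vpw for a loop nest: the greatest operand width
 * encountered, starting from the narrowest legal width. */
int choose_vpw(const int *operand_bits, size_t count)
{
    int vpw = 8;
    for (size_t i = 0; i < count; i++) {
        int w = operand_bits[i];
        assert(w == 8 || w == 16 || w == 32 || w == 64);
        if (w > vpw)
            vpw = w;
    }
    return vpw;
}
```

The slide's two-pass approach reaches the same answer from the other direction: it vectorizes at 64 first and re-vectorizes only if the greatest width turns out to be narrower. Either way, one 32-bit operand in a loop of 16-bit data forces the whole nest to vpw = 32, which is the stated limitation.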

7
Generating Code for Variable MVL
  • Maximum vector length is not specified in IRAM
    ISA.
  • However, compiler assumes mvl at compile time
  • mvl based on vpw
  • mvl assumption dependent on VIRAM-1 hardware
    implementation
  • Recompiling required for future hardware versions
    if mvl changes
  • MVL knowledge useful for code gen and vectorizer
  • register spilling
  • short loop vectorization
  • length-dependent vectorization (and may
    eliminate safe vector length computation at run
    time), for example:
      for (i = 0; i < n; i++)
        a[i] = a[i+32];
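
To illustrate why a compile-time mvl helps length-dependent vectorization, the sketch below uses a hypothetical loop with a loop-carried dependence of distance 32 (not necessarily the loop on the slide). Vector semantics are simulated in scalar C by completing all loads of a strip before any store. The strip-mined version matches the scalar loop whenever vl is at most 32, so a compiler that knows mvl is at most 32 would not need to compute a safe vector length at run time.

```c
#include <assert.h>

#define DIST 32  /* assumed dependence distance */

/* Scalar reference: element i + DIST depends on element i. */
void recurrence_scalar(int *a, int n)
{
    for (int i = 0; i + DIST < n; i++)
        a[i + DIST] = a[i] + 1;
}

/* Strip-mined version with simulated vector semantics. */
void recurrence_strips(int *a, int n, int vl)
{
    int tmp[DIST];
    int count = n - DIST;                 /* number of iterations */
    assert(vl > 0 && vl <= DIST);         /* safe vector length */
    for (int i = 0; i < count; i += vl) {
        int len = (count - i < vl) ? count - i : vl;
        for (int j = 0; j < len; j++)     /* vector load and add */
            tmp[j] = a[i + j] + 1;
        for (int j = 0; j < len; j++)     /* vector store */
            a[i + DIST + j] = tmp[j];
    }
}

/* Hypothetical self-check: 1 if both versions give the same array. */
int strips_match_scalar(int n, int vl)
{
    int a[256], b[256];
    assert(n >= 0 && n <= 256);
    for (int i = 0; i < n; i++)
        a[i] = b[i] = i;
    recurrence_scalar(a, n);
    recurrence_strips(b, n, vl);
    for (int i = 0; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

With a vector length greater than 32, a strip would load elements that an earlier iteration of the same strip was supposed to write first, which is why the assertion rejects it.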

8
Vector Flag Registers
  • Vector flag (mask) register treated as vector
    register
  • Bit width of 1
  • Flag register under control of vector length
  • Can spill/reload directly to memory
  • Optimizer and code gen issues to handle correctly

9
Multiple Base, Incr, Stride Registers
  • Dedicated registers for vector memory references
  • 16 vbase, 8 vinc and 8 vstride registers
  • optional automatic increment of base register
  • Vectorizer/Codegen strategy
  • Changed from computing base address each time
    through the loop to incrementing base address by
    vl * stride * multiplier
  • Define compiler temporary for each base address
  • Teach codegen to assign vbase, vinc and vstride
    registers as needed.
  • Trick code gen into handling multiple results for
    single vector load/store instruction.
  • Results in very clean vector loops with only
    vector instructions in the inner loop, plus the
    vl computation.
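
The base-address strategy can be sketched in C as a strip-mined strided copy in which the base offsets are bumped once per strip rather than recomputed from the loop index. This is an illustration under stated assumptions: `strided_copy` and `strided_copy_ok` are hypothetical names, the offsets stand in for vbase registers, and the inner `for` loop stands in for one vector load/store pair.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n elements read from src at the given element stride into dst,
 * at most mvl elements per strip. */
void strided_copy(double *dst, const double *src, size_t n,
                  size_t stride, size_t mvl)
{
    size_t src_off = 0;   /* plays the role of one vbase register */
    size_t dst_off = 0;   /* and of a second one */
    assert(mvl > 0 && stride > 0);
    while (n > 0) {
        size_t vl = n < mvl ? n : mvl;      /* vl computation */
        for (size_t j = 0; j < vl; j++)     /* one vector load/store */
            dst[dst_off + j] = src[src_off + j * stride];
        src_off += vl * stride;             /* autoincrement of base */
        dst_off += vl;
        n -= vl;
    }
}

/* Hypothetical self-check against direct indexing. */
int strided_copy_ok(size_t n, size_t stride, size_t mvl)
{
    double src[512], dst[128];
    assert(n <= 128 && n * stride <= 512);
    for (size_t i = 0; i < 512; i++)
        src[i] = (double)i;
    strided_copy(dst, src, n, stride, mvl);
    for (size_t i = 0; i < n; i++)
        if (dst[i] != src[i * stride])
            return 0;
    return 1;
}
```

In C the element-size multiplier is implicit in array indexing; in generated code the increment would be expressed in bytes.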

10
C Compiler Testing
  • vector regression test suite (CRAY)
  • Specifically tests for vectorization
  • Compares vector and scalar results
  • Easy to isolate problems
  • Status
  • 56 of 62 tests pass
  • Some minor numerical differences
  • 3 failures w/ wrong answers
  • 1 failure causes assembler abort on bad
    instruction (caused by vinc autoincrement
    feature)

11
C Compiler Testing (cont.)
  • C regression test suite (industry standard suite)
  • Scalar emphasis, C conformance
  • All tests pass except
  • errors with functions returning a structure
    larger than 16 bytes
  • errors with long double constants (128 bit
    floating point)

12
What Essential Features Remain
  • Finish instruction scheduler
  • Implement sync strategy
  • Support -n32 ABI
  • Double / long double

13
Instruction Scheduler
  • Instruction scheduler working, but needs:
  • Functional unit layout for VIRAM
  • Instruction latency and busy time for VIRAM
  • Support for chaining of vector instructions,
    including mask registers
  • Scheduler responsible for sync processing
  • vector - vector sync analysis is working
  • vector - scalar and scalar - vector analysis
    needed
  • sync instructions are currently emitted as
    comments in assembler output

14
Chaining
vadd.s vr1,vr3,vr4
vabs.s vr2,vr1

With chaining (cycles 0 1 2 3 4 5 6 7 . . .), pipeline stages per element group:
  V10: R X X X X W
  V11: R X X X X W
  V12: R X X X X W
  V13: R X X X X W
  V20: R X X W
  V21: R X X W
  V22: R X X W
  V23: R X X W

15
Convert from -64 to -n32 ABI
  • Pointer and long type revert to 32 bits from 64
  • VIRAM tools were n32 originally
  • Switched to -64 to accommodate compiler, to match
    SV2
  • Revert to n32 to match vector addressing hardware
  • Change will allow some gather/scatter loops to
    execute faster (vpw = 32 instead of vpw = 64)

16
C++
  • All components being generated now
  • Include files and libC library differences
    between SGI and Cray?
  • C++ testing on SV2 simulator now
  • Testing/problem isolation needed

17
Fortran
  • Fortran 95 frontend
  • FCD (Fortran character descriptor) code gen
    support needed for VIRAM
  • Differences between IRIX and UNICOS libraries for
    I/O and array intrinsics
  • Testing needed

18
Other Future Compiler Features?
  • VIRAM machine target
  • Support for speculative execution
  • Support Cray additions
  • peephole optimizer
  • vector loop unrolling/ tiling
  • Compiler extensions for fixed point hardware

19
Vector functions
  • Define calling sequence conventions
  • Must be coded in assembler
  • Take advantage of C versions of Cray routines
  • Needed for some key benchmarks?

20
Memory consistency
  • Sync instructions
  • Orderings: SaV, VaS, VaV, vp
  • Hazards: RaW, WaR, WaW

21
VIRAM Tools
  • vas: assembler
  • vdis: disassembler
  • vsim-isa: simulator
  • vsim-db: debugger
  • vsim-p: performance simulator
  • vsim-sync: memory consistency simulator

22
vsim-sync
  • Intended for debugging and optimizing syncs
  • Tells you when there is a data hazard (sync
    needed)
  • Tells you when a sync executed that didn't
    prevent a hazard
  • sync may not be needed, according to dynamic
    execution
  • but sync may be needed on some other execution
    path
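
The kind of data hazard a sync guards against can be classified from two memory accesses in program order. The sketch below is a generic illustration of RaW/WaR/WaW classification. It is not a description of how vsim-sync is implemented, and the types and function name are hypothetical.

```c
#include <assert.h>

typedef enum { HAZ_NONE, HAZ_RAW, HAZ_WAR, HAZ_WAW } hazard_t;

typedef struct {
    unsigned long addr;  /* address touched by the access */
    int is_write;        /* 1 for a store, 0 for a load */
} access_t;

/* Classify the hazard between an earlier and a later access. */
hazard_t classify_hazard(access_t first, access_t second)
{
    assert(first.is_write == 0 || first.is_write == 1);
    assert(second.is_write == 0 || second.is_write == 1);
    if (first.addr != second.addr)
        return HAZ_NONE;              /* different locations */
    if (first.is_write && !second.is_write)
        return HAZ_RAW;               /* read after write */
    if (!first.is_write && second.is_write)
        return HAZ_WAR;               /* write after read */
    if (first.is_write && second.is_write)
        return HAZ_WAW;               /* write after write */
    return HAZ_NONE;                  /* two reads never conflict */
}
```

A dynamic tool only sees the addresses touched on the path that actually executed, which is why a sync that looked unnecessary in one run may still be required on another path.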

23
Summary
  • vcc is a reasonably robust compiler for VIRAM
  • All of the basic compiler elements are present
    now
  • Need to prioritize remaining work
  • Finish and tune scheduler
  • Implement sync strategy
  • C++
  • Fortran
  • IRAM target