vcc C/C++ Compiler for VIRAM

Slides: 11
Provided by: csBer
1
vcc C/C++ Compiler for VIRAM
  • Sam Williams
  • CS 265
  • samw_at_cs.berkeley.edu

2
Topics
  • Introduction
  • Simulation Methodology
  • Vectorization
  • Speedup
  • Quality of codegen
  • Instruction usage

3
Introduction
  • vcc is the C/C++ compiler for VIRAM
  • It quickly became evident that many features
    haven't been implemented, including:
      • inlining
      • scheduling
      • loop unrolling
      • code motion

4
Simulation Methodology
  • In order to separate micro-architectural
    performance from the ability of the compiler to
    take full advantage of the ISA and find potential
    parallelism, I assumed:
      • The processor is a single-issue machine
      • No stalls will occur due to the number of
        functional units or bandwidth
      • All instructions take a single cycle to execute
  • Thus the vsim-isa simulator could be used.
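
Under these assumptions the cycle count of a run is simply its dynamic instruction count, so a speedup can be read as a ratio of instruction counts. The sketch below illustrates that arithmetic only; the function names are hypothetical and are not part of vsim-isa.

```c
#include <assert.h>

/* Single issue, no stalls, one cycle per instruction:
   cycles equal the dynamic instruction count. */
unsigned long model_cycles(unsigned long dynamic_instructions)
{
    return dynamic_instructions;
}

/* Speedup of a vectorized build over a scalar build of the same loop,
   both measured as dynamic instruction counts (hypothetical helper). */
double model_speedup(unsigned long scalar_instructions,
                     unsigned long vector_instructions)
{
    assert(vector_instructions > 0);
    return (double)model_cycles(scalar_instructions) /
           (double)model_cycles(vector_instructions);
}
```

Because micro-architectural effects are excluded, a ratio computed this way reflects only how well the compiler used the ISA.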

5
Vectorization
  • The compiler was able to vectorize most of the
    loops.
  • The primary reason for failure was data
    dependence.
  • Additional reasons: function calls, and library
    functions with no vector version.
  • Some loops were skipped entirely since they
    didn't produce any results.
  • Some loops were conditionally vectorized.
  • There were a couple of bugs in the benchmark,
    which initially skewed the results.
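
As an illustration of the data dependence case, here is a minimal sketch of two loops: one whose iterations are independent, and one with a loop-carried dependence. Neither is taken from the benchmark; the function names are hypothetical, and the arrays are assumed not to overlap.

```c
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i] and c[i], so any
   group of iterations can be executed as one vector operation. */
void add_independent(float *a, const float *b, const float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: iteration i reads a[i - 1], which iteration
   i - 1 has just written, so the iterations must run in order. */
void running_sum(float *a, const float *b, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```

A straightforward vectorizer has to leave the second loop scalar, or rewrite it with a different algorithm, because a whole strip of a[i - 1] values is not available before the strip is computed.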

6
Speedup

7
Quality
  • It appears the compiler does not consistently
    take full advantage of auto-increments found in
    the ISA.
  • It also doesn't keep track of vl/mvl efficiently.
  • This resulted in a great deal of unnecessary loop
    overhead in each strip-mined loop.
  • Furthermore, there were many instances where code
    motion out of the loop should have been applied.
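
The points above can be sketched in C as a strip-mined loop with little per-strip overhead. This is a scalar emulation, not vcc output: MVL is a hypothetical maximum vector length, the function name is made up, and the inner loop stands in for a vector load, multiply and store.

```c
#include <assert.h>
#include <stddef.h>

#define MVL 64 /* hypothetical maximum vector length, for illustration */

void scale_strip_mined(float *dst, const float *src,
                       float num, float den, size_t n)
{
    assert(den != 0.0f);
    /* Code motion: the scale factor is loop-invariant, so it is
       computed once here instead of once per strip. */
    const float k = num / den;

    while (n > 0) {
        /* vl is set once per strip; only the final strip is short. */
        size_t vl = n < MVL ? n : MVL;

        /* Stands in for one vector load, multiply and store. */
        for (size_t e = 0; e < vl; e++)
            dst[e] = k * src[e];

        /* Pointer bumps: the work an auto-increment addressing mode
           could fold into the load and store themselves. */
        dst += vl;
        src += vl;
        n -= vl;
    }
}
```

Per strip, the only bookkeeping left is one vl update, two pointer bumps and one count update.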

8
ISA usage
  • The loops are primarily single-precision FP;
    however, integer and vector processing
    instructions can be used effectively in
    calculating addresses.
  • Relatively few of the vector processing
    instructions were used.
  • About half of the flag processing instructions
    were used.
  • Only 4 of the 16 FP compare predicates were used
  • No surprise that saturating and the more complex
    integer arithmetic instructions were not used.

9
Examples: loop 72 (21.1x)
  • for (i = 0; i < n; i++)
      if (a[i] > 0) { b[j] = a[i]; j++; }
  • When compiled, each strip would load mvl
    elements of a, compare them to 0, generate an
    index vector of the elements greater than 0,
    use that in an indexed load of a, then store
    the result to b.
  • What it should do is load a strip of a, compare
    to 0, use vcompress to compress the strip, and
    store to b.
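
The suggested load / compare / compress / store sequence can be emulated in plain C as follows. This is a sketch under assumptions: the element type is taken to be float, MVL is a hypothetical maximum vector length, the function names are invented, and each inner loop stands in for one vector instruction rather than reproducing actual VIRAM code.

```c
#include <stddef.h>

#define MVL 64 /* hypothetical maximum vector length, for illustration */

/* Reference: the scalar loop. Returns the number of elements stored. */
size_t pack_positive_scalar(float *b, const float *a, size_t n)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > 0.0f) { b[j] = a[i]; j++; }
    return j;
}

/* Strip-wise emulation of load, compare, compress, store. */
size_t pack_positive_strips(float *b, const float *a, size_t n)
{
    size_t j = 0;
    for (size_t base = 0; base < n; base += MVL) {
        size_t vl = (n - base < MVL) ? n - base : MVL;
        float strip[MVL], packed[MVL];
        unsigned char flag[MVL];
        size_t kept = 0;

        for (size_t e = 0; e < vl; e++)   /* unit-stride load of a */
            strip[e] = a[base + e];
        for (size_t e = 0; e < vl; e++)   /* compare to 0, set flags */
            flag[e] = strip[e] > 0.0f;
        for (size_t e = 0; e < vl; e++)   /* compress under the flags */
            if (flag[e]) packed[kept++] = strip[e];
        for (size_t e = 0; e < kept; e++) /* unit-stride store to b */
            b[j + e] = packed[e];
        j += kept;
    }
    return j;
}
```

Each strip reads a only once, with a unit-stride load, instead of loading it a second time through an index vector.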

10
Examples: loop 100 (31.8x)
  • for (i = 0; i < n; i++)
      a[i] = b[i] + c[i/2];
  • Here the compiler maintains the base for c in a
    vector register, and uses a vdiv to generate an
    indexing vector to load strips of c; furthermore,
    it then has to increment all elements in the
    addressing register each iteration.
  • All that's needed is to break the loop into even
    and odd parts, and use a stride-2 load for b and
    a unit-stride load for c.
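
The even/odd split can be sketched in C as below. The operator between b[i] and c[i/2] is not legible in the transcript, so addition is an assumption here, as are the float element type and the function names.

```c
#include <stddef.h>

/* Reference form of the loop; the "+" is an assumed operator. */
void loop100_reference(float *a, const float *b, const float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i / 2];
}

/* Split into even and odd passes. In each pass b and a are accessed
   with stride 2 and c with unit stride, so no index vector and no
   divide are needed. */
void loop100_split(float *a, const float *b, const float *c, size_t n)
{
    size_t n_even = (n + 1) / 2; /* number of even i in [0, n) */
    size_t n_odd = n / 2;        /* number of odd i in [0, n) */

    for (size_t k = 0; k < n_even; k++) /* i = 2k */
        a[2 * k] = b[2 * k] + c[k];
    for (size_t k = 0; k < n_odd; k++)  /* i = 2k + 1 */
        a[2 * k + 1] = b[2 * k + 1] + c[k];
}
```

Both passes read the same c[k] values, so a vectorizing compiler could also fuse them and reuse one unit-stride load of c for the even and odd halves.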