Liquid SIMD: Abstracting SIMD Hardware Using Lightweight Dynamic Mapping

1
Liquid SIMD: Abstracting SIMD Hardware Using Lightweight Dynamic Mapping

Nathan Clark, Amir Hormati, Scott Mahlke,
Sami Yehia, Krisztián Flautner
University of Michigan, ARM Ltd.

2
Computational Efficiency

Low power envelope
More useful work/transistors
Hardware accelerators
Niagara II encryption engine

Source: AMD Analyst Day, 12/14/06
3
How Are Accelerators Used?

Control statically placed in binary

4
Problem With Static Control

Not forward/backward compatible

5
Solution: Virtualization

Statically identify accelerated computation
Abstract accelerator features
Dynamically retarget binary

6
Liquid SIMD

Virtualize SIMD accelerators
Why virtualize SIMD?
Intel MMX to SSE2
ARM v6 to Neon
Wide vectors useful [Lin 06]

7
SIMD Accelerator Assumptions
[Pipeline diagram: Fetch, Decode, Scalar Exec / SIMD Exec, Retire]

Same instruction stream
Separate pipeline, memory interface

8
How to Virtualize

Use scalar ISA to represent SIMD operations
Compatibility, low overhead
Key: easy to translate

9
Virtualization Architecture
10
1. Data Parallel Operations
for (i = 0; i < 8; i++) {
    r1 = A[i];
    r2 = B[i];
    r3 = r1 op r2;
    r4 = r3 op constant;
    C[i] = r4;
}
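A minimal C sketch of how a data-parallel loop like the one on this slide can be carried in scalar form and still map onto vector lanes. The operators (an 8-bit add followed by an AND with a mask), the 4-lane width, and the names `scalar_loop` and `vector_loop_4wide` are assumptions chosen for illustration; the inner lane loops only emulate what a single SIMD instruction would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIP_COUNT 8   /* trip count used on the slide */
#define LANES 4        /* assumed SIMD width, for illustration only */

/* Scalar representation: one element per iteration. */
void scalar_loop(const uint8_t *a, const uint8_t *b, uint8_t *c, uint8_t mask)
{
    assert(a && b && c);
    for (size_t i = 0; i < TRIP_COUNT; i++) {
        uint8_t r1 = a[i];
        uint8_t r2 = b[i];
        uint8_t r3 = (uint8_t)(r1 + r2);   /* wraps modulo 256 */
        uint8_t r4 = (uint8_t)(r3 & mask);
        c[i] = r4;
    }
}

/* Emulated 4-lane version: each inner loop stands in for one SIMD instruction. */
void vector_loop_4wide(const uint8_t *a, const uint8_t *b, uint8_t *c, uint8_t mask)
{
    assert(a && b && c);
    assert(TRIP_COUNT % LANES == 0);
    for (size_t i = 0; i < TRIP_COUNT; i += LANES) {
        uint8_t v3[LANES];
        for (size_t l = 0; l < LANES; l++)      /* v3 = v1 + v2 */
            v3[l] = (uint8_t)(a[i + l] + b[i + l]);
        for (size_t l = 0; l < LANES; l++)      /* C[i..i+3] = v3 & mask */
            c[i + l] = (uint8_t)(v3[l] & mask);
    }
}
```

Both functions produce the same output array, which is the property a scalar encoding of SIMD work relies on.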
11
1a. What If There's No Scalar Equivalent?
for (i = 0; i < 8; i++) {
    r1 = A[i];
    r2 = B[i];
    r3 = r1 op r2;
    cmp r3, FF;
    r3 = movgt FF;
    ...
}
Idioms can always be constructed
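As an illustration of such an idiom, the sketch below writes an unsigned 8-bit saturating add as plain scalar operations: add in a wider register, compare against FF, and conditionally clamp. Treating the slide's operation as a saturating add is an assumption, and the function names `sat_add_u8` and `sat_add_loop` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar idiom for an unsigned 8-bit saturating add. */
uint8_t sat_add_u8(uint8_t x, uint8_t y)
{
    unsigned r3 = (unsigned)x + (unsigned)y;  /* add in a wider register */
    if (r3 > 0xFF)                            /* cmp r3, FF   */
        r3 = 0xFF;                            /* movgt r3, FF */
    return (uint8_t)r3;
}

/* The idiom applied across the 8-element loop from the slide. */
void sat_add_loop(const uint8_t *a, const uint8_t *b, uint8_t *c)
{
    assert(a && b && c);
    for (int i = 0; i < 8; i++)
        c[i] = sat_add_u8(a[i], b[i]);
}
```

A translator that recognizes the add, compare, conditional-move sequence could replace it with a single saturating SIMD instruction where the hardware offers one.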
12
2. Scalarizing Permutations
for (i = 0; i < 8; i++) {
    r1 = r2 op r3;
    tmp[i] = r1;
}
for (i = 0; i < 8; i++) {
    r1 = offset[i];
    r2 = tmp[r1 + i];
    r3 = r2 op const;
}
offset = {4, 4, 4, 4, -4, -4, -4, -4}
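The offset-array form of a permutation can be sketched in C as follows. The offset values are the ones shown on this slide, which swap the two 4-element halves of an 8-element array; the function name `permute_by_offset` and the 32-bit element type are assumptions.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_ELEMS = 8 };

/* Per-element offsets from the slide: swap the two 4-element halves. */
static const int offset[NUM_ELEMS] = {4, 4, 4, 4, -4, -4, -4, -4};

/* Scalarized permutation: dst[i] = src[i + offset[i]]. */
void permute_by_offset(const int32_t *src, int32_t *dst)
{
    assert(src && dst);
    assert(src != dst);   /* in place, later reads would see overwritten data */
    for (int i = 0; i < NUM_ELEMS; i++) {
        int r1 = offset[i];
        assert(i + r1 >= 0 && i + r1 < NUM_ELEMS);
        dst[i] = src[i + r1];
    }
}
```

Because the offsets are constants stored in the binary, a translator can inspect them ahead of time and decide which hardware shuffle, if any, matches the pattern.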
13
3. Scalarizing Reductions
for (i = 0; i < 8; i++) {
    r1 = A[i];
    r2 = r2 op r1;
}
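A short C sketch of a reduction in both forms, assuming the reduction operator is an unsigned integer add and a 4-lane vector unit; `reduce_scalar` and `reduce_4wide` are hypothetical names. The vector form keeps one partial sum per lane and combines them after the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar form: a single accumulator carried across iterations. */
uint32_t reduce_scalar(const uint32_t *a, int n)
{
    assert(a && n >= 0);
    uint32_t r2 = 0;
    for (int i = 0; i < n; i++)
        r2 += a[i];
    return r2;
}

/* Emulated 4-lane form: per-lane partial sums, combined at the end. */
uint32_t reduce_4wide(const uint32_t *a, int n)
{
    assert(a && n >= 0 && n % 4 == 0);
    uint32_t lane[4] = {0, 0, 0, 0};
    for (int i = 0; i < n; i += 4)
        for (int l = 0; l < 4; l++)
            lane[l] += a[i + l];
    return lane[0] + lane[1] + lane[2] + lane[3];
}
```

Reordering the additions this way is exact for unsigned integer arithmetic; for floating-point data the two forms can differ in rounding.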
14
Applied to ARM Neon

All instructions supported except
VTBL indirect indexing
v1 = vtbl v2, v3
Interleaved memory accesses
Not needed in evaluated benchmarks

15
Translation to SIMD
Scalar loop:
for (i = 0; i < 8; i++) {
    r1 = A[i];
    r2 = B[i];
    r3 = r1 op r2;
    r4 = offset[i];
    C[i + r4] = r3;
}
Translated SIMD loop:
for (i = 0; i < 8; i += 4) {
    v1 = A[i];
    v2 = B[i];
    v3 = v1 op v2;
    v3 = shuffle v3;
    C[i] = v3;
}

Update induction variable
Use inverse of defined translation rules
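The two bullets above can be illustrated with a C sketch that compares a scalar loop using an offset-indexed store against the form a translator could emit: a vector operation, a lane shuffle derived from the offsets, and a contiguous store, with the induction variable stepping by the vector width. The offsets (swap adjacent elements), the add operator, the 4-lane width, and the names `scalar_form` and `simd_form` are all assumptions for illustration, not the paper's translator.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_ELEMS = 8, LANES = 4 };

/* Hypothetical offsets: swap adjacent elements; pattern repeats every LANES. */
static const int offset[NUM_ELEMS] = {1, -1, 1, -1, 1, -1, 1, -1};

/* Scalar form as it would be stored in the binary. */
void scalar_form(const uint8_t *a, const uint8_t *b, uint8_t *c)
{
    assert(a && b && c);
    for (int i = 0; i < NUM_ELEMS; i++) {
        uint8_t r3 = (uint8_t)(a[i] + b[i]);
        int r4 = offset[i];
        c[i + r4] = r3;
    }
}

/* Emulated translated form: vector add, lane shuffle, contiguous store. */
void simd_form(const uint8_t *a, const uint8_t *b, uint8_t *c)
{
    assert(a && b && c);
    assert(NUM_ELEMS % LANES == 0);
    for (int i = 0; i < NUM_ELEMS; i += LANES) {   /* induction variable steps by LANES */
        uint8_t v3[LANES], shuffled[LANES];
        for (int l = 0; l < LANES; l++)            /* v3 = v1 + v2 */
            v3[l] = (uint8_t)(a[i + l] + b[i + l]);
        for (int l = 0; l < LANES; l++) {          /* v3 = shuffle v3 */
            int dst = l + offset[i + l];
            assert(dst >= 0 && dst < LANES);       /* shuffle must stay inside one vector */
            shuffled[dst] = v3[l];
        }
        for (int l = 0; l < LANES; l++)            /* C[i..i+3] = v3 */
            c[i + l] = shuffled[l];
    }
}
```

The inner assertion marks the case this sketch does not handle: an offset pattern that moves data across vector boundaries cannot be expressed as a single in-register shuffle at this width.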

16
Translator Design

Translator efficiency, speed, flexibility

17
Evaluation

Trimaran ARM
Hand SIMDized loops
SimpleScalar model ARM926 w/ Neon SIMD
VHDL translator, 130nm std. cell

18
Liquid SIMD Issues

Code bloat
<1% overhead beyond baseline
Register pressure
Not a problem
Translator cost
0.2 mm², 2 KB cache
Translation overhead

19
Translation Overhead
[Chart: translation overhead for MediaBench, kernel, and SPECfp benchmarks]
20
Summary

Accelerators are more common and evolving
Costly binary migration
SIMD virtualization using scalar ISA
One binary: forward/backward compatibility
Negligible overhead
