Alias Speculation using Atomic Regions

Title: Speculative Shared-Memory Architectures. Subject: Parallel Computing. Author: José Fernando Martínez. Last modified by: Wonsun Ahn.

1
Alias Speculation using Atomic Regions
  • (To appear at ASPLOS 2013)
  • Wonsun Ahn, Yuelu Duan, Josep Torrellas
  • University of Illinois at Urbana-Champaign

2
Disclaimer
  • This talk is not about parallelism.
  • This talk is about decreasing the amount of work
    that needs to be done through better code
    generation.
  • We want to do this by making the
    software-hardware barrier more porous.

[Figure: diagram labeled Compiler, Hardware, Assumptions, Information]
3
What prevents good code generation?
  • Many popular optimizations require code motion
  • Loop Invariant Code Motion (LICM): from the body
    to the preheader of a loop
  • Redundancy elimination: from the location of the
    redundant computation to the first computation
  • Memory aliasing prevents code motion

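[Figure: redundancy elimination example. With no intervening store, the second computation on a and b (assigned to c) is replaced by a reuse of r1. With an intervening store through p, which may alias a or b, the computation cannot be moved above the store.]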
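Why a may-alias blocks redundancy elimination can be sketched in Python. This is only an illustration: memory is modelled as a list, `a`, `b`, `p`, and `c` are indices into it, and the function names, the addition, and the stored value 0 are assumptions, not taken from the slides.

```python
def original(mem, a, b, p, c):
    """Recompute mem[a] + mem[b] after the store through p."""
    r1 = mem[a] + mem[b]
    mem[p] = 0                 # store that may alias a or b
    mem[c] = mem[a] + mem[b]   # second, possibly redundant computation
    return r1


def optimized(mem, a, b, p, c):
    """Reuse r1; correct only if p aliases neither a nor b."""
    r1 = mem[a] + mem[b]
    mem[p] = 0
    mem[c] = r1                # redundancy eliminated
    return r1


def same_result(a, b, p, c):
    """Run both versions on identical memory and compare final states."""
    m1, m2 = [1, 2, 3, 4], [1, 2, 3, 4]
    original(m1, a, b, p, c)
    optimized(m2, a, b, p, c)
    return m1 == m2
```

With distinct locations the two versions agree; when `p` equals `a`, the reused value is stale and the final memory states differ.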
4
Alias Analysis is Difficult
  • Alias analysis returns one of three results:
  • Must-Alias, No-Alias, May-Alias
  • Accurate static analysis is fundamentally
    difficult
  • Requires points-to analysis, heap modeling etc.
  • Quickly becomes intractable in space/time
    complexity
  • Alternative: insert runtime checks
  • Software checks
  • Hardware checks (e.g. Itanium ALAT, Transmeta)
  • We propose to leverage atomic regions to do
    runtime checks and automatic recovery
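A software runtime check might look like the sketch below, which versions the code on an explicit address comparison. The memory model (a Python list indexed by address) and the function name are illustrative assumptions; the point is that every check costs extra compare-and-branch work in the instruction stream.

```python
def reuse_with_software_check(mem, a, b, p, c):
    """Pick the optimized path only when the addresses provably differ."""
    r1 = mem[a] + mem[b]
    mem[p] = 0
    if p != a and p != b:          # software alias check
        mem[c] = r1                # fast path: reuse r1
        return "fast"
    mem[c] = mem[a] + mem[b]       # slow path: recompute
    return "slow"
```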

5
Background: Atomic Regions (aka Transactions)
  • Sections of code demarcated in software that are
    either committed atomically on success or rolled
    back on failure
  • Atomic regions are here and now
  • Intel TSX, AMD ASF, IBM Bluegene/Q, IBM Power
  • Originally intended to ease parallel programming,
    but again, that's not what this talk is about
  • Atomic regions do two things well that software
    finds difficult:
  • Checkpointing to guarantee atomic commit of
    transaction
  • Exposed to software through begin atomic, end
    atomic
  • Memory alias detection to guarantee isolation of
    transaction
  • Hidden from software
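The checkpointing half of an atomic region can be modelled as a store buffer that is either committed or discarded. The class below is a minimal sketch under that assumption; the class name and the use of a Python dict as the buffer are illustrative and say nothing about how real hardware implements it.

```python
class CheckpointedMemory:
    """Toy model: stores inside an atomic region are buffered until commit."""

    def __init__(self, mem):
        self.mem = mem          # committed memory (list indexed by address)
        self.buffer = None      # None means "not inside an atomic region"

    def begin_atomic(self):
        self.buffer = {}        # take a checkpoint: start buffering stores

    def load(self, addr):
        if self.buffer is not None and addr in self.buffer:
            return self.buffer[addr]   # see our own speculative store
        return self.mem[addr]

    def store(self, addr, value):
        if self.buffer is None:
            self.mem[addr] = value
        else:
            self.buffer[addr] = value  # speculative until end_atomic

    def end_atomic(self):
        for addr, value in self.buffer.items():
            self.mem[addr] = value     # commit all stores together
        self.buffer = None

    def abort_atomic(self):
        self.buffer = None             # roll back: drop speculative stores
```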

6
Proposal: Leverage Atomic Regions for Alias Speculation
  • Expose alias checking HW to SW through ISA
    extensions
  • Use HW support for Atomic Regions to perform
    alias speculation in a compiler for optimizations
  • Cover path of code motion in an Atomic Region
  • Speculate that may-aliases in the code motion path
    are no-aliases
  • Check speculated aliases using alias checking HW
  • Recover from failure by rolling back to
    checkpoint
  • Apply this to optimizations such as:
  • Loop Invariant Code Motion (LICM)
  • Partial Redundancy Elimination (PRE)
  • Global Value Numbering (GVN)

7
Modifications to Atomic Regions
  • Key insight:
  • Atomic regions maintain a read set and a write
    set
  • Speculative Read (SR), Speculative Written (SW)
    bits in speculative cache
  • Only SW bits are needed for checkpointing
  • Repurpose SR bits to mark certain load locations
    for monitoring alias speculation failures
  • Do not mark SR bits for regular loads
  • Add ISA extensions to manipulate and check SR and
    SW bits to do alias checks

8
Extensions to the ISA (for Checkpointing): already supported
  • begin_atomic PC / end_atomic / abort_atomic
  • Starts / ends / aborts atomic region
  • PC is the address of the Safe-Version of the atomic
    region
  • Safe-Version: the atomic region code without
    speculative optimizations
  • abort_atomic jumps to Safe-Version after rollback

9
Extensions to the ISA (for Alias Checking): newly added
  • load.add.sr r1, addr
  • Loads location addr to r1 just like a regular
    load
  • Marks SR bit in cache line containing addr
  • Used for marking monitored loads
  • clear.sr addr
  • Clears SR bit in cache line containing addr
  • Used to mark end of load monitoring
  • store.chk.(sr / sw / srsw) addr, r1
  • Stores r1 to location addr just like a regular
    store
  • sr: if the SR bit is set, the atomic region is aborted
  • sw: if the SW bit is set, the atomic region is aborted
  • srsw: if either bit is set, the atomic region is aborted
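The semantics of these instructions can be approximated with a small simulator. This is a sketch under stated assumptions: the class and method names are invented for illustration, SR and SW bits are tracked per 64-byte line (the line size from the evaluation setup), every store is assumed to set the SW bit, and an abort is modelled as a Python exception.

```python
LINE_BYTES = 64  # assumed cache line size


class Abort(Exception):
    """Alias check failed; models rollback of the atomic region."""


class AtomicRegion:
    def __init__(self, mem):
        self.mem = mem      # committed memory: dict addr -> value
        self.buf = {}       # speculative stores, discarded on abort
        self.sr = set()     # lines whose SR bit is set
        self.sw = set()     # lines whose SW bit is set

    @staticmethod
    def line(addr):
        return addr // LINE_BYTES

    def load(self, addr):
        """Regular load: does not touch the SR bit."""
        return self.buf.get(addr, self.mem.get(addr, 0))

    def load_add_sr(self, addr):
        """load.add.sr: load and mark the line as monitored."""
        self.sr.add(self.line(addr))
        return self.load(addr)

    def clear_sr(self, addr):
        """clear.sr: stop monitoring the line."""
        self.sr.discard(self.line(addr))

    def store(self, addr, value):
        """Regular store: buffered, sets the SW bit, never checks."""
        self.sw.add(self.line(addr))
        self.buf[addr] = value

    def store_chk(self, mode, addr, value):
        """store.chk.<mode>, with mode one of "sr", "sw", "srsw"."""
        ln = self.line(addr)
        if "sr" in mode and ln in self.sr:
            raise Abort("store hit a monitored load")
        if "sw" in mode and ln in self.sw:
            raise Abort("store hit a speculatively written line")
        self.store(addr, value)

    def end_atomic(self):
        self.mem.update(self.buf)   # commit
```

Because the bits live on cache lines, two different addresses in the same line are treated as aliasing in this model, which is conservative but safe.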

10
How are these Instructions Used?
  • Instrumentation goals
  • Minimize alias checking instruction overhead
  • Allow alias checks on a subset of accesses in AR
  • A single AR can enable multiple optimizations
  • Each code motion involves only a subset of
    accesses
  • Two cases of code motion that involve alias
    checks
  • Moving (hoisting) loads
  • Moving (sinking) stores

11
Code Motion 1: Hoisting Loads
Original:
    begin_atomic
    store x
    load a
    store y
    end_atomic
After hoisting load a:
    begin_atomic
    load.add.sr a
    store.chk.sr x
    clear.sr a
    store y
    end_atomic
After sinking and removing clear.sr a:
    begin_atomic
    load.add.sr a
    store.chk.sr x
    store y
    end_atomic
  • Assume a may-alias with x and y
  • Hoist load a above store x and set up monitoring
    of a
  • store.chk.sr x will roll back the AR on alias check
    failure
  • Sink clear.sr a to end of AR (if possible)
  • store y will not trigger rollback on alias with a
  • Now clear.sr a can be removed
  • Can selectively check against stores in path of
    code motion
  • (Often) no instruction overhead for checking
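The hoisted-load case can be sketched end to end, including the fallback to the Safe-Version. Assumptions in this sketch: memory is a dict keyed by address, aliasing means equal addresses (no cache-line granularity), the stored values 1 and 2 are arbitrary, and all function names are illustrative.

```python
class Abort(Exception):
    """Alias check failed: the atomic region rolls back."""


def safe_version(mem, a, x, y):
    """Original order: store x, load a, store y."""
    mem[x] = 1
    r = mem[a]
    mem[y] = 2
    return r


def speculative_version(mem, a, x, y):
    """load a hoisted above store x, guarded by an SR mark on a."""
    buf = {}             # speculative stores, committed only at the end
    sr = {a}             # load.add.sr a: mark a as monitored
    r = mem[a]
    if x in sr:          # store.chk.sr x
        raise Abort
    buf[x] = 1
    buf[y] = 2           # plain store y: never checked
    mem.update(buf)      # end_atomic: commit
    return r


def run(mem, a, x, y):
    """Try the speculative code; on abort, rerun the Safe-Version."""
    try:
        return speculative_version(mem, a, x, y), "speculative"
    except Abort:
        return safe_version(mem, a, x, y), "safe"
```

An alias between `a` and `y` needs no check because the load was never moved across store y, which is why the plain store is enough there.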

12
Code Motion 2: Sinking Stores
Original:
    begin_atomic
    store a
    load x
    store y
    end_atomic
After sinking store a:
    begin_atomic
    load.add.sr x
    store y
    store.chk.srsw a
    end_atomic
  • Assume a may-alias with x and y
  • Sink store a below load x and store y
  • Alias with x is checked when SR bits are checked
    in store.chk.srsw a
  • Alias with y is checked when SW bits are checked
    in store.chk.srsw a
  • Can selectively check only loads in path of code
    motion
  • Must check against all previous stores in atomic
    region
  • Because SW bits cannot be set selectively

13
Illustrative Example: LICM and GVN
Original loop:
    // a, b may alias with p, q, s.
    // p, q, s may alias with each other.
    for (i = 0; i < 100; i++) {
        *a = *b + 10;
        *p = *q + 20;
        *s = *q + 20;
    }
With atomic region:
    // PC points to the original loop
    begin_atomic PC
    for (i = 0; i < 100; i++) {
        *a = *b + 10;
        *p = *q + 20;
        *s = *q + 20;
    }
    end_atomic
  • Put atomic region around loop
  • Perform optimizations after inserting appropriate
    checks

14
Illustrative Example: LICM and GVN
Original loop:
    // a aliases with p, q
    // b aliases with p
    // p, q, s alias with each other
    for (i = 0; i < 100; i++) {
        *a = *b + 10;
        *p = *q + 20;
        *s = *q + 20;
    }
With atomic region:
    // PC points to the original loop
    register int r1, r2;
    begin_atomic PC
    ld.add.sr r1, b
    r2 = r1 + 10
    for (i = 0; i < 100; i++) {
        store a, r2
        store.chk.sr p, *q + 20
        store s, *q + 20
    }
    clear.sr b
    end_atomic
  • Put atomic region around loop
  • Perform optimizations after inserting appropriate
    checks
  • Hoist *b + 10 (LICM)

15
Illustrative Example: LICM and GVN
Original loop:
    // a aliases with p, q
    // b aliases with p
    // p, q, s alias with each other
    for (i = 0; i < 100; i++) {
        *a = *b + 10;
        *p = *q + 20;
        *s = *q + 20;
    }
With atomic region:
    // PC points to the original loop
    register int r1, r2, r3;
    begin_atomic PC
    ld.add.sr r1, b
    r2 = r1 + 10
    for (i = 0; i < 100; i++) {
        store a, r2
        ld.add.sr r3, q
        r4 = r3 + 20
        store.chk.sr p, r4
        clear.sr q
        store s, r4
    }
    clear.sr b
    end_atomic
  • Put atomic region around loop
  • Perform optimizations after inserting appropriate
    checks
  • Hoist *b + 10 (LICM)
  • Eliminate second *q + 20 (GVN)

16
Illustrative Example: LICM and GVN
Original loop:
    // a aliases with p, q
    // b aliases with p
    // p, q, s alias with each other
    for (i = 0; i < 100; i++) {
        *a = *b + 10;
        *p = *q + 20;
        *s = *q + 20;
    }
With atomic region:
    // PC points to the original loop
    register int r1, r2, r3;
    begin_atomic PC
    ld.add.sr r1, b
    r2 = r1 + 10
    for (i = 0; i < 100; i++) {
        store a, r2
        ld.add.sr r3, q
        r4 = r3 + 20
        store.chk.sr p, r4
        store s, r4
    }
    clear.sr q
    clear.sr b
    end_atomic
  • Put atomic region around loop
  • Perform optimizations after inserting appropriate
    checks
  • Hoist *b + 10 (LICM)
  • Eliminate second *q + 20 (GVN)
  • Sink clear.sr q

17
Illustrative Example: LICM and GVN
Original loop:
    // a aliases with p, q
    // b aliases with p
    // p, q, s alias with each other
    for (i = 0; i < 100; i++) {
        *a = *b + 10;
        *p = *q + 20;
        *s = *q + 20;
    }
With atomic region:
    // PC points to the original loop
    register int r1, r2, r3;
    begin_atomic PC
    ld.add.sr r1, b
    r2 = r1 + 10
    for (i = 0; i < 100; i++) {
        ld.add.sr r3, q
        r4 = r3 + 20
        store.chk.sr p, r4
        store s, r4
    }
    store.chk.srsw a, r2
    clear.sr q
    clear.sr b
    end_atomic
  • Put atomic region around loop
  • Perform optimizations after inserting appropriate
    checks
  • Hoist *b + 10 (LICM)
  • Eliminate second *q + 20 (GVN)
  • Sink clear.sr q
  • Sink the store to a (LICM)

Checked needlessly but is fine since it does
not alias with a
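The fully optimized loop can be compared against the original in a small Python model. This is a sketch under several assumptions: memory is a dict keyed by address, aliasing means equal addresses, the loop body adds 10 and 20 as in the example, and b is assumed never to alias s (the slide comments list no such alias, and the plain store to s is not checked). Function names are illustrative.

```python
class Abort(Exception):
    """Alias check failed: roll back and rerun the Safe-Version."""


def safe_loop(mem, a, b, p, q, s):
    """Original loop, used as the Safe-Version."""
    for _ in range(100):
        mem[a] = mem[b] + 10
        mem[p] = mem[q] + 20
        mem[s] = mem[q] + 20


def speculative_loop(mem, a, b, p, q, s):
    buf, sr, sw = {}, set(), set()

    def load(addr):
        return buf.get(addr, mem[addr])

    def store(addr, value, check=""):
        if "sr" in check and addr in sr:
            raise Abort
        if "sw" in check and addr in sw:
            raise Abort
        buf[addr] = value
        sw.add(addr)

    sr.add(b)                       # ld.add.sr r1, b
    r2 = load(b) + 10               # hoisted by LICM
    for _ in range(100):
        sr.add(q)                   # ld.add.sr r3, q
        r4 = load(q) + 20
        store(p, r4, check="sr")    # store.chk.sr p, r4
        store(s, r4)                # store s, r4 (reuses r4: GVN)
    store(a, r2, check="srsw")      # store.chk.srsw a, r2 (sunk store)
    mem.update(buf)                 # end_atomic: commit


def run(mem, a, b, p, q, s):
    """Try the speculative loop; on abort, run the original loop."""
    try:
        speculative_loop(mem, a, b, p, q, s)
        return "speculative"
    except Abort:
        safe_loop(mem, a, b, p, q, s)
        return "safe"
```

When no addresses coincide, the speculative path commits and matches the original loop; when a equals q or p equals q, a check fires and the Safe-Version produces the result instead.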
18
Where should we Place Atomic Regions?
  • We chose to focus on loops
  • Where most of the execution time is spent
  • Loops provide ample range for opts such as LICM
    or PRE to perform large scale redundancy
    elimination
  • Can amortize cost of atomic region
    instrumentation over multiple iterations for a
    given optimization
  • When loops can potentially overflow speculation
    resources, loops are blocked into nested
    sub-loops appropriately
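Loop blocking can be sketched as splitting the iteration space so that each sub-loop's estimated footprint fits the speculative buffer. The footprint model here (a fixed number of speculatively written cache lines per iteration) and the function name are assumptions for illustration only.

```python
def block_loop(n_iters, lines_per_iter, max_lines):
    """Split range(n_iters) into sub-loops whose estimated footprint fits.

    lines_per_iter: estimated speculatively written lines per iteration (> 0).
    max_lines: lines the speculative cache is assumed able to hold.
    """
    block = max(1, max_lines // lines_per_iter)
    return [range(start, min(start + block, n_iters))
            for start in range(0, n_iters, block)]
```

Each returned sub-range would then be wrapped in its own atomic region, for example `block_loop(100, 8, 256)` yields four sub-loops of at most 32 iterations.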

19
Memory Consistency Issues
  • In a multiprocessor system, disabling conflict
    checks on speculative read lines can change
    access ordering
  • Stores commit out of order at the end of an
    atomic region even when loads read values from
    remote processors
  • Conventionally, this causes a rollback
  • Not a problem in reality
  • Compiler code motion causes access re-orderings
    anyway.
  • If it is legal for the compiler to re-order, it
    is legal for HW
  • If it was illegal for the compiler to re-order
    (e.g. due to synchronization), the atomic region
    would not be placed there

20
Compiler Toolchain
  1. Run loop blocking pass that uses loop footprint
    estimation
  2. Run application instrumented with alias check
    instructions to profile how many Atomic Region
    aborts a particular speculation would have
    caused.
  3. Run Atomic Region instrumentation pass for loops
    that would benefit according to a cost-benefit
    model and the abort profile information.
  4. Run modified optimization passes (e.g. LICM, PRE,
    GVN) that perform the code movements deemed
    beneficial by the cost-benefit model. Insert
    appropriate alias checks.
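The slides do not give the cost-benefit model, so the following is a purely hypothetical sketch of how abort profile information could feed such a decision. The formula, parameter names, and units (cycles per atomic region execution) are all assumptions.

```python
def worth_speculating(cycles_saved, check_overhead, abort_rate, abort_penalty):
    """Hypothetical test: speculate only if the expected gain beats the cost.

    cycles_saved:   cycles saved per committed atomic region
    check_overhead: cycles added by instrumentation per atomic region
    abort_rate:     fraction of atomic regions that aborted in the profile
    abort_penalty:  cycles lost per abort (wasted work plus Safe-Version rerun)
    """
    expected_gain = (1 - abort_rate) * cycles_saved
    expected_cost = check_overhead + abort_rate * abort_penalty
    return expected_gain > expected_cost
```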

21
Experimental Setup
  • Compare three environments using LICM and GVN/PRE
    optimizations
  • BaselineAA: unmodified LLVM-2.8 using basic alias
    analysis, the default alias analysis used by O3
    optimization
  • DSAA: unmodified LLVM-2.8 using data structure alias
    analysis, an experimental alias analysis with high
    time/space complexity
  • LAS: modified LLVM-2.8 using loop-based alias
    speculation
  • Applications: SPEC INT2006, SPEC FP2006
  • Simulation: SESC with Pin-based front end with Atomic
    Region support; 32KB 8-way associative speculative L1
    cache with 64B lines
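The cache geometry implied by these parameters follows from simple arithmetic. The set-index function assumes conventional modulo indexing on the line address, which the slides do not state.

```python
CACHE_BYTES = 32 * 1024   # 32KB speculative L1
WAYS = 8                  # 8-way set associative
LINE_BYTES = 64           # 64B lines

NUM_LINES = CACHE_BYTES // LINE_BYTES   # total cache lines
NUM_SETS = NUM_LINES // WAYS            # sets of WAYS lines each


def set_index(addr):
    """Set an address maps to, assuming modulo indexing (an assumption)."""
    return (addr // LINE_BYTES) % NUM_SETS
```

Under this model the cache holds 512 lines in 64 sets, so addresses 4096 bytes apart land in the same set.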

22
Alias Analysis Results
  • Breakdown of alias analysis results when run with
    LICM pass
  • LAS is able to convert almost all may-aliases to
    no-aliases using profile information

23
Speedups
  • Speedups normalized to BaselineAA

24
Atomic Region Characterization
  • Low L1 cache occupancy due to not buffering
    speculatively read lines
  • Overhead amortized over large atomic region

25
Summary
  • Proposed exposing HW Atomic Region alias checking
    primitive to SW using ISA extensions
  • Proposed loop-based Atomic Region instrumentation
  • To maximize speculation opportunity
  • To minimize instrumentation overhead
  • Proposed an alias speculation framework
    leveraging Atomic Regions and evaluated using
    LICM and GVN/PRE
  • May-alias results reduced from 56% to 4% for
    SPECINT2006 and from 43% to 1% for SPECFP2006
  • Speedup: 3% for SPECINT2006, 9% for SPECFP2006