Hiding Relaxed Memory Consistency with Compilers - PowerPoint PPT Presentation

Provided by: EECS

Transcript and Presenter's Notes

1
Hiding Relaxed Memory Consistency with Compilers
2
Motivation
  • Sequential consistency is what the average
    programmer expects from shared memory programming
    even when they don't know what the memory
    consistency model is.
  • Many shared memory multiprocessors can reorder
    reads and writes, and provide various hardware
    level optimizations to take advantage of the
    relaxed memory consistency model.
  • Lee and Padua argue that a compiler can give
    the programmer a sequentially consistent
    view of the program while still profiting from
    the performance advantage of the underlying
    system.

3
Introduction
  • Programmers are presented with a sequentially
    consistent view of the program.
  • Fence instructions are inserted to enforce
    sequential consistency.
  • Optimizations such as reordering of memory
    operations can be done between two consecutive
    fence instructions.

4
Goals
  • The compiler provides programmers with a
    sequentially consistent view of the underlying
    architecture, regardless of whether the
    hardware follows a sequentially consistent model
    or a relaxed model.
  • The compiler makes it possible to apply compiler
    optimization techniques correctly to parallel
    programs that are not handled by conventional
    compilers.

5
Sequential Consistency
  • A multiprocessor system is sequentially consistent
    if the result of the execution of any program is
    the same as if all operations were executed in
    some global sequential order, and the operations
    of each parallel component appear in program
    order.
  • Only the result is important. Operations can be
    reordered as long as the result appears to be
    sequentially consistent.
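
The definition above can be checked by brute force on a tiny program. The sketch below is only an illustration, not something from the slides: the two-processor program (P1, P2), the register names r1 and r2, and the helper functions are all assumptions. It enumerates every interleaving that keeps each processor's program order and collects the results a sequentially consistent system may produce.

```python
def interleavings(a, b):
    """Yield every merge of a and b that preserves each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global order of operations and return (r1, r2)."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, target, source in schedule:
        if kind == "write":      # ("write", location, value)
            memory[target] = source
        else:                    # ("read", register, location)
            registers[target] = memory[source]
    return registers["r1"], registers["r2"]


# Hypothetical example program: each processor writes one flag,
# then reads the other processor's flag.
P1 = [("write", "x", 1), ("read", "r1", "y")]
P2 = [("write", "y", 1), ("read", "r2", "x")]

sc_results = {run(s) for s in interleavings(P1, P2)}
print(sorted(sc_results))
```

Under these assumptions the result r1 = 0, r2 = 0 never appears. A relaxed machine that lets each read bypass the earlier write could produce it, which is the kind of outcome the compiler has to rule out.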

6
Relaxed Memory Consistency Models
  • Reduces restrictions on overlapping and
    reordering of memory operations in order to
    improve performance
  • Memory access operations to different memory
    locations can be reordered if reordering does not
    violate ordering constraints of the memory model.

7
Weak Ordering
  • Classifies memory operations into two categories:
    data operations and synchronization operations
  • Data operations between two consecutive
    synchronization operations can be reordered

8
Ordering Constraints
R: read operation, W: write operation, S: synchronization operation
[Figure: ordering constraints among R, W, and S operations]

9
Delay Set Analysis
  • Finds a minimal set of execution orderings that
    guarantees sequential consistency
  • Delay relation D between two operations u and v
    forces v to wait until u completes execution.

10
Two Types of Delays
  • Delays found by delay set analysis
  • Delays enforced by the ordering constraints

11
Preserving Sequential Consistency Using Delays
  • Suppose there are three delays, uDv, vDw, and
    uDw, and v dominates w. Then it is not necessary
    to enforce uDw, since it is implied by uDv and vDw.
  • In general, for a given D, it is sufficient to
    enforce the transitive reduction of D.
  • A transitive reduction of D is denoted
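
A minimal sketch of the reduction, assuming the delay relation is acyclic and is given as a set of (u, v) pairs. The function names are illustrative, and the sketch ignores the control-flow (dominance) condition mentioned above.

```python
def transitive_closure(pairs):
    """Return every (a, b) such that b is reachable from a through pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def transitive_reduction(delays):
    """Drop each delay implied by a chain of other delays (acyclic input assumed)."""
    closure = transitive_closure(delays)
    nodes = {n for pair in delays for n in pair}
    return {
        (u, w)
        for (u, w) in delays
        if not any(
            (u, v) in closure and (v, w) in closure
            for v in nodes
            if v not in (u, w)
        )
    }


D = {("u", "v"), ("v", "w"), ("u", "w")}
reduced = transitive_reduction(D)
print(sorted(reduced))
```

On the three delays from the slide, the sketch drops uDw and keeps uDv and vDw.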

12
Minimal Delay Relation
  • Only the delays in the minimal delay relation
    need to be implemented with special instructions
    such as fences and synchronization operations.

13
Fence
  • Imposes ordering between memory operations
  • When a fence instruction is executed by a
    processor, all previous memory operations of the
    processor are guaranteed to have completed.
  • Memory operations that follow the fence
    instruction in the program are not issued until
    the fence completes execution.
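
The effect of a fence on a straight-line sequence can be modeled very simply. The sketch below is a toy model, not real fence semantics for any particular architecture. It treats a program as a list of strings, with the assumed marker `FENCE` standing for a fence instruction. Operations in the same region may be candidates for reordering, while operations separated by a fence are ordered.

```python
FENCE = "fence"  # assumed marker for a fence instruction in this toy model


def regions(program):
    """Split a straight-line operation list into the runs between fences."""
    current, result = [], []
    for op in program:
        if op == FENCE:
            result.append(current)
            current = []
        else:
            current.append(op)
    result.append(current)
    return result


def ordered_by_fence(program, i, j):
    """True if a fence lies strictly between positions i and j (i < j)."""
    return FENCE in program[i + 1:j]


program = ["W x", "W y", FENCE, "R z", "R x"]
print(regions(program))
print(ordered_by_fence(program, 0, 3))  # "W x" and "R z" straddle the fence
print(ordered_by_fence(program, 0, 1))  # "W x" and "W y" share a region
```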

14
Exploiting the Property of Fence and Synchronization Instructions
  • Fences are inserted at a node of the control flow
    graph to enforce one or more delays.
  • A naïve algorithm may insert more fences than
    needed to enforce sequential consistency.
  • The properties of fence and synchronization
    instructions can be used to reduce the number of
    fences.

15
Inserting Fence Instructions
16
Memory-Barrier Nodes
  • If a node needs a fence instruction or to be
    identified as a synchronization operation, we
    call the node a memory-barrier node.
  • The goal is to find a memory-barrier node that
    enforces as many delays as possible.

17
Memory-Barrier Node Illustration
  • There are two delays that need to be enforced:
    nDw and uDu.
  • Operation u in the current iteration needs to
    complete before it can be issued again in the
    next iteration.
  • Fence F is enough to enforce both delays.

18
Dominators with respect to a Node
  • A node s dominates a node v with respect to a
    node u if every flow path from u to v goes
    through s.
  • The classical dominators of a node v are the
    dominators with respect to the program entry node.

19
Dominator Illustration
  • Node b is not a classical dominator of node c.
  • Node b dominates node c with respect to node a.

[Figure: control flow graph with nodes program entry, a, b, and c]
20
Dominator Computation
  • For each node n, define the set of nodes that
    dominate n with respect to u.
  • If a node n is unreachable from u, then n has an
    empty set of dominators with respect to u.
  • Uses the iterative algorithm for classical
    dominators to find dominators with respect to a
    node u by treating u as the program entry node

21
Data Flow Equation
  • U is the set of nodes that are unreachable from
    u.
  • At the beginning, the dominator set of each node
    other than u is initialized with N, the set of
    all nodes, and the dominator set of u is
    initialized with {u}.

22
After Fixed Point
  • Update the dominator set of each node in U with null
  • Update with
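
The computation described on the last few slides can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact equations: the function name `dominators_wrt`, the dictionary-based graph, and the example edges are hypothetical. It also assumes that predecessors unreachable from u are ignored in the intersection, and it gives unreachable nodes an empty set once the fixed point is reached.

```python
def dominators_wrt(graph, u):
    """Map each node n to the set of nodes that dominate n with respect to u.

    graph maps every node to the list of its successors.
    """
    nodes = set(graph)
    preds = {n: set() for n in nodes}
    for n, succs in graph.items():
        for s in succs:
            preds[s].add(n)

    # Nodes reachable from u; all other nodes form the set U.
    reachable, stack = {u}, [u]
    while stack:
        n = stack.pop()
        for s in graph[n]:
            if s not in reachable:
                reachable.add(s)
                stack.append(s)

    # Initialization: N for every node, {u} for u itself.
    dom = {n: set(nodes) for n in nodes}
    dom[u] = {u}

    # Iterate to a fixed point, treating u as the entry node.
    changed = True
    while changed:
        changed = False
        for n in reachable - {u}:
            incoming = [dom[p] for p in preds[n] if p in reachable]
            new = {n} | set.intersection(*incoming)
            if new != dom[n]:
                dom[n] = new
                changed = True

    # After the fixed point: unreachable nodes get an empty set.
    for n in nodes - reachable:
        dom[n] = set()
    return dom


# Assumed edges for a graph like the one in the dominator illustration.
graph = {"entry": ["a", "c"], "a": ["b"], "b": ["c"], "c": []}
classical = dominators_wrt(graph, "entry")
wrt_a = dominators_wrt(graph, "a")
print(sorted(classical["c"]), sorted(wrt_a["c"]))
```

With these assumed edges, b is not a classical dominator of c because the edge from the entry to c bypasses it, but b does dominate c with respect to a.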

23
Minimizing the Number of Memory-Barrier Nodes
  • The goal is to minimize the number of
    memory-barrier nodes executed in a program.
  • Reducing the total number of memory-barrier nodes
    is one approximation to this goal.

24
Complexity
  • Minimizing the number of memory-barrier nodes by
    using dominators with respect to a node is
    NP-hard.
  • The paper proves that the decision version of
    the problem is NP-complete.
  • The proof is a reduction from vertex cover to
    the minimum memory-barrier node problem.

25
Approximation Algorithm
  • The naïve algorithm is exponential: it checks
    each subset of the candidate nodes
    to determine whether the nodes in the
    subset enforce all the delays.
  • The paper therefore uses an approximation
    algorithm that is a slight modification of the
    greedy approximation algorithm for the
    optimization version of the minimum cover problem.
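
For reference, the unmodified greedy cover heuristic looks roughly like this. The sketch does not include the paper's modification, and the input format is an assumption: `enforces` maps each candidate node to the set of delays a memory barrier at that node would enforce. All node and delay names are hypothetical.

```python
def greedy_barrier_nodes(enforces, delays):
    """Pick candidate nodes greedily until every delay is enforced."""
    remaining = set(delays)
    chosen = []
    while remaining:
        # Take the node covering the most still-unenforced delays;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(enforces), key=lambda n: len(enforces[n] & remaining))
        gained = enforces[best] & remaining
        if not gained:
            raise ValueError("some delays cannot be enforced by any candidate node")
        chosen.append(best)
        remaining -= gained
    return chosen


delays = {("u", "v"), ("u", "w"), ("x", "w")}
enforces = {
    "s1": {("u", "v")},
    "s2": {("u", "v"), ("u", "w")},
    "s3": {("x", "w"), ("u", "w")},
}
chosen = greedy_barrier_nodes(enforces, delays)
print(chosen)
```

Here two barrier nodes are enough, whereas checking every subset of candidates would take exponential time in general.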

26
Profitability
  • The goal is to hide the memory latency by
    overlapping or reordering memory operations to
    different locations.
  • For uDv, we would like to insert a fence as close
    to v as possible to maximize the opportunities
    for reordering and overlapping.
  • It is more desirable to insert a fence at a
    memory-barrier node that is located in a less
    frequently executed path.

27
Profitability Illustration
28
Related Work
  • Hill (1998) argues that future
    multiprocessors should implement sequential
    consistency as their hardware memory consistency
    model.
  • Hill provides data showing that the performance
    improvement of relaxed memory models over
    sequential consistency is about 20% for a set of
    scientific benchmarks.
  • Hill predicts the gap will shrink in the future
    due to speculative execution in modern processors.

29
Conclusion
  • The tradeoff between a sequentially consistent
    memory model and a relaxed memory model is between
    ease of programming and performance.
  • With hardware that implements a relaxed memory
    model, either the application programmer or the
    compiler writer needs to deal with the complexity
    of the relaxed model.
  • This paper presents a compiler that deals with
    this complexity and presents a sequentially
    consistent view for the application programmer.