View-Oriented Parallel Programming for multi-core systems

1
View-Oriented Parallel Programming for multi-core
systems
  • Dr Zhiyi Huang
  • World 45
  • Univ of Otago

2
An age of CMT
  • CMT (Chip Multi-Threading) offers us the power of
    parallel computing
  • Harnessing that power relies on good parallel
    applications and competent parallel programmers
  • A sound parallel programming methodology is the key

3
Two camps
  • Message passing vs. shared memory
  • The message-passing style is complex
  • Communication with shared memory is simple and
    easy, but ...

4
Problems for SM-based PP (1)
  • Data races are a pain
  • A data race occurs when there are concurrent
    accesses to the same memory location and at least
    one of them is a write access
  • Debugging a data race is difficult since a
    parallel execution is normally not repeatable
    (see the sketch below)
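To make the problem concrete, here is a minimal sketch of a data
race, assuming POSIX threads (this example is not from the slides).
Both threads increment a shared counter without synchronization, so
updates can be lost and the final value is unpredictable:

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;            /* shared memory location */

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;                  /* read-modify-write: not atomic */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld (expected 2000000)\n", counter);
        return 0;
    }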

5
Problems for SM-based PP (2)
  • Deadlock is another pain
  • Mutual exclusion primitives such as locks are
    required to prevent data races, but
  • they may result in deadlock, a situation where
    multiple threads/processes wait for each other
    while competing for locks (see the sketch below)
  • Mutual exclusion has complicated the mental model
    of parallel programming
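Again a minimal sketch, assuming POSIX threads (not from the slides):
the two threads acquire the same pair of locks in opposite orders, so
each can end up holding one lock while waiting forever for the other:

    #include <pthread.h>
    #include <unistd.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    static void *thread1(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock_a);
        sleep(1);                     /* widen the window for the deadlock */
        pthread_mutex_lock(&lock_b);  /* waits on thread2, which waits on us */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    static void *thread2(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock_b);
        sleep(1);
        pthread_mutex_lock(&lock_a);
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_create(&t2, NULL, thread2, NULL);
        pthread_join(t1, NULL);       /* never returns: the threads deadlock */
        pthread_join(t2, NULL);
        return 0;
    }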

6
Problems for SM-based PP (3)
  • Poor portability is yet another pain
  • Parallel applications are system dependent
  • Mutual exclusion primitives such as locks are not
    standardized
  • Synchronization primitives such as barrier are
    not standardized
  • Shared memory allocation is not standardized

7
Solutions?
  • A parallel programming style with the following
    features
  • Data race free
  • Mutual exclusion free
  • Deadlock free
  • Portable to any system with shared memory

8
View-Oriented Parallel Programming
9
What is a view?
  • Suppose M is the set of data objects in shared
    memory
  • A view is a group of data objects from the shared
    memory
  • ∀V: V ⊆ M
  • Views must not overlap each other
  • ∀Vi, Vj, i ≠ j: Vi ∩ Vj = ∅
  • Suppose there are n views in shared memory
  • V1 ∪ V2 ∪ ... ∪ Vn = M

10
VOPP Requirements
  • The programmer should divide the shared data into
    a number of views according to the data flow of
    the parallel algorithm.
  • A view should consist of data objects that are
    always processed as an atomic set in a program.
  • Views can be created and destroyed at any time.
  • Each view has a unique view identifier.

11
VOPP Requirements (cont.)
  • View primitives such as acquire_view and
    release_view must be used when a view is
    accessed.
  • acquire_view(View_A)
  • A = A + 1
  • release_view(View_A)
  • acquire_Rview and release_Rview can be used when
    a view is only read by a processor.
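The slides show only fragments of the API. Below is a toy emulation
of the view primitives on top of a POSIX read-write lock, purely to
make the calling pattern concrete; the signatures are assumptions,
and VODCA's real implementation is a distributed shared memory
system, not a local lock:

    #include <pthread.h>
    #include <stdio.h>

    /* One view; a real library would keep a table keyed by view id. */
    static pthread_rwlock_t view_1 = PTHREAD_RWLOCK_INITIALIZER;

    static void acquire_view(int v)  { (void)v; pthread_rwlock_wrlock(&view_1); }
    static void release_view(int v)  { (void)v; pthread_rwlock_unlock(&view_1); }
    static void acquire_Rview(int v) { (void)v; pthread_rwlock_rdlock(&view_1); }
    static void release_Rview(int v) { (void)v; pthread_rwlock_unlock(&view_1); }

    static long A;                       /* data object belonging to view 1 */

    static void *writer(void *arg) {
        (void)arg;
        acquire_view(1);                 /* exclusive access to view 1 */
        A = A + 1;
        release_view(1);
        return NULL;
    }

    static void *reader(void *arg) {
        (void)arg;
        acquire_Rview(1);                /* read-only: readers may share */
        printf("A = %ld\n", A);
        release_Rview(1);
        return NULL;
    }

    int main(void) {
        pthread_t w, r;
        pthread_create(&w, NULL, writer, NULL);
        pthread_join(w, NULL);
        pthread_create(&r, NULL, reader, NULL);
        pthread_join(r, NULL);
        return 0;
    }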

12
VOPP Requirements (cont.)
  • When a process/thread accesses multiple views at
    the same time, only one acquiring primitive is
    used.
  • acquire_3_views(V_A, V_B, V_C)
  • C = A + B
  • release_views()
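A note on this design: because the program requests every view it
needs through a single primitive, the underlying system can acquire
them internally in a consistent order. Acquiring all resources at
once removes the hold-and-wait condition, which is what makes the
deadlock freedom claimed later achievable.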

13
Example
  • A VOPP program for a producer/consumer problem

    if (prod_id == 0) {
        acquire_view(1);
        produce(x);
        release_view(1);
    }
    barrier(0);
    acquire_Rview(1);
    consume(x);
    release_Rview(1);
14
VOPP features
  • No concern about data races
  • The programmer is only concerned with views, not
    mutual exclusion
  • Mutual exclusion is implemented by the system,
    which also detects potential data races by
    checking view boundaries
  • Deadlock free
  • Since mutual exclusion is implemented by the
    system, it can be made data-race free and
    deadlock free
  • Portability?
  • Achieved by standardizing the API

15
Requirements for the system
  • Keep track of view locations
  • Be capable of checking view boundaries (one
    possible mechanism is sketched below)
  • Guarantee deadlock freedom when implementing
    mutual exclusion
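The slides do not say how VODCA checks view boundaries. One
conceivable mechanism, borrowed from page-based DSM systems, is to
leave every unacquired view access-protected so that a stray
reference faults and is reported; the Linux-specific sketch below
illustrates the idea, and all names in it are hypothetical:

    #define _GNU_SOURCE
    #include <signal.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* A view this process has NOT acquired: kept inaccessible. */
    static char *other_view;

    /* Fault handler: any touch outside an acquired view lands here. */
    static void boundary_fault(int sig, siginfo_t *info, void *ctx) {
        (void)sig; (void)info; (void)ctx;
        static const char msg[] = "view-boundary violation\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);
        _exit(1);
    }

    int main(void) {
        long page = sysconf(_SC_PAGESIZE);

        /* Unacquired views stay PROT_NONE; an acquire_view call would
           flip the protection to PROT_READ|PROT_WRITE with mprotect(). */
        other_view = mmap(NULL, (size_t)page, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = boundary_fault;
        sigaction(SIGSEGV, &sa, NULL);

        other_view[0] = 1;   /* out-of-view access: trapped by the handler */
        return 0;
    }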

16
Advantages of VOPP
  • Keep the convenience of shared memory programming
  • Focus on data partitioning and data access
    instead of data races and mutual exclusion
  • View primitives automatically achieve mutual
    exclusion
  • View primitives are not an extra burden
  • The programmer can finely tune the parallel
    algorithm by careful view partitioning

17
Advantages of VOPP (cont.)
  • Implementation independent
  • View access can be based on mutual exclusion or
    Transactional Memory (TM)
  • TM is a memory system that checks access
    conflicts
  • Programming language independent
  • Can be implemented as a user space library
  • Performance advantages
  • Cache pre-fetching when a view is acquired
  • A view can remain cached until another
    thread/process acquires it

18
Philosophy of VOPP
  • Shared memory is a critical resource that needs
    to be used with care
  • If there is no need to use shared memory, don't
    use it
  • Justification is needed before a view is created
  • Compatible with Throughput Computing, which
    encourages multiple independent threads running
    on a chip

19
VOPP vs. MPI
  • Easier for programmers than MPI
  • For problems like task queues, programming with
    MPI is horrific.
  • Can mimic any finely-tuned MPI program
  • Shared message → view
  • Send/recv → acquire_view
  • Essential differences
  • View is location transparent
  • More barriers in VOPP
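A rough sketch of this mapping, assuming the hypothetical VOPP
primitives used earlier (the MPI calls are standard; fill_message
and use_message are made-up application routines):

    #include <mpi.h>

    /* Hypothetical VOPP primitives and application routines. */
    void acquire_view(int v);  void release_view(int v);
    void acquire_Rview(int v); void release_Rview(int v);
    void barrier(int b);
    void fill_message(double *msg, int n);
    void use_message(const double *msg, int n);

    /* MPI version: the producer sends the message to the consumer. */
    void mpi_style(int rank, double *msg, int n) {
        if (rank == 0)
            MPI_Send(msg, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(msg, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    /* VOPP version: the message lives in view 1, so the consumer simply
       acquires the view; where the data resides is transparent. */
    void vopp_style(int rank, double *msg, int n) {
        if (rank == 0) {
            acquire_view(1);
            fill_message(msg, n);
            release_view(1);
        }
        barrier(0);                  /* the extra barrier noted above */
        acquire_Rview(1);
        use_message(msg, n);
        release_Rview(1);
    }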

20
Implementation
  • VOPP is supported by our DSM system called VODCA
  • DSM (Distributed Shared Memory) provides a
    virtual shared memory on multi-computers
  • VODCA: View-Oriented, Distributed, Cluster-based
    Approach to parallel computing
  • VODCA version 1.0
  • Will be released as open source software
  • A library running in user space
  • Its implementation will be published at DSM'06

21
Experiment
  • We used a cluster computer
  • The cluster, at Tsinghua University, consists of
    128 Itanium 2 processors running Linux 2.4,
    connected by InfiniBand. Each node has two
    1.3 GHz processors and 4 GB of RAM. We ran two
    processes on each node.
  • We used four applications: Integer Sort (IS),
    Gauss, Successive Over-Relaxation (SOR), and
    Neural Network (NN).

22
Related systems
  • TreadMarks (TMK) is a state-of-the-art
    Distributed Shared Memory system based on
    traditional parallel programming.
  • Message Passing Interface (MPI) is a standard for
    message passing-based parallel programming. We
    used LAM/MPI.

23
Performance of NN
24
Performance of IS
25
Performance of SOR
26
Performance of Gauss
27
Future work on VOPP
  • API for multi-core systems
  • Implementation on Niagara
  • More benchmarks/applications, especially
    telecommunication applications
  • Performance evaluation on CMT
  • A view-based debugger for VOPP

28
Questions?