Transcript and Presenter's Notes

Title: Design and Implementation of the CCC Parallel Programming Language


1
Design and Implementation
of the CCC Parallel Programming Language
  • Nai-Wei Lin
  • Department of Computer Science and Information
    Engineering
  • National Chung Cheng University

2
Outline
  • Introduction
  • The CCC programming language
  • The CCC compiler
  • Performance evaluation
  • Conclusions

3
Motivations
  • Parallelism is the future trend
  • Programming in parallel is much more difficult
    than programming in serial
  • Parallel architectures are very diverse
  • Parallel programming models are very diverse

4
Motivations
  • Design a parallel programming language that
    uniformly integrates various parallel programming
    models
  • Implement a retargetable compiler for this
    parallel programming language on various parallel
    architectures

5
Approaches to Parallelism
  • Library approach
  • MPI (Message Passing Interface), pthread
  • Compiler approach
  • HPF (High Performance Fortran), HPC
  • Language approach
  • Occam, Linda, CCC (Chung Cheng C)

6
Models of Parallel Architectures
  • Control Model
  • SIMD: Single Instruction, Multiple Data
  • MIMD: Multiple Instruction, Multiple Data
  • Data Model
  • Shared-memory
  • Distributed-memory

7
Models of Parallel Programming
  • Concurrency
  • Control parallelism: simultaneous execution of
    multiple threads of control
  • Data parallelism: simultaneous execution of the
    same operations on multiple data
  • Synchronization and communication
  • Shared variables
  • Message passing

8
Granularity of Parallelism
  • Procedure-level parallelism
  • Concurrent execution of procedures on multiple
    processors
  • Loop-level parallelism
  • Concurrent execution of iterations of loops on
    multiple processors (see the C sketch below)
  • Instruction-level parallelism
  • Concurrent execution of instructions on a single
    processor with multiple functional units
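As an illustration of loop-level parallelism, here is a minimal sketch in plain C with pthreads rather than CCC: the iterations of a loop over an array are divided into blocks, one block per thread. NTHREADS, work(), and the block partitioning are illustrative assumptions, not part of CCC.

    /* loop-level parallelism: split N iterations among NTHREADS threads */
    #include <pthread.h>

    #define N 1024
    #define NTHREADS 4

    static double a[N];

    static void work(int i) { a[i] *= 2.0; }   /* per-iteration body */

    static void *worker(void *arg) {
        int id = *(int *)arg;
        int chunk = N / NTHREADS;
        int lo = id * chunk;
        int hi = (id == NTHREADS - 1) ? N : lo + chunk;
        for (int i = lo; i < hi; i++)   /* this thread's block of iterations */
            work(i);
        return 0;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        int ids[NTHREADS];
        for (int i = 0; i < NTHREADS; i++) {
            ids[i] = i;
            pthread_create(&t[i], 0, worker, &ids[i]);
        }
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], 0);
        return 0;
    }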

9
The CCC Programming Language
  • CCC is a simple extension of C and supports both
    control and data parallelism
  • A CCC program consists of a set of concurrent and
    cooperative tasks
  • Control-parallel tasks run in MIMD mode and
    communicate via shared variables and/or message
    passing
  • Data-parallel tasks run in SIMD mode and
    communicate via shared variables

10
Tasks in CCC Programs
11
Control Parallelism
  • Concurrency
  • task
  • par and parfor
  • Synchronization and communication
  • shared variables: monitors
  • message passing: channels

12
Monitors
  • The monitor construct is a modular and efficient
    mechanism for synchronizing access to shared
    variables among concurrent tasks
  • It provides data abstraction, mutual exclusion,
    and conditional synchronization (see the C sketch
    below)
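As a rough sketch of how a monitor's guarantees map onto a conventional thread library, here is a counter example in plain C with pthreads rather than CCC; the counter_monitor type and its operations are illustrative assumptions. One mutex provides mutual exclusion over the encapsulated state, and a condition variable provides conditional synchronization.

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t lock;     /* mutual exclusion for every operation */
        pthread_cond_t  nonzero;  /* conditional synchronization */
        int             value;    /* the encapsulated shared state */
    } counter_monitor;

    void counter_inc(counter_monitor *m) {
        pthread_mutex_lock(&m->lock);
        m->value++;
        pthread_cond_signal(&m->nonzero);   /* analogous to CCC's signal() */
        pthread_mutex_unlock(&m->lock);
    }

    void counter_dec(counter_monitor *m) {
        pthread_mutex_lock(&m->lock);
        while (m->value == 0)               /* analogous to CCC's wait() */
            pthread_cond_wait(&m->nonzero, &m->lock);
        m->value--;
        pthread_mutex_unlock(&m->lock);
    }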

13
An Example - Barber Shop
[Diagram: the barber shop, with its barber and chair]
14
An Example - Barber Shop
task main( )
{
  monitor Barber_Shop bs;
  int i;

  par {
    barber( bs );
    parfor (i = 0; i < 10; i++)
      customer( bs );
  }
}
15
An Example - Barber Shop
task barber(monitor Barber_Shop in bs)
{
  while ( 1 ) {
    bs.get_next_customer( );
    bs.finished_cut( );
  }
}

task customer(monitor Barber_Shop in bs)
{
  bs.get_haircut( );
}
16
An Example - Barber Shop
monitor Barber_Shop
{
  int barber, chair, open;
  cond barber_available, chair_occupied;
  cond door_open, customer_left;

  Barber_Shop( );
  void get_haircut( );
  void get_next_customer( );
  void finished_cut( );
}
17
An Example - Barber Shop
Barber_Shop( )
{
  barber = 0; chair = 0; open = 0;
}

void get_haircut( )
{
  while (barber == 0) wait(barber_available);
  barber -= 1;
  chair = 1; signal(chair_occupied);
  while (open == 0) wait(door_open);
  open -= 1; signal(customer_left);
}
18
An Example - Barber Shop
void get_next_customer( )
{
  barber = 1; signal(barber_available);
  while (chair == 0) wait(chair_occupied);
  chair -= 1;
}

void finished_cut( )
{
  open = 1; signal(door_open);
  while (open > 0) wait(customer_left);
}
19
Channels
  • The channel construct is a modular and efficient
    mechanism for message passing among concurrent
    tasks
  • Pipe: one to one
  • Merger: many to one
  • Spliter: one to many (sketched in C below)
  • Multiplexer: many to many
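A spliter, for instance, behaves like a shared queue with one sender and many receivers. Below is a minimal sketch in plain C with pthreads rather than CCC; chan_put, chan_get, and the fixed capacity CAP are illustrative assumptions. Blocked receivers compete for items, which yields the one-to-many behavior.

    #include <pthread.h>

    #define CAP 16   /* illustrative channel capacity */

    typedef struct {
        int buf[CAP];
        int head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t  not_full, not_empty;
    } spliter_chan;

    void chan_put(spliter_chan *c, int v) {      /* one producer calls this  */
        pthread_mutex_lock(&c->lock);
        while (c->count == CAP)
            pthread_cond_wait(&c->not_full, &c->lock);
        c->buf[c->tail] = v;
        c->tail = (c->tail + 1) % CAP;
        c->count++;
        pthread_cond_signal(&c->not_empty);
        pthread_mutex_unlock(&c->lock);
    }

    int chan_get(spliter_chan *c) {              /* many consumers call this */
        pthread_mutex_lock(&c->lock);
        while (c->count == 0)
            pthread_cond_wait(&c->not_empty, &c->lock);
        int v = c->buf[c->head];
        c->head = (c->head + 1) % CAP;
        c->count--;
        pthread_cond_signal(&c->not_full);
        pthread_mutex_unlock(&c->lock);
        return v;
    }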

20
Channels
  • Communication structures among parallel tasks are
    more comprehensive
  • The specification of communication structures is
    easier
  • The implementation of communication structures is
    more efficient
  • The static analysis of communication structures
    is more effective

21
An Example - Consumer-Producer
[Diagram: a producer task feeding a spliter channel, which distributes items among several consumer tasks]
22
An Example - Consumer-Producer
task main( )
{
  spliter int chan;
  int i;

  par {
    producer( chan );
    parfor (i = 0; i < 10; i++)
      consumer( chan );
  }
}
23
An Example - Consumer-Producer
task producer(spliter in int chan)
{
  int i;

  for (i = 0; i < 100; i++)
    put(chan, i);
  for (i = 0; i < 10; i++)
    put(chan, END);
}
24
An Example - Consumer-Producer
task consumer(spliter in int chan)
{
  int data;

  while ((data = get(chan)) != END)
    process(data);
}
25
Data Parallelism
  • Concurrency
  • domain: an aggregate of synchronous tasks
  • Synchronization and communication
  • domain variables in a global name space

26
An Example Matrix Multiplication
[Figure: matrix multiplication C = A × B]

27
An Example Matrix Multiplication
domain matrix_op[16]
{
  int a[16], b[16], c[16];

  multiply(distribute in int [16]block[16],
           distribute in int [16][16]block,
           distribute out int [16]block[16]);
}
28
An Example Matrix Multiplication
task main( )
{
  int A[16][16], B[16][16], C[16][16];
  domain matrix_op m;

  read_array(A); read_array(B);
  m.multiply(A, B, C);
  print_array(C);
}
29
An Example Matrix Multiplication
matrix_op::multiply(A, B, C)
  distribute in int [16]block[16] A;
  distribute in int [16][16]block B;
  distribute out int [16]block[16] C;
{
  int i, j;

  a = A; b = B;
  for (i = 0; i < 16; i++)
    for (c[i] = 0, j = 0; j < 16; j++)
      c[i] += a[j] * matrix_op[i].b[j];
  C = c;
}
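For reference, the same computation written as ordinary serial C; this sketch only shows what the 16 synchronous tasks of the domain jointly compute, not how CCC distributes it.

    #define N 16

    void multiply(int A[N][N], int B[N][N], int C[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                C[i][j] = 0;
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
            }
    }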
30
Platforms for the CCC Compiler
  • PCs and SMPs
  • Pthread: shared memory, dynamic thread creation
  • PC clusters and SMP clusters
  • Millipede: distributed shared memory, dynamic
    remote thread creation
  • The similarities between these two classes of
    machines enable a retargetable compiler
    implementation for CCC

31
Organization of the CCC Programming System
[Layered diagram: CCC applications
  → CCC compiler
  → CCC runtime library
  → virtual shared memory machine interface
  → Pthread (on SMPs) / Millipede (on SMP clusters)]
32
The CCC Compiler
  • Tasks → threads
  • Monitors → mutex locks, read-write locks, and
    condition variables
  • Channels → mutex locks and condition variables
  • Domains → sets of synchronous threads
  • Synchronous execution → barriers (see the C
    sketch below)
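For the last mapping, a counting barrier can be built from one mutex and one condition variable. This is a standard textbook construction in plain C with pthreads, shown only as a sketch; the CCC runtime's actual representation may differ.

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  all_here;
        int count;      /* threads that have arrived in this round */
        int nthreads;   /* number of synchronous threads in the domain */
        int round;      /* distinguishes successive barrier episodes */
    } ccc_barrier;

    void barrier_wait(ccc_barrier *b) {
        pthread_mutex_lock(&b->lock);
        int my_round = b->round;
        if (++b->count == b->nthreads) {
            b->count = 0;        /* last arrival releases everyone */
            b->round++;
            pthread_cond_broadcast(&b->all_here);
        } else {
            while (b->round == my_round)
                pthread_cond_wait(&b->all_here, &b->lock);
        }
        pthread_mutex_unlock(&b->lock);
    }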

33
Virtual Shared Memory Machine Interface
  • Processor management
  • Thread management
  • Shared memory allocation
  • Mutex locks
  • Read-write locks
  • Condition variables
  • Barriers
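A hypothetical C header for such an interface might group those facilities as follows. All names below are illustrative assumptions, not the actual CCC interface; the point is that each facility becomes a small set of calls that both Pthread and Millipede can implement.

    typedef struct vsm_thread  vsm_thread;    /* opaque handle types */
    typedef struct vsm_mutex   vsm_mutex;
    typedef struct vsm_rwlock  vsm_rwlock;
    typedef struct vsm_cond    vsm_cond;
    typedef struct vsm_barrier vsm_barrier;

    int         vsm_num_processors(void);                          /* processor management */
    vsm_thread *vsm_thread_create(void (*fn)(void *), void *arg);  /* thread management */
    void        vsm_thread_join(vsm_thread *t);
    void       *vsm_shared_alloc(unsigned long nbytes);            /* shared memory allocation */
    void        vsm_mutex_lock(vsm_mutex *m);                      /* mutex locks */
    void        vsm_mutex_unlock(vsm_mutex *m);
    void        vsm_rwlock_rdlock(vsm_rwlock *l);                  /* read-write locks */
    void        vsm_rwlock_wrlock(vsm_rwlock *l);
    void        vsm_rwlock_unlock(vsm_rwlock *l);
    void        vsm_cond_wait(vsm_cond *c, vsm_mutex *m);          /* condition variables */
    void        vsm_cond_signal(vsm_cond *c);
    void        vsm_barrier_wait(vsm_barrier *b);                  /* barriers */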

34
The CCC Runtime Library
  • The CCC runtime library contains a collection of
    functions that implement the salient abstractions
    of CCC on top of the virtual shared memory
    machine interface

35
Performance Evaluation
  • SMPs
  • Hardware: an SMP machine with four CPUs; each CPU
    is an Intel Pentium II Xeon 450 MHz with 512 KB
    cache
  • Software: Solaris 5.7 with the pthread 1.26
    library
  • SMP clusters
  • Hardware: four SMP machines, each with two CPUs;
    each CPU is an Intel Pentium III 500 MHz with
    512 KB cache
  • Software: Windows 2000 with the Millipede 4.0
    library
  • Network: Fast Ethernet, 100 Mbps

36
Benchmarks
  • Matrix multiplication (1024 x 1024)
  • Warshall's transitive closure (1024 x 1024)
  • Airshed simulation (5)

37
Matrix Multiplication (SMPs)
[Table of execution times (unit: sec); remaining entries: 64.44 (4.46, 1.11) and 59.44 (4.83, 1.20)]
38
Matrix Multiplication (SMP clusters)
[Table of execution times (unit: sec)]
39
Warshall's Transitive Closure (SMPs)
[Table of execution times (unit: sec)]
40
Warshall's Transitive Closure (SMP clusters)
[Table of execution times (unit: sec)]
41
Airshed simulation (SMPs)
threads         Seq   5\5\5           1\5\5           5\1\5           5\5\1           1\1\5           1\5\1           5\1\1
CCC (2cpu)      14.2  8.68 (1.6,0.8)  8.84 (1.6,0.8)  10.52 (1.3,0.6) 12.87 (1.1,0.5) 10.75 (1.3,0.6) 13.2 (1.1,0.5)  14.85 (0.9,0.4)
Pthread (2cpu)  14.2  8.63 (1.6,0.8)  8.82 (1.6,0.8)  10.42 (1.3,0.6) 12.84 (1.1,0.5) 10.72 (1.3,0.6) 13.19 (1.1,0.5) 14.82 (0.9,0.4)
CCC (4cpu)      14.2  6.49 (2.1,0.5)  6.84 (2.1,0.5)  9.03 (1.5,0.3)  12.08 (1.1,0.2) 9.41 (1.5,0.3)  12.46 (1.1,0.2) 14.66 (0.9,0.2)
Pthread (4cpu)  14.2  6.37 (2.2,0.5)  6.81 (2.1,0.5)  9.02 (1.5,0.3)  12.07 (1.1,0.2) 9.38 (1.5,0.3)  12.44 (1.1,0.2) 14.62 (0.9,0.2)
(unit: sec; column headings give the thread configuration)
42
Airshed simulation (SMP clusters)
threads              Seq   5\5\5            1\5\5            5\1\5            5\5\1            1\1\5            1\5\1            5\1\1
CCC (1m x 2p)        49.7  26.13 (1.9,0.9)  26.75 (1.8,0.9)  30.37 (1.6,0.8)  44.25 (1.1,0.5)  31.97 (1.5,0.7)  45.25 (1.1,0.5)  48.51 (1.1,0.5)
Millipede (1m x 2p)  49.9  20.02 (2.4,1.2)  20.87 (2.3,1.1)  26.05 (1.9,0.9)  30.41 (1.6,0.8)  26.42 (1.8,0.9)  31.13 (1.5,0.7)  35.89 (1.3,0.6)
CCC (2m x 2p)        49.9  26.41 (1.8,0.4)  27.51 (1.8,0.4)  50.42 (0.9,0.2)  56.68 (0.8,0.2)  54.76 (0.9,0.2)  58.25 (0.8,0.2)  91.17 (0.5,0.1)
Millipede (2m x 2p)  49.9  19.98 (2.4,0.6)  21.84 (2.2,0.5)  31.33 (1.5,0.4)  39.31 (1.2,0.3)  30.85 (1.6,0.4)  42.13 (1.1,0.2)  36.38 (1.3,0.3)
CCC (4m x 2p)        49.9  23.09 (2.1,0.2)  25.59 (1.9,0.2)  48.97 (1.0,0.1)  58.31 (0.8,0.1)  53.33 (0.9,0.1)  61.96 (0.8,0.1)  89.61 (0.5,0.1)
Millipede (4m x 2p)  49.9  16.72 (2.9,0.3)  17.61 (2.8,0.3)  35.11 (1.4,0.2)  41.03 (1.2,0.1)  33.95 (1.4,0.2)  40.88 (1.2,0.1)  36.07 (1.3,0.1)
(unit: sec; column headings give the thread configuration)
43
Conclusions
  • A high-level parallel programming language that
    uniformly integrates
  • Both control and data parallelism
  • Both shared variables and message passing
  • A modular parallel programming language
  • A retargetable compiler