Unified Parallel C (UPC) and the Berkeley UPC Compiler


1
Unified Parallel C (UPC) and the Berkeley UPC Compiler
Wei Chen, the Berkeley UPC Group, 3/11/07
2
Parallel Programming

Most parallel programs are written using either:
  Message passing with an SPMD model
    Usually for scientific applications with C/Fortran
    Scales easily: user-controlled data layout
    Hard to use: send/receive matching, message packing/unpacking
  Shared memory with OpenMP/pthreads/Java
    Usually for non-scientific applications
    Easier to program: direct reads and writes to shared data
    Hard to scale: (mostly) limited to SMPs, no concept of locality
PGAS: an alternative hybrid model

3
Partitioned Global Address Space

PGAS model uses a global address space abstraction
  Shared memory is partitioned by processors
  User-controlled data layout (global pointers and distributed arrays)
One-sided communication
  Uses RDMA support for reads/writes of shared variables
  Much faster than message passing for small/medium-size messages
Hybrid model works for both SMPs and clusters
Languages: Titanium, Co-Array Fortran, UPC

[Figure: global address space. The shared region holds X[0], X[1], ..., X[P]; the private region holds one ptr per thread.]
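
The one-sided model on this slide can be sketched in plain C as a simulation. This is only an illustration, not how any UPC runtime is implemented: the names `global_ptr`, `sim_put`, `sim_get`, `SIM_THREADS` and `SEG_WORDS` are all hypothetical, and an ordinary array stands in for memory that would really live on different nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_THREADS 4   /* hypothetical number of simulated threads */
#define SEG_WORDS   8   /* hypothetical shared-segment size per thread */

/* Each simulated thread owns one partition of the shared space. */
static int shared_seg[SIM_THREADS][SEG_WORDS];

/* A global pointer names a location by (owning thread, offset). */
typedef struct {
    int    thread;  /* partition that holds the data */
    size_t offset;  /* index inside that partition */
} global_ptr;

/* One-sided write: only the initiator runs code; the owner does nothing. */
static void sim_put(global_ptr dst, const int *src, size_t count)
{
    assert(dst.thread >= 0 && dst.thread < SIM_THREADS);
    assert(dst.offset + count <= SEG_WORDS);
    memcpy(&shared_seg[dst.thread][dst.offset], src, count * sizeof *src);
}

/* One-sided read: again, no matching call is needed on the owner's side. */
static void sim_get(int *dst, global_ptr src, size_t count)
{
    assert(src.thread >= 0 && src.thread < SIM_THREADS);
    assert(src.offset + count <= SEG_WORDS);
    memcpy(dst, &shared_seg[src.thread][src.offset], count * sizeof *dst);
}
```

Unlike send/receive, there is no call that the owning thread must issue to complete the transfer, which is the property the slide attributes to RDMA-backed reads and writes.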
4
Unified Parallel C

An SPMD parallel extension of C
PGAS: adds a shared qualifier to the type system
Several kinds of shared array distributions
Fine-grained and bulk communication
Commercial compilers: Cray/HP/IBM
Open-source compiler: Berkeley UPC
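
To make "several kinds of shared array distributions" concrete, here is a small plain-C sketch of the block-cyclic mapping that UPC defines for a declaration such as `shared [B] int a[N]`: element `i` has affinity to thread `(i / B) % THREADS`. The struct `layout_pos` and the function `locate` are hypothetical names used only for illustration; a block size of 1 gives the default cyclic layout.

```c
#include <assert.h>
#include <stddef.h>

/* Where one element of a block-cyclic shared array lives. */
typedef struct {
    size_t thread;  /* thread with affinity to the element */
    size_t phase;   /* position of the element inside its block */
    size_t local;   /* index within that thread's local portion */
} layout_pos;

/* Map global index i to its owner for block size `block` on `threads` threads. */
static layout_pos locate(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    layout_pos pos;
    pos.thread = (i / block) % threads;               /* blocks are dealt round-robin */
    pos.phase  = i % block;                           /* offset inside the block */
    pos.local  = (i / (block * threads)) * block + pos.phase;
    return pos;
}
```

With block size 1 on 4 threads, `locate(5, 1, 4)` places element 5 on thread 1; with block size 2, the same element moves to thread 2.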

Vector Addition in UPC
#define N 100*THREADS
shared int v1[N], v2[N], sum[N]; //cyclic layout
void main() {
  for (int i = 0; i < N; i++)
    if (MYTHREAD == i % THREADS) //SPMD
      sum[i] = v1[i] + v2[i];
}
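
The SPMD work division in the vector addition can be emulated in plain C by passing the thread id explicitly. In this sketch, `add_with_test` mirrors the affinity test from the slide, while `add_strided` is an alternative formulation (not from the slide) that visits only the indices a thread owns. Both function names and the explicit `mythread`/`threads` parameters are assumptions standing in for UPC's `MYTHREAD` and `THREADS`. UPC itself also offers `upc_forall`, whose fourth clause expresses the same affinity test.

```c
#include <assert.h>
#include <stddef.h>

/* Affinity-test form: every thread scans all n indices and keeps its own. */
static size_t add_with_test(size_t mythread, size_t threads, size_t n,
                            const int *v1, const int *v2, int *sum)
{
    assert(threads > 0 && mythread < threads);
    size_t done = 0;
    for (size_t i = 0; i < n; i++) {
        if (mythread == i % threads) {
            sum[i] = v1[i] + v2[i];
            done++;
        }
    }
    return done;  /* number of elements this thread computed */
}

/* Strided form: start at the thread's own index and step by the thread count. */
static size_t add_strided(size_t mythread, size_t threads, size_t n,
                          const int *v1, const int *v2, int *sum)
{
    assert(threads > 0 && mythread < threads);
    size_t done = 0;
    for (size_t i = mythread; i < n; i += threads) {
        sum[i] = v1[i] + v2[i];
        done++;
    }
    return done;
}
```

Calling either function once per thread id from 0 to `threads - 1` fills the whole `sum` array; the strided form does the same work without evaluating the test on every iteration.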
5
Overview of the Berkeley UPC Compiler
Two goals: portability and high performance
Layers, top to bottom:
  UPC code
  Translator: lowers UPC code into ISO C code
  Translator-generated C code
  Berkeley UPC runtime system: shared memory management and pointer operations
  GASNet communication system: uniform get/put interface for underlying networks
  Network hardware
Diagram labels: platform-independent, network-independent, compiler-independent, language-independent
6
UPC to C Translator

Based on Open64
  Extended with shared types
  Reuses the analysis framework
  Adds UPC-specific optimizations
Portable translation
  High-level IR
  Config file for platform-dependent information
  Re-includes library headers
Converts shared memory operations into runtime calls

[Figure: translation pipeline. Preprocessed UPC source -> (parsing) -> WHIRL with shared types -> (optimizer) -> optimized WHIRL -> (lowering) -> WHIRL with runtime calls -> (WHIRL2C) -> ISO C code -> backend C compiler]
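
As a rough picture of what "convert shared memory operations into runtime calls" means, the sketch below shows how a statement like `sum[i] = v1[i] + v2[i]` might look after lowering. It is not the translator's actual output: `sim_rt_get_int`, `sim_rt_put_int` and `sim_shared_int_array` are hypothetical stand-ins, not Berkeley UPC runtime functions, and the "shared" storage is a local array laid out cyclically.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_THREADS 4   /* hypothetical thread count */
#define SIM_N       8   /* hypothetical array length, a multiple of SIM_THREADS */

/* Simulated cyclic shared array: element i lives in cells[i % T][i / T]. */
typedef struct {
    int cells[SIM_THREADS][SIM_N / SIM_THREADS];
} sim_shared_int_array;

/* Hypothetical runtime call standing in for a shared read. */
static int sim_rt_get_int(const sim_shared_int_array *a, size_t i)
{
    assert(i < SIM_N);
    return a->cells[i % SIM_THREADS][i / SIM_THREADS];
}

/* Hypothetical runtime call standing in for a shared write. */
static void sim_rt_put_int(sim_shared_int_array *a, size_t i, int value)
{
    assert(i < SIM_N);
    a->cells[i % SIM_THREADS][i / SIM_THREADS] = value;
}

/* Possible lowered form of:  sum[i] = v1[i] + v2[i];  */
static void lowered_add(sim_shared_int_array *sum,
                        const sim_shared_int_array *v1,
                        const sim_shared_int_array *v2, size_t i)
{
    int t1 = sim_rt_get_int(v1, i);   /* shared read becomes a call */
    int t2 = sim_rt_get_int(v2, i);
    sim_rt_put_int(sum, i, t1 + t2);  /* shared write becomes a call */
}
```

Because every shared access is an ordinary function call in ISO C, the generated code can be handed to any backend C compiler, which matches the portability goal stated on the overview slide.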
7
Optimization framework

Combination of language/compiler/runtime support
Transparent to the user
Performance portable
  Short-term goal: effective on different cluster networks
  Long-term goal: code designed for SMPs gets good performance on clusters

Optimize regular array accesses
  Example: A[i][j][k]
  Loop framework for message vectorization, strip mining
Optimize irregular pointer accesses
  Example: p->x->y
  PRE framework with split-phase access and coalescing
Nonblocking bulk communication
  Example: upc_memget(dst, src, size)
  Runtime framework for communication overlap
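
Message vectorization can be illustrated with a message-counting simulation in plain C. In this sketch the fine-grained loop issues one simulated message per element, while the vectorized version hoists the transfer out of the loop as a single bulk copy, playing the role that `upc_memget` plays in UPC. All names here (`messages_sent`, `sim_get_one`, `sim_get_bulk`, `sum_fine_grained`, `sum_vectorized`) are hypothetical, and `memcpy` only stands in for a network transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static size_t messages_sent;  /* counts simulated network round trips */

/* Fine-grained access: one message per element. */
static int sim_get_one(const int *remote, size_t i)
{
    messages_sent++;
    return remote[i];
}

/* Bulk access: one message for the whole range. */
static void sim_get_bulk(int *dst, const int *remote, size_t count)
{
    messages_sent++;
    memcpy(dst, remote, count * sizeof *dst);
}

/* Original loop: every iteration touches remote data. */
static long sum_fine_grained(const int *remote, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += sim_get_one(remote, i);
    return total;
}

/* Vectorized loop: fetch once into buf (capacity >= n), then compute locally. */
static long sum_vectorized(const int *remote, size_t n, int *buf)
{
    long total = 0;
    sim_get_bulk(buf, remote, n);
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

For `n` elements the first version sends `n` simulated messages and the second sends one. Strip mining would apply the same idea in fixed-size chunks so that the local buffer stays bounded.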
8
Application Performance: LU Decomposition

UPC performance comparable to MPI/HPL (Linpack) with < ½ the code size
Uses lightweight multithreading atop SPMD, making it latency tolerant
Highly adaptable to different problem and machine sizes

9
Application Performance: 3D FFT
[Chart: MFLOPS per processor; higher is better]

One-sided UPC approach sends more, smaller messages
  Same total volume of data, but sent earlier and more often
  Aggressively overlaps the transpose with the 2nd 1-D FFT
Same approach is less effective in MPI due to higher per-message cost
Consistently outperforms MPI-based implementations, by as much as 2X

10
Current Status

Public release v2.4 in November 2006
Fully compliant with the UPC 1.2 specification
Communication optimizations
Extensions for performance and programmability
Support from laptops to supercomputers
  OS: UNIX (Linux, BSD, AIX, Solaris, etc.), Mac, Cygwin
  Arch: x86, Itanium, Opteron, Alpha, PPC, SPARC, Cray X1, NEC SX-6, Blue Gene, etc.
  Network: SMP, Myrinet, Quadrics, InfiniBand, IBM LAPI, MPI, Ethernet, SHMEM, etc.
Give us a try at http://upc.lbl.gov

11
Summary