Title: Berkeley UPC
1. Berkeley UPC
Kathy Yelick, Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Rajesh Nishtala, Mike Welcome
LBNL and U.C. Berkeley
http://upc.lbl.gov
2. Berkeley UPC Compiler Status
- Recent Berkeley UPC release (v2.2)
  - Supports 1.2 language spec
  - Supports collectives (tuning ongoing), memory model compliance
  - Supports UPC I/O (naïve reference implementation)
- Compiler work
  - Optimization phase and improved performance in v2.2
  - Work on automated communication overlap, upc_forall
- Large effort in quality assurance and robustness
  - Test suite: 600 tests run nightly on 20 platform configs
  - >30,000 UPC compilations and >20,000 UPC test runs per night
  - Test suite infrastructure extended to support any UPC compiler
    - Now running nightly with GCC/UPC + UPCR
    - Also has been used on HP-UPC, Cray UPC
3. Berkeley UPC Collaborations
- GCC UPC on Berkeley UPC Runtime
  - Use for cluster (GASNet) implementations
  - Now works with pthread runtime
- Source-level debugging with Totalview 7.x
  - Joint project with Etnus
  - General framework for source-to-source translators
- Future work
  - Cray XT3 and other Rainier/Adams port
  - Possible BlueGene/L port
  - XT3 and BG/L both run on MPI conduit
4. Berkeley Applications and Benchmarks
- Some new applications
  - FT: 0.45 TFlops on 512 proc Itanium/Quadrics (Elan4)
  - CG: 30 GFlops on 512 HP Alpha/Quadrics (Elan3)
  - LU: >2 TFlops on 512 proc Itanium/Quadrics (Elan4)
  - Barnes-Hut: fine-grained (based on Splash)
  - CFG uses to Chombo
- More on LU
  - Towards a sparse direct solver (SuperLU)
  - Currently a full (top500-compliant) HPL implementation
  - All UPC except for call to the BLAS
5. End of Berkeley Status
6. Data Movement and Synchronization
7. Motivation for Data Movement and Synchronization
- Some operations are (at best) hard/slow in UPC
- Benchmarks highlight these
  - FT: communication-limited, all-to-all; want to overlap
  - MG: fill in ghost regions
    - Remote writes are often faster than remote reads
    - But need to synchronize: let the other proc know data is available
    - See Tarek and John Mellor-Crummey's PPoPP'05 paper
    - Signaling store in Split-C
    - Implementation issue: reordering
  - LU: remotely enqueue a task
  - GUPS and Histogram: remotely increment/XOR a value
    - With or without atomicity
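The "write remotely, then let the other proc know the data is available" pattern above (a signaling store) can be sketched in plain C11 rather than UPC. This is only a shared-memory analogy, assuming two pthreads stand in for two UPC threads; `mailbox_t`, `producer`, and `signaling_store_demo` are hypothetical names and not part of Berkeley UPC or Split-C. The release store on the flag illustrates the reordering issue: the payload writes must become visible before the flag does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define PAYLOAD_LEN 4

/* Hypothetical stand-in for a remotely written buffer plus its signal flag. */
typedef struct {
    int payload[PAYLOAD_LEN]; /* data "put" by the producer */
    atomic_int ready;         /* 0 = not yet available, 1 = available */
} mailbox_t;

/* Producer: write the data, then signal with a release store. */
static void *producer(void *arg) {
    mailbox_t *box = arg;
    for (int i = 0; i < PAYLOAD_LEN; i++) {
        box->payload[i] = 10 * (i + 1);
    }
    /* Release ordering: the payload writes above may not be reordered
       after this flag write, so a consumer that sees ready == 1 (with an
       acquire load) also sees the complete payload. */
    atomic_store_explicit(&box->ready, 1, memory_order_release);
    return NULL;
}

/* Consumer: wait for the signal, then read the data.
   Returns the sum of the payload observed after the flag was seen. */
int signaling_store_demo(void) {
    mailbox_t box;
    atomic_init(&box.ready, 0);

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, producer, &box);
    assert(rc == 0);
    (void)rc; /* silence unused-variable warnings when NDEBUG is set */

    /* Acquire load pairs with the producer's release store. */
    while (atomic_load_explicit(&box.ready, memory_order_acquire) == 0) {
        /* spin until the producer signals */
    }

    int sum = 0;
    for (int i = 0; i < PAYLOAD_LEN; i++) {
        sum += box.payload[i];
    }

    pthread_join(tid, NULL);
    return sum;
}
```

Calling `signaling_store_demo()` returns the sum the consumer read after observing the flag (10 + 20 + 30 + 40 = 100). In a real PGAS setting the payload and flag writes travel over the network, so the runtime and interconnect, not just the CPU memory model, would have to preserve their order.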
8. Who Would Like to Talk?
- Non-Blocking Memget/put (Dan)
- Semaphores (Dan)
- Semaphore example (Tarek)
- Remote Atomics (Phil)
- Floating functions (Jason)