Title: CS 267: Applications of Parallel Computers Final Project Suggestions
CS 267: Applications of Parallel Computers
Final Project Suggestions
- James Demmel
- www.cs.berkeley.edu/demmel/cs267_Spr06
Outline
- Kinds of projects
- Evaluating and improving the performance of a parallel application
  - Application could be a full scientific application, or an important kernel
- Parallelizing a sequential application
  - Other kinds of performance improvements are possible too, e.g., memory hierarchy tuning
- Devising a new parallel algorithm for some problem
- Porting a parallel application or systems software to a new architecture
- Examples of previous projects (all on-line)
- Upcoming guest lecturers
  - See their previous lectures, or contact them, for project ideas
- Suggested projects
CS267 Class Projects from 2004
- BLAST Implementation on BEE2 (Chen Chang)
- PFLAMELET: An Unsteady Flamelet Solver for Parallel Computers (Fabrizio Bisetti)
- Parallel Pattern Matcher (Frank Gennari, Shariq Rizvi, and Guille Díez-Cañas)
- Parallel Simulation in Metropolis (Guang Yang)
- A Survey of Performance Optimizations for Titanium Immersed Boundary Simulation (Hormozd Gahvari, Omair Kamil, Benjamin Lee, Meling Ngo, and Armando Solar)
- Parallelization of oopd1 (Jeff Hammel)
- Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code (Amir Kamil, Ben Schwarz, and Jimmy Su)
CS267 Class Projects from 2004 (cont.)
- Communication Savings With Ghost Cell Expansion For Domain Decompositions Of Finite Difference Grids (C. Zambrana Rojas and Mark Hoemmen)
- Parallelization of Phylogenetic Tree Construction (Michael Tung)
- UPC Implementation of the Sparse Triangular Solve and NAS FT (Christian Bell and Rajesh Nishtala)
- Widescale Load Balanced Shared Memory Model for Parallel Computing (Sonesh Surana, Yatish Patel, and Dan Adkins)
Planned Guest Lecturers
- Katherine Yelick (UPC, heart modeling)
- David Anderson (volunteer computing)
- Kimmen Sjolander (phylogenetic analysis of proteins; SATCHMO; Bonnie Kirkpatrick)
- Julian Borrill (astrophysical data analysis)
- Wes Bethel (graphics and data visualization)
- Phil Colella (adaptive mesh refinement)
- David Skinner (tools for scaling up applications)
- Xiaoye Li (sparse linear algebra)
- Osni Marques and Tony Drummond (ACTS Toolkit)
- Andrew Canning (computational neuroscience)
- Michael Wehner (climate modeling)
Suggested projects (1)
- Weekly research group meetings on these and related topics (see J. Demmel and K. Yelick)
- Contribute to the upcoming ScaLAPACK release (JD)
  - Proposal and talk at www.cs.berkeley.edu/demmel; ask me for the latest version
  - Performance evaluation of existing parallel algorithms
    - Ex: new eigensolvers based on successive band reduction
  - Improved implementations of existing parallel algorithms
    - Ex: use UPC to overlap communication and computation (see the sketch at the end of these slides)
  - Many serial algorithms remain to be parallelized
    - See the following slides
Missing Drivers in Sca/LAPACK

                                   LAPACK     ScaLAPACK
Linear Equations
  LU                               xGESV      PxGESV
  Cholesky                         xPOSV      PxPOSV
  LDL^T                            xSYSV      missing
Least Squares (LS)
  QR                               xGELS      PxGELS
  QR w/ pivoting                   xGELSY     missing
  SVD/QR                           xGELSS     missing
  SVD/DC                           xGELSD     missing (intent?)
  SVD/MRRR                         missing    missing
  QR + iterative refinement        missing    missing
Generalized LS
  LS w/ equality constraints       xGGLSE     missing
  Generalized LM                   xGGGLM     missing
  Above + iterative refinement     missing    missing
More missing drivers

                                   LAPACK          ScaLAPACK
Symmetric EVD
  QR / Bisection + Invit           xSYEV / X       PxSYEV / X
  DC                               xSYEVD          PxSYEVD
  MRRR                             xSYEVR          missing
Nonsymmetric EVD
  Schur form                       xGEES / X       missing driver
  Vectors too                      xGEEV / X       missing driver
SVD
  QR                               xGESVD          PxGESVD
  DC                               xGESDD          missing (intent?)
  MRRR                             missing         missing
  Jacobi                           missing         missing
Generalized Symmetric EVD
  QR / Bisection + Invit           xSYGV / X       PxSYGV / X
  DC                               xSYGVD          missing (intent?)
  MRRR                             missing         missing
Generalized Nonsymmetric EVD
  Schur form                       xGGES / X       missing
  Vectors too                      xGGEV / X       missing
Generalized SVD
  Kogbetliantz                     xGGSVD          missing (intent)
  MRRR                             missing         missing
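Both tables use the LAPACK naming convention: the leading x is a precision letter (S, D, C, Z), "/ X" denotes the expert-driver variant, and each driver solves a whole problem class in one call. For readers who have not used these drivers, here is a minimal sketch of calling DGESV (the LU row of the first table) from C through the Fortran interface; the trailing-underscore name mangling and the link line (e.g., -llapack -lblas) are common but platform-dependent assumptions. The ScaLAPACK contribution projects above amount, roughly, to supplying parallel counterparts for the entries marked missing.

```c
/* Minimal sketch: solve A x = b with the LAPACK driver DGESV from C.
 * Assumes a Fortran LAPACK is linked and uses the common (but not
 * universal) trailing-underscore name mangling. */
#include <stdio.h>

extern void dgesv_(const int *n, const int *nrhs, double *a, const int *lda,
                   int *ipiv, double *b, const int *ldb, int *info);

int main(void) {
    /* 2x2 system, stored column-major as Fortran expects. */
    int n = 2, nrhs = 1, lda = 2, ldb = 2, ipiv[2], info;
    double A[4] = {4.0, 1.0,   /* column 1 */
                   2.0, 3.0};  /* column 2 */
    double b[2] = {10.0, 11.0};

    dgesv_(&n, &nrhs, A, &lda, ipiv, b, &ldb, &info);  /* b overwritten with x */
    if (info == 0)
        printf("x = (%g, %g)\n", b[0], b[1]);
    else
        printf("dgesv failed, info = %d\n", info);
    return 0;
}
```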
Suggested projects (2)
- Contribute to sparse linear algebra (JD & KY)
  - Performance tuning to minimize latency and bandwidth costs, both to memory and between processors (sparse => few flops per memory reference or word communicated)
  - Typical methods (e.g., CG, the conjugate gradient method) do some number of dot products and saxpys for each SpMV, so communication cost is O(# iterations)
  - Our goal: make the latency cost O(1)!
  - Requires reorganizing algorithms drastically, including replacing SpMV by the new kernel [Ax, A^2x, A^3x, ..., A^kx], which can be computed with O(1) messages (see the sketch after this list)
- Projects:
  - Study scalability bottlenecks of current CG on real, large matrices
  - Optimize [Ax, A^2x, ..., A^kx] on sequential machines
  - Optimize [Ax, A^2x, ..., A^kx] on parallel machines
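Here is a minimal sequential sketch of the kernel just named: computing [Ax, A^2x, ..., A^kx] by k passes of compressed sparse row (CSR) SpMV. The CSR layout and all names are illustrative assumptions, not code from the project. Each pass streams all of A through memory, and a straightforward parallel version would cost one round of messages per power; the point of the suggested projects is to block this loop so that A is reused and, for well-partitioned matrices, all k vectors come out of a single ghost-zone exchange of depth k.

```c
/* Naive matrix powers kernel: V + j*n holds A^(j+1) * x for j = 0..k-1,
 * computed as k passes of CSR SpMV. The suggested optimizations replace
 * this k-pass structure with a blocked version that reuses A. */
#include <stddef.h>

typedef struct {
    int n;              /* dimension */
    const int *rowptr;  /* row start offsets, length n+1 */
    const int *colind;  /* column indices, length nnz */
    const double *val;  /* nonzero values, length nnz */
} csr_t;

static void spmv(const csr_t *A, const double *x, double *y) {
    for (int i = 0; i < A->n; i++) {
        double s = 0.0;
        for (int p = A->rowptr[i]; p < A->rowptr[i + 1]; p++)
            s += A->val[p] * x[A->colind[p]];
        y[i] = s;
    }
}

void matrix_powers(const csr_t *A, const double *x, double *V, int k) {
    const double *src = x;
    for (int j = 0; j < k; j++) {
        spmv(A, src, V + (size_t)j * A->n);  /* one full pass over A */
        src = V + (size_t)j * A->n;          /* next power builds on this one */
    }
}
```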
Suggested projects (3)
- Evaluate new languages on applications (KY)
  - UPC or Titanium
  - UPC for asynchrony, overlapping communication with computation (see the sketch below)
  - ScaLAPACK in UPC
  - Use the UPC-based 3D FFT in your application
  - Optimize the existing 1D FFT in UPC to use 3D techniques
- Porting and evaluating parallel systems software (KY)
  - Port UPC to RAMP
  - Port GASNet to Blue Gene, evaluate performance
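The overlap idea appears both here and in Suggested projects (1). In UPC it would be expressed with an implementation's non-blocking memory-copy extensions; as a language-neutral sketch under that assumption, here is the same pattern in C with MPI non-blocking point-to-point calls, for a hypothetical 1D halo (ghost-cell) exchange. The buffer names and the compute_* routines are illustrative stubs, not part of any library.

```c
/* Sketch of overlapping communication with computation: start a halo
 * exchange, update the interior points (which need no remote data)
 * while messages are in flight, then finish the boundary points. */
#include <mpi.h>

/* Illustrative stubs: local-only work and halo-dependent work. */
void compute_interior(double *u, int n_local);
void compute_boundary(double *u, int n_local,
                      const double *recv_l, const double *recv_r, int halo);

void halo_step(double *u, int n_local, int left, int right,
               double *send_l, double *send_r,
               double *recv_l, double *recv_r, int halo) {
    MPI_Request reqs[4];

    /* 1. Post non-blocking receives and sends for the ghost regions. */
    MPI_Irecv(recv_l, halo, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recv_r, halo, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(send_l, halo, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(send_r, halo, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

    /* 2. Useful local work proceeds while the network is busy. */
    compute_interior(u, n_local);

    /* 3. Wait for the halos, then update the cells that depend on them. */
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    compute_boundary(u, n_local, recv_l, recv_r, halo);
}
```

The design point is that step 2 hides the message latency of step 1; the more interior work there is per boundary cell, the closer the exchange comes to being free.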