1 HPCS Languages
- Apr 23, 2007
- Presented by
- Rashmi B.V.
- Shilpa Sudhakaran
2 Outline
- DARPA HPCS Project
- HPCS Languages: the need for them
- DARPA HPCS Project phases and timeline
- HPCS languages as a group
- Challenges faced by HPCS languages
- IBM X10
- Sun Fortress
- Cray Chapel
- Conclusion
3 DARPA HPCS Project
- What is it?
- A 10-year, 3-phase hardware/software effort to transform the productivity aspect of the HPC enterprise.
- A program to fill the gap between today's HPC technology (based on late-1980s designs) and the future promise of quantum computing.
- Goal: provide a new generation of economically viable, scalable, high-productivity computing systems for the national security and industrial user communities, with the following design attributes:
- Performance
- Programmability
- Portability
- Robustness
4 DARPA HPCS Project (contd.)
- HPCS Project computing elements
- HPCS machines will be a new generation of petaflops-scale supercomputers.
- Under this program, Cray is creating a hybrid architecture with Cascade (a petaflops-scale machine) that will bring Opteron, MTA, vector and FPGA computing elements all into a shared, global memory system that can span up to 10 petaflops of performance.
- There will undoubtedly be a follow-up program to HPCS that will push well beyond petaflops computing.
- The software component of the project is the HPCS languages. The current approach is traditional high-level languages (Fortran, C and C++) with calls to MPI. PGAS (Partitioned Global Address Space) languages such as UPC (Unified Parallel C), CAF (Co-Array Fortran) and Titanium (Java-based) are also gaining some ground: a static number of processes, and a global view of data but with an explicit local/remote distinction for performance.
5 HPCS Languages: the Need for Them
- The HPCS Project's petascale systems require a language model that can scale more easily than the current MPI and OpenMP models and exploit the HPCS hardware platforms.
- Both C and Fortran lack the high-level abstractions (OO constructs, generic templates, type checking, etc.) essential for modern software engineering.
- Similar to the PGAS languages, the HPCS languages will provide a global view of data (with locality hints) that facilitates ease of use: programs can directly read and write remote variables, avoiding message passing.
- A modern general-purpose HPC language is needed that improves productivity with advanced features supporting parallel semantics and dynamic task parallelism (MPI is used in static parallelism mode).
- It is recognized that, in order to be accepted, any HPCS language will have to be effective on other parallel architectures (besides the HPCS machines). In addition, the HPCS machines will have to run other PGAS and MPI programs too (besides HPCS languages).
6 DARPA HPCS Project Phases and Timeline
- 2002: Phase I
- Contracts awarded to IBM, Cray, Sun, SGI and HP. System concept study and technical assessment.
- 2003: Phase II
- IBM, Cray and Sun involved. R&D determined hardware and software technology components, system architecture and programming models. The 3 HPCS language prototypes were developed during this phase: X10 by IBM, Chapel by Cray, Fortress by Sun.
- 2006: Phase III
- Sun Microsystems dropped. IBM and Cray move forward to carry out full-scale development of the Phase II models. Application productivity analysis and performance assessment.
- 2010: Final version of the supercomputer expected to become available for use and serve as a model for new-generation platforms.
7 DARPA HPCS Project Phases and Timeline (contd.)
8 HPCS Languages as a Group
- Base language
- X10: Java (OO language)
- Chapel and Fortress: their own OO languages
- Creation of parallelism
- Not SPMD: initially a single thread of control; parallelism is created through language constructs.
- All have dynamic parallelism for loops as well as for tasks. Mixed task and data parallelism.
- Threads are grouped by memory locality: explicitly two-level (Chapel, X10) or hierarchical (Fortress).
- The programmer specifies as much parallelism as possible; the compiler/runtime controls how much is actually executed in parallel.
- Fortress: parallelism is the default (loops are implicitly parallel but can be forced serial).
- X10: parallel across places (units of locality). Method invocations on other places are implicitly parallel.
- Sharing and communication
- Chapel and Fortress are GAS (Global Address Space) languages.
- X10 is a parallel OO language.
9 HPCS Languages as a Group (contd.)
- Synchronization
- All 3 support atomic blocks. None have locks (harder to manage than atomic sections).
- X10 has clocks (barriers with dynamically attached tasks), conditional atomic sections, and sync variables.
- Chapel has single (single-writer) and sync (multiple readers and writers) variables.
- Fortress has abortable atomic sections and a mechanism for waiting on individual spawned threads.
- Locality
- All 3 have a way to associate computation with data, for performance. It looks a little different in each language (places vs. locales).
- Explicitly distributed data structures enable automation of this (esp. for arrays).
- Delegation of the problem to libraries (esp. in Fortress).
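Since all three languages replace locks with atomic blocks, the idea is worth seeing in miniature. Below is a Python sketch (not any HPCS language's API): a lock stands in for what an HPCS compiler/runtime would provide for an `atomic` section, and the class name is ours.

```python
import threading

class AtomicCounter:
    """Sketch of an 'atomic section': the increment reads, computes and
    writes back as one indivisible step, as an HPCS atomic block would."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()  # stands in for the language runtime

    def increment(self):
        with self._lock:               # body behaves like `atomic { ... }`
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value

counter = AtomicCounter()
threads = [threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.value)  # 4000: no increments lost
```

Without the lock, concurrent increments could interleave and lose updates; the atomic-block model makes that interleaving impossible by construction.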
10 Challenges Faced by HPCS Languages
- MPI already works with Fortran, C and C++, and users are familiar with these languages; existing code can be parallelized with minimal refactoring.
- It is always difficult for new languages to gain acceptance. Users are conservative, and the barrier of legacy code is formidable.
- The languages also need support from software tool developers: not just compilers, but debuggers, libraries and integrated development environments.
11 X10
- Primary authors: Kemal Ebcioglu, Vijay Saraswat and Vivek Sarkar.
- An extended subset of Java with a strong resemblance to it in most aspects, but featuring additional support for arrays and concurrency.
- Uses a PGAS (Partitioned Global Address Space) model that supports both object-oriented and non-object-oriented paradigms.
- Schedule
- 7/2004: first draft of the X10 language specification
- 2/2005: first X10 implementation (unoptimized single-VM prototype)
- 1/2006: second X10 implementation (optimized multi-VM prototype)
- 6/2006: open-source release of the X10 reference implementation. Design completed for a production X10 implementation in Phase 3.
12 X10 (contd.)
- Foreign functions
- Java methods can be called directly from X10 programs. Java classes are loaded automatically as part of X10 program execution.
- C functions need an extern declaration in X10 and a System.loadLibrary() call.
- The X10 guarantee
- Any program written with the async, finish, atomic, foreach, ateach and clock parallel constructs will never deadlock.
- X10 uses parent-child relationships for tasks to prevent the deadlocks that occur when 2 or more processors wait for each other to finish before they can complete. A task may spawn 1 or more children, which in turn can spawn more children. Children cannot wait for the parent to finish, but a parent can wait for a child using the finish construct.
- To avoid data races:
- Use atomic methods and blocks without worrying about deadlock.
- Declare data to be read-only whenever possible.
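The parent-child discipline above can be mimicked with ordinary futures. This Python sketch is illustrative only (nothing here is X10 syntax): each `with` block behaves like `finish`, each `submit` like `async`, and a parent always waits for its children, never the reverse.

```python
from concurrent.futures import ThreadPoolExecutor

def child(n):
    # A child task may spawn its own children, but never waits on its parent.
    if n <= 1:
        return 1
    with ThreadPoolExecutor() as pool:          # acts like X10's `finish`
        futures = [pool.submit(child, n - 1),   # like `async child(n - 1)`
                   pool.submit(child, n - 2)]   # like `async child(n - 2)`
        # The parent blocks here until all spawned children complete,
        # which is exactly the point where `finish` would join them.
        return sum(f.result() for f in futures)

print(child(6))  # Fibonacci-style fan-out: prints 13
```

Because waits only ever flow from parent to child, the wait-for graph is a tree and can never contain the cycle that a deadlock requires.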
13 HPCS Languages (contd.)
14 Fortress
- A general-purpose programming language for HPC from Sun
- Fortress = "secure Fortran"
- Open source
- Design strategy: push decisions to libraries
15 Problems to Consider
- Mathematical Notation
- Dimension Checking
- Communication Efficiency
- Memory Consistency
- Synchronization
16 Fortress Programming Environment
- Support for Unicode characters
- Ex: to enter the Greek letter Λ, we type GREEK_CAPITAL_LETTER_LAMBDA
- Juxtaposition (adjacency means multiplication or function application)
- factorial(n) =
-   if n = 0 then 1
-   else n factorial(n - 1) end
- Function overloading
- Function contracts
- requires and ensures clauses
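Fortress checks requires/ensures clauses at function entry and exit; the same discipline can be sketched in Python with a decorator (the `contract` helper below is made up for illustration, not a Fortress or standard-library API).

```python
def contract(requires=lambda *a: True, ensures=lambda result, *a: True):
    """Illustrative stand-in for Fortress `requires`/`ensures` clauses."""
    def wrap(fn):
        def checked(*args):
            assert requires(*args), "precondition violated"
            result = fn(*args)
            assert ensures(result, *args), "postcondition violated"
            return result
        return checked
    return wrap

@contract(requires=lambda n: n >= 0,          # like: requires n >= 0
          ensures=lambda result, n: result >= 1)  # like: ensures result >= 1
def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

print(factorial(5))  # 120
```

Calling `factorial(-1)` would trip the precondition immediately, which is the whole point of a contract: failures surface at the call that violates the specification, not deep inside the computation.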
17
- Numeric types can be annotated with physical units and dimensions
- kineticEnergy(m: R kg, v: R m/s): R kg m^2/s^2 = (m v^2)/2
- Support for summations and products
- factorial(n) = PROD[i <- 1:n] i
- Aggregate expressions
- Ex: single-dimension array: a = [0 1 2 3 4]
- Two-dimension array: b = [3 4
-                           5 6]
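In Fortress the dimension annotation on kineticEnergy is checked by the type system. A rough Python sketch of the same idea, with a made-up Quantity class that tracks (kg, m, s) exponents at runtime:

```python
class Quantity:
    """Minimal dimension-checking sketch: dims = (kg, m, s) exponents."""
    def __init__(self, value, dims):
        self.value, self.dims = value, tuple(dims)
    def __mul__(self, other):
        if isinstance(other, Quantity):
            # Multiplying quantities adds their dimension exponents.
            return Quantity(self.value * other.value,
                            [a + b for a, b in zip(self.dims, other.dims)])
        return Quantity(self.value * other, self.dims)
    __rmul__ = __mul__
    def __truediv__(self, scalar):
        return Quantity(self.value / scalar, self.dims)

def kinetic_energy(m, v):
    # Mirrors kineticEnergy(m: R kg, v: R m/s): R kg m^2/s^2 = (m v^2)/2,
    # except the check happens at runtime rather than at compile time.
    assert m.dims == (1, 0, 0) and v.dims == (0, 1, -1), "wrong units"
    return (m * v * v) / 2

e = kinetic_energy(Quantity(4.0, (1, 0, 0)), Quantity(3.0, (0, 1, -1)))
print(e.value, e.dims)  # 18.0 (1, 2, -2), i.e. kg m^2/s^2
```

Fortress performs this bookkeeping statically, so a dimensionally wrong program never compiles; the sketch only shows what is being tracked.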
18
- Supports data types like string, boolean, integer, float, etc.
- Objects: consist of fields and methods
- Traits: named program constructs that declare sets of methods.
- The declared methods may be concrete or abstract.
- Traits may be inherited by another trait or by an object.
19 Example: NAS Conjugate Gradient (ASCII)
- conjGrad(A: Matrix[\Float\], x: Vector[\Float\]): (Vector[\Float\], Float) = do
-   cgit_max = 25
-   z: Vector[\Float\] := 0
-   r: Vector[\Float\] := x
-   p: Vector[\Float\] := r
-   rho: Float := r^T r
-   for j <- seq(1:cgit_max) do
-     q = A p
-     alpha = rho / p^T q
-     z := z + alpha p
-     r := r - alpha q
-     rho0 = rho
-     rho := r^T r
-     beta = rho / rho0
-     p := r + beta p
-   end
-   (z, ||x - A z||)
- end
- Matrix[\T\] and Vector[\T\] are parameterized interfaces, where T is the type of the elements. The form x: T := e declares a variable x of type T with initial value e, and that variable may be updated using the assignment operator :=.
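For readers who want to run the algorithm, here is the same CG iteration in plain Python (pure lists, standard library only). The matvec/dot helpers are local stand-ins for the Fortress Matrix/Vector operations, and an early-exit guard is added for the case where the iteration has fully converged.

```python
def conj_grad(A, x, cgit_max=25):
    """CG iteration mirroring the Fortress example: approximately solves
    A z = x and returns (z, residual norm ||x - A z||)."""
    n = len(x)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    z = [0.0] * n
    r = list(x)
    p = list(r)
    rho = dot(r, r)
    for _ in range(cgit_max):
        q = matvec(A, p)
        pq = dot(p, q)
        if pq == 0.0:          # fully converged: avoid a 0/0 step
            break
        alpha = rho / pq
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho0, rho = rho, dot(r, r)
        beta = rho / rho0
        p = [ri + beta * pi for ri, pi in zip(r, p)]
    Az = matvec(A, z)
    resid = sum((xi - ai) ** 2 for xi, ai in zip(x, Az)) ** 0.5
    return z, resid

A = [[4.0, 1.0], [1.0, 3.0]]   # small symmetric positive-definite test matrix
x = [1.0, 2.0]
z, resid = conj_grad(A, x)
print(resid)  # close to 0: z solves A z = x
```

For this 2x2 system CG converges in two steps to z = (1/11, 7/11); the Fortress version expresses the identical loop, but with `A p` and `r^T r` as genuine matrix/vector juxtaposition.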
20 Example: NAS Conjugate Gradient (ASCII)
- conjGrad[\E extends Number, nat N,
-          Mat extends Matrix[\E, N BY N\],
-          Vec extends Vector[\E, N\]\]
-         (A: Mat, x: Vec): (Vec, E) = do
-   cgit_max = 25
-   z: Vec := 0
-   r: Vec := x
-   p: Vec := r
-   rho: E := r^T r
-   for j <- seq(1:cgit_max) do
-     q = A p
-     alpha = rho / p^T q
-     z := z + alpha p
-     r := r - alpha q
-     rho0 = rho
-     rho := r^T r
-     beta = rho / rho0
-     p := r + beta p
-   end
- end
- Here we make conjGrad a generic procedure. The runtime compiler may produce multiple instantiations of the code for various types E. The form x = e as a statement declares variable x to have an unchanging value. The type of x is exactly the type of the expression e.
21 Example: NAS Conjugate Gradient (Unicode)
- conjGrad[\E extends Number, nat N,
-          Mat extends Matrix[\E, N×N\],
-          Vec extends Vector[\E, N\]\]
-         (A: Mat, x: Vec): (Vec, E) = do
-   cgit_max = 25
-   z: Vec := 0
-   r: Vec := x
-   p: Vec := r
-   ρ: E := rᵀ r
-   for j ← seq(1:cgit_max) do
-     q = A p
-     α = ρ / pᵀ q
-     z := z + α p
-     r := r - α q
-     ρ₀ = ρ
-     ρ := rᵀ r
-     β = ρ / ρ₀
-     p := r + β p
-   end
- end
- This would be considered entirely equivalent to the previous version. You might think of this as an abbreviated form of the ASCII version, or you might think of the ASCII version as a way to conveniently enter this version on a standard keyboard.
22 Parallelism in Fortress
- Two types of threads: implicit and spawned
- A number of constructs are implicitly parallel
- Ex: tuple expressions, do blocks, function calls, for loops
- The programmer cannot operate directly on implicit threads
- Scheduling of the threads is managed by the compiler, runtime and libraries
- Spawned threads are of type Thread[\T\]
- They have methods like val, wait, ready, stop
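A spawned Fortress thread behaves much like a future. In this Python sketch (an analogy, not Fortress API), `result()` plays the role of val (block until the value is ready) and `done()` approximates ready:

```python
from concurrent.futures import ThreadPoolExecutor

# A Fortress `spawn` expression yields a Thread[\T\] value; a Python
# future is the closest standard-library analogue.
with ThreadPoolExecutor(max_workers=1) as pool:
    t = pool.submit(lambda: 6 * 7)   # like: t = spawn 6 * 7
    print(t.done())                  # like `ready`: may be True or False here
    print(t.result())                # like `val`: waits, then prints 42
```

The key property carried over from Fortress is that asking for the value is what synchronizes: until `result()`/val is called, the spawning code and the spawned computation run independently.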
23 Parallelism in Fortress (contd.)
- Regions describe machine resources.
- Distributions map aggregates onto regions.
- Aggregates used as generators drive parallelism.
- Algebraic properties drive implementation strategies.
- Algebraic properties are described by traits.
- Traits allow sharing of code, properties, and test data.
- Properties are verified by automated unit testing.
- Reducers and generators achieve mix-and-match code selection.
24 Memory Model
- Programs are multithreaded by design
- The model was written with these principles in mind:
- Violations must still respect the underlying data abstractions
- It should be understood by programmers and implementors
- It should permit aggressive optimizations
- For sequential consistency, use atomic operations for updates to shared locations
- Two orderings: dynamic program order and memory order
25 Memory Order
- There is a single memory order, which is respected by all threads.
- Every read obtains the value of the immediately preceding write to the identical location in memory order.
- Memory order on atomic expressions respects dynamic program order.
- Memory order respects dynamic program order for operations that certainly access the same location.
- Initializing writes are ordered before any other memory access to the same location.
26 Fortress: Summary
- Easy to code parallel applications
- Focus on reducing application and compiler complexity
- Blackboard notation for mathematical code
27 Chapel
- A parallel programming language from Cray Inc.
- Provides a higher level of expression than current programming models
- Multithreaded, with abstractions for data, task and nested parallelism
- Code reuse and generality
- Draws on HPF and on the MTA extensions to Fortran and C.
28 Global-View Programming Model
- Raises the level of abstraction for data and control
- Global view of data structures: sizes and indices are expressed globally
- Global view of control: language concepts for parallelism
- A broad range of parallel architectures can be supported: SMPs, clusters, multicore, distributed memory, etc.
29 Other Motivating Principles
- Lets the user specify where to place data and computation on the physical machine
- Support for object-oriented programming
- Does not necessitate use of OOP concepts
- The Chapel library is implemented using objects
- Support for generic programming and polymorphism
- The compiler creates different versions of the code for each type
30 Program Example: Jacobi Method
- config var n = 5,                // size of n x n grid
-   epsilon = 0.00001,             // convergence tolerance
-   verbose = false;               // controls amount of output
- def main() {
-   const ProblemSpace = [1..n, 1..n],   // domain for interior points
-         BigDomain = [0..n+1, 0..n+1];  // domain with boundary points
-   var X, XNew: [BigDomain] real = 0.0; // X holds the approximate solution;
-                                        // XNew is a work array
-   X[n+1, 1..n] = 1.0;
-   if (verbose) {
-     writeln("Initial configuration:");
-     writeln(X, "\n");
-   }
-   var iteration = 0;   // iteration counter
-   var delta: real;     // convergence measure
31
-   do {
-     forall (i,j) in ProblemSpace do
-       XNew(i,j) = (X(i-1,j) + X(i+1,j) + X(i,j-1) + X(i,j+1)) / 4.0;
-     delta = max reduce abs(XNew[ProblemSpace] - X[ProblemSpace]);
-     X[ProblemSpace] = XNew[ProblemSpace];
-     iteration += 1;
-     if (verbose) {
-       writeln("iteration: ", iteration);
-       writeln(X);
-       writeln("delta: ", delta, "\n");
-     }
-   } while (delta > epsilon);
-   writeln("Jacobi computation complete.");
-   writeln("Delta is ", delta, " (< epsilon = ", epsilon, ")");
-   writeln("# of iterations: ", iteration);
- }
32 Producer-Consumer Example
- use Time;
- config var numIterations: int = 5,
-   sleepTime: uint = 2;
- var s: sync int;
- begin                              // create consumer computation
-   for c in 1..numIterations do
-     writeln("consumer got: ", s);
- // producer computation
- for p in 1..numIterations {
-   sleep(sleepTime);
-   s = p;
- }
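A Chapel sync variable is a one-slot buffer with full/empty semantics: writing fills it (blocking if full), reading empties it (blocking if empty). Python's queue.Queue(maxsize=1) is a reasonable stand-in for a sketch (the sleep is dropped so the example runs instantly):

```python
import queue
import threading

num_iterations = 5
s = queue.Queue(maxsize=1)   # one-slot buffer ~ Chapel's `sync int`
got = []

def consumer():
    for _ in range(num_iterations):
        value = s.get()          # like reading a sync var: blocks while empty
        got.append(value)
        print("consumer got:", value)

t = threading.Thread(target=consumer)    # like Chapel's `begin`
t.start()
for p in range(1, num_iterations + 1):   # producer computation
    s.put(p)                             # like writing: blocks while full
t.join()
print(got)  # [1, 2, 3, 4, 5]
```

The full/empty protocol means the producer can never overwrite an unread value and the consumer can never read a value twice, with no explicit locks in user code.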
33 Stack Implementation Example
- class MyNode {
-   type itemType;
-   var item: itemType;
-   var next: MyNode(itemType);
- }
- record Stack {
-   type itemType;
-   var top: MyNode(itemType);
-   def push(item: itemType) {
-     top = MyNode(itemType, item, top);
-   }
-   def pop() {
-     if isEmpty? then
-       halt("attempt to pop an item off an empty stack");
-     var oldTop = top;
-     top = top.next;
-     return oldTop.item;
-   }
-   def isEmpty? return top == nil;
- }
- var stack1: Stack(string);
- stack1.push("one");
- stack1.push("two");
- stack1.push("three");
- writeln(stack1.pop());
- writeln(stack1.pop());
- writeln(stack1.pop());
34 Parallelization and Synchronization
- forall: a variation of the for loop for concurrent execution
- The compiler and runtime determine the concurrency
- The keyword ordered can be used to give a partial order
- cobegin: creates parallelism among the statements within a block
- begin: spawns a computation to execute a statement
- Synchronization variables: coordinate computations that access the same data
- single and sync variables
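The distinction between cobegin (structured: the block waits for all of its statements) and begin (unstructured: spawn and move on, synchronizing later) can be mimicked with plain threads; this sketch is an analogy, not Chapel semantics in full.

```python
import threading

results = {}

def task(name, value):
    results[name] = value * value

# cobegin { task("a", 3); task("b", 4) }:
# start both statements, then wait for both before continuing.
threads = [threading.Thread(target=task, args=("a", 3)),
           threading.Thread(target=task, args=("b", 4))]
for t in threads: t.start()
for t in threads: t.join()          # the implicit join that ends a cobegin

# begin task("c", 5): spawn and continue immediately;
# the parent synchronizes only when it chooses to.
t = threading.Thread(target=task, args=("c", 5))
t.start()
# ...unrelated work could run here before any synchronization...
t.join()

print(sorted(results.items()))  # [('a', 9), ('b', 16), ('c', 25)]
```

cobegin gives lexically scoped parallelism (easy to reason about), while begin trades that structure for flexibility, which is why Chapel pairs it with sync/single variables for coordination.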
35 Locality and Distribution
- Locale: a unit in the target architecture that supports computation and data storage
- A mapping from domain index values to locales is called a distribution.
- The block distribution: Block
- The cyclic distribution: Cyclic
- The block-cyclic distribution: BlockCyclic
- The cut distribution: Cut
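The index-to-locale mappings these distributions compute can be illustrated in a few lines of Python (the function names are ours, not Chapel's; Cut is omitted since it depends on user-supplied cut points):

```python
def block(i, n, num_locales):
    """Block distribution: contiguous chunks of ~n/L indices per locale."""
    chunk = -(-n // num_locales)          # ceiling division
    return i // chunk

def cyclic(i, num_locales):
    """Cyclic distribution: indices dealt round-robin across locales."""
    return i % num_locales

def block_cyclic(i, block_size, num_locales):
    """Block-cyclic: fixed-size blocks dealt round-robin across locales."""
    return (i // block_size) % num_locales

n, L = 8, 2   # 8 indices over 2 locales
print([block(i, n, L) for i in range(n)])        # [0, 0, 0, 0, 1, 1, 1, 1]
print([cyclic(i, L) for i in range(n)])          # [0, 1, 0, 1, 0, 1, 0, 1]
print([block_cyclic(i, 2, L) for i in range(n)]) # [0, 0, 1, 1, 0, 0, 1, 1]
```

Block favors locality of neighboring indices, cyclic favors load balance, and block-cyclic is the compromise between the two, which is why all three are standard choices.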
36 Chapel: Summary
- Enhanced programmer productivity
- Multithreaded parallel programming
- Locality-aware programming
- Generic programming and type inference
37 Future of HPCS Languages
- DARPA project timeline
- 2002: DARPA awards the first four Phase I contracts to build a next-generation supercomputer to IBM, Cray, Sun and SGI. Each contract was worth $3 million.
- 2003: DARPA awards $146 million to IBM, Cray and Sun for Phase II of the supercomputer project. SGI is dropped from the program.
- 2006: DARPA awards approximately $500 million to Cray and IBM for Phase III of the project. Sun is dropped from the program.
- 2010: The final version of the supercomputer is expected to go online and serve as a model for a new generation of high-performance computing platforms.
- Phase III: each vendor provides a working prototype of its implementation
- Future focus: performance credibility
- Possibly merging the three languages into one