1
Apr 23, 2007
HPCS Languages
  • Presented by
  • Rashmi B.V.
  • Shilpa Sudhakaran

2
Outline
  • DARPA HPCS Project
  • HPCS Languages: the Need for Them
  • DARPA HPCS Project: Phases & Timeline
  • HPCS Languages as a Group
  • Challenges faced by HPCS Languages
  • IBM X10
  • Sun Fortress
  • Cray Chapel
  • Conclusion

3
DARPA HPCS Project
  • What is it?
  • A 10-year, 3-phase hardware/software effort
    to transform the productivity aspect of the HPC
    enterprise.
  • A program to fill the gap between today's
    HPC technology (late-80s based) and the future
    promise of quantum computing. The goal is to provide
    a new generation of economically viable,
    scalable, high-productivity computing systems for
    the national security and industrial user
    communities, with the following design
    attributes:
  • Performance, Programmability,
    Portability, Robustness

4
DARPA HPCS Project (Contd.)
  • HPCS Project computing elements:
  • HPCS machines will be a new generation of
    petaflops-scale supercomputers.
  • Under this program, Cray is creating a
    hybrid architecture with Cascade (a
    petaflops-scale machine) that will bring Opteron,
    MTA, vector and FPGA computing elements all into
    a shared, global memory system that can span up to
    10 petaflops of performance.
  • There will undoubtedly be a follow-up program to
    HPCS that will push well beyond petaflops
    computing.
  • The software component of the project is the
    HPCS languages. The current approach is
    traditional high-level languages (Fortran, C and
    C++) with calls to MPI. PGAS (Partitioned Global
    Address Space) languages such as UPC (Unified
    Parallel C), CAF (Co-Array Fortran) and
    Titanium (Java-based) are also gaining some
    ground: a static number of processes, and a global
    view of data but with an explicit local/remote
    distinction for performance.

5
HPCS Languages: the Need for Them
  • The HPCS Project petascale systems require a
    language model that can scale more easily than
    the current MPI and OpenMP models and exploit the
    HPCS h/w platforms.
  • Both C and Fortran lack the high-level abstractions
    (OO constructs, generic templates, type checking,
    etc.) essential for modern s/w engineering.
  • Similar to PGAS languages, will provide a global
    view of data (with locality hints) that
    facilitates ease of use. Can directly read/write
    remote variables. Will avoid message-passing.
  • Need a modern general-purpose HPC language that
    will improve productivity with advanced features
    that support parallel semantics and support
    dynamic task parallelism (MPI used in static
    parallelism mode).
  • It is recognized that, in order to be accepted,
    any HPCS language will have to be effective on
    other parallel architectures (besides the HPCS
    machines). In addition, the HPCS machines will
    have to run other PGAS and MPI programs too
    (besides HPCS languages).
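To make the contrast with MPI's static process model concrete, here is a minimal Python sketch (illustrative only, not an HPCS language) of dynamic task parallelism: the number of tasks is decided while the program runs, rather than being fixed at startup.

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    # a stand-in computation for one dynamically created task
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = []
    n = 1
    while n <= 5:                             # task count decided at run time,
        futures.append(pool.submit(work, n))  # not fixed at program start
        n += 1
    results = [f.result() for f in futures]

print(results)  # [1, 4, 9, 16, 25]
```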

6
DARPA HPCS Project Phases Timeline
  • 2002 Phase I
  • Contracts awarded to IBM, Cray,
    Sun, SGI HP. System concept study and technical
    assessment.
  • 2003 Phase II
  • IBM, Cray Sun involved. RD -
    determined H/W S/W technology components,
    system architecture programming models. The 3
    HPCS language prototypes developed during this
    phase X10 by IBM, Chapel by Cray, Fortress by
    Sun.
  • 2006 Phase III
  • Sun Microsystems dropped. IBM Cray
    move forward to carry out full scale development
    of Phase II models. Application productivity
    analysis performance assessment.
  • 2010 Final version of supercomputer expected to
    become available for use serve as a model for
    new generation platforms.

7
DARPA HPCS Project: Phases & Timeline (contd.)
8
HPCS Languages as a Group
  • Base language
  • X10: Java (an OO language)
  • Chapel & Fortress: their own OO languages
  • Creation of Parallelism
  • - Not SPMD. Initially a single thread of
    control; parallelism through language constructs.
  • - All have dynamic parallelism for loops as
    well as for tasks. Mixed task and data
  • parallelism.
  • - Threads are grouped by memory locality:
    explicitly 2-level (Chapel, X10) or
  • hierarchical (Fortress).
  • - The programmer specifies as much parallelism as
    possible;
  • the compiler/runtime controls how much is
    actually executed in parallel.
  • - Fortress: parallelism is the default (loops
    are implicitly parallel but can be forced serial).
  • - X10: parallel across places (units of
    locality). Method invocations on other places are
  • implicitly parallel.
  • Sharing & Communication
  • - Chapel & Fortress are GAS (Global Address
    Space) languages.
  • - X10 is a parallel OO language.
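The "programmer specifies parallelism, runtime decides how much runs" idea above can be sketched in plain Python (illustrative only, not HPCS code): all iterations are expressed as parallel work, while a pool standing in for the compiler/runtime caps actual concurrency.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(8))

# All 8 iterations are offered as parallel work; the pool (playing the
# role of the runtime) decides that at most 2 run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    squares = list(pool.map(lambda x: x * x, data))

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
```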

9
HPCS Languages as a Group (contd.)
  • Synchronization
  • All 3 support atomic blocks. None have locks
    (harder to manage than
  • atomic sections).
  • - X10 has clocks (barriers with dynamically
    attached tasks), conditional
  • atomic sections, and sync variables.
  • - Chapel has single (single-writer) and
    sync (multiple readers and writers)
  • variables.
  • - Fortress has abortable atomic sections and a
    mechanism for waiting on
  • individual spawned threads.
  • Locality
  • - All 3 have a way to associate computation
    with data, for performance. Looks
  • a little different in each language
    (places vs. locales).
  • - Explicitly distributed data structures
    enable automation of this (esp. for arrays).
  • - Delegation of the problem to libraries
    (esp. in Fortress).
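The atomic-block and barrier ideas above can be approximated in Python with standard threading primitives (an illustrative analogue, not how any of the three languages is implemented): a lock makes the increment behave atomically, and a `Barrier` plays a role loosely similar to an X10 clock phase.

```python
import threading

counter = 0
atomic = threading.Lock()        # stands in for an atomic block
barrier = threading.Barrier(4)   # rough analogue of one clock/barrier phase

def worker():
    global counter
    with atomic:                 # the increment executes as if indivisible
        counter += 1
    barrier.wait()               # all tasks complete the phase together

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4
```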

10
Challenges faced by HPCS languages
  • They have to compete with languages that already
    work: MPI with Fortran, C or C++. Users are already
  • familiar with these languages, and minimal
    refactoring is needed to parallelize existing
  • code.
  • It is always difficult for new languages to gain
    acceptance. Users are
  • conservative, and the barrier of legacy code is
    formidable.
  • They also need support from s/w tool
    developers: not just compilers,
  • but debuggers, libraries and integrated
    development environments.

11
X10
  • Primary authors: Kemal Ebcioglu, Vijay Saraswat
    and Vivek Sarkar.
  • An extended subset of Java with a strong resemblance
    in most aspects, but
  • featuring additional support for arrays and
    concurrency.
  • Uses a PGAS (Partitioned Global Address Space)
    model that supports both
  • OO and non-OO paradigms.
  • Schedule:
  • 7/2004 - First draft of X10 language
    specification
  • 2/2005 - First X10 implementation
    (unoptimized single VM prototype)
  • 1/2006 - Second X10 implementation (optimized
    multi VM prototype)
  • 6/2006 - Open source release of X10 reference
    implementation.
  • Design completed for production
    X10 implementation in Phase 3.

12
X10 (contd.)
  • Foreign functions
  • - Java methods can be called directly from
    X10 programs. Java classes are loaded
  • automatically as part of X10 program
    execution.
  • - C functions need an extern declaration
    in X10, plus a
  • System.LoadLibrary() call.
  • X10 guarantee:
  • Any program written with the async, finish,
    atomic, foreach, ateach and clock parallel
  • constructs will never deadlock.
  • Uses the concept of parent/child
    relationships for tasks to prevent
  • deadlocks, which occur when 2 or more
    processors wait for each other to finish
  • before they can complete. A task may spawn 1
    or more children, which in turn can
  • spawn more children. Children cannot wait for
    the parent to finish, but a parent
  • can wait for a child using the finish
    construct.
  • To avoid data races:
  • - Use atomic methods and blocks without
    worrying about deadlock.
  • - Declare data to be read-only whenever
    possible.
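A rough Python analogue of the finish/async discipline described above (illustrative only, not X10): the `with` block plays the role of `finish`, since it cannot be exited until every child task spawned inside it completes, and children never wait on their parent.

```python
from concurrent.futures import ThreadPoolExecutor

results = []

# The with-block acts like X10's finish scope: submit() spawns child
# tasks (like async), and leaving the block joins all of them.
with ThreadPoolExecutor(max_workers=3) as pool:
    for i in range(3):
        pool.submit(results.append, i)   # child task; parent may await it

# All children are guaranteed done here; the parent waited for the
# children, never the other way around, so no cyclic wait can form.
print(sorted(results))  # [0, 1, 2]
```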

13
HPCS Languages (Contd)
14
FORTRESS
  • A general-purpose programming language for HPC from
    Sun
  • Fortress: a "secure Fortran"
  • Open source
  • Design strategy: push decisions to libraries

15
Problems to Consider
  • Mathematical Notation
  • Dimension Checking
  • Communication Efficiency
  • Memory Consistency
  • Synchronization

16
Fortress Programming Environment
  • Support for Unicode characters
  • Ex: to enter the Greek letter Λ, we type
  • GREEK_CAPITAL_LETTER_LAMBDA
  • Juxtaposition denotes multiplication
  • factorial(n) =
  •   if n = 0 then 1
  •   else n factorial(n - 1) end
  • Function overloading
  • Function contracts
  • - requires and ensures clauses

17
  • Numeric types can be annotated with physical
    units and dimensions
  • kineticEnergy(m: R kg, v: R m/s): R kg m²/s² =
    (m v²)/2
  • Support for summations and products
  • factorial(n) = PRODUCT[i <- 1:n] i
  • Aggregate expressions
  • Ex: single-dimension array  a = [0 1 2 3 4]
  • Two-dimension array  b = [3 4
  •                           5 6]

18
  • Supports data types like Strings, Boolean,
    Integer, Float etc
  • Objects - consisting of fields and methods
  • Traits: named program constructs that declare
    sets of methods.
  • Methods declared may be concrete or abstract.
  • Traits may be inherited by another trait or by an
    object

19
Example NAS Conjugate Gradient (ASCII)
  • conjGrad(A: Matrix[Float], x: Vector[Float]):
  •     (Vector[Float], Float) = do
  •   cgit_max = 25
  •   z: Vector[Float] := 0
  •   r: Vector[Float] := x
  •   p: Vector[Float] := r
  •   rho: Float := r^T r
  •   for j <- seq(1:cgit_max) do
  •     q = A p
  •     alpha = rho / p^T q
  •     z := z + alpha p
  •     r := r - alpha q
  •     rho0 = rho
  •     rho := r^T r
  •     beta = rho / rho0
  •     p := r + beta p
  •   end
  •   (z, ||x - A z||)
  • end
  • Matrix[T] and Vector[T] are
  • parameterized interfaces, where T
  • is the type of the elements. The
  • form x: T := e declares a variable
  • x of type T with initial value e, and
  • that variable may be updated using
  • the assignment operator :=.
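To make the data flow of the conjGrad loop easy to follow, here is the same algorithm transcribed into plain Python (dense nested lists stand in for Fortress's Matrix and Vector types; the early-exit guard is an addition to avoid division by zero after exact convergence). The returned pair mirrors the (z, ||x - A z||) result.

```python
# Plain-Python transcription of the conjugate-gradient loop shown above.
def conj_grad(A, x, cgit_max=25):
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]
    z = [0.0] * len(x)
    r = list(x)
    p = list(r)
    rho = dot(r, r)
    for _ in range(cgit_max):
        if rho == 0.0:                  # exact convergence; avoid 0/0
            break
        q = matvec(A, p)
        alpha = rho / dot(p, q)
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho0, rho = rho, dot(r, r)
        beta = rho / rho0
        p = [ri + beta * pi for ri, pi in zip(r, p)]
    # residual norm ||x - A z||, as in the Fortress example's return value
    Az = matvec(A, z)
    residual = sum((xi - yi) ** 2 for xi, yi in zip(x, Az)) ** 0.5
    return z, residual

A = [[4.0, 1.0], [1.0, 3.0]]            # small symmetric positive-definite test
x = [1.0, 2.0]
z, res = conj_grad(A, x)
print(res < 1e-10)  # True: z solves A z = x
```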

20
Example NAS Conjugate Gradient (ASCII)
  • conjGrad[E extends Number, nat N,
  •          Mat extends Matrix[E, N BY N],
  •          Vec extends Vector[E, N]]
  •   (A: Mat, x: Vec): (Vec, E) = do
  •   cgit_max = 25
  •   z: Vec := 0
  •   r: Vec := x
  •   p: Vec := r
  •   rho: E := r^T r
  •   for j <- seq(1:cgit_max) do
  •     q = A p
  •     alpha = rho / p^T q
  •     z := z + alpha p
  •     r := r - alpha q
  •     rho0 = rho
  •     rho := r^T r
  •     beta = rho / rho0
  •     p := r + beta p
  •   end
  • end
  • Here we make conjGrad a generic
  • procedure. The runtime compiler
  • may produce multiple instantiations
  • of the code for various types E.
  • The form x = e as a statement
  • declares variable x to have an
  • unchanging value. The type of x is
  • exactly the type of the
  • expression e.

21
Example NAS Conjugate Gradient (UNICODE)
  • conjGrad[E extends Number, nat N,
  •          Mat extends Matrix[E, N×N],
  •          Vec extends Vector[E, N]]
  •   (A: Mat, x: Vec): (Vec, E) = do
  •   cgit_max = 25
  •   z: Vec := 0
  •   r: Vec := x
  •   p: Vec := r
  •   ρ: E := rᵀ r
  •   for j ← seq(1:cgit_max) do
  •     q = A p
  •     α = ρ / pᵀ q
  •     z := z + α p
  •     r := r - α q
  •     ρ₀ = ρ
  •     ρ := rᵀ r
  •     β = ρ / ρ₀
  •     p := r + β p
  •   end
  • end
  • This would be considered entirely
  • equivalent to the previous version.
  • You might think of this as an
  • abbreviated form of the ASCII
  • version, or you might think of the
  • ASCII version as a way to
  • conveniently enter this version on
  • a standard keyboard.

22
Parallelism in Fortress
  • Two types of threads: implicit and spawned
  • A number of constructs are implicitly parallel
  • Ex: tuple expressions, do blocks, function
    calls, for loops
  • The programmer cannot operate on implicit threads directly
  • Scheduling of the threads is managed by the compiler,
    runtime and libraries
  • Spawned threads are of type Thread[T]
  • with methods like val, wait, ready, stop
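A rough Python analogue of a spawned thread and its `val` method (illustrative only, not Fortress): `submit()` plays the role of spawn, and `result()` behaves like `val`, blocking until the spawned computation finishes and then returning its value.

```python
from concurrent.futures import ThreadPoolExecutor

# submit() ~ spawn; result() ~ the val method (wait, then fetch value)
with ThreadPoolExecutor() as pool:
    t = pool.submit(sum, range(10))  # spawned computation runs asynchronously
    v = t.result()                   # block until done, then get the value

print(v)  # 45
```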

23
Parallelism in Fortress (Contd)
  • Regions describe machine resources.
  • Distributions map aggregates onto regions.
  • Aggregates used as generators drive parallelism.
  • Algebraic properties drive implementation
    strategies.
  • Algebraic properties are described by traits.
  • Traits allow sharing of code, properties, and
    test data
  • Properties are verified by automated unit
    testing.
  • Reducers and generators to achieve mix-and-match
    code selection.

24
Memory Model
  • Programs are multithreaded by design
  • The memory model was written with these principles in mind:
  • Violations must still respect the underlying data
    abstractions.
  • It should be understood by programmers and
    implementors.
  • It should permit aggressive optimizations.
  • For sequential consistency, use atomic operations for
    updates to shared locations.
  • Two orderings: dynamic program order and memory
    order

25
Memory order
  • There is a single memory order which is respected
    in all threads.
  • Every read obtains the value of the immediately
    preceding write to the identical location in
    memory order.
  • Memory order on atomic expressions respects
    dynamic program order.
  • Memory order respects dynamic program order for
    operations that certainly access the same
    location.
  • Initializing writes are ordered before any other
    memory access to the same location

26
Fortress - Summary
  • Easy to code parallel applications
  • Focus on reducing application and compiler
    complexity
  • "Blackboard" notation for mathematical code

27
Chapel
  • A parallel programming language from Cray Inc.
  • Provides a higher level of expression than
    current programming languages
  • Multithreaded, with abstractions for data, task
    and nested parallelism
  • Code reuse and generality
  • Based on HPF and the Cray MTA extensions to Fortran and C.

28
Global View Programming Model
  • Raises the level of abstraction for data and
    control
  • Global view of data structures: sizes and indices
    expressed globally
  • Global view of control: language concepts for
    parallelism
  • A broad range of parallel architectures can be
    supported: SMPs, clusters, multicore,
    distributed memory, etc.

29
Other Motivating principles
  • Lets the user specify where to put data and computation
    in the physical machine
  • Support for object-oriented programming
  • Does not necessitate use of OOP concepts.
  • Chapel library implemented using objects
  • Support for generic programming and polymorphism
  • Compiler creates different versions of the code
    for each type

30
Program Examples Jacobi Method
  • config var n = 5,              // size of n x n grid
  •   epsilon = 0.00001,           // convergence tolerance
  •   verbose = false;             // controls amount of output
  • def main() {
  •   const ProblemSpace = [1..n, 1..n],    // domain for
    interior points
  •   BigDomain = [0..n+1, 0..n+1];         // domain with
    boundary points
  •   var X, XNew: [BigDomain] real = 0.0;  // X holds
    approximate solution,
  •   // XNew is a work array
  •   X[n+1, 1..n] = 1.0;
  •   if (verbose) {
  •     writeln("Initial configuration:");
  •     writeln(X, "\n");
  •   }
  •   var iteration = 0;   // iteration counter
  •   var delta: real;     // convergence measure

31
  •   do {
  •     forall (i,j) in ProblemSpace do
  •       XNew(i,j) = (X(i-1,j) + X(i+1,j) + X(i,j-1) +
        X(i,j+1)) / 4.0;
  •     delta = max reduce abs(XNew[ProblemSpace] -
        X[ProblemSpace]);
  •     X[ProblemSpace] = XNew[ProblemSpace];
  •     iteration += 1;
  •     if (verbose) {
  •       writeln("iteration: ", iteration);
  •       writeln(X);
  •       writeln("delta: ", delta, "\n");
  •     }
  •   } while (delta > epsilon);
  •   writeln("Jacobi computation complete.");
  •   writeln("Delta is ", delta, " (< epsilon = ",
      epsilon, ")");
  •   writeln("# of iterations: ", iteration);
  • }

32
Producer- Consumer Example
  • use Time;
  • config var numIterations: int = 5,
  •   sleepTime: uint = 2;
  • var s: sync int;
  • begin             // create consumer computation
  •   for c in 1..numIterations do
  •     writeln("consumer got: ", s);
  • // producer computation
  • for p in 1..numIterations {
  •   sleep(sleepTime);
  •   s = p;
  • }
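A Python analogue of the Chapel sync variable (illustrative, not Chapel semantics exactly): a `Queue` with capacity 1 is "empty/full" in a similar way, so each consumer read blocks until the producer writes a fresh value, and each write blocks until the previous value is consumed.

```python
import queue
import threading

numIterations = 5
s = queue.Queue(maxsize=1)   # stands in for the sync variable s
got = []

def consumer():
    for _ in range(numIterations):
        got.append(s.get())          # blocks until the producer writes s

t = threading.Thread(target=consumer)
t.start()
for p in range(1, numIterations + 1):
    s.put(p)                         # blocks while the last value is unread
t.join()

print(got)  # [1, 2, 3, 4, 5]
```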

33
Stack Implementation Example
  • class MyNode {
  •   type itemType;
  •   var item: itemType;
  •   var next: MyNode(itemType);
  • }
  • record Stack {
  •   type itemType;
  •   var top: MyNode(itemType);
  •   def push(item: itemType) {
  •     top = MyNode(itemType, item, top);
  •   }
  •   def pop() {
  •     if isEmpty then
  •       halt("attempt to pop an item off an empty
        stack");
  •     var oldTop = top;
  •     top = top.next;
  •     return oldTop.item;
  •   }
  •   def isEmpty return top == nil;
  • }
  • var stack1: Stack(string);
  • stack1.push("one");
  • stack1.push("two");
  • stack1.push("three");
  • writeln(stack1.pop());
  • writeln(stack1.pop());
  • writeln(stack1.pop());

34
Parallelization and Synchronization
  • Forall: a variation of the for loop for concurrent
    execution
  • The compiler and runtime determine the concurrency
  • The keyword ordered can be used to impose a partial
    order
  • Cobegin: creates parallelism among the statements
    within a block
  • Begin: spawns a computation to execute a
    statement
  • Synchronization variables: coordinate computations
    that access the same data
  • Single and sync variables

35
Locality and Distribution
  • Locale: a unit in the target architecture that
    supports computation and data storage
  • The mapping from domain index values to locales is
    called a distribution.
  • The block distribution: Block
  • The cyclic distribution: Cyclic
  • The block-cyclic distribution: BlockCyclic
  • The cut distribution: Cut
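The difference between the block and cyclic mappings can be shown with two one-line formulas in Python (illustrative formulas, not Chapel's actual implementation): block hands each locale a contiguous chunk of indices, while cyclic deals indices round-robin.

```python
# Sketch of index-to-locale mappings for the distributions named above.
def block(i, n, num_locales):
    # contiguous chunks of roughly n/num_locales indices per locale
    return i * num_locales // n

def cyclic(i, num_locales):
    # indices dealt round-robin across locales
    return i % num_locales

n, locales = 8, 4
print([block(i, n, locales) for i in range(n)])   # [0, 0, 1, 1, 2, 2, 3, 3]
print([cyclic(i, locales) for i in range(n)])     # [0, 1, 2, 3, 0, 1, 2, 3]
```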

36
Chapel - Summary
  • Enhance programmer productivity
  • Multithreaded parallel programming
  • Locality aware programming
  • Generic programming and type inference

37
Future of HPCS Languages
  • DARPA project timelines
  • 2002: DARPA awards the first four Phase I
    contracts to build a next-generation supercomputer
    to IBM, Cray, Sun and SGI. Each contract was
    worth $3 million.
  • 2003: DARPA awards $146 million to IBM, Cray and
    Sun for Phase II of the supercomputer project. SGI is
    dropped from the program.
  • 2006: DARPA awards approximately $500 million to
    Cray and IBM for Phase III of the project. Sun is
    dropped from the program.
  • 2010: Final version of supercomputer expected to
    go online and serve as a model for a new generation
    of high-performance computing platforms.
  • Phase III: each vendor provides a working
    prototype of the implementation
  • Future focus on performance credibility
  • Merging the three languages into one

38
References
  • http://www.darpa.mil/ipto/programs/hpcs/index.htm
  • http://www.itjungle.com/breaking/bn112706-story01.html
  • https://ft.ornl.gov/rbarrett/cw120307.html
  • http://www.hoise.com/primeur/03/articles/monthly/AE-PR-08-03-29.html
  • http://www.hpcwire.com/hpc/827250.html
  • http://www.channelinsider.com/article/IBMCrayWinDARPASupercomputerContracts/194797_1.aspx
  • http://www.hpcwire.com/hpc/837711.html
  • http://newportwire.hpcwire.com/understanding_the_darpa_hpcs_program.htm
  • www.ahpcrc.org/conferences/PGAS2006/presentations/Yelick.pdf
  • http://en.wikipedia.org/wiki/X10_%28programming_language%29
  • http://x10.sourceforge.net/x10presentations.shtml
  • http://www.research.ibm.com/x10
  • http://en.wikipedia.org/wiki/Fortress_programming_language
  • http://research.sun.com/projects/plrg/
  • http://en.wikipedia.org/wiki/Chapel_programming_language
  • http://chapel.cs.washington.edu/