Infrastructure for Adaptive Scientific Applications

1
Infrastructure for Adaptive Scientific
Applications
  • Avi Purkayastha
  • Ashok Adiga
  • Texas Advanced Computing Center
  • The University of Texas at Austin

TeraGrid 06, Indianapolis, IN
2
Outline
  • Introduction
  • What is the Adaptive Framework?
  • Why is it necessary?
  • Adaptive Framework Design
  • Scientific Applications
  • Adaptive Application Prototypes
  • Initial Results
  • Future Research and Conclusions

3
Scientific Applications on TeraGrid (TG)
  • Need different executables for each architecture
  • Often need different executables for the same
    architecture because the software stack differs
  • Application performance on a given system
    dictates the choice of runtime system
  • Determining application performance on a
    totally new architecture can often be an
    expensive proposition
  • The myriad architecture/software combinations
    give rise to difficult choices at runtime

4
Adaptive Framework I
  • An application database can record and store
    application performance data
  • Performance of the main computational kernels
    varies with a number of parameters, including
    architecture.
  • Computational kernel performance can be obtained
    by measuring either profile data or wall-time
    for each kernel.
  • Profile data can be obtained from instrumentation
    tools such as TAU or PAPI.
  • At runtime, the adaptive application may
    retrieve this data to choose the optimal
    computational kernel.
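
The measure-then-select idea above can be sketched as follows. This is a minimal illustration, not the authors' library: the in-memory dictionary stands in for the application database, and the names `time_kernel`, `record`, `best_kernel` and the two toy kernels are hypothetical.

```python
import time

# In-memory stand-in for the application database (assumption):
# maps (application, architecture, kernel name) -> wall-time in seconds.
perf_db = {}

def time_kernel(kernel, *args, repeats=3):
    """Return the best wall-time of kernel(*args) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(*args)
        best = min(best, time.perf_counter() - start)
    return best

def record(app, arch, name, seconds):
    """Store one wall-time measurement."""
    perf_db[(app, arch, name)] = seconds

def best_kernel(app, arch):
    """Return the fastest recorded kernel name for (app, arch), or None."""
    candidates = {k[2]: t for k, t in perf_db.items() if k[:2] == (app, arch)}
    return min(candidates, key=candidates.get) if candidates else None

# Two hypothetical kernels that compute the same result in different ways.
def kernel_loop(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def kernel_builtin(n):
    return sum(i * i for i in range(n))

for name, fn in {"loop": kernel_loop, "builtin": kernel_builtin}.items():
    record("demo", "local", name, time_kernel(fn, 10_000))

choice = best_kernel("demo", "local")  # whichever variant was faster here
```

Which variant wins depends on the machine, which is the point of recording the timings per architecture rather than hard-coding a choice.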

5
Adaptive Framework II
  • Complex applications with extensive profiling
    can save some of that information in the
    database, based on certain metrics.
  • Users can then extract some of that information
    based on different parameters such as
    architecture or the number of CPUs.
  • In the absence of profiling, execution
    information about computational kernels of an
    application can be stored in the Adaptive
    database.
  • This historical data can be extracted for
    future runs and extrapolated at run-time to
    estimate the wall-clock response time for a
    specific kernel.
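
One possible way to extrapolate historical data at run-time is sketched below. The slides do not say which model the framework uses, so the straight-line fit between neighbouring samples is purely an assumption, as are the function name `estimate_wall_time` and the sample numbers.

```python
def estimate_wall_time(history, n):
    """Estimate the wall-time for problem size n.

    `history` is a list of (problem_size, seconds) samples with distinct
    sizes. The estimate is a straight line through the two samples that
    bracket n, or through the two nearest samples when n lies outside
    the recorded range (assumed model, not taken from the slides).
    """
    pts = sorted(history)
    if len(pts) < 2:
        raise ValueError("need at least two samples")
    # Pick the first segment whose right endpoint is >= n; if n is larger
    # than every sample, the loop ends on the last segment.
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if n <= x1:
            break
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (n - x0)

# Made-up historical samples: (grid size, seconds per run).
history = [(100, 1.0), (200, 2.0), (400, 6.0)]
inside = estimate_wall_time(history, 300)    # between two samples
outside = estimate_wall_time(history, 800)   # beyond the largest sample
```

Real kernels rarely scale linearly, so an estimate far outside the recorded range should be treated as a rough guide only.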

6
Adaptive Framework II
7
Adaptive Framework III
Application code snippet with Adaptive Library
interface function:

    Pre-Processing
    call Select_Optimal(Data_Layout, FDTD, <attributes_list>, Optimal_Choice, ..)
    do
      call Optimal_Fn(Optimal_Choice, ..)
    while (timesteps not exceeded)
    Post-Processing
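
The control flow of the snippet can be mirrored in Python as a rough sketch. The names `select_optimal` and `run`, the kernel variants and the timing numbers are all hypothetical; only the shape (select once before the loop, call the selected kernel every timestep) comes from the slide.

```python
def select_optimal(timings):
    """Return the kernel name with the lowest recorded wall-time."""
    return min(timings, key=timings.get)

def run(kernels, timings, state, timesteps):
    # Pre-processing: choose the kernel once, before the time loop.
    optimal_choice = select_optimal(timings)
    optimal_fn = kernels[optimal_choice]
    # Time-stepping loop: every iteration calls the selected kernel.
    for _ in range(timesteps):
        state = optimal_fn(state)
    # Post-processing would follow here.
    return optimal_choice, state

# Two stand-in kernels that produce identical results.
kernels = {"variant_a": lambda x: x + 1, "variant_b": lambda x: x + 1}
timings = {"variant_a": 0.42, "variant_b": 0.31}  # made-up seconds
choice, final_state = run(kernels, timings, 0, 5)
```

Because the selection happens outside the loop, its cost is paid once per run rather than once per timestep.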
8
Adaptive Framework Components
  • Library provides functions to
      • open/close connections to remote database
      • authenticate user accessing database
      • store profile and execution data
      • retrieve profile and execution data
  • Database
      • Remote access to performance data
      • Schema supports storage of application-specific
        attributes
      • Access functions select best kernel option for a
        given set of attributes
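
A local stand-in for these components can be sketched with Python's built-in `sqlite3` module. The table layout, column names and function names below are assumptions made for illustration; the actual schema is the one in the entity relationship diagram, and remote access and user authentication are not modelled here.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open a connection and create the (assumed) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS kernel_runs (
               application  TEXT    NOT NULL,
               kernel       TEXT    NOT NULL,
               architecture TEXT    NOT NULL,
               num_cpus     INTEGER NOT NULL,
               wall_time    REAL    NOT NULL
           )"""
    )
    return conn

def store_run(conn, application, kernel, architecture, num_cpus, wall_time):
    """Store one execution record."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO kernel_runs VALUES (?, ?, ?, ?, ?)",
            (application, kernel, architecture, num_cpus, wall_time),
        )

def retrieve_runs(conn, application, architecture, num_cpus):
    """Return (kernel, wall_time) rows for the given attributes, fastest first."""
    cur = conn.execute(
        "SELECT kernel, wall_time FROM kernel_runs "
        "WHERE application = ? AND architecture = ? AND num_cpus = ? "
        "ORDER BY wall_time",
        (application, architecture, num_cpus),
    )
    return cur.fetchall()

def select_best_kernel(conn, application, architecture, num_cpus):
    """Return the fastest recorded kernel, or None if nothing is stored."""
    rows = retrieve_runs(conn, application, architecture, num_cpus)
    return rows[0][0] if rows else None

conn = open_db()
# Made-up timings, for illustration only.
store_run(conn, "fdtd", "kernel_1", "ia32", 1, 12.5)
store_run(conn, "fdtd", "kernel_2", "ia32", 1, 9.8)
store_run(conn, "fdtd", "kernel_1", "power4", 1, 7.1)
best = select_best_kernel(conn, "fdtd", "ia32", 1)
```

Keeping the attributes as query parameters means new attributes can be added as columns without changing how the application asks for the best option.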

9
Adaptive Framework
Entity Relationship Diagram for Adaptive
Framework Database Schema
10
Scientific Applications I
  • Open-source FDTD serial application
  • Solves the time-dependent Maxwell's equations in
    curl form, using the Perfectly Matched Layer
    method
  • Problems such as simulating a structure with a
    very fine grid require a large number of timesteps
    (i.e. a large number of iterations at runtime)
  • Has two computational kernel optimizations
  • The performance of the computational kernels in
    each iteration is critical to overall
    performance.
  • Need to find the optimal kernel for each
    architecture or system.

11
Scientific Applications II
  • Generic matrix-matrix product
  • Widely researched area.
  • Picked two approaches for the matrix-matrix
    product: strip-mining and blocking
  • Wish to illustrate that, under certain sets of
    parameters such as cache line, block, and matrix
    sizes, each approach can be optimal for a given
    architecture.
  • Therefore need to identify the sets of
    parameters that give optimality on a given
    architecture.
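
The two loop structures can be sketched in pure Python as below. The slides do not show the authors' kernels, so this is one common reading of the terms: the strip-mined variant splits only the k loop into strips and hoists the strip loop outward, while the blocked variant tiles all three loops. Pure Python will not expose the cache effects; the sketch only shows the loop shapes.

```python
def matmul_strip_mined(a, b, strip):
    """C = A * B with the k loop split into strips of length `strip`."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for kk in range(0, m, strip):          # loop over strips of k
        k_end = min(kk + strip, m)
        for i in range(n):
            for j in range(p):
                s = c[i][j]
                for k in range(kk, k_end): # work within one strip
                    s += a[i][k] * b[k][j]
                c[i][j] = s
    return c

def matmul_blocked(a, b, block):
    """C = A * B computed tile by tile with square tiles of size `block`."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, p, block):
            for kk in range(0, m, block):
                # Multiply one tile of A by one tile of B.
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, p)):
                        s = c[i][j]
                        for k in range(kk, min(kk + block, m)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c

a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
c_strip = matmul_strip_mined(a, b, strip=2)
c_block = matmul_blocked(a, b, block=2)
```

Both functions return the same product; `strip` and `block` are the tunable parameters whose best values would be expected to differ between architectures.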

12
Adaptive Application Prototypes I
  • The FDTD application code was modified to reduce
    the number of main outer loop iterations, so that
    the wall-clock time of each computational kernel
    can be obtained.
  • This data is then stored in the database the
    first time only.
  • Subsequent users can check whether such data
    exists in the database, and retrieve it if it does.
  • The FDTD application was also tested with profile
    data.
  • If it is possible to profile an application, then
    many more complex parameters can be measured by
    profiling.
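
The "store the first time only, retrieve afterwards" behaviour can be sketched as a small helper. The dictionary stands in for the remote database, and `get_or_measure`, the key layout and the timing value are hypothetical.

```python
def get_or_measure(db, key, measure):
    """Return (timing, measured_now) for `key`.

    The measurement runs only when no data exists for `key`; later
    callers reuse the stored value.
    """
    if key in db:
        return db[key], False      # data already stored: just retrieve it
    value = measure()              # first time: run the shortened job
    db[key] = value
    return value, True

calls = []

def fake_measure():
    """Stand-in for running the reduced-iteration FDTD job."""
    calls.append(1)
    return 3.2  # made-up wall-clock seconds

db = {}
key = ("fdtd", "ia32", "kernel_1")
first = get_or_measure(db, key, fake_measure)
second = get_or_measure(db, key, fake_measure)
```

The second call returns the stored value without running the measurement again, so the cost of the shortened run is paid once per attribute set.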

13
Adaptive Application Prototypes II
  • The matrix application was tested by collecting
    historical data only.
  • A testing function ran both methods while
    varying parameters such as cache line length,
    block size, and matrix size.
  • With different sets of parameters, this
    application was run on two different
    architectures: an IA-32 processor node and a
    Power4 processor node.
  • A fixed-size set of parameters is presented in
    the results.
  • The same profiling methodology can be used
    for this application.

14
Initial Results I
15
Initial Results II
  • For the FDTD serial application, the results
    presented were obtained for one architecture. The
    best computational kernel option for that
    architecture can be chosen at run-time.
  • For the matrix-product application, the approach
    that gave optimal performance depended on the
    architecture.

16
Conclusions and Future Research
  • For serial applications, additional parameters
    need to be added to increase complexity.
  • Parallel applications are currently being tested.
    These will add different sets of parameters for
    consideration.
  • Using the insights from both serial and parallel
    applications, the schema design can be optimized
    further before completion.
  • Interface functions to load and store
    performance data still need to be finalized.
