Title: Infrastructure for Adaptive Scientific Applications
- Avi Purkayastha
- Ashok Adiga
- Texas Advanced Computing Center
- The University of Texas at Austin
TeraGrid 06, Indianapolis, IN
2. Outline
- Introduction
- What is the Adaptive Framework?
- Why is it necessary?
- Adaptive Framework Design
- Scientific Applications
- Adaptive Application Prototypes
- Initial Results
- Future Research and Conclusions
3. Scientific Applications on TeraGrid
- Each architecture needs its own executable.
- The same architecture often needs several executables because the software stack differs between systems.
- Application performance on a given system dictates the choice of runtime system.
- Determining application performance on an entirely new architecture can be expensive.
- The myriad of architecture/software combinations makes runtime choices difficult.
4. Adaptive Framework I
- An application database can record and store application performance data.
- The performance of the main computational kernels varies with a number of parameters, including architecture.
- Kernel performance can be obtained by measuring either profile data or the wall-clock time of each kernel.
- Profile data can be obtained from instrumentation tools such as TAU or PAPI.
- At runtime, the adaptive application can retrieve this data to choose the optimal computational kernel.
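The record-then-choose idea above can be sketched in a few lines. This is a minimal illustration, not the framework's implementation: an in-memory dictionary stands in for the application database, and the names `record` and `best_kernel` are hypothetical. The kernel with the lowest mean wall-clock time on the requested architecture is selected.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the application database:
# (application, architecture, kernel) -> list of measured wall-clock times in seconds.
records = defaultdict(list)


def record(app, arch, kernel, wall_time):
    """Store one wall-clock measurement of a kernel on an architecture."""
    records[(app, arch, kernel)].append(wall_time)


def best_kernel(app, arch):
    """Return the kernel with the lowest mean recorded time, or None if there is no data."""
    means = {
        kernel: sum(times) / len(times)
        for (rec_app, rec_arch, kernel), times in records.items()
        if rec_app == app and rec_arch == arch
    }
    if not means:
        return None
    return min(means, key=means.get)
```

For example, after `record("fdtd", "arch_a", "kernel_a", 2.0)`, `record("fdtd", "arch_a", "kernel_b", 1.5)` and `record("fdtd", "arch_a", "kernel_b", 1.7)` (placeholder values, not measurements), `best_kernel("fdtd", "arch_a")` returns `"kernel_b"`, while an architecture with no data returns `None` so the application can fall back to a default.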
5. Adaptive Framework II
- Complex applications that are extensively profiled can save some of that information in the database, organized by selected metrics.
- Users can then extract that information by parameters such as architecture or number of CPUs.
- In the absence of profiling, execution information about an application's computational kernels can be stored in the Adaptive database.
- This historical data can be retrieved in future runs and extrapolated at runtime to estimate the wall-clock response time of a specific kernel.
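One way to extrapolate historical data, sketched under an explicit assumption: wall-clock time is modeled as a fixed setup cost plus a constant cost per iteration, fitted by ordinary least squares over past (iterations, seconds) samples. The functions `fit_linear` and `estimate_wall_time` are hypothetical names; the slides do not specify the extrapolation model.

```python
def fit_linear(samples):
    """Least-squares fit of wall_time = setup + per_iter * n over (n, wall_time) samples."""
    count = len(samples)
    if count < 2:
        raise ValueError("need at least two samples")
    mean_n = sum(n for n, _ in samples) / count
    mean_t = sum(t for _, t in samples) / count
    s_nn = sum((n - mean_n) ** 2 for n, _ in samples)
    if s_nn == 0:
        raise ValueError("samples need at least two distinct iteration counts")
    s_nt = sum((n - mean_n) * (t - mean_t) for n, t in samples)
    per_iter = s_nt / s_nn
    setup = mean_t - per_iter * mean_n
    return setup, per_iter


def estimate_wall_time(samples, n_iterations):
    """Extrapolate the fitted model to a run with n_iterations iterations."""
    setup, per_iter = fit_linear(samples)
    return setup + per_iter * n_iterations
```

With the illustrative history `[(10, 1.5), (20, 2.5), (40, 4.5)]`, the fit gives a setup of 0.5 s and 0.1 s per iteration, so a 1000-iteration run is estimated at about 100.5 s. A kernel whose cost is not linear in the iteration count would need a different model.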
6. Adaptive Framework II
7. Adaptive Framework III

    ! Pre-Processing
    call Select_Optimal(Data_Layout, FDTD, <attributes_list>, Optimal_Choice, ..)
    do
       call Optimal_Fn(Optimal_Choice, ..)
    while (timesteps not exceeded)
    ! Post-Processing

Application code snippet with the Adaptive Library interface function
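The control flow of the snippet above can be mirrored in a runnable sketch. Everything here is hypothetical: `select_optimal` stands in for `Select_Optimal`, the `HISTORY` table holds placeholder timings rather than measurements, and the two kernels only log their calls. The point is that the kernel is chosen once during pre-processing, then the time-stepping loop calls the selected function on every iteration.

```python
# Hypothetical historical timings (seconds) keyed by
# (data_layout, application, attributes); placeholder values, not measurements.
HISTORY = {
    ("row_major", "FDTD", ("arch=arch_a",)): {"kernel_a": 2.0, "kernel_b": 1.6},
}


def kernel_a(state):
    state["calls"].append("kernel_a")  # stand-in for one FDTD timestep


def kernel_b(state):
    state["calls"].append("kernel_b")  # alternative implementation of the same step


KERNELS = {"kernel_a": kernel_a, "kernel_b": kernel_b}


def select_optimal(data_layout, application, attributes, default="kernel_a"):
    """Return the kernel name with the lowest recorded time, or `default` if none."""
    timings = HISTORY.get((data_layout, application, tuple(attributes)))
    if not timings:
        return default
    return min(timings, key=timings.get)


def run(n_timesteps, attributes):
    state = {"calls": []}
    # Pre-processing: choose the kernel once, before the time loop.
    choice = select_optimal("row_major", "FDTD", attributes)
    optimal_fn = KERNELS[choice]
    # Time-stepping loop: every iteration calls the selected kernel.
    for _ in range(n_timesteps):
        optimal_fn(state)
    # Post-processing would go here.
    return choice, state
```

Selecting outside the loop keeps the database lookup off the critical path; only an indirect call remains per timestep.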
8. Adaptive Framework Components
- Library provides functions to:
  - open/close connections to the remote database
  - authenticate the user accessing the database
  - store profile and execution data
  - retrieve profile and execution data
- Database
  - Remote access to performance data
  - Schema supports storage of application-specific attributes
  - Access functions select the best kernel option for a given set of attributes
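A schema that stores application-specific attributes, together with an access function that picks the best kernel for a given attribute set, might look like the sketch below. It is only an illustration: a local in-memory SQLite database from Python's `sqlite3` module stands in for the remote database, authentication is omitted, and the table and function names (`run`, `attribute`, `store`, `best_option`) are hypothetical rather than the framework's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per measured run, plus free-form name/value attributes.
SCHEMA = """
CREATE TABLE run (
    id INTEGER PRIMARY KEY,
    application TEXT NOT NULL,
    kernel TEXT NOT NULL,
    wall_time REAL NOT NULL
);
CREATE TABLE attribute (
    run_id INTEGER NOT NULL REFERENCES run(id),
    name TEXT NOT NULL,
    value TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open a connection and create the schema (a stand-in for a remote connection)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn, application, kernel, wall_time, attributes):
    """Store one run and its attributes (e.g. architecture, matrix size)."""
    cur = conn.execute(
        "INSERT INTO run (application, kernel, wall_time) VALUES (?, ?, ?)",
        (application, kernel, wall_time),
    )
    conn.executemany(
        "INSERT INTO attribute (run_id, name, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, name, str(value)) for name, value in attributes.items()],
    )
    conn.commit()


def best_option(conn, application, attributes):
    """Kernel with the lowest average wall time among runs matching ALL attributes."""
    query = "SELECT kernel FROM run r WHERE r.application = ?"
    params = [application]
    for name, value in attributes.items():
        query += (" AND EXISTS (SELECT 1 FROM attribute a"
                  " WHERE a.run_id = r.id AND a.name = ? AND a.value = ?)")
        params += [name, str(value)]
    query += " GROUP BY kernel ORDER BY AVG(wall_time) LIMIT 1"
    row = conn.execute(query, params).fetchone()
    return row[0] if row else None
```

Keeping attributes in a separate name/value table lets each application record its own parameters without schema changes, at the cost of one `EXISTS` clause per attribute in the lookup.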
9. Adaptive Framework
Figure: Entity-relationship diagram of the Adaptive Framework database schema
10. Scientific Applications I
- Open-source serial FDTD application
- Solves the time-dependent Maxwell's equations in curl form, using the Perfectly Matched Layer (PML) method
- Problems such as simulating a structure on a very fine grid require a large number of timesteps (i.e., many iterations at runtime)
- Has two computational kernel optimizations
- The performance of the computational kernels in each iteration is critical to overall performance
- The optimal kernel must be found for each architecture or system
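To show what "two kernel options for the same timestep" can mean, here is a toy one-dimensional FDTD update in normalized units (coefficient 0.5, i.e. a Courant number of 0.5). It is not the application's code: there is no PML (the grid ends are simply held fixed), and the two variants `update_loop` and `update_zip` are hypothetical stand-ins for the application's two kernel optimizations. Both compute identical fields, so the choice between them can rest purely on measured speed.

```python
import math
import time


def update_loop(ez, hy):
    """Kernel variant A: explicit index loops over the grid."""
    n = len(ez)
    for k in range(1, n):
        ez[k] = ez[k] + 0.5 * (hy[k - 1] - hy[k])
    for k in range(n - 1):
        hy[k] = hy[k] + 0.5 * (ez[k] - ez[k + 1])


def update_zip(ez, hy):
    """Kernel variant B: whole-slice updates built with zip()."""
    ez[1:] = [e + 0.5 * (hm - hp) for e, hm, hp in zip(ez[1:], hy[:-1], hy[1:])]
    hy[:-1] = [h + 0.5 * (e0 - e1) for h, e0, e1 in zip(hy[:-1], ez[:-1], ez[1:])]


def simulate(kernel, n_cells=200, n_steps=100, src=100):
    """Run n_steps of 1D FDTD with a Gaussian soft source; return the E field."""
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    for t in range(n_steps):
        ez[src] += math.exp(-0.5 * ((t - 30) / 8.0) ** 2)  # Gaussian pulse
        kernel(ez, hy)
    return ez


def fastest(kernels, repeats=3):
    """Time each kernel on the same problem and return the name of the fastest."""
    timings = {}
    for name, kernel in kernels.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            simulate(kernel)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return min(timings, key=timings.get)
```

Calling `fastest({"loop": update_loop, "zip": update_zip})` returns whichever variant is quicker on the current machine; that name is the kind of result the framework would store per architecture.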
11. Scientific Applications II
- Generic matrix-matrix product
- Widely researched area
- Two approaches to the matrix-matrix product were chosen: strip-mining and blocking
- The aim is to show that, depending on parameters such as cache line, block, and matrix sizes, each approach can be optimal for a given architecture
- These parameter sets must therefore be identified for each architecture
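The two approaches can be sketched as loop structures. These are textbook forms written for illustration, not the authors' code: `matmul_strip_mined` splits the column loop into strips of `strip` columns, and `matmul_blocked` tiles all three loops with square blocks of size `block`. In pure Python the interpreter overhead hides most cache effects, so this only shows the loop shapes; real tuning would use a compiled language.

```python
def matmul_strip_mined(a, b, strip=4):
    """C = A*B with the column (j) loop strip-mined into chunks of `strip` columns."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for js in range(0, p, strip):  # outer loop over column strips
        je = min(js + strip, p)
        for i in range(n):
            ci = c[i]
            for k in range(m):
                aik = a[i][k]
                bk = b[k]
                for j in range(js, je):  # short inner loop within one strip
                    ci[j] += aik * bk[j]
    return c


def matmul_blocked(a, b, block=4):
    """C = A*B with the i, k and j loops tiled into block x block tiles."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # Multiply one tile of A by one tile of B into one tile of C.
                for i in range(ii, min(ii + block, n)):
                    ci = c[i]
                    ai = a[i]
                    for k in range(kk, min(kk + block, m)):
                        aik = ai[k]
                        bk = b[k]
                        for j in range(jj, min(jj + block, p)):
                            ci[j] += aik * bk[j]
    return c
```

Both functions return the same product; `strip` and `block` are the tunable parameters whose best values would be expected to differ between architectures.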
12. Adaptive Application Prototypes I
- The FDTD code was modified to run fewer main outer-loop iterations, so that the wall-clock time of each computational kernel could be obtained.
- This data is stored in the database on the first run only.
- Subsequent users can check whether such data exists in the database and retrieve it if it does.
- The FDTD application was also tested with profile data.
- If an application can be profiled, many more, and more complex, parameters can be measured.
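The measure-once, reuse-afterwards pattern described above can be sketched as follows. The dictionary `_db` is a hypothetical stand-in for the Adaptive database, `probe_iterations` is an assumed name for the reduced outer-loop count, and the full-run estimate assumes every iteration costs the same.

```python
import time

# Hypothetical stand-in for the adaptive database:
# (application, architecture, kernel name) -> seconds per outer-loop iteration.
_db = {}


def per_iteration_time(app, arch, kernel_name, kernel, probe_iterations=5):
    """Return (seconds per iteration, measured_now) for one kernel.

    The first caller runs a reduced outer loop and stores the result;
    later callers get the stored value without re-running the kernel.
    """
    key = (app, arch, kernel_name)
    if key in _db:
        return _db[key], False
    start = time.perf_counter()
    for _ in range(probe_iterations):  # reduced number of outer-loop iterations
        kernel()
    elapsed = time.perf_counter() - start
    _db[key] = elapsed / probe_iterations  # stored on the first run only
    return _db[key], True


def estimate_full_run(per_iteration, full_iterations):
    """Scale the per-iteration time to the full run, assuming constant cost per iteration."""
    return per_iteration * full_iterations
```

A second call with the same key returns the stored value and the `False` flag, so only the first user pays for the probe run.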
13. Adaptive Application Prototypes II
- The matrix application was tested using historical data only.
- A testing function ran both methods while varying parameters such as cache size, block size, and matrix size.
- With different parameter sets, the application was run on two architectures: an IA-32 processor node and a Power4 processor node.
- The results present one fixed-size set of parameters.
- The same profiling methodology can be used for this application.
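A testing function of the kind described above can be sketched as a generic parameter sweep. This is a hypothetical harness, not the authors' tester: `methods` maps a name to a callable taking `(inputs, params)`, `grid` maps each parameter name to the values to try, and `make_inputs` builds the test data for one combination. For each combination it keeps the best of `repeats` timings per method and records the fastest method.

```python
import itertools
import time


def sweep(methods, grid, make_inputs, repeats=3):
    """Time every method on every parameter combination; return the fastest per combination."""
    names = list(grid)
    winners = {}
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        inputs = make_inputs(params)  # built once, outside the timed region
        timings = {}
        for method_name, fn in methods.items():
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                fn(inputs, params)
                best = min(best, time.perf_counter() - start)
            timings[method_name] = best
        winners[values] = min(timings, key=timings.get)  # keyed by parameter values
    return winners
```

In this setting, `methods` would hold the strip-mined and blocked products and `grid` might be something like `{"n": [...], "block": [...]}`; running the same sweep on each architecture and storing `winners` gives the per-architecture choices the framework needs.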
14. Initial Results I
15. Initial Results II
- For the serial FDTD application, the results presented were obtained on one architecture. The best computational kernel option for that architecture can be chosen at runtime.
- For the matrix-product application, the best-performing approach depended on the architecture.
16. Conclusions and Future Research
- For serial applications, additional parameters need to be added to increase complexity.
- Parallel applications are currently being tested. These will add different sets of parameters for consideration.
- Insights from both serial and parallel applications will be used to optimize the schema design further before completion.
- Finalize the interface functions to load and store performance data.