Allen D. Malony, Sameer Shende - PowerPoint PPT Presentation

1
Performance Engineering Technology for Complex
Scientific Component Software
  • Allen D. Malony, Sameer Shende
  • {malony,sameer}@cs.uoregon.edu
  • Department of Computer and Information Science
  • Computational Science Institute
  • University of Oregon

2
Outline
  • Overview of the TAU project
  • Performance Engineered Component Software
  • CCA Performance Observation Component
  • CCAFFEINE (Classic C++)
  • SIDL
  • Applications (SC02 Demos)
  • Optimizer Component: Craig Rasmussen, Matt
    Sottile
  • Combustion Component: Jaideep Ray
  • Concluding remarks

3
TAU Performance System Framework
  • Tuning and Analysis Utilities
  • Performance system framework for scalable
    parallel and distributed high-performance
    computing
  • Targets a general complex system computation
    model
  • nodes / contexts / threads
  • Multi-level system / software / parallelism
  • Measurement and analysis abstraction
  • Integrated toolkit for performance
    instrumentation, measurement, analysis, and
    visualization
  • Portable, configurable performance
    profiling/tracing facility
  • Open software approach
  • University of Oregon, LANL, FZJ Germany
  • http://www.cs.uoregon.edu/research/paracomp/tau

4
General Complex System Computation Model
  • Node: physically distinct shared-memory machine
  • Message-passing node interconnection network
  • Context: distinct virtual memory space within
    node
  • Thread: execution threads (user/system) in context
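
The node / context / thread model above can be read as a three-part key under which measurements are stored. The following is a minimal C++ sketch of that idea, not TAU's internal data layout: `Location`, `ProfileStore`, `add`, and `nodeTotal` are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical key saying where a measurement was taken.
struct Location {
    int node;     // physically distinct shared-memory machine
    int context;  // virtual memory space within the node
    int thread;   // execution thread within the context
    bool operator<(const Location& o) const {
        return std::tie(node, context, thread) <
               std::tie(o.node, o.context, o.thread);
    }
};

// Accumulates seconds per (location, timer name).
class ProfileStore {
public:
    void add(const Location& where, const std::string& timer, double seconds) {
        assert(seconds >= 0.0);
        data_[where][timer] += seconds;
    }

    // Sums one timer over every context and thread of a node.
    double nodeTotal(int node, const std::string& timer) const {
        double total = 0.0;
        for (const auto& entry : data_) {
            if (entry.first.node != node) continue;
            auto it = entry.second.find(timer);
            if (it != entry.second.end()) total += it->second;
        }
        return total;
    }

private:
    std::map<Location, std::map<std::string, double>> data_;
};
```

Keeping the three levels separate in the key lets the same store answer per-thread, per-context, or per-node questions by filtering on a prefix of the key.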

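[Figure: physical view and model view of the computation model. Physical view: SMP nodes, each with node memory, joined by an interconnection network carrying inter-node message communication. Model view: each node holds contexts (VM spaces), and each context holds threads.]
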
5
TAU Performance System Architecture
[Figure: TAU architecture diagram; surviving labels: Paraver, EPILOG]
6
TAU Status
  • Instrumentation supported
  • Source, preprocessor, compiler, MPI, runtime,
    virtual machine
  • Languages supported
  • C, C++, F90, Java, Python
  • HPF, ZPL, HPC++, pC++...
  • Packages supported
  • PAPI [UTK], PCL [FZJ] (hardware performance
    counter access),
  • Opari, PDT [UO, LANL, FZJ], DyninstAPI [U. Maryland]
    (instrumentation),
  • EXPERT, EPILOG [FZJ], Vampir [Pallas], Paraver
    [CEPBA] (visualization)
  • Platforms supported
  • IBM SP, SGI Origin, Sun, HP Superdome, HP-Compaq
    ES,
  • Linux clusters (IA-32, IA-64, PowerPC, Alpha),
    Apple OS X, Windows,
  • Hitachi SR8000, NEC SX, Cray T3E ...
  • Compiler suites supported
  • GNU, Intel KAI (KCC, KAP/Pro), Intel, SGI, IBM,
    Compaq, HP, Fujitsu, Hitachi, Sun, Apple,
    Microsoft, NEC, Cray, PGI, Absoft
  • Thread libraries supported
  • Pthreads, SGI sproc, OpenMP, Windows, Java, SMARTS

7
Program Database Toolkit
8
Program Database Toolkit (PDT)
  • Program code analysis framework for developing
    source-based tools for C99, C++ and F90
  • High-level interface to source code information
  • Widely portable
  • IBM, SGI, Compaq, HP, Sun, Linux
    clusters, Windows, Apple, Hitachi, Cray T3E...
  • Integrated toolkit for source code parsing,
    database creation, and database query
  • commercial grade front end parsers (EDG for
    C99/C++, Mutek for F90)
  • Intel/KAI C++ headers for std. C++ library
    distributed with PDT
  • portable IL analyzer, database format, and access
    API
  • open software approach for tool development
  • Target and integrate multiple source languages
  • Used in CCA for automated generation of SIDL
  • Used in TAU to build automated performance
    instrumentation tools (tau_instrumentor)
  • Used in CHASM, XMLGEN, Component method signature
    extraction, ...

9
Performance Database Framework
[Figure: performance database framework. Labels: raw performance data, PerfDML translators, PerfDML data description, ORDB (PostgreSQL), performance analysis and query toolkit, performance analysis programs]
  • XML profile data representation
  • Multiple experiment performance database

10
TAU's Runtime Monitor
TAU uses SCIRun [U. Utah] for visualization of
performance data (online/offline)
11
Performance-Engineered Component Software
  • Intra- and Inter-component performance
    engineering
  • Four general parts
  • Performance observation
  • integrated measurement and analysis
  • Performance query and monitoring
  • runtime access to performance information
  • Performance control
  • mechanisms to alter performance observation
  • Performance knowledge
  • characterization and modeling
  • Consistent with component architecture /
    implementation

12
Main Idea: Extend Component Design
  • Extend the programming and execution environment
    to be performance observable and performance aware

[Figure: a Component Core extended with Performance Observation (variants: measurement, analysis) and Performance Knowledge (variants: empirical, analytical). Ports shown: component ports, performance observation ports, performance knowledge ports, and repository service ports leading to a Component Performance Repository.]
13
Performance Observation and Component
  • Performance measurement integration in component
    form
  • Functional extension of original component design
  • Include new component methods and ports for
    other components to access measured performance
    data
  • Allow original component to access performance
    data
  • Encapsulate as tightly-coupled and co-resident
    performance observation object
  • POC "provides" port allows use of optimized
    interfaces to access "internal"
    performance observations

[Figure: Component Core with a Performance Observation part (variants: measurement, analysis), exposing component ports and performance observation ports]
14
Performance Knowledge
  • Describe and store known component performance
  • Benchmark characterizations in performance
    database
  • Empirical or analytical performance models
  • Saved information about component performance
  • Use for performance-guided selection and
    deployment
  • Use for runtime adaptation
  • Representation must be in common forms with
    standard means for accessing the performance
    information
  • Compatible with component architecture
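
Performance-guided selection from stored models can be sketched as follows. This is a minimal illustration under stated assumptions, not an existing repository API: the linear cost model, the class `KnowledgeRepository`, and the methods `store` and `cheapest` are hypothetical, and any coefficients passed in would have to come from real benchmark characterizations.

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <string>

// Hypothetical analytical model: predicted seconds = fixed + perItem * n.
struct LinearModel {
    double fixed;
    double perItem;
    double predict(double n) const { return fixed + perItem * n; }
};

// Stores one model per component and answers "which one should be deployed?".
class KnowledgeRepository {
public:
    void store(const std::string& component, const LinearModel& model) {
        assert(model.fixed >= 0.0 && model.perItem >= 0.0);
        models_[component] = model;
    }

    // Returns the component with the lowest predicted cost for problem
    // size n, or an empty string when no model has been stored.
    std::string cheapest(double n) const {
        std::string best;
        double bestCost = std::numeric_limits<double>::infinity();
        for (const auto& kv : models_) {
            const double cost = kv.second.predict(n);
            if (cost < bestCost) {
                bestCost = cost;
                best = kv.first;
            }
        }
        return best;
    }

private:
    std::map<std::string, LinearModel> models_;
};
```

The same lookup could serve deployment (called once before composition) or runtime adaptation (called again when the problem size changes).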

15
Component Performance Repository
  • Performance knowledge storage
  • Implement in component architecture framework
  • Similar to CCA component repository
  • Access by component infrastructure
  • View performance knowledge as component (PKC)
  • PKC ports give access to performance knowledge
  • to other components, back to original
    component
  • Static/dynamic component control and composition
  • Component composition performance knowledge

[Figure: Performance Knowledge (variants: empirical, analytical) exposed through performance knowledge ports and connected through repository service ports to the Component Performance Repository]
16
Component Composition Performance
  • Performance of component-based scientific
    applications depends on interplay of component
    functions and the computational resources
    available
  • Management of component compositions throughout
    execution is critical to successful deployment
    and use
  • Identify key technological capabilities needed to
    support the performance engineering of component
    compositions
  • Two model concepts
  • performance awareness
  • performance attention

17
Performance Awareness of Component Ensembles
  • Composition performance knowledge and observation
  • Composition performance knowledge
  • Can come from empirical and analytical evaluation
  • Can utilize information provided at the component
    level
  • Can be stored in repositories for future review
  • Extends the notion of component observation to
    ensemble-level performance monitoring
  • Associate monitoring components to component
    grouping
  • Build upon component-level observation support
  • Performance integrators and routers
  • Use component framework mechanisms

18
Performance Engineering Support in CCA
  • Define a standard observation component interface
    for
  • Performance measurement
  • Performance data query
  • Performance control (enable/disable)
  • Implement performance interfaces for use in CCA
  • TAU performance system
  • CCA component frameworks (CCAFFEINE, SIDL/Babel)
  • Demonstrations
  • Optimizing component
  • picks from a set of equivalent CCA port
    implementations
  • Flame reaction-diffusion application

19
CCA Performance Observation Component
  • Design measurement port and measurement
    interfaces
  • Timer
  • start/stop
  • set name/type/group
  • Control
  • enable/disable groups
  • Query
  • get timer names
  • metrics, counters, dump to disk
  • Event
  • user-defined events
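
The four interface groups listed above (Timer, Control, Query, Event) can be illustrated with a small in-memory object. This is a simplified sketch and not the CCA or TAU interface: the class `Observation` and all of its method names are hypothetical, and elapsed times are passed in directly so the example stays deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified observation object.
class Observation {
public:
    // Control: enable or disable every timer that belongs to a group.
    void enableGroup(const std::string& group) { disabled_.erase(group); }
    void disableGroup(const std::string& group) { disabled_.insert(group); }

    // Timer: record one start/stop interval; ignored while the group is disabled.
    void record(const std::string& timer, const std::string& group, double seconds) {
        assert(seconds >= 0.0);
        if (disabled_.count(group)) return;
        Stats& s = timers_[timer];
        s.calls += 1;
        s.seconds += seconds;
    }

    // Event: user-defined event carrying a value (for example, bytes sent).
    void trigger(const std::string& event, double value) {
        events_[event].push_back(value);
    }

    // Query: names of all timers seen so far, in sorted order.
    std::vector<std::string> timerNames() const {
        std::vector<std::string> names;
        for (const auto& kv : timers_) names.push_back(kv.first);
        return names;
    }

    // Query: number of recorded intervals for one timer.
    long calls(const std::string& timer) const {
        auto it = timers_.find(timer);
        return it == timers_.end() ? 0 : it->second.calls;
    }

    // Query: number of times a user-defined event was triggered.
    std::size_t eventCount(const std::string& event) const {
        auto it = events_.find(event);
        return it == events_.end() ? 0 : it->second.size();
    }

private:
    struct Stats {
        long calls = 0;
        double seconds = 0.0;
    };
    std::map<std::string, Stats> timers_;
    std::map<std::string, std::vector<double>> events_;
    std::set<std::string> disabled_;
};
```

Group-level control keeps the instrumentation calls in place while letting a user switch off whole categories of measurement when their overhead is not wanted.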

20
CCA C++ (CCAFFEINE) Performance Interface
namespace performance {
namespace ccaports {
  class Measurement : public virtual classic::gov::cca::Port {
  public:
    virtual ~Measurement() {}

    /* Create a Timer interface */
    virtual performance::Timer* createTimer(void) = 0;
    virtual performance::Timer* createTimer(string name) = 0;
    virtual performance::Timer* createTimer(string name, string type) = 0;
    virtual performance::Timer* createTimer(string name, string type,
                                            string group) = 0;

    /* Create a Query interface */
    virtual performance::Query* createQuery(void) = 0;

    /* Create a user-defined Event interface */
    virtual performance::Event* createEvent(void) = 0;
    virtual performance::Event* createEvent(string name) = 0;

    /* Create a Control interface for selectively enabling and
       disabling the instrumentation based on groups */
    virtual performance::Control* createControl(void) = 0;
  };
}
}
(Slide callouts: Measurement port, Measurement interfaces)
21
CCA Timer Interface Declaration
namespace performance {
  class Timer {
  public:
    virtual ~Timer() {}

    /* Implement methods in a derived class to provide functionality */

    /* Start and stop the Timer */
    virtual void start(void) = 0;
    virtual void stop(void) = 0;

    /* Set name and type for Timer */
    virtual void setName(string name) = 0;
    virtual string getName(void) = 0;
    virtual void setType(string name) = 0;
    virtual string getType(void) = 0;

    /* Set the group name and group type associated with the Timer */
    virtual void setGroupName(string name) = 0;
    virtual string getGroupName(void) = 0;
    virtual void setGroupId(unsigned long group) = 0;
    virtual unsigned long getGroupId(void) = 0;
  };
}
Timer interface methods
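
The Timer declaration asks for a derived class to supply the behavior. As a minimal sketch of what such a class could look like, the example below implements a subset of the interface with `std::chrono`. It is an illustration only: `ChronoTimer`, `calls`, and `totalSeconds` are hypothetical names, the type and group accessors are left out for brevity, and a TAU-backed timer would call into the TAU library rather than read a clock itself.

```cpp
#include <cassert>
#include <chrono>
#include <string>

namespace performance {
// Subset of the Timer interface (type and group accessors omitted here).
class Timer {
public:
    virtual ~Timer() {}
    virtual void start(void) = 0;
    virtual void stop(void) = 0;
    virtual void setName(std::string name) = 0;
    virtual std::string getName(void) = 0;
};
}  // namespace performance

// Hypothetical derived class that measures wall-clock time.
class ChronoTimer : public performance::Timer {
public:
    explicit ChronoTimer(std::string name) : name_(name) {}

    void start(void) override {
        assert(!running_);  // nested starts are not supported in this sketch
        running_ = true;
        begin_ = Clock::now();
    }

    void stop(void) override {
        assert(running_);
        const std::chrono::duration<double> elapsed = Clock::now() - begin_;
        total_ += elapsed.count();
        ++calls_;
        running_ = false;
    }

    void setName(std::string name) override { name_ = name; }
    std::string getName(void) override { return name_; }

    long calls() const { return calls_; }
    double totalSeconds() const { return total_; }

private:
    typedef std::chrono::steady_clock Clock;
    std::string name_;
    Clock::time_point begin_;
    double total_ = 0.0;
    long calls_ = 0;
    bool running_ = false;
};
```

Callers hold only a `performance::Timer*`, so swapping `ChronoTimer` for another derived class requires no change in instrumented code.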
22
Use of Observation Component in CCA Example
#include "ports/Measurement_CCA.h"
...
double MonteCarloIntegrator::integrate(double lowBound,
                                       double upBound,
                                       int count)
{
  classic::gov::cca::Port *port;
  double sum = 0.0;

  // Get Measurement port
  port = frameworkServices->getPort("MeasurementPort");
  if (port)
    measurement_m =
      dynamic_cast<performance::ccaports::Measurement *>(port);
  if (measurement_m == 0) {
    cerr << "Connected to something other than a Measurement port";
    return -1;
  }

  static performance::Timer *t =
    measurement_m->createTimer(string("IntegrateTimer"));
  t->start();
  for (int i = 0; i < count; i++) {
    double x = random_m->getRandomNumber();
    sum = sum + function_m->evaluate(x);
  }
  t->stop();
  // remainder of the method is not shown on the slide
}
23
Measurement Port Implementation
  • Use of Measurement port (i.e., instrumentation)
  • independent of choice of measurement tool
  • independent of choice of measurement type
  • TAU performance observability component
  • Implements the Measurement port
  • Implements Timer, Control, Query, Event
  • Port can be registered with the CCAFFEINE
    framework
  • Components instrument to generic Measurement port
  • Runtime selection of TAU component during
    execution
  • TauMeasurement_CCA port implementation uses a
    specific TAU library for choice of measurement
    type
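
The independence claimed above (component code instruments to a generic port, and the measurement tool is chosen at runtime) can be sketched with two interchangeable back ends. This is a reduced illustration, not the CCA or TAU code: `NullMeasurement`, `CountingMeasurement`, and `sumSquares` are hypothetical, and the `Timer` and `Measurement` classes are cut down to the methods the example needs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Reduced, hypothetical versions of the Timer and Measurement interfaces.
class Timer {
public:
    virtual ~Timer() {}
    virtual void start() = 0;
    virtual void stop() = 0;
};

class Measurement {
public:
    virtual ~Measurement() {}
    // The port keeps ownership of the returned timer.
    virtual Timer* createTimer(const std::string& name) = 0;
};

// Back end 1: measurement switched off; timers do nothing.
class NullMeasurement : public Measurement {
public:
    Timer* createTimer(const std::string&) override { return &timer_; }

private:
    struct NullTimer : Timer {
        void start() override {}
        void stop() override {}
    };
    NullTimer timer_;
};

// Back end 2: counts start/stop pairs per timer name (stand-in for a real tool).
class CountingMeasurement : public Measurement {
public:
    Timer* createTimer(const std::string& name) override {
        std::unique_ptr<CountingTimer>& slot = timers_[name];
        if (!slot) slot.reset(new CountingTimer());
        return slot.get();
    }
    long calls(const std::string& name) const {
        auto it = timers_.find(name);
        return it == timers_.end() ? 0 : it->second->calls;
    }

private:
    struct CountingTimer : Timer {
        long calls = 0;
        void start() override {}
        void stop() override { ++calls; }
    };
    std::map<std::string, std::unique_ptr<CountingTimer>> timers_;
};

// Component code: written once against the generic port.
double sumSquares(Measurement& port, int count) {
    Timer* t = port.createTimer("SumSquares");
    assert(t != nullptr);
    t->start();
    double sum = 0.0;
    for (int i = 1; i <= count; ++i) sum += static_cast<double>(i) * i;
    t->stop();
    return sum;
}
```

`sumSquares` returns the same result with either back end; only the side effect of measurement differs, which is the property that lets the measurement component be selected when the application is assembled.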

24
What's Going On Here?
Two instrumentation paths using TAU API
Two query and control paths using TAU API
25
SIDL Interface for Performance Component
version performance 1.0;

package performance {
  interface Timer {
    /* Start/stop the Timer */
    void start();
    void stop();

    /* Set/get the Timer name */
    void setName(in string name);
    string getName();

    /* Set/get Timer type information (e.g., signature
       of the routine) */
    void setType(in string name);
    string getType();

    /* Set/get the group name associated with the Timer */
    void setGroupName(in string name);
    string getGroupName();

    /* Set/get the group id associated with the Timer */
    void setGroupId(in long group);
    long getGroupId();
  }
}
26
Simple Runtime Performance Optimization
  • Components are plug-and-play
  • One can choose from a set of equivalent port
    implementations based on performance measurements
  • An outside agent can monitor and select an
    optimal working set of components
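
The selection step performed by the outside agent can be sketched as follows. This is a minimal illustration assuming the agent can run each candidate once and obtain a cost for it; `Candidate`, `Measure`, `wallClockSeconds`, and `selectFastest` are hypothetical names and do not describe how the demonstrated optimizer component is built.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// One equivalent port implementation, reduced to a name and a workload.
struct Candidate {
    std::string name;
    std::function<void()> run;
};

// A measurement strategy: runs the workload and returns its cost.
using Measure = std::function<double(const std::function<void()>&)>;

// Default strategy: wall-clock seconds of a single run.
double wallClockSeconds(const std::function<void()>& work) {
    const auto begin = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - begin;
    return elapsed.count();
}

// Returns the name of the cheapest candidate, or "" when there are none.
std::string selectFastest(const std::vector<Candidate>& candidates,
                          const Measure& measure = wallClockSeconds) {
    std::string best;
    double bestCost = std::numeric_limits<double>::infinity();
    for (const Candidate& c : candidates) {
        assert(c.run);  // every candidate must carry a workload
        const double cost = measure(c.run);
        if (cost < bestCost) {
            bestCost = cost;
            best = c.name;
        }
    }
    return best;
}
```

Passing the measurement strategy as a parameter mirrors the Measurement port idea: the agent's selection logic stays the same whether the cost comes from a wall-clock timer, a hardware counter, or a stored profile.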

[Figure: component wiring diagram. Components: Driver, MidpointIntegrator, MonteCarloIntegrator, NonlinearFunction, LinearFunction, PiFunction, RandomGenerator. Ports: GoPort, IntegratorPort, FunctionPort, RandomGeneratorPort.]
27
Component Optimizing Performance Results
28
Computational Facility for Reacting Flow Science
  • Sandia National Laboratories
  • DOE SciDAC project (http://cfrfs.ca.sandia.gov)
  • Jaideep Ray
  • Component-based simulation and analysis
  • Sandia's CCAFFEINE framework
  • Toolkit components for assembling flame
    simulation
  • integrator, spatial discretizations,
    chemical/transport models
  • structured adaptive mesh, load-balancers,
    error-estimators
  • in-core, off-machine, data transfers for
    post-processing
  • Components are C++ and wrapped F77 and C code
  • Kernel for 3D, adaptive mesh low Mach flame
    simulation

29
Simulation System Architecture
[Figure: simulation system architecture. Simulation (15 proc): combustion components and driver. Post-processing subsystem (3 proc): driver. Thumbnails, I/O: disk I/O and thumbnail pictures. Partitions are linked by MxN transfers (15x3 and 3x1).]
  • Three partitions: 15-proc, 3-proc, 1-proc
  • In-core, off-machine data transfer
  • MxN transfer component (CUMULVS, ORNL, Kohl)
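
To make the MxN idea concrete, the sketch below works out which source processes must send data to a given destination process when an array is redistributed between two process groups. It assumes a plain one-dimensional block distribution and is not the CUMULVS API; `blockBegin` and `sourcesFor` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// First global index owned by `rank` when n elements are block-distributed
// over `procs` processes; rank == procs yields n (one past the end).
long long blockBegin(int rank, int procs, long long n) {
    return static_cast<long long>(rank) * n / procs;
}

// Source ranks (out of m) whose block overlaps the block of destination
// rank `dest` (out of nDest), for an array of n elements.
std::vector<int> sourcesFor(int dest, long long n, int m, int nDest) {
    assert(m > 0 && nDest > 0 && dest >= 0 && dest < nDest && n >= 0);
    const long long lo = blockBegin(dest, nDest, n);
    const long long hi = blockBegin(dest + 1, nDest, n);
    std::vector<int> sources;
    for (int s = 0; s < m; ++s) {
        const long long sLo = blockBegin(s, m, n);
        const long long sHi = blockBegin(s + 1, m, n);
        // Keep the source only if the two half-open ranges intersect.
        if (std::max(lo, sLo) < std::min(hi, sHi)) sources.push_back(s);
    }
    return sources;
}
```

For the 15-proc to 3-proc transfer on the slide, `sourcesFor(0, 30, 15, 3)` lists the five simulation ranks that feed the first post-processing rank under this assumed distribution.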

30
Flame Reaction-Diffusion Demonstration
CCAFFEINE
31
Meeting CCA Performance Engineering Goals?
  • Language interoperability?
  • SIDL and Babel give access to all supported
    languages
  • TAU supports multi-language instrumentation
  • Component interface instrumentation automated
    with PDT
  • Platform interoperability?
  • Implement observability component across
    platforms
  • TAU runs wherever CCA runs
  • Execution model transparent?
  • TAU measurement support for multiple execution
    models
  • Reuse with any CCA-compliant framework?
  • Demonstrated with SIDL/Babel, CCAFFEINE, SCIRun

32
Meeting CCA Performance Engineering Goals?
  • Component performance knowledge?
  • Representation and performance repository work to
    do
  • Utilize effectively for deployment and steering
  • Build repository with TAU performance database
  • Performance of component compositions?
  • Component-to-component performance
  • Per connection instrumentation and measurement
  • Utilize performance mapping support
  • Ensemble-wide performance monitoring
  • connect performance producers to consumers
  • component-style implementation
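
Per-connection instrumentation can be pictured as a proxy that sits on one port connection and measures every call crossing it. The sketch below is an assumption about how this might look, not a description of existing CCA or TAU code: `MeasuredFunctionPort` and `SquareFunction` are hypothetical, and `FunctionPort` is reduced to a single `evaluate` method.

```cpp
#include <cassert>
#include <chrono>

// Reduced port interface with one method.
class FunctionPort {
public:
    virtual ~FunctionPort() {}
    virtual double evaluate(double x) = 0;
};

// A provider component used only for this example.
class SquareFunction : public FunctionPort {
public:
    double evaluate(double x) override { return x * x; }
};

// Proxy placed on one connection: it offers the same port type to the
// user side and forwards each call to the real provider.
class MeasuredFunctionPort : public FunctionPort {
public:
    explicit MeasuredFunctionPort(FunctionPort* provider) : provider_(provider) {
        assert(provider_ != nullptr);
    }

    double evaluate(double x) override {
        const auto begin = std::chrono::steady_clock::now();
        const double result = provider_->evaluate(x);
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - begin;
        seconds_ += elapsed.count();
        ++calls_;
        return result;
    }

    long calls() const { return calls_; }
    double seconds() const { return seconds_; }

private:
    FunctionPort* provider_;
    long calls_ = 0;
    double seconds_ = 0.0;
};
```

Because the proxy is itself a `FunctionPort`, neither the calling component nor the provider needs to change, and measurements are attributed to one specific connection rather than to the provider as a whole.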

33
Concluding Remarks
  • Complex component systems pose challenging
    performance analysis problems that require robust
    methodologies and tools
  • New performance problems will arise
  • Instrumentation and measurement
  • Data analysis and presentation
  • Diagnosis and tuning
  • Performance modeling
  • Performance engineered components
  • Performance knowledge, observation, query and
    control

34
Support Acknowledgement
  • TAU and PDT support
  • Department of Energy (DOE)
  • DOE 2000 ACTS contract
  • DOE MICS contract
  • DOE ASCI Level 3 (LANL, LLNL)
  • U. of Utah DOE ASCI Level 1 subcontract
  • DARPA
  • NSF National Young Investigator (NYI) award