Title: Energy Profiling And Analysis Of The HPC Challenge Benchmarks
Scalable Performance Laboratory, Department of Computer Science, Virginia Tech
Shuaiwen Song, Hung-Ching Chang, Rong Ge, Xizhou Feng, Dong Li, and Kirk W. Cameron
s562673_at_vt.edu, hcchang_at_vt.edu, lid_at_cs.vt.edu, rong.ge_at_marquette.edu, xizhou.feng_at_gmail.com, cameron_at_vt.edu
Also affiliated with Marquette University.
- Spatio-temporal locality vs. Avg Power Use
  - HPCC is designed to stress all aspects of a high-performance system, including CPU, memory, disk, and network. We characterized HPCC results based on data locality.
  - Since lower temporal and spatial locality imply higher average memory access delay times, applications with (low, low) temporal-spatial locality use less power on average.
  - Since higher temporal and spatial locality imply lower average memory access delay times, applications with (high, high) temporal-spatial locality use more power on average.
  - Mixed temporal and spatial locality implies mixed results that fall between the average power ranges of (high, high) and (low, low) temporal-spatial locality codes.
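The locality rule above can be written down as a small lookup. This is only a sketch: the function name and the band labels ("lower", "intermediate", "higher") are hypothetical and qualitative, and no measured wattage is implied.

```python
def expected_power_band(temporal: str, spatial: str) -> str:
    """Map a (temporal, spatial) locality pair to a qualitative
    average-power band, following the rule stated on the poster.

    The labels are illustrative only; they are not measured values.
    """
    levels = {"low", "high"}
    if temporal not in levels or spatial not in levels:
        raise ValueError("locality must be 'low' or 'high'")
    if temporal == "high" and spatial == "high":
        return "higher"        # little time waiting on memory
    if temporal == "low" and spatial == "low":
        return "lower"         # long average memory access delays
    return "intermediate"      # mixed locality falls in between


if __name__ == "__main__":
    for pair in [("high", "high"), ("high", "low"), ("low", "high"), ("low", "low")]:
        print(pair, expected_power_band(*pair))
```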
Energy Analysis of the HPC Challenge Benchmarks
System G and PowerPack 2.0
What makes System G so Green?
Key Findings
(1) This work identifies power profiles at the system-component and application-function levels.
(2) This work reveals the correlation between spatio-temporal locality and energy use for these benchmarks.
(3) This work explores the relationship between scalability and energy use for high-end systems.
System G (Green)
System G provides a research platform for the development of high-performance software tools and applications with extreme efficiency at scale.
About the HPC Challenge Benchmarks
HPC Challenge (HPCC) benchmarks are specifically designed to stress aspects of application and system design ignored by the NAS Benchmarks and LINPACK, to aid in system procurements and evaluations. HPCC organizes the benchmarks into four categories; each category represents a type of memory access pattern characterized by the benchmark's memory access spatial and temporal locality. We use a classification scheme to separate the performance phases that make up the HPCC benchmark suite, as shown in the table:
1. Local (single processor)
2. Star (embarrassingly parallel)
3. Global (explicit parallel data communications)
Results II Detailed Function-level Analysis
Results I Power Profiling and Analysis
Detailed power/energy/performance profiling and
analysis of various global benchmarks of HPCC
including scalability tests, parallel efficiency
and power-function mapping.
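A minimal sketch of the scaling metrics involved is given here. The poster does not state which efficiency definitions were used, so the common textbook forms are assumed; all function names and the numbers in the usage section are hypothetical and are not measured results.

```python
def strong_scaling_efficiency(t1: float, tp: float, p: int) -> float:
    """Fixed total problem size: efficiency = T1 / (p * Tp)."""
    if p <= 0 or tp <= 0:
        raise ValueError("p and tp must be positive")
    return t1 / (p * tp)


def weak_scaling_efficiency(t1: float, tp: float) -> float:
    """Fixed problem size per process: efficiency = T1 / Tp."""
    if tp <= 0:
        raise ValueError("tp must be positive")
    return t1 / tp


def energy_joules(avg_power_watts: float, seconds: float) -> float:
    """Energy as average power multiplied by execution time."""
    return avg_power_watts * seconds


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not data from this work).
    print(strong_scaling_efficiency(t1=100.0, tp=15.0, p=8))
    print(weak_scaling_efficiency(t1=100.0, tp=125.0))
    print(energy_joules(avg_power_watts=250.0, seconds=15.0))
```

Comparing energy across node counts alongside efficiency shows whether a shorter run time offsets the extra power drawn by additional nodes.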
- Analysis
  - 1) Each test in the benchmark suite stresses processor and memory power relative to their use. For example, because Global HPL and Star DGEMM have high temporal and spatial locality, they spend little time waiting on data and stress the processor's floating-point execution units intensively, consuming more processor power than other tests.
  - 2) Changes in processor and memory power profiles correlate to communication-to-computation ratios. Power varies for global tests such as PTRANS, HPL, and MPI_FFT because of their computation and communication phases.
  - 3) Disk power and motherboard power are relatively stable over all tests.
  - 4) Processors consume more power during GLOBAL and STAR tests since they use all processor cores in the computation. LOCAL tests use only one core per node and thus consume less energy.
- Conclusions
  - Each application has a unique power profile characterized by power distribution among major system components.
  - The power profiles of the HPCC benchmark suite reveal power boundaries for real applications.
  - Energy efficiency is a critical issue in high performance computing that requires further study, since the interactions between hardware and application affect power usage dramatically.
- The PowerPack 2.0 Framework
  - Components
    - Hardware power/energy profiling
    - Software power/energy profiling control
    - Software system power/energy control
    - Data collection/fusion/analysis
    - System under test
  - Main features
    - a) Direct measurements of the power consumption of a system's major components (e.g., CPU, memory, and disk) and/or an entire computing unit.
    - b) Automatic logging of power profiles and synchronization to application source code.
    - c) Scalable, fast, and accurate.
Detailed power profiles for four Global HPCC
benchmarks across eight computing nodes with 32
cores.
HPCC Power Profile of Full Benchmark Run
The power signatures of each application are unique. In the figure below, power consumption is separated by major computing components including CPU, Memory, Disk and Motherboard. These four components capture nearly all the dynamic power usage of the system.
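A per-component power profile can be reduced to per-component energy by integrating power over time. The sketch below assumes evenly or unevenly spaced samples sharing one time base and uses the trapezoidal rule; the function names and the sample values are hypothetical and do not come from PowerPack or from the measurements shown here.

```python
def integrate_energy(times, watts):
    """Trapezoidal integration of power samples (W) over time (s) -> joules."""
    if len(times) != len(watts):
        raise ValueError("times and watts must have the same length")
    energy = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        energy += 0.5 * (watts[i] + watts[i - 1]) * dt
    return energy


def component_energy(times, profile):
    """Integrate each component's power trace; profile maps name -> watts list."""
    return {name: integrate_energy(times, watts) for name, watts in profile.items()}


if __name__ == "__main__":
    # Made-up samples for illustration only.
    times = [0.0, 1.0, 2.0]
    profile = {
        "CPU": [100.0, 120.0, 100.0],
        "Memory": [20.0, 20.0, 20.0],
        "Disk": [8.0, 8.0, 8.0],
        "Motherboard": [15.0, 15.0, 15.0],
    }
    energy = component_energy(times, profile)
    total = sum(energy.values())
    for name, joules in energy.items():
        print(f"{name}: {joules:.1f} J ({100.0 * joules / total:.1f}% of total)")
```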
Detailed power-function mapping of MPI_FFT in
HPCC.
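Power-function mapping amounts to aligning timestamped power samples with the time intervals in which each function runs. The following is a minimal sketch of that idea, not the PowerPack implementation; the function name, the interval labels, and the sample values are all hypothetical.

```python
from collections import defaultdict


def average_power_by_function(samples, intervals):
    """Average the power samples that fall inside each function's intervals.

    samples:   list of (timestamp_seconds, watts)
    intervals: list of (function_name, start_seconds, end_seconds);
               a function may appear more than once.
    Functions with no samples inside their intervals are omitted.
    """
    buckets = defaultdict(list)
    for name, start, end in intervals:
        buckets[name].extend(w for t, w in samples if start <= t < end)
    return {name: sum(ws) / len(ws) for name, ws in buckets.items() if ws}


if __name__ == "__main__":
    # Made-up trace: a compute phase followed by a communication phase.
    samples = [(0.0, 150.0), (0.5, 160.0), (1.0, 110.0), (1.5, 100.0)]
    intervals = [("compute", 0.0, 1.0), ("communicate", 1.0, 2.0)]
    print(average_power_by_function(samples, intervals))
```

This requires that the power logger and the application share a synchronized clock, which is the role of the synchronization feature listed for PowerPack.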
Energy Profiling and Efficiency Under Strong
Scaling and Weak Scaling of HPCC
The figure above shows that parallel computation changes the locality of data accesses and affects the power profiles of the major computing components over the execution of the benchmarks.
A snapshot of the HPCC power profile. The entire run of HPCC consists of seven micro benchmark tests, executed in the following order:
1. PTRANS
2. HPL
3. Star DGEMM and single DGEMM
4. Star STREAM
5. MPI_RandomAccess
6. Star_RandomAccess
7. Single_RandomAccess
8. MPI_FFT, Star_FFT, single FFT, and latency/bandwidth
PowerPack Framework
Strong Scaling
Weak Scaling
Portions of this work have appeared in the following publications:
Shuaiwen Song, Rong Ge, Xizhou Feng, Kirk W. Cameron, "Energy Profiling and Analysis of HPC Challenge Benchmarks," International Journal of High Performance Computing Applications, Vol. 23, No. 3, 265-276 (2009).
Rong Ge, Xizhou Feng, Shuaiwen Song, Hung-Ching Chang, Dong Li, Kirk W. Cameron, "PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications," IEEE Transactions on Parallel and Distributed Systems, to appear (2009).
The authors would like to thank the National
Science Foundation for support of this work under
grants CCF 0848670, CNS 0720750, and CNS
0709025.