Supercomputer Benchmarking - PowerPoint PPT Presentation

1
Supercomputer Benchmarking
  • By John Dorfner, Wesley Jones, and Eric Ng

(Images: Cray-1, CDC 1604, Origin 2000, RS/6000 SP)
2
Overview
  • Definition of Benchmark
  • Introduction to Benchmark Suites
  • SPEChpc96 Suite
  • Livermore Loops
  • The Linpack Benchmark
  • The Top 8 Supercomputers
  • HPC Challenge Benchmark
  • Cray 1-A vs. IBM Cluster 1600
  • Inside the IBM Cluster 1600
  • Conclusion

3

Benchmark Definition
  • A measurement or standard that serves as a point
    of reference by which process performance is
    measured. Benchmarking is a structured approach
    for identifying the best practices from industry
    and government, and comparing and adapting them
    to the organization's operations. Such an
    approach is aimed at identifying more efficient
    and effective processes for achieving intended
    results, and suggesting ambitious goals for
    program output, product/service quality, and
    process improvement.
  • www.ichnet.org

4
Supercomputer Benchmarking
  • The number and type of benchmark suites used to
    study supercomputer performance has varied widely
    over the years. In early studies, an ad hoc
    collection of programs was typically used to
    measure the performance of a given system
    relative to a known performance benchmark.
    Eventually, this practice evolved into groups of
    programs explicitly designed as supercomputer
    benchmark suites. The most widely used benchmarks
    for performance on supercomputing clusters are
    the SPEChpc96 suite, the Livermore Loops, and, for
    scientific machines, the Linpack Kernels.
  • Some general examples of individual computer
    benchmarks:
      • Dhrystone - Integer benchmark for UNIX systems
      • Whetstone - Floating point benchmark for
        minicomputers
      • I/O benchmarks
      • MIPS
      • Synthetic benchmarks
      • Kernel benchmarks
      • SPECint / SPECfp
  • Summarizing

5
SPEChpc96 Suite
  • In 1995, the Standard Performance Evaluation
    Corp. (SPEC) announced the release of SPEChpc96,
    the first standard benchmark suite specifically
    designed for measuring high-performance
    computing. SPEChpc96 was developed by SPEC's
    High Performance Group (HPG), which includes
    several leading high-performance computer
    vendors, systems integrators, and major
    universities and research institutes.
  • SPEChpc96 allows users and vendors of high-end
    computers to make objective performance
    comparisons across different hardware platforms.
  • Specific scientific and industrial applications
    are represented within the SPEChpc96 benchmark
    suite.
  • The first two SPEChpc96 benchmarks are:
      • SPECseis96, a seismic processing application
      • SPECchem96, a computational chemistry
        application
  • Since SPECseis96 and SPECchem96 can be run in
    both serial and parallel modes, the SPEChpc96
    suite can be used for general performance
    comparisons over a broad range of
    high-performance computing systems. This list
    includes multiprocessor systems, workstation
    clusters, distributed memory parallel systems,
    and traditional vector and vector parallel
    supercomputers.

6
SPEChpc96 Suite Metrics
  • The SPECseis96 and SPECchem96 suites each
    generate four metrics. Each metric corresponds to
    a different problem size (small, medium, large,
    extra large) and is used to characterize the
    scalability of the application as well as the
    entire system.
  • The SPEChpc96 metrics are as follows:
      • SPECseis96_SM
      • SPECseis96_MD
      • SPECseis96_LG
      • SPECseis96_XL
      • SPECchem96_SM
      • SPECchem96_MD
      • SPECchem96_LG
      • SPECchem96_XL
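  • The metrics are unitless. They are derived as
    follows:
      metric = (86400 seconds) / (elapsed time of
      benchmark in seconds)
    Because a day has 86400 seconds, the metric is the
    number of times the benchmark could be run in one
    day, so a larger value means a faster system.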
  • Since these benchmarks are both compute-intensive
    and data-intensive, the above metrics are used to
    reflect the performance of the entire system.
    This includes the processors, memory access, I/O
    bandwidth, interconnect topology, etc. For
    example, the SPECseis96_XL problem requires
    processing 100 GB of data.
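The metric formula above is simple enough to state as code. This is a minimal Python sketch, not part of SPEC's own tools; the function name and the sample elapsed times are hypothetical and chosen only for illustration.

```python
SECONDS_PER_DAY = 86400  # the constant in the SPEChpc96 metric formula


def spechpc96_metric(elapsed_seconds: float) -> float:
    """Return 86400 s divided by the benchmark's elapsed time in seconds.

    Because 86400 is the number of seconds in a day, the unitless result can
    be read as how many times the benchmark could run back to back in one day.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return SECONDS_PER_DAY / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical elapsed times for illustration only, not published results.
    for label, seconds in [("one-hour run", 3600.0), ("two-hour run", 7200.0)]:
        print(f"{label}: metric = {spechpc96_metric(seconds):.1f}")
```

A one-hour run scores 24 and a two-hour run scores 12: halving the elapsed time doubles the metric.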

7
Livermore Loops
  • Livermore Loops is a set of kernels consisting of
    loops from real Fortran programs.
  • Introduced in 1970, this supercomputer benchmark
    initially consisted of 14 kernels of numerically
    intensive applications written in Fortran. The
    number of kernels was increased to 24 in the
    1980s.
  • Performance is measured in millions of
    floating-point operations per second (MFLOPS).
    The program also checks the results for
    computational accuracy.
  • A main aim of the Livermore design was to avoid
    single-number performance comparisons. The 24
    kernels can each be executed three times, at a
    range of DO-loop spans, to produce short, medium
    and long vector performance measurements. In this
    mode, if an overall average is quoted, the
    geometric mean may be interpreted as a
    characteristic rate of computation for the suite.
    However, it is more realistic to retain the full
    range of statistics: geometric, harmonic and
    arithmetic means, minimum and maximum.
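The range of statistics described above can be computed with Python's standard statistics module. This is a sketch only: the per-kernel rates below are hypothetical, whereas a real run would time each kernel at each DO-loop span.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def summarize_rates(mflops: list[float]) -> dict[str, float]:
    """Summarize per-kernel MFLOPS rates as a range of statistics."""
    if not mflops or any(rate <= 0 for rate in mflops):
        raise ValueError("need at least one positive rate")
    return {
        "maximum": max(mflops),
        "arithmetic mean": fmean(mflops),
        "geometric mean": geometric_mean(mflops),
        "harmonic mean": harmonic_mean(mflops),
        "minimum": min(mflops),
    }


if __name__ == "__main__":
    # Hypothetical rates for three kernels: one fast, one medium, one slow.
    for name, value in summarize_rates([10.0, 100.0, 1000.0]).items():
        print(f"{name:>16}: {value:9.3f} MFLOPS")
```

The arithmetic mean is pulled toward the fastest kernel and the harmonic mean toward the slowest. When every kernel performs the same amount of work, the harmonic mean of the rates equals total work divided by total time, which is one reason a single quoted average can mislead.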

8
Livermore Loops Kernels
  • Kernel 1: an excerpt from a hydrodynamic code.
  • Kernel 2: an excerpt from an Incomplete
    Cholesky-Conjugate Gradient code.
  • Kernel 3: the standard Inner Product function of
    linear algebra.
  • Kernel 4: an excerpt from a Banded Linear
    Equations routine.
  • Kernel 5: an excerpt from a Tridiagonal
    Elimination routine.
  • Kernel 6: an example of a general linear
    recurrence equation.
  • Kernel 7: an Equation of State fragment.
  • Kernel 8: an excerpt of an Alternating Direction,
    Implicit Integration code.
  • Kernel 9: an Integrate Predictor code.
  • Kernel 10: a Difference Predictor code.
  • Kernel 11: a First Sum.
  • Kernel 12: a First Difference.
  • Kernel 13: an excerpt from a 2-D Particle-in-Cell
    code.
  • Kernel 14: an excerpt of a 1-D Particle-in-Cell
    code.
  • Kernel 15: a sample of how casually FORTRAN can
    be written.
  • Kernel 16: a search loop from a Monte Carlo code.
  • Kernel 17: an example of an implicit conditional
    computation.
  • Kernel 18: an excerpt from a 2-D Explicit
    Hydrodynamic code.
  • Kernel 19: a general Linear Recurrence Equation.
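To show how an MFLOPS figure is derived from a kernel, here is a Python sketch of Kernels 1 and 3, transcribed with 0-based indexing from the commonly published form of the Fortran loops. The flop counts (5 and 2 per iteration) come from counting the arithmetic operations in each loop body. The helper names are hypothetical, and pure-Python rates are far below compiled Fortran, so the numbers only illustrate the arithmetic.

```python
import time


def kernel1_hydro(x, y, zx, q, r, t, n):
    """Kernel 1 (hydro fragment): X(k) = Q + Y(k)*(R*ZX(k+10) + T*ZX(k+11)).

    Transcribed with 0-based indexing; zx must hold at least n + 11 values.
    """
    for k in range(n):
        x[k] = q + y[k] * (r * zx[k + 10] + t * zx[k + 11])


def kernel3_inner_product(z, x, n):
    """Kernel 3 (inner product): Q = Q + Z(k)*X(k) summed over k."""
    q = 0.0
    for k in range(n):
        q += z[k] * x[k]
    return q


def mflops(flop_count, seconds):
    """Convert an operation count and an elapsed time into MFLOPS."""
    return flop_count / seconds / 1e6


if __name__ == "__main__":
    n = 100_000
    y = [1.0] * n
    zx = [0.5] * (n + 11)
    x = [0.0] * n

    start = time.perf_counter()
    kernel1_hydro(x, y, zx, q=0.5, r=2.0, t=3.0, n=n)
    elapsed = time.perf_counter() - start
    # 5 floating-point operations per iteration: 3 multiplies and 2 adds.
    print(f"kernel 1: {mflops(5 * n, elapsed):.1f} MFLOPS")

    start = time.perf_counter()
    kernel3_inner_product(y, zx, n)
    elapsed = time.perf_counter() - start
    # 2 floating-point operations per iteration: 1 multiply and 1 add.
    print(f"kernel 3: {mflops(2 * n, elapsed):.1f} MFLOPS")
```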

9
Livermore Loops Kernel Output
  • THE LIVERMORE FORTRAN KERNELS SUMMARY
  • Computer: CRAY-YMP C90 (240 MHz)
  • System: UNICOS 7.C, loaded
  • Compiler: CFT77 5.0.1.17
  • Date: 92.02.18
  • Testor: Charles Grassl, CRI
  • MFLOPS RANGE REPORT: ALL RANGE STATISTICS
  • Mean DO Span: 167
  • Code Samples: 72
  • Maximum Rate: 826.0859 Mega-Flops/Sec.
  • Average Rate: 190.5636 Mega-Flops/Sec.
  • GEOMETRIC MEAN: 86.2649 Mega-Flops/Sec.
  • Median Q2: 83.5138 Mega-Flops/Sec.
  • Harmonic Mean: 40.7302 Mega-Flops/Sec.
  • Minimum Rate: 6.7925 Mega-Flops/Sec.
  • Mean Precision: 11.07 Decimal Digits

10
The Linpack Benchmark
  • The Linpack Benchmark measures a computer's
    floating-point rate of execution (in Mflop/s) by
    running a mathematics application that solves a
    dense system of linear equations. Over the
    years, the characteristics of the benchmark have
    changed. Today, in fact, there are three
    benchmarks included in the Linpack Benchmark
    report.
  • The Linpack Benchmark grew out of the Linpack
    software project. It was originally intended to
    give end-users an indication of how long it
    would take to solve certain matrix problems.
  • The three benchmarks in the Linpack Benchmark
    report are:
      • Linpack Fortran n = 100 benchmark
      • Linpack n = 1000 benchmark
      • Linpack's Highly Parallel Computing benchmark
  • Mflop/s (millions of floating-point operations
    per second) is an execution rate that counts
    64-bit floating-point additions and
    multiplications. Gflop/s are billions of
    floating-point operations per second and Tflop/s
    are trillions of floating-point operations per
    second.
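As a sketch of what the n = 100 benchmark measures (not the benchmark itself, which uses Linpack's own Fortran factor and solve routines), the Python code below solves a random dense system by Gaussian elimination with partial pivoting and divides the conventional operation count, 2/3 n^3 + 2 n^2, by the elapsed time. The function names, the random test matrix, and the choice of right-hand side are illustrative assumptions.

```python
import random
import time


def solve_dense(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Works on a copy of the inputs, stored as lists of rows.
    """
    n = len(a)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]  # augmented [A | b]
    for col in range(n):
        # Partial pivoting: swap in the row whose entry in this column is largest.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = m[row][n] - sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = s / m[row][row]
    return x


def linpack_flops(n):
    """Operation count credited to a dense n-by-n solve: 2/3 n^3 + 2 n^2."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def run(n=100, seed=0):
    """Time one solve of a random system; return (Mflop/s, max error)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # makes the exact solution a vector of ones
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    max_error = max(abs(xi - 1.0) for xi in x)
    return linpack_flops(n) / elapsed / 1e6, max_error


if __name__ == "__main__":
    rate, err = run()
    print(f"n = 100: {rate:.2f} Mflop/s, max |x - 1| = {err:.1e}")
```

Choosing b as the row sums of A means the exact answer is known, so the same run also yields an accuracy check alongside the rate.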

11
Linpack Performance Example
  • Measured Gflop/s: peak rate of execution in
    billions of floating point operations per second.
  • Size of Problem: the matrix size at which the
    measured performance was observed.
  • Size of ½ Perf: the size of problem needed to
    achieve ½ the measured peak performance.
  • Theoretical Peak Gflop/s: the theoretical peak
    performance for the computer.

12
The Top 8 Supercomputers
13
The Top 8 Supercomputers
Table Key
  • Rank: position within the TOP500 ranking
  • Manufacturer: manufacturer or vendor
  • Computer: model type indicated by manufacturer
    or vendor
  • Installation Site: customer
  • Location: location and country
  • Year: year of installation/last major update
  • Installation Area: field of application
  • Processors: number of processors
  • Rmax: maximum LINPACK performance achieved
  • Rpeak: theoretical peak performance
  • Nmax: problem size for achieving Rmax
  • N1/2: problem size for achieving half of Rmax
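The table key defines Rmax, Nmax and N1/2 in terms of a sweep over problem sizes. The Python sketch below derives them from a list of (N, measured Gflop/s) pairs and also reports the ratio Rmax/Rpeak. The sweep data and machine are hypothetical, and N1/2 is approximated as the smallest measured size that reaches half of Rmax rather than interpolated.

```python
def summarize_sweep(sweep, rpeak):
    """Derive TOP500-style figures from (problem size N, measured Gflop/s) pairs.

    Rmax is the best measured rate, Nmax the size where it occurred, and N1/2
    is approximated here as the smallest measured size reaching Rmax / 2.
    """
    if not sweep:
        raise ValueError("sweep must contain at least one measurement")
    nmax, rmax = max(sweep, key=lambda point: point[1])
    n_half = min(n for n, rate in sweep if rate >= rmax / 2)
    return {"Rmax": rmax, "Nmax": nmax, "N1/2": n_half, "Rmax/Rpeak": rmax / rpeak}


if __name__ == "__main__":
    # Hypothetical sweep on an imaginary machine with Rpeak = 120 Gflop/s.
    sweep = [(1_000, 20.0), (5_000, 55.0), (10_000, 80.0), (20_000, 95.0)]
    print(summarize_sweep(sweep, rpeak=120.0))
```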

14
HPC Challenge Benchmark
  • Under the direction of the High Productivity
    Computing Systems program of the Defense Advanced
    Research Projects Agency (DARPA), a group of 20
    top researchers has initiated a program to
    redefine the benchmarks used to measure
    high-performance systems. The effort is designed
    to broaden benchmarking beyond the Linpack
    benchmark's measure of raw floating-point
    operations per second (flops). The group has set
    a target date of 2006 for releasing a new
    benchmark.
  • The HPC Challenge benchmark consists of five
    hardware performance metrics:
      • HPL - the Linpack TPP benchmark, which measures
        the floating-point rate of execution for solving
        a linear system of equations
      • STREAM - a simple synthetic benchmark program
        that measures sustainable memory bandwidth (in
        GB/s) and the corresponding computation rate for
        simple vector kernels
      • RandomAccess - measures the rate of integer
        random updates of memory
      • PTRANS (parallel matrix transpose) - exercises
        the communications where pairs of processors
        communicate with each other simultaneously. It
        is a useful test of the total communications
        capacity of the network
      • b_eff (effective bandwidth benchmark) - a set of
        tests to measure latency and bandwidth of a
        number of simultaneous communication patterns
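Two of these metrics have very small core loops. The Python sketch below transcribes the STREAM "triad" kernel (a[i] = b[i] + q*c[i], credited with 24 bytes of traffic per element for 8-byte doubles) and a RandomAccess-style update loop that XORs values from a 64-bit shift-register generator with feedback polynomial 0x7 into a power-of-two table. Function names and the arbitrary default seed are assumptions; the reference codes also seed and verify differently. Interpreter overhead dominates in Python, so the printed rates say nothing about the hardware.

```python
import time
from array import array

POLY = 0x7               # feedback polynomial of the RandomAccess generator
MASK64 = (1 << 64) - 1   # keep values within 64 bits


def stream_triad(a, b, c, scalar):
    """STREAM 'triad': a[i] = b[i] + scalar * c[i]; returns bandwidth in GB/s."""
    n = len(a)
    start = time.perf_counter()
    for i in range(n):
        a[i] = b[i] + scalar * c[i]
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer reading
    bytes_moved = 3 * 8 * n  # read b and c, write a, 8 bytes per double
    return bytes_moved / elapsed / 1e9


def next_random(ran):
    """One shift-register step: shift left, XOR in POLY if the top bit was set."""
    top_bit = ran >> 63
    ran = (ran << 1) & MASK64
    return ran ^ POLY if top_bit else ran


def random_access(table, updates, seed=0x123456789):
    """XOR pseudo-random values into pseudo-random slots of a power-of-two table."""
    size = len(table)
    if size == 0 or size & (size - 1):
        raise ValueError("table size must be a power of two")
    ran = seed  # arbitrary nonzero starting value (an assumption of this sketch)
    for _ in range(updates):
        ran = next_random(ran)
        table[ran & (size - 1)] ^= ran


if __name__ == "__main__":
    n = 1 << 16
    a, b, c = array("d", [0.0] * n), array("d", [1.0] * n), array("d", [2.0] * n)
    print(f"triad: {stream_triad(a, b, c, 3.0):.3f} GB/s")

    table = array("Q", range(1 << 12))
    updates = 4 * len(table)
    start = time.perf_counter()
    random_access(table, updates)
    elapsed = time.perf_counter() - start
    print(f"RandomAccess: {updates / elapsed / 1e9:.6f} GUP/s")
```

Because each update is an XOR, repeating the same update sequence restores the original table, which gives a simple correctness check.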

15
Cray 1-A (1978)
vs.
IBM Cluster 1600 (2002)
16
Inside the IBM Cluster 1600
The diagram above shows a schematic view of the
two-cluster configuration
The diagram above shows the configuration of a
single cluster
17
Conclusion
  • Benchmarking refers to a measurement standard
    that serves as a point of reference by which
    process performance is measured
  • Three of the more popular suites for
    benchmarking supercomputers are the SPEChpc96
    suite, the Livermore Loops, and for scientific
    machines, the Linpack Kernels
  • For important HPC features, the performance of
    today's supercomputers differs vastly from that
    of the supercomputers of the past
  • As the High Performance Computing industry
    grows, the benchmarks used on supercomputers
    must also grow in order to provide a yardstick
    by which these systems can be measured

18
For more information
  • www.top500.org
  • www.spec.org/hpg
  • www.llnl.gov
  • www.ecmwf.int/services/computing/overview/supercomputer_history.html
  • www.microsoft.com/windows2000/hpc/
  • www.ibm.com
  • www.sgi.com
  • www.hp.com
  • www.cray.com