Title: Metrics
1 Metrics
- FLOPS (FLoating-point Operations Per Second): a measure of the numerical processing rate of a CPU, which can be an indicator of its scientific computing capability.
- The floating-point format is a variation of scientific notation: a real number is represented using a mantissa, a base, and an exponent.
- Storing real numbers in computers
  - A fixed-length word is used as the storage space for a real number (e.g. 64 bits).
  - The mantissa is normalised (1.61 is normalised, 16.1 is not).
  - The mantissa and exponent are converted to base 2.
  - Some bits of the word store the mantissa, 1 bit stores the sign, and the rest store the exponent.
- Advantages and disadvantages
  - A fixed-length space can store a wide overall range of values.
  - If 64 bits are used to store a real number, with 11 bits for the exponent, 52 bits for the mantissa, and the remaining 1 bit for the sign, we can derive the range of numbers this storage layout can represent.
  - More bits for the mantissa: higher precision, but smaller range.
  - More bits for the exponent: wider range, but lower precision.
  - The difference between two successive representable numbers is not uniform.
  - When a number cannot be converted exactly to base 2, it must be rounded to fit the format, leading to cases where algebraic rules do not appear to apply.
- The LINPACK benchmark produces a FLOPS result. It solves a dense system of linear equations by Gaussian elimination.
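The range, the non-uniform spacing, and the rounding effects listed above can be observed directly. The following is a minimal sketch, assuming that Python's `float` is the 64-bit layout described above (true on common platforms); the helper name `spacing` is introduced here for illustration only.

```python
import math
import sys


def spacing(x):
    """Gap between x and the next larger double (x positive and normal)."""
    _, exponent = math.frexp(x)  # x = m * 2**exponent with 0.5 <= m < 1
    # 52 stored mantissa bits plus one implicit leading bit.
    return math.ldexp(1.0, exponent - 53)


# Range of the 64-bit layout (11-bit exponent, 52-bit mantissa).
largest = sys.float_info.max
smallest_normal = sys.float_info.min

# Successive numbers are not uniformly spaced: the gap grows with magnitude.
gap_near_one = spacing(1.0)
gap_near_million = spacing(1.0e6)

# 0.1, 0.2 and 0.3 cannot be converted exactly to base 2, so they are
# rounded, and familiar algebraic rules appear to fail.
sum_matches = (0.1 + 0.2 == 0.3)
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print("largest:", largest, "smallest normal:", smallest_normal)
print("gap near 1:", gap_near_one, "gap near 1e6:", gap_near_million)
print("0.1 + 0.2 == 0.3 ?", sum_matches)
print("associative ?", left_to_right == right_to_left)
```

The last two lines show addition failing both an equality check and associativity, which is the kind of behaviour the final bullet on rounding refers to.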
2 Example of Floating Point Numbers
- 172.625 (base 10)
- 10101100.101 × 2^0 (base 2)
- 1.0101100101 × 2^7 (base 2, normalised)
- Using 32 bits (4 bytes) to store the number, with 1 bit for the sign, 8 bits for the exponent, and the remaining 23 bits for the mantissa:
  0 00000111 00000000000010101100101
  S Exp      Mantissa
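The layout above stores the exponent and the mantissa digits directly. For comparison, the sketch below shows how the same value is encoded under IEEE 754 single precision, which is an assumption here since the slide does not name a standard. IEEE 754 adds a bias of 127 to the exponent and omits the leading 1 of the normalised mantissa, so the bit pattern differs from the simplified one shown.

```python
import struct

value = 172.625

# Pack as a big-endian IEEE 754 single, then view the 32 bits as an integer.
(bits,) = struct.unpack(">I", struct.pack(">f", value))
pattern = f"{bits:032b}"

sign = pattern[0]         # 1 bit
exponent = pattern[1:9]   # 8 bits, stored with a bias of 127
mantissa = pattern[9:]    # 23 bits, leading 1 not stored

unbiased_exponent = int(exponent, 2) - 127

print(sign, exponent, mantissa)
print("unbiased exponent:", unbiased_exponent)
```

The unbiased exponent is 7, matching the normalised form 1.0101100101 × 2^7, and the stored mantissa begins with the fraction digits 0101100101.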
3 Metrics
- MIPS (Millions of Instructions Per Second): a measure of the speed of a processor.
  - Peak MIPS rates (usually vendor supplied) can be misleading.
  - Jokingly expanded as "Meaningless Information on Performance for Salespeople".
  - People seldom refer to it.
4 Metrics
- SPECint: measures a processor's integer processing capabilities.
  - Latest version: SPECint2006
  - Can test the CPU, memory, and compiler, but cannot test networking or I/O.
  - Consists of a series of 12 benchmarks (including compression and compilation).
  - Each benchmark has a reference time.
  - Dividing the reference time by the measured runtime of the benchmark and multiplying by 100 gives a base ratio.
  - For example, suppose we run the benchmark 401.bzip2, whose reference time is 1400 sec, and the actual runtime is 140 sec. The base ratio is then 1400 / 140 × 100 = 1000.
  - The ratios are averaged (SPEC uses the geometric mean) to produce a final performance figure for the processor.
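The base-ratio calculation can be sketched as follows. The function names are illustrative, and the second benchmark's timings are made-up numbers used only to show the averaging step; only the 401.bzip2 figures come from the example above.

```python
import math


def base_ratio(reference_time, measured_time):
    """Reference time over measured runtime, scaled by 100."""
    return reference_time / measured_time * 100


def geometric_mean(values):
    """Geometric mean, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Figures from the 401.bzip2 example above.
bzip2_ratio = base_ratio(1400, 140)

# Hypothetical second benchmark, for illustration only.
other_ratio = base_ratio(1000, 400)

overall = geometric_mean([bzip2_ratio, other_ratio])
print(bzip2_ratio, other_ratio, overall)
```

A faster machine gives a shorter runtime and therefore a larger ratio, which is why the reference time is the numerator.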
5 SPECint2006 benchmark suite
Benchmark        Language   Category
400.perlbench    C          Programming Language
401.bzip2        C          Compression
403.gcc          C          C Compiler
429.mcf          C          Combinatorial Optimization
445.gobmk        C          Artificial Intelligence
456.hmmer        C          Search Gene Sequence
458.sjeng        C          Artificial Intelligence
462.libquantum   C          Physics / Quantum Computing
464.h264ref      C          Video Compression
471.omnetpp      C++        Discrete Event Simulation
473.astar        C++        Path-finding Algorithms
483.xalancbmk    C++        XML Processing
6 Metrics
- Communication
  - Bandwidth (bytes/sec)
    - How much data can be sent per second over the network
  - Latency (seconds)
    - The time between one processor sending a message and the other processor receiving the message
  - Interconnection type: on-board interconnection or over networks
  - Topologies: bus, crossbar, hub, switch
  - Protocol stacks
  - Unicast, multicast, broadcast
- Storage capabilities
  - Storage facilities: register, cache, memory, hard disk
  - Bandwidth and latency
    - Bandwidth: how much data can be accessed per second in a given storage facility
    - Latency: the time between sending a data access request and receiving the requested data
  - Memory hierarchies (CPU register -> cache -> main memory -> remote memory)
  - Local and remote file systems
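Bandwidth and latency are often combined into a simple first-order cost model: transfer time = latency + size / bandwidth. This model is an assumption added here, not something stated in the slides, and the link parameters below are hypothetical values chosen for illustration.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate: fixed latency plus time to push the bytes."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical link: 100 microseconds latency, 100 MB/s bandwidth.
LATENCY = 1e-4
BANDWIDTH = 1e8

small = transfer_time(8, LATENCY, BANDWIDTH)           # one 8-byte value
large = transfer_time(1_000_000, LATENCY, BANDWIDTH)   # a 1 MB block

# Share of the total time spent waiting on latency.
small_latency_share = LATENCY / small
large_latency_share = LATENCY / large

print(small, small_latency_share)
print(large, large_latency_share)
```

Under this model small messages are dominated by latency and large ones by bandwidth, which is why both figures are quoted for networks and for storage.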
7 Top500 Supercomputer list
- Website: www.top500.org
- The Top500 project started in 1993 and is updated twice a year.
- Aims to track trends in HPC.
- Uses LINPACK to measure performance (FLOPS).
  - Essentially, LINPACK solves a dense system of linear equations Ax = b (commonly encountered in engineering).
  - Users are allowed to change the problem size to get the maximum performance, which is used to rank the supercomputers.
  - Theoretical peak performance is also given for reference.
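To make the measurement concrete, here is a minimal sketch of solving a dense system Ax = b by Gaussian elimination with partial pivoting, then converting the elapsed time into a FLOPS figure. It is not the LINPACK code itself: the function names are illustrative, the matrix is random, and the operation count 2/3 n^3 + 2 n^2 is the nominal count conventionally used for this benchmark rather than a count of what this code executes.

```python
import random
import time


def solve_dense(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def measure_flops(n, seed=0):
    """Time one solve of a random n-by-n system and report nominal FLOPS."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    operations = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return operations / elapsed, x, a, b


if __name__ == "__main__":
    for size in (50, 100, 200):
        flops, _, _, _ = measure_flops(size)
        print(size, f"{flops:.3e} FLOPS")
```

Running it for several sizes mirrors the rule that the problem size may be varied to find the best achieved rate.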
8 Top500 Supercomputer list
- Tends to represent parallel computers, so distributed systems such as SETI@home are neglected.
- Does not consider storage or I/O issues.
- Both custom-designed machines and commodity machines win positions in the list.
- General trend towards commodity machines (COTS: Commodity Off-The-Shelf). BlueGene/L, however, is not a COTS machine.
- Connecting a large number of machines with relatively low performance is more rewarding than connecting a small number of machines each with high performance.
- Read the paper "A note on the Zipf distribution of Top500 supercomputers" (download from my homepage).
- Performance doubles each year, better than Moore's Law.
  - Moore's Law: performance doubles approximately every 18 months.
- Dominated by the United States (location map of the Top100 machines: http://www.top500.org/lists/2006/11/top100map).
- UK supercomputers in the list
  - Cambridge: No. 20 (http://www.top500.org/system/8267)
  - AWE: No. 15
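The gap between the two doubling rates compounds quickly. As a rough illustration, using only the two doubling periods quoted above and an arbitrarily chosen ten-year horizon:

```python
def growth_factor(years, doubling_period_years):
    """Total growth after `years` if performance doubles every period."""
    return 2.0 ** (years / doubling_period_years)


YEARS = 10  # horizon chosen for illustration

top500_growth = growth_factor(YEARS, 1.0)   # doubling every year
moore_growth = growth_factor(YEARS, 1.5)    # doubling every 18 months

print(top500_growth, moore_growth)
```

Over a decade, yearly doubling gives a factor of 1024, against roughly 100 for doubling every 18 months.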
9 Top Machine
- BlueGene/L
  - First supercomputer in the Blue Gene project.
  - Specialised system based on the Power architecture.
  - Individual PowerPC 440 processors at 700 MHz.
  - Two processors reside in a single chip.
  - Two chips reside on a compute card with 512 MB memory.
  - 16 of these compute cards are placed on a node board.
  - 32 node boards fit into one cabinet, and there are 64 cabinets.
  - 131,072 CPUs with a theoretical peak of 183.5 TFLOPS.
  - Multiple network topologies are available, which can be selected depending on the application.
  - High density of processors in a small area.
  - Low-power and (comparatively) slow processors, just lots of them!
  - Fast, low-latency interconnects.
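Multiplying out the packaging hierarchy listed above gives the total processor count. This is plain arithmetic on the figures in the list; the variable names are illustrative.

```python
# Packaging hierarchy as listed above.
PROCESSORS_PER_CHIP = 2
CHIPS_PER_COMPUTE_CARD = 2
COMPUTE_CARDS_PER_NODE_BOARD = 16
NODE_BOARDS_PER_CABINET = 32
CABINETS = 64

processors_per_cabinet = (
    PROCESSORS_PER_CHIP
    * CHIPS_PER_COMPUTE_CARD
    * COMPUTE_CARDS_PER_NODE_BOARD
    * NODE_BOARDS_PER_CABINET
)
total_processors = processors_per_cabinet * CABINETS

print(processors_per_cabinet, total_processors)
```

Each cabinet holds 2,048 processors, and 64 cabinets hold 131,072.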