Is a Grid cost-effective? - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Is a Grid cost-effective?
Ralf Gruber, EPFL-SIC/FSTI-ISE-LIN, Lausanne
2
HPC in Europe
TOP500: 176 in Europe, 12 have more than 1 Tflop/s Linpack
First is CEA-DAM, No. 7
Germany 71, UK 39, France 22, Italy 16, Others 28
Industry: 108, first (Telecom I) at No. 96
BMW 11, Daimler-Chrysler 5, Car F 6
Not one big, but many smaller machines
HPC companies:
Quadrics
Scali: SCI-based clusters, No. 51
SCS: see Toni's presentation
Beowulf production: Paralline, Dalco, ......
3
Swiss-Tx project
The Swiss-Tx machines (with TNet switch)
1998: Prototype Swiss-T0 with 16 Alphas 21164
1999: Swiss-T1 (Baby) with 16 Alphas 21264
2000: Swiss-T1 with 70 Alphas 21264
Know-how transfer to industry
2001: GeneProt protein sequencing machine with 1420 Alphas 21264
Peak performance: 1780 Gflop/s
In June 2001, it would have been No. 12 in the Top500 and 2nd in Europe, and it was world number 1 of industrial computer installations
It would be No. 48 (C-Plant) in the Top500 list of November 2002, and it is still number 2 of industrial computer installations
4
Is a grid cost-effective?
NO!
Reasons:
For 25 years, we have been able to use machines all over the world
Those who needed good connections installed them (HEPNET, Swissprot, ..)
Using Java is against HPC
5
Parallel machines at EPFL and CSCS
EPFL-SIC:
SGI Origin3800 (500 MHz), 128 processors
HP Alpha ES45/Quadrics (1.25 GHz), 100 processors
Institutes:
PC clusters (CFD, Chemistry, Mathematics, Physics)
IBM SP-2 (EFD)
CSCS:
NEC SX-5 (16 processors)
IBM Regatta (256 processors, 1.3 GHz)
6
Optimal grid scheduling
Parameterisation of
. Single processor
. Cluster
. Application
Application-tailored Grid scheduling
7
Characteristic single processor parameters $V_a$ and $r_a$
$V_a$ = Operations (Ops) / Memory accesses (LS)
$r_a = \min(R, R V_a / V_m) = \min(R, M V_a)$
Examples:
SAXPY ($y = y + a x$): Ops = 2, LS = 3 (2 loads + 1 store), $V_a = 2/3$ -> $r_a = \frac{2}{3} M$
Matrix-matrix multiply and add: $V_a = n/2$ -> $r_a = R$
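The two examples on this slide can be checked numerically. The following is a minimal sketch, not code from the presentation: the function names are hypothetical, and the values R = 2000 Mflop/s and M = 333 Mword/s are borrowed from the Alpha 21264 row of the MATMULT results table purely for illustration.

```python
def arithmetic_ratio(ops, memory_accesses):
    """Va = operations / memory accesses (loads + stores)."""
    return ops / memory_accesses


def peak_application_rate(R, M, Va):
    """ra = min(R, M * Va): bounded by the processor or by the memory."""
    return min(R, M * Va)


# Illustrative machine values (Alpha 21264 row of the MATMULT table).
R = 2000.0  # theoretical peak performance, Mflop/s
M = 333.0   # theoretical peak memory bandwidth, Mword/s

# SAXPY: 2 operations, 3 memory accesses (2 loads + 1 store).
Va_saxpy = arithmetic_ratio(2, 3)
ra_saxpy = peak_application_rate(R, M, Va_saxpy)   # memory bound: (2/3) * M

# Matrix-matrix multiply and add with Va = n / 2; n = 1000 is an arbitrary choice.
n = 1000
ra_matmul = peak_application_rate(R, M, n / 2)     # processor bound: R

print(f"SAXPY:  Va = {Va_saxpy:.3f}, ra = {ra_saxpy:.0f} Mflop/s")
print(f"MATMUL: Va = {n / 2:.0f}, ra = {ra_matmul:.0f} Mflop/s")
```

With these numbers SAXPY is limited by memory bandwidth and the matrix multiply by the processor, which is the distinction the slide draws.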
8
Results with MATMULT
$V_a = 1$ (double precision)
$V_m = R$ [Mflop/s] / $M$ [Mword/s]
$R$ [Mflop/s]: Theoretical peak performance
$M$ [Mword/s]: Theoretical peak memory bandwidth
Machine           P   R      ra = M   Vm   r     100 r/ra
NEC SX-5          1   8000   8000     1
Pentium 4 1.5/R   1   1500   400      4    229   57
Alpha 21264       2   2000   333      6    200   60
Pentium 4 1.7/S   1   1700   133      12   92    69
AMD 1.2/S         1   2400   133      18   57    43
r: Measured performance
/S: Slow SDRAM memory
/R: Fast Rambus or RDRAM memory
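The last column of the table can be recomputed from the others. The sketch below assumes that column is 100 r / ra with ra = M (since Va = 1); the function names are hypothetical. The NEC SX-5 row is left out because its measured value does not survive in the transcript.

```python
# (machine, R [Mflop/s], M [Mword/s], measured r [Mflop/s]) as read from the table.
ROWS = [
    ("Pentium 4 1.5/R", 1500, 400, 229),
    ("Alpha 21264", 2000, 333, 200),
    ("Pentium 4 1.7/S", 1700, 133, 92),
    ("AMD 1.2/S", 2400, 133, 57),
]


def machine_ratio(R, M):
    """Vm = R / M."""
    return R / M


def efficiency_percent(r, ra):
    """Measured performance as a percentage of the achievable peak ra."""
    return 100.0 * r / ra


for name, R, M, r in ROWS:
    ra = min(R, M * 1.0)  # Va = 1 for MATMULT, so ra = M on these machines
    print(f"{name:16s} Vm = {machine_ratio(R, M):5.1f}  "
          f"100 r/ra = {efficiency_percent(r, ra):4.1f}")
```

The recomputed percentages round to the values listed in the table.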
9
Tailoring clusters to applications
$G > 1$
10
Tailoring clusters to applications
$G = g_a / g_m$
Application: $g_a = O / S$
Machine: $g_m = r_a / b$
$O$: Number of operations, in flops
$S$: Number of words sent, in words
$r_a$: Theoretical peak performance of the application, in Mflop/s
$b$: Peak network bandwidth per processor, in Mword/s
11
Cluster characterisation
$g_m = r_a / b$, $b = C / (P \langle d \rangle)$
$g_m = P r_a$ [Mflop/s] $\langle d \rangle$ / $C$ [Mword/s]
Table: The $g_m$ values for MATMULT (double precision)
Machine              P        P ra [Mflop/s]   C [Mword/s]   <d>    gm
T1 (TNet)            32 x 2   21333            640           1.25   40
T1 (Fast Ethernet)   32 x 2   21333            48            1      444
IELNX (P4FE)         22       8800             34            1      250
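The gm column follows from the other columns of the table. The sketch below applies the relation gm = P ra <d> / C to the three rows; the function and argument names are hypothetical. The computed values agree with the tabulated ones only to within rounding (about 42 against the listed 40 for TNet, and about 259 against the listed 250 for IELNX).

```python
def cluster_gm(total_peak_mflops, total_bandwidth_mwords, avg_distance):
    """gm = (P * ra) * <d> / C, with P * ra in Mflop/s and C in Mword/s."""
    return total_peak_mflops * avg_distance / total_bandwidth_mwords


# (machine, P * ra [Mflop/s], C [Mword/s], <d>) as read from the table.
CLUSTERS = [
    ("T1 (TNet)", 21333, 640, 1.25),
    ("T1 (Fast Ethernet)", 21333, 48, 1.0),
    ("IELNX (P4FE)", 8800, 34, 1.0),
]

for name, total_peak, C, d in CLUSTERS:
    print(f"{name:20s} gm = {cluster_gm(total_peak, C, d):6.1f}")
```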
12
LAUTREC on Swiss-T1 TNet
Swiss-T1 (TNet): $r_a = 1000$ Mflop/s, $b = 10$ Mword/s, $g_m = 100$
Water molecules: $g_a = 5P(0.65 N_{orb} + 4.24 \log_2 V) / (3(P-1))$
$P = 8$, $N_{orb} = 128$, $\log_2 V = 20$: $g_a = 330$
$G = 3.3$ (3.6 measured)
-> 25% of overall time is due to communication, 75% is due to computation
13
LAUTREC on Swiss-T1 Fast Ethernet
Swiss-T1 (FE): $r_a = 2000$ Mflop/s, $b = 1.5$ Mword/s, $g_m = 1333$
Water molecules: $g_a = 5P(0.65 N_{orb} + 4.24 \log_2 V) / (3(P-1))$
$P = 8$, $N_{orb} = 128$, $\log_2 V = 20$: $g_a = 330$
$G = 0.25$ (0.25 measured)
-> 20% of overall time is due to computation, 80% is due to communication
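The percentages on the two LAUTREC slides follow from G if computation and communication do not overlap. Since G = ga / gm = (O / ra) / (S / b), G is the ratio of computation time to communication time, so the communication share is 1 / (1 + G). The sketch below rests on two assumptions: the no-overlap model, and reading the lost operator in the ga formula as a plus sign. The function names are hypothetical.

```python
def ga_lautrec(P, n_orb, log2_v):
    """ga = 5 P (0.65 Norb + 4.24 log2 V) / (3 (P - 1)); the '+' is assumed."""
    return 5 * P * (0.65 * n_orb + 4.24 * log2_v) / (3 * (P - 1))


def time_shares(G):
    """Return (communication share, computation share), assuming no overlap."""
    communication = 1.0 / (1.0 + G)
    return communication, 1.0 - communication


# Under the assumed '+', the formula gives 320; the slides quote 330.
ga_formula = ga_lautrec(P=8, n_orb=128, log2_v=20)
print(f"ga from the formula: {ga_formula:.0f}")

ga_slide = 330.0  # value quoted on the slides
for network, gm in [("TNet", 100.0), ("Fast Ethernet", 1333.0)]:
    G = ga_slide / gm
    comm, comp = time_shares(G)
    print(f"{network:14s} G = {G:.2f}  communication {100 * comm:.0f}%  "
          f"computation {100 * comp:.0f}%")
```

With the slide values this gives roughly 23% communication on TNet and 80% on Fast Ethernet, close to the 25% and 80% stated.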
14
LAUTREC: Effect of latency
TNet/Swiss-T1: $L = 13$ µs MPI latency, $b = 80$ MB/s
Break-even message length: beml = $L b \approx 1000$ B
Fast Ethernet: $L = 100$ µs MPI latency, $b = 10$ MB/s
Break-even message length: beml = $L b \approx 1000$ B
Average message length in Lautrec: aml = pV/16P2
For test case ($V = 96^3$, $P = 8$): aml = 40 kB >> beml
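The break-even message length is the size at which latency and transfer time are equal. The sketch below assumes the latencies are in microseconds, which is what makes L times b come out near 1000 B for both networks. It also assumes a simple cost model, time = L + size / b, to estimate how much of a 40 kB message is latency. The function names are hypothetical.

```python
def break_even_message_length(latency_s, bandwidth_bytes_per_s):
    """Message size in bytes at which latency equals transfer time."""
    return latency_s * bandwidth_bytes_per_s


def latency_share(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the message time spent in latency, for time = L + size / b."""
    transfer = message_bytes / bandwidth_bytes_per_s
    return latency_s / (latency_s + transfer)


NETWORKS = {
    "TNet": (13e-6, 80e6),            # L = 13 us, b = 80 MB/s
    "Fast Ethernet": (100e-6, 10e6),  # L = 100 us, b = 10 MB/s
}
AML_BYTES = 40_000  # average LAUTREC message length for the test case

for name, (L, b) in NETWORKS.items():
    beml = break_even_message_length(L, b)
    share = latency_share(AML_BYTES, L, b)
    print(f"{name:14s} beml = {beml:6.0f} B  latency share at 40 kB = {100 * share:.1f}%")
```

Under this model latency is only a few percent of the message time on either network, consistent with aml >> beml.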
15
Point-to-point applications
$g_a$ = Operations ($O$) / Sends ($S$)
FE/FV: $O$ is proportional to Nb of volume nodes x Nb of variables per node squared x Nb of non-zero matrix elements x Nb of operations per matrix element
FE/FV: $S$ is proportional to Nb of surface nodes x Nb of variables per node
FE/FV: $g_a$ is proportional to Nb of nodes in one direction x Nb of variables per node x Nb of non-zero matrix elements x Nb of operations per matrix element x 1 / Nb of surfaces
$g_a$ (NS/FV/$100^3$) $\approx 2000$
$g_a$ (Poisson/FD/$100^3$) $\approx 400$
Reminder (Beowulf + Fast Ethernet): $g_m \approx 250$
16
Other quantities
Memory usage
Price per 1h CPU time
Engineering salary
Energy consumption
Maintenance/servicing/personnel costs
User convenience
17
Optimal Grid scheduling
Goal: Add application-tailored Grid scheduling to the RMS
. Estimate machine and application parameters by counts
. Measure machine and application parameters (PAPI, ...)
. Build up a database of these parameters
. Find and submit to the best suited Grid resource (not always the optimum)
. Update the database dynamically
. Perform statistics on decisions and decision failures
18
Optimal Grid scheduling
Define and apply rules to find the best suited resource, based on
. Match between machine and application (MPI or not MPI)
. Best price/performance ratio based on parameterisation
. Availability of the resources
. Engineering costs
. Energy consumption
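The first three rules above can be sketched as a simple filter followed by a price comparison. This is an illustration only, not the scheduler proposed in the talk. The Resource and Application classes, the G > 1 threshold for MPI jobs, and all prices are hypothetical. The gm values are taken from the cluster characterisation table and ga = 330 from the LAUTREC slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    gm: float                  # machine parameter gm = ra / b
    price_per_cpu_hour: float  # hypothetical price, arbitrary units
    available: bool


@dataclass
class Application:
    name: str
    ga: float                  # application parameter ga = O / S
    uses_mpi: bool


def pick_resource(app: Application, resources: List[Resource]) -> Optional[Resource]:
    """Return the cheapest available resource that suits the application."""
    suitable = []
    for res in resources:
        if not res.available:
            continue
        # Assumed rule: an MPI job only fits a machine where G = ga / gm > 1.
        if app.uses_mpi and app.ga / res.gm <= 1.0:
            continue
        suitable.append(res)
    if not suitable:
        return None
    return min(suitable, key=lambda res: res.price_per_cpu_hour)


resources = [
    Resource("T1 (TNet)", gm=40.0, price_per_cpu_hour=1.0, available=True),
    Resource("T1 (Fast Ethernet)", gm=444.0, price_per_cpu_hour=0.3, available=True),
    Resource("IELNX (P4FE)", gm=250.0, price_per_cpu_hour=0.2, available=True),
]
lautrec = Application("LAUTREC", ga=330.0, uses_mpi=True)

choice = pick_resource(lautrec, resources)
print(choice.name if choice else "no suitable resource")
```

Engineering costs and energy consumption are left out here; they could be folded into the price term once they have been measured.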
19
Optimal Grid scheduling
Perform statistics to
. Detect unavailable resources that are requested too often
. Detect the real costs of an application
. Detect applications that should be parallelised/optimised to reduce costs
. Guide decision making for the next purchase
. Guide decisions on the attribution of R&D money
20
Is a grid cost-effective?
Yes, it can be!
Minimise overall costs by application-adapted job execution
Purchase low-cost resources that are in demand but not available
Parallelise cost-ineffective applications
Reduce engineering and energy costs
Note: Cheap resources do not have to be in use 90% of the time
Results in:
More computing resources for the same price
More rapid increase of application efficiencies
Questions:
Do computer manufacturers play the game?
Do application owners play the game?
Can we change users, decision makers and computing centres?
21
Reference
R. Gruber, P. Volgers, A. de Vita, M. Stengel, T.-M. Tran, Parameterisation to tailor commodity clusters to applications, Future Generation Computer Systems 19 (2003) 111-120
See also http://sawww.epfl.ch/SIC/SA/publications/SCR02/scr13e.html