Title: PowerPoint Presentation
1. www.grid5000.fr
One of the 30 ACI Grid projects
Grid5000: Motivations, Design and Status of the Grid5000 Computer Science Grid
Franck Cappello, Senior Researcher, INRIA, Director of Grid5000
Email: fci@lri.fr
5000 CPUs
2. Agenda
- Motivation
- Grid5000 design
- Grid5000 status
- Early results
- Conclusion
3. ACI GRID projects
- Peer-to-Peer
  - CGP2P (F. Cappello, LRI/CNRS)
- Application Service Provider
  - ASP (F. Desprez, ENS Lyon/INRIA)
- Algorithms
  - TAG (S. Genaud, LSIIT)
  - ANCG (N. Emad, PRISM)
  - DOC-G (V-D. Cung, UVSQ)
- Compiler techniques
  - Métacompil (G-A. Silbert, ENMP)
- Networks and communication
  - RESAM (C. Pham, ENS Lyon)
  - ALTA (C. Pérez, IRISA/INRIA)
- Visualisation
  - EPSN (O. Coulaud, INRIA)
- Data management
  - PADOUE (A. Doucet, LIP6)
  - MEDIAGRID (C. Collet, IMAG)
- Tools / Code coupling
  - RMI (C. Pérez, IRISA)
  - CONCERTO (Y. Maheo, VALORIA)
  - CARAML (G. Hains, LIFO)
- Applications
  - COUMEHY (C. Messager, LTHE) - Climate
  - GenoGrid (D. Lavenier, IRISA) - Bioinformatics
  - GeoGrid (J-C. Paul, LORIA) - Oil reservoir
  - IDHA (F. Genova, CDAS) - Astronomy
  - Guirlande-fr (L. Romary, LORIA) - Language
  - GriPPS (C. Blanchet, IBCP) - Bioinformatics
  - HydroGrid (M. Kern, INRIA) - Environment
  - Medigrid (J. Montagnat, INSA-Lyon) - Medical
- Grid Testbeds
  - CiGri-CIMENT (L. Desbat, UjF)
  - Mecagrid (H. Guillard, INRIA)
  - GLOP (V. Breton, IN2P3)
  - GRID5000 (F. Cappello, INRIA)
- Support for dissemination
4. Needs for urgent situations!
5. Existing Grid Research Tools
- France
  - SimGrid and SimGrid2
    - Discrete event simulation with trace injection
    - Originally dedicated to scheduling studies
    - Single user, multiple servers
- Australia
  - GridSim
    - Dedicated to scheduling (with deadlines), discrete event simulation (Java)
    - Multi-clients, multi-brokers, multi-servers
- Japan
  - Titech Bricks
    - Discrete event simulation for scheduling and replication studies
- USA
  - GangSim
    - Scheduling inside and between VOs
  - MicroGrid
    - Emulator, dedicated to Globus; virtualizes resources and time, network (MaSSF)
- Nowhere to test networking/OS/middleware ideas or to measure real application performance
- Simulation and emulation are quite slow
6. We need experimental tools
In the first half of 2003, the design and development of an experimental platform for Grid researchers was decided: Grid5000 as a real-life system.
[Figure: experimental tools positioned by log(realism) vs. log(cost and coordination). Math (model and protocol proofs) and simulation (SimGrid, MicroGrid, Bricks, NS, etc.) are reasonable; emulation (Data Grid eXplorer, WANinLab, Emulab) is challenging; live systems (Grid5000, DAS, PlanetLab, GENI - this talk) are a major challenge. Also shown: RAMP, Dave Patterson's project on a multicore/multiprocessor emulator.]
7. The Grid5000 Project
- 1) Build a nation-wide experimental platform for large-scale Grid and P2P experiments
  - 9 geographically distributed sites
  - Every site hosts a cluster (from 256 CPUs to 1K CPUs)
  - All sites are connected by RENATER (French Research and Education Network)
  - RENATER hosts probes to trace network load conditions
  - Design and develop a system/middleware environment to safely test and repeat experiments
- 2) Use the platform for Grid experiments in real-life conditions
  - Port and test applications, develop new algorithms
  - Address critical issues of Grid system/middleware: programming, scalability, fault tolerance, scheduling
  - Address critical issues of Grid networking: high-performance transport protocols, QoS
  - Investigate original mechanisms: P2P resource discovery, Desktop Grids
8. Planning
[Timeline, June 2003 to 2007: discussions and prototypes; funded; installations (clusters, network); first experiments; preparation and calibration; experiments; international collaborations (CoreGrid). The processor count grows from 1250 CPUs through roughly 2000, 2300, 2500 and 3500 toward 5000; a "Today" marker sits near the current point.]
9. Agenda
- Motivation
- Grid5000 design
- Grid5000 status
- Early results
- Conclusion
10. Grid5000 foundations: Collection of experiments to be done
- Applications
  - Multi-parametric applications (climate modeling / functional genomics)
  - Large-scale experimentation of distributed applications (electromagnetism, multi-material fluid mechanics, parallel optimization algorithms, CFD, astrophysics)
  - Medical images, collaborative tools in virtual 3D environments
- Programming
  - Component programming for the Grid (Java, Corba)
  - GRID-RPC
  - GRID-MPI
  - Code coupling
- Middleware / OS
  - Scheduling / data distribution in Grids
  - Fault tolerance in Grids
  - Resource management
  - Grid SSI OS and Grid I/O
  - Desktop Grid / P2P systems
- Networking
  - End-host communication layer (interference with local communications)
  - High-performance long-distance protocols (improved TCP)
  - High-speed network emulation
https://www.grid5000.fr/index.php/Grid5000Experiments
Allow experiments at any level of the software stack.
11. Grid5000 foundations: Measurements and condition injection
- Quantitative metrics
  - Performance: execution time, throughput, overhead, QoS (batch, interactive, soft real-time, real-time)
  - Scalability: resource occupation (CPU, memory, disk, network), application algorithms, number of users, number of resources
  - Fault tolerance: tolerance to very frequent failures (volatility), tolerance to massive failures (a large fraction of the system disconnects), fault-tolerance consistency across the software stack
- Experimental condition injection
  - Background workloads: CPU, memory, disk, network; traffic injection at the network edges
  - Stress: high number of clients, servers, tasks, data transfers
  - Perturbation: artificial faults (crash, intermittent failure, memory corruption, Byzantine), rapid platform reduction/increase, slowdowns, etc.
Allow users to run their favorite measurement tools and experimental condition injectors; a background-load sketch follows below.
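As a minimal illustration of background-workload injection (a sketch only, not a Grid5000 tool; the parameters are invented), the following Python snippet burns CPU and holds memory on a node for a fixed duration:

```python
import multiprocessing as mp
import time

def burn_cpu(stop_at: float) -> None:
    """Busy-loop until the deadline to create CPU background load."""
    x = 0
    while time.time() < stop_at:
        x = (x * 1103515245 + 12345) % (1 << 31)  # cheap arithmetic churn

def inject_background_load(cpu_workers: int, mem_mb: int, duration_s: float) -> None:
    """Occupy `cpu_workers` cores and roughly `mem_mb` MB of RAM for `duration_s` seconds."""
    stop_at = time.time() + duration_s
    ballast = bytearray(mem_mb * 1024 * 1024)  # memory pressure, kept alive until the end
    workers = [mp.Process(target=burn_cpu, args=(stop_at,)) for _ in range(cpu_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    del ballast

if __name__ == "__main__":
    # e.g. load 4 cores and 512 MB for one minute during an experiment
    inject_background_load(cpu_workers=4, mem_mb=512, duration_s=60.0)
```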
12. Grid5000 principle: A highly reconfigurable experimental platform
[Diagram: the user-controlled software stack - application, programming environments, application runtime, Grid or P2P middleware, operating system, networking - flanked by measurement tools and an experimental conditions injector.]
Let users create, deploy and run their own software stack, including the software under test, their measurement tools and their experimental condition injectors.
13. Experiment workflow
1. Log into Grid5000; import data/codes.
2. Build an environment? If yes: reserve 1 node, reboot it into an existing environment, adapt the environment, reboot, and repeat until the environment is OK.
3. Reserve the nodes corresponding to the experiment.
4. Reboot the nodes into the user's experimental environment (optional).
5. Transfer parameters and run the experiment.
6. Collect the experiment results.
7. Exit Grid5000.
Environments available on all sites: Fedora4all, Ubuntu4all, Debian4all.
A scripted sketch of this workflow follows below.
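A rough scripted version of this workflow (a sketch only: `oarsub` is OAR's real submission command - OAR is named on slide 18 - but `mydeploy`, the environment name and the node names are placeholders):

```python
import subprocess

def sh(cmd: str) -> str:
    """Run a command on the Grid5000 frontend and return its stdout."""
    return subprocess.run(cmd, shell=True, check=True,
                          capture_output=True, text=True).stdout

# 1) Reserve nodes for the experiment with OAR (one batch scheduler per site).
#    "-l nodes=8,walltime=2" asks for 8 nodes during 2 hours.
print(sh("oarsub -l nodes=8,walltime=2 'sleep 7200'"))

# 2) Optionally reboot the reserved nodes into the user's environment.
#    'mydeploy' and 'debian4all' stand in for the site's image-deployment
#    tool and a registered environment image (cf. Fedora4all/Ubuntu4all/Debian4all).
sh("mydeploy --env debian4all --nodes nodes.txt")

# 3) Transfer parameters, run the experiment, collect the results.
sh("scp params.conf node-1:/tmp/")
sh("ssh node-1 '/tmp/run_experiment.sh /tmp/params.conf'")
sh("scp node-1:/tmp/results.tar.gz ./")
```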
14. Agenda
- Motivation
- Grid5000 design
- Grid5000 status
- Early results
- Conclusion
15. Grid5000 map
GdX, Sept. 2007: 1040 CPUs (dual-core?), Myrinet 10G, 512 ports.
CPU counts per site, target and (in parentheses) currently installed:
- Lille: 500 (106)
- Nancy: 500 (94)
- Rennes: 518 (518)
- Orsay: 1000 (684)
- Lyon: 500 (252)
- Bordeaux: 500 (96)
- Grenoble: 500 (270)
- Toulouse: 500 (116)
- Sophia Antipolis: 500 (434)
16. Hardware Configuration
17. Grid5000 network
- RENATER connections: 10 Gbps
- Dark fiber, dedicated lambda: fully isolated traffic!
18. Grid5000 as an Instrument
Four main features:
- High security for Grid5000 and the Internet, despite the deep reconfiguration feature
  -> Grid5000 is confined: communications between sites are isolated from the Internet and vice versa (level-2 MPLS, dedicated lambda).
- A software infrastructure allowing users to access Grid5000 from any Grid5000 site and have a simple view of the system
  -> A user has a single account on Grid5000; Grid5000 is seen as a cluster of clusters, with 9 unsynchronized home directories (1 per site).
- Reservation/scheduling tools allowing users to select nodes and schedule experiments: a reservation engine plus batch scheduler (1 per site), OAR, and a Grid-level co-reservation scheduling system.
- A user toolkit to reconfigure the nodes -> software image deployment and node reconfiguration tool.
19. OS reconfiguration techniques: Reboot or virtual machines
- Virtual machines: no need for reboot, but selecting a virtual machine technology is not so easy. Xen has some limitations:
  - Xen 3 offered only initial support for Intel VT-x
  - Xen 2 does not support x86/64
  - Many patches are not supported
  - High overhead on high-speed networks
- Reboot: remote control with IPMI, RSA, etc.; disk repartitioning if necessary; reboot or kernel switch (kexec).
Currently we use reboot, but Xen will be used in the default environment. Let users select their experimental environment: fully dedicated, or shared within virtual machines. A kexec sketch follows below.
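For the kexec path, a node can jump into a freshly staged kernel without going back through the firmware, which is what makes kernel switching faster than a full reboot. A minimal sketch using the standard kexec-tools CLI (paths and command line are placeholders; requires root):

```python
import subprocess

def kexec_switch(kernel: str, initrd: str, cmdline: str) -> None:
    """Load a new kernel with kexec and jump into it, skipping firmware/POST."""
    # Stage the target kernel, initrd and kernel command line.
    subprocess.run(["kexec", "-l", kernel,
                    f"--initrd={initrd}",
                    f"--append={cmdline}"], check=True)
    # Jump into the staged kernel immediately (this call does not return).
    subprocess.run(["kexec", "-e"], check=True)

if __name__ == "__main__":
    # Placeholder paths for a user-supplied experimental environment.
    kexec_switch("/boot/vmlinuz-exp", "/boot/initrd-exp.img",
                 "root=/dev/sda2 ro console=ttyS0")
```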
20. [Figure montage of the sites: Rennes, Lyon, Sophia, Grenoble, Bordeaux, Toulouse, Orsay.]
21. Experimental condition injectors
- Network traffic generator: a non-Gaussian long-memory model, Gamma-FARIMA - Γ(a,b) marginals with FARIMA(φ, d, θ) correlations.
- Fault injector: FAIL (FAult Injection Language).
[Figure: generated traffic traces at several aggregation scales (Δ = 2 ms, 10 ms, 32 ms, 400 ms) under normal conditions, a DoS attack, and a flash crowd.]
A sketch of the traffic model follows below.
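The deck names the model but no parameters, so the following is only a plausible reconstruction: a FARIMA(0, d, 0) Gaussian series mapped to Γ(a, b) marginals through a quantile transform (Python with NumPy/SciPy):

```python
import numpy as np
from scipy import stats
from scipy.signal import fftconvolve

def farima_0d0(n: int, d: float, rng: np.random.Generator) -> np.ndarray:
    """FARIMA(0, d, 0) by fractional integration of white noise:
    x_t = sum_k psi_k e_{t-k}, psi_0 = 1, psi_k = psi_{k-1} (k - 1 + d) / k."""
    k = np.arange(n)
    psi = np.cumprod(np.concatenate(([1.0], (k[1:] - 1 + d) / k[1:])))
    eps = rng.standard_normal(2 * n)
    return fftconvolve(eps, psi)[n:2 * n]  # drop the warm-up samples

def gamma_farima(n: int, a: float, b: float, d: float, seed: int = 0) -> np.ndarray:
    """Long-memory series with Gamma(a, scale=b) marginals: push the Gaussian
    series through its CDF, then through the gamma quantile function."""
    rng = np.random.default_rng(seed)
    x = farima_0d0(n, d, rng)
    u = stats.norm.cdf((x - x.mean()) / x.std())  # uniform marginals, correlation kept
    return stats.gamma.ppf(u, a, scale=b)         # gamma marginals

# e.g. one hour of per-10ms byte counts with long-range dependence (d = 0.3)
trace = gamma_farima(n=360_000, a=2.0, b=1500.0, d=0.3)
print(f"mean rate: {trace.mean():.1f}")
```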
22. Agenda
- Motivation
- Grid5000 design
- Grid5000 status
- Early results
  - Communities
  - Platform usage
  - Experiments
- Conclusion
23. Community: Grid5000 users
345 registered users coming from 45 laboratories, including:
Univ. Nantes, Sophia, CS-VU.nl, FEW-VU.nl, Univ. Nice, ENSEEIHT, CICT, IRIT, CERFACS, ENSIACET, INP-Toulouse, SUPELEC, IBCP, IMAG, INRIA-Alpes, INSA-Lyon, Prism-Versailles, BRGM, INRIA, CEDRAT, IME/USP.br, INF/UFRGS.br, LORIA, UFRJ.br, LABRI, LIFL, ENS-Lyon, EC-Lyon, IRISA, RENATER, IN2P3, LIFC, LIP6, UHP-Nancy, France Telecom, LRI, IDRIS, AIST.jp, UCD.ie, LIPN-Paris XIII, U-Picardie, EADS, EPFL.ch, LAAS, ICPS-Strasbourg
24. About 230 Experiments
25. About 200 Publications
26. A series of Events
27. The Grid5000 Newsletter
28. Grid5000 Winter School
117 participants (we tried to limit to 100), both Grid5000 users and non-Grid5000 users.
[Charts: participants by role - engineers, students, scientists, post-docs.]
Don't miss the Second Grid5000 Winter School in Jan. 2007: http://ego-2006.renater.fr/
- Topics and exercises:
  - Reservation
  - Reconfiguration
  - MPI on the cluster of clusters
  - Virtual Grid based on Globus GT4
[Charts: participants by site (Toulouse, Bordeaux, Sophia, Grenoble, Rennes, Lille, Orsay, Nancy, Lyon) and by discipline (computer science dominating, plus physics, mathematics, biology and chemistry).]
29. Grid@work (October 10-14, 2005)
- Series of conferences and tutorials, including the Grid PlugTest (N-Queens and Flowshop contests). The objective of this event was to bring together ProActive users, to present and discuss current and future features of the ProActive Grid platform, and to test the deployment and interoperability of ProActive Grid applications on various Grids.
- The N-Queens contest (4 teams): find the number of solutions to the N-queens problem, with N as big as possible, in a limited amount of time.
- The Flowshop contest (3 teams).
- 1600 CPUs in total: 1200 provided by Grid5000, 50 by the other Grids (EGEE, DEISA, NorduGrid), 350 CPUs on clusters.
Don't miss Grid@work 2006, Nov. 26 to Dec. 1: http://www.etsi.org/plugtests/Upcoming/GRID2006/GRID2006.htm
30. Resource usage activity (Feb. 06)
Activity > 70%
31. Reconfiguration stats
[Charts: per-site reconfiguration statistics - Lyon, Toulouse, Grenoble, Nancy, Orsay, Sophia.]
32. Experiment: Geophysics - Seismic ray tracing in a 3D mesh of the Earth
Stéphane Genaud, Marc Grunberg, and Catherine Mongenet, IPGS (Institut de Physique du Globe de Strasbourg)
Building a seismic tomography model of the Earth's geology using seismic wave propagation characteristics in the Earth. Seismic waves are modeled from events detected by sensors. Ray tracing algorithm: waves are reconstructed from rays traced between the epicenter and one sensor.
An MPI parallel program composed of 3 steps:
1. Master-worker: ray tracing and mesh update by each process with blocks of rays successively fetched from the master process;
2. All-to-all communications to exchange submesh information between the processes;
3. Merging of cell information of the submesh associated with each process.
A structural sketch follows below.
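A structural sketch of the 3-step program in mpi4py (not the authors' code: the block size, the toy `trace_rays` body and the cell-ownership rule are illustrative):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
BLOCK = 64                # rays per work unit
TAG_WORK, TAG_STOP = 1, 2

def trace_rays(block):
    """Stand-in for the real ray tracer: returns per-cell updates."""
    return {ray % 1000: 1.0 for ray in block}

local = {}                # this process's submesh updates
status = MPI.Status()
if rank == 0:
    # Step 1, master side: serve blocks of rays to workers on demand.
    rays = list(range(10_000))
    pos, active = 0, size - 1
    while active > 0:
        comm.recv(source=MPI.ANY_SOURCE, status=status)   # a work request
        worker = status.Get_source()
        if pos < len(rays):
            comm.send(rays[pos:pos + BLOCK], dest=worker, tag=TAG_WORK)
            pos += BLOCK
        else:
            comm.send(None, dest=worker, tag=TAG_STOP)
            active -= 1
else:
    # Step 1, worker side: fetch blocks until told to stop.
    while True:
        comm.send(None, dest=0)                           # request work
        block = comm.recv(source=0, status=status)
        if status.Get_tag() == TAG_STOP:
            break
        for cell, value in trace_rays(block).items():
            local[cell] = local.get(cell, 0.0) + value

# Step 2: all-to-all exchange of submesh information between processes.
parts = comm.alltoall([local] * size)

# Step 3: merge the contributions for the cells this process owns.
merged = {}
for part in parts:
    for cell, value in part.items():
        if cell % size == rank:                           # toy ownership rule
            merged[cell] = merged.get(cell, 0.0) + value
```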
33. Solving the Flow-Shop Scheduling Problem
One of the hardest challenge problems in combinatorial optimization.
- Schedule a set of jobs on a set of machines minimizing the makespan.
- Job order must be respected and machines can execute 1 job at a time.
- Complexity is very high for large instances (number of possible schedules).
- Exhaustive enumeration of all combinations would take several years.
- The challenge is thus to reduce the number of explored solutions.
- But the problem cannot be efficiently solved without computational grids.
- New Grid exact method based on the branch-and-bound algorithm (Talbi, Melab, et al.), combining new approaches of combinatorial algorithmics, grid computing, load balancing and fault tolerance.
- Problem: 50 jobs on 20 machines, optimally solved for the first time, with 1245 CPUs (peak), using simultaneously Grid5000 and other clusters.
- Involved Grid5000 sites (6): Bordeaux, Lille, Orsay, Rennes, Sophia-Antipolis and Toulouse.
- The optimal solution required a wall-clock time of 1 month and 3 weeks.
A minimal branch-and-bound sketch follows below.
E. Talbi, N. Melab, 2006
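For intuition, a serial toy version of branch-and-bound for the permutation flow shop (the Grid method additionally distributes the tree exploration with load balancing and fault tolerance; the bound here is deliberately simple):

```python
import random

def makespan(perm, p):
    """Completion time of the last job on the last machine.
    p[j][m] = processing time of job j on machine m."""
    m_count = len(p[0])
    finish = [0] * m_count
    for j in perm:
        finish[0] += p[j][0]
        for m in range(1, m_count):
            finish[m] = max(finish[m], finish[m - 1]) + p[j][m]
    return finish[-1]

def branch_and_bound(p):
    n = len(p)
    best = [float("inf"), None]   # [best makespan, best permutation]

    def bound(prefix, remaining):
        # Optimistic completion: prefix makespan plus remaining work
        # on the last machine (a simple one-machine lower bound).
        return makespan(prefix, p) + sum(p[j][-1] for j in remaining)

    def explore(prefix, remaining):
        if not remaining:
            c = makespan(prefix, p)
            if c < best[0]:
                best[0], best[1] = c, list(prefix)
            return
        for j in sorted(remaining):
            if bound(prefix + [j], remaining - {j}) < best[0]:  # prune otherwise
                explore(prefix + [j], remaining - {j})

    explore([], set(range(n)))
    return best

random.seed(1)
jobs, machines = 7, 4
p = [[random.randint(1, 9) for _ in range(machines)] for _ in range(jobs)]
print(branch_and_bound(p))  # [optimal makespan, job order]
```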
34. JXTA DHT scalability
- Goal: study of a JXTA DHT (edge peers attach to rendezvous peers; the rendezvous peers form the JXTA DHT)
  - Performance of this DHT?
  - Scalability of this DHT?
- Organization of a JXTA overlay (peerview protocol)
  - Each rendezvous peer has a local view of other rendezvous peers
  - Loosely-consistent DHT between rendezvous peers
  - Mechanism for ensuring convergence of local views
- Benchmark: time for local views to converge
  - Up to 580 nodes on 6 sites
  - It requires 2 hours to contact all rendezvous peers
  - With the default settings, the view of every rendezvous peer is limited to only 300 rendezvous peers
  - The view of every rendezvous peer is very unstable
[Plot: rendezvous peers known by one of the rendezvous peers; X axis: time, Y axis: rendezvous peer ID.]
A toy convergence simulation follows below.
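The peerview dynamics can be mimicked with a toy gossip simulation (an assumption-laden sketch, not the JXTA protocol: bounded views, random exchange, truncation to a cap echoing the 300-peer limit above):

```python
import random

def simulate_peerview(n_peers=100, view_cap=30, rounds=200, seed=0):
    """Toy model of a loosely-consistent peerview: each rendezvous peer keeps a
    bounded local view and gossips it to a random known peer each round.
    Returns, per round, the mean view size (a crude convergence measure)."""
    rng = random.Random(seed)
    # Bootstrap: every peer initially knows one other peer (never itself).
    views = [{rng.randrange(n_peers)} - {i} or {(i + 1) % n_peers}
             for i in range(n_peers)]
    history = []
    for _ in range(rounds):
        for i in range(n_peers):
            j = rng.choice(list(views[i]))
            # Push i's view to j, then truncate j's view to the cap.
            merged = (views[j] | views[i] | {i}) - {j}
            views[j] = set(rng.sample(sorted(merged), min(view_cap, len(merged))))
        history.append(sum(len(v) for v in views) / n_peers)
    return history

sizes = simulate_peerview()
print(f"mean view size after {len(sizes)} rounds: {sizes[-1]:.1f}")
```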
35. Fully distributed batch scheduler
- Motivation: evaluation of a fully distributed resource allocation service (batch scheduler)
- Vigne: unstructured network, flooding (random walk optimized for scheduling)
- Experiment: a bag of 944 homogeneous tasks on 944 CPUs
  - Synthetic sequential code (Monte Carlo application)
  - Measure the mean execution time for a task (computation time depends on the resource)
  - Measure the overhead compared with an ideal execution (central coordinator)
  - Objective: 1 task per CPU
- Tested configuration: 944 CPUs - Bordeaux (82), Orsay (344), Rennes Paraci (98), Rennes Parasol (62), Rennes Paravent (198), Sophia (160)
- Duration: 12 hours
- Result: mean 1972 s
A random-walk allocation sketch follows below.
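The discovery mechanism can be illustrated by a random walk over an unstructured overlay (a sketch, not Vigne's actual protocol; the overlay shape and TTL are invented):

```python
import random

def random_walk_allocate(adjacency, start, is_free, ttl=50, rng=random):
    """Walk the unstructured overlay from `start` until a free node is found
    (the random-walk discovery idea behind a fully distributed scheduler).
    Returns the node that accepts the task, or None if the TTL expires."""
    node = start
    for _ in range(ttl):
        if is_free(node):
            return node
        node = rng.choice(adjacency[node])
    return None

# Toy overlay: 20 nodes in a ring with one random shortcut each; half are busy.
rng = random.Random(42)
n = 20
adjacency = {i: [(i - 1) % n, (i + 1) % n, rng.randrange(n)] for i in range(n)}
busy = set(rng.sample(range(n), n // 2))
task_host = random_walk_allocate(adjacency, start=0, is_free=lambda v: v not in busy)
print("task placed on node", task_host)
```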
36. Fault tolerant MPI for the Grid
- MPICH-V: fault tolerant MPI implementation
- Research context: large-scale fault tolerance
- Research issue: blocking or non-blocking coordinated checkpointing?
- Experiments on 6 sites, up to 536 CPUs
A sketch of the blocking variant follows below.
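The blocking variant of coordinated checkpointing can be outlined in a few lines of mpi4py (a sketch of the idea, not MPICH-V: it assumes the application drains its own sends before the call, so the barrier really separates message epochs):

```python
from mpi4py import MPI
import pickle

def blocking_coordinated_checkpoint(comm, state, step):
    """Blocking coordinated checkpoint (sketch): every process stops at a
    barrier - no application message crosses it, assuming sends are already
    drained - dumps its state to stable storage, then resumes together."""
    comm.Barrier()
    path = f"ckpt_rank{comm.Get_rank()}_step{step}.pkl"
    with open(path, "wb") as f:
        pickle.dump(state, f)
    comm.Barrier()

# usage inside an iterative solver:
# if step % 100 == 0:
#     blocking_coordinated_checkpoint(MPI.COMM_WORLD, solver_state, step)
```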
37. TCP limits over 10 Gb/s links
- Highlighting TCP stream interaction issues on very high bandwidth links (congestion collapse) and poor bandwidth fairness
- Grid5000 10 Gb/s connections evaluation
- Evaluation of TCP variants over Grid5000 10 Gb/s links (BIC TCP, H-TCP, Westwood)
Interaction of 10 x 1 Gb/s TCP streams over the 10 Gb/s Rennes-Nancy link during 1 hour: aggregated bandwidth of 9.3 Gb/s over a time interval of a few minutes, then a very sharp drop of the bandwidth on one of the connections.
A fairness-index sketch follows below.
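Bandwidth fairness across the 10 streams can be quantified with Jain's index, J = (Σ x_i)² / (n Σ x_i²), which equals 1 for perfectly equal throughputs. A quick sketch (the sample throughputs are invented):

```python
def jain_fairness(throughputs):
    """Jain's index: (sum x)^2 / (n * sum x^2); 1.0 means perfectly fair."""
    n = len(throughputs)
    s = sum(throughputs)
    return s * s / (n * sum(x * x for x in throughputs))

# e.g. nine streams near 1 Gb/s and one starved stream, as in the observed drop
print(jain_fairness([0.93] * 9 + [0.2]))  # ~0.94 -> noticeably unfair
```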
38. Agenda
- Motivation
- Grid5000 design
- Grid5000 status
- Early results
- Conclusion
39. Conclusion
- A large-scale and highly reconfigurable Grid experimental platform
- Used by Master's students, Ph.D. students, post-docs and researchers (and results are presented in their reports, theses, papers, etc.)
- Grid5000 offers in 2006:
  - 9 clusters distributed over 9 sites in France
  - about 10 Gigabit/s (directional) of bandwidth
  - the capability for all users to reconfigure the platform (protocols/OS/middleware/runtime/application)
- Grid5000 results in 2006:
  - 300 users
  - 200 publications
  - 230 planned experiments
  - Grid5000 Winter School (Philippe d'Anfray, January 2007)
- Connection to other Grid experimental platforms: Netherlands (from October 2006), Japan (under discussion)
40. Additional slides
41. Evaluation of medical image registration algorithms
- Medical imaging aspect
  - Goal: comparing the accuracy of registration algorithms on a brain radiotherapy use-case with a statistical procedure
  - Results: subvoxelic accuracy quantified for each algorithm; visually undetectable features detected by the procedure
- Grid computing aspect
  - Goal: comparing the performance of a production grid (EGEE) to the Grid'5000 controlled platform
  - Results: multi-grid submission model determined and parameters quantified; impact of grid variability on application performance highlighted
T. Glatard, X. Pennec, J. Montagnat - Sept. 2006
42. Molecular structure prediction on the computational Grid
- Problem formulation
  - Input: molecular conformation as a set of torsion angles
  - Output: molecular conformation with a lower free energy
- Complexity analysis
  - For a molecule with 40 residues and 10 conformations per residue, 10^40 conformations are obtained on average
  - 10^18 years would be required at 10^14 conformations explored per second!
- A near-optimal method for the Grid: using a Lamarckian genetic algorithm (GA) on the computational Grid for solving the problem (a sketch follows below)
[Figures: bonded and non-bonded energy surfaces; energy surface after applying a Lamarckian operator on Grid5000.]
Deployment on Lille (60), Orsay (150) and Azur Sophia (50): 520 CPUs. Average quality improvement: 64%.
A. Tantar, N. Melab and E-G. Talbi, OPAC-LIFL, INRIA DOLPHIN
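A toy serial version of a Lamarckian GA (its defining trait: local-search refinements are written back into the genome before selection). The energy function below is a stand-in, not the real bonded/non-bonded terms:

```python
import math
import random

rng = random.Random(0)
N_ANGLES = 8  # torsion angles per conformation (toy size)

def energy(angles):
    """Toy free-energy surface over torsion angles."""
    return sum(math.sin(3 * a) + 0.1 * a * a for a in angles)

def local_search(angles, step=0.05, iters=20):
    """Greedy refinement of one conformation."""
    best = list(angles)
    for _ in range(iters):
        cand = [a + rng.uniform(-step, step) for a in best]
        if energy(cand) < energy(best):
            best = cand
    return best

def lamarckian_ga(pop_size=30, generations=50):
    pop = [[rng.uniform(-3, 3) for _ in range(N_ANGLES)] for _ in range(pop_size)]
    for _ in range(generations):
        # Lamarckian step: refined angles are written back into the genome,
        # so offspring inherit the result of the local search.
        pop = [local_search(ind) for ind in pop]
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_ANGLES)
            child = a[:cut] + b[cut:]              # one-point crossover
            if rng.random() < 0.2:                 # mutation
                child[rng.randrange(N_ANGLES)] += rng.uniform(-0.5, 0.5)
            children.append(child)
        pop = parents + children
    return min(pop, key=energy)

best = lamarckian_ga()
print("best energy:", round(energy(best), 3))
```
On the Grid, the expensive part (the local searches and energy evaluations) is what gets distributed across sites.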
43. Large-scale experiment of DIET, a GridRPC environment
1120 clients submitted more than 45,000 real GridRPC requests (dgemm matrix multiply) to GridRPC servers.
Objectives:
- Prove that the DIET environment is scalable
- Test the functionalities of DIET at large scale
7 sites: Lyon, Orsay, Rennes, Lille, Sophia, Toulouse, Bordeaux; 8 clusters, 585 machines, 1170 CPUs.
Raphaël Bolze
44. DAS-2: a 400-CPU experimental Grid (Henri Bal)
- Homogeneous nodes!
- Grid middleware
  - Globus 3.2 toolkit
  - PBS + Maui scheduler
- Parallel programming support
  - MPI (MPICH-GM, MPICH-G2), PVM, Panda
  - Pthreads
- Programming languages
  - C, C++, Java, Fortran 77/90/95
DAS-2 (2002)
Node reconfiguration is not possible!
45. Grid5000 versus PlanetLab
46. DAS-3 Cluster configurations
47. Virtualization: at which level?
[Diagram comparing the usual stack (user interface, application, runtime, middleware, operating system, network protocols) with a customization stack in which the OS and network layers are user-customized.]
48. The DAS-3 Computer Science Grid
- 4 DAS-3 sites, with 5 clusters
- Interconnected with 4 to 8 dedicated lambdas of 10 Gb/s each
- Same fiber as for regular Internet
Funding: NWO, NCF, VL-e (UvA, Delft, part of VU), MultimediaN (UvA), Universiteit Leiden
49. DAS-3 StarPlane
Connection from each site to the (fixed) OADM equipment - Optical Add-Drop Multiplexer - and the WSS - Wavelength Selectable Switches - in the Amsterdam area.
50. Toward a European Computer Science Grid
- DAS-3 (Oct. 2006): 1500 CPUs
- Grid5000: 2600 CPUs
51. TCP limits over 10 Gb/s links (repeat of slide 37)