Title: ACI MD GDS
1. ACI MD GDS
- The middleware for GDS
- http://graal.ens-lyon.fr/diet
2. Plan
- Resource reservation in a hierarchical ASP
- Automatic deployment
- DIET in P2P
- DIET vs NetSolve
- VizDIET
- Communications in DIET
- An application for GDS
3. Context
- One long-term idea for Grid computing: renting computational power and
  memory capacity over the Internet (very high potential)
- Need for Problem Solving Environments (PSEs)
  - Applications need more and more memory capacity and computational power
  - Some proprietary libraries or environments need to stay in place
  - Some libraries or applications are difficult to install
  - Some confidential data must not circulate over the net
  - Use of computational servers accessible through a simple interface
  - Need for schedulers
- Moreover
  - Still difficult to use for non-specialists
  - Almost no transparency
  - Security and accounting issues usually not addressed
  - PSEs are often application-dependent
  - Lack of standards (CORBA, JAVA/JINI, sockets, ...) to build the
    computational servers
4. RPC and Grid computing → GridRPC
- A simple idea
  - RPC programming model for the Grid
  - Use of distributed collections of heterogeneous platforms on the Internet
  - For applications that require memory capacity and/or computational power
  - Task-parallelism programming model (synchronous/asynchronous) + data
    parallelism on servers → mixed parallelism
- Needed functionality
  - Load balancing (resource discovery, performance evaluation, scheduling)
  - Fault tolerance
  - Data redistribution
  - Security
  - Interoperability
5. GridRPC
[Diagram: a Client sends a request Op(C, A, B) to the AGENT(s), which
select among the servers S1, S2, S3, and S4]
6. GridRPC (cont.)
- 5 main components
  - Client
    - submits problems to servers
    - gives users interfaces
  - Server
    - solves problems sent by clients
    - runs software
  - Database
    - contains dynamic and static information about software and hardware
      resources
  - Scheduler
    - chooses an appropriate server depending on the problem sent and on
      the information contained in the database
  - Monitor
    - gets information about the status of the computational resources
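The scheduler's role described above can be sketched in a few lines: match the static database entries against the request, then rank the matching servers by an estimate that folds in the monitor's dynamic load. The data layout and selection rule below are illustrative assumptions, not the GridRPC or DIET API:

```python
# Hypothetical sketch of the scheduler component: combine static
# database entries (which server runs which software, benchmark times)
# with the monitor's dynamic load to pick a server.

def choose_server(problem, database):
    """Return the database entry of the best server for `problem`,
    or None when no server offers the required software."""
    candidates = [s for s in database if problem in s["software"]]
    if not candidates:
        return None
    # Smaller predicted time wins: static benchmark scaled by load.
    return min(candidates,
               key=lambda s: s["benchmark"][problem] * (1 + s["load"]))

database = [
    {"name": "S1", "software": {"dgemm"}, "benchmark": {"dgemm": 2.0}, "load": 0.8},
    {"name": "S2", "software": {"dgemm"}, "benchmark": {"dgemm": 3.0}, "load": 0.0},
]
best = choose_server("dgemm", database)  # S2: 3.0 < S1's 2.0 * 1.8 = 3.6
```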
7. DIET - Distributed Interactive Engineering Toolbox -
- Hierarchical architecture for improved scalability
- Distributed information in the tree
- Plug-in schedulers
[Diagram: interconnected Master Agents (MA), each heading a tree of
agents (A) and Local Agents (LA); the server front end answers the
client over a direct connection]
8. FAST - Fast Agents System Timer -
- NWS-based (Network Weather Service, UCSB)
- Computational performance
  - Load, memory capacity, and performance of batch queues (dynamic)
  - Benchmarks and modeling of available libraries (static)
- Communication performance
  - To be able to guess the data redistribution cost between two servers
    (or between clients and servers) as a function of the network
    architecture and dynamic information
  - Bandwidth and latency (hierarchical)
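The two kinds of estimate above (static benchmarks corrected by dynamic load, plus a bandwidth/latency model of the links) combine into a single prediction of where a request should run. The linear models and names below are assumptions for illustration, not the FAST interface:

```python
# Hedged sketch of a FAST-style prediction: static benchmark data
# corrected by dynamic NWS-style measurements, plus the classic
# latency + size/bandwidth communication model.

def compute_time(benchmark_s, cpu_load):
    """Static benchmark time scaled by the dynamically measured load."""
    return benchmark_s * (1 + cpu_load)

def transfer_time(size_mb, latency_s, bandwidth_mb_s):
    """Communication cost: latency plus size over bandwidth."""
    return latency_s + size_mb / bandwidth_mb_s

# Total predicted time to ship 100 MB to a server and solve there:
total = transfer_time(100, 0.01, 50) + compute_time(4.0, 0.5)
# 0.01 + 2.0 + 6.0 = 8.01 s
```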
9. PIF - Propagate Information Feedback -
- Algorithm from distributed-systems research
- Two phases
  - First phase: broadcast
    - Broadcast one message through the tree
  - Second phase: feedback
    - When a node is a leaf (it has no descendants), it sends a feedback
      message to its parent
    - When a parent has received the feedback messages from all its
      descendants, it sends a feedback message to its own parent, and so on
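On a centralized data structure the two phases reduce to a tree traversal: walk down to the leaves, then merge the answers on the way back up. A minimal sketch (the real algorithm runs as message exchanges between distributed agents, not as a local recursion):

```python
# Minimal sketch of the two-phase PIF traversal on a tree. Leaves
# evaluate the request locally (e.g. a FAST interrogation) and each
# parent forwards the merged feedback of all its children.

def pif(tree, node, evaluate):
    """Broadcast a request down from `node`, then aggregate feedback up."""
    children = tree.get(node, [])
    if not children:                      # leaf: answer directly
        return [evaluate(node)]
    feedback = []                         # parent: wait for all children
    for child in children:
        feedback.extend(pif(tree, child, evaluate))
    return feedback

tree = {"MA": ["LA1", "LA2"], "LA1": ["S1", "S2"], "LA2": ["S3"]}
answers = pif(tree, "MA", evaluate=lambda server: (server, len(server)))
```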
10. PIF and DIET - broadcast phase -
[Diagram: the request travels down the tree from the MA]
1. Broadcast the client's request
2. Sequential FAST interrogation for each LA
3. Resource reservation
11. PIF and DIET - feedback phase -
[Diagram: feedback messages travel back up the tree to the MA]
1. The MA chooses the identity of the most appropriate server (or a list
   of appropriate servers)
2. Unused resources are released
14. Server failure and reactivity
[Diagram: a DIET tree with an MA, agents, four LAs, and servers S1-S16;
server S2 misses dead line 1 while S7, S12, and S15 answer]
- Take server failures into account and increase DIET's reactivity
- Timeout at the LA level
  - Dead line 1 = β1 · Call_FAST_time + β2 · nb_server
15. Hierarchical fault tolerance
[Diagram: the same tree; the branch whose LA manages S7 misses dead
line 2, while the answers for S12 and S15 have already arrived]
- No answer after dead line 1
- Dead line 2 = β3 · level_tree
16. Simulation: SimGrid2
- Real experiments or simulations are often used to test or compare
  heuristics
- Simulations enable reproducible scenarios
- SimGrid: a distributed-application simulator for scheduling-algorithm
  evaluation purposes
  - Designed for distributed heterogeneous platforms
  - Event-driven simulation
  - SimGrid resources (processors, network links) are characterized
    either by fixed values or by values taken from a trace
- SimGrid2: a simulator built using SG; this layer implements realistic
  simulations based on the foundational SG and is more application-oriented
  - Simulations are built in terms of communicating agents
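The event-driven core that such simulators rely on can be illustrated independently of SimGrid's actual API: events sit in a priority queue ordered by timestamp, and handling one event may schedule later ones. A toy version, with all names being assumptions:

```python
# Conceptual sketch of event-driven simulation (NOT the SimGrid API):
# pop events in timestamp order; each handler may enqueue new events.

import heapq
import itertools

def simulate(initial_events):
    """initial_events: iterable of (time, handler); handler(time) returns
    an iterable of (delay, handler) follow-up events."""
    counter = itertools.count()           # tie-breaker for equal times
    queue = [(t, next(counter), h) for t, h in initial_events]
    heapq.heapify(queue)
    trace = []
    while queue:
        time, _, handler = heapq.heappop(queue)
        trace.append(time)
        for delay, follow_up in handler(time):
            heapq.heappush(queue, (time + delay, next(counter), follow_up))
    return trace

# A request that takes 2 time units to reach a server, which then
# computes for 5 units before the simulation ends.
done = lambda t: []
compute = lambda t: [(5, done)]
send = lambda t: [(2, compute)]
times = simulate([(0, send)])             # events at t = 0, 2, 7
```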
17. The DIET SimGrid2 simulator
18. Evaluation of the PIF scheduler
19. Conclusion and future work
- Conclusion
  - Benefit from distributed-systems research
  - Fault tolerance in DIET
    - Server failure
    - Branch failure
  - Resource reservation provides good QoS for client requests
  - DIET SimGrid2 simulator
    - Can be reused to validate other algorithms
- Future work
  - Implementation of tools to guarantee the resource reservation
  - Integrate NWS traces into the simulator
  - How to fix deadlines on a given heterogeneous platform?
20. Plan
- Resource reservation in a hierarchical ASP
- Automatic deployment
- DIET in P2P
- DIET vs NetSolve
- VizDIET
- Communications in DIET
- An application for GDS
21. Automatic deployment
- Problem: take the right number of components (resources) and place
  them in the right way, to increase the overall performance of the
  platform
- Motivation: how to deploy DIET on the Grid?
- Foundation: idea given in the article "Scheduling strategies for
  master-slave tasking on heterogeneous processor grids" by C. Banino,
  O. Beaumont, A. Legrand, and Y. Robert
22. Introduction
- Solution
  - Generate a new structure by arranging the resources according to the
    graph that gives the best throughput
  - For a homogeneous platform, the resources should be arranged in a
    binary-tree-type structure
  - For a heterogeneous platform, more resources should be added by
    checking for bottlenecks in the structure
23. Deployment
24. Deployment
- wi (Mflop/s): computing power of node Pi
- bij: capacity of the link between Pi and Pj (links are symmetric and
  bidirectional)
- Sini: size of the incoming request from the client
- Souti: size of the outgoing request (the response)
- alphaini: fraction of time spent on the computation of the incoming
  request
- alphaouti: fraction of time spent on the computation of the outgoing
  request
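With these notations, a node's achievable request rate in steady state is bounded by its compute rate and by its incoming and outgoing link capacities. The sketch below is a hedged illustration of that bound (the exact DIET/steady-state model from the article may differ; `flops_per_request` and all numbers are assumptions):

```python
# Hedged sketch of a steady-state throughput bound for one node, using
# the slide's notation: w (Mflop/s), b_in/b_out (link capacities),
# S_in/S_out (request/response sizes). All numbers are illustrative.

def node_throughput(w, flops_per_request, b_in, S_in, b_out, S_out):
    """Requests/s the node can sustain: the tightest of the compute,
    incoming-link, and outgoing-link constraints."""
    return min(w / flops_per_request,  # compute-bound rate
               b_in / S_in,            # limited by incoming requests
               b_out / S_out)          # limited by outgoing responses

rate = node_throughput(w=100, flops_per_request=10,
                       b_in=50, S_in=2, b_out=50, S_out=5)
# min(10, 25, 10) = 10 requests/s
```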
25. Operations in steady state
- Calculation of the throughput of a node
26. Calculation of the throughput of a graph
27. Example: calculation of the throughput of a graph
[Figure: the throughput along the example graph is the minimum of the
successive stages, e.g. min(20, 12) = 12 and min(12, 77) = 12]
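If, as the min(·,·) annotations in the figure suggest, the stages compose in sequence, the arithmetic generalizes directly: the steady-state throughput of a chain is limited by its slowest stage. A minimal sketch reproducing the slide's numbers:

```python
# The throughput of a chain is the minimum over its stages, as in the
# slide's example: min(20, 12) = 12, and adding a faster 77-req/s stage
# leaves min(12, 77) = 12 unchanged.

def chain_throughput(stage_rates):
    """Steady-state throughput of a chain = its slowest stage."""
    return min(stage_rates)

r1 = chain_throughput([20, 12])       # 12
r2 = chain_throughput([20, 12, 77])   # still 12
```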
28. Homogeneous Structures
- All nodes have the same computing power and link bandwidth
[Figure: candidate topologies - star graph, 2-depth star graph, binary
tree, 2-chain graph, chain graph]
29. Homogeneous Structures - simulation results (with 8 nodes) -
30. Homogeneous Structures - simulation results (with 32 nodes) -
31. Homogeneous Structures - simulation results (for the binary graph) -
32. Heterogeneous Networks
33. Throughput of network
[Figure: example heterogeneous network with node capacities of 25 and 30
and links labeled 1/200, 1/40, and 1/15; resulting throughput R = 2]
34. Throughput of network by adding LAs
[Figure: the same network after adding LAs; the throughput rises from
R = 2 to R = 2.2 and then to R = 2.65]
35. Heterogeneous Network
36. Experimental results
- 1 client with n requests: no steady state (the MA performed well)
- n clients with n requests: no steady state (the MA performed well)
- Pipeline effect: not enough nodes (clients)
- Buffered the requests at the MA
- A new client implementation to produce a steady-state effect
- The MA failed with 960 requests (due to a memory problem)
37. Experimental results
38. Conclusion
- Select the best structure
- Improve the throughput of the network
- Predict the performance of a structure
- Can find the effects on performance when the structure configuration
  is changed
- The bottleneck is not caused at the MA
39. Conclusion
- Homogeneous
  - A binary-tree-type structure is best
  - The number of nodes is proportional to the number of servers
  - A star-graph-type structure is best when there are few nodes and
    more than 60 servers
- Heterogeneous
  - Find the bottleneck
  - Improve the throughput
  - Modeling of DIET
40. Future work
- Calculate the throughput of structures with multiple clients and
  multiple master agents
- Dynamic updating with the use of the GRAS package
- Addition of a timer to the tool to obtain real values for the CORBA
  implementation of DIET
- Check whether the LA and the SeD cause bottlenecks
- Combine scheduling and deployment to increase performance
- Validation of the work by a real deployment
41. Automatic Deployment: first tool
42. Plan
- Resource reservation in a hierarchical ASP
- Automatic deployment
- DIET in P2P
- DIET vs NetSolve
- VizDIET
- Communications in DIET
- An application for GDS
43. DIET in P2P
- Current status
  - Multi-MA available, with connections over JXTA
  - Documentation available
  - Archive available: diet-0.7_beta-dev-jxta.tgz
- TODO list
  - Evaluate the performance
  - Check compliance with the coding standards
  - Integration into the DIET CVS
  - Break the constraint of 1 JXTA component per DIET component
  - Smart algorithms for traversing the MAs?
[Diagram: several MAs linked by JXTA connections, each heading its own
hierarchy of agents (A), LAs, and servers]
44. Plan
- Resource reservation in a hierarchical ASP
- Automatic deployment
- DIET in P2P
- DIET vs NetSolve
- VizDIET
- Communications in DIET
- An application for GDS
45. DIET vs NetSolve
- Deployment scripts
- Use of CVS to update the configuration files
[Figure: experimental setup - clients on paraski; agents and servers on
sunlabs]
46. DIET vs NetSolve
47. DIET vs NetSolve
- TODO list
  - Tests with the asynchronous API
  - Make the client multithreaded
  - Improve the statistics (dispersion index)
  - Improve the deployment scripts (DIET and omniORB configuration files)
  - Explain the NetSolve results
  - Explain the problem with 40 DIET clients
  - Tests on the SPARCs
  - Tests on icluster2?
48. Plan
- Resource reservation in a hierarchical ASP
- Automatic deployment
- DIET in P2P
- DIET vs NetSolve
- VizDIET
- Communications in DIET
- An application for GDS
49. VizDIET
- Each LogManager collects the information from its agent and sends it
  to the LogCentral, located outside the DIET structure
- VizDIET
  - visualization tool written in Java
  - interaction with the platform
50. VizDIET 1.0
- Integration of LogService (LogManager/LogCentral) into the DIET agents
- Message transfer from the agent through the LogManager
  - no storage on disk
- Study: vizPerf vs VizDIET
  - Conclusion: vizPerf is too far removed from the DIET structure
51. VizDIET 2.0
- VizDIET will collect the information about the DIET structure through
  the LogCentral and display it in real time
- VizDIET must also be able to act on the structure, by generating XML
  scripts (red arrow) or by modifying existing XML information (blue
  arrow)
52. Screenshot
53. Plan
- Resource reservation in a hierarchical ASP
- Automatic deployment
- DIET in P2P
- DIET vs NetSolve
- VizDIET
- Communications in DIET
- An application for GDS
54. What's new in DIET communications
- Asynchronous API
  - Finalize the GridRPC compatibility (errors, handles, ...)
  - Bug in diet_wait_and?
  - Memory leaks? (scaling test)
- PadicoTM
  - Compilation/tests
  - Integration into DIET in progress
  - Limits the set of usable platforms
  - Bug when unloading modules (related to the dlopen function,
    depending on the libc used)
55. Plan
- Resource reservation in a hierarchical ASP
- Automatic deployment
- DIET in P2P
- DIET vs NetSolve
- VizDIET
- Communications in DIET
- An application for GDS
56. An application for GDS?
- GriPPS: Grid Protein Pattern Scanning
  - Bioinformatics application
  - Pattern scanning
- Characteristics
  - Repetitive, short, but very numerous tasks
  - Inputs/outputs
    - text files
    - from a few MB to several GB
  - Requires a short response time
57. FAST