Title: Grids
1. Grids
- EGEE Operational Ontology, 18 February 2009
2. In a nutshell
- Grid infrastructures for e-science
  - Very large scale distributed systems, and more
- Grids and autonomic computing
  - Self-aware grids: behavioral models of the infrastructure, the middleware, the users and their interactions
  - Self-controlled grids: design of adaptive, configuration-free policies
- The Grid Observatory project
  - Integrate the collection of data on the behaviour of the EGEE grid and its users with the development of models and of an ontology for the domain knowledge
  - EGEE, DIGITEO, DEMAIN
  - www.grid-observatory.org
3. Who we are
- LRI, AI team; TAO, INRIA project-team
  - Cécile Germain-Renaud, professor
  - Michèle Sebag, research director
  - Balazs Kegl, researcher
  - Julien Perez, PhD student
  - Xiangliang Zhang, PhD student
  - Tamas Elteto, post-doc
  - Alain Cady, undergraduate
  - One engineer to be hired
- Collaborations
  - LAL (Laboratoire de l'Accélérateur Linéaire)
  - Universita Piemonte Orientale
  - Imperial College London
  - CESNET (Czech NREN operator) and Masaryk University
  - CoreGrid
4. Outline
- The EGEE grid: overview
  - The EU flagship infrastructure
  - Autonomic computing (EGEE needs it)
  - Two concepts
- The Grid Observatory
  - Which ontologies?
  - Data collection
  - Data publication
- EGEE and gLite at work
  - Overview
  - The Information System
  - Scheduling
  - The Logging and Bookkeeping service
  - Other sources of information
5. The EGEE Grid
- Flagship grid infrastructure project co-funded by the European Commission
  - Now in its 3rd phase, with 91 partners in 32 countries
- Main goal: operate a large-scale, production-quality grid
  - 300 sites, 140 partners, 50 countries
  - 100,000 cores, 20 PB
  - 10,000 users
  - 300,000 jobs/day
- FP6 and FP7 funding: €100M
- Application domains: Archaeology, Astronomy, Astrophysics, Civil Protection, Computational Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences
6. Towards a sustainable infrastructure
- EC co-funding: €8 million/year
- "How can we reduce the effort required to operate this expanding infrastructure?" (Bob Jones, talk at EGEE'08)
- Autonomic computing: "Computing systems that manage themselves in accordance with high-level objectives from humans" (Kephart and Chess, "The Vision of Autonomic Computing", IEEE Computer, 2003)
  - Self-configuration, self-optimization, self-healing, self-protection
  - Applied to open, non-steady-state, dynamic systems (ECML/PKDD-06 workshop "Autonomic Computing: A New Challenge for Machine Learning")
8. Some immediate questions
- Resource allocation
  - Performance of the gLite scheduling hierarchy
  - Responsive grids: everybody's grid
  - Common goods: prevent abuse of the grid resources
- Dimensioning
  - Patterns and trends in requests and usage
  - Scalability of the information system
- Dependability
  - Detection: black holes
  - Diagnosis: disappearing jobs
  - Performance of a probe-free approach
9. Autonomic grids
- Statistical analysis
- Data mining
- Machine learning
- DATA REQUIRED
11. Challenges for autonomic grids research, and for ontologies too
- Curse of dimensionality: representation and computational complexity
- Data sparsity: largely unexplored state/action space
- No steady state
- Data acquisition at full grid scale
- Lack of interpretation
  - Many undocumented features
  - Cases of shifting and confusing terminology are not unknown
12. Grids?
- Very large distributed systems
13. Grids?
- And also enterprise grids
- Computing centers, strong administration, PC farms; size > 1E4; driving force: large scientific collaborations
- Computing centers, iron administration, UHP parallel machines; size 10; driving force: nation-critical issues
- Personal computers or desktops, volunteer computing or spare time; size > 1E4
15. Glue: the interoperability specification
16. Two basic ideas (concepts?)
- Virtual Organizations
- Middleware(s)
17. Virtual Organizations
- A scientific community, e.g.
  - Atlas, LHCb (LHC experiments), Biomed, Earth Science
  - VO-upsud: beginner grid users at P11
- Function: who, what, when to share (I. Bird)
- Operational: access and share rights
- Representation in EGEE (EGI SSC)
- Institutions
  - Generally none, and this is a major problem
  - Real institutions are national or international structures (CNRS, INSERM, CERN, etc.)
- Community software and correlated activity
19. Middleware
- Grids are very large distributed systems
- Production grids are a federation of independent sites
- Middleware: the set of software components that create the federation, giving a grid-wide scope for
  - Submitting and running jobs
  - Storing and retrieving data
  - Authorization and authentication
- Each site acquires and maintains its own set of resources
- Local policies are the result of an informal trade-off between local requirements and EGEE needs: no SLAs, only MoUs
- Global policies emerge as a consequence of the local policies; e.g. don't look for "the scheduler". More generally, only partial information is available
20. Middleware: monitoring
- Grids are very large distributed systems
- Multiple sources of information, asynchronous and error-prone
  - Hardware failures
  - Software bugs
  - Inconsistent specifications
- CIC tools (GOCDB, SAM, SFT, etc.)
- Core gLite (LB, BDII, etc.)
- Sites (Maui/PBS logs)
- gLite integrators (R-GMA, Job Provenance)
- Experiment integrators (Dashboard)
- External software (MonALISA)
21. Terminology
- EGEE is
  - Stricto sensu, 3 EU projects: EGEE-I, II, III
  - Informally, the hardware infrastructure and human resources
- gLite is
  - The official middleware developed in the course of the EGEE projects
  - A significant portion is heavy re-engineering of various software suites
23. The Grid Observatory
- Integrate the collection of data on the behaviour of the EGEE grid and its users with the development of models and of an ontology for the domain knowledge
- Ontology
  - Representation, organisation and automated inference on the native data, based on existing knowledge of operational structures, e.g. the lifecycle of a job, or Glue (this working group)
  - High-level models involving human research, e.g. GStrap clusters (Zhang et al. 09)
  - Which will later become "smart data"
24. Data collection
- Acquisition, consolidation and long-term conservation of traces of EGEE activities
- Permanent storage of reliable, (exhaustive), filtered information
  - Reliability: data curation and provenance issues
  - Exhaustive: added value in snapshots of the inputs and grid state, e.g. workload and available services during a relevant time range
  - Filtered: from operational to structured
- Since October 2008: the Grid Observatory portal
  - Collection and publication of raw data from gLite
27. The data
- Traces from
  - The Information System (BDII)
    - Static and dynamic information on the resources (hardware and services), Glue information model
    - EGEE-wide (yes, EGEE, not gLite)
  - The Logging and Bookkeeping service
    - Extensive log of all job-related events (gLite); so far, only one
  - Batch system(s): so far, from the GRIF/LAL site
  - WMS internals (Condor-G, wm_proxy, etc.): so far, from the GRIF/LAL site
  - RTM data
    - From the Real Time Monitor project
    - Summary of the lifecycle of jobs
- Some information is out of scope: external traffic on shared resources
- Planned: all data and file-related information
28. Data publication
- Initially, a heap of data as it is
  - Examples based on voluntary contributions
  - Goal: bootstrap interactions
- Perspective: semantic organization
  - The Glue Information Model: an ontology of the resources
    - Ongoing integration with the OGF reference model
  - Concepts for the grid dynamics, e.g. job lifecycle or user relations
  - Expert concepts as prior knowledge of non-trivial correlations: workflows, failure modes, etc.
  - Concepts for elementary analysis: grid snapshot (more a movie, in fact)
30. Overview of gLite
31. Main components
- User Interface (UI): the place where users log on to the Grid
- Resource Broker (RB) / Workload Management System (WMS): matches the user requirements with the available resources on the Grid
- Information System: characteristics and status of CEs and SEs
- File and replica catalog: location of grid files and grid file replicas
- Logging and Bookkeeping (LB): log information of jobs
- Computing Element (CE): a batch queue on a site's computers where the user's job is executed
- Storage Element (SE): provides (large-scale) storage for files
32. The EGEE information system (aka BDII)
- Principles
  - Each site publishes
    - A description of the resources/services it provides, per VO
    - The current state of its resources (free CPUs, storage space, etc.)
  - Each VO publishes
    - What it has installed on each site (tags set by the Software Managers)
  - Actors query the IS to find out how to use the grid's services/resources
33. The EGEE information system (aka BDII)
- Comes from the Globus toolkit
- How?
  - On each site, the state of the services (static and dynamic information) is reported to servers
  - A central system queries these servers and stores the information in a database
  - This information is accessible through the LDAP access protocol
  - The central system provides the information in a predefined schema: GlueSchema
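The two-level flow described above (site servers collect the state of local services; a central system polls them and stores the result) can be sketched as a toy in-memory model. This is not the BDII implementation: the class names `SiteServer` and `CentralStore`, the site name and the DN are hypothetical, and the LDAP transport is left out entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Toy model (hypothetical names): local services report their state to a
# site-level server; a central system periodically queries every site
# server and stores the result keyed by Distinguished Name (DN).

Attrs = dict[str, list[str]]  # attribute name -> values (LDAP is multi-valued)


@dataclass
class SiteServer:
    site: str
    entries: dict[str, Attrs] = field(default_factory=dict)

    def report(self, dn: str, attrs: Attrs) -> None:
        """A local service publishes (static or dynamic) state for one entry."""
        self.entries[dn] = attrs

    def query(self) -> dict[str, Attrs]:
        """Answer the central system with a shallow copy of the current entries."""
        return {dn: dict(attrs) for dn, attrs in self.entries.items()}


@dataclass
class CentralStore:
    servers: list[SiteServer]
    db: dict[str, Attrs] = field(default_factory=dict)

    def refresh(self) -> None:
        """Poll all site servers and rebuild the central view from scratch,
        so entries a site no longer publishes disappear from the view."""
        fresh: dict[str, Attrs] = {}
        for server in self.servers:
            fresh.update(server.query())
        self.db = fresh


# Illustrative use; the site name and DN are made up.
site_a = SiteServer("SITE-A")
site_a.report("GlueCEUniqueID=ce01.site-a.example,o=grid",
              {"GlueCEStateWaitingJobs": ["12"]})
central = CentralStore([site_a])
central.refresh()
print(central.db)
```

Rebuilding the central view on every refresh, rather than merging into it, is one simple way to keep stale entries from lingering; the price is that a site that is briefly unreachable vanishes from the view until the next successful poll.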
34. Architecture
Site BDII
35. Implementation: the LDAP directory
- GRIS, GIIS and BDII are LDAP directory servers (OpenLDAP)
- LDAP: Lightweight Directory Access Protocol
- Data model
  - A tree of nodes, the Directory Information Tree (DIT)
  - Each node (directory entry) contains attributes
  - The structure of a node is defined by one or more classes of the predefined schema
- Naming model
  - Distinguished Name (DN), e.g. cn=Girard Pierre,ou=People,ou=cc,o=in2p3,c=fr
- Information is imported into and exported from the LDAP servers as files in LDIF format (LDAP Data Interchange Format)

Example DIT:
o=in2p3,c=fr (root of the DIT)
  ou=lal
  ou=cc
    ou=People
      cn=Girard Pierre
  ou=lapp

Example entry (cn=Girard Pierre):
  cn: Girard Pierre
  phone: +33 9999999999
  uid: girardpi
  mail: pierre.girard_at_in2p3.fr
  objectClass: top
  objectClass: person
  objectClass: organizationalPerson
  objectClass: inetOrgPerson
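The naming model and the LDIF exchange format can be illustrated with a short sketch that rebuilds the example DN from its path in the tree and serialises the example entry. It is a simplified illustration, not a complete LDIF writer: the helper names are hypothetical, DN escaping and long-line folding are left out, the safe-string test is a stricter simplification of the RFC 2849 rule, and the `phone` and `mail` values are omitted for brevity.

```python
import base64


def make_dn(rdns: list[str]) -> str:
    """Join relative distinguished names (RDNs), leaf first, into a DN.

    Simplification: values are assumed not to contain characters that
    would need escaping, such as ',' or '+'.
    """
    return ",".join(rdns)


def is_safe_string(value: str) -> bool:
    """Simplified LDIF safe-string test: printable ASCII that does not start
    with a space, ':' or '<' and does not end with a space."""
    if not value:
        return True
    if value[0] in " :<" or value.endswith(" "):
        return False
    return all(0x20 <= ord(ch) < 0x7F for ch in value)


def to_ldif(dn: str, attrs: list[tuple[str, str]]) -> str:
    """Render one entry as an LDIF record.

    Attributes are (name, value) pairs so that multi-valued attributes such
    as objectClass can repeat. Unsafe values are base64-encoded with the
    '::' separator; line folding is not implemented.
    """
    lines = []
    for name, value in [("dn", dn), *attrs]:
        if is_safe_string(value):
            lines.append(f"{name}: {value}")
        else:
            encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
            lines.append(f"{name}:: {encoded}")
    return "\n".join(lines) + "\n"


# Path from the entry up to the root of the example DIT.
dn = make_dn(["cn=Girard Pierre", "ou=People", "ou=cc", "o=in2p3", "c=fr"])
entry = to_ldif(dn, [
    ("cn", "Girard Pierre"),
    ("uid", "girardpi"),
    ("objectClass", "top"),
    ("objectClass", "person"),
    ("objectClass", "organizationalPerson"),
    ("objectClass", "inetOrgPerson"),
])
print(entry)
```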
36. Glue Schema in EGEE/LCG: design
- GLUE Schema (v1.3)
  - Definition of LDAP schemas describing each component of the grid resources/services
  - https://forge.gridforum.org/sf/projects/glue-wg
37. Some examples from the Glue Schema (I)
- 1. Some general attributes
  - Base class (objectclass GlueTop): no attributes
  - Schema version number (objectclass GlueSchemaVersion)
    - GlueSchemaVersionMajor: major schema version number
    - GlueSchemaVersionMinor: minor schema version number
- 2. Attributes for the CE
  - Base class for the CE information (objectclass GlueCETop): no attributes
  - CE (objectclass GlueCE)
    - GlueCEUniqueID: unique identifier for the CE
    - GlueCEName: human-readable name of the service
  - CE status (objectclass GlueCEState)
    - GlueCEStateRunningJobs: number of running jobs
    - GlueCEStateWaitingJobs: number of jobs not running
    - GlueCEStateTotalJobs: total number of jobs (running + waiting)
    - GlueCEStateStatus: queue status
      - queueing: jobs accepted but not run
      - production: jobs accepted and run
      - closed: jobs neither accepted nor run
      - draining: new jobs not accepted, but those already queued are run
    - GlueCEStateWorstResponseTime: worst possible time between the submission of the job and the start of its execution
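The CE state attributes and the four GlueCEStateStatus values above can be captured in a small model. This is an illustrative sketch, not part of gLite: the class and method names are hypothetical, and the consistency check simply encodes the relation TotalJobs = RunningJobs + WaitingJobs stated on the slide. A check of this kind could be run over published snapshots, since the information sources are error-prone.

```python
from dataclasses import dataclass

# GlueCEStateStatus values as described on the slide:
# status -> (accepts new jobs, runs queued jobs)
STATUS_SEMANTICS = {
    "queueing": (True, False),
    "production": (True, True),
    "closed": (False, False),
    "draining": (False, True),
}


@dataclass(frozen=True)
class CEState:
    """Hypothetical in-memory view of some published GlueCEState attributes."""
    running_jobs: int  # GlueCEStateRunningJobs
    waiting_jobs: int  # GlueCEStateWaitingJobs
    total_jobs: int    # GlueCEStateTotalJobs, as published
    status: str        # GlueCEStateStatus

    def _semantics(self) -> tuple[bool, bool]:
        # Case-insensitive lookup; raises KeyError for an unknown status.
        return STATUS_SEMANTICS[self.status.lower()]

    def accepts_jobs(self) -> bool:
        return self._semantics()[0]

    def runs_jobs(self) -> bool:
        return self._semantics()[1]

    def is_consistent(self) -> bool:
        """Known status, and TotalJobs == RunningJobs + WaitingJobs."""
        return (self.status.lower() in STATUS_SEMANTICS
                and self.total_jobs == self.running_jobs + self.waiting_jobs)


ce = CEState(running_jobs=10, waiting_jobs=5, total_jobs=15, status="Production")
stale = CEState(running_jobs=3, waiting_jobs=2, total_jobs=4, status="Draining")
print(ce.accepts_jobs(), ce.runs_jobs(), ce.is_consistent())           # True True True
print(stale.accepts_jobs(), stale.runs_jobs(), stale.is_consistent())  # False True False
```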
38. Some examples from the Glue Schema (II)
- 3. Attributes for the SE
  - Base class (objectclass GlueSETop): no attributes
  - Architecture (objectclass GlueSLArchitecture)
    - GlueSLArchitectureType: type of storage hardware (disk, tape, etc.)
  - Storage service access protocol (objectclass GlueSEAccessProtocol)
    - GlueSEAccessProtocolType: protocol type to access or transfer files
    - GlueSEAccessProtocolPort: port number for the protocol
    - GlueSEAccessProtocolVersion: protocol version
    - GlueSEAccessProtocolAccessTime: time to access a file using this protocol
- 4. Mixed attributes
  - Association between one CE and one or more SEs (objectclass GlueCESEBindGroup)
    - GlueCESEBindGroupCEUniqueID: unique ID for the CE
    - GlueCESEBindGroupSEUniqueID: unique ID for the SE
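The CE-SE binding and the SE access-protocol attributes can be combined to answer a typical question: which storage elements bound to a given CE offer a given access protocol, and on which port? The sketch below assumes the published records have already been flattened into tuples; the function names and all identifiers, protocol names and ports in the example are illustrative assumptions.

```python
from collections import defaultdict


def bound_ses(bindings: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Index (GlueCESEBindGroupCEUniqueID, GlueCESEBindGroupSEUniqueID) pairs
    as CE ID -> set of SE IDs; one CE may be bound to several SEs."""
    index: dict[str, set[str]] = defaultdict(set)
    for ce_id, se_id in bindings:
        index[ce_id].add(se_id)
    return dict(index)


def close_ses_with_protocol(ce_id: str,
                            bindings: list[tuple[str, str]],
                            protocols: list[tuple[str, str, int]],
                            proto_type: str) -> dict[str, int]:
    """Return {SE ID: port} for the SEs bound to `ce_id` that publish an
    access protocol of type `proto_type`.

    `protocols` holds (SE ID, GlueSEAccessProtocolType,
    GlueSEAccessProtocolPort) triples; if an SE lists the same protocol
    type twice, the last port wins.
    """
    candidates = bound_ses(bindings).get(ce_id, set())
    return {se_id: port
            for se_id, ptype, port in protocols
            if se_id in candidates and ptype == proto_type}


# Illustrative, made-up identifiers.
bindings = [("ce-A", "se-1"), ("ce-A", "se-2"), ("ce-B", "se-3")]
protocols = [("se-1", "gsiftp", 2811), ("se-2", "rfio", 5001), ("se-3", "gsiftp", 2811)]
print(close_ses_with_protocol("ce-A", bindings, protocols, "gsiftp"))  # {'se-1': 2811}
```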
39. Job scheduling: no scheduler, matchmaking
(Figure: job submission workflow involving the Information System, the VO Management Service (DB of VO users) and Logging and Bookkeeping. Labelled steps: create proxy; submit job (executable + small inputs); query; publish state; process; input file(s); output file(s); register file; logging; job status; retrieve status and (small) output files; retrieve output.)
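Matchmaking, as opposed to a central scheduler, can be sketched as filter-then-rank over the resource descriptions published in the Information System. The sketch is loosely modelled on the Requirements/Rank pair of gLite's JDL but is not the WMS algorithm: the `allowed_vos` field, the example requirement and the ranking heuristic (fewest waiting jobs) are assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CE:
    unique_id: str               # GlueCEUniqueID
    status: str                  # GlueCEStateStatus
    waiting_jobs: int            # GlueCEStateWaitingJobs
    allowed_vos: frozenset[str]  # hypothetical: VOs allowed to submit here


def matchmake(ces: list[CE],
              requirements: Callable[[CE], bool],
              rank: Callable[[CE], float]) -> Optional[CE]:
    """Keep the CEs that satisfy `requirements`, then return the one with the
    highest `rank` (the first one on ties), or None if nothing matches."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None
    return max(candidates, key=rank)


def biomed_in_production(ce: CE) -> bool:
    return ce.status.lower() == "production" and "biomed" in ce.allowed_vos


def fewest_waiting(ce: CE) -> float:
    # Simple heuristic: prefer the shortest published queue.
    return -ce.waiting_jobs


# Snapshot of published state (made-up values); it may already be stale.
ces = [
    CE("ce-a", "Production", 40, frozenset({"biomed"})),
    CE("ce-b", "Production", 3, frozenset({"biomed", "atlas"})),
    CE("ce-c", "Draining", 0, frozenset({"biomed"})),
]
best = matchmake(ces, biomed_in_production, fewest_waiting)
print(best.unique_id if best else "no match")  # ce-b
```

The decision is only as good as the snapshot it is made on: with only partial, possibly outdated information, several submissions can be sent to the same apparently idle CE before its published queue length is updated.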
40. The lifecycle of a job
41. WMS architecture
42. And more!
- 50 more slides on job management
43. The LB
- Detailed log of the lifecycle
- 3 tables
  - Timestamps
  - Source, destination
  - Operational tags
- And verbatim copies of the service logs
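Reconstructing a job's lifecycle from LB-style records amounts to grouping events by job and ordering them by timestamp. The slide describes three tables; this sketch flattens them into one record per event, which is an assumption, not the LB's actual schema, and the field names and tag values are hypothetical. Because events are logged asynchronously by different services, ordering by timestamp also assumes reasonably synchronised clocks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LBEvent:
    job_id: str
    timestamp: float   # seconds since the epoch
    source: str        # component that logged the event
    destination: str   # component the job is handed over to
    tag: str           # operational tag, e.g. a job state name


def lifecycles(events: list[LBEvent]) -> dict[str, list[LBEvent]]:
    """Group events by job and order each job's events by timestamp."""
    per_job: dict[str, list[LBEvent]] = defaultdict(list)
    for event in events:
        per_job[event.job_id].append(event)
    return {job: sorted(evs, key=lambda e: e.timestamp)
            for job, evs in per_job.items()}


def time_in_each_step(lifecycle: list[LBEvent]) -> list[tuple[str, float]]:
    """(tag, seconds until the next logged event) for consecutive events."""
    return [(a.tag, b.timestamp - a.timestamp)
            for a, b in zip(lifecycle, lifecycle[1:])]


# Made-up events, deliberately received out of order.
events = [
    LBEvent("job-1", 130.0, "CE", "LRMS", "Running"),
    LBEvent("job-1", 100.0, "UI", "WMS", "Submitted"),
    LBEvent("job-2", 105.0, "UI", "WMS", "Submitted"),
    LBEvent("job-1", 110.0, "WMS", "CE", "Scheduled"),
]
job1 = lifecycles(events)["job-1"]
print([e.tag for e in job1])    # ['Submitted', 'Scheduled', 'Running']
print(time_in_each_step(job1))  # [('Submitted', 10.0), ('Scheduled', 20.0)]
```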
46. Other information sources
- SAM (Service Availability Monitoring)
  - https://lcg-sam.cern.ch:8443/sam/sam.py
  - Periodic submission of tests to the sites
  - NB: requires a grid certificate
- GOC DB: CIC portal
  - https://goc.gridops.org/site/list?id=239
  - The LDAP URL of each site's Site BDII
  - The status/type (Certified/Production) of each site
  - Any declared Scheduled Downtime
47. And also
- Information on the VOs: with your browser, connect to the CIC site
  - http://cic.gridops.org/index.php?section=home&page=homepage
  - Select the "vo" menu
  - Select "VOidCard", then a VO of your choice
- Accounting: with your browser, connect to
  - http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.html
48. Further information, references
- EGEE
  - http://www.eu-egee.org/
- gLite middleware
  - http://www.glite.org
- gLite manuals, documentation
  - http://glite.web.cern.ch/glite/documentation/ (gLite user guide)
- Nice presentations
  - https://twiki.cern.ch/twiki/bin/view/EGEE/EGEEgLitePresentations