Grids - PowerPoint PPT Presentation

1
Grids
  • Operational Ontology (Ontologie Opérationnelle), EGEE, 18 February 2009

2
In a nutshell
  • Grids: infrastructures for e-science
  • Very large scale distributed systems, and more
  • Grids and Autonomic Computing
  • Self-aware grids: behavioral models of the
    infrastructure, middleware, users and their
    interactions
  • Self-controlled grids: design of adaptive,
    configuration-free policies
  • Project: Grid Observatory
  • Integrate the collection of data on the
    behaviour of the EGEE grid and users with the
    development of models and of an ontology for the
    domain knowledge
  • EGEE, DIGITEO, DEMAIN
  • www.grid-observatory.org

3
Who we are
  • LRI AI team, TAO INRIA project-team
  • Cécile Germain-Renaud, professor
  • Michèle Sebag, research director
  • Balazs Kegl, researcher
  • Julien Perez, PhD
  • Xiangliang Zhang, PhD
  • Elteto Tamas, post-doc
  • Alain Cady, undergraduate
  • 1 engineer to be hired
  • Collaborations
  • LAL (Laboratoire de l'Accélérateur Linéaire)
  • Università del Piemonte Orientale
  • Imperial College London
  • CESNET (Czech NREN operator) and Masaryk
    University
  • CoreGrid

4
Outline
  • The EGEE grid overview
  • The EU flagship infrastructure
  • Autonomic computing (EGEE needs it)
  • Two concepts
  • The Grid Observatory
  • Which ontologies?
  • Data collection
  • Data publication
  • EGEE and gLite at work
  • Overview
  • The Information System
  •  Scheduling 
  • The Logging and Bookkeeping
  • Other sources of information

5
The EGEE Grid
  • Flagship grid infrastructure project co-funded
    by the European Commission
  • Now in 3rd phase with 91 partners in 32
    countries
  • Main goal: operate a large-scale,
    production-quality grid
  • 300 sites, 140 partners, 50 countries
  • 100,000 cores, 20 PB
  • 10,000 users
  • 300,000 jobs/day
  • FP6 and FP7 funding: 100M

Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry,
Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics,
Life Sciences, Multimedia, Material Sciences
6
Towards a sustainable infrastructure
EC co-funding: 8 Million /year
"How can we reduce the effort required to operate
this expanding infrastructure?" (Bob Jones, talk at
EGEE08)
7
Towards a sustainable infrastructure
"How can we reduce the effort required to operate
this expanding infrastructure?" (Bob Jones, talk at
EGEE08)
  • "Computing systems that manage themselves in
    accordance with high-level objectives from
    humans." Kephart & Chess, The Vision of Autonomic
    Computing, IEEE Computer, 2003
  • Self-configuration, self-optimization, self-healing,
    self-protection
  • Of open, non-steady-state, dynamic systems

ECML/PKDD-06 workshop on Autonomic Computing: A
New Challenge for Machine Learning
8
Some immediate questions
  • Resource allocation
  • Performance of the gLite scheduling hierarchy
  • Responsive grids: everybody's grid
  • Common goods: prevent abuse of the grid
    resources
  • Dimensioning
  • Patterns and trends in requests and usage
  • Scalability of the information system
  • Dependability
  • Detection: black holes
  • Diagnosis: disappearing jobs
  • Performance of a probe-free approach

9
Autonomic Grids
  • Statistical analysis
  • Data mining
  • Machine learning

DATA REQUIRED
11
Challenges for autonomic grids research, and for
ontologies too
  • Curse of dimensionality: representation and
    computational complexity
  • Data sparsity: largely unexplored state/action
    space
  • No steady state
  • Data acquisition at full grid scale
  • Lack of interpretation
  • Many undocumented features
  • Cases of shifting and confusing terminology are not
    unknown

12
Grids?
Very large distributed systems
13
Grids?
  • Computing centers, strong admin: PC farms. Size
    > 1E4. Driving force: large scientific
    collaborations
  • Computing centers, iron admin: UHP parallel
    machines. Size 10. Driving force: nation-critical
    issues
  • Personal computers or desktops: volunteer
    computing or spare time. Size > 1E4. Driving force
  • And also enterprise grids
14
Grids?
  • GLUE

15
Glue: the interoperability specification
16
Two basic ideas (concepts?)
  • Virtual Organizations
  • Middleware (s)

17
Virtual Organizations
  • A scientific community, e.g.
  • Atlas, LHCb (LHC experiments), Biomed, Earth
    Science
  • VO-upsud: beginner grid users at P11
  • Function: "who, what, when to share" (I. Bird)
  • Operational: access and share rights
  • Representation in EGEE (EGI SSC)
  • Institutions
  • Generally none, and this is a major problem.
  • Real institutions are national or international
    structures (CNRS, INSERM, CERN, ...)
  • Community software and correlated activity

19
Middleware
  • Grids are very large distributed systems
  • Production grids are a federation of independent
    sites
  • Middleware: the set of software components that
    create the federation, giving a grid-wide scope for
  • Submitting and running jobs
  • Storing and retrieving data
  • Authorization and authentication
  • Each site acquires and maintains its own set of
    resources
  • Local policies: the result of an informal tradeoff
    between local requirements and EGEE needs. No
    SLAs, only MoUs.
  • The global policies emerge as a consequence of
    the local policies, e.g. don't look for "the
    scheduler". More generally, only partial
    information is available

20
Middleware - monitoring
  • Grids are very large distributed systems
  • Multiple sources of information, asynchronous and
    error-prone
  • Hardware failures
  • Software bugs
  • Inconsistent specifications
  • CIC tools (GOCDB, SAM, SFT, ...)
  • core gLite (LB, BDII, ...)
  • sites (Maui/PBS logs)
  • gLite integrators (R-GMA, Job Provenance)
  • experiment integrators (Dashboard)
  • external software (MonALISA)

21
Terminology
  • EGEE is
  • Stricto sensu, 3 EU projects: EGEE-I, II, III
  • Informally, the hardware infrastructure and human
    resources
  • gLite is
  • The official middleware developed in the course
    of the EGEE projects
  • A significant portion is a heavy re-engineering of
    various software suites

22
Outline
  • The EGEE grid overview
  • The EU flagship infrastructure
  • Autonomic computing (EGEE needs it)
  • Two concepts
  • The Grid Observatory
  • Which ontologies?
  • Data collection
  • Data publication
  • EGEE and gLite at work
  • Overview
  • The Information System
  •  Scheduling 
  • The Logging and Bookkeeping
  • Other sources of information

23
The Grid Observatory
  • Integrate the collection of data on the
    behaviour of the EGEE grid and users with the
    development of models and of an ontology for the
    domain knowledge
  • Ontology
  • Representation, organisation and automated inference
    on the native data, based on existing knowledge
    of operational structures, e.g. the lifecycle of a
    job, or Glue (this working group)
  • High-level models involving human research, e.g.
    GStrap clusters (Zhang et al. 09)
  • Which will later become "smart" data

24
Data Collection
  • Acquisition, consolidation, long-term
    conservation of traces of EGEE activities
  • Permanent storage of reliable, (exhaustive),
    filtered information
  • Reliability: data curation and provenance issues
  • Exhaustive: added value in snapshots of the
    inputs and grid state, e.g. workload and available
    services during a relevant time range
  • Filtered: from operational to structured
  • Since October 2008: the Grid Observatory portal
  • Collection and publication of raw data from gLite

27
The data
  • Traces from
  • The Information System (BDII)
  • Static and dynamic information on the resources
    (hardware and services): Glue information model
  • EGEE-wide (yes EGEE, not gLite)
  • The Logging and Bookkeeping service
  • Extensive log of all job-related events: gLite;
    so far, only one
  • Batch system(s): so far, from the GRIF/LAL site
  • WMS internals (condorG, wm_proxy, etc.): so far,
    from the GRIF/LAL site
  • RTM data
  • from the Real Time Monitor project
  • Summary of the lifecycle of jobs
  • Some information is outside the scope: external
    traffic on shared resources
  • Planned: all data- and file-related information

28
Data Publication
  • Initially, a heap of data "as it is"
  • Examples based on voluntary contributions
  • Goal: bootstrap interactions
  • Perspective: semantic organization
  • The Glue Information Model: an ontology of the
    resources
  • Ongoing integration with the OGF reference model
  • Concepts for the grid dynamics, e.g. job lifecycle
    or user relations
  • Expert concepts as prior knowledge of non-trivial
    correlations: workflows, failure modes, ...
  • Concepts for elementary analysis: grid snapshot
    (more a movie in fact)

29
Outline
  • The EGEE grid overview
  • The EU flagship infrastructure
  • Autonomic computing (EGEE needs it)
  • Two concepts
  • The Grid Observatory
  • Which ontologies?
  • Data collection
  • Data publication
  • EGEE and gLite at work
  • Overview
  • The Information System
  •  Scheduling 
  • The Logging and Bookkeeping
  • Other sources of information

30
Overview of gLite
31
Main components
  • User Interface (UI): the place where users
    log on to the Grid
  • Resource Broker (RB) / Workload Management System
    (WMS): matches the user requirements with the
    available resources on the Grid
  • Information System: characteristics and status of
    CE and SE
  • File and replica catalog: location of grid files
    and grid file replicas
  • Logging and Bookkeeping (LB): log information of
    jobs
  • Computing Element (CE): a batch queue on a site's
    computers where the user's job is executed
  • Storage Element (SE): provides (large-scale)
    storage for files
32
The EGEE information system (aka BDII)
  • Principles
  • Each site publishes
  • A description of the resources/services it
    provides, per VO
  • The current state of its resources (free CPUs,
    storage space, etc.)
  • Each VO publishes
  • What it has installed on each site (Software
    Managers' tags)
  • Actors query the IS to find out how to use the
    grid's services/resources

33
The EGEE information system (aka BDII)
  • Comes from the Globus toolkit
  • How?
  • On each site, the state of the services (static
    and dynamic information) is reported to
    servers
  • A central system queries these servers and
    stores this information in a database
  • This information is then accessible through the
    LDAP access protocol
  • The central system provides the information in a
    predefined schema: GlueSchema
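
Since the information system is read through LDAP and structured by the Glue schema, a client query boils down to an LDAP search filter over Glue object classes and attributes. The following is a minimal sketch, assuming the standard LDAP filter string syntax (RFC 4515); the helper names `escape_filter_value` and `glue_filter` are hypothetical and not part of gLite, and the sketch only builds the filter string without contacting any server.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters that RFC 4515 reserves inside a filter value."""
    replacements = {"\\": "\\5c", "*": "\\2a", "(": "\\28", ")": "\\29", "\0": "\\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def glue_filter(object_class: str, **equals: str) -> str:
    """Build an AND filter: one objectClass test plus one equality test per keyword.

    Hypothetical helper for illustration; a real client would hand the
    resulting string to an LDAP library together with a search base.
    """
    clauses = [f"(objectClass={escape_filter_value(object_class)})"]
    for attribute, value in equals.items():
        clauses.append(f"({attribute}={escape_filter_value(value)})")
    return "(&" + "".join(clauses) + ")"


# Ask for CE state entries whose queue is in production.
query = glue_filter("GlueCEState", GlueCEStateStatus="production")
print(query)
```

The attribute names come from the Glue schema slides; which server, port and search base to use depends on the deployment and is deliberately left out.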

34
Architecture
  • CACHES

Site BDII
35
Implementation: LDAP directory
Directory Information Tree
  • GRIS, GIIS and BDII are LDAP directory
    servers (openldap)
  • LDAP: Lightweight Directory Access Protocol
  • Data model
  • A tree of nodes
  • Each node (directory entry) contains
    attributes
  • The structure of a node is defined by one or more
    classes with a predefined schema.
  • Naming model
  • Distinguished Name (DN)
  • cn=Girard Pierre,ou=People,ou=cc,o=in2p3,c=fr
  • Information is imported into/exported from
    LDAP servers as files in LDIF format
    (LDAP Data Interchange Format)

o=in2p3,c=fr (root of the DIT)
  ou=lal
  ou=cc
    ou=People
      cn=Girard Pierre
  ou=lapp
Entry cn=Girard Pierre:
  cn: Girard Pierre
  phone: 33 9999999999
  uid: girardpi
  mail: pierre.girard_at_in2p3.fr
  objectClass: top
  objectClass: person
  objectClass: organizationalPerson
  objectClass: inetOrgPerson
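
The naming model and the LDIF exchange format can be illustrated with a small sketch. It assumes the usual attribute=value form of a DN and plain "attribute: value" LDIF lines, and it ignores escaped commas, multi-valued RDNs, line folding and base64 values, so it is not a full parser. The function names are hypothetical.

```python
def parse_dn(dn: str) -> list[tuple[str, str]]:
    """Split a simple DN into (attribute, value) pairs, leaf first.

    Simplification: no escaped commas and no multi-valued RDNs.
    """
    pairs = []
    for rdn in dn.split(","):
        attribute, _, value = rdn.strip().partition("=")
        pairs.append((attribute, value))
    return pairs


def parse_ldif_record(text: str) -> dict[str, list[str]]:
    """Parse one LDIF record made of plain 'attribute: value' lines."""
    entry: dict[str, list[str]] = {}
    for line in text.strip().splitlines():
        attribute, _, value = line.partition(":")
        entry.setdefault(attribute.strip(), []).append(value.strip())
    return entry


record = """
dn: cn=Girard Pierre,ou=People,ou=cc,o=in2p3,c=fr
cn: Girard Pierre
uid: girardpi
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
"""

entry = parse_ldif_record(record)
# Reading the DN from right to left gives the path from the root of the DIT.
path_from_root = list(reversed(parse_dn(entry["dn"][0])))
print(path_from_root)
print(entry["objectClass"])
```

Attributes such as objectClass may appear several times in one entry, which is why each attribute maps to a list of values.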
36
Glue Schema in EGEE/LCG Design
  • GLUE Schema (v1.3)
  • Definition of LDAP schemas describing each
    component of the grid's resources/services
  • https://forge.gridforum.org/sf/projects/glue-wg

37
Some examples from the Glue Schema (I)
  • 1. Some general attributes
  • Base class (objectclass GlueTop): no
    attributes
  • Schema version number (objectclass
    GlueSchemaVersion)
  • GlueSchemaVersionMajor: major schema version
    number
  • GlueSchemaVersionMinor: minor schema version
    number
  • 2. Attributes for the CE
  • Base class for the CE information (objectclass
    GlueCETop): no attributes
  • CE (objectclass GlueCE)
  • GlueCEUniqueID: unique identifier for the CE
  • GlueCEName: human-readable name of the
    service
  • CE status (objectclass GlueCEState)
  • GlueCEStateRunningJobs: number of running
    jobs
  • GlueCEStateWaitingJobs: number of jobs not
    running
  • GlueCEStateTotalJobs: total number of jobs
    (running + waiting)
  • GlueCEStateStatus: queue status, one of queueing
    (jobs accepted but not running), production
    (jobs accepted and run), closed (neither accepted
    nor run), draining (jobs not accepted but those
    already queued are running)
  • GlueCEStateWorstResponseTime: worst possible
    time between the submission of the job and the
    start of its execution
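
The CE state attributes above can be read as a small data structure. The sketch below is an illustration only: the class `CEState` and its field names are hypothetical, and the table simply restates the four queue statuses as described on the slide (whether jobs are accepted, and whether they are run).

```python
from dataclasses import dataclass

# (accepts new jobs, runs jobs) for each GlueCEStateStatus value listed above.
STATUS_SEMANTICS = {
    "queueing": (True, False),
    "production": (True, True),
    "closed": (False, False),
    "draining": (False, True),
}


@dataclass
class CEState:
    """Hypothetical in-memory view of a GlueCEState entry."""
    running_jobs: int   # GlueCEStateRunningJobs
    waiting_jobs: int   # GlueCEStateWaitingJobs
    status: str         # GlueCEStateStatus

    @property
    def total_jobs(self) -> int:
        # GlueCEStateTotalJobs is defined as running + waiting.
        return self.running_jobs + self.waiting_jobs

    def accepts_jobs(self) -> bool:
        return STATUS_SEMANTICS[self.status.lower()][0]

    def runs_jobs(self) -> bool:
        return STATUS_SEMANTICS[self.status.lower()][1]


state = CEState(running_jobs=12, waiting_jobs=30, status="draining")
print(state.total_jobs, state.accepts_jobs(), state.runs_jobs())
```

A consumer of published data could use the same relation as a consistency check, comparing a published GlueCEStateTotalJobs with the sum of the two other counters.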

38
Some examples from the Glue Schema (II)
  • 3. Attributes for the SE
  • Base class (objectclass GlueSETop): no
    attributes
  • Architecture (objectclass GlueSLArchitecture)
  • GlueSLArchitectureType: type of storage
    hardware (disk, tape, etc.)
  • Storage Service Access Protocol (objectclass
    GlueSEAccessProtocol)
  • GlueSEAccessProtocolType: protocol type to
    access or transfer files
  • GlueSEAccessProtocolPort: port number for the
    protocol
  • GlueSEAccessProtocolVersion: protocol version
  • GlueSEAccessProtocolAccessTime: time to
    access a file using this protocol
  • 4. Mixed attributes
  • Association between one CE and one or more
    SEs (objectclass GlueCESEBindGroup)
  • GlueCESEBindGroupCEUniqueID: unique ID for
    the CE
  • GlueCESEBindGroupSEUniqueID: unique ID for
    the SE
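
The CE/SE association is a one-to-many relation, which a client typically turns into a lookup table. A minimal sketch, assuming the bindings have already been extracted as (CE unique ID, SE unique ID) pairs; the identifiers and the function name `index_bindings` are made up for illustration.

```python
from collections import defaultdict


def index_bindings(bindings):
    """Group (ce_id, se_id) pairs into a CE -> sorted list of bound SEs."""
    by_ce = defaultdict(set)
    for ce_id, se_id in bindings:
        by_ce[ce_id].add(se_id)
    return {ce_id: sorted(se_ids) for ce_id, se_ids in by_ce.items()}


# Hypothetical identifiers standing in for GlueCESEBindGroupCEUniqueID
# and GlueCESEBindGroupSEUniqueID values.
pairs = [
    ("ce01.example.org", "se02.example.org"),
    ("ce01.example.org", "se01.example.org"),
    ("ce02.example.org", "se01.example.org"),
]
bound = index_bindings(pairs)
print(bound["ce01.example.org"])
```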

39
Job scheduling - no scheduler: matchmaking
[Diagram of the job flow. Labels: Information System (caches); submit job
(executable + small inputs); query; retrieve status and (small) output files;
create proxy; publish state; submit job; retrieve output; job status; logging;
register file; input file(s); process; VO Management Service (DB of VO users);
output file(s); Logging and Bookkeeping]
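
Matchmaking, as opposed to central scheduling, can be sketched as a filter followed by a ranking over the CE descriptions published in the information system. The example below is a simplified illustration, not the gLite WMS algorithm: the `supported_vos` field, the requirement (queue in production, VO supported) and the ranking rule (fewest waiting jobs first) are assumptions chosen for the example.

```python
def matchmake(job, computing_elements):
    """Return the CEs able to take the job, best candidate first.

    Assumed requirement: the queue is in production and supports the job's VO.
    Assumed rank: fewest waiting jobs, ties broken by CE identifier.
    """
    candidates = [
        ce for ce in computing_elements
        if ce["GlueCEStateStatus"].lower() == "production"
        and job["vo"] in ce["supported_vos"]
    ]
    return sorted(
        candidates,
        key=lambda ce: (ce["GlueCEStateWaitingJobs"], ce["GlueCEUniqueID"]),
    )


# Hypothetical snapshot of what three sites publish.
ces = [
    {"GlueCEUniqueID": "ce-a.example.org", "GlueCEStateStatus": "Production",
     "GlueCEStateWaitingJobs": 40, "supported_vos": {"atlas", "biomed"}},
    {"GlueCEUniqueID": "ce-b.example.org", "GlueCEStateStatus": "Production",
     "GlueCEStateWaitingJobs": 5, "supported_vos": {"biomed"}},
    {"GlueCEUniqueID": "ce-c.example.org", "GlueCEStateStatus": "Draining",
     "GlueCEStateWaitingJobs": 0, "supported_vos": {"biomed"}},
]

ranked = matchmake({"vo": "biomed"}, ces)
print([ce["GlueCEUniqueID"] for ce in ranked])
```

Because the decision relies on published (and possibly cached) state, the chosen CE may no longer be the best one by the time the job arrives, which is one reason only partial information is available to any global policy.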
40
The lifecycle of a job
  • Log: RTM and LB

41
WMS architecture
42
And more!
  • 50 more slides on job management

43
The LB
  • Detailed log of the lifecycle
  • 3 tables
  • Timestamps
  • Source, destination
  • Operational tags
  • And the verbatim logs of the services
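
With timestamps, sources, destinations and tags, the lifecycle of a job can be rebuilt by grouping events per job and ordering them in time. A minimal sketch under stated assumptions: the record layout (`job_id`, `timestamp`, `source`, `destination`, `tag`), the tag values and the sample events are hypothetical and do not reproduce the actual LB tables.

```python
from collections import defaultdict


def lifecycles(events):
    """Group events by job and order each group by timestamp."""
    by_job = defaultdict(list)
    for event in events:
        by_job[event["job_id"]].append(event)
    return {
        job_id: sorted(job_events, key=lambda e: e["timestamp"])
        for job_id, job_events in by_job.items()
    }


def time_in_states(ordered_events):
    """Seconds between each event and the next, keyed by the earlier event's tag."""
    return {
        earlier["tag"]: later["timestamp"] - earlier["timestamp"]
        for earlier, later in zip(ordered_events, ordered_events[1:])
    }


# Hypothetical events, deliberately out of order, with timestamps in seconds.
events = [
    {"job_id": "job-1", "timestamp": 130, "source": "CE", "destination": "LRMS", "tag": "Running"},
    {"job_id": "job-1", "timestamp": 100, "source": "UI", "destination": "WMS", "tag": "Submitted"},
    {"job_id": "job-1", "timestamp": 400, "source": "LRMS", "destination": "CE", "tag": "Done"},
    {"job_id": "job-1", "timestamp": 110, "source": "WMS", "destination": "CE", "tag": "Scheduled"},
]

ordered = lifecycles(events)["job-1"]
print([e["tag"] for e in ordered])
print(time_in_states(ordered))
```

Per-state durations of this kind are one possible input for the dependability questions raised earlier, such as spotting jobs that disappear or sites where jobs end abnormally fast.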

44
And more!
  • 30 more slides on the LB

45
Outline
  • The EGEE grid overview
  • The EU flagship infrastructure
  • Autonomic computing (EGEE needs it)
  • Two concepts
  • The Grid Observatory
  • Which ontologies?
  • Data collection
  • Data publication
  • EGEE and gLite at work
  • Overview
  • The Information System
  •  Scheduling 
  • The Logging and Bookkeeping
  • Other sources of information

46
Other information sources
  • SAM (Service Availability Monitoring)
  • https://lcg-sam.cern.ch:8443/sam/sam.py
  • Periodic submission of tests to the sites
  • NB: requires a grid certificate
  • GOC DB - CIC portal
  • https://goc.gridops.org/site/list?id=239
  • The LDAP URL of each site's Site BDII
  • The status/type (Certified/Production) of each
    site
  • Any declared "Scheduled
    Downtime"

47
And also
  • Information on the VOs: with your browser,
    connect to the CIC site
  • http://cic.gridops.org/index.php?section=home&page=homepage
  • Select the "vo" menu
  • Select VOidCard, then a VO of your choice
  • Accounting: with your browser, connect to
  • http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.html

48
Further information, references
  • EGEE
  • http://www.eu-egee.org/
  • gLite middleware
  • http://www.glite.org
  • gLite manuals, documentation
  • http://glite.web.cern.ch/glite/documentation/
    (gLite user guide)
  • Nice presentations: https://twiki.cern.ch/twiki/bin/view/EGEE/EGEEgLitePresentations