Title: Technology and Social Impact of GRIDs


1
Technology and Social Impact of GRIDs
  • Federico Ruggieri INFN
  • Rome, La Sapienza, 18 October 2006

2
Contents
  • What is the Grid?
  • From High Energy Physics to the other sciences
  • The European GRID infrastructure (EGEE)
  • Applications and scientific communities
  • Geographic areas
  • Social impact and the Digital Divide
  • Lights and shadows
  • Conclusions

3
What GRID is supposed to be
"A computational grid is a hardware and software
infrastructure that provides dependable,
consistent, pervasive, and inexpensive access to
high-end computational capabilities." (I. Foster,
C. Kesselman, The Grid: Blueprint for a New
Computing Infrastructure, Morgan Kaufmann, 1998)
  • A dependable infrastructure that can facilitate
    the use of distributed resources by many
    geographically distributed groups of people, or
    Virtual Organizations.
  • The GRID paradigm is an extension of the Web
    paradigm, which was originally limited to
    distributed access to distributed information
    and documents.
  • The classic example is the power grid: you plug
    in and receive power without knowing (or caring)
    where it comes from.

4
A Grid Checklist
  • Ian Foster more recently suggested that a Grid is
    a system that:
  • 1) coordinates resources that are not subject to
    centralized control (A Grid integrates and
    coordinates resources and users that live within
    different control domains, for example, the
    user's desktop vs. central computing; different
    administrative units of the same company; or
    different companies; and addresses the issues of
    security, policy, payment, membership, and so
    forth that arise in these settings. Otherwise, we
    are dealing with a local management system.)
  • 2) using standard, open, general-purpose
    protocols and interfaces (A Grid is built from
    multi-purpose protocols and interfaces that
    address such fundamental issues as
    authentication, authorization, resource
    discovery, and resource access. As I discuss
    further below, it is important that these
    protocols and interfaces be standard and open.
    Otherwise, we are dealing with an
    application-specific system.)
  • 3) to deliver nontrivial qualities of service.
    (A Grid allows its constituent resources to be
    used in a coordinated fashion to deliver various
    qualities of service, relating for example to
    response time, throughput, availability, and
    security, and/or co-allocation of multiple
    resource types to meet complex user demands, so
    that the utility of the combined system is
    significantly greater than that of the sum of its
    parts.)

5
A bit of (my) short history of Grids
  • In the 1980s and early 1990s the emphasis was on
    client-server and meta-computing.
  • 1998: I. Foster, C. Kesselman, The Grid:
    Blueprint for a New Computing Infrastructure;
    Globus project (www.globus.org).
  • First GRID presentation at CHEP98 in Chicago.
  • 1999-2000: INFN-GRID project started, based on
    Globus; GridPP in the UK; CHEP2000 in Padova.
  • 2000-2003: first EU project, DataGRID; PPDG and
    GriPhyN in the US.
  • 2003-2006: EGEE project in the EU and OSG in the
    US.
  • Many other projects in many countries (Japan,
    China, etc.).

6
A GRID for LHC and HEP
  • We got involved in Grids to solve the huge LHC
    computational problem, which at that time was
    just starting to be investigated (after an
    initial underestimate).
  • In the late 1990s client-server and
    meta-computing were the frontier, and computer
    farms (Beowulf) had only just appeared.
  • The biggest problem, however, was the huge amount
    of data (PB) expected to be produced and analyzed.
  • The social challenge was to let thousands of
    physicists easily access those data from tens of
    countries on different continents.

7
LHC Computational Problem
Several PetaBytes (10^15 bytes) of data every year
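To put "several PetaBytes every year" in perspective, a back-of-the-envelope calculation can convert a yearly volume into the average sustained rate the infrastructure has to absorb. The volumes tried below (1, 5 and 10 PB/year) are illustrative assumptions, since the slide only says "several", and the result is a pure average that ignores peaks and reprocessing.

```python
# Average sustained data rate implied by a yearly data volume.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s (non-leap year)
PETABYTE = 10**15                   # bytes, decimal definition as on the slide


def sustained_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average rate in MB/s (10**6 bytes/s) for a given PB/year volume."""
    bytes_per_second = petabytes_per_year * PETABYTE / SECONDS_PER_YEAR
    return bytes_per_second / 10**6


for volume in (1, 5, 10):  # hypothetical yearly volumes
    print(f"{volume:>2} PB/year -> {sustained_rate_mb_per_s(volume):6.1f} MB/s")
```

One PB per year already corresponds to roughly 31.7 MB/s around the clock, before any replication to other sites.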
8
Estimate of Computing needs at CERN for LHC (2000)
9
Extension of the Web Paradigm
Web: uniform access to information and documents
Grid: flexible and high-performance access to (any
kind of) resources: computers, data stores,
software catalogs, sensor nets, colleagues
On-demand creation of powerful virtual computing
and data systems
10
A Power GRID for Computing
11
DataGRID Layered structure (2000)
12
Natural HEP Farming: High Throughput Computing
13
GRID Computing Farm
14
Basic GRID strategy
  • Use as much as possible of what already exists:
  • Network: Internet and TCP/IP
  • Protocols: HTTP, TCP, UDP, ...
  • Operating systems: Linux, Solaris, ...
  • Batch systems: PBS, LSF, Condor, ...
  • Storage: disks, HPSS, HSM, CASTOR, ...
  • Directory services: LDAP, ...
  • Certificates: X.509
  • Create a software layer (middleware) to
    interface the services.
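The idea of a middleware layer that interfaces existing services can be sketched as an adapter pattern: the grid exposes one submission call, and a per-site adapter translates it for the local batch system. This is a minimal illustration, not how any real gLite component is written; `qsub`, `bsub` and `condor_submit` are the real submission commands of PBS, LSF and Condor, but the argument conventions here are simplified, the `.sub` naming is a hypothetical convention, and nothing is actually executed.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BatchSystem(ABC):
    """Uniform interface the middleware offers on top of any local batch system."""

    @abstractmethod
    def submit_command(self, script: str) -> list[str]:
        """Return (without running it) the command that submits `script`."""


class PBS(BatchSystem):
    def submit_command(self, script: str) -> list[str]:
        return ["qsub", script]


class LSF(BatchSystem):
    def submit_command(self, script: str) -> list[str]:
        return ["bsub", script]


class Condor(BatchSystem):
    def submit_command(self, script: str) -> list[str]:
        # Condor submits a description file, not the script itself; this
        # sketch assumes a hypothetical "<name>.sub" file next to the script.
        return ["condor_submit", script.rsplit(".", 1)[0] + ".sub"]


ADAPTERS: dict[str, BatchSystem] = {"pbs": PBS(), "lsf": LSF(), "condor": Condor()}


def submit(batch_system: str, script: str) -> list[str]:
    """Grid-level entry point: callers never deal with the site's batch system."""
    return ADAPTERS[batch_system].submit_command(script)


for name in ADAPTERS:
    print(name, submit(name, "job.sh"))
```

Adding support for another batch system only means adding one adapter; the grid-level `submit` call and its users stay unchanged.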

15
Middleware structure
  • Applications have access both to Higher-Level
    Grid Services and to the Foundation Grid
    Middleware.
  • Higher-Level Grid Services are meant to help
    users build their computing infrastructure, but
    should not be mandatory.
  • The Foundation Grid Middleware will be deployed
    on the EGEE infrastructure. It:
  • must be complete and robust;
  • should allow interoperation with other major
    grid infrastructures;
  • should not assume the use of Higher-Level Grid
    Services.

Applications
Higher-Level Grid Services: Workload Management,
Replica Management, Visualization, Workflow, Grid
Economies, ...
Foundation Grid Middleware: security model and
infrastructure, Computing Elements (CE) and
Storage Elements (SE), accounting, information
and monitoring
Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
16
Some history: LHC → EGEE Grid
  • 1999: MONARC project, early discussions on how
    to organise distributed computing for LHC.
  • 2000: growing interest in grid technology; the
    HEP community was the driver in launching the
    DataGrid project.
  • 2001-2004: EU DataGrid project, a middleware
    testbed for an operational grid.
  • 2002-2005: LHC Computing Grid (LCG), deploying
    the results of DataGrid to provide a production
    facility for the LHC experiments.
  • 2004-2006: EU EGEE project, phase 1: starts from
    the LCG grid, a shared production infrastructure
    expanding to other communities and sciences.
  • 2006-2008: EU EGEE-II, building on phase 1 and
    expanding applications and communities.
  • ...and in the future: a worldwide grid
    infrastructure? Interoperating and co-operating
    infrastructures?

17
The release of gLite 3.0
  • Convergence of LCG 2.7.0 and gLite 1.5.0 in
    spring 2006
  • Continuity on the production infrastructure
    ensured usability by applications
  • Initial focus on the new Job Management
  • Thorough testing and optimization together with
    the applications
  • Migration to the ETICS build system
  • ETICS project started in January
  • Reorganization of the work according to the new
    process
  • EGEE Technical Coordination Group and Task Forces
  • Start of the EGEE SA3 Activity for integration
    and certification
  • Continuous release process
  • No big-bang releases!

18
Production service
  • Size of the infrastructure today:
  • 192 sites in 40 countries
  • 25,000 CPUs
  • 3 PB of disk, plus tape MSS

19
EGEE Resources
20
(Some) GRID Services
  • Workload Management System (Resource Broker):
    chooses the best resources matching the user's
    requirements.
  • Virtual Organization Management System: maps
    user certificates to VOs, describing the rights
    and roles of the users.
  • Data-oriented services: data and metadata
    catalogs, data mover, replica manager, etc.
  • Information and monitoring services: tell which
    resources and services are available, and where.
  • Accounting services: extract resource usage
    levels per user, group of users, and VO.
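How a Resource Broker "chooses the best resources matching the user requirements" can be sketched as matchmaking: discard the Computing Elements that fail the job's Requirements, then sort the rest by its Rank. This is a simplified sketch, not the WMS implementation; the CE names and values are invented, and the attribute names are only modeled on the GLUE 1.x schema published by the information system.

```python
from __future__ import annotations

from typing import Callable

CE = dict  # a CE description: attribute name -> value

# Hypothetical CEs with invented values, GLUE-style attribute names.
computing_elements: list[CE] = [
    {"id": "ce01.example.org", "GlueCEStateEstimatedResponseTime": 600,
     "GlueCEPolicyMaxCPUTime": 2880},
    {"id": "ce02.example.org", "GlueCEStateEstimatedResponseTime": 7200,
     "GlueCEPolicyMaxCPUTime": 4320},
    {"id": "ce03.example.org", "GlueCEStateEstimatedResponseTime": 60,
     "GlueCEPolicyMaxCPUTime": 720},
]


def match(ces: list[CE], requirements: Callable[[CE], bool],
          rank: Callable[[CE], float]) -> list[CE]:
    """Keep the CEs satisfying `requirements`, highest `rank` first."""
    candidates = [ce for ce in ces if requirements(ce)]
    return sorted(candidates, key=rank, reverse=True)


# In the spirit of a JDL pair such as:
#   Requirements = other.GlueCEPolicyMaxCPUTime > 1000;
#   Rank = -other.GlueCEStateEstimatedResponseTime;
ranked = match(
    computing_elements,
    requirements=lambda ce: ce["GlueCEPolicyMaxCPUTime"] > 1000,
    rank=lambda ce: -ce["GlueCEStateEstimatedResponseTime"],
)
print([ce["id"] for ce in ranked])  # ['ce01.example.org', 'ce02.example.org']
```

The fastest CE (ce03) is excluded because its CPU-time limit fails the requirement; among the remaining ones, the shorter estimated response time wins.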

21
Job Submission with Brokering
22
Security: VOMS
  • For user authentication, GRID uses digital
    certificates (X.509) through the Globus/GSI
    service, together with a system that maps them
    to the local UIDs and GIDs of each system.
  • GRID aims to make it easier for virtual
    communities, tied to a type of activity and/or
    to experiments/projects, to use distributed
    resources.
  • A system is therefore needed that not only
    guarantees AAA (Authentication, Authorization,
    Accounting) in a nearly uniform way, but can
    also manage access policies, including complex
    ones.
  • The Virtual Organization Management System
    (VOMS) tries to meet these needs by simplifying
    the management of virtual communities and the
    definition of user roles and groups, which are
    then used to manage access policies to the
    resources.
  • VOMS also lets resource owners define which
    communities (Virtual Organizations) can access
    their resources, and with which limits.
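The mapping between certificates and local UIDs mentioned above was traditionally done on Globus-based sites with a grid-mapfile, where each line holds a quoted certificate Subject DN followed by one or more comma-separated local accounts. The sketch below only parses such a file and looks up an account; the DNs and account names are hypothetical, and real sites layer pool accounts and VOMS-aware mapping on top of this.

```python
from __future__ import annotations

import shlex

# Hypothetical grid-mapfile content: quoted DN, then local account(s).
GRIDMAP = '''
# DN                                                          local account(s)
"/C=IT/O=INFN/OU=Personal Certificate/L=Roma/CN=Mario Rossi"   mrossi
"/C=IT/O=INFN/OU=Personal Certificate/L=Bari/CN=Anna Bianchi"  atlas001,atlas002
'''


def parse_gridmap(text: str) -> dict[str, list[str]]:
    """Return {subject DN: [local accounts]} from grid-mapfile text."""
    mapping: dict[str, list[str]] = {}
    for line in text.splitlines():
        fields = shlex.split(line, comments=True)  # handles quotes and '#'
        if len(fields) != 2:
            continue  # blank, comment-only or malformed line
        dn, accounts = fields
        mapping[dn] = accounts.split(",")
    return mapping


def local_account(gridmap: dict[str, list[str]], subject_dn: str) -> str | None:
    """First local account mapped to the DN, or None if the user is unknown."""
    accounts = gridmap.get(subject_dn)
    return accounts[0] if accounts else None


gm = parse_gridmap(GRIDMAP)
print(local_account(gm, "/C=IT/O=INFN/OU=Personal Certificate/L=Roma/CN=Mario Rossi"))
```

An unknown DN yields `None`, which is where a site would refuse access.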

23
Public Key Infrastructure
  • Based on asymmetric algorithms
  • Two keys: a private key and a public key
  • It is practically impossible to derive the
    private key from the public one.
  • Data encrypted with one key can only be
    decrypted with the other.
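The key property stated above, that data encrypted with one key can only be decrypted with the other, can be shown with textbook RSA on tiny primes. This is a toy illustration only, deliberately insecure: real certificates use moduli of 2048 bits or more and padding schemes, and here deriving the private exponent from the public key is trivial only because n is small enough to factor.

```python
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                 # modulus, shared by both keys (3233)
phi = (p - 1) * (q - 1)   # 3120, computable only if you can factor n
e = 17                    # public key: (e, n)
d = pow(e, -1, phi)       # private key: (d, n), modular inverse of e


def apply_key(message: int, exponent: int, modulus: int) -> int:
    """Both encryption and decryption are modular exponentiation."""
    return pow(message, exponent, modulus)


m = 65
c = apply_key(m, e, n)           # encrypt with the public key...
assert apply_key(c, d, n) == m   # ...only the private key recovers m

s = apply_key(m, d, n)           # "encrypt" with the private key = sign...
assert apply_key(s, e, n) == m   # ...anyone holding the public key can verify
print(f"n={n} d={d} c={c}")
```

The second pair of lines is the basis of authentication in GSI: only the holder of the private key can produce `s`, while anyone can check it with the public key contained in the certificate.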

24
GRID Security Infrastructure
  • Based on Public Key Infrastructure (PKI)
  • Certification of personal identity: a key <-> a
    user / physical person
  • PKI asymmetric encryption
  • X.509 certificates
  • Identification of computers and services with
    PKI certificates

25
X.509 Certificate
  • ITU-T standard for PKI
  • IETF PKIX: Internet profile of X.509v3
    certificates and CRLs
Structure of an X.509 certificate:
  • Certificate
    • Version
    • Serial Number
    • Algorithm ID
    • Issuer
    • Validity
    • Subject
    • Subject Public Key Info
      • Public Key Algorithm
      • Subject Public Key
    • Issuer Unique Identifier (optional)
    • Subject Unique Identifier (optional)
    • Extensions (optional)
    • ...
  • Certificate Signature Algorithm
  • Certificate Signature
26
CA Proxy
  • National/regional Certification Authorities
    issue certificates.
  • Short-lived proxy certificates are used so that
    the real certificates do not have to be stored
    together with the applications (delegation).

grid-proxy-init
Your identity: /C=IT/O=INFN/OU=Personal/L...
Enter GRID pass phrase for this identity:
Creating proxy ............................................. Done
Your proxy is valid until Thu Aug 31 21:56:18 2006
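Delegation with proxies can be sketched with plain data structures: the user's certificate "issues" a short-lived proxy whose subject extends the user's DN, and a service accepts the proxy only if the user issued it and both are within their validity window. This sketch leaves out all cryptography (a real proxy is signed with the user's private key). The "/CN=proxy" suffix mimics the legacy Globus style, the 12-hour lifetime is an assumption, and the DNs and dates are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime

    def valid_at(self, t: datetime) -> bool:
        return self.not_before <= t <= self.not_after


def make_proxy(user: Cert, now: datetime, hours: int = 12) -> Cert:
    """Issue a short-lived proxy on behalf of `user` (no crypto in this sketch).

    The proxy never outlives the user certificate that issued it.
    """
    return Cert(
        subject=user.subject + "/CN=proxy",
        issuer=user.subject,
        not_before=now,
        not_after=min(now + timedelta(hours=hours), user.not_after),
    )


def chain_ok(proxy: Cert, user: Cert, t: datetime) -> bool:
    """Accept the proxy if the user issued it and both are valid at time t."""
    return proxy.issuer == user.subject and proxy.valid_at(t) and user.valid_at(t)


user = Cert("/C=IT/O=INFN/CN=Example User", "/C=IT/O=INFN/CN=Example CA",
            datetime(2006, 1, 1), datetime(2007, 1, 1))
now = datetime(2006, 8, 31, 9, 0)
proxy = make_proxy(user, now)
print(chain_ok(proxy, user, now + timedelta(hours=6)))   # True
print(chain_ok(proxy, user, now + timedelta(hours=13)))  # False: proxy expired
```

If the proxy file is stolen, the damage is limited to its short lifetime, while the long-lived user certificate and its private key never leave the user's machine.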
27
Architettura di VOMS
Attributes:
  • Group (hierarchically organized)
  • Role (admin, staff, student, ...)
  • Capability (free-form string)
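VOMS carries these three attributes as FQANs (Fully Qualified Attribute Names) of the form `/vo/group/subgroup/Role=.../Capability=...`. The sketch below splits an FQAN into group, role and capability and checks hierarchical group membership; the `/atlas/higgs` group is a hypothetical example, and real services apply further policy on top of this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Fqan:
    group: str               # hierarchical group path, e.g. "/atlas/higgs"
    role: str | None         # None when absent or "NULL"
    capability: str | None   # free-form string; None when absent or "NULL"


def _null_to_none(value: str | None) -> str | None:
    return None if value in (None, "NULL") else value


def parse_fqan(fqan: str) -> Fqan:
    """Split '/vo/group/.../Role=r/Capability=c' into its three attributes."""
    groups: list[str] = []
    role = capability = None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            role = part[len("Role="):]
        elif part.startswith("Capability="):
            capability = part[len("Capability="):]
        else:
            groups.append(part)
    return Fqan("/" + "/".join(groups), _null_to_none(role), _null_to_none(capability))


def in_group(fqan: Fqan, group: str) -> bool:
    """Hierarchical membership: /atlas/higgs also belongs to /atlas."""
    group = group.rstrip("/")
    return fqan.group == group or fqan.group.startswith(group + "/")


f = parse_fqan("/atlas/higgs/Role=production/Capability=NULL")
print(f)  # Fqan(group='/atlas/higgs', role='production', capability=None)
print(in_group(f, "/atlas"))  # True
```

Comparing path prefixes with a trailing "/" keeps "/atlasx" from being mistaken for a subgroup of "/atlas".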
28
IS Components
Abbreviations: BDII = Berkeley Database Information
Index; GIIS = Grid Index Information Server;
GRIS = Grid Resource Information Server
  • Local GRISes run on the CEs and SEs at each site
    and report dynamic and static information.
  • At each site, a site GIIS collects the
    information given by the GRISes.
  • Each site can run a BDII, which collects the
    information coming from the GIISes.
  • From LCG 2.3.0 the site GIIS has been replaced by
    a local BDII.
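The hierarchy above (resource-level GRIS, site-level index, top-level BDII) is essentially repeated aggregation. The sketch below models it with in-memory records instead of LDAP; the site names, record fields and values are all hypothetical, and the `query` function is only a stand-in for an LDAP search.

```python
from __future__ import annotations

# Hypothetical GRIS output per site: one record per CE or SE, invented values.
GRIS_OUTPUT: dict[str, list[dict]] = {
    "SITE-A": [
        {"kind": "CE", "id": "ce.site-a.example", "free_cpus": 8},
        {"kind": "SE", "id": "se.site-a.example", "free_tb": 12},
    ],
    "SITE-B": [
        {"kind": "CE", "id": "ce.site-b.example", "free_cpus": 30},
    ],
}


def site_bdii(site: str, gris_records: list[dict]) -> list[dict]:
    """Site level: collect what the local GRISes publish, tagged with the site."""
    return [{**record, "site": site} for record in gris_records]


def top_bdii(all_sites: dict[str, list[dict]]) -> list[dict]:
    """Top level: merge the information of every site into one index."""
    index: list[dict] = []
    for site, records in all_sites.items():
        index.extend(site_bdii(site, records))
    return index


def query(index: list[dict], **filters) -> list[dict]:
    """Stand-in for an LDAP search: exact match on every given attribute."""
    return [r for r in index if all(r.get(k) == v for k, v in filters.items())]


index = top_bdii(GRIS_OUTPUT)
ces = query(index, kind="CE")
print([ce["id"] for ce in ces], sum(ce["free_cpus"] for ce in ces))
```

A client such as the Resource Broker only talks to the top level, which is why replacing the site GIIS with a site BDII does not change how jobs are matched.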
29
WMS and JDL
  • The Workload Management System (WMS) is the gLite
    3.0 component that lets users submit jobs and
    performs all the tasks required to execute them,
    without exposing the user to the complexity of
    the Grid.
  • The JDL attributes described here are the ones
    supported when jobs are submitted to the WMS
    through the legacy Network Server interface,
    i.e. using the Python command-line interface or
    the C/Java API of the gLite WMS-UI subsystem.
    They are a subset of the full set of attributes
    supported by the WMS when it is accessed via the
    new web-services-based interface (WMProxy).

30
JDL example
  • Type = "Job";
  • JobType = "Normal";
  • Executable = "ls";
  • StdError = "stderr.log";
  • StdOutput = "stdout.log";
  • Arguments = "-lrt";
  • InputSandbox = {"start_hostname.sh"};
  • OutputSandbox = {"stderr.log", "stdout.log"};
  • RetryCount = 7;
  • VirtualOrganisation = "ATLAS";
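When many similar jobs are needed, JDL files are often generated rather than written by hand. The sketch below renders a Python dict into ClassAd-style JDL (`attribute = value;`, strings in double quotes, lists in braces, everything wrapped in square brackets). It is a hypothetical helper, not part of the gLite UI, and for simplicity it rejects strings containing double quotes instead of escaping them.

```python
def to_jdl(attributes: dict) -> str:
    """Render a flat dict as a ClassAd-style JDL job description."""

    def render(value) -> str:
        if isinstance(value, bool):            # check bool before int
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        if isinstance(value, (list, tuple)):
            return "{" + ", ".join(render(v) for v in value) + "}"
        text = str(value)
        if '"' in text:
            raise ValueError(f"quotes not supported in this sketch: {text!r}")
        return f'"{text}"'

    body = "\n".join(f"  {key} = {render(val)};" for key, val in attributes.items())
    return f"[\n{body}\n]"


# The attributes of the JDL example above.
job = {
    "Type": "Job",
    "JobType": "Normal",
    "Executable": "ls",
    "StdError": "stderr.log",
    "StdOutput": "stdout.log",
    "Arguments": "-lrt",
    "InputSandbox": ["start_hostname.sh"],
    "OutputSandbox": ["stderr.log", "stdout.log"],
    "RetryCount": 7,
    "VirtualOrganisation": "ATLAS",
}
print(to_jdl(job))
```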

31
Job Types
  • Normal: a simple batch job
  • Interactive: a job whose standard streams are
    forwarded to the submitting client
  • MPICH: a parallel application using the MPICH-P4
    implementation of MPI
  • Partitionable: a set of independent sub-jobs,
    each one taking care of a step or of a sub-set
    of steps, which can be executed in parallel
  • Checkpointable: a job able to save its state, so
    that its execution can be suspended and resumed
    later, starting from the same point where it was
    stopped.
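The Checkpointable job type relies on the job saving its own state. A minimal sketch of that pattern, not the gLite checkpointing API: the job writes its progress to a state file after every step (atomically, through a temporary file) and, when restarted with the same file, resumes from the last saved step. The summing loop is a placeholder for real work, and the interruption is simulated with a hypothetical `Suspended` exception.

```python
from __future__ import annotations

import json
import os
import tempfile


class Suspended(Exception):
    """Raised to simulate the job being stopped part-way through."""


def run(total_steps: int, state_file: str, stop_after: int | None = None) -> int:
    """Sum 0..total_steps-1, checkpointing the state after every step.

    Rerunning with the same state_file resumes from the last checkpoint.
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(state_file):
        with open(state_file) as f:
            state = json.load(f)                  # resume from saved state
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run == stop_after:
            raise Suspended(f"suspended at step {state['step']}")
        state["acc"] += state["step"]             # the actual "work"
        state["step"] += 1
        steps_this_run += 1
        tmp = state_file + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, state_file)               # atomic checkpoint update
    return state["acc"]


path = os.path.join(tempfile.mkdtemp(), "job.state")
try:
    run(10, path, stop_after=4)
except Suspended as exc:
    print(exc)              # suspended at step 4
print(run(10, path))        # resumes at step 4 and prints 45
```

Writing to a temporary file and renaming it means a crash during the checkpoint itself leaves the previous valid state in place.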

32
User Interface (UI)
33
Applications
  • Many applications from a growing number of
    domains
  • Astrophysics
  • Computational Chemistry
  • Earth Sciences
  • Financial Simulation
  • Fusion
  • Geophysics
  • High Energy Physics
  • Life Sciences
  • Multimedia
  • Material Sciences

Applications have moved from testing to routine
daily usage, with 80-90% efficiency
34
(No Transcript)
35
ARGO: data archives and data catalog synchronization
36
Biology Applications: The never born proteins
  • Natural proteins are only a tiny fraction of the
    possible ones.
  • Approx. 10^13 natural proteins vs 20^100 possible
    proteins with a chain length of 100 amino acids!
  • Does the subset of natural proteins have
    particular properties?
  • Can protein scaffolds exist, in principle, with
    novel structures and/or activities not yet
    exploited by Nature?
  • GRID technology makes it possible to tackle the
    problem through high-throughput prediction of the
    structures of a large library of never born
    proteins.
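The gap between 10^13 natural proteins and 20^100 possible sequences is easiest to grasp on a logarithmic scale. The short calculation below uses only the two figures from the slide (20 amino acids, chain length 100, about 10^13 natural proteins).

```python
import math

natural = 1e13               # approx. number of natural proteins (from the slide)
alphabet, length = 20, 100   # amino acid types, chain length

log10_possible = length * math.log10(alphabet)        # log10(20**100)
log10_fraction = math.log10(natural) - log10_possible

print(f"possible sequences ~ 10^{log10_possible:.1f}")    # ~ 10^130.1
print(f"natural fraction   ~ 10^{log10_fraction:.1f}")    # ~ 10^-117.1
```

Nature has explored roughly one sequence in 10^117, which is why even a "large library" of never born proteins can only be a sample, and why high-throughput structure prediction on the grid is needed to screen it.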

37
Where grids can help address neglected diseases
  • Contribute to the development and deployment of
    new drugs and vaccines
  • Improve collection of epidemiological data for
    research (modeling, molecular biology)
  • Improve the deployment of clinical trials in
    plagued areas
  • Speed-up drug discovery process (in silico
    virtual screening)
  • Improve disease monitoring
  • Monitor the impact of policies and programs
  • Monitor drug delivery and vector control
  • Improve epidemics warning and monitoring system
  • Improve the ability of developing countries to
    undertake health innovation
  • Strengthen the integration of life science
    research laboratories in the world community
  • Provide access to resources
  • Provide access to bioinformatics services

38
In silico Drug Discovery
  • Scientific objectives:
  • Provide docking information to help in the
    search for new drugs.
  • Biological goal: propose new inhibitors (drug
    candidates) for neglected diseases.
  • Bioinformatics goal: in silico virtual screening
    of drug candidate databases.
  • Grid goal: demonstrate to the research
    communities active in drug discovery the
    relevance of grid infrastructures, through the
    deployment of a compute-intensive application.
  • Method:
  • Large-scale molecular docking on malaria, to
    compute millions of potential drugs with
    selected software and parameter settings.
  • Docking is about computing the binding energy of
    a protein target to a library of potential drugs
    using a scoring algorithm.
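The method above maps naturally onto the grid: the compound library is split into independent jobs, each job scores its compounds against the target, and the results are merged to keep the best-binding candidates. In this sketch the scoring function is a hypothetical placeholder (a dictionary of invented energies in arbitrary units) standing in for real docking software, and the jobs run sequentially instead of in parallel on grid nodes.

```python
from __future__ import annotations

import heapq
from typing import Callable

Score = Callable[[str], float]  # compound -> binding energy (lower is better)


def chunks(library: list[str], size: int) -> list[list[str]]:
    """Split the compound library into independent jobs."""
    return [library[i:i + size] for i in range(0, len(library), size)]


def dock_job(compounds: list[str], score: Score) -> list[tuple[float, str]]:
    """One grid job: score every compound against the target."""
    return [(score(c), c) for c in compounds]


def screen(library: list[str], score: Score, job_size: int,
           n_hits: int) -> list[tuple[float, str]]:
    results: list[tuple[float, str]] = []
    for job in chunks(library, job_size):    # on a grid these jobs run in parallel
        results.extend(dock_job(job, score))
    return heapq.nsmallest(n_hits, results)  # lowest energy = strongest binding


# Invented energies (arbitrary units) standing in for real docking runs.
energies = {"L1": -5.2, "L2": -9.1, "L3": -7.4, "L4": -3.0, "L5": -8.8}
print(screen(list(energies), energies.__getitem__, job_size=2, n_hits=2))
# [(-9.1, 'L2'), (-8.8, 'L5')]
```

Because the jobs share nothing, throughput grows with the number of available CPUs, which is exactly the high-throughput pattern grids are good at.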

39
The virtual screening pipeline
Diagram: grid service customers are biology teams
and chemist/biologist teams (DC1 and DC2 on
neglected diseases, DC on avian flu); grid service
providers are chemoinformatics and bioinformatics
teams. Elements shown: target, docking services,
hits, MD service, selected hits, annotation
services, checkpoints, grid infrastructure.
40
Collaborating e-infrastructures
Potential for linking 80 countries by 2008
41
Social Impact
  • A large part of the globe does not yet have
    advanced digital infrastructures.
  • The European Research Area program aims to make
    Europe the most advanced region in
    eInfrastructures, and to help less advanced
    countries get up to speed, alleviating the
    so-called Digital Divide as much as possible.
  • eInfrastructures support widely distributed
    communities that share problems and resources to
    work towards common goals → they enhance the
    international collaboration of scientists → they
    promote collaboration in other fields.
  • Problems too big to be handled by conventional
    local computer clusters and time-sharing
    computing centers can be attacked with GRIDs.
  • eInfrastructures leverage international network
    interconnectivity → high-bandwidth connections
    will improve the exchange of knowledge and be the
    basis for GRID infrastructures.
  • Based on a safe AAA (Authentication,
    Authorization and Accounting) architecture →
    secure and dependable infrastructures.
  • Need for persistent middleware → software is an
    integral part of the infrastructure.

42
Digital Divide
http://maps.maplecroft.com/
43
Docking on Malaria
  • Of great interest to many developing countries.
  • Based on a grid-enabled drug discovery process.
  • Data challenge: never done before on a
    large-scale production infrastructure, nor for a
    neglected disease.
  • 5 different structures of the most promising
    target
  • Output data: 16.5 million results, 10 TB

44
First initiative on in silico drug discovery
against emerging diseases
  • Spring 2006: drug design against the H5N1
    neuraminidase, involved in virus propagation
  • impact of selected point mutations on the
    efficiency of existing drugs
  • identification of new potential drugs acting on
    mutated N1
  • Partners: LPC, Fraunhofer SCAI, Academia Sinica
    of Taiwan, ITB, Unimo University, CMBA,
    CERN-ARDA, HealthGrid
  • Grid infrastructures: EGEE, Auvergrid, TWGrid
  • European projects: EGEE-II, Embrace,
    BioinfoGrid, Share, Simdat

45
EUMEDGRID Geography
46
(No Transcript)
47
INCO International Scientific Cooperation
Projects: SUSTAINABLE WATER MANAGEMENT IN
MEDITERRANEAN COASTAL AQUIFERS: Recharge
Assessment and Modelling Issues (SWIMED),
ICA3-CT2002-10004
http://www.crs4.it/EIS/SWIMED/menu/index.html
  • Partnership
  • UGR, Spain (Coordinator)
  • IMFT, France
  • UNINE, Switzerland
  • CRS4, Italy
  • EMI, Morocco
  • UG, Palestinian Authority
  • INAT, Tunisia
  • UB, Spain

48
Hydrodynamic parameters
Aquifer transmissivities
Substratum (bedrock) elevation
Notable lack of transmissivity data!!
49
ArchaeoGRID (1/2)
50
ArchaeoGRID (2/2)
51
Priorities
  • The obvious questions are:
  • Is digital infrastructure a real priority for
    them?
  • Don't they have much more urgent, basic and
    compelling needs?
  • With a limited budget, what priority should
    these investments have with respect to more
    vital ones?

52
The Advancement chain
  • It's obvious that the fundamental needs are food,
    water, medical services, etc.
  • Although these are fundamental in the short term,
    a long-term solution can't be built only around
    the activities that feed the system.
  • A piece of Chinese wisdom: if you give a fish to
    a hungry man you feed him for a while, but if you
    teach him how to fish, you feed him for life.
  • Other activities are necessary to create
    favorable conditions for sustainable growth:
  • Agricultural development is needed to start
    producing food and employment, depending on the
    specific local situation.
  • Industry will be necessary to start social
    innovation and improve the quality of life.
  • Technology will be the indispensable element
    promoting new products and industrial
    innovation.
  • Science is a fundamental component for producing
    technology and long-term innovation.
  • Digital infrastructures are necessary to allow
    researchers to participate in frontier
    scientific activities and to keep up to speed
    with the most recent tools and methods.
  • So, at the root of these activities, with a
    long-term impact, digital infrastructures play
    an important role.

53
How to budget
  • Limiting ourselves to feeding the system would
    be an endless investment that does not solve the
    problem in the long term.
  • The investment has to be understood and
    evaluated over several (tens of) years, and
    should have a figure of merit with respect to the
    results obtained and the sustainability of
    future activities.
  • All the elements of the chain above should be
    investigated and promoted in parallel, at
    different levels of investment and with
    objectives on different time scales.

54
What is missing
  • Grids have not yet been fully embraced by
    industry. IBM, SUN, Oracle, etc. have grid-like
    products, but we are still far from the
    overwhelming success of the Web.
  • The infrastructure (management and maintenance)
    requires resources, mainly human, that cost
    money. This raises the problem of long-term
    sustainability.
  • Many resources are available, but still few of
    them are anything other than computing systems
    (e.g. radio telescopes, observatories and
    sensors, large fusion facilities, etc.).
  • High Energy Physics cannot yet hand over the
    baton to larger organizations that would take
    care of the infrastructure, so that physicists
    could go back to working on the experiments'
    applications.

55
Conclusions
  • Grid infrastructures are an expanding reality.
  • They can stimulate new aggregations of
    scientists working together on new challenges
    that have now become affordable.
  • They are the basis of eInfrastructures, which
    can promote high-bandwidth networks and take a
    small step forward in fighting the Digital
    Divide in developing countries.
  • But nothing comes for free: you need to know who
    is using the (your) resources and for which
    purpose. You need security, accounting and,
    possibly, billing systems.
  • The long-term sustainability of such a huge
    investment needs governmental priority and
    strong industrial uptake.