Title: Technology and Social Impact of GRIDs
1Technology and Social Impact of GRIDs
- Federico Ruggieri, INFN
- Rome, La Sapienza, 18 October 2006
2Contents
- What is Grid?
- From High Energy Physics to the other sciences
- The European GRID Infrastructure (EGEE)
- Applications and scientific communities
- Geographical areas
- Social impact and Digital Divide
- Lights and shadows
- Conclusions
3What GRID is supposed to be
"A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities." I. Foster and C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1998.
- A dependable infrastructure that can facilitate the use of distributed resources by many groups of distributed people, or Virtual Organizations.
- The GRID paradigm is an extension of the WEB one, which was originally limited to distributed access to distributed information and documents.
- The classical example is the Power GRID: you plug in and receive power, and you don't know (and don't care) where it comes from.
4A Grid Checklist
- Ian Foster more recently suggested that a GRID is a system that:
- 1) coordinates resources that are not subject to centralized control (A Grid integrates and coordinates resources and users that live within different control domains, for example, the user's desktop vs. central computing; different administrative units of the same company; or different companies; and addresses the issues of security, policy, payment, membership, and so forth that arise in these settings. Otherwise, we are dealing with a local management system.)
- 2) using standard, open, general-purpose protocols and interfaces (A Grid is built from multi-purpose protocols and interfaces that address such fundamental issues as authentication, authorization, resource discovery, and resource access. As I discuss further below, it is important that these protocols and interfaces be standard and open. Otherwise, we are dealing with an application-specific system.)
- 3) to deliver nontrivial qualities of service. (A Grid allows its constituent resources to be used in a coordinated fashion to deliver various qualities of service, relating for example to response time, throughput, availability, and security, and/or co-allocation of multiple resource types to meet complex user demands, so that the utility of the combined system is significantly greater than that of the sum of its parts.)
5A bit of (My) short history in Grids
- In the 1980s and early 1990s the accent was on client-server and meta-computing.
- In 1998: I. Foster and C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure; Globus project (www.globus.org).
- First GRID presentation at CHEP98 in Chicago.
- 1999-2000: INFN-GRID Project started, based on Globus; GridPP in the UK; CHEP2000 in Padova.
- 2000-2003: first EU project, DataGRID; PPDG and GriPhyN in the US.
- 2003-2006: EGEE Project in the EU and OSG in the US.
- Many other projects in many countries (Japan, China, etc.)
6A GRID for LHC and HEP
- We got involved in Grids to solve the huge LHC computational problem, which at that time was starting to be investigated (after an initial underestimate).
- In the late 1990s client-server and meta-computing were the frontier, and computer farms had just started (Beowulf).
- The largest problem, anyway, was the huge amount of data expected to be produced and analyzed (PB).
- The social challenge was to allow thousands of physicists to access those data easily from tens of countries on different continents.
7LHC Computational Problem
Several PetaBytes (10^15 Bytes) of Data every Year
8Estimate of Computing needs at CERN for LHC (2000)
9Extension of Web Paradigm
Web: Uniform Access to Information and Documents
Grid: Flexible and High Performance access to (any kind of) resources
Resources shown in the figure: Software catalogs, Sensor nets, Computers, Data Stores, Colleagues
On-demand creation of powerful virtual computing and data systems
10A Power GRID for Computing
11DataGRID Layered structure (2000)
12Natural HEP Farming: High Throughput Computing
13GRID Computing Farm
14Basic GRID strategy
- Use what already exists as much as possible
- Network: Internet and TCP/IP
- Protocols: http, TCP, UDP, ...
- Operating Systems: Linux, Solaris, ...
- Batch Systems: PBS, LSF, Condor, ...
- Storage: Disks, HPSS, HSM, CASTOR, ...
- Directory Services: LDAP, ...
- Certificates: X509
- Create a software layer (middleware) to interface the services.
15Middleware structure
- Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware.
- Higher-Level Grid Services are supposed to help users build their computing infrastructure, but should not be mandatory.
- Foundation Grid Middleware will be deployed on the EGEE infrastructure.
- Must be complete and robust.
- Should allow interoperation with other major grid infrastructures.
- Should not assume the use of Higher-Level Grid Services.
Applications
Higher-Level Grid Services: Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...
Foundation Grid Middleware: Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring
Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
16Some history: LHC -> EGEE Grid
- 1999: MONARC Project
- Early discussions on how to organise distributed computing for LHC
- 2000: growing interest in grid technology
- HEP community was the driver in launching the DataGrid project
- 2001-2004: EU DataGrid project
- middleware and testbed for an operational grid
- 2002-2005: LHC Computing Grid (LCG)
- deploying the results of DataGrid to provide a production facility for LHC experiments
- 2004-2006: EU EGEE project, phase 1
- starts from the LCG grid
- shared production infrastructure
- expanding to other communities and sciences
- 2006-2008: EU EGEE-II
- Building on phase 1
- Expanding applications and communities
- ... and in the future: Worldwide grid infrastructure??
- Interoperating and co-operating infrastructures?
17The release of gLite 3.0
- Convergence of LCG 2.7.0 and gLite 1.5.0 in spring 2006
- Continuity on the production infrastructure ensured usability by applications
- Initial focus on the new Job Management
- Thorough testing and optimization together with the applications
- Migration to the ETICS build system
- ETICS project started in January
- Reorganization of the work according to the new process
- EGEE Technical Coordination Group and Task Forces
- Start of the EGEE SA3 Activity for integration and certification
- Continuous release process
- No big-bang releases!
18Production service
- Size of the infrastructure today
- 192 sites in 40 countries
- 25 000 CPU
- 3 PB disk, tape MSS
19EGEE Resources
20(Some) GRID Services
- Workload Management System (Resource Broker): chooses the best resources matching the user requirements.
- Virtual Organization Management System: maps User Certificates to VOs, describing the rights and roles of the users.
- Data-Oriented Services: Data and Metadata Catalogs, Data Mover, Replica Manager, etc.
- Information and Monitoring Services: show which resources and services are available, and where.
- Accounting services: extract the resource usage level for users, groups of users and VOs.
21Job Submission with Brokering
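The brokering step can be pictured as a filter followed by a ranking. The sketch below is only an illustration in Python, not the actual WMS algorithm: the resource fields (`os`, `free_cpus`, `vos`), the `Job` fields, the host names and the ranking rule (the matching resource with the most free CPUs wins) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComputingElement:
    name: str          # hypothetical host name
    os: str
    free_cpus: int
    vos: List[str]     # Virtual Organizations allowed on this resource


@dataclass
class Job:
    vo: str
    os: str
    cpus: int


def matches(ce: ComputingElement, job: Job) -> bool:
    """Requirement check: VO allowed, OS as requested, enough free CPUs."""
    return job.vo in ce.vos and ce.os == job.os and ce.free_cpus >= job.cpus


def broker(job: Job, ces: List[ComputingElement]) -> Optional[ComputingElement]:
    """Keep the resources that satisfy the job, then rank them (assumed rule)."""
    candidates = [ce for ce in ces if matches(ce, job)]
    if not candidates:
        return None
    return max(candidates, key=lambda ce: ce.free_cpus)


ces = [
    ComputingElement("ce-a.example.org", "Linux", 4, ["atlas"]),
    ComputingElement("ce-b.example.org", "Linux", 32, ["atlas", "biomed"]),
    ComputingElement("ce-c.example.org", "Solaris", 64, ["atlas"]),
]
chosen = broker(Job(vo="atlas", os="Linux", cpus=8), ces)
print(chosen.name if chosen else "no matching resource")
```

Separating the requirement check from the ranking keeps the two user-visible notions (what a job needs, and what it prefers) independent of each other.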
22Security: VOMS
- For user authentication, GRID uses digital certificates (X509) with the Globus/GSI service, plus a system that maps them to the local UIDs and GIDs of the systems.
- GRID aims to make it easier for virtual communities, tied to a type of activity and/or to experiments/projects, to use distributed resources.
- A system is therefore needed that not only guarantees AAA (Authentication, Authorization, Accounting) in a nearly uniform way, but can also handle complex access policies.
- The Virtual Organization Management System tries to meet these needs by easing the management of virtual communities and the definition of user roles and groups, which are then used to manage the access policies for the resources.
- VOMS also lets resource owners define which communities (Virtual Organizations) may access their resources, and within which limits.
23Public Key Infrastructure
- Based on asymmetric algorithms
- Two keys: a private key and a public key
- It is practically impossible to derive the private key from the public one.
- Data encrypted with one key can only be decrypted with the other.
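The key-pair property can be tried out with textbook RSA on tiny numbers. This is a minimal sketch for illustration only: the primes, the exponent and the message are arbitrary choices, real keys are thousands of bits long, and real systems add padding that is omitted here.

```python
# Toy RSA with tiny primes: illustration only, never use such key sizes.
p, q = 61, 53
n = p * q                      # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent  -> public key is (n, e)
d = pow(e, -1, phi)            # private exponent -> private key is (n, d); Python 3.8+


def apply_key(value: int, exponent: int, modulus: int = n) -> int:
    """Encrypting and decrypting are the same operation with different keys."""
    return pow(value, exponent, modulus)


message = 65

# Encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(message, e)
recovered = apply_key(ciphertext, d)

# The other direction: transform with the private key, check with the public key.
# This is the idea behind digital signatures on certificates.
signature = apply_key(message, d)
verified = apply_key(signature, e)

print(recovered == message, verified == message)
```

With numbers this small the private exponent is trivial to recover by factoring `n`; the security of real keys rests on that factoring being infeasible at realistic sizes.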
24GRID Security Infrastructure
- Based on Public Key Infrastructure (PKI)
- Certification of personal identity: a key <-> a user / physical person
- PKI: asymmetric encryption
- X509 certificate
- Identification of computers and services with PKI certificates.
25X.509 Certificate
- ITU-T standard for PKI
- X.509: IETF PKI certificate and CRL profile of the X.509v3 standard
Certificate structure:
- Certificate
- Version
- Serial Number
- Algorithm ID
- Issuer
- Validity
- Subject
- Subject Public Key Info: Public Key Algorithm, Subject Public Key
- Issuer Unique Identifier (Optional)
- Subject Unique Identifier (Optional)
- Extensions (Optional): ...
- Certificate Signature Algorithm
- Certificate Signature
26CA Proxy
- National/Regional Certification Authorities issue Certificates.
- Short-lived Proxy Certificates are used so that the real Certificates need not be archived together with the applications (Delegation).
grid-proxy-init
Your identity: /C=IT/O=INFN/OU=Personal/L
Enter GRID pass phrase for this identity:
Creating proxy ............................................. Done
Your proxy is valid until: Thu Aug 31 21:56:18 2006
27VOMS Architecture
Attributes:
- Group (hierarchically organized)
- Role (admin, staff, student, ...)
- Capability (free-form string)
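VOMS attributes are commonly written as a single path-like string combining group, role and capability, for example `/atlas/production/Role=admin/Capability=NULL`. The slide does not show this syntax, so the sketch below treats the exact format as an assumption; the VO and group names are invented, and `parse_fqan` and `in_group` are hypothetical helpers, not part of any VOMS API.

```python
from typing import NamedTuple, Optional


class VomsAttribute(NamedTuple):
    group: str                  # hierarchical, e.g. "/atlas/production"
    role: Optional[str]         # e.g. "admin", or None when not set
    capability: Optional[str]   # free-form string, or None when not set


def parse_fqan(fqan: str) -> VomsAttribute:
    """Split an attribute string into group, role and capability."""
    group_parts = []
    role = capability = None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            role = part[len("Role="):]
        elif part.startswith("Capability="):
            capability = part[len("Capability="):]
        else:
            group_parts.append(part)
    # Assumption: the literal "NULL" means "not set".
    if role == "NULL":
        role = None
    if capability == "NULL":
        capability = None
    return VomsAttribute("/" + "/".join(group_parts), role, capability)


def in_group(attr: VomsAttribute, group: str) -> bool:
    """True if the attribute's group is `group` or one of its subgroups."""
    return attr.group == group or attr.group.startswith(group + "/")


attr = parse_fqan("/atlas/production/Role=admin/Capability=NULL")
print(attr.group, attr.role, attr.capability, in_group(attr, "/atlas"))
```

A resource owner could combine `in_group` with a role check to express a policy such as "members of /atlas may run jobs, but only the admin role may install software".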
28IS Components
Abbreviations:
- BDII: Berkeley DataBase Information Index
- GIIS: Grid Index Information Server
- GRIS: Grid Resource Information Server
Components:
- Each site can run a BDII. It collects the information coming from the GIISes.
- At each site, a site GIIS collects the information given by the GRISes.
- Local GRISes run on CEs and SEs at each site and report dynamic and static information.
- From LCG 2.3.0 the site GIIS has been replaced by a local BDII.
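The information system is a three-level aggregation: resources publish, a site-level service collects, and a higher-level index merges the sites. The Python sketch below only mimics that data flow with plain dictionaries; the record fields, host names and site names are invented for the example, and the real services publish their data through LDAP rather than function calls.

```python
def gris_report(host, kind, static, dynamic):
    """What a GRIS on one CE or SE publishes: static plus dynamic information."""
    return {"host": host, "kind": kind, **static, **dynamic}


def site_index(site, reports):
    """Site-level collector (site GIIS, or local BDII from LCG 2.3.0 on)."""
    return {"site": site, "resources": list(reports)}


def bdii_view(site_indexes):
    """Higher-level index: merge every site's resources into one flat view."""
    view = []
    for index in site_indexes:
        for resource in index["resources"]:
            view.append({"site": index["site"], **resource})
    return view


# Hypothetical sites and values.
site_a = site_index("SITE-A", [
    gris_report("ce01.site-a.example", "CE", {"os": "Linux"}, {"free_cpus": 12}),
    gris_report("se01.site-a.example", "SE", {"type": "disk"}, {"free_tb": 3.5}),
])
site_b = site_index("SITE-B", [
    gris_report("ce01.site-b.example", "CE", {"os": "Linux"}, {"free_cpus": 0}),
])

view = bdii_view([site_a, site_b])
free_ces = [r["host"] for r in view if r["kind"] == "CE" and r["free_cpus"] > 0]
print(free_ces)
```

Queries such as "which CEs have free CPUs" then run against the merged view, which is what a broker needs when it matches jobs to resources.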
29WMS JDL
- The Workload Management System (WMS) is the gLite 3.0 component that allows users to submit jobs, and performs all tasks required to execute them, without exposing the user to the complexity of the Grid.
- The JDL attributes described in this document are the ones supported when the submission to the WMS is performed through the legacy Network Server interface, i.e. using the Python command line interface or the C/Java API of the gLite WMS-UI subsystem. They basically represent a subset of the whole set of attributes supported by the WMS when accessed via the new web-services-based interface (WMProxy).
30JDL example
Type = "Job";
JobType = "Normal";
Executable = "ls";
StdError = "stderr.log";
StdOutput = "stdout.log";
Arguments = "-lrt";
InputSandbox = "start_hostname.sh";
OutputSandbox = {"stderr.log", "stdout.log"};
RetryCount = 7;
VirtualOrganisation = "ATLAS";
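A job description like the one above is a list of `Attribute = value;` pairs, so it is easy to generate from a script. The following Python sketch renders a dictionary in that form; `to_jdl` and `jdl_value` are hypothetical helpers written for this example (not part of gLite), and they only cover strings, numbers and lists of those.

```python
def jdl_value(value):
    """Render one value: numbers bare, lists in braces, everything else quoted."""
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    return '"' + str(value).replace('"', '\\"') + '"'


def to_jdl(attributes):
    """Render a dict of attributes as 'Name = value;' lines."""
    return "\n".join(
        f"{name} = {jdl_value(value)};" for name, value in attributes.items()
    )


job = {
    "Type": "Job",
    "JobType": "Normal",
    "Executable": "ls",
    "Arguments": "-lrt",
    "StdOutput": "stdout.log",
    "StdError": "stderr.log",
    "OutputSandbox": ["stderr.log", "stdout.log"],
    "RetryCount": 7,
}
jdl_text = to_jdl(job)
print(jdl_text)
```

Generating the text from a dictionary is handy when many similar jobs differ only in their arguments, as in large parameter scans.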
31Job Types
- Normal: a simple batch job
- Interactive: a job whose standard streams are forwarded to the submitting client
- MPICH: a parallel application using the MPICH-P4 implementation of MPI
- Partitionable: a set of independent sub-jobs, each one taking care of a step or of a sub-set of steps, and which can be executed in parallel
- Checkpointable: a job able to save its state, so that the job execution can be suspended and resumed later, starting from the same point where it was first stopped.
32User Interface (UI)
33Applications
- Many applications from a growing number of domains
- Astrophysics
- Computational Chemistry
- Earth Sciences
- Financial Simulation
- Fusion
- Geophysics
- High Energy Physics
- Life Sciences
- Multimedia
- Material Sciences
- ...
Applications have moved from testing to routine, daily usage with 80-90% efficiency
35ARGO Data Archives and Data Catalog Sync.
36Biology Applications: The never born proteins
- Natural proteins are only a tiny fraction of the possible ones
- Approx. 10^13 natural proteins vs 20^100 possible proteins with a chain length of 100 amino acids!
- Does the subset of natural proteins have particular properties?
- Do protein scaffolds exist, in principle, with novel structure and/or activity not yet exploited by Nature?
- GRID technology makes it possible to tackle the problem through high-throughput prediction of the protein structure of a large library of never born proteins
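The size of the gap between the two numbers quoted above is easy to check, since Python integers have arbitrary precision. The short sketch below takes the slide's figures as given (20 amino acids, chains of length 100, about 10^13 natural proteins) and computes the number of possible sequences and the fraction covered by natural proteins.

```python
AMINO_ACIDS = 20          # alphabet size
CHAIN_LENGTH = 100        # residues per chain
NATURAL = 10 ** 13        # approximate number of natural proteins (from the slide)

possible = AMINO_ACIDS ** CHAIN_LENGTH    # exact integer
digits = len(str(possible))               # order of magnitude
fraction = NATURAL / possible             # share of sequence space used by Nature

print(f"possible sequences: about {possible:.2e} ({digits} digits)")
print(f"natural fraction:   about {fraction:.1e}")
```

Even a library of many millions of never born proteins therefore samples a vanishing part of the sequence space, which is why the sampling strategy matters as much as raw computing power.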
37Where grids can help address neglected diseases
- Contribute to the development and deployment of new drugs and vaccines
- Improve collection of epidemiological data for research (modeling, molecular biology)
- Improve the deployment of clinical trials in plagued areas
- Speed up the drug discovery process (in silico virtual screening)
- Improve disease monitoring
- Monitor the impact of policies and programs
- Monitor drug delivery and vector control
- Improve epidemics warning and monitoring system
- Improve the ability of developing countries to undertake health innovation
- Strengthen the integration of life science research laboratories in the world community
- Provide access to resources
- Provide access to bioinformatics services
38In silico Drug Discovery
- Scientific objectives
- Provide docking information helping in the search for new drugs.
- Biological goal: propose new inhibitors (drug candidates) addressed to neglected diseases.
- Bioinformatics goal: in silico virtual screening of drug candidate DBs.
- Grid goal: demonstrate to the research communities active in the area of drug discovery the relevance of grid infrastructures through the deployment of a compute-intensive application.
- Method
- Large scale molecular docking on malaria to compute millions of potential drugs with some software and parameter settings.
- Docking is about computing the binding energy of a protein target to a library of potential drugs using a scoring algorithm.
39The virtual screening pipeline
Figure labels:
Grid service customers: Biology teams, Chemist/biologist teams
Data challenges: DC 1 and DC 2 on neglected diseases, DC on avian flu
Exchanged items: target, hits, selected hits (three check points)
Services: Docking services, MD service, Annotation services, Grid infrastructure
Grid service providers: Cheminformatics teams, Bioinformatics teams
40Collaborating e-infrastructures
Potential for linking 80 countries by 2008
41Social Impact
- A large part of the globe does not yet have advanced digital infrastructures.
- The European Research Area program wants to make Europe the most advanced region in eInfrastructures and to help less advanced countries get up to speed, to alleviate as much as possible the so-called Digital Divide.
- eInfrastructures support widely geographically distributed communities which share problems and resources to work towards common goals -> enhance international collaboration of scientists -> promote collaboration in other fields.
- Problems too big to be handled with conventional local computer clusters and time-sharing computing centers can be attacked with GRIDs.
- eInfrastructures leverage international network interconnectivity -> high-bandwidth connections will improve the exchange of knowledge and be the basis for GRID infrastructures.
- Based on a safe AAA (Authentication, Authorization and Accounting) architecture -> secure and dependable infrastructures.
- Need for persistent software middleware -> software is an integral part of the infrastructure.
42Digital Divide
http://maps.maplecroft.com/
43Docking on Malaria
- Of great interest to many developing countries.
- Based on a Grid-enabled drug discovery process.
- Data challenge proposal: never done before on a large-scale production infrastructure and for a neglected disease
- 5 different structures of the most promising target
- Output data: 16.5 million results, 10 TB
44First initiative on in silico drug discovery against emerging diseases
- Spring 2006: drug design against H5N1 neuraminidase, involved in virus propagation
- impact of selected point mutations on the efficiency of existing drugs
- identification of new potential drugs acting on mutated N1
- Partners: LPC, Fraunhofer SCAI, Academia Sinica of Taiwan, ITB, Unimo University, CMBA, CERN-ARDA, HealthGrid
- Grid infrastructures: EGEE, Auvergrid, TWGrid
- European projects: EGEE-II, Embrace, BioinfoGrid, Share, Simdat
45EUMEDGRID Geography
47INCO International Scientific Cooperation Projects
Sustainable Water Management in Mediterranean Coastal Aquifers: Recharge Assessment and Modelling Issues (SWIMED), ICA3-CT2002-10004
http://www.crs4.it/EIS/SWIMED/menu/index.html
- Partnership
- UGR, Spain (Coordinator)
- IMFT, France
- UNINE, Switzerland
- CRS4, Italy
- EMI, Morocco
- UG, Palestinian Authority
- INAT, Tunisia
- UB, Spain
48Hydrodynamic parameters
Aquifer transmissivities
Elevation of the substratum
Remarkable lack of transmissivity data!!
49ArchaeoGRID (1/2)
50ArchaeoGRID (2/2)
51Priorities
- The obvious questions are:
- Is digital infrastructure a real priority for them?
- Don't they have much more urgent, basic and compelling needs?
- If you have a limited budget, what is the priority of these investments with respect to more vital ones?
52The Advancement chain
- It's obvious that the fundamental needs are food, water, medical services, etc.
- Although they are fundamental in the short term, a long-term solution can't be built only around those activities of feeding the system.
- A Chinese piece of wisdom: if you give a fish to a hungry man you feed him for a while, but if you teach him how to fish, you feed him for life.
- Other activities are necessary to create favorable conditions for sustainable growth.
- Agricultural development is needed to start producing food and employment, depending on the specific local situation.
- Industry will be necessary to start social innovation and an improvement in the quality of life.
- Technology will be the indispensable element which promotes new products and industrial innovation.
- Science is a fundamental component to produce technology and long-term innovation.
- Digital Infrastructures are necessary to allow researchers to participate in frontier scientific activities and to be up to speed with the most recent tools and methods.
- So at the root of the activities with a long-term impact, digital infrastructures play an important role.
53How to budget
- Limiting ourselves to feeding the system will be an endless investment which will not help to solve the problem in the long term.
- The investment has to be understood and evaluated over several (tens of) years, and should have a figure of merit with respect to the results obtained and the sustainability of future activities.
- All the previous elements of the chain should be investigated and, at different levels of investment, promoted in parallel with objectives on different time scales.
54What is missing
- GRIDs have not yet received full endorsement from the industrial world. IBM, SUN, Oracle, etc. have grid-like products, but we are still far from the overwhelming success of the web.
- The infrastructure (management and maintenance) requires resources, mainly human, which cost money. This raises the problem of long-term sustainability.
- Many resources are available, but still few of them are anything other than computing systems (e.g. radio telescopes, observatories and sensors, large fusion devices, etc.).
- High Energy Physics cannot yet pass the baton to broader organizations that would take care of the infrastructures, while physicists would go back to working on applications for the experiments.
55Conclusions
- Grid infrastructures are an expanding reality.
- They can stimulate new aggregations of scientists working together on new challenges which are now made affordable.
- They are the basis of eInfrastructures, which can promote high-bandwidth networks and make a small step forward in fighting the Digital Divide in developing countries.
- But nothing comes for free: you need to know who is using the (your) resources and for which purpose. You need security, accounting and, possibly, billing systems.
- Long-term sustainability of such a huge investment needs governmental priorities and strong industrial uptake.