Title: User communities and applications
1 User communities and applications
- David Fergusson
- 28th February
2 Enabling Grids for E-sciencE
- What is the EGEE community?
  - Researchers in eScience (applications: NA4)
  - eResearch
  - European community
  - World grid community
  - Industry (industry forum)
- What is not the EGEE community?
3 eScience/eResearch
- EGEE's initial focus is on specific scientific communities:
  - High Energy Physics (Large Hadron Collider)
  - Biomedical
  - Geology
  - Chemistry
  - Astrophysics
- Collaborating with other EU projects in other areas
  - For example, digital libraries: DILIGENT
4 Applications in EGEE
- Production service supporting multiple VOs with different requirements:
  - Data
    - Volume
    - Location: distributed?
    - Write once or update?
    - Metadata archives?
    - Controlled or open access?
  - Computation
    - High throughput (current LCG)
    - High performance, supercomputing
  - Number of sites and scientists
- Establish a viable general process to bring other scientific communities on board
5 An EGEE community
- EGEE communities are based around the idea of Virtual Organisations.
- A Virtual Organisation:
  - Owns shared computing resources
  - Authorises and authenticates its members' access to resources
  - Manages its own resources
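As an illustration of the VO responsibilities listed above, here is a minimal Python sketch. It is not EGEE or LCG-2 middleware code: the `VirtualOrganisation` class, its methods and the example certificate subjects are hypothetical. It assumes authentication (checking a member's grid certificate) has already happened elsewhere, so only the authorisation decision is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganisation:
    """Hypothetical model of a VO: it owns resources and controls member access."""
    name: str
    # Certificate subjects (DNs) of authenticated members.
    members: set[str] = field(default_factory=set)
    # Resources (e.g. compute or storage elements) owned and managed by the VO.
    resources: set[str] = field(default_factory=set)

    def add_member(self, subject_dn: str) -> None:
        self.members.add(subject_dn)

    def add_resource(self, resource: str) -> None:
        self.resources.add(resource)

    def authorise(self, subject_dn: str, resource: str) -> bool:
        # Access is granted only to VO members, and only to resources the VO manages.
        return subject_dn in self.members and resource in self.resources


if __name__ == "__main__":
    vo = VirtualOrganisation("biomed")
    alice = "/C=UK/O=eScience/CN=Alice"  # illustrative DN only
    vo.add_member(alice)
    vo.add_resource("ce01.example.org")  # illustrative host name only
    print(vo.authorise(alice, "ce01.example.org"))
```

Keeping membership and resources in the same object mirrors the slide's point that a VO both owns resources and decides who may use them.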
6 EGEE: adding a VO
- EGEE has a formal procedure for adding selected new user communities (Virtual Organisations):
  - Negotiation with one of the Regional Operations Centres
  - Seek a balance between the resources contributed by a VO and those that it consumes
- Resource allocation will be made at the VO level
- Many resources need to be available to multiple VOs: shared use of resources is fundamental to a Grid
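A minimal sketch of VO-level accounting, assuming resources are measured in CPU hours. The function names, the contributed/consumed ratio and the proportional fair-share rule are hypothetical illustrations of "balance" and "allocation at the VO level", not the procedure EGEE or the Regional Operations Centres actually apply.

```python
def balance_ratio(contributed: float, consumed: float) -> float:
    """Contributed / consumed CPU hours; a value above 1 means the VO gives more than it uses."""
    if contributed < 0 or consumed < 0:
        raise ValueError("CPU hours must be non-negative")
    if consumed == 0:
        return float("inf")
    return contributed / consumed


def allocate_shares(total_cpu_hours: float, vo_weights: dict[str, float]) -> dict[str, float]:
    """Split a shared pool among VOs in proportion to their weights (a simple fair-share rule)."""
    total_weight = sum(vo_weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {vo: total_cpu_hours * w / total_weight for vo, w in vo_weights.items()}


if __name__ == "__main__":
    # Illustrative numbers only.
    print(balance_ratio(500, 250))
    print(allocate_shares(1000, {"atlas": 3, "biomed": 1}))
```

Allocating per VO rather than per user keeps the shared pool manageable: each VO then divides its own share among its members.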
7 The role of the pilot applications: HEP and Biomedicine
- Initial area of focus, to establish a strong user base on which to build a broad EGEE user community
- Provide early feedback to the infrastructure activities on their experience with application deployment and VO management
- Act as guinea pigs and provide early feedback to the middleware developers on their experience with new services
8 EGEE pilot application: Large Hadron Collider
- Data challenge:
  - 10 petabytes/year of data!
  - 20 million CDs each year!
  - Simulation, reconstruction, analysis
  - LHC data handling requires computing power equivalent to 100,000 of today's fastest PC processors!
- Operational challenges:
  - Reliable and scalable throughout a project lifetime of decades
[Figure: LHC site, with Mont Blanc (4810 m) and downtown Geneva labelled]
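A back-of-the-envelope check of the slide's figures, assuming decimal units (1 PB = 10^15 bytes) and data produced evenly over a 365-day year:

```latex
\begin{aligned}
10\ \text{PB} &= 10 \times 10^{15}\ \text{B} = 10^{16}\ \text{B} \\
\frac{10^{16}\ \text{B}}{2 \times 10^{7}\ \text{CDs}} &= 5 \times 10^{8}\ \text{B} = 500\ \text{MB per CD} \\
\frac{10^{16}\ \text{B}}{365 \times 86\,400\ \text{s}} &= \frac{10^{16}\ \text{B}}{3.1536 \times 10^{7}\ \text{s}} \approx 3.2 \times 10^{8}\ \text{B/s} \approx 320\ \text{MB/s}
\end{aligned}
```

So the "20 million CDs" comparison implies about 500 MB per disc, and the yearly volume corresponds to a sustained average of roughly 320 MB/s.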
9 The characteristics of pilot HEP applications
- Very large scale from project day 1
- Virtual Organizations were already set up at project day 1
- Very centralized: jobs are sent in a very organized way
- Multi-grid: data challenges are deployed on several grids
  - ALICE: LCG, AliEn
  - ATLAS: LCG, US Grid2003, NorduGrid
  - CMS: LCG, US Grid2003
  - LHCb: LCG, DIRAC
10 The Large Hadron Collider
11 The LHC Experiments
12 Overview of experiences with LHC data challenges
- There was continual evolution throughout 2004, with LCG and the experiments gaining more experience in the development and use of an expanding LCG grid
- All experiments had excellent relations with LCG-EIS support: a model for the future support of VOs
- Global job efficiencies ranged from 60% to 80% as experience developed; they must reach 90% for user analysis
  - Look to new middleware developments and tighter operational procedures
- Sources of problems and losses:
  - Site configuration, management and stability
  - Data management (especially metadata handling)
  - Difficulty monitoring running jobs and the causes of failure
- D0 in early 2005 showed that one can run with good efficiency on a set of well-controlled sites
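The efficiency figures above are ratios of successful to submitted jobs. The Python sketch below assumes a hypothetical list of job outcomes, where a successful job is `None` and a failed job carries a cause string loosely based on the slide's categories; it computes the global efficiency, compares it with the 90% target for user analysis, and tallies failures by cause. The record format and cause strings are illustrative only.

```python
from collections import Counter
from typing import Optional

# Target efficiency for user analysis, taken from the slide.
TARGET_EFFICIENCY = 0.90


def job_efficiency(outcomes: list[Optional[str]]) -> float:
    """Fraction of jobs that succeeded (entries that are None)."""
    if not outcomes:
        raise ValueError("no jobs to evaluate")
    succeeded = sum(1 for cause in outcomes if cause is None)
    return succeeded / len(outcomes)


def failures_by_cause(outcomes: list[Optional[str]]) -> Counter:
    """Count failed jobs per cause, ignoring successful jobs."""
    return Counter(cause for cause in outcomes if cause is not None)


if __name__ == "__main__":
    # Illustrative data only: 7 successes and 3 failures out of 10 jobs.
    jobs = [None] * 7 + ["site configuration", "site configuration", "data management"]
    eff = job_efficiency(jobs)
    print(f"efficiency = {eff:.0%}, meets target: {eff >= TARGET_EFFICIENCY}")
    print(failures_by_cause(jobs).most_common())
```

Tallying failures by cause, rather than only reporting the overall ratio, points to where tighter operational procedures would recover the most jobs.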
13 EGEE pilot application: BioMedical
- BioMedical
  - Bioinformatics (distribution of gene/proteome databases)
  - Medical applications (screening, epidemiology, distribution of image databases, etc.)
  - Interactive applications (human supervision or simulation)
  - Security/privacy constraints
  - Heterogeneous data formats
  - Frequent data updates
  - Complex data sets
  - Long-term archiving
- http://egee-na4.ct.infn.it/biomed/applications.html
14 The characteristics of biomedical pilot applications
- Prototype level at project day 1
- VO was created after the project kicked off
- Very decentralized: application developers use the grid at their own pace
- Very demanding on services:
  - Compute-intensive applications
  - Applications requiring large numbers of short jobs
  - Need for interactivity or guaranteed response time
- Resources were focused on the deployment of large-scale applications on LCG-2
- Integration of the Biomed VO was used to identify issues relevant to all VOs to be deployed during the EGEE lifetime
- Decentralized usage of the infrastructure highlights different weaknesses from those exposed by the more centralized HEP data challenges
15 Status of Biomedical VO
[Figure: Biomedical VO status; labels include Padova and Bari]
16 Biomedical VO production jobs on EGEE
17 Biomedical applications
- 3 batch-oriented applications ported to LCG-2:
  - SiMRI3D: medical image simulation
  - xmipp_MLRefine: molecular structure analysis
  - GATE: radiotherapy planning
- 3 high-throughput applications ported to LCG-2:
  - CDSS: clinical decision support system
  - GPS@: bioinformatics portal (multiple short jobs)
  - gPTM3D: radiology image analysis (interactivity)
- New applications to join in the near future
  - Especially in the field of drug discovery
18 EGEE pilot application: BioMedical
- BioMedical
  - Bioinformatics (distribution of gene/proteome databases)
  - Medical applications (screening, epidemiology, distribution of image databases, etc.)
  - Interactive applications (human supervision or simulation)
  - Security/privacy constraints
  - Heterogeneous data formats
  - Frequent data updates
  - Complex data sets
  - Long-term archiving
- BioMed applications deployed:
  - GATE: Geant4 Application for Tomographic Emission
  - GPS@: genomic web portal
  - CDSS: Clinical Decision Support System
19 Twelve Biomed applications
- GATE: Geant4 Application for Tomographic Emission (LPC)
- Docking platform for tropical diseases: grid-enabled docking platform for in silico drug discovery (LPC)
- CDSS: Clinical Decision Support System (UPV)
- GPS@: Grid genomic web portal (IBCP)
- SiMRI 3D: Magnetic Resonance Image simulator (CREATIS)
- gPTM 3D: interactive radiological image visualization and processing tool (LRI)
- xmipp_ML_refine: macromolecular 3D structure analysis (CNB)
- xmipp_multiple_CTFs: CTF calculation for electron-microscopy images (CNB)
- GridGRAMM: molecular docking on the web (CNB)
- GROCK: mass screening of molecular interactions (CNB)
- Mammogrid: mammogram analysis (EU project)
- SPLATCHE: genome evolution modeling (U. Berne/WHO)
20 ...and more to come
- SPLATCHE
  - First application being migrated from GILDA to the biomed VO
- Pharmacokinetics in MRI (UPV)
  - MRI registration for contrast agent diffusion study
- Some progress on biological sequence analysis (M. Lexa)
- ...
21 BLAST: comparing DNA or protein sequences
- BLAST is the first step in analysing new sequences: it compares DNA or protein sequences to others stored in personal or public databases. Ideal as a grid application.
  - Requires resources to store databases and run algorithms
  - Can compare one or several sequences against a database in parallel
  - Large user community
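Because each query sequence can be compared against the database independently, a set of queries can be split into batches and each batch run as a separate grid job. The Python sketch below shows only that splitting step for FASTA input; it does not call BLAST or any grid middleware, and the function names `parse_fasta`, `split_into_jobs` and `to_fasta` are hypothetical.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records: list[tuple[str, str]] = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_into_jobs(records: list[tuple[str, str]], n_jobs: int) -> list[list[tuple[str, str]]]:
    """Split records into n_jobs contiguous batches whose sizes differ by at most one."""
    if n_jobs < 1:
        raise ValueError("need at least one job")
    size, extra = divmod(len(records), n_jobs)
    batches, start = [], 0
    for i in range(n_jobs):
        end = start + size + (1 if i < extra else 0)
        batches.append(records[start:end])
        start = end
    return batches


def to_fasta(records: list[tuple[str, str]]) -> str:
    """Serialise a batch back to FASTA, e.g. as the input file for one grid job."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    queries = parse_fasta(">s1\nACGT\n>s2\nGG\nTT\n>s3\nA\n>s4\nC\n>s5\nG\n")
    for i, batch in enumerate(split_into_jobs(queries, 2)):
        print(f"job {i}: {[h for h, _ in batch]}")
```

Each batch would then be compared against the same database in its own job, which is why the slide stresses the need for resources to store the databases close to where the jobs run.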
22 Bio-medicine applications
- Bio-informatics
  - Phylogenetics
  - Search for primers
  - Statistical genetics
  - Bio-informatics web portal
  - Parasitology
  - Data-mining on DNA chips
  - Geometrical protein comparison
- Medical imaging
  - MR image simulation
  - Medical data and metadata management
  - Mammography analysis
  - Simulation platform for PET/SPECT
23 Bio-medicine applications
24 Bio-medicine applications
25 Bio-medicine applications
26 gPTM3D: Grid-Enabling Interactive Medical Analysis
[Figure: diagram with the labels Interaction, Render, Explore, Analyse, Interpret and Acquire]
27 Use case: planning percutaneous nephrolithotomy
28 Evolution of biomedical applications
- Growing interest of the biomedical community:
  - Partners involved proposing new applications
  - New application proposals (in various health-related areas)
  - Enlargement of the biomedical community (drug discovery)
- Growing scale of the applications:
  - Progressive migration from prototypes to pre-production services for some applications
  - Increase in scale (volume of data and number of CPU hours)
- Towards pre-production:
  - Several initiatives to build user-friendly portals and interfaces to existing applications, in order to open them to an end-user community
29 A look at the future: the HealthGrid vision
In this context, "health" does not mean only clinical practice but covers the whole range of information, from the molecular level (genetic and proteomic information) through cells and tissues to the individual and finally the population level (social healthcare).
[Figure: HealthGrid diagram spanning the molecule, cell, tissue/organ, patient and public health levels; other labels include patient-related data, databases, association, modelling, computation, computational recommendation, individualised healthcare and molecular medicine]
30 Earth Sciences in EGEE
- Research
  - Earth observation by satellite
    - ESA (IT), KNMI (NL), IPSL (FR), UTV (IT), RIVM (NL), SRON (NL)
  - Climate
    - DKRZ (DE), IPSL (FR)
  - Solid Earth physics
    - IPGP (FR)
  - Hydrology
    - Neuchâtel University (CH)
- Industry
  - CGG, geophysics company (FR)
31 Climate applications in EGEE
- Model: atmosphere, ocean, hydrology, atmospheric and marine chemistry
- Goal: comparison of model outputs from different runs and/or institutes
- Large volume of data (TB) from different model outputs, and experimental data
- Runs made on supercomputers -> link the EGEE infrastructure with supercomputer grids (DEISA)
Example: for the IPCC Assessment Reports, many experiments are performed with different models (different spatial resolution, different time step, different "physics", ...) at various sites. The generated data need to be compared in a comprehensive and "unified" way.
32 Geophysics applications
Seismic processing generic platform
- Based on Geocluster, an industrial application, as a starting point for the core member VO
- Includes several standard tools for signal processing, simulation and inversion
- Open: any user can write new algorithms in new modules (shared or not)
- Free for academic research
- Controlled by license keys (an opportunity to explore licensing issues at the grid level)
- Initial partners: France, Switzerland, UK, Russia, Norway
33 Flood simulation
34 Computational chemistry: molecular simulator
[Figure: Ar-benzene example]
35 The MAGIC telescope
- Largest Imaging Air Cherenkov Telescope (17 m mirror dish)
- Located on La Palma, Canary Islands (at 2200 m a.s.l.)
- Lowest energy threshold ever obtained with a Cherenkov telescope
- Aim: detect γ-ray sources in the unexplored energy range 30 (10) to 300 GeV
36 The MAGIC physics program
- Cosmological γ-ray horizon
- Tests of quantum gravity effects
37 Feedback to LCG-2 middleware developers and infrastructure
- From HEP applications:
  - The Experiment Integration Support group and the Grid Applications Group produced documents summarizing problems encountered in the use of LCG-2
- From Biomed applications:
  - Very significant exchanges related to the set-up of the biomed VO and the deployment of relevant services
  - Request to use MPI
38 Engineering applications
39 Engineering applications
40 Grid applications: art
41 Who else can benefit from EGEE?
- EGEE Generic Applications Advisory Panel (EGAAP)
  - For new applications
  - EU projects: MammoGrid, DILIGENT, SEE-GRID
  - Expressions of interest: Planck/Gaia (astroparticle), SimDat (drug discovery)
  - http://agenda.cern.ch/age?a042351
  - Next meeting at the EGEE conference (November)
42 Identifying new communities
- Through training, dissemination and outreach, communities that already use advanced computing and are keen to use the EGEE infrastructure are identified
- These communities are encouraged to prepare a document describing their interest in using EGEE
- A scientific advisory panel (EGAAP) assesses the interested communities and chooses those that seem the most mature to deploy their applications on EGEE
43 GILDA, an infrastructure for dissemination and demonstration
- Goals:
  - Demonstration of grid operation for tutorials and outreach
  - Initial deployment of new applications for testing purposes
- Key features:
  - Initiative of the INFN Grid Project using LCG-2 middleware
  - On request, anyone can quickly receive a grid certificate and a VO membership allowing them to use the infrastructure for 2 weeks
  - The certificate expires after two weeks but can be renewed
  - Use of a friendly interface: the GENIUS grid portal
- Very important for the first steps of new user communities onto the grid infrastructure
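The two-week validity rule can be expressed as a small Python sketch. The `TrainingCertificate` class is hypothetical and models only the expiry logic stated above; real grid certificates are X.509 certificates issued by a certification authority. The assumption that a renewal starts a fresh two-week period from the renewal date is ours, not GILDA's documented policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Validity period stated for GILDA training certificates.
VALIDITY = timedelta(weeks=2)


@dataclass(frozen=True)
class TrainingCertificate:
    """Hypothetical model of a short-lived GILDA training certificate."""
    holder: str
    issued: datetime

    @property
    def expires(self) -> datetime:
        return self.issued + VALIDITY

    def is_valid(self, now: datetime) -> bool:
        # Valid from the issue time up to, but not including, the expiry time.
        return self.issued <= now < self.expires

    def renew(self, now: datetime) -> "TrainingCertificate":
        # Assumption: renewal starts a fresh two-week period from the renewal date.
        return TrainingCertificate(self.holder, now)


if __name__ == "__main__":
    cert = TrainingCertificate("alice", datetime(2005, 2, 1))
    print(cert.expires)  # 2005-02-15 00:00:00
    print(cert.is_valid(datetime(2005, 2, 20)))  # False: expired
    print(cert.renew(datetime(2005, 2, 20)).expires)  # 2005-03-06 00:00:00
```

Making the certificate immutable and returning a new object on renewal keeps the original issue date available, which would matter if renewals were being counted.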
44 GILDA numbers
- 14 sites on 2 continents
- >1200 certificates issued, 10 renewed at least once
- >35 tutorials and demos performed in 10 months
- >25 jobs/day on average
- Job success rate above 96%
- >320,000 hits on the web site from tens of different countries
- >200 copies of the UI live CD distributed around the world
45 NA4 applications and GILDA
- 7 Virtual Organizations supported:
  - Biomed
  - Earth Science Academy (ESR)
  - Earth Science Industry (CGG)
  - Astroparticle Physics (MAGIC)
  - Computational Chemistry (GEMS)
  - Grid Search Engines (GRACE)
  - Astrophysics (PLANCK)
- Development of complete interfaces with GENIUS for 3 Biomed applications: GATE, hadronTherapy and Friction/Arlecore
- Development of complete interfaces with GENIUS for 4 generic applications: EGEODE (CGG), MAGIC, GEMS and CODESA-3D (ESR) (see demos!)
- Development of complete interfaces with GENIUS for 16 demonstrative applications available on the GILDA Grid Demonstrator (https://grid-demo.ct.infn.it)
46 Summary
- EGEE and grids: not just physics
- For communities to benefit, they need to know what grids can do for them: dissemination
- Many communities are beginning to adopt the grid
- EGEE has a mechanism for assisting communities onto the grid
47 Practical URLs
- homepages.nesc.ac.uk/gcw
- grid-demo.ct.infn.it