Status and evolution of the EGEE Project and its Grid Middleware

1
Status and evolution of the EGEE Project and its
Grid Middleware
  • By Frédéric Hemmer
  • Middleware Manager
  • CERN
  • Geneva, Switzerland

International Conference on Next Generation Networks, Brussels, Belgium, June 2, 2005
2
Contents
  • The EGEE Project
    • Overview and Structure
    • Grid Operations
    • Middleware
    • Networking Activities
  • Applications
    • High Energy Physics
    • Biomedical
  • Summary and Conclusions

3
EGEE goals
  • Goal of EGEE: develop a service Grid infrastructure which is available to scientists 24 hours a day
  • The project concentrates on
    • building a consistent, robust and secure Grid network that will attract additional computing resources
    • continuously improving and maintaining the middleware in order to deliver a reliable service to users
    • attracting new users from industry as well as science and ensuring they receive the high standard of training and support they need

4
EGEE
  • EGEE is the largest Grid infrastructure project in Europe
  • 70 leading institutions in 27 countries, federated in regional Grids
  • Leveraging national and regional Grid activities
  • 32 M euros of EU funding for an initial 2 years, starting 1 April 2004
  • EU review in February 2005: successful
  • Preparing the 2nd phase of the project: proposal to the EU Grid call, September 2005
  • Promoting scientific partnership outside the EU

5
EGEE Geographical Extensions
  • EGEE is a truly international undertaking
  • Collaborations with other existing European projects, in particular
    • GÉANT, DEISA, SEE-GRID
  • Relations to other projects/proposals
    • OSG: Open Science Grid (USA)
    • Asia: Korea, Taiwan, EU-ChinaGrid
    • BalticGrid: Lithuania, Latvia, Estonia
    • EELA: Latin America
    • EUMedGrid: Mediterranean area
  • Expansion of the EGEE infrastructure in these regions is a key element for the future of the project and of international science

6
EGEE Activities
  • 48% service activities (Grid Operations, Support and Management, Network Resource Provision)
  • 24% middleware re-engineering (Quality Assurance, Security, Network Services Development)
  • 28% networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)

Emphasis in EGEE is on operating a production Grid and on supporting the end-users.
7
EGEE Activities
  • 48% service activities (Grid Operations, Support and Management, Network Resource Provision)
  • 24% middleware re-engineering (Quality Assurance, Security, Network Services Development)
  • 28% networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)

Emphasis in EGEE is on operating a production Grid and on supporting the end-users.
8
Computing Resources April 2005
9
Infrastructure metrics
Countries, sites, and CPU available in EGEE
production service
Region                Countries  Sites  CPU M6 (TA)  CPU M15 (TA)  CPU actual
CERN                      0         1        900         1800          1841
UK/Ireland                2        19        100         2200          2398
France                    1         8        400          895          1172
Italy                     1        21        553          679          2164
South East                5        16        146          322           159
South West                2        13        250          250           498
Central Europe            5        10        385          730           629
Northern Europe           2         4        200         2000           427
Germany/Switzerland       2        10        100          400          1733
Russia                    1         9         50          152           276
EGEE total               21       111       3084         9428         11297
USA                       1         3          -            -           555
Canada                    1         6          -            -           316
Asia-Pacific              6         8          -            -           394
Hewlett-Packard           1         3          -            -           172
Other total               9        20          -            -          1437
Grand total              30       131          -            -         12734

(Rows CERN through Russia are EGEE partner regions; USA through Hewlett-Packard are other collaborating sites.)
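As a quick consistency check, the regional CPU counts above do add up to the quoted totals. A minimal Python sketch, using only the "CPU actual" figures transcribed from the table:

```python
# "CPU actual" figures transcribed from the table above.
egee_regions = {
    "CERN": 1841, "UK/Ireland": 2398, "France": 1172, "Italy": 2164,
    "South East": 159, "South West": 498, "Central Europe": 629,
    "Northern Europe": 427, "Germany/Switzerland": 1733, "Russia": 276,
}
other_sites = {"USA": 555, "Canada": 316, "Asia-Pacific": 394, "Hewlett-Packard": 172}

egee_total = sum(egee_regions.values())
other_total = sum(other_sites.values())
print(egee_total, other_total, egee_total + other_total)  # 11297 1437 12734
```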
10
Service Usage
  • VOs and users on the production service
    • Active HEP experiments: 4 LHC experiments, D0, CDF, ZEUS, BaBar
    • Other active VOs: Biomed, ESR (Earth Sciences), CompChem, MAGIC (astronomy), EGEODE (geophysics)
    • 6 disciplines
    • Registered users in these VOs: 600
    • In addition, there are many VOs that are local to a region, supported by their ROCs, but not yet visible across EGEE
  • Scale of work performed: LHC data challenges 2004
    • >1 M SI2k-years of CPU time (roughly 1000 CPU-years; see the sketch below)
    • 400 TB of data generated, moved and stored
    • 1 VO achieved 4000 simultaneous jobs (4 times the CERN grid capacity)
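To see how the SI2k figure maps onto the quoted ~1000 CPU-years, the following minimal sketch assumes an average node rating of roughly 1 kSI2k per CPU; that rating is an assumption implied by the slide (typical of hardware at the time), not a measured value:

```python
# Illustrative arithmetic only: converts SpecInt2000-years of work into
# equivalent CPU-years for an assumed per-CPU rating.
total_work_si2k_years = 1_000_000   # ">1 M SI2k years" from the slide
avg_cpu_rating_si2k = 1_000         # assumption: ~1 kSI2k per 2004-era CPU

cpu_years = total_work_si2k_years / avg_cpu_rating_si2k
print(f"about {cpu_years:.0f} CPU-years")  # -> about 1000 CPU-years, as quoted
```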

[Chart: number of jobs processed per month]
11
Grid Operations
  • The grid is flat, but there is a hierarchy of responsibility
    • Essential to scale the operation
  • CICs act as a single Operations Centre
    • Operational oversight (grid operator) responsibility rotates weekly between CICs
    • Problems are reported to the ROC/RC
    • The ROC is responsible for ensuring the problem is resolved
  • ROCs oversee the regional RCs
    • ROCs are responsible for organising the operations in a region
    • Coordinate deployment of middleware, etc.
  • CERN coordinates sites not associated with a ROC

RC - Resource Centre; ROC - Regional Operations Centre; CIC - Core Infrastructure Centre
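As an illustration of how a weekly operator rotation can be derived from the calendar, here is a minimal sketch; the CIC names are placeholders, not the project's actual duty roster:

```python
# Hypothetical illustration of the weekly grid-operator rotation between
# Core Infrastructure Centres; the names below are placeholders.
from datetime import date

CICS = ["CIC-A", "CIC-B", "CIC-C", "CIC-D"]  # placeholder roster

def operator_on_duty(day: date) -> str:
    """Return the CIC acting as grid operator during the ISO week containing `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    return CICS[(iso_year * 53 + iso_week) % len(CICS)]

print(operator_on_duty(date(2005, 6, 2)))  # CIC on duty during the conference week
```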
12
Grid monitoring
  • Operation of the Production Service: real-time display of grid operations
  • Accounting information
  • Selection of monitoring tools:
    • GIIS Monitor + Monitor Graphs
    • Site Functional Tests
    • GOC Data Base
    • Scheduled Downtimes
    • Live Job Monitor
    • GridIce - VO + fabric view
    • Certificate Lifetime Monitor (see the sketch below)
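The Certificate Lifetime Monitor listed above is an existing EGEE tool; the following is only a stand-in sketch of the kind of check such a monitor performs, assuming the third-party `cryptography` package and the conventional `/etc/grid-security/hostcert.pem` host-certificate location:

```python
# Stand-in sketch of a certificate lifetime check (not the actual EGEE monitor).
from datetime import datetime, timezone
from cryptography import x509  # third-party package, assumed available

def days_until_expiry(pem_path: str) -> int:
    with open(pem_path, "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    expiry = cert.not_valid_after.replace(tzinfo=timezone.utc)
    return (expiry - datetime.now(timezone.utc)).days

days = days_until_expiry("/etc/grid-security/hostcert.pem")
print(f"Host certificate expires in {days} days")
if days < 30:
    print("WARNING: certificate should be renewed soon")
```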

13
LCG Deployment Schedule
14
LCG Service Challenges
  • Service Challenge 2
    • Throughput test from the LCG Tier-0 to the LCG Tier-1 sites
    • Started 14 March 2005
    • Set up infrastructure to 7 sites: NL, IN2P3, FNAL, BNL, FZK, INFN, RAL
    • 100 MB/s to each site
    • 500 MB/s combined to all sites at the same time
    • 500 MB/s to a few sites individually
  • Goal: by end of March 2005, sustained 500 MB/s at CERN

15
SC2 met its throughput targets
  • >600 MB/s daily average for 10 days was achieved (midday 23 March to midday 2 April)
  • Not without outages, but the system showed it could recover its rate after outages
  • Load was reasonably evenly divided over the sites (given the network bandwidth constraints of the Tier-1 sites); see the back-of-the-envelope volume estimate below
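For a sense of scale, sustaining that daily average over the 10-day window corresponds to roughly half a petabyte leaving CERN; the total below is a derived back-of-the-envelope figure, not a number quoted on the slides:

```python
# Back-of-the-envelope only: data volume implied by the sustained SC2 rate.
rate_mb_per_s = 600      # ">600 MB/s daily average" from the slide
days = 10

total_mb = rate_mb_per_s * 86_400 * days   # 86,400 seconds per day
print(f"~{total_mb / 1e6:.0f} TB over {days} days")  # ~518 TB, about 0.5 PB
```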

16
Service Challenge 3
  • Throughput phase
    • 2 weeks sustained in July 2005
    • Primary goals:
      • 150 MB/s disk-to-disk to Tier-1s
      • 60 MB/s disk (T0) to tape (T1s)
    • Secondary goals:
      • Include a few named Tier-2 sites (T2 -> T1 transfers)
      • Encourage the remaining Tier-1s to start disk-to-disk transfers
  • Service phase
    • September to end of 2005
    • Start with ALICE and CMS; add ATLAS and LHCb in October/November
    • All offline use cases except for analysis
    • More components: WMS, VOMS, catalogs, experiment-specific solutions
    • Implies a production setup (CE, SE, ...)

17
EGEE Activities
  • 48% service activities (Grid Operations, Support and Management, Network Resource Provision)
  • 24% middleware re-engineering (Quality Assurance, Security, Network Services Development)
  • 28% networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)

Emphasis in EGEE is on operating a production Grid and on supporting the end-users.
18
Future EGEE Middleware - gLite
  • Intended to replace the present middleware (LCG-2)
  • Developed mainly from existing components
  • Aims to address present shortcomings and advanced needs from applications
  • Regular, iterative updates for fast user feedback
  • Makes use of web services where currently feasible

[Diagram: middleware evolution - from LCG-1 and LCG-2 (Globus 2 based) to gLite-1 and gLite-2 (web services based)]
Application requirements: http://egee-na4.ct.infn.it/requirements/
19
Architecture Design
  • A design team including representatives from middleware providers (AliEn, Condor, EDG, Globus, ...) and from operations, including US partners, produced the middleware architecture and design
  • It takes into account input and experiences from applications, operations, and related projects
  • Focus on the medium term (a few months) and on commonalities with other projects (e.g. OSG)
    • Effective exchange of ideas, requirements, solutions and technologies
    • Coordinated development of new capabilities
    • Open communication channels
    • Joint deployment and testing of middleware
    • Early detection of differences and disagreements
  • The 2nd release of gLite (v1.1) was made in May 2005
    • http://cern.ch/glite/packages/R1.1/R20050430/default.asp
    • http://cern.ch/glite/documentation

gLite is not just a software stack; it is a new framework for international collaborative middleware development. Much has been accomplished in the first year. However, this is just the first step.
20
gLite Services in Release 1 - Software stack and origin (simplified)
  • Computing Element
    • Gatekeeper (Globus)
    • Condor-C (Condor)
    • CE Monitor (EGEE)
    • Local batch system (PBS, LSF, Condor)
  • Workload Management
    • WMS (EDG)
    • Logging and Bookkeeping (EDG)
    • Condor-C (Condor)
  • Information and Monitoring
    • R-GMA (EDG)
  • Storage Element
    • gLite I/O (AliEn)
    • Reliable File Transfer (EGEE)
    • GridFTP (Globus)
    • SRM: Castor (CERN), dCache (FNAL, DESY), other SRMs
  • Catalog
    • File/Replica and Metadata Catalogs (EGEE)
  • Security
    • GSI (Globus)
    • VOMS (DataTAG/EDG)
    • Authentication for C and Java based (web) services (EDG)

Now doing rigorous scalability and performance tests on the pre-production service.
21
Software Process
  • The JRA1 software process is based on an iterative method
  • It comprises two main 12-month development cycles, divided into shorter development-integration-test-release cycles lasting 1 to 4 weeks
  • The two main cycles start with full architecture and design phases, but the architecture and design are periodically reviewed and verified
  • The process is documented in a number of standard documents:
    • Software Configuration Management (SCM) Plan
    • Test Plan
    • Quality Assurance Plan
    • Developer's Guide

22
Release Process
[Diagram: release process flow - software code goes through Integration (integration tests; on failure, fix and retest), then Testing (testbed deployment; on failure, fix and retest), and on passing both gates is released as deployment packages with an installation guide, release notes, etc.]
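A minimal sketch of the pass/fail gating implied by this flow; all function names here are illustrative placeholders, not part of any actual gLite build tooling:

```python
# Illustrative gating loop for the release flow above: code must pass the
# integration tests and then the testbed deployment before packages ship.
def run_release(build, integration_tests, testbed_tests, fix, make_packages):
    for gate in (integration_tests, testbed_tests):
        while not gate(build):        # "Fail" branch: apply fixes and retest
            build = fix(build)
    return make_packages(build)       # "Pass": deployment packages, release notes
```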
23
Bug Counts and Trends
  • Defects/KLOC: 2.01 (as of May 18, 2005)
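Defects/KLOC is simply the defect count normalised by thousands of lines of code; in the minimal sketch below the defect and code-size figures are hypothetical, and only the resulting ~2.01 ratio is taken from the slide:

```python
# Defect density = defects / (lines of code / 1000).
defects = 2_010              # hypothetical defect count
lines_of_code = 1_000_000    # hypothetical code-base size

defects_per_kloc = defects / (lines_of_code / 1000)
print(f"{defects_per_kloc:.2f} defects/KLOC")  # -> 2.01
```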
24
gLite - What's next?
  • Focus and priority is on
    • Bug fixing
    • Support for Service Challenge 3
      • File Transfer Service (FTS)
  • New planned features in 1.2
    • VOMS 1.5 (Oracle support)
    • CE Condor support (in addition to PBS/LSF)
    • WMProxy (web services interface, including bulk job submission)
    • Service discovery interface to BDII
    • File Transfer Service improvements
    • R-GMA aligned with the LCG-2 version
  • Beyond 1.2
    • DGAS accounting system
    • Job Provenance
    • Globus Workspace Services
    • Harmonization of security models
    • Integration with service discovery

25
EGEE Activities
  • 48% service activities (Grid Operations, Support and Management, Network Resource Provision)
  • 24% middleware re-engineering (Quality Assurance, Security, Network Services Development)
  • 28% networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)

Emphasis in EGEE is on operating a production Grid and on supporting the end-users.
26
Outreach & Training
  • Public and technical websites constantly evolving to expand the information available and keep it up to date
  • 3 conferences organised
    • 300 @ Cork, 400 @ Den Haag, 500 @ Athens
    • Pisa: 4th project conference, 24-28 October 2005
  • More than 70 training events (including the GGF grid school) across many countries
    • 1000 people trained
    • Induction, application developer, advanced, retreats
  • Material archive with more than 100 presentations
  • Strong links with the GILDA testbed and the GENIUS portal developed in EU DataGrid

27
Deployment of applications
  • Pilot applications
    • High Energy Physics
    • Biomedical applications
      • http://egee-na4.ct.infn.it/biomed/applications.html
  • Generic applications: deployment under way
    • Computational chemistry
    • Earth science research
    • EGEODE: first industrial application
    • Astrophysics
  • With interest from
    • Hydrology
    • Seismology
    • Grid search engines
    • Stock market simulators
    • Digital video, etc.
    • Industry (provider, user, supplier)

28
HEP (ATLAS) Utilisation
  • 660K jobs in total (across LCG, NorduGrid and US Grid3)
  • 400 kSI2k-years of CPU
  • In the latest period, an average of 7K jobs/day, with 5K in LCG

[Chart: ATLAS jobs per day]
29
Bioinformatics
  • GPS@: Grid Protein Sequence Analysis
    • NPSA is a web portal offering protein databases and sequence analysis algorithms to bioinformaticians (3000 hits per day)
    • GPS@ is a gridified version with increased computing power
    • Needs large databases and a large number of short jobs
  • xmipp_MLrefine
    • 3D structure analysis of macromolecules from (very noisy) electron microscopy images
    • Maximum-likelihood approach for finding the optimal model
    • Very compute-intensive
  • Drug discovery
    • Health-related area with high-performance computation needs
    • An application currently being ported in Germany (Fraunhofer Institute)

30
Medical imaging
  • GATE
    • Radiotherapy planning
    • Improvement of precision by Monte Carlo simulation
    • Processing of DICOM medical images
    • Objective: very short computation time, compatible with clinical practice
    • Status: development and performance testing
  • CDSS (Clinical Decision Support System)
    • Assembling knowledge databases
    • Spreading the use of image classification engines
    • Objective: access to knowledge databases from hospitals
    • Status: from development to deployment, with some medical end users

31
Medical imaging
  • SiMRI3D
    • 3D Magnetic Resonance Image simulator
    • MRI physics simulation, parallel implementation
    • Very compute-intensive
    • Objective: offer an image simulator service to the research community
    • Status: parallelized and now running on EGEE resources
  • gPTM3D
    • Interactive tool for medical image segmentation and analysis
    • A non-gridified version is distributed in several hospitals
    • Needs very fast scheduling of interactive tasks
    • Objective: shorten computation time using the grid
    • Status: development of the gridified version is being finalized

32
Status of Biomedical VO
[Map: sites supporting the Biomedical VO, including PADOVA and BARI]
33
Grid conclusions
  • The deployment of e-Infrastructures creates a powerful new tool for science, as well as for applications from other fields
  • Investments in Grid projects and e-Infrastructure are growing world-wide
  • Applications are already benefiting from Grid technologies
  • Open source is the right approach for publicly funded projects and is necessary for fast and wide adoption
  • Europe is strong in the development of e-Infrastructure, thanks also to the initial success of EGEE
  • Collaboration across national and international programmes is very important

34
Summary
  • EGEE is the first attempt to build a worldwide Grid infrastructure for data-intensive applications from many scientific domains
  • A large-scale production Grid service is already deployed and being used for HEP and biomedical applications, with new applications being ported
  • Resources and user groups are expanding
  • A process is in place for migrating new applications to the EGEE infrastructure
  • A training programme has started, with many events already held
  • The next-generation middleware (gLite) is being tested
  • The first project review by the EU was successfully passed in February 2005
  • Plans for a follow-on project are being prepared

35
Contacts
  • EGEE Web Site
    • http://www.eu-egee.org
  • How to join
    • http://public.eu-egee.org/join/
  • gLite Web Site
    • http://www.glite.org
  • EGEE Project Office
    • project-eu-egee-po@cern.ch