Deployment Aspects of LCG - PowerPoint PPT Presentation

1
Deployment Aspects of LCG
  • Ian Bird
  • LCG Deployment Area Manager
  • Presentation to HEP-CCC Meeting
  • 18-Oct-2002

2
Summary
  • Introduction
  • LCG areas of activity
  • Deployment Goals and Timescale
  • Deployment Activities
  • Technology
  • Testing and certification
  • Support
  • Resources
  • Operations
  • Coordination, collaboration
  • Conclusions

3
Project Goals
Goal: Prepare and deploy the LHC computing
environment
  • applications - tools, frameworks, environment,
    persistency
  • computing system → global grid service
  • cluster → automated fabric
  • collaborating computer centres → grid
  • CERN-centric analysis → global analysis
    environment
  • central role of data challenges

This is not another grid technology project;
it is a grid deployment project
4
LCG Level 1 Milestones proposed to LHCC
5
LCG-1 Timescale in a nutshell
  • LCG-1 must be defined end 2002
  • 2 major areas to be addressed
  • Define LCG-1 in terms of required functionality
    and services
  • Deployment schedule
  • Set up distributed organisational structure
  • Resources and scheduling
  • Policies: security, authentication, etc.
  • Operational agreements and responsibilities
  • Support services
  • End November: Level 1 and 2 milestones in a
    quantifiable form
  • LCG-1 service must be in place July 2003
  • 6 months testing, integration, certification,
    packaging and deployment
  • Need to demonstrate performance end 2003
  • This should include adding current production
    services into LCG
  • Provide production service for data challenges in
    2004

6
LCG Activities
7
LCG and its interactions
[Diagram. Labels: Experiments; Grid Projects; Regional Centres;
CERN; HEPCAL; PPDG; iVDGL (VDT); GriPhyN; Globus; GLUE; EDG;
NorduGrid; GDB; AliEn]
8
Multi-dimensional problem
  • Regional Centres
  • Host one or more experiments
  • Different RCs deploy different grid middleware
    in existing testbeds
  • Have different operational and security policies
  • Experiments
  • Use middleware from various grid projects
  • Run at many regional centres
  • Provide applications that rely on specific
    middleware
  • Grid projects
  • Provide middleware that does not often (yet)
    interoperate
  • Starting to collaborate on common solutions and
    interoperability
  • → The Deployment area of LCG ties these all
    together

9
Grid Technology
  • Short term (next 3-4 months)
  • Define LCG-1 in terms of minimum functionality
    and services to be provided
  • Recommend how to provide them
  • GTA, GDB, using HEPCAL document as a basis
  • Longer term
  • ensuring that the LCG requirements are known to
    current and potential Grid projects
  • active lobbying for suitable solutions;
    influencing plans and priorities
  • negotiating support for tools developed by Grid
    projects
  • Essential for a production service!
  • developing a plan to supply solutions that do not
    emerge from other sources
  • BUT this must be done with caution: important to
    avoid HEP-specific solutions

10
Technology
  • A base set of requirements has been defined
    (HEPCAL)
  • 43 use cases
  • 2/3 of which should be satisfied in 2003 by
    currently funded projects
  • LCG plans to use the technology emerging from
    some of the many Grid projects receiving
    substantial national and EU R&D funding, and
    perhaps later from industry
  • Today
  • many of these projects are led by, or strongly
    influenced by HEP
  • are built on the Globus toolkit
  • and form two main groups
  • around the (European) DataGrid project
  • subscribing to the (US) Virtual Data Toolkit -
    VDT
  • rapidly growing interest & investment from other
    sciences, industry
  • HEP (LHC data challenges, BaBar, LCG, ...) an early
    adopter
  • Tomorrow
  • must remain in the main line: leverage the
    massive investments being made
  • increasingly difficult for HEP to influence
    direction
  • expect several major architectural changes before
    things mature
  • LCG must adapt and evolve both in functionality
    and in technology

11
Deployment
12
RC MoU and Requirements Catalog
  • The SC2 RTAG on Regional Centre categorisation
    recommended:
  • The GDB should work out the MoUs
  • Requirements to be considered
  • Quality of service
  • Policy of use
  • Network connectivity
  • Compatibility
  • User support and training
  • Consultation and problem tracking
  • Operating conditions

13
Grid Deployment goals of LCG-1
  • Production service for Data Challenges in 2H03
    & 2004
  • Focused on batch production work
  • Experience in close collaboration between the
    Regional Centres
  • Should have wide enough participation to
    understand the issues, but not too many initially
  • Learn how to maintain and operate a global grid
  • Focus on a production-quality service and all
    that implies
  • Robustness, fault-tolerance, predictability, and
    supportability take precedence over functionality
  • But minimum functionality to be of value
  • This requires
  • a middleware support group with integration,
    certification, testing, packaging etc.
    responsibilities
  • A support structure
  • LCG should be integrated into the sites' physics
    computing services; it should not be something
    apart
  • This requires coordination between participating
    sites in:
  • Policies and collaborative agreements
  • Resource planning and scheduling

14
What might LCG-1 look like?
  • User's perspective - requires
  • Functionality adequate to provide advantage over
    not using distributed model
  • Straightforward to use
  • Well defined services
  • Advice on how to use the system
  • Help with problems
  • Failures should be understandable
  • Ability to determine status of jobs and data
  • Site's perspective
  • Integrated into computer centre/IT (inc.
    security) infrastructures
  • Able to support service
  • Able to allocate and manage resources; local
    autonomy where needed
  • Overall service perspective
  • Performance and problem monitoring
  • Accounting
  • Etc.

15
Grid Deployment
  • Grid Deployment Board
  • Representatives from experiments, Regional
    Centres, LCG
  • Define LCG-1
  • Put in place agreements and policies to enable
    the deployment and operation of LCG
  • Coordinates planning of resources for computing
    and physics data challenges
  • Initial meeting Oct 4, Milano
  • Grid Deployment Area
  • Certification & Testing
  • System support
  • Operations
  • User Support
  • Resource planning & scheduling

16
Grid Deployment Board
  • 1st Meeting in Milano Oct 4, 2002
  • Set up Technical Working groups
  • WG1: Define LCG-1 functionality and services,
    recommend how to provide them. Define priorities
    and schedule for additional functionality.
  • WG2: Define the regional centres in LCG-1, and
    the resources that should be available in each.
    Schedule for rolling out the infrastructure and
    resources. Propose metrics to be used for
    allocation, accounting, and reporting.
  • WG3: Define a straightforward security and
    authentication model to be used in LCG-1, and
    identify the technical issues. Set up agreements
    and MoUs. Propose simple mechanism for
    authorization.
  • WG4: Define ops procedures & responsibilities.
    Make agreements to ensure coordination of these
    activities. Define the requirements for a Grid
    Operations Centre to coordinate operational
    activities.
  • WG5: Propose a support model for LCG-1, including
    the scope of responsibilities for call
    centre/helpdesk, and specify requirements for
    problem resolution and tracking.
  • Follow-ups: 4 meetings (2 by phone) before end
    2002

17
Grid Deployment Teams: the plan
[Diagram. Labels: suppliers' integration teams provide tested
releases; common applications s/w; Trillium - US grid middleware;
DataGrid middleware; certification, build & distribution;
LCG infrastructure coordination & operation; user support;
grid operation; call centre; LCG; fabric operation at regional
centres A, B, ... X, Y]
18
Certification & Testing
  • Function shared between EDG and LCG
  • Groups
  • Installation Team Group (iTeam)
  • Mostly EDG members, 1 LCG
  • Testing group (TSTG)
  • Mainly LCG
  • Certification Group (CTG)
  • Mainly LCG
  • Management of Certification and Testing: Zdenek
    Sekera (LCG)

19
iTeam responsibilities
  • the software can be built and packaged without
    obvious errors
  • the software passes the integration test suite
  • ensure that all fixes/features introduced into
    the software have their entry in the bugzilla,
    verify that entry has been updated when the
    fix/feature is checked into the software tree
  • build all standard configurations (with different
    compilers, libraries, etc) as defined by the GDB
    and test them with the integration test suite
  • report all problems via bugzilla
  • follow up with development problems reported in
    bugzilla
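
The "build all standard configurations" item above amounts to enumerating a matrix of build options. A minimal sketch of that enumeration follows. The compiler and library names are placeholders invented for illustration; the real list would be whatever the GDB defines.

```python
from itertools import product

# Placeholder axes (assumptions); the actual standard
# configurations are to be defined by the GDB.
COMPILERS = ["compiler-a", "compiler-b"]
LIBRARY_SETS = ["libs-x", "libs-y", "libs-z"]


def standard_configurations(compilers, library_sets):
    """Return one dict per (compiler, library set) combination."""
    return [
        {"compiler": compiler, "libraries": libs}
        for compiler, libs in product(compilers, library_sets)
    ]


def label(config):
    """Short name for a configuration, e.g. for a build log or bug report."""
    return f"{config['compiler']}+{config['libraries']}"


if __name__ == "__main__":
    for config in standard_configurations(COMPILERS, LIBRARY_SETS):
        # Each configuration would be built and then run through
        # the integration test suite; here it is only listed.
        print(label(config))
```

Every added axis multiplies the number of builds, so the list of standard configurations probably needs to stay short.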

20
Test Group responsibilities
  • Testing basic grid functionality
  • Responsible for collecting and creating tests to
    provide:
  • testing grid services
  • testing security
  • testing information
  • testing resource brokering
  • testing data catalogue and replication
  • testing connectivity
  • testing configurability
  • testing basic grid functionality
  • testing error recovery, fault tolerance
  • organize and perform complete geographically
    distributed tests as defined by GDB
  • make sure all new features come with the
    documentation
  • maintain the automated test suite
  • create and perform destructive tests
  • report every problem via bugzilla
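
As an illustration of "maintain the automated test suite" and "report every problem", here is a minimal sketch of a runner that executes categorised tests and collects one report per failure. The function name, the category names and the report fields are assumptions made for this example; they do not describe the TSTG tooling or the Bugzilla interface.

```python
def run_suite(tests):
    """Run (category, name, callable) tests and return a list of problem reports.

    A test passes if its callable returns without raising an exception.
    """
    problems = []
    for category, name, test in tests:
        try:
            test()
        except Exception as exc:
            # Each failure becomes one report, ready to be filed
            # in the bug tracker by whatever mechanism is in use.
            problems.append(
                {"category": category, "test": name, "error": str(exc)}
            )
    return problems


def check_connectivity():
    """Hypothetical failing test."""
    raise RuntimeError("no route to site")


def check_security():
    """Hypothetical passing test."""
    return None


if __name__ == "__main__":
    suite = [
        ("connectivity", "reach-site", check_connectivity),
        ("security", "valid-certificate", check_security),
    ]
    for problem in run_suite(suite):
        print(problem)
```

Tagging each test with a category keeps the reports aligned with the areas listed above (grid services, security, information, and so on).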

21
Certification group responsibilities
  • Certify that the software satisfies the
    functional and stability requirements, including
    adequate documentation
  • set up, configure and maintain certification
    testbeds
  • verify the TSTG tests are complete and OK
  • follow up other GDB requirements for the Grid
    certification, create appropriate certification
    tests
  • ensure all certification tests run
  • pay attention to performance issues
  • work with ATG (Application Test Group) to ensure
    the complete Grid production testing environment
    is valid
  • create complete release package(s), integrating
    the up-to-date documentation (the documentation
    will come from other sources such as User Support
    Group)
  • create CDs etc for Grid software distribution

22
Certification & Test Activities
  • Current activities
  • Prepare EDG November release
  • Recreate EDG, EDT & iVDGL interoperability demos
    on LCG testbeds
  • Evaluation of software, GLUE results etc, with
    GDB WG1
  • Training and experience for new team members
  • Certification, testing, validation
  • Will be and will remain a significant activity of
    LCG
  • This is what will make LCG a production level
    service

23
Testbeds and Services
[Diagram covering 2002 and 2003. Labels:]
  • developers' testbeds
  • development testbed: α- and β-testing, integrating
    and preparing a middleware release
  • DataGrid
  • production testbed: stable, maintained service
    for applications (shown twice)
  • demonstration testbed
  • certification testbed: controlled changes,
    in-depth application testing
  • LCG
  • production service: stable, maintained, 24x7
    service for applications
24
Operations team
  • Responsible for operating and maintaining the
    grid infrastructure and associated services
  • Gateways, information services, resource broker
    etc. i.e. grid specific services
  • Provide Grid Operations Centre
  • Leverage existing experience (iVDGL, etc.)
  • Assemble monitoring, reporting etc. tools
  • Authorisation, Authentication services and
    infrastructure inc. CAs
  • Accounting
  • Security operations: incident response etc.

25
Grid Operation
[Diagram. Labels: User; Local site; Local operation; Local user
support; Call Centre; Grid Operations Centre; Grid operations;
Grid information service; Grid logging & bookkeeping; Virtual
Organisation; Network Operations Centre. Flows: queries,
monitoring, alarms, corrective actions]
26
User Support
  • Essential for a production service
  • Two aspects
  • Experiment integration/ consultancy
  • Work directly with the experiments' computing
    projects to ensure efficient use of LCG services,
    and optimum use of resources
  • Act as liaison to ensure experiment specific
    issues are resolved
  • User support
  • Helpdesk/call centre operation
  • Globally distributed 24x7, ensure single point
    of contact for user
  • Collaborative and distributed operation
  • Documentation
  • Training

27
Resource Planning & Scheduling
  • Must tie together
  • global experiment requirements including some
    review process
  • Regional centre (and other) resource planning
  • Constraints, e.g. some resources may be
    dedicated to specific experiments (if we succeed
    this should go away)
  • Optimise use of resources at centres
  • Ensure experiment needs are satisfied
  • Try to smooth out peaks of demand: sharing of
    resources between experiments
  • Eventually be able to make use of non-HEP
    resources
  • Activity to build a database of requirements and
    available resources has begun
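
The planning problem above, matching experiment requirements against regional-centre resources (some of them dedicated), can be sketched as a simple check. Everything here is an assumption made for illustration: the `Resource` record, the use of CPU counts as the unit, the experiment names, and the rule that dedicated capacity is used before shared capacity. It is not the schema of the database mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    centre: str
    cpus: int
    dedicated_to: Optional[str] = None  # None means shared between experiments


def unmet_requirements(requirements, resources):
    """Return {experiment: missing CPUs}.

    Each experiment first uses capacity dedicated to it, then draws
    on the shared pool in the order the requirements are listed.
    """
    shared = sum(r.cpus for r in resources if r.dedicated_to is None)
    missing = {}
    for experiment, needed in requirements.items():
        dedicated = sum(r.cpus for r in resources if r.dedicated_to == experiment)
        shortfall = max(0, needed - dedicated)
        taken = min(shortfall, shared)
        shared -= taken
        if shortfall > taken:
            missing[experiment] = shortfall - taken
    return missing


if __name__ == "__main__":
    resources = [Resource("A", 100, dedicated_to="exp1"), Resource("B", 50)]
    requirements = {"exp1": 120, "exp2": 40}
    print(unmet_requirements(requirements, resources))
```

With no dedicated resources at all, the whole capacity sits in the shared pool, which is the situation the "if we succeed this should go away" remark points to.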

28
Coordination & Collaboration
  • There are many opportunities for common
    solutions, which should be actively pursued
  • GLUE
  • Schema definitions & interoperability work
  • HICB JTB, proposed new collaborative activities
  • Validation and Test Suites
  • Distribution and Meta-Packaging
  • Interoperable distribution and configuration
    utilities identified as a definite need by all
    the recent trans-Atlantic demonstration and
    validation work.
  • Support for this group comes from
  • LCG, EDG, EDT, Trillium, DataTAG
  • Other opportunities
  • Storage interfaces e.g. SRM
  • Grid operations centre
  • Authentication, authorisation and security
  • HEPiX as collaborative vehicle for RC managers,
    site coordinators
  • E.g. certification process for operating
    environments, upgrade procedures, configuration
    management, helpdesk tools, etc.

29
Deployment Summary
  • Deploy middleware to support essential
    functionality, but goal is to evolve and
    incrementally add functionality
  • Added value is to robustify, support and make
    into a 24x7 production service
  • How?
  • Certification & test procedure: tight feedback
    to developers
  • must develop support agreements with grid
    projects to ensure this
  • Define missing functionality: require from
    providers
  • Provide documentation and training
  • Provide missing operational services
  • Provide a 24x7 Operations and Call Centre
  • Guarantee to respond
  • Single point of contact for a user
  • Make software easy to install: facilitate new
    centres joining

30
Conclusions
  • Deployment is a major activity of LCG
  • Encompasses all operational and practical aspects
    of a grid
  • Timescales are relatively short for LCG-1
  • But there is a lot of work already done that must
    be leveraged
  • Many opportunities for synergy and collaboration
  • E.g. certification and testing EDG/LCG
  • We will succeed if we use these opportunities