1
The Grid - an overview and some current
activities
  • David Boyd
  • CLRC e-Science Centre
  • Rutherford Appleton Laboratory
  • http://www.e-science.clrc.ac.uk
  • d.r.s.boyd@rl.ac.uk

2
Outline
  • What is the Grid and how can it help me?
  • Current Grid activities in CLRC
  • CLRC Data Portal project
  • EU DataGrid project

3
UK Science Budget: e-Science and the Grid
  • E-Science means science increasingly done
    through distributed global collaborations enabled
    by the Internet, using very large data
    collections, terascale computing resources and
    high performance visualisation.
  • The Grid is a set of technologies with the
    potential to deliver e-science
  • The Grid provides persistent environments that
    enable software applications to integrate
    instruments, displays, computational and
    information resources that are managed by diverse
    organisations in widespread locations.

4
UK Science Budget: e-Science funding - £98M
  • RC applications testbed programmes
  • PPARC £26M
  • EPSRC £17M
  • NERC £7M
  • BBSRC £8M
  • MRC £8M
  • ESRC £3M
  • CLRC £5M (£1M, £1.5M, £2.5M)
  • EPSRC core technologies programme
  • £15M, plus £20M from DTI and industry contributions
  • EPSRC High Performance Computing (HPCx) - £9M

5
Vision behind the e-Science programme
  • Demanding scientific applications provide the
    drivers
  • These applications define technology requirements
  • Major new scientific advances can be achieved in
    2-3 years using today's emerging Grid
    technologies
  • Scientists working together with technologists
    can develop solutions based on these Grid
    technologies which meet the scientific needs
  • Science advances and new technology
    emerges into the marketplace to support
    e-business

6
The Global Grid scene . . .
  • Grid tools have been under development in US for
    5 years
  • Globus, Condor, Legion, Storage Resource Broker
    (SRB), . . .
  • Major science-led initiatives now based on these
    tools
  • GriPhyN, PPDG, NASA IPG, DataGrid, EuroGrid, . .
    .
  • Virtual Data Concept (eg National Virtual
    Observatory)
  • generate new data on demand from existing global
    data archives either by targeted extraction or by
    real-time derivation
  • Access Grid - persistent electronic shared
    space
  • support large scale collaborative interaction and
    visualization
  • Global Grid Forum now established - EU / US / Far
    East
  • encourage collaboration and common practice by
    consensus
  • working groups in many areas relevant to UK
    programme

7
An example application: Network for Earthquake
Engineering Simulation
  • NEESgrid: US national infrastructure to couple
    earthquake engineers with experimental
    facilities, databases, computers and each other
  • On-demand access to experiments, data streams,
    computing, archives, collaboration
  • NEESgrid: Argonne, Michigan, NCSA, UIUC, USC

8
How can the Grid help me?
  • Provide access to a global distributed computing
    environment
  • via authentication, authorisation, negotiation,
    security
  • Identify and allocate appropriate resources
  • interrogate information services -> resource
    discovery
  • enquire current status/loading via monitoring
    tools
  • decide strategy - eg move data or move
    application
  • (co-)allocate resources -> process flow
  • Schedule tasks and analyse results
  • ensure required application code is available on
    remote machine
  • transfer or replicate data and update catalogues
  • monitor execution and resolve problems as they
    occur
  • retrieve and analyse results - eg using local
    visualization
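
(Illustration) A minimal sketch of this workflow, driven from a small Python
script that calls standard Globus Toolkit command-line tools
(grid-proxy-init, globus-job-run, globus-url-copy). The host name, paths and
application below are invented, and exact tool names and options vary between
Globus releases, so treat this as a sketch of the pattern rather than a recipe.

# Illustrative sketch only: host, paths and the executable are made up,
# and Globus command names/options differ between toolkit releases.
import subprocess

REMOTE_HOST = "grid.example.ac.uk"   # hypothetical Grid-enabled machine

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Authenticate: create a short-lived proxy from the user's certificate
run(["grid-proxy-init"])

# 2. Check the application exists remotely, then stage the input data
run(["globus-job-run", REMOTE_HOST, "/bin/ls", "/opt/myapp/bin/simulate"])
run(["globus-url-copy", "file:///home/user/input.dat",
     f"gsiftp://{REMOTE_HOST}/scratch/user/input.dat"])

# 3. Run the job on the remote machine
run(["globus-job-run", REMOTE_HOST,
     "/opt/myapp/bin/simulate", "/scratch/user/input.dat"])

# 4. Retrieve the results for local analysis and visualisation
run(["globus-url-copy", f"gsiftp://{REMOTE_HOST}/scratch/user/output.dat",
     "file:///home/user/output.dat"])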

9
To make this happen you need . . .
  • agreed protocols (cf WWW -> W3C)
  • defined application programming interfaces (APIs)
  • existence of metadata catalogues
  • both system and application metadata
  • distributed data management
  • availability of current status of resources
  • monitoring tools
  • accepted authentication procedures and policies
  • network traffic management
  • these will be provided by Grid-based
    toolkits and services

10
CLRC's e-Science remit
  • CLRC will be allocated £5M additional resource
    over the period from the e-Science Cross-Council
    programme to promote a data and computational
    Grid philosophy across all the activities of the
    Laboratory. It will aim to pilot the relevant
    technologies (metadata development, remote
    working, high performance computing, data
    curation and transfer) on the present facilities
    so that Grid functionality can be built into
    upgraded facilities as they come on stream and
    into DIAMOND from the start.

11
CLRC e-Science programme
  • CLRC set up an e-Science Centre in summer 2000
  • An initial set of Grid development projects
    started in October 2000
  • http://www.e-science.clrc.ac.uk/
  • These are now partially completed and plans are
    being developed for the next stage of the
    programme
  • CLRC is also involved in external Grid
    developments including the EU DataGrid project

12
CLRC e-Science activities - 1
  • HPC Applications Portal - Rob Allan (DL)
  • web-based portal to a range of HPC Grid resources
  • currently supports
  • resource identification and allocation on a
    remote machine
  • running a job and retrieving the output streams
  • visualising output data and tidying up
  • uses Globus, Apache, OpenSSL, OpenLDAP, GridPort,
    HotPage, Perl, CGI, etc . . .
  • now developing user registration and accounting
    aspects
  • experience - steep learning curve, installation
    complex, but it works!
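
(Illustration) The portal itself is built on GridPort/HotPage with Perl CGI;
purely to illustrate the pattern (a web form whose server-side handler shells
out to Grid tools on the user's behalf), here is a minimal Python CGI sketch.
The field names, default host and command are invented.

#!/usr/bin/env python3
# Minimal sketch of the portal pattern, not the actual HPC Applications
# Portal code: a CGI script reads a form submission and shells out to a
# Globus command on the user's behalf. Host/command/field names are invented.
import html
import os
import subprocess
from urllib.parse import parse_qs

params = parse_qs(os.environ.get("QUERY_STRING", ""))
host = params.get("host", ["grid.example.ac.uk"])[0]     # hypothetical host
command = params.get("command", ["/bin/hostname"])[0]

# Submit the job via Globus GRAM and capture its output streams
result = subprocess.run(["globus-job-run", host, command],
                        capture_output=True, text=True)

print("Content-Type: text/html\r\n")
print("<html><body><h2>Job output</h2><pre>")
print(html.escape(result.stdout or result.stderr))
print("</pre></body></html>")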

13
CLRC e-Science activities - 2
  • StarGrid - David Giaretta (RAL)
  • Grid-based access to remote image archives
  • location-independent data retrieval
  • access to only parts of large images
  • uses the standard Starlink user interface
  • demonstrator with European Southern Observatory
    (ESO)

14
CLRC e-Science activities - 3
  • BADC Grid developments - Bryan Lawrence (RAL)
  • aim is to provide access to data in the BADC and
    other data centres at RAL via Grid technology
  • evaluating Globus for remote database access via
    GUIs and APIs
  • investigating Grid security mechanisms to control
    and account for use of sensitive data
  • participating in EU DataGrid project (Earth
    Observation workpackage) and EU EnviGrid proposal
  • develop early experience of using Grid tools
    within the NERC data centre community

15
CLRC e-Science activities - 4
  • Beowulf clusters - Pete Oliver (RAL)
  • 16 node Beowulf cluster is fully Grid-accessible
  • applications running for protein simulation,
    climate modelling, computational chemistry,
    computational fluid dynamics, . . .
  • building new 32 CPU cluster based on AMD 1.2GHz
    processors and Wulfkit SCI interconnect using MPI
    software
  • AMD sponsoring the dual processor boards (£40K)
  • measuring significantly better performance per £
    compared to conventional HPC systems
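
(Illustration) The cluster's parallel applications are built on MPI; as a
hedged, modern illustration of that programming model (not code from the
cluster itself), this sketch uses the mpi4py Python bindings to spread a
simple numerical integration across the processors and combine the results.

# Illustrative MPI sketch using the mpi4py bindings, a modern stand-in for
# the compiled MPI codes such a cluster would actually run.
# Launch with e.g.:  mpirun -np 32 python pi_reduce.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this process's id, 0..size-1
size = comm.Get_size()      # total number of MPI processes

# Each process sums its share of a midpoint-rule approximation of pi
N = 10_000_000
local = sum(4.0 / (1.0 + ((i + 0.5) / N) ** 2)
            for i in range(rank, N, size)) / N

# Combine the partial sums on rank 0
pi = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print(f"pi is approximately {pi:.8f}, computed on {size} processes")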

16
CLRC e-Science activities - 5
  • Globus testbed and CA - Andrew Sansum (RAL)
  • set up 2 reference Globus installations based on
    PCs running Red Hat Linux and Globus 1.1.3
  • one system is a stable reference, the other is
    used to evaluate new Globus software releases
  • available to new users as a testbed and to help
    them build their own systems - technical support
    provided
  • established a rudimentary Certificate Authority
    to issue digital certificates for the UK HEP Grid
    testbed project
  • investigating upgrading CA and addressing
    certification policy issues for more
    heterogeneous communities of users

17
CLRC e-Science activities - 6
  • Petabyte data storage - John Gordon (RAL)
  • current large scale data storage facilities at
    RAL comprise
  • 30TB IBM tape robot storage system for general
    scientific data now accessible via the Grid using
    Globus i/o
  • 120TB STK robot (expandable to 300TB) for
    particle physics data as part of the UK Tier1
    centre at RAL for processing data from the Large
    Hadron Collider at CERN
  • planning a 2PB Scientific Data Store for use by
    all RCs as a national Grid-accessible resource
  • provide long term data curation and preservation
    in support of UK science programmes and
    facilities
  • expandable to 10PB at incremental cost as demand
    grows

18
CLRC e-Science activities - 7
  • Gigabit networking - Chris Cooper (RAL)/Paul
    Kummer (DL)
  • upgrading networks at RAL and DL sites to provide
    1Gbps to all major facilities and to desktops
    where necessary
  • expect to upgrade internally to 10Gbps within
    2-3 years
  • maintain connection to SJ4 at highest available
    bandwidth
  • assessing MPLS for multi-service traffic
    management to handle a mix of real time video,
    bulk data transfer, etc
  • involved in a European Research Networks
    collaboration investigating multi-service
    networking over GEANT
  • investigating network QoS in transatlantic trials
    with Internet2 in US using HEP bulk data
    transfers

19
CLRC e-Science activities - 8
  • Scientific Data Portal - Brian Matthews (RAL)
  • The problem . . .
  • many scientific facilities (accelerators,
    telescopes, satellites, supercomputers, . . . )
    each producing and storing data independently
    with no common way of representing relevant
    metadata or accessing the original data files
  • A solution . . .
  • develop a single easy-to-use interface, based on
    a common model of scientific metadata, which
    provides the ability to search multiple data
    resources, possibly across disciplinary
    boundaries, and retrieve required data files

20
Data Portal - design approach
  • Focus initially on data from two facilities, ISIS
    and SRS, which each serve a range of disciplines
    and each maintain their own data archives
  • Talk to scientists (facility users and facility
    experts)
  • Identify user scenarios and define use cases as
    basis for developing pilot system and user trials
  • Design a modular architecture combining
  • a web-based user interface
  • a common metadata schema and catalogue
  • a mechanism for interfacing to distributed data
    resources

21
Data Portal - architecture
22
Data Portal - metadata model
A generic core spanning all scientific
applications, with extensions for each domain
  - can answer questions about specific domains
  - can answer questions across domains
23
Data Portal - metadata catalogue
  • Uses an XML schema to represent the generic core
    scientific metadata
  • (eg owner, discipline, subject of study, facility
    used, . . . )
  • Schema extended to include distributed
    domain-specific metadata in a consistent way
  • Relational database based on the same schema
    holds the metadata and responds to queries by
    returning XML files
  • Catalogue holds URIs of logical datasets and
    (possibly multiple) URLs of physical datasets on
    relevant Grid-accessible servers enabling direct
    access to data
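
(Illustration) A sketch of how a client might consume the catalogue's XML
response and pull out the logical dataset URI and its physical replica URLs.
The element names, attribute names and server addresses below are invented
for illustration and are not the actual CLRC schema.

# Sketch of consuming a metadata-catalogue response; the XML layout,
# element names and hosts are invented (the real schema differs).
import xml.etree.ElementTree as ET

RESPONSE = """<DataHolding>
  <Dataset uri="clrc:isis/run/12345">
    <Location url="gsiftp://datastore.example.ac.uk/isis/run12345/raw.dat"/>
    <Location url="gsiftp://mirror.example.ac.uk/isis/run12345/raw.dat"/>
  </Dataset>
</DataHolding>"""

root = ET.fromstring(RESPONSE)
for ds in root.findall("Dataset"):
    print("logical dataset:", ds.get("uri"))
    for loc in ds.findall("Location"):
        # any replica can be fetched directly from a Grid-accessible server
        print("  physical copy:", loc.get("url"))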

24
Data Portal -creating metadata
  • Relevant metadata was identified and investigated
    . . .
  • . . . but quickly discovered that no consistent
    metadata capture practice or even policy existed
  • laboratory notebooks, memory, personal
    abbreviations, etc
  • Some metadata contains personal details, so the
    DPA (Data Protection Act) applies
  • Creating metadata long after the data were
    collected proved to be very labour intensive and
    unreliable
  • Now have a parallel project to build automatic
    metadata capture and catalogue entry into
    experimental facilities

25
Data Portal - metadata example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE CLRCMetadata SYSTEM "clrcmetadata.dtd">
<CLRCMetadata>
 <MetadataRecord metadataID="N000001">
  <Topic>
   <Discipline>Chemistry</Discipline>
   <Subject>Crystal Structure</Subject>
   <Subject>Copper</Subject>...
  <Experiment>
   <StudyName>Crystal Structure Copper Palladium complex 150K ...
   <Investigator><Name><Surname>Sinn...<Institution>University of Hull ...
   <Funding>EPSRC ...
   <TimePeriod><StartDate><Date>21/04/1999...
   <Purpose><Abstract>To study the structure of Copper and Palladium
     co-ordination complexes at 150K.
   <DataManager><Name><Surname>Teat...
   <Instrument>SRS Station 9.8, BRUKER AXS SMART 1K...
   <Condition>...Wavelength...<Units>Angstrom...<ParamValue>0.6890...
   <Condition>Crystal-to-detector distance<Units>cm...<ParamValue>5.00...
26
Data Portal - metadata hierarchy
[Diagram: metadata hierarchy - an Investigation contains Data Holdings;
each Data Holding contains Data-Set 1 (Raw), Data-Set 2 (Intermediate) and
Data-Set 3 (Final); each Data-Set contains Files listed by name and date]
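
(Illustration) The same hierarchy could be modelled directly in code; the
sketch below uses Python dataclasses with names taken from the diagram. It is
an illustration of the structure, not the portal's actual data model, and the
file names and dates in the example are invented.

# Sketch of the metadata hierarchy above using Python dataclasses.
# Names follow the diagram; the example values are invented.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataFile:
    name: str
    date: str

@dataclass
class DataSet:
    kind: str                      # "Raw", "Intermediate" or "Final"
    files: List[DataFile] = field(default_factory=list)

@dataclass
class DataHolding:
    datasets: List[DataSet] = field(default_factory=list)

@dataclass
class Investigation:
    title: str
    holdings: List[DataHolding] = field(default_factory=list)

# Example: one investigation with raw data and a final result
inv = Investigation(
    title="Crystal structure of a Cu/Pd complex at 150K",
    holdings=[DataHolding(datasets=[
        DataSet("Raw", [DataFile("run12345.raw", "21/04/1999")]),
        DataSet("Final", [DataFile("structure.cif", "28/04/1999")]),
    ])],
)
print(len(inv.holdings[0].datasets), "data-sets in the first holding")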
27
Data Portal - operation
[Diagram: Data Portal operation - the USER talks to a user input interpreter
and a user output generator; a query generator and a response generator,
driven by pre-set XSL scripts and an XML parser working against the XML
Schema, exchange XML files over http with the central metadata repository,
remote metadata repositories and external agents; the key distinguishes
internal modules, http links, external agents, XML files and ascii files]
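
(Illustration) One step in that pipeline is the pre-set XSL script that turns
a metadata-query response (XML) into something the user output generator can
display. The sketch below shows the idea using lxml, a present-day Python
XML/XSLT library; the stylesheet, element names and output layout are
invented and are not the portal's actual scripts.

# Sketch of XSL-driven response generation; stylesheet and XML are invented.
from lxml import etree

RESPONSE_XML = b"""<CLRCMetadata>
  <MetadataRecord metadataID="N000001">
    <Topic><Discipline>Chemistry</Discipline></Topic>
  </MetadataRecord>
</CLRCMetadata>"""

SUMMARY_XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body>
      <h2>Search results</h2>
      <xsl:for-each select="//MetadataRecord">
        <p><xsl:value-of select="@metadataID"/> -
           <xsl:value-of select="Topic/Discipline"/></p>
      </xsl:for-each>
    </body></html>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(SUMMARY_XSL))   # pre-set XSL script
html_out = transform(etree.fromstring(RESPONSE_XML))    # XML in, HTML out
print(str(html_out))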
28
Data Portal - example

Result of searching across facilities - returns
XML to the session and displays a summary
29

Expand Results - give more details from the same
XML
30

Going Deeper - Can browse the data sets
31
Select data - pick the required data files and
download them from a convenient location.
32
Data Portal - current developments
  • Make more robust and scalable
  • goal is a production system handling all ISIS and
    SRS data
  • Extend to other scientific disciplines
  • earth sciences and space sciences want to use the
    system
  • extend the metadata schema for these sciences
  • Provide bridges to other metadata catalogues
  • Automate metadata capture and catalogue entry
  • Develop more comprehensive security features
  • Extend to support agent-based access via an API
  • Incorporate discipline-specific thesauri

33
DataGrid project
  • EU supported project with 21 partners in 10
    countries
  • EU funding is 9.8M Euro over 3 years with 20M
    Euro being provided by partners - started Jan
    2001 - BIG!
  • CERN leads, 6 principal partners (PPARC in UK),
    15 associates plus industry (inc IBM/UK)
  • Aim is to link the Grid developments in each
    country into a pan-European Grid infrastructure
    for science
  • 3 application areas - particle physics (LHC),
    earth observation (ENVISAT) and bioscience
  • Large teams are now working in each country

34
DataGrid motivation - the LHC problem
  • LHC will produce several PBs of data per year for
    at least 10 years from 2005 - with no way to
    handle it!
  • Data analysis will be carried out by farms of
    1000s of commodity processors (the computing
    fabric) in each of about 10 regional Tier1
    centres - RAL is UK Tier1
  • Each Tier1 centre will need to hold several PBs
    of raw data and results of physics analysis
  • Communication will be via the GEANT European
    network
  • Strong focus on middleware and testbeds - open
    source
  • Will use Globus toolkit (but architecturally
    independent)

35
DataGrid - project structure
  • 4 major components
  • Grid-enabling basic resources
  • computing fabric, mass storage, networking
  • developing generic middleware
  • security, info services, resource allocation,
    file replication
  • building application services
  • job scheduling, resource management, process
    monitoring
  • testing with 3 science applications
  • particle physics (LHC), earth observation and
    bioscience

36
DataGrid workpackages
  • WP 1 Grid Workload Management (F. Prelz/INFN)
  • WP 2 Grid Data Management (B. Segal/CERN)
  • WP 3 Grid Monitoring services (R.
    Middleton/CLRC-PPARC)
  • WP 4 Fabric Management (O. Barring/CERN)
  • WP 5 Mass Storage Management (J.
    Gordon/CLRC-PPARC)
  • WP 6 Integration Testbed (F. Etienne/CNRS)
  • WP 7 Network Services (C. Michau/CNRS)
  • WP 8 HEP Applications (F. Carminati/CERN)
  • WP 9 EO Science Applications (L. Fusco/ESA)
  • WP 10 Biology Applications (C.
    Michau/CNRS)
  • WP 11 Dissemination (G. Mascari/CNR)
  • WP 12 Project Management (F. Gagliardi/CERN)

37
DataGrid workpackage structure
38
DataGrid - Earth Observation application
  • Focus on ENVISAT data from SCIAMACHY instrument
  • Distributed access to atmospheric ozone data as
    testbed
  • Data mining and long time-series processing
  • Scalable solutions for future high data rate
    instruments
  • On demand (re)processing of large (100TB)
    datasets
  • Data access via distributed Processing and
    Archiving Centres (PACs)
  • Testbed for distributed and parallel data
    modelling for EO
  • Local area meteorological modelling using wide
    area prediction model results

39
DataGrid testbed
A.Sansum@rl.ac.uk
40
UK testbed sites
[Map: UK testbed sites - Glasgow, Edinburgh, Durham, Lancaster, Liverpool,
Manchester, Dublin, Sheffield, Birmingham, Oxford, Cambridge, RAL,
UCL, IC, Brunel, RHUL and Bristol; regional clusters in Scotland, the
North West, the Midlands and London; integrated testbed sites marked]
41
DataGrid collaborations
  • Strong links with Grid activities in US . . .
  • Globus teams at Argonne and USC - exchange of
    people
  • also Condor and other development centres
  • GriPhyN - consortium of US universities
  • Particle Physics Data Grid - team of US
    government labs
  • Global Grid Forum working groups
  • . . . and in Europe
  • DataGrid Industry and Research Forum
  • EuroGrid
  • GEANT
  • UK Grid programme will need to develop similar
    links

42
The UK Grid - where are we now?
  • UK Grid programme is now getting started in
    earnest . . .
  • RC application testbed programmes are taking
    shape
  • core technologies programme is defining its
    roadmap
  • but a practical context for collaboration has
    still to emerge
  • Real experience of Grid tools and techniques is
    growing in places like CLRC - and is available to
    share with others
  • Strong pressures are driving progress in some areas, eg HEP
  • Pressure to involve industry will grow
  • Focus must now be on delivering convincing
    results quickly if there is to be a follow-on
    programme