The Quantum Chromodynamics Grid

1
The Quantum Chromodynamics Grid
  • James Perry, Andrew Jackson, Matthew Egbert,
    Stephen Booth, Lorna Smith
  • EPCC, The University Of Edinburgh

2
Overview
  • Overview
  • The data grid
  • The metadata catalogue and browser
  • Conclusions
  • References

3
Overview
  • Aim
  • To implement a 'QCDGrid' to become a production
    environment for UKQCD, a collaboration of UK
    scientists carrying out Quantum Chromodynamics
    (QCD) simulations
  • The Grid
  • This multi-terabyte storage system will
    support distributed data management across
    four UK sites: Edinburgh, Glasgow, Liverpool and
    Swansea
  • Funding
  • QCDGrid is part of the GridPP project, a
    PPARC-funded initiative

4
Why build a QCD Grid?
  • QCD currently generates terabytes to petabytes
    of data
  • Especially once UKQCD's purpose-built HPC system,
    QCDOC, comes online
  • Post-processing is highly diverse and distributed
  • Involves multinational collaborations
  • The challenge is to store and access this data
  • A secure, reliable and expandable distributed
    storage system is required
  • Initially, the QCDGrid project aims to address
    this issue
  • Develop a multi-terabyte storage system,
    supporting distributed data management across
    different UK sites

5
The QCDGrid
  • Stage 1: Implement a multi-site data storage Grid
  • Globus Toolkit for basic grid operations,
    e.g. data transfer, security
  • Globus replica catalogue to maintain a
    directory of files on the Grid
  • Intend to use EDG software in the future, e.g. for
    file replication
  • Stage 2: Develop structured data which describes
    the characteristics of the raw data (metadata)
  • Develop an XML schema for lattice QCD
    calculations
  • Implement a metadata catalogue
  • Develop a metadata catalogue browser

6
The QCDGrid Structure

7
Basic DataGrid Requirements
  • The data grid must distribute data across the
    four sites
  • Robustly
  • Each file must be replicated to at least two
    sites
  • Efficiently
  • Where possible, files should be stored close to
    where they are needed most often
  • Transparently
  • End users should not need to be concerned with
    how the data grid is implemented
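
The robustness requirement above can be read as an invariant over the replica catalogue. The following is a minimal sketch in Python, not QCDGrid code: the catalogue is modelled as a plain dictionary, and the file and site names are hypothetical.

```python
MIN_REPLICAS = 2  # from the requirement: every file held at two or more sites


def under_replicated(catalogue, min_replicas=MIN_REPLICAS):
    """Return the files stored at fewer than min_replicas distinct sites."""
    return sorted(
        name
        for name, sites in catalogue.items()
        if len(set(sites)) < min_replicas
    )


# Hypothetical catalogue: logical file name -> sites holding a copy
catalogue = {
    "config_0001.dat": ["edinburgh", "liverpool"],
    "config_0002.dat": ["glasgow"],
}
print(under_replicated(catalogue))  # ['config_0002.dat']
```

A check of this kind is what would tell a control process which files still need another copy.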

8
DataGrid Implementation
  • Hardware
  • Storage elements are PCs
  • Data stored in RAID arrays: cheap, and offer
    built-in redundancy
  • Software
  • Red Hat Linux 7.2 OS
  • Globus Toolkit 2.0 used for low-level grid
    services
  • European DataGrid software intended to be used in
    next phase for data replication/job submission
  • Custom-written QCDGrid software builds on Globus
    to implement QCDGrid client tools and control
    thread

9
Data Grid Structure

10
Simple Use Case: Adding a File
  • The user issues a put command
  • The software chooses a suitable storage element
    and copies the file to its NEW directory
  • On its next scan, the control thread finds the
    new file and moves it to its actual home,
    registering it with the replica catalogue
  • On its next scan, the control thread finds there
    is only one copy of the file and makes another
    one at a suitable site, registering it with the
    replica catalogue
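
The two-scan behaviour described above can be sketched as follows. This is an illustrative in-memory model in Python, not the QCDGrid implementation: the StorageElement class, the least-loaded placement policy and the file and site names are all assumptions made for the example.

```python
from collections import defaultdict


class StorageElement:
    """Hypothetical stand-in for one storage node."""

    def __init__(self, site):
        self.site = site
        self.new = set()   # files waiting in the NEW directory
        self.home = set()  # files in their permanent location


def put(filename, elements):
    """Client side: copy the file into the NEW directory of one element."""
    # Assumed policy: pick the least loaded element.
    target = min(elements, key=lambda e: len(e.home) + len(e.new))
    target.new.add(filename)
    return target


def scan(elements, catalogue, min_replicas=2):
    """One pass of the control thread."""
    # Files registered on an earlier scan but held at too few sites
    # get one more copy at a site that does not yet have them.
    for filename, sites in catalogue.items():
        if len(sites) < min_replicas:
            candidates = [e for e in elements if e.site not in sites]
            if candidates:
                target = min(candidates, key=lambda e: len(e.home))
                target.home.add(filename)
                sites.add(target.site)
    # Newly submitted files are moved home and registered.
    for element in elements:
        for filename in element.new:
            element.home.add(filename)
            catalogue[filename].add(element.site)
        element.new.clear()


elements = [StorageElement("edinburgh"), StorageElement("liverpool")]
catalogue = defaultdict(set)  # file name -> sites holding a copy
put("config_0001.dat", elements)
scan(elements, catalogue)  # first scan: file moved home and registered
scan(elements, catalogue)  # second scan: second copy made elsewhere
print(sorted(catalogue["config_0001.dat"]))  # ['edinburgh', 'liverpool']
```

Because the put command only writes to the NEW directory, the client never has to update the replica catalogue itself. In this model, registration and replication are left entirely to the control thread.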

11
Simple Use Case: Getting a File
  • The user issues a get command on a client
    machine
  • The software looks up the replica catalogue to
    find the nearest copy of the file
  • The file is transferred from that copy
  • If the transfer fails, the software looks up the
    replica catalogue again to find the next nearest
    copy, and tries to transfer that instead
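
The retry logic above can be sketched as a loop that consults the catalogue again after each failed transfer. This is a minimal sketch under stated assumptions: the lookup, distance and transfer callables are hypothetical stand-ins for the replica catalogue query, a network-proximity measure and the file transfer, and none of them is a Globus API.

```python
def get(filename, lookup, distance, transfer):
    """Fetch a file from the nearest copy, falling back to the next nearest."""
    tried = set()
    while True:
        # Look up the catalogue again on every attempt.
        remaining = [site for site in lookup(filename) if site not in tried]
        if not remaining:
            raise FileNotFoundError(f"no reachable copy of {filename}")
        site = min(remaining, key=distance)
        try:
            return transfer(site, filename)
        except OSError:
            tried.add(site)  # this copy is unavailable, try the next one


# Hypothetical data for illustration
catalogue = {"config_0001.dat": ["liverpool", "edinburgh"]}
hops = {"edinburgh": 1, "liverpool": 3}  # assumed distances from the client


def fake_transfer(site, filename):
    if site == "edinburgh":
        raise OSError("storage element unreachable")
    return f"{filename} fetched from {site}"


print(get("config_0001.dat",
          lambda name: catalogue.get(name, []),
          hops.__getitem__,
          fake_transfer))
# config_0001.dat fetched from liverpool
```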

12
Fault Tolerance
  • Probably the most important requirement of
    QCDGrid
  • Central control thread
  • Constantly monitoring nodes to make sure they are
    still working
  • When a node fails without warning
  • E-mail sent to the system administrator
  • Control thread begins to replicate the files that
    were on the node elsewhere
  • Nodes can be temporarily disabled if they have to
    be shut down or rebooted
  • Prevents the grid moving data around
    unnecessarily
  • A secondary node is constantly monitoring the
    central node
  • Backing up the replica catalogue and
    configuration files
  • Grid can still be accessed (albeit read-only) if
    the central node goes down
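
One monitoring pass of the control thread might look like the sketch below. It is an illustration in Python only: the node states ("active", "disabled", "dead"), the is_alive probe and the notify callback (standing in for the e-mail to the administrator) are assumptions, and the returned plan merely lists the copies that would need to be made.

```python
def monitor_pass(nodes, catalogue, is_alive, notify):
    """Check every active node; plan re-replication for any that has failed."""
    plan = []
    for node, state in list(nodes.items()):
        if state != "active":
            continue  # disabled or already dead: do not move data around
        if is_alive(node):
            continue
        nodes[node] = "dead"
        notify(f"node {node} failed without warning")
        for filename, holders in catalogue.items():
            if node not in holders:
                continue
            holders.discard(node)
            spare = [
                n for n, s in nodes.items()
                if s == "active" and n not in holders and is_alive(n)
            ]
            if spare:
                plan.append((filename, spare[0]))
                holders.add(spare[0])
    return plan


# Hypothetical four-node grid: ed2 is down for planned maintenance
nodes = {"ed1": "active", "ed2": "disabled", "liv1": "active", "liv2": "active"}
catalogue = {"a.dat": {"ed1", "liv1"}, "b.dat": {"ed2", "liv2"}}
down = {"liv1", "ed2"}
messages = []

plan = monitor_pass(nodes, catalogue, lambda n: n not in down, messages.append)
print(plan)      # [('a.dat', 'liv2')]
print(messages)  # ['node liv1 failed without warning']
```

In the example, the disabled node ed2 is unreachable but triggers neither an e-mail nor any copying, which is the point of allowing nodes to be disabled temporarily.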

13
Current Progress
  • Data grid software has been implemented and is
    undergoing testing
  • A four-node test grid has been set up across two
    of the sites (Edinburgh and Liverpool)
  • A web-based status monitor exists, allowing users
    to check the state of the data grid

14
Metadata
  • Storing metadata which describes the actual data
  • This allows users to see what is on the grid and
    find what they want more easily
  • Data described by XML metadata files
  • A schema is being developed for the QCD metadata
  • The XML files are stored centrally in an XML
    database, the QCDGrid metadata catalogue
  • Using Apache Xindice
  • The XML files will also be submitted to the data
    grid itself
  • Ensures there is a backup copy of the metadata
  • Metadata catalogue can be reconstructed from the
    data grid in the event that it is lost

15
Implementation of Metadata
  • Data submitted to the grid must be accompanied by
    a valid metadata file
  • This can be enforced by checking it against the
    schema
  • A submission tool (graphical or command line)
    takes care of sending the data and metadata to
    the right places
  • The Xindice XML database is accessed as a grid
    service
  • The API for this is being developed by the
    OGSA-DAI project
  • A graphical metadata browser will allow easy
    access to data stored on the grid, based on
    meaningful characteristics
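
The rule that data must arrive with a valid metadata file can be sketched as a gate in the submission tool. The Python standard library has no XML Schema validator, so the sketch below substitutes a much weaker check (well-formedness plus a list of required elements) for true schema validation. The element names and the send_data and send_metadata callables are hypothetical and are not taken from the QCDGrid schema or tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical required elements, for illustration only
REQUIRED = ("collaboration", "lattice_size", "data_file")


def metadata_problems(xml_text, required=REQUIRED):
    """Return a list of problems found in the metadata; empty means accepted."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    return [
        f"missing element <{tag}>"
        for tag in required
        if root.find(tag) is None
    ]


def submit(data_name, xml_text, send_data, send_metadata):
    """Refuse the submission unless the metadata passes the check."""
    problems = metadata_problems(xml_text)
    if problems:
        raise ValueError("; ".join(problems))
    send_data(data_name)     # would go to the data grid
    send_metadata(xml_text)  # would go to the metadata catalogue


good = (
    "<qcd><collaboration>UKQCD</collaboration>"
    "<lattice_size>16 16 16 32</lattice_size>"
    "<data_file>config_0001.dat</data_file></qcd>"
)
bad = "<qcd><collaboration>UKQCD</collaboration></qcd>"

print(metadata_problems(good))  # []
print(metadata_problems(bad))
# ['missing element <lattice_size>', 'missing element <data_file>']
```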

16
Current Progress
  • XML schema development is well advanced
  • Prototype available
  • Metadata browser applet exists
  • May require modification due to changes in APIs
    used
  • Metadata catalogue
  • The OGSA-DAI project is providing grid service
    software to QCDGrid

17
Conclusions
  • Aim
  • To implement a 'QCDGrid' to become a production
    environment for UKQCD
  • Developed a prototype distributed data grid
  • Adding real data to the grid this month
  • Developed a prototype XML schema and browser
  • Utilising the OGSA-DAI grid service software for
    the XML metadata catalogue

18
References
  • QCDGrid
  • Software mailing list: ukqcd-software@jiscmail.ac.uk
  • Project information e-mail: qcdgrid@epcc.ed.ac.uk
  • Or see http://www.epcc.ed.ac.uk/computing/research_activities/grid/qcdgrid/
  • Example schema: see
    http://www.ph.ed.ac.uk/ukqcd/community/the_grid/xml_schema/xml_schema.html
  • GridPP
  • http://www.gridpp.ac.uk