David Wallom - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
David Wallom
  • The University of Bristol Grid, a production
    campus grid
  • April 2005

2
Outline
  • Why?
  • What was planned?
  • Components
  • Problems
  • Outputs

3
The UoBGrid, why?
  • Leveraging extra use from existing resources by:
      • evening out load between heavily and lightly
        loaded systems;
      • using resources not currently used for science
        (i.e. Admin department systems after hours).
  • Large communities of users with serial
    computation needs, and equally large communities
    of parallel users, currently share the same
    resources, which is not optimal.
  • Enable experience to be gained in supporting a
    distributed system before joining NGS.

4
The UoBGrid, what?
  • Planned for 1000 CPUs from 1.2 to 3.2 GHz,
    arranged in 7 clusters and 4 Condor pools located
    in 6 different departments.
  • Central services all run on individual servers in
    the Information Services main computer room:
      • Resource Broker.
      • Information Systems Monitoring.
      • Virtual Organisation Credential Management.
      • Storage Resource Broker Vault.
  • Choice of software used is to be led by
    developments within other UK efforts (NGS).

5
The UoBGrid System Layout

6
Compute Clusters installation
  • Primary concern of compute owners is ongoing
    operations.
  • Software used must:
      • be self-contained,
      • be simple to install,
      • not need key service interruption,
      • present simple system status information to
        support staff.
  • Solution:
      • VDT 1.2.2
      • Non-WS Globus Patches
      • GSI-SSH
      • EDG-Gridmapfile
      • myProxy
      • SRB S-commands
      • Big Brother monitoring client

7
Resource Broker
  • Uses the Condor-G job distribution mechanism.
  • Custom script for determination of resource
    status priority.
  • Integrated the Condor ClassAds and Globus MDS.
  • Lightweight, self-contained solution (20 MB).
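
The slides do not show the custom script itself. As a minimal sketch only, assuming a policy of "skip offline resources, prefer the most free CPUs, break ties on the shorter queue", resource ranking could look like the following. The `Resource` fields and the `rank_resources` name are hypothetical and are not taken from the UoBGrid broker.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One cluster or Condor pool as seen by a broker (hypothetical fields)."""
    name: str
    online: bool      # status, e.g. as reported by the information system
    free_cpus: int    # currently idle CPUs
    queued_jobs: int  # jobs already waiting on the resource


def rank_resources(resources):
    """Return usable resources, best candidate first.

    Assumed policy (not from the slides): drop offline resources,
    prefer more free CPUs, and break ties in favour of the shorter queue.
    """
    usable = [r for r in resources if r.online]
    return sorted(usable, key=lambda r: (-r.free_cpus, r.queued_jobs))


if __name__ == "__main__":
    pool = [
        Resource("cluster-a", True, 12, 40),
        Resource("condor-pool-b", True, 30, 5),
        Resource("cluster-c", False, 64, 0),
    ]
    for r in rank_resources(pool):
        print(r.name, r.free_cpus, r.queued_jobs)
```

In a real broker the inputs would come from the information system and the ordering would feed the job distribution step; both are left out of this sketch.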

8
Resource Broker Operation

9
Information Services
  • Central UoBGrid Globus GIIS.
  • Each worker node is configured with a GRIS to
    publish scheduler and node data using the GLUE
    schema, as well as job manager reporters.
  • May change core server to BDII.
  • Systems allowed to register are controlled by the
    VOM.
  • Small system hiccup when a new machine is added,
    as the GIIS needs restarting.

10
UoBGrid, Monitoring
  • 4-hourly grid systems monitoring and reporting
  • Resource Broker job distribution status

11
Virtual Organisation Management
  • Two-tier system:
      • For local-only users:
          • Web-based system developed in-house.
          • Runs completely on the server, with a push
            model out to clients using GridFTP for
            distribution.
      • For NGS-registered users:
          • EDG make gridmapfile is used to construct
            files based upon the use of pool accounts.
  • Longer-term intention to use only NGS-style pool
    accounts for simplified user management.
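
As an illustration of the two tiers, the sketch below builds grid-mapfile text from a set of named local users and a set of pool-account users. It assumes the usual entry form of a quoted certificate DN followed by an account, and it assumes pool accounts are written as a dot-prefixed pool name (the gridmapdir convention). The function names, the `.ngs` pool name and the example DNs are hypothetical, not taken from the in-house system.

```python
def gridmap_line(dn, account):
    """Format one grid-mapfile entry: a quoted certificate DN, then the account."""
    return f'"{dn}" {account}'


def build_gridmap(local_users, pool_dns, pool_prefix=".ngs"):
    """Build grid-mapfile text for both tiers.

    local_users: dict mapping certificate DN -> named local account.
    pool_dns: iterable of DNs that share a pool of accounts.
    pool_prefix: hypothetical pool name; the leading dot is the assumed
        pool-account convention.
    """
    lines = [gridmap_line(dn, user) for dn, user in sorted(local_users.items())]
    lines += [gridmap_line(dn, pool_prefix) for dn in sorted(pool_dns)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    local = {"/C=UK/O=eScience/CN=alice example": "alice"}
    pooled = ["/C=UK/O=eScience/CN=bob example"]
    print(build_gridmap(local, pooled), end="")
```

With a push model, a file built this way on the central server would then be copied out to each cluster; the transfer step is not shown.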

12
Virtual Organisation Management

13
Resource Usage Service
  • Custom changes to jobmanager scripts.
  • Usage records for each job as follows:
      • User
      • Start-time
      • End-time
      • Executable
  • Records usage whenever a job completes
    successfully.
  • Publishes back to the webserver on the VOM system.
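
A usage record with the four fields listed above can be sketched as a small data class. The JSON serialisation and the `wall_seconds` helper are assumptions added for illustration; the slides do not say what format the jobmanager scripts publish to the webserver.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class UsageRecord:
    """One record per successfully completed job (fields from the slide)."""
    user: str
    start_time: datetime
    end_time: datetime
    executable: str

    def wall_seconds(self):
        """Elapsed wall-clock time of the job in seconds."""
        return (self.end_time - self.start_time).total_seconds()

    def to_json(self):
        """Serialise for publishing; JSON is an assumed format."""
        data = asdict(self)
        data["start_time"] = self.start_time.isoformat()
        data["end_time"] = self.end_time.isoformat()
        return json.dumps(data, sort_keys=True)


if __name__ == "__main__":
    rec = UsageRecord(
        user="alice",
        start_time=datetime(2005, 4, 1, 9, 0, 0),
        end_time=datetime(2005, 4, 1, 9, 30, 0),
        executable="/bin/sim",
    )
    print(rec.wall_seconds())
    print(rec.to_json())
```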

14
How to use UoBGrid
  • Using an e-Science certificate for AA.
  • Simple command line interface.
  • What our users want.
  • The most successful way of getting happy users
    has been to change their usual interface/usage
    model as little as possible.

15
The Users
  • Polymer Nano
      • Run optical trapping simulations in readiness
        for real experiments in the new building.
  • BioChemistry
      • Protein ligand docking simulation.
  • Earth Sciences
      • River simulation.
  • Comp Sci
      • Radiance.
  • Myself
      • Charge distribution simulation code for system
        testing.

16
New Users/being ported
  • Chemistry
      • Gaussian computational chemistry application.
  • GENIE, Geographical Sciences
      • Whole Earth system modelling.
  • Civil Engineering
      • EuroNEES/UKNEES.

17
Usage
  • Current record
  • 15000 individual jobs in a week, 4500 in one
    day.
  • Single submitted job containing 2000 individual
    sub-components

18
Software Problems Encountered
  • Some of the middleware that we have been trying
    to use has not been as reliable as we would have
    hoped.
  • MDS is a prime example, where the necessity for
    reliability has defined our usage model.
  • More software than originally wanted has had to
    be designed/written in-house, because externally
    released software was nowhere near the state
    that was advertised.
  • Constant polling of queue managers by the
    job-manager was solved by adding a sleep, which
    reduced head-node load by 70%!
  • Some systems are running operating system
    versions so old that the middleware refused to
    install!
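
The polling fix is a general pattern: sleep between status checks instead of querying the queue manager in a tight loop. The sketch below shows the idea only. The `wait_for_job` name, the ten-second default interval and the injectable `sleep` parameter are assumptions for illustration; the actual change was made inside the job-manager scripts.

```python
import time


def wait_for_job(is_finished, poll_interval=10.0, sleep=time.sleep):
    """Poll until is_finished() returns True, sleeping between checks.

    is_finished: callable that queries the queue manager (hypothetical).
    poll_interval: seconds to wait between queries (assumed value).
    sleep: injectable so the loop can be exercised without real delays.
    Returns the number of times the queue manager was queried.
    """
    polls = 1
    while not is_finished():
        sleep(poll_interval)
        polls += 1
    return polls


if __name__ == "__main__":
    # Simulated job that reports "finished" on the third query.
    answers = iter([False, False, True])
    print(wait_for_job(lambda: next(answers), poll_interval=0.01))
```

The interval trades responsiveness for load: a longer sleep means fewer queries against the head node, but completed jobs are noticed later.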

19
Things we worry about
  • System upgrading
      • Now that it is a service, we cannot take it all
        down at once to upgrade it, because we have
        real academic users!
  • Future directions and scalability of the
    certificate mechanism.
  • Future compatibility of tools such as Condor with
    Globus/SRB/anything else that is useful!

20
Results
  • http://escience.bristol.ac.uk/Science_results.htm
  • Rendering On Demand output, Graphics!

21
Future Plans
  • Expand UoBGrid to become SWGrid
      • Incorporate resources from UWE, Exeter and
        other SW institutions.
  • Maintain central and cluster systems with
    up-to-date middleware.
  • Longer-term uncertain
      • University has to believe benefits are tangible
        in the long term; the necessity for lots of
        specialist people is bad!
      • Competition from professional solutions such as
        LSF and GridMP.

22
Further Information
  • Centre for e-Research Bristol
  • http://escience.bristol.ac.uk
  • Email: david.wallom@bristol.ac.uk
  • Telephone: +44 (0)117 928 8769
  • UoBGrid: uobgrid-admin@bristol.ac.uk