1
First Use of the UK e-Science Grid
  • Matthew Palmer, HEP Group, Cambridge University

CLRC e-Science Centre
ATLAS
2
Overview
  • The Grid and Globus: an overview
  • The Physics
  • Using the Grid: from certificates to vast
    quantities of data
  • Results
  • Other Grid usage in Particle Physics
  • Wish list
  • Conclusion

3
Overview of the Grid
  • Foster, Kesselman, Tuecke: "The Anatomy of the
    Grid"
  • The problem:
  • coordinated resource sharing and problem solving
    in dynamic, multi-institutional virtual
    organisations
  • The solution requires
  • Resource management
  • Single sign-on
  • Delegation
  • Integration with local security solutions
  • User-based trust relationships
  • The solution should be protocol based.

4
Globus
  • The lowest level of software in a Grid
  • Services provided
  • Security: Grid Security Infrastructure (GSI)
  • Authentication based on certificates
  • Authorization based on local files (e.g. gridmap
    files)
  • Job submission and interface to local batch
    system
  • Information services
  • Provided through Meta-Directory Services (MDS)
  • File transfer
  • GridFTP
  • Global Access to Secondary Storage (GASS)
  • Resource Management
  • Globus Resource Allocation Manager (GRAM)

5
The Physics: The ATLAS experiment
6
The Physics: The ATLAS experiment
  • Part of the Large Hadron Collider (LHC) at CERN.
  • Will produce 1 Petabyte of data per year
    (about 1 billion events).
  • Need to be able to discover processes that result
    in a handful (~100) of events per year.
  • Therefore must simulate all relevant processes
    completely, otherwise we could get caught out by
    a few unusual events from well-known/boring
    processes.
  • Simulation is trivially parallelizable: each
    simulated event is entirely independent of all
    the others (see the submission sketch below).
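  • A minimal sketch of how this independence can be
    exploited, assuming a hypothetical wrapper script
    simulate.sh that takes a random seed and an event
    count and already exists on the remote farm (the
    script name and arguments are illustrative only):

    # Submit 100 independent simulation jobs, each with its own random seed.
    # A full remote path to the script may be required.
    for seed in $(seq 1 100); do
      globus-job-submit farm003.hep.phy.cam.ac.uk/jobmanager-pbs \
          simulate.sh $seed 10000 >> contacts.txt   # keep the contact URLs
    done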

7
The Physics: Graviton Resonances
  • My work last year involved looking at decays of
    massive graviton resonances predicted by some
    higher dimensional theories.
  • But the background is very large and therefore
    needs to be simulated accurately.
  • With a locally generated background sample of
    2 million events, the error in the simulation is
    typically larger than the signal!
  • So, use the Grid to generate the events we need.
  • 150 million of them!

8
The Physics: Graviton Resonances
9
Using the Grid: Getting access
  • Acquire a certificate.
  • Get user accounts on all the machines you want to
    use.
  • This involves sending pieces of paper to people.
  • Administrative effort is O(n).
  • Result is a series of accounts across the UK
    e-Science network that you aren't supposed to use
    directly...
  • ...and some unwanted email addresses that get
    spammed!
  • Speak to the sysadmin and add your certificate to
    the local gridmap file (see the sketch below).
  • Now you can use the Grid: yes, it's that simple!
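  • For illustration only, a gridmap entry just maps a
    certificate's distinguished name onto a local
    account; the DN and username below are made up, and
    you can check your own DN with
    grid-cert-info -subject:

    # /etc/grid-security/grid-mapfile (location may vary between sites)
    "/C=UK/O=eScience/OU=Cambridge/L=UCS/CN=a n other" another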

10
Using the Grid: Or maybe not
  • Firewall issues
  • Client machines have to be added into firewall
    tables.
  • Specific port numbers must be used.
  • Configuration problems/requests.
  • Fortunately system administrators are really
    helpful!
  • Now that the systems are beginning to be used,
    most issues should be resolved.
  • Not too many users yet though, so sysadmins
    respond quickly!
  • Continuous updates of software mean that there'll
    be issues for some time.

11
Using the Grid: Practicalities 1
  • Start a Grid session (obtain a proxy certificate)
  • grid-proxy-init
  • Run a job
  • E.g. globus-job-run herschel.amtp.cam.ac.uk
    /bin/ls
  • Lists the contents of your home directory on
    herschel
  • Submit a job
  • E.g. globus-job-submit
    farm003.hep.phy.cam.ac.uk/jobmanager-pbs myjob
  • Runs myjob on the HEP group farm using the PBS
    job manager
  • Returns a contact URL, e.g.
    https://farm003.hep.phy.cam.ac.uk:2045/53546/21305646/
  • Job status
  • globus-job-status
    https://farm003.hep.phy.cam.ac.uk:2045/53546/21305646/
  • Returns PENDING, RUNNING, DONE
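  • Putting these commands together, a typical session
    might look like this (the contact URL is the
    example above):

    grid-proxy-init                                  # enter passphrase, obtain a proxy
    globus-job-run herschel.amtp.cam.ac.uk /bin/ls   # quick interactive check
    CONTACT=$(globus-job-submit farm003.hep.phy.cam.ac.uk/jobmanager-pbs myjob)
    globus-job-status $CONTACT                       # PENDING, RUNNING or DONE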

12
Using the Grid: Practicalities 2
  • Retrieve stdout and stderr
  • globus-job-get-output -out
    https://farm003.hep.phy.cam.ac.uk:2045/53546/21305646/
  • But, a bug means that stdout and stderr get
    deleted after about 20 minutes, so don't rely on
    them!!
  • Redirect stdout and stderr to files instead (see
    the wrapper sketch after this list)
  • Find out information
  • You can find out static information about what
    machines are available and their capabilities and
    configuration using
  • grid-info-search -x -h farm003.hep.phy.cam.ac.uk
  • But, list of machines is fixed (you need
    accounts), so why bother?
  • What about dynamic information, e.g. number of
    jobs in queue, expected time before running and
    so on? This would be really useful.
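  • A minimal wrapper that works around the
    stdout/stderr bug, assuming a hypothetical
    executable myjob in the remote working directory
    (fetch the output files later with GridFTP):

    #!/bin/sh
    # run-myjob.sh -- keep output in files so nothing is lost
    # when the Globus-cached stdout/stderr disappears.
    ./myjob > myjob.$$.out 2> myjob.$$.err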

13
Using the Grid: Practicalities 3
  • File transfer
  • Use the GSI-enabled ncFTP client
  • Has many nice features (later)
  • Don't use globus-url-copy: it's very primitive
  • Binary issues
  • May have to compile locally
  • May have library problems
  • Simple solution: restrict use to ix86 Linux
    machines and compile statically (see below)!
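  • For example, a statically linked build removes the
    dependence on shared libraries at the remote site
    (the source file name is illustrative):

    gcc -O2 -static -o myjob myjob.c
    file myjob    # should report "statically linked"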

14
Using the Grid: Site Homogeneity
  • Not all sites are the same!
  • Disk quotas: large files are expected to be
    placed in certain locations.
  • Firewall constraints: certain port numbers have
    to be used.
  • Different job managers for each system.
  • This has major implications for scripts and
    programs
  • Ideally want to just copy the same set of
    scripts, programs and data to each site.
  • But it doesn't work because of these issues.
  • Can use MDS to find out most things and hide
    this, but it can still cause problems and adds a
    lot of extra development time. Plus, the setup can
    need debugging on each system (a simple per-site
    lookup is sketched below).
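  • One simple way to hide some of these differences is
    a small per-site lookup inside the job scripts; the
    hostnames and paths below are illustrative only:

    # Choose a scratch area appropriate to the site we are running on.
    case $(hostname -f) in
      *.hep.phy.cam.ac.uk) SCRATCH=/scratch/$USER ;;
      *.rl.ac.uk)          SCRATCH=/stage/$USER   ;;
      *)                   SCRATCH=$HOME/tmp      ;;
    esac
    mkdir -p "$SCRATCH"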

15
Using the Grid: GSI ncFTP
  • Currently, using the Grid can feel very remote.
  • Time to list files in a directory can be many
    seconds as authorization is done for each
    command.
  • This is a problem as things have a tendency to go
    wrong!
  • Greatly reduces productivity in two ways
  • Latency: lots of time spent waiting for a
    directory listing.
  • Psychological: the Grid becomes hard to use and
    you end up fighting it.
  • GSI ncFTP is a GSI-enabled version of the popular
    ncFTP client
  • Has many nice and usable features, e.g. tab
    completion, listing of file contents, transfer of
    directories, wild-carding etc. (example session
    below)
  • Operations are fast as authentication/authorization
    are only done once at logon.
  • Look forward to using GSI-SSH
  • will give fast, normal access to Grid machines.
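  • A typical interactive session might look like the
    following; the binary name gsincftp, the prompts and
    the remote paths are assumptions that may differ
    between installations:

    gsincftp farm003.hep.phy.cam.ac.uk   # authenticate once using the Grid proxy
    > ls -l                              # then ordinary ncFTP commands
    > cd data
    > get events_*.dat                   # wild-carding as in normal ncFTP
    > quit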

16
Using the GridInformation issues
  • MDS gives a wealth of information about processor
    speeds, RAM, and so on.
  • But, dynamic information is lacking. In
    particular, the status of currently running jobs
    is inadequate.
  • Callback features would be useful
  • E.g. e-mail when a job is finished or if it
    crashes.
  • Usage statistics would be useful but are missing.
  • These can be implemented manually using scripts,
    for example:
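  • A minimal sketch of such a script, polling a job's
    contact URL and sending mail when it completes (the
    e-mail address is a placeholder):

    #!/bin/sh
    # notify-when-done.sh <contact-url>
    CONTACT=$1
    while true; do
      STATUS=$(globus-job-status "$CONTACT")
      case $STATUS in
        DONE|FAILED)
          echo "Job $CONTACT finished: $STATUS" | mail -s "Grid job $STATUS" user@example.ac.uk
          break ;;
      esac
      sleep 300    # poll every five minutes
    done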

17
Using the Grid: Summary
  • Using the Grid via Globus is not too dissimilar
    to using a batch system.
  • There are just more hoops to jump through and
    different tools to use.
  • Potentially it gives you access to much more
    computing power than through a batch system.
  • But, with the current system of obtaining
    accounts, the UK e-Science Grid does not scale:
    10 systems is probably the most it would scale
    to.
  • Middle-ware is being developed to introduce
    single sign-on across the e-Science Grid.

18
Sites used
  • RAL e-Science Centre
  • Hrothgar: 16 x dual 1.2 GHz Athlon MP
  • London e-Science Centre
  • Pioneer: 20 x 1 GHz Athlon
  • Southampton e-Science Centre
  • Metropolis: dual 450 MHz Pentium III
  • Cambridge e-Science Centre
  • Herschel: 16 x dual 450 MHz Pentium II
  • Tempo: 1 GHz Pentium III
  • Cambridge HEP Group
  • Farm: 16 x 1.2 GHz Pentium III
  • Also has EDG software installed and no direct
    access (my Globus certificate maps onto a pooled
    account)
  • Total: 104 CPUs of varying speeds

19
The Results
  • Generated 150 million events (1200 CPU hours) in
    less than 24 real hours.
  • This corresponds to all of the relevant events in
    1 year of operation.
  • Then had to analyse them: this took many hours of
    interactive analysis.
  • The results of this have just been submitted to
    JHEP.
  • I have instructed a colleague on how to use the
    Grid; he is now using it for his own intensive
    simulations after relatively little effort.

20
The Results
21
Comparison with EDG
  • The European DataGrid (EDG) is middle-ware that
    runs on top of Globus.
  • Users join virtual organisations (VOs). The
    system gridmap file is automatically updated from
    VO member lists. Users are mapped to pooled
    accounts.
  • Pro: Single sign-on for all EDG sites.
  • Con: the pooled account may change, raising
    security and persistency issues.
  • Jobs are not submitted directly to a machine.
    They are submitted to a resource broker (RB),
    which queries the currently available machines and
    submits the job to the most appropriate one.
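  • For comparison, an EDG job is described in a small
    JDL file handed to the resource broker; this is a
    sketch of the EDG 1.x style, and the submission
    command name varied between releases
    (dg-job-submit, later edg-job-submit):

    # myjob.jdl
    Executable    = "myjob.sh";
    StdOutput     = "myjob.out";
    StdError      = "myjob.err";
    InputSandbox  = {"myjob.sh"};
    OutputSandbox = {"myjob.out", "myjob.err"};

    dg-job-submit myjob.jdl    # the broker picks the most appropriate site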

22
Comparison with EDG 2
  • EDG has support for Grid enabled storage
    including replica catalogues.
  • Cons
  • Machine administration is handed over to a
    program: this raises security issues and is
    incompatible with existing systems.
  • Requires a very particular system version, only
    available for commodity Linux clusters.
  • Other middle-ware is available, e.g. NorduGrid
    (designed to be run on existing systems). UK
    e-Science is also developing some of its own
    middle-ware.
  • Middle-ware is essential for the e-Science Grid
    to reach its potential.

23
Current and Future Particle Physics Grid Usage
  • Large scale data challenges, a portion of which
    is being generated on various Grids. Typical
    size:
  • 3000 CPUs
  • 100 Terabytes of data
  • 10^7 events
  • Full simulation
  • 40 minutes per event
  • 150 million events: over 11,000 CPU years (see
    the arithmetic below)!
  • High multiplicity events
  • Large numbers of sub-processes split into
    separate jobs
  • 1000 jobs submitted in a short time
  • User analysis of large data sets
    Distributed, interactive analysis of terabytes of
    data
  • Many challenges in the current Grid environment
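  • As a quick check of that CPU-time figure:

    # 150 million events at 40 CPU-minutes each, expressed in CPU-years
    echo "150000000 * 40 / 60 / 24 / 365" | bc -l    # roughly 11,400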

24
Wish list
  • Single sign-on: the highest priority
  • Not really a Grid until this is achieved
  • Site homogeneity
  • Or robust abstractions of all variations
  • GSI-SSH
  • Debugging will be much easier
  • More friendly
  • Available now
  • Book-keeping tools
  • To keep track of currently running jobs, status
    information, etc.

25
Conclusion
  • The UK e-Science Grid can be used now to do
    physics
  • To the point where a colleague, after some
    instruction, was able to use the Grid relatively
    painlessly
  • But
  • It cannot be used to its full potential
  • It does not scale above about 10 sites
  • Need for middle-ware
  • There are many middle-ware packages available now
  • UK e-Science is developing more for its own needs
  • Soon the UK e-Science Grid will reach its
    potential