The Grid and enabling applications for it

1
The Grid and enabling applications for it
CCPN/TEMBLOR Workshop, Hinxton, 19th May 2004
Mark Hayes, Technical Director, Cambridge
eScience Centre
2
In the beginning
"The collection of people, hardware, and
software... will become a node in a
geographically distributed computer network.
Through the network... all the large computers
can communicate with one another. And through
them, all the members of the community can
communicate with other people, with programs,
with data, or with a selected combination of
those resources." J.C.R. Licklider, The Computer
as a Communication Device, Science and
Technology, April 1968
The ARPAnet in 1970
3
International connectivity - 1991
4
International connectivity - 1997
5

International bandwidth
From 3D geographic network displays, Cox et
al., ACM SIGMOD Record, December 1996
6
What does the Internet look like?
http://www.cybergeography.org/
7
The World Wide Web
Invented at CERN by Tim Berners-Lee in 1989 as a
tool for collaboration and information sharing in
the particle physics community.
8
The Grid - 1998
Editors: Foster and Kesselman. 700 pages, 22
chapters, 40 authors.
"A computational Grid is a
hardware and software infrastructure that
provides dependable, consistent, pervasive
and inexpensive access to high-end computational
capabilities."
Analogy with the electrical power
grid - just plug in.
9
The Grid - 2003
Editors: Berman, Hey, Fox. 1000 pages, 43
chapters, 116 authors.
Applications, data sharing and virtual
communities.
10

4 types of Grid
  • CPU intensive cycle scavenging (SETI@home)
  • Data sharing
  • Application provision
  • Human-human interaction (e.g. Access Grid)

11
Early distributed computing
1.2 million CPU years so far...
Brute force attempt to crack strong encryption
Protein folding
12
It's not just compute cycles...
An exponential growth in data from many areas of
science.
13

The data explosion - some big numbers
  • CFD turbulence simulations - 100TB
  • BaBar particle physics experiment - 1TB/day
  • CERN LHC will generate 1GB/s or 10PB/year
  • VLBA radio telescope generates 1GB/s today
  • NCBI/EMBL database is only 0.5TB but doubling
    each year
  • brain imaging - 4TB/brain at full colour, 10µm
    resolution (4PB/brain at 1µm, i.e. cellular
    resolution)
  • Pixar - 100TB/movie

FTP and GREP are not adequate (Jim Gray)
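
The rates on this slide can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 PB = 10^6 GB) and, for the imaging figures, an isotropic 3D volume; the function names and the duty-cycle parameter are illustrative assumptions, not part of the original slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds


def petabytes_per_year(rate_gb_per_s: float, duty_cycle: float = 1.0) -> float:
    """Yearly data volume in PB for a sustained rate given in GB/s.

    duty_cycle is the assumed fraction of the year the instrument is
    actually producing data (1.0 means running continuously).
    """
    return rate_gb_per_s * SECONDS_PER_YEAR * duty_cycle / 1e6


def volume_scale(old_resolution: float, new_resolution: float) -> float:
    """Growth factor of a 3D dataset when the voxel edge length shrinks."""
    return (old_resolution / new_resolution) ** 3


if __name__ == "__main__":
    print(petabytes_per_year(1.0))                    # continuous 1 GB/s
    print(petabytes_per_year(1.0, duty_cycle=1 / 3))  # running a third of the time
    print(volume_scale(10, 1))                        # tenfold finer resolution
```

A continuous 1 GB/s stream comes to roughly 31.5 PB/year, so the 10 PB/year figure is consistent with a source that produces data about a third of the time. A tenfold finer resolution in three dimensions multiplies the volume by 1000, which matches the step from 4TB to 4PB per brain.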
14

Application provision
  • Google - 10K CPUs, 2PB database (2 years ago)
  • free email services - HotMail, Yahoo! 2-10PB
    storage
  • netsolve - numerical algorithms on demand,
    with Matlab and Mathematica plugins
  • renderfarm.net - graphics rendering on demand

15
The Access Grid
High end video conferencing and collaboration
technology. O(100) nodes world wide.
"...one of the most compelling glimpses into the
future I've seen since I first saw NCSA Mosaic."
Larry Smarr
16
The Grid in the UK
Pilot projects in particle physics, astronomy,
medicine, bioinformatics, environmental
sciences...
Contributing to international Grid software
development efforts
10 regional eScience Centres
17
Some UK Grid resources
  • Daresbury - loki - 64 proc Alpha cluster
  • Manchester - green - 512 proc SGI Origin 3800
  • Imperial - saturn - large SMP Sun
  • Southampton - iridis - 400 proc Intel Linux
    cluster
  • Rutherford Appleton Lab - hrothgar - 32 proc
    Intel Linux
  • Cambridge - herschel - 32 proc Intel Linux
    cluster
  • ...
  • coming soon: 4x >64 CPU JISC clusters, HPC(X)

18
Applications on the UK Grid
Ion diffusion through radiation damaged crystal
structures (Mark Calleja, Earth Sciences,
Cambridge)
  • Monte Carlo simulation: lots of independent runs
  • small input and output
  • more CPU -> higher temperatures, better stats
  • access to 100 CPUs on the UK Grid
  • Condor-G client tool for farming out jobs
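
The pattern on this slide (many independent runs, tiny inputs and outputs) can be sketched locally. The example below is a toy stand-in, not the ion diffusion code: a 1D random walk plays the part of the simulation, and Python's `concurrent.futures` plays the part that Condor-G plays on the Grid. The function names and the `hop_probability` parameter are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_walk(seed: int, steps: int = 1000, hop_probability: float = 0.5) -> int:
    """One independent run: returns the squared displacement of a 1D walker.

    Each run owns its random generator, so runs share no state and can be
    executed anywhere, in any order.
    """
    rng = random.Random(seed)
    position = 0
    for _ in range(steps):
        if rng.random() < hop_probability:
            position += rng.choice((-1, 1))
    return position * position


def farm_out(seeds, workers: int = 4):
    """Run one walk per seed on a pool of workers and collect the results.

    A thread pool keeps the sketch portable; CPU-bound production runs
    would go to separate processes or to a batch system instead.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_walk, seeds))


if __name__ == "__main__":
    results = farm_out(range(100))
    print("mean squared displacement:", mean(results))
```

The only input per job is a seed and the only output is one number, which is why adding CPUs translates directly into better statistics.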

19
Applications on the UK Grid
Reality Grid (Stephen Pickles, Robin Pinning -
Manchester)
  • Fluid dynamics of complex mixtures, e.g.
    oil, water and solid particles (mud)
  • Used CPU at London, Cambridge
  • Remote visualisation using SGI
    Onyx in Manchester (from a laptop
    in Sheffield)
  • Computational steering

20
Applications on the UK Grid
GENIE - Grid Enabled Integrated Earth system
model (Steven Newhouse, Murtaza Gulamali -
Imperial)
  • Ocean-atmosphere modelling
  • How does moisture transport from the
    atmosphere affect ocean circulation?
  • 1000 independent 4000-year runs
    (3 days real time!) on 200 CPUs
  • Flocked Condor pools at London and Southampton
  • Coupled modelling

21

$1 buys...
  • 1 day of cpu time
  • 4 GB ram for a day
  • 1 GB of network bandwidth
  • 1 GB of disk storage
  • 10 M database accesses
  • 10 TB of disk access (sequential)
  • 10 TB of LAN bandwidth (bulk)
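
The price list can be turned into a rough cost model for a job. The sketch below assumes the unit of currency is one US dollar and that costs scale linearly; the job sizes in the usage lines are hypothetical, chosen only to show the two extremes.

```python
# What one unit of currency buys, taken from the list above.
CPU_DAYS_PER_UNIT = 1.0
WAN_GB_PER_UNIT = 1.0
DISK_GB_PER_UNIT = 1.0


def job_cost(cpu_days: float, wan_gb: float, disk_gb: float = 0.0) -> dict:
    """Split the cost of a job into compute, network and storage."""
    return {
        "compute": cpu_days / CPU_DAYS_PER_UNIT,
        "network": wan_gb / WAN_GB_PER_UNIT,
        "storage": disk_gb / DISK_GB_PER_UNIT,
    }


def network_share(cpu_days: float, wan_gb: float) -> float:
    """Fraction of the compute-plus-network cost spent on moving data."""
    cost = job_cost(cpu_days, wan_gb)
    return cost["network"] / (cost["compute"] + cost["network"])


if __name__ == "__main__":
    # Hypothetical compute-heavy job: 100 CPU-days, 0.1 GB moved.
    print(network_share(cpu_days=100, wan_gb=0.1))
    # Hypothetical data-heavy job: 1 CPU-day, 1000 GB moved.
    print(network_share(cpu_days=1, wan_gb=1000))
```

With these prices, moving 1 GB over the wide-area network costs as much as a full day of CPU time, so a job only pays off remotely if it does a lot of computing per byte shipped.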

22

How do you move a terabyte?
Source: TeraScale SneakerNet, Jim Gray et al.
23

Some consequences
Compute cycles are (almost) free... by comparison
with network costs. The cheapest and fastest
way to move 1TB of data out from CERN is still by
FedEx. This considers only bandwidth:
low latency networks are even more expensive!
(MPI over WAN doesn't work well.)
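
To see why a courier can win, it helps to work out how long 1TB takes over a network link. The sketch below is a back-of-envelope calculation in decimal units; the link speeds and the 24-hour courier time are illustrative assumptions, not figures from the talk.

```python
def transfer_hours(size_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb terabytes over a link of link_mbps Mbit/s.

    efficiency is the assumed fraction of the nominal link speed that the
    transfer actually achieves (1.0 is the ideal case).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    courier_hours = 24  # assumed door-to-door time for a box of disks
    for mbps in (10, 100, 1000):
        hours = transfer_hours(1, mbps)
        verdict = "courier wins" if hours > courier_hours else "network wins"
        print(f"{mbps:>5} Mbit/s: {hours:7.1f} h ({verdict})")
```

Even at an ideal, fully dedicated 100 Mbit/s the transfer takes most of a day, and real wide-area links are shared and achieve less than their nominal rate.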
24

What makes a good Grid application?
A distributed community of users. Tiny network
input and output, huge compute requirement.
Database access and storage are also expensive,
therefore put the computation near the data.
25
Web services
  • A web service is a network-accessible application
    identified by a URI,
    e.g. http://terraservice.net/TerraService.asmx?op=GetTile
  • with an interface defined in terms of XML based
    messages
  • these messages transported by internet
    protocols (usually HTTP)
  • The application and its interface definition
    should be:
  • discoverable by other applications
  • independent of OS platform and programming
    language.
  • W3C standards body: http://www.w3c.org/
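
To make "an interface defined in terms of XML based messages" concrete, the sketch below builds and reads a SOAP 1.1 request using only the Python standard library. It does not contact any service. The service namespace and the parameter names `x` and `y` are hypothetical placeholders and do not reflect the real TerraService schema; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/tiles"                 # hypothetical service namespace


def build_request(operation: str, params: dict) -> str:
    """Wrap an operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_operation(message: str):
    """Return the operation name and parameters found in a SOAP body."""
    root = ET.fromstring(message)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # first child of the body is the operation element
    name = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return name, params


if __name__ == "__main__":
    message = build_request("GetTile", {"x": 1, "y": 2})
    print(message)
    print(read_operation(message))
```

In practice such a message would be sent as the body of an HTTP POST, and a client would generate it from the service's WSDL description rather than by hand.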

26
Acronym soup
  • XML - eXtensible Markup Language
  • XSLT - eXtensible Stylesheet Language Transformations
  • SOAP - Simple Object Access Protocol
  • WSDL - Web Services Description Language
  • UDDI - Universal Description, Discovery and
    Integration protocol
  • BPEL - Business Process Execution Language
  • WSIF - Web Services Invocation Framework
  • ...
27
terraservice.net
Web service interface to
http://terraserver.microsoft.com/
Example app: the US Department
of Agriculture has a database of soil
properties, federated with terraservice.net to
provide geographical and topographic detail.
28

Databases available as Web Services
  • Google
  • Amazon
  • SDSS SkyServer
  • EMBL
  • EBI-MSD
  • EBI Open Bibliographic Query Service
  • ...
  • http://www.escience.cam.ac.uk/services/dblist.html

29

Radar scattering from aircraft
Aim: increase the efficiency of the aircraft
design and engineering process, and the scale of
radar scattering simulations to otherwise intractable
objects (i.e. whole aircraft).
A collaboration
between the University of Cambridge Department of
Applied Mathematics and Theoretical Physics (DAMTP)
and the BAE Advanced Technology Centre at
Filton.
Mark Spivack (PI), Andrew Usher
(visualisation programmer), Xiaobo Yang
(scientific programmer), CC-HPCF, new cluster,
BAE input: expertise, data,...
30
Workflow
[Diagram; labels: BAE, Cambridge, Portal, HPCF,
Reflection data, Visualisation, CAD Design]
31

Visualisation tools
Based on the Visualisation Toolkit - open source
C++ library, cross platform, extendable, large
user base - http://www.vtk.org

Surface currents, virtual fly through, looking
for hot-spots
32

Increasing efficiency
The calculation can be split into a two-stage
process:
  • Initial long-running, high fidelity calculation
    of induced surface currents on the HPCF.
  • 3D electromagnetic fields can be calculated on a
    cluster.
  • Using an approximation technique currently under
    development, subsequent small changes can be
    re-calculated on the cluster.
  • In theory, this would allow interactive design of
    the aircraft without the need for scheduling
    long-running jobs.

33

Tying it all together with Web Services
34

Questions?