Title: What are grid computing and e-Science?
- Mike Mineter
- mjm@nesc.ac.uk
2 Policy for re-use
- This presentation can be re-used for academic purposes.
- However, if you do so, please let us know. We need to gather statistics of re-use: number of events and number of people trained. Thank you!
- If you use a significant part of the course, send mail to training-support@nesc.ac.uk with the subject "NGSI_course".
- If you use just this module, please mail training-support@nesc.ac.uk with the subject "NGSI_intro".
- THANK YOU!
3 Acknowledgements
- This talk was prepared by Mike Mineter of NeSC and includes slides from previous tutorials and talks delivered by:
- Dave Berry, Richard Hopkins (National e-Science Centre)
- the EDG training team
- Ian Foster, Argonne National Laboratory
- Jeffrey Grethe, SDSC
- EGEE colleagues
4 Goals of this module
- To introduce the concepts of e-Science and Grid computing, assuming no previous knowledge
5 Contents
- The Grid vision
- What is a grid?
- Drivers of grid computing
- Some examples
- Current status of grids
- Are grids for you?!
6 The Grid Metaphor
7 The grid vision
- The grid vision is of virtual computing (plus information services to locate computation and storage resources)
- Compare: the web gives virtual documents (plus search engines to locate them)
- MOTIVATION: collaboration through sharing resources (and expertise) to expand the horizons of
- Research
- Commerce: engineering, the knowledge economy
- Public service: health, environment, ...
8 Before Grids
9 The Grid Vision
Slide derived from EDG / LCG tutorials
10 Grid projects
- Many Grid development efforts all over the world
- UK: OGSA-DAI, RealityGrid, GeoDise, Comb-e-Chem, DiscoveryNet, DAME, AstroGrid, GridPP, MyGrid, GOLD, eDiamond, Integrative Biology, ...
- Netherlands: VLAM, PolderGrid
- Germany: UNICORE, Grid proposal
- France: Grid funding approved
- Italy: INFN Grid
- Eire: Grid proposals
- Switzerland: Network/Grid proposal
- Hungary: DemoGrid, Grid proposal
- Norway, Sweden: NorduGrid
- NASA Information Power Grid
- DOE Science Grid
- NSF National Virtual Observatory
- NSF GriPhyN
- DOE Particle Physics Data Grid
- NSF TeraGrid
- DOE ASCI Grid
- DOE Earth Systems Grid
- DARPA CoABS Grid
- NEESGrid
- NIH BIRN
- NSF iVDGL
- DataGrid (CERN, ...)
- EuroGrid (Unicore)
- DataTag (CERN, ...)
- Astrophysical Virtual Observatory
- GRIP (Globus/Unicore)
- GRIA (Industrial applications)
- GridLab (Cactus Toolkit)
- CrossGrid (Infrastructure Components)
- EGSO (Solar Physics)
12 A grid
- The initial vision: The Grid
- The present reality: many grids
- Each grid is an infrastructure enabling one or more virtual organisations to share computing resources
- What's a VO?
- People in different organisations seeking to cooperate and share resources across their organisational boundaries
- Why establish a Grid?
- Share data
- Pool computers
- Collaborate
[Figure: VO, Institute, Desktop]
13 A computer
- The Operating System enables easy use of
- Input devices
- Processor
- Disks
- Display
14 An institute's resources on a LAN
- Middleware runs on each computer
- To allow sharing of disks and printers (using, e.g., Samba)
- To share processors for computation (e.g. Condor)
- The user just perceives shared resources, with no regard to their location in the building
- Authenticated by username / password
- Authorised to use own files, ...
15 Typical current grid
- Grid middleware runs on each shared resource
- Data storage
- (Usually) batch jobs on pools of processors
- Users join VOs
- Each virtual organisation negotiates with sites to agree access to resources
- Distributed services (both people and middleware) enable the grid and allow single sign-on
16 What is a Grid?
- "An infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources" (Ian Foster and Carl Kesselman)
17 Key concepts
- Virtual organisation: people and resources collaborating
- Crosses administrative and organisational boundaries
- Single sign-on
- I connect to one machine; some sort of digital credential is passed on to any other resource I use
- Authentication: how do I identify myself to a resource without a username/password for each resource I use?
- Authorisation: what can I do? Determined by
- My role in a VO (role-based in near future)
- VO negotiations with resource providers
- Grid middleware runs on each resource
- The user just perceives shared resources, with no awareness of location or owning organisation
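The authorisation rule above (my role in a VO, combined with what the VO has negotiated with each resource provider) can be sketched as a simple check. This is a hypothetical illustration, not any middleware's API: the `VO` and `Site` classes, the role names and the identities are invented, and authentication (establishing that the request really comes from `user_dn`) is assumed to have happened already.

```python
from dataclasses import dataclass, field


@dataclass
class VO:
    """Hypothetical virtual organisation: member identity -> roles held in the VO."""
    name: str
    roles: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class Site:
    """Hypothetical resource provider: VO name -> roles it has agreed to accept."""
    name: str
    agreements: dict[str, set[str]] = field(default_factory=dict)


def is_authorised(user_dn: str, vo: VO, site: Site) -> bool:
    """Allow access only if the user holds a VO role that the site accepts.

    Assumes user_dn has already been authenticated (e.g. via a certificate).
    """
    user_roles = vo.roles.get(user_dn, set())
    accepted = site.agreements.get(vo.name, set())
    return bool(user_roles & accepted)  # any role in common grants access


if __name__ == "__main__":
    vo = VO("astro", {"/CN=alice": {"member", "production"}, "/CN=bob": {"member"}})
    site = Site("siteA", {"astro": {"production"}})
    print(is_authorised("/CN=alice", vo, site))  # True
    print(is_authorised("/CN=bob", vo, site))    # False
```

Keeping roles in the VO and accepted roles at the site mirrors the split on the slide: the VO manages its membership, while each provider decides what it will accept.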
18 What is Grid computing not?
- Grid computing is a trendy phrase!
- It's therefore also a misused term!
- Sometimes, in industry, Grids = clusters
- Motivations: better use of resources; scope for commercial services
- Also used to refer to the harvesting of unused compute cycles, e.g.
- SETI@home, Climateprediction.net
20 The first driver: e-Science
- What is e-Science? Collaborative science that is made possible by sharing resources (data, instruments, computation, people's expertise, ...) across the Internet
- Often very compute-intensive
- Often very data-intensive (both creating new data and accessing very large data collections): data deluges from new technologies
- Crosses organisational boundaries
21 Other major drivers for grids
- e-Research, not just e-Science
- Also curation and digital data libraries (DILIGENT, DELOS, GRACE)
- Commerce, not just academia!
- Politics: the knowledge economy
- e-Infrastructure: a shared resource
- That enables science, research, engineering, medicine, industry, ...
- It will improve UK / European / ... productivity
[Figure: e-Infrastructure layers: Collaboration; Grid; Operations, support and training; Network infrastructure linking resource centres]
23 Astronomy
- Number and sizes of data sets as of mid-2002, grouped by wavelength
- 12-waveband coverage of large areas of the sky
- Total about 200 TB of data
- Doubling every 12 months
- Largest catalogues near 1B objects
Data and images courtesy of Alex Szalay, Johns Hopkins University
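Doubling every 12 months compounds quickly. The sketch below is a naive extrapolation, assuming the 12-month doubling on the slide simply continued from the 200 TB total of mid-2002; it is arithmetic, not a measured forecast.

```python
def projected_volume_tb(start_tb: float, years: float, doubling_months: float = 12.0) -> float:
    """Data volume after `years`, assuming it doubles every `doubling_months` months."""
    return start_tb * 2 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    # Starting point from the slide: about 200 TB in mid-2002.
    for years in range(6):
        print(f"mid-{2002 + years}: about {projected_volume_tb(200.0, years):,.0f} TB")
```

Under this assumption, five doublings take the 200 TB total to 6,400 TB (6.4 PB).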
24 Earth Observation
- ESA missions
- 100s of gigabytes of data per day
- Grid contribution to EO
- Enhance the ability to access high-level products
- Allow reprocessing of large historical archives
- Improve complex Earth science applications (data fusion, data mining, modelling)
Derived from L. Fusco, June 2001
Federico Carminati, EU review presentation, 1 March 2002
25 DAME: Grid-based tools and infrastructure for Aero-Engine Diagnosis and Prognosis
- "A significant factor in the success of the Rolls-Royce campaign to power the Boeing 7E7 with the Trent 1000 was the emphasis on the new aftermarket support service for the engines provided via DSS. Boeing personnel were shown DAME as an example of the new ways of gathering and processing the large amounts of data that could be retrieved from an advanced aircraft such as the 7E7, and they were very impressed." (DSS, 2004)
Companies: Rolls-Royce, DSS, Cybula
Universities: York, Leeds, Sheffield, Oxford
[Figure: DAME tools: XTO, Engine Model, Case Based Reasoning, Signal Data Explorer]
26 Large Hadron Collider at CERN
- Data challenge
- 10 petabytes/year of data!
- 20 million CDs each year!
- Simulation, reconstruction, analysis
- LHC data handling requires computing power equivalent to 100,000 of today's fastest PC processors!
- Operational challenges
- Reliable and scalable through a project lifetime of decades
[Figure labels: Mont Blanc (4810 m), Downtown Geneva]
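The two data-volume figures on this slide are related by simple arithmetic. The sketch assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes); the 700 MB disc capacity in the second calculation is an assumption about a typical CD-R, not a figure from the slide.

```python
PB = 10**15  # bytes in a (decimal) petabyte
MB = 10**6   # bytes in a (decimal) megabyte

annual_bytes = 10 * PB       # "10 Petabytes/year of data"
cds_per_year = 20_000_000    # "20 million CDs each year"

# Capacity per CD implied by the slide's two figures.
implied_mb_per_cd = annual_bytes / cds_per_year / MB
print(f"Implied capacity per CD: {implied_mb_per_cd:.0f} MB")  # 500 MB

# Assumed typical CD-R capacity of 700 MB, for comparison.
cds_at_700mb = annual_bytes / (700 * MB)
print(f"CDs per year at 700 MB each: {cds_at_700mb / 1e6:.1f} million")  # 14.3 million
```

Either way, a year of LHC data fills well over ten million discs.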
28 If the Grid vision leads us here, then where are we now?
29 Current status
- Many key concepts identified and known
- Many grid projects have tested these
- Major efforts now on establishing
- Standards (a slow process) (Global Grid Forum, http://www.gridforum.org/)
- Production Grids for multiple VOs
- Production: reliable, sustainable, with commitments to quality of service
- New user communities
- whilst research and development continues
- In Europe, EGEE
- In the UK, NGS
- In the US, TeraGrid
30 1997-present: Globus
- A software toolkit addressing certain technical problems in the development of Grid-enabled tools, services, and applications
- Offers a modular "bag of technologies"
- Made available under a liberal open-source license
- Not turnkey solutions, but building blocks and tools for application developers and system integrators
- Tools built on the Grid Security Infrastructure (GSI) include
- Job submission (GRAM): run a job on a remote computer
- Information services: so I know which computer to use
- File transfer (GridFTP): so large data files can be transferred
- Replica management: so I can have multiple copies of a file close to the computers where I want to run jobs
- Production grids are (currently) based on release 2
- http://www.globus.org/
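Replica management pairs naturally with job placement: given several copies of a file, pick the one "closest" to where the job will run. The sketch below is hypothetical and does not use the Globus replica-management API; the catalogue, the site names, the `lfn:` logical file names and the transfer-cost table are all invented for illustration, with `gsiftp://` URLs standing in for GridFTP locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str  # hypothetical site name
    url: str   # physical copy, e.g. a GridFTP (gsiftp://) URL


# Hypothetical replica catalogue: logical file name -> physical copies.
CATALOGUE: dict[str, list[Replica]] = {
    "lfn:survey/run42.dat": [
        Replica("siteA", "gsiftp://se.sitea.example/data/run42.dat"),
        Replica("siteB", "gsiftp://se.siteb.example/data/run42.dat"),
    ],
}

# Hypothetical transfer cost between sites (lower means "closer").
COST: dict[tuple[str, str], float] = {
    ("siteC", "siteA"): 5.0,
    ("siteC", "siteB"): 2.0,
}


def pick_replica(lfn: str, job_site: str) -> Replica:
    """Return the copy of `lfn` that is cheapest to read from `job_site`."""

    def cost(replica: Replica) -> float:
        if replica.site == job_site:
            return 0.0  # local copy: no wide-area transfer needed
        return COST.get((job_site, replica.site), float("inf"))  # unknown link: avoid

    return min(CATALOGUE[lfn], key=cost)


if __name__ == "__main__":
    print(pick_replica("lfn:survey/run42.dat", "siteC").site)  # siteB
    print(pick_replica("lfn:survey/run42.dat", "siteA").site)  # siteA
```

A real grid would take the catalogue and costs from its information services rather than hard-coded tables; the selection rule is the part the sketch is meant to show.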
31 Computing Resources, Feb 2005
[Map legend: country providing resources; country anticipating joining]
- In LCG-2:
- 113 sites, 30 countries
- >10,000 CPUs
- 5 PB storage
- Includes non-EGEE sites:
- 9 countries
- 18 sites
32 Grid security and trust -1
- Providers of resources (computers, databases, ...) need risks to be controlled: they are asked to trust users they do not know
- Users need single sign-on: log on to a machine that can pass the user's identity to other resources
- Build middleware on a layer providing:
- Authentication: know who wants to use the resource
- Authorisation: know what the user is allowed to do
- Security: reduce vulnerability, e.g. from outside the firewall
- Non-repudiation: knowing who did what
- GSI from the Globus Toolkit does this for NGS
33 Grid security and trust -2
- Currently, achieved by certification
- A user's identity has to be certified by one of the national Certification Authorities (CAs)
- mutually recognized: http://www.gridpma.org/
- In the UK, go to http://www.grid-support.ac.uk/ca/ralist.htm
- Resources (node machines) have to be certified by CAs
- The digital certificate installed on the machine accessed by the user is the basis of AA (authentication and authorisation)
- User joins a VO
- Identity is passed to other resources you use, where it is mapped to a local account; the mapping is maintained by the VO
- Commonly agreed policies establish rights for a Virtual Organization to use resources
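In Globus-based grids, the mapping from a certificate's distinguished name (DN) to a local account was commonly held in a plain-text "grid-mapfile": one quoted DN per line, followed by the local account name. The parser below is a simplified sketch of that format, not the Globus implementation; it assumes comma-separated account names when more than one is listed, and the DNs and accounts are made up.

```python
import shlex


def parse_gridmap(text: str) -> dict[str, list[str]]:
    """Parse grid-mapfile-style lines: a quoted DN, then local account name(s)."""
    mapping: dict[str, list[str]] = {}
    for line in text.splitlines():
        # shlex keeps the quoted DN (which contains spaces) as one field
        # and drops '#' comments.
        fields = shlex.split(line, comments=True)
        if not fields:
            continue  # blank or comment-only line
        dn, rest = fields[0], fields[1:]
        accounts = [a for part in rest for a in part.split(",") if a]
        if not accounts:
            raise ValueError(f"no local account for {dn!r}")
        mapping[dn] = accounts
    return mapping


def local_account(mapping: dict[str, list[str]], dn: str) -> str:
    """Return the (first) local account for an already-authenticated DN."""
    accounts = mapping.get(dn)
    if not accounts:
        raise PermissionError(f"{dn!r} is not mapped to any local account")
    return accounts[0]


EXAMPLE = """
# Hypothetical entries
"/C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=jane doe" jdoe
"/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=a n other" ngs001,ngs002
"""

if __name__ == "__main__":
    gridmap = parse_gridmap(EXAMPLE)
    print(local_account(gridmap, "/C=UK/O=eScience/OU=Edinburgh/L=NeSC/CN=jane doe"))  # jdoe
```

Because the file lives at each site but its contents follow VO membership, adding a user to the VO is what makes the user's identity usable on every participating resource.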
34 The key for new VOs
- Application development environment, portals
- Insulate applications from changing middleware
35 Contents
- Definitions of e-Science and a grid
- Exploring the definitions
- Why now?!
- Some examples
- Current status of grids
- Are grids for you?!
36 Are Grids for you?!
- IF a community effort is vital to achieving your goals, by sharing data and computation services,
- AND that effort crosses organisational boundaries
- THEN yes!
- In the UK, plan to join the NGS!
- OR if you wish to use computation/storage/data services provided on a Grid, then YES!
37 Summary
- Collaboration across multiple organisations
- Single sign-on to resources in multiple organisations
- Need for people services as well as middleware services to enable this, e.g. to run
- Enabling services (e.g. information service)
- Certification authority for AA
- VO management to negotiate with sites
- Helpdesk, ...
- Drives are towards
- Production services
- In the UK, the NGS
- In Europe, EGEE
- Standards (tomorrow)
- e-Infrastructure: integration of networking and middleware to support collaboration
38 Additional slides
39 Exponential Growth
[Figure: performance per dollar spent against number of years (0 to 5), with doubling times (months) for optical fibre, bits per second (Gilder's Law, 32X in 4 yrs); data storage, bits per sq. inch (Storage Law, 16X in 4 yrs); chip capacity, transistors (Moore's Law, 5X in 4 yrs)]
Triumph of Light, Scientific American. Gary Stix, January 2001
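The growth factors in this figure can be turned into doubling times: if a quantity grows by a factor F over y years, then F = 2^(12y/T), so it doubles every T = 12y / log2(F) months. The sketch applies that formula to the three factors quoted, assuming smooth exponential growth across the four-year window.

```python
import math


def doubling_time_months(factor: float, years: float) -> float:
    """Doubling time implied by growth of `factor` over `years` (exponential growth)."""
    # factor = 2 ** (12 * years / T)  =>  T = 12 * years / log2(factor)
    return 12.0 * years / math.log2(factor)


if __name__ == "__main__":
    trends = [
        ("Optical fibre, Gilder's Law (32X in 4 yrs)", 32),
        ("Data storage, Storage Law (16X in 4 yrs)", 16),
        ("Chip capacity, Moore's Law (5X in 4 yrs)", 5),
    ]
    for name, factor in trends:
        print(f"{name}: doubles every {doubling_time_months(factor, 4):.1f} months")
```

32X in 4 years is five doublings (one every 9.6 months), 16X is four (one every 12 months), and 5X is about 2.3 doublings (roughly one every 21 months), which is why network capacity outpaces processor growth in this picture.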
40 How different 2005 is from 1995
- Enormous quantities of data: petabytes
- For an increasing number of communities
- Constraint is not collection but analysis
- Ubiquitous Internet
- >100 million hosts
- Security and trust are crucial issues
- Ultra-high-speed networks: >10 Gb/s
- Global optical networks
- Bottlenecks: last kilometre, firewalls
- Huge quantities of computing: >100 Top/s
- Moore's law gives us all supercomputers
- Organising their effective use is the challenge
- Moore's law everywhere
- Instruments, detectors, sensors, scanners, ...
- Organising their effective use is the challenge