Title: IBM UK
1 IBM TCG Symposium
Prof. Malcolm Atkinson, Director
www.nesc.ac.uk
21st May 2003
2 Outline
- What is e-Science?
- UK e-Science
  - UK e-Science Roles and Resources
  - UK e-Science Projects
- NeSC & e-Science Institute
  - Events
  - Visitors
  - International Programme
- Scientific Data Curation
- Data Access & Integration
- Data Analysis & Interpretation
- e-Science driving Disruptive Technology
  - Economic impact, Mobile Code, Decomposition
  - Global infrastructure, optimisation & management
  - "Don't care where" computing
3 What is e-Science?
4 Foundation for e-Science
- e-Science methodologies will rapidly transform science, engineering, medicine and business
- Driven by exponential growth (×1000/decade)
- Enabling a whole-system approach
[Figure label: sensor nets]
5 Convergence & Ubiquity
Multi-national, Multi-discipline, Computer-enabled Consortia, Cultures & Societies
New Opportunities, New Results, New Rewards
6 UCSF / UIUC
From Klaus Schulten, Center for Biomolecular Modeling and Bioinformatics, Urbana-Champaign
7 Global in-flight engine diagnostics
100,000 engines × 2-5 GBytes/flight × 5 flights/day ≈ 2.5 PB/day (checked in the sketch below)
Distributed Aircraft Maintenance Environment (DAME)
Universities of Leeds, Oxford, Sheffield & York
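As a quick check of the slide's arithmetic, here is a minimal Python sketch using the upper bound of the quoted 2-5 GBytes per flight:

```python
# Back-of-envelope check of the in-flight diagnostics data volume quoted on
# the slide, taking the upper bound of the 2-5 GBytes/flight range.
engines = 100_000          # engines in service
gbytes_per_flight = 5      # upper bound of 2-5 GBytes per flight
flights_per_day = 5

gbytes_per_day = engines * gbytes_per_flight * flights_per_day
pbytes_per_day = gbytes_per_day / 1_000_000   # 1 PB = 10^6 GB (decimal units)

print(f"{pbytes_per_day:.1f} PB/day")          # -> 2.5 PB/day, as on the slide
```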
8 Tera → Peta Bytes

                        1 TByte                   1 PByte
RAM time to move        15 minutes                2 months
1 Gb WAN move time      10 hours (1000)           14 months (1 million)
Disk cost               7 disks, £3,500 (SCSI)    6,800 disks, 490 units, 32 racks, £4.7 million
Disk power              100 Watts                 100 Kilowatts
Disk weight             5.6 Kg                    33 Tonnes
Disk footprint          Inside machine            60 m²

Now make it secure & reliable!
May 2003: approximately correct (a rough WAN-transfer check follows below)
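As a sanity check of the WAN move-time rows, here is a minimal Python sketch. The ~25% effective throughput is my assumption, not the slide's; it is roughly what makes the quoted figures plausible:

```python
# Rough check of the "1 Gb WAN move time" rows, assuming (my assumption, not
# the slide's) an effective throughput of ~25% of the nominal 1 Gb/s link.
def move_time_seconds(bytes_to_move: float,
                      link_gbps: float = 1.0,
                      efficiency: float = 0.25) -> float:
    """Seconds to move `bytes_to_move` over a `link_gbps` link at `efficiency`."""
    bits = bytes_to_move * 8
    return bits / (link_gbps * 1e9 * efficiency)

TB = 1e12   # 1 TByte in bytes (decimal)
PB = 1e15   # 1 PByte in bytes (decimal)

print(f"1 TB: {move_time_seconds(TB) / 3600:.1f} hours")               # ~8.9 hours
print(f"1 PB: {move_time_seconds(PB) / (3600 * 24 * 30):.1f} months")  # ~12 months
# Both are of the same order as the slide's 10 hours and 14 months.
```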
9 e-Science in the UK
10 UK 2000 Spending Review
From presentation by Tony Hey
11 Additional UK e-Science Funding
- First Phase 2001-2004
  - Application Projects: £74M
    - All areas of science and engineering
    - > 60 projects
    - 340 at first All Hands Meeting
  - Core Programme: £35M
    - Collaborative industrial projects
    - 80 companies, > £30 million
- Second Phase 2003-2006
  - Application Projects: £96M
    - All areas of science and engineering
  - Core Programme: £16M + £25M (?)
    - Core Grid Middleware
Plus EU money! £40M Janet upgrade; HPC(x) £55M
12 e-Science and SR2002
                2004-06       (2001-04)
- MRC           £13.1M        (£8M)
- BBSRC         £10.0M        (£8M)
- NERC          £8.0M         (£7M)
- EPSRC         £18.0M        (£17M)
- HPC           £2.5M         (£9M)
- Core Prog.    £16.2M + ?    (£15M + £20M)
- PPARC         £31.6M        (£26M)
- ESRC          £10.6M        (£3M)
- CLRC          £5.0M         (£5M)
13 National e-Science Centre
14 NeSC in the UK
[Map of UK e-Science centres, marked "You are here": Edinburgh, Glasgow, Newcastle, Belfast, Manchester, Daresbury Lab, Cambridge, Oxford, Hinxton, RAL, Cardiff, London, Southampton]
15 www.nesc.ac.uk
16 UK Grid: Operational & Heterogeneous
- Currently a Level-2 Grid based on Globus Toolkit 2
- Transition to OGSI/OGSA will prove worthwhile
- There are still issues to be resolved
  - OGSA definition / delivery
  - Hosting environments & platforms
  - Combinations of services supported
  - Material and grids to support adopters
- A schedule of transitions should be (approximately, provisionally) published
- Expected time line
  - Now: GT2 L2 service; GT3 middleware development & evaluation
  - Q3-Q4 2003: GT2 L3; GT3 L1
  - Q1-Q2 2004: significant project transitions to GT3 L2/L3
  - Late Q4 2004: most projects have transitioned; end of GT2 L3
17 e-Science Institute
18 e-Science Institute: Past Programme of Events
- Planned: 6 two-week research workshops / year
- Actually ran 48 events in the first 12 months!
- Highlights
  - GGF5, HPDC11 and a cluster of workshops
  - Protein Science, Neuroinformatics, …
- Major training events
  - Steve Tuecke: Grid & Globus (×2)
  - Web Services, DiscoveryLink, Relational DB design, …
- e-SI Clientele and Outreach (year 1)
  - > 2,600 individuals
  - From > 500 organisations
  - 236 speakers
  - Many participants return frequently
19 Data Access & Integration
20 Biology & Medicine
- Extensive Research Community
  - > 1,000 per research university
- Extensive Applications
  - Many people care about them
  - Health, Food, Environment
- Interacts with virtually every discipline
  - Physics, Chemistry, Nanoengineering, …
- 450 Databases relevant to bioinformatics
  - Heterogeneity, Interdependence, Complexity, Change, …
- Wonderful Scientific Questions
  - How does a cell work?
  - How does a brain work?
  - How does an organism develop?
  - Why is the biosphere so stable?
  - What happens to the biosphere when the earth warms up?
1 petabyte of digital data / hospital / year
21 Database Growth
[Chart: PDB content growth]
22 ODD-Genes
[Diagram: ODD-Genes Problem Solving Environment (PSE)]
23 Scientific Data
- Challenges
  - Data Huggers
  - Meagre metadata
  - Ease of Use
  - Optimised integration
  - Dependability
- Opportunities
  - Global Production of Published Data
  - Volume? Diversity?
  - Combination → Analysis → Discovery
- Opportunities
  - Specialised Indexing
  - New Data Organisation
  - New Algorithms
  - Varied Replication
  - Shared Annotation
  - Intensive Data Computation
- Challenges
  - Fundamental Principles
  - Approximate Matching
  - Multi-scale optimisation
  - Autonomous Change
  - Legacy structures
  - Scale and Longevity
  - Privacy and Mobility
24 Infrastructure Architecture
Virtual Integration Architecture (layers, top to bottom; a sketch follows below):
  Data Intensive X Scientists
  Data Intensive Applications for Science X
  Simulation, Analysis & Integration Technology for Science X
  Generic Virtual Data Access and Integration Layer
  OGSA
  OGSI: Interface to Grid Infrastructure
  Distributed Compute, Data & Storage Resources
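To make the layering concrete, here is a minimal Python sketch of the stack expressed as interfaces. All names here (GridInfrastructure, DataAccessIntegrationLayer, ScienceXApplication, invoke, query) are hypothetical illustrations, not the OGSA/OGSI or OGSA-DAI APIs:

```python
# Minimal, hypothetical sketch of the layering on this slide.
from abc import ABC, abstractmethod

class GridInfrastructure(ABC):
    """Bottom layers: an OGSI-style interface over distributed compute,
    data and storage resources."""
    @abstractmethod
    def invoke(self, service: str, request: str) -> str: ...

class InMemoryGrid(GridInfrastructure):
    """Stand-in grid so the sketch runs end to end."""
    def invoke(self, service: str, request: str) -> str:
        return f"result of {request!r} from {service}"

class DataAccessIntegrationLayer:
    """Generic virtual data access & integration layer built over the grid."""
    def __init__(self, grid: GridInfrastructure):
        self.grid = grid
    def query(self, source: str, expression: str) -> str:
        # Delegates to whichever grid service hosts the named data source;
        # the caller never learns where or how the query executes.
        return self.grid.invoke(source, expression)

class ScienceXApplication:
    """Top layers: discipline-specific analysis for 'Science X'."""
    def __init__(self, dai: DataAccessIntegrationLayer):
        self.dai = dai
    def analyse(self) -> str:
        return self.dai.query("Sx", "SELECT count(*) FROM observations")

if __name__ == "__main__":
    app = ScienceXApplication(DataAccessIntegrationLayer(InMemoryGrid()))
    print(app.analyse())
```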
25 Draft Specification for GGF 7
26 Disruptive e-Science Drivers?
27 Mohammed & Mountains
- Petabytes of data cannot be moved
  - They stay where they are produced or curated
  - Hospitals, observatories, European Bioinformatics Institute, …
- Distributed collaborating communities
  - Expertise in curation, simulation & analysis
  - Distributed, diverse data collections
- Discovery depends on insights
  - Tested by combining data from many sources
  - Using sophisticated models & algorithms
- What can you do?
28 Move computation to the data
- Assumption: code size << data size
- Develop the database philosophy for this?
  - Queries are dynamically re-organised & bound
- Develop the storage architecture for this?
  - Compute on disk? (SoC: space on disk chips??)
- Safe hosting of arbitrary computation
  - Proof-carrying code for data- and compute-intensive tasks & robust hosting environments
- Provision combined storage & compute resources
- Decomposition of applications
  - To ship behaviour-bounded sub-computations to the data
- Co-scheduling & co-optimisation
  - Data & Code (movement), Code execution
  - Recovery and compensation
(a toy illustration of shipping a query to the data follows below)
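To illustrate the principle only (this is not any specific Grid middleware), here is a toy Python sketch in which a small query is shipped to the data store and only a tiny aggregate travels back; sqlite3 stands in for a remote curated collection and the function names are invented:

```python
# Toy illustration of "move computation to the data": the client ships a small
# query to the store and only a small result comes back, instead of pulling
# the raw rows across the WAN.
import sqlite3

def make_remote_store() -> sqlite3.Connection:
    """Stand-in for a large data collection held where it was produced."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    db.executemany("INSERT INTO readings VALUES (?, ?)",
                   [("s1", 1.0), ("s1", 3.0), ("s2", 2.5)])
    return db

def ship_query_to_data(store: sqlite3.Connection, sql: str) -> list[tuple]:
    """Run the (few bytes of) query at the data's location; only results move."""
    return store.execute(sql).fetchall()

if __name__ == "__main__":
    store = make_remote_store()
    print(ship_query_to_data(
        store, "SELECT sensor, AVG(value) FROM readings GROUP BY sensor"))
    # -> [('s1', 2.0), ('s2', 2.5)] : a handful of bytes, not the whole table
```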
29 Software Changes
- Integrated Problem Solving Environments
- Users & application developers see
  - An abstract computer and storage system
  - Where and how things are executed can be ignored
  - Diversity, detail, ownership, dependability, cost are explicit and visible
- Increasing sophistication of description
  - Metadata for discovery
  - Metadata for management and optimisation
- Applications developed dynamically by composition
- Mobile, Safe & Re-organisable Code
  - Predictable behaviour
  - Decomposition & re-composition
  - New programming languages & understanding needed
30 Organisational & Cultural Changes
- Access to Computation & Data must be simple
  - All use a computational, semantic, data-rich web
- Responsibility of data publishers
  - Cost, dependability, trustworthiness, capability, flexibility, …
- Shared contributions compose indefinitely
  - Knowledge accumulation and interdependence
  - Contributor recognition and IPR
- Complexity and management of infrastructure
  - Always on
  - Must be sustained
  - Paid for
  - Hidden
Health, Energy, Finance, Government, Education & Games @ Home
31 Comments & Questions, Please
www.ogsadai.org.uk
www.nesc.ac.uk
32 Extra slides?
33 DAI basic Services
34 DAIT basic Services
[Diagram: DAI basic service interactions among an Analyst, a Client, a Registry, a Factory, and Grid Data (Transport) Services (GDS / GDTS) sitting over an XML database and a relational database that hold the data sources Sx and Sy]
1a. The Client sends a request to the Registry for sources of data about x & y.
1b. The Registry responds with a Factory handle.
2a. The Client requests the Factory for access and integration from resources Sx and Sy.
2b. The Factory creates a network of GridDataServices.
2c. The Factory returns the handle of the GDS to the Client.
3a. The Client submits a sequence of scripts, each with a set of queries (XPath, SQL, etc.), to the GDS.
3b. The Client tells the Analyst.
3c. Sequences of result sets are returned to the Analyst as formatted binary, described in a standard XML notation.
(an illustrative walk-through follows below)
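Here is a minimal, purely illustrative Python walk-through of the sequence above; the classes and methods (Registry.find_factory, Factory.create_gds, GridDataService.perform) are hypothetical stand-ins, not the real OGSA-DAI client toolkit API:

```python
# Purely illustrative walk-through of the Registry -> Factory -> GDS sequence
# above.  These classes are hypothetical stand-ins, NOT the real OGSA-DAI API.
class GridDataService:
    def __init__(self, sources: list[str]):
        self.sources = sources
    def perform(self, scripts: list[str]) -> list[str]:
        """3a/3c: run each script (XPath, SQL, ...) and return its result set."""
        return [f"results of {s!r} over {self.sources}" for s in scripts]

class Factory:
    def __init__(self, sources: list[str]):
        self.sources = sources
    def create_gds(self) -> GridDataService:
        """2b/2c: create the GridDataServices network, return a GDS handle."""
        return GridDataService(self.sources)

class Registry:
    def find_factory(self, topic: str) -> Factory:
        """1a/1b: locate sources of data about `topic`, return a Factory handle."""
        return Factory(sources=["Sx", "Sy"])

if __name__ == "__main__":
    registry = Registry()
    factory = registry.find_factory("x & y")                  # steps 1a, 1b
    gds = factory.create_gds()                                # steps 2a-2c
    for result in gds.perform(["SELECT * FROM Sx", "//y"]):   # steps 3a, 3c
        print(result)                      # 3c: results back to the analyst
```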