Earth System Modelling

Description:

(Kerstin Kleese, Roy Lowry, Kevin O'Neill, Andrew Woolf & others) ... British Atmospheric Data Centre (BADC, PI: Bryan Lawrence) ... 2 Mbit/s link ...

Slides: 42
Provided by: bryanla

1
Earth System Modelling and the NDG
  • Bryan Lawrence
  • (Kerstin Kleese, Roy Lowry, Kevin O'Neill, Andrew
    Woolf & others)
  • NCAS/British Atmospheric Data Centre
  • Rutherford Appleton Laboratory, CCLRC

2
NDG Partners
  • As funded, a partnership between:
  • British Atmospheric Data Centre (BADC, PI: Bryan
    Lawrence)
  • British Oceanographic Data Centre (BODC, Co-I:
    Roy Lowry)
  • CLRC E-science Centre (Co-I: Kerstin Kleese)
  • PCMDI at LLNL in the US (Dean Williams, Bob Drach,
    Mike Fiorino)
  • The project has caught the imagination; extra funding
    now supports:
  • A number of groups at the NERC Centre for Ecology
    and Hydrology
  • (CEH Ecology DataGrid)
  • NERC Earth Observation Data Centre Plymouth
    Marine Lab Remote Sensing
  • Not directly funded, but major collaborators will
    include:
  • ClimatePrediction.net, GODIVA (NERC e-science
    projects)
  • NCAS/CGAM: The Centre for Global Atmospheric
    Modelling at the University of Reading (via Lois
    Stenman-Clark and Katherine Bouton)
  • Project will support HIGEM

3
Outline
  • Motivation
  • The NDG Goals
  • NDG Metadata
  • Networks
  • Summary

4
British Atmospheric Data Centre
The Role. Key words: Curation and Facilitation!
5
Easily catalogued, but successful preservation?
Phaistos Disk, 1700 BC
One could argue that the writers of these
documents did a brilliant job of preserving the
bits-and-bytes of their time. And yes, they've
both been translated many times; it's a shame
the meanings are different.
6
NERC Metadata Gateway - SST
  • No clean handover from discovery to browse and
    use!
  • Geospatial coordinates forgotten. Time reference
    forgotten. Need to get entire field(s), and find
    correct time!
  • And if I want to compare data from different
    locations?
  • - multiple logins
  • - multiple formats
  • - discovery?

7
How good is our metadata?
  • A priori would any user know to look in the
    COAPEC data set?
  • Earth system-science means we have to remove
    these boundaries!
  • Detailed file-level metadata isn't visible, and
    so data mining applications are impossible.

NB: Dynamic catalogues!
8
Finding Data
The Goal: a very simple interface; hide the complex
software!
9
A newer dataset
The extreme relevance of this example from Amazon
was pointed out by Jon Callahan (LAS project,
PMEL)!
10
PCMDI: Best practice!
Final references are papers!
Is the information coupled to the datasets? What
if I take a dataset home, and another, and
another, and then forget which is which?
(if you know where to look)
Can I ask the question "what datasets used the
Semtner sea ice parameterisation"?
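
The question on this slide only becomes answerable when parameterisation choices are recorded as structured metadata that travels with each dataset. A minimal sketch of that idea follows; the record layout, the field name `sea_ice` and the run names are all hypothetical and are not taken from PCMDI or NDG.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical per-dataset metadata record (not an NDG or PCMDI schema)."""
    name: str
    # Maps a model component to the parameterisation scheme used for it.
    parameterisations: dict = field(default_factory=dict)


def datasets_using(records, component, scheme):
    """Return names of datasets whose `component` used `scheme` (case-insensitive)."""
    wanted = scheme.lower()
    return [
        r.name
        for r in records
        if r.parameterisations.get(component, "").lower() == wanted
    ]


# Invented catalogue entries, for illustration only.
catalogue = [
    DatasetRecord("run_A", {"sea_ice": "Semtner"}),
    DatasetRecord("run_B", {"sea_ice": "some other scheme"}),
    DatasetRecord("run_C"),  # no parameterisation metadata recorded at all
]

print(datasets_using(catalogue, "sea_ice", "semtner"))
```

Note that `run_C` can never be found by such a query: a dataset with no recorded metadata is invisible to it, which is the slide's point.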
11
Huge variety of Data Sets
12
Different types of data returned: Wallingford
Supporting a very diverse user community: NetCDF is
not enough
13
Modelling advances: Baseline Numbers
  • T42 CCSM (current, 280 km)
  • 7.5 GB/yr, 100 years -> 0.75 TB
  • T85 CCSM (140 km)
  • 29 GB/yr, 100 years -> 2.9 TB
  • T170 CCSM (70 km)
  • 110 GB/yr, 100 years -> 11 TB
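
The baseline volumes follow directly from the per-year output rates. The short sketch below reproduces the arithmetic, assuming decimal units (1 TB = 1000 GB); the function and dictionary names are illustrative only.

```python
# Output rates in GB per simulated year, as quoted on the slide.
BASELINES_GB_PER_YEAR = {"T42": 7.5, "T85": 29.0, "T170": 110.0}


def run_volume_tb(gb_per_year, years=100):
    """Total output of one run in TB, assuming 1 TB = 1000 GB."""
    return gb_per_year * years / 1000.0


for resolution, rate in BASELINES_GB_PER_YEAR.items():
    print(f"{resolution}: {run_volume_tb(rate):.2f} TB per 100-year run")
```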

14
Capacity-related Improvements
Increased turnaround, model development, ensemble
of runs: increase by a factor of 10, linear
data
  • Current T42 CCSM
  • 7.5 GB/yr, 100 years -> 0.75 TB x 10 = 7.5 TB

15
Capability-related Improvements
Spatial Resolution: T42 -> T85 -> T170. Increase
by factor of 10-20, linear data.
Temporal Resolution: study diurnal cycle, 3 hour
data. Increase by factor of 4, linear data.
CCM3 at T170 (70 km)
16
Capability-related Improvements
Quality: improved boundary layer, clouds,
convection, ocean physics, land model, river
runoff, sea ice. Increase by another factor of
2-3, data flat.
Scope: atmospheric chemistry
(sulfates, ozone), biogeochemistry (carbon
cycle, ecosystem dynamics), middle atmosphere
model. Increase by another factor of 10, linear
data.
17
Model Improvement Wishlist
Grand Total: increase compute by a factor of
O(1000-10000)
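
One way to read the grand total is as the product of the individual factors from the preceding slides. The sketch below multiplies them out; treating the factors as independent and multiplicative is an assumption made here, not something the slides state, and the labels are paraphrased.

```python
from math import prod

# (low, high) compute factors as read from the wishlist slides.
FACTORS = {
    "capacity (turnaround, ensembles)": (10, 10),
    "spatial resolution": (10, 20),
    "temporal resolution": (4, 4),
    "quality (improved physics)": (2, 3),
    "scope (chemistry, biogeochemistry)": (10, 10),
}

low = prod(lo for lo, _ in FACTORS.values())
high = prod(hi for _, hi in FACTORS.values())
print(f"combined compute factor: {low} to {high}")
```

On this reading the product comes out at roughly 10^4, the same order of magnitude as the upper end of the range quoted on the slide.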
18
Climate in 2010: A Graphic Illustration
Figures from Gary Strand, NCAR, ESG website
19
Summary thus far
  • Contentions
  • The average atmospheric science project
    spends about 1/3 of its time on data handling
    (getting, reformatting, etc.)!
  • The problem for earth system model projects is
    about to get worse for everyone, from the
    initiator, to the archiver, to the analyst, to
    the contributor, to the improver.
  • (Remember the documentation problem is growing
    exponentially too: new sub-components etc.)

20
The NERC DataGrid
21
(No Transcript)
22
The Data Use Chain
23
Requirements Information (1)
"Scientists are real people too": Jon Callahan
(from the LAS project at PMEL)
  • Amazon Discovery gives good examples
  • Browse
  • Similar datasets
  • Details
  • Content examples
  • Our domain issues require:
  • Dealing with Volume
  • Formats
  • Providing Tools

Learn from the library and book handling
community!
All require documentation (aka metadata). We
need to improve our information handling.
24
NDG Metadata Taxonomy
25
What is metadata?
The answer depends on who you are!
26
NDG A and B metadata in practice
  • Clear separation of function between use and
    discovery.
  • Standards Compliant
  • Avoid tie-in to details of particular fields or
    data formats or even components
  • Metadata model (B)
  • Intermediate schema, supports multiple
    discovery formats
  • NDG Data Model (A).
  • provides an abstract semantic model for the
    structure of data within NDG,
  • enables the specification of concrete instances
    for use by NDG Data Services
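
The idea of an intermediate schema feeding multiple discovery formats can be sketched as one internal record with a separate export function per target format. The internal field names below are hypothetical and are not the NDG schema. Dublin Core and DIF are used purely as illustrative targets, since the slides do not name the discovery formats, and only a handful of their elements are shown.

```python
# Hypothetical intermediate record; not the actual NDG (B) schema.
record = {
    "id": "example-dataset-001",
    "title": "Example sea surface temperature dataset",
    "abstract": "Illustrative abstract text.",
    "data_centre": "Example Data Centre",
}


def to_dublin_core(rec):
    """Export a few Dublin Core elements from the intermediate record."""
    return {
        "identifier": rec["id"],
        "title": rec["title"],
        "description": rec["abstract"],
        "publisher": rec["data_centre"],
    }


def to_dif(rec):
    """Export a few DIF fields from the same intermediate record."""
    return {
        "Entry_ID": rec["id"],
        "Entry_Title": rec["title"],
        "Summary": rec["abstract"],
    }


print(to_dublin_core(record))
print(to_dif(record))
```

Supporting a further discovery format then means adding one more export function, without touching the stored records.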

27
(B) Metadata Model

28
(B) Metadata Model: an NDG Intermediate Schema,
Conceptual Overview
29
NDG Discovery Service Element
Traditional and Grid Service (GT3) Interfaces
30
NDG Semantic Data Model (A)
31
NDG Prototype
Layout not important (yet!). It's what's under the
hood that counts (the data is NOT in NetCDF;
the original data is available; the search
covered data that could have been harvested;
the architecture works!)
32
NDG Metadata Status
  • We have built a SIMPLE prototype based primarily
    on our data model and used our structures to
    find, locate, reformat and deliver data typical
    of BODC and BADC observational data. (This is a
    first)
  • We are about to re-engineer.
  • Key issues to address will be
  • Vocabularies, and
  • Ontologies
  • Developing a Model Attribute Language (with CGAM,
    PRISM, PCMDI and others).
  • Populating our metadata: a boring and laborious
    job!

33
Metadata Origins
Consider a hierarchy of data users beginning with
an individual scientist,
who may herself be part of a research group,
itself part of a community
sharing resources, lying in the wider internet.
To be well integrated the metadata should have a
role at each level!
(The data portal client and server interface may
be different at each level).
At each level extra metadata will be required,
probably produced by dedicated staff at the
research group, or data centre.
34
Requirements (2)
  • We need to think about our networks and our tools
    for moving and keeping track of data!
  • We can't rely on the "leave it at the
    supercomputer site" approach
  • How do we do joint analysis?
  • How do we process the data at all?
  • Malcolm Atkinson quoting Jim Gray pointed out
    that it takes
  • O(1 minute) to grep or ftp a GB
  • O(2 days) to grep or ftp a TB
  • O(3 years) to grep or ftp a PB
  • Requires
  • sophisticated "fire and forget" file transfer
    (that has to outperform sneakernet).
  • Disk and compute resources for processing.
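
The scaling quoted above can be checked with a back-of-the-envelope calculation. The sketch below assumes a constant sustained rate of 1 GB per minute and decimal units; both are assumptions made here for illustration, not figures from the quoted source.

```python
GB, TB, PB = 10**9, 10**12, 10**15

# Assumed sustained throughput: one GB per minute, in bytes per second.
RATE_BYTES_PER_S = GB / 60.0


def scan_days(size_bytes, rate=RATE_BYTES_PER_S):
    """Days needed to read or transfer `size_bytes` at a constant rate."""
    return size_bytes / rate / 86400.0


for label, size in [("GB", GB), ("TB", TB), ("PB", PB)]:
    days = scan_days(size)
    print(f"1 {label}: {days:.4f} days ({days / 365.0:.2f} years)")
```

At this fixed rate a TB takes about 0.7 days and a PB about 1.9 years, the same orders of magnitude as the quoted figures; sustained real-world rates are typically lower than the single-GB case, which would push the numbers towards those on the slide.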

35
SuperJanet4
  • We need to address
  • local firewall issues (not just at the Met
    Office)
  • spur bandwidths. The limits are not in the
    backbones!
  • 2 Mbit/s link
  • 80 minutes to transfer 500 MB, cf. 40 minutes with
    GridFTP, or less than 1 minute between DL and RAL
    (1 Gbit/s)
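
For comparison, the ideal transfer time on each link can be estimated from the link rate alone. This sketch assumes 1 MB = 10^6 bytes and ignores protocol overhead and contention, so it gives a lower bound rather than a prediction; the function name is illustrative.

```python
def ideal_transfer_minutes(size_mb, link_mbit_per_s):
    """Lower-bound transfer time in minutes, ignoring all overheads."""
    size_mbit = size_mb * 8  # 8 bits per byte
    return size_mbit / link_mbit_per_s / 60.0


slow = ideal_transfer_minutes(500, 2)      # the 2 Mbit/s spur
fast = ideal_transfer_minutes(500, 1000)   # the 1 Gbit/s DL-RAL link

print(f"2 Mbit/s: {slow:.1f} min ideal")
print(f"1 Gbit/s: {fast * 60:.0f} s ideal")
print(f"link utilisation at 80 min: {slow / 80:.0%}, at 40 min: {slow / 40:.0%}")
```

Under these assumptions the 2 Mbit/s link cannot do better than about 33 minutes, so the 40-minute GridFTP figure is already close to the limit of the spur itself.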

36
ESG1 Results (Supercomputing, 2001)
Dallas to Chicago
Allcock et al. 2001
37
Starting with the LAS
Deployment for UK users within a few weeks
(constraint is primarily access control)
38
LAS Simple Box fill Output
Work for us to do: labelling is inadequate as yet...
39
Cache management in LAS/CDAT
Cache also checks if enough room, deletes oldest
files if necessary and checks against disk space
limit.
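
The cache policy described here (check for room, delete the oldest files if necessary, respect a disk space limit) can be sketched as follows. This is not the LAS/CDAT implementation; the function name, its arguments and the use of file modification time as the measure of age are assumptions made for illustration.

```python
import os


def make_room(cache_dir, incoming_bytes, limit_bytes):
    """Delete oldest files in `cache_dir` until a new file of
    `incoming_bytes` fits under `limit_bytes`. Returns the removed paths."""
    if incoming_bytes > limit_bytes:
        raise ValueError("file is larger than the whole cache limit")

    # Regular files only, sorted oldest first by modification time.
    paths = [os.path.join(cache_dir, name) for name in os.listdir(cache_dir)]
    files = sorted((p for p in paths if os.path.isfile(p)), key=os.path.getmtime)

    used = sum(os.path.getsize(p) for p in files)
    removed = []
    for path in files:
        if used + incoming_bytes <= limit_bytes:
            break  # enough room already
        used -= os.path.getsize(path)
        os.remove(path)
        removed.append(path)
    return removed
```

A caller would invoke `make_room` just before writing a new file into the cache directory, then write the file once the call returns.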
40
Summary
  • Earth System Modelling extends the data handling
    challenge.
  • We need better information management
  • We need better tools for moving things around
  • We need better tools for using remote data
  • and we need data manipulation hardware!
  • The NDG is attempting (with help) to address
  • Information management
  • Data movement
  • Tools to manipulate large volumes of data.

41
You've gone TOO FAR!