Title: GFDL Data Portal


1
GFDL Data Portal
  • Current Status, Achievements and Future
    Development

K. Dixon, V. Balaji, S. Nikonov
GFDL, Princeton
NOAATECH-2006
2
History
  • The Data Portal was launched in 1995 as a simple
    FTP server.
  • The idea and the term "Data Portal" arose 3 years
    ago.
  • Originally it served data in response to
    occasional requests.
  • Now the main assets are IPCC data.

3
Common technical characteristics
  • Software
    • Red Hat Linux
    • Apache Web Server
    • DODS Aggregation Server
    • THREDDS
    • LAS Server
    • GrADS-DODS

4
Hardware
  • Dell PowerEdge 2650 machine
    • dual-processor Intel Xeon, 2.4 GHz
    • 3 GB RAM
  • 7 Dell PowerVault 220S enclosures with
    14 HDs each, 19 TB total (expansion to
    35 TB pending)
  • Network bandwidth
    • Internet: 9 Mbit/s
    • Internet2: 100 Mbit/s

5
Web Site Structure
6
Basic Metadata
  • Model description
  • Experiment description
  • Institution
  • Extra metadata for handling tripolar grids
    (including Ferret scripts for their
    visualization)
  • Metadata is compliant with the CF metadata
    conventions
  • Metadata accompanies each data file

7
Basic features of the GFDL LAS server
  • Dynamic data presentation chosen by the user
  • Spatial/time subsampling with metadata included
  • Defining new variables on the fly, calculated
    from a given formula
  • Ferret visualization
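
The subsampling and on-the-fly variable features can be pictured with a small Python sketch. This is not how LAS or Ferret implement them; the nested-list layout `[time][lat][lon]` and the function names `subsample` and `derive` are assumptions made only for illustration.

```python
def subsample(field, times, lats, lons):
    """Cut a [time][lat][lon] nested list down using three slice objects."""
    return [[row[lons] for row in grid[lats]] for grid in field[times]]


def derive(formula, *fields):
    """Apply formula element-wise across equally shaped [time][lat][lon] fields."""
    return [
        [[formula(*cells) for cells in zip(*rows)] for rows in zip(*grids)]
        for grids in zip(*fields)
    ]


# Hypothetical toy data: 2 time steps, 2 latitudes, 3 longitudes.
a = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

# First time step only, all latitudes, every second longitude.
region = subsample(a, slice(0, 1), slice(None), slice(0, 3, 2))

# A new variable defined by a user-supplied formula of two inputs.
total = derive(lambda x, y: x + y, a, a)
```

In a real server the same request would also carry the coordinate metadata for the selected region along with the values.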

8
General Statistics, 01-Oct-2004 to 01-Oct-2005
  • Total amount of CM2 climate model data: 12 TB
  • More than 10,000 NetCDF files, average file size
    1 GB
  • Successful requests: 62,000
  • Average successful requests per day: 200
  • Distinct files requested: 5,000
  • Distinct hosts served: 850
  • Data transferred: 15 TB
  • Average data transferred per day: 42 GB
  • Number of submitted journal articles that include
    analyses of GFDL CM2 model output: > 100

9
Current standard procedure of publishing data
  • Climate Model Output Rewriter (CMOR) processing
    • configured manually for different models,
      experiments and variables
    • triggered manually
  • Quality control
    • performed by a scientist; includes checking
      metadata, time ranges, value ranges, etc.
  • Splitting the CMORized, QC-ed data into small
    (< 2 GB) NetCDF files and pushing them out through
    the firewall to the Data Portal
    • the scripts that do this are configured manually
    • the scripts are started manually
  • Preparing a checksum report on the Data Portal
    • run by a cron-started script
  • Configuring the Aggregation Server and LAS
    • done manually
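
The checksum-report step could look roughly like the sketch below. The slides do not say which hash algorithm, file layout or report format GFDL uses, so SHA-256, the `*.nc` pattern and the function names `file_digest` and `checksum_report` are all assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_report(data_dir, pattern="*.nc"):
    """Return one 'digest  relative/path' line per matching file, sorted by path."""
    root = Path(data_dir)
    return [
        f"{file_digest(path)}  {path.relative_to(root).as_posix()}"
        for path in sorted(root.rglob(pattern))
    ]
```

A cron job could call `checksum_report` on the portal's data directory and write the lines to a file, so that the sending side can compare digests after each transfer.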

10
Current Data Portal workflow
11
Desirable Features of Data Portal
  • Relational database storing metadata that
    describes
    • model components and model configuration
    • scenarios
    • postprocessing (model output and CMOR)
    • experiments
    • variables
    • formalized quality control rules
    • data locations in the archive
    • task scheduler
    • user and group accounts
  • XML as the data exchange format
    • for compliance with the FMS Runtime Environment
      (FRE)
    • the working format of existing third-party
      software
    • well suited to hierarchical metadata description
    • widely used, so easy to exchange with other
      data portals
  • Publisher Control Center (PCC)
    • controls the CMOR subsystem
    • controls the Data Publisher Manager
    • controls data quality (QAC)
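
To illustrate why XML suits hierarchical metadata, here is a minimal round-trip sketch using Python's standard library. The element and attribute names (`experiment`, `model`, `scenario`, `variables`, `variable`) and the sample values other than CM2.1 are hypothetical; this is not the FRE XML schema.

```python
import xml.etree.ElementTree as ET


def experiment_to_xml(record):
    """Serialize a nested experiment record to an XML string."""
    exp = ET.Element("experiment", name=record["name"])
    ET.SubElement(exp, "model").text = record["model"]
    ET.SubElement(exp, "scenario").text = record["scenario"]
    variables = ET.SubElement(exp, "variables")
    for var in record["variables"]:
        ET.SubElement(variables, "variable", name=var["name"], units=var["units"])
    return ET.tostring(exp, encoding="unicode")


def xml_to_experiment(text):
    """Parse the XML produced by experiment_to_xml back into a record."""
    exp = ET.fromstring(text)
    return {
        "name": exp.get("name"),
        "model": exp.findtext("model"),
        "scenario": exp.findtext("scenario"),
        "variables": [
            {"name": v.get("name"), "units": v.get("units")}
            for v in exp.find("variables")
        ],
    }


# Hypothetical record; only the model name CM2.1 appears in the slides.
record = {
    "name": "demo_run",
    "model": "CM2.1",
    "scenario": "demo_scenario",
    "variables": [{"name": "tas", "units": "K"}],
}
xml_text = experiment_to_xml(record)
```

The nesting of variables inside an experiment maps directly onto child elements, which is the property the slide points to.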

12
Desirable Features of Data Portal (continued)
  • Climate Model Output Rewriter (CMOR) subsystem
    • prepares data consistently with specific
      project requirements
  • Data Publisher Manager
    • transfers data to the target destination
      according to settings from the DB
  • Front-end Data Portal software package
    • Configuration Manager (configures the
      Aggregation Server and the Data Portal Interface)
    • Search Catalog Engine
    • Data Subsampling Engine
    • Data Computation Engine
    • Data Visualization
    • Data Delivery Manager

13
Proposed functional schema of the GFDL Data
Factory
14
Standard operating scenario of the Model Data
Factory (ideal picture)
  • A scientist builds a model in the existing GFDL
    FMS Runtime Environment (FRE) using available
    model components, datasets and a forcing scenario.
  • FRE puts metadata about the built model, scenario
    and experiment into the Curator DB and runs the
    experiment.
  • The postprocessing subsystem extracts the
    postprocessing plan from the Curator DB,
    executes it, and on completion puts metadata
    about the processed experiment back into the DB.
  • The Data Publisher (DP) regularly checks the
    Curator DB for new experiments marked as public
    and, if it finds any, invokes CMOR.
  • CMOR retrieves metadata from the Curator DB and
    processes the required data following the
    metadata instructions.
  • DP calls QAC and then transfers the data to Data
    Portal storage.
  • The Configuration Manager configures the
    Aggregation Server and the Data Portal Interface
    and records the new public data in the Curator DB.
  • End of process: the data is ready to go.
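
The Data Publisher part of this scenario can be sketched as a small state machine. The sketch assumes an in-memory list as a stand-in for the Curator DB; the `Experiment` fields, the status strings and the `publish_pending` function are hypothetical names, and the CMOR, transfer and configuration steps are only recorded, not performed.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """One record in a stand-in for the Curator DB (all fields hypothetical)."""
    name: str
    public: bool
    status: str = "postprocessed"
    steps: list = field(default_factory=list)


def publish_pending(curator, passes_qc):
    """Run CMOR, QC, transfer and configuration for each new public experiment.

    `curator` is a list of Experiment records; `passes_qc` is a callable that
    stands in for QAC and returns True when the data may be published.
    """
    published = []
    for exp in curator:
        if not exp.public or exp.status != "postprocessed":
            continue  # private, already published, or previously rejected
        exp.steps.append("cmor")
        if not passes_qc(exp):
            exp.status = "qc_failed"
            continue
        exp.steps.extend(["transfer", "configure"])
        exp.status = "published"
        published.append(exp.name)
    return published
```

Because each record's status changes once it has been handled, calling `publish_pending` repeatedly (for example from a scheduler) does not publish the same experiment twice.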

15
Database Compartments
Curator database design
  • Model Metadata Compartment
    • contains model descriptions; allows building a
      coupled model of the needed configuration
  • Variables Compartment
    • list of all related physical variables
  • Workflow Compartment
    • contains scenarios, experiments, institutions,
      projects and user information
  • Postprocessing Compartment
    • defines the postprocessing plan for the
      experiment being conducted
  • Data Portal Compartment
    • contains information about experiment data
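
One way to picture the compartments is as groups of related tables. The sketch below uses SQLite from Python's standard library as a stand-in for the MySQL database named in the slides; every table and column name is an assumption, with one table per compartment for brevity.

```python
import sqlite3

# Hypothetical schema: one table per compartment.
SCHEMA = """
CREATE TABLE model_component (id INTEGER PRIMARY KEY, name TEXT NOT NULL, realm TEXT);
CREATE TABLE variable (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, units TEXT);
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE,
    scenario TEXT, institution TEXT);
CREATE TABLE postprocessing_plan (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    variable_id INTEGER NOT NULL REFERENCES variable(id),
    frequency TEXT);
CREATE TABLE portal_file (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    variable_id INTEGER NOT NULL REFERENCES variable(id),
    path TEXT NOT NULL);
"""


def create_curator(conn):
    """Create the illustrative compartment tables on an open connection."""
    conn.executescript(SCHEMA)


def files_for(conn, experiment, variable):
    """Return portal file paths for one experiment and variable, sorted by path."""
    rows = conn.execute(
        """
        SELECT f.path
        FROM portal_file AS f
        JOIN experiment AS e ON e.id = f.experiment_id
        JOIN variable AS v ON v.id = f.variable_id
        WHERE e.name = ? AND v.name = ?
        ORDER BY f.path
        """,
        (experiment, variable),
    )
    return [row[0] for row in rows]
```

The join in `files_for` shows how the Data Portal compartment relies on the Workflow and Variables compartments to answer a search by experiment and variable name.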

16
Interaction between compartments
17
  • MySQL DB CURATOR

18
Model Metadata Compartment (in development)
Workflow Compartment
Experiments
Variables Compartment
19
Data Samples from Model Compartment
20
Variables Compartment
Workflow Compartment
21
Data Sample from Variables Compartment
22
Workflow Compartment
23
Data Samples from Workflow Compartment
24
Postprocessing Compartment
Data Samples from Postprocessing Compartment
25
Data Portal Compartment
26
Data Samples from Data Portal Compartments
27
Curator DB on Data Portal stream
  • The Curator DB is already used on the GFDL Data
    Portal.
  • It is implemented with JSP and back-end
    servlets.
  • New data transferred onto the Data Portal is
    automatically registered in the Curator DB with
    all accompanying metadata.
  • It has turned out to be the fastest way to search
    for data on the Data Portal
  • CM2.0
  • CM2.1

28
Other Aspects of Future Development
  • Establish model metadata schema standards in the
    scientific community and develop an SQL metadata
    schema.
  • Populate Curator with real metadata extracted
    from GFDL models.
  • Integrate the Curator DB with the GFDL FMS
    modeling system
  • Customize the LAS server to use the Curator DB
  • Design user interfaces

29
  • END
  • Questions?
  • Thanks!
