Title: GFDL Data Portal
GFDL Data Portal: Current Status, Achievements and Future Development
K. Dixon, V. Balaji, S. Nikonov (GFDL, Princeton)
NOAATECH-2006
History
- The Data Portal was launched in 1995 as a simple FTP server.
- The idea and the term "Data Portal" arose three years ago.
- Originally it served data on occasional request.
- Its main assets are now the IPCC data.
Common Technical Characteristics
- Software
  - Red Hat Linux
  - Apache Web Server
  - DODS Aggregation Server
  - THREDDS
  - LAS (Live Access Server)
  - GrADS-DODS
Hardware
- Dell PowerEdge 2650 server
  - Dual Intel Xeon 2.4 GHz processors
  - 3 GB RAM
- 7 Dell PowerVault 220S disk enclosures
  - 14 hard disks in each; 19 TB total (expansion up to 35 TB pending)
- Network bandwidth
  - Internet: 9 Mbit/s
  - Internet2: 100 Mbit/s
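To put the two link rates in perspective, the arithmetic below estimates how long one average 1 GB file (the figure quoted in the usage statistics) would take to transfer over each link. It is a back-of-the-envelope sketch assuming decimal units, the full nominal link rate and no protocol overhead; real throughput would be lower.

```python
# Idealized transfer time of one average file over each quoted link.
# Assumptions: decimal units (1 GB = 10**9 bytes, 1 Mbit/s = 10**6 bit/s),
# the whole nominal rate is available, and protocol overhead is ignored.

FILE_SIZE_BYTES = 1 * 10**9                      # average netCDF file size
LINKS_MBIT_S = {"Internet": 9, "Internet2": 100}  # nominal bandwidths


def transfer_seconds(size_bytes: int, rate_mbit_s: float) -> float:
    """Ideal time in seconds to move size_bytes over a rate_mbit_s link."""
    bits = size_bytes * 8
    return bits / (rate_mbit_s * 10**6)


if __name__ == "__main__":
    for name, rate in LINKS_MBIT_S.items():
        t = transfer_seconds(FILE_SIZE_BYTES, rate)
        print(f"{name:>9}: {t:7.1f} s ({t / 60:.1f} min)")
```

At these idealized rates a 1 GB file takes roughly 15 minutes over the 9 Mbit/s link and about 80 seconds over Internet2.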
Web Site Structure
Basic Metadata
- Model description
- Experiment description
- Institution
- Extra metadata for handling tripolar grids (including Ferret scripts for their visualization)
- Metadata complies with the CF (Climate and Forecast) conventions
- Metadata accompanies each data file
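A check of this kind can be automated. The sketch below inspects attributes that have already been read from a netCDF header into plain dictionaries, requires a simplified subset of CF-recommended attributes, and, for curvilinear fields such as those on a tripolar grid, verifies that the 2-D coordinate variables named in the `coordinates` attribute are present. The attribute lists and the example variable names (`tos`, `geolon`, `geolat`) are illustrative assumptions, not the portal's actual rules.

```python
# Simplified CF-style metadata check on attributes already read from a netCDF header.
# Assumption: the required-attribute lists are an illustrative subset, not the full CF rules.

REQUIRED_GLOBAL = ("Conventions", "title", "institution", "source", "history")
REQUIRED_VARIABLE = ("units", "long_name")


def check_cf_metadata(global_attrs: dict, variables: dict) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for name in REQUIRED_GLOBAL:
        if name not in global_attrs:
            problems.append(f"missing global attribute '{name}'")
    conventions = global_attrs.get("Conventions")
    if conventions is not None and not str(conventions).startswith("CF-"):
        problems.append("Conventions attribute does not declare a CF version")
    for var, attrs in variables.items():
        for name in REQUIRED_VARIABLE:
            if name not in attrs:
                problems.append(f"{var}: missing attribute '{name}'")
        # Curvilinear fields (e.g. on a tripolar ocean grid) name their 2-D
        # latitude/longitude variables in the 'coordinates' attribute.
        for coord in attrs.get("coordinates", "").split():
            if coord not in variables:
                problems.append(f"{var}: coordinate variable '{coord}' not in file")
    return problems


if __name__ == "__main__":
    global_attrs = {"Conventions": "CF-1.0", "title": "example", "institution": "NOAA GFDL",
                    "source": "GFDL CM2.1", "history": "CMOR output"}
    variables = {  # hypothetical variable names
        "tos": {"units": "K", "long_name": "Sea Surface Temperature",
                "coordinates": "geolon geolat"},
        "geolon": {"units": "degrees_east", "long_name": "longitude"},
        "geolat": {"units": "degrees_north", "long_name": "latitude"},
    }
    print(check_cf_metadata(global_attrs, variables) or "all checks passed")
```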
Basic Features of the GFDL LAS Server
- Dynamic data presentation chosen by the user
- Spatial and temporal subsampling, with the metadata included
- On-the-fly definition of new variables calculated from a given formula
- Ferret visualization
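The subsampling and on-the-fly computation features can be illustrated on plain Python lists. This is a sketch only: it assumes a `[time][lat][lon]` layout and monotonically increasing 1-D coordinates, and it uses a Python callable where LAS would use a Ferret expression; the function names are hypothetical.

```python
# Sketch of LAS-style spatial/time subsampling plus an on-the-fly derived variable.
# Assumptions: data are nested lists indexed [time][lat][lon]; coordinates are 1-D
# and increasing; the "formula" is a Python callable standing in for a Ferret expression.

from typing import Callable, Sequence


def index_range(coord: Sequence[float], lo: float, hi: float) -> list:
    """Indices whose coordinate value lies in the closed interval [lo, hi]."""
    return [i for i, c in enumerate(coord) if lo <= c <= hi]


def subsample(data, times, lats, lons, t_bounds, lat_bounds, lon_bounds):
    """Cut a time/lat/lon box; return the values and the matching coordinate metadata."""
    ti = index_range(times, *t_bounds)
    yi = index_range(lats, *lat_bounds)
    xi = index_range(lons, *lon_bounds)
    values = [[[data[t][y][x] for x in xi] for y in yi] for t in ti]
    coords = {"time": [times[t] for t in ti],
              "lat": [lats[y] for y in yi],
              "lon": [lons[x] for x in xi]}
    return values, coords


def derive(values, formula: Callable[[float], float]):
    """Apply a user-supplied point-wise formula to every value of the subset."""
    return [[[formula(v) for v in row] for row in plane] for plane in values]


if __name__ == "__main__":
    times, lats, lons = [0, 1, 2], [-10.0, 0.0, 10.0], [0.0, 90.0, 180.0, 270.0]
    # Synthetic temperature-like field in kelvin.
    data = [[[273.15 + 10 * t + y + 0.1 * x for x in range(4)] for y in range(3)]
            for t in range(3)]
    box, meta = subsample(data, times, lats, lons, (1, 2), (0, 10), (90, 180))
    celsius = derive(box, lambda k: k - 273.15)   # new variable defined on the fly
    print(meta, celsius)
```

Returning the coordinate values alongside the cut keeps the subset self-describing, which mirrors the "with the metadata included" behaviour listed above.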
General Statistics, 01-Oct-2004 to 01-Oct-2005
- Total amount of CM2 climate model data: 12 TB
- More than 10,000 netCDF files; average file size 1 GB
- Successful requests: 62,000
- Average successful requests per day: 200
- Distinct files requested: 5,000
- Distinct hosts served: 850
- Data transferred: 15 TB
- Average data transferred per day: 42 GB
- Journal articles submitted that include analyses of GFDL CM2 model output: more than 100
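The daily data volume can be cross-checked against the yearly total. The sketch assumes a 365-day period and binary units (1 TB = 1024 GB), which is the reading that reproduces the quoted 42 GB/day, and converts that volume into the mean sustained rate it implies.

```python
# Cross-check of the average daily volume against the yearly total.
# Assumptions: a 365-day period and binary units (1 TB = 1024 GB, 1 GB = 2**30 bytes);
# the resulting rate is expressed in decimal Mbit/s (10**6 bit/s).

SECONDS_PER_DAY = 86_400


def daily_volume_gb(total_tb: float, days: int = 365) -> float:
    """Average data volume per day in GB."""
    return total_tb * 1024 / days


def mean_rate_mbit_s(gb_per_day: float) -> float:
    """Sustained rate in Mbit/s needed to move gb_per_day every day."""
    bits_per_day = gb_per_day * 2**30 * 8
    return bits_per_day / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    gb = daily_volume_gb(15)          # 15 TB transferred in the reporting year
    print(f"{gb:.1f} GB/day, {mean_rate_mbit_s(gb):.2f} Mbit/s on average")
```

This gives about 42 GB/day, or a mean rate of roughly 4.2 Mbit/s, about half of the 9 Mbit/s Internet bandwidth listed for the server.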
Current Standard Procedure for Publishing Data
- Climate Model Output Rewriter (CMOR) processing
  - configured manually for different models, experiments and variables
  - triggered manually
- Quality control
  - performed by a scientist; includes checking metadata, time ranges, value ranges, etc.
- Splitting the CMORized, quality-controlled data into small (< 2 GB) netCDF files and pushing them through the firewall to the Data Portal
  - scripts configured manually
  - scripts started manually
- Preparing a checksum report on the Data Portal
  - by a script started from cron
- Configuring the Aggregation Server and LAS
  - done manually
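Two of these manual steps lend themselves to small scripts: deciding how to split a long time series so that no output file reaches the 2 GB limit, and producing the checksum report. The sketch assumes every time record has the same size on disk plus a fixed header, interprets the limit as 2 GiB, and uses MD5 in the two-column format that `md5sum -c` can verify; all of these are assumptions, not the portal's actual scripts.

```python
# Sketch of two publishing steps: planning the split into < 2 GB files and
# writing a checksum report. Assumptions: every time record has the same size,
# the limit is 2 GiB, and MD5 is the checksum; file patterns are hypothetical.

import hashlib
from pathlib import Path

LIMIT_BYTES = 2 * 1024**3  # "< 2 GB", interpreted as 2 GiB


def split_plan(n_records: int, record_bytes: int, header_bytes: int = 0,
               limit: int = LIMIT_BYTES) -> list:
    """Return [start, stop) record ranges so that every output file stays below limit."""
    per_file = (limit - 1 - header_bytes) // record_bytes
    if per_file < 1:
        raise ValueError("a single record does not fit under the size limit")
    return [(start, min(start + per_file, n_records))
            for start in range(0, n_records, per_file)]


def md5sum(path: Path, chunk: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so large files need little memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def checksum_report(directory: Path, pattern: str = "*.nc") -> str:
    """Lines of '<md5>  <file name>', the format that md5sum -c accepts."""
    lines = [f"{md5sum(p)}  {p.name}" for p in sorted(directory.glob(pattern))]
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    print(split_plan(1200, 5 * 1024**2))   # hypothetical: 1200 records of 5 MiB each
    print(checksum_report(Path(".")), end="")
```

For example, 1200 hypothetical records of 5 MiB each are split into three ranges of at most 409 records: (0, 409), (409, 818) and (818, 1200).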
Current Data Portal Workflow
Desirable Features of the Data Portal
- Relational database storing metadata that describes
  - model components and model configuration
  - scenarios
  - postprocessing (model output and CMOR)
  - experiments
  - variables
  - formalized quality-control rules
  - data locations in the archive
  - task scheduler
  - user and group accounts
- XML as the data exchange format
  - for compliance with the FMS Runtime Environment (FRE)
  - working format of existing third-party software
  - well suited to hierarchical metadata description
  - prevalent worldwide; easy to exchange with other data portals
- Publisher Control Center (PCC)
  - controls the CMOR subsystem
  - controls the Data Publisher Manager
  - controls data quality (QAC)
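To illustrate why XML suits hierarchical metadata, the sketch below serializes an experiment description (model, components, scenario, variables) and reads it back with Python's standard `xml.etree.ElementTree`. The element names and sample values are hypothetical; no exchange schema is implied.

```python
# Round-trip of a hierarchical experiment description through XML.
# Assumption: element/attribute names and the sample values are hypothetical.

import xml.etree.ElementTree as ET


def experiment_to_xml(exp: dict) -> str:
    """Serialize an experiment dict to an XML string (no XML declaration)."""
    root = ET.Element("experiment", name=exp["name"])
    model = ET.SubElement(root, "model", name=exp["model"])
    for comp in exp["components"]:                 # nested model components
        ET.SubElement(model, "component", name=comp)
    ET.SubElement(root, "scenario").text = exp["scenario"]
    variables = ET.SubElement(root, "variables")
    for v in exp["variables"]:
        ET.SubElement(variables, "variable", name=v)
    return ET.tostring(root, encoding="unicode")


def xml_to_experiment(text: str) -> dict:
    """Parse the XML produced by experiment_to_xml back into a dict."""
    root = ET.fromstring(text)
    model = root.find("model")
    return {
        "name": root.get("name"),
        "model": model.get("name"),
        "components": [c.get("name") for c in model.findall("component")],
        "scenario": root.findtext("scenario"),
        "variables": [v.get("name") for v in root.iter("variable")],
    }


if __name__ == "__main__":
    exp = {"name": "example_run", "model": "CM2.1",
           "components": ["atmosphere", "ocean", "land", "sea_ice"],
           "scenario": "SRES A1B", "variables": ["tas", "pr"]}
    print(experiment_to_xml(exp))
```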
Desirable Features of the Data Portal (continued)
- Climate Model Output Rewriter (CMOR) subsystem
  - prepares data consistently with specific project requirements
- Data Publisher Manager
  - transfers data to the target destination according to settings from the DB
- Front-end Data Portal Software Package
  - Configuration Manager (configures the Aggregation Server and the Data Portal interface)
  - Search Catalog Engine
  - Data Subsampling Engine
  - Data Computation Engine
  - Data Visualization
  - Data Delivery Manager
Proposed Functional Schema of the GFDL Data Factory
Standard Scenario of Model Data Factory Operation (Ideal Picture)
- A scientist builds a model in the existing GFDL FMS Runtime Environment (FRE) using available model components, datasets and a forcing scenario.
- FRE puts metadata about the built model, scenario and experiment into the curator DB and runs the experiment.
- The postprocessing subsystem extracts the postprocessing plan from the curator DB, executes it and, on completion, writes metadata about the processed experiment back into the DB.
- The Data Publisher (DP) regularly checks the curator DB for new experiments marked as public and, if it finds any, invokes CMOR.
- CMOR retrieves metadata from the curator DB and processes the required data following the metadata instructions.
- DP calls QAC and then transfers the data to Data Portal storage.
- The Configuration Manager configures the Aggregation Server and the Data Portal interface and records the new public data in the curator DB.
- End of process: the data are ready for distribution.
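The Data Publisher's role in this scenario amounts to a polling loop over the curator DB. The sketch below shows one pass of that loop, with the curator DB stood in for by a list of dictionaries and CMOR, QAC, transfer and configuration passed in as callables. The flag names (`public`, `published`, `status`) and the choice to hold back data that fail QAC are assumptions; the scenario does not say how failures are handled.

```python
# One polling pass of the Data Publisher (DP) over the curator DB.
# Assumptions: the DB is a list of dicts; cmor/qac/transfer/configure are
# placeholder callables; the 'public'/'published'/'status' keys are hypothetical.

from typing import Callable, List

Stage = Callable[[dict], None]


def publish_pass(experiments: List[dict], cmor: Stage, qac: Callable[[dict], bool],
                 transfer: Stage, configure: Stage) -> List[str]:
    """Run CMOR, QAC, transfer and configuration for new public experiments."""
    done = []
    for exp in experiments:
        if not exp.get("public") or exp.get("published"):
            continue                      # only new experiments marked as public
        cmor(exp)                         # rewrite output per curator DB metadata
        if not qac(exp):                  # quality control; failed data are held back
            exp["status"] = "qc_failed"
            continue
        transfer(exp)                     # copy data to Data Portal storage
        configure(exp)                    # update Aggregation Server / portal interface
        exp["published"] = True           # record the new public data in the DB
        done.append(exp["name"])
    return done
```

Running this function periodically (for example from cron) would give the "regularly checks" behaviour; because published experiments are flagged, repeated passes do not republish them.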
Database Compartments
Curator database design
- Model Metadata Compartment
  - contains model descriptions; allows building a coupled model of the required configuration
- Variables Compartment
  - list of all related physical variables
- Workflow Compartment
  - contains scenarios, experiments, institutions, projects and user information
- Postprocessing Compartment
  - defines the postprocessing plan for running an experiment
- Data Portal Compartment
  - contains information about experiment data
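A minimal relational sketch of the five compartments, using SQLite from the Python standard library, could look like this. Every table and column name is hypothetical; the intended mapping is `model` and `component` for the Model Metadata Compartment, `variable` for the Variables Compartment, `experiment` for the Workflow Compartment, `pp_plan` for the Postprocessing Compartment and `portal_file` for the Data Portal Compartment.

```python
# Minimal relational sketch of the curator DB compartments in SQLite.
# Assumption: all table and column names are hypothetical.

import sqlite3

SCHEMA = """
CREATE TABLE model       (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL, description TEXT);
CREATE TABLE component   (id INTEGER PRIMARY KEY, model_id INTEGER NOT NULL REFERENCES model(id),
                          kind TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE variable    (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL,
                          units TEXT, long_name TEXT);
CREATE TABLE experiment  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL,
                          model_id INTEGER NOT NULL REFERENCES model(id),
                          scenario TEXT, institution TEXT, public INTEGER DEFAULT 0);
CREATE TABLE pp_plan     (experiment_id INTEGER REFERENCES experiment(id),
                          variable_id INTEGER REFERENCES variable(id), frequency TEXT);
CREATE TABLE portal_file (id INTEGER PRIMARY KEY, experiment_id INTEGER REFERENCES experiment(id),
                          variable_id INTEGER REFERENCES variable(id), path TEXT UNIQUE, md5 TEXT);
"""


def open_curator(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a curator DB with foreign-key checks enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def public_files(conn: sqlite3.Connection, model_name: str) -> list:
    """Data Portal files of public experiments run with the given model."""
    rows = conn.execute(
        """SELECT f.path FROM portal_file f
           JOIN experiment e ON e.id = f.experiment_id
           JOIN model m ON m.id = e.model_id
           WHERE m.name = ? AND e.public = 1 ORDER BY f.path""", (model_name,))
    return [r[0] for r in rows]
```

The joins show how a question spanning compartments (which public files belong to a given model) becomes a single query once the compartments share keys.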
Interaction Between Compartments
Model Metadata Compartment (in development)
[Figure: Model Metadata Compartment schema; labels: Workflow Compartment, Experiments, Variables Compartment]
Data Samples from Model Compartment
Variables Compartment
[Figure: Variables Compartment schema; label: Workflow Compartment]
Data Samples from Variables Compartment
Workflow Compartment
Data Samples from Workflow Compartment
Postprocessing Compartment
Data Samples from Postprocessing Compartment
Data Portal Compartment
Data Samples from Data Portal Compartment
Curator DB on the Data Portal
- The Curator DB is already in use on the GFDL Data Portal.
- It was implemented with JSP technology, with servlets on the back end.
- New data transferred onto the Data Portal are automatically registered in the Curator DB with all accompanying metadata.
- It has turned out to be the fastest way to search for data on the Data Portal.
- CM2.0
- CM2.1
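The automatic registration and the metadata search can be sketched with SQLite. The sketch assumes the metadata of each new file arrive as a flat dictionary of netCDF global attributes and stores them as indexed key/value rows, so a search becomes an index lookup rather than a scan of the files themselves. The portal itself used JSP and servlets; the Python code, table names and example values here are illustrative only.

```python
# Registering newly transferred files with their metadata, then searching by metadata.
# Assumptions: metadata are a flat dict of global attributes; table names are hypothetical.

import sqlite3


def open_catalog() -> sqlite3.Connection:
    """In-memory catalog: one row per file plus indexed key/value metadata rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE file (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
        CREATE TABLE attr (file_id INTEGER NOT NULL REFERENCES file(id),
                           key TEXT NOT NULL, value TEXT NOT NULL);
        CREATE INDEX attr_kv ON attr (key, value);
    """)
    return conn


def register(conn: sqlite3.Connection, path: str, metadata: dict) -> int:
    """Record a newly transferred file and all of its accompanying metadata."""
    with conn:  # one transaction per file: all rows are stored, or none
        cur = conn.execute("INSERT INTO file (path) VALUES (?)", (path,))
        conn.executemany("INSERT INTO attr (file_id, key, value) VALUES (?, ?, ?)",
                         [(cur.lastrowid, k, str(v)) for k, v in metadata.items()])
    return cur.lastrowid


def search(conn: sqlite3.Connection, **criteria) -> list:
    """Paths of files whose metadata match every key=value criterion."""
    sql = "SELECT path FROM file WHERE 1=1"
    params = []
    for k, v in criteria.items():
        sql += " AND id IN (SELECT file_id FROM attr WHERE key = ? AND value = ?)"
        params += [k, str(v)]
    return [row[0] for row in conn.execute(sql + " ORDER BY path", params)]
```

For instance, after registering files for CM2.0 and CM2.1, `search(conn, source="CM2.1", variable="tas")` returns only the CM2.1 surface air temperature files (the attribute names are hypothetical).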
Other Aspects of Future Development
- Establish model metadata schema standards in the scientific community and develop an SQL metadata schema.
- Populate the Curator DB with real metadata extracted from GFDL models.
- Integrate the Curator DB with the GFDL FMS modeling system.
- Customize the LAS server to use the Curator DB.
- Design user interfaces.