Title: LEAD Tools THRUST Group
1LEAD Tools THRUST Group
Slides and Background Materials for LEAD All Hands Meeting, June 2004
Presentation at the meeting to be given by Sara Graves and Mohan Ramamurthy
2Need more than high level concepts?
3Main Issues for Tools Thrust
- Settling on the overall system architecture (driven by scenarios) with emphasis on component dependencies
- Agreeing on a priority order and schedule for implementing the tools at each testbed
- Defining input/output interfaces among the tools in order to develop an interoperable set
- Establishing a mechanism for turning tools into web/grid services and then creating those services
- Implementing the workflow orchestration that chains the tools together into useful end-to-end systems
4Key Tools
- ADaM for data mining
- ADAS for data assimilation
- Decoders for format transformation
- IDV visualization
- LDM/IDD for real-time data delivery
- OPeNDAP, ESML/OPeNDAP, ADDE data services
- THREDDS catalog generation and services
- WRF model
- gridFTP for file transfer
5Comprehensive Mining Testbed Components: Dynamic event detection and response using LEAD testbed technologies
[Diagram labels, with steps numbered 1-8: Local NWS radar; LDM; NEXRAD Cache; METAR; NCEP Models; Other data; Decoded; ESML; Mining; Mesocyclone Detection; Event (Yes); Notify (email); Store Events In Data Pool; Online Data Pool; Data Pool Order and FTP Access; DODS Server; THREDDS Catalog (generate); Data File Locations and other metadata; Application Data Access; IDV; Data Access; WCS; Subsetted Data; Map Access; WMS; Maps; OGC Viewer]
6Mining Testbed Component Explanations
- Test beds receive LEAD data sets via LDM from upstream sites.
- The LDM nodes are configured to decode the incoming data streams and cache them in the site's online data archive.
- Mesocyclone detection and other ADaM mining algorithms are run on the incoming NEXRAD stream in near-real time in an effort to quickly target developing weather situations. A notification service is triggered by the event detection to alert listeners and possibly automatically send them subscribed data sets based on the ongoing event.
- The UAH Data Pool provides online access to global passive microwave data and LEAD regional data. The Data Pool provides OPeNDAP (DODS), FTP and HTTP access to the data sets.
- Applications will primarily utilize OPeNDAP protocols for data access.
- THREDDS catalogs are generated daily (or more often if necessary) to provide metadata to applications on the location of data sets.
- The Data Visualization and Access workflow contains OpenGIS-compliant data services that provide public data access and visualization capabilities to data pool contents.
- Users will receive notification of detected events and will be able to access the data through usual FTP/HTTP, applications such as IDV using OPeNDAP protocols, and through OpenGIS protocols such as WMS and WCS.
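The detect, store, and notify steps described for the mining testbed can be sketched as a small publish/subscribe loop. This is only an illustration under stated assumptions: the `Event` and `NotificationService` classes, the threshold detector, and the station identifier `KHTX` are hypothetical stand-ins, and the detector is not the ADaM mesocyclone algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    kind: str       # e.g. "mesocyclone"
    radar_id: str   # hypothetical station identifier
    score: float    # detector output, arbitrary units


@dataclass
class NotificationService:
    """Keeps listeners per event kind and calls them when an event is published."""
    listeners: Dict[str, List[Callable[[Event], None]]] = field(default_factory=dict)

    def subscribe(self, kind: str, callback: Callable[[Event], None]) -> None:
        self.listeners.setdefault(kind, []).append(callback)

    def publish(self, event: Event) -> None:
        for callback in self.listeners.get(event.kind, []):
            callback(event)


def detect_mesocyclone(radar_id: str, shear_values: List[float],
                       threshold: float = 0.8) -> Optional[Event]:
    """Placeholder detector (assumption): flag a scan whose peak value exceeds a threshold."""
    peak = max(shear_values, default=0.0)
    return Event("mesocyclone", radar_id, peak) if peak > threshold else None


def process_scan(radar_id, shear_values, service, data_pool):
    """Run detection on one scan; on a hit, store the event and notify listeners."""
    event = detect_mesocyclone(radar_id, shear_values)
    if event is not None:
        data_pool.append(event)   # "Store Events In Data Pool"
        service.publish(event)    # "Notify"
    return event


service = NotificationService()
inbox: List[Event] = []
pool: List[Event] = []
service.subscribe("mesocyclone", inbox.append)
process_scan("KHTX", [0.1, 0.95, 0.3], service, pool)  # exceeds threshold
process_scan("KHTX", [0.1, 0.2], service, pool)        # quiet scan, nothing sent
```

In a real deployment the callback would send email or push subscribed data sets rather than append to a list.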
7Phased Implementation Needed (but with concurrent work on some aspects of each phase)
- Define high level, comprehensive architecture(s) such as the mining diagram shown earlier
- Start by building minimal end-to-end system (shown later)
- Define interfaces among tools to facilitate parallel work
- Add remaining key tools
- Convert tools to web services
- Develop workflow orchestration for existing tools
- Integrate tools and services into LEAD portal and myLEAD
- Incorporate web services and workflow into GRID framework
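As a rough illustration of the "convert tools to web services" step, the sketch below wraps a function behind a plain HTTP interface using Python's standard WSGI convention. The slides do not say which service technology was intended (SOAP or grid services are just as plausible), so the JSON-over-HTTP style, the `decode_tool` function, and the `station` parameter are all assumptions.

```python
import json
from urllib.parse import parse_qs


def decode_tool(station: str) -> dict:
    """Stand-in for a real tool (hypothetical): returns a fake summary for a station ID."""
    return {"station": station.upper(), "status": "decoded"}


def _respond(start_response, status: str, payload: dict):
    """Serialize a payload as JSON and emit the WSGI response."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def tool_service(environ, start_response):
    """WSGI application exposing decode_tool through a ?station=... query."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    stations = query.get("station")
    if not stations:
        return _respond(start_response, "400 Bad Request",
                        {"error": "missing 'station' parameter"})
    return _respond(start_response, "200 OK", decode_tool(stations[0]))


def call(app, query_string: str):
    """Invoke a WSGI app in-process, without opening a socket."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": query_string}
    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

The same `tool_service` callable could be served over the network with `wsgiref.simple_server.make_server("", 8000, tool_service).serve_forever()`; the tool itself stays unchanged, which is the point of wrapping rather than rewriting.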
8Initial End-to-End System: Walk before we try to run
[Diagram: LDM -> Eta to WRF-input Decoder -> WRF Regional Model -> WRF-output to IDV-input Decoder -> IDV]
9Minimal System Component Explanations
- Initial (3 month) Components
  - Test beds receive the LEAD NCEP ETA data set via LDM.
  - The LDM nodes are configured to decode the incoming data streams into a form suitable for initializing the WRF regional model and to store the decoded datasets in the site's online data archive.
  - The WRF regional model reads the decoded data from the testbed archive and runs.
  - WRF regional model output is stored in the testbed data store.
  - Applications will initially utilize OPeNDAP protocols for data access.
  - THREDDS catalogs are generated for all datasets on an ongoing basis.
  - Initial visualization of datasets is via the IDV.
- Next Steps (1 year)
  - Building on experience with the minimal system, define interfaces for additional tools and datasets.
  - Incorporate ADDE and ESML/OPeNDAP for serving other data types.
  - Replace NCEP ETA initialization for WRF with an ADAS-based true data assimilation system.
  - Utilize ADaM data mining as a model trigger and guidance mechanism (as shown in the Comprehensive diagram).
  - Develop LEAD-specific visualization facilities for all data types.
  - Construct orchestrated workflow for minimal system components.
  - Select several tools and develop prototype web service versions.
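One way to picture the minimal system and its interface question is a chain of stages, each declaring the format it accepts and the format it produces. The sketch below is hypothetical throughout: the format labels (`idd-stream`, `eta-decoded`, and so on) are invented for illustration, and each stage only relabels the data instead of doing real work.

```python
from typing import Callable, Dict, List, Tuple

Dataset = Dict[str, object]
# (stage name, format accepted, format produced, function)
Stage = Tuple[str, str, str, Callable[[Dataset], Dataset]]


def make_stage(name: str, accepts: str, produces: str) -> Stage:
    """Build a placeholder stage that relabels the data and records its own name."""
    def run(ds: Dataset) -> Dataset:
        return {"format": produces, "history": list(ds["history"]) + [name]}
    return (name, accepts, produces, run)


# Stage order follows the minimal end-to-end system; format labels are assumptions.
MINIMAL_SYSTEM: List[Stage] = [
    make_stage("LDM ingest", "idd-stream", "eta-decoded"),
    make_stage("Eta to WRF-input decoder", "eta-decoded", "wrf-input"),
    make_stage("WRF regional model", "wrf-input", "wrf-output"),
    make_stage("WRF-output to IDV-input decoder", "wrf-output", "netcdf-cf"),
    make_stage("IDV display", "netcdf-cf", "image"),
]


def check_interfaces(stages: List[Stage]) -> List[str]:
    """List every place where one stage's output does not match the next stage's input."""
    problems = []
    for (a_name, _, a_out, _), (b_name, b_in, _, _) in zip(stages, stages[1:]):
        if a_out != b_in:
            problems.append(f"{a_name} produces {a_out!r} but {b_name} expects {b_in!r}")
    return problems


def run_workflow(stages: List[Stage], ds: Dataset) -> Dataset:
    """Validate the chain, then pass the dataset through each stage in order."""
    problems = check_interfaces(stages)
    if problems:
        raise ValueError("; ".join(problems))
    for name, accepts, _, run in stages:
        if ds["format"] != accepts:
            raise ValueError(f"{name} cannot read {ds['format']!r}")
        ds = run(ds)
    return ds


result = run_workflow(MINIMAL_SYSTEM, {"format": "idd-stream", "history": []})
# Dropping the model from the chain exposes the interface gap it would leave.
broken = MINIMAL_SYSTEM[:2] + MINIMAL_SYSTEM[3:]
gaps = check_interfaces(broken)
```

Checking interfaces before running anything is a design choice: it lets separate groups develop their stages in parallel and still find mismatches early.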
10Advantages of Minimalist Approach
- Gets end-to-end system running soon (can demonstrate now with Workstation ETA in place of WRF)
- Clarifies remaining tool interface work
- Provides examples of working tools for conversion to Web services
- Establishes working testbed of tools for workflow orchestration group
- Gives us working system to demonstrate and experiment with
- Allows each group to make progress in own area of expertise in parallel
11Unidata Focus
- Tailor IDV for LEAD
- Replace Unidata Workstation ETA with WRF
- Modify current ETA decoder to create WRF initialization data
- Create WRF output decoder for CF conventions
- Support other testbeds implementing LDM, decoders, THREDDS, OPeNDAP, and ADDE
- Work with OU on ADAS for initialization
- Work with UAH to incorporate ADaM and ESML/OPeNDAP
- Work with CS experts on conversion of tools to orchestrated services
12Next Steps
- Discuss and refine the suggested approach to tools planning
- Agree on overall plan
- Develop very specific plans and commitments for each set of tools developers for 3-month time frame
- Develop specific plans for 1-year time frame
- Get realistic tools implementation commitments from testbed sites
- At 3-month intervals, each tools developer and testbed site reports on progress and revised goals for next 3 months
- Based on progress reports and revised goals for each individual group, publish a revised overall plan annually
13Component Tools and Services
- Infrastructure Tools and Services
  - Data formats
  - RT data transport
  - On-Demand data transport
  - Metadata formats
  - Catalog services
  - Portal
- Modeling and Analysis Tools
  - Assimilation
  - Models
  - Mining
  - Visualization
14Infrastructure Tools and Services
- Data formats
- RT data transport
- On-Demand data transport
- Metadata formats
- Catalog services
- Portal
15Data Formats and Representations
Tool: netCDF
  Desc: Network Common Data Form
  Main Institution(s): Unidata
  Availability: Now
  Expected installations: Some test sites
  Prerequisites: none
Tool: McIDAS AREA files
  Main Institution(s): Unidata?
  Availability: Now?
  Expected installations: Some test sites
  Prerequisites: none
Tool: NcML
  Desc: netCDF metadata in XML
  Main Institution(s): Unidata
  Availability: Now
  Expected installations: Some test sites
  Prerequisites: none
Tool: GML
  Main Institution(s): OGC
  Availability: Now
  Expected installations: Some archives
  Prerequisites: none
16Initial Test Bed Data Sets
- METARS (Meteorological Aviation Routine reports)
- 12-hourly upper-air balloon soundings (aka rawinsondes, radiosondes)
- 5-minute ACARS
- NEXRAD Level II
- NEXRAD Level III
- (GOES) visible and infrared imagery
- Eta Forecast Model
Expect all products available via LDM. Individual sites will decide how much of each to cache.
LDM ID        Data Set
----------    ------------------------------
IDS/DDPLUS    METARS
IDS/DDPLUS    12-hourly balloon soundings
PCWS          5-minute ACARS
NEXRD2        NEXRAD Level II
NNEXRAD       NEXRAD Level III
UNIWISC       (GOES) visible and infrared
CONDUIT       Eta Forecast Model
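Because each site decides which products to cache, a site needs to know which LDM IDs cover its chosen data sets. A minimal sketch of that lookup follows; the dictionary simply restates the LDM ID listing above, and the function name `feeds_needed` is an illustrative choice.

```python
# Data set names and LDM IDs copied from the listing above.
FEED_FOR_DATASET = {
    "METARS": "IDS/DDPLUS",
    "12-hourly balloon soundings": "IDS/DDPLUS",
    "5-minute ACARS": "PCWS",
    "NEXRAD Level II": "NEXRD2",
    "NEXRAD Level III": "NNEXRAD",
    "(GOES) visible and infrared": "UNIWISC",
    "Eta Forecast Model": "CONDUIT",
}


def feeds_needed(datasets):
    """Return the sorted, de-duplicated LDM IDs required for the chosen data sets."""
    unknown = [name for name in datasets if name not in FEED_FOR_DATASET]
    if unknown:
        raise KeyError(f"no LDM ID listed for: {unknown}")
    return sorted({FEED_FOR_DATASET[name] for name in datasets})


# A site that only wants surface and upper-air observations plus Eta output.
site_feeds = feeds_needed(["METARS", "12-hourly balloon soundings", "Eta Forecast Model"])
```

Two of the three requested data sets share one feed, so the site ends up requesting two feeds rather than three.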
17NetCDF
NetCDF (network Common Data Form) is an interface
for array-oriented data access and a library that
provides an implementation of the interface. The
netCDF library also defines a machine-independent
format for representing scientific data.
Together, the interface, library, and format
support the creation, access, and sharing of
scientific data. The netCDF software was
developed at the Unidata Program Center in
Boulder, Colorado.
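To make "machine-independent format" concrete, the sketch below encodes and decodes the start of a netCDF classic-format header using only the standard library. It follows the published classic-format header grammar as understood here (magic bytes, record count, then a dimension list, all as big-endian 4-byte integers with names padded to 4 bytes), handles dimensions only, and is no substitute for the netCDF library. The dimension names and sizes are made up.

```python
import struct

NC_DIMENSION = 10      # tag 0x0A that introduces a dimension list
ABSENT = b"\x00" * 8   # encoding of an empty list (two zero words)


def _pad4(raw: bytes) -> bytes:
    """Pad with null bytes to the next 4-byte boundary."""
    return raw + b"\x00" * (-len(raw) % 4)


def build_header(dims, numrecs=0) -> bytes:
    """Encode a header holding only dimensions (no attributes, no variables)."""
    out = b"CDF\x01" + struct.pack(">i", numrecs)
    if dims:
        out += struct.pack(">ii", NC_DIMENSION, len(dims))
        for name, length in dims:
            raw = name.encode("utf-8")
            out += struct.pack(">i", len(raw)) + _pad4(raw) + struct.pack(">i", length)
    else:
        out += ABSENT
    return out + ABSENT + ABSENT  # empty global attribute list, empty variable list


def parse_dimensions(blob: bytes):
    """Return (version, numrecs, [(name, length), ...]) from a header."""
    if blob[:3] != b"CDF":
        raise ValueError("not a netCDF classic header")
    version = blob[3]
    numrecs, tag, count = struct.unpack_from(">iii", blob, 4)
    pos, dims = 16, []
    if tag == NC_DIMENSION:
        for _ in range(count):
            (name_len,) = struct.unpack_from(">i", blob, pos)
            pos += 4
            name = blob[pos:pos + name_len].decode("utf-8")
            pos += name_len + (-name_len % 4)
            (length,) = struct.unpack_from(">i", blob, pos)
            pos += 4
            dims.append((name, length))
    return version, numrecs, dims


# Length 0 marks the record (unlimited) dimension in this format.
example_dims = [("time", 0), ("lat", 73), ("lon", 144)]
header = build_header(example_dims, numrecs=4)
parsed = parse_dimensions(header)
```

Because every integer is written big-endian regardless of the host, the same bytes decode identically on any machine, which is what lets the format be shared between sites.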
18NetCDF Markup Language (NcML)
NcML is an XML representation of netCDF metadata,
(roughly) the header information one gets from a
netCDF file with the "ncdump -h" command. NcML is
similar to the netCDF CDL (network Common data
form Description Language), except, of course, it
uses XML syntax.
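A short sketch of what such a header looks like in NcML, generated with Python's standard XML library. The element and attribute names (`netcdf`, `dimension`, `variable`, `attribute`) follow NcML as commonly documented; the namespace shown is the NcML 2.2 one and may differ from the version in use in 2004. The file name, dimensions, and variable are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: NcML 2.2 namespace; earlier NcML versions used a different one.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def to_ncml(location, dims, variables) -> str:
    """Render dimensions and variables as an NcML document string."""
    root = ET.Element("netcdf", {"xmlns": NCML_NS, "location": location})
    for name, length in dims.items():
        ET.SubElement(root, "dimension", {"name": name, "length": str(length)})
    for var in variables:
        node = ET.SubElement(root, "variable", {
            "name": var["name"],
            "type": var["type"],
            "shape": " ".join(var["shape"]),   # dimension names, space separated
        })
        for key, value in var.get("attributes", {}).items():
            ET.SubElement(node, "attribute", {"name": key, "value": value})
    return ET.tostring(root, encoding="unicode")


ncml_text = to_ncml(
    "example.nc",                               # hypothetical file
    {"time": 4, "lat": 73},
    [{"name": "T", "type": "float", "shape": ["time", "lat"],
      "attributes": {"units": "K"}}],
)
ncml_root = ET.fromstring(ncml_text)
```

The result carries the same information `ncdump -h` would print as CDL, but in a form any XML tool can query.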
19McIDAS AREA
20Geography Markup Language (GML)
Geography Markup Language is an XML grammar
written in XML Schema for the modelling,
transport, and storage of geographic information.
21Real Time Data Systems and Transport Services
Tools: LDM/IDD/Decoders
  Desc: Data Streaming/Transport
  Main Institution(s): Unidata
  Availability: Now
  Expected installations: All testbed sites
  Prerequisites: None
22IDD/LDM
23On-Demand Web Data Transport Services
Tool: OPeNDAP
  Main Institution(s): URI
  Availability: Now
  Expected installations: All archive sites
  Prerequisites: Web server
Tool: ADDE
  Main Institution(s): ?
  Availability: Now
  Expected installations: Where necessary
  Prerequisites: ?
Tool: OPeNDAP-ESML Server
  Main Institution(s): UAH
  Availability: Now (beta test)
  Expected installations: Data Archives
  Prerequisites: OPeNDAP Server
24OPeNDAP (DODS) Server Architecture
[Diagram labels: Data Analysis Application; DODS Client; DODS Lib; Local Dataset; Internet via HTTP; Data Set Specific DODS Server (two instances), each with its Dataset]
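Since an OPeNDAP (DODS) client talks to the server over HTTP, a request is just a URL: the dataset address, a suffix selecting the response type, and an optional constraint expression selecting part of a variable. The sketch below builds such URLs. The suffixes `dds`, `das`, and `dods` and the `var[start:stride:stop]` form are standard DAP2 usage; the host, path, and variable name are hypothetical.

```python
def hyperslab(variable: str, *ranges) -> str:
    """Format variable[start:stride:stop]... with one bracket group per dimension.

    In DAP2 constraint expressions the stop index is inclusive.
    """
    return variable + "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges)


def opendap_url(dataset_url: str, response: str, projections=()) -> str:
    """Build a request URL for the structure (dds), attributes (das), or data (dods)."""
    if response not in ("dds", "das", "dods"):
        raise ValueError(f"unsupported response type: {response!r}")
    url = f"{dataset_url}.{response}"
    if projections:
        url += "?" + ",".join(projections)
    return url


BASE = "http://example.org/dods/eta.nc"   # hypothetical server and dataset

structure_url = opendap_url(BASE, "dds")
# First time step only, every second point of indices 10..20 in the next dimension.
subset_url = opendap_url(BASE, "dods", [hyperslab("T", (0, 1, 0), (10, 2, 20))])
```

Some servers expect the square brackets to be percent-encoded; that detail is left out here to keep the constraint readable.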
25DODS-ESML Server Architecture
[Diagram labels: Data Analysis Application; DODS Client; DODS Lib; Local Dataset; Internet via HTTP; DODS Server with ESML; ESML Descriptions; Datasets; Data Set Specific DODS Server with its Dataset]
26ADDE
27Metadata Formats and Tools
Tool: ESML
  Desc: External Structural Metadata and tools
  Main Institution(s): UAH
  Availability: Now
  Expected installations: Some test sites, other tools
  Prerequisites: none
Tool: THREDDS
  Desc: Data discovery and access tools
  Main Institution(s): Unidata
  Availability: Now
  Expected installations: Data Archives, other tools
  Prerequisites: none
Tool: FGDC
  Desc: Content metadata for geographic data
  Main Institution(s): USGS
  Availability: Now
  Expected installations: Data Archives
  Prerequisites: none
28Thematic Realtime Environmental Distributed Data
Services
The mission of THREDDS is for students, educators
and researchers to publish, contribute, find, and
interact with data relating to the Earth system
in a convenient, effective, and integrated
fashion. Just as the World Wide Web and
digital-library technologies have simplified the
process of publishing and accessing multimedia
documents, THREDDS is building infrastructure
needed for publishing and accessing scientific
data in a similarly convenient fashion.
29THREDDS Support for Distributed Data Servers
30Distributed THREDDS Catalogs
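A THREDDS catalog is an XML document that tells clients where data sets live and which service delivers them. The sketch below generates a small one with the standard library. Element names (`catalog`, `service`, `dataset`, `serviceName`) and the namespace follow the THREDDS InvCatalog 1.0 schema as understood here, which may not be the catalog version used by the testbeds; the catalog name, service base, and file paths are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: InvCatalog 1.0 namespace.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def build_catalog(name: str, service_base: str, paths) -> str:
    """Return catalog XML listing each path as a dataset served via OPeNDAP."""
    root = ET.Element("catalog", {"xmlns": THREDDS_NS, "name": name})
    ET.SubElement(root, "service",
                  {"name": "dods", "serviceType": "OPENDAP", "base": service_base})
    collection = ET.SubElement(root, "dataset", {"name": name})
    for path in sorted(paths):
        leaf = ET.SubElement(collection, "dataset",
                             {"name": path.rsplit("/", 1)[-1], "urlPath": path})
        ET.SubElement(leaf, "serviceName").text = "dods"   # which service serves this file
    return ET.tostring(root, encoding="unicode")


catalog_xml = build_catalog(
    "Testbed model output",                       # hypothetical catalog name
    "/dods/",                                     # hypothetical service base
    ["wrf/2004060112.nc", "wrf/2004060100.nc"],   # hypothetical files
)
catalog_root = ET.fromstring(catalog_xml)
leaves = catalog_root.findall(".//t:dataset[@urlPath]", {"t": THREDDS_NS})
```

Regenerating this document on a schedule, from whatever files are currently in the archive, is one simple way to keep a catalog current.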
31Interoperability Problem
[Diagram labels: DATA FORMAT 1; DATA FORMAT 2; DATA FORMAT 3; READER 1; READER 2; FORMAT CONVERTER; APPLICATION]
- Requires specialized code for every format
- Difficult to assimilate new data types
- Makes applications tightly coupled to data
- One possible solution: enforce a Standard Data Format
  - Not practical for legacy datasets
32Interchange Technology Solution
[Diagram labels: DATA FORMAT 1; DATA FORMAT 2; DATA FORMAT 3; ESML FILE 1; ESML FILE 2; ESML FILE 3; ESML LIBRARY; APPLICATION]
- ESML (external metadata) files containing the structural description of the data format
- Applications utilize these descriptions to interpret how to read data files, resulting in data interoperability for applications
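The idea of an external structural description driving a single generic reader can be sketched in a few lines. This is not ESML syntax: real ESML files are XML, and the dictionaries, field names, and the station code `KHSV` below are hypothetical. The point is only that one reader handles two different binary layouts because the layout lives outside the code.

```python
import struct

# Hypothetical external descriptions of two binary record layouts.
DESCRIPTIONS = {
    "format_a": {"byte_order": ">",   # big-endian, 32-bit floats
                 "fields": [("station", "4s"), ("temp_k", "f"), ("pressure_hpa", "f")]},
    "format_b": {"byte_order": "<",   # little-endian, 64-bit floats, other field order
                 "fields": [("pressure_hpa", "d"), ("temp_k", "d"), ("station", "4s")]},
}


def read_records(blob: bytes, description: dict):
    """Decode every fixed-size record in blob according to its description."""
    layout = description["byte_order"] + "".join(code for _, code in description["fields"])
    names = [name for name, _ in description["fields"]]
    records = []
    for values in struct.iter_unpack(layout, blob):
        record = {}
        for name, value in zip(names, values):
            record[name] = value.decode("ascii") if isinstance(value, bytes) else value
        records.append(record)
    return records


# The same observation written in both layouts.
file_a = struct.pack(">4sff", b"KHSV", 290.5, 1012.0)
file_b = struct.pack("<dd4s", 1012.0, 290.5, b"KHSV")

records_a = read_records(file_a, DESCRIPTIONS["format_a"])
records_b = read_records(file_b, DESCRIPTIONS["format_b"])
```

Supporting a third format would mean adding a description, not writing another reader, which is the decoupling the bullets above describe.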
33What is ESML?
- It is a specialized markup language for Earth Science structural metadata based on XML
- It is a machine-readable and -interpretable representation of the structure of any data file, regardless of data format (machine readable README)
- ESML description files contain external metadata that can be generated by either data producer or data consumer (at collection, data set, and/or granule level)
- ESML provides the benefits of a standard, self-describing data format (like HDF, HDF-EOS, netCDF, geoTIFF, ...) without the cost of data conversion
- ESML is the basis for core Interchange Technology that allows data/application interoperability
34Catalog Services
Tool: THREDDS
  Main Institution(s): Unidata
  Availability: Now
  Expected installations: Data Archives
  Prerequisites: none
Tool: myLEAD
  Main Institution(s): IU
  Availability: ?
  Expected installations: ?
  Prerequisites: ?
Tool: MCS
  Main Institution(s): ?
  Availability: Now
  Expected installations: ?
  Prerequisites: ?
35Thematic Realtime Environmental Distributed Data
Services
36myLEAD
myLEAD is an active catalog for scientific
metadata, with specialized facilities for
searching, content storage, data object
cataloging, and active engagement.
37MCS
38Portal Tools
Tool: LEAD Portal
  Desc: Web based Workflow Application
  Main Institution(s): IU
  Availability: ?
  Expected installations: portal host
  Prerequisites: none
39LEAD Portal
40Modeling and Analysis Tools
- Assimilation
- Models
- Mining
- Visualization
41Data Assimilation
Tools: ADAS
  Desc: Assimilation Transformations
  Main Institution(s): OU
  Availability: Now
  Expected installations: Data Modeling sites
  Prerequisites: Domain knowledge
42ARPS Data Analysis System (ADAS)
ADAS, the ARPS Data Analysis System, is a 3-dimensional weather analysis program. This implementation uses Rapid Update Cycle (RUC) forecasts from the National Centers for Environmental Prediction (NCEP) as background fields. Oklahoma Mesonet, surface airways, NOAA wind profiler, and radiosonde data are used in the analysis.
43Models
Tool: WRF
  Desc: Meteorological Model
  Main Institution(s): OU
  Availability: Now
  Expected installations: Modeling sites
  Prerequisites: none?
44Data Mining
Tools: ADaM
  Desc: Data Mining/Image Processing Toolkit
  Main Institution(s): UAH
  Availability: Now
  Expected installations: Data analysis sites
  Prerequisites: Mining expertise
45ADaM System Overview
- Developed by the Information Technology and Systems Center at the University of Alabama in Huntsville
- Consists of over 75 interoperable mining and image processing components
- Each component is provided with a C application programming interface (API) and an executable in support of scripting tools (e.g. Perl, Python, Tcl, Shell)
- ADaM components are lightweight and autonomous, and have been used successfully in a grid environment
- ADaM has several translation components that provide data level interoperability with other mining systems (such as WEKA and Orange), and point tools (such as libSVM and svmLight)
- Components include Python wrappers; web service interfaces are planned
46Visualization
Tool: IDV
  Desc: Web based visualization
  Main Institution(s): Unidata
  Availability: Now
  Expected installations: Client sites
  Prerequisites: none
Tool: OGC Web Map Services
  Desc: Web based visualization
  Main Institution(s): Unidata, UAH
  Availability: Now
  Expected installations: Data Servers
  Prerequisites: none
47Integrated Data Viewer
The Integrated Data Viewer (IDV) from Unidata is
a Java(TM)-based software framework for analyzing
and visualizing geoscience data. The IDV brings
together the ability to display and work with
satellite imagery, gridded data, surface
observations, balloon soundings, NWS WSR-88D
Level II and Level III RADAR data, and NOAA
National Profiler Network data, all within a
unified interface.
48OGC Web Mapping Services
A Web Map Service (WMS) produces maps of
geo-referenced data. A particular WMS provider
in a distributed WMS network need only be the
steward of its own data collection. This stands
in contrast to vertically-integrated web mapping
sites that gather in one place all of the data to
be made accessible by their own private
interface.
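A client asks a WMS for a map with a GetMap request, which is an ordinary URL carrying the layers, bounding box, and image size. The sketch below assembles one using WMS 1.1.1 parameter names; the endpoint, layer names, and bounding box are hypothetical, and other WMS versions differ in details such as how the coordinate system parameter is named.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def getmap_url(endpoint, layers, bbox, width, height,
               srs="EPSG:4326", image_format="image/png", version="1.1.1"):
    """Build a WMS GetMap URL. bbox is (minx, miny, maxx, maxy) in the given SRS."""
    params = [
        ("SERVICE", "WMS"),
        ("VERSION", version),
        ("REQUEST", "GetMap"),
        ("LAYERS", ",".join(layers)),
        ("STYLES", ""),                 # empty value requests default styles
        ("SRS", srs),
        ("BBOX", ",".join(str(v) for v in bbox)),
        ("WIDTH", str(width)),
        ("HEIGHT", str(height)),
        ("FORMAT", image_format),
    ]
    return endpoint + "?" + urlencode(params)


map_url = getmap_url(
    "http://example.org/wms",           # hypothetical server
    ["radar", "counties"],              # hypothetical layer names
    (-90.0, 30.0, -84.0, 36.0),
    width=600, height=600,
)
map_query = parse_qs(urlsplit(map_url).query, keep_blank_values=True)
```

Because every provider answers the same request form, a viewer can overlay maps from several independent servers by sending each one the same bounding box and image size.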