Title: AIST NRA02 Presentation to SSO
Extensions of Grid Technology for Applications in Earth Science: The Geospatial Grid
Liping Di (ldi_at_gmu.edu)
Laboratory for Advanced Information Technology and Standards, George Mason University
July 20, 2005, ESMF on Grid Workshop
Introduction
- Geospatial data is the major type of data that humans have collected.
  - More than 80% of the data are geospatial data.
- Image/gridded data is the dominant form of geospatial data in terms of volume.
  - Most of those data are collected by the EO community.
- Geospatial data will soon grow to the exabyte scale.
  - NASA EOSDIS has more than one petabyte of data in its archives, and more than 2 terabytes of new data are added per day.
  - Application data centers hold tens of terabytes of imagery.
  - Tens of thousands of datasets are online now.
- How to use geospatial data effectively, wisely, and easily is the key information technology issue that we have to solve.
The Grid Technology
- Grid technology was developed for securely sharing computational resources within a virtual organization (VO):
  - Computer CPU cycles
  - Storage
  - Networks
  - Data, information, algorithms, software, and services
- It was originally motivated and supported by science and engineering fields requiring high-end computing, to share geographically distributed high-end computing resources.
- The core of the technology is the open-source middleware called the Globus Toolkit.
- The latest version of Globus is version 4.0, which implements the Open Grid Services Architecture (OGSA) and has converged with Web services technology.
Why Is the Grid Useful to the Earth Science Community?
- The Earth science community is one of the key communities for collecting, managing, processing, archiving, and distributing geospatial data and information.
- Most Earth science data are collected through Earth observation (EO) via satellite remote sensing.
- Because of the large volumes of EO data and the geographically scattered receiving and processing facilities, EO data and the associated computational resources are naturally distributed.
- The multidisciplinary nature of Earth science research and applications requires integrated analysis of huge volumes of multi-source data from multiple data centers. This requires sharing both data and computing power among data centers.
- Therefore, the Grid is an ideal technology for the Earth science community.
Why the Grid Needs Geospatial Extensions
- Geospatial data and information are significantly different from those in other disciplines:
  - Very complex and diverse:
    - Formats, projections, resolutions
    - Hyper-dimensional: spatial, temporal, spectral, thematic
    - Raster vs. vector
  - Large data volume:
    - More than 80% of the data humans have collected is spatial data.
- The geospatial community has developed a set of standards specifically for geospatial data and information, and users are already familiar with them (e.g., OGC, ISO, FGDC).
- Grid technology was developed for general sharing of computational resources and is not aware of the special characteristics of geospatial data.
- To make Grid technology applicable to geospatial data, we have to develop geospatial domain-specific extensions.
Geospatial Grids
- Geospatial Grids are extensions and domain-specific applications of the fundamental Grid technology in the geospatial discipline.
- Geospatial Grids include both geospatial data Grids and geospatial computational Grids.
  - Geospatial data Grids emphasize data access and information services on large, distributed geospatial data archives.
  - Geospatial computational Grids are mainly for coordinating computational resources for large-scale geospatial modeling and applications, such as climate modeling.
- A geospatial Grid could be a combination of both.
Objectives of the GMU Geospatial Grid Project
- Make NASA EOSDIS data easily accessible to the Earth science modeling and applications communities by combining the advantages of both OGC and Grid technologies:
  - Develop geospatial extensions of Grid technology to make it geospatially enabled (the Geospatial Grid).
  - Enable OGC geospatial clients to access Grid-managed distributed geospatial resources.
  - Provide virtual/intelligent geospatial products in the Grid environment.
  - Test methods for automating the process from geospatial data to knowledge.
  - Demonstrate the geospatial Grid technology in a realistic NASA EOS data environment.
  - Contribute technology, software, and the data pool application to the CEOS Grid testbed.
- The result is both a geospatial data grid and a geospatial computing grid.
The OGC Web Service Specifications
- The Web Coverage Service (WCS) specification defines the standard interfaces between web-based clients and servers for accessing coverage data.
  - All imagery-type remote sensing data are coverage data.
- The Web Feature Service (WFS) specification defines the standard interfaces between web-based clients and servers for accessing feature-based geospatial data.
  - Vector and point data are feature data.
- The Web Map Service (WMS) specification defines the standard interfaces for accessing and assembling maps from multiple servers.
  - Visualization of geospatial data
- The Catalog Service for the Web (CSW) specification defines the interfaces between web-based clients and servers for finding the required data or services in registries.
- WCS, WFS, CSW, and WMS form the foundation for an interoperable geospatial data access and service environment.
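To make the interface idea concrete, here is a minimal sketch of how a client builds key-value-pair (KVP) requests for two of these services: a WMS 1.1.1 GetMap (a rendered map) and a WCS 1.0.0 GetCoverage (the underlying coverage data). The endpoint URLs, layer and coverage names, and the "GeoTIFF" format string are hypothetical placeholders; real servers advertise their layers, coverages, and formats in their GetCapabilities responses.

```python
from urllib.parse import urlencode


def _bbox(bbox):
    """Format (minx, miny, maxx, maxy) as the comma-separated BBOX value used in KVP requests."""
    return ",".join(str(v) for v in bbox)


def wms_getmap_url(endpoint, layer, bbox, width, height,
                   srs="EPSG:4326", fmt="image/png"):
    """Build a WMS 1.1.1 GetMap request: the server returns a rendered map image."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value = server's default style
        "SRS": srs,
        "BBOX": _bbox(bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{endpoint}?{urlencode(params)}"


def wcs_getcoverage_url(endpoint, coverage, bbox, width, height,
                        crs="EPSG:4326", fmt="GeoTIFF"):
    """Build a WCS 1.0.0 GetCoverage request: the server returns the data values themselves."""
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,
        "CRS": crs,
        "BBOX": _bbox(bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,           # assumption: format names vary by server
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and coverage name, for illustration only.
    print(wcs_getcoverage_url("http://example.org/wcs", "MODIS_NDVI",
                              (-180, -90, 180, 90), 720, 360))
```

The same bounding box and CRS work across both services, which is what lets a client fetch a quick-look map from a WMS and then the corresponding data from a WCS.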
Areas of Extensions
- Internally, the Grid has to be spatially aware:
  - Extend the Globus Toolkit to handle spatial, spectral, temporal, and thematic-based spatial data and information management.
  - Develop sufficient Grid-enabled tools for geospatial data handling and services.
- The Grid must provide data/information access and service interfaces that are standard in the geospatial community:
  - The Open GIS Consortium's web data access/service interfaces (e.g., OGC WCS, WMS, WFS, and CSW).
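Being "spatially aware" means the Grid's data management can answer queries along the spatial, temporal, spectral, and thematic dimensions, not just by file name. The sketch below shows that kind of multi-dimensional filtering over metadata records. It is not the Globus Toolkit API; the record fields (`bbox`, `start`, `end`, `bands`, `theme`) and function names are hypothetical, chosen only to illustrate the four dimensions listed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass(frozen=True)
class GranuleRecord:
    """Hypothetical metadata record describing one archived granule."""
    granule_id: str
    bbox: BBox                  # spatial extent
    start: date                 # temporal extent (inclusive)
    end: date
    bands: FrozenSet[str]       # spectral content
    theme: str                  # thematic category


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """True if two lon/lat boxes overlap (ignores dateline wrap-around)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_granules(records: Iterable[GranuleRecord], bbox: BBox,
                  start: date, end: date, band: Optional[str] = None,
                  theme: Optional[str] = None) -> List[str]:
    """Return IDs of granules that match every requested dimension."""
    hits = []
    for r in records:
        if not bbox_intersects(r.bbox, bbox):
            continue                            # spatial filter
        if r.end < start or r.start > end:
            continue                            # temporal filter
        if band is not None and band not in r.bands:
            continue                            # spectral filter
        if theme is not None and r.theme != theme:
            continue                            # thematic filter
        hits.append(r.granule_id)
    return hits
```

In a production catalog these filters would be pushed into an indexed database rather than a linear scan; the point here is only which dimensions a geospatially enabled Grid has to understand.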
Virtual Geospatial Datasets
- A virtual dataset is a dataset that:
  - does not exist in a data and information system, but
  - the system knows how to create on demand.
- Once created, a virtual dataset can be kept to fulfill the same request from subsequent users.
- The client/data user cannot tell the difference between a real dataset and a virtual dataset.
- A virtual dataset can be produced (materialized) by:
  - running a program dedicated to producing the virtual dataset (dedicated program approach), or
  - running a series of service modules, each of which takes care of a small step of the materialization (service approach).
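The three properties above (created on demand, kept for later requests, indistinguishable to the user) can be sketched as a single lookup function. In this minimal sketch, assuming in-memory storage and hypothetical names (`DataSystem`, `recipes`, `ndvi_recipe`), `get()` serves archived data directly, returns a cached materialization if one exists, and otherwise runs a recipe and keeps the result. The NDVI recipe plays the role of a dedicated program; a service chain could be plugged in as the recipe just as well.

```python
from typing import Callable, Dict, List

Dataset = List[float]


class DataSystem:
    """Hypothetical data system serving real and virtual datasets through one call."""

    def __init__(self) -> None:
        self.archive: Dict[str, Dataset] = {}                            # real datasets
        self.recipes: Dict[str, Callable[["DataSystem"], Dataset]] = {}  # how to make virtual ones
        self.cache: Dict[str, Dataset] = {}                              # materialized virtual datasets
        self.materializations = 0

    def get(self, name: str) -> Dataset:
        # The caller uses the same call whether the dataset is real or virtual.
        if name in self.archive:
            return self.archive[name]
        if name in self.cache:
            return self.cache[name]                  # reuse an earlier materialization
        if name in self.recipes:
            product = self.recipes[name](self)       # materialize on demand
            self.materializations += 1
            self.cache[name] = product               # keep it for the next user
            return product
        raise KeyError(f"{name!r} is neither archived nor producible")


def ndvi_recipe(system: DataSystem) -> Dataset:
    """Dedicated-program style recipe: NDVI = (NIR - red) / (NIR + red), per pixel."""
    red, nir = system.get("red"), system.get("nir")
    return [(n - r) / (n + r) for r, n in zip(red, nir)]
```

Usage: after `system.archive["red"] = [1.0, 1.0]`, `system.archive["nir"] = [3.0, 1.0]`, and `system.recipes["ndvi"] = ndvi_recipe`, the call `system.get("ndvi")` materializes `[0.5, 0.0]` once; later calls return the cached copy.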
The Service Approach to Virtual Datasets
- A service is a self-contained, self-describing, modular application that can be published, located, and dynamically invoked across a network.
- It performs functions that can range from simple requests to complicated business processes.
- Once a service is deployed, other applications (and other services) can discover and invoke it.
- A service implemented in the Web environment is called a Web service; one implemented in the Grid environment is called a Grid service.
- Standards for service discovery, declaration, binding, and invocation allow individual services across a network to be chained together dynamically to fulfill a complex task.
- In the service environment, a virtual dataset is basically a service chain that describes the steps to be taken to produce it.
- With enough elementary service modules, it is possible to provide an unlimited number of virtual datasets simply by creating new service chains.
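A minimal sketch of the service approach, assuming local Python functions stand in for remote Web/Grid services and a dictionary stands in for service discovery and binding. The module names (`subset`, `rescale`, `threshold`) and the chain format are hypothetical. The point is that a virtual dataset is just a list of steps: the same three elementary modules yield different products depending on how they are chained.

```python
from typing import Any, Callable, Dict, List, Tuple

Grid = List[List[float]]


def subset(grid: Grid, rows: Tuple[int, int], cols: Tuple[int, int]) -> Grid:
    """Spatial subsetting: keep rows[0]:rows[1] and cols[0]:cols[1]."""
    return [row[cols[0]:cols[1]] for row in grid[rows[0]:rows[1]]]


def rescale(grid: Grid, factor: float) -> Grid:
    """Multiply every cell by a constant (e.g., apply a scale factor)."""
    return [[v * factor for v in row] for row in grid]


def threshold(grid: Grid, level: float) -> Grid:
    """Binary classification: 1 where the value reaches the level, else 0."""
    return [[1 if v >= level else 0 for v in row] for row in grid]


# Stand-in for a service registry; in a real Grid these would be discovered and bound remotely.
SERVICES: Dict[str, Callable[..., Grid]] = {
    "subset": subset,
    "rescale": rescale,
    "threshold": threshold,
}

# A service chain: (service name, parameters) pairs executed in order.
Chain = List[Tuple[str, Dict[str, Any]]]


def run_chain(chain: Chain, data: Grid) -> Grid:
    """Invoke each service in order, feeding each output into the next step."""
    for name, params in chain:
        data = SERVICES[name](data, **params)
    return data
```

Usage: with `grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, the chain `[("subset", {"rows": (0, 2), "cols": (1, 3)}), ("rescale", {"factor": 10})]` produces `[[20, 30], [50, 60]]`, while swapping in a `threshold` step produces a classified product from the same modules. Adding a new virtual dataset means adding a new chain, not new code.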
Geo-object, Geo-tree, Virtual Dataset, Geospatial Models
[Figure: archived geo-objects are transformed by geospatial web/Grid services into intermediate geo-objects and finally into the user geo-object. Labels: "no service"; "data service" (automated data transformation service, WCS/WFS); "modeling and virtual data services"; "User Requested" vs. "User Obtained".]
User Creation of Geospatial Models
- A user-requested product may exist neither as a real dataset nor as a virtual dataset.
- If the user knows the thought process for creating the data product step by step from lower-level inputs (the logical geospatial model):
  - With the help of a good user interface and the availability of service modules and models/submodels, the user can construct a geospatial model/virtual data product interactively.
  - The system can then produce the virtual data product for the user.
  - The user-created model can be incorporated into the system as one of the virtual datasets the system can provide.
  - This allows the system to grow its capabilities over time.
- Advantages:
  - Allows users to obtain ready-to-use scientific information instead of raw data, significantly reducing the data traffic between the users and the geospatial Grid.
  - Allows users to explore the huge resources available at a data Grid and to conduct tasks that they could never conduct before.
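Before a user-built model can be run or incorporated into the system, the system has to check that every step uses an available service module and that every input is either archived or produced by an earlier step. The sketch below shows one way to do that check and then register the model so that later users can reuse its product as a submodel. The class names, fields, and module names (`band_math`, `reproject`, `classify`) are hypothetical; this is an illustration of the idea, not the project's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Step:
    output: str               # name of the geo-object this step produces
    module: str               # service module that produces it
    inputs: Tuple[str, ...]   # names of the geo-objects it consumes


@dataclass
class ModelCatalog:
    modules: Set[str]         # available service modules
    archived: Set[str]        # archived geo-objects
    models: Dict[str, List[Step]] = field(default_factory=dict)  # registered virtual products

    def validate(self, steps: List[Step]) -> List[str]:
        """Return a list of problems; an empty list means the model can be run."""
        available = self.archived | set(self.models)   # earlier user models act as submodels
        problems = []
        for step in steps:
            if step.module not in self.modules:
                problems.append(f"step {step.output!r}: unknown module {step.module!r}")
            for name in step.inputs:
                if name not in available:
                    problems.append(f"step {step.output!r}: input {name!r} is not available")
            available.add(step.output)                 # later steps may use this output
        return problems

    def register(self, steps: List[Step]) -> str:
        """Validate a user-built model and publish its final output as a virtual product."""
        if not steps:
            raise ValueError("a model needs at least one step")
        problems = self.validate(steps)
        if problems:
            raise ValueError("; ".join(problems))
        product = steps[-1].output
        self.models[product] = list(steps)
        return product
```

Because registered products count as available inputs, each validated model extends what the next user can build, which is how the system's capabilities grow over time.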
Current Status
- We have fulfilled all objectives of the project except for user-defined, customizable virtual geospatial products.
  - Major work is currently concentrated on this area.
- A realistic testbed that simulates the NASA EOS data environment has been created.
- Grid-enabled OGC web services software has been developed.
- Operational geospatial data services through the Grid are available.
Grid Security (GSI) and VO Setup
[Figure: Grid nodes, certificate authorities, and virtual organizations, with authentication among different VOs.]
- Nodes:
  - GMU (Solaris), laits.gmu.edu: Globus 3.2, 3.9, with CEOS certs
  - NASA SGT (Linux), arao2.sgt-inc.com: Globus 3.2, with CEOS certs
  - GMU (Linux), llinux.laits.gmu.edu: Globus 2.2, with LAITS certs
  - NASA (Linux), former.intl-interfaces.net: Globus 3.0, with CEOS certs
  - GMU (Mac), geobrain.laits.gmu.edu: Globus 3.2
  - LLNL esg2 (Linux), esg2.llnl.gov: Globus 3.2, with ESG certs
  - Ames ipg05 (Linux), ipg05.ipg.nasa.gov: Globus 3.2, with IPG certs
  - GMU (Linux), data.laits.gmu.edu: Globus 3.2, 3.9, with LAITS certs
- Certificate authorities: GMU CA center, ESG CA center, IPG CA center
- Virtual organizations: CEOS VO, LLNL ESG VO, GMU LAITS VO, NASA IPG VO
- Authentication among different VOs
Geospatial Grid Software
- Software for the geospatial data grid:
  - GCSW and portal: the Grid-enabled catalog services for both geospatial data and services.
  - GWCS and portal: the Grid-enabled web coverage services, providing access to raster data.
  - GWMS and portal: the Grid-enabled web map services, providing access to maps (visualization of data).
  - iGSM (Intelligent Grid Service Mediator): coordinates resources to fulfill data access requests.
- Software for the geospatial computational grid and virtual data products:
  - Grid Geospatial Processing Services: multiple geospatial data handling and processing functions, each working as an individual Grid service.
    - The building blocks for geospatial processing models/workflows
    - Converting GRASS software into Grid-enabled web services
  - Grid-enabled workflow engine (BPELPower):
    - Executes workflows in the geospatial Grid environment to materialize virtual geospatial products.
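A workflow engine's core job is to run each processing step only after the steps it depends on have finished. The sketch below illustrates that dependency-ordered execution with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). It is not BPEL and not how BPELPower is implemented; the workflow format (task name mapped to a function plus the names of its upstream tasks) is an assumption made for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Tuple

# Hypothetical workflow format:
# task name -> (function, names of upstream tasks whose outputs it takes as arguments)
Workflow = Dict[str, Tuple[Callable[..., object], Tuple[str, ...]]]


def execute(workflow: Workflow) -> Dict[str, object]:
    """Run every task after all of its upstream tasks, passing their outputs in order."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    missing = {d for deps in graph.values() for d in deps} - set(graph)
    if missing:
        raise ValueError(f"undefined upstream tasks: {sorted(missing)}")
    results: Dict[str, object] = {}
    # static_order() yields a dependency-respecting order and raises CycleError on cycles.
    for name in TopologicalSorter(graph).static_order():
        func, deps = workflow[name]
        results[name] = func(*(results[d] for d in deps))
    return results


if __name__ == "__main__":
    # Hypothetical tasks: fetch two bands, compute NDVI, then mask vegetated pixels.
    wf: Workflow = {
        "red": (lambda: [1.0, 1.0], ()),
        "nir": (lambda: [3.0, 1.0], ()),
        "ndvi": (lambda r, n: [(b - a) / (b + a) for a, b in zip(r, n)], ("red", "nir")),
        "mask": (lambda v: [x >= 0.3 for x in v], ("ndvi",)),
    }
    print(execute(wf)["mask"])
```

This sketch runs tasks one at a time. Independent tasks such as "red" and "nir" could be dispatched to different Grid nodes in parallel; `TopologicalSorter` also offers `prepare()`, `get_ready()`, and `done()` for that style of scheduling.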
Geospatial Data Grid with GCSW/GWCS/GWMS/iGSM/ROS/DTS
[Figure: architecture of the geospatial data grid. Components shown: user/client interface (Web, Download, MPGC); WMS, WCS, and CSW portals; GWMS, GWCS, GCSW, iGSM, ROS, and DTS; sites LAITS (3), Ames, and LLNL; HDF-EOS data, geospatial catalog DB, MDS, and replica DB.]
A Data Request Scenario at the GMU Geospatial Grid
Lessons Learned
- Using Grid technology in Earth science is not an easy job; it needs significant time and resources.
- Grid software:
  - The Globus Toolkit is not very user-friendly.
  - Significant learning curve
  - Multiple versions
  - Platform support is incomplete:
    - you may need to compile the executables from source code.
- Security issues:
  - Firewalls
  - Specific ports
  - Certificate authorities (CAs)
- Organizations:
  - Dedicated people are needed to coordinate the sharing of resources.
Acknowledgement
- The project team includes Prof. Liping Di (PI, GMU), Dr. Piyush Mehrotra (Co-I, NASA Ames), Dr. Dean Williams (Co-I, DOE LLNL), Dr. Chaumin Hu (NASA Ames), Dr. Aijun Chen (implementation lead, GMU), Dr. Yuqi Bai (GMU), Mr. Yang Liu (GMU), and Yaxing Wei (GMU).
- The project is funded by the NASA Advanced Information Systems Technology (AIST) program.