Title: GEON: The Geosciences Network
1. GEON: The Geosciences Network
The National Laboratory for Advanced Data Research (NLADR)
- Chaitan Baru
- Division Director, Science R&D
- San Diego Supercomputer Center
2. Outline
- About SDSC
- Cyberinfrastructure projects
  - E.g., TeraGrid, BIRN, SCEC/CME, GEON, SEEK, NEES, ...
- GEON
- NLADR
3. SDSC Organization Chart
- Director: Fran Berman; Executive Director: Vijay Samalam
- Administration; Operations; Strategic Partnerships; External Relations
- User Services & Development (Anke Kamrath): Consulting, Training, Documentation, User Portals, Outreach & Education, User Services, SDSC/Cal-IT2 Synthesis Center
- Production Systems (Richard Moore): Allocated Systems, Production Servers, Networking Ops, SAN/Storage Ops, Servers/Integration, Security Ops, TeraGrid Operations
- Technology R&D (Vijay Samalam): Advanced Cyberinfrastructure Lab, SRB Lab, Networking Research, HPC Research, Tech Watch Group
- Science R&D (Chaitan Baru): Data & Knowledge Labs, Science Projects (bio-, neuro-, eco-, geo-informatics), NLADR
4. An emphasis on end-to-end Cyberinfrastructure (CI)
- Development of broad infrastructure, including services, not just computational cycles
  - Referred to as "e-science" in the UK
- A major emphasis at SDSC on data, information, and knowledge
- Increased focus on:
  - Strategic applications and strategic communities
  - Training and outreach, e.g., Summer Institutes
  - Community codes, but also data collections and databases
  - Researcher-level services, e.g., Linux cluster management software, to ease the transition from a local environment to a large-scale computing environment
5. SDSC and CI Projects
- SDSC is involved in several NSF- and NIH-funded, community-based CI projects:
- TeraGrid: providing access to high-end, national-scale, physical computing infrastructure
- BIRN: Biomedical Informatics Research Network, funded by NIH; integrating distributed brain image data
- GEON: Geosciences Network; integrating distributed Earth sciences data
- SCEC/CME: Southern California Earthquake Center Community Modeling Environment
- SEEK: Science Environment for Ecological Knowledge; integrating distributed biodiversity data along with tools
- OptIPuter: distributed computing environment using Lambda Grids
- NEES: Network for Earthquake Engineering Simulation; integrating distributed earthquake simulation and sensor data
- ROADNet: Real-time Observatories, Applications, and Data management Network
- TeraBridge: health monitoring of civil infrastructure
6. The TeraGrid: High-end Grid Infrastructure
[Map of TeraGrid sites; labels include PSC]
7. Typical Characteristics of CI Projects
- Close collaboration between science and IT researchers
- Need to provide data and information management:
  - Storage management, archiving
  - Data modeling; semantic modeling: spatial, temporal, topic, process
  - Data and information visualization
  - Semantic integration of data
  - Logic-based formalisms to represent knowledge and map between ontologies
- ...as well as high-end computing:
  - BIRN, SCEC, GEON, and TeraBridge all have allocations on the TeraGrid
  - Convert community codes into Web/Grid services (see the sketch below)
  - Enable scientists to access much larger computing capability from a local cluster/desktop
  - Provide support for scientific workflow systems (visual programming environments for Web services)
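To make the "community codes as Web/Grid services" idea concrete, here is a minimal sketch using only Python's standard library. Everything in it is invented for illustration (the geotherm function, port, and parameter name); GEON's actual services ran on a real Grid/Web-service stack, not this toy one.

```python
# Hypothetical sketch: expose a "community code" as a simple HTTP service.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
import json

def community_code(depth_km: float) -> float:
    """Stand-in for a scientific code, e.g., a crude geotherm estimate."""
    return 25.0 * depth_km  # illustrative 25 C/km gradient only

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Parse ?depth_km=... from the request URL and run the code.
        qs = parse_qs(urlparse(self.path).query)
        depth = float(qs.get("depth_km", ["0"])[0])
        body = json.dumps({"temperature_C": community_code(depth)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # GET http://localhost:8080/?depth_km=10 -> {"temperature_C": 250.0}
    HTTPServer(("localhost", 8080), Handler).serve_forever()
```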
8. Biomedical Informatics Research Network: Example of a community Grid
- PI of BIRN CC: Mark Ellisman
- Co-Is of BIRN CC: Chaitan Baru, Phil Papadopoulos, Amarnath Gupta, Bertram Ludaescher
9. The GEONgrid: Another community Grid
[Map showing the Rocky Mountain Testbed and the Mid-Atlantic Coast Testbed]
www.geongrid.org
10. Project Overview
- Close collaboration between geoscientists and IT researchers to interlink databases and Grid-enable applications
- Deep data modeling of 4D data
  - Situating 4D data in context: spatial, temporal, topic, process
- Semantic integration of Geosciences data
  - Logic-based formalisms to represent knowledge and map between ontologies
- Grid computing
  - Deploy a prototype GEON grid: heterogeneous networks, compute nodes, and storage capabilities. Enable sharing of data, tools, and expertise. Specify and execute workflows.
- Interaction environments
  - Information visualization; visualization of concept maps
  - Remote data visualization via high-speed networks
  - Augmented reality in the field
- Linkage to BIRN
11. Funding Sources
- National Science Foundation ITR Project, 2002-2007, $11.6M
  - Also, $900K for Chronos and $1M for CUAHSI-HIS (NSF)
- Partners
- California Institute for Telecommunications and Information Technology, Cal-(IT)2
- Chronos
- CUAHSI-HIS
- ESRI
- Geological Survey of Canada
- Georeference Online
- HP
- IBM
- IRIS
- Kansas Geological Survey
- Lawrence Livermore National Laboratory
- NASA Goddard, Earth System Division
- Southern California Earthquake Center (SCEC)
- U.S. Geological Survey (USGS)
- Affiliated Project
- EarthScope
- PI Institutions
- Arizona State University
- Bryn Mawr College
- Penn State University
- Rice University
- San Diego State University
- San Diego Supercomputer Center/UCSD
- University of Arizona
- University of Idaho
- University of Missouri, Columbia
- University of Texas at El Paso
- University of Utah
- Virginia Tech
- UNAVCO
- Digital Library for Earth System Education (DLESE)
12. Science Drivers (1): DYSCERN (DYnamics, Structure, and Cenozoic Evolution of the Rocky Mountains)
- The Rocky Mountain region is at the apex of a broad, dynamic orogenic plateau between the stable interior of North America and the active plate margin along the west coast.
- For the past 1.8 billion years, the region has been the focus of repeated tectonic activity, and it has experienced complex intra-plate deformation for the past 300 million years.
- The deformation processes involved are the subject of considerable debate.
- GEON is undertaking an ambitious project to map the lithospheric structure of the Rocky Mountain region in a highly integrated analysis, and to input the result into a 3D geodynamic model, in order to improve our understanding of the Cenozoic evolution of this region.
13. Science Drivers (2): CREATOR (Crustal Evolution: Anatomy of an Orogen)
- The Appalachian Orogen is a continental-scale mountain belt that provides a geologic template for examining the growth and break-up of continents through plate tectonic processes. The record spans a period in excess of 1,000 million years.
- Focus on developing an integrated view of the collisional processes represented by the Siluro-Devonian Acadian Orogeny. Integration scenarios will require IT-based solutions, including the design of ontologies and new tools.
- Research activities include:
  - Organization of a geologic and petrologic database for the mid-Atlantic test bed
  - Development of an ontologic framework to facilitate Web-based analysis of data
  - Registration of geologic and terrane maps, and data for igneous rocks
  - Application of data mining techniques for discovering similarities in geologic databases
  - Design of workflows for Web-based navigation and analysis of maps and igneous rock databases
  - Development of Web services for mineral and rock classification, including the use of SVG-based graphics
15. GEONgrid Service Layers
- Portal (login, myGEON)
- GeonSearch; GeoWorkbench
- Workflow Services; Registration Services; Data Mediation Services; Indexing Services; Visualization & Mapping Services
- Core Grid Services: authentication, monitoring, scheduling, catalog, data transfer, replication, collection management, databases
- Physical Grid: RedHat Linux, ROCKS, Internet, I2, OptIPuter
16. GEON Workbench Registration
- Uploadable:
  - OWL ontologies
  - OWL inter-ontology mappings (articulations)
  - Data sets (shape files)
- Semantic registration (see the sketch below):
  - Link data set D with ontology O1 (with an instance-based heuristic)
  - Query D using ontology O2
  - (e.g., rock classification: O1 = GSC, O2 = BGS)
- Ontology-enabled application
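A toy sketch of this register-then-query pattern, with plain Python dicts standing in for the OWL ontologies and the articulation mapping. The class names and the mapping below are invented for illustration; they are not the real GSC or BGS vocabularies.

```python
# Hypothetical articulation between two rock-classification ontologies
# (O1 = GSC terms, O2 = BGS terms); the pairs here are invented examples.
ARTICULATION_O1_TO_O2 = {
    "gsc:VolcanicRock": "bgs:ExtrusiveIgneousRock",
    "gsc:PlutonicRock": "bgs:IntrusiveIgneousRock",
}

def register_dataset(records, classify):
    """Link each record to an O1 class via an instance-based heuristic
    (here, a caller-supplied classify() function over attribute values)."""
    return {rec["id"]: classify(rec) for rec in records}

def query_by_o2(registration, o2_class):
    """Query the registered data set using O2 terms by translating them
    through the articulation back to O1 classes."""
    o1_classes = {o1 for o1, o2 in ARTICULATION_O1_TO_O2.items() if o2 == o2_class}
    return [rid for rid, o1 in registration.items() if o1 in o1_classes]

records = [
    {"id": "unit-17", "GEO": "basalt flow"},
    {"id": "unit-42", "GEO": "granite pluton"},
]
classify = lambda r: ("gsc:VolcanicRock" if "basalt" in r["GEO"]
                      else "gsc:PlutonicRock")
reg = register_dataset(records, classify)
print(query_by_o2(reg, "bgs:ExtrusiveIgneousRock"))  # -> ['unit-17']
```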
17. A Multi-Hierarchical Rock Classification Ontology (GSC)
- Classification hierarchies: Genesis, Fabric, Composition, Texture
- Kai Lin (SDSC), Boyan Brodaric (GSC)
18. Geology Workbench: Uploading Ontologies
19. Geology Workbench Data Registration, Step 1: Choose an ontology class
20. Geology Workbench Data Registration, Step 2: Map data to the selected ontology
[Screenshot: shapefile attributes to be mapped include AREA, PERIMETER, AZ_1000, AZ_1000_ID, GEO, PERIOD, ABBREV, DESCR, D_SYMBOL, P_SYMBOL]
21. Geology Workbench Data Registration, Step 3: Resolve mismatches
22. Geology Workbench: Ontology-enabled Map Integrator
23. Geology Workbench: Change Ontology
24. GEON Ontology Development Workshops
- Workshop format:
  - Led by GEON PIs
  - Involves a small group of domain experts from the community
  - Participation by a few IT experts in data modeling and knowledge representation
- Igneous Petrology, led by Prof. Krishna Sinha, Virginia Tech, 2003
- Seismology, led by Prof. Randy Keller, UT El Paso, Feb 24-25, 2004
- Aqueous Geochemistry, led by Dr. William Glassley, Lawrence Livermore, March 2-3, 2004
- Structural Geology, led by Prof. John Oldow, Univ. of Idaho, 2004
- Metamorphic Petrology, led by Prof. Maria Crawford, Bryn Mawr, under planning
- Chronos and CUAHSI are planning ontology efforts
- Also, ongoing ontology work in SCEC
- Discussion with Steve Bratt, COO, W3C
25. Community-Based Ontology Development
- Draft of an aqueous geochemistry ontology developed by scientists
- Bill Glassley (LLNL), Bertram Ludaescher, Kai Lin (SDSC), et al.
26. Levels of Knowledge Representation
- Controlled vocabularies
- Database schemas (relational, XML, ...)
- Conceptual schemas (ER, UML, ...)
- Thesauri (synonyms, broader term/narrower term)
- Taxonomies
- Informal/semi-formal representations:
  - Concept spaces, concept maps
  - Labeled graphs / semantic networks (RDF)
- Formal ontologies, e.g., in Description Logic (OWL); see the sketch below:
  - A formalization of a specification
  - Constrains the possible interpretations of terms
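A minimal sketch contrasting two of these levels, assuming the rdflib Python library (our choice here, not something the project mandates): an instance-level RDF triple versus a taxonomy-level subclass axiom. The namespace and class names are hypothetical.

```python
from rdflib import Graph, Namespace, RDF, RDFS, Literal

GEO = Namespace("http://example.org/geo#")  # hypothetical namespace
g = Graph()

# Labeled-graph / semantic-network level: plain RDF triples about an instance.
g.add((GEO.unit42, RDF.type, GEO.Granite))
g.add((GEO.unit42, RDFS.label, Literal("map unit 42")))

# Taxonomy / lightweight-ontology level: a subclass axiom that a reasoner
# can use to infer that unit42 is also an IgneousRock.
g.add((GEO.Granite, RDFS.subClassOf, GEO.IgneousRock))

print(g.serialize(format="turtle"))
```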
27. Use of Knowledge Structures
- Conceptual models of a domain or application (a means of communication, system design, ...)
- Classification of:
  - concepts (taxonomy), and
  - data/object instances through classes
- Analysis of ontologies, e.g.:
  - Graph queries (reachability, path queries, ...); see the sketch below
  - Reasoning (concept subsumption, consistency checking, ...)
- Targets for semantic data registration
- Conceptual indexes and views for searching, browsing, querying, and integration of registered data
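An illustrative sketch of treating a taxonomy as a directed graph, so that a concept-subsumption check reduces to a reachability query. It assumes the networkx library, and the rock classes are invented for the example.

```python
import networkx as nx

# Edges point from subclass to superclass.
taxonomy = nx.DiGraph()
taxonomy.add_edge("Granite", "PlutonicRock")
taxonomy.add_edge("PlutonicRock", "IgneousRock")
taxonomy.add_edge("Basalt", "VolcanicRock")
taxonomy.add_edge("VolcanicRock", "IgneousRock")

# Subsumption check as reachability: is Granite a kind of IgneousRock?
print(nx.has_path(taxonomy, "Granite", "IgneousRock"))  # True

# All superclasses of Basalt (a reachability / path query):
print(nx.descendants(taxonomy, "Basalt"))  # {'VolcanicRock', 'IgneousRock'}
```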
28. Example of a Large Data Problem
- E.g., manipulation, analysis, and use of LIDAR (LIght Detection And Ranging) data
Ramon Arrowsmith, Chris Crosby, ASU
29. LIght Detection And Ranging
- Airborne scanning laser rangefinder
- Differential GPS
- Inertial Navigation System
- 30,000 points per second at 15 cm accuracy
- $400-$1,000/mi2 at ~10^6 points/mi2, or 0.04-0.1 cents/point (worked out below)
- Extensive filtering to remove tree canopy ("virtual deforestation")
Figure from R. Haugerud, USGS: http://duff.geology.washington.edu/data/raster/lidar/About_LIDAR.html
Ramon Arrowsmith, Chris Crosby, ASU
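The per-point cost quoted above follows directly from the survey cost and the point density; a quick check using the slide's own figures:

```python
# Survey cost divided by point density gives cost per laser return.
cost_per_mi2 = (400, 1000)   # dollars per square mile (slide's range)
points_per_mi2 = 1e6         # laser returns per square mile

for cost in cost_per_mi2:
    cents_per_point = cost / points_per_mi2 * 100
    print(f"${cost}/mi^2 -> {cents_per_point:.2f} cents/point")
# $400/mi^2 -> 0.04 cents/point; $1000/mi^2 -> 0.10 cents/point
```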
30. Northern San Andreas LIDAR Fault Geomorphology
[Paired images: Full-Feature DEM vs. Bare-Earth DEM]
Ramon Arrowsmith, Chris Crosby, ASU
31. Processing LiDAR Data: The Problems
- Huge datasets
  - 1 GB of point-return (.txt) data
  - Fort Ross, CA 7.5-minute quad: 150 MB of point-return (.txt) data, 5.5 MB after filtering for ground returns
- How do we grid these data?
  - ArcGIS can't handle it
  - Expensive commercial software is not an option for most data consumers
Ramon Arrowsmith, Chris Crosby, ASU
32. GRASS as a Processing Tool for LiDAR
- GRASS: open-source GIS
- Interpolation commands designed for large data sets
  - Splines use local point density to segment data into rectangular areas for interpolation
  - Can control spline tension and smoothness
- Modular configuration could easily be implemented within the GEON workflow (see the sketch below)
  - E.g., a user uploads point data to a remote site, where a GRASS interpolation module runs on a supercomputer and returns a raster file to the user
- Host the large LIDAR data sets on the GEON data node at SDSC, with access to large cluster computers
Ramon Arrowsmith, Chris Crosby, ASU
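A hedged sketch of the server-side gridding step, assuming GRASS's Python scripting interface and its v.surf.rst regularized-spline-with-tension interpolator. Map names, file names, and parameter values are placeholders rather than GEON's actual configuration, and the commands must run inside an initialized GRASS session.

```python
import grass.script as gs  # GRASS Python scripting interface

# Import the filtered ground-return points (x, y, z text file).
gs.run_command("v.in.ascii",
               input="ground_returns.txt", output="lidar_pts",
               separator=",", z=3, flags="z")

# Spline interpolation; v.surf.rst segments the data internally, and
# tension/smooth control the spline's stiffness and smoothing.
gs.run_command("v.surf.rst",
               input="lidar_pts", elevation="bare_earth_dem",
               tension=40, smooth=0.5)

# Export the raster so it can be returned to the user.
gs.run_command("r.out.gdal",
               input="bare_earth_dem", output="bare_earth_dem.tif")
```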
33. Accessing Data from More Than One Information Source: Federated Metadata Query
- Metadata query via GSIDs, a la LSIDs (Life Sciences Identifiers)
- Metadata querying middleware:
  - Search API
  - Result format (XML, URIs)
- Query result wrappers (return URIs) over sources such as SRB, a Grid metadata catalog, DLESE, THREDDS, IRIS, and the Geography Network (see the sketch below)
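An illustrative sketch of this federated pattern: one search call fans out to per-source wrappers, each of which returns URIs. The wrapper functions and the URIs they return are invented for the example; real wrappers would call each source's actual search API.

```python
from typing import Callable

# Each "wrapper" maps a keyword query to a list of URIs in its own source.
Wrapper = Callable[[str], list[str]]

def dlese_wrapper(query: str) -> list[str]:
    # Real code would call the DLESE search API; stubbed here.
    return [f"gsid://dlese.org/{query}/1"]

def iris_wrapper(query: str) -> list[str]:
    # Real code would query IRIS seismic metadata; stubbed here.
    return [f"gsid://iris.edu/{query}/7"]

def federated_search(query: str, wrappers: list[Wrapper]) -> list[str]:
    """Fan the query out to every registered wrapper and merge the URIs."""
    results: list[str] = []
    for w in wrappers:
        results.extend(w(query))
    return results

print(federated_search("moho-depth", [dlese_wrapper, iris_wrapper]))
```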
34. Federated GSID-based Data Access
- GSID-based request (e.g., gsid:srb..., gsid:odbc...)
- Data access middleware: maps URIs to local access protocols (see the sketch below)
- Access protocols include ArcXML, http, SRB, OPeNDAP, ftp, ODBC/JDBC, scp, GML, and GridFTP
- Returns data, item-level metadata, and collection-level metadata
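A sketch of the URI-to-protocol mapping layer described above: a GSID is resolved to a concrete access URL, which is then dispatched to a protocol handler. The scheme names, resolver table, and handler stubs are all hypothetical.

```python
from urllib.parse import urlsplit

def fetch_http(url: str) -> bytes:
    # Placeholder; real middleware would stream the data over HTTP.
    raise NotImplementedError

def fetch_srb(url: str) -> bytes:
    # Placeholder for an SRB (Storage Resource Broker) client call.
    raise NotImplementedError

# Dispatch table from protocol scheme to handler.
HANDLERS = {"http": fetch_http, "srb": fetch_srb}

# Toy resolver: item-level metadata maps a GSID to its access URL.
RESOLVER = {"gsid://geongrid.org/az-geology": "srb://srb.sdsc.edu/az_1000.shp"}

def access(gsid: str) -> bytes:
    """Resolve a GSID, then dispatch on the access URL's protocol scheme."""
    url = RESOLVER[gsid]
    scheme = urlsplit(url).scheme
    return HANDLERS[scheme](url)
```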
35. iGEON International Cooperation: Experiences to Date
- Canada
  - Geological Survey of Canada (Ottawa, Vancouver); Dr. Boyan Brodaric is one of the original team members of GEON
  - Contributing important data sets by setting up a WMS (Web Map Service) server at WestGrid in Vancouver, BC
  - 1 Gbps link from Vancouver to the GEON portal node at SDSC
- China
  - The Computational Geodynamics Lab will host a GEON PoP node for iGEON in China
- Australia
  - Interactions between GEON and EON (Earth and Ocean Network)
  - Work with Dietmar Mueller to help run mantle convection codes on Linux clusters and provide them as a Web service in GEON
- Russia, Kyrgyzstan
  - Held discussions with scientists from the Russian Academy on data integration and the use of Grid computing for geodynamics codes
36. International Cooperation: Planned
- Australia
  - Collaboration planned with ACCESS (www.access.edu.au), the Australian computational earth systems simulator; install a GEON node
- Mexico
  - Meeting planned between CICESE earth scientists and GEON re: connectivity into Mexico
- Japan
  - Sending an invitation to the Earth Simulator visualization group to attend the GEON visualization workshop
- UK
  - Visit to the UK e-Science Centre, June 28-29, 2004
- Targeted:
  - iGEON in Asia-Pacific could collaborate with the PRAGMA effort (Peter Arzberger, PI)
  - GEON will participate in the next PRAGMA meeting as one of the featured applications
37. Opportunities
- Define common standards, e.g.:
  - Global geosciences identifiers (URIs)
  - Ontologies (Semantic Web standards)
  - Web services definitions, and other standards
- Work towards linking GEON with other related efforts
- Travel funds for attending each other's science and IT workshops and individual meetings
- Sabbatical and training visits
- Share computing capabilities for geoscience applications
- Technologies for 3D and 4D visualization, on-demand computing, ...
38. FYI
- Cyberinfrastructure Summer Institute for the Geosciences
- August 16-20, 2004, San Diego
- See www.geongrid.org/summerinstitute for more information
39. National Laboratory for Advanced Data Research (NLADR): An SDSC/NCSA Data Collaboration
- Co-Directors:
  - Chaitan Baru, Data and Knowledge Systems (DAKS), SDSC
  - Michael Welge, Automated Learning Group (ALG), NCSA
40. NLADR Vision
- A collaborative R&D activity between NCSA (Illinois) and SDSC in advanced data technologies:
  - guided by real applications from science communities
  - to develop a broad data architecture framework
  - within which to develop, deploy, and test data-related technologies
  - in the context of a national-scale physical infrastructure (Internet-D)
41. NLADR Focus
- Solving the data needs of real applications
- Initially focused on some geoscience applications (GEON, LEAD)
- Also looking into environmental science applications (LTER, NEON, CLEANER)
- NLADR Fellows program: enables postdocs, faculty, and staff from the domain sciences to partner with NLADR staff
42. Core Activities
- Internet-D: fielding a distributed data testbed
- Core technologies and reference implementations of data cyberinfrastructure
- Standards activities
- Evaluation: usability and performance
43. Internet-D
- Distributed data testbed
  - Initially within a networked environment between SDSC and NCSA
  - Open to the community for testing new data management and data mining approaches, protocols, middleware, and technologies
- A minimum configuration will include:
  - Distributed infrastructure, e.g., cluster systems at each end point, with maximum memory, adequate disk capability, and high-speed network connectivity across the end points
- High-end configuration:
  - A prototype environment to represent very high-end, extreme capability
  - Provide the highest possible end-to-end, disk-to-disk bandwidth
  - Very large main memory and very large disk arrays
44. NLADR Core Technologies
- Core data services
  - Caching, replication, prefetching, multiple transfer streams (see the sketch below)
- Integration of distributed data
  - Integrate independently created, distributed, heterogeneous databases
- Mining complex data
  - Data mining of distributed, complex scientific data, including exploratory analysis and visualization
- Long-term data preservation
  - Developing tools to preserve data for long periods of time
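An illustrative sketch of the "multiple transfer streams" idea: fetching one large file over several concurrent HTTP range requests, analogous to the parallel TCP streams used by tools like GridFTP. The URL and sizes are placeholders, and the server is assumed to honor Range headers.

```python
import concurrent.futures
import urllib.request

URL = "http://example.org/big_dataset.bin"  # hypothetical data URL
CHUNK = 8 * 1024 * 1024                      # 8 MiB per stream

def fetch_range(start: int, end: int) -> bytes:
    """Fetch bytes [start, end] of URL with an HTTP Range request."""
    req = urllib.request.Request(URL, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def parallel_download(total_size: int, streams: int = 4) -> bytes:
    """Split the file into byte ranges and fetch them over parallel streams."""
    ranges = [(off, min(off + CHUNK, total_size) - 1)
              for off in range(0, total_size, CHUNK)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=streams) as pool:
        parts = pool.map(lambda r: fetch_range(*r), ranges)
    return b"".join(parts)
```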
45. NLADR Evaluation Activities
- Data Grid benchmarking efforts
  - Functionality and performance
  - In multi-user, concurrent-access environments
  - Online, on-demand
- Evaluate parallel file systems and parallel database systems
- Develop data experts for various modalities of data
- Investigate and characterize architectures and capabilities for long-term preservation
46. Joining NLADR
- No formal process yet
- Contact me (baru@sdsc.edu) if interested
- You should be willing to contribute one or more of:
  - Interesting applications
  - People's time, to work on NLADR objectives
  - Infrastructure (servers, storage, networking) towards Internet-D
47. Thank You!
- Visit www.geongrid.org
- Stay tuned for www.nladr.net
- My email: baru@sdsc.edu