GEON: The Geosciences Network
1
GEON: The Geosciences Network & The National Laboratory for Advanced Data Research (NLADR)
  • Chaitan Baru
  • Division Director, Science R&D
  • San Diego Supercomputer Center

2
Outline
  • About SDSC
  • Cyberinfrastructure projects
  • e.g., TeraGrid, BIRN, SCEC/CME, GEON, SEEK, NEES, ...
  • GEON
  • NLADR

3
SDSC Organization Chart
Director (Fran Berman); Exec Director (Vijay Samalam)
Administration; Operations; Strategic Partnerships; External Relations
  • User Services & Development (Anke Kamrath)
  • Consulting, Training, Documentation, User Portals, Outreach & Education, User Services
  • Production Systems (Richard Moore)
  • Allocated Systems, Production Servers, Networking Ops, SAN/Storage Ops, Servers/Integration, Security Ops, TeraGrid Operations
  • Technology R&D (Vijay Samalam)
  • Advanced Cyberinfrastructure Lab, SRB Lab, Networking Research, HPC Research, Tech Watch Group
  • Science R&D (Chaitan Baru)
  • SDSC/Cal-IT2 Synthesis Center
  • Data & Knowledge Labs
  • Science Projects: bio-, neuro-, eco-, geo-informatics
  • NLADR
4
An emphasis on end-to-end Cyberinfrastructure (CI)
  • Development of broad infrastructure, including services, not just computational cycles
  • Referred to as e-Science in the UK
  • A major emphasis at SDSC on data, information, and knowledge
  • Increased focus on:
  • Strategic applications and strategic communities
  • Training and outreach, e.g., Summer Institutes
  • Community codes, but also data collections and databases
  • Researcher-level services, e.g., Linux cluster management software, to ease the transition from a local environment to a large-scale computing environment

5
SDSC and CI Projects
  • SDSC is involved in several NSF- and NIH-funded, community-based CI projects:
  • TeraGrid: providing access to high-end, national-scale, physical computing infrastructure
  • BIRN: Biomedical Informatics Research Network, funded by NIH; integrating distributed brain image data
  • GEON: Geosciences Network; integrating distributed Earth Sciences data
  • SCEC/CME: Southern California Earthquake Center Community Modeling Environment
  • SEEK: Scientific Environment for Ecological Knowledge; integrating distributed biodiversity data along with tools
  • OptIPuter: distributed computing environment using Lambda Grids
  • NEES: Network for Earthquake Engineering Simulation; integrating distributed earthquake simulation and sensor data
  • ROADNet: Real-time Observatories, Applications, and Data management Network
  • TeraBridge: health monitoring of civil infrastructure

6
The TeraGrid: High-end Grid Infrastructure
(Map of TeraGrid sites, including PSC)
7
Typical Characteristics of CI Projects
  • Close collaboration between science and IT researchers
  • Need to provide data and information management as well as high-end computing
  • Storage management, archiving
  • Data modeling, semantic modeling: spatial, temporal, topic, process
  • Data and information visualization
  • Semantic integration of data
  • Logic-based formalisms to represent knowledge and map between ontologies
  • BIRN, SCEC, GEON, TeraBridge all have allocations on the TeraGrid
  • Convert community codes into Web/Grid services
  • Enable scientists to access much larger computing capability from a local cluster/desktop
  • Provide support for scientific workflow systems (visual programming environments for Web services)
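The last two bullets (converting community codes into Web/Grid services) can be sketched in miniature. Everything here (run_model, its parameters, the JSON request shape) is a hypothetical stand-in, not GEON's actual service interface:

```python
# Sketch: wrapping a community code as a callable "service" endpoint.
# run_model and its parameter names are invented for illustration only.
import json

def run_model(resolution, region):
    # Stand-in for launching a community code (e.g., via a batch scheduler);
    # a real deployment would submit a job and return a result handle.
    return {"status": "submitted", "resolution": resolution, "region": region}

def handle_request(body):
    """Parse a JSON service request and dispatch to the wrapped code."""
    params = json.loads(body)
    result = run_model(params["resolution"], params["region"])
    return json.dumps(result)

response = handle_request('{"resolution": 0.5, "region": "rocky_mtn"}')
print(response)
```

The point of the wrapper is that a scientist's desktop tool only needs to speak JSON over HTTP, while the heavy computation runs wherever the service is deployed.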

8
Biomedical Informatics Research Network: Example of a community Grid
PI of BIRN CC: Mark Ellisman. Co-Is of BIRN CC: Chaitan Baru, Phil Papadopoulos, Amarnath Gupta, Bertram Ludaescher
9
The GEONgrid: Another community Grid
Rocky Mountain Testbed
Mid-Atlantic Coast Testbed
www.geongrid.org
10
Project Overview
  • Close collaboration between geoscientists and IT researchers to interlink databases and Grid-enable applications
  • Deep data modeling of 4D data
  • Situating 4D data in context: spatial, temporal, topic, process
  • Semantic integration of Geosciences data
  • Logic-based formalisms to represent knowledge and map between ontologies
  • Grid computing
  • Deploy a prototype GEON grid: heterogeneous networks, compute nodes, storage capabilities. Enable sharing of data, tools, expertise. Specify and execute workflows
  • Interaction environments
  • Information visualization; visualization of concept maps
  • Remote data visualization via high-speed networks
  • Augmented reality in the field
  • Linkage to BIRN

11
Funding Sources
  • National Science Foundation ITR Project, 2002-2007, $11.6M
  • Also, $900K for Chronos, $1M for CUAHSI-HIS (NSF)
  • Partners
  • California Institute for Telecommunications and Information Technology, Cal-(IT)2
  • Chronos
  • CUAHSI-HIS
  • ESRI
  • Geological Survey of Canada
  • Georeference Online
  • HP
  • IBM
  • IRIS
  • Kansas Geological Survey
  • Lawrence Livermore National Laboratory
  • NASA Goddard, Earth System Division
  • Southern California Earthquake Center (SCEC)
  • U.S. Geological Survey (USGS)
  • Affiliated Project
  • EarthScope
  • PI Institutions
  • Arizona State University
  • Bryn Mawr College
  • Penn State University
  • Rice University
  • San Diego State University
  • San Diego Supercomputer Center/UCSD
  • University of Arizona
  • University of Idaho
  • University of Missouri, Columbia
  • University of Texas at El Paso
  • University of Utah
  • Virginia Tech
  • UNAVCO
  • Digital Library for Earth System Education (DLESE)

12
Science Drivers (1): DYSCERN (DYnamics, Structure, and Cenozoic Evolution of the Rocky Mountains)
  • The Rocky Mountain region is at the apex of a broad, dynamic orogenic plateau between the stable interior of North America and the active plate margin along the west coast.
  • For the past 1.8 billion years, the region has been the focus of repeated tectonic activity, and it has experienced complex intra-plate deformation for the past 300 million years.
  • The deformation processes involved are the subject of considerable debate.
  • GEON is undertaking an ambitious project to map the lithospheric structure in the Rocky Mountain region in a highly integrated analysis and input the result into a 3-D geodynamic model, to improve our understanding of the Cenozoic evolution of this region.

13
Science Drivers (2): CREATOR (Crustal Evolution: Anatomy of an Orogen)
  • The Appalachian Orogen is a continental-scale mountain belt that provides a geologic template to examine the growth and break-up of continents through plate tectonic processes. The record spans a period in excess of 1000 million years.
  • Focus on developing an integrated view of the collisional processes represented by the Siluro-Devonian Acadian Orogeny. Integration scenarios will require IT-based solutions, including the design of ontologies and new tools.
  • Research activities include:
  • Organization of a geologic and petrologic database for the mid-Atlantic test bed
  • Development of an ontologic framework to facilitate Web-based analysis of data
  • Registration of geologic and terrane maps, and data for igneous rocks
  • Application of data mining techniques for discovering similarities in geologic databases
  • Design of workflows for Web-based navigation and analysis of maps and igneous rock databases
  • Development of Web services for mineral and rock classification, including use of SVG-based graphics

14
(No Transcript)
15
GEONgrid Service Layers
  • Portal (login, myGEON): GeonSearch, GeoWorkbench
  • Workflow Services; Registration Services; Data Mediation Services; Indexing Services; Visualization & Mapping Services
  • Core Grid Services: authentication, monitoring, scheduling, catalog, data transfer, replication, collection management, databases
  • Physical Grid: RedHat Linux, ROCKS, Internet, I2, OptIPuter
16
GEON Workbench: Registration
  • Uploadable:
  • OWL ontologies
  • OWL inter-ontology mappings (articulations)
  • Data sets (shape files)
  • Semantic Registration:
  • Link data set D with ontology O1 (with an instance-based heuristic)
  • Query D using ontology O2
  • (e.g., rock classification: O1 = GSC, O2 = BGS)
  • Ontology-Enabled Application
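A minimal sketch of the semantic-registration idea: data set D is registered against ontology O1, and an articulation lets users pose queries in O2's vocabulary. The class names and the mapping are invented for illustration; the actual GSC/BGS rock-classification ontologies are far richer:

```python
# Sketch of semantic registration and cross-ontology query.
# Ontology terms and the articulation mapping are illustrative, not GSC/BGS data.

# Data set D: records tagged with classes from ontology O1 at registration time.
dataset = [
    {"unit": "A", "o1_class": "Granitoid"},
    {"unit": "B", "o1_class": "Basalt"},
]

# Articulation: maps classes of ontology O2 onto sets of O1 classes.
articulation_o2_to_o1 = {
    "PlutonicFelsic": {"Granitoid"},
    "VolcanicMafic": {"Basalt"},
}

def query_via_o2(o2_class):
    """Answer an O2-vocabulary query against data registered under O1."""
    o1_classes = articulation_o2_to_o1.get(o2_class, set())
    return [r["unit"] for r in dataset if r["o1_class"] in o1_classes]

print(query_via_o2("PlutonicFelsic"))
```

The articulation is the key artifact: once it exists, users never need to know which ontology the data owner used.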

17
A Multi-Hierarchical Rock Classification Ontology (GSC)
Facets: Genesis, Fabric, Composition, Texture
Kai Lin, SDSC; Boyan Brodaric, GSC
18
Geology Workbench: Uploading Ontologies
19
Geology Workbench Data Registration: Choose Ontology Class
20
Geology Workbench Data Registration, Step 2: Map data to the selected ontology
Shapefile attributes shown: AREA, PERIMETER, AZ_1000, AZ_1000_ID, GEO, PERIOD, ABBREV, DESCR, D_SYMBOL, P_SYMBOL
21
Geology Workbench Data Registration, Step 3: Resolve mismatches
22
Geology Workbench: Ontology-enabled Map Integrator
23
Geology Workbench: Change Ontology
24
GEON Ontology Development Workshops
  • Workshop format:
  • Led by GEON PIs
  • Involves a small group of domain experts from the community
  • Participation by a few IT experts in data modeling and knowledge representation
  • Igneous Petrology, led by Prof. Krishna Sinha, Virginia Tech, 2003
  • Seismology, led by Prof. Randy Keller, UT El Paso, Feb 24-25, 2004
  • Aqueous Geochemistry, led by Dr. William Glassley, Livermore Labs, March 2-3, 2004
  • Structural Geology, led by Prof. John Oldow, Univ. of Idaho, 2004
  • Metamorphic Petrology, led by Prof. Maria Crawford, Bryn Mawr, in planning
  • Chronos and CUAHSI are planning ontology efforts
  • Also, on-going ontology work in SCEC
  • Discussion with Steve Bratt, COO, W3C

25
Community-Based Ontology Development
  • Draft of an aqueous geochemistry ontology developed by scientists
Bill Glassley (LLNL), Bertram Ludaescher, Kai Lin (SDSC), et al.
26
Levels of Knowledge Representation
  • Controlled vocabularies
  • Database schemas (relational, XML, ...)
  • Conceptual schemas (ER, UML, ...)
  • Thesauri (synonyms, broader term/narrower term)
  • Taxonomies
  • Informal/semi-formal representations
  • Concept spaces, concept maps
  • Labeled graphs / semantic networks (RDF)
  • Formal ontologies, e.g., in Description Logic (OWL)
  • Formalization of a specification
  • Constrains possible interpretations of terms
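The taxonomy and reasoning levels above can be illustrated with a toy broader-term hierarchy and a subsumption check (the rock terms are invented, not drawn from any of the project's ontologies):

```python
# Sketch: a taxonomy as broader-term links, with a simple subsumption test,
# illustrating the "taxonomies" and "reasoning" levels listed above.
broader = {
    "Granite": "PlutonicRock",
    "PlutonicRock": "IgneousRock",
    "Basalt": "VolcanicRock",
    "VolcanicRock": "IgneousRock",
}

def is_subsumed_by(term, ancestor):
    """True if `ancestor` is reachable from `term` via broader-term links."""
    while term in broader:
        term = broader[term]
        if term == ancestor:
            return True
    return False

print(is_subsumed_by("Granite", "IgneousRock"))  # True
```

Formal ontology languages like OWL generalize this single-parent chain to multiple hierarchies, properties, and logical constraints, which is why Description Logic reasoners are needed at the top level.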

27
Use of Knowledge Structures
  • Conceptual models of a domain or application (communication means, system design, ...)
  • Classification of:
  • concepts (taxonomy), and
  • data/object instances through classes
  • Analysis of ontologies, e.g.:
  • Graph queries (reachability, path queries, ...)
  • Reasoning (concept subsumption, consistency checking, ...)
  • Targets for semantic data registration
  • Conceptual indexes and views for:
  • searching,
  • browsing,
  • querying, and
  • integration of registered data
28
Example of a Large Data Problem (Ramon Arrowsmith, Chris Crosby, Arizona State University)
  • e.g., manipulation, analysis, and use of LIDAR (LIght Detection And Ranging) data

Ramon Arrowsmith, Chris Crosby, ASU
29
LIght Detection And Ranging
  • Airborne scanning laser rangefinder
  • Differential GPS
  • Inertial Navigation System
  • 30,000 points per second at 15 cm accuracy
  • $400-1000/mi2, ~10^6 points/mi2, or 0.04-0.1 cents/point
  • Extensive filtering to remove tree canopy (virtual deforestation)

Figure from R. Haugerud, USGS:
http://duff.geology.washington.edu/data/raster/lidar/About_LIDAR.html
Ramon Arrowsmith, Chris Crosby, ASU
30
Northern San Andreas LIDAR fault geomorphology
Ramon Arrowsmith, Chris Crosby, ASU
Full Feature DEM
Bare Earth DEM
31
Processing LiDAR data: the problems
  • Huge datasets
  • 1 GB of point return (.txt) data
  • 150 MB of point return (.txt) data for the Fort Ross, CA 7.5-min quad
  • 5.5 MB after filtering for ground returns
  • How do we grid these data?
  • ArcGIS can't handle it
  • Expensive commercial software is not an option for most data consumers
Ramon Arrowsmith, Chris Crosby, ASU
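As a toy illustration of the gridding problem (not the spline interpolation GRASS uses, described on the next slide), irregular ground returns can be binned into a raster by averaging elevations per cell. The coordinates below are synthetic:

```python
# Sketch: grid irregular LiDAR ground returns into a raster DEM by averaging
# point elevations per cell. A deliberately simple alternative to spline
# interpolation, for illustration only; input points are synthetic.
import numpy as np

def grid_points(x, y, z, cell=1.0):
    cols = ((x - x.min()) / cell).astype(int)
    rows = ((y - y.min()) / cell).astype(int)
    dem = np.full((rows.max() + 1, cols.max() + 1), np.nan)
    sums = np.zeros_like(dem)
    counts = np.zeros_like(dem)
    for r, c, elev in zip(rows, cols, z):
        sums[r, c] += elev      # accumulate elevations falling in each cell
        counts[r, c] += 1
    mask = counts > 0
    dem[mask] = sums[mask] / counts[mask]   # empty cells stay NaN
    return dem

x = np.array([0.2, 0.8, 1.5])
y = np.array([0.3, 0.4, 1.2])
z = np.array([10.0, 12.0, 20.0])
print(grid_points(x, y, z))
```

Even this naive scheme makes the scale problem concrete: a full quad's worth of points must stream through memory once per output raster, which is why hosted processing on large machines is attractive.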
32
GRASS as a processing tool for LiDAR
  • GRASS: open-source GIS
  • Interpolation commands designed for large data sets
  • Splines use local point density to segment data into rectangular areas for interpolation
  • Can control spline tension and smoothness
  • Modular configuration could easily be implemented within the GEON workflow
  • e.g., the user uploads point data to a remote site, where a GRASS interpolation module runs on a supercomputer and returns a raster file to the user
  • Host the large LIDAR data sets on the GEON Data Node at SDSC, with access to large cluster computers
Ramon Arrowsmith, Chris Crosby, ASU
33
Accessing data from more than one information source: Federated Metadata Query
  • Metadata Querying Middleware
  • Search API
  • Result format (XML, URIs)
  • GSID, a la LSID (Life Sciences Identifiers)
  • Query Result Wrappers (return URIs) for: SRB, Grid Metadata Catalog, DLESE, THREDDS, IRIS, Geography Network
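The wrapper architecture above can be sketched as a fan-out search: each source wrapper answers the same query and returns URIs in a common result format. The sources, catalog contents, and gsid URI shape here are placeholders, not the real SRB/DLESE/IRIS interfaces:

```python
# Sketch of a federated metadata query over per-source result wrappers.
# Catalog contents and the gsid:// URI shape are invented for illustration.
def make_wrapper(source, catalog):
    """Build a wrapper that answers keyword queries with URIs for one source."""
    def search(keyword):
        return [f"gsid://{source}/{name}" for name in catalog if keyword in name]
    return search

wrappers = [
    make_wrapper("srb", ["rocky_mtn_gravity", "appalachian_maps"]),
    make_wrapper("iris", ["rocky_mtn_seismic"]),
]

def federated_search(keyword):
    """Merge URI results from every registered source wrapper."""
    results = []
    for search in wrappers:
        results.extend(search(keyword))
    return results

print(federated_search("rocky_mtn"))
```

Because every wrapper emits URIs in the same format, adding a new source means writing one wrapper, with no change to clients.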
34
Federated GSID-based Data Access
  • GSID-based requests (e.g., gsid:srb..., gsid:odbc...)
  • Data Access Middleware
  • Map URIs to local access protocols
  • Supported protocols include: ArcXML, HTTP, SRB, OPeNDAP, FTP, ODBC/JDBC, scp, GML, GridFTP
  • Data; item-level metadata; collection-level metadata
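A sketch of the URI-to-protocol mapping the data access middleware performs; the handler table, the example URI, and what each handler does are illustrative assumptions, not GEON's actual middleware:

```python
# Sketch: dispatch a URI to a local access-protocol handler by scheme.
# Handlers here just report which protocol they would use; a real system
# would invoke the SRB client, an HTTP fetch, ODBC, etc.
from urllib.parse import urlparse

handlers = {
    "http": lambda uri: ("http", uri),
    "ftp": lambda uri: ("ftp", uri),
    "srb": lambda uri: ("srb", uri),   # would call SRB client APIs
}

def fetch(uri):
    """Resolve a URI to its protocol handler and delegate the access."""
    scheme = urlparse(uri).scheme
    try:
        return handlers[scheme](uri)
    except KeyError:
        raise ValueError(f"no handler for protocol {scheme!r}")

print(fetch("srb://sdsc/geon/collection/item")[0])  # 'srb'
```

The dispatch table is the whole design: clients hold stable identifiers, and the middleware alone knows which of the many protocols listed above actually moves the bytes.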
35
iGEON: International Cooperation, Experiences to date
  • Canada
  • Geological Survey of Canada (Ottawa, Vancouver): Dr. Boyan Brodaric is one of the original team members of GEON
  • Contributing important data sets by setting up a WMS (Web Mapping Services) server at WestGrid in Vancouver, BC
  • 1 Gbps link from Vancouver to the GEON portal node at SDSC
  • China
  • The Computational Geodynamics Lab will host a GEON PoP node for iGEON in China
  • Australia
  • Interactions between GEON and EON (Earth and Ocean Network)
  • Work with Dietmar Mueller to help run mantle convection codes on Linux clusters and provide them as a Web service in GEON
  • Russia, Kyrgyzstan
  • Held discussions with scientists from the Russian Academy on data integration and use of Grid computing for geodynamics codes

36
International Cooperation: Planned
  • Australia
  • Collaboration planned with ACCESS (www.access.edu.au), the Australian computational earth systems simulator. Install a GEON node.
  • Mexico
  • Meeting planned between CICESE earth scientists and GEON regarding connectivity into Mexico
  • Japan
  • Sending an invitation to the Earth Simulator visualization group to attend the GEON Visualization workshop
  • UK
  • Visit to the UK e-Science Centre, June 28/29, 2004
  • Targeted ?
  • iGEON in Asia-Pacific could collaborate with the PRAGMA effort (Peter Arzberger, PI)
  • GEON will participate in the next PRAGMA meeting as one of the featured applications

37
Opportunities
  • Define common standards, e.g.:
  • Global Geosciences Identifiers (URIs)
  • Ontologies (Semantic Web standards)
  • Web services definitions, and other standards
  • Work towards linking GEON with other related efforts
  • Funds for travel to each other's science and IT workshops and individual meetings
  • Sabbatical and training visits
  • Share computing capabilities for Geoscience applications
  • Technologies for 3D and 4D visualizations, on-demand computing, ...

38
FYI
  • Cyberinfrastructure Summer Institute for the Geosciences
  • August 16-20, 2004, San Diego
  • See www.geongrid.org/summerinstitute for more information

39
National Laboratory for Advanced Data Research (NLADR): An SDSC/NCSA Data Collaboration
  • Co-Directors:
  • Chaitan Baru, Data and Knowledge Systems (DAKS), SDSC
  • Michael Welge, Automated Learning Group (ALG), NCSA

40
NLADR Vision
  • Collaborative R&D activity between NCSA (Illinois) and SDSC in advanced data technologies
  • Guided by real applications from science communities
  • To develop a broad data architecture framework
  • Within which to develop, deploy, and test data-related technologies
  • In the context of a national-scale physical infrastructure (Internet-D)

41
NLADR Focus
  • Solving the data needs of real applications
  • Initially focused on some Geoscience applications (GEON, LEAD)
  • Also looking into environmental science applications (LTER, NEON, CLEANER)
  • NLADR Fellows program: enable postdocs, faculty, and staff from domain sciences to partner with NLADR staff

42
Core Activities
  • Internet-D: fielding a distributed data testbed
  • Core technologies and reference implementations of data cyberinfrastructure
  • Standards activities
  • Evaluation: usability and performance

43
Internet-D
  • Distributed data testbed
  • Initially, within a networked environment between SDSC and NCSA
  • Open to the community for testing new data management and data mining approaches, protocols, middleware, and technologies
  • A minimum configuration will include:
  • Distributed infrastructure, e.g., cluster systems at each end-point, with maximum memory and adequate disk capability, and high-speed network connectivity across the end points
  • High-end configuration:
  • Prototype environment to represent very high-end, extreme capability
  • Provide the highest possible end-to-end bandwidth, from disk to disk
  • Very large main memory and very large disk arrays

44
NLADR Core Technologies
  • Core data services
  • Caching, replication, prefetching, multiple transfer streams
  • Integration of distributed data
  • Integrate independently created, distributed, heterogeneous databases
  • Mining complex data
  • Data mining of distributed, complex scientific data, including exploratory analysis and visualization
  • Long-term data preservation
  • Developing tools to preserve data for long periods of time
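One of the core data services listed above, caching with sequential prefetch, can be sketched as follows; the block naming scheme and the fetch stand-in are invented for illustration:

```python
# Sketch: a tiny read cache with prefetch of the next block, assuming
# fixed-size numbered blocks on a remote store. remote_fetch is a stand-in
# for an actual network or disk transfer.
fetched = []

def remote_fetch(block_id):
    fetched.append(block_id)          # record each simulated transfer
    return f"data-{block_id}"

cache = {}

def read_block(block_id):
    """Serve from cache; on a miss, fetch the block and prefetch its successor."""
    if block_id not in cache:
        cache[block_id] = remote_fetch(block_id)
        cache[block_id + 1] = remote_fetch(block_id + 1)   # sequential prefetch
    return cache[block_id]

read_block(0)          # miss: fetches blocks 0 and 1
read_block(1)          # hit: served from cache, no new fetch
print(fetched)         # [0, 1]
```

For the streaming access patterns typical of scientific data, prefetching the next block hides transfer latency behind computation, which is why it sits alongside replication and multi-stream transfers in the list above.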

45
NLADR Evaluation Activities
  • Data Grid benchmarking efforts
  • Functionality and performance
  • In multi-user, concurrent-access environments
  • Online, on-demand
  • Evaluate parallel filesystems and parallel database systems
  • Develop data experts for various modalities of data
  • Investigate and characterize architectures and capabilities for long-term preservation

46
Joining NLADR
  • No formal process yet
  • Contact me (baru@sdsc.edu) if interested
  • Should be willing to contribute one or more of:
  • Interesting applications
  • People's time, to work on NLADR objectives
  • Infrastructure (servers, storage, networking) towards Internet-D

47
Thank You!
  • Visit www.geongrid.org
  • Stay tuned for www.nladr.net
  • My email: baru@sdsc.edu