Title: Geoscience Knowledge Representation Using the SWEET Ontologies
1Geoscience Knowledge Representation Using the
SWEET Ontologies
- Rob Raskin
- Jet Propulsion Laboratory
2Transforming Data into Knowledge
Data, Information, Knowledge (progression from syntax to semantics)
- Basic Elements: Bytes, Numbers, Models, Facts
- Services: Ingest, Archive, Visualize, Infer, Understand, Predict
- Storage: File, Database, HDF-EOS, GIS, MIS, Ontology, Mind
- Interoperability: Syntactic, OPeNDAP, WMS/WCS, Semantic
- Volume/Density: High/Low, Low/High
- Statistics: Checksum, Moments, Descriptive, Inferential
- Analysis: Fourier, Wavelet, EOF, SSA
- Methodology: Exploratory analysis, Model-based mining
3What is Knowledge?
- Facts, relations, meanings, contexts
- Organized information
- Core ingredient in common sense
- Common understanding
- In a form to apply reasoning/inference
- Dynamic
- Expandable
4Semantic Understanding is Difficult!
- Sea surface temperature measured 3 m above surface vs. sea surface temperature measured at surface
- Data quality: 5
- Variable t: temperature vs. variable t: time
- "Let's eat, Grandma." vs. "Let's eat Grandma."
- "Time flies like an arrow." vs. "Fruit flies like a pie."
- LA Times headline
- "Mission accomplished. Major combat operations in Iraq have ended."
5Database vs Knowledge Base
- Database
- Entities and Relations
- Closed world
- All facts included
- Knowledge base
- Classes and Properties
- Collection of facts
- Captures corporate memory
- Open world
- Facts not stated may be either true or untrue
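The closed-world and open-world distinction above can be sketched in a few lines of Python. This is only an illustration: the fact set and both function names are hypothetical, and a real database or reasoner is far more involved.

```python
from typing import Optional

# Hypothetical facts for illustration; not taken from any real knowledge base.
FACTS = {
    ("AIRS", "measures", "Temperature"),
    ("Ocean", "hasSubstance", "Water"),
}

def closed_world_query(fact: tuple) -> bool:
    """Database-style answer: anything not stored is treated as false."""
    return fact in FACTS

def open_world_query(fact: tuple) -> Optional[bool]:
    """Knowledge-base-style answer: anything not stated is unknown (None)."""
    return True if fact in FACTS else None

stated = ("AIRS", "measures", "Temperature")
unstated = ("AIRS", "measures", "Humidity")

print(closed_world_query(stated), closed_world_query(unstated))
print(open_world_query(stated), open_world_query(unstated))
```

The only difference is how a missing fact is reported: the closed-world query denies it, while the open-world query declines to decide.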
6PO.DAAC Knowledge Bases
Public access
People
Documents
Roles/Tasks
Data Processing
(Docushare)
Data Products
Metadata
Tools/Services
Web Pages
Science Concepts
Missions
Instruments
Organizations
Applications
Announcements
Inquiries
Computers
7Relations
- People have roles
- Instruments measure science parameters
- Inquiries relate to data products
- etc.
8Example of Knowledge-Assisted Service
- Yellow Page Lookup
- cars vs automobiles
- Hotels vs motels vs resorts
9Semantic-based Service Example: Google
- Type "gymnasiums in Seattle" into Google
- Generates map of Seattle with dots locating gyms
- Google understands that
- Seattle is a place
- A gymnasium is a place-based service
- Google understands semantics, so the search results could also include
- Locations near Seattle
- Similar services (e.g., health club)
10Assertion of Facts as Triples
- Subject-Verb-Object representation
- Flood subClassOf WeatherPhenomena
- HDF subClassOf FileFormat
- Pressure subClassOf PhysicalProperty
- Ocean hasSubstance Water
- AIRS measures Temperature
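As a minimal sketch, the triples above can be held as plain Python tuples and queried by pattern. The triples are copied from the slide; the `match` helper is a hypothetical name introduced here, not part of any ontology toolkit.

```python
# Triples copied from the slide; the helper name is illustrative only.
TRIPLES = [
    ("Flood", "subClassOf", "WeatherPhenomena"),
    ("HDF", "subClassOf", "FileFormat"),
    ("Pressure", "subClassOf", "PhysicalProperty"),
    ("Ocean", "hasSubstance", "Water"),
    ("AIRS", "measures", "Temperature"),
]

def match(subject=None, verb=None, obj=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        (s, v, o)
        for (s, v, o) in TRIPLES
        if subject in (None, s) and verb in (None, v) and obj in (None, o)
    ]

subclass_facts = match(verb="subClassOf")
airs_facts = match(subject="AIRS")
print(subclass_facts)
print(airs_facts)
```

Leaving any position as a wildcard turns the same store into a lookup by subject, by verb, or by object.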
11Applications
- Software tools can find meaning in resources for
- Discovery
- Fusion
- Lineage
- Requirements
- Data products associated with objects in science concept space
- Richer descriptions than DIFs
- Data services associated with objects in service concept space
- Richer descriptions than SERFs
- Search/fusion tools that exploit ontologies
12Semantic Web Vision
- Web page creators place XML tags around technical terms on web pages
- XML tags point to knowledge base where term is defined
- Search tools use this information to provide value-added services
- Common search engines (Google) use these capabilities only minimally, at present
13Ontologies
- Current preferred method to store facts
- General definition: all that is known
- Computer science definition: machine-readable definition of terms and how they relate to one another
- As with a dictionary, terms are defined in terms of other terms
- Provide shared understanding of concepts
- Support knowledge reuse
- Support machine-to-machine communications with
deeper semantics than controlled vocabulary
14XML-based Ontology Languages
- XML satisfies desired properties for language syntax
- Readable by both humans and machines
- However, there are too many possible ways that XML tags can be named and used
- No standardization of XML tag meanings as in HTML (where the <b> </b> pair renders in bold)
- Additional standardized semantics needed to exploit shared understanding of concepts
15RDF and OWL
- W3C has adopted languages that specialize XML
- Resource Description Framework (RDF)
- Web Ontology Language (OWL)
- Languages predefine specific tags
- RDF: Class, subclass, property, subproperty, ...
- RDF and OWL form a nested collection of languages, each roughly a specialization of the preceding language with further shared understanding
- XML
- RDF
- RDFS
- OWL Lite
- OWL DL
- OWL Full
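To make the idea of predefined tags concrete, the sketch below serializes one subclass assertion as RDF/XML using Python's standard `xml.etree.ElementTree`. The RDF and RDFS namespace URIs are the W3C ones; the `example.org` namespace and the `subclass_document` helper are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical namespace used only for this example.
EXAMPLE_NS = "http://example.org/earth#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rdfs", RDFS_NS)

def subclass_document(child: str, parent: str) -> str:
    """Serialize 'child subClassOf parent' as RDF/XML using predefined tags."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    cls = ET.SubElement(
        root, f"{{{RDFS_NS}}}Class", {f"{{{RDF_NS}}}about": EXAMPLE_NS + child}
    )
    ET.SubElement(
        cls, f"{{{RDFS_NS}}}subClassOf", {f"{{{RDF_NS}}}resource": EXAMPLE_NS + parent}
    )
    return ET.tostring(root, encoding="unicode")

xml_text = subclass_document("Flood", "WeatherPhenomena")
print(xml_text)
```

Unlike an arbitrary XML tag, `rdfs:subClassOf` has a meaning every RDFS-aware tool agrees on, which is the added shared understanding the nesting of languages provides.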
16Semantic Web for Earth and Environmental
Terminology (SWEET)
- SWEET is a concept space
- Enables scalable classification of Earth system science concepts
- Currently being expanded to Space science
- Anybody can import, expand, and specialize the work of others
- No need to regenerate a physics, chemistry, or math ontology
- Concept space is translatable into other languages/cultures using sameAs notions
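One way to picture translation through sameAs is to group terms into equivalence sets by following the links. The sketch below does this with a small union-find; the language-prefixed terms and function names are hypothetical and do not come from SWEET.

```python
# Hypothetical sameAs assertions linking terms from different vocabularies.
SAME_AS = [
    ("en:Ocean", "es:Oceano"),
    ("es:Oceano", "fr:Ocean"),
    ("en:Flood", "es:Inundacion"),
]

def build_equivalence(pairs):
    """Group terms into equivalence sets by following sameAs links."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            term = parent[term]
        return term

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for term in parent:
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())

def equivalents(term, groups):
    """Return all terms asserted (directly or transitively) to be the same."""
    for group in groups:
        if term in group:
            return group
    return {term}

groups = build_equivalence(SAME_AS)
print(sorted(equivalents("en:Ocean", groups)))
```

Because sameAs is treated as transitive here, two terms never linked directly can still resolve to the same concept.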
17SWEET Ontologies and Their Interrelationships
Faceted Ontologies
Non-Living Substances
Living Substances
Integrative Ontologies
Physical Processes
Natural Phenomena
Earth Realm
Human Activities
Physical Properties
Data
Time
Space
Units
Numerics
18SWEET as an Upper Level Earth Science Ontology
Math
Physics
Chemistry
Space
import
Property EarthRealm Process, Phenomena Substance
Data
SWEET
Time
import
Stratospheric Chemistry
Biogeochemistry
Specialized domains
19Why an Upper-Level Ontology for Earth System
Science?
- Many common concepts used across Earth Science disciplines (such as properties of the Earth)
- Provides common definitions for terms used in multiple disciplines or communities
- Provides common language in support of community and multidisciplinary activities
- Provides common properties (relations) for tool developers
- Reduced burden (and barrier to entry) on creators of specialized domain ontologies
- Only need to create ontologies for incremental knowledge
20How SWEET was Initially Populated
- Initial sources
- GCMD
- Over 10,000 datasets
- Over 1,000 keywords
- Data providers submit far more than the 1,000 terms for free-text search
- CF
- Over 500 keywords
- Very long term names
- surface_downwelling_photon_spherical_irradiance_in_sea_water
- Decomposed into facets
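The decomposition of a long CF name into facets might look like the following sketch. The facet names, their vocabularies, and the greedy longest-match strategy are all assumptions made for illustration; the slides do not say how SWEET actually performed the decomposition.

```python
# Hypothetical facet vocabulary; SWEET's actual facets and terms may differ.
FACETS = {
    "position": {"surface"},
    "direction": {"downwelling", "upwelling"},
    "quantity": {"photon_spherical_irradiance", "temperature"},
    "medium": {"sea_water", "air"},
}
# Connector words in the name that carry no facet of their own.
CONNECTORS = {"in", "of", "at"}

def decompose(cf_name: str) -> dict:
    """Greedily match the longest known facet term at each position."""
    tokens = cf_name.split("_")
    result, i = {}, 0
    while i < len(tokens):
        if tokens[i] in CONNECTORS:
            i += 1
            continue
        for end in range(len(tokens), i, -1):  # longest candidate first
            candidate = "_".join(tokens[i:end])
            facet = next((f for f, terms in FACETS.items() if candidate in terms), None)
            if facet:
                result[facet] = candidate
                i = end
                break
        else:
            # No known term starts here; record the token and move on.
            result.setdefault("unmatched", []).append(tokens[i])
            i += 1
    return result

facets = decompose("surface_downwelling_photon_spherical_irradiance_in_sea_water")
print(facets)
```

Each facet can then be mapped to a concept in its own small ontology instead of maintaining one entry per compound name.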
21Spatial Ontology
- Concepts of 0-D, 1-D, 2-D, and 3-D objects
- Default coordinate system: lat/lon/up
- Polygons used to store spatial extents
- Spatial attributes added (population, area, etc.)
- Scientific applications include geology to represent 3-D structure
22Numerical Ontologies
- Numerics
- Extents: interval, point, 0, positiveIntegers, ...
- Relations: lessThan, greaterThan, ...
- SpatialEntities
- Extents: country, Antarctica, equator, inlet, ...
- Relations: above, northOf, ...
- TemporalEntities
- Extents: duration, century, season, ...
- Relations: after, before, ...
23Numerical Ontologies (cont.)
- Numeric concepts defined in OWL only through standard XML XSD spec
- Intervals defined as restrictions on real line
- Numerical relations defined in SWEET
- lessThan, max, ...
- Cartesian product (multidimensional spaces) added in SWEET
- Numeric ontologies used to define spatial and temporal concepts
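A rough Python analogue of these numeric building blocks is sketched below: an interval as a restriction on the real line, a lessThan relation between intervals, and a Cartesian product of intervals standing in for a spatial extent. The class, the function names, and the latitude/longitude box are hypothetical and only mirror the ideas on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """A closed interval [low, high], i.e. a restriction on the real line."""
    low: float
    high: float

    def contains(self, x: float) -> bool:
        return self.low <= x <= self.high

    def less_than(self, other: "Interval") -> bool:
        """True when every point of self lies below every point of other."""
        return self.high < other.low

def in_box(point, box) -> bool:
    """Membership in a Cartesian product of intervals, one per dimension."""
    return len(point) == len(box) and all(iv.contains(x) for x, iv in zip(point, box))

# Hypothetical latitude/longitude box built as a product of two intervals.
tropics = (Interval(-23.5, 23.5), Interval(-180.0, 180.0))
print(in_box((10.0, 45.0), tropics))
print(Interval(0, 1).less_than(Interval(2, 3)))
```

The same pattern extends to time: a season or century is just an interval on a temporal axis, and "before" corresponds to `less_than`.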
24Conceptual Ontologies
- Phenomena
- ElNino, Volcano, Thunderstorm, Deforestation
- Each has associated spatial/temporal extent, EarthRealms, PhysicalProperties, etc.
- Specific instances included
- e.g., 1997-98 ElNino
- Human Activities
- Fisheries, IndustrialProcessing, Economics, Public Good
- State
- History or state of planet or component
25SWEET Users
- ESML - Earth Science Markup Language
- ESIP - Earth Science Information Partner Federation
- GEON - Geosciences Network
- GENESIS - Global Environmental Earth Science Information System
- IRI - International Research Institute (Columbia)
- LEAD - Linked Environments for Atmospheric Discovery
- MMI - Marine Metadata Initiative
- NOESIS
- PEaCE - Pacific Ecoinformatics and Computational Ecology
- SESDI - Semantically Enabled Science Data Integration
- VSTO - Virtual Solar-Terrestrial Observatory
26Collaboration Web Site
- Discussion tools
- Blog, wiki, moderated discussion board
- Version Control/Configuration Management
- Trace dependencies on external ontologies
- Tools to search for existing concepts in registered ontologies
- Ontology Validation Procedure
- W3C note is formal submission method
- Registry/discovery of ontologies
- Support workflows/services for ontology development
27Community Issues
- Content
- Maintain alignment given expansion of classes and properties
- Standards and Conventions
- Agreement on standards for use of OWL
- Fuzzy representation conventions
- Review Board
- Who will oversee and maintain in perpetuity (or at least through the next funding cycle)?
- ESIP Federation? ESSI?
- Global Support
- Provide tools to visualize and appreciate the big picture
28Update/Matching Issues
- No removal of terms except for spelling or factual errors
- Subscription service to notify affected ontologies when changes made
- Must avoid contradictions
- Additions can create redundancy if sameAs not used
- Humans must oversee matching
- CF has established moderator to carry out analogous additions
- OWL import imports entire file
- Associate community with ontology terms
- Community tagging
29Best Practices
- Keep ontologies small, modular
- Be careful: owl:imports imports everything
- Use higher level ontologies where possible
- Identify hierarchy of concept spaces
- Model schemas
- Try to keep dependencies unidirectional
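Two of these practices lend themselves to a quick check: seeing how much an import really pulls in, and confirming that dependencies stay unidirectional. The sketch below assumes a hypothetical import graph with made-up ontology names; it does not read real OWL files.

```python
# Hypothetical ontology files and their direct imports.
IMPORTS = {
    "stratospheric_chemistry": ["sweet_substance", "sweet_process"],
    "sweet_process": ["sweet_property"],
    "sweet_substance": ["sweet_property"],
    "sweet_property": ["sweet_units"],
    "sweet_units": [],
}

def import_closure(name, graph):
    """Everything loaded when `name` is imported, since imports are transitive."""
    seen, stack = set(), list(graph.get(name, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen

def has_cycle(graph):
    """True if any ontology reaches itself through its own imports."""
    return any(name in import_closure(name, graph) for name in graph)

print(sorted(import_closure("stratospheric_chemistry", IMPORTS)))
print(has_cycle(IMPORTS))
```

Small, modular files keep the closure short, and a `has_cycle` result of `False` indicates the dependencies run in one direction only.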