Title: Semantic Data Integration and Ontologies
1Semantic Data Integration and Ontologies
- Peter Fox
- High Altitude Observatory, NCAR
- With thanks to Deborah McGuinness, Rob Raskin,
Krishna Sinha, Luca Cinquini and others
2Outline
- Background, definitions
- Semantic Web basics and ontologies
- Semantic Web Review and Technical Benefit Examples
- Methodology for building ontologies
- Summary
- Additional Material
- Virtual Observatories, use cases, ontology
- Data integration examples
- Editors, tools, triple stores, etc.
- More information
3Background
- Scientists should be able to access a global, distributed knowledge base of scientific data that
- appears to be integrated
- appears to be locally available
- But data is obtained by multiple instruments, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed
- And there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology
4Definitions
- Semantic Web
- An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation, www.semanticweb.org
- Primer http://www.ics.forth.gr/isl/swprimer/
- Semantic Grid
- Semantic services to use the resources of many computers connected by a network to solve large scale computational problems
- Ontology (n.d.).
- An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.
- The Free On-line Dictionary of Computing. http://dictionary.reference.com/browse/ontology
- Provenance
- origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility.
- Closed World: where complete knowledge is known/encoded, AI relied on this
- Open World: where knowledge is incomplete/evolving, SW promotes this
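The difference between the closed-world and open-world views comes down to how a missing statement is treated. A minimal Python sketch, assuming a toy fact set and two hypothetical helper names (`ask_closed_world`, `ask_open_world`) that do not come from the talk:

```python
# Toy knowledge base of (subject, predicate, object) facts.
# The contents are illustrative only.
FACTS = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
}


def ask_closed_world(fact):
    """Closed world: anything that is not stated is taken to be false."""
    return fact in FACTS


def ask_open_world(fact):
    """Open world: anything that is not stated is unknown (None), not false."""
    return True if fact in FACTS else None


query = ("Interferometer", "has", "Detector")  # hypothetical, unstated fact
print(ask_closed_world(query))  # False
print(ask_open_world(query))    # None
```

Under the open-world reading, new statements can arrive later without contradicting earlier answers, which is why it suits incomplete and evolving knowledge.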
5Semantic Web Basics
- The triple: subject-predicate-object
- Interferometer is-a optical instrument
- Optical instrument has focal length
- An ontology is a representation of this knowledge
- W3C is the primary (but not sole) governing organization for languages, specifications, best practices, etc.
- RDF - Resource Description Framework; programming environments for 14 languages, including C, C++, Python, Java, Javascript, Ruby, PHP
- OWL 1.0 - Web Ontology Language (OWL 1.1 on the way) - OWL-Lite, OWL-DL, OWL-Full
- Encode the knowledge in triples, in a triple-store; software is built to traverse the semantic network, which can be queried or reasoned upon
- Put semantics between/in your interfaces, i.e. between layers and components in your architecture, i.e. between users and information, to mediate the exchange
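To make the idea of encoding knowledge as triples and querying it concrete, here is a minimal in-memory sketch in Python. The `TripleStore` class and its `match` method are hypothetical names for illustration, not the API of any real triple store; the two triples are the examples from this slide.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """A toy triple store: a set of triples plus wildcard pattern matching."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in self._triples:
            if ((s is None or t[0] == s)
                    and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


store = TripleStore()
store.add("Interferometer", "is-a", "OpticalInstrument")
store.add("OpticalInstrument", "has", "FocalLength")

# Which things are stated to be optical instruments?
instruments = sorted(t[0] for t in store.match(p="is-a", o="OpticalInstrument"))
print(instruments)  # ['Interferometer']
```

Real triple stores add persistence, indexing, and a query language such as SPARQL on top of this basic pattern-matching idea.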
6 Ontology Spectrum
[Figure: the ontology spectrum, from least to most formal: Catalog/ID; Terms/glossary; Thesauri "narrower term" relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Selected Logical Constraints (disjointness, inverse, ...); General Logical constraints]
Originally from AAAI 1999 - Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
7Semantic Web Layers (and some extensive
background experience)
- Ontology Level
- Language (OWL (RDF/XML compatible))
- Environments (inspired by FindUR, Chimaera, Ontolingua, OntoBuilder/Server, Sandpiper Tools, Cerebra, ...)
- Standards body leverage (W3C's WebOnt, W3C's Semantic Web Best Practices, EU/US Joint Com, OMG ODM, W3C's RIF, Scientific Markup Standards, ...)
- Query
- SPARQL, OWL-QL, ...
- Rules
- RIF, SWRL, ...
- Logic
- Description Logics, FOL
- Proof
- PML, Inference Web Services and Infrastructure
- Trust
- IWTrust
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
8Application Areas for SW
- Smart search
- Annotation (even simple forms), smart tagging
- Geospatial
- Implementing logic (rules), e.g. in workflows
- Data integration
- Verification
- Web services
- Web content mining with natural language parsing
- User interface development (portals)
- Semantic desktop
- Wikis - OntoWiki, SemanticMediaWiki
- Sensor Web
- Software engineering
- Explanation
- ... and the list goes on
9Selected Technical Benefits
- Integrating Multiple Data Sources
- Semantic Drill Down / Focused Perusal
- Statements about Statements
- Inference
- Translation
- Smart (Focused) Search
- Smarter Search Configuration
- Proof and Trust
Updated material reused from The Substance of the Web. McGuinness and Dean. Semantic Web Applications for National Security. May, 2005. http://www.schafertmd.com/swans/agenda.html
101 Integrating Multiple Data Sources
- The Semantic Web lets us merge statements from different sources
- The RDF Graph Model allows programs to use data uniformly regardless of the source
- Figuring out where to find such data is a motivator for Semantic Web Services
[Figure: RDF graph for Ionosphere with properties name (Terrestrial Ionosphere), hasCoordinates (magnetic), hasLowerBoundaryValue (100), hasLowerBoundaryUnit (km). Different line/text colors represent different data sources]
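Merging statements from different sources can be sketched as a set union over triples. The Python below is illustrative only: the property names come from the figure labels on this slide, but which source contributes which statement is an assumption (the figure's colors are not recoverable), and `describe` is a hypothetical helper.

```python
# Two hypothetical sources describing the same resource.
source_a = {
    ("Ionosphere", "name", "Terrestrial Ionosphere"),
    ("Ionosphere", "hasCoordinates", "magnetic"),
}
source_b = {
    ("Ionosphere", "hasLowerBoundaryValue", "100"),
    ("Ionosphere", "hasLowerBoundaryUnit", "km"),
    ("Ionosphere", "name", "Terrestrial Ionosphere"),  # also stated by source_a
}

# Merging graphs of triples is a set union: identical statements collapse.
merged = source_a | source_b


def describe(graph, subject):
    """Collect every property of a subject, regardless of its source."""
    return {p: o for s, p, o in graph if s == subject}


print(len(merged))  # 4
print(describe(merged, "Ionosphere")["hasLowerBoundaryUnit"])  # km
```

Because both sources use the same identifier for the ionosphere, the program reading `merged` does not need to know where each statement came from.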
112 Drill Down /Focused Perusal
- The Semantic Web uses Uniform Resource Identifiers (URIs) to name things
- These can typically be resolved to get more information about the resource
- This essentially creates a web of data analogous to the web of text created by the World Wide Web
- Ontologies are represented using the same structure as content
- We can resolve class and property URIs to learn about the ontology
[Figure: web of linked resources. Nodes: NeutralTemperature, Norway, Internet, ...ISR, ...FPI, EISCAT, ...MillstoneHill. Links: locatedIn, measuredby, type, operatedby]
123 Statements about Statements
- The Semantic Web allows us to make statements about statements
- Timestamps
- Provenance / Lineage
- Authoritativeness / Probability / Uncertainty
- Security classification
- ...
- This is an unsung virtue of the Semantic Web
[Figure: graph with nodes Aurora, Red, Danny's, 20031031 and links hascolor, hasSource, hasDateTime]
Ontologies Workshop, APL May 26, 2006
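One way to picture statements about statements is to treat each statement as an object that can itself carry annotations. The sketch below is a hedged illustration in Python, not RDF reification syntax: `Statement`, `annotate`, and `statements_from` are hypothetical names, and attaching hasSource and hasDateTime to the "Aurora hascolor Red" statement is an assumed reading of the figure labels.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Statement:
    """A single triple, hashable so that it can be annotated."""
    subject: str
    predicate: str
    obj: str


# Annotations (timestamps, provenance, ...) keyed by the statement they describe.
annotations: Dict[Statement, Dict[str, str]] = {}


def annotate(stmt: Statement, **meta: str) -> None:
    annotations.setdefault(stmt, {}).update(meta)


def statements_from(source: str) -> List[Statement]:
    """Find every statement attributed to a given source."""
    return [s for s, meta in annotations.items() if meta.get("hasSource") == source]


aurora_is_red = Statement("Aurora", "hascolor", "Red")
annotate(aurora_is_red, hasSource="Danny", hasDateTime="20031031")

print(annotations[aurora_is_red]["hasDateTime"])  # 20031031
```

The same pattern extends to the other bullet items, for example an uncertainty value or a security classification stored next to the source.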
134 Inference
- The formal foundations of the Semantic Web allow us to infer additional (implicit) statements that are not explicitly made
- Unambiguous semantics allow question answerers to infer that objects are the same, objects are related, objects have certain restrictions, ...
- SWRL allows us to make additional inferences beyond those provided by the ontology
[Figure: graph with nodes Interferometer, Millstone Hill, VerticalMeans and links OperatesInstrument, hasInstrument, isOperatedBy, isMeasuredBy, hasOperatingMode, hasTypeofData, hasMeasuredData]
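Two simple kinds of inference can be sketched in a few lines of Python: walking an is-a chain to find implied superclasses, and deriving the inverse of an asserted property. The class chain below is taken from the instrument hierarchy used later in the talk; treating OperatesInstrument and isOperatedBy as inverses, and the single asserted fact, are assumptions made for illustration.

```python
# is-a links: each class maps to its direct superclass.
IS_A = {
    "FabryPerotInterferometer": "Interferometer",
    "Interferometer": "OpticalInstrument",
    "OpticalInstrument": "Instrument",
}

# Assumed inverse-property declaration (hypothetical for this sketch).
INVERSE_OF = {"OperatesInstrument": "isOperatedBy"}

# One explicitly asserted statement (illustrative).
asserted = {("MillstoneHill", "OperatesInstrument", "FabryPerotInterferometer")}


def superclasses(cls):
    """Follow is-a links upward to list every implied superclass."""
    found = []
    while cls in IS_A:
        cls = IS_A[cls]
        found.append(cls)
    return found


def infer(facts):
    """Return the facts plus the statements implied by inverse properties."""
    inferred = set(facts)
    for s, p, o in facts:
        if p in INVERSE_OF:
            inferred.add((o, INVERSE_OF[p], s))
    return inferred


print(superclasses("FabryPerotInterferometer"))
# ['Interferometer', 'OpticalInstrument', 'Instrument']
```

A production reasoner handles far more (restrictions, equivalence, rule languages such as SWRL), but the principle is the same: new statements follow from the stated ones plus the ontology.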
145 Translation
- While encouraging sharing, the Semantic Web allows multiple URIs to refer to the same thing
- There are multiple levels of mapping
- Classes
- Properties
- Instances
- Ontologies
- OWL supports equivalence and specialization; SWRL allows more complex mappings
[Figure: two graphs that both carry the name "precipitation": one with labels ont1EduLevel, ont1Precipitation, VOScientist; the other with labels ont2EduLevel, ont2Rain, EduVOK-12]
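Class-level and property-level mappings can be sketched as lookup tables that rewrite triples from one vocabulary into another. In the Python below, the `ont1:`/`ont2:` prefixed identifiers are a guess at the figure labels, and declaring `ont2:Rain` equivalent to `ont1:Precipitation` is an assumption made only to show the mechanics; in a real ontology this might instead be a specialization.

```python
# Hypothetical equivalence declarations between two ontologies.
EQUIVALENT_CLASS = {"ont2:Rain": "ont1:Precipitation"}
EQUIVALENT_PROPERTY = {"ont2:EduLevel": "ont1:EduLevel"}


def translate(triple):
    """Rewrite a triple into ont1 terms; unmapped terms pass through unchanged."""
    s, p, o = triple
    return (
        EQUIVALENT_CLASS.get(s, s),
        EQUIVALENT_PROPERTY.get(p, p),
        EQUIVALENT_CLASS.get(o, o),
    )


# Illustrative statements expressed in the second ontology.
ont2_data = [
    ("ont2:Rain", "name", "precipitation"),
    ("ont2:Rain", "ont2:EduLevel", "K-12"),
]

translated = [translate(t) for t in ont2_data]
print(translated[0])  # ('ont1:Precipitation', 'name', 'precipitation')
```

Instance-level and ontology-level mappings follow the same shape, with a table per level.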
156 Smart (Focused) Search
- The Semantic Web associates 1 or more classes with each object
- We can use ontologies to enhance search by
- Query expansion
- Sense disambiguation
- Type with restrictions
- ...
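Query expansion, the first item above, can be sketched by walking the class hierarchy so that a search for a broad term also matches objects typed with narrower terms. The hierarchy in this Python sketch reuses the instrument classes from the talk; the document collection, the `Radar` class, and the function names are hypothetical.

```python
# Direct subclasses of each class.
SUBCLASSES = {
    "Instrument": ["OpticalInstrument"],
    "OpticalInstrument": ["Interferometer"],
    "Interferometer": ["FabryPerotInterferometer"],
}

# Hypothetical objects, each associated with one class.
documents = {
    "doc1": "FabryPerotInterferometer",
    "doc2": "Interferometer",
    "doc3": "Radar",
}


def expand(term):
    """Return the term followed by all of its narrower terms (depth-first)."""
    terms = [term]
    for child in SUBCLASSES.get(term, []):
        terms.extend(expand(child))
    return terms


def search(term):
    """Match documents typed with the term or any narrower term."""
    wanted = set(expand(term))
    return sorted(doc for doc, cls in documents.items() if cls in wanted)


print(search("OpticalInstrument"))  # ['doc1', 'doc2']
```

A plain keyword search for "OpticalInstrument" would have returned nothing here, since no document is typed with that exact class.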
177 Smarter Search / Configuration
18 GEONGRID Ontology Search and Data Integration
Example
- Uses emerging web standards to enable smart web applications
- Given an upper-level domain choice
- Ecology
- Illustrate or list contained concepts/hierarchy
- VegetationCover, TreeRings, etc.
- Retrieve some specific options from web
- Maps, tree-ring data
- Info https://portal.geongrid.org:8443/gridsphere/gridsphere
19Semantic Web Integration Technology (as used in
the KSL Wine Agent)
- OWL for representing a domain ontology of X and Y, their properties, and relationships between them
- JTP theorem prover for deriving appropriate pairings
- DQL/OWL QL for querying a knowledge base
- Inference Web for explaining and validating answers (descriptions or instances)
- Web Services for interfacing with vendors
- Connections to online web agents/information services
- Utilities for conducting and caching the above transactions
20VSTO - semantics and ontologies in an operational
environment vsto.hao.ucar.edu, www.vsto.org
21VO API
Web Serv.
VO Portal
Query, access and use of data
- Mediation Layer
- Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
- Maps queries to underlying data
- Generates access requests for metadata, data
- Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
Semantic mediation layer - VSTO - low level
Metadata, schema, data
DB1, DB2, DB3, ..., DBn
26Semantic Web Benefits
- Unified/abstracted query workflow: Parameters, Instruments, Date-Time
- Decreased input requirements for query: in one case reducing the number of selections from eight to three
- Generates only syntactically correct queries, which could not always be ensured in previous implementations without semantics
- Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services)
- Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, now exposed as smart web services
- understanding of coordinate systems, relationships, data synthesis, transformations, etc.
- returns independent variables and related parameters
- A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)
278 Proof
- The logical foundations of the Semantic Web allow us to construct proofs that can be used to improve transparency, understanding, and trust
- Proof and Trust are on-going research areas for the Semantic Web, e.g., see PML and Inference Web
[Figure: graph with nodes Critical Dataset, FlatField, Solar Physics Paper and links hasCalibration, hasPeerReview]
Critical Dataset has been calibrated with a flat field program that is published in the peer reviewed literature.
28Inference Web
- Framework for explaining reasoning tasks by storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by multiple distributed reasoners.
- OWL-based Proof Markup Language (PML) specification as an interlingua for proof interchange
- IWExplainer for generating and presenting interactive explanations from PML proofs, providing multiple dialogues and abstraction options
- IWBrowser for displaying (distributed) PML proofs
- IWBase: distributed repository of proof-related meta-data such as inference engines/rules/languages/sources
- Integrated with theorem provers, text analyzers, web services, ...
http://iw.stanford.edu
29Inference Web Infrastructure (McGuinness et al., 2004, http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html)
- Framework for explaining question answering tasks by
- abstracting, storing, exchanging,
- combining, annotating, filtering, segmenting,
- comparing, and rendering proofs and proof fragments
- provided by question answerers.
30SW Questions & Answers
- Users can explore extracted entities and relationships, create new hypotheses, ask questions, browse answers and get explanations for answers.
[Figure: screenshot with callouts: a question; an answer; a context for explaining the answer; an abstracted explanation]
(this graphical interface done by Battelle supported by KSL)
31Browsing Proofs
- The proof associated with an answer can be
browsed in multiple formats.
Menu to switch between Graphical/HTML Proof Styles
Proof Rendered in Graphical Style
Provenance Information associated with a selected
NodeSet
32Developing ontologies
- Use cases and small team (7-8: 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe)
- Identify classes and properties (leverage controlled vocab.)
- Start with narrower terms, generalize when needed or possible
- Data integration - often requires broader terms
- Adopt a suitable conceptual decomposition (e.g. SWEET)
- Import modules when concepts are orthogonal
- Minimal properties to start, add only when needed
- Go Lite as much as possible, then DL and only if you have to Full - balancing expressibility vs. implementability
- Mid-level to depth - i.e. neither top-down nor bottom-up
- Review, review, review, vet, vet, vet, publish - www.planetont.org (experiences, results, lessons learned, AND your ontologies AND discussions)
- Only code them (in RDF or OWL) when needed (CMAP, ...)
- Ontologies small and modular
33Creating Ontologies: Simple tools, CMAP, UML
- White board, text file
- CMAP Ontology Editor (concept mapping tool from IHMC)
- Drag/drop visual development of classes, subclass (is-a) and property relationship
- Reads and writes OWL
- Formal convention (OWL/RDF tags, etc.)
- New release of ODM/MOF
- Ontology Definition Metamodel/Meta Object Facility (OMG) for UML
- Provides standardized notation
- Available from OMG - http://www.omg.org/technology/documents/modeling_spec_catalog.htm
- Books likely to be available soon
34Semantic Data Integration Concept map for
volcano and atmosphere
- Volcano concept map after the workshop - some
linked concepts are circled
35Is OWL the only option?
- There are also a number of core vocabularies (not necessarily OWL based)
- SKOS Core: about knowledge systems
- Dublin Core: about information resources, digital libraries, with extensions for rights, permissions, digital rights management
- FOAF: about people and their organizations
- DOAP: on the descriptions of software projects
- MusicBrainz: on the description of CDs, music tracks, ...
- SIOC: Semantically-Interlinked Online Communities...
- GRDDL for gleaning from vocabularies
- Common Logic (CL), PENG, Rabbit - lack of tools
38What about Earth Science?
- SWEET (Semantic Web for Earth and Environmental Terminology)
- http://sweet.jpl.nasa.gov
- based on GCMD terms
- modular using faceted and integrative concepts
- VSTO (Virtual Solar-Terrestrial Observatory)
- http://vsto.hao.ucar.edu
- captures observational data (from instruments)
- modular using domains
- MMI
- http://marinemetadata.org
- captures aspects of marine data, ocean observing systems
- partly modular, mostly by developed project
- GeoSciML
- http://www.opengis.net/GeoSciML/
- is a GML (Geography ML) application language for Geoscience
- modular, in packages
39[Figure: concept labels: CloudCondensationNuclei; Rotation, ThermalProcess; Cyclone; LowerAtmosphere; NaturalHazard; PotentialVorticity; WeatherResearchForecastModel; LowerBound]
40Summary
- Semantics/Ontologies can help with
- Controlled vocabularies with unambiguous term meanings
- Mapping/Merging support for data integration
- Ontology-enhanced search
- Meta-data descriptions
- Consistency Checking
- Completion
- Structured, surgical comparative customized search
- ...
- VSTO and GEON are leading-edge examples of success, others are following
- Communities can help each other by pooling resources over scientific ontology creation, use, evaluation, evolution, and environment development
41Spare room
42Virtual Observatories
- Make data and tools quickly and easily accessible to a wide audience.
- Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer in the user's preferred language, i.e. appear to be local and integrated
- Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains, along with database interfaces for access and storage and smart tools for evolution and maintenance.
43Early days of VxOs
[Figure: virtual observatories VO1, VO2, VO3 and databases DB1, DB2, DB3, ..., DBn]
44The Astronomy approach: data-types as a service
- VOTable
- Simple Image Access Protocol
- Simple Spectrum Access Protocol
- Simple Time Access Protocol
[Figure: VO App1, VO App2, VO App3 above a VO layer, above DB1, DB2, DB3, ..., DBn]
OGC: WFS, WCS, WMS and SWE: SOS, SPS, SAS use the same approach
45VO API
Web Serv.
VO Portal
Query, access and use of data
- Mediation Layer
- Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
- Maps queries to underlying data
- Generates access requests for metadata, data
- Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
Semantic mediation layer - VSTO - low level
Metadata, schema, data
DB1, DB2, DB3, ..., DBn
46Virtual Solar Terrestrial Observatory
- a distributed, scalable education and research environment for searching, integrating, and analyzing observational, experimental, and model databases.
- subject matter covers the fields of solar, solar-terrestrial and space physics
- it provides virtual access to specific data, model, tool and material archives containing items from a variety of space- and ground-based instruments and experiments, as well as individual and community modeling and software efforts bridging research and educational use
- 3 year NSF-funded (OCI/SCI) project in its third year
47VSTO achievements
- Conceptual model and architecture developed by combined team: KR experts, domain experts, and software engineers
- Semantic framework developed and built with a small, cohesive, carefully chosen team in a relatively short time (deployments in 1st year)
- Production portal released, includes security, etc. with community migration (and so far endorsement)
- VSTO ontology version 1.2 (vsto.owl)
- Web Services encapsulation of semantic interfaces
- More Solar Terrestrial use-cases are driving the completion of the ontologies - filling out the instrument ontology
- Using ontologies in other applications (volcanoes, climate, ...)
48Content: Coupling, Energetics and Dynamics of Atmospheric Regions WEB
Community data archive for observations and models of Earth's upper atmosphere and geophysical indices and parameters needed to interpret them. Includes browsing capabilities by periods, instruments, models, ...
49Content: Mauna Loa Solar Observatory
Near real-time data from Hawaii from a variety of solar instruments. Source for space weather, solar variability, and basic solar physics.
Other content used too: CISM - Center for Integrated Space Weather Modeling
50Science and technical use cases
- Find data which represents the state of the neutral atmosphere anywhere above 100km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.
- Extract information from the use-case - encode knowledge
- Translate this into a complete query for data - inference and integration of data from instruments, indices and models
- Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
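The last use case, constraints on Instrument, Date-Time, and Parameter supplied in any order and any combination, can be sketched with optional keyword arguments. This Python example is not the VSTO service interface: `Record`, `query`, the sample records, and the `IncoherentScatterRadar` name are all hypothetical and for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    instrument: str
    parameter: str
    day: date


def query(records: List[Record],
          instrument: Optional[str] = None,
          parameter: Optional[str] = None,
          date_range: Optional[Tuple[date, date]] = None) -> List[Record]:
    """Filter records; every constraint is optional and they combine freely."""
    result = []
    for r in records:
        if instrument is not None and r.instrument != instrument:
            continue
        if parameter is not None and r.parameter != parameter:
            continue
        if date_range is not None and not (date_range[0] <= r.day <= date_range[1]):
            continue
        result.append(r)
    return result


# Made-up records, not real holdings.
RECORDS = [
    Record("FabryPerotInterferometer", "NeutralTemperature", date(2003, 10, 31)),
    Record("FabryPerotInterferometer", "NeutralWind", date(2003, 11, 1)),
    Record("IncoherentScatterRadar", "NeutralTemperature", date(2003, 10, 31)),
]

# Keyword arguments may be given in any order and in any combination.
by_parameter = query(RECORDS, parameter="NeutralTemperature")
by_both = query(RECORDS,
                date_range=(date(2003, 10, 31), date(2003, 10, 31)),
                instrument="FabryPerotInterferometer")
print(len(by_parameter), len(by_both))  # 2 1
```

In a semantically enabled service, the ontology would additionally restrict each choice to values that are coherent with the constraints already selected.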
51 Translating the Use-Case - non-monotonic?
- Input
- Physical properties: State of neutral atmosphere
- Spatial
- Above 100km
- Toward arctic circle (above 45N)
- Conditions
- High geomagnetic activity
- Action: Return Data
Specification needed for query to CEDARWEB:
- Instrument
- Parameter(s)
- Operating Mode
- Observatory
- Date/time
- Return-type: data
GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
Kp is a GeophysicalIndex
hasTemporalDomain daily
hasHighThreshold xsd_number 8
Date/time when Kp > 8
52 Translating the Use-Case - ctd.
NeutralAtmosphere is a subRealm of TerrestrialAtmosphere
hasPhysicalProperties NeutralTemperature, Neutral Wind, etc.
hasSpatialDomain 0,360,0,180,100,150
hasTemporalDomain
NeutralTemperature is a Temperature (which) is a Parameter
Specification needed for query to CEDARWEB: Instrument, Parameter(s), Operating Mode, Observatory, Date/time, Return-type: data
Input: Physical properties: State of neutral atmosphere. Spatial: Above 100km, Toward arctic circle (above 45N). Conditions: High geomagnetic activity. Action: Return Data
FabryPerotInterferometer is a Interferometer, (which) is a Optical Instrument (which) is a Instrument
hasFilterCentralWavelength Wavelength
hasLowerBoundFormationHeight Height
ArcticCircle is a GeographicRegion
hasLatitudeBoundary
hasLatitudeUpperBoundary
GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
Kp is a GeophysicalIndex
hasTemporalDomain daily
hasHighThreshold xsd_number 8
Date/time when Kp > 8
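The step from "high geomagnetic activity" to concrete dates can be sketched as follows: the concept is resolved to its proxy index (Kp), and the days where the index exceeds its high threshold are selected. The threshold of 8 and the "Kp > 8" condition come from the slide; the daily values, the `PROXY` table, and the function name are made up for illustration and are not measurements.

```python
# Concept -> proxy index, following "GeoMagneticActivity has ProxyRepresentation".
PROXY = {"GeoMagneticActivity": "Kp"}

# hasHighThreshold for each index (value taken from the slide).
HIGH_THRESHOLD = {"Kp": 8}

# Illustrative daily index values, NOT real data.
daily_index = {
    "Kp": {"day1": 3, "day2": 9, "day3": 8, "day4": 8.7},
}


def high_activity_days(concept):
    """Resolve a concept to its proxy index and return days above threshold."""
    index = PROXY[concept]
    threshold = HIGH_THRESHOLD[index]
    values = daily_index[index]
    return sorted(day for day, value in values.items() if value > threshold)


print(high_activity_days("GeoMagneticActivity"))  # ['day2', 'day4']
```

The resulting dates would then fill the Date/time slot of the query specification, alongside the instrument and parameter choices inferred from the rest of the ontology.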
55http://dataportal.ucar.edu/schemas/vsto_all.owl
56Semantic Web Services
57Semantic Web Services
OWL document returned using VSTO ontology - can be used either syntactically or semantically
58Semantic Web Services
59Semantic Web Services
60Issues for Virtual Observatories
- Scaling to large numbers of data providers
- Crossing disciplines
- Security, access to resources, policies
- Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?)
- Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, ...)
- Data quality, preservation, stewardship, rescue
- Interoperability at a variety of levels (3)
Semantics can help with many of these
61Semantic Data Integration Concept map for
volcano and atmosphere
- Volcano concept map after the workshop - some
linked concepts are circled
62Semantic Information Integration Concept map for
educational use of science data in a lesson plan
64Terminology
- Closed World - where complete knowledge is known (encoded), AI relied on this
- Open World - where knowledge is incomplete/evolving, SW promotes this
- Languages
- OWL - Web Ontology Language (W3C)
- RDF - Resource Description Framework (W3C)
- OWL-S/SWSL - Web Services (W3C)
- WSMO/WSML - Web Services (EC/W3C)
- SWRL - Semantic Web Rule Language, RIF - Rule Interchange Format
- PML - Proof Markup Language
- Editors: Protégé, SWOOP, Medius, SWeDE, ...
- Reasoners
- Pellet, Racer, Medius KBS, FACT, fuzzyDL, KAON2, MSPASS, QuOnto
- Query Languages
- SPARQL, XQUERY, SeRQL, OWL-QL, RDFQuery
- Other Tools for Semantic Web
- Search: SWOOGLE swoogle.umbc.edu
- Collaboration: www.planetont.org
- Other: Jena, SeSAME/SAIL, Mulgara, Eclipse, KOWARI
- Semantic wiki: OntoWiki, SemanticMediaWiki
65Editors
- Protégé (http://protege.stanford.edu)
- SWOOP (http://mindswap.org/2004/SWOOP see also http://www.mindswap.org/downloads/)
- Altova SemanticWorks (http://www.altova.com/download/semanticworks/semantic_web_rdf_owl_editor.html)
- SWeDE (http://owl-eclipse.projects.semwebcentral.org/InstallSwede.html), goes with Eclipse
- Medius (www.sandsoft.com)
- TopBraid Composer and other commercial tools
- CMAP Ontology Editor (COE) (http://cmap.ihmc.us/coe)
66Software development tools
- Protégé, w/ plug-ins - some better than others
- SWOOP (OWL analyzer, partitioner)
- Jena (http://jena.sourceforge.net/)
- Eclipse (full integrated development environment for Java, http://www.eclipse.org/)
- Top Quadrant suite
- Sandsoft
- see Semantic Technologies 2007
67Triple Stores
- Jena (http://jena.sourceforge.net/)
- SeSAME/SAIL (http://www.openrdf.org/)
- KOWARI (http://www.kowari.org/) -> Mulgara (http://www.mulgara.org/)
- Redland (http://librdf.org/index.html)
- Oracle (!)
- Many others (relational, object-relational)
70Semantic Web Services
- Ontologies of services, provides
- What does the service provide for prospective clients? The "profile," which is used to advertise the service. Each instance of the class Service presents a ServiceProfile.
- How is it used? The "process model," captured by the ServiceModel class. Instances of the class Service use the property describedBy to refer to the service's ServiceModel.
- How does one interact with it? The "grounding," provides the needed details about transport protocols. Instances of the class Service have a supports property referring to a ServiceGrounding.
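The three-part service description above (profile, process model, grounding) can be mirrored as plain data classes. This Python sketch only models the structure: the class and property names follow the slide (with `described_by` standing in for describedBy), while every field and example value, including the `http://example.org/query` endpoint, is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceProfile:
    """What the service provides; used to advertise it."""
    description: str


@dataclass
class ServiceModel:
    """How the service is used (the process model)."""
    steps: List[str]


@dataclass
class ServiceGrounding:
    """How to interact with the service (transport details)."""
    protocol: str
    endpoint: str


@dataclass
class Service:
    presents: ServiceProfile       # Service presents a ServiceProfile
    described_by: ServiceModel     # Service describedBy a ServiceModel
    supports: ServiceGrounding     # Service supports a ServiceGrounding


# Hypothetical service description, for illustration only.
service = Service(
    presents=ServiceProfile("Query data by instrument, date-time and parameter"),
    described_by=ServiceModel(["select constraints", "submit query", "receive data"]),
    supports=ServiceGrounding(protocol="SOAP", endpoint="http://example.org/query"),
)

print(service.supports.protocol)  # SOAP
```

A client can read `presents` to decide whether the service is relevant, `described_by` to learn the call sequence, and `supports` to find out how to connect.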
71SW Services, not standard
- Submissions to W3C
- OWL-S - http://www.w3.org/Submission/OWL-S
- SWSO/F/L - Semantic Web Services Ontology/Framework/Language - http://www.w3.org/Submission/SWSF/
- WSMO/X/L - Web Services Modeling Ontology/Execution/Language - http://www.w3.org/Submission/WSMX/ www.wsmo.org, www.wsmx.org
- SAWSDL - http://www.w3.org/2002/ws/sawsdl/
72More Information
- Virtual Solar Terrestrial Observatory (VSTO) http://vsto.hao.ucar.edu, http://www.vsto.org
- Semantically-Enabled Science Data Integration (SESDI) http://sesdi.hao.ucar.edu
- Semantic Knowledge Integration Framework (SKIF) http://skif.hao.ucar.edu
- Semantic Web for Earth and Environmental Terminology (SWEET) http://sweet.jpl.nasa.gov
- Geosciences Network (GEON) http://www.geongrid.org
- W3C's Web Ontology Language (OWL) - http://www.w3.org/TR/owl-features/
- Conferences: ISWC 2007, CIKM 2007, SemTech 2008, IEEE ICSC 2007, KDD 2007, AAAI/IAAI 2007
- Peter Fox pfox_at_ucar.edu