Title: Adding Value While Having Fun With EPA Data
1 Adding Value While Having Fun With EPA Data!
- Brand Niemann
- Co-Chair, Semantic Interoperability Community of Practice (SICoP)
- Best Practices Committee (BPC), CIO Council, and
- Enterprise Architecture Team, Office of Environmental Information
- U.S. Environmental Protection Agency
- February 23, 2005, and March 2, 2005
2 Data Interoperability Paradigm Shifts
- Organizational
- The National Infrastructure for Community Statistics (NICS) Community of Practice (CoP) wants to make its data NICS Ready by publishing it to the Web in such a way that others can easily reuse it! (Like buying a new TV that is HDTV Ready.)
- Technical
- The conceptualization of new technical systems suffers from technological presbyopia: the condition of being able to envisage things more clearly the farther they are from present realization, even though the prospective users may grow weary and skeptical while waiting for the future to arrive.
- Semantic
- Ontology and ontology patterns are the applied use of two basic tenets of software design and architecture: indirection and abstraction (see Appendix).
3 Overview
- Data is
- Data can be re-purposed
- Data can be mined
- Data can be modeled
- Data can be integrated and published
- Data standards can evolve
- Data architecture can be implemented in ontology-driven information systems
- Appendix on Indirection and Abstraction
This is a semantic approach!
4 Data is
- Unstructured
- Example: Web pages
- Semi-structured
- Example: Documents (with Table of Contents)
- Structured
- Example: Databases (tables)
- Demonstrations in the next section.
- Notes
- This is part of the new Data Reference Model (DRM) work.
- But this was there in XML from the beginning: everything is a document (all three types).
5 Data can be re-purposed
- Unstructured to Semi-structured
- e.g. Region 4 2004 Press Releases.
- Semi-structured to Structured
- e.g. Region 4 Strategic Plan 2004.
- Structured to Even More Structured
- e.g. TRI data table.
- Demonstrations of Each from the Region 4 Pilot.
- Notes
- We Can Add Metadata and Interoperability At the Same Time!
- We Can Even Add Semantic Metadata and Interoperability!
- This changes the way we look at portals, document and content management systems, registries, and repositories: there really is no difference, because re-purposed data serves all these functions!
6 Data can be mined
- The TRI database is about 8 GB and requires industrial-strength tools and analyses for data mining, indexing, conversion to XML, and storage and retrieval with XML Web Services. This pilot demonstrated that large EPA databases can be data mined and repurposed into XML repositories.
- February 25-26, 2003, Data Mining Technology for Military and Government Applications Conference: XML Web Services for Data Mining and Repository, US EPA Toxics Release Inventory, Brand Niemann, US EPA, and Data Mining Technology, Jim Walters, Insightful Corporation.
- April 9, 2004, Presentations and Demonstrations to the EPA Business Intelligence and Analytics (BIA) User Group Meeting.
7 Data can be mined
http://www.insightful.com/
8 Data can be mined
I-Miner on TRI 2000 Public Release Data
Pipeline Architecture and Visual Workflow
9 Data can be mined
I-Miner on TRI 2000 Public Release Data
Histogram Plots
Rationale: Toxic chemical releases to different media should be correlated; outliers suggest the need to follow up with reporting facilities.
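The rationale on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the I-Miner workflow: it assumes each facility reports releases to two media (air and water), compares facilities by the log of the water-to-air ratio, and flags values with a large robust z-score. The facility ids, release amounts, and the 3.5 threshold are all invented for the example and are not TRI data.

```python
import math
import statistics

# Hypothetical records: (facility_id, air_release_lbs, water_release_lbs).
# These numbers are invented for illustration; they are not TRI data.
RELEASES = [
    ("F1", 100.0, 52.0),
    ("F2", 200.0, 98.0),
    ("F3", 400.0, 210.0),
    ("F4", 800.0, 390.0),
    ("F5", 300.0, 9000.0),  # water release far out of line with air release
    ("F6", 150.0, 70.0),
]


def flag_outliers(records, threshold=3.5):
    """Return ids of facilities whose water/air release ratio is unusual.

    Works on the log of the ratio and uses a robust z-score based on the
    median absolute deviation (MAD), so a single extreme facility does not
    hide itself by inflating the spread.
    """
    log_ratios = [math.log(water / air) for _, air, water in records]
    median = statistics.median(log_ratios)
    mad = statistics.median(abs(x - median) for x in log_ratios)
    if mad == 0:
        return []  # all facilities share the same ratio; nothing to flag
    flagged = []
    for (facility_id, _, _), x in zip(records, log_ratios):
        robust_z = 0.6745 * (x - median) / mad
        if abs(robust_z) > threshold:
            flagged.append(facility_id)
    return flagged


print(flag_outliers(RELEASES))
```

A flagged facility is only a candidate for follow-up; the check says the reported values are unusual relative to the others, not that they are wrong.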
10 Data can be modeled
- EPA Region 4 Pilot
- Phase 1
- October 1, 2004, Implementing a Service-Oriented Architecture Pilot Project: Design and Initial Results.
- October 19 and 25, 2004, Building Enterprise Architecture through Web Services for Remote Portlets (WSRP) in a Sample EPA Regional Portal, Rex Brooks, OASIS, and Ali Naizi, Oracle.
- November 18, 2004, Connecting the Dots: CAP-WSRP, Presented at the XML 2004 Conference as Part of the OASIS Interoperability Demos That Showed the Integration of Web Services for Remote Portlets with the Common Alerting Protocol and Including GeoResponse for Use in the Federal Region 4 Semantic Interoperability Pilot.
- Phase 2
- February 10, 2005, Semantic Interoperability Community of Practice Enablement (SCOPE) for the EPA Region 4 Pilot.
11 Current Ontology for Region 4 Pilot
12 Data can be modeled
- The Roadmap
- Re-purpose the 2004 EPA Region 4 Press Releases (unstructured to semi-structured), extract the semantic metadata on word usage and frequency, and place it at the top of the document.
- Creates a Semantic Web Services node that can be integrated with other EPA nodes and Non-EPA nodes!
- Compare to what we have now with the EPA Web site and the Exchange Network.
- Need ontologies to supplement their topologies (topics and Web Services nodes, respectively).
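The first roadmap step (extract word usage and frequency, then put it at the top of the document) might look roughly like the sketch below. It is an assumption about the general approach, not the tool used in the pilot; the stop-word list, the "Keywords:" header format, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real pilot would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is", "at"}


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


def add_metadata_header(title, body, top_n=5):
    """Prefix a document with a keyword line derived from its own text."""
    terms = word_frequencies(body, top_n)
    header = "Keywords: " + ", ".join(f"{word} ({count})" for word, count in terms)
    return f"{title}\n{header}\n\n{body}"


# Invented sample text standing in for a press release.
sample = "Water quality grant awarded. Water monitoring improves water quality."
print(add_metadata_header("Sample Press Release", sample, top_n=2))
```

Because the keyword line is computed from the text itself, it can be regenerated whenever new content is added.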
13 Data can be modeled
Demo
Ontology-based search by month, geography, topic, water semantics.
14 Data can be modeled
Note that search can be focused by node hierarchy, by semantics, or by both.
15 Data can be modeled
Ontology (left) is designed by access to the semantics (right), which is updated automatically with the addition of new content (not like ETL).
16 Data can be modeled
EPA Web Site
The result is 42 documents out of 533, which then have to be read and digested!
17 Data can be integrated and published
- LandView 6 is integration of
- Census 2000
- EPA EnviroFacts
- USGS Geographic Names Information System (GNIS) features and
- FGDC Standard Metadata.
- LandView is publishing with
- Interagency collaboration at very low cost
- Public domain software that is easy to use
- Database and software component reuse by design and
- Geo-processing (e.g. population estimation) and Web-services out of the box (FileMaker Pro) (see next slide).
18 Data can be integrated and published
Population Estimator
LandView Home Screen
http://www.census.gov/geo/landview/
19 Data can be integrated and published
http://www.census.gov/geo/landview/lv6help/pop_estimate.html
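As a rough illustration of the kind of geo-processing a population estimator performs, the sketch below sums the population of census block points that fall within a radius of a chosen location. That summing approach is an assumption made for the example, not a description of LandView's code, and the block coordinates and populations are invented.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def estimate_population(blocks, lat, lon, radius_miles):
    """Sum the population of block points within radius_miles of (lat, lon)."""
    return sum(
        pop
        for block_lat, block_lon, pop in blocks
        if haversine_miles(lat, lon, block_lat, block_lon) <= radius_miles
    )


# Hypothetical block points: (latitude, longitude, population).
BLOCKS = [
    (33.7500, -84.3900, 120),
    (33.7600, -84.3900, 80),   # roughly 0.7 miles north of the first point
    (34.7500, -84.3900, 500),  # roughly 69 miles north of the first point
]

print(estimate_population(BLOCKS, 33.75, -84.39, radius_miles=1.0))
```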
20 Data standards can evolve
- ISO 11179
- EPA Date
- The Date Data Standard provides for a standard representation of calendar date in data files for data interchange.
- Suggested Upper Merged Ontology (SUMO)
- Date
- According to WordNet, the noun "date" has 8 senses (see next slide).
- SUMO is written in the SUO-KIF language (declarative semantics and machine processible), which has been translated to the OWL Web Ontology Language.
- See http://www.ontologyportal.org/
21 Data standards can evolve
- Date
- 1. The specified day of the month: "what is the date today?".
- 2. A particular day specified as the time something will happen: "the date of the election is set by law".
- 3. A meeting arranged in advance: "she asked how to avoid kissing at the end of a date".
- 4. A particular but unspecified point in time: "they hoped to get together at an early date".
- 5. The present: "they are up to date"; "we haven't heard from them to date".
- 6. A participant in a date: "his date never stopped talking".
- 7. The particular year (usually according to the Gregorian calendar) that an event occurred: "he tried to memorize all the dates for his history class".
- 8. Sweet edible fruit of the date palm with a single long woody seed.
22 A Bit of Semantic Humor
- Enterprise Architecture
- Enterprise: A Star Trek Spaceship
- Architecture: Blueprints
- So, Blueprints of the Spaceship Enterprise!
23 Data architecture can be implemented in ontology-driven information systems
- Ontology-Driven Information Systems
- Methodology Side: the adoption of a highly interdisciplinary approach
- Analyze the structure at a high level of generality.
- Formulate a clear and rigorous vocabulary.
- Architectural Side: the central role in the main components of an information system
- Information resources.
- User interfaces.
- Application programs.
See, for example, Nicola Guarino, Formal Ontology and Information Systems, Proceedings of FOIS '98, Trento, Italy, 6-8 June 1998.
24 Data architecture can be implemented in ontology-driven information systems
- The Roadmap
- Basic Requirements For an Ontology.
- Generic Process.
- Process for the Indicator Pilot.
- Explanation of the Ontology.
- Schematic of the Ontology.
- Demonstration.
- Semantic Technology Profiles for the Data
Reference Model (DRM).
25 Data architecture can be implemented in ontology-driven information systems
- Basic Requirements For an Ontology
- 1. Finite controlled (extensible) vocabulary.
- 2. Unambiguous interpretation of classes and term relationships.
- 3. Strict hierarchical subclass relationships between classes.
- 4. Few others
Source: Deborah McGuinness, Ontologies Come of Age, in the Semantic Web: Why, What, and How, MIT Press, 2002, page 6.
26 Data architecture can be implemented in ontology-driven information systems
- Generic Process
- 1. Identify Scope
- 2. Review Existing Ontologies
- 3. Knowledge Acquisition
- 4. Conceptualization
- 5. Encode
- 6. Test
- 7. Iterate on Steps 3-6.
Source: Kathy Lesh, Standards Vocabularies in Health Care, Kevric, Collaborative Expedition Workshop 37, NSF, December 9, 2004.
27 Data architecture can be implemented in ontology-driven information systems
- Process for the Pilot
- 1. Identify Scope: Indicators.
- 2. Review Existing Ontologies: SUMO-WordNet.
- 3. Knowledge Acquisition: Recent GAO Report Considered to Be An Excellent Starting Point.
- 4. Conceptualization: See Next Two Slides.
- 5. Encode: Added to Semantic View of Structured Document (See Slide 30).
- 6. Test: Inviting Feedback.
Source: DRAFT National Infrastructure for Community Statistics (NICS): Initial Ontology for Structuring the Community of Practice and Its Business Case and Products, By Brand Niemann, January 20, 2005.
Informing Our Nation: Improving How to Understand and Assess the USA's Position and Progress, GAO, Report to the Chairman, Subcommittee on Science, Technology, and Space, Committee on Commerce, Science, and Transportation, U.S. Senate, November 2004, GAO-05-1.
28 Data architecture can be implemented in ontology-driven information systems
- Explanation of the Ontology
- Key Terminology
- Source: Summary-Background Footnotes
- Topics
- Source: Figure 1 and Appendix I
- Level of Jurisdiction
- Source: Table 4 of Comprehensive Systems
- Organizational Types
- Source: Table 10 of All Systems Studied
Footnotes
Note: Meets McGuinness Requirements in Slide 25.
29 Data architecture can be implemented in ontology-driven information systems
Schematic of the Ontology
- Indicators
- Topics: The Economy; Society and Culture; The Environment; Cross-Cutting
- Organizations: Publicly led; Privately led; Led by public-private partnership
- Jurisdictions: U.S. local/regional level; U.S. state level; National level outside the United States; Supranational level
Note that each of these classes can and does have multiple instances underneath it, etc.
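The schematic can be written down as a small class hierarchy and checked against the basic requirements from Slide 25 (a finite controlled vocabulary and strict subclass relationships). The sketch below is one possible encoding, not the pilot's implementation; grouping the leaf classes under Topics, Organizations, and Jurisdictions is a reading of the schematic, and the function names are hypothetical.

```python
# Class -> parent class, transcribed from the schematic; "Indicators" is the root.
PARENT = {
    "Topics": "Indicators",
    "Organizations": "Indicators",
    "Jurisdictions": "Indicators",
    "The Economy": "Topics",
    "Society and Culture": "Topics",
    "The Environment": "Topics",
    "Cross-Cutting": "Topics",
    "Publicly led": "Organizations",
    "Privately led": "Organizations",
    "Led by public-private partnership": "Organizations",
    "U.S. local/regional level": "Jurisdictions",
    "U.S. state level": "Jurisdictions",
    "National level outside the United States": "Jurisdictions",
    "Supranational level": "Jurisdictions",
}


def ancestors(cls):
    """Return the chain of superclasses from cls up to the root."""
    chain = []
    seen = {cls}
    while cls in PARENT:
        cls = PARENT[cls]
        if cls in seen:
            raise ValueError(f"cycle in hierarchy at {cls!r}")
        seen.add(cls)
        chain.append(cls)
    return chain


def is_subclass_of(cls, other):
    """True if other is a (direct or indirect) superclass of cls."""
    return other in ancestors(cls)


def vocabulary():
    """The finite controlled vocabulary: every class named in the hierarchy."""
    return set(PARENT) | set(PARENT.values())


print(ancestors("U.S. state level"))
print(len(vocabulary()))
```

Storing a single parent per class makes the hierarchy strict by construction; instances (for example, individual indicator systems) would hang off these classes in a separate structure.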
30 Data architecture can be implemented in ontology-driven information systems
Demo
See Pilot Projects at http://web-services.gov
Note: the folder names are either the ontology or the knowledgebase instances.
31 Data architecture can be implemented in ontology-driven information systems
- Semantic Technology Profiles for the Data Reference Model (DRM)
- Region 4 (See Slide 33).
- Enterprise Architecture: FEA-Reference Model Ontology (FEA-RMO).
- Taxonomies: Formal Ontologies (Michael Daconta, Recently Published Paper, See Next Slide).
- Indicators: See previous.
- Community Statistics: In process with NICS.
- FHA/NHIN: In process with Ontolog Forum.
- ISO 11179: In process with DHS, xmdr.org, Ontolog Forum, etc.
- More to be announced as part of the SICoP Module 3 White Paper Development: Implementing the Semantic Web.
32 Formal Taxonomies for the U.S. Government
- OWL Listing
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
    xmlns="http://www.owl-ontologies.com/unnamed.owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xml:base="http://www.owl-ontologies.com/unnamed.owl">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Transportation"/>
  <owl:Class rdf:ID="AirVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:about="#GroundVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:about="#Automobile">
    <rdfs:subClassOf>
      <owl:Class rdf:ID="GroundVehicle"/>
    </rdfs:subClassOf>
  Etc.
Transportation Class Hierarchy
Source: Formal Taxonomies for the U.S. Government, Michael Daconta, Metadata Program Manager, US Department of Homeland Security, XML.Com, http://www.xml.com/pub/a/2005/01/26/formtax.html
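To show how a listing like this can be read by a program, the sketch below pulls the subclass pairs out of a small OWL fragment using only the Python standard library. The fragment is a simplified, self-contained stand-in for the listing above (it uses the rdf:resource form throughout instead of nested class elements), and the function name is hypothetical. A real application would more likely use an RDF or OWL toolkit than raw XML parsing.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Simplified stand-in for the Transportation listing (assumption: every
# subclass link is written with rdf:resource rather than a nested owl:Class).
OWL_DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="Transportation"/>
  <owl:Class rdf:ID="AirVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:ID="GroundVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:ID="Automobile">
    <rdfs:subClassOf rdf:resource="#GroundVehicle"/>
  </owl:Class>
</rdf:RDF>"""


def subclass_pairs(owl_xml):
    """Return (subclass, superclass) name pairs found in top-level owl:Class elements."""
    root = ET.fromstring(owl_xml)
    pairs = []
    for cls in root.findall(f"{{{OWL}}}Class"):
        # A class is named either by rdf:ID or by rdf:about="#Name".
        name = cls.get(f"{{{RDF}}}ID") or cls.get(f"{{{RDF}}}about", "").lstrip("#")
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            parent = sub.get(f"{{{RDF}}}resource")
            if parent:
                pairs.append((name, parent.lstrip("#")))
    return pairs


for child, parent in subclass_pairs(OWL_DOC):
    print(f"{child} is a subclass of {parent}")
```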
33 Data architecture can be implemented in ontology-driven information systems
- Additional Demos
- Building Enterprise Architecture through Web Services for Remote Portlets (WSRP) in a Sample EPA Regional Portal (Oracle 10g)
- Use of Common Alerting Protocol (CAP)
- For example: Real-time water quality monitoring data from Florida.
- GeoResponse.com
- Voice GIS: Multimodal Notification, Distributed XML Web Services
- For example: January 6, 2005, Norfolk Southern Graniteville Train Derailment Scenario.
- Emergency Response Architecture
- Semantic Web Applications for National Security (SWANS) Conference, April 7-8, 2005, Trade Show Demonstration.
34 Appendix on Indirection and Abstraction
- Ontology and ontology patterns are the applied use of long-time, fundamental engineering patterns of indirection and abstraction.
- Chapter 7 in Adaptive Information: Improving Business Through Semantic Interoperability, Grid Computing, and Enterprise Integration, Pollock and Hodgson, Wiley-Interscience, 2004.
35 Appendix on Indirection and Abstraction
- Selected tidbits
- Ontology is simply the enabler for software engineers and architects to apply core problem solving patterns in new and innovative ways.
- Indirection is a concept that is used to plan for future uncertainty.
- Simply put, indirection is when two things need to be coupled, but instead of coupling them directly, a third thing is used to mediate direct, brittle connections between them.
- By leveraging indirection in the fundamental aspects of the technology, semantic interoperability is built for change, and this built-in flexibility differentiates semantic technologies from other information-driven approaches.
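A minimal sketch of indirection, under the assumption that two systems name the same things differently: instead of mapping System A's fields straight onto System B's, each system is mapped to a shared set of concepts, and records are translated through that third layer. The field names, concept names, and function name are all hypothetical.

```python
# Hypothetical field names for two systems that must exchange a record.
SYSTEM_A_TO_CONCEPT = {"fac_nm": "FacilityName", "st": "State"}
SYSTEM_B_FROM_CONCEPT = {"FacilityName": "facilityName", "State": "stateCode"}


def translate(record, to_concept, from_concept):
    """Translate a record from one system to another via shared concepts.

    Fields that have no concept, or concepts the target system does not
    know, are dropped rather than guessed at.
    """
    out = {}
    for field, value in record.items():
        concept = to_concept.get(field)
        target = from_concept.get(concept)
        if target is not None:
            out[target] = value
    return out


record_a = {"fac_nm": "Plant 1", "st": "GA", "internal_flag": 1}
print(translate(record_a, SYSTEM_A_TO_CONCEPT, SYSTEM_B_FROM_CONCEPT))
```

If System B renames a field, only its own concept map changes; System A's map, and every other system's, is untouched. That is the "built for change" property the slide describes.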
36 Appendix on Indirection and Abstraction
- Architects of both software and physical structures routinely use the principle of abstraction to isolate complex components and reduce the scope of a problem to be solved (see the forest for the trees). By definition, ontology is abstraction and is the ultimate abstraction tool for information.
- Example: Imagine a scenario of using a pivot data model without abstraction. It would require the aggregation of all of the data elements in a particular community. The result could be a community of 500 applications, each application with approximately 100 data elements, requiring a pivot model with about 50,000 data elements. An abstracted model could conceivably be capable of representing this information in far fewer than about 100 data elements!
- See Demonstrations of SICoP Pilot Projects for EPA Managers, August 16, 2004, Semantic Information Management (Unicorn): Integrating Health and Environmental Information to Protect American Children, at http://web-services.gov
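The arithmetic in the pivot-model example can be made concrete with a toy calculation. Without abstraction the pivot model holds one entry per application data element; with abstraction it holds one entry per distinct concept. The three applications, their element names, and the concept mapping below are invented for illustration.

```python
def pivot_size_without_abstraction(num_apps, elements_per_app):
    """Every application element gets its own entry in the pivot model."""
    return num_apps * elements_per_app


def pivot_size_with_abstraction(app_elements, concept_of):
    """Elements that mean the same thing share one entry (their concept)."""
    return len({concept_of[e] for elements in app_elements.values() for e in elements})


# Hypothetical community of three applications with two elements each.
APP_ELEMENTS = {
    "app1": ["fac_nm", "fac_st"],
    "app2": ["facilityName", "stateCode"],
    "app3": ["site_name", "state"],
}
CONCEPT_OF = {
    "fac_nm": "FacilityName", "facilityName": "FacilityName", "site_name": "FacilityName",
    "fac_st": "State", "stateCode": "State", "state": "State",
}

print(pivot_size_without_abstraction(500, 100))  # the slide's 500 x 100 case
print(pivot_size_with_abstraction(APP_ELEMENTS, CONCEPT_OF))
```

In the toy community, six application elements collapse to two concepts; the slide's example makes the same point at the scale of 50,000 elements.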