
Title: Adding Value While Having Fun With EPA Data


1
Adding Value While Having Fun With EPA Data!
  • Brand Niemann
  • Co-Chair, Semantic Interoperability Community of
    Practice (SICoP)
  • Best Practices Committee (BPC), CIO Council, and
  • Enterprise Architecture Team, Office of
    Environmental Information
  • U.S. Environmental Protection Agency
  • February 23, 2005, and March 2, 2005

2
Data Interoperability Paradigm Shifts
  • Organizational
  • The National Infrastructure for Community
    Statistics (NICS) Community of Practice (CoP)
    wants to make its data NICS Ready by publishing
    it to the Web in such a way that others can
    easily reuse it! (Like buying a new TV that is
    HDTV Ready.)
  • Technical
  • The conceptualization of new technical systems
    suffers from technological presbyopia: the
    condition of being able to envisage things more
    clearly the farther they are from the present
    realization, even though the prospective users
    may grow weary and skeptical while waiting for
    the future to arrive.
  • Semantic
  • Ontology and ontology patterns are the applied
    use of two basic tenets of software design and
    architecture, indirection and abstraction (see
    Appendix).

3
Overview
  • Data is
  • Data can be re-purposed
  • Data can be mined
  • Data can be modeled
  • Data can be integrated and published
  • Data standards can evolve
  • Data architecture can be implemented in
    ontology-driven information systems
  • Appendix on Indirection and Abstraction

This is a semantic approach!
4
Data is
  • Unstructured
  • Example: Web pages
  • Semi-structured
  • Example: Documents (with Table of Contents)
  • Structured
  • Example: Databases (tables)
  • Demonstrations in the next section.
  • Notes
  • This is part of the new Data Reference Model
    (DRM) work.
  • But this was there in XML from the beginning -
    everything is a document (all three types).

5
Data can be re-purposed
  • Unstructured to Semi-structured
  • e.g. Region 4 2004 Press Releases.
  • Semi-structured to Structured
  • e.g. Region 4 Strategic Plan 2004.
  • Structured to Even More Structured
  • e.g. TRI data table.
  • Demonstrations of Each from the Region 4 Pilot.
  • Notes
  • We Can Add Metadata and Interoperability At the
    Same Time!
  • We Can Even Add Semantic Metadata and
    Interoperability!
  • This changes the way we look at portals, document
    and content management systems, registries, and
    repositories. There really is no difference:
    re-purposed data serves all these functions!

6
Data can be mined
  • The TRI database is about 8 GB and requires
    industrial-strength tools and analyses for data
    mining, indexing, conversion to XML, and storage
    and retrieval with XML Web Services. This pilot
    demonstrated that large EPA databases can be
    data mined and repurposed into XML
    repositories.
  • February 25-26, 2003, Data Mining Technology for
    Military and Government Applications Conference,
    XML Web Services for Data Mining and Repository
    US EPA Toxics Release Inventory, Brand Niemann,
    US EPA, and Data Mining Technology, Jim Walters,
    Insightful Corporation.
  • April 9, 2004, Presentations and Demonstrations
    to the EPA Business Intelligence and Analytics
    (BIA) User Group Meeting.
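The "conversion to XML" step can be pictured with a minimal sketch. The column names and values below are hypothetical placeholders, not the actual TRI schema, and the pilot's own tooling is not shown in the slides; this only illustrates turning table rows into an XML document with the Python standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample rows; NOT the real TRI layout.
SAMPLE_CSV = (
    "facility,chemical,medium,pounds\n"
    "Plant A,Toluene,air,1200\n"
    "Plant B,Lead,water,35\n"
)


def rows_to_xml(csv_text, root_tag="releases", row_tag="release"):
    """Convert each table row into an XML element with one child per column."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(item, column).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = rows_to_xml(SAMPLE_CSV)
print(xml_text)
```

Each row becomes one element, so the same records can be indexed or served by an XML-aware service without the original database.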

7
Data can be mined
http://www.insightful.com/
8
Data can be mined
I-Miner on TRI 2000 Public Release Data
Pipeline Architecture and Visual Workflow
9
Data can be mined
I-Miner on TRI 2000 Public Release Data
Histogram Plots
Rationale: Toxic chemical releases to different
media should be correlated; outliers suggest a
need to follow up with reporting facilities.
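A minimal sketch of that rationale, assuming hypothetical per-facility figures and a simple z-score on the water-to-air release ratio. The slide does not say which statistic I-Miner used, so this substitutes a plain ratio test for illustration only.

```python
from statistics import mean, pstdev

# Hypothetical (facility id, pounds to air, pounds to water); air must be > 0.
RECORDS = [
    ("F1", 100, 50),
    ("F2", 200, 100),
    ("F3", 150, 75),
    ("F4", 120, 60),
    ("F5", 80, 40),
    ("F6", 100, 500),
]


def flag_outliers(records, threshold=2.0):
    """Return facility ids whose water/air ratio is far from the group mean."""
    ratios = {fid: water / air for fid, air, water in records}
    values = list(ratios.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all facilities share the same ratio
    return [fid for fid, r in ratios.items() if abs(r - mu) / sigma > threshold]


print(flag_outliers(RECORDS))
```

Facilities flagged this way are candidates for follow-up, not confirmed reporting errors.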
10
Data can be modeled
  • EPA Region 4 Pilot
  • Phase 1
  • October 1, 2004, Implementing a Service-Oriented
    Architecture: Pilot Project Design and Initial
    Results.
  • October 19 and 25, 2004, Building Enterprise
    Architecture through Web Services for Remote
    Portlets (WSRP) in a Sample EPA Regional Portal,
    Rex Brooks, OASIS, and Ali Naizi, Oracle.
  • November 18, 2004, Connecting the Dots: CAP-WSRP,
    Presented at the XML 2004 Conference as Part of
    the OASIS Interoperability Demos That Showed the
    Integration of Web Services for Remote Portlets
    with the Common Alerting Protocol and Including
    GeoResponse for Use in the Federal Region 4
    Semantic Interoperability Pilot.
  • Phase 2
  • February 10, 2005, Semantic Interoperability
    Community of Practice Enablement (SCOPE) for the
    EPA Region 4 Pilot.

11
Current Ontology for Region 4 Pilot
12
Data can be modeled
  • The Roadmap
  • Re-purpose the 2004 EPA Region 4 Press Releases
    (unstructured to semi-structured) and extract the
    semantic metadata on word usage and frequency and
    use at the top of document.
  • Creates a Semantic Web Services node that can be
    integrated with other EPA nodes and Non-EPA
    nodes!
  • Compare to what we have now with the EPA Web site
    and the Exchange Network.
  • Need ontologies to supplement their topologies
    (topics and Web Services nodes, respectively).
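The first roadmap step (extract word usage and frequency, then place it at the top of the document) could look roughly like the sketch below. The stop-word list, the "Keywords:" header format, and the sample text are assumptions made for illustration; the pilot's actual tooling is not described in the slides.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "will", "with", "that", "this", "are", "was"}


def term_frequencies(text, top_n=5):
    """Count lowercase words longer than two letters, skipping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return counts.most_common(top_n)


def with_metadata_header(text, top_n=5):
    """Prepend a keyword line so the metadata sits at the top of the document."""
    terms = term_frequencies(text, top_n)
    header = "Keywords: " + ", ".join(f"{word} ({count})" for word, count in terms)
    return header + "\n\n" + text


# Hypothetical press-release text.
SAMPLE = (
    "EPA announces water quality grant. The grant funds water monitoring "
    "in Georgia. Water quality data will be published."
)
print(with_metadata_header(SAMPLE, top_n=3))
```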

13
Data can be modeled
Demo
Ontology-based search by month, geography, topic,
water semantics.
14
Data can be modeled
Note that search can be focused by node
hierarchy, by semantics, or by both.
15
Data can be modeled
Ontology (left) is designed by access to the
semantics (right) which is updated automatically
with the addition of new content (not like ETL).
16
Data can be modeled
EPA Web Site
The result is 42 documents out of 533, which then
have to be read and digested!
17
Data can be integrated and published
  • LandView 6 is an integration of
  • Census 2000
  • EPA EnviroFacts
  • USGS Geographic Names Information System (GNIS)
    features and
  • FGDC Standard Metadata.
  • LandView is publishing with
  • Interagency collaboration at very low cost
  • Public domain software that is easy to use
  • Database and software component reuse by design
    and
  • Geo-processing (e.g. population estimation) and
    Web-services out of the box (FileMaker Pro)
    (see next slide).

18
Data can be integrated and published
Population Estimator
LandView Home Screen
http://www.census.gov/geo/landview/
19
Data can be integrated and published
http://www.census.gov/geo/landview/lv6help/pop_estimate.html
20
Data standards can evolve
  • ISO 11179
  • EPA Date
  • The Date Data Standard provides for a standard
    representation of calendar date in data files for
    data interchange.
  • Suggested Upper Merged Ontology (SUMO)
  • Date
  • According to WordNet, the noun "date" has 8
    senses (see next slide).
  • SUMO is written in the SUO-KIF language
    (declarative semantics and machine processible)
    which has been translated to OWL Web Ontology
    Language.
  • See http://www.ontologyportal.org/
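As a rough illustration of a "standard representation of calendar date" for interchange, the sketch below normalizes a few common input styles to one format. It assumes ISO 8601 (YYYY-MM-DD) as the target, which the slide does not specify, and the list of accepted input formats is likewise an assumption.

```python
from datetime import datetime

# Input styles this sketch accepts (an illustrative, not exhaustive, list).
INPUT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or raise ValueError if unrecognized."""
    text = raw.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


print(normalize_date("February 23, 2005"))
print(normalize_date("03/02/2005"))
```

A syntactic standard like this fixes the representation only; which sense of "date" a field carries is the question the ontology side addresses.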

21
Data standards can evolve
  • Date
  • 1. The specified day of the month: "what is the
    date today?".
  • 2. A particular day specified as the time
    something will happen: "the date of the election
    is set by law".
  • 3. A meeting arranged in advance: "she asked how
    to avoid kissing at the end of a date".
  • 4. A particular but unspecified point in time:
    "they hoped to get together at an early date".
  • 5. The present: "they are up to date"; "we
    haven't heard from them to date".
  • 6. A participant in a date: "his date never
    stopped talking".
  • 7. The particular year (usually according to the
    Gregorian calendar) that an event occurred: "he
    tried to memorize all the dates for his history
    class".
  • 8. Sweet edible fruit of the date palm with a
    single long woody seed.

22
A Bit of Semantic Humor
  • Enterprise Architecture
  • Enterprise: A Star Trek Spaceship
  • Architecture: Blueprints
  • So, Blueprints of the Spaceship Enterprise!

23
Data architecture can be implemented in
ontology-driven information systems
  • Ontology-Driven Information Systems
  • Methodology Side: the adoption of a highly
    interdisciplinary approach
  • Analyze the structure at a high level of
    generality.
  • Formulate a clear and rigorous vocabulary.
  • Architectural Side: the central role in the main
    components of an information system
  • Information resources.
  • User interfaces.
  • Application programs.

See, for example, Nicola Guarino, Formal Ontology
and Information Systems, Proceedings of FOIS '98,
Trento, Italy, 6-8 June 1998.
24
Data architecture can be implemented in
ontology-driven information systems
  • The Roadmap
  • Basic Requirements For an Ontology.
  • Generic Process.
  • Process for the Indicator Pilot.
  • Explanation of the Ontology.
  • Schematic of the Ontology.
  • Demonstration.
  • Semantic Technology Profiles for the Data
    Reference Model (DRM).

25
Data architecture can be implemented in
ontology-driven information systems
  • Basic Requirements For an Ontology
  • 1. Finite controlled (extensible) vocabulary.
  • 2. Unambiguous interpretation of classes and term
    relationships.
  • 3. Strict hierarchical subclass relationships
    between classes.
  • 4. Few others

Source: Deborah McGuinness, Ontologies Come of
Age, in The Semantic Web: Why, What, and How, MIT
Press, 2002, page 6.
26
Data architecture can be implemented in
ontology-driven information systems
  • Generic Process
  • 1. Identify Scope
  • 2. Review Existing Ontologies
  • 3. Knowledge Acquisition
  • 4. Conceptualization
  • 5. Encode
  • 6. Test
  • 7. Iterate on Steps 3-6.

Source: Kathy Lesh, Standards Vocabularies in
Health Care, Kevric, Collaborative Expedition
Workshop 37, NSF, December 9, 2004.
27
Data architecture can be implemented in
ontology-driven information systems
  • Process for the Pilot
  • 1. Identify Scope: Indicators.
  • 2. Review Existing Ontologies: SUMO-WordNet.
  • 3. Knowledge Acquisition: Recent GAO Report
    Considered to Be An Excellent Starting Point.
  • 4. Conceptualization: See Next Two Slides.
  • 5. Encode: Added to Semantic View of Structured
    Document (See Slide 30).
  • 6. Test: Inviting Feedback.

Source: DRAFT National Infrastructure for
Community Statistics (NICS) Initial Ontology for
Structuring the Community of Practice and Its
Business Case and Products, by Brand Niemann,
January 20, 2005.
Informing Our Nation: Improving How to
Understand and Assess the USA's Position and
Progress, GAO, Report to the Chairman,
Subcommittee on Science, Technology, and Space,
Committee on Commerce, Science, and
Transportation, U.S. Senate, November 2004,
GAO-05-1.
28
Data architecture can be implemented in
ontology-driven information systems
  • Explanation of the Ontology
  • Key Terminology
  • Source: Summary-Background Footnotes
  • Topics
  • Source: Figure 1 and Appendix I
  • Level of Jurisdiction
  • Source: Table 4 of Comprehensive Systems
  • Organizational Types
  • Source: Table 10 of All Systems Studied
    Footnotes

Note: Meets McGuinness Requirements in Slide 25.
29
Data architecture can be implemented in
ontology-driven information systems
Schematic of the Ontology
  • Indicators
  • Topics: The Economy, Society and Culture, The
    Environment, Cross-Cutting
  • Organizations: Publicly led, Privately led, Led by
    public-private partnership
  • Jurisdictions: U.S. local/regional level, U.S.
    state level, National level outside the United
    States, Supranational level
Note that each of these classes can and does have
multiple instances underneath it, etc.
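The schematic can be read as a small class hierarchy. The sketch below encodes it as a child-to-parent table; the grouping of the leaf classes under Topics, Organizations, and Jurisdictions is inferred from the flattened slide and should be treated as an assumption, as should placing those three under Indicators. A single parent per class keeps the subclass relationships strictly hierarchical, in line with the basic requirements on Slide 25.

```python
# Child -> parent table (grouping inferred from the slide; an assumption).
PARENT = {
    "Topics": "Indicators",
    "Organizations": "Indicators",
    "Jurisdictions": "Indicators",
    "The Economy": "Topics",
    "Society and Culture": "Topics",
    "The Environment": "Topics",
    "Cross-Cutting": "Topics",
    "Publicly led": "Organizations",
    "Privately led": "Organizations",
    "Led by public-private partnership": "Organizations",
    "U.S. local/regional level": "Jurisdictions",
    "U.S. state level": "Jurisdictions",
    "National level outside the United States": "Jurisdictions",
    "Supranational level": "Jurisdictions",
}


def ancestors(cls):
    """Walk up the hierarchy, returning parents from nearest to root."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        if cls in chain:
            raise ValueError("cycle in hierarchy")
        chain.append(cls)
    return chain


def is_subclass_of(cls, other):
    """True if `other` appears anywhere above `cls`."""
    return other in ancestors(cls)


print(ancestors("U.S. state level"))
```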
30
Data architecture can be implemented in
ontology-driven information systems
Demo
See Pilot Projects at http://web-services.gov
Note: the folder names are either the ontology or
the knowledgebase instances.
31
Data architecture can be implemented in
ontology-driven information systems
  • Semantic Technology Profiles for the Data
    Reference Model (DRM)
  • Region 4 (See Slide 33).
  • Enterprise Architecture: FEA-Reference Model
    Ontology (FEA-RMO).
  • Taxonomies: Formal Ontologies (Michael Daconta
    Recently Published Paper, See Next Slide).
  • Indicators: See previous.
  • Community Statistics: In process with NICS.
  • FHA/NHIN: In process with Ontolog Forum.
  • ISO 11179: In process with DHS, xmdr.org,
    Ontolog Forum, etc.
  • More to be announced as part of the SICoP Module
    3 White Paper Development: Implementing the
    Semantic Web.

32
Formal Taxonomies for the U.S. Government
  • OWL Listing

    <?xml version="1.0"?>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
        xmlns="http://www.owl-ontologies.com/unnamed.owl#"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xml:base="http://www.owl-ontologies.com/unnamed.owl">
      <owl:Ontology rdf:about=""/>
      <owl:Class rdf:ID="Transportation"/>
      <owl:Class rdf:ID="AirVehicle">
        <rdfs:subClassOf rdf:resource="#Transportation"/>
      </owl:Class>
      <owl:Class rdf:about="#GroundVehicle">
        <rdfs:subClassOf rdf:resource="#Transportation"/>
      </owl:Class>
      <owl:Class rdf:about="#Automobile">
        <rdfs:subClassOf>
          <owl:Class rdf:ID="GroundVehicle"/>
        </rdfs:subClassOf>
      Etc.

Transportation Class Hierarchy
Source: Formal Taxonomies for the U.S.
Government, Michael Daconta, Metadata Program
Manager, US Department of Homeland Security,
XML.com, http://www.xml.com/pub/a/2005/01/26/formtax.html
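To show how a listing like this yields the class hierarchy, here is a minimal sketch that reads subclass pairs with the Python standard library. The embedded XML is a shortened version of the slide's listing with the truncated part closed off so it parses; that closing, and the helper names, are assumptions for illustration rather than part of the source article.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Shortened listing, closed off so that it is well-formed (an assumption).
OWL_XML = f"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="Transportation"/>
  <owl:Class rdf:ID="AirVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:about="#GroundVehicle">
    <rdfs:subClassOf rdf:resource="#Transportation"/>
  </owl:Class>
  <owl:Class rdf:about="#Automobile">
    <rdfs:subClassOf>
      <owl:Class rdf:ID="GroundVehicle"/>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def class_name(elem):
    """Read the class name from rdf:ID, rdf:about, or rdf:resource."""
    for attr in ("ID", "about", "resource"):
        value = elem.get(f"{{{RDF}}}{attr}")
        if value:
            return value.lstrip("#")
    return None


def subclass_pairs(xml_text):
    """Return (child, parent) pairs for top-level owl:Class elements."""
    root = ET.fromstring(xml_text)
    pairs = []
    for cls in root.findall(f"{{{OWL}}}Class"):
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            nested = sub.find(f"{{{OWL}}}Class")
            parent = class_name(nested if nested is not None else sub)
            pairs.append((class_name(cls), parent))
    return pairs


for child, parent in subclass_pairs(OWL_XML):
    print(f"{child} is a subclass of {parent}")
```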
33
Data architecture can be implemented in
ontology-driven information systems
  • Additional Demos
  • Building Enterprise Architecture through Web
    Services for Remote Portlets (WSRP) in a Sample
    EPA Regional Portal (Oracle 10g)
  • Use of Common Alerting Protocol (CAP)
  • For example: Real-time water quality monitoring
    data from Florida.
  • GeoResponse.com
  • Voice GIS Multimodal Notification,
    Distributed XML Web Services
  • For example: January 6, 2005, Norfolk Southern
    Graniteville Train Derailment Scenario.
  • Emergency Response Architecture
  • Semantic Web Applications for National Security
    (SWANS) Conference, April 7-8, 2005, Trade Show
    Demonstration.

34
Appendix on Indirection and Abstraction
  • Ontology and ontology patterns are the applied
    use of long-time, fundamental engineering
    patterns of indirection and abstraction.
  • Chapter 7 in Adaptive Information: Improving
    Business Through Semantic Interoperability, Grid
    Computing, and Enterprise Integration, Pollock
    and Hodgson, Wiley-Interscience, 2004.

35
Appendix on Indirection and Abstraction
  • Selected tidbits
  • Ontology is simply the enabler for software
    engineers and architects to apply core problem
    solving patterns in new and innovative ways.
  • Indirection is a concept that is used to plan for
    future uncertainty.
  • Simply put, indirection is when two things need
    to be coupled, but instead of coupling them
    directly, a third thing is used to mediate
    direct, brittle connections between them.
  • By leveraging indirection in the fundamental
    aspects of the technology, semantic
    interoperability is built for change, and this
    built-in flexibility differentiates semantic
    technologies from other information-driven
    approaches.
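A minimal sketch of indirection as described above: two applications with different field names are never wired to each other directly; a shared concept table sits between them as the "third thing". The application names, field names, and concept names are all hypothetical.

```python
# Each application maps its own field names to shared concepts (hypothetical).
CONCEPT_MAP = {
    "app_a": {"fac_name": "FacilityName", "rel_lbs": "ReleaseQuantity"},
    "app_b": {"facility": "FacilityName", "quantity": "ReleaseQuantity"},
}


def to_concepts(source, record):
    """Translate an application's record into shared concept names."""
    mapping = CONCEPT_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def from_concepts(target, concepts):
    """Translate shared concept names into the target application's fields."""
    inverse = {concept: field for field, concept in CONCEPT_MAP[target].items()}
    return {inverse[c]: v for c, v in concepts.items() if c in inverse}


def translate(source, target, record):
    """Route a record through the concept layer instead of a direct mapping."""
    return from_concepts(target, to_concepts(source, record))


print(translate("app_a", "app_b", {"fac_name": "Plant 1", "rel_lbs": 120}))
```

Adding a third application means adding one entry to the table; none of the existing mappings change, which is the flexibility the slide attributes to indirection.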

36
Appendix on Indirection and Abstraction
  • Architects of both software and physical
    structures routinely use the principle of
    abstraction to isolate complex components and
    reduce the scope of a problem to be solved (see
    the forest for the trees). By definition,
    ontology is abstraction and is the ultimate
    abstraction tool for information.
  • Example: Imagine a scenario of using a pivot data
    model without abstraction. It would require the
    aggregation of all of the data elements in a
    particular community. The result could be a
    community of 500 applications, each application
    with approximately 100 data elements, requiring a
    pivot model with about 50,000 data elements. An
    abstracted model could conceivably be capable of
    representing this information in far fewer than
    about 100 data elements!
  • See Demonstrations of SICoP Pilot Projects for
    EPA Managers, August 16, 2004, Semantic
    Information Management (Unicorn): Integrating
    Health and Environmental Information to Protect
    American Children, at http://web-services.gov
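The arithmetic behind the pivot-model example on this slide, as a short check. The figures are the slide's own round numbers; the value of 100 for the abstracted model is taken as the upper bound the slide mentions, not a measured result.

```python
applications = 500
elements_per_application = 100

# Without abstraction, the pivot model aggregates every element.
pivot_elements = applications * elements_per_application

# Upper bound quoted on the slide for an abstracted model.
abstracted_elements = 100

reduction_factor = pivot_elements / abstracted_elements
print(pivot_elements, reduction_factor)
```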