1The Semantic Web and Health Information Systems
Parsa Mirhaji, MD, The University of Texas Health Science Center at Houston
Robert Coyne, PhD, TopQuadrant Inc.
SICoP Conference 2 (April 25, 2007)
2Information Integration Dilemma
Semantic Drift and Schema Change
3Translational Bio-Informatics
4Public Health Preparedness
5Context is important
6State of the art
7Situation Awareness Reference Architecture
(SARA) Framework for Design and Evaluation of
Public Health Preparedness Systems
8Dimensions of SARA
9Dimensions of SARA
10Services and Ontology Models: Building Blocks of Future Systems
11SAPPHIRE: Situation Awareness and Preparedness for Public Health Incidents using Reasoning Engines
12SAPPHIRE high level model
14Data Sources - 1
- Triage Data
- Patient Demographics (Age, Ethnicity, Gender)
- Vital Signs (T, RR, PR, PO2)
- Chief Complaints
- Nurse Notes
- Vital Signs
- Complete Review of Systems: General, Respiratory, Neurological, Gastrointestinal, Dermatological, etc.
- Past Medical and Surgical HX
- Medications, Past Medications, Home Medications
- Interventions, Procedures
- Outcome
- Discharge and Disposition
- Past Medical and Surgical HX
- From 8 community hospitals and 16 different IT implementations
- Structured, semi-structured, and unstructured entries
- Automated submissions through HTTP
- Accounts for about 30% of Houston ED visits
- Data transmission every 10 minutes or less
- Over 250,000 concepts, 82 million instances and
growing
16Data Sources - 2
- Texas Commission for Environmental Quality (TCEQ)
- - Pollution Parameters
- CO, SO2, H2S, NO, NO2, O3, TNMOC, CH4, ...
- - Meteorological Parameters
- Temperature (Outdoor, Dew Point)
- Relative Humidity
- Radiation (Solar, Ultraviolet, Net Radiation)
- Barometric Pressure
- Precipitation
- - Chromatography Data
- Ethane, Methylcyclopentane, 1,2,4-Trimethylbenzene, Ethylene, 2,4-Dimethylpentane
- From 18 locations, 2 sensors each
- Data Transmission from TCEQ hourly
- 250 concepts on each message
- Air Quality indices calculated twice daily
17- Data Provisioning Layer
- Data Collection and Refinement Layer
18Automated ontology learning
XML Data
19Clinical Text Understanding
- A generalizable and extensible method of clinical text understanding
- To extract what is important from the unstructured clinical text: Signs, Symptoms, Diseases or Illnesses, Procedures and Medical Interventions, Findings
- To represent relevant context according to the whole text: Anatomic, Patho-physiological Context, Chronicity (Chronic, Acute Problem), Quantities, Qualities (Large, Solid, Severe), Temporal Aspects (For how long, since when?), Modifiers (Negation, Uncertainties, etc.), Presenter (who says so), Causative Context (Social, Physical, etc.)
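As a rough illustration of what such extraction might produce, the sketch below models one extracted finding with a few of the context dimensions listed on this slide (negation, chronicity, temporal aspect, presenter) and fills them with a naive sentence-level check. The class name, field names, trigger-word list, and regular expressions are all assumptions made for illustration; they do not describe SAPPHIRE's actual model or algorithm.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical negation triggers; a real system would need a far richer lexicon.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")


@dataclass
class Finding:
    concept: str                       # sign, symptom, disease, procedure, ...
    negated: bool = False              # Modifier: negation
    chronicity: Optional[str] = None   # "chronic" or "acute", if stated
    duration: Optional[str] = None     # Temporal aspect, e.g. "2 weeks"
    presenter: Optional[str] = None    # who says so


def extract_finding(sentence: str, concept: str,
                    presenter: str = "patient") -> Optional[Finding]:
    """Return a Finding for `concept` if it is mentioned in `sentence`."""
    text = sentence.lower()
    pos = text.find(concept.lower())
    if pos < 0:
        return None
    # Naive rule: a trigger word anywhere before the concept negates it.
    before = text[:pos]
    negated = any(re.search(r"\b" + re.escape(t) + r"\b", before)
                  for t in NEGATION_TRIGGERS)
    # Sentence-level lookup of chronicity and duration (deliberately crude).
    chronicity = next((c for c in ("chronic", "acute")
                       if re.search(r"\b" + c + r"\b", text)), None)
    m = re.search(r"for (\d+ (?:hour|day|week|month|year)s?)", text)
    return Finding(concept, negated, chronicity,
                   m.group(1) if m else None, presenter)
```

For example, `extract_finding("Patient denies fever.", "fever")` yields a finding marked as negated, while `extract_finding("Chronic cough for 2 weeks", "cough")` records the chronicity and the duration. Scoping context to the whole sentence is the simplest possible choice here and will misattribute context when one sentence mentions several findings.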
20Semantic Structure
21Survey On Demand System (SODS)
- SODS enables ontology-driven, just-in-time survey design and implementation
- Deploys surveys on all platforms, online or offline, wired or wireless
- Integrates automatically all information, from manual entries to automated submissions
- Integrates all information collection activities under one integrative approach
22Other integration models
- Ontology for clinical information and EHR
- Ontology for Environmental Safety and Protection
- LOINC and NNDS ontology
- An ontology visualization and navigation
- PHIN-LDM ontology
- UMLS Semantic Net: An OWL Translation
- UMLS Vocabulary Services Using Web Services
- MMTX Text to UMLS Web Services
Preliminary Studies
23- Data Provisioning Layer
- Data Refinement Layer
- Classification Layer
24ILI- Logical Model
Modifier
Quantifier
25- Data Provisioning Layer
- Data Refinement Layer
- Classification Layer
- Signal and Cluster Detection
26Ontology-enabled Processing for Analytical Layers - OPAL
- Multidimensional analytics and data mining for Semantic Web infrastructure
- Abstraction layer with strong semantics for data warehousing
- Enables ontological modeling for OLAP cubes
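One way to picture "ontological modeling for OLAP cubes" is to describe a cube's dimensions and measures as triples and let a generic roll-up read that description instead of hard-coding it. The sketch below does this with plain Python data; the cube name, predicate names (`hasDimension`, `hasMeasure`), and fact layout are hypothetical and are not taken from OPAL itself.

```python
from collections import defaultdict

# Hypothetical cube description as (subject, predicate, object) triples.
CUBE_MODEL = [
    ("EDVisitCube", "hasDimension", "hospital"),
    ("EDVisitCube", "hasDimension", "chief_complaint"),
    ("EDVisitCube", "hasMeasure", "visits"),
]


def objects(model, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in model if s == subject and p == predicate]


def roll_up(facts, model, cube, group_by):
    """Sum every measure of `cube` over `facts`, grouped by `group_by`.

    Dimensions and measures are looked up in the model, so changing the
    model changes the analysis without touching this function.
    """
    dimensions = objects(model, cube, "hasDimension")
    measures = objects(model, cube, "hasMeasure")
    unknown = [d for d in group_by if d not in dimensions]
    if unknown:
        raise ValueError(f"not dimensions of {cube}: {unknown}")
    totals = defaultdict(lambda: dict.fromkeys(measures, 0))
    for fact in facts:
        key = tuple(fact[d] for d in group_by)
        for m in measures:
            totals[key][m] += fact[m]
    return dict(totals)
```

Calling `roll_up(facts, CUBE_MODEL, "EDVisitCube", ["hospital"])` groups visit counts per hospital; adding a new `hasDimension` triple makes that dimension available for grouping without changing the function.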
27OPAL Reference Architecture
External data feeds
Fact Store
COPLAO
inferences
RDF
RDB
OPAL Reasoner
RDF Archive
Query Agents
Future extension
28Hurricane Katrina Relief Efforts at Houston
Manvel, TX
User Profiler
UTHSC
UTHSC- BSIRC
Integrator
NLP, Term Resolver
- UT-Clinics
- Houston-DHHS
- GR Brown
Governance, Authentication
Information Acquisition
HTML entry
29SAPPHIRE Implementation
We Are Here
30Implementation Platform
- 1- TopBraid Composer as Ontology Management Tool
- 2- Jena from HP as API for Semantic Web
- 3- Eclipse Java Development Environment
- 4- Oracle 11g Beta 4 (Linux) as Semantic Repository
- 5- Pellet and Jena OWL Micro Reasoner
- 6- Services Oriented Architecture
- 7- Microsoft SQL Server 2005 XML archive and Analysis Services
- 8- IBM Dual Xeon 2.8 GHz / 3 GB RAM Blade Server
- 9- EqualLogic iSCSI SAN (4 TB)
- 10- GB Ethernet LAN
31Challenges
- State of the frameworks
- Maturity of Tools
- Knowledge Engineering and Ontology Development
- Reasoning and Rules Support
- Scalability and Performance
Academic-Industrial Partnership
32Outline
University of Texas Health Science Center and
TopQuadrant
- Challenges for large scale semantic web systems
- Practicing effective Academic-Industry partnership
- Sampling of ways we are cooperatively addressing key challenges
- Forging shared understanding / value propositions with stakeholders -- Solution Envisioning
- Using Reference Architectures and Capability Models to inform ontology architectures and staged initiatives
- Tackling scalability and performance: engineering inferencing and rules
33Adoption of Semantic Technology
Knowledge/Experience
Adoption
Current State
Confidence in ability to implement and scale
Advocacy
2005
Positive experiences of the power of RDF/OWL
Enthusiasm
2003
People are now asking "How" questions as opposed
to "Why" and "What".
Curiosity
2002
Skepticism
Increase in attendance at trainings and more
evidence of coverage at conferences
Commitment to Action
34Ontology Engineering Lifecycle
Lifecycle phases shown in the diagram: Creating, Populating, Validating, Deploying, Evolving/Maintaining
Other diagram labels: Stakeholder Analysis, Scenarios, Capabilities, Competency Questions, Model Architecture, Knowledge Sources, Solution Development
35Perspective from Recent IEEE Article
- "Building ontologies is inherently a social process constrained by technical, social, economic and legal bottlenecks. That means that researchers must bring the same interest they do to purely technical issues to addressing the other challenges reality imposes on ontology projects."
From "Possible Ontologies: How Reality Constrains the Development of Relevant Ontologies", Martin Hepp, Digital Enterprise Research Institute, University of Innsbruck, IEEE Internet Computing, 1089-7801/07, Editor: Charles Petrie, petrie_at_nrc.standford.edu
36TopSAIL Workproduct Dependency Map
Situation Modeling
Capability Analysis
Capability Modeling
Ontology Analysis
Ontology Modeling
Ontology Architecture
Ontology Environment
Stakeholder Model
Capability Model
Capability Architecture
Enterprise Situation Model
Ontology Patterns
Ontology Schemas
Knowledge Map
Systems in Context
Ontology Content
Ontology Modeling Guidance
Knowledge Sources
Domain Exemplars
Controlled Vocabulary
TopSAIL: TopQuadrant's Semantic Application Integrated Lifecycle method
37Forging Shared Understanding and Value Propositions: Solution Envisioning Artifacts from SARA/SAPPHIRE and OPAL Projects
38Public Health Preparedness (PHP) Value Net
40WP Reference Architecture, Capability Model and
Ontology Listing
41Capability Maps for Value Propositions: Building Lines of Reasoning
Capabilities serve as enablers that overcome
Barriers and Challenges to attain desired
Results and Outcomes
42Shared Line of Reasoning: Illustrative Thread
Capability: OPAL (motivation for and goal of)
Challenge: Integrating and enhancing analytics services seamlessly with semantic-driven applications
isOvercomeBy
creates
Barrier: Conventional analytic capabilities are inadequate for semantic-driven systems, especially with respect to scalability
Force: Domains/applications exist that require integration of complex and massive amounts of data coupled with analytics processing where the analytics dimensions cannot all be foreseen upfront or are constantly changing.
encounters
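The thread on this slide can be read as a small labelled graph. The sketch below stores it as triples and walks it from the Force to the Capability. The transcript preserves the relation labels but not which nodes each one connects, so the edge assignment used here (Force creates Challenge, Challenge encounters Barrier, Barrier isOvercomeBy Capability) is an assumed reading, and the function name is hypothetical.

```python
# Assumed edge assignment; the transcript does not record the arrow endpoints.
THREAD = [
    ("Force", "creates", "Challenge"),
    ("Challenge", "encounters", "Barrier"),
    ("Barrier", "isOvercomeBy", "Capability: OPAL"),
]


def line_of_reasoning(triples, start):
    """Follow outgoing edges from `start` and return the alternating
    node / relation / node path, stopping at a dead end or a cycle."""
    edges = {s: (p, o) for s, p, o in triples}  # one outgoing edge per node
    path, node, seen = [start], start, {start}
    while node in edges:
        predicate, node = edges[node]
        if node in seen:
            break
        seen.add(node)
        path += [predicate, node]
    return path
```

Starting from `"Force"`, the function returns the full chain ending at the OPAL capability, which is the kind of traceable justification a capability map is meant to give stakeholders.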
43OLAP As-Is Process Challenges
Process cycle is human labor intensive and
requires multiple transformations of intent and
meaning (which is not explicitly represented)
Currently, no direct connection of this process
to ontology-based systems is possible
Opportunity for improvement (technology
possibility): utilize ontology to model this
part of the process.
Changing business needs or changing data requires
going through the whole cycle again.
The OLAP Cube is overloaded to perform not only
its operational role, but to represent the design
rationale for itself
An analytics model typically must become more
elaborated over time with multiple manual steps.
44From As-Is to To-Be: OPAL To-Be Process Benefits
Direct connection to ontology-enabled
applications is possible
Business needs change
45Use of Reference Architectures and Capability
Models to Inform Ontology architecture and
Staging and Evolution of Deployed Solutions
46Ontology Architecture
- Modular ontologies designed for reuse and layering
- Definition and scope of the models need to be accessible to all interested parties
47OPAL must Support Specific Capabilities in SARA
48Federal Enterprise Architecture
Performance Reference Model (PRM)
- Government-wide Performance Measures and Outcomes
- Line of Business-Specific Performance Measures and Outcomes
Business Reference Model (BRM)
- Lines of Business
- Agencies, Customers, Partners
Business-Driven Approach (Citizen-Centered Focus)
Service Component Reference Model (SRM)
Component-Based Architectures
- Service Layers, Service Types
- Components, Access and Delivery Channels
Technical Reference Model (TRM)
- Service Component Interfaces, Interoperability
- Technologies, Recommendations
Data Reference Model (DRM)
- Business-focused data standardization
- Cross-Agency Information exchanges
49Example of a Registry Showing DOD extensions to FEA
Agency-specific extensions shown in green
Hot links to TRM areas
52PHIN Modeling Levels and Respective Roadmap of
Capabilities and Value Propositions
PHIN Preparedness Documentation
53Scalability and Performance: Engineering Inferencing and Rules
54Example of a Hybrid Solution Strategy for
Engineering Inferences for Query Interactions
55Toward an integrated best-of-breed inference platform for large scale Semantic Web systems
- Goal: develop a high-performance integrated inferencing infrastructure for OWL/RDF and Rules.
- Status: The state of research at the moment is ahead of the state of practice in inference platform products.
- Focus: solve the engineering problems that stand between the theoretically best known inference solution and its implementation
- Strategy: integrated best-of-breed approach
56Summary: Academic-Industry Partnership
- A large space of challenges exists for large scale systems enabled by semantic web technology
- Two key interrelated dimensions of challenges are
- Technological / Technical
- Social / Organizational
- Solution Envisioning practices and modular ontology architectures (based on a Reference Architecture) can help to mitigate these challenges.
- Partnership is needed to build scalable, industrial strength semantic applications and systems in complex domains.
57EXTRA
58TopQuadrant Products and Solution Areas
Tools/Products: TopBraid Composer
Ontology Engineering Method: TopSAIL
IT Strategy and Design Method: Solution Envisioning with Capability Cases
Robert Coyne
- PhD in Computer Aided Design, Carnegie-Mellon U.
- Use Case author/trainer and OO Method expert in IBM's Object Technology Practice, 1995-98
- CTO, Solution Technology International, 1998-2002
- Exec. Partner of TopQuadrant, 2002-Present
- Semantic Integration
- Search, Collaboration and e-Workspace
- Enterprise Architecture
- Semantic Web Services
59Scalability-Performance Problem Statement
- Most semantic technology products today are either databases with inferencing capabilities added on, or inference engines with data stores added on.
- For high-performance inferencing, there needs to be an intimate relationship between these two components.
- For example, the data store must offer access mechanisms tuned to the needs of the inference engine.
- The combined system must also have a synchronized strategy for mediating between real time and cached inferences.
- By "high-performance", we are referring to three types of measures, all of which interact.
- size: How many triples can the system store, and make inferences on?
- query speed: How quickly can queries be answered? How does this degrade with different levels of inferencing capability?
- throughput: How many concurrent requests can the system handle at one time?
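To make the trade-off between real-time and cached inferences concrete, the sketch below implements a toy in-memory triple store with one inference (transitive `subClassOf`), a cache that is invalidated on every update, and helpers for the size and query-speed measures named above. It is a minimal illustration under stated assumptions, not a description of any product's design; the class, method names, and invalidation policy are hypothetical, and throughput under concurrent load is not covered.

```python
import time


class TinyStore:
    """Toy triple store with real-time and cached subclass inference."""

    def __init__(self):
        self.triples = set()
        self._cache = {}

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        self._cache.clear()  # simplest policy: any update drops all cached inferences

    def size(self):
        """'size' measure: number of stored triples."""
        return len(self.triples)

    def superclasses(self, cls):
        """Real-time inference: walk subClassOf edges on every call."""
        found, stack = set(), [cls]
        while stack:
            current = stack.pop()
            for s, p, o in self.triples:
                if s == current and p == "subClassOf" and o not in found:
                    found.add(o)
                    stack.append(o)
        return found

    def superclasses_cached(self, cls):
        """Cached inference: compute once, reuse until the next update."""
        if cls not in self._cache:
            self._cache[cls] = self.superclasses(cls)
        return self._cache[cls]


def time_query(fn, *args, repeats=100):
    """'query speed' measure: mean seconds per call over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats
```

Comparing `time_query(store.superclasses, "Influenza")` with `time_query(store.superclasses_cached, "Influenza")` on a growing store shows how the three measures interact: the cached path stays fast as size grows, but only while updates are rare enough that the cache survives.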
60Ten directions for research (source: Ralph Hodgson, circa Aug07)
- Multi-paradigm reasoning - datalog, temporal, bayesian and other logics, etc.
- Ontology modularity - how to do partial imports, for example
- Ontology generation - how to build proto-ontologies
- Structure/Semantic-preserving transformation systems - aka OSERA :-)
- Information Flow Frameworks
- Explanation-Based Systems
- Ontology Merging Support
- Distributed Reasoning Systems
- Scalability and Performance
- Functional Programming and Ontologies - how to
put behavior into the mix
61Ontology Architecture Requirements Specification
(OARS)
- Ontology of Ontologies
- Stakeholders
- Systems
- Competency Questions
- Capability Questions
- Architecture Dependencies
- Ontology Reuse
62OARS Example: SAPPHIRE Ontologies
63Government / Regulatory (from field work, Candidate Forces > Challenges > Capabilities > Outcomes)
Capabilities Advisor for OMB-FEA Compliance
On-line Consultable Enterprise Architecture
64Highlighted line of reasoning from the previous slide.
On-line Consultable Enterprise Architecture
Capabilities Advisor for OMB-FEA Compliance
Relation labels in the diagram: encounters, creates, is overcome by, enables
65Capability Case: Capabilities Advisor for Federal Agencies
Intent: To provide a system that can advise Federal agencies on who has or intends to have what capabilities in support of services within lines of business. Uses the FEA reference model to advise on capabilities that are available or are being built to support particular services and lines of business. By having consultable models of FEA, the system can make connections between requirements and capabilities and give advice based on inferences.
Solution Stories 1 (Summary)
1. eGov FEA-Based Capabilities and Partnering
Advisor for FEA-OMB Compliance
The FEA Capabilities Advisor uses inferencing across FEA models and their linkages to support portfolio management across agencies. In any reuse initiative that attempts to save money through collaboration, having timely and accurate information is crucial for efficiency and effectiveness. The Advisor enables an up-to-date representation of the structure, services and IT capabilities of government agencies. This federated approach to IT Portfolio Management can help to solve interoperability, integration, capability reuse, accountability and policy governance issues in and across agencies.
66CATWOE for the OPAL Project
- Customers or Clients: Anyone who wants to integrate the use of semantic technology (for solving complex problems) in conjunction with the use of OLAP type of analytics processing (e.g. Parsa, Jack Smith)
- Actors or Agents
- Transformation: Improved process cycle for deploying analytics capabilities for ontology-driven (semantic technology-enabled) systems based on creative use of and without modification of mature OLAP services. Adding a complementary, ontology-driven abstraction layer on top of existing analytics services.
- Worldview: There are classes of problems that require the use of semantic technology (e.g., new medical discoveries need a model of the system that explicitly represents the semantics) and also require the use of sophisticated analytics processing that cannot readily be achieved through direct use of conventional analytics services.
- Owner: Dr. Parsa Mirhaji, U of TX Health Science Center, and his associated research sponsors.
- Environment: current state of the world where semantic technology value propositions, solution architectures, components and tools are still emerging, and where OLAP technology, experience, product offerings and solutions are mature.