Title: NCI caDSR
1NCI caDSR Semantic Integration
Presented to Lawrence Berkeley National Labs
- Denise Warzel
- Associate Director, caDSR
- NCICB
- March 14, 2005
2Presentation Outline
- Putting NCI's semantic integration into context
- Driving factors behind NCI's metadata repository
- NCI Metadata repository (caDSR) role in caCORE
- caDSR Semantic Integration
- Role of concept mapping
- Metadata repository vs Vocabulary Services
- Concept linkage in caDSR
- UML Class diagrams represented in caDSR metadata
- caDSR tooling (if time)
- Is this what you want to hear?
- Priority?
3Credits
- NCICB
- Peter Covitz
- Denise Warzel
- Oracle
- Edmond Mulaire
- Ram Chilukuri
- Prerna Aggarwal
- Dan Ladino
- Christophe Ludet
- Shaji Kakkodi
- Jane Jiang
- ScenPro
- Bill McCurry
- Tom Phillips
- Robert Harding
- Jennifer Brush
- Larry Hebel
- Smita Hastak
- ISO
- ISO/IEC 11179 Parts 1-6
4Current User Base
Counts per context: Total CDEs in this Context / Released workflow status / Released and developed by this context / Reused from other contexts
- Cancer Biomedical Informatics Grid (caBIG): 820/466/180/61
- Center for Cancer Research (CCR): 821/573/506/12
- Clinical Data Interchange Standard Consortium (CDISC): 3/0
- Center for Cancer Imaging (CIP): 238/151/148/2
- Cancer Therapy Evaluation Program (CTEP): 8029/2432/2428/0.1
- Division of Cancer Prevention (DCP): 427/321/286/11
- National Heart Lung and Blood Institute (NHLBI): 0/0
- Early Detection Research Network (EDRN): 121/1/1/100
- Divisions of Population Sciences and Cancer Control (PS CC): 85/9
- Specialized Programs of Research Excellence (SPOREs): 719/197/120/39
- Cancer Ontologic Research Environment (caCORE): 1028/810/810/0
5NCI's Semantic Integration context
- Sharable data, aggregatable across research domains
- Unambiguous data characteristics
- to convey semantic, syntactic and lexical meaning
- Human and Machine understandable
- EMPHASIS ON MACHINE UNDERSTANDABLE
- Tools to create, maintain, deploy data standards
- Widely and publicly accessible
- Self-harmonizing
6caCORE Components
- caCORE is the open-source foundation upon which
the NCICB builds its research information
management systems
Bioinformatics Objects
Data Standards
Enterprise Vocabulary
7EVS and caDSR Distinctions
- caDSR is a metadata repository
- maintains metadata that lets a user locate, in sufficient detail, the
correct defining characteristics of a datum (an instance of a specific
concept) collected and stored on a computer
- EVS is a terminology server
- provides services for synonymy, mapping between
vocabularies, hierarchical structures,
subconcepts, superconcepts, broader, narrower,
roles, semantic type, etc.
8caCORE Infrastructure wiring
9Why ISO/IEC 11179?
- What is this datum?
- Provides concrete guidance on the creation and
maintenance of discrete data element attributes
and metadata (semantics) enabling the formulation
of data elements in a consistent, standard manner
- Metadata Repository/Registry
- Framework for data element standardization and
registration allows the creation of a shared data
environment in much less time and with much less
effort than it takes using conventional data
management methodologies.
- Adoption of 11179 Allowed us to Get on with it
10Why ISO/IEC 11179?
- Using this framework
- what is it?
- how do I want to display it?
- categorize it?
- message it?
- where is it used and by whom?
- what is its history? (lifecycle management)
11ISO/IEC 11179 Administered Item Administration
Record and Common Attributes
- Unique Identifier
- Data id version
- (all NCI content shares a common Registration Authority Identifier, RAI)
- Administrative Status
- Workflow status
- Registration Status
- Creation Date
- Administrative Note(s)
- Effective Date
- Change Date(s)
- Change Description(s)
- Origin
- Until Date
- Created By
- Modified By
- Name(s)
- Definition(s)
- Stewardship Information
- Submitter Information
- Reference Document(s)
- Classifications
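The common attributes listed above could be held in a record along the following lines. This is only a minimal sketch: the class name `AdministeredItem`, the field names, the identifier format, and the sample values are all assumptions made for illustration, not the caDSR schema or the normative ISO/IEC 11179 model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class AdministeredItem:
    """Illustrative record of the common attributes; not the caDSR schema."""
    data_id: str                 # data identifier
    version: str                 # version of the item
    workflow_status: str         # administrative (workflow) status
    registration_status: str
    names: List[str]
    definitions: List[str]
    created_by: str
    creation_date: date
    origin: Optional[str] = None
    effective_date: Optional[date] = None
    until_date: Optional[date] = None
    classifications: List[str] = field(default_factory=list)
    reference_documents: List[str] = field(default_factory=list)

    def unique_identifier(self, rai: str) -> str:
        """Combine the shared RAI with data id and version (assumed format)."""
        return f"{rai}/{self.data_id}/{self.version}"


# Hypothetical sample values, chosen only to exercise the record.
item = AdministeredItem(
    data_id="0000001",
    version="1.0",
    workflow_status="Released",
    registration_status="Standard",
    names=["Chemopreventive Agent Name"],
    definitions=["(definition text)"],
    created_by="curator",
    creation_date=date(2005, 3, 14),
)
print(item.unique_identifier("EXAMPLE-RAI"))
```

Because all NCI content shares one RAI, the sketch passes it in as a parameter rather than storing it on every item.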
12ISO/IEC 11179 and Extensions
Form
Concept Class
The Concept Class (coming in new 11179
specification) Provides Semantic Linkage
Derivation_Rule
13Why are vocabularies/ontologies important?
- Goal: semantically unambiguous interoperability
- For Humans
- Words could be enough within a specific context
or domain where a common lexicon is already used
- For Machines
- Words are not immutable; absent a specific
context, it is difficult or impossible to ensure
consistent and repeatable interpretation
14Implementation
- Are names and words for definitions enough to
create unambiguous, interoperable,
self-harmonizing metadata?
- No
- Within different domains the same words mean
different things
- site, trial, agent
- Synonyms?
- Phraseology?
- Not computable
- Was ISO/IEC 11179 flawed?
- No, not if you have a central body creating
metadata.
- We needed to support simultaneous development of
data elements in different research domains that
could be harmonized later with minimal effort.
- Draw from standard terminologies -> incorporate
into a Cancer terminology
15Challenges
- Data Element curators are not necessarily
vocabulary experts
- ISO/IEC 11179 provides the framework
- But how to make it something that could be
self-harmonizing and computed without a human having
to read and interpret definitions?
16The Solution?
- Leverage EVS
- Separate the curation of concepts from the
curation of ISO/IEC 11179 metadata
- Leverage semantics of ISO/IEC 11179
- Start with the building blocks of Administered
items
- Link to controlled vocabulary in the form of
concept codes
- During metadata curation
- right place
- right time
- Naming and defining
- Applying naming conventions to build up the
subsequent components
17Summary caDSR Semantic Integration
Conceptual Domain: Agent
Object: Agent
Property: Chemopreventive
Data Element Concept: Chemopreventive Agent
Representation: Name
Value Domain: Chemopreventive Agent Name
Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, Ursodiol
Data Element: Chemopreventive Agent Name
Classification Schemes: caDSRTraining
Context: caCORE
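The way the pieces in this summary build on one another can be sketched in code. The classes below are hypothetical, and the naming rule (property, then object class, then representation term) is an assumption chosen only because it reproduces the names shown on this slide; it is not presented as caDSR's actual naming convention.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str
    prop: str  # the ISO/IEC 11179 Property

    @property
    def name(self) -> str:
        # Assumed rule: property followed by object class.
        return f"{self.prop} {self.object_class}"


@dataclass(frozen=True)
class ValueDomain:
    name: str
    representation: str
    valid_values: Tuple[str, ...]


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    value_domain: ValueDomain
    context: str

    @property
    def name(self) -> str:
        # Assumed rule: concept name followed by the representation term.
        return f"{self.concept.name} {self.value_domain.representation}"


dec = DataElementConcept(object_class="Agent", prop="Chemopreventive")
vd = ValueDomain(
    name="Chemopreventive Agent Name",
    representation="Name",
    valid_values=("Cyclooxygenase Inhibitor", "Doxercalciferol",
                  "Eflornithine", "Ursodiol"),
)
de = DataElement(concept=dec, value_domain=vd, context="caCORE")
print(dec.name)  # Chemopreventive Agent
print(de.name)   # Chemopreventive Agent Name
```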
183.0 caDSR Implementation
- Enhance Semantic Integration
- Concept Class enabled and concept relationship to
data model
- Replace Alternate Names concept linkage
- Add rule for linking concepts together
- Order of concepts conveys semantic meaning
- Add concept linkage to support Value Domains
- Referenced Parent Concept non-enumerated
- Changes to UML to caDSR mapping
- Changes to UML Loader
19UML Classes as ISO/IEC 11179 Metadata
20Workflow and Tools
1. Create UML Diagram with EA or similar UML
tool.
2. Export to XMI.
3. Run Semantic Connector
4. Run UML Loader
5. Post Load Curation
- Create appropriate conceptual domains
- Create enumerated value domains
21UML Loader Mapping
[Diagram: UML Model mapped to caDSR Metadata. Labels: Data Element, Data Element Concept, Value Domain, Property, EVS Concept(s)]
22Mapping UML Models to caDSR
[Diagram: UML Model mapped to caDSR. Labels: Data Element, Value Domain, Concept, Data Element Concept, Permissible Value/Meaning, Class, Object Class (associated to C12345), Property (associated to C54321)]
23UML Domain Model Example
24Gene Class in Detail
25Gene Class in Detail
- Class Concept Tagged Values
- ConceptCode C16612
- ConceptPreferredName Gene
- ConceptDefinition The functional ...
- ConceptDefinitionSource NCI-GLOSS
26Gene Class in Detail
- Attribute Concept Tagged Values
- ConceptCode Cxxxxxx
- ConceptPreferredName OMIMId
- ConceptDefinition The identifier
- ConceptDefinitionSource NCI-GLOSS
- Attribute Concept Tagged Values
- ConceptCode Cxxxxxx
- ConceptPreferredName Symbol
- ConceptDefinition An arbitrary sign...
- ConceptDefinitionSource NCI-GLOSS
27Concept Mapping
- Concepts are created if they do not already exist
- If a Concept exists but with a different
definition source, an alternate definition is
created for that concept.
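The two rules above amount to a create-or-extend step. The following is a minimal sketch under stated assumptions: the `Concept` class, the `load_concept` function, the in-memory `registry` dictionary, and the source name `OTHER-SOURCE` are hypothetical and stand in for whatever the loader actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Concept:
    code: str
    preferred_name: str
    definition: str
    definition_source: str
    # Each entry is (definition, definition_source).
    alternate_definitions: List[Tuple[str, str]] = field(default_factory=list)


def load_concept(registry: Dict[str, Concept], code: str, preferred_name: str,
                 definition: str, definition_source: str) -> Concept:
    """Create the concept if missing; otherwise record an alternate
    definition when the definition source differs."""
    existing = registry.get(code)
    if existing is None:
        concept = Concept(code, preferred_name, definition, definition_source)
        registry[code] = concept
        return concept
    if definition_source != existing.definition_source:
        alternate = (definition, definition_source)
        if alternate not in existing.alternate_definitions:
            existing.alternate_definitions.append(alternate)
    return existing


registry: Dict[str, Concept] = {}
# Definition text is left elided, as on the slide.
load_concept(registry, "C16612", "Gene", "The functional ...", "NCI-GLOSS")
# "OTHER-SOURCE" is a made-up source name used only for illustration.
load_concept(registry, "C16612", "Gene", "(another definition)", "OTHER-SOURCE")
gene = registry["C16612"]
print(len(registry), len(gene.alternate_definitions))
```

Loading the same concept again from the same source leaves the registry unchanged, so the step can be repeated safely.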
28Model metadata and Classification Schemes
- UML domain model mapped to a classification
scheme (CS) (type Project)
- Versioning, lifecycle statuses, reference
documents, etc.
- A UML domain model could optionally be organized
into multiple packages (CSI) (type UML Package)
- Each package may correspond to a sub-project
- UML Loader can be configured to create a CSI
corresponding to each package in the UML domain
model
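The relationship between a model, its classification scheme, and its packages can be sketched as below. The class names, the `classify_model` function, the `create_package_items` flag, and the project and package names are hypothetical; only the "Project" and "UML Package" types come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ClassificationSchemeItem:
    name: str
    item_type: str = "UML Package"


@dataclass
class ClassificationScheme:
    name: str
    version: str
    scheme_type: str = "Project"
    items: List[ClassificationSchemeItem] = field(default_factory=list)


def classify_model(project: str, version: str, packages: Sequence[str],
                   create_package_items: bool = True) -> ClassificationScheme:
    """Map a UML domain model to one CS, optionally with one CSI per package."""
    scheme = ClassificationScheme(name=project, version=version)
    if create_package_items:
        scheme.items = [ClassificationSchemeItem(name=p) for p in packages]
    return scheme


# Hypothetical project and package names.
cs = classify_model("ExampleProject", "1.0", ["example.domain", "example.common"])
print(cs.scheme_type, [i.name for i in cs.items])
```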
29Semantic Self-Harmonization
- Concept code and order are compared to determine
whether or not two entities are equivalent
- Reuse registered by Classifications
- Concept codes can be used to search caDSR for
content with relationships at any level of
ISO/IEC 11179 metamodel
- Object Class, Property, Value domain, Value
Meaning, etc.
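Comparing concept codes in order, and searching by code, can be sketched as follows. The function names and the sample items are hypothetical, the codes are the placeholder values used earlier in the deck, and the normalization (trimming and upper-casing) is an added assumption.

```python
from typing import Dict, List, Sequence, Tuple


def concept_key(codes: Sequence[str]) -> Tuple[str, ...]:
    """Normalize an ordered list of concept codes; the order is preserved."""
    return tuple(code.strip().upper() for code in codes)


def are_equivalent(a: Sequence[str], b: Sequence[str]) -> bool:
    """Two entities are treated as equivalent when they carry the same
    concept codes in the same order."""
    return concept_key(a) == concept_key(b)


def find_by_concept(items: Dict[str, Sequence[str]], code: str) -> List[str]:
    """Return the names of all items that reference the given concept code."""
    wanted = code.strip().upper()
    return sorted(name for name, codes in items.items()
                  if wanted in concept_key(codes))


# Hypothetical items keyed by name, each with its ordered concept codes.
items = {
    "Object Class A": ["C12345"],
    "Property B": ["C54321"],
    "Value Meaning C": ["C12345", "C54321"],
}
print(are_equivalent(["C12345", "C54321"], ["C54321", "C12345"]))  # False
print(find_by_concept(items, "C12345"))
```

Reversing the order of the same two codes yields a different key, which reflects the point that concept order conveys meaning.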
30Challenges
- Vocabulary shifts
- merged/split, more granularity, new terms
- Jan. 2005
- Primary Concept: Breast Cancer (Ccode1)
- Qualifier Concept: Lobular (Ccode2)
- March 2005
- Lobular Breast Cancer (Ccode3)
- ??
- Approach
- caDSR metadata maintenance
- Lexical and concept code
31Introduction to caDSR Tools
- CDE Browser to Search for and Download
- Form Builder to Create user specified collections
of CDEs
- Side-by-Side Compare
- CDE Curation Tool to Create Data Elements
- Admin Tool to Curate and Administer caDSR -
Power Users
- Sentinel Tool (3.0)
- Generates end user Alerts triggered by metadata
changes
- Batch Load to import Administered Items
- Excel Loader (MS Excel)
- UML Loader (XMI)
- Case Report Form Loader (MS Excel)
Access, Develop, Manage, Consume
32CDE Browser
CONTEXT Browsing
- View, Search, Download
- Shopping cart feature
- FormBuilder to Build / Download Forms and Data
Elements
- Context Browsing Tree
- By Classification Schemes
- By Forms
- CDE Basic Search Criteria
- Google-like search
- Sortable search results by clicking on column
headings
Basic Search
33CDE Browser
- Advanced Search Criteria
- Leverages ISO attributes
- Find all with 18254-3 permissible value
- Find all with Gene
- Find all with Released workflow status
- Find all with Standard Registration status
- Etc.
Advanced Search
34Form Builder
- Create and Manage Forms
- Organize CDEs into modules within a Form
- Attach pdf or word format
- Classify Forms into groupings for specific end
user communities
- Publish / Un-Publish for Browser Catalog
visibility
- Printer Friendly version
- Download CDEs
35CDE Side-by-Side Compare
- CDE Side-by-Side Compare
- Build shopping cart, compare CDE metadata side by
side
- Download to Excel spreadsheet
36Curation Tool
- To Create, Edit or Version
- Data Element Concepts
- Value Domains
- Data Elements
- ISO 11179 Wizard
- Construct ISO compliant Data Elements by building
up the pieces
- Builds Names and Definitions from underlying
components.
- Get Associated
- Leverage ISO to retrieve related CDEs
- Block Edit
- shopping cart
- Assign classification schemes
- Versioning
37Administration Tool
- System Administration
- User Accounts and Security
- Lists of Values (LOVs) used in content creation
- Create Framework
- Conceptual Domains
- Classification Schemes (basis for organizing CDEs
in Browser)
- Protocols
38Sentinel Tool
- Create Alerts
- User defined triggers based on data element
metadata attributes
- Example: "notify me of any change to the Value Domain for
any CDE on the Adverse Event Form"
- Generates and emails a report of changes matching
Alert criteria
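A user-defined trigger of the kind described above can be sketched as a filter over recorded changes. Everything here is hypothetical: the `Change` and `Alert` classes, the `build_report` function, and the sample changes are illustrations only, and sending the email is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Change:
    cde_name: str
    form: str
    attribute: str  # the metadata attribute that changed
    old: str
    new: str


@dataclass(frozen=True)
class Alert:
    form: str
    attribute: str

    def matches(self, change: Change) -> bool:
        return change.form == self.form and change.attribute == self.attribute


def build_report(alert: Alert, changes: Iterable[Change]) -> List[str]:
    """Collect one report line per change that matches the alert criteria."""
    return [
        f"{c.cde_name}: {c.attribute} changed from {c.old!r} to {c.new!r}"
        for c in changes
        if alert.matches(c)
    ]


# "Any change to the Value Domain for any CDE on the Adverse Event Form"
alert = Alert(form="Adverse Event Form", attribute="Value Domain")
changes = [
    Change("AE Grade", "Adverse Event Form", "Value Domain", "Grade v1", "Grade v2"),
    Change("AE Grade", "Adverse Event Form", "Definition", "old text", "new text"),
    Change("Tumor Size", "Other Form", "Value Domain", "Size v1", "Size v2"),
]
report = build_report(alert, changes)
print(report)
```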
39Batch Loading
- Excel Loaders
- Formatted MS Worksheet
- Administered Item
- Form
- UML Loader
- XMI representation of a UML Class Diagram
- Class -> Object Class
- Attribute -> Property
- Data Element Concept, Value Domain and Data
Element derived from the above
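The mapping just listed can be sketched as a small function over a parsed class. The `UmlClass` and `UmlAttribute` types and the `map_class` function are hypothetical, the `String` datatype is assumed, and using the attribute datatype as the value domain is an assumption made only to complete the example; parsing the XMI itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class UmlAttribute:
    name: str
    datatype: str


@dataclass(frozen=True)
class UmlClass:
    name: str
    attributes: Tuple[UmlAttribute, ...]


def map_class(uml_class: UmlClass) -> List[Dict[str, object]]:
    """Derive one set of caDSR-style items per UML attribute."""
    rows = []
    for attr in uml_class.attributes:
        # Data Element Concept = Object Class + Property
        dec = (uml_class.name, attr.name)
        rows.append({
            "object_class": uml_class.name,        # Class -> Object Class
            "property": attr.name,                 # Attribute -> Property
            "data_element_concept": dec,
            "value_domain": attr.datatype,         # assumed: taken from datatype
            "data_element": dec + (attr.datatype,),  # DEC + Value Domain
        })
    return rows


gene = UmlClass("Gene", (UmlAttribute("OMIMId", "String"),
                         UmlAttribute("Symbol", "String")))
rows = map_class(gene)
for row in rows:
    print(row["data_element"])
```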
40Exploring
- National Institute of Neurological Disorders and
Stroke (NINDS)
- National Icelandic Center for Oncology
- Cancergrid UK
- Canadian Center for Health Informatics