Title: NCI caDSR


1
NCI caDSR Semantic Integration
Presented to Lawrence Berkeley National Labs
  • Denise Warzel
  • Associate Director, caDSR
  • NCICB
  • March 14, 2005

2
Presentation Outline
  • Putting NCI's semantic integration into context
  • Driving factors behind NCI's metadata repository
  • NCI Metadata repository (caDSR) role in caCORE
  • caDSR Semantic Integration
  • Role of concept mapping
  • Metadata repository vs Vocabulary Services
  • Concept linkage in caDSR
  • UML Class diagrams represented in caDSR metadata
  • caDSR tooling (if time)
  • Is this what you want to hear?
  • Priority?

3
Credits
  • NCICB
  • Peter Covitz
  • Denise Warzel
  • Oracle
  • Edmond Mulaire
  • Ram Chilukuri
  • Prerna Aggarwal
  • Dan Ladino
  • Christophe Ludet
  • Shaji Kakkodi
  • Jane Jiang
  • ScenPro
  • Bill McCurry
  • Tom Phillips
  • Robert Harding
  • Jennifer Brush
  • Larry Hebel
  • Smita Hastak
  • ISO
  • ISO/IEC 11179 Parts 1-6

4
Current User Base
  • Cancer Biomedical Informatics Grid (caBIG):
    820 / 466 / 180 / 61
  • Center for Cancer Research (CCR):
    821 / 573 / 506 / 12
  • Clinical Data Interchange Standards Consortium
    (CDISC): 3 / 0
  • Center for Cancer Imaging (CIP):
    238 / 151 / 148 / 2
  • Cancer Therapy Evaluation Program (CTEP):
    8029 / 2432 / 2428 / .1
  • Division of Cancer Prevention (DCP):
    427 / 321 / 286 / 11
  • National Heart Lung and Blood Institute (NHLBI):
    0 / 0
  • Early Detection Research Network (EDRN):
    121 / 1 / 1 / 100
  • Divisions of Population Sciences and Cancer
    Control (PS&CC): 85 / 9
  • Specialized Programs of Research Excellence
    (SPOREs): 719 / 197 / 120 / 39
  • Cancer Ontologic Research Environment (caCORE):
    1028 / 810 / 810 / 0
  • Key: Total CDEs in this Context / Released
    workflow status / Released and developed by
    this context / Reused from other contexts

5
NCI's Semantic Integration Context
  • Sharable data, aggregatable across research
    domains
  • Unambiguous data characteristics
  • to convey semantic, syntactic and lexical meaning
  • Human and Machine understandable
  • EMPHASIS ON MACHINE UNDERSTANDABLE
  • Tools to create, maintain, deploy data standards
  • Widely and publicly accessible
  • Self-harmonizing

6
caCORE Components
  • caCORE is the open-source foundation upon which
    the NCICB builds its research information
    management systems

Bioinformatics Objects
Data Standards
Enterprise Vocabulary
7
EVS and caDSR Distinctions
  • caDSR is a metadata repository
  • maintains metadata that lets a user locate the
    correct defining characteristics of a datum (an
    instance of a specific concept) in sufficient
    detail for it to be collected and stored on a
    computer
  • EVS is a terminology server
  • provides services for synonymy, mapping between
    vocabularies, hierarchical structures,
    subconcepts, superconcepts, broader, narrower,
    roles, semantic type, etc.

8
caCORE Infrastructure wiring
9
Why ISO/IEC 11179?
  • What is this datum?
  • Provides concrete guidance on the creation and
    maintenance of discrete data element attributes
    and metadata (semantics) enabling the formulation
    of data elements in a consistent, standard manner
  • Metadata Repository/Registry
  • Framework for data element standardization and
    registration allows the creation of a shared data
    environment in much less time and with much less
    effort than it takes using conventional data
    management methodologies.
  • Adoption of 11179 Allowed us to Get on with it

10
Why ISO/IEC 11179?
  • Using this framework
  • what is it?
  • how do I want to display it?
  • categorize it?
  • message it?
  • where is it used and by whom?
  • what is its history? (lifecycle management)

11
ISO/IEC 11179 Administered Item Administration
Record and Common Attributes
  • Unique Identifier
  • Data id + version
  • (all NCI content shares a common RAI)
  • Administrative Status
  • Workflow status
  • Registration Status
  • Creation Date
  • Administrative Note(s)
  • Effective Date
  • Change Date(s)
  • Change Description(s)
  • Origin
  • Until Date
  • Created By
  • Modified By
  • Name(s)
  • Definition(s)
  • Stewardship Information
  • Submitter Information
  • Reference Document(s)
  • Classifications

12
ISO/IEC 11179 and Extensions
Form
Concept Class
The Concept Class (coming in new 11179
specification) Provides Semantic Linkage
Derivation_Rule
13
Why are vocabularies/ontologies important?
  • Goal: semantically unambiguous interoperability
  • For Humans
  • Words could be enough within a specific context
    or domain where a common lexicon is already used
  • For Machines
  • Words are not immutable; absent a specific
    context, it is difficult or impossible to ensure
    consistent and repeatable interpretation

14
Implementation
  • Are names and words for definitions enough to
    create unambiguous, interoperable,
    self-harmonizing metadata?
  • No
  • Within different domains same words mean
    different things
  • e.g., site, trial, agent
  • Synonyms?
  • Phraseology?
  • Not computable
  • Was ISO/IEC 11179 flawed?
  • No, not if you have a central body creating
    metadata.
  • We needed to support simultaneous development of
    data elements in different research domains that
    could be harmonized later with minimal effort.
  • Draw from standard terminologies → incorporate
    into a Cancer terminology

15
Challenges
  • Data Element curators are not necessarily
    vocabulary experts
  • ISO/IEC 11179 provides the framework
  • But how to make it something that could be self
    harmonizing and computed without a human having
    to read and interpret definitions?

16
The Solution?
  • Leverage EVS
  • Separate the curation of concepts from the
    curation of ISO/IEC 11179 metadata
  • Leverage semantics of ISO/IEC 11179
  • Start with the building blocks of Administered
    items
  • Link to controlled vocabulary in the form of
    concept codes
  • During metadata curation
  • right place
  • right time
  • Naming and defining
  • Applying naming conventions to build up the
    subsequent components

17
Summary: caDSR Semantic Integration
Conceptual Domain: Agent
Object: Agent
Property: Chemopreventive
Data Element Concept: Chemopreventive Agent
Representation: Name
Value Domain: Chemopreventive Agent Name
Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, Ursodiol
Data Element: Chemopreventive Agent Name
Classification Schemes: caDSRTraining
Context: caCORE
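
The way the pieces in this summary build on each other can be sketched in Python. This is an illustration only: the class names are hypothetical, and the word order used to build names (property, then object, then representation) is inferred from this single example rather than taken from a documented caDSR naming rule.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str
    prop: str

    @property
    def name(self) -> str:
        # Assumed ordering, inferred from "Chemopreventive Agent".
        return f"{self.prop} {self.object_class}"


@dataclass(frozen=True)
class ValueDomain:
    name: str
    representation: str
    valid_values: Tuple[str, ...]


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    value_domain: ValueDomain
    context: str

    @property
    def name(self) -> str:
        # Data element name = concept name + representation term.
        return f"{self.concept.name} {self.value_domain.representation}"


dec = DataElementConcept(object_class="Agent", prop="Chemopreventive")
vd = ValueDomain(
    name="Chemopreventive Agent Name",
    representation="Name",
    valid_values=(
        "Cyclooxygenase Inhibitor",
        "Doxercalciferol",
        "Eflornithine",
        "Ursodiol",
    ),
)
de = DataElement(concept=dec, value_domain=vd, context="caCORE")

print(dec.name)  # Chemopreventive Agent
print(de.name)   # Chemopreventive Agent Name
```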
18
3.0 caDSR Implementation
  • Enhance Semantic Integration
  • Concept Class enabled and concept relationship to
    data model
  • Replace Alternate Names concept linkage
  • Add rule for linking concepts together
  • Order of concepts conveys semantic meaning
  • Add concept linkage to support Value Domains
  • Referenced Parent Concept non-enumerated
  • Changes to UML to caDSR mapping
  • Changes to UML Loader

19
UML Classes as ISO/IEC 11179 Metadata
20
Workflow and Tools
1. Create UML Diagram with EA or similar UML
tool.
2. Export to XMI.
3. Run Semantic Connector
4. Run UML Loader
5. Post Load Curation
  • Create appropriate conceptual domains
  • Create enumerated value domains

21
UML Loader Mapping
[Diagram: UML Model mapped to caDSR Metadata. Labels: Data Element, Data Element Concept, Value Domain, Property, EVS Concept(s)]
22
Mapping UML Models to caDSR
[Diagram: UML Model mapped to caDSR. Labels: Data Element, Value Domain, Concept, Data Element Concept, Permissible Value/Meaning, Class, Object Class (associated to C12345), Property (associated to C54321)]
23
UML Domain Model Example
24
Gene Class in Detail
25
Gene Class in Detail
  • Class Concept Tagged Values:
  • ConceptCode: C16612
  • ConceptPreferredName: Gene
  • ConceptDefinition: The functional ...
  • ConceptDefinitionSource: NCI-GLOSS

26
Gene Class in Detail
  • Class Concept Tagged Values:
  • ConceptCode: C16612
  • ConceptPreferredName: Gene
  • ConceptDefinition: The functional ...
  • ConceptDefinitionSource: NCI-GLOSS
  • Attribute Concept Tagged Values:
  • ConceptCode: Cxxxxxx
  • ConceptPreferredName: OMIMId
  • ConceptDefinition: The identifier
  • ConceptDefinitionSource: NCI-GLOSS
  • Attribute Concept Tagged Values:
  • ConceptCode: Cxxxxxx
  • ConceptPreferredName: Symbol
  • ConceptDefinition: An arbitrary sign...
  • ConceptDefinitionSource: NCI-GLOSS

27
Concept Mapping
  • Concepts are created if they do not already exist
  • If a Concept exists but with a different
    definition source, an alternate definition is
    created for that concept.
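
The two rules above can be sketched as a small registration function. This is a minimal Python illustration, not the UML Loader's code: `Concept`, `register_concept`, the in-memory `registry` dict, and the `EXAMPLE-SOURCE` definition source are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Concept:
    code: str
    preferred_name: str
    definition: str
    definition_source: str
    # (source, definition) pairs added when other sources define the concept
    alternate_definitions: List[Tuple[str, str]] = field(default_factory=list)


def register_concept(
    registry: Dict[str, Concept],
    code: str,
    preferred_name: str,
    definition: str,
    definition_source: str,
) -> Concept:
    """Create the concept if missing; otherwise record an alternate definition
    when the incoming definition source differs from the stored one."""
    existing = registry.get(code)
    if existing is None:
        concept = Concept(code, preferred_name, definition, definition_source)
        registry[code] = concept
        return concept
    if definition_source != existing.definition_source:
        alternate = (definition_source, definition)
        if alternate not in existing.alternate_definitions:
            existing.alternate_definitions.append(alternate)
    return existing


registry: Dict[str, Concept] = {}
gene = register_concept(registry, "C16612", "Gene", "The functional ...", "NCI-GLOSS")
# Same concept code arriving with a different (made-up) definition source.
register_concept(registry, "C16612", "Gene", "placeholder definition text", "EXAMPLE-SOURCE")

print(len(registry))               # 1
print(gene.alternate_definitions)  # [('EXAMPLE-SOURCE', 'placeholder definition text')]
```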

28
Model metadata and Classification Schemes
  • UML domain model mapped to a classification
    scheme (CS) (type Project)
  • Versioning, lifecycle statuses, reference
    documents, etc.
  • A UML domain model could optionally be organized
    into multiple packages (CSI) (type UML Package)
  • Each package may correspond to a sub-project
  • UML Loader can be configured to create a CSI
    corresponding to each package in the UML domain
    model

29
Semantic Self-Harmonization
  • Concept code and order are compared to determine
    whether or not two entities are equivalent
  • Reuse registered by Classifications
  • Concept codes can be used to search caDSR for
    content with relationships at any level of
    ISO/IEC 11179 metamodel
  • Object Class, Property, Value domain, Value
    Meaning, etc.
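
A minimal Python sketch of the comparison and search described above. `RegisteredItem`, `are_equivalent`, and `find_by_concept` are hypothetical names, the concept codes `C0001` and `C0002` are placeholders, and requiring the same metamodel level for equivalence is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class RegisteredItem:
    level: str                      # e.g. "Object Class", "Property", "Value Domain"
    name: str
    concept_codes: Tuple[str, ...]  # order is significant


def are_equivalent(a: RegisteredItem, b: RegisteredItem) -> bool:
    """Equivalent when the concept codes match in the same order.
    Names are deliberately ignored; only codes and order are compared."""
    return a.level == b.level and a.concept_codes == b.concept_codes


def find_by_concept(items: Iterable[RegisteredItem], code: str) -> List[RegisteredItem]:
    """Return every item, at any level, that is linked to the given concept code."""
    return [item for item in items if code in item.concept_codes]


a = RegisteredItem("Object Class", "Chemopreventive Agent", ("C0001", "C0002"))
b = RegisteredItem("Object Class", "Agent, Chemopreventive", ("C0001", "C0002"))
c = RegisteredItem("Object Class", "Agent Chemopreventive", ("C0002", "C0001"))

print(are_equivalent(a, b))  # True: same codes, same order, different names
print(are_equivalent(a, c))  # False: same codes, different order
print(len(find_by_concept([a, b, c], "C0002")))  # 3
```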

30
Challenges
  • Vocabulary shifts
  • merged/split, more granularity, new terms
  • Jan. 2005
  • Primary Concept: Breast Cancer (Ccode1)
  • Qualifier Concept: Lobular (Ccode2)
  • March 2005
  • Lobular Breast Cancer (Ccode3)
  • ??
  • Approach
  • caDSR metadata maintenance
  • Lexical and concept code
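
The problem in the example above can be made concrete in Python: a plain comparison of concept codes stops matching once the vocabulary introduces a single concept for what was previously a primary concept plus a qualifier. The slide does not say how caDSR maintenance resolves this, so the `MERGED` lookup table and `normalize` function below are purely hypothetical, and the order of codes within each tuple is arbitrary.

```python
from typing import Dict, Tuple

ConceptCodes = Tuple[str, ...]

# Hypothetical maintenance table: concept sequences that the vocabulary has
# since replaced with a single concept.
MERGED: Dict[ConceptCodes, ConceptCodes] = {
    ("Ccode1", "Ccode2"): ("Ccode3",),
}


def normalize(codes: ConceptCodes, merged: Dict[ConceptCodes, ConceptCodes] = MERGED) -> ConceptCodes:
    """Rewrite a concept sequence to its current form, if one is recorded."""
    return merged.get(codes, codes)


jan_2005 = ("Ccode1", "Ccode2")  # Breast Cancer + Lobular
mar_2005 = ("Ccode3",)           # Lobular Breast Cancer

print(jan_2005 == mar_2005)                        # False
print(normalize(jan_2005) == normalize(mar_2005))  # True
```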

31
Introduction to caDSR Tools
  • CDE Browser to Search for and Download
  • Form Builder to Create user specified collections
    of CDEs
  • Side-by-Side Compare
  • CDE Curation Tool to Create Data Elements
  • Admin Tool to Curate and Administer caDSR -
    Power Users
  • Sentinel Tool (3.0)
  • Generates end user Alerts triggered by metadata
    changes
  • Batch Load to import Administered Items
  • Excel Loader (MS Excel)
  • UML Loader (XMI)
  • Case Report Form Loader (MS Excel)

Access, Develop, Manage, Consume
32
CDE Browser
CONTEXT Browsing
  • View, Search, Download
  • Shopping cart feature
  • FormBuilder to Build / Download Forms and Data
    Elements
  • Context Browsing Tree
  • By Classification Schemes
  • By Forms
  • CDE Basic Search Criteria
  • Google-like search
  • Sortable search results by clicking on column
    headings

Basic Search
33
CDE Browser
  • Advanced Search Criteria
  • Leverages ISO attributes
  • Find all with 18254-3 permissible value
  • Find all with Gene
  • Find all with Released workflow status
  • Find all with Standard Registration status
  • Etc.

Advanced Search
34
Form Builder
  • Create and Manage Forms
  • Organize CDEs into modules within a Form
  • Attach documents in PDF or Word format
  • Classify Forms into groupings for specific end
    user communities
  • Publish / Un-publish to control Browser Catalog
    visibility
  • Printer Friendly version
  • Download CDEs

35
CDE Side-by-Side Compare
  • CDE Side-by-Side Compare
  • Build shopping cart, compare CDE metadata side by
    side
  • Download to Excel spreadsheet

36
Curation Tool
  • To Create, Edit or Version
  • Data Element Concepts
  • Value Domains
  • Data Elements
  • ISO 11179 Wizard
  • Construct ISO compliant Data Elements by building
    up the pieces
  • Builds Names and Definitions from underlying
    components.
  • Get Associated
  • Leverage ISO to retrieve related CDEs
  • Block Edit
  • shopping cart
  • Assign classification schemes
  • Versioning

37
Administration Tool
  • System Administration
  • User Accounts and Security
  • Lists of Values (LOVs) used in content creation
  • Create Framework
  • Conceptual Domains
  • Classification Schemes (basis for organizing CDEs
    in Browser)
  • Protocols

38
Sentinel Tool
  • Create Alerts
  • User defined triggers based on data element
    metadata attributes
  • e.g., "Notify me of any change to the Value Domain
    for any CDE on the Adverse Event Form"
  • Generates and emails a report of changes matching
    Alert criteria
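
A minimal Python sketch of how a user-defined trigger might be matched against metadata changes to build a report. This is not the Sentinel Tool's implementation: the `Change` and `Alert` records, their fields, the CDE names, and the e-mail address are hypothetical, and sending the e-mail is left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Change:
    cde_name: str
    form: str
    attribute: str
    old: str
    new: str


@dataclass(frozen=True)
class Alert:
    recipient: str
    form: str        # only changes to CDEs on this form
    attribute: str   # only changes to this metadata attribute

    def matches(self, change: Change) -> bool:
        return change.form == self.form and change.attribute == self.attribute


def build_report(alert: Alert, changes: Iterable[Change]) -> str:
    """Collect one line per change that matches the alert criteria."""
    lines = [
        f"{c.cde_name}: {c.attribute} changed from {c.old!r} to {c.new!r}"
        for c in changes
        if alert.matches(c)
    ]
    return "\n".join(lines)


alert = Alert(recipient="user@example.org", form="Adverse Event Form", attribute="Value Domain")
changes = [
    Change("Example CDE A", "Adverse Event Form", "Value Domain", "VD v1.0", "VD v2.0"),
    Change("Example CDE B", "Adverse Event Form", "Definition", "old text", "new text"),
    Change("Example CDE C", "Another Form", "Value Domain", "VD v1.0", "VD v1.1"),
]
print(build_report(alert, changes))
# Example CDE A: Value Domain changed from 'VD v1.0' to 'VD v2.0'
```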

39
Batch Loading
  • Excel Loaders
  • Formatted MS Worksheet
  • Administered Item
  • Form
  • UML Loader
  • XMI representation of a UML Class Diagram
  • Class → Object Class
  • Attribute → Property
  • Data Element Concept, Value Domain and Data
    Element derived from the above
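
The class and attribute mapping above can be sketched in Python. This is an illustration under assumptions: `map_uml_class` and `LoadedMetadata` are hypothetical names, the input is a plain class name and attribute list rather than parsed XMI, and only the Data Element Concept pairing is derived here (Value Domains and Data Elements are omitted).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LoadedMetadata:
    object_class: str
    properties: Tuple[str, ...]
    # Each Data Element Concept pairs the object class with one property.
    data_element_concepts: Tuple[Tuple[str, str], ...]


def map_uml_class(class_name: str, attributes: List[str]) -> LoadedMetadata:
    """UML class -> Object Class, each UML attribute -> Property,
    and one Data Element Concept per (Object Class, Property) pair."""
    properties = tuple(attributes)
    concepts = tuple((class_name, prop) for prop in properties)
    return LoadedMetadata(class_name, properties, concepts)


gene = map_uml_class("Gene", ["OMIMId", "Symbol"])
print(gene.object_class)           # Gene
print(gene.data_element_concepts)  # (('Gene', 'OMIMId'), ('Gene', 'Symbol'))
```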

40
Exploring
  • National Institute of Neurological Disorders and
    Stroke (NINDS)
  • National Icelandic Center for Oncology
  • Cancergrid UK
  • Canadian Center for Health Informatics