Title: The NCI Thesaurus, Overview and Utilization
1 The NCI Thesaurus, Overview and Utilization
Gilberto Fragoso, NCI Center for Bioinformatics
SOFG2, Oct 26, 2004
2 Enterprise Vocabulary Services
- Services and resources that address NCI's needs for controlled vocabulary (http://ncicb.nci.nih.gov/core/EVS)
- An NCI collaboration
- NCI Office of Communications
- Cancer Information Products and Systems
- PDQ and Cancer.gov
- NCI Center for Bioinformatics
- caCORE
- Community portals
- Vocabulary Products
- NCI Thesaurus: standalone terminology
- NCI Metathesaurus: maps vocabularies
- External vocabularies
3 NCI Thesaurus
- Reference Terminology for NCI
- Broad coverage of cancer domain
- Findings and Disorders
- Anatomy
- Drugs, chemicals
- Oncogenes, gene products, biological processes
- Cancer models - murine
- Research techniques, management
4 NCI Thesaurus - Content
- 38,000 Concepts (115,000 terms)
- Partitioned into 20 Kinds (disjoint classes)
- Hierarchically organized
- Polyhierarchy but only within kinds
- e.g. Neoplasms in the Findings_and_Disorders_Kind
- Properties state facts about a Concept
- Clarify meaning of concepts
- Application support
- Roles, binary relations between Concepts
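The content model above can be sketched as a small data structure. This is only an illustration, not the Thesaurus's actual schema: the `Concept` class, the codes `X1` to `X5`, the `Anatomy_Kind` name, the `Synonym` property and the intermediate concepts are all hypothetical. It shows a concept with two parents (polyhierarchy) and a check that parents stay within the child's Kind.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    code: str                                        # hypothetical identifier
    name: str
    kind: str                                        # each concept sits in exactly one Kind
    parents: list = field(default_factory=list)      # codes of parent concepts
    properties: dict = field(default_factory=dict)   # facts about the concept
    roles: list = field(default_factory=list)        # (role_name, target_code) pairs


def cross_kind_parents(concepts):
    """Return (child_code, parent_code) pairs whose Kinds differ."""
    violations = []
    for concept in concepts.values():
        for parent_code in concept.parents:
            if concepts[parent_code].kind != concept.kind:
                violations.append((concept.code, parent_code))
    return violations


# Illustrative data only; codes and the Anatomy_Kind name are assumptions.
concepts = {
    "X1": Concept("X1", "Neoplasm", "Findings_and_Disorders_Kind"),
    "X2": Concept("X2", "Lymphoma", "Findings_and_Disorders_Kind", parents=["X1"]),
    "X3": Concept("X3", "Gastric Neoplasm", "Findings_and_Disorders_Kind", parents=["X1"]),
    "X4": Concept(
        "X4",
        "Gastric Mucosa-Associated Lymphoid Tissue Lymphoma",
        "Findings_and_Disorders_Kind",
        parents=["X2", "X3"],                        # two parents: polyhierarchy
        properties={"Synonym": ["Gastric MALT Lymphoma"]},
        roles=[("Disease_Has_Primary_Anatomic_Site", "X5")],
    ),
    "X5": Concept("X5", "Stomach", "Anatomy_Kind"),
}

print(cross_kind_parents(concepts))  # [] because every parent shares its child's Kind
```

Note that the role on `X4` points across Kinds (a disease to an anatomic site), while the parent links do not; that is the distinction the slide draws between hierarchy and roles.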
7 NCI Thesaurus - Properties
8 NCI Thesaurus - Roles
- Roles, binary relations between Concepts
- Utilized by the DL classifier to determine subsumption
- Used to express semantic relations
- Role selection in the NCI Thesaurus
- Define the minimal number of Roles that will allow concepts in a Kind to be declared defined
- Specific roles to support dependent applications
- Domain knowledge, e.g. Diseases
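To make the link between roles and subsumption concrete, here is a deliberately simplified sketch of a structural subsumption test. It is not the classifier used for the NCI Thesaurus; it only illustrates the idea that a "defined" concept subsumes any concept that has its parents and satisfies all of its role restrictions. The concept names `Lymphoma_Of_Stomach` and `Gastric_MALT_Lymphoma`, the function names, and the restriction to stated (non-inherited) roles are assumptions made for the example.

```python
def ancestors_or_self(code, parents):
    """Collect a concept and everything reachable through parent links."""
    seen = set()
    stack = [code]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def is_subsumed_by(c, d, parents, roles):
    """Toy test: does defined concept d subsume concept c?

    Assumes d is fully defined by its stated parents and roles, and looks
    only at the roles stated directly on c (no role inheritance).
    """
    c_ancestors = ancestors_or_self(c, parents)
    # c must fall under every parent stated for d.
    if not all(p in c_ancestors for p in parents.get(d, ())):
        return False
    # Each role of d must be matched on c by the same role whose
    # target is the same concept or a descendant of it.
    for role, target in roles.get(d, ()):
        matched = any(
            r == role and target in ancestors_or_self(t, parents)
            for r, t in roles.get(c, ())
        )
        if not matched:
            return False
    return True


# Hypothetical fragment; names are illustrative only.
parents = {
    "Lymphoma_Of_Stomach": ["Lymphoma"],
    "Gastric_MALT_Lymphoma": ["Lymphoma"],
}
roles = {
    "Lymphoma_Of_Stomach": [("Disease_Has_Primary_Anatomic_Site", "Stomach")],
    "Gastric_MALT_Lymphoma": [
        ("Disease_Has_Primary_Anatomic_Site", "Stomach"),
        ("Disease_Has_Abnormal_Cell", "Centrocyte-like cell"),
    ],
}

print(is_subsumed_by("Gastric_MALT_Lymphoma", "Lymphoma_Of_Stomach", parents, roles))  # True
print(is_subsumed_by("Lymphoma_Of_Stomach", "Gastric_MALT_Lymphoma", parents, roles))  # False
```

The second call returns False because the more general concept carries no `Disease_Has_Abnormal_Cell` role, which is why choosing a minimal but sufficient set of roles per Kind matters.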
9 Roles in the Disease Domain
10 Roles in the NCI Thesaurus
ftp://ftp1.nci.nih.gov/pub/cacore/EVS/ThesaurusSemantics/Tbox04Mar.png
11 Various Kinds in the Thesaurus
ftp://ftp1.nci.nih.gov/pub/cacore/EVS/ThesaurusSemantics/KindDefinitions.pdf
12 Sample Disease Concept
Gastric Mucosa-Associated Lymphoid Tissue Lymphoma
Molecular abnormalities
- Disease_May_Have_Cytogenetic_Abnormality: Trisomy 3
- Disease_May_Have_Cytogenetic_Abnormality: Trisomy 18
- Role group 1
- Disease_May_Have_Cytogenetic_Abnormality: t(11;18)(q21;q21)
- Disease_May_Have_Molecular_Abnormality: API2-MLT fusion protein expression
Histogenesis
- Disease_Has_Normal_Cell_Origin: Post-germinal center marginal zone B-lymphocyte
Pathology
- Disease_Has_Abnormal_Cell: Centrocyte-like cell
- Disease_May_Have_Abnormal_Cell: Neoplastic monocytoid B-lymphocyte
- Disease_May_Have_Abnormal_Cell: Neoplastic plasma cell
- Disease_May_Have_Finding: Lymphoepithelial lesion
Anatomy
- Disease_Has_Primary_Anatomic_Site: Stomach
- Disease_Has_Normal_Tissue_Origin: Gut associated lymphoid tissue
Clinical information
- Disease_May_Have_Finding: Indolent clinical course
- Disease_May_Have_Associated_Disease: Hepatitis C
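The sample concept's role assertions can be held as plain (role, target, group) records and queried by role name. The sketch below is one possible encoding, not a Thesaurus file format; it assumes that the two entries listed after "Role group 1" belong to that group and uses 0 to mean "not in a role group". Only a subset of the slide's roles is included.

```python
from collections import defaultdict

# (role, target, role_group); group 0 means "no role group" (an assumption).
ROLES = [
    ("Disease_May_Have_Cytogenetic_Abnormality", "Trisomy 3", 0),
    ("Disease_May_Have_Cytogenetic_Abnormality", "Trisomy 18", 0),
    ("Disease_May_Have_Cytogenetic_Abnormality", "t(11;18)(q21;q21)", 1),
    ("Disease_May_Have_Molecular_Abnormality", "API2-MLT fusion protein expression", 1),
    ("Disease_Has_Abnormal_Cell", "Centrocyte-like cell", 0),
    ("Disease_Has_Primary_Anatomic_Site", "Stomach", 0),
    ("Disease_May_Have_Associated_Disease", "Hepatitis C", 0),
]


def targets_by_role(roles):
    """Map each role name to the list of its targets, in stated order."""
    grouped = defaultdict(list)
    for role, target, _group in roles:
        grouped[role].append(target)
    return dict(grouped)


def role_group(roles, group):
    """Return the (role, target) pairs that were asserted together."""
    return [(role, target) for role, target, g in roles if g == group]


by_role = targets_by_role(ROLES)
print(by_role["Disease_Has_Primary_Anatomic_Site"])  # ['Stomach']
print(role_group(ROLES, 1))
```

Keeping the group number alongside each assertion preserves the fact that the translocation and the fusion protein expression are stated jointly rather than as independent facts.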
13 Retrieval of Indexed Documents
(Diagram: User; Concepts used for retrieval C1, C2, ..., CX; Search Engine; Relevant documents)
- Documents and their indexing terms: D1 <C1, C2>, D2 <C1, C3, C4>
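The retrieval scheme in this slide can be sketched with an inverted index from concepts to documents. The document and concept labels come from the slide; the function names and the choice to return documents matching any query concept (rather than all of them) are assumptions.

```python
from collections import defaultdict


def build_index(doc_terms):
    """Invert {document: indexing concepts} into {concept: documents}."""
    index = defaultdict(set)
    for doc, concepts in doc_terms.items():
        for concept in concepts:
            index[concept].add(doc)
    return index


def retrieve(index, query_concepts):
    """Return documents indexed with at least one of the query concepts."""
    hits = set()
    for concept in query_concepts:
        hits |= index.get(concept, set())
    return sorted(hits)


# Indexing terms as shown on the slide.
docs = {"D1": {"C1", "C2"}, "D2": {"C1", "C3", "C4"}}
index = build_index(docs)

print(retrieve(index, ["C2"]))  # ['D1']
print(retrieve(index, ["C1"]))  # ['D1', 'D2']
```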
14 History-Enhanced Retrieval of Indexed Documents
(Diagram: Thesaurus versions 1 to 4, with pre-indexed documents R1 to R4; edit actions between versions: new, modify, merge, retire, split; Search Engine; Concepts used for retrieval)
- Traverse the history net to find the right indexing terms for retrieval
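One way to picture "traversing the history net" is a backward walk over merge and split records, collecting every older code under which a document might have been indexed. The record layout `(version, action, source, target)`, the sample codes, and the decision to follow only merge and split edges are assumptions for this sketch; the actual history table may differ.

```python
from collections import defaultdict, deque

# Hypothetical history records: (version, action, source_code, target_code).
HISTORY = [
    (2, "modify", "C1", "C1"),
    (3, "merge", "C2", "C3"),    # C2 was merged into C3
    (4, "split", "C4", "C5"),    # C5 was split off from C4
    (4, "retire", "C6", None),
]


def indexing_terms(concept, history):
    """Return the concept plus all predecessor codes reachable via merge/split."""
    predecessors = defaultdict(set)
    for _version, action, source, target in history:
        if action in ("merge", "split") and target is not None:
            predecessors[target].add(source)

    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for previous in predecessors[current]:
            if previous not in seen:
                seen.add(previous)
                queue.append(previous)
    return seen


print(sorted(indexing_terms("C3", HISTORY)))  # ['C2', 'C3']
print(sorted(indexing_terms("C5", HISTORY)))  # ['C4', 'C5']
```

A query for the current concept `C3` would then also be run against documents indexed with the older code `C2`.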
15 Representation of Edit Actions in History Table
16 caCORE Infrastructure
- Vocabulary Services
- NCI Thesaurus
- NCI Metathesaurus
- caBIO API access
http://ncicb.nci.nih.gov/core
17 caCORE Infrastructure
- Standard object models
- Uniform programmatic interface access
18 caBIO Architecture
19 caCORE Infrastructure
- ISO 11179
- Metadata Registry
- CDEs
- UML Models
- Utilizes EVS
- caBIO API access
21 caCORE Infrastructure
CTEP DCP CIP SPORES
caImage caMod caArray
22 WebTree App in caImage
23 NCI Thesaurus Distribution and Access
- Programmatic access via caBIO API
- Web browsable (http://nciterms.nci.nih.gov)
- Various file formats (ftp://ftp1.nci.nih.gov/pub/cacore/EVS/)
- Flat
- Ontylog XML (native)
- OWL Lite
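As an illustration of working with an OWL distribution, the sketch below reads class labels and named superclasses from a tiny RDF/XML fragment using only the Python standard library. The `SAMPLE` fragment and the `read_classes` helper are invented for the example and do not reproduce the structure of the actual NCI Thesaurus OWL file; only the RDF, RDFS and OWL namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Hypothetical fragment, not copied from the distributed file.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Neoplasm">
    <rdfs:label>Neoplasm</rdfs:label>
  </owl:Class>
  <owl:Class rdf:ID="Lymphoma">
    <rdfs:label>Lymphoma</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Neoplasm"/>
  </owl:Class>
</rdf:RDF>"""


def read_classes(xml_text):
    """Return {class_id: {"label": ..., "parents": [...]}} for top-level classes."""
    root = ET.fromstring(xml_text)
    rdf = "{%s}" % NS["rdf"]
    classes = {}
    for cls in root.findall("owl:Class", NS):
        class_id = cls.get(rdf + "ID")
        label = cls.findtext("rdfs:label", default=class_id, namespaces=NS)
        # Keep only named superclasses; anonymous restrictions are skipped.
        parents = [
            sub.get(rdf + "resource").lstrip("#")
            for sub in cls.findall("rdfs:subClassOf", NS)
            if sub.get(rdf + "resource")
        ]
        classes[class_id] = {"label": label, "parents": parents}
    return classes


classes = read_classes(SAMPLE)
print(classes["Lymphoma"])  # {'label': 'Lymphoma', 'parents': ['Neoplasm']}
```

For a full-size file, a dedicated RDF or OWL library would likely be a better fit than hand-walking the XML.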
24 Terminology Development in Protégé/OWL?
- Merge
- Split
- Retirement
- History
- Codes
- Search
- Workflow
25 Additional Integration?
- Publication of ontologies in OWL
- Terminology
- Chute & Solbrig (Mayo, pers. comm.)
- Concept History
26 EVS Team
- NCI Office of Communications
- Margaret Haber
- Larry Wright
- NCI Center for Bioinformatics
- Frank Hartel
- Sherri de Coronado
- Gilberto Fragoso
- Ken Buetow, NCICB Director
- Peter Covitz, caCORE Director
- Apelon, Inc.
- Northrop Grumman, Inc.
- Aspen, Inc.
- Kevric Corporation
- SAIC
- J. Oberthaler Consulting
- Protégé/SMI