Applying the SUBDUE Substructure Discovery System to the Chemical Toxicity Domain

1
Applying the SUBDUE Substructure Discovery System
to the Chemical Toxicity Domain
  • Ravindra N. Chittimoori, Diane J. Cook,
  • Lawrence B. Holder
  • Department of Computer Science and Engineering
  • University of Texas at Arlington
  • http://cygnus.uta.edu/subdue/

2
Motivation and Goal
  • Ever-increasing number of chemical compounds in
    use today (100,000).
  • Need to identify relationships between the
    molecular structure and the toxicity of a
    chemical compound.
  • Apply knowledge discovery to U.S. National
    Toxicology Program (NTP) data to identify such
    relationships.

3
Knowledge Discovery in SUBDUE
  • Structural discovery system
  • Graph-based input representation
  • Beam search through substructure (subgraph) space
  • Graph compression heuristic based on minimum
    description length
  • Inexact, polynomial graph match
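
The compression heuristic and beam search listed above can be illustrated with a minimal sketch. It is not SUBDUE's actual encoding: it assumes a simplified size measure (vertices plus edges) in place of a true minimum description length in bits, and the names `GraphSize`, `compression_value`, and `keep_beam` are hypothetical. A value above 1 means that replacing each instance of the substructure with a single vertex makes the overall description smaller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphSize:
    """Simplified description size of a graph: vertex and edge counts."""
    vertices: int
    edges: int

    @property
    def total(self) -> int:
        return self.vertices + self.edges


def compressed_size(graph: GraphSize, sub: GraphSize, instances: int) -> GraphSize:
    """Size of the graph after each non-overlapping instance of `sub`
    is collapsed into a single vertex (its internal edges disappear)."""
    vertices = graph.vertices - instances * (sub.vertices - 1)
    edges = graph.edges - instances * sub.edges
    return GraphSize(vertices, edges)


def compression_value(graph: GraphSize, sub: GraphSize, instances: int) -> float:
    """Ratio size(G) / (size(S) + size(G|S)); higher is better."""
    rest = compressed_size(graph, sub, instances)
    return graph.total / (sub.total + rest.total)


def keep_beam(candidates, beam_width):
    """Keep the `beam_width` best (name, value) candidates for the next
    round of substructure expansion."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]


# Hypothetical numbers: a 100-vertex, 120-edge graph in which a
# 3-vertex, 2-edge substructure occurs 10 times.
graph = GraphSize(100, 120)
sub = GraphSize(3, 2)
value = compression_value(graph, sub, instances=10)
```

In a full search, each surviving candidate would be extended by one edge, re-scored, and pruned again with `keep_beam`.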

4
SUBDUE Example

5
Chemical Toxicity Domain
  • Database of 367 chemicals
  • Levels of evidence assigned by NTP:
      • CE: clear evidence of carcinogenic activity
      • SE: some evidence
      • E: equivocal evidence
      • NE: no evidence

6
Predictive Toxicology Evaluation
  • Predictive Toxicology Evaluation (PTE) challenge
  • PTE-2 ended November 1998
  • http://dir.niehs.nih.gov/dirlecm/pte2.htm
  • PTE-3 scheduled for July 1999 - July 2000

7
Chemical Toxicity Data
  • Atoms (name, type, partial charge)
  • Bonds (type)
  • Chemical groups: alcohol, amine, amino, benzene,
    ester, ether, ketone, methanol, methyl, nitro,
    phenol, and sulfide

8
Chemical Toxicity Data
  • Carcinogenicity-related tests:
      • Ames
      • Chromex
      • Chromaberr
      • Drosophila
      • Mouse-Lymph
      • Salmonella Assay

9
Chemical Compound Representation

10
Input Representation
  • Sample atomic structure: a C atom and an H atom
    joined by a bond labeled 1
  • SUBDUE graph input:

      v 1 atom
      v 2 C
      v 3 atom
      v 4 H
      d 1 2 name
      d 3 4 name
      u 1 3 1

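
The sample input above follows a simple pattern: each atom becomes an `atom` vertex with a directed `name` edge to a vertex holding its element name, and each bond becomes an undirected edge labeled with the bond type. A minimal sketch of a generator for this text follows. The function name `to_subdue` and its argument layout are assumptions for illustration, not part of the SUBDUE distribution.

```python
def to_subdue(atoms, bonds):
    """Render a molecule in the v/d/u line format shown on the slide.

    atoms: list of element names, e.g. ["C", "H"]
    bonds: list of (i, j, bond_type) with 0-based indices into `atoms`
    """
    vertex_lines, edge_lines = [], []
    atom_vertex = []  # vertex id of the "atom" vertex for each atom
    next_id = 1
    for name in atoms:
        atom_id, name_id = next_id, next_id + 1
        next_id += 2
        vertex_lines.append(f"v {atom_id} atom")
        vertex_lines.append(f"v {name_id} {name}")
        # Directed edge from the atom vertex to its name value.
        edge_lines.append(f"d {atom_id} {name_id} name")
        atom_vertex.append(atom_id)
    for i, j, bond_type in bonds:
        # Undirected edge between the two atom vertices, labeled by bond type.
        edge_lines.append(f"u {atom_vertex[i]} {atom_vertex[j]} {bond_type}")
    return "\n".join(vertex_lines + edge_lines)


sample = to_subdue(["C", "H"], [(0, 1, 1)])
print(sample)
```

Other atom attributes listed earlier (type, partial charge) could presumably be attached the same way, with one extra value vertex and one labeled directed edge per attribute.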
11
Methodology
  • Training set further divided into learning and
    testing sets
  • Find the best substructures in learning-set
    positives that are not prevalent in negatives
  • Find occurrences of each substructure in the
    testing set
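
The selection step can be sketched as follows. This is a simplified stand-in, not SUBDUE's procedure: each compound is reduced to a set of labeled features and "occurs in" is a plain subset test, where SUBDUE uses an inexact graph match. All feature labels and function names here are hypothetical.

```python
def occurs_in(pattern: frozenset, compound: frozenset) -> bool:
    """Stand-in for subgraph matching: every feature of the pattern
    must be present in the compound."""
    return pattern <= compound


def coverage(pattern, compounds):
    """Return (hits, total): how many compounds contain the pattern."""
    hits = sum(1 for c in compounds if occurs_in(pattern, c))
    return hits, len(compounds)


def score(pattern, positives, negatives):
    """Fraction of positives covered minus fraction of negatives covered.
    Assumes both lists are non-empty."""
    pos_hits, pos_n = coverage(pattern, positives)
    neg_hits, neg_n = coverage(pattern, negatives)
    return pos_hits / pos_n - neg_hits / neg_n


def rank_patterns(patterns, positives, negatives):
    """Best patterns first: common in positives, rare in negatives."""
    return sorted(patterns, key=lambda p: score(p, positives, negatives), reverse=True)


# Toy learning set with made-up feature labels A, B, C.
A, B, C = "A", "B", "C"
positives = [frozenset({A, B}), frozenset({A}), frozenset({A, C})]
negatives = [frozenset({C}), frozenset({B, C})]
patterns = [frozenset({C}), frozenset({A}), frozenset({B})]

ranked = rank_patterns(patterns, positives, negatives)
```

The top-ranked pattern would then be counted against the held-out testing set with the same `coverage` function.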

12
Results
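[Figure: discovered substructure, two atom vertices joined by a bond labeled 1. Recoverable labels: names c and br; numeric labels 10 and 3; partial charges 0.062 and 0.057; edge labels n, t, p.]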
  • Learning set: 268
      • Positive compounds: 134/143
      • Negative compounds: 24/125
  • Testing set: 30
      • Positive compounds: 15/19
      • Negative compounds: 4/11

13
Results
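[Figure: discovered substructure of atom vertices joined by bonds labeled 1. Recoverable labels: names h, c, n, h; numeric labels 1, 10, 32; partial charges 0.34, 0.211, -0.778, 0.36; edge labels n, t, p.]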
  • Learning set: 268
      • Positive compounds: 60/143
      • Negative compounds: 0/125
  • Testing set: 30
      • Positive compounds: 8/19
      • Negative compounds: 0/11
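
The counts on the two Results slides can be turned into rates with a few lines of arithmetic. The sketch below assumes each "x/y" entry means the substructure occurs in x of the y compounds of that class; the slides do not state this explicitly, and the function name `rates` is hypothetical.

```python
def rates(pos, neg):
    """pos and neg are (hits, total) pairs.

    Returns (positive coverage, negative coverage, precision), where
    precision is the share of matched compounds that are positive.
    """
    pos_hits, pos_n = pos
    neg_hits, neg_n = neg
    matched = pos_hits + neg_hits
    precision = pos_hits / matched if matched else float("nan")
    return pos_hits / pos_n, neg_hits / neg_n, precision


# Counts copied from the slides (assumed interpretation: hits/total).
results = {
    "slide 12": {"learning": ((134, 143), (24, 125)), "testing": ((15, 19), (4, 11))},
    "slide 13": {"learning": ((60, 143), (0, 125)), "testing": ((8, 19), (0, 11))},
}

for slide, splits in results.items():
    for split, (pos, neg) in splits.items():
        pos_cov, neg_cov, prec = rates(pos, neg)
        print(f"{slide} {split}: positives {pos_cov:.1%}, "
              f"negatives {neg_cov:.1%}, precision {prec:.1%}")
```

Under that reading, the first substructure covers most positives but also some negatives, while the second covers fewer positives and no negatives in either set.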

14
Discussion
  • Consistent with results obtained by the ILP system
    PROGOL (Srinivasan et al., ILP-97).
  • Groups discovered by SUBDUE (e.g., amino) are
    unique substructures found only in compounds
    that test positive for carcinogenicity.

15
Conclusion
  • SUBDUE has the ability to discover interesting
    patterns (substructures) that might be helpful in
    predicting carcinogenicity.
  • SUBDUE is suitable for knowledge discovery in the
    chemical toxicity domain.

16
Future Research
  • Applying concept-learning SUBDUE to the chemical
    toxicity database
  • Find substructures that compress the positive
    graph but not the negative graph
  • Incorporate more domain knowledge
  • PTE-3 challenge (July 1999)