Biological Data Mining - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Biological Data Mining


1
Biological Data Mining
  • A comparison of Neural Network and Symbolic
    Techniques
  • http://www.cmd.port.ac.uk/biomine/

2
People
  • Centre for Molecular Design, University of
    Portsmouth
  • Professor Martyn Ford
  • Dr David Whitley
  • Dr Shuang Cang (Mar - Sept 2000)
  • Dr Abul Azad (Jan 2001 - )
  • Dr Antony Browne, London Guildhall University.
  • Professor Philip Picton, University College
    Northampton.

3
1. Objectives
  • The project aims:
    • to develop and validate techniques for extracting
      explicit information from bioinformatic data
    • to express this information as logical rules and
      decision trees
    • to apply these new procedures to a range of
      scientific problems related to bioinformatics and
      cheminformatics

4
2. Methods for Extracting Information
  • Artificial Neural Networks
    • good predictive accuracy
    • hard to decipher
    • often regarded as black boxes
  • Decision Trees
    • symbolic rules easier to interpret
    • more likely to reveal relationships in the data
    • allow behaviour of individual cases to be
      explained

5
3. Extracting Decision Trees
  • The Trepan procedure (Craven, 1996) extracts
    decision trees from a neural network and a set of
    training cases by recursively partitioning the
    input space.
  • The decision tree is built in a best-first
    manner, expanding the tree at nodes where there
    is greatest potential for increasing the fidelity
    of the tree to the network.
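
The best-first expansion described on this slide can be sketched with a priority queue. This is a minimal illustration, not the project's implementation: the `Node` class, the `expand` callback and the toy tree are hypothetical, and the score `reach * (1 - fidelity)` is an assumed way of measuring "potential for increasing fidelity".

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    name: str
    reach: float      # assumed: fraction of instances that reach this node
    fidelity: float   # assumed: agreement with the network at this node
    children: list = field(default_factory=list)


def priority(node):
    # Assumed score: many instances arriving and low agreement with the
    # network means a large potential gain in fidelity.
    return node.reach * (1.0 - node.fidelity)


def best_first_order(root, expand, max_expansions):
    """Expand nodes in decreasing order of priority; return the visit order."""
    tie = count()  # tie-breaker so the heap never compares Node objects
    heap = [(-priority(root), next(tie), root)]
    order = []
    while heap and len(order) < max_expansions:
        _, _, node = heapq.heappop(heap)
        order.append(node.name)
        node.children = expand(node)
        for child in node.children:
            heapq.heappush(heap, (-priority(child), next(tie), child))
    return order


# Hypothetical toy tree: which children each node would produce when split.
TOY_CHILDREN = {
    "root": [Node("A", 0.7, 0.9), Node("B", 0.3, 0.5)],
    "B": [Node("B1", 0.2, 0.8), Node("B2", 0.1, 1.0)],
}


def toy_expand(node):
    return TOY_CHILDREN.get(node.name, [])


order = best_first_order(Node("root", 1.0, 0.6), toy_expand, max_expansions=3)
print(order)
```

In this toy run node B is expanded before node A, even though fewer instances reach it, because its lower fidelity gives it the larger score.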

6
4. Splitting Tests
  • The splitting tests at the nodes are m-of-n
    expressions, e.g. 2-of-{x1, x2, x3}, where the
    xi are Boolean conditions.
  • Start with a set of candidate tests:
    • binary tests on each value for nominal features
    • binary tests on thresholds for real-valued
      features
  • Use a beam search with a beam width of two.
  • Initialize the beam with the candidate test that
    maximizes the information gain.
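
As a rough illustration of the two ingredients on this slide, the sketch below evaluates an m-of-n test and computes the information gain of a binary split. The function names and the toy data set (all 3-bit inputs, labelled 1 when at least two bits are set) are assumptions made for the example.

```python
from itertools import product
from math import log2


def m_of_n(m, conditions, x):
    """True when at least m of the Boolean conditions hold for instance x."""
    return sum(1 for cond in conditions if cond(x)) >= m


def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * log2(p)
    return result


def information_gain(test, instances, labels):
    """Entropy reduction obtained by splitting the instances on a binary test."""
    passed = [y for x, y in zip(instances, labels) if test(x)]
    failed = [y for x, y in zip(instances, labels) if not test(x)]
    n = len(labels)
    remainder = (len(passed) / n) * entropy(passed) \
        + (len(failed) / n) * entropy(failed)
    return entropy(labels) - remainder


# Toy data (assumption): every 3-bit input, class 1 if two or more bits are set.
instances = list(product([0, 1], repeat=3))
labels = [int(sum(x) >= 2) for x in instances]

# Candidate tests: one Boolean condition per feature.
conds = [lambda x, i=i: x[i] == 1 for i in range(3)]

gain_single = information_gain(conds[0], instances, labels)
gain_m_of_n = information_gain(lambda x: m_of_n(2, conds, x), instances, labels)
print(gain_single, gain_m_of_n)
```

On this toy problem the 2-of-{x1, x2, x3} test separates the classes completely, while a test on x1 alone recovers only a small part of the information.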

7
5. Splitting Tests (II)
  • To each m-of-n test in the beam and each
    candidate test, apply two operators:
    • m-of-(n+1), e.g. 2-of-{x1, x2} => 2-of-{x1, x2, x3}
    • (m+1)-of-(n+1), e.g. 2-of-{x1, x2} => 3-of-{x1, x2, x3}
  • Admit new tests to the beam if they increase the
    information gain and are significantly different
    (chi-squared) from existing tests.
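
The two operators can be sketched as functions on a test represented as a pair `(m, set of conditions)`. This representation and the function names are assumptions for illustration; the information-gain and chi-squared checks that decide whether a new test enters the beam are left out.

```python
def m_of_n_plus_1(test, cond):
    """Add a condition, keep the threshold: m-of-n becomes m-of-(n+1)."""
    m, conds = test
    return (m, conds | {cond})


def m_plus_1_of_n_plus_1(test, cond):
    """Add a condition and raise the threshold: m-of-n becomes (m+1)-of-(n+1)."""
    m, conds = test
    return (m + 1, conds | {cond})


def successors(test, candidates):
    """Apply both operators with every candidate not already in the test."""
    _, conds = test
    out = []
    for cond in candidates:
        if cond in conds:
            continue
        out.append(m_of_n_plus_1(test, cond))
        out.append(m_plus_1_of_n_plus_1(test, cond))
    return out


start = (2, frozenset({"x1", "x2"}))
new_tests = successors(start, ["x1", "x2", "x3"])
for m, conds in new_tests:
    print(f"{m}-of-{{{', '.join(sorted(conds))}}}")
```

Starting from 2-of-{x1, x2}, the only unused candidate is x3, so the sketch yields 2-of-{x1, x2, x3} and 3-of-{x1, x2, x3}, matching the examples on the slide.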

8
6. Example: Substance P Binding to NK1 Receptors
  • Substance P is a neuropeptide with amino acid
    sequence
  • H-Arg-Pro-Lys-Pro-Gln-Gln-Phe-Phe-Gly-Leu-Met-NH2
  • Wang et al. (1993) used the multipin technique to
    synthesize 512 (= 2^9) stereoisomers generated by
    systematic replacement of L- by D-amino acids at
    9 positions, and measured binding potencies to
    central NK1 receptors.
  • The objective was to identify the positions at
    which stereochemistry affects binding strength.
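
To make the size of the data set concrete, the sketch below enumerates every L/D assignment over nine positions and encodes each one as a binary input vector. The encoding (0 for L, 1 for D) is an assumption; the positions are numbered abstractly because the slide does not say which residues were varied, and no binding potencies are included.

```python
from itertools import product

N_POSITIONS = 9  # number of positions at which L- was replaced by D-amino acids

# Every combination of L and D over the nine positions.
isomers = list(product("LD", repeat=N_POSITIONS))


def encode(isomer):
    """Assumed encoding: 0 for an L-amino acid, 1 for a D-amino acid."""
    return [1 if form == "D" else 0 for form in isomer]


X = [encode(isomer) for isomer in isomers]
print(len(X))       # 2**9 combinations
print(X[0], X[-1])  # all-L and all-D
```

Each row of `X` is the kind of nine-element Boolean input on which splitting tests over individual positions can be evaluated.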

9
7. Application of Trepan
  • A series of networks with 9:9:1 architectures
    were trained using 90% of the data as a training
    set.
  • For each network a decision tree was grown using
    Trepan.
  • The positions identified agree with the FIRM
    (Formal Inference-based Recursive Modelling)
    analysis of Young and Hawkins (1999).

10
8. A Typical Trepan Tree

11
9. Future Work
  • Complete the implementation of the Trepan
    algorithm:
    • model the distribution of the input data and
      generate from this a set of query instances that
      are classified using the network and used as
      additional training cases during extraction of
      the tree.
  • Extend the algorithm to enable the extraction of
    regression trees.
  • Provide a Bayesian formulation for the decision
    tree extraction algorithm.
  • Compare the performance of these algorithms with
    existing symbolic data mining techniques
    (ID3/C5).
  • Apply Trepan to ligand-receptor binding problems.
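
The query-instance step in the first item above can be sketched as follows. This is a simplified illustration under stated assumptions: the inputs are binary, the input distribution is modelled by independent per-feature frequencies, and `stand_in_network` is a hypothetical function used in place of a trained network. It also ignores any constraints imposed by the path to a tree node.

```python
import random


def fit_marginals(data):
    """Estimate, for each binary feature, the fraction of cases where it is 1."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def sample_queries(marginals, k, rng):
    """Draw k query instances, sampling each feature independently."""
    return [[1 if rng.random() < p else 0 for p in marginals] for _ in range(k)]


def label_with_oracle(queries, oracle):
    """Classify each query with the network so it can serve as a training case."""
    return [(q, oracle(q)) for q in queries]


def stand_in_network(x):
    # Hypothetical stand-in for the trained network's prediction.
    return int(x[0] + x[2] >= 1)


training = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

marginals = fit_marginals(training)
queries = sample_queries(marginals, 20, rng)
extra_cases = label_with_oracle(queries, stand_in_network)
print(marginals, len(extra_cases))
```

The labelled queries can then be pooled with the original training cases when a splitting test is chosen, so that decisions deep in the tree are not based on only a handful of real instances.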