Transcriptional Diagnosis by Bayesian Network

1
Transcriptional Diagnosis by Bayesian Network
  • Hsun-Hsien Chang and Marco F. Ramoni

Children's Hospital Informatics Program
Harvard-MIT Division of Health Sciences and Technology
Harvard Medical School
March 17, 2009
2
Background
  • Microarray technology enables profiling
    expression of thousands of genes in parallel on a
    single chip.
  • Comparative analysis of gene expression across
    tissue states extracts signature genes for
    disease diagnosis.
  • Challenge:
    • The number of variables (i.e., genes) is much greater
      than the number of observations (i.e., biological
      samples), which invites overfitting.
  • Existing methods:
    • Gene selection: compute statistics (e.g.,
      t-statistics, SNR, PCA) of individual genes and
      select high-ranking genes.
    • Classification model: create a classification
      function of the selected genes.
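
The conventional gene-selection step described above can be illustrated with a minimal sketch. It assumes Welch's t-statistic as the per-gene score and a hypothetical `top_k` cut-off; the gene names and expression values are invented toy data, not taken from the study.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_genes(expr, labels, top_k):
    """Rank genes by |t| between the two classes and keep the top_k.

    expr:   dict mapping gene name -> expression values, one per sample
    labels: list of 0/1 class labels aligned with the samples
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data, invented for illustration only.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # separates the classes
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no class difference
    "geneC": [1.1, 1.3, 1.0, 3.2, 3.0, 3.4],  # geneA shifted by 0.1
}
selected = rank_genes(expr, labels, top_k=2)
```

Because each gene is scored on its own, `geneC` is selected alongside `geneA` even though it carries essentially the same information.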

3
Proposed Approach
  • Issues:
    • Assuming that genes are independent is inadequate.
    • Other genes may be expressed collinearly with the
      signature.
    • Selection and classification are two
      non-integrated steps, and a cut-off threshold is
      needed to select high-ranking genes.
  • Proposed strategies:
    • Adopt a systems biology approach to infer the
      functional dependence among genes.
    • Use the dependence network for tissue
      discrimination.
    • Integrate gene selection and the classification
      model in a Bayesian network framework.

4
Data Representation by Bayesian Network
  • Bayesian networks are directed acyclic graphs
    where:
    • Nodes correspond to random variables.
    • Directed arcs encode the conditional probabilities
      of the target nodes given the source nodes.
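
As a small illustration of this representation, the sketch below stores a network as a parent list plus one conditional probability table per node, and computes a joint probability as the product of the per-node conditionals. The three-node chain, the binary variables, and every probability value are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import product

# Hypothetical network: phenotype -> geneX -> geneY (all variables binary).
parents = {"phenotype": [], "geneX": ["phenotype"], "geneY": ["geneX"]}

# cpt[node][parent_values] = P(node = 1 | parents); numbers are invented.
cpt = {
    "phenotype": {(): 0.5},
    "geneX": {(0,): 0.2, (1,): 0.9},
    "geneY": {(0,): 0.3, (1,): 0.7},
}

def joint_probability(assignment):
    """P(assignment) as the product of each node's conditional probability."""
    p = 1.0
    for node, pa in parents.items():
        key = tuple(assignment[q] for q in pa)
        p_one = cpt[node][key]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p

# Sanity check: the joint distribution sums to one over all assignments.
total = sum(
    joint_probability(dict(zip(parents, bits)))
    for bits in product([0, 1], repeat=len(parents))
)
```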

5
Gene Selection by Bayes Factor
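
The transcript does not preserve how the Bayes factor is computed on this slide, so the following is only a sketch of the general idea under loudly stated assumptions: expression is discretized to 0/1, each model uses a uniform Beta(1, 1) prior, and the Bayes factor compares "gene depends on phenotype" against "gene is independent of phenotype". The function names and toy data are hypothetical.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Logarithm of the Beta function."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of a binary sequence with k ones out of n,
    under a Beta(a, b) prior on the probability of a one."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)

def log_bayes_factor(values, labels):
    """log P(data | gene depends on phenotype) - log P(data | independent)."""
    groups = {0: [], 1: []}
    for v, y in zip(values, labels):
        groups[y].append(v)
    dependent = sum(log_marginal(sum(g), len(g)) for g in groups.values())
    independent = log_marginal(sum(values), len(values))
    return dependent - independent

# Toy data, invented for illustration only.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [0, 0, 0, 0, 1, 1, 1, 1]    # tracks the phenotype
uninformative = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to the phenotype

bf_informative = exp(log_bayes_factor(informative, labels))
bf_uninformative = exp(log_bayes_factor(uninformative, labels))
```

A Bayes factor above 1 favors the dependence model, so under this sketch a gene would be retained when its Bayes factor exceeds 1, which avoids choosing an arbitrary rank cut-off.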
6
Collinearity Elimination via Network Learning
7
Sample Classification
  • The phenotype variable is independent of the blue
    genes, given the green genes.
  • Technically, the green genes form the Markov
    blanket of the phenotype variable, and they are
    the signature genes used for phenotype
    determination.
  • Tissue classification
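
To make the Markov blanket idea concrete, here is a minimal sketch that extracts the blanket of a node (its parents, its children, and the other parents of its children) from a parent-list representation of a network. The network structure and the node names `g1` to `g4` are hypothetical and do not come from the slides.

```python
def markov_blanket(parents, target):
    """Parents, children, and the children's other parents of `target`.

    parents: dict mapping each node to the list of its parent nodes.
    """
    children = [n for n, pa in parents.items() if target in pa]
    blanket = set(parents[target]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(target)
    return blanket

# Hypothetical network: the phenotype points to g1 and g2,
# g3 is a second parent of g2, and g4 sits downstream of g1.
parents = {
    "phenotype": [],
    "g1": ["phenotype"],
    "g2": ["phenotype", "g3"],
    "g3": [],
    "g4": ["g1"],
}
signature = markov_blanket(parents, "phenotype")
```

In this toy network `g4` falls outside the blanket: once `g1` is known, `g4` adds no further information about the phenotype.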

8
Algorithm Summary
Gene Selection by Bayes Factor
Collinearity Elimination
Sample Classification
(sensitivity analysis)
9
Discriminate Lung Carcinoma Subtypes
  • Adenocarcinoma (AC) and squamous cell carcinoma
    (SCC) are major subtypes of lung cancer.
  • AC and SCC differ in survival, chance of
    metastasis, and response to chemotherapy and
    targeted therapy.
  • Physicians lack confidence in correct recognition
    when there are multiple primary carcinomas.
  • Training:
    • 58 ACs and 53 SCCs.
    • 77 genes selected in the network.
    • 25 signature genes.

10
Bayesian Network for Lung Carcinoma
11
Large-Scale Testing on Independent Samples
  • 422 samples (232 ACs and 190 SCCs) aggregated
    from 7 cohorts (including Caucasians,
    African-Americans, Chinese).
  • Accuracy: 95.2% AUROC.
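
For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the scores and labels are invented toy values, not results from the study.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy classifier outputs, invented for illustration only.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
area = auroc(scores, labels)  # 8 of the 9 pairs are ordered correctly
```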

12
Comparisons with Other Popular Methods
  • Higher classification accuracy.
  • Small-sized signature to avoid overfitting.

13
KRT6 Family Characterizes the Lung Carcinoma
Discrimination
14
KRT6 Family Characterizes the Lung Carcinoma
Discrimination
  • Keratin-6 family genes (KRT6A, KRT6B, KRT6C) are
    important for distinguishing lung cancer subtypes.
  • Account for 95% of the accuracy of the whole
    25-gene signature.
  • Located on chromosome 12q12-q13.
  • A nonlinear, concave discriminative surface.

15
Verification by Chr12q12-q13 Aberrations
  • Investigate DNA copy number changes in
    comparative genomic hybridization (CGH) array data.
  • 12 ACs and 13 SCCs from Vrije University Medical
    Center, the Netherlands.
  • A dumbbell discriminative surface achieves 80%
    classification accuracy.
  • Treat the average CGH values of genes occupying q12,
    q13, and q12-q13, respectively, as three features to
    construct a naïve Bayes classifier.
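
The three-feature classifier described above can be sketched as follows. The slides do not say which likelihood model was used, so this example assumes Gaussian class-conditional densities; the `fit` and `predict` helpers are hypothetical, and the CGH values are invented toy numbers that do not reflect the actual direction or size of any copy number change.

```python
import math
from statistics import mean, pvariance

def fit(X, y):
    """Estimate a class prior and per-feature mean and variance for each class."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        model[c] = {
            "prior": len(rows) / len(X),
            "mean": [mean(col) for col in cols],
            # A small floor keeps the variance strictly positive.
            "var": [pvariance(col) + 1e-6 for col in cols],
        }
    return model

def predict(model, x):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(params):
        lp = math.log(params["prior"])
        for v, m, s2 in zip(x, params["mean"], params["var"]):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return lp
    return max(model, key=lambda c: log_posterior(model[c]))

# Each row: (mean CGH over q12, over q13, over q12-q13). Invented toy values.
X = [
    (0.1, 0.0, 0.05), (0.2, 0.1, 0.15), (0.0, -0.1, -0.05),
    (0.9, 1.1, 1.0), (1.0, 0.8, 0.9), (1.2, 1.0, 1.1),
]
y = ["AC", "AC", "AC", "SCC", "SCC", "SCC"]

model = fit(X, y)
call_low = predict(model, (0.15, 0.05, 0.1))
call_high = predict(model, (1.0, 1.0, 1.0))
```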

16
Conclusion
  • Reverse-engineer regulatory network information
    for tissue classification.
  • Adopt the systems biology approach to infer the gene
    dependency network.
  • Select genes by Bayes factor.
  • Eliminate collinearity via network learning.
  • Integrate gene selection and the classification model
    in a single Bayesian network framework.
  • Demonstrate the promising translational value of
    the systems biology approach in clinical studies.