Part II: Discriminative Margin Clustering

1
Part II: Discriminative Margin Clustering
  • Joint work with
  • Rob Tibshirani, Dept of Statistics
  • Patrick O. Brown, School of Medicine
  • Stanford University

2
Gene Expression
  • Micro-array technology
  • Find expression values of all genes in a tissue
  • Expression pattern of genes is related to
    characteristics of tissue type
  • Gene expression is combinatorial
  • Many factors need to combine for expression of a
    gene
  • Combinations of expressions lead to certain
    phenotypes
  • Poorly understood

3
Feature Sets for Tumors
  • Set of genes with higher expression in a cancer
    type compared to every normal tissue type in the
    body
  • Combinatorial gene expression signature
  • Potential use in diagnostics and drug treatments
  • If these genes encode cell surface proteins,
    we can target them using antibodies
  • Kills tumor cells
  • Does not harm normal cells

4
Feature Set Definition
  • Convex combination of genes which gives maximum
    separation in expression values
  • Constraint: w1 + w2 = 1
  • Combination: w1 x + w2 y
  • [Figure. Axes: expression value for Gene x and
    expression value for Gene y. Labels: Tumor t,
    Normal Set N, Margin m, around 100 samples.]

5
Computing the Feature Set
Definition naturally extends to collections of
tumor samples
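One way to read this extension is as a linear program. The formulation below is a sketch under assumptions, not the slide's own notation: one nonnegative weight w_g per gene, a set T of tumor samples, a set N of normal samples, and x_{g,s} for the expression value of gene g in sample s. It maximizes the smallest gap between any tumor sample and any normal sample:

```latex
\begin{aligned}
\max_{w,\,m}\quad & m \\
\text{s.t.}\quad & \sum_{g} w_g \,\bigl(x_{g,t} - x_{g,n}\bigr) \;\ge\; m
    \qquad \forall\, t \in T,\; n \in N, \\
& \sum_{g} w_g = 1, \qquad w_g \ge 0 \quad \forall\, g .
\end{aligned}
```

With a single tumor sample and two genes this reduces to the two-gene definition, where the weights satisfy w1 + w2 = 1.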

6
Example
  Gene             T     N1    N2
  g1               100   50    10
  g2               100   10    50
  w1 g1 + w2 g2    100   30    30

  • w1 = 0.5, w2 = 0.5
  • Margin = 100 - 30 = 70
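The arithmetic of this example can be checked with a few lines of Python. This is a minimal sketch: the function names `combined` and `margin` are illustrative, and the grid scan over w1 is a stand-in for a proper optimizer, used only to confirm that equal weights are the best choice for these particular values.

```python
def combined(weights, sample):
    """Convex combination of one sample's gene expression values."""
    return sum(w * x for w, x in zip(weights, sample))

def margin(weights, tumor, normals):
    """Gap between the tumor's combined value and the highest combined normal value."""
    return combined(weights, tumor) - max(combined(weights, n) for n in normals)

# Values from the example; each sample is (g1, g2).
tumor = (100, 100)
normals = [(50, 10), (10, 50)]  # N1, N2

example_margin = margin((0.5, 0.5), tumor, normals)

# Scan w1 on a grid (w2 = 1 - w1) to see which weighting gives the largest margin.
best_w1 = max((i / 100 for i in range(101)),
              key=lambda w1: margin((w1, 1 - w1), tumor, normals))

print(example_margin, best_w1)
```

Putting all the weight on either gene alone would leave a margin of only 50, because one of the two normal tissues expresses that gene at 50.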

7
Contrast with Previous Work
  • Previous work focused just on classifiers
  • Separating tumor class from corresponding normal
    class
  • Separating tumor from all other tumor tissues
  • Linear and quadratic Support Vector Machines
  • Brown et al., Moler et al., Ramaswamy et al.,
    Su et al., Grate et al.
  • Problem: many cancers have poorly understood
    subtypes
  • We focus on two combined aspects
  • Classifiers separating tumor from all normal
    tissue classes
  • Clustering tumors based on this paradigm of
    separation

8
Traditional Clustering
  • Cluster tissues based on similarity of gene
    expression patterns
  • Similar tissues have correlated gene expressions
  • Eisen et al., PNAS 1998
  • Problem: genes driving the clustering
  • Large classes of genes that are all regulated
    together
  • Cell cycle and cell proliferation
  • Protein biosynthesis and cell growth
  • Respiration
  • We need to weight these gene classes appropriately

9
Our Results
  • Feature sets for tumor samples are very small
  • Picks only one from a correlated set of genes
  • Genes with different functions expressed in
    different normal tissues
  • Hierarchically cluster tumor samples
  • Similarity metric for two tumor sets: combined
    margin
  • Tumor samples with similar feature sets group
    together
  • Identify natural clusters of tumor samples
  • Construct feature sets for each cluster
  • Biological significance

10
Clustering Hardness
  • Given
  • Set of n tumors
  • Margin M
  • Find largest tumor subset with margin ≥ M
  • Problem is n^(1-ε) hard to approximate
  • Reduction from maximum clique problem

11
Clustering Algorithm
  • [Figure. Axes: Gene x and Gene y. Labels: tumors
    A to H, Normal, Margin m1, Margin m2.]

12
Cluster Boundaries
  • Each node in tree labeled with combined margin of
    tumor samples in sub-tree
  • Margin reduces as we move up the tree
  • Chop tree at a chosen margin cut-off
  • Sub-trees are the clusters
  • Breast cancer samples group into three clusters
  • ERBB2 (ERBB2 and GRB7)
  • Luminal A type (ESR1, NAT1 and GATA3)
  • Basal cell type(?) (Keratin, Fibrillin and
    Fibronectin)
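The tree construction and the margin cut-off can be sketched in Python. Several things here are assumptions rather than the authors' implementation: the greedy rule (always merge the two clusters whose union keeps the largest combined margin), the function names, and the toy data. To stay dependency-free, the combined margin is found by scanning a weight grid for a two-gene case, where a real implementation would solve a linear program.

```python
from fractions import Fraction
from itertools import combinations

# Candidate weights for a two-gene toy case: w = (w1, 1 - w1).
GRID = [Fraction(i, 20) for i in range(21)]

def combined_margin(tumors, normals):
    """Largest worst-case gap between every tumor and every normal over GRID."""
    def gap(w1):
        w = (w1, 1 - w1)
        def score(sample):
            return sum(wi * xi for wi, xi in zip(w, sample))
        return min(score(t) for t in tumors) - max(score(n) for n in normals)
    return max(gap(w1) for w1 in GRID)

def build_tree(samples, normals):
    """Greedily merge the two clusters whose union has the largest combined margin."""
    def margin_of(names):
        return combined_margin([samples[k] for k in names], normals)
    clusters = [frozenset([name]) for name in samples]
    merges = []  # (members of the new node, its combined margin), bottom-up
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: margin_of(pair[0] | pair[1]))
        node = a | b
        merges.append((node, margin_of(node)))
        clusters = [c for c in clusters if c not in (a, b)] + [node]
    return merges

def cut_tree(samples, merges, cutoff):
    """Keep the largest sub-trees whose combined margin is at least cutoff."""
    clusters = [frozenset([name]) for name in samples]
    for node, node_margin in merges:
        if node_margin >= cutoff:
            clusters = [c for c in clusters if not c <= node] + [node]
    return clusters

# Hypothetical expression values (gene x, gene y); not data from the study.
normals = [(10, 10), (20, 5), (5, 20)]
samples = {"A": (100, 10), "B": (90, 15), "C": (10, 100), "D": (15, 90)}

merges = build_tree(samples, normals)
clusters = cut_tree(samples, merges, cutoff=50)
print([sorted(c) for c in clusters])
```

In this toy data, A and B are separated from the normals by gene x and C and D by gene y, so each pair keeps a large margin while the merged group of four does not. Adding samples to a group can only lower its combined margin, which is why the labels shrink toward the root.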

13
Properties of Feature Sets
  • Feature set for a tumor cluster
  • Has at most 20 genes
  • Most of the weight concentrated on a few genes

  Tumor Cluster        Genes                 Fraction of weight
  ERBB2 Breast         ERBB2                 65%
  Luminal A Breast     ESR1, NAT1, GATA3     55%
  Prostate sub-type    AMACR                 40%
  Ovarian sub-type     MSLN, PAX8, COL1A2    65%

14
Quality of Clustering
  • Random partitioning of tumor samples
  • Divide tumor samples randomly into training and
    test groups
  • Cluster training group
  • Find cluster with best feature set margin for
    test sample
  • Label the sample with the tumor type for that
    cluster
  • Classifies unknown tumor samples accurately
  • At least 75% accuracy in categorizing test
    samples
  • At least 90% accuracy for CNS, Breast, Kidney,
    Ovary and Prostate cancers
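The labeling step of this evaluation can be sketched as follows. The data layout, the function names, the gene names g1 to g3, the tumor types, and all numbers are hypothetical; the sketch only illustrates assigning a test sample to the cluster whose feature set gives it the best margin over the normal samples, then scoring the labels.

```python
def score(weights, sample):
    """Weighted combination of the expression values of the feature-set genes."""
    return sum(w * sample[gene] for gene, w in weights.items())

def margin(weights, sample, normals):
    """Gap between the sample and the highest-scoring normal under these weights."""
    return score(weights, sample) - max(score(weights, n) for n in normals)

def classify(sample, clusters, normals):
    """Label the sample with the tumor type of the best-margin cluster."""
    best = max(clusters, key=lambda c: margin(c["weights"], sample, normals))
    return best["tumor_type"]

# Hypothetical normals, cluster feature sets, and labeled test samples.
normals = [{"g1": 10, "g2": 20, "g3": 5}, {"g1": 30, "g2": 5, "g3": 10}]
clusters = [
    {"tumor_type": "type A", "weights": {"g1": 0.5, "g2": 0.5}},
    {"tumor_type": "type B", "weights": {"g3": 1.0}},
]
test_set = [
    ({"g1": 80, "g2": 90, "g3": 8}, "type A"),
    ({"g1": 12, "g2": 10, "g3": 95}, "type B"),
]

predictions = [classify(sample, clusters, normals) for sample, _ in test_set]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test_set)) / len(test_set)
print(predictions, accuracy)
```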

15
Discussion
  • Small feature sets for a tumor class
  • Based only on discriminating it versus normal
    tissues
  • Property: also discriminates it from other tumor
    classes
  • Highly expressed genes unique to the tumor class
  • Biological validation of our method
  • ERBB2 and ESR1 can be targeted by monoclonal
    antibodies
  • Some of the most effective treatments for breast
    cancers
  • AMACR is a recently recognized prostate cancer
    marker
  • Function not very well understood
  • MSLN is a well-studied ovarian cancer marker

16
Expanding Feature Sets
  • Consider weighted combinations which have close
    to optimal margin
  • Let the optimal margin be M
  • P(ε): polytope of feature sets with margin ≥ M - ε
  • Find weight vector with min Euclidean norm in
    P(ε)
  • Intuition
  • Manhattan norm of any weight vector = 1
  • Minimizing Euclidean norm spreads the weights
  • Around 100 genes in feature set

17
Genes in Larger Feature Sets
  • Genes with similar expression patterns
  • Example: ERBB2 and GRB7
  • Genes expressed across cancer types
  • Not very strongly expressed
  • Do not drive the clustering
  • Example: proliferation and cell cycle related
    genes
  • C20ORF1, CENPF, NUF2R, TOPK, L2DTL, KNSL1, ...
  • Example: possible alterations to chromosome 22
  • PRAME

18
Future Work
  • Identify cell surface proteins in feature sets
  • Possible use in chemotherapy and diagnostics
  • Findings for Ovarian and Pancreatic cancers being
    tested in the laboratory
  • Identify genes highly expressed across cancer
    types
  • Examples: TFAP2A, ADAM12 and LOX
  • Biological significance?
  • Succinct representations for biological
    functions
  • Examples: cell cycle, respiration, ...
  • Applications in clustering and modeling gene
    expression