Microarray Data Set - PowerPoint PPT Presentation

Slides: 7
Provided by: buf64
Learn more at: https://cse.buffalo.edu

Transcript and Presenter's Notes

1
Microarray Data Set
  • The microarray data set we are dealing with is
    represented as a two-dimensional (2D) numerical array.
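
A minimal sketch of such a 2D array, using plain Python lists. The slide does not say which axis holds genes, so the layout here (rows are genes, columns are samples) is an assumption, and the names and values are hypothetical toy data.

```python
# Hypothetical toy data: each row is one gene, each column is one sample.
gene_names = ["g1", "g2", "g3"]
sample_names = ["s1", "s2", "s3", "s4"]
expression = [
    [2.1, 1.9, 0.3, 0.2],  # g1
    [0.5, 0.4, 0.6, 0.5],  # g2
    [0.1, 0.2, 1.8, 2.0],  # g3
]

def expression_of(gene, sample):
    """Look up the expression level of one gene in one sample."""
    row = gene_names.index(gene)
    col = sample_names.index(sample)
    return expression[row][col]

print(expression_of("g3", "s4"))
```

Real data sets have thousands of rows and only tens to hundreds of columns, so the array is very tall and narrow in this layout.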

2
Characteristics of Microarray Data
  • High dimensionality of gene space, low
    dimensionality of sample space.
      • Thousands to tens of thousands of genes, tens to
        hundreds of samples.
  • Feature (gene) correlation.
      • Genes collaborate to function. Gene correlation
        characterizes how the system works.
  • A plethora of domain knowledge.
      • A large body of knowledge has accumulated about the
        genes in question.
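
One common way to quantify gene correlation is the Pearson correlation coefficient between two genes' expression profiles across samples. The slide does not name a specific measure, so this choice is an assumption, and the profiles below are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical profiles of three genes over four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.1, 3.9, 6.2, 8.0]   # rises with gene_a
gene_c = [4.0, 3.1, 1.9, 1.0]   # falls as gene_a rises

print(pearson(gene_a, gene_b))  # close to +1
print(pearson(gene_a, gene_c))  # close to -1
```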

3
Microarray Data Analysis
  • Analysis from two angles:
      • sample as object, gene as attribute
      • gene as object, sample/condition as attribute
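
The two angles correspond to reading the same matrix by rows or by columns. A small sketch, assuming a genes-by-samples layout (an assumption, not stated in the slides) and hypothetical values:

```python
def transpose(matrix):
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]

# Gene view: each row is a gene (object), each column a sample (attribute).
gene_view = [
    [1, 2, 3],
    [4, 5, 6],
]

# Sample view: each row is a sample (object), each column a gene (attribute).
sample_view = transpose(gene_view)
print(sample_view)  # [[1, 4], [2, 5], [3, 6]]
```

Clustering the rows of `gene_view` groups genes with similar behavior across samples, while clustering the rows of `sample_view` groups samples with similar expression profiles.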

4
Supervised Analysis
  • Select training samples (hold out)
  • Sort genes (t-test, ranking)
  • Select informative genes (top 50 to 200)
  • Cluster based on informative genes
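
The sorting and selection steps above can be sketched as follows. The slides only say "t-test, ranking", so the specific statistic (Welch's two-sample t), the use of its absolute value, and the names `t_score` and `rank_genes` are assumptions for illustration. The data is hypothetical, and hold-out splitting and clustering are not shown.

```python
from math import sqrt
from statistics import mean, variance

def t_score(a, b):
    """Welch's t-statistic between two groups of expression values.

    Assumes each group has at least two values and nonzero total variance.
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

def rank_genes(expression, labels, top_k):
    """Return indices of the top_k genes ordered by decreasing |t|."""
    scored = []
    for gene_index, row in enumerate(expression):
        class1 = [v for v, label in zip(row, labels) if label == 1]
        class2 = [v for v, label in zip(row, labels) if label == 2]
        scored.append((abs(t_score(class1, class2)), gene_index))
    scored.sort(reverse=True)
    return [gene_index for _, gene_index in scored[:top_k]]

# Hypothetical training data: rows are genes, columns are labeled samples.
labels = [1, 1, 1, 2, 2, 2]
expression = [
    [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],  # gene 0: high in class 1
    [2.0, 2.5, 1.5, 2.1, 2.4, 1.6],  # gene 1: no class difference
    [1.0, 1.3, 0.8, 3.0, 3.4, 2.9],  # gene 2: high in class 2
]

print(rank_genes(expression, labels, top_k=2))  # [0, 2]
```

With real data, `top_k` would be in the 50 to 200 range mentioned on the slide.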

[Figure: 0/1 expression patterns of Class 1 and Class 2 samples, shown over genes g1 ... g4132.]

5
Phenotype Structure Mining
[Figure: samples partitioned into three groups: {1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10}.]
  • An informative gene is a gene whose expression manifests
    the phenotype distinction among samples.
  • Phenotype structure = sample partition +
    informative genes.

6
Existing Feature Selection and Extraction Algorithms
  • The characteristics of microarray data sets make
    feature selection a critical process.
      • Too many features, too few samples.
  • Existing feature selection/extraction algorithms
    include:
      • Single-gene-based discriminative scores, such as
        the t-test score, S2N (signal-to-noise), etc.
      • Redundancy-removal-based feature subset selection
        (FSS) algorithms.
      • General feature selection algorithms (Relief
        family, floating selection, etc.).
      • General feature extraction algorithms: PCA, SVD,
        FLD, etc. We have not seen feature extraction
        algorithms designed specifically for microarray
        data.
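
As an illustration of a single-gene discriminative score, here is a sketch of the signal-to-noise (S2N) score in its commonly used form, (mean1 - mean2) / (std1 + std2). The slides do not give the formula, so this form, the use of the sample standard deviation, and the function name `s2n` are assumptions. The values are hypothetical.

```python
from statistics import mean, stdev

def s2n(class1_values, class2_values):
    """Signal-to-noise score of one gene between two classes.

    Assumes each class has at least two values and that the two
    standard deviations are not both zero.
    """
    signal = mean(class1_values) - mean(class2_values)
    noise = stdev(class1_values) + stdev(class2_values)
    return signal / noise

# Hypothetical expression values of two genes in class 1 and class 2.
strong = s2n([5.0, 5.2, 4.9], [1.0, 1.1, 0.9])  # well-separated classes
weak = s2n([2.0, 2.5, 1.5], [2.1, 2.4, 1.6])    # overlapping classes

print(abs(strong) > abs(weak))  # True
```

A large absolute score means the class means are far apart relative to the within-class spread. Like the t-test score, it rates each gene on its own and ignores correlation between genes.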