Introduction to Microarrays - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Introduction to Microarrays


1
Introduction to Microarrays
  • Kellie J. Archer, Ph.D.
  • Assistant Professor
  • Department of Biostatistics
  • kjarcher_at_vcu.edu

2
Microarrays
A snapshot that captures the activity pattern of
thousands of genes at once.  
Affymetrix GeneChip
Custom spotted arrays
3
Spotted Microarray Process
CTRL
TEST
4
Affymetrix GeneChip Probe Arrays
Hybridized Probe Cell
GeneChip Probe Array
Single stranded, fluorescently labeled DNA target
24µm
Oligonucleotide probe
1.28cm
Each probe cell or feature contains millions of
copies of a specific oligonucleotide probe
Over 250,000 different probes complementary to
genetic information of interest
Image of Hybridized Probe Array
BGT108_DukeUniv
5
Applications of microarrays
  • Cancer research: Molecular characterization of
    tumors on a genomic scale, leading to more reliable
    diagnosis and more effective treatment of cancer
  • Immunology: Study of host genomic responses to
    bacterial infections
  • Model organisms: Multifactorial experiments
    monitoring expression response to different
    treatments and doses, over time or in different
    cell types
  • etc.

6
Applications of Microarrays
  • Compare mRNA transcript levels in different types
    of cells, i.e., vary:
  • Tissue (liver vs. brain)
  • Treatment (Drugs A, B, and C)
  • State (tumor vs. normal)
  • Organism (yeast, different strains)
  • Timepoint
  • etc.

7
(No Transcript)
8
Affymetrix Design
11-20 probe pairs interrogate each gene
PM: GCGCCGGCTGCAGGAGCAGGAGGAG
MM: GCGCCGGCTGCACGAGCAGGAGGAG
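The two sequences on this slide differ only at the 13th base (G in the PM probe, C in the MM probe), which matches the usual GeneChip design in which the mismatch probe is the perfect-match 25-mer with its middle base replaced by the complementary base. A minimal sketch of that relationship follows; the function name `mismatch_probe` is made up for illustration.

```python
# Complement of each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def mismatch_probe(pm: str) -> str:
    """Return the PM sequence with its middle base swapped for the complement."""
    mid = len(pm) // 2  # index 12, i.e. the 13th base of a 25-mer
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]

pm = "GCGCCGGCTGCAGGAGCAGGAGGAG"  # PM sequence from the slide
mm = mismatch_probe(pm)
print(mm)  # GCGCCGGCTGCACGAGCAGGAGGAG, the MM sequence from the slide
```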
9
Image Analysis: Pixel-Level Data
6 x 6 matrix of pixels for each PM and MM
probe (HG-U133A GeneChip)
10
Expression Quantification
PM and MM intensities are combined to form an
expression measure for the probe set (gene)
PM: GCGCCGGCTGCAGGAGCAGGAGGAG
MM: GCGCCGGCTGCACGAGCAGGAGGAG
11
Expression Quantification
  • Initially, Affymetrix signal was calculated as
      $\mathrm{AvgDiff} = \frac{1}{|A|} \sum_{j \in A} (PM_j - MM_j)$
    where j indexes the probe pairs for each probe set A. This is
    known as the Average Difference method.
  • Problems
  • Large variability in PM-MM
  • MM probes may be measuring signal for another
    gene/EST
  • PM-MM calculations are sometimes negative
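As a rough illustration of the Average Difference method, the sketch below averages PM minus MM over the probe pairs of a single probe set. The intensities are invented, the function name `average_difference` is hypothetical, and no outlier exclusion is applied. The example is chosen so that the result comes out negative, which is one of the problems listed above.

```python
from statistics import mean

def average_difference(pm, mm):
    """Mean of PM_j - MM_j over the probe pairs j of one probe set."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM need one intensity per probe pair")
    return mean(p - m for p, m in zip(pm, mm))

# Invented intensities for a probe set with four probe pairs.
pm = [120.0, 95.0, 130.0, 80.0]
mm = [100.0, 110.0, 150.0, 200.0]

avg_diff = average_difference(pm, mm)
print(avg_diff)  # -33.75: a negative "expression" value
```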

12
Expression Quantification
  • The mean of a random variable X is a measure of
    central location of the density of X.
  • The variance of a random variable is a measure of
    spread or dispersion of the density of X.
  • $\mathrm{Var}(X) = E[(X - \mu)^2] = E(X^2) - \mu^2$, where $\mu = E(X)$
  • Standard deviation: $\sigma = \sqrt{\mathrm{Var}(X)}$
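The two expressions for the variance can be checked numerically. The sketch below treats eight made-up values as equally likely outcomes of X, so each expectation is a plain average.

```python
import math
from statistics import fmean

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values

mu = fmean(x)                                        # E(X)
var_definition = fmean([(v - mu) ** 2 for v in x])   # E[(X - mu)^2]
var_shortcut = fmean([v * v for v in x]) - mu ** 2   # E(X^2) - mu^2
sd = math.sqrt(var_definition)                       # standard deviation

print(mu, var_definition, var_shortcut, sd)  # 5.0 4.0 4.0 2.0
```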


13
Expression Quantification Illustration: Average
Difference.xls
14
(No Transcript)
15
Sources of Obscuring Variation in Microarray
Measurements
  • Sample handling (degree of physical manipulation,
    time from extirpation to freezing)
  • Microarray manufacture
  • Sample processing (extraction procedure, RNA
    integrity and purity, RNA labeling)
  • Processing differences (hybridization chambers,
    washing modules, scanners)
  • Personnel differences
  • Random differences in signal intensity in a data
    set that covary with the biological process

16
Normalization
  • The purpose of normalization is to remove
    experimental artifacts of no direct interest,
    that is, the removal of systematic effects other
    than differential expression.
  • Normalization procedures often include:
  • background subtraction,
  • detection of outliers,
  • and removal of variation due to differences in
    sample preparation, array differences,
    differences in dye labeling efficiencies, and
    scanning differences.
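The slides do not say which normalization procedure was used. As one simple illustration of removing an array-wide systematic effect, the sketch below rescales every array to a common median intensity. This global scaling scheme, the function name `median_scale`, and the intensities are all assumptions made for the example.

```python
from statistics import median

def median_scale(arrays):
    """Rescale each array so that all arrays share the same median intensity."""
    medians = [median(a) for a in arrays]
    target = sum(medians) / len(medians)  # common target: mean of the medians
    return [[v * target / m for v in a] for a, m in zip(arrays, medians)]

raw = [
    [100.0, 200.0, 400.0],  # array 1
    [150.0, 300.0, 600.0],  # array 2: same pattern, scanned 1.5x brighter
]
normalized = median_scale(raw)
print(normalized)  # both arrays now read [125.0, 250.0, 500.0]
```

Scaling of this kind only corrects a constant multiplicative offset between arrays; intensity-dependent effects need a more flexible method.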

17
16 Replicate HG-U133A GeneChips, Before
normalization
18
16 Replicate HG-U133A GeneChips, After
normalization
19
(No Transcript)
20
Taxonomy of Microarray Data Analysis Methods
  • Unsupervised Learning: The statistical analysis
    seeks to find structure in the data without
    knowledge of class labels.
  • Supervised Learning: Class or group labels are
    known a priori, and the goal of the statistical
    analysis is to identify differentially
    expressed genes (AKA feature selection) or
    to identify combinations of genes that are
    predictive of class or group membership.

21
Unsupervised Learning
  • Unsupervised learning or clustering involves the
    aggregation of samples into groups based on
    similarity of their respective expression
    patterns without knowledge of class labels.
  • Examples of Unsupervised Learning methods include:
  • Hierarchical clustering
  • k-means
  • k-medoids
  • Self-Organizing Maps
  • Principal Components
  • Multidimensional Scaling
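To make one of the listed methods concrete, here is a minimal k-means sketch (Lloyd's algorithm) that groups samples by the similarity of their expression profiles without using class labels. The starting centroids are supplied by the caller to keep the example deterministic; the data and the function name `kmeans` are invented for illustration.

```python
import math

def kmeans(points, start_centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    centroids = [list(c) for c in start_centroids]  # do not mutate the input
    labels = []
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Four samples measured on two genes (made-up values).
samples = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]]
labels, centroids = kmeans(samples, start_centroids=[[0.0, 0.0], [6.0, 6.0]])
print(labels)  # [0, 0, 1, 1]
```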

22
Supervised Learning
  • Example methods for Class Comparison / Feature
    Selection include:
  • t-test / Wilcoxon rank-sum test
  • F-test / Kruskal-Wallis test
  • etc.
  • Example methods for Class Prediction include:
  • Weighted voting
  • K-nearest neighbors
  • Compound Covariate Predictors
  • Classification trees
  • Support vector machines
  • etc.
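As an illustration of class comparison, the sketch below computes a two-sample t statistic for each gene and ranks genes by its absolute value. It uses the Welch (unequal-variance) form, which is one of several t-test variants; the gene names and expression values are invented.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic for one gene."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Invented log-scale expression: gene -> (tumor samples, normal samples).
genes = {
    "gene_A": ([8.1, 8.4, 7.9, 8.2], [5.0, 5.3, 4.8, 5.1]),
    "gene_B": ([6.0, 6.2, 5.9, 6.1], [6.1, 5.9, 6.0, 6.2]),
}

scores = {name: welch_t(tumor, normal) for name, (tumor, normal) in genes.items()}
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
print(ranked)  # ['gene_A', 'gene_B']
```

With thousands of genes tested at once, the resulting p-values would also need a multiple-testing adjustment, which this sketch leaves out.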

23
Supervised Learning: Class Prediction
  • Risk of over-fitting the data: may have a perfect
    discriminator for the data set at hand, but the
    same model may perform poorly on independent data
    sets.
  • Most prediction methods are intended for large
    n (samples), small p (covariates) datasets, whereas
    microarray datasets typically have small n and
    large p.
  • Process is to:
  • Fit model
  • Check model adequacy
  • Make an inference
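The over-fitting risk can be seen with pure noise. In the sketch below, 20 samples with 500 random "genes" receive labels that have nothing to do with the data, yet a 1-nearest-neighbor rule classifies the training samples perfectly. On fresh noise the same rule can only guess. The classifier choice, the sizes, and all names are assumptions made for this illustration.

```python
import math
import random

def predict_1nn(x, train_x, train_y):
    """Label of the closest training sample (1-nearest-neighbor)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    return train_y[best]

def error_rate(xs, ys, train_x, train_y):
    """Fraction of samples in (xs, ys) that the 1-NN rule gets wrong."""
    wrong = sum(predict_1nn(x, train_x, train_y) != y for x, y in zip(xs, ys))
    return wrong / len(xs)

def random_profiles(rng, n, p):
    """n samples of p standard-normal values: no real signal at all."""
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

rng = random.Random(0)
n, p = 20, 500                       # few samples, many covariates
train_x = random_profiles(rng, n, p)
test_x = random_profiles(rng, n, p)
train_y = [i % 2 for i in range(n)]  # labels unrelated to the data
test_y = [i % 2 for i in range(n)]

train_err = error_rate(train_x, train_y, train_x, train_y)
test_err = error_rate(test_x, test_y, train_x, train_y)
print(train_err)  # 0.0: a "perfect" discriminator on the data at hand
print(test_err)   # error on independent data, expected to hover near 0.5
```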

24
Class Prediction: Checking Model Adequacy
  • Regardless of the algorithm used, it is essential
    to calculate an unbiased estimate of the true
    error rate once the prediction rule has been
    defined.

25
Class Prediction: Checking Model Adequacy
  • In a data-rich situation:
  • Randomly divide the dataset into two parts,
    representing a training and a test dataset.
  • Build the prediction algorithm using the training
    dataset.
  • Once a final model has been developed, the
    prediction rule is applied to the test dataset to
    estimate the misclassification error.
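The three steps above can be sketched as follows. The 70/30 split, the one-gene threshold rule, and the function names are hypothetical choices for the example; the point is only that the rule is built from the training part and scored on the untouched test part.

```python
import random

def train_test_split(samples, labels, test_fraction=0.3, seed=0):
    """Randomly partition paired samples and labels into training and test sets."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in test_idx], [labels[i] for i in test_idx])

def misclassification_error(predict, xs, ys):
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(predict(x) != y for x, y in zip(xs, ys)) / len(xs)

# Invented data: one expression value per sample, class 1 when the value is high.
samples = [float(i) for i in range(10)]
labels = [int(v >= 5) for v in samples]
train_x, train_y, test_x, test_y = train_test_split(samples, labels)

# Build the rule on the training data only: halfway between the class means.
mean0 = sum(x for x, y in zip(train_x, train_y) if y == 0) / train_y.count(0)
mean1 = sum(x for x, y in zip(train_x, train_y) if y == 1) / train_y.count(1)
threshold = (mean0 + mean1) / 2

# Apply the finished rule to the held-out test data.
test_error = misclassification_error(lambda x: int(x > threshold), test_x, test_y)
print(len(train_x), len(test_x), test_error)
```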

26
Class Prediction: Checking Model Adequacy
  • For small sample sizes, withholding a large
    portion of the data for validation purposes may
    limit the ability to develop a prediction
    rule. Therefore, use cross-validation techniques
    to assess the error.

27
Class Prediction: Checking Model Adequacy
  • K-fold cross-validation requires one to randomly
    split the dataset into K equally sized groups.
  • Thereafter, the model is fit to K-1 parts of the
    data and the generalization error is calculated
    using the Kth remaining part of the data.
  • This procedure is repeated so that the
    generalization error is estimated for each of the
    K parts of the data, providing an overall
    estimate of the generalization error and its
    associated standard error.
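A minimal sketch of the K-fold procedure described above follows. The nearest-centroid classifier is only a stand-in for whatever prediction method is being assessed, the standard error is taken as the standard deviation of the fold errors divided by the square root of K (one common convention), and all names and data are invented.

```python
import math
import random
from statistics import mean, stdev

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, k=5, seed=0):
    """Return (mean error, standard error) over the k held-out folds."""
    errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the K-1 remaining parts, then score on the held-out part.
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        wrong = sum(predict(xs[i]) != ys[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return mean(errors), stdev(errors) / math.sqrt(k)

def fit_nearest_centroid(train_x, train_y):
    """Stand-in classifier: assign a sample to the class with the closest mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))

# Invented, well-separated data: 10 samples per class, two genes each.
xs = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 5.0] for i in range(10)]
ys = [0] * 10 + [1] * 10

cv_error, cv_se = cross_validate(xs, ys, fit_nearest_centroid, k=5)
print(cv_error, cv_se)  # 0.0 0.0 for these cleanly separated classes
```

Any feature selection step belongs inside `fit`, so that it is repeated on each set of K-1 parts; selecting genes on the full dataset first would bias the error estimate downward.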

28
Class Prediction: Checking Model Adequacy
Groups: 1 2 3 4 5 6 7 8 9 10
  • Leave out data in group 3
  • Fit the model to the data in groups 1-2 and 4-10
    (learning dataset)
  • Calculate the error using observations in group
    3 as the test dataset
  • Do this for each of the 10 partitions