Analysis of Affymetrix Microarray Data - PowerPoint PPT Presentation

Slides: 48
Provided by: nasc5
1
Analysis of Affymetrix Microarray Data
John Okyere (NASC)
2
Overview of Presentation
  • Experiment Design
  • Data Normalization and Expression Value
    Calculation
  • Statistical Analysis
  • Data Interpretation

3
Experiment Design
4
(No Transcript)
5
Affymetrix Terminology
Probe: A 25-mer oligo complementary to a
sequence of interest, attached to a glass
surface on the probe array
Perfect Match (PM): Probes that are
complementary to the sequence of interest
Mismatch (MM): Probes that are complementary to
the sequence of interest except for a
homomeric base change (A-T or G-C) at the
13th position
Probe Pair (PP): A combination of a PM and an MM;
11-16 probe pairs per probe set
Probe Cell: A single feature; size can be 18x18
or 20x20 µm
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
Experimental Design Flow
Simplified Data Analysis
Pilot Study
Full Scale Experiment
Publication
Bioinformatics
Data Validation
Complete Analysis
10
Advantages of a Pilot Study
  • Estimate experimental variability
  • Refine laboratory methods/techniques
  • Refine experimental design
  • Allows for rapid screening
  • Provides preliminary data for project funding

11
Three Sources of Variability
  • Biological: Differences between samples
  • - The ultimate goal of the research
  • Technical: Sample preparation
  • - Protocols and operator
  • System: Probe Array analysis
  • - Arrays, instruments, reagents

12
Controlling Biological Variability
  • Biological variability contributes more to
    experimental variability than technical
    variability
  • To mitigate biological variability:
  • - Consider all potential variables as part
    of the experiment design
  • - Increase the number of biological
    replicates until the Coefficient of
    Variation (CV) stabilizes

13
Examples of Biological Variability
  • Cell Cycle Patterns: What time of day were the
    samples isolated?
  • Circadian Rhythm: What is the time interval
    between time course samples?
  • Nutrient: Media types will affect expression
    levels
  • Tissue: Each cell type has a different expression
    pattern
  • Temperature: Growth room temperature may vary
    within a 24h period
  • Disease: Defense genes will alter the global gene
    expression pattern
  • Germination time: Different seed batches will
    alter the gene expression pattern

14
Practical Questions to Consider
  • How much variability does your system have?
  • - Understand and minimize variation
  • What level of significance is needed?
  • - More replicates needed for subtle changes
  • How many treatments? How many controls?
  • - Comparative analysis (one experimental
    condition) or serial analysis design
    (multiple experimental conditions)?

15
Percentage CV as Estimate of Variability
  • CV is a measure of variance amongst replicates
    of a single condition
  • Defined as the standard deviation divided by the
    mean multiplied by 100
  • Example: 5 signal values representing 5
    replicates
  • - 230.4, 241.7, 252.9, 338.8, 178.9
  • - Mean = 248.54 ± 57.9 (SD), CV ≈ 23.3%
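
The percentage CV for the five replicate values above can be checked with a few lines of Python. This is a minimal sketch that assumes the sample standard deviation (n - 1 denominator), which reproduces the slide's standard deviation of about 57.9; the variable names are illustrative.

```python
import statistics

# Replicate signal values taken from the slide.
signals = [230.4, 241.7, 252.9, 338.8, 178.9]

mean = statistics.mean(signals)
sd = statistics.stdev(signals)      # sample standard deviation (n - 1)
cv_percent = sd / mean * 100        # CV = SD / mean * 100

print(f"mean={mean:.2f}  sd={sd:.2f}  CV={cv_percent:.2f}%")
```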

16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
Experimental Replicates
  • Technical replicates: from the same sample;
    reproduce the contribution from bench effects
    to the overall variability
  • Biological replicates: true replicates that
    reproduce the biological conditions explored
    in the experimental design
  • - Permit the use of formal statistical tests
  • - Also allow the interrogation of technical
    variability

20
RNA Sample Pooling
  • Can increase sample quantity
  • A common variance mitigation strategy
  • Can result in irreversible loss of information
    by introducing a bias
  • If pooling is necessary, pool a minimum of three
    and a maximum of five RNAs
  • Equal pooling of RNA samples is essential

21
Data Normalization
22
Why Normalize?
  • To correct for systematic measurement error and
    bias in data
  • - Differences in probe labeling
  • - Target concentration
  • - Hybridization efficiency
  • - Scanner noise
  • Allows for data comparison

23
Data Normalization Methods
  • Scaling Factor (linear) normalization
  • - Global or selected gene set
  • - Works well when data quality metrics are
    consistent
  • - Simplifies database construction
  • - Weaknesses: assumes error is uniform across all
    genes; assumes total mRNA is the same for all
    cells
  • Non-linear
  • - Can provide higher precision, especially at
    the extremes
  • - Requires selected gene (invariant) set
  • - May give false confidence in poor data

24
Normalization Curves
Not normalized
Normalized
25
Scaling Data to a Target Intensity
[Figure: average intensities of Exp. 1 to Exp. 7 plotted against a Target Intensity (TGT) of 100]
TGT = Average intensity x Scaling Factor
  • If the scaling factor is < 3-fold, comparisons can be
    made between all experiments
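
The relation TGT = average intensity x scaling factor can be sketched as follows. This is an illustration only: it assumes a plain arithmetic mean as the "average intensity" (analysis software may use a trimmed mean instead), and the function names and example values are hypothetical, not taken from the presentation.

```python
def scaling_factor(intensities, target=100.0):
    """Factor that brings the array's average intensity to the target."""
    average = sum(intensities) / len(intensities)
    return target / average


def scale_array(intensities, target=100.0):
    """Multiply every signal on the array by its scaling factor."""
    sf = scaling_factor(intensities, target)
    return [value * sf for value in intensities]


# Hypothetical raw signals for two arrays.
arrays = {"Exp. 1": [50.0, 150.0, 250.0], "Exp. 2": [20.0, 60.0, 100.0]}
for name, values in arrays.items():
    print(name, round(scaling_factor(values), 3), scale_array(values))
```

After scaling, every array has the same average intensity, which is what makes signals comparable across experiments.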

26
Expression Value Calculation (Signal)
  • The signal represents the amount of transcript
    in solution
  • Signal is calculated as follows
  • - Cell intensities are preprocessed for
    global background
  • - An ideal mismatch value is calculated and
    subtracted to adjust PM intensity
  • - The adjusted PM intensities are log
    transformed to stabilize the variance
  • - Tukey's biweight estimator is used to
    provide a robust mean of the signal
  • - Signal is output as the antilog of the mean
    signal value
  • - Finally, the signal is scaled to generate
    normalized data

27
Expression Value Calculation (Signal)
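Method
Specific Background (SB), used to derive the ideal mismatch (IM):
$SB = T_{bi}\left(\log_2(PM_j) - \log_2(MM_j)\right)$, over the probe pairs $j = 1, \dots, n$
Probe Value (PV) and Signal Log Value (SLV):
$V_j = \max(PM_j - IM_j, \delta)$, where $\delta$ is a small positive constant
$PV_j = \log_2(V_j)$
$SLV = T_{bi}(PV_1, \dots, PV_n)$
Here $T_{bi}$ denotes Tukey's biweight estimator.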
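
The signal steps above (specific background, ideal mismatch, probe value, signal log value) can be sketched for a single probe set as below. This is a simplified illustration, not the vendor implementation: global background correction and final scaling are omitted, the rule used to derive the ideal mismatch when MM >= PM is an assumption, and the constants `c`, `eps`, `tau`, `scale_tau` and `delta` are assumed illustrative values.

```python
import math
import statistics


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a mean that down-weights outliers."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    weights = []
    for v in values:
        u = (v - med) / (c * mad + eps)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def ideal_mismatch(pm, mm, sb, tau=0.03, scale_tau=10.0):
    """Assumed rule: use MM when it is below PM, otherwise shrink PM."""
    if mm < pm:
        return mm
    if sb > tau:
        return pm / 2 ** sb
    return pm / 2 ** (tau / (1 + (tau - sb) / scale_tau))


def signal(pm_values, mm_values, delta=2 ** -20):
    """Antilog of the biweight mean of the log2 adjusted PM intensities."""
    pairs = list(zip(pm_values, mm_values))
    sb = tukey_biweight([math.log2(pm) - math.log2(mm) for pm, mm in pairs])
    probe_values = []
    for pm, mm in pairs:
        v = max(pm - ideal_mismatch(pm, mm, sb), delta)  # keep V positive
        probe_values.append(math.log2(v))
    return 2 ** tukey_biweight(probe_values)


# Hypothetical PM/MM intensities for one probe set.
print(signal([200.0, 400.0, 310.0], [100.0, 200.0, 350.0]))
```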
28
Statistical Analysis
29
Statistical Software
  • Affymetrix data files are accessible to many
    statistical packages
  • These packages include DMT, Spotfire,
    GeneSpring, STATA, Gene Data, Gene Maths,
    dChip, RMA, S, R, OmniViz, etc.
  • For information regarding these products please
    contact the manufacturers

30
Microarray Data Distribution
  • Are the data approximately normally distributed,
    with each group having equal variance?
  • Yes: Parametric Analysis
  • - Assumes equal variance in the data in
    order to determine significance between
    data sets
  • No: Non-Parametric Analysis
  • - Uses ranks of the numerical data in order
    to determine significance between data sets

31
Normally Distributed Data
  • Single symmetrical peak at the mean
  • Tails continue along the horizontal axis to
    infinity
  • 68% of the data lie within one standard
    deviation of the mean
  • 95% of the data lie within two standard
    deviations of the mean
  • The mean and median are approximately
    equal
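
The 68% and 95% figures can be checked against the standard normal distribution using Python's standard library. A minimal sketch; the helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```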

32
Types of Statistical Analysis
  • Two Sample Comparison
  • - Parametric: Student's t-test
  • - Non-Parametric: Mann-Whitney
  • Multivariate Analysis
  • - Parametric: Analysis of Variance (ANOVA)
  • - Non-Parametric: Kruskal-Wallis

33
Student's T-test
  • Compares the means and standard deviations of
    two populations
  • Populations must be normally distributed
  • Computes a p-value to test null hypothesis

34
Student's T-test
  • Unpaired T-test
  • - Compares expression patterns of genes in two
    groups of samples
  • - More common analysis in experiments using
    expression data
  • - The two groups can be different sizes
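
As a rough illustration of the unpaired test, the sketch below computes the pooled-variance t statistic and its degrees of freedom for one gene measured in two groups of different sizes. It assumes equal variances in the two groups; turning the statistic into a p-value requires the t distribution, which a statistics library would normally supply. The function name and data are hypothetical.

```python
import math
import statistics


def unpaired_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t, df


# Hypothetical log-scale signals for one gene: 3 controls, 4 treated.
t_stat, dof = unpaired_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0])
print(t_stat, dof)
```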

35
Mann-Whitney Rank Test
  • Non-parametric, non-paired, two sample rank test
  • Ignores distribution of the data
  • Sorts the data values and assigns ranks to them
  • Compares the sum of ranks of two data sets
  • Computes a p-value to test the null hypothesis
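
The rank-sum procedure described above can be sketched as follows. The p-value here uses a large-sample normal approximation without tie or continuity correction, which is an assumption of this sketch and is crude for very small groups; the function names and data are illustrative.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    first, last = {}, {}
    for position, value in enumerate(sorted(values)):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 + 1 for v in values]


def mann_whitney_u(group_a, group_b):
    """U statistic and two-sided p-value (normal approximation)."""
    na, nb = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:na])
    u_a = rank_sum_a - na * (na + 1) / 2
    u = min(u_a, na * nb - u_a)
    mean_u = na * nb / 2
    sd_u = math.sqrt(na * nb * (na + nb + 1) / 12)
    z = (u - mean_u) / sd_u          # z <= 0 because u is the smaller U
    return u, min(2 * NormalDist().cdf(z), 1.0)


# Hypothetical signals for one gene in two groups.
print(mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```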

36
Analysis of Variance (ANOVA)
  • Parametric, multiple comparison test
  • Population must be normally distributed
  • Compares means and variance among groups
  • Computes a p-value
  • Determines whether the means and variances of the
    populations are the same
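
A one-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes only the F statistic and its degrees of freedom; the p-value comes from the F distribution, which is left to a statistics library. The function name and data are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic with (between, within) degrees of freedom."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical signals for one gene under three conditions.
print(one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]))
```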

37
Kruskal-Wallis
  • Non-parametric, multiple comparison test
  • Ignores distribution of data
  • Sorts the data values and assigns ranks to them
  • Compares the sum of ranks of more than two data
    sets
  • Computes a p-value
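
The rank-based comparison of more than two groups can be sketched by computing the H statistic from the rank sums. This minimal version applies no tie correction (an assumption of the sketch); H is then compared with a chi-square distribution with k - 1 degrees of freedom to obtain the p-value. Names and data are illustrative.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    first, last = {}, {}
    for position, value in enumerate(sorted(values)):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 + 1 for v in values]


def kruskal_wallis_h(groups):
    """H statistic (no tie correction) and degrees of freedom."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)

    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)

    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, len(groups) - 1


# Hypothetical signals for one gene under three conditions.
print(kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]))
```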

38
Multiple Comparison Corrections
39
Bonferroni Correction
  • Conservative error correction method
  • Works well when n < 8
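
The Bonferroni correction divides the significance threshold by the number of comparisons n (equivalently, multiplies each p-value by n). A minimal sketch with hypothetical p-values; the function name and the default `alpha` are illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Return adjusted p-values and a significance flag per test."""
    n = len(p_values)
    adjusted = [min(p * n, 1.0) for p in p_values]
    significant = [p <= alpha / n for p in p_values]
    return adjusted, significant


# Hypothetical raw p-values from three comparisons.
adjusted, significant = bonferroni([0.01, 0.02, 0.5])
print(adjusted, significant)
```

Because the threshold shrinks in proportion to n, the method becomes very strict when thousands of genes are tested at once, which is why it is described as conservative.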

40
Statistical Analysis Flow Diagram
41
Data Interpretation
42
Clustering Gene Expression Data
  • Summarize genes by co- or anti- correlation of
    expression profiles
  • Employ guilt-by-association functional
    prediction
  • Search for regulatory elements in promoters of
    co-expressed genes
  • Help identify interesting genes
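
Co- and anti-correlation of expression profiles is commonly measured with the Pearson correlation coefficient; using it here is an assumption, since the slide does not name a specific measure. The sketch below uses hypothetical profiles across four conditions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical expression profiles across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.1, 5.9, 8.0]   # rises with gene_a (co-expressed)
gene_c = [4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises (anti-correlated)

print(pearson(gene_a, gene_b), pearson(gene_a, gene_c))
```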

43
Clustering Algorithms
  • Hierarchical
  • K-means
  • Self-Organizing Maps

44
Hierarchical Clustering
Agglomerative
Divisive
45
K-Means Clustering
  • Non hierarchical
  • User defines the number of clusters (K)
  • Data are partitioned into K clusters
  • Cluster relation is undetermined
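
The partitioning step can be sketched with a small K-means loop. This is an illustration under simplifying assumptions: squared Euclidean distance, a fixed number of iterations, and the first K points as starting centroids (real tools usually start from random centroids and stop on convergence). Names and data are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Return a cluster index per point and the final centroids."""
    centroids = [list(p) for p in points[:k]]   # assumed deterministic start
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k), key=lambda c: squared_distance(p, centroids[c]))
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return assignment, centroids


# Hypothetical two-condition expression profiles for four genes.
profiles = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
print(kmeans(profiles, k=2))
```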

46
Self-Organizing Maps (SOM)
  • Similar to K-means but constrained to a two
    dimensional grid
  • User chooses topology of Map and hence number of
    clusters
  • Objects are iteratively pulled towards clusters
  • An object can belong to only one cluster, as in
    K-means
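
A self-organizing map can be sketched as a grid of nodes that are pulled towards the data, with grid neighbours of the winning node pulled along as well. The version below is a toy illustration: the linear decay of learning rate and radius, the Gaussian neighbourhood, and the deterministic initialization from evenly spaced data points are all assumptions of this sketch, and the names are hypothetical.

```python
import math


def best_matching_unit(nodes, x):
    """Grid position of the node whose weights are closest to x."""
    return min(nodes, key=lambda pos: sum(
        (a - b) ** 2 for a, b in zip(x, nodes[pos])))


def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=1.0):
    """Train a rows x cols map; returns {grid position: weight vector}."""
    n_nodes = rows * cols
    nodes = {}
    for r in range(rows):
        for c in range(cols):
            index = (r * cols + c) * len(data) // n_nodes
            nodes[(r, c)] = list(data[index])      # assumed initialization
    for epoch in range(epochs):
        decay = 1 - epoch / epochs
        lr = lr0 * decay
        radius = radius0 * decay + 1e-9
        for x in data:
            bmu = best_matching_unit(nodes, x)
            for pos, weights in nodes.items():
                grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2 * radius ** 2))
                for d in range(len(weights)):
                    weights[d] += lr * influence * (x[d] - weights[d])
    return nodes


# Hypothetical expression profiles forming two groups, mapped to a 1 x 2 grid.
profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
som = train_som(profiles, rows=1, cols=2)
print([best_matching_unit(som, p) for p in profiles])
```

The grid shape chosen by the user fixes the number of clusters, and each object is assigned to the single node that matches it best.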

47
Summary of Data Analysis