Variability



1
Variability: Statistical Analysis of Microarray Data
GCAT, Georgetown, July 2004
  • Jo Hardin
  • Pomona College
  • jo.hardin_at_pomona.edu

2
Variability
  • key to statistics
  • within slide vs. between slide
  • replication
  • red dye (Cy5) > green dye (Cy3): dye swap
  • log (base 2) transformation
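A small sketch of the log2 ratio and of how a dye-swapped pair of slides can be combined. It assumes the dye bias is a constant shift on the log2 scale, which is a simplification; the function names are illustrative only.

```python
import math

def log2_ratio(red, green):
    """Log2 ratio of red (Cy5) to green (Cy3) intensity.

    0 means equal intensity; +1 means twice as bright in red,
    -1 means twice as bright in green.
    """
    return math.log2(red / green)

def dye_swap_average(m_forward, m_swapped):
    """Combine the log2 ratios from a slide and its dye-swapped replicate.

    Assumes the forward slide measures (true difference + dye bias) and the
    swapped slide measures (-true difference + the same dye bias), so
    subtracting and halving cancels the bias.
    """
    return (m_forward - m_swapped) / 2

# Hypothetical spot: 400 units red vs. 100 units green
print(log2_ratio(400, 100))  # 2.0 (fourfold higher in red)
print(log2_ratio(100, 400))  # -2.0 (same size change, opposite direction)
```

On the log2 scale a fourfold increase and a fourfold decrease are equally far from 0, which is one reason for the transformation.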

3
Example
  • "Variation in Gene Expression Patterns in
    Follicular Lymphoma and the Response to
    Rituximab," by Bohen, Troyanskaya, Alter, Warnke,
    Botstein, Brown, and Levy
  • 2 groups: those who responded to treatment and
    those who did not
  • Cy5 dye used on malignant lymphoid tissues; Cy3
    dye used on mRNA derived from cell lines
  • Biopsies obtained before treatment with Rituximab
  • Are there differences in gene expression between
    those who responded to treatment and those who
    didn't?

4
Data Cleaning
  • Individual points were median centered for each
    cDNA clone and filtered for data quality.
  • Data values are either
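Median centering each cDNA clone, as in the first bullet above, can be sketched as follows. This assumes the values are already log2 ratios with one row per clone and missing spots stored as None; `median_center_rows` is a hypothetical helper, and the quality filtering step is not shown.

```python
import statistics

def median_center_rows(rows):
    """Subtract each row's (clone's) median from every value in that row.

    Missing values (None) are skipped when computing the median and stay None.
    A row with no observed values raises statistics.StatisticsError.
    """
    centered = []
    for row in rows:
        observed = [x for x in row if x is not None]
        med = statistics.median(observed)
        centered.append([None if x is None else x - med for x in row])
    return centered

print(median_center_rows([[1.0, 2.0, 4.0], [None, 3.0, 5.0]]))
# [[-1.0, 0.0, 2.0], [None, -1.0, 1.0]]
```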

5
The Data
6
Differential Expression Across Two Groups
  • Fold Change
  • t-test
  • Wilcoxon Rank-Sum Test
  • SAM

7
Fold Change
  • Of mean? Of median?
  • Across treatment groups? vs. reference group?
  • Small vs. large values
  • What about how variable the groups are?
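The questions above can be made concrete with a short sketch on hypothetical log2 data. It computes fold change as a difference of group means (or medians) and shows two genes with the same fold change but very different spread, which fold change alone cannot see.

```python
import statistics

def log2_fold_change(group1, group2, center=statistics.mean):
    """Difference in group centers for log2 data.

    Uses the mean by default; pass center=statistics.median for medians.
    A difference of 1 on the log2 scale is a twofold change.
    """
    return center(group1) - center(group2)

# Hypothetical log2 values for two genes with the same fold change
tight = ([1.0, 1.1, 0.9], [0.0, 0.1, -0.1])
noisy = ([3.0, -1.0, 1.0], [2.0, -2.0, 0.0])

for name, (g1, g2) in [("tight", tight), ("noisy", noisy)]:
    fc = log2_fold_change(g1, g2)
    sd1, sd2 = statistics.stdev(g1), statistics.stdev(g2)
    print(f"{name}: log2 FC = {fc:.2f} ({2 ** fc:.1f}-fold), "
          f"SDs = {sd1:.2f} and {sd2:.2f}")
# tight: log2 FC = 1.00 (2.0-fold), SDs = 0.10 and 0.10
# noisy: log2 FC = 1.00 (2.0-fold), SDs = 2.00 and 2.00
```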

8
An Example using one gene
9
t-test
  • Test statistic
  • p-value: the probability of seeing data as extreme
    as yours, or more extreme, if there is no difference
    between the groups
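The p-value definition above can be simulated directly with a permutation test. This is not how the t-test computes its p-value (that uses the t distribution), but it applies the same definition: if the groups do not differ, the group labels are interchangeable, so we shuffle them and count how often a difference at least as large as the observed one appears. The function name, the 10,000 shuffles, and the fixed seed are illustrative choices.

```python
import random
import statistics

def permutation_p_value(group1, group2, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group1) - statistics.fmean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Reassign the labels at random, keeping the group sizes fixed
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Counting the observed labelling itself (+1) keeps the estimate above 0
    return (at_least_as_extreme + 1) / (n_perm + 1)

# Hypothetical, clearly separated groups
print(permutation_p_value([5.1, 5.3, 5.2, 5.4], [1.0, 1.2, 0.9, 1.1]))
```

With 4 values per group there are only 70 ways to split the data, and 2 of them are as extreme as the observed split, so the estimate should land near 2/70 ≈ 0.03.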

10
t-test in Excel
  • Syntax: TTEST(array1, array2, tails, type)
  • Example
  • first group is in cells C3:K3
  • second group is in cells L3:V3
  • we want a two-sided t-test (no preconceived idea
    about which group is more highly expressed), so tails = 2
  • we assume the variances are unequal, so type = 3
  • in cell W3 type =TTEST(C3:K3,L3:V3,2,3)
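For readers working outside Excel, here is a sketch of the same unequal-variance (Welch) t-test that TTEST(..., 2, 3) performs. The function names are hypothetical, and the t-distribution tail is computed by numerical integration (Simpson's rule) so the block needs only the standard library; results should agree closely with Excel but are not guaranteed to match it digit for digit.

```python
import math
import statistics

def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t with df degrees of freedom (df may be fractional).

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be
    even); by symmetry the two tails hold the remaining probability.
    """
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)

def welch_ttest(group1, group2, tails=2):
    """Welch t-test, analogous to TTEST(array1, array2, tails, 3).

    Returns (t statistic, Welch-Satterthwaite degrees of freedom, p-value).
    With tails=1 the p-value is half the two-sided value.
    """
    n1, n2 = len(group1), len(group2)
    q1 = statistics.variance(group1) / n1  # squared standard error of mean 1
    q2 = statistics.variance(group2) / n2  # squared standard error of mean 2
    t = (statistics.fmean(group1) - statistics.fmean(group2)) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    p_two = t_two_sided_p(t, df)
    return t, df, (p_two if tails == 2 else p_two / 2)

print(welch_ttest([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

If SciPy is installed, scipy.stats.ttest_ind(group1, group2, equal_var=False) runs the same Welch test.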

11
Wilcoxon Rank Sum Test
  • Instead of comparing averages, this test compares
    rankings (or medians)
  • To reduce the effect of influential points, we
    replace the data values with their ranks.
  • We compute a z-test (sister of the t-test) on the
    ranked data.
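Replacing values with ranks can be sketched as below; the helper name is hypothetical. Tied values share the average of the ranks they occupy (Excel's RANK instead gives tied values the same rank without averaging). The usage shows why ranks discount influential points: an outlier of 10 or of 1000 gets the same rank.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, as a 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

print(average_ranks([10.0, 0.3, 0.5, 0.5]))    # [4.0, 1.0, 2.5, 2.5]
print(average_ranks([1000.0, 0.3, 0.5, 0.5]))  # [4.0, 1.0, 2.5, 2.5]
```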

12
[Figure: up-regulated genes and down-regulated genes]
13
Technical Details
  • Replace values with ranks
  • Sum the ranks in the first group, $W_1$
  • Calculate the hypothesized mean, where $n_1$ and $n_2$
    are the group sizes: $\mu_1 = n_1(n_1 + n_2 + 1)/2$
  • Calculate the hypothesized standard deviation:
    $\sigma_1 = \sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}$
  • Calculate the test statistic: $z = (W_1 - \mu_1)/\sigma_1$
  • Find the p-value using the normal distribution
    (the probability of a test statistic at least as far
    from 0 as $z$, in either direction, if there are no
    differences in the two groups)
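The steps above translate almost line for line into a short function. This sketch assumes there are no tied values (ties would call for average ranks and a corrected standard deviation, which the formulas above do not include); the function name is illustrative.

```python
import math

def rank_sum_z_test(group1, group2):
    """Wilcoxon rank-sum test via the normal approximation, assuming no ties.

    Returns (rank sum of group 1, z statistic, two-sided p-value).
    """
    n1, n2 = len(group1), len(group2)
    combined = list(group1) + list(group2)
    # Replace values with ranks 1..n1+n2 (smallest value gets rank 1)
    order = sorted(range(n1 + n2), key=lambda i: combined[i])
    ranks = [0] * (n1 + n2)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    # Sum the ranks in the first group
    rank_sum = sum(ranks[:n1])
    # Hypothesized mean and standard deviation of that sum
    mean1 = n1 * (n1 + n2 + 1) / 2
    sd1 = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    # Standardized test statistic
    z = (rank_sum - mean1) / sd1
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return rank_sum, z, p

# Hypothetical data: the three largest values all belong to group 1
print(rank_sum_z_test([5.1, 5.3, 5.2], [1.0, 1.2, 0.9, 1.1]))
```

Excel's RANK ranks the largest value first by default, which flips the sign of z but leaves the two-sided p-value unchanged.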

14
Wilcoxon Rank Sum in Excel
  • Using the RANK function, translate your data into
    ranks
  • Y3: =RANK(C3,C3:V3) finds the rank of C3
    in the range C3:V3
  • (you'll probably get a value here; that's OK
    because C3 is empty for gene IMAGE253507)
  • Repeat this command for Z3 to AR3, keeping the
    second argument of the function always C3:V3
  • Copy the row Y3:AR3 and paste it into Y4:AR2366
  • AS3: =SUMIF(Y3:AG3,">0",Y3:AG3) (sum of ranks,
    group 1)
  • AT3: =COUNT(Y3:AG3)*(COUNT(Y3:AR3)+1)/2 (mean1)
  • AU3: =SQRT(COUNT(Y3:AG3)*COUNT(AH3:AR3)*
    (COUNT(Y3:AR3)+1)/12) (stdev1)
  • AV3: =(AS3-AT3)/AU3 (zscore1, the test statistic)
  • AW3: =2*(1-NORMDIST(ABS(AV3),0,1,TRUE))
    (p-value)

15
SAM (Significance Analysis of Microarrays)
  • is a statistical technique for finding
    significant genes in a set of microarray
    experiments
  • can be used in a comparison experiment
  • can also be used with a quantitative response
    (like tumor size) or with one class data

16
Technical Details
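  • For the ith gene, comparing two groups, the test
    statistic is $d_i = (\bar{x}_{1i} - \bar{x}_{2i})/(s_i + s_0)$,
    where $\bar{x}_{1i}$ and $\bar{x}_{2i}$ are the group means, $s_i$ is
    a pooled standard deviation for gene i, and $s_0$ is a small
    positive constant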
  • Rank the $d_i$ and keep them as test statistics
  • Permute the group labels 100 times, and calculate
    expected values for the ranked $d_i$ given no group structure
  • Plot observed $d_i$ vs. expected $d_i$
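The steps above can be sketched for a small data set as follows. This is a simplified illustration, not the SAM software: $s_0$ is fixed at a hypothetical 0.1 (SAM chooses it from the data), group labels are coded 1 and 2, and the 100 permutations are random shuffles. The expected value for each rank is the average of that rank's $d$ across permutations.

```python
import math
import random
import statistics

def sam_d(row, labels, s0=0.1):
    """Two-class SAM-style statistic for one gene: mean difference over (pooled SD + s0)."""
    g1 = [x for x, lab in zip(row, labels) if lab == 1]
    g2 = [x for x, lab in zip(row, labels) if lab == 2]
    n1, n2 = len(g1), len(g2)
    m1, m2 = statistics.fmean(g1), statistics.fmean(g2)
    ss = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (m1 - m2) / (s + s0)

def observed_and_expected(data, labels, n_perm=100, s0=0.1, seed=0):
    """Sorted observed d_i, and for each rank the average sorted d over label permutations."""
    rng = random.Random(seed)
    observed = sorted(sam_d(row, labels, s0) for row in data)
    totals = [0.0] * len(data)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # break any link between labels and expression
        d_perm = sorted(sam_d(row, perm, s0) for row in data)
        for k, d in enumerate(d_perm):
            totals[k] += d
    expected = [total / n_perm for total in totals]
    return observed, expected

# Hypothetical log2 data: 3 genes (rows), 3 arrays per group
labels = [1, 1, 1, 2, 2, 2]
data = [[2.0, 2.2, 1.8, 0.0, 0.2, -0.2],
        [0.1, -0.1, 0.0, 0.05, -0.05, 0.0],
        [0.0, 0.3, -0.3, 0.1, 0.0, -0.1]]
observed, expected = observed_and_expected(data, labels)
for obs, exp in zip(observed, expected):
    print(f"observed {obs:7.3f}   expected {exp:7.3f}")
```

Points far from the observed = expected line on the SAM plot are the candidate significant genes; here the first gene stands out.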

17
False Discovery Rate
  • We know that the expected $d_i$ were computed with
    no group structure.
  • Any expected $d_i$ values beyond a cutoff therefore
    correspond to false positives.
  • If we see 30 observed $d_i$ above some cutoff and 10
    expected $d_i$ above the same cutoff, we estimate that
    about 10 of the 30 genes are false positives, a false
    discovery rate of about 10/30 ≈ 33% (though we can
    never know which genes are the false positives)
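The counting argument above fits in a few lines. This sketch follows the slide's one-sided "above some cutoff" description; the real SAM procedure handles positive and negative $d_i$ and averages false-positive counts over permutations, so treat this as an illustration of the idea. The input lists are made up to match the 30-versus-10 example.

```python
def estimate_fdr(observed, expected, cutoff):
    """Estimated false discovery rate at a cutoff.

    Observed d values at or above the cutoff are the genes called significant;
    expected (no-structure) d values at or above the same cutoff estimate how
    many of those calls are false positives.
    Returns (estimated FDR, number called, estimated false positives).
    """
    called = sum(1 for d in observed if d >= cutoff)
    false_pos = sum(1 for d in expected if d >= cutoff)
    if called == 0:
        return 0.0, 0, false_pos  # nothing called, so nothing falsely discovered
    return min(1.0, false_pos / called), called, false_pos

# Hypothetical values: 30 observed and 10 expected d above the cutoff
observed = [5.0] * 30 + [0.0] * 70
expected = [5.0] * 10 + [0.0] * 90
print(estimate_fdr(observed, expected, cutoff=4.0))  # FDR about 0.33, 30 called, 10 false
```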

18
Features of SAM
  • Slider: we can change the false discovery rate
  • Fold change: in addition to the false discovery
    rate, we can require the genes to be at some fold
    change threshold (on average)
  • Gene lists: gene lists are given along with
    corresponding significance levels
  • Web link: option for more information about
    particular genes

19
Imputation
  • Most microarray data sets have missing values
  • If the background is bigger than the foreground, the
    background-subtracted signal will be negative!
  • Poor-quality spots are removed prior to analysis.
  • SAM needs a complete data set, which can be computed
    by:
  • Substitution of the row average
  • Substitution using k-nearest neighbors
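Both substitution methods can be sketched in plain Python, assuming rows are genes, columns are arrays, and missing values are stored as None; the helper names are hypothetical. The k-nearest-neighbor version here uses an unweighted average of the neighbors' values and a root-mean-square distance over shared columns; published k-nearest-neighbor imputation methods often weight neighbors by distance instead.

```python
import math
import statistics

def impute_row_average(data):
    """Replace each missing value (None) with the mean of the observed values in its row."""
    filled = []
    for row in data:
        row_mean = statistics.fmean([x for x in row if x is not None])
        filled.append([row_mean if x is None else x for x in row])
    return filled

def impute_knn(data, k=2):
    """Replace each missing value with the average of the k most similar genes in that column.

    Distance is the root-mean-square difference over columns both genes observed.
    Neighbors must have the missing column observed; if none qualify, the
    row average is used instead.
    """
    filled = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if not nearest:
                nearest = [x for x in row if x is not None]
            filled[i][j] = statistics.fmean(nearest)
    return filled

# Hypothetical data: gene 0 is missing its third value
data = [[1.0, 2.0, None], [1.1, 2.1, 3.1], [1.0, 2.0, 3.0], [9.0, 9.0, 9.0]]
print(impute_row_average(data)[0])  # [1.0, 2.0, 1.5]
print(impute_knn(data, k=2)[0])     # third value is about 3.05
```

The row average ignores what similar genes show in the missing column, while the neighbor approach borrows from genes 1 and 2, which track gene 0 closely.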