Variability

Provided by: joha3

1
Variability: Statistical Analysis of Microarray Data
GCAT, Georgetown, July 2004

2
Variability

3
Example

"Variation in Gene Expression Patterns in
Follicular Lymphoma and the Response to
Rituximab," by Bohen, Troyanskaya, Alter, Warnke,
Botstein, Brown, and Levy
2 groups: those who responded to treatment, and
those who did not respond to treatment.
Cy5 dye used on malignant lymphoid tissues; Cy3
dye used on mRNA derived from cell lines.
Biopsies obtained before treatment with Rituximab.
Are there differences in gene expression between
those who responded to treatment and those who
didn't?

4
Data Cleaning

Individual points were median centered for each
cDNA clone and filtered for data quality.
Data values are either

5
The Data
6
Differential Expression Across Two Groups

7
Fold Change

8
An Example using one gene
9
t-test

Test statistic
p-value: probability of seeing your data, or data
more extreme, if there is no difference between the groups
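The slide's formula for the test statistic did not survive in this transcript. As a minimal sketch, assuming the unequal-variance (Welch) form that the Excel slide selects with type 3, the statistic and its approximate degrees of freedom can be computed as below. The function name and the expression values are made up for illustration. Turning t into a p-value needs a t-distribution table or a statistics package, which is not shown here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group1, group2):
    """Return (t, df) for a two-sample t-test that does not assume equal variances."""
    n1, n2 = len(group1), len(group2)
    # Squared standard error of each group mean (sample variance / group size).
    v1 = variance(group1) / n1
    v2 = variance(group2) / n2
    t = (mean(group1) - mean(group2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical log-ratios for one gene (not taken from the lymphoma data set).
responders = [0.8, 1.1, 0.9, 1.4, 1.0]
non_responders = [0.2, 0.5, 0.1, 0.4, 0.3]
t, df = welch_t(responders, non_responders)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large absolute t relative to the t distribution with df degrees of freedom corresponds to a small p-value.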

10
t-test in Excel

Syntax: =TTEST(array1,array2,tails,type)
Example:
first group is in cells C3:K3
second group is in cells L3:V3
we want a two-sided t-test (no preconceived idea
about which group is more highly expressed), so tails = 2
we assume the variances are unequal, so type = 3
in cell W3 type =TTEST(C3:K3,L3:V3,2,3)

11
Wilcoxon Rank Sum Test

Instead of comparing averages, this test compares
rankings (or medians)
In order to discount influential points, we
replace the data values with their appropriate
rankings.
We compute a z-test (sister of the t-test) on the
ranked data.

12
[Figure: up-regulated genes and down-regulated genes]
13
Technical Details

Replace values with ranks
Sum the ranks in the first group
Calculate hypothesized mean1 = n1*(n1 + n2 + 1)/2
Calculate hypothesized standard deviation1 =
sqrt(n1*n2*(n1 + n2 + 1)/12)
Calculate test statistic = (sum ranks - hyp
mean1) / hyp stdev1
Find the p-value using the normal distribution
(probability of being greater than the test
statistic if there are no differences in the two
groups)
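The steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the presenter's code: the function names are hypothetical, tied values share their average rank, values are ranked from smallest to largest, and no tie correction is applied to the standard deviation. The p-value is two-sided, so the direction of ranking does not affect it.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(group1, group2):
    """Return (z, two-sided p) using the normal approximation."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    rank_sum = sum(ranks[:n1])                       # sum the ranks in group 1
    hyp_mean = n1 * (n1 + n2 + 1) / 2                # hypothesized mean1
    hyp_stdev = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # hypothesized stdev1
    z = (rank_sum - hyp_mean) / hyp_stdev
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical values for one gene in two small groups.
z, p = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"z = {z:.3f}, p = {p:.3f}")
```

With groups this small the normal approximation is rough; it is shown only to make the arithmetic concrete.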

14
Wilcoxon Rank Sum in Excel

Using the RANK function, translate your data into
ranks
Y3: =RANK(C3,C3:V3) (this finds the rank of C3
in the range C3:V3)
(you'll probably get an error value here; that's OK
because C3 is empty for gene IMAGE253507)
Repeat this command for Z3 to AR3, keeping the
second half of the function always C3:V3
Copy the row from Y3 to AR3 and paste from Y4 to
AR2366
AS3: =SUMIF(Y3:AG3,">0",Y3:AG3) (sum rank
grp1)
AT3: =COUNT(Y3:AG3)*(COUNT(Y3:AR3)+1)/2 (mean1)
AU3: =SQRT(COUNT(Y3:AG3)*COUNT(AH3:AR3)*
(COUNT(Y3:AR3)+1)/12) (stdev1)
AV3: =(AS3-AT3)/AU3 (zscore1 test stat)
AW3: =2*(1-NORMDIST(ABS(AV3),0,1,TRUE))
(p-value)

15
SAM (Significance Analysis of Microarrays)

is a statistical technique for finding
significant genes in a set of microarray
experiments
can be used in a comparison experiment
can also be used with a quantitative response
(like tumor size) or with one class data

16
Technical Details

For the ith gene, comparing two groups, the test
statistic is d_i
Rank the d_i and keep as test statistics
Permute the data labels 100 times, and calculate
expected values for the d_i given no structure.
Plot observed d_i vs. expected d_i
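The slide's formula for the statistic is missing from this transcript. The sketch below assumes the form usually given for SAM: the difference in group means divided by a pooled standard error plus a small positive constant s0. Here s0 is simply fixed at a hypothetical 0.1 rather than tuned, and the function names and data layout (one list of values per gene) are illustrative only. The expected values are the sorted d_i averaged over label permutations.

```python
import random
from math import sqrt
from statistics import mean


def d_stat(values, labels, s0):
    """SAM-style relative difference for one gene; labels are True for group 1."""
    g1 = [v for v, in_g1 in zip(values, labels) if in_g1]
    g2 = [v for v, in_g1 in zip(values, labels) if not in_g1]
    n1, n2 = len(g1), len(g2)
    m1, m2 = mean(g1), mean(g2)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
    s = sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))  # pooled standard error
    return (m1 - m2) / (s + s0)                       # s0 is an assumed constant


def observed_and_expected(data, labels, s0=0.1, n_perm=100, seed=0):
    """Return the sorted observed d_i and the permutation-averaged sorted d_i."""
    rng = random.Random(seed)
    observed = sorted(d_stat(row, labels, s0) for row in data)
    totals = [0.0] * len(data)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the group labels
        permuted = sorted(d_stat(row, shuffled, s0) for row in data)
        totals = [t + d for t, d in zip(totals, permuted)]
    expected = [t / n_perm for t in totals]
    return observed, expected


# Hypothetical data: 3 genes, 3 samples per group.
labels = [True, True, True, False, False, False]
data = [
    [5.0, 5.1, 4.9, 1.0, 1.1, 0.9],
    [1.0, 2.0, 3.0, 1.5, 2.5, 2.0],
    [0.2, 0.1, 0.3, 0.2, 0.4, 0.1],
]
observed, expected = observed_and_expected(data, labels)
for obs, exp in zip(observed, expected):
    print(f"observed {obs:7.2f}   expected {exp:7.2f}")
```

Plotting the observed list against the expected list gives the kind of plot the slide describes; genes far from the diagonal are the candidates.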

17
False Discovery Rate

We know that the expected d_i were computed with
no group structure.
Any large expected d_i values will be false
positives.
If we see 30 observed d_i above some cutoff and 10
expected d_i above the same cutoff, we know that
we probably have 10 false positives (though we
can never know which genes are the false
positives)
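The arithmetic in this example can be written out as a short sketch. It follows the slide's simplified description (count of expected d_i above the cutoff divided by count of observed d_i above it) and is not necessarily how the SAM software computes its estimate. The function name and the placeholder values are assumptions chosen to reproduce the 30 and 10 counts.

```python
def estimated_fdr(observed, expected, cutoff):
    """Expected count above the cutoff divided by the observed count above it."""
    called = sum(1 for d in observed if d > cutoff)
    false_calls = sum(1 for d in expected if d > cutoff)
    return false_calls / called if called else 0.0


# Placeholder d_i values: 30 observed and 10 expected lie above the cutoff of 2.0.
observed = [3.0] * 30 + [0.5] * 70
expected = [3.0] * 10 + [0.5] * 90
fdr = estimated_fdr(observed, expected, cutoff=2.0)
print(f"estimated false discovery rate: {fdr:.2f}")
```

With these counts, roughly one in three of the genes called significant would be expected to be a false positive.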

18
Features of SAM

Slider: we can change the false discovery rate
Fold change: in addition to the false discovery
rate, we can require the genes to meet some fold
change threshold (on average)
Gene lists: gene lists are given along with
corresponding significance levels
Web link: option for more information about
particular genes

19
Imputation