Title: D91623405 D92525008
1. D91623405, D92525008
- Simple statistics tools for gene expression arrays
- http://microarray.cpmc.columbia.edu/pavlidis/pub/stats/web/
- NIA Array Analysis Tool
- http://lgsun.grc.nia.nih.gov/ANOVA/index.html
- Perl modules (Statistics::KruskalWallis, Statistics::TTest)
- Mann-Whitney U Test
- http://eatworms.swmed.edu/leon/core_2002/stats/formulas.htm#utest
- The Rank Products method of Breitling et al. (rankproducts_FDR.pl)
- Multiclass discovery in array data
- http://www.thep.lu.se/markus/software/classdiscoverer/index.html
2. Simple statistics tools for gene expression arrays-1
- Analysis of variance and t-tests
- ttest: Do a two-sided t-test with or without the Welch correction. This program includes some nonparametric tests as options (the Mann-Whitney "U" test and the rank-transformed t-test).
- anova-oneway: Do a one-way analysis of variance with a balanced design.
- anova-twoway-norep: Do a two-way analysis of variance when there are no replicates.
- anova-twoway-withrep: Do a two-way analysis of variance when there are replicates with a balanced design.
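The difference between the plain (Student) t-test and the Welch correction mentioned above can be sketched in a few lines. This is a minimal illustration in Python, not the code of the ttest program; the function name t_statistic and its arguments are assumptions made for this example.

```python
import math
from statistics import mean, variance


def t_statistic(a, b, welch=True):
    """Return (t, df) for a two-sample t-test on the lists a and b.

    Hypothetical helper for illustration; not part of the ttest tool.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = mean(a) - mean(b)
    if welch:
        # Welch: each group keeps its own variance.
        se2 = va / na + vb / nb
        # Welch-Satterthwaite approximation to the degrees of freedom.
        df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    else:
        # Student: variances are pooled, assuming they are equal.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se2 = pooled * (1 / na + 1 / nb)
        df = na + nb - 2
    return diff / math.sqrt(se2), df
```

With equal group sizes the two variants give the same t value and differ only in the degrees of freedom, which is why the correction matters most for unbalanced groups with unequal variances.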
3. Command lines to run the analyses on the test files
- ttest -r testdata.txt ttest-layout.txt > testdata-ttest.out
- anova-oneway -r testdata.txt anova-oneway-layout.txt > testdata-anova-oneway.out
- anova-twoway-withrep -r testdata.txt anova-twoway-withrep-layout.txt > testdata-anova-twoway-withrep.out
4. testdata.txt: 12 (samples) x 30 (genes)
5. Layout (ttest, anova-oneway, anova-twoway-withrep)
6. ttest.out: ttest -r testdata.txt ttest-layout.txt > testdata-ttest.out
7. ttest
- Perform various two-sided statistical analyses of data that is divided into two groups, including the Student's t-test.
- -r: format line needs to be removed
- -w: use Welch approximate t
- -m: do the Mann-Whitney U (a.k.a. Wilcoxon rank-sum) test
- -rank: use rank transformation of the data
- -l: log transform the data
- ttest -r -rank affydatafile.txt affydatafile-layout | /usr/local/bin/sort -gk 3 >! test.rank.out
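The two data transformations offered as options above (rank and log) can be illustrated as follows. This is a sketch under assumptions: the slides do not say how ttest handles ties or which log base it uses, so average ranks for ties and base 2 are choices made here, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def log2_transform(values):
    """Log-transform positive expression values (base 2 is an assumption)."""
    return [math.log2(v) for v in values]
```

A rank-transformed t-test then simply runs the ordinary t-test on the ranks of each gene's values instead of on the raw intensities.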
8. anova-oneway.out: anova-oneway -r testdata.txt anova-oneway-layout.txt > testdata-anova-oneway.out
9. anova-twoway-withrep.out: anova-twoway-withrep -r testdata.txt anova-twoway-withrep-layout.txt > testdata-anova-twoway-withrep.out
10. NIA Array Analysis Tool-anova_oneway
11. NIA Array Analysis Tool-anova_oneway
- The major advantage of ANOVA over a simple t-test is that variances are averaged over all factor levels, so the statistics become more stable.
- In ANOVA we calculate the F-statistic, which is then used to estimate the P-value and determine whether the variation between means is significant.
- Testing multiple hypotheses with ANOVA (as in the case of microarray data) may require some modifications to ANOVA, such as variance averaging and FDR.
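The F-statistic and the FDR adjustment mentioned above can be sketched as follows. This is an illustration only, not the NIA tool's implementation: the slides do not name the FDR procedure, so the Benjamini-Hochberg method is assumed here, and both function names are hypothetical.

```python
from statistics import mean


def anova_oneway_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n = len(all_values)
    grand = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation inside each group, pooled over all groups.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order (assumed method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In a microarray setting, anova_oneway_f would be applied once per gene, and the resulting list of P-values passed to benjamini_hochberg to control the expected fraction of false discoveries.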
12. Comparing 2 tissues with 1-dye arrays: Input Example (2 tissues, 1-dye arrays)
13. Comparing multiple tissues with 2-dye arrays: Input Example (3 tissues, 2-dye arrays)
14. Comparing 2 tissues with 2-dye arrays: Input Example (2 tissues, 2-dye arrays, dye swap)
15. Comparing multiple tissues with 2-dye arrays: Example (3 tissues, 2-dye arrays, universal reference (UR))
16. Visualization of a data set with no replications: Example of an input file (1-dye arrays)
Because your data has no replications, no statistical analysis will be done. Only visualization will be available, by pair-wise comparison of tissues, PCA, and hierarchical clustering. Note that hierarchical clustering is available only if you have no more than 20 tissues for comparison.
17. Kruskal-Wallis test-1
- Perl module used to test whether differences exist between 3 or more independent groups of unequal sizes.
- Statistics::KruskalWallis
- Also includes the post-hoc Newman-Keuls test, to test whether the differences between pairs of the tested groups are significant.
18. Kruskal-Wallis test-2: input and output
- use strict;
- use Statistics::KruskalWallis;
- my @group_1 = (6.4,6.8,7.2,8.3,8.4,9.1,9.4,9.7);
- my @group_2 = (2.5,3.7,4.9,5.4,5.9,8.1,8.2);
- my @group_3 = (1.3,4.1,4.9,5.2,5.5,8.2);
- my $kw = new Statistics::KruskalWallis;
- $kw->load_data('group 1',@group_1);
- $kw->load_data('group 2',@group_2);
- $kw->load_data('group 3',@group_3);
- my ($H,$p_value) = $kw->perform_kruskal_wallis_test;
- print "Kruskal Wallis statistic is $H\n";
- print "p value for test is $p_value\n";
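For readers who want to see what the H statistic measures, the calculation can be sketched independently of the Perl module. The Python below is an illustration under assumptions: it uses average ranks for ties and the usual tie correction, which may or may not match what Statistics::KruskalWallis does internally, and the function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups, correct_ties=True):
    """Return (H, df) for a list of groups; df is the number of groups - 1."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)  # ranks over all groups combined
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    if correct_ties:
        ties = sum(t ** 3 - t for t in Counter(pooled).values())
        h /= 1 - ties / (n ** 3 - n)
    return h, len(groups) - 1


# The three groups from the slide above.
group_1 = [6.4, 6.8, 7.2, 8.3, 8.4, 9.1, 9.4, 9.7]
group_2 = [2.5, 3.7, 4.9, 5.4, 5.9, 8.1, 8.2]
group_3 = [1.3, 4.1, 4.9, 5.2, 5.5, 8.2]
h, df = kruskal_wallis_h([group_1, group_2, group_3])
print("H =", h, "with", df, "degrees of freedom")
```

The P-value is then read from the chi-square distribution with df degrees of freedom, which is the step the Perl module performs for you.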
19. Kruskal-Wallis test-3: Newman-Keuls test
- my ($q,$p) = $kw->post_hoc('Newman-Keuls','group 1','group 2');
- print "Newman-Keuls statistic for groups 1,2 is $q, p value $p\n";
- ($q,$p) = $kw->post_hoc('Newman-Keuls','group 2','group 3');
- print "Newman-Keuls statistic for groups 2,3 is $q, p value $p\n";
20. Kruskal-Wallis test-4: kruskalwallis.perl
- Jussi Karlgren, SICS, 2003. jussi@sics.se
- Usage: kruskalwallis.perl -c <Categorial_Column> -v <Value_Column>
- require "getopts.pl";
- print "Kruskal-Wallis test statistic $H - refer to khi2 tables with $df degrees of freedom.\n";
21. T-test Perl module
- my $ttest = new Statistics::TTest;
- $ttest->set_significance(90);
- $ttest->load_data(\@r1,\@r2);
- $ttest->output_t_test();
- $ttest->set_significance(99);
- $ttest->print_t_test();
22. Mann-Whitney U Test 1
- http://eatworms.swmed.edu/leon/core_2002/stats/formulas.htm#utest
23. Mann-Whitney U Test 2 (Wilcoxon Rank Sum Test)
- http://eatworms.swmed.edu/leon/core_2002/stats/formulas.htm#utest
- Utest.c
- Utable.pl
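The U statistic itself is easy to state: for every pair made of one value from each group, count which group's value is larger. The Python sketch below illustrates that definition and is not a translation of Utest.c or Utable.pl; the function names are hypothetical, and the normal approximation shown is only appropriate for larger samples without many ties, whereas small samples are normally looked up in a table of critical values.

```python
import math


def mann_whitney_u(a, b):
    """Return (U_a, U_b) by direct pair counting; ties count one half."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    # Every pair is counted once, so the two statistics sum to len(a) * len(b).
    return u_a, len(a) * len(b) - u_a


def u_to_z(u, n_a, n_b):
    """Normal approximation to U (no tie or continuity correction)."""
    mu = n_a * n_b / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u - mu) / sigma
```

The smaller of the two U values is the one usually compared against a critical-value table.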
24. Multiclass discovery in array data-1
- http://www.thep.lu.se/markus/software/classdiscoverer/index.html
25. Multiclass discovery in array data-2
- An unsupervised classification method for discovery of classes in array data.
- For two classes the Wilcoxon test is used to find discriminatory genes. For more than two classes the Kruskal-Wallis test is used.
- Uses the Perl modules Algorithm::Numerical::Shuffle, POSIX, Statistics::Distributions, Storable, and Tie::RefHash.
26.
- For discovery of two classes, P values from random permutation tests are stored in the file 'pvalues.data' in binary format using the CPAN module Storable. If 'pvalues.data' is not compatible with your system, you have to generate one using the included 'generate_pvalues.pl' program.
- The file 'pvalues.data' contains results for data sets with at most 100 experiments in total. If you are analysing a data set with more than 100 experiments and do not want to perform the permutation tests every time, you have to modify the subroutine 'new' in 'WilcoxonTest.pm'.
- Y. Liu and M. Ringner, Multiclass discovery in array data, BMC Bioinformatics 5, 70 (2004)
- class_discoverer.pl - multiclass discovery in array data
- Genes Exp_1 Exp_2 Exp_3 Exp_4
- Gene_1 0 0 0 0
- Gene_2 1 1 1
- Gene_3 -1 -1 -1 -1
- Gene_4 0 1 2 3
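The random permutation tests behind 'pvalues.data' follow a general pattern that can be sketched briefly. The Python below is an illustration of that pattern for one gene and two classes, not the procedure implemented in 'generate_pvalues.pl' or 'WilcoxonTest.pm'; the function names, the number of permutations, and the add-one correction are all assumptions made for this example.

```python
import random


def u_statistic(a, b):
    """Count pairs where the value from a exceeds the value from b."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def permutation_p_value(a, b, n_perm=10000, seed=0):
    """Two-sided p-value for the U statistic by random label permutation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pooled = list(a) + list(b)
    center = len(a) * len(b) / 2  # expected U when the classes do not differ
    observed = abs(u_statistic(a, b) - center)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign the values to the two classes at random
        u = u_statistic(pooled[:len(a)], pooled[len(a):])
        if abs(u - center) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because the null distribution depends only on the two class sizes, results like these can be computed once and stored, which is the motivation for shipping a precomputed file.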