Test of significance for small samples - PowerPoint PPT Presentation

Description:

Title: PowerPoint Presentation Author: damaratu Last modified by: j c Created Date: 2/26/2004 3:21:20 PM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 34
Provided by: dam102

Transcript and Presenter's Notes



1
Test of significance for small samples
Javier Cabrera, Director, Biostatistics Institute, Rutgers University
Dhammika Amaratunga, Johnson & Johnson Pharmaceutical Research & Development
2
Outline
  • Microarray Experiments and Differential
    expression
  • Small sample size issues
  • Conditional t approach
  • Comparison with other methods
  • Extensions
  • Reference: Amaratunga, Cabrera. Exploration and Analysis of DNA
    Microarray and Protein Array Data. Wiley, 2004.
  • Software: DNAMR and DNAMRweb
    http://www.rci.rutgers.edu/cabrera/DNAMR

3
(No Transcript)
4
Microarray experiment
  • cDNA or oligonucleotide preparation → print or synthesize on a glass slide → Microarray
    (5k-50k genes arrayed in a rectangular grid, one spot per gene)
  • Biological sample → mRNA → reverse transcribe and label → Sample
  • Sample + Microarray → hybridize, wash and scan → Image → quantify spot
    intensities → Gene expression data
5
Differential gene expression
  • An organism's genome is the complete set of genes in each of its cells.
  • Given an organism, every one of its cells has a copy of the exact same
    genome, but
    - different cells express different genes
    - different genes express under different conditions
    - differential gene expression leads to altered cell states

6
Differential Expression for small samples
Gene     C1     C2     C3     T1     T2     T3
G1     4.67   4.44   4.42   4.73   4.85   4.69
G2     3.13   2.54   1.96   0.97   2.38   3.36
G3     6.22   6.77   5.32   6.40   6.94   6.87
G4    10.74  10.81  10.69  10.75  10.68  10.68
G5     3.76   4.16   5.27   3.05   3.20   2.85
G6     6.95   6.78   6.33   6.81   6.95   7.01
G7     4.98   4.61   4.56   4.57   4.90   4.44
G8     2.72   3.30   3.24   3.22   3.42   3.22
G9     5.29   4.79   5.13   3.31   4.67   5.27
G10    5.12   4.85   3.79   4.13   3.12   4.79
G11    4.67   3.50   4.77   4.09   3.86   2.88
G12    6.22   6.42   5.02   6.38   6.54   6.80
G13    2.88   3.76   2.78   2.98   4.81   4.15
.......
  1. Preprocessed data.
  2. Perform a t-test for each gene.
  3. Select the most significant subset.
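
Step 2 can be sketched in a few lines. This is a minimal illustration, assuming the equal-variance (pooled) form of the t-test and using three rows of the table (G1, G2, G5). The function name `pooled_t` is illustrative and is not taken from DNAMR.

```python
from math import sqrt
from statistics import mean, variance

# Three genes from the table: (controls C1-C3, treatments T1-T3).
data = {
    "G1": ([4.67, 4.44, 4.42], [4.73, 4.85, 4.69]),
    "G2": ([3.13, 2.54, 1.96], [0.97, 2.38, 3.36]),
    "G5": ([3.76, 4.16, 5.27], [3.05, 3.20, 2.85]),
}

def pooled_t(control, treated):
    """Return (t, sp): the pooled-variance t statistic and the pooled SD."""
    n1, n2 = len(control), len(treated)
    sp2 = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / (n1 + n2 - 2)
    sp = sqrt(sp2)
    t = (mean(treated) - mean(control)) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, sp

results = {gene: pooled_t(c, t) for gene, (c, t) in data.items()}

# Step 3: rank genes by |t|, most significant first.
ranked = sorted(results, key=lambda gene: abs(results[gene][0]), reverse=True)
for gene in ranked:
    t, sp = results[gene]
    print(f"{gene}: t = {t:.2f}, sp = {sp:.3f}")
```

With only three arrays per group, each t statistic has 4 degrees of freedom, so the ranking is very sensitive to the per-gene pooled SD.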

7
The pooled variances T-test
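
For reference, the textbook pooled-variance statistic for gene g can be written as follows. The notation is an assumption chosen to match the later slides: $X_{gij}$ is the measurement for gene g in array i of group j, $n_j$ is the number of arrays in group j, and $s_{gj}^2$ is the sample variance of gene g in group j.

```latex
t_g = \frac{\bar{X}_{g\cdot 2} - \bar{X}_{g\cdot 1}}
           {s_{p,g}\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_{p,g}^2 = \frac{(n_1 - 1)\,s_{g1}^2 + (n_2 - 1)\,s_{g2}^2}{n_1 + n_2 - 2}
```

Under normal errors and no differential expression, $t_g$ follows a Student t distribution with $n_1 + n_2 - 2$ degrees of freedom.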
8
[Figure: plot of t vs. $s_p$, and distribution of $s_p$; counts shown: 300 and 21983]
Differentially expressed genes have smaller $s_p$.
Is this effect statistical or biological?
9
Simulation (500 runs): 1000 genes, 4 controls, 4 treatments, iid Normal(0, $\sigma^2$).
100 genes are differentially expressed, with mean difference 1 or -1.

$\sigma^2 = 1$ (constant), $\alpha = 0.05$
            False Discoveries   True Discoveries
  T-test           44                  22
  z-test           43                  29

$\sigma^2$ from Chi-square(df = 3), $\alpha = 0.05$
            False Discoveries   True Discoveries
  T-test           43                  28
  z-test           53                  13
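
The constant-variance case can be reproduced approximately with a short simulation. This is a sketch under stated assumptions: the z-test is taken to use the known common $\sigma = 1$ (the slides do not say which variance the z-test uses), only 20 runs are averaged rather than 500, and all names are illustrative.

```python
import random
from math import sqrt

T_CRIT = 2.447  # two-sided 5% critical value, Student t with 4 + 4 - 2 = 6 df
Z_CRIT = 1.960  # two-sided 5% critical value, standard normal

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulate_once(rng, n_genes=1000, n_de=100, n=4, sigma=1.0):
    """One simulated experiment; returns {test: [false, true] discovery counts}."""
    counts = {"t": [0, 0], "z": [0, 0]}
    for g in range(n_genes):
        de = 1 if g < n_de else 0  # the first n_de genes are differentially expressed
        shift = rng.choice([-1.0, 1.0]) if de else 0.0
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(shift, sigma) for _ in range(n)]
        diff = mean(treated) - mean(control)
        sp = sqrt((var(control) + var(treated)) / 2)  # pooled SD, equal group sizes
        if abs(diff / (sp * sqrt(2 / n))) > T_CRIT:
            counts["t"][de] += 1
        if abs(diff / (sigma * sqrt(2 / n))) > Z_CRIT:  # assumption: sigma is known
            counts["z"][de] += 1
    return counts

rng = random.Random(2004)
runs = [simulate_once(rng) for _ in range(20)]
avg = {test: [mean([r[test][k] for r in runs]) for k in (0, 1)] for test in ("t", "z")}
print("average [false, true] discoveries:", avg)
```

About 5% of the 900 null genes are expected to be flagged by either test, and the z-test should find more of the 100 true genes because it does not pay for estimating the variance from 6 degrees of freedom.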
10

The effect of small sample size
  • Often the sample size per group is small.
  • → unreliable variances (inferences)
  • → dependence between the test statistics ($t_g$) and
    the standard error estimates ($s_g$)
  • → borrow strength across genes (LPE/EB)
  • → regularize the test statistics (SAM)
  • → work with $t_g \mid s_g$ (Conditional t).

11
Analysis results
Top 10 genes (sorted by t-test p-value)

  Gene     Fold    Dir   p          p(Bonf)
  G6546     2.36   D     0.000004   0.0964
  G19945    3.25   U     0.000005   0.1102
  G21586    1.64   U     0.000008   0.1765
  G18970    2.52   U     0.000019   0.4220
  G7432     3.70   D     0.000033   0.7248
  G19057    1.85   U     0.000046   1.0000
  G17361    4.34   D     0.000067   1.0000
  G8525     5.57   D     0.000067   1.0000
  G425     18.11   D     0.000078   1.0000
  G8524     4.74   D     0.000109   1.0000
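
The p(Bonf) column is consistent with the usual Bonferroni adjustment, min(1, p × m), where m is the number of genes tested. The sketch below assumes m = 22283, which is inferred from the two counts shown with the earlier t vs. sp plot (300 + 21983) and is not stated in the slides. Because the tabulated p-values are rounded to six decimals, the products only approximate the tabulated column.

```python
def bonferroni(p, m):
    """Bonferroni-adjusted p-value for one of m simultaneous tests."""
    return min(1.0, p * m)

M_GENES = 22283  # assumption: 300 + 21983, the counts shown with the t vs. sp plot

raw_p = {"G7432": 0.000033, "G19057": 0.000046}
adjusted = {gene: bonferroni(p, M_GENES) for gene, p in raw_p.items()}

for gene, p_adj in adjusted.items():
    print(f"{gene}: p(Bonf) ~ {p_adj:.4f}")
```

None of the top 10 genes falls below 0.05 after adjustment, which illustrates how little power a per-gene t-test has at this sample size once multiplicity is accounted for.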

12
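SAM: Determining c
For each $\alpha$:
  • $v_j(\alpha)$ = mad of $T_g$ over the genes in the $j$-th range of $s_g$, $j = 1, \dots, 7$
  • $cv(\alpha)$ = coefficient of variation of $v_1(\alpha), \dots, v_7(\alpha)$
Candidate values $s_1, \dots, s_7$ correspond to $cv(\alpha_1), \dots, cv(\alpha_7)$;
c is the candidate with the minimum $cv(\alpha)$.
[Figure: $T_g$ vs. $s_g$]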
13
SAM Gene selection
[Figure: D vs. expected value of D under permutations]
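
For context, the statistic that SAM regularizes has the general form below. The symbols are assumptions chosen to match these slides: $s_g$ is the standard error of the mean difference for gene g, and c is the positive constant (often written $s_0$ elsewhere) selected on the previous slide.

```latex
T_g = \frac{\bar{X}_{g\cdot 2} - \bar{X}_{g\cdot 1}}{s_g + c}, \qquad c > 0
```

Adding c to the denominator keeps genes with a very small $s_g$ from producing very large statistics, which weakens the dependence between $T_g$ and $s_g$.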
14
Conditional t Basic Model
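  • Let $X_{gij}$ denote the preprocessed intensity measurement for gene g
    in array i of group j.
  • Model: $X_{gij} = \mu_{gj} + \sigma_g \varepsilon_{gij}$
  • Effect of interest: $\tau_g = \mu_{g2} - \mu_{g1}$
  • Error model: $\varepsilon_{gij} \sim F(\text{location} = 0, \text{scale} = 1)$
  • Gene mean-variance model: $(\mu_{g1}, \sigma_g^2) \sim F_{\mu,\sigma}$,
    with marginals $\mu_{g1} \sim F_\mu$ and $\sigma_g^2 \sim F_\sigma$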
15
Possible approaches
Parametric: Assume functional forms for $F$ and $F_{\mu,\sigma}$ and apply
either a Bayes or Empirical Bayes procedure.
Nonparametric:
  1. [...] or [...]
     For small samples [...] is not a good estimator of $F_\sigma$.
     Use method of moments / target estimation.
  2. Proceed via resampling and estimate the distribution of
     $t \mid s_p$ (Conditional t).
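
One way to picture approach 2 is sketched below. It is an illustration under loud assumptions, not the authors' procedure: errors are taken to be normal, $\sigma_g^2$ is drawn from Chi-square(df = 3) as in the simulation slide, and the conditional distribution of t given $s_p$ is approximated by crude binning on $s_p$ instead of any smoothing the authors may use. All function names are hypothetical.

```python
import random
from math import sqrt

def pooled_t(control, treated):
    """Return (t, sp) for two equal-sized groups."""
    n = len(control)
    m1, m2 = sum(control) / n, sum(treated) / n
    v1 = sum((x - m1) ** 2 for x in control) / (n - 1)
    v2 = sum((x - m2) ** 2 for x in treated) / (n - 1)
    sp = sqrt((v1 + v2) / 2)
    return (m2 - m1) / (sp * sqrt(2 / n)), sp

def null_pairs(rng, n_sim=20000, n=4, df=3):
    """Simulate (t, sp) pairs for genes with no differential expression."""
    pairs = []
    for _ in range(n_sim):
        # Assumption: sigma^2 ~ Chi-square(df), i.e. Gamma(shape=df/2, scale=2).
        sigma = sqrt(rng.gammavariate(df / 2, 2.0))
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(0.0, sigma) for _ in range(n)]
        pairs.append(pooled_t(control, treated))
    return pairs

def conditional_critical_values(pairs, n_bins=10, alpha=0.05):
    """For each sp bin, return (upper sp edge, empirical 1 - alpha quantile of |t|)."""
    pairs = sorted(pairs, key=lambda p: p[1])  # order by sp
    size = len(pairs) // n_bins
    table = []
    for b in range(n_bins):
        chunk = pairs[b * size:(b + 1) * size]
        abs_t = sorted(abs(t) for t, _ in chunk)
        table.append((chunk[-1][1], abs_t[int((1 - alpha) * len(abs_t))]))
    return table

table = conditional_critical_values(null_pairs(random.Random(0)))
for sp_edge, crit in table:
    print(f"sp <= {sp_edge:6.3f}: flag gene if |t| > {crit:.2f}")
```

A gene would then be flagged when its |t| exceeds the critical value of the bin containing its $s_p$. In this sketch the critical value shrinks as $s_p$ grows, so genes with small $s_p$ need a larger |t| to be called significant.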
16
Procedure
17
Procedure (cont.)
18
Roadblock
Let $X_{ij}$ be a sample from the model with $\sigma^2 \sim F_\sigma$, and let
the variance obtained from the $X_{ij}$ be $s^2$.
Then $\mathrm{Var}(s^2) > \mathrm{Var}(\sigma^2)$.
For example, if we assume that $F_\sigma = \chi^2_3$, $n = 4$ and
$\varepsilon \sim N(0,1)$, then $\mathrm{Var}(\sigma^2) = 6$ and $\mathrm{Var}(s^2) = 15$.
Fix by target estimation / method of moments: shrink towards the center.
19
Example: Checking for the distribution of $\sigma_g$
Compare the distribution of $s_g$ vs. simulation with
  1. Df = 0.5
  2. Df = 2
  3. Df = 6
[Figure: Mice Data, with panels for Df = 0.5, Df = 2 and Df = 6]
20
Another Example
Compare the distribution of $s_g$ vs. simulation with Df = 0.5, Df = 3 and Df = 6
[Figure: panels for Df = 0.5, Df = 3 and Df = 6]
21
Fixing the variance distribution
22
Fixing the variance distribution (contd)
Proceed as before
23
[Figure: plot of t vs. $s_p$; counts shown: 191 and 22092]
Differentially expressed genes may have large $s_p$.
24
Simulation (500 runs): 1000 genes, 4 controls, 4 treatments, iid Normal(0, $\sigma^2$).
100 genes are differentially expressed, with mean difference 1 or -1.

$\sigma^2 = 1$ (constant)
            False Discoveries   True Discoveries
  T-test           44                  22
  z-test           43                  29
  C-t              45                  30

$\sigma^2$ from Chi-square(df = 3)
            False Discoveries   True Discoveries
  T-test           43                  28
  z-test           53                  13
  C-t              42                  38
25
Using 8 iid samples from Khan Data, we make
changes to 50 genes to make them differentially
expressed for high level.
[Figure: panels for T-test, SAM and Ct]
26
Generating p-values
27
Extensions
  • F test
    - Condition on the sqrt(MSE)
  • Multiple comparisons: Tukey, Dunnett, Bump.
    - Condition on the sqrt(MSE)
  • Gene Ontology.
    - Test for the significance of groups.
    - Use Hypergeometric Statistic, mean t, mean p-value, or other.
    - Condition on log of the number of genes per group
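
As an illustration of the hypergeometric option for Gene Ontology groups, the sketch below computes the standard upper-tail hypergeometric p-value. It is a generic textbook calculation, not the DNAMR implementation, and it does not include the conditioning on log(n). The gene counts in the usage example are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k), where X is the number of group members among n selected genes,
    drawn without replacement from N genes of which K belong to the group."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Hypothetical example: 1000 genes on the array, 40 in the GO group,
# 100 genes selected as significant, 10 of which are in the group.
p_value = hypergeom_upper_tail(N=1000, K=40, n=100, k=10)
print(f"p = {p_value:.5f}")
```

Here about 4 group members would be expected among the selected genes by chance, so observing 10 gives a small p-value.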
28
Conditional F
29
GO Ontology: Conditioning on log(n)
[Figure: Abs(T) vs. Log(n)]
30
The Details
  • Reference:
    Amaratunga, Cabrera. Exploration and Analysis of DNA Microarray
    and Protein Array Data. Wiley, Jan 2004.
  • Email:
    cabrera_at_stat.rutgers.edu
    damaratu_at_prdus.jnj.com
  • Webpage for DNAMR and DNAMRweb:
    http://www.rci.rutgers.edu/cabrera/DNAMR

31
Target Estimation
  • Target Estimation: Cabrera, Fernholz (1999)
    - Bias Reduction.
    - MSE reduction.
  • Recent Applications
    - Ellipse Estimation (Multivariate Target).
    - Logistic Regression: Cabrera, Fernholz, Devas (2003);
      Patel (2003) Target Conditional MLE (TCMLE)
    - Implementation in StatXact (CYTEL) and logXact Procs in SAS (by CYTEL).

32
Target Estimation
33
Target Estimation

Algorithms
  - Stochastic approximation.
  - Simulation and iteration.
  - Exact algorithm for TCMLE