STATISTICAL TOOLS FOR SYNTHESIZING LISTS OF DIFFERENTIALLY EXPRESSED FEATURES IN MICROARRAY EXPERIMENTS
Marta Blangiardo and Sylvia Richardson
Centre for Biostatistics, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK. m.blangiardo_at_imperial.ac.uk
SCOPE OF THE WORK: Given two different but related experiments, how can we assess whether they have more differentially expressed (DE) genes in common than expected by chance?
All the computations have been performed in R and are available on the BGX website (www.bgx.org.uk).

RANKED LISTS: Suppose we have two experiments, each reporting a measure of differential expression on a probability scale (e.g. a p-value) for the same n genes:
Gene   Experiment A   Experiment B
1      pA1            pB1
2      pA2            pB2
...    ...            ...
n      pAn            pBn
A small p-value indicates the MOST differentially expressed genes; a large p-value indicates genes that are NOT differentially expressed.
  • SIMULATION
  • We use three batches of simulations that differ in the
    level of association between the experiments and in the
    percentage of DE genes. For each batch we simulate two
    lists of 2000 p-values (Allison et al., 2002) and average
    the results over 100 simulations.
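A minimal sketch of how two associated p-value lists might be simulated. It assumes, hypothetically, that non-DE genes receive Uniform(0, 1) p-values and DE genes receive Beta(a, 1) p-values with a < 1, in the spirit of the beta-mixture model of Allison et al. (2002). The association mechanism (experiment B copies experiment A's DE status with probability `assoc`, otherwise draws it independently) and the names `simulate_lists`, `pct_de`, `assoc` and `shape` are illustrative assumptions, not the poster's actual simulation design.

```python
import random


def simulate_lists(n=2000, pct_de=0.10, assoc=0.25, shape=0.1, seed=0):
    """Simulate paired p-value lists for two experiments (hypothetical design).

    Non-DE genes: p ~ Uniform(0, 1).  DE genes: p ~ Beta(shape, 1), which
    piles up near 0 when shape < 1.  Experiment B copies A's DE status with
    probability `assoc`; otherwise B's status is an independent draw, so both
    lists have the same expected proportion of DE genes.
    """
    rng = random.Random(seed)
    pa, pb = [], []
    for _ in range(n):
        de_a = rng.random() < pct_de
        if rng.random() < assoc:
            de_b = de_a  # shared DE status induces the association
        else:
            de_b = rng.random() < pct_de  # independent DE status
        pa.append(rng.betavariate(shape, 1.0) if de_a else rng.random())
        pb.append(rng.betavariate(shape, 1.0) if de_b else rng.random())
    return pa, pb


pa, pb = simulate_lists()
print(len(pa), len(pb))  # 2000 2000
```

Under this assumed scheme, `assoc` is the correlation between the two DE indicators, so `assoc=0` gives independent lists.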

We rank the genes according to their probability measures. For each cut-off q we obtain a 2x2 table that cross-classifies the genes as DE or not DE in each experiment.
Simulation results, averaged over 100 simulations. q* is the threshold at which the ratio is maximal; O1+ and O+1 are the numbers of genes declared DE in experiments A and B.

                Conditional model (permutation test)       Joint model (Bayesian analysis)
Assoc.  % DE    T(q*)  q*     O11  O1+  O+1  MC p-value    R(q*)  95% CI    q*     O11  O1+  O+1
0       10      1.1    0.040  10   115  120  0.550         1.0    0.4-1.5   0.050  18   125  130
0.25    10      5.7    0.010  6    49   50   0.060         5.0    2.2-10.6  0.020  8    59   59
0.25    20      3.0    0.019  11   82   82   0.030         2.9    1.5-4.9   0.026  17   105  106
0.25    30      2.5    0.023  21   125  126  0.002         2.4    1.4-3.6   0.030  28   148  150
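2x2 table for cut-off q (rows: experiment A; columns: experiment B):
                 Exp B DE           Exp B not DE                    Total
Exp A DE         O11(q)             O1+(q) - O11(q)                 O1+(q)
Exp A not DE     O+1(q) - O11(q)    n - O1+(q) - O+1(q) + O11(q)    n - O1+(q)
Total            O+1(q)             n - O+1(q)                      n
The number of genes observed in common is O11(q). Under independence of the two experiments, the number of genes expected in common by chance is
$$E_{11}(q) = \frac{O_{1+}(q)\,O_{+1}(q)}{n}$$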
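The counts in the 2x2 table can be computed directly from two lists of p-values aligned gene by gene. A minimal sketch, assuming that a gene is declared DE in an experiment when its p-value is at or below the cut-off q; the function names `two_by_two` and `expected_in_common` are illustrative.

```python
def two_by_two(pa, pb, q):
    """Return the 2x2 table [[O11, O12], [O21, O22]] and margins at cut-off q.

    pa[i] and pb[i] must refer to the same gene i in experiments A and B.
    A gene is declared DE when its p-value is <= q (assumed convention).
    """
    if len(pa) != len(pb):
        raise ValueError("p-value lists must be aligned gene by gene")
    n = len(pa)
    de_a = [p <= q for p in pa]
    de_b = [p <= q for p in pb]
    o11 = sum(a and b for a, b in zip(de_a, de_b))  # DE in both experiments
    o1p = sum(de_a)  # O1+(q): DE in experiment A
    op1 = sum(de_b)  # O+1(q): DE in experiment B
    table = [[o11, o1p - o11],
             [op1 - o11, n - o1p - op1 + o11]]
    return table, o1p, op1, n


def expected_in_common(o1p, op1, n):
    """Expected number of genes in common by chance: O1+(q) * O+1(q) / n."""
    return o1p * op1 / n


pa = [0.01, 0.02, 0.50, 0.90]
pb = [0.01, 0.60, 0.03, 0.80]
table, o1p, op1, n = two_by_two(pa, pb, q=0.05)
print(table)                            # [[1, 1], [1, 1]]
print(expected_in_common(o1p, op1, n))  # 1.0
```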
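RATIO: We propose to compute the maximum, over the cut-offs q, of the observed-to-expected ratio
$$T(q^*) = \max_q T(q), \qquad T(q) = \frac{O_{11}(q)}{O_{1+}(q)\,O_{+1}(q)/n},$$
where q* is the cut-off at which the maximum is attained. T(q*) measures the maximal deviation from the underlying independence model.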
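The maximisation can be sketched as a scan over a grid of candidate cut-offs. This is a minimal illustration, assuming the same "p-value <= q" convention and a user-supplied grid of thresholds (the grid used on the poster is not stated); `max_ratio` is a hypothetical name.

```python
def max_ratio(pa, pb, thresholds):
    """Return (q*, T(q*)): the cut-off maximising O11(q) / (O1+(q) O+1(q) / n).

    Thresholds at which either list has no DE genes (expected count 0)
    are skipped, since the ratio is undefined there.
    """
    n = len(pa)
    best_q, best_t = None, float("-inf")
    for q in thresholds:
        de_a = [p <= q for p in pa]
        de_b = [p <= q for p in pb]
        o11 = sum(a and b for a, b in zip(de_a, de_b))
        expected = sum(de_a) * sum(de_b) / n
        if expected == 0:
            continue
        t = o11 / expected
        if t > best_t:
            best_q, best_t = q, t
    return best_q, best_t


same = [0.01, 0.02, 0.30, 0.60, 0.90]
print(max_ratio(same, same, [0.05, 0.5]))  # (0.05, 2.5)
```

With two identical lists the smallest non-empty cut-off gives the largest ratio, since the expected overlap shrinks faster than the observed one.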
NO association is declared when the MC p-value is not significant or the 95% CI includes 1, as happens in the simulations when the two lists are not associated.
When there is a TRUE association, increasing the percentage of DE genes has the following effects:
Conditional Model
  • The ratio T(q*) decreases
  • q*, O1+(q*), O+1(q*) and O11(q*) increase
  • The MC p-value becomes more significant
Joint Model
  • The ratio R(q*) decreases
  • q*, O1+(q*), O+1(q*) and O11(q*) increase
  • The 95% CIs become narrower

In the simulations, R(q*) is always smaller than T(q*) and its q* is slightly larger, because the joint model accounts for the additional variability of the margins.
  • Using the maximum ratio avoids the multiple-testing
    issues that would arise from testing lists of different sizes
  • The procedure returns a single list of O11(q*) genes for
    further biological investigation
  • APPLICATION: analysis of the deleterious effect of
    mechanical ventilation on lung gene expression
  • We re-analyse the experiment presented in Ma et al.
    (2005), which investigated this effect through a model of
    mechanical ventilation-induced lung injury (VILI) in
    rodents (mice and rats)
  • We analyse the two datasets separately using Cyber-T
    (Baldi and Long, 2001)
  • We use RESOURCERER to reconstruct the list of
    orthologs for the two species
  • We apply the methodology described above to the two
    lists of 2969 p-values (ortholog genes)
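Before the method can be applied, the mouse and rat p-values must be paired gene by gene. A minimal sketch, assuming a hypothetical ortholog table of (mouse_id, rat_id) pairs, such as one exported from RESOURCERER, and per-species dictionaries of Cyber-T p-values; the real export format is not described here, and the identifiers and the name `align_orthologs` are illustrative.

```python
def align_orthologs(pairs, p_mouse, p_rat):
    """Pair mouse and rat p-values through an ortholog table.

    pairs   -- iterable of (mouse_id, rat_id) ortholog pairs (hypothetical format)
    p_mouse -- dict mouse_id -> p-value
    p_rat   -- dict rat_id -> p-value
    Only pairs with a p-value in both species are kept, so the returned
    lists are aligned gene by gene, as the 2x2 tables require.
    """
    ids, pa, pb = [], [], []
    for mouse_id, rat_id in pairs:
        if mouse_id in p_mouse and rat_id in p_rat:
            ids.append((mouse_id, rat_id))
            pa.append(p_mouse[mouse_id])
            pb.append(p_rat[rat_id])
    return ids, pa, pb


pairs = [("Il1b_m", "Il1b_r"), ("Kit_m", "Kit_r"), ("Xyz_m", "Xyz_r")]
p_mouse = {"Il1b_m": 0.001, "Kit_m": 0.2}
p_rat = {"Il1b_r": 0.004, "Kit_r": 0.5, "Xyz_r": 0.9}
ids, pa, pb = align_orthologs(pairs, p_mouse, p_rat)
print(len(ids))  # 2 (the hypothetical Xyz pair has no mouse p-value)
```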

PERMUTATION TEST: Given a threshold q and fixed margins, O11(q) follows a hypergeometric distribution under independence. However, the distribution of T(q*) is not easily obtained, since the tables for different thresholds are nested in each other. We therefore use the empirical distribution of T(q*) obtained via permutations.
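For a single threshold q with fixed margins, the exact null tail probability of O11(q) follows from the hypergeometric distribution, as in this sketch (the name `hypergeom_upper_tail` is illustrative). Its mean, O1+(q) O+1(q) / n, is the expected count used in the ratio; the hard part is the maximum over nested thresholds, which is why permutations are used instead.

```python
from math import comb


def hypergeom_upper_tail(o11, o1p, op1, n):
    """P(O11 >= o11) under independence with fixed margins O1+, O+1 and n.

    Under the null, the O+1 genes DE in experiment B are a random subset of
    the n genes, so the number falling among the O1+ genes DE in experiment A
    is hypergeometric.
    """
    total = comb(n, op1)
    upper = min(o1p, op1)
    # comb returns 0 when asked for more items than available
    return sum(comb(o1p, k) * comb(n - o1p, op1 - k)
               for k in range(o11, upper + 1)) / total


print(hypergeom_upper_tail(2, 2, 2, 4))  # 0.16666666666666666
```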
  • 97 genes found in common between mice and rats
  • 15 genes in common with the original analysis
    (which highlighted 48 genes)
  • Two pathways are enriched among the genes found with
    our methodology:
  • 1) MAPK signalling: 6 of the significant orthologs are
    involved in this KEGG pathway (Fgfr1, Gadd45a, Hspa8,
    Hspa1a, Il1b, Il1r2), whereas only 4 were highlighted
    in the original analysis.
  • 2) Cytokine-cytokine receptor interaction: 5 of the
    significant orthologs are involved in this KEGG pathway
    (IL6, Il1b, Il1r2, CCL2, Kit), whereas only 4 were
    highlighted in the original analysis.

We perform a Monte Carlo test of T(q*) under the null hypothesis of independence between the two experiments, using permutations. This returns a Monte Carlo (MC) p-value, for example:
Not associated: MC p-value = 0.8
Associated: MC p-value < 0.001
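The permutation test can be sketched as follows: shuffling list B breaks the gene-by-gene pairing while leaving every margin unchanged, so the margins stay fixed as in the conditional model. This is a minimal illustration, not the authors' R code; the names `max_ratio` and `permutation_test`, the threshold grid and the common (1 + exceedances) / (1 + permutations) estimate of the MC p-value are assumptions.

```python
import random


def max_ratio(pa, pb, thresholds):
    """T(q*): maximum over thresholds of O11(q) / (O1+(q) O+1(q) / n)."""
    n = len(pa)
    best = float("-inf")
    for q in thresholds:
        de_a = [p <= q for p in pa]
        de_b = [p <= q for p in pb]
        expected = sum(de_a) * sum(de_b) / n
        if expected > 0:  # skip thresholds with an empty list
            o11 = sum(a and b for a, b in zip(de_a, de_b))
            best = max(best, o11 / expected)
    return best


def permutation_test(pa, pb, thresholds, n_perm=999, seed=0):
    """Return (T(q*), MC p-value) under independence of the two lists."""
    rng = random.Random(seed)
    t_obs = max_ratio(pa, pb, thresholds)
    shuffled = list(pb)  # copy, so the caller's list is not modified
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing; margins are unchanged
        if max_ratio(pa, shuffled, thresholds) >= t_obs:
            exceed += 1
    return t_obs, (1 + exceed) / (1 + n_perm)


genes = [(i + 0.5) / 200 for i in range(200)]
t_obs, p_value = permutation_test(genes, list(genes), [0.01, 0.05, 0.1], n_perm=199)
print(t_obs, p_value)
```

Because the whole maximisation is repeated inside each permutation, the nesting of the tables across thresholds is automatically reflected in the null distribution.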
  • LIMITATIONS OF THE TEST
  • The uncertainty of the margins is not taken into
    account
  • The list of genes in common can be very small
    (typically when the total number of DE genes is small),
    which can make the estimate of T(q) unstable
  • We therefore propose a Bayesian model that also treats
    the margins as random variables

Application results (mice vs rats):
R(q*) = 1.43, q* = 0.01, 95% CI 1.13-1.75
T(q*) = 1.44, q* = 0.01, MC p-value < 0.001
O11(q*) = 97, O1+(q*) = 393, O+1(q*) = 886
BAYESIAN MODEL: Starting from the 2x2 table, we specify a multinomial distribution of dimension 3 for the vector of joint frequencies, with cell probabilities θ(q) = (θ11, θ12, θ21, θ22).
  • DISCUSSION
  • This is a simple procedure to evaluate whether two (or
    more) experiments are associated
  • The permutation test gives a first look under a model
    in which the marginal frequencies are fixed
  • The Bayesian model extends this scenario by
    introducing variability in all the components of the
    table
  • The approach is flexible and can be adapted to compare
    several experiments at different levels (gene level,
    biological-process level) and for different problems
    (e.g. comparisons between species or between platforms)
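  • The vector of parameters θ(q) is given a non-informative
    Dirichlet prior
  • θ(q) ~ Di(0.05, 0.05, 0.05, 0.05)
  • The derived quantity of interest is the ratio of the
    probability that a gene is in common to the probability
    that a gene is in common by chance:
    $$R(q) = \frac{\theta_{11}}{(\theta_{11} + \theta_{12})(\theta_{11} + \theta_{21})}$$
  • Since the Dirichlet prior is conjugate to the multinomial,
    the posterior distribution of θ(q) is also Dirichlet, with
    parameters 0.05 plus the corresponding cell counts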
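Because the posterior is Dirichlet, posterior draws of R(q) for one threshold can be obtained by normalising independent gamma variates. A minimal sketch using only Python's standard library; the name `posterior_ratio`, the number of draws and the simple order-statistic quantiles are illustrative choices, while the Di(0.05, 0.05, 0.05, 0.05) prior is the one given in the model.

```python
import random
import statistics

PRIOR = 0.05  # symmetric Dirichlet prior Di(0.05, 0.05, 0.05, 0.05)


def posterior_ratio(o11, o1p, op1, n, draws=4000, seed=1):
    """Posterior median and 95% credibility interval of R(q) for one 2x2 table.

    Posterior: theta | table ~ Di(0.05 + O11, 0.05 + O12, 0.05 + O21, 0.05 + O22).
    A Dirichlet draw is a vector of independent Gamma(alpha_k, 1) variates
    divided by their sum.
    """
    cells = [o11, o1p - o11, op1 - o11, n - o1p - op1 + o11]
    rng = random.Random(seed)
    ratios = []
    for _ in range(draws):
        g = [rng.gammavariate(PRIOR + c, 1.0) for c in cells]
        total = sum(g)
        t11, t12, t21 = g[0] / total, g[1] / total, g[2] / total
        # R = P(in common) / P(in common by chance) = theta11 / (theta1+ * theta+1)
        ratios.append(t11 / ((t11 + t12) * (t11 + t21)))
    ratios.sort()
    lower = ratios[int(0.025 * draws)]
    upper = ratios[int(0.975 * draws) - 1]
    return statistics.median(ratios), lower, upper


# Hypothetical tables: 50 vs 5 genes in common, expected count 5 in both.
print(posterior_ratio(50, 100, 100, 2000))  # median near 10, CI excludes 1
print(posterior_ratio(5, 100, 100, 2000))   # CI includes 1
```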

REFERENCES
Allison et al. (2002). A mixture model approach for the analysis of microarray gene expression data. Computational Statistics and Data Analysis, 39, 1-20.
Baldi and Long (2001). A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics, 17, 509-519.
Ma et al. (2005). Bioinformatics identification of novel early stress response genes in rodent models of lung injury. Am J Physiol Lung Cell Mol Physiol, 289(3), 468-477.
  • DECISION RULE
  • For each threshold q, we draw a sample from the
    posterior distribution of the derived quantity R(q) and
    compute its 95% credibility interval (CI). We define q*
    as the threshold at which the posterior median of R(q)
    attains its maximum, considering only the thresholds
    whose CI does not include 1
  • R(q*) is then the ratio associated with q*
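The decision rule reduces to a filter followed by an argmax over per-threshold posterior summaries. A minimal sketch, assuming the summaries (median, lower and upper 95% limits of R(q)) have already been computed for each threshold; the name `choose_threshold` and the numbers in the example are hypothetical.

```python
def choose_threshold(summaries):
    """Apply the decision rule to posterior summaries of R(q).

    summaries -- dict mapping threshold q -> (median, lower, upper), where
                 (lower, upper) is the 95% credibility interval of R(q).
    Returns (q*, median R(q*)), or None when every interval includes 1,
    in which case no association is declared.
    """
    eligible = {q: s for q, s in summaries.items()
                if not (s[1] <= 1.0 <= s[2])}  # keep CIs that exclude 1
    if not eligible:
        return None
    q_star = max(eligible, key=lambda q: eligible[q][0])  # largest median
    return q_star, eligible[q_star][0]


# Hypothetical summaries: q = 0.01 has the largest median, but its CI includes 1.
summaries = {0.01: (1.6, 0.8, 2.9), 0.02: (1.4, 1.1, 1.8), 0.05: (1.2, 1.05, 1.35)}
print(choose_threshold(summaries))  # (0.02, 1.4)
```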

ACKNOWLEDGEMENTS: We would like to thank Natalia Bochkina, Alex Lewin and Anne-Mette Hein for helpful discussions. This work has been supported by a Wellcome Trust Functional Genomics Development Initiative (FGDI) thematic award, Biological Atlas of Insulin Resistance (BAIR), PC2910_DHCT.