Analysis of Differential Expression - PowerPoint PPT Presentation

Slides: 65
Provided by: fenBilk
Transcript and Presenter's Notes

Title: Analysis of Differential Expression


1
Analysis of Differential Expression
  • T-test
  • ANOVA
  • Non-parametric methods
  • Correlation
  • Regression

2
Research Question
  • Do nicotine-exposed rats have different X gene
    expression than control rats in ventral tegmental
    area?
  • Design an experiment in which treatment rats
    (N > 2) are exposed to nicotine and control rats
    (N > 2) are exposed to saline.
  • Collect RNA from VTA, convert to cDNA
  • Determine the amount of X transcript in each
    individual.
  • Perform a test of means considering the
    variability within each group.

3
Observed difference between groups
  • May be due to
  • Treatment
  • Chance

4
Hypothesis Testing
  • Null hypothesis: There is no difference between
    the means of the groups.
  • Alternative hypothesis: The means of the groups are
    different.

5
Hypothesis testing
  • You cannot accept the null hypothesis.
  • You can reject it.
  • You can fail to reject it (the data are
    consistent with it).

6
P-value
  • The P stands for probability. The P value is the
    probability of observing a difference between
    groups at least as large as the one seen if
    chance alone were operating (i.e., if the null
    hypothesis were true).

7
P-value
  • There is a significant difference between groups
    if the P value is small enough (e.g., < 0.05).
  • The significance threshold (e.g., 0.05), not the
    P value itself, is the probability of a type I
    error when the null hypothesis is true.
  • Type I error: wrongly concluding that there is a
    difference between groups (false positive).
  • Type II error: wrongly concluding that there is
    no difference between groups (false negative).

8
Multiple tests on the same data
  • Expression data on multiple genes from the same
    individuals
  • Subsets of genes are coregulated, so they are
    not independent.
  • Such data require multiple tests.

9
Why not do multiple t-tests? Or if you do, adjust
the p-values
  • Because it increases the type I error rate.
  • In a study involving four treatments, there are six
    possible pairwise comparisons.
  • If the chance of a type I error in one such
    comparison is 0.05, then the chance of not
    committing a type I error is 1 - 0.05 = 0.95.
  • If the six comparisons are independent, the chance
    of not committing a type I error in any of them
    is 0.95^6 = 0.74.
  • Cumulative type I error: 1 - 0.74 = 0.26
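
The arithmetic on this slide can be checked with a short Python sketch. It assumes the comparisons are independent, as the slide's calculation implicitly does; the function name `familywise_error` is illustrative and not from the presentation.

```python
from math import comb

def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_treatments = 4
n_pairs = comb(n_treatments, 2)  # number of pairwise comparisons
print("pairwise comparisons:", n_pairs)
print("cumulative type I error:", round(familywise_error(0.05, n_pairs), 2))
```

With coregulated genes the tests are not independent, so the true cumulative error rate may differ from this figure.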

10
Normal Distribution
  • It is entirely defined by two quantities: its
    mean and its standard deviation (SD).
  • The mean determines where the peak occurs and
  • the SD determines the spread (width) of the curve.

11
Curves: same mean, different SDs
12
Rules of normal distribution
  • 68.3% of the distribution falls within 1 SD of
    the mean (i.e. between mean - SD and mean + SD)
  • 95.4% of the distribution falls between mean - 2
    SD and mean + 2 SD
  • 99.7% of the distribution falls between mean - 3
    SD and mean + 3 SD.

13
Most commonly used rule
  • 95% of the distribution falls between mean - 1.96
    SD and mean + 1.96 SD
  • If the data are normally distributed, one can
    define a range (mean ± 1.96 SD) within which 95%
    of the data fall.

14
A sample
  • Samples vary
  • Samples are collected in limited numbers
  • They are representative of a population.
  • A sample
  • E.g., nicotine treated rat RNA

15
Sample means
  • Consider all possible samples of fixed size (n)
    drawn from a population.
  • Each of these samples has its own mean and these
    means will vary between samples.
  • Each sample will have its own distribution,
    and thus its own SD.

16
Population mean
  • The mean of all the sample means is equal to the
    population mean (μ).
  • The SD of the sample means measures the deviation
    of individual sample means from the population
    mean (μ).

17
Standard error
  • The standard error (SE) is the SD of the sample
    means, estimated as SE = SD / √n.
  • It reflects the effect of sample size: a larger
    SE means that either the variation is high or
    the sample size is small.
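
The behaviour of sample means described on the last three slides can be illustrated by simulation. The sketch below draws many samples from a hypothetical normal population (mean 100, SD 10; these numbers are assumptions chosen for illustration) and compares the SD of the sample means with SD / √n.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 100.0, 10.0  # hypothetical population parameters
n = 25                          # size of each sample
n_samples = 20000               # number of samples drawn

# Mean of each simulated sample
sample_means = [
    statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = POP_SD / n ** 0.5

print(f"mean of sample means: {mean_of_means:.2f}")
print(f"SD of sample means:   {sd_of_means:.2f}")
print(f"SD / sqrt(n):         {theoretical_se:.2f}")
```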

18
Confidence Intervals
  • A confidence interval gives a range of values
    within which it is likely that the true
    population value lies.
  • It is defined as follows:
  • 95% confidence interval: (sample mean - 1.96 SE)
    to (sample mean + 1.96 SE).
  • A 99% confidence interval is calculated as mean
    ± 2.58 SE.
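
As a sketch of the calculation, the following computes normal-based confidence intervals for a small sample. The data values are hypothetical, and the function name `confidence_interval` is an assumption. For small samples a multiplier from the t-distribution would usually replace the normal one.

```python
from statistics import NormalDist, fmean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-based confidence interval for the mean: mean ± z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    mean = fmean(sample)
    se = stdev(sample) / len(sample) ** 0.5
    return mean - z * se, mean + z * se

sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 5.1]  # hypothetical measurements
low95, high95 = confidence_interval(sample, 0.95)
low99, high99 = confidence_interval(sample, 0.99)
print(f"95% CI: {low95:.2f} to {high95:.2f}")
print(f"99% CI: {low99:.2f} to {high99:.2f}")
```

The 99% interval is wider than the 95% interval because its multiplier is larger.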

19
T-distribution
  • The t-distribution is similar in shape to the
    Normal distribution, being symmetrical and
    unimodal, but is generally more spread out with
    longer tails.
  • The exact shape depends on a quantity known as
    the degrees of freedom, which in this context
    is equal to the sample size minus 1.

20
T-distribution
21
One-sample t-test
  • Null hypothesis: The mean does not differ from
    the hypothesized mean, e.g., 0 (H0: μ = 0)
  • A t-statistic (t) is calculated.
  • t is the number of SEs that separate the sample
    mean from the hypothesized value.
  • The associated P value is obtained by comparison
    with the t distribution.
  • The larger the absolute value of the t-statistic,
    the lower the probability of obtaining such a
    value under the null hypothesis; thus the P value
    is smaller and the result more significant.
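
A minimal sketch of the t-statistic calculation described above, using hypothetical data. The function name `one_sample_t` is an assumption. Looking up the P value requires the t-distribution, which the Python standard library does not provide, so that step is left out here.

```python
from statistics import fmean, stdev

def one_sample_t(sample, hypothesized_mean=0.0):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    se = stdev(sample) / n ** 0.5                # standard error of the mean
    t = (fmean(sample) - hypothesized_mean) / se  # number of SEs from H0 value
    return t, n - 1

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical measurements
t, df = one_sample_t(sample, hypothesized_mean=1.0)
print(f"t = {t:.2f} on {df} degrees of freedom")
```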

22
Paired t-test
  • Used with paired data.
  • Paired data arise in a number of different
    situations,
  • a matched case-control study in which individual
    cases and controls are matched to each other, or
  • a repeated-measures study in which some
    measurement is made on the same set of
    individuals on more than one occasion
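
A paired t-test is commonly computed as a one-sample t-test on the within-pair differences. The sketch below follows that approach with hypothetical before/after values; the function name `paired_t` is an assumption, and the P value lookup against the t-distribution is omitted.

```python
from statistics import fmean, stdev

def paired_t(first, second):
    """Return (t, degrees of freedom) from the within-pair differences."""
    if len(first) != len(second):
        raise ValueError("paired data must have equal lengths")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = stdev(diffs) / n ** 0.5  # SE of the mean difference
    return fmean(diffs) / se, n - 1

before = [10.0, 12.0, 9.0, 11.0]   # hypothetical first measurements
after = [12.0, 15.0, 10.0, 14.0]   # hypothetical repeat measurements
t, df = paired_t(before, after)
print(f"t = {t:.2f} on {df} degrees of freedom")
```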

23
Paired t-test
24
Two-sample t-test
  • Comparison of two groups with unpaired data.
  • E.g., comparison of individuals of treatment and
    those of control for a particular variable.
  • Now there are two independent populations, and
    thus two SDs.

25
Calculation of pooled STD
  • The pooled SD for the difference in means is
    calculated as follows

26
Calculation of pooled SE
  • The combined SE gives more weight to the larger
    sample size (if sample sizes are unequal) because
    this is likely to be more reliable. The pooled SE
    for the difference in means is calculated as
    follows

27
Two sample T-test
  • Comparison of the means of two groups based on a
    t-statistic and Student's t-distribution.
  • The t-statistic is obtained by dividing the
    difference between the sample means by the
    standard error of the difference.

28
T-statistic
  • A P value may be obtained by comparison with the
    t distribution on n1 + n2 - 2 degrees of freedom.
  • Again, the larger the t statistic, the smaller
    the P value will be.

29
Example
30
Calculation of SD
31
Calculation of SE
32
T-statistic
  • t = (95 - 81)/2.41 = 14/2.41 = 5.81,
  • with a corresponding P value less than 0.0001.
  • Reject the null hypothesis that the group
    means do not differ.

33
Analysis of Variance
  • ANOVA
  • A technique for analyzing the way in which the
    mean of a variable is affected by different types
    and combinations of factors.
  • E.g., the effect of three different diets on
    total serum cholesterol

34
Sample Experiment
Variance
35
Sum of squares calculations
total
within
between
36
Degrees of freedom
37
Sources of variation
A P value of 0.0039 indicates that the means of at
least two of the treatment groups differ.
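
The sums of squares, degrees of freedom and F ratio named on the preceding slides can be sketched for a one-way layout as follows. The three groups are hypothetical and are not the cholesterol data from the presentation; the function name `one_way_anova` is an assumption.

```python
from statistics import fmean

def one_way_anova(groups):
    """Partition the total sum of squares and compute the F ratio."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total, "ss_within": ss_within, "ss_between": ss_between,
        "df_between": df_between, "df_within": df_within, "F": f_ratio,
    }

groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]  # hypothetical
result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The P value would then be read from the F distribution on (df_between, df_within) degrees of freedom.
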
38
Multiple Tests
  • Post hoc comparisons between pairs of treatments.
  • The overall type I error rate increases with the
    number of pairwise comparisons.
  • One has to maintain the 0.05 type I error rate
    across all of the comparisons.

39
Bonferroni Adjustment
  • Significance threshold per test: 0.05 / number of tests
  • Too conservative
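
A minimal sketch of the Bonferroni adjustment: each P value is compared with the overall threshold divided by the number of tests. The P values and the function name `bonferroni_significant` are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant at the Bonferroni-adjusted threshold."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

p_values = [0.001, 0.02, 0.04]  # hypothetical P values from three comparisons
flags = bonferroni_significant(p_values)
print("adjusted threshold:", round(0.05 / len(p_values), 4))
print(flags)
```

Here the second and third comparisons would pass an unadjusted 0.05 threshold but fail the adjusted one, which shows why the method is considered conservative.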

40
NonParametric methods
  • Many statistical methods require assumptions.
  • The t-test requires that the data be normally
    distributed.
  • Data that are not normally distributed may
    require transformation.
  • Nonparametric methods require few or no
    distributional assumptions.

41
Wilcoxon signed rank test for paired data
42
Wilcoxon signed rank test
43
Central venous oxygen saturation on admission and
after 6 h into ICU.
  • Take the difference between the paired data
    points.
  • Patients have SvO2 values on admission and after
    6 hours.

44
Central venous oxygen saturation on admission and
after 6 h into ICU.
  • Rank differences regardless of their sign.
  • Give each rank the sign of its difference.

45
Calculate
  • Sum of positive ranks (R+)
  • Sum of negative ranks (R-)
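
The steps on the last three slides (take differences, rank them regardless of sign, restore the signs, sum the positive and negative ranks) can be sketched as below. The SvO2 values are hypothetical, not those from the presentation. Dropping zero differences and averaging tied ranks are common conventions assumed here, and the helper names are illustrative.

```python
def average_ranks(values):
    """Rank values in increasing order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_sums(first, second):
    """Return (R+, R-) for paired data; zero differences are dropped."""
    diffs = [b - a for a, b in zip(first, second) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return r_plus, r_minus

admission = [50, 60, 55, 70, 65]   # hypothetical SvO2 on admission
after_6h = [55, 58, 62, 80, 64]    # hypothetical SvO2 after 6 hours
r_plus, r_minus = signed_rank_sums(admission, after_6h)
print("R+ =", r_plus, "R- =", r_minus)
```

The smaller of the two sums is then compared with a table of critical values for the number of non-zero differences.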

46
Sum of positive and negative ranks
47
Critical values for WSR test when n = 10
5
48
Wilcoxon rank-sum or Mann-Whitney test
  • The Wilcoxon signed rank test is suited to paired data.
  • For unpaired data, the Wilcoxon rank-sum test is used.

49
Steps of Wilcoxon rank-sum test
50
Total drug doses in patients with a 3 to 5 day
stay in intensive care unit.
  • Rank all observations in increasing order
    regardless of grouping.
  • Use the average rank if values tie.
  • Add up the ranks within each group.
  • Select the smaller rank sum and calculate a
    P value for it.
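
The ranking steps listed above can be sketched as follows. The dose values are hypothetical, the helper names are illustrative, and the final P value lookup (from critical-value tables or a normal approximation) is not shown.

```python
def average_ranks(values):
    """Rank values in increasing order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(group1, group2):
    """Rank the pooled observations and return each group's rank sum."""
    ranks = average_ranks(list(group1) + list(group2))
    sum1 = sum(ranks[:len(group1)])
    sum2 = sum(ranks[len(group1):])
    return sum1, sum2

group_a = [3, 5, 8]        # hypothetical total doses, group A
group_b = [4, 9, 10, 12]   # hypothetical total doses, group B
sum_a, sum_b = rank_sums(group_a, group_b)
print("rank sums:", sum_a, sum_b, "smaller:", min(sum_a, sum_b))
```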

51
Critical values
52
Correlation and Regression
  • Correlation quantifies the strength of the
    relationship between two paired samples.
  • Regression expresses the relationship in the form
    of an equation.
  • Example: whether two genes, X and Y, are
    coregulated, or whether the expression level of
    gene X can be predicted based on the expression
    level of gene Y.

53
Product moment correlation
r lies between -1 and 1
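
A minimal sketch of the product moment (Pearson) correlation coefficient, computed from deviations about the means. The expression values and the function name `pearson_r` are hypothetical.

```python
from statistics import fmean

def pearson_r(x, y):
    """Product moment correlation coefficient between paired samples."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

gene_x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical expression levels
gene_y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical expression levels
print(f"r = {pearson_r(gene_x, gene_y):.3f}")
```
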
54
Age and urea for 20 patients in emergency unit
55
Scattergram
r = 0.62
56
Confidence intervals around r
57
Confidence of r
58
(No Transcript)
59
Misuse of correlation
  • There may be a third variable to which both
    variables are related.
  • It does not imply causation.
  • A nonlinear relationship may exist.

60
Regression
61
Method of least squares
  • The regression line is obtained using the method
    of least squares. Any line y = a + bx that we
    draw through the points gives a predicted or
    fitted value of y for each value of x in the
    dataset.
  • For a particular value of x, the vertical
    difference between the observed and the fitted
    value of y is known as the deviation or residual.
  • The method of least squares finds the values of
    a and b that minimize the sum of squares of all
    deviations.
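
The least squares estimates for the line y = a + bx have a closed form, sketched below with hypothetical data. The function names are illustrative and the age and urea data from the presentation are not reproduced.

```python
from statistics import fmean

def least_squares_line(x, y):
    """Return intercept a and slope b minimizing the sum of squared residuals."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx       # slope
    a = my - b * mx     # intercept: the line passes through (mean x, mean y)
    return a, b

def residuals(x, y, a, b):
    """Vertical differences between observed and fitted y values."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

x = [1.0, 2.0, 3.0, 4.0]   # hypothetical predictor values
y = [3.2, 4.9, 7.1, 8.8]   # hypothetical responses
a, b = least_squares_line(x, y)
print(f"y = {a:.2f} + {b:.2f}x")
print("residuals:", [round(r, 2) for r in residuals(x, y, a, b)])
```

With an intercept in the model, the residuals from a least squares fit sum to zero, which is a quick check on the arithmetic.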

62
Age and urea level
63
Residuals
64
Method of least squares