Biostatistics and Computer Applications - PowerPoint PPT Presentation

Slides: 52
Provided by: dafen
1
Biostatistics and Computer Applications
  • Hypothesis test
  • One sample
  • Two independent samples
  • Paired t test
  • SAS programming
  • 12/31/2002

2
Recap (sampling distribution)
Variable X: $X \sim N(\mu, \sigma^2)$
Sample mean: $\bar{x} \sim N(\mu, \sigma^2/n)$
3
Recap (sampling distribution)
Sample mean
Sample variance
4
Statistical inference
  • Definition
  • The method of inferring a population parameter
    from a sample statistic, based on probability
    theory and the sampling distribution.
  • Hypothesis test
  • Parameter estimation

5
Parameter estimation
  • Point estimation and interval estimation
  • Interval estimation: according to the sampling
    distribution, set an interval $[L_1, L_2]$ for a
    parameter $\theta$ of a population such that the
    probability that $\theta$ lies within the interval
    is $1 - \alpha$, i.e. $P(L_1 \le \theta \le L_2) = 1 - \alpha$.
  • $L_1$ and $L_2$ are the parameter's confidence
    limits, $[L_1, L_2]$ is the confidence interval
    (CI), and $1 - \alpha$ is the confidence
    coefficient.
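
As a rough illustration of interval estimation, the Python sketch below computes a 95% confidence interval for a mean when the population standard deviation is known. The function name `mean_confidence_interval` is hypothetical, and the inputs (sample mean 315 g, standard deviation 25 g, n = 4) are borrowed from the biomass example that appears later in these slides.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (L1, L2), the confidence limits for a mean with known sigma."""
    alpha = 1 - confidence
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Assumed inputs: biomass example (x-bar = 315 g, sigma = 25 g, n = 4).
L1, L2 = mean_confidence_interval(sample_mean=315, sigma=25, n=4)
print(f"95% CI: [{L1:.1f}, {L2:.1f}] g")
```

Under these assumptions the interval is roughly 290.5 g to 339.5 g, which contains the hypothesized mean of 300 g.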

6
Hypothesis test
  • Hypothesis test
  • Based on existing knowledge, we set null and
    alternative hypotheses for a statistical
    population, then calculate a test statistic,
    determine its probability, and conclude whether
    to reject or accept the null hypothesis based
    on that probability.
  • Example
  • Test two fertilizers. We measured 20 plants per
    treatment. The mean biomasses are 502 g and
    550 g, so the difference in biomass is 48 g. Is
    this difference due to a difference in
    population means or just to random
    sampling error?

7
Hypothesis test
  • Research hypotheses, as opposed to statistical
    hypotheses, are the research questions that drive
    the research
  • e.g., lowering the fat in a person's diet lowers
    blood cholesterol levels
  • e.g., climate warming increases soil
    respiration because plants produce more
    biomass.

8
Hypothesis test
  • Statistical hypotheses are specific research
    hypotheses that are stated in such a way that
    they may be evaluated by appropriate statistical
    techniques.
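  • Example: We know the biomass (g) of a plant is
    normally distributed, $N(\mu_0 = 300, \sigma^2 = 625)$. We
    watered the plants and measured 4 of them; the
    sample mean biomass was $\bar{x} = 315$ g. Is the
    difference ($315 - 300 = 15$ g) caused by sampling error?
  • Null hypothesis, $H_0$: $\mu = \mu_0 = 300$ g.
  • Alternative hypothesis, $H_A$: $\mu \ne \mu_0$.
  • Under $H_0$, the sample mean has the sampling
    distribution $\bar{x} \sim N(300, 625/4)$.
  • So we can calculate
    $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n}) = (315 - 300)/(25/\sqrt{4}) = 1.2$.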

9
The ideas in hypothesis testing are based on
deductive reasoning: we assume that some
probability model is true and then ask, "What are
the chances that these observations came from
that probability model?"
10
Steps to Hypothesis Testing
Question: We know the biomass (g) of a plant is
normally distributed, $N(\mu_0 = 300, \sigma^2 = 625)$. We
watered the plants and measured 4 of them; the
sample mean biomass was $\bar{x} = 315$ g. Is the
difference ($315 - 300 = 15$ g) caused by sampling
error?
1. Develop hypotheses.
a. Null hypothesis: generally, the hypothesis that
the unknown parameter equals a fixed value.
$H_0$: $\mu = \mu_0 = 300$ g.
b. Alternative hypothesis: contradicts the null
hypothesis; there is a difference between these
two values. $H_A$: $\mu \ne \mu_0$.
Typically, the alternative hypothesis is the
thing you are trying to prove. You want to prove
that watering increased biomass. The null
hypothesis is a "straw man".
11
Steps to Hypothesis Testing
2. Set the significance level, $\alpha = 0.05$
($z_{0.05} = 1.96$) or $0.01$.
3. Determine the appropriate test statistic. Under
the null hypothesis, we assume this sample is drawn
from $X \sim N(300, 625)$, so
$\bar{x} \sim N(300, 625/4)$. We can use the z
distribution to calculate the probability ($p$) that
the difference between the sample mean and the
population mean was caused by random sampling error.
4. Compare $p$ with $\alpha$. If $p < \alpha$ (0.05
or 0.01), reject $H_0$ and accept $H_A$, and call the
difference between $\mu$ and $\mu_0$ significant (or
very significant); otherwise ($p \ge \alpha$), accept
the null hypothesis $H_0$. Here $p = 0.2302 > 0.05$,
i.e. there is a 23% chance of getting a sample mean at
least as far from 300 g as $\bar{x} = 315$ g when the
population mean is 300 g. We conclude that watering
did not significantly change biomass.
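
The four steps can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `z_test_two_sided` is hypothetical, and the inputs are the biomass numbers from the slide (mean 300 g, variance 625, n = 4, sample mean 315 g).

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(sample_mean, mu0, sigma, n):
    """Return (z, p) for a two-sided one-sample z test with known sigma."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Probability of a sample mean at least this far from mu0 in either direction.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


z, p = z_test_two_sided(sample_mean=315, mu0=300, sigma=25, n=4)
alpha = 0.05
print(f"z = {z:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "do not reject H0")
```

The computed p-value is about 0.23, matching the slide's 0.2302 up to table rounding, so the null hypothesis is not rejected at the 0.05 level.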
12
Steps to Hypothesis Testing
Probability theory used in this decision
making: If an event has a very small probability,
it will occur now and then over many samplings,
but in a single sampling it should not occur.
If it does occur, we conclude that either the
event really has a small probability and something
unusual happened (with probability $\alpha$), or
the event is not a small-probability event. Here
we set $\alpha$ to 0.05 or 0.01. If $p < \alpha$,
we reject $H_0$, with a small chance that we are
making a mistake.
13
Steps (summary)
  • Development of hypothesis from knowledge base
  • Set the significance level
  • Determine and generate test statistic and
    calculate probability of the difference caused by
    random error
  • Make Conclusions

14
Example of Hypothesis Test
Packing machine example: The weight (kg) of a sugar
bag packed by a machine is $X \sim N(100, 2)$ under
normal working conditions. One day, we sampled
$n = 4$ bags and got a sample mean of 101 kg. Was the
machine working properly that day?
1. Null hypothesis $H_0$: $\mu = \mu_0 = 100$ kg. The
mean weight packed that day was the same as under
normal conditions.
Alternative hypothesis $H_A$: $\mu \ne \mu_0$
($\mu \ne 100$ kg). The mean weight packed that day
was different from the mean under normal conditions.
15
Example of Hypothesis Test
  • 2. Set the significance level, $\alpha = 0.05$, $Z_{0.05} = 1.96$
  • 3. Calculate the test statistic, the Z value
  • 4. Because $|Z| > Z_{0.05}$ (this means that the
    probability that the difference was caused by
    sampling error is smaller than 0.05), we reject
    $H_0$ and accept $H_A$: the machine was not
    working properly that day.

16
Determination of Statistical Significance
  • Two ways:
  • (1) Calculate the test statistic $Z$ and compare it
    with the critical value at an $\alpha$ level of
    0.05.
  • If $|Z| > 1.96$, then $H_0$ is rejected and the
    results are declared statistically significant
    (i.e., $p < 0.05$).
  • Otherwise, $H_0$ is accepted and the results are
    declared not statistically significant (i.e.,
    $p \ge 0.05$). We refer to this approach as the
    critical-value method.
  • (2) The exact p-value can be computed, and if
    $p < 0.05$, then $H_0$ is rejected and the results
    are declared statistically significant. Otherwise,
    if $p \ge 0.05$, then $H_0$ is accepted and the
    results are declared not statistically significant.
    We will refer to this approach as the p-value
    method.
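
The two methods always lead to the same decision for a given test statistic. The sketch below illustrates this for a two-sided z test; the function names and the example z values are hypothetical choices made for the illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def reject_by_critical_value(z, alpha=0.05):
    """Critical-value method: compare |z| with the two-sided critical value."""
    z_critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return abs(z) > z_critical


def reject_by_p_value(z, alpha=0.05):
    """p-value method: compare the exact two-sided p-value with alpha."""
    p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p < alpha


# Assumed example statistics: one inside and one outside the acceptance region.
for z in (1.2, 2.5):
    print(z, reject_by_critical_value(z), reject_by_p_value(z))
```

For z = 1.2 neither method rejects the null hypothesis, and for z = 2.5 both do.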

17
Significance Levels
  • Level of significance, $\alpha$, is the probability
    of declaring the test statistic significantly
    different when $H_0$ is actually true.
  • Statistical significance may differ from
    biological significance
  • Distribution of the test statistic is divided
    into rejection and acceptance regions
  • We want to keep $\alpha$ low
  • Typically 0.05 or 0.01

18
Guidelines for Judging the Significance
If $p > 0.05$, then the results are considered not
statistically significant (sometimes denoted by NS).
If $0.01 < p < 0.05$, then the results are
significant (denoted by *).
If $0.001 < p < 0.01$, then the results are highly
significant (denoted by **).
If $p < 0.001$, then the results are very highly
significant (denoted by ***).
However, if $0.05 < p < 0.10$, then a trend toward
statistical significance is sometimes noted.
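
These guidelines amount to a simple lookup. The following sketch is one way to write it; the function name `significance_label` is hypothetical, the star symbols follow the usual convention, and the treatment of values exactly on a boundary (which the guidelines do not specify) is an assumption.

```python
def significance_label(p):
    """Map a p-value to the conventional significance label."""
    if p < 0.001:
        return "***"  # very highly significant
    if p < 0.01:
        return "**"   # highly significant
    if p < 0.05:
        return "*"    # significant
    return "NS"       # not statistically significant


# p-values quoted elsewhere in these slides.
for p in (0.2302, 0.0141, 0.0074):
    print(p, significance_label(p))
```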
19
Region of acceptance and rejection
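[Figure: sampling distribution of $\bar{x}$ with the corresponding $z$ axis marked at $-1.96$, $0$ and $1.96$. The central region is labelled "Accept $H_0$" and the two tails are labelled "Reject $H_0$".]
  • A computed value of the test statistic that falls
    in the rejection region is said to be
    statistically significant, or just significant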
20
Two sides or one side test
  • Two-tailed test
  • There are two regions of rejection.
  • Most common, as either $\bar{y} > \mu$ or $\bar{y} < \mu$ is possible
  • One-tailed test
  • When do we use a one-tailed test?
  • Based on our knowledge, we can only reject $H_0$ on
    one side. Example 1: the lifetime of a
    calculator. We want to test if $\mu < \mu_0$. So
    $H_0$: $\mu \ge \mu_0$, $H_A$: $\mu < \mu_0$.
    Example 2: the toxin concentration in grain. We
    care if $\mu > \mu_0$. So $H_0$: $\mu \le \mu_0$,
    $H_A$: $\mu > \mu_0$.
  • Only one region of rejection ($H_0$: $\mu \ge \mu_0$,
    left side; $H_0$: $\mu \le \mu_0$, right side).
  • Same test as the two-tailed test, except that the
    $u_{0.05}$ or $t_{0.05}$ critical value changes
    (for $u$, from 1.96 to 1.645), so it is easier to
    reject $H_0$.
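
The change in critical value from 1.96 to 1.645 follows directly from where the rejection probability is placed. A minimal check with the standard normal distribution (variable names are illustrative):

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# Two-tailed: alpha is split between both tails, alpha/2 in each.
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)

# One-tailed: all of alpha sits in a single tail.
one_tailed_critical = std_normal.inv_cdf(1 - alpha)

print(f"two-tailed: {two_tailed_critical:.3f}")
print(f"one-tailed: {one_tailed_critical:.3f}")
```

Because the one-tailed critical value is smaller, a test statistic between 1.645 and 1.96 is significant in a one-tailed test but not in a two-tailed one.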

21
Two sides or one side test
Poison concentration in grain
Life of calculator
22
Interpreting Results of Hypothesis Testing
  • We cannot prove hypotheses, only provide
    support for either the null or alternative
  • i.e., we accept or fail to reject the null
    hypothesis if the test statistic indicates that
    the two groups may be from the same population or
    we reject the null hypothesis if the test
    statistic indicates that they may be from
    different populations
  • Hypothesis testing results are couched in
    probability terms, since we can never be 100% sure

23
Types of Error
  • Type I error ($\alpha$ error): $\alpha$ is the probability of
    rejecting a true null hypothesis
  • Reflected in the level of significance
  • Typical values are 0.05 or 0.01
  • Type II error ($\beta$ error): $\beta$ is the probability of
    accepting a false null hypothesis
  • Only an issue if we fail to reject the null
    hypothesis
  • Typical values are 0.10 or 0.20
  • Power of a test, $1 - \beta$, is the probability of
    correctly rejecting a false null hypothesis
  • Typical values are 0.80 or 0.90

24
Types of Error
Decision / Action: Either $H_0$ or $H_A$ must be true.
Based on the data we will choose one of these
hypotheses. But what if we choose wrong?! What is
the probability of that happening?
  • $\alpha$ = significance level = $P(\text{reject } H_0 \mid H_0 \text{ true})$
  • $1 - \beta$ = power = $P(\text{reject } H_0 \mid H_A \text{ true})$
25
Types of Error
[Figure labels: $\alpha$, $\beta$, set point]
How to decrease both the alpha and beta errors?
1). Decrease the standard error through good
experimental design and a large sample size.
2). Set $\alpha = 0.05$. If $p = 0.1$, you may run
more experiments to test your hypothesis further
rather than accept $H_0$.
26
Hypothesis test for means
  • Single mean
  • Two means
  • Independent samples
  • Paired test

27
Testing a Single Mean
  • Test to compare the mean of a normal distribution
    against a prespecified value, such as a
    population mean
  • Test statistic is $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$
  • for $H_0$: $\mu = \mu_0$ vs $H_A$: $\mu \ne \mu_0$
  • with $\sigma$ unknown and $n < 30$.
  • Reject if $|t| > t_{\alpha}(n-1)$, accept otherwise
  • $t$ is called a test statistic
  • $t_{\alpha}(n-1)$ is called a critical value

28
Example of one sample mean test
Cholesterol example: Suppose the population mean is
$\mu_0 = 211$ mg/ml, and a sample gives
$\bar{x} = 220$ mg/ml, $s = 38.6$ mg/ml, $n = 25$ (town).
$H_0$: $\mu = 211$ mg/ml
$H_A$: $\mu \ne 211$ mg/ml
For an $\alpha = 0.05$ test we use the critical
value determined from the $t(24)$ distribution.
Since $t = 1.17 < 2.064$, the difference is not
statistically significant at the $\alpha = 0.05$ level
and we fail to reject $H_0$.
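
The arithmetic behind $t = 1.17$ can be checked with a short sketch. The function name `one_sample_t` is hypothetical; the summary statistics and the critical value 2.064 are the ones quoted on the slide.

```python
from math import sqrt


def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for a one-sample test from summary statistics."""
    return (sample_mean - mu0) / (s / sqrt(n))


t = one_sample_t(sample_mean=220, mu0=211, s=38.6, n=25)
T_CRITICAL_24 = 2.064  # two-tailed, alpha = 0.05, df = 24 (value from the slide)

print(f"t = {t:.2f}")
print("reject H0" if abs(t) > T_CRITICAL_24 else "fail to reject H0")
```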
29
Mathematical model of one sample mean
When $n > 30$ or the variance is known, use the z
test. Otherwise, use the t test.
30
Two-sample t-test for Independent Samples
  • Assume two independent samples
  • Sample 1: $n_1$, $\bar{x}_1$. Sample 2: $n_2$, $\bar{x}_2$.
  • $H_0$: $\mu_1 = \mu_2$ vs $H_A$: $\mu_1 \ne \mu_2$

31
Two-sample t-test for Independent Samples
  • 1) $\sigma^2$ known:
  • use z statistics
  • 2) $\sigma^2$ unknown, but $n > 30$:
  • use z statistics
  • 3) $\sigma^2$ unknown and $n < 30$, but $\sigma_1^2 = \sigma_2^2$:
  • use t statistics ($df = n_1 - 1 + n_2 - 1$)

32
Two-sample t-test for Independent Samples
  • 4) $\sigma^2$ unknown and $n < 30$, $\sigma_1^2 \ne \sigma_2^2$:
  • use t statistics.

Satterthwaite approximation for d.f.
33
Example of two-sample t-test
We test the hypothesis of equal means for the
two populations, assuming a common variance.
$H_0$: $\mu_1 = \mu_2$, $H_A$: $\mu_1 \ne \mu_2$
34
Example of two-sample t-test
35
Example of two-sample t-test
When we aren't willing to assume that the
variances are equal, we can still test the
population means and use the sample variances.
We use the test statistic
$t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$
Are the variances equal or not?
36
Strategy for Testing Equality of Means
  • Test equality of variances
  • If not equal, use t-test for unequal variances
  • If equal, use t-test for equal variances

37
Mathematical model for two means of independent
samples
  • Two samples

Z test for known variance and t test
38
Paired t-Test
  • Test to deal with two observations with strong
    comparability (e.g. two treatments on the same
    individuals, one individual before vs. after
    treatment, or very close plots)
  • Sample 1: $X_{11}, X_{12}, \ldots, X_{1n}$
  • Sample 2: $X_{21}, X_{22}, \ldots, X_{2n}$
  • Method
  • Calculate the difference between the two
    measurements for each individual: $d_i = X_{1i} - X_{2i}$
  • Calculate $t = \bar{d}/(s_d/\sqrt{n})$, with $df = n - 1$

39
Paired t-Test
40
Example of paired t-test
Test the infectivity of a virus on tobacco leaves.
Number of dead spots on leaves.
41
Advantage of paired test
  • 1). Usually the standard error of the differences
    is smaller, so it is easier to detect a small
    true difference.
  • 2). We do not need to consider whether the
    variances of the two populations are the same.

42
Mathematical model of paired t test
43
SAS programming (hypothesis test)
  • One sample t-test (PROC MEANS)
  • Two samples t-test (independent, PROC TTEST) 
  • Two samples t-test (Paired samples, PROC MEANS) 

44
One sample t-test (PROC MEANS)
  • Example 1. We measured the light level on the floor
    under a tree canopy 4 times: 3.4, 2.8, 3.5, and
    4.1 klx. According to the Beer-Lambert law, the
    theoretical value for this measurement is 3.0 klx.
    Does the result here differ from the theoretical
    value?

45
One sample t-test (PROC MEANS)
  OPTIONS LINESIZE=80 NODATE;
  DATA new;
    INPUT x;
    dx = x - 3;   /* difference from the theoretical value */
  DATALINES;
  3.4
  2.8
  3.5
  4.1
  ;
  PROC MEANS N MEAN STD T PRT;
    VAR dx;
  RUN;

Analysis Variable : dx

 N        Mean     Std Dev   t Value   Pr > |t|
-----------------------------------------------
 4   0.4500000   0.5322906      1.69     0.1895
-----------------------------------------------
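
As a cross-check on the SAS output, the same one-sample test can be sketched in Python with only the standard library. Because the standard library has no t distribution, the two-sided p-value here is obtained by numerically integrating the t density with Simpson's rule; the helper names `t_pdf` and `two_sided_p` are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density on [0, |t|] by Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area


light = [3.4, 2.8, 3.5, 4.1]          # klx, from Example 1
dx = [x - 3.0 for x in light]         # differences from the theoretical value
n = len(dx)
t = mean(dx) / (stdev(dx) / sqrt(n))  # one-sample t statistic
p = two_sided_p(t, df=n - 1)
print(f"mean = {mean(dx):.4f}, sd = {stdev(dx):.4f}, t = {t:.2f}, p = {p:.4f}")
```

The mean, standard deviation, t value and p-value agree with the PROC MEANS output, so the measured light level does not differ significantly from the theoretical value.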
46
Two samples t-test (independent, PROC TTEST) 
  PROC TTEST DATA=data-set;
    CLASS variable;   /* identifies the variable that divides the data set
                         into two groups; the variable must have only two
                         values (numeric or character) */
    VAR variables;
    PAIRED a*b;       /* a-b */
  RUN;

47
Two samples t-test (independent, PROC TTEST) 
  • PROC TTEST options
  • ALPHA=p (default p = 0.05)
  • DATA=SAS-data-set
  • H0=m (default m = 0)

48
Two samples t-test (independent, PROC TTEST) 
  OPTIONS LINESIZE=78 NODATE;
  TITLE 't-test for independent samples';
  DATA mydata;
    INPUT treatment $ x @@;   /* $ reads treatment as character; @@ reads several records per line */
  DATALINES;
  C 80 C 93 C 83 C 89 C 98 T 100
  T 103 T 104 T 99 T 102
  ;
  PROC TTEST DATA=mydata;
    CLASS treatment;
    VAR x;
  RUN;

49
Two samples t-test (independent, PROC TTEST) 
T-Tests

Variable   Method          Variances      DF   t Value   Pr > |t|
x          Pooled          Equal           8     -3.83     0.0050
x          Satterthwaite   Unequal      4.64     -3.83     0.0141

Equality of Variances

Variable   Method     Num DF   Den DF   F Value   Pr > F
x          Folded F        4        4     12.40   0.0318
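
The test statistics in this output can be reproduced by hand. The sketch below recomputes the pooled t, the unequal-variance t with its Satterthwaite degrees of freedom, and the folded F from the same ten observations. It uses only the Python standard library and does not compute p-values; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

control = [80, 93, 83, 89, 98]       # treatment C
treated = [100, 103, 104, 99, 102]   # treatment T

n1, n2 = len(control), len(treated)
m1, m2 = mean(control), mean(treated)
v1, v2 = variance(control), variance(treated)  # sample variances

# Folded F: larger sample variance over the smaller one.
f_value = max(v1, v2) / min(v1, v2)

# Pooled (equal-variance) t test.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Unequal-variance t test with Satterthwaite degrees of freedom.
q1, q2 = v1 / n1, v2 / n2
t_unequal = (m1 - m2) / sqrt(q1 + q2)
df_satterthwaite = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))

print(f"F = {f_value:.2f}")
print(f"pooled:        t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Satterthwaite: t = {t_unequal:.2f}, df = {df_satterthwaite:.2f}")
```

Since the folded F test rejects equal variances (Pr > F = 0.0318), the Satterthwaite row is the one to read, following the strategy of testing the variances first.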

50
Two samples t-test (Paired samples, PROC MEANS) 
  • Example 3. We want to test the effect of light on
    sunflower photosynthesis. We set two light
    levels, 800 and 1200 µmol photon m-2 s-1. We
    selected 6 leaves. For each leaf, we measured one
    side at the 800 light level and the other side at
    the 1200 light level. The data are shown below. Do
    the different light levels have different effects
    on sunflower photosynthesis?
  DATA t_paired;
    INPUT p800 p1200;
    p = p1200 - p800;   /* paired difference for each leaf */
  DATALINES;
  90 95
  87 92
  100 104
  80 89
  95 101
  90 105
  ;
51
Two samples t-test (Paired samples, PROC MEANS) 
  PROC MEANS N MEAN STDERR T PRT;
    VAR p;
  RUN;

Analysis Variable : p

 N        Mean   Std Error   t Value   Pr > |t|
-----------------------------------------------
 6   7.3333333   1.6865481      4.35     0.0074
-----------------------------------------------
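
The paired test reduces to a one-sample t test on the per-leaf differences, which the following sketch reproduces from the six data pairs. The critical value 2.571 (two-tailed, $\alpha = 0.05$, df = 5) is taken from a standard t table and is an assumption not stated on the slides.

```python
from math import sqrt
from statistics import mean, stdev

p800 = [90, 87, 100, 80, 95, 90]
p1200 = [95, 92, 104, 89, 101, 105]

# Difference for each leaf, matching p = p1200 - p800 in the DATA step.
diffs = [high - low for low, high in zip(p800, p1200)]
n = len(diffs)

d_bar = mean(diffs)
std_err = stdev(diffs) / sqrt(n)
t = d_bar / std_err

T_CRITICAL_5 = 2.571  # assumed table value: two-tailed, alpha = 0.05, df = 5

print(f"mean = {d_bar:.7f}, std error = {std_err:.7f}, t = {t:.2f}")
print("reject H0" if abs(t) > T_CRITICAL_5 else "fail to reject H0")
```

The mean difference, standard error and t value match the PROC MEANS output, and the t value exceeds the critical value, in line with the reported Pr > |t| of 0.0074.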