Title: Biostatistics and Computer Applications
1 Biostatistics and Computer Applications
- Hypothesis test
- One sample
- Two-sample independent
- Paired t-test
- SAS Programming
- 12/31/2002
2 Recap (sampling distribution)
- Variable X ~ N(μ, σ²)
- Sample mean x̄ ~ N(μ, σ²/n)
3 Recap (sampling distribution)
- Sample mean: x̄ = Σxᵢ / n
- Sample variance: s² = Σ(xᵢ − x̄)² / (n − 1)
4 Statistical inference
- Definition
- The method of inferring a population (parameter) from a sample (statistic) according to probability theory and the sampling distribution.
- Hypothesis test
- Parameter estimation
5 Parameter estimation
- Point estimation and interval estimation
- Interval estimation: according to the sampling distribution, set an interval (L1, L2) for a parameter μ of a population such that the probability that μ lies within the interval is 1 − α, i.e. P(L1 ≤ μ ≤ L2) = 1 − α.
- L1 and L2 are the parameter's confidence limits, (L1, L2) is the confidence interval (CI), and 1 − α is the confidence coefficient (see the SAS sketch below).
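As a minimal SAS sketch of interval estimation (the data set name mydata and variable x are placeholders, not from the slides), PROC MEANS can report the confidence limits of a mean directly:

  PROC MEANS DATA=mydata N MEAN STD LCLM UCLM ALPHA=0.05;  /* LCLM/UCLM are L1 and L2 for a 95% CI */
    VAR x;
  RUN;

With ALPHA=0.05 the reported limits correspond to a confidence coefficient of 1 − α = 0.95.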
6 Hypothesis test
- Hypothesis test
- Based on certain knowledge, we set null and alternative hypotheses for a statistical population, then calculate a test statistic, determine the probability, and make a conclusion to either reject or accept a hypothesis based on that probability.
- Example
- Test two fertilizers. We measured 20 plants per treatment. The mean biomasses are 502 g and 550 g, so the difference in biomass is 48 g. Is this difference due to a difference in population means, or just due to random sampling error?
7 Hypothesis test
- Research hypotheses, as opposed to statistical hypotheses, are the research questions that drive the research.
- e.g., lowering the fat in a person's diet lowers blood cholesterol levels.
- e.g., climate warming increases soil respiration because plants produce more biomass.
8 Hypothesis test
- Statistical hypotheses are specific research hypotheses stated in such a way that they can be evaluated by appropriate statistical techniques.
- Example: We know the biomass (g) of a plant is normally distributed, X ~ N(300, 625). We watered the plants and measured 4 of them, and the sample mean biomass is x̄ = 315 g. Is the difference (315 − 300 = 15 g) caused by sampling error?
- Null hypothesis, H0.
- Alternative hypothesis, HA.
- Under H0, we have a sampling distribution of the mean, x̄ ~ N(300, 625/4),
- so we can calculate the probability of a difference this large arising by chance.
9 The ideas in hypothesis testing are based on deductive reasoning: we assume that some probability model is true and then ask, "What are the chances that these observations came from that probability model?"
10 Steps to Hypothesis Testing
Question: We know the biomass (g) of a plant is normally distributed, X ~ N(300, 625). We watered the plants and measured 4 of them, and the sample mean biomass is x̄ = 315 g. Is the difference (315 − 300 = 15 g) caused by sampling error?
1. Develop hypotheses.
a. Null hypothesis: generally, the hypothesis that the unknown parameter equals a fixed value, H0: μ = μ0 = 300 g.
b. Alternative hypothesis: contradicts the null hypothesis, i.e. there is a difference between these two values, HA: μ ≠ μ0.
Typically, the alternative hypothesis is the thing you are trying to prove; you want to prove that watering increased biomass. The null hypothesis is a straw man.
11 Steps to Hypothesis Testing
2. Set the significance level, α = 0.05 (or 0.01); z0.05 = 1.96.
3. Determine the appropriate test statistic. Under the null hypothesis, we assume this sample is drawn from X ~ N(300, 625), so x̄ ~ N(300, 625/4). We can use the z distribution to calculate the probability (p) that the difference between the sample mean and the population mean was caused by random sampling error.
4. Compare p with α. If p < α (0.05 or 0.01), reject H0 and accept HA, and call the difference between μ and μ0 significant (or highly significant); otherwise (p > α = 0.05), accept the null hypothesis H0. Here 0.2302 > 0.05, i.e. there is a 23% chance of getting x̄ = 315 g from a population with mean 300 g. We conclude that watering did not change biomass.
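The z value and p-value quoted above can be reproduced with a short SAS data step (a minimal sketch; the data set name biomass_z is just a placeholder, and the numbers are the ones given on this slide):

  DATA biomass_z;
    mu0 = 300; sigma = 25; n = 4; xbar = 315;  /* values from the watering example */
    se = sigma / SQRT(n);                      /* standard error = 12.5 */
    z  = (xbar - mu0) / se;                    /* z = 1.2 */
    p  = 2 * (1 - PROBNORM(ABS(z)));           /* two-sided p, approximately 0.23 */
  RUN;
  PROC PRINT DATA=biomass_z; RUN;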
12 Steps to Hypothesis Testing
The probability reasoning used in this decision making: if an event has a very small probability, it will occur at that rate over many repeated samplings, but in a single sampling it should not occur. If it does occur, we conclude that either the event really is a small-probability event and something unusual happened (with probability α), or the event is not a small-probability event after all. Here we set α to 0.05 or 0.01. If p < α, we reject H0, accepting a small chance that we make a mistake.
13 Steps (summary)
- Develop hypotheses from the knowledge base
- Set the significance level
- Determine the test statistic and calculate the probability that the difference was caused by random error
- Make conclusions
14 Example of Hypothesis Test
Packing machine example: the weight (kg) of sugar bags packed by a machine is X ~ N(100, 2) under normal working conditions. One day, we sampled n = 4 bags and got a sample mean of 101 kg. Was the machine working properly that day?
1. Null hypothesis H0: μ = μ0 = 100 kg; the mean weight packed that day was the same as under normal conditions.
Alternative hypothesis HA: μ ≠ μ0 (μ ≠ 100 kg); the mean weight packed that day was different from the mean under normal conditions.
15 Example of Hypothesis Test
- 2. Set the significance level, α = 0.05, z0.05 = 1.96.
- 3. Calculate the z statistic, z = (x̄ − μ0) / (σ/√n).
- 4. Because |z| > z0.05 (meaning the probability that the difference was caused by sampling error is smaller than 0.05), we reject H0 and accept HA: the machine was not working properly that day.
16 Determination of Statistical Significance
- Two ways:
- (1) Calculate the test statistic z and compare it with the critical value at an α level of 0.05.
- If |z| > 1.96, then H0 is rejected and the results are declared statistically significant (i.e., p < 0.05).
- Otherwise, H0 is accepted and the results are declared not statistically significant (i.e., p ≥ 0.05). We refer to this approach as the critical-value method.
- (2) The exact p-value can be computed, and if p < 0.05, then H0 is rejected and the results are declared statistically significant. Otherwise, if p ≥ 0.05, then H0 is accepted and the results are declared not statistically significant. We refer to this approach as the p-value method.
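The two approaches always lead to the same decision. A hedged SAS sketch (the z value of 1.2 is simply the statistic from the earlier watering example; the data set name decision is a placeholder):

  DATA decision;
    z = 1.2;                               /* computed test statistic (watering example) */
    alpha = 0.05;
    z_crit = PROBIT(1 - alpha/2);          /* two-sided critical value, 1.96 */
    p = 2 * (1 - PROBNORM(ABS(z)));        /* exact two-sided p-value */
    reject_critical = (ABS(z) > z_crit);   /* critical-value method */
    reject_pvalue   = (p < alpha);         /* p-value method: same decision */
  RUN;
  PROC PRINT DATA=decision; RUN;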
17 Significance Levels
- The level of significance, α, is the probability of declaring the test statistic significantly different when H0 is actually true.
- Statistical significance may differ from biological significance.
- The distribution of the test statistic is divided into rejection and acceptance regions.
- We want to keep α low.
- Typically 0.05 or 0.01.
18 Guidelines for Judging the Significance
If p > 0.05, then the results are considered not statistically significant (sometimes denoted NS). If 0.01 < p < 0.05, then the results are significant (denoted *). If 0.001 < p < 0.01, then the results are highly significant (denoted **). If p < 0.001, then the results are very highly significant (denoted ***). However, if 0.05 < p < 0.10, a trend toward statistical significance is sometimes noted.
19 Region of acceptance and rejection
[Figure: standard normal curve of z (x̄ axis) with the acceptance region (accept H0) between z = −1.96 and 1.96 and rejection regions (reject H0) in both tails]
- A computed value of the test statistic that falls in the rejection region is said to be statistically significant, or just significant.
20 Two-sided or one-sided test
- Two-tailed test
- There are two regions of rejection.
- Most common, since either x̄ > μ or x̄ < μ is possible.
- One-tailed test
- When do we use a one-tailed test?
- Based on our knowledge, we can only reject H0 at one side. Example 1: the lifetime of a calculator; we only care whether μ < μ0, so HA: μ < μ0. Example 2: the toxin concentration in grain; we only care whether μ > μ0, so HA: μ > μ0.
- There is only one region of rejection (for H0: μ ≥ μ0 the left side; for H0: μ ≤ μ0 the right side).
- Same test as the two-tailed test, but the critical z0.05 (or t0.05) value changes from 1.96 to 1.645, so it is easier to reject H0 (see the sketch below).
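The change in critical value can be checked with the normal and t quantile functions; a small sketch, assuming a 0.05 significance level (the df of 10 in the t line is purely illustrative):

  DATA tails;
    alpha = 0.05;
    z_two_sided = PROBIT(1 - alpha/2);   /* 1.96 */
    z_one_sided = PROBIT(1 - alpha);     /* 1.645, easier to exceed */
    t_one_sided = TINV(1 - alpha, 10);   /* one-sided t critical value for df = 10 (illustrative) */
  RUN;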
21 Two-sided or one-sided test
[Figure: one-tailed rejection regions for the two examples - the toxin concentration in grain (right tail) and the lifetime of a calculator (left tail)]
22 Interpreting Results of Hypothesis Testing
- We cannot prove hypotheses; we can only provide support for either the null or the alternative.
- i.e., we accept (fail to reject) the null hypothesis if the test statistic indicates that the two groups may be from the same population, or we reject the null hypothesis if the test statistic indicates that they may be from different populations.
- Hypothesis testing results are couched in probability terms, since we can never be 100% sure.
23 Types of Error
- Type I error (α error): α is the probability of rejecting a true null hypothesis.
- Reflected in the level of significance.
- Typical values are 0.05 or 0.01.
- Type II error (β error): β is the probability of accepting a false null hypothesis.
- Only an issue if we fail to reject the null hypothesis.
- Typical values are 0.10 or 0.20.
- Power of a test, 1 − β, is the probability of correctly rejecting a false null hypothesis.
- Typical values are 0.80 or 0.90.
24 Types of Error
Decision / Action: either H0 or HA must be true. Based on the data we will choose one of these hypotheses. But what if we choose wrong? What is the probability of that happening?
- α = significance level = P(reject H0 | H0 true)
- 1 − β = power = P(reject H0 | HA true)
25 Types of Error
[Figure: overlapping sampling distributions under H0 and HA, showing α, β, and the decision set point]
How can we decrease both the α and β errors?
1) Decrease the standard error through good experimental design and a large sample size (see the power sketch below).
2) Set α = 0.05. If p = 0.1, you may do more experiments to further test your hypothesis rather than simply accepting H0.
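For the z-test setting of the earlier watering example, β and power can be computed directly; a sketch assuming the true mean really is 315 g (an assumption for illustration, not something the data prove):

  DATA power_z;
    mu0 = 300; mu1 = 315; sigma = 25; n = 4; alpha = 0.05;  /* mu1 = assumed true mean */
    se = sigma / SQRT(n);
    z_crit = PROBIT(1 - alpha/2);
    power = PROBNORM((mu1 - mu0)/se - z_crit)
          + PROBNORM(-(mu1 - mu0)/se - z_crit);   /* about 0.22 for this small sample */
    beta  = 1 - power;                            /* Type II error probability */
  RUN;

A larger n shrinks the standard error and raises power without changing α, which is the point of recommendation 1) above.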
26 Hypothesis test for means
- Single mean
- Two means
- Independent samples
- Paired test
27 Testing a Single Mean
- Test to compare the mean of a normal distribution against a prespecified value, such as a population mean.
- The test statistic is t = (x̄ − μ0) / (s/√n)
- for H0: μ = μ0 vs. HA: μ ≠ μ0,
- with σ unknown and n < 30.
- Reject H0 if |t| > t(n−1, α); accept otherwise.
- t is called the test statistic.
- tα(n−1) is called the critical value.
28 Example of one-sample mean test
Cholesterol example: suppose the population mean is μ0 = 211 mg/ml, and the sample gives x̄ = 220 mg/ml, s = 38.6 mg/ml, n = 25 (town). H0: μ = 211 mg/ml, HA: μ ≠ 211 mg/ml. For an α = 0.05 test we use the critical value determined from the t(24) distribution. Since t = 1.17 < 2.064, the difference is not statistically significant at the α = 0.05 level and we fail to reject H0.
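The cholesterol example can be verified with a short data step (a sketch using only the summary numbers quoted on the slide):

  DATA chol_t;
    mu0 = 211; xbar = 220; s = 38.6; n = 25;   /* summary values from the slide */
    df = n - 1;                                /* 24 */
    t  = (xbar - mu0) / (s / SQRT(n));         /* about 1.17 */
    t_crit = TINV(0.975, df);                  /* 2.064 */
    p = 2 * (1 - PROBT(ABS(t), df));           /* two-sided p, well above 0.05 */
  RUN;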
29 Mathematical model of one-sample mean test
When n > 30 or the variance is known, use the z test: z = (x̄ − μ0) / (σ/√n).
Otherwise, use the t test: t = (x̄ − μ0) / (s/√n), with df = n − 1.
30 Two-sample t-test for Independent Samples
- Assume two independent samples:
- Sample 1: n1, x̄1; Sample 2: n2, x̄2.
- H0: μ1 = μ2 vs. HA: μ1 ≠ μ2
31 Two-sample t-test for Independent Samples
- 1) σ² known: use the z statistic, z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2).
- 2) σ² unknown but n > 30: use the z statistic with s1² and s2² in place of σ1² and σ2².
- 3) σ² unknown and n < 30, but σ1² = σ2²: use the t statistic with the pooled variance sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2), t = (x̄1 − x̄2) / √(sp²(1/n1 + 1/n2)), df = (n1 − 1) + (n2 − 1).
32 Two-sample t-test for Independent Samples
- 4) σ² unknown and n < 30, with σ1² ≠ σ2²: use the t statistic t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2),
- with the Satterthwaite approximation for the degrees of freedom: df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)] (see the sketch below).
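A data-step sketch of the Satterthwaite approximation, using summary statistics corresponding to the PROC TTEST example later in these slides (n1 = n2 = 5, s1² ≈ 53.3, s2² ≈ 4.3; treat the numbers as illustrative):

  DATA satterthwaite;
    n1 = 5; n2 = 5; s1sq = 53.3; s2sq = 4.3;  /* summary values matching the later example */
    v1 = s1sq / n1; v2 = s2sq / n2;
    se = SQRT(v1 + v2);                        /* standard error of (x1bar - x2bar) */
    df = (v1 + v2)**2 / (v1**2/(n1 - 1) + v2**2/(n2 - 1));  /* about 4.6 */
  RUN;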
33 Example of two-sample t-test
We test the hypothesis of equal means for the two populations, assuming a common variance. H0: μ1 = μ2, HA: μ1 ≠ μ2.
34 Example of two-sample t-test
35 Example of two-sample t-test
When we aren't willing to assume that the variances are equal, we can still test the population means using the sample variances. We use the unequal-variance test statistic from case 4) above.
Are the variances equal or not?
36 Strategy for Testing Equality of Means
- Test the equality of variances (folded F test: F = larger s² / smaller s²; see the sketch below).
- If the variances are not equal, use the t-test for unequal variances.
- If they are equal, use the t-test for equal variances.
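A sketch of the folded F test on the same summary values as in the Satterthwaite sketch above (again illustrative; PROC TTEST prints this test automatically):

  DATA folded_f;
    s1sq = 53.3; s2sq = 4.3; n1 = 5; n2 = 5;   /* same summary values as above */
    F = MAX(s1sq, s2sq) / MIN(s1sq, s2sq);     /* about 12.4 */
    p = 2 * (1 - PROBF(F, n1 - 1, n2 - 1));    /* two-sided (folded) p-value */
  RUN;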
37 Mathematical model for two means of independent samples
Use the z test when the variances are known (or n is large), and the t test otherwise.
38 Paired t-Test
- Test for two sets of observations with strong comparability (e.g. two treatments on the same individuals, one individual before vs. after treatment, or very close plots).
- Sample 1: X11, X12, ..., X1n
- Sample 2: X21, X22, ..., X2n
- Method
- Calculate the difference between the two measurements for each individual: di = Xi1 − Xi2.
- Calculate the mean difference d̄, its standard error sd/√n, and the statistic t = d̄ / (sd/√n), with df = n − 1.
39 Paired t-Test
40 Example of paired t-test
Test the infectivity of a virus on tobacco leaves. The response is the number of dead spots (lesions) on the leaves.
41 Advantage of paired test
- 1) Pairing removes variation among individuals, so the standard error of the difference is usually smaller and a true small difference is easier to detect.
- 2) We do not need to consider whether the variances of the two populations are the same or not.
42 Mathematical model of paired t test
di = μd + εi, with εi ~ N(0, σd²); under H0: μd = 0, t = d̄ / (sd/√n) follows a t distribution with n − 1 degrees of freedom.
43 SAS programming (hypothesis test)
- One-sample t-test (PROC MEANS)
- Two-sample t-test (independent, PROC TTEST)
- Two-sample t-test (paired samples, PROC MEANS)
44 One-sample t-test (PROC MEANS)
- Example 1: We measured the light level at the floor under a tree canopy 4 times: 3.4, 2.8, 3.5, and 4.1 klx. According to the Beer-Lambert law, the theoretical value for this measurement is 3.0. Does the result here differ from the theoretical value?
45 One-sample t-test (PROC MEANS)

OPTIONS LINESIZE=80 NODATE;
DATA new;
  INPUT x;
  dx = x - 3;   /* difference from the theoretical value 3.0 */
DATALINES;
3.4
2.8
3.5
4.1
;
PROC MEANS N MEAN STD T PRT;   /* T and PRT test H0: mean(dx) = 0 */
  VAR dx;
RUN;

Output:
Analysis Variable : dx
  N        Mean     Std Dev    t Value    Pr > |t|
  4   0.4500000   0.5322906       1.69      0.1895
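The same one-sample test can also be run directly against the theoretical value with PROC TTEST and its H0= option (introduced on a later slide); a minimal sketch reusing the data set new:

  PROC TTEST DATA=new H0=3;   /* tests H0: mean of x = 3 */
    VAR x;
  RUN;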
46 Two-sample t-test (independent, PROC TTEST)

PROC TTEST DATA=data-set;
  CLASS variable;    /* identifies the variable that divides the data set into two groups;
                        it must have exactly two values (numeric or character) */
  VAR variables;
  PAIRED a*b;        /* for paired samples: tests the mean of a - b */
RUN;
47 Two-sample t-test (independent, PROC TTEST)

PROC TTEST options:
  ALPHA=p              (default p = 0.05)
  DATA=SAS-data-set
  H0=m                 (default m = 0)
48 Two-sample t-test (independent, PROC TTEST)

OPTIONS LINESIZE=78 NODATE;
TITLE 't-test for independent samples';
DATA mydata;
  INPUT treatment $ x @@;
DATALINES;
C 80 C 93 C 83 C 89 C 98 T 100
T 103 T 104 T 99 T 102
;
PROC TTEST DATA=mydata;
  CLASS treatment;
  VAR x;
RUN;
49 Two-sample t-test (independent, PROC TTEST)

T-Tests
Variable   Method          Variances      DF    t Value    Pr > |t|
x          Pooled          Equal           8      -3.83      0.0050
x          Satterthwaite   Unequal      4.64      -3.83      0.0141

Equality of Variances
Variable   Method       Num DF    Den DF    F Value    Pr > F
x          Folded F          4         4      12.40    0.0318
50 Two-sample t-test (paired samples, PROC MEANS)
- Example 3: We want to test the effect of light level on sunflower photosynthesis. We set two light levels, 800 and 1200 μmol photon m-2 s-1, and selected 6 leaves. For each leaf, we measured one side at the 800 level and the other side at the 1200 level. The data are shown below. Do the two light levels have different effects on sunflower photosynthesis?

DATA t_paired;
  INPUT p800 p1200;
  p = p1200 - p800;   /* paired difference for each leaf */
DATALINES;
90 95
87 92
100 104
80 89
95 101
90 105
;
51 Two-sample t-test (paired samples, PROC MEANS)

PROC MEANS N MEAN STDERR T PRT;
  VAR p;
RUN;

Output:
Analysis Variable : p
  N        Mean    Std Error    t Value    Pr > |t|
  6   7.3333333    1.6865481       4.35      0.0074
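Equivalently, the paired analysis can be done with PROC TTEST's PAIRED statement (shown on the syntax slide above); a minimal sketch reusing the data set t_paired:

  PROC TTEST DATA=t_paired;
    PAIRED p1200*p800;   /* tests whether the mean of (p1200 - p800) differs from 0 */
  RUN;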