Classical Hypothesis Testing Theory

Slides: 103

Provided by: alexand97

1
Classical Hypothesis Testing Theory

Alexander Senf

2
Review

5 steps of classical hypothesis testing (Ch. 3)
Declare null hypothesis H0 and alternate
hypothesis H1
Fix a threshold α for Type I error (1% or 5%)
Type I error (α): reject H0 when it is true
Type II error (β): accept H0 when it is false
Determine a test statistic
a quantity calculated from the data

3
Review

Determine what observed values of the test
statistic should lead to rejection of H0
Significance point K (determined by α)
Test to see if observed data is more extreme than
significance point K
If it is, reject H0
Otherwise, accept H0

4
Overview of Ch. 9

Simple Fixed-Sample-Size Tests
Composite Fixed-Sample-Size Tests
The -2 log λ Approximation
The Analysis of Variance (ANOVA)
Multivariate Methods
ANOVA: the Repeated Measures Case
Bootstrap Methods: the Two-sample t-test
Sequential Analysis

5
Simple Fixed-Sample-Size Tests
6
The Issue

In the simplest case, everything is specified
Probability distribution of H0 and H1
Including all parameters
α (and K)
But β is left unspecified
It is desirable to have a procedure that minimizes β given a fixed α
This would maximize the power of the test
1-β, the probability of rejecting H0 when H1 is true

7
Most Powerful Procedure

Neyman-Pearson Lemma
States that the likelihood-ratio (LR) test is the
most powerful test for a given a
The LR is defined as
$LR = \frac{f_1(X_1)\, f_1(X_2) \cdots f_1(X_n)}{f_0(X_1)\, f_0(X_2) \cdots f_0(X_n)}$
where
f0, f1 are completely specified density functions for H0, H1
X1, X2, ..., Xn are iid random variables

8
Neyman-Pearson Lemma

H0 is rejected when LR ≥ K
With a constant K chosen such that
P(LR ≥ K when H0 is true) = α
Let's look at an example using the Neyman-Pearson Lemma!
Then we will prove it.

9
Example

Basketball players seem to be taller than average
Use this observation to formulate our hypothesis
H1
Tallness is a factor in the recruitment of KU
basketball players
The null hypothesis, H0, could be
No, the players on KU's team are just average height compared to the population in the U.S.
Average height of the team and the population in general is the same

10
Example

Setup
Average height of males in the US: 5'9½"
Average height of KU players in 2008: 6'4½"
Assumption: both populations are normally distributed, centered on their respective averages (µ0 = 69.5 in, µ1 = 76.5 in), with σ = 2
Sample size: 3
Choose α = 5%

11
Example

The two populations
[Figure: densities f0 and f1 plotted against height (inches); vertical axis p]
12
Example

Our test statistic is the Likelihood Ratio, LR
Now we need to determine a significance point K at which we can reject H0, given α = 5%
P(Λ(x) ≥ K | H0 is true) = 0.05, determine K

13
Example

So we just need to solve for K and calculate K
How to solve this? Well, we only need one set of values to calculate K, so let's pick two and solve for the third
We get one result K371.0803

14
Example

Then we can just plug it in to Λ and calculate K

15
Example

With the significance point K = 1.663×10^-7 we can now test our hypothesis based on observations
E.g. Sasha 83 in, Darrell 81 in, Sherron 71 in
1.446×10^12 > 1.663×10^-7
Therefore we reject H0 in favor of H1: tallness is a factor in the recruitment of KU basketball players.
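
The likelihood ratio for the three observed heights can be checked with a few lines of Python. This is only a sketch under the slide's stated assumptions (normal densities, µ0 = 69.5, µ1 = 76.5, σ = 2); the function name is hypothetical, and K is simply the value quoted on the slide, not re-derived here.

```python
import math

def log_likelihood_ratio(xs, mu0, mu1, sigma):
    """Log of LR = prod f1(x_i) / prod f0(x_i) for two normal densities
    that share a known standard deviation sigma (normalizing constants cancel)."""
    return sum(((x - mu0) ** 2 - (x - mu1) ** 2) / (2.0 * sigma ** 2) for x in xs)

heights = [83, 81, 71]   # Sasha, Darrell, Sherron (inches)
K = 1.663e-7             # significance point as quoted on the slide

log_lr = log_likelihood_ratio(heights, mu0=69.5, mu1=76.5, sigma=2.0)
lr = math.exp(log_lr)

print(f"log LR = {log_lr:.1f}, LR = {lr:.3e}")
print("reject H0" if lr >= K else "do not reject H0")
```

Working on the log scale avoids multiplying very small density values; the resulting LR of about 1.446×10^12 agrees with the figure on the slide.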

16
Neyman-Pearson Proof

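Let A define the region in the joint range of X1, X2, ..., Xn such that LR ≥ K. A is the critical region.
If A is the only critical region of size α we are done
Let's assume another critical region of size α, defined by B

17
Proof

H0 is rejected if the observed vector (x1, x2, ..., xn) is in A or in B (depending on which critical region is used).
Let A and B overlap in region C
Power of the test: rejecting H0 when H1 is true
The power of this test using A is
$\int_A L(H_1)$

18
Proof

Define $\Delta = \int_A L(H_1) - \int_B L(H_1)$
The power of the test using A minus using B
$\Delta = \int_{A \setminus C} L(H_1) - \int_{B \setminus C} L(H_1)$
Where A\C is the set of points in A but not in C
And B\C contains points in B but not in C

19
Proof

So, in A\C we have $L(H_1) \ge K\, L(H_0)$
While in B\C we have $L(H_1) < K\, L(H_0)$

Why? Points in A\C lie in A, where LR ≥ K; points in B\C lie outside A, where LR < K.
20
Proof

Thus
$\Delta \ge K \int_{A \setminus C} L(H_0) - K \int_{B \setminus C} L(H_0) = K \left( \int_A L(H_0) - \int_B L(H_0) \right) = K(\alpha - \alpha) = 0$
Which implies that the power of the test using A is greater than or equal to the power using B.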

21
Composite Fixed-Sample-Size Tests
22
Not Identically Distributed

In most cases, random variables are not
identically distributed, at least not in H1
This affects the likelihood function, L
For example, H1 in the two-sample t-test is
Where µ1 and µ2 are different

23
Composite

Further, the hypotheses being tested do not
specify all parameters
They are composite
This chapter only outlines aspects of composite
test theory relevant to the material in this book.

24
Parameter Spaces

The set of values the parameters of interest can
take
Null hypothesis: parameters in some region ω
Alternate hypothesis: parameters in Ω
ω is usually a subspace of Ω
Nested hypothesis case
Null hypothesis nested within alternate hypothesis
This book focuses on this case
If the alternate hypothesis can explain the data significantly better we can reject the null hypothesis

25
λ Ratio

Optimality theory for composite tests suggests this as a desirable test statistic
$\lambda = \frac{L_{\max}(\omega)}{L_{\max}(\Omega)}$
Lmax(ω): maximum likelihood when parameters are confined to the region ω
Lmax(Ω): maximum likelihood when parameters are confined to the region Ω, defined by H1
H0 is rejected when λ is sufficiently small (how small is set by the Type I error α)

26
Example t-tests

The next slides calculate the λ-ratio for the two-sample t-test (with the likelihood)
t-tests later generalize to ANOVA and T² tests

27
Equal Variance Two-Sided t-test

Setup
Random variables X11, ..., X1m in group 1 are Normally and Independently Distributed (µ1, σ²)
Random variables X21, ..., X2n in group 2 are NID(µ2, σ²)
X1i and X2j are independent for all i and j
Null hypothesis H0: µ1 = µ2 (= µ, unspecified)
Alternate hypothesis H1: µ1 ≠ µ2, both unspecified

28
Equal Variance Two-Sided t-test

Setup (continued)
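σ² is unknown and unspecified in H0 and H1
Is assumed to be the same in both distributions
Region ω is $\{\mu_1 = \mu_2,\ 0 < \sigma^2 < \infty\}$
Region Ω is $\{-\infty < \mu_1, \mu_2 < \infty,\ 0 < \sigma^2 < \infty\}$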

29
Equal Variance Two-Sided t-test

Derivation
H0: writing µ for the common mean when µ1 = µ2, the maximum of the likelihood over ω is at
And the (common) variance σ² is

30
Equal Variance Two-Sided t-test

Inserting both into the likelihood function, L

31
Equal Variance Two-Sided t-test

Do the same thing for region Ω
Which produces this likelihood function, L

32
Equal Variance Two-Sided t-test

The test statistic λ is then

It's the same function, just with different variances
33
Equal Variance Two-Sided t-test

We can then use the algebraic identity
To show that
Where t is (from Ch. 3)

34
Equal Variance Two-Sided t-test

t is the observed value of T
S is defined in Ch. 3 as

We can plot λ as a function of t (e.g. m = n = 10)
[Figure: λ (vertical axis) plotted against t (horizontal axis)]
35
Equal Variance Two-Sided t-test

So, by the monotonicity argument, we can use t² or |t| instead of λ as test statistic
Small values of λ correspond to large values of |t|
Sufficiently large |t| lead to rejection of H0
The H0 distribution of t is known
t-distribution with m+n-2 degrees of freedom
Significance points are widely available
Once α has been chosen, values of |t| sufficiently large to reject H0 can be determined
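
The equations on the preceding derivation slides did not survive in this transcript. The block below is a reconstruction of the standard algebra linking λ to t, offered as a sketch rather than as the slides' own formulas; the symbols $\hat\sigma_\omega^2$ and $\hat\sigma_\Omega^2$ are names assumed here for the two variance estimates.

```latex
% Maximum likelihood estimates under H0 (region \omega)
\hat\mu = \bar X = \frac{\sum_{i=1}^{m} X_{1i} + \sum_{j=1}^{n} X_{2j}}{m+n},
\qquad
\hat\sigma_\omega^2 = \frac{\sum_{i=1}^{m} (X_{1i}-\bar X)^2 + \sum_{j=1}^{n} (X_{2j}-\bar X)^2}{m+n}

% Maximum likelihood estimates under \Omega
\hat\mu_1 = \bar X_1, \quad \hat\mu_2 = \bar X_2,
\qquad
\hat\sigma_\Omega^2 = \frac{\sum_{i=1}^{m} (X_{1i}-\bar X_1)^2 + \sum_{j=1}^{n} (X_{2j}-\bar X_2)^2}{m+n}

% Both maximized likelihoods have the form (2 \pi e \hat\sigma^2)^{-(m+n)/2}, so
\lambda = \frac{L_{\max}(\omega)}{L_{\max}(\Omega)}
        = \left( \frac{\hat\sigma_\Omega^2}{\hat\sigma_\omega^2} \right)^{(m+n)/2}

% Algebraic identity relating the two sums of squares
\sum_{i} (X_{1i}-\bar X)^2 + \sum_{j} (X_{2j}-\bar X)^2
  = \sum_{i} (X_{1i}-\bar X_1)^2 + \sum_{j} (X_{2j}-\bar X_2)^2
  + \frac{mn}{m+n} (\bar X_1 - \bar X_2)^2

% Hence
\lambda = \left( 1 + \frac{t^2}{m+n-2} \right)^{-(m+n)/2},
\qquad
t = \frac{(\bar X_1 - \bar X_2)\sqrt{mn}}{S \sqrt{m+n}},
\qquad
S^2 = \frac{\sum_{i} (X_{1i}-\bar X_1)^2 + \sum_{j} (X_{2j}-\bar X_2)^2}{m+n-2}
```

Since λ is a decreasing function of t², rejecting for small λ is the same as rejecting for large |t|.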

36
Equal Variance Two-Sided t-test
http://www.socr.ucla.edu/Applets.dir/T-table.html
37
Equal Variance One-Sided t-test

Similar to Two-Sided t-test case
Different region Ω for H1
Means µ1 and µ2 are not simply different, but one is at least as large as the other: µ1 ≥ µ2
If $\bar{x}_1 \ge \bar{x}_2$ then maximum likelihood estimates are the same as for the two-sided case

38
Equal Variance One-Sided t-test

If $\bar{x}_1 < \bar{x}_2$ then the unconstrained maximum of the likelihood is outside of Ω
The unique unconstrained maximum is at $(\hat\mu_1, \hat\mu_2) = (\bar{x}_1, \bar{x}_2)$, implying that the maximum in Ω occurs at a boundary point of Ω
At this point estimates of µ1 and µ2 are equal
At this point the likelihood ratio is 1 and H0 is not rejected
Result: H0 is rejected in favor of H1 (µ1 ≥ µ2) only for sufficiently large positive values of t

39
Example - Revised

This scenario fits with our original example
H1 is that the average height of KU basketball
players is bigger than for the general population
One-sided test
We could assume that we don't know the averages for H0 and H1
We actually don't know σ (I just guessed 2 in the original example)

40
Example - Revised

Updated example
Observations in group 1 (KU): X1 = 83, 81, 71
Observations in group 2: X2 = 65, 72, 70
Pick significance point for t from a table: tα = 2.132
t-distribution, m+n-2 = 4 degrees of freedom, α = 0.05
Calculate t with our observations
t > tα, so we can reject H0!
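
The calculated value of t is not preserved in this transcript. As a sketch, the equal-variance t statistic for the two groups can be recomputed as follows; the function name is hypothetical, and the critical value 2.132 is the one read from the table on the slide.

```python
import math

def pooled_t(x1, x2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    m, n = len(x1), len(x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    ss1 = sum((x - mean1) ** 2 for x in x1)
    ss2 = sum((x - mean2) ** 2 for x in x2)
    s2 = (ss1 + ss2) / (m + n - 2)          # pooled variance estimate S^2
    t = (mean1 - mean2) * math.sqrt(m * n) / (math.sqrt(s2) * math.sqrt(m + n))
    return t, m + n - 2

ku = [83, 81, 71]
others = [65, 72, 70]
t_alpha = 2.132   # one-sided 5% point for 4 degrees of freedom (from the slide)

t, df = pooled_t(ku, others)
print(f"t = {t:.3f} with {df} degrees of freedom")
print("reject H0" if t > t_alpha else "do not reject H0")
```

With these six observations t comes out at roughly 2.19, just above the one-sided significance point.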

41
Comments

Problems that might arise in other cases
The λ-ratio might not reduce to a function of a well-known test statistic, such as t
There might not be a unique H0 distribution of λ
Fortunately, the t statistic is a pivotal quantity
Independent of the parameters not prescribed by H0
e.g. µ, σ
For many testing procedures this property does
not hold

42
Unequal Variance Two-Sided t-test

Identical to Equal Variance Two-Sided t-test
Except variances in group 1 and group 2 are no
longer assumed to be identical
Group 1: NID(µ1, σ1²)
Group 2: NID(µ2, σ2²)
With σ1² and σ2² unknown and not assumed identical
Region ω: µ1 = µ2, 0 < σ1², σ2² < ∞
Ω makes no constraints on values µ1, µ2, σ1², and σ2²

43
Unequal Variance Two-Sided t-test

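The likelihood function of (X11, X12, ..., X1m, X21, X22, ..., X2n) then becomes
$L = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left(-\frac{(X_{1i}-\mu_1)^2}{2\sigma_1^2}\right) \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{(X_{2j}-\mu_2)^2}{2\sigma_2^2}\right)$
Under H0 (µ1 = µ2 = µ), this becomes
$L = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left(-\frac{(X_{1i}-\mu)^2}{2\sigma_1^2}\right) \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{(X_{2j}-\mu)^2}{2\sigma_2^2}\right)$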

44
Unequal Variance Two-Sided t-test

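Maximum likelihood estimates $\hat\mu$, $\hat\sigma_1^2$ and $\hat\sigma_2^2$ satisfy the simultaneous equations
$\hat\mu = \frac{m\bar{X}_1/\hat\sigma_1^2 + n\bar{X}_2/\hat\sigma_2^2}{m/\hat\sigma_1^2 + n/\hat\sigma_2^2}, \quad \hat\sigma_1^2 = \frac{1}{m}\sum_{i=1}^{m}(X_{1i}-\hat\mu)^2, \quad \hat\sigma_2^2 = \frac{1}{n}\sum_{j=1}^{n}(X_{2j}-\hat\mu)^2$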

45
Unequal Variance Two-Sided t-test

→ cubic equation in $\hat\mu$
Neither the λ ratio, nor any monotonic function of it, has a known probability distribution when H0 is true!
This does not lead to any useful test statistic
The t-statistic may be used as a reasonably close substitute
However, its H0 distribution is still unknown, as it depends on the unknown ratio σ1²/σ2²
In practice, a heuristic is often used (see Ch. 3.5)
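
One widely used heuristic of this kind is Welch's approximation, which keeps the separate sample variances and adjusts the degrees of freedom. Whether this is the heuristic meant in Ch. 3.5 is an assumption here; the sketch below applies it to the basketball observations used elsewhere in these slides, with a hypothetical function name.

```python
import math

def welch_t(x1, x2):
    """Unequal-variance t statistic with Welch-Satterthwaite degrees of freedom."""
    m, n = len(x1), len(x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    var1 = sum((x - mean1) ** 2 for x in x1) / (m - 1)   # sample variance, group 1
    var2 = sum((x - mean2) ** 2 for x in x2) / (n - 1)   # sample variance, group 2
    se2 = var1 / m + var2 / n                            # squared standard error
    t = (mean1 - mean2) / math.sqrt(se2)
    df = se2 ** 2 / ((var1 / m) ** 2 / (m - 1) + (var2 / n) ** 2 / (n - 1))
    return t, df

ku = [83, 81, 71]
others = [65, 72, 70]
t_welch, df_welch = welch_t(ku, others)
print(f"t = {t_welch:.3f}, approximate degrees of freedom = {df_welch:.2f}")
```

Because both groups have the same size, the statistic equals the pooled t; only the degrees of freedom shrink (to a little over 3), which raises the significance point.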

46
The -2 log λ Approximation
47
The -2 log λ Approximation

Used when the λ-ratio procedure does not lead to a test statistic whose H0 distribution is known
Example: Unequal Variance Two-Sided t-test
Various approximations can be used
But only if certain regularity assumptions and
restrictions hold true

48
The -2 log λ Approximation

Best known approximation
If H0 is true, -2 log λ has an asymptotic chi-square distribution,
with degrees of freedom equal to the difference in the number of parameters unspecified by H0 and H1, respectively.
λ is the likelihood ratio
"asymptotic" means as the sample size → ∞

49
The -2 log λ Approximation

Restrictions
Parameters must be real numbers that can take on
values in some interval
The maximum likelihood estimator is found at a
turning point of the function
i.e. a real maximum, not at a boundary point
H0 is nested in H1 (as in all previous slides)
These restrictions are important in the proof
I skip the proof

50
The -2 log λ Approximation

Instead
Our original basketball example, revised again
Let's drop our last assumption, that the variance in the population at large is the same as in the group of KU basketball players.
All we have left now are our observations and the hypothesis that µ1 > µ2
Where µ1 is the average height of basketball players
Observations in group 1 (KU): X1 = 83, 81, 71
Observations in group 2: X2 = 65, 72, 70

51
Example Revised Again

Using the Unequal Variance One-Sided t-Test
We get

52
The Analysis of Variance (ANOVA)
53
The Analysis of Variance (ANOVA)

Probably the most frequently used hypothesis
testing procedure in statistics
This section
Derives the sum of squares
Gives an outline of the ANOVA procedure
Introduces one-way ANOVA as a generalization of
the two-sample t-test
Two-way and multi-way ANOVA
Further generalizations of ANOVA

54
Sum of Squares

New variables (from Ch. 3)
The two-sample t-test tests for equality of the means of two groups.
We could express the observations as
$X_{ij} = \mu_i + E_{ij}$
Where the Eij are assumed to be NID(0, σ²)
H0 is µ1 = µ2

55
Sum of Squares

This can also be written as
$X_{ij} = \mu + \alpha_i + E_{ij}$
µ could be seen as overall mean
αi as deviation from µ in group i
This model is overparameterized
Uses more parameters than necessary
Necessitates the requirement
$m\alpha_1 + n\alpha_2 = 0$
(always assumed imposed)

56
Sum of Squares

We are deriving a test procedure similar to the
two-sample two-sided t-test
Using |t| as test statistic
Absolute value of the T statistic
This is equivalent to using t²
Because it's a monotonic function of |t|
The square of the t statistic (from Ch. 3)

57
Sum of Squares

can, after algebraic manipulations, be written
as F
where

58
Sum of Squares

B: between (among) group sum of squares
W: within group sum of squares
B + W: total sum of squares
Can be shown to be
Total number of degrees of freedom: m + n - 1
Between groups: 1
Within groups: m + n - 2

59
Sum of Squares

This gives us the F statistic
Our goal is to test the significance of the
difference between the means of two groups
B measures the difference
The difference must be measured relative to the
variance within the groups
W measures that
The larger F is, the more significant the
difference

60
The ANOVA Procedure

Subdivide observed total sum of squares into
several components
In our case, B and W
Pick appropriate significance point for a chosen Type I error α from an F table
Compare the observed components to test our
hypothesis

61
F-Statistic

Significance points depend on degrees of freedom
in B and W
In our case, 1 and (m + n - 2)

http://www.ento.vt.edu/sharov/PopEcol/tables/f005.html
62
Comments

The two-group case readily generalizes to any
number of groups.
ANOVAs can be classified in various ways, e.g.
fixed effects models
mixed effects models
random effects models
Difference is discussed later
For now we consider fixed effects models
Parameter αi is fixed, but unknown, in group i

63
Comments

Terminology
Although ANOVA contains the word variance
What we actually test for is equality in means between the groups
The different mean assumptions affect the variance, though
ANOVAs are special cases of regression models
from Ch. 8

64
One-Way ANOVA

One-Way fixed-effect ANOVA
Setup and derivation
Like two-sample t-test for g number of groups
Observations (ni observations, i = 1, 2, ..., g)
Using overparameterized model for X
$X_{ij} = \mu + \alpha_i + E_{ij}$
Eij assumed NID(0, σ²), Σ ni αi = 0, αi fixed in group i

65
One-Way ANOVA

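Null Hypothesis H0 is α1 = α2 = ... = αg = 0
Total sum of squares is
$\sum_{i=1}^{g}\sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2$
This is subdivided into B and W
with
$B = \sum_{i=1}^{g} n_i(\bar{X}_i - \bar{X})^2, \qquad W = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2$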

66
One-Way ANOVA

Total degrees of freedom: N - 1
Subdivided into dfB = g - 1 and dfW = N - g
This gives us our test statistic F
$F = \frac{B/(g-1)}{W/(N-g)}$
We can now look in the F-table for these degrees of freedom to pick the significance point
And calculate B and W from the observed data
And accept or reject H0

67
Example

Revisiting the Basketball example
Looking at it as a One-Way ANOVA analysis
Observations in group 1 (KU): X1 = 83, 81, 71
Observations in group 2: X2 = 65, 72, 70
Total Sum of Squares
B (between groups sum of squares)

68
Example

W (within groups sum of squares)
Degrees of freedom
Total: N - 1 = 5
dfB = g - 1 = 2 - 1 = 1
dfW = N - g = 6 - 2 = 4

69
Example

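Table lookup for df 1 and 4 and α = 0.05
Critical value: F = 7.71
Calculate F from our data
So 4.806 < 7.71
With ANOVA we actually accept H0!
The F-test corresponds to the two-sided t-test (7.71 = 2.776², the two-sided 5% point of t with 4 degrees of freedom), while the earlier t-test was one-sided; the large variance in group 1 also keeps F small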
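
The sums of squares for this example are not preserved in the transcript. The following sketch recomputes B, W and F for the two groups with a generic one-way ANOVA helper (a hypothetical function, written for any number of groups).

```python
def one_way_anova(groups):
    """Between (B) and within (W) sums of squares and the F ratio
    for a one-way fixed-effects ANOVA."""
    all_obs = [x for g in groups for x in g]
    N, g_count = len(all_obs), len(groups)
    grand = sum(all_obs) / N
    B = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    W = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_b, df_w = g_count - 1, N - g_count
    F = (B / df_b) / (W / df_w)
    return B, W, F, df_b, df_w

B, W, F, df_b, df_w = one_way_anova([[83, 81, 71], [65, 72, 70]])
F_crit = 7.71   # 5% point of F with 1 and 4 degrees of freedom (from the slide)

print(f"B = {B:.3f}, W = {W:.3f}, total = {B + W:.3f}")
print(f"F = {F:.3f} on ({df_b}, {df_w}) degrees of freedom")
print("reject H0" if F > F_crit else "accept H0")
```

This gives F of about 4.81, close to the 4.806 quoted on the slide (the small difference is presumably rounding), and it equals the square of the t statistic for the same data.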

70
Same Example with Excel

Screenshots

71
Excel

Offers most of these tests, built-in

72
Two-Way ANOVA

Two-Way Fixed Effects ANOVA
Overview only (in the scope of this book)
More complicated setup example
Expression levels of one gene in lung cancer
patients
a different risk classes
E.g. ultrahigh, very high, intermediate, low
b different age groups
n individuals for each risk/age combination

73
Two-Way ANOVA

Expression levels (our observations) Xijk
i is the risk class (i = 1, 2, ..., a)
j indicates the age group (j = 1, 2, ..., b)
k corresponds to the individual in each group (k = 1, ..., n)
Each group is a possible risk/age combination
The number of individuals in each group is the
same, n
This is a balanced design
Theory for unbalanced designs is more complicated
and not covered in this book

74
Two-Way ANOVA

The Xijk can be arranged in a table
[Table: one axis indexed by risk category i, the other by age group j; each cell holds the n individuals in that risk/age group]
This is a two-way table
75
Two-Way ANOVA

The model adopted for each Xijk is
$X_{ijk} = \mu + \alpha_i + \beta_j + \delta_{ij} + E_{ijk}$
Where Eijk are NID(0, σ²)
The mean of Xijk is µ + αi + βj + δij
αi is a fixed parameter, additive for risk class i
βj is a fixed parameter, additive for age group j
δij is a fixed risk/age interaction parameter
Should be added if a possible group/group interaction exists

76
Two-Way ANOVA

These constraints are imposed
Σi αi = Σj βj = 0
Σi δij = 0 for all j
Σj δij = 0 for all i
The total sum of squares is then subdivided into four components
Risk class sum of squares
Age group sum of squares
Interaction sum of squares
Within cells (residual or error) sum of
squares

77
Two-Way ANOVA

Associated with each sum of squares
Corresponding degrees of freedom
Hence also a corresponding mean square
Sum of squares divided by degrees of freedom
The mean squares are then compared using F ratios
to test for significance of various effects
First test for a significant risk/age
interaction
F-ratio used is ratio of interaction mean square
and within-cells mean square
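
The four-way split of the total sum of squares can be sketched for a balanced design as below. The expression values are invented purely for illustration (two risk classes, two age groups, two individuals per cell), and the function name is hypothetical.

```python
def two_way_anova(cells):
    """cells[i][j] holds the n observations for risk class i and age group j
    (balanced design). Returns sums of squares, degrees of freedom and the
    interaction F ratio."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[sum(obs) / n for obs in row] for row in cells]
    row_mean = [sum(row) / b for row in cell_mean]
    col_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row_mean) / a
    ss = {
        "risk": b * n * sum((r - grand) ** 2 for r in row_mean),
        "age": a * n * sum((c - grand) ** 2 for c in col_mean),
        "interaction": n * sum(
            (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
            for i in range(a) for j in range(b)),
        "within": sum(
            (x - cell_mean[i][j]) ** 2
            for i in range(a) for j in range(b) for x in cells[i][j]),
    }
    df = {"risk": a - 1, "age": b - 1,
          "interaction": (a - 1) * (b - 1), "within": a * b * (n - 1)}
    f_interaction = (ss["interaction"] / df["interaction"]) / (ss["within"] / df["within"])
    return ss, df, f_interaction

# Hypothetical expression levels, not taken from the slides.
cells = [[[5.1, 4.9], [6.0, 6.2]],
         [[7.0, 7.4], [9.1, 8.7]]]
ss, df, f_interaction = two_way_anova(cells)
print(ss)
print(f"interaction F = {f_interaction:.3f} on "
      f"({df['interaction']}, {df['within']}) degrees of freedom")
```

The interaction F ratio is computed first, mirroring the order of testing described above; the main-effect ratios would be formed the same way from their own mean squares.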

78
Two-Way ANOVA

If such an interaction is found, it may not be reasonable to test for significant risk or age differences
[Tables: example values of µ for two risk classes (Risk) and two age groups (Age); one table with no evidence of interaction, one with an example of interaction]
79
Multi-Way ANOVA

One-way and two-way fixed effects ANOVAs can be
extended to multi-way ANOVAs
Gets complicated
Example three-way ANOVA model

80
Further generalizations of ANOVA

The 2^m factorial design
A particular form of the one-way ANOVA
Interactions between main effects
m factors taken at two levels
E.g. (1) Gender, (2) Tissue (lung, kidney), and (3) status (affected, not affected)
2^m possible combinations of levels/groups
Can test for main effects and interactions
Need replicated experiments
n replications for each of the 2^m experiments

81
Further generalizations of ANOVA

Example, m = 3, factors denoted by A, B, C
8 groups: abc, ab, ac, bc, a, b, c, 1
Write totals of n observations: Tabc, Tab, ..., T1
The total between sum of squares can be subdivided into seven individual sums of squares
Three main effects (A, B, C)
Three pairwise interactions (AB, AC, BC)
One triple-wise interaction (ABC)
Example: Sum of squares for A, and for BC, respectively
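
The two example formulas are missing from the transcript. In the standard treatment of a 2^3 design, each of the seven sums of squares is a squared signed combination of the eight group totals divided by 8n; the sketch below assumes that form. The totals are invented for illustration, and the function name and the 0/1 level coding are assumptions.

```python
from itertools import product

def factorial_sums_of_squares(totals, n):
    """totals maps (a, b, c) level tuples (each 0 or 1) to the total of the
    n observations in that group; returns the seven effect sums of squares."""
    m = 3
    effects = {}
    for mask in product((0, 1), repeat=m):
        if not any(mask):
            continue                      # skip the overall mean
        name = "".join(f for f, on in zip("ABC", mask) if on)
        contrast = 0.0
        for levels, total in totals.items():
            sign = 1
            for on, level in zip(mask, levels):
                if on and level == 0:     # flip sign for each involved factor at its low level
                    sign = -sign
            contrast += sign * total
        effects[name] = contrast ** 2 / (n * 2 ** m)
    return effects

# Hypothetical totals: (1, 1, 1) is group abc, (1, 0, 0) is group a, (0, 0, 0) is group 1.
totals = {(1, 1, 1): 42.0, (1, 1, 0): 35.0, (1, 0, 1): 38.0, (0, 1, 1): 30.0,
          (1, 0, 0): 33.0, (0, 1, 0): 27.0, (0, 0, 1): 29.0, (0, 0, 0): 25.0}
n = 2
effects = factorial_sums_of_squares(totals, n)
for name, value in effects.items():
    print(f"SS({name}) = {value:.4f}")
```

For factor A the contrast is (Tabc + Tab + Tac + Ta) minus (Tbc + Tb + Tc + T1), and the seven sums of squares add up to the total between-groups sum of squares.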

82
Further generalizations of ANOVA

If m ≥ 5 the number of groups becomes large
Then the total number of observations, n·2^m, is large
It is possible to reduce the number of
observations by a process
Confounding
Interaction ABC probably very small and not
interesting
So, prefer a model without ABC, reduce data
There are ANOVA designs for that

83
Further generalizations of ANOVA

Fractional Replication
Related to confounding
Sometimes two groups cannot be distinguished from
each other, then they are aliases
E.g. A and BC
This reduces the need for experiments and data
Ch. 13 talks more about this in the context of
microarrays

84
Random/Mixed Effect Models

So far fixed effect models
E.g. Risk class, age group fixed in previous
example
Multiple experiments would use same categories
But what if we took experimental data on several
random days?
The days in themselves have no meaning, but a between-days sum of squares must be extracted
What if the days turn out to be important?
If we fail to test for it, the significance of
our procedure is diminished.
Days are a random category, unlike risk and age!

85
Random/Mixed Effect Models

Mixed Effect Models
If some categories are fixed and some are random
Symbols used
Greek letters for fixed effects
Uppercase Roman letters for random effects
Example two-way mixed effect model with
Risk class a and days d and n values collected
each day, the appropriate model is written

86
Random/Mixed Effect Models

Random effect models have no fixed categories
The details on the ANOVA analysis depend on which
effects are random and which are fixed
In a microarray context (more in Ch. 13)
There tend to be several fixed and several random
effects, which complicates the analysis
Many interactions simply assumed zero

87
Multivariate Methods
ANOVA: the Repeated Measures Case
Bootstrap Methods: the Two-sample t-test
All skipped
88
Sequential Analysis
89
Sequential Analysis

Sequential Probability Ratio
Sample size not known in advance
Depends on outcomes of successive observations
Some of this theory is in BLAST
Basic Local Alignment Search Tool
The book focuses on discrete random variables

90
Sequential Analysis

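Consider
Random variable Y with distribution P(y; ξ)
Tests usually relate to the value of parameter ξ
H0: ξ is ξ0
H1: ξ is ξ1
We can choose a value for the Type I error α
And a value for the Type II error β
Sampling then continues while
$A < \frac{P(y_1;\xi_1)\,P(y_2;\xi_1)\cdots P(y_n;\xi_1)}{P(y_1;\xi_0)\,P(y_2;\xi_0)\cdots P(y_n;\xi_0)} < B$

91
Sequential Analysis

A and B are chosen to correspond to an α and β
Sampling continues until the ratio is less than A (accept H0) or greater than B (reject H0)
Because these are discrete variables, boundary overshoot usually occurs
We don't expect to exactly get values α and β
Desired values for α and β approximately achieved by using
$A = \frac{\beta}{1-\alpha}, \qquad B = \frac{1-\beta}{\alpha}$

92
Sequential Analysis

It is also convenient to take logarithms, which gives us
$\log\frac{\beta}{1-\alpha} < \sum_i \log\frac{P(y_i;\xi_1)}{P(y_i;\xi_0)} < \log\frac{1-\beta}{\alpha}$
Using
$S_{1,0}(y) = \log\frac{P(y;\xi_1)}{P(y;\xi_0)}$
We can write
$\log\frac{\beta}{1-\alpha} < \sum_i S_{1,0}(y_i) < \log\frac{1-\beta}{\alpha}$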

93
Sequential Analysis

Example sequence matching
H0: p0 = 0.25 (probability of a match is 0.25)
H1: p1 = 0.35 (probability of a match is 0.35)
Type I error α and Type II error β chosen 0.01
Yi = 1 if there is a match at position i, otherwise 0
Sampling continues while
$\log\frac{1}{99} < \sum_i S_{1,0}(Y_i) < \log 99$
with
$S_{1,0}(Y_i) = Y_i \log\frac{0.35}{0.25} + (1 - Y_i)\log\frac{0.65}{0.75}$

94
Sequential Analysis

S can be seen as the support offered by Yi for H1
The inequality can be re-written as
$-9.58 < \sum_i (Y_i - 0.2984) < 9.58$
This is actually a random walk with step sizes 0.7016 for a match and -0.2984 for a mismatch
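
The step sizes and stopping boundaries of this random walk can be reproduced numerically. The sketch below assumes the usual boundary choices A = β/(1-α) and B = (1-β)/α, rescales the log-likelihood-ratio steps so that they differ by exactly 1, and then runs the test on simulated matches; the function names and the simulated sequence are illustrative only.

```python
import math
import random

def sprt_setup(p0, p1, alpha, beta):
    """Rescaled step sizes and boundaries for the sequence-matching test."""
    step_match = math.log(p1 / p0)
    step_mismatch = math.log((1 - p1) / (1 - p0))
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    scale = step_match - step_mismatch          # makes the two steps differ by 1
    return step_match / scale, step_mismatch / scale, lower / scale, upper / scale

def run_sprt(ys, up, down, lower, upper):
    """Walk through the 0/1 observations until a boundary is reached."""
    total, count = 0.0, 0
    for y in ys:
        count += 1
        total += up if y == 1 else down
        if total <= lower:
            return "accept H0", count
        if total >= upper:
            return "reject H0", count
    return "undecided", count

up, down, lower, upper = sprt_setup(p0=0.25, p1=0.35, alpha=0.01, beta=0.01)
print(f"steps: {up:.4f} (match), {down:.4f} (mismatch); boundaries: {lower:.3f}, {upper:.3f}")

rng = random.Random(1)                                        # fixed seed, illustration only
matches = [1 if rng.random() < 0.35 else 0 for _ in range(5000)]
decision, n_used = run_sprt(matches, up, down, lower, upper)
print(decision, "after", n_used, "observations")
```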

95
Sequential Analysis

Power Function for a Sequential Test
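Suppose the true value of the parameter of interest is ξ
We wish to know the probability that H1 is accepted, given ξ
This probability is the power P(ξ) of the test
$P(\xi) \approx \frac{1 - A^{\theta^*}}{B^{\theta^*} - A^{\theta^*}}$

96
Sequential Analysis

Where θ* is the unique non-zero solution to θ in
$\sum_{y \in R} P(y;\xi)\left(\frac{P(y;\xi_1)}{P(y;\xi_0)}\right)^{\theta} = 1$
R is the range of values of Y
Equivalently, θ* is the unique non-zero solution to θ in
$\sum_{y \in R} P(y;\xi)\, e^{\theta S_{1,0}(y)} = 1$
Where S is defined as before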
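
For the sequence-matching example, the non-zero solution can be found numerically and plugged into the usual approximation (1 - A^θ*) / (B^θ* - A^θ*) for the power, which ignores boundary overshoot. This is a sketch: the approximation formula, the boundary choices A = β/(1-α) and B = (1-β)/α, and the bisection approach are assumptions, not taken from the slides.

```python
import math

def theta_star(p, p0, p1):
    """Non-zero root of p*a**theta + (1-p)*b**theta = 1, where a and b are the
    likelihood ratios for a match and a mismatch. Found by bisection.
    Not reliable when the mean step is (almost) zero, where the root approaches 0."""
    a, b = p1 / p0, (1 - p1) / (1 - p0)

    def g(theta):
        return p * a ** theta + (1 - p) * b ** theta - 1.0

    drift = p * math.log(a) + (1 - p) * math.log(b)
    if drift == 0:
        raise ValueError("mean step is zero; no non-zero root")
    sign = 1.0 if drift < 0 else -1.0      # the root lies on the side opposite the drift
    near, far = sign * 1e-6, sign * 1.0
    while g(far) < 0:                      # expand until the root is bracketed
        far *= 2.0
    for _ in range(200):                   # invariant: g(near) < 0 <= g(far)
        mid = (near + far) / 2.0
        if g(mid) < 0:
            near = mid
        else:
            far = mid
    return (near + far) / 2.0

def power(p, p0, p1, alpha, beta):
    """Approximate probability of accepting H1 when the true match probability is p."""
    A, B = beta / (1 - alpha), (1 - beta) / alpha
    th = theta_star(p, p0, p1)
    return (1 - A ** th) / (B ** th - A ** th)

power_h0 = power(0.25, 0.25, 0.35, 0.01, 0.01)
power_h1 = power(0.35, 0.25, 0.35, 0.01, 0.01)
print(f"P(accept H1 | p = 0.25) = {power_h0:.4f}")
print(f"P(accept H1 | p = 0.35) = {power_h1:.4f}")
```

At p = p0 the root is 1 and the power reduces to α; at p = p1 the root is -1 and the power is 1-β, which gives a quick consistency check.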

97
Sequential Analysis

This is very similar to Ch. 7 Random Walks
The parameter θ* is the same as in Ch. 7
And it will be the same in Ch. 10 BLAST
<skipping the random walk part>

98
Sequential Analysis

Mean Sample Size
The (random) number of observations until one or
the other hypothesis is accepted
Find approximation by ignoring boundary overshoot
Essentially identical method used to find the
mean number of steps until the random walk stops

99
Sequential Analysis

Two expressions are calculated for Σi S1,0(Yi)
One involves the mean sample size
By equating both expressions, solve for mean
sample size

100
Sequential Analysis

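So, the mean sample size is
$\frac{P(\xi)\log\frac{1-\beta}{\alpha} + (1 - P(\xi))\log\frac{\beta}{1-\alpha}}{\sum_{y \in R} P(y;\xi)\, S_{1,0}(y)}$
Both numerator and denominator depend on ξ; the numerator does so through P(ξ), and so also through θ*
A generalization applies if Y has a distribution Q(y) different from those under H0 and H1; this is relevant to BLAST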

101
Sequential Analysis

Example
Same sequence matching example as before
H0: p0 = 0.25 (probability of a match is 0.25)
H1: p1 = 0.35 (probability of a match is 0.35)
Type I error α and Type II error β chosen 0.01
Mean sample size equation is
Mean sample size when H0 is true: 194
Mean sample size when H1 is true: 182
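
The two mean sample sizes can be reproduced with the no-overshoot approximation: the expected value of the stopped sum (a mix of the two log boundaries weighted by the probability of accepting H1) divided by the mean step. The sketch below assumes that form and the boundaries log(β/(1-α)) and log((1-β)/α); the function name is hypothetical.

```python
import math

def mean_sample_size(p, p0, p1, alpha, beta, prob_accept_h1):
    """Approximate expected number of observations, ignoring boundary overshoot.
    p is the true match probability; prob_accept_h1 is the power at p."""
    log_a = math.log(beta / (1 - alpha))          # lower (accept H0) boundary
    log_b = math.log((1 - beta) / alpha)          # upper (reject H0) boundary
    mean_step = p * math.log(p1 / p0) + (1 - p) * math.log((1 - p1) / (1 - p0))
    return ((1 - prob_accept_h1) * log_a + prob_accept_h1 * log_b) / mean_step

# Under H0 the probability of accepting H1 is alpha; under H1 it is 1 - beta.
n_h0 = mean_sample_size(0.25, 0.25, 0.35, 0.01, 0.01, prob_accept_h1=0.01)
n_h1 = mean_sample_size(0.35, 0.25, 0.35, 0.01, 0.01, prob_accept_h1=0.99)
print(f"mean sample size when H0 is true: {n_h0:.1f}")
print(f"mean sample size when H1 is true: {n_h1:.1f}")
```

Rounded to whole observations these agree with the values 194 and 182 given on the slide.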

102
Sequential Analysis

Boundary Overshoot
So far we assumed no boundary overshoot
In practice, there will almost always be, though
Exact Type I and Type II errors different from α and β
Random walk theory can be used to assess how significant the effects of boundary overshoot are
It can be shown that the sum of Type I and Type II errors is always less than α + β (also individually)
BLAST deals with this in a novel way → see Ch. 10
