Title: Classical Hypothesis Testing Theory
1Classical Hypothesis Testing Theory
- 5 steps of classical hypothesis testing (Ch. 3)
- Declare null hypothesis H0 and alternate hypothesis H1
- Fix a threshold α for Type I error (1% or 5%)
- Type I error (α): reject H0 when it is true
- Type II error (β): accept H0 when it is false
- Determine a test statistic
- a quantity calculated from the data
- Determine what observed values of the test statistic should lead to rejection of H0
- Significance point K (determined by α)
- Test to see if observed data is more extreme than significance point K
- If it is, reject H0
- Otherwise, accept H0
4Overview of Ch. 9
- Simple Fixed-Sample-Size Tests
- Composite Fixed-Sample-Size Tests
- The -2 log λ Approximation
- The Analysis of Variance (ANOVA)
- Multivariate Methods
- ANOVA: the Repeated Measures Case
- Bootstrap Methods: the Two-sample t-test
- Sequential Analysis
5Simple Fixed-Sample-Size Tests
6The Issue
- In the simplest case, everything is specified
- Probability distributions under H0 and H1
- Including all parameters
- α (and K)
- But β is left unspecified
- It is desirable to have a procedure that minimizes β given a fixed α
- This would maximize the power of the test
- 1-β, the probability of rejecting H0 when H1 is true
7Most Powerful Procedure
- Neyman-Pearson Lemma
- States that the likelihood-ratio (LR) test is the most powerful test for a given α
- The LR is defined as
- $LR = \frac{f_1(x_1)\, f_1(x_2) \cdots f_1(x_n)}{f_0(x_1)\, f_0(x_2) \cdots f_0(x_n)}$
- where
- f0, f1 are completely specified density functions for H0, H1
- X1, X2, ..., Xn are iid random variables
8Neyman-Pearson Lemma
- H0 is rejected when LR ≥ K
- With a constant K chosen such that
- P(LR ≥ K when H0 is true) = α
- Let's look at an example using the Neyman-Pearson Lemma!
- Then we will prove it.
- Basketball players seem to be taller than average
- Use this observation to formulate our hypothesis H1
- Tallness is a factor in the recruitment of KU basketball players
- The null hypothesis, H0, could be
- No, the players on KU's team are just average height compared to the population in the U.S.
- Average height of the team and the population in general is the same
- Setup
- Average height of males in the US: 5'9½"
- Average height of KU players in 2008: 6'4½"
- Assumption: both populations are normally distributed, centered on their respective averages (µ0 = 69.5 in, µ1 = 76.5 in), with σ = 2
- Sample size: 3
- Choose α = 5%
[Figure; horizontal axis: height (inches)]
- Our test statistic is the Likelihood Ratio, LR
- Now we need to determine a significance point K at which we can reject H0, given α = 5%
- P(Λ(x) ≥ K | H0 is true) = 0.05, determine K
- So we just need to solve for x and calculate K
- How to solve this? Well, we only need one set of values to calculate K, so let's pick two and solve for the third
- We get one result: x3 = 71.0803
- Then we can just plug it in to Λ and calculate K
- With the significance point K = 1.663×10^-7 we can now test our hypothesis based on observations
- E.g. Sasha = 83 in, Darrell = 81 in, Sherron = 71 in
- Λ = 1.446×10^12 > 1.663×10^-7
- Therefore we reject H0: the observations support our hypothesis that tallness is a factor in the recruitment of KU basketball players.
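The likelihood-ratio calculation above can be sketched in a few lines. This is a minimal illustration under the slide's assumptions (µ0 = 69.5, µ1 = 76.5, σ = 2, n = 3, α = 5%). The threshold here is derived from the H0 distribution of the sample sum, which is an assumption of this sketch and not the slide's method of fixing two values and solving for the third, so the resulting K need not match the slide's figure. The names `log_lr`, `log_k` and `reject_h0` are illustrative.

```python
import math
from statistics import NormalDist

MU0, MU1, SIGMA, N, ALPHA = 69.5, 76.5, 2.0, 3, 0.05

def log_lr(xs):
    """Log of prod f1(x_i) / prod f0(x_i) for normal densities with common sigma."""
    return sum((x - MU0) ** 2 - (x - MU1) ** 2 for x in xs) / (2 * SIGMA ** 2)

# The LR is an increasing function of the sample sum, and under H0 the sum is
# N(n*mu0, n*sigma^2). So P(LR >= K | H0) = alpha corresponds to the
# (1 - alpha) quantile of the sum.
sum_crit = NormalDist(N * MU0, SIGMA * math.sqrt(N)).inv_cdf(1 - ALPHA)
log_k = (MU1 - MU0) / SIGMA ** 2 * sum_crit - N * (MU1 ** 2 - MU0 ** 2) / (2 * SIGMA ** 2)

observed = [83, 81, 71]  # heights from the slide
reject_h0 = log_lr(observed) >= log_k
print(f"log LR = {log_lr(observed):.3f}, log K = {log_k:.3f}, reject H0: {reject_h0}")
```

The observed log LR is 28, i.e. LR = e^28 ≈ 1.446×10^12 as on the slide, which exceeds either threshold by many orders of magnitude.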
16Neyman-Pearson Proof
- Let A define the region in the joint range of X1, X2, ..., Xn such that LR ≥ K. A is the critical region.
- If A is the only critical region of size α we are done
- Let's assume another critical region of size α, defined by B
- H0 is rejected if the observed vector (x1, x2, ..., xn) is in A or in B.
- Let A and B overlap in region C
- Power of the test: rejecting H0 when H1 is true
- The power of this test using A is
- $\int_A L(H_1)\,dx$
- Define $\Delta = \int_A L(H_1)\,dx - \int_B L(H_1)\,dx$
- The power of the test using A minus using B
- $\Delta = \int_{A \setminus C} L(H_1)\,dx - \int_{B \setminus C} L(H_1)\,dx$
- Where A\C is the set of points in A but not in C
- And B\C contains points in B but not in C
- So, in A\C we have $L(H_1) \ge K\,L(H_0)$
- While in B\C we have $L(H_1) < K\,L(H_0)$
- Thus $\Delta \ge K\left(\int_{A \setminus C} L(H_0)\,dx - \int_{B \setminus C} L(H_0)\,dx\right) = K\left(\int_A L(H_0)\,dx - \int_B L(H_0)\,dx\right) = K(\alpha - \alpha) = 0$
- Which implies that the power of the test using A is greater than or equal to the power using B.
21Composite Fixed-Sample-Size Tests
22Not Identically Distributed
- In most cases, random variables are not identically distributed, at least not in H1
- This affects the likelihood function, L
- For example, H1 in the two-sample t-test is
- X11, ..., X1m ~ N(µ1, σ²) and X21, ..., X2n ~ N(µ2, σ²)
- Where µ1 and µ2 are different
- Further, the hypotheses being tested do not specify all parameters
- They are composite
- This chapter only outlines aspects of composite test theory relevant to the material in this book.
24Parameter Spaces
- The set of values the parameters of interest can take
- Null hypothesis: parameters in some region ω
- Alternate hypothesis: parameters in Ω
- ω is usually a subspace of Ω
- Nested hypothesis case
- Null hypothesis nested within alternate hypothesis
- This book focuses on this case
- If the alternate hypothesis can explain the data significantly better, we can reject the null hypothesis
25λ Ratio
- $\lambda = L_{max}(\omega) / L_{max}(\Omega)$
- Optimality theory for composite tests suggests this as a desirable test statistic
- Lmax(ω): maximum likelihood when parameters are confined to the region ω
- Lmax(Ω): maximum likelihood when parameters are confined to the region Ω, defined by H1
- H0 is rejected when λ is sufficiently small (how small is determined by the Type I error)
26Example t-tests
- The next slides calculate the λ-ratio for the two-sample t-test (with the likelihood)
- t-tests later generalize to ANOVA and T² tests
27Equal Variance Two-Sided t-test
- Setup
- Random variables X11, ..., X1m in group 1 are Normally and Independently Distributed (µ1, σ²)
- Random variables X21, ..., X2n in group 2 are NID(µ2, σ²)
- X1i and X2j are independent for all i and j
- Null hypothesis H0: µ1 = µ2 (= µ, unspecified)
- Alternate hypothesis H1: µ1 ≠ µ2, both unspecified
28Equal Variance Two-Sided t-test
- Setup (continued)
- σ² is unknown and unspecified in H0 and H1
- Is assumed to be the same in both distributions
- Region ω is {µ1 = µ2, 0 < σ² < ∞}
- Region Ω is {-∞ < µ1, µ2 < ∞, 0 < σ² < ∞}
29Equal Variance Two-Sided t-test
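- Derivation
- H0: writing µ for the common mean when µ1 = µ2, the maximum of the likelihood over ω is at
- $\hat\mu = \bar X = \frac{\sum_{i=1}^{m} X_{1i} + \sum_{j=1}^{n} X_{2j}}{m+n}$
- And the (common) variance σ² is estimated by
- $\hat\sigma_\omega^2 = \frac{\sum_i (X_{1i}-\bar X)^2 + \sum_j (X_{2j}-\bar X)^2}{m+n}$
30Equal Variance Two-Sided t-test
- Inserting both into the likelihood function, L
- $L_{max}(\omega) = (2\pi e\,\hat\sigma_\omega^2)^{-(m+n)/2}$
31Equal Variance Two-Sided t-test
- Do the same thing for region Ω
- $\hat\mu_1 = \bar X_1$, $\hat\mu_2 = \bar X_2$, $\hat\sigma_\Omega^2 = \frac{\sum_i (X_{1i}-\bar X_1)^2 + \sum_j (X_{2j}-\bar X_2)^2}{m+n}$
- Which produces this likelihood function, L
- $L_{max}(\Omega) = (2\pi e\,\hat\sigma_\Omega^2)^{-(m+n)/2}$
32Equal Variance Two-Sided t-test
- The test statistic λ is then
- $\lambda = \frac{L_{max}(\omega)}{L_{max}(\Omega)} = \left(\frac{\hat\sigma_\Omega^2}{\hat\sigma_\omega^2}\right)^{(m+n)/2}$
It's the same function, just with different variance estimates
33Equal Variance Two-Sided t-test
- We can then use the algebraic identity
- $\sum_i (X_{1i}-\bar X)^2 + \sum_j (X_{2j}-\bar X)^2 = \sum_i (X_{1i}-\bar X_1)^2 + \sum_j (X_{2j}-\bar X_2)^2 + \frac{mn}{m+n}(\bar X_1-\bar X_2)^2$
- To show that
- $\lambda = \left(1 + \frac{t^2}{m+n-2}\right)^{-(m+n)/2}$
- Where t is (from Ch. 3)
- $t = \frac{(\bar x_1 - \bar x_2)\sqrt{mn}}{s\sqrt{m+n}}$
34Equal Variance Two-Sided t-test
- t is the observed value of T
- S is defined in Ch. 3 as
- $S^2 = \frac{\sum_i (X_{1i}-\bar X_1)^2 + \sum_j (X_{2j}-\bar X_2)^2}{m+n-2}$
We can plot λ as a function of t (e.g. mn10)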
35Equal Variance Two-Sided t-test
- So, by the monotonicity argument, we can use t² or |t| instead of λ as test statistic
- Small values of λ correspond to large values of |t|
- Sufficiently large |t| lead to rejection of H0
- The H0 distribution of t is known
- t-distribution with m+n-2 degrees of freedom
- Significance points are widely available
- Once α has been chosen, values of |t| sufficiently large to reject H0 can be determined
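The monotone relation between λ and t can be checked numerically. The sketch below computes λ once from the two maximum-likelihood variance estimates and once from t, using the standard closed form λ = (1 + t²/(m+n-2))^(-(m+n)/2). The data are the six heights used elsewhere in this deck, taken here purely as illustrative input; the variable names are assumptions of this sketch.

```python
import math
from statistics import mean

x1 = [83, 81, 71]  # illustrative group 1
x2 = [65, 72, 70]  # illustrative group 2
m, n = len(x1), len(x2)

def ss(xs, center):
    """Sum of squared deviations of xs around center."""
    return sum((x - center) ** 2 for x in xs)

grand = mean(x1 + x2)
within = ss(x1, mean(x1)) + ss(x2, mean(x2))

# Maximum-likelihood variance estimates under H0 (one mean) and H1 (two means).
var_h0 = (ss(x1, grand) + ss(x2, grand)) / (m + n)
var_h1 = within / (m + n)
lam_direct = (var_h1 / var_h0) ** ((m + n) / 2)

# The same quantity expressed through the pooled two-sample t statistic.
s = math.sqrt(within / (m + n - 2))
t = (mean(x1) - mean(x2)) * math.sqrt(m * n) / (s * math.sqrt(m + n))
lam_from_t = (1 + t ** 2 / (m + n - 2)) ** (-(m + n) / 2)

print(lam_direct, lam_from_t)
```

Both routes give the same λ, and because λ decreases as |t| grows, rejecting for small λ is the same as rejecting for large |t|.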
36Equal Variance Two-Sided t-test
37Equal Variance One-Sided t-test
- Similar to Two-Sided t-test case
- Different region Ω for H1
- Means µ1 and µ2 are not simply different, but one is larger than the other: µ1 ≥ µ2
- If x̄1 ≥ x̄2, then maximum likelihood estimates are the same as for the two-sided case
38Equal Variance One-Sided t-test
- If x̄1 < x̄2, then the unconstrained maximum of the likelihood is outside of Ω
- The unique unconstrained maximum is at (x̄1, x̄2), implying that the maximum in Ω occurs at a boundary point of Ω
- At this point estimates of µ1 and µ2 are equal
- At this point the likelihood ratio is 1 and H0 is not rejected
- Result: H0 is rejected in favor of H1 (µ1 > µ2) only for sufficiently large positive values of t
39Example - Revised
- This scenario fits with our original example
- H1 is that the average height of KU basketball players is bigger than for the general population
- One-sided test
- We could assume that we don't know the averages for H0 and H1
- We actually don't know σ (I just guessed 2 in the original example)
40Example - Revised
- Updated example
- Observation in group 1 (KU): X1 = 83, 81, 71
- Observation in group 2: X2 = 65, 72, 70
- Pick significance point for t from a table: tα = 2.132
- t-distribution, m+n-2 = 4 degrees of freedom, α = 0.05
- Calculate t with our observations
- t > tα, so we can reject H0!
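The t value for these observations can be reproduced with a short sketch. It assumes the pooled-variance two-sample statistic t = (x̄1 - x̄2)√(mn) / (S√(m+n)) and takes the significance point 2.132 from the slide; the function name `pooled_t` is illustrative.

```python
import math
from statistics import mean

def pooled_t(x1, x2):
    """Equal-variance two-sample t statistic with m + n - 2 degrees of freedom."""
    m, n = len(x1), len(x2)
    within = sum((x - mean(x1)) ** 2 for x in x1) + sum((x - mean(x2)) ** 2 for x in x2)
    s = math.sqrt(within / (m + n - 2))  # pooled standard deviation
    return (mean(x1) - mean(x2)) * math.sqrt(m * n) / (s * math.sqrt(m + n))

T_ALPHA = 2.132  # one-sided 5% point for 4 degrees of freedom, from the slide

t = pooled_t([83, 81, 71], [65, 72, 70])
reject_h0 = t > T_ALPHA
print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

With these six observations t is about 2.19, just above the one-sided significance point.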
- Problems that might arise in other cases
- The λ-ratio might not reduce to a function of a well-known test statistic, such as t
- There might not be a unique H0 distribution of λ
- Fortunately, the t statistic is a pivotal quantity
- Independent of the parameters not prescribed by H0
- e.g. µ, σ
- For many testing procedures this property does not hold
42Unequal Variance Two-Sided t-test
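- Identical to Equal Variance Two-Sided t-test
- Except variances in group 1 and group 2 are no longer assumed to be identical
- Group 1: NID(µ1, σ1²)
- Group 2: NID(µ2, σ2²)
- With σ1² and σ2² unknown and not assumed identical
- Region ω = {µ1 = µ2, 0 < σ1², σ2² < ∞}
- Ω makes no constraints on values µ1, µ2, σ1², and σ2²
43Unequal Variance Two-Sided t-test
- The likelihood function of (X11, X12, ..., X1m, X21, X22, ..., X2n) then becomes
- $L = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma_1^2}}\, e^{-(x_{1i}-\mu_1)^2/(2\sigma_1^2)} \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma_2^2}}\, e^{-(x_{2j}-\mu_2)^2/(2\sigma_2^2)}$
- Under H0 (µ1 = µ2 = µ), this becomes
- $L = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma_1^2}}\, e^{-(x_{1i}-\mu)^2/(2\sigma_1^2)} \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma_2^2}}\, e^{-(x_{2j}-\mu)^2/(2\sigma_2^2)}$
44Unequal Variance Two-Sided t-test
- Maximum likelihood estimates $\hat\mu$, $\hat\sigma_1^2$ and $\hat\sigma_2^2$ satisfy the simultaneous equations
- $\hat\mu = \frac{m\bar x_1/\hat\sigma_1^2 + n\bar x_2/\hat\sigma_2^2}{m/\hat\sigma_1^2 + n/\hat\sigma_2^2}$, $\hat\sigma_1^2 = \frac{1}{m}\sum_i (x_{1i}-\hat\mu)^2$, $\hat\sigma_2^2 = \frac{1}{n}\sum_j (x_{2j}-\hat\mu)^2$
45Unequal Variance Two-Sided t-test
- This leads to a cubic equation in $\hat\mu$
- Neither the λ ratio, nor any monotonic function of it, has a known probability distribution when H0 is true!
- This does not lead to any useful test statistic
- The t-statistic may be used as a reasonably close approximation
- However, its H0 distribution is still unknown, as it depends on the unknown ratio σ1²/σ2²
- In practice, a heuristic is often used (see Ch.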
46The -2 log λ Approximation
47The -2 log λ Approximation
- Used when the λ-ratio procedure does not lead to a test statistic whose H0 distribution is known
- Example: Unequal Variance Two-Sided t-test
- Various approximations can be used
- But only if certain regularity assumptions and restrictions hold true
48The -2 log λ Approximation
- Best known approximation
- If H0 is true, -2 log λ has an asymptotic chi-square distribution,
- with degrees of freedom equal to the difference between the numbers of parameters left unspecified by H1 and by H0.
- λ is the likelihood ratio
- asymptotic: as the sample size → ∞
- Provides an asymptotically valid testing procedure
49The -2 log ? Approximation
- Restrictions
- Parameters must be real numbers that can take on values in some interval
- The maximum likelihood estimator is found at a turning point of the function
- i.e. a real maximum, not at a boundary point
- H0 is nested in H1 (as in all previous slides)
- These restrictions are important in the proof
- I skip the proof
50The -2 log ? Approximation
- Instead
- Our original basketball example, revised again
- Let's drop our last assumption, that the variance in the population at large is the same as in the group of KU basketball players.
- All we have left now are our observations and the hypothesis that µ1 > µ2
- Where µ1 is the average height of basketball players
- Observation in group 1 (KU): X1 = 83, 81, 71
- Observation in group 2: X2 = 65, 72, 70
51Example Revised Again
- Using the Unequal Variance One-Sided t-Test
- We get
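The slide's own result did not survive extraction. As a hedged illustration only, the sketch below applies Welch's approximation, a commonly used heuristic for the unequal-variance case: the statistic divides the difference of means by sqrt(s1²/m + s2²/n) and the degrees of freedom come from the Welch-Satterthwaite formula. Whether the slide used exactly this heuristic is an assumption, and the function name `welch` is illustrative.

```python
import math
from statistics import mean, variance

def welch(x1, x2):
    """Welch's unequal-variance t statistic and its approximate degrees of freedom."""
    m, n = len(x1), len(x2)
    a = variance(x1) / m  # sample variance of group 1 divided by its size
    b = variance(x2) / n
    t = (mean(x1) - mean(x2)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))
    return t, df

t_welch, df_welch = welch([83, 81, 71], [65, 72, 70])
print(f"t' = {t_welch:.3f} with about {df_welch:.2f} degrees of freedom")
```

Because both groups have the same size, the statistic equals the pooled t (about 2.19), but the approximate degrees of freedom drop to about 3.1, which raises the significance point the statistic has to exceed.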
52The Analysis of Variance (ANOVA)
53The Analysis of Variance (ANOVA)
- Probably the most frequently used hypothesis testing procedure in statistics
- This section
- Derives the Sum of Squares
- Gives an outline of the ANOVA procedure
- Introduces one-way ANOVA as a generalization of the two-sample t-test
- Two-way and multi-way ANOVA
- Further generalizations of ANOVA
54Sum of Squares
- New variables (from Ch. 3)
- The two-sample t-test tests for equality of the means of two groups.
- We could express the observations as
- $X_{ij} = \mu_i + E_{ij}$, with i = 1, 2 indexing the group
- Where the Eij are assumed to be NID(0, σ²)
- H0 is µ1 = µ2
55Sum of Squares
- This can also be written as
- $X_{ij} = \mu + \alpha_i + E_{ij}$
- µ could be seen as overall mean
- αi as deviation from µ in group i
- This model is overparameterized
- Uses more parameters than necessary
- Necessitates the requirement $m\alpha_1 + n\alpha_2 = 0$
- (always assumed imposed)
56Sum of Squares
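- We are deriving a test procedure similar to the two-sample two-sided t-test
- Using |t| as test statistic
- Absolute value of the T statistic
- This is equivalent to using t²
- Because it's a monotonic function of |t|
- The square of the t statistic (from Ch. 3)
57Sum of Squares
- $t^2 = \frac{mn(\bar X_1 - \bar X_2)^2}{(m+n)\,S^2}$
- can, after algebraic manipulations, be written as F
- $F = \frac{B}{W/(m+n-2)}$
- where
- $B = \frac{mn}{m+n}(\bar X_1 - \bar X_2)^2$ and $W = \sum_i (X_{1i}-\bar X_1)^2 + \sum_j (X_{2j}-\bar X_2)^2$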
58Sum of Squares
- B: between (among) group sum of squares
- W: within group sum of squares
- B + W: total sum of squares
- Can be shown to be the sum of squared deviations of all observations from the overall average
- Total number of degrees of freedom: m + n - 1
- Between groups: 1
- Within groups: m + n - 2
59Sum of Squares
- This gives us the F statistic
- $F = \frac{B/1}{W/(m+n-2)}$
- Our goal is to test the significance of the difference between the means of two groups
- B measures the difference
- The difference must be measured relative to the variance within the groups
- W measures that
- The larger F is, the more significant the difference
60The ANOVA Procedure
- Subdivide observed total sum of squares into several components
- In our case, B and W
- Pick appropriate significance point for a chosen Type I error α from an F table
- Compare the observed components to test our hypothesis
- Significance points depend on degrees of freedom in B and W
- In our case, 1 and (m + n - 2)
- The two-group case readily generalizes to any number of groups.
- ANOVAs can be classified in various ways, e.g.
- fixed effects models
- mixed effects models
- random effects model
- Difference is discussed later
- For now we consider fixed effect models
- Parameter αi is fixed, but unknown, in group i
- Terminology
- Although ANOVA contains the word variance
- What we actually test for is equality of means between the groups
- The different mean assumptions affect the variance, though
- ANOVAs are special cases of regression models from Ch. 8
64One-Way ANOVA
- One-Way fixed-effect ANOVA
- Setup and derivation
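- Like two-sample t-test for g number of groups
- Observations (ni observations in group i, i = 1, 2, ..., g)
- Using overparameterized model for X
- $X_{ij} = \mu + \alpha_i + E_{ij}$, j = 1, ..., ni
- Eij assumed NID(0, σ²), Σ ni αi = 0, αi fixed in group i
65One-Way ANOVA
- Null Hypothesis H0 is α1 = α2 = ... = αg = 0
- Total sum of squares is $\sum_i \sum_j (X_{ij} - \bar X)^2$
- This is subdivided into B and W
- $B = \sum_i n_i (\bar X_i - \bar X)^2$ and $W = \sum_i \sum_j (X_{ij} - \bar X_i)^2$
- with $\bar X_i$ the average in group i and $\bar X$ the overall average
66One-Way ANOVA
- Total degrees of freedom: N - 1
- Subdivided into dfB = g - 1 and dfW = N - g
- This gives us our test statistic F
- $F = \frac{B/(g-1)}{W/(N-g)}$
- We can now look in the F-table for these degrees of freedom (of B and W) to pick the significance point
- And calculate B and W from the observed data
- And accept or reject H0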
- Revisiting the Basketball example
- Looking at it as a One-Way ANOVA analysis
- Observation in group 1 (KU): X1 = 83, 81, 71
- Observation in group 2: X2 = 65, 72, 70
- Total Sum of Squares
- B (between groups sum of squares)
- W (within groups sum of squares)
- Degrees of freedom
- Total: N - 1 = 5
- dfB = g - 1 = 2 - 1 = 1
- dfW = N - g = 6 - 2 = 4
- Table lookup for df 1 and 4 and α = 0.05
- Critical value F = 7.71
- Calculate F from our data
- So 4.806 < 7.71
- With ANOVA we actually accept H0!
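- Seems to be the large variance in group 1; note also that F = t² corresponds to the two-sided t-test (7.71 ≈ 2.776², the two-sided significance point for 4 degrees of freedom), whereas the earlier rejection used the one-sided point 2.132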
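The sums of squares for this example can be recomputed with a short sketch. It assumes the usual one-way definitions (B as the size-weighted squared deviations of group means from the overall mean, W as the squared deviations within groups) and takes the critical value 7.71 from the slide; the function name `one_way_anova` is illustrative.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (B, W, F) for a one-way fixed-effects ANOVA."""
    all_obs = [x for g in groups for x in g]
    grand = mean(all_obs)
    b = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # between groups
    w = sum((x - mean(g)) ** 2 for g in groups for x in g)    # within groups
    df_b = len(groups) - 1
    df_w = len(all_obs) - len(groups)
    return b, w, (b / df_b) / (w / df_w)

F_CRIT = 7.71  # F table value for (1, 4) degrees of freedom at alpha = 0.05

B, W, F = one_way_anova([[83, 81, 71], [65, 72, 70]])
print(f"B = {B:.3f}, W = {W:.3f}, F = {F:.3f}, reject H0: {F > F_CRIT}")
```

This gives B ≈ 130.67, W ≈ 108.67 and F ≈ 4.81, close to the slide's 4.806 (the small gap is presumably rounding in the original calculation) and well below 7.71.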
70Same Example with Excel
- Offers most of these tests, built-in
72Two-Way ANOVA
- Two-Way Fixed Effects ANOVA
- Overview only (in the scope of this book)
- More complicated setup: example
- Expression levels of one gene in lung cancer patients
- a different risk classes
- E.g. ultrahigh, very high, intermediate, low
- b different age groups
- n individuals for each risk/age combination
73Two-Way ANOVA
- Expression levels (our observations): Xijk
- i is the risk class (i = 1, 2, ..., a)
- j indicates the age group (j = 1, 2, ..., b)
- k corresponds to the individual in each group (k = 1, ..., n)
- Each group is a possible risk/age combination
- The number of individuals in each group is the same, n
- This is a balanced design
- Theory for unbalanced designs is more complicated and not covered in this book
74Two-Way ANOVA
- The Xijk can be arranged in a table
[Table labels: Risk category; Age group; Number of individuals in this risk/age group (aka]
This is a two-way table
75Two-Way ANOVA
- The model adopted for each Xijk is
- $X_{ijk} = \mu + \alpha_i + \beta_j + \delta_{ij} + E_{ijk}$
- Where Eijk are NID(0, σ²)
- The mean of Xijk is µ + αi + βj + δij
- αi is a fixed parameter, additive for risk class i
- βj is a fixed parameter, additive for age group j
- δij is a fixed risk/age interaction parameter
- Should be added if a possible group/group interaction exists
76Two-Way ANOVA
- These constraints are imposed
- Σi αi = Σj βj = 0
- Σi δij = 0 for all j
- Σj δij = 0 for all i
- The total sum of squares is then subdivided into four groups
- Risk class sum of squares
- Age group sum of squares
- Interaction sum of squares
- Within cells (residual or error) sum of squares
77Two-Way ANOVA
- Associated with each sum of squares
- Corresponding degrees of freedom
- Hence also a corresponding mean square
- Sum of squares divided by degrees of freedom
- The mean squares are then compared using F ratios to test for significance of various effects
- First test for a significant risk/age interaction
- F-ratio used is ratio of interaction mean square and within-cells mean square
78Two-Way ANOVA
- If such an interaction is found, it may not be reasonable to test for significant risk or age differences
- Example, µ in two risk classes, two age groups
- No evidence of interaction
- Example of interaction
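The four-way split of the total sum of squares can be illustrated with a small balanced design. The data below are entirely hypothetical (two risk classes, two age groups, two individuals per cell), and the formulas are the standard ones for a balanced two-way fixed-effects layout, not taken from the slides.

```python
from statistics import mean

# Hypothetical balanced data: data[i][j] holds the n observations
# for risk class i and age group j.
data = [
    [[5.1, 4.9], [6.0, 6.2]],
    [[7.2, 6.8], [9.1, 8.9]],
]
a, b, n = len(data), len(data[0]), len(data[0][0])

grand = mean([x for row in data for cell in row for x in cell])
row_mean = [mean([x for cell in row for x in cell]) for row in data]
col_mean = [mean([x for row in data for x in row[j]]) for j in range(b)]
cell_mean = [[mean(cell) for cell in row] for row in data]

ss_risk = b * n * sum((r - grand) ** 2 for r in row_mean)
ss_age = a * n * sum((c - grand) ** 2 for c in col_mean)
ss_inter = n * sum(
    (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
    for i in range(a) for j in range(b)
)
ss_within = sum(
    (x - cell_mean[i][j]) ** 2
    for i in range(a) for j in range(b) for x in data[i][j]
)
ss_total = sum((x - grand) ** 2 for row in data for cell in row for x in cell)

df = {"risk": a - 1, "age": b - 1,
      "interaction": (a - 1) * (b - 1), "within": a * b * (n - 1)}

# Interaction is tested first: interaction mean square over within-cells mean square.
f_inter = (ss_inter / df["interaction"]) / (ss_within / df["within"])
print(ss_risk, ss_age, ss_inter, ss_within, f_inter)
```

The four components add up to the total sum of squares, and their degrees of freedom add up to abn - 1.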
79Multi-Way ANOVA
- One-way and two-way fixed effects ANOVAs can be extended to multi-way ANOVAs
- Gets complicated
- Example: three-way ANOVA model
80Further generalizations of ANOVA
- The 2^m factorial design
- A particular form of the one-way ANOVA
- Interactions between main effects
- m factors taken at two levels
- E.g. (1) Gender, (2) Tissue (lung, kidney), and (3) status (affected, not affected)
- 2^m possible combinations of levels/groups
- Can test for main effects and interactions
- Need replicated experiments
- n replications for each of the 2^m experiments
81Further generalizations of ANOVA
- Example, m = 3, denoted by A, B, C
- 8 groups: abc, ab, ac, bc, a, b, c, 1
- Write totals of n observations: Tabc, Tab, ..., T1
- The total between sum of squares can be subdivided into seven individual sums of squares
- Three main effects (A, B, C)
- Three pairwise interactions (AB, AC, BC)
- One triple-wise interaction (ABC)
- Example: Sum of squares for A, and for BC,
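The slide's formulas for these sums of squares were lost in extraction. The sketch below uses the standard contrast construction for a 2^3 design as an assumption: each effect's contrast adds the group totals with sign +1 or -1, and its sum of squares is the squared contrast divided by 8n. The group totals and n are hypothetical.

```python
from itertools import combinations

n = 2  # hypothetical number of replicates per group
# Hypothetical totals T_abc, T_ab, ..., T_1 of the n observations in each group.
totals = {"abc": 30.0, "ab": 26.0, "ac": 25.0, "bc": 21.0,
          "a": 24.0, "b": 18.0, "c": 17.0, "1": 15.0}

def sign(effect, group):
    """Product over the effect's factors of +1 (factor at high level in group) or -1."""
    s = 1
    for factor in effect:
        s *= 1 if factor in group else -1
    return s

effects = ["".join(c) for r in (1, 2, 3) for c in combinations("abc", r)]
ss = {
    e.upper(): sum(sign(e, g) * t for g, t in totals.items()) ** 2 / (len(totals) * n)
    for e in effects
}

# Between-groups sum of squares computed directly from the group totals.
between = (sum(t ** 2 for t in totals.values()) / n
           - sum(totals.values()) ** 2 / (len(totals) * n))
print(ss, between)
```

For A the contrast is Tabc + Tab + Tac + Ta - Tbc - Tb - Tc - T1, and for BC it is Tabc + Tbc + Ta + T1 - Tab - Tac - Tb - Tc. The seven sums of squares add up to the between-groups sum of squares.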
82Further generalizations of ANOVA
- If m ≥ 5 the number of groups becomes large
- Then the total number of observations, n·2^m, is large
- It is possible to reduce the number of observations by a process
- Confounding
- Interaction ABC probably very small and not interesting
- So, prefer a model without ABC, reduce data
- There are ANOVA designs for that
83Further generalizations of ANOVA
- Fractional Replication
- Related to confounding
- Sometimes two groups cannot be distinguished from each other, then they are aliases
- E.g. A and BC
- This reduces the need for experiments and data
- Ch. 13 talks more about this in the context of
84Random/Mixed Effect Models
- So far: fixed effect models
- E.g. risk class, age group fixed in previous example
- Multiple experiments would use same categories
- But what if we took experimental data on several random days?
- The days in themselves have no meaning, but a between days sum of squares must be extracted
- What if the days turn out to be important?
- If we fail to test for it, the significance of our procedure is diminished.
- Days are a random category, unlike risk and age!
85Random/Mixed Effect Models
- Mixed Effect Models
- If some categories are fixed and some are random
- Symbols used
- Greek letters for fixed effects
- Uppercase Roman letters for random effects
- Example: two-way mixed effect model
- With risk class (fixed effect α), days (random effect D) and n values collected each day, the appropriate model is written
86Random/Mixed Effect Models
- Random effect models have no fixed categories
- The details of the ANOVA analysis depend on which effects are random and which are fixed
- In a microarray context (more in Ch. 13)
- There tend to be several fixed and several random effects, which complicates the analysis
- Many interactions simply assumed zero
87Multivariate Methods
ANOVA: the Repeated Measures Case
Bootstrap Methods: the Two-sample t-test
All skipped
88Sequential Analysis
89Sequential Analysis
- Sequential Probability Ratio
- Sample size not known in advance
- Depends on outcomes of successive observations
- Some of this theory is in BLAST
- Basic Local Alignment Search Tool
- The book focuses on discrete random variables
90Sequential Analysis
- Consider
- Random variable Y with distribution P(y; ξ)
- Tests usually relate to the value of parameter ξ
- H0: ξ is ξ0
- H1: ξ is ξ1
- We can choose a value for the Type I error α
- And a value for the Type II error β
- Sampling then continues while
- $A < \prod_i \frac{P(y_i;\xi_1)}{P(y_i;\xi_0)} < B$
91Sequential Analysis
- A and B are chosen to correspond to an α and β
- Sampling continues until the ratio is less than A (accept H0) or greater than B (reject H0)
- Because these are discrete variables, boundary overshoot usually occurs
- We don't expect to exactly get values α and β
- Desired values for α and β approximately achieved by using
- $A = \frac{\beta}{1-\alpha}$, $B = \frac{1-\beta}{\alpha}$
92Sequential Analysis
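- It is also convenient to take logarithms, which gives us
- $\log\frac{\beta}{1-\alpha} < \sum_i \log\frac{P(y_i;\xi_1)}{P(y_i;\xi_0)} < \log\frac{1-\beta}{\alpha}$
- Using $S_{1,0}(y) = \log\frac{P(y;\xi_1)}{P(y;\xi_0)}$
- We can write $\log\frac{\beta}{1-\alpha} < \sum_i S_{1,0}(y_i) < \log\frac{1-\beta}{\alpha}$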
93Sequential Analysis
- Example: sequence matching
- H0: p0 = 0.25 (probability of a match is 0.25)
- H1: p1 = 0.35 (probability of a match is 0.35)
- Type I error α and Type II error β chosen 0.01
- Yi = 1 if there is a match at position i, otherwise 0
- Sampling continues while
- $\log\frac{1}{99} < \sum_i S_{1,0}(Y_i) < \log 99$
- with $S_{1,0}(Y_i) = \log\frac{0.35}{0.25}$ if Yi = 1 and $S_{1,0}(Y_i) = \log\frac{0.65}{0.75}$ if Yi = 0
94Sequential Analysis
- S can be seen as the support offered by Yi for H1
- The inequality can be re-written as
- $-9.58 < \sum_i (Y_i - 0.2984) < 9.58$
- This is actually a random walk with step sizes 0.7016 for a match and -0.2984 for a mismatch
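The stopping rule for the matching example can be sketched as a small function. It assumes the boundary approximations log(β/(1-α)) and log((1-β)/α) and the per-position scores log(p1/p0) for a match and log((1-p1)/(1-p0)) for a mismatch; the names `sprt` and `normalized_steps` are illustrative.

```python
import math

def sprt(observations, p0=0.25, p1=0.35, alpha=0.01, beta=0.01):
    """Run the sequential probability ratio test on a 0/1 match sequence.

    Returns (decision, number of observations used)."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    score = {1: math.log(p1 / p0), 0: math.log((1 - p1) / (1 - p0))}
    total = 0.0
    for i, y in enumerate(observations, start=1):
        total += score[y]
        if total <= lower:
            return "accept H0", i
        if total >= upper:
            return "reject H0", i
    return "undecided", len(observations)

def normalized_steps(p0=0.25, p1=0.35):
    """Rescale the two scores so that they differ by exactly 1."""
    up = math.log(p1 / p0)
    down = math.log((1 - p1) / (1 - p0))
    return up / (up - down), down / (up - down)

print(sprt([1] * 50))      # an unbroken run of matches
print(sprt([0] * 50))      # an unbroken run of mismatches
print(normalized_steps())
```

An unbroken run of matches crosses the upper boundary after 14 positions and an unbroken run of mismatches crosses the lower boundary after 33. Rescaling the scores reproduces the step sizes 0.7016 and -0.2984.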
95Sequential Analysis
- Power Function for a Sequential Test
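- Suppose the true value of the parameter of interest is ξ
- We wish to know the probability that H1 is accepted, given ξ
- This probability is the power P(ξ) of the test
- $P(\xi) \approx \frac{1 - \left(\frac{\beta}{1-\alpha}\right)^{\theta^*}}{\left(\frac{1-\beta}{\alpha}\right)^{\theta^*} - \left(\frac{\beta}{1-\alpha}\right)^{\theta^*}}$
96Sequential Analysis
- Where θ* is the unique non-zero solution to θ in
- $\sum_{y \in R} P(y;\xi) \left(\frac{P(y;\xi_1)}{P(y;\xi_0)}\right)^{\theta} = 1$
- R is the range of values of Y
- Equivalently, θ* is the unique non-zero solution to θ in
- $\sum_{y \in R} P(y;\xi)\, e^{\theta S_{1,0}(y)} = 1$
- Where S is defined as before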
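For the matching example, the non-zero root and the resulting power can be computed numerically. The sketch assumes Wald's approximation, which ignores boundary overshoot: power ≈ (1 - A^θ) / (B^θ - A^θ) with A = β/(1-α) and B = (1-β)/α. The root is found by bisection; the names `theta_star` and `power` are illustrative.

```python
import math

def theta_star(p, p0=0.25, p1=0.35):
    """Non-zero root of p*r1**theta + (1-p)*r0**theta = 1 for a true match probability p."""
    r1, r0 = p1 / p0, (1 - p1) / (1 - p0)

    def g(theta):
        return p * r1 ** theta + (1 - p) * r0 ** theta - 1.0

    drift = p * math.log(r1) + (1 - p) * math.log(r0)  # mean step of the walk
    if drift == 0:
        raise ValueError("no non-zero root when the mean step is zero")
    # g is convex with g(0) = 0, so the other root lies opposite to the drift.
    direction = -1.0 if drift > 0 else 1.0
    near, far = 1e-6 * direction, direction
    while g(far) <= 0:
        far *= 2
    for _ in range(200):  # bisection: g(near) < 0 < g(far)
        mid = (near + far) / 2
        if g(mid) > 0:
            far = mid
        else:
            near = mid
    return (near + far) / 2

def power(p, alpha=0.01, beta=0.01):
    """Approximate probability of accepting H1 when the match probability is p."""
    t = theta_star(p)
    low = (beta / (1 - alpha)) ** t
    high = ((1 - beta) / alpha) ** t
    return (1 - low) / (high - low)

print(power(0.25), power(0.35))
```

At p = p0 the root is 1 and the power reduces to α; at p = p1 the root is -1 and the power is 1 - β, which is a useful sanity check on the formula.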
97Sequential Analysis
- This is very similar to Ch. 7 Random Walks
- The parameter θ* is the same as in Ch. 7
- And it will be the same in Ch. 10 BLAST
- < skipping the random walk part >
98Sequential Analysis
- Mean Sample Size
- The (random) number of observations until one or the other hypothesis is accepted
- Find approximation by ignoring boundary overshoot
- Essentially identical method used to find the mean number of steps until the random walk stops
99Sequential Analysis
- Two expressions are calculated for Σi S1,0(Yi)
- One involves the mean sample size
- By equating both expressions, solve for mean sample size
100Sequential Analysis
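- So, the mean sample size is
- $\frac{P(\xi)\,\log\frac{1-\beta}{\alpha} + (1 - P(\xi))\,\log\frac{\beta}{1-\alpha}}{\sum_{y \in R} P(y;\xi)\, S_{1,0}(y)}$
- Both numerator and denominator depend on P(ξ), and so also on θ*
- A generalization applies if Q(y) of Y has different distribution than H0 and H1 relevant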
101Sequential Analysis
- Example
- Same sequence matching example as before
- H0: p0 = 0.25 (probability of a match is 0.25)
- H1: p1 = 0.35 (probability of a match is 0.35)
- Type I error α and Type II error β chosen 0.01
- Mean sample size equation is
- Mean sample size when H0 is true: 194
- Mean sample size when H1 is true: 182
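The two mean sample sizes can be reproduced with a short sketch. It assumes the approximation that ignores boundary overshoot: the expected final value of the walk, P·log((1-β)/α) + (1-P)·log(β/(1-α)), divided by the mean score per position. The power P is taken as α under H0 and 1-β under H1; the function name `mean_sample_size` is illustrative.

```python
import math

def mean_sample_size(p, power, p0=0.25, p1=0.35, alpha=0.01, beta=0.01):
    """Approximate expected number of positions sampled when the match probability is p."""
    s_match = math.log(p1 / p0)
    s_mismatch = math.log((1 - p1) / (1 - p0))
    numerator = (power * math.log((1 - beta) / alpha)
                 + (1 - power) * math.log(beta / (1 - alpha)))
    denominator = p * s_match + (1 - p) * s_mismatch  # mean score per position
    return numerator / denominator

n_h0 = mean_sample_size(0.25, power=0.01)  # H0 true: H1 accepted with probability alpha
n_h1 = mean_sample_size(0.35, power=0.99)  # H1 true: H1 accepted with probability 1 - beta
print(round(n_h0), round(n_h1))
```

Rounded to whole positions, this gives 194 when H0 is true and 182 when H1 is true, matching the slide.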
102Sequential Analysis
- Boundary Overshoot
- So far we assumed no boundary overshoot
- In practice, there will almost always be, though
- Exact Type I and Type II errors different from α and β
- Random walk theory can be used to assess how significant the effects of boundary overshoot are
- It can be shown that the sum of Type I and Type II errors is always less than α + β (also individually)
- BLAST deals with this in a novel way -> see Ch. 10