Hypothesis Testing - PowerPoint PPT Presentation

Slides: 25
Provided by: homeUc
1
Hypothesis Testing
2
Introduction
  • Hypothesis Testing is the procedure that social
    scientists use to determine the empirical value
    of their theory.
  • Today, I'm going to develop the logic of
    hypothesis testing using the relatively simple
    case of hypothesis tests about sample means.
  • The procedure that will be developed is a form of
    proof by statistical contradiction. Evidence is
    mustered in favor of theory by demonstrating that
    the data is unlikely to be observed if the
    postulated theoretical model were false.

3
Epistemological Foundations of Hypothesis Testing
  • Foundation 1. There exists one and only one
    process that generates the actions of a
    population with respect to some variable.
  • Foundation 2. There are many examples of long
    accepted scientific theories losing credibility.
    Truths once held to be objective are rejected.
  • Foundation 3. If we cannot be sure that a theory
    is true, then the next best thing is to judge
    the probability that a theory is true.

4
How do we express the probability that a theory
is true?
  • We'd like to be able to express our uncertainty
    as
  • P( Model is True | Observed Data )
  • But, based on our epistemological foundations, we
    cannot state that the model is true with
    Probability X. Either the model is true, or not.
  • Instead, we are limited to a knowledge of
  • P( Observed Data | Model is True )

5
Interpretation of P( Observed Data | Model is
True )
  • If P( Data | Model ) is close to one, then the
    data is consistent with the model, and we would
    not reject it as an objective interpretation of
    reality.
  • Hypothesis: Men have higher wages than women.
  • Data: The median income for a male is 38,275.
  • The median income for a female is 29,215.
  • We would say that the data is consistent with the
    model. That is, P( Data | Model ) is close to one.

6
Interpretation of probabilities continued.
  • If P( Data | Model ) is not close to one, then the
    data is inconsistent with the model's
    predictions, and we reject the model.
  • Hypothesis: People born in the U.S. have higher
    incomes than immigrants.
  • Data: The median income for someone who is native
    born is 42,917.
  • The median income for a naturalized immigrant
    is 43,968.
  • We would say that the data is not consistent with
    the model. That is, P( Data | Model ) is not close
    to one and the model is not a useful
    representation of reality.

7
The Hypothesis Testing Setup
  • Step 1. Define the Research Hypothesis.
  • A Research or Alternative Hypothesis is a
    statement derived from theory about what the
    researcher expects to find in the data.
  • Step 2. Define the Null Hypothesis.
  • The Null Hypothesis is a statement of what you
    would not expect to find if your research or
    alternative hypothesis were consistent with
    reality.
  • Step 3. Conduct an analysis of the data to
    determine whether or not you can reject the null
    hypothesis with some pre-determined probability.
  • If you can reject the null hypothesis with some
    probability, then the data is consistent with the
    model.
  • If you cannot reject the null hypothesis with
    some probability, then the data is not consistent
    with the model.

8
The Role of Averages in Hypothesis Testing
  • Hypothesis tests utilize the concept of l'homme
    moyen (the average man).
  • In effect, we ask with what probability can we
    reject the null hypothesis for an average
    individual who is endowed with certain
    characteristics?

9
The motivating question for the rest of class
How do we judge the probability of the null
hypothesis?
  • Assume that our null hypothesis is that
  • X > 0
  • To the left, the sample mean of X equals two.
    We'd like to be able to reject the null
    hypothesis.
  • How do we make a probabilistic statement about
    the validity of the null hypothesis?

[Figure: The Sample Mean]
10
Population vs. Sample Statistics
  • When we make statistical inferences, we assume
    that our data is a sample from an entire
    population.
  • - The population is described by the population
    mean and the population variance that are
    unknown.
  • - The sample is described by the sample mean and
    the sample variance.
  • The sample mean and variance provide estimates
    about the mean and variance of the entire
    population.
  • Importantly, these estimates are known only with
    some uncertainty.
  • Statistical inference generally focuses on
    estimates of the mean.

11
Sampling Distributions
  • The sampling distribution of sample means is a
    hypothetical distribution of all possible sample
    means for samples of size N that could be formed
    from a given population.
  • The observed sample mean is just one realization
    from this distribution.
  • Needless to say, this is a theoretical construct
    since, with a large population, there will be
    billions or even trillions of unique samples and
    it would be superior to simply sample the entire
    population.
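
For a population small enough to enumerate, the sampling distribution can be listed in full. The sketch below is only an illustration: the five-value population and the sample size N = 2 are hypothetical choices, not data from the lecture. It forms every possible sample, computes each sample mean, and compares the mean of those sample means with the population mean.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy population, chosen only for illustration.
population = [2, 4, 6, 8, 10]
N = 2  # sample size (an assumption for this sketch)

# Every possible sample of size N, drawn without replacement.
samples = list(combinations(population, N))

# The sampling distribution of sample means: one mean per possible sample.
sample_means = [mean(s) for s in samples]

print("number of unique samples:", len(samples))
print("sampling distribution:", sorted(sample_means))
print("mean of sample means:", mean(sample_means))
print("population mean:", mean(population))
```

With a realistic population the list of samples is far too long to enumerate, which is why the sampling distribution is treated as a theoretical construct.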

12
The Central Limit Theorem
  • The Central Limit Theorem states that, as the
    sample size n grows, the sampling distribution of
    sample means approaches a normal distribution
    (provided the population variance is finite).
  • - The mean of the sampling distribution of
    sample means equals the mean of the population
    distribution.
  • - The variance of the sampling distribution of
    sample means equals the variance of the
    population distribution divided by n.

13
A Monte Carlo examination of the Central Limit
Theorem
  • Monte Carlo simulations are a way of examining
    properties of statistical estimators using random
    number generators instead of using proofs.
  • Using Excel, we shall illustrate how the Central
    Limit Theorem works. We shall demonstrate
  • 1) the mean of the sample means from a random
    sample with known population mean and variance
    approximately equals the population mean.
  • 2) the variance of the sample means
    approximately equals the known population
    variance divided by n.
  • 3) the distribution of the sample means is
    approximately normal.
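
The lecture runs this demonstration in Excel. As a rough equivalent, here is a minimal Python sketch of the same three checks. The choice of a Uniform(0, 1) population (mean 1/2, variance 1/12), the sample size of 30, and the 20,000 replications are all assumptions made for illustration.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

# Known population: Uniform(0, 1), an assumed choice for this sketch.
POP_MEAN, POP_VAR = 0.5, 1 / 12
n = 30                 # observations per sample (assumed)
replications = 20_000  # number of simulated samples (assumed)

# Draw many samples and record the mean of each one.
sample_means = [
    fmean(random.random() for _ in range(n))
    for _ in range(replications)
]

# 1) The mean of the sample means should be close to the population mean.
mean_of_means = fmean(sample_means)

# 2) Their variance should be close to the population variance divided by n.
var_of_means = pvariance(sample_means)

# 3) If the sample means are roughly normal, about 95% of them should lie
#    within 1.96 standard errors of the population mean.
standard_error = (POP_VAR / n) ** 0.5
share_within = sum(
    abs(m - POP_MEAN) <= 1.96 * standard_error for m in sample_means
) / replications

print("mean of sample means:", mean_of_means, "vs", POP_MEAN)
print("variance of sample means:", var_of_means, "vs", POP_VAR / n)
print("share within 1.96 standard errors:", share_within)
```

Note that the population here is flat, not bell-shaped, yet the sample means still pile up in an approximately normal shape.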

14
Why do we care about the central limit theorem?
  • The central limit theorem provides us with a way
    of summarizing our uncertainty about the sample
    mean. It therefore allows us to make probabilistic
    statements about the null hypothesis.
  • The key is that we have estimates based on the
    sample of the population mean and the population
    variance.
  • Further, because we know that the sample mean
    follows a normal distribution and the sample
    variance (suitably scaled) follows a
    chi-squared distribution (for reasons that are
    unimportant to the class), we know that the
    statistic
  • t = (Sample Mean − Value of the Null Hypothesis) /
    (Sample Standard Deviation / √n)
  • follows the t distribution with n − 1 degrees of
    freedom.

15
The t-distribution
  • The t-distribution is a symmetric, bell-shaped
    curve much like the normal distribution.
  • The number of degrees of freedom, which is
    closely related to the number of observations,
    expresses how much certainty you have in your
    estimate (it influences the variance).

[Figure: t-distributions for n = 5 and n = 100]
16
Interpreting the t-statistic
  • The t-statistic corresponds to a value along the
    x-axis of the t-distribution. In effect, it
    measures how many standard errors (the sample
    standard deviation divided by root n) the sample
    mean is from the value stated in the null
    hypothesis.

17
Interpreting the t-statistic cont.
  • As hypothesis testers, we only want to reject the
    null hypothesis if we are very confident that the
    null hypothesis is mistaken.
  • The standard is that we reject the null if we are
    95% certain that it is false.
  • Note: when we refer to statistical significance,
    we say that a finding is statistically
    significant if we can reject the null hypothesis
    at the 95% level.

18
Interpreting the t-statistic cont.
  • For a given number of degrees of freedom, by the
    property of the t-distribution, we know how large
    the t-statistic must be in order to reject the
    null.
  • We call that number the critical value of the
    t-statistic.
  • If the value of the t-statistic calculated from
    the data is greater than this critical value,
    then we reject the null hypothesis.

19
Example
  • Suppose our null hypothesis is that X is less
    than 0.
  • The sample mean is 3
  • The sample standard deviation is 2
  • There are 100 observations.
  • Step 1. We need to establish our critical
    value.
  • We wish to reject the null hypothesis if we are
    95% certain that it is false. For 100
    observations and a one-tailed test, the
    critical value is 1.66.
  • Step 2. The t-statistic = (3 − 0) / (2 / √100)
    = 3 / .2 = 15
  • Step 3. Compare the t-statistic with the critical
    value. If the t-statistic is greater than the
    critical value, then you can reject the null
    hypothesis.
  • In this case, 15 is greater than 1.66, so we can
    reject the null hypothesis that X is less than
    zero.
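
The three steps of this example can be sketched in a few lines of Python. The function name `one_sample_t` is hypothetical, and the critical value of 1.66 is simply copied from the slide, not computed.

```python
import math

def one_sample_t(sample_mean, sample_sd, n, null_value):
    """t = (sample mean - null value) / (sample sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - null_value) / standard_error

# Step 1: critical value for a one-tailed test at the 95% level with
# 100 observations, taken from the slide (not computed here).
critical_value = 1.66

# Step 2: the t-statistic from the summary statistics in the example.
t_stat = one_sample_t(sample_mean=3, sample_sd=2, n=100, null_value=0)

# Step 3: reject the null hypothesis if t exceeds the critical value.
reject_null = t_stat > critical_value

print("t-statistic:", t_stat)
print("reject the null hypothesis:", reject_null)
```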

20
Two-Sample Tests
  • Suppose our research hypothesis is that men have
    higher incomes than women, so the null hypothesis
    is that they do not.
  • This requires us to test whether the difference
    between two different sample means is
    statistically significant.
  • The procedure is fundamentally the same as
    before, except we calculate the t-statistic in a
    slightly different way.

21
T-statistic for 2 sample tests
  • t-stat = [(Mean Pop 1) − (Mean Pop 2)] /
    (Sp²/n1 + Sp²/n2)^(1/2)
  • Where Sp² is the estimate of the common or
    pooled variance.
  • Sp² = [(n1 − 1)(Var Pop1) + (n2 − 1)(Var Pop2)] /
    (n1 + n2 − 2)
  • There are n1 + n2 − 2 degrees of freedom.

22
Example
  • Research hypothesis is that men have higher
    incomes than women; the null hypothesis is that
    they do not.
  • Male: Mean = 44,000, Var = 1,000, n = 101
  • Female: Mean = 36,000, Var = 1,000, n = 101
  • The critical value is approximately t = 1.65
  • In thousands:
  • Sp² = (100·1 + 100·1) / 200 = 1
  • T-stat = (44 − 36) / (1/101 + 1/101)^.5 = 8 /
    .14 = 57
  • Therefore, you can reject the null hypothesis
    at well beyond the 95% level.
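
As a check on the arithmetic, the pooled-variance calculation can be sketched in Python. The function names are hypothetical. The inputs follow the slide's own rescaling to thousands, which treats each group's mean as 44 or 36 and each variance as 1; that rescaling is taken from the slide as given.

```python
import math

def pooled_variance(var1, n1, var2, n2):
    """Pooled estimate of the common variance of two samples."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

def two_sample_t(mean1, var1, n1, mean2, var2, n2):
    """t-statistic for the difference between two sample means."""
    sp2 = pooled_variance(var1, n1, var2, n2)
    return (mean1 - mean2) / math.sqrt(sp2 / n1 + sp2 / n2)

# Figures in thousands, as rescaled on the slide.
sp2 = pooled_variance(1, 101, 1, 101)
t_stat = two_sample_t(44, 1, 101, 36, 1, 101)
degrees_of_freedom = 101 + 101 - 2

critical_value = 1.65  # approximate one-tailed value quoted on the slide
print("pooled variance:", sp2)
print("t-statistic:", round(t_stat, 2))
print("degrees of freedom:", degrees_of_freedom)
print("reject the null hypothesis:", t_stat > critical_value)
```

Carried to more digits, the t-statistic is a little under 57; the slide's figure comes from rounding the denominator to .14.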

23
P-values
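  • P-Values: Rather than comparing the t-statistic
    with a critical value, it is possible to use the
    number of degrees of freedom and the t-statistic
    derived from the data to determine the p-value.
  • The p-value is the probability, assuming the null
    hypothesis is true, of observing a t-statistic at
    least as extreme as the one derived from the
    data. Rejecting only when the p-value is below
    the cut-off limits the probability of falsely
    rejecting a true null hypothesis to that cut-off.
  • e.g. the one-tailed p-value for a t-statistic
    derived from the data of 2.358 with 120 degrees
    of freedom is .01.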
  • If the p-value is less than .05, or whatever we
    define to be our pre-determined cut-off, we say
    the result is statistically significant.
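
In practice a p-value is read from a table or computed by statistical software. As a minimal sketch of what such software does, the code below integrates the t-distribution's density numerically using only the Python standard library. The function names and the use of Simpson's rule are choices made for this illustration, not part of the lecture.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def one_tailed_p_value(t_stat, df, steps=2000):
    """P(T >= t_stat) for t_stat >= 0, using Simpson's rule on [0, t_stat]."""
    if t_stat < 0:
        raise ValueError("this sketch only handles non-negative t-statistics")
    h = t_stat / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The distribution is symmetric, so half the probability lies above 0.
    return 0.5 - area_0_to_t

# The slide's example: t = 2.358 with 120 degrees of freedom.
p = one_tailed_p_value(2.358, 120)
print("one-tailed p-value:", round(p, 4))
print("significant at the .05 cut-off:", p < 0.05)
```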

24
Misc. Slide: Hypothesis Tests About Variable Means
cont.
The figure to the right plots a population
distribution and a sampling distribution to
illustrate that our sampling distribution of
sample means has much less dispersion than the
population distribution.
[Figure: Sampling Distribution vs. Population Distribution]