Statistical Inference - PowerPoint PPT Presentation

About This Presentation
Title:

Statistical Inference

Description:

Statistical Inference Making decisions regarding the population based on a sample ... – PowerPoint PPT presentation

Slides: 166
Provided by: lave9

Transcript and Presenter's Notes

Title: Statistical Inference


1
Statistical Inference
  • Making decisions regarding the population based on
    a sample

2
Decision Types
  • Estimation
  • Deciding on the value of an unknown parameter
  • Hypothesis Testing
  • Deciding whether a statement regarding an unknown
    parameter is true or false
  • Prediction
  • Deciding the future value of a random variable
  • All decisions will be based on the values of
    statistics

3
Estimation
  • Definitions
  • An estimator of an unknown parameter is a sample
    statistic used for this purpose
  • An estimate is the value of the estimator after
    the data is collected
  • The performance of an estimator is assessed by
    determining its sampling distribution and
    measuring its closeness to the parameter being
    estimated

4
Examples of Estimators
5
The Sample Proportion
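  • Let p = population proportion of interest or
    binomial probability of success.
  • Let $\hat{p} = x/n$ = sample proportion or
    proportion of successes, where x is the number of
    successes in n trials.

Then the sampling distribution of $\hat{p}$ is
(approximately) a normal distribution with mean
$\mu_{\hat{p}} = p$ and standard deviation
$\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$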
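The claim that the sample proportion is approximately normal with mean p and standard deviation $\sqrt{p(1-p)/n}$ can be checked empirically. The following is a minimal simulation sketch, not part of the original slides: the values p = 0.9 and n = 70, the fixed seed, and the helper name `simulate_sample_proportions` are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def simulate_sample_proportions(p, n, repetitions, seed=0):
    """Draw `repetitions` samples of n success/failure trials and
    return the sample proportion of successes from each sample."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repetitions):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


p, n = 0.9, 70  # illustrative values, assumed for this sketch
p_hats = simulate_sample_proportions(p, n, repetitions=5000)

theory_sd = sqrt(p * (1 - p) / n)
print(f"simulated mean of p-hat: {fmean(p_hats):.4f} (theory: {p:.4f})")
print(f"simulated sd of p-hat:   {pstdev(p_hats):.4f} (theory: {theory_sd:.4f})")
```

The simulated mean and standard deviation should land close to the theoretical values; the normal shape itself is only an approximation and is rougher when p is near 0 or 1.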
7
The Sample Mean
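  • Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample of
    size n from a normal distribution with mean $\mu$
    and standard deviation $\sigma$.
  • Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ = sample mean.

Then the sampling distribution of $\bar{x}$ is a normal
distribution with mean $\mu_{\bar{x}} = \mu$ and standard
deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$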
9
Confidence Intervals
10
Estimation by Confidence Intervals
  • Definition
  • A 100P% confidence interval of an unknown
    parameter is a pair of sample statistics ($t_1$ and
    $t_2$) having the following properties
  1. $P[t_1 < t_2] = 1$. That is, $t_1$ is always smaller
    than $t_2$.
  2. $P[\text{the unknown parameter lies between } t_1 \text{ and } t_2] = P$.
  • The statistics $t_1$ and $t_2$ are random variables
  • Property 2 states that the probability that the
    unknown parameter is bounded by the two
    statistics $t_1$ and $t_2$ is P.

11
Critical values for a distribution
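  • The $\alpha$ upper critical value for any distribution
    is the point $x_\alpha$ underneath the distribution such
    that $P[X > x_\alpha] = \alpha$

[Figure: density curve with area $\alpha$ in the upper tail to the right of $x_\alpha$]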
12
Critical values for the standard Normal
distribution
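  • $P[Z > z_\alpha] = \alpha$

[Figure: standard normal density with area $\alpha$ in the upper tail to the right of $z_\alpha$]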
13
Critical values for the standard Normal
distribution
  • $P[Z > z_\alpha] = \alpha$
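Critical values of the standard normal distribution are usually read from a table, but they can also be computed. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `upper_critical_value` is an assumption made for illustration.

```python
from statistics import NormalDist


def upper_critical_value(alpha: float) -> float:
    """Return z_alpha, the point with P[Z > z_alpha] = alpha
    for a standard normal random variable Z."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # inv_cdf returns the point with area (1 - alpha) to its left,
    # which leaves area alpha in the upper tail.
    return NormalDist().inv_cdf(1.0 - alpha)


for alpha in (0.05, 0.025, 0.005):
    print(f"alpha = {alpha}: z_alpha = {upper_critical_value(alpha):.3f}")
```

For a two-sided interval or test at level $\alpha$, the function is called with $\alpha/2$, for example 0.025 for 95% confidence.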

14
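  • Confidence Intervals for a proportion p

Let $t_1 = \hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
and $t_2 = \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
Then $t_1$ to $t_2$ is a $(1-\alpha)100\% = P100\%$ confidence
interval for p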
15
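  • Logic

$z = \dfrac{\hat{p}-p}{\sqrt{p(1-p)/n}}$ has approximately a
Standard Normal distribution, so
$P\left[-z_{\alpha/2} \le z \le z_{\alpha/2}\right] \approx 1-\alpha$.
Estimating the standard error with $\hat{p}$ and solving
the inequalities for p gives
$P\left[t_1 \le p \le t_2\right] \approx 1-\alpha$, where
$t_1, t_2 = \hat{p} \mp z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.
Thus $t_1$ to $t_2$ is a $(1-\alpha)100\% = P100\%$ confidence
interval for p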
16
Example
  • Suppose we are interested in determining the
    success rate of a new drug for reducing Blood
    Pressure
  • The new drug is given to n = 70 patients with
    abnormally high Blood Pressure
  • Of these patients, X = 63 were able to reduce
    the abnormally high level of Blood Pressure
  • The proportion of patients able to reduce the
    abnormally high level of Blood Pressure was
    $\hat{p} = X/n = 63/70 = 0.900$

17
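If P = $1-\alpha$ = 0.95 then $\alpha/2$ = 0.025
and $z_{\alpha/2}$ = 1.960
  • Then

$t_1 = 0.900 - 1.960\sqrt{0.900(0.100)/70} = 0.900 - 0.0703 = 0.8297$
and
$t_2 = 0.900 + 1.960\sqrt{0.900(0.100)/70} = 0.900 + 0.0703 = 0.9703$
Thus a 95% confidence interval for p is 0.8297 to
0.9703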
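The interval in this example can be reproduced with a few lines of code. The following is a minimal sketch of the large-sample (normal approximation) interval for a proportion; the function name `proportion_confidence_interval` is an assumption made for illustration, and the inputs X = 63 and n = 70 come from the example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(x: int, n: int, confidence: float = 0.95):
    """Large-sample confidence interval (t1, t2) for a proportion,
    based on x successes observed in n trials."""
    p_hat = x / n
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


t1, t2 = proportion_confidence_interval(63, 70)
print(f"95% confidence interval for p: {t1:.4f} to {t2:.4f}")
```

The normal approximation is least reliable when the sample proportion is close to 0 or 1 or when n is small, so the sketch is best read as an illustration of the formula rather than a general-purpose routine.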
18
What is the probability that p is between 0.8297
and 0.9703?
Is it 95%?
Answer: p (unknown), 0.8297 and 0.9703 are
numbers. Either p is between 0.8297 and 0.9703
or it is not. The 95% refers to the success of the
confidence interval procedure prior to the
collection of the data. After the data is
collected, the procedure was either successful in
capturing p or it was not.
19
Statistical Inference
  • Making decisions regarding the population based on
    a sample

20
Two Areas of Statistical Inference
  • Estimation
  • Hypothesis Testing

21
Estimation
  • Definitions
  • An estimator of an unknown parameter is a sample
    statistic used for this purpose
  • An estimate is the value of the estimator after
    the data is collected
  • The performance of an estimator is assessed by
    determining its sampling distribution and
    measuring its closeness to the parameter being
    estimated

22
Confidence Intervals
  • Estimation of a parameter by a range of values
    (an interval)

23
Estimation by Confidence Intervals
  • Definition
  • A 100P% confidence interval of an unknown
    parameter is a pair of sample statistics ($t_1$ and
    $t_2$) having the following properties
  1. $P[t_1 < t_2] = 1$. That is, $t_1$ is always smaller
    than $t_2$.
  2. $P[\text{the unknown parameter lies between } t_1 \text{ and } t_2] = P$.
  • The statistics $t_1$ and $t_2$ are random variables
  • Property 2 states that the probability that the
    unknown parameter is bounded by the two
    statistics $t_1$ and $t_2$ is P.

24
Confidence Interval for a Proportion
  • $100(1-\alpha)\%$ Confidence Interval for the
    population proportion:
    $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

Interpretation: For about $100(1-\alpha)\%$ of all
randomly selected samples from the population,
the confidence interval computed in this manner
captures the population proportion.
25
Comment
  • The usual choices of $\alpha$ are 0.05 and 0.01
  • In this case the level of confidence,
    $100(1-\alpha)\%$, is 95% and 99% respectively
  • Also the tabled value $z_{\alpha/2}$ is
  • $z_{0.025}$ = 1.960 and
  • $z_{0.005}$ = 2.576 respectively

26
Example
  • Suppose we are interested in determining the
    success rate of a new drug for reducing Blood
    Pressure
  • The new drug is given to n = 70 patients with
    abnormally high Blood Pressure
  • Of these patients, X = 63 were able to reduce
    the abnormally high level of Blood Pressure
  • The proportion of patients able to reduce the
    abnormally high level of Blood Pressure was
    $\hat{p} = X/n = 63/70 = 0.900$

27
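If P = $1-\alpha$ = 0.95 then $\alpha/2$ = 0.025
and $z_{\alpha/2}$ = 1.960
  • Then

$t_1 = 0.900 - 1.960\sqrt{0.900(0.100)/70} = 0.900 - 0.0703 = 0.8297$
and
$t_2 = 0.900 + 1.960\sqrt{0.900(0.100)/70} = 0.900 + 0.0703 = 0.9703$
Thus a 95% confidence interval for p is 0.8297 to
0.9703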
28
What is the probability that p is between 0.8297
and 0.9703?
Is it 95%?
Answer: p (unknown), 0.8297 and 0.9703 are
numbers. Either p is between 0.8297 and 0.9703
or it is not. The 95% refers to the success of the
confidence interval procedure prior to the
collection of the data. After the data is
collected, the procedure was either successful in
capturing p or it was not.
29
Error Bound
For a $(1-\alpha)$ confidence level, the approximate
margin of error in a sample proportion is
$B = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
30
Factors that Determine the Error Bound
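  • 1. The sample size, n. When sample size
    increases, margin of error decreases.

2. The sample proportion, $\hat{p}$. The margin of
error is largest when $\hat{p}$ = 0.5 and decreases as
$\hat{p}$ moves towards 0 or 1.
3. The multiplier $z_{\alpha/2}$. Connected to the
$(1-\alpha)$ level of confidence of the Error Bound.
The value of $z_{\alpha/2}$ for a 95% level of confidence
is 1.96. This value is changed to change the level
of confidence.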
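The effect of these factors on the error bound $z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ can be seen numerically. The following is a minimal sketch; the function name `error_bound` and the sample sizes and proportion used in the loop are assumptions chosen for illustration, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist


def error_bound(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Approximate margin of error of a sample proportion p_hat
    computed from a sample of size n."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    return z * sqrt(p_hat * (1.0 - p_hat) / n)


# Larger samples give a smaller error bound (p_hat held at 0.5).
for n in (100, 400, 1600):
    print(f"n = {n:5d}: B = {error_bound(0.5, n):.4f}")

# A higher level of confidence gives a larger multiplier and error bound.
for confidence in (0.95, 0.99):
    print(f"confidence = {confidence}: B = {error_bound(0.5, 100, confidence):.4f}")
```

Because n enters through a square root, quadrupling the sample size only halves the error bound.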
31
Determination of Sample Size
  • In almost all research situations the researcher
    is interested in the question

How large should the sample be?
32
Answer
  • Depends on
  • How accurate you want the answer.

Accuracy is specified by
  • Specifying the magnitude of the error bound
  • Level of confidence

33
Error Bound
  • If we have specified the level of confidence then
    the value of $z_{\alpha/2}$ will be known.
  • If we have specified the magnitude of B, it will
    also be known

Since $B = z_{\alpha/2}\sqrt{p(1-p)/n}$, solving for n we get
$n = z_{\alpha/2}^2\,p(1-p)/B^2$
34
Summarizing
  • The sample size that will estimate p with an
    Error Bound B and level of confidence P = $1-\alpha$
    is $n = z_{\alpha/2}^2\,p^*(1-p^*)/B^2$
  • where
  • B is the desired Error Bound
  • $z_{\alpha/2}$ is the $\alpha/2$ critical value for the
    standard normal distribution
  • $p^*$ is some preliminary estimate of p.
  • If you do not have a preliminary estimate of p,
    use $p^*$ = 0.50

35
Reason
For $p^*$ = 0.50 the product $p^*(1-p^*)$ is largest, so
n will take on the largest value.
Thus using $p^*$ = 0.50, n may be larger than
required if p is not 0.50, but will give the
desired accuracy or better for all values of p.
36
Example
  • Suppose that I want to conduct a survey and want
    to estimate p = proportion of voters who favour a
    downtown location for a casino
  • I know that the approximate value of p is
  • $p^*$ = 0.50. This is also a good choice for $p^*$ if
    one has no preliminary estimate of its value.
  • I want the survey to estimate p with an error
    bound B = 0.01 (1 percentage point)
  • I want the level of confidence to be 95% (i.e.
    $\alpha$ = 0.05 and $z_{\alpha/2} = z_{0.025}$ = 1.960)
  • Then $n = (1.960)^2(0.50)(0.50)/(0.01)^2 = 9604$
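The sample size for this survey follows from $n = z_{\alpha/2}^2\,p^*(1-p^*)/B^2$, rounded up to a whole number. The following is a minimal sketch; the function name `sample_size_for_proportion` and its parameter names are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(error_bound: float,
                               confidence: float = 0.95,
                               p_star: float = 0.50) -> int:
    """Smallest n that estimates a proportion within `error_bound`
    at the given confidence, using preliminary estimate p_star."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    return ceil(z ** 2 * p_star * (1.0 - p_star) / error_bound ** 2)


n_required = sample_size_for_proportion(error_bound=0.01)
print(f"required sample size: {n_required}")
```

Any preliminary estimate other than 0.50 gives a smaller n, which is why 0.50 is the conservative default when nothing is known about p.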

37
  • Confidence Intervals
  • for the mean, $\mu$,
  • of a Normal Population

38
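  • Confidence Intervals for the mean of a Normal
    Population, $\mu$

Let $t_1 = \bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$
and $t_2 = \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$
(with $\sigma$ estimated by the sample standard
deviation s when it is unknown).
Then $t_1$ to $t_2$ is a $(1-\alpha)100\% = P100\%$ confidence
interval for $\mu$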
39
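  • Logic

$z = \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}$
has a Standard Normal distribution
Hence
$P\left[-z_{\alpha/2} \le \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right] = 1-\alpha$
and, solving the inequalities for $\mu$,
$P\left[\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}\right] = 1-\alpha$
Thus $t_1$ to $t_2$ is a $(1-\alpha)100\% = P100\%$ confidence
interval for $\mu$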
40
Example
  • Suppose we are interested in the average Bone Mass
    Density (BMD) for women aged 70-75
  • A sample of n = 100 women aged 70-75 is selected
    and BMD is measured for each individual in the
    sample.
  • The average BMD for these individuals is
  • The standard deviation (s) of BMD for these
    individuals is

41
If P = $1-\alpha$ = 0.95 then $\alpha/2$ = 0.025
and $z_{\alpha/2}$ = 1.960
  • Then

and
Thus a 95% confidence interval for $\mu$ is 24.10 to
27.16
42
Determination of Sample Size
  • Again a question to be asked

How large should the sample be?
43
Answer
  • Depends on
  • How accurate you want the answer.

Accuracy is specified by
  • Specifying the magnitude of the error bound
  • Level of confidence

44
Error Bound
  • If we have specified the level of confidence then
    the value of $z_{\alpha/2}$ will be known.
  • If we have specified the magnitude of B, it will
    also be known

Since $B = z_{\alpha/2}\,\sigma/\sqrt{n}$, solving for n we get
$n = z_{\alpha/2}^2\,\sigma^2/B^2$
45
Summarizing
  • The sample size that will estimate $\mu$ with an
    Error Bound B and level of confidence P = $1-\alpha$
    is $n = z_{\alpha/2}^2\,(s^*)^2/B^2$
  • where
  • B is the desired Error Bound
  • $z_{\alpha/2}$ is the $\alpha/2$ critical value for the
    standard normal distribution
  • $s^*$ is some preliminary estimate of $\sigma$.

46
Notes
  • n increases as B, the desired Error Bound,
    decreases
  • Larger sample size required for higher level of
    accuracy
  • n increases as the level of confidence, $(1-\alpha)$,
    increases
  • $z_{\alpha/2}$ increases as $\alpha/2$ becomes closer to zero.
  • Larger sample size required for higher level of
    confidence
  • n increases as the standard deviation, $\sigma$, of the
    population increases.
  • If the population is more variable then a larger
    sample size is required

47
Summary
  • The sample size n depends on
  • Desired level of accuracy
  • Desired level of confidence
  • Variability of the population

48
Example
  • Suppose that one is interested in estimating the
    average number of grams of fat ($\mu$) in one
    kilogram of lean beef hamburger
  • This will be estimated by
  • randomly selecting one-kilogram samples, then
  • measuring the fat content for each sample.
  • Preliminary estimates of $\mu$ and $\sigma$ indicate
  • that $\mu$ and $\sigma$ are approximately 220 and 40
    respectively.
  • I want the study to estimate $\mu$ with an error
    bound B = 5
  • and
  • a level of confidence of 95% (i.e. $\alpha$ = 0.05
    and $z_{\alpha/2} = z_{0.025}$ = 1.960)

49
Solution
$n = z_{\alpha/2}^2\,(s^*)^2/B^2 = (1.960)^2(40)^2/(5)^2 = 245.9$
Hence n = 246 one-kilogram samples are required
to estimate $\mu$ within B = 5 g with a 95% level
of confidence.
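The same calculation can be scripted from $n = z_{\alpha/2}^2\,\sigma^2/B^2$, rounding up to the next whole sample. The following is a minimal sketch; the function name `sample_size_for_mean` is an assumption made for illustration, and the inputs (preliminary standard deviation 40, error bound 5) come from the example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(error_bound: float,
                         sigma_estimate: float,
                         confidence: float = 0.95) -> int:
    """Smallest n that estimates a normal mean within `error_bound`
    at the given confidence, using a preliminary estimate of sigma."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    return ceil((z * sigma_estimate / error_bound) ** 2)


n_samples = sample_size_for_mean(error_bound=5, sigma_estimate=40)
print(f"one-kilogram samples required: {n_samples}")
```

Halving the error bound or doubling the standard deviation roughly quadruples the required sample size, since both enter the formula squared.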
50
Statistical Inference
  • Making decisions regarding the population based on
    a sample

51
Decision Types
  • Estimation
  • Deciding on the value of an unknown parameter
  • Hypothesis Testing
  • Deciding whether a statement regarding an unknown
    parameter is true or false
  • Prediction
  • Deciding the future value of a random variable
  • All decisions will be based on the values of
    statistics

52
Estimation
  • Definitions
  • An estimator of an unknown parameter is a sample
    statistic used for this purpose
  • An estimate is the value of the estimator after
    the data is collected
  • The performance of an estimator is assessed by
    determining its sampling distribution and
    measuring its closeness to the parameter being
    estimated

53
Comments
  • When you use a single statistic to estimate a
    parameter it is called a point estimator
  • The estimate is a single value
  • The accuracy of this estimate cannot be
    determined from this value
  • A better way to estimate is with a confidence
    interval.
  • The width of this interval gives information on
    its accuracy

54
Estimation by Confidence Intervals
  • Definition
  • A 100P% confidence interval of an unknown
    parameter is a pair of sample statistics ($t_1$ and
    $t_2$) having the following properties
  1. $P[t_1 < t_2] = 1$. That is, $t_1$ is always smaller
    than $t_2$.
  2. $P[\text{the unknown parameter lies between } t_1 \text{ and } t_2] = P$.
  • The statistics $t_1$ and $t_2$ are random variables
  • Property 2 states that the probability that the
    unknown parameter is bounded by the two
    statistics $t_1$ and $t_2$ is P.

55
Confidence Intervals
Summary
56
Confidence Interval for a Proportion
57
Determination of Sample Size
  • The sample size that will estimate p with an
    Error Bound B and level of confidence P = $1-\alpha$
    is $n = z_{\alpha/2}^2\,p^*(1-p^*)/B^2$
  • where
  • B is the desired Error Bound
  • $z_{\alpha/2}$ is the $\alpha/2$ critical value for the
    standard normal distribution
  • $p^*$ is some preliminary estimate of p.

58
  • Confidence Intervals for the mean of a Normal
    Population, $\mu$

59
Determination of Sample Size
  • The sample size that will estimate $\mu$ with an
    Error Bound B and level of confidence P = $1-\alpha$
    is $n = z_{\alpha/2}^2\,(s^*)^2/B^2$
  • where
  • B is the desired Error Bound
  • $z_{\alpha/2}$ is the $\alpha/2$ critical value for the
    standard normal distribution
  • $s^*$ is some preliminary estimate of $\sigma$.

60
Hypothesis Testing
  • An important area of statistical inference

61
Definition
  • Hypothesis (H)
  • Statement about the parameters of the population
  • In hypothesis testing there are two hypotheses of
    interest.
  • The null hypothesis (H0)
  • The alternative hypothesis (HA)

62
  • Either
  • null hypothesis (H0) is true or
  • the alternative hypothesis (HA) is true.
  • But not both
  • We say that H0 and HA are mutually exclusive and
    exhaustive.

63
  • One has to make a decision
  • either to accept the null hypothesis (equivalent
    to rejecting HA)
  • or
  • to reject the null hypothesis (equivalent to
    accepting HA)

64
  • There are two possible errors that can be made.
  • Rejecting the null hypothesis when it is true.
    (type I error)
  • accepting the null hypothesis when it is false
    (type II error)

65
  • An analogy: a jury trial
  • The two possible decisions are
  • Declare the accused innocent.
  • Declare the accused guilty.

66
  • The null hypothesis (H0): the accused is
    innocent
  • The alternative hypothesis (HA): the accused is
    guilty

67
  • The two possible errors that can be made
  • Declaring an innocent person guilty.
  • (type I error)
  • Declaring a guilty person innocent.
  • (type II error)
  • Note: in this case one type of error may be
    considered more serious

68
Decision Table showing types of Error

              H0 is True          H0 is False
Accept H0     Correct Decision    Type II Error
Reject H0     Type I Error        Correct Decision
69
  • To define a statistical Test we
  • Choose a statistic (called the test statistic)
  • Divide the range of possible values for the test
    statistic into two parts
  • The Acceptance Region
  • The Critical Region

70
  • To perform a statistical Test we
  • Collect the data.
  • Compute the value of the test statistic.
  • Make the Decision
  • If the value of the test statistic is in the
    Acceptance Region we decide to accept H0 .
  • If the value of the test statistic is in the
    Critical Region we decide to reject H0 .

71
  • Example
  • We are interested in determining if a coin is
    fair.
  • i.e. H0: p = probability of tossing a head = ½.
  • To test this we will toss the coin n = 10 times.
  • The test statistic is x = the number of heads.
  • This statistic will have a binomial distribution
    with p = ½ and n = 10 if the null hypothesis is
    true.

72
Sampling distribution of x when H0 is true
73
  • Note
  • We would expect the test statistic x to be around
    5 if H0: p = ½ is true.
  • Acceptance Region: {3, 4, 5, 6, 7}.
  • Critical Region: {0, 1, 2, 8, 9, 10}.
  • The reason for the choice of the Acceptance
    Region:
  • It contains the values that we would expect for x if
    the null hypothesis is true.

74
  • Definitions: For any statistical testing
    procedure define
  • $\alpha$ = P[rejecting the null hypothesis when it is
    true] = P[type I error]
  • $\beta$ = P[accepting the null hypothesis when it is
    false] = P[type II error]

75
  • In the last example
  • $\alpha$ = P[type I error] = p(0) + p(1) + p(2) + p(8)
    + p(9) + p(10) = 0.109, where p(x) are binomial
    probabilities with p = ½ and n = 10.
  • $\beta$ = P[type II error] = p(3) + p(4) + p(5) + p(6)
    + p(7), where p(x) are binomial probabilities
    with p (not equal to ½) and n = 10. Note these
    will depend on the value of p.
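Both error probabilities for the coin example are sums of binomial probabilities and can be computed directly. The following is a minimal sketch; the function names and the list of alternative values of p are assumptions chosen for illustration (the original table of values is not in the transcript).

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P[X = x] for a binomial random variable with parameters n and p."""
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def type_one_error(acceptance_region, n: int = 10, p0: float = 0.5) -> float:
    """alpha = P[x falls outside the acceptance region when p = p0]."""
    return 1.0 - sum(binomial_pmf(x, n, p0) for x in acceptance_region)


def type_two_error(acceptance_region, p: float, n: int = 10) -> float:
    """beta = P[x falls inside the acceptance region when the true value is p]."""
    return sum(binomial_pmf(x, n, p) for x in acceptance_region)


region = range(3, 8)  # acceptance region {3, 4, 5, 6, 7}
print(f"alpha = {type_one_error(region):.3f}")
for p in (0.1, 0.3, 0.4, 0.6, 0.7, 0.9):  # illustrative alternatives
    print(f"p = {p}: beta = {type_two_error(region, p):.3f}")
```

The printed values of beta grow as p approaches ½, since a coin that is only slightly unfair is hard to tell apart from a fair one with 10 tosses.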

76
Table: Probability of a Type II error, $\beta$, vs. p
Note: the magnitude of $\beta$ increases as p gets
closer to ½.
77
  • Comments
  • You can control $\alpha$ = P[type I error] and $\beta$ =
    P[type II error] by widening or narrowing the
    acceptance region.
  • Widening the acceptance region decreases $\alpha$ =
    P[type I error] but increases $\beta$ = P[type II
    error].
  • Narrowing the acceptance region increases $\alpha$ =
    P[type I error] but decreases $\beta$ = P[type II
    error].

78
  • Example: Widening the Acceptance Region
  • Suppose the Acceptance Region includes, in
    addition to its previous values, 2 and 8. Then
    $\alpha$ = P[type I error] = p(0) + p(1) + p(9) + p(10)
    = 0.021, where again p(x) are binomial
    probabilities with p = ½ and n = 10.
  • $\beta$ = P[type II error] = p(2) + p(3) + p(4) + p(5)
    + p(6) + p(7) + p(8). Tabled values of $\beta$ are given
    on the next page.

79
Table: Probability of a Type II error, $\beta$, vs. p
Note: Compare these values with the previous
definition of the Acceptance Region. They have
increased.
80
  • Example: Narrowing the Acceptance Region
  • Suppose the original Acceptance Region excludes
    the values 3 and 7. That is, the Acceptance Region
    is {4, 5, 6}. Then $\alpha$ = P[type I error] = p(0)
    + p(1) + p(2) + p(3) + p(7) + p(8) + p(9) + p(10)
    = 0.344.
  • $\beta$ = P[type II error] = p(4) + p(5) + p(6).
    Tabled values of $\beta$ are given on the next page.

81
Table: Probability of a Type II error, $\beta$, vs. p
Note: Compare these values with the original
definition of the Acceptance Region. They have
decreased.
82
Acceptance Region {4,5,6}: $\alpha$ = 0.344
Acceptance Region {3,4,5,6,7}: $\alpha$ = 0.109
Acceptance Region {2,3,4,5,6,7,8}: $\alpha$ = 0.021
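The trade-off between the two error types across the three acceptance regions can be tabulated in one pass. The following is a minimal sketch for the 10-toss coin example; the function name `error_rates` and the alternative value p = 0.7 used for the type II error are assumptions chosen for illustration.

```python
from math import comb


def error_rates(acceptance_region, p_alt: float, n: int = 10, p0: float = 0.5):
    """Return (alpha, beta): alpha is computed at p0, beta at the
    alternative value p_alt, for a binomial test statistic."""
    def prob_in_region(p: float) -> float:
        return sum(comb(n, x) * p ** x * (1.0 - p) ** (n - x)
                   for x in acceptance_region)

    alpha = 1.0 - prob_in_region(p0)   # reject although H0 is true
    beta = prob_in_region(p_alt)       # accept although H0 is false
    return alpha, beta


regions = {
    "{4,...,6}": range(4, 7),
    "{3,...,7}": range(3, 8),
    "{2,...,8}": range(2, 9),
}
results = {label: error_rates(region, p_alt=0.7)
           for label, region in regions.items()}
for label, (alpha, beta) in results.items():
    print(f"Acceptance Region {label}: alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Reading down the output, alpha falls and beta rises as the acceptance region widens, which is the trade-off described in the comments on widening and narrowing.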
83
Hypothesis Testing
  • An important area of statistical inference

84
Definition
  • Hypothesis (H)
  • Statement about the parameters of the population
  • In hypothesis testing there are two hypotheses of
    interest.
  • The null hypothesis (H0)
  • The alternative hypothesis (HA)

85
  • Either
  • null hypothesis (H0) is true or
  • the alternative hypothesis (HA) is true.
  • But not both
  • We say that H0 and HA are mutually exclusive and
    exhaustive.

86
Decision Table showing types of Error

              H0 is True          H0 is False
Accept H0     Correct Decision    Type II Error
Reject H0     Type I Error        Correct Decision
87
  • The Approach in Statistical Testing is
  • Set up the Acceptance Region so that $\alpha$ is close
    to some predetermined value (the usual values are
    0.05 or 0.01)
  • The predetermined value of $\alpha$ (0.05 or 0.01) is
    called the significance level of the test.
  • The significance level of the test is $\alpha$ = P[test
    makes a type I error]

88
  • Determining the Critical Region
  • The Critical Region should consist of values of
    the test statistic that indicate that HA is true.
    (hence H0 should be rejected).
  • The size of the Critical Region is determined so
    that the probability of making a type I error, $\alpha$,
    is at some pre-determined level (usually 0.05 or
    0.01). This value is called the significance
    level of the test.
  • Significance level = P[test makes type I error]

89
  • To find the Critical Region
  • Find the sampling distribution of the test
    statistic when H0 is true.
  • Locate the Critical Region in the tails (either
    left or right or both) of the sampling
    distribution of the test statistic when H0 is
    true.
  • Whether you locate the critical region in the
    left tail or right tail or both tails depends on
    which values indicate HA is true.
  • The tails chosen = the values indicating HA.

90
  1. The size of the Critical Region is chosen so that
    the area over the critical region and under the
    sampling distribution of the test statistic when
    H0 is true is the desired level of $\alpha$ = P[type I
    error]

[Figure: sampling distribution of the test statistic when H0 is true; the area over the Critical Region is $\alpha$]
91
The z-test for Proportions
  • Testing the probability of success in a binomial
    experiment

92
Situation
  • A success-failure experiment has been repeated n
    times
  • The probability of success p is unknown. We want
    to test
  • H0: $p = p_0$ (some specified value of p)
  • Against
  • HA: $p \ne p_0$

93
The Data
  • The success-failure experiment has been repeated
    n times
  • The number of successes x is observed, giving the
    sample proportion $\hat{p} = x/n$.
  • Obviously if this proportion is close to $p_0$ the
    Null Hypothesis should be accepted; otherwise the
    Null Hypothesis should be rejected.

94
The Test Statistic
  • To decide to accept or reject the Null Hypothesis
    (H0) we will use the test statistic
    $z = \dfrac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$
  • If H0 is true we should expect the test statistic
    z to be close to zero.
  • If H0 is true we should expect the test statistic
    z to have a standard normal distribution.
  • If HA is true we should expect the test statistic
    z to be different from zero.

95
  • The sampling distribution of z when H0 is true
  • The Standard Normal distribution

Accept H0
96
  • The Acceptance region

Accept H0
97
  • Acceptance Region
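  • Accept H0 if $-z_{\alpha/2} \le z \le z_{\alpha/2}$
  • Critical Region
  • Reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
  • With this Choice P[type I error] = $\alpha$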

98
Summary
  • To Test for a binomial probability p
  • H0: $p = p_0$ (some specified value of p)
  • Against
  • HA: $p \ne p_0$
  • we
  • Decide on $\alpha$ = P[Type I Error] = the significance
    level of the test (usual choices 0.05 or 0.01)

99
  1. Collect the data
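  • Compute the test statistic
    $z = \dfrac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$
  • Make the Decision
  • Accept H0 if $-z_{\alpha/2} \le z \le z_{\alpha/2}$
  • Reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$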

100
Example
  • In the last provincial election the proportion of
    the voters who voted for the Liberal party was
    0.08 (8%)
  • The party is interested in determining if that
    percentage has changed
  • A sample of n = 800 voters is surveyed

101
  • We want to test
  • H0: p = 0.08 (8%)
  • Against
  • HA: $p \ne 0.08$

102
Summary
  • Decide on $\alpha$ = P[Type I Error] = the significance
    level of the test
  • Choose $\alpha$ = 0.05
  • Collect the data
  • The number in the sample that support the Liberal
    party is x = 92

103
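  • Compute the test statistic: with $\hat{p} = 92/800 = 0.115$,
    $z = \dfrac{0.115-0.08}{\sqrt{0.08(0.92)/800}} = 3.65$
  • Make the Decision
  • Accept H0 if $-1.960 \le z \le 1.960$
  • Reject H0 if $z < -1.960$ or $z > 1.960$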

104
  • Since the test statistic is in the Critical
    region we decide to Reject H0
  • Conclude that H0: p = 0.08 (8%) is false
  • There is a significant difference ($\alpha$ = 5%) in the
    proportion of the voters supporting the Liberal
    party in this election compared with the last
    election
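The voter example can be worked through in code using the test statistic $z = (\hat{p}-p_0)/\sqrt{p_0(1-p_0)/n}$ with a two-tailed critical region. The following is a minimal sketch; the function name `two_tailed_proportion_z_test` and its return layout are assumptions made for illustration, and the inputs x = 92, n = 800 and $p_0$ = 0.08 come from the example.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_proportion_z_test(x: int, n: int, p0: float, alpha: float = 0.05):
    """Two-tailed z-test of H0: p = p0 against HA: p != p0.

    Returns the test statistic, the critical value z_{alpha/2},
    and True if H0 is rejected."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)
    z_critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z, z_critical, abs(z) > z_critical


z, z_critical, reject = two_tailed_proportion_z_test(x=92, n=800, p0=0.08)
decision = "Reject H0" if reject else "Accept H0"
print(f"z = {z:.2f}, critical value = {z_critical:.3f}: {decision}")
```

Note that the standard error in the denominator uses $p_0$ rather than the sample proportion, because the sampling distribution is worked out under the assumption that H0 is true.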

105
The two-tailed z-test for Proportions
  • Testing the probability of success in a binomial
    experiment

106
Situation
  • A success-failure experiment has been repeated n
    times
  • The probability of success p is unknown. We want
    to test
  • H0: $p = p_0$ (some specified value of p)
  • Against
  • HA: $p \ne p_0$

107
The Test Statistic
  • To decide to accept or reject the Null Hypothesis
    (H0) we will use the test statistic
    $z = \dfrac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$

108
  • Acceptance Region
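  • Accept H0 if $-z_{\alpha/2} \le z \le z_{\alpha/2}$
  • Critical Region
  • Reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
  • With this Choice P[type I error] = $\alpha$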

109
  • The Acceptance region

Accept H0
110
The one tailed z-test
  • A success-failure experiment has been repeated n
    times
  • The probability of success p is unknown. We want
    to test
  • H0: $p \le p_0$ (some specified value of p)
  • Against
  • HA: $p > p_0$
  • The alternative hypothesis is in this case called
    a one-sided alternative

111
The Test Statistic
  • To decide to accept or reject the Null Hypothesis
    (H0) we will use the test statistic
    $z = \dfrac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$
  • If H0 is true we should expect the test statistic
    z to be close to zero or negative
  • If $p = p_0$ we should expect the test statistic z
    to have a standard normal distribution.
  • If HA is true we should expect the test statistic
    z to be a positive number.

112
  • The sampling distribution of z when $p = p_0$
  • The Standard Normal distribution

Reject H0
Accept H0
113
  • The Acceptance and Critical region

Reject H0
Accept H0
114
  • Acceptance Region
  • Accept H0 if $z \le z_\alpha$
  • Critical Region
  • Reject H0 if $z > z_\alpha$
  • The Critical Region is called one-tailed
  • With this Choice P[type I error] = $\alpha$ (when $p = p_0$)

115
Example
  • A new surgical procedure is developed for
    correcting heart defects in infants before the age
    of one month.
  • Previously the procedure was used on infants that
    were older than one month and the success rate
    was 91%
  • A study is conducted to determine if the success
    rate of the new procedure is greater than 91% (n =
    200)

116
  • We want to test
  • H0: $p \le 0.91$
  • Against
  • HA: $p > 0.91$

117
Summary
  • Decide on $\alpha$ = P[Type I Error] = the significance
    level of the test
  • Choose $\alpha$ = 0.05
  • Collect the data
  • The number of successful operations in the sample
    of 200 cases is x = 187

118
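  • Compute the test statistic: with $\hat{p} = 187/200 = 0.935$,
    $z = \dfrac{0.935-0.91}{\sqrt{0.91(0.09)/200}} = 1.24$
  • Make the Decision
  • Accept H0 if $z \le z_{0.05} = 1.645$
  • Reject H0 if $z > z_{0.05} = 1.645$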

119
  • Since the test statistic is in the Acceptance
    region we decide to Accept H0
  • There is no significant ($\alpha$ = 5%) increase in
    the success rate of the new procedure over the
    older procedure
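The surgical example uses the same test statistic as the two-tailed test, but the whole critical region sits in the upper tail. The following is a minimal sketch; the function name `upper_tailed_proportion_z_test` is an assumption made for illustration, and the inputs x = 187, n = 200 and $p_0$ = 0.91 come from the example.

```python
from math import sqrt
from statistics import NormalDist


def upper_tailed_proportion_z_test(x: int, n: int, p0: float, alpha: float = 0.05):
    """One-tailed z-test of H0: p <= p0 against HA: p > p0.

    Returns the test statistic, the critical value z_alpha,
    and True if H0 is rejected."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)
    # All of alpha goes in the upper tail, so use z_alpha, not z_{alpha/2}.
    z_critical = NormalDist().inv_cdf(1.0 - alpha)
    return z, z_critical, z > z_critical


z, z_critical, reject = upper_tailed_proportion_z_test(x=187, n=200, p0=0.91)
decision = "Reject H0" if reject else "Accept H0"
print(f"z = {z:.2f}, critical value = {z_critical:.3f}: {decision}")
```

Because the critical value for a one-tailed test at the same significance level is smaller (about 1.645 instead of 1.960), a one-tailed test rejects for less extreme positive values of z than a two-tailed test does.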

120
Comments
  • When the decision to accept H0 is made,
    one should not conclude that we have proven H0.
  • This is because when setting up the test we have
    not controlled $\beta$ = P[type II error] = P[accepting
    H0 when H0 is FALSE]
  • Whenever H0 is accepted there is a possibility
    that a type II error has been made.

121
In the last example
  • The conclusion that there is no significant
    ($\alpha$ = 5%) increase in the success rate of the new
    procedure over the older procedure should be
    interpreted as:
  • We have been unable to prove that the new
    procedure is better than the old procedure

122
Some other comments
  • When does one use a two-tailed test?
  • When does one use a one tailed test?
  • Answer: This depends on the alternative
    hypothesis HA.
  • Critical Region = values that indicate HA
  • Thus if only the upper tail indicates HA, the
    test is one tailed.
  • If both tails indicate HA, the test is two
    tailed.

123
Also
  • The alternative hypothesis HA usually corresponds
    to the research hypothesis (the hypothesis that
    the researcher is trying to prove)
  • The new procedure is better
  • The drug is effective in reducing levels of
    cholesterol.
  • There has been a change in political opinion from
    the time the survey was taken till the present time
    (time of current survey).

124
The z-test for the Mean of a Normal Population
  • Let $\mu$ denote the mean of a normal population;
    we want to test hypotheses about $\mu$

125
Situation
  • A sample of n observations is collected from a
    Normal distribution
  • The mean of the Normal distribution, $\mu$, is
    unknown. We want to test
  • H0: $\mu = \mu_0$ (some specified value of $\mu$)
  • Against
  • HA: $\mu \ne \mu_0$

126
The Data
  • Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from a
    normal population with mean $\mu$ and standard
    deviation $\sigma$.
  • Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ = the sample mean
  • We want to test if the mean, $\mu$, is equal to some
    given value $\mu_0$.
  • Obviously if the sample mean is close to $\mu_0$ the
    Null Hypothesis should be accepted; otherwise the
    Null Hypothesis should be rejected.

127
The Test Statistic
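  • To decide to accept or reject the Null Hypothesis
    (H0) we will use the test statistic
    $z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$
    (with $\sigma$ replaced by the sample standard
    deviation s when $\sigma$ is unknown)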
  • If H0 is true we should expect the test statistic
    z to be close to zero.
  • If H0 is true we should expect the test statistic
    z to have a standard normal distribution.
  • If HA is true we should expect the test statistic
    z to be different from zero.

128
  • The sampling distribution of z when H0 is true
  • The Standard Normal distribution

Accept H0
129
  • The Acceptance region

Accept H0
130
  • Acceptance Region
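  • Accept H0 if $-z_{\alpha/2} \le z \le z_{\alpha/2}$
  • Critical Region
  • Reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
  • With this Choice P[type I error] = $\alpha$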

131
Summary
  • To Test for the mean $\mu$ of a normal population
  • H0: $\mu = \mu_0$ (some specified value of $\mu$)
  • Against
  • HA: $\mu \ne \mu_0$
  • Decide on $\alpha$ = P[Type I Error] = the significance
    level of the test (usual choices 0.05 or 0.01)

132
  1. Collect the data
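  • Compute the test statistic
    $z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$
  • Make the Decision
  • Accept H0 if $-z_{\alpha/2} \le z \le z_{\alpha/2}$
  • Reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$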

133
Example
  • A manufacturer of glucosamine capsules claims that
    each capsule contains on average
  • 500 mg of glucosamine

To test this claim, n = 40 capsules were selected
and the amount of glucosamine (X) was measured in each
capsule.
Summary statistics
134
  • We want to test

H0: $\mu = 500$ (Manufacturer's claim is correct)
against
HA: $\mu \ne 500$ (Manufacturer's claim is not correct)
135
The Test Statistic
136
The Critical Region and Acceptance Region
Using $\alpha$ = 0.05
$z_{\alpha/2} = z_{0.025}$ = 1.960
We accept H0 if $-1.960 \le z \le 1.960$
reject H0 if $z < -1.960$ or $z > 1.960$
137
The Decision
Since z = -2.75 < -1.960 we reject H0
Conclude the manufacturer's claim is incorrect
138
Hypothesis Testing
A review of the concepts
139
  • In hypothesis testing there are two hypotheses
  • The Null Hypothesis (H0)
  • The Alternative Hypothesis (HA)
  • The alternative hypothesis is usually the
    research hypothesis - the hypothesis that the
    researcher is trying to prove.
  • The null hypothesis is the hypothesis that the
    research hypothesis is not true.

140
  • A statistical Test is defined by
  • Choosing a statistic (called the test statistic)
  • Dividing the range of possible values for the
    test statistic into two parts
  • The Acceptance Region
  • The Critical Region

141
  • To perform a statistical Test we
  • Collect the data.
  • Compute the value of the test statistic.
  • Make the Decision
  • If the value of the test statistic is in the
    Acceptance Region we decide to accept H0 .
  • If the value of the test statistic is in the
    Critical Region we decide to reject H0 .

142
  • You can compare a statistical test to a meter

[Figure: a meter showing the value of the test statistic, with the Acceptance Region in the middle and a Critical Region on each side]
Critical Region is the red zone of the meter
143
[Figure: meter with the value of the test statistic in the Acceptance Region]
Accept H0
144
[Figure: meter with the value of the test statistic in a Critical Region]
Reject H0
145
Acceptance Region
Critical Region
Sometimes the critical region is located on one
side. These tests are called one tailed tests.
146
  • Whether you use a one tailed test or a two tailed
    test depends on
  1. The hypotheses being tested (H0 and HA).
  2. The test statistic.

147
If only large positive values of the test
statistic indicate HA then the critical region
should be located in the positive tail. (1 tailed
test)
If only large negative values of the test
statistic indicate HA then the critical region
should be located in the negative tail. (1 tailed
test)
If both large positive and large negative values
of the test statistic indicate HA then the
critical region should be located in both the
positive and negative tails. (2 tailed test)
148
Usually 1 tailed tests are appropriate if HA is
one-sided.
Two tailed tests are appropriate if HA is
two-sided, but not always.
149
  • Once the test statistic is determined, to set up
    the critical region we have to find the sampling
    distribution of the test statistic when H0 is true

This describes the behaviour of the test
statistic when H0 is true
150
  • We then locate the critical region in the tails
    of the sampling distribution of the test
    statistic when H0 is true

[Figure: sampling distribution with area $\alpha/2$ in each tail]
The size of the critical region is chosen so that
the area over the critical region is $\alpha$.
151
  • This ensures that P[type I error] =
    P[rejecting H0 when true] = $\alpha$

[Figure: sampling distribution with area $\alpha/2$ in each tail]
152
  • To find P[type II error] = P[accepting H0 when
    false] = $\beta$, we need to find the sampling
    distribution of the test statistic when H0 is
    false

[Figure: sampling distributions of the test statistic when H0 is true (area $\alpha/2$ in each tail) and when H0 is false (area $\beta$ over the Acceptance Region)]
153
The p-value approach to Hypothesis Testing
154
In hypothesis testing we need
  1. A test statistic
  2. A Critical and Acceptance region for the test
    statistic

The Critical Region is set up under the sampling
distribution of the test statistic, with area $\alpha$
(0.05 or 0.01) above the critical region. The
critical region may be one tailed or two tailed
155
  • The Critical region

[Figure: sampling distribution with area $\alpha/2$ in each tail (the Critical Region) and Accept H0 in the middle]
156
The test is carried out by
  • Computing the value of the test statistic
  • Making the decision
  • Reject if the value is in the Critical region and
  • Accept if the value is in the Acceptance region.

157
The value of the test statistic may be in the
Acceptance region but close to being in the
Critical region, or it may be in the Critical
region but close to being in the Acceptance
region.
To measure this we compute the p-value.
158
Definition: Once the test statistic has been
computed from the data, the p-value is defined to
be
p-value = P[the test statistic is as or more
extreme than the observed value of the test
statistic]
"More extreme" means giving stronger evidence for
rejecting H0
159
Example: Suppose we are using the z-test for
the mean $\mu$ of a normal population and $\alpha$ = 0.05.
$z_{0.025}$ = 1.960
Thus the critical region is to reject H0 if z <
-1.960 or z > 1.960. Suppose z = 2.3; then
we reject H0
p-value = P[the test statistic is as or more
extreme than the observed value of the test
statistic] = P[z > 2.3] + P[z < -2.3]
= 0.0107 + 0.0107 = 0.0214
160
[Figure: standard normal density with the p-value shaded in the two tails beyond -2.3 and 2.3]
161
If the value of z = 1.2, then we accept H0
p-value = P[the test statistic is as or more
extreme than the observed value of the test
statistic] = P[z > 1.2] + P[z < -1.2]
= 0.1151 + 0.1151 = 0.2302
There is a 23.02% chance that the test statistic is
as or more extreme than 1.2. This is fairly high,
hence 1.2 is not very extreme
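The two p-values in these examples can be computed from the standard normal distribution function instead of a table. The following is a minimal sketch; the function name `two_tailed_p_value` is an assumption made for illustration.

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """P[|Z| >= |z|] for a standard normal Z: the probability of a test
    statistic as or more extreme than z, in either tail."""
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * upper_tail


for z in (2.3, 1.2):
    p_value = two_tailed_p_value(z)
    decision = "reject H0" if p_value < 0.05 else "accept H0"
    print(f"z = {z}: p-value = {p_value:.4f} ({decision} at the 0.05 level)")
```

The hand calculation for z = 1.2 doubles the rounded table value 0.1151, so an unrounded computation can differ from 0.2302 in the fourth decimal place. Comparing the p-value with the significance level gives the same decision as checking whether z falls in the critical region.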
162
[Figure: standard normal density with the p-value shaded in the two tails beyond -1.2 and 1.2]
163
Properties of the p-value
  1. If the p-value is small (< 0.05 or < 0.01) H0 should
    be rejected.
  2. The p-value measures the plausibility of H0.
  3. If the test is two tailed the p-value should be
    two tailed.
  4. If the test is one tailed the p-value should be
    one tailed.
  5. It is customary to report p-values when reporting
    the results. This gives the reader some idea of
    the strength of the evidence for rejecting H0

164
Summary
  • A common way to report statistical tests is to
    compute the p-value.
  • If the p-value is small (< 0.05 or < 0.01) then
    H0 is rejected.
  • If the p-value is extremely small this gives a
    strong indication that HA is true.
  • If the p-value is marginally above the threshold
    0.05 then we cannot reject H0 but there would be
    a suspicion that H0 is false.

165
Next topic: Student's t-test