Title: SLIDES PREPARED
1STATISTICS for the Utterly Confused, 2nd ed.
- SLIDES PREPARED
- By
- Lloyd R. Jaisingh Ph.D.
- Morehead State University
- Morehead KY
2Chapter 14
3Outline
- Do I Need to Read This Chapter? You should read
the chapter if you would like to learn about:
- 14-1 Properties of the chi-square distribution.
- 14-2 The chi-square test for goodness-of-fit.
- 14-3 The chi-square test for independence.
- 14-4 Benford's Law.
4Objectives
- To introduce you to the chi-square distribution.
- To use the chi-square distribution to perform
tests for goodness-of-fit and independence.
5Objectives
- To introduce you to Benford's Law.
- To introduce technology integration for
chi-square tests.
614-1 The Chi-Square ($\chi^2$) Distribution
- Properties
- It is a continuous distribution.
- It is not symmetric.
- It is skewed to the right.
- The distribution depends on the degrees of
freedom, $df = n - 1$, where $n$ is the sample size.
714-1 The Chi-Square ($\chi^2$) Distribution
- Properties
- The value of a $\chi^2$ random variable is always
nonnegative.
- There are infinitely many $\chi^2$ distributions, since
each is uniquely defined by its degrees of
freedom.
814-1 The Chi-Square ($\chi^2$) Distribution
- Properties
- For small sample sizes, the $\chi^2$ distribution is
strongly skewed to the right.
- As $n$ increases, the $\chi^2$ distribution becomes more
and more symmetrical.
914-1 The Chi-Square ($\chi^2$) Distribution
- Properties
Family of $\chi^2$ distributions.
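The way the family becomes more symmetric can be checked numerically. The sketch below assumes the standard result that a $\chi^2$ distribution with $k$ degrees of freedom has skewness $\sqrt{8/k}$; the function name `chi2_skewness` is illustrative and does not come from the text.

```python
import math

def chi2_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom
    (standard result: sqrt(8 / df))."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return math.sqrt(8.0 / df)

# Skewness falls toward 0 (perfect symmetry) as the degrees of freedom grow.
for df in (1, 2, 5, 10, 30, 100):
    print(f"df = {df:3d}  skewness = {chi2_skewness(df):.3f}")
```

The skewness stays positive for every choice of degrees of freedom, so the distribution is never exactly symmetric; it only approaches symmetry.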
1014-1 The Chi-Square ($\chi^2$) Distribution
- Properties
- Since we will be using the $\chi^2$ distribution for
the tests in this chapter, we will need to be
able to find critical values associated with the
distribution.
11Quick Tip
- Extensive tables of critical values are
available for solving confidence interval
and hypothesis testing problems that
are associated with the $\chi^2$ distribution.
1214-1 The Chi-Square ($\chi^2$) Distribution
- Properties
- Notation: $\chi^2_{\alpha,\,n-1}$
- Explanation of the notation: $\chi^2_{\alpha,\,n-1}$
is a $\chi^2$ value with $n - 1$ degrees of freedom
such that an area of $\alpha$ lies to the right of the
corresponding $\chi^2$ value.
1314-1 The Chi-Square ($\chi^2$) Distribution
- Properties
Diagram explaining the notation $\chi^2_{\alpha,\,n-1}$.
1414-1 The Chi-Square ($\chi^2$) Distribution
- Properties
- Values for the random variable with the
appropriate degrees of freedom can be obtained
from the tables in the appendix of the text
(Table 4).
- Example: What is the value of $\chi^2_{0.05,\,10}$?
1514-1 The Chi-Square ($\chi^2$) Distribution
- Properties
- Solution: From Table 4 in the appendix, $\chi^2_{0.05,\,10} = 18.307$. (Verify.)
- Example: What is the value of $\chi^2_{0.95,\,20}$?
- Solution: From Table 4 in the appendix, $\chi^2_{0.95,\,20} = 10.851$. (Verify.)
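Where a printed table is not at hand, the same critical values can be approximated numerically. The following is a minimal sketch using only the Python standard library: it evaluates the chi-square cumulative distribution through the series for the regularized incomplete gamma function, then bisects for the value with the required right-tail area. The function names `chi2_cdf` and `chi2_critical` are illustrative assumptions, not part of the text; a statistics package would normally be used instead.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom,
    using the series expansion of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > total * 1e-15:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_critical(alpha, df):
    """Value c such that the area to the right of c equals alpha."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:   # widen until the tail is small enough
        hi *= 2.0
    for _ in range(100):                    # bisection on the right-tail area
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(f"{chi2_critical(0.05, 10):.3f}")  # compare with the table value 18.307
print(f"{chi2_critical(0.95, 20):.3f}")  # compare with the table value 10.851
```

The series is adequate for the moderate values that appear in a critical-value table; it is not intended for extreme tail probabilities.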
1614-2 The Chi-Square test for Goodness of Fit
- Have you ever wondered whether a sample of
observed data (frequency distribution or
proportions) fits some pattern or distribution?
- We should not expect the pattern to exactly fit a
given distribution, so we can look for
differences and draw conclusions as to the
goodness-of-fit of the data.
1714-2 The Chi-Square test for Goodness of Fit
- From the Figure on the next slide, one can
clearly see that the pattern of the sample data
does not quite follow the distribution of the
population.
- As a matter of fact, the sample data deviates
quite severely from the population distribution.
1814-2 The Chi-Square test for Goodness of Fit
1914-2 The Chi-Square test for Goodness of Fit
- Hence one may intuitively conclude in this case
that the sample data did not come from the
population to which it is compared because of the
large deviations of the sample distribution from
the population distribution.
2014-2 The Chi-Square test for Goodness of Fit
- From the Figure on the next slide, one can
observe that the sample distribution follows
the population distribution quite closely.
- In this case, one may intuitively conclude that
the sample data did come from the population to
which it is compared because of the very small
deviation of the sample distribution from the
population distribution.
2114-2 The Chi-Square test for Goodness of Fit
2214-2 The Chi-Square test for Goodness of Fit
- Generally, we can assume that a good fit exists.
- That is, we can propose a hypothesis that a
specified theoretical distribution is appropriate
to model the pattern.
- Below is a summary of the tests for
goodness-of-fit.
2314-2 The Chi-Square test for Goodness of Fit
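In standard notation, which may differ from the notation of the original summary, the goodness-of-fit test with $k$ categories, observed frequencies $O_i$, and expected frequencies $E_i$ can be stated as follows. The symbols $k$, $O_i$, and $E_i$ are assumptions introduced here; the worked example later in this section writes the number of categories as $n$.

```latex
\begin{aligned}
H_0 &: \text{the data follow the specified distribution} \\
H_1 &: \text{the data do not follow the specified distribution} \\
\chi^2 &= \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad df = k - 1 \\
\text{Reject } H_0 &\text{ if } \chi^2 > \chi^2_{\alpha,\,k-1}
\end{aligned}
```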
24Quick Tip
- The chi-square goodness of fit test is always a
right-tailed test.
25Quick Tip
- For the chi-square goodness-of-fit test, the
expected frequencies should be at least 5.
- When the expected frequency of a class or
category is less than 5, this class or category
can be combined with another class or category so
that the expected frequency is at least 5.
26EXAMPLE
- Example: There are 4 TV sets that are located in
the student center of a large university. At a
particular time each day, four different soap
operas (1, 2, 3, and 4) are viewed on these TV
sets. The percentages of the audience captured
by these shows during one semester were 25
percent, 30 percent, 25 percent, and 20 percent,
respectively. During the first week of the
following semester, 300 students are surveyed.
27EXAMPLE (Continued)
- (a) If the viewing pattern has not changed, what
number of students is expected to watch each soap
opera?
- Solution: Based on the information, the expected
values will be $0.25 \times 300 = 75$, $0.30 \times 300 = 90$,
$0.25 \times 300 = 75$, and $0.20 \times 300 = 60$.
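The arithmetic in part (a) can be sketched in a few lines of Python. The shares and the sample size come from the example; the function name `expected_counts` is illustrative.

```python
# Audience shares from the previous semester (from the example).
shares = [0.25, 0.30, 0.25, 0.20]
n_students = 300  # students surveyed in the following semester

def expected_counts(proportions, total):
    """Expected frequency for each category under the null hypothesis."""
    return [p * total for p in proportions]

expected = expected_counts(shares, n_students)
for show, count in enumerate(expected, start=1):
    print(f"Soap opera {show}: expected {count:.0f} viewers")
```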
28EXAMPLE (Continued)
- (b) Suppose that the actual observed numbers of
students viewing the soap operas are given in the
following table. Test, at the 1 percent level of
significance, whether these numbers indicate a
change.
29EXAMPLE (Continued)
- Solution: Given $\alpha = 0.01$, $n = 4$, $df = 4 - 1 = 3$,
$\chi^2_{0.01,\,3} = 11.345$. The observed and
expected frequencies are given below.
30EXAMPLE (Continued)
- Solution (continued): The $\chi^2$ test statistic is
computed below.
31EXAMPLE (Continued)
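The computation of the test statistic can be sketched as follows. The expected counts and the critical value 11.345 come from the example, but the observed counts in this sketch are HYPOTHETICAL placeholders chosen only to sum to 300, so the printed result says nothing about the example's actual conclusion. The function name `chi_square_statistic` is illustrative.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [75, 90, 75, 60]   # from part (a) of the example
observed = [70, 95, 80, 55]   # HYPOTHETICAL counts, not the data from the slides
critical_value = 11.345       # chi-square critical value for alpha = 0.01, df = 3

statistic = chi_square_statistic(observed, expected)
reject_null = statistic > critical_value  # right-tailed test
print(f"chi-square = {statistic:.3f}, reject H0: {reject_null}")
```

To reproduce the example itself, replace `observed` with the counts from the table of observed frequencies.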
32EXAMPLE (Continued)
Diagram showing the rejection region.
3314-3 The Chi-Square test for Independence
- The chi-square independence test can be used to
test for the independence between two variables.
34EXAMPLE
- Example: A survey was done by a car manufacturer
concerning a particular make and model. A group
of 500 potential customers were asked whether
they purchased their current car because of its
appearance, its performance rating, or its fixed
price (no negotiating). The results, broken down
by gender responses, are given on the next slide.
35EXAMPLE (Continued)
Question: Do females feel differently than males
about the three different criteria used in
choosing a car, or do they feel basically the
same?
36EXAMPLE (Continued)
- One way of answering this question is to
determine whether the criterion used in buying a
car is independent of gender.
37EXAMPLE (Continued)
- That is, we can do a test for independence.
- Thus the null hypothesis will be that the
criterion used is independent of gender, while
the alternative hypothesis will be that the
criterion used is dependent on gender.
38Quick Tips
- When data are arranged in tabular form for the
chi-square independence test, the table is called
a contingency table.
- Here the table on slide 35 has 2 rows and 3
columns, so we say we have a 2 by 3 ($2 \times 3$)
contingency table.
39Quick Tips
- The degrees of freedom for any contingency table
are given by
$df = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$.
- In this example, $df = (2 - 1) \times (3 - 1) = 2$.
40EXAMPLE (Continued)
- In order to test for independence using the
chi-square independence test, we must compute
expected values under the assumption that the
null hypothesis is true.
- To find these expected values, we need to compute
the row totals and the column totals.
41EXAMPLE (Continued)
- The table on the next slide shows the observed
frequencies with the row and column totals.
- These row and column totals are called marginal totals.
42EXAMPLE (Continued)
43EXAMPLE (Continued)
- Computation of the expected values (example):
- The total for the first row (male) is 185, and
the total for the first column (appearance) is
180. The expected value for the cell in the
table where the first row (male) and first column
(appearance) intersect will be $(185 \times 180)/500 = 66.6$.
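The rule used here, row total times column total divided by the grand total, can be sketched as a small function. The three totals are those quoted in the example; the function name `expected_cell` is illustrative.

```python
def expected_cell(row_total, column_total, grand_total):
    """Expected count for one cell when rows and columns are independent."""
    return row_total * column_total / grand_total

male_total = 185        # first row total (from the example)
appearance_total = 180  # first column total (from the example)
grand_total = 500       # customers surveyed

print(expected_cell(male_total, appearance_total, grand_total))  # 66.6
```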
44EXAMPLE (Continued)
- The table on the next slide shows the expected
frequencies with the marginal totals.
45EXAMPLE (Continued)
Let us use $\alpha = 0.01$. So $df = (2 - 1)(3 - 1) = 2$
and $\chi^2_{0.01,\,2} = 9.210$.
46EXAMPLE (Continued)
- Solution (continued): The $\chi^2$ test statistic is
computed in the same manner as was done for the
goodness-of-fit test.
47EXAMPLE (Continued)
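The whole independence computation can be sketched for a table of any size. The observed counts below are HYPOTHETICAL: they were chosen only so that the marginal totals quoted in the example (185 for the first row, 180 for the first column, 500 overall) are matched, and they are not the survey's actual data. The function name `independence_test` and the column labels in the comments are likewise illustrative.

```python
def independence_test(table):
    """Return (statistic, df, expected) for a contingency table given as
    a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# HYPOTHETICAL counts: appearance, performance, fixed price.
observed = [
    [60, 75, 50],    # male (row total 185)
    [120, 95, 100],  # female (row total 315)
]
critical_value = 9.210  # chi-square critical value for alpha = 0.01, df = 2

statistic, df, expected = independence_test(observed)
print(f"chi-square = {statistic:.3f}, df = {df}")
print("Reject H0:", statistic > critical_value)
```

To reproduce the example itself, replace `observed` with the survey counts from the contingency table.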
48EXAMPLE (Continued)
- Solution (continued): Diagram showing the
rejection region.
4914-4 Benford's Law
- Frank Benford, in the 1930s, noticed that
logarithm tables (these were used by scientists
long before the common use of computers and
calculators) tended to be worn out on the early
pages where the numbers started with the digit 1.
5014-4 Benford's Law
- Based on this observation and many others, he
discovered that more numbers in the real world
started with the digit 1 rather than with 2, and
that more started with the digit 2 rather than
with 3, and so on.
- He later published a formula which describes the
proportion of times a number will begin with the
digit 1, 2, 3, etc.
5114-4 Benford's Law
- This formula is now called Benford's Law.
- The Table on the next slide shows the
distribution of the proportions, to three decimal
places, for the leading digits of numbers based
on Benford's Law.
5214-4 Benford's Law
The next slide shows a graphical depiction of
Benford's Law.
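Benford's Law is usually stated as $P(d) = \log_{10}(1 + 1/d)$ for a leading digit $d = 1, 2, \ldots, 9$. Assuming that form, the proportions to three decimal places can be regenerated with the short sketch below; the names `benford_proportion` and `benford_table` are illustrative.

```python
import math

def benford_proportion(digit):
    """Proportion of numbers whose leading digit is `digit` (1 through 9),
    using the usual statement of Benford's Law: log10(1 + 1/d)."""
    if digit not in range(1, 10):
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Proportions rounded to three decimal places, one entry per leading digit.
benford_table = {d: round(benford_proportion(d), 3) for d in range(1, 10)}
for digit, proportion in benford_table.items():
    print(f"{digit}: {proportion:.3f}")
```

The unrounded proportions sum to exactly 1, since the nine logarithms telescope to $\log_{10} 10$.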
5314-4 Benford's Law
5414-4 Benford's Law
- Example: Students who attend college and apply
for student loans must submit a FAFSA (Free
Application for Federal Student Aid) form. Part
of the information that is required is the annual
income of the parent or parents. A sample of
3,633 forms was selected from a college's records,
and the proportions, to three decimal places, of
the leading digits for the total annual income
of the parents were recorded. This information
is presented on the next slide.
5514-4 Benford's Law
- Test at the 5 percent significance level whether
the distribution of the first digits for the
reported total salaries for the parents follows
Benford's Law.
5614-4 Benford's Law
- Solution: Plots of the proportions of the leading
digits for both Benford's Law and the parents'
salaries are shown below.
5714-4 Benford's Law
- Solution (continued): The Table on the next slide
shows the computations needed to compute the $\chi^2$
test statistic.
- The value of the test statistic is equal to
507.527.
- To obtain the expected frequencies based on
Benford's Law, one should multiply the total of
3,633 by Benford's proportions.
- For example, from the table, the expected
frequency value of 639.408 is obtained from
$3{,}633 \times 0.176 = 639.408$, etc.
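The expected frequencies and the decision step can be sketched as follows. The sample size, the test statistic 507.527, and the three-decimal rounding come from the example. The critical value 15.507 is the standard table value for $\alpha = 0.05$ with 8 degrees of freedom; it is an assumption added here, since it is not quoted in the slides, as is the usual $\log_{10}(1 + 1/d)$ form of Benford's Law.

```python
import math

n_forms = 3633  # forms sampled (from the example)

# Benford proportions rounded to three decimals, as in the example's table.
proportions = [round(math.log10(1 + 1 / d), 3) for d in range(1, 10)]
expected = [n_forms * p for p in proportions]

test_statistic = 507.527   # value reported in the solution
df = len(proportions) - 1  # 9 leading digits give 8 degrees of freedom
critical_value = 15.507    # assumed table value for alpha = 0.05, df = 8

for digit, e in enumerate(expected, start=1):
    print(f"digit {digit}: expected frequency {e:.3f}")
print("Reject H0:", test_statistic > critical_value)
```

Because the goodness-of-fit test is right-tailed, the null hypothesis is rejected whenever the statistic exceeds the critical value.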
5814-4 Benford's Law
5914-4 Benford's Law
60EXAMPLE (Continued)
- Solution (continued): Diagram showing the
rejection region.