Title: Nonparametric Tests of Significance
1Nonparametric Tests of Significance
2Nonparametric Tests
- Nonparametric tests are tests that do not test hypotheses about population parameters.
- We generally use nonparametric tests of significance under one of two conditions:
- 1) The variable we are measuring cannot be tested with parametric tests.
- 2) The underlying assumptions of parametric tests are not fulfilled. For example, the shape of the distribution is not normal (particularly if N is small).
3- The first condition typically applies when we have categorical variables (nominal scale data) or ordinally scaled variables.
- Nominal scale data represent classes or categories. They have no quantitative properties (e.g., political party, religion, gender, numbers assigned to horses in a horse race).
- Ordinal scale data are measurements that represent position or order in a series (e.g., ranking political candidates on popularity, movie ratings, places runners finished in a race).
4Categorical Variables
- Sometimes nominal data can only be broken down into two categories.
- E.g., right or wrong, male or female, pass or fail, heads or tails.
- The data are then referred to as binomial, and we have a dichotomous population.
- In such a case, we define p as the probability of obtaining one category and q as the probability of obtaining the other.
5The Binomial Test
- Note that there is a special relationship between p and q.
- p + q = 1
- So q = 1 - p
- If p = q = 0.5, we can solve binomial problems using Binomial Table M on page 352.
6An Example
- E.g., Tom claims he has a coin fixed to turn up heads. He flips it 20 times and gets 16 heads. Is his claim correct? Use normal decision rules.
- We will call p the probability of getting heads in one toss.
- p = 0.5
- We will call q the probability of getting tails in one toss.
- q = 0.5
- N = 20, X = 16 (score on the trials)
7- H0: p = 0.5
- H1: p > 0.5
- The left column of Table M has values of N (5 to 50).
- There are four other columns: two for a one-tailed test (α at 0.05 and 0.01) and two for a two-tailed test (α at 0.05 and 0.01).
- The numbers in these columns are critical values for X or N - X (whichever is larger).
8- We will use X (= 16) since it is larger than N - X (= 4).
- Go down the left column to where N = 20.
- Go across to the one-tailed column where α is 0.05. If your obtained X (or N - X) is greater than or equal to the number in the table, reject H0.
- The number listed is 15.
- Since our X_obs = 16 > X_crit = 15, we reject H0. The coin does appear to be fixed (X_obs = 16, p < 0.05).
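The table lookup can be cross-checked by summing binomial probabilities directly. The sketch below assumes a fair coin under H0 (p = 0.5). The function names are illustrative and are not taken from the slides or from Table M.

```python
from math import comb

def binomial_upper_tail(n: int, x: int, p: float = 0.5) -> float:
    """Probability of observing x or more successes in n trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))

def one_tailed_critical_value(n: int, alpha: float, p: float = 0.5) -> int:
    """Smallest X whose upper-tail probability is at most alpha."""
    for x in range(n + 1):
        if binomial_upper_tail(n, x, p) <= alpha:
            return x
    raise ValueError("no critical value exists for this n and alpha")

n_flips, heads = 20, 16
p_value = binomial_upper_tail(n_flips, heads)
x_crit = one_tailed_critical_value(n_flips, alpha=0.05)

print(f"P(X >= {heads}) = {p_value:.4f}")        # about 0.0059
print(f"critical X at alpha = 0.05: {x_crit}")   # 15
```

The exact tail probability is well below 0.05, and the computed critical value agrees with the table entry of 15.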
9Normal Approximation to the Binomial
- What if p ≠ q (so neither equals 0.5)?
- If N is sufficiently large, we can treat the binomial distribution as though it is a normal distribution.
- If Np and Nq are both greater than 10, we can use the normal approximation to the binomial.
- If p = q = 0.5, we can still use the normal approximation to the binomial as long as N is at least 25.
- In these cases, we can obtain a z-score.
10
11- E.g., In psychology departments in Canada, ¾ of the students are female and only ¼ are male. A university administrator believes his university has a different percentage of males and females. A sample of 48 is randomly chosen and 14 are male. Is his claim correct? Use α = 0.05.
- p = 0.25, q = 0.75, N = 48, X = 14
- H0: p = 0.25
- H1: p ≠ 0.25
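12Np = 48(0.25) = 12, Nq = 48(0.75) = 36
Both exceed 10, so we may use the normal approximation to the binomial.
$$z_{obs} = \frac{X - Np}{\sqrt{Npq}} = \frac{14 - 12}{\sqrt{48(0.25)(0.75)}} = \frac{2}{3} = 0.67$$
Remember, z_crit for a two-tailed test with α at 0.05 is 1.96.
Since z_obs = 0.67 < z_crit = 1.96, we do not reject H0. We have no evidence that the university has a different percentage of males and females (z_obs = 0.67, p > 0.05).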
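As a rough check on this example, the sketch below computes z = (X - Np) / √(Npq) with no continuity correction, which is the form that reproduces the 0.67 reported here. It also converts z to a two-tailed p-value using the standard normal distribution. The function name and the error raised when Np or Nq is too small are assumptions made for illustration.

```python
from math import erfc, sqrt

def binomial_z_test(x: int, n: int, p: float) -> tuple[float, float]:
    """Return (z, two-tailed p-value) using the normal approximation."""
    q = 1 - p
    # Rule of thumb from the slides: Np and Nq should both exceed 10.
    if n * p <= 10 or n * q <= 10:
        raise ValueError("Np and Nq should both exceed 10 for this approximation")
    z = (x - n * p) / sqrt(n * p * q)
    p_two_tailed = erfc(abs(z) / sqrt(2))  # area in both tails of the standard normal
    return z, p_two_tailed

z_obs, p_value = binomial_z_test(x=14, n=48, p=0.25)
print(f"z = {z_obs:.2f}, two-tailed p = {p_value:.3f}")
```

The p-value comes out near 0.5, far above α = 0.05, which matches the decision not to reject H0.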
13Another Example
- A police inquiry claims that 68% of all drivers speed on highways. A patroller uses his radar gun on 100 passing cars and finds that 86 of them were speeding. Is the inquiry accurate? Use α = 0.01.
- p = 0.68, q = 0.32, N = 100, X = 86
- H0: p = 0.68
- H1: p ≠ 0.68
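14Np = 100(0.68) = 68, Nq = 100(0.32) = 32
Both exceed 10, so we may use the normal approximation to the binomial.
$$z_{obs} = \frac{X - Np}{\sqrt{Npq}} = \frac{86 - 68}{\sqrt{100(0.68)(0.32)}} = \frac{18}{4.665} = 3.86$$
z_crit = 2.58
Since z_obs = 3.86 > z_crit = 2.58, reject H0. The inquiry was not accurate (z_obs = 3.86, p < 0.01).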
15χ² One Variable Case
- If p ≠ q, we can also use χ².
- This is also referred to as a chi-square or goodness-of-fit test.
- This technique tests whether a significant difference exists between the observed number of cases and the expected number of cases.
16Example
- E.g., A recent survey by the Center for Applied Psychological Testing suggests that 55% of all people are introverts, whereas 45% are extraverts. A skeptical psychologist randomly selects a sample of 93 subjects and gives them a personality test. Based on the test, he finds that 38 are extraverts and 55 are introverts. Is he right to be skeptical? Use normal decision rules.
17- We'll make a table that compares expected frequencies (fe) to observed frequencies (fo).
- H0: fo = fe
- H1: fo ≠ fe
18
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$\chi^2 = \frac{(38 - 42)^2}{42} + \frac{(55 - 51)^2}{51} = 0.38 + 0.31 = 0.69$$
We now compare this to a χ²crit in Table B on page 328.
19- The χ²crit changes as the degrees of freedom change.
- df = k - 1, where k represents the number of categories.
df = k - 1 = 2 - 1 = 1
χ²crit = 3.841
Since χ² = 0.69 < χ²crit = 3.841, do not reject H0. We have no evidence that the survey is inaccurate (χ²(1) = 0.69, p > 0.05).
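The goodness-of-fit arithmetic can be sketched in a few lines. The expected frequencies of 42 and 51 used in this example are 0.45 × 93 = 41.85 and 0.55 × 93 = 51.15 rounded to whole numbers, so the sketch computes the statistic both ways. The function and variable names are illustrative assumptions.

```python
def chi_square_gof(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

observed = [38, 55]                      # extraverts, introverts
n = sum(observed)                        # 93 subjects
exact_expected = [0.45 * n, 0.55 * n]    # 41.85 and 51.15
rounded_expected = [42, 51]              # whole-number values used in the worked example

chi2_rounded = chi_square_gof(observed, rounded_expected)
chi2_exact = chi_square_gof(observed, exact_expected)
df = len(observed) - 1                   # k - 1 categories

print(f"rounded fe: chi2 = {chi2_rounded:.2f}")
print(f"exact fe:   chi2 = {chi2_exact:.2f}")
print(f"df = {df}")
```

Rounding the expected frequencies shifts the statistic slightly (0.69 versus about 0.64), but both values fall far below the critical value of 3.841, so the decision is the same.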
20χ² One Variable Case
- Importantly, χ² can be used to analyze categorical data that are not binomial.
- That is, it can be used to analyze data that have more than two outcomes.
- E.g., A researcher wants to determine whether there is a difference among beer drinkers living in St. John's in their preference for brands of light beer.
21An Example
- 150 beer drinkers taste three brands of light beer. Their preferences are provided below. Is there a difference in the preference of the brands? Use α = 0.01.
- H0: fo = fe
- H1: fo ≠ fe

      Brand A   Brand B   Brand C   Total
fo    45        40        65        150
fe    50        50        50        150
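22
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$\chi^2 = \frac{(45 - 50)^2}{50} + \frac{(40 - 50)^2}{50} + \frac{(65 - 50)^2}{50} = 0.50 + 2.00 + 4.50 = 7.00$$
df = k - 1 = 3 - 1 = 2
χ²crit = 9.210
Since χ²obs = 7.00 < χ²crit = 9.210, do not reject H0. We have no evidence of a difference in preference among the three brands (χ²(2) = 7.00, p > 0.01).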
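A short sketch of the same calculation, assuming equal expected preference for the three brands under H0. For df = 2 only, the upper-tail area of the chi-square distribution has the closed form exp(-x / 2), so the p-value and the critical value can be checked without a table. The variable names are illustrative.

```python
from math import exp, log

observed = {"Brand A": 45, "Brand B": 40, "Brand C": 65}
n = sum(observed.values())       # 150 drinkers
expected = n / len(observed)     # equal preference under H0: 50 per brand

chi2 = sum((fo - expected) ** 2 / expected for fo in observed.values())
df = len(observed) - 1

# Closed form for the upper-tail area, valid only when df = 2.
p_value = exp(-chi2 / 2)
# Inverting the same closed form gives the critical value at alpha = 0.01.
chi2_crit = -2 * log(0.01)

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}, critical = {chi2_crit:.3f}")
```

The p-value is about 0.03, so the result is not significant at α = 0.01, although it would be at α = 0.05. The choice of α matters for this data set.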
23χ² Test of Independence
- So far, we've looked at cases in which we were investigating one categorical variable (e.g., brand of beer, introvert/extravert).
- However, we can also use χ² to deal with cases with more than one categorical variable, in the χ² test of independence.
- When carrying out this test, we assume under H0 that the two variables are independent, i.e., one variable does not influence the other.
24An Example
- A bill in the United States has been proposed to
lower the legal age for drinking to 18. A
political scientist is interested in determining
whether there is a relationship between political
affiliation and opinions on this bill. He
samples 200 registered Republicans and 200
Democrats. Their opinions on the bill are
presented below. Is there a difference in
opinions between the Democrats and Republicans?
Use α = 0.05.
25- H0: Opinions on the bill are independent of political affiliation.
- H1: Opinions on the bill depend on political affiliation.
26
                 Opinion
                 For    Undecided   Against   Row Totals
Republican       68     22          110       200
Democrat         92     18          90        200
Column Totals    160    40          200       400
Our expected frequencies will be based on our obtained frequencies.
27(No Transcript)
28fe(c) = (row total)(column total)/N = (200)(200)/400 = 100
fe(d) = (row total)(column total)/N = (200)(160)/400 = 80
fe(e) = (row total)(column total)/N = (200)(40)/400 = 20
fe(f) = (row total)(column total)/N = (200)(200)/400 = 100
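The rule fe = (row total)(column total)/N can be applied to every cell at once. The sketch below uses the observed counts from the opinion table. Note that the "For" column sums to 68 + 92 = 160, the value used in the fe(d) calculation. The dictionary layout and variable names are assumptions chosen for readability.

```python
observed = {
    "Republican": {"For": 68, "Undecided": 22, "Against": 110},
    "Democrat":   {"For": 92, "Undecided": 18, "Against": 90},
}

row_totals = {party: sum(cells.values()) for party, cells in observed.items()}
opinions = list(next(iter(observed.values())))   # column labels, in table order
col_totals = {op: sum(observed[party][op] for party in observed) for op in opinions}
n = sum(row_totals.values())                     # grand total

# Expected count per cell if opinion were independent of party.
expected = {
    party: {op: row_totals[party] * col_totals[op] / n for op in opinions}
    for party in observed
}

for party, cells in expected.items():
    print(party, cells)
```

Because both parties contributed 200 respondents, the two rows of expected frequencies are identical: 80, 20, and 100.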
29Now we follow the same formula as before to calculate χ².
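30
$$\chi^2 = \frac{(68 - 80)^2}{80} + \frac{(22 - 20)^2}{20} + \frac{(110 - 100)^2}{100} + \frac{(92 - 80)^2}{80} + \frac{(18 - 20)^2}{20} + \frac{(90 - 100)^2}{100}$$
$$\chi^2 = 1.8 + 0.2 + 1.0 + 1.8 + 0.2 + 1.0 = 6.00$$
df = (r - 1)(c - 1), where r = number of rows and c = number of columns
df = (2 - 1)(3 - 1) = 2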
31χ²crit = 5.991
Since χ²obt = 6.00 > χ²crit = 5.991, we reject H0. Opinion on the bill depends on political affiliation (χ²(2) = 6.00, p < 0.05).
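The whole test of independence can be sketched as one small function that takes a table of observed counts and returns the statistic and its degrees of freedom. The function name is hypothetical. The p-value line relies on the closed form exp(-x / 2), which holds only because df = 2 in this example.

```python
from math import exp

def chi_square_independence(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected count under independence
            chi2 += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

opinions = [
    [68, 22, 110],   # Republican: for, undecided, against
    [92, 18, 90],    # Democrat:   for, undecided, against
]
chi2_obt, df = chi_square_independence(opinions)
p_value = exp(-chi2_obt / 2)   # closed form valid only because df = 2

print(f"chi2 = {chi2_obt:.2f}, df = {df}, p = {p_value:.4f}")
```

The p-value is just under 0.05 (about 0.0498), which matches how narrowly the obtained 6.00 exceeds the critical 5.991.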