PS 300 Notes Part II Statistical Tests - PowerPoint PPT Presentation

Slides: 52
Provided by: keith351

1
PS 300 Notes Part II Statistical Tests
  • Notes Version June 14, 2007

2
Statistical tests
  • We can use the properties of probability density
    functions to make probability statements about
    the likelihood of events occurring.
  • The standard normal curve provides us with a
    scale or benchmark for the likelihood of being at
    (or above or below) any point on the scale

3
The Standard Normal Table
  • Area under the normal curve
  • Also available in your Essentials of Political
    Analysis text
  • Table 5.2 on page 110

4
Standard normal values
  • Note for instance that if we look at the value
    1.5 under the standard normal table, we find the
    value .4332.
  • This means that the probability of having a
    standard normal value greater than 1.5 is .5 -
    .4332 = .0668

5
Standard Normal Transform
  • Any data series X1, X2, ..., Xi may be converted to
    a standardized score.
  • A standardized score has a mean of 0 and a
    standard deviation of 1
  • We call this standardized variable a z-score,
    or a z-transformed variable
  • The series Z keeps the distributional
    characteristics of the original data (Normal,
    Uniform, Poisson, etc). It simply has a mean of
    0.0 and s = 1.0.
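The transform described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `z_transform` and the data values are made up for the example, and the sample standard deviation (n - 1 in the denominator) is assumed.

```python
import statistics

def z_transform(xs):
    """Return the z-scores of xs: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # sample SD (n - 1 in the denominator); an assumption
    return [(x - mean) / sd for x in xs]

scores = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical data, not from the slides
z = z_transform(scores)

# The z-transformed series has mean 0 and standard deviation 1 (up to rounding error),
# but its shape is the same as that of the original series.
print(round(statistics.mean(z), 10), round(statistics.stdev(z), 10))
```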

6
Using z-scores
  • We can use the z-score of a normally distributed
    variable to obtain probability estimates of
    ranges of scores
  • e.g. the probability that a person's IQ will be
    between 100 and 120
  • To do this we convert the range to a new range
    expressed in z-scores
  • So if IQ ~ N(100, 20) then Z_IQ ~ N(0, 1)
  • Where Z_IQ = (IQ - 100) / 20

7
Z-Scores an Example
8
In Applied Terms
  • If IQ has a mean of 100, and a standard deviation
    of 20, what is the probability that any given
    individual's IQ will be greater than or equal to
    130?
  • Standardize the score of 130: z = (130 - 100) / 20 = 1.5
  • Look up 1.5 in the standard normal table: the
    area is .4332, so the probability is .5 - .4332 = .0668
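As a check on the table lookup, the same probabilities can be computed directly. The sketch below uses the error function from Python's standard library in place of the printed table; the helper name `normal_cdf` is introduced here for illustration only.

```python
import math

def normal_cdf(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean_iq, sd_iq = 100.0, 20.0   # values given on the slides

# P(IQ >= 130): standardize, then take the upper-tail area.
z = (130.0 - mean_iq) / sd_iq
p_above_130 = 1.0 - normal_cdf(z)
print(z, round(p_above_130, 4))        # 1.5 0.0668

# P(100 <= IQ <= 120): area between z = 0 and z = 1.
z_low = (100.0 - mean_iq) / sd_iq
z_high = (120.0 - mean_iq) / sd_iq
p_between = normal_cdf(z_high) - normal_cdf(z_low)
print(round(p_between, 4))             # 0.3413
```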

9
Two-tailed hypotheses
  • In general our hypothesis is
  • Did the sample come from some particular
    population?
  • If the sample mean is too high or too low, we
    suspect that it did not.
  • Thus, we must check to see if the sample mean is
    either significantly higher, or significantly
    lower.
  • This is called a two-tailed test.
  • When in doubt, most tests are best done as
    two-tailed ones

10
The One Tailed Hypothesis
  • Sometimes we suspect, or hypothesize, direction
  • e.g.
  • The average income for West Virginia will be
    significantly lower than the country as a whole.
  • HA: Xbar < μ
  • Or the level of lead in the city drinking water
    is below EPA standards
  • HA: Xbar < .015 μg/l
  • These are one-tailed tests
  • We can ignore the tail in the direction not
    hypothesized
  • But we must pay attention to which tail that is!

11
The Z-test
  • The z-test is based upon the standard normal
    distribution.
  • It uses the standard normal distribution in the
    same way as a z-transformation. In this case we
    are making statements about the sample mean,
    instead of the actual data values

12
The Z-test (cont.)
  • Note that the Z-test is based upon two parts.
  • The standard normal transformation
  • The standard deviation of the sampling
    distribution.

13
The Z-test an example
  • Suppose that you took a sample of 25 people off
    the street in Morgantown and found that their
    mean personal income is $24,379
  • And you have information that the national
    average for personal income per capita is $31,632
    in 2003.
  • Is the Morgantown mean significantly different from
    the national average?
  • Sources
  • (1) Economagic
  • (2) US Bureau of Economic Analysis

14
What to conclude?
  • Should you conclude that West Virginia is lower
    than the national average?
  • Is it significantly lower?
  • Could it simply be a randomly bad sample?
  • Assume that it is not a poor sampling technique
  • How do you decide?

15
Example (cont.)
  • We will hypothesize that WV income is lower than
    the national average.
  • HA: WVInc < USInc (Alternate Hypothesis)
  • H0: WVInc = USInc (Null Hypothesis)
  • Since we know the national average (31,632) and
    standard deviation (15,000), we can use the z-test
    to decide if WV is indeed significantly lower
    than the nation

16
Example (cont.)
  • Using the z-test, we get
    z = (24,379 - 31,632) / (15,000 / √25) = -7,253 / 3,000 = -2.42
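The calculation for this example can be reproduced with a short sketch. It assumes the usual one-sample form of the test, z = (Xbar - μ) / (σ / √n), and uses the figures given on the preceding slides; the function name `z_test` is illustrative.

```python
import math

def z_test(sample_mean, pop_mean, pop_sd, n):
    """One-sample z statistic: (sample mean - population mean) / (sigma / sqrt(n))."""
    standard_error = pop_sd / math.sqrt(n)  # SD of the sampling distribution of the mean
    return (sample_mean - pop_mean) / standard_error

# Morgantown sample (n = 25, mean 24,379) against the national figures
# (mean 31,632, standard deviation 15,000).
z = z_test(24379, 31632, 15000, 25)
print(round(z, 3))  # -2.418
```

Both parts named on the earlier slide appear here: the standard normal transformation in the numerator and the standard deviation of the sampling distribution in the denominator.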

17
The Probability of a Type I error
  • We would like to not make mistakes when we make
    statistical decisions.
  • We know we will.
  • With statistical inference, we have the ability
    to decide how often we find it acceptable to be
    wrong by random chance.
  • Thus we set the probability of making a Type I
    error.
  • P(Type I error) = α
  • By convention α = .05

18
The Critical Value of Z (cont)
  • Ok, now we know the calculated value of z
  • We know that we can make probability statements
    about z, since it is from the standard normal
    distribution
  • We know that if z = 1.96 then the area out in the
    tail past 1.96 is equal to .025
  • This means that the likelihood of obtaining a
    value of z > 1.96 by random chance in any given
    sample is .025.

19
The Critical Values of Z to memorize
  • Two tailed hypothesis
  • Reject the null (H0) if z ≥ 1.96, or z ≤ -1.96
  • One tailed hypothesis
  • If HA is Xbar > μ, then reject H0 if z ≥ 1.645
  • If HA is Xbar < μ, then reject H0 if z ≤ -1.645
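These critical values come from the inverse of the standard normal distribution at α = .05. A minimal sketch, using `statistics.NormalDist` from the Python standard library; the helper `reject_null` and its `tail` labels are hypothetical names introduced for the example.

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed: alpha is split between both tails. One-tailed: all of alpha in one tail.
two_tailed_cv = std_normal.inv_cdf(1 - ALPHA / 2)
one_tailed_cv = std_normal.inv_cdf(1 - ALPHA)
print(round(two_tailed_cv, 3), round(one_tailed_cv, 3))  # 1.96 1.645

def reject_null(z, tail="two"):
    """Apply the decision rules from the slide; tail is 'two', 'upper' or 'lower'."""
    if tail == "two":
        return abs(z) >= two_tailed_cv
    if tail == "upper":          # HA: Xbar > mu
        return z >= one_tailed_cv
    if tail == "lower":          # HA: Xbar < mu
        return z <= -one_tailed_cv
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

print(reject_null(-2.418, tail="lower"))  # True
```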

20
Z test example (cont.)
  • Suppose we decided to look at a different state,
    say Oregon.
  • Would you try a 1-tailed test?
  • Which way? HA: Xbar > μ or HA: Xbar < μ
  • Without an a priori reason to hypothesize higher
    or lower, use the 2-tailed test
  • Assume Oregon has a mean of 29,340, and that we
    collected a somewhat larger sample, say 100.
  • Using the z-test, we get
  • What would we conclude? What if n = 25? 1000?
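The "what if" questions can be explored with a short loop. The sketch assumes the same national mean (31,632) and standard deviation (15,000) as the West Virginia example, since the slide's own calculation did not survive in this transcript; the function name `oregon_z` is illustrative.

```python
import math

POP_MEAN, POP_SD = 31632, 15000   # national figures from the earlier example (assumed to carry over)
OREGON_MEAN = 29340

def oregon_z(n):
    """z statistic for the Oregon sample mean at sample size n."""
    return (OREGON_MEAN - POP_MEAN) / (POP_SD / math.sqrt(n))

for n in (25, 100, 1000):
    z = oregon_z(n)
    significant = abs(z) >= 1.96   # two-tailed test at alpha = .05
    print(n, round(z, 2), significant)
# 25 -0.76 False
# 100 -1.53 False
# 1000 -4.83 True
```

The same difference in means is not significant with 25 or 100 observations but is with 1000, because the standard error shrinks as the sample size grows.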

21
The applicability of the z-test
  • We frequently run into a problem with trying to
    do a z test.
  • The sample size may be below the number needed
    for the CLT to apply (N ≥ 30)
  • While the population mean (μ) may be frequently
    available, the population standard deviation (σ)
    frequently is not.
  • Thus we use our best estimate of the population
    standard deviation: the sample standard
    deviation (s).

22
The t test
  • When we cannot use the population standard
    deviation, we must employ a different statistical
    test
  • Think of it this way
  • The sample standard deviation is biased a little
    low, but we know that as the sample size gets
    larger, it becomes closer to the true value.
  • As a result, we need a sampling distribution that
    makes small sample estimates conservative, but
    gets closer to the normal distribution as the
    sample size gets larger, and the sample standard
    deviation more closely resembles the population
    standard deviation.

23
The t-test (cont.)
  • The t-test is a very similar formula.
  • Note the two differences
  • using s instead of σ
  • The result is a value that has a
    t-distribution instead of a standard normal one.
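A minimal sketch of the one-sample t statistic, assuming the usual form t = (Xbar - μ) / (s / √n) with n - 1 degrees of freedom. The income values below are invented for illustration and are not from the slides; the function name `one_sample_t` is likewise hypothetical.

```python
import math
import statistics

def one_sample_t(sample, pop_mean):
    """Return (t, df): same form as the z-test, but with the sample SD s in place of sigma."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)               # sample standard deviation (n - 1 denominator)
    t = (xbar - pop_mean) / (s / math.sqrt(n))
    return t, n - 1                            # degrees of freedom = n - 1

incomes = [27500, 31200, 24800, 35100, 29900, 26400, 33000, 28700]  # hypothetical sample
t, df = one_sample_t(incomes, 31632)
print(round(t, 3), df)
```

The resulting t is compared with a critical value from the t table for the given degrees of freedom, rather than with 1.96 or 1.645.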

24
The t distribution
  • The t distribution is a statistical distribution
    that varies according to the number of degrees of
    freedom (sample size - 1)
  • As df gets larger, the t approximates the normal
    distribution.
  • For practical purposes, the t-distribution with
    samples greater than 100 can be viewed as a
    normal distribution.

25
Selecting the critical value t-dist
  • Selecting the critical value of the
    t-distribution requires these steps.
  • Determine whether one- or two-tailed test.
  • Select α level (α = .05)
  • Determine degrees of freedom (n-1)
  • Find value for t in appropriate column (or table, if
    one- and two-tailed tests are in separate tables)
  • Critical value of t is at intersection of df row
    and α-level column.
  • T-table
  • Also available in your Essentials of Political
    Analysis text
  • Table 5.3 on page 122.

26
Interpreting t-value
  • The t-test formula gives you a value that you can
    compare to the critical value.
  • If conducting a two-tailed test, and the calculated
    t-value is outside the range of -t to +t, we
    conclude that the sample is significantly
    different from the population.
  • Note that a t-value that exceeds the critical
    value means that the probability of that t is
    less than the selected α-level.
  • Hence if t > C.V. of t, then p(t) < α (say .05)

27
Interpreting t-value one tailed test
  • The t-test formula gives you a value that you can
    compare to the critical value.
  • If conducting a one-tailed test, and the calculated
    t-value is greater than the critical value of +t,
    or less than -(critical value of t), we conclude
    that the sample is significantly different from
    the population.
  • Choice of +t or -t is determined by the one-tailed
    test direction.
  • Note that a t-value that exceeds the critical
    value means that the probability of that t is
    less than the selected α-level.
  • Hence if t > C.V. of t, then p(t) < α (say .05)

28
T-test example
  • Suppose we decided to look at Oregon, but do not
    know the population standard deviation
  • And we have a small sample anyway (N = 25).
  • Would you try a 1-tailed test?
  • Which way? HA: Xbar > μ or HA: Xbar < μ
  • Like the z-test, without an a priori reason to
    hypothesize higher or lower, use the 2-tailed
    test
  • Assume Oregon has a mean of 29,340, and that we
    collected a sample of 169.
  • Using the t-test, we get
  • What would we conclude? What if n = 25? 1000?

29
T-test another example
  • Suppose you are the environmental affairs officer
    for the city of Morgantown. You are tasked with
    determining if the measures of lead in the city's
    drinking water exceed EPA standards (.02 mg/l)
  • And we have a small sample (N = 25).
  • Would you try a 1-tailed test?
  • Which way? HA: Xbar > μ or HA: Xbar < μ
  • Like the z-test, without an a priori reason to
    hypothesize higher or lower, use the 2-tailed
    test
  • Assume
  • Using the t-test, we get
  • What would we conclude?

30
Two-sample t-test
  • Frequently we need to compare the means of two
    different samples.
  • Is one group higher/lower than some other group?
  • e.g. is the income of blacks significantly lower
    than that of whites?
  • The two-sample t difference of means test is the
    typical way to address this question.

31
Examples
  • Is the income of blacks lower than that of whites?
  • Are teachers' salaries in West Virginia and
    Mississippi alike?
  • Is there any difference between the background
    well and the monitoring well of a landfill?

32
The Difference of means Test
  • Frequently we wish to ask questions that compare
    two groups.
  • Is the mean of A larger (smaller) than B?
  • Are A's different (or treated differently) than
    B's?
  • Are A and B from the same population?
  • To answer these common types of questions we use
    the standard two-sample t-test

33
The Difference of means Test
  • The standard two-sample t-test is

34
The standard two sample t-test
  • In order to conduct the two sample t-test, we
    only need the two samples
  • Population data is not required.
  • We are not asking whether the two samples are
    from some large population.
  • We are asking whether they are from the same
    population, whatever it may be.

35
Assumptions about the variance
  • The standard two-sample t-test makes no
    assumptions about the variances of the underlying
    populations.
  • Hence we refer to the standard test as the
    unequal variance test.
  • If we can assume that the variances of the two
    populations are the same, then we can use a more
    powerful test: the equal variance t-test.

36
The equal Variance test
  • If the variances from the two samples are the
    same we may use a more powerful variation
  • Where

37
Which test to Use?
  • In order to choose the appropriate two-sample
    t-test, we must decide if we think the variances
    are the same.
  • Hence we perform a preliminary statistical test:
    the equal variance F-test.

38
The Equal Variance F-test
  • One of the fortunate properties of statistics is
    that the ratio of two variances will have an F
    distribution.
  • Thus with this knowledge, we can perform a simple
    test.
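A sketch of the variance ratio this test is built on. Placing the larger sample variance in the numerator is a common convention and is assumed here; the slides' own formula did not survive in this transcript. The function name `variance_ratio` and the data are made up for illustration.

```python
import statistics

def variance_ratio(sample_a, sample_b):
    """Return (F, (df_numerator, df_denominator)) for two samples.

    F is the ratio of the two sample variances, larger over smaller (an assumed convention).
    """
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, (len(sample_a) - 1, len(sample_b) - 1)
    return var_b / var_a, (len(sample_b) - 1, len(sample_a) - 1)

group_a = [1, 2, 3, 4, 5]     # hypothetical data, variance 2.5
group_b = [2, 4, 6, 8, 10]    # hypothetical data, variance 10
f_stat, dfs = variance_ratio(group_a, group_b)
print(f_stat, dfs)  # 4.0 (4, 4)
```

The probability P(F) is then read from an F table (or statistical software) using the two degrees of freedom.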

39
Interpretation of F-test
  • If we find that P(F) > .05, we conclude that the
    variances are equal.
  • If we find that P(F) ≤ .05, we conclude that the
    variances are unequal.
  • We then select the equal or unequal-variance
    t-test accordingly.

40
Degrees of freedom
  • Note that the degrees of freedom are different
    across the two tests
  • Equal variance test
  • df = n1 + n2 - 2
  • Unequal variance test
  • df = complicated; a real number, not an integer
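Both versions of the two-sample test, with their degrees of freedom, can be sketched as follows. The unequal variance version is written here in the form commonly called Welch's t-test, with the Welch-Satterthwaite approximation for df; this is assumed to match the slides' formulas, which were lost from this transcript. Function names and data are illustrative.

```python
import math
import statistics

def unequal_variance_t(a, b):
    """Unequal variance (Welch) t statistic and its real-valued degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    se_squared = v1 / n1 + v2 / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation: generally not an integer.
    df = se_squared ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

def equal_variance_t(a, b):
    """Equal variance (pooled) t statistic with df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return t, df

group_a = [1, 2, 3, 4, 5]     # hypothetical data
group_b = [2, 4, 6, 8, 10]    # hypothetical data
print(unequal_variance_t(group_a, group_b))
print(equal_variance_t(group_a, group_b))
```

With equal group sizes the two t values coincide and only the degrees of freedom differ (about 5.88 versus 8 for this data), which changes the critical value used.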

41
Contingency Tables
  • Often we have limited measurement of our data.
  • Contingency Tables are a means of looking at the
    impact of nominal and ordinal measures on each
    other.
  • They are called contingency tables because one
    variable's value is contingent upon the other.
  • Also called cross-tabulation or crosstabs.

42
Contingency Tables
  • The procedure is quite simple and intuitively
    appealing
  • Construct a table with the independent variable
    across the top and the dependent variable on the
    side
  • This works fairly well for low numbers of
    categories (r, c < 6 or so)

43
Contingency Tables An example
  • Presidents are often suspected of using military
    force to enhance their popularity.
  • What do you suppose the data actually look like?
  • Any conjectures?
  • Let's categorize presidents as using force, or
    not, and as having popularity above and below 50%
  • Are there definition problems here?
  • Which is independent and which is dependent?

44
Contingency Tables

45
Measures of Independence
  • Are the variables actually contingent upon each
    other?
  • Is the use of force contingent upon the
    president's level of popularity?
  • We would like to know if these variables are
    independent of each other, or does the use of
    force actually depend upon the level of approval
    that the president has at that time?

46
χ² Test of Independence
  • The χ² Test of Independence gives us a test of
    statistical significance.
  • It is accomplished by comparing the actual
    observed values to those you would expect to see
    if the two variables are independent.

47
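χ² Test of Independence
  • Formula: χ² = Σ (O - E)² / E
  • Where O is the observed frequency in a cell and E is
    the frequency expected if the variables are independent
    (row total × column total / N)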
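The comparison of observed and expected values can be sketched as below, assuming the usual Pearson form in which each expected count is row total × column total / N. The 2x2 counts are invented for illustration and are not the presidential data from the slides; the function name `chi_square` is hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
    return stat, df

table = [[20, 10],
         [10, 20]]   # hypothetical counts
stat, df = chi_square(table)
print(round(stat, 2), df)  # 6.67 1
```

With 1 degree of freedom the critical value at α = .05 is 3.84, so a statistic of 6.67 would lead to rejecting independence for this made-up table.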

48
Chi-Square Table (χ²)
49
Interpreting the χ²
  • The table gives us a χ² of 5.55 with 1 degree of
    freedom: d.f. = (r-1)(c-1)
  • The critical value of χ² with 1 degree of
    freedom is 3.84 (see χ² table)
  • We therefore conclude that Presidential
    popularity and use of force are related.
  • We technically reject the null hypothesis that
    Presidential popularity and use of force are
    independent.
  • Note χ² is influenced by sample size.
  • It ranges from 0.0 to ∞.

50
Corrected χ² measures
  • Small tables have slightly biased measures of χ²
  • If there are cell frequencies that are low, then
    there are some adjustments to make that correct
    the probability estimates that χ² provides.

51
Yates' Corrected χ²
  • For use with a 2x2 table with low cell
    frequencies (5 < n < 10)
  • If there are any cell frequencies < 5, the χ² is
    invalid.
  • Use Fisher's Exact Test
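A sketch of the Yates correction for a 2x2 table, assuming its standard form in which 0.5 is subtracted from each absolute difference |O - E| before squaring. The counts are hypothetical and the function name `yates_chi_square` is introduced only for this example.

```python
def yates_chi_square(table):
    """Yates-corrected chi-square for a 2x2 table of counts [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink each |O - E| by 0.5 before squaring.
            stat += (abs(observed - expected) - 0.5) ** 2 / expected
    return stat

table = [[9, 6],
         [6, 9]]   # hypothetical counts, all between 5 and 10
corrected = yates_chi_square(table)
print(round(corrected, 3))  # 0.533
```

For this table the uncorrected statistic would be 1.2, so the correction makes the test more conservative, which is the intent when cell frequencies are low.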