AP Statistics - PowerPoint PPT Presentation

About This Presentation
Title:

AP Statistics

Description:

AP Statistics Inference Chapter 14 Hypothesis Tests: Slopes Given: Observed slope relating Education to Job Prestige = 2.47 Question: Can we generalize this to ... – PowerPoint PPT presentation

Slides: 21
Provided by: HomeC161

Transcript and Presenter's Notes

Title: AP Statistics


1
AP Statistics
  • Inference Chapter 14

2
Hypothesis Tests: Slopes
  • Given: Observed slope relating Education to Job
    Prestige = 2.47
  • Question: Can we generalize this to the
    population of all Americans?
  • How likely is it that this observed slope was
    actually drawn from a population with slope = 0?
  • Solution: Conduct a hypothesis test
  • Notation: sample slope = b, population slope = β
  • H0: Population slope β = 0
  • H1: Population slope β ≠ 0 (two-tailed test)

3
Review: Slope Hypothesis Tests
  • What information lets us do a hypothesis test?
  • Answer: Estimates of a slope (b) have a sampling
    distribution, like any other statistic
  • It is the distribution of every value of the
    slope, based on all possible samples (of size N)
  • If certain assumptions are met, the sampling
    distribution approximates the t-distribution
  • Thus, we can assess the probability that a given
    value of b would be observed, if β = 0
  • If that probability is low (below alpha), we reject H0
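
The idea of a sampling distribution for the slope can be sketched with a small simulation. This is only an illustration: the population values below (X uniform on 0 to 20, Y with mean 40 and standard deviation 10, N = 50) are made-up assumptions, not figures from the slides. The population slope is set to 0, so any nonzero sample slope is sampling error.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope: b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate_null_slopes(n=50, draws=2000, seed=1):
    """Draw many samples of size n from a population whose true slope is 0."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(draws):
        xs = [rng.uniform(0, 20) for _ in range(n)]     # hypothetical X values
        ys = [40 + rng.gauss(0, 10) for _ in range(n)]  # Y is unrelated to X
        slopes.append(ols_slope(xs, ys))
    return slopes

slopes = simulate_null_slopes()
print("mean of simulated slopes:", round(statistics.fmean(slopes), 3))
print("share with |b| >= 2.47:", sum(abs(b) >= 2.47 for b in slopes) / len(slopes))
```

The simulated slopes pile up around zero, and the share of samples with a slope at least as extreme as the observed one plays the role of the probability that is compared with alpha.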

4
Review: Slope Hypothesis Tests
  • Visually: If the population slope (β) is zero,
    then the sampling distribution would center at
    zero
  • Since the sampling distribution is a probability
    distribution, we can identify the likely values
    of b if the population slope is zero

5
Bivariate Regression Assumptions
  • Assumptions for bivariate regression hypothesis
    tests
  • 1. Random sample
  • Ideally N > 20
  • But different rules of thumb exist. (10, 30,
    etc.)
  • 2. Variables are linearly related
  • i.e., the mean of Y increases linearly with X
  • Check scatter plot for general linear trend
  • Watch out for non-linear relationships (e.g.,
    U-shaped)

6
Bivariate Regression Assumptions
  • 3. Y is normally distributed for every outcome
    of X in the population
  • Conditional normality
  • Ex: Years of Education (X), Job Prestige (Y)
  • Suppose we look only at a sub-sample: X = 12
    years of education
  • Is a histogram of Job Prestige approximately
    normal?
  • What about for people with X = 4? X = 16?
  • If all are roughly normal, the assumption is met

7
Bivariate Regression Assumptions
  • Normality

8
Bivariate Regression Assumptions
  • 4. The variances of prediction errors are
    identical at different values of X
  • Recall: Error is the deviation from the
    regression line
  • Is dispersion of error consistent across values
    of X?
  • Definition: homoskedasticity = error dispersion
    is consistent across values of X
  • Opposite: heteroskedasticity, errors vary with
    X
  • Test: Compare errors for X = 12 years of education
    with errors for X = 2, X = 8, etc.
  • Are the errors around the line similar? Or different?
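
The comparison described in the test above can be sketched in a few lines. The data are made up for illustration (four cases each at X = 2, 8 and 12, with errors that grow with X), and the helper names are hypothetical. The code fits the line, then reports the mean squared error within each X group.

```python
import statistics
from collections import defaultdict

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def error_spread_by_x(xs, ys):
    """Mean squared deviation from the regression line, separately for each X value."""
    a, b = fit_line(xs, ys)
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y - (a + b * x))  # prediction error for this case
    return {x: sum(e * e for e in errs) / len(errs) for x, errs in sorted(groups.items())}

# Made-up data: Y = 10 + 3X plus errors of +/-1, +/-3, +/-6 at X = 2, 8, 12
xs = [2, 2, 2, 2, 8, 8, 8, 8, 12, 12, 12, 12]
ys = [17, 15, 17, 15, 37, 31, 37, 31, 52, 40, 52, 40]

for x, spread in error_spread_by_x(xs, ys).items():
    print(f"X = {x}: mean squared error = {spread:.1f}")
```

Here the spread rises sharply with X, which is the heteroskedastic pattern. Roughly equal values across groups would suggest the assumption is met.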

9
Bivariate Regression Assumptions
  • Homoskedasticity: Equal Error Variance

Here, things look pretty good.
10
Bivariate Regression Assumptions
  • Heteroskedasticity: Unequal Error Variance

This looks pretty bad.
11
Bivariate Regression Assumptions
  • Notes/Comments
  • 1. Overall, regression is robust to violations
    of assumptions
  • It often gives fairly reasonable results, even
    when assumptions aren't perfectly met
  • 2. Variations of regression can handle
    situations where assumptions aren't met
  • 3. But, there are also further diagnostics to
    help ensure that results are meaningful

12
Regression Hypothesis Tests
  • If assumptions are met, the sampling distribution
    of the slope (b) approximates a t-distribution
  • Standard deviation of the sampling distribution
    is called the standard error of the slope (σb)
  • Population formula of standard error
  • Where σe² is the variance of the regression error

13
Regression Hypothesis Tests
  • Estimating σe² lets us estimate the standard
    error
  • Now we can estimate the S.E. of the slope
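
The formulas referred to on these two slides are not present in the transcript text. Assuming the slides follow the usual textbook treatment of bivariate regression, they would take the following standard forms: the population standard error of the slope, the sample estimate of the error variance (with N - 2 degrees of freedom), and the resulting estimated standard error.

```latex
\begin{align}
\sigma_b &= \sqrt{\frac{\sigma_e^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2}} \\
s_e^2 &= \frac{\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2}{N - 2} \\
s_b &= \sqrt{\frac{s_e^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2}}
\end{align}
```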

14
Regression Hypothesis Tests
  • Finally: A t-value can be calculated
  • It is the slope divided by the standard error:
    t = b / sb
  • Where sb is the sample point estimate of the
    standard error
  • The t-value is based on N-2 degrees of freedom
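
As a minimal sketch of the whole calculation, the function below computes the slope, the estimated error variance, the standard error of the slope, and the t-value. It assumes the standard least-squares formulas. The five data points and the function name are invented for illustration.

```python
import math
import statistics

def slope_t_test(xs, ys):
    """Slope, its estimated standard error, and t = b / s_b with N - 2 d.f."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # squared prediction errors
    s_e2 = sse / (n - 2)         # estimated variance of the regression error
    s_b = math.sqrt(s_e2 / sxx)  # estimated standard error of the slope
    return {"b": b, "s_b": s_b, "t": b / s_b, "df": n - 2}

# Made-up sample of five cases
result = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({k: round(v, 3) for k, v in result.items()})
```

The t-value is then compared with the critical value from a t-table for N - 2 degrees of freedom at the chosen alpha.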

15
Regression Confidence Intervals
  • You can also use the standard error of the slope
    to estimate confidence intervals: b ± tN-2(sb)
  • Where tN-2 is the t-value for a two-tailed test
    given a desired α-level
  • Example: Observed slope = 2.5, S.E. = .10
  • 95% t-value for 102 d.f. is approximately 2
  • 95% C.I. = 2.5 ± 2(.10)
  • Confidence Interval: 2.3 to 2.7
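
The arithmetic in this example can be checked with a short sketch. The function name is hypothetical, and the critical t-value is passed in by hand (2, as approximated on the slide) rather than looked up.

```python
def slope_confidence_interval(b, s_b, t_crit):
    """Confidence interval for the slope: b plus or minus t_crit * s_b."""
    margin = t_crit * s_b
    return b - margin, b + margin

low, high = slope_confidence_interval(b=2.5, s_b=0.10, t_crit=2.0)
print(f"95% C.I.: {low:.1f} to {high:.1f}")  # prints "95% C.I.: 2.3 to 2.7"
```

Because the interval does not include zero, a two-tailed test at the same alpha level would also reject H0.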

16
Regression Hypothesis Tests
  • You can also use a t-test to determine if the
    constant (a) is significantly different from zero
  • But, this is typically less useful to do
  • Hypotheses (α = population parameter of a)
  • H0: α = 0, H1: α ≠ 0
  • But, most research focuses on slopes

17
Regression Outliers
  • Note: Even if regression assumptions are met,
    slope estimates can have problems
  • Example: Outliers -- cases with extreme values
    that differ greatly from the rest of your sample
  • Outliers can result from:
  • Errors in coding or data entry
  • Highly unusual cases
  • Or, sometimes they reflect important real
    variation
  • Even a few outliers can dramatically change
    estimates of the slope (b)

18
Regression Outliers
  • Outlier Example

19
Regression Outliers
  • Strategy for dealing with outliers:
  • 1. Identify them
  • Look at scatterplots for extreme values
  • Or, have computer software compute outlier
    diagnostic statistics
  • There are several statistics to identify cases
    that are affecting the regression slope a lot
  • Examples: Leverage, Cook's D, DFBETA
  • Computer software can even identify problematic
    cases for you, but it is preferable to do it
    yourself.
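
To make the three diagnostics concrete, here is a rough sketch for the bivariate case using the standard textbook definitions of leverage and Cook's D. The DFBETA column is simplified to the raw change in the slope when a case is dropped, whereas statistical packages usually report a standardized version. The data are made up, with the last case deliberately extreme.

```python
import statistics

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def influence_diagnostics(xs, ys):
    """Leverage, Cook's D, and raw slope change for each case."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    mx = statistics.fmean(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - 2)  # estimated error variance
    rows = []
    for i in range(n):
        h = 1 / n + (xs[i] - mx) ** 2 / sxx   # leverage of case i
        # Cook's D with p = 2 estimated parameters (constant and slope)
        cooks_d = (resid[i] ** 2 / (2 * s2)) * h / (1 - h) ** 2
        # Refit without case i to see how far the slope moves
        _, b_without = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        rows.append({"case": i, "leverage": h, "cooks_d": cooks_d,
                     "dfbeta_slope": b - b_without})
    return rows

# Made-up data; the last case is far from the rest on both X and Y
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 5, 4, 5, 60]
rows = influence_diagnostics(xs, ys)
for row in rows:
    print({k: round(v, 3) for k, v in row.items()})
```

Cases with unusually large values on any of these measures are the ones to inspect by hand before deciding what to do with them.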

20
Regression Outliers
  • 2. Depending on the circumstances, either:
  • A) Drop cases from sample and re-do regression
  • Especially for coding errors, very extreme
    outliers
  • Or if there is a theoretical reason to drop cases
  • Example: In analysis of economic activity,
    communist countries differ a lot
  • B) Or, sometimes it is reasonable to leave
    outliers in the analysis
  • e.g., if there are several that represent an
    important minority group in your data
  • When writing papers, identify if outliers were
    excluded (and the effect that had on the
    analysis).
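
One simple way to report the effect of excluding outliers is to run the regression both ways and show the two slopes side by side. The sketch below does that for made-up data. The function names and the choice of which case to drop are illustrative assumptions.

```python
import statistics

def ols_slope(xs, ys):
    """Least-squares slope for a bivariate regression."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def compare_with_and_without(xs, ys, drop):
    """Slope using all cases versus slope after dropping the listed case indices."""
    drop = set(drop)
    kept_x = [x for i, x in enumerate(xs) if i not in drop]
    kept_y = [y for i, y in enumerate(ys) if i not in drop]
    return {"slope_all": ols_slope(xs, ys),
            "slope_without": ols_slope(kept_x, kept_y),
            "dropped": sorted(drop)}

# Made-up data; case 5 is the suspected outlier
report = compare_with_and_without([1, 2, 3, 4, 5, 20], [2, 4, 5, 4, 5, 60], drop=[5])
print(report)
```

Reporting both slopes, along with which cases were dropped and why, lets readers judge how much the conclusion depends on the excluded cases.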