1
Statistics and Quantitative Analysis U4320
  • Lecture 13: Explaining Variation
  • Prof. Sharyn O'Halloran

2
I. Explaining Variation: R²
  • A. Breaking Down the Distances
  • Let's go back to the basics of regression
    analysis.
  • How well does the predicted line explain the
    variation in the dependent variable, money
    spent?

3
I. Explaining Variation: R²
  • Total Variation

4
I. Explaining Variation: R²
  • Total Deviation
  • (Y − Ȳ) = (Ŷ − Ȳ) + (Y − Ŷ)
  • Total Deviation = Explained Deviation +
    Unexplained Deviation
  • The total distance from any point Y to Ȳ is the
    sum of the distance from Y to the regression line
    (Ŷ) plus the distance from the regression line to Ȳ.

5
I. Explaining Variation: R²
  • B. Sums of Squares
  • If we square both sides of this equation and sum
    across all the Y's, the cross-product term drops
    out, and we get
  • Σ(Y − Ȳ)² = Σ(Ŷ − Ȳ)² + Σ(Y − Ŷ)²

6
I. Explaining Variation: R²
  • 1. Total Sum of Squares (SST)
  • The term on the left-hand side of this equation,
    Σ(Y − Ȳ)², is the sum of the squared distances
    from all points to Ȳ. We call this the total
    variation in the Y's, or the Total Sum of
    Squares (SST).
  • 2. Regression Sum of Squares (SSR)
  • The first term on the right-hand side,
    Σ(Ŷ − Ȳ)², is the sum of the squared distances
    from the regression line to Ȳ. We call it the
    Regression Sum of Squares, or SSR.

7
I. Explaining Variation: R²
  • 3. Error Sum of Squares (SSE)
  • Finally, the last term, Σ(Y − Ŷ)², is the sum of
    the squared distances from the points to the
    regression line. Remember, this is the quantity
    that least squares minimizes. We call it the
    Error Sum of Squares, or SSE.
  • We can rewrite the previous equation as
  • SST = SSR + SSE.
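
A minimal Python sketch (mine, not the lecture's; the data points are made up) that fits a least-squares line and checks this decomposition numerically:

```python
import numpy as np

# Made-up sample data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9])

# Fit Y = b0 + b1*X by least squares.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares

print(sst, ssr + sse)  # the two match: SST = SSR + SSE
```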

8
I. Explaining Variation: R²
  • C. Definition of R²
  • We can use these new terms to determine how much
    variation is explained by the regression line.
  • If the points are perfectly linear, then the
    Error Sum of Squares is 0.

9
I. Explaining Variation: R²
  • Here, SSR = SST. The variance in the Y's is
    completely explained by the regression line.

10
I. Explaining Variation: R²
  • On the other hand, if there is no relation
    between X and Y:
  • Now SSR is 0 and SSE = SST. The regression line
    explains none of the variance in Y.

11
I. Explaining Variation: R²
  • 3. Formula
  • So we can construct a useful statistic.
  • Take the ratio of the Regression Sum of Squares
    to the Total Sum of Squares:
  • R² = SSR / SST = 1 − SSE / SST
  • We call this statistic R².
  • It represents the percent of the variation in Y
    explained by the regression.
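
A sketch of the definition in code (mine, not the lecture's):

```python
def r_squared(ssr: float, sst: float) -> float:
    """R² = SSR / SST: the share of the variation in Y the line explains."""
    return ssr / sst

print(r_squared(0.0, 10.0))   # no relation between X and Y: R² = 0
print(r_squared(10.0, 10.0))  # points exactly on the line:  R² = 1
```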

12
I. Explaining Variation: R²
  • R² is always between 0 and 1.
  • For a perfectly straight line it's 1, which is
    perfect correlation.
  • For data with little relation, it's near 0.
  • R² measures the explanatory power of your model.
  • The more of the variance in Y you can explain,
    the more powerful your model.
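
For the single-regressor case there is a quick check: R² equals the squared sample correlation between X and Y. A sketch with made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9])

# With one regressor, R² is the squared correlation of X and Y.
r = np.corrcoef(x, y)[0, 1]
print(r ** 2)  # near 1: these points lie close to a straight line
```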

13
I. Explaining Variation: R²
  • D. Example
  • I wanted to investigate why people have
    confidence in what they see on TV.
  • 1. Dependent variable
  • TRUSTTV = 1 if the individual has a lot of
    confidence, 2 if only some confidence, and 3 if
    no confidence.
  • 2. Independent variables
  • TUBETIME = number of hours of TV watched per week
  • SKOOL = years of education
  • LIKEJPAN = feelings toward Japan
  • YELOWSTN = attitude toward whether the US should
    spend more on national parks
  • MYSIGN = the respondent's astrological sign

14
I. Explaining Variation: R²
  • 3. Calculating R²
  • a) Correlation matrix

15
I. Explaining Variation: R²
  • b) The first model is
  • TRUSTTV = 2.34 − 0.0539 (TUBETIME)

16
I. Explaining Variation: R²
  • How do we calculate the Total Sum of Squares?
  • SST = SSR + SSE
  • SST = 5.90 + 183.61 = 189.51
  • Now we can calculate R²:
  • R² = SSR / SST = 5.90 / 189.51 ≈ 0.031

17
I. Explaining Variation: R²
  • c) Each of the 4 different models has an
    associated R²:
  • Eq 1: R² = 0.031
  • Eq 2: R² = 0.035
  • Eq 3: R² = 0.039
  • Eq 4: R² = 0.040

18
I. Explaining Variation: R²
  • F. Using R² in Practice
  • 1. Useful Tool
  • 2. Measure of Unexplained Variance
  • 3. Not a Statistical Test
  • 4. Don't Obsess about R²
  • 5. You can always improve R² by adding variables

19
I. Explaining Variation: R²
  • G. Example
  • You'll notice that R² increases every time.
  • No matter what variables you add, you can always
    increase your R².

20
II. Adjusted R²
  • A. Definition of Adjusted R²
  • So we'd like a measure like R², but one that
    takes into account the fact that adding extra
    variables always increases your explanatory
    power.
  • The statistic we use for this is called the
    Adjusted R², and its formula is
  • Adjusted R² = 1 − (1 − R²)(n − 1) / (n − k),
  • where n is the number of observations and k is
    the number of regressors, counting the constant.
  • So the Adjusted R² can actually fall if the
    variable you add doesn't explain much of the
    variance.
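
The formula as a one-line Python helper (my sketch; k counts the constant, matching the convention used later in the lecture):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R²: n observations, k regressors counting the constant."""
    return 1 - (1 - r2) * (n - 1) / (n - k)
```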

21
II. Adjusted R²
  • B. Back to the Example
  • 1. Adjusted R²
  • You can see that the adjusted R² rises from
    equation 1 to equation 2, and from equation 2 to
    equation 3.
  • But then it falls from equation 3 to 4, when we
    add in the variables for national parks and the
    zodiac.

22
II. Adjusted R²
  • 2. Calculating Adjusted R²
  • Example: Equation 2

23
II. Adjusted R²
  • We calculate:
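
The slide's worked numbers are not preserved in this transcript. A sketch of the calculation, using the R² = 0.035 reported for Equation 2 and the lecture's n = 470, and assuming Equation 2 has two regressors plus the constant (so k = 3):

```python
r2, n, k = 0.035, 470, 3   # k = 3 is an assumption about Equation 2
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
print(round(adj_r2, 3))    # ≈ 0.031
```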

24
II. Adjusted R²
  • C. Stepwise Regression
  • One strategy for model building is to add
    variables only if they increase your adjusted R².
  • This technique is called stepwise regression (a
    sketch in code follows below).
  • However, I don't want to emphasize this approach
    too strongly. Just as people can fixate on R²,
    they can fixate on adjusted R².
  • IMPORTANT
  • If you have a theory that suggests that certain
    variables are important for your analysis, then
    include them whether or not they increase the
    adjusted R².
  • Negative findings can be important!
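
A simplified, one-pass Python sketch of the idea (my illustration, not the lecture's code): try each candidate column in turn and keep it only if the adjusted R² rises.

```python
import numpy as np

def adjusted_r2_ols(y, X):
    """Adjusted R² of an OLS fit of y on X; X must include a constant column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    # 1 - (1 - R²)(n - 1)/(n - k), with 1 - R² = SSE/SST
    return 1 - (sse / sst) * (n - 1) / (n - k)

def stepwise(y, candidates):
    """Keep each candidate column only if it raises the adjusted R²."""
    kept = [np.ones(len(y))]                # start from the constant alone
    best = adjusted_r2_ols(y, np.column_stack(kept))
    for col in candidates:
        score = adjusted_r2_ols(y, np.column_stack(kept + [col]))
        if score > best:                    # the variable earns its place
            kept.append(col)
            best = score
    return np.column_stack(kept), best
```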

25
III. F Tests
  • A. When to Use an F-Test?
  • Say you add a number of variables to a
    regression model and you want to see if, as a
    group, they are significant in explaining
    variation in your dependent variable Y.
  • The F-test tells you whether a group of
    variables, or even an entire model, is jointly
    significant.
  • This is in contrast to a t-test, which tells you
    whether an individual coefficient is
    significantly different from zero.

26
III. F Tests
  • B. Equations
  • To be precise, say our original equation is
  • EQ 1: Y = b0 + b1X1 + b2X2,
  • and we add two more variables, so the new
    equation is
  • EQ 2: Y = b0 + b1X1 + b2X2 + b3X3 + b4X4.
  • We want to test the hypothesis that
  • H0: b3 = b4 = 0.
  • That is, we want to test the joint hypothesis
    that X3 and X4 together are not significant
    factors in determining Y.

27
III. F Tests
  • C. Using Adjusted R² First
  • There's an easy way to tell if these two
    variables are not significant.
  • First, run the regression without X3 and X4 in
    it, then run the regression with X3 and X4.
  • Now look at the adjusted R²'s for the two
    regressions. If the adjusted R² went down, then
    X3 and X4 are not jointly significant.
  • So the adjusted R² can serve as a quick test for
    insignificance.

28
III. F Tests
  • D. Calculating an F-Test
  • If the adjusted R² goes up, then you need to do
    a more complicated test, the F-test.
  • 1. Ratio
  • Let regression 1 be the model without X3 and X4,
    and let regression 2 include X3 and X4.
  • The basic idea of the F statistic, then, is to
    compare the drop in the error sum of squares
    against the error that remains:
  • (SSE1 − SSE2) / SSE2

29
III. F Tests
  • 2. Correction
  • We have to correct for the number of independent
    variables we add.
  • So the complete statistic is
  • F = [(SSE1 − SSE2) / m] / [SSE2 / (n − k)]
  • Remember that k is the total number of
    independent variables, including the ones that
    you are testing and the constant, and m is the
    number of variables being tested.

30
III. F Tests
  • 2. Correction (cont.)
  • This equation defines an F statistic with m and
    n − k degrees of freedom.
  • We write it like this:
  • F(m, n − k)
  • To get critical values for the F statistic, we
    use a set of tables, just like for the normal
    and t statistics.
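
The whole recipe as a Python sketch (my illustration; scipy's F distribution stands in for the printed tables):

```python
from scipy import stats

def f_test(sse1, sse2, m, n, k, alpha=0.05):
    """Joint F-test that the m added variables all have zero coefficients.

    sse1: error sum of squares without the added variables
    sse2: error sum of squares with them
    m:    number of variables being tested
    n:    number of observations
    k:    regressors in the full model, counting the constant
    """
    f_stat = ((sse1 - sse2) / m) / (sse2 / (n - k))
    f_crit = stats.f.ppf(1 - alpha, m, n - k)  # table lookup for F(m, n-k)
    return f_stat, f_crit, f_stat > f_crit     # reject H0 if F > critical value
```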

31
III. F Tests
  • E. Example
  • 1. Adding Extra Variables: Is a group of
    variables jointly significant?
  • Are the variables YELOWSTN and MYSIGN jointly
    significant?

32
III. F Tests
  • 1. Adding Extra Variables (cont.)
  • a) State the null hypothesis:
  • H0: the coefficients on YELOWSTN and MYSIGN are
    both 0.
  • b) Calculate the F-statistic
  • Our formula for the F statistic is
  • F = [(SSE1 − SSE2) / m] / [SSE2 / (n − k)]

33
III. F Tests
  • What is SSE1, the sum of squared errors in the
    first regression?
  • What is SSE2, the sum of squared errors in the
    second regression?
  • m = 2, N = 470, k = 6
  • Plugging into the formula gives
  • F = [(SSE1 − SSE2) / 2] / [SSE2 / 464] = 0.319

34
III. F Tests
  • c) Reject or fail to reject the null hypothesis?
  • The critical value at the 5% level, from the
    table, is 3.00.
  • Is the F-statistic > the critical value?
  • If yes, then we reject the null hypothesis that
    the coefficients are all zero; otherwise we fail
    to reject.
  • 0.319 < 3.00, so we fail to reject the null
    hypothesis.

35
III. F Tests
  • 2. Testing All Variables: Is the Model
    Significant?
  • Equation 2

36
III. F Tests
  • a) Hypothesis
  • The null hypothesis is that both slope
    coefficients are zero: H0: b1 = b2 = 0.
  • Again, we start with our formula:
  • F = [(SSE1 − SSE2) / m] / [SSE2 / (n − k)]

37
III. F Tests
  • b) Calculate the F-statistic
  • SSE2 = 182.78
  • SSE1 is the sum of squared errors when there are
    no explanatory variables at all.
  • If there are no explanatory variables, then SSR
    must be 0. In this case, SSE = SST.
  • So we can substitute SST for SSE1 in our formula:
  • SST = SSR + SSE = 6.738 + 182.78 = 189.54
  • The resulting F statistic is the number reported
    in your printout (the arithmetic is sketched
    below).
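
As a check on that arithmetic (my sketch; m = 2 and k = 3 are inferred from the null hypothesis H0: b1 = b2 = 0, with k counting the constant):

```python
sst, sse2 = 189.54, 182.78   # sums of squares as transcribed above
m, n, k = 2, 470, 3          # two slopes tested; k counts the constant

f_stat = ((sst - sse2) / m) / (sse2 / (n - k))
print(round(f_stat, 1))      # ≈ 8.6, well above the 3.00 critical value
```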

38
III. F Tests
  • c) Reject or fail to reject the null hypothesis?
  • The critical value at the 5% level, from your
    table, is 3.00.
  • Since the F-statistic exceeds 3.00, this time we
    can reject the null hypothesis that b1 = b2 = 0.