Regression - PowerPoint PPT Presentation

About This Presentation
Title:

Regression


Slides: 54
Provided by: lladph

Transcript and Presenter's Notes


1
Regression
  • Econ 240A

2
Retrospective
  • Week One
  • Descriptive statistics
  • Exploratory Data Analysis
  • Week Two
  • Probability
  • Binomial Distribution
  • Week Three
  • Normal Distribution
  • Interval Estimation, Hypothesis Testing, Decision
    Theory

3
Last Thursday and Previous Tuesday
  • Bivariate Relationships
  • Correlation and Analysis of Variance

4
Outline
  • A cognitive device to help understand the
    formulas for estimating the slope and the
    intercept, as well as the analysis of variance
  • Table of Analysis of Variance (ANOVA) for
    regression
  • F distribution for testing the significance of
    the regression, i.e. does the independent
    variable, x, significantly explain the dependent
    variable, y?

5
Outline (Cont.)
  • The Coefficient of Determination, $R^2$, and the
    Coefficient of Correlation, r.
  • Estimate of the error variance, $\sigma^2$.
  • Hypothesis tests on the slope, b.

6
Part I: A Cognitive Device
7
A Cognitive Device: The Conceptual Model
  • (1) $y_i = a + b x_i + e_i$
  • Take expectations, E:
  • (2) $E[y_i] = a + b E[x_i] + E[e_i]$, where we
  • assume (3) $E[e_i] = 0$
  • Subtract (2) from (1) to obtain the model in
    deviations:
  • (4) $y_i - E[y_i] = b(x_i - E[x_i]) + e_i$
  • Multiply (4) by $x_i - E[x_i]$ and take
    expectations

8
A Cognitive Device (Cont.)
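  • (5) $E[(y_i - E[y_i])(x_i - E[x_i])] = b E[(x_i - E[x_i])^2] + E[e_i (x_i - E[x_i])]$, where we assume
  • $E[e_i (x_i - E[x_i])] = 0$, i.e. e and x are
    independent
  • By definition, (6) $\mathrm{cov}(y, x) = b \, \mathrm{var}(x)$, i.e.
  • (7) $b = \mathrm{cov}(y, x) / \mathrm{var}(x)$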
  • The corresponding empirical estimate, by the
    method of moments
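
As a rough illustration of (7) and its sample analogue, the sketch below replaces the population covariance and variance with sums of deviations from the sample means. The data and the helper names `mean` and `moment_estimates` are made up for illustration and are not from the lecture.

```python
# Method-of-moments estimates for y = a + b*x + e (illustrative data only).

def mean(values):
    return sum(values) / len(values)

def moment_estimates(x, y):
    """Return (a_hat, b_hat), with b_hat = sample cov(y, x) / sample var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of cross-products and squares of deviations from the sample means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = s_xy / s_xx              # sample analogue of (7)
    a_hat = y_bar - b_hat * x_bar    # the fitted line passes through the means
    return a_hat, b_hat

# Hypothetical data, roughly y = 1 + 2x plus small errors.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
a_hat, b_hat = moment_estimates(x, y)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

The common divisor (n or n - 1) in the sample covariance and variance cancels in the ratio, so the sketch works directly with the sums.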

9
A Cognitive Device (Cont.)
  • The empirical counterpart to (2)
  • Square both sides of (4), and take expectations:
  • (10) $E[(y_i - E[y_i])^2] = b^2 E[(x_i - E[x_i])^2] + 2b E[e_i (x_i - E[x_i])] + E[e_i^2]$
  • where $E[e_i (x_i - E[x_i])] = 0$, i.e. the
    explanatory variable x and the error e are
    assumed to be independent, $\mathrm{cov}(e, x) = 0$

10
A Cognitive Device (Cont.)
  • From (10), by definition,
  • (11) $\mathrm{var}(y) = b^2 \mathrm{var}(x) + \mathrm{var}(e)$; this is the
    partition of the total variance in y into the
    variance explained by x, $b^2 \mathrm{var}(x)$, and the
    unexplained or error variance, $\mathrm{var}(e)$.
  • The empirical counterpart to (11) is that the total
    sum of squares (TSS) equals the explained sum of
    squares (ESS) plus the unexplained sum of squares (USS)
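
The partition can be checked numerically. The following is a minimal sketch on made-up data; the function names `fit_line` and `sums_of_squares` are hypothetical helpers, not part of the course material.

```python
# Check TSS = ESS + USS for a fitted bivariate regression (illustrative data).

def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

def sums_of_squares(x, y):
    """Return (TSS, ESS, USS) for the least-squares line of y on x."""
    a, b = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [a + b * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # total
    ess = sum((fi - y_bar) ** 2 for fi in fitted)            # explained
    uss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained
    return tss, ess, uss

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
tss, ess, uss = sums_of_squares(x, y)
print(f"TSS = {tss:.4f}, ESS = {ess:.4f}, USS = {uss:.4f}")
```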

11
A Cognitive Device (Cont.)
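  • From Eq. 7, substitute for b in Eq. 11:
  • $\mathrm{var}(y) = [\mathrm{cov}(y, x)]^2 / \mathrm{var}(x) + \mathrm{var}(e)$
  • Divide by var(y): $1 = [\mathrm{cov}(y, x)]^2 / [\mathrm{var}(y) \, \mathrm{var}(x)] + \mathrm{var}(e) / \mathrm{var}(y)$
  • or $1 = r^2 + \mathrm{var}(e) / \mathrm{var}(y)$, where r is the
    correlation coefficient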

12
Population Model and Sample Model Side by Side
13
Conceptual Vs. Fitted Model
  • Conceptual
  • (1) $y_i = a + b x_i + e_i$
  • Take expectations, E
  • (2) $E[y] = a + b E[x] + E[e_i]$
  • (3) where $E[e_i] = 0$
  • Subtract (2) from (1)
  • (4) $y_i - E[y] = b(x_i - E[x]) + e_i$
  • Fitted
  • Minimize

14
Conceptual Vs. Fitted (Cont.)
  • Fitted
  • First order condition
  • Compare (3) and (vi)
  • From (v), the fitted line goes through the sample
    means
  • Conceptual
  • Multiply (4) by $x_i - E[x]$ and take expectations,
    E
  • $E[(y_i - E[y])(x_i - E[x])] = b E[(x_i - E[x])^2] + E[e_i (x_i - E[x])]$,
  • (5) where $E[e_i (x_i - E[x])] = 0$
  • (6) $\mathrm{cov}(y, x) = b \, \mathrm{var}(x)$
  • (7) $b = \mathrm{cov}(y, x) / \mathrm{var}(x)$
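
For reference, the fitted (least-squares) side of this comparison can be sketched as follows. This is the standard derivation, with $\hat a$, $\hat b$ and $\hat e_i$ denoting the estimates and residuals (notation assumed here). The roman-numeral labels on the original slides did not survive in the transcript, so the steps are left unnumbered.

```latex
\begin{align*}
\min_{\hat a,\hat b}\ & \sum_{i=1}^{n} \hat e_i^{\,2}
   = \sum_{i=1}^{n} (y_i - \hat a - \hat b x_i)^2 \\
\frac{\partial}{\partial \hat a}:\ & \sum_{i=1}^{n} (y_i - \hat a - \hat b x_i) = 0
   \ \Rightarrow\ \sum_{i=1}^{n} \hat e_i = 0, \quad \bar y = \hat a + \hat b\,\bar x \\
\frac{\partial}{\partial \hat b}:\ & \sum_{i=1}^{n} x_i\,(y_i - \hat a - \hat b x_i) = 0
   \ \Rightarrow\ \hat b = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}
\end{align*}
```

The first condition is the sample counterpart of the assumption that the errors have zero mean, and it implies that the fitted line passes through the sample means. The second gives the same ratio of covariance to variance as (7).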

15
Conceptual vs. Fitted (Cont.)
16
Part II: ANOVA in Regression
17
ANOVA
  • Testing the significance of the regression, i.e.
    does x significantly explain y?
  • $F_{1, n-2} = EMS/UMS$, the explained mean square
    divided by the unexplained mean square
  • This ratio follows the F distribution with 1 degree
    of freedom in the numerator and n-2 degrees of
    freedom in the denominator

18
Table of Analysis of Variance (ANOVA)
$F_{1, n-2}$ = Explained Mean Square / Error Mean Square
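
The table itself is not reproduced in this transcript. As a hedged sketch of its usual layout for a bivariate regression, the code below builds the rows (source, sum of squares, degrees of freedom, mean square) and the F ratio from made-up data; `anova_table` is a hypothetical helper.

```python
# Build the regression ANOVA table for a bivariate fit (illustrative data).

def anova_table(x, y):
    """Return rows (source, sum of squares, dof, mean square) and the F ratio."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    ess = sum((a + b * xi - y_bar) ** 2 for xi in x)            # explained
    uss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))   # unexplained
    ems = ess / 1          # explained mean square, 1 dof
    ums = uss / (n - 2)    # unexplained (error) mean square, n - 2 dof
    rows = [
        ("Explained", ess, 1, ems),
        ("Unexplained", uss, n - 2, ums),
        ("Total", ess + uss, n - 1, None),
    ]
    return rows, ems / ums

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
rows, f_stat = anova_table(x, y)
for source, ss, dof, ms in rows:
    ms_text = "" if ms is None else f"{ms:12.4f}"
    print(f"{source:<12}{ss:12.4f}{dof:4d}{ms_text}")
print(f"F(1, {len(x) - 2}) = {f_stat:.1f}")
```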
19
Example from Lab Four
  • Linear Trend Model for UC Budget

20
(No Transcript)
21
Time index: t = 0 for 1968-69, t = 1 for 1969-70,
etc. $\mathrm{UCBUD}(t) = a + bt + e(t)$
22
Example from Lab Four
  • Exponential trend model for UC Budget
  • $\mathrm{UCBud}(t) = \exp[a + bt + e(t)]$
  • Taking the logarithms of both sides:
  • $\ln \mathrm{UCBud}(t) = a + bt + e(t)$
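
A minimal sketch of fitting the log-linear form by least squares is shown below. The series is synthetic (exact 8% growth from a starting level of 0.4), not the UC Budget data, and `fit_log_trend` is a hypothetical helper name.

```python
import math

# Fit ln y(t) = a + b*t by least squares on a made-up exponential series.

def fit_log_trend(series):
    """Return (a, b) from regressing ln(series[t]) on t = 0, 1, 2, ..."""
    t = list(range(len(series)))
    log_y = [math.log(v) for v in series]
    n = len(t)
    t_bar, ly_bar = sum(t) / n, sum(log_y) / n
    s_ty = sum((ti - t_bar) * (li - ly_bar) for ti, li in zip(t, log_y))
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    b = s_ty / s_tt
    a = ly_bar - b * t_bar
    return a, b

# Hypothetical series growing at exactly 8% per period from 0.4.
series = [0.4 * 1.08 ** t for t in range(10)]
a, b = fit_log_trend(series)
print(f"a = {a:.4f}, exp(a) = {math.exp(a):.4f}")            # level at t = 0
print(f"b = {b:.4f}, growth = {math.exp(b) - 1:.2%} per period")
```

In this form exp(a) is the fitted level at t = 0 and b is the continuously compounded growth rate, so exp(b) - 1 is the growth per period.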

23
(No Transcript)
24
$\ln \mathrm{UCBUD}(t) = a + bt + e(t)$
$\exp(-0.929) = 0.3949$
25
$\ln \mathrm{UCBUD}(t) = a + bt + e(t)$
26
Part III: The F Distribution
27
The F Distribution
  • The density function of the F distribution, where
    $n_1$ and $n_2$ are the numerator and denominator
    degrees of freedom.
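
The formula on the slide was not captured in the transcript. For reference, the standard F density with $n_1$ and $n_2$ degrees of freedom is given below; the notation is assumed to match the slide's.

```latex
f(F) = \frac{\Gamma\!\left(\frac{n_1+n_2}{2}\right)}
            {\Gamma\!\left(\frac{n_1}{2}\right)\Gamma\!\left(\frac{n_2}{2}\right)}
       \left(\frac{n_1}{n_2}\right)^{n_1/2}
       \frac{F^{(n_1-2)/2}}{\left(1+\frac{n_1 F}{n_2}\right)^{(n_1+n_2)/2}},
       \qquad F > 0
```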

28
The F Distribution
  • This density function generates a rich family of
    distributions, depending on the values of n1 and
    n2

(Figure: F densities for $n_1 = 5, n_2 = 10$ versus $n_1 = 50, n_2 = 10$, and for $n_1 = 5, n_2 = 10$ versus $n_1 = 5, n_2 = 1$)
29
Determining Values of F
  • The values of the F variable can be found in the
    F table, Table 6(a) in Appendix B, for a type I
    error of 5%, or in Excel.
  • The entries in the table are the values $F_A$ of the F
    variable for a right-hand tail probability A,
    i.e. the values for which $P(F_{n_1,n_2} > F_A) = A$.
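
When no table is at hand, the tail probability can be approximated numerically. The sketch below uses only the Python standard library. It relies on the identity $P(F > x) = I_z(n_2/2, n_1/2)$ with $z = n_2/(n_2 + n_1 x)$, where $I$ is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. The function names and the bisection bounds are illustrative choices, and a statistics package would normally be used instead.

```python
import math

def f_upper_tail(x, n1, n2, steps=4000):
    """Approximate P(F > x) for an F variable with n1, n2 dof (x > 0)."""
    a, b = n2 / 2.0, n1 / 2.0
    # After substituting u = v**2, the beta integral runs from 0 to sqrt(z).
    upper = math.sqrt(n2 / (n2 + n1 * x))
    h = upper / steps

    def integrand(v):
        return 2.0 * v ** (2 * a - 1) * (1.0 - v * v) ** (b - 1)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)   # Simpson weights
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (total * h / 3.0) / math.exp(log_beta)

def f_critical(alpha, n1, n2, lo=1e-3, hi=1000.0):
    """Find F_A with P(F > F_A) = alpha by bisection on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, n1, n2) > alpha:
            lo = mid    # tail still larger than alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

print(f"F(.05; 1, 36) is about {f_critical(0.05, 1, 36):.2f}")
print(f"F(.05; 5, 10) is about {f_critical(0.05, 5, 10):.2f}")
```

Values computed this way may differ in the second decimal from numbers read off a printed table, which usually lists only selected degrees of freedom.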

30
Time index: t = 0 for 1968-69, t = 1 for 1969-70,
etc. $\mathrm{UCBUD}(t) = a + bt + e(t)$
$F_{1,36} = (n-2)R^2/(1 - R^2) = 36(0.934/0.066) \approx 509$
31
1 numerator dof, 36 denominator dof: $F_{1,36} = 4.13$
32
Part IV: The Pearson Coefficient of Correlation, r
  • The Pearson coefficient of correlation, r, is
    (13) $r = \mathrm{cov}(y, x) / \{[\mathrm{var}(x)]^{1/2} [\mathrm{var}(y)]^{1/2}\}$
  • Estimated counterpart
  • Comparing (13) to (7), note that
    (15) $r \, [\mathrm{var}(y)]^{1/2} / [\mathrm{var}(x)]^{1/2} = b$

33
A Cognitive Device (Cont.)
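  • (5) $E[(y_i - E[y_i])(x_i - E[x_i])] = b E[(x_i - E[x_i])^2] + E[e_i (x_i - E[x_i])]$, where we assume
  • $E[e_i (x_i - E[x_i])] = 0$, i.e. e and x are
    independent
  • By definition, (6) $\mathrm{cov}(y, x) = b \, \mathrm{var}(x)$, i.e.
  • (7) $b = \mathrm{cov}(y, x) / \mathrm{var}(x)$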
  • The corresponding empirical estimate

34
Part IV (Cont.): The Coefficient of Determination,
$R^2$
  • For a bivariate regression of y on a single
    explanatory variable, x, $R^2 = r^2$, i.e. the
    coefficient of determination equals the square of
    the Pearson coefficient of correlation
  • Using (14) to square the estimate of r

35
Part IV (Cont.)
  • Using (8), (16) can be expressed as
  • And so
  • In general, including multivariate regression,
    the estimate of the coefficient of determination,
    $R^2$, can be calculated from (21) $R^2 = 1 - USS/TSS$.
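
The intermediate equations on these slides were not captured. A sketch of the standard chain for the bivariate case, with $\hat r$ and $\hat b$ denoting sample estimates (notation assumed, equation numbers omitted because the originals are not recoverable), is:

```latex
\begin{align*}
\hat r^{\,2}
  &= \frac{\left[\sum (x_i-\bar x)(y_i-\bar y)\right]^2}
          {\sum (x_i-\bar x)^2 \,\sum (y_i-\bar y)^2}
   = \frac{\hat b^{2}\sum (x_i-\bar x)^2}{\sum (y_i-\bar y)^2} \\
  &= \frac{ESS}{TSS} = \frac{TSS-USS}{TSS} = 1-\frac{USS}{TSS}
\end{align*}
```

The second equality uses $\hat b = \sum (x_i-\bar x)(y_i-\bar y) / \sum (x_i-\bar x)^2$, and the last line uses the partition of the total sum of squares into explained and unexplained parts.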

36
Part IV (Cont.)
  • For the bivariate regression, the F-test can be
    calculated from
    $F_{1,n-2} = [(n-2)/1][(ESS/TSS)/(USS/TSS)]$
    $F_{1,n-2} = [(n-2)/1][ESS/USS] = (n-2)[R^2/(1-R^2)]$
  • For a multivariate regression with k explanatory
    variables, the F-test can be calculated as
    $F_{k,n-k-1} = [(n-k-1)/k][ESS/USS]$
    $F_{k,n-k-1} = [(n-k-1)/k][R^2/(1-R^2)]$
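
As a quick arithmetic check, the sketch below evaluates the F statistic from $R^2$ using the figures quoted for the UC Budget linear trend ($R^2 = 0.934$ with 36 denominator degrees of freedom, so n = 38 is inferred). The function name `f_from_r2` is a hypothetical helper.

```python
def f_from_r2(r2, n, k=1):
    """F statistic with k and n - k - 1 degrees of freedom, from R-squared."""
    return ((n - k - 1) / k) * (r2 / (1.0 - r2))

# Bivariate case: k = 1, n - 2 = 36 and R^2 = 0.934.
f_stat = f_from_r2(0.934, n=38, k=1)
print(f"F(1, 36) = {f_stat:.0f}")
```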

37
Time index: t = 0 for 1968-69, t = 1 for 1969-70,
etc. $\mathrm{UCBUD}(t) = a + bt + e(t)$
$R^2 = 1 - USS/TSS = 1 - 2.0794/29.6019 = 0.93$
38
Part V: Estimate of the Error Variance
  • $\mathrm{var}(e_i) = \sigma^2$
  • The estimate is the unexplained mean square, UMS
  • The standard error of the regression is the
    square root of UMS
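
A small sketch of the calculation follows. It assumes the USS of 2.0794 quoted with the $R^2$ calculation belongs to a regression with 36 residual degrees of freedom (n = 38). That pairing is an assumption made for illustration, and `standard_error_of_regression` is a hypothetical helper.

```python
import math

def standard_error_of_regression(uss, n, k=1):
    """Square root of the unexplained mean square, UMS = USS / (n - k - 1)."""
    return math.sqrt(uss / (n - k - 1))

# Assumed inputs: USS = 2.0794 and n - 2 = 36.
ums = 2.0794 / 36
ser = standard_error_of_regression(2.0794, n=38)
print(f"UMS = {ums:.4f}, standard error of the regression = {ser:.4f}")
```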

39
Time index: t = 0 for 1968-69, t = 1 for 1969-70,
etc. $\mathrm{UCBUD}(t) = a + bt + e(t)$
40
Part VI Hypothesis Tests on the Slope
  • Hypotheses: $H_0: b = 0$; $H_A: b > 0$
  • Test statistic
  • Set the probability of a type I error, say 5%
  • Note for bivariate regression, the square of the
    t-statistic for the null that the slope is zero
    is the F-statistic
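
The test statistic on the slide was not captured. The usual form is the estimated slope divided by its standard error, and the sketch below uses that form (an assumption about what the slide showed) to confirm on made-up data that its square equals the F ratio.

```python
import math

def slope_t_and_f(x, y):
    """Return (t, F) for H0: b = 0 in a bivariate least-squares regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    uss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ess = b * b * s_xx
    ums = uss / (n - 2)              # estimate of the error variance
    se_b = math.sqrt(ums / s_xx)     # standard error of the slope
    return b / se_b, ess / ums

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
t_stat, f_stat = slope_t_and_f(x, y)
print(f"t = {t_stat:.2f}, t^2 = {t_stat ** 2:.1f}, F = {f_stat:.1f}")
```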

41
Time index: t = 0 for 1968-69, t = 1 for 1969-70,
etc. $\mathrm{UCBUD}(t) = a + bt + e(t)$
$F_{1,36} = t^2$: $511 = 22.6 \times 22.6$
42
Part VII: Student's t-Distribution
43
The Student t Distribution
  • The Student t density function
  • n is the parameter of the Student t
    distribution (its degrees of freedom)
  • $E(t) = 0$, $V(t) = n/(n - 2)$ (for n > 2)
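
The density formula was not captured in the transcript. For reference, the standard Student t density with n degrees of freedom is:

```latex
f(t) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}
            {\sqrt{n\pi}\;\Gamma\!\left(\frac{n}{2}\right)}
       \left(1+\frac{t^{2}}{n}\right)^{-(n+1)/2},
       \qquad -\infty < t < \infty
```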
44
The Student t Distribution
(Figure: Student t densities for n = 3 and n = 10)
45
Determining Student t Values
  • The student t distribution is used extensively in
    statistical inference.
  • Thus, it is important to determine values of $t_A$
    associated with a given number of degrees of
    freedom.
  • We can do this using
  • t tables (Table 4, Appendix B)
  • Excel
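
As a third option, a critical value can be approximated directly from the density. The sketch below uses only the Python standard library: Simpson's rule for the tail area and bisection for the critical value. The function names, step count and search bounds are illustrative choices rather than anything from the course.

```python
import math

def t_upper_tail(c, n, steps=2000):
    """Approximate P(t_n > c) for c >= 0 from the Student t density."""
    log_const = (math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0)
                 - 0.5 * math.log(n * math.pi))

    def density(t):
        return math.exp(log_const - (n + 1) / 2.0 * math.log1p(t * t / n))

    h = c / steps
    total = density(0.0) + density(c)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)   # Simpson weights
    # By symmetry half the probability lies above 0; subtract the area on [0, c].
    return 0.5 - total * h / 3.0

def t_critical(alpha, n, lo=0.0, hi=100.0):
    """Find t_A with P(t_n > t_A) = alpha by bisection on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_upper_tail(mid, n) > alpha:
            lo = mid    # tail still larger than alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

print(f"t(.05, 10 dof) is about {t_critical(0.05, 10):.3f}")
```

Because the square of a t variable with n degrees of freedom follows the F distribution with 1 and n degrees of freedom, squaring `t_critical(0.025, 36)` gives an approximation to the 5% critical value of $F_{1,36}$.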

46
Using the t Table
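  • The table provides the t values ($t_A$) for which
    $P(t_n > t_A) = A$

The t distribution is symmetrical around 0
(Figure: t density with right-tail area A beyond $t_A$, with -1.812 and 1.812 marked as an example. Table columns: $t_{.100}$, $t_{.05}$, $t_{.025}$, $t_{.01}$, $t_{.005}$)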
47
(No Transcript)
48
Problem 6.32 in Text: Table of Joint Probabilities
49
Problem 6.32
  • The method of instruction in college and
    university applied statistics courses is
    changing. Historically, most courses were taught
    with an emphasis on manual calculation. The
    alternative is to employ a computer and a
    software package to perform the calculations. An
    analysis of applied statistics courses
    investigated whether the instructor's
    educational background is primarily mathematics
    (or statistics) or some other field.

50
Problem 6.32
  • A. What is the probability that a randomly
    selected applied statistics course instructor
    whose education was in statistics emphasizes
    manual calculations?
  • B. What proportion of applied statistics courses
    employ a computer and software?
  • C. Are the educational background of the instructor
    and the way his or her course is taught
    independent?
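
The table of joint probabilities is not reproduced in this transcript. The sketch below shows the three calculations on placeholder numbers, so the outputs are not the answers to Problem 6.32. The dictionary layout and the helper names are illustrative assumptions.

```python
# Hypothetical joint probabilities (placeholders, not the textbook's table).
joint = {
    ("statistics", "manual"): 0.25,
    ("statistics", "computer"): 0.25,
    ("other", "manual"): 0.125,
    ("other", "computer"): 0.375,
}

def marginal(joint, index, value):
    """Sum the joint probabilities whose key matches value at position index."""
    return sum(p for key, p in joint.items() if key[index] == value)

def conditional(joint, method, background):
    """P(method | background) = P(background and method) / P(background)."""
    return joint[(background, method)] / marginal(joint, 0, background)

def independent(joint, tol=1e-9):
    """True if every joint probability equals the product of its marginals."""
    return all(
        abs(p - marginal(joint, 0, bg) * marginal(joint, 1, m)) <= tol
        for (bg, m), p in joint.items()
    )

p_manual_given_stats = conditional(joint, "manual", "statistics")   # part A
p_computer = marginal(joint, 1, "computer")                         # part B
print(p_manual_given_stats, p_computer, independent(joint))         # part C
```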

51
Midterm 2000
  • (15 points) The following table shows the
    results of regressing the natural logarithm of
    California General Fund expenditures, in billions
    of nominal dollars, against year, beginning in
    1968 and ending in 2000. A plot of actual,
    estimated and residual values follows.
  • How much of the variance in the dependent
    variable is explained by trend?
  • What is the meaning of the F statistic in the
    table? Is it significant?
  • Interpret the estimated slope.
  • If General Fund expenditures were 68.819 billion
    in California for fiscal year 2000-2001, provide
    a point estimate for state expenditures for
    2001-2002.

52
  • Cont.
  • A state senator believes that state expenditures
    in nominal dollars have grown over time at 7% a
    year. Is the senator in the ballpark, or is his
    impression significantly below the estimated
    rate, using a 5% level of significance?
  • If you were an aide to the Senator, how might you
    criticize this regression?
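
The regression table for this question is not included in the transcript, so the sketch below only illustrates the mechanics with a made-up slope and standard error. It treats the slope of the log-linear trend as the annual growth rate, tests it against 0.07 with a one-sided t statistic, and projects one year ahead from the stated 68.819 billion. Both `growth_test` and `next_year_forecast` are hypothetical helpers, and projecting from the last actual value is only one of several reasonable ways to form the point estimate.

```python
import math

def growth_test(b_hat, se_b, claimed_rate):
    """t statistic for H0: b = claimed_rate against HA: b > claimed_rate."""
    return (b_hat - claimed_rate) / se_b

def next_year_forecast(current, b_hat):
    """One-year-ahead point estimate under ln y(t) = a + b*t."""
    return current * math.exp(b_hat)

# Hypothetical estimates; the actual slope and standard error are not shown.
b_hat, se_b = 0.08, 0.004
t_stat = growth_test(b_hat, se_b, 0.07)
forecast = next_year_forecast(68.819, b_hat)
print(f"t = {t_stat:.2f}")
print(f"forecast for 2001-2002 = {forecast:.3f} billion")
```

The t statistic would then be compared with the one-sided 5% critical value for the regression's residual degrees of freedom.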

53
(No Transcript)