Statistics and Data Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Statistics and Data Analysis

Slides: 41
Provided by: William7

Transcript and Presenter's Notes


1
Statistics and Data Analysis
  • Professor William Greene
  • Stern School of Business
  • IOMS Department
  • Department of Economics

2
Statistics and Data Analysis
Part 15 Regression Models
3
Linear Regression Models
1/49
  • Analyzing residuals
  • Violations of assumptions
  • Unusual data points
  • Hints for improving the model
  • Model building
  • Linear models: cost functions
  • Semilog models: growth models
  • Logs and elasticities

4
An Enduring Art Mystery
3/49
Graphics show relative sizes of the two works.
The Persistence of Statistics. Hildebrand, Ott
and Gray, 2005
Why do larger paintings command higher prices?
The Persistence of Memory. Salvador Dali, 1931
5
The Data
7/49
6
Monet in Large and Small
4/49
Sale prices of 328 signed Monet paintings
The residuals do not show any obvious patterns
that seem inconsistent with the assumptions of
the model.
$\log(\text{Price}) = a + b \log(\text{Surface area}) + e$
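A minimal sketch of how a log-log model of this form can be fit by least squares. The surface areas, the intercept 2.0 and the slope 1.3 are made-up illustration values, not the Monet data or the estimates behind the slide, and `fit_line` is a hypothetical helper.

```python
import math

def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical surface areas; prices are generated without noise from
# log(price) = 2.0 + 1.3 * log(area). Illustration values only.
areas = [200.0, 450.0, 800.0, 1200.0, 2500.0]
prices = [math.exp(2.0 + 1.3 * math.log(s)) for s in areas]

# Regress log price on log area, then compute the residuals to inspect.
log_area = [math.log(s) for s in areas]
log_price = [math.log(p) for p in prices]
a, b = fit_line(log_area, log_price)
residuals = [lp - (a + b * la) for la, lp in zip(log_area, log_price)]
print(round(a, 6), round(b, 6))
```

Because the toy data contain no noise, the fit recovers the generating coefficients and the residuals are essentially zero. With real data the residuals would be the first thing to plot.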
7
Monet Regression
8/49
8
Using the Residuals
9/49
  • How do you know the model is good?
  • Various diagnostics to be developed over the
    semester.
  • But, the first place to look is at the residuals.

9
Residuals Can Signal a Flawed Model
10/49
  • Standard application: cost function for the
    output of a production process.
  • Compare linear equation to a quadratic model (in
    logs)
  • (124 American Electric Utilities)

10
Electricity Cost Function
11
Candidate Model for Cost
11/49
$\log c = a + b \log q + e$
Figure annotations: in two regions of the plot, most of the points are above
the regression line; in a third region, most of the points are below it.
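The pattern described on this slide can be reproduced with a small sketch: fit a straight line in logs to data that actually follow a quadratic in logs, then look at the signs of the residuals. The output levels and the coefficients 1.0, 0.4 and 0.05 are hypothetical illustration values, not the electric utility data.

```python
import math

def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Hypothetical outputs q; log cost follows a quadratic in log q exactly.
log_q = [math.log(q) for q in (10, 50, 200, 1000, 5000, 20000, 80000)]
log_c = [1.0 + 0.4 * lq + 0.05 * lq ** 2 for lq in log_q]

# Fit the (misspecified) linear model and inspect the residual signs.
a, b = fit_line(log_q, log_c)
residuals = [lc - (a + b * lq) for lq, lc in zip(log_q, log_c)]
signs = ["+" if r > 0 else "-" for r in residuals]
print(signs)
```

Residuals that are positive at both ends of the output range and negative in the middle are the signature of a missing squared term.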
12
A Missing Variable?
12/49
Residuals from the (log)linear cost model
13
A Better Model?
13/49
$\log \text{Cost} = \alpha + \beta_1 \log \text{Output} + \beta_2 [\log \text{Output}]^2 + \varepsilon$
14
Candidate Models for Cost
14/49
The quadratic equation is the appropriate model.
$\log c = a + b_1 \log q + b_2 \log^2 q + e$
15
Missing Variable Included
15/49
Residuals from the quadratic cost model
Residuals from the linear cost model
16
Heteroscedasticity
16/49
  • Hetero - differences
  • Scedastic - function, variation around
    the mean
  • Arises when y is proportional to x
  • Arises sometimes when there are natural,
    heterogeneous groups

17
Heteroscedasticity
17/49
Residuals from a regression of salaries on years
of experience.
Standard deviation of the residuals seems not to
be constant.
18
Problem with the Model?
18/49
This usually suggests the model should be defined
in terms of logs of the variable.
19
Sometimes Heteroscedasticity Can Be Cured By
Taking Logs
19/49
Residuals from a regression of logs of salaries
on years of experience.
$\text{Salary} = \alpha e^{\beta t} e^{\varepsilon}$
We will explore this model below.
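Assuming the model on this slide is the multiplicative form Salary = α·exp(βt)·exp(ε), a short sketch shows why logs can help: the same disturbance ε produces a deviation in salary levels that grows with t, while the deviation in logs stays equal to ε. The values of `alpha`, `beta` and `eps` are hypothetical, and `deviations` is an illustrative helper.

```python
import math

alpha, beta = 20000.0, 0.05   # hypothetical values, illustration only
eps = 0.2                     # the same disturbance applied at every t

def deviations(t):
    """Deviation from the mean curve in levels and in logs at time t."""
    mean_salary = alpha * math.exp(beta * t)
    salary = mean_salary * math.exp(eps)
    level_dev = salary - mean_salary                    # grows with t
    log_dev = math.log(salary) - math.log(mean_salary)  # equals eps
    return level_dev, log_dev

for t in (0, 10, 20, 30):
    level_dev, log_dev = deviations(t)
    print(t, round(level_dev, 2), round(log_dev, 4))
```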
20
Should I Worry About Heteroscedasticity?
21/49
  • Not a problem for using least squares to estimate
    $\alpha$ or $\beta$.
  • But, there is a better method than least squares.
  • Assessment of the uncertainty of the least
    squares estimates may be too optimistic.
  • (Not contagious)

21
Unusual Data Points
24/49
Outliers have (what appear to be) very large
disturbances, $\varepsilon$.
Figure panels: Wolf weight vs. tail length; the 500 most successful movies.
22
Outliers (?)
25/49
Remember the empirical rule: about 99.7% of
observations will lie within the mean ± 3 standard
deviations. (We show $(a + bx) \pm 3s_e$ below.)
Titanic is 8.1 standard deviations from the
regression! Only 0.86% of the 466 observations
lie outside the bounds. (We will refine this
later.)
These observations might deserve a close look.
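The ±3 standard error bounds can be sketched in a few lines. The data below are synthetic (a line with small alternating noise and one planted unusual point), not the movie data, and `flag_outliers` is a hypothetical helper.

```python
import math

def flag_outliers(x, y, k=3.0):
    """Indices of points whose residual lies outside (a + b*x) +/- k*s_e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Standard error of the regression, with n - 2 degrees of freedom.
    s_e = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return [i for i, r in enumerate(resid) if abs(r) > k * s_e]

# Synthetic data: y = 2 + 0.5x with small alternating noise,
# plus one planted unusual observation at index 15.
x = [float(i) for i in range(30)]
y = [2.0 + 0.5 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[15] += 5.0

flagged = flag_outliers(x, y)
print(flagged)
```

Note that the unusual point itself inflates s_e, so in small samples a genuine outlier can widen the bounds enough to hide.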
23
Prices paid at auction for Monet paintings vs.
surface area (in logs)
$\log \text{Price} = a + b \log \text{Area} + e$
Not an outlier: Monet chose to paint a small painting.
Possibly an outlier: Why was the price so low?
24
What to Do About Outliers
26/49
  • (1) Examine the data
  • (2) Are they due to measurement error or
    obvious coding errors? If so, delete the
    observations.
  • (3) Are they just unusual observations? Do
    nothing.
  • (4) Generally, resist the temptation to remove
    outliers. Especially if the sample is large.
    (500 movies is large. 10 wolves is not.)
  • (5) Question why you think it is an outlier. Is
    it really?

25
Regression Options
29/49
26
Minitab's Opinions
32/49
Minitab uses ±2S to flag large residuals.
27
On Removing Outliers
33/49
  • Be careful about singling out particular
    observations this way.
  • The resulting model might be a product of your
    opinions
  • Removing outliers might create new outliers that
    were not outliers before.
  • Statistical inferences from the model will be
    incorrect.

28
Mechanically Remove Outliers?
29
Removing Outliers Creates Outliers
Were they really outliers?
30
Normal Distribution of $\varepsilon_i$?
34/49
31
Probability Plot
35/49
Graph -> Probability Plots
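For readers without Minitab, the points of a normal probability plot can be computed directly. This is a sketch only: the plotting position (i - 0.5)/n is one common convention and an assumption here (Minitab may use a different formula), the residuals are made up, and `probability_plot_points` is a hypothetical helper.

```python
from statistics import NormalDist

def probability_plot_points(residuals):
    """Pair each sorted residual with a standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting position (i - 0.5) / n is an assumed convention;
    # statistical packages may use a slightly different formula.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

resid = [0.3, -1.2, 0.8, -0.1, 1.9, -0.6, 0.2]  # made-up residuals
points = probability_plot_points(resid)
for q, r in points:
    print(round(q, 3), r)
```

If the residuals are roughly normal, the plotted points fall close to a straight line.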
32
Using and Interpreting the Model
36/49
  • Interpreting the linear model
  • Semilog and growth models
  • Log-log model and elasticities

33
Statistical Cost Analysis
37/49
The units of the LHS and RHS must be the same.
$M cost = a + b × MKWH
  • Y = cost, in $M
  • a = 2.444, in $M
  • b = 0.005291, in $M/MKWH
So:
  • a = fixed cost = total cost if MKWH = 0
  • b = marginal cost = dCost/dMKWH
  • b × MKWH = variable cost
Generation cost ($M) and output (millions of KWH)
for 124 American electric utilities (1970).
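Reading the slide as cost = a + b × MKWH with a = 2.444 and b = 0.005291, the split into fixed and variable cost can be sketched as follows. The output level of 1000 MKWH is an arbitrary example, and `cost_breakdown` is a hypothetical helper.

```python
A_FIXED = 2.444         # a: fixed cost in $M (from the slide)
B_MARGINAL = 0.005291   # b: marginal cost in $M per MKWH (from the slide)

def cost_breakdown(mkwh):
    """Fixed, variable and total generation cost ($M) at a given output."""
    variable = B_MARGINAL * mkwh
    return {"fixed": A_FIXED, "variable": variable, "total": A_FIXED + variable}

example = cost_breakdown(1000.0)   # 1000 MKWH is an arbitrary example
print(example)
```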
34
Semilog Models and Growth Rates
38/49
$\log \text{Salary} = 9.84 + 0.05\,\text{Years} + e$
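Why the coefficient on Years reads as a growth rate can be worked out in a few lines. This assumes natural logs (the slide does not say) and that 9.84 and 0.05 are the intercept and slope; $\exp(\cdot)$ is written out to avoid confusion with the disturbance $e$.

```latex
\begin{aligned}
\log \text{Salary} &= a + b\,\text{Years} + e \\
\frac{\text{Salary at Years} + 1}{\text{Salary at Years}} &= \exp(b)
  \quad \text{(holding the disturbance fixed)} \\
b = 0.05 \;&\Rightarrow\; \exp(0.05) \approx 1.0513
\end{aligned}
```

Under these assumptions, each additional year of experience is associated with a salary roughly 5.1% higher, slightly more than the coefficient itself.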
35
Growth in a Semilog Model
39/49
36
Using Semilog Models for Trends
40/49
Frequent Flyer Flights for 72 Months. (Text, Ex.
11.1, p. 508)
37
Regression Approach
41/49
  • $\log \text{Flights} = \alpha + \beta\,\text{Months} + \varepsilon$
  • $a = 2.770$, $b = 0.03710$, $s = 0.06102$
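A sketch of how the estimates on this slide could be used, assuming natural logarithms (which the slide does not state). The helper name and the derived quantities (monthly growth, doubling time) are illustrative additions.

```python
import math

A, B = 2.770, 0.03710   # a and b as reported on the slide

def predicted_log_flights(month):
    """Fitted trend value of log(Flights) in a given month."""
    return A + B * month

# Implied proportional growth per month and months needed to double,
# both under the natural-log assumption.
monthly_growth = math.exp(B) - 1.0
doubling_time = math.log(2.0) / B

print(round(predicted_log_flights(10), 3))
print(round(monthly_growth, 4), round(doubling_time, 1))
```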

38
Elasticity and Loglinear Models
43/49
  • $\log Y = \alpha + \beta \log x + \varepsilon$
  • The responsiveness of one variable to changes
    in another
  • E.g., in economics: demand elasticity = $(\%\Delta Q) / (\%\Delta P)$
  • Math: ratio of percentage changes
  • $\%\Delta Q / \%\Delta P = [100(\Delta Q)/Q] / [100(\Delta P)/P]$
  • Units of measurement and the 100 fall out of
    this equation.
  • Elasticity = $(\Delta Q/\Delta P)(P/Q)$
  • Elasticities are unit-free
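The link between the loglinear model and the elasticity can be written out as a short derivation, a sketch that treats the changes as infinitesimal and holds the disturbance fixed:

```latex
\begin{aligned}
\log Y &= \alpha + \beta \log x + \varepsilon \\
\frac{d \log Y}{d \log x} &= \frac{dY / Y}{dx / x}
  = \frac{dY}{dx} \cdot \frac{x}{Y} = \beta
\end{aligned}
```

So in a log-log regression the slope coefficient is itself the elasticity of Y with respect to x, which is why it does not depend on the units of either variable.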

39
Monet Regression
8/49
40
Summary
49/49
  • Residual analysis
  • Consistent with model assumptions?
  • Suggest missing elements in the model
  • Building the regression model
  • Interpreting the model: cost function
  • Growth model: semilog
  • Double log and estimating elasticities