Statistics and Data Analysis

1
Statistics and Data Analysis

Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

2
Statistics and Data Analysis
Part 18 Multiple Regression 2
3
Multiple Regression Models
1/52
Part 17 Multiple Regression2

Using Minitab To Compute A Multiple Regression
Basic Multiple Regression
Using Binary Variables
Logs and Elasticities
Hedonic Regression and Interpretation
Trends in Time Series Data
Using Quadratic Terms to Improve the Model

4
Application WHO
2/52

Data Used in Assignment 1: WHO data on 191 countries in 1995-1999.
Analysis of Disability Adjusted Life Expectancy (DALE)
EDUC = average years of education
PCHexp = Per capita health expenditure
DALE = α + β1 EDUC + β2 PCHexp + ε

5
The (Famous) WHO Data
3/52
6
4/52
7
Specify the Variables in the Model
5/52
8

6/52
9
Graphs? Maybe
7/52
10
Regression Results
8/52
11
Practical Model Building
9/52

Understanding the regression: The left-out variable problem
Using different kinds of variables
  Dummy variables
  Logs
  Time trend
  Quadratic

12
A Fundamental Result
10/52

What happens when you leave a crucial
variable out of your model? (Bad things)

Regression Analysis: g versus GasPrice (no income)
The regression equation is g = 3.50 + 0.0280 GasPrice

Predictor   Coef         SE Coef      T       P
Constant    3.4963       0.1678       20.84   0.000
GasPrice    0.028034     0.002809      9.98   0.000

Regression Analysis: G versus GasPrice, Income
The regression equation is G = 0.134 - 0.00163 GasPrice + 0.000026 Income

Predictor   Coef         SE Coef      T       P
Constant    0.13449      0.02081       6.46   0.000
GasPrice   -0.0016281    0.0004152    -3.92   0.000
Income      0.00002634   0.00000231   11.43   0.000
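
The sign change on GasPrice is what the left-out variable result predicts: when the omitted variable moves with the included one, the short regression's slope absorbs part of the omitted effect. A minimal sketch in plain Python shows the mechanism. The numbers are made up for illustration (they are not the gasoline data), and all names are hypothetical.

```python
# Synthetic illustration only: x1 plays the role of price, x2 of income.
# None of these numbers come from the gasoline data.

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """Slope from a simple least squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
wobble = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [2 * a + w for a, w in zip(x1, wobble)]    # x2 rises with x1

b1, b2 = -0.5, 1.0                               # assumed "true" effects
y = [1.0 + b1 * a + b2 * c for a, c in zip(x1, x2)]

short_slope = slope(x1, y)    # regression that leaves x2 out
delta = slope(x1, x2)         # how x2 moves with x1

print(short_slope)            # positive, although b1 is negative
print(b1 + b2 * delta)        # same value: b1 plus the absorbed x2 effect
```

With x2 left out, the slope on x1 equals b1 + b2 × (slope of x2 on x1), so a negative effect can show up as a positive coefficient.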
13
Using Dummy Variables
11/52

Dummy variable = binary variable = a variable that takes values 0 and 1.
E.g. OECD Life Expectancies compared to the rest of the world
DALE = α + β1 EDUC + β2 PCHexp + β3 OECD + ε

Australia, Austria, Belgium, Canada, Czech
Republic, Denmark, Finland, France, Germany,
Greece, Hungary, Iceland, Ireland, Italy, Japan,
Korea, Luxembourg, Mexico, The Netherlands, New
Zealand, Norway, Poland, Portugal, Slovak
Republic, Spain, Sweden, Switzerland, Turkey,
United Kingdom, United States.
14
OECD Life Expectancy
12/52
According to these results, after accounting for
education and health expenditure differences,
people in the OECD countries have a life
expectancy that is 1.19 years shorter than people
in other countries.
15
Binary Variable in Regression
13/52
The regression shifts down by 1.191 years for the OECD countries.
NonOECD: DALE = 36.770 + 2.9962 EDUC + .005079 PCHExp
OECD: DALE = 36.770 + 2.9962 EDUC + .005079 PCHExp - 1.191
We set PCHExp to 1000, approximately the sample mean.
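
The two parallel lines can be tabulated with a few lines of Python. This is a sketch, not the Minitab procedure from the slides: it assumes the EDUC and PCHExp coefficients enter with plus signs (the operators did not survive in the transcript), and the education levels are hypothetical.

```python
# Coefficients as reported on the slide. The "+" signs on EDUC and PCHExp
# are an assumption; the -1.191 shift for OECD is stated in the text.
def predicted_dale(educ, pchexp, oecd):
    """Predicted DALE; oecd is 1 for OECD countries and 0 otherwise."""
    return 36.770 + 2.9962 * educ + 0.005079 * pchexp - 1.191 * oecd

PCHEXP = 1000  # approximately the sample mean, per the slide

for educ in (2, 6, 10):  # hypothetical education levels, for illustration
    non_oecd = predicted_dale(educ, PCHEXP, 0)
    oecd = predicted_dale(educ, PCHEXP, 1)
    # The gap is the same at every education level: the dummy shifts the
    # intercept and leaves the slope unchanged.
    print(educ, round(non_oecd, 3), round(oecd, 3), round(non_oecd - oecd, 3))
```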
16
Plotting
For DALE_NonOECD, remove -1.191
17
Two Plots
18
Dummy Variable in Log Regression
14/52

E.g., Monet's signature equation
log Price = α + β1 log Area + β2 Signed + ε
Unsigned: PriceU = exp(α) Area^β1
Signed: PriceS = exp(α) Area^β1 exp(β2)
Signed/Unsigned = exp(β2)
% Difference = 100%(Signed - Unsigned)/Unsigned = 100%[exp(β2) - 1]

19
The Signature Effect: 253%
15/52
100%[exp(1.2618) - 1] = 100%[3.532 - 1] = 253.2%
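
The same arithmetic in Python, as a quick check. The comparison with 100 × β2 is added here for illustration: it shows why a large dummy coefficient in a log equation should not be read directly as a percentage.

```python
import math

beta_signed = 1.2618                  # coefficient on Signed, from the slide

ratio = math.exp(beta_signed)         # Signed price / Unsigned price
pct_difference = 100 * (ratio - 1)    # percent premium for a signed painting
naive_pct = 100 * beta_signed         # reading the coefficient as a percent

print(round(ratio, 3), round(pct_difference, 1), round(naive_pct, 1))
```

For small coefficients the two readings nearly agree; at 1.2618 they differ by a factor of about two.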
20
Monet Paintings in Millions
16/52
Difference is about 253%
Predicted Price is exp(4.122 + 1.3458 logArea + 1.2618 Signed) / 1000000
21

17/52
22
Dummy Variable for One Observation
18/52
Proofs: See p. 40.

Single out one observation for special attention.
The equation will predict that observation
perfectly.
For the other coefficients, it is the same as
removing that observation from the sample.

23
A London Effect on UK Electronic Store Sales?
19/52
24
20/52
Observation 2 is London. Fit = Actual, Residual = 0.
25
Logs in Regression
21/52
26
Elasticity
22/52

The coefficient on log(Area) is 1.346
For each 1% increase in area, price goes up by about 1.35%, even after accounting for the signature effect.
The elasticity is about 1.35
Remarkable. Not only does price increase with area, it increases faster than area.
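
A small sketch of what an elasticity above 1 implies, using only the coefficient reported above. The function name is hypothetical.

```python
elasticity = 1.346  # coefficient on log(Area), from the slide

def price_multiplier(area_multiplier, e=elasticity):
    """Factor by which predicted price changes when area is scaled,
    holding the signature dummy fixed."""
    return area_multiplier ** e

pct_for_one_pct = 100 * (price_multiplier(1.01) - 1)
factor_for_double = price_multiplier(2.0)

print(pct_for_one_pct)     # percent price change for 1% more area
print(factor_for_double)   # price factor when area doubles (more than 2)
```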

27
Monet By the Square Inch
23/52
28
Elasticities of Demand for Gasoline
24/52
29
Logs and Elasticities
25/52

Theory: In the equation
y = α + β1x1 + β2x2 + ... + βKxK + ε
β = (change in y) / (unit change in x)
Elasticity = β × (mean of x) / (mean of y)
When the variables are in logs: change in log x = % change in x
log y = α + β1 log x1 + β2 log x2 + ... + βK log xK + ε
Elasticity = β
These will often give approximately the same answer.
When in doubt, use logs.
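
To see the two calculations side by side, here is a minimal sketch with one regressor and constructed data (hypothetical values, built so that the true elasticity is 0.8). It uses the simple-regression slope formula rather than a statistics package.

```python
import math

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """Slope from a simple least squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: y = 2 * x^0.8, so the elasticity is 0.8 by construction.
x = [10 + i for i in range(11)]
y = [2.0 * xi ** 0.8 for xi in x]

# Linear model: elasticity = beta * mean(x) / mean(y)
beta_linear = slope(x, y)
elasticity_at_means = beta_linear * mean(x) / mean(y)

# Log model: the slope is the elasticity
elasticity_logs = slope([math.log(v) for v in x], [math.log(v) for v in y])

print(elasticity_at_means, elasticity_logs)
```

The log regression recovers the elasticity directly; the linear version is an approximation evaluated at the sample means.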

30
Elasticities
26/52
Price elasticity = -0.02070
Income elasticity = 1.10318
31
A Set of Dummy Variables
27/52

Complete set of dummy variables divides the
sample into groups.
Fit the regression with group effects.
Need to drop one (any one) of the variables to
compute the regression. (Avoid the dummy
variable trap.)

32
Rankings of 132 U.S. Liberal Arts Colleges
28/52
Nancy Burnett, Journal of Economic Education, 1998
Reputation = α + β1 Religious + β2 GenderEcon + β3 EconFac + β4 North + β5 South + β6 Midwest + β7 West + ε
33
Minitab to the Rescue
29/52
34
Unordered Categorical Variables
30/52
House price data (fictitious)
Type 1 = Split level
Type 2 = Ranch
Type 3 = Colonial
Type 4 = Tudor
Use 3 dummy variables for this kind of data. (Not all 4.)
Using variable STYLE in the model makes no sense. You could change the numbering scale any way you like. 1,2,3,4 are just labels.
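
Outside Minitab, the same recoding can be sketched in a few lines of Python. The sample of STYLE codes is hypothetical, and Split level is used as the omitted category to match the house price regression in these slides.

```python
STYLE_NAMES = {1: "SplitLevel", 2: "Ranch", 3: "Colonial", 4: "Tudor"}
OMITTED = 1  # Split level is the base category, so it gets no dummy

def style_dummies(style):
    """Turn a STYLE code into 3 indicator variables (0/1), one per
    non-omitted type. A Split level house is all zeros."""
    return {name: int(style == code)
            for code, name in STYLE_NAMES.items()
            if code != OMITTED}

styles = [1, 2, 4, 3, 2]  # hypothetical sample of STYLE codes
rows = [style_dummies(s) for s in styles]

for s, row in zip(styles, rows):
    print(s, row)
```

Keeping all four indicators together with the constant term would make them sum to 1 in every row, which is the dummy variable trap.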
35
Transform Style to Types
31/52
36
32/52
37
House Price Regression
33/52
Each of these is relative to a Split Level, since that is the omitted category. E.g., the price of a Ranch house is $74,369 less than a Split Level of the same size with the same number of bedrooms.
38
Ordered Categories
34/52

Health Satisfaction: 1 = Poor, 2 = So_so, 3 = OK, 4 = Good, 5 = Great
How to handle such a variable?
Just use as is? No. So_so - Poor = 1, but this is not (necessarily) equal to Great - Good = 1.
Use 4 of the indicator variables.
Coding: It is not useful to consider modifications of the variable, such as -2,-1,0,1,2 or 2,4,6,8,10. None make sense, as this is just a label. Could also use 1,4,8,17,26, which would also make no sense.
This needs a special kind of model if it is the dependent variable, not a regression equation.

39
Hedonic Regression
35/52

A theory of prices
Price = sum of prices for components
House price
  Land size
  Rooms: Fixed amount per room
  Swimming pool
  View
  N car garage
  Etc.
Computers
  Speed
  Screen size
  Other features

40
Fumiro Computer Data
36/52
41
Transform Manufacturer Names to Indicator
Variables
37/52
Calc > Make Indicator Variables
42
Hedonic Regression
38/52
43
Time Trends in Regression
39/52

y = α + β1x + β2t + ε
β2 is the year to year increase not explained by anything else.
log y = α + β1 log x + β2t + ε (not log t, just t)
100β2 is the year to year % increase not explained by anything else.

44
Time Trend Regression
40/52
After accounting for Income, the price and the price of new cars, per capita gasoline consumption falls by 1.25% per year. I.e., if Income and the prices were unchanged, consumption would fall by 1.25% per year. But, of course, these other things do not remain unchanged.
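
A short sketch of how a trend coefficient in a log equation translates into a yearly percentage change. The coefficient value is an assumption, chosen so that 100 × β2 = -1.25; the fitted value itself is not in this transcript. The sketch also assumes the dependent variable is in logs.

```python
import math

beta_t = -0.0125   # assumed trend coefficient (not taken from the output)

approx_pct_per_year = 100 * beta_t                  # the 100*beta2 reading
exact_pct_per_year = 100 * (math.exp(beta_t) - 1)   # exact yearly change
ten_year_factor = math.exp(10 * beta_t)             # level after 10 years

print(approx_pct_per_year, exact_pct_per_year, ten_year_factor)
```

For a coefficient this small, the 100 × β2 reading and the exact percentage are nearly the same.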
45
Nonlinear Equation
41/52

Using a quadratic (like using logs)
y = α + β1x + β2x² + ε
Usually β1 > 0.

If β2 > 0: [plot of y against x]
If β2 < 0: [plot of y against x]
46
A Quadratic Income vs. Age Regression
42/52
----------------------------------------------------
LHS=HHNINC   Mean                 =  .3520836
             Standard deviation   =  .1769083
Model size   Parameters           =  3
             Degrees of freedom   =  27323
Residuals    Sum of squares       =  794.9667
             Standard error of e  =  .1705730
Fit          R-squared            =  .7040754E-01
----------------------------------------------------
Variable     Coefficient     Mean of X
----------------------------------------------------
Constant     -.39266196
AGE           .02458140      43.5256898
AGESQ        -.00027237      2022.85549
EDUC          .01994416      11.3206310
----------------------------------------------------
Note the coefficient on Age squared is negative.
Age ranges from 25 to 65.
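
With a negative coefficient on Age squared, the fitted income profile rises, peaks, and then declines. The turning point is at Age = -β1 / (2β2). A minimal sketch using the AGE and AGESQ coefficients reported above (function names are hypothetical):

```python
b_age = 0.02458140      # coefficient on AGE, from the output
b_agesq = -0.00027237   # coefficient on AGESQ, from the output

# Age at which the fitted quadratic reaches its maximum
peak_age = -b_age / (2 * b_agesq)

def marginal_effect(age):
    """Change in predicted income for one more year of age."""
    return b_age + 2 * b_agesq * age

print(peak_age)
for age in (25, 45, 65):   # ends and middle of the sample's age range
    print(age, marginal_effect(age))
```

The marginal effect is positive at 25, close to zero near the peak, and negative at 65.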
47
Implied By The Model
43/52
48
Case Study A Huge Sports Contract
44/52

Alex Rodriguez hired by the Texas Rangers for something like $25 million per year.
Costs: the salary, plus and minus some fine tuning of the numbers
Benefits: more fans in the stands.
How to determine if the benefits exceed the costs? Use a regression model.

49
PDV of the Costs
45/52

Using 8% discount factor
Accounting for all costs
Roughly $21M to $28M in each year from 2001 to 2010, then the deferred payments from 2010 to 2020
Total costs: About $165 Million in 2001 (Present discounted value)
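
The present discounted value calculation can be sketched as follows. The payment stream here is hypothetical (a flat $25M for 10 years, paid at the end of each year); it is not the actual contract schedule, which also includes deferred payments, so the result is only in the same neighborhood as the figure above.

```python
def present_value(payments, rate):
    """PDV of payments made at the end of years 1, 2, 3, ..."""
    return sum(p / (1 + rate) ** t
               for t, p in enumerate(payments, start=1))

# Hypothetical stream in $ millions; NOT the actual contract payments.
hypothetical_payments = [25.0] * 10
discount_rate = 0.08    # the 8% rate used on the slide

pdv = present_value(hypothetical_payments, discount_rate)
print(round(pdv, 2))    # total in $ millions, discounted to the start
```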

50
Benefits
46/52

More fans in the seats
Gate
Parking
Merchandise
Increased chance at playoffs and world series
Sponsorships
(Loss to revenue sharing)
Franchise value

51
How Many New Fans?
47/52

Projected 8 more wins per year.
What is the relationship between wins and
attendance?
Not known precisely
Many empirical studies (The Journal of Sports
Economics)
Use a regression model to find out.

52
A Regression Model
48/52

Based on 10 years of baseball data on wins and attendance
Approximately (depends on your model)
This year's attendance =
  team specific constant
  + 20,000 × Number of Wins
  + 6,000 × Last Year's Number of Wins
  + .42 × Last Year's Attendance
  + error

53
Marginal Value of a Win
49/52

Roughly, increase in this year's attendance if the team wins one more game:
(20,000 + 6,000) / (1 - .42)
= About 45,000 fans per year per win

54
Marginal Value of an A Rod
50/52

8 games × 45,000 fans = 360,000 fans
360,000 fans ×
  ($18 per ticket
  + $2.50 parking etc.
  + $1.80 stuff (hats, bobble head dolls, ...))
= $8.0 Million per year !!!!! It's not close.
(Marginal cost is about $16.5M / year)
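
The whole chain of arithmetic, from regression coefficients to dollars, fits in a few lines. The variable names are hypothetical; the numbers are the ones on these slides.

```python
wins_effect = 20_000          # fans per win this year
lagged_wins_effect = 6_000    # fans per win last year
attendance_carryover = 0.42   # coefficient on last year's attendance

# Attendance gain per extra win, once the lagged terms are accounted for
fans_per_win = (wins_effect + lagged_wins_effect) / (1 - attendance_carryover)

rounded_fans_per_win = 45_000   # the rounded figure used on the slide
extra_wins = 8                  # projected extra wins per year
extra_fans = extra_wins * rounded_fans_per_win

revenue_per_fan = 18 + 2.50 + 1.80   # ticket + parking etc. + merchandise
extra_revenue = extra_fans * revenue_per_fan

marginal_cost = 16.5e6   # about $16.5M per year, from the slide

print(fans_per_win, extra_fans, extra_revenue, extra_revenue < marginal_cost)
```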

55
Postscripts
51/52

(1) Texas was not out of last place for a single
day while A-Rod was on the team. Was it worth it?
You make the call.
(2) What about the Yankees? They now pay most of the same costs. Is it worth it? How would you find out?
(3) What about that David Beckham contract with
Major League Soccer?

56
Summary
52/52