Statistics and Data Analysis

1
Statistics and Data Analysis
  • Professor William Greene
  • Stern School of Business
  • IOMS Department
  • Department of Economics

2
Statistics and Data Analysis
Part 18 Multiple Regression 2
3
Multiple Regression Models
1/52
Part 18 Multiple Regression 2
  • Using Minitab To Compute A Multiple Regression
  • Basic Multiple Regression
  • Using Binary Variables
  • Logs and Elasticities
  • Hedonic Regression and Interpretation
  • Trends in Time Series Data
  • Using Quadratic Terms to Improve the Model

4
Application WHO
2/52
  • Data used in Assignment 1: WHO data on 191
    countries in 1995-1999.
  • Analysis of Disability Adjusted Life Expectancy
    (DALE)
  • EDUC = average years of education
  • PCHexp = per capita health expenditure
  • DALE = α + β1 EDUC + β2 PCHexp + ε

5
The (Famous) WHO Data
3/52
6
4/52
7
Specify the Variables in the Model
5/52
8

6/52
9
Graphs? Maybe
7/52
10
Regression Results
8/52
11
Practical Model Building
9/52
  • Understanding the regression: the left-out
    variable problem
  • Using different kinds of variables
  • Dummy variables
  • Logs
  • Time trend
  • Quadratic

12
A Fundamental Result
10/52
  • What happens when you leave a crucial
    variable out of your model? (Bad things)

Regression Analysis: g versus GasPrice (no income)

The regression equation is
g = 3.50 + 0.0280 GasPrice

Predictor        Coef     SE Coef       T      P
Constant       3.4963      0.1678   20.84  0.000
GasPrice     0.028034    0.002809    9.98  0.000

Regression Analysis: G versus GasPrice, Income

The regression equation is
G = 0.134 - 0.00163 GasPrice + 0.000026 Income

Predictor         Coef      SE Coef       T      P
Constant       0.13449      0.02081    6.46  0.000
GasPrice    -0.0016281    0.0004152   -3.92  0.000
Income      0.00002634   0.00000231   11.43  0.000
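
The sign flip on GasPrice in the two regressions above can be reproduced in a small simulation. This is only a sketch on synthetic data, not the lecture's gasoline data: the trends, noise levels and "true" coefficients below are all assumptions chosen so that price and income rise together over time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 52
t = np.arange(n)

# Synthetic series (NOT the lecture's data): price and income both trend upward.
price = 20 + 2.0 * t + rng.normal(0, 5, n)
income = 10_000 + 300.0 * t + rng.normal(0, 500, n)
# Assumed true model: consumption falls with price and rises with income.
g = 0.1 - 0.002 * price + 0.00003 * income + rng.normal(0, 0.005, n)

def ols(y, *columns):
    """Least squares coefficients; the constant term comes first."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short = ols(g, price)          # income left out
full = ols(g, price, income)   # income included

print("price coefficient, income omitted :", short[1])
print("price coefficient, income included:", full[1])
```

With income omitted, the price coefficient picks up the effect of the rising income it is correlated with, so it comes out positive even though the assumed true effect is negative.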
13
Using Dummy Variables
11/52
  • Dummy variable = binary variable = a variable
    that takes values 0 and 1.
  • E.g. OECD Life Expectancies compared to the rest
    of the world
  • DALE = α + β1 EDUC + β2 PCHexp
    + β3 OECD + ε

Australia, Austria, Belgium, Canada, Czech
Republic, Denmark, Finland, France, Germany,
Greece, Hungary, Iceland, Ireland, Italy, Japan,
Korea, Luxembourg, Mexico, The Netherlands, New
Zealand, Norway, Poland, Portugal, Slovak
Republic, Spain, Sweden, Switzerland, Turkey,
United Kingdom, United States.
14
OECD Life Expectancy
12/52
According to these results, after accounting for
education and health expenditure differences,
people in the OECD countries have a life
expectancy that is 1.19 years shorter than people
in other countries.
15
Binary Variable in Regression
13/52
The regression shifts down by 1.191 years for the
OECD countries.
NonOECD: DALE = 36.770 + 2.9962 EDUC
+ .005079 PCHExp
OECD: DALE = 36.770 + 2.9962 EDUC
+ .005079 PCHExp - 1.191
We set PCHExp to 1000, approximately the sample
mean.
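
The two fitted lines can be evaluated directly. The sketch below assumes the operators lost in the transcript are + for EDUC and PCHExp and - for the OECD shift (the slide says the line shifts down); the education level of 8 years is a hypothetical input, and `predicted_dale` is a name made up for illustration.

```python
def predicted_dale(educ, pch_exp, oecd):
    """Fitted DALE using the slide's coefficients; oecd is 0 or 1."""
    return 36.770 + 2.9962 * educ + 0.005079 * pch_exp - 1.191 * oecd

EDUC = 8.0        # hypothetical years of education
PCHEXP = 1000.0   # the slide's choice, roughly the sample mean

non_oecd = predicted_dale(EDUC, PCHEXP, oecd=0)
oecd = predicted_dale(EDUC, PCHEXP, oecd=1)

print("NonOECD:", round(non_oecd, 3))
print("OECD:   ", round(oecd, 3))
print("shift:  ", round(oecd - non_oecd, 3))
```

Whatever values of EDUC and PCHExp are used, the gap between the two predictions is the dummy coefficient, because the dummy only moves the intercept.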
16
Plotting
For DALE_NonOECD, remove -1.191
17
Two Plots
18
Dummy Variable in Log Regression
14/52
  • E.g., Monet's signature equation
  • log Price = α + β1 log Area + β2 Signed
  • Unsigned: PriceU = exp(α) Area^β1
  • Signed: PriceS = exp(α) Area^β1 exp(β2)
  • Signed/Unsigned = exp(β2)
  • % Difference = 100%(Signed - Unsigned)/Unsigned
  • = 100%[exp(β2) - 1]

19
The Signature Effect: 253%
15/52
100%[exp(1.2618) - 1] = 100%[3.532 - 1] = 253.2%
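
The arithmetic behind the signature effect is short enough to check in a few lines. The only input is the coefficient on Signed from the slide; the variable names are illustrative.

```python
import math

beta_signed = 1.2618                 # coefficient on Signed, from the slide

ratio = math.exp(beta_signed)        # signed price / unsigned price
pct_premium = 100.0 * (ratio - 1.0)  # percentage difference

print(f"signed/unsigned price ratio: {ratio:.3f}")   # 3.532
print(f"signature premium: {pct_premium:.1f}%")      # 253.2%
```

Note that reading the coefficient itself as a percentage (126%) would badly understate the effect; for a dummy variable in a log equation the exact conversion is exp(β2) - 1.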
20
Monet Paintings in Millions
16/52
Difference is about 253%.
Predicted Price is exp(4.122 + 1.3458 logArea
+ 1.2618 Signed) / 1000000
21

17/52
22
Dummy Variable for One Observation
18/52
Proofs: See p. 40.
  • Single out one observation for special attention.
  • The equation will predict that observation
    perfectly.
  • For the other coefficients, it is the same as
    removing that observation from the sample.
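
Both claims in the list above can be checked numerically. The following is a sketch on made-up data (the sample size, coefficients and the size of the outlier are assumptions), comparing a regression with a one-observation dummy against a regression with that observation deleted.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
x = rng.uniform(0, 10, n)
y = 3.0 + 2.0 * x + rng.normal(0, 1, n)
y[1] += 15.0                 # make observation 2 unusual (hypothetical)

d = np.zeros(n)
d[1] = 1.0                   # dummy variable for that one observation

# Regression including the dummy.
X_dummy = np.column_stack([np.ones(n), x, d])
b_dummy, *_ = np.linalg.lstsq(X_dummy, y, rcond=None)
resid = y - X_dummy @ b_dummy

# Regression with the observation removed and no dummy.
keep = d == 0
X_drop = np.column_stack([np.ones(n - 1), x[keep]])
b_drop, *_ = np.linalg.lstsq(X_drop, y[keep], rcond=None)

print("residual for the singled-out observation:", resid[1])
print("constant and slope with the dummy:       ", b_dummy[:2])
print("constant and slope with the obs. dropped:", b_drop)
```

The residual is zero up to rounding error, and the constant and slope agree across the two regressions.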

23
A London Effect on UK Electronic Store Sales?
19/52
24
20/52
Observation 2 is London. Fit = Actual, Residual = 0.
25
Logs in Regression
21/52
26
Elasticity
22/52
  • The coefficient on log(Area) is 1.346
  • For each 1% increase in area, price goes up by
    about 1.35%, even accounting for the signature
    effect.
  • The elasticity is about 1.35
  • Remarkable. Not only does price increase with
    area, it increases faster than area.
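
As a quick check on the elasticity reading, the log equation implies that scaling area by 1.01 scales price by 1.01 raised to the coefficient, holding Signed fixed. A minimal sketch using the 1.3458 coefficient reported on the Monet slide:

```python
elasticity = 1.3458      # coefficient on log(Area) from the Monet regression
area_increase = 0.01     # a 1% larger painting

# In a log-log model, Price is proportional to Area ** elasticity.
price_ratio = (1.0 + area_increase) ** elasticity
pct_change = 100.0 * (price_ratio - 1.0)

print(f"price change for a 1% larger painting: {pct_change:.2f}%")
```

The exact change is very close to the coefficient itself, which is why the coefficient can be read directly as an elasticity for small changes.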

27
Monet By the Square Inch
23/52
28
Elasticities of Demand for Gasoline
24/52
29
Logs and Elasticities
25/52
  • Theory: In the equation
    y = α + β1 x1 + β2 x2 + ... + βK xK + ε
  • β = (change in y) / (unit change in x)
  • Elasticity = β × (mean of x) / (mean of y)
  • When the variables are in logs: change in log x
    = % change in x
  • log y = α + β1 log x1 + β2 log x2 + ... + βK
    log xK + ε
  • Elasticity = β
  • These will often give approximately the same
    answer.
  • When in doubt, use logs.
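
The two routes to an elasticity can be compared on simulated data. This is a sketch only: the data are synthetic, generated with an assumed true elasticity of 0.8, and `slope` is a helper written for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(50, 150, n)
# Synthetic data with an assumed constant elasticity of 0.8.
y = 2.0 * x ** 0.8 * np.exp(rng.normal(0, 0.05, n))

def slope(x, y):
    """Slope from a simple regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Linear model: elasticity = slope * mean(x) / mean(y).
linear_elasticity = slope(x, y) * x.mean() / y.mean()
# Log model: the slope is the elasticity.
log_elasticity = slope(np.log(x), np.log(y))

print("elasticity from the linear model at the means:", linear_elasticity)
print("elasticity from the log-log model:            ", log_elasticity)
```

On data like these the two estimates land close to each other and to the assumed value, in line with the remark that they often give approximately the same answer.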

30
Elasticities
26/52
Price elasticity = -0.02070
Income elasticity = 1.10318
31
A Set of Dummy Variables
27/52
  • Complete set of dummy variables divides the
    sample into groups.
  • Fit the regression with group effects.
  • Need to drop one (any one) of the variables to
    compute the regression. (Avoid the dummy
    variable trap.)
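
The dummy variable trap is a rank problem: a constant plus a complete set of group dummies is perfectly collinear, because the dummies sum to the constant column. A small sketch with a hypothetical four-group sample:

```python
import numpy as np

# Hypothetical group membership for 10 observations, 4 groups.
groups = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1])
dummies = (groups[:, None] == np.arange(4)).astype(float)
const = np.ones((len(groups), 1))

X_all = np.hstack([const, dummies])         # constant + all 4 dummies
X_ok = np.hstack([const, dummies[:, 1:]])   # constant + 3 dummies (one dropped)

print(X_all.shape[1], "columns, rank", np.linalg.matrix_rank(X_all))
print(X_ok.shape[1], "columns, rank", np.linalg.matrix_rank(X_ok))
```

With all four dummies there are 5 columns but only rank 4, so the coefficients are not uniquely determined; dropping any one dummy restores full column rank.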

32
Rankings of 132 U.S. Liberal Arts Colleges
28/52
Nancy Burnett, Journal of Economic Education,
1998
Reputation = α + β1 Religious + β2 GenderEcon
+ β3 EconFac + β4 North
+ β5 South + β6 Midwest + β7 West + ε
33
Minitab to the Rescue
29/52
34
Unordered Categorical Variables
30/52
House price data (fictitious). Type 1 = Split
level, Type 2 = Ranch, Type 3 = Colonial, Type 4 =
Tudor. Use 3 dummy variables for this kind of
data. (Not all 4.) Using variable STYLE in the
model makes no sense. You could change the
numbering scale any way you like. 1,2,3,4 are
just labels.
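
The recoding from STYLE to indicator variables can be sketched in plain Python. The function name and the choice of Split level as the omitted base category are illustrative assumptions (the latter matches the omitted category named on the house price regression slide).

```python
STYLE_NAMES = {1: "Split level", 2: "Ranch", 3: "Colonial", 4: "Tudor"}

def style_dummies(style, omitted=1):
    """Return 0/1 indicators for every style except the omitted base category."""
    return {name: int(style == code)
            for code, name in STYLE_NAMES.items() if code != omitted}

print(style_dummies(2))  # {'Ranch': 1, 'Colonial': 0, 'Tudor': 0}
print(style_dummies(1))  # {'Ranch': 0, 'Colonial': 0, 'Tudor': 0}
```

A Split level house is represented by all three indicators being zero, so each dummy coefficient measures a difference from that base category.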
35
Transform Style to Types
31/52
36
32/52
37
House Price Regression
33/52
Each of these is relative to a Split Level, since
that is the omitted category. E.g., the price of
a Ranch house is $74,369 less than a Split Level
of the same size with the same number of bedrooms.
38
Ordered Categories
34/52
  • Health Satisfaction: 1 = Poor, 2 = So_so, 3 = OK,
    4 = Good, 5 = Great
  • How to handle such a variable?
  • Just use as is? No. So_so - Poor = 1, but this is
    not (necessarily) equal to Great - Good = 1.
  • Use 4 of the indicator variables.
  • Coding. It is not useful to consider
    modifications of the variable, such as
    -2,-1,0,1,2 or 2,4,6,8,10. None make sense as
    this is just a label. Could also use 1,4,8,17,26
    which would also make no sense.
  • This needs a special kind of model if it is the
    dependent variable, not a regression equation.

39
Hedonic Regression
35/52
  • A theory of prices
  • Price = sum of prices for components
  • House price:
  • Land size
  • Rooms: fixed amount per room
  • Swimming pool
  • View
  • N car garage
  • Etc.
  • Computers:
  • Speed
  • Screen size
  • Other features

40
Fumiro Computer Data
36/52
41
Transform Manufacturer Names to Indicator
Variables
37/52
Calc > Make Indicator Variables
42
Hedonic Regression
38/52
43
Time Trends in Regression
39/52
  • y = α + β1 x + β2 t + ε. β2 is the year to
    year increase not explained by anything
    else.
  • log y = α + β1 log x + β2 t + ε (not log t,
    just t). 100β2 is the year to year %
    increase not explained by anything else.

44
Time Trend Regression
40/52
After accounting for income, the price of gasoline
and the price of new cars, per capita gasoline
consumption falls by 1.25% per year. I.e., if
income and the prices were unchanged, consumption
would fall by 1.25% per year. But, of course,
these other things do not remain unchanged.
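
Reading 100β2 as a percentage is an approximation to the exact growth rate 100[exp(β2) - 1]. The sketch below assumes a trend coefficient of -0.0125, which is what a 1.25% annual decline implies; the regression output itself is not reproduced in this transcript.

```python
import math

beta_t = -0.0125   # assumed coefficient on t in the log equation

approx_pct = 100.0 * beta_t                    # the slide's reading
exact_pct = 100.0 * (math.exp(beta_t) - 1.0)   # exact year-to-year change

print(f"approximate annual change: {approx_pct:.3f}%")
print(f"exact annual change:       {exact_pct:.3f}%")
```

For a coefficient this small the two differ only in the third significant digit, so the shortcut is harmless here; it becomes less accurate as the coefficient grows.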
45
Nonlinear Equation
41/52
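  • Using a quadratic (like using logs)
  • y = α + β1 x + β2 x² + ε
  • Usually β1 > 0.
  • If β2 > 0 the curve bends upward; if β2 < 0 it
    bends downward.

[Figure: two sketches of y against x, one for β2 > 0 and one for β2 < 0]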
46
A Quadratic Income vs. Age Regression
42/52
----------------------------------------------------
LHS = HHNINC   Mean                 =  .3520836
               Standard deviation   =  .1769083
Model size     Parameters           =  3
               Degrees of freedom   =  27323
Residuals      Sum of squares       =  794.9667
               Standard error of e  =  .1705730
Fit            R-squared            =  .7040754E-01
----------------------------------------------------
Variable    Coefficient     Mean of X
----------------------------------------------------
Constant    -.39266196
AGE          .02458140     43.5256898
AGESQ       -.00027237     2022.85549
EDUC         .01994416     11.3206310
----------------------------------------------------
Note the coefficient on Age squared is negative.
Age ranges from 25 to 65.
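
With a negative coefficient on the squared term, the fitted income profile rises, peaks, and then falls. The peak age follows from setting the derivative β1 + 2 β2 Age to zero. A short sketch using the AGE and AGESQ coefficients from the output above (the function name is illustrative):

```python
b_age = 0.02458140      # coefficient on AGE
b_agesq = -0.00027237   # coefficient on AGESQ

# Derivative b_age + 2 * b_agesq * age equals zero at the peak.
peak_age = -b_age / (2.0 * b_agesq)

def age_effect(age):
    """Contribution of age to predicted income, other variables held fixed."""
    return b_age * age + b_agesq * age ** 2

print(f"income peaks at about age {peak_age:.1f}")
for age in (25, 45, 65):
    print(age, round(age_effect(age), 4))
```

The peak falls inside the 25 to 65 age range of the sample, so the model implies income rising early in the career and declining later.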
47
Implied By The Model
43/52
48
Case Study A Huge Sports Contract
44/52
  • Alex Rodriguez hired by the Texas Rangers for
    something like $25 million per year.
  • Costs: the salary, plus and minus some fine
    tuning of the numbers
  • Benefits: more fans in the stands.
  • How to determine if the benefits exceed the
    costs? Use a regression model.

49
PDV of the Costs
45/52
  • Using an 8% discount rate
  • Accounting for all costs
  • Roughly $21M to $28M in each year from 2001 to
    2010, then the deferred payments from 2010 to
    2020
  • Total costs: About $165 Million in 2001 (present
    discounted value)
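
A present discounted value is a sum of payments, each divided by (1 + r) raised to the number of years until it is paid. The sketch below uses a hypothetical flat payment stream, since the actual year-by-year contract schedule is not given in this transcript; it will not reproduce the $165 million figure.

```python
def present_value(cash_flows, rate):
    """PDV of cash_flows, where cash_flows[t] is paid t years after the base year."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

# Hypothetical schedule: a flat $25M per year for 10 years (NOT the real contract).
payments = [25.0] * 10
pdv = present_value(payments, rate=0.08)

print(f"PDV of the hypothetical stream: ${pdv:.1f}M")
print(f"undiscounted total:             ${sum(payments):.1f}M")
```

Replacing `payments` with the actual schedule, including the deferred amounts, would give the figure the slide reports.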

50
Benefits
46/52
  • More fans in the seats
  • Gate
  • Parking
  • Merchandise
  • Increased chance at playoffs and world series
  • Sponsorships
  • (Loss to revenue sharing)
  • Franchise value

51
How Many New Fans?
47/52
  • Projected 8 more wins per year.
  • What is the relationship between wins and
    attendance?
  • Not known precisely
  • Many empirical studies (The Journal of Sports
    Economics)
  • Use a regression model to find out.

52
A Regression Model
48/52
  • Based on 10 years of baseball data on wins and
    attendance
  • Approximately (depends on your model):
  • This year's attendance =
  • team specific constant
  • + 20,000 × Number of Wins
  • + 6,000 × Last Year's Number of Wins
  • + .42 × Last Year's Attendance
  • + error

53
Marginal Value of a Win
49/52
  • Roughly, the increase in this year's attendance
    if the team wins one more game every year
  • (20,000 + 6,000) / (1 - .42)
  • About 45,000 fans per year per win

54
Marginal Value of an A Rod
50/52
  • 8 games × 45,000 fans = 360,000 fans
  • 360,000 fans ×
  • $18 per ticket
  • + $2.50 parking etc.
  • + $1.80 stuff (hats, bobble head dolls, ...)
  • = $8.0 Million per year !!!!! It's not close.
    (Marginal cost is about $16.5M / year)
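
The benefit calculation on this slide is a chain of multiplications, reproduced here with the slide's own figures. The variable names are illustrative, and the dollar units are an assumption.

```python
extra_wins = 8
fans_per_win = 45_000
revenue_per_fan = 18.00 + 2.50 + 1.80   # ticket + parking etc. + merchandise

extra_fans = extra_wins * fans_per_win
annual_benefit = extra_fans * revenue_per_fan
annual_cost = 16.5e6                    # marginal cost per year, from the slide

print(f"extra fans per year: {extra_fans:,}")
print(f"annual benefit:      ${annual_benefit / 1e6:.1f}M")
print(f"annual cost:         ${annual_cost / 1e6:.1f}M")
print("benefit covers cost:", annual_benefit >= annual_cost)
```

On these numbers the attendance revenue covers roughly half of the annual cost, which is the sense in which "it's not close".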

55
Postscripts
51/52
  • (1) Texas was not out of last place for a single
    day while A-Rod was on the team. Was it worth it?
    You make the call.
  • (2) What about the Yankees? They now pay most of
    the same costs. Is it worth it? How would you
    find out?
  • (3) What about that David Beckham contract with
    Major League Soccer?

56
Summary
52/52
  • Using Minitab To Compute a Regression
  • Building a Model
  • Logs
  • Dummy variables
  • Qualitative variables
  • Trends
  • Quadratics
  • Effects across time
  • All Assuming You Know the Right Variables!