Title: Quantitative Analysis
Slide 1: Quantitative Analysis
Slide 2: Themes
- These lectures will deal with regression analysis.
- Regression estimates a relationship between a dependent variable and one or more explanatory (independent) variables.
  - E.g. between consumption and price.
  - E.g. between sales, price and income.
- We can ask whether the relationship is statistically significant.
- We can estimate the strength of the relationship.
- We can estimate the impact of each explanatory variable on the dependent variable.
- We can use the relationship to make forecasts.
Slide 3: Reference
- Refer to Hildebrand and Ott:
  - Linear Regression and Correlation Methods.
  - Multiple Regression Methods.
  - Constructing a Multiple Regression Model.
- Almost any intermediate business statistics text will have equivalent chapters or sections that would be useful.
Slide 4: A starting point
- Data
  - Consider real GDP and employment.
  - Is there some sort of connection?
  - Probably, but which is the dependent variable?
  - Is real GDP a measure of economic activity that determines the demand for labour?
  - Or is it the level of employment that determines output and, therefore, real GDP?
  - So it's a good question for Macroeconomic Principles!
- Look at the data.
  - Because real GDP is measured in $ billions and employment is measured in millions, it is better to use index numbers for both data sets.
Slide 5: [Chart] It is fairly clear that the two data sets tend to move in the same way over a 20-year period.
Slide 6: Scattergrams
- Plot one variable on the horizontal axis.
  - E.g. RGDP.
- Plot the other variable on the vertical axis.
  - E.g. employment.
- For each observation, plot a point on the graph.
- If the points form something close to a straight line, we have a strong linear relationship between the variables.
Slide 7: Scattergram [figure]
Slide 8: Scattergram [figure]
Slide 9: Simple linear regression
- Model: Y = β0 + β1 X + ε.
- Y = dependent variable.
  - I.e. the variable we are trying to explain or predict.
  - E.g. sales.
- X = independent variable.
  - I.e. the variable we are using to explain or predict Y.
  - E.g. advertising expenditure.
- ε = error (random with mean 0).
  - I.e. the net effect of all variables other than X that influence Y.
  - E.g. weather, prices, incomes and many others.
  - Later we will see how we can bring the more important of these into the model.
- β0 = constant or intercept.
- β1 = coefficient or slope.
- It is like a formula for calculating Y.
  - Actually, for estimating the average value of Y.
  - Since ε is random, we cannot use it in the formula.
  - We don't know what it will be in any instance.
Slide 10: [Figure: the line Y = β0 + β1 X, with Y on the vertical axis and X on the horizontal axis]
Slide 11: Estimation
- Finding β0 and β1.
  - We need data for X and Y.
  - We then require the values of β0 and β1 that make the equation the line of best fit.
  - I.e. the line that is as close to as much of the data on the scattergram as possible.
  - The method most often used is called ordinary least squares (OLS).
- Squares?
  - Whenever we use the equation to predict Y, there will always be an error (e) because we do not know ε.
  - Error: e = actual Y − predicted Y.
  - Some will be positive, some will be negative.
  - We square these errors so that they are all positive.
  - We add them to get a sum of squared errors, or ESS.
  - We choose β0 and β1 to minimise this sum.
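The idea above can be sketched in a few lines of Python: for any candidate line we can compute the ESS, and the OLS line is the one that makes it smallest. The data below are made up purely for illustration.

```python
# Candidate lines compared by their sum of squared errors (ESS).
# The x/y values are illustrative, not from the lectures.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

def ess(b0, b1):
    """Sum of squared errors e = actual Y - predicted Y for a given line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# OLS solution via the standard closed-form formulas.
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1_ols = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
         sum((x - xbar) ** 2 for x in xs)
b0_ols = ybar - b1_ols * xbar

print(ess(0.5, 1.5))        # ESS for an arbitrary guess
print(ess(b0_ols, b1_ols))  # ESS for the OLS line: never larger
```

Any other intercept/slope pair gives an ESS at least as large as the OLS pair, which is exactly what "line of best fit" means here.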
Slide 12: [Figure: regression line Y = β0 + β1 X on axes Y vs X]
Slide 13: [Figure: an observation and the regression line]
Slide 14: [Figure: an observation and the regression line, showing the error]
Slide 15: [Figure: some squared errors]
Slide 16: Estimation (cont.)
- Approach: line of best fit.
  - Changing the constant (β0) and coefficient (β1) changes the squares.
  - It changes the total of the squares.
  - We seek the constant and coefficient that minimise the total area.
  - The following slide shows how the squared errors might change.
Slide 17: [Figure: squared errors. Moving the regression line makes some squares bigger and others smaller.]
Slide 18: Formulas
- We have to minimise the ESS.
- To understand this we need differential calculus.
  - If you know how to use it, it's easy.
- ESS = Σ(Y − β0 − β1 X)².
- Differentiate ESS with respect to β0 and set the derivative equal to 0.
- Differentiate ESS with respect to β1 and set the derivative equal to 0.
- Solve the simultaneous equations for β0 and β1.
- Sounds complicated, but if differential calculus is a mystery to you, you do not have to learn it!
- The results?
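The results the slide alludes to are the standard closed-form OLS solutions (a sketch; the bar notation for sample means is mine):

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\,\bar{X}
```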
Slide 19: Exercise
- A problem from Selvanathan.
  - Problem 10.
  - Twelve secretaries at the University of Queensland were asked to take a three-day intensive course to improve their keyboard skills. At the beginning and the end of the course, they were given a particular two-page letter and asked to type it flawlessly.
- The next slide shows the data and the one after that the relevant SAS output.
Slide 20: Exercise (cont.)
- Data

  Typist   Experience (years)   Improvement (wpm)
  A         2                    9
  B         6                   11
  C         3                    8
  D         8                   12
  E        10                   14
  F         5                    9
  G        10                   14
  H        11                   13
  I        12                   14
  J         9                   10
  K         8                    9
  L        10                   10
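As a check, the estimates reported in the SAS output can be reproduced from this data with the standard closed-form OLS formulas (a minimal Python sketch, not part of the original SAS workflow):

```python
# Reproduce the OLS slope and intercept for the typist data by hand.
exper = [2, 6, 3, 8, 10, 5, 10, 11, 12, 9, 8, 10]
imp   = [9, 11, 8, 12, 14, 9, 14, 13, 14, 10, 9, 10]

n = len(exper)
xbar = sum(exper) / n
ybar = sum(imp) / n

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(exper, imp))
sxx = sum((x - xbar) ** 2 for x in exper)

b1 = sxy / sxx          # slope
b0 = ybar - b1 * xbar   # intercept

print(round(b0, 5), round(b1, 5))  # 6.86269 0.53881 -- matches SAS
```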
Slide 21: Scattergram
[Figure] Typists with more experience seem to have larger improvements.
Slide 22: SAS output

  Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
  Intercept    1   6.86269              1.19323          5.75      0.0002
  exper        1   0.53881              0.14194          3.80      0.0035

Regression equation: IMP = 6.863 + 0.539 EXP
Slide 23: Exercise (cont.)
- Interpretation: what does it mean?
- IMP = 6.863 + 0.539 EXP.
- IMP is the predicted value of the dependent variable (improvement) for different values of the independent variable (experience).
- β0 = 6.863.
  - The average improvement after the course of a keyboard operator with no experience (EXP = 0) is 6.863 wpm.
  - This may be a little dangerous because we have no data on typists with little or no experience.
- β1 = 0.539.
  - The average improvement after the course of a keyboard operator rises by 0.539 wpm per additional year of experience.
  - I.e. we would expect the improvement of the average typist with 11 years' experience to be 0.539 wpm more than that of the average typist with 10 years of experience.
Slide 24: Scattergram
[Figure: the data with the regression line.]
Slide 25: Errors
- Large or small?
  - Ideally, we want the errors to be small.
- Looking at the scattergram we can see
  - large and small errors;
  - positive and negative errors.
- The average error?
  - Not a good idea, because the sum of the errors is always 0.
- Standard error of estimate.
  - We average the squared errors instead.
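The standard error of estimate for the typist data can be sketched as follows, using the usual formula s = √(SSE / (n − 2)), which divides by n − 2 rather than n because two parameters were estimated (the divisor is a standard convention, not stated on the slide):

```python
# Standard error of estimate for the typist regression.
import math

exper = [2, 6, 3, 8, 10, 5, 10, 11, 12, 9, 8, 10]
imp   = [9, 11, 8, 12, 14, 9, 14, 13, 14, 10, 9, 10]
b0, b1 = 6.86269, 0.53881   # OLS estimates from the SAS output

# Sum of squared errors around the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(exper, imp))

n = len(exper)
s = math.sqrt(sse / (n - 2))   # n - 2: two estimated parameters
print(round(s, 5))             # matches Root MSE 1.49995 in the SAS output
```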
Slide 26: Scattergram
[Figure: positive errors (under-estimates) above the line, negative errors (over-estimates) below it.]
Slide 27: SAS output

  Root MSE         1.49995    R-Square    0.5903
  Dependent Mean  11.08333    Adj R-Sq    0.5493
  Coeff Var       13.53339
Slide 28: Significance
- Does Y really depend on X?
- Remember that we have a small sample and are trying to estimate a relationship between variables in a target population.
- Consider Y = β0 + β1 X + ε.
  - If X changes, Y must change.
  - If X increases by 1 unit, Y increases by β1 units.
  - Unless, of course, β1 = 0.
  - Then Y doesn't depend on X.
- This is how our test works.
  - H0: β1 = 0. (Y does not depend on X.)
  - HA: β1 ≠ 0. (Y does depend on X.)
Slide 29: Significance (cont.)
- Test statistic.
  - We use Student's t distribution (again).
- Degrees of freedom.
  - DF = number of observations − number of variables.
  - Trust the mathematicians on this.
Slide 30: Significance (cont.)
- t tests.
  - These tests work in exactly the same way as tests of hypotheses concerning mean values.
  - Large t scores lead us to reject the null hypothesis.
  - The same critical values apply.
- The modern approach considers the sig or p values.
  - Reject the null hypothesis if sig or p < 0.05 (or some other reasonable level of significance).
  - p = Pr(β1 = 0) given the sample data.
  - Or p = Pr(X does not explain Y) given the sample data.
- We can perform one-sided tests.
  - H0: β1 = 0.
  - HA: β1 > 0 (positive relationship).
  - HA: β1 < 0 (negative relationship).
  - Divide the p value by 2.
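The t score for the slope is simply the estimate divided by its standard error; a quick sketch using only the figures quoted on the slides:

```python
# t statistic for the typist regression's slope.
b1, se_b1 = 0.53881, 0.14194   # estimate and standard error from SAS
t = b1 / se_b1
print(round(t, 2))   # 3.8 -- SAS reports 3.80 with p = 0.0035
```

Since 3.80 is large (well beyond the usual critical values for 10 degrees of freedom), we reject H0 and conclude that improvement does depend on experience.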
Slide 31: SAS output

  Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
  Intercept    1   6.86269              1.19323          5.75      0.0002
  exper        1   0.53881              0.14194          3.80      0.0035
Slide 32: Coefficient of determination
- How much of the variation in Y is explained by X?
- The coefficient is called R².
  - If the regression line is a perfect fit, R² = 1.
  - If the regression bears no relationship to the data, R² = 0.
  - The regression line would be horizontal.
  - I.e. as X changes, Y doesn't.
Slide 33: Scattergram [figure]
Slide 34: Scattergram [figure]
Slide 35: R² (cont.)
- Definition.
  - R² = ratio of explained variation to total variation.
  - Recall that some variation will be positive and some negative.
  - We have to square and then add.
Slide 36: R² (cont.)
- Adjusted R².
  - If we have small data sets, it is likely that R² will be quite large.
  - If there are only two observations, R² = 1 no matter how unlikely the relationship.
  - E.g. X = maximum daily temperature in Melbourne and Y = daily sales of snakeskin boots in New York.
  - With 2 observations, R² = 1!
- We have an adjusted R² that takes sample size into account.
  - SAS calculates it.
  - This is the one to use if we want to compare models.
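The adjustment SAS applies is, by the usual convention (the formula is not stated on the slides, so treat it as an assumption), 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of explanatory variables:

```python
# The conventional adjusted R-squared, checked against the typist output.
r2, n, k = 0.5903, 12, 1   # R-squared, observations, explanatory variables
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 4))    # matches Adj R-Sq 0.5493
```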
Slide 37: SAS output

  Root MSE         1.49995    R-Square    0.5903
  Dependent Mean  11.08333    Adj R-Sq    0.5493
  Coeff Var       13.53339
Slide 38: Forecasting
- Using the equation.
  - We have IMP = 6.863 + 0.539 EXP.
  - We can substitute values of EXP to forecast IMP.
  - E.g. EXP = 5 ⇒ IMP = 6.863 + 0.539 × 5 = 9.56.
- This is a point estimate (not a confidence interval).
- It can be thought about in two ways.
  - The improvement of a particular typist who has 5 years' experience?
  - The average improvement of all typists who have 5 years' experience?
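A point forecast is just substitution into the fitted equation; a tiny sketch (the function name is mine, purely for illustration):

```python
# Point forecast from the fitted typist equation.
def forecast_imp(exp_years):
    """Predicted improvement (wpm) from IMP = 6.863 + 0.539 EXP."""
    return 6.863 + 0.539 * exp_years

print(round(forecast_imp(5), 2))   # 9.56
```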
Slide 39: Forecasting (cont.)
- Reasonable approximations.
  - The exact formulas are shockers!
  - Provided we have reasonably large data sets we can make approximations.
  - t ≈ 2.
  - Only the first term in the square root matters much.
  - The others are relatively small.
- Formulas.
  - For particular values: predicted Y ± 2 sε.
  - For mean values: predicted Y ± 2 sε/√n.
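A sketch of these rough intervals for the typist forecast at EXP = 5, plugging in the Root MSE and sample size from the slides (these are the slide's approximations, not the exact interval formulas):

```python
# Approximate forecast intervals using the slide's rules of thumb.
import math

y_hat = 9.56          # point forecast at EXP = 5
s, n = 1.49995, 12    # Root MSE and sample size from the SAS output

# For a particular typist: y_hat +/- 2s.
lo_ind, hi_ind = y_hat - 2 * s, y_hat + 2 * s
# For the average of all such typists: y_hat +/- 2s/sqrt(n).
half = 2 * s / math.sqrt(n)
lo_mean, hi_mean = y_hat - half, y_hat + half

print(round(lo_ind, 2), round(hi_ind, 2))    # roughly 6.56 to 12.56
print(round(lo_mean, 2), round(hi_mean, 2))  # a much tighter interval
```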
Slide 40: Causality
- Be careful.
  - Finding a significant and strong regression equation between Y and X does not establish causality.
  - It establishes an association.
  - The variables move in related ways.
- E.g. we could expect to see a significant and positive regression between
  - the number of murders per annum in the UK and
  - membership of the Church of England.
- Causality seems doubtful.
- Causal factor?
  - Almost certainly, population growth.
Slide 41: Multiple regression
- More general models.
  - Few interesting problems contain only two variables.
- We cannot produce scattergrams.
  - We cannot draw regression lines.
  - It is hard in 3 dimensions.
  - It is impossible in more than 3 dimensions.
- Fortunately the maths still works.
  - Solutions by hand are just about impossible.
  - SAS can do it at nearly the speed of light!
Slide 42: Multiple (cont.)
- Model: Y = β0 + β1 X1 + β2 X2 + … + βk Xk + ε.
- Y = dependent variable.
  - E.g. sales turnover.
- k independent variables.
  - E.g. X1 = size of local market.
  - E.g. X2 = average household income in local market.
  - E.g. X3 = number of competitors in local market.
- ε = error (random with mean 0).
  - I.e. the net effect of all variables other than X1, X2 and X3 that influence Y.
- β0 = constant or intercept.
- βj = coefficient or slope for variable Xj.
  - I.e. the average increase in Y when Xj increases by 1 unit, ceteris paribus (meaning the other variables don't change).
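The lectures use SAS for this, but the same least-squares fit can be sketched in a few lines of Python with numpy; the observations below are made up purely for illustration.

```python
# Minimal multiple-regression fit by least squares with numpy.
import numpy as np

# Hypothetical data: market size, household income, number of competitors.
X = np.array([
    [10.0, 45.0, 3.0],
    [12.0, 50.0, 2.0],
    [ 8.0, 40.0, 5.0],
    [15.0, 55.0, 1.0],
    [11.0, 48.0, 4.0],
    [ 9.0, 42.0, 6.0],
])
y = np.array([120.0, 150.0, 90.0, 200.0, 130.0, 95.0])  # sales turnover

# Prepend a column of ones so the intercept beta_0 is estimated too.
X1 = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta)   # [b0, b1, b2, b3]
```

A defining property of the least-squares solution is that the residuals are orthogonal to every column of the design matrix, which makes a handy sanity check.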
Slide 43: Multiple (cont.)
- Example.
  - Aspinwall (1970), in the Southern Economic Journal, wrote an article entitled "Market Structure and Commercial Mortgage Interest Rates".
  - A market was defined as a standard metropolitan statistical area (SMSA).
  - Aspinwall tested the hypothesis that average mortgage interest rates in SMSAs depend on the amount of a loan relative to the value of the property and monopolization within each SMSA.
- A priori we would expect
  - high interest rates to be associated with higher borrowing ratios;
  - high interest rates to be associated with greater monopolization.
Slide 44: Multiple (cont.)
- Variables
  - INTEREST = average mortgage rate in an SMSA.
  - COVERAGE = average loan/price (of home) in an SMSA.
  - CONCENT = concentration ratio in an SMSA.
  - LENDERS = number of lending institutions in an SMSA.
  - The concentration ratio is the proportion of the market in the hands of the largest 10 businesses.
- Results.
  - The data are limited (31 observations).
  - The findings were not quite what was expected.
  - Textbook results do not always occur in real life.
  - The output here is generated by SAS.
Slide 45: The MEANS Procedure

  Variable   N    Mean     Std Dev   Minimum   Maximum
  interest   31     5.61      0.20      5.22      6.16
  coverage   31    65.33      2.85     60.20     70.60
  concent    31    37.56     14.84     12.30     67.10
  lenders    31   100.41    119.25      7.00    550.00
Slide 46: SAS output

  Root MSE        0.1609    R-Square    0.4505
  Dependent Mean  5.6158    Adj R-Sq    0.3895
  Coeff Var       2.8658
Slide 47: We test the null hypothesis that none of the explanatory variables (LENDERS, COVERAGE or CONCENT) is significant in explaining the dependent variable (INTEREST).

  Analysis of Variance

  Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
  Model              3   0.5734           0.1911        7.38      0.0009
  Error             27   0.6993           0.0259
  Corrected Total   30   1.2727

p < 0.05, so we reject the null and conclude that at least one of LENDERS, COVERAGE or CONCENT is significant in explaining INTEREST.
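The F value in the ANOVA table is simply the model mean square divided by the error mean square, which we can verify from the sums of squares above:

```python
# F statistic from the ANOVA table: ratio of mean squares.
ms_model = 0.5734 / 3     # model sum of squares / model DF
ms_error = 0.6993 / 27    # error sum of squares / error DF
f = ms_model / ms_error
print(round(f, 2))        # 7.38, matching the SAS output
```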
Slide 48: Theory suggests that this [the negative COVERAGE coefficient] is unlikely to be true and that we should expect a positive relationship.

  Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
  Intercept    1    5.71919             0.70962           8.06     <.0001
  coverage     1   -0.00438             0.01077          -0.41     0.6874
  concent      1    0.00627             0.00257           2.44     0.0215
  lenders      1   -0.00052602          0.00032240       -1.63     0.1144
Slide 49: Deleting variables
- When?
- Variables that have large p values.
  - Deleting variables with p values marginally more than 0.05 seems a little too extreme.
  - SAS provides sig or p values for two-sided tests, and in regression we often want to perform one-sided tests.
  - These values could be double what we need to deal with.
- Variables whose coefficients have the wrong sign.
  - If the model is telling us that quantity sold increases when the price increases, ceteris paribus, something is certainly wrong.
- In our example we can delete COVERAGE for both reasons.
Slide 50: This accords with theory: greater monopolization is associated with higher interest rates.

  Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
  Intercept    1    5.43485             0.12046          45.12     <.0001
  concent      1    0.00617             0.00252           2.45     0.0208
  lenders      1   -0.00050474          0.00031335       -1.61     0.1184
Slide 51: This accords with theory: greater competition is associated with lower interest rates.

  Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
  Intercept    1    5.43485             0.12046          45.12     <.0001
  concent      1    0.00617             0.00252           2.45     0.0208
  lenders      1   -0.00050474          0.00031335       -1.61     0.1184
Slide 52: SAS output

  Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
  Intercept    1    5.43485             0.12046          45.12     <.0001
  concent      1    0.00617             0.00252           2.45     0.0208
  lenders      1   -0.00050474          0.00031335       -1.61     0.1184

Equation: INTEREST = 5.435 + 0.006166 CONCENT − 0.000505 LENDERS
Slide 53: Other tests
- Modern regression procedures.
  - Obtaining a plausible model with good p values and high R² might not be enough.
  - Any of the following could lead to regression equations being misleading.
- Multicollinearity.
  - Two or more of the independent variables being highly correlated.
- Autocorrelation.
  - Successive pairs of residuals being highly correlated in models that use time-series data.
- Non-normality.
  - The errors not being normally distributed.
- Heteroskedasticity.
  - The variance (standard deviation squared) of the errors not being constant.
- These tests are outside the scope of this subject.
  - When problems of these sorts are identified, there is often a means of correcting them.
Slide 54: About logarithms
- Base 10.
  - Log 10 = 1.
  - Log 100 = 2.
  - Log 1000 = 3.
  - The logarithm in each case is the power to which we have to raise 10 to get the number.
  - Log 1000000 = 6 means that 10^6 = 1000000.
- Numbers that are not obvious powers of 10?
  - Log 200 = 2.3010.
  - The logarithm of any positive number can be calculated by a power series formula.
- Natural logarithms.
  - For fairly obscure mathematical reasons, we often prefer to use natural logarithms.
  - These have base e (instead of 10), where e is Euler's number (≈ 2.718).
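The examples on this slide are easy to check with Python's math module:

```python
# Checking the slide's base-10 examples and Euler's number.
import math

print(math.log10(1000))           # approximately 3.0
print(round(math.log10(200), 4))  # 2.301
print(round(math.e, 3))           # 2.718
```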
Slide 55: Logarithmic laws
- These apply to all positive numbers and any base.
- Law 1: Log(a × b) = Log(a) + Log(b).
- Law 2: Log(a^n) = n Log(a).
Slide 56: Elasticity
- Concept
  - In economic modelling, we are often interested in the impact of the change in one variable on another,
  - in percentage terms,
  - holding other variables constant (ceteris paribus).
- Example
  - Suppose a price elasticity is η = −1.3.
  - This means that a 10% price increase leads to a 13% decrease in consumption, ceteris paribus.
- Calculation
  - Elasticity = percentage change in Y ÷ percentage change in X.
Slide 57: Constant elasticity models
- Model: Y = A X1^β1 X2^β2 … Xk^βk.
  - The β values are elasticities (as you could demonstrate using calculus).
- Now use the log laws: Log Y = Log A + β1 Log X1 + β2 Log X2 + … + βk Log Xk.
- The variables are logarithms.
- The coefficients are elasticities.
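A quick numerical sketch of why the log-log coefficient is the elasticity: if Y = A X^β, the slope of log Y against log X is exactly β. The numbers below are illustrative only.

```python
# In Y = A * X**beta, the log-log slope recovers beta exactly.
import math

A, beta = 2.0, -1.3
x1, x2 = 10.0, 11.0        # a 10% increase in X
y1 = A * x1 ** beta
y2 = A * x2 ** beta

slope = (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))
print(round(slope, 6))     # -1.3: the elasticity we built in
```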
Slide 58: Dependent Variable: LREVENUE

  Root MSE         0.1007    R-Square    0.7049
  Dependent Mean  11.7259    Adj R-Sq    0.6744
  Coeff Var        0.8593

  Variable    Parameter Estimate   t Value   Pr > |t|
  Intercept    6.6578               8.62     <.0001
  LCOMP       -0.3779              -5.82     <.0001
  LPOP         0.3520               6.29     <.0001
  LINCOME      0.1590               1.88     0.0700

- Competitors up 10% ⇒ revenue down about 3.8%.
- Population up 10% ⇒ revenue up about 3.5%.
- Income up 10% ⇒ revenue up about 1.6%.