Quantitative Analysis (PowerPoint presentation transcript)
1
Quantitative Analysis
  • BEO6501

2
Themes
  • These lectures will deal with regression
    analysis.
  • This deals with estimating a relationship
    between
  • A dependent variable and one or more
    explanatory or independent variables.
  • E.g. Between consumption and price.
  • E.g. Between sales and price and income.
  • We can ask whether the relationship is
    statistically significant.
  • We can estimate the strength of the relationship.
  • We can estimate the impact of each explanatory
    variable on the dependent variable.
  • We can use the relationship to make forecasts.

3
Reference
  • Refer to Hildebrand and Ott
  • Linear Regression and Correlation Methods.
  • Multiple Regression Methods.
  • Constructing a Multiple Regression Model.
  • Almost any intermediate business statistics text
    will have equivalent chapters or sections that
    would be useful.

4
A starting point
  • Data
  • Consider real GDP and employment.
  • Is there some sort of connection?
  • Probably, but which is the dependent variable?
  • Is real GDP a measure of economic activity that
    determines the demand for labour?
  • Or is it the level of employment that determines
    output and, therefore, real GDP?
  • So it's a good question for Macroeconomic
    Principles!
  • Look at the data.
  • Because real GDP is measured in billions of
    dollars and employment is measured in millions,
    it is better to use index numbers for both data
    sets.

5
It is fairly clear that the two data sets tend to
move in the same ways over a 20-year period.
6
Scattergrams
  • Plot
  • One variable on the horizontal axis.
  • E.g. RGDP.
  • One variable on the vertical axis.
  • E.g. Employment.
  • Plot.
  • For each observation plot a point on the graph.
  • If the points form something close to a straight
    line, we have a strong linear relationship
    between the variables.

7
Scattergram
8
Scattergram
9
Simple linear regression
  • Model.
  • Y = β0 + β1 X + ε.
  • Y = dependent variable.
  • I.e. The variable we are trying to explain or
    predict.
  • E.g. Sales.
  • X = independent variable.
  • I.e. The variable we are using to explain or
    predict Y.
  • E.g. Advertising expenditure.
  • ε = error (random with mean 0).
  • I.e. The net effect of all variables other than X
    that influence Y.
  • E.g. Weather, prices, incomes and many others.
  • Later we will see how we can bring the more
    important of these into the model.
  • β0 = constant or intercept.
  • β1 = coefficient or slope.
  • It is like a formula for calculating Y.
  • Actually for estimating the average value of Y.
  • Since ε is random, we cannot use it in the
    formula.
  • We don't know what it will be in any instance.

10
[Diagram: the line Y = β0 + β1 X, with Y on the vertical axis and X on the horizontal axis.]
11
Estimation
  • Finding β0 and β1.
  • We need data for X and Y.
  • We then require the values of β0 and β1 that make
    the equation the line of best fit.
  • I.e. The line that is close to as much of the
    data on the scattergram as possible.
  • The method most often used is called ordinary
    least squares (OLS).
  • Squares?
  • Whenever we use the equation to predict Y, there
    will always be an error (e) because we do not
    know ε.
  • Error e = actual Y − predicted Y.
  • Some will be positive.
  • Some will be negative.
  • We square these errors so that they will all be
    positive.
  • We add them to get a sum of squared errors or
    ESS.
  • We choose β0 and β1 to minimise this sum.

12
[Diagram: scattergram with the regression line Y = β0 + β1 X.]
13
[Diagram: scattergram highlighting one observation together with the regression line Y = β0 + β1 X.]
14
[Diagram: the same observation and the regression line Y = β0 + β1 X, as on the previous slide.]
15
[Diagram: some squared errors drawn as squares on the scattergram.]
16
Estimation (cont.)
  • Approach
  • Line of best fit
  • Changing the constant (β0) and coefficient (β1)
    changes the squares.
  • It changes the total of the squares.
  • We seek the constant and coefficients that
    minimise the total area.
  • The following slide shows how the squared errors
    might change.

17
[Diagram: squared errors. Moving the regression line makes some squares bigger and others smaller.]
18
Formulas
  • We have to minimise the ESS.
  • To understand this we need to use differential
    calculus.
  • If you know how to use it, it's easy.
  • ESS = Σ(Y − β0 − β1 X)².
  • Differentiate ESS with respect to β0 and set the
    derivative equal to 0.
  • Differentiate ESS with respect to β1 and set the
    derivative equal to 0.
  • Solve the simultaneous equations for β0 and β1.
  • Sounds complicated, but if differential calculus
    is a mystery to you, you do not have to learn it!
  • The results?
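
The results formula on this slide is an image not reproduced in the transcript; for reference, minimising ESS yields the standard OLS estimators:

```latex
\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
```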

19
Exercise
  • A problem from Selvanathan
  • Problem 10.
  • Twelve secretaries at the University of
    Queensland were asked to take a three-day
    intensive course to improve their keyboard
    skills. At the beginning and the end of the
    course, they were given a particular
    two-page letter and asked to type it flawlessly.
  • The next slide shows the data and the one after
    that the relevant SAS output.

20
Exercise (cont.)
  • Data

    Typist   Experience (years)   Improvement (wpm)
    A         2                    9
    B         6                   11
    C         3                    8
    D         8                   12
    E        10                   14
    F         5                    9
    G        10                   14
    H        11                   13
    I        12                   14
    J         9                   10
    K         8                    9
    L        10                   10

21
Scattergram
Typists with more experience seem to have larger
improvements.
22
                    Parameter    Standard
Variable      DF    Estimate     Error       t Value    Pr > |t|
Intercept      1    6.86269      1.19323     5.75       0.0002
exper          1    0.53881      0.14194     3.80       0.0035

Regression equation: IMP = 6.863 + 0.539 EXP
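
As a check on this output, here is a minimal sketch in Python (numpy assumed available; not part of the original slides) that reproduces the two estimates from the slide-20 data using the closed-form formulas above:

```python
import numpy as np

# Typist data from slide 20: experience (years) and improvement (wpm)
exper = np.array([2, 6, 3, 8, 10, 5, 10, 11, 12, 9, 8, 10], dtype=float)
imp = np.array([9, 11, 8, 12, 14, 9, 14, 13, 14, 10, 9, 10], dtype=float)

# Closed-form OLS slope and intercept
b1 = np.sum((exper - exper.mean()) * (imp - imp.mean())) / np.sum((exper - exper.mean()) ** 2)
b0 = imp.mean() - b1 * exper.mean()
print(b0, b1)  # approximately 6.86269 and 0.53881, matching the SAS output
```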
23
Exercise (cont.)
  • Interpretation
  • What does it mean?
  • IMP = 6.863 + 0.539 EXP.
  • IMP is the predicted value of the dependent
    variable (improvement) for different values of
    the independent variable (experience).
  • β0 = 6.863.
  • The average improvement after the course of a
    keyboard operator with no experience (EXP = 0) is
    6.863 wpm.
  • This may be a little dangerous because we have no
    data on typists with little or no experience.
  • β1 = 0.539.
  • The average improvement after the course of a
    keyboard operator per additional year of
    experience is 0.539 wpm.
  • I.e. We would expect the improvement of the
    average typist with 11 years of experience to be
    0.539 wpm more than that of the average typist
    with 10 years of experience.

24
Scattergram
Regression line.
25
Errors
  • Large or small.
  • Ideally, we want the errors to be small.
  • Looking at the scattergram we can see
  • Large and small errors.
  • Positive and negative errors.
  • The average error?
  • Not a good idea because the sum of the errors is
    always 0.
  • Standard error of estimate.
  • We average the squared errors (and take the
    square root).

26
Scattergram
Positive errors (under-estimates)
Negative errors (over-estimates)
27
Root MSE            1.49995    R-Square    0.5903
Dependent Mean     11.08333    Adj R-Sq    0.5493
Coeff Var          13.53339
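
Continuing the Python sketch from slide 22 (again an illustration, not the original SAS code), these summary statistics can be reproduced from the residuals:

```python
# Residuals, standard error of estimate (Root MSE) and R-squared
resid = imp - (b0 + b1 * exper)
sse = np.sum(resid ** 2)                   # sum of squared errors
root_mse = np.sqrt(sse / (len(imp) - 2))   # divide by DF = n - 2, then square root
r2 = 1 - sse / np.sum((imp - imp.mean()) ** 2)
print(root_mse, r2)  # approximately 1.49995 and 0.5903
```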
28
Significance
  • Does Y really depend on X?
  • Remember that we have a small sample and are
    trying to estimate a relationship between
    variables in a target population.
  • Consider Y = β0 + β1 X + ε.
  • If X changes, Y must change.
  • If X increases by 1 unit, Y increases by β1 units.
  • Unless, of course, β1 = 0.
  • Then Y doesn't depend on X.
  • This is how our test works.
  • H0: β1 = 0.
  • Y does not depend on X.
  • HA: β1 ≠ 0.
  • Y does depend on X.

29
Significance (cont.)
  • Test statistic.
  • We use Student's t distribution (again).
  • Degrees of freedom.
  • DF = number of observations − number of variables.

Trust the mathematicians on this.
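
The test-statistic formula on this slide is an image not reproduced in the transcript; a sketch of the calculation behind the slide-22 t test, using scipy (assumed available; not part of the original slides):

```python
from scipy import stats

# t statistic for H0: beta1 = 0, from the estimate and its standard error
t_stat = 0.53881 / 0.14194   # about 3.80
df = 12 - 2                  # 12 observations minus 2 estimated parameters
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p value
print(t_stat, p_value)       # about 3.80 and 0.0035, matching the SAS output
```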
30
Significance (cont.)
  • t tests.
  • These tests work in exactly the same way as in
    tests of hypotheses concerning mean values.
  • Large t scores lead us to reject the null
    hypothesis.
  • The same critical values apply.
  • The modern approach considers the sig or p
    values.
  • Reject the null hypothesis if sig or p < 0.05 (or
    some other reasonable level of significance).
  • p = Pr(β1 = 0, given the sample data).
  • Or p = Pr(X does not explain Y, given the sample
    data).
  • We can perform one-sided tests.
  • H0: β1 = 0.
  • HA: β1 > 0 (positive relationship).
  • HA: β1 < 0 (negative relationship).
  • Divide the p value by 2.

31
                    Parameter    Standard
Variable      DF    Estimate     Error       t Value    Pr > |t|
Intercept      1    6.86269      1.19323     5.75       0.0002
exper          1    0.53881      0.14194     3.80       0.0035
32
Coefficient of determination
  • How much of the variation in Y is explained by X?
  • The coefficient is called R².
  • If the regression line is a perfect fit, R² = 1.
  • If the regression bears no relationship to the
    data, R² = 0.
  • The regression line would be horizontal.
  • I.e. As X changes, Y doesn't.

33
Scattergram
34
Scattergram
35
R² (cont.)
  • Definition.
  • R² = explained variation / total variation.
  • Recall that some variation will be positive and
    some negative.
  • We have to square and then add.

36
R² (cont.)
  • Adjusted R².
  • If we have small data sets, it is likely that R²
    will be quite large.
  • If there are only two observations, R² = 1 no
    matter how unlikely the relationship.
  • E.g. X = maximum daily temperature in Melbourne
    and Y = daily sales of snake skin boots in New
    York.
  • With 2 observations, R² = 1!
  • We have an adjusted R² that takes sample size
    into account.
  • SAS calculates it.
  • This is the one to use if we want to compare
    models.

37
Root MSE            1.49995    R-Square    0.5903
Dependent Mean     11.08333    Adj R-Sq    0.5493
Coeff Var          13.53339
38
Forecasting
  • Using the equation.
  • We have
  • IMP = 6.863 + 0.539 EXP.
  • We can substitute values of EXP to forecast IMP.
  • E.g. EXP = 5 ⇒ IMP = 6.863 + 0.539 × 5 ≈ 9.56.
  • This is a point estimate (not a confidence
    interval).
  • It can be thought about in two ways.
  • The improvement of a particular typist who has 5
    years of experience?
  • The average improvement of all typists who have 5
    years of experience?

39
Forecasting (cont).
  • Reasonable approximations.
  • The exact formulas are shockers!
  • Provided we have reasonably large data sets we
    can make approximations.
  • t ≈ 2.
  • Only the first term in the square root matters
    much.
  • The others are relatively small.
  • Formulas.
  • For particular values:
  • Y-predicted ± 2 sε.
  • For mean values:
  • Y-predicted ± 2 sε/√n.
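
A worked illustration of these approximations, using Root MSE ≈ 1.5 and n = 12 from the earlier output (the intervals themselves are my own example, not from the slides):

```python
import numpy as np

s_e, n = 1.49995, 12             # Root MSE and sample size from the earlier output
y_hat = 6.863 + 0.539 * 5        # point forecast at EXP = 5, about 9.56

# Approximate interval for a particular typist with 5 years of experience
print(y_hat - 2 * s_e, y_hat + 2 * s_e)   # roughly (6.6, 12.6)

# Approximate interval for the mean improvement of all such typists
half = 2 * s_e / np.sqrt(n)
print(y_hat - half, y_hat + half)         # roughly (8.7, 10.4)
```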

40
Causality
  • Be careful.
  • Finding a significant and strong regression
    equation between Y and X does not establish
    causality.
  • It establishes an association.
  • The variables move in related ways.
  • E.g. We could expect to see a significant and
    positive regression between
  • The number of murders per annum in the UK and
  • Membership of the Church of England.
  • Causality seems doubtful.
  • Causal factor?
  • Almost certainly, population growth.

41
Multiple regression
  • More general models.
  • Few interesting problems contain only two
    variables.
  • We cannot produce scattergrams.
  • We cannot draw regression lines.
  • It is hard in 3 dimensions.
  • It is impossible in more than 3 dimensions.
  • Fortunately the math still works.
  • Solutions by hand are just about impossible.
  • SAS can do it at nearly the speed of light!

42
Multiple (cont)
  • Model.
  • Y = β0 + β1 X1 + β2 X2 + … + βk Xk + ε.
  • Y = dependent variable.
  • E.g. Sales turnover.
  • k independent variables.
  • E.g. X1 = size of local market.
  • E.g. X2 = average household income in local
    market.
  • E.g. X3 = number of competitors in local market.
  • ε = error (random with mean 0).
  • I.e. The net effect of all variables other than
    X1, X2 and X3 that influence Y.
  • β0 = constant or intercept.
  • βj = coefficient or slope for variable Xj.
  • I.e. The average increase in Y when Xj increases
    by 1 unit, ceteris paribus (meaning the other
    variables don't change).
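
A minimal sketch of fitting such a model in Python via numpy's least-squares solver; the data below are invented purely for illustration (the original analysis used SAS):

```python
import numpy as np

# Hypothetical data: sales turnover and three explanatory variables
x1 = np.array([50, 60, 80, 40, 90, 70], dtype=float)  # size of local market
x2 = np.array([55, 60, 70, 50, 75, 65], dtype=float)  # average household income
x3 = np.array([3, 4, 6, 2, 7, 5], dtype=float)        # number of competitors
y = np.array([120, 135, 160, 100, 175, 150], dtype=float)

# Design matrix with a leading column of ones for the intercept beta0
X = np.column_stack([np.ones_like(x1), x1, x2, x3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimated beta0, beta1, beta2, beta3
```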

43
Multiple (cont)
  • Example.
  • Aspinwall (1970), in the Southern Economic
    Journal, wrote an article entitled "Market
    Structure and Commercial Mortgage Interest
    Rates".
  • A market was defined as a standard metropolitan
    statistical area (SMSA).
  • Aspinwall tested the hypothesis that average
    mortgage interest rates in SMSAs depend on the
    amount of a loan relative to the value of the
    property and monopolization within each SMSA.
  • A priori, we would expect
  • High interest rates to be associated with higher
    borrowing ratios.
  • High interest rates to be associated with greater
    monopolization.

44
Multiple (cont)
  • Example.
  • Variables
  • INTEREST = Average mortgage rate in an SMSA.
  • COVERAGE = Average loan/price (of home) in an
    SMSA.
  • CONCENT = Concentration ratio in an SMSA.
  • LENDERS = Number of lending institutions in an
    SMSA.
  • The concentration ratio is the proportion of the
    market in the hands of the largest 10 businesses.
  • Results.
  • The data is limited (31 observations).
  • The findings were not quite what was expected.
  • Textbook results do not always occur in real
    life.
  • The output here is generated by SAS.

45
The MEANS Procedure

Variable     N       Mean      Std Dev    Minimum    Maximum
interest    31       5.61       0.20       5.22       6.16
coverage    31      65.33       2.85      60.20      70.60
concent     31      37.56      14.84      12.30      67.10
lenders     31     100.41     119.25       7.00     550.00
46
Root MSE           0.1609    R-Square    0.4505
Dependent Mean     5.6158    Adj R-Sq    0.3895
Coeff Var          2.8658
47
We test the null hypothesis that none of the
explanatory variables (LENDERS, COVERAGE or
CONCENT) is significant in explaining the
dependent variable (INTEREST).
Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3    0.5734            0.1911         7.38       0.0009
Error              27    0.6993            0.0259
Corrected Total    30    1.2727

p < 0.05, so we reject the null and conclude that
at least one of LENDERS, COVERAGE or CONCENT is
significant in explaining INTEREST.
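
The F statistic can be verified directly from the ANOVA table; a quick sketch with scipy (assumed available; not from the slides):

```python
from scipy import stats

# F = (model SS / model DF) / (error SS / error DF)
f_value = (0.5734 / 3) / (0.6993 / 27)
p_value = stats.f.sf(f_value, 3, 27)   # upper-tail probability of F(3, 27)
print(f_value, p_value)                # about 7.38 and 0.0009, matching the output
```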
48
Theory suggests that this is unlikely to be true
and that we should expect a positive
relationship.
                    Parameter      Standard
Variable      DF    Estimate       Error         t Value    Pr > |t|
Intercept      1    5.71919        0.70962        8.06      <.0001
coverage       1    -0.00438       0.01077       -0.41      0.6874
concent        1    0.00627        0.00257        2.44      0.0215
lenders        1    -0.00052602    0.00032240    -1.63      0.1144
49
Deleting variables
  • When?
  • Variables that have large p values.
  • Deleting variables that have p values that are
    marginally more than 0.05 seems a little too
    extreme.
  • SAS provides sig values or p values for two-sided
    tests, and in regression we often want to perform
    one-sided tests.
  • The reported two-sided values are double the
    one-sided values we actually need.
  • Deleting variables whose coefficients have the
    wrong sign.
  • If the model is telling us that quantity sold
    increases when the price increases, ceteris
    paribus, something is certainly wrong.
  • In our example we can delete COVERAGE for both
    reasons.

50
This accords with theory: greater monopolization
is associated with higher interest rates.

                    Parameter      Standard
Variable      DF    Estimate       Error         t Value    Pr > |t|
Intercept      1    5.43485        0.12046       45.12      <.0001
concent        1    0.00617        0.00252        2.45      0.0208
lenders        1    -0.00050474    0.00031335    -1.61      0.1184
51
This accords with theory: greater competition
is associated with lower interest rates.

                    Parameter      Standard
Variable      DF    Estimate       Error         t Value    Pr > |t|
Intercept      1    5.43485        0.12046       45.12      <.0001
concent        1    0.00617        0.00252        2.45      0.0208
lenders        1    -0.00050474    0.00031335    -1.61      0.1184
52
                    Parameter      Standard
Variable      DF    Estimate       Error         t Value    Pr > |t|
Intercept      1    5.43485        0.12046       45.12      <.0001
concent        1    0.00617        0.00252        2.45      0.0208
lenders        1    -0.00050474    0.00031335    -1.61      0.1184

Equation: INTEREST = 5.435 + 0.006166 CONCENT − 0.000505 LENDERS
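
As a usage illustration of the final equation (the input values below are hypothetical, chosen only to show the arithmetic):

```python
# Predicted average mortgage rate for a hypothetical SMSA
concent, lenders = 40.0, 100.0   # hypothetical concentration ratio and lender count
interest = 5.435 + 0.006166 * concent - 0.000505 * lenders
print(interest)                  # about 5.63 per cent
```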
53
Other tests
  • Modern regression procedures.
  • Obtaining a plausible model with good p values
    and a high R² might not be enough.
  • Any of the following could lead to regression
    equations being misleading.
  • Multicollinearity.
  • Two or more of the independent variables being
    highly correlated.
  • Autocorrelation.
  • Successive pairs of residuals being highly
    correlated in models that use time-series data.
  • Non-normality.
  • The errors not being normally distributed.
  • Heteroskedasticity.
  • The variance (standard deviation squared) of the
    errors not being constant.
  • These tests are outside the scope of this
    subject.
  • When problems of these sorts are identified,
    there is often a means of correcting them.

54
About logarithms
  • Base 10.
  • Log 10 = 1.
  • Log 100 = 2.
  • Log 1000 = 3.
  • The logarithm in each case is the power to which
    we have to raise 10 to get the number.
  • Log 1,000,000 = 6 means that 10⁶ = 1,000,000.
  • Numbers that are not obvious powers of 10?
  • Log 200 ≈ 2.3010.
  • The logarithm of any positive number can be
    calculated by a power series formula.
  • Natural logarithms.
  • For fairly obscure mathematical reasons, we often
    prefer to use natural logarithms.
  • These have base e (instead of 10), where e is
    Euler's number (≈ 2.718).
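
These values are easy to check with Python's standard library (my own illustration, not part of the slides):

```python
import math

print(math.log10(1000))       # 3.0
print(math.log10(200))        # about 2.3010
print(math.e)                 # Euler's number, about 2.71828
print(math.log(math.e ** 2))  # natural logarithm: about 2.0
```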

55
Logarithmic laws
  • These apply to all positive numbers and any base.
  • Law 1.
  • Log(a × b) = Log(a) + Log(b).
  • Law 2.
  • Log(aⁿ) = n Log(a).

56
Elasticity
  • Concept
  • In economic modelling, we are often interested in
    the impact of the change in one variable on
    another
  • In percentage terms.
  • And holding other variables constant (or ceteris
    paribus).
  • Example
  • Suppose a price elasticity is η = −1.3.
  • This means that a 10% price increase leads to a
    13% decrease in consumption, ceteris paribus.
  • Calculation (see the formula below).
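
The calculation formula on this slide is an image not reproduced in the transcript; the standard definition it refers to is:

```latex
\eta \;=\; \frac{\%\,\Delta Q}{\%\,\Delta P}
\;=\; \frac{\Delta Q / Q}{\Delta P / P}
\;=\; \frac{dQ}{dP}\cdot\frac{P}{Q}
```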

57
Constant elasticity models
  • Model (reconstructed below).

The β values are elasticities (as you could
demonstrate using calculus).
Now use the log laws.
The variables are logarithms.
The coefficients are elasticities.
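
The model formulas on this slide are images missing from the transcript; a standard constant-elasticity form consistent with the surrounding text is:

```latex
Y = e^{\beta_0}\, X_1^{\beta_1} X_2^{\beta_2} \cdots X_k^{\beta_k}\, e^{\varepsilon}
\quad\Longrightarrow\quad
\log Y = \beta_0 + \beta_1 \log X_1 + \beta_2 \log X_2 + \cdots + \beta_k \log X_k + \varepsilon
```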
58
Dependent Variable: LREVENUE

Root MSE           0.1007    R-Square    0.7049
Dependent Mean    11.7259    Adj R-Sq    0.6744
Coeff Var          0.8593

              Parameter
Variable      Estimate     t Value    Pr > |t|
Intercept     6.6578        8.62      <.0001
LCOMP         -0.3779      -5.82      <.0001
LPOP          0.3520        6.29      <.0001
LINCOME       0.1590        1.88      0.0700

Competitors ↑ 10% ⇒ revenue ↓ 3.8%.
Population ↑ 10% ⇒ revenue ↑ 3.5%.
Income ↑ 10% ⇒ revenue ↑ 1.6%.