Title: Statistical Analysis of Economic Relations Outline
1. Statistical Analysis of Economic Relations: Outline
- Data Summary and Description
- Measures of Central Tendency
- Measures of Dispersion
- Hypothesis Testing
- Regression Analysis
- Regression Statistics
- Additional Econometric Issues
2. Data Summary and Description
- Population Parameters: Summary and descriptive measures for the population.
- Sample Statistics: Summary and descriptive measures for a sample.
- NOTE: We rarely have data for the population. Hence we need to be able to draw inferences from a sample.
3. Measures of Central Tendency
- Mean: The average.
  - Issue: You must note the distribution of the sample. If it is unbalanced, the mean may be misleading.
- Median: Middle observation.
- Mode: Most common value.
4. Symmetry vs. Skewness
- Symmetrical: A balanced distribution.
  - Median = Mean
- Skewness: A lack of balance.
  - Skewed to the left: Median > Mean
  - Skewed to the right: Median < Mean
- If skewness is observed, one may wish to examine a sub-sample of the data or consider a different distribution in estimating the model.
5. Measures of Dispersion
- Range: Difference between the largest and smallest sample observations.
  - Only considers the extremes of the sample.
  - Used to identify what is possible.
- Variance and Standard Deviation
  - Sample Variance: Average squared deviation from the sample mean.
  - Sample Standard Deviation: Square root of the sample variance.
6. Coefficient of Variation
- Coefficient of Variation: Standard deviation divided by the mean.
- A measure that does not rely on the size of the observations or the unit of measurement.
- This is used to compare relative dispersion across a variety of data (a short computational sketch follows this list).
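The measures from the last few slides can be computed directly with Python's standard-library statistics module; the sample below is entirely made up for illustration.

```python
# A minimal sketch of the descriptive measures above on a small, made-up
# sample (values in thousands of dollars, purely illustrative).
import statistics

incomes = [32, 35, 38, 41, 44, 44, 52, 58, 70, 120]

mean = statistics.mean(incomes)            # central tendency: the average
median = statistics.median(incomes)        # central tendency: middle observation
mode = statistics.mode(incomes)            # central tendency: most common value
value_range = max(incomes) - min(incomes)  # dispersion: largest minus smallest
variance = statistics.variance(incomes)    # sample variance (divides by n - 1)
stdev = statistics.stdev(incomes)          # square root of the sample variance
cv = stdev / mean                          # coefficient of variation (unit-free)

print(f"mean={mean}, median={median}, mode={mode}, range={value_range}")
print(f"variance={variance:.1f}, stdev={stdev:.1f}, CV={cv:.2f}")
# The one very large value (120) pulls the mean above the median, the
# right-skewed pattern described on the skewness slide.
```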
7. Hypothesis Testing
- Hypothesis Testing: Statistical experiment used to measure the reasonableness of a given theory or premise.
- NOTE: WE DO NOT PROVE A THEORY.
- Type I Error: Incorrect rejection of a true hypothesis.
- Type II Error: Failure to reject a false hypothesis.
- You cannot eliminate both Type I and Type II errors.
8. Testing a Hypothesis
- Steps in testing a hypothesis (a worked sketch follows this list):
  - Formally state the basic premise or the null hypothesis (H0).
  - State the alternative hypothesis (HA).
  - Collect data.
  - Analyze the data with respect to H0 and HA.
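As a sketch of these steps, the snippet below runs a one-sample t-test on made-up data. SciPy is an assumption here; the slides themselves rely on Excel for test statistics.

```python
# A minimal hypothesis-testing sketch on invented data.
from scipy import stats

# H0: the population mean of this variable is 0
# HA: the population mean differs from 0
sample = [1.2, -0.4, 0.8, 2.1, -0.3, 1.5, 0.9, -0.1]

t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")

# A small p-value leads us to reject H0. Rejecting a true H0 would be a
# Type I error; failing to reject a false H0 would be a Type II error.
```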
9. Regression Analysis: Definitions
- Regression Analysis: Statistical method for describing the relationship between a dependent variable (Y) and independent variable(s) (X).
- Deterministic Relation: An identity.
  - A relationship that is known with certainty.
- Statistical Relation: An inexact relation.
10. Regression Analysis: Types of Data
- Time series: A daily, weekly, monthly, or annual sequence of data, e.g., GDP data for the United States from 1950 to 2001.
- Cross-section: Data from a common point in time, e.g., GDP data for OECD nations in 1986.
- Panel data: Data that combines both cross-section and time-series data, e.g., GDP data for OECD nations from 1960 to 1992.
11. Steps in Regression Analysis
- Specify the dependent and independent variable(s) to be analyzed.
- Obtain reliable data.
- Estimate the model.
- Interpret the regression results.
12. Specifying the Regression Analysis: The Choice of Independent Variables
- Univariate analysis: Simple regression model.
  - A regression model with only one independent variable.
  - Issue: Cannot impose ceteris paribus.
- Multivariate analysis: Multiple regression model.
  - A regression model with multiple independent variables.
13. The Least Squares Model
- Ordinary Least Squares (OLS): A statistical method that chooses the regression line by minimizing the squared distance between the data points and the regression line.
- Why not simply sum the errors? Positive and negative errors cancel, so the sum generally equals zero.
- Why not take the absolute value of the errors? We wish to emphasize large errors.
14. Estimating a Univariate Model: Definitions
- Y = a0 + a1X + e
  - Y: dependent variable
  - a0: the constant term
  - a1: slope coefficient
  - e: error term
15. How does OLS work? The Slope Coefficient
- OLS selects estimates of a0 and a1 so that the sum of squared residuals is minimized.
- a1 = Σ(Xi - mean of X)(Yi - mean of Y) / Σ(Xi - mean of X)²
- Intuition: a1 equals the joint variation of X and Y (around their means) divided by the variation of X around its mean. Thus it measures the portion of the variation in Y that is associated with variations in X.
16. How does OLS work? The Constant Term
- a0 = mean of Y - a1(mean of X)
- a0 is defined to ensure that the regression equation does indeed pass through the means of Y and X (a worked sketch follows this list).
- The mean value of the error term is zero, which will only be true if the constant term is included.
- Note: The value of the constant term is often outside the realm of what is possible. Hence the interpretation of the constant term is often avoided.
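A minimal sketch of the slope and constant-term formulas from the two slides above, applied to a small made-up (X, Y) sample:

```python
# Apply the OLS formulas for a1 and a0 to invented data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# a1 = Σ(Xi - mean of X)(Yi - mean of Y) / Σ(Xi - mean of X)²
a1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)

# a0 = mean of Y - a1(mean of X), so the line passes through (x_bar, y_bar)
a0 = y_bar - a1 * x_bar

residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
print(f"a0={a0:.3f}, a1={a1:.3f}, sum of residuals={sum(residuals):.1e}")
# The sum of residuals is (numerically) zero because the constant term is included.
```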
17. The Error Term
- Error term (e): Random; included because we do not expect a perfect relationship.
- Sources of error:
  - Omitted variables
  - Measurement error
  - Incorrect functional form
18. Multivariate Analysis
- Introducing the idea of ceteris paribus.
- One cannot impose ceteris paribus unless all relevant variables are included in the model (a simulated illustration follows this list).
- Wins = a + b(ORB)
  - As illustrated earlier, the estimated impact of offensive rebounds (ORB) on wins is negative; in other words, b < 0.
- Wins = c + d(Missed Shots)
  - The value of d < 0.
- ORB = e + f(Missed Shots)
  - The value of f > 0.
- In other words, missed shots and offensive rebounds are positively related. So when we estimate wins as a function of offensive rebounds alone, we are simply picking up the relationship between wins and missed shots.
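The sketch below simulates the wins / offensive rebounds / missed shots story. Every number is invented, including the assumption that rebounds help wins once missed shots are held constant; the point is only that the slope on ORB is negative when ORB is the lone regressor but not when missed shots are included.

```python
# Simulated illustration of the omitted-variable problem (NumPy assumed).
import numpy as np

rng = np.random.default_rng(0)
n = 200
missed_shots = rng.normal(50, 5, n)
orb = 0.3 * missed_shots + rng.normal(0, 1, n)    # f > 0: more misses, more rebound chances
wins = 40 - 0.8 * missed_shots + 1.0 * orb + rng.normal(0, 2, n)

def ols(y, *xs):
    """Return OLS coefficients [constant, slopes...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_alone = ols(wins, orb)[1]                    # wins regressed on ORB only
b_ceteris = ols(wins, orb, missed_shots)[1]    # ORB slope holding missed shots constant
print(f"ORB slope alone: {b_alone:.2f}, with missed shots included: {b_ceteris:.2f}")
# The univariate slope is negative (it picks up the wins / missed-shots
# relationship), while the ceteris paribus slope is positive.
```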
19. Regression Statistics
- Standard Error of the Estimate
- Coefficient of Determination
- Adjusted Coefficient of Determination
- The F-Statistic
- The t-statistic
20. Coefficient of Determination
- Coefficient of Determination: Percentage of Y-variation explained by the regression model.
- Also referred to as R².
- R² = variation explained by the regression / total variation in Y.
- R² ranges from 0 to 1.
21. R-squared
- R-squared = Explained Sum of Squares / Total Sum of Squares
- Total sum of squares: Sum of the squared differences between the actual Y and the mean of Y, or
  - TSS = Σ(Yi - mean of Y)²
- Explained sum of squares: Sum of the squared differences between the predicted Y and the mean of Y, or
  - ESS = Σ(predicted Yi - mean of Y)²
- Residual sum of squares: Sum of the squared differences between the actual Y and the predicted Y, or
  - RSS = Σe²
- R² = ESS/TSS = 1 - RSS/TSS
22. Adjusted R²
- Adding any independent variable will increase R². To combat this problem, we often report the adjusted R² (computed in the sketch below).
- Adjusted R² = 1 - [RSS/(n - K - 1)] / [TSS/(n - 1)]
  - where n = number of observations
  - K = number of independent variables (slope coefficients)
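Continuing the made-up univariate example from the OLS sketch above, the snippet below computes the sums of squares, R², and adjusted R²:

```python
# R² decomposition and adjusted R² for the invented (X, Y) sample.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
a1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
a0 = y_bar - a1 * x_bar
y_hat = [a0 + a1 * xi for xi in x]                       # predicted Y

tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation in Y
ess = sum((yh - y_bar) ** 2 for yh in y_hat)             # variation explained by the model
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained (residual) variation

K = 1                                                    # one independent variable
r2 = ess / tss                                           # equivalently 1 - rss / tss
adj_r2 = 1 - (rss / (n - K - 1)) / (tss / (n - 1))
print(f"TSS={tss:.2f}, ESS={ess:.2f}, RSS={rss:.2f}")
print(f"R²={r2:.3f}, adjusted R²={adj_r2:.3f}")
```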
23. The F-Statistic
- F-Statistic: Offers evidence on whether the variation in Y explained by the regression is statistically significant.
- F-statistics are both provided and evaluated in Excel.
24. Judging the Significance of a Variable
- The t-statistic = estimated coefficient / standard error of the coefficient (computed in the sketch below).
- The t-statistic is used to test the null hypothesis (H0) that the coefficient is equal to zero. The alternative hypothesis (HA) is that the coefficient is different from zero.
- Rule of thumb: if |t| > 2 we believe the coefficient is statistically different from zero. WHY? For moderately large samples, the 5% two-tailed critical value of the t-distribution is approximately 2.
- Understand the difference between statistical significance and economic significance.
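A short sketch of the t-statistic on the same made-up data, using SciPy's linregress (an assumption; any regression output that reports a coefficient and its standard error would do):

```python
# t-statistic for the slope in a simple regression on invented data.
from scipy import stats

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

res = stats.linregress(x, y)            # simple regression of Y on X
t_stat = res.slope / res.stderr         # estimated coefficient / its standard error
print(f"slope={res.slope:.3f}, se={res.stderr:.3f}, "
      f"t={t_stat:.1f}, p-value={res.pvalue:.4f}")
# |t| well above 2 (and a small p-value) leads us to reject H0 that the
# slope is zero; whether the effect is economically meaningful is a
# separate question.
```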
25. Multicollinearity
- Multicollinearity: Two or more independent variables exhibit a strong linear correlation (illustrated in the sketch below).
- Consequences:
  - Standard errors will rise, so t-statistics will fall.
  - Estimates will be sensitive to changes in specification.
  - The overall fit of the regression will be unaffected.
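A small simulated sketch (NumPy assumed, numbers invented) of why correlated regressors inflate standard errors, using the variance inflation factor for the two-regressor case:

```python
# Multicollinearity sketch: two nearly identical regressors.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)       # x2 is almost a copy of x1

r = np.corrcoef(x1, x2)[0, 1]            # correlation between the regressors
vif = 1 / (1 - r ** 2)                   # variance inflation factor (two-regressor case)
print(f"correlation={r:.3f}, VIF={vif:.1f}")
# A large VIF means the variance of each coefficient estimate is inflated
# by roughly that factor: standard errors rise and t-statistics fall, even
# though the overall fit (R²) of the regression is unchanged.
```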
26. Other Econometric Issues
- Omitted Variable Bias: You cannot impose ceteris paribus if relevant independent variables are not included in the model.
- Small Sample Bias: You cannot adequately assess a relationship with an inadequate sample. Remember, we are trying to learn about the underlying population.
27. More Econometric Issues
- Serial Correlation: Violation of the assumption that the observations of the error term are uncorrelated (diagnostic sketches for both problems on this slide follow the list).
  - Consequence: Standard errors are underestimated.
  - Primarily occurs in time-series data.
- Heteroskedasticity: Violation of the assumption that the variance of the error term is constant.
  - Consequence: Standard errors are underestimated.
  - Primarily occurs in cross-sectional data.
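A sketch of two common diagnostics for these problems, the Durbin-Watson statistic for serial correlation and the Breusch-Pagan test for heteroskedasticity. The library choice (statsmodels) and the simulated data are assumptions for illustration only.

```python
# Diagnostics sketch on simulated data whose error variance grows with |x|.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1 + np.abs(x))   # heteroskedastic errors

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

dw = durbin_watson(fit.resid)                 # values near 2 suggest no serial correlation
bp_stat, bp_pvalue, _, _ = het_breuschpagan(fit.resid, X)
print(f"Durbin-Watson={dw:.2f}, Breusch-Pagan p-value={bp_pvalue:.4f}")
# A small Breusch-Pagan p-value signals heteroskedasticity; in that case the
# usual OLS standard errors are unreliable (typically understated), just as
# the slide warns for serial correlation in time-series data.
```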