Title: Applying Statistics to Litigation Consulting
WARNING: When Using Statistical Analysis
- Pros
  - Provides a basis or sanity check
  - Widely accepted methodology
  - Can develop conclusions on large amounts of data
- Cons
  - Complex equations
  - Filled with hidden assumptions
  - Many areas for the opposing side to attack
Points of Discussion
- Simple Regression Analysis
  - When to use
  - Interpreting regression output
- Time Series Analysis
- Problems Using Regression Analysis
Why Use Regression?
- When we are using one variable to draw a conclusion about another variable
- Make predictions of one variable using another
- Test assumptions about the relation between variables
- Quantify the strength of the relationship between variables
Linear Regression Equation
$$Y_i = b_0 + b_1 X_i + e_i, \quad i = 1, \dots, n$$
Where,
- $Y_i$ = Dependent Variable
- $b_0$ = Intercept
- $b_1$ = Slope Coefficient
- $X_i$ = Independent Variable
- $e_i$ = Error term
Linear Regression Equation (cont.)
- Linear regression assumes a linear relationship between the dependent and independent variables
- Linear regression computes the line that best fits the observations: it chooses values for the intercept, b0, and slope, b1, that minimize the sum of the squared vertical distances between the observations and the regression line
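The least-squares fit described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the method used to produce the presentation's output; the function name `fit_line` and the data are hypothetical and chosen so the answer is easy to check by hand.

```python
def fit_line(x, y):
    """Return (b0, b1) that minimize the sum of squared vertical distances."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: forces the line through the point of means (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data (not from the presentation): the points lie exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_line(x, y)
print(b0, b1)
```

Because the hypothetical points lie exactly on a line, the fit recovers an intercept of 1 and a slope of 2; with real data the line will not pass through every observation.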
Sample Regression Output
Assumptions of the Linear Regression
1. A linear relation exists between the dependent variable and the independent variable
2. The independent variable is not random
3. The expected value of the error term is 0
4. The variance of the error term is the same for all observations
5. The error term is uncorrelated across observations
6. The error term is normally distributed
What did you just say those assumptions were?
- Assumption 1
  - If the independent and dependent variables DO NOT have a linear relation, then estimating that relation with a linear regression model will be INVALID
- Assumptions 2 and 3
  - Needed to ensure that the linear regression produces the correct estimates of b0 and b1
- Assumption 4
  - Also known as the HOMOSKEDASTICITY assumption, i.e., the errors have equal variances
- Assumption 5
  - Necessary for correctly estimating the variances of the estimated parameters
- Assumption 6
  - Allows us to easily test a particular hypothesis about a linear regression model
Coefficients
- Coefficients correspond to the b's in a standard linear regression model
  - $Y_i = b_0 + b_1 X_i + e_i$
- In our example, b1 = 422.09
  - This means that for every time period that passes (in this case, quarters), predicted revenues increase by 422.09
- The standard error measures the precision of each coefficient as an estimate of the model parameter
Sample Regression Output
y = 13950.75 + 422.09x
R-Squared (R²) or Coefficient of Determination
- Goodness of fit for the regression model
- Measures the proportion (or percentage) of the total variation in the dependent variable explained by the model
R-Squared (R²) or Coefficient of Determination
$$R^2 = \frac{RSS}{TSS}$$
Where,
- RSS = Regression Sum of Squares
- TSS = Total Sum of Squares
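As a rough sketch of how R² is computed, the example below fits a line to hypothetical data and forms the ratio of the Regression Sum of Squares (explained variation) to the Total Sum of Squares. It also computes one minus the unexplained share, which gives the same number for a least-squares fit with an intercept. The function names and the data are illustrative assumptions, not values from the presentation.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one independent variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


def r_squared(x, y):
    """Return R^2 computed two ways: RSS / TSS and 1 - (residual SS) / TSS."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)                       # total variation
    rss = sum((fi - y_bar) ** 2 for fi in fitted)                  # explained variation
    resid_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # unexplained variation
    return rss / tss, 1 - resid_ss / tss


# Hypothetical data (not from the presentation).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, r2_alt = r_squared(x, y)
print(round(r2, 3), round(r2_alt, 3))
```

For this hypothetical data set the explained variation is 3.6 out of a total of 6, so both calculations give an R² of about 0.6.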
Caveats for Using R²
- A high R² does not imply causality
- Just because your R² is high does not mean your regression is reliable
- What does it mean? It means you must look at other factors along with your R², such as the standard error of the estimate, the t-statistics, and whether the regression assumptions hold
Standard Error of Estimate
- The Standard Error of Estimate (SEE), or Standard Error of the Regression, measures the standard deviation of the error term
- The SEE is computed from the differences between the actual and predicted values of the dependent variable for each observation
Standard Error Equation
$$SEE = \sqrt{\frac{SSR}{n - 2}}$$
Where,
- n = number of observations
- SSR = Sum of Squared Residuals
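A minimal sketch of the SEE calculation follows, assuming a simple regression (so the divisor is n - 2). The function name and the actual and predicted values are hypothetical; the predicted values are taken as given rather than produced by a fit shown here.

```python
import math


def standard_error_of_estimate(actual, predicted):
    """Square root of the sum of squared residuals divided by (n - 2)."""
    n = len(actual)
    ssr = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(ssr / (n - 2))


# Hypothetical actual values and fitted values (not from the presentation).
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
see = standard_error_of_estimate(actual, predicted)
print(round(see, 4))
```

Note that SSR here is the sum of squared residuals (unexplained variation), which is a different quantity from the Regression Sum of Squares used for R².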
Testing t-Statistic
- A t-test is used to test the significance of individual estimated coefficients
  - i.e., it tests whether a single regression coefficient is significantly different from zero
Testing t-Statistic (cont.)
- The critical t-value is obtained by using a t-distribution table and applying a significance level and the correct degrees of freedom (df = n - 2 in simple regression)
- The t-stat in our example is 8.81, which is greater than the critical value of 2.145, so the coefficient is significantly different from zero
[Figure: t-distribution centered at 0, with α = 1 - 0.95 = 0.05 and a critical region of α/2 = 0.025 in each tail, beyond the critical t-values of -2.145 and 2.145]
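The t-test on a slope coefficient can be sketched as follows. This is an illustration under stated assumptions: the data are hypothetical, the helper name `slope_t_stat` is invented for the example, the standard error of the slope is taken as the SEE divided by the square root of the variation in x, and the critical value 3.182 is the two-tailed 5% table value for 3 degrees of freedom (the presentation's 2.145 applies to its own example, not to this data).

```python
import math


def slope_t_stat(x, y):
    """Return (t-statistic for the slope, degrees of freedom) in simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals and standard error of estimate (df = n - 2).
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(ssr / (n - 2))
    # Standard error of the slope coefficient.
    s_b1 = see / math.sqrt(sxx)
    # Test of H0: b1 = 0, so the t-statistic is the coefficient over its standard error.
    return b1 / s_b1, n - 2


# Hypothetical data (not from the presentation).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_stat, df = slope_t_stat(x, y)

T_CRITICAL = 3.182  # assumed table value: two-tailed, 5% significance, df = 3
significant = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), df, significant)
```

In this hypothetical case the t-statistic is about 2.12, below the critical value, so the slope would not be judged significantly different from zero. The presentation's example is the opposite situation: 8.81 is well above 2.145.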
Confidence Intervals
- A confidence interval is an interval that we believe includes the true parameter value, b1, with a given degree of confidence
- To compute a confidence interval, we must
  - Select the significance level for the test
  - Know the standard error of the estimated coefficient
Confidence Interval Equation
$$CI = b_1 \pm t_c \times S_{b_1}$$
Where,
- CI = Confidence Interval
- $b_1$ = Coefficient of the Independent Variable
- $t_c$ = Critical t-value
- $S_{b_1}$ = Standard Error of the Coefficient
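As a worked illustration, the interval can be computed from the figures quoted in the slides (b1 = 422.09, t-stat = 8.81, critical t = 2.145). The standard error of the coefficient is not stated in the text, so the sketch below backs it out from the t-statistic, assuming the reported t-stat tests the hypothesis that b1 equals zero. Because 8.81 is a rounded figure, the result is approximate.

```python
b1 = 422.09      # slope coefficient quoted in the slides
t_stat = 8.81    # t-statistic quoted in the slides
t_crit = 2.145   # critical t-value quoted in the slides (5% significance, two-tailed)

# Assumption: t_stat = b1 / s_b1, so the standard error can be recovered by division.
s_b1 = b1 / t_stat

lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
print(round(s_b1, 2), round(lower, 2), round(upper, 2))
```

Under these assumptions the 95% interval runs from roughly 319 to roughly 525. It excludes zero, which agrees with the t-test conclusion that the coefficient is significant.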
Sample Regression Output
Look up the critical t-value in a Student's t table
Types of Data Used
- Cross-sectional Data
  - Observations of the independent and dependent variables for the same time period
- Time-series Data
  - Observations of the independent and dependent variables over time
Time-Series Analysis
- A time series is a set of values of a particular variable in different time periods (in this case, time is the independent variable)
- Can be used in forecasting by estimating a linear trend in a time series and using that trend to predict future values
Linear Trend
- The simplest type of trend is a linear trend, which is expressed in the following equation:
$$Y_t = b_0 + b_1 t + e_t, \quad t = 1, \dots, T$$
- We're regressing the desired variable (dependent) on time (independent)
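Forecasting with a linear trend amounts to plugging a future time index into the fitted line. The sketch below uses the fitted equation from the sample output (y = 13950.75 + 422.09x, with x counted in quarters). The quarter numbers 17 through 20 are an assumption made for illustration; the slides do not state how many quarters were observed or how they were indexed.

```python
B0 = 13950.75  # intercept from the sample regression output
B1 = 422.09    # slope from the sample regression output (change per quarter)


def trend_forecast(t):
    """Predicted value of the dependent variable at time index t."""
    return B0 + B1 * t


# Hypothetical future quarters (the indexing is an assumption).
for quarter in range(17, 21):
    print(quarter, round(trend_forecast(quarter), 2))
```

Each additional quarter raises the forecast by the slope, 422.09, which is the interpretation of b1 given earlier. The forecast is only as good as the assumption that the historical trend continues.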
Seasonality
- Seasonality is a regular pattern of movements within a given time period (for example, within each year)
- If significant seasonality exists, we can correct for it by adding a seasonal lag
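One way to check whether seasonality is significant is to look at the autocorrelation of the regression residuals at the seasonal lag (lag 4 for quarterly data). The sketch below is illustrative only: the residuals are hypothetical, the function name is invented, and the test statistic relies on the common approximation that the standard error of an autocorrelation is 1 divided by the square root of the number of observations.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Hypothetical quarterly residuals with a repeating four-quarter pattern.
residuals = [5, -2, -4, 1] * 4

r4 = autocorrelation(residuals, 4)
# Assumed approximation: standard error of the autocorrelation is 1 / sqrt(T).
t_stat = r4 * math.sqrt(len(residuals))
print(r4, t_stat)
```

Here the lag-4 autocorrelation is 0.75 and the approximate t-statistic is 3.0, which would commonly be read as significant seasonality. In that case a seasonal lag (the value of the series four quarters earlier) would be a candidate additional regressor.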
Potential Problems
- Heteroskedasticity
  - Variance of the errors differs across observations
  - Can make relationships appear to exist when they really do not
  - Causes incorrect standard errors
Sample Regression Output
[Figure panels: Linear Pattern, Linear Pattern, No Pattern, Exponential Pattern, Parabolic Pattern]
Potential Problems (cont.)
- Serial Correlation (a.k.a. Autocorrelation)
  - Regression errors are correlated across observations
  - Causes incorrect standard errors
  - Parameter estimates remain accurate as long as none of the independent variables is a lagged value of the dependent variable
Detecting Problems
- Heteroskedasticity
  - Breusch-Pagan Test
- Serial Correlation
  - Durbin-Watson Test Statistic
- EXCEL DOES NOT PERFORM EITHER TEST!
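Since a spreadsheet may not offer these tests, both statistics can be computed directly from the regression residuals. The sketch below is a simplified illustration for a regression with one independent variable. It uses the n times R² form of the Breusch-Pagan statistic (from regressing the squared residuals on x), which is one common variant, and compares it with 3.841, the 5% chi-square critical value for one degree of freedom. The residuals and function names are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order serial correlation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def breusch_pagan_lm(x, residuals):
    """n * R^2 from regressing squared residuals on x (one independent variable)."""
    n = len(x)
    e2 = [e ** 2 for e in residuals]
    x_bar = sum(x) / n
    e2_bar = sum(e2) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (ei - e2_bar) for xi, ei in zip(x, e2))
    tss = sum((ei - e2_bar) ** 2 for ei in e2)
    # In simple regression, R^2 equals the squared correlation between x and e^2.
    r2 = sxy ** 2 / (sxx * tss)
    return n * r2


# Hypothetical residuals whose spread grows with x and whose sign alternates.
x = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 3.0, -3.0]

dw = durbin_watson(residuals)
lm = breusch_pagan_lm(x, residuals)
CHI2_CRITICAL = 3.841  # 5% critical value, chi-square with 1 degree of freedom
print(round(dw, 3), round(lm, 3), lm > CHI2_CRITICAL)
```

For these hypothetical residuals the Durbin-Watson statistic is well above 2 (suggesting negative serial correlation) and the Breusch-Pagan statistic exceeds the critical value (suggesting heteroskedasticity). Formal Durbin-Watson conclusions require the tabulated lower and upper bounds for the sample size, which are not reproduced here.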
Caveats Using Regression Analysis
- Your estimate is only relevant for the time period you are looking at, but it may be the best estimator you have for the future
- Excel works great if you want a quick and dirty regression, but it has many limitations; for example, Excel does not contain tools to test for heteroskedasticity and autocorrelation
Questions
Sources
- http://www.ats.ucla.edu/stat/
- http://www.surveysystem.com
- Basic Econometrics, Gujarati, Damodar N., McGraw-Hill, 1995.
- The Basic Practice of Statistics, Moore, David S., W.H. Freeman and Company, 1995.
- Quantitative Methods for Investment Analysis, DeFusco, McLeavey, Pinto, Runkle, Association for Investment Management and Research, 2001.