Title: Regression
1 Regression
- K300 Class 27
- April 21, 2009
2 Overview
- Regression versus correlation
- Sample park use problem
- Correlation analysis
- Idea of regression
- Regression analysis
- Coefficient of determination
- Prediction
- SPSS analysis
3 Regression versus correlation
- Correlation
- Is there a linear relationship between two continuous variables?
- What is the strength of that relationship?
- Regression
- What is the best estimate of the mathematical form of the linear relationship?
4 Simple park use problem
- You are a park planner for a city
- You hypothesize that there is a relationship between the use made of parks and the number of people living within 1.5 miles of the parks
- You collect data on the number of users per day and the populations for six parks
5 Dependent and independent variables
- In regression, we refer to y as the dependent variable and x as the independent variable
- This is based on the assumption that the value of y depends on the value of x or, put another way, that changes in x cause changes in y
6 Park use data
7 Park use data scattergram
8 Correlation analysis
- We begin by looking at the correlation
- Gives us a measure of the strength of the relationship
- Test for significance indicates whether the relationship could or could not have occurred by chance
- No value in doing regression if there is no relationship, or if it could have occurred by chance
9 Calculating correlation coefficient
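The formula on this slide is not reproduced in the text. The following is a minimal sketch of one standard computational form of the correlation coefficient, built from the sums of x, y, xy, x² and y². The function name `correlation` and the six data pairs are assumptions made up for illustration; they are not the park data from the slides.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r computed from raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# HYPOTHETICAL data for six parks (not the values from the slides)
population = [10, 14, 18, 22, 27, 33]        # thousands within 1.5 miles
users = [150, 120, 260, 210, 340, 300]       # users per day

r = correlation(population, users)
print(f"r = {r:.3f}")
```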
10 Correlation between users and persons
11 Hypothesis test
- Step 1: state hypothesis
- H0: ρ = 0, H1: ρ ≠ 0
- Step 2: critical value
- α = 0.05, d.f. = n − 2 = 4, CV r = 0.811
- Step 3: test statistic
- r = 0.835
- Step 4: make decision
- Reject null hypothesis
- Step 5: summarize the results
- Relationship between users and population is not likely to have occurred by chance
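The decision in the five steps above can be sketched in a few lines. The values of n, r and the critical value are taken from the slide; the variable names are assumptions. The t statistic at the end is a standard equivalent form of the same test and is not shown on the slide.

```python
import math

# Values taken from the slide
n = 6                # number of parks
r = 0.835            # sample correlation coefficient
critical_r = 0.811   # critical value for alpha = 0.05, d.f. = 4

df = n - 2

# Two-tailed test: reject H0 when |r| exceeds the critical value
reject_null = abs(r) > critical_r

# Equivalent t statistic (not on the slide): t = r * sqrt((n - 2) / (1 - r^2))
t_stat = r * math.sqrt(df / (1 - r ** 2))

print(f"d.f. = {df}, reject H0: {reject_null}, t = {t_stat:.2f}")
```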
12 Idea of regression
- Object is to find the mathematical formula for the straight line representing the relationship
- Formula for the regression equation (a straight line): y = a + bx
- Need to estimate values for a and b
- a is the y intercept: the y value where the line crosses the y axis
- b is the slope: change in y divided by change in x
13 Regression line through data points
14 Determining the regression line
- Want to find the line that is closest to the data points
- Minimize the sum of squared errors
- Take the vertical distance from each point to the line (these are the errors)
- Square the errors and sum them
- Find the line that minimizes this sum
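As a rough illustration of the criterion described above, the sketch below scores a few candidate lines by their sum of squared errors and keeps the one with the smallest sum. The data and the candidate intercepts and slopes are hypothetical, and comparing a handful of candidates only illustrates the idea; it is not how the least-squares line is actually calculated.

```python
def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances from each point to the line y = a + b*x."""
    errors = [y - (a + b * x) for x, y in zip(xs, ys)]
    return sum(e ** 2 for e in errors)

# HYPOTHETICAL data for six parks (not the values from the slides)
population = [10, 14, 18, 22, 27, 33]        # thousands within 1.5 miles
users = [150, 120, 260, 210, 340, 300]       # users per day

# HYPOTHETICAL candidate lines as (intercept a, slope b)
candidates = [(50, 8.0), (100, 6.0), (0, 12.0)]

scored = [(sum_squared_errors(a, b, population, users), a, b)
          for a, b in candidates]
best_sse, best_a, best_b = min(scored)

for sse, a, b in scored:
    print(f"a = {a}, b = {b}: sum of squared errors = {sse:.1f}")
print(f"closest candidate: a = {best_a}, b = {best_b}")
```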
15 Errors in regression
16 Regression analysis
- Formulas for a (intercept) and b (slope) make use of the same sums used in calculating the correlation coefficient
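The slides that carry the formulas are not reproduced in the text. One standard way to write them is sketched below, with the computational formula for r alongside to show the shared sums. The notation is an assumption and may differ from the slides.

```latex
\begin{align*}
a &= \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \\[4pt]
b &= \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \\[4pt]
r &= \frac{n(\sum xy) - (\sum x)(\sum y)}
          {\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}}
\end{align*}
```

The numerator of b is the same as the numerator of r, and the denominator of a and b reappears under the square root in r.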
17 Calculating a, intercept
18 Calculating b, slope
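The worked calculations on these two slides are not reproduced in the text. A minimal sketch of computing the intercept and slope from the raw sums, assuming the standard least-squares formulas, might look like this. The function name `regression_coefficients` and the data are hypothetical.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x, using raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2   # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

# HYPOTHETICAL data for six parks (not the values from the slides)
population = [10, 14, 18, 22, 27, 33]        # thousands within 1.5 miles
users = [150, 120, 260, 210, 340, 300]       # users per day

a, b = regression_coefficients(population, users)
print(f"estimated line: y = {a:.2f} + {b:.2f}x")
```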
19 Estimated regression equation
20 Variation and goodness of fit
- Total variation is the variation of the values of y around its mean
- Explained variation is the variation of the predicted values of y from the mean
- Unexplained variation is the error, the difference between actual and predicted values of y
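The three kinds of variation can be sketched as sums of squares. The example below fits the least-squares line and then checks that the total variation equals the explained plus the unexplained variation. The function name `variation_components` and the data are hypothetical.

```python
def variation_components(xs, ys):
    """Return (total, explained, unexplained) variation for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    predicted = [a + b * x for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)
    explained = sum((p - mean_y) ** 2 for p in predicted)
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return total, explained, unexplained

# HYPOTHETICAL data for six parks (not the values from the slides)
population = [10, 14, 18, 22, 27, 33]        # thousands within 1.5 miles
users = [150, 120, 260, 210, 340, 300]       # users per day

total, explained, unexplained = variation_components(population, users)
print(f"total = {total:.1f}")
print(f"explained + unexplained = {explained + unexplained:.1f}")
```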
21 Variation
22 Calculating variation
23 Coefficient of determination
- Ratio of explained variation to total variation
- Proportion of the total variation explained (accounted for) by the regression
- Coefficient of determination is the correlation coefficient, r, squared
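Written as a formula, the ratio described above looks like this. The symbols are assumed notation, not taken from the slides: \hat{y} stands for a predicted value of y and \bar{y} for the mean of y.

```latex
r^2 = \frac{\text{explained variation}}{\text{total variation}}
    = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}
```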
24 Coefficient of determination
- Coefficient of determination provides an alternative way of describing the strength of the relationship associated with the estimated regression line
- Coefficient of determination r² = 0.697 says that 69.7% of the variation in the dependent variable y is accounted for by the independent variable x
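The arithmetic behind these figures can be checked directly from the correlation coefficient r = 0.835 reported earlier. The unexplained share is not stated on the slide; it is simply what remains of the total.

```python
r = 0.835  # correlation coefficient from the slides

r_squared = r ** 2
explained_pct = round(100 * r_squared, 1)          # share of variation explained
unexplained_pct = round(100 * (1 - r_squared), 1)  # share left unexplained

print(f"r^2 = {r_squared:.3f}")
print(f"{explained_pct}% explained, {unexplained_pct}% unexplained")
```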
25 Interpreting the correlation and regression analysis
- Significance of the correlation coefficient says there is a linear relationship between the two variables in the population that is not likely to have occurred in the sample by chance
- If the correlation were not significant, it would not make sense to proceed with the regression analysis
26 What the regression tells you about the relationship
- Have the estimated regression equation
- Value of the slope, b = 8.41, tells us that for each additional thousand population within 1.5 miles we would expect an additional 8.41 park users per day
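The slope interpretation can be written as a short worked equation. Because the intercept cancels when two predictions are subtracted, the expected change in users depends only on b and the change in population. The 5-thousand case is an added illustration, not a figure from the slides.

```latex
\Delta y = b \, \Delta x
\qquad
\Delta x = 1 \;\Rightarrow\; \Delta y = 8.41
\qquad
\Delta x = 5 \;\Rightarrow\; \Delta y = 5 \times 8.41 = 42.05
```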
27 Using the estimated regression for prediction
- The final application of regression is using the estimated regression equation for prediction of y for other values of x
- Suppose you are planning to build a new park
- 30 thousand people live within 1.5 miles of the new park site
- How many persons are likely to use the new park each day?
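A minimal sketch of the prediction step follows. The slope b = 8.41 and the population of 30 thousand come from the slides; the intercept a is not reproduced in this text, so the sketch leaves it as a parameter and shows only the part of the prediction contributed by the slope. The function name `predict_users` is hypothetical.

```python
SLOPE_B = 8.41  # slope from the slides (users per thousand residents)

def predict_users(population_thousands, intercept_a, slope_b=SLOPE_B):
    """Predicted users per day from the line y = a + b*x."""
    return intercept_a + slope_b * population_thousands

new_site_population = 30  # thousand people within 1.5 miles, from the slides

# The slope's share of the prediction; the intercept a must still be added
slope_contribution = SLOPE_B * new_site_population
print(f"predicted users per day = a + {slope_contribution:.1f}")
```

Once the intercept from the estimated regression equation is known, `predict_users(30, a)` gives the full prediction.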
28 Prediction
29 SPSS analysis
- Analysis of rent versus family income
- Interactive scatterplot
- Graphs, Chart Builder, Scatter/Dot
- Add regression line in Chart Editor
- Regression analysis
- Analyze, Regression, Linear
30 Scatterplot with regression line
31 Regression output: goodness of fit
- Correlation
- Coefficient of determination
32 Regression output: significance of model
- Significance: test of the null hypothesis ρ = 0, or population R² = 0
33 Regression output: regression coefficients
- b: slope
- a: y intercept