Provided by: johnrott

Title: Regression


1
Regression
  • K300 Class 27
  • April 21, 2009

2
Overview
  • Regression versus correlation
  • Sample park use problem
  • Correlation analysis
  • Idea of regression
  • Regression analysis
  • Coefficient of determination
  • Prediction
  • SPSS analysis

3
Regression versus correlation
  • Correlation
  • Is there a linear relationship between two
    continuous variables?
  • What is the strength of that relationship?
  • Regression
  • What is the best estimate of the mathematical
    form of the linear relationship?

4
Sample park use problem
  • You are a park planner for a city
  • You hypothesize that there is a relationship
    between the use made of parks and the number of
    people living within 1.5 miles of the parks
  • You collect data on number of users per day and
    populations for six parks

5
Dependent and independent variables
  • In regression, we refer to y as the dependent
    variable
  • and x as the independent variable
  • Based on the assumption that the value of y
    depends on the value of x
  • or, put another way, that changes in x cause
    changes in y

6
Park use data
7
Park use data scattergram
8
Correlation analysis
  • We begin by looking at the correlation
  • Gives us a measure of the strength of the
    relationship
  • Test for significance indicates whether the
    relationship could have occurred by chance
  • No value in doing regression if there is no
    relationship, that is, if the correlation could
    have occurred by chance

9
Calculating correlation coefficient
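The formula on this slide did not survive in the transcript. As a minimal sketch, assuming the usual computational formula for Pearson's r based on raw sums, the calculation could look like the following. The function name `pearson_r` and the six data pairs are hypothetical; they are not the park data from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical illustration data (NOT the park data from the slides)
population = [1, 2, 3, 4, 5, 6]   # thousands of people near each park
users = [2, 4, 5, 4, 5, 7]        # users per day
r = pearson_r(population, users)
print(round(r, 3))
```

The same sums (n, sum of x, sum of y, sum of xy, sum of x squared) are reused later for the slope and intercept.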
10
Correlation between users and persons
11
Hypothesis test
  • Step 1: state hypotheses
  • H0: ρ = 0, H1: ρ ≠ 0
  • Step 2: critical value
  • α = 0.05, d.f. = n - 2 = 4, CV: r = 0.811
  • Step 3: test statistic
  • r = 0.835
  • Step 4: make decision
  • Reject null hypothesis
  • Step 5: summarize the results
  • Relationship between users and population is
    unlikely to have occurred by chance
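The decision in Step 4 can be sketched in a few lines. The values of r, n, and the critical value come from the slide. The t statistic is an addition not shown in the slides: it is the standard equivalent form of the same test, included here only as a cross-check.

```python
from math import sqrt

r = 0.835           # sample correlation (from the slide)
n = 6               # six parks
critical_r = 0.811  # critical value for alpha = 0.05, d.f. = 4 (from the slide)

# Reject H0 when the sample correlation exceeds the critical value in magnitude
reject_null = abs(r) > critical_r

# Equivalent t statistic with n - 2 degrees of freedom (not on the slide)
t = r * sqrt((n - 2) / (1 - r ** 2))
print(reject_null, round(t, 2))
```

The t value of about 3.03 exceeds the two-tailed critical t of 2.776 for 4 degrees of freedom, which agrees with the decision based on r.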

12
Idea of regression
  • Object is to find the mathematical formula for
    the straight line representing the relationship
  • Formula for the regression equation (straight
    line): y = a + bx
  • Need to estimate values for a and b
  • a is the y intercept: the y value where the line
    crosses the y axis
  • b is the slope: change in y divided by change in x

13
Regression line through data points
14
Determining the regression line
  • Want to find line that is closest to the data
    points
  • Minimize the sum of squared errors
  • Take the vertical distance from each point to the
    line (these are the errors)
  • Square the errors and sum them
  • Find the line that minimizes this sum
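To make the least-squares idea concrete, here is a small sketch that computes the sum of squared errors for several candidate lines. The data and the candidate lines are hypothetical; the values a = 1.8 and b = 27/35 are the least-squares solution for this made-up data, not for the park data.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical distances from each point to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical illustration data (NOT the park data from the slides)
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 5, 7]

best_a, best_b = 1.8, 27 / 35  # least-squares line for this data
candidates = [(best_a, best_b), (1.0, 1.0), (2.5, 0.6), (1.8, 0.9), (0.0, 1.2)]

for a, b in candidates:
    print(f"a={a:.2f}  b={b:.2f}  SSE={sse(a, b, xs, ys):.3f}")
```

Every other candidate line gives a larger sum than the least-squares line, which is what "closest to the data points" means here.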

15
Errors in regression
16
Regression analysis
  • Formulas for a (intercept) and b (slope) make use
    of the same sums used in calculating the
    correlation coefficient

17
Calculating a, intercept
18
Calculating b, slope
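The formulas on slides 17 and 18 are not preserved in the transcript. The sketch below assumes the standard computational formulas for the intercept and slope, built from the same sums used for the correlation coefficient. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x, using raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2               # shared denominator
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b

# Hypothetical illustration data (NOT the park data from the slides)
a, b = fit_line([1, 2, 3, 4, 5, 6], [2, 4, 5, 4, 5, 7])
print(round(a, 3), round(b, 3))
```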
19
Estimated regression equation
20
Variation and goodness of fit
  • Total variation is the variation of the values of
    y around its mean
  • Explained variation is the variation of the
    predicted values of y from the mean
  • Unexplained variation is the error, difference in
    actual and predicted values of y

21
Variation
22
Calculating variation
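As a sketch of how the three kinds of variation can be calculated, the code below uses hypothetical data and its least-squares line (a = 1.8, b = 27/35, assumed values for this made-up data only). For a least-squares line, total variation equals explained plus unexplained variation.

```python
def variation(xs, ys, a, b):
    """Return (total, explained, unexplained) variation for the line y = a + b*x."""
    mean_y = sum(ys) / len(ys)
    predicted = [a + b * x for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)               # actual y around the mean
    explained = sum((p - mean_y) ** 2 for p in predicted)    # predicted y around the mean
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # errors
    return total, explained, unexplained

# Hypothetical illustration data (NOT the park data from the slides)
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 5, 7]
total, explained, unexplained = variation(xs, ys, 1.8, 27 / 35)
print(round(total, 3), round(explained, 3), round(unexplained, 3))
```

The ratio explained / total is the coefficient of determination for this data.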
23
Coefficient of determination
  • Ratio of explained variation to total
    variation
  • Proportion of the total variation explained
    (accounted for) by the regression
  • Coefficient of determination is correlation
    coefficient r, squared

24
Coefficient of determination
  • Coefficient of determination provides an
    alternative way of describing the strength of the
    relationship associated with the estimated
    regression line
  • Coefficient of determination r² = 0.697 says that
    69.7% of the variation in the dependent variable
    y is accounted for by the independent variable x
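As a quick check, the value 0.697 follows from squaring the correlation coefficient r = 0.835 reported in the hypothesis test:

```latex
r^2 = \frac{\text{explained variation}}{\text{total variation}}
    = (0.835)^2 = 0.697225 \approx 0.697
```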

25
Interpreting the correlation and regression analysis
  • Significance of correlation coefficient says
    there is a linear relationship between the two
    variables in the population that is not likely to
    have occurred in the sample by chance
  • If correlation were not significant, it would not
    make sense to proceed with the regression analysis

26
What the regression tells you about the relationship
  • Have the estimated regression equation
  • Value of the slope b = 8.41 tells us that for
    each additional thousand population within 1.5
    miles we would expect an additional 8.41 park
    users

27
Using the estimated regression for prediction
  • The final application of regression is using the
    estimated regression equation for prediction of y
    for other values of x
  • Suppose you are planning to build a new park
  • 30 thousand people live within 1.5 miles of the
    new park site
  • How many persons are likely to use the new park
    each day?

28
Prediction
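The prediction step can be sketched as below. The slope b = 8.41 comes from the slides, but the intercept a is not preserved in the transcript, so it is left as a required argument rather than guessed. The function name `predict_users` is hypothetical.

```python
def predict_users(a, b, population_thousands):
    """Predicted daily users from the line y = a + b*x, with x in thousands of people."""
    return a + b * population_thousands

b = 8.41  # slope from the slides

# Contribution of the slope term alone for 30 thousand people.
# The intercept is passed as 0.0 only to isolate this term; the real
# intercept from the estimated regression equation must be added.
slope_contribution = predict_users(0.0, b, 30)
print(round(slope_contribution, 1))
```

The slope term contributes about 252.3 users per day; the full prediction is this amount plus the intercept a.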
29
SPSS analysis
  • Analysis of rent versus family income
  • Interactive scatterplot
  • Graphs, Chart Builder, Scatter/Dot
  • Add regression line in Chart Editor
  • Regression analysis
  • Analyze, Regression, Linear

30
Scatterplot with regression line
31
Regression output: goodness of fit
  • Correlation
  • Coefficient of determination
32
Regression output: significance of model
  • Significance test of null hypothesis: ρ = 0, or
    population R² = 0
33
Regression output: regression coefficients
  • b: slope
  • a: y intercept