Simple Linear Regression (Review, 18.5, 18.8)

Provided by: dsma3

1
Lecture 19
  • Simple linear regression (Review, 18.5, 18.8)
  • Midterm 2 Wed, April 2, 6-8pm
  • Extra office hours Tue, 2pm-5pm
  • Regular office hours Mon 12-1pm, Tue 1-2pm
  • Exam Review

2
Review of Regression Analysis
  • Goal: estimate $E(Y \mid X)$, the regression function.
  • Uses:
  • $E(Y \mid X)$ is a good prediction of Y based on X.
  • $E(Y \mid X)$ describes the relationship between Y and X.
  • Simple linear regression model: $E(Y \mid X)$ is a
    straight line (the regression line).

3
Simple Linear Regression Model
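  • The data $(X_1, Y_1), \ldots, (X_n, Y_n)$ are assumed
    to be a realization of
    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where the errors
    $\varepsilon_1, \ldots, \varepsilon_n$ are independent $N(0, \sigma_\varepsilon)$.
  • $\beta_0$, $\beta_1$ and $\sigma_\varepsilon$ are the unknown parameters of the
    model. The objective of regression is to estimate
    them.
  • $\beta_1$, the slope, is the amount that Y changes on
    average for each one-unit increase in X.
  • $\sigma_\varepsilon$ is the standard deviation of the amount by which Y
    differs from $E(Y \mid X)$, i.e., the standard deviation of
    the errors.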
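
A minimal Python sketch of what "a realization of the model" means: it draws one simulated data set from $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ with independent normal errors. The function name `simulate_slr` and the parameter values are hypothetical, chosen only for illustration.

```python
import random


def simulate_slr(xs, beta0, beta1, sigma_eps, seed=None):
    """Return one realization Y_i = beta0 + beta1 * x_i + eps_i,
    with each eps_i drawn independently from N(0, sigma_eps)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma_eps) for x in xs]


# Hypothetical parameter values, for illustration only.
xs = [float(x) for x in range(1, 11)]
ys = simulate_slr(xs, beta0=2.0, beta1=0.5, sigma_eps=1.0, seed=1)
for x, y in zip(xs, ys):
    print(f"x = {x:4.1f}   y = {y:6.3f}")
```

Rerunning with a different `seed` gives a different realization of the same model: $\beta_0$, $\beta_1$ and $\sigma_\varepsilon$ stay fixed and only the errors change.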

4
Estimation of Regression Line
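  • We estimate the regression line $E(Y \mid X) = \beta_0 + \beta_1 X$ by
    the least squares line $\hat{Y} = b_0 + b_1 X$, the line
    that minimizes the sum of squared prediction
    errors $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$ for the data.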
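
For simple linear regression the minimizing line has a closed form: the least squares slope is $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and the intercept is $b_0 = \bar{Y} - b_1 \bar{X}$. The sketch below (Python, standard library only; the function names and the four data points are hypothetical) computes the fit and evaluates the sum of squared errors so you can check that nudging either coefficient does not lower it.

```python
def least_squares(xs, ys):
    """Return (b0, b1), the intercept and slope that minimize
    sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # the LS line passes through (x_bar, y_bar)
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared prediction errors for the line b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, not from the textbook example.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 7.0]
b0, b1 = least_squares(xs, ys)
print(b0, b1, sse(xs, ys, b0, b1))
```

For these four points the formulas give $b_0 = 0.5$ and $b_1 = 1.6$.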

5
Fitted Values and Residuals
  • The least squares line decomposes the data into
    two parts, $Y_i = \hat{Y}_i + e_i$, where
  • $\hat{Y}_i = b_0 + b_1 X_i$ are called the fitted or predicted
    values.
  • $e_i = Y_i - \hat{Y}_i$ are called the residuals.
  • The residuals are estimates of
    the errors $\varepsilon_i$.
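
A short sketch of the decomposition on hypothetical toy data: it fits the least squares line, computes the fitted values $\hat{Y}_i$ and residuals $e_i = Y_i - \hat{Y}_i$, and prints each $Y_i$ as their sum. The function name is illustrative.

```python
def fitted_and_residuals(xs, ys):
    """Fit the LS line and split each y_i into fitted value + residual."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]                 # y-hat_i
    residuals = [y - f for y, f in zip(ys, fitted)]    # e_i = y_i - y-hat_i
    return fitted, residuals


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 7.0]
fitted, residuals = fitted_and_residuals(xs, ys)
for y, f, e in zip(ys, fitted, residuals):
    print(f"y = {y:.1f} = fitted {f:.2f} + residual {e:+.2f}")
```

Two properties worth remembering: with an intercept in the model, the least squares residuals always sum to zero, and $\sum X_i e_i = 0$.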

6
Estimating $\sigma_\varepsilon$
  • The standard error of estimate (root mean
    squared error) $s_\varepsilon = \sqrt{\sum_{i=1}^{n} e_i^2 / (n-2)}$ is an estimate of $\sigma_\varepsilon$.
  • The standard error of estimate is basically
    the standard deviation of the residuals.
  • $s_\varepsilon$ measures how useful the simple linear
    regression model is for prediction.
  • If the simple regression model holds, then
    approximately
  • 68% of the data will lie within one $s_\varepsilon$ of the
    LS line.
  • 95% of the data will lie within two $s_\varepsilon$ of the
    LS line.
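
The 68%/95% statement can be checked by simulation. The sketch below assumes a hypothetical model ($\beta_0 = 3$, $\beta_1 = 1.5$, $\sigma_\varepsilon = 2$), computes $s_\varepsilon = \sqrt{\sum e_i^2/(n-2)}$ from 10,000 simulated points, and reports the fraction of residuals within one and two $s_\varepsilon$ of the LS line. With normal errors these fractions should come out close to 0.68 and 0.95.

```python
import math
import random


def std_error_of_estimate(xs, ys):
    """Return (s_eps, residuals) for the least squares fit of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Divide by n - 2: two coefficients (b0, b1) were estimated from the data.
    s_eps = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return s_eps, residuals


# Simulated data from a hypothetical model with sigma_eps = 2.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(10_000)]
ys = [3.0 + 1.5 * x + rng.gauss(0.0, 2.0) for x in xs]

s_eps, residuals = std_error_of_estimate(xs, ys)
within_1 = sum(abs(e) <= s_eps for e in residuals) / len(residuals)
within_2 = sum(abs(e) <= 2 * s_eps for e in residuals) / len(residuals)
print(f"s_eps = {s_eps:.3f}, within 1 s_eps: {within_1:.3f}, within 2 s_eps: {within_2:.3f}")
```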

7
Example 18.2 in JMP

8
SEs of Parameter Estimates
  • From the JMP output, we can read off the standard errors
    $SE(b_0)$ and $SE(b_1)$ of the intercept and slope estimates.
  • Imagine taking repeated samples of the
    prices of cars with the same odometer readings
    from the population.
  • For each sample, you could estimate the
    regression line by least squares. Each time, the
    least squares line would be a little different.
  • The standard errors estimate how much the least
    squares estimates of the slope and intercept
    would vary over these repeated samples.
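
The repeated-sampling idea can be simulated directly. The sketch holds the x values fixed, draws 2,000 samples from a hypothetical model, refits the slope each time, and compares the spread of those slopes with $\sigma_\varepsilon / \sqrt{\sum (x_i - \bar{x})^2}$, the standard deviation that theory gives for the least squares slope. The model parameters are invented for illustration, not taken from Example 18.2.

```python
import math
import random
import statistics


def ls_slope(xs, ys):
    """Least squares slope b1."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx


# Hypothetical "population" model; the x values are held fixed across samples.
xs = [float(x) for x in range(20)]
beta0, beta1, sigma_eps = 10.0, -0.5, 3.0
rng = random.Random(42)

slopes = []
for _ in range(2000):  # 2000 repeated samples
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma_eps) for x in xs]
    slopes.append(ls_slope(xs, ys))

x_bar = statistics.fmean(xs)
sd_formula = sigma_eps / math.sqrt(sum((x - x_bar) ** 2 for x in xs))
print(f"SD of b1 over repeated samples: {statistics.stdev(slopes):.4f}")
print(f"sigma_eps / sqrt(S_xx):          {sd_formula:.4f}")
```

With real data $\sigma_\varepsilon$ is unknown, so the reported standard error of the slope replaces it by $s_\varepsilon$: $SE(b_1) = s_\varepsilon / \sqrt{\sum (x_i - \bar{x})^2}$.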

9
Cause-and-effect Relationships
  • A test of whether the slope is zero is a test of
    whether there is a linear relationship between x
    and y in the observed data, i.e., whether a change in
    x is associated with a change in y.
  • This does not test whether a change in x causes a
    change in y. Such a relationship can only be
    established based on a carefully controlled
    experiment or extensive subject matter knowledge
    about the relationship.
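
The test of zero slope uses $t = b_1 / SE(b_1)$ with $n - 2$ degrees of freedom. A minimal sketch on hypothetical toy data; the function name is illustrative, and the critical value is hard-coded from a t table ($t_{.025,2} = 4.303$) to keep the example dependency-free. Rejecting $H_0$ here shows association in these data only.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (b1, se_b1, t) for testing H0: beta1 = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    s_eps = math.sqrt(sse / (n - 2))
    se_b1 = s_eps / math.sqrt(s_xx)
    return b1, se_b1, b1 / se_b1


# Hypothetical toy data (n = 4, so d.f. = n - 2 = 2).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 7.0]
b1, se_b1, t = slope_t_statistic(xs, ys)
t_crit = 4.303  # t_{.025,2} from a t table; look up the value for your own d.f.
print(f"b1 = {b1:.3f}, SE(b1) = {se_b1:.4f}, t = {t:.3f}, reject H0: {abs(t) > t_crit}")
```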

10
Example of Pitfall
  • A researcher measures the number of television
    sets per person X and the average life expectancy
    Y for the world's nations. The regression line
    has a positive slope: nations with many TV sets
    have higher life expectancies. Could we lengthen
    the lives of people in Rwanda by shipping them TV
    sets?

11
Using the Regression Equation
  • Before using the regression model, we need to
    assess how well it fits the data.
  • If we are satisfied with how well the model fits
    the data, we can use it to predict the values of
    y.
  • To make a prediction, we can use either
  • Point prediction, and
  • Interval prediction

12
Point Prediction
  • Example 18.7
  • Predict the selling price of a three-year-old
    Taurus with 40,000 miles on the odometer (Example
    18.2).
  • It is predicted that a car with 40,000 miles on the odometer would
    sell for $14,575.
  • How close is this prediction to the real price?

13
Interval Estimates
  • Two intervals can be used for differing purposes:
  • Prediction interval: predicts y for a given
    value of x.
  • Confidence interval: estimates the average y for
    a given x.
  • The prediction interval targets $y \mid x$ (a single y at x);
    the confidence interval targets $E(y \mid x)$ (the mean of y at x).

14
Interval Estimates,Example
  • Example 18.7 - continued
  • Provide an interval estimate for the bidding
    price on a Ford Taurus with 40,000 miles on the
    odometer.
  • Two types of predictions are required
  • A prediction for a specific car
  • An estimate for the average price per car

15
Prediction vs. Confidence Intervals
  • The prediction interval attempts to cover future
    observations at a given value x with probability
    0.95 (for example).
  • The confidence interval attempts to cover means
    of observations at a given value x with
    probability 0.95 (for example). The means should be
    thought of as arising in potential alternative
    studies whose data were collected the same way as
    in our study.

16
Interval Estimates,Example
  • Solution
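  • A prediction interval provides the price estimate
    for a single car. Let $x_g = 40{,}000$ miles. The 95% prediction interval is
    $\hat{y} \pm t_{.025,98}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$,
    where $t_{.025,98} \approx 1.984$.
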
17
Interval Estimates,Example
  • Solution continued
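  • A confidence interval provides the estimate of
    the mean price per car for a Ford Taurus with
    a 40,000-mile reading on the odometer.
  • The 95% confidence interval is
    $\hat{y} \pm t_{.025,98}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$.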
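
A sketch that computes the point prediction and both 95% intervals at a chosen $x_g$ using the textbook formulas: the prediction interval is $\hat{y} \pm t_{\alpha/2,n-2}\, s_\varepsilon\sqrt{1 + 1/n + (x_g - \bar{x})^2/((n-1)s_x^2)}$, and the confidence interval is the same without the leading 1. The data are hypothetical toy values, not the Example 18.2 data; the function name is illustrative, and the caller supplies the t critical value (for Example 18.2 that would be $t_{.025,98} \approx 1.984$).

```python
import math


def prediction_and_confidence_intervals(xs, ys, x_g, t_crit):
    """Return (y_hat, prediction interval, confidence interval) at x = x_g.

    t_crit is t_{alpha/2, n-2}, looked up from a t table by the caller.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)  # equals (n - 1) * s_x^2
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    s_eps = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))

    y_hat = b0 + b1 * x_g  # point prediction
    core = 1.0 / n + (x_g - x_bar) ** 2 / s_xx
    pi_half = t_crit * s_eps * math.sqrt(1.0 + core)  # for a single new y
    ci_half = t_crit * s_eps * math.sqrt(core)        # for the mean of y at x_g
    return (y_hat,
            (y_hat - pi_half, y_hat + pi_half),
            (y_hat - ci_half, y_hat + ci_half))


# Hypothetical toy data (n = 4, d.f. = 2, t_{.025,2} = 4.303).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 7.0]
y_hat, pi, ci = prediction_and_confidence_intervals(xs, ys, x_g=2.5, t_crit=4.303)
print(f"y_hat = {y_hat:.3f}")
print(f"95% prediction interval: ({pi[0]:.3f}, {pi[1]:.3f})")
print(f"95% confidence interval: ({ci[0]:.3f}, {ci[1]:.3f})")
```

For these four points, at $x_g = \bar{x}$ the prediction interval is $\sqrt{5}$ times as wide as the confidence interval; both widen as $x_g$ moves away from $\bar{x}$.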

18
Regression Diagnostics
  • The three conditions required for the validity of
    the regression analysis are:
  • The error variable is normally distributed.
  • The error variance is constant for all values of
    x.
  • The errors are independent of each other.

19
Outliers
  • An outlier is an observation that is unusually
    small or large.
  • Several possibilities need to be investigated
    when an outlier is observed
  • There was an error in recording the value.
  • The point does not belong in the sample.
  • The observation is valid.
  • Identify outliers from the scatter diagram.
  • It is customary to suspect an observation is an
    outlier if its standardized residual is greater than 2 in absolute value.
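
A sketch of the "standardized residual greater than 2" rule. It assumes the simple definition standardized residual $= e_i / s_\varepsilon$; some software can also report studentized residuals, which divide by $s_\varepsilon\sqrt{1 - h_i}$ instead ($h_i$ is the leverage). The function name and the data are hypothetical: the points lie exactly on $y = 1 + 2x$ except one that is shifted up.

```python
import math


def flag_outliers(xs, ys, cutoff=2.0):
    """Return indices whose standardized residual e_i / s_eps exceeds cutoff in absolute value."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s_eps = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return [i for i, e in enumerate(residuals) if abs(e / s_eps) > cutoff]


# Hypothetical data: y = 1 + 2x exactly, except the point at x = 5 is shifted up by 6.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0 + 2.0 * x for x in xs]
ys[4] += 6.0
print(flag_outliers(xs, ys))  # prints [4]
```

Once flagged, the point still has to be investigated: a recording error, a point that does not belong in the sample, or a valid observation.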

20
Leverage and Influential Points
  • An observation has high leverage if it is an
    outlier in the x direction.
  • An observation is influential if removing it
    would markedly change the least squares line.
  • Observations that have high leverage are
    influential if they do not fall very close to the
    least squares line for the other points.
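
In simple regression the leverage of observation i is $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, so it grows with distance from $\bar{x}$ in the x direction. The sketch computes leverages and measures influence crudely as the change in the LS slope when a point is deleted (one simple choice; Cook's distance is a more standard summary). Function names and data are hypothetical: the same high-leverage point at x = 20 is placed once on the line through the other points and once far below it.

```python
def leverages(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return [1.0 / n + (x - x_bar) ** 2 / s_xx for x in xs]


def ls_slope(xs, ys):
    """Least squares slope b1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))


def slope_change_without(xs, ys, i):
    """How much the LS slope moves when observation i is deleted."""
    xs_rest = xs[:i] + xs[i + 1:]
    ys_rest = ys[:i] + ys[i + 1:]
    return ls_slope(xs, ys) - ls_slope(xs_rest, ys_rest)


# Hypothetical data: the last point (x = 20) is far out in the x direction.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
ys_on_line = [1.0 + 2.0 * x for x in xs]   # x = 20 point lies on the others' line
ys_off_line = ys_on_line[:-1] + [10.0]     # x = 20 point far below that line

h = leverages(xs)
print("leverages:", [round(v, 3) for v in h])
print("slope change, on-line point: ", slope_change_without(xs, ys_on_line, 5))
print("slope change, off-line point:", slope_change_without(xs, ys_off_line, 5))
```

Only the off-line version moves the slope much: the x = 20 point has high leverage in both cases, but it is influential only when it does not fall close to the line for the other points. The leverages always sum to 2, the number of estimated coefficients.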

21
18.8 Coefficient of Correlation
  • The coefficient of correlation is used to measure
    the strength of association between two
    variables.
  • The coefficient values range between -1 and 1.
  • If r = -1 (perfect negative association) or r = +1
    (perfect positive association), every point falls on the
    regression line.
  • If r = 0, there is no linear pattern.
  • The coefficient can be used to test for linear
    relationship between two variables.
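
A sketch of the sample coefficient of correlation, $r = \sum(x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$, evaluated on a few hypothetical data sets: points on an increasing line, points on a decreasing line, scattered data, and a U shape. The function name is illustrative.

```python
import math


def correlation(xs, ys):
    """Sample coefficient of correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1.0, 2.0, 3.0, 4.0]
print(correlation(xs, [3.0 * x + 1.0 for x in xs]))    # points on an increasing line
print(correlation(xs, [-0.5 * x + 4.0 for x in xs]))   # points on a decreasing line
print(correlation(xs, [2.0, 4.0, 5.0, 7.0]))           # hypothetical scattered data
print(correlation(xs, [1.0, 0.0, 0.0, 1.0]))           # U shape
```

The U-shaped data give r = 0 even though y clearly depends on x: r = 0 rules out a linear pattern, not every relationship.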

22
Testing the coefficient of correlation
  • To test the coefficient of correlation for a linear
    relationship between X and Y:
  • X and Y must both be observational.
  • X and Y must be bivariate normally distributed.

23
Testing the coefficient of correlation
  • When no linear relationship exists between the two
    variables, the population coefficient of correlation is $\rho = 0$.
  • The hypotheses are
    $H_0: \rho = 0$ versus $H_1: \rho \neq 0$.
  • The test statistic is
    $t = r \sqrt{\frac{n-2}{1-r^2}}$.

The statistic is Student t distributed with d.f.
n - 2, provided the variables are bivariate
normally distributed.
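
A sketch of this test statistic on hypothetical toy data. It also computes the slope statistic $b_1 / SE(b_1)$: algebraically the two are identical, so testing $\rho = 0$ and testing $\beta_1 = 0$ give the same t value with $n - 2$ d.f. Function names are illustrative.

```python
import math


def correlation_t_statistic(xs, ys):
    """t = r * sqrt((n - 2) / (1 - r^2)) for testing H0: rho = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    r = s_xy / math.sqrt(s_xx * s_yy)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def slope_t_statistic(xs, ys):
    """t = b1 / SE(b1) for testing H0: beta1 = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    s_eps = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    return b1 / (s_eps / math.sqrt(s_xx))


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 7.0]
print(correlation_t_statistic(xs, ys), slope_t_statistic(xs, ys))
```

For these four points both statistics equal $8\sqrt{2} \approx 11.31$.
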
24
Transformations
  • Suppose that the residual plot indicates
    curvature in the regression function. What do we
    do?
  • One possibility: transform x or transform y.
  • Check handout 2

25
Transformation for display.jmp
  • Y = Sales, X = Display Feet
  • Y = Sales, X = Square Root of Display Feet / Log of
    Display Feet

26
Predictions with Transformations
  • Linear Fit
  • Sales = -46.28718 + 154.90188 × Square Root
    DisplayFeet
  • For 5 display feet, the average amount of sales
    is $-46.28718 + 154.90188\sqrt{5} \approx 300.08$.
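
The key step when predicting from a transformed model is to apply the same transformation to the new x before plugging it into the fitted equation. The sketch below does this with the square-root fit from the JMP output above; the constant and function names are hypothetical.

```python
import math

# Coefficients of the linear fit on the slide: Sales vs. square root of DisplayFeet.
B0 = -46.28718
B1 = 154.90188


def predicted_sales(display_feet):
    """Average Sales predicted by Sales = B0 + B1 * sqrt(DisplayFeet)."""
    # The model was fit on sqrt(x), so transform the new x the same way.
    return B0 + B1 * math.sqrt(display_feet)


print(round(predicted_sales(5.0), 2))  # prints 300.08
```

Plugging 5 in directly, without the square root, would use the equation on the wrong scale of x.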

27
Guidelines for exam
  • Preparation for exams
  • Work on lectures
  • The book (remember, you are required to have one:
    the red thing)
  • Work on assignments
  • Lastly, work on the practice exams (without
    looking at the solutions)
  • Comprehension questions will be similar to the
    homework.

28
Topics for Exam 2
  • 13.5 Inference for ratio of two variances
  • 13.6 Inference for difference between two
    proportions
  • Chapter 15
  • One-way ANOVA
  • Multiple Comparisons
  • Randomized Blocks
  • Two-way ANOVA (Interactions IMPORTANT)
  • Chapter 18
  • Simple Linear regression (Estimation and Testing)
  • Regression Diagnostics
  • Point and Interval Prediction
  • Assessing the model
  • Finance Application