Linear regression - PowerPoint PPT Presentation

Slides: 19
Provided by: Jehan90
Learn more at: https://www2.cs.uh.edu
Transcript and Presenter's Notes

Title: Linear regression


1
Linear regression
  • J.-F. Pâris
  • University of Houston

2
Introduction
  • Special case of regression analysis

3
Regression Analysis
  • Models the relationship between
  • Values of a dependent variable (also called a
    response variable)
  • Values of one or more independent variables
  • Main outcome is a function
  • $y = f(x_1, \dots, x_n)$

4
Linear regression
  • Studies linear dependencies
  • $y = ax + b$
  • And more
  • $y = ax^2 + bx + c$
  • Is linear in the parameters a, b and c
  • Uses the least-squares method
  • Assumes that departures from the ideal line are due to
    random noise
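
A short worked note on why the quadratic model still counts as linear regression. The substitution names $x_1$ and $x_2$ are introduced here for illustration only and are not part of the slides:

$$
y = a x^2 + b x + c
\quad\Longrightarrow\quad
y = a x_1 + b x_2 + c
\qquad \text{with } x_1 = x^2,\; x_2 = x
$$

Once the samples of $x$ are known, $x_1$ and $x_2$ are just numbers, so $y$ is a linear function of the unknowns a, b and c even though it is not linear in $x$.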

5
Basic Assumptions (I)
  • Sample is representative of the whole population
  • The error is assumed to be a random variable with
    a mean of zero conditional on the independent
    variables.
  • Independent variables are error-free and linearly
    independent.
  • Errors are uncorrelated

6
Basic Assumptions (II)
  • The variance of the error is constant across
    observations
  • For very small samples, the errors must be
    Gaussian
  • Does not apply to large samples (≥ 30)

7
General Formulation
  • n samples of the dependent variable
  • $y_1, y_2, \dots, y_n$
  • n samples of each of the p independent variables
  • $x_{11}, x_{12}, \dots, x_{1n}$
  • $x_{21}, x_{22}, \dots, x_{2n}$
  • $\dots$
  • $x_{p1}, x_{p2}, \dots, x_{pn}$

8
Objective
  • Finding
  • $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$
  • Minimizing the sum of squares of the deviations
  • $\sum_i (y_i - b_0 - b_1 x_{1i} - b_2 x_{2i} - \dots - b_p x_{pi})^2$
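
The quantity being minimized can be written as a small function. This is a minimal sketch, not code from the presentation: the function name `sum_of_squares`, the data layout (`xs[j][i]` holds sample i of independent variable j+1) and the sample values are all assumptions made for illustration.

```python
def sum_of_squares(b, xs, ys):
    """Sum of squared deviations for coefficients b = [b0, b1, ..., bp].

    xs[j][i] is sample i of independent variable j+1 (hypothetical layout),
    ys[i] is sample i of the dependent variable.
    """
    total = 0.0
    for i, y in enumerate(ys):
        # Value predicted by the linear model for sample i.
        predicted = b[0] + sum(b[j + 1] * xs[j][i] for j in range(len(xs)))
        total += (y - predicted) ** 2
    return total


# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2.
xs = [[0.0, 1.0, 2.0, 3.0],   # samples of x1
      [1.0, 0.0, 1.0, 0.0]]   # samples of x2
ys = [4.0, 3.0, 8.0, 7.0]

exact = sum_of_squares([1.0, 2.0, 3.0], xs, ys)  # the generating coefficients
off = sum_of_squares([0.0, 2.0, 3.0], xs, ys)    # intercept wrong by 1
```

With the generating coefficients the sum is zero; any other choice, such as the shifted intercept, gives a strictly larger value, which is what the minimization exploits.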

9
Why the sum of squares
  • It gives more weight to big deviations
  • Big deviations are less likely to result from random
    noise than small variations
  • Our objective is to estimate the function linking
    the dependent variable to the independent
    variable assuming that the experimental points
    represent random variations
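
A small numeric illustration of how squaring treats one large deviation differently from several small ones. The two lists of deviations are made-up values chosen so that their absolute sizes add up to the same total.

```python
# Two hypothetical sets of deviations with the same total absolute size.
spread_out = [1.0, 1.0, 1.0, 1.0]   # four small deviations
one_big = [4.0, 0.0, 0.0, 0.0]      # a single large deviation


def sum_abs(devs):
    """Sum of absolute deviations."""
    return sum(abs(d) for d in devs)


def sum_sq(devs):
    """Sum of squared deviations."""
    return sum(d * d for d in devs)


abs_spread, abs_big = sum_abs(spread_out), sum_abs(one_big)
sq_spread, sq_big = sum_sq(spread_out), sum_sq(one_big)
```

The absolute sums are equal, but the squared sum is four times larger for the single large deviation, so a least-squares fit works hardest to avoid large misses.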

10
Simplest case (I)
  • One independent variable
  • We must find
  • $Y = a + bX$
  • Minimizing the sum of squares of errors
    $\sum_i (y_i - a - b x_i)^2$

11
Simplest case (II)
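  • Differentiate the previous expression with respect to
    the parameters a and b and set each derivative to zero
  • $\sum_i -2 (y_i - a - b x_i) = 0$, or $na + b \sum_i x_i = \sum_i y_i$
  • $\sum_i -2 x_i (y_i - a - b x_i) = 0$, or $a \sum_i x_i + b \sum_i x_i^2 = \sum_i x_i y_i$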
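
The two equations above form a 2×2 linear system in a and b, which can be solved directly. The sketch below uses Cramer's rule; the function name `fit_line` and the sample data are assumptions made for illustration, not part of the slides.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for the line y = a + b*x.

    n*a      + (sum x)*b   = sum y
    (sum x)*a + (sum x^2)*b = sum x*y
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of the 2x2 system
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (sy * sxx - sx * sxy) / det  # Cramer's rule for the intercept
    b = (n * sxy - sx * sy) / det    # Cramer's rule for the slope
    return a, b


# Hypothetical noise-free samples of y = 1 + 2x.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

On noise-free data the fit recovers the generating intercept and slope; with noisy data it returns the line that minimizes the sum of squared errors.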

12
Simplest case (III)
  • We obtain
  • The second expression can be rewritten
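
As a sketch of the standard result, solving the two equations of the previous slide for a and b gives the closed form below. The sample means $\bar{x}$ and $\bar{y}$ are notation assumed here, and the form may differ from the one shown on the original slide.

$$
b = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2},
\qquad
a = \frac{\sum_i y_i - b \sum_i x_i}{n} = \bar{y} - b\,\bar{x}
$$

The last equality shows that the fitted line always passes through the point $(\bar{x}, \bar{y})$.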

13
More notations

14
Simplest case (IV)
  • Solution can be rewritten

15
Coefficient of correlation
  • r = 1 would indicate a perfect fit
  • r = 0 would indicate no linear dependency
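
The two cases can be checked numerically. This sketch assumes r is the usual Pearson correlation coefficient, since the slide's own formula is not reproduced in the transcript; the function name `correlation` and both data sets are made up for illustration.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Points lying exactly on y = 1 + 2x.
r_line = correlation([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])

# Points lying exactly on y = x^2, symmetric around x = 0.
r_curve = correlation([-2.0, -1.0, 0.0, 1.0, 2.0], [4.0, 1.0, 0.0, 1.0, 4.0])
```

The second data set is fully determined by x yet has a correlation of zero, which is why r = 0 only rules out a linear dependency, not every dependency.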

16
More complex case (I)
  • Use matrix formulation $Y = Xb + e$, where Y is a
    column vector and X is the matrix of sample values of
    the independent variables
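
Written out, the matrix formulation would look as follows. This is a sketch under the usual convention, assumed here, that X carries a leading column of ones so that $b_0$ acts as the intercept; the original slide may lay the matrix out differently.

$$
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},
\quad
X = \begin{pmatrix}
1 & x_{11} & x_{21} & \cdots & x_{p1} \\
1 & x_{12} & x_{22} & \cdots & x_{p2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{1n} & x_{2n} & \cdots & x_{pn}
\end{pmatrix},
\quad
b = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{pmatrix},
\quad
e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}
$$

Row i of $Y = Xb + e$ then reads $y_i = b_0 + b_1 x_{1i} + \dots + b_p x_{pi} + e_i$.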

17
More complex case (II)
  • Solution to the problem is
  • $b = (X^T X)^{-1} X^T Y$
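
A dependency-free sketch of this solution. Instead of forming the inverse explicitly, it solves the equivalent system $(X^T X)\,b = X^T Y$ by Gaussian elimination. The function name `fit_least_squares`, the row-per-sample input layout and the sample data are assumptions made for illustration; in practice a numerical library routine would normally be preferred.

```python
def fit_least_squares(rows, ys):
    """Return [b0, b1, ..., bp] minimizing the sum of squared deviations.

    rows[i] holds the p independent-variable values of sample i
    (hypothetical layout); a column of ones is added for the intercept b0.
    """
    X = [[1.0] + list(r) for r in rows]
    n, m = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y] of the normal equations.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(m)]
         + [sum(X[i][r] * ys[i] for i in range(n))]
         for r in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: variables are not "
                             "linearly independent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * m
    for r in range(m - 1, -1, -1):
        known = sum(A[r][c] * b[c] for c in range(r + 1, m))
        b[r] = (A[r][m] - known) / A[r][r]
    return b


# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2.
rows = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
ys = [4.0, 3.0, 8.0, 7.0]
coeffs = fit_least_squares(rows, ys)
```

The singularity check corresponds to the earlier assumption that the independent variables are linearly independent: if they are not, $X^T X$ has no inverse and the coefficients are not uniquely determined.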

18
Non-linear dependencies
  • Can use polynomial model
  • $Y = b_0 + b_1 X + b_2 X^2 + \dots + b_p X^p$
  • Or do a logarithmic transform
  • Replace $y = Ke^{at}$ by $\log y = \log K + at$
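
The logarithmic transform can be sketched as follows: take the log of each y, fit a straight line in t, then convert the intercept back to K. The function name `fit_exponential` and the sample data are assumptions made for illustration, and all y values must be positive for the log to exist.

```python
import math


def fit_exponential(ts, ys):
    """Fit y = K * exp(a*t) by fitting the line log y = log K + a*t."""
    logs = [math.log(y) for y in ys]   # requires every y > 0
    n = len(ts)
    mean_t = sum(ts) / n
    mean_log = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in ts)
    if stt == 0:
        raise ValueError("all t values are identical; a is undefined")
    # Slope and intercept of the least-squares line through (t, log y).
    a = sum((t - mean_t) * (l - mean_log) for t, l in zip(ts, logs)) / stt
    log_k = mean_log - a * mean_t
    return math.exp(log_k), a


# Hypothetical noise-free samples of y = 3 * exp(0.5 * t).
ts = [0.0, 1.0, 2.0, 3.0]
ys = [3.0 * math.exp(0.5 * t) for t in ts]
K, a = fit_exponential(ts, ys)
```

Note that this minimizes the squared errors of log y rather than of y itself, so with noisy data the result can differ from a direct non-linear fit.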