LINEAR REGRESSION - PowerPoint PPT Presentation

Provided by: notesU
Title: LINEAR REGRESSION


1
LINEAR REGRESSION AND CORRELATION
  • CHAPTER 6
  • BCT 2053
  • Siti Zanariah Satari, FIST/FSKKP, 2009

2
CONTENT
  • 6.1 Simple Linear Regression Analysis and Correlation
  • 6.2 Relationship Test and Prediction in Simple Linear Regression Analysis
  • 6.3 Multiple Linear Regression Analysis and Correlation
  • 6.4 Model Selection

3
6.1 Simple Linear Regression Analysis and
Correlation
  • OBJECTIVE
  • Find a mathematical equation that relates an
    independent variable x and a dependent variable y.
  • Plot a scatter diagram and graph the regression
    line.
  • Measure the strength of the linear relationship
    between x and y with the correlation coefficient.

4
INTRODUCTORY CONCEPTS
  • Suppose you wish to investigate the relationship
    between a dependent variable (y) and independent
    variable (x)
  • Independent variable (x): the variable that is
    controlled
  • Dependent variable (y): the response variable
  • In other words, the value of y depends on the
    value of x.

5
Example A
Suppose you wish to investigate the relationship
between the number of hours students spent
studying for an examination and the mark they
achieved.
  • x (independent variable): the number of hours
    students spent studying for the examination
  • y (dependent variable): the mark they achieved
  • The hours spent studying (x) are taken to cause
    the mark (y).
6
Other Examples
  • The weight at the end of a spring (x) and the
    length of the spring (y)
  • A student's mark in a Statistics test (x) and the
    mark in a Programming test (y)
  • The diameter of the stem of a plant (x) and the
    average length of leaf of the plant (y)

7
SCATTER DIAGRAM
  • When pairs of values are plotted, a scatter
    diagram is produced
  • It shows what the data look like and how the
    variables relate to each other
  • Exercise: Plot a scatter diagram for Example A

8
LINEAR CORRELATION AND SIMPLE REGRESSION LINE
  • Linear correlation
  • If the points on the scatter diagram appear to
    lie near a straight line (the simple regression
    line), you would say that there is a linear
    correlation between x and y
  • Exercise: From the scatter diagram for Example A,
    is there any correlation between x and y?

9
Positive Linear Correlation
10
Negative Linear Correlation
11
No Correlation
No relationship between x and y
12
INFERENCES IN CORRELATION
  • The product moment correlation coefficient, r, is
    a numerical value between -1 and 1 inclusive
    which indicates the degree of linear scatter.
  • r = 1 indicates perfect positive linear
    correlation
  • r = -1 indicates perfect negative linear
    correlation
  • r = 0 indicates no linear correlation

13
INFERENCES IN CORRELATION
  • The nearer the value of r is to 1 or -1, the
    closer the points on the scatter diagram are to
    the regression line
  • Nearer to 1 is strong positive linear correlation
  • Nearer to -1 is strong negative linear
    correlation
  • Exercise: Calculate the correlation coefficient
    r for Example A and interpret the result.
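
The calculation of r can be sketched in a few lines of Python. This is a minimal illustration, not the method used on the slides: the data for Example A are not reproduced in this transcript, so the `hours` and `marks` lists below are hypothetical, and the function name `pearson_r` is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data (NOT the Example A data, which is not in the transcript)
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 70, 72]
r = pearson_r(hours, marks)
print(round(r, 3))  # a value close to 1 suggests strong positive linear correlation
```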

14
THE LEAST SQUARES REGRESSION LINE
  • A mathematical way of fitting the regression line
  • The line of best fit must pass through the means
    of both sets of data, i.e. the point $(\bar{x}, \bar{y})$

15
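Least squares regression line of y on x
  • $\hat{y} = b_0 + b_1 x$, where $b_1 = S_{xy} / S_{xx}$ and
    $b_0 = \bar{y} - b_1 \bar{x}$, with
    $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$ and $S_{xx} = \sum (x - \bar{x})^2$
  • Exercise: Find and draw the regression
    line for Example A.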
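
A minimal sketch of fitting the least squares line of y on x, assuming the usual formulas (slope = S_xy / S_xx, intercept = mean of y minus slope times mean of x). The names `least_squares_line`, `b0` and `b1` and the data are hypothetical; the Example A data are not in this transcript.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b1 = s_xy / s_xx             # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

# Hypothetical data (NOT the Example A data)
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 70, 72]
b0, b1 = least_squares_line(hours, marks)

# The fitted line passes through the point of means (x_bar, y_bar)
mean_hours = sum(hours) / len(hours)
mean_marks = sum(marks) / len(marks)
print(b0, b1, b0 + b1 * mean_hours, mean_marks)
```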

16
6.2 Relationship Test and Prediction in Simple
Linear Regression Analysis
  • OBJECTIVE
  • Test the significance of regression slope.
  • Predict and estimate the new y value from the
    regression equation.

17
RELATIONSHIP TESTS AND PREDICTION IN SIMPLE
LINEAR REGRESSION ANALYSIS
  • HYPOTHESIS TESTING FOR THE SLOPE OF
    REGRESSION LINE
  • ESTIMATION AND PREDICTION

18
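1. HYPOTHESIS TESTING FOR THE SLOPE OF
REGRESSION LINE
  • To test the linear relationship between x and y
  • x and y have a linear relationship if the slope
    is not zero, $\beta_1 \neq 0$
  • Test the hypothesis $H_0: \beta_1 = 0$ against
    $H_1: \beta_1 \neq 0$

with test statistic $t = b_1 / s_{b_1}$,
where $b_1$ is the estimated slope and $s_{b_1}$ is its
standard error; under $H_0$, t follows a t distribution
with n - 2 degrees of freedom.
  • If H0 is rejected, x and y have a linear
    relationship
  • Exercise: Test the linearity between x and y
    for Example A at the given significance level α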
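
The slope test can be sketched as follows, assuming the standard t statistic (estimated slope divided by its standard error, with n - 2 degrees of freedom). The function name and data are hypothetical, and the critical value 3.182 is the two-sided 5% point of the t distribution with 3 degrees of freedom, read from a t table rather than computed.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, degrees_of_freedom) for testing H0: slope = 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    # Residual sum of squares and residual standard deviation
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(s_xx)  # standard error of the slope
    return b1 / se_b1, n - 2

# Hypothetical data (NOT the Example A data)
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 70, 72]
t, df = slope_t_statistic(hours, marks)

T_CRITICAL = 3.182  # t table value for alpha = 0.05 (two-sided), df = 3
reject_h0 = abs(t) > T_CRITICAL
print(round(t, 3), df, reject_h0)
```

If `reject_h0` is true, the sample gives evidence of a linear relationship between x and y at the chosen level.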

19
2. ESTIMATION AND PREDICTION
  • The regression line of y on x is used when x is
    the independent variable and you want to
  • estimate y for a given value of x
  • estimate x for a given value of y.
  • It is also used when neither variable is
    controlled and you want to estimate y for a
    given value of x
  • The regression line of y on x is used to make
    predictions only when there is a linear
    correlation between x and y.

20
Guideline for using regression equation
  • If there is no linear correlation, don't use the
    regression equation to make predictions
  • When using the regression equation for
    predictions, stay within the scope of the
    available sample data
  • A regression equation based on old data is not
    necessarily valid now.
  • Don't make predictions about a population that is
    different from the population from which the
    sample data were drawn.
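
One way to enforce the "stay within the scope of the sample data" guideline in code is to refuse predictions outside the observed range of x. This is only a sketch; the function name, the coefficients and the sample values are hypothetical.

```python
def predict_within_range(b0, b1, x_new, sample_xs):
    """Predict y_hat = b0 + b1 * x_new, but only inside the sampled x range."""
    lowest, highest = min(sample_xs), max(sample_xs)
    if not lowest <= x_new <= highest:
        raise ValueError(
            f"x = {x_new} is outside the sample range [{lowest}, {highest}]"
        )
    return b0 + b1 * x_new

# Hypothetical fitted line and sample x values
sample_hours = [1, 2, 3, 4, 5]
estimate = predict_within_range(45.5, 5.5, 4, sample_hours)
print(estimate)
```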

21
Exercise
  • Use Example A to find
  • the estimate of y when x = 10 hours
  • the estimate of x when y = 75 marks

22
EXAMPLE B
  • A study is done to see whether there is a
    relationship between a mother's age and the
    number of children she has. The data are shown
    here.
  • Plot a scatter diagram to illustrate the data.
  • Compute the value of the correlation coefficient,
    r, and comment on the relationship between the
    value of r and the scatter plot.
  • Find the equation of the regression line of y on
    x. Then predict the number of children of a
    mother whose age is 34.
  • Test the linearity between x and y when α = 0.05.

23
SOLVE SIMPLE LINEAR REGRESSION BY EXCEL
  • Excel: key in the data
  • Tools → Data Analysis → Regression → enter the
    data range (y, x) → OK

24
Computer Output - Excel
Strong positive linear correlation
x and y have a linear relationship (the P-value is below the significance level)
25
6.3 Multiple Linear Regression Analysis and
Correlation
  • OBJECTIVE
  • To describe linear relationships involving more
    than two variables.
  • Interpret the computer output for multiple linear
    regression analysis and make prediction.

26
MULTIPLE LINEAR REGRESSION EQUATION
  • A multiple regression equation is used to describe
    linear relationships involving more than two
    variables.
  • A multiple linear regression equation expresses a
    linear relationship between a response variable y
    and two or more predictor variables (x1, x2, ..., xk).
    The general form of a multiple regression
    equation is
    $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
  • A multiple linear regression equation identifies
    the plane that gives the best fit to the data
27
Notation Multiple regression equation
28
Examples of real situation
  • A manufacturer of jams wants to know where to
    direct its marketing efforts when introducing a
    new flavour. Regression analysis can be used to
    help determine the profile of heavy users of
    jams. For instance, a company might predict the
    number of flavours of jam a household might have
    at any one time on the basis of a number of
    independent variables such as, number of children
    living at home, age of children, gender of
    children, income and time spent on shopping.

29
Examples of real situation
  • Many companies use regression to study market
    segments to determine which variables seem to
    have an impact on market share, purchase
    frequency, product ownerships, and product
    brand loyalty, as well as many other areas.

30
Examples of real situation
  • Personnel directors explore the relationships of
    employee salary levels to geographic location,
    unemployment rates, industry growth, union
    membership, industry type, or competitive
    salaries.
  • Financial analysts look for causes of high stock
    prices by analysing dividend yields, earnings per
    share, stock splits, consumer expectations of
    interest rates, savings levels and inflation
    rates.

31
Examples of real situation
  • Medical researchers use regression analysis to
    seek links between blood pressure and independent
    variables such as age, social class, weight,
    smoking habits and race.
  • Doctors explore the impact of communications,
    number of contacts, and age of patient on patient
    satisfaction with service.

32
Computing the Multiple Linear Regression Equation
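  • By using the least squares method, the multiple
    linear regression equation is given by
    $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
  • where the estimated regression coefficients
    b0, b1, ..., bk are the values that minimise the
    sum of squared residuals $\sum (y - \hat{y})^2$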
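
As a rough sketch of what the least squares computation involves, the coefficients can be obtained by solving the normal equations (X'X) b = X'y. The code below does this with plain Gauss-Jordan elimination. It is an illustration under stated assumptions, not the procedure shown on the slides: the function name and the data are hypothetical, and in practice a spreadsheet or statistics package would be used.

```python
def fit_multiple_regression(rows, ys):
    """Least squares coefficients [b0, b1, ..., bk].

    rows: list of predictor tuples (x1, ..., xk), one per observation.
    ys:   list of observed responses.
    """
    X = [[1.0, *row] for row in rows]  # leading 1 for the intercept b0
    p = len(X[0])
    # Augmented matrix of the normal equations (X'X) b = X'y
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(X, ys))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [v - factor * w for v, w in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data generated exactly from y = 2 + 3*x1 + 0.5*x2
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
ys = [2 + 3 * x1 + 0.5 * x2 for x1, x2 in rows]
coefficients = fit_multiple_regression(rows, ys)
print([round(c, 4) for c in coefficients])
```

Because the hypothetical responses lie exactly on a plane, the fit recovers the generating coefficients up to rounding error.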

33
EXAMPLE C
  • Assume that a sales manager of Tackey Toys
    needs to predict sales of Tackey products in a
    selected market area. He believes that
    advertising expenditures and the population in
    each market area can be used to predict sales. He
    gathered a sample of toy sales, advertising
    expenditures and population as below. Find
    the multiple linear regression equation that
    best fits the data.

34
Example, cont
35
Solution
  • Since we have 2 independent variables, the
    multiple regression equation is given by
    $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$

36
SOLVE MULTIPLE LINEAR REGRESSION BY EXCEL
  • Excel: key in the data
  • Tools → Data Analysis → Regression → enter the
    data range (y, x) → OK

37
Computer Output Microsoft Excel
38
Interpreting the Values in the Equation
  • b0 = 6.3972
  • The value of estimated y when x1 and x2 are both
    zero.
  • b1 = 20.4921
  • When the population (in thousands) is held
    constant, the estimated toy sales increase by
    20.4921 thousand dollars for each 1000 dollars of
    advertising expenditure.
  • b2 = 0.2805
  • When the advertising expenditure (in thousand
    dollars) is held constant, the estimated toy sales
    increase by 0.2805 thousand dollars for each
    1000 people in the population.
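
The "held constant" interpretation can be checked numerically with the coefficients quoted above. In this sketch the function name `predicted_sales` and the trial inputs (4 and 5 thousand dollars, 300 and 301 thousand people) are hypothetical choices made only for illustration.

```python
# Coefficients quoted on the slide (sales and advertising in thousand dollars,
# population in thousands of people)
B0, B1, B2 = 6.3972, 20.4921, 0.2805

def predicted_sales(advertising_k, population_k):
    """Estimated toy sales in thousand dollars."""
    return B0 + B1 * advertising_k + B2 * population_k

# Raise advertising by 1 (thousand dollars) with population fixed: change equals b1
change_advertising = predicted_sales(5, 300) - predicted_sales(4, 300)

# Raise population by 1 (thousand people) with advertising fixed: change equals b2
change_population = predicted_sales(4, 301) - predicted_sales(4, 300)

print(round(change_advertising, 4), round(change_population, 4))
```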

39
Making preliminary predictions with the multiple
regression equation
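  • Assume that the sales manager needs a sales
    forecast for a market area. Tackey Toys has
    recently spent 4,000 dollars on advertising in
    this market, which has a population of 500,000
    people. With x1 = 4 (thousand dollars) and
    x2 = 500 (thousand people), the point estimate of
    toy sales is given by
    $\hat{y} = 6.3972 + 20.4921(4) + 0.2805(500) = 228.6156$
    thousand dollars.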

40
The Coefficient of Multiple Determination
  • Measures the percentage of variation in the y
    variable associated with the use of the set of x
    variables
  • A percentage that shows the variation in the y
    variable that is explained by its relation to the
    combination of x1 and x2.

$R^2 = SSR / SST = 1 - SSE / SST$, with $0 \le R^2 \le 1$,
where $SST = \sum_t (y_t - \bar{y})^2$
and $SSE = \sum_t (y_t - \hat{y}_t)^2$.

$R^2 = 0$ when all $b_i = 0$ (i = 1, ..., k).
$R^2 = 1$ when all observations fall directly on the fitted
response surface, i.e. when $y_t = \hat{y}_t$ for all t
(the regression equation is good).
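
A minimal sketch of the coefficient of determination computed from observed and fitted values, assuming the usual definition R² = 1 - SSE/SST. The function name and the small data sets are hypothetical.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(observed) / len(observed)
    sst = sum((y - mean_y) ** 2 for y in observed)                 # total variation
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))      # unexplained variation
    return 1 - sse / sst

# Hypothetical observations
observed = [2, 4, 6, 8]

# All observations on the fitted surface: R^2 = 1
perfect = r_squared(observed, [2, 4, 6, 8])

# Fitted values equal to the mean (all slope coefficients zero): R^2 = 0
no_fit = r_squared(observed, [5, 5, 5, 5])

print(perfect, no_fit)
```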
41
Computer Output Microsoft Excel
97.4% of Tackey Toys sales in the market area is
explained by advertising expenditures and
population size
42
Multiple R and Adjusted R²
The multiple correlation coefficient R is the
positive square root of R².
The adjusted coefficient of determination is the
multiple coefficient of determination R² modified
to account for the number of variables and the
sample size.
When we compare a multiple regression equation to
others, it is better to use the adjusted R².
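
The adjustment can be sketched with the commonly used formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictor variables. The value R² = 0.974 is taken from the Tackey Toys output; the sample size n = 10 is a hypothetical figure, since the data table is not reproduced in this transcript.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictor variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than coefficients")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R^2 from the slides; n = 10 is an assumed sample size for illustration only
adjusted = adjusted_r_squared(0.974, n=10, k=2)
print(round(adjusted, 4))
```

The adjusted value is never larger than R² itself, which is why it is the fairer figure when comparing equations with different numbers of variables.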
43
Standard error and ANOVA test
  • Standard error
  • Measure the extent of the scatter, or dispersion,
    of the sample data points about the multiple
    regression plane.
  • ANOVA test
  • H0: neither of the independent variables is
    related to the dependent variable (β1 = β2 = 0)
  • H1: at least one of the independent variables is
    related to the dependent variable (β1 or β2 or
    both ≠ 0)
  • Reject H0 if Significance F < α
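
The F statistic behind the ANOVA test can be sketched as F = (SSR / k) / (SSE / (n - k - 1)). The sums of squares and sample size below are hypothetical, chosen so that SSR / (SSR + SSE) = 0.974. Converting F to the "Significance F" P-value needs the F distribution, which this sketch leaves to Excel or a statistics package.

```python
def anova_f_statistic(ssr, sse, n, k):
    """F statistic for testing that all k slope coefficients are zero."""
    msr = ssr / k                # mean square due to regression
    mse = sse / (n - k - 1)      # mean square error
    return msr / mse

# Hypothetical sums of squares and sample size (not from the slides)
f_value = anova_f_statistic(ssr=97.4, sse=2.6, n=10, k=2)
print(round(f_value, 3))
```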

44
Computer Output Microsoft Excel
97.4% of Tackey Toys sales in the market area is
explained by advertising expenditures and
population size
45
6.4 Model Selection
  • OBJECTIVE
  • Select the best multiple linear regression model
    for any given data set.

46
TIPS: Model selection in a simple way
  • Use common sense and practical considerations to
    include or exclude variables.
  • Consider the P-value (the measure of the overall
    significance of the multiple regression
    equation, the Significance F value) displayed by
    computer output. The smaller the better.
  • Consider equations with high values of adjusted R²
    and try to include only a few variables.
  • Find the linear correlation coefficient r for
    each pair of variables being considered. If two
    predictor variables have a very high r, there is
    no need to include them both. Exclude the variable
    with the lower value of r.
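
The last tip can be sketched by computing r for every pair of predictor variables and flagging pairs above a cut-off. The cut-off of 0.9, the function names and the data are all assumptions made for illustration; the variable names H, W and C follow Example D, but the values are invented.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def highly_correlated_pairs(predictors, threshold=0.9):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical predictor values (NOT the Example D data)
predictors = {
    "H": [60, 62, 65, 68, 70],       # height in inches
    "W": [75, 80, 86, 92, 97],       # waist circumference in cm
    "C": [210, 150, 250, 170, 190],  # cholesterol in mg
}
flagged = highly_correlated_pairs(predictors)
print([(a, b) for a, b, _ in flagged])
```

In this invented data set H and W move together almost perfectly, so only one of them would be kept as a predictor.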

47
EXAMPLE D
  • The following table summarizes the multiple
    regression analysis in which the response
    variable (y) is weight (in pounds) and the
    predictor (x) variables are H (height in inches),
    W (waist circumference in cm), and C (cholesterol
    in mg)

48
Example D, cont
  • If only one predictor variable is used to
    predict weight, which single variable is best?
    Why?
  • If exactly two predictor variables are used to
    predict weight, which two variables should be
    chosen? Why?
  • Which regression equation is best for predicting
    weight? Why?

49
CONCLUSION
  • This chapter introduces important methods
    (regression) for making inferences about a
    relationship between two or more variables and
    describing such a relationship with an equation
    that can be used for predicting the value of one
    variable given the values of the other variables.

Thank You