Title: LINEAR REGRESSION
LINEAR REGRESSION AND CORRELATION
- CHAPTER 6
- BCT 2053
- Siti Zanariah Satari, FIST/FSKKP, 2009
CONTENT
- 6.1 Simple Linear Regression Analysis and Correlation
- 6.2 Relationship Test and Prediction in Simple Linear Regression Analysis
- 6.3 Multiple Linear Regression Analysis and Correlation
- 6.4 Model Selection
6.1 Simple Linear Regression Analysis and Correlation
- OBJECTIVE
- Find a mathematical equation that relates a dependent variable y to an independent variable x.
- Plot a scatter diagram and graph the regression line.
- Calculate the strength of the linear relationship between x and y using the correlation coefficient.
INTRODUCTORY CONCEPTS
- Suppose you wish to investigate the relationship between a dependent variable (y) and an independent variable (x).
- Independent variable (x): the variable that is controlled.
- Dependent variable (y): the response variable.
- In other words, the value of y depends on the value of x.
Example A
Suppose you wish to investigate the relationship between the number of hours students spent studying for an examination and the mark they achieved.
- x (independent variable): the number of hours students spent studying for an examination
- y (dependent variable): the mark they achieved
- A change in x will cause a change in y.
Other Examples
- The weight at the end of a spring (x) and the length of the spring (y)
- A student's mark in a Statistics test (x) and the mark in a Programming test (y)
- The diameter of the stem of a plant (x) and the average length of leaf of the plant (y)
SCATTER DIAGRAM
- When pairs of values are plotted, a scatter diagram is produced.
- It shows what the data look like and how the values relate to each other.
- Exercise: Plot a scatter diagram for Example A.
LINEAR CORRELATION AND SIMPLE REGRESSION LINE
- Linear correlation
- If the points on the scatter diagram appear to lie near a straight line (the simple regression line), you would say that there is a linear correlation between x and y.
- Exercise: From the scatter diagram for Example A, is there any correlation between x and y?
Positive Linear Correlation
Negative Linear Correlation
No Correlation
No relationship between x and y
INFERENCES IN CORRELATION
- The product moment correlation coefficient, r, is a numerical value between -1 and 1 inclusive which indicates the degree of linear scatter.
- r = 1 indicates perfect positive linear correlation
- r = -1 indicates perfect negative linear correlation
- r = 0 indicates no linear correlation
- The nearer the value of r is to 1 or -1, the closer the points on the scatter diagram are to the regression line
- Nearer to 1 is strong positive linear correlation
- Nearer to -1 is strong negative linear correlation
- Exercise: Calculate the correlation coefficient r for Example A and interpret the result.
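The formula for r did not survive on the slide, so the sketch below uses the standard product moment definition, r = Sxy / sqrt(Sxx * Syy), where Sxy, Sxx and Syy are sums of products of deviations from the means. The function name `pearson_r` and the data are assumptions for illustration only; the numbers are not the Example A table.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient r for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data (NOT the Example A table): hours studied and marks.
hours = [2, 4, 6, 8, 10]
marks = [50, 58, 65, 70, 82]
r = pearson_r(hours, marks)
print(f"r = {r:.3f}")
```

A value of r close to 1 for this made-up sample would be read as strong positive linear correlation.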
THE LEAST SQUARES REGRESSION LINE
- a mathematical way of fitting the regression line
- The line of best fit must pass through the means of both sets of data, i.e. the point $(\bar{x}, \bar{y})$
Least squares regression line of y on x
- Exercise: Find and draw the regression line for Example A.
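The slide's formula for the line is missing, so the following is a minimal sketch of the usual least squares result, assuming the line is written y = a + b x with slope b = Sxy / Sxx and intercept a = mean(y) - b * mean(x). The names `fit_line`, `a` and `b` and the data are assumptions, not the slide's notation or the Example A table.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx            # slope
    a = mean_y - b * mean_x    # intercept: forces the line through the means
    return a, b

# Hypothetical data (NOT the Example A table).
hours = [2, 4, 6, 8, 10]
marks = [50, 58, 65, 70, 82]
a, b = fit_line(hours, marks)
print(f"y = {a:.2f} + {b:.2f} x")
```

Because the intercept is computed from the two means, the fitted line passes through the point of means by construction.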
6.2 Relationship Test and Prediction in Simple Linear Regression Analysis
- OBJECTIVE
- Test the significance of the regression slope.
- Predict and estimate a new y value from the regression equation.
RELATIONSHIP TESTS AND PREDICTION IN SIMPLE LINEAR REGRESSION ANALYSIS
- HYPOTHESIS TESTING FOR THE SLOPE OF REGRESSION LINE
- ESTIMATION AND PREDICTION
1. HYPOTHESIS TESTING FOR THE SLOPE OF REGRESSION LINE
- To test the linear relationship between x and y
- x and y have a linear relationship if the slope of the regression line is not zero; this is checked with a test statistic.
- If H0 (the slope is zero) is rejected, x and y have a linear relationship
- Exercise: Test the linearity between x and y for Example A at significance level $\alpha$.
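The test statistic itself is missing from the slide. A common choice, assumed here, is the t statistic t = b / SE(b) with n - 2 degrees of freedom, where SE(b) = sqrt(MSE / Sxx) and MSE = SSE / (n - 2). The function name, the data and the critical value lookup are illustrative assumptions; the standard library has no t distribution, so the critical value is taken from a t table.

```python
from math import sqrt

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: slope = 0 in simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = sqrt(sse / (n - 2) / s_xx)   # standard error of the slope
    return b / se_b, n - 2

# Hypothetical data (NOT the Example A table).
hours = [2, 4, 6, 8, 10]
marks = [50, 58, 65, 70, 82]
t, df = slope_t_statistic(hours, marks)

# Two-tailed critical value from a t table for alpha = 0.05 and df = 3.
t_critical = 3.182
reject_h0 = abs(t) > t_critical
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

If H0 is rejected for a sample, the conclusion on the slide applies: x and y have a linear relationship.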
2. ESTIMATION AND PREDICTION
- When x is the independent variable and you want to:
- estimate y for a given value of x
- estimate x for a given value of y
- When neither variable is controlled and you want to estimate y for a given value of x
- The regression line of y on x is used to make predictions when there is a linear correlation between x and y.
Guidelines for using the regression equation
- If there is no linear correlation, don't use the regression equation to make predictions.
- When using the regression equation for predictions, stay within the scope of the available sample data.
- A regression equation based on old data is not necessarily valid now.
- Don't make predictions about a population that is different from the population from which the sample data were drawn.
Exercise
- Use Example A to find
- the estimate of y when x = 10 hours
- the estimate of x when y = 75 marks
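Both estimates in this kind of exercise can be sketched as below, assuming a fitted line y = a + b x. The coefficients and the sample range are hypothetical (they are not the Example A results), and the helper names are made up for illustration. The range check mirrors the guideline of staying within the scope of the sample data.

```python
def predict_y(a, b, x, x_min, x_max):
    """Estimate y for a given x, refusing to extrapolate beyond the sample."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"x = {x} is outside the sample range [{x_min}, {x_max}]")
    return a + b * x

def estimate_x(a, b, y):
    """Estimate x for a given y by solving y = a + b*x on the y-on-x line."""
    return (y - a) / b

# Hypothetical fitted line and sample range (NOT the Example A results).
a, b = 42.2, 3.8
x_min, x_max = 2, 10

y_at_10 = predict_y(a, b, 10, x_min, x_max)
x_at_75 = estimate_x(a, b, 75)
print(f"y at x = 10: {y_at_10:.1f}")
print(f"x at y = 75: {x_at_75:.2f}")
```

Solving the y-on-x line for x is appropriate here because x is treated as the controlled variable, as the previous slide states.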
EXAMPLE B
- A study is done to see whether there is a relationship between a mother's age and the number of children she has. The data are shown here.
- Plot a scatter diagram to illustrate the data.
- Compute the value of the correlation coefficient, r, and comment on the relationship between the value of r and the scatter plot.
- Find the equation of the regression line of y on x. Then predict the number of children of a mother whose age is 34.
- Test the linearity between x and y when $\alpha = 0.05$.
SOLVE SIMPLE LINEAR REGRESSION BY EXCEL
- Excel: key in data
- Tools → Data Analysis → Regression → enter the data range (y and x) → OK
Computer Output - Excel
- Strong positive linear correlation
- x and y have a linear relationship (P-value < $\alpha$)
6.3 Multiple Linear Regression Analysis and Correlation
- OBJECTIVE
- Describe linear relationships involving more than two variables.
- Interpret the computer output for multiple linear regression analysis and make predictions.
MULTIPLE LINEAR REGRESSION EQUATION
- A multiple regression equation is used to describe linear relationships involving more than two variables.
- A multiple linear regression equation expresses a linear relationship between a response variable y and two or more predictor variables (x1, x2, ..., xk). The general form of a multiple regression equation is $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
- A multiple linear regression equation identifies the plane that gives the best fit to the data
Notation: Multiple regression equation
Examples of real situations
- A manufacturer of jams wants to know where to direct its marketing efforts when introducing a new flavour. Regression analysis can be used to help determine the profile of heavy users of jams. For instance, a company might predict the number of flavours of jam a household might have at any one time on the basis of a number of independent variables such as the number of children living at home, age of children, gender of children, income and time spent on shopping.
- Many companies use regression to study market segments to determine which variables seem to have an impact on market share, purchase frequency, product ownership, and product brand loyalty, as well as many other areas.
- Personnel directors explore the relationships of employee salary levels to geographic location, unemployment rates, industry growth, union membership, industry type, or competitive salaries.
- Financial analysts look for causes of high stock prices by analysing dividend yields, earnings per share, stock splits, consumer expectation of interest rates, savings levels and inflation rates.
- Medical researchers use regression analysis to seek links between blood pressure and independent variables such as age, social class, weight, smoking habits and race.
- Doctors explore the impact of communications, number of contacts, and age of patient on patient satisfaction with service.
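Computing the Multiple Linear Regression Equation
- By using the least squares method, the multiple linear regression equation is given by $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
- where the estimated regression coefficients $b_0, b_1, \dots, b_k$ are the values that minimize the sum of squared differences between the observed and estimated y values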
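The slide's own formula for the coefficients is lost. One standard way to obtain least squares coefficients, assumed here, is to solve the normal equations (X^T X) b = X^T y, where X is the data matrix with a leading column of ones. The sketch below does this with plain Gaussian elimination; the function name and the data are hypothetical and are not taken from any example in this chapter.

```python
def fit_multiple(rows, ys):
    """Least squares coefficients [b0, b1, ..., bk] via the normal equations."""
    # Design matrix: prepend a column of ones for the intercept b0.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(X, ys))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(A[idx][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for row_idx in range(p):
            if row_idx != col:
                factor = A[row_idx][col] / A[col][col]
                A[row_idx] = [v - factor * w for v, w in zip(A[row_idx], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [9, 8, 19, 18, 26]
coefficients = fit_multiple(rows, ys)
print([round(c, 6) for c in coefficients])
```

In practice a spreadsheet or statistics package does this step; the sketch only shows what "least squares" computes.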
EXAMPLE C
- Assume that a sales manager of Tackey Toys needs to predict sales of Tackey products in a selected market area. He believes that advertising expenditures and the population in each market area can be used to predict sales. He gathered a sample of toy sales, advertising expenditures and population data as below. Find the multiple linear regression equation which best fits the data.
Example C, cont
Solution
- Since we have 2 independent variables, the multiple regression equation is given by $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$
SOLVE MULTIPLE LINEAR REGRESSION BY EXCEL
- Excel: key in data
- Tools → Data Analysis → Regression → enter the data range (y and x) → OK
Computer Output Microsoft Excel
Interpreting the Values in the Equation
- $b_0 = 6.3972$: the value of estimated y when x1 and x2 are both zero.
- $b_1 = 20.4921$: when the population (in thousands) is constant, the estimated toy sales increase by 20.4921 thousand dollars for each 1,000 dollars of advertising expenditure.
- $b_2 = 0.2805$: when the advertising expenditure (in thousand dollars) is constant, the estimated toy sales increase by 0.2805 thousand dollars for each 1,000 people in the population.
Making preliminary predictions with the multiple regression equation
- Assume that the sales manager needs a sales forecast for a market area. Tackey Toys has recently spent 4,000 dollars on advertising in this market, which has a population of 500,000 people. With x1 = 4 (thousand dollars) and x2 = 500 (thousand people), the point estimate of toy sales is given by $\hat{y} = 6.3972 + 20.4921(4) + 0.2805(500) = 228.6156$ thousand dollars
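The point estimate can be reproduced with a few lines of code. The coefficients are the ones quoted on the interpretation slide; the function and variable names are made up for illustration, and the inputs are assumed to be in thousands (of dollars and of people), which matches how the coefficients are interpreted.

```python
# Coefficients quoted from the Excel output for the Tackey Toys example.
B0, B1, B2 = 6.3972, 20.4921, 0.2805

def predict_sales(advertising_thousands, population_thousands):
    """Estimated toy sales in thousand dollars."""
    return B0 + B1 * advertising_thousands + B2 * population_thousands

# 4,000 dollars of advertising and 500,000 people, both expressed in thousands.
estimate = predict_sales(4, 500)
print(f"{estimate:.4f} thousand dollars")
```

Entering 4000 and 500000 instead of 4 and 500 would be a units error, since the equation was fitted to data recorded in thousands.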
The Coefficient of Multiple Determination
- Measures the percentage of variation in the y variable associated with the use of the set of x variables
- A percentage that shows the variation in the y variable that is explained by its relation to the combination of x1 and x2.
- $R^2 = 1 - \frac{SSE}{SST}$, where $SSE = \sum_t (y_t - \hat{y}_t)^2$ and $SST = \sum_t (y_t - \bar{y})^2$
- $R^2 = 0$ when all $b_i = 0$ (i = 1, ..., k)
- $R^2 = 1$ when all observations fall directly on the fitted response surface, i.e. when $y_t = \hat{y}_t$ for all t (the regression equation is good)
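As a small numerical sketch of the two boundary cases, the function below uses the common definition R² = 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares about the mean of y. The function name and the tiny data sets are assumptions for illustration.

```python
def r_squared(ys, fitted):
    """Coefficient of determination from observed and fitted y values."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)               # total variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # unexplained variation
    return 1 - sse / sst

observed = [1.0, 2.0, 3.0]

# Every observation lies on the fitted surface: R^2 is 1.
perfect = r_squared(observed, [1.0, 2.0, 3.0])

# The fit ignores the predictors and always returns the mean: R^2 is 0.
useless = r_squared(observed, [2.0, 2.0, 2.0])

print(perfect, useless)
```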
Computer Output Microsoft Excel
97.4% of the variation in Tackey Toys sales in the market area is explained by advertising expenditures and population size
Multiple R and Adjusted R²
The coefficient of multiple correlation R is the positive square root of R².
The adjusted coefficient of determination is the multiple coefficient of determination R² modified to account for the number of variables and the sample size.
When we compare a multiple regression equation with others, it is better to use the adjusted R².
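The slide does not give the adjustment formula. A widely used form, assumed here, is adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for n observations and k predictor variables. In the sketch below R² = 0.974 is the value quoted for the Tackey Toys output, while the sample size n = 10 is purely hypothetical because the data table is not reproduced in this text.

```python
from math import sqrt

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictor variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than coefficients")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = 0.974          # quoted on the Excel output slide
n, k = 10, 2        # n is a hypothetical sample size; k = 2 predictors

multiple_r = sqrt(r2)                     # positive square root of R^2
adj_r2 = adjusted_r_squared(r2, n, k)
print(f"Multiple R = {multiple_r:.4f}, adjusted R^2 = {adj_r2:.4f}")
```

The adjusted value is never larger than R², and the gap grows as more predictors are added without a matching gain in fit, which is why it suits comparisons between equations.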
Standard error and ANOVA test
- Standard error
- Measures the extent of the scatter, or dispersion, of the sample data points about the multiple regression plane.
- ANOVA test
- H0: neither of the independent variables is related to the dependent variable ($b_1 = b_2 = 0$)
- H1: at least one of the independent variables is related to the dependent variable ($b_1$ or $b_2$ or both $\neq 0$)
- Reject H0 if significance F < $\alpha$
6.4 Model Selection
- OBJECTIVE
- Select the best multiple linear regression model for any given data set.
TIPS: Model selection in a simple way
- Use common sense and practical considerations to include or exclude variables.
- Consider the P-value (the measure of the overall significance of the multiple regression equation, i.e. the significance F value) displayed by the computer output. The smaller the better.
- Consider equations with high values of adjusted R² and try to include only a few variables.
- Find the linear correlation coefficient r for each pair of variables being considered. If 2 predictor variables have a very high r, there is no need to include them both. Exclude the variable with the lower value of r (its correlation with y).
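The numerical part of these tips can be sketched as a simple ranking: keep only equations whose significance F is below the chosen level, then prefer the highest adjusted R², breaking ties in favour of fewer variables. This is only one possible reading of the tips, and the candidate table below is entirely made up; it is not the Example D table. The common-sense tip cannot be automated.

```python
def rank_models(models, alpha=0.05):
    """Order candidate equations: significant first, best adjusted R^2 on top."""
    significant = [m for m in models if m["significance_f"] < alpha]
    # Higher adjusted R^2 first; on ties, fewer predictor variables first.
    return sorted(significant,
                  key=lambda m: (-m["adj_r2"], len(m["predictors"])))

# Hypothetical summary of fitted equations (invented numbers).
candidates = [
    {"predictors": ("x1",), "significance_f": 0.004, "adj_r2": 0.71},
    {"predictors": ("x2",), "significance_f": 0.210, "adj_r2": 0.12},
    {"predictors": ("x1", "x2"), "significance_f": 0.001, "adj_r2": 0.88},
    {"predictors": ("x1", "x2", "x3"), "significance_f": 0.001, "adj_r2": 0.88},
]

ranking = rank_models(candidates)
best = ranking[0]
print(best["predictors"])
```

Here the three-variable equation ties on adjusted R² but loses to the two-variable one, reflecting the advice to include only a few variables.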
EXAMPLE D
- The following table summarizes the multiple regression analysis in which the response variable (y) is weight (in pounds) and the predictor (x) variables are H (height in inches), W (waist circumference in cm), and C (cholesterol in mg).
Example D, cont
- If only one predictor variable is used to predict weight, which single variable is best? Why?
- If exactly two predictor variables are used to predict weight, which two variables should be chosen? Why?
- Which regression equation is best for predicting weight? Why?
CONCLUSION
- This chapter introduces important methods (regression) for making inferences about a relationship between two or more variables and for describing such a relationship with an equation that can be used to predict the value of one variable given the values of the other variables.
Thank You