Title: Chapter 4 Describing the Relation Between Two Variables
1Chapter 4 Describing the Relation Between Two
Variables
- 4.1 Scatter Diagrams and Correlation
2Objectives
- Draw scatter diagrams
- Interpret scatter diagrams
- Understand the properties of the linear
correlation coefficient
- Compute and interpret the linear correlation
coefficient
3Bivariate Data
Bivariate data are data in which two variables are
measured on each individual. Do we want to use
the value of one variable to predict the value of
the other variable? If so, how will we do this?
The response variable is the variable whose value
can be explained or predicted from the value of
the predictor variable.
4Bivariate Data
- Remember that if the data we collect are
observational, we cannot conclude a causal
relation.
- Also, sometimes it is not clear which is the
predictor variable and which is the response
variable.
- A lurking variable is one that is related to the
response and/or predictor variable, but is
excluded from the analysis.
5Scatter Diagrams
- A scatter diagram shows the relationship between
two quantitative variables measured on the same
individual.
- Each individual in the data set is represented by
a point in the scatter diagram.
- The predictor variable is plotted on the
horizontal axis (x) and the response variable is
plotted on the vertical axis (y).
- Do not connect the points when drawing a scatter
diagram.
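The plotting rules above can be sketched in a few lines of Python. This is only an illustration: the function name text_scatter, the grid size, and the depth/minutes values are made up for the example and are not the data from the slides. Each individual becomes one unconnected point, with the predictor on the horizontal axis and the response on the vertical axis.

```python
def text_scatter(xs, ys, width=40, height=10):
    """Return a plain-text scatter diagram (x horizontal, y vertical)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index; guard against a zero range.
        col = 0 if x_max == x_min else round((x - x_min) / (x_max - x_min) * (width - 1))
        row = 0 if y_max == y_min else round((y - y_min) / (y_max - y_min) * (height - 1))
        # The first grid row is printed at the top, so flip rows: larger y sits higher.
        grid[height - 1 - row][col] = "*"
    body = "\n".join("|" + "".join(r) for r in grid)
    return body + "\n+" + "-" * width


# Hypothetical values, for illustration only (not the drilling data).
depth = [10, 20, 30, 40, 50]          # predictor, x
minutes = [2.0, 2.5, 2.4, 3.1, 3.5]   # response, y
print(text_scatter(depth, minutes))
```

Points are drawn as isolated marks and never joined, which matches the last rule in the list.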
6EXAMPLE Drawing a Scatter Diagram
The following data are based on a study of
drilling rock. The researchers wanted to
determine whether the time it takes to dry drill
a distance of 5 feet in rock increases with the
depth at which the drilling begins. So, the depth
at which drilling begins is the predictor
variable, x, and the response variable, y, is the
time (in minutes) to drill 5 feet. Draw a
scatter diagram of the data. Source: Penner, R.,
and Watts, D.G. "Mining Information." The
American Statistician, Vol. 45, No. 1, Feb. 1991,
p. 6.
9Interpreting Scatter Diagrams
- Since scatter diagrams show the type of relation
that exists between two variables, our goal is to
determine whether there is
- a linear relation,
- a nonlinear relation,
- or no relation.
- The next 2 slides show various scatter
diagrams and the type of relation implied.
10(No Transcript)
11Interpreting Scatter Diagrams
12Interpreting Scatter Diagrams
- Two variables that are linearly related can be
positively associated or negatively associated.
- They are positively associated when above-average
values of one variable are associated with
above-average values of the other variable.
- That is, two variables are positively associated
if, as the values of the predictor variable
increase, the values of the response variable
also increase.
13Interpreting Scatter Diagrams
- Two variables that are linearly related are said
to be negatively associated when above-average
values of one variable are associated with
below-average values of the other variable.
- That is, two variables are negatively associated
if, as the values of the predictor variable
increase, the values of the response variable
decrease.
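One way to make "above average paired with above average" concrete is to multiply each point's deviation from the mean of x by its deviation from the mean of y and look at the sign of the total. The sketch below assumes that reading; the function name association_direction and the sample values are invented for illustration.

```python
from statistics import mean


def association_direction(xs, ys):
    """Classify the direction of association from deviation products."""
    x_bar, y_bar = mean(xs), mean(ys)
    # A product is positive when x and y are both above (or both below)
    # their means, and negative when one is above and the other below.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "no linear association"


# Hypothetical values, for illustration only.
print(association_direction([1, 2, 3, 4], [2, 4, 5, 9]))
print(association_direction([1, 2, 3, 4], [9, 5, 4, 2]))
```

The sign of this total is the same as the sign of the linear correlation coefficient, since r only divides the total by positive quantities.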
14Correlation
The linear correlation coefficient, or Pearson
product moment correlation coefficient, is a
measure of the strength of the linear relation
between two quantitative variables. We use the
Greek letter ρ (rho) to represent the population
correlation coefficient and r to represent the
sample correlation coefficient. We will work only
with the sample correlation coefficient.
15Correlation
r = Σ [ ((x_i - x̄) / s_x) · ((y_i - ȳ) / s_y) ] / (n - 1)
where
- r is the sample correlation coefficient,
- x_i is the data value for the predictor variable,
- x̄ (x bar) is the sample mean for the predictor
variable,
- s_x is the sample standard deviation for the
predictor variable,
- y_i is the data value for the response variable,
- ȳ (y bar) is the sample mean for the response
variable,
- s_y is the sample standard deviation for the
response variable, and
- n is the number of individuals in the sample.
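As a sketch of how the quantities defined above combine, the following Python computes r as the sum of the products of the standardized x and y values divided by n - 1. The function name sample_correlation and the data are illustrative assumptions; the standard-library statistics.stdev is used because it is the sample (n - 1) standard deviation.

```python
from statistics import mean, stdev


def sample_correlation(xs, ys):
    """Sample linear correlation coefficient r from paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # Standardize each value, multiply the pair, then average over n - 1.
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)


# Hypothetical values, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(sample_correlation(x, y), 3))
```

If every x value (or every y value) is the same, the standard deviation is 0 and the division fails, so r is undefined for such data.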
16Properties of the Linear Correlation Coefficient
1. The linear correlation coefficient is always
between -1 and 1, inclusive. That is, -1 ≤ r ≤ 1.
2. If r = 1, there is a perfect positive
linear relation between the two variables.
3. If r = -1, there is a perfect negative linear
relation between the two variables.
4. The closer r is to 1, the stronger the evidence
of positive association between the two variables.
5. The closer r is to -1, the stronger the evidence
of negative association between the two variables.
17Properties of the Linear Correlation Coefficient
6. If r is close to 0, there is evidence of no
linear relation between the two variables.
Because the linear correlation coefficient is a
measure of the strength of linear relation, r
close to 0 does not imply no relation, just no
linear relation.
7. It is a unitless measure of association. So,
the unit of measure for x and y plays no role in
the interpretation of r.
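Several of these properties can be checked numerically. The sketch below uses made-up numbers and a helper named pearson_r (an illustrative name, not from the slides) to show a perfect positive line, a perfect negative line, the effect of changing units, and a nonlinear relation whose r is essentially 0.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample linear correlation coefficient (sum of z-products over n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


x = [1, 2, 3, 4, 5]  # hypothetical values, for illustration only

# Properties 2 and 3: points exactly on a line give r of 1 or -1
# (up to floating-point rounding).
perfect_pos = pearson_r(x, [2 * v + 1 for v in x])
perfect_neg = pearson_r(x, [-3 * v + 10 for v in x])

# Property 7: rescaling x (say, feet to inches) leaves r unchanged.
y = [2, 1, 4, 3, 5]
r_feet = pearson_r(x, y)
r_inches = pearson_r([12 * v for v in x], y)

# Property 6: y = x^2 is a strong relation, but not a linear one,
# so on x values symmetric about 0 the coefficient is essentially 0.
sym = [-2, -1, 0, 1, 2]
r_parabola = pearson_r(sym, [v * v for v in sym])

print(perfect_pos, perfect_neg, r_feet, r_inches, r_parabola)
```

The parabola case is the reason property 6 says "no linear relation" rather than "no relation".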
26Correlation Coefficient
- So the correlation coefficient describes the
strength and the direction of the linear
relationship between a predictor variable and a
response variable.
27EXAMPLE Determining the Linear Correlation
Coefficient
Determine the linear correlation coefficient of
the drilling data.
29Sum = 8.501037, so r = 8.501037 / 11 ≈ 0.773
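The arithmetic on this slide can be checked directly. The sketch assumes that 8.501037 is the sum of the products of the standardized x and y values and that the divisor 11 is n - 1, which would mean n = 12 observations; the variable names are illustrative.

```python
z_product_sum = 8.501037  # sum reported on the slide
n_minus_1 = 11            # divisor reported on the slide (assumed to be n - 1)

r = z_product_sum / n_minus_1
print(round(r, 3))
```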
30EXAMPLE Determining the Linear Correlation
Coefficient
r = 0.773. This linear correlation coefficient
indicates a POSITIVE association. Note that
it implies only an association, not causation. A
linear correlation coefficient computed using
observational data does not imply causation
between the variables.