Correlation - PowerPoint PPT Presentation

About This Presentation
Title:

Correlation

Slides: 33
Provided by: richardu
Transcript and Presenter's Notes

Title: Correlation


1
Correlation
2
Overview
  • To frame our discussion, consider

3
Outline
  • Bivariate Data
  • Correlation
  • Models

4
Bivariate Data
  • For bivariate data, each observation consists of
    two values. Observations are given as ordered
    pairs (x, y).

5
Representation
  • Graphically, bivariate data are viewed using
    scatter plots.

6
Example
  • On a recent project we collected data for each
    module in the software developed. The table
    contains the module size in KLOC, the number of
    control flow paths and the number of post-release
    faults.

7
(No Transcript)
8
Scatter Plot
9
Scatter Plot
  • A scatter plot shows the relationship between two
    quantitative variables measured on the same
    individual.

10
Interpreting the Scatter Plot
  • Look at:
  • Form: do the data points form a shape?
  • Direction: does the shape formed indicate a
    direction?
  • Strength: is there little scatter about the
    shape?

11
Associations
  • Two variables can be either positively associated
    or negatively associated.

12
Positively Associated
  • Two variables are positively associated when
    above-average values of one variable tend to
    occur with above-average values of the other.

13
Negatively Associated
  • Two variables are negatively associated when
    above-average values of one variable tend to
    occur with below-average values of the other.

14
Example Data
    Team 1              Team 2
    Effort     LOC      Effort     LOC
      16.7    6050        23.5    5030
      22.6    8363        12.6    7353
      32.2   13334        19.4   12314
       3.9    5942        15.6    8942
      17.3    3315        19.2    5315
      67.7   38988        22.3   30988
      10.1   38614        34.5   32614
      19.3   12762        24.8   14762
      10.6   13510        31.6   11510
      59.5   26500        29.9   28500
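
As a minimal sketch, the correlation between effort and LOC can be computed separately for each team from the values in the table above. The helper names `pearson_r` and `team_r` are illustrative assumptions, not part of the presentation, and the formula used is the standard sample correlation coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return ss_xy / sqrt(ss_xx * ss_yy)

# (effort, LOC) pairs copied from the example data table.
team1 = [(16.7, 6050), (22.6, 8363), (32.2, 13334), (3.9, 5942),
         (17.3, 3315), (67.7, 38988), (10.1, 38614), (19.3, 12762),
         (10.6, 13510), (59.5, 26500)]
team2 = [(23.5, 5030), (12.6, 7353), (19.4, 12314), (15.6, 8942),
         (19.2, 5315), (22.3, 30988), (34.5, 32614), (24.8, 14762),
         (31.6, 11510), (29.9, 28500)]

def team_r(pairs):
    """Split (effort, LOC) pairs into two columns and correlate them."""
    efforts, locs = zip(*pairs)
    return pearson_r(efforts, locs)

r_team1 = team_r(team1)
r_team2 = team_r(team2)
print(f"Team 1: r = {r_team1:.2f}")
print(f"Team 2: r = {r_team2:.2f}")
```

Both teams show a positive association between effort and LOC; the printed values give its strength.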

15
Example Scatter Plot
16
Correlation
  • The correlation measures the strength and
    direction of a linear relationship between two
    variables. Correlation is usually written as r.

17
Assumptions
  • The distribution of both variables is
    approximately normal.
  • The relation between the variables is linear.
    The bounding figure for the scatter plot is an
    ellipse.
  • The data are homoscedastic (homogeneity of
    variance).

18
Example with Correlation
19
Properties of Correlations
  • It makes no difference which variable is x and
    which is y.
  • Correlation requires both variables be
    quantitative.
  • Correlation is always between -1 and 1.
  • Correlation looks for linear relationships.
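
The properties above can be checked numerically. This is a small sketch under stated assumptions: the data values are made up for illustration, and `pearson_r` is a hypothetical helper implementing the standard sample correlation formula.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return ss_xy / sqrt(ss_xx * ss_yy)

# Made-up illustrative data, roughly following y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

# Swapping the roles of x and y does not change r.
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)

# Exact straight lines give the extreme values +1 and -1.
r_perfect_up = pearson_r(x, [3 * v + 1 for v in x])
r_perfect_down = pearson_r(x, [-2 * v + 5 for v in x])

print(r_xy, r_yx, r_perfect_up, r_perfect_down)
```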

20
Interpreting r
  • < .20: slight correlation, almost no relationship
  • .20 - .40: low correlation, small relationship
  • .40 - .70: moderate correlation, substantial
    relationship
  • .70 - .90: high correlation, marked relationship
  • > .90: very high correlation, solid relationship
  • (Use the absolute value of r. Hence, r = -.25 is
    a small negative relationship.)
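
The interpretation bands can be sketched as a lookup function. The function name `interpret_r` is hypothetical, and the handling of boundary values is an assumption: the slide leaves .40 and .70 ambiguous, so this sketch places them in the higher band.

```python
def interpret_r(r):
    """Return the verbal label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the bands apply to the absolute value of r
    # Assumption: exactly .40 and .70 fall in the higher band.
    if size < 0.20:
        return "slight correlation, almost no relationship"
    if size < 0.40:
        return "low correlation, small relationship"
    if size < 0.70:
        return "moderate correlation, substantial relationship"
    if size <= 0.90:
        return "high correlation, marked relationship"
    return "very high correlation, solid relationship"

print(interpret_r(-0.25))
print(interpret_r(0.91))
```

The sign of r is reported separately from the label, so r = -.25 reads as a low, negative correlation.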

21
Sample (r = .91)
22
Relationships
  • We look for patterns in scatter plots.
  • Positive
  • Negative
  • Modeling the relationship with a straight line.

23
Visualizing Correlation
r = 1.0
r = -0.54
r = 0.85
r = 0.42
http://www.psychstat.smsu.edu/introbook/sbk17.htm
24
Correlation Coefficient
  • Since visual inspection is difficult to rely on,
    we fall back on mathematical techniques to search
    for relationships.

25
  • The coefficient of correlation, r, measures the
    strength of the linear relationship that exists
    within a sample.
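
For reference, the standard textbook definition of the sample correlation coefficient is given here. It is an assumption that this matches the slide, since the slide's own formula is not in the transcript. For n paired observations (x_i, y_i) with sample means x-bar and y-bar:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```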

26
Alternately
27
Relation to Sum of Squares
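
The usual way to write r in terms of sums of squares is sketched here. The notation SS_xx, SS_yy, SS_xy is an assumption, since the slide's own formula is not in the transcript; the identities themselves are standard.

```latex
SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
\qquad
SS_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}

SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}

r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}
```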
28
Example
  • A real-estate developer is interested in
    determining the relationship between family
    income (in thousands of dollars) and the size of
    their homes (in hundreds of square feet). A
    random sample of 10 families results in the
    following data

29
Example 1
30
Example 1
31
Properties
  • -1.0 ≤ r ≤ 1.0
  • The larger the absolute value of r, the stronger
    the linear relationship.
  • r near zero indicates little or no linear
    relationship.
  • The sign of r tells whether the relationship is
    positive (direct) or negative (inverse).
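
A value of r near zero rules out only a linear relationship; the variables may still be related in some other way. The sketch below uses made-up data in which y is completely determined by x (y = x squared), yet r is zero. `pearson_r` is a hypothetical helper implementing the standard sample correlation formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return ss_xy / sqrt(ss_xx * ss_yy)

# Made-up data: x is symmetric about zero and y = x**2,
# so the relationship is exact but not linear.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

This is why the scatter plot should be inspected alongside r.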

32
Example 2