Correlation - PowerPoint PPT Presentation

About This Presentation
Title:

Correlation


Slides: 18
Provided by: osirisSun

Transcript and Presenter's Notes

Title: Correlation


1
Correlation
  • Forensic Statistics CIS205

2
Introduction
  • Chi-squared shows the strength of the relationship
    between variables when the data are counts
  • However, many variables measured in a lab are on
    a continuous scale, such as concentrations of
    chemicals, time, and most machine responses
  • The term for the strength of the relation between
    continuous variables is correlation
  • Any continuous variables which have some sort of
    systematic relationship are said to covary, and
    any variable which covaries with another is said
    to be a covariate.
  • A basic tool for the investigation of correlation
    is the scatterplot. Usually only two variables
    are plotted, but three can be accommodated.

3
Correlation Coefficient
  • A statistical measure of correlation is called
    the correlation coefficient, which can only take
    on values between -1 and 1.
  • Both 1 and -1 mean that the variables are
    perfectly related
  • 1 means that as one variable increases, so does
    the other
  • -1 means that as one variable increases, the
    other decreases.
  • 0 means that the variables are not linearly
    related (they may still be related in a
    non-linear way).
  • The strength of a relationship is independent of
    its form. Most commonly, relationships are linear
    (plotting one variable against the other yields
    a straight line); the next most common are
    log-linear (a graph of one variable against the
    logarithm of the other is linear).
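The points on this slide can be checked numerically. The sketch below is a plain-Python illustration (the function name `pearson_r` and all of the synthetic data are hypothetical, not from the slides). It shows that exact linear relationships give +1 or -1, that an exponential decay gives a negative value that is not -1 until the logarithm is taken, and that a perfect but symmetric curved relationship (y = x²) gives exactly 0, which is why 0 means "no linear relationship" rather than "no relationship at all".

```python
import math


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical synthetic data, for illustration only
x = list(range(10))
up = [2 * xi + 1 for xi in x]              # perfectly linear, increasing
down = [-3 * xi for xi in x]               # perfectly linear, decreasing
decay = [math.exp(-0.5 * xi) for xi in x]  # exponential decay: curved
log_decay = [math.log(v) for v in decay]   # log of the decay: a straight line

sym = list(range(-3, 4))
parabola = [xi ** 2 for xi in sym]         # exact, but non-linear and symmetric

print(round(pearson_r(x, up), 6))          # 1.0
print(round(pearson_r(x, down), 6))        # -1.0
print(round(pearson_r(x, decay), 3))       # negative, but clearly not -1
print(round(pearson_r(x, log_decay), 6))   # -1.0
print(pearson_r(sym, parabola))            # 0.0
```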

4
(No Transcript)
5
Ageing properties of the dye methyl violet (Grim et al., 2002)
  • This example will be used to demonstrate the
    process involved in the calculation of a linear
    correlation coefficient
  • Laser desorption mass spectrometry was used to
    examine the ageing properties of the dye methyl
    violet, a dye used in inks from the 1950s.
  • Documents written in methyl violet ink were
    artificially aged with ultraviolet radiation.
  • After various times, the average molecular weight
    of the methyl violet compound was measured.
  • The raw data are shown in Table 6.1 and plotted
    in Figure 6.2.

6
Table 6.1. Average molecular weight of the dye
methyl violet and UV irradiation time from an
accelerated ageing experiment.

Time (min)    Weight (Da)
     0.0        367.20
    15.3        368.97
    30.6        367.42
    45.3        366.19
    60.2        365.91
    75.5        365.68
    90.6        365.12
   105.7        363.59
7
(No Transcript)
8
Correlation coefficient r
  • Visual inspection of Fig. 6.2 suggests that there
    is a negative linear correlation between time and
    mean molecular weight.
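  • A suitable measure of this linear correlation is
    the (Pearson) correlation coefficient r:

    $$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} $$

    where x̄ and ȳ are the means of x and y.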
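The formula for r translates directly into code. The sketch below is a plain-Python illustration (the name `pearson_r` is ours, not from the slides); applied to the Table 6.1 data it gives r ≈ -0.897, which these slides quote as -0.89. In Python 3.10 and later, the standard library's `statistics.correlation(x, y)` computes the same Pearson coefficient.

```python
import math

# Table 6.1: UV irradiation time (min) and average molecular weight (Da)
time_min = [0.0, 15.3, 30.6, 45.3, 60.2, 75.5, 90.6, 105.7]
weight_da = [367.20, 368.97, 367.42, 366.19, 365.91, 365.68, 365.12, 363.59]


def pearson_r(x, y):
    """r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))"""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(time_min, weight_da)
print(f"r = {r:.3f}")  # r = -0.897
```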

9
Time x   x - x̄    (x - x̄)²   Weight y   y - ȳ   (y - ȳ)²   (x - x̄)(y - ȳ)
(min)                         (Da)
   0.0   -52.90    2798.41     367.20    0.94     0.883       -49.72
  15.3   -37.61    1414.51     368.97    2.71     7.344      -101.92
  30.6   -22.33     498.63     367.42    1.16     1.345       -25.90
  45.3    -7.61      57.91     366.19   -0.07     0.005         0.53
  60.2     7.33      53.73     365.91   -0.35     0.122        -2.57
  75.5    22.61     511.21     365.68   -0.58     0.336       -13.11
  90.6    37.67    1419.03     365.12   -1.14     1.300       -42.94
 105.7    52.84    2792.06     363.59   -2.67     7.129      -141.08

x̄ = 52.89, Σ(x - x̄)² = 9545.50; ȳ = 366.26, Σ(y - ȳ)² = 18.465;
Σ(x - x̄)(y - ȳ) = -376.72
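The columns of the calculation table can be generated row by row, as in the plain-Python sketch below (variable names are our own). Starting from the rounded times printed in Table 6.1, the totals come out slightly different from the slide (for example Σ(x - x̄)² ≈ 9540.40 rather than 9545.50), presumably because the slide's deviations were computed from unrounded times; the last line plugs the slide's own totals into the formula for r.

```python
import math

# Table 6.1 data (rounded values as printed on the slide)
time_min = [0.0, 15.3, 30.6, 45.3, 60.2, 75.5, 90.6, 105.7]
weight_da = [367.20, 368.97, 367.42, 366.19, 365.91, 365.68, 365.12, 363.59]

n = len(time_min)
mean_x = sum(time_min) / n
mean_y = sum(weight_da) / n

sxx = syy = sxy = 0.0
for x, y in zip(time_min, weight_da):
    dx, dy = x - mean_x, y - mean_y
    sxx += dx * dx  # column (x - mean x)^2
    syy += dy * dy  # column (y - mean y)^2
    sxy += dx * dy  # column (x - mean x)(y - mean y)
    print(f"{x:6.1f} {dx:8.2f} {dx * dx:9.2f} {y:8.2f} {dy:6.2f} {dy * dy:7.3f} {dx * dy:8.2f}")

print(f"mean x = {mean_x:.2f}, mean y = {mean_y:.2f}")
print(f"Sxx = {sxx:.2f}, Syy = {syy:.3f}, Sxy = {sxy:.2f}")

# r from the column totals printed on the slide
r_from_table = -376.72 / math.sqrt(9545.50 * 18.465)
print(f"r from slide totals = {r_from_table:.3f}")  # r from slide totals = -0.897
```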
10
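Substituting these values into the equation for r,
we have

$$ r = \frac{-376.72}{\sqrt{9545.50 \times 18.465}} = \frac{-376.72}{419.83} \approx -0.897 $$

(the slides use the truncated value -0.89).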
  • This means that as the irradiation time increases
    the average molecular weight of methyl violet
    ions decreases, and as -0.89 is close to -1, the
    negative linear relationship is quite strong

11
Significance tests for correlation coefficients
  • A linear correlation coefficient of -0.89 sounds
    quite high, but is it significantly high? Could
    such a coefficient arise by chance in data drawn
    randomly from a bivariate normal distribution in
    which the true correlation is zero?
  • Also, what about the effect of sample size? It
    makes sense that a high coefficient based on lots
    of x,y pairs is somehow more significant than an
    equal correlation based on only a few
    observations.
  • For the null hypothesis that the population
    correlation coefficient is 0, a suitable test
    statistic is

    $$ t = \frac{r\sqrt{df}}{\sqrt{1 - r^2}} $$
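The test statistic is easy to compute, and doing so for several sample sizes makes the earlier point about sample size concrete: the same r gives a larger |t| (stronger evidence against a zero correlation) when it is based on more x,y pairs. The sketch below is illustrative; the function name `correlation_t` is hypothetical, and it assumes df = n - 2, as in the methyl violet example.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: population correlation = 0, using df = n - 2."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


# The same r = -0.89 evaluated for different numbers of (x, y) pairs
for n in (5, 8, 20, 50):
    print(f"n = {n:2d}: t = {correlation_t(-0.89, n):.2f}")
```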

12
Substituting for the methyl violet example
  • The test statistic is

    $$ t = \frac{r\sqrt{df}}{\sqrt{1 - r^2}} $$

  • t is the abscissa (horizontal axis) of the
    t-distribution
  • df is the degrees of freedom, equal to n - 2
    (here 6, because we have 8 x,y pairs)
  • The linear correlation coefficient was -0.89, so

    $$ t = \frac{-0.89\sqrt{6}}{\sqrt{1 - (-0.89)^2}} \approx -4.78 $$

  • If we look at the t-distribution table for
    df = 6, we see that 95% of the area lies within
    ±2.447.
  • Our value of -4.78 lies beyond -2.447, so we can
    say that the correlation coefficient is
    significant at the 95% confidence level.
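The whole significance test for the methyl violet example fits in a few lines. In this sketch the critical value 2.447 is hard-coded from the t-table quoted above rather than computed, to keep the example dependency-free; if SciPy is available, `scipy.stats.t.ppf(0.975, 6)` returns the same two-tailed 5% critical value (about 2.447).

```python
import math

r = -0.89  # linear correlation coefficient from the methyl violet example
n = 8      # number of (x, y) pairs
df = n - 2

t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Two-tailed 5% critical value of t for df = 6, read from a t-table
t_crit = 2.447

significant = abs(t) > t_crit
print(f"t = {t:.2f}, critical = ±{t_crit}, significant at 95%: {significant}")
# t = -4.78, critical = ±2.447, significant at 95%: True
```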

13
Correlation coefficients for non-linear data
  • Andrasko and Ståhling measured three compounds
    associated with the discharge of firearms
    (naphthalene, TEAC-2 and nitroglycerin) over a
    period of time by solid-phase microextraction
    (SPME) of the gaseous residue from the expended
    cartridge.
  • They found that the concentrations of these
    compounds decreased with time, and that this
    property could be used to estimate the time
    since discharge for this type of cartridge.
  • Table 6.3 gives the peak area for nitroglycerin
    and the time elapsed since discharge for a
    Winchester SKEET 100 cartridge stored at 7 °C;
    the data are shown as scatterplots in Figure 6.3.

14
Table 6.3.

Time since discharge (days)   Nitroglycerin (peak height)
           1.21                         218.34
           2.42                         216.16
           3.62                         100.00
           4.69                          75.55
           7.49                          56.52
           9.42                          50.62
          11.60                          31.00
          14.69                          41.44
          21.50                          15.53
          25.70                          14.63
          29.86                          10.41
          37.20                           5.16
          42.42                           7.26
15
(No Transcript)
16
Log-linear relationships
  • A common model for loss in chemistry (e.g.
    radioactive decay) is exponential decay, also
    called inverse exponential decay, which entails a
    log-linear relationship between the two
    variables: the logarithm of the amount remaining
    falls linearly with time
  • The right-hand scatterplot of Figure 6.3 shows
    the natural logarithm (log to the base e) of the
    nitroglycerin peak height against time. Here the
    data look much more linear.
  • The linear correlation coefficient of the
    log-transformed data is -0.95, which is quite
    high and suggests that this may be a reasonable
    transformation of the variables.
  • The calculations for the log-linear correlation
    coefficient are exactly the same as in Table
    6.2, except that the natural logarithm of the y
    variable is used rather than the untransformed y.
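Applying the same linear correlation formula to the Table 6.3 data, once on the raw peak heights and once on their natural logarithms, shows the effect of the transformation. In this plain-Python sketch (the helper name `pearson_r` is ours), the log-transformed coefficient reproduces the -0.95 quoted above, while the untransformed data give a noticeably weaker value (roughly -0.73 by our calculation).

```python
import math

# Table 6.3: time since discharge (days) and nitroglycerin peak height
days = [1.21, 2.42, 3.62, 4.69, 7.49, 9.42, 11.60, 14.69, 21.50, 25.70, 29.86, 37.20, 42.42]
peak = [218.34, 216.16, 100.00, 75.55, 56.52, 50.62, 31.00, 41.44, 15.53, 14.63, 10.41, 5.16, 7.26]


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r_raw = pearson_r(days, peak)                         # untransformed y
r_log = pearson_r(days, [math.log(p) for p in peak])  # natural log of y

print(f"r (peak height vs time)    = {r_raw:.2f}")
print(f"r (ln peak height vs time) = {r_log:.2f}")
```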

17
The coefficient of determination
  • The coefficient of determination is a direct
    measure of how much of the variance in one of
    the covariates can be attributed to the other.
  • We can imagine that the total variance in the
    nitroglycerin peak is made up of two parts: the
    part attributable to the relationship with x
    (time), and the part that can be seen as random
    noise.
  • The coefficient of determination describes what
    proportion of the variance is attributable to
    the relationship with time.
  • The coefficient of determination is simply the
    square of the correlation coefficient.
  • If r = -0.95, r² = 0.90.
  • Often the coefficient of determination is
    expressed as a percentage; in the example above
    this means that 90% of the variance in the
    (log-transformed) nitroglycerin peak is
    attributable to time.
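The split of the variance into an explained part and a noise part can be made explicit by fitting a least-squares straight line to ln(peak height) against time. The slides do not fit a line; this is an assumption added for illustration. For such a fit, the proportion of variance explained, 1 - SS_res/SS_tot, equals r² exactly, which the sketch below checks on the Table 6.3 data.

```python
import math

# Table 6.3 data; y is the natural log of the nitroglycerin peak height
days = [1.21, 2.42, 3.62, 4.69, 7.49, 9.42, 11.60, 14.69, 21.50, 25.70, 29.86, 37.20, 42.42]
peak = [218.34, 216.16, 100.00, 75.55, 56.52, 50.62, 31.00, 41.44, 15.53, 14.63, 10.41, 5.16, 7.26]
log_peak = [math.log(p) for p in peak]

n = len(days)
mx = sum(days) / n
my = sum(log_peak) / n
sxx = sum((x - mx) ** 2 for x in days)
syy = sum((y - my) ** 2 for y in log_peak)  # total sum of squares, SS_tot
sxy = sum((x - mx) * (y - my) for x, y in zip(days, log_peak))

r = sxy / math.sqrt(sxx * syy)

# Least-squares line ln(peak) = a + b * days (illustrative assumption)
b = sxy / sxx
a = my - b * mx
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(days, log_peak))  # "noise" part
explained = 1 - ss_res / syy                                          # part due to time

print(f"r = {r:.2f}, r^2 = {r * r:.2f}, 1 - SS_res/SS_tot = {explained:.2f}")
```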