Regression vs. Correlation - PowerPoint PPT Presentation

About This Presentation
Title:

Regression vs. Correlation


Slides: 16
Provided by: cmay2

Transcript and Presenter's Notes


1
Regression vs. Correlation
Both
• Two variables
• Continuous data
Regression
• Change in X causes change in Y
• Independent and dependent variables
• Or: predict Y based on X
Correlation
• No dependence (causation) assumed
• Estimate the degree to which 2 variables vary together
2
Correlation: more on bivariate statistics
• No dependence (causation) assumed
• Can call variables X & Y or X1 & X2
• Are the two variables independent, or do they covary?
3
Purpose of investigator | Nature of variables: Y random, X fixed | Nature of variables: both random
Establish and estimate dependence of Y upon X; describe functional relationship or predict Y from X | Model I regression | Model II regression (with few exceptions, e.g. prediction)
Establish and estimate association (interdependence) between X & Y | Meaningless | Correlation coefficient; significance only if X, Y normally distributed
Adapted from Sokal & Rohlf, p. 559
4
Visualize Correlation
[Figure: two scatterplots of Y (X2) against X1. Positive: increase in X associated with increase in Y. Negative: increase in X associated with decrease in Y.]
5
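No correlation
[Figure: two scatterplots of Y (X2) against X1 showing no correlation: one with the points in a horizontal band, one with the points in a vertical band.]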
6
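Pearson product-moment correlation coefficient
Summed products of deviations of x & y, divided by the square root of the product of the sums of squares (SS) of X and Y:

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{\sum xy}{\sqrt{SS_X \, SS_Y}} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \, \sum (Y-\bar{Y})^2}} $$

where $x = X-\bar{X}$ and $y = Y-\bar{Y}$ are deviations from the means.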
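As a rough illustration of the formula on this slide, the following Python sketch computes r from the summed products of deviations. The function name `pearson_r` and the five data points are made up for illustration and are not part of the original lecture, which uses SAS.

```python
import math

def pearson_r(xs, ys):
    """Pearson r as (sum of products of deviations) / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]          # deviations of X from its mean
    dy = [y - y_bar for y in ys]          # deviations of Y from its mean
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)         # sum of squares of X
    ss_y = sum(b * b for b in dy)         # sum of squares of Y
    return sum_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data, chosen only so the arithmetic is easy to follow
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)
print(round(r, 4))  # 0.8
```

Here the summed products equal 8 and both sums of squares equal 10, so r = 8 / sqrt(10 × 10) = 0.8.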
7
Equivalent calculations (1)

$$ r = \frac{\sum xy}{(n-1)\, s_x s_y} $$

where $s_x$ = SD of X and $s_y$ = SD of Y.
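The standard-deviation form can be checked numerically with a short Python sketch. It assumes that the SDs are sample SDs (n − 1 denominator), which is what `statistics.stdev` returns; the function name and data are hypothetical.

```python
import statistics

def pearson_r_from_sd(xs, ys):
    """Pearson r as sum of products of deviations / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_x = statistics.stdev(xs)   # sample SD of X (n - 1 denominator)
    s_y = statistics.stdev(ys)   # sample SD of Y (n - 1 denominator)
    return sum_xy / ((n - 1) * s_x * s_y)

# Hypothetical data for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r_sd = pearson_r_from_sd(x, y)
print(round(r_sd, 4))  # 0.8
```

Because (n − 1) s_x s_y equals the square root of SS_X × SS_Y, this gives the same value as the sums-of-squares form.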
8
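Equivalent calculations (2)

$$ r^2 = \frac{\sum (\hat{Y}_i-\bar{Y})^2}{\sum (Y_i-\bar{Y})^2} = \frac{\text{regression SS}}{\text{total SS}} $$

$$ r = \sqrt{r^2} = \sqrt{\frac{\text{regression SS}}{\text{total SS}}} $$

where $\hat{Y}_i$ is the value of Y predicted by the regression line; r takes the sign of the slope.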
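The ratio of regression SS to total SS can be illustrated with a small Python sketch. It assumes an ordinary least-squares line of Y on X, and it gives r the sign of the fitted slope; all names and data here are hypothetical.

```python
import math

def r_from_sums_of_squares(xs, ys):
    """Return (r_squared, r) using regression SS / total SS."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    slope = sum_xy / ss_x                      # least-squares slope
    intercept = y_bar - slope * x_bar          # least-squares intercept
    y_hat = [intercept + slope * x for x in xs]
    regression_ss = sum((yh - y_bar) ** 2 for yh in y_hat)
    total_ss = sum((y - y_bar) ** 2 for y in ys)
    r_squared = regression_ss / total_ss
    r = math.copysign(math.sqrt(r_squared), slope)  # sign follows the slope
    return r_squared, r

# Hypothetical data for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r_squared, r = r_from_sums_of_squares(x, y)
print(round(r_squared, 4), round(r, 4))  # 0.64 0.8
```

For these data the regression SS is 6.4 and the total SS is 10, so r² = 0.64 and r = 0.8.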
9
Testing significance
H0: ρ = 0, where ρ is the true population parameter
Assumes that data come from a bivariate normal distribution
10
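$$ t = \frac{r}{s_r} $$

where $s_r$ is the SE of r:

$$ s_r = \sqrt{\frac{1-r^2}{n-2}} $$

Reject the null if $t_{calc} > t_{\alpha(2),\,\nu}$, with $\nu = n-2$ degrees of freedom.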
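The test on this slide can be sketched in Python as follows. The values r = -0.21792 and n = 99 are the temp vs. DO entry from the correlation output in this transcript. The critical value 1.985 is an assumption: an approximate two-tailed 5% value read from a t table for 97 degrees of freedom. Because the test is two-tailed, the sketch compares the absolute value of t with the critical value.

```python
import math

def t_for_correlation(r, n):
    """Return (t, s_r) for H0: rho = 0, with s_r = sqrt((1 - r^2) / (n - 2))."""
    if n <= 2:
        raise ValueError("need n > 2 so that n - 2 degrees of freedom is positive")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    s_r = math.sqrt((1.0 - r * r) / (n - 2))   # standard error of r
    return r / s_r, s_r

r, n = -0.21792, 99            # temp vs. DO from the correlation output
t_calc, s_r = t_for_correlation(r, n)
t_crit = 1.985                 # assumed table value for alpha(2) = 0.05, 97 df
reject_null = abs(t_calc) > t_crit
print(f"s_r = {s_r:.4f}, t = {t_calc:.3f}, reject H0: {reject_null}")
```

The computed t is about -2.2, which exceeds the critical value in absolute terms, in line with the reported probability of 0.0302 for that pair.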
11
data start;
  infile 'C:\Documents and Settings\cmayer3\My Documents\teaching\Biostatistics\Lectures\monitoring data for corr.csv' dlm=',' DSD;
  input year day site depth temp DO spCond turb pH Kpar secchi alk Chla;
options ls=180;
proc print;

data one; set start;
options ls=100;
proc corr; var temp DO spCond turb pH Kpar secchi alk Chla;   /* Correlations on raw data */

data two; set start;
  lnturb=log(turb);          /* Create new variables by transformation */
  lnsecchi=log(secchi);
  lgturb=log10(turb);
  lgsecchi=log10(secchi);
  sqturb=sqrt(turb);
  sqsecchi=sqrt(secchi);
proc print;

data three; set two;         /* Correlations on transformed data */
proc corr; var lnturb lnsecchi;
proc corr; var lgturb lgsecchi;
proc corr; var sqturb sqsecchi;

data four; set two;          /* Plot raw and transformed */
options ls=100;
proc plot;
  plot turb*secchi;
  plot lnturb*lnsecchi;
  plot lgturb*lgsecchi;
  plot sqturb*sqsecchi;
run;
12
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

              temp        DO    spCond      turb        pH      Kpar    secchi       alk      Chla
temp       1.00000  -0.21792   0.06538  -0.14523   0.35328  -0.23911   0.15689   0.11311   0.37612
                      0.0302    0.5202    0.1515    0.0003    0.1541    0.1209    0.3895    0.0001
                99        99        99        99        99        37        99        60        99

DO        -0.21792   1.00000   0.01542  -0.21550   0.50679  -0.24013  -0.06504   0.15790   0.38699
            0.0302              0.8796    0.0322    <.0001    0.1523    0.5224    0.2282    <.0001
                99        99        99        99        99        37        99        60        99

spCond     0.06538   0.01542   1.00000   0.48214  -0.29017   0.78394  -0.51332   0.74021   0.21367
            0.5202    0.8796              <.0001    0.0036    <.0001    <.0001    <.0001    0.0337
                99        99        99        99        99        37        99        60        99

turb      -0.14523  -0.21550   0.48214   1.00000  -0.33727   0.89941  -0.50336   0.47441   0.07208
            0.1515    0.0322    <.0001              0.0006    <.0001    <.0001    0.0001    0.4783
                99        99        99        99        99        37        99        60        99

pH         0.35328   0.50679  -0.29017  -0.33727   1.00000  -0.56355   0.14049  -0.14061   0.61033
            0.0003    <.0001    0.0036    0.0006              0.0003    0.1654    0.2839    <.0001
                99        99        99        99        99        37        99        60        99

Kpar      -0.23911  -0.24013   0.78394   0.89941  -0.56355   1.00000  -0.76680   0.85542   0.04579
            0.1541    0.1523    <.0001    <.0001    0.0003              <.0001    <.0001    0.7878
                37        37        37        37        37        37        37        29        37

secchi     0.15689  -0.06504  -0.51332  -0.50336   0.14049  -0.76680   1.00000  -0.49649  -0.30918
            0.1209    0.5224    <.0001    <.0001    0.1654    <.0001              <.0001    0.0018
                99        99        99        99        99        37        99        60        99

alk        0.11311   0.15790   0.74021   0.47441  -0.14061   0.85542  -0.49649   1.00000   0.12410
            0.3895    0.2282    <.0001    0.0001    0.2839    <.0001    <.0001              0.3448
                60        60        60        60        60        29        60        60        60

Chla       0.37612   0.38699   0.21367   0.07208   0.61033   0.04579  -0.30918   0.12410   1.00000
            0.0001    <.0001    0.0337    0.4783    <.0001    0.7878    0.0018    0.3448
                99        99        99        99        99        37        99        60        99
13
Nonparametric statistics
• Sometimes called distribution-free statistics because they do not require that the data fit a normal distribution
• Many nonparametric procedures are based on ranked data. Data are ranked by ordering them from lowest to highest and assigning them, in order, the integer values from 1 to the sample size.
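The ranking step described on this slide might look like the Python sketch below. The slide does not say how ties are handled; giving tied values the average of the ranks they span is a common convention and is an assumption here, as are the function name and the sample values.

```python
def rank_data(values):
    """Rank values from lowest (1) to highest (n); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1   # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

# Hypothetical measurements; the two 1.5 values tie for ranks 1 and 2
sample = [3.2, 1.5, 8.7, 1.5, 4.0]
ranks = rank_data(sample)
print(ranks)  # [3.0, 1.5, 5.0, 1.5, 4.0]
```

With no ties, the ranks are simply the integers 1 to n, as on the slide.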
14
Some Commonly Used Statistical Tests
Normal theory based test | Corresponding nonparametric test | Purpose of test
t test for independent samples | Mann-Whitney U test; Wilcoxon rank-sum test | Compares two independent samples
Paired t test | Wilcoxon matched pairs signed-rank test | Examines a set of differences
Pearson correlation coefficient | Spearman rank correlation coefficient | Assesses the linear association between two variables
One way analysis of variance (F test) | Kruskal-Wallis analysis of variance by ranks | Compares three or more groups
Two way analysis of variance | Friedman two way analysis of variance | Compares groups classified by two different factors
From http://www.tufts.edu/~gdallal/npar.htm
15
Data transformations
• Data transformation can correct deviation from normality and uneven variance (heteroscedasticity)
• See chapter 13 in Zar
• Pretty much: whatever works, works. Some common ones:
   for percentages or proportions, use the arcsine of the square root
   log10 for density (per m2)
• The right transformation can allow you to use parametric statistics
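The two transformations named on this slide can be sketched in Python as below. The function names and example values are hypothetical. The sketch assumes proportions lie between 0 and 1 and that densities are strictly positive, since log10 is undefined at zero; how zeros are handled is not stated in the slides.

```python
import math

def arcsine_sqrt(proportion):
    """Arcsine of the square root, for a proportion between 0 and 1 (radians)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return math.asin(math.sqrt(proportion))

def log10_density(density):
    """log10 transform for a strictly positive density."""
    if density <= 0:
        raise ValueError("density must be positive for a log transform")
    return math.log10(density)

# Hypothetical values for illustration only
proportions = [0.0, 0.25, 1.0]
densities = [1.0, 100.0, 2500.0]
transformed_p = [arcsine_sqrt(p) for p in proportions]
transformed_d = [log10_density(d) for d in densities]
print([round(v, 4) for v in transformed_p])
print([round(v, 4) for v in transformed_d])
```

A percentage would first be divided by 100 to give a proportion before applying `arcsine_sqrt`.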