Title: Regression Lecture 8
1Regression Lecture 8
2Aims for Today - Regression
- Drawing lines on scatterplots
- The regression line: predicting values
- Correlation
- Rank-based correlation
- Break/Handout
- Examples by Dan
- Chile and maybe being hit by a car
- How-tos
4Scatter Plot
- Plotting 2 continuous-ish variables
- Exploring their association
- One of the most used and most useful techniques
in science.
6Several ways to make one in SPSS.
7Default shows what appears to be a negative
relationship, but the graphs can be improved.
14Graphing 3 Variables (London et al., 2007)
- 4- to 9-year-olds
- 2-week recall
- 10-month recall
15Can you see the 8s?
18Is this the right approach?
- Fitting a straight line (a linear relationship)
19Finding the Regression Line
- Very general procedure (easily expanded)
- Simple linear regression
- Easiest way is just to draw a straight line yourself
- A more formal method has some value
- Writing the line as $y_i = \beta_0 + \beta_1 x_i + e_i$
- and finding the $\beta_0$ and $\beta_1$ which minimize $\sum e_i^2$
- Least squares is also used in the t test and the mean
- (least absolute value is used for the median)
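The least squares idea above can be checked numerically. The sketch below uses simulated data (not from the lecture) and made-up names (`sse`, `by_search`, `by_lm`); it searches for the intercept and slope that minimize the sum of squared residuals and compares them with what `lm` returns, then repeats the idea for the mean and the median.

```r
# Simulated data (hypothetical; not from the lecture)
set.seed(8)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)

# Sum of squared residuals for a candidate intercept b[1] and slope b[2]
sse <- function(b) sum((y - (b[1] + b[2] * x))^2)

# Numerical search for the minimizing pair, starting from a flat line at mean(y)
by_search <- optim(c(mean(y), 0), sse, method = "BFGS",
                   control = list(reltol = 1e-12))$par

# lm() solves the same minimization analytically
by_lm <- unname(coef(lm(y ~ x)))
print(rbind(by_search, by_lm))

# Same idea for one variable: the mean minimizes squared deviations,
# the median minimizes absolute deviations
v <- c(1, 2, 3, 4, 100)
best_sq  <- optimize(function(m) sum((v - m)^2), range(v))$minimum
best_abs <- optimize(function(m) sum(abs(v - m)), range(v))$minimum
print(c(mean = mean(v), best_sq = best_sq,
        median = median(v), best_abs = best_abs))
```

The two rows printed for the regression should agree to several decimal places, which is one way to see that `lm` is doing least squares.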
21- Minimizing the squared residuals: $\min \sum e_i^2$
- Is least squares regression better than eyeballing it?
- Are there better formal methods?
22Do you need to know the equations for $\beta_0$ and $\beta_1$? Not really.
Would they be worth seeing once? Probably.
Just look at them, don't write them down.
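The slide's own equations did not survive in this transcript. For reference, the standard least squares estimates for simple linear regression are given below; the hat and bar notation is an assumption about how the slide wrote them.

```latex
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

The second equation says the fitted line always passes through the point of means $(\bar{x}, \bar{y})$.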
23Regressions are sometimes used to predict values
(data based on Tytherleigh, 2002)
24Running a regression in R
lm is for Linear Model
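A minimal sketch of fitting and predicting with `lm`, assuming simulated data in a data frame called `dat` with columns `x` and `y` (these names and values are illustrative, not the Tytherleigh data).

```r
# Simulated data (hypothetical; not the data shown in the lecture)
set.seed(24)
x <- rnorm(40, mean = 50, sd = 10)
y <- 10 + 0.8 * x + rnorm(40, sd = 5)
dat <- data.frame(x = x, y = y)

# Formula reads "y is modelled by x"; the intercept is included by default
fit <- lm(y ~ x, data = dat)
print(coef(fit))   # intercept (beta0) and slope (beta1)

# Predicted y for two new x values
new_x <- data.frame(x = c(40, 60))
pred <- predict(fit, newdata = new_x)
print(pred)
```

To see the line on the scatterplot, `plot(dat$x, dat$y)` followed by `abline(fit)` draws the points and then the fitted line.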
26 $r^2$ or adjusted $r^2$, and $r$ or $R$
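A short sketch of where these numbers live in R output, using simulated data. With one predictor, $r^2$ from the regression equals the squared correlation; the adjusted value is computed here by hand with the usual formula (`adj_by_hand` is an illustrative name) and compared with what `summary` reports.

```r
# Simulated data (hypothetical)
set.seed(26)
x <- rnorm(30)
y <- 0.6 * x + rnorm(30)

fit <- lm(y ~ x)
s <- summary(fit)
r <- cor(x, y)

# Adjusted r-squared by hand: n observations, p predictors (here p = 1)
n <- length(y)
p <- 1
adj_by_hand <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

print(c(r = r, r_squared = s$r.squared, adjusted = s$adj.r.squared))
```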
28Assessing the Fit: The Correlation
29Equation bit
- Top part determines whether positive or negative. If xi and yi are on the same side of their means, positive, otherwise negative.
- If as one goes up, the other goes up, positive.
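The equation itself is missing from this transcript. The standard formula for Pearson's correlation, which matches the description of the "top part" above, is:

```latex
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}
```

Each product in the numerator is positive when $x_i$ and $y_i$ fall on the same side of their means and negative otherwise; the denominator is always positive, so the numerator sets the sign.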
30Correlation: Strength of the linear relationship
- Can get to it in several ways.
- The correlation squared is the proportion of shared variance.
- The correlation can range only from -1 to 1.
- Does a correlation between x and y mean x caused y?
- Does a correlation between x and y mean that there is some causal relationship in the network of hypotheses that include x and y?
- Are the most parsimonious ones x -> y and y -> x?
31Significance Testing
- $H_0$: $\rho$ (rho) $= 0$
- Almost always use two-tailed tests
- You must know the sample size
- r = 0.1 is significant with n = 500 at 5%
- r = 0.4 is not significant with n = 20 at 5%
- (Cohen's sizes: .1 small, .3 medium, .5 large)
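The two examples above can be checked with the usual t test for a correlation. This is a sketch: `r_test` is a made-up helper, not a base R function.

```r
# Two-tailed test of H0: rho = 0, given a sample correlation r and sample size n
r_test <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p <- 2 * pt(-abs(t_stat), df = n - 2)
  c(t = t_stat, df = n - 2, p = p)
}

res_500 <- r_test(0.1, 500)  # small correlation, large sample
res_20  <- r_test(0.4, 20)   # larger correlation, small sample
print(round(rbind(res_500, res_20), 3))
```

The first p-value comes out below .05 and the second above it, which is why the sample size has to be reported alongside r.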
32Significance Testing and Confidence Intervals: The equations
with df = n - 2, and df = 1, n - 2
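The equations on this slide are not in the transcript. The degrees of freedom listed match the standard t and F tests for a correlation, so the slide most likely showed something like the following (an assumption about the exact form used):

```latex
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}, \quad df = n - 2
\qquad\qquad
F = \frac{r^2 (n - 2)}{1 - r^2}, \quad df = 1,\; n - 2
```

Here $F = t^2$, so the two tests give the same two-tailed p-value.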
33Making Confidence Intervals
- Several programs on the web: http://glass.ed.asu.edu/stats/analysis/rci.html
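A confidence interval for r can also be made directly in R. The sketch below assumes the common Fisher z approach (the transcript does not say which method the lecture used); `r_ci` is a made-up helper, and its result is compared with the interval printed by `cor.test`.

```r
# Confidence interval for a correlation via Fisher's z transformation
r_ci <- function(r, n, level = 0.95) {
  z <- atanh(r)                       # transform r to z
  se <- 1 / sqrt(n - 3)               # approximate standard error of z
  crit <- qnorm(1 - (1 - level) / 2)  # 1.96 for a 95% interval
  tanh(z + c(-1, 1) * crit * se)      # back-transform the limits to r
}

# Simulated data (hypothetical)
set.seed(33)
x <- rnorm(25)
y <- 0.5 * x + rnorm(25)

by_hand  <- r_ci(cor(x, y), length(x))
built_in <- as.numeric(cor.test(x, y)$conf.int)
print(rbind(by_hand, built_in))
```

Because the limits are back-transformed with `tanh`, they can never fall outside -1 to 1.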
35Notice the Normal and Basic Bootstraps give impossible upper bounds (greater than 1).
BCa is very similar to the asymptotic methods.
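A sketch of how bootstrap intervals like these can be produced, assuming the `boot` package (shipped with standard R installations) and simulated data rather than the lecture's. `cor_stat` is an illustrative name.

```r
library(boot)

# Simulated, strongly correlated data (hypothetical)
set.seed(1)
n <- 30
x <- rnorm(n)
y <- 0.9 * x + rnorm(n, sd = 0.4)
d <- data.frame(x = x, y = y)

# Statistic for boot(): the correlation in the resampled rows
cor_stat <- function(data, idx) cor(data$x[idx], data$y[idx])

b <- boot(d, statistic = cor_stat, R = 2000)
ci <- boot.ci(b, type = c("norm", "basic", "perc", "bca"))
print(ci)
```

The normal and basic intervals are built by adding and subtracting around the estimate, so nothing stops them from passing 1 when r is high; the percentile and BCa limits are taken from the resampled correlations themselves and so stay in range.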
40 r = .64; w/o outlier, r = .92; w/o influential point, r = .38
41Assumptions for Significance
- Random sampling
- It must make sense to talk about the response variable (the DV) as being continuous.
- No weird patterns (or non-linearity in general) in residuals. Variance of residuals homoscedastic (i.e., not varying by other variables; if it varies, it is heteroscedastic).
- Examination of outliers
42What to do if assumptions are not met (to get the data, install and load mrt, then data(crime) and attach)
43Rank-Based Correlation
- Spearman's rho
- Rank the data and use Pearson's (with average ranks for ties).
- r = .94 and Spearman's rS = .78.
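A small sketch of "rank the data and use Pearson's", with made-up numbers that include one extreme point. It is not the lecture's data, so the values differ from those on the slide.

```r
# Made-up data with one extreme observation
x <- c(1, 2, 3, 4, 5, 6, 7, 50)
y <- c(2, 1, 4, 3, 6, 5, 8, 60)

pearson  <- cor(x, y)                       # pulled up by the extreme point
spearman <- cor(x, y, method = "spearman")  # built-in rank correlation
on_ranks <- cor(rank(x), rank(y))           # Pearson on the ranks; ties get average ranks

print(c(pearson = pearson, spearman = spearman, on_ranks = on_ranks))
```

The last two values are identical, and both are lower than Pearson's r here because ranking removes the leverage of the extreme point.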
44In SPSS and R, just tick a box or change the method.
Doesn't print a confidence interval.
45- Same correlation estimate.
- But the CI really does meet appropriate
assumptions.
47Break Time
- Short break, we have a lot to get through afterwards.
- In 4 groups
- Look at the handout that I am about to give you. Discuss how you would report your findings in a scientific journal versus People magazine. Are there any other statistics you would want to do?
- Talk about what you wrote for: Suppose an undergraduate said "Since it is for looking at differences among means, why is it called an Analysis of Variance?"
48Some Examples
- Chile Heat: to discuss re-expression and what to do with outliers.
- Automobile Accidents: to discuss using theory to guide your statistics.
49Are smaller chiles hotter?
- How to measure length and heat.
- Length skewed
50Testing Normality
51
par(mfrow = c(1, 2))
qqnorm(LENGTH)
qqline(LENGTH)
qqnorm(log(LENGTH * 2.54))
qqline(log(LENGTH * 2.54))
par(mfrow = c(1, 1))
52Measuring Heat: Scoville units or the number of chiles?
55Command Summary
- r1 <- lm(HEAT ~ LENGTH)
- r2 <- lm(HEAT[LENGTH < 30] ~ LENGTH[LENGTH < 30])
- r3 <- lm(HEAT ~ log(LENGTH * 2.54))
58plot(r1): Nu Mex is hotter than predicted for its length
59What to do with
- Genetically engineered.
- Depends on the population and purpose.
60What is a "linear model"?
$Y = \beta X + e$
Don't worry if you dislike matrix notation
62Vehicle-Pedestrian Accidents
- What is the relationship between the impact velocity of a vehicle and the throw of a pedestrian?
- A lot is known about how a body should move when hit by a car at a certain velocity.
- Good reason to suggest $\text{throw}_i = k v_i^2 + e_i$
- Dan will glance around to see if anyone looks interested in "why" this equation makes sense, and may skip the next two slides.
63Why Theoretical Sense?
- On impact, the body takes on the horizontal velocity of the car, v, at an angle $\theta$ above the horizontal.
- Vertical velocity: $v_y = v \sin\theta$
- Horizontal velocity: $v_x = v \cos\theta$
- Time in air, t, is related only to $v_y$: $t = 2 v_y / g$, where g is the constant for gravity on Earth, about 10 m/s².
64- Without friction, $v_x$ is constant and thus throw should be $v_x t = 2 v^2 \sin\theta \cos\theta / g$
- and if $\theta$ is the same for all cars, throw $= k v^2$, where k is a constant.
- Thus, $\text{throw}_i = k v_i^2 + e_i$
- Simpler: only 1 unknown (k) to solve for, AND it has some empirical meaning
65Wood, Simms & Walsh (2005)
66Otte's work with crash test dummies
67- reglin <- lm(distance ~ speed)
- regpoly <- lm(distance ~ speed + spsq)
- regmodel <- lm(distance ~ spsq - 1)
This can be done in SPSS too. Tick
no intercept/constant.
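The three models can be tried on simulated data to see what each one estimates. Everything below is made up for illustration: the data are generated from throw = k v² plus noise with an arbitrary `k_true`, and only the model formulas come from the slide.

```r
# Simulated throw data (hypothetical; generated from the theoretical model)
set.seed(67)
k_true <- 0.05
speed <- runif(60, 5, 20)     # impact speed
spsq <- speed^2               # speed squared
distance <- k_true * spsq + rnorm(60, sd = 1)

reglin   <- lm(distance ~ speed)          # straight line: 2 coefficients
regpoly  <- lm(distance ~ speed + spsq)   # quadratic: 3 coefficients
regmodel <- lm(distance ~ spsq - 1)       # theory: k only, "- 1" drops the intercept

n_coef <- sapply(list(reglin = reglin, regpoly = regpoly, regmodel = regmodel),
                 function(m) length(coef(m)))
print(n_coef)
print(coef(regmodel))   # the single coefficient is the estimate of k
```

The theory-driven model uses one parameter where the others use two or three, and that parameter can be interpreted directly as k.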
69Summary
70This week's journal
- Try help(par)
- Write an equation in Word
- Access these data from the web: fishstock in .dat (use read.table) or .sav (use SPSS or read.spss)
- Variables are ocean (how much the winter low temperature is above freezing, in Celsius) and fishstock (> 2 cm, in thousands per cubic kilometer).
- What are the correlation and the regression equation?
- Write a sentence about the results.