Stat 10x - PowerPoint PPT Presentation

Provided by: joec7

Transcript and Presenter's Notes



1
Stat 10x
  • J. Chang
  • Tuesday, 9/26/00

2
Administrative Notes
  • Go to Section on Thursday!

3
Scatterplots
Last time
  • Plot two variables simultaneously.
  • Put one variable on the horizontal axis, the other
    variable on the vertical axis.

4
E.g. weight vs. height
Last time
5
E.g. pulse vs. weight
Last time
6
Correlation
Last time
  • Measures the strength and direction of the linear
    relationship between two variables.
  • Always between -1 and 1.
  • r = 1: perfect linear relationship, positive slope.
  • r = -1: perfect linear relationship, negative slope.

7
Definition of correlation
Last time
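  • That is,
    $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$:
  • standardize each x_i and y_i,
  • multiply, and
  • average (dividing by n - 1).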
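The recipe above (standardize, multiply, average) translates directly into code. Below is a minimal Python sketch. It assumes the n - 1 convention both for the SDs and for the final "average". The function name `correlation` is our own choice and does not come from the slides.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: standardize each x_i and y_i, multiply, and average.

    Assumption: sample SDs (divisor n - 1) and a matching divisor of n - 1
    for the average of the products.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    z_x = [(x - x_bar) / s_x for x in xs]   # standard scores for x
    z_y = [(y - y_bar) / s_y for y in ys]   # standard scores for y
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

# Perfect linear relationships give r = 1 (positive slope) or r = -1 (negative slope).
print(round(correlation([1, 2, 3, 4], [2, 4, 6, 8]), 6))   # 1.0
print(round(correlation([1, 2, 3, 4], [8, 6, 4, 2]), 6))   # -1.0
```

Using n instead of n - 1 in both places gives the same r, because the two conventions cancel.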

8
Rough idea of definition
Last time
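Points above and to the right of (x̄, ȳ), or below and to the left, give positive products and push r up; points in the other two quadrants give negative products and push r down.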

9
A small example worked out in detail by hand
Last time
  • Go to the blackboard. People might want to take
    notes.

10
Fathers and sons data
Last time
Correlation r ≈ 0.5
What is the average height of sons whose fathers are 72 inches tall?
11
Descriptive statistics on father-son data
Last time
  • Fathers: mean 68, SD 3
  • Sons: mean 69, SD 3
  • Average height of sons whose fathers are 72?
  • A natural guess: the father is 4/3 SDs above the mean,
    so guess the son will be 4/3 SDs above the mean, or
    69 + (4/3) × 3 = 73 inches.

12
Best guess depends on correlation
Last time
  • Guess that the son will be, not 4/3 SDs above the
    mean, but correlation × 4/3 = 0.5 × 4/3 = 2/3 SDs above the
    mean.
  • That is, in our example, guess the son's height to
    be 69 + (2/3) × 3 = 71 inches.

13
Equation of the regression line
Last time
Just a formula for all the best guesses: $\hat{y} = \bar{y} + r\,\frac{s_y}{s_x}\,(x - \bar{x})$.
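The father-son arithmetic above can be checked with a short Python sketch. Convert x to standard units, multiply by r, and convert back to y's units. The helper names `regression_prediction` and `naive_guess` are illustrative and not from the slides. The "natural guess" is simply the case r = 1.

```python
def regression_prediction(x, x_bar, s_x, y_bar, s_y, r):
    """Best guess for y given x: y_hat = y_bar + r * (s_y / s_x) * (x - x_bar).

    Equivalently: x is z = (x - x_bar) / s_x SDs from its mean,
    so predict y to be r * z SDs from its mean.
    """
    z = (x - x_bar) / s_x
    return y_bar + r * z * s_y

def naive_guess(x, x_bar, s_x, y_bar, s_y):
    """The 'natural guess': same number of SDs above the mean (i.e. r = 1)."""
    return regression_prediction(x, x_bar, s_x, y_bar, s_y, r=1.0)

# Fathers: mean 68, SD 3; sons: mean 69, SD 3; r = 0.5; father is 72 inches.
print(f"{naive_guess(72, 68, 3, 69, 3):.1f}")                 # 73.0
print(f"{regression_prediction(72, 68, 3, 69, 3, 0.5):.1f}")  # 71.0
```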
14
Today
  • Notes on regression
  • least-squares regression idea
  • fraction of variability explained
  • Bivariate normal distribution
  • Analysis in strips
  • Lurking variables
  • Perils of aggregation
  • Simpson's paradox
  • An example of simulation

15
Least squares regression
Imagine fitting a line through some data
16
$y_i = \hat{y}_i + (y_i - \hat{y}_i)$, where $\hat{y}_i$ is the predicted (fitted) y and
$y_i - \hat{y}_i$ is the error (residual).
17
The least squares criterion
Want the residuals to be small: minimize the sum of
squared residuals, $\sum_i (y_i - \hat{y}_i)^2$.
[Figure: a bad fit and a better fit]
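The least squares criterion can be made concrete with a small Python sketch. It uses the standard closed form for the minimizing line (b = S_xy / S_xx, a = ȳ - b x̄) and compares that line's sum of squared residuals (SSR) with the SSR of another line. The data points are made up for illustration and are not from the slides.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Intercept a and slope b that minimize the sum of squared residuals,
    using the usual closed form b = Sxy / Sxx, a = y_bar - b * x_bar."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of (y_i - y_hat_i)^2 for the line y_hat = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
print(f"least squares: y_hat = {a:.2f} + {b:.2f} x, "
      f"SSR = {sum_squared_residuals(xs, ys, a, b):.2f}")       # 2.20, 0.60, 2.40
# A different line does worse, e.g. y_hat = 1 + 1 * x:
print(f"other line: SSR = {sum_squared_residuals(xs, ys, 1, 1):.2f}")  # 4.00
```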
18
Flat lines...
y = c
Q: Which c gives the least-squares fit?
A: $c = \bar{y}$; another property of the mean.
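One line of algebra explains the answer: $\sum_i (y_i - c)^2 = \sum_i (y_i - \bar{y})^2 + n(\bar{y} - c)^2$, and the second term is zero only when c = ȳ. The Python sketch below checks this numerically by scanning a grid of candidate values of c on made-up data. The grid search is only for illustration, since the algebra already gives the minimizer.

```python
from statistics import mean

def ssr_flat(ys, c):
    """Sum of squared residuals for the flat line y = c."""
    return sum((y - c) ** 2 for y in ys)

ys = [2, 4, 5, 4, 5]      # made-up illustrative data
y_bar = mean(ys)          # 4

# Candidate values c = 0.0, 0.1, ..., 8.0; keep the one with the smallest SSR.
candidates = [i / 10 for i in range(81)]
best_c = min(candidates, key=lambda c: ssr_flat(ys, c))
print(best_c, y_bar)      # 4.0 4
```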
19
r²
20
Interpretation
That is, r² is the fraction of the variability in the y's that is explained by the regression on x.
21
Bivariate normal distributions
22
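Distributions within vertical strips in a
bivariate Normal distribution
Consider the y values in a narrow vertical strip at
x. These have approximately a Normal distribution with mean equal to the
regression prediction $\hat{y} = \bar{y} + r\,\frac{s_Y}{s_X}\,(x - \bar{x})$ and SD $\sqrt{1 - r^2}\, s_Y$.
  • The SD within a strip is always ≤ s_Y (s_Y is the
    SD over all individuals).
  • If r = ±1, the SD in a strip is 0.
  • If r = 0, the SD in a strip is the same as s_Y.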
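The three bullet points above all follow from the strip SD $\sqrt{1 - r^2}\, s_Y$. The minimal Python sketch below tabulates that formula for a few values of r. The function name `strip_sd` is our own, and the formula is the bivariate Normal strip SD, not a general result for arbitrary scatterplots.

```python
from math import sqrt

def strip_sd(r, s_y):
    """SD of y within a narrow vertical strip of a bivariate Normal: sqrt(1 - r^2) * s_y."""
    if not -1 <= r <= 1:
        raise ValueError("correlation must lie in [-1, 1]")
    return sqrt(1 - r ** 2) * s_y

# With s_Y = 10: r = 0 leaves the SD at 10; larger |r| shrinks it; |r| = 1 gives 0.
for r in (0.0, 0.6, 0.9, 1.0, -1.0):
    print(f"r = {r:.1f}: strip SD = {strip_sd(r, 10.0):.2f}")
```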

23
Example
1. What percentage of students score over 75 on the
final exam?
Easy: 75 is (75 - 65)/10 = 1 SD above the
mean. Answer: $1 - \Phi(1) \approx 0.16$ (16%).
24
Example (cont.)
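                 mean   SD
LSAT scores       650   80
final exams        65   10
r = 0.6
2. Among students who get 730 on the LSAT, what
fraction get over 75 on the final exam?
730 is (730 - 650)/80 = 1 SD above the LSAT mean, so in this strip the
final exam scores have mean 65 + 0.6 × 1 × 10 = 71 and SD $\sqrt{1 - 0.6^2} \times 10 = 8$.
So we want the fraction of the N(71, 8) distribution to the
right of 75. The standard score for 75 is (75 - 71)/8
= 0.5. Answer: $1 - \Phi(0.5) \approx 0.31$. (Compare the
previous answer, 0.16.)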
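Both parts of the example can be reproduced with `statistics.NormalDist` from the Python standard library (3.8+). The sketch below assumes, as the slides do, that LSAT and final exam scores follow a bivariate Normal distribution with the means, SDs and r in the table. The variable names are our own.

```python
from statistics import NormalDist

lsat_mean, lsat_sd = 650, 80
final_mean, final_sd = 65, 10
r = 0.6

# 1. All students: fraction scoring over 75 on the final.
p_all = 1 - NormalDist(final_mean, final_sd).cdf(75)

# 2. Students with LSAT = 730: use the regression prediction and the strip SD.
z_lsat = (730 - lsat_mean) / lsat_sd               # 1 SD above the LSAT mean
strip_mean = final_mean + r * z_lsat * final_sd    # 71
strip_sd = (1 - r ** 2) ** 0.5 * final_sd          # sqrt(1 - r^2) * SD = 8
p_strip = 1 - NormalDist(strip_mean, strip_sd).cdf(75)

print(f"{p_all:.2f} {p_strip:.2f}")   # 0.16 0.31
```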
25
A Pythagorean identity
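$\sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (y_i - \hat{y}_i)^2 = \sum_i (y_i - \bar{y})^2$
Ignoring divisions by n - 1, this says: variance of the fitted values
(around the mean) + variance of the y's around the fitted values = variance
of the y's.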
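The identity can be checked numerically. The Python sketch below fits a least squares line to made-up data and computes the three sums of squares. Note that the identity holds for the least squares line specifically. It fails for an arbitrary line, because the cross term vanishes only at the least squares fit.

```python
from statistics import mean

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]      # made-up illustrative data

# Least squares line (standard closed form).
x_bar, y_bar = mean(xs), mean(ys)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar
fitted = [a + b * x for x in xs]

ss_fitted = sum((f - y_bar) ** 2 for f in fitted)         # fitted values around the mean
ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # y's around the fitted values
ss_total = sum((y - y_bar) ** 2 for y in ys)              # y's around their mean
print(f"{ss_fitted:.2f} + {ss_resid:.2f} = {ss_total:.2f}")   # 3.60 + 2.40 = 6.00
```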
26
Fraction of variance explained by the regression:
$\frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = r^2$.
This is easily derived from the equation of the regression line, which we know. (Homework?)
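One way to do the suggested homework is sketched below. It starts from the regression line $\hat{y}_i - \bar{y} = r\,(s_y/s_x)(x_i - \bar{x})$ and uses $\sum_i (x_i - \bar{x})^2 = (n-1)\,s_x^2$. That step assumes the n - 1 convention for SDs; the n convention gives the same result.

```latex
\sum_i (\hat{y}_i - \bar{y})^2
  = r^2 \frac{s_y^2}{s_x^2} \sum_i (x_i - \bar{x})^2
  = r^2 \frac{s_y^2}{s_x^2} (n-1)\, s_x^2
  = r^2 (n-1)\, s_y^2
  = r^2 \sum_i (y_i - \bar{y})^2
\quad\Longrightarrow\quad
\frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = r^2 .
```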
27
Notes about regression
  • Least-squares regression is not robust
    (resistant).
  • Two kinds of interesting points:
  • Outlier: a point with a large residual.
  • Influential point: a point that, if removed, causes a large
    change in the regression line.

28
A little example
29
Little example (cont.)
[Figure: the little example, with the unusual point labeled "Outlier?" and "Influential?" and the answers "No." and "Yes."]
30
(No Transcript)
31
(No Transcript)
32
Lurking variables
A variable that has an important effect but was
overlooked.
Danger: confounding. Thinking an effect is due to
one variable when it is better explained by
another (lurking) variable.
1971 study: people who drink a lot of coffee have a higher
incidence of bladder cancer. Correlation
noticed. Causation?
33
Lurking variables (cont.)
1993: a larger study concluded that, after
adjusting for the effects of smoking, there was no evidence
of increased risk from coffee.
Spurious correlations: the correlation is real,
but the causation isn't.
34
Lurking variables (cont.)
Lurking vars can also hide real correlations.
(...or even reverse correlations)
35
More on the perils of aggregation: Simpson's
paradox
Categorical data:
              Hospital A   Hospital B
Died                 300           50
Survived            3000         1000
If you needed surgery, which hospital would you
prefer?
36
Simpson's paradox (cont.)
Maybe: Hospital A is a university medical center that
attracts seriously ill patients from a wide
area; Hospital B is local, with fewer seriously sick
patients.
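The mechanism can be illustrated in Python. The overall counts come from the slide. The split by patient condition is entirely hypothetical: it was chosen only so that each hospital's groups add up to the slide's totals. It shows how Hospital A could have a lower death rate within each group and still a higher death rate overall.

```python
def death_rate(died, survived):
    return died / (died + survived)

# Counts from the slide: (died, survived).
overall = {"A": (300, 3000), "B": (50, 1000)}

# HYPOTHETICAL split by patient condition (not data from the slides),
# chosen so each hospital's groups add up to the totals above.
by_condition = {
    "seriously ill": {"A": (280, 1720), "B": (30, 120)},
    "less ill":      {"A": (20, 1280),  "B": (20, 880)},
}

# Sanity check: the hypothetical groups reproduce the slide's totals.
for h in overall:
    died = sum(group[h][0] for group in by_condition.values())
    survived = sum(group[h][1] for group in by_condition.values())
    assert (died, survived) == overall[h]

for h, counts in overall.items():
    print(f"overall, hospital {h}: {death_rate(*counts):.1%}")   # A 9.1%, B 4.8%
for name, group in by_condition.items():
    for h, counts in group.items():
        print(f"{name}, hospital {h}: {death_rate(*counts):.1%}")
```

In this hypothetical split, Hospital A is worse overall only because most of its patients are in the high-risk group.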
37
Simpson's paradox (cont.)
Another (real) example: U.C. Berkeley,
1970s. A committee searched for discrimination: a
higher percentage of male applicants than female applicants was accepted
into grad school. Looking at
individual departments, there was no evidence of admitting men
more than women; if anything, the reverse. Why?
Men were applying more to departments with higher
acceptance rates; women were applying more to departments
that were harder to get into.