Title: Midterm II Results
1Midterm II Results
- Generally good performances
- Some improved (Congrats).
- Some have more room for improvement
- Important point
- This is only about half of total score
2Midterm II Results
- Where do you stand?
- Current Total
- Current Total = 0.45 x MT I + 0.45 x MT II + 0.1 x HW Avg
- 91 - 100: A
- 82 - 91: B
- 73 - 82: C
- 64 - 73: D
- below 64: F
4Stat 31, Section 1, Last Time
- Inference for Proportions
- Sample Size
- Best Guess
- Conservative
- Hypothesis testing
- 2-way tables
- Divide Populations in 2 ways
- Visualize with 2-way bar graph
- Study counts & proportions (percentages)
5Small Error Last Time
- Mislabelled Cell
- Led to mistaken visual impression
- First show wrong version
- Then show right version
- All in Class Example 40
- https://www.unc.edu/marron/UNCstat31-2005/Stat31Eg40.xls
6Two-Way Tables Wrong
- Big Question:
- Is there a relationship?
- Note tallest bars:
- French Wine ↔ French Music
- Italian Wine ↔ Italian Music
- Other Wine ↔ No Music
- Suggests there is a relationship
7Two-Way Tables Corrected
- Big Question:
- Is there a relationship?
- Note tallest bars:
- French Wine ↔ French Music
- Italian Wine ↔ Italian Music
- Other Wine ↔ No Music
- Suggests there is a relationship
8Two-Way Tables
- Testing for independence
- What is it?
- From probability theory
- P(A | B) = P(A)
- i.e. Chances of A, when B is known, are same as when B is unknown
- Table version of this idea?
9Independence in 2-Way Tables
- Recall
- P(A | B) = P(A)
- Counts & proportions analog of these?
- Analog of P(A)?
- Proportions of factor A, not knowing B
- Called marginal proportions
- Analog of P(A | B)???
10Independence in 2-Way Tables
- Marginal proportions (or counts)
- Sums along rows
- Sums along columns
- Useful to write at margins of table
- Hence the name "marginal"
- Numbers are of independent interest
- Also nice to put total at bottom
11Independence in 2-Way Tables
- Marginal Counts
- Class Example 40 (Wine & Music), Part 3
- https://www.unc.edu/marron/UNCstat31-2005/Stat31Eg40.xls
- Marginals are of independent interest
- Other wines sold best (French second)
- Italian music sold most wine
- But don't tell whole story
- E.g. can't see that same music & wine is best
- Full table tells more than marginals
12Independence in 2-Way Tables
- Recall definition of independence
- P(A | B) = P(A)
- Counts analog of P(A | B)???
- Recall: P(A | B) = P(A & B) / P(B)
- So equivalent condition is: P(A & B) = P(A) x P(B)
13Independence in 2-Way Tables
- Counts analog of P(A | B)???
- Equivalent condition for independence is: P(A & B) = P(A) x P(B)
- So for counts, look for:
- Table Prop'n = Row Marg'l Prop'n x Col'n Marg'l Prop'n
- i.e. Entry = Product of Marginals
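The "entry = product of marginals" rule can be sketched in a few lines of Python. This is only an illustration, not the class spreadsheet. The counts are an assumption: they are taken to be the textbook wine-and-music data behind Class Example 40, which these slides do not list. The function name `expected_counts` is made up for the sketch. Multiplying the row and column marginal proportions and rescaling by the grand total gives expected count = row total x column total / grand total.

```python
# ASSUMPTION: these counts are taken to be the textbook wine-and-music data;
# they are not listed on the slides and may differ from the class spreadsheet.
# Rows = wine sold, columns = music played (none, French, Italian).
observed = [
    [30, 39, 30],  # French wine
    [11, 1, 19],   # Italian wine
    [43, 35, 35],  # Other wine
]


def expected_counts(table):
    """Counts expected under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]        # marginal counts along rows
    col_totals = [sum(col) for col in zip(*table)]  # marginal counts along columns
    grand_total = sum(row_totals)
    return [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]


expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
```

The expected table keeps the same row and column totals as the observed table, so it "shows the same structure as the marginals" while removing any match between music and wine.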
14Independence in 2-Way Tables
- Visualize Product of Marginals for
- Class Example 40 (Wine & Music), Part 4
- https://www.unc.edu/marron/UNCstat31-2005/Stat31Eg40.xls
- Shows same structure as marginals
- But no match between music & wine
- Good null hypothesis
15Independence in 2-Way Tables
- Independent model appears different
- But is it really different?
- Or could difference be simply explained by natural sampling variation?
- Check for statistical significance
16Independence in 2-Way Tables
- Approach
- Measure distance between tables
- Use Chi Square Statistic
- Has known probability distribution when table is independent
- Assess significance using P-value
- Set up as H0: Indep. vs. HA: Dependent
- P-value = P{what saw or m.c. | Indep.} (m.c. = more conclusive)
17Independence in 2-Way Tables
- Chi-square statistic: Based on
- Observed Counts (raw data),
- Expected Counts (under indep.),
- X² = sum over all table cells of (Obs - Exp)² / Exp
- Notes:
- Small for only random variation
- Large for significant departure from indep.
18Independence in 2-Way Tables
- Chi-square statistic calculation
- Class example 40, Part 5
- https://www.unc.edu/marron/UNCstat31-2005/Stat31Eg40.xls
- Calculate term by term
- Then sum
- Is X² = 18.3 big or small?
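The "term by term, then sum" calculation might look like the sketch below. The counts are an assumption: they are taken to be the textbook wine-and-music data, not copied from the class spreadsheet. The helper name `chi_square_statistic` is hypothetical. Each cell contributes (observed - expected)² / expected, where the expected count is row total x column total / grand total.

```python
# ASSUMPTION: counts taken to be the textbook wine-and-music data
# (rows = French, Italian, Other wine; columns = no, French, Italian music).
observed = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total  # count under independence
            total += (obs - exp) ** 2 / exp                    # one term of the sum
    return total


x2 = chi_square_statistic(observed)
print(round(x2, 1))
```

With these assumed counts the statistic comes out near the 18.3 quoted on the slide. Most of that total comes from the Italian wine row.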
19Independence in 2-Way Tables
- H0 distribution of the X² statistic:
- Chi Squared (another Greek letter, χ)
- Parameter: degrees of freedom
- (similar to T distribution)
- Excel Computation:
- CHIDIST (given cutoff, find area = prob.)
- CHIINV (given prob. = area, find cutoff)
20Independence in 2-Way Tables
- Explore the distribution
- Applet from Webster West (U. So. Carolina)
- http://www.stat.sc.edu/west/applets/chisqdemo.html
- Right Skewed Distribution
- Nearly Gaussian for more d.f.
21Independence in 2-Way Tables
- For test of independence, use:
- degrees of freedom = (rows - 1) x (cols - 1)
- E.g. Wine and Music:
- d.f. = (3 - 1) x (3 - 1) = 4
22Independence in 2-Way Tables
- E.g. Wine and Music
- P-value = P{Observed X² or m.c. | Indep.}
- = P{X² = 18.3 or m.c. | Indep.}
- = P{X² > 18.3 | d.f. = 4}
- = 0.0011
- Also see Class Example 40, Part 5
- https://www.unc.edu/marron/UNCstat31-2005/Stat31Eg40.xls
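As a rough cross-check of the P-value, the right-tail area of a chi-square distribution has a simple closed form when the degrees of freedom are even: P{X² > x} = exp(-x/2) times the sum of (x/2)^i / i! for i = 0, ..., d.f./2 - 1. The sketch below uses that formula instead of Excel's CHIDIST. The function name `chi2_right_tail_even_df` is made up here, and the shortcut only works for even d.f. (such as the 4 in this example).

```python
import math


def chi2_right_tail_even_df(x, df):
    """P{X2 > x} for a chi-square distribution with an EVEN number of d.f.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive, even d.f.")
    if x < 0:
        raise ValueError("the chi-square statistic cannot be negative")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


rows, cols = 3, 3                # wine types x music types
df = (rows - 1) * (cols - 1)     # degrees of freedom for the independence test
p_value = chi2_right_tail_even_df(18.3, df)
print(df, round(p_value, 4))
```

For d.f. = 4 the sum has only two terms, so the P-value is exp(-9.15) x (1 + 9.15), which agrees with the 0.0011 on the slide.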
23Independence in 2-Way Tables
- E.g. Wine and Music
- P-value = 0.001
- Yes-No: Very strong evidence against independence, conclude music has a statistically significant effect
- Gray-Level: Also very strong evidence
24Independence in 2-Way Tables
- Excel shortcut
- CHITEST
- Avoids the (obs - exp)² / exp calculation
- Automatically computes d.f.
- Returns P-value
26And Now for Something Completely Different
- A statistics joke, from
- GARY C. RAMSEYER'S INTERNET GALLERY OF STATISTICS JOKES
- http://www.ilstu.edu/gcramsey/Gallery.html
27And Now for Something Completely Different
- A somewhat advanced society has figured how to package basic knowledge in pill form.
- A student, needing some learning, goes to the pharmacy and asks what kind of knowledge pills are available.
28And Now for Something Completely Different
- The pharmacist says "Here's a pill for English literature."
- The student takes the pill and swallows it and has new knowledge about English literature!
29And Now for Something Completely Different
- "What else do you have?" asks the student.
- "Well, I have pills for art history, biology, and world history," replies the pharmacist.
- The student asks for these, and swallows them and has new knowledge about those subjects!
30And Now for Something Completely Different
- Then the student asks, "Do you have a pill for statistics?"
- The pharmacist says "Wait just a moment", and goes back into the storeroom and brings back a whopper of a pill that is about twice the size of a jawbreaker and plunks it on the counter.
- "I have to take that huge pill for statistics?" inquires the student.
31And Now for Something Completely Different
- The pharmacist understandingly nods his head and replies
- "Well, you know statistics always was a little hard to swallow."
32Caution about 2-Way Tables
- Simpson's Paradox:
- Aggregation into tables can be dangerous
- E.g. from:
- http://www.math.sfu.ca/cschwarz/Stat-301/Handouts/node49.html
- Study: Admission rates to professional programs, look for sex bias.
33Simpson's Paradox
- Admissions to Business School:
- Males admitted: 480 / (480 + 120) x 100%
- = 80%
- Females admitted: 180 / (180 + 20) x 100%
- = 90%
- Better for females???
34Simpson's Paradox
- Admissions to Law School:
- Males admitted: 10 / (10 + 90) x 100%
- = 10%
- Females admitted: 100 / (100 + 200) x 100%
- = 33.3%
- Better for females???
35Simpson's Paradox
- Combined Admissions:
- Males admitted: 490 / (490 + 210) x 100%
- = 70%
- Females admitted: 280 / (280 + 220) x 100%
- = 56%
- Better for males???
37Simpson's Paradox
- How can the rate be higher for females in both schools, yet higher for males overall?
- Reason: depends on relative proportions
- Notes:
- In Business (male applicants dominant), easier to get in
- (660 / 800 = 82.5% admitted)
- In Law (female applicants dominant), much harder to get in
- (110 / 400 = 27.5% admitted)
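The reversal can be checked directly from the admitted and rejected counts on the Business and Law slides. The sketch below is only an illustration. The dictionary layout and the helper name `rate` are choices made here, not part of the original example.

```python
# (admitted, rejected) counts as given on the Business and Law School slides.
admissions = {
    "Business": {"Males": (480, 120), "Females": (180, 20)},
    "Law":      {"Males": (10, 90),   "Females": (100, 200)},
}


def rate(admitted, rejected):
    """Admission rate as a percentage of all applicants."""
    return 100.0 * admitted / (admitted + rejected)


# Rates within each school: females do better in both.
by_school = {
    school: {sex: rate(*counts) for sex, counts in groups.items()}
    for school, groups in admissions.items()
}

# Rates after aggregating over schools: males do better.
combined = {}
for sex in ("Males", "Females"):
    admitted = sum(groups[sex][0] for groups in admissions.values())
    rejected = sum(groups[sex][1] for groups in admissions.values())
    combined[sex] = rate(admitted, rejected)

for school, rates in by_school.items():
    print(school, {sex: round(r, 1) for sex, r in rates.items()})
print("Combined", {sex: round(r, 1) for sex, r in combined.items()})
```

Most males applied to Business, where admission is easy, while most females applied to Law, where it is hard. The school is the lurking variable that the combined table hides.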
38Simpson's Paradox
- Lesson:
- Must be very careful about aggregation
- Worse: may not be aware that aggregation has been done.
- Recall terminology: Lurking Variable
- Can hide in aggregation
- Could be used for cheating
40Inference for Regression
- Chapter 10
- Recall
- Scatterplots
- Fitting Lines to Data
- Now study statistical inference associated with fit lines
- E.g. When is slope statistically significant?
41Recall Scatterplot
- For data (x,y)
- View by plot
- (1,2)
- (3,1)
- (-1,0)
- (2,-1)
42Recall Linear Regression
- Idea
- Fit a line to data in a scatterplot
- To learn about basic structure
- To model data
- To provide prediction of new values
43Recall Linear Regression
- Recall some basic geometry
- A line is described by an equation
- y = mx + b
- m = slope
- b = y-intercept
- Varying m & b gives a family of lines,
- Indexed by parameters m & b (or a & b)
44Recall Linear Regression
- Approach
- Given a scatterplot of data
- Find a & b (i.e. choose a line)
- to best fit the data
45Recall Linear Regression
- Given a line, y = a + bx, indexed by a & b
- Define residuals = data Y - Y on line
- = y_i - (a + b x_i)
- Now choose a & b to make these small
46Recall Linear Regression
- Excellent Demo, by Charles Stanton, CSUSB
- http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
- More JAVA Demos, by David Lane at Rice U.
- http://www.ruf.rice.edu/lane/stat_sim/reg_by_eye/index.html
- http://www.ruf.rice.edu/lane/stat_sim/comp_r/index.html
47Recall Linear Regression
- Make residuals ≥ 0, by squaring
- Least Squares: adjust a & b to
- Minimize the Sum of Squared Errors, SSE = sum of (y_i - (a + b x_i))²
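A minimal sketch of the least squares fit, using the four points from the "Recall Scatterplot" slide. The slides do not state the minimizing formulas. The sketch assumes the standard closed-form solution, slope b = sum of (x - x̄)(y - ȳ) divided by sum of (x - x̄)², and intercept a = ȳ - b x̄. It follows the Excel naming, with a as the intercept and b as the slope. The function names are made up for illustration.

```python
# Data points from the scatterplot slide: (1,2), (3,1), (-1,0), (2,-1).
xs = [1, 3, -1, 2]
ys = [2, 1, 0, -1]


def least_squares(xs, ys):
    """Return (a, b): intercept and slope of the line y = a + b*x minimizing SSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b


def sum_squared_errors(a, b, xs, ys):
    """Sum of squared residuals, where residual = data y minus y on the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = least_squares(xs, ys)
print("intercept a =", round(a, 4), " slope b =", round(b, 4))
print("SSE at the fit:", round(sum_squared_errors(a, b, xs, ys), 4))
print("SSE after nudging the slope:", round(sum_squared_errors(a, b + 0.1, xs, ys), 4))
```

Nudging either parameter away from the fitted values can only increase the SSE, which is the sense in which this line "best fits" the data.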
48Least Squares in Excel
- Computation
- INTERCEPT (computes y-intercept a)
- SLOPE (computes slope b)
- Revisit Class Example 14
- https://www.unc.edu/marron/UNCstat31-2005/Stat31Eg14.xls
- HW 10.3a
49Inference for Regression
- Goal: develop
- Hypothesis Tests and Confidence Intervals
- For intercept & slope parameters, a & b
- Also study prediction