Title: CAUSE Webinar: Introducing Math Majors to Statistics
- Allan Rossman and Beth Chance
- Cal Poly San Luis Obispo
- April 8, 2008
Outline
- Goals
- Guiding principles
- Content of an example course
- Assessment
- Examples (four)
Goals
- Redesign the introductory statistics course for mathematically inclined students in order to
  - Provide a balanced introduction to the practice of statistics at an appropriate mathematical level
  - Offer a better alternative to Stat 101 or the Math Stat sequence for math majors' first statistics course
Guiding principles (Overview)
- Put students in the role of active investigator
- Motivate with real studies, genuine data
- Repeatedly experience the entire statistical process, from data collection to conclusion
- Emphasize connections among study design, inference technique, and scope of conclusions
- Use a variety of computational tools
- Investigate mathematical underpinnings
- Introduce probability just in time
Principle 1: Active investigator
- Curricular materials consist of investigations that lead students to discover statistical concepts and methods
- Students learn by constructing their own knowledge and developing their own understanding
  - They need direction and guidance to do that
- Students spend class time engaged with these materials, working collaboratively, with technology close at hand
Principle 2: Real studies, genuine data
- Almost all investigations focus on a recent scientific study, an existing data set, or student-collected data
  - Statistics as a science
- Frequent discussions of data collection issues and cautions
- Wide variety of contexts and research questions
Real studies, genuine data
- Popcorn and lung cancer
- Historical smoking studies
- Night lights and myopia
- Effect of observer with vested interest
- Kissing the right way
- Do pets resemble their owners
- Who uses shared armrest
- Halloween treats
- Heart transplant mortality
- Lasting effects of sleep deprivation
- Sleep deprivation and car crashes
- Fan cost index
- Drive for show, putt for dough
- Spock legal trial
- Hiring discrimination
- Comparison shopping
- Computational linguistics
Principle 3: Entire statistical process
- First two weeks
  - Data collection
    - Observation vs. experiment (confounding, random assignment vs. random sampling, bias)
  - Descriptive analysis
    - Segmented bar graph
    - Conditional proportions, relative risk, odds ratio
  - Inference
    - Simulating a randomization test for p-value, significance
    - Hypergeometric distribution, Fisher's exact test
- Repeat, repeat, repeat, ...
  - Random assignment → dotplots/boxplots/means/medians → randomization test
  - Sampling → bar graph → binomial → normal approximation
Principle 4: Emphasize connections
- Emphasize connections among study design, inference technique, and scope of conclusions
- Appropriate inference technique is determined by the randomness in the data collection process
  - Simulation of randomization test (e.g., hypergeometric)
  - Repeated sampling from population (e.g., binomial)
- Appropriate scope of conclusion is also determined by the randomness in the data collection process
  - Causation
  - Generalizability
Principle 5: Variety of computational tools
- For analyzing data and exploring statistical concepts
- Assume that students have frequent access to computing
  - Not necessarily every class meeting in a computer lab
- Choose the right tool for the task at hand
  - Analyzing data: statistics package (e.g., Minitab)
  - Exploring concepts: applets (interactivity, visualization)
  - Immediate updating of calculations: spreadsheet (Excel)
Principle 6: Mathematical underpinnings
- Primary distinction from the Stat 101 course
- Some use of calculus, but not much
- Assume some mathematical sophistication
  - E.g., function, summation, logarithm, optimization, proof
- Often appears in follow-up homework exercises
- Examples
  - Counting rules for probability
  - Hypergeometric, binomial distributions
  - Principle of least squares, using derivatives to find the minimum
    - Univariate as well as bivariate setting
  - Margin of error as a function of sample size, population parameters, confidence level
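As an illustration of the margin-of-error example, here is a minimal Python sketch, assuming the familiar large-sample margin of error for a single proportion, z* × sqrt(p(1 − p)/n). The function name `margin_of_error` and its default values are hypothetical choices for illustration, not taken from the course materials.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, p=0.5, level=0.95):
    """Large-sample margin of error z* * sqrt(p(1 - p) / n) for a sample proportion."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # critical value, about 1.96 for 95%
    return z_star * sqrt(p * (1 - p) / n)

# Margin of error shrinks like 1/sqrt(n): quadrupling n halves it.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(n), 3))
```

Students can vary `p` and `level` in the same way to see how the population parameter and the confidence level enter the formula.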
Principle 7: Probability just in time
- Whither probability?
  - Not the primary goal
  - Studied as needed to address statistical issues
- Often introduced through simulation
  - Tactile and then computer-based
  - Addressing "How often would this happen by chance?"
- Examples
  - Hypergeometric distribution: Fisher's exact test for a 2×2 table
  - Binomial distribution: sampling from a random process
  - Continuous probability models as approximations
Content of Example Course (ISCAM)
The course has six chapters; each strand below lists its topics in chapter order (Chapter 1 through Chapter 6).
- Data collection: observation vs. experiment, confounding, randomization; random sampling, bias, precision, nonsampling errors; paired data; independent random samples; bivariate
- Descriptive statistics: conditional proportions, segmented bar graphs, odds ratio; quantitative summaries, transformations, z-scores, resistance; bar graph; models, probability plots, trimmed mean; scatterplots, correlation, simple linear regression
- Probability: counting, random variable, expected value; empirical rule; Bernoulli processes, rules for variances, expected value; normal, Central Limit Theorem
- Sampling/randomization distribution: randomization distribution for a difference in proportions; randomization distribution for a difference in means; sampling distribution for X; large-sample sampling distributions for a sample proportion and a sample mean; sampling distributions of differences in proportions and in means, and of the odds ratio (OR); chi-square statistic, F statistic, regression coefficients
- Model: hypergeometric; binomial; normal, t; normal, t, log-normal; chi-square, F, t
- Statistical inference: p-value, significance, Fisher's exact test; p-value, significance, effect of variability; binomial tests and intervals, two-sided p-values, type I/II errors; z-procedures for proportions; t-procedures, robustness, bootstrapping; two-sample z- and t-procedures, bootstrap, CI for OR; chi-square tests for homogeneity and independence, ANOVA, regression
Assessments
- Investigations with summaries of conclusions
- Worked-out examples
- Practice problems
  - Quick practice, opportunity for immediate feedback, adjustment to class discussion
- Homework exercises
- Technology explorations (labs)
  - E.g., comparison of sampling variability with stratified sampling vs. simple random sampling
- Student projects
  - Student-generated research questions, data collection plans, implementation, data analyses, report
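A technology exploration like the stratified-versus-simple-random-sampling comparison could be sketched in Python as follows. The two-stratum population (means 50 and 80, standard deviation 5, 500 units each), the total sample size of 40, and the function name are hypothetical choices for illustration; the actual course labs may use different software and settings.

```python
import random
from statistics import mean, pstdev

def compare_sampling_variability(reps=2_000, n_per_stratum=20, seed=1):
    """Return (SD of SRS means, SD of stratified-sample means) over repeated samples."""
    rng = random.Random(seed)
    # Hypothetical population: two equal-size strata with very different means.
    stratum_a = [rng.gauss(50, 5) for _ in range(500)]
    stratum_b = [rng.gauss(80, 5) for _ in range(500)]
    population = stratum_a + stratum_b
    n = 2 * n_per_stratum
    srs_means, strat_means = [], []
    for _ in range(reps):
        # Simple random sample of n units from the whole population.
        srs_means.append(mean(rng.sample(population, n)))
        # Stratified sample with proportional allocation (equal strata, so equal counts).
        strat = rng.sample(stratum_a, n_per_stratum) + rng.sample(stratum_b, n_per_stratum)
        strat_means.append(mean(strat))
    return pstdev(srs_means), pstdev(strat_means)

srs_sd, strat_sd = compare_sampling_variability()
print(f"SD of sample means: SRS {srs_sd:.2f}, stratified {strat_sd:.2f}")
```

Because this population varies much more between strata than within them, the stratified sample means vary far less than the SRS means.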
Example 1: Friendly Observers
- Psychology experiment
- Butler and Baumeister (1998) studied the effect of an observer with a vested interest on skilled performance
                        A: vested interest   B: no vested interest   Total
Beat threshold                   3                     8               11
Do not beat threshold            9                     4               13
Total                           12                    12               24
How often would such an extreme experimental difference occur by chance if there were no vested-interest effect?
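The card-shuffling simulation can be sketched in Python: deal the 11 "beat threshold" and 13 "did not beat" outcomes at random into two groups of 12, and count how often group A ends up with 3 or fewer successes. This is a minimal sketch, assuming that "as extreme" means at most 3 successes in group A; the function name and number of repetitions are hypothetical.

```python
import random

def simulate_friendly_observers(reps=10_000, seed=1):
    """Empirical p-value: share of random re-assignments giving group A <= 3 successes."""
    # 1 = beat threshold (11 subjects), 0 = did not beat threshold (13 subjects)
    outcomes = [1] * 11 + [0] * 13
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(outcomes)      # random assignment under the "no effect" model
        group_a = outcomes[:12]    # first 12 outcomes go to A (vested interest)
        if sum(group_a) <= 3:
            extreme += 1
    return extreme / reps

print(simulate_friendly_observers())
```

With 10,000 repetitions the estimate typically lands within about 0.005 of the exact probability, which is roughly 0.050.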
Example 1: Friendly Observers
- Students investigate this question through
  - Hands-on simulation (playing cards)
  - Computer simulation (Java applet)
  - Mathematical model (counting techniques)
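The counting-based model is the hypergeometric distribution: with 11 successes among 24 subjects and 12 subjects in group A, the probability that group A gets exactly k successes is C(11, k) × C(13, 12 − k) / C(24, 12). A short Python sketch of the resulting one-sided p-value (Fisher's exact test) follows; the function name is hypothetical.

```python
from math import comb

def hypergeom_lower_tail(k_obs, n_total, n_successes, n_group):
    """P(X <= k_obs), X = number of successes landing in a group of size n_group."""
    ways_total = comb(n_total, n_group)
    ways_extreme = sum(
        comb(n_successes, k) * comb(n_total - n_successes, n_group - k)
        for k in range(0, k_obs + 1)
    )
    return ways_extreme / ways_total

p_value = hypergeom_lower_tail(3, 24, 11, 12)
print(round(p_value, 4))  # 0.0498
```

The same probability is 134,576 / 2,704,156: the number of assignments at least this extreme divided by the number of ways to choose group A.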
Example 1: Friendly Observers
- Focus on the statistical process
  - Data collection, descriptive statistics, inferential analysis
  - Arising from a genuine research study
- Connection between the randomization in the design and the inference procedure used
- Scope of conclusions depends on study design
  - Cause/effect inference is valid
- Use of simulation motivates the derivation of the mathematical probability model
- Investigate and answer real research questions in the first two weeks
Example 2: Sleep Deprivation
- Physiology experiment
- Stickgold, James, and Hobson (2000) studied the long-term effects of sleep deprivation on a visual discrimination task (3 days later!)
Sleep condition   n    Mean    StDev   Median   IQR
Deprived          11    3.90   12.17    4.50    20.7
Unrestricted      10   19.82   14.73   16.55    19.53
How often would such an extreme experimental difference occur by chance if there were no sleep deprivation effect?
Example 2: Sleep Deprivation
- Students investigate this question through
  - Hands-on simulation (index cards)
  - Computer simulation (Minitab)
  - Mathematical model
p-value: .0072
p-value ≈ .002
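The index-card or Minitab simulation can be expressed as a generic randomization test for a difference in means. This is a minimal Python sketch; the function name, the one-sided alternative (unrestricted group higher), and the repetition count are assumptions. The individual improvement scores for the 21 subjects are not listed in this webinar, so they would have to be supplied from the study data.

```python
import random
from statistics import mean

def randomization_test_means(group1, group2, reps=10_000, seed=1):
    """Empirical one-sided p-value for mean(group2) - mean(group1) under random re-assignment."""
    observed = mean(group2) - mean(group1)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # re-randomize subjects to the two conditions
        diff = mean(pooled[n1:]) - mean(pooled[:n1])
        # Small tolerance so exact ties with the observed difference count as "as extreme".
        if diff >= observed - 1e-9:
            extreme += 1
    return extreme / reps

# Hypothetical usage with the study's scores (not reproduced here):
# p_value = randomization_test_means(deprived_scores, unrestricted_scores)
```

Swapping `mean` for `statistics.median` gives the corresponding test based on medians, one of the summaries shown in the table.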
Example 2: Sleep Deprivation
- Experience the entire statistical process again
- Develop deeper understanding of key ideas (randomization, significance, p-value)
- Tools change, but the reasoning remains the same
- Tools are motivated by the research study and question, not introduced for their own sake
- Simulation as a problem-solving tool
  - Empirical vs. exact p-values
Example 3: Infants' Social Evaluation
- Sociology study
- Hamlin, Wynn, and Bloom (2007) investigated whether infants would prefer a toy showing helpful behavior to a toy showing hindering behavior
- Infants were shown a video with these two kinds of toys, then asked to select one
- 14 of 16 10-month-olds selected the helper toy
- Is this result surprising enough (under the null model of no preference) to indicate a genuine preference for the helper toy?
Example 3: Infants' Social Evaluation
- Simulate with coin flipping
- Then simulate with applet
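The coin-flipping simulation translates directly to Python: each of 16 infants "chooses" the helper with probability 0.5 under the no-preference model, and we record how often 14 or more choose it. This is a minimal sketch; the function name, parameters, and repetition count are hypothetical.

```python
import random

def simulate_helper_choices(n_infants=16, observed=14, reps=10_000, seed=1):
    """Empirical p-value: share of simulated samples with at least `observed` helper choices."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        # Each infant flips a fair coin under the null model of no preference.
        helpers = sum(rng.random() < 0.5 for _ in range(n_infants))
        if helpers >= observed:
            extreme += 1
    return extreme / reps

print(simulate_helper_choices())
```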
Example 3: Infants' Social Evaluation
- Then learn the binomial distribution and calculate the exact p-value
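Under the no-preference model the number of helper choices is Binomial(16, 0.5), so the exact p-value is P(X ≥ 14) = [C(16,14) + C(16,15) + C(16,16)] / 2^16 = 137/65,536. A short Python sketch follows; the function name is hypothetical.

```python
from math import comb

def binomial_p_value(successes, n, p=0.5):
    """Exact one-sided p-value P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1))

print(round(binomial_p_value(14, 16), 4))  # 0.0021
```

The follow-up questions (a different number of successes, a different sample size) amount to changing the two arguments.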
Example 3: Infants' Social Evaluation
- Learn a probability distribution to answer an inference question from a research study
- Again the analysis is completed with
  - Tactile simulation
  - Technology simulation
  - Mathematical model
- Modeling the process of statistical investigation
  - Examination of methodology, further questions in study
- Follow-ups
  - Different number of successes
  - Different sample size
Example 4: Sleepless Drivers
- Sociology case-control study
- Connor et al. (2002) investigated whether drivers involved in recent car crashes had been more sleep deprived than a control group of drivers
Driver group                 No full night's sleep in past week   At least one full night's sleep in past week   Sample size
Case drivers (crash)                        61                                     510                              571
Control drivers (no crash)                  44                                     544                              588
Example 4: Sleepless Drivers
Sample proportion that were in a car crash:
- Sleep deprived: .581
- Not sleep deprived: .484
Odds ratio: 1.48
How often would such an extreme observed odds ratio occur by chance if there were no sleep deprivation effect?
Example 4: Sleepless Drivers
- Students investigate this question through
  - Computer simulation (Minitab)
    - Empirical sampling distribution of the odds ratio
    - Empirical p-value
  - Approximate mathematical model
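One way to carry out the odds-ratio simulation is to keep both margins of the table fixed (571 crash drivers, and 105 sleep-deprived drivers among 1,159 in total) and randomly decide which drivers are cases, as if sleep deprivation were unrelated to crashing. This is a sketch under that assumption only; the webinar does not say exactly how the Minitab simulation was set up, and the function names are hypothetical.

```python
import random

def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def simulate_or_pvalue(a=61, b=510, c=44, d=544, reps=10_000, seed=1):
    """Return (observed OR, empirical p-value) under random case labels with fixed margins."""
    n_cases = a + b            # crash drivers (571)
    n_exposed = a + c          # sleep-deprived drivers (105)
    n_total = a + b + c + d    # all drivers (1159)
    observed = odds_ratio(a, b, c, d)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        # Drivers 0..n_exposed-1 are the sleep-deprived ones; choose the cases at random.
        cases = rng.sample(range(n_total), n_cases)
        sim_a = sum(1 for i in cases if i < n_exposed)  # sleep-deprived cases
        sim_b = n_cases - sim_a                         # rested cases
        sim_c = n_exposed - sim_a                       # sleep-deprived controls
        sim_d = n_total - n_cases - sim_c               # rested controls
        if sim_b * sim_c == 0:
            sim_or = float("inf")
        else:
            sim_or = odds_ratio(sim_a, sim_b, sim_c, sim_d)
        # Small tolerance guards against floating-point ties with the observed OR.
        if sim_or >= observed - 1e-12:
            extreme += 1
    return observed, extreme / reps

observed_or, p_value = simulate_or_pvalue()
print(f"observed OR = {observed_or:.2f}, empirical p-value = {p_value:.4f}")
```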
Example 4: Sleepless Drivers
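- Standard error of the sample log odds ratio: $SE(\ln \widehat{OR}) = \sqrt{1/a + 1/b + 1/c + 1/d}$, where $a, b, c, d$ are the four cell counts
- Confidence interval for the population log odds ratio
  - $\ln \widehat{OR} \pm z^{*} \, SE(\ln \widehat{OR})$
- Back-transformation: exponentiate both endpoints
- 90% CI for the odds ratio: (1.05, 2.08)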
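The interval can be reproduced in a few lines of Python, assuming the usual large-sample standard error sqrt(1/a + 1/b + 1/c + 1/d) for the log of the sample odds ratio, with a, b, c, d the four cell counts of the table. The function name is hypothetical; this is a sketch of the log-transform-then-back-transform procedure, not the course's own code.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def or_confidence_interval(a, b, c, d, level=0.90):
    """Large-sample CI for the odds ratio: interval on the log scale, then exponentiate."""
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for 90%
    return exp(log_or - z_star * se), exp(log_or + z_star * se)

lower, upper = or_confidence_interval(61, 510, 44, 544)
print(round(lower, 2), round(upper, 2))  # 1.05 2.08
```

Because the interval is built on the log scale, it is not symmetric around the observed odds ratio of 1.48 once back-transformed.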
Example 4: Sleepless Drivers
- Students understand a process through which they can investigate statistical ideas
- Students piece together powerful statistical tools learned throughout the course to derive new (to them) procedures
- Concepts, applications, methods, theory
For more information
- Investigating Statistical Concepts, Applications, and Methods (ISCAM), Cengage Learning, www.cengage.com
- Instructor resources: www.rossmanchance.com/iscam/
  - Solutions to investigations, practice problems, homework exercises
  - Instructor's guide
  - Sample syllabi
  - Sample exams