Title: Probability and Statistics: Today's Goals
1 Probability and Statistics: Today's Goals
- Understand what constitutes an acceptable sample
for a scientific study
- Understand the pitfalls in observational studies
- HW 10 (due Wed. April 22): Ch 5, problems 27, 68, 69, plus
one web problem
- Article will be due April 24
2 What is statistics?
- Statistics are numbers that summarize the results
of a study.
- Statistical inference formalizes the process of
learning through observation.
- Statistics is the field that studies how to
efficiently collect informative data, explore and
interpret these data, and draw conclusions based
on them.
3 Statistical Inference
- Designing experiments
  - What data should be collected?
  - Sampling questions
- Analyzing data from experiments
  - Sample statistics
  - Pictorial representations
  - Confidence intervals
  - Hypothesis testing
  - Regressions for correlation analysis
4 Example 1
- An engineer starting a company has developed a
new coating for pipes to resist corrosion. He
believes that it will perform better than
standard coatings and wants to convince clients
that the new coating is better.
- He obtains a measure of corrosion by applying the new
coating to one section of the pipe and the standard
coating to a second section of the pipe.
- First pipe (new coating): corrosion free for 256 days.
- Second pipe (standard coating): corrosion free for 243
days.
- Can we conclude that the new coating works better
than the old coating?
- What would you do differently?
5 What can we do to improve things?
- Collect multiple observations.
- So suppose we collect 100 observations of the
coating effectiveness (controlling for the
exposure of coated and non-coated pipe).
- We then compute the mean number of corrosion-free
days for both treatments, the new coating and the
standard coating.
- Suppose now that the mean number of corrosion-free
days for the new coating is 250 and the mean
number of corrosion-free days for the standard
coating is 242.
- Can we conclude that the coating with the better
mean effectiveness will definitely outperform the
coating with the worse mean effectiveness?
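A small simulation can make the last question concrete. The sketch below uses the means from the slide (250 and 242 days) but assumes, purely for illustration, that corrosion-free days are normally distributed with a standard deviation of 30 days for both coatings. The standard deviation, the normal model, and the function name are assumptions, not values from the study.

```python
import random

def fraction_new_outlasts_standard(mean_new=250.0, mean_std=242.0,
                                   sd=30.0, trials=100_000, seed=1):
    """Estimate how often one new-coated pipe outlasts one standard-coated pipe.

    The two means come from the slide; sd=30 days and the normal model
    are illustrative assumptions.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        new_days = rng.gauss(mean_new, sd)   # one pipe with the new coating
        std_days = rng.gauss(mean_std, sd)   # one pipe with the standard coating
        if new_days > std_days:
            wins += 1
    return wins / trials

if __name__ == "__main__":
    frac = fraction_new_outlasts_standard()
    print(f"New coating lasts longer in about {frac:.0%} of head-to-head pairs")
```

Under these assumptions the new coating wins only somewhat more than half of the individual comparisons, so a better mean does not imply that it will "definitely outperform" on any given pipe. A smaller assumed spread would raise that fraction.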
6 Example 2
- A parts supplier tells us that only 1 out of
every 100 batteries shipped to us will have a
lifetime of less than 100 hours.
- We are shipped a lot of 100,000 batteries.
- We could look at all of the batteries, but doing
so would cost us far too much money.
- What might we decide to do?
7 Example 2
- Suppose we decided to randomly sample 200
batteries.
- And suppose 10 of the 200 batteries have a life of
less than 100 hours.
- What can we say to the manufacturer that might
convince him or her that the shipment we received
does not meet the specifications?
- In this case, we want to say something like the
following:
- If only one out of every 100 batteries has a
lifetime of less than 100 hours, then the
probability that 10 batteries in a random sample of 200
would have lifetimes of less than 100
hours is very small, say less than 1 in 1000.
- How could we have gotten such a bad sample?
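The "less than 1 in 1000" claim can be checked with a short calculation. The sketch below treats the number of short-lived batteries in the sample as binomial with n = 200 and p = 0.01. That model is an assumption: it is a close approximation here because the sample is small relative to the lot of 100,000. It computes the chance of seeing 10 or more short-lived batteries. The function name is illustrative.

```python
from math import comb

def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    # Sum the probabilities of 0..k-1 successes and take the complement.
    below = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1.0 - below

if __name__ == "__main__":
    prob = binomial_tail(n=200, k=10, p=0.01)
    print(f"P(at least 10 short-lived batteries in 200) = {prob:.2e}")
```

If the supplier's claim were true, a sample this bad would be far rarer than 1 in 1000, which is the basis for doubting the claim.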
8 Notation
- Population: the collection of all potential
observations.
- Sample: the subset of the population that is
actually observed. A collection of observations.
- Observing all units in the population (called a
census) is usually expensive or infeasible.
- Statistical inference extrapolates from a sample
to the population being sampled.
9 Examples
- Population: diameters of all shafts in a lot.
- Sample: diameters of the shafts that are
actually measured.
- Population: employment status of all eligible
adults in the US.
- Sample: employment status of subjects who are
interviewed.
- Population: lifetimes of the items made by a
certain manufacturing process.
- Sample: lifetimes of the subset of items tested.
10 Inferential Statistics
- Make inferences about a given population based on
a sample drawn from it. The inferences support
decisions.
- E.g., deciding whether or not to accept the lot
based on the diameters of the sample taken.
- Issues
  - The sample must be representative of the population: use a random sample.
  - The inferences/decisions based on a sample may be in error (sampling error), which is quantified by the tools of probability.
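Sampling error can be seen directly by drawing many samples from the same population and watching the estimate move. The sketch below is a minimal illustration under assumed numbers: a lot in which 1% of items are defective, with samples of 200 items. The rate, the sample size, and the function name are hypothetical, chosen only for the demonstration.

```python
import random
from statistics import mean

def sample_fractions(true_rate=0.01, sample_size=200, repeats=1000, seed=0):
    """Return the defective fraction observed in each of many random samples."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(repeats):
        # Each sampled item is defective with probability true_rate.
        defective = sum(rng.random() < true_rate for _ in range(sample_size))
        fractions.append(defective / sample_size)
    return fractions

if __name__ == "__main__":
    fractions = sample_fractions()
    print(f"smallest sample fraction: {min(fractions):.3f}")
    print(f"largest sample fraction:  {max(fractions):.3f}")
    print(f"average over all samples: {mean(fractions):.4f}")
```

Individual samples disagree with one another even though the population never changes. Their average stays close to the true rate, which is what lets probability put bounds on the error.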
11 Exercise 1
- A consumer magazine article asks,
"HOW SAFE IS THE AIR IN AIRPLANES?" and then
says that its study of air quality is based on
measurements taken on 158 different flights of
U.S.-based airlines.
- Identify the population and the sample.
13 Exercise 2
- For the following situation, identify
the population and the sample and comment on
whether you think that the sample is or isn't
representative.
- A member of Congress wants to know what his
constituents think of proposed legislation on
health insurance. His staff reports that 228
letters have been received on the subject, of
which 193 oppose the legislation.
- Answer True if the sample is representative,
False if not.
14 Exercise 2
- For the following situation, identify
the population and the sample and comment on
whether you think that the sample is or isn't
representative.
- A machinery manufacturer purchases voltage
regulators from a supplier. There are reports that
variation in the output voltage of the regulators
is affecting the performance of the finished
products. To assess the quality of the supplier's
production, the manufacturer subjects a sample of
5 regulators from the last shipment to careful
laboratory analysis.
- Answer True if the sample is representative,
False if not.
15 Sampling
- In order for the inferences to be reliable, we
must have a representative sample.
- A haphazard or opportunistic sample is prone to
bias.
- Sampling designs
  - Simple Random Sample (SRS)
    - Independence: the selection of one unit has no influence on the selection of other units.
    - Lack of bias: each unit has the same chance of being chosen.
    - All possible samples are equally likely.
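A simple random sample is easy to draw in software once every unit in the population has a label. The sketch below uses Python's standard `random.sample`, which selects without replacement so that every subset of the requested size is equally likely. The unit labels and the wrapper function name are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

if __name__ == "__main__":
    battery_ids = list(range(100_000))   # hypothetical unit labels
    chosen = simple_random_sample(battery_ids, 200, seed=42)
    print(len(chosen), len(set(chosen)))  # sample size, number of distinct units
```

Fixing the seed makes the draw reproducible, which helps when a sample has to be audited later.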
17 What might cause samples to not be random?
- A professor uses students.
- A survey is done by phone.
  - Who might get missed?
- A survey is done on the internet.
  - Who might get missed?
- A survey is left in a doctor's office.
  - Who might choose to complete or not complete it?
18 Sampling Designs
- Stratified random sample
  - Divide the population into relatively homogeneous subpopulations, called strata.
  - Take an SRS from each stratum.
  - E.g., estimation of out-of-spec raw material can be done by taking a separate SRS from the materials supplied by each vendor.
  - More accurate estimate, plus separate estimates for each vendor.
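The two steps of stratified sampling (group the units by stratum, then take an SRS within each group) can be sketched as follows. The vendor names, out-of-spec rates, and function name are made up for illustration. Only the procedure itself comes from the slide.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Take a simple random sample of up to n_per_stratum units from each stratum.

    units: list of items; stratum_of: function mapping an item to its stratum label.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # SRS within each stratum; small strata are taken whole.
    return {label: rng.sample(members, min(n_per_stratum, len(members)))
            for label, members in strata.items()}

if __name__ == "__main__":
    # Hypothetical raw-material lots as (vendor, is_out_of_spec) pairs.
    rng = random.Random(0)
    lots = [(vendor, rng.random() < rate)
            for vendor, rate in [("A", 0.02), ("B", 0.10), ("C", 0.05)]
            for _ in range(1000)]
    sample = stratified_sample(lots, lambda lot: lot[0], 100, seed=1)
    for vendor, chosen in sample.items():
        share = sum(flag for _, flag in chosen) / len(chosen)
        print(vendor, f"estimated out-of-spec share: {share:.2f}")
```

Because each vendor is sampled separately, the result gives a per-vendor estimate directly, matching the "separate estimates for each vendor" point above.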
19 Example
- You want to determine the relationship between
the weather and the number of children driven to
school in cars.
- Is it better to observe every day for a week or
on five Thursdays in a row?
- Why?
- What if you had the resources to observe on 50
days?
21 Designing Experiments: Controls
- A key question in all scientific explorations is
"compared to what?"
- Having a control is essential for interpreting
the results of a study.
- A controlled study is one of the following:
  - A study containing a control group
  - A study in which the investigator assigns treatment and non-treatment (control)
- When there is no control, evidence is often
called anecdotal, meaning that, while it may be
true, it doesn't prove anything.
22 Exercise 3
- Which of the following are based on a sample, and
not just anecdotal?
- "Seatbelts are a bad safety measure: 10 drivers
saved their lives by being thrown free of their
car wrecks last year in Massachusetts."
- Out of 102 customers at a quick-stop market, 31
made their purchases with a credit card.
- Out of 100 cars manufactured on the night shift,
95 were fully defect-free.
- Hoan says that adding a few drops of almond
flavoring makes her cookies taste better.
23 Anecdotal vs. sample
- "Seatbelts are a bad safety measure: 10 drivers
saved their lives by being thrown free of their
car wrecks last year in Massachusetts."
- What kind of data would we want to look at to see
if seatbelts are a good idea or not?
24 Designing Experiments: Randomized versus Observational
- In a randomized study, the researcher assigns
treatment randomly.
- In an observational study, the researcher
observes differences between two groups
(treatment and control) but does not assign
treatment.
25 Designing Experiments: Randomized versus Observational
- Observational studies can be very hard to
interpret, since the subjects have often
self-selected into the treatment and control
groups.
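The effect of self-selection can be illustrated with a simulation in which the treatment truly does nothing. In the sketch below every number and name is hypothetical. A hidden trait ("health") influences the outcome, and in the observational case it also influences who ends up in the treatment group. In the randomized case a coin flip decides.

```python
import random
from statistics import mean

def apparent_effect(n=20_000, self_selected=True, seed=0):
    """Difference in mean outcome (treated minus control) when the true effect is zero."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)            # hidden baseline trait (assumed)
        outcome = health + rng.gauss(0.0, 1.0)  # treatment adds nothing to the outcome
        if self_selected:
            # Observational study: healthier subjects tend to opt in.
            takes_treatment = health + rng.gauss(0.0, 1.0) > 0
        else:
            # Randomized study: assignment ignores the hidden trait.
            takes_treatment = rng.random() < 0.5
        (treated if takes_treatment else control).append(outcome)
    return mean(treated) - mean(control)

if __name__ == "__main__":
    print(f"observational: {apparent_effect(self_selected=True):+.2f}")
    print(f"randomized:    {apparent_effect(self_selected=False):+.2f}")
```

With self-selection the treated group looks clearly better even though the treatment has no effect. With random assignment the difference is close to zero.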
26 Observational Studies: Example 1
- "Pet a day keeps the doctor away"
- Those who owned pets had contact with a doctor
8.42 times in a year.
- Those who did not own pets had contact 9.49
times.
- What interpretation is being made?
- Could there be another interpretation?
27 Observational Studies: Example 1
- "Pet a day keeps the doctor away"
- Those who owned pets had contact with a doctor
8.42 times in a year.
- Those who did not own pets had contact 9.49
times.
- Does pet ownership cause good health, or
- does good health cause pet ownership, or
- does something else cause both?
28, 29 [Diagrams: Good Health and Pet Ownership, with the link between them marked "?"]
30 [Diagram: "A desire to care for others" shown alongside Good Health and Pet Ownership, with a link marked "?"]
31 Do storks bring babies?
[Diagram: Stork Nests and Human baby births, with the link between them marked "?"]
32 Do babies cause stork nests?
[Diagram: Human baby births and Stork Nests, with the link between them marked "?"]
33 Are babies born (and stork nests built) in particular months?
[Diagram: month of year shown alongside Human baby births and Stork Nests, with a link marked "?"]
35 Observational Studies: Example 2
- Likely to have a nervous breakdown?
- Does marriage make women crazy?
- Do crazy women make a point of getting married?
- Is there a third issue, say, that rich women are more likely to marry
and more likely to be crazy?
36 Observational Studies: Example 3
- A common finding is that African Americans do not
go to college at the same rates as the general
population.
- It has been hypothesized that this could be
explained by income.
- But it is consistently found that African
Americans do not go to college at the same rates,
even controlling for income.
- Discussion over?