Probability and Statistics: Today's Goals

Slides: 37
Provided by: erinb45

1
Probability and Statistics: Today's Goals
  • Understand what constitutes an acceptable sample
    for a scientific study
  • Understand the pitfalls in observational studies
  • HW 10 (due Wed. April 22): Ch 5, problems 27, 68, 69, plus
    one web problem.
  • Article will be due April 24

2
What is statistics?
  • Statistics are numbers that summarize the results
    of a study.
  • Statistical inference formalizes the process of
    learning through observation.
  • Statistics is the field that studies how to
    efficiently collect informative data, explore and
    interpret these data and draw conclusions based
    on them.

3
Statistical Inference
  • Designing Experiments
  • What data should be collected?
  • Sampling questions
  • Analyzing data from experiments
  • Sample statistics
  • Pictorial representations
  • Confidence intervals
  • Hypothesis testing.
  • Regressions for correlation analysis

4
Example 1
  • An engineer starting a company has developed a
    new coating for pipes to resist corrosion. He
    believes that it will perform better than
    standard coatings and wants to convince clients
    that the new coating is better.
  • He obtains a measure of corrosion by applying new
    coating to one section of the pipe and standard
    coating to a second section of the pipe.
  • First pipe (new coating): corrosion free for 256 days.
  • Second pipe (standard coating): corrosion free for 243
    days.
  • Can we conclude that the new coating works better
    than the old coating?
  • What would you do differently?

5
What can we do to improve things?
  • Collect multiple observations.
  • So suppose we collect 100 observations of the
    coating effectiveness. (Controlling for the
    exposure for coated and non-coated)
  • We then compute the mean number of corrosion free
    days for both treatments, the new coating and the
    standard coating.
  • Suppose now that the mean number of corrosion
    free days for the new coating is 250 and the mean
    number of corrosion free days for the standard
    coating is 242.
  • Can we conclude that the coating with the better
    mean effectiveness will definitely outperform the
    coating with the worse mean effectiveness?
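
A rough sketch of why the answer is "not definitely": the slide gives only the two means (250 and 242 days), so the spread used below is an assumption. Here both coatings are assumed to have normally distributed corrosion-free times with a hypothetical standard deviation of 20 days, and the code computes how often a single standard-coated pipe would outlast a single new-coated pipe.

```python
import math
from statistics import NormalDist

MEAN_NEW = 250.0    # mean corrosion-free days, new coating (from the slide)
MEAN_STD = 242.0    # mean corrosion-free days, standard coating (from the slide)
ASSUMED_SD = 20.0   # hypothetical spread; the slide does not report one


def prob_standard_outlasts_new(mean_new, mean_std, sd):
    """P(standard pipe lasts longer than new pipe) for two independent
    normal lifetimes that share the same standard deviation."""
    # The difference (new - standard) is normal with mean equal to the gap
    # between the means and standard deviation sd * sqrt(2).
    diff = NormalDist(mu=mean_new - mean_std, sigma=sd * math.sqrt(2))
    return diff.cdf(0.0)


p = prob_standard_outlasts_new(MEAN_NEW, MEAN_STD, ASSUMED_SD)
print(f"P(standard coating outlasts new coating on one pipe) = {p:.2f}")
```

Under that assumed spread the probability comes out close to 0.4, so a better mean does not guarantee a better result on any particular pipe. A smaller spread would shrink this probability; the means alone cannot tell us which case we are in.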

6
Example 2
  • A parts supplier tells us that only 1 out of
    every 100 batteries shipped to us will have a
    lifetime of less than 100 hours.
  • We are shipped a lot of 100,000 batteries.
  • We could look at all of the batteries, but to do
    so would cost us way too much money.
  • What might we decide to do?

7
Example 2
  • Suppose we decided to randomly sample 200
    batteries.
  • And suppose 10 of the 200 batteries have life
    less than 100 hours.
  • What can we say to the manufacturer that might
    convince him or her that the shipment we received
    does not meet the specifications?
  • In this case, we want to say something like the
    following:
  • If only one out of every 100 batteries has a
    lifetime of less than 100 hours, then the
    probability that in a random sample of 200
    batteries 10 would have lifetimes less than 100
    hours is very small, say less than 1 in 1000.
  • How could we have gotten such a bad sample?
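
The "very small" probability on this slide can be checked directly. The sketch below assumes the 200 sampled batteries behave like independent draws, each with a 1% chance of a short lifetime (a binomial model, reasonable because 200 is tiny compared with the lot of 100,000), and it computes the chance of seeing 10 or more short-lived batteries. The function name `binomial_tail` is just an illustrative helper.

```python
from math import comb


def binomial_tail(n, p, k):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    # Sum the probabilities of 0, 1, ..., k-1 successes and subtract from 1.
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


# n = 200 sampled batteries, p = 0.01 claimed defect rate, k = 10 observed.
tail = binomial_tail(200, 0.01, 10)
print(f"P(10 or more short-lived batteries in 200) = {tail:.2e}")
```

If the supplier's 1-in-100 claim were true, this tail probability is well below 1 in 1000, which is the kind of statement the slide suggests making to the manufacturer.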

8
Notation
  • Population: the collection of all potential
    observations.
  • Sample: the subset of the population that is
    actually observed. A collection of observations.
  • Observing all units in the population (called
    census) is usually expensive or infeasible.
  • Statistical inference extrapolates from a sample
    to the population being sampled.

9
Examples
  • Population: Diameters of all shafts in a lot.
  • Sample: Diameters of the shafts that are
    actually measured.
  • Population: Employment status of all eligible
    adults in the US.
  • Sample: Employment status of subjects who are
    interviewed.
  • Population: Lifetimes of the items made by a
    certain manufacturing process.
  • Sample: Lifetimes of the subset of items tested.

10
Inferential Statistics
  • Make inferences about a given population based on
    a sample drawn from it. The inferences support
    decisions.
  • E.g. Deciding whether or not to accept the lot
    based on the diameters of the sample taken.
  • Issues
  • The sample must be representative of the
    population.
  • Random sample
  • The inferences/decisions based on a sample may be
    in error (sampling error)
  • Quantified by the tools of probability.

11
Exercise 1
  • A consumer magazine article asks, "HOW SAFE IS
    THE AIR IN AIRPLANES?" and then says that its
    study of air quality is based on measurements
    taken on 158 different flights of U.S.-based
    airlines.
  • Identify the population and the sample.

12
Exercise 2
  • For each of the following situations, identify
    the population and the sample and comment on
    whether you think that the sample is or isn't
    representative.
  • A member of Congress wants to know what his
    constituents think of proposed legislation on
    health insurance. His staff reports that 228
    letters have been received on the subject, of
    which 193 oppose the legislation.
  • A machinery manufacturer purchases voltage
    regulators from a supplier. There are reports that
    variation in the output voltage of the regulators
    is affecting the performance of the finished
    products. To assess the quality of the supplier's
    production, the manufacturer subjects a sample of
    5 regulators from the last shipment to careful
    laboratory analysis.

13
Exercise 2
  • For each of the following situations, identify
    the population and the sample and comment on
    whether you think that the sample is or isn't
    representative.
  • A member of Congress wants to know what his
    constituents think of proposed legislation on
    health insurance. His staff reports that 228
    letters have been received on the subject, of
    which 193 oppose the legislation.
  • True if the sample is representative
  • False if not

14
Exercise 2
  • For each of the following situations, identify
    the population and the sample and comment on
    whether you think that the sample is or isn't
    representative.
  • A machinery manufacturer purchases voltage
    regulators from a supplier. There are reports that
    variation in the output voltage of the regulators
    is affecting the performance of the finished
    products. To assess the quality of the supplier's
    production, the manufacturer subjects a sample of
    5 regulators from the last shipment to careful
    laboratory analysis.
  • True if sample is representative
  • False if not

15
Sampling
  • In order for the inferences to be reliable, we
    must have a representative sample
  • A haphazard or opportunistic sample is prone to
    bias
  • Sampling designs
  • Simple Random Sample (SRS)
  • Independence the selection of one unit has no
    influence on the selection of other units
  • Lack of bias each unit has the same chance of
    being chosen
  • All possible samples are equally
    likely.
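
A minimal sketch of drawing a simple random sample, assuming the population can be listed as unit IDs (the IDs 1 to 1000 below are hypothetical). Python's `random.sample` draws without replacement, so every subset of the requested size is equally likely, which matches the last bullet above.

```python
import random


def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every size-n subset is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(units, n)


population = list(range(1, 1001))  # hypothetical unit IDs 1..1000
sample = simple_random_sample(population, 20, seed=1)
print(sorted(sample))
```

The key design point is that the random number generator, not the investigator, decides which units are observed.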

16
What might cause samples to not be random?

17
What might cause samples to not be random?
  • A professor uses students.
  • A survey is done by phone.
  • who might get missed?
  • A survey is done on the internet.
  • who might get missed?
  • A survey is left in a doctor's office.
  • who might choose to complete or not complete?

18
Sampling Designs
  • Stratified random sample
  • Divide the population into relatively homogeneous
    subpopulations, called strata
  • Take a SRS from each stratum
  • e.g. estimation of out-of-spec raw material can
    be done by taking separate SRS from the materials
    supplied by different vendors
  • More accurate estimate plus separate estimates
    for each vendor
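
The vendor example can be sketched as follows, assuming each item of raw material is tagged with the vendor that supplied it. The vendor names, batch IDs, and the helper `stratified_sample` are hypothetical and only illustrate the design: group the population into strata, then take a separate SRS within each one.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, n_per_stratum, seed=None):
    """Group items into strata, then take a simple random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    # Sample each stratum separately (capped at the stratum's size).
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }


# Hypothetical raw-material batches as (vendor, batch_id) pairs.
batches = [(vendor, i) for vendor in ("A", "B", "C") for i in range(50)]
picked = stratified_sample(batches, lambda batch: batch[0], 5, seed=0)
for vendor, chosen in picked.items():
    print(vendor, chosen)
```

Because every vendor is guaranteed to appear in the sample, the out-of-spec rate can be estimated for each vendor separately as well as overall.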

19
Example
  • You want to determine the relationship between
    the weather and the number of children driven to
    school in cars.
  • Is it better to observe every day for a week or
    five Thursdays in a row?
  • Why?
  • What if you had the resources to observe on 50
    days?

20
Designing Experiments: Controls
  • A key question in all scientific explorations is
    "compared to what?"
  • Having a control is essential for interpreting
    the results of a study.
  • A controlled study is one of the following
  • A study containing a control group
  • A study in which the investigator assigns
    treatment and non-treatment (control).

21
Designing Experiments: Controls
  • A key question in all scientific explorations is
    "compared to what?"
  • Having a control is essential for interpreting
    the results of a study.
  • A controlled study is one of the following
  • A study containing a control group
  • A study in which the investigator assigns
    treatment and non-treatment (control).
  • When there is no control, evidence is often
    called anecdotal, meaning that, while it may be
    true, it doesn't prove anything.

22
Exercise 3
  • Which of the following are based on a sample, and
    not just anecdotal?
  • Seatbelts are a bad safety measure: 10 drivers
    saved their lives by being thrown free of their
    car wrecks last year in Massachusetts.
  • Out of 102 customers at a quick-stop market, 31
    made their purchases with a credit card.
  • Out of 100 cars manufactured in the night shift,
    95 were fully defect-free.
  • Hoan says that adding a few drops of almond
    flavoring makes her cookies taste better.

23
Anecdotal vs sample
  • Seatbelts are a bad safety measure: 10 drivers
    saved their lives by being thrown free of their
    car wrecks last year in Massachusetts.
  • What kind of data would we want to look at to see
    if seatbelts are a good idea or not?

24
Designing Experiments: Randomized versus Observational
  • In a randomized study the researcher assigns
    treatment randomly.
  • In an observational study, the researcher
    observes differences between two groups
    (treatment and control) but does not assign
    treatment.

25
Designing Experiments: Randomized versus Observational
  • Observational studies can be very hard to
    interpret, since the subjects have often
    self-selected into the treatment and control
    groups.

26
Observational Studies: Example 1
  • "Pet a day keeps the doctor away"
  • Those who owned pets had contact with a doctor
    8.42 times in a year.
  • Those who did not own pets had contact 9.49
    times.
  • What interpretation is being made?
  • Could there be another interpretation?

27
Observational Studies: Example 1
  • "Pet a day keeps the doctor away"
  • Those who owned pets had contact with a doctor
    8.42 times in a year.
  • Those who did not own pets had contact 9.49
    times.
  • Does pet ownership cause good health, or
  • Does good health cause pet ownership, or
  • Does something else cause both?
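
The third possibility can be illustrated with a small simulation. Everything in it is an assumption made for illustration: a hidden trait raises the chance of owning a pet and lowers the number of doctor visits, while pet ownership itself has no effect on visits at all. The probabilities and visit counts are invented and are not the figures from the study quoted above.

```python
import random
from statistics import mean


def simulate(n=20000, seed=0):
    """Simulate doctor visits where a hidden trait drives both pet
    ownership and health, and pets have no causal effect on visits."""
    rng = random.Random(seed)
    owners, non_owners = [], []
    for _ in range(n):
        has_trait = rng.random() < 0.5  # hidden common cause
        # The trait makes pet ownership more likely (invented rates).
        owns_pet = rng.random() < (0.6 if has_trait else 0.3)
        # Visits depend on the trait only; owns_pet is never used here.
        visits = rng.gauss(8.0 if has_trait else 10.0, 2.0)
        (owners if owns_pet else non_owners).append(visits)
    return mean(owners), mean(non_owners)


owner_mean, non_owner_mean = simulate()
print(f"mean visits, pet owners:     {owner_mean:.2f}")
print(f"mean visits, non-pet owners: {non_owner_mean:.2f}")
```

Pet owners end up with fewer visits on average even though pets do nothing in this model, so an observed difference between the groups cannot by itself separate the three explanations.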

28
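[Diagram: "Pet Ownership" and "Good Health", with a "?" on the link between them]

29
[Diagram: "Pet Ownership" and "Good Health", with a "?" on the link between them]

30
[Diagram: "A desire to care for others", "Good Health", and "Pet Ownership"; a "?" marks the relationship in question]

31
Do storks bring babies?
[Diagram: "Stork Nests" and "Human baby births", with a "?" on the link between them]

32
Do babies cause stork nests?
[Diagram: "Human baby births" and "Stork Nests", with a "?" on the link between them]

33
Are babies born (and stork nests built) in particular months?
[Diagram: "month of year", "Human baby births", and "Stork Nests"; a "?" marks the relationship in question]
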
34
Observational Studies: Example 2
  • Likely to have a nervous breakdown?

35
Observational Studies: Example 2
  • Likely to have a nervous breakdown?

Does marriage make women crazy? Do crazy women
make a point of getting married? Is there a 3rd
issue, say, that rich women are more likely to
marry and more likely to be crazy?

36
Observational Studies: Example 3
  • A common finding is that African Americans do not
    go to college at the same rates as the general
    population.
  • It has been hypothesized that this could be
    explained by income.
  • But, it is consistently found that African
    Americans do not go to college at the same rates,
    even controlling for income.
  • Discussion over?