Concepts of Probability and Statistics: Statistical Intervals

Slides: 39
Provided by: leona6

Transcript and Presenter's Notes

1
Concepts of Probability and Statistics: Statistical Intervals
2
Probability and Statistics - Best Illustrated
with an example
  • Suppose I flip a coin 10 times and get 8H, 2T,
    but I don't know whether the coin is really fair.
    What is my chance of tossing heads on the next throw?
  • A) If I only have sample information at my
    disposal, I estimate 80%.
  • B) If I consider the total population of
    possible tosses of a fair coin, I would say 50%.
  • These two cases illustrate the basic difference
    between probability and statistics.

3
  • Probability deals with the likelihood of
    observing an event arising from a known process
    (model).
  • i) Based on deductive reasoning
  • ii) Begin with knowing something about the
    population (e.g. that the coin is fair)
  • iii) Reason from population to the sample (e.g.
    how likely will the next toss be H?)
  • Statistics approaches the problem backwards:
    given a collection of observed data (the sample),
    one of many possible subsets of data, what can be
    said about the process (the entire population of
    data)?
  • i) Based on inductive reasoning
  • ii) Begin with knowing only about the sample
    results, not the population (e.g. 8H and 2T)
  • iii) Reason from sample to population (e.g. is
    the coin fair, based on the sample results?)

4
  • Since we don't usually know the behaviour of the
    overall population, but only the sample
    results we collect, we are forced to make
    statistical inferences in order to estimate
    typical behaviour in the population from the
    sample results.
  • As the coin tossing example illustrates, even if
    the average behaviour of the coin is to land on H
    50% of the time, specific sample results will
    vary from experiment to experiment.
  • So to adequately estimate or predict the coin's
    true average behaviour, one must carefully
    account for between-sample variability.
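
The between-sample variability described above can be seen in a small simulation. This is only an illustrative sketch: the seed, the number of experiments, and the helper name heads_fraction are assumptions chosen for the example, not part of the presentation.

```python
import random

def heads_fraction(n_tosses, p_heads, rng):
    """Fraction of heads in one experiment of n_tosses tosses."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses

rng = random.Random(42)  # fixed seed (arbitrary) so the run is repeatable

# Twenty separate 10-toss experiments with a fair coin (p = 0.5).
# Individual results scatter around 0.5; a result like 0.8 can occur.
fractions = [heads_fraction(10, 0.5, rng) for _ in range(20)]
print(fractions)
print("min:", min(fractions), "max:", max(fractions))

# Averaging over many experiments settles near the true value of 0.5.
many = [heads_fraction(10, 0.5, rng) for _ in range(10_000)]
overall = sum(many) / len(many)
print("average over 10,000 experiments:", round(overall, 3))
```

Each 10-toss experiment is one sample; the spread of the printed fractions is the variability that an interval estimate has to account for.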

5
(No Transcript)
6
  • In an environmental pollution setting, the sample
    results will vary from period to period even if
    no contamination has occurred. Why?
  • Variation in lab measurements of concentration of
    individual samples.
  • Sampling variability from field collection and
    handling.
  • Natural variation in background levels of
    pollutants.
  • These factors combine to give random variation in
    sample results that will be seen whether or not
    contamination has occurred.
  • Despite the sample fluctuations due to random
    variation, we still want to know, for example,
    whether compliance concentrations are significantly
    higher than background concentrations on average.

7
  • Note that the degree of sample fluctuation in
    background and compliance point data relative to
    the difference in average background and
    compliance point concentration levels plays a
    crucial role in distinguishing background
    behaviour from compliance point behaviour.
  • Only by careful measurement of sample variability
    can we accurately make statistical inferences
    about the behaviour of the overall population.
  • For example, consider the question:
  • Is the long-term average concentration level at
    the compliance point greater than the
    background level?

8
  • One way to answer the above question is to set
    up a hypothesis test using the results of sample
    groundwater analyses.
  • A hypothesis test makes a decision as to which of
    two competing notions is closer to reality or
    truth.
  • It is a type of statistical inference that reasons
    from sample results back to the population.
  • It is used in environmental monitoring settings
    because samples are costly to analyze, so only
    limited data are typically available for
    statistical purposes.

9
  • Example: Flip a coin 100 times and get all
    heads.
  • What do we decide about the coin? What is the
    chance of getting heads on the next toss?
  • Answer: The chance is 100%. Why? Because the coin
    is almost certainly two-headed!
  • The notion being tested is whether the coin is
    fair or not.
  • If we say it is fair, what evidence can we use to
    support our claim?
  • Prob(100 H in 100 tosses of a fair coin) = (1/2)^100,
    which is approximately zero.
  • However, the alternative notion, that the coin is
    two-headed, is well supported by the evidence at hand.
  • The key is to determine which notion is best
    supported by the evidence.
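
As a rough check on the numbers above, the probability of 100 heads under each notion can be computed exactly. This is a minimal sketch; the variable names are illustrative only.

```python
from fractions import Fraction

n = 100
p_fair = Fraction(1, 2) ** n       # P(100 H in 100 tosses | fair coin)
p_two_headed = Fraction(1, 1)      # P(100 H in 100 tosses | two-headed coin)

# The fair-coin probability is tiny, on the order of 1e-30.
print(float(p_fair))

# How many times better the two-headed notion explains the data.
likelihood_ratio = p_two_headed / p_fair
print(likelihood_ratio == 2 ** n)
```

The ratio of the two probabilities is 2^100, which is why the evidence sides so strongly with the two-headed notion.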

10
  • In an environmental setting (monitoring and
    remediation), first make sure that the hypothesis
    being tested is appropriate.
  • In detection monitoring, this becomes H0: no
    contamination versus Ha: evidence of
    contamination, i.e. "innocent until proven
    guilty".
  • In remediation, the null hypothesis changes to
    H0: "guilty until proven innocent", or "dirty
    until proven clean".
  • Then choose a statistical test that measures
    whether the sample data side better with H0 or Ha.
11
Point Estimation
  • The sample median and sample mean estimate the
    corresponding center points of a population.
    Such estimates are called point estimates.
  • For example, point estimators for the 100-year
    flood might be
  • a) the largest flood which occurred during 100
    years of record.
  • b) Q0.99 = mean(Q) + std(Q) x Z0.99, using the
    mean and standard deviation of the flood record
    (assumes a normal distribution of the Qs).
  • c) Q0.99 = exp[mean(log Q) + std(log Q) x K0.99],
    where K0.99 is the P3 (Pearson Type III)
    distribution frequency factor for a skewness g.
  • d) Q0.99 from regional equations.
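
Estimators a) to c) can be sketched as follows. The flow values are hypothetical and are not data from the presentation. The frequency factor K0.99 is normally read from tables; here the Wilson-Hilferty approximation is swapped in as a stand-in, and the skewness g is taken as the bias-adjusted sample skewness of the log flows. Both choices are assumptions.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical annual peak flows (m^3/s); NOT data from the presentation.
flows = [310, 450, 280, 520, 390, 610, 340, 480, 295, 700,
         360, 430, 550, 325, 405]

z99 = NormalDist().inv_cdf(0.99)   # standard normal 0.99 quantile (Z0.99)

# a) Largest flood in the record.
q_max = max(flows)

# b) Normal assumption: mean(Q) + std(Q) x Z0.99.
q_normal = mean(flows) + stdev(flows) * z99

# c) Log-space estimate with a Pearson Type III frequency factor.
logs = [math.log(q) for q in flows]
n = len(logs)
m, s = mean(logs), stdev(logs)
# Bias-adjusted sample skewness of the logs (assumed definition of g).
g = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in logs)

def wilson_hilferty_k(z, skew):
    """Approximate P3 frequency factor; reduces to z when skew is 0."""
    if abs(skew) < 1e-8:
        return z
    return (2 / skew) * ((1 + skew * z / 6 - skew ** 2 / 36) ** 3 - 1)

q_lp3 = math.exp(m + s * wilson_hilferty_k(z99, g))

print("a) record maximum:", q_max)
print("b) normal:", round(q_normal, 1))
print("c) log-space P3 (approximate K):", round(q_lp3, 1))
```

The three estimators give different answers from the same record, which is the reason the selection criteria on the next slide matter.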

12
Things to Consider in Selecting an Estimator
  • It should have little or no BIAS
  • It should have low MEAN SQUARE ERROR.
  • It should be RESISTANT, i.e. not affected by a
    few unusual values.
  • It should be ROBUST, i.e. its MSE should
    compare favourably under a wide range of
    assumptions (e.g. about the distribution).
  • It should be REPRODUCIBLE. Others should be able
    to repeat the calculation with no difference in
    results.

13
Interval Estimation
  • We want to estimate a statistical interval
    because a point estimate tells us nothing about
    the variability of the statistic. Since any
    statistic is itself a random variable, it is thus
    very important to know how it might fluctuate.
  • E.g. there is a big difference between
  • 20 ppm ± 10 ppm and 20 ppm ± 2 ppm.
  • Interval estimates are intervals which have a
    stated probability of containing the true
    population value.
  • The intervals are wider for data sets having
    greater variability.

14
  • Interval estimates can provide three pieces of
    information which point estimates cannot
  • 1. A statement of the probability or likelihood
    that the interval contains the true population
    value (its reliability).
  • - confidence intervals.
  • 2. A statement of the likelihood that a single
    data point with specified magnitude comes from
    the population under study.
  • - prediction intervals.
  • 3. A statement of the likelihood that the
    interval contains a certain proportion of all
    population values.
  • - tolerance intervals.

15
Differences Between the Interval Types
  • Statistical intervals have different uses
    depending on the purpose in mind
  • Astronaut example
  • An astronaut awaiting his or her tour of duty on
    the space shuttle is not concerned about what
    happens on average during such flights (confidence
    interval), nor what happens on 95% of all flights
    (tolerance interval), but rather with what will
    happen on his or her specific flight (prediction
    interval).

16
  • Casino example
  • A player is concerned with what he or she will
    win on the next few bets (prediction interval);
    the casino owners care about their average
    winnings in order to make a profit (confidence
    interval); while the roulette operator, who makes
    a commission on each bet lost by a player, is
    concerned about the long-run proportion of lost
    bets (tolerance interval).
  • Remember, the type of interval used can make a
    huge difference in the resulting decision: in
    general, the widths of confidence, tolerance, and
    prediction intervals will be very different on
    the same sample data.
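
To make the difference in widths concrete, the sketch below computes the half-width of each interval type for one sample, using the usual normal-theory formulas. The concentrations are hypothetical. The two constants are tabulated values assumed for n = 10: the two-sided 95% Student t quantile with 9 degrees of freedom, and the two-sided tolerance factor for 95% coverage with 95% confidence.

```python
import math
from statistics import mean, stdev

# Hypothetical concentrations (ppm); NOT data from the presentation.
conc = [18.2, 21.5, 19.8, 22.1, 20.4, 17.9, 23.0, 19.1, 20.8, 21.2]
n = len(conc)
xbar, s = mean(conc), stdev(conc)

# Tabulated constants for n = 10 (assumed, taken from standard tables).
T_975_DF9 = 2.262         # t quantile, 95% two-sided, 9 degrees of freedom
K_TOL_95_95_N10 = 3.379   # tolerance factor, 95% coverage, 95% confidence

ci_half = T_975_DF9 * s / math.sqrt(n)          # confidence interval (mean)
pi_half = T_975_DF9 * s * math.sqrt(1 + 1 / n)  # prediction (one new value)
tol_half = K_TOL_95_95_N10 * s                  # tolerance (95% of values)

print(f"mean = {xbar:.2f} ppm, s = {s:.2f} ppm")
print(f"confidence: +/- {ci_half:.2f} ppm")
print(f"prediction: +/- {pi_half:.2f} ppm")
print(f"tolerance:  +/- {tol_half:.2f} ppm")
```

On the same ten values the confidence interval is the narrowest and the tolerance interval the widest, so the choice of interval type can change the decision.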

17
  • The width of the interval indicates the amount of
    potential error or variability associated with
    the sample average.
  • The width depends on three factors:
  • i. Estimated standard deviation of sample data,
    s.
  • ii. Level of confidence chosen beforehand.
  • iii. Sample size, n.
  • To reduce the width of a random interval, either
  • i. Increase the sample size, or
  • ii. Lower the acceptable confidence level (e.g.
    from 95% to 90%).
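
The effect of the three factors can be sketched with a simplified half-width formula, z x s / sqrt(n). Using the normal quantile z instead of Student's t is an assumption made to keep the example dependency-free; it understates the width for small n. The function name half_width and the input values are illustrative.

```python
import math
from statistics import NormalDist

def half_width(s, n, confidence):
    """Approximate half-width of a two-sided interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * s / math.sqrt(n)

s = 5.0  # hypothetical sample standard deviation, ppm

base = half_width(s, 10, 0.95)
more_data = half_width(s, 40, 0.95)        # four times the sample size
lower_conf = half_width(s, 10, 0.90)       # 95% lowered to 90%

print(f"n=10, 95%: +/- {base:.2f}")
print(f"n=40, 95%: +/- {more_data:.2f}")
print(f"n=10, 90%: +/- {lower_conf:.2f}")
```

Because the width scales with 1/sqrt(n), quadrupling the sample size only halves the interval.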

18
Parametric, Non-parametric, Symmetric and
Asymmetric Intervals
  • Parametric
  • assumes data (or transformed data) are normally
    distributed.
  • Non-parametric
  • distribution free (based on ranks).
  • Symmetric
  • interval divided equally on either side of true
    value. Usually used with parametric intervals.
  • Asymmetric
  • Interval not divided equally on either side of
    true value. It is used with skewed data (e.g.
    lognormal data).

19
Sampling Distributions
  • Probability distributions of statistics (e.g.
    mean, median, standard deviation). They are used
    for constructing statistical intervals.

20-37
(No Transcript)
38
  • While the previous statement is true,
    environmental data are usually positively
    skewed. In these cases, use of the
    t-distribution may not be appropriate.
  • Sometimes the t-distribution can be used after a
    suitable transformation of the data is made, e.g.
    after a log transform.
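
A log-transform interval might be sketched as below. The concentrations are hypothetical, and the t quantile is a tabulated value assumed for 9 degrees of freedom. Note that back-transforming gives an interval for the geometric mean (the median of a lognormal population), not the arithmetic mean.

```python
import math
from statistics import mean, stdev

# Hypothetical positively skewed concentrations (ppm);
# NOT data from the presentation.
conc = [1.2, 0.8, 2.5, 1.1, 0.9, 6.3, 1.7, 0.7, 3.4, 1.4]

T_975_DF9 = 2.262  # tabulated t quantile, 95% two-sided, 9 d.f. (assumed)

# Work on the log scale, where the data are closer to normal.
logs = [math.log(c) for c in conc]
n = len(logs)
m, s = mean(logs), stdev(logs)
half = T_975_DF9 * s / math.sqrt(n)

# Back-transform the end points to the original units.
centre = math.exp(m)                 # geometric mean
lower, upper = math.exp(m - half), math.exp(m + half)

print(f"geometric mean = {centre:.2f} ppm")
print(f"95% interval   = ({lower:.2f}, {upper:.2f}) ppm")
```

The interval is symmetric on the log scale but asymmetric in ppm: the upper end lies further from the centre than the lower end.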