Chapter 5 Sampling and Statistics - PowerPoint PPT Presentation

Provided by: westgaEdu
Learn more at: https://www.westga.edu

1
Chapter 5: Sampling and Statistics
  • Math 6203
  • Fall 2009
  • Instructor: Ayona Chatterjee

2
5.1 Sampling and Statistics
  • Typical statistical problem:
  • We have a random variable X whose pdf f(x) or pmf
    p(x) is unknown.
  • Either f(x) or p(x) is completely unknown,
  • or the form of f(x) or p(x) is known down to a
    parameter θ, where θ may be a vector.
  • Here we will consider the second option.
  • Example: X has an exponential distribution with θ
    unknown.

3
  • Since θ is unknown, we want to estimate it.
  • Estimation is based on a sample.
  • We will formalize the sampling plan:
  • Sampling with replacement.
  • The draws are independent and the Xs have the same
    distribution.
  • Sampling without replacement.
  • The draws are not independent, but the Xs still have
    the same distribution.
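The two sampling plans can be illustrated with a minimal Python sketch. The population of ten labelled units and the helper names are hypothetical, chosen only for illustration; they are not from the slides.

```python
import random

def draw_with_replacement(population, n, rng):
    """n independent draws; the same unit may be drawn more than once."""
    return rng.choices(population, k=n)

def draw_without_replacement(population, n, rng):
    """n dependent draws; each unit can appear at most once."""
    return rng.sample(population, n)

rng = random.Random(0)            # fixed seed so the run is repeatable
population = list(range(1, 11))   # hypothetical population of 10 units

with_repl = draw_with_replacement(population, 5, rng)
without_repl = draw_without_replacement(population, 5, rng)
print(with_repl)
print(without_repl)
```

In both plans each single draw has the same marginal distribution (uniform over the population); only the dependence between draws differs.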

4
Random Sample
  • The random variables X1, X2, ..., Xn constitute a
    random sample on the random variable X if they
    are independent and each has the same
    distribution as X. We abbreviate this by
    saying that X1, X2, ..., Xn are iid, i.e.
    independent and identically distributed.
  • The joint pdf can be given as
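The slide's formula is missing here. A reconstruction, assuming the Xi are iid with common pdf f as defined above, is the usual product form implied by independence:

```latex
f_{X_1,\dots,X_n}(x_1,\dots,x_n)
  = f(x_1)\,f(x_2)\cdots f(x_n)
  = \prod_{i=1}^{n} f(x_i)
```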

5
Statistic
  • Suppose the n random variables X1, X2, ..., Xn
    constitute a sample from the distribution of a
    random variable X. Then any function T = T(X1, X2,
    ..., Xn) of the sample is called a statistic.
  • A statistic T = T(X1, X2, ..., Xn) may convey
    information about the unknown parameter θ. We then
    call the statistic a point estimator of θ.
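As a sketch of a statistic used as a point estimator, the sample mean can estimate θ for an exponential distribution. This assumes the exponential is parametrized by its mean θ (the slides do not state the parametrization), and the true value 2.0 is hypothetical, known only because the data are simulated.

```python
import random
from statistics import fmean

def sample_mean(sample):
    """A statistic: a function of the sample only, not of the unknown theta."""
    return fmean(sample)

rng = random.Random(1)
theta_true = 2.0   # hypothetical true mean, used only to simulate data
n = 10_000
# expovariate takes the rate 1/theta, so the draws have mean theta_true
sample = [rng.expovariate(1.0 / theta_true) for _ in range(n)]

theta_hat = sample_mean(sample)   # point estimate of theta
print(theta_hat)
```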

6
5.2 Order Statistics
7
Notation
  • Let X1, X2, ..., Xn denote a random sample from a
    distribution of the continuous type having a pdf
    f(x) with support S = (a, b), where -∞ ≤ a <
    x < b ≤ ∞. Let Y1 be the smallest of these Xi, Y2
    the next Xi in order of magnitude, ..., and Yn the
    largest of the Xi. That is, Y1 < Y2 < ... < Yn
    represent X1, X2, ..., Xn when the latter are
    arranged in ascending order of magnitude. We call
    Yi the ith order statistic of the random sample
    X1, X2, ..., Xn.

8
Theorem 5.2.1
  • Let Y1 < Y2 < ... < Yn denote the n order statistics
    based on the random sample X1, X2, ..., Xn from a
    continuous distribution with pdf f(x) and support
    (a, b). Then the joint pdf of Y1, Y2, ..., Yn is
    given by
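The displayed formula is missing from the slide. The standard result for the order statistics of a continuous iid sample, written with the symbols of the theorem (g for the joint pdf is a label chosen here), is:

```latex
g(y_1, y_2, \dots, y_n) =
\begin{cases}
n!\, f(y_1)\, f(y_2) \cdots f(y_n), & a < y_1 < y_2 < \cdots < y_n < b, \\
0, & \text{elsewhere.}
\end{cases}
```

The factor n! counts the orderings of X1, ..., Xn that lead to the same ordered values.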

9
Note
  • The joint pdf of any two order statistics, say
    Yi < Yj, can be written as
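The formula is missing from the slide. The standard expression, assuming F denotes the cdf belonging to f (F is not named in this transcript), is:

```latex
g_{ij}(y_i, y_j) =
\begin{cases}
\dfrac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,
[F(y_i)]^{\,i-1}\,[F(y_j) - F(y_i)]^{\,j-i-1}\,[1 - F(y_j)]^{\,n-j}\,
f(y_i)\, f(y_j), & a < y_i < y_j < b, \\[1ex]
0, & \text{elsewhere.}
\end{cases}
```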

10
Note
  • Yn - Y1 is called the range of the random
    sample.
  • (Y1 + Yn)/2 is called the mid-range.
  • If n is odd, then Y(n+1)/2 is called the median
    of the random sample.
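These three quantities are simple functions of the ordered sample. A minimal Python sketch, with hypothetical data and function names chosen for illustration:

```python
def order_statistics(sample):
    """Return Y1 <= Y2 <= ... <= Yn, the sample in ascending order."""
    return sorted(sample)

def sample_range(sample):
    y = order_statistics(sample)
    return y[-1] - y[0]            # Yn - Y1

def mid_range(sample):
    y = order_statistics(sample)
    return (y[0] + y[-1]) / 2      # (Y1 + Yn) / 2

def median_odd(sample):
    """Median Y_((n+1)/2); handled here only for odd n, as on the slide."""
    y = order_statistics(sample)
    n = len(y)
    if n % 2 == 0:
        raise ValueError("this sketch only handles odd n")
    return y[(n + 1) // 2 - 1]     # shift by 1 for 0-based indexing

sample = [4.2, 1.5, 3.3, 9.0, 2.7]   # hypothetical observations
print(order_statistics(sample))      # [1.5, 2.7, 3.3, 4.2, 9.0]
print(sample_range(sample))          # 7.5
print(mid_range(sample))             # 5.25
print(median_odd(sample))            # 3.3
```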

11
5.4 More on Confidence Intervals
12
The Statistical Problem
  • We have a random variable X with density f(x, θ),
    where θ is unknown and belongs to the family of
    parameters Ω.
  • We estimate θ with some statistic T, where T is
    a function of the random sample X1, X2, ..., Xn.
  • It is unlikely that the value of T equals the true
    value of θ.
  • If T has a continuous distribution, then P(T =
    θ) = 0.
  • What is needed is an estimate of the error of
    estimation.
  • By how much did we miss θ?

13
Central Limit Theorem
  • Let θ0 denote the true, unknown value of the
    parameter θ. Suppose T is an estimator of θ such
    that
  • Assume that σT² is known.
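The displayed condition is missing from the slide. Assuming the intended statement is the usual asymptotic normality of T, with n the sample size, the condition and the approximate 95% interval it leads to can be written as:

```latex
\sqrt{n}\,(T - \theta_0) \xrightarrow{D} N(0, \sigma_T^2),
\qquad
\left( t - 1.96\,\frac{\sigma_T}{\sqrt{n}},\;
       t + 1.96\,\frac{\sigma_T}{\sqrt{n}} \right)
```

Here t is the observed value of T, and 1.96 is the upper 0.025 quantile of the standard normal distribution.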

14
Note
  • When σ is unknown we use s (the sample standard
    deviation) to estimate it.
  • We have a similar interval to the one obtained before,
    with σ replaced by st.
  • Note that t is the value of the statistic T.

15
Confidence Interval for Mean µ
  • Let X1, X2, ..., Xn be a random sample from a
    distribution with unknown mean µ and unknown
    standard deviation σ.
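The interval itself is not in the transcript. A sketch of the usual large-sample 95% interval, sample mean plus or minus 1.96 s/√n, assuming n is large enough for the normal approximation; the simulated data and the function name are hypothetical:

```python
import random
from math import sqrt
from statistics import fmean, stdev

def mean_ci_95(sample):
    """Approximate 95% CI for mu: xbar -/+ 1.96 * s / sqrt(n)."""
    n = len(sample)
    xbar = fmean(sample)
    s = stdev(sample)                  # sample standard deviation (n - 1 divisor)
    half_width = 1.96 * s / sqrt(n)
    return xbar - half_width, xbar + half_width

rng = random.Random(2)
# Hypothetical data: 400 draws with mean 10 and standard deviation 3
sample = [rng.gauss(10.0, 3.0) for _ in range(400)]

low, high = mean_ci_95(sample)
print(low, high)
```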

16
Note
  • We can find confidence intervals for any
    confidence level.
  • Let zα/2 be the upper α/2 quantile of a standard
    normal variable.
  • Then the approximate (1 - α)100% confidence
    interval for θ0 is
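The interval is missing from the slide. Assuming T is asymptotically normal with asymptotic variance σT² and observed value t, based on a sample of size n, the standard form is:

```latex
\left( t - z_{\alpha/2}\,\frac{\sigma_T}{\sqrt{n}},\;
       t + z_{\alpha/2}\,\frac{\sigma_T}{\sqrt{n}} \right)
```

When σT is unknown it is replaced by an estimate, and α = 0.05 gives the familiar multiplier 1.96.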

17
Confidence Interval for Proportions
  • Let X be a Bernoulli random variable with
    probability of success p.
  • Let X1, X2, ..., Xn be a random sample from the
    distribution of X.
  • Then the approximate (1 - α)100% confidence
    interval for p is
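The formula is not in the transcript. A sketch of the standard large-sample interval, p̂ plus or minus zα/2 √(p̂(1 - p̂)/n) with p̂ the sample proportion of successes; the counts and the function name are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, alpha=0.05):
    """Approximate (1 - alpha)100% CI for a Bernoulli success probability."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)    # upper alpha/2 quantile
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical data: 56 successes in 100 trials
low, high = proportion_ci(successes=56, n=100)
print(round(low, 3), round(high, 3))
```

The approximation relies on n being large; for small n or p near 0 or 1 the interval can be poor.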

18
5.5 Introduction to Hypothesis Testing
19
Introduction
  • Our interest centers on a random variable X which
    has density function f(x, θ), where θ belongs to
    Ω.
  • Due to theory or a preliminary experiment, suppose
    we believe that
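The statement of the hypotheses is missing from the slide. A reconstruction, assuming ω0 and ω1 denote disjoint subsets of Ω (ω1 appears later in the deck; ω0 is the assumed counterpart):

```latex
\theta \in \omega_0 \ \text{ or } \ \theta \in \omega_1,
\qquad \omega_0 \cup \omega_1 = \Omega, \quad \omega_0 \cap \omega_1 = \emptyset,
\qquad
H_0 : \theta \in \omega_0 \ \text{ versus } \ H_1 : \theta \in \omega_1
```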

20
  • The hypothesis H0 is referred to as the null
    hypothesis, while H1 is referred to as the
    alternative hypothesis.
  • The null hypothesis represents no change.
  • The alternative hypothesis is referred to as the
    research worker's hypothesis.

21
Error in Hypothesis Testing
  • The decision rule for choosing H0 or H1 is based on a
    sample X1, X2, ..., Xn from the distribution of X,
    and hence the decision could be wrong.

                 True State of Nature
Decision         H0 is true          H1 is true
Reject H0        Type I Error        Correct Decision
Accept H0        Correct Decision    Type II Error
22
  • The goal is to select, from all possible critical
    regions, one that minimizes the probabilities of
    these errors.
  • In general this is not possible; the
    probabilities of these errors have a see-saw
    effect.
  • Example: if the critical region is ∅, then we
    would never reject the null, so the probability of
    Type I error would be zero, but then the probability
    of Type II error would be 1.
  • Type I error is considered the worse of the two.
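The see-saw effect can be seen numerically. The sketch below assumes a setting that is not on the slides: a normal sample with known σ, H0: µ = µ0 against H1: µ = µ1 with µ1 > µ0, and a critical region that rejects H0 when the sample mean is at least c. All numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(c, mu0, mu1, sigma, n):
    """Reject H0 when the sample mean is >= c.

    Returns (P(Type I error), P(Type II error)) for H0: mu = mu0
    against H1: mu = mu1, with known sigma.
    """
    se = sigma / sqrt(n)                       # standard error of the mean
    type1 = 1 - NormalDist(mu0, se).cdf(c)     # reject although H0 is true
    type2 = NormalDist(mu1, se).cdf(c)         # accept although H1 is true
    return type1, type2

results = []
for c in (0.2, 0.4, 0.6):
    t1, t2 = error_probabilities(c, mu0=0.0, mu1=1.0, sigma=2.0, n=25)
    results.append((c, t1, t2))
    print(c, round(t1, 4), round(t2, 4))
```

Raising the cutoff c shrinks the critical region, so the Type I error probability falls while the Type II error probability rises.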

23
Critical Region
  • We fix the probability of Type I error and
    try to select a critical region that minimizes the
    probability of Type II error.
  • We say a critical region C is of size α if
  • Over all critical regions of size α, we want to
    consider critical regions which have lower
    probabilities of Type II error.
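The size condition in the second bullet lost its formula. Assuming the standard definition, with ω0 the set of parameter values under H0 (a symbol assumed here), it reads:

```latex
\alpha = \max_{\theta \in \omega_0} P_\theta\big[ (X_1, \dots, X_n) \in C \big]
```

That is, α bounds the probability of rejecting H0 over every parameter value for which H0 is true.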

24
  • We want to maximize
  • The probability on the right-hand side is called
    the power of the test at θ.
  • It is the probability that the test detects the
    alternative θ when θ ∈ ω1 is the true
    parameter.
  • So maximizing power is the same as minimizing
    the probability of Type II error.
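The quantity to be maximized is missing from the first bullet. A reconstruction, assuming C is the critical region and θ ranges over ω1:

```latex
1 - P_\theta[\text{Type II error}] = P_\theta\big[ (X_1, \dots, X_n) \in C \big],
\qquad \theta \in \omega_1
```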

25
Power of a test
  • We define the power function of a critical region
    to be
  • Hence, given two critical regions C1 and C2 which
    are both of size α, C1 is better than C2 if
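Both formulas are missing from the slide. Assuming the standard definitions, with γ as the (assumed) symbol for the power function:

```latex
\gamma_C(\theta) = P_\theta\big[ (X_1, \dots, X_n) \in C \big], \quad \theta \in \omega_1;
\qquad
\gamma_{C_1}(\theta) \ge \gamma_{C_2}(\theta) \ \text{ for all } \theta \in \omega_1
```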

26
Note
  • A hypothesis of the form H0: p = p0 is called a
    simple hypothesis.
  • A hypothesis of the form H1: p < p0 is called a
    composite hypothesis.
  • Also remember that α is called the significance level
    of the test associated with that critical region.

27
Test Statistics for Mean
28
5.7 Chi-Square Tests
29
Introduction
  • Originally proposed by Karl Pearson in 1900
  • Used to check for goodness of fit and
    independence.

30
Goodness of fit test
  • Consider the simple hypothesis
  • H0: p1 = p10, p2 = p20, ..., pk-1 = pk-1,0
  • If the hypothesis H0 is true, the random variable
  • has an approximate chi-square distribution with
    k - 1 degrees of freedom.
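The random variable is missing from the slide. Assuming it is Pearson's statistic, the sum over the k categories of (Xi - n pi0)² / (n pi0) with Xi the observed count in category i, a minimal sketch with hypothetical die-roll counts is:

```python
def goodness_of_fit_statistic(observed, null_probs):
    """Pearson chi-square statistic: sum of (X_i - n p_i0)^2 / (n p_i0)."""
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p)
               for x, p in zip(observed, null_probs))

observed = [13, 19, 11, 8, 5, 4]   # hypothetical counts from 60 rolls of a die
null_probs = [1 / 6] * 6           # H0: the die is fair

q = goodness_of_fit_statistic(observed, null_probs)
df = len(observed) - 1             # k - 1 degrees of freedom
print(q, df)                       # q is about 15.6 with 5 degrees of freedom

# 11.07 is the upper 5% point of chi-square with 5 df (from a standard table)
reject_h0 = q > 11.07
print(reject_h0)
```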

31
Test for Independence
  • Let the result of a random experiment be
    classified by two attributes.
  • Let Ai denote the outcomes of the first kind and
    Bj denote the outcomes of the second kind.
  • Let pij = P(Ai ∩ Bj).
  • The random experiment is repeated n
    independent times, and Xij denotes the
    frequency of the event Ai ∩ Bj.

33
  • The random variable
  • has an approximate chi-square distribution with
    (a - 1)(b - 1) degrees of freedom, provided n is
    large.
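The statistic is missing from the transcript. Assuming it is the usual chi-square statistic for an a × b table, with expected counts built from the estimated row and column proportions, a minimal sketch with hypothetical counts is:

```python
def independence_statistic(table):
    """Chi-square statistic for an a x b table of counts X_ij.

    Expected counts use the estimated marginals:
    n * (row_i / n) * (col_j / n) = row_i * col_j / n.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    q = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            q += (x - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)   # (a - 1)(b - 1)
    return q, df

table = [[20, 30],
         [30, 20]]   # hypothetical 2 x 2 table of counts, n = 100

q, df = independence_statistic(table)
print(q, df)         # 4.0 1
```

Every expected count here is 25, so each cell contributes 25/25 = 1 to the statistic.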