Title: Continuous Probability Distributions
- Uniform Probability Distribution
- Normal Probability Distribution
- Exponential Probability Distribution
- Other continuous probability distributions
Continuous Probability Distributions
- A continuous random variable can assume any value in an interval on the real line or in a collection of intervals.
- The probability that a continuous random variable takes any one particular value is zero, so it is not meaningful to talk about the probability of the random variable assuming a particular value.
- Instead, we talk about the probability of the random variable assuming a value within a given interval.
- The probability of the random variable assuming a value within some given interval from x1 to x2 is defined to be the area under the graph of the probability density function between x1 and x2: $P(x_1 \le x \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$.
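To make the "probability = area" idea concrete, here is a minimal Python sketch (standard library only) that approximates the area under a density between x1 and x2 with the midpoint rule and compares it with the exact difference of the cumulative distribution function. The helper name `area_under_pdf` and the choice of a standard normal density are illustrative assumptions, not part of the original slides.

```python
from statistics import NormalDist


def area_under_pdf(pdf, x1, x2, steps=10_000):
    """Approximate the integral of pdf from x1 to x2 with the midpoint rule."""
    width = (x2 - x1) / steps
    return sum(pdf(x1 + (i + 0.5) * width) for i in range(steps)) * width


dist = NormalDist(mu=0, sigma=1)  # illustrative choice of density

# P(-1 <= X <= 1) as an area under the curve, versus the exact CDF difference.
approx = area_under_pdf(dist.pdf, -1.0, 1.0)
exact = dist.cdf(1.0) - dist.cdf(-1.0)
print(f"area ~ {approx:.6f}, cdf difference = {exact:.6f}")

# An "interval" of zero width has zero area, so P(X = 0.5) = 0.
print(area_under_pdf(dist.pdf, 0.5, 0.5))
```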
Uniform Probability Distribution
- A random variable is uniformly distributed whenever the probability is proportional to the interval's length.
- Uniform Probability Density Function:
$f(x) = \frac{1}{b - a}$ for $a \le x \le b$, and $f(x) = 0$ elsewhere,
- where a = smallest value the variable can assume and b = largest value the variable can assume.
Uniform Probability Distribution
- Expected Value of x: $E(x) = \frac{a + b}{2}$
- Variance of x: $\mathrm{Var}(x) = \frac{(b - a)^2}{12}$
- where a = smallest value the variable can assume and b = largest value the variable can assume.
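The uniform formulas can be checked with a short Python sketch. The function names and the example range a = 0, b = 10 are hypothetical; `uniform_prob` shows that the probability of an interval is its overlap with [a, b] divided by the total length b - a, which is what "proportional to the interval's length" means.

```python
def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_mean(a, b):
    """E(x) = (a + b) / 2."""
    return (a + b) / 2


def uniform_var(a, b):
    """Var(x) = (b - a)^2 / 12."""
    return (b - a) ** 2 / 12


def uniform_prob(x1, x2, a, b):
    """P(x1 <= x <= x2): length of the overlap of [x1, x2] with [a, b], over (b - a)."""
    lo, hi = max(x1, a), min(x2, b)
    return max(hi - lo, 0.0) / (b - a)


a, b = 0.0, 10.0  # hypothetical example range
print(uniform_mean(a, b))            # 5.0
print(uniform_var(a, b))             # about 8.333
print(uniform_prob(2.0, 5.0, a, b))  # 0.3
```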
Graph of the Normal Probability Density Function
[Figure: graph of the normal probability density function (not included in this transcript)]
Characteristics of the Normal Probability Distribution
- The shape of the normal curve is often illustrated as a bell-shaped curve.
- Two parameters, μ (mean) and σ (standard deviation), determine the location and shape of the distribution.
- The highest point on the normal curve is at the mean, which is also the median and mode.
- The mean can be any numerical value: negative, zero, or positive.
- The normal curve is symmetric about the mean.
- The standard deviation determines the width of the curve: larger values result in wider, flatter curves.
- The total area under the curve is 1 (.5 to the left of the mean and .5 to the right).
- Probabilities for the normal random variable are given by areas under the curve.
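Several of these characteristics can be checked numerically with Python's `statistics.NormalDist`. The parameter values (mean 50, standard deviations 5 and 10) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

narrow = NormalDist(mu=50, sigma=5)   # hypothetical parameters
wide = NormalDist(mu=50, sigma=10)    # same mean, larger standard deviation

# Half of the total area lies on each side of the mean.
print(narrow.cdf(50))  # 0.5

# Symmetry: the density at mu - d equals the density at mu + d.
print(math.isclose(narrow.pdf(45), narrow.pdf(55)))

# A larger standard deviation gives a lower (flatter) peak.
print(narrow.pdf(50) > wide.pdf(50))

# Mean, median and mode coincide.
print(narrow.mean, narrow.median, narrow.mode)
```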
Normal Probability Density Function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$
- where
- μ = mean
- σ = standard deviation
- π = 3.14159
- e = 2.71828
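As a sanity check on the density formula, this sketch codes it directly and compares the result with the library implementation in `statistics.NormalDist`. The function name `normal_pdf` and the parameters μ = 100, σ = 15 are hypothetical.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


mu, sigma = 100.0, 15.0  # hypothetical parameters
for x in (70.0, 100.0, 130.0):
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```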
Standard Normal Probability Distribution
- A random variable that has a normal distribution with a mean of zero and a standard deviation of one is said to have a standard normal probability distribution.
- The letter z is commonly used to designate this normal random variable.
- Converting to the Standard Normal Distribution: $z = \frac{x - \mu}{\sigma}$
- We can think of z as a measure of the number of standard deviations x is from μ.
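The conversion to z can be illustrated with a short sketch. The values μ = 10, σ = 2 and x = 14 are hypothetical; the point is that a probability computed on the original scale equals the corresponding standard normal probability of z.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


mu, sigma = 10.0, 2.0  # hypothetical parameters of x
x = 14.0
z = z_score(x, mu, sigma)
print(z)  # 2.0

# P(x <= 14) on the original scale equals P(z <= 2) on the standard normal scale.
standard = NormalDist()  # mean 0, standard deviation 1
print(NormalDist(mu, sigma).cdf(x), standard.cdf(z))
```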
Exponential Probability Density Function
$f(x) = \frac{1}{\mu} e^{-x/\mu}$ for $x > 0$
- where μ = mean and e = 2.71828
- The exponential distribution is commonly used to model the time between events.
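A minimal sketch of the exponential density, together with its cumulative probability $P(x \le x_0) = 1 - e^{-x_0/\mu}$ (a standard result obtained by integrating the density from 0 to x0; it is not stated on the slide). The mean of 3 minutes between events is a hypothetical example.

```python
import math


def exponential_pdf(x, mu):
    """f(x) = (1/mu) * exp(-x/mu) for x >= 0, where mu is the mean; 0 for x < 0."""
    return math.exp(-x / mu) / mu if x >= 0 else 0.0


def exponential_cdf(x0, mu):
    """P(x <= x0) = 1 - exp(-x0/mu) for x0 >= 0."""
    return 1.0 - math.exp(-x0 / mu) if x0 >= 0 else 0.0


mu = 3.0  # hypothetical mean time between events, e.g. minutes
print(exponential_cdf(3.0, mu))      # about 0.632: P(wait <= one mean)
print(1 - exponential_cdf(6.0, mu))  # about 0.135: P(wait > two means)
```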
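The Gamma Distribution
- The gamma distribution is an extension of the exponential distribution:
$f(x) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}$, where $x > 0$, $\alpha > 0$, $\beta > 0$
- $\Gamma(\alpha) = (\alpha - 1)!$ for $\alpha = 1, 2, \ldots$
- With α = 1 the gamma distribution reduces to the exponential distribution with mean β.
- The chi-squared distribution, $\chi^2$, is closely related to the gamma distribution: a chi-squared distribution with ν degrees of freedom is a gamma distribution with α = ν/2 and β = 2.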
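The sketch below codes the gamma density in the shape/scale form (α, β), which is an assumption about the parameterization intended on the slide, and checks two facts: with α = 1 it matches the exponential density with mean β, and Γ(n) = (n − 1)! for positive integers.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta (assumed parameterization)."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))


def exponential_pdf(x, mu):
    """Exponential density with mean mu."""
    return math.exp(-x / mu) / mu if x > 0 else 0.0


# With alpha = 1 the gamma density equals the exponential density with mean beta.
for x in (0.5, 1.0, 4.0):
    print(gamma_pdf(x, 1.0, 2.0), exponential_pdf(x, 2.0))

# Gamma(n) = (n - 1)! for positive integers n.
for n in range(1, 6):
    print(n, math.gamma(n), math.factorial(n - 1))
```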
Student's t Distribution
- Student's t distribution is closely related to the normal and gamma distributions and plays an important role in certain statistical testing procedures.
- Sampling and Sampling Distributions
  - Simple Random Sampling
  - Point Estimation
  - Introduction to Sampling Distributions
  - Sampling Distribution of x̄
  - Sampling Distribution of p
- Interval Estimation
  - Interval estimation of a population mean: large- and small-sample cases
  - Determining the sample size
  - Interval estimation of a population proportion
- Hypothesis Testing
  - Tests about a population mean: large- and small-sample cases
  - Tests about a population proportion
Statistical Inference
- The purpose of statistical inference is to obtain information about a population from information contained in a sample.
- A population is the set of all the elements of interest.
- A sample is a subset of the population.
- The sample results provide only estimates of the values of the population characteristics.
- A parameter is a numerical characteristic of a population.
- With proper sampling methods, the sample results will provide good estimates of the population characteristics.
Point Estimation
- In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.
- We refer to x̄ as the point estimator of the population mean μ.
- s is the point estimator of the population standard deviation σ.
- p is the point estimator of the population proportion π.
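Computing the three point estimates from a sample takes only a few lines with Python's `statistics` module. The sample below (heights and a smoker indicator for 8 people) is entirely hypothetical; note that `statistics.stdev` uses the n − 1 divisor of the sample standard deviation s.

```python
import statistics

# Hypothetical sample: heights in cm and whether each person smokes.
heights = [172.0, 168.5, 181.2, 175.0, 169.8, 177.3, 173.4, 170.1]
smoker = [False, True, False, False, True, False, False, True]

x_bar = statistics.mean(heights)  # point estimate of the population mean mu
s = statistics.stdev(heights)     # sample standard deviation (n - 1 divisor), estimates sigma
p = sum(smoker) / len(smoker)     # sample proportion, estimates pi

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, p = {p:.3f}")
```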
Sampling Distribution of p
- The sampling distribution of p is the probability distribution of all possible values of the sample proportion p.
- Expected Value of p: $E(p) = \pi$, where π = the population proportion
- Standard Deviation of p: $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$
- $\sigma_p$ is referred to as the standard error of the proportion.
Sampling Distribution of p
- The sampling distribution of p can be approximated by a normal distribution whenever the sample size is large.
- A sample can be considered large when both $np \ge 5$ and $n(1 - p) \ge 5$.
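The standard error of the proportion and the large-sample rule of thumb can be sketched as two small Python functions. The names and the example values (π = 0.4 with n = 100, and p = 0.1 with n = 20) are hypothetical.

```python
import math


def standard_error_proportion(pi, n):
    """sigma_p = sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)


def is_large_sample(n, p):
    """Rule of thumb: np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5


print(standard_error_proportion(0.4, 100))  # about 0.049
print(is_large_sample(100, 0.4))  # True
print(is_large_sample(20, 0.1))   # False: np = 2
```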
Interval Estimation
- A point estimate only gives us a single estimate for a population parameter and does not take into account the variability in the data or the sample size.
- The standard error of the sampling distribution of x̄ is a measure of the reliability or precision of x̄ as an estimate of μ.
- The standard error can be used to construct a confidence interval for the population mean, μ.
- Confidence level: the percentage of intervals, constructed this way over repeated samples, that would contain μ.
- The confidence level is normally set at 95%.
95% Confidence Interval for a Population Mean, μ: Large-Sample Case (n > 30)
- With σ unknown: $\bar{x} \pm 1.96 \frac{s}{\sqrt{n}}$
- where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size.
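A minimal sketch of the large-sample 95% interval, using the normal critical value 1.96 and the sample standard deviation s in place of the unknown σ. The function name and the sample (the values 1, 2, ..., 40) are hypothetical, chosen only so that n > 30.

```python
import math
import statistics


def large_sample_ci_95(data):
    """95% interval x_bar +/- 1.96 * s / sqrt(n), intended for n > 30."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    half_width = 1.96 * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample of n = 40 values (1, 2, ..., 40).
sample = [float(v) for v in range(1, 41)]
low, high = large_sample_ci_95(sample)
print(f"95% CI for mu: ({low:.2f}, {high:.2f})")
```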
95% Confidence Interval for a Population Mean, μ: Small-Sample Case (n < 30)
- Population is not normally distributed:
  - The only option is to increase the sample size to n > 30 and use the large-sample interval estimation procedures.
- Population is normally distributed and σ is unknown:
  - The appropriate interval estimate is based on the t distribution.
t Distribution
- The t distribution is a family of similar probability distributions.
- A specific t distribution depends on a parameter known as the degrees of freedom (e.g., for a sample of 20 observations used to estimate a mean, df = 20 - 1 = 19).
- As the number of degrees of freedom increases, the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller.
- A t distribution with more degrees of freedom has less dispersion.
- The mean of the t distribution is zero (for df > 1).
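The convergence of the t distribution to the standard normal can be seen by coding the t density directly (using `math.lgamma` to keep the gamma-function ratio numerically stable) and comparing its peak with the standard normal peak as df grows. The function name `t_pdf` and the chosen df values are illustrative.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_coeff) * (1 + x * x / df) ** (-(df + 1) / 2)


normal_peak = NormalDist().pdf(0.0)
for df in (1, 5, 19, 100):
    gap = normal_peak - t_pdf(0.0, df)
    print(f"df = {df:3d}: t peak = {t_pdf(0.0, df):.4f}, gap to normal = {gap:.4f}")
```

The t peak is always a little lower than the normal peak (heavier tails, more dispersion), and the gap shrinks as df increases.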
95% Confidence Interval for a Population Mean, μ: Small-Sample Case (n < 30)
$\bar{x} \pm t_{0.025,\,n-1} \frac{s}{\sqrt{n}}$
- where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size; the t value has n - 1 degrees of freedom.
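A sketch of the small-sample interval. Because the Python standard library has no t quantile function, the critical values $t_{0.025,\,n-1}$ are hard-coded for a few df values copied from a standard t table; the dictionary name `T_0025` and the 10-value sample are hypothetical. In practice a statistical package would supply the critical value for any df.

```python
import math
import statistics

# Upper 2.5% points t_{0.025, df} from a standard t table (a few df values only).
T_0025 = {9: 2.262, 19: 2.093, 29: 2.045}


def small_sample_ci_95(data):
    """95% interval x_bar +/- t_{0.025, n-1} * s / sqrt(n), assuming a normal population."""
    n = len(data)
    t_crit = T_0025[n - 1]  # raises KeyError if df is not in the small table above
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample of n = 10 measurements (df = 9).
sample = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.7, 5.1, 4.9]
low, high = small_sample_ci_95(sample)
print(f"95% CI for mu: ({low:.3f}, {high:.3f})")
```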
Hypothesis Testing
- Hypothesis testing can be used to determine whether a statement about the value of a population parameter should or should not be rejected.
- The null hypothesis, denoted by H0, is a tentative assumption about a population parameter.
- The alternative hypothesis, denoted by H1, is the opposite of what is stated in the null hypothesis and is often referred to as the hypothesis of interest.
Null and Alternative Hypotheses about a Population Mean
- The equality part of the hypotheses always appears in the null hypothesis.
- A hypothesis test about the value of a population mean μ usually takes the following form (where μ0 is the hypothesized value of the population mean):
- $H_0: \mu = \mu_0$
- $H_1: \mu \ne \mu_0$
The Steps of Hypothesis Testing
- Determine the appropriate hypotheses.
- Select the test statistic for deciding whether or not to reject the null hypothesis.
- Collect the sample data.
- Use a statistical package to compute the test statistic and the p-value.
- If the p-value < 0.05 (the 5% significance level), reject H0.
- Make sensible conclusions based on the decision to reject H0 or not.
Tests about a Population Mean: Large-Sample Case (n > 30)
- Test statistic: $z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
- Rejection rule (5% significance level):
- Reject H0 if z > 1.96 or z < -1.96
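The steps of a two-sided large-sample test can be sketched in Python: compute z from the sample, obtain the two-sided p-value from the standard normal distribution, and apply the rejection rule. The function name, the sample (1, 2, ..., 40) and the hypothesized mean μ0 = 16 are hypothetical. At the 5% level the rule |z| > 1.96 and the rule p-value < 0.05 give the same decision.

```python
import math
import statistics
from statistics import NormalDist


def large_sample_z_test(data, mu0):
    """Two-sided test of H0: mu = mu0 using z = (x_bar - mu0) / (s / sqrt(n))."""
    n = len(data)
    z = (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample of n = 40 values; H0: mu = 16.
sample = [float(v) for v in range(1, 41)]
z, p_value = large_sample_z_test(sample, mu0=16.0)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```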
Tests about a Population Mean: Small-Sample Case (n < 30)
- Assuming that the population is normally distributed:
- Test statistic: $T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
- Rejection rule (5% significance level):
- Reject H0 if $T > t_{0.025,\,n-1}$ or $T < -t_{0.025,\,n-1}$
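The small-sample rejection rule can be sketched the same way, comparing T with a critical value hard-coded from a standard t table (the stdlib has no t quantile function). The dictionary `T_0025`, the 10-value sample and μ0 = 5.2 are hypothetical.

```python
import math
import statistics

# Upper 2.5% points t_{0.025, df} from a standard t table (a few df values only).
T_0025 = {9: 2.262, 19: 2.093, 29: 2.045}


def small_sample_t_test(data, mu0):
    """Two-sided t test of H0: mu = mu0 at the 5% level (normal population assumed)."""
    n = len(data)
    t = (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))
    t_crit = T_0025[n - 1]  # raises KeyError if df is not in the small table above
    return t, t_crit, abs(t) > t_crit


# Hypothetical sample of n = 10; H0: mu = 5.2.
sample = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.7, 5.1, 4.9]
t, t_crit, reject = small_sample_t_test(sample, mu0=5.2)
print(f"T = {t:.3f}, critical value = {t_crit}, reject H0: {reject}")
```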
The Use of p-Values
- The p-value is the probability, computed assuming H0 is true, of obtaining a sample result that is at least as extreme (unlikely) as what is observed.
- The p-value can be used to make the decision in a hypothesis test:
- Reject H0 if the p-value < 0.05.
Null and Alternative Hypotheses for a Population Proportion
- The equality part of the hypotheses always appears in the null hypothesis.
- A hypothesis test about the value of a population proportion π usually takes the following form (where π0 is the hypothesized value of the population proportion):
- $H_0: \pi = \pi_0$
- $H_1: \pi \ne \pi_0$
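The hypotheses above do not specify a test statistic; a common large-sample choice, which relies on the normal approximation to the sampling distribution of p, is $z = (p - \pi_0) / \sqrt{\pi_0(1 - \pi_0)/n}$. The sketch below assumes that statistic; the function name and the data (60 successes in 100 trials, π0 = 0.5) are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, pi0):
    """Two-sided large-sample test of H0: pi = pi0.

    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n), with p = successes / n.
    """
    p = successes / n
    z = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 successes in 100 trials; H0: pi = 0.5.
z, p_value = proportion_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```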
Example
- The following nucleotide distribution was observed: A = 2000, C = 2100, G = 1500, T = 2500 (total 8100).
- Question 1: Compute estimates of the nucleotide probabilities.
- P(A) = 2000/8100 = 0.247
- P(C) = 2100/8100 = 0.259
- P(G) = 1500/8100 = 0.185
- P(T) = 2500/8100 = 0.309
- Question 2: Test the null hypothesis that the nucleotide probabilities are equal, that is, $H_0: p_1 = p_2 = p_3 = p_4 = 1/4$, using a goodness-of-fit test based on the chi-square distribution.
- Given that
- $H_0: p_1 = p_2 = p_3 = p_4 = 1/4$,
- the expected counts for A, C, G, and T are 8100/4 = 2025.
- For the chi-square test, the following expression is used to compute the test statistic: $X^2 = \sum \frac{(o - e)^2}{e}$
- e = expected count (i.e., the mean count under H0)
- o = observed count
- $X^2 = \frac{(2000 - 2025)^2}{2025} + \frac{(2100 - 2025)^2}{2025} + \frac{(1500 - 2025)^2}{2025} + \frac{(2500 - 2025)^2}{2025}$
- $X^2 = 250.62$
- As we have 4 elements,
- Degrees of freedom (df) = 4 - 1 = 3
- From the Chi-Square Distributions table (see next page), the critical value for a 95% confidence level (that is, a 5% significance level, α = 0.05) is found to be 7.8147. (Note that if a confidence level is not specified in the question, you may consider this as 95%.)
- Conclusion: Reject H0 at the 5% significance level, as $X^2$ is higher than the critical value. There is evidence that the probabilities are not all equal. As can be seen from Question 1, there are many more T (2500) and far fewer G (1500) than the expected 2025, while A and C are close to expectation.
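The whole worked example can be reproduced in a few lines of Python. The critical value 7.8147 (df = 3, α = 0.05) is taken from the chi-square table used above rather than computed, since the standard library has no chi-square quantile function; the variable names are illustrative.

```python
# Observed nucleotide counts from the example.
observed = {"A": 2000, "C": 2100, "G": 1500, "T": 2500}
n = sum(observed.values())  # 8100

# Question 1: estimated probabilities.
estimates = {base: count / n for base, count in observed.items()}

# Question 2: expected counts under H0: p1 = p2 = p3 = p4 = 1/4.
expected = n / len(observed)  # 2025.0

chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1        # 3
critical_value = 7.8147       # chi-square table value, df = 3, alpha = 0.05

for base, prob in estimates.items():
    diff = observed[base] - expected
    print(f"P({base}) = {prob:.3f}, observed - expected = {diff:+.0f}")
print(f"X2 = {chi_square:.2f} with df = {df}")
print("reject H0" if chi_square > critical_value else "do not reject H0")
```

The signed differences make the direction of each departure explicit: only C and T exceed the expected count, and the largest contributions to $X^2$ come from G and T.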
[Chi-Square Distributions table: not included in this transcript]