Title: Simple Comparative Experiments
1Simple Comparative Experiments
2Probability Distributions
3Discrete Random Variable
- $P(X = x)$ denotes the probability of an event, or
the probability that X assumes the value x
4Probability Mass Function
- For a discrete random variable X with possible
values $x_1, x_2, \ldots, x_n$, the probability mass
function is $f(x_i) = P(X = x_i)$
- Because $f(x_i)$ is defined as a probability, $f(x_i) \ge 0$
for all $x_i$, and $\sum_{i=1}^{n} f(x_i) = 1$
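The two conditions on a probability mass function can be checked mechanically. The sketch below assumes a fair six-sided die as the example distribution; the die and the helper name `is_valid_pmf` are illustrative and not part of the slides.

```python
from fractions import Fraction

# Hypothetical example: PMF of a fair six-sided die, f(x_i) = 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def is_valid_pmf(f):
    """Return True when every f(x_i) >= 0 and the values sum to exactly 1."""
    return all(p >= 0 for p in f.values()) and sum(f.values()) == 1

print(is_valid_pmf(pmf))
print(pmf[3])
```

Exact fractions are used so that the sum-to-one check is not disturbed by floating-point rounding.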
5Data Distribution Displays
6Cumulative Distribution
- The cumulative distribution function of a discrete
random variable X, denoted F(x), is
- $F(x) = P(X \le x) = \sum_{x_i \le x} f(x_i)$,
the sum of $f(x_i)$ over all values $x_i$ less than or equal to x
- Probability rules
- $0 \le F(x) \le 1$
- If $x \le y$, then $F(x) \le F(y)$
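A cumulative distribution function for a discrete variable is a running sum of the PMF. A minimal sketch, assuming the number of heads in two fair coin tosses as a hypothetical distribution:

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical PMF: number of heads in two fair coin tosses.
values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# F(x_i) is the running sum of f over all values <= x_i.
cdf = dict(zip(values, accumulate(probs)))

for x in values:
    print(x, cdf[x])
```

The printed values never decrease and end at 1, which is what the two probability rules require.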
7Continuous Random Variable
- Probability is described by a probability density
function f(x)
- Similar to density in other physical systems
- Probability of a value occurring within a
specified interval is the area under f (x)
between the end points of the interval
8Probability Density Function
- A function f(x) is a probability density
function of the continuous random variable X if,
for any interval of real numbers [a, b],
- $f(x) \ge 0$
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
- $P(a < X < b) = \int_a^b f(x)\,dx$
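The "area under f(x)" reading of probability can be illustrated numerically. This sketch assumes a standard normal density as the example and uses a simple trapezoidal rule; the helper name `area_under` is hypothetical.

```python
from statistics import NormalDist

def area_under(pdf, a, b, steps=10_000):
    """Approximate the integral of pdf over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h

z = NormalDist(mu=0.0, sigma=1.0)
p_numeric = area_under(z.pdf, -1.0, 1.0)   # area under the density on [-1, 1]
p_exact = z.cdf(1.0) - z.cdf(-1.0)         # same probability from the CDF
print(round(p_numeric, 4), round(p_exact, 4))
```

Both numbers should agree closely, at roughly 0.68 for this interval.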
9Probability Density Function
10Mathematical Operations
- Mean of a Probability Distribution
- Variance of a Probability Distribution
- Expected Value of a Function of R.V.
- Review Formulas in Text
- Review Elementary Relationships
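For reference, the usual textbook definitions of these three operations are sketched below for a discrete random variable with PMF f. The function symbol h is introduced here for illustration; for a continuous variable the sums become integrals over the density.

```latex
\mu = E[X] = \sum_{i} x_i \, f(x_i)
\qquad
\sigma^2 = V(X) = E\big[(X - \mu)^2\big] = \sum_{i} (x_i - \mu)^2 f(x_i)
\qquad
E[h(X)] = \sum_{i} h(x_i) \, f(x_i)
```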
11Sampling and Sampling Distributions
12Inferential Statistics
13Measures of Central Tendency
- Mean
- Average score in the distribution
- Use pilot study mean to identify levels
- Seriously affected by extreme scores
- Median
- Middle score in the distribution
- Use with the mean value to decide levels
- Mode
- Most frequent (typical) score in the distribution
- Not affected by extreme scores
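The different sensitivity of the three measures to extreme scores is easy to see on a small data set. The scores below are hypothetical; the second list simply adds one extreme value.

```python
from statistics import mean, median, mode

# Hypothetical scores; the second list adds one extreme score.
scores = [2, 3, 3, 4, 5]
with_outlier = scores + [43]

print(mean(scores), median(scores), mode(scores))
print(mean(with_outlier), median(with_outlier), mode(with_outlier))
```

The mean moves substantially once the extreme score is added, the median shifts only slightly, and the mode does not change.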
14Measures of Variability
- Range
- Difference between the largest and smallest value:
$R = x_{max} - x_{min}$
- Sample Variance $s^2$ estimates $\sigma^2$
- Sample Standard Deviation $s$ estimates $\sigma$
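A minimal sketch of the three measures using Python's standard library, on a hypothetical sample. It assumes the usual n - 1 divisor for the sample variance, which is what `statistics.variance` uses.

```python
from statistics import stdev, variance

data = [4.0, 7.0, 6.0, 3.0, 5.0]   # hypothetical sample

sample_range = max(data) - min(data)   # R = x_max - x_min
s2 = variance(data)                    # sum of squared deviations / (n - 1)
s = stdev(data)                        # square root of s2

print(sample_range, s2, s)
```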
15Parameter Estimation
- Statistic
- A number calculated from sample data that should
closely estimate its corresponding parameter in
the population
- Properties of a Good Statistic
- A statistic $\hat{\theta}$ is an unbiased estimator of $\theta$ iff $E[\hat{\theta}] = \theta$
- A statistic $\hat{\theta}$ is a consistent estimator of $\theta$
iff $\lim_{n \to \infty} P(|\hat{\theta} - \theta| > \epsilon) = 0$ for every $\epsilon > 0$
- The efficiency of a statistic is the measure of
its variance relative to that of the unbiased
estimator with the smallest variance
16Sampling Distributions
- For a sample of n independent observations from
any distribution with a finite variance
- Distribution of the Sample Mean
- $\bar{X}$ is an unbiased estimator of the population mean
$\mu$: $E[\bar{X}] = \mu$
- $\mathrm{Var}(\bar{X}) = \sigma^2 / n$
- Distribution of the Sample Variance
- $s^2$ is an unbiased estimator of the population
variance $\sigma^2$: $E[s^2] = \sigma^2$
- However, $s$ is not an unbiased estimator of $\sigma$,
since $E[s] \ne \sigma$
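These properties can be checked by simulation. The sketch below assumes a Uniform(0, 1) population (mean 1/2, variance 1/12) and a fixed random seed; the sample size and number of replications are arbitrary illustrative choices.

```python
import random
from statistics import mean, variance

rng = random.Random(0)
n, reps = 10, 5_000

# Uniform(0, 1) population: mu = 1/2, sigma^2 = 1/12.
samples = [[rng.random() for _ in range(n)] for _ in range(reps)]
xbars = [mean(s) for s in samples]      # one sample mean per replication
s2s = [variance(s) for s in samples]    # one sample variance per replication

print(mean(xbars))       # should be close to mu = 0.5
print(variance(xbars))   # should be close to sigma^2 / n = 1/120
print(mean(s2s))         # should be close to sigma^2 = 1/12
```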
17Central Limit Theorem
- For any population with finite variance, the distribution of the
sample mean will approach a Normal distribution
for large sample sizes
- Sample Mean Distribution Properties
- Mean $\mu$
- Variance $\sigma^2 / n$
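[Figure: normal curve. Areas between successive standard deviations from the mean: 2.15%, 13.59%, 34.13%, 34.13%, 13.59%, 2.15%. Axes: Std. Dev. from $-3\sigma$ to $3\sigma$ about $\mu$; z score from -3 to 3.]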
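A small simulation illustrates the theorem with a clearly non-Normal population. The sketch assumes an Exponential population with rate 1 (mean 1, standard deviation 1), a sample size of 50, and a fixed seed; all of these are illustrative choices rather than values from the slides.

```python
import math
import random

rng = random.Random(1)
n, reps = 50, 4000
mu, sigma = 1.0, 1.0   # Exponential(rate=1) has mean 1 and std. dev. 1

def standardized_mean():
    """Draw one sample of size n and return its mean as a z-score."""
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

zs = [standardized_mean() for _ in range(reps)]
within_1 = sum(abs(z) <= 1 for z in zs) / reps
within_2 = sum(abs(z) <= 2 for z in zs) / reps

# Compare with 0.6826 and 0.9544, the Normal-curve areas within 1 and 2 std. dev.
print(within_1, within_2)
```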
18Standard Normal
- Formed from z-scores
- Z scores indicate the deviation of raw scores
from the sample mean in units of Std. Dev.
- Properties: $z \sim N(0, 1)$
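Converting raw scores to z-scores is a one-line transformation. The sketch below uses hypothetical scores and the population form of the standard deviation (`pstdev`); substituting `stdev` gives the n - 1 form.

```python
from statistics import mean, pstdev

scores = [52, 60, 48, 70, 50]   # hypothetical raw scores

m = mean(scores)
sd = pstdev(scores)             # population form; stdev() gives the n - 1 form
z = [(x - m) / sd for x in scores]

print(z)
print(mean(z), pstdev(z))       # z-scores have mean 0 and std. dev. 1
```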
19Distribution Theory
- Population: target group
- Sample: subgroup of population
- 3 Distributions
- Population
- SamplE
- SamplING
20Examples of Sampling Distributions
- David Lane's Rice Virtual Lab
- http://www.ruf.rice.edu/~lane/rvls.html
21More on Sampling Distributions
- Need for Sampling Distributions
- Provides a link between a statistic and our real
interest, parameter values
- Describes degree of approximation involved with a
statistic
- Used to calculate a statistic's probability given
a null hypothesis about the parameter
22Hypothesis Testing
23Making Statistical Inferences
- Compare Treatment vs. Control Groups
- Sample mean of each group should estimate its
respective population mean
- If from the same population, differences in the
means are due only to sampling error
- If from different populations, treatment effect
is significant (i.e., treatment caused
differences)
- Significance Level ($\alpha$)
- Criterion set by experimenter
- The probability, accepted in advance, of declaring a difference
when the observed mean difference is due strictly to chance
- Conventionally, difference should occur less than
5 times in 100 if by chance alone ($\alpha = .05$)
24Decision Outcomes
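- H0 true and H0 retained: Correct Retention
- H0 true and H0 rejected: Type I Error
- H0 false and H0 retained: Type II Error
- H0 false and H0 rejected: Correct Rejection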
25Decision Errors
- Type I Error ($\alpha$)
- Probability of rejecting H0 when it is true
- Differences are attributed to the treatment when
sampling error was the true cause
- Type II Error ($\beta$)
- Probability of retaining H0 when it is false
- Differences due to a treatment are attributed to
mere chance
- For a fixed sample size n, $\alpha$ and $\beta$ are
inversely related
- Both can be reduced only by increasing n
26Power and Effect Size
- Power ($1 - \beta$)
- Probability of a correct rejection
- Increases with larger sample sizes
- Too much power can result in the detection of
unimportant differences
- Effect Size
- Separation distance between the means of the null
and alternate hypotheses
- Gives some perspective on the practical
importance of any significant differences found
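The links between sample size, significance level, effect size, and power can be sketched for the simplest case. The example assumes a one-sided z-test of H0: mu = mu0 against Ha: mu > mu0 with known sigma; the function name and the numbers used are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def power_upper_tail(mu0, mu1, sigma, n, alpha):
    """Power of the one-sided z-test of H0: mu = mu0 vs Ha: mu > mu0
    when the true mean is mu1 and sigma is known."""
    z_crit = Z.inv_cdf(1 - alpha)             # reject H0 when z0 > z_crit
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # effect size in standard-error units
    return 1 - Z.cdf(z_crit - shift)

for n in (10, 30, 100):
    print(n, power_upper_tail(50, 52, 10, n, alpha=0.05))

# At a fixed n, a smaller alpha gives lower power (a larger beta).
print(power_upper_tail(50, 52, 10, 30, alpha=0.01))
```

When the true mean equals mu0 the function returns alpha, which is the Type I error rate.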
27Confidence Intervals
- An interval that will contain a parameter of
interest approximately $(1 - \alpha)100\%$ of the time
- Example
- If observations are NID($\mu$, $\sigma^2$), where $\sigma^2$ is known,
then $\bar{x} \pm z_{\alpha/2}\, \sigma / \sqrt{n}$ will contain the true mean
$(1 - \alpha)100\%$ of the time
- If observations are NID($\mu$, $\sigma^2$), where $\sigma^2$ is
unknown, then $\bar{x} \pm t_{\alpha/2,\, n-1}\, s / \sqrt{n}$ will contain the true mean
$(1 - \alpha)100\%$ of the time
- David Lane and Rice Virtual Lab
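A minimal sketch of the known-variance interval, using the standard form: sample mean plus or minus the upper alpha/2 normal quantile times sigma over the square root of n. The data, the value of sigma, and the function name are hypothetical. The unknown-variance case replaces the normal quantile with a t quantile, which Python's standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(data, sigma, alpha=0.05):
    """(1 - alpha)100% confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 quantile of N(0, 1)
    half_width = z * sigma / sqrt(len(data))
    xbar = mean(data)
    return xbar - half_width, xbar + half_width

data = [49.2, 51.7, 50.3, 48.8, 52.1, 50.9]   # hypothetical observations
low, high = z_interval(data, sigma=1.5)
print(low, high)
```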
28Hypothesis Testing
- Null Hypothesis (H0)
- States no difference exists between events
- Assumed correct until evidence to the contrary
- Always an equality (e.g., H0: $\mu = 50$)
- Alternate Hypothesis (Ha or H1)
- Often the formal hypothesis of the experiment
- One-tailed: $\mu < 50$ or $\mu > 50$
- Two-tailed: $\mu \ne 50$
- Critical Region
- Set of values that leads to rejecting H0
29Tests of a Single Mean
- Assumptions
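- $X \sim \mathrm{NID}(\mu, \sigma^2)$
- Hypothesis
- H0: $\mu = \mu_0$
- Ha: $\mu \ne \mu_0$
- Test Statistic
- Case 1: Large sample ($n \ge 30$) or $\sigma^2$ known:
$z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$, with $s$ in place of $\sigma$ when $\sigma$ is estimated from a large sample
- Case 2: Small sample or $\sigma^2$ unknown:
$t_0 = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, with $n - 1$ degrees of freedom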
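Case 1 can be sketched with the standard library alone, since only the normal distribution is needed. The function below computes the usual z statistic, the sample mean minus mu0 divided by the standard error, and a two-sided p-value. The data, sigma, and the function name are hypothetical, and the fallback to s when sigma is not supplied is appropriate only for large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sample_z(data, mu0, sigma=None):
    """Return (z0, two-sided p-value) for H0: mu = mu0.
    Uses the sample std. dev. when sigma is None (large-sample case)."""
    n = len(data)
    sd = stdev(data) if sigma is None else sigma
    z0 = (mean(data) - mu0) / (sd / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    return z0, p_value

data = [49.2, 51.7, 50.3, 48.8, 52.1, 50.9]   # hypothetical observations
z0, p_value = one_sample_z(data, mu0=50.0, sigma=1.5)
print(z0, p_value)
```

H0 is rejected at level alpha when the p-value falls below alpha.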
30Tests on Two Means
- Assumptions
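- $X \sim \mathrm{NID}(\mu, \sigma^2)$
- Hypothesis
- H0: $\mu_1 = \mu_2$, i.e., $\mu_1 - \mu_2 = 0$
- Ha: $\mu_1 \ne \mu_2$, i.e., $\mu_1 - \mu_2 \ne 0$
- Test Statistic
- Case 1: Large sample ($n \ge 30$) or $\sigma_1^2$ and $\sigma_2^2$
known:
$z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$
or, with sample variances from large samples,
$z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$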
31Tests on Two Means
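- Case 2: $\sigma_1^2$ and $\sigma_2^2$ unknown but equal (Must
Test!)
- Test for Equal Variances
- Test Statistic: $t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$, with $n_1 + n_2 - 2$ degrees of freedom,
where $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$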
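The pooled-variance statistic can be sketched as follows, assuming the standard form in which the two sample variances are weighted by their degrees of freedom. The function name and data are hypothetical; the critical value still has to come from a t-table with n1 + n2 - 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x1, x2):
    """Return (t0, df) for the equal-variance two-sample t-test of H0: mu1 = mu2."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
    t0 = (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t0, df

treatment = [20.0, 22.0, 24.0, 26.0, 28.0]   # hypothetical data
control = [17.0, 19.0, 21.0, 23.0, 25.0]     # hypothetical data
t0, df = pooled_t(treatment, control)
print(t0, df)
```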
32Tests on Two Means
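- Case 3: $\sigma_1^2$ and $\sigma_2^2$ unknown and unequal
- Test Statistic: $t_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$
- NOTE: Round df down to the nearest whole number
to use the t-table.
- Rejection Region: $|t_0| > t_{\alpha/2,\, \nu}$,
where $\nu = \frac{(s_1^2 / n_1 + s_2^2 / n_2)^2}{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}$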
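A sketch of the unequal-variance case, assuming the common Welch-Satterthwaite approximation for the degrees of freedom and rounding down as the note instructs. The function name and data are hypothetical, and some textbooks use a slightly different df formula.

```python
from math import floor, sqrt
from statistics import mean, variance

def welch_t(x1, x2):
    """Return (t0, df) for the unequal-variance two-sample t-test,
    with df from the Welch-Satterthwaite approximation, rounded down."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2   # s_i^2 / n_i
    t0 = (mean(x1) - mean(x2)) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, floor(nu)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical data
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]    # hypothetical data, larger spread
t0, df = welch_t(x1, x2)
print(t0, df)
```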
33Dependent Samples
- Paired Sample t-test
- Assumptions
- Dependent samples
- Take differences for each pair: $d = x_1 - x_2$
- Test Statistic: $t_0 = \frac{\bar{d}}{s_d / \sqrt{n}}$, with $n - 1$ degrees of freedom,
where n = number of pairs
- Advantages and Disadvantages of Pairing
- Reducing the variability ($s_d$) yields a larger
calculated t value and increases the likelihood
of rejecting the null hypothesis, BUT
- We lose degrees of freedom (cut in half) which
makes the test less sensitive since the critical
t value in the table will be larger
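The paired test reduces to a one-sample t-test on the differences. A minimal sketch, assuming the usual statistic (the mean difference divided by its standard error) and hypothetical before/after measurements:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x1, x2):
    """Return (t0, df) for the paired t-test of H0: mean difference = 0."""
    d = [a - b for a, b in zip(x1, x2)]   # one difference per pair
    n = len(d)                            # n = number of pairs
    t0 = mean(d) / (stdev(d) / sqrt(n))
    return t0, n - 1

before = [12, 15, 11, 14, 13]   # hypothetical paired measurements
after = [10, 14, 10, 11, 12]
t0, df = paired_t(before, after)
print(t0, df)
```

With five pairs the test has 4 degrees of freedom, compared with 8 for two independent samples of five.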
34Tests on a Single Variance
- Assumptions
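- $X \sim \mathrm{NID}(\mu, \sigma^2)$
- Hypothesis
- H0: $\sigma^2 = \sigma_0^2$
- Ha: $\sigma^2 \ne \sigma_0^2$, or $\sigma^2 > \sigma_0^2$, or $\sigma^2 < \sigma_0^2$
- Test Statistic: $\chi_0^2 = \frac{(n - 1) s^2}{\sigma_0^2}$, with $n - 1$ degrees of freedom
- Confidence Interval: $\frac{(n - 1) s^2}{\chi^2_{\alpha/2,\, n-1}} \le \sigma^2 \le \frac{(n - 1) s^2}{\chi^2_{1 - \alpha/2,\, n-1}}$, where $\chi^2_{\alpha/2,\, n-1}$ is the upper $\alpha/2$ percentage point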
35Tests on Two Variances
- Assumptions
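- $X_1 \sim \mathrm{NID}(\mu_1, \sigma_1^2)$ and $X_2 \sim \mathrm{NID}(\mu_2, \sigma_2^2)$
- Hypothesis
- H0: $\sigma_1^2 = \sigma_2^2$
- Ha: $\sigma_1^2 \ne \sigma_2^2$
- Test Statistic: $F_0 = s_1^2 / s_2^2$
- Rejection Region: $F_0 > F_{\alpha/2,\, n_1 - 1,\, n_2 - 1}$ or $F_0 < F_{1 - \alpha/2,\, n_1 - 1,\, n_2 - 1}$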
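The F ratio itself is simple to compute; only the critical values require an F-table. A sketch assuming the standard statistic, the ratio of the two sample variances, with hypothetical data and function name:

```python
from statistics import variance

def f_ratio(x1, x2):
    """Return (F0, numerator df, denominator df) for H0: sigma1^2 = sigma2^2."""
    return variance(x1) / variance(x2), len(x1) - 1, len(x2) - 1

x1 = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical sample 1
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical sample 2
f0, df1, df2 = f_ratio(x1, x2)
print(f0, df1, df2)
```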