Title: 13. NONPARAMETRIC STATISTICS
13. NONPARAMETRIC STATISTICS
- 13.1 SINGLE POPULATION INFERENCE: THE SIGN TEST
- 13.2 THE MANN-WHITNEY U TEST
- 13.3 COMPARING TWO POPULATIONS: THE WILCOXON RANK SUM TEST FOR INDEPENDENT SAMPLES
- 13.4 COMPARING TWO POPULATIONS: THE WILCOXON SIGNED RANK TEST FOR THE PAIRED DIFFERENCE EXPERIMENT
- 13.5 THE KRUSKAL-WALLIS H-TEST FOR A COMPLETELY RANDOMIZED DESIGN
- 13.6 THE FRIEDMAN Fr-TEST FOR A RANDOMIZED BLOCK DESIGN
- 13.7 SPEARMAN'S RANK CORRELATION COEFFICIENT
- 13.0.1 NONPARAMETRIC STATISTICAL METHODS
- Statistical techniques for comparing two or more populations that are based on an ordering of the sample measurements according to their relative magnitudes, and that require fewer or less stringent assumptions concerning the nature of the probability distributions of the populations.
- 13.0.2 NONPARAMETRIC TESTS
- The counterparts of the t- and F-tests compare the probability distributions of the sampled populations rather than specific parameters of these populations (such as the means and variances).
- Most nonparametric methods use the relative ranks of the sample observations. These tests are particularly valuable when we are unable to obtain numerical measurements of the phenomena but are able to rank them in comparison to each other.
- Rank statistics: statistics based on the ranks of measurements.
13.1 SINGLE POPULATION INFERENCE: THE SIGN TEST
- A relatively simple nonparametric procedure for testing hypotheses about the central tendency of a nonnormal probability distribution. The sign test provides inferences about the population median η rather than the population mean µ.
- η is the 50th percentile of the distribution and as such is less affected by the skewness of the distribution and the presence of outliers (extreme observations).
Table 13a
- A simple nonparametric test in the case of paired samples is provided by the sign test.
- This test consists of taking the difference between the numbers of defective bolts for each day and writing only the sign of the difference, e.g. for day 1 we have 47 - 71, which is negative.
- From the table, we obtain a sequence of 12 signs (3 pluses and 9 minuses).
- A two-tailed sign test at the 0.05 significance level does not reject the hypothesis that there is no difference between the machines.
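The conclusion for the bolt example can be checked with a short calculation. The sketch below assumes that, when there is no difference between the machines, each of the 12 signs is a plus or a minus with probability 0.5 (a binomial model); the function name is illustrative and not part of the original slides.

```python
from math import comb

def sign_test_two_tailed_p(n_plus, n_minus):
    """Two-tailed sign-test p-value from the counts of plus and minus signs."""
    n = n_plus + n_minus
    s = max(n_plus, n_minus)
    # P(X >= s) for X ~ Binomial(n, 0.5)
    upper_tail = sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

# 3 pluses and 9 minuses, as in the bolt example
p_value = sign_test_two_tailed_p(3, 9)
print(round(p_value, 4))  # 0.146
```

Since 0.146 is larger than 0.05, the hypothesis of no difference is not rejected, which agrees with the conclusion stated above.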
A) Sign Test for a Population Median η
- ONE-TAILED TEST
- H0: η = η0
- Ha: η > η0 (or Ha: η < η0)
- Test statistic: S = number of sample measurements greater than η0 (or S = number of measurements less than η0).
- TWO-TAILED TEST
- H0: η = η0
- Ha: η ≠ η0
- Test statistic: S = larger of S1 and S2, where S1 is the number of measurements less than η0 and S2 is the number of measurements greater than η0.
- Observed significance level (one-tailed test): p-value = P(x ≥ S)
- Observed significance level (two-tailed test): p-value = 2P(x ≥ S)
- where x has a binomial distribution with parameters n and p = 0.5 (use Table II, Appendix A).
- Rejection region: Reject H0 if p-value ≤ 0.05.
- Assumption: The sample is selected randomly from a continuous probability distribution.
- Note: No assumptions need to be made about the shape of the probability distribution.
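As a rough illustration of the box above, the following sketch computes S and the binomial p-value directly from raw data. The function name, the `alternative` argument, and the sample values are made up for illustration; dropping observations exactly equal to η0 is a common convention and is an assumption here, not something the slides state.

```python
from math import comb

def sign_test_median(data, eta0, alternative="two-sided"):
    """Exact sign test for H0: median = eta0. Returns (S, p_value).

    alternative is "greater", "less", or "two-sided".
    """
    # Assumption: observations equal to eta0 carry no sign and are dropped.
    kept = [x for x in data if x != eta0]
    n = len(kept)
    above = sum(x > eta0 for x in kept)
    below = n - above
    if alternative == "greater":
        s = above
    elif alternative == "less":
        s = below
    else:
        s = max(above, below)
    # P(x >= S) with x ~ Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail) if alternative == "two-sided" else tail
    return s, p_value

sample = [1.2, 3.4, 5.1, 6.0, 6.3, 7.7, 8.2, 9.5]  # made-up measurements
s, p = sign_test_median(sample, eta0=4.0, alternative="greater")
print(s, p)  # S = 6 of 8 values lie above 4.0; p = 37/256
```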
B) Large-Sample Sign Test for a Population Median η
- ONE-TAILED TEST
- H0: η = η0
- Ha: η > η0 (or Ha: η < η0)
- Rejection region: z > zα
- TWO-TAILED TEST
- H0: η = η0
- Ha: η ≠ η0
- Rejection region: z > zα/2
- Test statistic (both tests): $z = \frac{(S - 0.5) - 0.5n}{0.5\sqrt{n}}$
- Note: S is calculated as shown in the previous box. We subtract 0.5 from S as the correction for continuity. The null hypothesized mean value is np = 0.5n, and the standard deviation is $\sqrt{npq} = \sqrt{n(0.5)(0.5)} = 0.5\sqrt{n}$.
- Tabulated z values can be found inside the front cover.
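A minimal sketch of the large-sample calculation, assuming the usual normal approximation to the binomial with mean 0.5n and standard deviation 0.5√n, and a continuity correction of 0.5. The counts used are hypothetical, and the normal tail is computed with `math.erfc` instead of a printed z table.

```python
from math import sqrt, erfc

def large_sample_sign_z(s, n):
    """z statistic for the sign test with continuity correction."""
    return ((s - 0.5) - 0.5 * n) / (0.5 * sqrt(n))

def upper_tail_p(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

# Hypothetical result: 30 of 40 measurements exceed the hypothesized median
z = large_sample_sign_z(s=30, n=40)
p_one_tailed = upper_tail_p(z)
print(round(z, 2))  # 3.0
```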
13.2 The Mann-Whitney U Test
- This test is used to decide whether or not there is a difference between two samples, or equivalently, whether or not they come from the same population.
- The Mann-Whitney U test consists of the following steps:
- Combine all sample values in an array from the smallest to the largest, and assign ranks to all these values. If two or more sample values are identical, each is assigned a rank equal to the mean of the ranks that would otherwise be assigned.
- Find the sum of the ranks for each of the samples (R1 and R2), where N1 and N2 are the respective sample sizes (for convenience, choose N1 ≤ N2).
- To test the difference between the rank sums, use the statistic $U = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - R_1$ corresponding to sample 1.
- The sampling distribution of U is symmetrical and has a mean and variance given, respectively, by the formulas $\mu_U = \frac{N_1 N_2}{2}$ and $\sigma_U^2 = \frac{N_1 N_2 (N_1 + N_2 + 1)}{12}$.
- If N1 and N2 are both at least equal to 8, it turns out that the distribution of U is nearly normal, so that $z = \frac{U - \mu_U}{\sigma_U}$ is approximately normally distributed with mean 0 and variance 1.
- Remark 3
- A value corresponding to sample 2 is given by the statistic $U_2 = N_1 N_2 + \frac{N_2(N_2 + 1)}{2} - R_2$.
- The statistics for sample 1 and sample 2 are related: if U1 and U2 are the respective values, then $U_1 + U_2 = N_1 N_2$.
- We also have $R_1 + R_2 = \frac{N(N + 1)}{2}$, where N = N1 + N2.
- Remark 4
- The statistic U corresponding to sample 1 is the total number of times that sample 1 values precede sample 2 values when all sample values are arranged in increasing order of magnitude. This provides an alternative counting method for finding U.
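The steps and remarks above can be sketched as follows. This is an illustration only: the data are made up, the helper names are hypothetical, and the formulas used are the standard ones, U = N1N2 + N1(N1 + 1)/2 - R1 for sample 1 and the analogous expression for sample 2. Counting a tie between the samples as half a precedence is an added convention that matches mean ranks.

```python
from math import sqrt

def average_ranks(values):
    """Ranks 1..n; identical values share the mean of their ranks."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def mann_whitney_u(sample1, sample2):
    """Return (U1, U2, z) computed from the rank sums R1 and R2."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u  # roughly N(0, 1) only when n1, n2 >= 8
    return u1, u2, z

def count_precedences(sample1, sample2):
    """Remark 4: how often a sample 1 value precedes a sample 2 value."""
    return sum((x < y) + 0.5 * (x == y) for x in sample1 for y in sample2)

a = [12, 15, 18, 21]        # made-up sample 1
b = [14, 19, 22, 25, 30]    # made-up sample 2
u1, u2, z = mann_whitney_u(a, b)
print(u1, u2, count_precedences(a, b))  # 16.0 4.0 16.0
```

Here U1 + U2 = 20 = N1N2, and the counting method of Remark 4 reproduces U1. The samples are too small for the normal approximation, so z is shown only to illustrate the arithmetic.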
13.3 COMPARING TWO POPULATIONS: THE WILCOXON RANK SUM TEST FOR INDEPENDENT SAMPLES
- Wilcoxon Rank Sum Test
- A test of the hypothesis that the probability distributions associated with the two populations are equivalent.
- Rank Sum
- The totals of the ranks for each of the two samples.
13.3.1 Wilcoxon Rank Sum Test: Independent Samples
- ONE-TAILED TEST
- H0: Two sampled populations have identical probability distributions.
- Ha: The probability distribution for population A is shifted to the right of that for B.
- TWO-TAILED TEST
- H0: Two sampled populations have identical probability distributions.
- Ha: The probability distribution for population A is shifted to the left or to the right of that for B.
- Test statistic (one-tailed and two-tailed tests)
- The rank sum T associated with the sample with fewer measurements (if sample sizes are equal, either rank sum can be used).
- Rejection region (one-tailed test)
- Assuming the smaller sample size is associated with distribution A (if sample sizes are equal, we use the rank sum TA), we reject the null hypothesis if TA ≥ TU, where TU is the upper value given by Table XII in Appendix A for the chosen one-tailed α value.
- Rejection region (two-tailed test)
- T ≤ TL or T ≥ TU, where TL is the lower value given by Table XII in Appendix A for the chosen two-tailed α value and TU is the upper value from Table XII.
- Note: If the one-sided alternative is that the probability distribution for A is shifted to the left of B (and TA is the test statistic), we reject the null hypothesis if TA ≤ TL.
- Assumptions
- 1. The two samples are random and independent.
- 2. The two probability distributions from which the samples are drawn are continuous.
- Ties
- Assign tied measurements the average of the ranks they would receive if they were unequal but occurred in successive order. For example, if the third-ranked and fourth-ranked measurements are tied, assign each a rank of (3 + 4)/2 = 3.5.
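The ranking rule for ties and the two rank sums can be sketched in a few lines. The data below are invented so that the third- and fourth-ranked values are tied; the critical values TL and TU still have to be read from Table XII, which is not reproduced here.

```python
def average_ranks(values):
    """Ranks 1..n; tied values receive the average of the ranks they span."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def rank_sums(sample_a, sample_b):
    """Rank the pooled data and return the rank sums (T_A, T_B)."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a = len(sample_a)
    return sum(ranks[:n_a]), sum(ranks[n_a:])

sample_a = [2.1, 3.5, 4.0]         # made-up data, smaller sample
sample_b = [1.8, 3.5, 5.2, 6.1]    # made-up data
print(average_ranks(sample_a + sample_b))
# [2.0, 3.5, 5.0, 1.0, 3.5, 6.0, 7.0]
t_a, t_b = rank_sums(sample_a, sample_b)
print(t_a, t_b)  # 10.5 17.5
```

A quick sanity check is that the two rank sums add up to n(n + 1)/2 for the pooled sample, here 7 × 8 / 2 = 28.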
13.3.2 Wilcoxon Rank Sum Test: Large Independent Samples
- ONE-TAILED TEST
- H0: Two sampled populations have identical probability distributions.
- Ha: The probability distribution for population A is shifted to the right of that for B.
- Rejection region: z > zα
- TWO-TAILED TEST
- H0: Two sampled populations have identical probability distributions.
- Ha: The probability distribution for population A is shifted to the left or to the right of that for B.
- Rejection region: |z| > zα/2
- Test statistic (both tests): $z = \frac{T_A - \frac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$, where TA is the rank sum for sample A, n1 is the size of sample A, and n2 is the size of sample B.
- Assumptions (both tests): n1 ≥ 10 and n2 ≥ 10
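A small sketch of the large-sample statistic, assuming the standard normal approximation in which the rank sum TA of a sample of size n1 has mean n1(n1 + n2 + 1)/2 and variance n1n2(n1 + n2 + 1)/12 under H0. The input values are hypothetical.

```python
from math import sqrt

def rank_sum_z(t_a, n1, n2):
    """z statistic for the rank sum t_a of the sample of size n1."""
    mean_t = n1 * (n1 + n2 + 1) / 2
    sd_t = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (t_a - mean_t) / sd_t

# Hypothetical rank sum for sample A with n1 = 10 and n2 = 12
z = rank_sum_z(t_a=130, n1=10, n2=12)
print(round(z, 3))  # 0.989
```

The result is then compared with zα (one-tailed) or zα/2 (two-tailed) from the normal table.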
13.4 COMPARING TWO POPULATIONS: THE WILCOXON SIGNED RANK TEST FOR THE PAIRED DIFFERENCE EXPERIMENT
13.4.1 Wilcoxon Signed Rank Test for a Paired Difference Experiment
- ONE-TAILED TEST
- H0: Two sampled populations have identical probability distributions.
- Ha: The probability distribution for population A is shifted to the right of that for population B.
- TWO-TAILED TEST
- H0: Two sampled populations have identical probability distributions.
- Ha: The probability distribution for population A is shifted to the right or to the left of that for population B.
- Test statistic (one-tailed test)
- T−, the negative rank sum (we assume the differences are computed by subtracting each paired B measurement from the corresponding A measurement).
- Rejection region (one-tailed test)
- T− ≤ T0, where T0 is found in Table XIII (in Appendix A) for the one-tailed significance level α and the number of untied pairs, n.
- Test statistic (two-tailed test)
- T, the smaller of the positive and negative rank sums, T+ and T−.
- Rejection region (two-tailed test)
- T ≤ T0, where T0 is found in Table XIII (in Appendix A) for the two-tailed significance level α and the number of untied pairs, n.
- Note: If the alternative hypothesis is that the probability distribution for A is shifted to the left of B, we use T+ as the test statistic and reject H0 if T+ ≤ T0.
- Assumptions
- 1. The sample of differences is randomly selected from the population of differences.
- 2. The probability distribution from which the sample of paired differences is drawn is continuous.
- Ties
- Assign tied absolute differences the average of the ranks they would receive if they were unequal but occurred in successive order. For example, if the third-ranked and fourth-ranked differences are tied, assign both a rank of (3 + 4)/2 = 3.5.
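The signed rank sums can be computed as in the sketch below. It assumes the differences are taken as A minus B, that pairs with a zero difference are dropped so that n counts only the untied pairs, and that the absolute differences are ranked with average ranks for ties. The paired data and the function names are made up for illustration.

```python
def average_ranks(values):
    """Ranks 1..n; tied values receive the average of the ranks they span."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def signed_rank_sums(sample_a, sample_b):
    """Return (T_plus, T_minus, n) for the paired differences A - B."""
    # Pairs with a zero difference are excluded from the ranking.
    diffs = [a - b for a, b in zip(sample_a, sample_b) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus, len(diffs)

a = [10, 12, 9, 15, 11, 14]   # made-up A measurements
b = [8, 13, 9, 11, 14, 12]    # made-up paired B measurements
t_plus, t_minus, n = signed_rank_sums(a, b)
print(t_plus, t_minus, n)  # 10.0 5.0 5
```

For the two-tailed test the statistic would be T = min(T+, T−) = 5.0, to be compared with T0 from Table XIII for n = 5 untied pairs.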
13.4.2 Wilcoxon Signed Rank Test for a Paired Difference Experiment: Large Sample
- ONE-TAILED TEST
- H0: Two sampled populations have identical probability distributions.
- Ha: The probability distribution for population A is shifted to the right of that for population B.
- Rejection region: z > zα
- TWO-TAILED TEST
- H0: Two sampled populations have identical probability distributions.
- Ha: The probability distribution for population A is shifted to the right or to the left of that for population B.
- Rejection region: |z| > zα/2
- Test statistic (both tests): $z = \frac{T_+ - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}}}$
- Assumptions (both tests): n ≥ 25
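As a sketch of the large-sample version, the code below standardizes the positive rank sum T+ using its mean n(n + 1)/4 and variance n(n + 1)(2n + 1)/24 under H0, which is the standard normal approximation for the signed rank statistic. The numbers are hypothetical.

```python
from math import sqrt

def signed_rank_z(t_plus, n):
    """z statistic for the positive rank sum from n untied pairs."""
    mean_t = n * (n + 1) / 4
    sd_t = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t_plus - mean_t) / sd_t

# Hypothetical positive rank sum from n = 25 untied pairs
z = signed_rank_z(t_plus=230, n=25)
print(round(z, 2))  # 1.82
```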
13.5 THE KRUSKAL-WALLIS H-TEST FOR A COMPLETELY RANDOMIZED DESIGN
- 13.5.1 The Kruskal-Wallis H Test
- This test generalizes the two-sample U test: it is used to decide whether or not k samples come from the same population.
- Where
- k = number of samples, of sizes N1, N2, N3, ..., Nk
- N = total size of all samples (N = N1 + N2 + N3 + ... + Nk)
- Suppose further that the data from all the samples taken together are ranked and that the sums of the ranks for the k samples are R1, R2, ..., Rk, respectively.
- The statistic is $H = \frac{12}{N(N + 1)} \sum_{j=1}^{k} \frac{R_j^2}{N_j} - 3(N + 1)$.
- The sampling distribution of H is very nearly a chi-square distribution with k - 1 degrees of freedom, provided that N1, N2, N3, ..., Nk are all at least 5.
- It provides a nonparametric method in the analysis of variance (ANOVA) for one-way classification, or one-factor experiments, and generalizations can be made.
- 13.5.2 The Kruskal-Wallis H-Test for Comparing p Probability Distributions
- H0: The p probability distributions are identical.
- Ha: At least two of the p probability distributions differ in location.
- Test statistic: $H = \frac{12}{n(n + 1)} \sum_{j=1}^{p} \frac{R_j^2}{n_j} - 3(n + 1)$
- where
- nj = Number of measurements in sample j
- Rj = Rank sum for sample j, where the rank of each measurement is computed according to its relative magnitude in the totality of data for the p samples
- n = Total sample size = n1 + n2 + ... + np
- Rejection region: H > χ²α with (p - 1) degrees of freedom
- Assumptions
- 1. The p samples are random and independent.
- 2. There are 5 or more measurements in each sample.
- 3. The p probability distributions from which the samples are drawn are continuous.
- Ties
- Assign tied measurements the average of the ranks they would receive if they were unequal but occurred in successive order. For example, if the third-ranked and fourth-ranked measurements are tied, assign both a rank of (3 + 4)/2 = 3.5. The number of ties should be small relative to the total number of observations.
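A minimal sketch of the H computation, assuming the standard form H = 12/(n(n + 1)) Σ Rj²/nj - 3(n + 1) with ranks taken over the pooled data. No correction for ties is applied, in line with the remark that ties should be few. The three samples are invented.

```python
def average_ranks(values):
    """Ranks 1..n; tied values receive the average of the ranks they span."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def kruskal_wallis_h(samples):
    """Kruskal-Wallis H for a list of samples (no tie correction)."""
    pooled = [x for sample in samples for x in sample]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for sample in samples:
        r_j = sum(ranks[start:start + len(sample)])  # rank sum of sample j
        total += r_j ** 2 / len(sample)
        start += len(sample)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

samples = [                      # made-up data, 5 measurements per sample
    [27, 31, 35, 40, 44],
    [29, 33, 38, 46, 50],
    [36, 42, 48, 52, 55],
]
h = kruskal_wallis_h(samples)
print(round(h, 2))  # 4.34
```

With p = 3 samples there are 2 degrees of freedom; the tabulated chi-square value for α = 0.05 is 5.991, so H = 4.34 would not fall in the rejection region for these made-up data.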
13.6 THE FRIEDMAN Fr-TEST FOR A RANDOMIZED BLOCK DESIGN
- 13.6.1 Friedman Fr-Test for a Randomized Block Design
- H0: The probability distributions for the p treatments are identical.
- Ha: At least two of the probability distributions differ in location.
- Test statistic: $F_r = \frac{12}{bp(p + 1)} \sum_{j=1}^{p} R_j^2 - 3b(p + 1)$
- where
- b = Number of blocks
- p = Number of treatments
- Rj = Rank sum of the jth treatment, where the rank of each measurement is computed relative to its position within its own block
- Rejection region: Fr > χ²α with (p - 1) degrees of freedom
- Assumptions
- 1. The treatments are randomly assigned to experimental units within the blocks.
- 2. The measurements can be ranked within the blocks.
- 3. The p probability distributions from which the samples within each block are drawn are continuous.
- Ties
- Assign tied measurements within a block the average of the ranks they would receive if they were unequal but occurred in successive order. For example, if the third-ranked and fourth-ranked measurements are tied, assign each a rank of (3 + 4)/2 = 3.5. The number of ties should be small relative to the total number of observations.
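The Friedman statistic can be sketched as follows, assuming the standard form Fr = 12/(bp(p + 1)) Σ Rj² - 3b(p + 1) with ranks assigned separately within each block. The layout (one row per block, one column per treatment) and the data are assumptions made for illustration.

```python
def average_ranks(values):
    """Ranks 1..n; tied values receive the average of the ranks they span."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def friedman_fr(blocks):
    """Return (Fr, rank_sums) for blocks given as rows of p treatment values."""
    b, p = len(blocks), len(blocks[0])
    rank_sums = [0.0] * p
    for row in blocks:
        # Rank the p measurements within this block only.
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    fr = 12 / (b * p * (p + 1)) * sum(r ** 2 for r in rank_sums) - 3 * b * (p + 1)
    return fr, rank_sums

blocks = [            # made-up data: 4 blocks, 3 treatments
    [10, 12, 15],
    [9, 14, 11],
    [8, 10, 13],
    [11, 13, 16],
]
fr, rank_sums = friedman_fr(blocks)
print(fr, rank_sums)  # 6.5 [4.0, 9.0, 11.0]
```

The rank sums add up to bp(p + 1)/2 = 24, which is a convenient check on the within-block ranking. The example uses very few blocks, so it illustrates the arithmetic only, not a valid use of the chi-square approximation.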
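13.7 SPEARMAN'S RANK CORRELATION COEFFICIENT
- $r_s = \frac{SS_{uv}}{\sqrt{SS_{uu} SS_{vv}}}$
- where
- $SS_{uv} = \sum (u_i - \bar{u})(v_i - \bar{v})$, $SS_{uu} = \sum (u_i - \bar{u})^2$, $SS_{vv} = \sum (v_i - \bar{v})^2$
- ui = Rank of the ith observation in sample 1
- vi = Rank of the ith observation in sample 2
- n = Number of pairs of observations (number of observations in each sample)
- Shortcut formula for rs: $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where di = ui - vi (difference in the ranks of the ith observation for samples 1 and 2)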
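Both ways of computing rs can be compared in a short sketch. It treats rs as the ordinary correlation coefficient applied to the ranks and uses the shortcut 1 - 6Σd²/(n(n² - 1)); the two agree exactly only when there are no ties. The data and function names are made up for illustration.

```python
from math import sqrt

def average_ranks(values):
    """Ranks 1..n; tied values receive the average of the ranks they span."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]

def spearman_rs(x, y):
    """rs as the correlation coefficient of the two sets of ranks."""
    u, v = average_ranks(x), average_ranks(y)
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    ss_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    ss_uu = sum((a - u_bar) ** 2 for a in u)
    ss_vv = sum((b - v_bar) ** 2 for b in v)
    return ss_uv / sqrt(ss_uu * ss_vv)

def spearman_rs_shortcut(x, y):
    """Shortcut formula based on the rank differences d_i = u_i - v_i."""
    u, v = average_ranks(x), average_ranks(y)
    n = len(u)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

x = [86, 97, 99, 100, 101, 103]   # made-up sample 1
y = [2, 20, 28, 27, 50, 29]       # made-up sample 2
rs_full = spearman_rs(x, y)
rs_short = spearman_rs_shortcut(x, y)
print(round(rs_short, 4))  # 0.8857
```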
13.7.1 Spearman's Nonparametric Test for Rank Correlation
- ONE-TAILED TEST
- H0: ρ = 0
- Ha: ρ > 0 (or Ha: ρ < 0)
- TWO-TAILED TEST
- H0: ρ = 0
- Ha: ρ ≠ 0
- Test statistic (both tests): rs, the sample rank correlation (see the formula for calculating rs).
- Rejection region (one-tailed test)
- rs > rs,α (or rs < -rs,α when Ha: ρ < 0), where rs,α is the value from Table XIV corresponding to the upper-tail area α and n pairs of observations.
- Rejection region (two-tailed test)
- |rs| > rs,α/2, where rs,α/2 is the value from Table XIV corresponding to the upper-tail area α/2 and n pairs of observations.
- Assumptions
- 1. The sample of experimental units on which the two variables are measured is randomly selected.
- 2. The probability distributions of the two variables are continuous.
- Ties
- Assign tied measurements the average of the ranks they would receive if they were unequal but occurred in successive order. For example, if the third-ranked and fourth-ranked measurements are tied, assign each a rank of (3 + 4)/2 = 3.5. The number of ties should be small relative to the total number of observations.
13.7.2 Spearman's Rank Correlation (rs)
- Used to measure the correlation of two variables, X and Y.
- When precise values of the variables are unavailable, the data may be ranked from 1 to N in order of size, importance, etc.
- If X and Y are ranked in such a manner, the coefficient of rank correlation is given by $r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$
- where
- D = the differences between the ranks of corresponding values of X and Y
- N = the number of pairs of values (X, Y) in the data