Title: Introduction to nonparametric statistics
1Introduction to non-parametric statistics
- Cord Heuer and Peter Davies
- EpiCentre, Massey University
2Ingredients
- Relationship between parametric and non-parametric procedures
- Circumstances for using NP procedures
- Sign and rank transformations
- Introduce most common NP procedures
- Case study - NP vs. P tests
- Limitations of NP methods
- Example: abortion rates
3What options?
- Robustness
- Outliers
- Transforming data
- Other distributions
- Non-parametric tests
4Do a 30 second pulse count!
5Types of Data
- Nominal - no numerical value
- Ordinal - order or rank
- Discrete - counts
- Continuous - interval, ratio
6Parametric statistics
- Inference based on parameters (μ, σ) of distributions
- Require measurement on continuous (interval or ratio) scale
- Other assumptions (e.g. homogeneity of variance)
7Parametric procedures assume a distribution, e.g. normal
8Types of data and analysis
- Nominal
- Ordinal
- Discrete
- Continuous
Non-parametric
Parametric
9What to do with non-normal data?
- Ignore and proceed
- Transform data and use parametric methods
- Use non-parametric procedures
10Non-parametric statistics
- Inference does not rely on estimation of distribution parameters
- Distribution-free statistics
- Developed for nominal and ordinal data or data of unknown distribution
- Can be used with continuous and discrete data when assumptions of parametric tests are not met
11Why non-parametric statistics?
- Need to analyse
- Crude data (nominal, ordinal)
- Data derived from small samples
- Data that do not follow a normal distribution
- Data of unknown distribution
12Departures from normality are common
13Transformation of data to meet assumptions
Log transformation of right skewed distribution
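A minimal sketch of how a log transformation pulls in the long right tail of a skewed sample. The data values and the `skewness` helper are illustrative assumptions, not taken from the slides; the helper uses the simple moment-based definition of skewness.

```python
import math

def skewness(values):
    """Moment-based sample skewness (0 for a perfectly symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed measurements (not from the slides).
raw = [1, 2, 2, 3, 4, 8, 16, 64]
logged = [math.log(v) for v in raw]

print(f"skewness before log transformation: {skewness(raw):.2f}")
print(f"skewness after log transformation:  {skewness(logged):.2f}")
```

The skewness after transformation is smaller but not necessarily zero, so normality of the transformed values would still need checking.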
14Annual incidence of abortion in 602 dairy herds
15Use non-parametric statistics when
- Nominal data or data converted to counts and measurement scale ignored
- Ordinal data or data converted to ranks
16Non-parametric statistics
- All common parametric tests have non-parametric counterparts
- Use with continuous data involves loss of information and lower power
- Non-parametric procedures can have greater power when data are not normal
- If assumptions hold, use parametric methods
- NP methods also have one assumption: independence
17Non-parametric options
18Signs and ranks
- NP methods use relatively simple approaches to data
- Signs and ranks
- Higher order data transformed to signs or ranks for NP analysis
19Signs and the sign transformation
- Information in the data ignored apart from the direction in which each point differs from a reference point
- Better (+), worse (-), no change
- Comparison of blood pressure when receiving treatment vs. placebo
20Sign transformation - blood pressure
- Paired data
- Difference in BP determined by subtracting value on treatment from value on placebo
- Could use actual values (paired t-test)
- Convert each difference to signs (+ or -), ignoring zero values
- Uses minimal information in data, therefore loss of power
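The sign transformation described above can be sketched as follows. The blood pressure values and the `to_signs` helper are hypothetical and only illustrate the mechanics: subtract the treatment value from the placebo value, keep the direction, and drop zero differences.

```python
def to_signs(placebo, treatment):
    """Return '+' or '-' for each non-zero paired difference (placebo - treatment)."""
    signs = []
    for p, t in zip(placebo, treatment):
        diff = p - t
        if diff > 0:
            signs.append("+")
        elif diff < 0:
            signs.append("-")
        # zero differences carry no direction and are dropped
    return signs

# Hypothetical systolic blood pressures (mmHg) for six subjects.
placebo = [150, 142, 160, 138, 155, 147]
treatment = [141, 142, 152, 140, 149, 139]

signs = to_signs(placebo, treatment)
print(signs)  # ['+', '+', '-', '+', '+']
```

Six pairs yield only five signs here because one pair shows no change.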
21Ranks
- Transform data to ordinal form as ranks
- More information retained than with signs
- Rank tests have greater power than sign tests
22(Slide values: 5.4 5.6 105.8 103.6)
23Ties
OBS   Rank
12    1
13    2
14    3
15    4.5
15    4.5
16    6
17    7
18    9
18    9
18    9
19    11
20    12
- Ties occur when two or more observations return the same value
- Ties are assigned the mean rank of the tied values
- Software packages detect and adjust for tied values
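A short sketch of mean-rank assignment for tied values, using the twelve observations listed on this slide. The `mean_ranks` helper is an illustrative implementation, not the routine of any particular package. The three observations of 18 occupy positions 8, 9 and 10 in the sorted list, so each receives their mean, 9.

```python
def mean_ranks(values):
    """Rank values from smallest to largest, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next sorted value is tied with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

obs = [12, 13, 14, 15, 15, 16, 17, 18, 18, 18, 19, 20]
for value, rank in zip(obs, mean_ranks(obs)):
    print(value, rank)
```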
24Case study - response to vaccination against
canine parvovirus
- Paired serum samples collected from adult dogs before and 2 weeks after vaccination
- Serological results reported as titres using 2-fold dilution (1:40, 1:80, 1:160, ...)
- Interest in whether vaccination affects test results
- Examine distributions
25Nature of the data?
26Histogram of test results before vaccination
27Histogram of test results after vaccination
28Options for analysis
29Paired t-test
30The Sign test
- Most crude and insensitive test
- Ideal quick and dirty test
- If sign test shows significance but method B does not, question method B
- Ignores information other than direction
- Can be used when distribution is asymmetric
31The sign test
- Under the null hypothesis, + and - values should occur with equal probability and the median difference should equal zero
- Ignore any zero values and count + and - values
- P(+) = P(-) = 0.5 (null hypothesis)
- Binomial test of observed data
32Do a 30 second pulse count!
33(No Transcript)
34Sign test of vaccination data
35Sign test of vaccination data
Probability of throwing 0 or 1 heads in 13
tosses of a coin
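The coin-toss probability quoted above is a binomial tail with p = 0.5. A minimal sketch, in which `binomial_tail` is an illustrative helper and doubling the tail for a two-sided p-value is an assumption about how the result would be reported:

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

one_sided = binomial_tail(1, 13)       # 0 or 1 heads in 13 tosses
two_sided = min(1.0, 2 * one_sided)    # assumed two-sided reporting

print(f"P(0 or 1 heads in 13 tosses) = {one_sided:.4f}")  # 0.0017
print(f"two-sided p = {two_sided:.4f}")                   # 0.0034
```

The tail contains 1 + 13 = 14 of the 2^13 = 8192 equally likely outcomes.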
36Wilcoxon signed ranks test
- Based on ranks - takes magnitude into account
- Higher power than sign test
- More weight to pairs that show large differences than to pairs that show small differences
- Use whenever a one-sample t-test would be used
- Can also test the hypothesis that 2 variables have the same distribution
- BUT data must have a symmetric distribution
37Wilcoxon signed ranks test
- Rank the absolute values (i.e. ignoring sign) of the differences from smallest to largest, ignoring values of zero.
- Sum the ranks assigned to positive values, then to negative values.
- The smaller of the positive and negative rank sums is the Wilcoxon signed rank statistic (W).
- P = probability of a rank sum this small or smaller occurring under the null hypothesis
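The steps above can be sketched as a small function. The differences used here are hypothetical, not the vaccination data, and `wilcoxon_w` is an illustrative helper that gives tied magnitudes their mean rank.

```python
def wilcoxon_w(differences):
    """Return (W, positive rank sum, negative rank sum) for paired differences."""
    nonzero = [d for d in differences if d != 0]   # zero differences are ignored
    magnitudes = sorted(abs(d) for d in nonzero)

    def mean_rank(m):
        # tied magnitudes share the mean of the positions they occupy
        first = magnitudes.index(m) + 1
        last = first + magnitudes.count(m) - 1
        return (first + last) / 2

    pos = sum(mean_rank(abs(d)) for d in nonzero if d > 0)
    neg = sum(mean_rank(abs(d)) for d in nonzero if d < 0)
    return min(pos, neg), pos, neg

# Hypothetical paired differences (after - before).
diffs = [4, -1, 0, 6, 2, -3, 5, 2]
w, pos, neg = wilcoxon_w(diffs)
print(w, pos, neg)  # 5.0 23.0 5.0
```

With seven non-zero differences the two rank sums add up to 7 × 8 / 2 = 28, which is a quick check on the arithmetic.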
38Wilcoxon signed ranks test
- H0: each observation is from a symmetrical distribution with mean zero
- Positive and negative results equally likely
- Total sum of ranks fixed by N (= n+ + n-)
- Need only consider one group (smallest)
- If N = 10, sum of ranks = 55
- If sum of ranks of n- = 18, then sum of ranks of n+ must equal 37
39(No Transcript)
40Wilcoxon signed ranks test
- With 13 pairs of which 3 were zero, there are 10 pairs available for analysis
- 10 pairs give a total rank sum of 55 if all were on the same side (10 + 9 + ... + 1 = 55)
- H0: 27.5/55 expected (if half of the rank sum were above and half below 0)
- And 18/55 observed
- 10/55 would correspond to p = 0.10
- Not enough evidence to reject H0
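The reasoning above can be checked by enumerating the exact null distribution of the smaller rank sum. This is a sketch under the assumption of 10 non-zero pairs with no tied magnitudes; `signed_rank_counts` and `two_sided_p` are illustrative helpers. Under H0 each rank is equally likely to fall on the positive or negative side, so all 2^10 sign patterns are equally probable.

```python
def signed_rank_counts(n):
    """counts[s] = number of ways a subset of the ranks 1..n can sum to s."""
    total = n * (n + 1) // 2
    counts = [0] * (total + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # iterate downwards so each rank is used at most once
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts

def two_sided_p(w, n):
    """Exact two-sided p-value for a smaller rank sum of w with n non-zero pairs."""
    counts = signed_rank_counts(n)
    lower_tail = sum(counts[: w + 1]) / 2 ** n
    return min(1.0, 2 * lower_tail)

n = 10
print("total rank sum:", n * (n + 1) // 2)            # 55
print(f"p for W = 10: {two_sided_p(10, n):.3f}")      # 0.084
print(f"p for W = 18: {two_sided_p(18, n):.3f}")
```

A rank sum of 10 is the largest value that still gives a two-sided p below 0.10, which is why 10/55 marks the p = 0.10 threshold; the observed 18 lies above it.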
41Wilcoxon signed ranks test
- If N > 16, use the normal approximation
- where T = smaller rank sum
N = 10 (i.e. < 16!)
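A standard large-sample form of this approximation, given here as an assumption about what the slide intended, treats T as approximately normal under H0, with N the number of non-zero pairs:

```latex
z = \frac{T - \dfrac{N(N+1)}{4}}{\sqrt{\dfrac{N(N+1)(2N+1)}{24}}}
```

The numerator compares T with its expected value under H0 (half of the total rank sum N(N+1)/2), and the denominator is the standard deviation of T under H0 when there are no ties.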
42Wilcoxon signed ranks test - vaccination data
43Log transformed results before vaccination
44Log transformed results after vaccination
45Paired t-test after log transformation
46Summary of vaccine data
- All methods indicate significant difference
- t-test on untransformed data has lowest power
- WSR has greater power than ST
- t-test after transformation has highest power
- Transformation does not affect ST
- Transformation can affect WSR (rank order)
- A Kolmogorov-Smirnov test indicates that the transformed values fit a normal distribution
47Independent groups
- 2 groups
- Wilcoxon-Mann-Whitney test
- >2 groups
- Kruskal-Wallis test
- >2 groups and categorical covariates
- Friedman test
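For the two-group case, the Wilcoxon-Mann-Whitney statistic can be sketched by direct pair counting: each observation in one group is compared with each observation in the other, and ties count one half. The titres and the `mann_whitney_u` helper are hypothetical; the sketch returns only the U statistics, not a p-value.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b): pairs won by each group, ties counting one half each."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Hypothetical titres for two independent groups.
group_a = [40, 80, 80, 160]
group_b = [80, 160, 320, 320, 640]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)  # 2.5 17.5
```

The two statistics always sum to the number of pairs (here 4 × 5 = 20); a value of U_a far from half of that suggests the groups differ in location.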
48Correlation
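Spearman rank correlation: $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, where $d$ is the difference between the two ranks of each observation
Example: $r_s = 1 - \frac{6 \times 8}{7(7^2 - 1)} = 0.857$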
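The worked example can be reproduced with a few lines of code. The two rank lists below are hypothetical, chosen only so that the squared rank differences sum to 8 with n = 7; the formula as written assumes no tied ranks.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman's r_s from two lists of untied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks: squared differences are 4 + 0 + 4 + 0 + 0 + 0 + 0 = 8.
ranks_x = [1, 2, 3, 4, 5, 6, 7]
ranks_y = [3, 2, 1, 4, 5, 6, 7]

print(round(spearman_rs(ranks_x, ranks_y), 3))  # 0.857
```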