Title: Nonparametric Approaches
1. Non-parametric Approaches
2. Non-parametric?
- Non-parametric or distribution-free tests have more lax and/or different assumptions
- Properties
- No assumption that the underlying distribution is normal
- More sensitive to medians than means (which is good if you're interested in the median)
- Some may not be very affected by outliers
- Rank tests
3. Parametric vs. Non-parametric
- Parametric tests will typically be used when assumptions are met, as they will usually have more power
- Though the non-parametric tests might be able to match that power under certain conditions
- Non-parametric tests are often used with small samples or violations of assumptions
4. Some commonly used nonparametric analyses
- Chi-square
- Chi-square analysis involves categorical variables/frequency data only
- Example
- Party: Republican vs. Democrat
- Vote: Yes vs. No
- In this case we cannot meet the assumptions common to our typical tests, but the goal would still be to understand the relationship between the variables involved
- Chi-square analysis examines such relationships in terms of cell frequencies, and we can still get measures of the strength of the association (see the R sketch after this list)
- See the effect size handout
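As a minimal sketch of such an analysis in R, with hypothetical counts for the Party-by-Vote example (the numbers are made up for illustration):

# hypothetical 2x2 frequency table: Party (rows) by Vote (columns)
votes = matrix(c(25, 15, 10, 30), nrow=2, byrow=TRUE,
               dimnames=list(Party=c("Republican","Democrat"), Vote=c("Yes","No")))
result = chisq.test(votes, correct=FALSE)   # chi-square test of independence
result
sqrt(result$statistic / sum(votes))         # phi coefficient, one measure of association strength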
5. Common rank tests
- Wilcoxon tests for independent and dependent samples; Mann-Whitney U
- Kruskal-Wallis, Friedman for more than 2 groups
- Basic procedure (a brief R sketch follows this list)
- Rank the DV and get the sums of the ranks for the groups
- Construct a test statistic based on the ranked data
- Advantages
- Normality not necessary
- Insensitive to outliers
- Disadvantages
- Ranked data is not in original units and therefore may be less interpretable
- May lack power, particularly when parametric assumptions hold
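A minimal sketch of the ranking procedure and two of these tests in R, using made-up scores:

# hypothetical scores for three independent groups
g1 = c(12, 15, 9, 20, 14, 17)
g2 = c(8, 11, 13, 7, 10, 6)
g3 = c(18, 16, 21, 23, 19, 25)
ranks = rank(c(g1, g2))            # rank the pooled DV
sum(ranks[1:length(g1)])           # rank sum for group 1
wilcox.test(g1, g2)                # Wilcoxon rank-sum / Mann-Whitney U
kruskal.test(list(g1, g2, g3))     # Kruskal-Wallis for more than 2 groups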
6. Transformation of data
- So if I don't like my data I just change it?
- Think about what you're studying
- Is depression a function of Likert-scale questions?
- Is reaction time inherently related to learning?
- Tukey's "re-expressions"
- Our original numbers are already useful fictions, and if we think of them as such, transforming them into something else may not seem so far-fetched
7. Some common transformations
- Logarithmic: positively skewed data
- Square root: count data (e.g. sqrt(x))
- Reciprocal (1/x): when there are very extreme outliers
- Arcsine: proportional data (e.g. asin(sqrt(p)))
- Other measures of location: heavy-tailed data (e.g. trimmed mean)
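A quick R sketch of a few of these, on made-up positively skewed data:

# hypothetical positively skewed data
x = rexp(100, rate=0.2)
hist(log(x))          # logarithmic re-expression for positive skew
hist(sqrt(x))         # square root, as one might use for counts
mean(x, trim=0.2)     # 20% trimmed mean, a robust measure of location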
8. When to transform?
- Not something to think about doing straight away at any little sign of trouble
- Even if your groups are skewed in a similar manner, parametric tests may hold
- Shop around
- Try different transformations to see if one works better for your problem regarding the distribution of values (but not just to get a significant p-value)
9. Note
- Transformations will not necessarily solve problems with outliers
- Also, if inferences are based on, e.g., the mean of the transformed data, we cannot simply transform the values back to the original scale and act as though the inferences still hold (e.g. for µ); a quick illustration follows
- In the end, we'd rather keep our data in original units, and transformations should be a last resort
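To see why back-transforming can mislead: the back-transformed mean of logged data is the geometric mean, not the arithmetic mean of the original values. A one-line R check:

x = c(1, 10, 100)
mean(x)             # arithmetic mean: 37
exp(mean(log(x)))   # back-transformed mean of the logs: 10, the geometric mean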
10. More recent developments
- The Bootstrap
- The basic idea involves sampling with replacement from the sample data to produce random samples of size n
- Each of these samples provides an estimate of the parameter of interest
- Repeating the sampling a large number of times provides information on the variability of the estimate, i.e. its standard error
- Necessary for any inferential test
11. TV Example
How many hours of TV watched yesterday?
12. Bootstrap
- 1000 samples
- Distribution of the means of each sample
- Mean = 3.951
13. How?
# basic R
boot = 1:1000
for (i in boot) boot[i] = mean(sample(TVdata, replace=T))   # resample n values with replacement, take the mean
mean(boot)                                                  # mean of the bootstrap means
hist(boot, col="blue", border="red")

# uses the bootstrap package
library(bootstrap)
bootmean = bootstrap(TVdata, theta=mean, nboot=1000)
mean(bootmean$thetastar)
hist(bootmean$thetastar)
14. Bootstrap
- Hypothetical situation
- If we cannot assume normality, how would we go about getting a confidence interval for a particular statistic?
- How would you get a confidence interval for robust measures and other statistics?
- Solution
- Resample (with replacement) from our own data based on its distribution
- Treat our sample distribution as a population distribution and take random samples from it
- So what we have done is, instead of assuming some sampling distribution of a particular shape and size, we've created it ourselves and derived our interval estimate from it
- From this we can create confidence intervals and perform other inferential procedures (a sketch follows this list)
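A minimal sketch of a percentile bootstrap interval for a robust measure (assuming the TVdata vector from the earlier slides):

# 95% percentile bootstrap CI for the 20% trimmed mean
tm = numeric(1000)
for (i in 1:1000) tm[i] = mean(sample(TVdata, replace=T), trim=0.2)
quantile(tm, c(0.025, 0.975))   # take the middle 95% of the bootstrap distribution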
15. Hypothesis Testing
- Comparing independent groups
- Step 1: compute the bootstrap mean and bootstrap sd as before, but for each group
- Each time you do so, calculate T
- This creates your own t distribution
16. Hypothesis Testing
- Use the quantile points corresponding to your confidence level from it when computing your confidence interval on the difference between means, rather than the t critical value from typical distributions (a sketch follows)
- Note however that your T will not be the same for the upper and lower bounds
- Unless your bootstrap distribution was perfectly symmetrical
- Not likely to happen
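Putting the two Hypothesis Testing slides together, a minimal bootstrap-t sketch in R with made-up group data:

# bootstrap-t CI for the difference between two independent means
g1 = c(3, 5, 2, 8, 4, 6, 3, 7)
g2 = c(1, 4, 2, 3, 2, 5, 1, 3)
se = function(x) sd(x)/sqrt(length(x))
obs.diff = mean(g1) - mean(g2)
Tstar = numeric(2000)
for (i in 1:2000) {
  b1 = sample(g1, replace=T)
  b2 = sample(g2, replace=T)
  # T for this bootstrap sample, centered on the observed difference
  Tstar[i] = ((mean(b1) - mean(b2)) - obs.diff) / sqrt(se(b1)^2 + se(b2)^2)
}
q = quantile(Tstar, c(0.025, 0.975))   # quantiles of our own t distribution; note they need not be symmetric
obs.se = sqrt(se(g1)^2 + se(g2)^2)
c(obs.diff - q[2]*obs.se, obs.diff - q[1]*obs.se)   # 95% bootstrap-t interval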
17. So why use it?
- Accuracy and control of the Type I error rate
- Most of the problems associated with both accuracy and maintenance of the Type I error rate are reduced using bootstrap methods compared to Student's t
- Wilcox goes further, suggesting that there may in fact be very few situations, if any, in which the traditional approach offers any advantage over the bootstrap approach
- The problems with outliers and the basic statistical properties of means and variances remain, however