Title: Nonparametric Approaches
1. Non-parametric Approaches
2. Non-parametric?
- Non-parametric or distribution-free tests have more lax and/or different assumptions
- Properties
- No assumption that the underlying distribution is normal
- More sensitive to medians than means (which is good if you're interested in the median)
- Some may not be very affected by outliers
- Rank tests
3. Parametric vs. Non-parametric
- Parametric tests will typically be used when assumptions are met, as they will usually have more power
- Though the non-parametric tests might be able to match that power under certain conditions
- Non-parametric tests are often used with small samples or violations of assumptions
4. Some commonly used nonparametric analyses
- Chi-square
- Chi-square analysis involves categorical variables/frequency data only
- Example
- Party: Republican vs. Democrat
- Vote: Yes vs. No
- In this case we cannot meet the assumptions common to our typical tests, but the goal would still be to understand the relationship between the variables involved
- Chi-square analysis examines such relationships in terms of cell frequencies, and we can still get measures of the strength of the association (see the R sketch after this list)
- See the effect size handout
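As a minimal sketch of such an analysis in R, with hypothetical counts for the Party-by-Vote example (the numbers are made up for illustration):

# hypothetical 2x2 frequency table: Party (rows) by Vote (columns)
votes = matrix(c(25, 15, 10, 30), nrow=2, byrow=TRUE,
               dimnames=list(Party=c("Republican","Democrat"), Vote=c("Yes","No")))
result = chisq.test(votes, correct=FALSE)   # chi-square test of independence
result
sqrt(result$statistic / sum(votes))         # phi coefficient, one measure of association strength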
5. Common rank tests
- Wilcoxon tests for independent and dependent samples; Mann-Whitney U
- Kruskal-Wallis, Friedman for more than 2 groups
- Basic procedure (a brief R sketch follows this list)
- Rank the DV and get the sums of the ranks for the groups
- Construct a test statistic based on the ranked data
- Advantages
- Normality not necessary
- Insensitive to outliers
- Disadvantages
- Ranked data is not in original units and therefore may be less interpretable
- May lack power, particularly when parametric assumptions hold
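A minimal sketch of the ranking procedure and two of these tests in R, using made-up scores:

# hypothetical scores for three independent groups
g1 = c(12, 15, 9, 20, 14, 17)
g2 = c(8, 11, 13, 7, 10, 6)
g3 = c(18, 16, 21, 23, 19, 25)
ranks = rank(c(g1, g2))            # rank the pooled DV
sum(ranks[1:length(g1)])           # rank sum for group 1
wilcox.test(g1, g2)                # Wilcoxon rank-sum / Mann-Whitney U
kruskal.test(list(g1, g2, g3))     # Kruskal-Wallis for more than 2 groups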
6. Transformation of data
- So if I don't like my data I just change it?
- Think about what you're studying
- Is depression a function of Likert-scale questions?
- Is reaction time inherently related to learning?
- Tukey's "re-expressions"
- Our original numbers are already useful fictions, and if we think of them as such, transforming them into something else may not seem so far-fetched
7. Some common transformations
- Logarithmic: positively skewed data
- Square root: count data (e.g. sqrt(x))
- Reciprocal (1/x): when there are very extreme outliers
- Arcsine: proportional data (e.g. asin(sqrt(p)))
- Other measures of location: heavy-tailed data (e.g. trimmed mean)
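A quick R sketch of a few of these, on made-up positively skewed data:

# hypothetical positively skewed data
x = rexp(100, rate=0.2)
hist(log(x))          # logarithmic re-expression for positive skew
hist(sqrt(x))         # square root, as one might use for counts
mean(x, trim=0.2)     # 20% trimmed mean, a robust measure of location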
8. When to transform?
- Not something to think about doing straight away at any little sign of trouble
- Even if your groups are skewed in a similar manner, parametric tests may hold
- Shop around
- Try different transformations to see if one works better for your problem regarding the distribution of values (but not just to get a significant p-value)
9. Note
- Transformations will not necessarily solve problems with outliers
- Also, if inferences are based on, e.g., the mean of the transformed data, we cannot simply transform the values back to the original scale and act as though the inferences still hold (e.g. for µ); a quick illustration follows
- In the end, we'd rather keep our data in original units, and transformations should be a last resort
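To see why back-transforming can mislead: the back-transformed mean of logged data is the geometric mean, not the arithmetic mean of the original values. A one-line R check:

x = c(1, 10, 100)
mean(x)             # arithmetic mean: 37
exp(mean(log(x)))   # back-transformed mean of the logs: 10, the geometric mean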
10. More recent developments
- The Bootstrap
- The basic idea involves sampling with replacement from the sample data to produce random samples of size n
- Each of these samples provides an estimate of the parameter of interest
- Repeating the sampling a large number of times provides information on the variability of the estimate, i.e. its standard error
- Necessary for any inferential test
11. TV Example
How many hours of TV watched yesterday?
12. Bootstrap
- 1000 samples
- Distribution of the means of each sample
- Mean = 3.951
13. How?
# basic R
boot = 1:1000
for (i in boot) boot[i] = mean(sample(TVdata, replace=T))   # resample n values with replacement, take the mean
mean(boot)                                                  # mean of the bootstrap means
hist(boot, col="blue", border="red")

# uses the bootstrap package
library(bootstrap)
bootmean = bootstrap(TVdata, theta=mean, nboot=1000)
mean(bootmean$thetastar)
hist(bootmean$thetastar)
14. Bootstrap
- Hypothetical situation
- If we cannot assume normality, how would we go about getting a confidence interval for a particular statistic?
- How would you get a confidence interval for robust measures and other statistics?
- Solution
- Resample (with replacement) from our own data based on its distribution
- Treat our sample distribution as a population distribution and take random samples from it
- So what we have done is, instead of assuming some sampling distribution of a particular shape and size, we've created it ourselves and derived our interval estimate from it
- From this we can create confidence intervals and perform other inferential procedures (a sketch follows this list)
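A minimal sketch of a percentile bootstrap interval for a robust measure (assuming the TVdata vector from the earlier slides):

# 95% percentile bootstrap CI for the 20% trimmed mean
tm = numeric(1000)
for (i in 1:1000) tm[i] = mean(sample(TVdata, replace=T), trim=0.2)
quantile(tm, c(0.025, 0.975))   # take the middle 95% of the bootstrap distribution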
15. Hypothesis Testing
- Comparing independent groups
- Step 1: compute the bootstrap mean and bootstrap sd as before, but for each group
- Each time you do so, calculate T
- This creates your own t distribution
16. Hypothesis Testing
- Use the quantile points corresponding to your confidence level from it when computing your confidence interval on the difference between means, rather than the t critical value from typical distributions (a sketch follows)
- Note however that your T will not be the same for the upper and lower bounds
- Unless your bootstrap distribution was perfectly symmetrical
- Not likely to happen
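Putting the two Hypothesis Testing slides together, a minimal bootstrap-t sketch in R with made-up group data:

# bootstrap-t CI for the difference between two independent means
g1 = c(3, 5, 2, 8, 4, 6, 3, 7)
g2 = c(1, 4, 2, 3, 2, 5, 1, 3)
se = function(x) sd(x)/sqrt(length(x))
obs.diff = mean(g1) - mean(g2)
Tstar = numeric(2000)
for (i in 1:2000) {
  b1 = sample(g1, replace=T)
  b2 = sample(g2, replace=T)
  # T for this bootstrap sample, centered on the observed difference
  Tstar[i] = ((mean(b1) - mean(b2)) - obs.diff) / sqrt(se(b1)^2 + se(b2)^2)
}
q = quantile(Tstar, c(0.025, 0.975))   # quantiles of our own t distribution; note they need not be symmetric
obs.se = sqrt(se(g1)^2 + se(g2)^2)
c(obs.diff - q[2]*obs.se, obs.diff - q[1]*obs.se)   # 95% bootstrap-t interval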
17. So why use it?
- Accuracy and control of the Type I error rate
- Most of the problems associated with both accuracy and maintenance of the Type I error rate are reduced using bootstrap methods compared to Student's t
- Wilcox goes further, suggesting that there may in fact be very few situations, if any, in which the traditional approach offers any advantage over the bootstrap approach
- The problems with outliers and the basic statistical properties of means and variances remain, however