Title: Statistics for Linguistics Students
1Statistics for Linguistics Students
- Michaelmas 2004
- Week 4
- Bettina Braun
- www.phon.ox.ac.uk/bettina
2Overview
- Discussion of last assignment
- z-distribution vs. t-distribution
- Between-subjects design vs. within-subjects design
- t-tests
- for independent samples
- for dependent samples
3Exercise z-scores
1) The mean pause duration in a read text is 200ms with a standard deviation of 50ms. For the calculations, please specify how you reached your conclusion!
a) Is this a statistic or a parameter?
If we are interested in describing this particular read text, then it is a parameter. If we use this text to draw inferences about pause duration in any text, then it is a statistic.
b) What proportion of the data is above 70ms?
z = (70 - 200) / 50 = -2.6. 0.47% of the data lie below 70ms, so 99.53% of the data lie above 70ms.
c) What proportion of the data falls between 100ms and 300ms?
z = ±2. 2.28% lie below 100ms and 2.28% lie above 300ms, so 95.44% lie between 100ms and 300ms.
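The z-score answers above can be checked with a few lines of Python. This is only a sketch: it assumes the pause durations are normally distributed, and the names `pauses` and `z_score` are made up for illustration. Results may differ in the last digit from values read off a printed z-table.

```python
from statistics import NormalDist

# Pause durations from the exercise, assumed to be normally distributed.
pauses = NormalDist(mu=200, sigma=50)

def z_score(x, dist):
    """Distance of x from the mean, in standard deviations."""
    return (x - dist.mean) / dist.stdev

z_70 = z_score(70, pauses)                    # z-score for 70ms
above_70 = 1 - pauses.cdf(70)                 # proportion above 70ms
between = pauses.cdf(300) - pauses.cdf(100)   # proportion between 100ms and 300ms

print(f"z(70ms) = {z_70:.1f}")
print(f"above 70ms: {above_70:.2%}")
print(f"between 100ms and 300ms: {between:.2%}")
```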
4Exercise sampling distribution
- 2) If we have a sample size of 50, what does the sampling distribution of the means look like if the population is
- U-shaped,
- skewed-left, and
- normally distributed?
- Because of the central limit theorem, the sampling distribution of the mean will be approximately normally distributed, irrespective of the form of the parent distribution
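A small simulation can make this answer tangible. The sketch below is an illustration under assumptions, not part of the exercise: it uses a Beta(0.5, 0.5) distribution as one possible U-shaped population, draws 2000 samples of size 50, and checks how the sample means are distributed. The function name `sample_means` is hypothetical.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

def sample_means(draw, n=50, repeats=2000):
    """Means of `repeats` samples of size n, each value drawn with draw()."""
    return [mean(draw() for _ in range(n)) for _ in range(repeats)]

# Assumed U-shaped parent population: Beta(0.5, 0.5), mean 0.5, sd about 0.354.
u_shaped = lambda: random.betavariate(0.5, 0.5)

means = sample_means(u_shaped)
centre = mean(means)
spread = stdev(means)
# For a normal shape, roughly 68% of the values lie within 1 sd of the centre.
within_1_sd = sum(abs(m - centre) <= spread for m in means) / len(means)

print(f"mean of sample means: {centre:.3f}")
print(f"sd of sample means:   {spread:.3f}")
print(f"share within 1 sd:    {within_1_sd:.2f}")
```

The spread of the sample means should come out close to the population sd divided by the square root of 50, even though the parent distribution is far from normal.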
5Exercise central limit theorem, standard error
- 3) What happens if the sample size increases for the following statistics? Does the
- estimated mean increase, decrease, or stay approximately the same? Why?
Stays the same, as the sample mean is an adequate estimate of the population mean (central limit theorem)
- standard error increase, decrease, or stay approximately the same? Why?
The standard error decreases with the square root of the sample size (see formula for standard error)
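The effect of sample size on the standard error can be sketched directly from the formula (standard deviation divided by the square root of n). The standard deviation of 50 is borrowed from the pause-duration exercise, and the sample sizes are arbitrary choices for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd divided by the square root of n."""
    return sd / math.sqrt(n)

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(50, n))
```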
6What are frequency data?
- Number of subjects/events in a given category
- You can then test whether the observed frequencies deviate from your expected frequencies
- E.g. in an election, there is an a priori chance of 50-50 for each candidate.
7χ²-test
- Null hypothesis: there is no difference between expected and observed frequencies
- Data
- Calculation
Kerry supporter Bush supporter
observed
expected
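As a rough sketch of the calculation, the χ² statistic is the sum of (observed - expected)² / expected over all categories. The counts below are hypothetical and invented purely for illustration, since the slide's own data are not available here. The p-value shortcut via `math.erfc` is valid for one degree of freedom only; for other df a table or a statistics package is needed.

```python
import math

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(chi2):
    """Upper-tail p-value of the chi-square distribution, for df = 1 only."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts, invented for illustration (not real election data).
observed = [60, 40]   # Kerry supporters, Bush supporters
expected = [50, 50]   # a priori 50-50 split of 100 respondents

chi2 = chi_square(observed, expected)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.2f}, df = 1, p = {p:.3f}")
```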
8χ²-test
- Limitations
- All raw data for χ² must be frequencies
- Each subject or event is counted only once (if we wish to find out whether boys or girls are more likely to pass or fail a test, we might observe the performance of 100 children on a test. We may not observe the performance of 25 children on 4 tests, however)
- The total number of observations should be greater than 20
- The expected frequency in any cell should be greater than 5
9Looking up the p-value
- Degrees of freedom
- If there is one independent variable: df = (a - 1)
- If there are two independent variables: df = (a - 1)(b - 1)
10Exercise dependent and independent variables
- Generally, in hypothesis testing, the independent variable is hardly ever interval. Mostly it is nominal or ordinal
- Differentiate between
- Number of independent variables (e.g. gender and exam year for the score example => 2)
- Levels of an independent variable: the number of values it can take (e.g. gender: generally 2)
- The null hypothesis is formulated to deny a relation between dependent and independent variable
11Exercise dependent and independent variables
- Imagine you have a text-to-speech synthesis system. You are interested to find out whether the acceptability (from 1 to 5) is increased if you model short pauses at syntactic phrases.
- dependent variable: acceptability (ordinal data)
- independent variable: TTS with/without pause model (2 levels)
- Null hypothesis: the pause model does not influence the acceptability rating
12Exercise dependent and independent variables
- Subjects learned 20 nonsense-words presented visually. 30 minutes later they were tested for retention. The next day, the same subjects learned another 20 nonsense-words, this time in a combined visual and auditory presentation. Again, after 30 minutes they were tested for retention. The researcher measured the number of correct nonsense-words.
- dependent variable: number of correct responses (interval data)
- independent variable: kind of presentation (2 levels)
- null hypothesis: the number of correct responses will be the same in the two conditions
13Further influencing factors
- Besides the independent variable, there might be further factors that influence your dependent variable.
- Other factors might be confounded with our independent variable (e.g. in the nonword retention task, the audio-visual presentation was on a different day than the visual presentation. Presentation kind can thus be confounded with presentation time)
- Systematic error
14Counterbalancing
- To avoid confounding variables, the conditions have to be counterbalanced. Example:
- Half the subjects do the visual presentation first and the audio-visual presentation second
- Half the subjects do the task in the opposite order
- We often have a group of subjects perform the task (not just one subject)
- Also, in linguistic research, we often use multiple repetitions or different lexicalisations for a given condition (e.g. different words that all have a CVCV structure)
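A counterbalanced assignment for two conditions can be sketched as follows. The subject labels and the condition names (taken from the nonword retention example) are assumptions for illustration. In practice one would usually also randomise which subject ends up in which half.

```python
def counterbalance(subjects, conditions):
    """Alternate the two possible orders of two conditions across subjects."""
    first, second = conditions
    orders = [(first, second), (second, first)]
    return {s: orders[i % 2] for i, s in enumerate(subjects)}

# Hypothetical subject labels and condition names.
plan = counterbalance(["S1", "S2", "S3", "S4"], ("visual", "audio-visual"))
for subject, order in plan.items():
    print(subject, "->", " then ".join(order))
```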
15Exercise drawing error-bars
- Variables need to have the correct type!
- Error bars show the 95% confidence interval for the mean (i.e. the sample mean and the range around it that is expected to contain the population mean with 95% confidence)
- One independent variable
- Simple error bar for groups of variables
- Two independent variables
- Clustered error bar for groups of variables
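What such an error bar represents can be sketched numerically. The reaction times below are hypothetical, and the interval uses the normal approximation (z of about 1.96). Statistics packages typically use the t-distribution instead, which gives somewhat wider intervals for small samples, so treat the numbers as approximate.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval_95(scores):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    m = mean(scores)
    se = stdev(scores) / math.sqrt(len(scores))   # standard error of the mean
    z = NormalDist().inv_cdf(0.975)               # about 1.96
    return m - z * se, m + z * se

# Hypothetical reaction times in ms, invented for illustration.
rts = [512, 478, 530, 495, 505, 520, 488, 499, 510, 483]
low, high = confidence_interval_95(rts)
print(f"mean = {mean(rts):.1f}, 95% CI = [{low:.1f}, {high:.1f}]")
```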
16Exercise drawing error-bars
Clustered error bars for two independent variables
17Example testing if a sample is drawn from a
given population
- A lecturer at Oxford University expects that students at this university have a higher IQ-score than the average British population.
- Since records are taken, he knows that the mean IQ-score in Britain is 200 with a standard deviation of 32
18Experimental Procedure
- The null hypothesis H0 is that the IQ of Oxford students is no different from that of the general public.
- He randomly selects 40 students and gives them the standard IQ test.
- This results in a mean IQ-score of 210
- Questions
- Can he conclude that Oxford students have a higher IQ?
- Can he compare his sample to the population?
19Comparison to population
- The sample mean cannot be compared directly to the whole population, but to the sampling distribution of the sample mean (with samples of size n = 40).
- The sampling distribution has the same mean as the population (200) and a standard error of σ/√n = 32/√40 ≈ 5.06
20Calculating z-score
- Since the sampling distribution will be normally distributed (for n > 30), we can calculate the z-score to see how likely a mean of 210 is, given that the null hypothesis is true
- z = (210 - 200) / (32/√40) ≈ 1.98
- If the null hypothesis is true, there is a chance of only 2.4% of obtaining a sample mean of 210 or higher
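The IQ example can be reproduced step by step. The sketch below uses only the numbers given on the slides (population mean 200, standard deviation 32, sample mean 210, n = 40) and assumes a one-tailed question, since the lecturer expects a higher score. The variable names are chosen for illustration.

```python
import math
from statistics import NormalDist

pop_mean, pop_sd = 200, 32    # population parameters from the example
sample_mean, n = 210, 40      # the lecturer's sample

se = pop_sd / math.sqrt(n)               # standard error of the mean
z = (sample_mean - pop_mean) / se        # z-score of the sample mean
p_one_tailed = 1 - NormalDist().cdf(z)   # P(sample mean >= 210) if H0 is true

print(f"SE = {se:.2f}, z = {z:.2f}, one-tailed p = {p_one_tailed:.3f}")
```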
21What if the population is unknown?
- Often, we compare two different samples and we do not know the population parameters (e.g. are exam scores of the years 1990 and 2000 from the same distribution?)
- Independent variable (number of levels?)
- Dependent variable (type?)
22What if the population is unknown?
- Often, we compare two different samples and we do not know the population parameters (e.g. are exam scores of the years 1990 and 2000 from the same distribution?)
- Independent variable (number of levels?): exam year (2 levels)
- Dependent variable (type?): exam score (interval data)
23Hypothesis
- Null hypothesis: the scores in the 2 exam years were drawn from the same distribution
- Comparison of the means of the two populations (estimated from two representative samples)
- What statistical test do we have to perform?
24Between-subjects design (completely randomised)
- All comparisons between the different conditions are based on comparisons between different (groups of) subjects
- Each subject provides data for only one research condition
- Example: You want to test whether the pitch of children under the age of 10 is dependent on their gender (a given child is either male or female!)
25Within-subjects design (repeated measures)
- All comparisons between different conditions are based on comparisons within the same group of subjects
- Each subject provides data for all experimental conditions (as many scores as experimental conditions)
- Example: You want to test whether the number of reading errors is higher when a subject is sober or slightly drunk.
26Why is this difference important?
- On average, two scores from the same person (two from P1, or two from P2) will be more alike than two scores from different persons (one from P1 and one from P2)
- Scores from one person on the same task will be correlated; this is taken into account by within-subjects tests.
- If a between-subjects test is used for a within-subjects design, we may fail to find an effect (type II error)
- If a within-subjects test is used for a between-subjects design, we might find an effect that is actually not there (type I error)
27Example
- You want to test whether the precontext has an effect on the prosodic realisation of sentence-initial accents.
- You construct 20 sentences, which can appear in two different contexts, say contrastive and non-contrastive.
- Then you ask 20 subjects to read the 40 short paragraphs and measure the pitch height of the initial accent and the duration of the initial word.
- You want to know if accents are realised differently in contrastive and non-contrastive context.
28Difficult cases
- Different classes of dependent variables
- If you are interested in articulatory precision at two different speech rates, you might measure the formant values of the vowels and the number of sound elisions
- These two dependent variables are taken from the same speaker, but this is not a within-subjects design
29Difficult cases
- More than one measurement per subject, combined to give one score
- You are interested in the formant values of male and female /a/. You have a list of 20 words containing an /a/. Each group of 10 speakers reads the 20 words and you measure the formant values. Then you compute the mean formant value of /a/ for every speaker
- Since the analysis is performed on only one score per subject, this is not a within-subjects design
30Which statistical test, when you have score data (parametric tests)?
Between, within, mixed?   Number of independent variables?   Significance test
Between                   One                                Indep. t-test (2 levels), one-way ANOVA
Between                   More than one                      Two-/three-way ANOVA
Within                    One                                Paired t-test (2 levels), a x s ANOVA
Within                    More than one                      a x b (x c) x s ANOVA
Mixed
31Assumptions for statistical tests on score data
(parametric tests)
- The scores must be from an interval scale
- The scores must be normally distributed in the population
- The variances in the conditions must be homogeneous
- Note: You can perform parametric tests only if these assumptions are met!
32T-Test
- Student's t-test
- How likely is it that two samples are taken from the same population?
- The t-test looks at the ratio of the difference in group means to the variability within the groups (the standard error of the difference)
Figure (Sample 1, Sample 2) taken from http://esa21.kennesaw.edu/modules/basics/exercise3/3-8.htm
33T-Tests
- Calculating t-statistic
- Comparable to z-statistic, but dependent on the degrees of freedom (df)
- Degrees of freedom (df)
- Independent t-test: N1 + N2 - 2
- Paired t-test: N - 1
- The critical t-value for α = 0.05 (5% risk of finding an effect that is not actually there) is dependent on df
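A minimal sketch of the t-statistic for independent samples, assuming equal variances (pooled-variance Student's t). The reaction times are hypothetical, and `independent_t` is an illustrative name, not a function from SPSS or any library. The resulting t is then compared with the critical value from a t-table for the given df (2.447 for df = 6, two-tailed, α = 0.05).

```python
import math
from statistics import mean, variance

def independent_t(sample1, sample2):
    """Student's t for two independent samples, using the pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))   # standard error of the difference
    return (mean(sample1) - mean(sample2)) / se, df

# Hypothetical reaction times (ms) from two different groups of subjects.
cond_a = [520, 540, 560, 580]
cond_b = [480, 500, 520, 540]
t, df = independent_t(cond_a, cond_b)
print(f"t = {t:.2f}, df = {df}")
```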
34T-distribution
- The more degrees of freedom, the closer the t-distribution is to the normal distribution
35T-Table
36One-tailed vs. two-tailed predictions
- If we predict a direction of the difference, we are making a one-tailed prediction
- If we predict that there is a difference (irrespective of direction), we are making a two-tailed prediction
- If there is not enough evidence for a directional difference, a two-tailed test is safe.
37Example
- Hypothesis: reaction time in condition a is significantly different from condition b
- Null hypothesis: the reaction times are not different in conditions a and b
38Independent t-test in SPSS
- Organise independent and dependent variables in
separate columns!
39Independent t-test in SPSS
- Test variable(s): dependent variable(s)
- Grouping variable: independent variable
You have to specify the levels of the independent variable (can only have two!)
40How to interpret the output?
Descriptive statistics
If p > 0.05, variances are homogeneous
There is an effect of condition on rt
41How to interpret the output?
- Group statistics (descriptive statistics for the conditions)
- Independent samples test
- Levene's test for equality of variances (if p > 0.05, then variances are homogeneous)
- t-test for equality of means
- t-value
- df (N-2)
- Significance level (2-tailed)
- mean difference (difference between the means)
42What do we report?
- There is a significant effect of condition on reaction time. The average reaction time in condition a was 238.7ms longer than in condition b (t = 6.12, df = 62, p < 0.001).
- Interpretation?
43Paired t-test in SPSS
- Variables of different conditions have to be in parallel columns.
- Click on variables to compare and then
44How to interpret the output?
- Paired samples statistic (descriptive statistics)
- Paired samples correlation (naturally, there should be a rather strong correlation: subjects with a slow rt will be slow in both conditions)
- Paired samples t-test (t, df (N-1), significance level)
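What the paired t-test computes can be sketched as follows: the mean of the per-subject differences divided by the standard error of those differences, with df = N - 1. The reaction times are hypothetical and `paired_t` is an illustrative name, not SPSS's procedure.

```python
import math
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Paired t: mean of the per-subject differences over its standard error."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Hypothetical reaction times (ms): the same four subjects in both conditions.
cond_a = [520, 540, 560, 580]
cond_b = [490, 500, 530, 530]
t, df = paired_t(cond_a, cond_b)
print(f"t = {t:.2f}, df = {df}")
```

Because each subject serves as their own control, the differences vary much less than the raw scores, which is why the correlation between conditions matters here.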
45What if the basic assumptions are not met
- For example
- if the distributions are very skewed
- if you have ordinal data instead of interval data
- You have to use non-parametric tests
- There is a whole range of non-parametric tests; I'll only show the most common ones
46Non-parametric statistical tests (for one
independent variable only)
Between, within, mixed?   Number of levels of independent variable?   Significance test
Between                   Two                                         Mann-Whitney Test
Between                   More than two                               Kruskal-Wallis Test
Within                    Two                                         Wilcoxon Signed Ranks Test
Within                    More than two                               Friedman Test
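To give a feel for how a rank-based test works, here is a sketch of the Mann-Whitney U statistic, computed by counting, over all pairs, how often a score from one group exceeds a score from the other (ties count one half). The ratings are hypothetical, and the sketch stops at the statistic: the p-value would come from a table or a statistics package.

```python
def mann_whitney_u(sample1, sample2):
    """U statistics for both samples; ties between a pair count 0.5."""
    u1 = 0.0
    for x in sample1:
        for y in sample2:
            if x > y:
                u1 += 1
            elif x == y:
                u1 += 0.5
    u2 = len(sample1) * len(sample2) - u1
    return u1, u2

# Hypothetical acceptability ratings (1 to 5) from two different groups of listeners.
with_pauses = [4, 5, 3, 4, 5]
without_pauses = [2, 3, 3, 1, 4]
u1, u2 = mann_whitney_u(with_pauses, without_pauses)
print(f"U1 = {u1}, U2 = {u2}, test statistic U = {min(u1, u2)}")
```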