Title: MA in English Linguistics Experimental design and statistics II
1MA in English Linguistics: Experimental design and statistics II
Sean Wallis, Survey of English Usage, University College London
s.wallis_at_ucl.ac.uk
2Outline
- Plotting data with Excel
- The idea of a confidence interval
- Binomial → Normal → Wilson
- Interval types
- 1 observation
- The difference between 2 observations
- From intervals to significance tests
3Plotting graphs with Excel
- Microsoft Excel is a very useful tool for
- collecting data together in one place
- performing calculations
- plotting graphs
- Key concepts of spreadsheet programs
- worksheet - a page of cells (rows x columns)
- you can use a part of a page for any table
- cell - a single item of data, a number or text string
- referred to by a letter (column) and a number (row), e.g. A15
- each cell can contain
- a string, e.g. Speakers
- a number, e.g. 0, 23, -15.2, 3.14159265
- a formula, e.g. =A15, =A15+23, =SQRT(A15), =SUM(A15:C15)
4Plotting graphs with Excel
- Importing data into Excel
- Manually, by typing
- Exporting data from ICECUP
- Manipulating data in Excel to make it useful
- Copy, paste columns, rows, portions of tables
- Creating and copying functions
- Formatting cells
- Creating and editing graphs
- Several different types (bar chart, line chart, scatter, etc.)
- Can plot confidence intervals as well as points
- You can download a useful spreadsheet for performing statistical tests
- www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls
5Recap: the idea of probability
- A way of expressing chance
- 0 = cannot happen
- 1 = must happen
- Used in (at least) three ways last week
- P = true probability (rate) in the population
- p = observed probability in the sample
- α = probability of p being different from P
- sometimes called probability of error, pe
- found in confidence intervals and significance tests
6The idea of a confidence interval
- All observations are imprecise
- Randomness is a fact of life
- Our abilities are finite
- to measure accurately or
- reliably classify into types
- We need to express caution in citing numbers
- Example (from Levin 2013)
- 77.27% of uses of think in 1920s data have a literal (cogitate) meaning
Really? Not 77.28%, or 77.26%?
- 77% of uses of think in 1920s data have a literal (cogitate) meaning
Sounds defensible. But how confident can we be
in this number?
- 77% (66-86%) of uses of think in 1920s data have a literal (cogitate) meaning
Finally we have a credible range of values -
needs a footnote to explain how it was
calculated.
12Binomial → Normal → Wilson
- Binomial distribution
- Expected pattern of observations found when repeating an experiment for a given P (here, P = 0.5)
- Based on combinatorial mathematics
- Other values of P have different expected distribution patterns
[Figure: Binomial distributions for other values of P, labelled 0.3, 0.1 and 0.05]
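The combinatorial calculation behind these distributions can be sketched in a few lines of Python. This is only an illustration: the function name and the sample size n = 10 are assumptions, not taken from the slides.

```python
from math import comb

def binomial_distribution(n, P):
    """Probability of observing r = 0..n hits in n cases when the true rate is P."""
    return [comb(n, r) * P**r * (1 - P)**(n - r) for r in range(n + 1)]

n = 10  # assumed sample size, for illustration only
for P in (0.5, 0.3, 0.1, 0.05):
    dist = binomial_distribution(n, P)
    # The most likely count is the position of the peak of the distribution.
    peak = max(range(n + 1), key=lambda r: dist[r])
    print(f"P = {P}: most likely count = {peak}, probability = {dist[peak]:.4f}")
```

As P moves away from 0.5 the peak shifts towards 0 and the distribution becomes lopsided, which is the pattern the figure illustrates.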
- Binomial → Normal
- Simplifies the Binomial distribution (tricky to calculate) to two variables
- mean P
- P is the most likely value
- standard deviation S
- S is a measure of spread
[Figure: Normal curve of frequency F against p, centred on P with spread S]
- Normal → Wilson
- The Normal distribution predicts observations p given a population value P
- We want to do the opposite: predict the true population value P from an observation p
- We need a different interval, the Wilson score interval
[Figure: curve of frequency F against p, with the observation p and the population value P marked]
16Binomial → Normal
- Any Normal distribution can be defined by only two variables and the Normal function z
- population mean P
- standard deviation $S = \sqrt{P(1 - P) / n}$
- With more data in the experiment, S will be smaller
[Figure: Normal curve of frequency F against p, centred on P, with the interval from P - z.S to P + z.S marked]
- 95% of the curve is within about 2 standard deviations of the expected mean
- the correct figure is 1.95996!
- this is the critical value of z for an error level of 0.05
[Figure: Normal curve with 95% of the area within z.S of the mean and 2.5% in each tail]
- The tail areas
- For a 95% interval, total 5% (2.5% in each tail)
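The quantities on these slides can be checked with the Python standard library. The sketch below is illustrative only: the function names are made up for this example, and P = 0.3 with n = 100 or 1000 are assumed values.

```python
from math import sqrt
from statistics import NormalDist

def standard_deviation(P, n):
    """S = sqrt(P(1 - P) / n) for population rate P and sample size n."""
    return sqrt(P * (1 - P) / n)

def critical_z(alpha=0.05):
    """Two-tailed critical value of z: alpha / 2 lies in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def normal_interval(P, n, alpha=0.05):
    """Range from P - z.S to P + z.S, where (1 - alpha) of observations p should fall."""
    half_width = critical_z(alpha) * standard_deviation(P, n)
    return P - half_width, P + half_width

print(round(critical_z(0.05), 5))   # 1.95996
print(normal_interval(0.3, 100))
print(normal_interval(0.3, 1000))   # more data, smaller S, narrower interval
```

Changing alpha to 0.01 gives the critical value for a 99% interval, with 0.5% in each tail.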
19The single-sample z test...
- Is an observation p more than z standard deviations from the expected (population) mean P?
- If yes, p is significantly different from P
[Figure: Normal curve about P with 2.5% tails beyond P - z.S and P + z.S, and an observation p]
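A minimal sketch of this test in Python follows. The function name and the example values (P = 0.3, n = 100) are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def single_sample_z_test(p, P, n, alpha=0.05):
    """Return True if observation p is significantly different from P."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, about 1.96 for 0.05
    S = sqrt(P * (1 - P) / n)                 # standard deviation about P
    return abs(p - P) > z * S

print(single_sample_z_test(0.45, 0.3, 100))  # True: p is outside P +/- z.S
print(single_sample_z_test(0.35, 0.3, 100))  # False: p is inside P +/- z.S
```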
20...gives us a confidence interval
- The interval about p is called the Wilson score interval $(w^-, w^+)$
- This interval reflects the Normal interval about P
- If P is at the upper limit of p, p is at the lower limit of P
(Wallis, 2013)
[Figure: Normal curve about P and the Wilson interval $(w^-, w^+)$ about the observation p, with 2.5% tail areas]
21...gives us a confidence interval
- The Wilson score interval $(w^-, w^+)$ has a difficult formula to remember
- You do not need to know this formula!
- You can use the 2x2 spreadsheet!
- www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls
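For readers who would rather script the calculation than use the spreadsheet, here is a sketch of the standard Wilson score interval (without continuity correction) in Python. The function name is illustrative. The counts in the example, 51 cases out of 66, are an assumption chosen only because they give 77.27%; the slides do not state the sample size.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(p, n, alpha=0.05):
    """Wilson score interval (w-, w+) for an observed proportion p out of n cases."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = p + z * z / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denominator = 1 + z * z / n
    return (centre - spread) / denominator, (centre + spread) / denominator

# Assumed counts for illustration only: 51 out of 66 is about 77.27%.
lower, upper = wilson_interval(51 / 66, 66)
print(f"{lower:.1%} to {upper:.1%}")
```

With these assumed counts the bounds round to 66% and 86%. Note that the interval is not symmetric about p: it is wider on the side facing 0.5.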
23An example: uses of think
- Magnus Levin (2013) examined uses of think in the TIME corpus in three time periods
- This is the graph we created in Excel
- http://corplingstats.wordpress.com/2012/04/03/plotting-confidence-intervals/
- Not an alternation study
- Categories are not choices
- The graph plots the probability of reading different uses of the word think (given the writer used the word)
- Has Wilson score intervals for each point
- It is easy to spot where intervals overlap
- A quick test for significant difference
- http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/
- No overlap: significant
- Overlaps point: ns (not significant)
- Otherwise: test fully
29A quick test for significant difference
- No overlap: significant
- Overlaps point: ns (not significant)
- Otherwise: test fully
[Figure: observed probabilities p1 and p2, with their upper bounds $w_1^+$, $w_2^+$ and lower bounds $w_1^-$, $w_2^-$]
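The three-way rule above can be written as a short Python function. This is a sketch: the function name is made up, and apart from 77% (66-86%) the figures in the usage lines are hypothetical values chosen to trigger each outcome.

```python
def quick_overlap_test(p1, w1, p2, w2):
    """Classify two observations by comparing their confidence intervals.

    p1, p2 are observed probabilities; w1, w2 are (lower, upper) bound pairs.
    """
    lower1, upper1 = w1
    lower2, upper2 = w2
    if upper1 < lower2 or upper2 < lower1:
        return "significant"   # the intervals do not overlap at all
    if lower1 <= p2 <= upper1 or lower2 <= p1 <= upper2:
        return "ns"            # one interval includes the other point
    return "test fully"        # partial overlap: a more precise test is needed

# Hypothetical second observations, for illustration only.
print(quick_overlap_test(0.77, (0.66, 0.86), 0.40, (0.30, 0.51)))
print(quick_overlap_test(0.77, (0.66, 0.86), 0.70, (0.60, 0.79)))
print(quick_overlap_test(0.77, (0.66, 0.86), 0.60, (0.50, 0.69)))
```

The third call lands in the "test fully" case, which is where Test 1 or Test 2 comes in.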
30Test 1: Newcombe's test
- This test is used when data is drawn from different populations (different years, groups, text categories)
- We calculate a new Newcombe-Wilson interval $(W^-, W^+)$
- $W^- = -\sqrt{(p_1 - w_1^-)^2 + (w_2^+ - p_2)^2}$
- $W^+ = \sqrt{(w_1^+ - p_1)^2 + (p_2 - w_2^-)^2}$
(Newcombe, 1998)
- We then compare $W^- < (p_2 - p_1) < W^+$; if the difference falls outside this interval, it is significant
- $(p_2 - p_1) < 0$: a fall
- We only need to check the inner interval
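The Newcombe-Wilson comparison can be sketched in Python as follows. The function names are illustrative, and the sample sizes in the example (66 and 100) are assumptions: the slides give the two rates, 77% and 59%, but not the counts behind them.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(p, n, z):
    """Wilson score interval (w-, w+) for proportion p out of n cases."""
    centre = p + z * z / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denominator = 1 + z * z / n
    return (centre - spread) / denominator, (centre + spread) / denominator

def newcombe_wilson_test(p1, n1, p2, n2, alpha=0.05):
    """Return the difference d = p2 - p1, the interval (W-, W+) and a significance flag."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    w1_lower, w1_upper = wilson_interval(p1, n1, z)
    w2_lower, w2_upper = wilson_interval(p2, n2, z)
    W_lower = -sqrt((p1 - w1_lower) ** 2 + (w2_upper - p2) ** 2)
    W_upper = sqrt((w1_upper - p1) ** 2 + (p2 - w2_lower) ** 2)
    d = p2 - p1
    significant = not (W_lower < d < W_upper)
    return d, (W_lower, W_upper), significant

# Sample sizes 66 and 100 are assumed for illustration only.
d, (W_lower, W_upper), significant = newcombe_wilson_test(0.77, 66, 0.59, 100)
print(f"d = {d:+.2f}, interval = ({W_lower:+.3f}, {W_upper:+.3f}), significant = {significant}")
```

Because d is negative here (a fall), only the lower bound matters: the result is significant if d is below it.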
34Test 2: 2 x 2 chi-square
- This test is used when data is drawn from the same population of speakers (e.g. grammar → grammar)
- We put the data into a 2 x 2 table
- www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls
(Wallis, 2013)
- The test uses the formula $\chi^2 = \sum \frac{(o - e)^2}{e}$
- where $e = r \times c / n$ (row total x column total / grand total)
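The calculation can be sketched in Python for a 2 x 2 table of observed frequencies. The function name and the example table are hypothetical. The critical value is taken as the square of the critical z, which is the critical value of chi-square with one degree of freedom.

```python
from statistics import NormalDist

def chi_square_2x2(table):
    """Chi-square for a 2 x 2 table of observed frequencies [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency, e = r x c / n
            chi2 += (o - e) ** 2 / e
    return chi2

# Critical value for one degree of freedom at the 0.05 error level (about 3.841).
critical = NormalDist().inv_cdf(0.975) ** 2

observed = [[20, 30], [30, 20]]   # hypothetical counts, for illustration only
chi2 = chi_square_2x2(observed)
print(chi2, chi2 > critical)
```

A result is significant at the 0.05 error level when the computed value exceeds the critical value.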
36Expressing change
- Percentage difference is a very common idea
- "X has grown by 50%" or "Y has fallen by 10%"
- We can calculate percentage difference by
- $d^{\%} = d / p_1$ where $d = p_2 - p_1$
- We can put Wilson confidence intervals on d
- BUT percentage difference can be very misleading
- It depends heavily on the starting point $p_1$ (might be 0)
- What does it mean to say
- something has increased by 100%?
- it has decreased by 100%?
- It is better to simply say that
- the rate of cogitate uses of think fell from 77% to 59%
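A short Python sketch shows why the starting point matters. The function name is illustrative, and the second and third pairs of rates are hypothetical values chosen to contrast with the 77% to 59% example.

```python
def percentage_difference(p1, p2):
    """Return d = p2 - p1 and d / p1 (None when p1 is 0, where it is undefined)."""
    d = p2 - p1
    if p1 == 0:
        return d, None
    return d, d / p1

# The example from the slide: cogitate uses of think fell from 77% to 59%.
d, d_pct = percentage_difference(0.77, 0.59)
print(f"d = {d:+.2f}, percentage difference = {d_pct:+.1%}")

# Hypothetical rates: the same one-point rise from different starting points.
print(percentage_difference(0.01, 0.02))
print(percentage_difference(0.50, 0.51))
print(percentage_difference(0.0, 0.01))
```

The same absolute change of one percentage point is a 100% increase from 0.01 but only a 2% increase from 0.50, and the figure cannot be computed at all when the starting rate is 0.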
37Summary
- We analyse results to help us report them
- Graphs are extremely useful!
- You can include graphs and tables in your essays
- If a result is not significant, say so and move on
- Don't say it is "nearly significant" or "indicative"
- An error level of 0.05 (or 95% correct) is OK
- Some people use 0.01 (99%) but this is not really better
- Wilson confidence intervals tell us
- Where the true value is likely to be
- Which differences between observations are likely to be significant
- If intervals partially overlap, perform a more precise test
38Summary
- Always say which test you used, e.g.
- "We compared cogitate uses of think with other uses, between the 1920s and 1960s periods, and this was significant according to χ² at the 0.05 error level."
- Tell your reader that you have plotted (e.g.) 95% Wilson confidence intervals in a footnote to the graph.
- For advice on deciding which test to use, see
- http://corplingstats.wordpress.com/2012/04/11/choosing-right-test/
- The tests you will need in one spreadsheet
- www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls
39References
- Levin, M. 2013. The progressive in modern American English. In Aarts, B., J. Close, G. Leech and S.A. Wallis (eds). The Verb Phrase in English: Investigating recent language change with corpora. Cambridge: CUP.
- Newcombe, R.G. 1998. Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine 17: 873-890.
- Wallis, S.A. 2013. z-squared: The origin and application of χ². Journal of Quantitative Linguistics 20: 350-378.
- Wilson, E.B. 1927. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association 22: 209-212.
- Assorted statistical tests
- www.ucl.ac.uk/english-usage/staff/sean/resources/2x2chisq.xls