Title: Using Newcastle United to teach statistics
School
- Statistics is important because it teaches pupils how to design good experiments, how to handle and explore data, and how to test a hypothesis.
- Many pupils consider statistics boring and useless. Why?
- Many examples in maths textbooks are made up rather than drawn from real data.
- How interesting are socks, or samples of the heights of Yr10 boys?
- What is the point?
- Several reports have noted a tendency for girls to do statistics and boys to do mechanics.
Essay Aims
- To use football to teach probability and statistics to pupils.
- To incorporate higher-level mathematics to develop the ideas further.
- To use various statistical methods to analyse Newcastle's historical performance and predict future success.
Assumptions
- For the purposes of the essay I will assume that:
  - each season is independent;
  - the confidence of players is not affected.
- I will ignore any possible effect of key player absences.
Data
Year     Played  Won  PCT     DIFF  Pos  Top Scorers Total
1905-06  38      18   0.4736    26    4  17
1906-07  38      22   0.5789    26    1  20
1907-08  38      15   0.3947    11    4  19
1908-09  38      24   0.6315    24    1  11
1909-10  38      19   0.5000    14    4  26
...
2002-03  38      21   0.5526    19    3  17
2003-04  38      13   0.3421    12    5  22
2004-05  38      10   0.2631   -10   14   7
2005-06  38      17   0.4473     5    7  10
2006-07  38      11   0.2894    -9   13  11

Key: PCT = proportion of league games won (Won/Played); DIFF = goal difference; Pos = final league position; Top Scorers Total = the club top scorer's league goal tally.

When using examples to develop these ideas, I will use the variable I think pupils will find most interesting: Top Scorers Total. I will only use data from seasons in which at least 38 league games were played, which leaves 91 seasons to consider.
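The two data steps above (PCT is Won divided by Played, and only seasons with at least 38 league games are kept) can be sketched in Python. The rows are copied from the table; the fourth row is a hypothetical short season, not real data, added only so the filter has something to remove.

```python
# (season, played, won, top scorer's tally) rows copied from the table above,
# plus one hypothetical short season to show the filter removing a row.
SEASONS = [
    ("1905-06", 38, 18, 17),
    ("1906-07", 38, 22, 20),
    ("2006-07", 38, 11, 11),
    ("hypothetical", 34, 15, 14),  # hypothetical row, not real data
]


def win_proportion(won, played):
    """PCT column: the proportion of league games won."""
    return won / played


def usable_seasons(seasons, min_games=38):
    """Keep only seasons with at least `min_games` league games."""
    return [s for s in seasons if s[1] >= min_games]


for season, played, won, top in usable_seasons(SEASONS):
    print(season, win_proportion(won, played), top)
```

The PCT values in the table appear to be truncated rather than rounded to four decimal places (for example 18/38 = 0.47368...).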
Basic Probability
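- Formation: 4-4-2 (one goalkeeper, four defenders, four midfielders and two forwards).
- What is the probability of a given type of player getting sent off, assuming each of the 11 players is equally likely to be sent off?
- P(midfielder) = 4/11
- After this, P(goalkeeper) = 1/10, since 10 players remain and one of them is the goalkeeper.
- P(midfielder AND goalkeeper) = 4/11 × 1/10 = 2/55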
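The 2/55 can be checked by brute force. This Python sketch assumes, as above, that every player is equally likely to be sent off; it lists every ordered pair of two different players being sent off and counts the pairs where a midfielder goes first and the goalkeeper second.

```python
from fractions import Fraction
from itertools import permutations

# A 4-4-2 line-up: 1 goalkeeper, 4 defenders, 4 midfielders, 2 forwards.
TEAM = ["GK"] + ["DEF"] * 4 + ["MID"] * 4 + ["FWD"] * 2


def p_first(role, team=TEAM):
    """Probability the first player sent off has this role (all players equally likely)."""
    return Fraction(team.count(role), len(team))


def p_sequence(first_role, second_role, team=TEAM):
    """Probability the first two players sent off have these roles, in this order."""
    pairs = list(permutations(range(len(team)), 2))  # ordered pairs of distinct players
    hits = sum(1 for i, j in pairs if team[i] == first_role and team[j] == second_role)
    return Fraction(hits, len(pairs))


print(p_first("MID"))            # 4/11
print(p_sequence("MID", "GK"))   # 2/55
```

Counting gives 4 favourable pairs out of 11 × 10 = 110, which agrees with the multiplication rule 4/11 × 1/10.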
Correlation
Here we let x = Top Scorers Total and y = Position. With our data:
$S_{xy} = -973.8235522$, $S_{xx} = 3202.900721$, $S_{yy} = 2746.132584$
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = -0.3283584$
- Correlation indicates the strength and direction of a linear relationship between two random variables.
- We will use the Pearson product-moment correlation coefficient.
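This is a weak negative correlation: seasons with a higher top-scorer tally tend to end with a better (numerically lower) league position, but the linear relationship is not strong. It is important to know that a weak correlation does not mean there is no relationship between the two variables, only that there is no strong linear relationship; they could still be related in a non-linear way.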
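The coefficient can be computed either from raw paired data or directly from the three sums on the slide. A minimal Python sketch; reading the printed sums as S_xy, S_xx and S_yy in that order is an assumption, although r does not depend on which of the last two is which.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation from paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def r_from_sums(sxy, sxx, syy):
    """Same coefficient when S_xy, S_xx and S_yy are already known."""
    return sxy / math.sqrt(sxx * syy)


print(r_from_sums(-973.8235522, 3202.900721, 2746.132584))
print(pearson_r([1, 2, 3], [6, 4, 2]))  # a perfect negative linear relationship gives -1
```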
Poisson
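- We first try to fit a model that pupils meet before university: the Poisson distribution. Let X be the top scorer's tally in a season, with $X \sim \mathrm{Po}(\lambda)$ and $P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$, where $\lambda = E(X) = 18.89011$.
- So what is $P(X > 30)$?
- $P(X > 30) = 1 - P(X \le 30) = 0.006463651$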
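A minimal Python sketch of this tail calculation using only the standard library. The pmf is evaluated on the log scale with `math.lgamma` to avoid very large factorials; λ = 18.89011 is the value from the slide.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) * lam^k / k!, evaluated on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_cdf(k, lam):
    """P(X <= k), summing the pmf from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


lam = 18.89011                       # E(X) from the slide
p_more_than_30 = 1 - poisson_cdf(30, lam)
print(f"P(X > 30) = {p_more_than_30:.9f}")
```

Under this model, a season in which the top scorer gets more than 30 league goals has a probability of only about 0.6%.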
Goodness of fit
- $X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed value in cell $i$ and $E_i$ is the expected value for cell $i$.
We need to pool cells so that each cell has an expected value of at least 5; where cells are pooled, we sum their observed and expected values.
Our value is $X^2 = 15.00208103$. Using tables, we see that $p \approx 0.13$. Where did it go wrong? At both ends the observed values are roughly double the expected values, so the data are more spread out than the Poisson model predicts.
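The pooling rule and the statistic can be sketched in Python. The pooling strategy here is an assumption, not necessarily the one used for the slide: cells are merged greedily from left to right, and leftover upper-tail cells join the last pooled cell. The counts in the example are made up. Because λ is estimated from the data, one further degree of freedom is lost on top of the usual one.

```python
def pool_cells(observed, expected, min_expected=5.0):
    """Merge adjacent cells, left to right, until each expected count is >= min_expected.

    Leftover cells at the upper end that never reach the threshold are
    merged into the last completed cell.
    """
    pooled_obs, pooled_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_exp > 0 or acc_obs > 0:
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


def chi_square_statistic(observed, expected):
    """X^2 = sum over cells of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_cells, n_estimated_params):
    """Cells after pooling, minus 1, minus parameters estimated from the data."""
    return n_cells - 1 - n_estimated_params


# Toy counts (hypothetical, not the Newcastle data)
obs = [2, 3, 10, 12, 4, 1]
exp = [1, 4, 11, 11, 5, 2]
o, e = pool_cells(obs, exp)
print(o, e)
print(chi_square_statistic(o, e), degrees_of_freedom(len(o), 1))
```

The resulting statistic is then compared with the chi-squared distribution on that many degrees of freedom.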
ANOVA
Notice that two of the Pr(>F) values in the full model, those for pct:goal (0.33734) and diff:goal (0.27561), are much bigger than the others. These are the two terms to delete.
Full model: pos ~ pct + diff + goal + pct:diff + pct:goal + diff:goal
            Df   Sum Sq   Mean Sq    F value      Pr(>F)
pct          1  2073.91   2073.91   421.7644   < 2.2e-16
diff         1   187.99    187.99    38.2300   2.181e-08
goal         1    33.01     33.01     6.7128     0.01128
pct:diff     1    27.68     27.68     5.6299     0.01994
pct:goal     1     4.58      4.58     0.9311     0.33734
diff:goal    1     5.92      5.92     1.2042     0.27561
Residuals   84   413.05      4.92
Reduced model: pos ~ pct + diff + goal + pct:diff
            Df   Sum Sq   Mean Sq    F value      Pr(>F)
pct          1  2073.91   2073.91   421.1016   < 2.2e-16
diff         1   187.99    187.99    38.1699   2.088e-08
goal         1    33.01     33.01     6.7023     0.01130
pct:diff     1    27.68     27.68     5.6210     0.01998
Residuals   86   423.55      4.92
All of the Pr(>F) values in the reduced model are small (each is below 0.05), so this is our final model.
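A short Python check of how the F values in these tables arise: each is the term's mean square divided by the residual mean square. The sums of squares below are the rounded values printed in the tables, so recomputed F values can differ slightly in the last digits. The combined F-test for dropping pct:goal and diff:goal together is an extra check that does not appear on the slides; because those two terms come last in the sequential table, the increase in residual sum of squares (423.55 - 413.05 = 10.50) equals their combined sum of squares (4.58 + 5.92).

```python
# Sequential (Type I) sums of squares for the full model, copied from the table
FULL_SS = {
    "pct": 2073.91, "diff": 187.99, "goal": 33.01,
    "pct:diff": 27.68, "pct:goal": 4.58, "diff:goal": 5.92,
}
RSS_FULL, DF_FULL = 413.05, 84        # residuals, full model
RSS_REDUCED, DF_REDUCED = 423.55, 86  # residuals, reduced model


def f_value(ss_term, df_term, rss, df_res):
    """F = (term mean square) / (residual mean square)."""
    return (ss_term / df_term) / (rss / df_res)


def nested_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic for dropping the extra terms of the full model together."""
    extra_ms = (rss_reduced - rss_full) / (df_reduced - df_full)
    return extra_ms / (rss_full / df_full)


for term, ss in FULL_SS.items():
    print(f"{term:10s} F = {f_value(ss, 1, RSS_FULL, DF_FULL):.4f}")

combined = nested_f(RSS_REDUCED, DF_REDUCED, RSS_FULL, DF_FULL)
print(f"drop pct:goal and diff:goal together: F = {combined:.4f}")
```

The combined F comes out at about 1.07 on (2, 84) degrees of freedom, well below the 5% critical value of roughly 3.1, which agrees with deleting both terms.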
2008 League Position?
- Predicted statistics for the end of the 2007/08 season:
  - Proportion of games won: 0.29
  - Goal difference: -23.75
  - Top goal scorer's tally: 11.083
- Putting these values into the model predicts that Newcastle will finish the season in 16th place.
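Putting values into the model means evaluating the fitted equation pos = b0 + b1·pct + b2·diff + b3·goal + b4·pct·diff, the reduced model from the ANOVA slide (assuming goal is the top scorer's tally). The slides do not give the fitted coefficients, so this Python sketch fits them by ordinary least squares, solving the normal equations with a small hand-written Gaussian elimination, and checks itself on synthetic data. The coefficients in the example are invented, not Newcastle's.

```python
def design_row(pct, diff, goal):
    """Columns of the reduced model pos ~ pct + diff + goal + pct:diff."""
    return [1.0, pct, diff, goal, pct * diff]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(rows, positions):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    X = [design_row(*r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, positions)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, pct, diff, goal):
    """Evaluate the fitted model at one set of season statistics."""
    return sum(c * v for c, v in zip(coefs, design_row(pct, diff, goal)))


# Self-check on synthetic data generated from invented coefficients
TRUE_COEFS = [20.0, -25.0, -0.1, -0.2, 0.05]
rows = [(0.20, -10, 8), (0.30, 5, 12), (0.50, 20, 15), (0.60, 30, 20),
        (0.40, 0, 9), (0.25, -20, 11), (0.55, 10, 25)]
positions = [predict(TRUE_COEFS, *r) for r in rows]
coefs = fit(rows, positions)
print([round(c, 6) for c in coefs])
print(predict(coefs, 0.29, -23.75, 11.083))
```

Given the real 91 seasons as `rows` and `positions`, `predict(fit(rows, positions), 0.29, -23.75, 11.083)` would evaluate the model at the predicted 2007/08 statistics; the slides report that this gives 16th place.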
What else?
- For the essay I will:
  - give relevant proofs;
  - look at ways to improve my linear model;
  - see whether my model changes when I use forward selection and backward elimination, and when I change the order of the terms;
  - carry out some influence analysis for my linear model.