Using Newcastle United to teach statistics

Provided by: npcsx
1
Using Newcastle United to teach statistics
  • Simon McIntyre

2
School
  • Statistics is important as it teaches pupils
    how to design good experiments, how to handle and
    explore data and how to test a hypothesis.
  • Many pupils consider statistics boring and
    useless. Why?
  • Many examples in maths textbooks are fake.
  • How interesting are socks or samples of the
    heights of Yr10 boys?
  • What is the point?
  • It has been noted in several reports that there
    is a tendency for girls to do statistics, and
    boys to do mechanics.

3
Essay Aims
  • Use football to teach probability and statistics
    to pupils.
  • Incorporate higher level mathematics to develop
    ideas further.
  • Use various statistical methods to analyse
    Newcastle's historical performance and predict
    future success.

4
Assumptions
  • For the purpose of the essay I will assume:
  • That each season is independent.
  • That the confidence of players is not affected.
  • I will ignore any possible effect that key player
    absences have.

5
Data
Year     Played  Won  PCT     DIFF  Pos  Top Scorers Total
1905-06  38      18   0.4736   26    4   17
1906-07  38      22   0.5789   26    1   20
1907-08  38      15   0.3947   11    4   19
1908-09  38      24   0.6315   24    1   11
1909-10  38      19   0.5000   14    4   26
. . .
2002-03  38      21   0.5536   19    3   17
2003-04  38      13   0.3421   12    5   22
2004-05  38      10   0.2631  -10   14    7
2005-06  38      17   0.4473    5    7   10
2006-07  38      11   0.2894   -9   13   11

When using examples to develop the ideas
presented, I will use what I think would be the
most interesting factor for the pupils: Top
Scorers Total. I will only be using the data
from seasons where there were at least 38 league
games played. I am left with 91 cases to
consider.
6
Basic Probability
  • 4-4-2
  • Probability of getting sent off?
  • Midfielder = 4/11
  • After this, Goalkeeper = 1/10
  • Midfielder AND Goalkeeper = 2/55
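
The three fractions on this slide can be checked with exact arithmetic. The sketch below assumes a 4-4-2 line-up (1 goalkeeper, 4 defenders, 4 midfielders, 2 forwards) and that every player on the pitch is equally likely to be the one sent off. Both assumptions are mine for illustration, and the variable names are not from the slides.

```python
from fractions import Fraction

# Assumed 4-4-2 line-up: 1 goalkeeper, 4 defenders, 4 midfielders, 2 forwards.
# Assumed that each player on the pitch is equally likely to be sent off.
players = 11
midfielders = 4
goalkeepers = 1

# First sending-off: a midfielder out of 11 players.
p_midfielder = Fraction(midfielders, players)

# Second sending-off: the goalkeeper out of the 10 players who remain.
p_keeper_after = Fraction(goalkeepers, players - 1)

# Both events in that order: multiply the two probabilities.
p_both = p_midfielder * p_keeper_after

print(p_midfielder, p_keeper_after, p_both)
```

The second probability is conditional on the first event, which is why its denominator drops from 11 to 10.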

7
Correlation
Here we will let x = Top Scorers Total and
y = Position. So with our data
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{-973.8235522}{\sqrt{3202.900721 \times 2746.132584}} = -0.3283584$
  • Correlation indicates the strength and direction
    of a linear relationship between two random
    variables.
  • We will use the Pearson product-moment
    correlation coefficient.

This is not a very strong correlation. It is
important to know that this does not mean that
there is no relationship between the two, only
that the linear relationship is weak.
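
As a rough check, the correlation coefficient can be recomputed from the three sums quoted on this slide. In the sketch below the negative value is taken as the sum of cross-products and the two positive values as the sums of squares. The slide text does not say which positive value belongs to x and which to y, but the formula is symmetric in them, so it does not matter here. The function names are my own.

```python
import math

def pearson_r_from_sums(s_xy, s_xx, s_yy):
    """Pearson r from the sum of cross-products and the two sums of squares."""
    return s_xy / math.sqrt(s_xx * s_yy)

def pearson_r(xs, ys):
    """Pearson r computed directly from paired raw observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return pearson_r_from_sums(s_xy, s_xx, s_yy)

# Summary values as quoted on the slide.
r = pearson_r_from_sums(-973.8235522, 3202.900721, 2746.132584)
print(round(r, 4))
```

The sign is negative because a higher Top Scorers Total tends to go with a numerically lower (better) league position.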
8
Poisson
  • Try to fit a model introduced before University.
  • $P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$, where $\lambda = E(X) = 18.89011$
  • So what is P(X > 30)?
  • P(X > 30) = 1 - P(X ≤ 30) = 0.006463651

9
Goodness of fit
  • $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the
    observed value in cell i
  • and $E_i$
    is the expected value for cell i

We need to pool cells so that each cell has an
$E_i$ of at least 5, so in the cases where the
cells are pooled we sum their observed and expected values.
Our $\chi^2$ value is 15.00208103. Using tables we
see that p = 0.13. Where did it go wrong? At both
ends the observed values are roughly double
the expected values.
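
The pooling step and the test statistic can be sketched as follows. The counts in the example are made up purely for illustration and are not the Newcastle data. The pooling rule used (merge neighbouring cells from left to right until the expected count reaches 5, then fold any short tail into the last cell) is one simple choice among several, not necessarily the one used for the essay.

```python
def pool_cells(observed, expected, minimum=5.0):
    """Merge neighbouring cells until every pooled expected count >= minimum."""
    pooled_obs, pooled_exp = [], []
    acc_o, acc_e = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
            acc_o, acc_e = 0.0, 0.0
    if acc_e > 0:
        # Leftover tail is too small on its own: fold it into the last cell.
        if pooled_exp:
            pooled_obs[-1] += acc_o
            pooled_exp[-1] += acc_e
        else:
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
    return pooled_obs, pooled_exp

def chi_square(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts, for illustration only.
observed = [1, 3, 6, 9, 7, 3, 1]
expected = [2.0, 4.0, 7.0, 8.0, 6.0, 2.0, 1.0]

pooled_obs, pooled_exp = pool_cells(observed, expected)
statistic = chi_square(pooled_obs, pooled_exp)
print(pooled_obs, pooled_exp, statistic)
```

Pooling reduces the number of cells, so the degrees of freedom used to look up the p-value must be based on the pooled cell count.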
10
ANOVA
Notice that there are two Pr(>F) values that are much
bigger than the others. These are the two terms
to delete.
  • Full model: pos ~ pct + diff + goal + pct:diff + pct:goal + diff:goal

             Df  Sum Sq   Mean Sq  F value   Pr(>F)
  pct         1  2073.91  2073.91  421.7644  < 2.2e-16
  diff        1   187.99   187.99   38.2300  2.181e-08
  goal        1    33.01    33.01    6.7128  0.01128
  pct:diff    1    27.68    27.68    5.6299  0.01994
  pct:goal    1     4.58     4.58    0.9311  0.33734
  diff:goal   1     5.92     5.92    1.2042  0.27561
  Residuals  84   413.05     4.92

  • Reduced model: pos ~ pct + diff + goal + pct:diff

             Df  Sum Sq   Mean Sq  F value   Pr(>F)
  pct         1  2073.91  2073.91  421.1016  < 2.2e-16
  diff        1   187.99   187.99   38.1699  2.088e-08
  goal        1    33.01    33.01    6.7023  0.01130
  pct:diff    1    27.68    27.68    5.6210  0.01998
  Residuals  86   423.55     4.92

All of these Pr(>F) values are small. This is our
final model.
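
The F values in both tables can be recovered from the sums of squares, which also shows how the two deleted terms are absorbed into the residuals of the reduced model. This is a sketch using only the figures printed on this slide. The dictionary keys (for example "pct:diff" for the interaction rows) and the function name are my own labels, and small differences from the printed F values are expected because the sums of squares are rounded.

```python
def f_values(ss_by_term, resid_ss, resid_df):
    """F = term mean square / residual mean square (each term has 1 df)."""
    resid_ms = resid_ss / resid_df
    return {term: ss / resid_ms for term, ss in ss_by_term.items()}

# Sums of squares as printed for the full model.
full_ss = {
    "pct": 2073.91,
    "diff": 187.99,
    "goal": 33.01,
    "pct:diff": 27.68,
    "pct:goal": 4.58,
    "diff:goal": 5.92,
}
full_resid_ss, full_resid_df = 413.05, 84
full_f = f_values(full_ss, full_resid_ss, full_resid_df)

# Dropping the two weakest terms moves their sums of squares (and 2 df)
# into the residual row of the reduced model.
dropped = ["pct:goal", "diff:goal"]
reduced_ss = {t: ss for t, ss in full_ss.items() if t not in dropped}
reduced_resid_ss = full_resid_ss + sum(full_ss[t] for t in dropped)
reduced_resid_df = full_resid_df + len(dropped)
reduced_f = f_values(reduced_ss, reduced_resid_ss, reduced_resid_df)

for term in reduced_ss:
    print(term, round(full_f[term], 2), round(reduced_f[term], 2))
```

The residual sum of squares rises from 413.05 to 423.55 while the residual mean square barely changes, which is why the F values of the remaining terms are almost identical in the two tables.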
11
2008 League Position?
  • Predicted statistics for the end of the 2007/08
    season:
  • Percentage of games won (PCT): 0.29
  • Goal Difference: -23.75
  • Top Goal Scorers Tally: 11.083
  • Putting these values into the model it is
    predicted that Newcastle will finish the season
    in 16th place.

12
What else?
  • For the essay I will:
  • Give relevant proofs.
  • Look at ways to improve my linear model.
  • See if my model differs when I use Forward
    Selection and Backward Elimination and when I
    change the order of the terms.
  • Carry out some influence analysis for my linear
    model.