Title: A STATISTICAL STUDY OF MANCHESTER UNITED IN THE PREMIER LEAGUE
1. A STATISTICAL STUDY OF MANCHESTER UNITED IN
THE PREMIER LEAGUE
2. CONTENTS
- Perception of Statistics
- Why Manchester United and the Premier League?
- Summary Statistics
- Hypothesis Testing
- Chi Square Test
- Summary
3. Perception of Statistics
MEANINGLESS
MIND-NUMBING
UNEXCITING
BORING
POINTLESS
TEDIOUS
STATISTICS
4. Perception of Statistics
"Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital." (Aaron Levenstein)
"Lies, damned lies, and statistics." (Mark Twain)
5. Perception of Statistics
- Problem
  - Boring data sets.
  - How many pupils are interested in the lifetime of light bulbs?
- Solution
  - Something adolescents can relate to.
  - Something fun.
- So what did I choose to study?
6. MANCHESTER UNITED
7. Why Manchester United and the Premiership?
- Football
  - Tangible and interesting.
- Manchester United
  - Reigning champions of the Premiership.
  - Fan base of 333 million (5% of the world population).
  - Most popular football team in the world?
- Premiership
  - Makes for nice statistical analysis.
8. Premier League
- Contested by England's top 20 teams.
- Each team plays all others twice, once at home and once away.
- 38 matches per season (August-May).
- 3 points for a win, 1 point for a draw, 0 for a loss.
- Most points wins.
- Bottom 3 teams relegated to the Football League Championship.
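The scoring rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the win/draw/loss record are hypothetical and do not come from the slides.

```python
def matches_per_season(teams: int = 20) -> int:
    """Each team plays every other team twice (home and away)."""
    return 2 * (teams - 1)


def league_points(wins: int, draws: int, losses: int) -> int:
    """Total points under the 3 / 1 / 0 rule for win / draw / loss."""
    return 3 * wins + 1 * draws + 0 * losses


# Hypothetical season record, chosen only so that the games add up to 38.
wins, draws, losses = 27, 6, 5
assert wins + draws + losses == matches_per_season()

print(matches_per_season())                # 38
print(league_points(wins, draws, losses))  # 87
```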
9. Summary Statistics
- Three types of average
  - Mean
  - Median
  - Mode
- Three types of spread
  - Range
  - Interquartile Range (IQR)
  - Standard Deviation
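10. Summary Statistics: Average
- For a discrete data set $x_1, x_2, \ldots, x_n$ arranged in
ascending order the averages have the following
properties
- Mean
  - Denoted: $\bar{x}$
  - Method: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
- Median
  - Denoted: Median
  - Method: the middle value of the ordered data set (for even $n$, the mean of the two middle values).
- Mode
  - Denoted: Mode
  - Method: value that occurs most frequently in the data set.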
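As a rough illustration of the three averages, the sketch below uses Python's standard `statistics` module. The list of goals per match is made up for the example and is not data from the study.

```python
from statistics import mean, median, mode

# Illustrative goals-per-match values (hypothetical, not from the slides).
goals = [0, 1, 1, 1, 2, 3, 3, 5]

print(mean(goals))    # sum of the values divided by n
print(median(goals))  # middle of the sorted values (mean of the two middle ones here)
print(mode(goals))    # most frequent value
```

With eight values the median falls between the 4th and 5th ordered values, which is why it need not be one of the observed scores.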
11. Summary Statistics: Spread
- For a discrete data set $x_1, x_2, \ldots, x_n$ arranged in
ascending order the spreads are as follows
- Range
  - Denoted: Range
  - Method: Range $= x_{\max} - x_{\min}$
- Interquartile Range
  - Denoted: IQR
  - Method: IQR $= Q_3 - Q_1$
- Standard Deviation
  - Denoted: $s$
12. Summary Statistics: Quartiles
- Three quartiles
  - Q1 (Lower Quartile): 25th percentile.
  - Q2 (Median): 50th percentile.
  - Q3 (Upper Quartile): 75th percentile.
- Calculating Quartiles
  - Q2 = median
  - Then for
    - n even: split the data set into lower and upper halves.
    - n odd: discard the median before splitting into halves.
  - Q1 = median of the lower half of the data set.
  - Q3 = median of the upper half of the data set.
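The median-of-halves method described above could be sketched as follows. The helper name `quartiles` and both sample lists are assumptions made for illustration. Note that library routines such as `statistics.quantiles` use different interpolation conventions and may give slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    For odd n the median itself is discarded before splitting.
    Assumes at least two observations.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return median(lower), median(xs), median(upper)


# Illustrative (hypothetical) data sets: one with even n, one with odd n.
even_sample = [0, 1, 1, 1, 2, 3, 3, 5]
odd_sample = [0, 1, 1, 2, 3, 3, 4]

q1, q2, q3 = quartiles(even_sample)
print(q1, q2, q3)                           # quartiles of the even sample
print(q3 - q1)                              # interquartile range
print(max(even_sample) - min(even_sample))  # range
print(quartiles(odd_sample))
```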
13. Summary Statistics: Standard Deviation Example
- Find
  - the total number in the sample, $n$
  - the mean of the sample, $\bar{x}$
  - the deviation, $x_i - \bar{x}$
  - the deviation squared, $(x_i - \bar{x})^2$
  - the sum of all $(x_i - \bar{x})^2$, $i = 1, \ldots, n$
  - the Standard Deviation using the formula
GOALS SCORED AT HOME AGAINST CHELSEA (8 years)
i    x_i    mean    deviation    deviation squared
1    1      1.5     -0.5         0.25
2    1      1.5     -0.5         0.25
3    1      1.5     -0.5         0.25
4    1      1.5     -0.5         0.25
5    2      1.5      0.5         0.25
6    0      1.5     -1.5         2.25
7    3      1.5      1.5         2.25
8    3      1.5      1.5         2.25
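The steps listed for this example can be followed in code using the eight goals-against-Chelsea values from the table. The slide's final formula is not preserved in this transcript, so the sketch computes both common conventions (dividing by n and by n - 1). Which one the author used is an assumption left open here.

```python
import math

# Goals scored at home against Chelsea over 8 years (x_i column of the table).
goals = [1, 1, 1, 1, 2, 0, 3, 3]

n = len(goals)                              # total number in the sample
mean = sum(goals) / n                       # mean of the sample
deviations = [x - mean for x in goals]      # x_i - mean
squared = [d ** 2 for d in deviations]      # (x_i - mean)^2
sum_sq = sum(squared)                       # sum of squared deviations

# Two conventions; the slide's own choice is not known.
sd_population = math.sqrt(sum_sq / n)       # divide by n
sd_sample = math.sqrt(sum_sq / (n - 1))     # divide by n - 1

print(n, mean, sum_sq)
print(sd_population, round(sd_sample, 3))
```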
14. Summary Statistics: Examples
- We can calculate the averages for the number of
goals scored and conceded over the past four
seasons by Manchester United in the Premier
League. This takes into account 152 games!
- Likewise, we can calculate the spreads for such
data.
15. Summary Statistics: Examples
- Goals Scored Against Top Performing Teams (past 8
seasons)
- Goals Conceded Against Top Performing Teams (past
8 seasons)
16. Summary Statistics: Boxplot
- Quick analysis of the distribution of data
  - Q1, Q2, Q3
  - Whiskers
  - Outliers
- Whiskers end on the last given value which is
within or on the upper and lower boundaries.
  - Lower Boundary = Q1 - 1.5 × IQR
  - Upper Boundary = Q3 + 1.5 × IQR
- Any value which lies outside these boundaries is
extreme and is called an Outlier.
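A minimal sketch of the boundary rule, assuming quartiles are found by the median-of-halves method. The helper names and the shots-per-game values are hypothetical and chosen so that one value falls outside the upper boundary.

```python
from statistics import median


def quartiles(data):
    """(Q1, Q2, Q3) by median-of-halves; the median is dropped when n is odd."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half:] if len(xs) % 2 == 0 else xs[half + 1:]
    return median(lower), median(xs), median(upper)


def boxplot_summary(data):
    """Return boundaries, whisker ends and outliers for a boxplot."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    lower_boundary = q1 - 1.5 * iqr
    upper_boundary = q3 + 1.5 * iqr
    inside = [x for x in data if lower_boundary <= x <= upper_boundary]
    outliers = [x for x in data if x < lower_boundary or x > upper_boundary]
    # Whiskers end on the last data values within or on the boundaries.
    return lower_boundary, upper_boundary, min(inside), max(inside), outliers


# Illustrative (hypothetical) shots-per-game values.
shots = [8, 10, 11, 12, 12, 13, 14, 15, 16, 30]
low, high, whisker_low, whisker_high, outliers = boxplot_summary(shots)
print(low, high)                  # boundaries
print(whisker_low, whisker_high)  # whisker ends
print(outliers)                   # values flagged as outliers
```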
17.
- Goals Scored per Game (4 seasons)
- Shots per Game (4 seasons)
- Fouls per Game (4 seasons)
- Yellow Cards per Game (4 seasons)
18.
- Goals Scored per Game against Top Performing
Teams (8 seasons)
- Goals Conceded per Game against Top Performing
Teams (8 seasons)
19.
- Goals Scored and Conceded per Match per Season
- Fouls and Yellow Cards per Match per Season
20. Summary Statistics: Histogram
- Graphical display of the frequency of disjoint
categories of the data.
- Area of a bar represents number of observations
in the category.
- Mathematically
  - a mapping $m_i$ that counts the number of
observations ($n$) falling into $k$ categories such
that $n = \sum_{i=1}^{k} m_i$
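The counting view of a histogram can be sketched as below: each category gets a count, and the counts add up to the number of observations. The goals-per-match values are illustrative only, and using one category per whole number of goals is an assumption made for this example.

```python
from collections import Counter

# Illustrative (hypothetical) goals-per-match values.
goals_per_match = [0, 1, 1, 1, 2, 3, 3, 5]
n = len(goals_per_match)

# One category per whole number of goals from 0 up to the maximum observed.
counts = Counter(goals_per_match)
categories = list(range(max(goals_per_match) + 1))
m = [counts.get(g, 0) for g in categories]  # m_i for each category

# The category counts must add up to the number of observations.
assert sum(m) == n

# Simple text histogram; bar length is proportional to the count.
for g, m_i in zip(categories, m):
    print(f"{g} goals | {'#' * m_i}")
```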
21.
- Goals Scored per Match at Home and Away (4
seasons)
22.
- Shots Taken and Shots on Target per Match (4
seasons)
23. Hypothesis Testing
- Test based on statistical data where you choose
between two hypotheses
  - $H_0$ vs $H_A$
- $H_0$ must be stated and exists solely to be
falsified by the sample.
- Set a value for $\alpha$, the significance level. This
is the probability of rejecting the null
hypothesis when it is true.
- If the p-value of the sample is smaller than $\alpha$,
reject $H_0$.
24. Chi-Square Test
- When data can be split into two categorical
variables such as games with yellow cards dealt
and those without and win/no win, we call this
categorical data.
- The chi-squared test ($\chi^2$ test) finds whether
there is significant evidence of a relationship
between two such variables.
- Test for independence.
25. Chi-Square Test: Method
- Observed Values: 2 x 2 contingency table ($r = c = 2$)
- Sample size $n$
- $n_{ij}$ = number in row $i$, column $j$.
- $n_{i\cdot}$ = sum of row $i$.
- $n_{\cdot j}$ = sum of column $j$.
- $p_{ij}$ = probability of being in row $i$, column $j$.
- $p_{i\cdot}$ = probability in row $i$.
- $p_{\cdot j}$ = probability in column $j$.
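26. Chi-Square Test
Rows and columns independent: $P(A \cap B) = P(A) \times P(B)$
- $H_0$: $p_{ij} = p_{i\cdot} \, p_{\cdot j}$
- Estimate each: $\hat{p}_{i\cdot} = n_{i\cdot} / n$, $\hat{p}_{\cdot j} = n_{\cdot j} / n$
- Under $H_0$: $\hat{p}_{ij} = \hat{p}_{i\cdot} \, \hat{p}_{\cdot j}$
- Cell $(i,j)$: Binomial, $O_{ij} \sim B(n, p_{ij})$, so $E(O_{ij}) = n p_{ij}$
- Expected value: $E_{ij} = n \, \hat{p}_{i\cdot} \, \hat{p}_{\cdot j} = n_{i\cdot} \, n_{\cdot j} / n$
- Test Statistic: $\chi^2 = \sum_{i} \sum_{j} (O_{ij} - E_{ij})^2 / E_{ij}$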
27. Chi-Square Test
28. Chi-Square Test
- Large values of $\chi^2$ provide evidence against $H_0$
(suggests there may be a relationship).
- For large $n$, given $H_0$, the sampling distribution
is chi-square.
- Degrees of Freedom
  - d.f. $= (r - 1)(c - 1)$
  - $= 1$ (for a 2 x 2 table)
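The method can be sketched in code, assuming the usual Pearson form of the statistic (the sum over cells of (observed - expected)² / expected, with expected counts taken from the row and column totals). The function name and the 2 x 2 table of counts are hypothetical. They are not the match data from the study.

```python
def chi_square_independence(observed):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]

    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


# Hypothetical counts: rows = yellow card / no yellow card,
# columns = win / no win.
table = [[30, 10],
         [20, 40]]
expected, statistic, df = chi_square_independence(table)
print(expected)
print(round(statistic, 4), df)
```

A statistic this far above zero would count as evidence against independence for these made-up counts. A real analysis would also check that the expected counts are large enough for the chi-square approximation to be reasonable.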
29. Chi-Square Test: Example
- Two categorical variables
- Aim: To find whether there is a significant
relationship between yellow cards being given and
outcome of the match when played at home.
- $H_0$: independence of rows and columns.
- Significance Level: $\alpha = 0.05$.
31. Chi-Square Test: Example
32. Chi-Square Test: Example
- $\chi^2$ Table
- $\chi^2 = 0.919 + 1.767 + 1.875 + 3.605 = 8.166$
- p-value (from tables for 1 d.f.) = 0.004269
- p-value < $\alpha = 0.05$ is evidence that outcome of the match
and yellow cards given are
related.
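The p-value quoted above can be checked without tables. For 1 degree of freedom a chi-square variable is the square of a standard normal variable, so the upper-tail probability equals erfc(sqrt(x / 2)). The sketch below uses only the four cell contributions listed on the slide, and the function name is an assumption.

```python
import math


def chi2_pvalue_1df(statistic: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 d.f."""
    return math.erfc(math.sqrt(statistic / 2))


# Cell contributions to the statistic, as listed on the slide.
contributions = [0.919, 1.767, 1.875, 3.605]
statistic = sum(contributions)
p_value = chi2_pvalue_1df(statistic)

alpha = 0.05
print(round(statistic, 3))
print(round(p_value, 4))
print("reject H0" if p_value < alpha else "do not reject H0")
```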
33. Summary
- Statistics should be interesting and fun.
- One of many data sets.
- Pupils could pick their own
  - Favourite football team.
  - Formula 1.
  - Tennis etc.
- Perform hypothesis tests for paired data
  - T-test
  - Sign test
  - Wilcoxon signed-rank test
35. THANK YOU