A STATISTICAL STUDY OF - PowerPoint PPT Presentation

About This Presentation

Slides: 36
Provided by: npcsx


1
A STATISTICAL STUDY OF MANCHESTER UNITED IN THE PREMIER LEAGUE
2
CONTENTS
  • Perception of Statistics
  • Why Manchester United and the Premier League?
  • Summary Statistics
  • Hypothesis Testing
  • Chi Square Test
  • Summary

3
Perception of Statistics
MEANINGLESS
MIND-NUMBING
UNEXCITING
BORING
POINTLESS
TEDIOUS
STATISTICS
4
Perception of Statistics
"Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital."
Aaron Levenstein
"Lies, damned lies, and statistics."
Mark Twain (who attributed the phrase to Benjamin Disraeli)
5
Perception of Statistics
  • Problem
  • Boring Data Sets.
  • How many pupils are interested in the duration of
    light bulbs?
  • Solution
  • Something adolescents can relate to.
  • Something fun.
  • So what did I choose to study?

6
MANCHESTER UNITED
7
Why Manchester United and the Premiership?
  • Football
  • Tangible and interesting.
  • Manchester United
  • Reigning champions of the Premiership.
  • Fan base of 333 million (5% of the world population).
  • Most popular football team in the world?
  • Premiership
  • Makes for nice statistical analysis.

8
Premier League
  • Contested by England's top 20 teams.
  • Each team plays all others twice, once at home
    and once away.
  • 38 matches per season (August-May).
  • 3 points for a win, 1 point for a draw, 0 for a loss.
  • The team with the most points wins the league.
  • Bottom 3 teams relegated to the Football League
    Championship.
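As a quick check on the numbers above, here is a minimal Python sketch (our own illustration, not from the presentation) of the league arithmetic: with 20 teams, each side plays the other 19 twice, giving 38 matches, and points are 3 per win and 1 per draw. The function names and the sample win/draw/loss record are hypothetical.

```python
# Sketch of the Premier League arithmetic described above (illustrative only).

def league_points(wins: int, draws: int, losses: int) -> int:
    """Points total: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3 * wins + 1 * draws + 0 * losses


def matches_per_team(num_teams: int = 20) -> int:
    """Each team plays every other team twice (once at home, once away)."""
    return 2 * (num_teams - 1)


if __name__ == "__main__":
    print(matches_per_team())        # 38 for a 20-team league
    print(league_points(10, 5, 23))  # hypothetical 38-game record: 35 points
```

Keeping the loss term (worth 0) in `league_points` makes the scoring rule explicit, even though it does not change the total.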

9
Summary Statistics
  • Three types of average
  • Mean
  • Median
  • Mode
  • Three types of spread
  • Range
  • Interquartile Range (IQR)
  • Standard Deviation

10
Summary Statistics Average
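  • For a discrete data set $x_1, x_2, \ldots, x_n$ arranged in ascending order, the averages are as follows:
  • Mode: the value that occurs most frequently in the data set.
  • Median (denoted $Q_2$): the middle value, $x_{(n+1)/2}$ if $n$ is odd, or $\frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right)$ if $n$ is even.
  • Mean (denoted $\bar{x}$): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.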
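The three averages can be computed straight from their definitions. Below is a minimal Python sketch with hypothetical function names. `modes` returns every most-frequent value because a data set can have more than one mode. The example data are the eight Chelsea home-game goal counts from the standard deviation example (slide 13).

```python
from collections import Counter


def mean(xs):
    """Arithmetic mean: sum of the values divided by n."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # x_{(n+1)/2} in 1-based notation
    return (s[mid - 1] + s[mid]) / 2     # (x_{n/2} + x_{n/2+1}) / 2


def modes(xs):
    """All values that occur most frequently."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


# Goals scored at home against Chelsea over 8 seasons (slide 13).
goals = [1, 1, 1, 1, 2, 0, 3, 3]
print(mean(goals), median(goals), modes(goals))  # 1.5 1.0 [1]
```

The standard library's `statistics.mean`, `statistics.median` and `statistics.multimode` should give the same results for this data.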
11
Summary Statistics Spread
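  • For a discrete data set $x_1, x_2, \ldots, x_n$ arranged in ascending order, the measures of spread are as follows:
  • Range: $\text{Range} = x_{\max} - x_{\min}$.
  • Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$.
  • Standard Deviation (denoted $s$): measures how far the data typically lie from the mean, computed from the squared deviations $(x_i - \bar{x})^2$.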
12
Summary Statistics Quartiles
  • Three quartiles
  • Q1 (Lower Quartile): 25th percentile.
  • Q2 (Median): 50th percentile.
  • Q3 (Upper Quartile): 75th percentile.
  • Calculating Quartiles
  • Q2: the median of the whole data set.
  • Then, to form the halves:
  • n even: split the data set into lower and upper halves.
  • n odd: discard the median before splitting into halves.
  • Q1: median of the lower half of the data set.
  • Q3: median of the upper half of the data set.
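The quartile recipe above can be written as a short Python sketch (function names are our own). It follows the median-of-halves rule as listed. Other conventions, such as the default method of Python's `statistics.quantiles` or spreadsheet quartile functions, interpolate differently and can give slightly different Q1 and Q3. The range and IQR from the previous slide are included because they follow directly from these quartiles.

```python
def median(xs):
    """Median of a non-empty list (average of the two middle values when n is even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(xs):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    n even: split the sorted data into lower and upper halves.
    n odd:  discard the median first, then split.
    """
    s = sorted(xs)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values to form halves")
    half = n // 2
    lower = s[:half]                             # values below the middle
    upper = s[half + 1:] if n % 2 else s[half:]  # skip the median when n is odd
    return median(lower), median(s), median(upper)


def spread(xs):
    """Range (x_max - x_min) and interquartile range (Q3 - Q1)."""
    q1, _, q3 = quartiles(xs)
    return {"range": max(xs) - min(xs), "iqr": q3 - q1}


goals = [1, 1, 1, 1, 2, 0, 3, 3]  # Chelsea home goals from slide 13
print(quartiles(goals))  # (1.0, 1.0, 2.5)
print(spread(goals))     # {'range': 3, 'iqr': 1.5}
```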

13
Summary Statistics Standard Deviation Example
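  • Find
  • the total number in the sample, $n$
  • the mean of the sample, $\bar{x}$
  • the deviation $x_i - \bar{x}$
  • the deviation squared $(x_i - \bar{x})^2$
  • the sum of all $(x_i - \bar{x})^2$, $i = 1, \ldots, n$
  • the Standard Deviation using the formula

GOALS SCORED AT HOME AGAINST CHELSEA (8 years)
  i | x_i | mean | x_i - mean | (x_i - mean)^2
  1 |  1  | 1.5  |    -0.5    |      0.25
  2 |  1  | 1.5  |    -0.5    |      0.25
  3 |  1  | 1.5  |    -0.5    |      0.25
  4 |  1  | 1.5  |    -0.5    |      0.25
  5 |  2  | 1.5  |     0.5    |      0.25
  6 |  0  | 1.5  |    -1.5    |      2.25
  7 |  3  | 1.5  |     1.5    |      2.25
  8 |  3  | 1.5  |     1.5    |      2.25
Here $n = 8$, $\bar{x} = 12/8 = 1.5$ and $\sum_{i=1}^{8}(x_i - \bar{x})^2 = 8$.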
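The worked example can be reproduced step by step in Python. The transcript does not show which standard deviation formula the slide uses, so this sketch computes both common conventions: dividing by $n$ and dividing by $n - 1$ (the usual sample $s$). It cross-checks each against the standard library's `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

# Goals scored at home against Chelsea over 8 seasons (the x_i column above).
goals = [1, 1, 1, 1, 2, 0, 3, 3]

n = len(goals)                          # total number in the sample
x_bar = sum(goals) / n                  # sample mean
deviations = [x - x_bar for x in goals] # x_i - mean
squared = [d * d for d in deviations]   # (x_i - mean)^2
sum_sq = sum(squared)                   # sum over i = 1..n

# Two common conventions; the slide's own formula is not in the transcript.
sd_n = math.sqrt(sum_sq / n)                # divide by n
sd_n_minus_1 = math.sqrt(sum_sq / (n - 1))  # divide by n - 1 (sample s)

# Cross-check against the standard library.
assert math.isclose(sd_n, statistics.pstdev(goals))
assert math.isclose(sd_n_minus_1, statistics.stdev(goals))

print(n, x_bar, sum_sq)                        # 8 1.5 8.0
print(round(sd_n, 3), round(sd_n_minus_1, 3))  # 1.0 1.069
```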
14
Summary Statistics Examples
  • We can calculate the averages for the number of goals scored and conceded by Manchester United in the Premier League over the past four seasons. This takes into account 152 games (4 seasons × 38 matches)!
  • Likewise, we can calculate the spreads for such data.

15
Summary Statistics Examples
  • Goals Scored Against Top Performing Teams (past 8
    seasons)
  • Goals Conceded Against Top Performing Teams (past
    8 seasons)

16
Summary Statistics Boxplot
  • Quick analysis of the distribution of data
  • Q1, Q2, Q3
  • Whiskers
  • Outliers
  • Whiskers extend to the most extreme data values that lie within (or on) the lower and upper boundaries.
  • Lower Boundary: $Q_1 - 1.5 \times \text{IQR}$
  • Upper Boundary: $Q_3 + 1.5 \times \text{IQR}$
  • Any value which lies outside these boundaries is extreme and is called an Outlier.
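The boxplot rules can be checked with a small Python sketch. The names `boxplot_summary` and `k` are our own, with `k = 1.5` matching the boundaries on the slide. Quartiles use the same median-of-halves rule as slide 12. The whiskers are the most extreme data values inside the boundaries, and anything outside is reported as an outlier.

```python
def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(xs):
    """(Q1, Q2, Q3) by the median-of-halves rule (median dropped when n is odd)."""
    s = sorted(xs)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(s), median(upper)


def boxplot_summary(xs, k=1.5):
    """Quartiles, boundaries Q1 - k*IQR and Q3 + k*IQR, whisker ends and outliers."""
    q1, q2, q3 = quartiles(xs)
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    inside = [x for x in xs if lower_bound <= x <= upper_bound]
    outliers = sorted(x for x in xs if x < lower_bound or x > upper_bound)
    return {
        "q1": q1, "q2": q2, "q3": q3,
        "lower_bound": lower_bound, "upper_bound": upper_bound,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


goals = [1, 1, 1, 1, 2, 0, 3, 3]  # Chelsea home goals from slide 13
summary = boxplot_summary(goals)
print(summary["lower_bound"], summary["upper_bound"], summary["outliers"])  # -1.25 4.75 []
```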

17
  • Goals Scored per Game (4
    seasons)
  • Shots per Game (4 seasons)

  • Fouls per Game (4 seasons)
  • Yellow Cards per Game (4 seasons)

18
  • Goals Scored per Game against Top Performing
    Teams (8 seasons)
  • Goals Conceded per Game against Top Performing
    Teams (8 seasons)

19
  • Goals Scored and Conceded per Match per Season
  • Fouls and Yellow Cards per Match per Season

20
Summary Statistics Histogram
  • Graphical display of the frequency of disjoint
    categories of the data.
  • Area of a bar represents the number of observations in the category.
  • Mathematically:
  • a function $m_i$ that counts the number of observations falling into each of $k$ disjoint categories, so that the total number of observations is $n = \sum_{i=1}^{k} m_i$.
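For discrete data such as goals per match, each distinct value can serve as its own category, so the counts $m_i$ come straight from a tally. A minimal Python sketch (our own helper name, using the Chelsea home-goal data from slide 13) that also checks $n = \sum m_i$ and draws a crude text histogram:

```python
from collections import Counter


def histogram(xs):
    """Return {category: m_i}, treating each distinct value as one category."""
    return dict(sorted(Counter(xs).items()))


goals = [1, 1, 1, 1, 2, 0, 3, 3]  # Chelsea home goals from slide 13
m = histogram(goals)
assert sum(m.values()) == len(goals)  # n = sum of m_i over the k categories
print(m)  # {0: 1, 1: 4, 2: 1, 3: 2}

# Text bars: bar length is proportional to the number of observations.
for value, count in m.items():
    print(f"{value} | {'#' * count}")
```

For continuous data you would first group values into bins (class intervals) and count per bin instead.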

21
  • Goals Scored per Match at Home and Away (4
    seasons)

22
  • Shots Taken and Shots on Target per Match (4
    seasons)

23
Hypothesis Testing
  • A test based on statistical data in which you choose between two hypotheses:
  • H0 (null hypothesis) vs HA (alternative hypothesis).
  • H0 must be stated and exists solely to be falsified by the sample.
  • Set a value for α, the significance level. This is the probability of rejecting the null hypothesis when it is true.
  • If the p-value (the probability, assuming H0 is true, of a result at least as extreme as the sample) is smaller than α, reject H0.

24
Chi-Square Test
  • Data that fall into categories, such as games with or without yellow cards dealt, or win/no win, are called categorical data.
  • The chi-squared test (χ² test) finds whether there is significant evidence of a relationship between two such categorical variables.
  • Test for independence.

25
Chi-Square Test Method
  • Observed values: 2 x 2 contingency table (r = c = 2).
  • $n$: sample size (total number of observations).
  • $n_{ij}$: number in row i, column j.
  • $n_{i\cdot}$: sum of row i.
  • $n_{\cdot j}$: sum of column j.
  • $p_{ij}$: probability of being in row i, column j.
  • $p_{i\cdot}$: probability of being in row i.
  • $p_{\cdot j}$: probability of being in column j.

26
Chi-Square Test
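Rows and columns independent: $P(A \cap B) = P(A) \times P(B)$
  • H0: $p_{ij} = p_{i\cdot}\, p_{\cdot j}$ for all $i, j$.
  • Estimate each: $\hat{p}_{i\cdot} = n_{i\cdot}/n$, $\hat{p}_{\cdot j} = n_{\cdot j}/n$.
  • Under H0, the expected count in cell (i, j) is $E_{ij} = n\,\hat{p}_{i\cdot}\,\hat{p}_{\cdot j} = n_{i\cdot}\, n_{\cdot j}/n$.
  • Test statistic: $\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where $O_{ij} = n_{ij}$ is the observed count.

Binomial: $O_{ij} \sim B(n, p_{ij})$, so $E(O_{ij}) = n\,p_{ij}$.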
27
Chi-Square Test
  • Expected Table
  • χ² Table

28
Chi-Square Test
  • Large values of χ² provide evidence against H0 (suggesting there may be a relationship).
  • For large n, given H0, the sampling distribution of the test statistic is approximately chi-square.
  • Degrees of Freedom
  • d.f. = $(r - 1)(c - 1)$
  • = 1 (for a 2 x 2 table)
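The method on the last few slides (expected counts $E_{ij} = n_{i\cdot} n_{\cdot j}/n$, the summed cell contributions, and $(r-1)(c-1)$ degrees of freedom) fits in one short Python function. This is a sketch with a hypothetical name. It computes the plain Pearson statistic without Yates's continuity correction, and the transcript does not say whether the presentation applied one. For comparison, `scipy.stats.chi2_contingency` does the same job but applies that correction to 2 x 2 tables by default (`correction=True`). The presentation's observed table is not in the transcript, so no counts are filled in here.

```python
def chi_square_statistic(observed):
    """Pearson chi-square test of independence for an r x c table (list of rows).

    Returns (statistic, expected table, degrees of freedom).
    Assumes every row and column total is positive, so all E_ij > 0.
    """
    row_sums = [sum(row) for row in observed]        # n_i.
    col_sums = [sum(col) for col in zip(*observed)]  # n_.j
    n = sum(row_sums)                                # sample size
    # Under H0 (independence): E_ij = n_i. * n_.j / n
    expected = [[r * c / n for c in col_sums] for r in row_sums]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, expected, dof
```

Large values of the returned statistic, compared with the chi-square distribution on `dof` degrees of freedom, count as evidence against H0.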

29
Chi-Square Test Example
  • Two categorical variables.
  • Aim: to find whether there is a significant relationship between yellow cards being given and the outcome of the match when played at home.
  • H0: independence of rows and columns.
  • Significance level: α = 0.05.

30
(No Transcript)
31
Chi-Square Test Example
  • Observed
  • χ²
  • Expected

32
Chi-Square Test Example
  • χ² Table
  • χ² = 0.919 + 1.767 + 1.875 + 3.605 = 8.166
  • p-value (from tables for 1 d.f.) = 0.004269
  • Since the p-value is below α = 0.05, we reject H0: there is significant evidence that the outcome of the match and yellow cards given are related.
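The table lookup can be checked in Python without statistical tables. With 1 degree of freedom the statistic is the square of a standard normal $Z$, so $P(\chi^2_1 > x) = P(|Z| > \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$, which `math.erfc` evaluates directly. This shortcut holds only for 1 d.f. For other degrees of freedom a library routine such as `scipy.stats.chi2.sf(x, df)` would be needed. The four cell contributions are the ones listed on this slide, and the helper name is our own.

```python
import math

# Cell contributions (O_ij - E_ij)^2 / E_ij as listed on this slide.
contributions = [0.919, 1.767, 1.875, 3.605]
chi2 = sum(contributions)


def chi2_upper_tail_1df(x):
    """P(X > x) for X ~ chi-square with 1 degree of freedom.

    With 1 d.f., X = Z**2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


alpha = 0.05  # significance level from the example
p_value = chi2_upper_tail_1df(chi2)
print(f"chi2 = {chi2:.3f}, p = {p_value:.6f}")
print("reject H0" if p_value < alpha else "do not reject H0")  # reject H0
```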

33
Summary
  • Statistics should be interesting and fun.
  • This is just one of many possible data sets.
  • Pupils could pick their own:
  • Favourite football team.
  • Formula 1.
  • Tennis, etc.
  • Perform hypothesis tests for paired data:
  • Paired t-test
  • Sign test
  • Wilcoxon Signed-Rank Test

34
(No Transcript)
35
THANK YOU