QM1 Week 1 Descriptive Statistics

1
QM1 Week 1 Descriptive Statistics
  • Dr Alexander Moradi
  • GPRG/CSAE Nuffield College
  • Dept. of Economics, University of Oxford
  • Email alexander.moradi_at_economics.ox.ac.uk

2
3. Descriptive Statistics
  • Definition: Statistics used to summarise or describe
    a set of numbers
  • Absolute frequency: Number of times a certain
    value occurs
  • Relative frequency: Ratio of the number of
    observations in a statistical category (with a
    certain value) to the total number of
    observations
  • Cumulative frequency: Number of observations
    which are less than or equal to any specified
    number
  • Histogram: Visual representation of the number
    or proportion of observations falling into each
    of several categories or intervals
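The three frequency definitions above can be sketched in a few lines of Python. The sample values are invented for illustration and are not taken from the relief data set.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample, invented for illustration (not the relief data).
values = [5, 3, 5, 7, 3, 5, 9, 7]

n = len(values)
absolute = Counter(values)                 # value -> number of occurrences
distinct = sorted(absolute)                # distinct values in ascending order
relative = {v: absolute[v] / n for v in distinct}
# Running total of the absolute frequencies: observations <= each value.
cumulative = dict(zip(distinct, accumulate(absolute[v] for v in distinct)))

for v in distinct:
    print(v, absolute[v], relative[v], cumulative[v])
```

The cumulative frequency of the largest value always equals the total number of observations, which is a quick sanity check on any frequency table.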

3
3. Frequencies
Example: Frequencies of relief payments (in
intervals of 6 shillings)
4
3. Histogram
5
3. Quartiles, Deciles, and Percentiles
  • Values that divide the data into certain
    fractions. Common divisions are:
  • Quartiles divide the observations into four
    equal quarters
  • 1st quartile is the value that has 25% of the
    observations below it
  • 3rd quartile is the value that has 75% of the
    observations below it
  • Deciles divide the data into 10 portions of
    equal size
  • 1st decile: 10% of the observations have a lower
    value than this
  • 2nd decile: 20% of the observations have a lower
    value than this
  • Percentiles divide the data into 100 portions of
    equal size
  • Quartiles are the 25th, 50th and 75th percentiles
  • Deciles are the 10th, 20th, 30th, ..., 90th
    percentiles
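A minimal sketch of these divisions, using `statistics.quantiles` from the Python standard library on invented data (the integers 1 to 99). Statistical packages use slightly different interpolation rules, so cut points computed elsewhere may differ a little on real data.

```python
from statistics import quantiles

# Hypothetical data, invented for illustration: the integers 1..99.
data = list(range(1, 100))

quartiles = quantiles(data, n=4)        # 3 cut points -> 4 equal portions
deciles = quantiles(data, n=10)         # 9 cut points -> 10 equal portions
percentiles = quantiles(data, n=100)    # 99 cut points -> 100 equal portions

# The quartiles coincide with the 25th, 50th and 75th percentiles.
print(quartiles)
print([percentiles[24], percentiles[49], percentiles[74]])
```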

6
3. Four Moments of a Distribution
  • Statistics to summarize the distribution of a
    variable
  • Arithmetic mean
  • Variance
  • Skewness
  • Kurtosis

7
3.1 Measures of Central Tendency
  • 1. Median: Value that divides the higher half of
    a sample from the lower half
  • Order the series in ascending order
  • Position = (Number of observations + 1)/2
  • If the position is a whole number (odd number of
    observations), then the median is the value at
    this position
  • If the position is not a whole number (even number
    of observations), then the median is the average
    of the values at the two adjacent positions
  • → The median is the 2nd quartile, the 5th decile,
    and the 50th percentile
  • 2. Mode: Most frequent value
  • 3. Arithmetic mean: $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$

In words: Sum of all values divided by total
number of observations
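The position rule for the median, together with the mode and the mean, can be sketched as follows. The function names and the sample values are illustrative assumptions.

```python
from collections import Counter

def median_by_position(values):
    """Median via the position rule (n + 1) / 2 on the ordered series."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                 # 1-based position of the median
    if position.is_integer():              # odd n: a single middle value
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]     # value just below the position
    upper = ordered[int(position)]         # value just above the position
    return (lower + upper) / 2

def mode(values):
    """Most frequent value; with ties, the first one encountered wins."""
    return Counter(values).most_common(1)[0][0]

def mean(values):
    """Sum of all values divided by the number of observations."""
    return sum(values) / len(values)

sample = [2, 3, 3, 7, 10]                  # hypothetical values
print(median_by_position(sample), mode(sample), mean(sample))
```

With five observations the position is 3, so the median is the third ordered value. With four observations the position is 2.5, so the median is the average of the second and third ordered values.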
8
3.1 Numeric Example
  • Random sample (of anything whatsoever)
  • What is the median, mode and mean?

5
5
6
  • What is the effect of adding case H?

5
6
9
  • What are the absolute and the relative frequency
    of the value 5?

9
3.1 Weighted Average
  • The arithmetic mean gives the value of each case
    the same weight
  • Sometimes it is unreasonable to give all
    observations the same weight, e.g. when data are
    aggregated by regions that differ in population
    size, or when a data set consists of compounds,
    villages and towns
  • We assign each observation a weight w; the
    weighted mean is $\mu_w = \sum_i w_i x_i / \sum_i w_i$
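A short sketch of the weighted mean, assuming population sizes serve as the weights. The regional rates and populations are invented for illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical regional rates and population sizes (illustration only).
rates = [10.0, 20.0, 30.0]
population = [1000, 3000, 6000]

print(weighted_mean(rates, population))    # large regions count for more
print(weighted_mean(rates, [1, 1, 1]))     # equal weights: the plain mean
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean.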

10
3.1 Exercise: Descriptive Statistics
  • Data set: 1699_RELIEF.dta
  • The file contains data on 311 parishes c.1831
    that Boyer analysed in his study of the Old Poor
    Law. A short introduction and variable
    definitions can be found in FT, pp. 496-498
  • Plot a histogram of the value of real property
    (land and building) per head in 1815 (WEALTH)
  • Graphics/ Histogram
  • Alternatively: Graphics/ Two-way graph
    (Scatterplot, line, etc.)/ Plot type: Histogram
  • Change the bin width of the histogram
  • Import the graph into a word processor
  • In STATA: File/Save Graph as .png or .tif
  • In Word: Insert/ Picture/ From File/
  • Or try right-clicking the graph in STATA and
    Copy & Paste
  • Plot a bar graph of average wealth across English
    counties
  • Graphics/Bar charts/ Summary Statistics/
  • Main: Statistic: mean, Variables: wealth
  • Over groups: Over 1, Variable: county
11
3.1 Exercise: Descriptive Statistics
  • Assign value labels to the COUNTY variable
  • label define ccode 1 "Kent" 2 "Sussex" 3 "Essex" 4 "Suffolk"
  • label values county ccode
  • Alternatively: Use the data editor
  • What inferences can we draw about spatial
    differences in wealth in 1815?
  • Calculate the percentiles, quartiles and deciles
    of WEALTH
  • Use the centile command
  • Calculate the absolute, relative and cumulative
    frequency as well as the median, mean and mode of
    the variable WEALTH
  • Statistics/Summaries, tables, tests/
    Tables/Table of summary statistics (table)
  • Use the tabulate command
  • Use the mode command
  • (you need to download and install this command)

12
3.2 Measures of Dispersion: Variance
  • Two variables with equal arithmetic mean, but
    different spread

[Figure: density curves f(x) and f(y) over a common axis (x, y), both centred on the mean μ]
  • Variable x is more densely distributed around the
    mean μ than variable y
  • Variance: $\sigma_x^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$

The variance is the arithmetic mean of the
squared deviations from the mean
13
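3.2 Measures of Dispersion: Standard Deviation
  • Standard deviation of variable x: the square root
    of the variance,
    $\sigma_x = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$

[Figure: density curve f(x) over x, with the mean μ and the standard deviation σx marked]
  • Interpretation: Average or typical deviation of
    variable x from the arithmetic mean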
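A minimal sketch of variance and standard deviation, following the verbal definition on the slide (mean of the squared deviations, i.e. divisor n). Many statistical packages report the sample version with divisor n − 1 instead, so their output can differ slightly. The two series are invented so that they share a mean of 5 but differ in spread.

```python
from math import sqrt

def variance(values):
    """Arithmetic mean of the squared deviations from the mean (divisor n)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def std_dev(values):
    """Square root of the variance."""
    return sqrt(variance(values))

# Hypothetical series with the same mean but different spread.
x = [4, 5, 5, 6]
y = [1, 3, 7, 9]

print(variance(x), std_dev(x))
print(variance(y), std_dev(y))
```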

14
3.2 Other Measures of Dispersion
  • Range: Difference between minimum and maximum
  • Inter-quartile range: Range of the central half
    of observations, i.e. the distance between the
    first and third quartile
  • Coefficient of variation: Measure of relative
    rather than absolute variation (standard
    deviation divided by the mean)
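These three measures can be sketched with the Python standard library. The data are invented, the function names are illustrative, and the quartiles follow the default interpolation rule of `statistics.quantiles`, which may differ slightly from the rule used by other software.

```python
from statistics import mean, pstdev, quantiles

def value_range(values):
    """Difference between the maximum and the minimum."""
    return max(values) - min(values)

def interquartile_range(values):
    """Distance between the first and the third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def coefficient_of_variation(values):
    """Standard deviation (divisor n) relative to the mean."""
    return pstdev(values) / mean(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical values
print(value_range(data))
print(interquartile_range(data))
print(coefficient_of_variation(data))
```

Because the coefficient of variation is a ratio, it has no units and can be compared across variables measured on different scales.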

15
3.3 Shape of the Distribution: Skewness
  • Values need not be symmetrically distributed
    around the central point: distributions can be
    skewed
  • Mean and standard deviation are insufficient to
    describe the distribution

[Figure: frequency curve over x, skewed to the right (positively skewed), with the mode, median and mean marked]
16
3.3 Kurtosis
  • Two variables with equal mean and standard
    deviation, and symmetrically distributed, but a
    different kurtosis

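[Figure: symmetric density curves f(x) and f(y) over a common axis (x, y), centred on the mean μ, with the standard deviations σx and σy marked]
→ Here, variable y has a larger kurtosis than
variable x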
17
3.3 Skewness and Kurtosis
  • Measures of skewness (α3) and kurtosis (α4) of a
    distribution
  • Skewness and kurtosis of a normally distributed
    variable are zero and three, respectively
  • Skewness
  • α3 > 0: distribution skewed to the right/
    positively skewed
  • α3 < 0: distribution skewed to the left/
    negatively skewed
  • Kurtosis
  • α4 > 3: thicker tails and a higher peak than a
    normal distribution
  • α4 < 3: thinner tails and a lower peak compared
    to a normal distribution
  • For a meaningful and comparable measure of α4,
    the distribution should be symmetrical
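The slide's formulas did not survive in this transcript. The sketch below assumes the usual standardised-moment definitions with divisor n, $\alpha_3 = \frac{1}{n}\sum_i (x_i - \mu)^3 / \sigma^3$ and $\alpha_4 = \frac{1}{n}\sum_i (x_i - \mu)^4 / \sigma^4$, which are consistent with the normal-distribution values of 0 and 3 quoted above. Function names and data are illustrative.

```python
def central_moment(values, k):
    """Mean of the k-th powers of the deviations from the mean (divisor n)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** k for x in values) / n

def skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Fourth central moment divided by the squared variance (normal: 3)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

# Hypothetical samples, invented for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```

The symmetric sample has zero skewness. Its kurtosis is below 3 because the values are spread evenly, with no pronounced peak or tails.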

18
3.3 Consequences of a Skewed Distribution
  • Socio-economic data in particular (wages, income,
    wealth and related variables) are frequently
    skewed
  • Skewed variables can lead to undesirable effects
    in regressions
  • → Non-normally distributed residuals
    (misspecification)
  • → Heteroscedasticity: test statistics and
    confidence intervals are biased
  • (Roughly) normally distributed variables help to
    avoid these problems. Take a look at the variable:
  • If the variable is not significantly skewed,
    continue
  • If the variable is skewed, transform the
    variable (Ladder of Powers). For this reason you
    often find the logarithm of income, the square
    root of the mortality rate, etc.
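To illustrate why the logarithm is a popular transformation, the sketch below compares the skewness of an invented, strongly right-skewed series with the skewness of its logarithms. The series doubles at each step, so its logarithms are evenly spaced and symmetric. The skewness function assumes the standardised third moment with divisor n.

```python
from math import log

def skewness(values):
    """Standardised third central moment (divisor n)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical incomes that double at each step (illustration only).
incomes = [1, 2, 4, 8, 16, 32, 64]
log_incomes = [log(v) for v in incomes]

print(skewness(incomes))        # clearly positive: skewed to the right
print(skewness(log_incomes))    # approximately zero after the transformation
```

Real data will rarely become exactly symmetric, but the same comparison shows whether a step along the ladder of powers has reduced the skew.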

19
3.4 Normal Distribution
  • The normal distribution is a symmetrical, smooth,
    bell-shaped distribution that is fully described
    by the arithmetic mean and standard deviation
  • Mode, median and mean are equal
  • Measures of skewness and kurtosis of the normal
    distribution are equal to 0 and 3, respectively
  • Key role in inductive statistics
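These properties can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library and approximates the standardised third and fourth moments with a simple midpoint sum over the density. The integration range and the number of steps are arbitrary choices made for this illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)

def standardised_moment(k, lo=-10.0, hi=10.0, steps=20_000):
    """Approximate E[((X - mean) / sd) ** k] with a midpoint Riemann sum."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width          # midpoint of the i-th slice
        z = (x - dist.mean) / dist.stdev    # standardised value
        total += z ** k * dist.pdf(x) * width
    return total

skew = standardised_moment(3)
kurt = standardised_moment(4)

print(dist.mean, dist.median, dist.mode)    # all three coincide
print(skew, kurt)                           # close to 0 and 3
```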

20
3.4 Exercise: The Four Moments of a Distribution
(Mean, Variance, Skewness, and Kurtosis)
  • Data set: 1699_RELIEF.dta
  • Calculate the standard deviation, inter-quartile
    range, skewness, and kurtosis of the WEALTH
    variable
  • Statistics/Summaries, tables, tests/
    Tables/Table of summary statistics (tabstat)
  • Is the distribution positively skewed? Does the
    distribution have thicker tails than a normal
    distribution?
  • Plot a histogram of the variable WEALTH and add a
    normal curve
  • Graphics/ Histogram
  • Density Plots/Add normal density plot
  • Does the visual test agree with the measures of
    skewness and kurtosis?
  • Test whether the WEALTH variable can reasonably
    be expected to be normally distributed
  • Use the command sktest

21
3. STATA commands
22
3. Homework Exercises
  • Read chapters 1 & 2 of FT
  • Replicate Figure 2.2, upper & lower panel, in FT
    (p. 38) using STATA. Hints: see next page
  • Calculate the weighted arithmetic mean of the
    migration rate (CNTYMIG) in 1911 (Table 2.4, p.
    45)
  • Do the following exercises from FT (pp. 66-70):
    2, 7, 8
  • Read The Economist, Sep 30th 2006, "Soho Surprise
    - What happened when drinking hours were
    liberalized"
  • Give a short outline of how data can be used to
    analyse the effect of the relaxation of licensing
    hours
  • What are the difficulties?
  • Is the graph informative? Point to flaws in the
    graph!
  • Try the UCLA online course at
    <http://www.ats.ucla.edu/stat/stata/notes3/default.htm>
  • !!!! Don't send me your log file !!!! Use a word
    processor !!!! Include all answers in one file
    !!!! Generate concise tables !!!! No more than 4
    pages !!!!
  • DEADLINE: 14 Oct, midnight

23
Hints: Replicating the histograms in FT (p. 38)
  • Use the relief data set - the 1699_RELIEF.dta
    file
  • Open it in STATA
  • Choose from the menu bar Graphics/Histogram. A
    window opens where you enter the instructions
  • Enter under Main/ Variable: relief. FT used
    intervals, each 6 shillings wide
  • Tick "Width of bins" and enter 6
  • Tick "Lower limit of first bin" and enter 0
  • Click on Submit. The shape of the histogram
    should be exactly the same as in Figure 2.2,
    upper panel. The rest is labelling the axes
    appropriately
  • On the right hand side of the Main tab you see
    Y-axis: Tick Frequency (it gives you the absolute
    frequency as units on the y-axis)
  • Click on the tab "Y-Axis" and enter as Title
    "Number of parishes" (with or without
    quotation marks)
  • Click on the tab "X-Axis" and enter as Title
    "Relief payments (shillings)"
  • Tab "X-Axis": Under Ticks/ Lines, enter in the
    Custom field: 3 "0-6" 9 "6-12" 15 "12-18" 21
    "18-24" 27 "24-30" 33 "30-36" 39 "36-42"
    45 "42-48". The first number is where you want to
    have a tick on the x-axis, i.e. at 3 shillings
    (the centre between 0 and 6) you want to have a
    tick and below the tick a label "0-6", at 9
    shillings you want to have another tick and below
    the tick a label "6-12"... It is a very special
    labelling of the bins, therefore "Custom". Try to
    label the centres of the bins/bars. Try the
    following:
  • Delete what you entered under Custom. Enter in
    the "Rule" field 3 (6) 45, i.e. you want to have
    ticks starting from 3 to 45 in steps of 6. If you
    don't like these labels, enter in the "Rule"
    field 0 (6) 48 instead
  • For the lower panel, you need to change the width
    of the bins, the lower limit and adjust the
    labels, i.e. you have to change the entries of
    steps 5, 6 and 11 (bin width, lower limit and
    tick labels)
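The bin layout described in these hints (width 6, first bin starting at 0, ticks at the bin centres) can be reproduced outside STATA as a quick check. The sketch below is plain Python with illustrative function names and invented payment values. It assumes each bin includes its lower limit and excludes its upper limit, which may not match the boundary rule of a given package.

```python
def bin_layout(width, start, stop):
    """Return the bin centres and the 'lower-upper' labels for equal-width bins."""
    edges = list(range(start, stop + 1, width))
    bins = list(zip(edges[:-1], edges[1:]))
    centres = [lo + width / 2 for lo, _ in bins]
    labels = [f"{lo}-{hi}" for lo, hi in bins]
    return centres, labels

def bin_counts(values, width, start, stop):
    """Absolute frequency per bin; each bin is [lower, upper) by assumption."""
    counts = [0] * ((stop - start) // width)
    for v in values:
        if start <= v < stop:
            counts[int((v - start) // width)] += 1
    return counts

centres, labels = bin_layout(width=6, start=0, stop=48)
print(list(zip(centres, labels)))

# Hypothetical relief payments in shillings (not taken from the data set).
payments = [2, 7, 7.5, 13, 20, 47]
counts = bin_counts(payments, width=6, start=0, stop=48)
print(counts)
```

The centres run from 3 to 45 in steps of 6, matching the entry 3 (6) 45 in the "Rule" field.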

24
Appendix: County codes in the Relief data set
(1699_RELIEF.dta)