Title: Statistics
1Statistics and Parameters
2Measures of
Central tendency - mean, median, mode
Dispersion - range, mean deviation, variance, standard deviation, coefficient of variation
3Central tendency - Mean
The real population (all the little ball
creatures that exist)
4Your samples
The real population (all the little ball
creatures that exist)
5Central Tendency 1) Arithmetic mean
The real population (all the little ball
creatures that exist)
At Population level
Measuring the diameters of all the little ball
creatures that exist
$\mu = \frac{\sum X_i}{N}$
$\mu$ - population mean
$X_i$ - every measurement in the population
N - population size
6Your samples
For each of the five samples: $\bar{X} = \frac{\sum X_i}{n}$
7Sample mean
$\bar{X} = \frac{\sum X_i}{n}$
$\bar{X}$ - sample mean
$\sum X_i$ - sum of all measurements in the sample
n - sample size
8If you have sampled in an unbiased fashion
Each sample mean $\bar{X} = \frac{\sum X_i}{n}$ roughly equals $\mu$, the population mean
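A minimal sketch of this idea in Python. The population of diameters, the seed and the sample size are made up for illustration and are not data from the lecture. It draws five unbiased random samples and compares each sample mean with the population mean.

```python
import random

def mean(values):
    """Arithmetic mean: sum of the measurements divided by their count."""
    return sum(values) / len(values)

# Hypothetical population: diameters 1..1000 in arbitrary units (made up).
population = list(range(1, 1001))
mu = mean(population)  # population mean

rng = random.Random(0)  # fixed seed so the run is repeatable
# Five simple random samples, drawn without replacement, n = 100 each.
samples = [rng.sample(population, 100) for _ in range(5)]
sample_means = [mean(s) for s in samples]

print("population mean:", mu)
for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {xbar:.1f}")
```

Each printed sample mean should land near the population mean without matching it exactly; larger samples would usually land closer.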
9Central tendency - Median
Median - middle value of a population or sample
e.g. Lengths of Mayfly (Ephemeroptera) nymphs: with the 9 values ranked from 1 to 9, the median is the 5th value (middle of 9)
10Odd number of values
Median = middle value
Even number of values
Median = (sum of the two middle values) / 2
11Or - to put it more formally
Odd number of values (i.e. n is odd)
$\text{Median} = X_{(n+1)/2}$
Even number of values (i.e. n is even)
$\text{Median} = \frac{X_{n/2} + X_{(n/2)+1}}{2}$
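The two rank formulas can be sketched in Python as follows. The function name and the example values are illustrative assumptions. The ranks in the formulas are 1-based, so the code subtracts 1 to get Python's 0-based indices.

```python
def median(values):
    """Median by rank: X[(n+1)/2] if n is odd, otherwise the mean of
    X[n/2] and X[n/2 + 1], where ranks are 1-based on the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # rank (n+1)/2 as a 0-based index
    # ranks n/2 and n/2 + 1 as 0-based indices
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([7, 3, 9, 1, 5]))  # odd number of values (made-up data)
print(median([4, 1, 3, 2]))     # even number of values (made-up data)
```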
12Central tendency - Mode
Mode - the most frequently occurring measurement
(Figure: frequency distribution with the mode marked; y-axis: Frequency = number of times each measurement appears in the population; x-axis: Values = measurements taken)
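One way to find the mode is to count how often each measurement appears and pick the most frequent. The sketch below uses made-up lengths, and the helper name `modes` is an assumption. It returns a list because a data set can have more than one mode.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty data set is undefined")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

lengths = [4, 5, 5, 6, 6, 6, 7, 7, 8]  # hypothetical measurements
print(Counter(lengths))  # frequency of each value
print(modes(lengths))    # most frequent value(s)
```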
13Measures of Dispersion
Why worry about this? Because not all populations are created equal
The distributions of values in the populations are clearly different, BUT the means and medians are the same
Mean = median
14Measures of Dispersion - 1. Range
Range - difference between the highest and lowest values
Remember little ball creatures and the five samples
Range = highest value - lowest value
15Range - crude measure of dispersion
Note - three samples do not include the highest value
and - two samples do not include the lowest value
So a sample range can only equal or underestimate the population range
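A small sketch of why the range is a crude measure. The population and the five samples below are invented for illustration. They are arranged so that three samples miss the highest value and two miss the lowest, as in the slide.

```python
def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

# Hypothetical population and five hand-picked samples (made-up numbers).
population = [2, 3, 5, 6, 8, 9, 11, 14]
samples = [
    [2, 5, 14],   # contains both extremes
    [3, 8, 14],   # misses the lowest value
    [2, 6, 11],   # misses the highest value
    [2, 3, 9],    # misses the highest value
    [5, 8, 11],   # misses both extremes
]

population_range = value_range(population)
sample_ranges = [value_range(s) for s in samples]
print("population range:", population_range)
print("sample ranges:", sample_ranges)
```

No sample range exceeds the population range, and most fall short of it.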
16Measures of Dispersion - 2. Mean Deviation
$\bar{X}$ is a measure of central tendency
Take difference between each measure and the mean: $X_i - \bar{X}$
BUT
$\sum (X_i - \bar{X}) = 0$
So this is not useful as it stands
17Measures of Dispersion - 2. Mean Deviation (contd)
But if you take the absolute value - get a measure of dispersion
$\sum |X_i - \bar{X}|$
and
$\text{mean deviation} = \frac{\sum |X_i - \bar{X}|}{n}$
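A short Python sketch of both points, using made-up data: the signed deviations from the mean cancel out, while their absolute values give a usable mean deviation.

```python
def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)

def mean_deviation(values):
    """Mean of the absolute deviations from the mean."""
    xbar = mean(values)
    return sum(abs(x - xbar) for x in values) / len(values)

data = [1, 2, 3, 4, 5]  # hypothetical measurements
xbar = mean(data)

signed_sum = sum(x - xbar for x in data)  # positive and negative deviations cancel
print("sum of signed deviations:", signed_sum)
print("mean deviation:", mean_deviation(data))
```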
18Measures of Dispersion - 3. Variance
Eliminate the sign from the deviation from the mean: square the difference
$(X_i - \bar{X})^2$
And if you add up the squared differences - get the sum of squares
(hint: you'll be seeing this a lot!)
$\sum (X_i - \bar{X})^2$
19Measures of Dispersion - 3. Variance (contd)
Sum of squares can be considered at both the
population and sample level
Sample: $ss = \sum (X_i - \bar{X})^2$
Population: $SS = \sum (X_i - \mu)^2$
20Measures of Dispersion - 3. Variance (contd)
If you divide the sum of squares by the population size N (or, for a sample, by n - 1) - get the mean squared deviation or VARIANCE
Sample variance: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$
Population variance: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
21Measures of Dispersion - 3. Variance (contd)
Note something about the sample variance
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$
The denominator, n - 1, is the degrees of freedom (df or $\nu$)
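The difference between the two denominators can be sketched in Python. The data values and function names are illustrative assumptions. The results are cross-checked against `pvariance` and `variance` from Python's standard `statistics` module.

```python
import statistics

def sum_of_squares(values, centre):
    """Sum of squared deviations from a given centre (mean)."""
    return sum((x - centre) ** 2 for x in values)

def population_variance(values):
    """Sum of squares divided by N (every member was measured)."""
    mu = sum(values) / len(values)
    return sum_of_squares(values, mu) / len(values)

def sample_variance(values):
    """Sum of squares divided by the degrees of freedom, n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    xbar = sum(values) / n
    return sum_of_squares(values, xbar) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print("treated as a population:", population_variance(data))
print("treated as a sample:    ", sample_variance(data))
print("statistics module:      ", statistics.pvariance(data), statistics.variance(data))
```

Dividing by n - 1 always gives a slightly larger value than dividing by n, and the gap shrinks as n grows.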
22Measures of Dispersion - 4. Standard Deviation
- just the square root of the variance
Sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$
Population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
23Standard Deviation - very useful
In a roughly normal population, most data (about 68%) are within one standard deviation of the mean
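A hedged sketch of checking this on a data set. It uses `stdev` (n - 1 denominator) and `pstdev` (N denominator) from Python's standard `statistics` module. The data are made up and too few to be normally distributed, so the fraction found here is only illustrative; the figure of about 68% applies to normal distributions.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

xbar = statistics.mean(data)
s = statistics.stdev(data)       # sample standard deviation (n - 1)
sigma = statistics.pstdev(data)  # population standard deviation (N)

# Values lying within one sample standard deviation of the mean.
within = [x for x in data if abs(x - xbar) <= s]
fraction = len(within) / len(data)

print("mean:", xbar)
print("sample SD:", round(s, 3), "population SD:", sigma)
print("fraction within one SD of the mean:", fraction)
```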
24Measures of Dispersion - 5. Coefficient of Variation
Elephant ears: mean length - 2.4 m, variance - 1.6 m², S.D. - 1.26 m
Mouse ears: mean length - 2.4 cm, variance - 1.6 cm², S.D. - 1.26 cm
Are elephant ears 100x more variable than mouse ears?
Variance and standard deviation have magnitudes dependent on the magnitude of the data
25Measures of Dispersion - 5. Coefficient of Variation (contd)
Coefficient of variation
$V = \frac{s}{\bar{X}} \times 100$
For ratio data only
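Applying the formula to the ear-length figures from the previous slide answers the question. The function name is an assumption. Because V is a ratio, the units cancel, so re-expressing the elephant figures in centimetres gives the same result.

```python
def coefficient_of_variation(sd, mean):
    """V = (s / mean) * 100, expressed as a percentage."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return sd / mean * 100

elephant_v = coefficient_of_variation(1.26, 2.4)        # S.D. and mean in metres
mouse_v = coefficient_of_variation(1.26, 2.4)           # S.D. and mean in centimetres
elephant_v_cm = coefficient_of_variation(126.0, 240.0)  # elephant figures converted to cm

print(f"elephant ears: V = {elephant_v:.1f}%")
print(f"mouse ears:    V = {mouse_v:.1f}%")
print(f"elephant ears (in cm): V = {elephant_v_cm:.1f}%")
```

Relative to their means, the two sets of ears are equally variable.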
26Exploratory Data Analysis - a first step in analysing any set of data
Stem and Leaf Plot
You have collected data on the number of barnacles in two different areas of the intertidal zone in the Bay of Fundy. The following are the numbers per m²
27- Stem and Leaf Plot
1. For each data point, consider the last digit to be the leaf and all others to be the stem
- i.e. for 12 - the 1 is the stem and the 2 is the leaf
- i.e. for 345 - the 34 is the stem and the 5 is the leaf
28- Stem and Leaf Plot (contd)
2. List all the data in increasing order and group all the leaves for a particular stem.
e.g. for the high intertidal, it would be 10, 11, 12, 13, 13, 15, 15, 16, 17, 18, 18, 22, 23, 24, 24, 25, 27, 27, 33, 34
29- Stem and Leaf Plot (contd)
And the stem and leaf plot for the high intertidal would be
stem | leaves
1 | 0 1 2 3 3 5 5 6 7 8 8
2 | 2 3 4 4 5 7 7
3 | 3 4
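The two steps can be sketched in Python using the high intertidal counts listed above. The function name and the `stem | leaves` output layout are assumptions. This simple version handles non-negative whole numbers only and skips stems that have no leaves.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return one 'stem | leaves' line per stem for non-negative integers."""
    groups = defaultdict(list)
    for v in sorted(values):        # step 2: increasing order
        stem, leaf = divmod(v, 10)  # step 1: last digit is the leaf
        groups[stem].append(leaf)
    lines = []
    for stem in sorted(groups):
        leaves = " ".join(str(leaf) for leaf in groups[stem])
        lines.append(f"{stem} | {leaves}")
    return lines

high_intertidal = [10, 11, 12, 13, 13, 15, 15, 16, 17, 18, 18,
                   22, 23, 24, 24, 25, 27, 27, 33, 34]

for line in stem_and_leaf(high_intertidal):
    print(line)
```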
30- Stem and Leaf Plot (contd)
And doing the same thing for the Low intertidal
stem | leaves
1 | 4 8
2 | 3 3 6 7
3 | 3 4 5 7 9
4 | 4 4 5 6 7
5 | 5 6
6 | 2
31- Stem and Leaf Plot (contd)
And compared to the High intertidal
Low intertidal
1 | 4 8
2 | 3 3 6 7
3 | 3 4 5 7 9
4 | 4 4 5 6 7
5 | 5 6
6 | 2
High intertidal
1 | 0 1 2 3 3 5 5 6 7 8 8
2 | 2 3 4 4 5 7 7
3 | 3 4
32Box or Box and Whiskers Plots
The following are data on weights (in mg) of two populations of a nudibranch mollusc (Dendronotus frondosus) in the St. Andrews area. n = 21 for each
Haddock Ledge    Woodstock Point
12               2
27               5
29               8
29               9
31               10
41               11
44               12
46               13
50               14
54               15
55               16
64               17
71               19
78               21
90               25
91               26
95               27
100              28
106              56
121              74
143              141
33Box or Box and Whiskers Plots
The first step is to arrange the data in
increasing order and then find the median value
of the entire data set.
Median (the 11th of the 21 ordered values): Haddock Ledge = 55, Woodstock Point = 16
34Box or Box and Whiskers Plots
Note that the median divides the data set into equal halves. The next step is to find the median of the lower values (the first quartile (Q1)) and the upper values (the third quartile (Q3)).
Haddock Ledge: Q1 = 36, Median = 55, Q3 = 93
Woodstock Point: Q1 = 10.5, Median = 16, Q3 = 26.5
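A sketch of the procedure described above: find the median, then the median of the values below it (Q1) and above it (Q3). The function names are assumptions, and the data are the nudibranch weights as first listed. When n is odd, this version leaves the median itself out of both halves, which reproduces the quartiles shown on the slide. Other quartile conventions exist and can give slightly different values.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so both halves have equal size.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

haddock_ledge = [12, 27, 29, 29, 31, 41, 44, 46, 50, 54, 55,
                 64, 71, 78, 90, 91, 95, 100, 106, 121, 143]
woodstock_point = [2, 5, 8, 9, 10, 11, 12, 13, 14, 15, 16,
                   17, 19, 21, 25, 26, 27, 28, 56, 74, 141]

print("Haddock Ledge:  ", quartiles(haddock_ledge))
print("Woodstock Point:", quartiles(woodstock_point))
```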
35Box or Box and Whiskers Plots
Now take these numbers and plot them (along with
the maximum and minimum values) to get the box
and whiskers figure.
(Figure: empty weight axis running from 0 to 150 mg)
37Box or Box and Whiskers Plots
(Figure: box and whiskers plots for Woodstock Point and Haddock Ledge, each marked with Q1, median and Q3, on a weight axis running from 0 to 150 mg)
Comparing the plots
1. The medians are very different
2. The quartiles are much larger in the Haddock Ledge population
3. The ranges are very similar
4. The median at Woodstock Point is closer to Q1 than at Haddock Ledge. This indicates that there are more lower values and that the distribution is skewed.
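The comparison can be put in numbers with a short sketch built from the summary values already given (minimum, Q1, median, Q3, maximum). The dictionary layout and the interquartile range (Q3 - Q1) are additions made here for illustration; they are not part of the original slides.

```python
# Five-number summaries taken from the data and quartiles above (weights in mg).
summaries = {
    "Haddock Ledge":   {"min": 12, "q1": 36,   "median": 55, "q3": 93,   "max": 143},
    "Woodstock Point": {"min": 2,  "q1": 10.5, "median": 16, "q3": 26.5, "max": 141},
}

def spread(summary):
    """Range, interquartile range, and the median's distance to each quartile."""
    return {
        "range": summary["max"] - summary["min"],
        "iqr": summary["q3"] - summary["q1"],
        "median_minus_q1": summary["median"] - summary["q1"],
        "q3_minus_median": summary["q3"] - summary["median"],
    }

results = {site: spread(s) for site, s in summaries.items()}
for site, values in results.items():
    print(site, values)
```

The ranges come out close to each other, while the box (Q1 to Q3) is several times wider at Haddock Ledge.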