Statistics - PowerPoint PPT Presentation

About This Presentation
Title:

Statistics

Slides: 38
Provided by: tri5203

Transcript and Presenter's Notes


1
Statistics and Parameters
2
Measures of
Central tendency - mean, median, mode
Dispersion - range, mean deviation, variance, standard deviation, coefficient of variation
3
Central tendency - Mean
The real population (all the little ball
creatures that exist)
4
Your samples
The real population (all the little ball
creatures that exist)
5
Central Tendency 1) Arithmetic mean
The real population (all the little ball
creatures that exist)
At Population level
Measuring the diameters of all the little ball
creatures that exist
$\mu = \frac{\sum X_i}{N}$
$\mu$ - population mean
$X_i$ - every measurement in the population
$N$ - population size
6
Your samples
For each of the five samples: $\bar{X} = \frac{\sum X_i}{n}$
7
$\bar{X} = \frac{\sum X_i}{n}$
$\bar{X}$ - sample mean
$\sum X_i$ - sum of all measurements in the sample
$n$ - sample size
8
If you have sampled in an unbiased fashion, each of the five sample means
$\bar{X} = \frac{\sum X_i}{n}$ roughly equals $\mu$
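A minimal sketch of why unbiased sample means cluster around $\mu$. The population below is hypothetical (it is not data from the slides); the code enumerates every possible sample of size 3 and compares the sample means with the population mean.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of ball-creature diameters (not from the slides).
population = [2, 4, 6, 8, 10]
mu = mean(population)  # population mean: sum of all X_i divided by N

# Every possible sample of size n = 3, and the mean of each one.
sample_means = [mean(sample) for sample in combinations(population, 3)]

# Individual sample means scatter around mu; averaged over all samples they recover mu.
print("population mean:", mu)
print("smallest and largest sample mean:", min(sample_means), max(sample_means))
print("average of all sample means:", mean(sample_means))
```

Any single sample mean can miss $\mu$ (here they run from 4 to 8), but with unbiased sampling the misses have no systematic direction.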
9
Central tendency - Median
Median - middle value of a population or sample
e.g. Lengths of Mayfly (Ephemeroptera) nymphs
[Figure: nine nymph lengths ranked 1 to 9; the median is the 5th value (middle of 9)]
10
Odd number of values: Median = middle value
Even number of values: Median = (sum of the two middle values) / 2
11
Or - to put it more formally
Odd number of values (i.e. n is odd)
Median = $X_{(n+1)/2}$
Even number of values (i.e. n is even)
Median = $\frac{X_{n/2} + X_{(n/2)+1}}{2}$
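The two cases can be sketched in a few lines of Python. This assumes the subscripts in the formulas refer to the values after sorting in increasing order; the function name and example numbers are illustrative only.

```python
def median(values):
    """Return the median using the odd/even rule from the slide."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th value, i.e. index n // 2 when counting from 0.
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9, 5, 1]))      # 5
print(median([7, 3, 9, 5, 1, 11]))  # 6.0
```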
12
Central tendency - Mode
Mode - the most frequently occurring measurement
[Figure: frequency distribution with the mode marked. Vertical axis: frequency (= number of times each measurement appears in the population); horizontal axis: values (= measurements taken)]
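As a rough illustration, the mode can be found by counting how often each measurement occurs. The helper name and the example values are made up for this sketch; it returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)          # value -> number of times it appears
    highest = max(counts.values())    # the largest frequency
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 5, 5, 6, 7, 5, 6]))  # [5]
print(modes([1, 1, 2, 2, 3]))        # [1, 2]
```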
13
Measures of Dispersion
Why worry about this? Because not all
populations are created equal
The distributions of values in the populations are
clearly different, BUT the means and medians are the
same
Mean = median
14
Measures of Dispersion - 1. Range
Range - the difference between the highest and lowest values
Remember the little ball creatures and the five
samples
Range = highest value - lowest value
15
Range - a crude measure of dispersion
Note - three of the samples do not include the highest
value, and two do not include the lowest, so their
ranges underestimate the population range
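A small sketch of why the range is crude, using a hypothetical population (not the slide's data): a sample only reproduces the population range if it happens to contain both the highest and the lowest value.

```python
from itertools import combinations


def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


# Hypothetical population (not from the slides).
population = [2, 4, 6, 8, 10]
pop_range = value_range(population)

# Range of every possible sample of size 3.
sample_ranges = [value_range(s) for s in combinations(population, 3)]

# A sample range can never exceed the population range, and usually falls short of it.
matching = sum(1 for r in sample_ranges if r == pop_range)
print("population range:", pop_range)
print("samples matching it:", matching, "of", len(sample_ranges))
```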
16
Measures of Dispersion - 2. Mean Deviation
$\bar{X}$ is a measure of central tendency
Take the difference between each measurement and the
mean: $X_i - \bar{X}$
BUT
$\sum (X_i - \bar{X}) = 0$
So this is not useful as it stands
17
Measures of Dispersion - 2. Mean Deviation
(contd)
But if you take the absolute value, you get a
measure of dispersion
$\sum |X_i - \bar{X}|$
and
mean deviation = $\frac{\sum |X_i - \bar{X}|}{n}$
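Both points can be checked numerically. The sketch below uses made-up data: the signed deviations from the mean cancel to zero, while their absolute values give a usable measure of spread.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the sample mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


# Hypothetical sample (not from the slides).
data = [2, 4, 6, 8, 10]
x_bar = sum(data) / len(data)

# Signed deviations: negative and positive differences cancel out.
signed_sum = sum(x - x_bar for x in data)

print(signed_sum)            # 0.0
print(mean_deviation(data))  # 2.4
```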
18
Measures of Dispersion - 3. Variance
Eliminate the sign of each deviation from the
mean: square the difference
$(X_i - \bar{X})^2$
And if you add up the squared differences, you get
the sum of squares
(hint: you'll be seeing this a lot!)
$\sum (X_i - \bar{X})^2$
19
Measures of Dispersion - 3. Variance (contd)
Sum of squares can be considered at both the
population and sample level
Sample: $ss = \sum (X_i - \bar{X})^2$
Population: $SS = \sum (X_i - \mu)^2$
20
Measures of Dispersion - 3. Variance (contd)
If you divide the sum of squares by the population size (or, for a
sample, by n - 1), you get the mean squared deviation or VARIANCE
Sample variance: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
Population variance: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
21
Measures of Dispersion - 3. Variance (contd)
Note something about the sample variance
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
The denominator n - 1 is the degrees of freedom, or df, or $\nu$
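A short sketch of the sum of squares and the two variance formulas, using made-up data. The hand-computed results are compared with Python's standard `statistics` module, whose `variance` divides by n - 1 and whose `pvariance` divides by N.

```python
import math
import statistics

# Hypothetical measurements (not from the slides).
data = [2, 4, 6, 8, 10]
n = len(data)
x_bar = sum(data) / n

# Sum of squares: squared deviations from the mean, added up.
ss = sum((x - x_bar) ** 2 for x in data)

sample_variance = ss / (n - 1)   # treat data as a sample: divide by degrees of freedom
population_variance = ss / n     # treat data as the whole population: divide by N

# The standard deviation is just the square root of the variance.
sample_sd = math.sqrt(sample_variance)

print(ss, sample_variance, population_variance)
print(statistics.variance(data), statistics.pvariance(data))
```

Dividing by n - 1 rather than n makes the sample variance slightly larger, which compensates for measuring deviations from the sample mean instead of the unknown $\mu$.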
22
Measures of Dispersion - 4. Standard Deviation
- just the square root of the variance
Sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
Population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
23
Standard Deviation - very useful. In an approximately
normal population, about 68% of the data lie within one
standard deviation of the mean
24
Measures of Dispersion - 5. Coefficient of
Variation
Elephant ears: mean length - 2.4 m, variance - 1.6 m², S.D. - 1.26 m
Mouse ears: mean length - 2.4 cm, variance - 1.6 cm², S.D. - 1.26 cm
Are elephant ears 100x more variable than mouse
ears?
Variance and standard deviation have magnitudes
dependent on the magnitude of the data
25
Measures of Dispersion - 5. Coefficient of
Variation (contd)
Coefficient of variation
$V = \frac{s}{\bar{X}} \times 100$
Valid only for ratio-scale data
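To see how the coefficient of variation removes the dependence on magnitude, the sketch below takes the ear-length summary figures from the previous slide and expresses both in centimetres. The function name is illustrative; the unit conversion (1 m = 100 cm, so 1 m² = 10,000 cm²) is the only thing added.

```python
import math


def coefficient_of_variation(mean, sd):
    """V = (s / X-bar) * 100, expressed as a percentage."""
    return sd / mean * 100


# Elephant ears, converted from metres to centimetres.
elephant_mean_cm = 2.4 * 100
elephant_sd_cm = math.sqrt(1.6 * 100 ** 2)   # variance 1.6 m^2 = 16,000 cm^2

# Mouse ears, already in centimetres.
mouse_mean_cm = 2.4
mouse_sd_cm = math.sqrt(1.6)

elephant_cv = coefficient_of_variation(elephant_mean_cm, elephant_sd_cm)
mouse_cv = coefficient_of_variation(mouse_mean_cm, mouse_sd_cm)

# The standard deviations differ 100-fold, yet the relative variability is the same.
print(round(elephant_sd_cm, 2), round(mouse_sd_cm, 2))
print(round(elephant_cv, 1), round(mouse_cv, 1))
```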
26
Exploratory Data Analysis - a first step in
analysing any set of data
Stem and Leaf Plot
You have collected data on the number of
barnacles in two different areas of the
intertidal zone in the Bay of Fundy. The
following are the numbers per m²
27
  • Stem and Leaf Plot
  • 1. For each data point, consider the last digit to
    be the leaf and all others to be the stem
  • e.g. for 12 - the 1 is the stem and the 2
    is the leaf
  • e.g. for 345 - the 34 is the stem and the 5
    is the leaf

28
  • Stem and Leaf Plot
  • 2. List all the data in increasing order and
    group all the leaves for a particular stem.

e.g. for the high intertidal, it would be 10,
11, 12, 13, 13, 15, 15, 16, 17, 18, 18, 22, 23,
24, 24, 25, 27, 27, 33, 34
29
  • Stem and Leaf Plot
And the stem and leaf plot would be
stems | leaves
1 | 0 1 2 3 3 5 5 6 7 8 8
2 | 2 3 4 4 5 7 7
3 | 3 4
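The two steps (split each value into stem and leaf, then group the sorted leaves by stem) can be sketched as follows. The function name is illustrative, and the sketch assumes non-negative whole numbers; the data are the high intertidal counts listed on the slide.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return one text row per stem: last digit is the leaf, the rest is the stem."""
    groups = defaultdict(list)
    for value in sorted(values):          # step 2: increasing order
        stem, leaf = divmod(value, 10)    # step 1: split off the last digit
        groups[stem].append(leaf)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(groups.items())
    ]


high_intertidal = [10, 11, 12, 13, 13, 15, 15, 16, 17, 18, 18,
                   22, 23, 24, 24, 25, 27, 27, 33, 34]

for row in stem_and_leaf(high_intertidal):
    print(row)
```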
30
  • Stem and Leaf Plot
  • And doing the same thing for the Low intertidal

1 | 4 8
2 | 3 3 6 7
3 | 3 4 5 7 9
4 | 4 4 5 6 7
5 | 5 6
6 | 2
31
  • Stem and Leaf Plot
  • And doing the same thing for the Low intertidal

And compared to the High intertidal
Low intertidal
1 | 4 8
2 | 3 3 6 7
3 | 3 4 5 7 9
4 | 4 4 5 6 7
5 | 5 6
6 | 2
High intertidal
1 | 0 1 2 3 3 5 5 6 7 8 8
2 | 2 3 4 4 5 7 7
3 | 3 4
32
Box or Box and Whiskers Plots
The following are data on weights (in mg) of two
populations of a nudibranch mollusc (Dendronotus
frondosus) in the St. Andrews area. n = 21 for
each
Haddock Ledge: 12, 27, 29, 29, 31, 41, 44, 46, 50, 54, 55, 64, 71, 78, 90, 91, 95, 100, 106, 121, 143
Woodstock Point: 2, 5, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 25, 26, 27, 28, 56, 74, 141
33
Box or Box and Whiskers Plots
The first step is to arrange the data in
increasing order and then find the median value
of the entire data set.
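Haddock Ledge: 12, 27, 29, 30, 31, 41, 44, 46, 50, 54, [55], 64, 71, 78, 90, 91, 95, 100, 106, 121, 143
Woodstock Point: 2, 5, 8, 9, 10, 11, 12, 13, 14, 15, [16], 17, 19, 21, 25, 26, 27, 28, 56, 74, 141
Median (in brackets, the 11th of 21 values): Haddock Ledge = 55, Woodstock Point = 16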
34
Box or Box and Whiskers Plots
Note that the median divides the data set into
equal halves. The next step is to find the median
of the lower values (the first quartile (Q1)) and
the upper values (the third quartile (Q3)).
Haddock Ledge: 12, 27, 29, 30, 31, 41, 44, 46, 50, 54 | 55 | 64, 71, 78, 90, 91, 95, 100, 106, 121, 143
Woodstock Point: 2, 5, 8, 9, 10, 11, 12, 13, 14, 15 | 16 | 17, 19, 21, 25, 26, 27, 28, 56, 74, 141
Haddock Ledge: Q1 = 36, Median = 55, Q3 = 93
Woodstock Point: Q1 = 10.5, Median = 16, Q3 = 26.5
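The quartile procedure described here (median of the lower half, median of the upper half, with the overall median left out of both halves when n is odd) can be sketched as below. The function names are illustrative, and the sketch assumes at least two values. Library routines such as `statistics.quantiles` use other interpolation conventions and may give slightly different quartiles.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2     # for odd n the middle value belongs to neither half
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(ordered), median(upper)


haddock_ledge = [12, 27, 29, 30, 31, 41, 44, 46, 50, 54, 55,
                 64, 71, 78, 90, 91, 95, 100, 106, 121, 143]
woodstock_point = [2, 5, 8, 9, 10, 11, 12, 13, 14, 15, 16,
                   17, 19, 21, 25, 26, 27, 28, 56, 74, 141]

print(quartiles(haddock_ledge))    # (36.0, 55, 93.0)
print(quartiles(woodstock_point))  # (10.5, 16, 26.5)
```

Together with the minimum and maximum of each data set, these three numbers are all a box and whiskers plot needs.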
35
Box or Box and Whiskers Plots
Now take these numbers and plot them (along with
the maximum and minimum values) to get the box
and whiskers figure.
[Figure: empty plot axis, weight from 0 to 150 mg in steps of 10]
36
Box or Box and Whiskers Plots
Now take these numbers and plot them (along with
the maximum and minimum values) to get the box and
whiskers figure.
[Figure: two box and whiskers plots, each with Median, Q1 and Q3 marked, on an axis from 0 to 150 mg in steps of 10]
37
Box or Box and Whiskers Plots
Now take these numbers and plot them (along with
the maximum and minimum values) to get the box and
whiskers figure.
[Figure: box and whiskers plots for Woodstock Point and Haddock Ledge, each with Median, Q1 and Q3 marked, on an axis from 0 to 150 mg in steps of 10]
Comparing the plots
1. The medians are very different
2. The quartiles are much larger in the Haddock Ledge population
3. The ranges are very similar
4. The median at Woodstock Point is closer to Q1 than at Haddock Ledge. This indicates that there are more lower values and that the distribution is skewed.