Title: Measures of Central Tendency
The Motivation
- Measures of central tendency are used to describe the typical member of a population.
- Depending on the type of data, "typical" could have a variety of best meanings.
- We will discuss four of these possible choices.
Four Measures of Central Tendency
- Mean: the arithmetic average. This is used for continuous data.
- Median: a value that splits the data into two halves; that is, one half of the data is smaller than that number and the other half is larger. It may be used for continuous or ordinal data.
- Mode: the category that has the most data. As the description implies, it is used for categorical data.
- Midrange: not used as often as the other three, it is found by taking the average of the lowest and highest numbers in the data set. It is also primarily used for continuous data.
Measures of Central Tendency
- Central tendency is measured by averages. These describe the point about which the various observed values cluster.
- In mathematics, an average, or central tendency, of a data set refers to a measure of the "middle" or "expected" value of the data set.
Mean
- To find the mean, add all of the values, then divide by the number of values.
- The lowercase Greek letter mu (µ) is used for the population mean: $\mu = \frac{\sum x}{N}$.
- An x with a bar over it (x̄), read "x-bar", is used for the sample mean: $\bar{x} = \frac{\sum x}{n}$.
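A minimal Python sketch of the rule above, assuming the data fit in a list; the function name `arithmetic_mean` is illustrative, and the data reuse the pocket-money values from Example 2. The same arithmetic gives µ when the list is a whole population and x̄ when it is a sample.

```python
def arithmetic_mean(values):
    """Add all of the values, then divide by the number of values."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


# Illustrative data only: the pocket-money values from Example 2.
pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]
print(arithmetic_mean(pocket_money))  # 5.0
```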
Mean Example
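Arithmetic Mean of Grouped Data
- If $x_1, x_2, \ldots, x_k$ are the mid-values and $f_1, f_2, \ldots, f_k$ are the corresponding frequencies, where the subscript k stands for the number of classes, then the mean is
  $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$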
Exercise 1: Find the Arithmetic Mean
Class    Frequency (f)    Midpoint (x)    fx
20-29    3                24.5            73.5
30-39    5                34.5            172.5
40-49    20               44.5            890
50-59    10               54.5            545
60-69    5                64.5            322.5
Sum      N = 43                           2003.5
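The grouped-mean formula can be checked against the Exercise 1 table with a short sketch; `grouped_mean` is a hypothetical helper name, and the mid-values are taken straight from the x column.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class mid-values."""
    n = sum(frequencies)
    total = sum(f * x for f, x in zip(frequencies, midpoints))
    return total / n


# Exercise 1: classes 20-29, 30-39, 40-49, 50-59, 60-69
midpoints = [24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [3, 5, 20, 10, 5]
print(sum(frequencies))                                # 43
print(round(grouped_mean(midpoints, frequencies), 2))  # 46.59
```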
Median
- The median is a number chosen so that half of the values in the data set are smaller than that number and the other half are larger.
- To find the median:
  - List the numbers in ascending order.
  - If there is a number in the middle (odd number of values), that is the median.
  - If there is no middle number (even number of values), take the two numbers in the middle; their average is the median.
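The three steps above translate almost line for line into Python. This is a sketch; `median` is an illustrative name, and Python's standard `statistics.median` applies the same rule.

```python
def median(values):
    """Sort, then take the middle value (odd n) or the average
    of the two middle values (even n)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)  # step 1: ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:            # odd: a single middle number
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the two


print(median([3, 12, 4, 6, 1, 4, 2, 5, 8]))               # 4
print(median([2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]))  # 8.5
```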
Median Example
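Median
- The implication of this definition is that the median is the middle value of the ordered observations, such that the number of observations above it is equal to the number of observations below it.
- If n is even: $\text{Median} = \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$
- If n is odd: $\text{Median} = x_{((n+1)/2)}$
where $x_{(i)}$ denotes the i-th smallest observation.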
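Median of Grouped Data
$\text{Median} = L_0 + \frac{h}{f_0}\left(\frac{N}{2} - F\right)$
- L0: lower class boundary of the median class
- h: width of the median class
- f0: frequency of the median class
- F: cumulative frequency of the class preceding the median class
- N: total frequency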
Steps to Find the Median of Grouped Data
- Compute the "less than" type cumulative frequencies.
- Determine N/2, one-half of the total number of cases.
- Locate the median class: the first class whose cumulative frequency is more than N/2.
- Determine the lower class boundary of the median class. This is L0.
- Sum the frequencies of all classes prior to the median class. This is F.
- Determine the frequency of the median class. This is f0.
- Determine the class width of the median class. This is h.
Example: Find the Median
Age in years    Number of births    Cumulative number of births
14.5-19.5       677                 677
19.5-24.5       1908                2585
24.5-29.5       1737                4322
29.5-34.5       1040                5362
34.5-39.5       294                 5656
39.5-44.5       91                  5747
44.5-49.5       16                  5763
All ages        5763                -
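A sketch of the grouped-median formula applied to the births table, assuming equal class widths (h = 5 here) and classes listed by their lower class boundaries; `grouped_median` is a hypothetical helper. For this table N/2 = 2881.5, so the median class is 24.5-29.5 with L0 = 24.5, F = 2585 and f0 = 1737. The code tests "reaches N/2" (>=) rather than "more than N/2"; when a cumulative total equals N/2 exactly, both choices give the same median.

```python
def grouped_median(lower_boundaries, width, frequencies):
    """Median = L0 + (h / f0) * (N/2 - F) for equal-width classes."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    cumulative = 0  # F: cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, frequencies):
        # Median class: first class whose cumulative frequency reaches N/2.
        if cumulative + f >= half:
            return lower + (width / f) * (half - cumulative)
        cumulative += f


lower_boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
births = [677, 1908, 1737, 1040, 294, 91, 16]
print(round(grouped_median(lower_boundaries, 5, births), 2))  # 25.35
```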
Mode
- The mode is simply the category or value that occurs most often in a data set.
- If a category has substantially more entries than the others, it is a mode.
- Generally speaking, we do not consider more than two modes in a data set.
- No clear guideline exists for deciding how many more entries a category must have than the others to constitute a mode.
Obvious Example
- There is obviously more yellow than red or blue.
- Yellow is the mode.
- The mode is the class, not the frequency.
Bimodal
No Mode
Category    Frequency
1           51
2           51
3           66
4           62
5           65
6           57
7           47
8           43
            64
- Although the third category is the largest, it is
not sufficiently different to be called the mode.
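Example 2: Find the Mean, Median and Mode of Ungrouped Data
The weekly pocket money for 9 first-year pupils was found to be 3, 12, 4, 6, 1, 4, 2, 5, 8.
Mean = (3 + 12 + 4 + 6 + 1 + 4 + 2 + 5 + 8) / 9 = 45 / 9 = 5
Median = 4 (the 5th value of the ordered list 1, 2, 3, 4, 4, 5, 6, 8, 12)
Mode = 4 (the only value that occurs twice)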
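As a cross-check, Python's standard `statistics` module reproduces the three answers. `multimode` returns every most-frequent value as a list, which also makes bimodal data easy to spot.

```python
import statistics

pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]

print(statistics.mean(pocket_money))       # 5
print(statistics.median(pocket_money))     # 4
print(statistics.multimode(pocket_money))  # [4]
```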
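Mode of Grouped Data
$\text{Mode} = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h$
- L1: lower class boundary of the modal class
- Δ1: difference between the frequency of the modal class and that of the class before it
- Δ2: difference between the frequency of the modal class and that of the class after it
- h: class interval (width) of the modal class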
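Steps for Finding the Mode of Grouped Data
- Find the modal class, that is, the class with the highest frequency.
- Determine L1, the lower class boundary of the modal class.
- Determine h, the class interval (width) of the modal class.
- Compute Δ1, the difference between the frequency of the modal class and that of the class before it.
- Compute Δ2, the difference between the frequency of the modal class and that of the class after it.
- Substitute into Mode = L1 + Δ1 / (Δ1 + Δ2) × h.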
Example 4: Find the Mode
Slope angle (°)    Midpoint (x)    Frequency (f)    Midpoint × frequency (fx)
0-4                2               6                12
5-9                7               12               84
10-14              12              7                84
15-19              17              5                85
20-24              22              0                0
Total                              n = 30           Σ(fx) = 265
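A sketch of the grouped-mode formula for the Example 4 table. It assumes the class boundaries are the stated lower limits minus 0.5 (so the modal class 5-9 has L1 = 4.5), and it treats a missing neighbouring class as having frequency 0; `grouped_mode` is a hypothetical name. Here Δ1 = 12 − 6 = 6 and Δ2 = 12 − 7 = 5.

```python
def grouped_mode(lower_boundaries, width, frequencies):
    """Mode = L1 + delta1 / (delta1 + delta2) * h for the modal class."""
    k = frequencies.index(max(frequencies))  # modal class (first one if tied)
    f_before = frequencies[k - 1] if k > 0 else 0  # assumption: 0 if no class before
    f_after = frequencies[k + 1] if k + 1 < len(frequencies) else 0  # same for after
    delta1 = frequencies[k] - f_before
    delta2 = frequencies[k] - f_after
    return lower_boundaries[k] + delta1 / (delta1 + delta2) * width


# Example 4: classes 0-4, 5-9, 10-14, 15-19, 20-24 (boundary = lower limit - 0.5)
lower_boundaries = [-0.5, 4.5, 9.5, 14.5, 19.5]
frequencies = [6, 12, 7, 5, 0]
print(round(grouped_mode(lower_boundaries, 5, frequencies), 2))  # 7.23
```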
Midrange
- The midrange is the average of the lowest and highest values in the data set.
- This measure is not often used, since it is based strictly on the two extreme values in the data.
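A two-line sketch (hypothetical function name) that also shows the weakness noted above: appending one hypothetical extreme value, 100, to the Example 2 data drags the midrange far from the bulk of the values.

```python
def midrange(values):
    """Average of the lowest and highest values."""
    return (min(values) + max(values)) / 2


pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]
print(midrange(pocket_money))          # 6.5
print(midrange(pocket_money + [100]))  # 50.5: one extreme value moves it a lot
```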
Midrange Example
Measures of Variation
Same mean, but y varies more than x.
Three Measures of Variation
- While there are other measures, we will look at only three:
  - Variance
  - Standard deviation
  - Coefficient of variation
- The population mean and the sample mean are calculated with the same formula.
- The formulas for variance, however, differ slightly between a population and a sample.
Population Variance
- The population variance, σ², is found using either of the following formulas:
  $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{\sum x^2 - (\sum x)^2 / N}{N}$
- The deviations are squared because the unsquared deviations from the mean always sum to zero.
- N is the size of the population; µ is the population mean.
- Note that the variance is always positive if x can take on more than one value.
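A sketch of both population-variance formulas, treating the Example 2 values as a complete population purely for illustration; the function names are hypothetical. It also confirms that the unsquared deviations sum to zero, and it takes the square root to get σ.

```python
import math


def population_variance(values):
    """Definitional form: sigma^2 = sum((x - mu)^2) / N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_variance_shortcut(values):
    """Computational form: sigma^2 = sum(x^2) / N - mu^2 (algebraically equal)."""
    n = len(values)
    mu = sum(values) / n
    return sum(x * x for x in values) / n - mu ** 2


population = [3, 12, 4, 6, 1, 4, 2, 5, 8]  # illustration only; mu = 5
print(sum(x - 5 for x in population))            # 0: raw deviations cancel out
print(population_variance(population))           # 10.0
print(population_variance_shortcut(population))  # 10.0
print(round(math.sqrt(population_variance(population)), 4))  # sigma: 3.1623
```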
Population Standard Deviation
- The standard deviation can be thought of as the average amount we could expect the x values in the population to differ from the population mean.
- To get the standard deviation, simply take the square root of the variance: $\sigma = \sqrt{\sigma^2}$.
Sample Variance
- The sample variance, s², is found using either of the following formulas:
  $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1}$
- The deviations are squared because the unsquared deviations from the mean always sum to zero.
- The sample size is n; x̄ is the sample mean.
- Note that n − 1 is used rather than n. This adjustment makes s² an unbiased estimate of the population variance σ².
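The same data treated as a sample, as a sketch with a hypothetical helper name; Python's standard `statistics.variance` also divides by n − 1, so the two results should agree.

```python
import math
import statistics


def sample_variance(values):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)  # n - 1, not n


sample = [3, 12, 4, 6, 1, 4, 2, 5, 8]
s2 = sample_variance(sample)
print(s2)                           # 11.25
print(statistics.variance(sample))  # standard library, also divides by n - 1
print(round(math.sqrt(s2), 4))      # s: 3.3541
```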
Sample Standard Deviation
- Just like the standard deviation of a population, the standard deviation of a sample is the square root of the sample variance: $s = \sqrt{s^2}$.
Coefficient of Variation
- The measures discussed so far are primarily useful when comparing members from the same population, or when comparing similar populations.
- When looking at two or more dissimilar populations, it doesn't make any more sense to compare standard deviations than it does to compare means.
Coefficient of Variation Cont.
- Example 1: weight-loss programs A and B.
- Two different programs with the same goal and target population.
- While program B averages more weight loss, it also has less consistent results.
                                 A     B
Mean (weight loss per month)     20    25
Standard deviation               15    30
Coefficient of Variation Cont.
- Example 2: weight-loss program A and tax refund B.
- Two different programs with different goals and different target populations.
- We know that average weight loss and average tax refund are not comparable. Are the standard deviations comparable?
                       A     B
Mean                   20    650
Standard deviation     15    30
Coefficient of Variation Cont.
- In the last example we can see an argument that the standard deviation does not give the complete picture.
- The coefficient of variation addresses this issue by establishing a ratio of the standard deviation to the mean. This ratio is expressed as a percentage:
  $CV = \frac{s}{\bar{x}} \times 100\%$ (or $\frac{\sigma}{\mu} \times 100\%$ for a population)
Coefficient of Variation Cont.
- Looking at the two examples, we see that in both cases the standard deviation for B is twice that of A.
- In the first example, B has about 1.6 times the relative variation of A (120% versus 75%).
- In the second example, A has a little over 16 times as much relative variation as B (75% versus 4.6%).
                      A     B
CV Example 1 (%)      75    120
CV Example 2 (%)      75    4.6
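The CV table can be reproduced with a short sketch; `coefficient_of_variation` is an illustrative name, and the standard deviations and means are taken from the two examples.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = standard deviation / mean, expressed as a percentage."""
    return std_dev / mean * 100


# (standard deviation, mean) pairs from the two examples
examples = {
    "Example 1, A": (15, 20),
    "Example 1, B": (30, 25),
    "Example 2, A": (15, 20),
    "Example 2, B": (30, 650),
}
for label, (sd, mean) in examples.items():
    print(label, round(coefficient_of_variation(sd, mean), 1))
# Example 1: 75.0 and 120.0; Example 2: 75.0 and 4.6
```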
Measures of Position
The dot on the left is at about -1, and the dot on the right is at approximately 0.8. But where are they relative to the rest of the values in this distribution?
Quartiles, Percentiles and Other Fractiles
- We will only consider the quartile, but the same concept is often extended to percentiles or other fractiles.
- The median is a good starting point for finding the quartiles.
- Recall that to find the median, we wanted to locate a point so that half of the data was smaller, and the other half larger, than that point.
Quartile
- For quartiles, we want to divide our data into 4 equal pieces.
Suppose we had the following data set (already in order):
2 3 7 8 8 8 9 13 17 20 21 21
Choosing the numbers 7.5, 8.5, and 18.5 as markers would divide the data into 4 groups, each with three elements. These numbers would be the three quartiles for this data set.
Quartiles Continued
- Conceptually this is easy: simply find the median, then treat the left-hand side as if it were a data set and find its median, then do the same for the right-hand side.
- This is not always simple. Consider the following data set:
  3 3 3 3 3 5 6 8 8 8 8 8 9
- The first difficulty is that the data set does not divide nicely.
- Using the rules for finding a median, we would get quartiles of 3, 6 and 8.
- The second difficulty is deciding how many of the 3s are in the first quartile and how many are in the second.
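One way to code the "median of each half" procedure described above, as a sketch: when n is odd, the middle value is left out of both halves. This convention reproduces the quartiles quoted on these slides, but other conventions (for example the interpolating default of NumPy's `percentile`) can give slightly different values. The function names are hypothetical.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Q2 is the median; Q1 and Q3 are the medians of the lower and upper
    halves, with the middle value excluded from both halves when n is odd."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


print(quartiles([2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]))  # (7.5, 8.5, 18.5)
print(quartiles([3, 3, 3, 3, 3, 5, 6, 8, 8, 8, 8, 8, 9]))    # (3.0, 6, 8.0)
```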
Quartiles Continued
- For this course, let's pretend that this is not an issue.
- I will give you the quartiles.
- I will not ask how many are in a quartile.
Interquartile Range
- One method for identifying outliers, values that are unusually small or large compared with the rest of the data, involves the use of quartiles.
- The interquartile range (IQR) is Q3 − Q1.
- All numbers less than Q1 − 1.5(IQR) are probably too small.
- All numbers greater than Q3 + 1.5(IQR) are probably too large.
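A sketch of the 1.5 × IQR rule using the quartiles of the data set 2 3 7 8 8 8 9 13 17 20 21 21 from the Quartile slide (Q1 = 7.5, Q3 = 18.5); `iqr_outliers` is a hypothetical helper that returns the two fences and any flagged values. For this data set nothing falls outside the fences.

```python
def iqr_outliers(values, q1, q3):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in values if x < low or x > high]


data = [2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]
print(iqr_outliers(data, 7.5, 18.5))  # (-9.0, 35.0, [])
```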
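Measures of Variation: Variance and Standard Deviation for Grouped Data
- The grouped variance is $s^2 = \frac{\sum f (X_m - \bar{X})^2}{n - 1}$, where $X_m$ is the class midpoint, f is the class frequency and $n = \sum f$.
- The grouped standard deviation is $s = \sqrt{s^2}$.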
Example 3-24 (p130): Miles Run per Week
- Find the variance and the standard deviation for the frequency distribution below. The data represent the number of miles that 20 runners ran during one week.
Class        f     Xm    fXm    f(Xm − X̄)²
5.5-10.5     1     8     8      1(8 − 24.5)² = 272.25
10.5-15.5    2     13    26     2(13 − 24.5)² = 264.50
15.5-20.5    3     18    54     3(18 − 24.5)² = 126.75
20.5-25.5    5     23    115    5(23 − 24.5)² = 11.25
25.5-30.5    4     28    112    4(28 − 24.5)² = 49.00
30.5-35.5    3     33    99     3(33 − 24.5)² = 216.75
35.5-40.5    2     38    76     2(38 − 24.5)² = 364.50
Total        20          ΣfXm = 490    Σf(Xm − X̄)² = 1305.00
X̄ = ΣfXm / n = 490 / 20 = 24.5
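A sketch that recomputes the grouped variance from the class midpoints and frequencies; `grouped_sample_variance` is a hypothetical name. Note that 4 × 28 = 112, so ΣfXm = 490 and X̄ = 24.5; with that mean, Σf(Xm − X̄)² = 1305, giving s² = 1305/19 ≈ 68.68 and s ≈ 8.29.

```python
import math


def grouped_sample_variance(midpoints, frequencies):
    """Return (mean, sum of squared deviations, s^2) for grouped data,
    using s^2 = sum(f * (Xm - Xbar)^2) / (n - 1) with n = sum(f)."""
    n = sum(frequencies)
    xbar = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    ss = sum(f * (x - xbar) ** 2 for f, x in zip(frequencies, midpoints))
    return xbar, ss, ss / (n - 1)


midpoints = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]
xbar, ss, s2 = grouped_sample_variance(midpoints, frequencies)
print(xbar, round(ss, 2))                      # 24.5 1305.0
print(round(s2, 2), round(math.sqrt(s2), 2))  # 68.68 8.29
```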
Mean Deviation
- The mean deviation is the average of the absolute deviations of individual observations from the central value of a series. For grouped data, the mean deviation about the mean is
  $MD = \frac{\sum_{i=1}^{k} f_i \lvert x_i - \bar{x} \rvert}{\sum_{i=1}^{k} f_i}$
- k: number of classes
- xi: midpoint of the i-th class
- fi: frequency of the i-th class
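A sketch of the grouped mean deviation about the mean, applied to the miles-run midpoints and frequencies from Example 3-24; `grouped_mean_deviation` is a hypothetical name, and the optional `center` argument lets the same function measure deviations about the median or any other value.

```python
def grouped_mean_deviation(midpoints, frequencies, center=None):
    """MD = sum(f_i * |x_i - A|) / sum(f_i); A defaults to the grouped mean."""
    n = sum(frequencies)
    if center is None:
        center = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    return sum(f * abs(x - center) for f, x in zip(frequencies, midpoints)) / n


midpoints = [8, 13, 18, 23, 28, 33, 38]  # miles-run data (Example 3-24)
frequencies = [1, 2, 3, 5, 4, 3, 2]
print(round(grouped_mean_deviation(midpoints, frequencies), 2))  # 6.65
```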
Coefficient of Mean Deviation
- Another relative measure is the coefficient of mean deviation. As the mean deviation can be computed about the mean, the median, the mode, or any arbitrary value A, a general formula for the coefficient of mean deviation may be written as follows:
  $\text{Coefficient of mean deviation} = \frac{\text{Mean deviation about } A}{A} \times 100$
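A sketch of the general formula for ungrouped data, using the Example 2 pocket-money values (mean 5, median 4) to show how the coefficient depends on the chosen value A; the helper names are hypothetical.

```python
def mean_deviation(values, center):
    """Average absolute deviation of the values from the chosen center A."""
    return sum(abs(x - center) for x in values) / len(values)


def coefficient_of_mean_deviation(values, center):
    """(Mean deviation about A) / A, expressed as a percentage."""
    return mean_deviation(values, center) / center * 100


pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]  # Example 2: mean 5, median 4
print(round(coefficient_of_mean_deviation(pocket_money, 5), 2))  # 48.89 (about the mean)
print(round(coefficient_of_mean_deviation(pocket_money, 4), 2))  # 58.33 (about the median)
```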
Coefficient of Range
- The coefficient of range is a relative measure corresponding to the range and is obtained by the following formula:
  $\text{Coefficient of range} = \frac{L - S}{L + S}$
  where L and S are, respectively, the largest and the smallest observations in the data set.
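A one-function sketch (hypothetical name) applied to the Example 2 data, where L = 12 and S = 1.

```python
def coefficient_of_range(values):
    """(L - S) / (L + S), with L the largest and S the smallest observation."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]
print(round(coefficient_of_range(pocket_money), 3))  # 0.846
```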
Coefficient of Quartile Deviation
- The coefficient of quartile deviation is computed from the first and the third quartiles using the following formula:
  $\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
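A sketch (hypothetical name) using the quartiles Q1 = 7.5 and Q3 = 18.5 of the data set from the Quartile slide.

```python
def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


# Quartiles of 2 3 7 8 8 8 9 13 17 20 21 21: Q1 = 7.5, Q3 = 18.5
print(round(coefficient_of_quartile_deviation(7.5, 18.5), 3))  # 0.423
```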
Assignment 1
- Find the following measures of dispersion from the data set given on the next page:
  - Range, percentile range, quartile range
  - Quartile deviation, mean deviation, standard deviation
  - Coefficient of variation, coefficient of mean deviation, coefficient of range, coefficient of quartile deviation
Data for Assignment 1
Marks     No. of students    Cumulative frequency
40-50     6                  6
50-60     11                 17
60-70     19                 36
70-80     17                 53
80-90     13                 66
90-100    4                  70
Total     70