Measures of Central Tendency - PowerPoint PPT Presentation


Provided by: pbwo527


Transcript and Presenter's Notes

Title: Measures of Central Tendency

1
Measures of Central Tendency

By Rahul Jain

2
The Motivation

Measures of central tendency are used to describe
the typical member of a population.
Depending on the type of data, "typical" can have
several reasonable meanings.
We will discuss four of these possible choices.

3
4 Measures of Central Tendency

Mean: the arithmetic average. This is used for
continuous data.
Median: a value that splits the data into two
halves, that is, one half of the data is smaller
than that number, the other half larger. May be
used for continuous or ordinal data.
Mode: the category that has the most data. As
the description implies, it is used for
categorical data.
Midrange: not used as often as the other three.
It is found by taking the average of the lowest
and highest number in the data set. Also
primarily used for continuous data.

4
Measures of Central Tendency

The central tendency is measured by averages.
These describe the point about which the various
observed values cluster.
In mathematics, an average, or central tendency
of a data set refers to a measure of the "middle"
or "expected" value of the data set.

5
Mean

To find the mean, add all of the values, then
divide by the number of values.
The lowercase Greek letter mu (µ) is used for the
population mean.
An x with a bar over it, read "x-bar", is used
for the sample mean.

6
Mean Example
7
Arithmetic Mean of Group Data

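If $x_1, x_2, \dots, x_k$ are the mid-values
and $f_1, f_2, \dots, f_k$ are the corresponding
frequencies, where the subscript k stands for
the number of classes, then the mean is
$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$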

8
Exercise-1: Find the Arithmetic Mean
Class    Frequency (f)    x       fx
20-29    3                24.5    73.5
30-39    5                34.5    172.5
40-49    20               44.5    890
50-59    10               54.5    545
60-69    5                64.5    322.5
Sum      N = 43                   2003.5
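The exercise can be checked with a short Python sketch. It assumes the usual grouped-data mean (sum of f times x divided by the sum of f) and takes the midpoints and frequencies directly from the table; the function name is illustrative only.

```python
# Class midpoints (x) and frequencies (f) from the Exercise-1 table.
midpoints = [24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [3, 5, 20, 10, 5]


def grouped_mean(midpoints, frequencies):
    """Return sum(f * x) / sum(f) for grouped data."""
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    total_f = sum(frequencies)
    return total_fx / total_f


mean = grouped_mean(midpoints, frequencies)
print(round(mean, 2))  # 2003.5 / 43, about 46.59
```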
9
Median

The median is a number chosen so that half of the
values in the data set are smaller than that
number, and the other half are larger.
To find the median:
List the numbers in ascending order.
If there is a number in the middle (odd number of
values), that is the median.
If there is not a middle number (even number of
values), take the two in the middle; their average
is the median.
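The procedure above can be sketched in Python as follows. The function and the two small input lists are illustrative assumptions, not data from the slides.

```python
def median(values):
    """Median by the sort-and-pick-the-middle procedure."""
    ordered = sorted(values)          # list the numbers in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 3]))   # 4.0
```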

10
Median Example
11
Median

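The implication of this definition is that a
median is the middle value of the observations
such that the number of observations above it is
equal to the number of observations below it.

If n is even: $\text{Median} = \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$
If n is odd: $\text{Median} = x_{((n+1)/2)}$
Here $x_{(i)}$ is the i-th value when the data are listed in ascending order.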
12
Median of Group Data

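$\text{Median} = L_0 + \frac{h}{f_0}\left(\frac{N}{2} - F\right)$
L0 = lower class boundary of the median class
h = width of the median class
f0 = frequency of the median class
F = cumulative frequency of the pre-median class
N = total number of observations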

13
Steps to find Median of group data

Compute the less-than type cumulative
frequencies.
Determine N/2, one-half of the total number of
cases.
Locate the median class: the first class whose
cumulative frequency is N/2 or more.
Determine the lower class boundary of the median
class. This is L0.
Sum the frequencies of all classes prior to the
median class. This is F.
Determine the frequency of the median class. This
is f0.
Determine the class width of the median class.
This is h.

14
Example-Find Median
Age in years    Number of births    Cumulative number of births
14.5-19.5       677                 677
19.5-24.5       1908                2585
24.5-29.5       1737                4322
29.5-34.5       1040                5362
34.5-39.5       294                 5656
39.5-44.5       91                  5747
44.5-49.5       16                  5763
All ages        5763                -
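As a rough check of this example, the sketch below applies the standard grouped-median interpolation (lower boundary plus width over class frequency, times the shortfall between N/2 and the cumulative frequency before the median class). The class boundaries and frequencies come from the table; the function name and tuple layout are assumptions made for illustration.

```python
# (lower boundary, upper boundary, number of births) for each age class.
classes = [
    (14.5, 19.5, 677),
    (19.5, 24.5, 1908),
    (24.5, 29.5, 1737),
    (29.5, 34.5, 1040),
    (34.5, 39.5, 294),
    (39.5, 44.5, 91),
    (44.5, 49.5, 16),
]


def grouped_median(classes):
    """Interpolate the median inside the median class."""
    total = sum(freq for _, _, freq in classes)
    if total == 0:
        raise ValueError("no observations")
    half = total / 2                      # N/2
    cumulative = 0                        # F: frequency before the current class
    for lower, upper, freq in classes:
        if cumulative + freq >= half:     # first class reaching N/2
            width = upper - lower         # h
            return lower + (width / freq) * (half - cumulative)
        cumulative += freq
    raise ValueError("median class not found")


median_age = grouped_median(classes)
print(round(median_age, 2))  # about 25.35 years
```

Here N/2 = 2881.5, so the median class is 24.5-29.5, with F = 2585, f0 = 1737 and h = 5.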
15
Mode

The mode is simply the category or value which
occurs the most in a data set.
If a category has radically more than the others,
it is a mode.
Generally speaking we do not consider more than
two modes in a data set.
No clear guideline exists for deciding how many
more entries a category must have than the others
to constitute a mode.

16
Obvious Example

There is obviously more yellow than red or blue.
Yellow is the mode.
The mode is the class, not the frequency.

17
Bimodal
18
No Mode

Category Frequency
1 51
2 51
3 66
4 62
5 65
6 57
7 47
8 43
64
Although the third category is the largest, it is
not sufficiently different to be called the mode.

19
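Example-2: Find the Mean, Median and Mode of
Ungrouped Data
The weekly pocket money for 9 first-year pupils
was found to be 3, 12, 4, 6, 1, 4, 2, 5, 8.
In ascending order: 1, 2, 3, 4, 4, 5, 6, 8, 12.
Mean = 45 / 9 = 5
Median = 4 (the fifth of the nine ordered values)
Mode = 4 (the only value that occurs twice)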
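These three answers can be reproduced with Python's standard `statistics` module. This is only a quick check of the example, using the nine pocket-money values listed above.

```python
import statistics

# Weekly pocket money for the 9 pupils in Example-2.
pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]

mean = statistics.mean(pocket_money)      # sum of values / number of values
median = statistics.median(pocket_money)  # middle value of the sorted data
mode = statistics.mode(pocket_money)      # most frequent value

print(mean, median, mode)
```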
20
Mode of Group Data

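$\text{Mode} = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times H$
L1 = lower boundary of the modal class
Δ1 = difference in frequency between the
modal class and the class before it
Δ2 = difference in frequency between the
modal class and the class after it
H = class interval (width of the modal class)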

21
Steps of Finding Mode

Find the modal class, the class with the highest frequency.
L1 = lower class boundary of the modal class
H = class interval of the modal class
Δ1 = difference in frequency between the modal
class and the class before it
Δ2 = difference in frequency between the modal
class and the class after it

22
Example -4 Find Mode
Slope angle (°)    Midpoint (x)    Frequency (f)    Midpoint × frequency (fx)
0-4                2               6                12
5-9                7               12               84
10-14              12              7                84
15-19              17              5                85
20-24              22              0                0
Total                              n = 30           Σ(fx) = 265
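A possible worked version of this example in Python is sketched below. It uses the usual grouped-mode interpolation and makes one explicit assumption that the slide does not state: class boundaries lie 0.5 below the printed lower limits (so the class 5-9 starts at 4.5), with a class width of 5.

```python
# (lower class limit, frequency) rows from the slope-angle table.
rows = [(0, 6), (5, 12), (10, 7), (15, 5), (20, 0)]
CLASS_WIDTH = 5        # each class spans 5 degrees
BOUNDARY_SHIFT = 0.5   # assumption: boundary = printed lower limit - 0.5


def grouped_mode(rows, width, shift):
    """Interpolate the mode inside the class with the highest frequency."""
    freqs = [f for _, f in rows]
    i = freqs.index(max(freqs))                          # modal class (first maximum)
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    delta1 = freqs[i] - before                           # modal minus previous class
    delta2 = freqs[i] - after                            # modal minus next class
    lower_boundary = rows[i][0] - shift
    return lower_boundary + delta1 / (delta1 + delta2) * width


mode = grouped_mode(rows, CLASS_WIDTH, BOUNDARY_SHIFT)
print(round(mode, 2))  # about 7.23
```

With the modal class 5-9, Δ1 = 12 - 6 = 6 and Δ2 = 12 - 7 = 5, giving 4.5 + (6/11) × 5.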
23
Midrange

The midrange is the average of the lowest and
highest value in the data set.
This measure is not often used since it is based
strictly on the two extreme values in the data.
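A minimal sketch of the midrange in Python, reusing the pocket-money values from Example-2 purely as illustrative input:

```python
def midrange(values):
    """Average of the smallest and largest value."""
    return (min(values) + max(values)) / 2


pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]
print(midrange(pocket_money))  # (1 + 12) / 2 = 6.5
```

Because only the two extremes enter the calculation, changing the single value 12 to 30 would move the midrange from 6.5 to 15.5 while leaving the median unchanged.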

24
Midrange Example
25
Measures of Variation
Same mean, but y varies more than x.
26
Three Measures of Variation

While there are other measures, we will look at
only three:
Variance
Standard deviation
Coefficient of variation
Population mean and sample mean use an identical
formula for calculation.
There is a minor difference in the formulas for
variation.

27
Population Variance

The population variance, σ², is found using
either of the formulas
$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{\sum x^2}{N} - \mu^2$
The differences are squared because the unsquared
deviations from the mean always sum to zero.
N is the size of the population, µ is the
population mean.
Note that variance is always positive if x takes
on more than one value.

28
Population Standard Deviation

The standard deviation can be thought of as the
average amount we could expect the x values in
the population to differ from the mean value of
the population.
To get the standard deviation, simply take the
square root of the variance.

29
Sample Variance

The sample variance, s², is found using either of
the formulas
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1}$
The differences are squared because the unsquared
deviations from the mean always sum to zero.
The sample size is n, x-bar is the sample mean.
Note that n-1 is used rather than n. This
adjustment makes s² an unbiased estimate of the
population variance.

30
Sample Standard Deviation

Just like the standard deviation of a population,
to find the standard deviation of a sample, take
the square root of the sample variance.
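The difference between the population and sample versions can be seen with Python's standard `statistics` module. The data below are the nine pocket-money values from Example-2, reused here only as illustrative input; treating them first as a whole population and then as a sample is an assumption made for the comparison.

```python
import statistics

data = [3, 12, 4, 6, 1, 4, 2, 5, 8]   # mean is 5, squared deviations sum to 90

pop_var = statistics.pvariance(data)   # divides by N      -> 90 / 9
samp_var = statistics.variance(data)   # divides by n - 1  -> 90 / 8
pop_sd = statistics.pstdev(data)       # square root of the population variance
samp_sd = statistics.stdev(data)       # square root of the sample variance

print(pop_var, samp_var)
print(round(pop_sd, 3), round(samp_sd, 3))
```

The sample figures are slightly larger because the same sum of squared deviations is divided by n-1 instead of N.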

31
Coefficient of Variation

The measures discussed so far are primarily
useful when comparing members from the same
population, or comparing similar populations.
When looking at two or more dissimilar
populations, it does not make any more sense to
compare standard deviations than it does to
compare means.

32
Coefficient of Variation Cont.

Example 1: Weight loss programs A and B.
Two different programs with the same goal and
target population.
While program B averages more weight loss, it
also has less consistent results.

                                A     B
Mean (weight loss per month)    20    25
Standard deviation              15    30
33
Coefficient of Variation Cont.

Example 2: Weight loss program A and tax refund
B.
Two different programs with different goals and
different target populations.
We know that average weight loss and average tax
refund are not comparable. Are the standard
deviations comparable?

                      A     B
Mean                  20    650
Standard deviation    15    30
34
Coefficient of Variation Cont.

In the last example we can see an argument that
standard deviation does not give the complete
picture.
The coefficient of variation addresses this issue
by establishing a ratio of the standard deviation
to the mean. This ratio is expressed as a
percentage:
$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$

35
Coefficient of Variation Cont.

Looking at the two examples, we see that in both
cases the standard deviation for B is twice that
of A.
In the first example, B has 1.6 times the
relative variation of A.
In the second example, A has a little over 16
times the relative variation of B.

                A      B
CV Example 1    75%    120%
CV Example 2    75%    4.6%
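The CV values in the table can be recomputed with a few lines of Python. The means and standard deviations are taken from the two examples; the function name and dictionary layout are illustrative choices.

```python
def coefficient_of_variation(mean, std_dev):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


# (mean, standard deviation) for A and B in each example.
examples = {
    "Example 1": {"A": (20, 15), "B": (25, 30)},
    "Example 2": {"A": (20, 15), "B": (650, 30)},
}

cv = {
    name: {label: coefficient_of_variation(m, s) for label, (m, s) in groups.items()}
    for name, groups in examples.items()
}

for name, values in cv.items():
    print(name, {label: round(v, 1) for label, v in values.items()})
```

The ratios between the CVs are 120 / 75 = 1.6 in the first example and about 75 / 4.6 = 16.25 in the second.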
36
Measures of Position
The dot on the left is at about -1, the dot on
the right is at approximately 0.8. But where are
they relative to the rest of the values in this
distribution?
37
Quartiles, Percentiles and Other Fractiles

We will only consider the quartile, but the same
concept is often extended to percentages or other
fractions.
The median is a good starting point for finding
the quartiles.
Recall that to find the median, we wanted to
locate a point so that half of the data was
smaller, and the other half larger than that
point.

38
Quartile

For quartiles, we want to divide our data into 4
equal pieces.

Suppose we had the following data set (already in
order): 2 3 7 8 8 8 9 13 17 20 21 21
Choosing the numbers 7.5, 8.5, and 18.5 as
markers would divide the data into 4 groups, each
with three elements. These numbers would be the
three quartiles for this data set.
39
Quartiles Continued

Conceptually, this is easy: find the median, then
treat the left-hand side as if it were a data set
and find its median, then do the same for the
right-hand side.
This is not always simple. Consider the following
data set.
3 3 3 3 3 5 6 8 8 8 8 8 9
The first difficulty is that the data set does
not divide nicely.
Using the rules for finding a median, we would
get quartiles of 3, 6 and 8.
The second difficulty is how many of the 3s are
in the first quartile, and how many in the second.
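The median-of-halves idea can be sketched in Python as below. One convention is assumed here that the slides do not spell out: when the number of values is odd, the overall median is left out of both halves. Library routines often use other interpolation rules and may return slightly different quartiles.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of each half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the middle value belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


first = [2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]
second = [3, 3, 3, 3, 3, 5, 6, 8, 8, 8, 8, 8, 9]

print(quartiles(first))   # (7.5, 8.5, 18.5)
print(quartiles(second))  # (3.0, 6, 8.0)
```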

40
Quartiles Continued

For this course, let's pretend that this is not
an issue.
I will give you the quartiles.
I will not ask how many are in a quartile.

41
Interquartile Range

One method for identifying outliers involves the
use of quartiles.
The interquartile range (IQR) is Q3 - Q1.
All numbers less than Q1 - 1.5(IQR) are probably
too small.
All numbers greater than Q3 + 1.5(IQR) are
probably too large.
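The rule can be sketched in Python as follows. The quartiles 7.5 and 18.5 and the twelve data values come from the earlier quartile example; the extra value 40 is a hypothetical addition, included only to show a point being flagged.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3):
    """Values falling outside the 1.5 * IQR fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


data = [2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]

print(outlier_fences(7.5, 18.5))              # (-9.0, 35.0)
print(find_outliers(data, 7.5, 18.5))         # []
print(find_outliers(data + [40], 7.5, 18.5))  # [40], hypothetical extra value
```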

42
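Measures of Variation: Variance and Standard
Deviation for Grouped Data

The grouped variance is
$s^2 = \frac{\sum f (X_m - \bar{X})^2}{n - 1}$
The grouped standard deviation is
$s = \sqrt{\frac{\sum f (X_m - \bar{X})^2}{n - 1}}$
Here f is the class frequency, Xm is the class midpoint, X-bar is the grouped mean, and n is the sum of the frequencies.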

43
Example 3-24 (p130) Miles Run per Week

Find the variance and the standard deviation for
the frequency distribution below. The data
represents the number of miles that 20 runners
ran during one week.

Class        f     Xm    f·Xm    f(Xm - X̄)²
5.5-10.5     1     8     8       1(8 - 24.5)² = 272.25
10.5-15.5    2     13    26      2(13 - 24.5)² = 264.50
15.5-20.5    3     18    54      3(18 - 24.5)² = 126.75
20.5-25.5    5     23    115     5(23 - 24.5)² = 11.25
25.5-30.5    4     28    112     4(28 - 24.5)² = 49.00
30.5-35.5    3     33    99      3(33 - 24.5)² = 216.75
35.5-40.5    2     38    76      2(38 - 24.5)² = 364.50
Total        20          Σ fXm = 490    Σ f(Xm - X̄)² = 1305.00
Mean: X̄ = 490 / 20 = 24.5
Variance: s² = 1305 / 19 ≈ 68.7
Standard deviation: s ≈ 8.3
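The calculation can be sketched in Python as below. It assumes the sample form of the grouped variance (sum of f times the squared deviation of each midpoint from the grouped mean, divided by n - 1) and uses the class midpoints and frequencies from the table; the function name is illustrative.

```python
import math

# Class midpoints (Xm) and frequencies (f) for the 20 runners.
midpoints = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]


def grouped_sample_variance(midpoints, frequencies):
    """Sample variance of grouped data, dividing by n - 1."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    squared = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))
    return squared / (n - 1)


variance = grouped_sample_variance(midpoints, frequencies)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))
```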
44
Mean Deviation

The mean deviation is an average of the absolute
deviations of individual observations from the
central value of a series. Average deviation
about the mean:
$MD = \frac{\sum_{i=1}^{k} f_i \lvert x_i - \bar{x} \rvert}{\sum_{i=1}^{k} f_i}$
k = number of classes
xi = midpoint of the i-th class
fi = frequency of the i-th class
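A small Python sketch of the mean deviation about the mean for grouped data follows. It reuses the midpoints and frequencies of the miles-run example as illustrative input and assumes the frequency-weighted form (sum of f times the absolute deviation, divided by the sum of f).

```python
# Class midpoints (x_i) and frequencies (f_i), reused from the miles-run example.
midpoints = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]


def grouped_mean_deviation(midpoints, frequencies):
    """Frequency-weighted average of |x_i - mean|."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    return sum(f * abs(x - mean) for x, f in zip(midpoints, frequencies)) / n


mean_dev = grouped_mean_deviation(midpoints, frequencies)
print(mean_dev)  # 133 / 20 = 6.65
```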

45
Coefficient of Mean Deviation

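The third relative measure is the coefficient of
mean deviation. As the mean deviation can be
computed from the mean, median, mode, or from any
arbitrary value, a general formula for computing
the coefficient of mean deviation may be put as
follows:
$\text{Coefficient of mean deviation} = \frac{\text{mean deviation about } A}{A}$
where A is the mean, median, mode, or other value about which the deviations are taken. Multiply by 100 to express it as a percentage.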

46
Coefficient of Range

The coefficient of range is a relative measure
corresponding to range and is obtained by the
following formula:
$\text{Coefficient of range} = \frac{L - S}{L + S}$
where L and S are respectively the largest
and the smallest observations in the data set.

47
Coefficient of Quartile Deviation

The coefficient of quartile deviation is computed
from the first and the third quartiles using the
following formula:
$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
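As a hedged illustration, the sketch below uses the commonly taught definitions (L - S) / (L + S) for the coefficient of range and (Q3 - Q1) / (Q3 + Q1) for the coefficient of quartile deviation. These definitions are assumed here rather than taken from the slides, and the input is the twelve-value data set from the quartile example with its quartiles 7.5 and 18.5.

```python
def coefficient_of_range(largest, smallest):
    """(L - S) / (L + S), assumed textbook definition."""
    return (largest - smallest) / (largest + smallest)


def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1), assumed textbook definition."""
    return (q3 - q1) / (q3 + q1)


data = [2, 3, 7, 8, 8, 8, 9, 13, 17, 20, 21, 21]

coef_range = coefficient_of_range(max(data), min(data))   # 19 / 23
coef_qd = coefficient_of_quartile_deviation(7.5, 18.5)    # 11 / 26
print(round(coef_range, 3), round(coef_qd, 3))            # 0.826 0.423
```

Both are unit-free ratios, which is what makes them usable for comparing data sets measured on different scales.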

48
Assignment-1

Find the following measures of dispersion from
the data set given on the next page:
Range, Percentile range, Quartile range
Quartile deviation, Mean deviation, Standard
deviation
Coefficient of variation, Coefficient of mean
deviation, Coefficient of range, Coefficient of
quartile deviation