Numerical Descriptive Measures

Slides: 70

Provided by: Lind51

1
Numerical Descriptive Measures

Week 3

2
Objectives

On completion of this module you will be able to
calculate and interpret measures of central
tendency (mean, median, mode),
calculate and interpret quartiles, range,
interquartile range, variance and standard
deviation,
calculate and interpret the coefficient of
variation,
understand and utilise the empirical rule and the
Bienaymé-Chebyshev Rule,

3
Objectives

On completion of this module you will be able to
construct a box-and-whisker plot,
calculate the covariance and correlation, and
discuss pitfalls and ethical issues relating to
descriptive measures.

4
Guide for study this week

Print out Section 3.7 of the text (on CD) so that
you can bring it into the exam room.
Read Appendices A (algebra review), B (summation
notation) and C (statistical symbols). This
material will help you understand the course
content.

5
Example 3-1

A manufacturer of mobile phones has been
concerned that the latest model of the battery is
not lasting as long as anticipated.
They take a random sample of 20 phones and
batteries, and record how long each battery takes to go
flat (this is done by turning the phones on and
leaving them switched on until the battery goes
flat).

6
Example 3-1

The following data (battery life in hours) are
the result
Check that you can replicate the results
discussed here using Excel and PHStat2.

42 42 48 45 51 45 48 44 43 42
46 46 47 48 40 48 42 48 51 50
7
(a) Mean, median and mode

Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{916}{20} = 45.8$ hours

8
(a) Mean, median and mode

The median is the $\frac{n+1}{2} = \frac{21}{2} = 10.5$th ordered observation.
Order the data from smallest to largest.
Find the 10th and 11th values: 46 and 46.
Half-way between these is the median, 46.

40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
9
(a) Mean, median and mode

The mode is the data value that occurs most often,
i.e. the most typical value.
Mode = 48 (appears five times)

10
(a) Mean, median and mode

Mean and median are similar and are probably the best
measures of the middle for this data set.
Mode is usually only a good measure of the middle
for large data sets.
The manufacturer will use this information to
determine what is normal or most usual for
battery life. This would be helpful in producing
and maintaining quality products and in
benchmarking.
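As a cross-check on part (a) outside Excel/PHStat2, here is a minimal Python sketch using the standard-library `statistics` module. The list name `battery_hours` is our own; the values are the 20 battery lives listed above.

```python
from statistics import mean, median, mode

# Battery life (hours) of the 20 sampled phones, Example 3-1
battery_hours = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                 46, 46, 47, 48, 40, 48, 42, 48, 51, 50]

avg = mean(battery_hours)          # sum of the data / n = 916 / 20
mid = median(battery_hours)        # average of the 10th and 11th ordered values
most_common = mode(battery_hours)  # the single most frequent value

print(f"mean = {avg}, median = {mid}, mode = {most_common}")
```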

11
(b) Quartiles

Lower (or first) quartile (LQ) is the $\frac{n+1}{4}$th value,
or the $\frac{21}{4} = 5.25$th value.
The text rounds this position to the nearest whole
number and takes that data value: the 5th data
point is 42.
Some texts take the value 0.25 of the way from
the 5th (42) to the 6th (43) value: 42.25.

40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
12
(b) Quartiles

Upper (or third) quartile (UQ) is the $\frac{3(n+1)}{4}$th
value, or the $\frac{63}{4} = 15.75$th value.
Rounding gives the 16th value: UQ = 48.
OR take the value 0.75 of the way from the 15th
(48) to the 16th (48) value: 48.

40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
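The two quartile conventions can be compared side by side in a short Python sketch. The function name `quartile` and its `interpolate` flag are hypothetical; averaging the two neighbours when a position ends in exactly .5 is an assumption, since the slides do not cover that case.

```python
import math
import statistics

# Battery life (hours), Example 3-1
battery_hours = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                 46, 46, 47, 48, 40, 48, 42, 48, 51, 50]

def quartile(data, p, interpolate=False):
    """Value at 1-based ordered position p * (n + 1), for p = 0.25 or 0.75.

    interpolate=False rounds the position to the nearest whole number,
    as the text does; interpolate=True moves the fractional part of the
    way towards the next ordered value, as some other texts do.
    """
    ordered = sorted(data)
    n = len(ordered)
    pos = min(max(p * (n + 1), 1), n)   # keep the position inside the data
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return ordered[lower - 1]
    if interpolate or frac == 0.5:
        # frac == 0.5 is averaged in both modes (an assumed tie rule)
        return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])
    nearest = lower + 1 if frac > 0.5 else lower
    return ordered[nearest - 1]

print("LQ:", quartile(battery_hours, 0.25), "or", quartile(battery_hours, 0.25, True))
print("UQ:", quartile(battery_hours, 0.75), "or", quartile(battery_hours, 0.75, True))

# statistics.quantiles' default 'exclusive' method also interpolates at
# positions p * (n + 1), so it should agree with interpolate=True
print(statistics.quantiles(battery_hours, n=4))
```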
13
(c) Variance and standard deviation

The text uses a different formula, but a better
computational formula for the variance is

$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n-1}$$

This only requires that we find the sum of the data
and the sum of the squared data to do the
calculation.

14
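(c) Variance and standard deviation
For the battery data, $\sum X_i = 916$ and $\sum X_i^2 = 42154$, so
$$S^2 = \frac{42154 - \frac{916^2}{20}}{19} = \frac{201.2}{19} \approx 10.5895 \text{ hours}^2, \qquad S = \sqrt{10.5895} \approx 3.254 \text{ hours}$$
15
(c) Variance and standard deviation
Note: to avoid rounding errors, don't round until
the final stage!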
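A minimal Python sketch of the computational formula, assuming the battery data above. It keeps full precision until the final print (in line with the advice not to round early) and cross-checks the result against the definitional form $\sum (X_i - \bar{X})^2 / (n-1)$.

```python
import math

# Battery life (hours), Example 3-1
battery_hours = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                 46, 46, 47, 48, 40, 48, 42, 48, 51, 50]

n = len(battery_hours)
sum_x = sum(battery_hours)                  # 916
sum_x2 = sum(x * x for x in battery_hours)  # 42154

# Computational formula: needs only the sum and the sum of squares.
# Nothing is rounded until the final print.
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

# Cross-check against the definitional form sum((x - mean)^2) / (n - 1)
x_bar = sum_x / n
variance_def = sum((x - x_bar) ** 2 for x in battery_hours) / (n - 1)

print(f"S^2 = {variance:.4f} hours^2, S = {std_dev:.4f} hours")
print(f"definitional form gives S^2 = {variance_def:.4f}")
```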
16
Interpreting variance and standard deviation

Variance and standard deviation measure average
scatter around the mean.
Variance results in squared units (e.g. squared
dollars, squared metres).
Therefore we usually interpret the standard
deviation, which is measured in the original units
(dollars, metres etc.).
Much of the data is within one standard deviation
either side of the mean (more on this later).
Measures of variation (variance, standard
deviation, range and IQR) are always greater than
or equal to zero.

17
Population parameters

The formulae we have been using calculate the
mean and standard deviation for samples of data.
If you had all the data from the population (i.e.
not just a sample), the following formulae would
be used:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
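The only difference between the sample and population standard deviation formulae is the divisor. Python's `statistics` module happens to provide both (`stdev` divides by n - 1, `pstdev` by N); the sketch below applies each to the battery data purely to show the effect of the divisor, not because the 20 phones are a population.

```python
import statistics

# Battery life (hours), Example 3-1
battery_hours = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                 46, 46, 47, 48, 40, 48, 42, 48, 51, 50]

# Sample formula: divisor n - 1 (what Example 3-1 needs)
s = statistics.stdev(battery_hours)
# Population formula: divisor N (only appropriate if these 20 values
# were the whole population, which they are not; shown for contrast)
sigma = statistics.pstdev(battery_hours)

print(f"S = {s:.4f}, population-formula result = {sigma:.4f}")
```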

18
Coefficient of variation

Coefficient of variation (CV) is

$$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$$

This is a relative measure of variation: it
measures the scatter of the data relative to the
mean.
It allows comparison of variability between
variables with different units of measure.

19
Coefficient of variation

Imagine two data sets
Although the standard deviations are equal, we
can't say the distributions have the same
relative variation.
Distribution one shows greater relative variation
than distribution two (with equal standard
deviations, a larger CV means a smaller mean).

20
(c) Coefficient of variation

Back to the example: the coefficient of variation
(CV) is

$$CV = \frac{3.254}{45.8} \times 100\% \approx 7.1\%$$
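A sketch of the CV calculation in Python. `coefficient_of_variation` is a hypothetical helper, and the two small lists at the end are made-up data sets with equal standard deviations (S = 1), included only to echo the "two data sets" comparison.

```python
import statistics

def coefficient_of_variation(data):
    """CV = (S / mean) * 100%, using the sample standard deviation S."""
    return statistics.stdev(data) / statistics.mean(data) * 100

# Battery life (hours), Example 3-1
battery_hours = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                 46, 46, 47, 48, 40, 48, 42, 48, 51, 50]
print(f"Battery life CV = {coefficient_of_variation(battery_hours):.1f}%")

# Made-up data sets: same standard deviation, very different means
small_mean = [1, 2, 3]
large_mean = [101, 102, 103]
print(coefficient_of_variation(small_mean), coefficient_of_variation(large_mean))
```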

21
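(c) Interquartile range and range

Interquartile range: $IQR = UQ - LQ = 48 - 42 = 6$ hours
Range: $\text{Range} = \text{maximum} - \text{minimum} = 51 - 40 = 11$ hours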

22
(d) Box-and-Whisker Plot

For a box-and-whisker plot we need five values:
the minimum, lower quartile, median, upper
quartile and the maximum.
For our data set these are 40, 42, 46, 48 and 51
respectively.
Create the central box with vertical lines at the
lower quartile, median and upper quartile.
Plot lines out from this box to the minimum and
maximum values.
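The five values for the plot (and the IQR and range) can be computed with a short Python sketch. Quartile positions are rounded to the nearest whole number as the text does; rounding exact halves up is our assumption, and the helper names are hypothetical.

```python
import math
import statistics

# Battery life (hours), Example 3-1
battery_hours = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                 46, 46, 47, 48, 40, 48, 42, 48, 51, 50]

def ranked_value(ordered, pos):
    """Ordered value at 1-based position `pos`, rounded to the nearest whole
    position (halves round up here, an assumption the slides don't cover)."""
    return ordered[math.floor(pos + 0.5) - 1]

def five_number_summary(data):
    ordered = sorted(data)
    n = len(ordered)
    return (ordered[0],                               # minimum
            ranked_value(ordered, (n + 1) / 4),       # lower quartile
            statistics.median(ordered),               # median
            ranked_value(ordered, 3 * (n + 1) / 4),   # upper quartile
            ordered[-1])                              # maximum

minimum, lq, med, uq, maximum = five_number_summary(battery_hours)
print("min, LQ, median, UQ, max:", minimum, lq, med, uq, maximum)
print("IQR =", uq - lq, "hours; range =", maximum - minimum, "hours")
```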

23
(d) Box-and-Whisker Plot
[Figure: box-and-whisker plot of battery life (hours); horizontal axis from 40 to 51]
24
(e) Interpretation

If the manufacturer intended to issue a
statement saying that their batteries will last
more than 50 hours, what would you advise them?
Why?
Mean, median and mode are all less than 50 hours,
so it is not wise to make this claim.
Only 3 of the 20 observations (15%) are 50 hours or
more, and only 2 (10%) are over 50 hours!

25
(f) Changed data: measures of central tendency

(f) Suppose the first value was 142 instead of
42. Repeat (a) and comment on the differences.

26
(f) Changed data: measures of central tendency
40 42 42 42 43 44 45 45 46 46
47 48 48 48 48 48 50 51 51 142

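Mean $= \frac{1016}{20} = 50.8$ hours (up from 45.8)
Since the order of the data has changed, the median
is now half-way between the 10th (46) and 11th (47)
values: 46.5
Mode = 48 (as before)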
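A quick Python check of part (f), assuming the only change is the first value 42 becoming 142:

```python
from statistics import mean, median, mode

# Battery life (hours), Example 3-1, and the version with 42 -> 142
original = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
            46, 46, 47, 48, 40, 48, 42, 48, 51, 50]
changed = [142] + original[1:]   # only the first value is altered

for label, data in (("original", original), ("changed", changed)):
    print(f"{label}: mean = {mean(data)}, median = {median(data)}, "
          f"mode = {mode(data)}")
```

The mean jumps by 5 hours while the median moves by only half an hour and the mode not at all.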

27
(g) Changed data: measures of spread
28
(g) Changed data: measures of spread
29
(g) Changed data: measures of spread

The variance, standard deviation, range and CV have all
increased dramatically; remember, only one data point
changed! The IQR, which depends only on the middle half
of the data, actually falls slightly (from 6 to 5 using
the text's quartile rule).
We would have interpreted the shape of this data
distribution very differently if we had not known
these figures were so affected by one data value,
so it is important to always check the data carefully.
A stem-and-leaf plot or histogram would have
helped us identify the outlier.
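A Python sketch comparing the spread measures before and after the change. The helper `spread_summary` is hypothetical, and it takes quartiles from `statistics.quantiles`, which interpolates rather than rounding the position as the text does.

```python
import statistics

# Battery life (hours), Example 3-1, and the part (f) version with 42 -> 142
original = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
            46, 46, 47, 48, 40, 48, 42, 48, 51, 50]
changed = [142] + original[1:]

def spread_summary(data):
    """Sample variance, standard deviation, CV (%), range and IQR.

    Quartiles come from statistics.quantiles (default 'exclusive' method),
    which interpolates at positions (n+1)/4 and 3(n+1)/4 instead of rounding
    to the nearest position as the text does.
    """
    lq, _, uq = statistics.quantiles(data, n=4)
    s = statistics.stdev(data)
    return {
        "variance": statistics.variance(data),
        "std_dev": s,
        "cv_percent": s / statistics.mean(data) * 100,
        "range": max(data) - min(data),
        "iqr": uq - lq,
    }

before, after = spread_summary(original), spread_summary(changed)
for measure in before:
    print(f"{measure:>10}: {before[measure]:8.2f} -> {after[measure]:8.2f}")
```

With interpolated quartiles the IQR goes from 5.75 to 4.75; the text's rounding rule gives 6 and 5 instead. Either way it hardly moves, while the variance grows more than fortyfold.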

30
(h) Description of distribution

(h) How would you describe the shape of the
original data set? The revised data set?
The best way to do this would be a graph (e.g.
produce a stem-and-leaf diagram or a histogram).
First data set: the mean (45.8) and median (46)
are very similar → the data are fairly symmetrical.

31
(h) Description of distribution

Modified data set: the mean (50.8) is greater
than the median (46.5) → the data
reveal a right skew (although we know an
outlier caused this result).

32
Shape
Negative or left-skewness: Mean < Median
Symmetry or zero-skewness: Mean = Median
Positive or right-skewness: Mean > Median
33
Geometric mean

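Geometric mean is the nth root of the product of
n values:

$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$

Geometric mean rate of return is

$$\bar{R}_G = \left[(1+R_1)(1+R_2)\cdots(1+R_n)\right]^{1/n} - 1$$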

34
Example 3-2

The total rate of return (%) of three blue-chip
stocks is given in the table below for the years
2003, 2004 and 2005.
(a) Calculate the geometric mean rate of return
for each stock.
(b) Compare these results.

Year Stock A Stock B Stock C
2003 3.64 1.12 -0.25
2004 2.32 1.70 1.03
2005 0.09 -3.50 2.08
35
Solution 3-2

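(a) The geometric mean rate of return is given by

$$\bar{R}_G = \left[(1+R_1)(1+R_2)\cdots(1+R_n)\right]^{1/n} - 1$$

where the $R_i$ are expressed as decimals.
Stock A: $\bar{R}_G = [(1.0364)(1.0232)(1.0009)]^{1/3} - 1 \approx 0.0201$, i.e. 2.01%
Stock B: $\bar{R}_G = [(1.0112)(1.0170)(0.9650)]^{1/3} - 1 \approx -0.0025$, i.e. -0.25%
Stock C: $\bar{R}_G = [(0.9975)(1.0103)(1.0208)]^{1/3} - 1 \approx 0.0095$, i.e. 0.95%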
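A minimal Python sketch of the geometric mean rate of return for the three stocks. The dictionary layout and the function name `geometric_mean_return` are our own; `math.prod` needs Python 3.8 or later.

```python
import math

# Annual total returns (%) for 2003, 2004, 2005 from Example 3-2
returns_pct = {
    "Stock A": [3.64, 2.32, 0.09],
    "Stock B": [1.12, 1.70, -3.50],
    "Stock C": [-0.25, 1.03, 2.08],
}

def geometric_mean_return(rates_pct):
    """[(1 + R1)(1 + R2)...(1 + Rn)]^(1/n) - 1, with each Ri as a decimal."""
    growth = math.prod(1 + r / 100 for r in rates_pct)
    return growth ** (1 / len(rates_pct)) - 1

for stock, rates in returns_pct.items():
    print(f"{stock}: {geometric_mean_return(rates) * 100:.2f}%")
```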

36
Solution 3-2

(b) Stock A: 2.01%
Stock B: -0.25%
Stock C: 0.95%
Stock B has the worst rate of return (due to the
negative return in 2005).
Stock A has the best rate of return (positive, but
still a considerable drop in 2005).
Stock C shows an increasing rate of return over the
three years, which may actually make it a better
choice!

37
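Population parameters

Recall the following formulae for the population mean
and standard deviation:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Greek letters are used to indicate population
parameters ($\mu$, mu; $\sigma$, sigma) and Roman letters for
sample statistics ($\bar{X}$, $S$).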

38
Empirical Rule

In bell-shaped distributions (symmetrical, mean =
median), approximately
68% of observations are within 1 standard
deviation of the mean
95% of observations are within 2 standard
deviations of the mean
99.7% of observations are within 3 standard
deviations of the mean
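The 68/95/99.7 percentages are rounded values for a normal (bell-shaped) curve. Assuming the normal distribution as the reference shape, Python's `statistics.NormalDist` (Python 3.8+) reproduces them:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
z = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution within k standard deviations of the mean
shares = [z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)]
for k, share in zip((1, 2, 3), shares):
    print(f"within {k} SD: {share:.1%}")
```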

39
Bienaymé-Chebyshev Rule

For any data set (i.e. not just bell-shaped
distributions), the percentage of observations
that are within k standard deviations of the mean
is at least

$$\left(1 - \frac{1}{k^2}\right) \times 100\%, \qquad k \ge 1$$

Often this rule is simply called Chebyshev's
rule, which is a little bit easier to say!

40
Example 3-3

Returning to the data set in Example 3-1, answer
the following questions.
According to the Bienaymé-Chebyshev Rule, what
percentage of these battery lives are expected to
be within 1 standard deviation of the mean?
Within 2 standard deviations of the mean?
Within 3 standard deviations of the mean?

41
(a) Bienaymé-Chebyshev Rule

Given $k = 1$, $1 - \frac{1}{1^2} = 0$,
so at least 0% of observations are expected to
be within 1 standard deviation of the mean (not
very helpful!).
For $k = 2$, $1 - \frac{1}{2^2} = 0.75$, so at least
75% of observations are within 2 standard
deviations of the mean.
For $k = 3$, $1 - \frac{1}{3^2} \approx 0.8889$, so at least
88.89% of observations are within 3 standard
deviations of the mean.
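The bound is simple enough to tabulate directly; a minimal Python sketch (the function name `chebyshev_lower_bound` is our own):

```python
def chebyshev_lower_bound(k):
    """Smallest possible proportion of any data set lying within k standard
    deviations of its mean: 1 - 1/k^2 (only informative for k > 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2%}")
```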

42
Example 3-3

(b) Assume that the manufacturer knows that the
mean life of the population of batteries is 48.2
hours and the standard deviation of the
population of batteries is 3.1 hours.
What percentage of data values are actually
within 1 standard deviation of the mean?
Within 2 standard deviations of the mean?
Within 3 standard deviations of the mean?

43
(b) Data within intervals
40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
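The interval $\mu \pm \sigma = 48.2 \pm 3.1$ runs from 45.1 to 51.3 hours
and contains 11 of the 20 values.
So 55% of the data is within one standard
deviation of the mean.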
44
(b) Data within intervals
40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
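The interval $\mu \pm 2\sigma = 48.2 \pm 6.2$ runs from 42.0 to 54.4 hours;
counting the four 42s, which lie exactly on the lower limit,
it contains 19 of the 20 values.
So 95% of the data is within two standard
deviations of the mean.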
45
(b) Data within intervals
40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
The interval $\mu \pm 3\sigma = 48.2 \pm 9.3$ runs from 38.9 to 57.5 hours.
So all (100%) of the data is within three
standard deviations of the mean.
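The counts within each interval can be checked with a short Python sketch. It uses `fractions.Fraction` so that limits such as 48.2 - 2(3.1) = 42 are exact, and it treats the intervals as closed (values on a limit count as inside), which is our assumption.

```python
from fractions import Fraction

# Battery life (hours), Example 3-1
battery_hours = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                 46, 46, 47, 48, 40, 48, 42, 48, 51, 50]
mu = Fraction("48.2")     # population mean given in Example 3-3(b)
sigma = Fraction("3.1")   # population standard deviation given in 3-3(b)

def count_within(k):
    """Number of values in the closed interval [mu - k*sigma, mu + k*sigma]."""
    low, high = mu - k * sigma, mu + k * sigma
    return sum(low <= x <= high for x in battery_hours)

for k in (1, 2, 3):
    inside = count_within(k)
    print(f"within {k} SD: {inside}/{len(battery_hours)} "
          f"= {inside / len(battery_hours):.0%}")
```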
46
Example 3-3

(c) Discuss the difference in your answers to (a)
and (b).
The Bienaymé-Chebyshev Rule applies to any
distribution, so it describes a worst case.
It says at least 0% within 1 standard deviation,
at least 75% within 2 standard deviations and at
least 88.89% within 3 standard deviations.
The data set we examined has less spread than
this worst-case rule allows.

47
Example 3-4

A real estate agency is worried that many of
their agents are using poor sales techniques and
that this is having a negative impact on sales.
They believe this is because many of their
agents received very low scores on their
compulsory training course exam (an exam which is
sat prior to beginning employment with the
agency).
They randomly select 10 of their agents,
recording their exam score (out of 200) and the
number of sales they made in the year 2005.

48
(a) Produce a scatterplot of the data. Does
there appear to be any correlation between exam
score and sales? Explain.
Score Sales
185 212
122 143
157 184
165 182
183 201
191 235
121 154
158 187
166 178
102 146
49
[Figure: scatterplot of Sales (y-axis) against Exam Score (x-axis)]
50
(a) Graph discussion

Higher exam scores appear to correspond to higher
sales figures.
Therefore there appears to be positive
correlation between exam scores and sales figures.

51
Hints on producing graphs

Important: always include the following on a
graph:
descriptive labels for both the x and y axes (in
this example Exam Score and Sales)
numbers on both axes to indicate the scale
a title
Truncate the axes only if it doesn't violate
principles of graphical excellence!

52
(b) Compute the correlation coefficient.
Comment on this value and its meaning for the
real estate agency.
Score Sales
185 212
122 143
157 184
165 182
183 201
191 235
121 154
158 187
166 178
102 146
53
(b) Correlation

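Using the computational formula

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$

we need to find the values $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$
and $\sum XY$.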

54
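(b) Correlation
With $n = 10$: $\sum X = 1550$, $\sum Y = 1822$, $\sum X^2 = 248518$, $\sum Y^2 = 339684$ and $\sum XY = 289872$, so
$$r = \frac{289872 - \frac{1550 \times 1822}{10}}{\sqrt{\left(248518 - \frac{1550^2}{10}\right)\left(339684 - \frac{1822^2}{10}\right)}} = \frac{7462}{\sqrt{8268 \times 7715.6}} \approx 0.934265$$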
55
(b) Correlation
Note the amount of working required to use the
form of the formula that the text uses (see p.
3-12 of the study guide).
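The five sums and the computational formula for r translate directly into a minimal Python sketch (variable names are our own):

```python
import math

# Example 3-4: exam score (out of 200) and 2005 sales for 10 agents
score = [185, 122, 157, 165, 183, 191, 121, 158, 166, 102]
sales = [212, 143, 184, 182, 201, 235, 154, 187, 178, 146]
n = len(score)

# The five sums the computational formula needs
sum_x, sum_y = sum(score), sum(sales)
sum_x2 = sum(x * x for x in score)
sum_y2 = sum(y * y for y in sales)
sum_xy = sum(x * y for x, y in zip(score, sales))

numerator = sum_xy - sum_x * sum_y / n
denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = numerator / denominator
print(f"r = {r:.6f}")
```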
56
(b) Correlation

$r = 0.934265$ is close to 1, indicating a strong positive
correlation between exam scores and sales
figures.
Interpretation for the real estate agency:
Allow students (agents) to re-sit the exam to
(possibly) improve their sales performance.
Use the exam as a pre-screening tool when
employing potential agents.
Be very careful: correlation does not imply
causation!

57
Covariance

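The covariance is found via

$$\operatorname{cov}(X,Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$$

It is used in the calculation of the correlation in the
formula used in the text, $r = \frac{\operatorname{cov}(X,Y)}{S_X S_Y}$.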
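A Python sketch of the covariance route to r, assuming the covariance form $r = \operatorname{cov}(X,Y)/(S_X S_Y)$; the helper name `sample_covariance` is hypothetical. It should agree with the computational formula.

```python
import statistics

# Example 3-4: exam score (out of 200) and 2005 sales for 10 agents
score = [185, 122, 157, 165, 183, 191, 121, 158, 166, 102]
sales = [212, 143, 184, 182, 201, 235, 154, 187, 178, 146]

def sample_covariance(x, y):
    """cov(X, Y) = sum((Xi - mean X) * (Yi - mean Y)) / (n - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

cov_xy = sample_covariance(score, sales)
# Correlation as covariance scaled by both sample standard deviations
r = cov_xy / (statistics.stdev(score) * statistics.stdev(sales))
print(f"cov(X, Y) = {cov_xy:.3f}, r = {r:.6f}")
```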

58
Descriptive measures from a frequency distribution

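Approximate mean

$$\bar{X} \approx \frac{\sum_{j=1}^{c} m_j f_j}{n}$$

Approximate standard deviation

$$S \approx \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n-1}}$$

where $c$ is the number of classes, $m_j$ is the midpoint and $f_j$ the frequency of class $j$, and $n = \sum f_j$.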

59
Example 3-5

Participants at a recent workshop on accounting for small
businesses were asked to complete an
anonymous survey.
The table below contains data taken from this
survey: a frequency distribution of the number of
staff employed by each of the 50 small businesses
in attendance at the workshop.
Note that fractional (part-time) staff were
recorded in this survey; for example, 2.75
staff could mean two full-time staff and one staff
member employed for ¾ of the hours in a working
week.

60
Example 3-5
Class Frequency
0 to less than 5 16
5 to less than 10 19
10 to less than 15 5
15 to less than 20 7
20 to less than 25 2
25 to less than 30 1
Approximate the arithmetic mean and standard
deviation of the number of staff employed.
61
Example 3-5
Class Frequency fj Midpoint mj mj × fj
0 to less than 5 16
5 to less than 10 19
10 to less than 15 5
15 to less than 20 7
20 to less than 25 2
25 to less than 30 1
62
Example 3-5
Class Frequency fj Midpoint mj mj × fj
0 to less than 5 16 2.5
5 to less than 10 19 7.5
10 to less than 15 5 12.5
15 to less than 20 7 17.5
20 to less than 25 2 22.5
25 to less than 30 1 27.5
63
Example 3-5
Class Frequency fj Midpoint mj mj × fj
0 to less than 5 16 2.5 16 × 2.5 = 40
5 to less than 10 19 7.5 19 × 7.5 = 142.5
10 to less than 15 5 12.5 62.5
15 to less than 20 7 17.5 122.5
20 to less than 25 2 22.5 45
25 to less than 30 1 27.5 27.5
64
Example 3-5
Class Frequency fj Midpoint mj mj × fj
0 to less than 5 16 2.5 16 × 2.5 = 40
5 to less than 10 19 7.5 19 × 7.5 = 142.5
10 to less than 15 5 12.5 62.5
15 to less than 20 7 17.5 122.5
20 to less than 25 2 22.5 45
25 to less than 30 1 27.5 27.5
Total 50 440
65
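Example 3-5
Approximate mean: $\bar{X} \approx \frac{440}{50} = 8.8$ staff
Approximate standard deviation: $S \approx \sqrt{\frac{\sum (m_j - 8.8)^2 f_j}{49}} = \sqrt{\frac{1990.5}{49}} \approx 6.37$ staff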
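A minimal Python sketch of the frequency-distribution approximations, assuming (as the method does) that every business in a class sits at the class midpoint:

```python
import math

# Example 3-5: class midpoints m_j and frequencies f_j
midpoints = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]
frequencies = [16, 19, 5, 7, 2, 1]

n = sum(frequencies)                                                  # 50 businesses
mean_approx = sum(m * f for m, f in zip(midpoints, frequencies)) / n  # 440 / 50
ss = sum((m - mean_approx) ** 2 * f for m, f in zip(midpoints, frequencies))
sd_approx = math.sqrt(ss / (n - 1))   # sample divisor n - 1

print(f"approximate mean = {mean_approx:.2f}, approximate S = {sd_approx:.2f}")
```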
66
Text sections on CD

Remember to print out Section 3.7 (Obtaining
descriptive summary measures from a frequency
distribution) of the text (on the CD) so that
you can bring it into the exam room!

67
Pitfalls and ethical issues

Interpretation of numerical values is subjective
(although the actual calculations are objective).
Knowing the shape of the distribution can
influence the choice of descriptive measures that
you use.
For example, the centre of a skewed data set might
be best described by the median rather than the
mean.
Report results accurately but in a neutral and
objective manner.

68
Pitfalls and ethical issues

Report both good and bad results.
Poor presentation is not necessarily the same as
unethical presentation of results.
Unethical behaviour occurs when
an inappropriate summary method is chosen
wilfully, or
when findings are not reported because
they would not support a particular position.

69
After the lecture each week