Numerical Descriptive Measures - PowerPoint PPT Presentation

Provided by: Lind51

1
Numerical Descriptive Measures
  • Week 3

2
Objectives
  • On completion of this module you will be able to:
  • calculate and interpret measures of central
    tendency (mean, median and mode),
  • calculate and interpret quartiles, range,
    interquartile range, variance and standard
    deviation,
  • calculate and interpret the coefficient of
    variation,
  • understand and utilise the empirical rule and the
    Bienaymé-Chebyshev Rule,

3
Objectives
  • On completion of this module you will be able to:
  • construct a box-and-whisker plot,
  • calculate the covariance and correlation, and
  • discuss pitfalls and ethical issues relating to
    descriptive measures.

4
Guide for study this week
  • Print out Section 3.7 of the text (on CD) so that
    you can bring it in to the exam room
  • Read Appendices A (algebra review), B (summation
    notation) and C (statistical symbols). This
    material will help you understand the course
    content.

5
Example 3-1
  • A manufacturer of mobile phones has been
    concerned that the latest model of the battery is
    not lasting as long as anticipated.
  • They take a random sample of 20 phones and
    batteries, and record how long they take to go
    flat (this is done by turning the phones on and
    leaving them switched on until the battery goes
    flat).

6
Example 3-1
  • The following data (battery life in hours) are
    the result:

42 42 48 45 51 45 48 44 43 42
46 46 47 48 40 48 42 48 51 50

  • Check that you can replicate the results
    discussed here using Excel and PHStat2.

7
(a) Mean, median and mode
  • Mean: $\bar{X} = \frac{\sum X_i}{n} = \frac{916}{20} = 45.8$ hours

8
(a) Mean, median and mode
  • Median is the $\frac{n+1}{2} = \frac{20+1}{2} = 10.5$th ordered observation.
  • Order the data from smallest to largest.
  • Find the 10th and 11th values: 46 and 46.
  • Half-way between these is the median: 46.

40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
9
(a) Mean, median and mode
  • Mode is the data value that occurs most often
    (is most typical).
  • Mode = 48 (appears five times)

10
(a) Mean, median and mode
  • Mean and median are similar; they are probably the
    best measures of the middle for this data set.
  • Mode is usually only a good measure of the middle
    for large data sets.
  • The manufacturer will use this information to
    determine what is normal or most usual for
    battery life. This would be helpful in producing
    and maintaining quality products and in
    benchmarking.
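
The three measures in part (a) can also be checked outside Excel. The following is a minimal Python sketch (not part of the original slides, which use Excel and PHStat2); the variable names are illustrative only.

```python
from statistics import mean, median, multimode

# Battery life in hours for the 20 sampled phones (Example 3-1).
battery_life = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                46, 46, 47, 48, 40, 48, 42, 48, 51, 50]

sample_mean = mean(battery_life)         # sum of the values divided by n
sample_median = median(battery_life)     # half-way between the 10th and 11th ordered values
sample_modes = multimode(battery_life)   # list of the most frequent value(s)

print(sample_mean, sample_median, sample_modes)
```

`multimode` returns a list, so a data set with several equally common values would show all of them rather than just one.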

11
(b) Quartiles
  • Lower (or first) quartile (LQ) is the $\frac{n+1}{4}$th value,
    or the $\frac{20+1}{4} = 5.25$th value.
  • The text rounds this value to the nearest whole
    number and takes that data value: the 5th data
    point is 42.
  • Some texts take the value 0.25 of the way from
    the 5th (42) to the 6th (43) value: 42.25.

40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
12
(b) Quartiles
  • Upper (or third) quartile (UQ) is the $\frac{3(n+1)}{4}$th
    value, or the $\frac{3(20+1)}{4} = 15.75$th value.
  • UQ is the 16th value: 48.
  • OR take the value 0.75 of the way from the 15th
    (48) to the 16th (48) value: 48.

40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
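
Both quartile conventions described above can be sketched in Python. This is a hedged illustration, not the text's own procedure: the function names are hypothetical, the treatment of a position that falls exactly half-way (averaging the two neighbouring values) is an assumption, and the positions are assumed to lie inside the data.

```python
battery_life = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                46, 46, 47, 48, 40, 48, 42, 48, 51, 50]
ordered = sorted(battery_life)
n = len(ordered)

def value_at_rounded_position(ordered_values, position):
    """Round a 1-based ranked position to the nearest whole number and take that value."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0.5:
        # Assumption: exactly half-way positions average the two neighbours.
        return (ordered_values[whole - 1] + ordered_values[whole]) / 2
    nearest = whole + 1 if fraction > 0.5 else whole
    return ordered_values[nearest - 1]

def value_at_interpolated_position(ordered_values, position):
    """Move the fractional part of the way from one ranked value to the next."""
    whole = int(position)
    fraction = position - whole
    lower = ordered_values[whole - 1]
    if fraction == 0:
        return lower
    upper = ordered_values[whole]
    return lower + fraction * (upper - lower)

lq_position = (n + 1) / 4        # 5.25 for n = 20
uq_position = 3 * (n + 1) / 4    # 15.75 for n = 20

lq_rounded = value_at_rounded_position(ordered, lq_position)
uq_rounded = value_at_rounded_position(ordered, uq_position)
lq_interpolated = value_at_interpolated_position(ordered, lq_position)
uq_interpolated = value_at_interpolated_position(ordered, uq_position)

print(lq_rounded, uq_rounded, lq_interpolated, uq_interpolated)
```

Software packages use several other quartile conventions, so small differences from Excel or other tools are to be expected.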
13
(c) Variance and standard deviation
  • The text uses a different formula, but a better
    computational formula for the variance is
    $S^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n - 1}$
  • This only requires that we find the sum of the data
    and the sum of the squared data to do the
    calculation.

14
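(c) Variance and standard deviation
  • $\sum X_i = 916$ and $\sum X_i^2 = 42154$, so
    $S^2 = \frac{42154 - \frac{916^2}{20}}{19} = \frac{201.2}{19} \approx 10.5895$
15
(c) Variance and standard deviation
  • $S = \sqrt{10.5895} \approx 3.2541$ hours
Note: to avoid rounding errors, don't round until
the final stage!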
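
As a check on part (c), here is a minimal Python sketch of the computational formula, which needs only the sum of the data and the sum of the squared data. It is an illustration rather than the text's method; the comparison against `statistics.variance` is included only to confirm the two approaches agree.

```python
from math import sqrt
from statistics import variance

battery_life = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                46, 46, 47, 48, 40, 48, 42, 48, 51, 50]

n = len(battery_life)
sum_x = sum(battery_life)                          # sum of the data
sum_x_squared = sum(x * x for x in battery_life)   # sum of the squared data

# Sample variance: (sum of squares - (sum)^2 / n) / (n - 1)
sample_variance = (sum_x_squared - sum_x ** 2 / n) / (n - 1)
sample_sd = sqrt(sample_variance)

# Keep full precision until the final stage, then round for reporting.
print(round(sample_variance, 4), round(sample_sd, 4))
print(abs(sample_variance - variance(battery_life)) < 1e-9)
```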
16
Interpreting variance and standard deviation
  • Variance and standard deviation measure average
    scatter around the mean.
  • Variance results in squared units (e.g. squared
    dollars, squared metres, etc.).
  • Therefore we usually interpret the standard
    deviation, which is measured in the original units
    (dollars, metres, etc.).
  • Much of the data is within one standard deviation
    either side of the mean (more on this later).
  • Measures of variation (variance, standard
    deviation, range and IQR) are always greater than
    or equal to zero.

17
Population parameters
  • The formulae we have been using calculate the
    mean and standard deviation for samples of data.
  • If you had all the data from the population (i.e.
    not just a sample), the following formulae would
    be used:
    $\mu = \frac{\sum X_i}{N}$ and $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$

18
Coefficient of variation
  • Coefficient of variation (CV) is
    $CV = \frac{S}{\bar{X}} \times 100\%$
  • This is a relative measure of variation: it
    measures scatter of the data relative to the
    mean.
  • It allows comparison of variability between
    variables with different units of measure.

19
Coefficient of variation
  • Imagine two data sets
  • Although the standard deviations are equal, we
    can't say the distributions have the same
    relative variation.
  • Distribution one shows greater relative variation
    than distribution two.

20
(c) Coefficient of variation
  • Back to the example: the coefficient of variation
    (CV) is $CV = \frac{S}{\bar{X}} \times 100\% = \frac{3.2541}{45.8} \times 100\% \approx 7.1\%$
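
The coefficient of variation for the battery data can be reproduced with a short Python sketch. It assumes the usual definition (sample standard deviation divided by the sample mean, expressed as a percentage); the variable names are illustrative.

```python
from statistics import mean, stdev

battery_life = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                46, 46, 47, 48, 40, 48, 42, 48, 51, 50]

# CV = (sample standard deviation / sample mean) * 100%
cv_percent = stdev(battery_life) / mean(battery_life) * 100

print(round(cv_percent, 1))
```

Because the units cancel, the same calculation can be used to compare the spread of variables measured on different scales.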

21
(c) Interquartile range and range
  • Interquartile range: $IQR = UQ - LQ = 48 - 42 = 6$ hours
  • Range $= \text{maximum} - \text{minimum} = 51 - 40 = 11$ hours

22
(d) Box-and-Whisker Plot
  • For a box-and-whisker plot we need five values:
    the minimum, lower quartile, median, upper
    quartile and the maximum.
  • For our data set these are 40, 42, 46, 48 and 51,
    respectively.
  • Create the central box with vertical lines at the
    lower quartile, median and upper quartile.
  • Plot lines out from this box to the minimum and
    maximum values.
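
The five values needed for the plot, together with the interquartile range and range, can be collected in one place. This Python sketch assumes the rounded-position quartile rule described for the text (positions exactly half-way are rounded up here, which is an assumption); `ranked_value` is a hypothetical helper.

```python
from statistics import median

battery_life = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                46, 46, 47, 48, 40, 48, 42, 48, 51, 50]
ordered = sorted(battery_life)
n = len(ordered)

def ranked_value(position):
    """Value at the nearest whole 1-based ranked position (half-way rounds up: an assumption)."""
    return ordered[int(position + 0.5) - 1]

five_number_summary = {
    "minimum": ordered[0],
    "lower quartile": ranked_value((n + 1) / 4),
    "median": median(ordered),
    "upper quartile": ranked_value(3 * (n + 1) / 4),
    "maximum": ordered[-1],
}

interquartile_range = (five_number_summary["upper quartile"]
                       - five_number_summary["lower quartile"])
data_range = five_number_summary["maximum"] - five_number_summary["minimum"]

print(five_number_summary, interquartile_range, data_range)
```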

23
(d) Box-and-Whisker Plot
[Figure: box-and-whisker plot of battery life in hours, on an axis running from 40 to 51]
24
(e) Interpretation
  • If the manufacturer intended to issue a
    statement saying that their batteries will last
    more than 50 hours, what would you advise them?
    Why?
  • Mean, median and mode are all less than 50 hours,
    so it is not wise to make this claim.
  • Only 3 of the 20 observations (15%) are 50 hours
    or more, and only 2 (10%) are over 50 hours!

25
(f) Changed data: measures of central tendency
  • (f) Suppose the first value was 142 instead of
    42. Repeat (a) and comment on the differences.

26
(f) Changed data: measures of central tendency
40 42 42 42 43 44 45 45 46 46
47 48 48 48 48 48 50 51 51 142
  • Mean $= \frac{1016}{20} = 50.8$ (up from 45.8)
  • Since the data order has changed, the median
    becomes 46.5 (half-way between the 10th value, 46,
    and the 11th value, 47)
  • Mode = 48 (as before)

27
(g) Changed data: measures of spread
28
(g) Changed data: measures of spread
29
(g) Changed data: measures of spread
  • The range, variance and standard deviation have
    increased dramatically (the interquartile range is
    barely affected); remember, only one data point
    changed!
  • We would have interpreted the shape of this data
    distribution very differently if we had not known
    these figures were so affected by one data value:
    it is important to always check the data carefully.
  • A stem-and-leaf plot or histogram would have
    helped us identify the outlier.
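
The effect of the single changed value can be seen by summarising both data sets side by side. This is a minimal Python sketch; `summarise` is a hypothetical helper written for this comparison.

```python
from statistics import mean, median, stdev

original = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
            46, 46, 47, 48, 40, 48, 42, 48, 51, 50]
# Part (f): the first value is recorded as 142 instead of 42.
changed = [142] + original[1:]

def summarise(values):
    """Return the measures of centre and spread discussed in parts (a), (c), (f) and (g)."""
    return {
        "mean": mean(values),
        "median": median(values),
        "standard deviation": stdev(values),
        "range": max(values) - min(values),
    }

original_summary = summarise(original)
changed_summary = summarise(changed)

for measure in original_summary:
    print(measure, original_summary[measure], changed_summary[measure])
```

The median barely moves, while the mean, standard deviation and range all react strongly to the one outlying value.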

30
(h) Description of distribution
  • (h) How would you describe the shape of the
    original data set? The revised data set?
  • The best way to do this would be a graph (e.g.
    produce a stem-and-leaf diagram or a histogram).
  • First data set: the mean (45.8) and median (46)
    are very similar, so the data is fairly symmetrical.
31
(h) Description of distribution
  • Modified data set: the mean (50.8) is greater
    than the median (46.5), so the data
    reveals a slight right skew (although we know an
    outlier caused this result).

32
Shape
Negative or left-skewness: Mean < Median
Symmetry or zero-skewness: Mean = Median
Positive or right-skewness: Mean > Median
33
Geometric mean
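  • Geometric mean is the nth root of the product of
    n values: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
  • Geometric mean rate of return is
    $\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$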

34
Example 3-2
  • The total rate of return (%) of three blue-chip
    stocks is given in the table below for the years
    2003, 2004 and 2005.
  • (a) Calculate the geometric mean rate of return
    for each stock.
  • (b) Compare these results.

Year Stock A Stock B Stock C
2003 3.64 1.12 -0.25
2004 2.32 1.70 1.03
2005 0.09 -3.50 2.08
35
Solution 3-2
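  • The geometric mean rate of return is given by
    $\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$
  • where the $R_i$ are expressed as decimals.
  • Stock A: $[(1.0364)(1.0232)(1.0009)]^{1/3} - 1 \approx 0.0201$
  • Stock B: $[(1.0112)(1.0170)(0.9650)]^{1/3} - 1 \approx -0.0025$
  • Stock C: $[(0.9975)(1.0103)(1.0208)]^{1/3} - 1 \approx 0.0095$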

36
Solution 3-2
  • (b) Stock A: 2.01%
  • Stock B: -0.25%
  • Stock C: 0.95%
  • Stock B has the worst rate of return (due to the
    negative value in 2005)
  • Stock A has the best rate of return (positive, but
    with a considerable drop in 2005)
  • Stock C shows an increasing rate of return over the
    three years, which may actually make it a better
    choice!
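
The three geometric mean rates of return can be reproduced with a few lines of Python. This sketch assumes the table values are percentages, which are converted to decimals before the growth factors are multiplied; `geometric_mean_rate` is a hypothetical helper name.

```python
# Yearly total rates of return (%) for 2003, 2004 and 2005 (Example 3-2).
returns_percent = {
    "Stock A": [3.64, 2.32, 0.09],
    "Stock B": [1.12, 1.70, -3.50],
    "Stock C": [-0.25, 1.03, 2.08],
}

def geometric_mean_rate(rates_percent):
    """Geometric mean rate of return, in percent, from yearly rates given in percent."""
    growth = 1.0
    for rate in rates_percent:
        growth *= 1 + rate / 100          # convert each rate to a decimal growth factor
    return (growth ** (1 / len(rates_percent)) - 1) * 100

geometric_rates = {stock: geometric_mean_rate(rates)
                   for stock, rates in returns_percent.items()}

for stock, rate in geometric_rates.items():
    print(stock, round(rate, 2))
```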

37
Population parameters
  • Recall the following formulae for the population mean
    and standard deviation:
    $\mu = \frac{\sum X_i}{N}$ and $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
  • Greek letters are used to indicate population
    parameters (µ = mu, σ = sigma) and Roman letters for
    sample statistics ($\bar{X}$, S).

38
Empirical Rule
  • In bell-shaped distributions (symmetrical, mean =
    median):
  • approximately 68% of observations are within 1 standard
    deviation of the mean
  • approximately 95% of observations are within 2 standard
    deviations of the mean
  • approximately 99.7% of observations are within 3 standard
    deviations of the mean

39
Bienaymé-Chebyshev Rule
  • For any data set (i.e. not just bell-shaped
    distributions), the percentage of observations
    that are within k standard deviations of the mean
    is at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$
  • Often this rule is simply called Chebyshev's
    rule: a little bit easier to say!

40
Example 3-3
  • Returning to the data set in Example 3-1, answer
    the following questions.
  • (a) According to the Bienaymé-Chebyshev Rule, what
    percentage of these battery lives are expected to
    be within 1 standard deviation of the mean?
    Within 2 standard deviations of the mean?
    Within 3 standard deviations of the mean?

41
(a) Bienaymé-Chebyshev Rule
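  • Given k = 1, $\left(1 - \frac{1}{1^2}\right) \times 100\% = 0\%$,
    so at least 0% of observations are expected to
    be within 1 standard deviation of the mean (not
    very helpful!).
  • For k = 2, $\left(1 - \frac{1}{2^2}\right) \times 100\% = 75\%$, so at least
    75% of observations are within 2 standard
    deviations of the mean.
  • For k = 3, $\left(1 - \frac{1}{3^2}\right) \times 100\% \approx 88.89\%$, so at least
    88.89% of observations are within 3 standard
    deviations of the mean.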
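
The bound used in part (a) is easy to tabulate. The following Python sketch evaluates the Bienaymé-Chebyshev lower bound for k = 1, 2 and 3; the function name and the choice to reject k below 1 (where the bound would be negative and uninformative) are assumptions made for this illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound is only informative for k of at least 1")
    return (1 - 1 / k ** 2) * 100

bounds = {k: chebyshev_lower_bound(k) for k in (1, 2, 3)}

for k, bound in bounds.items():
    print(k, round(bound, 2))
```

For comparison, the empirical rule quotes about 68%, 95% and 99.7% for the same three intervals, but only for bell-shaped distributions.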

42
Example 3-3
  • (b) Assume that the manufacturer knows that the
    mean life of the population of batteries is 48.2
    hours and the standard deviation of the
    population of batteries is 3.1 hours.
  • What percentage of data values are actually
    within 1 standard deviation of the mean?
  • Within 2 standard deviations of the mean?
  • Within 3 standard deviations of the mean?

43
(b) Data within intervals
40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
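Within one standard deviation means 48.2 ± 3.1, i.e. from 45.1 to 51.3 hours.
So 55% of the data (11 of the 20 values) is within one standard
deviation of the mean.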
44
(b) Data within intervals
40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
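Within two standard deviations means 48.2 ± 2(3.1), i.e. from 42.0 to 54.4 hours.
So 95% of the data (19 of the 20 values, counting the four values
of 42 that sit on the lower limit) is within two standard
deviations of the mean.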
45
(b) Data within intervals
40 42 42 42 42 43 44 45 45 46
46 47 48 48 48 48 48 50 51 51
Within three standard deviations means 48.2 ± 3(3.1), i.e. from 38.9 to 57.5 hours.
So all (100%) of the data is within three
standard deviations of the mean.
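
The counts for part (b) can be verified with a short Python sketch. It uses the population mean (48.2) and standard deviation (3.1) given in the question and treats values that fall exactly on an interval limit as inside the interval, which is an assumption. `Decimal` is used so that the limit 42.0 is compared exactly rather than through floating-point rounding.

```python
from decimal import Decimal

battery_life = [42, 42, 48, 45, 51, 45, 48, 44, 43, 42,
                46, 46, 47, 48, 40, 48, 42, 48, 51, 50]

population_mean = Decimal("48.2")
population_sd = Decimal("3.1")

def percent_within(values, k):
    """Percentage of values within k standard deviations of the mean (limits included)."""
    lower = population_mean - k * population_sd
    upper = population_mean + k * population_sd
    inside = [x for x in values if lower <= x <= upper]
    return 100 * len(inside) / len(values)

actual_percentages = {k: percent_within(battery_life, k) for k in (1, 2, 3)}

print(actual_percentages)
```

If values on the limit were excluded instead, the four observations of 42 would drop out of the two-standard-deviation interval, so the convention matters for this data set.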
46
Example 3-3
  • (c) Discuss the difference in your answers to (a)
    and (b).
  • The Bienaymé-Chebyshev Rule applies to any
    distribution; it is a worst case.
  • It says at least 0% within 1 standard deviation,
    at least 75% within 2 standard deviations and at
    least 88.89% within 3 standard deviations.
  • The data set we examined has less spread than
    this worst case rule.

47
Example 3-4
  • A real estate agency is worried that many of
    their agents are using poor sales techniques and
    that this is having a negative impact on sales.
  • They believe this is because many of their
    agents received very low scores on their
    compulsory training course exam (an exam which is
    sat prior to beginning employment with the
    agency).
  • They randomly select 10 of their agents,
    recording their exam score (out of 200) and the
    number of sales they made in the year 2005.

48
(a) Produce a scatterplot of the data. Does
there appear to be any correlation between exam
score and sales? Explain.
Score Sales
185 212
122 143
157 184
165 182
183 201
191 235
121 154
158 187
166 178
102 146
49
[Figure: scatterplot of Sales against Exam Score]
50
(a) Graph discussion
  • Higher exam scores appear to correspond to higher
    sales figures.
  • Therefore there appears to be positive
    correlation between exam scores and sales figures.

51
Hints on producing graphs
  • Important: always include the following on a
    graph:
  • descriptive labels for both the x and y axes (in
    this example, Exam Score and Sales)
  • numbers on both axes to indicate the scale
  • a title
  • Truncate the axes only if it doesn't violate
    principles of graphical excellence!

52
(b) Compute the correlation coefficient.
Comment on this value and its meaning for the
real estate agency.
Score Sales
185 212
122 143
157 184
165 182
183 201
191 235
121 154
158 187
166 178
102 146
53
(b) Correlation
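  • Using the computational formula
    $r = \frac{\sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n}}{\sqrt{\left(\sum X_i^2 - \frac{(\sum X_i)^2}{n}\right)\left(\sum Y_i^2 - \frac{(\sum Y_i)^2}{n}\right)}}$
  • we need to find the values $\sum X_i$, $\sum Y_i$, $\sum X_i^2$, $\sum Y_i^2$
  • and $\sum X_i Y_i$.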

54
(b) Correlation
55
(b) Correlation
Note the amount of working required to use the
form of the formula that the text uses; see p.
3-12 of the study guide.
56
(b) Correlation
  • r = 0.934265 is close to 1, indicating strong positive
    correlation between exam scores and sales
    figures.
  • Interpretation for the real estate agency:
  • Allow students (agents) to re-sit the exam to
    (possibly) improve their sales performance.
  • Use the exam as a pre-screening tool when
    employing potential agents.
  • Be very careful: correlation does not imply
    causation!
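
The correlation coefficient for part (b) can be reproduced from five sums. This Python sketch uses one common computational form of the formula (sums of squares and cross-products corrected by the totals); it is offered as a check, not as the exact layout of the working on the slides.

```python
from math import sqrt

exam_scores = [185, 122, 157, 165, 183, 191, 121, 158, 166, 102]
sales = [212, 143, 184, 182, 201, 235, 154, 187, 178, 146]

n = len(exam_scores)
sum_x = sum(exam_scores)
sum_y = sum(sales)
sum_x_squared = sum(x * x for x in exam_scores)
sum_y_squared = sum(y * y for y in sales)
sum_xy = sum(x * y for x, y in zip(exam_scores, sales))

# Corrected sums of squares and cross-products.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x_squared - sum_x ** 2 / n
ss_yy = sum_y_squared - sum_y ** 2 / n

r = ss_xy / sqrt(ss_xx * ss_yy)

print(round(r, 6))
```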

57
Covariance
  • The covariance is found via
    $cov(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
  • Used in the calculation of correlation for the
    formula used in the text: $r = \frac{cov(X, Y)}{S_X S_Y}$
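
For the exam score and sales data from Example 3-4, the sample covariance and the correlation built from it can be sketched as follows. The code assumes the usual sample covariance (divisor n - 1) and the relationship r = cov(X, Y) / (S_X S_Y); the variable names are illustrative.

```python
from statistics import mean, stdev

exam_scores = [185, 122, 157, 165, 183, 191, 121, 158, 166, 102]
sales = [212, 143, 184, 182, 201, 235, 154, 187, 178, 146]

n = len(exam_scores)
mean_x = mean(exam_scores)
mean_y = mean(sales)

# Sample covariance: sum of the products of deviations, divided by n - 1.
covariance = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(exam_scores, sales)) / (n - 1)

# Correlation: covariance scaled by the two sample standard deviations.
r = covariance / (stdev(exam_scores) * stdev(sales))

print(round(covariance, 2), round(r, 6))
```

The covariance carries the units of both variables (score times sales), which is why the unit-free correlation is usually the value that gets interpreted.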

58
Descriptive measures from a frequency distribution
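  • Approximate mean: $\bar{X} = \frac{\sum m_j f_j}{n}$
  • Approximate standard deviation: $S = \sqrt{\frac{\sum (m_j - \bar{X})^2 f_j}{n - 1}}$
  • where $m_j$ is the midpoint and $f_j$ the frequency of class j, and n is the total frequency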

59
Example 3-5
  • Participants at a recent "accounting for small
    businesses" workshop were asked to complete an
    anonymous survey.
  • The table below contains data taken from this
    survey: a frequency distribution of the number of
    staff employed by each of the 50 small businesses
    in attendance at the workshop.
  • Note that fractional (part-time) staff were
    recorded in this survey, so for example 2.75
    staff could mean two full-time staff and one staff
    member employed for ¾ of the hours in a working
    week.

60
Example 3-5
Class Frequency
0 to less than 5 16
5 to less than 10 19
10 to less than 15 5
15 to less than 20 7
20 to less than 25 2
25 to less than 30 1
Approximate the arithmetic mean and standard
deviation of the number of staff.
61
Example 3-5
Class Frequency fj Midpoint mj mj fj
0 to less than 5 16
5 to less than 10 19
10 to less than 15 5
15 to less than 20 7
20 to less than 25 2
25 to less than 30 1
62
Example 3-5
Class Frequency fj Midpoint mj mj fj
0 to less than 5 16 2.5
5 to less than 10 19 7.5
10 to less than 15 5 12.5
15 to less than 20 7 17.5
20 to less than 25 2 22.5
25 to less than 30 1 27.5
63
Example 3-5
Class Frequency fj Midpoint mj mj fj
0 to less than 5 16 2.5 16 × 2.5 = 40
5 to less than 10 19 7.5 19 × 7.5 = 142.5
10 to less than 15 5 12.5 62.5
15 to less than 20 7 17.5 122.5
20 to less than 25 2 22.5 45
25 to less than 30 1 27.5 27.5
64
Example 3-5
Class Frequency fj Midpoint mj mj fj
0 to less than 5 16 2.5 16 × 2.5 = 40
5 to less than 10 19 7.5 19 × 7.5 = 142.5
10 to less than 15 5 12.5 62.5
15 to less than 20 7 17.5 122.5
20 to less than 25 2 22.5 45
25 to less than 30 1 27.5 27.5
Total 50 440
65
Example 3-5
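
The approximate mean and standard deviation for the grouped staff data can be sketched in Python. This assumes the standard grouped-data approach (every observation in a class is treated as if it sat at the class midpoint, and the variance uses the divisor n - 1); the variable names are illustrative.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class in Example 3-5.
classes = [(0, 5, 16), (5, 10, 19), (10, 15, 5),
           (15, 20, 7), (20, 25, 2), (25, 30, 1)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
frequencies = [frequency for _, _, frequency in classes]

n = sum(frequencies)                                              # total frequency
sum_mf = sum(m * f for m, f in zip(midpoints, frequencies))       # total of midpoint x frequency

approx_mean = sum_mf / n
approx_variance = sum(f * (m - approx_mean) ** 2
                      for m, f in zip(midpoints, frequencies)) / (n - 1)
approx_sd = sqrt(approx_variance)

print(n, sum_mf, approx_mean, round(approx_sd, 2))
```

These are approximations only: the original observations are not available, so the result depends on how well the midpoints represent each class.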
66
Text sections on CD
  • Remember to print out Section 3.7 (Obtaining
    descriptive summary measures from a frequency
    distribution) of the text (on the CD) so that
    you can bring it in to the exam room!

67
Pitfalls and ethical issues
  • Interpretation of numerical values is subjective
    (although the actual calculations are objective).
  • Knowing the shape of the distribution can
    influence the choice of descriptive measures that
    you use.
  • For example the centre of a skewed data set might
    be best described by the median rather than the
    mean.
  • Report results accurately but in a neutral and
    objective manner.

68
Pitfalls and ethical issues
  • Report both good and bad results.
  • Poor presentation is not necessarily the same as
    unethical presentation of results.
  • Unethical behaviour occurs when
  • an inappropriate summary method is chosen
    wilfully, or
  • selected findings are not reported because
    they would not support a particular position.

69
After the lecture each week
  • Review the lecture material
  • Complete all readings
  • Complete all of the recommended problems (listed in
    the study guide) from the textbook
  • Complete at least some of the additional problems
  • Consider (briefly) the discussion points prior to
    tutorials