Title: Lecture 3: Thu, Sep 12
1 Lecture 3: Thu, Sep 12
- Announcements
- HW1 due Thu, Sept 19 at start of class
- Today's Plan
- JMP-IN
- Read Chapter 3 (provides guidelines for the
application of graphical methods)
- Measures of Centrality (Start of Chapter 4)
- Measures of Variability
2 Intro to JMP-IN
- Importing and editing datasets
- Bar and pie charts
- Histograms and stem-and-leaf plots
- Scatter plots
- Saving and printing graphs
3 Measures of Central Location
- Usually, we focus our attention on two types of
measures when describing population
characteristics:
- Central location (e.g. average)
- Variability or spread
The measure of central location reflects the
locations of all the actual data points.
4 Measures of Central Location
- The measure of central location reflects the
locations of all the actual data points.
- How?
With two data points, the central location
should fall midway between them (in order
to reflect the location of both of them).
But if a third data point appears to the left
of that midpoint, it should pull the
central location to the left.
5 Measures of Centrality
- Mean (average)
- Median (middle number)
- Mode (most frequent value)
6 The Arithmetic Mean
- This is the most popular and useful measure of
central location
7 The Arithmetic Mean
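Sample mean: \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, where n is the sample size
Population mean: \mu = \frac{\sum_{i=1}^{N} x_i}{N}, where N is the population size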
8 The Arithmetic Mean
The reported times on the Internet of 10 adults
are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find
the mean time on the Internet.
\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{0 + 7 + \cdots + 22}{10} = \frac{110}{10} = 11.0
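The arithmetic in this example can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are our own and are not part of the lecture.

```python
from statistics import mean

# Reported Internet hours for the 10 adults in the example.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Sample mean: sum of the observations divided by the sample size n.
n = len(hours)
sample_mean = sum(hours) / n

print(sample_mean)   # 11.0

# The standard library gives the same value.
library_mean = mean(hours)
```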
9 The Median
- The Median of a set of observations is the value
that falls in the middle when the observations
are arranged in order of magnitude.
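Odd number of observations: the median is the middle value.
0, 0, 5, 7, 8, 9, 12, 14, 22 (median = 8)
Even number of observations: the median is the average of the two middle values.
0, 0, 5, 7, 8, 9, 12, 14, 22, 33 (median = (8 + 9)/2 = 8.5)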
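The two cases (odd and even number of observations) can be sketched as a small Python function. The function name `middle_value` is hypothetical and chosen only for this illustration; the standard library's `statistics.median` does the same job.

```python
def middle_value(values):
    """Median: sort, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [0, 0, 5, 7, 8, 9, 12, 14, 22]        # 9 observations
even_sample = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]   # 10 observations

print(middle_value(odd_sample))    # 8
print(middle_value(even_sample))   # 8.5
```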
10 The Mode
- The Mode of a set of observations is the value
that occurs most frequently.
- A set of data may have one mode (or modal class),
or two or more modes.
For large data sets the modal class is much more
relevant than a single-value mode.
[Figure label: the modal class]
11 The Mode
- Example 4.5: Find the mode for the data in Example
4.1. Here are the data again: 0, 7, 12, 5, 33,
14, 8, 0, 9, 22
- Solution
- All observations except 0 occur once. There are
two 0s. Thus, the mode is zero.
- Is this a good measure of central location?
- The value 0 does not reside at the center of
this set (compare with the mean, 11.0, and the
median, 8.5).
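Finding the mode by counting can be sketched in Python as follows. This is a minimal illustration with `collections.Counter`; it returns a list so that data sets with two or more modes are handled as well.

```python
from collections import Counter

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Count how often each value occurs, then keep the value(s)
# with the highest count.
counts = Counter(hours)
highest_count = max(counts.values())
modes = [value for value, count in counts.items() if count == highest_count]

print(modes)           # [0]
print(highest_count)   # 2
```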
12 Relationship among Mean, Median, and Mode
- If a distribution is symmetrical, the mean,
median and mode coincide.
- If a distribution is asymmetrical, and skewed to
the left or to the right, the three measures
differ.
[Figure: a positively skewed distribution (skewed to the
right), with mode < median < mean]
13 Relationship among Mean, Median, and Mode
- If a distribution is symmetrical, the mean,
median and mode coincide.
- If a distribution is not symmetrical, and skewed
to the left or to the right, the three measures
differ.
[Figure: a negatively skewed distribution (skewed to the
left), with mean < median < mode, shown beside a positively
skewed distribution (skewed to the right), with
mode < median < mean]
14 Measures of Centrality in JMP-IN
- Mean: given in Moments
- Median: given in Quantiles
- Mode
- Small dataset: sort observations and obtain by
inspection
- Large dataset: plot histogram, reduce class size
and highlight bins with >1 observation.
- JMP-IN
15 Relationship between Mean, Median and Mode
- Question: What happens to Mean, Median, Mode if
we
- Add a constant to every observation?
- For a symmetric histogram
- mean = median = mode
- For a positively-skewed histogram
- mode < median < mean
- For a negatively-skewed histogram
- mean < median < mode
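The question about adding a constant, and the ordering for a positively-skewed data set, can both be explored with the Internet data from the earlier example. The sketch below is only illustrative; the constant 5 is an arbitrary assumption, not a value from the lecture.

```python
from statistics import mean, median, mode

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

shift = 5  # arbitrary constant, chosen only for illustration
shifted = [x + shift for x in hours]

before = (mean(hours), median(hours), mode(hours))
after = (mean(shifted), median(shifted), mode(shifted))

print("mean, median, mode before:", before)
print("mean, median, mode after: ", after)
```

In this data set each of the three measures moves by exactly the constant added, and the original values satisfy mode < median < mean, as expected for data with a long right tail.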
16 Symmetric Distribution
17 Positively-Skewed Distribution
18 Negatively-Skewed Distribution
19 Which measure to use?
- Which of the three measures can be used to
summarize the following data types?
- Quantitative
- Ranked
- Qualitative
- Which of the 3 (mean, median, mode) is most
sensitive to extreme values?
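One way to explore the last question is to change a single observation and watch each measure. The sketch below reuses the Internet data and replaces the largest value, 33, with a hypothetical 330; that replacement is an assumption made only for illustration.

```python
from statistics import mean, median, mode

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Hypothetical extreme value: replace 33 with 330.
with_outlier = [330 if x == 33 else x for x in hours]

original = (mean(hours), median(hours), mode(hours))
changed = (mean(with_outlier), median(with_outlier), mode(with_outlier))

print("mean, median, mode (original):    ", original)
print("mean, median, mode (with outlier):", changed)
```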
20 The Geometric Mean
- This is a measure of the average growth rate.
- Let R_i denote the rate of return in period i
(i = 1, 2, ..., n). The geometric mean of the returns
R_1, R_2, ..., R_n is the constant R_g that produces the
same terminal wealth at the end of period n as do
the actual returns for the n periods.
21 The Geometric Mean
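For the given series of rates of return, the nth
period return is calculated by
(1 + R_1)(1 + R_2) \cdots (1 + R_n)
If the rate of return was R_g in every period, the
nth period return would be calculated by
(1 + R_g)^n
R_g is selected such that
(1 + R_g)^n = (1 + R_1)(1 + R_2) \cdots (1 + R_n), i.e. R_g = \sqrt[n]{(1 + R_1)(1 + R_2) \cdots (1 + R_n)} - 1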
22 The Geometric Mean
- Example
- A firm's sales were 1,000 three years ago.
- Sales have grown annually by 20%, 10%, -5%.
- Find the geometric mean rate of growth in sales.
- Solution
- Since R_g is the geometric mean,
- (1 + R_g)^3 = (1.2)(1.1)(1 - 0.05) = 1.2540
- Thus, R_g = \sqrt[3]{1.2540} - 1 \approx 0.0784, or about 7.84% per year.
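The sales example can be worked through in Python as a check. This is a minimal sketch; the variable names are our own, and `math.prod` requires Python 3.8 or later.

```python
import math

# Annual growth rates from the example: 20%, 10%, -5%.
growth_rates = [0.20, 0.10, -0.05]
n = len(growth_rates)

# Compound growth factor over the n periods: (1.2)(1.1)(0.95).
growth_factor = math.prod(1 + r for r in growth_rates)

# Geometric mean: the constant rate giving the same compound growth.
geometric_mean = growth_factor ** (1 / n) - 1

print(round(growth_factor, 4))    # 1.254
print(round(geometric_mean, 4))   # 0.0784
```

Growing at the geometric mean rate in each of the three periods reproduces the same compound factor of 1.254, which is the defining property of R_g.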
23 Measures of variability
- Measures of central location fail to tell the
whole story about the distribution.
- A question of interest still remains unanswered:
How much are the observations spread out around
the mean value?
24 Measures of variability
Observe two hypothetical data sets
Small variability
The average value provides a good representation
of the observations in the data set.
This data set is now changing to...
25 Measures of variability
Observe two hypothetical data sets
Small variability
The average value provides a good representation
of the observations in the data set.
Larger variability
The same average value does not provide as good
a representation of the observations in the data
set as before.
26 Measures of Variability
- Range
- Variance
- Standard Deviation (SD)
- Coefficient of Variation (CV)
- (Other terms for variability: volatility, risk)
27 The range
- The range of a set of observations is the
difference between the largest and smallest
observations.
- Its major advantage is the ease with which it can
be computed.
- Its major shortcoming is its failure to provide
information on the dispersion of the observations
between the two end points.
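Both the advantage and the shortcoming can be seen in a short Python sketch. The two data sets below are hypothetical, made up only to illustrate the point: they share the same end points but are arranged very differently in between.

```python
def value_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)

# Hypothetical data sets (not from the lecture) with the same end points.
clustered = [0, 16, 16, 17, 17, 33]   # interior values bunched in the middle
spread = [0, 5, 12, 21, 28, 33]       # interior values spread out evenly

print(value_range(clustered))   # 33
print(value_range(spread))      # 33
```

The range is identical for both, so it cannot distinguish how the observations between the end points are dispersed.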
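28 Variance
- This measure reflects the dispersion of all the
observations.
- Population: \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
- Sample: s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}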
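The difference between the population and sample versions can be sketched in Python with the Internet data from the earlier example. The sketch assumes the usual definitions: the sum of squared deviations from the mean is divided by the number of observations for a population, and by one less than that for a sample.

```python
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

n = len(hours)
mean_hours = sum(hours) / n

# Sum of squared deviations from the mean uses every observation.
sum_sq_dev = sum((x - mean_hours) ** 2 for x in hours)

# Treating the data as a whole population: divide by the number of values.
population_variance = sum_sq_dev / n

# Treating the data as a sample: divide by n - 1.
sample_variance = sum_sq_dev / (n - 1)

print(sum_sq_dev)                   # 922.0
print(population_variance)          # 92.2
print(round(sample_variance, 2))    # 102.44
```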
29 Variance Shortcut Formulas
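The formulas on this slide did not survive in the transcript. A commonly used shortcut for the sample variance, assumed here to be the one intended, avoids computing each deviation: s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]. The Python sketch below checks, for the Internet data, that this shortcut agrees with the definition.

```python
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
n = len(hours)

# Shortcut (assumed form): only the sum and the sum of squares are needed.
sum_x = sum(hours)
sum_x_squared = sum(x ** 2 for x in hours)
shortcut_variance = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

# Definition: squared deviations from the sample mean, divided by n - 1.
x_bar = sum_x / n
definition_variance = sum((x - x_bar) ** 2 for x in hours) / (n - 1)

print(sum_x, sum_x_squared)          # 110 2132
print(round(shortcut_variance, 2))   # 102.44
```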
30 Standard Deviation
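The content of this slide is missing from the transcript. The standard deviation is the square root of the variance, which puts the measure back in the units of the original observations. A minimal Python sketch for the Internet data, using the standard library:

```python
import math
from statistics import pvariance, variance

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Standard deviation = square root of the corresponding variance.
sample_sd = math.sqrt(variance(hours))        # sample version (n - 1)
population_sd = math.sqrt(pvariance(hours))   # population version (N)

print(round(sample_sd, 2))
print(round(population_sd, 2))
```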
31 Toy Example 1
- Consider the following three datasets
33 Questions
- Compare the means of the three distributions.
- Which variable has the largest range?
- Which one has the largest standard deviation?
34 Calculations for x
35 Variance Calculation (Shortcut Formula)
36 Calculations for y
37 Calculations for z
38 The Coefficient of Variation
- The coefficient of variation of a set of
measurements is the standard deviation divided by
the mean value: CV = s / \bar{x} for a sample, or \sigma / \mu for a population.
- This coefficient provides a proportionate measure
of variation.
A standard deviation of 10 may be perceived as large
when the mean value is 100, but only moderately
large when the mean value is 500.
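The comparison in this example can be made concrete with a short Python sketch. The function name is hypothetical and used only for illustration.

```python
def coefficient_of_variation(std_dev, mean_value):
    """CV: standard deviation divided by the mean value."""
    return std_dev / mean_value

# Same standard deviation, different mean values (from the example above).
cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)

print(cv_mean_100)   # 0.1
print(cv_mean_500)   # 0.02
```

The same standard deviation is 10% of the mean in the first case but only 2% in the second.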
39 Units of Variability
- If the original observations are in dollars, what
are the units for
- Range
- Standard deviation
- Variance
- Coefficient of variation
40 Measures of Variability
- (a) Range
- (b) Standard Deviation
- (c) Variance
- (d) Coefficient of Variation
- Question: What happens to (a), (b), (c), (d) when we
- Add a constant to every observation?
- Multiply every observation by a constant?
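These two questions can be explored numerically. The Python sketch below reuses the Internet data from the earlier example; the constants 5 and 3 are arbitrary assumptions chosen only for illustration, and the helper name `spread_summary` is hypothetical.

```python
from statistics import mean, stdev, variance

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

def spread_summary(values):
    """Return range, SD, variance and CV, treating the data as a sample."""
    s = stdev(values)
    return {
        "range": max(values) - min(values),
        "sd": s,
        "variance": variance(values),
        "cv": s / mean(values),
    }

shift, scale = 5, 3   # arbitrary constants, for illustration only
original = spread_summary(hours)
shifted = spread_summary([x + shift for x in hours])
scaled = spread_summary([x * scale for x in hours])

for name, result in [("original", original), ("shifted", shifted), ("scaled", scaled)]:
    print(name, {key: round(value, 3) for key, value in result.items()})
```

In this sketch, adding the constant leaves the range, SD and variance unchanged but lowers the CV because the mean grows. Multiplying by a positive constant multiplies the range and SD by that constant and the variance by its square, and leaves the CV unchanged.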
41 Measures of Variability in JMP-IN
- Range: max and min given in Quantiles
- Standard Deviation: given in Moments
- Variance and CV: given in More Moments (under
Display Options).
- JMP-IN