Title: Central Tendency and Variability
- The two most essential features of a distribution
Questions
- Define
  - Mean
  - Median
  - Mode
- What is the effect of distribution shape on measures of central tendency?
- When might we prefer one measure of central tendency to another?
Questions (2)
- Define
  - Range
  - Average Deviation
  - Variance
  - Standard Deviation
- When might we prefer one measure of variability to another?
- What is a z score?
- What is the point of Tchebycheff's inequality?
Variables have distributions
- A variable is something that changes or has different values (e.g., anger).
- A distribution is a collection of measures, usually across people.
- Distributions of numbers can be summarized with numbers, called statistics (when computed on a sample) or parameters (when they describe a population).
Central Tendency refers to the Middle of the Distribution
Variability is about the Spread
1. Central Tendency: Mode, Median, Mean
- The mode: the most frequently occurring score. With grouped data, it is the midpoint of the most populous class interval. A distribution can have two modes (bimodal) or more (multimodal).
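A minimal Python sketch of finding the mode, assuming the scores are in a plain list; `modes` is a hypothetical helper name, not something from the slides. It returns every score tied for the highest frequency, so a bimodal or multimodal distribution reports all of its modes rather than just one.

```python
from collections import Counter

def modes(scores):
    """Hypothetical helper: all most-frequent scores, sorted ascending."""
    counts = Counter(scores)          # score -> how often it occurs
    top = max(counts.values())        # highest frequency (needs a non-empty list)
    return sorted(score for score, n in counts.items() if n == top)

print(modes([1, 2, 2, 3, 4]))      # [2]
print(modes([1, 1, 2, 3, 3, 4]))   # [1, 3]  (bimodal)
```

Python 3.8+ also ships `statistics.multimode`, which returns the tied modes in first-seen order instead of sorted order.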
Median
- The score that separates the top 50% of scores from the bottom 50% (after sorting the scores).
- With an even number of scores, the median is halfway between the two middle scores.
  - 1 2 3 4 5 6 7 8: median is 4.5
- With an odd number of scores, the median is the middle score.
  - 1 2 3 4 5 6 7: median is 4
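The odd/even rule can be sketched in a few lines of Python; `median` here is a hypothetical helper written for illustration (the standard library's `statistics.median` applies the same rule).

```python
def median(scores):
    """Hypothetical helper: middle score for odd N, midpoint of the two middle scores for even N."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd N: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: halfway between the two middle scores

print(median([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
print(median([1, 2, 3, 4, 5, 6, 7]))     # 4
```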
Mean
- Sum of scores divided by the number of people. The population mean is $\mu$ (mu) and the sample mean is $\bar{X}$ (X-bar).
- We calculate the sample mean by $\bar{X} = \frac{\sum X}{N}$
- We calculate the population mean by $\mu = \frac{\sum X}{N}$
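As a small sketch (with `mean` as a hypothetical helper name), the arithmetic is identical for a sample and for a population: sum the scores and divide by how many there are. Only the interpretation, $\bar{X}$ versus $\mu$, differs.

```python
def mean(scores):
    """Hypothetical helper: sum of scores divided by the number of scores."""
    return sum(scores) / len(scores)

print(mean([7, 8, 9, 10, 11]))   # 9.0
print(mean([5, 10, 15, 20, 25])) # 15.0
```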
Deviation from the mean
- Deviation score: the deviation of a raw score from the mean, $x = X - \bar{X}$. Deviations from the mean always sum to zero.
- Each set of raw scores below has a mean of 9:

Raw scores      Deviation scores
9               0
8 9 10          -1 0 1
7 8 9 10 11     -2 -1 0 1 2
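A short Python check of the table, using a hypothetical helper `deviations`: each set of raw scores is converted to deviation scores, and the deviations sum to zero in every case.

```python
def deviations(scores):
    """Hypothetical helper: deviation scores x = X - mean(X)."""
    m = sum(scores) / len(scores)
    return [x - m for x in scores]

for raw in ([9], [8, 9, 10], [7, 8, 9, 10, 11]):
    devs = deviations(raw)
    print(raw, devs, sum(devs))   # the last column is 0.0 for every set
```

With non-integer means, floating-point rounding can leave a sum like 1e-15 instead of exactly 0; the algebraic result is still zero.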
Comparison of mean, median and mode
- Mode
  - Good for nominal variables
  - Good if you need to know the most frequent observation
  - Quick and easy
- Median
  - Good for "bad" distributions (e.g., badly skewed or with outliers)
  - Good for distributions with an arbitrary ceiling or floor
Comparison of mean, median and mode
- Mean
  - Used for inference as well as description; the best estimator of the population parameter
  - Based on all data in the distribution
  - Generally preferred except for "bad" distributions. The most commonly used statistic for central tendency.
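Best Guess interpretations
- Mean: if we guess the mean for every score, the average signed error will be zero.
- Mode: guessing the mode will be exactly right more often than any other guess.
- Median: guessing the median gives the smallest average absolute error.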
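A minimal Python sketch of the three "best guess" criteria, using made-up, right-skewed scores chosen so that the mean, median, and mode all differ; `guess_errors` is a hypothetical helper. It uses the same guess for every score and reports the average signed error, average absolute error, and share of exact hits.

```python
from collections import Counter
from statistics import mean, median

def guess_errors(scores, guess):
    """Hypothetical helper: error summaries when `guess` is used for every score."""
    n = len(scores)
    signed = sum(x - guess for x in scores) / n          # average signed error
    absolute = sum(abs(x - guess) for x in scores) / n   # average absolute error
    exact = sum(x == guess for x in scores) / n          # share of exact hits
    return signed, absolute, exact

scores = [1, 1, 2, 3, 4, 5, 12]  # made-up, right-skewed scores
guesses = {
    "mean": mean(scores),                          # 4
    "median": median(scores),                      # 3
    "mode": Counter(scores).most_common(1)[0][0],  # 1
}
for name, g in guesses.items():
    signed, absolute, exact = guess_errors(scores, g)
    print(f"{name:6} signed={signed:.3f} abs={absolute:.3f} exact={exact:.3f}")
```

With these scores the mean gives zero average signed error. The median gives the smallest average absolute error (17/7 versus 18/7 for the mean), and the mode gives the most exact hits (2 of 7).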
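Expectation
- Discrete and continuous variables
- The mean is the expected value either way
- Discrete: $E(X) = \mu = \sum_i x_i \, p(x_i)$
- Continuous: $E(X) = \mu = \int_{-\infty}^{\infty} x \, f(x) \, dx$
- (The integral looks bad but just means take the average.)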
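The discrete case can be sketched in Python, assuming the distribution is given as a dictionary mapping each distinct value to its probability (`expected_value` is a hypothetical helper). Exact `Fraction` arithmetic avoids rounding. The continuous case replaces the sum with the integral and is not coded here.

```python
from fractions import Fraction

def expected_value(dist):
    """Hypothetical helper: discrete E(X) = sum of x * p(x) over distinct values x."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}   # fair six-sided die
print(expected_value(die))  # 7/2, i.e. 3.5

# A population of distinct scores, each with probability 1/N:
# the expected value is just the ordinary mean.
scores = [5, 10, 15, 20, 25]
population = {x: Fraction(1, len(scores)) for x in scores}
print(expected_value(population))  # 15
```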
Influence of Distribution Shape
Review
- What is central tendency?
- Mode
- Median
- Mean
2. Variability, a.k.a. Dispersion
- 4 statistics: Range, Average Deviation, Variance, Standard Deviation
- Range: high score minus low score.
  - 12 14 14 16 16 18 20: range = 20 - 12 = 8
- Average Deviation: mean of absolute deviations from the median.
  - Note: this definition differs from the undergraduate text, which uses deviations from the mean rather than from the median.
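A Python sketch of the range and the average deviation for the example scores; `score_range` and `average_deviation` are hypothetical helper names. The center is passed in so that the median-based definition used here and the mean-based one from the undergraduate text can be compared.

```python
from statistics import mean, median

def score_range(scores):
    """Hypothetical helper: high score minus low score."""
    return max(scores) - min(scores)

def average_deviation(scores, center):
    """Hypothetical helper: mean absolute deviation from `center`."""
    return sum(abs(x - center) for x in scores) / len(scores)

scores = [12, 14, 14, 16, 16, 18, 20]
print(score_range(scores))                                 # 8
print(average_deviation(scores, median(scores)))           # 2.0 (about the median, 16)
print(round(average_deviation(scores, mean(scores)), 3))   # 2.041 (about the mean, 110/7)
```

The median-based value is the smaller of the two, because the median minimizes average absolute deviation.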
Variance
- Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
- where $\sigma^2$ means population variance, $\mu$ means population mean, and the other terms have their usual meaning.
- The variance is equal to the average squared deviation from the mean.
- To compute, take each score and subtract the mean. Square the result. Find the average over scores. Ta da! The variance.
Computing the Variance (N = 5)

X     μ     X - μ    (X - μ)²
5     15    -10      100
10    15    -5       25
15    15    0        0
20    15    5        25
25    15    10       100

Totals: ΣX = 75, Σ(X - μ) = 0, Σ(X - μ)² = 250
Mean = 75 / 5 = 15; Variance = 250 / 5 = 50
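The same steps in Python, reproducing the worked example on this slide; `population_variance` is a hypothetical helper that divides by N, as the population formula does.

```python
def population_variance(scores):
    """Hypothetical helper: average squared deviation from the mean (divides by N)."""
    n = len(scores)
    mu = sum(scores) / n                       # 75 / 5 = 15 for the slide's data
    squared = [(x - mu) ** 2 for x in scores]  # 100, 25, 0, 25, 100
    return sum(squared) / n                    # 250 / 5

print(population_variance([5, 10, 15, 20, 25]))  # 50.0
```

In the standard library, `statistics.pvariance` matches this population formula. `statistics.variance` divides by N - 1 instead, so it would give 62.5 for these scores.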
Standard Deviation
- Variance is the average squared deviation from the mean.
- To return to the original, unsquared units, we just take the square root of the variance. This is the standard deviation.
- Population formula: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Standard Deviation
- Sometimes called the root-mean-square deviation from the mean. This name says how to compute it from the inside out:
  - Find the deviations (the difference between each score and the mean).
  - Square the deviations.
  - Find their mean.
  - Take the square root.
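The inside-out recipe maps onto four lines of Python. This is a sketch with a hypothetical helper `standard_deviation`, using the population (divide by N) form and the slide's five scores.

```python
import math

def standard_deviation(scores):
    """Hypothetical helper: root-mean-square deviation from the mean."""
    n = len(scores)
    mu = sum(scores) / n
    devs = [x - mu for x in scores]          # 1. deviations from the mean
    squared = [d * d for d in devs]          # 2. square them
    mean_square = sum(squared) / n           # 3. their mean (this is the variance)
    return math.sqrt(mean_square)            # 4. square root

print(round(standard_deviation([5, 10, 15, 20, 25]), 2))  # 7.07
```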
Computing the Standard Deviation (N = 5)

X     μ     X - μ    (X - μ)²
5     15    -10      100
10    15    -5       25
15    15    0        0
20    15    5        25
25    15    10       100

Totals: ΣX = 75, Σ(X - μ) = 0, Σ(X - μ)² = 250
Mean = 75 / 5 = 15; Variance = 250 / 5 = 50
SD = √50 ≈ 7.07
Example Age Distribution
Review
- Range
- Average deviation
- Variance
- Standard Deviation
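Standard or z score
- A z score indicates distance from the mean in standard deviation units. Formula: $z = \frac{X - \mu}{\sigma}$
- Converting to standard or z scores does not change the shape of the distribution, because it only shifts and rescales every score. Z-scores are standardized, not normalized: they are normally distributed only if the raw scores were.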
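A Python sketch of the z transformation, assuming population formulas for the mean and SD; `z_scores` is a hypothetical helper and the scores are made up and right-skewed. The results have mean 0 and SD 1. Because every score is shifted and rescaled by the same amounts, the relative gaps between scores (and so the skewed shape) are unchanged.

```python
import math

def z_scores(scores):
    """Hypothetical helper: z = (X - mu) / sigma using population mean and SD."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / n)
    return [(x - mu) / sigma for x in scores]

raw = [1, 1, 2, 3, 4, 5, 12]   # made-up, right-skewed scores
z = z_scores(raw)
print([round(v, 2) for v in z])

# Shape check: the long gap from 5 to 12 is still 7 times the gap from 2 to 3.
print((z[6] - z[5]) / (z[3] - z[2]))
```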
Tchebycheff's Inequality (1)
$P(|X - \mu| \geq b) \leq \frac{\sigma^2}{b^2}$
Suppose we know mean height in inches is 66 and SD is 4 inches. We assume nothing about the shape of the distribution of height. What is the probability of finding people taller than 74 inches? (Note that b is a deviation from the mean, in this case 74 - 66 = 8.) Also, 74 inches is 2 SDs above the mean; therefore, z = 2.
If we assume height is normally distributed, p is much smaller. But we will get to that later.
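A Python sketch of the height example, with `tchebycheff_bound` as a hypothetical helper implementing $\sigma^2 / b^2$ (capped at 1, since a probability cannot exceed 1). The bound covers both tails together, so it is also an upper bound on the single tail "taller than 74 inches".

```python
def tchebycheff_bound(sd, b):
    """Hypothetical helper: upper bound on P(|X - mean| >= b) for ANY distribution."""
    if b <= 0:
        raise ValueError("b must be a positive deviation from the mean")
    return min(1.0, sd ** 2 / b ** 2)

mean_height, sd_height = 66, 4
b = 74 - mean_height                      # deviation of 8 inches
print(tchebycheff_bound(sd_height, b))    # 0.25
```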
Tchebycheff (2)
- Z-score form: $P(|z| \geq k) \leq \frac{1}{k^2}$
- The probability of a z score from any distribution being at least k SDs from the mean is at most 1/k².
- Z-scores from the worst distributions are rarely more than 5 or less than -5 (at most 1/25 = 4% of scores).
- For symmetric, unimodal distributions, z is rarely more than 3.
For the problem in the previous slide, k = 2, so p ≤ 1/2² = 1/4 = .25.
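As an illustrative check, not from the slides, this Python sketch draws a strongly right-skewed made-up sample (exponential draws with a fixed seed) and compares the share of scores with |z| ≥ k against the 1/k² bound. `fraction_beyond` is a hypothetical helper. When z is computed with the sample's own mean and population-formula SD, the inequality holds exactly for any finite set of scores.

```python
import math
import random

def fraction_beyond(scores, k):
    """Hypothetical helper: share of scores with |z| >= k (population mean and SD)."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / n)
    return sum(abs(x - mu) / sigma >= k for x in scores) / n

random.seed(1)
scores = [random.expovariate(1.0) for _ in range(10_000)]  # skewed, made-up data
for k in (2, 3, 5):
    print(k, fraction_beyond(scores, k), "<= bound", 1 / k ** 2)
```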
Review
- Z-score in words
- Z-score in symbols
- Meaning of Tchebycheff's theorem
Median House Price Data
- Find data
- Show Univariate
- Show plots