The mean - PowerPoint PPT Presentation

1 / 46

About This Presentation

Title:

The mean

Description:

Advantage: More informative than median & mode. Takes all the observation ... Use the mean for unimodal, symmetrical distributions of interval/ratio level data. ... – PowerPoint PPT presentation

Number of Views:80

Avg rating:3.0/5.0

Slides: 47

Provided by: Win6250

Learn more at: http://faculty.piercecollege.edu

Category:

more less

Transcript and Presenter's Notes

Title: The mean

1

The mean
The arithmetic average

The balance point of the distribution
X2 -3 X2 -3 X6 1 X10 5 0 An error
or deviation is the distance from a score to the
mean X-µ
2

The sum of the errors or deviations around the
mean is always 0.
Advantage More informative than median mode
Takes all the observation/scores into account.
Takes the distance direction of
deviations/errors into account.

Advantage More uses than median mode
Necessary for calculating many inferential
statistics.
Limitation Not always possible to calculate a
mean (scale)
Mean can only be calculated for interval/ratio
level data
Need a different measure for nominal or ordinal
level data

Limitation Not always appropriate to use the
mean to describe the middle of a distribution
(distribution)
Mean is sensitive to extreme values or
outliers
Mean does not always reflect where the scores
pile up
Need a different measure for asymmetrical
distributions
Use the mean for unimodal, symmetrical
distributions of interval/ratio level data.

The median
Divides the distribution exactly in half 50th
percentile
Odd of scores no pileup or ties at the
middle
median the middle score

Even of scores no pileup or ties at the
middle
median the average of the 2 middle scores

Advantage Insensitive to extreme values
Can be used when extreme values distort the mean
3, 5, 5, 6, 7, 8, 9 median6 mean 6.1
3, 5, 5, 6, 7, 8, 50 median6 mean12
Is the most central, representative value in
skewed distributions
Advantage Can be calculated when the mean cannot
Can be used with ranks (as well as interval/ratio
data)
Can be used with open-ended distributions
Example of siblings (5 siblings?)

Limitation Not as informative as the mean
Takes only the observations/scores around the
50th ile into account.
Provides no information about distances between
observations.
Limitation Fewer uses than the mean
Median is purely descriptive.

Limitation Not always possible to calculate a
median (scale)
Median can only be calculated for ordinal
interval/ratio data
Use the median when you cannot calculate a mean
or when the distributions of interval/ratio data
are skewed by extreme values.

The mode
The most frequently occurring score(s)
Advantage Simple to find

mode coke
11

Advantage Can be used with any scale of
measurement
Median can only be calculated for ordinal
interval/ratio data
Mean can only be calculated for interval/ratio
data
Advantage Can be used to indicate gt1 most
frequent value
Use to indicate bimodality, multimodality

12
M 5 median 5 Neither reflects where the
scores actually pile up. Modes 2 8.
13
Use to indicate major minor modes
major mode 6am minor mode 6pm
14

Limitation Not as informative as the mean or
median
Takes only the most frequently observed X values
into account.
Provides no information about distances between
observations or the of observations
above/below the mode.
Limitation Fewer uses than the mean
Mode is purely descriptive.
Need to calculate a mean to use with inferential
statistics.
Use the mode when you cannot compute a mean or
median, or with the mean/median to describe a
bimodal/multimodal distribution.

Describing distributions Measures of variability
or dispersion
To describe/summarize a distribution of scores
efficiently, you need
A measure of central tendency a measure of
variability.
Which measure of central tendency is most
appropriate? Why?

Central tendency variability measures are
partners.
Mode ?range
Median ?interquartile range, semi-interquartile
range
Mean ?SS, variance (s²or s² ), standard deviation
(s or s)
These measures describe distributions indicate
how well individual scores or samples of scores
represent the population.

Variability measures used with the mode median
Range
Based on the distance between the highest
lowest observations on the X scale.
Only takes the 2 most extreme observations into
account.

For interval/ratio data, range

highest score lowest score 1 11 - 2 1
10 OR URL X max LRL X min 11.5 - 1.5 10
19

Range can also be used for ordered categories
Range from agree to disagree strongly, with
modal response disagree.
The range is typically used with the mode, when
the mean median are inappropriate or impossible
to calculate (but may be reported along with a
median or a mean).

Interquartile range (IQR) semi-interquartile
range (SIQR)
Based on distances between scores corresponding
to percentiles on the X scale.
Only take the middle 50 of the distribution
into account.
Use only with interval/ratio data.
Interquartile range distance between 1st 3rd
quartiles
IQR Q3 - Q1
1st quartile Q1 is the score at the 25th
percentile
2nd quartile Q2 is the score at the 50th
percentilethe median
3rd quartile Q3 is the score at the 75th
percentile

IQR provides information about how much distance
on the X scale covers or contains the middle 50
of the distribution.

N
Q1
Q3
IQR
SIQR
Semi-Interquartile range half the interquartile
range
SIQR
For a symmetrical distribution, SIQR tells you
the distance from the median up to Q3 or down to
Q1the distance covering the 25 of the
distribution to each side of the median.
The interquartile range semi-interquartile
range are typically used with the median (but may
be reported along with a mean).

Variability measures used with the mean
SS, variance standard deviation are based on
distances between each of the scores the mean
on the X scale.
All scores are taken into account, as with the
mean.
Use only with interval/ratio data.
Most useful for symmetrical distributions, when
the mean is the best measure of central tendency.

24
For now, we will be working with the population
values.
25

Suppose you want to summarize how far the scores
in a distribution typically deviate from the
mean.
Averages are a convenient way to summarize
information, BUT
REMEMBER An error or deviation is the distance
from a score to the mean
X- µ
REMEMBER The sum of the deviations around the
mean is always 0.

You cant sum the deviations divide by the
number of scores to get a useful average amount
of deviation 0/N will always 0.
What can you do to summarize the deviations?

SS the sum of the squared deviations or errors
around the mean
(definitional formulaconceptual) Squaring the
deviations first allows you to sum them.
27

Computing, squaring, summing all N deviations
is tedious, so there is a shortcut.
Plug these values into the following formula
(computational formulause this one)
For the distribution of N4,
?X 8 ?X² 38

Computing, squaring, summing all N deviations
is tedious, so there is a shortcut.
Plug these values into the following formula
(computational formulause this one)
For the distribution of N4,
?X 8 ?X² 38
SS

So, what does __ tell you about the variability
of the distribution of scores?
By itself, not much
SS summarizes the amount of deviation is useful
for further analyses.
In general, we CAN say that

As variability increases (more differences
between scores, larger deviations) SS gets
larger.
Extreme scores farther from µ contribute
proportionately more to SS because they produce
larger deviations.
As N increases (more squared deviations to sum)
SS gets larger.
Because SS increases with N, SS is NOT a good
descriptive statistic
You cant compare SS between groups of different
sizes.

How can you use SS to create a measure that will
allow you to compare different-sized groups?

How can you use SS to create a measure that will
allow you to compare different-sized groups?
Variance the average squared deviation or mean
squared deviation
Variance is not affected by N, because it is an
averagethe mean squared deviation.
Variance summarizes the amount of deviation,
allows for comparisons between different-sized
groups, is useful for further analyses.
Since variance is a mean of squared deviations,
it is not on the same scale as our original
variable

Because variance does NOT allow you to describe
typical variation among scores in terms of the
original scale, it is still NOT a good
descriptive statistic.
The relationship between variance distances or
units on the I/E scale is difficult to visualize
or understand.
How can you use ?² to create a descriptive
measure of variability on the same scale as the
original scores?
SSthe sum of the squared deviations ?(X-?)²
?²the average squared deviation SS/N
What wed really like to have is a measure of the
typical or average deviation from the mean that
is NOT based on squared quantities.

Standard deviation the typical or expected
deviation
The typical, average, or expected distance that
scores deviate from the mean.
population s.d.
Taking the square root of the variance returns
the measure of variability to the original units
of measurement.
This allows you to represent standard deviation
as a distance on the X axis.
This also allows you to make statements about
how extreme or unusual an observation is.

Note As with the mean, standard deviation is
most useful for describing symmetrical
distributions.
Standard deviation is the best descriptive
measure of variability around a mean SS
variance are important concepts for understanding
for use in further analyses.

Sample variance standard deviation
We often want to make statements about population
parameters.
How extroverted are male U.S. citizens, on
average?
Parameter of interest µ .
BUT, much of the time we only have access to
sample statistics.
How extroverted are males from the PSY 1 subject
pool, on average?
Our best estimate of µ M calculated using
sample data.

37
We use statistics s s² as estimates of s s²
when the population parameters are unknown.
38

Formulae

Sample population SS are the same
Calculations do NOT change from population to
sample.
Population Sample
N has just been relabeled as n.
If you use the definitional formula, use the
correct mean.
Population Sample
This will matter later on

Comparing sample population variances
Calculations DO change from population to sample.
Population Sample
N has been relabeled as n.
AND
Use n-1 instead of N in the denominator.
Sample formula will always yield a larger value.

Why (n-1) instead of N?
Because (n-1) instead of N corrects for bias in
calculating s s².
Remember Sample statistics are only useful to
the extent that they provide unbiased estimates
of population parameters.
What is an unbiased statistic?

One that on average the population parameter.
M is an unbiased estimate of ? The average of
many sample means the population mean.

(Each box is a sample mean.) What is a biased
statistic? One that systematically over or
underestimates the parameter.
43

SS/N tends to underestimate population variance
when using sample data.
Why doesnt the SS/N formula work with sample
data?
Samples usually contain less variability than the
populations they come from.

Samples tend to contain observations from the
center of the population distribution. These
samples do not reflect the extremes of the
population, so we underestimate the true
variability.
44

Dividing SS by a smaller number corrects for the
tendency to underestimate true population
variability.
The n-1 correction makes s² s unbiased
estimators of s² s .
n-1 is also referred to as degrees of freedom.
Sample variances have n-1 degrees of
freedomthey are calculated from n-1 independent
scores.
The last score is determined by the other scores
by M.

For the following set of data, compute the value
for SS.
Scores 5, 2, 2, 7, 9
ANS SS
Calculate the variance and the standard deviation
for the following date (Population Sample)
Scores 2, 3, 2, 4, 7, 5, 3, 6, 4