Title: Central tendency and dispersion
1 Central tendency and dispersion of a distribution
2 Review Topics
- Measures of Central Tendency
- Mean, Median, Mode
- Quartile
- Measures of Variation
- The Range, Variance and Standard Deviation, Coefficient of Variation
- Shape
- Symmetric, Skewed
3 Important Summary Measures
One-sample summary measures:
- Central Tendency: Mean, Median, Mode, Quartile
- Variation: Range, Variance, Standard Deviation, Coefficient of Variation
4 Measures of Central Tendency
Central Tendency: Mean, Median, Mode
Data: You can access practice sample data on HMO premiums here.
5 Measures of Central Location (Tendency)
- Usually, we focus our attention on two aspects of a data set:
- a measure of the central data point (the average);
- a measure of the dispersion of the data about the average.
With one data point, the central location is clearly at the point itself.
With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).
If a third data point appears exactly in the middle of the current range, the central location should not change (because it is already in the middle).
But if the third data point appears to the left of the midrange, it should pull the central location to the left.
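6 - The arithmetic mean is the most popular and useful measure of central location.
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size.
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size.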
7 The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by
$$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = \frac{27}{6} = 4.5$$
A second example, only partly recoverable: the measurements 42.19, 15.30, ..., 53.21 have mean 43.59.
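The six-measurement example above can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the slides.

```python
from statistics import mean

# The six measurements from the slide.
sample = [7, 3, 9, -2, 4, 6]

# Sample mean: sum of the measurements divided by the sample size n.
x_bar = sum(sample) / len(sample)
print(x_bar)  # 4.5

# The standard-library helper gives the same value.
assert x_bar == mean(sample)
```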
8 - The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.
Odd number of observations (seven salaries): first, sort the salaries; then, locate the value in the middle.
26, 26, 28, [29], 30, 32, 60
The median is 29.
Even number of observations (a salary of 31 is added): first, sort the salaries; then, locate the values in the middle.
26, 26, 28, [29, 30], 31, 32, 60
There are two middle values! The median is their average, (29 + 30)/2 = 29.5.
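The sort-then-pick-the-middle procedure can be sketched as follows. The function name `median_of` is a hypothetical helper; the salary values are the ones shown on the slide.

```python
def median_of(values):
    """Return the middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: a single middle value.
        return ordered[mid]
    # Even number of observations: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

seven_salaries = [26, 26, 28, 29, 30, 32, 60]
eight_salaries = seven_salaries + [31]  # one more salary is added

print(median_of(seven_salaries))  # 29
print(median_of(eight_salaries))  # 29.5
```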
9 - The mode of a set of measurements is the value that occurs most frequently.
- A set of data may have one mode (or modal class), or two or more modes.
For large data sets, the modal class is much more relevant than a single-value mode.
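A short sketch of both ideas, the single-value mode and the modal class. The salary list comes from the median example; the two-mode list, the exam marks, and the class width of 10 are made-up assumptions for illustration only.

```python
from collections import Counter
from statistics import multimode

# Salaries from the median example: 26 occurs twice, everything else once.
salaries = [26, 26, 28, 29, 30, 32, 60]
print(multimode(salaries))  # [26]

# Hypothetical data with two modes.
two_modes = [1, 2, 2, 3, 4, 4]
print(multimode(two_modes))  # [2, 4]

# Hypothetical exam marks grouped into classes of width 10 (an assumed width).
marks = [52, 67, 68, 71, 74, 75, 78, 83, 85, 91]
class_counts = Counter((m // 10) * 10 for m in marks)
modal_start, modal_count = class_counts.most_common(1)[0]
print(f"modal class: {modal_start}-{modal_start + 9} ({modal_count} marks)")
```

With grouped data, the modal class depends on the chosen class width, so the width should be reported along with the result.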
10 A professor of statistics wants to report the results of a midterm exam taken by 100 students. The data appear in file XM04-06. Find the mean, median, and mode, and describe the information they provide.
The mean provides information about the overall performance level of the class.
The median indicates that half of the class received a grade below 81, and half of the class received a grade above 81.
The mode must be used when data are qualitative. If marks are classified by letter grade, the frequency of each grade can be calculated. Then, the mode becomes a logical measure to compute.
Excel Results
11 Relationship among Mean, Median, and Mode
- If a distribution is symmetrical, the mean, median and mode coincide.
- If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ.
A positively skewed distribution (skewed to the right): typically mode < median < mean.
12 - In a negatively skewed distribution (skewed to the left), the order is typically mean < median < mode.
- In a positively skewed distribution (skewed to the right), the order is typically mode < median < mean.
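As an illustration, the seven salaries from the median example form a small right-skewed data set: the single large value (60) pulls the mean above the median. The sketch below only checks the ordering of the three measures; it is not a formal test of skewness.

```python
from statistics import mean, median, multimode

salaries = [26, 26, 28, 29, 30, 32, 60]

m_mode = multimode(salaries)[0]  # most frequent value: 26
m_median = median(salaries)      # middle value: 29
m_mean = mean(salaries)          # 231 / 7 = 33

# Positive skew: the long right tail drags the mean past the median.
print(m_mode < m_median < m_mean)  # True
```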
13 Measures of Variation
Variation:
- Range, Interquartile Range
- Variance: Population Variance, Sample Variance
- Standard Deviation: Population Standard Deviation, Sample Standard Deviation
- Coefficient of Variation
14 Measures of variability (looking beyond the average)
- Measures of central location fail to tell the whole story about the distribution.
- A question of interest still remains unanswered:
How typical is the average value of all the measurements in the data set?
or
How spread out are the measurements about the average value?
15 Observe two hypothetical data sets.
Low variability data set: the average value provides a good representation of the values in the data set.
High variability data set: the same average value does not provide as good a representation of the values in the data set as before.
16 - The range of a set of measurements is the difference between the largest and smallest measurements.
Range = Largest measurement - Smallest measurement
- Its major advantage is the ease with which it can be computed.
- Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.
But how do all the measurements spread out? The range cannot assist in answering this question.
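A minimal sketch of the range and of its shortcoming. Populations A and B are the two small populations used later in the deck; `clustered` is a made-up data set added only to show that very different spreads can share the same range.

```python
def value_range(values):
    """Range = largest measurement - smallest measurement."""
    return max(values) - min(values)

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]
# Hypothetical data: same end points as B, but the rest sits at the centre.
clustered = [4, 10, 10, 10, 16]

print(value_range(population_a))  # 4
print(value_range(population_b))  # 12
print(value_range(clustered))     # 12, although the interior values differ from B
```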
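17 - This measure of dispersion (the variance) reflects the values of all the measurements.
- The variance of a population of $N$ measurements $x_1, x_2, \ldots, x_N$ having a mean $\mu$ is defined as
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
- The variance of a sample of $n$ measurements $x_1, x_2, \ldots, x_n$ having a mean $\bar{x}$ is defined as
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$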
18 Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B are much more dispersed than those in A. Thus, a measure of dispersion is needed that agrees with this observation.
Let us start by calculating the sum of deviations.
A: (8 - 10) + (9 - 10) + (10 - 10) + (11 - 10) + (12 - 10) = -2 - 1 + 0 + 1 + 2 = 0
B: (4 - 10) + (7 - 10) + (10 - 10) + (13 - 10) + (16 - 10) = -6 - 3 + 0 + 3 + 6 = 0
The sum of deviations is zero in both cases; therefore, another measure is needed.
19 The sum of deviations is zero in both cases; therefore, another measure is needed.
The sum of squared deviations is used in calculating the variance. See the example next.
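20 Let us calculate the variance of the two populations:
$$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$$
$$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$$
Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of dispersion instead? After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases!
The sum of squared deviations also increases with the number of measurements, so it cannot compare data sets of different sizes; averaging removes that dependence.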
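The calculation for populations A and B can be sketched in Python. The helper names are illustrative. The last lines use a doubled copy of A (an added illustration, not from the slides) to show that the sum of squared deviations grows with the number of measurements while the variance does not.

```python
def sum_squared_deviations(values):
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values)

def population_variance(values):
    # Average squared deviation from the population mean.
    return sum_squared_deviations(values) / len(values)

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

# Plain deviations cancel out, so they cannot measure dispersion.
print(sum(x - 10 for x in population_a), sum(x - 10 for x in population_b))  # 0 0

print(population_variance(population_a))  # 2.0
print(population_variance(population_b))  # 18.0

# Doubling the data doubles the sum of squares but leaves the variance unchanged.
doubled_a = population_a * 2
print(sum_squared_deviations(population_a), sum_squared_deviations(doubled_a))  # 10.0 20.0
print(population_variance(doubled_a))  # 2.0
```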
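21 - Example 4.8
- Find the mean and the variance of the following sample of measurements (in years): 3.4, 2.5, 4.1, 1.2, 2.8, 3.7
- Solution
$$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{3.4 + 2.5 + 4.1 + 1.2 + 2.8 + 3.7}{6} = \frac{17.7}{6} = 2.95 \text{ years}$$
A shortcut formula:
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] = \frac{1}{5}\left[3.4^2 + 2.5^2 + \cdots + 3.7^2 - \frac{(17.7)^2}{6}\right] = 1.075 \text{ (years)}^2$$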
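The shortcut formula can be compared against the definition of the sample variance. This is a sketch using the six measurements of Example 4.8; results are rounded because floating-point sums are not exact.

```python
import math
from statistics import variance

sample = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]  # years
n = len(sample)

total = sum(sample)                    # sum of x_i, about 17.7
total_sq = sum(x * x for x in sample)  # sum of x_i squared, about 57.59

x_bar = total / n
# Shortcut: (sum of squares - (sum)^2 / n) / (n - 1)
s2_shortcut = (total_sq - total ** 2 / n) / (n - 1)
# Definition: sum of squared deviations from the mean, divided by n - 1
s2_definition = variance(sample)

print(round(x_bar, 2), round(s2_shortcut, 3))  # 2.95 1.075
assert math.isclose(s2_shortcut, s2_definition)
```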
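22 Sample Standard Deviation
For the sample, use n - 1 in the denominator:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Data: 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$$s = \sqrt{\frac{130}{7}} \approx 4.3095$$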
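23 Interpreting Standard Deviation
- The standard deviation can be used to
- compare the variability of several distributions
- make a statement about the general shape of a distribution.
- The empirical rule: if a sample of measurements has a mound-shaped distribution, the interval
- $(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements;
- $(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements;
- $(\bar{x} - 3s, \bar{x} + 3s)$ contains approximately 99.7% of the measurements.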
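The percentages usually quoted for the empirical rule can be reproduced under one explicit assumption: that "mound-shaped" is modelled by a normal distribution. The sketch below uses the standard normal from Python's `statistics` module; real data will only match these shares approximately.

```python
from statistics import NormalDist

# Assumption: the mound-shaped distribution is modelled as a standard normal.
z = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Share of measurements within k standard deviations of the mean.
    shares[k] = z.cdf(k) - z.cdf(-k)
    print(k, round(shares[k], 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```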
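24 Comparing Standard Deviations
Data: 10 12 14 15 17 18 18 24
N = 8, Mean = 16
Treated as a sample (divide by n - 1): $s = \sqrt{130/7} \approx 4.3095$
Treated as a population (divide by N): $\sigma = \sqrt{130/8} \approx 4.0311$
The value of the standard deviation is larger when the data are considered a sample.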
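The two denominators can be compared directly with the standard library: `stdev` divides by n - 1 and `pstdev` divides by N. A minimal sketch on the eight data values from the slide:

```python
import math
from statistics import mean, pstdev, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

x_bar = mean(data)                              # 16
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)  # 130

s = stdev(data)       # sample: sqrt(sum_sq_dev / (n - 1))
sigma = pstdev(data)  # population: sqrt(sum_sq_dev / N)

assert math.isclose(s, math.sqrt(sum_sq_dev / (len(data) - 1)))
assert math.isclose(sigma, math.sqrt(sum_sq_dev / len(data)))
print(s > sigma)  # True: dividing by n - 1 always gives the larger value
```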
25 Comparing Standard Deviations
Three data sets plotted on the same axis (11 to 21), each with mean 15.5:
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
26 Measures of Association
- Two numerical measures are presented for the description of the linear relationship between two variables depicted in the scatter diagram.
- Covariance: is there any pattern to the way two variables move together?
- Correlation coefficient: how strong is the linear relationship between two variables?
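27 Population covariance: $\mathrm{COV}(X,Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample covariance: $\mathrm{cov}(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
$\mu_x$ ($\mu_y$) is the population mean of the variable X (Y). N is the population size. n is the sample size.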
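A sketch of the population and sample covariance, differing only in the denominator (N versus n - 1). The paired values `x` and `y` are made-up illustration data, and the function names are hypothetical.

```python
def covariance(xs, ys, sample=True):
    """Average product of paired deviations; divides by n - 1 for a sample, N for a population."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if sample else n)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(covariance(x, y, sample=False))  # 1.2  (6 / 5)
print(covariance(x, y, sample=True))   # 1.5  (6 / 4)
```

A positive value indicates that the two variables tend to move in the same direction; its magnitude depends on the units of X and Y, which is why the coefficient of correlation is used to judge strength.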
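28 - The coefficient of correlation
Population: $\rho = \frac{\mathrm{COV}(X,Y)}{\sigma_x \sigma_y}$
Sample: $r = \frac{\mathrm{cov}(X,Y)}{s_x s_y}$
- This coefficient answers the question: how strong is the association between X and Y?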
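The sample coefficient of correlation divides the sample covariance by the product of the two sample standard deviations, which confines it to the interval from -1 to +1. The sketch below uses made-up paired data and a hypothetical helper name.

```python
import math
from statistics import stdev

def sample_correlation(xs, ys):
    """r = cov(X, Y) / (s_x * s_y), using n - 1 denominators throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y_scattered = [2, 4, 5, 4, 5]    # rises with x, with some scatter
y_falling = [10, 8, 6, 4, 2]     # exactly on a falling straight line

print(round(sample_correlation(x, y_scattered), 4))  # 0.7746
print(round(sample_correlation(x, y_falling), 4))    # -1.0
```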
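29 Interpreting the sign and size ($\rho$ or $r$ ranges from +1 through 0 to -1):
- Strong positive linear relationship: COV(X,Y) > 0, $\rho$ or $r$ close to +1
- No linear relationship: COV(X,Y) = 0, $\rho$ or $r$ = 0
- Strong negative linear relationship: COV(X,Y) < 0, $\rho$ or $r$ close to -1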