Title: Review
1Review
- Descriptive Statistics
- Qualitative (Graphical)
- Quantitative (Graphical)
- Summation Notation
- Quantitative (Numerical)
- Central Measures (mean, median, mode and modal class)
- Shape of the Data
- Measures of Variability
2Outlier
- A data measurement which is unusually large or small compared to the rest of the data.
- Usually from
- Measurement or recording error
- Measurement from a different population
- A rare, chance event.
3Advantages/Disadvantages Mean
- Disadvantages
- is sensitive to outliers
- Advantages
- always exists
- very common
- nice mathematical properties
4Advantages/Disadvantages Median
- Disadvantages
- does not take all data into account
- Advantages
- always exists
- easily calculated
- not affected by outliers
- nice mathematical properties
5Advantages/Disadvantages Mode
- Disadvantages
- does not always exist; there could be just one of each data point
- sometimes more than one
- Advantages
- appropriate for qualitative data
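The trade-offs listed for the mean, median and mode can be checked on a small data set. The sketch below uses Python's standard `statistics` module; the values, the outlier 100 and the colour labels are made up for illustration and are not data from these slides.

```python
import statistics

# Hypothetical data, invented for illustration only.
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]  # add one unusually large value

# The mean shifts a lot when the outlier is added; the median barely moves.
mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

# multimode returns every most-frequent value, so it covers the
# "more than one mode" case; if every value is distinct it returns them all.
modes = statistics.multimode(data)

# The mode also works for qualitative (non-numeric) data.
colour_modes = statistics.multimode(["red", "blue", "red", "green"])

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("modes:", modes, colour_modes)
```

Here the single outlier moves the mean from 3.4 to 19.5, while the median only moves from 3 to 3.5.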
6Review
- A data set is skewed if one tail of the distribution has more extreme observations than the other.
- http://www.shodor.org/interactivate/activities/SkewDistribution/
7Review
- Skewed to the right: the mean is bigger than the median.
8Review
- Skewed to the left: the mean is less than the median.
9Review
- When the data is symmetric, the mean and median are equal. (The converse does not always hold: equal mean and median do not guarantee symmetry.)
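The relationship between skew and the order of the mean and median can be illustrated numerically. This is a minimal sketch with three made-up samples (assumptions for illustration, not data from the slides); `mean_minus_median` is a hypothetical helper name.

```python
import statistics

# Hypothetical samples, invented for illustration only.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]         # long tail of large values
left_skewed = [-15, -4, -3, -3, -3, -2, -2, -1]  # mirror image of the above
symmetric = [1, 2, 3, 4, 5]

def mean_minus_median(values):
    """Positive when the mean exceeds the median, negative when it is below."""
    return statistics.mean(values) - statistics.median(values)

right_gap = mean_minus_median(right_skewed)
left_gap = mean_minus_median(left_skewed)
symmetric_gap = mean_minus_median(symmetric)

print(right_gap, left_gap, symmetric_gap)
```

The extreme value in the long tail pulls the mean toward it, while the median stays near the bulk of the data.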
10Numerical Measures of Variability
- These measure the variability or spread of the
data.
11Numerical Measures of Variability
[Figure: relative frequency histogram; horizontal axis 0 to 5, vertical axis (Relative Frequency) 0.1 to 0.5]
12Numerical Measures of Variability
[Figure: relative frequency histogram; horizontal axis 0 to 5, vertical axis (Relative Frequency) 0.1 to 0.5]
13Numerical Measures of Variability
[Figure: relative frequency histogram; horizontal axis 0 to 7, vertical axis (Relative Frequency) 0.1 to 0.5]
14Numerical Measures of Variability
- These measure the variability, spread or relative standing of the data.
- Range
- Standard Deviation
- Percentile Ranking
- Z-score
15Range
- The range of quantitative data is denoted R and is given by
- $R = \text{Maximum} - \text{Minimum}$
16Range
- In the previous examples the first two graphs have a range of 5 and the third has a range of 7.
17Range
- $R = \text{Maximum} - \text{Minimum}$
- Disadvantages
- Since the range uses only two values in the sample, it is very sensitive to outliers.
- It gives no idea of how much of the data lies near the center.
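Both disadvantages of the range can be seen in a short sketch. The three data sets below are made up for illustration (they are not the graphs from the earlier slides), and `value_range` is a hypothetical helper name.

```python
def value_range(values):
    """Range R = maximum - minimum."""
    return max(values) - min(values)

# Hypothetical data, invented for illustration only.
clustered = [1, 5, 5, 5, 5, 5, 5, 9]   # most values sit in the center
spread_out = [1, 1, 1, 1, 9, 9, 9, 9]  # values sit only at the extremes
with_outlier = clustered + [50]        # one unusually large value added

# Same range even though the two data sets look very different in the middle.
range_clustered = value_range(clustered)
range_spread_out = value_range(spread_out)

# A single outlier changes the range drastically.
range_with_outlier = value_range(with_outlier)

print(range_clustered, range_spread_out, range_with_outlier)
```

The first two data sets share a range of 8 despite very different shapes, and one outlier raises the range to 49.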
18What else?
- We want a measure which shows how far away most
of the data points are from the mean.
19What else?
- One option is to keep track of the average distance each point is from the mean.
20Mean Deviation
- The Mean Deviation is a measure of dispersion
which calculates the distance between each data
point and the mean, and then finds the average of
these distances.
21Mean Deviation
- Advantages: The mean deviation takes into account all values in the sample.
- Disadvantages: The absolute value signs are very cumbersome in mathematical equations.
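The definition of the mean deviation translates directly into code. The following is a minimal sketch, applied to the ten-value sample used in the worked example of this deck; the function name `mean_deviation` is a hypothetical choice.

```python
import statistics

def mean_deviation(values):
    """Average absolute distance of each data point from the mean."""
    centre = statistics.mean(values)
    return sum(abs(x - centre) for x in values) / len(values)

# The sample from the worked example in these slides.
sample = [2, 4, 3, 2, 5, 2, 1, 4, 5, 2]

result = mean_deviation(sample)
print(result)
```

The absolute value in `abs(x - centre)` is what makes the measure awkward to manipulate algebraically, which is the disadvantage noted above.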
22Standard Deviation
- The sample variance, denoted by s², is $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
23Standard Deviation
- The sample standard deviation is $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
- The sample standard deviation is much more commonly used as a measure of variability.
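As a rough sketch, the sample variance and standard deviation can be computed by hand and compared with Python's `statistics` module. This assumes the usual definition with an n - 1 divisor; the four data values and the function names are made up for illustration.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical data, invented for illustration only.
data = [4, 8, 6, 2]

by_hand_var = sample_variance(data)
by_hand_std = sample_std(data)

# statistics.variance and statistics.stdev also use the n - 1 divisor.
library_var = statistics.variance(data)
library_std = statistics.stdev(data)

print(by_hand_var, library_var)
print(by_hand_std, library_std)
```

Squaring the deviations avoids the absolute values of the mean deviation, and taking the square root returns the result to the original units of the data.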
24Example
- Let the following be data from a sample
- 2, 4, 3, 2, 5, 2, 1, 4, 5, 2.
- Find
- a) The range
- b) The standard deviation of this sample.
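28Sample 2, 4, 3, 2, 5, 2, 1, 4, 5, 2.
- a) The range
- $R = 5 - 1 = 4$
- b) The standard deviation of this sample.
- The sample mean is $\bar{x} = 30/10 = 3$.
$x_i$: 2 4 3 2 5 2 1 4 5 2
$x_i - \bar{x}$: -1 1 0 -1 2 -1 -2 1 2 -1
$(x_i - \bar{x})^2$: 1 1 0 1 4 1 4 1 4 1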
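32Sample 2, 4, 3, 2, 5, 2, 1, 4, 5, 2.
- Sum of squared deviations: $\sum (x_i - \bar{x})^2 = 18$
- Sample variance: $s^2 = 18 / (10 - 1) = 2$
- Standard Deviation: $s = \sqrt{2} \approx 1.41$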
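The rows of the worked example (data, deviations from the mean, squared deviations) can be reproduced step by step. This is a sketch that assumes the n - 1 divisor for the sample variance; the variable names are illustrative.

```python
import math

sample = [2, 4, 3, 2, 5, 2, 1, 4, 5, 2]

n = len(sample)
mean = sum(sample) / n

# Second row of the table: deviation of each point from the mean.
deviations = [x - mean for x in sample]

# Third row of the table: squared deviations.
squared = [d ** 2 for d in deviations]

sum_of_squares = sum(squared)
variance = sum_of_squares / (n - 1)  # sample variance
std_dev = math.sqrt(variance)        # sample standard deviation

sample_range = max(sample) - min(sample)

print("range:", sample_range)
print("mean:", mean)
print("deviations:", deviations)
print("squared deviations:", squared)
print("variance:", variance, "standard deviation:", round(std_dev, 2))
```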
33More Standard Deviation
- Like the mean, we are also interested in the population variance (i.e. your sample is the whole population) and the population standard deviation.
- The population variance and standard deviation are denoted $\sigma^2$ and $\sigma$ respectively.
34More Standard Deviation
- The formula for the population variance $\sigma^2$ is slightly different from that for the sample variance: with population mean $\mu$ and population size $N$, $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
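The difference between the two formulas can be sketched with Python's `statistics` module, assuming the usual conventions: the sample variance divides by n - 1 and the population variance divides by N. The data are the ten values from the earlier worked example, treated first as a sample and then as a whole population.

```python
import math
import statistics

values = [2, 4, 3, 2, 5, 2, 1, 4, 5, 2]

# Treating the values as a sample: divide by n - 1.
sample_var = statistics.variance(values)
sample_std = statistics.stdev(values)

# Treating the values as the whole population: divide by N.
population_var = statistics.pvariance(values)
population_std = statistics.pstdev(values)

print(sample_var, population_var)
print(sample_std, population_std)
```

Because the divisor is larger, the population variance (18/10 = 1.8) is slightly smaller than the sample variance (18/9 = 2) for the same numbers.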
35Example Using Standard Deviation
- 35, 59, 70, 73, 75, 81, 84, 86.
- The mean and standard deviation are 70.4 and 16.7, respectively.
- We wish to know if any of our data points are outliers, that is, whether they don't fit with the general trend of the rest of the data.
- To find this we calculate the number of standard deviations each point is from the mean.
36Example Using Standard Deviation
- To simplify things for now, work out which data points are within
- one standard deviation from the mean, i.e. between 70.4 - 16.7 = 53.7 and 70.4 + 16.7 = 87.1
- two standard deviations from the mean, i.e. between 70.4 - 2(16.7) = 37.0 and 70.4 + 2(16.7) = 103.8
- three standard deviations from the mean, i.e. between 70.4 - 3(16.7) = 20.3 and 70.4 + 3(16.7) = 120.5
38Example Using Standard Deviation
- Here are eight test scores from a previous Stats 201 class: 35, 59, 70, 73, 75, 81, 84, 86.
- The mean and standard deviation are 70.4 and 16.7, respectively. Work out which data points are within
- a) one standard deviation from the mean, i.e. between 53.7 and 87.1
- 59, 70, 73, 75, 81, 84, 86
- b) two standard deviations from the mean, i.e. between 37.0 and 103.8
- 59, 70, 73, 75, 81, 84, 86
- c) three standard deviations from the mean, i.e. between 20.3 and 120.5
- 35, 59, 70, 73, 75, 81, 84, 86
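The same check can be sketched in code: compute the mean and sample standard deviation of the eight scores, then list the points within one, two and three standard deviations. The helper name `within` and the z-score list are illustrative additions, and the sample (n - 1) standard deviation is assumed because it reproduces the 16.7 quoted above.

```python
import statistics

scores = [35, 59, 70, 73, 75, 81, 84, 86]

mean = statistics.mean(scores)      # about 70.4
std_dev = statistics.stdev(scores)  # sample standard deviation, about 16.7

def within(values, centre, spread, k):
    """Values lying no more than k standard deviations from the mean."""
    return [x for x in values if abs(x - centre) <= k * spread]

within_one = within(scores, mean, std_dev, 1)
within_two = within(scores, mean, std_dev, 2)
within_three = within(scores, mean, std_dev, 3)

# Number of standard deviations each score lies from the mean (its z-score).
z_scores = [(x - mean) / std_dev for x in scores]

print(round(mean, 1), round(std_dev, 1))
print(within_one)
print(within_two)
print(within_three)
print([round(z, 2) for z in z_scores])
```

The score 35 lies a little more than two standard deviations below the mean, which is why it only appears in the three-standard-deviation list.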