Title: Numerical Methods of Descriptive Statistics
1. Numerical Methods of Descriptive Statistics
2. Objectives
- Learn to calculate and interpret measures for
- Central tendency
- Sample mean, median, mode
- Variability
- Standard deviation, variance, inter-quartile range.
- Learn how to construct and apply box-plots with all the components.
- Text reference: 24-9,11
- HLS reference: Ch 2.
3. Key Terms
- Central Tendency
- Mean
- Median
- Mode
- Variability
- Range and Inter-Quartile Range (IQR)
- Variance and Standard Deviation
- Skewness
- Kurtosis
4. Sample vs Population
- Similar measures are calculated, but these are different...
- ...conceptually
- ...computationally
- ...in notation.
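The computational difference shows up most clearly in the variance: the population version divides by N, while the sample version divides by n - 1. A minimal sketch using Python's standard `statistics` module; the data values are made up for illustration and do not come from the slides.

```python
import statistics

# Hypothetical data (not from the slides), treated two different ways.
data = [2, 4, 4, 5, 5, 6, 9]

# Treated as an entire population: sum of squared deviations divided by N.
pop_var = statistics.pvariance(data)

# Treated as a sample from a larger population: divided by n - 1 instead.
sample_var = statistics.variance(data)

print(f"population variance: {pop_var:.3f}")
print(f"sample variance:     {sample_var:.3f}")
```

The sample variance is always the larger of the two for the same numbers, because the divisor is smaller.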
5. Overview
6. Measures of Central Tendency
7. Method of Averages
- Sample mean vs population mean
- The sample variance
- The sample standard deviation
8. The Sample Mean
9. The Sample Variance and Sample Standard Deviation
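The slides for the sample mean and the sample variance carry no formulas in this text, so the standard definitions are written out here as a reminder. The notation (n observations x_1, ..., x_n, sample mean x-bar, sample standard deviation s) is the usual textbook convention and is assumed here rather than taken from the slides.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}
```

Because each deviation is squared, s^2 is measured in squared units of x, while s is back in the original units.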
10. Interpreting the Sample Standard Deviation
- Question: why don't we interpret the sample variance directly?
- Hint: what are the units of the sample variance?
- Two rules aid the interpretation of the sample standard deviation:
- Chebyshev's rule
- The Empirical rule
11. Chebyshev's Rule
- Mathematical theorem
- At least 1 - 1/k^2 of the measurements will fall within k standard deviations of the mean (for k > 1).
- This provides a lower bound for the proportion of observations within k standard deviations.
- Assumptions: essentially none.
- Limitations: weak result.
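A short sketch of how the bound can be compared with what a data set actually does. The function names and the skewed data below are hypothetical and chosen only for illustration.

```python
import statistics

def chebyshev_bound(k):
    """Lower bound on the fraction of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of observations within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if abs(x - mean) <= k * s]
    return len(inside) / len(data)

# Hypothetical, deliberately skewed data (not from the slides).
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 20]

for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(data, k))
```

The observed fraction is typically well above the bound, which is what "weak result" means in practice: the guarantee holds for any distribution, so it cannot be tight for most of them.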
12. The Empirical Rule
- Not a mathematical theorem, rather a rule of thumb.
- Assumptions: the relative frequency distribution must be mound-shaped and symmetrical.
- The rule states that approximately 68% of the data will be clustered within 1 standard deviation of the mean, 95% will be within 2s, and 99.7% will be within 3s of x-bar.
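One way to see where the 68 / 95 / 99.7 figures come from is the normal distribution, the standard example of a mound-shaped, symmetric distribution. Using the normal curve is an assumption of this sketch; the slide itself presents the rule only as a rule of thumb.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k}s: {normal_fraction_within(k):.1%}")
```

Real data that are only roughly mound-shaped will match these percentages only approximately.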
13. Preview of the z-score
- The z-score measures how many standard deviations an observation is from the mean.
- For example, if McDonald's has a mean profit margin of 12% of revenue, how many standard deviations from 12% is a store with a profit margin of 18%?
- Answer: if s = 3, z = (18 - 12)/3 = 2
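The same arithmetic as a small function, using the numbers from the profit-margin example. The function name `z_score` is a hypothetical helper, not something defined in the course material.

```python
def z_score(x, mean, s):
    """Number of standard deviations that x lies from the mean."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / s

# Values from the slide: mean margin 12%, store margin 18%, s = 3 percentage points.
z = z_score(18, 12, 3)
print(z)
```

A negative result would indicate a store below the mean margin.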
14. How do you develop a feel for the standard deviation?
- Do problems.
- Read the text to develop a conceptual overview of statistics.
- Try to frame problems in terms of standard deviations.
- Try to come up with problems that can be addressed using the standard deviation but not otherwise.
15. Rank-ordering
16. Method: Rank-ordering
- Step 1: rank the observations
- Step 2: count and identify appropriate cut-offs
- Step 3: use the numbers from step 2 to identify possible and likely outliers
17. Calculated quantities - Percentiles
- The Pth percentile is a value of x such that P percent of the data falls below x.
- To determine the Pth percentile (QS 2.4):
- 1. Rank order the data.
- 2. Let l = n(P/100) be the location of the Pth percentile in the ordered data, where n is the number of observations.
18. Calculated quantities - Percentiles
- 3. If l is not an integer, then round l up to the next integer; the percentile is the data value in that position. If l is an integer, the percentile P is the average of the data values in positions l and l+1.
- For example, the 50th percentile is called the median. If there is an odd number of observations (N), the median is the middle number of the ranked observations. If N is even, the median is the average of the two numbers in the middle.
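The three steps can be sketched as a single function. This assumes the location is l = n(P/100), which is the form the worked quartile example in these slides uses; the function name and the test values are hypothetical.

```python
import math

def percentile(data, p):
    """Pth percentile using the location rule l = n * (p / 100)."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)          # Step 1: rank order the data
    n = len(ordered)
    loc = n * p / 100               # Step 2: location l (1-based position)
    if loc != int(loc):
        # Step 3, non-integer l: round up and take the value in that position
        return ordered[math.ceil(loc) - 1]
    # Step 3, integer l: average the values in positions l and l + 1
    loc = int(loc)
    return (ordered[loc - 1] + ordered[loc]) / 2

# Hypothetical data (not from the slides).
print(percentile([1, 3, 5], 50))      # odd N: the middle value
print(percentile([7, 1, 3, 5], 50))   # even N: average of the two middle values
```

Statistical software often uses other interpolation conventions, so library percentiles may differ slightly from this rule on small data sets.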
19. Calculated quantities - Quartiles
- In order to describe the data, we split it up into portions of roughly the same size.
- The median (Q2 or 50th percentile) splits the data into two sets.
- Q1 (or QL) is defined to be the 25th percentile and splits the first half of the data into two again.
- Q3 (or QU) is defined to be the 75th percentile and splits the second half of the data into two.
20. Example: Assume X is the number of courses a student is taking.
21. Step 1: Rank order the data
22. Calculate the quartiles
- The median
- l(M) = 7(50/100) = 3.5, so l = 4
- We rounded up since 3.5 is not an integer.
- Since l = 4, the median = 5
- The lower and upper quartiles
- l(QL) = 7(25/100) = 1.75, so l = 2
- QL = 4
- l(QU) = 7(75/100) = 5.25, so l = 6
- QU = 5
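The location arithmetic for n = 7 can be checked without the data itself, which is not reproduced in this text. A minimal sketch; `location` is a hypothetical helper name.

```python
import math

def location(n, p):
    """Return (raw location, position used) for the Pth percentile of n ranked values."""
    raw = n * p / 100
    # Non-integer locations are rounded up; an integer location is kept as is
    # (the percentile is then the average of positions l and l + 1).
    return raw, math.ceil(raw)

n = 7  # number of observations in the slide's example
for name, p in (("QL", 25), ("median", 50), ("QU", 75)):
    raw, pos = location(n, p)
    print(f"{name}: l = {raw} -> position {pos}")
```

The quartile values themselves are then read off the ranked data at positions 2, 4 and 6.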
23. Measures of variability
- Range
- Sensitive to extreme values
- Interquartile range (IQR)
- IQR = QU - QL
- Easy to compute
- Totally insensitive to extreme values
- How does that compare to the sample standard deviation?
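One way to explore the closing question is to change only the largest value in a data set and watch which measures move. The data below are hypothetical, and `statistics.quantiles` interpolates between values, so its quartiles can differ slightly from the round-up location rule used in these slides.

```python
import statistics

def spread_summary(data):
    """Return (range, IQR, sample standard deviation) for a list of numbers."""
    # Library quartiles (interpolated); an assumption of this sketch,
    # not the percentile rule from the slides.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return max(data) - min(data), q3 - q1, statistics.stdev(data)

# Hypothetical data (not from the slides); the second list changes only the largest value.
base = [2, 4, 4, 5, 5, 6, 7]
with_extreme = [2, 4, 4, 5, 5, 6, 70]

for label, data in (("base", base), ("with extreme value", with_extreme)):
    rng, iqr, s = spread_summary(data)
    print(f"{label}: range={rng}, IQR={iqr}, s={s:.2f}")
```

The range and the standard deviation both grow sharply with the extreme value, while the IQR stays the same.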
24. Outliers
- Outliers are data points that do not fit the data set because
- There was something wrong with the way the data was collected, or
- There was a unique change that made the outlier too different to compare to other data points, or
- Any other reason that makes the data points simply irrelevant to the rest of the data or problem at hand.
- Often the most vital information is in the outliers!
- How do we identify outliers without knowing anything more than what is included in the data?
25. Detection of outliers
- Method of averages: z-scores
- Three or more standard deviations away from the mean indicates an observation may be an outlier.
- Method of rank-ordering
- The upper fence: UF = QU + 1.5 IQR
- The lower fence: LF = QL - 1.5 IQR
- Outer fences: Add (or subtract) 3 IQRs instead of 1.5.
- Observations outside the fences are potential outliers. Those outside the outer fences are almost certainly outliers, but we focus on the inner fences.
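The fence rules can be sketched as two small functions. The quartiles QL = 4 and QU = 5 are the ones from the course-load example in these slides; the function names and the labels returned by `classify` are hypothetical.

```python
def fences(ql, qu):
    """Return inner and outer fences as (lower, upper) pairs."""
    iqr = qu - ql
    inner = (ql - 1.5 * iqr, qu + 1.5 * iqr)
    outer = (ql - 3 * iqr, qu + 3 * iqr)
    return inner, outer

def classify(x, ql, qu):
    """Label x as 'ok', 'potential outlier', or 'likely outlier'."""
    (lf, uf), (lof, uof) = fences(ql, qu)
    if x < lof or x > uof:
        return "likely outlier"       # beyond an outer fence
    if x < lf or x > uf:
        return "potential outlier"    # between an inner and an outer fence
    return "ok"

# Quartiles from the slides' example: QL = 4, QU = 5, so IQR = 1.
inner, outer = fences(4, 5)
print(inner, outer)
```

With these quartiles, a student taking 7 courses would fall between the inner and outer upper fences and be flagged as a potential outlier.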
26. Graphical Summary: The Box-Plot
- Several ways to construct one.
- It includes all the pieces:
- The median
- The quartiles
- The upper and lower fences
- You can identify potential outliers as the observations outside the fences.
27. Problem in class
Find any outliers and draw a box-plot.
28. Conclusion
- Objectives addressed
- Learn measures for
- Central tendency
- Variability
- and how to interpret those measures.
- Be able to calculate the sample mean, sample variance, and sample standard deviation, and construct a box-plot with all the components.