Numerical descriptions of distributions - PowerPoint PPT Presentation

Provided by: frie9
Learn more at: http://people.uncw.edu

Transcript and Presenter's Notes


1
Numerical descriptions of distributions
  • Describe the shape, center, and spread of a
    distribution (for shape, see slide 6 below).
  • Center: mean and median
  • Spread: range, IQR, standard deviation
  • We treat these as aids to understanding the
    distribution of the variable at hand.
  • The mean is often called the "average" and is in
    fact the arithmetic average ("add all the values
    and divide by the number of observations").
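Below is a minimal Python sketch of the "add all the values and divide by the number of observations" rule. The function name `mean` and the sample values are illustrative only.

```python
def mean(values):
    """Arithmetic average: add all the values, then divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical data, for illustration only
print(mean([2, 4, 9]))  # 5.0
```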

2
Mathematical notation
Learn right away how to get the mean with
calculators and JMP.
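The slide's own notation did not survive in the transcript. For observations x_1, ..., x_n, the standard textbook notation for the mean, written in LaTeX, is:

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
```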
3
Your numerical summary must be meaningful!
The distribution of women's heights appears
coherent and symmetric, so the mean is a good
numerical summary.
4
[Figure: histogram; horizontal axis runs from 58 to 84]
A single numerical summary here would not make
sense.
5
  • The Median (M) is often called the "middle" value
    and is the value at the midpoint of the
    observations when they are ranked from smallest
    to largest value.
  • Arrange the data from smallest to largest.
  • If n is odd, the median is the single
    observation in the center (at position (n+1)/2
    in the ordering).
  • If n is even, the median is the average of
    the two middle observations (position (n+1)/2
    falls in between them, i.e., between positions
    n/2 and n/2 + 1).
  • In Table 1.10, calculate the mean and median
    for the 2-seater cars' city m.p.g. to see that
    the mean is more sensitive to outliers than
    the median (use the TI-83).
  • Also, try it with JMP.
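Below is a minimal Python sketch of the odd/even median rule above. It is followed by a check of outlier sensitivity on made-up m.p.g. values. These are not the Table 1.10 data, which are not in this transcript.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the single center observation, at position (n+1)/2 counting from 1
        return xs[mid]
    # even n: average the observations at positions n/2 and n/2 + 1 (counting from 1)
    return (xs[mid - 1] + xs[mid]) / 2

typical = [20, 22, 23, 25, 26]       # hypothetical m.p.g. values
with_outlier = [20, 22, 23, 25, 60]  # same values, largest replaced by an outlier
for data in (typical, with_outlier):
    print(sum(data) / len(data), median(data))
# the mean jumps from 23.2 to 30.0, while the median stays at 23
```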

6
Skewness
[Figure: mode, median, and mean of three distributions: SYMMETRIC (all three coincide), SKEWED LEFT (negatively), SKEWED RIGHT (positively)]
7
Mean and median of a distribution with outliers
[Figure: axis labeled "Percent of people dying"]

8
Impact of skewed data
9
  • Spread: percentiles, quartiles (Q1 and Q3), IQR,
    5-number summary (and boxplots), range, standard
    deviation
  • The pth percentile of a variable is a data value
    such that p% of the values of the variable are
    less than or equal to it.
  • The lower (Q1) and upper (Q3) quartiles are
    special percentiles dividing the data into
    quarters (fourths). Get them by finding the
    medians of the lower and upper halves of the data.
  • IQR (interquartile range) = Q3 - Q1, the spread
    of the middle 50% of the data. IQR is used with
    the so-called 1.5 × IQR criterion for outliers;
    know this!
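One simple reading of the percentile definition above is the "nearest-rank" rule: take the smallest data value that has at least p% of the data less than or equal to it. The sketch below assumes that rule, and the function name `percentile` is hypothetical. JMP and the TI-83 may use a different rule and give slightly different answers. For example, on the values 1 to 10 this rule gives 5 as the 50th percentile, while the median rule gives 5.5.

```python
import math

def percentile(values, p):
    """Smallest data value with at least p% of the data <= it (nearest-rank rule)."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    xs = sorted(values)
    k = math.ceil(p * len(xs) / 100)  # rank, counting from 1
    return xs[k - 1]

data = list(range(1, 11))  # hypothetical values 1, 2, ..., 10
print(percentile(data, 25), percentile(data, 50), percentile(data, 100))  # 3 5 10
```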

10
Measure of spread: the quartiles
The first quartile, Q1, is the value in the
sample that has 25% of the data less than or
equal to it (in practice, it is the median of the
lower half of the sorted data, excluding M). The
third quartile, Q3, is the value in the sample that
has 75% of the data less than or equal to it (in
practice, it is the median of the upper half of the
sorted data, excluding M).
Q1 = first quartile = 2.2
M = median = 3.4
Q3 = third quartile = 4.35
11
Five-number summary and boxplot
Five-number summary: min, Q1, M, Q3, max
[Figure: boxplot of the example data]
Largest = max = 6.1
Q3 = third quartile = 4.35
M = median = 3.4
Q1 = first quartile = 2.2
Smallest = min = 0.6
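The raw data behind the 2.2 / 3.4 / 4.35 example are not in this transcript. The Python sketch below therefore applies the "median of each half, excluding M" rule to the metabolic rates from Example 1.19 (slide 16). The function names are hypothetical. Software packages may use other quartile rules, so JMP or a calculator could report slightly different quartiles.

```python
def _median_sorted(xs):
    """Median of an already-sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, M, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when n is odd, the center observation M is left out of both halves.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = xs[: n // 2]        # observations below the center
    upper = xs[(n + 1) // 2 :]  # observations above the center
    return xs[0], _median_sorted(lower), _median_sorted(xs), _median_sorted(upper), xs[-1]

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]  # metabolic rates, Example 1.19
print(five_number_summary(rates))  # (1362, 1439, 1614, 1792, 1867)
```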
12
Boxplots for skewed data
Comparing box plots for a normal and a
right-skewed distribution
Boxplots remain true to the data and clearly
depict symmetry or skew.
13
  • 5-number summary: min, Q1, median, Q3, max
  • When plotted, the 5-number summary is a boxplot;
    we can also draw a modified boxplot to show
    outliers (mild and extreme). Boxplots show less
    detail than histograms and are often used for
    comparing distributions (e.g., Fig. 1.19, p. 37,
    and below).

14
Distance to Q3 = 7.9 - 4.35 = 3.55
Interquartile range = Q3 - Q1 = 4.35 - 2.2 = 2.15
Individual 25 has a value of 7.9 years, which is
3.55 years above the third quartile. This is more
than 1.5 × IQR = 1.5 × 2.15 = 3.225 years. Thus,
individual 25 is an outlier by our 1.5 × IQR rule.
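The 1.5 × IQR check above can be written as "fences" at Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. Here is a minimal Python sketch using the slide's numbers. The function name `iqr_fences` is hypothetical. The optional `k` parameter is an assumption added for flexibility; some texts use k = 3 to flag "extreme" outliers.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; values outside them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

q1, q3 = 2.2, 4.35       # quartiles from the example
low, high = iqr_fences(q1, q3)
value = 7.9              # individual 25
print(value > high)      # True: 7.9 lies beyond Q3 + 1.5 * IQR (about 7.575)
```

This is the same test as the slide's: "distance above Q3 > 1.5 × IQR" is equivalent to "value > Q3 + 1.5 × IQR".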
15
(No Transcript)
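The formula that slide 16 refers to is missing from the transcript. The standard sample variance and standard deviation, consistent with the n - 1 divisor discussed on slide 17, are:

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad s = \sqrt{s^2}
```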
16
Look at Example 1.19 on page 41 (section 1.2,
8/11); see Fig. 1.21 for a graph of deviations
from the mean. Metabolic rates for 7 men in a
dieting study: 1792, 1666, 1362, 1614, 1460,
1867, 1439. Mean = 1600 calories, s = 189.24 calories.
Be sure you know how to compute the
standard deviation with JMP and with your
calculator, since it's almost never done by hand
with the previous page's formula.
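As a check on the stated values, here is a short Python sketch that follows the "by hand" steps: deviations from the mean, squared, summed, divided by n - 1, then square-rooted. It compares the result with the standard library's `statistics.stdev`.

```python
import math
import statistics

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]  # metabolic rates (calories)

n = len(rates)
xbar = sum(rates) / n
deviations = [x - xbar for x in rates]
s = math.sqrt(sum(d * d for d in deviations) / (n - 1))

print(xbar)         # 1600.0
print(round(s, 2))  # 189.24
print(math.isclose(s, statistics.stdev(rates)))  # True: same value as the library
```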
17
  • Why do we square the deviations? For two technical
    reasons that we'll see when we discuss the normal
    distribution in the next section.
  • Why do we use the standard deviation (s) instead
    of the variance (s²)? s² has units that are the
    squares of the original units of the data, while
    s is in the original units.
  • Why do we divide by n - 1 instead of n? n - 1 is
    called the number of degrees of freedom: since
    the sum of the deviations is zero, the last
    deviation can always be found if we know the
    other n - 1. Be careful when using the TI-83,
    since it calculates both versions (Sx divides
    by n - 1, σx divides by n).
  • Which measure of spread is best? The 5-number
    summary is better than the mean and s.d. for
    skewed data; use the mean and s.d. for symmetric data.
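The Python sketch below illustrates the degrees-of-freedom argument on the Example 1.19 rates. The deviations sum to zero, so the last one is fixed by the other n - 1. It then shows the two divisors side by side: `statistics.stdev` divides by n - 1, matching the TI-83's Sx, and `statistics.pstdev` divides by n, matching σx.

```python
import statistics

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
xbar = statistics.mean(rates)
deviations = [x - xbar for x in rates]

print(sum(deviations))  # 0: the deviations always sum to zero
# so the last deviation is determined by the other n - 1
print(-sum(deviations[:-1]) == deviations[-1])  # True

print(statistics.stdev(rates))   # divides by n - 1 (like the TI-83's Sx)
print(statistics.pstdev(rates))  # divides by n (like the TI-83's σx); slightly smaller
```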

18
What should you use, when, and why?
  • Arithmetic mean or median?
  • Middletown is considering imposing an income tax
    on citizens. City hall wants a numerical summary
    of its citizens' income to estimate the total tax
    base.
  • In a study of the standard of living of typical
    families in Middletown, a sociologist makes a
    numerical summary of family income in that city.
  • Mean: Although income is likely to be
    right-skewed, the city government wants to know
    about the total tax base.
  • Median: The sociologist is interested in a
    typical family and wants to lessen the impact
    of extreme incomes.
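To see why each choice fits its purpose, here is a Python sketch on a small set of hypothetical, right-skewed incomes (made-up numbers, not Middletown data). Multiplying the mean by the number of citizens recovers the total, which is what city hall needs. The median describes a typical family and ignores how large the one extreme income is.

```python
import statistics

# Hypothetical incomes (dollars), illustration only; one extreme value skews them right
incomes = [30_000, 35_000, 40_000, 45_000, 850_000]

mean_income = statistics.mean(incomes)  # 200000
total = mean_income * len(incomes)      # the mean times n recovers the total tax base
print(total == sum(incomes))            # True
print(statistics.median(incomes))       # 40000: a "typical" income
```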

19
  • Finish reading section 1.2
  • Be sure to go over the Summary at the end of each
    section and know all the terminology
  • Do 1.56, 1.62-1.64, 1.67, 1.69, 1.75-1.77
    (Mean/Median Applet), 1.78, 1.79
  • Use JMP for any problem requiring more than very
    simple computations, or use the TI-83 for
    numerical (but not graphical) analysis.