Statistics 221 - PowerPoint PPT Presentation

Provided by: CBU

1
Statistics 221
  • Chapter 3 Part A
  • Descriptive Statistics

2
Summarizing Data
  • We learned in Chapter 2 that one way to derive
    knowledge (i.e., learn something) is to collect
    data regarding some phenomenon and then summarize
    and analyze it.
  • In Chapter 2, we learned about tabular and
    graphical techniques for summarizing data. In
    this chapter, we learn about numeric techniques
    for summarizing data.

3
Numeric techniques for summarizing data
  • Measures of Location (mean, median, mode,
    percentiles, quartiles)
  • Measures of Variability (range, inter-quartile
    range, variance, standard deviation, coefficient
    of variation)
  • Measures of Relative Location (z-scores) and
    Detecting Outliers
  • Exploratory Data Analysis (the 5-number summary
    and box plot)
  • Measures of Association Between Two Variables
    (covariance and correlation coefficient)

4
Parameters vs Statistics
  • If a numerical summary measure (such as a mean
    or average) is computed from a sample, it is
    referred to as a statistic; if it is computed
    from a population, it is referred to as a
    parameter.
  • When a sample is taken from a population and
    a statistic is calculated from the sample
    dataset, the sample statistic is considered to be
    a point estimate of the population parameter.
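
The distinction can be sketched in a few lines of Python. The population below is hypothetical (randomly generated values, not data from the slides), and the names `population`, `sample`, `mu`, and `x_bar` are illustrative only.

```python
import random
import statistics

# Hypothetical population of 1,000 values (an assumption, not from the slides).
rng = random.Random(221)
population = [rng.randint(400, 650) for _ in range(1000)]

# Parameter: computed from every value in the population.
mu = statistics.mean(population)

# Statistic: computed from a sample of 70 drawn from that population.
sample = rng.sample(population, 70)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic, point estimate of mu): {x_bar:.2f}")
```

The two printed values will usually be close but not identical, which is what "point estimate" means here.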

5
Measures of location (aka measures of central
tendency)
  • Here are the five we will learn:
  • Mean
  • Median
  • Mode
  • Percentiles
  • Quartiles

6
The mean (average)
  • The mean of a data set is the average of all the
    data values.
  • If the data are from a sample, the mean is
    denoted by x̄ (x-bar): $\bar{x} = \frac{\sum x_i}{n}$,
    where n is the sample size.
  • If the data are from a population, the mean is
    denoted by μ (mu): $\mu = \frac{\sum x_i}{N}$,
    where N is the population size.

7
Example: Apartment Rents
  • Given below is a sample of monthly rent values
    ($) for one-bedroom apartments. The data are a
    sample of 70 apartments in a particular city,
    presented in ascending order.

8
Calculating the mean rent
  • Add up all the rents and divide by the number of
    rents.
  • The mean is denoted by x̄ (x-bar).
  • $\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$

9
The Median
  • The median of a data set is the value in the
    middle when the data items are arranged in
    ascending order.
  • For an odd number of observations (n), the median
    is the middle value.
  • For an even number of observations (n), the
    median is the average of the two middle values.
  • The median may be reported instead of the mean
    when the data set includes a few extreme values.
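
A minimal sketch of the odd/even rule above, in Python. The function name `median` and the sample values are illustrative assumptions, not part of the slides.

```python
def median(values):
    """Median: the middle value (odd n) or the mean of the two middle values (even n)."""
    ordered = sorted(values)            # arrange the data in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]             # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

# Hypothetical data, chosen only to exercise both cases.
print(median([7, 1, 5]))       # 5
print(median([7, 1, 5, 3]))    # 4.0
```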

10
The median rent
  • i refers to index. Index is the position number
    of a value in a data set that has been arranged
    into ascending order.
  • i = 50% × 70 = 35
  • Since 70 is even, we average the values in the
    35th and 36th positions: Median = (475 + 475)/2
    = 475

11
The median rent
  • What would be the median rent if n = 25?
  • 25/2 = 12.5, round up to 13. The 13th value, 440,
    is the middle value.

12
The Mode
  • The mode of a data set is the value that occurs
    with greatest frequency.
  • The greatest frequency can occur at two or more
    different values.
  • If the data have exactly two modes, the data are
    bimodal.
  • If the data have more than two modes, the data
    are multimodal.
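
One way to find every mode and label the result, sketched in Python. The helper names `modes` and `describe` and the sample lists are assumptions for illustration. Note that under this simple rule a data set with no repeated values reports every value as a mode.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the greatest frequency."""
    counts = Counter(values)
    top = max(counts.values())     # raises ValueError on an empty data set
    return sorted(v for v, c in counts.items() if c == top)

def describe(values):
    """Label the data by its number of modes."""
    k = len(modes(values))
    return {1: "unimodal", 2: "bimodal"}.get(k, "multimodal")

# Hypothetical data sets.
print(modes([1, 2, 2, 3]), describe([1, 2, 2, 3]))          # [2] unimodal
print(modes([1, 1, 2, 2, 3]), describe([1, 1, 2, 2, 3]))    # [1, 2] bimodal
```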

13
The mode rent
  • 450 occurred most frequently (7 times), so the
    mode = 450

14
Percentiles
  • A percentile provides information about where a
    particular value falls in the rankings of all
    data values in the data set.
  • For example, admission test scores for colleges
    and universities are frequently reported in terms
    of percentiles.
  • So if you got a 25 on the ACT, a percentile score
    would tell you what percentage of people did
    worse than you.
  • If your score was in the 70th percentile, then
    approximately 70% of the students did worse than
    you, which means approximately 30% did better.

15
Calculating Percentiles
  • 1. Arrange the data in ascending order.
  • 2. Compute index i, the position of the pth
    percentile.
  • i = (p/100) × n
  • 3a. If i is not an integer, round up to the next
    integer. The pth percentile is the value in the
    ith position.
  • 3b. If i is an integer, the pth percentile is
    the average of the values in positions i and
    i + 1.
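
The steps above can be sketched as a small Python function. The name `percentile` is illustrative, and p is assumed to be a whole-number percentage so that the index can be computed with exact integer arithmetic. The example calls use the twelve midterm scores from the slides that follow.

```python
def percentile(values, p):
    """p-th percentile (p a whole number, 0 < p < 100) using the steps above."""
    if not values:
        raise ValueError("data set is empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                  # step 1: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)     # step 2: i = (p/100) * n, computed exactly
    if remainder:                             # step 3a: i is not an integer, so round up
        i = whole + 1
        return ordered[i - 1]                 # positions are 1-based, list indexes 0-based
    i = whole                                 # step 3b: i is an integer
    return (ordered[i - 1] + ordered[i]) / 2  # average positions i and i + 1

scores = [70, 73, 79, 82, 83, 87, 88, 90, 91, 94, 98, 100]
print(percentile(scores, 80))   # 94
print(percentile(scores, 50))   # 87.5
```

Other textbooks and software (including spreadsheet percentile functions) use interpolation rules that can give slightly different answers. This sketch follows only the round-up/average rule stated on the slide.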

16
Percentiles for Apartment Rents
  • What rent amount is the 90th percentile?
  • i = (p/100) × n
  • i = .90 × 70
  • i = 63
  • Since i is an integer, we average the numbers in
    the 63rd and 64th positions: (580 + 590)/2 = 585
  • At least 90% of the apartments have rents of 585
    or less.

17
Similar Percentile Question
  • Here are the scores on the midterm (n = 12)
  • 70 73 79 82 83 87 88 90 91
    94 98 100
  • If you know that you are in the 80th percentile,
    which of these is your score?
  • i = (p/100) × n
  • i = .8 × 12
  • i = 9.6
  • Since i is not an integer, we round up to 10.
  • The number in the 10th position is 94.
  • At least 80% of the scores are at or below your
    score of 94.

18
Another Percentile Question
  • Here are the scores on the midterm (n = 12)
  • 70 73 79 82 83 87 88 90 91
    94 98 100
  • If you got the 79, what percentile are you in?
  • After the dataset is sorted in ascending order,
    count the number of values below 79 and divide
    that by n
  • p = (number below you) / n
  • p = 2 / 12
  • p = 16.7%, round to the 17th percentile
  • About 17% of the scores are less than your
    score of 79.
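
The reverse lookup (from a score to its percentile) is just a count and a division. A minimal Python sketch, with the hypothetical helper name `percentile_rank`:

```python
def percentile_rank(values, score):
    """Percentage of values strictly below the given score."""
    below = sum(1 for v in values if v < score)   # count the values below the score
    return 100 * below / len(values)              # divide by n, expressed as a percentage

scores = [70, 73, 79, 82, 83, 87, 88, 90, 91, 94, 98, 100]
print(round(percentile_rank(scores, 79)))   # 17
```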

19
Another Percentile Question
  • Here are the scores on the midterm (n = 12)
  • 70 73 79 82 83 87 88 90 91
    94 98 100
  • If you got the 98, what percentile are you in?
  • p = (number below you) / n
  • p = 10 / 12
  • p = 83.3%, round to the 83rd percentile
  • About 83% of the scores are less than your
    score of 98.

20
Quartiles
  • Sometimes statisticians divide datasets into four
    parts called quartiles.
  • Quartiles are specific percentiles:
  • First Quartile: all the values in the 0-24th
    Percentile
  • Second Quartile: all the values in the 25th-49th
    Percentile
  • Third Quartile: all the values in the 50th-74th
    Percentile
  • Fourth Quartile: all the values in the 75th-100th
    Percentile

21
What are the quartile cut-off amounts (Q1, Q2,
Q3)?
  • iQ1 = 25th percentile = 25% × 70 = 17.5, rounded
    up to 18, so Q1 = 445
  • iQ2 = 50th percentile = 50% × 70 = 35, averaged
    with 36, so Q2 = (475 + 475)/2 = 475 (same as
    the median)
  • iQ3 = 75th percentile = 75% × 70 = 52.5, rounded
    up to 53, so Q3 = 525
22
What are the quartiles?
  • 1st quartile: all rents less than 445
  • 2nd quartile: all rents ≥ 445 and less than 475
  • 3rd quartile: all rents ≥ 475 and less than 525
  • 4th quartile: all rents ≥ 525
23
Open the file DataSetsForCh3 and click on the
worksheet Cereal - centrals (measures of
central tendency).
24
To calculate the mean, first we add up all the
values to get a sum. B18: =SUM(B2:B17)
25
Then count the number of values. B19:
=COUNT(B2:B17)
26
Then divide the sum by the count of values.
E2: =B18/B19
27
To calculate the median, find the middle value in
the sorted data set. To sort the dataset,
position the cell pointer on one of the cells in
the dataset. From the menu bar, click Data,
Sort
28
the entire dataset is selected and the sort
window opens. In the sort by box, select Grams
of sugar and make sure ascending is selected,
click ok
29
To find the index of the middle value, divide n
by 2. If n is odd, the quotient will not be an
integer, so round up using the CEILING( )
function. F3: =CEILING(B19/2, 1). (If n is even,
n/2 will be an integer and CEILING( ) will not do
any rounding.)
30
To calculate the median: since n is even, we
didn't have to round and i is an integer, so add
the values in positions i and i+1 (8 and 9), then
divide by 2. E3: =(B9 + B10)/2
31
To calculate the mode, identify the values that
occurred most often. E4: .13, .43, and .47
32
Excel's Built-in Functions
  • Excel has built-in formulas to calculate mean,
    median, and mode
  • average( )
  • median( )
  • mode( )

33
To find what percentile Cocoa Puffs is in, count
the number of values below that row, divide by
the number of values, and round up. E8:
=CEILING(13/B19, 0.01). Format that cell as a
percentage with 0 decimal places.
34
To find what quartile Cocoa Puffs is in, divide
the dataset into 4 quarters and see which quarter
Cocoa Puffs falls into. E9: 4th. You could also
calculate Q3 (the value of the 75th percentile)
and list all values greater than or equal to that
value.
35
To find what cereal is at the 30th percentile,
multiply .3 by the number of values and, if i is
not an integer, round up to get i (the index or
position number). F12: =CEILING(0.3 * B19, 1).
(If i is an integer, average the ith value with
the (i+1)th value.)
36
i = 16 × .3 = 4.8, and rounding up, i = 5. We
identify which cereal is listed in that
position. E12: Special K
37
To identify the third quartile, calculate Q2 and
Q3, and list the cereals in between. We know that
Q2 is the median (.345). To find the index of Q3,
first multiply n by .75. F13: =16 * 0.75
38
Since i is an integer (12), average the values
in positions i and i+1 (the 12th and 13th
positions) to calculate Q3. G13: =(0.44 + 0.45)/2,
which gives .445
39
Type in the names of the cereals that have
sugar content that is ≥ .345 (Q2) and < .445
(Q3).
Resave this file.
40
When the mean, median, and mode are not
aligned
  • The data is said to be skewed.
  • Data is skewed if it is not symmetric and if it
    extends more to one side than the other.

41
Skewness
  • Not skewed: symmetric
  • Skewed left: a few very small values in the data set
  • Skewed right: a few very large values in the data set
42
Which measure of central tendency should you
regard as most representative of a data set?
  • If there are a few extreme values in your data
    set, they may distort the mean but not the
    median or the mode.
  • Let's say you are a fund-raiser. Your last 10
    donations were
  • $5, $5, $15, $5, $10, $5, $10, $15, $10, and
    $1,000.
  • What do you want to tell the next person you
    solicit for a donation?
  • 1. That the average donation is over $100
    (actually it's $108.00)
  • 2. The median donation is $10.
  • 3. The mode donation is $5.
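
The three measures for the ten donations listed above can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable name `donations` is illustrative.

```python
import statistics

# The ten donations from the slide, in dollars.
donations = [5, 5, 15, 5, 10, 5, 10, 15, 10, 1000]

print(f"mean   = {statistics.mean(donations):.2f}")    # mean   = 108.00
print(f"median = {statistics.median(donations):.2f}")  # median = 10.00
print(f"mode   = {statistics.mode(donations)}")        # mode   = 5
```

The single $1,000 donation pulls the mean far above the other nine values, while the median and mode are unaffected by it.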

43
Which measure of central tendency should you
consider?
  • The median and the mode are often used to
    describe a typical value.
  • Let's say you are thinking about becoming a
    teacher and you are interested in knowing what
    type of starting salary you could expect after
    graduation. Which value might be most meaningful
    to you?
  • 1. The mean starting salary
  • 2. The median starting salary
  • 3. The mode starting salary

44
Measures of Variability
  • It is often desirable to consider measures of
    variability (dispersion) in addition to measures
    of location.
  • For example, in choosing supplier A or supplier B
    we might consider not only the average delivery
    time for each, but also the variability in
    delivery time for each.

45
Measures of Variability
  • Range
  • Inter-quartile Range
  • Variance
  • Standard Deviation
  • Coefficient of Variation

46
The Range
  • The range of a data set is the difference between
    the largest and smallest data values.
  • It is the simplest measure of variability.
  • It is very sensitive to the smallest and largest
    data values.

47
The range of apartment rents
  • The range is 615 - 425, or 190

48
Inter-quartile range
  • The interquartile range of a data set is the
    difference between the third quartile and the
    first quartile.
  • It is the range for the middle 50% of the data.
  • Examining the inter-quartile range of a dataset
    allows you to get a feel for the middle-range.

49
Example Inter-quartile Range
  • 3rd Quartile (Q3) = 525
  • 1st Quartile (Q1) = 445
  • Inter-quartile Range = Q3 - Q1 = 525 - 445 = 80
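
Because the full rent data set is not reproduced in this transcript, the sketch below applies the same two calculations to the twelve midterm scores from the percentile slides. The helper `percentile` is a hypothetical name. It assumes a sorted list and a whole-number p, and it follows the round-up/average rule from the Calculating Percentiles slide.

```python
def percentile(ordered, p):
    """p-th percentile of an already-sorted list (round-up/average rule)."""
    whole, remainder = divmod(p * len(ordered), 100)   # i = (p/100) * n, computed exactly
    if remainder:                                      # i is not an integer: round up
        return ordered[whole]                          # 1-based position whole + 1
    return (ordered[whole - 1] + ordered[whole]) / 2   # average positions i and i + 1

scores = sorted([70, 73, 79, 82, 83, 87, 88, 90, 91, 94, 98, 100])

data_range = scores[-1] - scores[0]   # largest value minus smallest value
q1 = percentile(scores, 25)
q3 = percentile(scores, 75)
iqr = q3 - q1                         # spread of the middle 50% of the data

print(data_range)    # 30
print(q1, q3, iqr)   # 80.5 92.5 12.0
```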

50
Variance
  • The variance is a measure of variability that
    utilizes all the data.
  • It is based on the difference between the value
    of each observation (xi) and the mean (x̄ for a
    sample, μ for a population).

51
Variance
  • The variance is the average of the squared
    differences between each data value and the mean.
  • If the data set is a sample, the variance is
    denoted by s².
  • If the data set is a population, the variance is
    denoted by σ².
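
Written out, the two variances take the standard forms below. They are reconstructed from the definitions on this slide, not copied from the original slide image. The sample version divides by n - 1 (the same divisor used in the Excel walkthrough later in this deck), while the population version divides by N.

```latex
% Sample variance: squared deviations from the sample mean, divided by n - 1
s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}

% Population variance: squared deviations from the population mean, divided by N
\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}
```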

52
Standard Deviation
  • The standard deviation of a data set is the
    positive square root of the variance.
  • It is measured in the same units as the data,
    making it easier than the variance to compare
    with the mean.
  • If the data set is a sample, the standard
    deviation is denoted s.
  • If the data set is a population, the standard
    deviation is denoted σ (sigma).

53
Coefficient of variation
  • The coefficient of variation indicates how large
    the standard deviation is in relation to the
    mean.
  • If the data set is a sample, the coefficient of
    variation is computed as $\frac{s}{\bar{x}} \times 100\%$.
  • If the data set is a population, the coefficient
    of variation is computed as $\frac{\sigma}{\mu} \times 100\%$.

54
Calculating the variance, standard deviation, and
coefficient of variation in Excel
  • We will walk through the formulas using the
    Cereal dataset.

55
Open the file DataSetsForCh3 and click on the
worksheet Cereal - dispersions (measures of
dispersion).
56
Enter the formula to calculate the mean (x̄). B18:
=AVERAGE(B2:B17)
57
Enter the formula to count the number of values
in the data set (n). B19: =COUNT(B2:B17)
58
Enter the formula to subtract the mean (x̄) from
the first xi. C2: =B2 - $B$18
59
Copy the formula in C2 down to C17 to subtract
the mean (x̄) from each of the other xi values.
60
Enter the formula to square the first xi's
difference from the mean (x̄). D2: =C2 * C2
61
Copy the formula in D2 down to D17 to square each
xi's difference from the mean (x̄).
62
Sum the squared differences of all the xi values
from the mean (x̄). D18: =SUM(D2:D17)
63
Calculate the variance by dividing the
sum of squares by n - 1. D21: =D18 / (B19 - 1)
64
Calculate the standard deviation by taking the
square root of the variance. D22: =SQRT(D21)
65
Calculate the coefficient of variation by
dividing the standard deviation by the mean. D23:
=D22 / B18. Format the cell as a percentage.
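
The same sequence of steps can be sketched in Python, with one variable per spreadsheet cell or column. The eight values below are hypothetical (the Cereal data are not reproduced in this transcript), and the final line uses the standard `statistics` module only as a cross-check.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample, NOT the Cereal data

n = len(values)                              # count of values (cell B19)
mean = sum(values) / n                       # mean (cell B18)
deviations = [x - mean for x in values]      # each x_i minus the mean (column C)
squares = [d * d for d in deviations]        # squared deviations (column D)
sum_of_squares = sum(squares)                # sum of squares (cell D18)
variance = sum_of_squares / (n - 1)          # sample variance (cell D21)
std_dev = math.sqrt(variance)                # sample standard deviation (cell D22)
cv = std_dev / mean                          # coefficient of variation (cell D23)

print(f"variance = {variance:.3f}")   # variance = 4.571
print(f"std dev  = {std_dev:.3f}")    # std dev  = 2.138
print(f"cv       = {cv:.1%}")         # cv       = 42.8%

# Cross-check against the library's sample variance.
print(math.isclose(variance, statistics.variance(values)))   # True
```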
66
Excel's Built-in Formulas
  • Standard deviation of a sample
  • stdev( )
  • Variance of a sample
  • var( )
  • Excel does not provide a built-in formula for the
    coefficient of variation, which is rarely used.

67
Excel's Descriptive Statistics
  • We can use Excel's data analysis tool to generate
    a table of all the descriptive statistics.

68
  • 1. Select all cells in the data set, B2:B17.
  • 2. From the menu bar, select Tools, Data
    Analysis

69
3. In the data analysis window, select
Descriptive Statistics and click ok
70
4. The input range should be B2:B17.
Summary statistics should be checked. New
worksheet ply should be selected. Click ok
71
5. See a new sheet created with the descriptive
statistics. Resize columns as necessary.
Notice that it did not list all three modes,
only the first mode.
72
6. Right-click on the sheet 2 sheet tab and
select Rename
73
7. Type the name Cereal Descriptives and press
enter. Resave the file.
74
Homework 4
  • 7 on page 84
  • Mean, median, 1st and 3rd quartiles, percentile
  • Use data sheet Music
  • 18 on page 92
  • Mean, median, range, std. deviation, coefficient
    of variation, make comparisons
  • Create new worksheet.