Chapter 3 Descriptive Statistics: Numerical Methods Part B - PowerPoint PPT Presentation

Title:

Chapter 3 Descriptive Statistics: Numerical Methods Part B

Description:

Descriptive Statistics: Numerical Methods Part B Measures of Relative Location and Detecting Outliers Exploratory Data Analysis Measures of Association Between Two ... – PowerPoint PPT presentation

Slides: 33
Provided by: Chan3171

Transcript and Presenter's Notes

1
Chapter 3 Descriptive Statistics: Numerical Methods, Part B
  • Measures of Relative Location and Detecting
    Outliers
  • Exploratory Data Analysis
  • Measures of Association Between Two Variables
  • The Weighted Mean and Working with Grouped Data

2
Measures of Relative Location and Detecting Outliers
  • z-Scores
  • Chebyshev's Theorem
  • Empirical Rule
  • Detecting Outliers

3
z-Scores
  • The z-score is often called the standardized
    value.
  • It denotes the number of standard deviations a
    data value $x_i$ is from the mean.
  • A data value less than the sample mean will have
    a z-score less than zero.
  • A data value greater than the sample mean will
    have a z-score greater than zero.
  • A data value equal to the sample mean will have a
    z-score of zero.
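
The sign rules above can be checked with a minimal sketch. The function name `z_score` is a hypothetical helper; the values 425, 615, 490.80 and 54.74 are the smallest rent, largest rent, sample mean and sample standard deviation quoted in the apartment rent example of this presentation.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Sample mean and standard deviation quoted for the apartment rents.
RENT_MEAN = 490.80
RENT_STD = 54.74

smallest = z_score(425, RENT_MEAN, RENT_STD)       # below the mean: negative
largest = z_score(615, RENT_MEAN, RENT_STD)        # above the mean: positive
at_mean = z_score(RENT_MEAN, RENT_MEAN, RENT_STD)  # equal to the mean: zero

print(round(smallest, 2), round(largest, 2), at_mean)
```

Rounded to two decimals, the two extreme values agree with the most extreme z-scores reported later for this data set (-1.20 and 2.27).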

4
Example: Apartment Rents
  • z-Score of Smallest Value (425):
    $z = \dfrac{x_i - \bar{x}}{s} = \dfrac{425 - 490.80}{54.74} = -1.20$
  • Standardized Values for Apartment Rents

5
Chebyshev's Theorem
  • At least $(1 - 1/z^2)$ of the items in any data
    set will be within z standard deviations of the
    mean, where z is any value greater than 1.
  • At least 75% of the items must be within
    z = 2 standard deviations of the mean.
  • At least 89% of the items must be within
    z = 3 standard deviations of the mean.
  • At least 94% of the items must be within
    z = 4 standard deviations of the mean.
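
As a rough illustration, the bound can be evaluated for any z greater than 1. The function name `chebyshev_lower_bound` is an assumption made for this sketch, not part of the original slides.

```python
def chebyshev_lower_bound(z):
    """Minimum fraction of items within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2

# Reproduce the three cases listed on the slide.
for z in (2, 3, 4):
    print(z, f"{chebyshev_lower_bound(z):.0%}")
```

The bound holds for any data set regardless of its shape, which is why it is weaker than the percentages given by the empirical rule for bell-shaped data.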

6
Example: Apartment Rents
  • Chebyshev's Theorem
  • Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74.
  • At least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$, or
    56%, of the rent values must be between
  • $\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
  • and
  • $\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$

7
Example: Apartment Rents
  • Chebyshev's Theorem (continued)
  • Actually, 86% of the rent values
    are between 409 and 573.

8
Empirical Rule
  • For data having a bell-shaped distribution:
  • Approximately 68% of the data values will be
    within one standard deviation of the mean.

9
Empirical Rule
  • For data having a bell-shaped distribution:
  • Approximately 95% of the data values will be
    within two standard deviations of the mean.

10
Empirical Rule
  • For data having a bell-shaped distribution:
  • Almost all (99.7%) of the items will be
    within three standard deviations of the mean.

11
Example: Apartment Rents
  • Empirical Rule

    Interval                        Number in Interval   % in Interval
    Within ±1s (436.06 to 545.54)   48/70                69%
    Within ±2s (381.32 to 600.28)   68/70                97%
    Within ±3s (326.58 to 655.02)   70/70                100%
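
The interval endpoints and percentages in this example can be recomputed with a short sketch. The helper name `interval_around_mean` is hypothetical; the mean, standard deviation and counts are the ones quoted in the presentation.

```python
def interval_around_mean(mean, std_dev, k):
    """Return (low, high) for mean +/- k standard deviations."""
    return (mean - k * std_dev, mean + k * std_dev)

RENT_MEAN, RENT_STD = 490.80, 54.74
# Number of the 70 rents inside each interval, as reported in the example.
counts_within = {1: 48, 2: 68, 3: 70}

for k, count in counts_within.items():
    low, high = interval_around_mean(RENT_MEAN, RENT_STD, k)
    print(f"+/-{k}s: {low:.2f} to {high:.2f}, {count / 70:.0%}")
```

The observed shares (69%, 97%, 100%) sit close to the 68%, 95% and 99.7% that the empirical rule predicts for bell-shaped data.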

12
Detecting Outliers
  • An outlier is an unusually small or unusually
    large value in a data set.
  • A data value with a z-score less than -3 or
    greater than +3 might be considered an outlier.
  • It might be:
      • an incorrectly recorded data value
      • a data value that was incorrectly included in
        the data set
      • a correctly recorded data value that belongs in
        the data set
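
A minimal sketch of the z-score criterion, assuming the sample standard deviation is used. The function name `z_score_outliers` and the `readings` data are hypothetical and serve only to show the mechanics.

```python
import statistics

def z_score_outliers(data, threshold=3.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(data)
    std_dev = statistics.stdev(data)  # sample standard deviation (n - 1)
    return [x for x in data if abs((x - mean) / std_dev) > threshold]

# Hypothetical data: nineteen identical readings and one extreme value.
readings = [10] * 19 + [100]
print(z_score_outliers(readings))
```

A flagged value is only a candidate for review. As the list above notes, it may still be a correctly recorded value that belongs in the data set.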

13
Example: Apartment Rents
  • Detecting Outliers
  • The most extreme z-scores are -1.20 and 2.27.
  • Using |z| > 3 as the criterion for an outlier,
    there are no outliers in this data set.
  • Standardized Values for Apartment Rents

14
Exploratory Data Analysis
  • Five-Number Summary
  • Box Plot

15
Five-Number Summary
  • Smallest Value
  • First Quartile
  • Median
  • Third Quartile
  • Largest Value

16
Example: Apartment Rents
  • Five-Number Summary
  • Lowest Value = 425, First Quartile = 450
  • Median = 475
  • Third Quartile = 525, Largest Value = 615
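
One possible way to compute a five-number summary is sketched below with Python's standard `statistics.quantiles`. Quartile conventions differ between textbooks and software, so this is an assumption-laden illustration and may not reproduce the quartiles on the slide. The function name and the `values` data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest) for a list of numbers."""
    # n=4 yields the three quartile cut points (default 'exclusive' method).
    q1, median, q3 = statistics.quantiles(data, n=4)
    return (min(data), q1, median, q3, max(data))

# Hypothetical data, not the apartment rents.
values = [1, 2, 3, 4, 5, 6, 7]
print(five_number_summary(values))
```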

17
Box Plot
  • A box is drawn with its ends located at the first
    and third quartiles.
  • A vertical line is drawn in the box at the
    location of the median.
  • Limits are located (not drawn) using the
    interquartile range (IQR).
  • The lower limit is located 1.5(IQR) below Q1.
  • The upper limit is located 1.5(IQR) above Q3.
  • Data outside these limits are considered
    outliers.

18
Box Plot (Continued)
  • Whiskers (dashed lines) are drawn from the ends
    of the box to the smallest and largest data
    values inside the limits.
  • The location of each outlier is shown with a
    separate symbol.
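
The limit rule described for the box plot can be sketched as follows. The function names are hypothetical; the quartiles 450 and 525 and the extreme rents 425 and 615 come from the apartment rent five-number summary in this presentation.

```python
def box_plot_limits(q1, q3):
    """Return (lower_limit, upper_limit), each 1.5 * IQR beyond a quartile."""
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr)

def outside_limits(data, q1, q3):
    """Return the values that fall outside the box plot limits."""
    lower, upper = box_plot_limits(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Quartiles from the apartment rent five-number summary.
print(box_plot_limits(450, 525))
# The smallest and largest rents are checked against those limits.
print(outside_limits([425, 615], 450, 525))
```

Because the smallest and largest rents both fall inside the limits, no other rent can fall outside them.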

19
Example: Apartment Rents
  • Box Plot
  • Lower Limit: Q1 - 1.5(IQR) = 450 - 1.5(75) = 337.5
  • Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(75) = 637.5
  • There are no outliers.

[Figure: box plot of the apartment rents on an axis running from 375 to 625]

20
Measures of Association Between Two Variables
  • Covariance
  • Correlation Coefficient

21
Covariance
  • The covariance is a measure of the linear
    association between two variables.
  • Positive values indicate a positive relationship.
  • Negative values indicate a negative relationship.

22
Covariance
  • If the data sets are samples, the covariance is
    denoted by $s_{xy}$.
  • If the data sets are populations, the covariance
    is denoted by $\sigma_{xy}$.
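
The slides do not preserve the covariance formula, so the sketch below assumes the usual sample definition, the sum of products of deviations divided by n - 1. The function name and the data are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    products = ((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves with x: positive covariance
falling = [10, 8, 6, 4, 2]   # moves against x: negative covariance

print(sample_covariance(x, rising), sample_covariance(x, falling))
```

The sign shows the direction of the relationship, but the magnitude depends on the units of both variables.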

23
Correlation Coefficient
  • The coefficient can take on values between -1 and
    1.
  • Values near -1 indicate a strong negative linear
    relationship.
  • Values near 1 indicate a strong positive linear
    relationship.
  • If the data sets are samples, the coefficient is
    $r_{xy}$.
  • If the data sets are populations, the coefficient
    is $\rho_{xy}$.
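
A minimal sketch of a sample correlation coefficient, assuming the Pearson product-moment definition (covariance divided by the product of the two standard deviations). The function name and data are hypothetical.

```python
import math

def sample_correlation(xs, ys):
    """Pearson correlation; the (n - 1) factors cancel, so raw sums are used."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
print(sample_correlation(x, [2, 4, 6, 8, 10]))   # perfectly increasing line
print(sample_correlation(x, [10, 8, 6, 4, 2]))   # perfectly decreasing line
print(sample_correlation(x, [2, 1, 4, 3, 5]))    # positive but imperfect
```

Unlike the covariance, the result does not depend on the units of measurement.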

24
The Weighted Mean and Working with Grouped Data
  • Weighted Mean
  • Mean for Grouped Data
  • Variance for Grouped Data
  • Standard Deviation for Grouped Data

25
Weighted Mean
  • When the mean is computed by giving each data
    value a weight that reflects its importance, it
    is referred to as a weighted mean.
  • In the computation of a grade point average
    (GPA), the weights are the number of credit hours
    earned for each grade.
  • When data values vary in importance, the analyst
    must choose the weight that best reflects the
    importance of each value.

26
Weighted Mean
  • $\bar{x} = \dfrac{\sum w_i x_i}{\sum w_i}$
  • where
  • $x_i$ = value of observation i
  • $w_i$ = weight for observation i
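
To illustrate the formula with the GPA case mentioned earlier, here is a small sketch. The function name, grade points and credit hours are hypothetical values chosen for the example.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical transcript: grade points earned and credit hours per grade.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [6, 3, 1]

print(weighted_mean(grade_points, credit_hours))
```

With these weights the result is 3.5, while the unweighted mean of the three grade points is 3.0, because most credit hours were earned at the highest grade.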

27
Grouped Data
  • The weighted mean computation can be used to
    obtain approximations of the mean, variance, and
    standard deviation for the grouped data.
  • To compute the weighted mean, we treat the
    midpoint of each class as though it were the mean
    of all items in the class.
  • We compute a weighted mean of the class midpoints
    using the class frequencies as weights.
  • Similarly, in computing the variance and standard
    deviation, the class frequencies are used as
    weights.

28
Mean for Grouped Data
  • Sample Data: $\bar{x} = \dfrac{\sum f_i M_i}{n}$
  • Population Data: $\mu = \dfrac{\sum f_i M_i}{N}$
  • where
  • $f_i$ = frequency of class i
  • $M_i$ = midpoint of class i
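
The grouped-data mean is a weighted mean of class midpoints with class frequencies as weights, which can be sketched as below. The function name and the three-class frequency distribution are hypothetical and are not the apartment rent data.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate mean: sum of f_i * M_i divided by the total frequency."""
    n = sum(frequencies)
    return sum(f * m for m, f in zip(midpoints, frequencies)) / n

# Hypothetical classes 5-15, 15-25 and 25-35, represented by their midpoints.
midpoints = [10, 20, 30]
frequencies = [2, 5, 3]

print(grouped_mean(midpoints, frequencies))
```

The result is an approximation because every item in a class is treated as if it sat at the class midpoint.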

29
Example: Apartment Rents
  • Given below is the previous sample of monthly
    rents for one-bedroom apartments, presented here
    as grouped data in the form of a frequency
    distribution.

30
Example: Apartment Rents
  • Mean for Grouped Data
  • This approximation differs by 2.41 from
    the actual sample mean of 490.80.

31
Variance for Grouped Data
  • Sample Data: $s^2 = \dfrac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
  • Population Data: $\sigma^2 = \dfrac{\sum f_i (M_i - \mu)^2}{N}$
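
A minimal sketch of the sample version, assuming the usual grouped-data definition in which squared deviations of the class midpoints are weighted by class frequency and divided by n - 1. The function names and the frequency distribution are hypothetical.

```python
import math

def grouped_sample_variance(midpoints, frequencies):
    """Approximate sample variance from class midpoints and frequencies."""
    n = sum(frequencies)
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    squared = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
    return squared / (n - 1)

def grouped_sample_std_dev(midpoints, frequencies):
    """Square root of the grouped sample variance."""
    return math.sqrt(grouped_sample_variance(midpoints, frequencies))

# Hypothetical classes 5-15, 15-25 and 25-35, represented by their midpoints.
midpoints = [10, 20, 30]
frequencies = [2, 5, 3]

print(grouped_sample_variance(midpoints, frequencies))
print(grouped_sample_std_dev(midpoints, frequencies))
```

For population data the divisor would be N rather than n - 1 and the deviations would be taken from the population mean.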

32
Example: Apartment Rents
  • Variance for Grouped Data
  • Standard Deviation for Grouped Data
  • This approximation differs by only 0.20
    from the actual standard deviation of 54.74.