Data Description - PowerPoint PPT Presentation

Slides: 36
Provided by: alyson9
Transcript and Presenter's Notes

Title: Data Description


1
CHAPTER 3
Data Description
2
Objectives
  • Summarize data using measures of central
    tendency, such as the mean, median, mode, and
    midrange.
  • Describe data using the measures of variation,
    such as the range, variance, and standard
    deviation.
  • Identify the position of a data value in a data
    set using various measures of position, such as
    percentiles, deciles, and quartiles.

3
Objectives (contd.)
  • Use the techniques of exploratory data analysis,
    including boxplots and five-number summaries, to
    discover various aspects of data.

4
Introduction
  • Statistical methods can be used to summarize
    data.
  • Measures of average are also called measures of
    central tendency and include the mean, median,
    mode, and midrange.
  • Measures that determine the spread of data values
    are called measures of variation or measures of
    dispersion and include the range, variance, and
    standard deviation.

5
Introduction (contd.)
  • Measures of position tell where a specific data
    value falls within the data set or its relative
    position in comparison with other data values.
  • The most common measures of position are
    percentiles, deciles, and quartiles.

6
Introduction (contd.)
  • The measures of central tendency, variation, and
    position are part of what is called traditional
    statistics. These measures are typically used
    to confirm conjectures about the data.

7
Introduction (contd.)
  • Another type of statistics is called exploratory
    data analysis (EDA). These techniques include
    the boxplot and the five-number summary. They
    can be used to explore data to see what they show.

8
Basic Vocabulary
  • A statistic is a characteristic or measure
    obtained by using the data values from a sample.
  • A parameter is a characteristic or measure
    obtained by using all the data values for a
    specific population.
  • When the data in a data set are ordered, the
    result is called a data array.

9
General Rounding Rule
  • In statistics, the basic rounding rule is that
    when a calculation involves several steps,
    rounding should not be done until the final
    answer is calculated.
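
As a small illustration of the rounding rule above, the following Python sketch compares rounding at every step with rounding once at the end. The values are hypothetical, chosen only so that the two approaches disagree.

```python
# Hypothetical measurements, chosen only for illustration.
values = [1.44, 1.44, 1.44]

# Rounding each value first, then summing: 1.4 + 1.4 + 1.4
rounded_early = sum(round(v, 1) for v in values)

# Carrying full precision and rounding only the final answer: 4.32 -> 4.3
rounded_late = round(sum(values), 1)

print(round(rounded_early, 1), rounded_late)
```

Rounding early loses a tenth here; with longer calculations the accumulated error can be larger.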

10
(3.2) The Arithmetic Average
  • The mean is the sum of the values divided by the
    total number of values.
  • Rounding rule: the mean should be rounded to one
    more decimal place than occurs in the raw data.
  • The type of mean that considers an additional
    factor is called the weighted mean.
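
The two definitions above can be sketched in a few lines of Python. The function names and the data (exam scores, grade points weighted by credit hours) are assumptions made up for this illustration.

```python
def mean(values):
    """Sum of the values divided by the total number of values."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Each value counts in proportion to its weight."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


scores = [3, 4, 8, 9]          # hypothetical sample
sample_mean = mean(scores)     # 24 / 4 = 6.0

grade_points = [4, 3, 2]       # hypothetical grades (A, B, C)
credits = [3, 4, 2]            # hypothetical credit hours, used as weights
gpa = weighted_mean(grade_points, credits)  # (12 + 12 + 4) / 9

print(sample_mean, round(gpa, 1))
```

The weighted mean reduces to the ordinary mean when all weights are equal.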

11
The Arithmetic Average
  • The Greek letter μ (mu) is used to represent the
    population mean.
  • The symbol x̄ (x-bar) represents the sample
    mean.
  • Assume that data are obtained from a sample
    unless otherwise specified.

12
Median and Mode
  • The median is the halfway point in a data set.
    The symbol for the median is MD.
  • The median is found by arranging the data in
    order and selecting the middle point.
  • The value that occurs most often in a data set is
    called the mode.
  • The mode for grouped data, or the class with the
    highest frequency, is the modal class.
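
A minimal Python sketch of the median and mode, using the standard library's statistics module. The data sets are hypothetical.

```python
import statistics

data = [7, 3, 5, 3, 9]                      # hypothetical sample
md_odd = statistics.median(data)            # sorted: 3 3 5 7 9 -> middle value
md_even = statistics.median([1, 2, 3, 4])   # mean of the two middle values

one_mode = statistics.multimode(data)               # 3 occurs most often
two_modes = statistics.multimode([1, 1, 2, 2, 3])   # two values tie

# When every value occurs once, multimode returns all of them;
# many textbooks would instead say such a data set has no mode.
all_tied = statistics.multimode([1, 2, 3])

print(md_odd, md_even, one_mode, two_modes, all_tied)
```

With an even number of values there is no single middle point, so the median is taken halfway between the two middle values.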

13
Midrange
  • The midrange is defined as the sum of the lowest
    and highest values in the data set divided by 2.
  • The symbol for midrange is MR.
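
The midrange definition translates directly into code. This is a sketch with hypothetical data; the function name is an assumption.

```python
def midrange(values):
    """Return (lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2


data = [2, 4, 9, 15, 3]                  # hypothetical sample
mr = midrange(data)                      # (2 + 15) / 2
mr_with_extreme = midrange(data + [100]) # (2 + 100) / 2

print(mr, mr_with_extreme)
```

Because only the two extreme values enter the calculation, a single very high or very low value moves the midrange sharply.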

14
Central Tendency: The Mean
  • One computes the mean by using all the values of
    the data.
  • The mean varies less than the median or mode when
    samples are taken from the same population and
    all three measures are computed for these
    samples.
  • The mean is used in computing other statistics,
    such as variance.

15
Central Tendency: The Mean (contd.)
  • The mean for the data set is unique, and not
    necessarily one of the data values.
  • The mean cannot be computed for an open-ended
    frequency distribution.
  • The mean is affected by extremely high or low
    values and may not be the appropriate average to
    use in these situations.

16
Central Tendency: The Median
  • The median is used when one must find the center
    or middle value of a data set.
  • The median is used when one must determine
    whether the data values fall into the upper half
    or lower half of the distribution.
  • The median is used to find the average of an
    open-ended distribution.
  • The median is affected less than the mean by
    extremely high or extremely low values.

17
Central Tendency: The Mode
  • The mode is used when the most typical case is
    desired.
  • The mode is the easiest average to compute.
  • The mode can be used when the data are nominal,
    such as religious preference, gender, or
    political affiliation.
  • The mode is not always unique. A data set can
    have more than one mode, or the mode may not
    exist for a data set.

18
Central Tendency: The Midrange
  • The midrange is easy to compute.
  • The midrange gives the midpoint between the
    lowest and highest values.
  • The midrange is affected by extremely high or low
    values in a data set.

19
Distribution Shapes
  • In a positively skewed or right skewed
    distribution, the majority of the data values
    fall to the left of the mean and cluster at the
    lower end of the distribution, with the tail to
    the right.

20
Distribution Shapes (contd.)
  • In a symmetrical distribution, the data values
    are evenly distributed on both sides of the mean.

21
Distribution Shapes (contd.)
  • When the majority of the data values fall to the
    right of the mean and cluster at the upper end of
    the distribution, with the tail to the left, the
    distribution is said to be negatively skewed or
    left skewed.
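
The descriptions of skewed distributions can be checked numerically. The sketch below, with two small hypothetical data sets, computes the share of values lying below the mean and compares the mean with the median. It is only an illustration of the definitions, not a formal test of skewness.

```python
import statistics


def share_below_mean(values):
    """Fraction of the data values that are smaller than the mean."""
    m = statistics.mean(values)
    return sum(1 for v in values if v < m) / len(values)


right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # long tail to the right
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]   # long tail to the left

for name, values in (("right", right_skewed), ("left", left_skewed)):
    print(name,
          statistics.mean(values),
          statistics.median(values),
          share_below_mean(values))
```

In the right skewed set most values fall below the mean and the mean exceeds the median; in the left skewed set the pattern is reversed.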

22
The Range
  • The range is the highest value minus the lowest
    value in a data set.
  • The symbol R is used for the range.
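
A one-line sketch of the range in Python, with hypothetical data and an assumed function name:

```python
def data_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)


data = [10, 60, 50, 30, 40, 20]   # hypothetical sample
r = data_range(data)              # 60 - 10

print(r)
```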

23
(3.3) Variance and Standard Deviation
  • The variance is the average of the squares of the
    distance each value is from the mean. The symbol
    for the population variance is σ² (sigma squared).

24
Variance and Standard Deviation
  • The standard deviation is the square root of the
    variance. The symbol for the population standard
    deviation is σ (sigma).
  • Rounding rule: the final answer should be rounded
    to one more decimal place than the original data.
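
The population variance and standard deviation defined above can be computed step by step, as in this sketch. The data set is hypothetical and is treated as a whole population; the variable names are assumptions.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical population

mu = sum(data) / len(data)             # population mean
squared_distances = [(x - mu) ** 2 for x in data]
sigma_sq = sum(squared_distances) / len(data)   # average squared distance
sigma = math.sqrt(sigma_sq)                     # square root of the variance

# The standard library offers the same population formulas.
lib_sigma_sq = statistics.pvariance(data)
lib_sigma = statistics.pstdev(data)

print(mu, sigma_sq, sigma)
```

For sample data, the usual formulas divide by n - 1 instead of n; in Python these are statistics.variance and statistics.stdev.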

25
Coefficient of Variation
  • The coefficient of variation is the standard
    deviation divided by the mean. The result is
    expressed as a percentage.
  • The coefficient of variation is used to compare
    standard deviations when the units are different
    for the two variables being compared.
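
A short sketch of the coefficient of variation. The two variables (units sold and commissions in dollars) and their summary figures are hypothetical; the point is only that the units differ, so the raw standard deviations cannot be compared directly.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation divided by the mean, as a percentage."""
    return std_dev / mean * 100


# Hypothetical summary statistics in different units.
cv_sales = coefficient_of_variation(std_dev=5, mean=87)             # units sold
cv_commissions = coefficient_of_variation(std_dev=773, mean=5225)   # dollars

print(round(cv_sales, 1), round(cv_commissions, 1))
```

The variable with the larger percentage is the more variable one relative to its own mean.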

26
Variance and Standard Deviation
  • Variances and standard deviations can be used to
    determine the spread of the data. If the variance
    or standard deviation is large, the data are more
    dispersed. The information is useful in comparing
    two or more data sets to determine which is more
    variable.
  • The measures of variance and standard deviation
    are used to determine the consistency of a
    variable.

27
Variance and Standard Deviation (contd.)
  • The variance and standard deviation are used to
    determine the number of data values that fall
    within a specified interval in a distribution.
  • The variance and standard deviation are used
    quite often in inferential statistics.

28
Chebyshev's Theorem
  • The proportion of values from a data set that
    will fall within k standard deviations of the
    mean will be at least 1 - 1/k², where k is a
    number greater than 1.
  • This theorem applies to any distribution
    regardless of its shape.
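
The bound in Chebyshev's theorem is easy to evaluate. A minimal sketch follows; the function name is an assumption.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


bound_2 = chebyshev_lower_bound(2)   # 1 - 1/4
bound_3 = chebyshev_lower_bound(3)   # 1 - 1/9

print(bound_2, round(bound_3, 3))
```

So at least 75% of the values lie within two standard deviations of the mean, and at least about 88.9% lie within three, whatever the shape of the distribution.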

29
Empirical Rule for Normal Distributions
  • The following apply to a bell-shaped
    distribution.
  • Approximately 68% of the data values fall within
    one standard deviation of the mean.
  • Approximately 95% of the data values fall within
    two standard deviations of the mean.
  • Approximately 99.7% of the data values fall
    within three standard deviations of the mean.
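
The percentages in the empirical rule are rounded areas under the normal curve. The sketch below recomputes them with the standard library's NormalDist class; the helper name share_within is an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1


def share_within(k):
    """Proportion of a normal distribution within k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: round(share_within(k) * 100, 1) for k in (1, 2, 3)}
print(shares)
```

The computed areas are close to 68%, 95%, and 99.7%, and each is well above the corresponding Chebyshev lower bound, which applies to any distribution.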

30
Standard Scores
  • A standard score or z score is used when direct
    comparison of raw scores is impossible.
  • A standard score or z score for a value is
    obtained by subtracting the mean from the value
    and dividing the result by the standard
    deviation.
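
A sketch of the z score calculation, comparing two raw scores from tests with different scales. The test names, scores, means, and standard deviations are hypothetical.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above or below the mean."""
    return (value - mean) / std_dev


# Hypothetical results on two tests with different scales.
z_calculus = z_score(value=65, mean=50, std_dev=10)
z_history = z_score(value=30, mean=25, std_dev=5)

print(z_calculus, z_history)
```

Although 65 and 30 cannot be compared directly, the z scores show that the first result sits further above its class mean than the second.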

31
Percentiles
  • Percentiles are position measures used in
    educational and health-related fields to indicate
    the position of an individual in a group.
  • A percentile, P, is an integer between 1 and 99
    such that the Pth percentile is a value where P%
    of the data values are less than or equal to the
    value and (100 - P)% of the data values are
    greater than or equal to the value.
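
The two conditions in the percentile definition can be checked directly for a given value. This sketch uses hypothetical scores and assumed helper names; it illustrates the definition only and is not one of the specific percentile-rank formulas that textbooks and software use.

```python
def percent_at_or_below(data, value):
    """Percentage of data values less than or equal to value."""
    return 100 * sum(1 for x in data if x <= value) / len(data)


def percent_at_or_above(data, value):
    """Percentage of data values greater than or equal to value."""
    return 100 * sum(1 for x in data if x >= value) / len(data)


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # hypothetical data
below = percent_at_or_below(scores, 7)      # 7 of 10 values
above = percent_at_or_above(scores, 7)      # 4 of 10 values

print(below, above)
```

Here 70% of the values are at or below 7 and at least 30% are at or above it, so 7 can serve as the 70th percentile of this data set.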

32
Quartiles and Deciles
  • Quartiles divide the distribution into four
    groups, separated by Q1, Q2, and Q3. Note that Q1
    is the same as the 25th percentile; Q2 is the same
    as the 50th percentile, or the median; and Q3
    corresponds to the 75th percentile.
  • Deciles divide the distribution into 10 groups.
    They are denoted by D1, D2, ..., D10.
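
Quartiles and deciles can be obtained with statistics.quantiles from the Python standard library, which returns the cut points between the groups. The data are hypothetical, and the function's default interpolation method is only one of several conventions, so other software or textbooks may give slightly different values.

```python
import statistics

data = list(range(1, 12))    # hypothetical data: 1, 2, ..., 11

quartiles = statistics.quantiles(data, n=4)    # cut points Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)     # cut points between 10 groups

q1, q2, q3 = quartiles
print(q1, q2, q3)
print(len(deciles))
```

Q2 coincides with the median of the data, and the function returns n - 1 cut points for n groups.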

33
Outliers
  • An outlier is an extremely high or an extremely
    low data value when compared with the rest of the
    data values.
  • Outliers can be the result of measurement or
    observational error.
  • When a distribution is normal or bell-shaped,
    data values that are beyond three standard
    deviations of the mean can be considered
    suspected outliers.
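
The three-standard-deviation guideline can be sketched as follows. The data are hypothetical and treated as a population, and the function name is an assumption. Note that in very small data sets no value can lie more than three standard deviations from the mean, so the check is only informative for larger sets.

```python
import statistics


def suspected_outliers(values, k=3):
    """Values lying more than k standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mu) > k * sigma]


# Hypothetical data: twenty ordinary values plus one extreme value.
data = [48, 49, 50, 51, 52] * 4 + [100]
flagged = suspected_outliers(data)

print(flagged)
```

A flagged value is only a suspect; whether it is a measurement error or a genuine observation has to be decided from the context.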

34
Exploratory Data Analysis
  • The purpose of exploratory data analysis is to
    examine data in order to find out what
    information can be discovered. For example:
  • Are there any gaps in the data?
  • Can any patterns be discerned?

35
Boxplots and Five-Number Summaries
  • Boxplots are graphical representations of a
    five-number summary of a data set. The five
    specific values that make up a five-number
    summary are:
  • The lowest value of the data set (minimum)
  • Q1 (or 25th percentile)
  • The median (or 50th percentile)
  • Q3 (or 75th percentile)
  • The highest value of the data set (maximum)
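
The five values listed above can be collected with a small helper. This is a sketch with hypothetical data and an assumed function name; the quartiles come from statistics.quantiles with its default method, and other quartile conventions may give slightly different Q1 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, q2, q3, max(values))


data = [5, 7, 8, 12, 13, 14, 18, 21, 23, 23, 27]   # hypothetical data
summary = five_number_summary(data)

print(summary)
```

In a boxplot, the box runs from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.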