Descriptive Statistics

Slides: 35
Provided by: Pena150


1
Descriptive Statistics
  • Lecture 02
  • Tabular and Graphical Presentation of Data and
    Measures of Locations

2
Presentation of Qualitative Variables
  • The simplest way of presenting/summarizing a
    qualitative variable is by using a frequency
    table, which shows the frequency of occurrence of
    each of the different categories.
  • Such a table could also include the relative
    frequency, which indicates the proportion or
    percentage of occurrence of each of the
    categories.
  • The frequency table could then be pictorially
    represented by a bar graph or a pie diagram.

3
An Example
  • A manufacturer of jeans has plants in California
    (CA), Arizona (AZ), and Texas (TX). A sample of
    25 pairs of jeans was randomly selected from a
    computerized database, and the state in which
    each was produced was recorded. The data are as
    follows
  • CA AZ AZ TX CA CA CA TX TX TX AZ AZ CA AZ TX CA
    AZ TX TX TX CA AZ AZ CA CA
  • Quite uninformative at this stage!
  • Need to summarize to reveal information.

4
The Frequency Table
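A minimal Python sketch of how the frequency table for the jeans sample could be tabulated. The data are copied from the example; the variable names (`states`, `counts`, `table`) are illustrative assumptions, not part of the lecture.

```python
from collections import Counter

# Sample of 25 production states, copied from the example slide.
states = ("CA AZ AZ TX CA CA CA TX TX TX AZ AZ CA AZ TX CA "
          "AZ TX TX TX CA AZ AZ CA CA").split()

n = len(states)
counts = Counter(states)

# Frequency table: category -> (frequency, relative frequency).
table = {state: (freq, freq / n) for state, freq in counts.most_common()}

print(f"{'State':<6}{'Freq':>5}{'Rel. freq':>11}")
for state, (freq, rel) in table.items():
    print(f"{state:<6}{freq:>5}{rel:>11.2f}")
print(f"{'Total':<6}{n:>5}{1.0:>11.2f}")
```

The relative frequencies obtained this way (0.36, 0.32, 0.32) are the ones used for the pie diagram angles.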
5
The Bar Chart
[Bar chart: frequency (axis from 0 to 10) for each state: TX, CA, AZ]
6
Example continued
  • By looking at this frequency table and bar graph,
    one can see that roughly equal proportions of
    pairs of jeans are manufactured in the three
    states.
  • The frequency table and bar graph are certainly
    more informative than the raw presentation of the
    sample data.
  • Another method of pictorial presentation of
    qualitative data is the pie diagram. In this
    case a pie is divided into the categories, with
    a given category's angle being equal to 360
    degrees times the relative frequency of
    occurrence of that category.

7
Pie Diagram
Angles (in degrees):
  • CA: (360)(0.36) = 129.6
  • AZ: (360)(0.32) = 115.2
  • TX: (360)(0.32) = 115.2
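The angle rule (360 degrees times the relative frequency) can be sketched in a few lines of Python. The counts 9, 8, and 8 are the frequencies implied by the relative frequencies 0.36, 0.32, and 0.32 for a sample of 25; the names used here are illustrative.

```python
# Frequencies from the jeans example: 9 CA, 8 AZ, 8 TX out of 25.
counts = {"CA": 9, "AZ": 8, "TX": 8}
n = sum(counts.values())

# Angle of each slice = 360 degrees x relative frequency.
angles = {state: 360 * freq / n for state, freq in counts.items()}

for state, angle in angles.items():
    print(f"{state}: {angle:.1f} degrees")

# The slices together must fill the whole circle.
total_angle = sum(angles.values())
print(f"Total: {total_angle:.1f} degrees")
```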
8
Pie Chart from Minitab
9
Presentation of Quantitative Variables
  • When the quantitative variable is discrete (such
    as counts), a frequency table and a bar graph
    could also be used for summarizing it.
  • The only difference is that the values of the
    variable cannot be reordered in the graph, in
    contrast to when the variable is categorical or
    qualitative.
  • For example, suppose that we asked a sample of 20
    students about the number of siblings in their
    family. The sample data might be
  • 4, 1, 6, 2, 2, 3, 4, 1, 2, 2, 3, 7, 2, 1, 1, 5,
    3, 4, 6, 3

10
Its Bar Graph is
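As a rough stand-in for the graph, the following Python sketch tabulates the sibling counts and prints a text bar for each value. The values stay in numerical order, as required for a discrete quantitative variable. Names such as `siblings` and `freqs` are illustrative.

```python
from collections import Counter

# Number of siblings reported by the 20 students in the example.
siblings = [4, 1, 6, 2, 2, 3, 4, 1, 2, 2, 3, 7, 2, 1, 1, 5, 3, 4, 6, 3]
counts = Counter(siblings)

# Keep every value from the minimum to the maximum, in numerical order,
# so that a value with zero frequency would still get its own (empty) bar.
values = list(range(min(siblings), max(siblings) + 1))
freqs = [counts[v] for v in values]

for v, f in zip(values, freqs):
    print(f"{v} | {'#' * f} ({f})")
```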
11
An Example of a Real Data Set: Poverty versus PACT in SC
Each triple below is one observation: Lunch ActualLang ActualMath
74 48 54; 77 43 55; 94 41 62; 88 49 62; 78 50 59; 79 46 58
61 41 47; 45 26 34; 87 49 62; 68 36 52; 76 45 56; 32 22 31
63 39 53; 33 20 26; 64 44 53; 39 20 22; 37 21 27; 47 23 30
40 29 41; 43 25 27; 37 24 31; 64 37 43; 59 36 45; 70 32 41
55 37 46; 90 38 47; 45 32 35; 31 25 24; 35 29 32; 15 14 18
73 30 41; 31 24 30; 75 45 57; 57 29 40; 80 51 63; 54 30 44
67 28 33; 76 45 50; 87 61 61; 54 27 33; 60 32 41; 35 26 35
51 29 36; 50 35 42; 43 23 26; 66 32 44; 86 63 75; 54 25 33
87 60 69; 49 29 37; 46 38 43; 50 38 44; 57 40 50; 90 60 75
26 17 20; 47 23 27; 53 37 39; 58 34 43; 16 13 15
59 32 38; 46 26 30; 90 63 67; 29 17 24; 41 24 26; 51 30 41
41 25 30; 43 32 36; 70 33 36; 93 50 66; 84 50 66; 64 27 32
52 36 43; 50 31 43; 53 28 35; 78 36 41; 57 31 42; 51 39 42
55 41 53; 60 37 45; 96 46 66; 75 34 45; 60 29 36; 71 43 53
68 42 51; 76 47 52; 82 49 55
12
Frequency Tables and Histograms
  • Consider the variable Lunch, which represents
    the percentage of students in the school district
    whose lunches are not free. The higher the value
    of this variable, the richer the district.
  • n = Number of Observations = 86
  • LV = Lowest Value = 15
  • HV = Highest Value = 96
  • Let us construct a frequency table with classes
    [10,20), [20,30), [30,40), ..., [90,100)
13
Frequency Table for Variable Lunch
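A minimal Python sketch of how the class frequencies for Lunch could be tabulated. It assumes that the Lunch values are the first number of each (Lunch, ActualLang, ActualMath) triple in the data listing; the variable names are illustrative, and the printed layout is not meant to reproduce the original table.

```python
# Lunch values: first entry of each triple in the data listing (assumption).
lunch = [
    74, 77, 94, 88, 78, 79, 61, 45, 87, 68, 76, 32, 63, 33, 64,
    39, 37, 47, 40, 43, 37, 64, 59, 70, 55, 90, 45, 31, 35, 15,
    73, 31, 75, 57, 80, 54, 67, 76, 87, 54, 60, 35, 51, 50, 43,
    66, 86, 54, 87, 49, 46, 50, 57, 90, 26, 47, 53, 58, 16,
    59, 46, 90, 29, 41, 51, 41, 43, 70, 93, 84, 64, 52, 50,
    53, 78, 57, 51, 55, 60, 96, 75, 60, 71, 68, 76, 82,
]

n = len(lunch)
edges = list(range(10, 101, 10))  # class boundaries 10, 20, ..., 100

# Class [a, a + 10) includes its left endpoint and excludes its right one,
# so integer division by the class width gives the class index.
freqs = [0] * (len(edges) - 1)
for x in lunch:
    freqs[(x - edges[0]) // 10] += 1

print(f"{'Class':<10}{'Freq':>5}{'Rel. freq':>11}")
for lo, hi, f in zip(edges[:-1], edges[1:], freqs):
    print(f"[{lo},{hi}){'':<3}{f:>5}{f / n:>11.3f}")
```

The same class frequencies are the bar heights of the frequency histogram.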
14
Frequency Histogram
15
Stem-and-Leaf Plots
  • An important tool for presenting quantitative
    data when the sample size is not too large is via
    a stem-and-leaf plot.
  • By using this method, there is usually no loss
    of information in that the exact values of the
    observations could be recovered (in contrast to a
    frequency table for continuous data).
  • Basic idea: divide each observation into a
    stem and a leaf.
  • The stems will serve as the body of the plant
    while the leaves will serve as the branches or
    leaves of the plant.
  • An illustration makes the idea transparent.

16
An Example
  • A random sample of 30 subjects from the 1910
    subjects in the blood pressure data set was
    selected. We present here the systolic blood
    pressures of these 30 subjects.
  • 30 Systolic Blood Pressures: 122 135 110 126 100
    110 110 126 94 124 108 110 92 98 118 110 102 108
    126 104 110 120 110 118 100 110 120 100 120 92
  • Lowest Value = 92, Highest Value = 135
  • Stems: 9, 10, 11, 12, 13
  • Leaves: ones digit

17
Stem-and-Leaf Plot
  •  9 | 224
  •  9 | 8
  • 10 | 00024
  • 10 | 88
  • 11 | 00000000
  • 11 | 88
  • 12 | 00024
  • 12 | 666
  • 13 |
  • 13 | 5
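The plot can be rebuilt with a short Python sketch. Each stem is split into two rows, one for leaves 0 to 4 and one for leaves 5 to 9; the function name `split_stem_and_leaf` is a hypothetical helper, not something from the lecture.

```python
from collections import defaultdict

# The 30 systolic blood pressures from the example.
bp = [122, 135, 110, 126, 100, 110, 110, 126, 94, 124, 108, 110, 92, 98, 118,
      110, 102, 108, 126, 104, 110, 120, 110, 118, 100, 110, 120, 100, 120, 92]

def split_stem_and_leaf(data):
    """Return (stem, half, leaves) rows; half 0 holds leaves 0-4, half 1 holds 5-9."""
    rows = defaultdict(list)
    for x in sorted(data):               # sorting puts the leaves in ascending order
        stem, leaf = divmod(x, 10)       # stem = tens and above, leaf = ones digit
        rows[(stem, leaf // 5)].append(leaf)
    lo, hi = min(data) // 10, max(data) // 10
    # Emit every row, including empty ones, so gaps in the data stay visible.
    return [(s, h, rows.get((s, h), []))
            for s in range(lo, hi + 1) for h in (0, 1)]

plot = split_stem_and_leaf(bp)
for stem, _, leaves in plot:
    print(f"{stem:>2} | {''.join(map(str, leaves))}")
```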

18
Stem-and-Leaf continued
  • In this stem-and-leaf plot, because there would
    be only 5 stems if we used 9, 10, 11, 12, 13, we
    decided to subdivide each stem into two parts,
    corresponding to leaf values ≤ 4 and those ≥ 5.
  • Such a procedure usually produces better-looking
    distributions.
  • Looking at this stem-and-leaf plot, notice that
    many of the observations are in the range of
    100-126.
  • The exact values could be recovered from this
    plot.
  • By arranging the leaves in ascending order, the
    plot also becomes more informative.

19
Comparative Stem-and-Leaf Plots
  • When comparing the distributions of two groups
    (e.g., when classified according to GENDER),
    side-by-side stem-and-leaf plots (also
    side-by-side histograms) could be used.
  • To illustrate, consider 30 observations from the
    blood pressure data set with Gender and Systolic
    Blood Pressure being the observed variables.
  • For the males (Sex = 0): 122, 120, 130, 110, 134,
    136, 142, 100, 120, 162, 126, 132, 124, 130
  • For the females (Sex = 1): 132, 94, 104, 100,
    130, 110, 102, 110, 130, 92, 125, 108, 100, 130,
    100, 100

20
Comparing Male/Female Systolic Blood Pressures
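One way to set the two groups side by side is a back-to-back stem-and-leaf plot, sketched below in Python. This is an illustrative layout with whole-tens stems and is not necessarily the one shown on the original slide; the helper `leaves_by_stem` is hypothetical.

```python
males = [122, 120, 130, 110, 134, 136, 142, 100, 120, 162, 126, 132, 124, 130]
females = [132, 94, 104, 100, 130, 110, 102, 110, 130, 92, 125, 108, 100, 130,
           100, 100]

def leaves_by_stem(data):
    """Map each tens-stem to its ones-digit leaves in ascending order."""
    out = {}
    for x in sorted(data):
        out.setdefault(x // 10, []).append(x % 10)
    return out

male_leaves = leaves_by_stem(males)
female_leaves = leaves_by_stem(females)
everyone = males + females
stems = range(min(everyone) // 10, max(everyone) // 10 + 1)

rows = []
for s in stems:
    # Male leaves are reversed so they grow outward to the left of the stem.
    left = "".join(str(d) for d in reversed(male_leaves.get(s, [])))
    right = "".join(str(d) for d in female_leaves.get(s, []))
    rows.append((left, s, right))

width = max(len(left) for left, _, _ in rows)
print(f"{'Males':>{width}} |    | Females")
for left, s, right in rows:
    print(f"{left:>{width}} | {s:>2} | {right}")
```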
21
Scatterplots: Studying the Relationship Between
Poverty and Math
Question: What kind of relationship is there
between Lunch and PACT Math Scores?
22
Numerical Summary Measures
  • Overview
  • Why do we need numerical summary measures?
  • Measures of Location
  • Measures of Variation
  • Measures of Position
  • Box Plots

23
Why Do We Need Summary Measures?
  • A picture is worth a thousand words, but beauty
    is always in the eyes of the beholder!
  • Graphs or pictures are sometimes unwieldy
  • One usually wants a small set of numbers that
    provide the important features of the data set
  • When making decisions, objectivity is enhanced
    when they are based on numbers!
  • Numerical summaries and tabular/graphical
    presentations complement each other

24
The Setting
  • In defining and illustrating our summary
    measures, assume that we have sample data
  • Sample Data: X1, X2, X3, ..., Xn
  • Sample Size: n
  • These summary measures are thus (sample)
    statistics.
  • If instead they are based on the population
    values, they will be (population) parameters.

25
Measures of Location or Center
  • These are summary measures that provide
    information on the center of the data set
  • Usually, these measures of location are where the
    observations cluster, but not always
  • In layman's terms, these measures are what we
    associate with averages
  • We will discuss two measures: the sample mean and
    the sample median

26
Sample Mean or Arithmetic Average
  • The sample mean equals the sum of the
    observations divided by the number of
    observations.
  • It is defined symbolically via
    X-bar = (X1 + X2 + ... + Xn) / n

27
Properties of the Sample Mean
  • Center of Gravity
  • Sum of the deviations of the observations from
    the mean is always zero (barring rounding errors)
  • The sample mean can, however, be affected
    drastically by extreme values or outliers
  • The sample mean is very conducive to mathematical
    analysis compared to other measures of location
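Two of these properties can be checked numerically. The sketch below uses a small hypothetical sample (not data from the lecture) to show that the deviations from the mean sum to zero and that a single extreme value drags the mean a long way.

```python
def sample_mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)

# Hypothetical illustrative sample, chosen so the arithmetic is easy to follow.
sample = [2, 4, 6, 8, 10]
mean = sample_mean(sample)

# Deviations of the observations from the mean; they cancel out exactly here.
deviations = [x - mean for x in sample]
sum_dev = sum(deviations)

# Replace the largest value with an outlier and recompute the mean.
with_outlier = [2, 4, 6, 8, 100]
mean_outlier = sample_mean(with_outlier)

print(f"mean = {mean}, sum of deviations = {sum_dev}")
print(f"mean with outlier = {mean_outlier}")
```

With real-valued data the sum of deviations may come out as a tiny nonzero number, which is the rounding error the slide mentions.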

28
Illustration
  • Consider the systolic blood pressure data set
    considered in Lecture 01
  • Sample Size: n = 30
  • Data: 122, 135, 110, 126, 100, 110, 110, 126, 94,
    124, 108, 110, 92, 98, 118, 110, 102, 108, 126,
    104, 110, 120, 110, 118, 100, 110, 120, 100, 120,
    92

29
Sample Mean Computation
  • This value of 111.1 could be interpreted as the
    balancing point of the 30 systolic blood pressure
    observations.
  • Locating this in the histogram we have
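The computation can be reproduced with a short Python sketch. The data are the 30 systolic blood pressures from the illustration; the variable names are illustrative.

```python
# The 30 systolic blood pressures from the illustration.
bp = [122, 135, 110, 126, 100, 110, 110, 126, 94, 124, 108, 110, 92, 98, 118,
      110, 102, 108, 126, 104, 110, 120, 110, 118, 100, 110, 120, 100, 120, 92]

n = len(bp)        # sample size
total = sum(bp)    # sum of the observations
mean = total / n   # sample mean

print(f"sum = {total}, n = {n}, sample mean = {mean:.1f}")
```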

30
Sample Mean in Histogram
31
Sample Median
  • Sample median (M): the value that divides the
    arranged/ordered data set into two equal parts.
  • At least 50% are ≤ M and at least 50% are ≥ M
  • Not sensitive to outliers but harder to deal with
    mathematically
  • Appropriate when histogram is left- or
    right-skewed
  • Better to present both mean and median in practice

32
Illustration of Computation of Median
  • Consider again the blood pressure data from earlier.
  • n = 30, an even number.
  • Median will be the average of the 15th and 16th
    observations in arranged data.
  • Arranged data: 92, 92, 94, 98, 100, 100, 100,
    102, 104, 108, 108, 110, 110, 110, 110, 110, 110,
    110, 110, 118, 118, 120, 120, 120, 122, 124, 126,
    126, 126, 135

33
Continued ...
  • The sample median is the average of 110 and 110,
    which are the 15th and 16th observations in the
    arranged data.
  • The median equals 110.
  • Note that it is very close to the sample mean
    value of 111.1
  • This closeness is because of the near symmetry of
    the distribution
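The same steps (arrange the data, then average the two middle observations when n is even) can be sketched in Python. The function `sample_median` is an illustrative helper; the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics

# The 30 systolic blood pressures from the illustration.
bp = [122, 135, 110, 126, 100, 110, 110, 126, 94, 124, 108, 110, 92, 98, 118,
      110, 102, 108, 126, 104, 110, 120, 110, 118, 100, 110, 120, 100, 120, 92]

def sample_median(data):
    """Middle value of the arranged data, or the average of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # For even n, average the (n/2)th and (n/2 + 1)th ordered observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

ordered_bp = sorted(bp)
median = sample_median(bp)

print(f"15th and 16th ordered values: {ordered_bp[14]}, {ordered_bp[15]}")
print(f"median = {median}, library check = {statistics.median(bp)}")
```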

34
Relative Positions of Mean and Median
  • For symmetric distributions, the mean and the
    median coincide.
  • For right-skewed distributions, the mean tends to
    be larger than the median (mean pulled up by the
    large extreme values)
  • For left-skewed distributions, the mean tends to
    be smaller than the median (mean pulled down by
    the small extreme values)
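These relative positions can be seen on small made-up samples. The three lists below are hypothetical and were chosen only to be symmetric, right-skewed, and left-skewed respectively.

```python
import statistics

# Hypothetical samples, for illustration only.
samples = {
    "symmetric": [1, 2, 3, 4, 5],
    "right-skewed": [1, 2, 2, 3, 3, 3, 4, 20],        # one large extreme value
    "left-skewed": [1, 17, 18, 18, 18, 19, 19, 20],   # one small extreme value
}

summary = {}
for name, data in samples.items():
    summary[name] = (statistics.mean(data), statistics.median(data))

for name, (mean, median) in summary.items():
    print(f"{name:<13} mean = {mean:6.2f}  median = {median:6.2f}")
```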