Provided by: Pin113
Transcript and Presenter's Notes

Title: Descriptive statistics (Part I)


1
Lecture 2
  • Descriptive statistics (Part I)

2
Lecture 2 Descriptive statistics
  • Data in raw form are usually not easy to use for
    decision making
  • Some type of organization is needed (table or
    graph)
  • Techniques reviewed here:
  • Bar charts and pie charts
  • Ordered array
  • Stem-and-leaf display
  • Frequency distributions, histograms
  • Cumulative distributions
  • Contingency tables

3
Tabulating and Graphing Univariate Categorical Data
  • Categorical Data
  • Tabulating Data: Summary Table
  • Graphing Data: Bar Charts, Pie Charts

4
Summary Table (for an Investor's Portfolio)
Investment Category   Amount (in thousands)   Percentage
Stocks                      46.5                 42.27
Bonds                       32                   29.09
CD                          15.5                 14.09
Savings                     16                   14.55
Total                      110                  100
Variables are Categorical
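The percentage column can be reproduced from the amounts. The following is a minimal sketch, not part of the original slides; the function name `percentage_table` is made up for illustration.

```python
# Amounts (in thousands) copied from the summary table.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}


def percentage_table(amounts):
    """Return each category's share of the total, in percent (2 decimals)."""
    total = sum(amounts.values())
    return {name: round(100 * value / total, 2) for name, value in amounts.items()}


percentages = percentage_table(portfolio)
for name, pct in percentages.items():
    # One row per category: name, amount, percentage of the total.
    print(f"{name:<10}{portfolio[name]:>8}{pct:>10.2f}")
```

Each percentage is the category amount divided by the total of 110, times 100.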
5
Bar Chart (for an Investor's Portfolio)
6
Pie Chart (for an Investor's Portfolio)
Amount invested (in thousands), shown as percentage shares
Stocks 42
Bonds 29
Savings 15
CD 14
Percentages are rounded to the nearest percent
7
Organizing Numerical Data
  • Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
  • Ordered Array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
  • Stem and Leaf Display
    2 | 1 4 4 6 7 7
    3 | 0 2 8
    4 | 1
  • Frequency Distributions and Cumulative
    Distributions (Tables, Histograms)

8
The Ordered Array
  • Data in raw form (as collected):
    24, 26, 24, 21, 27, 27, 30, 41, 32, 38
  • Data in ordered array from smallest to largest:
    21, 24, 24, 26, 27, 27, 30, 32, 38, 41
  • Shows range (min to max)
  • May help identify outliers (unusual
    observations)
  • If the data set is large, the ordered array is
    less useful
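
As a small illustration (a sketch added here, not taken from the slides), sorting the raw values gives the ordered array, and its two ends give the range:

```python
# Raw data as collected, from the slide.
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# The ordered array is the raw data sorted from smallest to largest.
ordered = sorted(raw)

# The ends of the ordered array give the minimum, the maximum, and the range.
smallest, largest = ordered[0], ordered[-1]
data_range = largest - smallest

print(ordered)
print(f"min={smallest}, max={largest}, range={data_range}")
```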

9
Stem-and-Leaf Display
  • A simple way to see distribution details in a
    data set
  • Method: separate the sorted data series
    into leading digits (the stems) and
    trailing digits (the leaves)

10
Example
  • Data in raw form (as collected): 24, 26, 24,
    21, 27, 27, 30, 41, 32, 38
  • Data in ordered array from smallest to largest:
    21, 24, 24, 26, 27, 27, 30, 32, 38, 41
  • Stem-and-Leaf Display

2 | 1 4 4 6 7 7
3 | 0 2 8
4 | 1
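
The display can be built mechanically. This is a rough sketch that assumes non-negative integers, with the tens digit as the stem and the units digit as the leaf; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(stems)


data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
display = stem_and_leaf(data)
for stem in sorted(display):
    # One row per stem, leaves in ascending order.
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```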
11
Tabulating Numerical Data: Frequency Distributions
  • What is a Frequency Distribution?
  • A frequency distribution is a list or a table
    containing class groupings (ranges within which
    the data fall) and the corresponding frequencies
    with which data fall within each grouping or
    category
  • It allows for a quick visual interpretation of
    the data

12
Tabulating Numerical Data: Frequency Distributions
  • Example: A manufacturer of insulation randomly
    selects 20 winter days and records the daily high
    temperature:
    24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
    32, 13, 12, 38, 41, 43, 44, 27, 53, 27

13
  • Sort Raw Data in Ascending Order: 12, 13,
    17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38,
    41, 43, 44, 46, 53, 58
  • Find Range: 58 - 12 = 46
  • Select Number of Classes: 5 (usually between 5
    and 15)
  • Compute Class Interval (Width): 10 (46/5 = 9.2,
    then round up)
  • Determine Class Boundaries (Limits): 10, 20, 30,
    40, 50, 60
  • Count Observations and Assign to Classes
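
The steps above can be sketched in code. Two choices here are assumptions for illustration, not stated in the slides: the first boundary is the minimum rounded down to a multiple of the width, and each class includes its lower boundary but not its upper one.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5
data_range = max(temps) - min(temps)          # 58 - 12 = 46
width = math.ceil(data_range / num_classes)   # 9.2 rounded up to 10

# Assumption: start at the minimum rounded down to a multiple of the width.
start = (min(temps) // width) * width
boundaries = [start + k * width for k in range(num_classes + 1)]

# Count the observations falling in each class [lower, upper).
frequencies = [
    sum(lower <= t < upper for t in temps)
    for lower, upper in zip(boundaries, boundaries[1:])
]

for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies):
    print(f"[{lower}, {upper})  {freq}")
```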

14
Frequency Distributions, Relative Frequency
Distributions and Percentage Distributions
Data in Ordered Array: 12, 13, 17, 21, 24, 24,
26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46,
53, 58
Class       Frequency   Relative Frequency   Percentage
[10, 20)        3             .15                15
[20, 30)        6             .30                30
[30, 40)        5             .25                25
[40, 50)        4             .20                20
[50, 60)        2             .10                10
Total          20            1                  100
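
A relative frequency is a class frequency divided by the total count, and a percentage is that ratio times 100. A minimal sketch using the class frequencies from the table (variable names are illustrative only):

```python
# Class frequencies copied from the table.
frequencies = {
    "[10, 20)": 3,
    "[20, 30)": 6,
    "[30, 40)": 5,
    "[40, 50)": 4,
    "[50, 60)": 2,
}

total = sum(frequencies.values())

# Relative frequency = frequency / total; percentage = 100 * frequency / total.
relative = {cls: freq / total for cls, freq in frequencies.items()}
percentage = {cls: 100 * freq / total for cls, freq in frequencies.items()}

for cls in frequencies:
    print(f"{cls:<10}{frequencies[cls]:>4}{relative[cls]:>8.2f}{percentage[cls]:>6.0f}")
```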
15
Graphing Numerical Data: The Histogram
  • A graph of the data in a frequency distribution
    is called a histogram
  • The class boundaries (or class midpoints) are
    shown on the horizontal axis
  • The vertical axis shows either frequency, relative
    frequency, or percentage
  • Bars of the appropriate heights are used to
    represent the number of observations within each
    class

16
Histogram Example
Class       Class Midpoint   Frequency
[10, 20)         15              3
[20, 30)         25              6
[30, 40)         35              5
[40, 50)         45              4
[50, 60)         55              2
Histogram: frequency on the vertical axis, class midpoints on the horizontal axis (no gaps between bars)
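
As a rough sketch, the class midpoints can be computed from the boundaries, and the bar heights drawn as rows of characters. The text rendering and the helper name `text_histogram` are illustrative stand-ins for a real chart, not something the slides use.

```python
boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]

# A class midpoint is the average of the lower and upper class boundaries.
midpoints = [(lower + upper) / 2 for lower, upper in zip(boundaries, boundaries[1:])]


def text_histogram(mids, freqs):
    """Return one line per class: the midpoint, then one '#' per observation."""
    return [f"{mid:>4.0f} | {'#' * freq}" for mid, freq in zip(mids, freqs)]


for line in text_histogram(midpoints, frequencies):
    print(line)
```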
17
Tabulating Numerical Data: Cumulative Frequency
Data in Ordered Array: 12, 13, 17, 21, 24, 24,
26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46,
53, 58
Upper Limit   Cumulative Frequency   Cumulative % Frequency
10                    0                      0
20                    3                     15
30                    9                     45
40                   14                     70
50                   18                     90
60                   20                    100
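
The cumulative columns are running totals of the class frequencies. A minimal sketch, assuming the class frequencies 3, 6, 5, 4, 2 from the frequency distribution:

```python
from itertools import accumulate

upper_limits = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]   # classes [10,20), [20,30), ..., [50,60)

# No observations lie below the first limit, so the running total starts at 0.
cumulative = [0] + list(accumulate(frequencies))
total = cumulative[-1]
cumulative_pct = [100 * count / total for count in cumulative]

for limit, count, pct in zip(upper_limits, cumulative, cumulative_pct):
    print(f"{limit:>3}{count:>6}{pct:>8.0f}")
```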
18
Two categorical variables (contingency table)
  • The following data represent the responses to two
    questions asked in a survey of 20 college students
    majoring in business
  • What is your gender? (Male = M, Female = F)
  • What is your major? (Accountancy = A,
    Information Systems = I, Marketing = M)
  • Gender: M M M F M F F M F M F M M M M F F M
    F F
  • Major: A I I M A I A A I I A A A M I
    M A A A I

19
Contingency table (contd)
  • Raw data set
  • Gender: M M M F M F F M F M F M M M M
    F F M F F
  • Major: A I I M A I A A I I A A
    A M I M A A A I

           A    I    M   Total
Male       6    4    1    11
Female     4    3    2     9
Total     10    7    3    20
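
The table can be reproduced by counting (gender, major) pairs in the raw data. A minimal sketch using only the standard library; the variable names are illustrative.

```python
from collections import Counter

gender = "M M M F M F F M F M F M M M M F F M F F".split()
major = "A I I M A I A A I I A A A M I M A A A I".split()

# Each student contributes one (gender, major) pair to the cell counts.
counts = Counter(zip(gender, major))

majors = ["A", "I", "M"]
rows = {"M": "Male", "F": "Female"}

print(f"{'':<8}" + "".join(f"{m:>5}" for m in majors) + f"{'Total':>7}")
for code, label in rows.items():
    cells = [counts[(code, m)] for m in majors]
    print(f"{label:<8}" + "".join(f"{c:>5}" for c in cells) + f"{sum(cells):>7}")

# Column totals give the marginal distribution of major.
column_totals = [sum(counts[(g, m)] for g in rows) for m in majors]
print(f"{'Total':<8}" + "".join(f"{c:>5}" for c in column_totals) + f"{sum(column_totals):>7}")
```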
20
Graphical methods are
  • Good in presenting data
  • Not easy for comparison
  • Difficult to use for statistical inference

21
Numerical description
Summary Measures
  • Central Tendency (location measures): Mean, Median, Mode
  • Quartiles
  • Variation: Range, Interquartile range, Variance,
    Standard Deviation

22
Mean
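  • Mean (Arithmetic Mean) of Data Values
  • Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$,
    where n is the sample size
  • Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$,
    where N is the population size
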
23
An example
  • TV watching hours/week: 5, 7, 3, 38, 7
  • Mean = (5 + 7 + 3 + 38 + 7)/5 = 60/5 = 12
  • If the correct time for the 4th subject is 8 (not 38):
  • Mean = (5 + 7 + 3 + 8 + 7)/5 = 30/5 = 6

Number line for 3, 5, 7, 7, 8: Mean = 6
Number line for 3, 5, 7, 7, 38: Mean = 12
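
The same arithmetic as a short sketch, showing how one extreme value shifts the mean (the function name `mean` is chosen here for illustration):

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


recorded = [5, 7, 3, 38, 7]    # hours/week as first recorded
corrected = [5, 7, 3, 8, 7]    # 4th subject corrected from 38 to 8

print(mean(recorded))
print(mean(corrected))
```

A single changed observation halves the mean, from 12 to 6.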
24
Mean (Contd)
  • The most common measure of central tendency,
    especially when n is large, because of its good
    theoretical properties
  • Affected by extreme values (outliers)

25
Median
  • Robust measure of central tendency
  • Not affected by extreme values
  • In an ordered array, the median is the middle
    number
  • If n is odd, the median is the middle number
    (i.e., the (n+1)/2-th measurement)
  • If n is even, the median is the average of the
    n/2-th and (n/2 + 1)-th measurements

Number line for 3, 5, 7, 7, 8: Median = 7
Number line for 3, 5, 7, 7, 38: Median = 7
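
A minimal sketch of the odd/even rule, applied to the TV-watching data from the mean example. The function name `median` is illustrative; indices in the code are 0-based, while the slide counts from 1.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # 0-based index n // 2 is the (n+1)/2-th measurement.
        return ordered[mid]
    # Average of the n/2-th and (n/2 + 1)-th measurements.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 7, 3, 38, 7]))
print(median([5, 7, 3, 8, 7]))
print(median([21, 24, 24, 26, 27, 27, 30, 32, 38, 41]))
```

Unlike the mean, the median stays at 7 whether the 4th value is 38 or 8.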
26
Mode
  • A Measure of Central Tendency
  • Value that Occurs Most Often
  • Not Affected by Extreme Values
  • There May Not Be a Mode
  • There May Be Several Modes
  • Used for Either Numerical or Categorical Data

Number line 0 to 6: No Mode
Number line 0 to 14: Mode = 9
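
A sketch of finding the mode or modes by counting. One convention is assumed here and labeled in the code: when there are several distinct values and all occur equally often, the data are reported as having no mode. The helper name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s) in sorted order, or [] if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    # Assumed convention: several distinct values, all equally frequent -> no mode.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([5, 7, 3, 38, 7]))          # numerical data, one mode
print(modes([0, 1, 2, 3, 4, 5, 6]))     # every value occurs once
print(modes([1, 1, 2, 2, 3]))           # several modes
print(modes("A I I M A I A A I I A A A M I M A A A I".split()))  # categorical data
```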
27
Quartiles
  • Split ordered data into 4 quarters, each holding
    25% of the observations
  • Position of i-th quartile: $Q_i$ is at position
    $i(n+1)/4$ in the ordered array
  • $Q_1$ (1st quartile) and $Q_3$ (3rd quartile)
    are measures of noncentral location
  • $Q_1$, $Q_2$ (the median), and $Q_3$ are called
    the 25th, 50th, and 75th percentiles respectively.
    A pth percentile is the value of X such that p% of
    the measurements are less than X and (100-p)% are
    greater than X.

28
Quartiles (example)
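Data in Ordered Array: 3 6 6 12 12 12 15
15 18 21
  • Position of first quartile is 1(10+1)/4 = 2.75,
    so Q1 = 6 (the 2nd and 3rd values are both 6)
  • Position of third quartile is 3(10+1)/4 = 8.25,
    so Q3 = 15 + 0.25(18 - 15) = 15.75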
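
A sketch of the quartile calculation, assuming the position rule i(n+1)/4 with linear interpolation between neighbouring values. That rule is an assumption: it is consistent with the value 15.75 shown in the 5-number summary, but some textbooks round the position instead. The function name `quartile` is illustrative.

```python
def quartile(ordered, i):
    """i-th quartile (i = 1, 2, 3) of already-sorted data.

    Assumes the 1-based position i*(n+1)/4 and linear interpolation
    between the two neighbouring ordered values.
    """
    n = len(ordered)
    position = i * (n + 1) / 4
    position = min(max(position, 1), n)   # keep the position inside the data
    lower = int(position)                 # 1-based rank at or below the position
    fraction = position - lower
    upper = min(lower + 1, n)
    low_value, high_value = ordered[lower - 1], ordered[upper - 1]
    return low_value + fraction * (high_value - low_value)


data = [3, 6, 6, 12, 12, 12, 15, 15, 18, 21]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print(q1, q2, q3)
```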

29
5-number summary
  • Box-and-Whisker Plot
  • Graphical display of data using 5-numbers

Data in Ordered Array: 3 6 6 12 12 12 15
15 18 21
X_smallest = 3, Q1 = 6, Median (Q2) = 12, Q3 = 15.75, X_largest = 21
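
The five numbers can also be computed with Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.quantiles` with `method="exclusive"` places the i-th quartile at position i(n+1)/4 and interpolates; `five_number_summary` is a hypothetical helper name.

```python
import statistics


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for the given data."""
    ordered = sorted(values)
    # n=4 asks for the three cut points that split the data into quarters.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


data = [3, 6, 6, 12, 12, 12, 15, 15, 18, 21]
summary = five_number_summary(data)
iqr = summary[3] - summary[1]   # interquartile range, Q3 - Q1

print(summary)
print(iqr)
```

A box-and-whisker plot draws the box from Q1 to Q3 with a line at the median, and whiskers out to the smallest and largest values.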