Looking at Data - PowerPoint PPT Presentation

About This Presentation
Title:

Looking at Data

Description:

Continuous variables take on many values on a finely-grained scale. ... Number of home runs per season hit by Babe Ruth as a NY Yankee: 54,59,35,41,46, ...

Slides: 27
Provided by: cdaMr

Transcript and Presenter's Notes

Title: Looking at Data


1
Looking at Data
  • Distributions

2
Definition
  • A characteristic of a subject that can be
    measured is a variable.

3
Types of Variables
  • Quantitative: takes on numerical values.
    • Continuous variables take on many values on a
      finely-grained scale. Examples are temperature,
      pressure, or speed measurements.
    • Discrete variables take on separate, countable
      values, usually integers. Examples are the number
      of children, number of DVDs owned, or number of
      times married.
  • Categorical: no inherent numerical scale; the
    values are groups.
    • Male/Female, Yes/No
    • Married/Single/Divorced/Hoping

4
Types of Variables
  • Categorical: no inherent numerical scale.
    • Sex: Male/Female
    • Opinion: Yes/No
    • Race/Ethnicity: White/Black/Hispanic/Asian/Other
    • Marital Status: Married/Single/Divorced/Hoping

5
The Distribution
  • The pattern of variation in a variable is called
    its distribution.
  • The distribution records the numerical values of
    the variable and how often each value occurs.
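
A minimal Python sketch of this idea, assuming the Babe Ruth home-run counts from the stemplot slides as sample data; the function name `distribution` is hypothetical. It lists each distinct value together with the number of times it occurs:

```python
from collections import Counter

# Babe Ruth's home runs per season as a NY Yankee (values from the stemplot slides).
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def distribution(values):
    """Return (value, count) pairs sorted by value.

    Each pair records one distinct value and how often it occurs,
    which is what a distribution summarizes.
    """
    return sorted(Counter(values).items())


for value, count in distribution(ruth):
    print(f"{value:>3}: {count}")
```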

6
M&M Activity
  • Not only does the total number of M&Ms differ
    from bag to bag, but the number of M&Ms of each
    color also varies greatly.

7
Distribution Features
  • Center or typical value.
  • Spread or variation.
  • Extreme values or outliers.
  • Distribution shape.
    • Symmetric
    • Skewed
    • Number of peaks (modes).

8
Center
  • An important feature of a distribution is its
    typical value, that is, the center of the
    distribution.
  • In graphical displays like a dotplot, the center
    is found by noting the region or values where
    most of the dots are located in the display.
  • What is the typical number of raisins in a box?
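
The slide locates the center by eye. As one possible numerical cross-check (an assumption: this slide does not name a specific measure), the sketch below takes the median of the Babe Ruth home-run counts from the stemplot slides with Python's standard `statistics` module:

```python
import statistics

# Babe Ruth's home runs per season as a NY Yankee (values from the stemplot slides).
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

# The median is the middle value once the data are sorted:
# half of the seasons fall at or below it, half at or above it.
center = statistics.median(ruth)
print(center)  # 46
```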

9
Spread
  • This refers to the amount and nature of the
    spread or variation in the data values.
  • We can measure the amount of spread by finding
    the difference between the largest and smallest
    values in the data list. This difference is called
    the range: Range = Max - Min.
  • We can also measure spread by locating the center
    of the distribution and then noting how far above
    and below the center we must go to capture the
    vast majority of the data.
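
A short sketch of both spread ideas, again using the Babe Ruth counts as sample data. The helper names are hypothetical, and the center (46, the median of these data) and the window half-width of 15 are illustrative choices, not values taken from the slide:

```python
# Babe Ruth's home runs per season as a NY Yankee (values from the stemplot slides).
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def data_range(values):
    """Range = Max - Min."""
    return max(values) - min(values)


def share_within(values, center, half_width):
    """Fraction of values no more than half_width above or below center."""
    inside = [v for v in values if center - half_width <= v <= center + half_width]
    return len(inside) / len(values)


print(data_range(ruth))            # 38, that is, 60 - 22
print(share_within(ruth, 46, 15))  # 13 of the 15 seasons lie between 31 and 61
```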

10
Extremes
  • The extremes of a data list can often be very
    informative.
  • They can help us identify outliers, values that
    are very different from the rest of the values in
    the list.
  • The gap in this histogram helps us visually
    identify a very strange observation.
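
The histogram for this slide is not reproduced here, but the idea of spotting a gap can be sketched in code. This assumes the Roger Maris home-run counts from the stemplot slides as sample data; `find_gaps` and its `min_gap` threshold are hypothetical, and what counts as a "large" gap is a judgment call:

```python
# Roger Maris's home-run counts (values from the stemplot slides).
maris = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]


def find_gaps(values, min_gap):
    """Return (low, high) pairs of neighboring sorted values more than min_gap apart."""
    ordered = sorted(values)
    return [(low, high) for low, high in zip(ordered, ordered[1:]) if high - low > min_gap]


print(find_gaps(maris, 10))  # [(39, 61)]: 61 sits far above the rest
```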

11
What accounts for the outlier?
  • [Figure: histograms of the same data with bin width 2 and bin width 1]

12
Symmetric Distributions

13
Number of Peaks (Modes)

14
Skewness
  • Skewed-left data have a distribution with a long
    tail that extends to the left: a few very small
    values.
  • Skewed-right data have a long tail extending to
    the right: a few very large values.
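
A crude way to check which tail is longer, sketched in Python under the assumption that comparing the distances from the median to the minimum and to the maximum is good enough for a first look. It is not a formal skewness measure, and a single extreme value can decide the result; `longer_tail` is a hypothetical helper:

```python
import statistics


def longer_tail(values):
    """Rough check of skew direction: which end lies farther from the median?

    A crude heuristic, not a formal skewness statistic; one extreme
    value can decide the answer.
    """
    med = statistics.median(values)
    left_tail = med - min(values)
    right_tail = max(values) - med
    if right_tail > left_tail:
        return "right"
    if left_tail > right_tail:
        return "left"
    return "neither"


# Home-run counts from the stemplot slides.
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
maris = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]
print(longer_tail(ruth), longer_tail(maris))  # left right
```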

15
Stemplot
  • Number of home runs per season hit by Babe Ruth
    as a NY Yankee: 54, 59, 35, 41, 46, 25, 47, 60, 54,
    46, 49, 46, 41, 34, 22
  • Stemplot display for these data (stem = tens digit,
    leaf = ones digit)
  • 2 25
  • 3 45
  • 4 1166679
  • 5 449
  • 6 0
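
The stemplot above can be generated with a short Python sketch. It assumes non-negative whole-number data, with the tens digit as the stem and the ones digit as the leaf; `stemplot` is a hypothetical helper, not a library function:

```python
def stemplot(values):
    """Return stemplot rows: stem = tens digit, leaves = ones digits in order.

    Every stem from the smallest to the largest is listed, even when it has
    no leaves, so gaps in the data stay visible. Assumes non-negative integers.
    """
    ordered = sorted(values)
    rows = []
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        leaves = "".join(str(v % 10) for v in ordered if v // 10 == stem)
        rows.append(f"{stem} {leaves}".rstrip())
    return rows


# Babe Ruth's home runs per season as a NY Yankee.
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
print("\n".join(stemplot(ruth)))
```

Run on Ruth's data, it prints the five rows shown above. Because every stem between the smallest and largest is kept, the same function applied to the Roger Maris counts on the following slides prints empty rows for stems 4 and 5, which makes the gap before his 61 visible.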

16
Stemplot
  • Number of home runs per season hit by Babe Ruth
    as a NY Yankee: 54, 59, 35, 41, 46, 25, 47, 60, 54,
    46, 49, 46, 41, 34, 22

17
Stemplot
  • Home runs for Roger Maris in his 10 American League
    seasons (1957-1966, the last 7 with the NY Yankees):
    8, 13, 14, 16, 23, 26, 28, 33, 39, 61.
  • Stemplot display
  • 0 8
  • 1 346
  • 2 368
  • 3 39
  • 4        <- Gap!
  • 5
  • 6 1      <- Outlier!

18
Splitting the Stems
  • Percent of adults who completed high school (the
    50 U.S. states)
  • 6 45667788
  • 7 01224455555666677777899999
  • 8 0000111122234457
  • Poor display: the data are too concentrated on a
    few stems. Spread out the display by splitting the
    stems: break each stem into one line for leaves
    0-4 and another for leaves 5-9.

19
Splitting the Stems
  • Better display
  • 6 4
  • 6 5667788
  • 7 012244
  • 7 55555666677777899999
  • 8 00001111222344
  • 8 57
  • The new display clearly shows a skewed-left
    pattern.
  • MS 64, KY 65, AR 66, AK 87, UT 85, MN 82, GA 71,
    WI 79, ND/SD 77.
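
A sketch of stem splitting in Python, under the same assumptions as a basic stemplot (non-negative whole numbers, tens digit as stem). The state percentages are read back from the unsplit stemplot on the previous slide, so they are whole-number values; `split_stemplot` is a hypothetical helper:

```python
def split_stemplot(values):
    """Stemplot with split stems: each tens-digit stem gets one row for
    leaves 0-4 and a second row for leaves 5-9. Assumes non-negative integers."""
    ordered = sorted(values)
    rows = []
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        for low, high in ((0, 4), (5, 9)):
            leaves = "".join(
                str(v % 10) for v in ordered if v // 10 == stem and low <= v % 10 <= high
            )
            rows.append(f"{stem} {leaves}".rstrip())
    return rows


# Percent of adults who completed high school, one value per state,
# read back from the unsplit stemplot (stems 6, 7, 8).
hs = [64, 65, 66, 66, 67, 67, 68, 68,
      70, 71, 72, 72, 74, 74, 75, 75, 75, 75, 75, 76, 76, 76, 76,
      77, 77, 77, 77, 77, 78, 79, 79, 79, 79, 79,
      80, 80, 80, 80, 81, 81, 81, 81, 82, 82, 82, 83, 84, 84, 85, 87]
print("\n".join(split_stemplot(hs)))
```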

20
Side by Side Stemplots
  • A way to compare two distributions.
  • Ruth Maris
  • 08
  • 1346
  • 522368
  • 54 339
  • 9766611 4
  • 944 5
  • 0 61
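
A possible Python sketch of a back-to-back stemplot, assuming the same tens-digit stems as before; the helper names are hypothetical. Left-hand leaves are reversed so that they grow outward from the stem:

```python
def leaves_for(values, stem):
    """Ones digits of the values on this tens-digit stem, in increasing order."""
    return "".join(str(v % 10) for v in sorted(values) if v // 10 == stem)


def back_to_back(left, right):
    """Back-to-back stemplot rows: 'left leaves | stem | right leaves'.

    Left-hand leaves are reversed so they read outward from the stem.
    Assumes non-negative integers.
    """
    everything = left + right
    stems = range(min(everything) // 10, max(everything) // 10 + 1)
    rows = [(leaves_for(left, s)[::-1], s, leaves_for(right, s)) for s in stems]
    width = max(len(lhs) for lhs, _, _ in rows)
    return [f"{lhs.rjust(width)} | {stem} | {rhs}".rstrip() for lhs, stem, rhs in rows]


# Home-run counts from the stemplot slides.
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
maris = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]
print("\n".join(back_to_back(ruth, maris)))
```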

21
Histograms
  • Very useful when you have a lot of data.
  • Step 1: divide the range of the data into groups
    (classes) of equal width.
  • Step 2: count the number of observations in each
    group.
  • Step 3: draw the histogram with bar heights
    proportional to the frequency of observations in
    each group.
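
The three construction steps can be sketched in Python as below. The class start and width are illustrative assumptions, the values are assumed to be whole numbers no smaller than `start`, and the bars are drawn sideways as rows of `*` rather than as vertical bars; `frequency_table` and `draw` are hypothetical helpers:

```python
def frequency_table(values, start, width):
    """Steps 1 and 2: split the data into equal-width classes beginning at
    `start` and count the observations in each. Empty classes are kept so
    gaps show up. Assumes integer values no smaller than `start`."""
    n_classes = (max(values) - start) // width + 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1
    return [(start + i * width, count) for i, count in enumerate(counts)]


def draw(table, width):
    """Step 3: one bar per class, bar length proportional to the count
    (drawn sideways as rows of '*')."""
    for lower, count in table:
        print(f"{lower:>3}-{lower + width - 1:<3} {'*' * count}")


# Babe Ruth's home runs per season as a NY Yankee (values from the stemplot slides).
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
draw(frequency_table(ruth, start=20, width=10), width=10)
```

Trying a different `width` is an easy way to see how the choice of bin width changes the picture.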

22
Histogram Example
  • Ages of diabetic patients: 48, 41, 57, 83, 41,
    etc.
  • Arrange values into a frequency table by groups.

23
Histogram

24
Histograms vs Bar Graphs
  • The bars touch in a histogram of quantitative
    data; the bars are separated in a bar graph of
    categorical data.

25
Histogram
  • 1996 States Dataset, built into StatCrunch
  • Metro is the percentage of each state's residents
    living in metro areas.
  • Produce a histogram and observe its features.

26
Homework
  • 1.1, 1.3, 1.15, 1.17, 1.18, 1.21, 1.31