Displaying Distributions with Graphs - PowerPoint PPT Presentation

Slides: 51
Provided by: nipis4
Transcript and Presenter's Notes

1
  • Displaying Distributions with Graphs

2
Interesting Problems
  • Poker games, lottery,
  • Sports statistics,
  • Political voting, poll, survey,
  • Business, stock market,
  • Census,
  • Marketing,
  • Biological, medical, psychological,
  • Practical for decision making

3
Recall
  • Statistics is the science of data.
  • Collecting
  • Analyzing
  • Decision making
  • Data
  • Individuals
  • Variables
  • Categorical variables
  • Quantitative variables

4
NBA Draft 2003 Top 5 Picks
5
Students in STAT 31
  • Class Roll
  • Variables: PID, College, Class, Degree, Major
    (all categorical)
  • How many categories? How many students in each
    category?
  • Equivalently, what is the distribution for each
    variable?

6
Distributions of Variables
  • The distribution of a variable indicates what
    values a variable takes and how often it takes
    these values.
  • For a categorical variable, distribution
  • For a quantitative variable, distribution

7
Variable Class
8
Exploratory Data Analysis (EDA)
  • Use statistical tools and ideas to help us
    examine data
  • Goal: to describe the main features of the data
  • NEVER skip this
  • EDA
  • Displaying distributions with
  • Displaying distributions with

9
Basic Strategies for EDA
  • Graphical visualizations
  • Numerical summaries
  • One variable at a time
  • Relationships among the variables

10
Graphic Techniques for Categorical Variables
  • Bar graph: uses bars to represent the frequencies
    (or relative frequencies), so that the height of
    each bar equals the frequency or relative
    frequency of its category.
  • Frequencies: counts
  • Relative frequencies: percents
  • Bar graph: height indicates count or percent
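
The counts and percents behind a bar graph can be tabulated in a few lines. The following is a minimal Python sketch, not part of the original slides: the function names `frequency_table` and `text_bar_graph` and the eight class-year values are hypothetical and serve only as an illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {cat: (c, 100.0 * c / n) for cat, c in counts.items()}

def text_bar_graph(values):
    """Return one text bar per category; bar length equals the count."""
    table = frequency_table(values)
    lines = []
    for cat in sorted(table):
        count, percent = table[cat]
        lines.append(f"{cat:<10} {'#' * count} {count} ({percent:.0f}%)")
    return lines

# Hypothetical class-year values for eight students (not the real class roll).
classes = ["Junior", "Senior", "Junior", "Sophomore",
           "Junior", "Senior", "Freshman", "Junior"]

for line in text_bar_graph(classes):
    print(line)
```

Each bar's length is the frequency; the percent in parentheses is the relative frequency, so either one could be used as the bar height.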

11
Graphic Techniques for Categorical Variables
12
Graphic Techniques for Quantitative Variables
  • Stemplot (Stem-and-Leaf Plot)
  • Histogram
  • Time plot

13
Example: Midterm Scores
  • The following data set contains the midterm
    exam scores:
    74 76 78 88 87 87 53 95 82 79 79 78
    62 80 77 70 60 60 84 95 85 93 79 84
    71 85 100 77 72 95 79 83 97 87 73 84
    74 83 85 95 62 50 86 83 86 36
  • Type of variable?

14
Stemplot
  • Separate each observation into a stem consisting
    of all but the final (rightmost) digit and a
    leaf, the final digit. Stems may have as many
    digits as needed, but each leaf contains only a
    single digit.
  • Write the stems in a vertical column with the
    smallest at the top, and draw a vertical line at
    the right of this column.
  • Write each leaf in the row to the right of its
    stem, in increasing order out from the stem.
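
The three steps above can be sketched in Python. This is an illustrative sketch that assumes non-negative integer data; the function name `stemplot` is a hypothetical helper, and the data are the midterm scores from the example.

```python
def stemplot(data):
    """Return stemplot rows as strings, smallest stem at the top."""
    leaves_by_stem = {}
    for x in data:
        # Stem: all but the final digit. Leaf: the final digit.
        stem, leaf = divmod(x, 10)
        leaves_by_stem.setdefault(stem, []).append(leaf)
    rows = []
    # List every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = sorted(leaves_by_stem.get(stem, []))
        row = f"{stem:>2} | " + "".join(str(d) for d in leaves)
        rows.append(row.rstrip())
    return rows

scores = [74, 76, 78, 88, 87, 87, 53, 95, 82, 79, 79, 78,
          62, 80, 77, 70, 60, 60, 84, 95, 85, 93, 79, 84,
          71, 85, 100, 77, 72, 95, 79, 83, 97, 87, 73, 84,
          74, 83, 85, 95, 62, 50, 86, 83, 86, 36]

for row in stemplot(scores):
    print(row)
```

Empty stems (here, 4) are kept on purpose so that gaps in the distribution stay visible.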

15
Example: Midterm Scores of STAT 101
  • The following data set contains the midterm
    exam scores:
    74 76 78 88 87 87 53 95 82 79 79 78
    62 80 77 70 60 60 84 95 85 93 79 84
    71 85 100 77 72 95 79 83 97 87 73 84
    74 83 85 95 62 50 86 83 86 36

16
Example: Midterm Scores of STAT 101
  • The stem-and-leaf display is as follows:
     3 | 6
     4 |
     5 | 03
     6 | 0022
     7 | 012344677889999
     8 | 02333444555667778
     9 | 355557
    10 | 0
  • Leaf: last digit
  • Stem: remaining digit(s)

17
Back-to-back Stemplot
  • Babe Ruth (New York Yankees) 1920-1934
  • 54 59 35 41 46 25 47 60 54 46 49 46 41 34 22
  • Mark McGwire (St. Louis Cardinals) 1986-2001
  • 3 49 32 33 39 22 42 9 9 39 52 58 70 65 32 29

18
Back-to-back Stemplot
           Ruth |   | McGwire
                | 0 | 3 9 9
                | 1 |
            5 2 | 2 | 2 9
            5 4 | 3 | 2 2 3 9 9
  9 7 6 6 6 1 1 | 4 | 2 9
          9 4 4 | 5 | 2 8
              0 | 6 | 5
                | 7 | 0
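
A back-to-back stemplot shares one column of stems between two data sets. The sketch below is one possible way to build it in Python, assuming non-negative integers; `back_to_back` is a hypothetical helper name, and the data are the home-run counts listed for Ruth and McGwire.

```python
def back_to_back(left, right):
    """Return rows of a back-to-back stemplot sharing one column of stems."""
    def group(data):
        groups = {}
        for x in data:
            stem, leaf = divmod(x, 10)
            groups.setdefault(stem, []).append(leaf)
        return groups

    lg, rg = group(left), group(right)
    stems = range(min(min(lg), min(rg)), max(max(lg), max(rg)) + 1)
    rows = []
    for s in stems:
        # Left-hand leaves increase outward from the stem, so they are
        # written in descending order when read left to right.
        l = " ".join(str(d) for d in sorted(lg.get(s, []), reverse=True))
        r = " ".join(str(d) for d in sorted(rg.get(s, [])))
        rows.append((l, s, r))
    width = max(len(l) for l, _, _ in rows)
    return [f"{l:>{width}} | {s} | {r}".rstrip() for l, s, r in rows]

ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
mcgwire = [3, 49, 32, 33, 39, 22, 42, 9, 9, 39, 52, 58, 70, 65, 32, 29]

for row in back_to_back(ruth, mcgwire):
    print(row)
```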

19
Splitting Stems and Rounding
  • For a moderate number of observations:
  • Split each stem into two: one with leaves 0-4 and
    the other with leaves 5-9.
  • This increases the number of stems and reduces
    the number of leaves on each stem.
  • Rounding
  • If many stems have no leaves or only one leaf,
    rounding may help.
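
Splitting stems can be sketched as a small change to an ordinary stemplot: each stem gets two rows, one for leaves 0-4 and one for leaves 5-9. The Python below is an illustrative sketch for non-negative integers; `split_stemplot` is a hypothetical helper name, and the data are the midterm scores from the earlier example.

```python
def split_stemplot(data):
    """Return stemplot rows with each stem split into leaves 0-4 and 5-9."""
    groups = {}
    for x in data:
        stem, leaf = divmod(x, 10)
        half = 0 if leaf <= 4 else 1      # 0: leaves 0-4, 1: leaves 5-9
        groups.setdefault((stem, half), []).append(leaf)
    first, last = min(groups), max(groups)
    rows = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            # Skip half-rows before the first or after the last observation.
            if (stem, half) < first or (stem, half) > last:
                continue
            leaves = sorted(groups.get((stem, half), []))
            row = f"{stem:>2} | " + "".join(str(d) for d in leaves)
            rows.append(row.rstrip())
    return rows

scores = [74, 76, 78, 88, 87, 87, 53, 95, 82, 79, 79, 78,
          62, 80, 77, 70, 60, 60, 84, 95, 85, 93, 79, 84,
          71, 85, 100, 77, 72, 95, 79, 83, 97, 87, 73, 84,
          74, 83, 85, 95, 62, 50, 86, 83, 86, 36]

for row in split_stemplot(scores):
    print(row)
```

With these scores the crowded 70s and 80s stems each become two shorter rows, which makes the shape easier to read.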

20
Spending at a supermarket
21
Example: A study on litter size
  • Data (170 observations)
  • 4 6 5 6 7 3 6 4 4 6 4 4 9 5 10 6
    6 5 6 8 2 7 7 7 9 3 7 5 7 7 4 5
    5 6 7 6 7 8 6 6 7 6 6 7 5 4 5 6 6
    1 3 4 7 5 4 7 5 8 8 5 6 8 5 5 4
    9 6 7 3 7 7 5 4 6 9 6 7 7 5 7 3 7
    6 5 3 7 10 5 6 8 7 5 5 7 5 5 8 9
    7 5 7 5 5 5 6 3 7 8 7 7 6 3 4 4 4
    7 2 7 8 5 8 6 6 5 6 4 7 5 5 6 9
    3 5 4 8 3 9 8 3 6 5 4 7 8 4 8 6 8
    5 6 4 3 8 8 6 9 5 5 6 6 7 6 8 6
    11 6 5 6 6 3

22
Stem-and-leaf plot for pups
  • 0 | 122333333333333344... (35)
  • 0 | 555555555555555555555555... (132)
  • 1 | 001

23
Limitations of Stemplot
  • Awkward for large data sets
  • Splitting stem/rounding is not very helpful.

24
Histogram
  • A histogram breaks the range of values of a
    quantitative variable into intervals and displays
    only the count or percent of the observations
    that fall into each interval.
  • You can choose any convenient number of
    intervals.
  • Intervals must be of equal width.
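
The counting step behind a histogram can be sketched as follows. This is a minimal Python illustration, not the method used to draw the slides' figures: the helper name `histogram_counts`, the choice of intervals starting at 30 with width 10, and the convention that each interval includes its left endpoint but not its right are all assumptions.

```python
def histogram_counts(data, start, width, k):
    """Count observations in k equal-width intervals.

    Interval i covers [start + i*width, start + (i+1)*width).
    Observations outside all k intervals are ignored.
    """
    counts = [0] * k
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < k:
            counts[i] += 1
    return counts

scores = [74, 76, 78, 88, 87, 87, 53, 95, 82, 79, 79, 78,
          62, 80, 77, 70, 60, 60, 84, 95, 85, 93, 79, 84,
          71, 85, 100, 77, 72, 95, 79, 83, 97, 87, 73, 84,
          74, 83, 85, 95, 62, 50, 86, 83, 86, 36]

counts = histogram_counts(scores, start=30, width=10, k=8)
percents = [100.0 * c / len(scores) for c in counts]

for i, c in enumerate(counts):
    low = 30 + 10 * i
    print(f"{low:>3}-{low + 10:<3} {'#' * c} {c}")
```

With width 10 the counts match the number of leaves on each stem of the stemplot for the same scores, which is a quick consistency check.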

25
Example: A study on litter size
26
Example: Call Center Data
  • Financial firm call center
  • Calls handled by AVI within 60 seconds
  • October: 666
  • December: 523
  • Avi Service Time Data

27
October
28
December
29
Notes for Making Histogram
  • Choose the number of classes sensibly
  • Too few classes: skyscraper graph
  • Too many classes: pancake graph
  • Sturges' rule
  • Choose number of classes k such that
  • where n is the sample size
  • Intervals must be of equal width.
  • Areas of the bars are proportional to the
    frequency.
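
As a rough guide to choosing the number of classes, the sketch below assumes the commonly cited form of Sturges' rule, k = 1 + log2(n) rounded up to a whole number; this form is an assumption and may differ from the version intended on the slide. The helper names `sturges_classes` and `class_width` are hypothetical.

```python
import math

def sturges_classes(n):
    """Number of classes k = ceil(1 + log2(n)) for a sample of size n.

    Assumes the commonly cited form of Sturges' rule.
    """
    return math.ceil(1 + math.log2(n))

def class_width(data, k):
    """Equal class width that covers the range of the data with k classes."""
    return (max(data) - min(data)) / k

# Sample sizes from the two examples: 46 midterm scores, 170 litters.
print(sturges_classes(46))
print(sturges_classes(170))
```

The rule only suggests a starting point; in practice the width is usually rounded to a convenient value so that the class boundaries are easy to read.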

30
Examining Distributions
  • Overall Pattern
  • Shape
  • Center (numerical, Lecture 3)
  • midpoint
  • Spread (numerical, Lecture 3)
  • range
  • Deviations
  • Outliers: values that fall outside the
    overall pattern.

31
Shapes of Distributions
  • Graphs can help to determine shapes.
  • Modes: peaks of a distribution.
  • Unimodal: one peak
  • Bimodal: two peaks
  • Symmetric or skewed?

32
Shakespeare's Words
33
A unimodal histogram
34
Tuition and fees
35
A bimodal histogram
36
Shakespeare's Words
37
Skewness
Left skewed
Right skewed
38
Iowa Test of Basic Skills vocabulary scores
39
A study on litter size
40
Bell-shaped Histograms
41
Summary: Shapes of Distributions
  • Symmetric
  • a histogram in which the right half is a mirror
    image of the left half.
  • Skewed to the right
  • a histogram in which the right tail is more
    stretched out than the left (long tail to the
    right).
  • Skewed to the left
  • a histogram in which the left tail is more
    stretched out than the right (long tail to the
    left).
  • Number of modal classes
  • the number of distinct peaks in a histogram.
  • Bell-shaped
  • a histogram that looks like a bell.
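
These shape descriptions can be turned into rough checks on a list of histogram counts. The Python below is a crude sketch under stated assumptions, not a standard procedure: the helper names are hypothetical, a "peak" is taken to be any bin strictly taller than both neighbors, and the tail comparison simply counts bins on each side of the tallest bin. Real histograms are noisy, so judging shape by eye remains the main tool.

```python
def is_symmetric(counts):
    """True if the right half is an exact mirror image of the left half."""
    return counts == counts[::-1]

def count_peaks(counts):
    """Count bins strictly taller than both neighbors (edges border on 0)."""
    padded = [0] + list(counts) + [0]
    return sum(1 for i in range(1, len(padded) - 1)
               if padded[i] > padded[i - 1] and padded[i] > padded[i + 1])

def longer_tail(counts):
    """Return 'left', 'right', or 'neither' by counting bins on each
    side of the tallest bin (a crude stand-in for skewness)."""
    peak = counts.index(max(counts))
    left, right = peak, len(counts) - 1 - peak
    if left > right:
        return "left"
    if right > left:
        return "right"
    return "neither"

# Hypothetical bin counts chosen only to illustrate each shape.
bell_like = [1, 3, 6, 3, 1]
two_humps = [2, 7, 3, 1, 4, 8, 2]
# Midterm score counts for intervals of width 10 starting at 30.
midterm = [1, 0, 2, 4, 15, 17, 6, 1]

print(is_symmetric(bell_like), count_peaks(bell_like))
print(count_peaks(two_humps))
print(longer_tail(midterm))
```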

42
Time plots
  • A time plot of a variable plots each observation
    against the time at which it was measured.
  • Time: x-axis
  • Variable: y-axis
  • Examples: stock price, unemployment rate, daily
    temperature
  • Great for identifying changing patterns related
    to time.
  • What to look for
  • .
  • .
  • .

43
Example: Number of Suicides in USA (1900-1970)
44
Call Center Daily Call Volume in Sep. 2002
45
Call Center Monthly Call Volume in 2002
46
Outliers
  • Observations that lie outside the overall pattern
    of a distribution.
  • Possible reasons
  • error in data entry (most likely reason)
  • extraordinary individuals (Jordan's salary)

47
Handling Outliers
  • Detect outliers using graphical and numerical
    methods.
  • Check the data to make sure they were entered
    correctly.
  • Reduce the influence of the outlier:
  • delete the observation (BE CAREFUL!)
  • use transformations or robust methods.

48
Speed of Light (Histogram)
49
Speed of Light (Time plot)
50
Remember
  • Distribution of variables
  • Examine distributions
  • Overall pattern
  • Shape
  • Symmetric or skewed
  • How many modes?
  • Bell-shaped
  • Outliers
  • Graphical tools for categorical data
  • Bar graph
  • Pie chart
  • Graphical tools for quantitative data
  • Stemplot
  • Histograms
  • Time plots