Chapter 2 online slides - PowerPoint PPT Presentation

About This Presentation
Title:

Chapter 2 online slides

Slides: 48
Provided by: Robert1889

Transcript and Presenter's Notes

Title: Chapter 2 online slides


1
Chapter 2 online slides
2
Chapter 2
  • Frequency Distributions, Stem-and-leaf displays,
    and Histograms

3
Where have we been?
4
To calculate SS, the variance, and the standard
deviation: find the deviations from μ, square and
sum them (SS), divide by N (σ²), and take the square
root (σ).
Example: Scores on a Psychology quiz
Student    John   Jennifer  Arthur  Patrick  Marie
X          7      8         3       5        7
X - μ      1.00   2.00      -3.00   -1.00    1.00
(X - μ)²   1.00   4.00      9.00    1.00     1.00
σ² = SS/N = 3.20
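The steps above can be sketched in Python using the five quiz scores from the example. The variable names are illustrative only; the calculation follows the population formula (dividing by N) described on the slide.

```python
import math

# Quiz scores from the example (John, Jennifer, Arthur, Patrick, Marie)
scores = [7, 8, 3, 5, 7]

n = len(scores)
mu = sum(scores) / n                      # population mean
deviations = [x - mu for x in scores]     # X - mu for each student
ss = sum(d ** 2 for d in deviations)      # sum of squared deviations (SS)
variance = ss / n                         # sigma squared = SS / N
sd = math.sqrt(variance)                  # sigma = square root of the variance

print(f"mu = {mu:.2f}, SS = {ss:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
# prints: mu = 6.00, SS = 16.00, variance = 3.20, sd = 1.79
```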
5
Ways of showing how scores are distributed around
the mean
  • Frequency Distributions
  • Stem-and-leaf displays
  • Histograms

6
Some definitions
  • Frequency Distribution - a tabular display of the
    way scores are distributed across all the
    possible values of a variable
  • Absolute Frequency Distribution - displays the
    count (how many there are) of each score.
  • Cumulative Frequency Distribution - displays the
    total number of scores at and below each score.
  • Relative Frequency Distribution - displays the
    proportion of each score.
  • Relative Cumulative Frequency Distribution -
    displays the proportion of scores at and below
    each score.

7
Example Data
  • Traffic accidents by bus drivers
  • Studied 708 bus drivers, all of whom had worked
    for the company for the past 5 years or more.
  • Recorded all accidents for the last 4 years.
  • Data look like: 3, 0, 6, 0, 0, 2, 1, 4, 1, 6,
    0, 2
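
As a rough illustration of the four kinds of frequency distribution defined earlier, the sketch below tabulates only the 12 scores listed on this slide (an excerpt, not the full set of 708 drivers). The names `sample` and `table` are made up for this example.

```python
from collections import Counter

# The 12 scores listed on the slide (a small excerpt, not all 708 drivers)
sample = [3, 0, 6, 0, 0, 2, 1, 4, 1, 6, 0, 2]

n = len(sample)
counts = Counter(sample)

print("score  abs  cum  rel    cum_rel")
running = 0
table = []
for score in range(min(sample), max(sample) + 1):
    freq = counts[score]          # absolute frequency (0 if the score never occurs)
    running += freq               # cumulative frequency: scores at or below this one
    row = (score, freq, running, freq / n, running / n)
    table.append(row)
    print(f"{score:5d}  {freq:3d}  {running:3d}  {freq / n:.3f}  {running / n:.3f}")
```

Scores that never occur in the excerpt (here, 5) still get a row with an absolute frequency of 0, so the cumulative columns stay aligned with every possible value.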

8
Frequency Distributions
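# of accidents   Absolute frequency   Relative frequency
0                117                  .165
1                157                  .222
2                158                  .223
3                115                  .162
4                78                   .110
5                44                   .062
6                21                   .030
7                7                    .010
8                6                    .008
9                1                    .001
10               3                    .004
11               1                    .001

Calculate relative frequency.
Divide each absolute frequency by N.
For example, 117/708 = .165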
9
What pops out of such a display
  • 18 drivers (about 2.5% of the drivers) had 7 or
    more accidents during the 4 years just before the
    study.
  • Those 18 drivers caused 147 of the 1,623 accidents,
    or about 9% (9.06%) of the accidents.
  • Maybe they should be given eye/reflex exams?
  • Maybe they should be given desk jobs?

11
What can you answer?
[Table: relative frequency for each # of accidents, 0 through 11]

Percent with at most 1 accident?
.165 + .222 = .387
.387 × 100 = 38.7%
Proportion with 8 or more accidents?
.008 + .001 + .004 + .001 = .014
Percent with between 4 and 7 accidents?
.110 + .062 + .030 + .010 = .212 = 21.2%
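The three answers above can be reproduced with a short sketch. The absolute frequencies are not listed on this slide; the ones used here are recovered by differencing the cumulative frequencies given on the Cumulative Frequencies slide, and they sum to N = 708. Relative frequencies are rounded to three decimals before summing, as the slide does. With unrounded values the proportion with 8 or more accidents is 11/708, or about .016, rather than .014.

```python
# Absolute frequencies for 0 through 11 accidents, recovered by differencing
# the cumulative frequencies shown on the "Cumulative Frequencies" slide.
freqs = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]
n = sum(freqs)  # 708 drivers

# Relative frequency of each score, rounded to 3 decimals as on the slides
rel = [round(f / n, 3) for f in freqs]

at_most_1 = sum(rel[0:2])       # scores 0 and 1
eight_plus = sum(rel[8:])       # scores 8, 9, 10, 11
four_to_seven = sum(rel[4:8])   # scores 4, 5, 6, 7

print(f"At most 1 accident:    {at_most_1:.3f} ({at_most_1 * 100:.1f}%)")
print(f"8 or more accidents:   {eight_plus:.3f}")
print(f"Between 4 and 7:       {four_to_seven:.3f} ({four_to_seven * 100:.1f}%)")
```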
12
Cumulative Frequencies
# of accidents   Cumulative frequency   Cumulative relative frequency
0                117                    .165
1                274                    .387
2                432                    .610
3                547                    .773
4                625                    .883
5                669                    .945
6                690                    .975
7                697                    .984
8                703                    .993
9                704                    .994
10               707                    .999
11               708                    1.000

Cumulative frequencies show the number of scores at
or below each point.
Calculate by adding the frequencies of all scores at
or below each point.
Cumulative relative frequencies show the
proportion of scores at or below each point.
Calculate by dividing cumulative frequencies by N
at each point.
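A minimal sketch of the two calculations just described, using a running sum. The absolute frequencies below are the ones implied by the cumulative frequencies on this slide (each cumulative value minus the previous one); the variable names are illustrative.

```python
from itertools import accumulate

# Absolute frequencies for 0 through 11 accidents (implied by the slide's
# cumulative frequencies: each cumulative value minus the one before it)
freqs = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]
n = sum(freqs)

# Cumulative frequency: running total of the frequencies at or below each score
cum_freqs = list(accumulate(freqs))

# Cumulative relative frequency: cumulative frequency divided by N
cum_rel = [c / n for c in cum_freqs]

for score, (c, p) in enumerate(zip(cum_freqs, cum_rel)):
    print(f"{score:2d}  {c:3d}  {p:.3f}")
```

The last cumulative frequency always equals N, and the last cumulative relative frequency always equals 1, which makes a convenient check on the arithmetic.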
13
Grouped Frequencies
  • Needed when the number of values is large OR
    the values are continuous.
  • To calculate group intervals:
  • First find the range.
  • Determine a good interval size based on the
    number of resulting intervals, the meaning of
    the data, and common, regular numbers.
  • List intervals from largest to smallest.

14
Grouped Frequency Example
Average time in seconds for 100 high school
students to read ambiguous sentences. Values range
between 2.50 seconds and 2.99 seconds.
15
Determining i (the size of the interval)
  • WHAT IS THE RULE FOR DETERMINING THE SIZE OF
    INTERVALS TO USE IN WHICH TO GROUP DATA?
  • Whatever intervals seem appropriate to most
    informatively present the data. It is a matter of
    judgement. Usually we use 6 to 12 same-size
    intervals, each of which uses intuitively obvious
    endpoints (e.g., 5s and 0s).

16
Grouped Frequencies
Range = 2.99 - 2.50 = .49 ≈ .50
i = .1, number of intervals = 5
i = .05, number of intervals = 10

Reading Time   Frequency
2.95-2.99      9
2.90-2.94      7
2.85-2.89      20
2.80-2.84      11
2.75-2.79      10
2.70-2.74      10
2.65-2.69      4
2.60-2.64      8
2.55-2.59      10
2.50-2.54      11

Reading Time   Frequency
2.90-2.99      16
2.80-2.89      31
2.70-2.79      20
2.60-2.69      12
2.50-2.59      21
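The grouping can be sketched in Python. The 100 reading times are not listed on this slide, so the sketch rebuilds them from the leaves of the stem-and-leaf display later in this chapter; times are stored in hundredths of a second to avoid floating-point edge cases. The function name `grouped_frequencies` and its parameters are hypothetical, chosen for this example.

```python
from collections import Counter

# Leaves copied from the stem-and-leaf display (i = .1) later in the chapter.
# Keys are stems in tenths of a second: 29 with leaf 5 means 2.95 seconds.
leaves = {
    29: "0,0,1,2,3,3,3,5,5,6,6,6,6,8,8,9",
    28: "0,0,1,2,3,3,3,3,4,4,4,5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9",
    27: "0,0,0,1,2,3,3,3,4,4,5,5,5,5,6,6,6,8,9,9",
    26: "0,1,1,1,2,3,3,4,5,6,6,6",
    25: "0,1,1,1,2,2,2,4,4,4,4,6,6,8,8,8,8,8,9,9,9",
}
# Reading times in hundredths of a second (2.95 s is stored as 295)
times = [stem * 10 + int(leaf)
         for stem, row in leaves.items() for leaf in row.split(",")]

def grouped_frequencies(values, start, width):
    """Count values per interval of `width`, with intervals starting at `start`."""
    counts = Counter((v - start) // width for v in values)
    top = (max(values) - start) // width
    # List intervals from largest to smallest, as on the slide
    return [(start + k * width, start + (k + 1) * width - 1, counts[k])
            for k in range(top, -1, -1)]

for width in (10, 5):  # i = .1 and i = .05, in hundredths of a second
    print(f"i = {width / 100:.2f}")
    for lo, hi, f in grouped_frequencies(times, 250, width):
        print(f"  {lo / 100:.2f}-{hi / 100:.2f}  {f}")
```

Changing `width` is all it takes to switch between the 5-interval and 10-interval tables, which makes it easy to compare the two displays before choosing one.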
17
Either is acceptable.
  • Use whichever display seems most informative.
  • In this case, the smaller intervals and 10-category
    table seem more informative.
  • Sometimes it goes the other way and a less detailed
    presentation is necessary to prevent the reader
    from missing the forest for the trees.

18
How you organize the data is up to you.
  • When grouping data, there is often more than
    one way to organize the data.
  • You should organize the data so that people can
    easily understand what is going on.
  • Thus, the point is to use the grouped frequency
    distribution to provide a simplified description
    of the data.

19
Stem and Leaf Displays
  • Used when seeing all of the values is important.
  • Shows the data grouped, all of the values, and
    a visual summary.

20
Stem and Leaf Display
  • Reading time data

Reading Time   Leaves
2.9            5,5,6,6,6,6,8,8,9
2.9            0,0,1,2,3,3,3
2.8            5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
2.8            0,0,1,2,3,3,3,3,4,4,4
2.7            5,5,5,5,6,6,6,8,9,9
2.7            0,0,0,1,2,3,3,3,4,4
2.6            5,6,6,6
2.6            0,1,1,1,2,3,3,4
2.5            6,6,8,8,8,8,8,9,9,9
2.5            0,1,1,1,2,2,2,4,4,4,4
i = .05, number of intervals = 10
21
Stem and Leaf Display
  • Reading time data

Reading Time   Leaves
2.9            0,0,1,2,3,3,3,5,5,6,6,6,6,8,8,9
2.8            0,0,1,2,3,3,3,3,4,4,4,5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
2.7            0,0,0,1,2,3,3,3,4,4,5,5,5,5,6,6,6,8,9,9
2.6            0,1,1,1,2,3,3,4,5,6,6,6
2.5            0,1,1,1,2,2,2,4,4,4,4,6,6,8,8,8,8,8,9,9,9
i = .1, number of intervals = 5
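One way a stem-and-leaf display like the ones above could be generated is sketched below. To keep it short it uses only the 2.6 and 2.9 rows of the reading-time data, stored in hundredths of a second. The function name `stem_and_leaf` and its `halves` flag are hypothetical.

```python
from collections import defaultdict

# A subset of the reading times (stems 2.6 and 2.9 only), in hundredths of a second
times = [260, 261, 261, 261, 262, 263, 263, 264, 265, 266, 266, 266,
         290, 290, 291, 292, 293, 293, 293, 295, 295, 296, 296, 296, 296,
         298, 298, 299]

def stem_and_leaf(values, halves=False):
    """Return (stem label, leaves) rows, largest stem first.

    With halves=True each stem is split into leaves 5-9 and leaves 0-4,
    which corresponds to an interval size of .05 instead of .1.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)          # 295 -> stem 29, leaf 5
        half = leaf // 5 if halves else 0   # 0 for leaves 0-4, 1 for leaves 5-9
        rows[(stem, half)].append(leaf)
    return [(f"{stem / 10:.1f}", leaves)
            for (stem, half), leaves in sorted(rows.items(), reverse=True)]

for label, leaves in stem_and_leaf(times, halves=True):
    print(label, ",".join(str(leaf) for leaf in leaves))
```

Calling `stem_and_leaf(times)` without `halves` gives one row per stem, matching the i = .1 display.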
22
Figural displays of frequency data
23
Bar graphs
  • Bar graphs are used to show frequency of scores
    when you have a discrete variable.
  • Discrete data can only take on a limited number
    of values.
  • Numbers between adjoining values of a discrete
    variable are impossible or meaningless.
  • Bar graphs show the frequency of specific scores
    or ranges of scores of a discrete variable.
  • The proportion of the total area of the figure
    taken by a specific bar equals the proportion of
    that kind of score.
  • Note, in this context proportion and relative
    frequency are synonymous.

24
The results of rolling a six-sided die 120 times
[Bar graph: frequency (axis 0 to 100) for each face, 1 through 6]
120 rolls and it came out 20 ones, 20 twos,
etc.
25
Bar graphs and Histograms
  • Use bar graphs, not histograms, for discrete
    data. (The bars don't touch in a bar graph; they
    do in a histogram.)
  • You rarely see data that are really discrete.
  • Discrete data are almost always categories or
    rankings. ANYTHING ELSE IS ALMOST CERTAINLY A
    CONTINUOUS VARIABLE.
  • Use histograms for continuous variables.
  • AGAIN, almost every score you will obtain
    reflects the measurement of a continuous variable.

26
A stem and leaf display turned on its side shows
the transition to purely figural displays of a
continuous variable
[Figure: the stem-and-leaf display of the reading times rotated 90 degrees, so that each row of leaves stands up as a column]
27
Histogram of reading times: notice how the bars
touch at the real limits of each class!
[Histogram: y-axis Frequency (0 to 20), x-axis Reading Time (seconds)]
28
Histogram concepts - 1
  • Histograms must be used to display continuous
    data.
  • Most scores obtained by psychologists are
    continuous, even if the scores are integers.
  • WHAT COUNTS IS WHAT YOU ARE MEASURING, NOT THE
    PRECISION OF MEASUREMENT.
  • INTEGER SCORES IN PSYCHOLOGY ARE USUALLY ROUGH
    MEASUREMENTS OF CONTINUOUS VARIABLES.

29
An Example
  • For example, while scores on a ten-question
    multiple-choice intro psych quiz (1, 2, ... 10) are
    integers, you are measuring knowledge, which is a
    continuous variable that could be measured with
    10,000 questions, each counting .001 points. Or
    1,000,000 questions each worth .00001 points.
  • You measure at a specific level of precision,
    because that's all you need or can afford.
    Logistics, not the nature of the variable,
    constrains the measurement of a continuous
    variable.

30
Histogram concepts - 2
  • If you have continuous data, you can use
    histograms, but remember real class limits.
  • Histograms can be used for relative frequencies
    as well.
  • Histograms can be used to describe theoretical
    distributions as well as actual distributions.

31
What are the real limits of the fifth class? The
highest class?
[Histogram of reading times: y-axis Frequency (0 to 20)]
Real limits of the fifth class are ???? - ????
Real limits of the highest class are ???? - ????.

32
Real limits of the fifth class are 2.695 - 2.745
Real limits of the highest class are 2.945 - 2.995
[Histogram of reading times: y-axis Frequency (0 to 20)]
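The two answers above follow from extending each class half a measurement unit past its apparent limits. A minimal sketch, assuming the reading times are recorded to the nearest .01 second; the function name `real_limits` is made up for this example.

```python
def real_limits(lower, upper, unit=0.01):
    """Real limits extend half a measurement unit beyond the apparent limits."""
    half = unit / 2
    # Round to 3 decimals to hide floating-point noise in the result
    return round(lower - half, 3), round(upper + half, 3)

print(real_limits(2.70, 2.74))   # fifth class, counting up from 2.50-2.54
print(real_limits(2.95, 2.99))   # highest class
```

Because the upper real limit of one class equals the lower real limit of the next, the bars of the histogram touch.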
33
Displaying theoretical distributions is the most
important function of histograms.
  • Theoretical distributions show how scores can be
    expected to be distributed around the mean.

34
TYPES OF THEORETICAL DISTRIBUTIONS
  • Distributions are named after the shapes of their
    histograms. For psychologists, the most important
    are
  • Rectangular
  • J-shaped
  • Bell (Normal)
  • t distributions - Close to Bell shaped, but a
    little flatter

35
Rectangular Distribution of scores
36
The rectangular distribution is the "know
nothing" distribution
  • Our best prediction is that everyone will score
    at the mean.
  • But in a rectangular distribution, scores far
    from the mean occur as often as do scores close
    to the mean.
  • So the mean tells us nothing about where the next
    score will fall (or how the next person will
    behave).
  • We know nothing in that case.

37
Flipping a coin: Rectangular distributions are
frequently seen in games of chance, but rarely
elsewhere.
100 flips - how many heads and tails do you
expect?
[Bar graph: frequency (axis 0 to 100) for Heads and Tails]
38
Rolling a die
120 rolls - how many of each number do you expect?
[Bar graph: frequency (axis 0 to 100) for each face, 1 through 6]
39
What happens when you sample two scores at a time?
  • All of a sudden things change.
  • The distribution of scores begins to resemble a
    normal curve!
  • The normal curve is the "we know something"
    distribution, because most scores are close to
    the mean.

40
Rolling 2 dice
Dice Total: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Look at the histogram to see how this resembles a
bell-shaped curve.
41
Rolling 2 dice
360 rolls
[Histogram: frequency (axis 0 to 100) for each dice total, 1 through 12]
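The expected shape for 360 rolls of two dice can be worked out by enumerating the 36 equally likely outcomes. This is a sketch under the assumption of two fair six-sided dice; a total of 1 cannot occur, so it is left out.

```python
from collections import Counter
from itertools import product

# Count how many of the 36 equally likely outcomes give each total
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

n_rolls = 360
for total in range(2, 13):
    rel = totals[total] / 36      # theoretical relative frequency
    expected = rel * n_rolls      # expected frequency = relative frequency x N
    print(f"{total:2d}  {rel:.3f}  {expected:5.1f}  {'*' * totals[total]}")
```

The row of asterisks is a crude sideways histogram: it peaks at a total of 7 and falls off symmetrically toward 2 and 12.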
42
Normal Curve
43
J Curve
Occurs when socially normative behaviors are
measured.
Most people follow the norm, but there are
always a few outliers.
44
What does the J shaped distribution represent?
  • The J shaped distribution represents situations
    in which most everyone does about the same thing.
    These are unusual social situations with very
    clear contingencies.
  • For example, how long do cars without handicapped
    plates park in a handicapped spot when there is a
    cop standing next to the spot?
  • Answer: Zero minutes!
  • So, the J shaped distribution is the "we know
    almost everything" distribution, because we can
    predict how a large majority of people will
    behave.

45
When do you get a J shaped distribution?
46
When do you get a J shaped distribution?
Occurs when socially normative behaviors are
measured.
Most people follow the norm, but there are
always a few outliers.
47
Principles of Theoretical Curves
  • Expected frequency = Theoretical relative
    frequency × N
  • Expected frequencies are your best estimates
    because they are closer, on the average, than any
    other estimate when we square the difference
    between observed and predicted frequencies.
  • Law of Large Numbers - The more observations that
    we have, the closer the relative frequencies we
    actually observe should come to the theoretical
    relative frequency distribution.
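
Both principles can be illustrated with a simulated fair die. This is a sketch, not a proof: the seed, the sample sizes, and the function name `observed_relative_frequencies` are arbitrary choices made for this example.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

theoretical = 1 / 6  # theoretical relative frequency of each face of a fair die

# Expected frequency = theoretical relative frequency x N
print(f"Expected frequency of each face in 120 rolls: {theoretical * 120:.1f}")

def observed_relative_frequencies(n_rolls):
    """Roll a simulated die n_rolls times; return the relative frequency of each face."""
    rolls = [random.randint(1, 6) for _ in range(n_rolls)]
    return [rolls.count(face) / n_rolls for face in range(1, 7)]

# Law of Large Numbers: with more rolls, the observed relative frequencies
# tend to fall closer to the theoretical value of 1/6.
for n in (120, 1200, 120000):
    obs = observed_relative_frequencies(n)
    worst = max(abs(p - theoretical) for p in obs)
    print(f"{n:6d} rolls: largest gap from 1/6 = {worst:.4f}")
```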