Methods for Describing Sets of Data - PowerPoint PPT Presentation

2003 Pearson Prentice Hall, 2004 Paul Resnick.



1
Chapter 2
  • Methods for Describing Sets of Data

2
Review
  • Descriptive vs. Inferential Statistics
  • Vocabulary
  • Population
  • (Random, representative) sample
  • Parameter
  • Statistic
  • Data types
  • Data sources

3
Learning Objectives
  • 1. Describe Qualitative Data Graphically
  • 2. Describe Numerical Data Graphically
  • 3. Create and Interpret Graphical Displays
  • 4. Explain Numerical Data Properties
  • 5. Describe Summary Measures
  • 6. Analyze Numerical Data Using Summary Measures

4
Data Presentation
5
Presenting Qualitative Data
6
Data Presentation
7
Student Specializations
  Specialization    Freq.   Percent      Cum.
  -----------------------------------------------
  HCI                   9     39.13     39.13
  IEMP                  9     39.13     78.26
  LIS                   3     13.04     91.30
  Undecided             2      8.70    100.00
  -----------------------------------------------
  Total                23    100.00
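
A frequency table like the one above can be rebuilt from raw responses. The following is a minimal sketch, not the tool that produced the slide: the `responses` list is a hypothetical reconstruction from the slide's frequencies, and `frequency_table` is an illustrative helper name.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    running = 0
    for category in sorted(counts):            # alphabetical order, as on the slide
        running += counts[category]
        rows.append((category,
                     counts[category],
                     round(100 * counts[category] / total, 2),
                     round(100 * running / total, 2)))
    return rows

# Hypothetical raw data: one entry per student, rebuilt from the slide's counts.
responses = ["HCI"] * 9 + ["IEMP"] * 9 + ["LIS"] * 3 + ["Undecided"] * 2

rows = frequency_table(responses)
for category, freq, pct, cum in rows:
    print(f"{category:<16}{freq:>6}{pct:>10.2f}{cum:>10.2f}")
```

The cumulative column is computed from the running count rather than by adding rounded percentages, which avoids rounding drift in the last row.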

8
Student Specializations
9
Undergrad Majors
  UG major                     Freq.   Percent      Cum.
  ----------------------------------------------------------
  American Studies                 1      4.76      4.76
  Cog Sci                          1      4.76      9.52
  Comp Sci                         3     14.29     23.81
  Economics                        3     14.29     38.10
  English                          5     23.81     61.90
  Environmental Engineering        1      4.76     66.67
  Graphic Design                   1      4.76     71.43
  Math                             2      9.52     80.95
  Mechanical Engineering           1      4.76     85.71
  Nutrition                        1      4.76     90.48
  Sci and Tech Policy              1      4.76     95.24
  Telecommunications               1      4.76    100.00
  ----------------------------------------------------------
  Total                           21    100.00

10
Favorite Colors
  color     Freq.   Percent      Cum.
  ---------------------------------------
  black         2      8.70      8.70
  blue         12     52.17     60.87
  green         1      4.35     65.22
  orange        1      4.35     69.57
  purple        1      4.35     73.91
  red           5     21.74     95.65
  white         1      4.35    100.00
  ---------------------------------------
  Total        23    100.00

11
Calculus Knowledge
  integrals   Freq.   Percent      Cum.
  -----------------------------------------
  1               3     13.04     13.04
  2               1      4.35     17.39
  3              11     47.83     65.22
  4               6     26.09     91.30
  5               2      8.70    100.00
  -----------------------------------------
  Total          23    100.00

12
Exercises
  • 2.1
  • 2.2
  • 2.9: Which chart type is best for CEO degree
    categories?

13
Presenting Numerical Data
14
Data Presentation
15
Student Age (Reported) Data
  • Stem-and-leaf plot for age
    2 | 22233444555777899
    3 | 01257
    4 |
    5 |
    6 |
    7 | 6
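
As an illustration of how such a display is built, here is a small sketch that takes the tens digit as the stem and the units digit as the leaf. The `ages` list is read off the plot above; `stem_and_leaf` is a hypothetical helper, and it assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines 'stem | leaves' with tens as stems and units as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Walk every stem from smallest to largest so that empty stems show the gaps.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {digits}")
    return lines

# Ages as read from the stem-and-leaf plot on the slide (23 students).
ages = [22, 22, 22, 23, 23, 24, 24, 24, 25, 25, 25, 27, 27, 27, 28, 29, 29,
        30, 31, 32, 35, 37, 76]

plot_lines = stem_and_leaf(ages)
print("\n".join(plot_lines))
```

Keeping the empty stems 4 through 6 is what makes the 76 stand out as far from the rest of the data.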

16
Histogram
17
Starting Salaries (in K)
    3 | 8
    4 | 000025
    5 | 0000
    6 | 0000005
    7 | 5
    8 | 0

18
Summation Notation
  • Exercise 2.33
  • Observations: 5, 1, 3, 2, 1
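
The transcript does not show which sums the exercise asks for, so the three below are only illustrative of summation notation: the sum of the observations, the sum of their squares, and the square of their sum. The last two are easy to confuse and generally differ.

```python
observations = [5, 1, 3, 2, 1]

sum_x = sum(observations)                           # sum of x
sum_x_squared = sum(x ** 2 for x in observations)   # sum of x squared
square_of_sum = sum_x ** 2                          # (sum of x) squared

print(sum_x, sum_x_squared, square_of_sum)          # 12 40 144
```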

19
Numerical Data Properties
20
Thinking Challenge
  • ... employees cite low pay -- most workers earn
    only 20,000. ... President claims average pay is
    70,000!
  • [Figure: pay levels labeled 400,000, 70,000, 50,000, 30,000, and 20,000]
21
Standard Notation
  Measure        Sample        Population
  Mean           $\bar{X}$     $\mu$
  Stand. Dev.    $s$           $\sigma$
  Variance       $s^2$         $\sigma^2$
  Size           $n$           $N$
22
Numerical Data Properties
Central Tendency (Location)
Variation (Dispersion)
Shape
23
Numerical Data Properties & Measures
  • Central Tendency: Mean, Median, Mode
  • Variation: Range, Interquartile Range, Variance, Standard Deviation
  • Shape: Skew
24
Central Tendency
25
Numerical Data Properties & Measures
  • Central Tendency: Mean, Median, Mode
  • Variation: Range, Interquartile Range, Variance, Standard Deviation
  • Shape: Skew
26
What's wrong with this?
  • Measurements: 1 4 2 9 8
  • Middle measurement is 2, so that's the median
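
The flaw is that the measurements were not ordered before picking the middle one. A minimal sketch of the usual procedure follows; the `median` helper is written out by hand for illustration, and it also covers the even-count case by averaging the two middle values.

```python
def median(measurements):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(measurements)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

measurements = [1, 4, 2, 9, 8]
print(measurements[len(measurements) // 2])   # 2: middle of the unordered list
print(median(measurements))                   # 4: middle of 1, 2, 4, 8, 9
```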

Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
27
Exercise 2.39
  • Why special rule for median with even vs. odd
    number of measurements?

28
Exercise 2.37
  • 18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11

29
What if?
  • Replace one of the 18s with 1,118?
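
To see the effect, the sketch below computes the mean, median, and mode of the Exercise 2.37 measurements (18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11) before and after the replacement, using Python's standard `statistics` module. The `summarize` helper is an illustrative name, not part of the exercise.

```python
import statistics

data = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]

def summarize(values):
    """Return (mean, median, mode) for a list of measurements."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))

original = summarize(data)

# Replace one of the 18s with 1,118 (which of the two is replaced does not matter).
altered_data = list(data)
altered_data[altered_data.index(18)] = 1118
altered = summarize(altered_data)

print(f"original: mean={original[0]:.2f}, median={original[1]}, mode={original[2]}")
print(f"altered:  mean={altered[0]:.2f}, median={altered[1]}, mode={altered[2]}")
```

The mean jumps from about 14.5 to about 114.5, while the median and mode both stay at 15: the mean is sensitive to a single extreme value and the other two are not.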

30
Exercise 2.41a
31
Exercise 2.53
32
Ages
  • Mean = 29
  • Median = 27
    2 | 22233444555777899
    3 | 01257
    4 |
    5 |
    6 |
    7 | 6

33
Summary of Central Tendency Measures
  Measure    Equation                 Description
  Mean       $\sum X_i / n$           Balance Point
  Median     $(n + 1)/2$ Position     Middle Value When Ordered
  Mode       none                     Most Frequent
34
Shape
35
Numerical Data Properties & Measures
  • Central Tendency: Mean, Median, Mode
  • Variation: Range, Interquartile Range, Variance, Standard Deviation
  • Shape: Skew
36
Shape
  • 1. Describes How Data Are Distributed
  • 2. Measures of Shape
  • Skew and symmetry
  • Left-Skewed: Mean < Median < Mode
  • Symmetric: Mean = Median = Mode
  • Right-Skewed: Mode < Median < Mean
37
Exercise 2.47
  • Asked to submit 3 letters
  • Observed: mean = 2.28, median = 3, mode = 3
  • Interpret

38
Exercise 2.50 a-d
39
Variation
40
Numerical Data Properties & Measures
  • Central Tendency: Mean, Median, Mode
  • Variation: Range, Interquartile Range, Variance, Standard Deviation
  • Shape: Skew
41
Quartiles
  • 1. Measure of Noncentral Tendency
  • 2. Split Ordered Data into 4 Quarters
  • 3. Position of i-th Quartile

  • [Figure: ordered data split into four quarters of 25% each by Q1, Q2, Q3]

Positioning point of $Q_i = \frac{i(n + 1)}{4}$
42
Ages
  • Range
  • Quartiles
    2 | 22233444555777899
    3 | 01257
    4 |
    5 |
    6 |
    7 | 6
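
The range and quartiles for the ages can be sketched as follows, using the positioning rule i(n + 1)/4 on the ordered data. Interpolating between neighbours when the position is not a whole number is an assumption of this sketch (textbooks differ on that detail); for these 23 ages the positions are exactly 6, 12, and 18, so no interpolation is needed. The `quartile` helper is an illustrative name.

```python
def quartile(ordered, i):
    """Value at 1-based position i*(n+1)/4 in ordered data, interpolating if needed."""
    n = len(ordered)
    position = i * (n + 1) / 4
    lower = int(position)              # whole part of the position
    fraction = position - lower        # fractional part, 0 when the position is exact
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

# Ages as read from the stem-and-leaf plot.
ages = sorted([22, 22, 22, 23, 23, 24, 24, 24, 25, 25, 25, 27, 27, 27, 28, 29, 29,
               30, 31, 32, 35, 37, 76])

data_range = ages[-1] - ages[0]
q1, q2, q3 = (quartile(ages, i) for i in (1, 2, 3))
print(data_range, q1, q2, q3)          # 54 24.0 27.0 30.0
```

The range of 54 is driven entirely by the single 76, while the interquartile range, 30 - 24 = 6, describes the spread of the middle half of the class.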

43
Box Plots
44
Age and Salary
  • Age: quartiles 24, 27, 30
  • Inner fences: (15, 39)
  • Outer fences: (6, 48)
  • Salary: quartiles 41K, 50K, 60K
  • Inner fences: ??
  • Outer fences: ??
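
The age fences are consistent with the usual box-plot convention: inner fences 1.5 interquartile ranges beyond the quartiles, outer fences 3 interquartile ranges beyond. A short sketch under that assumption (`fences` is an illustrative helper name):

```python
def fences(q1, q3):
    """Return (inner, outer) fence pairs from the lower and upper quartiles."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer

# Age quartiles from the slide: Q1 = 24, Q3 = 30.
inner, outer = fences(24, 30)
print(inner, outer)   # (15.0, 39.0) (6, 48)
```

The same call with the salary quartiles, `fences(41, 60)`, answers the two open questions on the slide.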

45
Variance & Standard Deviation
  • 1. Measures of Dispersion
  • 2. Most Common Measures
  • 3. Consider How Data Are Distributed
  • 4. Show Variation About Mean ($\bar{X}$ or $\mu$)

[Figure: axis from 4 to 12 with the mean, $\bar{X} = 8.3$, marked]
46
Sample Variance Formula
  • n - 1 in denominator! (Use N if Population
    Variance)

$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$
47
Equivalent Formula
48
Another Equivalent Formula
49
Deriving the shortcut (p.57)
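
The formulas on these slides are not present in the transcript. Assuming the shortcut meant is the standard computing formula for the sample variance, the derivation would run roughly as follows: expand the square and use $\sum X_i = n\bar{X}$.

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - 2n\bar{X}^2 + n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2
   = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n},
\end{align*}
so that
\[
s^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1}.
\]
```

The two intermediate forms, with $n\bar{X}^2$ and with $(\sum X_i)^2/n$, are probably the "equivalent" formulas of the two preceding slides, though that is a guess.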
50
Exercise 2.54
51
Exercise 2.55a
52
Exercise 2.59: Same mean, different variances
53
Exercise 2.60: Same range, different means
54
Exercise 2.61 (simplified): adding a constant
  • 2, 1, 1, 0, 6
  • Mean = 10/5 = 2
  • Variance: squared deviations sum to 0 + 1 + 1 + 4 + 16 = 22, so $s^2 = 22/4 = 5.5$
  • Add 3 to each measurement
  • Mean = 25/5 = 5
  • Variance = ??
  • Why doesn't adding a constant affect variance?
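
A quick numerical check, as a sketch: adding a constant shifts every measurement and the mean by the same amount, so each deviation from the mean, and therefore the variance, is unchanged. The `sample_variance` helper is an illustrative hand-written version with n - 1 in the denominator.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

measurements = [2, 1, 1, 0, 6]
shifted = [x + 3 for x in measurements]

# Deviations from the mean are identical before and after the shift.
deviations = [x - sum(measurements) / len(measurements) for x in measurements]
shifted_deviations = [x - sum(shifted) / len(shifted) for x in shifted]

print(deviations == shifted_deviations)                       # True
print(sample_variance(measurements), sample_variance(shifted))
```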

55
Exercise 2.65
56
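Chebyshev's Rule: Preliminaries
  • Lemma: For any positive variable Y, and any
    constant a > 0, the fraction of values with Y > a is at most (mean of Y)/a
  • Proof of Lemma:
  • For values of Y > a, define Z = a
  • For values of Y ≤ a, define Z = 0
  • Clearly mean of Y is at least as large as mean of Z
  • But mean of Z is just a times the fraction of values with Y > a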

57
Chebyshev's Rule
  • Claim
  • Proof Let

(From lemma)
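
The claim and proof formulas are not present in the transcript. A sketch of the standard argument, assuming the lemma bounds the fraction of values above a by (mean of Y)/a and that the proof applies it to the squared deviations, would be:

```latex
% Lemma: for positive values Y_1, ..., Y_n and a constant a > 0,
\[
\frac{\#\{i : Y_i > a\}}{n} \;\le\; \frac{\bar{Y}}{a}.
\]
% Chebyshev's Rule: let Y_i = (X_i - \bar{X})^2 and a = k^2 s^2, so that
% \bar{Y} = \frac{1}{n}\sum_i (X_i - \bar{X})^2 = \frac{(n-1)\,s^2}{n}.
\[
\frac{\#\{i : |X_i - \bar{X}| > k s\}}{n}
  = \frac{\#\{i : Y_i > k^2 s^2\}}{n}
  \;\le\; \frac{\bar{Y}}{k^2 s^2}
  = \frac{n-1}{n}\cdot\frac{1}{k^2}
  \;\le\; \frac{1}{k^2}.
\]
```

Equivalently, at least $1 - 1/k^2$ of the measurements lie within k standard deviations of the mean, whatever the shape of the distribution.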
58
Empirical Rule
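  • If x has a symmetric, mound-shaped distribution:
  • About 68% of measurements lie within 1 standard deviation of the mean
  • About 95% lie within 2 standard deviations
  • About 99.7% lie within 3 standard deviations
  • Justification: Known properties of the normal
    distribution, to be studied later in the course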

59
Example
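  • Data set has nine 0 values, and one 100
  • Mean = 10, Range = 100
  • $s^2 = (9 \cdot 100 + 8100)/9 = 1000$, $s = 31.62$
  • 10% are at a distance of 90 from the mean, about 3s (2.85s with s = 31.62; exactly 3 standard deviations with the population value, 30)
  • Chebyshev's rule applies: 10% < 1/9 = 11.1%
  • Empirical rule severely violated: 10% > 0.3%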
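
The numbers in this example can be checked with a few lines, sketched below. One detail worth seeing: with the sample standard deviation (n - 1 in the denominator) the value 100 sits about 2.85 standard deviations from the mean, while with the population formula (N in the denominator) it sits exactly 3 away.

```python
import math

data = [0] * 9 + [100]
n = len(data)

mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)   # 9 * 100 + 8100 = 9000

s = math.sqrt(sum_sq_dev / (n - 1))               # sample standard deviation
sigma = math.sqrt(sum_sq_dev / n)                 # population standard deviation

z_sample = (100 - mean) / s
z_population = (100 - mean) / sigma

print(f"mean={mean}, s={s:.2f}, sigma={sigma}")
print(f"z with s: {z_sample:.3f}, z with sigma: {z_population}")
print(f"Chebyshev bound for k=3: {1 / 3 ** 2:.3f}, observed fraction at 100: {1 / n}")
```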

60
Preview of Statistical Inference
  • You observe one data point
  • Make hypothesis about mean and standard deviation
    from which it was drawn
  • Chebyshev's Rule or Empirical Rule tells you how
    (un)likely the data point is
  • If very unlikely, you are suspicious of the
    hypothesis about mean and standard deviation, and
    reject it

61
Exercise 2.67
  • N = 200, mean = 1500, s = 300
  • How many measurements in (900, 2100)?
  • How many measurements in (600, 2400)?
  • How many measurements in (1200, 1800)?
  • How many measurements in (1500, 2100)?
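
One way to approach these counts is sketched below. The transcript does not say whether the data are mound-shaped, so both rules are shown: Chebyshev's Rule gives a guaranteed minimum for any data set, and the Empirical Rule gives an approximation that holds only for symmetric, mound-shaped data. The names `chebyshev_min_fraction` and `EMPIRICAL_FRACTION` are illustrative.

```python
import math

def chebyshev_min_fraction(k):
    """Chebyshev's lower bound on the fraction within k standard deviations."""
    return max(0.0, 1 - 1 / k ** 2)

# Approximate fractions from the Empirical Rule (mound-shaped, symmetric data only).
EMPIRICAL_FRACTION = {1: 0.68, 2: 0.95, 3: 0.997}

n, mean, s = 200, 1500, 300

results = {}
for low, high in [(900, 2100), (600, 2400), (1200, 1800)]:
    k = (high - mean) // s                      # each interval is mean plus or minus k*s
    at_least = math.ceil(n * chebyshev_min_fraction(k))
    about = round(n * EMPIRICAL_FRACTION[k])
    results[(low, high)] = (at_least, about)
    print(f"({low}, {high}): k={k}, Chebyshev at least {at_least}, Empirical about {about}")

# (1500, 2100) is the upper half of (900, 2100). By symmetry the Empirical Rule
# suggests about half of 95%; Chebyshev's Rule makes no one-sided statement.
upper_half = round(n * EMPIRICAL_FRACTION[2] / 2)
print(f"(1500, 2100): Empirical about {upper_half}")
```

For k = 1 Chebyshev's bound is 0, so the rule says nothing useful about (1200, 1800); only the Empirical Rule gives a number there.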

62
Summary of Variation Measures
  Measure                            Equation                                      Description
  Range                              $X_{largest} - X_{smallest}$                  Total Spread
  Interquartile Range                $Q_3 - Q_1$                                   Spread of Middle 50%
  Standard Deviation (Sample)        $\sqrt{\sum (X_i - \bar{X})^2 / (n - 1)}$     Dispersion about Sample Mean
  Standard Deviation (Population)    $\sqrt{\sum (X_i - \mu)^2 / N}$               Dispersion about Population Mean
  Variance (Sample)                  $\sum (X_i - \bar{X})^2 / (n - 1)$            Squared Dispersion about Sample Mean
63
Z-scores
  • Number of standard deviations from the mean
  • Chebyshev and empirical rules apply
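
As a small sketch, a z-score is the distance from the mean divided by the standard deviation. The numbers below reuse the mean of 1500 and s of 300 from Exercise 2.67; `z_score` is an illustrative helper name.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

print(z_score(2100, 1500, 300))   # 2.0
print(z_score(900, 1500, 300))    # -2.0
```

In these terms, Chebyshev's Rule says that at most 1/k² of the measurements have a z-score beyond k in absolute value.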

64
Exercise 2.93c
65
Conclusion
  • 1. Described Qualitative Data Graphically
  • 2. Described Numerical Data Graphically
  • 3. Created and Interpreted Graphical Displays
  • 4. Explained Numerical Data Properties
  • 5. Described Summary Measures
  • 6. Analyzed Numerical Data Using Summary Measures

66
End of Chapter
Any blank slides that follow are blank
intentionally.