Title: Introduction to Descriptive Statistics
1 Introduction to Descriptive Statistics
2 Population vs. Sample Notation
Population: Greek letters (μ, σ, β)
Sample: Roman letters (x̄, s, b)
3 Types of Variables
4 Describing data
Feature      Moment-based measure            Non-mean-based measure
Center       Mean                            Mode, median
Spread       Variance (standard deviation)   Range, interquartile range
Skew         Skewness                        --
Peakedness   Kurtosis                        --
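The moment-based and non-mean-based measures of center and spread can be computed side by side. A minimal Python sketch using the standard-library `statistics` module; the ratings are made up, and `describe` is a hypothetical helper name. Quartile conventions differ across packages, so the IQR here need not match Stata's exactly.

```python
import statistics


def describe(values):
    """Moment-based and non-mean-based measures of center and spread."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


scores = [4, 5, 5, 6, 6, 6, 7]  # made-up ratings on a 1-7 scale
print(describe(scores))
```

Here the median and mode (both 6) sit above the mean (about 5.57) because the one low rating pulls the mean down.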
5 Mean
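For reference, the standard definitions of the mean in the Greek/Roman notation of slide 2, where N is the population size and n the sample size:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

For the made-up ratings 4, 5, 5, 6, 6, 6, 7, x̄ = 39/7 ≈ 5.57.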
6 Variance, Standard Deviation
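The population variance is the average squared deviation from μ, and the standard deviation is its square root, which puts spread back in the units of x:

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad\qquad
\sigma = \sqrt{\sigma^2}
```

Treating the made-up ratings 4, 5, 5, 6, 6, 6, 7 as a whole population, the squared deviations from μ = 39/7 sum to 40/7, so σ² = 40/49 ≈ 0.816 and σ ≈ 0.904.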
7 Variance, S.D. of a Sample
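The sample version divides the sum of squared deviations by n - 1 rather than n, because the deviations are taken from x̄, which is itself fitted to the same data. A short Python check (made-up ratings) comparing a hand computation with the standard library's `statistics.variance` (n - 1) and `statistics.pvariance` (n):

```python
import statistics

sample = [4, 5, 5, 6, 6, 6, 7]  # made-up ratings

n = len(sample)
xbar = statistics.mean(sample)
ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations from x-bar

s2 = ss / (n - 1)   # sample variance s^2: divide by n - 1
div_n = ss / n      # dividing by n instead is biased downward as an estimate of sigma^2

assert abs(s2 - statistics.variance(sample)) < 1e-12
assert abs(div_n - statistics.pvariance(sample)) < 1e-12
print(f"s^2 = {s2:.4f}, s = {s2 ** 0.5:.4f}, divide-by-n = {div_n:.4f}")
```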
8 Coefficient of variation
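The coefficient of variation expresses the standard deviation relative to the mean, so spreads can be compared across variables measured on different scales. It is meaningful only for variables with a positive mean on a ratio scale:

```latex
CV = \frac{s}{\bar{x}} \;\;\text{(sample)}
\qquad\qquad
CV = \frac{\sigma}{\mu} \;\;\text{(population)}
```

For the made-up ratings 4, 5, 5, 6, 6, 6, 7 (x̄ ≈ 5.57, s ≈ 0.976), CV ≈ 0.175.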
9 Skewness: Symmetrical distribution
10 Skewness: Asymmetrical distribution
11 Skewness (asymmetrical distribution)
- Income
- Contributions to candidates
- Populations of countries
- Residual vote rates
12 Skewness
13 Skewness
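Variables like those listed on slide 11 have a long right tail, which shows up as positive skewness. A minimal Python sketch, assuming the moment-ratio definition m3 / m2^(3/2) with central moments that divide by n (some packages apply a small-sample adjustment instead). The data and the helper names `central_moment` and `skewness` are made up for illustration:

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean)**k, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Moment-ratio skewness m3 / m2**1.5; 0 for a symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]        # one large value: long right tail, as with income
left_skewed = [-x for x in right_skewed]  # mirror image: long left tail

for name, data in [("symmetric", symmetric), ("right", right_skewed), ("left", left_skewed)]:
    print(f"{name:>9}: {skewness(data):+.3f}")
```

A negative value, like every item in the SEG table, means the long tail points left: most ratings are high and a few are low.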
14 Kurtosis
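Kurtosis measures how much of the distribution sits in the peak and tails relative to the shoulders. A minimal Python sketch, assuming the moment-ratio definition m4 / m2^2, on which a normal distribution scores 3 (some software reports "excess" kurtosis, this value minus 3). The data and helper names are made up:

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean)**k, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def kurtosis(values):
    """Moment-ratio kurtosis m4 / m2**2; about 3 for normally distributed data."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


flat = [1, 2, 3, 4, 5, 6, 7]          # evenly spread: lighter tails than a normal
peaked = [4, 4, 4, 4, 4, 4, 1, 7]     # mass piled at the center, a few far out

print(kurtosis(flat), kurtosis(peaked))
```

On this scale the SEG items (3.7 to 8.9) are all more peaked than a normal curve.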
15 A few words about the normal curve
16 More words about the normal curve
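A key property of the normal curve is that the share of the distribution within k standard deviations of the mean is the same for every normal distribution. A short check with Python's `statistics.NormalDist` (Python 3.8+); the second curve's mean and s.d. are arbitrary:

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)  # the standard normal curve

# Share of the area within k standard deviations of the mean
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} s.d.: {share:.4f}")

# Any normal curve is a shifted, rescaled standard normal, so the shares carry over
other = NormalDist(mu=50.0, sigma=10.0)
print(other.cdf(60.0) - other.cdf(40.0))
```

The shares come out at about 68%, 95%, and 99.7%.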
17 SEG example
The instructor and/or section leader...       Mean  s.d.  Skew   Kurt
Gives well-prepared, relevant presentations   6.0   0.69  -1.7   8.5
Explains clearly and answers questions well   5.9   0.68  -1.0   4.8
Uses visual aids well                         5.6   0.85  -1.8   8.9
Uses information technology effectively       5.5   0.91  -1.1   5.0
Speaks well                                   6.1   0.69  -1.5   6.8
Encourages questions and class participation  6.1   0.66  -0.88  3.7
Stimulates interest in the subject            5.9   0.76  -1.1   4.7
Is available outside of class for questions   5.9   0.68  -1.3   6.3
Overall rating of teaching                    5.9   0.67  -1.2   5.5
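Because the item means differ, the coefficient of variation (s.d. / mean) is a fairer way to compare relative spread across the SEG items. A Python sketch using the means and standard deviations exactly as they appear in the table:

```python
# Mean and s.d. for each item, copied from the SEG table
seg = {
    "Gives well-prepared, relevant presentations": (6.0, 0.69),
    "Explains clearly and answers questions well": (5.9, 0.68),
    "Uses visual aids well": (5.6, 0.85),
    "Uses information technology effectively": (5.5, 0.91),
    "Speaks well": (6.1, 0.69),
    "Encourages questions and class participation": (6.1, 0.66),
    "Stimulates interest in the subject": (5.9, 0.76),
    "Is available outside of class for questions": (5.9, 0.68),
    "Overall rating of teaching": (5.9, 0.67),
}

# Coefficient of variation: s.d. relative to the mean
cv = {item: sd / mean for item, (mean, sd) in seg.items()}

# Print items from most to least relatively dispersed
for item, value in sorted(cv.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{value:.3f}  {item}")
```

The information-technology item has the widest relative spread (about 0.165). Class participation has the narrowest (about 0.108).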
18 Graph some SEG variables
The instructor and/or section leader...       Mean  s.d.  Skew   Kurt
Uses visual aids well                         5.6   0.85  -1.8   8.9
Encourages questions and class participation  6.1   0.66  -0.88  3.7
19 Binary data
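For a variable coded 0/1, the mean is the proportion p of 1s, and the divide-by-n variance is p(1 - p). The standard deviation therefore adds no information beyond the mean. A quick Python check on hypothetical responses:

```python
import statistics

voted = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical 0/1 responses

p = statistics.mean(voted)            # mean of a 0/1 variable = share of 1s
var_n = statistics.pvariance(voted)   # divide-by-n variance
var_n1 = statistics.variance(voted)   # divide-by-(n - 1) sample variance

n = len(voted)
assert abs(var_n - p * (1 - p)) < 1e-12
assert abs(var_n1 - p * (1 - p) * n / (n - 1)) < 1e-12
print(p, var_n, var_n1)
```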
20 Commands in Stata for getting univariate statistics
- summarize
- summarize, detail
- graph, bin() normal
- graph, box
- tabulate (NB: compare to table)
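A one-way `tabulate` reports, for each distinct value, its frequency, percent, and cumulative percent. A minimal Python sketch of that layout (the function name `tabulate` and the ratings are hypothetical; this does not reproduce Stata's exact output format):

```python
from collections import Counter


def tabulate(values):
    """Rows of (value, frequency, percent, cumulative percent), one per distinct value."""
    counts = Counter(values)
    total = len(values)
    rows, cum = [], 0.0
    for value in sorted(counts):
        pct = 100 * counts[value] / total
        cum += pct
        rows.append((value, counts[value], pct, cum))
    return rows


ratings = [7, 6, 6, 5, 7, 6, 4, 6, 7, 5]  # made-up Q9-style ratings
rows = tabulate(ratings)
for value, freq, pct, cum in rows:
    print(f"{value:>5} {freq:>5} {pct:>7.2f} {cum:>7.2f}")
```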
21 Explore Q9: Overall teaching evaluation
subject   q9        n
3.371     6.4375    16
3.982     6.73333   15
3.14      6.46154   13
14.02D    5.66667   3
21W.803   5.66667   12
21M.480   5.69231   13
17.906    5.28571   14
2.51      5.88235   17
22 Graph Q9
23 Divide into 7 bins spanning 0..1, 1..2, 2..3, ..., 6..7
- . graph q9, bin(7) xscale(0,7)
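With 7 equal-width bins over 0..7, each bin is one rating point wide. A Python sketch of the counting, assuming half-open bins laid out over the axis range (Stata's own rule for placing bin edges may differ). It uses the eight Q9 class means from the slide 21 listing; `bin_counts` is a hypothetical name:

```python
def bin_counts(values, k, lo=0.0, hi=7.0):
    """Count values in k equal-width bins spanning lo..hi.

    Bins are half-open [a, b) except the last, which also takes x == hi.
    """
    width = (hi - lo) / k
    counts = [0] * k
    for x in values:
        if lo <= x <= hi:  # ignore values off the axis
            i = min(int((x - lo) / width), k - 1)
            counts[i] += 1
    return counts


# Q9 class means from the listing on slide 21
q9 = [6.4375, 6.73333, 6.46154, 5.66667, 5.66667, 5.69231, 5.28571, 5.88235]
for k in (7, 14, 28):
    print(k, (7.0 - 0.0) / k, bin_counts(q9, k))
```

Raising k from 7 to 14 to 28 halves the bin width each time (1, 0.5, 0.25), which is the finer grain of slides 25 and 26.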
24 Add ticks at each integer score
- . graph q9, bin(7) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
25 Add a finer grain to the bars
- . graph q9, bin(14) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
26 Even finer grain
- . graph q9, bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
27 Superimpose the normal curve (with the same mean and s.d. as the empirical distribution)
- . graph q9, bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7) norm
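Superimposing the normal curve amounts to comparing each bar with the area that a normal distribution, with the same mean and s.d. as the data, puts over that bin. A Python sketch of that comparison in counts, assuming equal-width bins over 0..7 and using the eight Q9 class means from slide 21. It illustrates the idea only and is not Stata's plotting code; `normal_overlay` is a hypothetical name:

```python
import statistics
from statistics import NormalDist


def normal_overlay(values, k, lo=0.0, hi=7.0):
    """Expected count in each of k equal-width bins under a normal curve
    with the same mean and (n - 1) standard deviation as the data."""
    curve = NormalDist(statistics.mean(values), statistics.stdev(values))
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k + 1)]
    # n times the normal area between consecutive bin edges
    return [len(values) * (curve.cdf(b) - curve.cdf(a)) for a, b in zip(edges, edges[1:])]


q9 = [6.4375, 6.73333, 6.46154, 5.66667, 5.66667, 5.69231, 5.28571, 5.88235]
expected = normal_overlay(q9, 28)
print([round(e, 2) for e in expected])
```

The expected counts sum to a little under 8 because the fitted curve puts some area above 7, where no rating can fall. That mismatch is one sign that the ratings are not normal.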
28 Do the previous graph with only larger classes (n > 20)
- . graph q9 if n>20, bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)
29 Draw the previous graph as a box plot
- . graph q9 if n>20, box ylabel
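A box plot draws the quartiles and the median, with whiskers out to the most extreme values inside the fences. A minimal Python sketch, assuming the common Tukey fences at 1.5 × IQR and the `statistics` module's "inclusive" quartile method (Stata's quartile rule may differ slightly). The class means are made up, and `box_stats` is a hypothetical name:

```python
import statistics


def box_stats(values):
    """Median, quartiles, and Tukey whiskers (1.5 x IQR fences) for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if lo_fence <= x <= hi_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(x for x in values if x < lo_fence or x > hi_fence),
    }


ratings = [5.3, 5.7, 5.7, 5.9, 6.0, 6.1, 6.4, 6.5, 3.2]  # made-up class means
print(box_stats(ratings))
```

The single low class (3.2) falls outside the lower fence, so it is drawn as a separate point rather than stretching the whisker.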
30 Draw the box plots for small (n < 20), medium (20 < n < 100), and large (n > 100) classes
- . gen size = 0 if n<20
- (237 missing values generated)
- . replace size = 1 if n>20 & n<100
- (196 real changes made)
- . replace size = 2 if n>100
- (41 real changes made)
- . sort size
- . graph q9, box ylabel by(size)
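Read as `gen size = 0 if n<20`, `replace size = 1 if n>20 & n<100`, and `replace size = 2 if n>100`, the strict inequalities would leave a class of exactly 20 or exactly 100 students uncoded. In this run that did not happen: the two `replace` lines changed 196 + 41 = 237 observations, the same number of missing values the `gen` line created. A small Python mirror of the coding rule, with `None` standing in for Stata's missing value (`size_class` is a hypothetical name):

```python
def size_class(n):
    """0 = small, 1 = medium, 2 = large, following the strict cut-points above.

    Returns None (missing) for n == 20 and n == 100, which no condition covers.
    """
    if n < 20:
        return 0
    if 20 < n < 100:
        return 1
    if n > 100:
        return 2
    return None


for n in (16, 20, 35, 100, 150):
    print(n, size_class(n))
```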
31 A note about histograms with unnatural categories
- From the Current Population Survey (2000), Voter and Registration Survey
- How long (have you/has name) lived at this address?
- -9 No Response
- -3 Refused
- -2 Don't know
- -1 Not in universe
- 1 Less than 1 month
- 2 1-6 months
- 3 7-11 months
- 4 1-2 years
- 5 3-4 years
- 6 5 years or longer
32 Simple graph
33 Solution, Step 1: Map each artificial category onto its natural midpoint (in years)
- -9 No Response → missing
- -3 Refused → missing
- -2 Don't know → missing
- -1 Not in universe → missing
- 1 Less than 1 month → 1/24 = 0.042
- 2 1-6 months → 3.5/12 = 0.29
- 3 7-11 months → 9/12 = 0.75
- 4 1-2 years → 1.5
- 5 3-4 years → 3.5
- 6 5 years or longer → 10 (arbitrary)
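The midpoint mapping can be written as a lookup table. A minimal Python sketch that turns each CPS code into years at the address and drops the four non-substantive codes; the names `MIDPOINT_YEARS`, `MISSING_CODES`, and `recode` and the sample answers are hypothetical:

```python
# Midpoints in years for the CPS residence-duration codes
MIDPOINT_YEARS = {
    1: 1 / 24,    # less than 1 month -> half a month
    2: 3.5 / 12,  # 1-6 months
    3: 9 / 12,    # 7-11 months
    4: 1.5,       # 1-2 years
    5: 3.5,       # 3-4 years
    6: 10.0,      # 5 years or longer: open-ended, value chosen arbitrarily
}
MISSING_CODES = {-9, -3, -2, -1}  # no response, refused, don't know, not in universe


def recode(code):
    """Map a CPS code to years at the address, or None for a missing code."""
    if code in MISSING_CODES:
        return None
    return MIDPOINT_YEARS[code]


answers = [1, 4, 6, -2, 5, 3]  # hypothetical raw responses
years = [recode(c) for c in answers if recode(c) is not None]
print(years)
```

The top code's value (10) is arbitrary, so any mean computed from the recoded data depends on that choice.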
34 Graph of recoded data
35 Density plot of data
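A density plot smooths the histogram into a continuous curve. One common way to build it is a kernel density estimate, which averages a small bump centered on each observation. A minimal Python sketch with a Gaussian kernel and a hand-picked bandwidth. Both the kernel and the bandwidth are assumptions, since the slide does not say which were used, and the data are hypothetical recoded years:

```python
from statistics import NormalDist


def kde(values, bandwidth):
    """Gaussian kernel density estimate; returns the estimated density as a function of x."""
    kernel = NormalDist()  # standard normal pdf used as the kernel
    n = len(values)

    def density(x):
        return sum(kernel.pdf((x - xi) / bandwidth) for xi in values) / (n * bandwidth)

    return density


years = [1 / 24, 3.5 / 12, 9 / 12, 1.5, 1.5, 3.5, 10.0]  # hypothetical recoded responses
f = kde(years, bandwidth=1.0)
for x in (0, 1, 2, 5, 10):
    print(x, round(f(x), 3))
```

A smaller bandwidth follows the data more closely but looks bumpier. A larger one smooths more, in the same way that fewer histogram bins do.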