Title: Univariate Statistics
1. Univariate Statistics
- Analysis of a single variable
- Two general varieties:
  - Descriptive statistics: describe variables (where data are any collection of observations, whether a sample or a population)
  - Inferential statistics: make inferences about the population based on characteristics of sample data
2. List of Variable Values
3. Frequency Distribution
- A summary of the observations for a variable
- Includes a list of the values of the variable and the frequency of observations for each value
4. Example: Interval/Ratio
- Frequency distribution of midterm grades
5. Example: Interval/Ratio
6. Example: Interval/Ratio
- Relative frequency = Freq. / Total
7. Example: Interval/Ratio
- Percentage = (Freq. / Total) × 100
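The frequency, relative frequency (Freq. / Total) and percentage ((Freq. / Total) × 100) columns can be computed as in the following Python sketch. The grades are made up for illustration (the slides' own table is not reproduced in this text), and `frequency_table` is a hypothetical helper name. Because it only counts and sorts, the same function also works for nominal categories stored as strings.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, freq, relative freq, percent) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    rows = []
    for value in sorted(counts):
        freq = counts[value]
        rel = freq / total   # Freq. / Total
        pct = rel * 100      # (Freq. / Total) x 100
        rows.append((value, freq, rel, pct))
    return rows

# Hypothetical midterm grades, for illustration only.
grades = [72, 85, 85, 90, 72, 85, 64, 90]
for value, freq, rel, pct in frequency_table(grades):
    print(f"{value:>5} {freq:>5} {rel:>8.3f} {pct:>7.1f}")
```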
8. Example: Nominal
- Frequency distribution of active hate group organizations in 1999
9. Example: Nominal
10. Summarizing Data in Graphs
- Pie charts and bar charts: appropriate for nominal variables and ordinal variables (with a small number of categories)
11. Example: Bar Chart
12. Summarizing Data in Graphs
- Histograms: appropriate for all interval/ratio variables with a large number of possible values
  - Data are collapsed into intervals, and axis labels represent interval boundaries or interval midpoints
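A minimal sketch of how a histogram collapses interval/ratio data into equal-width intervals, reporting both the interval boundaries and the midpoints. The rates, the starting point of 0 and the bin width of 2 are hypothetical choices, and `histogram_bins` is an illustrative name; plotting libraries apply their own binning rules.

```python
import math

def histogram_bins(values, width, start):
    """Collapse values into equal-width intervals [lower, lower + width).

    Returns (lower, upper, midpoint, count) for every interval from `start`
    up to the interval containing the largest value. Assumes all values >= start.
    """
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1  # index of the interval holding v
    bins = []
    for i, count in enumerate(counts):
        lower = start + i * width
        upper = lower + width
        bins.append((lower, upper, (lower + upper) / 2, count))
    return bins

# Hypothetical unemployment rates (percent), not the county data on the slides.
rates = [1.7, 2.4, 3.4, 4.4, 4.8, 5.5, 7.2, 13.0]
bins = histogram_bins(rates, width=2, start=0)
for lower, upper, mid, count in bins:
    print(f"[{lower:>2}, {upper:>2})  midpoint {mid:>4}  {'*' * count}")
```

Labeling the axis with `lower`/`upper` gives boundary labels; labeling it with `mid` gives midpoint labels.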
13. Histogram of County Unemployment Rates in Fla
14. Measures of Central Tendency
- Mean
  - $\bar{Y} = \sum Y_i / N$
  - Appropriate for interval/ratio variables ONLY
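The mean formula translates directly into code; this short sketch uses the six values that appear on the median and mode slides.

```python
def mean(values):
    """Arithmetic mean: the sum of the Y_i divided by N."""
    return sum(values) / len(values)

y = [2, 2, 5, 9, 11, 15]
print(mean(y))
```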
15. Measures of Central Tendency
- Median: defined as the value of the variable in the middle of the distribution (with the observations sorted in order)
  - Odd number of observations: 2 2 5 9 11, median = 5
  - Even number of observations: 2 2 5 9 11 15, median = (5 + 9) / 2 = 7
  - Appropriate for ordinal, interval and ratio variables
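A sketch of the median rule for odd and even numbers of observations, reproducing both examples on the slide (the helper name `median` is our own; Python's `statistics.median` follows the same rule).

```python
def median(values):
    """Middle value for odd N; average of the two middle values for even N."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 2, 5, 9, 11]))      # 5
print(median([2, 2, 5, 9, 11, 15]))  # 7.0
```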
16. Measures of Central Tendency
- Mode: defined as the value that occurs most often
  - 2 2 5 9 11 15: mode = 2
  - Appropriate for all levels of measurement
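Because the mode only counts occurrences, it works for nominal data as well. This sketch returns every tied value, since a distribution can have more than one mode; the party labels are hypothetical nominal data.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency (there can be several)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([2, 2, 5, 9, 11, 15]))          # [2]
print(modes(["Dem", "Rep", "Rep", "Ind"]))  # ['Rep']  (hypothetical nominal data)
print(modes([1, 1, 2, 2, 3]))               # [1, 2]  (a bimodal case)
```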
17. Measures of Dispersion
- 1. Range: $Y_{\max} - Y_{\min}$
  - Weakness?
- 2. Percentiles: for variable Y, the pth percentile is the value of Y below which p% of the observations fall
  - 50th percentile = median
  - IQR (interquartile range) = $Y_{75\%} - Y_{25\%}$
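The range, quartiles and IQR for the same six values can be sketched with Python's standard `statistics` module. Software packages define percentiles slightly differently: `method="inclusive"` interpolates linearly between the ordered values, and other programs may report somewhat different quartiles for small samples.

```python
import statistics

y = [2, 2, 5, 9, 11, 15]

value_range = max(y) - min(y)  # Y_max - Y_min

# 25th, 50th and 75th percentiles (quartiles), interpolating linearly.
q1, q2, q3 = statistics.quantiles(y, n=4, method="inclusive")
iqr = q3 - q1                  # Y_75% - Y_25%

print(value_range, q1, q2, q3, iqr)
```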
18. Measures of Dispersion (cont'd)
- More complex measures based on mean deviations: $Y_i - \bar{Y}$
  - Average mean deviation(?): $\sum (Y_i - \bar{Y}) / N$
  - Mean absolute deviation: $\sum |Y_i - \bar{Y}| / N$
    - Could use as a measure of variation
  - Mean squared deviation: $\sum (Y_i - \bar{Y})^2 / N$
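The three deviation-based measures can be computed exactly with `fractions.Fraction`, which avoids floating-point noise. The result shows that the plain average deviation is always exactly zero (positive and negative deviations cancel), which is presumably why the slide marks it with "(?)"; taking absolute values or squares prevents the cancellation.

```python
from fractions import Fraction

y = [2, 2, 5, 9, 11, 15]
n = len(y)
y_bar = Fraction(sum(y), n)            # exact mean, 22/3

deviations = [yi - y_bar for yi in y]  # Y_i - Y_bar

avg_deviation = sum(deviations) / n                       # always 0
mean_abs_deviation = sum(abs(d) for d in deviations) / n
mean_sq_deviation = sum(d * d for d in deviations) / n

print(avg_deviation, mean_abs_deviation, mean_sq_deviation)  # 0 13/3 206/9
```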
19. Variance (sample)
- $s_Y^2 = \sum (Y_i - \bar{Y})^2 / (N - 1)$
- Standard deviation: $s_Y = \sqrt{s_Y^2}$
- Numerator: sum of squares
- Denominator: degrees of freedom
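A sketch of the sample variance as sum of squares over degrees of freedom, checked against the standard library. `statistics.variance` and `statistics.stdev` also divide by N − 1, while `statistics.pvariance` divides by N and therefore matches the mean squared deviation from the previous slide.

```python
import math
import statistics

def sample_variance(values):
    """s^2 = sum of squares / degrees of freedom (N - 1)."""
    n = len(values)
    y_bar = sum(values) / n
    sum_of_squares = sum((yi - y_bar) ** 2 for yi in values)  # numerator
    degrees_of_freedom = n - 1                                # denominator
    return sum_of_squares / degrees_of_freedom

y = [2, 2, 5, 9, 11, 15]
s2 = sample_variance(y)
s = math.sqrt(s2)  # standard deviation
print(round(s2, 3), round(s, 3))

# Library equivalents: N - 1 denominator vs. N denominator.
print(statistics.variance(y), statistics.pvariance(y))
```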
20. The Normal Distribution
- Symmetric
- Bell-shaped
- Mean = Median = Mode
21. The Normal Distribution
22. Deviations from the normal distribution
- Bimodal distributions
- Skewed distributions
  - Left skew vs. right skew
  - The mean is pulled in the direction of the skew
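A quick illustration of the mean being pulled toward the long tail while the median stays near the bulk of the data. Both data sets are hypothetical.

```python
import statistics

# Hypothetical data sets, each with one tail stretched out.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
left_skewed = [1, 18, 19, 19, 20, 20, 20, 21]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, statistics.mean(data), statistics.median(data))
# right 4.75 3.0
# left 17.25 19.5
```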
23. Histogram of County Unemployment Rates in Fla
24. Descriptive Statistics for County Unemployment Rates in Fla
- . sum unemp, detail

                                unemp
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            2            1.7
     5%          2.4            1.7
    10%          2.7            1.7       Obs                3149
    25%          3.4            1.7       Sum of Wgt.        3149

    50%          4.4                      Mean           4.809908
                            Largest       Std. Dev.      2.129031
    75%          5.5           19.5
    90%          7.2           19.5       Variance       4.532774
    95%          8.6           19.6       Skewness        2.30285
    99%           13           19.7       Kurtosis       12.11621
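In the output above, the positive skewness (2.30) and a mean (4.81) above the median (4.4) both point to a right skew, and the variance is the squared standard deviation. The sketch below computes skewness and kurtosis from moments about the mean, assuming the moment-based definitions skewness $= m_3 / m_2^{3/2}$ and kurtosis $= m_4 / m_2^2$ with $m_k = \sum (Y_i - \bar{Y})^k / N$ (so a normal distribution has kurtosis near 3). We believe these are the definitions behind the Stata output, but treat that as an assumption; the data here are hypothetical, not the county file.

```python
import math

def central_moment(values, k):
    """m_k = sum((Y_i - Y_bar) ** k) / N."""
    n = len(values)
    y_bar = sum(values) / n
    return sum((yi - y_bar) ** k for yi in values) / n

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # Not "excess" kurtosis: a normal distribution scores about 3.
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]                # hypothetical
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # hypothetical
print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed))

# Consistency check on the reported output: Variance = (Std. Dev.) squared.
print(math.isclose(2.129031 ** 2, 4.532774, abs_tol=1e-5))
```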
25. Sampling Distribution (sample means)
- Population
- Draw a random sample of size N
- Calculate the sample mean
- Repeat until all possible random samples are exhausted
- The resulting collection of sample means is the sampling distribution of sample means
26. Sampling Distribution of Sample Means
- A frequency distribution of all possible sample means for a given sample size (N)
- The mean of the sampling distribution will be equal to the population mean.
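The procedure on the two slides above can be carried out literally for a tiny population: list every possible sample of size N, take each sample's mean, and average those means. This sketch assumes simple random sampling without replacement and uses a hypothetical four-value population; exact fractions show that the mean of the sampling distribution equals the population mean.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8]   # hypothetical population of four values
sample_size = 2             # N

# Every possible sample of size N (without replacement) and its mean.
sample_means = [Fraction(sum(s), sample_size)
                for s in combinations(population, sample_size)]

population_mean = Fraction(sum(population), len(population))
mean_of_means = sum(sample_means) / len(sample_means)

print([str(m) for m in sorted(sample_means)])  # ['3', '4', '5', '5', '6', '7']
print(mean_of_means == population_mean)        # True
```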
27. Sampling Distribution of Sample Means
- When N is reasonably large (> 30), the sampling distribution will be approximately normally distributed
- The standard error of the sampling distribution can be reliably estimated as $s_Y / \sqrt{N}$ (where $s_Y$ = sample standard deviation of Y and N = sample size)
28. Standard Error
- How the sample means vary from sample to sample (i.e., within the sampling distribution) is expressed statistically by the standard deviation of the sampling distribution, called the standard error
- (Standard deviation: roughly, the average distance of each observation from the mean)
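A small simulation can make the standard error concrete. The sketch draws many samples of size N from a hypothetical right-skewed (exponential) population, takes the standard deviation of the resulting sample means, and compares it with $\sigma / \sqrt{N}$ and with the single-sample estimate $s_Y / \sqrt{N}$ from the slides. The population, N = 50 and the number of repetitions are illustrative assumptions, and the single-sample estimate will vary from run to run with the seed.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical right-skewed population (exponential with mean 5).
population = [random.expovariate(1 / 5) for _ in range(100_000)]
sigma = statistics.pstdev(population)

N = 50
n_samples = 5_000
sample_means = [statistics.fmean(random.sample(population, N))
                for _ in range(n_samples)]

# The standard deviation of the sampling distribution is the standard error.
empirical_se = statistics.stdev(sample_means)
formula_se = sigma / math.sqrt(N)

# What a researcher with only one sample would compute: s_Y / sqrt(N).
one_sample = random.sample(population, N)
estimated_se = statistics.stdev(one_sample) / math.sqrt(N)

print(round(empirical_se, 3), round(formula_se, 3), round(estimated_se, 3))
```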
29. Using the Standard Error to Calculate a 95% Confidence Interval
- Calculate the mean of Y
- Calculate the standard deviation of Y
- Calculate the standard error of Y
- Calculate a 95% confidence interval for the population mean of Y
  - 95% CI = $\bar{Y} \pm 1.96 \times$ (standard error)
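The four steps above map onto a short function. This is a sketch of the slide's large-sample formula: 1.96 is the normal critical value, so for a sample as small as the hypothetical scores used here, a t critical value would usually replace it in practice. The function name and the data are illustrative.

```python
import math
import statistics

def confidence_interval_95(values):
    """Large-sample 95% CI for the population mean: mean +/- 1.96 * SE."""
    n = len(values)
    y_bar = statistics.fmean(values)   # step 1: mean of Y
    s = statistics.stdev(values)       # step 2: standard deviation (N - 1)
    se = s / math.sqrt(n)              # step 3: standard error
    margin = 1.96 * se                 # step 4: half-width of the 95% CI
    return y_bar - margin, y_bar + margin

# Hypothetical scores, for illustration only.
scores = [55, 70, 85, 40, 100, 60, 75, 90, 50, 65]
lower, upper = confidence_interval_95(scores)
print(round(lower, 2), round(upper, 2))
```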
30. Example
- Hillary Clinton Feeling Thermometer (NES 2004)
- Mean = 64.137, s.d. = 88.408, N = 1212
- Standard error = $88.408 / \sqrt{1212} = 2.539$
- 95% CI = $64.137 \pm 1.96 \times 2.539$ = (59.158, 69.116)
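The example's arithmetic can be re-run from the reported summary values. Using the rounded inputs shown, the standard error is 2.539 and the bounds come out near 59.160 and 69.114, within about 0.002 of the slide's 59.158 and 69.116; the small gap presumably reflects rounding in the reported mean and standard deviation.

```python
import math

y_bar, sd, n = 64.137, 88.408, 1212   # values reported on the slide
se = sd / math.sqrt(n)                # standard error
lower = y_bar - 1.96 * se
upper = y_bar + 1.96 * se
print(round(se, 3), round(lower, 3), round(upper, 3))
```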