Title: Bivariate Statistics
1. Bivariate Statistics
2. Overview of Today's Topics
- Two-Sample Difference of Means Test
- Matched Pairs (Dependent Sample) Tests
- Chi-Square Goodness of Fit Test
- Kolmogorov-Smirnov Test
3. Differences Between Two Samples
- Are there significant differences between the two samples?
- If the sample differences are significant, then we can infer that the samples were drawn from truly different populations, and vice versa
- Extending hypothesis testing
- Statistic (mean)
- Relationship between samples
- Independent
- Dependent
4. Two-Sample Difference of Means
- Numerator
- Actual difference between sample means
- Denominator
- Standard error of the difference of the means (a measure of expected sampling error)
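The ratio described on this slide can be sketched as a single expression. The symbols are notation assumed here, not taken from the slide: \bar{X}_1 and \bar{X}_2 are the two sample means, and \sigma_{\bar{X}_1 - \bar{X}_2} is the standard error of the difference of the means. The statistic is written as t; with large samples it is commonly treated as Z.

```latex
t = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}
```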
5. Pooled Variance/Separate Variance
- If the population variances are equal, use the pooled variance (PV) estimate
- If the population variances are unknown but assumed equal, use the modified formula
- If the population variances are assumed to be unequal, use the separate variance (SV) estimate
- Sample variances are considered the best estimators of population variances
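A minimal sketch of the two standard-error choices, assuming the usual textbook forms of the pooled and separate variance estimates (the notation on the original slide may differ). The function name and the data are hypothetical.

```python
import math
import statistics

def difference_of_means_t(sample1, sample2, pooled=True):
    """Return (t, standard_error) for a two-sample difference of means test."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # Sample variances (n - 1 divisor) stand in for the population variances
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)

    if pooled:
        # Pooled variance: weighted average of the two sample variances
        pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
        std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    else:
        # Separate variance: each sample contributes its own variance
        std_error = math.sqrt(var1 / n1 + var2 / n2)

    return (mean1 - mean2) / std_error, std_error

# Hypothetical measurements from two independent samples
site_a = [12.0, 15.0, 11.0, 14.0, 13.0]
site_b = [10.0, 9.0, 12.0, 8.0, 11.0]

t_pooled, se_pooled = difference_of_means_t(site_a, site_b, pooled=True)
t_separate, se_separate = difference_of_means_t(site_a, site_b, pooled=False)
print(t_pooled, t_separate)
```

With equal sample sizes, as in this example, the two standard errors coincide; they diverge when the sample sizes and variances differ.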
6. Matched Pairs (Dependent Sample)
- One set of observations (units)
- Same location and/or same individuals
- One variable, two time periods
- Two variables, one time period
- Absence of two independent samples
- When two sets of data are collected for one group of observations, the samples are dependent
- The matched pairs difference test is the appropriate inferential test
- Each unit in the sample has two values (a matched pair)
- Parametric and nonparametric versions
7. Wilcoxon Matched-Pairs Signed-Ranks Test
- Random sample
- Data are ordinal, or interval/ratio data downgraded to ordinal
- H0: the ranked matched-pair differences are equal
- Test statistic is the rank sum T
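A minimal sketch of how the rank sum T might be computed. It assumes a common convention that the slide does not state: zero differences are dropped, tied absolute differences share an average rank, and T is the smaller of the positive and negative rank sums. The function name and data are hypothetical.

```python
def wilcoxon_signed_rank_T(first, second):
    """Return (T_plus, T_minus, T), where T is the smaller rank sum."""
    # Differences within each matched pair; zero differences are dropped
    diffs = [b - a for a, b in zip(first, second) if b != a]

    # Rank the absolute differences, giving tied values their average rank
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg_rank
        pos = end + 1

    # Sum the ranks separately for positive and negative differences
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, min(t_plus, t_minus)

# Hypothetical matched pairs: one variable measured at two time periods
time_1 = [10, 12, 9, 14, 11, 13]
time_2 = [12, 11, 13, 14, 16, 10]
t_plus, t_minus, T = wilcoxon_signed_rank_T(time_1, time_2)
print(t_plus, t_minus, T)
```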
8. Matched Pairs t Test
- The matched pairs are independent of each other
- In this situation, the t-test considers the difference between the values for each matched pair
- The greater the difference (d), the more dissimilar the two values within the matched pair
- The mean difference (d̄) is determined for the set of all matched pairs in the sample
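A minimal sketch of the matched pairs t statistic, assuming the standard form in which the mean difference is divided by its standard error (the standard deviation of the differences over the square root of the number of pairs). The function name and data are hypothetical.

```python
import math
import statistics

def matched_pairs_t(first, second):
    """Return (t, mean_difference, degrees_of_freedom) for matched pairs."""
    # One difference d per matched pair
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    d_bar = statistics.mean(d)            # mean of the differences
    s_d = statistics.stdev(d)             # sample standard deviation of d
    std_error = s_d / math.sqrt(n)        # standard error of the mean difference
    return d_bar / std_error, d_bar, n - 1

# Hypothetical matched pairs: the same five units measured twice
period_1 = [10.0, 12.0, 9.0, 14.0, 11.0]
period_2 = [12.0, 15.0, 10.0, 17.0, 12.0]
t_stat, d_bar, df = matched_pairs_t(period_1, period_2)
print(t_stat, d_bar, df)
```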
9. Wilcoxon Rank Sum W
- Nonparametric counterpart to the difference of means test
- Measures the magnitude of the differences in ranked positions
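A minimal sketch of the rank sums behind W, assuming the usual procedure: the two samples are combined, ranked together with average ranks for ties, and the ranks are summed per sample. Which sum is reported as W varies by textbook, so both are returned. The function name and data are hypothetical.

```python
def rank_sums(sample1, sample2):
    """Return the sum of combined ranks for each of the two samples."""
    # Tag each value with its sample of origin, then sort the combined data
    combined = [(v, 0) for v in sample1] + [(v, 1) for v in sample2]
    combined.sort(key=lambda pair: pair[0])

    sums = [0.0, 0.0]
    pos = 0
    while pos < len(combined):
        end = pos
        while end + 1 < len(combined) and combined[end + 1][0] == combined[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # average rank shared by tied values
        for j in range(pos, end + 1):
            sums[combined[j][1]] += avg_rank
        pos = end + 1
    return sums[0], sums[1]

# Hypothetical independent samples
group_a = [3, 5, 8, 9]
group_b = [1, 4, 5, 7, 10]
w_a, w_b = rank_sums(group_a, group_b)
print(w_a, w_b)
```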
10. Goodness of Fit Tests
- Comparing an actual or observed frequency distribution to some expected frequency distribution
- Used to test the hypothesis that a set of data has a particular frequency distribution
- Confirm or deny the relevance or validity of a particular theory
- Verify assumptions about samples
11. Chi-Squared Distributions
- The total area under a chi-squared curve is equal to 1
- A chi-squared curve starts at 0 on the horizontal axis and extends indefinitely to the right, approaching but not touching the horizontal axis
- A chi-squared curve is right-skewed
- As the number of degrees of freedom becomes larger, chi-squared curves look increasingly like normal curves
12. χ² Function
- Animation (not reproduced here): the shape of the chi-square distribution as the degrees of freedom increase (1, 2, 5, 10, 25, and 50)
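The trend toward a normal shape can be checked numerically. This sketch relies on standard properties of the chi-squared distribution that are not stated on the slides: mean equal to the degrees of freedom, variance equal to twice the degrees of freedom, and skewness equal to the square root of 8 divided by the degrees of freedom. The function name is hypothetical.

```python
import math

def chi_squared_shape(df):
    """Return (mean, variance, skewness) of a chi-squared distribution."""
    return df, 2 * df, math.sqrt(8 / df)

# Same degrees of freedom as listed for the animation
for df in (1, 2, 5, 10, 25, 50):
    mean, variance, skewness = chi_squared_shape(df)
    print(f"df={df:>2}  mean={mean:>2}  variance={variance:>3}  skewness={skewness:.3f}")
```

The skewness stays positive (right skew) but shrinks toward 0, the skewness of a normal curve, as the degrees of freedom grow.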
13. Goodness of Fit Tests
- Characteristics of the expected frequency distribution:
- Uniform or equal
- Proportional or unequal
- Normal (theoretical)
- The chi-squared statistic compares:
- Observed frequency counts of a single variable (organized into nominal or ordinal categories)
- An expected distribution of frequency counts organized in the same categories
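A minimal sketch of how expected counts might be built for the uniform and proportional cases. The function names, the counts, and the proportions are hypothetical; the normal (theoretical) case is not covered here.

```python
import math

def expected_uniform(observed):
    """Spread the total count equally across all categories."""
    total = sum(observed)
    return [total / len(observed)] * len(observed)

def expected_proportional(observed, proportions):
    """Allocate the total count according to hypothesized proportions."""
    if len(observed) != len(proportions):
        raise ValueError("one proportion is needed per category")
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("proportions must sum to 1")
    total = sum(observed)
    return [total * p for p in proportions]

# Hypothetical observed counts in four nominal categories
observed = [18, 22, 30, 10]
uniform = expected_uniform(observed)
proportional = expected_proportional(observed, [0.25, 0.25, 0.375, 0.125])
print(uniform, proportional)
```

Both expected distributions keep the same total as the observed counts, so they remain absolute frequencies rather than percentages.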
14. Rules for Using the χ² Test
- Samples must be taken at random
- Variables must be organized in nominal or ordinal categories
- Must use absolute frequency counts
- Cannot be applied if the observations or sampling units are relative frequencies such as percentages, proportions, or rates
- If there are 2 nominal/ordinal categories, both expected frequency counts must be at least five
- If there are 3 or more nominal/ordinal categories, no expected frequency should be less than two
- At most one-fifth of the expected frequency counts can be less than five
- Failing these minimums may be a reason to combine or reorganize categories
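The minimum-count rules on this slide can be sketched as a small check. The function name and messages are hypothetical, and the one-fifth rule is applied to expected counts, which is an interpretation of the slide's wording.

```python
def check_expected_counts(expected):
    """Return a list of rule violations for a set of expected counts."""
    k = len(expected)
    problems = []
    if k == 2:
        # Two categories: both expected counts must be at least five
        if min(expected) < 5:
            problems.append("with 2 categories, both expected counts must be at least 5")
    elif k >= 3:
        # Three or more categories: none below two, at most one-fifth below five
        if min(expected) < 2:
            problems.append("no expected count should be less than 2")
        small = sum(1 for e in expected if e < 5)
        if small > k / 5:
            problems.append("more than one-fifth of expected counts are less than 5")
    return problems

print(check_expected_counts([10, 12, 8, 9, 3]))   # passes both rules
print(check_expected_counts([10, 1.5, 8]))        # candidates for combining categories
```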
15. Test Statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where i = 1, 2, ..., k indexes the different categories, O_i is the observed frequency in a particular category, E_i is the expected frequency in that same category, and k is the total number of categories.
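A minimal sketch of the calculation, assuming the statistic on this slide is the usual goodness of fit sum of (O - E)² / E over the k categories. The function name and counts are hypothetical, and the degrees of freedom shown (k - 1) assume no parameters were estimated from the data.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts tested against a uniform expected distribution
observed = [18, 22, 30, 10]
expected = [20, 20, 20, 20]
chi_squared = chi_squared_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1
print(chi_squared, degrees_of_freedom)
```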
16. Interpreting the Value of χ²
- Null and alternative hypotheses
- If the chi-squared value is small, i.e., the observed and expected frequencies are similar, then the goodness of fit is strong: do not reject the null hypothesis
- Vice versa: if the chi-squared value is large, the goodness of fit is weak and the null hypothesis is rejected
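The decision rule can be sketched as a comparison against a critical value taken from a chi-squared table. The function name is hypothetical; the critical value 7.815 used below is the tabled value for 3 degrees of freedom at the 0.05 significance level, chosen here only as an example.

```python
def goodness_of_fit_decision(chi_squared, critical_value):
    """Compare a chi-squared statistic with a tabled critical value."""
    if chi_squared > critical_value:
        # Large statistic: observed and expected counts differ, weak fit
        return "reject H0"
    # Small statistic: observed and expected counts are similar, strong fit
    return "do not reject H0"

# Critical value for df = 3 at the 0.05 level
print(goodness_of_fit_decision(1.2, 7.815))
print(goodness_of_fit_decision(10.4, 7.815))
```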
17. Kolmogorov-Smirnov
- Goodness of fit test
- Uses data in ordinal categories, or interval/ratio data downgraded to ordinal categories
- Population is continuously distributed
- Null and alternative hypotheses
- Cumulative relative frequencies are compared with the cumulative relative frequencies expected for a normal distribution
- The K-S test statistic (D) is the maximum absolute difference between the two sets of cumulative values
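A minimal sketch of D for data grouped into ordered classes. It assumes the expected cumulative values come from a normal distribution with a given mean and standard deviation, evaluated at each class upper bound. The function names, class bounds, and counts are hypothetical.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def ks_statistic(upper_bounds, counts, mean, sd):
    """Maximum absolute gap between observed and expected cumulative values."""
    n = sum(counts)
    running = 0
    d_max = 0.0
    for bound, count in zip(upper_bounds, counts):
        running += count
        observed_cum = running / n                 # cumulative relative frequency
        expected_cum = normal_cdf(bound, mean, sd)  # expected under normality
        d_max = max(d_max, abs(observed_cum - expected_cum))
    return d_max

# Hypothetical standardized classes with their observed counts
upper_bounds = [-1.0, 0.0, 1.0, 2.0]
counts = [1, 4, 4, 1]
D = ks_statistic(upper_bounds, counts, mean=0.0, sd=1.0)
print(round(D, 4))
```

D is then compared with a tabled critical value for the sample size; a small D indicates that the observed distribution is close to the normal one.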