Title: Exploring Group Differences
Before Break
- 1) Descriptive Statistics
- Measures of central tendency
- Measures of variability
- Z-scores
- 2) Understanding statistical significance
- Hypothesis testing
- Alpha and p-values
- 3) Testing for relationships/associations between variables
- Correlation (Pearson's r)
- Simple regression
- Multiple regression
After Break
- 1) Testing for Group differences
- T-tests
- ANOVA
- 2) Understanding statistical significance
- Effect Size
- Power
- 3) Nonparametric statistics and other common tests
- Chi-square
- Logistic regression
If you have a firm grip on the pre-break material, the second half of the course becomes much easier (in my opinion).
Quick review
- I have a dataset that contains information on fitness and academic performance in middle-school children
- I want to know if fitness is related to academic success
- I'm going to use PACER laps to quantify fitness and ISAT science scores to quantify academic success
- I can answer this question in various ways; let's start with measures of association (correlations)
- What would be my null and alternative hypotheses using a correlation?
Results
- What is the relationship between aerobic fitness and science ISAT results?
- p = 0.009; what does this mean?
- Low chance of random sampling error
- If there were truly no relationship, we would see a correlation this strong, or stronger, only about 9 times out of 1,000 due to random sampling error (due to chance)
Association
- Association (and prediction) statistics like correlation and regression are useful, but can be limited
- The other half of statistical testing is centered around determining group differences
- For example, we could ask our fitness/academics question a different way and use a different set of statistics
- Group-difference tests are also useful in experiments (treatment vs. control), comparing genders (males vs. females), etc.
Example
- Imagine I use PACER laps to split kids into two different groups
- High Fitness (high number of laps)
- Low Fitness (low number of laps)
- NOTE: I took a continuous variable and made it into a categorical variable (nominal/ordinal)
- Now I can ask the question a different way
- What are my null and alternative hypotheses?
- Remember, I believe that fitness is related to academic success
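The step of turning a continuous variable (PACER laps) into two groups can be sketched in a few lines. The slides do not say which cutoff was used, so the median split below is a hypothetical choice. The function name `split_by_laps` is also illustrative. The 0 = High / 1 = Low coding matches the SPSS coding described later in this lecture.

```python
import statistics

def split_by_laps(laps, cutoff=None):
    """Code each child 0 (High Fitness) or 1 (Low Fitness).

    The cutoff is hypothetical: by default we use the median lap count,
    which is only one of several reasonable ways to split the sample.
    Children exactly at the cutoff are placed in the Low group here.
    """
    if cutoff is None:
        cutoff = statistics.median(laps)
    return [0 if lap > cutoff else 1 for lap in laps]

print(split_by_laps([12, 45, 30, 60, 22]))  # [1, 0, 1, 0, 1]
```

Note that the split throws away information: a child with 31 laps and a child with 60 laps both become "High Fitness."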
Example cont.
- H0: There is no difference in science scores between the high fitness and low fitness groups
- Notice, no difference would mean fitness has no effect
- HA: There is a difference in science scores between the high fitness and low fitness groups
- A difference would indicate that fitness has some effect
- This is simple enough: we know how to calculate and compare means in SPSS
High vs Low Fitness Mean
- Conclusion? Should I reject the null hypothesis?
- Wait, could this difference be due to random sampling error?
Need for a new statistical test
- Is this difference due to random sampling error?
- Due to the effect of random sampling, the two groups will NEVER have exactly the same science scores
- I need a way to determine if this difference is REAL or due to random sampling error (RSE)
- I need to use a statistical test that can determine group differences and provide me with a p-value
T-test
- T-tests are a family of statistical tests designed to determine if differences exist between two groups (and ONLY two groups)
- They are based on t-scores (which are very similar to z-scores); this should tip you off that they are based on the mean and SD
- They test for equality of means
- If the two group means are equal, then there is no difference
- 3 major types of t-tests
- One-sample t-test, independent-samples t-test, paired-samples t-test
T-tests
- One-sample t-test
- Compares the mean of a single sample to a known population mean
- e.g., a group of 100 people took an IQ test; are they different from the population average? Do they have above-average IQ?
- Independent-samples t-test
- Compares the mean scores of two different groups of subjects
- e.g., are science scores different between high fitness and low fitness?
- Paired-samples t-test
- Compares the mean scores for the same group of subjects on two different occasions
- e.g., is the group different before and after a treatment?
- Also called a dependent t-test or a repeated measures t-test
- In all cases, TWO means are being compared
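For reference, here are the standard textbook t statistics for the three tests. The independent-samples version shown is the pooled ("equal variances assumed") one. Each statistic has the same shape as a z-score: a difference divided by a standard error.

Symbols: x̄ is a sample mean, s is a sample SD, and n is a sample size (for the paired test, n is the number of pairs). μ0 is the known population mean. d̄ and s_d are the mean and SD of the before/after differences.

```latex
\begin{align*}
\text{One-sample:} \quad & t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, & df &= n - 1 \\
\text{Independent samples:} \quad & t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
  \quad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, & df &= n_1 + n_2 - 2 \\
\text{Paired samples:} \quad & t = \frac{\bar{d}}{s_d / \sqrt{n}}, & df &= n - 1
\end{align*}
```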
Independent Samples T-Test
- Let's start here, since we need to use this test for our fitness/science question
- Independent samples t-tests are used with:
- a two-level, categorical, independent variable (High/Low Fitness): ONLY two groups
- and one continuous dependent variable (science ISAT scores)
- Statistical assumptions: 1) data are normally distributed, 2) samples represent the population, 3) the variances of the two groups are similar (homogeneity of variance, or homoscedasticity)
- NOTE: Same as correlation/regression, except we no longer have to worry about a linear relationship, since one of our variables is categorical (high/low fitness)
SPSS Data format
- In SPSS, the science scores are my continuous, dependent variable
- I created the high and low fitness groups based on how many PACER laps each child completed
- When I created them, I coded high fitness as 0 and low fitness as 1
- You need to recognize how your data are coded for a t-test
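The sketch below shows roughly what the "equal variances assumed" row of the output computes from data coded this way. It assumes each row is stored as a (group code, score) pair using the 0 = high fitness / 1 = low fitness coding above. The function name `pooled_t_test` and the toy numbers are hypothetical and are not from the study; in class, SPSS does this for you.

```python
import math
import statistics

HIGH, LOW = 0, 1  # group codes, as entered in SPSS

def pooled_t_test(rows):
    """Return (mean difference, t, df) for a pooled-variance independent t-test.

    rows: iterable of (group_code, score) pairs (hypothetical layout).
    """
    high = [score for group, score in rows if group == HIGH]
    low = [score for group, score in rows if group == LOW]
    n1, n2 = len(high), len(low)
    diff = statistics.fmean(high) - statistics.fmean(low)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their df
    pooled_var = ((n1 - 1) * statistics.variance(high)
                  + (n2 - 1) * statistics.variance(low)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the mean difference
    return diff, diff / se, df

rows = [(0, 10), (0, 12), (0, 14), (1, 7), (1, 9), (1, 11)]  # toy data
print(pooled_t_test(rows))
```

Mixing up the codes (for example, treating 1 as high fitness) would flip the sign of both the mean difference and t. This is why you need to know how your groups are coded.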
SPSS T-test
- Move the dependent variable to Test Variable
- Move your independent variable to Grouping Variable
- Notice, it now has 2 question marks
- SPSS needs to know which groups to compare
- Define Groups
SPSS T-test
- Recall, high fitness was 0 and low fitness was 1
- Manually enter these values into the boxes
- When done, hit Continue, then OK
T-test results
- The first box will contain what you've already seen: the means of the two groups
- Notice the n, mean, and standard deviation (ignore SE) for each group
- The next box is too big for one screen, so I've split it into two pieces
SPSS results - Output
- Recall, both groups need to have equal variance (homogeneity of variance, or homoscedasticity)
- SPSS tests for this using Levene's Test
- Null hypothesis: there is equal variance
- This means you do NOT want a p-value < 0.05
SPSS results - Output
- If this Levene's Test p-value is > 0.05
- Equal variances can be assumed; use the top line of the table
- If this Levene's Test p-value is ≤ 0.05
- Equal variances cannot be assumed; use the bottom line
- This makes it harder to find a statistically significant result
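For intuition about what Levene's Test is measuring, here is a minimal sketch of the classic mean-centered Levene statistic W. It measures how far each score sits from its own group mean, then runs a one-way ANOVA F on those distances. A large W means the groups differ in spread.

SPSS turns W into a p-value using the F distribution with (k − 1, N − k) degrees of freedom. That step is omitted here because Python's standard library has no F distribution. The function name `levene_w` is hypothetical.

```python
import statistics

def levene_w(*groups):
    """Classic (mean-centered) Levene statistic for k groups."""
    # Step 1: absolute deviation of each score from its own group mean
    devs = []
    for g in groups:
        m = statistics.fmean(g)
        devs.append([abs(x - m) for x in g])
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    dev_means = [statistics.fmean(d) for d in devs]
    grand_mean = statistics.fmean([v for d in devs for v in d])
    # Step 2: one-way ANOVA F computed on those deviations
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, dev_means))
    within = sum((v - m) ** 2 for d, m in zip(devs, dev_means) for v in d)
    return (n_total - k) / (k - 1) * between / within

print(levene_w([1, 2, 3], [11, 12, 13]))     # same spread: 0.0
print(levene_w([1, 2, 3, 4], [0, 5, 10, 15]))  # second group far more spread out: about 7.38
```

Note that the first pair of groups has very different means but identical spread, so W is 0. Levene's Test only cares about variance, not about the group difference the t-test is looking for.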
df: Degrees of Freedom
- The table also shows df, or degrees of freedom
- df is used, together with the t-score, to find the p-value for the t-test
- df = n - 1
- For each group you have, subtract 1 from that group's sample size
- We have two groups
- High Fitness: n = 176, so 176 - 1 = 175
- Low Fitness: n = 98, so 98 - 1 = 97
- Total n = 176 + 98 = 274; we have 2 groups, so
- Total df = 274 - 2 = 272
Degrees of Freedom
- df is important to understand if you are calculating p-values by hand; we are NOT
- All you need to know now is that:
- Larger sample size → larger df
- More groups → smaller df
- You want a large df, because it reduces your chance of random sampling error (a large sample) and increases the chance you'll find statistically significant results
- This becomes more important beyond t-tests, since we can have several groups (not just 2)
df in our example
- Notice, the df in our example is 272 (274 subjects minus our two groups, high and low fitness)
- If you do not have equal variances, SPSS downgrades your df, making it more difficult to find statistically significant results
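The "downgraded" df on the bottom (equal variances not assumed) line comes from the Welch–Satterthwaite formula. This is why that df can be a non-integer, such as 411.2. A minimal sketch comparing it with the pooled df follows. The standard deviations in the second call are hypothetical, not from our dataset.

```python
def pooled_df(n1, n2):
    """df for the 'equal variances assumed' line: n1 + n2 - 2."""
    return n1 + n2 - 2

def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite df for the 'equal variances not assumed' line."""
    a = sd1 ** 2 / n1  # squared standard error of group 1's mean
    b = sd2 ** 2 / n2  # squared standard error of group 2's mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(pooled_df(176, 98))         # 272, as in our example
print(welch_df(20, 176, 35, 98))  # hypothetical SDs: df drops well below 272
```

The Welch df always lands between the smaller group's n - 1 and the pooled n1 + n2 - 2. The more unequal the variances (and sample sizes), the closer it moves to the lower end.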
Before we move on
- Questions about the equality of variance test? df?
- Remember, we're trying to determine if the difference between the two groups is real or due to RSE
- What we know so far:
- And, the two groups do have equal variance
More results
- Here is the important stuff (remember, we're using the top line)
- Our two groups (high/low fitness) had a mean difference of 12.2 on the science ISAT
- 239.1 - 226.9 = 12.2
- This difference is statistically significant, p = 0.001
Decisions
Questions about t-test results?
- H0: There is no difference in science scores between the high fitness and low fitness groups
- HA: There is a difference in science scores between the high fitness and low fitness groups
- Decision?
- Results: The high fitness group scored higher than the low fitness group on their science ISAT test by 12.2 points. This difference was statistically significant, t(272) = 3.262, p = 0.001.
- Usually report the t value of the test and the degrees of freedom in the paper (from the table)
One more thing
- Notice that the t-test table also provided a 95% confidence interval
- 95% confidence intervals are a statistic available from most tests, and are related to p-values
- Lower bound 4.8, upper bound 19.5
95% confidence intervals
- Confidence intervals are similar to p-values
- Remember, p-values indicate the probability of random sampling error
- We want low p-values, which indicate a low probability of random sampling error
- We most often use a p-value cutoff of 0.05, meaning we like to be 0.95 (or 95%) confident that this was NOT due to random sampling error
- Confidence intervals give you a similar type of information, but in a more practical sense
- Many people prefer confidence intervals over p-values
95% confidence intervals
- Remember, in statistics we are using samples to try to figure out information about the population
- When we calculate a mean for a sample, we are really trying to understand what the REAL population mean is
- But, due to random sampling error, we expect our sample mean to differ from the real population mean
- Example: the mean IQ score for all 7 billion humans is 100
- Sample 1 of 100 humans: 102.1; Sample 2: 105.3; Sample 3: 98.2; etc.
- Random sampling error
IQ
Pretend we keep on drawing more and more samples until we have 100 different samples and 100 different lines on this chart.
If we did that, would there be a pattern to where the lines were drawn?
Sample 1 Mean: 102.1
Sample 2 Mean: 105.3
Sample 3 Mean: 98.2
Would ALL the lines be so close to the population mean of 100?
[Figure: Distribution of IQ scores from the entire population (mean = 100, SD = 15; axis from 55 to 145), with lines marking the means of Samples 1, 2, and 3]
However, a 95% confidence interval would tell you where 95 of the 100 lines fell.
Not all samples will be close to 100, just due to random sampling error.
[Figure: Distribution of IQ scores from the entire population (mean = 100, SD = 15), with the 95% confidence interval marked]
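The "lines on the chart" idea can be simulated directly: draw many samples from a population with mean 100 and SD 15, then count how many sample means land within 1.96 standard errors of 100.

This sketch makes three assumptions:
- Each sample has 100 people (as in the "Sample 1 of 100 humans" example).
- The population SD of 15 is known, so a z value of 1.96 is used instead of a t value.
- It draws 1,000 samples rather than 100 to make the pattern steadier.

The function name `share_near_mean` is hypothetical.

```python
import random
import statistics

POP_MEAN, POP_SD = 100, 15  # IQ population shown on the chart
N_PER_SAMPLE = 100          # assumed people per sample
Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

def share_near_mean(n_samples=1000, seed=1):
    """Fraction of sample means within 1.96 standard errors of the true mean."""
    rng = random.Random(seed)
    se = POP_SD / N_PER_SAMPLE ** 0.5  # standard error = 15 / 10 = 1.5
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N_PER_SAMPLE)]
        # Equivalent check: does the interval (sample mean +/- 1.96 SE) contain 100?
        if abs(statistics.fmean(sample) - POP_MEAN) <= Z_95 * se:
            hits += 1
    return hits / n_samples

print(share_near_mean())  # close to 0.95, but not exactly, because of RSE
```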
But notice, the more confident we want to be, the wider the gap gets.
We could also make a 99% confidence interval if we wanted to.
Usually, people stick with a 95% confidence interval (since we usually use a p-value cutoff of 0.05).
[Figure: Distribution of IQ scores from the entire population (mean = 100, SD = 15), with the wider 99% confidence interval marked]
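The widening can be checked with a little arithmetic. The sketch below assumes the same IQ setup as the chart (SD = 15 treated as known, samples of 100), so the interval is mean ± z × 15 / √100. The only thing that changes between 95% and 99% is the z value. The function name `interval_width` is hypothetical.

```python
from statistics import NormalDist

def interval_width(conf, sd=15, n=100):
    """Full width of a z-based confidence interval for a mean (sd assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # 1.96 for 95%, 2.58 for 99%
    return 2 * z * sd / n ** 0.5

for conf in (0.95, 0.99):
    print(conf, round(interval_width(conf), 2))
# 0.95 5.88
# 0.99 7.73
```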
In this example, a 95% confidence interval indicates that we are 95% certain that the REAL population mean falls between these two values.
We can use a 95% confidence interval for virtually any population parameter we want, such as a correlation coefficient, a regression slope, or a mean difference between two groups (like with our t-test).
[Figure: Distribution of IQ scores from the entire population (mean = 100, SD = 15), with the 95% confidence interval marked]
Back to our t-test
- Our 95% confidence interval
- Notice it says "Interval of the Difference"
- We are 95% certain that the real difference between our High and Low Fitness groups is AT LEAST 4.8 points and might be up to 19.5 points
- Our confidence level can never be 100%, so there is always a chance the real population difference is outside of this range (just like p can never be 0)
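SPSS turns t and df into the p-value and the 95% CI of the difference. The stdlib-only sketch below approximates that step using the values reported in the SPSS table for the fitness example: mean difference 12.2, t = 3.262, df = 272.

It makes two simplifications:
- Instead of calling a statistics library, it numerically integrates the t density with Simpson's rule.
- It backs the standard error out of the reported t (SE = mean difference / t).

All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): integrate the density from 0 to |x| (Simpson's rule), use symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(df, conf=0.95):
    """Two-sided critical value, found by bisection on t_cdf."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def p_and_ci(diff, t, df, conf=0.95):
    """Two-sided p-value and CI for a mean difference, given its t and df."""
    se = abs(diff / t)  # standard error implied by the reported t
    p = 2 * (1 - t_cdf(abs(t), df))
    margin = t_critical(df, conf) * se
    return p, (diff - margin, diff + margin)

p, (low, high) = p_and_ci(12.2, 3.262, 272)  # values from the SPSS table
print(round(p, 4), round(low, 1), round(high, 1))
```

With these rounded inputs, the interval comes out at about 4.8 to 19.6. SPSS reports 4.8 to 19.5 because it uses the unrounded mean difference. The p-value of about 0.001 matches the table.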
Confidence Intervals and p-values
- These two values are connected because:
- Both are related to RSE
- Both are calculated using n (and df)
- For a given mean difference, a lower p-value (low chance of random sampling error) goes with a smaller (narrower) confidence interval: we can be more confident
- A larger p-value goes with a wider confidence interval: we are less confident
Questions on confidence intervals?
One more example t-test
- Instead of using aerobic fitness, let's use flexibility
- I split my sample into High Flexibility and Low Flexibility groups (based on the sit-and-reach test)
- Now, I'll run a t-test to see if the High Flexibility kids score higher on their science tests than the Low Flexibility kids
What are my hypotheses?
- H0: There is no difference in science scores between the high flexibility and low flexibility groups
- HA: There is a difference in science scores between the high flexibility and low flexibility groups
T-test results: Flexibility
- We can see that the high flexibility group has a higher science ISAT score by about 5, but is this difference statistically significant?
T-test results: Flexibility
- Levene's Test p = 0.521
- What does this mean?
- df = 285
- What is our sample size?
T-test results
- Notice the mean difference (difference between the High/Low groups) and the 95% confidence interval
- I have intentionally removed the p-value for this t-test
- Is there a statistically significant difference between the two groups? Is the p-value ≤ 0.05?
T-test results
- Remember, to reject the null hypothesis we have to be reasonably certain that the two groups are different
- If this is the case, the difference between the two groups could NOT be 0
- If the mean difference is 0, the two groups are identical
- When the 95% confidence interval INCLUDES 0, we can't be 95% certain that there is a group difference, and therefore p > 0.05
T-test results
- Our 95% confidence interval includes 0 (one number is negative and the other is positive)
- Therefore, we can't be 95% certain the real group difference is NOT 0
- p = 0.311; we can't be sure this is not due to RSE
95% CI and p
- If your 95% CI includes 0, your p-value will NOT be less than 0.05
- Because both statistics are evaluating the chance of RSE
- If your 95% CI does not include 0 (both numbers are positive or both are negative), then we can be confident that the two groups are not the same
- This means that p < 0.05
Questions?
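The link between the 95% CI and the p-value can be written out for the independent-samples t-test. The sketch below assumes a two-sided test at α = 0.05. SE is the standard error of the mean difference, and t_crit is the two-sided 95% critical value for the test's df.

```latex
\begin{align*}
\text{95\% CI} &= (\bar{x}_1 - \bar{x}_2) \pm t_{\text{crit}} \cdot SE \\
0 \notin \text{CI} &\iff |\bar{x}_1 - \bar{x}_2| > t_{\text{crit}} \cdot SE \\
&\iff |t| = \frac{|\bar{x}_1 - \bar{x}_2|}{SE} > t_{\text{crit}} \\
&\iff p < 0.05
\end{align*}
```

In the fitness example, |t| = 3.262 is larger than t_crit ≈ 1.97 (df = 272). So the CI excludes 0 and p < 0.05. In the flexibility example, the CI includes 0, so |t| must be below t_crit, which is consistent with p = 0.311.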
Upcoming
- In-class activity
- Homework
- Cronk: re-read 6.1, complete 6.3 (skip 6.2 for now)
- Holcomb: Exercises 35, 37, 38, and 39
- More t-tests next week
- Single-sample t-test
- Paired-samples t-test (repeated measures t-test)
Example In-Class (10 minutes)
- Go to Blackboard and open the SPSS dataset
- Fitness and Academics Reduced (week 7)
- Run two different independent samples t-tests
- Determine if kids who are aerobically fit (using PACER) score higher in reading or math than kids who have low fitness
- Write down your results in this format (x2):
- t = XX, df = XX, p = XX
- Mean difference = XX, 95% CI (XX to XX)
Results of two t-tests
- Reading (equal variances NOT assumed)
- t = 4.856, df = 411.2, p < 0.0005
- Mean difference = 9.8, 95% CI (5.9 to 13.8)
- Math (equal variances assumed)
- t = 4.021, df = 837, p < 0.0005
- Mean difference = 8.8, 95% CI (4.5 to 13.0)