Exploring Group Differences - PowerPoint PPT Presentation

1 / 48

About This Presentation

Title:

Exploring Group Differences

Description:

EXPLORING GROUP DIFFERENCES 100 85 115 70 130 X = 100 SD = 15 55 145 Distribution of IQ scores from the entire population In this example, a 95% confidence interval ... – PowerPoint PPT presentation

Number of Views:177

Avg rating:3.0/5.0

Slides: 49

Provided by: DelSi9

Category:

more less

Transcript and Presenter's Notes

Title: Exploring Group Differences

1
Exploring Group Differences
2
Before Break

1) Descriptive Statistics
Measures of central tendency
Measures of variability
Z-scores
2) Understanding statistical significance
Hypothesis testing
Alpha and p-values
3) Testing for relationships/associations between
variables
Correlation (Pearsons r)
Simple regression
Multiple regression

3
After Break

1) Testing for Group differences
T-tests
ANOVA
2) Understanding statistical significance
Effect Size
Power
3) Nonparametric statistics and other common
tests
Chi-square
Logistic regression

If you have a firm grip on pre-break material
the second half of the course becomes much easier
(in my opinion)
4
Quick review

I have a dataset that contains information on
fitness and academic performance in middle-school
children
I want to know if fitness is related to academic
success
Im going to use PACER laps to quantify fitness
and ISAT science scores to quantify academic
success
I can answer this question in various ways
lets start with measures of association
(correlations)
What would be my null and alternative hypothesis
using a correlation?

5
Results

What is the relationship between aerobic fitness
and science ISAT results?
p 0.009, what does this mean?
Low chance of random sampling error
We would only see a correlation this strong, or
stronger, 9 times out of 1000 due to random
sampling error (due to chance)

6
Association

Association (and prediction) statistics like
correlation and regression are useful, but can be
limited
The other half of statistical testing is
centered around determining group differences
For example, we could ask our fitness/academics
question a different way and use a different set
of statistics
Also useful in experiments (treatment vs
control), comparing genders (males vs females),
etc

7
Example

Imagine I use PACER laps to split kids into two
different groups
High Fitness (high number of laps)
Low Fitness (low number of laps)
NOTE I took a continuous variable and made it
into a categorical variable (nominal/ordinal)
Now I can ask the question a different way
What are my null and alternative hypotheses?
Remember, I believe that fitness is related to
academic success

8
Example cont

HO There is no difference in science scores
between the high fitness and low fitness group
Notice, no difference would mean fitness has no
effect
HA There is a difference in science scores
between the high fitness and low fitness group
A difference would indicate that fitness has some
effect
This is simple enough we know how to calculate
and compare means in SPSS

9
High vs Low Fitness Mean

Conclusion? Should I reject the null
hypothesis?
Wait could this difference be due to random
sampling error?

10
Need for new statistical test

Is this difference due to random sampling error?
Due to the effect of random sampling, the two
groups will NEVER have the exact same science
scores
I need a way to determine if this difference is
REAL or due to RSE
I need to use a statistical test that can
determine group differences and provide me with a
p-value

11
T-test

A t-test is a family of statistical tests
designed to determine if differences exist
between two groups (and ONLY two groups)
Based on t-scores (which are very similar to
z-scores), should tip you off they are based on
mean and SD
They test for equality of means
If the two group means are equal then there is
no difference
3 major types of t-tests
One sample t-test, independent samples t-test,
paired-samples t-test

12
T-tests

One-sample t-test
Compares mean of a single sample to known
population mean
i.e., group of 100 people took IQ test, are they
different from the population average? Do they
have above average IQ?
Independent samples t-test
Compares the means scores of two different groups
of subjects
i.e., are science scores different between high
fitness and low fitness
Paired-samples t-test
Compares the mean scores for the same group of
subjects on two different occasions
i.e., is the group different before and after a
treatment?
Also called a dependent t-test or a repeated
measures t-test
In all cases TWO group means are being compared

13
Independent Samples T-Test

Lets start here, since we need to use this test
for our fitness/science question
Independent Samples T-tests
Used with a two-level, categorical, independent
variable (High/Low Fitness) ONLY two groups
and with one continuous dependent variable
(science ISAT scores)
Statistical assumptions 1) data are normally
distributed, 2) samples represent the population,
3) the variance of the two groups are similar
(homoscedasticity of variance)

NOTE Same as correlation/regression except we
no longer have to worry about a linear
relationship since one of our variables is
categorical (high/low fitness)
14
SPSS Data format

In SPSS, the science scores are my continuous,
dependent variable
I created the high and low fitness groups
based on how many PACER laps each child completed
When I created them, I coded high fitness as 0
and low fitness as 1
You need to recognize how your data are coded for
a t-test

15
(No Transcript)
16
(No Transcript)
17
SPSS T-test

Move dependent variable to Test Variable
Move your independent variable to Grouping
Variable
Notice, it now has 2 question marks
SPSS needs to know which groups to compare
Define Groups

18
SPSS T-test

Recall, high fitness was 0, low fitness was 1
Manually enter these values into the box
When done, hit continue, then ok

19
T-test results

The first box will contain what youve already
seen the mean of the two groups
Notice, n, mean, standard deviation (ignore SE)
for each group
The next box is too big for one screen, so Ive
split it into two pieces

20
SPSS results - Output

Recall, both groups need to have equal variance
(homogeneity of variance, or homoscedasticity)
SPSS tests for this using Levenes Test
Null hypothesis There is equal variance
This means you do NOT want a p-value lt 0.05

21
SPSS results - Output

If this Levenes Test p-value is gt 0.05
Equal variances exist, use the top line of the
table
If this Levens Test p-value is lt or 0.05
Equal variance does not exist, use the bottom
line
Becomes harder to find a statistically
significant result

22
df Degrees of Freedom

The table also shows df, or degrees of freedom
df is used to calculate the t-score and p-value
for the t-test
df n 1
For each group you have, subtract 1 from the
sample
We have two groups
High Fitness, n 176, so 176 1 175
Low Fitness, n 98, so 98 1 97
Total n 176 98 274, we have 2 groups so
Total df 274 2 272

23
Degrees of Freedom

df is important to understand if you are
calculating the p-values by hand we are NOT
All you need to know now is that
Larger sample size ? df
More groups ? df
You want large df because it reduces your chance
of random sampling error (a large sample) and
increases the chance youll find statistically
significant results
This becomes more important beyond t-tests, since
we can have several groups (not just 2)

24
df in our example

Notice, the df in our example is 272 (274
subjects minus our two groups (high and low
fitness)
If you do not have equal variances, SPSS
downgrades your df, making it more difficult to
find statistically significant results

25
Before we move on

Questions about equality of variance test? df?
Remember, were trying to determine if the
difference between the two groups is real or
due to RSE
What we know so far
And, the two groups do have equal variance

26
More results

Here is the important stuff (remember, using top
line)
Our two groups (high/low fitness) had a mean
difference of 12.2 on the science ISAT
239.1 226.9 12.2
This difference is statistically significant, p
0.001

27
Decisions
Questions about t-test results?

HO There is no difference in science scores
between the high fitness and low fitness group
HA There is a difference in science scores
between the high fitness and low fitness group
Decision?
Results The high fitness group scored higher
than the low fitness group on their science ISAT
test by 12.2 points. This difference was
statistically significant, t (272) 3.262, p
0.001.
Usually report the t value of the test and the
degrees of freedom in the paper (from table)

28
One more thing

Notice in the t-test table that we also were
provided with a 95 confidence interval
95 confidence intervals are a statistic
available from most tests, and are related to
p-values.
Lower Bound 4.8, upper bound 19.5

29
95 confidence intervals

Confidence intervals are similar to p-values
Remember, p-values indicate probability of random
sampling error
We want low p-values, which indicate a low
probability of random sampling error
We most often use a p-value cutoff of 0.05,
meaning we like to be 0.95 (or 95 confident)
that this was NOT due to random sampling error
Confidence intervals give you a similar type of
information, but in a more practical sense
Many people prefer confidence intervals over
p-values

30
95 confidence intervals

Remember, in statistics we are using samples to
try and figure out information about the
population
When we calculate a mean for a sample, we are
really trying to understand what the REAL
population mean is
But, due to random sampling error, we always know
that our sample mean is different from the real
population mean
Example mean IQ score for all 7 billion humans
is 100
Sample 1 of 100 humans 102.1, Sample 2 105.3,
Sample 3 98.2, etc
Random sampling error

31
IQ
Pretend we keep on drawing more and more samples
until we got 100 different samples and 100
different lines on this chart
1
3
2
If we did that, would there be a pattern to where
the lines were drawn?
Sample 1 Mean 102.1
Sample 2 Mean 105.3
Sample 3 Mean 98.2
Would ALL the lines be so close to the population
mean of 100?
X 100 SD 15
145
100
85
115
70
130
55
Distribution of IQ scores from the entire
population
32
However, a 95 confidence interval would tell you
where 95 of the 100 lines fell
Not all samples will be close to 100, just due to
random sampling error
95 Confidence Interval
145
100
85
115
70
130
55
Distribution of IQ scores from the entire
population
33
But notice, the more confident we want to be,
the wider the gap gets
Could also make a 99 confidence interval if we
wanted to
Usually, people stick with a 95 confidence
interval (since we usually use a p-value of 0.05)
99 Confidence Interval
145
100
85
115
70
130
55
Distribution of IQ scores from the entire
population
34
In this example, a 95 confidence interval
indicates that we are 95 certain that the REAL
population mean falls between these two values
We can use a 95 confidence interval for
virtually any population parameter we want to
such as a correlation coefficient, a regression
slope, or a mean difference between two groups
(like with our t-test)
95 Confidence Interval
145
100
85
115
70
130
55
Distribution of IQ scores from the entire
population
35
Back to our t-test

Our 95 confidence interval
Notice is says, Interval of the Difference
We are 95 certain that the real difference is AT
LEAST as big as 4.8 and might be up to 19.5
points between our High and Low Fitness Groups
Our confidence level can never be 100, so there
is always a chance the real population difference
is outside of this range (just like p can never
be 0)

36
Confidence Intervals and p-values

These two values are connected because
Both related to RSE
Both calculated using n (and df)
A low p-value (low chance of random sampling
error) will result in a smaller (more narrow)
confidence interval we can be more confident
A larger p-value will result in a wider
confidence interval we are less confident

Questions on confidence intervals?
37
One more example t-test

Instead of using Aerobic fitness, lets use
flexibility
I split my sample into High Flexibility and Low
Flexibility groups (based on sit and reach test)
Now, Ill run a t-test to see if the High
Flexibility kids score higher on their science
tests than the Low Flexibility kids

38
What are my hypotheses?

HO There is no difference in science scores
between the high flexibility and low flexibility
group
HA There is a difference in science scores
between the high flexibility and low flexibility
group

39
T-test results Flexibility

We can see that the high flexibility group has a
higher Science ISAT score by about 5, but is this
difference statistically significant???

40
T-test results Flexibility

Levenes Test p 0.521
What does this mean?
df 285
What is our sample size?

41
T-test results

Notice the mean difference (difference between
High/Low groups) and the 95 confidence interval
I have intentionally removed the p-value for this
t-test
Is there a statistically significant difference
between the two groups? Is the p-value lt or
0.05?

42
T-test results

Remember, to reject the null hypothesis we have
to be reasonably certain that the two groups are
different
If this was the case, the difference between the
two groups could NOT be 0
If the mean difference is 0, the two groups are
identical
When the 95 confidence interval INCLUDES 0, we
cant be 95 certain that there is a group
difference and therefore, p is gt 0.05

43
T-test results

Our 95 confidence interval includes 0 (one
number is negative and the other is positive)
Therefore, we cant be 95 certain the real group
difference is NOT 0
p 0.311, we cant be sure this is not due to RSE

44
95 CI and p

If your 95 CI includes 0, your p-value will NOT
be less than or equal to 0.05
Because both statistics are evaluating the chance
of RSE
If your 95 CI does not include 0 (both numbers
are positive or both are negative), then we can
be confident that the two groups are not the same
This means that p lt or 0.05

Questions?
45
Upcoming

In-class activity
Homework
Cronk re-read 6.1, complete 6.3 (skip 6.2 for
now)
Holcomb Exercises 35 and 37, 38, 39
More t-tests next week
Single sample t-test
Paired samples t-test (repeated measures t-test)

46
Example In-Class, 10 minutes

Go to Blackboard and open the SPSS dataset
Fitness and Academics Reduced (week 7)
Run two different independent samples t-tests
Determine if kids who are aerobically fit (using
PACER) score higher in reading or math than kids
who have low fitness
Write down your results in this format (x2)
T XX, df XX, p XX
Mean difference XX, 95CI (XX to XX)

47
Results of two t-tests
48
Results of two t-tests