1
Categorical Variables in MR
You've heard that it is possible to do ANOVA via multiple regression. But I have also said that we can't use multi-category variables in MR; we can only use dichotomies (dummy variables). Thus, we must learn to use dummy variables to represent multi-category factors. First, a dummy variable is a dichotomy coded as a 0/1 variable. If gender is the variable, we may code a gender dummy variable X where X = 0 if the subject is female and X = 1 if the subject is male.
2
The Concept of the Dummy Variable in MR
We have already seen dummy variables used in MR. When we use a dichotomy like X in MR, we get the typical estimated equation

Ŷi = b0 + b1 Xi

We know that b1 represents the predicted change in Yi associated with a 1-unit increase in Xi. Normally this one-point change can occur anywhere along the scale of X: a change from 50 to 51 or from -10 to -9 would cause the same shift or difference in Yi values. But when Xi is a dummy variable and only takes on two values, a difference of one point can only occur when Xi goes from 0 to 1, e.g., when Xi = 0 the subject is female, but when Xi = 1 the subject is male.
3
The Concept of the Dummy Variable in MR
  • So, the slope b1 for our bivariate regression where X is a dummy variable equals the mean difference between the two groups represented by X: for this X (gender), the slope b1 is the difference between the means of males and females.
  • Also, we can see that Ŷi will equal b0 when Xi = 0.
  • So the intercept represents the mean for all cases with Xi = 0 (i.e., the mean of Y for the females).
  • We can actually prove this is true if we do some algebra with the formula for b1 where the Xs are all 0s and 1s.

4
The Concept of the Dummy Variable in MR
  • But it is easier to see this empirically by running descriptive statistics and a t test on two groups, and then running a regression using a dummy variable that represents the 2 groups as our X.
  • This was seen in our output for prinlead with the private vs. public variable (private) as our predictor. Recall that the slope was b1 = 2.96 (with t = 4.754), and when we did the t test we obtained matching results (the SPSS output shown on the slide is not reproduced in this transcript).

5
The Concept of the Dummy Variable in MR
  • Be careful though -- the above results only hold exactly for a dummy variable coded 0/1. For instance, if we have a dichotomy that is coded 1/2, the slope will still equal the mean difference (because there is still a one-point difference between the groups), but the intercept will NOT equal the mean for either group.
  • If the dichotomy is coded with numbers that are not one unit apart, then the slope will not equal the mean difference.
  • You can check this by running a regression using a dummy variable that represents the 2 groups with numbers other than 0 and 1, as in the sketch below.
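Here is a minimal sketch (made-up data, assuming numpy is available) verifying both claims: with 0/1 coding the fitted slope equals the group mean difference and the intercept equals the mean of the group coded 0, while with 1/2 coding the slope is unchanged but the intercept no longer matches either group mean.

```python
import numpy as np

# Hypothetical data: 40 "females" (X = 0) and 40 "males" (X = 1).
rng = np.random.default_rng(0)
y_f = rng.normal(50, 10, 40)             # outcome, group coded 0
y_m = rng.normal(55, 10, 40)             # outcome, group coded 1
y = np.concatenate([y_f, y_m])

x01 = np.concatenate([np.zeros(40), np.ones(40)])
b1, b0 = np.polyfit(x01, y, 1)           # polyfit returns slope, intercept
print(b1, y_m.mean() - y_f.mean())       # slope = mean difference
print(b0, y_f.mean())                    # intercept = mean of the X = 0 group

x12 = x01 + 1                            # same two groups recoded 1/2
b1_alt, b0_alt = np.polyfit(x12, y, 1)
print(b1_alt)                            # still the mean difference
print(b0_alt, y_f.mean(), y_m.mean())    # intercept matches neither group mean
```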

6

Dummy Variables to Represent k Groups
Suppose now we have 3 groups, say, our familiar practice-type factor. Until now we have coded this using T = 1 if the subject does physical practice, T = 2 if the subject does mental practice, and T = 3 if the subject does not practice (control). However, MR will treat T as if it were a number, not a label. Is mental practice (T = 2) twice as good/bad or different as physical practice (T = 1)? If not, using T in MR would be a mistake. So we need to represent the 3 groups in some other way. We will use (k-1) dummy variables to represent k groups.
7

Dummy Variables for k Groups
We will use dummy (0/1) variables to differentiate these 3 groups. Let X1 represent "Does the subject use physical practice?" (1 = Yes) and X2 represent "Does the subject use mental practice?" (1 = Yes). If we have one subject from each group, their values of the original factor (T) and the two dummy variables X1 and X2 would be

Subject   Group      T   X1   X2
Jim       Physical   1    1    0
John      Mental     2    0    1
Joe       Control    3    0    0
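If you build these dummies in software rather than by hand, a common shortcut looks like the following sketch (assuming pandas; the data are illustrative):

```python
import pandas as pd

groups = pd.Series(["Physical", "Mental", "Control", "Physical", "Mental"])
# drop_first=True drops the alphabetically first category (here Control),
# leaving exactly k - 1 = 2 dummy columns with control as the reference group.
dummies = pd.get_dummies(groups, drop_first=True, dtype=int)
print(dummies)   # 0/1 columns: Mental, Physical
```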
8

Dummy Variables for k Groups
Again, here are the scores:

Subject   Group      T   X1   X2   (X1, X2) pair
Jim       Physical   1    1    0   (1, 0)
John      Mental     2    0    1   (0, 1)
Joe       Control    3    0    0   (0, 0)

We do not need a third variable (X3) that represents "Does the subject not practice?" or, equivalently, "Is the subject in the control group?" (1 = Yes). The pair of values (X1, X2) is different for each group of subjects, so we can tell the 3 groups apart using 2 dummy variables.
9
Dummy Variables for k Groups
Suppose we did make that third variable. We'd have

Subject   Group   T   X1   X2   X3   X1 + X2   1 - (X1 + X2) = X3
Jim       Phys    1    1    0    0       1             0
John      Ment    2    0    1    0       1             0
Joe       Ctrl    3    0    0    1       0             1

But notice, X3 is a function of X1 and X2. The formula is X3 = 1 - (X1 + X2). Because the variables X1 and X2 completely determine X3, X3 is redundant. Worse yet, if we try to use X1, X2 and X3 in a regression, MR will not run at all because the three are totally multicollinear! So we can only use any pair of the variables X1, X2 and X3.
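A quick numeric check of this redundancy (a sketch assuming numpy): once X3 = 1 - (X1 + X2) is added alongside the intercept column, the design matrix loses full rank, which is exactly the perfect multicollinearity that stops MR from running.

```python
import numpy as np

X1 = np.array([1, 0, 0, 1, 0, 0])    # physical practice
X2 = np.array([0, 1, 0, 0, 1, 0])    # mental practice
X3 = 1 - (X1 + X2)                   # control: fully determined by X1 and X2

design = np.column_stack([np.ones_like(X1), X1, X2, X3])
print(np.linalg.matrix_rank(design))  # 3, not 4: the columns are dependent
```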
10
(k-1) Dummy Variables in MR
Also note, we interpret b1 as the difference between physical-practice subjects and others, holding X2 constant. Since the only other group with X2 = 0 is the control group, b1 is in fact the difference between the physical-practice mean and the control mean. For any one subject, X2 is fixed, and its value tells us which of the other two groups a non-physical-practice subject is in: X2 = 0 if they are control subjects (and X1 = 0), or X2 = 1 if they are using mental practice (and X1 = 0). For all the physical-practice subjects, X2 = 0. Because of the way we created X1 and X2, we will never have a case where both X1 = 1 and X2 = 1.
11
(k-1) Dummy Variables in MR
Suppose now that we use the variables X1 and X2 in MR. Our estimated regression equation will be

Ŷi = b0 + b1 X1i + b2 X2i

The slope b1 represents the predicted change in Y for a 1-unit increase in X1, holding X2 constant; as noted above, it is the difference between the physical-practice group mean and the control-group mean. The slope b2 represents the predicted change in Y for a 1-unit increase in X2, holding X1 constant, and it is the mean difference between the mental-practice subjects and the control subjects.
12
(k-1) Dummy Variables in MR
Finally, in our estimated regression equation

Ŷi = b0 + b1 X1i + b2 X2i

the value of b0 is the mean (or predicted score) for the control group, and is the predicted value of Y when the values of all the X's are zero. In a case such as this we might actually be interested in testing whether b0 = 0, because b0 represents the population mean of the control group. So, with the intercept and two slopes we can compute all of the group means.
13
(k-1) Dummy Variables in MR
Specifically, we can see that

Ŷ = b0        for the control group
Ŷ = b0 + b1   for the physical-practice group
Ŷ = b0 + b2   for the mental-practice group

None of our cases ever has Ŷ = b0 + b1 + b2, because no one ever uses both physical and mental practice together.
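A short sketch (hypothetical scores, assuming numpy) showing that the intercept and the two dummy slopes reproduce the three group means exactly:

```python
import numpy as np

# Hypothetical group means: physical = 12, mental = 10, control = 7
y = np.array([12.0, 13.0, 11.0,   # physical (X1 = 1, X2 = 0)
              10.0,  9.0, 11.0,   # mental   (X1 = 0, X2 = 1)
               7.0,  8.0,  6.0])  # control  (X1 = 0, X2 = 0)
X1 = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0])
X2 = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])

design = np.column_stack([np.ones(9), X1, X2])
b0, b1, b2 = np.linalg.lstsq(design, y, rcond=None)[0]
print(b0)        # control mean: 7.0
print(b0 + b1)   # physical-practice mean: 12.0
print(b0 + b2)   # mental-practice mean: 10.0
```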
14
Interactions in MR
  • Now that we have seen that we can use dummy
    variables in MR to represent groups, we will
    tackle another idea.
  • Suppose we are examining a regression model with
    a dummy variable and we expect that X relates to
    Y differently in our 2 groups. Say we are
    studying the prediction of locus of control (LOC)
    scores for males and females, and we believe
    locus of control relates differently to parental
    control for boys and girls.
  • We can say that there is an interaction of gender
    and parental control with respect to the locus of
    control outcome.

15
Detecting an Interaction
  • How can we tell if we need to be concerned about
    an interaction like this? One way is to let SPSS
    help us find it by using the scatter plot.
    Another is to examine the correlations or slopes.
  • Pull down the Graph menu and make a scatterplot
    where Y is the variable (f1locus2) and X is
    f1ctrl2 (parental control). Before you click
    OK, move the variate f1sex into the Set
    Markers by box.
  • This forces SPSS to use different symbols for
    boys and girls and enables us to edit
    (double-click) the plot then use Chart Options
    to plot separate regressions for boys and girls.

16
Detecting an Interaction
  • Here we first see the plot without markers; the relationship looks weak. Then we add the markers by sex and plot two lines. The lines are different, but not dramatically so.
  • It is safe to say the slopes of the two lines are not identical.


17
Modeling Interactions in MR
  • One solution when we see an interaction is to run separate regression analyses for the groups involved (e.g., boys and girls). To do this we must split the file and run a model with only the predictor variable in it.
  • However, there are drawbacks to this. First of all, our sample size for the regressions will be much smaller, because each analysis uses only part of the data.
  • Also, we will end up with separate slopes and need to do computations by hand to test whether the slopes differ by gender. But as an exercise we will run these two models anyway and compare them to the other approach to see how they relate (a sketch of the split-group approach follows).
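A sketch of the split-group approach (hypothetical data, assuming numpy); note that testing whether the two fitted slopes differ would still require a hand computation:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
sexdummy = rng.integers(0, 2, n)     # 0 = girl, 1 = boy (hypothetical coding)
f1ctrl2 = rng.normal(0, 1, n)        # parental control, hypothetical scale
y = -0.1 - 0.01 * f1ctrl2 + 0.3 * sexdummy + rng.normal(0, 0.5, n)

boys = sexdummy == 1
slope_b, icept_b = np.polyfit(f1ctrl2[boys], y[boys], 1)
slope_g, icept_g = np.polyfit(f1ctrl2[~boys], y[~boys], 1)
print("boys: ", icept_b, slope_b)    # separate model for boys
print("girls:", icept_g, slope_g)    # separate model for girls
```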

18

Modeling Interactions in MR
  • The separate models for boys (left) and girls are shown in the SPSS output (not reproduced in this transcript).
19

Modeling Interactions in MR
  • The estimated regression models are
  • For boys: Ŷ = .165 - .002 f1ctrl2
  • For girls: Ŷ = -.179 - .013 f1ctrl2
  • If we run the same model for all subjects together, we get
  • Ŷ = -.02 - .006 f1ctrl2

20
Modeling Interactions in MR
  • With the above approach we run separate regressions for the groups involved if we see a potential interaction. However, a more parsimonious solution would be to model the two slopes via an interaction in a single regression model. Also, this approach gives us a test of the interaction (i.e., of the difference between the slopes).
  • To do this we need to compute an interaction variable. Suppose X1 is a gender dummy variable (1 = male) and X2 is f1ctrl2. Then we can compute the product X3 = X1 × X2 = sexdummy × f1ctrl2. This new variable takes on values as follows:
  • X3 = 0 if the subject is female
  • X3 = f1ctrl2 = X2 if the subject is male.

21

Modeling Interactions in MR
  • For the more elegant solution we run a regression that includes X1 (the dummy for gender), X2 (the continuous predictor f1ctrl2), and X3 (the interaction), so our model is
  • Yi = β0 + β1 X1i + β2 X2i + β3 X3i + ei
  • β1 represents the boy-girl mean difference (or here, intercept difference), controlling for X2 and X3,
  • β2 represents the slope of the predictor f1ctrl2, controlling for X1 (gender differences) and X3 (the interaction), and
  • β3 is the interaction, or the sex difference in the f1ctrl2 slope, controlling for X1 and X2.

22

Modeling Interactions in MR
  • Because X1 is a dummy variable, and also because X3 takes on the value 0 for girls, the girls and boys have different models that we can determine even without running SPSS to estimate the model.
  • Since girls have the value 0 for X1 and X3, those variables drop out of the model for girls. The girls' model is
  • Yi = β0 + β1 X1i + β2 X2i + β3 X3i + ei = β0 + β2 X2i + ei
  • For the boys, X1 = 1 and X3 = X2. Thus the boys' model is
  • Yi = β0 + β1(1) + β2 X2i + β3 X2i + ei
  •       = (β0 + β1) + (β2 + β3) X2i + ei
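A minimal sketch of this single-model approach (hypothetical data with the slide's variable names, assuming numpy), showing that the fitted coefficients reproduce the two group-specific lines:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
sexdummy = rng.integers(0, 2, n)      # 0 = girl, 1 = boy
f1ctrl2 = rng.normal(0, 1, n)         # parental control, hypothetical scale
# Simulate slightly different intercepts and slopes for boys and girls.
y = (-0.18 - 0.013 * f1ctrl2
     + sexdummy * (0.34 + 0.011 * f1ctrl2)
     + rng.normal(0, 0.5, n))

X3 = sexdummy * f1ctrl2               # the interaction variable
design = np.column_stack([np.ones(n), sexdummy, f1ctrl2, X3])
b0, b1, b2, b3 = np.linalg.lstsq(design, y, rcond=None)[0]

print("girls:", b0, b2)               # girls' intercept and slope
print("boys: ", b0 + b1, b2 + b3)     # boys' intercept and slope
```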

23
Checking on the Interaction
  • From this output we can see that the girls' estimated regression line is Ŷ = -.179 - .013 f1ctrl2. This is the same result that appears on slide 19.
  • Also, we can compute the boys' model. Their intercept is b0 + b1 = -.179 + .344 = .165 and their slope is (b2 + b3) = -.013 + .011 = -.002. On slide 19 the boys' model was Ŷ = .165 - .002 f1ctrl2.

24
Checking on the Interaction
  • SPSS also then tells us whether the interaction is significant, because the test of H0: β3 = 0 is an interaction test.
  • In this output we see that X3 = sexdummy × f1ctrl2 does not have a significant slope. So even though the lines look different, they are not different enough for us to need to keep the interaction variable X3 in the model.

25
Adjusted R2
Until now we have only used R2 = SSRegression / SSTotal as one index of variance explained. However, we have heard that R2 can increase even if we add predictors like the kitchen sink (all kinds of minimally useful Xs). So, because R2 is sensitive to sample size and the number of Xs used, we will learn how adjusted R2 works. The formula is

R2adj = 1 - (1 - R2)(n - 1)/(n - p - 1)

We can think of (1 - R2) as the proportion of variance not accounted for by our Xs. The number (n-1)/(n-p-1) is the ratio of total df to error df. It is always larger than 1, and gets increasingly larger as we add more predictors.
26
Adjusted R2
Consider a case with 30 subjects and R2 = .50 for one X (p = 1), but R2 = .60 with p = 10 predictors. It looks like adding more Xs helped us, but what if many of the 9 added variables are not too useful? Here are the adjusted R2 values:

R2 = .5, n = 30, p = 1:    R2adj = 1 - (.50)(29/28) = .48
R2 = .6, n = 30, p = 10:   R2adj = 1 - (.40)(29/19) = .39

Once we adjust for throwing in a pile of not-too-useful Xs, we see we have effectively explained less than with one X! So unless our additional 9 Xs really add considerably to the model, we won't get a larger adjusted R2.
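The arithmetic is easy to script; a small sketch in plain Python reproducing the two values above:

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adjusted_r2(0.50, 30, 1), 2))    # 0.48
print(round(adjusted_r2(0.60, 30, 10), 2))   # 0.39
```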
27

Tests of Change in R2 (Do more Xs explain more?)
Often we will want to know whether adding one X or a set of Xs has significantly increased the explanatory power of our model. We can do this approximately by comparing R2 values, or we can do an exact test of whether R2 has increased in the population. However, for the test to be legitimate, the smaller set of Xs must be a subset of those in the larger model. So we can compare these models:

prinlead = f(tchhappy, f1ses) vs. prinlead = f(tchhappy, f1ses, tchinfl)

But we can't compare these:

prinlead = f(tchhappy, f1ses) vs. prinlead = f(tchhappy, tchinfl, tchangst)
28

Notation for Increment to R2 Test
Let us call the model with more Xs (say pL of them) the larger model; its R2 will be R2L. The model based on a subset of those Xs will be the smaller model, with pS predictors and R2S. To test whether the large model has a significantly larger explained variance than the small one, we test

H0: ρ2L - ρ2S = 0, or equivalently H0: βpS+1 = ... = βpL = 0

We use an F test, called the "Increment to R2" or "change in R2" test, to examine these hypotheses.
29

Notation for Increment to R2 Test
Here is the formula for the F test:

F = [(R2L - R2S) / (pL - pS)] / [(1 - R2L) / (n - pL - 1)]

Notice that (pL - pS) is the number of predictors we have added to the smaller model in an effort to explain more variance in Y. As noted above, we are also asking whether any of the added Xs has a nonzero slope. If all the added Xs have slopes equal to 0, they have not added anything to our model.
30

R2 Tests
We first begin by noticing that the overall F test in MR is also a test of H0: ρ2 = 0. We can write

F = [R2 / p] / [(1 - R2) / (n - p - 1)] = [(SSReg/SST) / p] / [(SSResidual/SST) / (n - p - 1)]

However, this is a test based on one R2, so we can say that H0: β1 = β2 = ... = βp = 0 and H0: ρ2 = 0 are equivalent hypotheses. Here, though, we want to test for a difference or increase in R2 values.
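As a quick check on this formula, a tiny helper in plain Python (the example numbers are taken from the model on the next slide, so treat them as illustrative):

```python
def overall_f(r2: float, n: int, p: int) -> float:
    """Overall F = (R2/p) / ((1 - R2)/(n - p - 1))."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

# R2 = .368 with p = 4 predictors and n = 962 (so n - p - 1 = 957)
print(overall_f(0.368, 962, 4))   # about 139
```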
31

Example of Increment to R2 Test
Suppose we want to see whether adding three variables to our model for prinlead in the NELS school data set explains significantly more variance. Our larger model has pL = 4 predictors (tchhappy, tchcomm, tchinfl, and f1ses). Recall that the R2 for the model with 4 Xs is R2L = .368. Our smaller model will be the one with only tchcomm (pS = 1), with R2S = .328. So

F = [(.368 - .328) / 3] / [(1 - .368) / 957] = (.040 / 3) / (.632 / 957) = 20.19

This is an F test with (pL - pS) = 3 and (n - pL - 1) = 957 degrees of freedom. It is significant.
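A small sketch of this test (assuming scipy is available), using the numbers from this slide:

```python
from scipy import stats

def increment_f_test(r2_large, r2_small, p_large, p_small, n):
    """F test for the increase in R2 when predictors are added."""
    df1 = p_large - p_small
    df2 = n - p_large - 1
    f = ((r2_large - r2_small) / df1) / ((1 - r2_large) / df2)
    p_value = stats.f.sf(f, df1, df2)    # upper-tail probability
    return f, p_value

f, p = increment_f_test(0.368, 0.328, 4, 1, 962)   # n = 962, so df2 = 957
print(f, p)   # F is about 20.2 on (3, 957) df; p is far below .05
```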
32
Venn Diagrams for R2 Values
  • If you think back to your primary school math classes, many of you will recall encountering Venn diagrams.
  • Venn diagrams are often used to show intersections and unions of sets and subsets in mathematics, and are typically drawn as overlapping circles or ovals.
  • The hatched section where the two circles overlap represents the intersection of two sets.

33
Venn Diagrams for R2 Values
We use Venn diagrams in MR to show how much of the variance in our outcome Y we are explaining through the use of different predictors (say, X1, X2 and X3). One circle will represent variation in Y (I will give this circle a solid outline), and the others will be the predictors. Let us first consider only one predictor, X1. In MR we will use the hatched section where the two circles overlap to represent the common variance between the two variables X1 and Y. The higher the correlation between X1 and Y, the more the circles will overlap.

[Venn diagram: circles for X1 and Y with a hatched overlap]
34
Venn Diagrams for R2 Values
Since R2(X, Y) represents the proportion of variance in common between the two variables X and Y, if we are willing to consider the total area of the circle for Y to equal 1, then the area where the circles overlap equals R2(X, Y). It is difficult to draw the hatched section to be exactly equal to the proper proportion of the area of Y without special software, so just do a decent job and that will be sufficient. The higher the correlation between X1 and Y, the more the circles should overlap.

[Venn diagram: circles X1 and Y whose overlap is labeled .15; here R2(X1, Y) is about .15]
35
Venn Diagrams for R2 Values
When we have several predictors we need to also consider the degree to which the Xs are intercorrelated when we draw the Venn diagrams. The ideal situation would be one where the circles for the Xs cover a great deal of the area of the circle for Y but do not overlap with each other. Here is such a diagram; I am leaving out the hatching but labeling the overlapping areas instead. Here the total R2 for a regression model of Y on X1, X2 and X3 would be the sum of the areas a + b + c. Also, since none of the circles for X1, X2 and X3 overlap, they are all independent.

[Venn diagram: Y overlapped separately by X1 (area a), X2 (area b) and X3 (area c)]
36
Venn Diagrams for R2 Values
Also, in this diagram, since X1, X2 and X3 are independent, we know that

R2(Y; X1, X2, X3) = a + b + c = R2(Y, X1) + R2(Y, X2) + R2(Y, X3)

However, in most cases the X's are intercorrelated, and the computation of these quantities for the diagrams is a bit more complex. So we will see how to compute the values for other cases.

[Venn diagram repeated: Y overlapped by X1 (a), X2 (b) and X3 (c)]
37
Venn Diagrams for R2 Values
Let us consider a diagram with two predictors plus Y. As before we will use R2 to represent the proportion of variance in common between two variables, but now we need to worry about the shared variance among the Xs. We need values of R2(X1, X2), R2(Y, X1), R2(Y, X2), and the multiple R2(Y; X1, X2). Note that, according to the diagram,

R2(Y; X1, X2) = a + b + c
R2(Y, X1) = a + c
R2(Y, X2) = b + c
R2(X1, X2) = c + d

[Venn diagram: Y overlapping X1 and X2; a = Y-X1 overlap only, b = Y-X2 overlap only, c = overlap of all three, d = X1-X2 overlap outside Y]
38
Venn Diagrams for R2 Values
We will run SPSS to get the values of R2(X1, X2), R2(Y, X1), R2(Y, X2) and the multiple R2(Y; X1, X2). This will require us to run four regression models. This is easier than using correlations and squaring them, and is necessary when we have two or more predictors (unless we save predicted values, which can get very tedious). Suppose we are examining the NELS student data with

Y = student morale (f1stumor)
X1 = locus of control (f1locus2)
X2 = socioeconomic status (f1ses)
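A sketch of how those four R2 values could be computed outside SPSS (hypothetical data, assuming numpy): each simple R2 is a squared correlation, and the multiple R2 is the squared correlation between Y and the fitted values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)                  # locus of control (hypothetical)
x2 = 0.15 * x1 + rng.normal(size=n)      # ses, mildly correlated with x1
y = 0.45 * x1 + 0.12 * x2 + rng.normal(size=n)

def r2_simple(a, b):
    return np.corrcoef(a, b)[0, 1] ** 2

def r2_multiple(y, xs):
    X = np.column_stack([np.ones(len(y))] + list(xs))
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return r2_simple(y, yhat)

print(r2_simple(y, x1), r2_simple(y, x2), r2_simple(x1, x2))
print(r2_multiple(y, [x1, x2]))          # R2(Y; X1, X2)
```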
39
Venn Diagrams for R2 Values
Y = student morale (f1stumor)
X1 = locus of control (f1locus2)
X2 = socioeconomic status (f1ses)

R2(Y; locus, ses) = a + b + c = .223
R2(Y, locus) = R2(Y, X1) = a + c = .206
R2(Y, ses) = R2(Y, X2) = b + c = .039
R2(X1, X2) = c + d = .023

We will now get the components by subtraction.

[Venn diagram: Y overlapping X1 and X2 with areas a, b, c, d]
40
Venn Diagrams for R2 Values
R2(Y; locus, ses) = a + b + c = .223
R2(Y, locus) = a + c = .206
R2(Y, ses) = b + c = .039
R2(X1, X2) = c + d = .023

We use these four quantities and subtract to get the components.

Area a, the amount explained by X1 (locus) only:
a = R2(Y; locus, ses) - R2(Y, ses) = (a + b + c) - (b + c) = .223 - .039 = .184

Area b, the amount explained by X2 (ses) only:
b = R2(Y; locus, ses) - R2(Y, locus) = (a + b + c) - (a + c) = .223 - .206 = .017

[Venn diagram: areas a = .184 and b = .017 filled in]
41
Venn Diagrams for R2 Values
R2(Y; locus, ses) = a + b + c = .223
R2(Y, locus) = a + c = .206
R2(Y, ses) = b + c = .039
R2(X1, X2) = c + d = .023

There are several ways to get components c and d.

Area c, the amount explained by both X1 and X2:
c = R2(Y; locus, ses) - a - b = .223 - .184 - .017 = .022
OR c = R2(Y, ses) - b = .039 - .017 = .022

Area d, the variance shared by X1 and X2 but not common to Y:
d = R2(X1, X2) - c = .023 - .022 = .001

[Venn diagram: a = .184 and b = .017 shown, with c and d to be filled in]
42
Venn Diagrams for R2 Values
Here are all the areas:

Area a, the amount explained by X1 (locus) only: .184
Area b, the amount explained by X2 (ses) only: .017
Area c, the amount explained by both X1 and X2: .022
Area d, the variance shared by X1 and X2 but not common to Y: .001

R2(Y; locus, ses) = a + b + c = .223
R2(Y, locus) = a + c = .206
R2(Y, ses) = b + c = .039
R2(X1, X2) = c + d = .023

[Venn diagram: Y overlapping X1 and X2 with a = .184, b = .017, c = .022, d = .001]
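The subtraction scheme is easy to automate; a sketch in plain Python using the four R2 values from these slides:

```python
# The four quantities obtained from the four regressions (slide values).
r2_full  = 0.223   # R2(Y; locus, ses) = a + b + c
r2_locus = 0.206   # R2(Y, locus)      = a + c
r2_ses   = 0.039   # R2(Y, ses)        = b + c
r2_x1x2  = 0.023   # R2(locus, ses)    = c + d

a = r2_full - r2_ses     # explained by locus only: .184
b = r2_full - r2_locus   # explained by ses only:   .017
c = r2_full - a - b      # explained by both:       .022
d = r2_x1x2 - c          # shared by the Xs but not with Y: .001
print(round(a, 3), round(b, 3), round(c, 3), round(d, 3))
```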
43
Venn Diagrams for R2 Values
Lest you think this is some ridiculous exercise I have created simply to torture you, next we will see how this kind of diagram is used in practice -- specifically, in an article by Leonie J. Rennie and Keith F. Punch appearing in the Journal of Research in Science Teaching (1991, JRST, 28(2), 193-209). They examined the prediction of science achievement from science affect (e.g., liking of science) and prior achievement. Also, they examined science achievement at two time points (March and October) for two schools that they refer to as Red School and Blue School.
44
This slide shows the results for achievement and affect in one of the two schools. The R2 for the overlap shows much common variance: 39% of the variance in science achievement was explained by prior achievement and affect.
45
Slightly less was explained in the second school, but here affect and prior achievement were more independent; the R2 for their overlap is much smaller.
46
Later in the year in Red School, even more
variance was explained.
47
Later in the year, things changed quite a bit in the second school, Blue School!