Qualitative Independent Variables - PowerPoint PPT Presentation

Provided by: WayneS54
Transcript and Presenter's Notes

1
Qualitative Independent Variables
  • Sometimes called Dummy Variables

2
In the simple and multiple regression we have studied so far, the dependent variable, y, and the independent variable(s), x(s), have been quantitative variables. But regression can be used with other variables. We will study the case where
  • The dependent variable, y, is quantitative,
  • One (or more, in general) independent variable is quantitative, and
  • One independent variable is qualitative.
Remember that a qualitative variable is one whose different values are just categories. Some examples include gender and method of payment (cash, check, credit card).
3
An example:
y = the repair time in hours. The company provides maintenance and it would like to understand why the repair time takes as long as it does. With an understanding of repair time, maybe it can schedule employee hours better or improve company performance in some other way.
x1 = the number of months since the last repair service was performed. The idea is that the longer it has been since the last repair, the more that will need to be done. This is a quantitative variable.
x2 = the type of repair service needed. In this example there are only two types of repairs: electrical and mechanical.
So, the company has clients that need repairs and the company is exploring what accounts for the time it takes to make a repair.
4
On the next slide I have a graph where the two quantitative variables are on the axes. The two ovals represent the clouds of data points. Here the points suggest a positive relationship between months since last repair and repair time. Of course, we will have to test whether this is really the case, but the graph suggests that it is.
I have two ovals because it is thought that maybe each type of repair has a different impact on repair time. The different ovals represent what is happening for each type of repair, and here I am suggesting that there is a difference in repair time for each level of repair type. We will also do a test to see if the different types of repair lead to different repair times.
5
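[Graph: repair time (vertical axis) against months since last repair (horizontal axis); two ovals show the cloud of data points for each repair type]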
6
The model: Here the regression model is y = B0 + B1x1 + B2x2. When we estimate the model we use data on y, x1 and x2. Here we make the data for x2 special. We will say that x2 = 0 if the data point is for a mechanical repair and x2 = 1 if the data point is for an electrical repair.
Now, when we look at the model for the two types of repair we get the following:
When x2 = 0, y = B0 + B1x1 + B2(0) = B0 + B1x1, and
when x2 = 1, y = B0 + B1x1 + B2(1) = (B0 + B2) + B1x1.
The impact of creating x2 as a 0, 1 variable is that when the value is 0 we have one line and when the value is 1 we have another line with a different intercept. The intercept is B0 with the mechanical repair and the intercept is B0 + B2 with the electrical repair.
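The effect of the 0, 1 coding can be checked with a short Python sketch. The coefficient values here are placeholders chosen only for illustration (they are not estimates from the slides), and the function name repair_time is hypothetical.

```python
# Illustrative coefficients only; these values are assumptions, not estimates.
B0, B1, B2 = 1.0, 0.5, 2.0

def repair_time(x1, x2):
    """Model y = B0 + B1*x1 + B2*x2, with x2 = 0 (mechanical) or 1 (electrical)."""
    return B0 + B1 * x1 + B2 * x2

# Intercepts: evaluate each line at x1 = 0.
mechanical_intercept = repair_time(0, 0)   # equals B0
electrical_intercept = repair_time(0, 1)   # equals B0 + B2

# Slopes: change in y for one more month, within each repair type.
mechanical_slope = repair_time(1, 0) - repair_time(0, 0)   # equals B1
electrical_slope = repair_time(1, 1) - repair_time(0, 1)   # equals B1

print("intercepts:", mechanical_intercept, electrical_intercept)
print("slopes:", mechanical_slope, electrical_slope)
```

The two lines differ only in their intercepts; the dummy variable shifts the line up or down by B2 and leaves the slope on x1 alone.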
7
(No Transcript)
8
Getting and interpreting the results: The previous slide has the Excel printout for this regression model. The interpretation starts with the F test. The null is that both B1 and B2 are equal to zero. Here the F stat is 21.357 with a p-value (Significance F) of .001. We would reject the null with alpha as small as .001 (certainly we reject at alpha = .05) and go with the alternative that at least one of the betas is not equal to zero. In other words, as a package the xs exhibit a relationship with the y variable.
The next step is to do the t tests on each slope value, B1 and B2, separately (even here we tend to ignore the test on B0 because we typically do not have much data with all the xs = 0). Here the p-values on both are less than .05, so we reject the null and conclude each variable has an impact on y.
9
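[Graph: repair time (vertical axis) against months since last repair (horizontal axis), with two parallel fitted lines]
Electrical: y = (.9305 + 1.2627) + .3876x1, with intercept .9305 + 1.2627
Mechanical: y = .9305 + .3876x1, with intercept .9305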
10
On the previous slide I reproduced the graph I had before, and I added the equations for repair time under each value of x2. When x2 = 0 we have the line for mechanical types of repair. When x2 = 1 we have the line for electrical types of repair. Ultimately the difference in the two lines here is in the intercept. But the slope of each line is the same. This means that months since the last repair has the same impact on repair time under either type of repair. Since b2 = 1.2627 (really, since we rejected the null that B2 = 0), the electrical line has a higher intercept.
We can use each equation to predict repair time given the value of months since last repair and given the type of repair. Of course, if the type is mechanical we use the mechanical line, and we use the electrical line for the electrical type.
The next thing we would do is evaluate R square. Here the value is .8592, and this indicates that just over 85% of the variation in y is explained by the xs.
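Prediction with the two lines can be sketched in Python using the estimates reported on the slides (b0 = .9305, b1 = .3876, b2 = 1.2627). The function name predict_repair_time and the input of 4 months are hypothetical, chosen only for illustration.

```python
# Estimates reported on the slides.
b0, b1, b2 = 0.9305, 0.3876, 1.2627

def predict_repair_time(months, repair_type):
    """Predicted repair time in hours for 'mechanical' or 'electrical' repairs."""
    # Dummy coding from the slides: mechanical -> 0, electrical -> 1.
    x2 = {"mechanical": 0, "electrical": 1}[repair_type]
    return b0 + b1 * months + b2 * x2

# Hypothetical input: 4 months since the last repair.
mech = predict_repair_time(4, "mechanical")
elec = predict_repair_time(4, "electrical")
print(round(mech, 4), round(elec, 4))
```

At 4 months the predictions are about 2.48 hours for a mechanical repair and 3.74 hours for an electrical repair. The gap between them is b2 = 1.2627 hours at every value of x1, because the lines are parallel.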
11
The qualitative variable: In our example we had a qualitative variable with two categories. Note we added 1 x variable for this 1 qualitative variable. The reason is that the 1 variable had 2 categories. Now if the 1 qualitative variable has 3 categories we would have to have 2 x variables. Say we had mechanical, electrical and industrial repair types. We would need x2 and x3 variables, in addition to months since last repair, x1. With 3 categories we would have 3 lines.
When x2 = 0 and x3 = 0 the intercept would be B0 for the mechanical line.
When x2 = 1 and x3 = 0 the intercept would be B0 + B2 for the electrical line (assuming the tests had us reject the null).
When x2 = 0 and x3 = 1 the intercept would be B0 + B3 for the industrial line.
12
In general, if the 1 qualitative variable has k categories, we add k-1 xs. When all the xs are zero we have intercept B0 and the line represents the equation for 1 of the categories; the other xs then account for the change from B0 that each of the other k-1 categories has.
Summary: 1 qualitative variable would have k lines associated with it (assuming tests reject Ho), and we add k-1 xs of the 0,1 type to account for all the k categories. 1 category is made the base category and its line will have intercept B0; the other categories will have intercept B0 + Bt, where the t would be different for each of the other categories on the variable.
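The k-1 rule can be illustrated with a small Python sketch. The helper name dummy_code and the sample list of repairs are hypothetical; the three repair types and the choice of mechanical as the base category follow the example in the slides.

```python
def dummy_code(values, base):
    """Code a qualitative variable with k categories as k-1 columns of 0/1.

    The base category gets 0 in every column; each other category gets
    a 1 in its own column and 0 elsewhere.
    """
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError(f"base category {base!r} not found in values")
    others = [c for c in categories if c != base]
    rows = [[1 if v == c else 0 for c in others] for v in values]
    return others, rows

# Hypothetical sample of repair types.
repairs = ["mechanical", "electrical", "industrial", "electrical"]
columns, rows = dummy_code(repairs, base="mechanical")

print(columns)
for repair, row in zip(repairs, rows):
    print(repair, row)
```

With 3 categories the helper returns 2 columns, one for electrical (x2) and one for industrial (x3), and every mechanical row is all zeros, so its line keeps intercept B0.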