Title: Regression Analysis
1. Regression Analysis
Modeling Relationships
2. Regression Analysis
- Regression analysis is the study of the relationship between a set of independent variables and a dependent variable.
- A linear equation represents the true, or population, relationship between them.
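In common textbook notation (an assumption here, since the slide's own symbols are not reproduced in this text), a linear population model with k independent variables can be sketched as:

```latex
y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon
```

Here the \beta_j stand for the unknown population coefficients and \varepsilon for a random error term. Regression estimates the \beta_j from sample data.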
3. Variables
- Dependent Variable: Also called the predicted variable. Its value depends on, or can be predicted by, the independent variables.
- Independent Variables: Also called the predictor variables. These can be measured directly and are used to predict the dependent variable (or simply to understand it better).
4. Modeling Process
Define Goal: To study the impact of various factors on individual health.
Choose y: Lung capacity, measured in cc.
List possible Xs: Minutes of exercise per day, number of days per week of exercise, ethnicity, gender, age, height, altitude at which the person lived.
Collect Data: Primary and secondary sources.
Preliminary Analyses: Univariate, bivariate.
Build Regression Model: How is y related to all the Xs?
Evaluate Model: How good is the model at predicting y?
Implement/Monitor: Create a DSS, monitor, update.
5. The Data
A portion of the data is shown below. See the spreadsheet for all data.

Y                   X1      X2      X3      X4        X5
Lung Capacity (cc)  Gender  Height  Smoker  Exercise  Age
5673                1       69.5    0       25        47
5632                1       70.1    0       24        67
5712                1       68.2    0       26        36
5723                1       70.9    0       26        68
5484                1       71.9    1       20        58
5308                1       69.2    1       15        19
5133                1       71.9    1       0         40
6. Preliminary Analyses
The table below shows some descriptive statistics for each variable. What basic statements about our data can we make from this?

       Lung Capacity (cc)  Gender  Height  Smoker  Exercise  Age
Mean   5325.60             0.50    68.23   0.39    21.35     46.42
Stdev  410.48              0.50    3.45    0.49    8.91      13.98
Min    4233.71             0.00    58.93   0.00    0.00      19.00
Max    6261.00             1.00    76.61   1.00    40.29     82.14
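As a rough illustration of how such a summary can be produced, the sketch below computes the same four statistics using only Python's standard library. It uses just the seven sample rows shown on the data slide, so its numbers will not match the table, which covers all 100 observations. The `describe` helper and the variable names are illustrative, and the choice of the sample (n - 1) standard deviation is an assumption, since the slides do not say which form was used.

```python
import statistics

# The seven sample rows shown on the data slide (not the full 100-row data set).
# Column order: lung capacity (cc), gender, height, smoker, exercise, age.
rows = [
    (5673, 1, 69.5, 0, 25, 47),
    (5632, 1, 70.1, 0, 24, 67),
    (5712, 1, 68.2, 0, 26, 36),
    (5723, 1, 70.9, 0, 26, 68),
    (5484, 1, 71.9, 1, 20, 58),
    (5308, 1, 69.2, 1, 15, 19),
    (5133, 1, 71.9, 1, 0, 40),
]
names = ["Lung Capacity (cc)", "Gender", "Height", "Smoker", "Exercise", "Age"]


def describe(values):
    """Return the mean, sample standard deviation, min and max of a list."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # assumption: sample (n - 1) form
        "min": min(values),
        "max": max(values),
    }


# One summary dictionary per column, keyed by the column name.
summary = {name: describe([row[i] for row in rows]) for i, name in enumerate(names)}

for name, stats in summary.items():
    print(
        f"{name:20s} mean={stats['mean']:.2f} stdev={stats['stdev']:.2f} "
        f"min={stats['min']} max={stats['max']}"
    )
```

In this sample every Gender value is 1, so that column has zero spread. This is a reminder that a small excerpt can look quite different from the full data set.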
7. Capacity by Gender, Smoking

                                           Gender
Smoker      Data                           Female   Male     Grand Total
Non-Smoker  Average of Lung Capacity (cc)  5427.67  5662.22  5546.87
            StdDev of Lung Capacity (cc)   256.41   284.71   293.75
            Count of Smoker                30       31       61
Smoker      Average of Lung Capacity (cc)  4837.45  5129.05  4979.51
            StdDev of Lung Capacity (cc)   273.74   297.51   318.12
            Count of Smoker                20       19       39
Total       Average of Lung Capacity (cc)  5191.58  5459.61  5325.60
            StdDev of Lung Capacity (cc)   391.51   387.93   410.48
            Count of Smoker                50       50       100

Does there appear to be a relationship between smoking, gender, and lung capacity?
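One way to approach the question is to compare the group averages directly. The sketch below copies the four cell averages and counts from the pivot table, rebuilds the row and column totals as count-weighted averages, and prints the gap between groups. The `pooled_mean` helper and the variable names are illustrative. This is only a descriptive comparison and does not test significance.

```python
# Group averages (cc) and counts copied from the pivot table above.
groups = {
    ("Non-Smoker", "Female"): {"mean": 5427.67, "n": 30},
    ("Non-Smoker", "Male"): {"mean": 5662.22, "n": 31},
    ("Smoker", "Female"): {"mean": 4837.45, "n": 20},
    ("Smoker", "Male"): {"mean": 5129.05, "n": 19},
}


def pooled_mean(keys):
    """Count-weighted average of the group means for the given cells."""
    total = sum(groups[k]["mean"] * groups[k]["n"] for k in keys)
    count = sum(groups[k]["n"] for k in keys)
    return total / count


non_smokers = pooled_mean([k for k in groups if k[0] == "Non-Smoker"])
smokers = pooled_mean([k for k in groups if k[0] == "Smoker"])
females = pooled_mean([k for k in groups if k[1] == "Female"])
males = pooled_mean([k for k in groups if k[1] == "Male"])

# Differences in average lung capacity between the groups.
smoking_gap = non_smokers - smokers
gender_gap = males - females

print(f"Non-smokers vs smokers: {smoking_gap:.1f} cc")
print(f"Males vs females:       {gender_gap:.1f} cc")
```

The rebuilt totals agree with the table's Grand Total and Total entries to within rounding. The smoking gap (about 567 cc) is roughly twice the gender gap (about 268 cc).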
8. Distributions
9. Bivariate Analysis: Matrix Plot
10. Capacity Distribution by Gender, Smoking
Men have a larger lung capacity than women, on average.
Non-smokers have a larger lung capacity than smokers, on average. What about the variance?
11. Simple Regression
- How well can exercise time alone predict lung capacity?
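A minimal sketch of a one-predictor least-squares fit follows, written in plain Python. It is fitted only to the seven sample rows from the data slide, so its slope, intercept, and R squared are illustrative and will differ from a fit on all 100 observations. The function name `fit_simple_regression` is hypothetical.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x.

    Returns the intercept b0, the slope b1, and R squared.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_resid / ss_total
    return b0, b1, r_squared


# Exercise and lung capacity (cc) from the seven sample rows only.
exercise = [25, 24, 26, 26, 20, 15, 0]
lung_capacity = [5673, 5632, 5712, 5723, 5484, 5308, 5133]

b0, b1, r2 = fit_simple_regression(exercise, lung_capacity)
print(f"Lung Capacity = {b0:.1f} + {b1:.2f} * Exercise (R squared = {r2:.3f})")
```

R squared is the share of the variation in lung capacity that exercise alone accounts for, which is one way to answer "how well".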
12. Multiple Regression
- How do all the Xs together help predict y?

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.8798341
R Square           0.7741081
Adjusted R Square  0.7620926
Standard Error     200.21
Observations       100

           Coefficients  Standard Error  t Stat        P-value
Intercept  1662.3965     475.1456634     3.498709192   0.000716253
Gender     202.3282      41.86861042     4.832456809   5.23607E-06
Height     50.3468       7.08207335      7.109058989   2.24959E-10
Smoker     -278.9711     52.71395448     -5.292169492  7.88193E-07
Exercise   11.2949       2.991170972     3.776112614   0.000279023
Age        -0.1174       1.462303258     -0.080303367  0.936166702
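Two columns of this output can be reproduced by hand, which helps when reading it. The sketch below recomputes each t statistic as the coefficient divided by its standard error, and the adjusted R squared from R squared, the number of observations, and the number of predictors. The function names are illustrative, and the "|t| below about 2" screening rule is a common rule of thumb assumed here, not something stated in the slides.

```python
# Coefficients and standard errors copied from the regression output above.
full_model = {
    "Intercept": (1662.3965, 475.1456634),
    "Gender": (202.3282, 41.86861042),
    "Height": (50.3468, 7.08207335),
    "Smoker": (-278.9711, 52.71395448),
    "Exercise": (11.2949, 2.991170972),
    "Age": (-0.1174, 1.462303258),
}


def t_statistic(coefficient, standard_error):
    """t statistic for testing whether a coefficient differs from zero."""
    return coefficient / standard_error


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """R squared penalized for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


t_stats = {name: t_statistic(c, se) for name, (c, se) in full_model.items()}
adj_r2 = adjusted_r_squared(0.7741081, n_obs=100, n_predictors=5)

# Assumed rule of thumb: |t| below about 2 suggests a weak predictor.
weak = [name for name, t in t_stats.items() if abs(t) < 2]

for name, t in t_stats.items():
    print(f"{name:10s} t = {t:8.3f}")
print(f"Adjusted R squared = {adj_r2:.7f}")
print("Weak predictors:", weak)
```

Only Age is flagged, matching its P-value of about 0.94. That is presumably why Age does not appear in the final model.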
13. Final Model
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.879825
R Square           0.774093
Adjusted R Square  0.764581
Standard Error     199.164
Observations       100

Predicted Lung Capacity (cc) = 1656.937 + 202.104 × Gender + 50.359 × Height - 279.025 × Smoker + 11.259 × Exercise

           Coefficients  Standard Error  t Stat    P-value
Intercept  1656.937      467.7903        3.54205   0.000617
Gender     202.104       41.55695        4.86332   4.57E-06
Height     50.359        7.043082        7.150271  1.78E-10
Smoker     -279.025      52.43341        -5.3215   6.85E-07
Exercise   11.259        2.943494        3.825342  0.000234
14. Prediction Exercise
- Predict the lung capacity for a non-smoking female who does not exercise and is 66 inches tall, based on the model above.
- What would be the predicted value if she smoked?
- What would it be for a male in both of the above cases?
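The four cases can be worked through with a short sketch that plugs values into the final model's coefficients. It assumes Gender is coded 1 for male and 0 for female, and Smoker is coded 1 for smoker and 0 for non-smoker. That coding is inferred from the positive Gender coefficient and the smoker counts, and is not stated explicitly in the slides. The function name `predict_lung_capacity` is hypothetical.

```python
# Coefficients copied from the final model's coefficient table.
COEFFICIENTS = {
    "Intercept": 1656.937,
    "Gender": 202.104,
    "Height": 50.359,
    "Smoker": -279.025,
    "Exercise": 11.259,
}


def predict_lung_capacity(gender, height, smoker, exercise):
    """Predicted lung capacity in cc.

    Assumed coding: gender 1 = male, 0 = female; smoker 1 = yes, 0 = no.
    Height is in inches; exercise uses the same units as the data column.
    """
    return (
        COEFFICIENTS["Intercept"]
        + COEFFICIENTS["Gender"] * gender
        + COEFFICIENTS["Height"] * height
        + COEFFICIENTS["Smoker"] * smoker
        + COEFFICIENTS["Exercise"] * exercise
    )


# The four cases from the exercise: 66 inches tall, no exercise.
cases = {
    "Female, non-smoker": predict_lung_capacity(0, 66, 0, 0),
    "Female, smoker": predict_lung_capacity(0, 66, 1, 0),
    "Male, non-smoker": predict_lung_capacity(1, 66, 0, 0),
    "Male, smoker": predict_lung_capacity(1, 66, 1, 0),
}

for label, value in cases.items():
    print(f"{label:20s} {value:.3f} cc")
```

Under these assumptions the four predictions come out to roughly 4981, 4702, 5183, and 4904 cc. Because the model is additive, smoking lowers the prediction by the same 279.025 cc for both genders.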