Title: Stata 3, Regression
1. Stata 3, Regression
- Hein Stigum
- Presentation, data and programs at
- http://folk.uio.no/heins/
2. Agenda
- Linear regression
- GLM
- Logistic regression
- Binary regression
- (Conditional logistic)
3. Linear regression
- Birth weight by gestational age
4. Regression idea
5. Model and assumptions
- Model
- Assumptions
  - Independent errors
  - Linear effects
  - Constant error variance
6. Association measure, RD
Model
Start with
Hence
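The equations on this slide were lost in extraction. A sketch of the standard derivation, assuming a binary exposure $x_1$ (coded 0/1), one covariate $x_2$ held fixed, coefficients $b_0, b_1, b_2$ and $\mu = E(y)$ (the symbols are assumptions chosen to match the document's "b" and "mu" notation):

```latex
\begin{aligned}
\text{Model:}\quad & \mu = b_0 + b_1 x_1 + b_2 x_2 \\
\text{Start with:}\quad & \mu_1 = b_0 + b_1 \cdot 1 + b_2 x_2, \qquad \mu_0 = b_0 + b_1 \cdot 0 + b_2 x_2 \\
\text{Hence:}\quad & \mathrm{RD} = \mu_1 - \mu_0 = b_1
\end{aligned}
```

For a continuous $x_1$ the same algebra makes $b_1$ the difference in mean outcome per one-unit increase in $x_1$.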
7. Purpose of regression
- Estimation
  - Estimate the association between outcome and exposure, adjusted for other covariates
- Prediction
  - Use an estimated model to predict the outcome from the covariates in a new dataset
8. Adjusting for confounders
- Do not adjust
  - if the cofactor is a collider
  - if the cofactor is on the causal path
- May or may not adjust
  - if the cofactor has missing values
  - if the cofactor is measured with error
9. Workflow
- Scatterplots
- Bivariate analysis
- Regression
- Model fitting
  - Cofactors in/out
  - Interactions
- Test of assumptions
  - Independent errors
  - Linear effects
  - Constant error variance
- Influence (robustness)
10. Scatterplot
11. Syntax
- Estimation
  - regress y x1 x2   // linear regression
  - xi: regress y x1 i.c1   // categorical c1
- Post estimation
  - predict yf, xb   // predict
- Manage models
  - estimates store m1   // save model
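A minimal runnable sketch of the syntax above. The data are simulated, and the variable names weight, gest, sex and educ, their codings and the effect sizes are hypothetical stand-ins for the course data. It also shows that in Stata 11 and later, factor-variable notation (i.educ) fits the same model as the xi: prefix.

```stata
// Simulated stand-in for the birth-weight data (hypothetical values)
clear
set seed 12345
set obs 500
generate gest   = 250 + 30*runiform()          // gestational age, days
generate sex    = 1 + (runiform() < 0.5)       // 1 = boy, 2 = girl
generate educ   = 1 + floor(3*runiform())      // 1 = low, 2 = medium, 3 = high
generate weight = -2000 + 20*gest - 100*(sex==2) + 50*educ + rnormal(0, 400)

regress weight gest sex                 // linear regression
estimates store m1                      // save model
xi: regress weight gest sex i.educ      // categorical educ, xi: syntax
regress weight gest sex i.educ          // same model, factor-variable syntax
estimates store m2
predict yf, xb                          // linear prediction from the last model
estimates table m1 m2, b(%9.2f) se      // compare the stored models
```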
12. Model 1: outcome and exposure
13. Model 2: add confounders
Estimate association: m1 ≈ m2
Prediction: m2 is best
14. Dummies
Assume educ is coded 1, 2, 3 for low, medium and high education.
Choose low education as the reference.
Make dummies for the two other categories:
- generate medium = (educ==2) if educ < .
- generate high = (educ==3) if educ < .
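A sketch, on simulated data with hypothetical values, checking that the hand-made dummies give the same estimates as Stata's factor variables with level 1 as the base. The `if educ < .` condition matters: without it, a missing educ would be coded 0 on both dummies and silently counted as low education.

```stata
// Hypothetical data: educ coded 1 = low, 2 = medium, 3 = high, some missing
clear
set seed 2024
set obs 300
generate educ = 1 + floor(3*runiform())
replace educ = . in 1/10                        // a few missing values
generate y = 10 + 2*(educ==2) + 5*(educ==3) + rnormal()

// Hand-made dummies; low education is the reference
generate medium = (educ==2) if educ < .
generate high   = (educ==3) if educ < .

regress y medium high
scalar b_medium = _b[medium]
regress y i.educ                                // factor variables, level 1 is the base
display b_medium - _b[2.educ]                   // zero up to rounding: the codings agree
```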
15. Interaction
Model
Start with
Hence
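The equations here were also lost. A sketch of the usual derivation, assuming binary $x_1$ (exposure) and $x_2$ (effect modifier), both coded 0/1, with an interaction coefficient $b_3$ (hypothetical notation consistent with the RD derivation for slide 6):

```latex
\begin{aligned}
\text{Model:}\quad & \mu = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 \\
\text{Start with:}\quad & x_2 = 0:\ \mu_1 - \mu_0 = b_1, \qquad x_2 = 1:\ \mu_1 - \mu_0 = b_1 + b_3 \\
\text{Hence:}\quad & b_3 = \text{difference between the two RDs}
\end{aligned}
```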
16. Model 3 with interaction
17. Test of assumptions
- Predict y and residuals
  - predict yf, xb
  - predict res, resid
- Plot residuals vs. predicted y
  - independent?
  - linear?
  - constant variance?
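A runnable sketch of the residual checks above on simulated data (variable names and values are hypothetical). `rvfplot` and `estat hettest` are built-in alternatives after `regress`: the first draws the same residual-versus-fitted plot, the second gives a formal test of constant variance.

```stata
// Simulated data (hypothetical), then the residual checks
clear
set seed 7
set obs 400
generate gest   = 250 + 30*runiform()
generate sex    = 1 + (runiform() < 0.5)
generate weight = -2000 + 20*gest - 100*(sex==2) + rnormal(0, 400)

regress weight gest sex
predict yf, xb                   // predicted outcome
predict res, residuals           // residuals
scatter res yf, yline(0)         // look for patterns, curvature or a funnel shape
rvfplot, yline(0)                // built-in residual-versus-fitted plot
estat hettest                    // test of constant error variance
```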
18. Violations of assumptions
- Dependent residuals
  - Mixed models: xtmixed
- Non-linear effects
  - generate gest2 = gest^2
  - regress weight gest gest2 sex
- Non-constant variance
  - regress weight gest sex, robust
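A sketch of both fixes on simulated data in which, by construction (a hypothetical setup), the effect of gestational age is curved and the error variance grows with gestational age. The factor-variable form `c.gest##c.gest` fits the same quadratic model without creating gest2 by hand, and `vce(robust)` is the current spelling of the `robust` option.

```stata
clear
set seed 99
set obs 400
generate gest   = 250 + 30*runiform()
generate sex    = 1 + (runiform() < 0.5)
// Curved effect of gest and variance growing with gest (hypothetical)
generate weight = 3500 - 0.5*(gest - 280)^2 - 100*(sex==2) + rnormal(0, 10*(gest - 240))

generate gest2 = gest^2                     // squared term
regress weight gest gest2 sex               // non-linear effect of gest
regress weight c.gest##c.gest sex           // same model, factor-variable notation
regress weight gest gest2 sex, vce(robust)  // robust (sandwich) standard errors
```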
19. Measures of influence
Remove obs 1 and see the change; remove obs 2 and see the change; and so on.
- Measure the change in
  - Outcome (y): deviance
  - Coefficients (beta): delta-beta, Cook's distance
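A sketch of the coefficient-based measures after `regress`, on simulated data with one deliberately planted outlier (all values hypothetical). Stata's DFBETA is the change in a coefficient when observation i is left out, scaled by its standard error. The 4/n cut-off in the last line is only one common rule of thumb.

```stata
clear
set seed 3
set obs 200
generate id     = _n
generate gest   = 250 + 30*runiform()
generate sex    = 1 + (runiform() < 0.5)
generate weight = -2000 + 20*gest - 100*(sex==2) + rnormal(0, 400)
replace weight = weight + 3000 in 1             // planted hypothetical outlier

regress weight gest sex
predict cook, cooksd                    // Cook's distance: overall change in the betas
predict dfb_gest, dfbeta(gest)          // scaled change in b_gest when obs i is left out
list id cook dfb_gest if cook > 4/e(N)  // flag observations with high influence
```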
20. Points with high influence
lvr2plot, mlabel(id)
21. Added-variable plot, gestational age
avplot gest, mlabel(id)
22. Removing outlier
23. Influence
24. Final model
Give meaning to the constant term:
- summarize gest   // find smallest value
- generate gest2 = gest - 204   // smallest gest = 204
- generate sex2 = sex - 1   // boys = 0, girls = 1
- regress weight gest2 sex2   // final model
- estimates store m4
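A sketch on simulated data (hypothetical values) showing what the recoding buys. Shifting gest and sex leaves the slopes unchanged and only moves the constant, which becomes the predicted weight of a boy at the smallest gestational age. Here `r(min)` is used instead of the literal 204 because the simulated minimum differs.

```stata
clear
set seed 11
set obs 300
generate gest   = 204 + 90*runiform()         // hypothetical range
generate sex    = 1 + (runiform() < 0.5)      // 1 = boy, 2 = girl
generate weight = -2000 + 20*gest - 100*(sex==2) + rnormal(0, 400)

regress weight gest sex
scalar b_raw = _b[gest]
summarize gest                                 // find smallest value
generate gest2 = gest - r(min)                 // 0 = smallest gestational age
generate sex2  = sex - 1                       // boys = 0, girls = 1
regress weight gest2 sex2
display "change in slope: " _b[gest2] - b_raw
display "constant (boy, smallest gest): " _b[_cons]
```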
25. Logistic regression
26. Model and assumptions
- Model
- Assumptions
  - Independent residuals
  - Linear effects
27. Association measure, odds ratio
Model
Start with
Hence
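A sketch of the lost derivation, assuming a binary exposure $x_1$ (0/1), a covariate $x_2$ held fixed, and $\mu$ the probability of the outcome ("mu in our notation" from the influence slide); the coefficient symbols are hypothetical:

```latex
\begin{aligned}
\text{Model:}\quad & \ln\frac{\mu}{1-\mu} = b_0 + b_1 x_1 + b_2 x_2 \\
\text{Start with:}\quad & \text{odds}_1 = \frac{\mu_1}{1-\mu_1} = e^{b_0 + b_1 + b_2 x_2}, \qquad \text{odds}_0 = \frac{\mu_0}{1-\mu_0} = e^{b_0 + b_2 x_2} \\
\text{Hence:}\quad & \mathrm{OR} = \frac{\text{odds}_1}{\text{odds}_0} = e^{b_1}
\end{aligned}
```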
28. Syntax
- Estimation
  - logistic y x1 x2   // logistic regression
  - xi: logistic y x1 i.c1   // categorical c1
- Post estimation
  - predict yf, pr   // predict probability
- Manage models
  - estimates store m1   // save model
  - est table m1, eform   // show OR
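A minimal runnable sketch of this syntax on simulated data standing in for the bullying data; the country coding, ages and effect sizes are hypothetical. Factor variables (i.country) are used instead of the xi: prefix; `logit` fits the same model but reports coefficients, and `estimates table ..., eform` turns stored coefficients back into ORs.

```stata
// Simulated stand-in for the bullying data (hypothetical values)
clear
set seed 2013
set obs 2000
generate country = 1 + floor(5*runiform())     // 5 countries
generate sex     = 1 + (runiform() < 0.5)
generate age     = 11 + floor(5*runiform())
generate xb      = -2 + 0.3*(country==3) + 0.2*(sex==1) - 0.1*(age - 13)
generate bullied = runiform() < invlogit(xb)

logistic bullied i.country sex age      // odds ratios
estimates store m1
logit bullied i.country sex age         // same model, coefficients
predict yf, pr                          // predicted probability
estimates table m1, eform               // show ORs from the stored model
```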
29. Workflow
- Bivariate analysis
- Regression
- Model fitting
  - Cofactors in/out
  - Interactions
- Test of assumptions
  - Independent errors
  - Linear effects
- Influence (robustness)
30. Bivariate
Generate dummies:
- generate Island = (country==2) if country < .
- generate Norway = (country==3) if country < .
- generate Finland = (country==4) if country < .
- generate Denmark = (country==5) if country < .
31. Model 1: outcome and exposure
Alternative commands:
- xi: logistic bullied i.country   // use xi: and i.var for categorical variables
- xi: logistic bullied i.country, coef   // coefficients instead of ORs
- xi: logistic bullied i.country if sex!=. & age!=.   // only if sex and age are not missing
32. Model 2: add confounders
33. Interaction
Model
Start with
Hence
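A sketch of the lost derivation on the logit scale, assuming binary $x_1$ and $x_2$ (0/1) and the same hypothetical notation as the odds-ratio derivation:

```latex
\begin{aligned}
\text{Model:}\quad & \ln\frac{\mu}{1-\mu} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 \\
\text{Start with:}\quad & x_2 = 0:\ \mathrm{OR} = e^{b_1}, \qquad x_2 = 1:\ \mathrm{OR} = e^{b_1 + b_3} \\
\text{Hence:}\quad & e^{b_3} = \text{ratio of the two ORs}
\end{aligned}
```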
34. Model 3: interaction
35. Test of assumptions
- Linear effects (of age)
  - findit lincheck   // search for and install
  - lincheck xi: logistic bullied age i.country sex
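`lincheck` is a user-written command, and its exact syntax is not shown here. As an alternative that uses only built-in commands, one can compare the linear age model with a model that treats age as categories using a likelihood-ratio test. A sketch on simulated data (hypothetical values):

```stata
clear
set seed 5
set obs 2000
generate country = 1 + floor(5*runiform())
generate sex     = 1 + (runiform() < 0.5)
generate age     = 11 + floor(5*runiform())      // ages 11 to 15
generate bullied = runiform() < invlogit(-2 + 0.2*(sex==1) - 0.1*(age - 13))

logistic bullied age i.country sex        // linear effect of age
estimates store lin
logistic bullied i.age i.country sex      // age as categories
estimates store cat
lrtest lin cat                            // small p suggests the linear form is too simple
```

With five age groups, the categorical model has three more parameters than the linear one, so the test has 3 degrees of freedom.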
36. Points with high influence
- estimates restore m2   // restore best model
- predict p, pr   // probability (mu in our notation)
- predict db, dbeta   // delta-beta (one value, not one per estimate)
- scatter db p   // delta-beta plot
37. Removing 2 observations
Conclusion: robust results
38. Generalized Linear Models
39. Designs and measures
Models     Measures
GLM        RR, RD, OR
Survival   Rate ratio
40. Generalized Linear Models, GLM
Linear regression
Logistic regression
Poisson regression
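The model equations on this slide were lost. All three models share the general GLM form $g(\mu) = b_0 + b_1 x_1 + \dots$, where $g$ is the link function and $\mu$ the expected outcome. A sketch of the three members, with distribution and link as in the table on the next slide:

```latex
\begin{aligned}
\text{Linear (normal, identity link):}\quad & \mu = b_0 + b_1 x_1 + \dots \\
\text{Logistic (binomial, logit link):}\quad & \ln\frac{\mu}{1-\mu} = b_0 + b_1 x_1 + \dots \\
\text{Poisson (Poisson, log link):}\quad & \ln \mu = b_0 + b_1 x_1 + \dots
\end{aligned}
```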
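41. GLM: distribution and link
- Distribution family
  - Given by the data
  - Influences p-values and CIs
- Link function
  - May be chosen
  - Determines the shape (link⁻¹, the inverse link)
  - Determines the scale
  - Determines the association measure

Distribution   Normal     Binomial         Poisson
Link           Identity   Logit            Log
Scale          Additive   Multiplicative   Multiplicative
Measure        RD         OR               RR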
42. Distribution and link examples
Note: not for traditional case-control data
Link identity → linear model → additive scale
43. Being bullied, 3 models
glm bullied Island Norway Finland Denmark sex age, family(binomial) link(logit)
glm bullied Island Norway Finland Denmark sex age, family(binomial) link(log)
glm bullied Island Norway Finland Denmark sex age, family(binomial) link(identity)
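A runnable sketch of the three models on simulated data with a single hypothetical exposure (Norway) and sex; the risks (10% baseline, RR 1.5 for Norway, RR 1.2 for boys) are made up. The `eform` option reports exp(b), which is the OR under the logit link and the RR under the log link. Under the identity link, the coefficients are already RDs. On real data, the log and identity links can fail to converge.

```stata
clear
set seed 21
set obs 3000
generate Norway  = runiform() < 0.5             // hypothetical binary exposure
generate sex     = 1 + (runiform() < 0.5)       // 1 = boy, 2 = girl
generate bullied = runiform() < 0.10*1.5^Norway*1.2^(sex==1)

glm bullied Norway sex, family(binomial) link(logit) eform    // OR
glm bullied Norway sex, family(binomial) link(log) eform      // RR
glm bullied Norway sex, family(binomial) link(identity)       // RD
```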
44. Convergence problems
- If glm does not converge, use
  - poisson y x1 x2, irr robust   // RR
  - regress y x1 x2, robust   // RD
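A sketch of the two fallbacks on the same kind of simulated data (hypothetical values). Poisson regression with robust standard errors on a binary outcome (often called modified Poisson) estimates the RR. A linear regression with robust standard errors (a linear probability model) estimates the RD. The robust variance corrects the standard errors for the misspecified distribution.

```stata
clear
set seed 21
set obs 3000
generate Norway  = runiform() < 0.5
generate sex     = 1 + (runiform() < 0.5)
generate bullied = runiform() < 0.10*1.5^Norway*1.2^(sex==1)

poisson bullied Norway sex, irr vce(robust)    // RR with robust SEs
regress bullied Norway sex, vce(robust)        // RD with robust SEs
```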
45. Association measure, RR
Model
Start with
Hence
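A sketch of the lost derivation for the binomial model with log link, assuming a binary exposure $x_1$ (0/1), a covariate $x_2$ held fixed, and $\mu$ the risk of the outcome (hypothetical notation as in the earlier derivations):

```latex
\begin{aligned}
\text{Model:}\quad & \ln \mu = b_0 + b_1 x_1 + b_2 x_2 \\
\text{Start with:}\quad & \mu_1 = e^{b_0 + b_1 + b_2 x_2}, \qquad \mu_0 = e^{b_0 + b_2 x_2} \\
\text{Hence:}\quad & \mathrm{RR} = \frac{\mu_1}{\mu_0} = e^{b_1}
\end{aligned}
```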
46. Association measure, RD
Model
Start with
Hence
47. The importance of scale
Additive scale (absolute increase)
- Females: 30 - 20 = 10
- Males: 20 - 10 = 10
- Conclusion: same increase for males and females (RD)
Multiplicative scale (relative increase)
- Females: 30/20 = 1.5
- Males: 20/10 = 2.0
- Conclusion: larger increase for males (RR)
48. Conditional logistic regression
- For matched case-control data
49. Truths and misconceptions
- Cohort studies
  - Exposed and unexposed should be as similar as possible, except for exposure
  - Matching removes confounding
- Case-control studies
  - Cases and controls should be as similar as possible, except for disease
  - Matching removes confounding
(Figure: exposed vs. unexposed; diseased/cases vs. healthy/controls)
50. Matching and analysis
- Unmatched (age)
  - Ordinary model
  - May adjust for age
  - May interpret the age effect
- Frequency matched (age)
  - Ordinary model
  - Must adjust for age
  - Cannot interpret the age effect
- One-to-one matched (age)
  - Conditional model
  - No effect measure for age
51. Data preparation
- Save as tab-delimited text in Excel
- Read and fix in Stata
  - insheet using "file.txt", clear
  - mvdecode m*, mv(9)
  - gsort id -cc
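A small self-contained sketch of the two fixing steps, using four made-up rows in which 9 codes a missing answer (the values are hypothetical). In Stata 13 and later, `import delimited` is the newer command for reading the text file that `insheet` reads.

```stata
// Hypothetical matched data where 9 means "missing"
clear
input id cc m2 m4
1 0 9 1
1 1 1 0
2 0 0 9
2 1 1 1
end
mvdecode m*, mv(9)        // 9 -> . in every variable starting with m
gsort id -cc              // within each matched set, the case (cc = 1) comes first
list
```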
52. Syntax
- Estimation
  - clogit y x1 x2, group(id)   // conditional logistic
  - clogit y x1 x2, group(id) or   // OR instead of coef
- Post estimation
  - predict yf, pc1   // predict probability
- Manage models
  - estimates store m1   // save model
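A runnable sketch on simulated 1:1 matched case-control data (200 hypothetical pairs, exposure m2 more common among cases). `pc1` is the probability that an observation is the case, given that its matched set contains exactly one case. Within each pair, these probabilities therefore sum to 1.

```stata
// Simulated 1:1 matched case-control data (hypothetical)
clear
set seed 42
set obs 400                                      // 200 matched pairs
generate id = ceil(_n/2)
bysort id: generate cc = (_n == 1)               // one case per pair
generate m2 = runiform() < cond(cc, 0.5, 0.3)    // exposure more common in cases

clogit cc m2, group(id)          // coefficients
clogit cc m2, group(id) or       // odds ratios
predict yf, pc1                  // P(case | one case in the set)
```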
53. Bivariate analysis
- Loop through all variables
  foreach var of varlist m* {
      quietly clogit cc `var', group(id) or
      est store `var'
  }
- Show results
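One way to show the stored bivariate results side by side is `estimates table` with `eform`. A sketch on simulated matched data, where m2, m4 and m5 are hypothetical exposures (m4 is unrelated to case status by construction):

```stata
clear
set seed 42
set obs 400                                      // 200 matched pairs
generate id = ceil(_n/2)
bysort id: generate cc = (_n == 1)
generate m2 = runiform() < cond(cc, 0.5, 0.3)
generate m4 = runiform() < 0.4
generate m5 = runiform() < cond(cc, 0.4, 0.2)

foreach var of varlist m* {
    quietly clogit cc `var', group(id) or
    estimates store `var'
}
estimates table _all, eform b(%9.2f) p(%5.3f)   // one column of ORs per variable
```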
54. Multivariable analysis
- Stepwise
  - stepwise, pe(0.25): clogit cc m2 m4 m5 m12 m13 m18, group(id) or
- Final model
55. Stata regression commands
56.
- Regression with simple error structure
  - regress: linear regression (also heteroskedastic errors)
  - nl: non-linear least squares
- GLM
  - logistic: logistic regression
  - poisson: Poisson regression
  - binreg: binary outcome, OR, RR or RD effect measures
- Conditional logistic
  - clogit: for matched case-control data
- Multiple outcomes
  - mlogit: multinomial logit (not ordered)
  - ologit: ordered logit
- Regression with complex error structure
  - xtmixed: linear mixed models
  - xtlogit: random-effects logistic
57. Syntax
- Estimation
  - regress y x1 x2   // linear regression
  - logistic y x1 x2   // logistic regression
  - xi: regress y x1 i.x2   // categorical x2
- Manage results
  - estimates store m1   // store results
  - estimates table m1 m2   // table of results
  - estimates stats m1 m2   // statistics of results
- Post estimation
  - predict yf, xb   // linear prediction
  - predict res, resid   // residuals
  - lincom b02b3   // linear combination
- Help
  - help logistic postestimation