Stata 3, Regression - PowerPoint PPT Presentation

Slides: 51
Provided by: hest
Transcript and Presenter's Notes

1
Stata 3, Regression
  • Hein Stigum
  • Presentation, data and programs at
  • http://folk.uio.no/heins/

2
Agenda
  • Linear regression
  • GLM
  • Logistic regression
  • Binary regression
  • (Conditional logistic)

3
Linear regression
  • Birth weight by gestational age

4
Regression idea
5
Model and assumptions
  • Model
  • Assumptions
    • Independent errors
    • Linear effects
    • Constant error variance

6
Association measure, RD
Model
Start with
Hence
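The formulas for this slide did not survive in the transcript. A standard derivation that fits the three headings, assuming a linear model with exposure x_1 and one covariate x_2 (these symbols are assumptions, not taken from the slide), might read:

```latex
\begin{align*}
\text{Model:} \quad & E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & E(y \mid x_1 + 1, x_2) - E(y \mid x_1, x_2) \\
 & = \bigl[\beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2\bigr]
   - \bigl[\beta_0 + \beta_1 x_1 + \beta_2 x_2\bigr] \\
\text{Hence:} \quad & \mathrm{RD} = \beta_1
\end{align*}
```

Under this reading, the coefficient of the exposure is itself the difference measure, adjusted for x_2.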
7
Purpose of regression
  • Estimation
    • Estimate association between outcome and exposure
      adjusted for other covariates
  • Prediction
    • Use an estimated model to predict the outcome
      given covariates in a new dataset

8
Adjusting for confounders
  • Do not adjust
    • Cofactor is a collider
    • Cofactor is in the causal path
  • May or may not adjust
    • Cofactor has missing values
    • Cofactor has measurement error

9
Workflow
  • Scatterplots
  • Bivariate analysis
  • Regression
    • Model fitting
      • Cofactors in/out
      • Interactions
    • Test of assumptions
      • Independent errors
      • Linear effects
      • Constant error variance
    • Influence (robustness)

10
Scatterplot
11
Syntax
  • Estimation
    • regress y x1 x2          linear regression
    • xi: regress y x1 i.c1    categorical c1
  • Post estimation
    • predict yf, xb           predict
  • Manage models
    • estimates store m1       save model

12
Model 1: outcome and exposure
13
Model 2: Add confounders
Estimate association: m1 vs. m2
Prediction: m2 is best
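The comparison of the two stored models can be sketched as below. This is an illustration only: it uses auto.dta, the example dataset shipped with Stata, as a stand-in for the birth-weight data, so the variables price, weight, length and foreign are assumptions and not the slides' variables.

```stata
* Sketch only: auto.dta stands in for the data used in the slides
sysuse auto, clear

* Model 1: outcome and exposure
regress price weight
estimates store m1

* Model 2: add possible confounders
regress price weight length foreign
estimates store m2

* Coefficients, standard errors and fit statistics side by side
estimates table m1 m2, b(%9.3f) se(%9.3f) stats(N r2_a)
```

Placing the two columns next to each other shows how much the exposure coefficient moves once the extra covariates enter the model.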
14
Dummies
Assume educ is coded 1, 2, 3 for low, medium and
high education
Choose low educ as reference
Make dummies for the two other categories:
generate medium = (educ==2) if educ<.
generate high   = (educ==3) if educ<.
15
Interaction
Model
Start with
Hence
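The slide's formulas are missing from the transcript. One common way to write an interaction model, assuming exposure x_1, a binary covariate x_2 and a product term with coefficient \beta_3 (all symbols are assumptions), is:

```latex
\begin{align*}
\text{Model:} \quad & E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
\text{Start with:} \quad & E(y \mid x_1 + 1, x_2) - E(y \mid x_1, x_2) = \beta_1 + \beta_3 x_2 \\
\text{Hence:} \quad & \text{effect of } x_1 =
  \begin{cases}
    \beta_1 & \text{if } x_2 = 0 \\
    \beta_1 + \beta_3 & \text{if } x_2 = 1
  \end{cases}
\end{align*}
```

In this form the effect of the exposure is no longer a single number; it depends on the level of x_2.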
16
Model 3: with interaction
17
Test of assumptions
  • Predict y and residuals
    • predict y, xb
    • predict res, resid
  • Plot resid vs y
    • independent?
    • linear?
    • const. var?
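A minimal sketch of the residual check, again with auto.dta as a stand-in for the slides' data. The variable names yhat and res are hypothetical and chosen so that they do not clash with an existing outcome variable.

```stata
* Sketch only: auto.dta stands in for the data used in the slides
sysuse auto, clear
regress price weight foreign

* Fitted values and residuals from the model just estimated
predict yhat, xb
predict res, residuals

* Residuals against fitted values: look for trend, curvature or fanning
scatter res yhat, yline(0)
```

After regress, the built-in command rvfplot, yline(0) draws the same residual-versus-fitted plot without creating new variables.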

18
Violations of assumptions
  • Dependent residuals
    • Mixed models: xtmixed
  • Non-linear effects
    • gen gest2 = gest^2
    • regress weight gest gest2 sex
  • Non-constant variance
    • regress weight gest sex, robust

19
Measures of influence
Remove obs 1, see change
Remove obs 2, see change
  • Measure change in
    • Outcome (y)
    • Deviance
    • Coefficients (beta)
      • Delta beta, Cook's distance

20
Points with high influence
lvr2plot, mlabel(id)
21
Added variable plot: gestational age
avplot gest, mlabel(id)
22
Removing outlier
23
Influence
24
Final model
Give meaning to constant term
sum gest                      /* find smallest value */
generate gest2 = gest-204     /* smallest gest = 204 */
generate sex2 = sex-1         /* boys = 0, girls = 1 */
regress weight gest2 sex2     /* final model */
estimates store m4
25
Logistic regression
  • Being bullied

26
Model and assumptions
  • Model
  • Assumptions
    • Independent residuals
    • Linear effects

27
Association measure, Odds ratio
Model
Start with
Hence
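The formulas for this slide are missing from the transcript. A standard derivation, writing \mu for the probability of the outcome (the slides later call the probability "mu") and assuming exposure x_1 and covariate x_2, might read:

```latex
\begin{align*}
\text{Model:} \quad & \ln\!\left(\frac{\mu}{1 - \mu}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & \mathrm{odds}(x_1) = \frac{\mu}{1 - \mu}
  = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} \\
\text{Hence:} \quad & \mathrm{OR} = \frac{\mathrm{odds}(x_1 + 1)}{\mathrm{odds}(x_1)} = e^{\beta_1}
\end{align*}
```

The odds ratio for a one-unit increase in the exposure is therefore the exponentiated coefficient, which is what the logistic command reports by default.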
28
Syntax
  • Estimation
    • logistic y x1 x2          logistic regression
    • xi: logistic y x1 i.c1    categorical c1
  • Post estimation
    • predict yf, pr            predict probability
  • Manage models
    • estimates store m1        save model
    • est table m1, eform       show OR

29
Workflow
  • Bivariate analysis
  • Regression
    • Model fitting
      • Cofactors in/out
      • Interactions
    • Test of assumptions
      • Independent errors
      • Linear effects
    • Influence (robustness)

30
Bivariate
Generate dummies
gen Island  = (country==2) if country<.
gen Norway  = (country==3) if country<.
gen Finland = (country==4) if country<.
gen Denmark = (country==5) if country<.
31
Model 1: outcome and exposure
Alternative commands
xi: logistic bullied i.country                       use xi: i.var for categorical variables
xi: logistic bullied i.country, coef                 coefs instead of ORs
xi: logistic bullied i.country if sex!=. & age!=.    do if sex and age not missing
32
Model 2: Add confounders
33
Interaction
Model
Start with
Hence
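The slide's formulas are missing from the transcript. On the logit scale an interaction model could be written as follows, with \mu the probability of the outcome, x_1 the exposure, x_2 a binary covariate and \beta_3 the coefficient of the product term (all symbols are assumptions):

```latex
\begin{align*}
\text{Model:} \quad & \ln\!\left(\frac{\mu}{1 - \mu}\right)
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
\text{Start with:} \quad & \ln \mathrm{odds}(x_1 + 1, x_2) - \ln \mathrm{odds}(x_1, x_2)
  = \beta_1 + \beta_3 x_2 \\
\text{Hence:} \quad & \mathrm{OR}_{x_1} =
  \begin{cases}
    e^{\beta_1} & \text{if } x_2 = 0 \\
    e^{\beta_1 + \beta_3} & \text{if } x_2 = 1
  \end{cases}
\end{align*}
```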
34
Model 3: interaction
35
Test of assumptions
  • Linear effects (of age)
    • findit lincheck                                    search and install
    • lincheck xi: logistic bullied age i.country sex

36
Points with high influence
estimates restore m2    restore best model
predict p, p            probability (mu in our notation)
predict db, db          delta-beta (one value, not one per estimate)
scatter db p            delta-beta plot
37
Removing 2 observations
Conclusion: Robust results
38
Generalized Linear Models
  • Being bullied

39
Designs and measures
Models      Measures
GLM         RR, RD, OR
Survival    Rate Ratio
40
Generalized Linear Models, GLM
Linear regression
Logistic regression
Poisson regression
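The model formulas for this slide are missing from the transcript. In the usual GLM notation, assuming mean \mu = E(y), link function g and covariates x_1, x_2 (symbols are assumptions), the three regressions differ only in the link:

```latex
\begin{align*}
\text{GLM:} \quad & g(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Linear regression:} \quad & \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Logistic regression:} \quad & \ln\!\left(\frac{\mu}{1 - \mu}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Poisson regression:} \quad & \ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2
\end{align*}
```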
41
GLM: Distribution and link
  • Distribution family
    • Given by data
    • Influences p-value, CI
  • Link function
    • May choose
    • Shape (link^-1, the inverse link)
    • Scale
    • Association measure

Distribution          Normal     Binomial         Poisson
Link                  Identity   Logit            Log
Scale                 Additive   Multiplicative   Multiplicative
Association measure   RD         OR               RR
42
Distribution and link examples
Note: not for traditional case-control data
Link: Identity → linear model → additive scale
43
Being bullied, 3 models
glm bullied Island Norway Finland Denmark sex age, family(binomial) link(logit)
glm bullied Island Norway Finland Denmark sex age, family(binomial) link(log)
glm bullied Island Norway Finland Denmark sex age, family(binomial) link(identity)
44
Convergence problems
Stop
  • If glm does not converge, use
    • poisson y x1 x2, irr robust    RR
    • regress y x1 x2, robust        RD

45
Association measure, RR
Model
Start with
Hence
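The formulas for this slide are missing from the transcript. With a log link, and with \mu the probability of the outcome, x_1 the exposure and x_2 a covariate (symbols are assumptions), a standard derivation is:

```latex
\begin{align*}
\text{Model:} \quad & \ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & \mu(x_1) = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} \\
\text{Hence:} \quad & \mathrm{RR} = \frac{\mu(x_1 + 1)}{\mu(x_1)} = e^{\beta_1}
\end{align*}
```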
46
Association measure, RD
Model
Start with
Hence
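The formulas for this slide are missing from the transcript. With an identity link on the probability \mu, exposure x_1 and covariate x_2 (symbols are assumptions), the risk difference follows directly:

```latex
\begin{align*}
\text{Model:} \quad & \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{Start with:} \quad & \mu(x_1 + 1) - \mu(x_1)
  = \beta_1 (x_1 + 1) - \beta_1 x_1 \\
\text{Hence:} \quad & \mathrm{RD} = \beta_1
\end{align*}
```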
47
The importance of scale
Additive scale: Absolute increase
  Females: 30-20=10
  Males: 20-10=10
  Conclusion: Same increase for males and females (RD)
Multiplicative scale: Relative increase
  Females: 30/20=1.5
  Males: 20/10=2.0
  Conclusion: More increase for males (RR)
48
Conditional logistic regression
  • For matched case-control data

49
Truths and Misconceptions
  • Cohort studies
    • Exposed and unexposed should be as similar as
      possible, except for exposure
    • Matching removes confounding
  • Case-Control studies
    • Cases and controls should be as similar as
      possible, except for disease
    • Matching removes confounding

[Figure labels: Exposed, Unexposed; Diseased/Cases, Healthy/Controls]
50
Matching and analysis
  • Unmatched (age)
    • Ordinary model
    • May adjust for age
    • May interpret age effect
  • Frequency matched (age)
    • Ordinary model
    • Must adjust for age
    • Cannot interpret age effect
  • One-to-one matched (age)
    • Conditional model
    • No effect measure for age

51
Data preparation
  • Save as tab-delimited in Excel
  • Read and fix in Stata
    • insheet using "file.txt", clear
    • mvdecode m*, mv(9)
    • gsort id -cc

52
Syntax
  • Estimation
    • clogit y x1 x2, group(id)       conditional logistic
    • clogit y x1 x2, group(id) or    OR instead of coef
  • Post estimation
    • predict yf, pc1                 predict probability
  • Manage models
    • estimates store m1              save model

53
Bivariate analysis
  • Loop through all variables
    • foreach var of varlist m* {
    •     quietly clogit cc `var', group(id) or
    •     est store `var'
    • }
  • Show results

54
Multivariable analysis
  • Stepwise
    • stepwise, pe(0.25): clogit cc m2 m4 m5 m12 m13 m18, group(id) or
  • Final model

55
Stata regression commands
56
  • Regression with simple error structure
    • regress    linear regression (also heteroscedastic errors)
    • nl         nonlinear least squares
    • GLM
      • logistic   logistic regression
      • poisson    Poisson regression
      • binreg     binary outcome, OR, RR, or RD effect measures
    • Conditional logistic
      • clogit     for matched case-control data
    • Multiple outcome
      • mlogit     multinomial logit (not ordered)
      • ologit     ordered logit
  • Regression with complex error structure
    • xtmixed    linear mixed models
    • xtlogit    random effect logistic

57
Syntax
  • Estimation
    • regress y x1 x2          linear regression
    • logistic y x1 x2         logistic regression
    • xi: regress y x1 i.x2    categorical x2
  • Manage results
    • estimates store m1       store results
    • estimates table m1 m2    table of results
    • estimates stats m1 m2    statistics of results
  • Post estimation
    • predict y, xb            linear prediction
    • predict res, resid       residuals
    • lincom b0+2*b3           linear combination
  • Help
    • help logistic postestimation