Title: Social Science Computing Statistical Computing Group Programming in Stata
1Social Science Computing
Statistical Computing Group
Programming in Stata
- Tim Cheney
- Senior Programmer/Analyst
2Programming in Stata
Why use programs? - Soooo tempting to start
blasting away at the command line... so why not?
3 4Do-files and Ado-files
- Ado is for perfect programs that will be run
repeatedly. Built-in commands such as regress
and tabulate are implemented as .ado files.
5Dofilology
Do-files are more than a mechanical method of
entering commands: they are a moral philosophy that
separates enlightened programmers from the
foolhardy. Nobody likes a null finding, but far
worse is a finding that cannot be reproduced.
Imagine making a sensational discovery and then
having to admit it must all have been a mistake.
6 7USING THE DO-FILE EDITOR
- The Do-file Editor lets you submit several
commands to Stata at once. If there is an error
in some command, Stata stops executing the
do-file and reports the error.
- IN THEORY
- - Type the commands into the editor
- - Then click the Do button.
- IN PRACTICE
- - When an error is revealed, correct your code.
- - Tools > Do Selection or Tools > Do to Bottom
- - Repeat until all errors are discovered and corrected
- - Finally, log close, clear, and run the whole
thing cleanly for posterity
8- To enter the Do-file Editor click the Do-file
Editor button, or type doedit and press Enter in
the Command window.
- A do-file is just an ASCII text file that has the
extension .do in the name.
- Don't use a word processor for your do-files.
Therein lies only pain and sorrow. MS-Word will
kill you with kindness.
- You can use other text editors. The Do-file
Editor of Stata has one advantage over external
editors: it is fully integrated with Stata, and
commands can be executed directly from the
editor, with a keyboard shortcut or the click of
a button. External editors often have a richer
set of features, including syntax highlighting,
but they lack integration with Stata.
- Some notes on text editors for Stata users:
http://fmwww.bc.edu/repec/bocode/t/textEditors.html
9 14. USING THE DO-FILE EDITOR
- The Do-file Editor toolbar
- The Do-file Editor has twelve buttons. If
you ever forget what a button does, hold the
mouse pointer over the button for a moment, and a
box will appear with a description of that
button.
- New: start a new do-file.
- Open: open a do-file from disk.
- Save: save the current do-file to disk.
- Print: print the current do-file.
- Find: search for a string in the current do-file.
- Cut: cut the selected text from the current
do-file and copy it to the Clipboard.
- Copy: copy the selected text to the Clipboard.
- Paste: paste text from the Clipboard into the
current do-file.
- Undo: undo the last change.
- Do: execute the commands in the current do-file.
- Run: execute the commands in the current do-file
without showing any output.
- Preview: preview the current file in the Viewer.
The current file must be saved first.
10Do file window
11Using The Do-file Editor
- Comments can be added to the do-file as you work.
Just start the respective line with an asterisk (*).
- A really long command can be written on several
lines ///
by using three forward slashes to tell Stata ///
that the command continues on the next line
- You can also summon the Do-file Editor with the
Stata command doedit
- Saving interactive commands from Stata as a
do-file
- While working interactively with Stata, you
may want to rerun the last several commands that
you typed. You can save the
contents of the Review window as a do-file and
open that file in the Do-file Editor.
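As a minimal sketch of the comment and line-continuation rules above, the following do-file fragment could be used. It assumes the auto dataset that ships with Stata (loaded with sysuse); the choice of variables is purely illustrative. Note that /// is understood in do-files, not when typing in the Command window.

```stata
* A comment: this line starts with an asterisk, so Stata ignores it
* (the auto dataset is an assumption made for illustration)
sysuse auto, clear

* One long command written over two lines with ///
regress mpg weight length ///
    foreign
```

Selecting these lines in the Do-file Editor and clicking Do runs the regression as a single command.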
12Do-file Editor is elusive
- Use the Bring Do-file Editor to foreground icon
to get it back
13Stata Window
14Macros
- A macro is a shorthand: one thing standing for
another. For instance,
- local list "age weight sex"
- regress outcome `list'
is the same as
- regress outcome age weight sex
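The same idea can be tried on data that everyone has. This is only a sketch: it assumes the auto dataset shipped with Stata, and the macro name rhs and the variable list are made up for illustration.

```stata
* Illustrative only: auto dataset and variable choice are assumptions
sysuse auto, clear

* Store a list of predictors in a local macro named rhs
local rhs "weight length foreign"

* Stata substitutes the macro contents before running the command,
* so this is the same as: regress mpg weight length foreign
regress mpg `rhs'

* Show what the macro holds
display "`rhs'"
```

Run the lines together from the Do-file Editor; a local macro defined in one Do Selection is gone by the next.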
15local or global?
- What is the difference?
- Which one should I use?
16- Global macros can get you into a mess
- Better to stick with local macros rather than
get in over your head
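A short sketch of the difference, with hypothetical macro names: a global macro is referenced with a dollar sign and stays defined for every do-file and program until it is dropped or Stata exits, while a local macro is referenced with `name' and exists only inside the do-file or program that defines it.

```stata
* Macro names here are hypothetical, chosen for illustration
global gcontrols "weight length"
local  lcontrols "weight length"

* A global is referenced with $, a local with `name'
display "$gcontrols"
display "`lcontrols'"

* Globals linger across do-files and programs; drop them when done
macro drop gcontrols
```

Because a forgotten global can silently change what a later do-file runs, locals are the safer default.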
17The backtick and single quote
Notice that to get the contents of a local
macro, you must type `macroname' with an infamous
backtick (aka grave accent) in front and a
trailing right single quote; these are two
different characters. A typical keyboard has the
backtick in the upper left corner of
the keyboard (under the tilde character).
The trailing quote ain't a problem for most
people.
18Equal sign
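You can also use an equal sign:
local list = "age weight sex"
With the equal sign, Stata evaluates the right-hand
side as an expression: string results are truncated
to 80 characters, and numerical expressions are
evaluated.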
19Equal sign bad
local patchr "certsu missflag age agesqr male
emergenc transfer drg110-drg501 d1484573 d1485459
d1505459 n150 d1708622 n170 d2098152 n209
d2178622 n217 d2638622 n263 d2878622 n287
d4998051 n499 mic arrc uanginac hyperc astenoc
strokec copdc asthmac psychosc liverc renaldc
renalfc diabc iddiabc alchoc plegiac coagc hemoc
thromboc smokingc pipfc eleflabc cancerc bcancerc
cstenoc addisonc gangrec osteomc
inter1-inter42"
With the equals sign we are limited to 80
characters; without it, to 67,784 characters.
20Equal sign good
- . local i 10
- . local j 10 + `i'
- . di "`j'"
- 10 + 10
- . local j = 10 + `i'
- . di "`j'"
- 20
21Fruitful Loops
22Loop Syntax - foreach
The syntax for foreach is
    foreach lname {in | of listtype} list {
        Stata commands referring to `lname'
    }
where lname is the name of the new
local macro and listtype is the type of list on
which you want to operate.
23Loop syntax forvalues and while
- The syntax for forvalues is
    forvalues lname = range {
        Stata commands referring to `lname'
    }
where lname is the name of
the new local macro and range specifies the range
of values over which you want to operate.
- The syntax for while is
    while exp {
        ...
    }
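The three loop forms can be compared side by side in a small sketch. The word list and the macro names v, i, total, and k are arbitrary choices for illustration.

```stata
* foreach: step through an arbitrary list of words
foreach v in mpg weight length {
    display "variable: `v'"
}

* forvalues: step through a numeric range, accumulating a sum
local total = 0
forvalues i = 1/3 {
    local total = `total' + `i'
}
display "total = `total'"

* while: repeat until the condition is no longer true
local k = 1
while `k' <= 3 {
    display "k = `k'"
    local k = `k' + 1
}
```

forvalues is usually the better choice for consecutive numbers, and while is needed only when the stopping point is not known in advance.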
24Foreach Simple Example
I had a variable called orgcat with values of 0, 1,
or 2 that I use as a linear predictor. I wanted
to generate dummy variables. Normally I would
just do tab orgcat, gen(orgdum). But that gives me
orgdum1 for orgcat = 0, orgdum2 for orgcat = 1, etc.,
and I wanted orgdum0 for orgcat = 0, etc.
25Foreach Simple Example
foreach num of numlist 0/2 {
    gen orgdum`num' = (orgcat==`num') if orgcat != .
    label var orgdum`num' "Dummy for orgcat=`num'"
}
Could also be done as
forvalues num = 0(1)2 {
    ...
}
26Foreach Loop over variables
- Had hospital data with 28 averaged nurse-reported
scales. Needed to score each hospital by the
number of items above the median
27Foreach Loop over variables
- local affairs "nrsgov policy advance admnlis dirnrs develop commit nrsexec"
- local foundtns "quality precep nrsemod samenrs nrsphil carepln stndrds cntined nrscomp"
- local mnals "headnrs headsup superv praise"
- local sra "staff enough support problem"
- local nprelate "teamwrk drnrs jntprac"
- local nwi "`affairs' `foundtns' `mnals' `sra' `nprelate'"
28Foreach Loop over variables
- * Generate medians for each item
- foreach x in `nwi' {
-     egen median_`x' = median(`x')
-     gen `x'_above = 1*(`x' > median_`x')
-     label var `x'_above "`x' above median"
- }
29Foreach Loop over variables
- * Create a variable that has the count of
items above median
- gen items = 0
- foreach var of varlist `nwi' {
-     replace items = items + `var'_above
- }
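The two loops above can be combined and tried on public data. This is a sketch only: it assumes the auto dataset shipped with Stata, and three of its variables stand in for the 28 nurse-reported scales. The names items and n_above are illustrative. It also guards against missing values, because Stata treats a missing value as larger than any number.

```stata
* Illustrative stand-in for the hospital data
sysuse auto, clear
local items "mpg weight length"

gen n_above = 0
foreach x of varlist `items' {
    * median of the item, stored as a constant variable
    egen median_`x' = median(`x')
    * 1 if above the median, 0 otherwise, missing if the item is missing
    gen `x'_above = (`x' > median_`x') if !missing(`x')
    label var `x'_above "`x' above median"
    * running count of items above the median
    replace n_above = n_above + `x'_above
}

tabulate n_above
```

If an item can be missing, n_above also becomes missing for that observation; whether that is the right rule depends on the analysis.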
30Levelsof
- levelsof factor, local(levels)
- foreach l of local levels {
-     di "-> factor = `l'"
-     whatever if factor == `l'
- }
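A concrete version of the levelsof pattern, as a sketch: it assumes the auto dataset shipped with Stata, uses rep78 in place of factor, and uses summarize in place of whatever.

```stata
* Illustrative only: rep78 plays the role of "factor"
sysuse auto, clear

* Put the distinct nonmissing values of rep78 into the local macro levels
levelsof rep78, local(levels)

* Run a command once per level
foreach l of local levels {
    display "-> rep78 = `l'"
    summarize mpg if rep78 == `l'
}
```

levelsof skips missing values unless the missing option is given, so the loop never sees them.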
31Using results of Stata commands
- Stata's statistical commands save calculated
results in r(), e(), or in built-in
vectors like _b[] so you can access them when you
need them (see [R] saved results). You can also
use the return list and ereturn list commands
after running a command to see a listing of saved
results.
32Results in r()
- . tab mpgcat foreign, chi
-          2 |
-  quantiles |       Car type
-     of mpg |  Domestic    Foreign |     Total
- -----------+----------------------+----------
-          1 |        33          5 |        38
-          2 |        19         17 |        36
- -----------+----------------------+----------
-      Total |        52         22 |        74
-           Pearson chi2(1) =  10.2681   Pr = 0.001
33return list
- . return list
- scalars:
-                   r(N) =  74
-                   r(r) =  2
-                   r(c) =  2
-                r(chi2) =  10.26813172207909
-                   r(p) =  .0013534779223182
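Once results are in r(), they can be copied into macros or used directly in expressions. The sketch below assumes the auto dataset shipped with Stata and assumes that mpgcat was built with xtile, which the variable label "2 quantiles of mpg" suggests but the slides do not show.

```stata
* Assumption: mpgcat is a two-group split of mpg created with xtile
sysuse auto, clear
xtile mpgcat = mpg, nq(2)

tabulate mpgcat foreign, chi2

* Copy saved results into local macros before another command overwrites r()
local chi2 = r(chi2)
local p    = r(p)
display "chi2 = " %7.4f `chi2' ", p = " %5.3f `p'
```

Saved results last only until the next command of the same class runs, so copy anything you need right away.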
34Results in e()
- . regress mpgcat foreign
-       Source |       SS       df       MS              Number of obs =      74
- -------------+------------------------------           F(  1,    72) =   11.60
-        Model |  2.56515782     1  2.56515782           Prob > F      =  0.0011
-     Residual |  15.9213287    72  .221129565           R-squared     =  0.1388
- -------------+------------------------------           Adj R-squared =  0.1268
-        Total |  18.4864865    73  .253239541           Root MSE      =  .47024
- ------------------------------------------------------------------------------
-       mpgcat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
- -------------+----------------------------------------------------------------
-      foreign |   .4073427   .1195986     3.41   0.001     .1689271    .6457582
-        _cons |   1.365385   .0652111    20.94   0.000     1.235389    1.495381
- ------------------------------------------------------------------------------
35Ereturn list
- . ereturn list
- scalars:
-                   e(N) =  74
-                e(df_m) =  1
-                e(df_r) =  72
-                   e(F) =  11.6002481014011
-                  e(r2) =  .138758536784853
-                e(rmse) =  .4702441545405587
-                 e(mss) =  2.565157815157823
-                 e(rss) =  15.92132867132867
-                e(r2_a) =  .1267968497957537
-                  e(ll) =  -48.15444954754827
-                e(ll_0) =  -53.68152319278958
36Ereturn list
- macros:
-               e(title) : "Linear regression"
-              e(depvar) : "mpgcat"
-                 e(cmd) : "regress"
-          e(properties) : "b V"
-             e(predict) : "regres_p"
-               e(model) : "ols"
-           e(estat_cmd) : "regress_estat"
- matrices:
-                   e(b) :  1 x 2
-                   e(V) :  2 x 2
- functions:
-              e(sample)
37Parameter estimates _b vector
- . matrix list e(b)
- e(b)[1,2]
-        foreign      _cons
- y1   .40734266  1.3653846
- . local foreign = _b[foreign]
- . display `foreign'
- .40734266
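Estimation results can be reused in the same way. This sketch assumes the auto dataset shipped with Stata and a simple regression chosen only for illustration; the macro names slope and se are arbitrary.

```stata
* Illustrative regression on the auto dataset
sysuse auto, clear
regress mpg foreign

* Scalars saved in e()
display "N = " e(N) ", R-squared = " %5.3f e(r2)

* Coefficient and standard error through _b[] and _se[]
local slope = _b[foreign]
local se    = _se[foreign]
display "t = " %5.2f `slope'/`se'
```

_b[] and _se[] stay available until the next estimation command replaces them.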
38Post
39Not that Post - Can you spot the typo?
40Post and Postfile
- You can save results into a different dataset as
you accumulate them. This is useful in
applications like bootstrapping and simulation.
This is called posting.
41Post and Postfile
- postfile mysim b lb ub using simres
- forvalues i = 1/100 {
-     /* insert code to construct a sample */
-     /* insert code to calculate statistics */
-     post mysim (b) (lb) (ub)
- }
- postclose mysim
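The skeleton above can be filled in to give a small working simulation. Everything specific here is an assumption made for illustration: 100 replications, samples of 50 standard normal draws, a 95% t-based interval for the mean, and the names sim, simres, m, and se. A tempname and a tempfile are used in place of the fixed names mysim and simres so that rerunning the code does not collide with leftovers.

```stata
clear
set seed 12345

* Temporary handle and file for the posted results
tempname sim
tempfile simres
postfile `sim' b lb ub using `simres'

forvalues i = 1/100 {
    * construct a sample (assumed: 50 standard normal draws)
    quietly {
        drop _all
        set obs 50
        gen x = rnormal()
        summarize x
    }
    * calculate statistics: mean and a 95% confidence interval
    local m  = r(mean)
    local se = r(sd)/sqrt(r(N))
    local t  = invttail(r(N) - 1, 0.025)
    post `sim' (`m') (`m' - `t'*`se') (`m' + `t'*`se')
}
postclose `sim'

* Load the posted results: one observation per replication
use `simres', clear
summarize b lb ub
```

The posted file is independent of the data in memory, which is why the loop can drop and rebuild the sample on every pass.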
42Finito
- Please fill out your evaluations
- Leave your e-mail address if you want to receive
a copy of the lectures from Stata Netcourse 151
which cover all the preceding materials and much
more in great depth.