Title: Stata
1 CCPR Computing Services Workshop: Introduction to Stata, June 2006
2 Outline
- Stata
- Command Syntax
- Basic Commands
- Abbreviations
- Missing Values
- Combining Data
- Using do-files
- Basic programming
- Special Topics
- Getting Help
- Updating Stata
3 Stata Syntax
- Basic command syntax
- [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Brackets: optional portions
- Italicized words (varlist, exp, range, weight, options): user-specified
4 Stata Syntax, cont.
- Complete syntax
- [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Example 1 (webuse union)
- Stata command
- . bysort black: summarize age if year > 80, detail
- Results
- Summarizes age separately for each value of black, including only observations for which year > 80, and reports extra detail.
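A minimal do-file sketch of Example 1, assuming Stata can reach the internet for webuse. It shows that bysort is shorthand for sorting on the by-variable and then prefixing the command with by:, which on its own stops with a "not sorted" error if the data are not already sorted.

```stata
webuse union, clear

* by: requires the data to be sorted on the by-variable first
sort black
by black: summarize age if year > 80, detail

* bysort sorts and then runs the command within each group in one step
bysort black: summarize age if year > 80, detail
```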
5 Stata Syntax, cont.
- Complete syntax
- [by varlist:] command [varlist] [=exp] [if exp] [in range] [weighttype=weight] [, options]
- Example 2 (webuse union)
- Stata commands
- . generate agelt30 = age
- . replace agelt30 = 1 if age < 30
- . replace agelt30 = 0 if age >= 30 & age < .
- Result
- Variable agelt30 set equal to 1, 0, or missing
- =exp is generally used with the commands generate and replace
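The three commands above can also be collapsed into one line, sketched here on the same union data. A relational expression such as (age < 30) evaluates to 1 when true and 0 when false, and the if qualifier leaves the new variable missing wherever age is missing. The name agelt30b is hypothetical, chosen so it does not clash with agelt30.

```stata
webuse union, clear

* (age < 30) is 1 when true and 0 when false; observations that fail
* the if condition (missing age) are left missing by generate
generate agelt30b = (age < 30) if !missing(age)
tabulate agelt30b, missing
```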
6 Basic Commands: Load auto data and look at some vars
- Load data from Stata's website
- webuse auto.dta
- Look at dataset
- describe
- Summarize some variables
- codebook make headroom, header
- inspect weight length
7 Basic Commands, cont.
- Look at first and last observation
- list make price mpg rep78 if _n == 1
- list make price mpg rep78 if _n == _N
- Summarize a variable in a table
- table foreign
- table foreign, c(mean mpg sd mpg)
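A short sketch with two alternatives to the commands above, assuming the auto data from webuse: the in qualifier (in 1 is the first observation, in l the last) instead of if _n == ..., and tabstat, which reports the same mean and standard deviation of mpg for each value of foreign as the table command.

```stata
webuse auto, clear
describe, short

* "in 1" is the first observation and "in l" the last
list make price mpg rep78 in 1
list make price mpg rep78 in l

* Mean and standard deviation of mpg for each value of foreign
tabstat mpg, by(foreign) statistics(mean sd)
```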
8 Keep/Save a Subset of the Data
- Keep a subset of the variables in memory
- keep make headroom trunk weight length
- List variables in current dataset
- ds
- List string variables in current dataset
- ds, has(type string)
- Save current dataset
- save tempdata/myauto
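A self-contained version of this slide, assuming the auto data. Instead of a permanent path such as tempdata/myauto (which requires that directory to exist), it saves to a tempfile, a scratch file name that Stata deletes when the do-file ends, and then reloads it to confirm the subset round-trips.

```stata
webuse auto, clear
keep make headroom trunk weight length

* List the string variables that remain (only make)
ds, has(type string)

* tempfile reserves a scratch file name that is deleted when the do-file ends
tempfile myauto
save `myauto'

* Reload the saved subset
use `myauto', clear
describe, short
```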
9 Generating New Variables
- Create new variable headroom squared
- generate headroom2 = headroom^2
- Generate numeric from string variable
- encode make, generate(makeNum)
- list make makeNum in 1/5
- list displays makeNum's value labels, so you can't tell it's numeric; check its storage type in describe
- describe make makeNum
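A sketch that makes the encoding visible, assuming the auto data: the nolabel option of list shows the integer codes behind makeNum, and decode reverses encode by turning the value labels back into a string. The name makeStr is hypothetical.

```stata
webuse auto, clear
generate headroom2 = headroom^2

* encode stores each distinct string as an integer code with a value label
encode make, generate(makeNum)
list make makeNum in 1/5, nolabel

* decode reverses encode, turning the value labels back into strings
decode makeNum, generate(makeStr)
```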
10 Generating New Variables, cont.
- Create categorical variable from continuous variable
- price is integer-valued, with minimum 3291 and maximum 15906
- Generate categorical version, Method 1
- generate priceCat = 0
- replace priceCat = 1 if price <= 5000
- replace priceCat = 2 if price > 5000 & price <= 10000
- replace priceCat = 3 if price > 10000 & price < .
11 Generating New Variables, cont.
- Generate categorical version of numerical variable, Method 2
- generate priceCat2 = price
- recode priceCat2 (min/5000 = 1) (5000/10000 = 2) (10000/max = 3)
- Compare price, priceCat, and priceCat2
- table price priceCat
- table priceCat priceCat2
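Both methods can be checked against each other in one sketch, assuming the cut points above. recode applies the first rule a value fits, so a price of exactly 5000 would go to category 1 and exactly 10000 to category 2; Method 1 uses the same boundaries, so the two variables should agree everywhere.

```stata
webuse auto, clear

* Method 1: generate + replace
generate priceCat = 0
replace priceCat = 1 if price <= 5000
replace priceCat = 2 if price > 5000 & price <= 10000
replace priceCat = 3 if price > 10000 & price < .

* Method 2: recode (first matching rule wins)
generate priceCat2 = price
recode priceCat2 (min/5000 = 1) (5000/10000 = 2) (10000/max = 3)

* Cross-tabulate the two versions
tabulate priceCat priceCat2
```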
12 Variable Labels and Value Labels
- Create a description for a variable
- label variable priceCat "Categorical price"
- Create labels to represent variable values
- label define priceCatLabels 1 "cheap" 2 "mid-range" 3 "expensive"
- label values priceCat priceCatLabels
- View results
- describe
- list price priceCat in 1/10
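A sketch of reading labels back inside a do-file, assuming priceCat is rebuilt with recode as in Method 2. The extended macro functions variable label, value label, and label store the attached text in local macros; the macro names varlab, vallab, and text2 are hypothetical.

```stata
webuse auto, clear
generate priceCat = price
recode priceCat (min/5000 = 1) (5000/10000 = 2) (10000/max = 3)

label variable priceCat "Categorical price"
label define priceCatLabels 1 "cheap" 2 "mid-range" 3 "expensive"
label values priceCat priceCatLabels

* Extended macro functions read the labels back
local varlab : variable label priceCat
local vallab : value label priceCat
local text2  : label priceCatLabels 2
display "`varlab' uses value label `vallab'; code 2 means `text2'"

* label list prints every code and its text
label list priceCatLabels
```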
13 Reshape
- [Figure: the same data shown in wide format and in long format]
- Wide -> Long
- reshape long uniqueschool author, i(year session order) j(count)
- Long -> Wide
- reshape wide author, i(year session order) j(count)
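Because the workshop's dataset is not included, here is a self-contained sketch with a small hypothetical dataset: one row per session with author slots author1 to author3. reshape long turns it into one row per session and slot (numbered by count), and reshape wide restores the original layout.

```stata
clear
* Hypothetical wide data: one row per session, one column per author slot
input session str8 author1 str8 author2 str8 author3
1 "Smith" "Jones" "Lee"
2 "Brown" "Garcia" ""
end

* Wide -> long: one row per session-author slot, numbered by count
reshape long author, i(session) j(count)
list, sepby(session)

* Long -> wide: back to one row per session
reshape wide author, i(session) j(count)
```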
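14 A few other commands
- compress - stores each variable in the smallest storage type that holds its values
- sort / gsort - sort ascending / sort ascending or descending
- order - change the order of variables
- rename - rename a variable
- more - control the --more-- pause in output (set more off turns it off)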
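A quick sketch of these commands on the auto data. The new name milesPerGallon is hypothetical; gsort with a leading minus sign sorts in descending order, which plain sort cannot do.

```stata
webuse auto, clear

* compress demotes variables to the smallest storage type that holds their values
compress

* gsort -price sorts from most to least expensive
gsort -price

* rename a variable, then move selected variables to the front
rename mpg milesPerGallon
order make price milesPerGallon

* turn off the --more-- pause in long output
set more off
list make price milesPerGallon in 1/3
```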
15 Abbreviations in Stata
- Abbreviating command, option, and variable names
- The shortest uniquely identifying name is sufficient
- Example
- Assume three variables are in use: make, price, mpg
- Unabbreviated Stata command
- . summarize make price
- Abbreviated Stata command
- . su ma p
- Exceptions
- describe (d), list (l), and some others have shorter abbreviations
- Commands that change/delete data (e.g., replace) must be spelled out
- Commands implemented by ado-files
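A sketch of the example above, assuming the auto data reduced to make, price, and mpg. It also shows what happens when an abbreviation is not unique: m could mean make or mpg, so Stata stops with an error, captured here so the do-file keeps running.

```stata
webuse auto, clear
keep make price mpg

* Full names and unique abbreviations give the same result
summarize make price
su ma p

* "m" could mean make or mpg, so Stata refuses it with an error
capture summarize m
display "return code: " _rc
```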
16 Missing Values in Stata 8 and 9
- Stata 8 and later versions
- 27 representations of numerical missing
- ., .a, .b, ..., .z
- Relational comparisons
- Largest number < . < .a < .b < ... < .z
- Mathematical functions
- missing + nonmissing = missing
- String missing
- Empty quote: ""
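A small sketch of these rules on a hypothetical three-observation variable x (1, ., and .a). It shows that missing values compare as larger than any number, that arithmetic with a missing value is missing, and that missing() catches every kind of missing value.

```stata
clear
set obs 3
generate x = 1 in 1        // observations 2 and 3 start out as system missing (.)
replace x = .a in 3        // observation 3 becomes extended missing .a

* Missing values are larger than any number, and . < .a < ... < .z
display (. > 1e300) " " (. < .a) " " (.a < .z)

* Arithmetic with a missing value is missing
display 2 + .

* x > 0 is true for the missing values too; add !missing() to exclude them
count if x > 0
count if x > 0 & !missing(x)
```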
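17 Missing Values in Stata - Pitfalls
- Pitfall 1
- The coding of missing values changed after Stata 7 (extended missing values .a to .z arrived in Stata 8)
- Pitfall 2
- Do NOT
- . replace weightlt200 = 0 if weight >= 200
- (missing weight is larger than 200, so those observations are also set to 0)
- INSTEAD
- . replace weightlt200 = 0 if weight >= 200 & weight < .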
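A sketch of Pitfall 2 on a hypothetical three-observation dataset (weights 150, 250, and missing). The variable wrong is a hypothetical name that follows the "Do NOT" version, so the two codings can be compared side by side.

```stata
clear
input weight
150
250
.
end

* Correct coding: exclude missing weight explicitly
generate weightlt200 = 1 if weight < 200
replace weightlt200 = 0 if weight >= 200 & weight < .

* Pitfall: without "& weight < ." a missing weight counts as >= 200
generate wrong = 1 if weight < 200
replace wrong = 0 if weight >= 200
list
```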
18 Combining Data
- Append vs. Merge
- Append: two datasets with the same variables, different observations
- Merge: two datasets with the same or related observations, different variables
- Appending data in Stata
- Example: append.do
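A minimal sketch of appending, since append.do is not reproduced here. It assumes the auto data is split into domestic and foreign cars to create two files with the same variables, using a tempfile (hypothetical name domestic) in place of a saved dataset.

```stata
* Build two pieces with the same variables but different observations
webuse auto, clear
keep if foreign == 0
tempfile domestic
save `domestic'

webuse auto, clear
keep if foreign == 1

* append stacks the saved file's observations below those in memory
append using `domestic'
tabulate foreign
```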
19 Combining Data: merge and joinby
- Demonstrate with two sample datasets
- Neighborhood and County samples
- One-to-one merge
- onetoone.do
- One-to-many merge: use match merge
- onetomany.do
- Many-to-many merge: use joinby
- manytomany.do
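A self-contained sketch of the one-to-many case, with small hypothetical county and neighborhood files standing in for the workshop samples (onetomany.do is not reproduced here). It uses the merge m:1 syntax introduced in Stata 11, newer than this workshop; that form checks that county uniquely identifies observations in the using file and sorts the data itself. Neighborhood 4 has no county record, so it gets _merge = 1 (master only).

```stata
* Hypothetical county file: one observation per county
clear
input county medinc
1 50000
2 42000
end
tempfile county
save `county'

* Hypothetical neighborhood file: several neighborhoods per county
clear
input county nbhd pop
1 1 300
1 2 450
2 3 200
3 4 100
end

* Many neighborhoods match one county record (Stata 11+ syntax)
merge m:1 county using `county'
tabulate _merge
list, sepby(county)
```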
20 Combining Data, cont.
- Variable _merge (generated by merge and joinby)
- Pitfalls
- pitfall_merge1.do: merging unsorted data
- pitfall_merge2.do: many-to-many using merge instead of joinby
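A sketch of the many-to-many pitfall with hypothetical teacher and student files. merge m:m (Stata 11+ syntax) pairs records in their order within each school rather than forming every combination, while joinby forms all teacher-student pairs: 2 x 2 in school 1 plus 1 x 2 in school 2, or 6 observations.

```stata
* Hypothetical teacher file: two teachers in school 1, one in school 2
clear
input school teacher
1 101
1 102
2 201
end
tempfile teachers
save `teachers'

* Hypothetical student file: two students in each school
clear
input school student
1 11
1 12
2 21
2 22
end

* merge m:m pairs records in order within a school (not all combinations)
preserve
merge m:m school using `teachers'
count
restore

* joinby forms every teacher-student pair within each school
joinby school using `teachers'
sort school teacher student
list, sepby(school)
```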
21 Do-files
- What is a do-file?
- Stata commands can be executed interactively or via a do-file
- A do-file is a text file containing commands that Stata can read and execute
- Running a do-file within Stata
- . do dofilename.do
22 Do-files, cont.
- Why use a do-file?
- Documentation
- Communication
- Can you reproduce an interactive session?
- Interactive vs. do-files
- Record EVERYTHING needed to recreate your results in your do-file!
23 Do-files > Header, Version Control
- Header
- Include in do-files: name, project, project location, date, purpose, inputs, outputs, special instructions
- Version Control
- Include the version command at the top of the do-file
- Why? Later releases of Stata then interpret the commands the way the stated version did
- Example
- Under version 7, . = .a = .b = ... = .z
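A template sketch of such a header, assuming a hypothetical file name hypothetical_analysis.do; the parenthesized fields are placeholders to fill in. version 9 matches the Stata release current when this workshop was given.

```stata
* ------------------------------------------------------------------
* Name:          hypothetical_analysis.do
* Project:       (project name)
* Location:      (project directory)
* Date:          (date written)
* Purpose:       Summarize price and mileage in the auto data
* Inputs:        auto.dta, loaded with webuse
* Outputs:       none (results appear in the Results window)
* Instructions:  run with: do hypothetical_analysis.do
* ------------------------------------------------------------------
version 9            // interpret the commands below as Stata 9 did

webuse auto, clear
summarize price mpg
```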
24 Do-files > Comments
- Comments
- Lines beginning with * will be ignored
- Words between // and the end of the line will be ignored
- Spanning commands over two lines
- Words between /* and */ will be ignored, including the end-of-line character
- Words between /// and the beginning of the next line will be ignored
25 Do-files > End-of-Line Character
- Commands requiring multiple lines
- #delimit ;
- This command tells Stata to read semicolons as the end-of-line character instead of the carriage return (it works in do-files, not interactively)
- Comment out the carriage return with /* at the end of the line and */ at the beginning of the next
- Comment out the carriage return with ///
26 Do-files > Examples
- webuse auto, clear
- * this is a comment
- #delimit ;
- summarize price mpg rep78
- headroom trunk weight;
- #delimit cr
- summarize price mpg rep78 headroom trunk weight //this is a comment
- summarize price mpg rep78 ///
- headroom trunk weight
- summarize price mpg rep78 /*
- */ headroom trunk weight
27 Saving Output
- Work in do-files and log your sessions!
- log using filename
- options: replace (overwrite) or append
- log close
- Output choices
- .log file - plain ASCII text
- .smcl file (the default) - nicer format for viewing and printing in Stata
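A minimal logging sketch, assuming a hypothetical log name mysession. The text option writes a plain ASCII .log file instead of the default .smcl, and capture log close first closes any log left open by an earlier run (capture ignores the error if none is open).

```stata
* Close any log left open by an earlier run
capture log close

* text writes mysession.log as plain ASCII; without it Stata writes .smcl
log using mysession, replace text

webuse auto, clear
summarize price mpg

log close
```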
28 Saving Output, cont.
- Graphs are not saved in log files
- Use the saving() option of graph commands
- saving(graph) - writes the graph as a Stata .gph file
- Export the current graph
- graph export graph.ext
- Ex: graph export graph.eps
- Supported formats
- .ps, .eps, .wmf, .emf, .pict
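A sketch of both routes for one graph, assuming the auto data and a hypothetical file name mpgweight. saving() keeps a Stata .gph copy that can be reopened later, and graph export writes the graph currently in memory in the format named by the extension.

```stata
webuse auto, clear

* saving() stores the graph in Stata's own .gph format
scatter mpg weight, saving(mpgweight, replace)

* graph export converts the graph in memory; the extension selects the format
graph export mpgweight.eps, replace
```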
29 Example using local macro
- . local mypath "C:\Documents and Settings\MyStata"
- . display `mypath'
- C:\Documents invalid name
- r(198);
- . display C:\Documents and Settings\MyStata
- C:\Documents invalid name
- r(198);
- . display "`mypath'"
- C:\Documents and Settings\MyStata
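A sketch of the two common uses of a local macro, assuming the auto data; the macro name vars is hypothetical. Unquoted, the macro's text is substituted straight into the command line, which is handy for a list of variables; inside double quotes it becomes a string, which is what display needs for a path containing spaces.

```stata
webuse auto, clear

* A local macro holding a variable list; `vars' is replaced by its contents
local vars price mpg weight
summarize `vars'

* Inside double quotes the expanded text is a string, so spaces are kept
local mypath "C:\Documents and Settings\MyStata"
display "`mypath'"
```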
30 Example: foreach, return, display
- see samplePrograms.do, runLoop
- foreach var of varlist tenure-ln_wage {
- quietly summarize `var'
- local varmean = r(mean)
- display "Variable `var' has mean `varmean'"
- }
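A runnable version of this loop, assuming the loop is meant for the nlswork data, where tenure through ln_wage are the last variables (tenure, hours, wks_work, ln_wage). It copies r(mean) into a local right away because the next r-class command overwrites it, formats the mean with %9.3f, and uses return list to show everything summarize leaves behind.

```stata
webuse nlswork, clear

foreach var of varlist tenure-ln_wage {
    quietly summarize `var'
    * Copy r(mean) now: the next r-class command overwrites it
    local varmean = r(mean)
    display "Variable `var' has mean " %9.3f `varmean'
}

* return list shows the r() results left by the last summarize (ln_wage)
return list
```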
31 Example: forvalues, display
- see samplePrograms.do, runCount
- forvalues counter = 1/10 {
- display `counter'
- }
- forvalues counter = 0(2)10 {
- display `counter'
- }
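A sketch that puts the loop counter to work instead of only displaying it; the macro names total and evens are hypothetical. The first loop keeps a running sum of 1 to 10 in a local macro, and the second collects the values 0, 2, ..., 10 into one space-separated list.

```stata
* Running total over 1, 2, ..., 10
local total = 0
forvalues counter = 1/10 {
    local total = `total' + `counter'
}
display "Sum of 1 to 10 = `total'"

* Collect 0, 2, ..., 10 into one macro, separated by spaces
local evens
forvalues counter = 0(2)10 {
    local evens `evens' `counter'
}
display "Even numbers: `evens'"
```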
32 Example: forvalues, generating random variables
- see samplePrograms.do, runRandomGen
- forvalues j = 1/3 {
- generate x`j' = uniform()
- generate y`j' = invnormal(uniform())
- }
- foreach x of varlist x1 x2 x3 y1 y2 y3 {
- summarize `x'
- }
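A self-contained version of this example, assuming Stata 10.1 or later for runiform() and rnormal(), the newer equivalents of uniform() and invnormal(uniform()). The set obs and set seed lines are additions not on the slide (generate needs observations to fill), and the seed value is arbitrary. The loop creates the variables in the order x1 y1 x2 y2 x3 y3, so a range such as x1-x3 would also pick up y1 and y2; listing the names avoids that.

```stata
clear
set obs 1000          // generate needs observations to fill
set seed 20060601     // arbitrary seed, makes the draws reproducible

forvalues j = 1/3 {
    generate x`j' = runiform()    // uniform on [0, 1)
    generate y`j' = rnormal()     // standard normal
}

* List the names explicitly: x1-x3 would include y1 and y2
foreach v of varlist x1 x2 x3 y1 y2 y3 {
    summarize `v'
}
```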
33 Example: if/else
- see samplePrograms.do, runIfElse
- foreach var of varlist tenure-ln_wage {
- quietly summarize `var'
- local varmean = r(mean)
- if `varmean' > 10 {
- display "`var' has mean greater than 10"
- }
- else {
- display "`var' has mean less than or equal to 10"
- }
- }
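A sketch of a common trap with the if command used above, on a hypothetical two-observation variable x (5 and 20). The if qualifier is checked for every observation, but the if command is evaluated once, and a bare variable name in it refers to the first observation only, here x[1] = 5. That is why the slide compares a scalar stored in a local macro rather than a variable.

```stata
clear
input x
5
20
end

* if qualifier: checked separately for every observation
generate big = 0
replace big = 1 if x > 10

* if command: checked once; a bare variable name means its first observation
if x > 10 {
    display "x[1] is greater than 10"
}
else {
    display "x[1] is not greater than 10"
}
```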
34 Special Topic: Regular Expressions
- webuse auto
- List all values of make starting with a capital and containing an additional capital
- list make if regexm(make, "^[A-Z].*[A-Z]")
- AND ending in a number
- list make if regexm(make, "^[A-Z].*[A-Z].*[0-9]$")
35 Special Topic: Accessing Data in Another Database
- odbc list
- odbc query testStata
- odbc describe "Summary2006"
- odbc load year type session order author1 author2, table("Summary2006") dsn("testStata")
36 Special Topic: Exporting Results Using outreg
- User-written program called outreg
- From within Stata, type findit outreg
- Very simple!
- Basically, add one line of code after each regression to export the results
- For an example of code, see http://www.ats.ucla.edu/stat/stata/faq/outreg.htm
37 Getting Help in Stata
- help command_name
- abbreviated version of manual
- search
- search keywords, local
- search keywords, net
- search keywords, all
- findit keywords
- same as search keywords, all
- Search Stata Listserver and Stata FAQ
38 Stata Resources
- www.stata.com > Resources and Support
- Search Stata Listserver
- Search Stata (FAQ)
- Stata Journal (SJ)
- articles for subscribers
- programs free
- Stata Technical Bulletin (STB)
- replaced with the Stata Journal
- Articles available for purchase, programs free
- Courses (for fee)
39 Updating Stata