1INTRODUCTION TO STATA
Third group training course in application of information and communication
technology to production and dissemination of official statistics
10 May to 11 July 2007
Gereltuya Altankhuyag, Lecturer/Statistician,
UNSIAP gereltuya_at_unsiap.or.jp
2Objectives
- Gain experience in using STATA to
- Obtain descriptive statistics
- Tabulate
- Create Graphs
- Convert data files
- Create Do-files
- Create Log-files
- Use user-written programs (ado-files)
3Method of teaching
- Lectures using PowerPoint slides
- Performance of in-class practical exercises using real survey datasets
- Group assignment
- Presentation of group assignment
- Note: Subject to availability of datasets
4Contents
- Resources for learning and using Stata
- Getting started with Stata
- Basic commands to inspect datasets
- Basic commands to create and change variables, labeling etc.
- Basic commands to reorganize datasets
- Advanced commands: merging and appending
5Contents
- Basic commands of statistics
- Basic commands of graphics
- Programming using do files
- Use of ado-files
- Creating log-files
- Inputting data from keyboard, a file or
spreadsheet
7Resources for learning and using STATA
- STATA
- Is a statistical package for managing, analyzing and graphing data.
- Has both a command-driven and a menu-driven interface.
- Is designed for research.
- Has cross-platform compatibility: Windows, MacOS, Unix, Solaris.
8Resources for learning and using STATA
- STATA
- Has three versions:
- Small (restricted version); not available for Unix
- Intercooled (full version)
- Special Edition (Stata/SE)
- Newest release: Version 9, which we will use
9Resources for learning and using STATA
Feature            Stata/SE            Intercooled Stata    Small Stata
Speed              Fastest             Very fast            Fast
No. of variables   32766               2047                 99
Observations       Memory dependent    Memory dependent     1000 (approx.)
String variable    244 chars           80 chars             80 chars
Matrices           11000 x 11000       800 x 800            40 x 40
Version            Professional        Professional         Small computers
10Resources for learning and using STATA
- Manuals
- Getting Started
- User's Guide
- Reference
- The Stata website: http://www.stata.com
- The Stata Press website, which contains datasets used throughout the Stata manuals: http://www.stata-press.com
11Resources for learning and using STATA
- Additional subject-specific volumes may be purchased separately. These include:
- Longitudinal/Panel Data Reference Manual
- Mata Reference Manual
- Multivariate Statistics Reference Manual
- Programming Reference Manual
- Survey Data Reference Manual
- Survival Analysis and Epidemiological Tables Reference Manual
- Time-Series Reference Manual
12Resources for learning and using STATA
- The Stata listserver: an active group of Stata users who communicate over the Internet
- The Stata Journal: reviewed papers, regular columns, user-written software (http://www.stata-journal.com/)
- NetCourses: training offered via the Internet
- Books and other support materials
13Resources for learning and using STATA
- Technical support by email, phone or fax
- tech_support_at_stata.com
- To subscribe to Statalist, send email to
- majordomo_at_hsphsun2.harvard.edu
- body of message: subscribe statalist
14Resources for learning and using STATA
- Searchable Statalist archives at
- http://www.stata.com/statalist/archive
- This includes requests for programs, solutions
or advice, as well as answers and general
discussions.
15Resources for learning and using STATA
- STATA Features
- Command prompt driven
- Batch mode
- Interactive mode
- Modular by nature: Stata code can be shared and reused, and extensions are easy to write.
- Can incorporate survey design into the estimation process.
16Resources for learning and using STATA
- STATA Capabilities
- Elementary and Specialized Statistical Analysis
- Graphics: mostly 2D
- Data management: user-friendly
- Matrix Operations
17Resources for learning and using STATA
- Why STATA?
- Precise estimation for complex surveys
- User-written programs for non-standard estimation
- Excellent tools for panel data analysis
- Many econometric routines
- Command driven (in new version, menu-driven as
well)
18Resources for learning and using STATA
- Why STATA?
- Runs efficiently on many platforms
- Concise and clear documentation (with friendly technical support)
- Costs less to buy
- Contact: stata_at_stata.com
- or visit http://www.stata.com
19- Getting Started with STATA
20Getting Started
- STARTING UP
- Click Start → Programs → Stata → StataSE 9
- Alternatively, from Windows Explorer, go to the folder
- c:\stata9
- and double-click
- wstata.exe
23Getting Started
- Verifying version and installation of Stata
- Command: verinst
- Syntax: verinst
- Result:
- . verinst
- You are running Stata/SE 9.2 for Windows.
- Stata is correctly installed.
- You can type exit to exit Stata.
24Getting Started
- Comparing updates
- update
- Stata executable
- folder: \\Unitednations\Stata9\
- name of file: wsestata.exe
- currently installed: 21 Nov 2006
- Ado-file updates
- folder: \\Unitednations\Stata9\ado\updates\
- names of files: (various)
- currently installed: 21 Nov 2006
- Recommendation
- Type -update query- to compare these dates with what is available from http://www.stata.com.
25Getting Started
RESULTS WINDOW: results and commands displayed here
REVIEW WINDOW: past commands appear here
VARIABLE WINDOW: variable list shown here
COMMAND WINDOW: commands typed here
26Getting Started
- If one of the 4 windows is not displayed, say the VARIABLE WINDOW, click on
- Window → Variables
- or press CTRL+6
- You can type only in the Command window
- You cannot close the Results and Command windows
27Getting Started
- Window Colors
- Click on Prefs → General Preferences
28Getting Started
- Fonts of Windows
- The fonts or font size may be changed in each
window by clicking the upper left window button
and then clicking on Font.
29Getting Started
- The Command window
- Page Up: steps backwards through the command history
- Page Down: steps forward through the command history
- Tab: auto-completes a partially typed variable name
30Getting Started
- The Review window
- To enter a command from the Review window:
- Click once on a past command to copy it to the Command window
- Double-click on a past command to copy it to the Command window and execute it
31Getting Started
- The Review window
- Right-clicking on the Review window displays
- Save Review Contents
- Copy Review Contents to Clipboard
- Font
32Getting Started
- The Variables window
- To enter a variable from the Variables window:
- Click once on a variable to copy it to the Command window
- Double-click on a variable and the variable will be copied twice
- Right-clicking on the Variables window displays a menu:
- Define Notes for Variable varname, to open the Notes dialog for variable varname
- Font
33Getting Started
MENU BAR
TOOL BAR
34Getting Started
- Ask participants to open STATA
- Ask participants to open each command of the menu bar and explain it.
- Ask participants to point the cursor at each command of the tool bar and explain it.
35Getting Started
- In the Help option of the Menu/Header bar:
- Contents (for beginners unfamiliar with STATA commands)
- Search (for users who know the name of the command or topic they wish to search)
36Getting Started
- Obtaining Online Help on Commands
- For a user who wants more info on the regress command, enter
- help regress
- or use the Menu bar
- Help → STATA Command
37Getting Started
- Obtaining Topic Search
- We can do a search with the Menu bar
- Help → Search
- If you want to learn about regression, type
- search regression
- or about memory management
- search memory
38Getting Started
- Obtaining Net Search
- In the Search pop-up window, we can also do an internet search.
- Alternatively, we can issue the command
- net search regression
- to search on regression
39Getting Started
- Four ways of quitting from Stata
- Enter in Command Window
- exit
- Press the ALT+F4 keys
- Click on
- File → Exit/Clear
- Click on the Close button (X at the upper right-hand corner of the Stata window).
40Getting Started
- Reading Pre-Existing Stata Dataset
- (1) When the dataset is very large, we may want to enter
- set mem 64m
- Results:
Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.733M
    set memory          64M     max. data space                 64.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                                66.987M
- (2) STATA can read only one dataset at a time.
41Getting Started
- Reading Pre-Existing Stata Dataset
- In the folder c:\intropov\data we have three Stata files. Suppose we wish to read hh.dta; then enter
- use c:\intropov\data\hh.dta
- or alternatively, issue the two commands
- cd c:\intropov\data
- use hh
- Stata datasets have the file extension .dta
NOTE: The default folder is c:\DATA. We use the cd command to change directory.
42Getting Started clear
- Deletes all contents (data, variables, labels) from STATA's memory
- Does not delete any data already saved to the hard disk
- Does not clear the Review window contents
- Does not need any arguments
- Syntax
- clear
43Getting Started clear
- Use of clear command
- cd c:\intropov\data
- clear
- use hh
- Or
- use hh, clear
- or
- use c:\intropov\data\hh.dta, clear
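A minimal sketch of the use/clear sequence above. The course file hh.dta is not assumed to be available here, so the sketch substitutes auto.dta, an example dataset shipped with Stata and loaded with sysuse. The temporary file name mycopy is made up for the illustration.

```stata
* Load an example dataset shipped with Stata (a stand-in for hh.dta)
sysuse auto, clear

* Save a copy to a temporary file, which Stata deletes automatically
tempfile mycopy
save `mycopy'

* Change the data in memory: drop the first 10 observations
drop in 1/10

* Without the clear option, use would refuse to replace changed data;
* with it, the changed data are discarded and the saved copy is loaded
use `mycopy', clear
describe, short
```

Run it as a do-file so that the temporary file exists for the whole sequence.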
44Getting Started -Arithmetic operators
- + addition
- - subtraction
- * multiplication
- / division
- ^ power
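The arithmetic operators can be tried directly with the display command, which evaluates an expression and prints the result. A small sketch with arbitrary numbers:

```stata
* display evaluates an expression and prints the result
* addition
display 2 + 3
* subtraction
display 7 - 4
* multiplication
display 6 * 5
* division
display 9 / 2
* power
display 2 ^ 10
```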
45Getting Started - Relational operators
- > Greater than
- < Less than
- >= Greater than or equal
- <= Less than or equal
- == Equal
- ~= Not equal
- != Not equal
46Getting Started - Logical operators
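Stata's logical operators are & (and), | (or) and ! or ~ (not). The sketch below tries them, together with the relational operators, on a tiny made-up dataset; the variables age and sex and their values are invented for the illustration.

```stata
* Type in a small illustrative dataset (values are made up)
clear
input age sex
25 1
40 2
17 1
63 2
30 2
end

* Relational operators: note the double equals sign for comparison
count if age > 30
count if age <= 25
count if sex == 1
count if sex != 1

* Logical operators: & (and), | (or), ! (not)
count if age >= 18 & sex == 2
count if age < 18 | age > 60
count if !(sex == 1)
```

A single = is for assignment (as in generate); a test for equality always uses ==.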
47Getting Started - Numlist
- Numlist: a list of numbers.
- 1/3               three numbers: 1, 2, 3
- 3/1               the same three numbers in reverse order
- -8/-5             four numbers: -8, -7, -6, -5
- 1 2 to 4          four numbers: 1, 2, 3, 4
- 10 15 to 30       five numbers: 10, 15, 20, 25, 30
- 1 2:4             same as 1 2 to 4
- 10 15:30          same as 10 15 to 30
- 1(1)3             three numbers: 1, 2, 3
- 1(2)9             five numbers: 1, 3, 5, 7, 9
- 9(-2)1            five numbers: 9, 7, 5, 3, 1
- 1 2 3/5 8(2)12    eight numbers: 1, 2, 3, 4, 5, 8, 10, 12
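One way to check how a numlist expands is to loop over it with foreach, or to pass it to the numlist command, which stores the expanded list in r(numlist). A short sketch using the examples from the slide:

```stata
* Loop over a numlist: the loop body runs once per number
foreach i of numlist 1 2 3/5 8(2)12 {
    display `i'
}

* The numlist command expands a list and returns it in r(numlist)
numlist "10 15 to 30"
display "`r(numlist)'"
```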
48Getting Started - Syntax
- The basic Stata language syntax is
- [by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [, options]
49Getting Started
- varlist denotes a list of variable names
- exp denotes an algebraic expression
- command denotes a Stata command
- options denotes a list of options. Many commands take command-specific options. Options are given by typing a comma at the end of the command, followed by the options you want to use. For instance: sum, detail
50Getting Started
- if / by / in
- these are not commands
- they are attached to a command to restrict or group the observations it acts on
51Getting Started
- if exp restricts the scope of a command to those observations for which the value of the expression is true
- if exp is added at the end of a command, after the variable list if there is one
- Syntax
- command ... if sex == "male"
52Getting Started
- by varlist: asks Stata to repeat a command for each subset of the data for which the values of the variables in varlist are equal
- by varlist: is added before the command and separated from it by a colon
- Syntax
- by sex: command ...
- Note: sort the dataset by sex before running this syntax.
53Getting Started
- in range restricts the scope of the command to a specific observation range.
- in range is added at the end of a command, after the variable list if there is one
- Syntax
- command ... in 1/100
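The three qualifiers if, by and in can be sketched on auto.dta, an example dataset shipped with Stata. It is used here only as a stand-in, because the course datasets and their sex variable are not assumed to be available; foreign, price and make are variables of auto.dta.

```stata
sysuse auto, clear

* if: restrict the command to observations meeting a condition
summarize price if foreign == 1

* by: repeat the command for each group; the data must be sorted first
sort foreign
by foreign: summarize price

* in: restrict the command to a range of observation numbers
list make price in 1/5
```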
54Getting Started
- Command, option and variable names may be abbreviated to the shortest string of characters that uniquely identifies them
- . summarize region, detail
- . sum reg, d
- Stata is case sensitive; Stata commands are lowercase
- Summarize, SUMMARIZE and summarize are three distinct names
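A brief sketch of abbreviation and case sensitivity, using auto.dta (an example dataset shipped with Stata) in place of the course data. The capture prefix is used here only so that the deliberately wrong command does not stop a do-file.

```stata
sysuse auto, clear

* Full form and abbreviated form give the same output
summarize price, detail
sum pri, d

* Command names are case sensitive: Summarize is not a Stata command,
* so the return code _rc is nonzero
capture Summarize price
display _rc
```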
55Getting Started - naming
- A name is a sequence of 1 to 32 letters (A-Z and a-z), digits (0-9) and underscores (_).
- The first character of a name must be a letter or an underscore
- Do not begin variable names with an underscore
- All of Stata's built-in variables begin with an underscore
56Getting Started - naming
- Stata reserves the following names
_all double long _rc
_b float _n _se
byte if _N _skip
_coef in _pi using
_cons int _pred with
57Getting Started prefix commands
- Prefix commands are used to prefix Stata commands
- An example of prefix command syntax:
- by varlist, option: command
- by region, sort: sum educhead agehead
- region is by's varlist
- sort is by's option
58Getting Started prefix commands
- Examples of prefix commands in Stata are
Prefix command   Description
by               Run command on subsets of the data
svy              Run command and adjust results for survey sampling
stepwise         Run command with stepwise variable inclusion/exclusion
capture          Run command and capture its return code
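Two of the prefix commands in the table, by and capture, can be sketched as follows. The example uses auto.dta, shipped with Stata, because the variables region, educhead and agehead belong to the course data; the variable name nosuchvar is deliberately nonexistent.

```stata
sysuse auto, clear

* by with the sort option sorts the data, then repeats the command per group
by foreign, sort: summarize mpg

* bysort is shorthand for by ..., sort
bysort foreign: summarize mpg

* capture runs a command, suppresses its output and any error,
* and leaves the return code in _rc (0 means success)
capture confirm variable nosuchvar
display _rc
```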
59Getting Started - weight
- Weights are used for
- Estimating population values from a sample
- Compensating for under- or over-representation of households (HH) in a sample
60Getting Started - weight
- weight indicates the weight to be attached to each observation. The syntax of weight is
- [weightword=exp]
- weightword is not a Stata command
- where weightword is one of the following:
61Getting Started - weight
Weightword   Meaning
weight       Default treatment of weights
fweight      Frequency weights
pweight      Sampling weights
aweight      Analytic weights
iweight      Importance weights
62Getting Started - weight
- Frequency weight (fweight)
- Indicates the number of duplicated observations. Must take integer values.
- For instance, if the fweight associated with an observation is 5, there are 5 such identical observations.
- Syntax
- command varname [weightword=weightvar]
63Getting Started - weight
- Sampling weight (pweight)
- The inverse of the probability that this observation was included in the sample.
- For instance, a pweight of 100 indicates that this observation is representative of 100 subjects
- Syntax
- command varname [weightword=weightvar]
64Getting Started - weight
- Analytic weight (aweight)
- Inversely proportional to the variance of an observation: the variance of the jth observation is assumed to be σ²/w_j.
- Useful when working with data that contain averages.
- Syntax
- command varname [weightword=weightvar]
65Getting Started - weight
- importance weight (iweight)
- No formal definition available
- Indicates the relative importance of the observation
- Syntax
- command varname [weightword=weightvar]
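A small sketch of how the weight syntax looks in practice, on a made-up three-row dataset in which n plays the role of weightvar. The variable names score and n and all values are invented for the illustration. summarize accepts fweight, aweight and iweight but not pweight, so the sampling-weight line uses the mean command instead.

```stata
* Three distinct values of score; n says how often each occurs (made-up data)
clear
input score n
1 3
2 5
3 2
end

* Unweighted: Stata sees 3 observations
summarize score

* Frequency weights: treated as 3 + 5 + 2 = 10 observations
summarize score [fweight = n]

* Analytic weights: same weighted mean, but N stays at 3
summarize score [aweight = n]

* Sampling weights are not allowed with summarize; mean accepts them
mean score [pweight = n]
```

With frequency weights the weighted mean is (1*3 + 2*5 + 3*2)/10 = 1.9, compared with an unweighted mean of 2.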
66Now please proceed to perform EXERCISE 1