Title: Why STATA
1. Why STATA?
You have a question
2. You have the data
[Figure: a block of binary digits representing raw data]
You need the answer
3. Data analysis software paradigm
[Diagram labels: Your Question, HYS or BRFSS data, STATA software, Your Answer]
4. Data analysis software paradigm
[Diagram labels: Data, Commands, STATA software, Output]
5. OK, STATA, really?
6. Why not SPSS, SAS, SUDAAN or Excel?
- STATA is cheaper (perpetual license)
- STATA works easily with survey data (complex sampling designs)
- STATA works fairly easily with data, in general
- I only know STATA
- STATA is cool
7. Orientation to STATA
8. What You Will Learn
- How to open and close STATA
- What STATA's windows include
- What options are available on the Shortcut Bar
- What options are available in the Drop Down Menus
9. Opening STATA
- Like any other Windows app
  - Can be opened from the Start menu
  - Shortcut on desktop
  - Shortcut on task bar
- Do not double-click on a STATA data file to open STATA: it will open STATA, but most likely not your data file (you need to set STATA's reserved memory capacity first)
10. Closing STATA
- Like any other Windows app
  - Using menus: File > Exit
  - Click on the X in the upper right of the window
- Either option will prompt you to save the work on your data
- NOTES
  - Always be careful when saving STATA data files: you may be saving changes that you don't want to keep. To be safe, save under another file name and always keep your original.
11. Look at STATA's Windows
- Results
  - Biggest window (black background)
  - Where all the results appear
- Command
  - Small window under the Results window
  - Where you type in your STATA commands
12. Look at STATA's Windows
- Review
  - Medium window in upper left corner
  - Where STATA documents every command you type into the Command window, including erroneous commands
- Variable
  - Medium window in lower left corner
  - Where all the variables of an open dataset appear, with variable descriptions
13. The Shortcut Bar
- Functions like all Windows apps
- Shortcut icons are underneath the menu bar
- Clicking on a shortcut icon once will activate it
- Holding your cursor over an icon tells you what it is
- Not all shortcuts are useful
14. The Shortcut Bar (key features)
15. The Shortcut Bar (other features)
16. Drop Down Menus
- Function like all Windows apps
- Menu bar is at the top of the window
- Navigation through menu options functions like all Windows apps
- Most menus related to STATA commands are more trouble than they are worth
17. Drop Down Menus (key features)
18. Drop Down Menus (other features)
19. Start to use STATA to look at the Healthy Youth Survey
20. What You Will Learn
- How to use a Log file (records what you do)
- How to open Data files
- How to save Data files
- How to explore variables
- Generating new variables
- Collapsing and recoding variables
- Labeling variables
21. Before we begin: how we will be presenting this to you
- What is STATA language?
  - The stuff we type into the Command window to tell STATA what to do
- How we will learn STATA: via examples
- On the slides, STATA language is in typewriter font (AKA Courier font)
22. As we learn STATA, let's think about a research question
How many students in each grade report they smoked cigarettes on any days in the past 30 days?
23. Before Opening Data Files
- Set STATA's storage capacity first
- Usually setting 100 megabytes is okay
- (Basically reserves memory from the system for opening data files)
  - set memory 100000k
  - OR
  - set mem 100m
24. Log files: Keeping Track of What You Do
- Log files document all your actions in STATA.
- There are 2 types
  - .log files: open in Notepad, WordPad, MS Word, and other text editors
  - .smcl files (pronounced to rhyme with "pickle"): open in STATA only; great for copying and pasting tables into Excel
- .log is recommended for general, portable documentation
25. Using and Manipulating Log files
- Opening log files
  - Click the brown book icon in the toolbar
  - Then, in the Save dialog window, select .log from "Save as type"
26. Using and Manipulating Log files
- Closing log files
  - Click the brown book icon in the toolbar and select "Close log file", OR
  - Type log close
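The toolbar steps above also have a typed equivalent. A minimal sketch, assuming a hypothetical file name (hys_session.log) in the current working directory and no log already open:

```stata
* Open a plain-text log. The file name is a hypothetical placeholder.
* "text" writes a .log file instead of the default .smcl format;
* "replace" overwrites an existing log with the same name.
log using "hys_session.log", text replace

* Every command typed from here on, with its output, goes into the log.
display "This line is recorded in the log"

* Close the log when you are done.
log close
```

Typing the commands has one advantage over the icon: the log file name and type end up recorded alongside the rest of your work.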
27. Opening Data Files
- 3 options
- Two point-and-click options (the easiest)
  - Use menu option: File > Open
  - Click the folder icon in the toolbar
- Line command: need file path and file name
  - use filepath\filename, clear
- OKAY, let's open state 2008.dta
28. Saving Data Files
- 3 options
- Two point-and-click options (easiest)
  - Use menu option: File > Save As
  - Click the disk icon in the toolbar
  - NOTE: STATA will ask you if you want to save over or create a new file
- Line command: need file path
  - save filepath\filename
  - NOTE: STATA will not ask you if you want to save over old files; it replaces them when the replace option is specified
29. Saving Data Files
- It is important to have a backup dataset and a working dataset.
- You may (will) accidentally save over an old dataset and permanently change your data
- Okay, let's save a working Healthy Youth Survey dataset
- Save as state 2008 working.dta
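The open-then-save-a-working-copy routine can be sketched with line commands. The example below builds a tiny made-up dataset so it runs without the HYS file; the file names demo_original.dta and demo_working.dta are hypothetical stand-ins for state 2008.dta and state 2008 working.dta.

```stata
* Build a small stand-in dataset. d14 and grade mimic HYS variable
* names, but the values are invented for illustration.
clear
input d14 grade
1 6
2 8
6 10
. 12
end

* Save the stand-in "original", then open it the way you would open
* the real file. "clear" discards whatever data are already in memory.
save "demo_original.dta", replace
use "demo_original.dta", clear

* Save a working copy under a new name right away and edit only that.
* "replace" overwrites an existing file WITHOUT asking.
save "demo_working.dta", replace
```

File names that contain spaces need double quotes, so the workshop file would be opened with use "state 2008.dta", clear.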
30. Exploring Variables
- Describing a variable using codebook
  - General info on variable, missing values, some labeling info, data type
- Example
  - codebook d14
31. Exploring Variables
- More descriptive information using tab
  - Distribution of the values of a variable (percents)
- Example
  - tab d14
32. Exploring Variables
- Can't find the variable?
  - Use the data dictionary/codebook
  - Scroll through the Variables window
  - Use a command
    - aorder alphabetizes variable names
    - You can search the variable list via key words
    - lookfor <type a key word>
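The exploring commands from the last three slides can be tried together on a toy dataset. This is a sketch only: the values and the variable label below are invented, not taken from HYS.

```stata
* Toy data standing in for HYS; values and label are invented.
clear
input d14 grade
1 6
1 8
2 8
6 10
. 12
end
label variable d14 "Days smoked cigarettes, past 30 days"

* General description: type, range, missing values, labels.
codebook d14

* Distribution of values. By default tab leaves out missing values;
* the "missing" option adds a row for them.
tab d14
tab d14, missing

* Search variable names and labels for a key word.
lookfor smok

* Alphabetize the variable list.
aorder
```

lookfor matches on both the variable name and the variable label, which is why the key word finds d14 here even though "smok" is not in its name.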
33. What was our research question again?
How many students in each grade report they smoked cigarettes on any days in the past 30 days?
34. Generating, Collapsing and Recoding Variables
- Make a variable for tinkering using gen
  - Example: gen smokers = d14
- Note
  - Use the tab command to check to see if your new variable came out like you planned
  - tab d14 smokers
35. Generating, Collapsing and Recoding Variables
Now, let's change the coding of smokers
Change from (codebook d14):
  Freq.   Numeric   Label
  26597   1         none
    943   2         1-2 days
    405   3         3-5 days
    301   4         6-9 days
    489   5         10-29 days
    689   6         all 30 days
    922   .
Change to (codebook smokers):
  Freq.   Numeric   Label
  26597   0         No
   2827   1         Yes
    922   .
36. Generating, Collapsing and Recoding Variables
- Modifying a tinkering variable using recode
- Example
  - recode smokers 1=0 2=1 3=1 4=1 5=1 6=1
37. Generating, Collapsing and Recoding Variables
- Let's check our recode
- Use the tab command to check to see if your new variable came out like you planned
  - tab d14 smokers
- NOTE: New variables are listed at the bottom of the list in the Variables window
38. Generating, Collapsing and Recoding Variables
- Shortcut: tinkering variable using recode
  - drop smokers
  - gen smokers = d14
- Shortcut
  - recode smokers 1=0 2/6=1
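The generate, recode and check steps can be run end to end on a toy dataset. A minimal sketch, assuming made-up data with one row per d14 code; the recode rules are written with parentheses, which is the form current STATA documentation uses.

```stata
* One row for each d14 code plus one missing value (invented data).
clear
input d14
1
2
3
4
5
6
.
end

* Copy d14 into a tinkering variable, then collapse codes 2-6 into 1.
gen smokers = d14
recode smokers (1 = 0) (2/6 = 1)

* Check the result against the original. "missing" shows that the
* missing value stayed missing instead of being recoded.
tab d14 smokers, missing
```

Because the rules only mention codes 1 through 6, the missing row is left untouched, which matches the "Change to" table on slide 35.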
39. Generating, Collapsing and Recoding Variables
- NOTE regarding drop/keep
  - drop <variable name>: will drop the variable from your dataset (no undos)
  - keep <variable name>: will drop all variables from your dataset except the variable you specify (no undos)
  - drop <conditional statement>: will drop all respondents from the dataset that do meet these criteria (no undos)
  - keep <conditional statement>: will drop all respondents from the dataset that do not meet these criteria (no undos)
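The four forms of drop and keep look like this in practice. A sketch on invented data; because none of these commands can be undone, run them only on a working copy of your dataset.

```stata
* Toy data (invented values).
clear
input d14 grade
1 6
2 8
6 10
. 12
end

* Drop respondents who DO meet a condition: removes the grade 6 row.
drop if grade == 6

* Keep only respondents who meet a condition: removes the grade 8 row.
keep if grade >= 10

* Keep only the named variable: grade is removed from the dataset.
keep d14
```

Conditional forms take the word if before the condition. After the three commands, two respondents and one variable remain.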
40. Generating, Collapsing and Recoding Variables
- Relational operators that are used for if statements: especially handy for creating a variable from two different variables
  - > greater than
  - < less than
  - >= greater than or equal to
  - <= less than or equal to
  - == equal
  - ~= not equal
  - != not equal
- Also: & (and), | (or)
41. Generating, Collapsing and Recoding Variables
- Using relational operators
- Example
  - gen nosmokers = 1 if (d14 == 1)
  - replace nosmokers = 0 if (d14 > 1 & d14 <= 6)
- NOTE: STATA sees missing values as a maximum value, so a condition such as d14 > 1 on its own would also match missing values.
42. Generating, Collapsing and Recoding Variables
- Using relational operators (continued)
- Example
  - gen nosmokers = 1 if (d14 == 1)
  - replace nosmokers = 0 if (d14 > 1 & d14 <= 6)
43. Generating, Collapsing and Recoding Variables
- Let's check our conditional recode
- Use the tab command to check to see if your new variable came out like you planned
  - tab d14 nosmokers
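The note about missing values is easiest to see side by side. In this sketch (invented data), careless is a hypothetical variable built without an upper bound, to show what goes wrong.

```stata
* Toy data with one missing value (invented).
clear
input d14
1
2
6
.
end

* Bounded version: the missing row matches neither condition,
* so nosmokers stays missing for it.
gen nosmokers = 1 if d14 == 1
replace nosmokers = 0 if d14 > 1 & d14 <= 6

* Unbounded version, for comparison only: missing counts as larger
* than any number, so the missing row is wrongly set to 0.
gen careless = 0 if d14 > 1

tab d14 nosmokers, missing
tab d14 careless, missing
```

An alternative to the upper bound is to exclude missing values explicitly, for example if d14 > 1 & !missing(d14).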
44. Labeling Variables
- Labeling of variable using label variable
- Example
  - label variable smokers "Current cigarette smoker"
- NOTE: the label will appear in the upper right-hand corner of the codebook results for the variable. It will also appear in the Variables window.
45. Labeling Variables
- Generating a value label using label define
- Example
  - label define yesno 1 "Yes" 0 "No"
- NOTE: This is a local nametag that can be used anywhere in your dataset for multiple variables
46. Labeling Variables
- Attaching the value label to the variable using label value
- Example
  - label value smokers yesno
- NOTE: you can call up all labels with label list
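The three labeling steps fit together as follows. A minimal sketch on an invented three-row dataset, so it can be run without the HYS file:

```stata
* Toy recoded variable (invented data).
clear
input smokers
0
1
.
end

* 1) Describe the variable itself.
label variable smokers "Current cigarette smoker"

* 2) Define a reusable set of value labels called yesno.
label define yesno 1 "Yes" 0 "No"

* 3) Attach that set to the variable.
label values smokers yesno

* Review the defined labels, then see them used in a table.
label list yesno
tab smokers
```

Defining yesno once and attaching it to each 0/1 variable keeps the wording identical across every table that uses it.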
47. Back to our research question
How many students in each grade smoke cigarettes?
tab smokers grade
48. Exercise 1: Generating, Collapsing, Recoding, and Labeling
49. Running Basic Frequencies
50. What You Will Learn
- Setting up STATA for survey analysis
- Running basic frequencies
51. Preparing Survey Data for Analysis (finally)
- For survey analysis, STATA needs to know about 1) weights and 2) design information
- NOTE: you should always check to see if STATA already knows
- Here is how
  - Example: svyset
  - This command will tell us about weights, strata and primary sampling units (psus)
52. Setting up for HYS Analysis
- For HYS analysis, your weighting and design depend on what type of data you have.
- Weighting
  - For state, county, district and school building level analysis there is no weighting, so we create a fake weight that is equal to 1
  - gen fakewt = 1
53. Setting up for HYS Analysis
- For HYS analysis, your weighting and design depend on what type of data you have.
- Design information
  - For sampled data, psu = schgrd
  - For example:
    - 2008 State sample
    - King, Pierce, Snohomish (all grades)
    - Clark (6th and 8th)
    - Spokane, Thurston (6th)
  - For other counties, districts and buildings, psu = students
  - STATA defaults to individual students
54. Setting up for HYS Analysis
- Here is how to tell STATA about your data
- For state sample and sampled counties
  - svyset [pweight=fakewt], psu(schgrd)
- For other counties, districts and school buildings
  - svyset [pweight=fakewt]
- For ESDs: LOOK IN THE WA HYS Data Analysis Technical Assistance Manual (website link on last slide of Day 3)
55. Setting up for HYS Analysis
- One last note about STATA and weights and design information
  - Strata and psu do not have to be designated in order for the svyset command to execute.
  - STATA will not warn you when they are left out, so excluding these design variables could yield erroneous standard errors and confidence intervals.
56. Setting up for Analysis
- How to change weighting and design variables
  - Clear it all out
    - svyset, clear
  - Then redefine weighting and design variables using svyset
- Notes
  - STATA will remember your weighting and design variable designations as long as the data file is open
  - If data are saved after designation, then STATA will remember your designation next time.
  - Always type in svyset to find out what has been designated
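The check, clear and redefine cycle can be sketched on synthetic data. The variable names schgrd and fakewt follow the slides, but the 40 observations and 8 clusters below are invented. This sketch uses the syntax of more recent STATA releases, where the psu variable comes before the weight; the slides' form, svyset [pweight=fakewt], psu(schgrd), is the older spelling of the same design.

```stata
* Synthetic data: 40 students in 8 school-grade clusters (invented).
clear
set obs 40
gen schgrd = ceil(_n / 5)
gen fakewt = 1

* Weights only: STATA treats each student as his or her own psu.
svyset [pweight = fakewt]

* Check what STATA currently knows.
svyset

* Clear it all out, then redefine with the psu included.
svyset, clear
svyset schgrd [pweight = fakewt]

* Check again before running any analysis.
svyset
```

Typing svyset with nothing after it changes nothing; it only reports the current settings, so it is safe to run at any time.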
57. Okay, some useful survey data analysis commands
58. Before we get started, let's use a research question again
What is the percent of 10th graders who smoke cigarettes?
59. Running Basic Frequencies
- One-way weighted tabulations with svytab
- Example
  - svytab d14use if grade == 10
60. Running Basic Frequencies
- Special NOTE re: use of if
- Example
  - svytab d14use if grade == 10
NOTE: using an if statement in a tabulation command may yield inaccurate standard errors, depending on the sampling design. For HYS, grade was part of the psu sampling strategy, so a conditional statement here is okay. Normally this is not recommended for survey data.
61. Output Formatting Options
- Output options follow a comma after your svytab statement
  - col or row: specify the direction of the proportional tabulation
  - ci: specifies to include asymmetrical confidence intervals
  - se: specifies to include the standard error (used for calculating symmetrical confidence intervals)
  - obs: specifies that numbers of actual respondents are included in the results
62. Output Formatting Options
- There are a number of different codes that you can include to format your STATA output
  - per: will produce estimates as percents
  - format(%3.1f): will produce estimates to one decimal place
63. Running Basic Frequencies
- Let's add some options
- Example
  - svytab d14use if grade == 10, col ci se obs per
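For readers on a more recent STATA release, the same kind of table is requested with the svy: tabulate prefix form instead of svytab. The sketch below runs on synthetic data (40 invented students in 8 clusters, with a made-up 0/1 smokers variable), so the numbers it prints mean nothing; only the command shapes matter.

```stata
* Synthetic data (invented): 8 clusters of 5 students, two grades.
clear
set obs 40
gen schgrd  = ceil(_n / 5)
gen grade   = cond(schgrd <= 4, 8, 10)
gen smokers = mod(_n, 4) == 0
gen fakewt  = 1
svyset schgrd [pweight = fakewt]

* One-way table for 10th graders with the options from the slides:
* percents, confidence intervals, standard errors, respondent counts.
svy: tabulate smokers if grade == 10, percent ci se obs format(%3.1f)

* Alternative to if: subpop() keeps the full design for the standard
* errors and is the usual choice when the subgroup was not part of
* the sampling design.
svy, subpop(if grade == 10): tabulate smokers, percent ci se obs
```

The col and row options are left out here because they only apply to two-way tables.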
64. Standard error versus confidence interval
- Standard error: designated in option of tabulation command with se
  - Noted in output as value in parentheses (in crosstabs)
  - Used to create a margin of error
  - Multiply the standard error by 1.96 to get the 95% margin of error
  - Can be used to create symmetric confidence intervals using arithmetic and the point prevalence
  - May cross 0 or 100
65. Standard error versus confidence interval
- Confidence interval: designated in option of tabulation command with ci
  - Noted in output as values in square brackets (in crosstabs)
  - Calculated as asymmetric confidence intervals: never cross 0 or 100
  - Preferred for comparing confidence interval overlap to detect differences
  - Not as easy to communicate to a layperson
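The arithmetic behind the symmetric interval can be done with STATA as a calculator. The prevalence and standard error below are hypothetical numbers chosen for illustration, not HYS results.

```stata
* Hypothetical estimate: 10.0 percent with a standard error of 1.5.
scalar p   = 10.0
scalar se  = 1.5

* 95% margin of error = 1.96 x standard error.
scalar moe = 1.96 * se

* Symmetric interval = point prevalence minus/plus the margin.
scalar lo  = p - moe
scalar hi  = p + moe
display "Margin of error: " %4.2f moe
display "95% CI: " %4.2f lo " to " %4.2f hi

* Same standard error around a rare outcome (2.0 percent):
* the lower bound falls below 0, which is impossible for a percent.
scalar lo_rare = 2.0 - 1.96 * 1.5
display "Lower bound for a 2.0 percent estimate: " %5.2f lo_rare
```

The second calculation is the "may cross 0 or 100" problem from slide 64, and it is the reason the asymmetric intervals from the ci option are preferred for estimates near either end of the scale.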
66. Exercise 2: Survey Analysis: Running basic frequencies