STATA Lab: EP521

1
STATA Lab EP521, Session 1: Exploring Data
Ray Boston, boston_at_vet.upenn.edu
Room 604 Blockley, 610 925 6557
2
Problem: Woodward presents the following table (Table 2.9, p. 48) relating sex to smoking status in the Scottish Heart Health Study. Adapt the information in this table for analysis with STATA.
Variables: sex, smoker, count
Coding:
smoker: 0 = non-smoker, 1 = smoker
sex: 0 = female, 1 = male
count: actual cell count
3
The data can be entered into STATA via the data editor.
Label the values of sex and smoker so that our tables make sense:
. label define smlabel 0 "Non smoker" 1 "Smoker"
. label define selabel 0 "Female" 1 "Male"
. label val sex selabel
. label val smoker smlabel
Note that cell counts, and NOT margins, are entered into STATA.
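As an alternative to typing into the data editor, the same four cells can be entered from a do file with the input command. This is a minimal sketch, not the method used in the slides; the counts and codes are the ones listed on the next slide.

```stata
* Enter the four cell counts (not the margins) directly
clear
input sex smoker count
0 0 2259
0 1 1562
1 0 2241
1 1 2279
end

* Attach the value labels described above
label define smlabel 0 "Non smoker" 1 "Smoker"
label define selabel 0 "Female" 1 "Male"
label values sex selabel
label values smoker smlabel
list
```

Entering the data this way keeps a text record of exactly what was typed, which makes later checking easier.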
4
Label the variable count:
. label var count "Cell count"
For some preliminary limbering up, let's explore the data as it stands:
. list sex smoker count
        sex       smoker   count
  1. Female       Smoker    1562
  2.   Male   Non smoker    2241
  3. Female   Non smoker    2259
  4.   Male       Smoker    2279
Why wasn't count labeled like sex and smoker?
We should now save the table as a file:
. save "table 2_9 Woodward.dta", replace
Where was the data saved? (cd)
Why did we include the replace option? (pre-existence)
Why do we refer to replace as an option? (,)
Why did we use quotes (") around the file name? (space)
What format was the data saved in? (.dta)
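One way to check the answers to these questions is sketched below. It re-enters the table so that it runs on its own, and it assumes you are content to write the file into whatever the current working directory happens to be.

```stata
* Re-create the table so the example is self-contained
clear
input sex smoker count
0 0 2259
0 1 1562
1 0 2241
1 1 2279
end

* Show the current working directory: this is where save will write
pwd

* Quotes are needed because the file name contains a space;
* replace overwrites any file of the same name that already exists
save "table 2_9 Woodward.dta", replace

* Confirm the file exists and inspect its contents without loading it
dir "*.dta"
describe using "table 2_9 Woodward.dta"
```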
5
Let's see a table of this data:
. table sex smoker [fw=count], row col
----------------------------------------------
          |             smoker
      sex | Non smoker      Smoker       Total
----------+-----------------------------------
   Female |      2,259       1,562       3,821
     Male |      2,241       2,279       4,520
          |
    Total |      4,500       3,841       8,341
----------------------------------------------
Let's see how we recall the coding schemes. Why would this be needed?
. codebook sex
sex                                                  (unlabeled)
----------------------------------------------------------------
         type:  numeric (byte)
        label:  selabel
        range:  [0,1]               units:  1
unique values:  2           coded missing:  0 / 4
   tabulation:  Freq.   Numeric   Label
                    2         0   Female
                    2         1   Male
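Besides codebook, the coding scheme can be recalled directly from the value label itself. The sketch below builds a tiny two-row dataset purely for illustration; only the label name selabel comes from the slides.

```stata
* Tiny illustrative dataset carrying the sex coding
clear
input sex
0
1
end
label define selabel 0 "Female" 1 "Male"
label values sex selabel

* Print the numeric code to text mapping held in the label
label list selabel

* Show which value label is attached to the variable
describe sex
```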
6
We will explore this data together using the Stata command sequence which follows.

First, some EXTREMELY important points:
In practice you will ALWAYS build your statistical exploration of data using command sequences such as we now demonstrate. Why?
The command sequence is ALWAYS retained on your computer in a disk file, usually close to the dataset (table 2_9 Woodward.dta) for which it was developed. Why?
Commands are stored as ordinary text in files called do files. Why?
Stata has a special editor, the do file editor, for the creation and editing of do files. Why?
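To make the idea of a do file concrete, here is a minimal skeleton. It assumes the dataset saved earlier sits in the current working directory; the log file name session1.log is hypothetical and not taken from the slides.

```stata
* session1.do: a minimal do file skeleton (file names are assumptions)

* Close any log left open by an earlier run, then start a fresh text log
capture log close
log using "session1.log", text replace

* Load the dataset saved earlier; clear discards any data in memory
use "table 2_9 Woodward.dta", clear

* Screening commands go here
list
describe

log close
```

Because the do file is plain text, the whole analysis can be rerun, corrected, and shared, and the log gives a dated record of what the commands produced.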

7
* Get the data into Stata
use "C:\Stata\EP521\Epi 521 04\Session 1\table 2_9 Woodward.dta", clear

* Information about the raw data: correctness/screening
* Screening the input using list, describe, summarize, codebook, inspect, and table variations
list
codebook
describe
summarize
summarize sex smoke [fw=count]
label define smlabel 0 "Non smoker" 1 "Smoker"
label define selabel 0 "Female" 1 "Male"
label val sex selabel
label val smoker smlabel
list

* If we want to copy the table to Excel: Select, then Edit, Copy Table, then Paste the following table
list, nolabel noobs clean
codebook
inspect
describe

* Some tables describing the data
tabulate sex [fw=count], su(smoke) mean
table sex [fw=count], c(mean smoke freq) format(%7.2f)
tabulate sex smoke [fw=count], chi
table sex smoke [fw=count], row col
tabstat smoke [fw=count], s(mean sd sem N) by(sex) long

* Present some simple graphs of this data
* Preparing to graph
preserve
collapse smoke [fw=count], by(sex)
gen pos=3+(sex==1)
8
* Stata 8 graphing commands
scatter smoke sex, c(l) ml(sex)
more
scatter smoke sex, c(l) ml(sex) mlabv(pos)
more

* Now for adjustments required by Stata 8 graphics syntax
#d ;
scatter smoke sex, c(l) ml(sex) mlabv(pos)
    title("Smoking Proportion By Sex")
    ytitle(" ") ylabel(,angle(0)) ;
#d cr
more

* Stata 7 graphing command
* gr7 requests a Stata 7 type graph
* You establish Stata 7 graph preferences using 'oldgprefs'
gr7 smoke sex, c(l) s([sex]) xlabel(0 1) ylabel l1("Smoking Proportion By Sex")
more

* Manual rr calculation
* Let's determine the male:female risk ratio for smoking
di "Risk ratio " max(smoke[1],smoke[2])/min(smoke[1],smoke[2])
restore

* Two other ways of determining risk ratio - rr
* Two alternate ways of looking at the data - Risk perspective
cs smoke sex [fw=count]
poisson smoke sex [fw=count], irr nolog

* Manual or calculation
* Using scalars let's calculate the male:female odds ratio for smoking
gsort sex -smoke
scalar prob_female=count[1]/(count[1]+count[2])
scalar odds_female=prob_female/(1-prob_female)
scalar prob_male=count[3]/(count[3]+count[4])
scalar odds_male=prob_male/(1-prob_male)
scalar odds_ratio=odds_male/odds_female
scalar list _all

* Two other ways of determining odds ratio - or
* Two alternate ways of looking at the data - Odds perspective
cc smoke sex [fw=count]
logit smoke sex [fw=count], or nolog
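As a quick cross-check that needs no dataset in memory, the same two ratios can be obtained from the four cell counts with Stata's immediate commands csi and cci. This is a sketch rather than part of the original command sequence; it treats smoking as the outcome and male sex as the exposure, and by hand arithmetic the risk ratio should come out at about 1.23 and the odds ratio at about 1.47.

```stata
* Argument order for csi and cci:
*   exposed cases, unexposed cases, exposed noncases, unexposed noncases
* Here: male smokers, female smokers, male non-smokers, female non-smokers
csi 2279 1562 2241 2259
cci 2279 1562 2241 2259

* The same ratios by hand from the table margins
display "Risk ratio = " (2279/4520)/(1562/3821)
display "Odds ratio = " (2279*2259)/(1562*2241)
```

If the immediate commands and the dataset-based commands disagree, the most likely cause is a data entry or weighting error.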
9
An exercise to get you started using Stata
productively on your own
10
The following table is from Kahn & Sempos (p. 81) and reflects a distillation of some information extracted from the Framingham study.
Ultimately we would like to use these numbers to possibly tell us:
to what degree blood pressure elevation disposes us to CHD
what is the overall risk for CHD amongst study participants in the table
how much is the risk of CHD elevated if we have high blood pressure
11
Getting the CHD data into STATA and naming the
variables. What do we mean by naming the
variables?
12
Perform the following tasks:
Screen the data entered to confirm its correctness. How could you generate the margins to add confidence here? Do it.
Label the variables appropriately. What constitutes appropriate labeling?
Save the Stata data file. Where did you save it? What format was used?
Verify that you have indeed saved the Stata data file.
Perform tests to verify that you have correctly prepared your data.
Tables
Reproduce the table in which the problem is first introduced.
Tabulate the proportion of subjects with CHD by blood pressure grouping.
Add standard error estimates to this table.
Are the proportions with CHD different by blood pressure group?
Graphs
Collapse the data into proportions with CHD, by blood pressure group.
Produce a simple Stata 8 graph of CHD proportion against blood pressure.
Add features to your graph to make it publication ready.
Produce a Stata 7 graph of the same data. Which was easier?
