Stata Seminar

1
Stata Seminar

Session 1
Francisco Jose Gonzalez Carreras
fjg23_at_sussex.ac.uk

2
Source

This is the source used. The sessions will be:
First session: hands-on session (Chapter 1).
Second session: grammar of Stata (Chapter 3).
Third session: creating and changing variables (Chapter 5).
Fourth session: charts and linear regression (Chapters 6 and 8).
This is the 2005 edition; a new one is forthcoming.
It is not in the library but can be borrowed by interlibrary loan (need to pay 2, though).

3
Starting Stata

Download the file kk.zip from
http://www.stata-press.com/data/kk.html
To start the session: Start > All Programs > Intercooled Stata.

4
Screen
Pop-Up menu
Past commands appear here
Results appear here
Working directory displayed here
Variable list displayed here
Commands typed appear here
5
Stata Screen

Changing the default windows:
Right-click in the Results window to change its font.
You can also move windows around.
If something was changed and you want to restore the original settings, go in the pop-up menu to Prefs > Manage Preferences > Load Preferences > Factory Settings (in version 10 go to Edit > Preferences; the rest is the same).

6
Analysis: Input commands

Type d in the Command window and press Return.
d is the abbreviation of describe, a command that describes the file in memory.
The number of observations and variables is zero because we have not loaded any file.
Memory shows how much of the working memory is being used. Data are loaded into RAM.
Sorted by: no sorting criterion yet.
We have not loaded any data, so let's load a file.

7
Analysis: Directory

The command cd changes the directory. We have to move to the directory where the files are. Type
cd c:\data\kk
to move there. This allows us to name a file without writing its whole path.
dir shows all the files contained in the directory.
When --more-- appears, press Enter to advance one line or the space bar to see the next screen. Press q to stop the output.
You can use dir *.dta, which shows only the files with the .dta extension.
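
The directory commands on this slide can be tried together, as in the short sketch below. The path is the one used on the slide and is an assumption about where kk.zip was unpacked; pwd is added only to confirm the current directory.

```stata
* Move to the folder holding the course files (path as on the slide; adjust to your machine)
cd c:\data\kk
* Show the current working directory to confirm the move
pwd
* List only the Stata data files in this directory
dir *.dta
```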

8
Analysing data: Loading data

To use a file, type
use data1
This command loads a Stata file into working memory (RAM).
Stata assumes it is a .dta file.
The default memory size is 1 megabyte, so for big files you will need to set a larger capacity with set memory.
Then type describe
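
A minimal sketch of the loading step, including the memory setting mentioned above. The size 10m is an arbitrary value chosen for illustration, not a figure from the slides. In Stata 12 and later, memory is managed automatically and set memory is no longer needed.

```stata
* set memory only works while no data are loaded, so clear first
clear
* Illustrative size; only needed in older Stata versions
set memory 10m
* Load data1.dta from the current directory and describe it
use data1
describe
```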

9
Analysing data: Variables and observations
10
Analysing data: Variables and observations

The file is a subsample of the German Socioeconomic Panel (GSOEP), a survey running since 1984 in which the same households, families and individuals are interviewed once a year.
On the screen:
There are 3,340 observations and 47 variables. This means that 47 pieces of information are stored for each individual.
The first variable, persnr, contains only the code that uniquely identifies each individual. Sometimes you will need to create such a code from other information.
Storage type has to do with the size of the variable and is important for saving resources (more on this in later sessions).
The variable label is a brief description of the variable (more on this later).

11
Analysing data: Looking at data

We have too many variables, so we get rid of some. Type
drop ymove-np9507
(drop removes the range of variables named in the command; keep removes the variables not named.)
To look at all observations, type list (then q to stop more screens!).
This is too much information to be useful. We will reduce it.
Let's focus on the second observation: a man, born in 1971, household head, single...
Missing value (.): the person was not asked the question or did not respond.

missing value
12
Analysing data: Looking more carefully

Listing data in this fashion is not useful, so we will be more specific.
We can follow list with variable names to list just those variables. Type
list gender income
But again we have more than 3,000 observations!
To narrow down our look we first use the in qualifier, which selects observations by their position in the current order. Type
sort income
list gender income in 1/10

13
Analysing data: Looking more carefully

What do these commands do?
sort income sorts the data in ascending order of income, so the person with the lowest income comes first. This establishes the order.
list gender income in 1/10 lists the first 10 observations. It shows gender and income for the ten respondents with the lowest income (remember that we sorted in ascending order by income).
What would this do?
list gender income in 2/4
It lists the individuals from the second to the fourth.
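
The in qualifier can also count from the end of the data. The sketch below is an addition to the slide: negative numbers count backwards and l stands for the last observation. Because Stata sorts missing values after all numbers, the last rows after sort income will be respondents with missing income, if there are any.

```stata
* Order the data from lowest to highest income (missing values go last)
sort income
* The ten lowest incomes
list gender income in 1/10
* The last five observations in the current order
list gender income in -5/l
```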

14
Analysing data: Summary statistics

To obtain summary statistics about income, type
summarize income
The output shows the number of observations used in the calculation, the arithmetic mean, the standard deviation, the minimum and the maximum.
You have only 3,034 observations because the rest are missing (.) and are not taken into account in the calculations.
You can summarize several variables by adding them to the list. To summarize all the variables, just type summarize (the abbreviation sum also works).

15
Analysing data: if qualifier

What if we wanted to look at the difference in income between women and men? We use the if qualifier, which restricts the command to the observations that meet the if condition. Type
summarize income if gender==1
summarize income if gender==2
Each command summarizes only the observations in which gender equals the given value. In this survey 1 refers to males and 2 to females.
See the difference in income.
The double equals sign is necessary; otherwise Stata reports invalid syntax.

16
Analysing data: Missing values

Men seem to earn more than women. But these averages include the observations with income==0. These might be more frequent among women, so to compare only individuals with positive income we can either type
sum income if gender==1 & income>0
sum income if gender==2 & income>0
or recode 0 incomes to missing so that they are not taken into account when calculating the average. Type
mvdecode income, mv(0=.a)
sum income if gender==1
sum income if gender==2
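
To see how much the zero incomes matter, they can be counted before and after the recoding. This is a sketch that assumes income has not yet been recoded when it starts; the count commands are an addition to the slide. Note that Stata treats missing values as larger than any number, so a condition such as income>0 is also true for missing incomes. summarize ignores them anyway, but count does not.

```stata
* Zero incomes by gender before recoding (1 = men, 2 = women)
count if income == 0 & gender == 1
count if income == 0 & gender == 2
* Recode 0 to the extended missing value .a
mvdecode income, mv(0=.a)
* No zeros should remain; the former zeros are now .a
count if income == 0
count if income == .a
* Positive incomes only, excluding missing values explicitly
count if income > 0 & !missing(income)
```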

17
Analysing data: by prefix

A prefix is a command that is written in front of the actual Stata command.
It has two parts:
the prefix itself, by
a variable list (the bylist), in our example only gender.
The structure is
prefix command: actual command
With by, the actual command is repeated for each category of the variables in the bylist.
The data must be sorted by the variables in the bylist.
Type
sort gender
by gender: summarize income
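
Sorting and the by prefix can also be combined in one step. The sketch below uses bysort, which is not shown on the slide but does the same job as the two commands above.

```stata
* Sort by gender and repeat summarize for each gender in a single command
bysort gender: summarize income
```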

18
Analysing data missing recoding by prefix
Same mean
19
Analysing data: Command options

Options are command-specific, unlike the in and if qualifiers or the by prefix.
They are written after the actual command, following a comma.
For summarize, the detail option gives much more information about the income distribution, such as percentiles, skewness and kurtosis.
Type
sum income, detail
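
After summarize with the detail option, Stata keeps the results in memory, so individual figures can be picked out. A short sketch; the display lines are an addition to the slide.

```stata
* Detailed summary of income
summarize income, detail
* Pick individual statistics out of the stored results
display "Median:   " r(p50)
display "Skewness: " r(skewness)
display "Kurtosis: " r(kurtosis)
```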

Median
Moments
20
Analysing data: Frequency tables

The command that generates frequency tables is tabulate (or tab). It is followed by one variable for a one-way frequency table or two variables for a two-way table.
Type
tabulate gender
tabulate emp gender
The first variable is the row variable and the second is the column variable.
The row and column options add row and column percentages.
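
The two options can be tried as follows. This is a small sketch of the options named above; nofreq, which suppresses the raw counts, is used later in the session.

```stata
* Row percentages: share of men and women within each employment status
tabulate emp gender, row
* Column percentages only: distribution of employment status within each gender
tabulate emp gender, column nofreq
```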

21
Variable labels and value labels

See the differences between the first and the second table. In the second, the rows show only the numeric values that correspond to the different types of employment status.
The variable label is a brief description of the variable. Let's change it (it does not matter if the variable already had one) by typing
label variable emp "Status employment in 97"
Value labels are the labels for the individual values. In gender, the label for value 1 was male and the label for value 2 was female.
emp has seven different values. Let's label them. Type (without breaking the line)
label define emplb 1 "Full time" 2 "Part time" 3 "Retraining" 4 "irregular" 5 "not working" 6 "military service" 7 "unemployed", modify
label values emp emplb
The labels are stored in emplb and can be assigned to any other variable with the same values. Let's tabulate again to see the changes:
tab emp gender, column nofreq
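
To check that the labels were created and attached as intended, the following sketch may help. These commands are an addition to the slide and assume that emplb has already been defined and assigned to emp.

```stata
* Show the value-to-text mapping stored in emplb
label list emplb
* Confirm that emp now carries the new variable label and the value label emplb
describe emp
* Show the underlying numeric codes instead of the labels
tabulate emp, nolabel
```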

22
Variable labels and value labels
was Employment Status 1997
value labels created
23
Analysing data: Graphs

Part-time employment and unemployment are more frequent among women. Maybe the income differences are due to employment status.
Graphing the distribution might give us a quick and very informative look at the data. Type
graph box income, over(emp)
This produces a box-and-whisker plot with one box for each group of emp, as requested by over(emp).
Outliers are the dots. Income is skewed for all subgroups. The median for full time is higher than for the rest. If women are over-represented among part-time workers, income inequality could be due to the division of labour within couples rather than to gender discrimination. We must first control for employment status.
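
The numbers behind the boxes can also be printed as a table. A sketch under the assumption that income and emp are prepared as above; tabstat is not part of the slides and is used here only to show the quartiles that the box plot draws.

```stata
* Quartiles and group sizes of income by employment status
tabstat income, by(emp) statistics(p25 median p75 n)
* The same box plot as above, for comparison
graph box income, over(emp)
```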

24
Analysing data graphs
Outliers
Third quartile
Median
First quartile
employment status
25
Getting help

How do we find out about the effects of gender and employment status on income? With regression analysis. How do we do it? Let's have a look at the help.
The search command looks through all Stata resources for topics linked to your search terms. Type
search Linear Regression
search model
search OLS
You can also use the help command to get information about a specific command (now that we know we should use regress). Type
help regress
You will find the syntax, an explanation and a description of the available options (options, as said before, are command-specific).
From the menu: Help > Search or Help > Stata Command.

26
Getting Help
27
Analysing data: Recoding variables

We have the dependent variable, income, and two independent variables, gender and employment status.
Gender is a dichotomous variable. Such variables conventionally take the values 0 or 1 in regressions. We recode the variable to make men 1 and women 0. Type
generate men=1 if gender==1
replace men=0 if gender==2
The first command creates a variable that is 1 if gender==1 and missing otherwise. The second replaces with 0 those missing values that meet the condition gender==2.
With employment status we do something similar. This variable is not dichotomous, so we just type
generate fulltime=1 if emp==1
replace fulltime=0 if emp==2
because the analysis will be limited to full time versus part time.
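
It is worth checking a recoded variable against its source. The sketch below cross-tabulates the new dummies with the original variables. It assumes men and fulltime were created as above. The variable men2 is a hypothetical name used only to illustrate a one-line alternative.

```stata
* Every gender==1 should be men==1 and every gender==2 should be men==0
tabulate gender men, missing
* fulltime should be 1 for emp==1, 0 for emp==2 and missing for all other statuses
tabulate emp fulltime, missing
* One-line alternative: the comparison itself evaluates to 1 or 0
generate men2 = (gender == 1) if !missing(gender)
```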

28
Analysing data: Linear regression

We will run a basic linear regression with the data at hand. We saw that we need to use regress. We type regress followed by the dependent variable and then the independent variables. Type
regress income men fulltime
Interpretation:
The average monthly income for individuals with men==0 and fulltime==0 (part-time women employees) is 965.
Female full-time workers earn on average 806 more.
Independent of full time/part time, men earn on average 451 more than women.
Therefore, income inequality cannot be explained by the higher proportion of female part-time workers in the data file.
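
The three figures combine additively into a predicted income for each group. The sketch below uses the rounded coefficients quoted above (constant 965, fulltime 806, men 451), so the results are approximate.

```stata
* Predicted monthly income = constant + 806*fulltime + 451*men
display "Part-time women: " 965
display "Full-time women: " 965 + 806
display "Part-time men:   " 965 + 451
display "Full-time men:   " 965 + 806 + 451
```

With these rounded values the four groups come out at 965, 1771, 1416 and 2222.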

29
Analysing data Linear Regression
30
Do files

To reproduce the results of your session, use do-files.
A do-file is a text file where you save your commands in order to store your work sessions.
Type
doedit
This opens the do-file editor. The first line sets the version, so that the do-file can be run with any future version. We have written the commands that were necessary to run the regression above.
Once you have copied the commands, choose File > Save As and save the file as an1.do in the current directory. Next type
do an1.do
You can run all the commands again!
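
The do-file itself is not reproduced in the transcript. The sketch below is a reconstruction from the commands typed during this session, not the presenter's file, and the version number is an assumption that should match the Stata release you are running.

```stata
* an1.do: replays the regression from this session (reconstructed sketch)
* Assumed version number; replace it with your own Stata release
version 9
* Start from the original data file
use data1, clear
* Recode zero incomes to missing
mvdecode income, mv(0=.a)
* Dummy for men (1) versus women (0)
generate men = 1 if gender == 1
replace men = 0 if gender == 2
* Dummy for full time (1) versus part time (0)
generate fulltime = 1 if emp == 1
replace fulltime = 0 if emp == 2
* Regression of income on both dummies
regress income men fulltime
```

Because the file starts from use data1, clear, it can be rerun at any time without saving the modified data.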

31
Do files
32
Exiting Stata

Once you have saved your work in your do-file, it is better to leave Stata without saving changes to the data.
Changes in do-files are easy to undo; changes in the original database might not be possible to undo. ALWAYS HAVE MORE THAN ONE COPY OF THE ORIGINAL DATABASE, just in case.
If you want to save changes, save them in a new file by typing, for instance,
save mydata
Then you can exit Stata by typing
exit, clear
