Title: The Integrated Public Use Microdata Series database
1. Lab 1: Background on the IPUMS and SPSS
IPUMS (www.ipums.org)
The Integrated Public Use Microdata Series database
2. Lab 1: Introduction to the datasets
- Who uses IPUMS?
- What research is IPUMS best for?
- Other IPUMS-like datasets
- Getting and using the data
3. Census Samples Included in the IPUMS
4. What are microdata?
Individual-level data
--every record represents a separate person
--all of their individual characteristics are recorded
--users must manipulate the data themselves
Different from aggregate/summary/tabular data, for example
--a disability table from www.factfinder.census.gov
--an occupation table from a published census volume from the library
5. 1930 Census Population Schedule, made public April 2002
6. Raw Census Microdata from IPUMS
7. IPUMS Data Structure
- Household record (shaded) followed by a person record for each member of the household
- For each type of record, specific columns correspond to different variables
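The layout described above can be read with a few lines of code. The following is a minimal sketch, not the actual IPUMS layout: the record-type flag in the first column (`H` for household, `P` for person) and the column positions in `HOUSEHOLD_COLUMNS` and `PERSON_COLUMNS` are assumptions made for illustration. The real positions come from the codebook supplied with each extract.

```python
# Hypothetical fixed-width layout: (start, end) character positions, 0-based.
HOUSEHOLD_COLUMNS = {"serial": (1, 6), "state": (6, 8)}
PERSON_COLUMNS = {"pernum": (1, 3), "age": (3, 6), "sex": (6, 7)}


def slice_fields(line, columns):
    """Cut one fixed-width record into named, stripped string fields."""
    return {name: line[start:end].strip() for name, (start, end) in columns.items()}


def read_hierarchical(lines):
    """Group person records under the household record that precedes them."""
    households = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line[0] == "H":
            household = slice_fields(line, HOUSEHOLD_COLUMNS)
            household["persons"] = []
            households.append(household)
        elif line[0] == "P":
            if not households:
                raise ValueError("person record found before any household record")
            households[-1]["persons"].append(slice_fields(line, PERSON_COLUMNS))
        else:
            raise ValueError(f"unknown record type: {line[0]!r}")
    return households


# Made-up sample data: two households, three persons.
SAMPLE = [
    "H0000145",
    "P01 341",
    "P02 322",
    "H0000236",
    "P01 672",
]

households = read_hierarchical(SAMPLE)
for hh in households:
    print(hh["serial"], hh["state"], [p["age"] for p in hh["persons"]])
```

Keeping the column positions in dictionaries means the parsing logic stays the same when the layout changes; only the two layout tables need to be edited.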
8. The Advantages of Microdata
- Combination of all of a person's characteristics
- Characteristics of everyone with whom a person lived
- Freedom to make any table you need
- Freedom to make models to look at multivariate relationships
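To make "freedom to make any table you need" concrete, here is a small sketch that tabulates two characteristics against each other directly from person records. The records and the variable names `sex` and `occupation` are invented for illustration; they are not IPUMS data.

```python
from collections import Counter

# Invented person records; each dict is one individual.
PERSONS = [
    {"sex": "male", "occupation": "farmer"},
    {"sex": "female", "occupation": "teacher"},
    {"sex": "male", "occupation": "laborer"},
    {"sex": "female", "occupation": "farmer"},
    {"sex": "male", "occupation": "farmer"},
]


def cross_tabulate(records, row_var, col_var):
    """Count records for every observed (row, column) combination."""
    return Counter((r[row_var], r[col_var]) for r in records)


table = cross_tabulate(PERSONS, "occupation", "sex")
for (occupation, sex), count in sorted(table.items()):
    print(f"{occupation:10s} {sex:8s} {count}")
```

Any pair of recorded variables can be passed as `row_var` and `col_var`, which is exactly what a published table, fixed in advance, cannot offer.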
9. Integration
What the IPUMS actually does to the original census samples
10. IPUMS Translation Table for RACE
11. IPUMS Translation Table for RELATIONSHIP
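A translation table of this kind maps each census year's original codes onto one common scheme. The sketch below shows the idea only: every code value in `TRANSLATION` is made up and does not reproduce the actual IPUMS tables, and `harmonize` is a hypothetical helper name.

```python
# Hypothetical translation table: (census year, original code) -> common label.
# The codes are invented; the real values are in the IPUMS documentation.
TRANSLATION = {
    (1900, "1"): "head",
    (1900, "2"): "spouse",
    (1900, "3"): "child",
    (1970, "0"): "head",
    (1970, "1"): "spouse",
    (1970, "2"): "child",
}


def harmonize(year, original_code):
    """Return the common label for a year-specific code, or None if unmapped."""
    return TRANSLATION.get((year, original_code))


# The same original code can mean different things in different years.
print(harmonize(1900, "2"))
print(harmonize(1970, "2"))
```

Returning `None` for an unmapped code, rather than guessing, makes gaps in the table visible during analysis.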
12. IPUMS Documentation: Farm Status Variable
13. Additional ways in which IPUMS improves the original samples
- Additional documentation, including all enumeration forms and instructions
- Consistent occupation/industry classifications
- Consistent metropolitan classifications
- Constructed family variables
- Locator variables for spouse and parents
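A locator variable stores the within-household person number of a related individual, which makes it possible to attach a spouse's characteristics to a person's own record. The sketch below assumes the names `PERNUM` (person number within the household) and `SPLOC` (person number of the spouse, 0 if none) and uses invented data; check the documentation for the exact definitions before relying on them.

```python
# One invented household: a married couple and their child.
HOUSEHOLD = [
    {"PERNUM": 1, "SPLOC": 2, "AGE": 40, "OCC": "farmer"},
    {"PERNUM": 2, "SPLOC": 1, "AGE": 38, "OCC": "teacher"},
    {"PERNUM": 3, "SPLOC": 0, "AGE": 12, "OCC": None},
]


def attach_spouse_occupation(persons):
    """Add a SP_OCC field holding the occupation of each person's spouse."""
    by_pernum = {p["PERNUM"]: p for p in persons}
    result = []
    for p in persons:
        spouse = by_pernum.get(p["SPLOC"])  # SPLOC 0 matches no one
        result.append({**p, "SP_OCC": spouse["OCC"] if spouse else None})
    return result


linked = attach_spouse_occupation(HOUSEHOLD)
for person in linked:
    print(person["PERNUM"], person["OCC"], person["SP_OCC"])
```

The same lookup pattern would apply to parent locators: index the household by person number once, then follow each pointer.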
15. Quantity of IPUMS Data Downloaded
16. Who uses the data?
17. How do people get IPUMS data?
19. Four Key Strengths of the Census Microdata Samples
20. Limitations of the Census Microdata Samples
1-in-100 samples (1-in-20 for 1970-2000)
--Too small to answer some questions
Decennial
--Any historical analysis must use 10-year gaps
Cross-sectional data
--Not longitudinal
Need knowledge of a statistical package
21. What type of question is IPUMS best suited for?
- Studies that do not need to identify geographic areas of fewer than 100,000 people after 1940 (e.g., you cannot identify Clemson, SC, but you can identify a group of several counties of which Clemson is a part).
- Subjects that are likely to deal with at least 10,000 people, preferably more. In a 1-in-100 sample, 10,000 individuals will generate about 100 cases in IPUMS. Anything less than this is probably too small a sample for useful analysis.
- Any analysis of a census-related question that is not answered by the published census volumes or summary files.
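The rule of thumb above is simple arithmetic: the expected number of sample cases is the population of interest divided by the sampling ratio. A minimal sketch, with `expected_cases` as a hypothetical helper name:

```python
def expected_cases(population, one_in_n):
    """Expected number of sampled records for a 1-in-N sample."""
    if one_in_n <= 0:
        raise ValueError("one_in_n must be positive")
    return population / one_in_n


# 10,000 people in a 1-in-100 sample versus a 1-in-20 sample.
cases_1_in_100 = expected_cases(10_000, 100)
cases_1_in_20 = expected_cases(10_000, 20)
print(cases_1_in_100, cases_1_in_20)
```

These are expected values only; the actual number of cases in any one sample will vary around them.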
22. An example: Southern migrants in the North, 1870-1970
Published census volumes can tell you
--How many southern-born persons of each race lived in each state in 1900, 1920, 1930, and 1960
--Occupations of all African-Americans in the North
But you're also interested in
--The jobs held by actual migrants
--How their jobs compared to those of people who stayed home
--How their jobs compared to those of northern-born blacks
--How their settlement changed from 1870 onward
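Because each record carries both birthplace and current residence, actual migrants can be picked out and compared with those who stayed home. The sketch below is illustrative only: the records are invented, the field names `birth_state` and `residence_state` are hypothetical, and `SOUTH` lists just a few states rather than a full regional definition.

```python
from collections import Counter

# Partial, illustrative list of southern states (not a full definition).
SOUTH = {"Alabama", "Georgia", "Mississippi", "South Carolina"}

# Invented person records.
PERSONS = [
    {"birth_state": "Georgia", "residence_state": "Illinois", "occupation": "laborer"},
    {"birth_state": "Georgia", "residence_state": "Georgia", "occupation": "farmer"},
    {"birth_state": "Illinois", "residence_state": "Illinois", "occupation": "clerk"},
    {"birth_state": "Alabama", "residence_state": "Michigan", "occupation": "laborer"},
]


def migration_status(person):
    """Classify a person by where they were born and where they live now."""
    born_south = person["birth_state"] in SOUTH
    lives_south = person["residence_state"] in SOUTH
    if born_south and not lives_south:
        return "southern migrant"
    if born_south and lives_south:
        return "southern stayer"
    if not born_south and not lives_south:
        return "born and living outside the South"
    return "other"


# Occupation counts within each migration group.
jobs = Counter((migration_status(p), p["occupation"]) for p in PERSONS)
for (status, occupation), count in sorted(jobs.items()):
    print(f"{status:35s} {occupation:10s} {count}")
```

A real analysis would also restrict the comparison by race, sex, and census year, all of which are recorded on the same person record.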
23. An example: Why this analysis works
The numbers are very large
--over 500,000 southerners are in the North in every decade from 1870 on
I don't need to know particular towns
--state of residence is available in every census
--a sub-state designation known as State Economic Area (SEA) is even available for every census
Data not available anywhere else
--and so it is worth the trouble
24. An example: What you can't do with the IPUMS
How did the southerners do in Pittsburgh?
--IPUMS has data on 90 employed southern black men in Pittsburgh in 1970, fewer in previous years.
Were the migrants segregated in the North?
--you don't know their street, tract, or ward
--all you know is their city, and only that if it was a pretty big one (>100K for 1940-50 and 1980-90; >250K for 1960-70; >100K in 2000).
Did migrants' jobs improve over time?
--The census samples are cross-sectional databases, not longitudinal ones
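The city-size cutoffs quoted above can be written as a small lookup. This sketch encodes only the years and thresholds listed on this slide; the function name `city_is_identified` is hypothetical, and years not listed return `None` rather than a guess.

```python
# Minimum city population for identification, by census year, as listed above.
CITY_THRESHOLDS = {
    1940: 100_000, 1950: 100_000,
    1960: 250_000, 1970: 250_000,
    1980: 100_000, 1990: 100_000,
    2000: 100_000,
}


def city_is_identified(census_year, city_population):
    """True if a city of this size exceeds the cutoff for the given year.

    Returns None when the year is not covered by the thresholds above.
    """
    threshold = CITY_THRESHOLDS.get(census_year)
    if threshold is None:
        return None
    return city_population > threshold


# A city of 200,000 people is identifiable in 1950 but not in 1970.
print(city_is_identified(1950, 200_000))
print(city_is_identified(1970, 200_000))
```

Checking this before starting a city-level study shows quickly whether the place of interest can be separated out at all in the years needed.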
26-29. Ongoing data projects at the MPC