Title: Using Census Summary File Data for Research
- Nancy A. Denton
- SUNY Albany
- n.denton_at_albany.edu
Why use summary data?
- Available more quickly than microdata
- Interested in places rather than people
- Believe that places affect people
Research with Summary Data
- Contains the same basic variables as individual census data
- Involves a different way of thinking
- Requires somewhat different programming skills
- Raises different statistical issues and different error sources
Characteristics of Summary Data
- Each value of what is a traditional variable in microdata becomes a separate variable in summary data: a summary variable
- Each summary variable is a single cell of an n-way cross-tabulation, so it contains a count
- Each table has a universe of persons, families, households, or housing units
Individual vs. Summary Data
- Variable is RACE
- Values of RACE
- 1 = White
- 2 = Black
- 3 = Native American
- 4 = Asian
- 5 = NHOPI
- 6 = Other
- 7 = Two or more
- Values are codes
- Variables in Table 7
- P007001 = Total population
- P007002 = White alone
- P007003 = Black alone
- P007004 = Native American alone
- P007005 = Asian alone
- P007006 = NHOPI alone
- P007007 = Other alone
- P007008 = Two or more
- Values are counts
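The shift from codes to counts can be sketched in a few lines of Python. This is a minimal illustration, not a Census Bureau procedure: the person records and geographic-unit IDs are hypothetical, and only the RACE codes and the P007 cell names come from the slide above. Each record adds one to the total (P007001) and one to exactly one race cell, so the cells always sum to the total.

```python
from collections import Counter

# Map each RACE code (values listed above) to the summary variable that counts it.
CODE_TO_CELL = {
    1: "P007002",  # White alone
    2: "P007003",  # Black alone
    3: "P007004",  # Native American alone
    4: "P007005",  # Asian alone
    5: "P007006",  # NHOPI alone
    6: "P007007",  # Other alone
    7: "P007008",  # Two or more
}


def summarize(records):
    """Collapse (geo_id, race_code) person records into one row of counts per unit."""
    table = {}
    for geo_id, code in records:
        row = table.setdefault(geo_id, Counter())
        row["P007001"] += 1             # every person adds to the total
        row[CODE_TO_CELL[code]] += 1    # and to exactly one race cell
    return table


# Hypothetical person records: (geographic unit, RACE code).
persons = [("T1", 1), ("T1", 1), ("T1", 2), ("T2", 2), ("T2", 4), ("T2", 7)]
summary = summarize(persons)
print(dict(summary["T1"]))  # {'P007001': 3, 'P007002': 2, 'P007003': 1}
```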
Characteristics of Data Files
- Individual Data File
- Household 1
- person 1: race, age, sex, ...
- person 2: race, age, sex, ...
- person n: race, age, sex, ...
- Household 2
- Summary Data File
- Geographic Unit 1: Total, Whites, Blacks, Asians, NHOPI, Native Americans, Other, Age 0, Age 1, Age 2, Age 3, Age 4, ..., Age n, Male, Female
- Geographic Unit 2: Total, Whites, Blacks, etc.
Compared to Individual Data Files, Summary Files are
- Not as flexible
- Very large
- Seldom amenable to analysis with traditional statistical programs without some preliminary manipulation
But the big advantage of summary files is that they
- Allow you to break the 100,000-person barrier: the minimum population of a PUMA, the smallest geography identified in the microdata
For large geographical units, you can use either type of data
- How does the child poverty rate compare across states?
- Which is more important in determining the state child poverty rate, racial composition or family structure?
How would you answer these questions with PUMS data?
- Essentially, you'd create aggregate data by adding up the individual data to the state level
- Programming-wise, you'd create dummy variables for children, poor children, Blacks, and single-mother families, and then add them up to the state level to get the rates
More concretely (in SAS)
    data temp;
      set pumsdat;  /* person file w/ household data attached */
      if age le 17 then child = 1; else child = 0;
      if child = 1 and pov = 1 then poorchild = 1;
      else poorchild = 0;
      if race = 2 then black = 1; else black = 0;
      if hhtyp = 3 then singmom = 1;
      else singmom = 0;
    run;
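Now add up
    proc sort data=temp; by state; run;  /* BY processing needs sorted data */
    proc means data=temp noprint;
      by state;
      var child poorchild black singmom tpop tfam;  /* tpop, tfam: counters assumed to be on pumsdat */
      output out=statdat sum=;  /* sums keep the original variable names */
    run;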
Now compute rates
    data srate;
      set statdat;
      chpov = poorchild / child * 100;
      pcblk = black / tpop * 100;
      pcsing = singmom / tfam * 100;
    run;
    proc print data=srate; var state chpov pcblk pcsing; run;
    proc reg data=srate;
      model chpov = pcblk pcsing;
    run;
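For readers working outside SAS, the same dummy-then-sum logic can be sketched in Python. The record fields `state`, `age`, `pov`, `race`, and `hhtyp` mirror the SAS variables above; the `head` flag (used so that each family is counted once in the single-mother denominator) and the sample records are hypothetical assumptions. PUMS person weights are ignored for brevity; a real analysis would sum weights rather than 1s.

```python
from collections import defaultdict


def state_rates(records):
    """Sum person-level dummies to the state level, then convert sums to rates."""
    sums = defaultdict(lambda: defaultdict(int))
    for p in records:
        s = sums[p["state"]]
        child = p["age"] <= 17
        s["tpop"] += 1
        s["child"] += int(child)
        s["poorchild"] += int(child and p["pov"] == 1)
        s["black"] += int(p["race"] == 2)
        # Assumption: 'head' marks one record per family, so families are counted once.
        s["tfam"] += p["head"]
        s["singmom"] += int(p["head"] == 1 and p["hhtyp"] == 3)
    return {
        state: {
            "chpov": 100 * s["poorchild"] / s["child"],
            "pcblk": 100 * s["black"] / s["tpop"],
            "pcsing": 100 * s["singmom"] / s["tfam"],
        }
        for state, s in sums.items()
    }


# Hypothetical person records with household attributes attached.
people = [
    {"state": "NY", "age": 8,  "pov": 1, "race": 2, "hhtyp": 3, "head": 0},
    {"state": "NY", "age": 35, "pov": 1, "race": 2, "hhtyp": 3, "head": 1},
    {"state": "NY", "age": 40, "pov": 0, "race": 1, "hhtyp": 1, "head": 1},
    {"state": "NY", "age": 12, "pov": 0, "race": 1, "hhtyp": 1, "head": 0},
    {"state": "OH", "age": 5,  "pov": 0, "race": 1, "hhtyp": 1, "head": 0},
    {"state": "OH", "age": 30, "pov": 0, "race": 1, "hhtyp": 1, "head": 1},
]
rates = state_rates(people)
print(rates["NY"])  # {'chpov': 50.0, 'pcblk': 50.0, 'pcsing': 50.0}
```

The resulting state rows play the role of `srate`; a regression of `chpov` on `pcblk` and `pcsing` would then be run on them.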
Nested Geographic Levels: State Files
- Nation
- States
- Counties
- County subdivisions
- Places
- Census tracts
- Block groups (for selected tables)
Small-Area Geography Overview
Census Tracts
- For the first time, in Census 2000, tracts cover the entire nation
- Relatively homogeneous population characteristics
- 65,000 census tracts across the U.S.
- Optimal size is 4,000 people; range between 1,000 and 8,000
2000 Summary File (SF) Data
- Short form data
  - PL 94-171
  - SF1
  - SF2 (race detail)
- Long form data
  - SF3
  - SF4 (race/ethnic detail)
Census 2000 Short Form Questionnaire
- 7 Questions
- Name
- Sex
- Age
- Relationship
- Hispanic Origin
- Race
- Owner/Renter Status
Population Subjects Summarized to Census Tract
- Ancestry
- Disability
- Employment Status
- Grandparents as Caregivers
- Households and Families
- Income (Family, Nonfamily, Indiv)
- Language Spoken
- Marital Status
- Migration
- Birthplace, Year of Entry, Citizenship
- Poverty Status
- School Enrollment and Educational Attainment
Housing Subjects
- Units in Structure
- Year Built
- Rooms
- Year Householder Moved In
- Rent/Value
- House Heating Fuel
- Vehicles Available
- Mortgage Status and Monthly Costs
- Plumbing and Kitchen Facilities
- Telephone Service
- Occupants Per Room
SF Data Features
- Same data available for ALL units of geography covered by that file
- Smallest unit of geography varies across files
- All files have nested geography
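Because the geography is nested, counts at one level can be summed to the levels above it. A minimal sketch, assuming tracts are identified by an 11-character GEOID (2-digit state FIPS code, 3-digit county code, 6-digit tract code); the tract IDs and counts are hypothetical. Places and county subdivisions are not encoded in the tract code, so they cannot be built this way, and only counts add up: medians and rates must be recomputed from their components or taken from the file for the higher level.

```python
from collections import defaultdict


def roll_up(tract_counts, prefix_len):
    """Sum count variables to the level identified by the first prefix_len GEOID characters."""
    out = defaultdict(lambda: defaultdict(int))
    for geoid, counts in tract_counts.items():
        for var, n in counts.items():
            out[geoid[:prefix_len]][var] += n
    return {code: dict(v) for code, v in out.items()}


# Hypothetical tract counts keyed by an 11-character GEOID:
# 2-digit state + 3-digit county + 6-digit tract.
tracts = {
    "36001000100": {"total": 2000, "white": 1500},
    "36001000200": {"total": 3000, "white": 1000},
    "36083040100": {"total": 4000, "white": 3600},
}
counties = roll_up(tracts, 5)  # state + county
states = roll_up(tracts, 2)    # state only
print(counties["36001"])  # {'total': 5000, 'white': 2500}
```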
Some Uses of Summary Data in Research
- Find out about a particular place
- Compute Metro Area Indices
- Construct Patterns of Neighborhood Race/Ethnic Composition
- Calculate Neighborhood Profiles
- Trace Paths of Neighborhood Change over time
- Attach Summary Data to Individual Data to Predict Neighborhood Effects
I. Find out about a particular place
- Go to the library
- Go online to www.census.gov
- Use American FactFinder
II. Metro Area Indices
- What is an index?
- An index is a single number that reflects the characteristics of tracts (or any other unit of geography), aggregated to the metro (city, suburban, county) level in such a way that it reveals something about the distribution of groups in space
Index of Dissimilarity
- D is its common name
- Measures evenness
- What proportion of either group would have to change neighborhoods for each neighborhood to have the same racial composition as the city (or metro area) as a whole?
- Workhorse of segregation studies
Formula for Dissimilarity
$$D = \frac{1}{2} \sum_{i} \left| \frac{x_i}{X} - \frac{y_i}{Y} \right|$$
- where x_i and y_i refer to tract totals and X and Y refer to metro-wide totals
So how would we calculate that?
- Create little x and y for each tract
- Sum up to get big X and big Y for the metro area
- Calculate each tract's contribution to the index and add up across all tracts in the metro area
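In SAS
    data temptract;
      set tractdata;
      x = P007002;  /* whites */
      y = P007003;  /* blacks */
    run;
    proc sort data=temptract; by msa; run;
    proc means data=temptract noprint;
      by msa;  /* one row of metro-wide totals per MSA */
      var x y;
      output out=tots sum=TX TY;
    run;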
Then
    data index;
      merge temptract tots;  /* put denominators on each tract */
      by msa;
      /* calculate index: sum statement retains Dwb and adds each tract's term */
      Dwb + 0.5 * abs(sum(x/TX, -y/TY));
      if last.msa then do;
        output;   /* write out index for this msa */
        Dwb = 0;  /* begin anew for next msa */
      end;
      keep msa Dwb;
    run;
    proc print data=index; by msa; format Dwb 5.3; run;
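The same two-pass calculation (metro totals first, then summed tract contributions) can be sketched in Python. The input layout of (msa, x, y) tuples and the sample tracts are hypothetical.

```python
from collections import defaultdict


def dissimilarity(tracts):
    """Return {msa: D}, with D = 0.5 * sum_i |x_i/X - y_i/Y| within each metro.

    tracts: iterable of (msa, x, y), where x and y are tract counts of the two
    groups (for example P007002 whites and P007003 blacks).
    """
    tracts = list(tracts)
    totals = defaultdict(lambda: [0, 0])  # metro-wide X and Y
    for msa, x, y in tracts:
        totals[msa][0] += x
        totals[msa][1] += y
    d = defaultdict(float)
    for msa, x, y in tracts:
        X, Y = totals[msa]
        d[msa] += 0.5 * abs(x / X - y / Y)  # this tract's contribution
    return dict(d)


# Hypothetical tracts in two metros.
data = [("A", 100, 0), ("A", 0, 100),   # completely segregated
        ("B", 80, 20), ("B", 20, 80)]
print({m: round(v, 3) for m, v in dissimilarity(data).items()})  # {'A': 1.0, 'B': 0.6}
```

Metro B's value of 0.6 means that 60 percent of either group would have to change tracts for every tract to match the metro-wide composition.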
Issue to Face with Dissimilarity
- You're only comparing two groups at a time, while the population of almost all areas contains more than that
- If you define your groups as group x and the remainder, then when you compare indices to each other, the reference group changes for each group studied
P-star Indices
- Look at things from the perspective of within the neighborhood
- Isolation: how many people look like me?
- Contact: how many people are different, and of what type?
- Both are calculated the same way
P-star Formula
$${}_{x}P^{*}_{y} = \sum_{i} \frac{x_i}{X} \cdot \frac{y_i}{t_i}$$
- where x_i and y_i are tract-level populations of groups x and y, t_i is the total population of the tract, and X is the metro-wide total of group x
- For isolation, use the same group on both sides (y = x)
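A minimal Python sketch of the P-star formula for one metro area; the tuple layout and the two-tract example are hypothetical.

```python
def p_star(tracts):
    """xP*y = sum_i (x_i / X) * (y_i / t_i) for one metro area.

    tracts: list of (x_i, y_i, t_i). Pass y_i = x_i to get isolation (xP*x).
    Tracts with zero population are skipped to avoid dividing by zero.
    """
    X = sum(x for x, _, _ in tracts)
    return sum((x / X) * (y / t) for x, y, t in tracts if t > 0)


# Hypothetical metro with two tracts; columns are whites, blacks, total.
metro = [(90, 10, 100), (10, 90, 100)]
contact = p_star(metro)                               # wP*b
isolation = p_star([(w, w, t) for w, _, t in metro])  # wP*w
print(round(contact, 2), round(isolation, 2))  # 0.18 0.82
```

Because each tract here contains only the two groups, contact and isolation sum to 1. Once t_i includes other groups they no longer do, which is why the choice of tract total matters in a multi-group world.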
Issue to Face with P-star
- What is the tract total population in a multi-group world?
Other Dimensions of Segregation
- Centralization
- Clustering
- Concentration
- Indices representing these dimensions also have the two-group problem
- See Reardon and Firebaugh for the latest information on multigroup measures.
So, what do we know about segregation today?
U.S. Census Bureau report on Residential Segregation 1980-2000 says
- The trend for Blacks or African Americans is clearest of all: declines in segregation were observed over the 1980 to 2000 period across all dimensions of segregation we considered.
- Despite these declines, residential segregation was still higher for African Americans than for the other groups across all measures. Hispanics or Latinos were generally the next most highly segregated, followed by Asians and Pacific Islanders, and then American Indians and Alaska Natives, across a majority of the measures.
Same report continues
- Asians and Pacific Islanders, as well as Hispanics, tended to experience increases in segregation, though not across all dimensions.
- Increases were generally larger for Asians and Pacific Islanders than for Hispanics.
- Iceland, Weinberg and Steinmetz, 2002.
III. Portray Patterns of Neighborhood Race/Ethnic Composition
- Assume you're interested in four groups
- Whites, Blacks, Hispanics, Asians
Number of Groups in Neighborhood
- 1 group: W---, Wddd, -B--, --H-, ---A
- 2 groups: WB--, W-H-, W--A, -BH-, -B-A, --HA
- 3 groups: WBH-, WB-A, W-HA, -BHA
- 4 groups: WBHA
- (W = White, B = Black, H = Hispanic, A = Asian; a dash means the group is not present)
- Need to establish a cutoff for group presence
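A minimal sketch of assigning a pattern code once a cutoff is chosen. The 10 percent threshold is purely an illustrative assumption (the slides do not give the cutoff used), and the sketch does not reproduce the Wddd category, whose definition is not given here.

```python
GROUPS = "WBHA"  # White, Black, Hispanic, Asian, in the order used in the codes


def pattern(counts, total, cutoff=0.10):
    """Return a code such as 'WB-A': the group's letter if its share of the
    tract population is at least `cutoff`, otherwise '-'.

    counts: (white, black, hispanic, asian) counts for one tract.
    The 10% default cutoff is an illustrative assumption.
    """
    return "".join(
        g if total > 0 and n / total >= cutoff else "-"
        for g, n in zip(GROUPS, counts)
    )


print(pattern((700, 200, 50, 50), 1000))    # WB--
print(pattern((400, 300, 150, 150), 1000))  # WBHA
print(pattern((980, 10, 5, 5), 1000))       # W---
```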
In the 50 Largest MSAs/CMSAs in 2000
- Pattern, % of Tracts, Pop (000)
- W--- 14.1 18,899
- Wddd 11.0 19,275
- WB-- 14.8 23,291
- W-H- 14.2 26,686
- WBH- 13.2 25,534
- W-HA 9.1 18,857
- WBHA 9.3 19,943
- Total 85.7 152,485
- 41,521 85.9
In Suburbs of the 50 Largest MSAs/CMSAs in 2000
- Pattern, % of Tracts, Pop (000)
- W--- 22.4 15,666
- Wddd 13.5 12,473
- WB-- 12.0 9,773
- W-H- 15.4 14,930
- WBH- 10.3 11,011
- W-HA 9.2 10,165
- WBHA 7.5 8,887
- Total 90.3 82,905
- 23,505 85.6
IV. Calculate Neighborhood Profiles
- Average neighborhood characteristics for members of a particular group
- Variation on the P-star index
- Strategy
- Compute characteristics for each tract
- Use population groups as weights
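The weighting strategy can be sketched as a group-weighted mean of tract characteristics. The field names and sample tracts are hypothetical. Averaging tract medians this way gives the typical neighborhood median experienced by group members, not the median income of the group itself.

```python
def group_weighted_mean(tracts, group, characteristic):
    """Average tract characteristic experienced by members of `group`:
    each tract's value is weighted by the group's count in that tract."""
    total = sum(t[group] for t in tracts)
    return sum(t[group] * t[characteristic] for t in tracts) / total


# Hypothetical tracts with group counts and a tract median income.
tracts = [
    {"white": 900, "black": 100, "medinc": 60000},
    {"white": 100, "black": 900, "medinc": 25000},
]
print(group_weighted_mean(tracts, "white", "medinc"))  # 56500.0
print(group_weighted_mean(tracts, "black", "medinc"))  # 28500.0
```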
Cleveland, 2000
- Pattern, % of Tracts, Pop (000)
- W--- 36.8 991
- Wddd 9.4 241
- -B-- 12.6 209
- WB-- 21.5 421
- W-H- 3.2 64
- W--A 1.5 39
- WBH- 11.3 208
- WB-A 2.5 52
- Total 98.8 2,225
- 99.2
Cleveland, Neighborhood SES Characteristics
- 2000 Group, Med. Inc., Med. House Value
- Total 42,937 117,149
- W--- 54,051 146,330
- Wddd 51,135 142,348
- -B-- 22,334 60,652
- WB-- 41,196 108,965
- W-H- 38,359 91,909
- W--A 65,234 231,590
- WBH- 27,503 70,829
- WB-A 38,830 148,829
V. Trace Paths of Neighborhood Change over time
- Basically just computing patterns for different years and then cross-classifying them
- More difficult is the fact that tract boundaries must be matched over time
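Once each tract has a pattern code for every year, tracing paths reduces to concatenating the codes and counting them, in the same form as the 1970-1990 path counts for all-white neighborhoods. A minimal sketch, assuming tract IDs have already been matched across years; the tracts and codes are hypothetical.

```python
from collections import Counter


def paths(patterns, years, start):
    """Count pattern sequences (e.g. 'W---W---WB--') over `years` for tracts
    whose pattern in the first year equals `start`."""
    counts = Counter()
    for tract, code in patterns[years[0]].items():
        if code == start:
            counts["".join(patterns[y][tract] for y in years)] += 1
    return counts


# Hypothetical pattern codes for tracts whose boundaries were matched across years.
patterns = {
    1970: {"t1": "W---", "t2": "W---", "t3": "W---", "t4": "WB--"},
    1980: {"t1": "W---", "t2": "W-H-", "t3": "W---", "t4": "WB--"},
    1990: {"t1": "W---", "t2": "W-H-", "t3": "WB--", "t4": "-B--"},
}
result = paths(patterns, [1970, 1980, 1990], "W---")
n = sum(result.values())
for path, count in result.most_common():
    print(path, count, round(100 * count / n, 1))  # path, N, percent
```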
Change in All-White (95) Neighborhoods 1970-1990
Ten Most Frequent Paths of Neighborhood Change 1970-1990 for All-white Neighborhoods in 1970
- Path (1970, 1980, 1990), N, %
- W---W---W--- 2744 30.1
- W---W---Wddd 815 8.9
- W---WdddWddd 271 3.0
- W---WB--WB-- 220 2.4
- W---WdddW--A 168 1.8
- W---W-H-W-H- 140 1.5
- W---W---W-H- 123 1.3
- W---WdddW-H- 120 1.3
- W---W---W--A 112 1.2
- W---W---WB-- 98 1.1
1990 Neighborhood Characteristics of All-white Neighborhoods in 1970
- 1990 Group, Med. Inc., Med. House Value
- All-white 70-90 41,273 113,949
- Wddd 46,629 152,951
- WB-- 35,743 88,955
- W--A 61,331 283,363
- W-H- 38,224 139,400
- WBH- 33,035 101,165
- W-HA 47,738 236,144
- WBHA 42,129 154,509
VI. Attach Summary Data to Individual Data
- If you're collecting your own data, then you can geocode it from the address
- With U.S. Census data, because of privacy issues, you must use a confidential data center
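Once individual records carry a tract identifier, attaching summary data is a many-to-one join on that identifier. A minimal sketch with hypothetical tract IDs and variable names; with census microdata this step would have to happen inside a confidential data center.

```python
def attach(people, tract_data):
    """Return copies of individual records with their tract's summary
    variables added; people whose tract is missing from tract_data are
    kept without those variables."""
    merged = []
    for person in people:
        row = dict(person)  # copy so the input records are not modified
        row.update(tract_data.get(person["tract"], {}))
        merged.append(row)
    return merged


# Hypothetical inputs: tract-level summary variables keyed by tract ID, and
# individual records that have already been geocoded to a tract.
tract_data = {
    "36001000100": {"pct_poor": 12.5, "pct_black": 30.0},
    "36001000200": {"pct_poor": 4.0, "pct_black": 2.0},
}
people = [
    {"id": 1, "tract": "36001000100", "age": 34},
    {"id": 2, "tract": "36001000200", "age": 51},
    {"id": 3, "tract": "36999999999", "age": 27},  # no matching tract
]
for row in attach(people, tract_data):
    print(row)
```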
Neighborhood Effects
- Some publicly available data sets have already done this
- PSID
- Add Health
- MCSUI
- MTO
In conclusion
- Summary data are currently underutilized in research
- Methodological issues remain to be solved
- Availability of confidential sites should increase their potential for use by researchers