Title: Introduction to CSSCR Archive and Campus Data
1Introduction to CSSCR Archive and Campus Data
2Topics
- Major Sources of CSSCR Data Archive
- Finding Data Sets at CSSCR
- Other Data Resources at CSSCR
- Introduction to Decennial Censuses and American
Community Survey
3CSSCR Archive
- The Center for Social Science Computation and
Research (CSSCR) maintains a large electronic
data archive related to social science research.
- Data sets are available through a web viewer,
network server, or CDROM.
4Major Sources of CSSCR Data Archive
- Inter-University Consortium for Political and
Social Research (ICPSR)
- US Census Bureau
- Bureau of Labor Statistics
- Washington State Data Center
- International Association for Social Science
Information Service and Technology (IASSIST)
5Major Sources of CSSCR Data Archive
- Inter-University Consortium for Political and
Social Research (ICPSR) http://www.icpsr.umich.edu
- Membership-based organization founded in 1962.
- Provides access to the world's largest archive
of computerized social science data.
- Offers training facilities for the study of
quantitative social analysis techniques (e.g. the
ICPSR Summer Program in Quantitative Methods of
Social Research).
6Major Sources of CSSCR Data Archive
- US Census Bureau http://www.census.gov
- 1990, 2000 Decennial Census of Population and
Housing
- Summary Tape File/Summary File (STF/SF)
- Public Use Microdata Sample (PUMS)
- American Community Survey (ACS)
7Major Sources of CSSCR Data Archive
- Bureau of Labor Statistics www.bls.gov/nls
- National Longitudinal Survey of Youth 79, 97
Public-use File (CDs are available at CSSCR, or
can be downloaded free from the BLS website)
- National Longitudinal Survey of Youth 79, 97
Geocode data (confidential data)
- Provides geographic variables for the data file
- To protect the confidentiality of respondents,
an agreement letter must be signed with BLS.
8Major Sources of CSSCR Data Archive
- Sources of Economic Data
- Economagic Economic Time Series Page
- http://www.economagic.com/
- Provides internet browsing of U.S.
business, economic and trade information
- DRI_WEFA Basic Economics Database
- Datastream
9DRI_WEFA Basic Economics Database
- A macroeconomics database that contains about
7000 monthly, quarterly and annual time series
dating back to 1946 where available and ending with
the latest available observations.
- Includes financial data, construction and housing
data, industrial statistics, population counts and
estimates, foreign trade, and interest rates.
- Accessible through E-Views in the CSSCR lab. A
reference book is available in Room 601E.
10DataStream Database
- Provides access to various global economic and
financial databases (e.g. national government and
OECD series, International Monetary Fund data,
equities, bond indices, interest and exchange
rates, company account definitions, etc.).
- At CSSCR, Datastream is only available through
the Archivist in Room 601E.
11Major Sources of CSSCR Data Archive
- Washington State Data Center
- http://www.ofm.wa.gov
- WA State Vital Statistics
- WA State Population Projections
- WA State Population Surveys
- Pregnancy and Abortion Data
12Other Sources of CSSCR Data Archive
- Data Access via DataFerrett http://dataferrett.census.gov
- Current Population Survey www.beta.ipums.org/cps/
- Survey of Income and Program Participation
- www.cdc.gov/nchs/hus.htm
- National Health Interview Survey
- National Hospital Ambulatory Medical Care Survey
- iPOLL databank at The Roper Center for Public
Opinion Research is available through the UW library
- http://roperweb.ropercenter.uconn.edu/cgi-bin/hsrun.exe/Roperweb/iPOLL/iPOLL.htxstartHS_iPOLL_LoginSetup
13Finding Data Sets at CSSCR
- Web Site
- CDROM Log
- Codebook
- All these materials are available in Condon
611 or on the CSSCR web site
14Finding Data Sets through CSSCR web viewers
- A complete list of data sets at CSSCR is
available on the CSSCR web page.
- Most online data sets at CSSCR can be accessed
through a web browser.
- The CSSCR archive website address is
http://julius.csscr.washington.edu
15Finding Data Sets through CSSCR web viewers
- The data sets on the CSSCR homepage are divided
into several categories:
- ICPSR data
- CDROM data
- Census 2000
- ACS
- Census 2010
- Clicking on one of these five icons will bring
you to the ICPSR Resource, the CDROM list, or the
Census 2000 or ACS Washington data
16Finding Data Sets through CSSCR web viewers
- In ICPSR Resource, click on Archive Browser.
- Archive Browser lets you search the data to get
the files you want. Under each title, information
such as data source, codename, abstract and
storage medium is displayed.
17Types of File
- Codebooks and Documentation
- Dataset codebook: <file name>.cod
- Data dictionary: <file name>.dic or <file name>.doc
- File description: <file name>.des
- Frequency listing: <file name>.fre
- Dataset errata: <file name>.err
18Types of File
- Data Files
- ASCII file: <filename>.dat
- SPSS system file: <filename>.sav or <filename>.svf
- SPSS portable file: <filename>.por or <filename>.exp
- SPSS data definition statements: <filename>.spss
- SAS data file: <filename>.sas7bdat
- SAS catalog file: <filename>.sas7bcat
- SAS transport file: <filename>.xpt
- SAS data definition statements: <filename>.sas
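The extension lists on the two "Types of File" slides can be turned into a small lookup for sorting a downloaded study folder. This is only a sketch in Python: the function name `describe` and the sample file names are hypothetical, and the mapping contains nothing beyond the extensions listed above.

```python
from pathlib import Path

# Extension-to-description map copied from the "Types of File" lists.
FILE_TYPES = {
    ".cod": "dataset codebook",
    ".dic": "data dictionary",
    ".doc": "data dictionary",
    ".des": "file description",
    ".fre": "frequency listing",
    ".err": "dataset errata",
    ".dat": "ASCII file",
    ".sav": "SPSS system file",
    ".svf": "SPSS system file",
    ".por": "SPSS portable file",
    ".exp": "SPSS portable file",
    ".spss": "SPSS data definition statements",
    ".sas7bdat": "SAS data file",
    ".sas7bcat": "SAS catalog file",
    ".xpt": "SAS transport file",
    ".sas": "SAS data definition statements",
}


def describe(filename: str) -> str:
    """Return the archive file type for a file name, based on its extension."""
    extension = Path(filename).suffix.lower()  # case-insensitive match
    return FILE_TYPES.get(extension, "unknown file type")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ("study.cod", "study.POR", "study.sas7bdat", "notes.txt"):
        print(name, "->", describe(name))
```

Matching on the extension alone is a simplification; a file with a listed extension may still need to be checked against the codebook.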
19Seattle Data Viewer
- A neighborhood information system.
- Provides access to a comprehensive set of
information about the city infrastructure and
environment.
- Lets users organize and print data and maps of the
city.
- Accessible in the CSSCR lab at
P:\Data\Seattle_Data_viewer.
20Seattle Data Viewer
- Neighborhood statistics are grouped into the
following units:
- Base map
- Crimes and public safety
- Housing, health, education and civic
locations
- Land use, value and zoning
- Landscape and environmental features
- Municipal and district boundaries
- Park, recreation and open space
- Population and demographics
- Streets and transportation
- Utilities
21Introduction to Decennial Censuses
- Decennial Census of Population and Housing
- Summary Tape File/Summary File (STF/SF)
- Public Use Microdata Sample (PUMS)
22Introduction to Decennial Censuses
- What is a Summary Tape File/Summary File (STF/SF)?
- The basic unit of analysis is a specific
geographic area.
- Contains counts of persons or housing units in
particular categories.
- Also called tabulated summary statistics.
23Example of STF/SF
Geography  | Total population | White alone | Black or African American alone | American Indian and Alaska Native alone | Asian alone
Alabama    |          4442558 |     3153627 |                         1144330 |                                   23283 |       38444
Alaska     |           641724 |      443874 |                           22103 |                                   91013 |       28838
Arizona    |          5829839 |     4440804 |                          180769 |                                  275321 |      129197
Arkansas   |          2701431 |     2135069 |                          414260 |                                   18481 |       25249
California |         35278768 |    21491336 |                         2163530 |                                  253774 |     4365548
Colorado   |          4562244 |     3809054 |                          165729 |                                   40063 |      117506
Washington |          6146338 |     4988017 |                          202286 |                                   88363 |      405030
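Because each row of a summary file is a geographic area with counts, typical use is to turn the counts into shares and compare areas. A minimal Python sketch using the rows of the example table; the short column keys and the function names are made up for this illustration.

```python
# Column order follows the example table; the short keys are illustrative.
COLUMNS = ("total", "white", "black", "aian", "asian")

# One tuple of counts per geographic area, copied from the example table.
TABLE = {
    "Alabama": (4442558, 3153627, 1144330, 23283, 38444),
    "Alaska": (641724, 443874, 22103, 91013, 28838),
    "Arizona": (5829839, 4440804, 180769, 275321, 129197),
    "Arkansas": (2701431, 2135069, 414260, 18481, 25249),
    "California": (35278768, 21491336, 2163530, 253774, 4365548),
    "Colorado": (4562244, 3809054, 165729, 40063, 117506),
    "Washington": (6146338, 4988017, 202286, 88363, 405030),
}


def share(geography: str, column: str) -> float:
    """Return one category's count as a fraction of the area's total population."""
    row = dict(zip(COLUMNS, TABLE[geography]))
    return row[column] / row["total"]


def largest_share(column: str) -> str:
    """Return the area in the table with the highest share for a category."""
    return max(TABLE, key=lambda geography: share(geography, column))


if __name__ == "__main__":
    print(f"Asian alone, Washington: {share('Washington', 'asian'):.1%}")
    print("Highest American Indian and Alaska Native share:", largest_share("aian"))
```

Nothing below the area level can be recovered from such a table; that is the role of the microdata files described later.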
24Introduction to Decennial Censuses
- The Types of STF/SF
- STF/SF 1 and 2 present tabulated data from the
Census short-form (100-percent) questionnaire.
- STF/SF 3 and 4 present cross-tabulations of
information from the long-form (sample)
questionnaire.
- Tables in STF/SF 2 and 4 are iterated for many
detailed racial groups, as well as American
Indian and Alaska Native tribes. In SF4, many
data are also tabulated by detailed ancestry
groups.
25Introduction to Decennial Censuses
- 2000 Census short-form questionnaire
- full population
- six questions
- Household relationship
- Sex
- Age
- Hispanic or Latino origin
- Race
- Tenure (whether the home is owned or rented)
26Introduction to Decennial Censuses
- 2000 Census long-form questionnaire
- a sample of about 15.8-17% of the full population
- divided into two parts
- Population
- social and economic characteristics (14 areas)
- Housing
- physical and financial characteristics (11 areas)
27Introduction to Decennial Censuses
- In 1980 and 1990 census data
- Letters A, B, C, D indicate different levels of
geographic area
- A - block groups
- B - blocks, zip codes
- C - places, counties
- D - Congressional districts
- In 2000 census data
- P - person
- H - housing unit
- PCT - households
- HCT - occupied housing unit
28Introduction to Decennial Censuses
- What is the Public Use Microdata Sample (PUMS)?
- The basic unit of analysis is a housing unit or
the persons who live in it, with identifiers (such
as addresses, names, etc.) removed to protect
individual confidentiality.
- It is a stratified sample of the population,
created by subsampling the full census
sample that received the census long-form
questionnaire.
29Example of PUMS
Person ID  Age  Gender  Education Level
00001      34   F       College
00002      21   M       High School
00003      14   M       Middle School
00004      67   F       High School
00005      54   F       High School
00006      26   M       College
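With microdata the user builds the tabulations, which is the main difference from a summary file. A small Python sketch using the six example person records; the field names and the `tabulate` helper are invented for this illustration.

```python
from collections import Counter

# Person records copied from the example table (one dictionary per person).
PERSONS = [
    {"id": "00001", "age": 34, "gender": "F", "education": "College"},
    {"id": "00002", "age": 21, "gender": "M", "education": "High School"},
    {"id": "00003", "age": 14, "gender": "M", "education": "Middle School"},
    {"id": "00004", "age": 67, "gender": "F", "education": "High School"},
    {"id": "00005", "age": 54, "gender": "F", "education": "High School"},
    {"id": "00006", "age": 26, "gender": "M", "education": "College"},
]


def tabulate(records, *variables):
    """Count records for each combination of the given variables."""
    return Counter(tuple(record[v] for v in variables) for record in records)


if __name__ == "__main__":
    # One-way table: persons by education level.
    print(tabulate(PERSONS, "education"))
    # Two-way table: persons by gender and education level.
    print(tabulate(PERSONS, "gender", "education"))
```

These are unweighted counts of sample records; estimates for the population would also need the weights supplied with a real PUMS file.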
30Introduction to Decennial Censuses
- The Types of PUMS
- 5-percent sample file (PUMS-A file)
- 1-percent sample file (PUMS-B file)
31Introduction to Decennial Censuses
- 5-percent sample file (PUMS-A file)
- Provides the user with records for over 14 million
people and over 5 million housing units
- Each PUMA (Public Use Microdata Area) must meet
a minimum population threshold of 100,000 (the
PUMA minimum)
- Sample has only been produced since 1980
32Introduction to Decennial Censuses
- 1-percent sample file (PUMS-B file)
- Provides a fuller range of detailed
characteristics
- Provides the user with records for over 2.8 million
people and over 1 million housing units
- Each super-PUMA meets a minimum population of
400,000 and is composed of a PUMA or PUMAs
delineated on the 5-percent PUMS files
- Samples from the 1960 through current censuses
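The record counts quoted for the two files follow from the sampling fractions: in a 5-percent file each record stands for roughly 20 people, and in a 1-percent file for roughly 100. The Python sketch below works through that arithmetic. It assumes a uniform nominal weight of 100 divided by the sample percentage, which is a simplification; real PUMS files carry a weight on each record, and those weights should be used for actual estimates.

```python
def nominal_weight(sample_percent: float) -> float:
    """Approximate number of people represented by one sample record."""
    return 100 / sample_percent


def estimated_total(sample_records: int, sample_percent: float) -> float:
    """Scale a count of sample records up to an approximate population total."""
    return sample_records * nominal_weight(sample_percent)


if __name__ == "__main__":
    # Record counts as quoted on the slides ("over 14 million", "over 2.8 million").
    print(estimated_total(14_000_000, 5))  # 5-percent file
    print(estimated_total(2_800_000, 1))   # 1-percent file
```

Both files scale to about 280 million people, as expected for samples drawn from the same census.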
33Introduction to Decennial Censuses
- Integrated Public Use Microdata Series (IPUMS)
http://www.ipums.umn.edu/
- Consists of thirty-eight high-precision samples
of the American population drawn from fifteen
federal censuses (1850-2000) and from the
American Community Surveys of 2000-2006
- Is particularly useful for historical research
because data are comparable across time
34What is American Community Survey (ACS)
- is a large, continuous demographic survey
- produces annual and multi-year estimates of the
characteristics of the population and housing
- will replace the 2010 census long form by
collecting detailed information throughout the
decade
- The short form will remain in the 2010 decennial
census
35ACS Program Schedule
- Testing and development 1994-2004
- Full implementation began in 2005
- Group Quarters data collection began in 2006
36Full Implementation
- Annual national sample of approximately 3 million
addresses in every county and American Indian and
Alaska Native area in the United States
- Provide profiles every year for communities of
65,000 or more
- Provide 3-year cumulations for communities of
20,000 or more
- Provide 5-year cumulations for all communities;
the lowest geographic level could be block group
37ACS Data Release Schedule
- Before the 2004 ACS, the population threshold was
250,000
38ACS file types
- ACS Summary File (ACS SF)
- Public Use Microdata Sample (5% PUMS)
39Comparing ACS with the Decennial Census long form
questionnaires
- Sample rate/size and design
- Data collection
- Residence rules and reference periods
40Sample rate/size and design Comparison
- Census sample estimates are based on about 18 million
housing units; ACS 5-year estimates are based on
about 11 million housing units, and 1-year estimates
on about 3 million housing units
- ACS samples every year and spreads the sample over 12
months; the census samples once a decade and uses the
entire sample at the same time
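The sample sizes above give a rough sense of how much less precise ACS estimates are than long-form estimates. The Python sketch below assumes simple random sampling, where the standard error shrinks with the square root of the sample size. That assumption ignores the actual sample designs and weighting, so the ratios are indicative only.

```python
import math

# Approximate housing-unit sample sizes quoted on the slide.
HOUSING_UNITS = {
    "census_long_form": 18_000_000,
    "acs_5_year": 11_000_000,
    "acs_1_year": 3_000_000,
}


def relative_sampling_error(sample: str, baseline: str = "census_long_form") -> float:
    """Ratio of a sample's standard error to the baseline's, assuming SE ~ 1/sqrt(n)."""
    return math.sqrt(HOUSING_UNITS[baseline] / HOUSING_UNITS[sample])


if __name__ == "__main__":
    for name in ("acs_5_year", "acs_1_year"):
        print(f"{name}: {relative_sampling_error(name):.2f} x long-form standard error")
```

Under this assumption, the 5-year standard error is about 1.3 times the long-form value and the 1-year standard error about 2.4 times.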
41Data Collection Comparison
- ACS nonresponse follow-up uses computer-assisted
telephone and computer-assisted personal
interviews; past censuses have used only paper
questionnaires
- ACS data are collected only from household members;
census data are often collected from neighbors
42Residence Rules Comparison
- ACS uses a two-month rule
- Resident of an address if a person
- Lives there year round
- Lives there more than 2 months but not year round
- Is living there now with no other place to live
- Is away now for 2 months or less
- Not a resident of an address if a person
- Lives there 2 months or less with another
residence
- Is away now for more than 2 months
- Decennial census is based on the concept of usual
residence
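The two-month rule can be read as a short decision procedure. The Python sketch below is a simplified encoding of the bullets above, not the Census Bureau's full residence rules; the function and parameter names are invented for this illustration.

```python
def is_acs_resident(months_at_address: float,
                    has_other_residence: bool,
                    months_away: float = 0) -> bool:
    """Apply a simplified version of the ACS two-month residence rule."""
    # Away from the address for more than 2 months: not a resident.
    if months_away > 2:
        return False
    # Lives there more than 2 months (including year round): resident.
    if months_at_address > 2:
        return True
    # A stay of 2 months or less counts only if there is no other place to live.
    return not has_other_residence


if __name__ == "__main__":
    print(is_acs_resident(12, has_other_residence=False))               # year-round
    print(is_acs_resident(1, has_other_residence=True))                 # short stay
    print(is_acs_resident(12, has_other_residence=True, months_away=3)) # long absence
```

The decennial census has no equivalent test, since it relies on where a person usually lives.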
43Reference Periods Comparison
- ACS uses the interview date as the single
reference point, or as the end of a reference
period, for all data collection
- Examples
- Income
- ACS asks for income for the previous 12 months
- Decennial census income data refer to the
calendar year preceding the April 1 census date
- School enrollment
- ACS asks if a person attended school during the
last three months
- Census 2000 asks if a person attended school any
time since April 1
44Comparison Conclusion
- ACS estimates have higher sampling error than the
census long form; the error is shown as 90% confidence
limits or margins of error in every table
- ACS has a higher level of overall response and
individual item response, so less chance of
nonresponse bias, which means lower potential
nonsampling error
- ACS is a better way to collect this wide-ranging
information than was the decennial census because
the distribution of the data over the collection
time frame is more meaningful
- Comparing ACS Data to Other Sources
- http://www.census.gov/acs/www/UseData/compACS.htm
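Because ACS tables report margins of error at the 90-percent confidence level, a user who needs a standard error or a different confidence level has to convert. The Python sketch below assumes the usual normal-approximation multipliers (1.645 for 90 percent, 1.960 for 95 percent). The estimate and margin in the example are made-up numbers, and the multiplier documented for a given ACS release should be checked before relying on this.

```python
Z_90 = 1.645  # assumed multiplier behind a 90-percent margin of error
Z_95 = 1.960  # assumed multiplier for a 95-percent interval


def standard_error(moe_90: float) -> float:
    """Convert a published 90-percent margin of error to a standard error."""
    return moe_90 / Z_90


def confidence_interval(estimate: float, moe_90: float, z: float = Z_90):
    """Return (lower, upper) bounds for the estimate at the level implied by z."""
    half_width = standard_error(moe_90) * z
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    # Hypothetical estimate of 50,000 with a published margin of error of 1,645.
    print(confidence_interval(50_000, 1_645))            # 90-percent interval
    print(confidence_interval(50_000, 1_645, z=Z_95))    # 95-percent interval
```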
45Available ACS data
- 2005 single-year ACS provides household
population only for areas with populations of
65,000 or more
- 2006 single-year ACS provides household
population and group quarters population for
areas with populations of 65,000 or more
- 2007 single-year ACS provides household
population and group quarters population for
areas with populations of 65,000 or more by the
end of September
- 2007 three-year ACS provides household population
and/or group quarters population for areas with
populations of 20,000 or more by the end of
December
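The population thresholds determine which ACS estimates exist for a given area. A minimal Python sketch of that lookup, using the 65,000 and 20,000 thresholds listed above and the earlier statement that 5-year cumulations cover all communities; the function name and product labels are invented for this illustration.

```python
def acs_products(population: int) -> list:
    """List the ACS estimate types published for an area of the given population."""
    products = []
    if population >= 65_000:
        products.append("1-year")  # single-year estimates
    if population >= 20_000:
        products.append("3-year")  # three-year cumulations
    products.append("5-year")      # five-year cumulations cover all areas
    return products


if __name__ == "__main__":
    for population in (500_000, 25_000, 5_000):
        print(population, acs_products(population))
```

The year each product first became available is not modeled here.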
46Available Census Data at CSSCR
- 1980 census data
- STF1, STF3 (raw data)
- 1990 census data
- STF1, STF2, STF3, STF4, 1% PUMS, 5% PUMS
- 2000 census data
- SF1, SF2, SF3, SF4, 1% PUMS, 5% PUMS
- 2005 ACS
- ACS SF, 5% PUMS
- 2006 ACS
- ACS SF, 5% PUMS
47Census CDs (GeoLytics)
- CensusCDMap (run ocensus3.bat to access)
- US 1990 Census, 1990 Estimates, 2004
projections, Consumer Expenditures, Time series
and Maps.
- CensusCD Blocks (run occdblock.bat to access)
- Demographic data and boundaries for 7
million blocks from STF 1B and PL94-171 files.
- CensusCD 1980 (run occd1980.bat to access)
- US 1980 Census data from STF 1 and STF 3.
- StreetCD98 (run otiger.bat to access)
- Over 100 layers of map data from TIGER 98.
- Available in the lab
48Census CDs (GeoLytics)
- Census CD 1970
- Census CD 1980 Long Form in 2000 Areas
- Census CD 1980
- Census CD 1990 blocks
- Census CD 1990 Long Form
- Census CD 1990-2000
- Census CD 2000 blocks
- Census 2000 Redistricting
- Census CD SF1 Blocks
- NCDB Neighborhood Change Database
- Available in Room 601C