Title: Statistical Issues in Census Taking
1- Statistical Issues in Census Taking
- Introduction
- Census taking began in ancient times in Babylonia, China, Egypt, Palestine, and Rome.
- The word "census" comes from the Latin word "censere," meaning "to tax" or "to value."
2- We need current and accurate information to run our democracy. Our founding fathers recognized this when they made the census an integral part of the American Constitution.
- Today, census taking is an important statistical function of government in all countries. This function has steadily grown more complex as the needs for data have increased.
3- The US Bureau of the Census and other statistical agencies have contributed significantly to the development of statistical theory and methods, especially in probability sampling and statistical analysis.
- Sampling was introduced into the 1950 census for collecting socioeconomic data and for evaluating the quality of census data.
- Sampling may play a bigger role in future census taking.
4- Methods of census taking in the history of statistics
- 1. Enumeration method (US Constitution, Article 1, Section 2)
- "Representatives and direct Taxes shall be apportioned among the several States which may be included within this Union, according to their respective Numbers, which shall be determined by adding to the whole Number of free Persons, including those bound to Service for a Term of Years, and excluding Indians not taxed, three-fifths of all other Persons. The actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States, and within every subsequent Term of ten Years, in such Manner as they shall by Law direct."
- The Constitution was adopted in 1787 and the first census was taken in 1790.
5- 2. Estimation method
- Laplace proposed this method of estimating population in 1786 and employed it in 1802 (Stigler, History of Statistics, pp. 163-164).
- Take a census in a few carefully selected communities and determine the ratio of population to births.
- Multiply this ratio by the number of births in France in the past year, taken from the birth registers (which were considered to be quite accurate), to estimate the total population.
- This is a forerunner of modern "ratio estimation" methods.
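The two steps above can be sketched in a few lines of Python. This is only an illustration of the ratio idea: the function name `ratio_estimate` and all figures are hypothetical and are not Laplace's data.

```python
def ratio_estimate(sample_populations, sample_births, total_births):
    """Estimate total population as (sample population / sample births) * total births."""
    if len(sample_populations) != len(sample_births):
        raise ValueError("each sampled community needs a population and a birth count")
    # Pooled ratio of population to births across the sampled communities.
    ratio = sum(sample_populations) / sum(sample_births)
    return ratio * total_births, ratio


# Hypothetical figures for three sampled communities (made up for illustration).
populations = [28_000, 14_500, 42_300]
births = [1_000, 500, 1_500]

# Hypothetical national birth count from the registers.
estimate, ratio = ratio_estimate(populations, births, total_births=1_000_000)
print(f"population per birth: {ratio:.3f}")
print(f"estimated total population: {estimate:,.0f}")
```

The estimate is only as good as the assumption that the population-to-birth ratio in the sampled communities resembles the ratio in the country as a whole.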
6- Uses of census data
- 1. Reapportionment of political representation
- Only a simple head count is needed. As a result of the 1990 census, 8 states gained and 13 states lost at least one congressional seat.
- 2. Statistical uses (information about the state)
- James Madison urged the first Congress to collect additional information about the people in the first census as a guide to future legislation. He proposed that white people be classified by gender and white males by age, and that a count be made of people employed in each occupation.
7- The first Census Act of 1790 specified collection of data on the name of the head of the family and the number of persons in each household in the following categories: free white males 16 years old and upward; free white males under 16 years; free white females; all other free persons; and slaves. Madison's suggestion relating to occupational information was deleted and did not appear again until 1820.
8- Public health agencies rely on census
statistics for their planning and evaluation
activities. Economic and social policy analysis
utilizes census information. Researchers in
demography and epidemiology also make use of
census data. Businesses make extensive use of
census data for formulating their marketing
strategies and organizing their future plans.
9- Problems with past censuses
- 1. Costs have escalated dramatically.
- The 1990 census cost $2.6 billion, which represents an increase of 65% (in constant 1990 dollars) over the $1.67 billion spent in 1980. GAO projected $4.8 billion for the 2000 census.
10- 2. Accuracy has not improved.
- Coverage of the 1990 census got worse: the net undercount was estimated as 2.1% (5.3 million persons) by the Post Enumeration Survey (PES), and 1.8% (4.7 million persons) by demographic analysis.
11- The 4.4% difference in the 1990 undercount between Blacks and non-Blacks was the highest since the Bureau began estimating coverage in 1940.
- The 1990 census contained 14.1 million gross errors, including 9.7 million persons missed and 4.4 million persons double-counted.
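The distinction between gross error and net undercount can be checked with simple arithmetic on the figures quoted above. The sketch assumes that omissions and double counts offset one for one, so the net undercount is their difference; the variable names are illustrative.

```python
# Figures from the slide, in millions of persons.
missed_millions = 9.7          # persons omitted from the census
double_counted_millions = 4.4  # persons counted more than once

# Gross error adds both kinds of mistake; net undercount lets them cancel.
gross_error = missed_millions + double_counted_millions
net_undercount = missed_millions - double_counted_millions

print(f"gross errors: {gross_error:.1f} million")
print(f"net undercount: {net_undercount:.1f} million")
```

The net figure agrees with the 5.3 million persons estimated by the PES, which shows how a modest net undercount can hide a much larger number of individual errors.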
12- 3. Undercount adjustment not made
- Several unsuccessful lawsuits were filed to compel the Bureau to adjust the 1980 census. During the 1980s, the Bureau carried out a research program to improve its undercount estimation and adjustment procedures and designed a PES for 1990 in preparation for adjusting the 1990 census.
13- Public images of statistics
- 1. "Lies, damned lies, and statistics" (Mark Twain)
- 2. Statistics may be faked
- "In no event may sampling or other statistical procedures be used in determining the total population by states." (HR3589, sponsored by Congressman Thomas Petri, R-Wisconsin)
- "Sampling isn't counting: votes could be statistically manufactured by bureaucrats" (Wall Street Journal, editorial, May 22, 1997)
14- Preparing for future censuses
- 1. Statistical estimation will be an important part of census methodology
- The Bureau tested this statistical methodology extensively during the 1980s (Mississippi in 1986, Los Angeles in 1986, North Dakota in 1987, Missouri and Washington in 1988, and an extended PES in 1990).
- The Bureau worked with external advisory groups (National Academy of Sciences panels, ASA Blue Ribbon panel)
15- 2. Fundamental reform in the making
- Scrapping the long form in 2010, replaced by a monthly household survey designed to provide annually updated data (American Community Survey).
- Modifying the census to respond to social changes in the living arrangements of Americans, reexamining the de jure method of enumeration, and changing some of its residency rules.
16- Application of Sampling Methodology in Census
- Post Enumeration Survey Design - 1990
- 1. Sample design: single-stage, stratified cluster sampling
- Sampling units were block clusters (a block or a collection of blocks).
- 5,290 block clusters were taken, which contained about 380,000 people in 166,000 households.
17- Results of PES
- Occupied housing units: 143,818
- Interviews with household member: 134,808 (93.7%)
- Proxy interviews: 6,745 (4.7%)
- Non-interviews: 2,265 (1.6%)
18- Two samples are designated
- P-sample: the persons found by the PES in the selected clusters
- E-sample: the persons found by the census in the selected clusters
19- 2. Post-stratification
- Based on geography, race, place, housing tenure, age and sex; resulted in 1,392 post-strata
- 3. Estimation within each post-stratum
- Estimate the number of erroneous census enumerations
- Estimate the total population by the dual-system estimator by matching the P-sample and E-sample (see Chapter 14, Section 6, page 443)
20- Data preparation
- 1. Determine the matching status in the P-sample
- Matched with persons in the E-sample: computer match and manual match
- Non-match (not enumerated in census)
- Unresolved cases
- Non-Hispanic white 1.6%
- Black 2.5%
- Hispanic 2.5%
- Asian 2.0%
21- 2. Determine erroneous enumeration (follow-up of unmatched cases)
- Types of error: duplicates, fictitious enumerations, children born after Census Day, people who died before Census Day, people counted in the wrong location, insufficient information.
- Non-error (missed by PES)
- Unresolved cases
- Non-Hispanic white 0.7%
- Black 2.1%
- Hispanic 1.8%
- Asian 1.3%
22- Estimation procedure
- Allocate the P-sample and E-sample data to post-strata
- Impute unresolved cases by using a logistic regression model and weight adjustment.
23- a. Unresolved cases in P-sample
- Calculate the predicted probability of resolution
- The sample weights of resolved cases are inflated by a multiplicative factor
24- b. Unresolved cases in E-sample
- A similar adjustment is made as above
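The weight adjustment for unresolved cases can be sketched as follows. The multiplier used on the slide is not recoverable from the text, so this sketch assumes the standard inverse-probability form: each resolved case's weight is divided by its predicted probability of resolution, and unresolved cases drop out. The predicted probabilities would come from the logistic regression model; here they are supplied as hypothetical inputs, and the function and field names are illustrative.

```python
def inflate_resolved_weights(cases):
    """Return adjusted weights: resolved cases get weight / p, unresolved cases get 0."""
    adjusted = []
    for case in cases:
        p = case["p_resolved"]
        if not 0.0 < p <= 1.0:
            raise ValueError("predicted probability must be in (0, 1]")
        if case["resolved"]:
            # Assumed multiplier: inverse of the predicted probability of resolution.
            adjusted.append(case["weight"] / p)
        else:
            # Unresolved cases carry no weight after adjustment.
            adjusted.append(0.0)
    return adjusted


# Hypothetical cases; p_resolved stands in for a fitted logistic regression prediction.
cases = [
    {"weight": 100.0, "resolved": True, "p_resolved": 0.5},
    {"weight": 100.0, "resolved": True, "p_resolved": 1.0},
    {"weight": 100.0, "resolved": False, "p_resolved": 0.25},
]
adjusted_weights = inflate_resolved_weights(cases)
print(adjusted_weights)
```

Resolved cases that resemble hard-to-resolve cases (low predicted probability) receive the largest inflation, so they stand in for the unresolved cases they most resemble.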
- Estimation (dual-system estimator) uses the following quantities:
- weighted E-sample total
- weighted total of erroneous enumerations
- weighted P-sample total
- weighted total of P-sample matches
25- Estimated total population
- Net undercount
- Adjustment factor
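The formulas on this slide did not survive, so the following is a hedged sketch of one common simplified form of the dual-system estimator built from the four weighted totals listed above. It assumes that correct enumerations are the weighted E-sample total minus erroneous enumerations, that the coverage rate is the share of the P-sample matched to the census, that the net undercount rate is expressed relative to the estimated population, and that the adjustment factor is the ratio of the estimate to the census count. The function name, parameter names, and all figures are hypothetical.

```python
def dual_system_estimate(e_total, erroneous, p_total, p_matches, census_count):
    """Return (estimated population, net undercount rate, adjustment factor)."""
    if p_total <= 0 or p_matches <= 0 or census_count <= 0:
        raise ValueError("P-sample totals and census count must be positive")
    # Census enumerations judged correct after removing erroneous ones.
    correct_enumerations = e_total - erroneous
    # Estimated probability that a person was captured by the census.
    coverage_rate = p_matches / p_total
    n_hat = correct_enumerations / coverage_rate
    net_undercount_rate = (n_hat - census_count) / n_hat
    adjustment_factor = n_hat / census_count
    return n_hat, net_undercount_rate, adjustment_factor


# Hypothetical weighted totals for a single post-stratum.
n_hat, rate, factor = dual_system_estimate(
    e_total=10_000, erroneous=400, p_total=9_800, p_matches=9_310, census_count=10_000
)
print(f"estimated population: {n_hat:,.0f}")
print(f"net undercount rate: {rate:.2%}")
print(f"adjustment factor: {factor:.4f}")
```

The operational 1990 estimator included further terms (for example, for whole-person imputations), which this sketch leaves out.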
26- Adjustment and estimation procedure
- 1. Smoothing of adjustment factors in 1,392 post-strata by a Bayesian regression approach
- The "pre-smoothed" variance based on a Poisson distribution was used as a basis for smoothing.
- Much controversy attended the use of such models in the court debate.
- True uncertainty was difficult to assess
- Analytical solution vs. alternative design (the smallest post-stratum had only 8 persons).
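To give a feel for what smoothing does, here is a generic precision-weighted shrinkage sketch. It is not the Bureau's actual model: it assumes the regression predictions and the model variance are already known, and it simply pulls each raw adjustment factor toward its regression prediction, more strongly when the raw factor's sampling variance is large. All names and numbers are hypothetical.

```python
def smooth_factors(raw_factors, sampling_variances, regression_predictions, model_variance):
    """Shrink each raw factor toward its regression prediction."""
    smoothed = []
    for raw, variance, prediction in zip(raw_factors, sampling_variances, regression_predictions):
        # Weight on the raw factor falls as its sampling variance grows.
        weight_on_raw = model_variance / (model_variance + variance)
        smoothed.append(weight_on_raw * raw + (1.0 - weight_on_raw) * prediction)
    return smoothed


# Hypothetical inputs for two post-strata: one noisy, one precise.
raw = [1.10, 1.02]
variances = [0.0009, 0.0001]
predictions = [1.03, 1.03]

smoothed = smooth_factors(raw, variances, predictions, model_variance=0.0003)
print([round(value, 4) for value in smoothed])
```

The noisy first factor moves most of the way to its prediction, while the precise second factor barely moves. This dependence on the assumed variances is one reason the "pre-smoothed" variance estimates attracted scrutiny.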
27- 2. Synthetic estimation
- Census counts in all areas were adjusted within post-strata
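A minimal sketch of synthetic estimation, assuming each post-stratum has a single adjustment factor that is applied to that post-stratum's census count in every area, however small. The post-stratum labels, factors, and counts are hypothetical.

```python
def synthetic_estimate(area_counts, adjustment_factors):
    """Adjusted count for one area: sum over post-strata of count * factor."""
    return sum(
        count * adjustment_factors[stratum]
        for stratum, count in area_counts.items()
    )


# Hypothetical adjustment factors for two post-strata.
factors = {"A": 1.05, "B": 1.01}

# Hypothetical census counts for one block, broken down by post-stratum.
block_counts = {"A": 200, "B": 300}

adjusted_block_total = synthetic_estimate(block_counts, factors)
print(f"census count: {sum(block_counts.values())}")
print(f"adjusted count: {adjusted_block_total:.1f}")
```

The method assumes that undercount rates are homogeneous within a post-stratum across all areas, which is the assumption behind the heterogeneity criticism of the post-strata.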
28- Criticisms of methods
- 1. The smoothing model was unjustified
- 2. The post-strata were possibly too heterogeneous, e.g., 12 age-sex groups: 0-9, 10-19, 20-29, 30-44, 45-64, 65+
- Later reduced to 7 age-sex groups: 0-17 (both sexes), 18-29 (male, female), 30-49 (male, female), 50+ (male, female); the revision produced 357 post-strata.
29- 3. Possible biases in estimation (bias in the address file used on Census Day, matching error, coding error, whole-person imputation, error in computer editing).
- Some modifications have been introduced: new manual matching, modification of the matching rules, and correction of the computer editing routine.
30- References
- General historical
- 1. Desrosieres, A. (translation by Naish, C.) (1998), The Politics of Large Numbers: A History of Statistical Reasoning, Cambridge, Mass.: Harvard University Press.
- 2. Stigler, S.M. (1986), The History of Statistics: The Measurement of Uncertainty before 1900, Cambridge, Mass.: Harvard University Press.
31- Non-technical
- 1. Edmonston, B., and Schultze, C. (1994), Modernizing the U.S. Census, Washington, DC: National Academy Press.
- 2. Steffey, D.L., and Bradburn, N.M. (1994), Counting People in the Information Age, Washington, DC: National Academy Press.
- 3. Duncan, J.W., and Shelton, W.C. (1992), "U.S. Government Contribution to Probability Sampling and Statistical Analysis," Statistical Science 7: 320-338.
32- Technical
- 1. Mulry, M.H., and Spencer, B.D. (1991), "Total Error in PES Estimates of Population," J of American Statistical Association 86(416): 839-863.
- 2. Hogan, H. (1993), "The 1990 Post-Enumeration Survey: Operations and Results," J of American Statistical Association 88(423): 1047-1060.
- 3. Breiman, L. (1994), "The 1991 Census Adjustment: Undercount or Bad Data?" Statistical Science 9(4): 458-537.