Title: Research Methods in Economics
1Research Methods in Economics
- ECO 4451
- Sampling and Statistical Testing
2Sampling Terminology
- Population
- The complete set of items of interest
- Population element
- An individual member of population
- Census
- A complete enumeration of all elements in the population
- Sample
- A subset of the population selected for investigation
3Terminology (Cont)
- Frame
- Population frame: list of all elements in the population
- Sample frame: list of elements from which the sample will be drawn
4Why Sample (not census)?
- Cost
- A well-designed probability sample is sufficiently accurate for most purposes
- Attempting a complete census can sometimes decrease accuracy
- Destruction of sample units (when measuring an item destroys it, a census is not feasible)
5Why Sample (cont)?
- But sampling introduces error in that it is
virtually impossible for a sample to perfectly
represent the population from which it was drawn.
- Two categories of errors
- Non-sampling error
- Sampling error
6Representative?
- How well does the sample represent the population?
[Diagram: Population (parameters) and Sample (statistics), linked by estimation]
7What's a population?
- Technically the population is the complete set of elements of interest.
- For example, in a study of corporate profits, the population is the set of profits of all corporations.
- We think of univariate, bivariate or multivariate populations.
- If we are interested in whether profits are related to CEO compensation, we have a bivariate population where the elements are the pairs of profits and compensation.
8What makes a good sample?
- It must be representative of the population.
- Basically this means it must contain the same variations that exist in the population.
- Estimators based on the sample must be valid.
- Validity depends on
- Accuracy
- Precision
9Accuracy is
- The degree to which bias is absent from the estimator.
- To have accuracy
- Overestimates and underestimates must balance out in repeated sampling.
10Precision
- Is low sampling error.
- Repeated samples would yield similar estimates.
- Is measured by the standard error of estimate, a
type of standard deviation measurement we will
discuss later.
11Errors from Investigating a Sample (rather than a
census)
- Nonsampling (systematic) error
- Results from some imperfection in research design or mistakes in execution of design.
- Sampling frame error
- Non-response bias
- Response or recording error
12Systematic (Nonsampling) Errors
- Sampling frame error
- Some population elements not represented in sampling frame
- Non-response error
- When results are affected because some elements selected into sample do not respond or are not measured
- Response or recording error
- Errors in making or recording responses or measurements
13Errors from Sample rather than census (Cont)
- Sampling (random) error
- Difference between sample statistic and population parameter that results from chance variation in elements selected for inclusion in sample.
- Two determinants of sampling error
- Homogeneity (smaller sampling error) vs. heterogeneity (larger sampling error) of population
- Sample size (larger sample reduces sampling error)
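A small simulation can illustrate the two determinants of sampling error. This is only a sketch: the normal populations, the standard deviations (1 and 5), the sample sizes (25 and 400), and the function name `spread_of_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

def spread_of_sample_means(population_sd, n, repetitions=2000, seed=0):
    """Standard deviation of the sample mean across repeated samples.

    Draws `repetitions` samples of size n from a normal population with
    mean 0 and the given standard deviation (an illustrative assumption).
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, population_sd) for _ in range(n))
        for _ in range(repetitions)
    ]
    return statistics.stdev(means)

# Homogeneous population (small spread) vs. heterogeneous population
# (large spread), with the same sample size.
homogeneous = spread_of_sample_means(population_sd=1.0, n=25)
heterogeneous = spread_of_sample_means(population_sd=5.0, n=25)

# Same heterogeneous population, small vs. large sample.
small_sample = spread_of_sample_means(population_sd=5.0, n=25)
large_sample = spread_of_sample_means(population_sd=5.0, n=400)

print(f"homogeneous, n=25:    {homogeneous:.3f}")
print(f"heterogeneous, n=25:  {heterogeneous:.3f}")
print(f"heterogeneous, n=400: {large_sample:.3f}")
```

The spread of the sample means should come out larger for the heterogeneous population and smaller for the larger sample.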
14Errors
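- Target Population
- Sampling Frame
- Planned Sample
- Actual Sample
Sampling frame error: gap between the target population and the sampling frame
Sampling error: gap between the sampling frame and the planned sample
Nonresponse error: gap between the planned sample and the actual sample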
15Stages in the Selection of a Sample
Define the target population
Select a sampling frame
Determine if a probability or nonprobability
sampling method will be chosen
Plan procedure for selecting sampling units
Determine sample size
Select actual sampling units
Conduct fieldwork
16Sampling Units
- A single element or group of elements subject to selection in sample.
- When sampling occurs in one stage, the elements selected in the sample are the sampling units.
- Example: simple random sample of college students.
- In multi-stage sampling we distinguish
- Primary Sampling Units (PSU): first or top level
- Secondary Sampling Units (SSU): second level
- Tertiary Sampling Units (TSU): third level
17Sampling Units (Cont)
- Multi-stage sampling
- Primary, secondary, tertiary sampling units
- Example: first select a region (PSU), then colleges within the region (SSU), then students at the colleges (TSU).
18Two Major Categories of Sampling
- Probability sampling
- Known, nonzero probability for selecting any element from sampling frame
- This probability may be same or different for different elements.
- Sampling error can be estimated
- Nonprobability sampling
- Probability of selecting any particular element of population is unknown
- Sampling error is unknown
19Nonprobability Sampling
- Convenience
- Judgment
- Quota
- Snowball
20Probability Sampling
- Simple random sample
- Systematic sample
- Stratified sample
- Cluster sample
- Multistage cluster sample
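The first three designs can be sketched in a few lines of Python. The sampling frame here is hypothetical (100 students tagged "lower" or "upper"), and proportional allocation across strata is an assumption; cluster designs are not shown.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 100 students, each tagged with a class level.
frame = [{"id": i, "year": "lower" if i < 60 else "upper"} for i in range(100)]
n = 10  # desired sample size

# Simple random sample: every element has the same chance of selection.
simple = rng.sample(frame, n)

# Systematic sample: random start, then every k-th element of the frame.
k = len(frame) // n
start = rng.randrange(k)
systematic = frame[start::k][:n]

# Stratified sample: split the frame into strata, then draw a simple random
# sample from each stratum in proportion to its share of the frame.
strata = {}
for unit in frame:
    strata.setdefault(unit["year"], []).append(unit)

stratified = []
for members in strata.values():
    share = round(n * len(members) / len(frame))
    stratified.extend(rng.sample(members, share))

print([u["id"] for u in simple])
print([u["id"] for u in systematic])
print([(u["id"], u["year"]) for u in stratified])
```

With this frame the stratified sample always contains 6 "lower" and 4 "upper" students, whereas the simple random sample only matches those shares on average.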
21What is the Appropriate Sample Design?
- Degree of accuracy and precision
- Resources available, including time.
- Advance knowledge of the population
- National versus local
- Need for statistical analysis
22Statistical Analysis of Samples
- Descriptive statistics
- Describe characteristics of sample
- Using sample statistics, like measures of central tendency and dispersion, to describe a sample of observations.
- Inferential statistics
- Make an inference about an unknown population from a sample
- Estimation and hypothesis testing.
23Descriptive Statistics
- Measures of central tendency
- Mean, median, mode
- Measures of dispersion
- Variance (or standard deviation), range
- Measures of frequency
- Counts, proportions
- Often presented in a table.
- Possibly separate by different groups or sub-samples, particularly if your paper involves a comparison between groups.
- Usually some brief discussion of the descriptive statistics is appropriate.
- Give the reader some idea about the type of units in the sample.
- Give the reader a feel for the scale of the data.
- Give information about the amount of variation.
- Inspection of descriptive statistics often
reveals the source of problems you may be having
with statistical procedures.
24Frequency Distribution of Deposits
Amount               Frequency (number of people making deposits in each range)
less than 3,000      499
3,000 - 4,999        530
5,000 - 9,999        562
10,000 - 14,999      718
15,000 or more       811
Total                3,120
25Percentage Distribution of Amounts of Deposits
Amount               Percent
less than 3,000      16
3,000 - 4,999        17
5,000 - 9,999        18
10,000 - 14,999      23
15,000 or more       26
Total                100
26Probability Distribution of Amounts of Deposits
Amount               Probability
less than 3,000      .16
3,000 - 4,999        .17
5,000 - 9,999        .18
10,000 - 14,999      .23
15,000 or more       .26
Total                1.00
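The percentage and probability distributions follow directly from the frequency distribution of deposits: each frequency is divided by the total of 3,120. A minimal sketch, with the dictionary layout and variable names chosen here for illustration:

```python
# Frequencies copied from the frequency distribution of deposits.
frequencies = {
    "less than 3,000": 499,
    "3,000 - 4,999": 530,
    "5,000 - 9,999": 562,
    "10,000 - 14,999": 718,
    "15,000 or more": 811,
}

total = sum(frequencies.values())

# Relative frequency of each range, as a probability and as a whole percent.
probabilities = {k: round(v / total, 2) for k, v in frequencies.items()}
percentages = {k: round(100 * v / total) for k, v in frequencies.items()}

for amount in frequencies:
    print(f"{amount:<18}{percentages[amount]:>4}%{probabilities[amount]:>7.2f}")
```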
27Measures of Central Tendency
- Mean - arithmetic average
- µ for the population mean, X̄ for the sample mean
- Median - midpoint of the distribution
- Mode - the value that occurs most often
28Population Mean
Average value in population.
29Sample Mean
Sum of the sample values divided by n, where n denotes the total number of elements in the sample.
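In symbols, the standard definitions of the two means can be written as follows. The notation is assumed here: X_i for an individual value, N for the population size, and n for the sample size.

```latex
\mu = \frac{\sum_{i=1}^{N} X_i}{N}
\qquad\qquad
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}
```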
30Daily Sales Calls by Salespersons
Salesperson    Number of sales calls
Mike           4
Patty          3
Billie         2
Bob            5
John           3
Frank          3
Chuck          1
Samantha       5
Total          26
Sample mean = 3.25, median = 3, mode = 3. Range = 4, interquartile range = 1, variance = 1.93, std. dev. = 1.39
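Most of these summary statistics can be reproduced with Python's standard `statistics` module. This is a sketch using the eight values from the slide; the interquartile range is left out because its value depends on which quartile convention is used.

```python
import statistics

calls = {"Mike": 4, "Patty": 3, "Billie": 2, "Bob": 5,
         "John": 3, "Frank": 3, "Chuck": 1, "Samantha": 5}
values = list(calls.values())

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)
value_range = max(values) - min(values)
variance = statistics.variance(values)  # sample variance: divides by n - 1
std_dev = statistics.stdev(values)      # square root of the sample variance

print(mean, median, mode, value_range, round(variance, 2), round(std_dev, 2))
```

The variance of 1.93 is obtained only when dividing by n - 1 = 7, so the slide reports the sample variance, not the population variance.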
31Measures of Dispersion or Spread
- Range
- Mean absolute deviation
- Variance
- Standard deviation
32Sales for Products A and B, Both Average 200
Product A    Product B
196          150
198          160
199          176
199          181
200          192
200          200
200          201
201          202
201          213
201          224
202          240
202          261
But sales of product B have greater variability.
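The difference in variability can be made concrete by computing the range and the sample standard deviation for each product. A short sketch using the twelve values per product from the slide:

```python
import statistics

product_a = [196, 198, 199, 199, 200, 200, 200, 201, 201, 201, 202, 202]
product_b = [150, 160, 176, 181, 192, 200, 201, 202, 213, 224, 240, 261]

mean_a, mean_b = statistics.mean(product_a), statistics.mean(product_b)
sd_a, sd_b = statistics.stdev(product_a), statistics.stdev(product_b)
range_a = max(product_a) - min(product_a)
range_b = max(product_b) - min(product_b)

print(f"A: mean {mean_a:.1f}, range {range_a}, std. dev. {sd_a:.1f}")
print(f"B: mean {mean_b:.1f}, range {range_b}, std. dev. {sd_b:.1f}")
```

Both means are about 200, but the range and standard deviation of product B are many times those of product A.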
33Low Dispersion Vs High Dispersion
[Figure: low dispersion. Frequency (1 to 5) plotted against value of variable (150 to 210)]
34Low Dispersion Vs High Dispersion
[Figure: high dispersion. Frequency (1 to 5) plotted against value of variable (150 to 210)]
35Deviation Scores
- The differences between each observation value
and the mean
36Average Deviation
37Mean Squared Deviation
38Variance = Mean Squared Deviation
39Sample Variance
40Variance
- The variance is given in squared units
- The standard deviation is the square root of
variance, and so is in original units.
41Population Standard Deviation
42Sample Standard Deviation
43Sample Standard Deviation
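For reference, the standard definitions behind these slide titles can be written as below. The notation is assumed here (X_i for an observation, N for the population size, n for the sample size). The sample variance divides by n - 1, which is consistent with the variance of 1.93 reported for the sales-call data.

```latex
\begin{align*}
\text{Deviation score:} \quad & d_i = X_i - \bar{X} \\
\text{Average deviation:} \quad & \frac{\sum_{i=1}^{n} (X_i - \bar{X})}{n} = 0 \\
\text{Mean squared deviation:} \quad & \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} \\
\text{Population variance:} \quad & \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \\
\text{Sample variance:} \quad & S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \\
\text{Population standard deviation:} \quad & \sigma = \sqrt{\sigma^2} \\
\text{Sample standard deviation:} \quad & S = \sqrt{S^2}
\end{align*}
```

Because positive and negative deviations cancel, the average deviation is always zero, which is why the deviations are squared before averaging.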
44Inferential Statistics
- Now instead of using statistics to describe a sample, we use sample statistics to make inferences about a population parameter.
- For example, we use the sample mean to estimate the value of the population mean.
- Then we may want to test some hypothesis about the population mean.
45Distributions
- Population distribution: frequency distribution of elements in the population
- Sample distribution: frequency distribution of elements in the sample
- Sampling distribution: theoretical distribution of a sample statistic in repeated sampling.
- Key concept in inferential statistics.
- Example: the sampling distribution of the sample mean is approximately normal in large samples.
46Population Distribution
[Figure: population distribution of x, marked with the mean µ and standard deviation σ]
47Sample Distribution
[Figure: sample distribution, marked with the sample mean X̄ and sample standard deviation S]
48Sampling Distribution
49The Normal Distribution
- Describes the probability distribution expected of many random occurrences.
- Bell shaped curve
- Almost all of its values are within plus or minus 3 standard deviations of the mean
- I.Q. is an example
50Normal Distribution
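[Figure: normal curve. On each side of the mean, 34.13% of the area lies within 1 standard deviation, 13.59% between 1 and 2 standard deviations, and 2.14% between 2 and 3 standard deviations]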
51Normal Curve IQ Example
[Figure: normal curve of IQ scores centered at 100, with axis marks at 70, 85, 100, 115 and 145]
52Standardized Normal Distribution
- Symmetrical about its mean
- Mean identifies highest point
- Infinite number of cases - a continuous distribution
- Total area under the curve (the total probability) equals 1.0
- Mean of zero, standard deviation of 1
53Standard Normal Curve
- The curve is bell-shaped and symmetrical
- About 68% of the elements will fall within 1 standard deviation of the mean
- About 95% of the elements will fall within approximately 2 (i.e., 1.96) standard deviations of the mean
- Almost all (>99%) of the elements will fall within 3 standard deviations of the mean
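These shares can be checked numerically with `statistics.NormalDist` from the Python standard library. A minimal sketch; the helper name `share_within` is introduced here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(z):
    """Probability that a standard normal value lies within z standard
    deviations of the mean (area under the curve between -z and +z)."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

within_1 = share_within(1.0)
within_196 = share_within(1.96)
within_3 = share_within(3.0)

print(f"within 1 sd:    {within_1:.4f}")
print(f"within 1.96 sd: {within_196:.4f}")
print(f"within 3 sd:    {within_3:.4f}")
```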
54A Standardized Normal Curve
[Figure: standardized normal curve with the z axis marked at -2, -1, 0, 1 and 2]
55The Standardized Normal is the Distribution of Z
56Population Standardized Scores
57Standardized Values
- Used to compare an individual value to the
population mean in units of the standard deviation
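A standardized value is the distance from the population mean divided by the population standard deviation, z = (x - µ) / σ. The sketch below applies this to IQ scores, assuming a population mean of 100 and a standard deviation of 15. These figures are an assumption for illustration, as is the function name `z_score`.

```python
def z_score(value, population_mean, population_sd):
    """Distance of a value from the population mean, in standard deviations."""
    return (value - population_mean) / population_sd

# Assumed IQ parameters for illustration.
IQ_MEAN, IQ_SD = 100.0, 15.0

z_130 = z_score(130, IQ_MEAN, IQ_SD)  # two standard deviations above the mean
z_85 = z_score(85, IQ_MEAN, IQ_SD)    # one standard deviation below the mean

print(z_130, z_85)
```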
58Linear Transformation of Any Normal Variable Into
a Standardized Normal Variable
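[Figure: a normal variable X with mean µ and standard deviation σ is mapped onto the standardized normal scale (-2, -1, 0, 1, 2). Sometimes the distribution is stretched; sometimes the distribution is shrunk.]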
59Central Limit Theorem
- The CLT says that if the sample size n is large,
- On average across repeated samples, the mean of sample means equals the population mean.
- The variance of the sample means across different samples equals the population variance divided by n.
- The distribution of sample means across different samples is approximately normal.
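The three statements can be illustrated with a simulation. This is a sketch under stated assumptions: the population is exponential with mean 1 and variance 1 (deliberately skewed, not normal), the sample size is n = 50, and 5,000 repeated samples are drawn with a fixed seed.

```python
import math
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable

n = 50              # size of each sample
repetitions = 5000  # number of repeated samples

# Exponential population with rate 1: mean 1, variance 1, strongly skewed.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(repetitions)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.variance(sample_means)
expected_variance = 1.0 / n  # population variance divided by n

# If the sample means are roughly normal, about 95% of them should lie
# within 1.96 standard errors of the population mean.
standard_error = math.sqrt(expected_variance)
share_within = sum(
    abs(m - 1.0) <= 1.96 * standard_error for m in sample_means
) / repetitions

print(f"mean of sample means:     {mean_of_means:.3f} (population mean 1)")
print(f"variance of sample means: {variance_of_means:.4f} (expected {expected_variance:.4f})")
print(f"share within 1.96 s.e.:   {share_within:.3f}")
```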
60Population Parameters and Sample Statistics
61Review of Simple Statistical Tests
- Many research questions can be addressed with very simple statistical tests.
- Often a good research design leads to a simple test, while a bad design requires complex statistical procedures for analysis of the data.
- Since many classic research questions imply a comparison between groups, the two-sample (or multiple-sample) tests are especially useful.
62Overview
- Tests concerning population means.
- One sample test.
- Two independent samples test.
- K > 2 independent samples.
- Matched samples.
- Tests concerning population proportions.
- One sample.
- More general tests.
63Examples of Difference between Means Tests
- Consider "Does the Death Penalty Deter Murder?" by Tammra Hunt.
- Compare murder rates (per 100,000) with and without the death penalty.
- Cross-section of states: is the murder rate higher on average in states without executions?
- Time series of states: did the murder rate fall in states implementing the death penalty when allowed by the Supreme Court?
- Panel data allows an approach based on differences-in-differences.
64Between-State Differences 2003
- Consider two populations, A (with death penalty)
and B (without death penalty).
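The null hypothesis is that the mean murder rate is the same in A and B; the alternative is that it is lower in A.
Note: the alternative is one-sided, because the research hypothesis is that the death penalty deters murder.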
65Testing the null hypothesis
- The test can be conducted by computing the t-statistic manually (note that one sample size is less than 30).
- Or it can be conducted automatically using statistical software, or Excel.
- In Excel, select Tools,
- Data Analysis,
- t-Test: Two-Sample Assuming Equal Variances
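The manual computation can be sketched as follows, assuming equal variances in the two populations. The murder rates below are invented purely for illustration and are not the 2003 data; the function name `pooled_t_statistic` is also introduced here.

```python
import math
import statistics

# Hypothetical murder rates per 100,000 (invented values, not real data).
with_death_penalty = [5.1, 6.3, 4.8, 7.2, 5.9, 6.6]  # sample from population A
without_death_penalty = [4.2, 3.9, 5.0, 4.6, 3.5]    # sample from population B

def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic assuming equal population variances.

    Returns the t statistic for (mean of A - mean of B) and its degrees
    of freedom, n_a + n_b - 2.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

t_stat, df = pooled_t_statistic(with_death_penalty, without_death_penalty)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

For the one-sided alternative that the mean is lower in A, the null hypothesis is rejected only if t is sufficiently negative relative to the critical value of the t distribution with these degrees of freedom.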
66Between-State Differences 2003
67Within-State Differences
- What if we consider within-state differences in murder rates?
- Idea is that if the death penalty deters murder, the murder rate should fall after the penalty is implemented.
- Take the states that adopted the death penalty after the Court allowed it.
- Get the mean murder rate for each of these states over the 4 years before and the 4 years after.
- Test whether the mean is lower after than it is before.
- This is a matched pairs test.
- Each state's before period is matched to its after period.
68Testing the within-state difference
- The test can be conducted manually.
- Compute the (After - Before) difference for each state with the death penalty.
- Get the sample mean and variance of these differences.
- Test the null hypothesis that the mean difference is zero against the alternative that it is negative.
- Or it can be conducted automatically using statistical software, or Excel.
- In Excel, select Tools,
- Data Analysis,
- t-Test: Paired Two Sample for Means
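The manual steps can be sketched in Python. The before and after rates below are invented for five hypothetical states and are not the data used in the slides.

```python
import math
import statistics

# Hypothetical mean murder rates per 100,000 for five states
# (invented values, not real data). Position i is the same state in both lists.
before = [6.0, 7.5, 5.2, 8.1, 6.4]
after = [5.5, 7.7, 4.9, 7.4, 6.0]

# Step 1: After - Before difference for each state.
differences = [a - b for a, b in zip(after, before)]

# Step 2: sample mean and standard deviation of the differences.
n = len(differences)
mean_diff = statistics.fmean(differences)
sd_diff = statistics.stdev(differences)

# Step 3: t statistic for the null hypothesis of a zero mean difference.
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {df}")
```

A sufficiently negative t, compared with the one-sided critical value of the t distribution with n - 1 degrees of freedom, would favor the alternative that the murder rate fell.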
69Within-State Differences