Data Collection and Sampling - PowerPoint PPT Presentation

Slides: 22
Provided by: zgo1

Transcript and Presenter's Notes


1
Data Collection and Sampling
  • Chapter 5

2
5.2 Methods of Collecting Data
  • The reliability and accuracy of the data affect
    the validity of the results of a statistical
    analysis.
  • The reliability and accuracy of the data depend
    on the method of collection.
  • Three of the most popular sources of statistical
    data are:
  • Published data
  • Observational studies
  • Experimental studies

3
Published Data
  • This is often a preferred source of data due to
    low cost and convenience.
  • Published data is found as printed material,
    tapes, disks, and on the Internet.
  • Data published by the organization that
    collected it is called PRIMARY DATA.
  • For example, data published by the US Bureau of
    the Census.
  • Data published by an organization other than
    the one that collected it is called
    SECONDARY DATA.
  • For example:
  • The Statistical Abstract of the United States
    compiles data from primary sources.
  • Compustat sells a variety of financial data
    tapes compiled from primary sources.

4
Observational and experimental studies
  • When published data is unavailable, one needs to
    conduct a study to generate the data.
  • An observational study is one in which
    measurements representing a variable of interest
    are observed and recorded without controlling any
    factor that might influence their values.
  • An experimental study is one in which
    measurements representing a variable of interest
    are observed and recorded while controlling
    factors that might influence their values.

5
Surveys
  • Surveys solicit information from people.
  • Surveys can be conducted by means of:
  • personal interview
  • telephone interview
  • self-administered questionnaire

6
Surveys
  • A good questionnaire must be well designed:
  • Keep the questionnaire as short as possible.
  • Ask short, simple, and clearly worded questions.
  • Start with demographic questions to help
    respondents get started comfortably.
  • Use dichotomous and multiple-choice questions.
  • Use open-ended questions cautiously.
  • Avoid leading questions.
  • Pretest a questionnaire on a small number of
    people.
  • Think about the way you intend to use the
    collected data when preparing the questionnaire.

7
5.3 Sampling
  • Motivation for conducting a sampling procedure:
  • Costs.
  • Population size.
  • The possible destructive nature of the sampling
    process.
  • The sampled population and the target population
    should be similar to one another.

8
5.4 Sampling Plans
  • We introduce three different sampling plans:
  • Simple random sampling
  • Stratified random sampling
  • Cluster sampling

9
Simple Random Sampling
  • In simple random sampling, all samples of the
    same size are equally likely to be chosen.
  • To conduct simple random sampling:
  • assign a number to each element of the chosen
    population (or use numbers already assigned),
  • randomly select the sample numbers (members),
    using a random numbers table or a software
    package.
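
The two steps above can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function name `simple_random_sample` and its `seed` parameter are assumptions, and the standard library's `random.sample` stands in for the random numbers table or software package.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw a simple random sample of element numbers 1..population_size.

    Hypothetical helper for illustration; `seed` only makes runs repeatable.
    """
    rng = random.Random(seed)
    # Step 1: assign a number to each element of the population.
    element_numbers = range(1, population_size + 1)
    # Step 2: pick sample_size distinct numbers; every subset of that
    # size is equally likely to be chosen.
    return sorted(rng.sample(element_numbers, sample_size))

print(simple_random_sample(population_size=20, sample_size=5, seed=1))
```

Because `random.sample` draws without replacement, no element number can appear twice in the result.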

10
Simple Random Sampling
  • Example 5.1
  • A government income-tax auditor is responsible
    for 1,000 tax returns.
  • The auditor will randomly select 40 returns to
    audit.
  • Use Excel's random number generator to select
    the returns.
  • Solution
  • We generate 50 numbers between 1 and 1,000 (we
    need only 40 numbers, but the extras can be used
    if duplicate numbers are generated).
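
The same solution can be approximated outside Excel. The sketch below follows the approach described in the example (generate 50 numbers, keep the first 40 distinct ones); the function name `select_returns`, its parameters, and the fallback loop are assumptions added for illustration.

```python
import random

def select_returns(n_returns=1000, n_generated=50, n_needed=40, seed=None):
    """Generate extra random numbers and keep the first distinct ones."""
    rng = random.Random(seed)
    # Generate 50 numbers between 1 and 1,000; duplicates are possible.
    generated = [rng.randint(1, n_returns) for _ in range(n_generated)]
    selected = []
    for number in generated:
        if number not in selected:  # skip duplicate numbers
            selected.append(number)
        if len(selected) == n_needed:
            break
    # Assumed fallback: if more than 10 duplicates occurred, keep drawing.
    while len(selected) < n_needed:
        number = rng.randint(1, n_returns)
        if number not in selected:
            selected.append(number)
    return selected

returns_to_audit = select_returns(seed=2024)
print(len(returns_to_audit), returns_to_audit[:5])
```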

11
Simple Random Sampling
Round-up X(100):
383 101 597 900 885 959 15 408 864 139 2 46 ...
  • The auditor should select the 40 files numbered
    383, 101, ...

12
Stratified Random Sampling
  • This sampling procedure separates the population
    into mutually exclusive sets (strata) and then
    draws a simple random sample from each stratum.

13
Stratified Random Sampling
  • With this procedure we can acquire information
    about
  • the whole population
  • each stratum
  • the relationships among strata.

14
Stratified Random Sampling
  • There are several ways to build the stratified
    sample. For example, keep each stratum's share
    of the sample equal to its share of the
    population.
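
Proportional allocation can be worked through in a few lines of Python. The sketch below is an illustration only: the population counts per occupation are hypothetical (they are not taken from the slides), and the largest-remainder rounding is one assumed way to make the stratum sample sizes add up to the total.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their population sizes."""
    total = sum(stratum_sizes.values())
    exact = {name: sample_size * size / total
             for name, size in stratum_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    shortfall = sample_size - sum(allocation.values())
    # Give the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - allocation[n],
                          reverse=True)
    for name in by_remainder[:shortfall]:
        allocation[name] += 1
    return allocation

# Hypothetical population counts by occupation (assumed for illustration).
population = {"professional": 25_000, "clerical": 40_000, "blue-collar": 35_000}
print(proportional_allocation(population, sample_size=1000))
# {'professional': 250, 'clerical': 400, 'blue-collar': 350}
```

A simple random sample of the allocated size would then be drawn within each stratum.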

[Table: a sample of size 1,000 is to be drawn in proportion to each stratum's share of the population. Total: 1,000]

15
Cluster Sampling
  • Cluster sampling is simple random sampling of
    groups, or clusters, of elements.
  • This procedure is useful when:
  • it is difficult and costly to develop a complete
    list of the population members (making it
    difficult to develop a simple random sampling
    procedure),
  • the population members are widely dispersed
    geographically.
  • Cluster sampling may increase sampling error
    because members of a cluster are likely to be
    similar to one another.
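
A minimal sketch of cluster sampling, assuming a hypothetical population of city blocks and households (neither appears in the slides). Only a list of clusters is needed, not a list of every population member; the function name `cluster_sample` is likewise an assumption.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select n_clusters whole clusters at random; return them and their members."""
    rng = random.Random(seed)
    # Simple random sample of cluster names, not of individual members.
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical population: 6 city blocks with 4 households each.
blocks = {
    f"block_{b}": [f"household_{b}_{h}" for h in range(1, 5)]
    for b in range(1, 7)
}
chosen, households = cluster_sample(blocks, n_clusters=2, seed=7)
print(chosen, len(households))
```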

16
5.5 Sampling and Non-sampling Errors
  • Two major types of errors can arise when a
    sampling procedure is performed.
  • Sampling Error
  • Sampling error refers to differences between the
    sample and the population that arise because of
    the specific observations that happen to be
    selected.
  • Sampling error is expected to occur whenever a
    statement about the population is based on a
    sample.
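
Sampling error can be made concrete with a small simulation. The sketch below uses a synthetic, hypothetical income population (the figures are assumptions, not data from the slides) and measures sampling error as the sample mean minus the population mean.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical population of 10,000 incomes (synthetic, illustration only).
population = [rng.gauss(50_000, 12_000) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def sampling_error(sample_size):
    """Sample mean minus population mean for one simple random sample."""
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample) - population_mean

for n in (25, 400, 10_000):
    print(n, round(sampling_error(n), 2))
```

When the sample is the entire population (the last case), the error vanishes; for smaller samples it depends on which observations happen to be drawn.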

17
Sampling Errors
[Figure: population income distribution, marking the population mean (μ) and the sampling error]

18
Non-sampling Errors
  • Non-sampling errors occur because of mistakes
    made during the process of data acquisition.
  • Increasing the sample size will not reduce this
    type of error.
  • There are three types of non-sampling errors:
  • Errors in data acquisition,
  • Non-response errors,
  • Selection bias.
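
The claim that a larger sample does not reduce non-sampling error can be illustrated with selection bias. In this sketch the income population and the rule that only incomes above 30,000 can be reached are both hypothetical assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population of 50,000 incomes (synthetic, illustration only).
population = [rng.uniform(10_000, 100_000) for _ in range(50_000)]
true_mean = statistics.fmean(population)

# Assumed selection bias: members with income of 30,000 or less
# cannot be selected at all.
reachable = [income for income in population if income > 30_000]

def biased_estimate_error(sample_size):
    """Error of the sample mean when sampling only from reachable members."""
    sample = rng.sample(reachable, sample_size)
    return statistics.fmean(sample) - true_mean

for n in (100, 1_000, 10_000):
    print(n, round(biased_estimate_error(n)))
```

The error stays large as the sample grows, because every sample is drawn from the same unrepresentative part of the population.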

19
Data Acquisition Error
[Figure: population and sample, with the sampling error and the data acquisition error marked]

20
Non-Response Error
[Figure: population and sample. No response from part of the population may lead to biased results in the sample.]

21
Selection Bias
[Figure: population and sample. When parts of the population cannot be selected, the sample cannot represent the whole population.]