Where do you get what you are telling us? - PowerPoint PPT Presentation

Slides: 129
Provided by: tdps
Transcript and Presenter's Notes

Title: Where do you get what you are telling us?


1
Where do you get what you are telling us?
2
  • Sources of Data
  • Where do you get the data? And from whom should
    the data be collected?
  • Clearly, your data should come from the
    participants that are both available to you and
    relevant to the question you are studying

3
  • There are times when we aren't very concerned
    about generalizing.
  • Perhaps we're just evaluating a program in a local
    agency and we don't care whether the program would
    work with other people in other places and at other
    times.
  • In that case, sampling and generalizing might
    not be of interest

4
  • "Who do you want to generalize to?" Or should it
    be "To whom do you want to generalize?"
  • In most applied social research, we are
    interested in generalizing to specific groups.
  • The group you wish to generalize to is often
    called the population
  • This is the group you would like to sample from
    because this is the group you are interested in
    generalizing to

5
Some examples of population
  • All secondary school principals in Malaysia
  • All primary school counselors in the state of
    Sabah
  • All students attending Kolej Tunku Kursiah during
    the academic year 2004-2005
  • All students in Mrs. Amin's Form Two class at SMKA

6
  • Let's imagine that you wish to generalize to
    urban homeless males between the ages of 30 and
    50 in Malaysia.
  • If that is the population of interest, you are
    likely to have a very hard time developing a
    reasonable sampling plan.
  • You are probably not going to find an accurate
    listing of this population, and even if you did,
    you would almost certainly not be able to mount a
    national sample across hundreds of urban areas.

7
  • So we probably should make a distinction between
    the population you would like to generalize to
    (the theoretical population or target population)
    and the population that will be accessible to you
    (the accessible population).
  • In this example, the accessible population might
    be homeless males between the ages of 30 and 50
    in six selected urban areas in Malaysia

8
  • A population can be defined as any set of
    persons/subjects having a common observable
    characteristic. It is the group from which you
    are able to randomly sample.
  • The target population is the group to which the
    researcher would like to generalize his or her
    results. This defined population has at least one
    characteristic that differentiates it from other
    groups.
  • The accessible population is the population to
    which the researcher has access.

9
  • Once you've identified the theoretical and
    accessible populations, you have to do one more
    thing before you can actually draw a sample --
    you have to get a list of the members of the
    accessible population.
  • The listing of the accessible population from
    which you'll draw your sample is called the
    sampling frame.
  • If you were doing a phone survey and selecting
    names from the telephone book, the book would be
    your sampling frame.
  • That wouldn't be a great way to sample because
    significant subportions of the population either
    don't have a phone or have moved in or out of the
    area since the last book was printed.

10
  • Finally, you actually draw your sample (using one
    of the many sampling procedures). The sample is
    the group of people who you select to be in your
    study.
  • Sampling refers to drawing a sample (a subset)
    from a population (the full set). It is the act,
    process, or technique of selecting a suitable
    sample, or a representative part of a population,
    for the purpose of determining parameters or
    characteristics of the whole population.
  • Samples are measured in order to make
    generalisations about populations. Ideally,
    samples are selected, usually by some random
    process, so that they represent the population of
    interest.

11
Topic of investigation: The effect of computer-assisted instruction on the reading achievement of first and second graders in Malaysia
Target population: All first and second graders in Malaysia
Accessible population: All first and second graders in Selangor
Sample: 384 first and second graders in the state of Selangor
12
  • The usual goal in sampling is to produce a
    representative sample (i.e., a sample that is
    similar to the population on all characteristics,
    except that it includes fewer people because it
    is a sample rather than the complete population).
  • In other words, a representative sample is a
    "mirror image" of the population from which it
    was selected.  

14
Why Sample?
  • First, it is usually too costly to test the
    entire population
  • The second reason to sample is that it may be
    impossible to test the entire population
  • The third reason to sample is that testing the
    entire population often produces error. Thus,
    sampling may be more accurate.

15
Why Sample?
  • The final reason to sample is that testing may be
    destructive.
  • For example, you probably would not want to buy a
    car that had the door slammed five hundred or a
    thousand times or had been crash tested. Rather,
    you probably would want to purchase a car that did
    not make it into either of those samples.

16
  • Up to here by six 27/08/05

17
How important is sampling?
  • Sampling is important with regard to external
    validity.
  • What is external validity?
  • The extent to which the results of the study can
    be generalized.
  • Two types: population generalizability and
    ecological generalizability

18
  • Next lecture begins here

19
Population generalizability
  • The degree to which the sample represents the
    population
  • Consider the usefulness of the study: findings
    from small, narrowly defined groups are not
    widely useful.
  • That is why representativeness is important. We
    want the results of the study to be as widely
    applicable as possible.
  • You must take appropriate action to make sure the
    findings can be generalized to the entire
    population.

20
Ecological generalizability
  • Refers to the degree to which the results of the
    study can be extended to other settings.
  • Example: results from urban schools may not hold
    for students from rural schools.
  • What we can do here is describe in detail the
    nature of the environment and the setting in
    which the study takes place.

21
  • You can't generalize the effectiveness of a
    method of teaching mathematics to the
    effectiveness of the method for all subjects.
  • Caution: even with the application of a powerful
    technique such as random sampling, it is quite
    difficult to overcome the problem of ecological
    generalizability.

22
Procedure for Drawing a Sample
  • 1. Define the population. Who is the population
    for each project? E.g., residents of Bandar
    Kajang or around Bandar Kajang. Remember, the
    population is the group you want to make
    inferences about from the sample. Define it
    carefully so it is clear who is in and who is out.
  • 2. Identify the sampling frame: the list of
    elements from which the sample may be drawn. It
    is sometimes referred to as the working
    population. E.g., to sample teachers, my
    sampling frame might be a list from the
    Education Department of Hulu Langat District.
  • 3. Select a sampling procedure.

23
  • Define the population and sample clearly. Why?
  • So that those who are interested can judge the
    generalizability of the findings.
  • Not only the population and sample: the
    sampling process has to be clearly defined too.
  • (This is one of the common weaknesses in research.)

24
  • In non-experimental research, you investigate
    relationships among variables in some pre-defined
    population.
  • Typically, you take elaborate precautions to
    ensure that you have achieved a representative
    sample of that population
  • You define your population, then do your best to
    randomly sample from it.

25
  • The two main types of sampling in quantitative
    research
  • Random sampling (probability sampling)
  • Nonrandom sampling (nonprobability sampling)
  • The former tends to produce representative samples.
  • The latter generally does not produce
    representative samples.

26
  • In probability samples, each member of the
    population has a known probability of being
    selected.
  • Elements are drawn by chance procedures
  • Probability methods include random sampling,
    systematic sampling, and stratified sampling.
  • In nonprobability sampling, members are selected
    from the population in some nonrandom manner.
    These include convenience sampling, judgment
    sampling, quota sampling, and snowball sampling.

27
  • Probability-based (random) samples
  • These samples are based on probability theory.
    Every unit of the population of interest must be
    identified, and all units must have a known,
    non-zero chance of being selected into the
    sample. Every member of the population has an
    equal chance of being selected
  • (Those selected and those who are not are
    similar to one another.) The idea here is
    representativeness.
  • How sure are we? That is why the sample has to be
    random and sufficiently large! There should be no
    bias. The researcher cannot consciously or
    unconsciously influence who will be selected.
28
  • The advantage of probability sampling is that
    sampling error can be calculated.
  • Sampling error is the degree to which a sample
    might differ from the population. It is the
    difference between the population parameter and
    the sample statistic.
  • (You can't escape sampling error unless
    you do a census.)

29
  • When inferring to the population, results are
    reported plus or minus the sampling error.
  • In nonprobability sampling, the degree to which
    the sample differs from the population remains
    unknown.
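
The idea of reporting a result plus or minus the sampling error can be sketched in Python. This is only an illustration: the population of test scores is made up, the sample is assumed to be a simple random sample, and the margin of error uses the common normal approximation (1.96 standard errors for roughly 95% confidence), none of which comes from the slides.

```python
import math
import random
import statistics

def margin_of_error(sample, z=1.96):
    """Approximate 95% margin of error for a sample mean.

    Assumes a simple random sample and the normal approximation;
    the finite population correction is ignored.
    """
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return z * standard_error

# Hypothetical population: 5,000 test scores (made up for illustration).
rng = random.Random(42)
population = [rng.gauss(60, 15) for _ in range(5000)]

# Draw a simple random sample of 400 without replacement.
sample = rng.sample(population, 400)

population_mean = statistics.mean(population)   # parameter (normally unknown)
sample_mean = statistics.mean(sample)           # statistic
sampling_error = sample_mean - population_mean  # difference between the two
moe = margin_of_error(sample)

print(f"Sample mean: {sample_mean:.2f} plus or minus {moe:.2f}")
print(f"Actual sampling error: {sampling_error:.2f}")
```

In a real study the population mean is not known, so only the margin of error can be reported; the actual sampling error is shown here because the population was simulated.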

30
  • Random sampling is the purest form of probability
    sampling. Each member of the population has an
    equal and known chance of being selected.
  • When there are very large populations, it is
    often difficult or impossible to identify every
    member of the population, so the pool of
    available subjects becomes biased
  • RANDOM: each element of the population has an
    equal chance of inclusion in the sample.
  • Begin with a SAMPLING FRAME: a list of every
    element in the population.

31
Random Sampling Techniques
  • Simple random sampling
  • The first type of random sampling is called
    simple random sampling.
  • It's the most basic type of random sampling.
  • It is an equal probability of selection method
    (EPSEM).
  • EPSEM means "everyone in the sampling frame has
    an equal chance of being in the final
    sample."
  • EPSEM is important because that is what produces
    "representative" samples (i.e., samples that
    represent the populations from which they were
    selected)!

32
Simple random sample
  • Each unit in the population is identified, and
    each unit has an equal chance of being in the
    sample. The selection of each unit is independent
    of the selection of every other unit. Selection
    of one unit does not affect the chances of any
    other unit.

34
[Figure: SIMPLE RANDOM]
35
  • Sampling experts recommend random sampling
    "without replacement" rather than random sampling
    "with replacement" because the former is a little
    more efficient in producing representative
    samples (i.e., it requires slightly fewer people
    and is therefore a little cheaper).

36
Advantages of the SRS method of sampling
  • Assures good representativeness of sample
    (particularly if large).
  • allows us to make generalizations/inferences. In
    fact, most of the statistical stuff we'll do
    later assumes that we've actually done a simple
    random sample, even if we haven't.
  • avoids biases that are possible in some of the
    other methods we'll talk about.

37
Disadvantages of SRS method
  • Have to have a list/sampling frame.
  • Have to number the list.
  • both are hard to do when the population is large.

38
How do you draw a simple random sample?
  • One way is to put all the names from your
    population into a hat and then select a subset
    (e.g., pull out 100 names from the hat).
  • Researchers typically use a computer program that
    randomly selects their samples. One program is
    available at the following address:
    http://www.randomizer.org/form.htm
  • You can use Excel to generate random numbers. You
    need as many randomly generated numbers as
    elements in your sample (n).

39
  • To use a computer program (sometimes called a
    random number generator) you must make sure that
    you give each of the people in your population a
    number. Then the program gives you a list of
    randomly selected numbers. Then you identify the
    people with those randomly selected numbers and
    try to get them to participate in your research
    study
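
The procedure just described (number everyone, let a program pick numbers, then look up the people) can be sketched with Python's standard `random` module. The frame of 250 numbered people and the function name `simple_random_sample` are assumptions made for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement.

    Every unit in the frame has the same chance of being selected.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: each person gets a number from 1 to 250.
frame = list(range(1, 251))

# Select 25 numbers; the people holding these numbers are then invited.
selected = simple_random_sample(frame, 25, seed=1)
print(sorted(selected))
```

Fixing the seed makes the draw reproducible, which helps when the sampling process has to be documented.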

40
  • Researchers often use a table of random numbers.
  • You pick a place to start, and then move in one
    direction (e.g., move down the columns).
  • Use the number of digits in the table that is
    appropriate for your population size (e.g., if
    there are 2500 people in the population then use
    4 digits).
  • Once you get the set of randomly selected
    numbers, find out who those people are and try to
    get them to participate in your research study.

41
  • For example, to select a sample of 25 people who
    live in your college dorm,
  • make a list of all the 250 people who live in
    the dorm.
  • Assign each person a unique number between 1
    and 250.
  • Then refer to a table of random numbers.
  • Starting at any point in the table, read across
    or down and note every number that falls between
    1 and 250.
  • Use the numbers you have found to pull the names
    from the list that correspond to the 25 numbers
    you found. These 25 people are your sample. This
    is called the table of random numbers method.

42
  • We need a complete sampling frame to be able to
    use this method.
  • If there is no complete frame, what should we do?
  • The weakness of this procedure: it requires a
    complete sampling frame.
  • It is the best sampling procedure, under certain
    assumptions.
  • The key to obtaining a random sample is to ensure
    that every member of the population has an equal
    and independent chance of being selected. So
    we use a table of random numbers (more
    scientific).

43
How to use table of random numbers?
  • Say you have a sample of 300 to be selected out
    of 3000 students.
  • Start anywhere on the table you have chosen
    (possibly at random).
  • Begin reading 4-digit numbers. (Why 4 digits?
    Because the final number, 3000, has four
    digits.)
  • Select numbers that do not exceed 3000 until the
    required sample size is reached.
  • What if you come across the same number twice?
    Skip the later one and go to the next number.
  • When selecting the numbers you can go either
    horizontally or downwards.
  • Do not use simple random sampling if you want
    certain subgroups to be in the sample.
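
The table-reading steps above can be mimicked in a short Python sketch. The "table" here is a stand-in generated by the computer (a long run of random 4-digit numbers), and the function name `select_from_table` is hypothetical; a printed table would be read in the same way.

```python
import random

def select_from_table(table, population_size, sample_size):
    """Read numbers in order, keeping those that fall in 1..population_size
    and have not been picked before, until the sample is complete."""
    chosen = []
    seen = set()
    for number in table:
        if 1 <= number <= population_size and number not in seen:
            seen.add(number)
            chosen.append(number)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("The table ran out before the sample was complete.")

# Stand-in for a printed table: a long run of random 4-digit numbers
# (0000 to 9999). A real table would be read down columns or across rows.
rng = random.Random(2005)
table = [rng.randrange(10000) for _ in range(5000)]

# 300 students out of 3000, as in the example above.
sample_ids = select_from_table(table, population_size=3000, sample_size=300)
print(sample_ids[:10])
```

Numbers above 3000 are simply passed over, and a repeated number is skipped, which matches the rules in the list.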

44
1 4923 5013 4916 4951 5109 4993 5055 5080 4986 4974
2 4870 4956 5080 5097 5066 5034 4902 4974 5012 5009
3 5065 5014 5034 5057 4902 5061 4942 4946 4960 5019
4 5009 5053 4966 4891 5031 4895 5037 5062 5170 4886
5 5033 4982 5180 5074 4892 4992 5011 5005 4959 4872
6 4976 4993 4932 5039 4965 5034 4943 4932 5116 5070
7 5011 5152 4990 5047 4974 5107 4869 4925 5023 4902
8 5003 5092 5163 4936 5020 5069 4914 4943 4914 4946
9 4860 4899 5138 4959 5089 5047 5030 5039 5002 4937
10 4998 4957 4964 5124 4909 4995 5053 4946 4995 5059
11 4948 5048 5041 5077 5051 5004 5024 4886 4917 5004
12 4958 4993 5064 4987 5041 4984 4991 4987 5113 4882
13 4968 4961 5029 5038 5022 5023 5010 4988 4936 5025
14 5110 4923 5025 4975 5095 5051 5035 4962 4942 4882
15 5094 4962 4945 4891 5014 5002 5038 5023 5179 4852
16 4957 5035 5051 5021 5036 4927 5022 4988 4910 5053
17 5088 4989 5042 4948 4999 5028 5037 4893 5004 4972
18 4970 5034 4996 5008 5049 5016 4954 4989 4970 5014
19 4998 4981 4984 5107 4874 4980 5057 5020 4978 5021
20 4963 5013 5101 5084 4956 4972 5018 4971 5021 4901
45
  • Can we select teachers in Malaysia by simple
    random sampling? What is the rationale?
  • If simple random sampling is not feasible,
    use cluster random, stratified random, or
    multi-stage random sampling.
  • If you want to make sure that certain subgroups
    that really do differ in many characteristics are
    included, use stratified random sampling.

46
Systematic sampling
  • Is often used instead of random sampling.
  • It is also called an Nth name selection
    technique. After the required sample size has
    been calculated, every Nth record is selected
    from a list of population members.
  • As long as the list does not contain any hidden
    order, this sampling method is as good as the
    random sampling method.
  • Its only advantage over the random sampling
    technique is simplicity. Systematic sampling is
    frequently used to select a specified number of
    records from a computer file.

47
  • Advantages of Systematic Sampling
  • Easier to do than SRS. You don't have to keep
    running back to the random number generator.
  • Disadvantages of Systematic Sampling
  • Still need a list/sampling frame that is
    numbered.
  • Might run into periodicity problem. If the list
    happened to be arranged by class (1,2,3,4), you
    might end up picking all first years. Have to
    make sure the list is not so structured.

48
  Systematic sampling
  • Systematic sampling involves three steps
  • First, determine the sampling interval, which is
    symbolized by "k," (it is the population size
    divided by the desired sample size).
  • Second, randomly select a number between 1 and k,
    and include that person in your sample.
  • Third, also include each kth element in your
    sample. For example if k is 10 and your randomly
    selected number between 1 and 10 was 5, then you
    will select persons 5, 15, 25, 35, 45, etc. When
    you get to the end of your sampling frame you
    will have all the people to be included in your
    sample.
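
The three steps can be sketched in Python as follows. The frame of 100 numbered people is made up, and the sketch assumes that the population size divides evenly by the sample size so that k is a whole number.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Select every kth element after a random start between 1 and k.

    k is the population size divided by the desired sample size; this
    sketch assumes the division is exact.
    """
    k = len(frame) // sample_size
    rng = random.Random(seed)
    start = rng.randint(1, k)  # random start, 1..k inclusive
    # Positions are 1-based in the text, so subtract 1 for list indexing.
    return [frame[position - 1] for position in range(start, len(frame) + 1, k)]

# Hypothetical frame of 100 numbered people; a sample of 10 gives k = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```

Only one random number is needed (the starting point); everything after that is fixed by the interval.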

49
  • For example,
  • To select a sample of 25 dorm rooms in your
    college dorm,
  • (1) Make a list of all the room numbers in the
    dorm. Say there are 100 rooms.
  • (2) Divide the total number of rooms (100) by
    the number of rooms you want in the sample (25).
    The answer is 4. This means that you are going to
    select every fourth dorm room from the list. But
    you must first consult a table of random numbers.
  • (3) Pick any point on the table, and read across
    or down until you come to a number between 1 and
    4. This is your random starting point. Say your
    random starting point is "3". This means you
    select dorm room 3 as your first room, and then
    every fourth room down the list (3, 7, 11, 15,
    19, etc.) until you have 25 rooms selected.

50
  • Systematic Sample/Skip Interval Sample
  • 1. Begin with a numbered sampling frame again.
  • 2. Choose your random number.
  • 3. Choose your SAMPLING INTERVAL: the number in
    the population divided by the number desired in
    the sample, or N/n.
  • 4. Select the element that corresponds to the
    random number. Then instead of picking a second
    random number, etc., count out the interval (N/n)
    and choose that element. When you get to the end
    of the list go back to the beginning until you
    have your full sample.
  • Note, if you get a fraction, round up. If you
    round down, you might not get to the end of the
    list, and those elements at the end will not have
    any probability of inclusion. With rounding up,
    you will always get through the whole list.

51
  •     This method is useful for selecting large
    samples, say 100 or more. It is less cumbersome
    than a simple random sample using either a table
    of random numbers or a lottery method. For
    example, you might have to sample files in a
    large filing cabinet. It is easier to select
    every 17th file than to pull out all the files
    and number them, etc.
  •     However, you must be aware of problems that
    can arise in systematic random sampling. If the
    selection interval matches some pattern in the
    list (e.g., each 4th dorm room is a single unit,
    where all the others are doubles) you will
    introduce systematic bias into your sample.
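
The dorm-room problem can be checked with a few lines of Python. The layout is assumed for illustration: 100 rooms, with every 4th room (4, 8, 12, ...) a single, and a sampling interval of 4.

```python
# Hypothetical dorm: 100 rooms, every 4th room (4, 8, 12, ...) is a single.
rooms = list(range(1, 101))

def is_single(room):
    return room % 4 == 0

interval = 4  # 100 rooms / 25 rooms wanted

# Share of singles in the sample for each possible random start (1 to 4).
share_by_start = {}
for start in range(1, interval + 1):
    sample = rooms[start - 1::interval]
    share_by_start[start] = sum(is_single(r) for r in sample) / len(sample)

population_share = sum(is_single(r) for r in rooms) / len(rooms)

print("Share of singles in the population:", population_share)
for start, share in share_by_start.items():
    print(f"Start at room {start}: share of singles = {share}")
```

Whatever the random start, the sample contains either no singles or only singles, while a quarter of the rooms in the population are singles. That is the systematic bias the slide warns about.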

52
Stratified sampling
  • is a commonly used probability method that can be
    superior to simple random sampling because it can
    reduce sampling error.
  • A stratum is a subset of the population whose
    members share at least one common characteristic.
    Examples of strata might be males and females,
    or managers and non-managers.
  • The researcher first identifies the relevant
    strata and their actual representation in the
    population.

53
  • Random sampling is then used to select a
    sufficient number of subjects from each stratum.
    "Sufficient" refers to a sample size large enough
    for us to be reasonably confident that the
    subsample represents its stratum.
  • Stratified sampling is often used when one or
    more of the strata in the population have a low
    incidence relative to the other strata.

54
Stratified random sampling
  • The population is divided into groupings (or
    strata), (e.g., divide it into the males and the
    females if you are using gender as your
    stratification variable).
  • Take a random sample from each group (i.e., take
    a random sample of males and a random sample of
    females). Put these two sets of people together
    and you now have your final sample.
  • Each unit in the population is identified, and
    each unit has a known, non-zero chance of being
    in the sample. This is used when the researcher
    knows that the population has sub-groups (strata)
    that are of interest.

55
  • For example, if you wanted to find out the
    attitudes of students on your campus about
    immigration, you may want to be sure to sample
    students who are from every region of the country
    as well as foreign students. Say your student
    body of 10,000 students is made up of: 8,000 from
    the Middle East, 1,000 from South East Asia, 500
    from Africa, 300 from the Indian Continent, and
    200 others.
  • Improves representativeness in terms of the
    stratification variables

56
Two different types of stratified sampling
  • Proportional stratified sampling
  • In proportional stratified sampling you must make
    sure the subsamples (e.g., the samples of males
    and females) are proportional to their sizes in
    the population.
  • Disproportional stratified sampling.
  • In disproportional stratified sampling, the
    subsamples are not proportional to their sizes in
    the population.

58
In a population you have 365 students:
219 female students (60%)
146 male students (40%)
From these you will select a stratified sample of:
66 female (60%)
44 male (40%)
59
[Figure: STRATIFIED]
60
Here is an example
  • Assume that your population is 75% female and 25%
    male. Assume also that you want a sample of size
    100 and you want to stratify on the variable
    called gender.
  • For proportional stratified sampling, you would
    randomly select 75 females and 25 males from the
    population.
  • For disproportional stratified sampling, you
    might randomly select 50 females and 50 males
    from the population.
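
The two allocations can be sketched in Python. The population of 1,000 people (750 female, 250 male) and the function names are assumptions for illustration; only the 75/25 split and the sample of 100 come from the example.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of the requested size from each stratum.

    strata: dict mapping stratum name -> list of units
    sizes:  dict mapping stratum name -> number of units to draw
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

def proportional_sizes(strata, total_sample_size):
    """Allocate the sample across strata in proportion to stratum size."""
    population_size = sum(len(units) for units in strata.values())
    return {
        name: round(total_sample_size * len(units) / population_size)
        for name, units in strata.items()
    }

# Hypothetical population of 1,000 people: 75% female, 25% male.
strata = {
    "female": [f"F{i}" for i in range(750)],
    "male": [f"M{i}" for i in range(250)],
}

proportional = proportional_sizes(strata, 100)  # 75 female, 25 male
disproportional = {"female": 50, "male": 50}    # equal subsamples

sample_a = stratified_sample(strata, proportional, seed=7)
sample_b = stratified_sample(strata, disproportional, seed=7)
print({name: len(units) for name, units in sample_a.items()})
print({name: len(units) for name, units in sample_b.items()})
```

With less tidy proportions the rounded subsample sizes may not add up exactly to the intended total, so they should be checked and adjusted by hand.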

61
  • If you select a simple random sample of 500
    students, you might not get any from the Midwest,
    South, or Foreign.
  • To make sure that you get some students from
    each group, you can divide the students into
    these five groups, and then select the same
    percentage of students from each group using a
    simple random sampling method.
  • This is proportional stratified random sampling.

62
  • However, you may still have too few of some types
    of students. Instead, you may divide students
    into the five groups and then select the same
    number of students from each group using a simple
    random sampling method.
  • This is disproportionate stratified random
    sampling. This allows you to have enough students
    in each sub-group so that you can perform some
    meaningful statistical analyses of the attitudes
    of students in each sub-group.
  • In order to say something about the attitudes of
    the total student population of the university,
    however, you will have to apply weights to the
    findings for each sub-group, proportional to its
    presence in the total student body.
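
The weighting step can be sketched as follows. The group sizes are taken from the student-body example given earlier; the mean attitude scores are invented purely to show the arithmetic, and `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(stratum_means, stratum_sizes):
    """Combine stratum means, weighting each by its share of the population."""
    population_size = sum(stratum_sizes.values())
    return sum(
        stratum_means[name] * stratum_sizes[name] / population_size
        for name in stratum_means
    )

# Group sizes for a student body of 10,000.
stratum_sizes = {
    "Middle East": 8000,
    "South East Asia": 1000,
    "Africa": 500,
    "Indian Continent": 300,
    "Others": 200,
}

# Made-up mean attitude scores from equal-sized subsamples of each group.
stratum_means = {
    "Middle East": 2.0,
    "South East Asia": 4.0,
    "Africa": 4.0,
    "Indian Continent": 5.0,
    "Others": 5.0,
}

unweighted = sum(stratum_means.values()) / len(stratum_means)
weighted = weighted_mean(stratum_means, stratum_sizes)

print(f"Unweighted mean of group means: {unweighted:.2f}")
print(f"Weighted estimate for all students: {weighted:.2f}")
```

Because the largest group dominates the student body, the weighted estimate sits much closer to that group's mean than the simple average of the five group means does.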

63
How do you divide a population into strata?
  • Level of education, type of occupation, religious
    affiliation, place of residence, and many more
  • For example, you want to study the attitudes of
    adolescents towards a certain issue (clothing
    patterns). You might compare those who live in
    small towns, those who live in big towns, and
    those who live in rural areas.
  • So here you have three sub-groups.
  • Another example. The study is about the
  • EFFECT OF YOUTH PARTICIPATION IN SCHOOL TO WORK
    PROGRAM ON THEIR OCCUPATIONAL SELF-EFFICACY,
    CAREER DECIDEDNESS, AND EMPLOYABILITY SKILLS.
  • How many sub-groups do we have here?
64
Cluster sampling
  • This is the most commonly used scientific
    sampling method in the social sciences, like
    opinion polling, etc.
  • Randomly select clusters, or pre-existing,
    natural groupings rather than individual type
    units in the first stage of sampling.
  • Use it when you don't have, or don't need, a
    sampling frame:
  • a list doesn't exist,
  • the list would be too hard to get,
  • or if the population is directly identifiable
    without a list (eg. name of the road/street in
    your area).

65
  • Cluster sampling views the units in a population
    not only as members of the total population
    but also as members of naturally occurring
    clusters within the population. For example, city
    residents are also residents of neighborhoods,
    blocks, and housing structures.

66
  • You can do cluster sampling when the elements of
    the population naturally "cluster" into
    identifiable patterns, like neighborhoods,
    organizations, etc.
  • The assumption is that individuals within a
    cluster will be fairly homogeneous. (Talk about
    housing area).
  • You have to come up with your clusters
    carefully!
  • 1. Take the whole population and divide it into a
    bunch of smaller clusters. Number the clusters.
  • 2. Do a simple random or systematic
    sample of the clusters.
  • 3. Divide the chosen clusters into smaller ones
    and number them.
  • 4. Repeat step 2, and so on, until you get to
    the individual elements in your sample.

67
[Figure: CLUSTER SAMPLING, showing clusters A to H and the labels D, F, A]
68
  • Cluster sampling is used in large geographic
    samples where no list is available of all the
    units in the population but the population
    boundaries can be well-defined.
  • For example, to obtain information about the drug
    habits of all high school students in a state,
    you could obtain a list of all the school
    districts in the state and select a simple random
    sample of school districts.
  • Then, within each selected school district,
    list all the high schools and select a simple
    random sample of high schools. Within each
    selected high school, list all high school
    classes, and select a simple random sample of
    classes.
  • Then use the high school students in those
    classes as your sample.

69
  • Cluster sampling must use a random sampling
    method at each stage. This may result in a
    somewhat larger sample than using a simple random
    sampling method, but it saves time and money.
  • It is also cheaper to administer than a statewide
    sample of high school seniors, because there are
    many fewer sites to obtain information from.

70
  • Take a large number of clusters to make the
    sample more representative.
  • It is better to sample many small clusters than
    a few large clusters, especially when there is a
    lot of variation in the population.
  • More information will be obtained, which gives
    more accurate estimates. For example:
  • If there is not much difference among the
    subjects within the clusters, a small number of
    clusters is sufficient.
  • The more clusters are selected, the more
    confident we can be in applying the findings to
    the population.

71
  • Advantage: it can be used when it is difficult or
    impossible to select a sample at random.
  • It does not require much time.
  • Weakness: you may select clusters that do not
    represent the population.
  • A common mistake: selecting only one cluster,
    especially when the number of subjects in that
    cluster is large. Remember that such a cluster
    does not represent the population. Note also that
    here we randomly select clusters, not individual
    study subjects. It is therefore wrong to make
    inferences to the population. IT IS DEFINITELY
    WRONG!!!! DO NOT DO THIS.

72
  • Advantages of Cluster Sampling method
  • Less costly.
  • Don't need a list.
  • At start everyone has an approximately equal
    chance of selection despite the number of steps
    involved.
  • Can be used when it is difficult or impossible
    to select a sample at random.
  • Does not require much time.

73
  • Disadvantages of the Cluster Sample
  • More possibility of introducing error (drawing
    the boundaries, etc.).
  • The risk of error increases with the number of
    steps involved.
  • You have to find a balance between the number of
    stages and the number you want in your final
    sample. For instance, we could get a sample of
    2000 Malaysians by picking 2000 clusters and one
    person from each, or we could pick 1000 each from
    2 clusters.
  • If the clusters aren't drawn well, the second
    method would be unrepresentative. But if the
    single person drawn from a cluster in the first
    method was weird, it wouldn't matter how good the
    clusters were.
74
  • Requires a larger sample size than simple random
    or stratified random sampling.
  • Within each cluster, units should be as
    heterogeneous as possible, and the clusters should
    be as similar to one another as possible.
75
Two types of cluster sampling
  • One-stage cluster sampling
  • To select a one-stage cluster sample, you first
    select a random sample of clusters. 
  • Then, second, you include in your final sample
    all of the individual units in the selected
    clusters
  • Two-stage cluster sampling
  • First, you take a random sample of clusters
    (i.e., just like you did in one-stage cluster
    sampling).
  • Second, you take a random sample of elements from
    each of the selected clusters (e.g., you might
    randomly select 10 students from each of the 15
    classrooms you selected in stage one).
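
Both variants can be sketched in Python. The population of 60 classrooms with 30 students each is made up; the figures of 15 classrooms and 10 students per classroom follow the example above. The function names are hypothetical.

```python
import random

def one_stage_cluster_sample(clusters, n_clusters, rng):
    """Randomly select clusters and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly select clusters, then randomly select units within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample

# Hypothetical population: 60 classrooms of 30 students each.
clusters = {
    f"class_{c:02d}": [f"class_{c:02d}_student_{s:02d}" for s in range(30)]
    for c in range(60)
}

rng = random.Random(11)
one_stage = one_stage_cluster_sample(clusters, 15, rng)      # 15 x 30 students
two_stage = two_stage_cluster_sample(clusters, 15, 10, rng)  # 15 x 10 students
print(len(one_stage), len(two_stage))
```

Note that only a list of classrooms is needed up front; a list of students is needed only for the classrooms that were selected.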

76
  • Multistage Sampling involves combinations of
    stratified and/or clustered and/or simple random
    samples until one reaches the desired unit of
    analysis
  • Allows us to get a random sample without a
    sampling frame. Example: a sample from East
    Malaysia
  • Cluster: urban / rural, then randomly select
    communities
77
  • When random sampling is not possible:
  • Describe the sample as thoroughly as possible so
    that interested parties can judge the
    generalizability for themselves
  • Do replication
78
  • Random assignment
  • You start with a set of people (that may very
    well be a convenience sample) and then randomly
    divide that set of people into two or more
    subsets. You are taking a set of people and
    "assigning" them to two or more groups.
  • When random assignment is done, the resulting
    subsets will be "mirror images" of each other
    (except for chance differences).
  • For example, if you randomly assign a convenience
    sample of 150 people to three groups of 50
    people, the three groups will be "equivalent"
    (apart from chance differences) on all known and
    unknown variables. In short, random assignment
    generates similar groups that can be used in
    strong experimental research designs
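
As a rough sketch of the 150-person example, random assignment can be done by shuffling the list of participants and dealing them into groups. The participant labels, the function name, and the seed are hypothetical.

```python
import random

def randomly_assign(people, n_groups, rng):
    """Shuffle the people, then deal them into n_groups like a deck of cards."""
    shuffled = list(people)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical convenience sample of 150 people.
participants = [f"person_{i:03d}" for i in range(150)]
groups = randomly_assign(participants, 3, random.Random(42))
print([len(g) for g in groups])  # [50, 50, 50]
```

Every participant lands in exactly one group, and which group is decided by chance alone. Note that this step does nothing to make the original 150 people representative of any population.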

79
Non-probability (non-random) samples
  • These samples focus on volunteers, easily
    available units, or those that just happen to be
    present when the research is done.
  • Non-probability samples are useful for quick and
    cheap studies, for case studies, for qualitative
    research, for pilot studies, and for developing
    hypotheses for future research.

80
Nonrandom Sampling Techniques
81
Convenience sample
  • Also called an "accidental" sample. "Man on the
    street" interviews are a typical example. The
    researcher selects units that are convenient,
    close at hand, easy to reach, etc. Example:
    choose whoever walks by. If you looked at
    people's clothes at the bus terminal, you have
    done convenience sampling.

82
  • Advantages of convenience samples
  • easy
  • cheap
  • some possibility of substantive inference, if you
    can justify it, but not statistical inference.
    Example: many psychology studies are done with
    college students as subjects.
  • If the researcher can make the case that the
    college students are like other people in the
    relevant characteristics, then it's OK, but you
    can't use the concept of statistical inference.
  • Disadvantages of ALL non-probability samples
  • You can't do statistical inference.

83
Purposive sample
  • The researcher selects units with some purpose in
    mind, or with a specific set of characteristics
    needed for the research study. For example,
    students who live in dorms on campus, or experts
    on urban development.
  • In practice, it involves selecting a convenience
    sample from the part of the population that has
    those characteristics.

84
Judgment sampling
  • is a common nonprobability method.
  • The researcher selects the sample based on
    judgment (according to specific criteria of
    interest)
  • This is usually an extension of convenience
    sampling. For example, a researcher may decide to
    draw the entire sample from one "representative"
    city, even though the population includes all
    cities.
  • When using this method, the researcher must be
    confident that the chosen sample is truly
    representative of the entire population

85
  • Example of judgmental sampling: researcher A
    wanted to talk about environmental issues, so she
    sought out people who were really involved in
    environmental issues in Kedah. She didn't want to
    know about everyone's opinions on this issue,
    just about the activists. Or you might start with
    one person who fits the bill and ask for
    recommendations of other people like her. This is
    common in studies of political elites (ask a
    staffer to recommend some friends, etc.).

86
  • Quota sample: the researcher constructs quotas
    for different types of units.
  • Convenience sampling is then used to fill those
    quotas. A set of quotas might be as follows:
    interview a fixed number of shoppers at a mall,
    half of whom are male and half of whom are
    female.
  • Snowball sampling: this involves asking your
    participants to identify other potential
    participants with a specific set of
    characteristics, then asking the next set of
    participants you obtain the same question, and
    continuing this process until a sufficient sample
    size is obtained.
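
The mall example of quota sampling can be sketched as follows. It is only an illustration: the stream of shoppers, the quota of five per sex, and the function name are invented for the example. Units are taken in whatever order they happen to arrive (the convenience part) until each quota is full.

```python
def quota_sample(stream, quotas, key):
    """Accept units in arrival order until every quota has been filled."""
    remaining = dict(quotas)
    sample = []
    for unit in stream:
        group = key(unit)
        if remaining.get(group, 0) > 0:
            sample.append(unit)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are full, stop interviewing
    return sample

# Hypothetical shoppers walking past, as (label, sex) pairs.
shoppers = [(f"shopper_{i}", "F" if i % 3 else "M") for i in range(60)]
sample = quota_sample(shoppers, {"M": 5, "F": 5}, key=lambda s: s[1])
print(len(sample))  # 10
```

Because selection within each quota is still by convenience, the result is a non-probability sample even though the sex split matches the quotas.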

87
  • It's been shown that the main factor in
    generalizability is not HOW MANY SAMPLE SUBJECTS,
    but rather THE PROCESS BY WHICH THEY WERE
    SELECTED
  • What sampling process will we use?
  • How "typical" or representative is the sample of
    this population?
  • If the sample is selected randomly, you're hoping
    you'll get a "good enough mix" on other
    contaminating variables or threats to validity so
    that they will cancel out (e.g., a mix of
    ethnicities, IQs, and so forth).
  • How "certain" can you be that the findings from
    the sample will hold true for the entire
    population? Especially if we didn't "study"
    (e.g., survey or interview) everyone?

88
How often is random sampling done?
  • The answer to this question depends on the
    research method being used. For example, when we
    use the experimental research method, random
    sampling is RARELY used.
  • However, when we use the survey method it is
    FREQUENTLY used. The difference is that
    experiments usually require some commitment on
    the part of the subjects, whereas in survey
    research the commitment is usually low.
  • Thus, when we randomly contact people in survey
    research we will usually get high levels of
    participation. When we design experiments we
    usually must rely on volunteers.

89
  • Start here for next lecture
  • The first three lectures do not have notes yet
  • Exam after break
  • Everybody should speak in English

90
BIAS AND ERROR IN SAMPLING
  • A sample is expected to mirror the population
    from which it comes. However, there is no
    guarantee that any sample will be precisely
    representative of that population. One of the
    most frequent reasons a sample is
    unrepresentative of its population is sampling
    error.
  • Chance may dictate that a disproportionate
    number of untypical observations will be made.
    For example, when testing fuses, the sample may
    contain a higher or lower proportion of faulty
    fuses than the real population proportion.
  • In practice, it is rarely known when a sample is
    unrepresentative and should be discarded.

91
  • Sampling error: the differences between the
    sample and the population that are due solely to
    the particular units that happen to have been
    selected
  • (that is, the difference between the result of a
    given sample and the result of a census conducted
    using identical procedures).
  • If we use the whole population, sampling error
    will not occur. When we use a sample, there are
    bound to be differences in characteristics
    between one unit and another.
  • When we use a sample, error is always present;
    only its size differs. Sampling error is caused
    by two main factors: (1) the sample size used and
    (2) the sampling method used.

92
  • For example, suppose that a sample of 100 women
    is measured and all are found to be taller than
    5.6 feet. It is very clear, even without any
    statistical proof, that this would be a highly
    unrepresentative sample leading to invalid
    conclusions.
  • The more dangerous error is the less obvious
    sampling error, against which nature offers very
    little protection. An example would be a sample
    in which the average height is overstated by only
    an inch or two, rather than by one foot, which is
    more obvious. It is the error that is not obvious
    that is of most concern.

93
SAMPLE 1
STUDENT NO.   SEX   SCHOOL   IQ
52            M     B        110
63            F     B         83
82            F     C        105
75            M     C        113
92            F     C         98
36            F     B        129
03            F     A        130
11            F     A        117
43            M     B        117
08            M     A        120
Mean IQ of population (100 students): 109.5
Mean IQ of sample (10 students): 112.2
94
SAMPLE 2
STUDENT NO.   SEX   SCHOOL   IQ
72            F     C        121
64            F     C        137
94            F     C         96
49            M     B        111
41            M     B        125
20            F     A        104
05            F     A        123
93            M     C         97
14            M     A        111
99            M     C         83
Mean IQ of population (100 students): 109.5
Mean IQ of sample 1 (10 students): 112.2
Mean IQ of sample 2 (10 students): 110.8
Mean IQ of samples 1 and 2 (20 students): 111.5
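
The sample means can be checked directly from the IQ scores listed in the two tables. The short sketch below uses only those scores and the stated population mean of 109.5; the variable names are illustrative.

```python
from statistics import mean

# IQ scores copied from the Sample 1 and Sample 2 tables.
sample_1 = [110, 83, 105, 113, 98, 129, 130, 117, 117, 120]
sample_2 = [121, 137, 96, 111, 125, 104, 123, 97, 111, 83]
population_mean = 109.5  # as stated on the slides

for label, scores in [
    ("Sample 1", sample_1),
    ("Sample 2", sample_2),
    ("Samples 1 and 2", sample_1 + sample_2),
]:
    m = mean(scores)
    # Sampling error here is simply sample mean minus population mean.
    print(f"{label}: mean = {m:.1f}, sampling error = {m - population_mean:+.1f}")
```

The pooled mean of the 20 scores works out to 111.5, which lies between the two individual sample means.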
95
  • TAKE ANOTHER SAMPLE OF 10 STUDENTS, SAMPLE 3

Mean IQ of population (100 students): 109.5
Mean IQ of sample 1 (10 students): 112.2
Mean IQ of sample 2 (10 students): 110.8
Mean IQ of samples 1 and 2 (20 students): 111.5
Mean IQ of sample 3 (10 students): 112.3
96
  • TAKE ANOTHER SAMPLE OF 10 STUDENTS, SAMPLE 4

Mean IQ of population (100 students): 109.5
Mean IQ of sample 1 (10 students): 112.2
Mean IQ of sample 2 (10 students): 110.8
Mean IQ of samples 1 and 2 (20 students): 111.5
Mean IQ of sample 3 (10 students): 112.3
Mean IQ of sample 4 (10 students): 104.2
Mean IQ of samples 1, 2, 3 and 4 (40 students): 109.6
97
  • There are two basic causes of sampling error.
    One is chance: unusual units in a population do
    exist, and there is always a possibility that an
    abnormally large number of them will be chosen.
    The main protection against this kind of error is
    to use a large enough sample.
  • The other is sampling bias: a tendency to favour
    the selection of units that have particular
    characteristics. Sampling bias is usually the
    result of a poor sampling plan.
  • Sampling bias is a systematic mistake and is the
    fault of the researcher.

98
  • The major source of sampling bias is the use of
    nonprobability sampling techniques. If you can't
    specify the probability that each member will be
    chosen, then the results won't be generalizable.
  • Biases come from
  • Convenience: units are chosen because they are
    readily available,
  • Volunteers: they are probably not like the others
    who didn't volunteer,
  • Judgment sampling: "I think they represent the
    group,"
  • Administrative convenience: the boss says "use
    this group."
  • If a bias does exist, the researcher must
    describe it fully in the final report.

99
  • Frame error: a discrepancy between the intended
    target population and the actual population from
    which the sample is drawn
  • It could be due to
  • (a) Missing elements. Individuals who should be
    on the list but for some reason are not.
  • (b) Foreign elements. Elements that should not be
    included in the population and sample but appear
    on the sampling list.
  • (c) Duplicates. Elements that appear more than
    once on the sampling frame.
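
When both the target population and the sampling frame are available as lists, the three kinds of frame error can be checked mechanically. The sketch below is one way to do it; the names in the two lists and the function name are made up for illustration.

```python
from collections import Counter

def audit_sampling_frame(frame, target_population):
    """Report missing, foreign, and duplicated elements in a sampling frame."""
    counts = Counter(frame)
    target = set(target_population)
    return {
        "missing": sorted(target - set(counts)),     # should be listed, are not
        "foreign": sorted(set(counts) - target),     # listed, but do not belong
        "duplicates": sorted(u for u, c in counts.items() if c > 1),
    }

# Hypothetical target population and an imperfect frame.
target = ["Aminah", "Bala", "Chong", "Devi", "Ehsan"]
frame = ["Aminah", "Bala", "Bala", "Chong", "Zainal"]
report = audit_sampling_frame(frame, target)
print(report)
```

Here Devi and Ehsan are missing elements, Zainal is a foreign element, and Bala is a duplicate who would have twice the chance of selection.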

100
  • Selection error
  • Certain elements in the frame have a greater
    chance of falling into the sample than others
  • For example, an element may be listed more than
    once across different lists (a mathematics
    teachers list, a school grade list, a gender
    list)

102
"How big should my sample be?"
  • Here are my four "simple" answers to your
    question
  • Try to get as big a sample as you can for your
    study (i.e., the bigger the sample the better).
  • If your population is size 100 or less, then
    include the whole population rather than taking a
    sample.
  • Look at other studies in the research literature
    and see how many participants they are selecting.
  • For an exact number, just look at tables which
    show recommended sample sizes.

103
  • You also need to understand that there are many
    times when you will need larger rather than
    smaller samples. You will need larger samples in
    the following situations.

104
  • When the population is very heterogeneous.
  • When you want to break down the data into
    multiple categories.
  • When you want a relatively narrow confidence
    interval (e.g., note that the estimate that 75%
    of teachers support a policy plus or minus 4% is
    narrower than the estimate of 75% plus or minus
    5%).

105
  • When you expect a weak relationship or a small
    effect.
  • When you use a less efficient technique of random
    sampling (e.g., cluster sampling is less
    efficient than stratified sampling).
  • When you expect to have a low response rate. The
    response rate is the percentage of people in your
    sample who agree to be in your study.

106
  • To estimate a sample size, a researcher must
  • estimate the standard deviation of the population
    (the homogeneity of the population)
  • decide the allowable amount of error (the degree
    of precision desired, also called the margin of
    error or the precision required)
  • determine the confidence level

107
  • Degree of precision desired: also known as the
    margin of error or the precision required.
  • (3%, 4%, 5%, 10%) We choose the desired
    precision.
  • It is how close the estimate should fall to the
    parameter
  • (the exactness of prediction). Ideally the sample
    statistic would equal the population parameter,
    but this is impossible because we do not take
    measurements from every unit. Thus, there is
    always some sampling error.

108
SAMPLE SIZE CRITERIA
  • In addition to the purpose of the study, the
    population size, the risk of selecting a "bad"
    sample, and the allowable sampling error,
  • three criteria usually need to be specified to
    determine the appropriate sample size:
  • the level of precision,
  • the level of confidence or risk, and
  • the degree of variability in the attributes being
    measured

109
Degree Of Variability
  • the degree of variability in the attributes being
    measured refers to the distribution of attributes
    in the population.
  • The more heterogeneous a population, the larger
    the sample size required to obtain a given level
    of precision.
  • The less variable (more homogeneous) a
    population, the smaller the sample size. Note
    that a proportion of 50% indicates a greater
    level of variability than either 20% or 80%.
  • This is because 20% and 80% indicate that a large
    majority do not or do, respectively, have the
    attribute of interest. Because a proportion of .5
    indicates the maximum variability in a
    population, it is often used in determining a
    more conservative sample size, that is, the
    sample size may be larger than if the true
    variability of the population attribute were
    used.
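
One way to see why a proportion of .5 is the most variable case is to compute the variance of a yes/no attribute, which is p(1 - p). The short sketch below is illustrative only; the function name is an assumption.

```python
def proportion_variance(p):
    """Variance of a yes/no attribute held by a share p of the population."""
    return p * (1 - p)

for p in (0.2, 0.5, 0.8):
    print(f"p = {p:.1f}  variance = {proportion_variance(p):.2f}")
```

The variance is 0.16 at both 20% and 80% and reaches its maximum of 0.25 at 50%, which is why .5 gives the most conservative (largest) sample size.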

110
The Level Of Precision
  • The level of precision, sometimes called sampling
    error, is the range in which the true value of
    the population is estimated to be.
  • This range is often expressed in percentage
    points (e.g., plus or minus 5 percent), in the
    same way that results for political campaign
    polls are reported by the media.
  • Thus, if a researcher finds that 60% of farmers
    in the sample have adopted a recommended practice
    with a precision rate of plus or minus 5%, then
    he or she can conclude that between 55% and 65%
    of farmers in the population have adopted the
    practice.

111
  • For example, suppose student IQ has a mean of 100
    and a standard deviation of 15. If we take
    observations from a sample of 25 students, the
    sampling error (the standard error of the mean)
    is
  • $15/\sqrt{25} = 3.0$
  • so the estimated mean IQ lies at, say,
    $100 \pm 3.0$
  • but if the sample size is increased to 40
  • the sampling error becomes
  • $15/\sqrt{40} \approx 2.4$
  • and the estimated mean IQ now lies in the range
    $100 \pm 2.4$
  • if we increase the sample size to 100
  • the sampling error becomes
  • $15/\sqrt{100} = 1.5$
  • and the estimated mean IQ now lies in the range
    $100 \pm 1.5$
  • This means that the larger the sample size, the
    smaller the sampling error, and the sample mean
    will approach the population mean
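
The arithmetic in the IQ example (standard deviation 15, sample sizes 25, 40 and 100) can be reproduced with a one-line function. The function name is an assumption; the formula is the usual standard error of the mean, the standard deviation divided by the square root of n.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)

for n in (25, 40, 100):
    print(f"n = {n:3d}  sampling error = {standard_error(15, n):.1f}")
```

Quadrupling the sample size from 25 to 100 only halves the sampling error (from 3.0 to 1.5), so precision gets progressively more expensive.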

112
  • Once these are known, the formula for calculating
    sample size is
  • $n = \left(\frac{ZS}{E}\right)^2$
  • where
  • $Z$ = standardized value that corresponds to the
    confidence level
  • $S$ = sample standard deviation
  • $E$ = acceptable magnitude of error
  • Suppose a researcher studying annual
    expenditures on books wishes to have a 95%
    confidence level ($Z = 1.96$) and a range of
    error ($E$) of less than 2, and an estimate of
    the standard deviation is 29.
  • $n = \left(\frac{ZS}{E}\right)^2 = \left(\frac{(1.96)(29)}{2}\right)^2 \approx 808$
  • If we change the range of acceptable error to 4,
    sample size falls
  • $n = \left(\frac{(1.96)(29)}{4}\right)^2 \approx 202$
113
  • Suppose you wanted to estimate the sample size
    for a survey which contains the following
    question
  • What is your overall attitude towards Hospital X?
    Very Good 7 6 5 4 3 2 1 Very Poor
  • The range of acceptable error is 0.1 points, the
    confidence level is 95%, and the estimated
    standard deviation is 1.
  • $n = \left(\frac{ZS}{E}\right)^2 = \left(\frac{(1.96)(1)}{0.1}\right)^2 \approx 384$
  • If you increase the acceptable error to 0.2,
  • the sample size drops to $n \approx 96$
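
The two worked examples (book expenditures and the hospital attitude scale) can be reproduced with a small function that squares Z times S divided by E. The function name is hypothetical, and rounding to the nearest whole number is an assumption chosen to match the slide figures; some texts round up instead.

```python
def sample_size_for_mean(z, s, e):
    """Sample size for estimating a mean: (Z * S / E) squared.

    Rounded to the nearest whole number to match the slide figures.
    """
    return round((z * s / e) ** 2)

# Book expenditure example: Z = 1.96, S = 29.
print(sample_size_for_mean(1.96, 29, 2))   # 808
print(sample_size_for_mean(1.96, 29, 4))   # 202

# Hospital attitude example: Z = 1.96, S = 1.
print(sample_size_for_mean(1.96, 1, 0.1))  # 384
print(sample_size_for_mean(1.96, 1, 0.2))  # 96
```

In both examples, doubling the acceptable error cuts the required sample size to about a quarter, because E enters the formula squared.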

114
The Confidence Level
  • The confidence or risk level is based on ideas
    encompassed under the Central Limit Theorem.
  • The key idea encompassed in the Central Limit
    Theorem is that when a population is repeatedly
    sampled, the average value of the attribute
    obtained by those samples is equal to the true
    population value.
  • Furthermore, the values obtained by these samples
    are distributed normally about the true value,
    with some samples having a higher value and some
    obtaining a lower score than the true population
    value.
  • In a normal distribution, approximately 95% of
    the sample values are within two standard
    deviations of the true population value (e.g.,
    mean).
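
The repeated-sampling idea can be tried out with a small simulation. Everything in the sketch is an assumption made for illustration: the population of 10,000 IQ-like scores, the sample size of 25, the 2,000 repetitions, and the fixed seed.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)  # fixed seed only so the sketch is repeatable

# Hypothetical population of 10,000 IQ-like scores.
population = [rng.gauss(100, 15) for _ in range(10_000)]
pop_mean = mean(population)

n = 25
se = pstdev(population) / n ** 0.5  # standard error of the sample mean

# Draw many samples of size n and record each sample mean.
sample_means = [mean(rng.sample(population, n)) for _ in range(2_000)]

mean_of_means = mean(sample_means)
within = sum(1 for m in sample_means if abs(m - pop_mean) <= 2 * se)
share_within_2se = within / len(sample_means)

print(f"population mean      = {pop_mean:.2f}")
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"share within 2 SE    = {share_within_2se:.3f}")
```

The mean of the sample means should sit very close to the population mean, and roughly 95% of the individual sample means should fall within two standard errors of it.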

115
  • In other words, this means that, if a 95%
    confidence level is selected, 95 out of 100
    samples will have the true population value
    within the range of precision specified earlier
  • There is always a chance that the sample you
    obtain does not represent the true population
    value.
  • Such samples with extreme values are represented
    by the shaded areas.
  • This risk is reduced for 99% confidence levels
    and increased for 90% (or lower) confidence
    levels.

116
  • A "p" value of .05 is a commonly used
    significance level. When we say "the results are
    significant at the .05 level," we mean that, if
    there were really no difference in the
    population, differences at least as large as
    those we observed would arise by chance less than
    5% of the time. At the .01 level, they would
    arise by chance less than 1% of the time.

117
  • For example, if your population is 500,000, and
    you want to be 95% confident that your data are
    representative of your population to within 1%
    accuracy, 9,423 returned surveys are required.
    (If you expect a response rate between 30% and
    40%, you must then estimate the number for your
    initial mailing at approximately 29,000.)
  • If you want to be 99% confident that your data
    are accurate to within 1%, you need 16,057
    surveys returned.
  • This number drops drastically if you only need
    95% confidence that your data are accurate to
    within 5%. With those parameters, 384 returned
    surveys will be adequate.
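
The slide does not say how these figures were obtained. They appear to be consistent with the standard sample size formula for a proportion, using maximum variability (p = 0.5) and a finite population correction, so the sketch below assumes that formula. The function name, the Z values of 1.96 and 2.576, and rounding to the nearest whole number are all assumptions.

```python
def sample_size_for_proportion(population_size, z, margin, p=0.5):
    """Returned surveys needed to estimate a proportion.

    Assumed formula: n0 = z^2 * p * (1 - p) / margin^2, then a finite
    population correction n = n0 / (1 + (n0 - 1) / N).
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population_size)
    return round(n)

N = 500_000
print(sample_size_for_proportion(N, 1.96, 0.01))   # 95% confidence, 1%
print(sample_size_for_proportion(N, 2.576, 0.01))  # 99% confidence, 1%
print(sample_size_for_proportion(N, 1.96, 0.05))   # 95% confidence, 5%
```

Under these assumptions the function returns 9,423, 16,057 and 384, matching the three figures quoted above.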

118
  • Population mean: 109.5
  • Sample 1 mean (10 students): 112.2
  • Sample 2 mean (10 students): 110.8
  • Samples 1 and 2 combined mean (20 students):
    111.5
  • Sample 3 mean: 104.4
  • Sample 4 mean: 91.2
  • Samples 3 and 4 combined mean: 97.8
  • All four samples: 109.6

119
  • In general, it is safe to assume that
  • Sample sizes will need to increase as the size of
    the confidence interval decreases.
  • Sample sizes will need to increase as the level
    of statistical significance decreases.
  • Sample sizes will need to increase as population
    increases.
  • The 'p' value .05 is often used to calculate the
    sample size you need and set thresholds of
    statistical significance.
  • The term "significance" does not mean
    "important," but rather is a measure of
    confidence when analyzing the data.

120
  • In completing this discussion of determining
    sample size, there are three additional issues.
  • First, the above approaches to determining sample
    size have assumed that a simple random sample is
    the sampling design.
  • More complex designs, e.g., stratified random
    samples, must take into account the variances of
    subpopulations, strata, or clusters before an
    estimate of the variability in the population as
    a whole can be made.

121
  • Another consideration with sample size is the
    number needed for the data analysis.
  • If descriptive statistics are to be used, e.g.,
    mean, frequencies, then nearly any sample size
    will suffice.
  • On the other hand, a good size sample, e.g.,
    200-500, is needed for multiple regression,
    analysis of covariance, or log-linear analysis,
    which might be performed for more rigorous state
    impact evaluations.
  • The sample size should be appropriate for the
    analysis that is planned.

122
  • In addition, an adjustment in the sample size may
    be needed to accommodate a comparative analysis
    of subgroups (e.g., an evaluation of program
    participants versus nonparticipants).
  • Sudman (1976) suggests that a minimum of 100
    elements is needed for each major group or
    subgroup in the sample and for each minor
    subgroup, a sample of 20 to 50 elements is
    necessary.
  • Similarly, Kish (1965) says that 30 to 200
    elements are sufficient when the attribute is
    present 20 to 80 percent of the time (i.e., the
    distribution approaches normality).
  • On the other hand, skewed distributions can
    result in serious departures from normality even
    for moderate size samples (Kish, 1965:17). Then a
    larger sample or a census is required.

123
  • Finally, the sample size formulas provide the
    number of responses that need to be obtained.
    Many researchers commonly add 10% to the sample
    size to compensate for persons that the
    researcher is unable to contact.
  • The sample size also is often increased by 30% to
    compensate for nonresponse. Thus, the number of
    mailed surveys or planned interviews can be
    substantially larger than the number required for
    a desired level of confidence and precision.

124
STRATEGIES FOR DETERMINING SAMPLE SIZE
  • There are several approaches to determining the
    sample size. These include using a census for
    small populations, imitating a sample size of
    similar studies, using published tables, and
    applying formulas to calculate a sample size.

125
  • Using A Census For Small Populations
  • One approach is to use the entire population as
    the sample. Although cost considerations make
    this impossible for large populations, a census
    is attractive for small populations (e.g., 200 or
    less).
  • A census eliminates sampling error and provides
    data on all the individuals in the population. In
    addition, some costs such as questionnaire design
    and developing the sampling frame are "fixed,"
    that is, they will be the same for samples of 50
    or 200.
  • Finally, virtually the entire population would
    have to be sampled in small populations to
    achieve a desirable level of precision

126
  • Using A Sample Size Of A Similar Study
  • Another approach is to use the same sample size
    as those of studies similar to the one you plan.
  • Without reviewing the procedures employed in
    these studies you may run the risk of repeating
    errors that were made in determining the sample
    size for another study.
  • However, a review of the literature in your
    discipline can provide guidance about "typical"
    sample sizes which are used.

127
  • Using Published Tables
  • A third way to determine sample size is to rely
    on published tables which provide the sample size
    for a given set of criteria. Some tables present
    sample sizes that would be necessary for given
    combinations of precision, confidence levels, and
    variability. Please note two things.
  • First, these sample sizes reflect the number of
    obtained responses, and not necessarily the
    number of surveys mailed or interviews planned
    (this number is often increased to compensate for
    nonresponse).
  • Second, the sample sizes presume that the
    attributes being measured are distributed
    normally or nearly so. If this assumption cannot
    be met, then the entire population may need to be
    surveyed.

128
  • Using Formulas To Calculate A Sample Size
  • Although tables can provide a useful guide for
    determining the sample size, you may need to
    calculate the necessary sample size for a
    different combination of levels of precision,
    confidence, and variability. The fourth approach
    to determining sample size is the application of
    one of several formulas. One simplified formula
    is
  • $n = \frac{N}{1 + N e^2}$
  • where $n$ is the sample size, $N$ is the
    population size, and $e$ is the level of
    precision.
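
Reading the formula above as n = N / (1 + N e^2), a simplified formula often attributed to Yamane, it can be written as a short function. The function name and the choice to round up to the next whole respondent are assumptions, as is the example population of 500,000 reused from an earlier slide.

```python
import math

def simplified_sample_size(population_size, e):
    """Simplified formula n = N / (1 + N * e^2), rounded up."""
    return math.ceil(population_size / (1 + population_size * e ** 2))

print(simplified_sample_size(500_000, 0.05))  # 400
print(simplified_sample_size(2_000, 0.05))    # 334
```

For large populations the result levels off near 1 / e^2, which is 400 for a precision of 5%, so increasing the population further barely changes the required sample.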