Title: Where do you get what you are telling us?
1Where do you get what you are telling us?
2- Sources of Data
- Where do you get the data? And from whom should the data be collected?
- Clearly, your data should come from participants who are both available to you and relevant to the question you are studying.
3- There are times when we aren't very concerned about generalizing.
- For example, we may just be evaluating a program in a local agency, and we don't care whether the program would work with other people in other places and at other times.
- In that case, sampling and generalizing might not be of interest.
4- "Who do you want to generalize to?" Or should it be "To whom do you want to generalize?"
- In most applied social research, we are interested in generalizing to specific groups.
- The group you wish to generalize to is often called the population.
- This is the group you would like to sample from because this is the group you are interested in generalizing to.
5Some examples of populations
- All secondary school principals in Malaysia
- All primary school counselors in the state of Sabah
- All students attending Kolej Tunku Kursiah during the academic year 2004-2005
- All students in Mrs. Amin's Form Two class at SMKA
6- Let's imagine that you wish to generalize to urban homeless males between the ages of 30 and 50 in Malaysia.
- If that is the population of interest, you are likely to have a very hard time developing a reasonable sampling plan.
- You are probably not going to find an accurate listing of this population, and even if you did, you would almost certainly not be able to mount a national sample across hundreds of urban areas.
7- So we should probably make a distinction between the population you would like to generalize to, called the theoretical population or target population, and the population that will be accessible to you, called the accessible population.
- In this example, the accessible population might be homeless males between the ages of 30 and 50 in six selected urban areas in Malaysia.
8- A population can be defined as any set of persons/subjects having a common observable characteristic. It is the group from which you draw your random sample.
- The target population is the group to which the researcher would like to generalize his or her results. This defined population has at least one characteristic that differentiates it from other groups.
- The accessible population is the population to which the researcher has access.
9- Once you've identified the theoretical and accessible populations, you have to do one more thing before you can actually draw a sample: you have to get a list of the members of the accessible population.
- The listing of the accessible population from which you'll draw your sample is called the sampling frame.
- If you were doing a phone survey and selecting names from the telephone book, the book would be your sampling frame.
- That wouldn't be a great way to sample because significant subportions of the population either don't have a phone or have moved in or out of the area since the last book was printed.
10- Finally, you actually draw your sample (using one of the many sampling procedures). The sample is the group of people whom you select to be in your study.
- Sampling refers to drawing a sample (a subset) from a population (the full set). It is the act, process, or technique of selecting a suitable sample, or a representative part of a population, for the purpose of determining parameters or characteristics of the whole population.
- Samples are measured in order to make generalisations about populations. Ideally, samples are selected, usually by some random process, so that they represent the population of interest.
11Topic of investigation: The effect of computer-assisted instruction on the reading achievement of first and second graders in Malaysia
Target population: All first and second graders in Malaysia
Accessible population: All first and second graders in Selangor
Sample: 384 first and second graders in the state of Selangor
12- The usual goal in sampling is to produce a
representative sample (i.e., a sample that is
similar to the population on all characteristics,
except that it includes fewer people because it
is a sample rather than the complete population).
- In other words, a representative sample is a
"mirror image" of the population from which it
was selected.
14Why Sample?
- First, it is usually too costly to test the entire population.
- The second reason to sample is that it may be impossible to test the entire population.
- The third reason to sample is that testing the entire population often produces error. Thus, sampling may be more accurate.
15Why Sample?
- The final reason to sample is that testing may be destructive.
- You probably would not want to buy a car that had the door slammed five hundred or a thousand times or had been crash tested. Rather, you probably would want to purchase the car that did not make it into either of those samples.
16- Up to here by six 27/08/05
17How important is sampling?
- Sampling is important with regard to external validity.
- What is external validity?
- The extent to which the results of the study can be generalized.
- Two types: population generalizability and ecological generalizability.
19Population generalizability
- The degree to which the sample represents the population.
- Look at the usefulness of the study: findings from small and narrowly defined groups are not useful.
- That is why representativeness is important. We want the results of the study to be as widely applicable as possible.
- You must take appropriate action to make sure the findings can be generalized to the entire population.
20Ecological generalizability
- Refers to the degree to which the results of the study can be extended to other settings.
- Example: results from urban schools may not be true for students from rural schools.
- What we can do here is describe in detail the nature of the environment and setting in which the study takes place.
21- You can't generalize the effectiveness of a method of teaching mathematics to the effectiveness of that method for all subjects.
- Caution: even with the application of the powerful technique of random sampling, it is quite difficult to overcome the problem of ecological generalizability.
22Procedure for Drawing a Sample
- 1. Define the population. Who is the population for each project? E.g., residents of Bandar Kajang or the area around Bandar Kajang. Remember, the population is the group you want to infer to from the sample, so define it carefully so that it is clear who is in and who is out.
- 2. Identify the sampling frame: the list of elements from which the sample may be drawn. It is sometimes referred to as the working population. E.g., to sample teachers, my sampling frame might be a list from the Education Department of Hulu Langat District.
- 3. Select a sampling procedure.
23- Define the population and sample clearly. Why?
- So that those who are interested can determine the generalizability of the findings.
- Not only the population and sample but also the sampling process has to be clearly defined.
- (This is one of the common weaknesses in research.)
24- In non-experimental research, you investigate relationships among variables in some pre-defined population.
- Typically, you take elaborate precautions to ensure that you have achieved a representative sample of that population.
- You define your population, then do your best to randomly sample from it.
25- The two main types of sampling in quantitative research:
- random (probability) sampling
- nonrandom (nonprobability) sampling
- The former produces representative samples.
- The latter does not produce representative samples.
26- In probability samples, each member of the population has a known probability of being selected.
- Elements are drawn by chance procedures.
- Probability methods include random sampling, systematic sampling, and stratified sampling.
- In nonprobability sampling, members are selected from the population in some nonrandom manner. These include convenience sampling, judgment sampling, quota sampling, and snowball sampling.
27- Probability-based (random) samples
- These samples are based on probability theory. Every unit of the population of interest must be identified, and all units must have a known, non-zero chance of being selected into the sample. In the simplest designs, every member of the population has an equal chance of being selected (so those selected and those who are not are similar to one another). The idea here is representativeness.
- How sure are we? That is why the sample has to be random and sufficiently large! It should have no bias. The researcher cannot consciously or unconsciously influence who will be selected.
28- The advantage of probability sampling is that sampling error can be calculated.
- Sampling error is the degree to which a sample might differ from the population. It is the difference between a population parameter and the corresponding sample statistic.
- (You can't escape sampling error unless you do a census.)
29- When inferring to the population, results are reported plus or minus the sampling error.
- In nonprobability sampling, the degree to which the sample differs from the population remains unknown.
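To see sampling error in action, here is a minimal Python sketch. The population of scores 1 to 1000 and the function name `sampling_error` are made up for illustration. It compares the mean of a random sample (the sample statistic) with the mean of the whole population (the population parameter).

```python
import random
import statistics

def sampling_error(population, n, seed=None):
    """Return (sample mean - population mean) for one random sample of size n."""
    rng = random.Random(seed)          # seeded so the draw can be repeated
    sample = rng.sample(population, n) # random draw without replacement
    return statistics.mean(sample) - statistics.mean(population)

# Hypothetical population: 1000 test scores, here simply 1..1000
population = list(range(1, 1001))

err = sampling_error(population, 100, seed=1)                     # a sample of 100
census_err = sampling_error(population, len(population), seed=1)  # everyone, i.e. a census

print("sampling error with n=100:", err)
print("sampling error with a census:", census_err)
```

With a census the difference is zero, because the "sample" is the whole population. With n=100 the difference is usually small but not zero, and it changes from one draw to the next.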
30- Random sampling is the purest form of probability sampling. Each member of the population has an equal and known chance of being selected.
- When there are very large populations, it is often difficult or impossible to identify every member of the population, so the pool of available subjects becomes biased.
- RANDOM: each element of the population has an equal chance of inclusion in the sample.
- Begin with a SAMPLING FRAME: a list of every element in the population.
31Random Sampling Techniques
- Simple random sampling
- The first type of random sampling is called simple random sampling.
- It's the most basic type of random sampling.
- It is an equal probability of selection method (EPSEM).
- EPSEM means "everyone in the sampling frame has an equal chance of being in the final sample."
- EPSEM is important because that is what produces "representative" samples (i.e., samples that represent the populations from which they were selected)!
32Simple random sample
- Each unit in the population is identified, and each unit has an equal chance of being in the sample. The selection of each unit is independent of the selection of every other unit. Selection of one unit does not affect the chances of any other unit.
34[Figure: SIMPLE RANDOM sampling]
35- Sampling experts recommend random sampling
"without replacement" rather than random sampling
"with replacement" because the former is a little
more efficient in producing representative
samples (i.e., it requires slightly fewer people
and is therefore a little cheaper).
36Advantages of the SRS method of sampling
- Assures good representativeness of the sample (particularly if it is large).
- Allows us to make generalizations/inferences. In fact, most of the statistical stuff we'll do later assumes that we've actually done a simple random sample, even if we haven't.
- Avoids biases that are possible in some of the other methods we'll talk about.
37Disadvantages of SRS method
- Have to have a list/sampling frame.
- Have to number the list.
- both are hard to do when the population is large.
38How do you draw a simple random sample?
- One way is to put all the names from your population into a hat and then select a subset (e.g., pull out 100 names from the hat).
- Researchers typically use a computer program that randomly selects their samples. One program is available at the following address: http://www.randomizer.org/form.htm
- You can also use Excel to generate random numbers. You need as many randomly generated numbers as elements in your sample (n).
39- To use a computer program (sometimes called a
random number generator) you must make sure that
you give each of the people in your population a
number. Then the program gives you a list of
randomly selected numbers. Then you identify the
people with those randomly selected numbers and
try to get them to participate in your research
study
40- Researchers often use a table of random numbers.
- You pick a place to start, and then move in one direction (e.g., move down the columns).
- Use the number of digits in the table that is appropriate for your population size (e.g., if there are 2500 people in the population then use 4 digits).
- Once you get the set of randomly selected numbers, find out who those people are and try to get them to participate in your research study.
41- For example, to select a sample of 25 people who live in your college dorm:
- Make a list of all the 250 people who live in the dorm.
- Assign each person a unique number between 1 and 250.
- Then refer to a table of random numbers.
- Starting at any point in the table, read across or down and note every number that falls between 1 and 250.
- Use the numbers you have found to pull the names from the list that correspond to the 25 numbers you found. These 25 people are your sample. This is called the table of random numbers method.
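The dorm example can also be done with a random number generator instead of a printed table. The following is a minimal sketch in Python, assuming the 250 residents have already been numbered 1 to 250. The function name `simple_random_sample` and the seed value are only for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements from the sampling frame without replacement.

    Every element of the frame has the same chance of being selected.
    """
    rng = random.Random(seed)   # seed only so the example can be repeated
    return rng.sample(frame, n)

# Sampling frame: residents numbered 1..250 (numbers stand in for names)
dorm_residents = list(range(1, 251))

sample = simple_random_sample(dorm_residents, 25, seed=42)
print(sorted(sample))           # the 25 resident numbers to contact
```

`random.sample` draws without replacement, so no resident can be picked twice.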
42- We need a complete sampling frame for this method to be used.
- If there is no complete frame, what should we do?
- The weakness of this procedure is that it requires a complete sampling frame.
- It is the best sampling procedure, given certain assumptions.
- The key to obtaining a random sample is to ensure that every member of the population has an equal and independent chance of being selected. So we use a table of random numbers (more scientific).
43How to use a table of random numbers?
- Say you have to select a sample of 300 out of 3000 students.
- Start anywhere on the table you have chosen (possibly at random).
- Read four-digit numbers. (Why four digits? Because the final number, 3000, has four digits.)
- Select numbers that do not exceed 3000 until you have the required sample size.
- What if you come across the same number twice? Skip the later number and go to the next number.
- When selecting numbers, you can go either horizontally or downwards.
- Do not use simple random sampling if you want certain subgroups to be in the sample.
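The procedure for 300 out of 3000 students can be imitated in code. This is only a sketch: instead of a printed table, it assumes a stream of random four-digit numbers (0000 to 9999) from Python's generator, and the function name `draw_from_random_digits` is made up for illustration.

```python
import random

def draw_from_random_digits(population_size, sample_size, seed=None):
    """Imitate reading a table of random numbers.

    Read numbers with as many digits as the population size has,
    keep those from 1 to population_size, and skip any repeats.
    """
    rng = random.Random(seed)
    digits = len(str(population_size))        # 3000 -> 4 digits
    chosen = []
    seen = set()
    while len(chosen) < sample_size:
        number = rng.randrange(10 ** digits)  # next "table entry", 0000..9999
        if number < 1 or number > population_size:
            continue                          # exceeds 3000 (or is 0000): skip
        if number in seen:
            continue                          # same number again: skip the later one
        seen.add(number)
        chosen.append(number)
    return chosen

student_ids = draw_from_random_digits(3000, 300, seed=7)
print(student_ids[:10])                       # first ten selected student numbers
```

Because out-of-range numbers and repeats are simply skipped, each of the 3000 students keeps an equal chance of selection.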
44
1 4923 5013 4916 4951 5109 4993 5055 5080 4986 4974
2 4870 4956 5080 5097 5066 5034 4902 4974 5012 5009
3 5065 5014 5034 5057 4902 5061 4942 4946 4960 5019
4 5009 5053 4966 4891 5031 4895 5037 5062 5170 4886
5 5033 4982 5180 5074 4892 4992 5011 5005 4959 4872
6 4976 4993 4932 5039 4965 5034 4943 4932 5116 5070
7 5011 5152 4990 5047 4974 5107 4869 4925 5023 4902
8 5003 5092 5163 4936 5020 5069 4914 4943 4914 4946
9 4860 4899 5138 4959 5089 5047 5030 5039 5002 4937
10 4998 4957 4964 5124 4909 4995 5053 4946 4995 5059
11 4948 5048 5041 5077 5051 5004 5024 4886 4917 5004
12 4958 4993 5064 4987 5041 4984 4991 4987 5113 4882
13 4968 4961 5029 5038 5022 5023 5010 4988 4936 5025
14 5110 4923 5025 4975 5095 5051 5035 4962 4942 4882
15 5094 4962 4945 4891 5014 5002 5038 5023 5179 4852
16 4957 5035 5051 5021 5036 4927 5022 4988 4910 5053
17 5088 4989 5042 4948 4999 5028 5037 4893 5004 4972
18 4970 5034 4996 5008 5049 5016 4954 4989 4970 5014
19 4998 4981 4984 5107 4874 4980 5057 5020 4978 5021
20 4963 5013 5101 5084 4956 4972 5018 4971 5021 4901
45- Can we select teachers in Malaysia by simple random sampling? What is the rationale?
- If you cannot manage simple random sampling, use cluster random, stratified random, or multi-stage random sampling.
- If you want to make sure that certain subgroups whose characteristics differ greatly are included, use stratified random sampling.
46Systematic sampling
- Is often used instead of random sampling.
- It is also called an Nth name selection technique. After the required sample size has been calculated, every Nth record is selected from a list of population members.
- As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method.
- Its only advantage over the random sampling technique is simplicity. Systematic sampling is frequently used to select a specified number of records from a computer file.
47- Advantages of Systematic Sampling
- Easier to do than SRS. You don't have to keep running back to the random number generator.
- Disadvantages of Systematic Sampling
- Still need a list/sampling frame that is numbered.
- Might run into a periodicity problem. If the list happened to be arranged by class (1, 2, 3, 4), you might end up picking all first years. Have to make sure the list is not so structured.
48Systematic sampling
- Systematic sampling involves three steps.
- First, determine the sampling interval, which is symbolized by "k" (it is the population size divided by the desired sample size).
- Second, randomly select a number between 1 and k, and include that person in your sample.
- Third, also include each kth element after that in your sample. For example, if k is 10 and your randomly selected number between 1 and 10 was 5, then you will select persons 5, 15, 25, 35, 45, etc. When you get to the end of your sampling frame you will have all the people to be included in your sample.
49- For example, to select a sample of 25 dorm rooms in your college dorm:
- (1) Make a list of all the room numbers in the dorm. Say there are 100 rooms.
- (2) Divide the total number of rooms (100) by the number of rooms you want in the sample (25). The answer is 4. This means that you are going to select every fourth dorm room from the list. But you must first consult a table of random numbers.
- (3) Pick any point on the table, and read across or down until you come to a number between 1 and 4. This is your random starting point. Say your random starting point is "3". This means you select dorm room 3 as your first room, and then every fourth room down the list (3, 7, 11, 15, 19, etc.) until you have 25 rooms selected.
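The dorm-room example can be sketched in Python as follows. This is a minimal illustration, assuming the list size divides evenly by the sample size (100 / 25 = 4). The function name `systematic_sample` is made up for this example.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start between 1 and k, then take every kth element."""
    rng = random.Random(seed)
    k = len(frame) // n            # sampling interval (assumes an even split)
    start = rng.randint(1, k)      # random starting point, 1..k inclusive
    # Positions start, start+k, start+2k, ... (1-based) in the frame
    picked = [frame[pos - 1] for pos in range(start, len(frame) + 1, k)]
    return picked[:n]

rooms = list(range(1, 101))        # dorm rooms numbered 1..100
picked_rooms = systematic_sample(rooms, 25, seed=3)
print(picked_rooms)
```

If the random start happens to be 3, the result is rooms 3, 7, 11, 15, 19, and so on, exactly as in the worked example. Only the starting point is random; everything after it is fixed by the interval.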
50- Systematic Sample/Skip Interval Sample
- 1. Begin with a numbered sampling frame again.
- 2. Choose your random number.
- 3. Choose your SAMPLING INTERVAL: the number in the population divided by the number desired in the sample, or N/n.
- 4. Select the element that corresponds to the random number. Then, instead of picking a second random number, etc., count out the interval (N/n) and choose that element. When you get to the end of the list, go back to the beginning until you have your full sample.
- Note: if you get a fraction, round up. If you round down, you might not get to the end of the list, and those elements at the end will not have any probability of inclusion. With rounding up, you will always get through the whole list.
51- This method is useful for selecting large samples, say 100 or more. It is less cumbersome than a simple random sample using either a table of random numbers or a lottery method. For example, you might have to sample files in a large filing cabinet. It is easier to select every 17th file than to pull out all the files and number them, etc.
- However, you must be aware of problems that can arise in systematic random sampling. If the selection interval matches some pattern in the list (e.g., each 4th dorm room is a single unit, where all the others are doubles) you will introduce systematic bias into your sample.
52Stratified sampling
- Is a commonly used probability method that can be superior to simple random sampling because it reduces sampling error.
- A stratum is a subset of the population whose members share at least one common characteristic. Examples of strata might be males and females, or managers and non-managers.
- The researcher first identifies the relevant strata and their actual representation in the population.
53- Random sampling is then used to select a sufficient number of subjects from each stratum. "Sufficient" refers to a sample size large enough for us to be reasonably confident that the stratum represents the population.
- Stratified sampling is often used when one or more of the strata in the population have a low incidence relative to the other strata.
54Stratified random sampling
- The population is divided into groupings (or strata), e.g., divide it into the males and the females if you are using gender as your stratification variable.
- Take a random sample from each group (i.e., take a random sample of males and a random sample of females). Put these two sets of people together and you now have your final sample.
- Each unit in the population is identified, and each unit has a known, non-zero chance of being in the sample. This is used when the researcher knows that the population has sub-groups (strata) that are of interest.
55- For example, if you wanted to find out the attitudes of students on your campus about immigration, you may want to be sure to sample students who are from every region of the country as well as foreign students. Say your student body of 10,000 students is made up of:
8,000 - Middle East
1,000 - South East Asia
500 - Africa
300 - Indian subcontinent
200 - Others
- Stratifying improves representativeness in terms of the stratification variables.
56Two different types of stratified sampling
- Proportional stratified sampling
- In proportional stratified sampling you must make sure the subsamples (e.g., the samples of males and females) are proportional to their sizes in the population.
- Disproportional stratified sampling
- In disproportional stratified sampling, the subsamples are not proportional to their sizes in the population.
58In a population you have 365 students:
219 female students (60%)
146 male students (40%)
From these you will select a stratified sample of:
66 females (60%)
44 males (40%)
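The arithmetic behind the 365-student example can be checked with a few lines of Python. This is a minimal sketch; the function name `proportional_allocation` is made up, and the total sample size of 110 is simply 66 + 44 from the slide.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to their population sizes."""
    population = sum(stratum_sizes.values())
    # Note: with other numbers, rounding may make the parts sum to slightly
    # more or less than total_sample, so check the total afterwards.
    return {
        name: round(total_sample * size / population)
        for name, size in stratum_sizes.items()
    }

allocation = proportional_allocation({"female": 219, "male": 146}, 110)
print(allocation)
```

Here 219 / 365 = 60% and 146 / 365 = 40%, so a sample of 110 is split into 66 females and 44 males.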
59[Figure: STRATIFIED sampling]
60Here is an example
- Assume that your population is 75% female and 25% male. Assume also that you want a sample of size 100 and you want to stratify on the variable called gender.
- For proportional stratified sampling, you would randomly select 75 females and 25 males from the population.
- For disproportional stratified sampling, you might randomly select 50 females and 50 males from the population.
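Here is a minimal Python sketch of both versions. The population of 1,000 people (750 females, 250 males) and the function name `stratified_sample` are assumptions made only for this illustration.

```python
import random

def stratified_sample(strata, allocation, seed=None):
    """Take a simple random sample of the allocated size from each stratum."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, allocation[name]))
    return sample

# Hypothetical population: 75% female, 25% male
strata = {
    "female": [f"F{i}" for i in range(750)],
    "male": [f"M{i}" for i in range(250)],
}

# Proportional: subsample sizes follow the population shares (75 / 25)
proportional = stratified_sample(strata, {"female": 75, "male": 25}, seed=1)

# Disproportional: subsample sizes chosen by the researcher (50 / 50)
disproportional = stratified_sample(strata, {"female": 50, "male": 50}, seed=1)

print(len(proportional), len(disproportional))
```

Both samples have 100 people. They differ only in how the 100 places are divided between the two strata.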
61- If you select a simple random sample of 500 students, you might not get any from the smallest groups.
- To make sure that you get some students from each group, you can divide the students into these five groups, and then select the same percentage of students from each group using a simple random sampling method.
- This is proportional stratified random sampling.
62- However, you may still have too few of some types of students. Instead, you may divide students into the five groups and then select the same number of students from each group using a simple random sampling method.
- This is disproportionate stratified random sampling. This allows you to have enough students in each sub-group so that you can perform some meaningful statistical analyses of the attitudes of students in each sub-group.
- In order to say something about the attitudes of the total student population of the university, however, you will have to apply weights to the findings for each sub-group, proportional to its presence in the total student body.
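The weighting step can be sketched as follows. The group sizes are the ones from the 10,000-student example; the sub-group mean attitude scores are invented purely to show the arithmetic, and the function name `weighted_mean` is made up.

```python
def weighted_mean(stratum_sizes, stratum_means):
    """Combine sub-group means, weighting each by its share of the population."""
    total = sum(stratum_sizes.values())
    return sum(
        stratum_sizes[group] / total * stratum_means[group]
        for group in stratum_sizes
    )

# Group sizes from the student-body example (total 10,000)
sizes = {
    "Middle East": 8000,
    "South East Asia": 1000,
    "Africa": 500,
    "Indian subcontinent": 300,
    "Others": 200,
}

# Hypothetical mean attitude scores found in each equal-sized subsample
means = {
    "Middle East": 3.0,
    "South East Asia": 4.0,
    "Africa": 2.0,
    "Indian subcontinent": 5.0,
    "Others": 1.0,
}

overall = weighted_mean(sizes, means)
unweighted = sum(means.values()) / len(means)
print(round(overall, 2), unweighted)
```

The unweighted average treats the five groups as equally large and gives 3.0. The weighted estimate is 0.8(3.0) + 0.1(4.0) + 0.05(2.0) + 0.03(5.0) + 0.02(1.0) = 3.07, which reflects the fact that most students belong to the first group.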
63How do you divide into strata?
- Level of education, type of occupation, religious affiliation, place of residence, and many more.
- For example, you want to study the attitudes of adolescents towards a certain issue (clothing patterns). You might as well compare those who live in small towns, those who live in big towns, and those who live in rural areas.
- So here you have three sub-groups.
- Another example. The study is about the
- EFFECT OF YOUTH PARTICIPATION IN SCHOOL TO WORK PROGRAM ON THEIR OCCUPATIONAL SELF-EFFICACY, CAREER DECIDEDNESS, AND EMPLOYABILITY SKILLS.
- How many sub-groups do we have here?
64Cluster sampling
- This is the most commonly used scientific sampling method in the social sciences, for example in opinion polling.
- Randomly select clusters, or pre-existing natural groupings, rather than individual units in the first stage of sampling.
- Use it when you don't have or need a sampling frame:
- a list doesn't exist,
- the list would be too hard to get,
- or the population is directly identifiable without a list (e.g., the names of the roads/streets in your area).
65- Cluster sampling views the units in a population not only as members of the total population but also as members of naturally occurring clusters within the population. For example, city residents are also residents of neighborhoods, blocks, and housing structures.
66- You can do cluster sampling when the elements of the population naturally "cluster" into identifiable patterns, like neighborhoods, organizations, etc.
- The assumption is that individuals within a cluster will be fairly homogeneous. (Talk about housing area.)
- You have to come up with your clusters carefully!
- 1. Take the whole population and divide it into a bunch of smaller clusters. Number the clusters.
- 2. Do a simple random or systematic sample of the clusters.
- 3. Divide the chosen clusters into smaller ones and number them.
- 4. Repeat step 2, and so on until you get to individual elements in your sample.
67[Figure: CLUSTER SAMPLING. Clusters A to H in the population; clusters D, F, and A drawn as the sample]
68- Cluster sampling is used in large geographic samples where no list is available of all the units in the population but the population boundaries can be well-defined.
- For example, to obtain information about the drug habits of all high school students in a state, you could obtain a list of all the school districts in the state and select a simple random sample of school districts.
- Then, within each selected school district, list all the high schools and select a simple random sample of high schools. Within each selected high school, list all high school classes, and select a simple random sample of classes.
- Then use the high school students in those classes as your sample.
69- Cluster sampling must use a random sampling method at each stage. This may result in a somewhat larger sample than using a simple random sampling method, but it saves time and money.
- It is also cheaper to administer than a statewide sample of high school seniors, because there are many fewer sites to obtain information from.
70- Take a large number of clusters to make the sample more representative.
- It is better to sample many small clusters than a few large clusters, especially when there is a lot of variation in the population.
- More information will be obtained, and it will give a more accurate estimate. For example:
- If there is not much difference among the sampled units within the clusters, a small number of clusters is sufficient.
- The more clusters that are selected, the more confident we can be in applying the findings of the study to the population.
71- Advantage: it can be used when it is difficult or impossible to select a sample randomly.
- It does not require much time.
- Weakness: you may select clusters that do not represent the population.
- A common mistake is selecting only one cluster, especially when the number of sample units in that cluster is large. Keep in mind that such a cluster does not represent the population. Note also that here we are randomly selecting clusters, not the study subjects. It is therefore wrong for us to make inferences to the population. IT IS DEFINITELY WRONG, SO DO NOT DO IT.
72- Advantages of Cluster Sampling method
- Less costly.
- Don't need a list.
- At the start, everyone has an approximately equal chance of selection despite the number of steps involved.
- It can be used when it is difficult or impossible to select a sample randomly.
- It does not require much time.
73- Disadvantages of the Cluster Sample
- More possibility of introducing error (drawing the boundaries, etc.), and this increases with the number of steps involved.
- Have to figure out a balance between the number of stages and the number you want in your final sample. For instance, we could get a sample of 2000 Malaysians by picking 2000 clusters and one person from each, or we could pick 1000 each from 2 clusters.
- If the clusters aren't drawn well, the second method would be unrepresentative. But if the single person drawn from each cluster in the first method was weird, it wouldn't matter how good the clusters were.
74- Requires a larger sample size than simple random or stratified random sampling.
- Units within a cluster should be as heterogeneous as possible, and the clusters should be as homogeneous as possible relative to one another.
75Two types of cluster sampling
- One-stage cluster sampling
- To select a one-stage cluster sample, you first select a random sample of clusters.
- Then, second, you include in your final sample all of the individual units in the selected clusters.
- Two-stage cluster sampling
- First, you take a random sample of clusters (i.e., just like you did in one-stage cluster sampling).
- Second, you take a random sample of elements from each of the selected clusters (e.g., you might randomly select 10 students from each of the 15 classrooms you selected in stage one).
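Both types can be sketched with one small Python function. The 40 classrooms of 30 students each are a made-up population, and the function name `cluster_sample` is only for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Stage one: randomly select clusters.

    Stage two (optional): randomly select n_per_cluster units inside each
    selected cluster. If n_per_cluster is None, keep every unit (one-stage).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        members = clusters[name]
        if n_per_cluster is None:
            result[name] = list(members)                       # one-stage
        else:
            result[name] = rng.sample(members, n_per_cluster)  # two-stage
    return result

# Hypothetical population: 40 classrooms, 30 students in each
classrooms = {
    f"class_{c:02d}": [f"class_{c:02d}_student_{s:02d}" for s in range(30)]
    for c in range(40)
}

one_stage = cluster_sample(classrooms, 15, seed=5)
two_stage = cluster_sample(classrooms, 15, n_per_cluster=10, seed=5)

print(sum(len(v) for v in one_stage.values()))  # 15 classrooms x 30 students
print(sum(len(v) for v in two_stage.values()))  # 15 classrooms x 10 students
```

Notice that a list of classrooms is needed at stage one, but a list of students is needed only for the 15 classrooms that were actually selected.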
76- Multistage sampling involves combinations of stratified and/or clustered and/or simple random samples until one reaches the desired unit of analysis.
- Allows us to get a random sample without a sampling frame. Example: a sample from East Malaysia.
- Cluster by urban/rural and randomly select communities.
77- When random sampling is not possible:
- Describe the sample as thoroughly as possible so that interested parties can judge how far the findings can be generalized.
- Do replication.
78- Random assignment
- You start with a set of people (that may very well be a convenience sample) and then randomly divide that set of people into two or more subsets. You are taking a set of people and "assigning" them to two or more groups.
- When random assignment is done, the resulting subsets will be "mirror images" of each other (except for chance differences).
- For example, if you randomly assign a convenience sample of 150 people to three groups of 50 people, the three groups will be "equivalent" on all known and unknown variables. In short, random assignment generates similar groups that can be used in strong experimental research designs.
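The example of 150 people and three groups of 50 can be sketched in Python as follows. The participant labels and the function name `random_assignment` are made up for illustration.

```python
import random

def random_assignment(participants, n_groups, seed=None):
    """Shuffle the participants, then deal them out into n_groups groups."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy, so the original list is untouched
    rng.shuffle(shuffled)
    # Deal like cards: person 0 to group 0, person 1 to group 1, and so on
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical convenience sample of 150 people
people = [f"P{i:03d}" for i in range(150)]

groups = random_assignment(people, 3, seed=11)
print([len(g) for g in groups])
```

Every person ends up in exactly one group, and chance alone decides which one. Note that this says nothing about how the 150 people were obtained in the first place; that is a sampling question, not an assignment question.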
79Non-probability (non-random) samples
- These samples focus on volunteers, easily available units, or those that just happen to be present when the research is done.
- Non-probability samples are useful for quick and cheap studies, for case studies, for qualitative research, for pilot studies, and for developing hypotheses for future research.
80Nonrandom Sampling Techniques
81Convenience sample
- Also called an "accidental" sample. These are the ones like "man on the street" interviews. The researcher selects units that are convenient, close at hand, easy to reach, etc. Ex.: choose whoever walks by. If you looked at folks' clothes at the bus terminal, that means you have done convenience sampling.
82- Advantages of convenience samples
- Easy
- Cheap
- Some possibility of substantive inference, if you can justify it, but not statistical inference. Ex.: many psych. studies are done with college students as subjects.
- If the researcher can make the case that the college students are like other people in the relevant characteristics, then it's OK, but you can't use the concept of statistical inference.
- Disadvantages of ALL non-scientific samples
- Can't do statistical inference.
83Purposive sample
- The researcher selects the units with some purpose in mind or with a specific set of characteristics for the research study. For example, students who live in dorms on campus, or experts on urban development.
- It involves selecting a convenience sample from a population.
84Judgment sampling
- Is a common nonprobability method.
- The researcher selects the sample based on judgment (according to specific criteria of interest).
- This is usually an extension of convenience sampling. For example, a researcher may decide to draw the entire sample from one "representative" city, even though the population includes all cities.
- When using this method, the researcher must be confident that the chosen sample is truly representative of the entire population.
85- Example of judgment sampling: Researcher A wanted to talk about ENVIRONMENTAL ISSUES, so she sought out people who were really involved in the ENVIRONMENTAL ISSUE in KEDAH. She didn't want to know about everyone's opinions on THIS ISSUE, just about the activists. Or you might start with one person who fits the bill and ask for recommendations of other people like her. This is big in studies of political elites (ask a staffer to recommend some friends, etc.).
86- Quota sample: the researcher constructs quotas for different types of units, then uses convenience sampling to fill those quotas. A set of quotas might be, for example, to interview a fixed number of shoppers at a mall, half of whom are male and half of whom are female.
- Snowball sampling: the researcher asks participants to identify other potential participants with a specific set of characteristics, then asks the next set of participants the same question, and continues this process until a sufficient sample size is obtained.
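The quota procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name `quota_sample`, the shopper list, and the quota of two per sex are all hypothetical and do not come from the lecture.

```python
from collections import Counter

def quota_sample(stream, quotas, key):
    """Accept units in arrival order (convenience) until every quota is filled."""
    counts = Counter()
    chosen = []
    for unit in stream:
        group = key(unit)
        # Take the unit only if its group still has room in the quota.
        if counts[group] < quotas.get(group, 0):
            chosen.append(unit)
            counts[group] += 1
        # Stop interviewing as soon as all quotas are met.
        if all(counts[g] >= q for g, q in quotas.items()):
            break
    return chosen

# Hypothetical shoppers, in the order they walk past the interviewer.
shoppers = [("S1", "F"), ("S2", "F"), ("S3", "M"), ("S4", "F"),
            ("S5", "M"), ("S6", "F"), ("S7", "M"), ("S8", "M")]

sample = quota_sample(shoppers, quotas={"M": 2, "F": 2}, key=lambda s: s[1])
print(sample)  # [('S1', 'F'), ('S2', 'F'), ('S3', 'M'), ('S5', 'M')]
```

Note that the quotas control only the male/female split; who ends up in each cell still depends on who happened to walk by, which is why this remains a non-probability sample.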
87- It has been shown that the main factor in generalizability is not HOW MANY SUBJECTS ARE SAMPLED, but rather THE PROCESS BY WHICH THEY WERE SELECTED.
- Which sampling process will we use?
- How "typical" or representative is the sample of this population?
- If subjects are selected randomly, you are hoping to get a "good enough mix" on other contaminating variables or threats to validity so that they cancel out (e.g., a mix of ethnicities, IQs, and so forth).
- How "certain" can you be that the findings from the sample will hold true for the entire population, especially if we didn't "study" (e.g., survey or interview) everyone?
88How often is random sampling done?
- The answer depends on the research method being used. In experimental research, random sampling is RARELY used.
- In survey research, however, it is FREQUENTLY used. The difference is that experiments usually require some commitment on the part of the subjects, whereas in survey research the commitment is usually low.
- Thus, when we randomly contact people for a survey we usually get high levels of participation. When we design experiments we usually must rely on volunteers.
89- Start here for next lecture
- Notes for the first three lectures are not yet available
- Exam after break
- Everybody should talk in English
90BIAS AND ERROR IN SAMPLING
- A sample is expected to mirror the population from which it comes; however, there is no guarantee that any sample will be precisely representative of that population. One of the most frequent reasons a sample is unrepresentative of its population is sampling error.
- Chance may dictate that a disproportionate number of untypical observations will be made. In testing fuses, for example, the sample may contain a larger or smaller share of faulty fuses than the population does.
- In practice, it is rarely known when a sample is unrepresentative and should be discarded.
91- Sampling error: the differences between the sample and the population that are due solely to the particular units that happen to have been selected.
- (Equivalently, the difference between the result of a given sample and the result of a census conducted using identical procedures.)
- If we study the whole population, sampling error does not occur. When we use a sample, there are bound to be differences in characteristics between one unit and another.
- When we use a sample, error is always present; only its size varies. Sampling error is driven by two main factors: (1) the sample size used and (2) the sampling method used.
92- For example, suppose that a sample of 100 women is measured and all are found to be taller than 5.6 feet. It is clear, even without any statistical proof, that this would be a highly unrepresentative sample leading to invalid conclusions.
- The more dangerous error is the less obvious sampling error, against which nature offers very little protection. An example is a sample in which the average height is overstated by only an inch or two rather than by a foot, which would be more obvious. It is the unobvious error that is of most concern.
93SAMPLE 1
STUDENT NO.   SEX   SCHOOL   IQ
52            M     B        110
63            F     B         83
82            F     C        105
75            M     C        113
92            F     C         98
36            F     B        129
03            F     A        130
11            F     A        117
43            M     B        117
08            M     A        120
- Mean IQ of population (100 students): 109.5
- Mean IQ of Sample 1 (10 students): 112.2
94SAMPLE 2
STUDENT NO.   SEX   SCHOOL   IQ
72            F     C        121
64            F     C        137
94            F     C         96
49            M     B        111
41            M     B        125
20            F     A        104
05            F     A        123
93            M     C         97
14            M     A        111
99            M     C         83
- Mean IQ of population (100 students): 109.5
- Mean IQ of Sample 1 (10 students): 112.2
- Mean IQ of Sample 2 (10 students): 110.8
- Mean IQ of Samples 1 and 2 combined (20 students): 111.5
95- Take another sample of 10 students, Sample 3
- Mean IQ of population (100 students): 109.5
- Mean IQ of Sample 1 (10 students): 112.2
- Mean IQ of Sample 2 (10 students): 110.8
- Mean IQ of Samples 1 and 2 combined (20 students): 111.5
- Mean IQ of Sample 3 (10 students): 112.3
96- Take another sample of 10 students, Sample 4
- Mean IQ of population (100 students): 109.5
- Mean IQ of Sample 1 (10 students): 112.2
- Mean IQ of Sample 2 (10 students): 110.8
- Mean IQ of Samples 1 and 2 combined (20 students): 111.5
- Mean IQ of Sample 3 (10 students): 112.3
- Mean IQ of Sample 4 (10 students): 104.2
- Mean IQ of Samples 1 to 4 combined (40 students): 109.6
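The means for the first two samples can be checked directly from the IQ scores listed on the Sample 1 and Sample 2 slides. The short Python sketch below does that; the helper name `sampling_error` is introduced here for illustration, and the population mean of 109.5 is taken as given from the slides because the full list of 100 students is not shown.

```python
from statistics import mean

# IQ scores copied from the Sample 1 and Sample 2 slides.
sample_1 = [110, 83, 105, 113, 98, 129, 130, 117, 117, 120]
sample_2 = [121, 137, 96, 111, 125, 104, 123, 97, 111, 83]

POPULATION_MEAN = 109.5  # mean IQ of all 100 students, as stated on the slides

def sampling_error(sample, population_mean=POPULATION_MEAN):
    """Difference between a sample mean and the population mean."""
    return mean(sample) - population_mean

combined = sample_1 + sample_2

for label, sample in [("Sample 1", sample_1),
                      ("Sample 2", sample_2),
                      ("Samples 1+2", combined)]:
    print(f"{label}: n={len(sample)}, mean={mean(sample):.1f}, "
          f"error={sampling_error(sample):+.1f}")
# Sample 1: n=10, mean=112.2, error=+2.7
# Sample 2: n=10, mean=110.8, error=+1.3
# Samples 1+2: n=20, mean=111.5, error=+2.0
```

Each sample misses the population mean by a different amount even though the same procedure was used, which is exactly what sampling error means.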
97- There are two basic causes of sampling error.
- One is chance: unusual units do exist in a population, and there is always a possibility that an abnormally large number of them will be chosen. The main protection against this kind of error is to use a large enough sample.
- The other is sampling bias: a tendency to favour the selection of units that have particular characteristics. Sampling bias is usually the result of a poor sampling plan.
- Sampling bias is a systematic mistake and the fault of the researcher.
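A small simulation can make the difference between the two causes concrete. The sketch below is an illustration under invented assumptions: a hypothetical population of 10,000 IQ-like scores, and a deliberately poor sampling plan in which only above-average units can be selected. None of these numbers come from the lecture.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 IQ-like scores (mean 100, SD 15).
population = [random.gauss(100, 15) for _ in range(10_000)]
pop_mean = mean(population)

# A biased plan: only above-average units can ever be selected.
eligible = [score for score in population if score > pop_mean]

def random_sample_error(n):
    """Error of a simple random sample of size n (chance only)."""
    return mean(random.sample(population, n)) - pop_mean

def biased_sample_error(n):
    """Error of a sample drawn only from the above-average units."""
    return mean(random.sample(eligible, n)) - pop_mean

for n in (10, 100, 1000):
    print(f"n={n:4d}  random: {random_sample_error(n):+6.2f}  "
          f"biased: {biased_sample_error(n):+6.2f}")
```

In runs like this the error of the random samples tends to shrink as n grows, while the biased plan stays far from the population mean no matter how large the sample is. A larger sample protects against chance, not against bias.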
98- The major source of sampling bias is the use of nonprobability sampling techniques. If you can't specify the probability that each member will be chosen, then the results won't be generalizable.
- Bias comes from
- Convenience: units are chosen because they are readily available,
- Volunteers: probably not like the others who didn't volunteer,
- Judgment sampling: "I think they represent the group,"
- Administrative convenience: when the boss says "use this group."
- If a bias does exist, the researcher must describe it fully in the final report.
99- Frame error: a discrepancy between the intended target population and the actual population from which the sample is drawn.
- It could be due to
- (a) Missing elements: individuals who should be on the list but for some reason are not.
- (b) Foreign elements: elements that should not be in the population or sample but appear on the sampling list.
- (c) Duplicates: elements that appear more than once in the sampling frame.
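The three kinds of frame error listed above can be detected mechanically when both the target population and the sampling frame are available as lists. The following Python sketch shows one way to do it; the function name `frame_errors` and the names in the two lists are hypothetical.

```python
from collections import Counter

def frame_errors(target_population, sampling_frame):
    """Compare a sampling frame with the intended target population."""
    target = set(target_population)
    counts = Counter(sampling_frame)   # how often each element is listed
    listed = set(counts)
    return {
        "missing": sorted(target - listed),      # should be listed but are not
        "foreign": sorted(listed - target),      # listed but not in the target
        "duplicates": sorted(u for u, c in counts.items() if c > 1),
    }

# Hypothetical target population and sampling frame.
target = ["Aminah", "Bala", "Chong", "Devi"]
frame = ["Aminah", "Bala", "Bala", "Farid"]

errors = frame_errors(target, frame)
print(errors)
# {'missing': ['Chong', 'Devi'], 'foreign': ['Farid'], 'duplicates': ['Bala']}
```

In real studies the target population is rarely available as a clean list, so this check is usually done against the best available register rather than the true population.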
100- Selection error
- Certain elements in the frame have a greater chance of falling into the sample than others.
- This happens, for example, when elements are listed more than once across different lists (e.g., a list of mathematics teachers, a list by school grade, a list by gender).
102"How big should my sample be?"
- Here are my four "simple" answers to your question:
- Try to get as big a sample as you can for your study (i.e., the bigger the sample, the better).
- If your population is of size 100 or less, include the whole population rather than taking a sample.
- Look at other studies in the research literature and see how many units they select.
- For an exact number, look at tables that show recommended sample sizes.
103- You also need to understand that there are many times when you will need larger rather than smaller samples. You will need larger samples:
104- When the population is very heterogeneous.
- When you want to break down the data into multiple categories.
- When you want a relatively narrow confidence interval (e.g., the estimate that 75% of teachers support a policy, plus or minus 4%, is narrower than the estimate of 75% plus or minus 5%).
105- When you expect a weak relationship or a small effect.
- When you use a less efficient technique of random sampling (e.g., cluster sampling is less efficient than stratified sampling).
- When you expect to have a low response rate. The response rate is the percentage of people in your sample who agree to be in your study.
106- To estimate a sample size, a researcher must
- estimate the standard deviation of the population (the homogeneity of the population),
- set the allowable amount of error (the degree of precision desired, also called the margin of error or required precision),
- determine a confidence level.
107- Degree of precision desired: also known as margin of error or precision required.
- We choose the desired precision (e.g., 3%, 4%, 5%, 10%).
- It states how close the estimate should fall to the parameter (exactness of prediction).
- Ideally the sample statistic would equal the population parameter, but this is impossible: we do not measure every unit, so there is always some sampling error.
108SAMPLE SIZE CRITERIA
- In addition to the purpose of the study and the population size, three criteria usually need to be specified to determine the appropriate sample size:
- the level of precision (the allowable sampling error),
- the level of confidence or risk (the risk of selecting a "bad" sample), and
- the degree of variability in the attributes being measured.
109Degree Of Variability
- The degree of variability in the attributes being measured refers to the distribution of attributes in the population.
- The more heterogeneous a population, the larger the sample size required to obtain a given level of precision.
- The less variable (more homogeneous) a population, the smaller the sample size. Note that a proportion of 50% indicates a greater level of variability than either 20% or 80%.
- This is because 20% and 80% indicate that a large majority do not or do, respectively, have the attribute of interest. Because a proportion of .5 indicates the maximum variability in a population, it is often used in determining a more conservative sample size; that is, the sample size may be larger than if the true variability of the population attribute were used.
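The claim that a proportion of .5 gives the maximum variability can be checked with the variance of a yes/no attribute, p(1 - p). The sketch below is a minimal illustration; the function name `proportion_variance` is introduced here and is not part of the lecture.

```python
def proportion_variance(p):
    """Variance of a yes/no attribute held by a share p of the population."""
    return p * (1 - p)

for p in (0.2, 0.5, 0.8):
    print(f"p = {p:.1f}: p(1 - p) = {proportion_variance(p):.2f}")
# p = 0.2: p(1 - p) = 0.16
# p = 0.5: p(1 - p) = 0.25
# p = 0.8: p(1 - p) = 0.16
```

Because required sample size grows with this variance, planning with p = .5 gives the largest, and therefore the most conservative, sample size.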
110The Level Of Precision
- The level of precision, sometimes called sampling error, is the range in which the true value of the population is estimated to be.
- This range is often expressed in percentage points (e.g., ±5 percent), in the same way that results for political campaign polls are reported by the media.
- Thus, if a researcher finds that 60% of farmers in the sample have adopted a recommended practice with a precision rate of ±5%, then he or she can conclude that between 55% and 65% of farmers in the population have adopted the practice.
111- For example, suppose student IQ has a mean of 100 and a standard deviation of 15. If we take a sample of 25 observations, the sampling error (standard error of the mean) is
- $15/\sqrt{25} = 3.0$
- so the students' mean IQ lies at, say, $100 \pm 3.0$.
- But if the sample size is increased to 40,
- the sampling error becomes
- $15/\sqrt{40} \approx 2.4$
- and the students' mean IQ now lies in the range $100 \pm 2.4$.
- If we increase the sample size to 100,
- the sampling error becomes
- $15/\sqrt{100} = 1.5$
- and the students' mean IQ now lies in the range $100 \pm 1.5$.
- This means that the larger the sample size, the smaller the sampling error, and the closer the sample mean comes to the population mean.
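The three calculations on this slide all use the same rule: the standard deviation divided by the square root of the sample size. A minimal Python sketch of that rule follows; the function name `standard_error` is chosen here for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)

SD = 15  # standard deviation of IQ scores used on the slide

for n in (25, 40, 100):
    print(f"n = {n:3d}: standard error = {standard_error(SD, n):.2f}")
# n =  25: standard error = 3.00
# n =  40: standard error = 2.37
# n = 100: standard error = 1.50
```

Quadrupling the sample from 25 to 100 only halves the error, because the error falls with the square root of n rather than with n itself.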
112- Once these are known, the formula for calculating sample size is
$$n = \left(\frac{ZS}{E}\right)^2$$
- where
- Z = standardized value that corresponds to the confidence level
- S = sample standard deviation
- E = acceptable magnitude of error
- Suppose a researcher studying annual expenditures on books wishes to have a 95% confidence interval (Z = 1.96) and a range of error (E) of less than 2, and an estimate of the standard deviation is 29.
$$n = \left(\frac{ZS}{E}\right)^2 = \left(\frac{(1.96)(29)}{2}\right)^2 \approx 808$$
- If we change the range of acceptable error to 4, the sample size falls:
$$n = \left(\frac{(1.96)(29)}{4}\right)^2 \approx 202$$
113- Suppose you wanted to estimate the sample size for a survey that contains the following question:
- What is your overall attitude towards Hospital X? Very Good 7 6 5 4 3 2 1 Very Poor
- The range of acceptable error is 0.1 points, the confidence level is 95%, and the estimated standard deviation is 1.
$$n = \left(\frac{ZS}{E}\right)^2 = \left(\frac{(1.96)(1)}{0.1}\right)^2 \approx 384$$
- If you increase the acceptable error to 0.2, the sample size drops to $n \approx 96$.
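The book-expenditure and Hospital X examples both apply the same formula, so they can be reproduced with one small function. This is a sketch: the name `sample_size_for_mean` is introduced here, and rounding to the nearest whole number is an assumption made to match the figures on the slides (many textbooks round up instead).

```python
def sample_size_for_mean(z, sd, error):
    """n = (z * sd / error) ** 2, rounded to the nearest whole unit."""
    return round((z * sd / error) ** 2)

Z_95 = 1.96  # standardized value for a 95% confidence level

# Annual expenditure on books: S = 29, E = 2 and then E = 4.
print(sample_size_for_mean(Z_95, sd=29, error=2))    # 808
print(sample_size_for_mean(Z_95, sd=29, error=4))    # 202

# Attitude towards Hospital X: S = 1, E = 0.1 and then E = 0.2.
print(sample_size_for_mean(Z_95, sd=1, error=0.1))   # 384
print(sample_size_for_mean(Z_95, sd=1, error=0.2))   # 96
```

In both examples doubling the acceptable error cuts the required sample to about a quarter, because E enters the formula squared.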
114The Confidence Level
- The confidence or risk level is based on ideas encompassed under the Central Limit Theorem.
- The key idea encompassed in the Central Limit Theorem is that when a population is repeatedly sampled, the average value of the attribute obtained by those samples is equal to the true population value.
- Furthermore, the values obtained by these samples are distributed normally about the true value, with some samples having a higher value and some obtaining a lower value than the true population value.
- In a normal distribution, approximately 95% of the sample values are within two standard deviations of the true population value (e.g., mean).
115- In other words, if a 95% confidence level is selected, 95 out of 100 samples will have the true population value within the range of precision specified earlier.
- There is always a chance that the sample you obtain does not represent the true population value.
- Such samples with extreme values are represented by the shaded areas.
- This risk is reduced for 99% confidence levels and increased for 90% (or lower) confidence levels.
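The "95 out of 100 samples" idea can be illustrated by simulation. The sketch below draws many samples from a hypothetical normal population (mean 100, SD 15, sample size 25, 2,000 repetitions; all of these values are assumptions chosen for the illustration) and counts how often the sample mean lands within the 95% margin of the true mean.

```python
import math
import random
from statistics import mean

random.seed(42)  # fixed seed so the run is repeatable

TRUE_MEAN, SD = 100, 15          # hypothetical population values
N, REPS, Z_95 = 25, 2000, 1.96   # sample size, number of samples, z for 95%

# Range of precision for a sample of size N at the 95% level.
margin = Z_95 * SD / math.sqrt(N)

def sample_mean():
    """Mean of one random sample of N scores from the population."""
    return mean([random.gauss(TRUE_MEAN, SD) for _ in range(N)])

# Count the samples whose mean falls within the margin of the true mean.
hits = sum(abs(sample_mean() - TRUE_MEAN) <= margin for _ in range(REPS))
coverage = hits / REPS

print(f"margin = ±{margin:.2f}, share of samples within margin = {coverage:.3f}")
```

The share should come out close to 0.95. The remaining samples, roughly 5 in 100, are the extreme ones that do not represent the true population value.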
116- A "p" value of .05 is a commonly used significance level. When we say "the results are significant at the .05 level," we mean that differences as large as those observed would arise by chance alone less than 5% of the time. With a "p" value of .01, such differences would arise by chance alone less than 1% of the time.
117- For example, if your population is 500,000, and you want to be 95% confident that your data are representative of your population to within 1% accuracy, 9,423 returned surveys are required. (If you expect a response rate between 30% and 40%, you must then estimate the number for your initial mailing at approximately 29,000.)
- If you want to be 99% confident that your data are accurate to within 1%, you need 16,057 surveys returned.
- This number drops drastically if you only need 95% confidence that your data are accurate to within 5%. With those parameters, 384 returned surveys will be adequate.
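The slide does not say how these figures were obtained. They are consistent with the standard sample size formula for a proportion with p = .5 plus a finite population correction, so the Python sketch below uses that formula as an assumption. The function name and the z values (1.96 for 95%, 2.576 for 99%) are also choices made here.

```python
def sample_size_for_proportion(population, z, margin, p=0.5):
    """Completed responses needed to estimate a proportion within +/- margin.

    Assumes simple random sampling, maximum variability (p = 0.5) by default,
    and a finite population correction.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2         # size for a very large population
    return round(n0 / (1 + (n0 - 1) / population))    # finite population correction

N = 500_000

print(sample_size_for_proportion(N, z=1.96, margin=0.01))    # 9423
print(sample_size_for_proportion(N, z=2.576, margin=0.01))   # 16057
print(sample_size_for_proportion(N, z=1.96, margin=0.05))    # 384
```

These are numbers of returned surveys. The number mailed out has to be larger, in proportion to the expected response rate.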
118- Population mean: 109.5
- Sample 1 mean (10 students): 112.2
- Sample 2 mean (10 students): 110.8
- Samples 1 and 2 combined (20 students): 111.5
- Sample 3 mean: 104.4
- Sample 4 mean: 91.2
- Samples 3 and 4 combined: 97.8
- All four samples: 109.6
119- In general, it is safe to assume that
- Sample sizes will need to increase as the size of the confidence interval decreases.
- Sample sizes will need to increase as the level of statistical significance decreases (e.g., from .05 to .01).
- Sample sizes will need to increase as the population increases.
- The 'p' value .05 is often used to calculate the sample size you need and to set thresholds of statistical significance.
- The term "significance" does not mean "important," but rather is a measure of confidence when analyzing the data.
120- In completing this discussion of determining
sample size, there are three additional issues. - First, the above approaches to determining sample
size have assumed that a simple random sample is
the sampling design. - More complex designs, e.g., stratified random
samples, must take into account the variances of
subpopulations, strata, or clusters before an
estimate of the variability in the population as
a whole can be made.
121- Another consideration with sample size is the
number needed for the data analysis. - If descriptive statistics are to be used, e.g.,
mean, frequencies, then nearly any sample size
will suffice. - On the other hand, a good size sample, e.g.,
200-500, is needed for multiple regression,
analysis of covariance, or log-linear analysis,
which might be performed for more rigorous state
impact evaluations. - The sample size should be appropriate for the
analysis that is planned.
122- In addition, an adjustment in the sample size may be needed to accommodate a comparative analysis of subgroups (e.g., an evaluation of program participants with nonparticipants).
- Sudman (1976) suggests that a minimum of 100 elements is needed for each major group or subgroup in the sample, and that for each minor subgroup a sample of 20 to 50 elements is necessary.
- Similarly, Kish (1965) says that 30 to 200 elements are sufficient when the attribute is present 20 to 80 percent of the time (i.e., the distribution approaches normality).
- On the other hand, skewed distributions can result in serious departures from normality even for moderate size samples (Kish, 1965, p. 17). Then a larger sample or a census is required.
123- Finally, the sample size formulas provide the number of responses that need to be obtained. Many researchers commonly add 10% to the sample size to compensate for persons that the researcher is unable to contact.
- The sample size is also often increased by 30% to compensate for nonresponse. Thus, the number of mailed surveys or planned interviews can be substantially larger than the number required for a desired level of confidence and precision.
124STRATEGIES FOR DETERMINING SAMPLE SIZE
- There are several approaches to determining the
sample size. These include using a census for
small populations, imitating a sample size of
similar studies, using published tables, and
applying formulas to calculate a sample size.
125- Using A Census For Small Populations
- One approach is to use the entire population as
the sample. Although cost considerations make
this impossible for large populations, a census
is attractive for small populations (e.g., 200 or
less). - A census eliminates sampling error and provides
data on all the individuals in the population. In
addition, some costs such as questionnaire design
and developing the sampling frame are "fixed,"
that is, they will be the same for samples of 50
or 200. - Finally, virtually the entire population would
have to be sampled in small populations to
achieve a desirable level of precision
126- Using A Sample Size Of A Similar Study
- Another approach is to use the same sample size
as those of studies similar to the one you plan. - Without reviewing the procedures employed in
these studies you may run the risk of repeating
errors that were made in determining the sample
size for another study. - However, a review of the literature in your
discipline can provide guidance about "typical"
sample sizes which are used.
127- Using Published Tables
- A third way to determine sample size is to rely
on published tables which provide the sample size
for a given set of criteria. Some tables present
sample sizes that would be necessary for given
combinations of precision, confidence levels, and
variability. Please note two things. - First, these sample sizes reflect the number of
obtained responses, and not necessarily the
number of surveys mailed or interviews planned
(this number is often increased to compensate for
nonresponse). - Second, the sample sizes presume that the
attributes being measured are distributed
normally or nearly so. If this assumption cannot
be met, then the entire population may need to be
surveyed.
128- Using Formulas To Calculate A Sample Size
- Although tables can provide a useful guide for determining the sample size, you may need to calculate the necessary sample size for a different combination of levels of precision, confidence, and variability. The fourth approach to determining sample size is the application of one of several formulas, such as the following:
$$n = \frac{N}{1 + N(e)^2}$$