Title: Sampling
1 Sampling
2 BASIC DEFINITIONS
Population: the aggregate of all cases that conform to some set of specifications
Subpopulation: also called a population stratum, or simply a stratum
Population element: a single member of the population
Census: a count of the elements in a population; a determination of the characteristics of the population based on information about all of its members
Sample: a selection of some elements of the population
Representative sampling plan: carries the assurance that, say, 90% of the time (confidence level) the population estimates based on the sample differ by no more than 5% (margin of error) from the true value
3 Nonprobability sampling
Accidental samples: select the first population elements you encounter. Danger: underrepresentation of minorities, females, etc.
Quota sampling: an accidental sample, taking care that all strata are represented in the sample in the same proportions as in the population. Danger: own-friends bias
Purposive samples: pick cases that are judged to be typical of the target population. Danger: judgement ...
Probability sampling
Simple random sample: selection based on random numbers such that each population element has an equal and independent probability of being sampled
Stratified random sample
Cluster sample
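As an illustration of the simple random sample defined above, here is a minimal sketch using Python's standard library. The population of 1000 numbered elements, the sample size of 50 and the seed are hypothetical values chosen for the example; they do not come from the slides.

```python
import random

# Hypothetical population: 1000 elements identified by number.
population = list(range(1, 1001))

# Fixed seed only so that the example is reproducible.
rng = random.Random(42)

# random.sample draws without replacement; every element has the
# same chance of ending up in the sample.
sample = rng.sample(population, k=50)

print(len(sample), "elements drawn, e.g.", sorted(sample)[:5])
```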
4 Determine a proper sample size
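One common way to turn a confidence level and a margin of error into a sample size is the normal-approximation formula for a proportion, n = z² · p(1 − p) / e². The sketch below assumes a simple random sample from a large population, reads the margin of error as 5 percentage points, and uses the conservative value p = 0.5; the function name is made up for this example and the slides may have used a different method.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(confidence, margin, p=0.5):
    """Smallest n for which a sample proportion stays within `margin`
    of the true value with probability `confidence`.

    Assumptions (not from the slides): normal approximation, simple
    random sampling, population much larger than the sample.
    """
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# The values used in the definition of a representative sampling plan.
n = sample_size_for_proportion(confidence=0.90, margin=0.05)
print(n)
```

Raising the confidence level or narrowing the margin of error both increase the required n.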
5 Lambda (λ) (Goodman & Kruskal, 1954)
To what extent does prediction of rows (columns) improve if the column (row) is known in a 2×2 contingency table?

                 Divorce Low   Divorce High   Total
Marriage Low          18             5          23
Marriage High          6            19          25
Total                 24            24          48

Marriage rate predicted WITHOUT knowledge of divorce rate: High (25 > 23)
Marriage rate predicted WITH knowledge of divorce rate: High if Divorce is high; Low if Divorce is low
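The prediction logic above can be written out as a proportional reduction in error: count the prediction errors made without knowing the column, count them again when the modal row is guessed within each column, and divide the reduction by the first count. The following is a minimal sketch of that computation for the marriage/divorce table; the function name is chosen for this example.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the row category from the column category.

    `table` is a list of rows, each a list of cell counts.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]

    # Without column knowledge: always guess the most frequent row.
    errors_without = n - max(row_totals)

    # With column knowledge: within each column, guess its modal row.
    columns = list(zip(*table))
    errors_with = sum(sum(col) - max(col) for col in columns)

    return (errors_without - errors_with) / errors_without


table = [
    [18, 5],   # Marriage Low:  Divorce Low, Divorce High
    [6, 19],   # Marriage High: Divorce Low, Divorce High
]
lam = goodman_kruskal_lambda(table)
print(round(lam, 2))
```

For this table the errors drop from 23 (always guessing High) to 6 + 5 = 11, so λ = 12/23.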
6 Generalization: correlation between continuous variables
λ: To what extent does prediction of rows (columns) improve if the column (row) is known in a 2×2 contingency table?
Generalization r²: To what extent does prediction of y improve if it is based on x substituted into the regression line y = ax + b, compared with a prediction made without knowing that line?
7 Correlation between one continuous variable and one dichotomous variable

        Puzzles solved by    Puzzles solved by
        Autocratic Teams     Democratic Teams
               8                   10
              10                   12
               7                    9
              11                   11
              12                   13
mean           9.6                 11.0

Regression line: y = 9.6 + 1.4x
Dummy variable: Autocratic = 0, Democratic = 1
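To see where the regression line comes from, the sketch below fits an ordinary least-squares line to the ten teams with the dummy coding given above. With a 0/1 predictor the intercept is the mean of the group coded 0 and the slope is the difference between the two group means. The variable names are chosen for this example.

```python
from statistics import mean

autocratic = [8, 10, 7, 11, 12]   # puzzles solved, dummy x = 0
democratic = [10, 12, 9, 11, 13]  # puzzles solved, dummy x = 1

x = [0] * len(autocratic) + [1] * len(democratic)
y = autocratic + democratic

x_bar, y_bar = mean(x), mean(y)

# Ordinary least-squares slope and intercept.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

print(f"y = {intercept:.1f} + {slope:.1f}x")
```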
8 Computation of r²
If we know the leadership style of a team, we can predict its productivity 15% better than without that knowledge
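The 15% figure can be reproduced as a proportional reduction in squared prediction error: compare the squared errors around the overall mean (prediction without knowing the leadership style) with the squared errors around each group's own mean (prediction with that knowledge). This is a minimal sketch using the puzzle data from the previous slide; the variable names are chosen for this example.

```python
from statistics import mean

autocratic = [8, 10, 7, 11, 12]
democratic = [10, 12, 9, 11, 13]
y = autocratic + democratic

# Error without knowing the group: deviations from the grand mean.
grand_mean = mean(y)
ss_total = sum((yi - grand_mean) ** 2 for yi in y)

# Error when the group is known: deviations from each group's mean.
ss_within = sum(
    (yi - mean(group)) ** 2
    for group in (autocratic, democratic)
    for yi in group
)

r_squared = 1 - ss_within / ss_total
print(round(r_squared, 2))
```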
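9 r² depends on the difference between the group averages and the variance within the groups
[Figure: two panels, each labelled "Higher r²"; r² increases as the group averages lie further apart and as the variance within the groups shrinks]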
10 Relation between r and t-test
H0: there is no relation between a dichotomous variable x (group) and a continuous variable y
H0 is not likely to be true if r² is high.
WHAT IS HIGH?
Statistics: if in the population no relation exists between x and y, then the sampling distribution of t = r·√(N − 2) / √(1 − r²) has a t-distribution with N − 2 degrees of freedom
11 Student t-distribution versus Normal distribution
[Figure: density curves of N(0,1), t(100) and t(5); vertical axis from .1 to .4; the Type I error region is marked]
12 Example: leadership style and productivity
r² for leadership style and productivity equals .15 (N = 10 teams, r = .39)
Conclusion: the obtained result is NOT strong enough to reject the null hypothesis, which states that leadership style and productivity are unrelated
WHICH IS NOT THE SAME AS "H0 IS TRUE"!!
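The conclusion can be checked numerically with the standard conversion of a correlation into a t statistic, t = r·√(N − 2) / √(1 − r²), with N − 2 degrees of freedom. The sketch below assumes a two-sided test at the 5% level, which the slide does not state; the critical value 2.306 for 8 degrees of freedom is taken from a standard t table.

```python
import math


def t_from_r(r, n):
    """t statistic for testing H0: no relation, given r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.39, 10
t = t_from_r(r, n)

# Assumption: two-sided test, alpha = .05, df = n - 2 = 8.
critical_value = 2.306
reject_h0 = abs(t) > critical_value

print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

The observed t stays well below the critical value, so H0 is not rejected; that is a statement about the evidence in ten teams, not proof that no relation exists.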
13 Classical approach to t-test
14 Attention!
Statistically significant ≠ theoretically relevant
You can always find a sample size N for which you get a significant test result: r = .04, r² = .0016, N = 3000 yields t = 2.19
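The point can be made concrete by holding the tiny correlation fixed and letting N grow. The sketch below reuses the conversion t = r·√(N − 2) / √(1 − r²) and searches for the smallest N at which r = .04 crosses a critical value; the threshold 1.96 is the large-sample two-sided 5% value and is an assumption of this example, as are the function names.

```python
import math


def t_from_r(r, n):
    """t statistic for a correlation r observed in a sample of size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def smallest_significant_n(r, critical_value=1.96):
    """Smallest n at which a fixed positive r exceeds the critical value.

    Assumption: a large-sample critical value is accurate enough here.
    """
    n = 3
    while t_from_r(r, n) <= critical_value:
        n += 1
    return n


t_3000 = t_from_r(0.04, 3000)
n_needed = smallest_significant_n(0.04)

print(f"t at N = 3000: {t_3000:.2f}")
print(f"smallest N with t > 1.96: {n_needed}")
```

The correlation still explains only 0.16% of the variance at any N; only the test statistic changes.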