Title: Stat 1510 Statistical Thinking
1 Stat 1510: Statistical Thinking Concepts
2 Data
- Primary data is data collected by the investigator conducting the research or study, for a specific purpose.
- Secondary data is data collected by someone other than the user, for the same or a different purpose.
3 Population
- Researchers often want to answer questions about some large group of individuals; this group is called the population.
- A population is a set of units. It may be potentially infinite or even hypothetical.
- If the time at which the units are measured is important, the population is often called a process.
- In any analysis, it is important to be clear about how the population is defined.
4 Population and Sample
- We consider three sets of units.
- The target population is the set of units that the investigators set out to study in the definition of the problem.
- The study population is the set of units that could have been in the sample.
- The sample is the set of units actually selected for the investigation. The total number of units in the sample is called the sample size, and the way the sample is selected is called the sampling protocol or sampling design.
6 Example
- The Faculty of Science of Memorial University wants to know students' opinions of the university facilities. For this purpose, it conducted a survey by selecting a random sample of 200 students registered for Winter 2013.
- Identify the target population, study population, sample, and sampling unit.
7 Example
- Target population: Faculty of Science students at MUN
- Study population: all students registered for Winter 2013 in the Faculty of Science at MUN
- Sample: the 200 students selected for this survey
- Sampling unit: each selected student
8 Bad Sampling Designs
- Voluntary response sampling
  - allowing individuals to choose to be in the sample
- Convenience sampling
  - selecting individuals that are easiest to reach
- Both of these techniques are biased
  - they systematically favor certain outcomes
9 Voluntary Response
- To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.
  - 4.5% responded
  - Hite used those responses to write her book
- Moore (Statistics: Concepts and Controversies, 1997) noted:
  - respondents were fed up with men and eager to fight them
  - the anger became the theme of the book
  - but angry women are more likely to respond
10 Convenience Sampling
- Sampling mice from a large cage to study how a drug affects physical activity
  - a lab assistant reaches into the cage to select the mice one at a time until 10 are chosen
- Which mice will likely be chosen?
  - Could this sample yield biased results?
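One way to explore the question above is a small simulation. The sketch below rests on an assumption that is not in the slide: less active mice are easier to catch by hand. The activity scores, the weighting rule, and the function name `grab_by_hand` are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(10)  # fixed seed only so the sketch is repeatable

# Hypothetical cage of 100 mice with activity scores between 0 and 1.
activity = [rng.random() for _ in range(100)]

def grab_by_hand(scores, n, rng):
    """Pick n mice one at a time; ASSUMPTION: less active mice are easier to catch."""
    remaining = list(range(len(scores)))
    picked = []
    for _ in range(n):
        # Lower activity gives a larger weight, so a larger chance of being grabbed.
        weights = [1.0 - scores[i] + 0.01 for i in remaining]
        i = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(i)
        picked.append(i)
    return [scores[i] for i in picked]

# Average the sample mean over many repetitions of each design.
reps = 2000
by_hand = mean(mean(grab_by_hand(activity, 10, rng)) for _ in range(reps))
by_srs = mean(mean(rng.sample(activity, 10)) for _ in range(reps))

print(round(mean(activity), 3), round(by_hand, 3), round(by_srs, 3))
```

Under this assumed catching rule, the hand-picked samples average a lower activity score than the cage as a whole, while simple random samples average close to the cage mean. A different assumption about which mice are easiest to reach would shift the bias in a different direction.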
11 Purposive Sampling
- Consider the selection of a football or soccer team.
- Consider the selection of students for a math skills competition.
- In these sampling schemes, the sampling units are selected with a well-defined purpose; the sample is not randomly picked.
12 Simple Random Sampling
- Each individual in the population has the same chance of being chosen for the sample
- Each group of individuals (in the population) of the required size (n) has the same chance of being the sample actually selected
- Random selection
  - drawing names out of a hat
  - table of random digits
  - computer software
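As an illustration of the "computer software" option, here is a minimal Python sketch. The population labels and the seed are made up for the example; `random.sample` from the standard library draws without replacement, so every group of size n is equally likely to be returned.

```python
import random

# Hypothetical population: 20 labeled individuals (labels are illustrative).
population = [f"student_{i:02d}" for i in range(1, 21)]

rng = random.Random(1510)  # fixed seed only so the sketch is repeatable

# random.sample draws without replacement; every subset of size n
# is equally likely to be the one returned.
n = 5
srs = rng.sample(population, n)
print(srs)
```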
13 Table of Random Digits
- Table B on pg. 692 of text
- each entry is equally likely to be any of the 10 digits 0 through 9
- entries are independent of each other (knowledge of one entry gives no information about any other entries)
- each pair of entries is equally likely to be any of the 100 pairs 00, 01, ..., 99
- each triple of entries is equally likely to be any of the 1000 values 000, 001, ..., 999
14 Choosing a Simple Random Sample (SRS)
- STEP 1: Label each individual in the population
- STEP 2: Use Table B to select labels at random
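The two steps can be sketched in code. The helper `srs_from_digits` below is a hypothetical name, and the digit string is computer-generated as a stand-in for a row of the table; it is not the textbook's Table B. The sketch assumes individuals are labeled 01, 02, ... up to the population size, and that groups of digits that match no label, or repeat an earlier pick, are skipped.

```python
import random

def srs_from_digits(digits, population_size, n):
    """Select n distinct labels 1..population_size by reading fixed-width digit groups."""
    width = len(str(population_size))  # e.g. two digits for labels 01..30
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip groups that are not a valid label or that repeat an earlier pick.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of digits before the sample was complete")

# Stand-in for the table: generated digits, NOT the textbook's Table B.
rng = random.Random(692)
digits = "".join(str(rng.randrange(10)) for _ in range(400))

sample_labels = srs_from_digits(digits, population_size=30, n=5)
print(sample_labels)
```

For example, reading the digits 05 12 99 05 30 with 30 labeled individuals selects labels 5, 12, and 30: the group 99 matches no label and the second 05 is a repeat.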
15 Simple Random Sample with and without Replacement
- Case 1: Without replacement, a selected sampling unit is not returned to the population.
- Case 2: With replacement, each sampled unit is returned to the population before the next draw.
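A minimal sketch of the two cases, using a made-up population of five units. In Python's standard library, `random.sample` draws without replacement and `random.choices` draws with replacement.

```python
import random

rng = random.Random(15)  # fixed seed only so the sketch is repeatable
units = ["A", "B", "C", "D", "E"]  # hypothetical population of five units

# Without replacement: a selected unit cannot be selected again.
without = rng.sample(units, 3)

# With replacement: each draw is made from the full population, so the
# same unit may appear more than once. Drawing 8 times from 5 units
# guarantees at least one repeat.
with_repl = rng.choices(units, k=8)

print(without)
print(with_repl)
```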
16 Probability Sample
- a sample chosen by chance
- we must know what samples are possible and what chance, or probability, each possible sample has of being selected
- an SRS gives each member of the population an equal chance to be selected
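For a very small population, the possible samples and their probabilities can be listed directly. The sketch below uses a hypothetical population of four units and samples of size two; it enumerates every possible sample under an SRS and adds up each unit's chance of selection.

```python
from fractions import Fraction
from itertools import combinations

units = ["A", "B", "C", "D"]  # hypothetical population, N = 4
n = 2

# Every possible sample of size n, ignoring order.
possible_samples = list(combinations(units, n))

# Under an SRS each possible sample is equally likely.
p_sample = Fraction(1, len(possible_samples))

# A unit's chance of selection is the total probability of the samples containing it.
inclusion = {u: sum(p_sample for s in possible_samples if u in s) for u in units}

print(len(possible_samples), p_sample)
print(inclusion)
```

There are 6 possible samples, each with probability 1/6, and every unit appears in 3 of them, so each unit has chance 1/2 = n/N of being selected.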
17 Stratified Random Sample
- first, divide the population into groups of similar individuals, called strata
- second, choose a separate SRS in each stratum
- third, combine these SRSs to form the full sample
18 Stratified Random Sample: Example
- Suppose a university has the following student demographics:

  Undergraduate   Graduate   First Professional   Special
  55%             20%        5%                   20%

A stratified random sample of 100 students could be chosen as follows: select an SRS of 55 undergraduates, an SRS of 20 graduates, an SRS of 5 first professional students, and an SRS of 20 special students; combine these 100 students.
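The procedure in this example can be sketched as follows. The sampling frame is hypothetical: a made-up university of 2000 students split 55/20/5/20 across the four strata, with invented student labels. Only the stratum sample sizes (55, 20, 5, 20) come from the example.

```python
import random

rng = random.Random(18)  # fixed seed only so the sketch is repeatable

# Hypothetical sampling frame: 2000 students split 55/20/5/20.
strata = {
    "undergraduate": [f"U{i}" for i in range(1100)],
    "graduate": [f"G{i}" for i in range(400)],
    "first_professional": [f"P{i}" for i in range(100)],
    "special": [f"S{i}" for i in range(400)],
}
# Stratum sample sizes from the example.
allocation = {"undergraduate": 55, "graduate": 20, "first_professional": 5, "special": 20}

# Draw a separate SRS within each stratum, then combine them.
sample = []
for name, members in strata.items():
    sample.extend(rng.sample(members, allocation[name]))

print(len(sample))
```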
19 Stratified Random Sample: Example
- We would like to take a sample to represent the Canadian population.
- Canada has several provinces, and we want every province to be represented in the sample.
A stratified random sample of 1000 people could be chosen as follows: from each province, select a random sample. Since the provinces differ greatly in population, the sample size from each province should be proportional to its population.
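Proportional allocation can be sketched as a small function. The name `proportional_allocation` and the stratum sizes below are made up for illustration; they are not real provincial populations. Because exact proportions rarely give whole numbers, this sketch rounds down and then hands the leftover units to the strata with the largest fractional parts, which is one reasonable choice among several.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    exact = {k: total_sample * v / population for k, v in stratum_sizes.items()}
    alloc = {k: int(x) for k, x in exact.items()}  # round down first
    leftover = total_sample - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

# Made-up stratum sizes for illustration; NOT real provincial populations.
sizes = {"region_a": 5_200_000, "region_b": 1_300_000, "region_c": 150_000}
allocation = proportional_allocation(sizes, 1000)
print(allocation)
```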
20 Multistage Sample
- several stages of sampling are carried out
- useful for large-scale sample surveys
- samples at each stage may be SRSs, but are often stratified
- stages may involve other random sampling techniques as well (cluster, systematic, random digit dialing, ...)
21 Cautions about Sample Surveys
- Undercoverage
  - some individuals or groups in the population are left out of the process of choosing the sample
- Nonresponse
  - individuals chosen for the sample cannot be contacted or refuse to cooperate/respond
- Response bias
  - behavior of respondent or interviewer may lead to inaccurate answers or measurements
- Wording of questions
  - confusing or leading (biased) questions; words with different meanings
22 Nonresponse
- To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.
  - 4.5% responded
  - Hite used those responses to write her book
  - angry women are more likely to respond
23 Response Bias
- A door-to-door survey is being conducted to determine drug use (past or present) among members of the community. Respondents may give socially acceptable answers (maybe not the truth!).
- For this survey on drug use, would it matter if a police officer were conducting the interview? (bias from interviewer)
24 Response Bias: Asking the Uninformed
Source: Washington Post National Weekly Edition (April 10-16, 1995, p. 36)
- A 1978 poll done in Cincinnati asked people whether they favored or opposed repealing the 1975 Public Affairs Act.
- There was no such act!
- About one third of those asked expressed an opinion about it.
25 Wording of Questions
A newsletter distributed by a politician to his constituents gave the results of a nationwide survey on Americans' attitudes about a variety of educational issues. One of the questions asked was, "Should your legislature adopt a policy to assist children in failing schools to opt out of that school and attend an alternative school--public, private, or parochial--of the parents' choosing?" From the wording of this question, can you speculate on what answer was desired? Explain.
26 Wording: Deliberate Bias
- If you found a wallet with $20 in it, would you return the money?
- If you found a wallet with $20 in it, would you do the right thing and return the money?
27 Wording: Unintentional Bias
- I have taught several students over the past few years.
  - How many students do you think I have taught?
  - How many years am I referring to?
- Over the past few days, how many servings of fruit have you eaten?
  - How many days are you considering?
  - What constitutes a serving?
28 Wording: Unnecessary Complexity
- Do you sometimes find that you have arguments with your family members and co-workers?
  - Arguments with family members
  - Arguments with co-workers
29 Wording: Ordering of Questions
- How often do you normally go out on a date? About ___ times a month.
- How happy are you with life in general?
- Asked in this order, the answers to the two questions show a strong association.
- If the order is reversed, there is no strong association between the answers.
30 Inferences about the Population
- Values calculated from samples are used to draw conclusions (inferences) about unknown values in the population
- Variability
  - different samples from the same population may yield different results for a particular value of interest
  - estimates from random samples tend to be closer to the true values in the population if the samples are larger
  - how close the estimates will likely be to the true values can be calculated; this is called the margin of error
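The effect of sample size on variability can be seen in a small simulation. Everything here is a sketch under stated assumptions: the true population proportion `TRUE_P`, the sample sizes, and the number of repetitions are made up for illustration, and each sampled individual is modeled as an independent draw from a very large population.

```python
import random
from statistics import pstdev

rng = random.Random(30)  # fixed seed only so the sketch is repeatable
TRUE_P = 0.40  # hypothetical population proportion, chosen for illustration

def sample_proportion(n):
    """Proportion of 'yes' answers in one random sample of size n."""
    return sum(rng.random() < TRUE_P for _ in range(n)) / n

def spread(n, reps=500):
    """Standard deviation of the estimate across repeated samples of size n."""
    return pstdev([sample_proportion(n) for _ in range(reps)])

small, large = spread(100), spread(1600)
print(round(small, 4), round(large, 4))
```

The estimates from samples of 1600 vary much less from sample to sample than those from samples of 100. A margin of error is a statement about this sample-to-sample spread, which is why it shrinks as the sample size grows.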