Title: Collecting Data
1. Unit 6
2. My cabbages are better
Neighbour shows you a cabbage and claims it is
larger than any you could grow.
- So what? I grow tasty bug-free cabbages, not big ones. (Relevant)
- Anyone can grow one big cabbage, but all mine are above average. (Representative)
- On my soil it takes real skill to grow cabbages at all. Anyone can grow cabbages on your soil. (Unambiguous)
3. My cabbages are better
Neighbour shows you a cabbage and claims it is
larger than any you could grow.
- So what? I grow tasty bug-free cabbages, not big
ones.
- Relevance.
- What do you measure and analyse?
- Practical problem, not specifically statistical
4. My cabbages are better
Neighbour shows you a cabbage and claims it is
larger than any you could grow.
- Anyone can grow one big cabbage, but all mine are
above average.
- Representative.
- Decide what objects your conclusions should refer to.
- Measure all or select a sample randomly.
5. My cabbages are better
Neighbour shows you a cabbage and claims it is
larger than any you could grow.
- On my soil it takes real skill to grow cabbages
at all. Anyone can grow cabbages on your soil.
- Unambiguous.
- Try to avoid other possible explanations for what you see.
- In an experiment, you can randomly allocate treatments to a pool of objects.
6. Types of data collection
- Samples
- Observational: look but don't touch!
- Representative objects from population
- Randomly select objects
- Avoids picking too many of one type
- Experiments
- Actively apply treatment to some objects
- Randomly select objects
- Randomly allocate treatments
7. Sampling
- Why Sample?
- Measuring every object of interest is usually impractical or uneconomic.
- Sample must be representative.
- Advantages
- Cheaper
- Does not destroy the population
- Can make more careful observations
- Sampling variability can be estimated
8. Sampling
- Selecting a Sample
- The reason for selecting an object must have nothing to do with the measurement to be made.
- Simple random sampling
- Every object has same chance of being selected.
- Ensures that selected objects are, on average, similar to the rest.
- Statistical procedures guarantee this: no basis for critics to argue.
9. Sampling Terminology
- Population (Target Population)
- All of the objects we want to describe or draw conclusions about.
- Sample
- Objects selected from the population.
- Simple Random Sample
- A sample selected by a process that guarantees
that every possible sample has the same chance of
selection.
10. Sampling Terminology
- Sampling Frame
- A list from which the sample is to be selected.
- Population Parameter
- A number, like the mean weight of cabbages per hectare, describing the whole population.
- Sample estimate
- A number calculated from a sample, which is
intended to be close to the population parameter.
11. Selecting a Sample
- Of sheep (half a dozen)
- Select typical ones: NO
- First half dozen caught: NO
- If tagged, use random numbers: OK
- Put through race, select every 10th (if 60): OK
- Of grass
- Choose typical areas: NO
- Mark a grid on a drawing and select cells at random: OK
- Make a quadrat and throw it haphazardly: Maybe
12. Selecting a Sample
- Number objects 1 to n
- Select random numbers
- Excel
- Minitab
- Calculator
- Random number tables
- Reject numbers outside 1 to n
- Reject duplicates
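The steps on this slide can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample`, the `max_draw` parameter and the seed are assumptions made here. It imitates reading a random number table, where some draws fall outside 1 to n and must be rejected.

```python
import random

def simple_random_sample(n, k, max_draw=None, seed=None):
    """Pick k distinct object numbers from 1..n by drawing random numbers
    and rejecting values outside 1..n and duplicates."""
    if not 0 <= k <= n:
        raise ValueError("sample size k must be between 0 and n")
    if max_draw is None:
        # Assumption: draw numbers with as many digits as n, as from a
        # random number table (n = 60 gives draws from 0 to 99).
        max_draw = 10 ** len(str(n)) - 1
    if max_draw < n:
        raise ValueError("max_draw must be at least n")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < k:
        r = rng.randint(0, max_draw)
        if r < 1 or r > n:
            continue          # reject numbers outside 1 to n
        if r in chosen:
            continue          # reject duplicates
        chosen.append(r)
    return chosen

# Example: choose 6 of 60 tagged sheep (seed fixed only for repeatability).
print(sorted(simple_random_sample(n=60, k=6, seed=1)))
```

In everyday use, `random.sample(range(1, n + 1), k)` from the standard library does the same job in one call; the loop above is spelled out only to mirror the manual procedure.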
13. Sampling from area
- Generate random x and y coordinates
- Design zig-zag path and sample
- every xxx paces
- at random numbers of paces over path
- Throw quadrats to fine-tune locally
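Generating random coordinates could look like the sketch below. It assumes a rectangular area measured from one corner; the function names `random_points` and `random_grid_cells`, and the paddock dimensions, are hypothetical and chosen only for illustration.

```python
import random

def random_points(width, height, k, seed=None):
    """Return k random (x, y) sampling locations inside a width-by-height
    rectangle, measured from one corner."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(k)]

def random_grid_cells(n_cols, n_rows, k, seed=None):
    """Return k distinct (column, row) cells chosen at random from a grid
    drawn over the area."""
    rng = random.Random(seed)
    cells = [(col, row) for row in range(n_rows) for col in range(n_cols)]
    return rng.sample(cells, k)

# Hypothetical 100 m by 50 m paddock, 5 sampling locations.
for x, y in random_points(100.0, 50.0, 5, seed=1):
    print(f"x = {x:.1f} m, y = {y:.1f} m")
```

The grid version suits a marked-up drawing of the area; the coordinate version suits pacing out distances along two edges.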
15. Notation
- $N$ = population size
- $\mu$ = population mean
- $n$ = sample size
- $x_1, x_2, \ldots, x_n$ = sample values
- $\bar{x}$ = sample mean
- $\bar{x}$ is an estimate of $\mu$
- $N\bar{x}$ is an estimate of popn total
16. Stratified Sampling
- Population may consist of 2 or more groups
- Large farms / Small farms
- Flat land / Hilly land
- Males / Females
- Maori / Pakeha
- Children / Young adults / Middle aged / Old
17. Stratified Sampling
- Divide the population into strata (groups)
- For each stratum
- Separate simple random sample (SRS)
- Estimate mean in stratum
- Scale up to estimate total for stratum
- Add to get estimate of population total
- If required, divide by N to estimate overall popn
mean
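The calculation described on this slide can be sketched as follows. The function name `stratified_estimate` and all of the farm figures are hypothetical, made up purely to show the arithmetic.

```python
from statistics import mean

def stratified_estimate(strata):
    """strata: list of (N_h, sample_values) pairs, one per stratum, where
    N_h is the number of objects in that stratum of the population.
    Returns (estimated population total, estimated population mean)."""
    total = 0.0
    N = 0
    for N_h, values in strata:
        stratum_mean = mean(values)     # estimate mean in stratum
        total += N_h * stratum_mean     # scale up to stratum total and add
        N += N_h
    return total, total / N             # divide by N for the overall mean

# Hypothetical data: 40 large farms and 160 small farms, 4 sampled from each.
strata = [
    (40, [120.0, 135.0, 128.0, 117.0]),   # large farms, sample mean 125
    (160, [30.0, 42.0, 35.0, 33.0]),      # small farms, sample mean 35
]
total, overall_mean = stratified_estimate(strata)
print(total, overall_mean)   # 10600.0 53.0
```

Here the stratum totals are 40 x 125 = 5000 and 160 x 35 = 5600, so the estimated total is 10600 and the estimated mean is 10600 / 200 = 53. A plain average of the eight sample values would give 80, because large farms are over-represented in the sample.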
18. Stratified sampling
- Provides separate estimates for strata
- Usually more accurate than SRS for popn mean
- Deciding on strata to use
- You must know popn number in each stratum
- Best to use strata whose means are very different
- Sample size in strata?
- Proportional to popn size?
- Better ways? More for variable strata.
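One way to make sample sizes proportional to popn size is sketched below. The function name `proportional_allocation` and the rounding rule (largest remainders) are assumptions chosen here so that the stratum sample sizes always add up to n.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to the
    population size of each stratum."""
    N = sum(stratum_sizes)
    exact = [n * N_h / N for N_h in stratum_sizes]
    alloc = [int(e) for e in exact]            # round down first
    leftover = n - sum(alloc)
    # Give the remaining units to the strata with the largest remainders.
    order = sorted(range(len(exact)),
                   key=lambda h: exact[h] - alloc[h],
                   reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc

# Hypothetical population: 40 large farms, 160 small farms, sample of 20.
print(proportional_allocation([40, 160], 20))   # [4, 16]
```

This sketch ignores variability. Sampling more heavily from the more variable strata, as the last bullet suggests, would need an estimate of the s.d. within each stratum.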
19. Unambiguous causes from samples
- In survey of farms, those growing Variety A had higher mean yield than those growing Variety B
- Cannot conclude that Variety A is better
- Farmers with poor soils chose to grow B because it is more resistant to drought
- An experiment is needed to remove ambiguity
20. Experimental Design
- Uses a collection of objects: experimental units
- Pots growing tomatoes
- Ideally all experimental units are identical
- Identical soil mix, light, variety of tomato,
- Different treatments applied to units
- Half the pots get additional fertiliser
- Only treatment differs, so differences are caused
by treatment
21. Randomisation
- In practice, the experimental units are not identical
- Cows are different
- It takes a long time to weigh a herd, during which they still eat and excrete
- Try to make units as similar as possible
- Randomly allocate treatments to units
- Use random numbers to select cow for each
treatment
22. Replication
- More than 1 unit for each treatment
- Lets you assess natural variability of units
- Based on s.d. within each treatment
- Lets you assess whether difference between
treatment means is more than random variation
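A small sketch of what replication makes possible is given below. The pot yields are invented for illustration only, and the helper name `summarise` is an assumption. It computes the mean and s.d. within each treatment, so the difference between treatment means can be set against the natural variability of the units.

```python
from statistics import mean, stdev

def summarise(data):
    """Return {treatment: (mean, s.d.)} from {treatment: list of values}.
    stdev needs at least 2 units per treatment, which is the point of
    replication: one unit gives no measure of natural variability."""
    return {t: (mean(v), stdev(v)) for t, v in data.items()}

# Hypothetical yields (kg per pot), 4 replicate pots per treatment.
yields = {
    "control":    [4.1, 3.8, 4.4, 4.0],
    "fertiliser": [4.9, 5.3, 4.7, 5.1],
}

summary = summarise(yields)
for treatment, (m, s) in summary.items():
    print(f"{treatment}: mean = {m:.2f}, s.d. = {s:.2f}")

difference = summary["fertiliser"][0] - summary["control"][0]
print(f"difference between treatment means = {difference:.2f}")
```

If the difference between means is large compared with the s.d. within treatments, it is unlikely to be just random variation; a formal test would be the next step and is not shown here.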
23. Blocking
- To improve the experiment, take account of known differences between units.
- Before experiment, split units into blocks.
- Like strata in sampling
- Blocks: light levels in greenhouse (pot plants)
- Randomise treatments within each block.
- Same mix of varieties within each block
24. Example
- Compare sizes of 4 varieties of cabbage
- Experimental units: 24 positions in a plot
- Treatments: 4 different varieties
- Randomisation: randomly pick positions for 6 cabbages of each variety in plot.
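The randomisation in this example could be carried out as in the sketch below. The function name `randomise_layout`, the position numbering 1 to 24 and the seed are assumptions made for illustration.

```python
import random
from collections import Counter

def randomise_layout(varieties, reps, seed=None):
    """Randomly allocate `reps` plants of each variety to numbered
    positions 1..len(varieties) * reps. Returns {position: variety}."""
    rng = random.Random(seed)
    labels = [v for v in varieties for _ in range(reps)]   # 6 A's, 6 B's, ...
    rng.shuffle(labels)                                    # random order
    return {pos: v for pos, v in enumerate(labels, start=1)}

layout = randomise_layout(["A", "B", "C", "D"], reps=6, seed=2024)
for pos, variety in layout.items():
    print(pos, variety)

# Check: every variety appears exactly 6 times among the 24 positions.
print(Counter(layout.values()))
```

Shuffling the full list of 24 labels gives every arrangement of the varieties over the positions the same chance, with no restriction on where each variety ends up.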
25. Cabbages (cont)
- Plot data
- Find the average cabbage weight for variety A, B, C and D
- Assess variability of the plants and the plots by looking at the variability within each of varieties A, B, C and D
26. Cabbages: Blocking
- We may feel that there is a difference in fertility of the ground from left to right.
- Make each column a block and have one of each variety per block.
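A blocked layout of this kind might be generated as sketched below. It assumes the 24 positions are arranged as 4 rows by 6 columns, so that each column holds exactly one plant of each of the 4 varieties; the arrangement, the function name `randomised_blocks` and the seed are assumptions, not taken from the slides.

```python
import random

def randomised_blocks(varieties, n_blocks, seed=None):
    """Return one randomly ordered copy of the varieties per block.
    Each block (column) contains every variety exactly once."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        order = list(varieties)
        rng.shuffle(order)        # randomise treatments within the block
        blocks.append(order)
    return blocks

columns = randomised_blocks(["A", "B", "C", "D"], n_blocks=6, seed=7)

# Print the plot row by row; reading down a column gives one block.
for row in range(4):
    print(" ".join(col[row] for col in columns))
```

Because every variety appears once in every column, a left-to-right fertility trend affects all four varieties equally.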
27. Reasons for Blocking
- Allows for naturally occurring differences between the experimental units.
- Units are more similar within blocks
- The treatment means are each averaged over the same blocks.
- Treatment effects can be compared without the effects of the different blocks.
- Differences between the treatments can be found within each block.