Collecting Data - PowerPoint PPT Presentation

Slides: 28
Provided by: sivag
Transcript and Presenter's Notes

1
Unit 6
  • Collecting Data

2
My cabbages are better
Neighbour shows you a cabbage and claims it is
larger than any you could grow.
  • So what? I grow tasty bug-free cabbages, not big
    ones.
  • Anyone can grow one big cabbage, but all mine are
    above average.
  • On my soil it takes real skill to grow cabbages
    at all. Anyone can grow cabbages on your soil.

Relevant
Representative
Unambiguous
3
My cabbages are better
Neighbour shows you a cabbage and claims it is
larger than any you could grow.
  • So what? I grow tasty bug-free cabbages, not big
    ones.
  • Relevance.
  • What do you measure and analyse?
  • Practical problem, not specifically statistical

4
My cabbages are better
Neighbour shows you a cabbage and claims it is
larger than any you could grow.
  • Anyone can grow one big cabbage, but all mine are
    above average.
  • Representative.
  • Decide what objects your conclusions should refer
    to.
  • Measure all or select a sample randomly.

5
My cabbages are better
Neighbour shows you a cabbage and claims it is
larger than any you could grow.
  • On my soil it takes real skill to grow cabbages
    at all. Anyone can grow cabbages on your soil.
  • Unambiguous.
  • Try to avoid other possible explanations for
    what you see.
  • In an experiment, you can randomly allocate
    treatments to a pool of objects.

6
Types of data collection
  • Samples
  • Observational: look but don't touch!
  • Representative objects from population
  • Randomly select objects
  • Avoids picking too many of one type
  • Experiments
  • Actively apply treatment to some objects
  • Randomly select objects
  • Randomly allocate treatments

7
Sampling
  • Why Sample?
  • Measuring every object of interest is usually
    impractical or uneconomic.
  • Sample must be representative.
  • Advantages
  • Cheaper
  • Does not destroy the population
  • Can make more careful observations
  • Sampling variability can be estimated

8
Sampling
  • Selecting a Sample
  • The reason for selecting an object must have
    nothing to do with the measurement to be made.
  • Simple random sampling
  • Every object has same chance of being selected.
  • Ensures that selected objects are similar to
    rest.
  • Statistical procedures guarantee this, so critics
    have no basis to argue.

9
Sampling Terminology
  • Population (Target Population)
  • All of the objects we want to describe or draw
    conclusions about.
  • Sample
  • Objects selected from the population.
  • Simple Random Sample
  • A sample selected by a process that guarantees
    that every possible sample has the same chance of
    selection.

10
Sampling Terminology
  • Sampling Frame
  • A list from which the sample is to be selected.
  • Population Parameter
  • A number, like the mean weight of cabbages per
    hectare, describing the whole population.
  • Sample estimate
  • A number calculated from a sample, which is
    intended to be close to the population parameter.

11
Selecting a Sample
  • Of sheep (half a dozen)
  • Select typical ones. (NO)
  • First half dozen caught. (NO)
  • If tagged, use random numbers. (OK)
  • Put through a race and select every 10th (if 60). (OK)

  • Of grass
  • Choose typical areas. (NO)
  • Mark a grid on a drawing and select cells at
    random. (OK)
  • Make a quadrat and throw it haphazardly. (Maybe)

12
Selecting a Sample
  • Number objects 1 to n
  • Select random numbers
  • Excel
  • Minitab
  • Calculator
  • Random number tables
  • Reject numbers outside 1 to n
  • Reject duplicates
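
The steps on this slide can be sketched in Python. This is a minimal illustration, not the course's own procedure: the function name `simple_random_sample` and the choice to draw fixed-width numbers (as if reading digits from a random number table) are assumptions made for the example.

```python
import random

def simple_random_sample(n, k, rng=None):
    """Pick k distinct labels from 1..n by drawing random numbers and
    rejecting values outside 1..n and duplicates."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if rng is None:
        rng = random.Random()
    # Draw numbers with as many digits as n has, like reading a random
    # number table. Some draws therefore fall outside 1..n.
    upper = 10 ** len(str(n)) - 1
    chosen = []
    while len(chosen) < k:
        draw = rng.randint(0, upper)
        if draw < 1 or draw > n:
            continue  # reject numbers outside 1 to n
        if draw in chosen:
            continue  # reject duplicates
        chosen.append(draw)
    return chosen

# Example: choose 6 of 60 tagged sheep.
print(sorted(simple_random_sample(n=60, k=6)))
```

In practice `random.sample(range(1, n + 1), k)` gives the same kind of sample in one call; the loop is written out only to mirror the rejection steps listed above.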

13
Sampling from area
  • Generate random x and y coordinates
  • Design zig-zag path and sample
  • every xxx paces
  • at random numbers of paces over path
  • Throw quadrats to fine-tune locally
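
A rough Python sketch of the two approaches above, assuming a rectangular area and a path measured in whole paces. The function names, the field dimensions and the path length are hypothetical, chosen only for illustration.

```python
import random

def random_points(width, height, count, rng=None):
    """Return `count` (x, y) points placed uniformly at random
    inside a width-by-height rectangle."""
    if rng is None:
        rng = random.Random()
    return [(rng.uniform(0, width), rng.uniform(0, height))
            for _ in range(count)]

def random_paces(path_length, count, rng=None):
    """Return `count` distinct pace numbers along a path of
    `path_length` paces, sorted into walking order."""
    if rng is None:
        rng = random.Random()
    return sorted(rng.sample(range(1, path_length + 1), count))

# Example: 5 quadrat locations in a 50 m x 20 m paddock (made-up size),
# or 5 stopping points along a 200-pace zig-zag path.
print(random_points(50.0, 20.0, 5))
print(random_paces(200, 5))
```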

15
Notation
  • $N$: population size
  • $\mu$: population mean
  • $n$: sample size
  • $x_1, x_2, \ldots, x_i, \ldots$: sample values
  • $\bar{x}$: sample mean

$\bar{x}$ is an estimate of $\mu$
$N\bar{x}$ is an estimate of popn total
16
Stratified Sampling
  • Population may consist of 2 or more groups
  • Large farms / Small farms
  • Flat land / Hilly land
  • Males / Females
  • Maori / Pakeha
  • Children / Young adults / Middle aged / Old

17
Stratified Sampling
  • Divide the population into strata (groups)
  • For each stratum
  • Separate simple random sample (SRS)
  • Estimate mean in stratum
  • Scale up to estimate total for stratum
  • Add to get estimate of population total
  • If required, divide by N to estimate overall popn
    mean
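
The estimation steps above can be sketched as follows. The function name `stratified_estimate`, the input layout and the farm figures are assumptions made up for the example, not data from the presentation.

```python
from statistics import mean

def stratified_estimate(strata):
    """strata maps a stratum name to (stratum population size, sample values).
    Returns (estimated population total, estimated population mean,
    estimated total per stratum)."""
    stratum_totals = {}
    for name, (size, values) in strata.items():
        # Estimate the stratum mean, then scale up by the stratum size.
        stratum_totals[name] = size * mean(values)
    total = sum(stratum_totals.values())           # add the stratum totals
    population_size = sum(size for size, _ in strata.values())
    return total, total / population_size, stratum_totals

# Made-up yields from 3 of 20 large farms and 4 of 80 small farms.
farms = {
    "large": (20, [100.0, 120.0, 110.0]),
    "small": (80, [10.0, 20.0, 15.0, 15.0]),
}
total, overall_mean, by_stratum = stratified_estimate(farms)
print(total, overall_mean, by_stratum)
```

Note that the overall mean is the total divided by N, not the plain average of all sampled values, because the two strata were sampled at different rates.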

18
Stratified sampling
  • Provides separate estimates for strata
  • More accurate than SRS for popn mean
  • Deciding on strata to use
  • You must know popn number in each stratum
  • Best to use strata whose means are very different
  • Sample size in strata?
  • Proportional to popn size?
  • Better ways? More for variable strata.
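
Two ways of splitting a total sample size across strata, sketched in Python. The slide only says "more for variable strata"; the second function uses one standard rule of that kind (sample in proportion to stratum size times stratum s.d., often called Neyman allocation), which is an assumption about what was intended. The function names and all numbers are illustrative.

```python
def proportional_allocation(sizes, n):
    """Share n sample units across strata in proportion to stratum size."""
    total = sum(sizes.values())
    return {name: n * size / total for name, size in sizes.items()}

def neyman_allocation(sizes, sds, n):
    """Share n in proportion to (stratum size x stratum s.d.), so that
    more variable strata receive more of the sample."""
    weights = {name: sizes[name] * sds[name] for name in sizes}
    total = sum(weights.values())
    return {name: n * w / total for name, w in weights.items()}

# Made-up figures: 20 large farms (s.d. 30) and 80 small farms (s.d. 5).
sizes = {"large": 20, "small": 80}
sds = {"large": 30.0, "small": 5.0}
print(proportional_allocation(sizes, 20))
print(neyman_allocation(sizes, sds, 20))
```

Both functions return fractional sizes; rounding to whole units (and keeping at least two per stratum so a s.d. can be computed) is left to the user. The stratum s.d.s usually have to be guessed from earlier data.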

19
Unambiguous causes from samples
  • In survey of farms, those growing Variety A had
    higher mean yield than those growing Variety B
  • Cannot conclude that Variety A is better

Farmers with poor soils chose to grow B because
it is more resistant to drought
An experiment is needed to remove ambiguity
20
Experimental Design
  • Uses a collection of objects: experimental units
  • Pots growing tomatoes
  • Ideally all experimental units are identical
  • Identical soil mix, light, variety of tomato, ...
  • Different treatments applied to units
  • Half the pots get additional fertiliser
  • Only treatment differs, so differences are caused
    by treatment

21
Randomisation
  • In practice, the experimental units are not
    identical
  • Cows are different
  • It takes a long time to weigh a herd during which
    they still eat and excrete
  • Try to make units as similar as possible
  • Randomly allocate treatments to units
  • Use random numbers to select cow for each
    treatment
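
Random allocation of treatments to units might look like this in Python. It is a sketch under the assumption that every treatment gets the same number of units; the function name and the cow labels are hypothetical.

```python
import random

def allocate_treatments(units, treatments, rng=None):
    """Randomly split the units into equal-sized groups, one per treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    if rng is None:
        rng = random.Random()
    shuffled = list(units)
    rng.shuffle(shuffled)  # random order decides who gets which treatment
    per_group = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * per_group:(i + 1) * per_group]
        for i, treatment in enumerate(treatments)
    }

# Example: 12 cows identified by tag number, two diets.
cows = list(range(1, 13))
print(allocate_treatments(cows, ["diet A", "diet B"]))
```

Because the shuffle ignores everything about the cows, no characteristic of a cow can systematically line up with the treatment it receives.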

22
Replication
  • More than 1 unit for each treatment
  • Lets you assess natural variability of units
  • Based on s.d. within each treatment
  • Lets you assess whether difference between
    treatment means is more than random variation
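
A small Python sketch of how replication is used: the s.d. within each treatment measures natural variability, and the difference between treatment means is judged against it. Comparing the difference to its standard error is one common yardstick and is an assumption here, not a method stated on the slide; the function name and the data are made up.

```python
from math import sqrt
from statistics import mean, stdev

def compare_treatments(a, b):
    """Summarise two treatment groups, each with at least 2 units."""
    diff = mean(b) - mean(a)
    # Standard error of the difference, built from the within-treatment s.d.s.
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return {
        "mean_a": mean(a), "sd_a": stdev(a),
        "mean_b": mean(b), "sd_b": stdev(b),
        "diff": diff, "se": se, "ratio": diff / se,
    }

# Made-up tomato yields (kg per pot) without and with extra fertiliser.
control = [4.0, 5.0, 6.0]
fertiliser = [7.0, 8.0, 9.0]
print(compare_treatments(control, fertiliser))
```

A ratio well above about 2 in size suggests the difference is more than random variation, though a formal test would be needed to say so with a stated confidence. With only one unit per treatment, `stdev` cannot be computed at all, which is the point of replication.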

23
Blocking
  • To improve the experiment, take account of known
    differences between units.
  • Before experiment, split units into blocks.
  • Like strata in sampling
  • Blocks: light levels in greenhouse (pot plants)
  • Randomise treatments within each block.
  • Same mix of varieties within each block

24
Example
  • Compare sizes of 4 varieties of cabbage
  • Experimental units: 24 positions in a plot
  • Treatments: 4 different varieties
  • Randomisation: randomly pick positions for 6
    cabbages of each variety in plot.

25
Cabbages (cont)
  • Plot data
  • Find the average cabbage weight for variety A, B,
    C and D
  • Assess variability of the plants and the plots by
    looking at the variability for variety A, B, C
    and D

26
Cabbages: Blocking
  • We may feel that there is a difference in
    fertility of the ground from left to right.
  • Make each column a block and have one of each
    variety per block.
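
A blocked layout of this kind can be generated as below. The sketch assumes the 24 positions form 6 columns of 4 positions each, which is inferred from "one of each variety per block" rather than stated on the slide; the function name is hypothetical.

```python
import random

def blocked_layout(varieties, n_blocks, rng=None):
    """Return one list per block, each holding every variety once
    in an independently randomised order."""
    if rng is None:
        rng = random.Random()
    layout = []
    for _ in range(n_blocks):
        order = list(varieties)
        rng.shuffle(order)  # randomise treatments within this block only
        layout.append(order)
    return layout

# 6 columns (blocks) running left to right, 4 positions in each.
columns = blocked_layout(["A", "B", "C", "D"], n_blocks=6)
for row in zip(*columns):  # print the plot row by row
    print(" ".join(row))
```

Every variety appears once in every column, so a left-to-right fertility trend affects all four varieties equally.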

27
Reasons for Blocking
  • Allows for naturally occurring differences
    between the experimental units.
  • Units are more similar within blocks
  • The treatment means are each averaged over the
    same blocks.
  • Treatment effects can be compared without the
    effects of the different blocks.
  • Differences between the treatments can be found
    within each block.