Lecture 2: Statistical terms

Provided by: Bhu9

1
Lecture 2: Statistical terms
  • Main topics: basic knowledge
  • Sampling
  • Data and variables
  • Statistical errors
  • Data accuracy
  • Significant numbers
  • Exploratory data analysis

2
Lecture 2: Statistics-related terms
  • Population - Collection of all possible objects
    or observations of a specific characteristic of
    interest
  • Sample - A portion of the population under study:
    a subset of elements, ideally a representative
    portion of the population
  • Sampling or random sampling
  • Measurement of the whole population is difficult,
    costly and often impossible
  • You don't need to eat the whole buffalo to test
    its meat!
  • Sample size: 1, 3, 5, 10, 25, 50?
  • 20 households, 30, 50, 100, 300?
  • No hard and fast rule
  • The bigger the sample, the better the
    accuracy/reliability, but the higher the cost
  • Therefore, there is a trade-off between cost and
    accuracy

3
  • Sampling method/type
  • Important especially when the population is too
    large or too variable
  • Random sampling: single-stage sampling from a
    single group
  • Systematic sampling: e.g. at a certain interval
    of time
  • Stratified sampling: e.g. representative
    sampling from all the strata or groups
  • Cluster sampling: select certain groups first
    (e.g. select only 10 universities out of all, then
    select students from these to study)
  • Multi-stage sampling: sample from the samples

3-stage sampling
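
The sampling methods listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the fixed seeds and the household/university data are hypothetical and do not come from the lecture.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Random sampling: every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, interval, start=0):
    """Systematic sampling: take every `interval`-th unit from `start`."""
    return population[start::interval]


def stratified_sample(strata, n_per_stratum, seed=0):
    """Stratified sampling: draw a random sample from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Cluster/multi-stage sampling: pick clusters first, then units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster)
            for name in chosen}


# Hypothetical data: 100 numbered households, 3 universities of 50 students.
households = list(range(1, 101))
universities = {f"U{i}": list(range(50)) for i in range(1, 4)}
```

For example, `systematic_sample(households, 10)` picks every tenth household, while `two_stage_sample(universities, 2, 5)` first selects two universities and then five students from each.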
4
Data and Variables
  • You should be clear about which data or
    information, or parameters to collect/measure!
  • Data vs. information
  • Numbers vs. interpretation (raw vs. organized)
  • Numerical fact of a variable
  • Usually quantitative study
  • All qualitative observations have to be
    transformed to numerical facts before analyses

5
Data and Variables
  • Variables: properties with respect to which
    individuals differ in some ascertainable way.
  • 1. Measurement variables: those which can be
    expressed in a numerically ordered fashion.
  • Types:
  • Continuous: infinitely many possible values
    between any two points
  • Discrete/meristic: no intermediate values
  • 2. Attributes or rank variables: nominal,
    qualitative or categorical variables, which are
    arbitrarily assigned numbers to represent the
    groups and make statistical analysis easier,
    e.g. black/white; very poor, poor, rich, very
    rich; or 1 for the best, 2 for good, 3 for fair,
    and so on.

6
Data and Variables
3. Derived variables: computed using 2 or more
measurable variables. Examples:
  • Crop production per hectare (t/ha)
  • Milk production (per cow/day)
  • Daily weight gain (g/animal/day)
  • Net fish yield (g of fish/m2)
  • Specific growth rate
  • Feed conversion ratio (FCR)
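
As an illustration of derived variables, the sketch below computes three of the listed quantities from measured values. The formulas for specific growth rate and FCR did not survive in this transcript, so the commonly used definitions are assumed here (SGR as percent per day from log weights, FCR as feed given divided by weight gain). The function names are hypothetical.

```python
import math


def daily_weight_gain(initial_g, final_g, days):
    """Daily weight gain in g/animal/day."""
    return (final_g - initial_g) / days


def specific_growth_rate(initial_g, final_g, days):
    """Assumed definition: 100 * (ln(final) - ln(initial)) / days, in %/day."""
    return 100 * (math.log(final_g) - math.log(initial_g)) / days


def feed_conversion_ratio(feed_given_g, weight_gain_g):
    """Assumed definition: feed given per unit of weight gained."""
    return feed_given_g / weight_gain_g
```

Each function combines two or more measured variables (weights, time, feed), which is what makes the result a derived variable.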
7
Data accuracy and precision
  • Accuracy: nearness of a measurement to the actual
    value of the variable, e.g. a fish weight recorded
    as 8 g or 8.3 g depending on the accuracy of the
    measuring instrument
  • Precision: closeness to each other of repeated
    measurements of the same quantity

We strive for both accuracy and precision
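
One way to put numbers on these two ideas, given repeated measurements of an object whose true value is known, is sketched below. Using the mean error (bias) for accuracy and the standard deviation for precision is an assumption made for illustration; the function name and the example weights are hypothetical.

```python
from statistics import mean, stdev


def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   : mean measurement minus the true value (small = accurate)
    spread : sample standard deviation (small = precise)
    """
    bias = mean(measurements) - true_value
    spread = stdev(measurements)
    return bias, spread


# A balance that reads consistently, but too high: precise, not accurate.
bias, spread = accuracy_and_precision([8.9, 9.0, 9.1], true_value=8.3)
```

Here the spread is small while the bias is large, the pattern typical of a systematic error such as faulty calibration.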
8
Variation and inaccuracy are caused by errors
  • Gross errors
  • Incomplete data, missing data, missing important
    persons/time (e.g. DO at 6 am), malfunction of
    the instruments, recording errors, human errors,
    typing/keying, contaminated reagents, etc.
  • Missing data, data manipulation, etc.
  • Neither accurate nor precise: avoid these errors
  • Systematic errors: recur upon repeated
    measurements
  • Bias, rounding off, faulty calibration, etc.
  • May be precise but not accurate
  • Possible to separate, or to revise/recalculate
  • Note: avoid, or at least minimize, these first
    two types of error!
  • Random or residual errors (unsystematic): vary
    unpredictably
  • It is impossible to completely wipe out errors
  • The remaining error is experimental error
  • Treatment effects have to be larger than the
    random error to be significant

9
The model depends on the experimental design
  • $X_{ij} = \mu + T_1 + T_2 + (T_1 T_2) + E$
  • $X_{ij}$ - Value of an experimental unit
  • $\mu$ - Population mean
  • $T_1$ - Treatment 1 effect
  • $T_2$ - Treatment 2 effect
  • $T_1 T_2$ - Interaction of treatments 1 and 2
  • $E$ - Experimental or residual error or random error
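
To see how the terms of this two-treatment model add up, the following sketch generates the value of one experimental unit. Drawing the error from a normal distribution with mean 0 is an assumption for illustration, and the function name and numbers are hypothetical.

```python
import random


def simulate_value(mu, t1, t2, interaction, error_sd, rng):
    """Value of one unit: population mean + treatment effects
    + interaction + random error (assumed normal, mean 0)."""
    error = rng.gauss(0.0, error_sd)
    return mu + t1 + t2 + interaction + error


rng = random.Random(1)
# Four replicate units that received the same treatment combination:
# they differ only through the random error term.
replicates = [simulate_value(10.0, 2.0, 3.0, 0.5, 1.0, rng) for _ in range(4)]
```

With `error_sd=0.0` every replicate equals the sum of the systematic terms, which shows that all variation among identically treated units is attributed to E.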

10
Error separation and minimization
  • To avoid gross and systematic errors: plan
    properly, use proper sampling, keep control of
    the trial or research project, and avoid
    re-keying data (try to copy from the original
    data entered)
  • To minimize other errors:
  • 1. Increase treatments or replication
  • Increase the no. of treatments, e.g. treatment
    levels
  • Minimum replication: 2; increase replication
  • Normally 4-8 replicates are required in
    agriculture (ref. Little and Hills, 1978)
  • Consider facility, management and other costs
  • Be clear about treatment and experimental unit
  • Treatment vs. treatment levels: with N or without
    N are the treatments (Trts);
  • its rates (20, 40 .. kg/ha) are the levels
  • Experimental unit: fish or tank?

11
Error separation and minimization
  • Replication: repetition of the same event.
    Experimental error can be measured only if at
    least two units are treated the same way. If you
    see the same thing happening again and again, you
    can be more confident that the event occurs under
    those conditions.
  • Type
  • Temporal or spatial
  • Time
  • Be careful about pseudo-replication! Taking
    replicate samples or sub-samples, such as
    measuring individual fish in a tank, is not
    replication if the tank is the experimental unit.
    An individual fish can be the experimental unit,
    however, if each fish is injected with hormone
    individually and the hormone's efficacy is
    evaluated per fish.
12
Error separation and minimization
  • Replication
  • Replication can differ between treatments, but
    equal replication decreases the standard error
    (variance) and increases precision

13
Error separation and minimization
No. of replication (experimental research)
  • Can be calculated if we know the expected variance
    and the minimum substantial difference between 2
    means
  • These come from preliminary sampling/trial, from
    similar trials done in the past, and also from
    your judgment
  • $t = (x_1 - \mu) / \sqrt{2\sigma^2 / r}$
14
Error separation and minimization
  • No. of replication
  • $r = 2\sigma^2 t^2 / d^2$
  • $d$ = difference between two means, $(x_1 - \mu)$
  • $t$ = 1.96 (in units of SE), if $\alpha$ is 0.05
    (significance level)
  • Statistical power (small, medium, large: 0.1 -
    0.5)
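
The replication formula can be evaluated directly. The sketch below assumes that the garbled symbols stand for the standard deviation (sigma) and that t is the critical value 1.96 at the 0.05 level. Rounding up to a whole number of replicates is an added convention, and the function name and inputs are hypothetical.

```python
import math


def replications_needed(sigma, d, t=1.96):
    """r = 2 * sigma^2 * t^2 / d^2, rounded up to a whole replicate.

    sigma : expected standard deviation (e.g. from a preliminary trial)
    d     : smallest difference between two means worth detecting
    t     : critical value (assumed 1.96 for a 0.05 significance level)
    """
    return math.ceil(2 * sigma**2 * t**2 / d**2)


# Expected SD of 2 g, and a 3 g difference should be detectable.
r = replications_needed(sigma=2.0, d=3.0)
```

Because d is squared in the denominator, halving the difference to be detected roughly quadruples the replication required.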

15
Error separation and minimization
  • No. of replication (survey research)
  • $n = N / (1 + N e^2)$
  • $n$ = sample size
  • $N$ = total population (e.g. households)
  • $e$ = significance level or the precision (10%)
  • Example: if you know a village has 1,000
    households, assuming $e$ = 10% you can find the
    number of sample households
  • Size of the sample: $n = 1000 / (1 + 1000 \times 0.1^2) = 90.9$
  • 91 households to be sampled
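
The village example can be reproduced with a short function. This is a sketch of the formula as reconstructed from the slide, with e given as a proportion (0.1 for 10%). The function name is hypothetical, and rounding up follows the slide's step from 90.9 to 91.

```python
import math


def survey_sample_size(population, e):
    """n = N / (1 + N * e^2), rounded up to whole units."""
    return math.ceil(population / (1 + population * e**2))


households_to_sample = survey_sample_size(1000, 0.10)
```

Tightening the precision to 5% for the same village would raise the required sample to 286 households, which illustrates the cost side of the trade-off.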

16
Error separation and minimization
  • In field trials there may not be enough
    replication, which can give non-significant
    results. A power analysis is therefore needed to
    see whether a non-significant difference was due
    to inadequate replication or reflects the real
    treatment effects
  • Statistical power (small, medium, large: 0.1 -
    0.5): to learn later

17
Error separation and minimization
  • 2. Refine experimental conditions and procedures
  • Pre-set up and run
  • Pre-test (instruments, systems, questionnaires
    etc.)
  • 3. Use uniform materials and methods
  • Use uniform materials, e.g. fish, chickens, etc.
    of the same size and age
  • Use the same methods and instruments throughout
    the experimental period

18
Error separation and minimization
  • Ways to minimize or separate
    experimental/random/residual errors
  • 1. Randomization: provides an equal chance. It
    is the cornerstone of the statistical theory in
    the design of experiments
  • lottery
  • random numbers: Excel function RAND()*1000
  • 2. Pairing: grouping in twos, e.g. using animals
    of the same age for the trial
  • 3. Blocking: to separate effects that already
    exist in the system, e.g. canal, shade, different
    ponds/plots, districts, community, etc.
  • space: plots
  • time: years, months, weeks, etc.
  • Other conditions
  • 4. Data analysis: e.g. covariance analysis (to
    learn later)
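
Randomization can also be done with a few lines of code instead of a lottery or spreadsheet random numbers. The sketch below shuffles a completely randomized layout so that every experimental unit has an equal chance of receiving each treatment. The function name, treatment labels and seed are hypothetical, and blocking is not handled.

```python
import random


def randomize_layout(treatments, replicates, seed=None):
    """Return a shuffled list assigning one treatment to each unit."""
    rng = random.Random(seed)
    layout = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(layout)  # every ordering is equally likely
    return layout


# 3 treatments x 4 replicates = 12 tanks; position i is tank i + 1.
layout = randomize_layout(["T1", "T2", "T3"], replicates=4, seed=42)
```

Fixing the seed makes the layout reproducible for the trial records; leaving it as `None` gives a fresh randomization on each run.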

19
Data measurement/collection
1. Units and levels of measurement should be
appropriate, for example:
  • mm, cm, m, km
  • µg, mg, g, kg, quintal or ton?
2. There must be enough room for variation in the
data so that statistics can detect the differences.
Normally, the range between the minimum and
maximum values should span 30-300 steps
(intermediate levels). For example, if you expect
fish weights between 5 g and 10 g in a trial, there
are only 5 steps between 5 and 10, but 50 steps
between 5.0 g and 10.0 g and 500 steps between
5.00 g and 10.00 g. Therefore, measuring to one
decimal place is enough.
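
The step-counting argument can be checked numerically. This sketch counts the steps between the expected minimum and maximum for a given number of decimal places and picks the smallest number of decimals that reaches the 30-step lower bound. The function names are hypothetical.

```python
def measurement_steps(lowest, highest, decimals):
    """Number of steps between lowest and highest at this resolution."""
    return round((highest - lowest) * 10**decimals)


def decimals_needed(lowest, highest, min_steps=30, max_decimals=6):
    """Smallest number of decimal places giving at least min_steps."""
    for decimals in range(max_decimals + 1):
        if measurement_steps(lowest, highest, decimals) >= min_steps:
            return decimals
    return None  # range too narrow even at max_decimals


fish_decimals = decimals_needed(5, 10)
```

For fish weights of 5 to 10 g this returns one decimal place, matching the conclusion of the example.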
20
Significant numbers
Rounding off: if the part to be dropped is > 0.5,
increase the preceding digit by one step (rounding
up); if it is < 0.5, ignore the number after the
decimal (rounding down). If it is exactly 0.5,
increase the preceding digit by one step if it is
odd, and keep it as it is if it is even.
- But computer programs do this for you: enter the
original numbers collected from the field (it
rounds up only if the second digit is exactly
0.5)
- Calculated or derived values, e.g. mean, SD, SE,
etc., can have one more decimal digit than the
measured data
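
The odd/even rule for an exact 0.5 is known as round-half-to-even. A minimal sketch using Python's `decimal` module follows; the function name is hypothetical, and values are converted through strings so that binary floating-point representation does not disturb the exact-half cases.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value, places=0):
    """Round to `places` decimals; exact halves go to the even digit."""
    quantum = Decimal(f"1e{-places}")  # e.g. places=1 -> 0.1
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)
```

With this function 2.5 rounds to 2 (the preceding digit is even, so it is kept) while 3.5 rounds to 4 (the preceding digit is odd, so it is increased), and 2.6 and 2.4 round up and down as usual.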
21
Significant numbers
Consider significant numbers
See examples from AIT Thesis!
22
Significant numbers
  • Calculations: rounding the numbers
  • 5200 + 85.7 = 5285.7 (✗ wrong)
  • 5200 + 85.7 = 5300 (✓ correct)
  • 5200.0 + 85.7 = 5285.7 (✓ correct)
  • 5.15 x 3.1216 x 150 x 561.617
  • = 1,354,303.452 (✗)
  • = 1,354,300 (✓)
  • Rule: the answer cannot be more accurate than
    the least accurate figure
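
The rounded results on this slide can be reproduced with a small helper that rounds a number to a chosen count of significant figures. This is a sketch: the function name is hypothetical, and the numbers of significant figures used in the examples are read off the slide's answers rather than stated in the lecture.

```python
from math import floor, log10


def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - floor(log10(abs(x))))


total = round_sig(5200 + 85.7, 2)                      # 5200 is known to the hundreds
product = round_sig(5.15 * 3.1216 * 150 * 561.617, 5)  # as reported on the slide
```

Here `total` comes out as 5300.0 and `product` as 1354300.0, matching the answers marked correct above.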

23
Exploratory Data Analysis (EDA)
  • Watch for sources of error in a set of data
    before analyzing it, e.g.
  • obvious mistakes: double-check the original data,
    ask a friend to check (you may not see your own
    mistakes)
  • precision of recording
  • recorder/instrument differences
  • trends: e.g. time, increase, decrease
  • treatment responses
  • extreme values: compare with other similar
    work/literature

24
Exploratory Data Analysis
  • Compare your data with the related recorded
    information published in
  • books and proceedings
  • journals,
  • magazines
  • newspapers,
  • thesis, reports and raw data etc.
  • Unexpected events/data may be observed: do not
    throw results away because they are unexpected;
    try to find the cause, e.g. was a fish pond depth
    of 5 m recorded?

25
Tools of Exploratory Data Analysis
  • Pictures and diagrams: a single good picture can
    describe something better than a thousand words.
  • Tables: a table can hold a large amount of
    information and shows the exact figures/numbers,
    e.g. frequency and its distribution, cumulative
    frequency, sum, mean, maximum, minimum, etc.
  • Graphs: a graph shows trends and highlights
    certain findings
  • Scatter plots
  • Bar charts and pie charts
  • Line graphs
  • Frequency distribution polygons or histograms
  • Notes and explanations: sometimes very important
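
A first tabular look at a data set, of the kind described above, might be sketched as follows. The function names, the pond-depth values and the plausible range used to flag extremes are hypothetical choices for illustration.

```python
from collections import Counter
from statistics import mean


def summarize(values):
    """Basic table entries: count, minimum, maximum, mean, frequencies."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "frequency": Counter(values),
    }


def flag_extremes(values, low, high):
    """Values outside a plausible range, to be checked, not discarded."""
    return [v for v in values if v < low or v > high]


pond_depths_m = [1.0, 1.2, 1.1, 5.0, 1.2]
summary = summarize(pond_depths_m)
suspect = flag_extremes(pond_depths_m, low=0.5, high=2.0)
```

The flagged 5.0 m reading is then traced back to the original record before deciding whether it is a keying error or a real observation.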

26
Exploratory Data Analysis
  • Basic assumptions of experimental designs
  • Effects of treatments, block and errors are
    additive
  • Observations are normally distributed
  • Experimental errors are independent
  • Variances are homogeneous
  • It is necessary to check the collected sample
    data against these assumptions before starting
    the analysis.

27
Additivity example
28
Normal distribution
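(Figure: normal curve; about 68%, 95% and 99% of observations lie within 1, 2 and 3 standard deviations of the mean)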
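
The percentages for the normal curve can be verified from the error function: for a normal distribution, the share of observations within k standard deviations of the mean is erf(k / sqrt(2)). The function name below is hypothetical.

```python
from math import erf, sqrt


def share_within(k):
    """Proportion of a normal distribution within k SD of the mean."""
    return erf(k / sqrt(2))


shares = [round(100 * share_within(k), 1) for k in (1, 2, 3)]
```

The list holds the percentages for 1, 2 and 3 standard deviations, which round to the familiar 68, 95 and 99.7.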
29
Test of homogeneity/heterogeneity: later!
30
Practical Session 2
Exploratory data analysis (EDA)
Afternoon session, 14.30 hrs
AFE Computer Lab