Title: Lecture 2: Statistical terms
1. Lecture 2: Statistical terms
- Main topics (basic knowledge):
- Sampling
- Data and variables
- Statistical errors
- Data accuracy
- Significant numbers
- Exploratory data analysis
2. Lecture 2: Statistics-related terms
- Population - collection of all possible objects or observations of a specific characteristic of interest
- Sample - a portion of the population under study, or a subset of elements, or a representative portion of the population
- Sampling or random sampling
- Measurement of the whole population is difficult, costly and often impossible
- You don't need to eat the whole buffalo to test its meat!
- Sample size: 1, 3, 5, 10, 25, 50?
- 20 households, 30, 50, 100, 300?
- No hard and fast rule
- The bigger the sample, the better the accuracy/reliability, but it increases costs
- Therefore, there is a trade-off between cost and accuracy
3. Sampling method/type
- The sampling method is important, especially when the population is too large or too variable (a short sketch follows this list)
- Random sampling - single-stage sampling from a single group
- Systematic sampling - e.g. at a certain interval of time
- Stratified sampling - e.g. representative sampling from all the strata or groups
- Cluster sampling - select certain groups first (e.g. select only 10 universities out of all, then select students from these to study)
- Multi-stage sampling - sample from the samples, e.g. 3-stage sampling
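As a quick illustration of the first three methods, here is a minimal Python sketch; the population of 100 numbered units, the sample sizes and the two strata are made-up values for demonstration only.

```python
import random

population = list(range(1, 101))       # illustrative population of 100 unit IDs

# Simple random sampling: every unit has an equal chance of selection
simple_random = random.sample(population, 10)

# Systematic sampling: every k-th unit after a random start
k = len(population) // 10              # sampling interval
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: draw separately from each stratum (group)
strata = {"stratum_1": population[:50], "stratum_2": population[50:]}
stratified = [unit for group in strata.values() for unit in random.sample(group, 5)]

print(simple_random)
print(systematic)
print(stratified)
```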
4. Data and Variables
- You should be clear about which data, information or parameters to collect/measure!
- Data vs. information
- Numbers vs. interpretation (raw vs. organized)
- Data are numerical facts about a variable
- Usually a quantitative study
- All qualitative observations have to be transformed into numerical facts before analysis
5. Data and Variables
- Variables: properties with respect to which individuals differ in some ascertainable way.
- 1. Measurement variables: can be expressed in a numerically ordered fashion. Types:
  - Continuous: infinite points between two points
  - Discrete/meristic: no intermediate values
- 2. Attribute or rank variables: nominal, qualitative or categorical variables that are arbitrarily given numbers to represent the group and make (statistical) analysis easier, e.g. black and white; very poor, poor, rich, very rich; and 1 for the best, 2 for good, 3 for fair and so on.
6. Data and Variables
- 3. Derived variables: computed using 2 or more measurable variables (a small sketch follows). Examples:
  - Crop production per hectare (t/ha)
  - Milk production (per cow/day)
  - Daily weight gain (g/animal/day)
  - Net fish yield (g of fish/m2)
  - Specific growth rate
  - Feed conversion ratio (FCR)
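The slide does not give the formulas, but as a hedged illustration of how derived variables are computed from two or more measured ones, the sketch below uses made-up measurements and the usual definitions of yield per hectare and feed conversion ratio (feed given divided by weight gained).

```python
# Hypothetical raw measurements (illustrative values only)
total_production_t = 12.5      # total crop production (t)
area_ha = 2.5                  # cultivated area (ha)

feed_given_kg = 180.0          # total feed fed to the fish (kg)
weight_gain_kg = 120.0         # total fish weight gain (kg)

# Derived variables computed from two or more measured variables
yield_t_per_ha = total_production_t / area_ha    # crop production per hectare (t/ha)
fcr = feed_given_kg / weight_gain_kg             # feed conversion ratio

print(f"Yield: {yield_t_per_ha:.1f} t/ha, FCR: {fcr:.2f}")
```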
7. Data accuracy and precision
- Accuracy: nearness of a measurement to the actual value of the variable, e.g. a fish weight of 8 g or 8.3 g depending on the accuracy of the measuring instrument
- Precision: closeness to each other of repeated measurements of the same quantity
- We strive for both accuracy and precision
8. Variation and inaccuracy are caused by errors
- Gross errors
  - Incomplete data, missing data, missing important persons/times (e.g. DO at 6 am), malfunction of the instruments, recording errors, human errors, typing/keying errors, contaminated reagents, etc.
  - Missing data, data manipulation, etc.
  - Neither accurate nor precise - avoid these errors
- Systematic errors: re-occur upon repeated measurements
  - Bias, rounding off, faulty calibration, etc.
  - May be precise but far from accurate
  - Possible to separate or revise/recalculate
- Note: avoid or at least minimize these first two types of error!
- Random or residual errors (unsystematic): vary unpredictably
  - It is impossible to completely wipe out these errors
  - The remaining error is the experimental error
  - Treatments have to have larger effects than the random error to be significant
9. The model depends on the experimental design
- X_ij = μ + T1 + T2 + (T1 × T2) + E
- X_ij - value of an experimental unit
- μ - population mean
- T1 - treatment 1 effect
- T2 - treatment 2 effect
- T1 × T2 - interaction of treatments 1 and 2
- E - experimental, residual or random error
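To make the additive model concrete, here is a minimal simulation sketch; the factor names, effect sizes and error standard deviation are invented for illustration and are not from the lecture.

```python
import random

# Hypothetical effect sizes (illustrative only, not from the lecture)
mu = 50.0                                            # overall population mean
t1 = {"control": 0.0, "fertilized": 5.0}             # treatment 1 effects
t2 = {"low_density": 0.0, "high_density": -2.0}      # treatment 2 effects
interaction = {("fertilized", "high_density"): 1.5}  # T1 x T2 interaction

def simulate_unit(a, b, sigma=2.0):
    """One experimental unit: mean + T1 + T2 + interaction + random error."""
    e = random.gauss(0.0, sigma)                     # experimental (residual) error
    return mu + t1[a] + t2[b] + interaction.get((a, b), 0.0) + e

for a in t1:
    for b in t2:
        print(a, b, round(simulate_unit(a, b), 2))
```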
10. Error separation and minimization
- To avoid gross and systematic errors: plan properly, use proper sampling, keep control of the trial or the research project, and avoid re-keying of data (copy from the originally entered data)
- To minimize other errors:
- 1. Increase treatments or replication
  - Increase the number of treatments, e.g. treatment levels
  - Minimum replication 2? Increase replication
  - Normally 4-8 replicates are required in agriculture (ref. Little and Hills, 1978)
  - Consider facility, management and other costs
  - Be clear about treatment vs. experimental unit
  - Treatments vs. levels: with N or without N are the treatments (Trts); its rates (20, 40, ... kg/ha) are the levels
  - Experimental unit: fish or tank?
11. Error separation and minimization
- Replication: experimental error can be measured only if there are at least two units treated the same way. Replication is repetition of the same event: if you see the same thing happening again and again, you are more sure that the event happens under such conditions.
- Types: temporal (time) or spatial
- Be careful about pseudo-replication! Replicate samples or sub-samples: measuring individual fish in a tank is not replication if the tank is the experimental unit, but an individual fish could be the experimental unit if you are injecting a hormone individually and evaluating the hormone's efficacy.
12. Error separation and minimization
- Replication
- Replication can be different for different treatments, but equal replication decreases the standard error or variance and increases the precision (for example, the standard error of a treatment mean, √(s²/r), shrinks as the replication r grows).
13. Error separation and minimization
- Number of replications (experimental research): can be calculated if we know the expected variance and the minimum substantial difference between 2 means.
- Get these from a preliminary sampling/trial, from similar trials done in the past, or from your own judgment:
- t = (x̄₁ − x̄₂) / √(2σ² / r)
14. Error separation and minimization
- Number of replications (a worked sketch follows):
- r = 2σ² t² / d²
- d = difference between two means (x̄₁ − x̄₂)
- t = 1.96 if α is 0.05 (significance level)
- Statistical power (small, medium, large: 0.1 - 0.5)
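A small worked sketch of this formula, using assumed values for the expected standard deviation and the minimum difference of interest (not taken from the lecture):

```python
import math

# Illustrative values (not from the lecture): expected SD of the response and
# the smallest difference between two treatment means worth detecting.
sigma = 4.0      # expected standard deviation
d = 5.0          # minimum substantial difference between 2 means
t = 1.96         # approximate t/z value at alpha = 0.05

r = 2 * sigma**2 * t**2 / d**2                     # r = 2*sigma^2 * t^2 / d^2
print(math.ceil(r), "replicates per treatment")    # round up to a whole number
```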
15. Error separation and minimization
- Number of samples (survey research):
- n = N / (1 + N e²)
- n = sample size
- N = total population (e.g. households)
- e = significance level or the precision (e.g. 10%)
- Example: if you know a village has 1,000 households, then assuming e = 10% you can find the number of sample households
- Size of the sample: n = 1000 / (1 + 1000 × 0.1²) = 90.9
- ≈ 91 households to be sampled
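The household example can be reproduced with a few lines of Python; the formula n = N / (1 + N·e²) is the one on the slide (commonly attributed to Yamane).

```python
import math

def sample_size(N, e):
    """Sample-size formula from the slide: n = N / (1 + N * e^2)."""
    return N / (1 + N * e**2)

# Example from the slide: 1,000 households, 10% precision
n = sample_size(1000, 0.10)
print(round(n, 1), "->", math.ceil(n), "households to be sampled")  # 90.9 -> 91
```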
16. Error separation and minimization
- In a field trial there is the possibility of not having enough replication, which may give non-significant results; therefore, a power analysis is needed to see whether a non-significant difference was due to inadequate replication or to the real treatment effects
- Statistical power (small, medium, large: 0.1 - 0.5) - to learn later
17. Error separation and minimization
- 2. Refine experimental conditions and procedures
  - Pre-set up and run
  - Pre-test (instruments, systems, questionnaires, etc.)
- 3. Use uniform materials and methods
  - Use uniform materials, e.g. fish or chickens of the same size and age
  - Use the same methods and instruments throughout the experimental period
18. Error separation and minimization
- Ways to minimize or separate experimental/random/residual errors (a layout sketch follows this list):
- 1. Randomization: provides an equal chance; it is the cornerstone of statistical theory in the design of experiments
  - Lottery
  - Random numbers, e.g. the Excel function =RAND()*1000
- 2. Pairing: grouping in twos, e.g. using same-age animals for a trial
- 3. Blocking: to separate effects already existing in the system, e.g. canal, shade, different ponds/plots, districts, community, etc.
  - Space: plots
  - Time: years, months, weeks, etc.
  - Other conditions
- 4. Data analysis, e.g. covariance analysis (to learn later)
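As a rough sketch of randomization combined with blocking, the code below shuffles three hypothetical treatments within each of four hypothetical blocks (a randomized-complete-block style layout); all names and the layout itself are illustrative, not taken from the lecture.

```python
import random

# Hypothetical layout: 3 treatments, 4 blocks (e.g. ponds), one unit per treatment per block
treatments = ["T1", "T2", "T3"]
blocks = ["pond_A", "pond_B", "pond_C", "pond_D"]

layout = {}
for block in blocks:
    order = treatments[:]          # copy the treatment list
    random.shuffle(order)          # randomization: equal chance for every position
    layout[block] = order

for block, order in layout.items():
    print(block, order)
```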
19. Data measurement/collection
- 1. Units and levels of measurement should be appropriate, for example: mm, cm, m or km? µg, mg, g, kg, quintal or ton?
- 2. There must be enough room for variation in the data so that statistics can detect the differences. Normally, the difference between minimum and maximum values should span 30-300 steps or intermediate levels. For example, if you expect fish weights between 5 g and 10 g in a trial, there are only 5 steps between 5 and 10, but between 5.0 g and 10.0 g there are 51 steps, while between 5.00 g and 10.00 g there are 501 steps. Therefore, measuring up to one decimal place is enough.
20. Significant numbers
- Rounding off: if the part to drop is > 0.5, increase the preceding digit by one step (rounding up); if it is < 0.5, ignore the figures after the decimal (rounding down). If it is exactly 0.5, then increase the preceding digit by one step if it is odd and keep it as it is if it is even.
- Computer programs do this for you - enter the original numbers collected from the field (they round up only if the second digit is exactly 0.5)
- Calculated or derived values, e.g. mean, SD, SE, etc., can have one decimal digit more than the measured data
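For what it's worth, Python's built-in round() follows the same "round half to even" convention described above, so it can be used to check hand-rounded values:

```python
# An exact .5 goes to the nearest even digit, as on the slide
print(round(2.5))   # 2  (preceding digit even -> kept)
print(round(3.5))   # 4  (preceding digit odd  -> increased)
print(round(4.5))   # 4

# Ordinary rounding when the dropped part is clearly below or above a half
print(round(8.34, 1), round(8.36, 1))   # 8.3 8.4
```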
21. Significant numbers
Consider significant numbers
See examples from AIT Thesis!
22. Significant numbers
- Calculations: rounding the numbers
- 5200 + 85.7 = 5285.7 (✗ wrong)
- 5200 + 85.7 = 5300 (✓ correct)
- 5200.0 + 85.7 = 5285.7 (✓ correct)
- 5.15 × 3.1216 × 150 × 561.617 = 1,354,303.452 (✗)
- 5.15 × 3.1216 × 150 × 561.617 ≈ 1,354,300 (✓)
- Rule: the answer cannot be more accurate than the least accurate figure
23. Exploratory Data Analysis (EDA)
- Watch for sources of error in a data set before analyzing the data, e.g.:
- Obvious mistakes: double-check the original data, ask a friend to check (you may not see your own mistakes)
- Precision of recording
- Recorder/instrument differences
- Trends, e.g. over time, increases, decreases
- Treatment responses
- Extreme values: compare with other similar work/literature
24. Exploratory Data Analysis
- Compare your data with the related recorded information published in:
- Books and proceedings
- Journals
- Magazines
- Newspapers
- Theses, reports, raw data, etc.
- Unexpected events/data may be observed; do not throw them away. If the results are unexpected, try to find the cause, e.g. a fish pond recorded as 5 m deep?
25. Tools of Exploratory Data Analysis
- Pictures and diagrams: a single good picture can describe something better than a thousand words.
- Tables: a table can accommodate a large amount of information and shows the exact figures/numbers, e.g. frequency and its distribution, cumulative frequency, sum, mean, maximum, minimum, etc.
- Graphs: a graph shows trends and highlights certain findings (a short sketch follows this list)
  - Scatter plots
  - Bar charts and pie charts
  - Line graphs
  - Frequency distribution polygons or histograms
- Notes and explanations: sometimes very important
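A minimal pandas/matplotlib sketch of these tools, using a small invented fish-weight data set (the column names and values are illustrative only):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical fish-weight data from two treatments (illustrative values only)
df = pd.DataFrame({
    "treatment": ["control"] * 5 + ["fed_extra"] * 5,
    "weight_g":  [7.2, 6.9, 7.5, 7.1, 6.8, 8.1, 8.4, 7.9, 8.3, 8.0],
})

# Table-style summary: exact figures such as count, mean, min, max per treatment
print(df.groupby("treatment")["weight_g"].describe())

# Graphs: a histogram for the distribution and a boxplot to compare treatments
df["weight_g"].plot(kind="hist", title="Fish weight (g)")
plt.show()
df.boxplot(column="weight_g", by="treatment")
plt.show()
```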
26. Exploratory Data Analysis
- Basic assumptions of experimental designs:
- Effects of treatments, blocks and errors are additive
- Observations are normally distributed
- Experimental errors are independent
- Variances are homogeneous
- It is necessary to check the collected sample data against these assumptions before starting the analysis (a minimal check is sketched below).
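Formal tests are covered later in the course, but as a hedged preview, SciPy provides standard checks for two of these assumptions; the two small samples below are invented for illustration.

```python
from scipy import stats

# Hypothetical samples from two treatments (illustrative values only)
control   = [7.2, 6.9, 7.5, 7.1, 6.8]
fed_extra = [8.1, 8.4, 7.9, 8.3, 8.0]

# Normality: Shapiro-Wilk test on each group (p > 0.05 -> no evidence against normality)
print(stats.shapiro(control))
print(stats.shapiro(fed_extra))

# Homogeneity of variances: Levene's test across the groups
print(stats.levene(control, fed_extra))
```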
27. Additivity example
28. Normal distribution
- Approximately 68%, 95% and 99% of observations fall within 1, 2 and 3 standard deviations (σ) of the mean, respectively.
29. Test of homogeneity/heterogeneity - later!
30. Practical Session 2
- Exploratory data analysis (EDA)
- Afternoon session, 14.30 hrs, AFE Computer Lab