Title: Lecture 2: Statistical terms
1. Lecture 2: Statistical terms
- Main topics (basic knowledge):
- Sampling
- Data and variables
- Statistical errors
- Data accuracy
- Significant numbers
- Exploratory data analysis
2. Lecture 2: Statistics-related terms
- Population - collection of all possible objects or observations of a specific characteristic of interest
- Sample - a portion of the population under study, or a subset of elements, or a representative portion of the population
- Sampling or random sampling
- Measurement of the whole population is difficult, costly and often impossible
- You don't need to eat the whole buffalo to test its meat!
- Sample size: 1, 3, 5, 10, 25, 50?
- 20 households, 30, 50, 100, 300?
- No hard and fast rule
- The bigger the sample, the better the accuracy/reliability, but it increases costs
- Therefore, there is a trade-off between cost and accuracy
3. Sampling method/type
- The sampling method is important, especially when the population is too large or too variable (a short sketch follows this list)
- Random sampling - single-stage sampling from a single group
- Systematic sampling - e.g. at a certain interval of time
- Stratified sampling - e.g. representative sampling from all the strata or groups
- Cluster sampling - select certain groups first (e.g. select only 10 universities out of all, then select students from these to study)
- Multi-stage sampling - sample from the samples, e.g. 3-stage sampling
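As a quick illustration of the first three methods, here is a minimal Python sketch; the population of 100 numbered units, the sample sizes and the two strata are made-up values for demonstration only.

```python
import random

population = list(range(1, 101))       # illustrative population of 100 unit IDs

# Simple random sampling: every unit has an equal chance of selection
simple_random = random.sample(population, 10)

# Systematic sampling: every k-th unit after a random start
k = len(population) // 10              # sampling interval
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: draw separately from each stratum (group)
strata = {"stratum_1": population[:50], "stratum_2": population[50:]}
stratified = [unit for group in strata.values() for unit in random.sample(group, 5)]

print(simple_random)
print(systematic)
print(stratified)
```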
4. Data and Variables
- You should be clear about which data, information or parameters to collect/measure!
- Data vs. information
- Numbers vs. interpretation (raw vs. organized)
- Data are numerical facts about a variable
- Usually a quantitative study
- All qualitative observations have to be transformed into numerical facts before analysis
5. Data and Variables
- Variables: properties with respect to which individuals differ in some ascertainable way.
- 1. Measurement variables: can be expressed in a numerically ordered fashion. Types:
  - Continuous: infinite points between two points
  - Discrete/meristic: no intermediate values
- 2. Attribute or rank variables: nominal, qualitative or categorical variables that are arbitrarily given numbers to represent the group and make (statistical) analysis easier, e.g. black and white; very poor, poor, rich, very rich; and 1 for the best, 2 for good, 3 for fair and so on.
6. Data and Variables
- 3. Derived variables: computed using 2 or more measurable variables (a small sketch follows). Examples:
  - Crop production per hectare (t/ha)
  - Milk production (per cow/day)
  - Daily weight gain (g/animal/day)
  - Net fish yield (g of fish/m2)
  - Specific growth rate
  - Feed conversion ratio (FCR)
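The slide does not give the formulas, but as a hedged illustration of how derived variables are computed from two or more measured ones, the sketch below uses made-up measurements and the usual definitions of yield per hectare and feed conversion ratio (feed given divided by weight gained).

```python
# Hypothetical raw measurements (illustrative values only)
total_production_t = 12.5      # total crop production (t)
area_ha = 2.5                  # cultivated area (ha)

feed_given_kg = 180.0          # total feed fed to the fish (kg)
weight_gain_kg = 120.0         # total fish weight gain (kg)

# Derived variables computed from two or more measured variables
yield_t_per_ha = total_production_t / area_ha    # crop production per hectare (t/ha)
fcr = feed_given_kg / weight_gain_kg             # feed conversion ratio

print(f"Yield: {yield_t_per_ha:.1f} t/ha, FCR: {fcr:.2f}")
```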
7. Data accuracy and precision
- Accuracy: nearness of a measurement to the actual value of the variable, e.g. a fish weight of 8 g or 8.3 g depending on the accuracy of the measuring instrument
- Precision: closeness to each other of repeated measurements of the same quantity
- We strive for both accuracy and precision
8. Variation and inaccuracy are caused by errors
- Gross errors
  - Incomplete data, missing data, missing important persons/times (e.g. DO at 6 am), malfunction of the instruments, recording errors, human errors, typing/keying errors, contaminated reagents, etc.
  - Missing data, data manipulation, etc.
  - Neither accurate nor precise - avoid these errors
- Systematic errors: re-occur upon repeated measurements
  - Bias, rounding off, faulty calibration, etc.
  - May be precise but far from accurate
  - Possible to separate or revise/recalculate
- Note: avoid or at least minimize these first two types of error!
- Random or residual errors (unsystematic): vary unpredictably
  - It is impossible to completely wipe out these errors
  - The remaining error is the experimental error
  - Treatments have to have larger effects than the random error to be significant
9. The model depends on the experimental design
- X_ij = μ + T1 + T2 + (T1 × T2) + E
- X_ij - value of an experimental unit
- μ - population mean
- T1 - treatment 1 effect
- T2 - treatment 2 effect
- T1 × T2 - interaction of treatments 1 and 2
- E - experimental, residual or random error
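To make the additive model concrete, here is a minimal simulation sketch; the factor names, effect sizes and error standard deviation are invented for illustration and are not from the lecture.

```python
import random

# Hypothetical effect sizes (illustrative only, not from the lecture)
mu = 50.0                                            # overall population mean
t1 = {"control": 0.0, "fertilized": 5.0}             # treatment 1 effects
t2 = {"low_density": 0.0, "high_density": -2.0}      # treatment 2 effects
interaction = {("fertilized", "high_density"): 1.5}  # T1 x T2 interaction

def simulate_unit(a, b, sigma=2.0):
    """One experimental unit: mean + T1 + T2 + interaction + random error."""
    e = random.gauss(0.0, sigma)                     # experimental (residual) error
    return mu + t1[a] + t2[b] + interaction.get((a, b), 0.0) + e

for a in t1:
    for b in t2:
        print(a, b, round(simulate_unit(a, b), 2))
```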
10. Error separation and minimization
- To avoid gross and systematic errors: plan properly, use proper sampling, keep control of the trial or the research project, and avoid re-keying of data (copy from the originally entered data)
- To minimize other errors:
- 1. Increase treatments or replication
  - Increase the number of treatments, e.g. treatment levels
  - Minimum replication 2? Increase replication
  - Normally 4-8 replicates are required in agriculture (ref. Little and Hills, 1978)
  - Consider facility, management and other costs
  - Be clear about treatment vs. experimental unit
  - Treatments vs. levels: with N or without N are the treatments (Trts); its rates (20, 40, ... kg/ha) are the levels
  - Experimental unit: fish or tank?
11. Error separation and minimization
- Replication: experimental error can be measured only if there are at least two units treated the same way. Replication is repetition of the same event: if you see the same thing happening again and again, you are more sure that the event happens under such conditions.
- Types: temporal (time) or spatial
- Be careful about pseudo-replication! Replicate samples or sub-samples: measuring individual fish in a tank is not replication if the tank is the experimental unit, but an individual fish could be the experimental unit if you are injecting a hormone individually and evaluating the hormone's efficacy.
12. Error separation and minimization
- Replication
- Replication can be different for different treatments, but equal replication decreases the standard error or variance and increases the precision (for example, the standard error of a treatment mean, √(s²/r), shrinks as the replication r grows).
13. Error separation and minimization
- Number of replications (experimental research): can be calculated if we know the expected variance and the minimum substantial difference between 2 means.
- Get these from a preliminary sampling/trial, from similar trials done in the past, or from your own judgment:
- t = (x̄₁ − x̄₂) / √(2σ² / r)
14. Error separation and minimization
- Number of replications (a worked sketch follows):
- r = 2σ² t² / d²
- d = difference between two means (x̄₁ − x̄₂)
- t = 1.96 if α is 0.05 (significance level)
- Statistical power (small, medium, large: 0.1 - 0.5)
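A small worked sketch of this formula, using assumed values for the expected standard deviation and the minimum difference of interest (not taken from the lecture):

```python
import math

# Illustrative values (not from the lecture): expected SD of the response and
# the smallest difference between two treatment means worth detecting.
sigma = 4.0      # expected standard deviation
d = 5.0          # minimum substantial difference between 2 means
t = 1.96         # approximate t/z value at alpha = 0.05

r = 2 * sigma**2 * t**2 / d**2                     # r = 2*sigma^2 * t^2 / d^2
print(math.ceil(r), "replicates per treatment")    # round up to a whole number
```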
15. Error separation and minimization
- Number of samples (survey research):
- n = N / (1 + N e²)
- n = sample size
- N = total population (e.g. households)
- e = significance level or the precision (e.g. 10%)
- Example: if you know a village has 1,000 households, then assuming e = 10% you can find the number of sample households
- Size of the sample: n = 1000 / (1 + 1000 × 0.1²) = 90.9
- ≈ 91 households to be sampled
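The household example can be reproduced with a few lines of Python; the formula n = N / (1 + N·e²) is the one on the slide (commonly attributed to Yamane).

```python
import math

def sample_size(N, e):
    """Sample-size formula from the slide: n = N / (1 + N * e^2)."""
    return N / (1 + N * e**2)

# Example from the slide: 1,000 households, 10% precision
n = sample_size(1000, 0.10)
print(round(n, 1), "->", math.ceil(n), "households to be sampled")  # 90.9 -> 91
```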
16. Error separation and minimization
- In a field trial there is the possibility of not having enough replication, which may give non-significant results; therefore, a power analysis is needed to see whether a non-significant difference was due to inadequate replication or to the real treatment effects
- Statistical power (small, medium, large: 0.1 - 0.5) - to learn later
17. Error separation and minimization
- 2. Refine experimental conditions and procedures
  - Pre-set up and run
  - Pre-test (instruments, systems, questionnaires, etc.)
- 3. Use uniform materials and methods
  - Use uniform materials, e.g. fish or chickens of the same size and age
  - Use the same methods and instruments throughout the experimental period
18. Error separation and minimization
- Ways to minimize or separate experimental/random/residual errors (a layout sketch follows this list):
- 1. Randomization: provides an equal chance; it is the cornerstone of statistical theory in the design of experiments
  - Lottery
  - Random numbers, e.g. the Excel function =RAND()*1000
- 2. Pairing: grouping in twos, e.g. using same-age animals for a trial
- 3. Blocking: to separate effects already existing in the system, e.g. canal, shade, different ponds/plots, districts, community, etc.
  - Space: plots
  - Time: years, months, weeks, etc.
  - Other conditions
- 4. Data analysis, e.g. covariance analysis (to learn later)
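As a rough sketch of randomization combined with blocking, the code below shuffles three hypothetical treatments within each of four hypothetical blocks (a randomized-complete-block style layout); all names and the layout itself are illustrative, not taken from the lecture.

```python
import random

# Hypothetical layout: 3 treatments, 4 blocks (e.g. ponds), one unit per treatment per block
treatments = ["T1", "T2", "T3"]
blocks = ["pond_A", "pond_B", "pond_C", "pond_D"]

layout = {}
for block in blocks:
    order = treatments[:]          # copy the treatment list
    random.shuffle(order)          # randomization: equal chance for every position
    layout[block] = order

for block, order in layout.items():
    print(block, order)
```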
19. Data measurement/collection
- 1. Units and levels of measurement should be appropriate, for example: mm, cm, m or km? µg, mg, g, kg, quintal or ton?
- 2. There must be enough room for variation in the data so that statistics can detect the differences. Normally, the difference between minimum and maximum values should span 30-300 steps or intermediate levels. For example, if you expect fish weights between 5 g and 10 g in a trial, there are only 5 steps between 5 and 10, but between 5.0 g and 10.0 g there are 51 steps, while between 5.00 g and 10.00 g there are 501 steps. Therefore, measuring up to one decimal place is enough.
20. Significant numbers
- Rounding off: if the part to drop is > 0.5, increase the preceding digit by one step (rounding up); if it is < 0.5, ignore the figures after the decimal (rounding down). If it is exactly 0.5, then increase the preceding digit by one step if it is odd and keep it as it is if it is even.
- Computer programs do this for you - enter the original numbers collected from the field (they round up only if the second digit is exactly 0.5)
- Calculated or derived values, e.g. mean, SD, SE, etc., can have one decimal digit more than the measured data
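For what it's worth, Python's built-in round() follows the same "round half to even" convention described above, so it can be used to check hand-rounded values:

```python
# An exact .5 goes to the nearest even digit, as on the slide
print(round(2.5))   # 2  (preceding digit even -> kept)
print(round(3.5))   # 4  (preceding digit odd  -> increased)
print(round(4.5))   # 4

# Ordinary rounding when the dropped part is clearly below or above a half
print(round(8.34, 1), round(8.36, 1))   # 8.3 8.4
```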
21. Significant numbers
Consider significant numbers
See examples from AIT Thesis!
22. Significant numbers
- Calculations: rounding the numbers
- 5200 + 85.7 = 5285.7 (✗ wrong)
- 5200 + 85.7 = 5300 (✓ correct)
- 5200.0 + 85.7 = 5285.7 (✓ correct)
- 5.15 × 3.1216 × 150 × 561.617 = 1,354,303.452 (✗)
- 5.15 × 3.1216 × 150 × 561.617 ≈ 1,354,300 (✓)
- Rule: the answer cannot be more accurate than the least accurate figure
23. Exploratory Data Analysis (EDA)
- Watch for sources of error in a data set before analyzing the data, e.g.:
- Obvious mistakes: double-check the original data, ask a friend to check (you may not see your own mistakes)
- Precision of recording
- Recorder/instrument differences
- Trends, e.g. over time, increases, decreases
- Treatment responses
- Extreme values: compare with other similar work/literature
24. Exploratory Data Analysis
- Compare your data with the related recorded information published in:
- Books and proceedings
- Journals
- Magazines
- Newspapers
- Theses, reports, raw data, etc.
- Unexpected events/data may be observed; do not throw them away. If the results are unexpected, try to find the cause, e.g. a fish pond recorded as 5 m deep?
25. Tools of Exploratory Data Analysis
- Pictures and diagrams: a single good picture can describe something better than a thousand words.
- Tables: a table can accommodate a large amount of information and shows the exact figures/numbers, e.g. frequency and its distribution, cumulative frequency, sum, mean, maximum, minimum, etc.
- Graphs: a graph shows trends and highlights certain findings (a short sketch follows this list)
  - Scatter plots
  - Bar charts and pie charts
  - Line graphs
  - Frequency distribution polygons or histograms
- Notes and explanations: sometimes very important
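A minimal pandas/matplotlib sketch of these tools, using a small invented fish-weight data set (the column names and values are illustrative only):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical fish-weight data from two treatments (illustrative values only)
df = pd.DataFrame({
    "treatment": ["control"] * 5 + ["fed_extra"] * 5,
    "weight_g":  [7.2, 6.9, 7.5, 7.1, 6.8, 8.1, 8.4, 7.9, 8.3, 8.0],
})

# Table-style summary: exact figures such as count, mean, min, max per treatment
print(df.groupby("treatment")["weight_g"].describe())

# Graphs: a histogram for the distribution and a boxplot to compare treatments
df["weight_g"].plot(kind="hist", title="Fish weight (g)")
plt.show()
df.boxplot(column="weight_g", by="treatment")
plt.show()
```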
26. Exploratory Data Analysis
- Basic assumptions of experimental designs:
- Effects of treatments, blocks and errors are additive
- Observations are normally distributed
- Experimental errors are independent
- Variances are homogeneous
- It is necessary to check the collected sample data against these assumptions before starting the analysis (a minimal check is sketched below).
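Formal tests are covered later in the course, but as a hedged preview, SciPy provides standard checks for two of these assumptions; the two small samples below are invented for illustration.

```python
from scipy import stats

# Hypothetical samples from two treatments (illustrative values only)
control   = [7.2, 6.9, 7.5, 7.1, 6.8]
fed_extra = [8.1, 8.4, 7.9, 8.3, 8.0]

# Normality: Shapiro-Wilk test on each group (p > 0.05 -> no evidence against normality)
print(stats.shapiro(control))
print(stats.shapiro(fed_extra))

# Homogeneity of variances: Levene's test across the groups
print(stats.levene(control, fed_extra))
```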
27. Additivity example
28. Normal distribution
- Approximately 68%, 95% and 99% of observations fall within 1, 2 and 3 standard deviations (σ) of the mean, respectively.
29. Test of homogeneity/heterogeneity - later!
30. Practical Session 2
- Exploratory data analysis (EDA)
- Afternoon session, 14.30 hrs, AFE Computer Lab