Title: Hypothesis testing
1Lecture 6 - Review: about one-third covered until now!
- Background
- Related terms
- Exploratory data analysis and data presentation
- Normal distribution/ frequency curve
- Data transformation
- Measure of central tendency
- Measure of dispersion/variability
What's remaining?
- Binomial and Poisson distributions
- Hypothesis formulation and testing
- Experimental designs, ANOVA and multiple comparisons
- Regression and correlation
- Covariance analysis
- Non-parametric tests
- Miscellaneous
2Normal distribution - revision
- Continuous variables
- Class intervals
- Intermediate steps/scales e.g.
- - weight (10 g, 10.4 g, 10.45 g, etc.)
- - similarly: length, distance, percentage, etc.
3Normal distribution
4Binomial distribution and probability
- Discrete series/nominal scale
- Two mutually exclusive categories/classes and their frequency/occurrence
- For example,
- - male and female
- - head and tail
- - dead or live
- - white and black
5Binomial distribution
- Frequency bar
- Total 60 prawns sampled
6Binomial distribution
$(p + q)^2 = p^2 + 2pq + q^2$
7Binomial distribution approaches the normal distribution as it is expanded (as n increases)
8Binomial distribution
- Expansion of binomial distribution
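- $(p + q)^1 = p + q$
- $(p + q)^2 = p^2 + 2pq + q^2$
- $(p + q)^3 = p^3 + 3p^2q + 3pq^2 + q^3$
- $(p + q)^4 = p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4$
- $(p + q)^5 = p^5 + 5p^4q + 10p^3q^2 + 10p^2q^3 + 5pq^4 + q^5$
- ---
- $(p + q)^n = ?$
- $P(X = k) = \frac{n!}{k!\,(n-k)!}\, p^k q^{n-k}$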
- Note: because it was first described by Bernoulli (1654-1705), the binomial distribution is also named after him
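The expansion coefficients and the general term on this slide can be reproduced numerically. The following is a minimal Python sketch using only the standard library; the function names `expansion_coefficients` and `binomial_probability` are illustrative and do not come from the lecture.

```python
from math import comb

def expansion_coefficients(n):
    """Coefficients of the expansion of (p + q)^n, e.g. n = 2 gives [1, 2, 1]."""
    return [comb(n, k) for k in range(n + 1)]

def binomial_probability(k, n, p):
    """P(X = k) = n! / (k! (n - k)!) * p^k * q^(n - k), with q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Coefficients for n = 5, matching the (p + q)^5 line of the expansion.
coefficients_n5 = expansion_coefficients(5)

# All terms of the expansion add up to (p + q)^n = 1, because p + q = 1.
total = sum(binomial_probability(k, 5, 0.5) for k in range(6))
```

With p = q = 0.5 (for example, head and tail), `binomial_probability(1, 2, 0.5)` is the 2pq term of the second expansion.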
9- Application examples
- Testing the sex ratio (male : female)
- Effectiveness of hormonal sex-reversal
- Progeny testing - super-male (YY) male technologies
- Breeding trials with animals, plants, etc.
- Sex ratio of the population in your study area, e.g. during wartime the number of males decreased significantly in some countries
- Gender migration studies, etc.
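As an illustration of testing a sex ratio, the sketch below runs an exact two-sided binomial test against an expected 1:1 ratio. The counts (40 males in a sample of 60) are hypothetical and chosen only for illustration, and the function names are assumptions, not part of the lecture.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than the observed one under H0."""
    probabilities = [binomial_pmf(i, n, p) for i in range(n + 1)]
    observed = probabilities[k]
    # A tiny relative tolerance guards against floating-point ties.
    p_value = sum(pr for pr in probabilities if pr <= observed * (1 + 1e-9))
    return min(1.0, p_value)

# Hypothetical sample: 40 males out of 60 animals.
males, total = 40, 60
p_value = two_sided_binomial_p(males, total)

# H0: the sex ratio is 1:1. Reject H0 at the 5% level if p_value < 0.05.
reject_h0 = p_value < 0.05
```

A perfectly balanced sample (30 of 60) gives a p-value of 1, since no outcome is more consistent with a 1:1 ratio.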
10Applications: Mendel's law of inheritance (genetics)
11Poisson distribution
- First described by Siméon-Denis Poisson (1781-1840)
- Random occurrences of an event
- Equal chances in space or time
- Events are independent of each other: the occurrence of one event does not affect the others
- Mean = variance ($\mu = \sigma^2$)
- The binomial distribution approaches it when n is large and p is small
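The closeness of the binomial and Poisson distributions when n is large and p is small can be checked numerically. This is a minimal sketch; the values n = 1000 and p = 0.002 are arbitrary assumptions chosen for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Binomial probability of exactly k events in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """Poisson probability of exactly k events when the mean is mu."""
    return exp(-mu) * mu**k / factorial(k)

# Assumed illustration values: many trials, rare event.
n, p = 1000, 0.002
mu = n * p  # the Poisson mean matches the binomial mean n * p

# Largest gap between the two distributions over k = 0..10.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu))
              for k in range(11))
```

The gap shrinks further as n grows and p falls with the product n * p held fixed.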
12Poisson distribution
- Poisson (random): $\sigma^2 = \mu$
- Clustered: $\sigma^2 > \mu$
- Uniform/normal: $\sigma^2 < \mu$
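The comparison of variance and mean on this slide can be sketched as a variance-to-mean ratio. The counts below are hypothetical (for example, animals counted per quadrat), and the `tolerance` band used to call a ratio "close to 1" is an arbitrary assumption, not a formal test.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by the sample mean."""
    return variance(counts) / mean(counts)

def classify(counts, tolerance=0.2):
    """Label a pattern by its variance-to-mean ratio.
    The tolerance is an illustrative assumption, not a significance test."""
    ratio = dispersion_index(counts)
    if ratio > 1 + tolerance:
        return "clustered"
    if ratio < 1 - tolerance:
        return "uniform"
    return "random (Poisson-like)"

# Hypothetical counts per sampling unit.
clustered_counts = [0, 0, 0, 12, 0, 0, 9, 0, 0, 0]
uniform_counts = [2, 2, 3, 2, 2, 3, 2, 2, 3, 2]

clustered_label = classify(clustered_counts)
uniform_label = classify(uniform_counts)
```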
13Circular distribution
- Data are on a circular, interval scale, e.g. degrees, radians, etc., and should be presented using pie charts
- Examples:
- Per cent (0-100)
- Direction: E, S, N, W (measured with a compass)
- Longitude (distance)
- Time: 0-24 hours, weekdays, months, etc.
14Hypothesis formulation and testing
What is a hypothesis?
15Scientific method - revision
[Flow diagram. Box labels: Nature; Observation; Hypothesis; Prediction; Design/propose experiment; Experiment/trial; Generate/collect data; Analyze and interpret data; Compare/relate; Support hypothesis; Reject hypothesis; Modify; New related hypothesis; Alternate hypothesis; Theory; Law (universal)]
16- What is a hypothesis?
- Starting point of scientific inquiry (discovery/invention)
- An assumption made for the sake of argument
- A tentative and normally testable statement that proposes a possible explanation for some phenomenon or event
- A hypothesis is an embryo that needs to be tested and developed into a theory and a law
17Hypothesis > theory > law
- Hypothesis implies insufficient evidence to provide more than a tentative explanation, e.g. a hypothesis explaining the extinction of the dinosaurs, the origin of the universe, etc.
- Theory implies a greater range of evidence and a greater likelihood of truth, e.g. the theory of evolution
- Law implies a statement of order and relation in nature that has been found to be invariable under the same conditions, e.g. the law of gravitation, the law of inheritance (Mendel), etc.
18Types of hypothesis
- H0 - Null hypothesis: no difference
- H1 - Alternate hypothesis: another is better
- Examples (use the tentative word "may"):
- H0 - Salt in soil may not affect plant growth
- H1 - Salt in soil may affect plant growth
- Hypothesis (H1):
- Bird flu virus may also infect cats and then humans
- Feeding vitamin C may increase the survival of fish fry
- 100 mg of vitamin C per kg of diet is needed to increase fry survival
19Hypothesis testing
- A hypothesis is an intelligent guess based on limited and untested information
- Although experimental conclusions may match predicted results, we should not guess without proper testing and analysis; we must test and confirm it
- It should be tested against new information using appropriate tool(s)
- It can be expressed in mathematical language, e.g.
- H0 = H1: there is no difference, accept H0
- H0 ≠ H1: there is a difference, reject H0
20Testing processes and tools
- Requirements
- Design trial/survey
- Collect data or information (raw)
- Descriptive statistics (central locations and dispersions)
- Significance level and confidence limits (intervals)
- Appropriate statistical tools/packages
21Significance ($\alpha$) level
- Probability (P) of an event occurring by random error or chance
- Social sciences research: 10%
- Biological research: 5%
- Medical or laboratory analysis: 1%
- Physics: 99.99% and 0.01% (?)
- In biology, if the calculated P is lower than
- 5% or 0.05: significant (*)
- 1% or 0.01: highly significant (**)
22- In fact, there is no fixed level of significance; researchers themselves determine what it is.
- For example:
- A cure for AIDS may consider p < 0.40 adequate (i.e. P > 0.60 that a certain cocktail is effective against AIDS)
- For a drug for the common cold, a probability as low as 0.0001 may be necessary to be convinced that the treatment does not cause side effects.
- The scientist's objective is to understand the system, not to find mindless, mechanistic statistical significance
23Biological/substantive significance
- Effect size, or substantive importance/significance, or meaningfulness
- A difference may be statistically significant yet unable to cause any effect in nature or in the situation where we use it. Therefore, a difference, to be a difference, must make a difference.
- For example (weight of fish in g):
- Sample 1: 100.1, 100.2, 100.3, 100.1, 100.2, 100.3
- Sample 2: 100.4, 100.5, 100.5, 100.6, 100.6, 100.4
- Means: 100.2 and 100.5 (statistically significant)
- Difference: 0.3 g (no biological significance)
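The claim that these two samples differ statistically can be checked with a two-sample t-test. The sketch below assumes a pooled-variance (Student's) t-test, which the slide does not name; the critical value 2.228 is the standard two-tailed 5% t-table entry for 10 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# Fish weights in g, as listed on the slide.
sample_1 = [100.1, 100.2, 100.3, 100.1, 100.2, 100.3]
sample_2 = [100.4, 100.5, 100.5, 100.6, 100.6, 100.4]

def pooled_t_statistic(a, b):
    """Student's t for two independent samples, assuming equal variances."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(b) - mean(a)) / se_diff

t_value = pooled_t_statistic(sample_1, sample_2)

# Two-tailed critical t at the 5% level for 6 + 6 - 2 = 10 degrees of freedom.
T_CRITICAL_DF10 = 2.228
statistically_significant = abs(t_value) > T_CRITICAL_DF10

# The size of the effect is a separate question from its significance.
difference = mean(sample_2) - mean(sample_1)
```

The test flags the difference because the variation within each sample is tiny, not because 0.3 g matters for the fish.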
24Confidence interval and limits
[Figure: normal curve showing the 95% confidence interval between the lower limit (L1) at $-1.96\sigma$ and the higher limit (L2) at $+1.96\sigma$]
Mean ± 1.96 SE, where SE = SD/√n
25Use of confidence interval and limits
[Figure: mean plotted with its upper and lower CI]
Mean ± 1.96 SE, where SE = SD/√n
26Confidence intervals (CI)
- The sample mean estimates the true mean, and the standard error describes the variability of that estimate
- This variability can be conveniently expressed in terms of probabilities by calculating the CI
- For example:
- Sample (n) = 25 fish from a pond of 10,000 fish; suppose we got
- - Mean = 350 g, SD = 75 g
- L1 = ?
- L2 = ?
27- Here, SE = 75/√25 = 15 g
- With mean ± 1 SE, we can be confident that there is a 68% chance (p) that the true mean is within 350 ± 15 g
- 335 g is the lower limit (L1)
- 365 g is the upper limit (L2)
- But we would like to be 95% confident.
- This is done by multiplying the SE by the t-value
- From the table of critical values of the t-distribution
- t-statistic (t 0.05, 24) × SE, for n - 1 = 24
- 2.064 × 15 = 31 g
- L1 = 350 - 31 = 319 g and L2 = 350 + 31 = 381 g
We can now say with 95% confidence that the true mean falls between these limits (319-381 g)
28- The CI widens as we demand greater and greater confidence, e.g. for 99%
- CI = mean ± 2.797 × 15 = 350 ± 42 g
- Conversely, if we have a probability (p) value of 0.01 or less, we are over 99% sure that it does not fall within those limits (i.e. it is significantly different)
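The confidence limits for the pond example can be recomputed with a few lines of Python. This is a minimal sketch; the function name `confidence_interval` is illustrative, and the t-values 2.064 and 2.797 are the table values quoted in the slides for 24 degrees of freedom.

```python
from math import sqrt

def confidence_interval(sample_mean, sd, n, t_value):
    """Return (lower limit L1, upper limit L2) = mean -/+ t * SE."""
    se = sd / sqrt(n)          # standard error of the mean
    margin = t_value * se      # half-width of the interval
    return sample_mean - margin, sample_mean + margin

# Pond example: n = 25 fish, mean = 350 g, SD = 75 g.
low_95, high_95 = confidence_interval(350, 75, 25, t_value=2.064)
low_99, high_99 = confidence_interval(350, 75, 25, t_value=2.797)

# The 99% interval is wider than the 95% interval.
width_95 = high_95 - low_95
width_99 = high_99 - low_99
```

Rounded to whole grams, the 95% limits agree with the 319-381 g range, and the 99% margin with the ± 42 g on the slide.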
29Hypothesis testing - An example
- H1: hypothesis (alternate hypothesis)
- Homemade pellet feed may give a better yield of tilapia than the expensive commercial feed
- H0: null hypothesis
- There might be no difference in tilapia yield between the two diets
30The test
- Four tanks each fed two types of feed
- Nursing results after 1 month
- Homemade feed (G1)
- 50.3 ± 10.1 g/fish (mean ± 1 SE)
- Commercial feed (G2)
- 69.1 ± 9.2 g/fish
- Difference in mean values: 69.1 - 50.3 = 18.8 g, i.e. G2 fish are 37% bigger than the fish of G1, if this is statistically proven (of course, this difference has biological meaning)
- Should the null hypothesis be accepted or rejected?
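One way to approach the question is a t-test built from the reported means and standard errors. The sketch below rests on assumptions the slide does not state: that each feed was replicated in four tanks (giving 4 + 4 - 2 = 6 degrees of freedom), that the two groups are independent, and that a two-tailed test at the 5% level is wanted. The critical value 2.447 is the standard t-table entry for 6 degrees of freedom.

```python
from math import sqrt

# Reported results (mean and 1 SE, g/fish).
mean_g1, se_g1 = 50.3, 10.1   # homemade feed
mean_g2, se_g2 = 69.1, 9.2    # commercial feed

# Standard error of the difference between two independent means.
se_difference = sqrt(se_g1**2 + se_g2**2)

# t-statistic for the observed difference of 18.8 g.
t_value = (mean_g2 - mean_g1) / se_difference

# ASSUMPTION: 4 tanks per feed, so df = 6; two-tailed 5% critical value.
T_CRITICAL_DF6 = 2.447
reject_h0 = abs(t_value) > T_CRITICAL_DF6
```

Under these assumptions the calculated t stays below the critical value, so the 18.8 g difference, although large, would not be declared statistically significant with so few tanks.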
31Statistical errors in hypothesis testing
- We may be incorrect whether we reject the null hypothesis (no difference between the two means) or accept it
- Two types of errors one can make:
- Type I error - claiming a true significant difference when there is none
- Type II error - claiming no significant difference when there truly is one
32Statistical error in hypothesis testing
33- Scientists generally prefer to decrease the possibility of making a Type I error
- - It is better to miss a significant difference/relationship when there was one than to claim significance when there was none
- Selection of statistical tools has an important role in not committing these errors.
34Summary of statistical tools
35Statistical tables: http://www.statsoft.com/textbook/stathome.html
No lab session today! Happy Vietnamese/Chinese New Year! Enjoy yourself!
Thank you!