Title: Biostatistics
1Biostatistics
2Statistics
- Sayings about statistics
- Statistics is a science about accurate work with inaccurate numbers.
- We know three kinds of lies: intentional, unintentional, and statistics.
4Biostatistics: what does it mean?
- It isn't a separate field of science. The word indicates that it is an application of statistical methods that helps to solve biological problems, and that biological data have specifics of their own.
5And what is statistics, actually?
- (in layman's language) An ordered group of data: statistics of shootings, statistics of car accidents in different regions
- (in scientific language) The science of what to do with our data: (mathematical) statistics as a science
- Within the scope of statistics: a statistic is a value calculated from numbers that summarizes features of these numbers
6Anything can be proved with the help of statistics
- especially by people who don't understand statistics
- It is statistically proven that widows live longer than their husbands.
- It is possible to put anything into diagrams, and they then look very suggestive, especially when they are accompanied by the right interpretation (the data are fictitious, but correspond to reality)
7And much better with the help of diagrams
10Advice: when somebody tells you by how many percent something improved, always ask which base the percentages were computed from.
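A small sketch of why the base matters. The numbers are hypothetical and chosen only for illustration: a value that falls from 100 to 50 and then returns to 100 has dropped by 50 percent but risen by 100 percent, because each change is computed from a different base.

```python
# Hypothetical numbers: a value falls from 100 to 50 and then returns to 100.
start, low = 100.0, 50.0

def percent_change(base, new):
    """Change from base to new, expressed as a percentage of base."""
    return (new - base) / base * 100.0

drop = percent_change(start, low)   # computed from the base 100
rise = percent_change(low, start)   # computed from the base 50

print(f"drop: {drop:.0f} %, rise: {rise:.0f} %")
```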
11Goals of statistics
- (1) Descriptive statistics: to summarize data, to condense the information from many numbers into a smaller number of parameters or into a diagram
12Compare
The average number of points was 74.5, whereas the minimum was 28 and the maximum was 100.
Frequency diagram (axis: No. of points)
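The two kinds of summary compared here can be sketched in a few lines. The scores below are hypothetical (they are not the data behind the slide), and the bin width of 10 points is an arbitrary choice for the frequency table.

```python
import statistics
from collections import Counter

# Hypothetical exam scores (not the data behind the slide).
points = [28, 55, 61, 70, 74, 78, 81, 85, 90, 100]

# Summary by a few parameters.
summary = {
    "n": len(points),
    "mean": statistics.mean(points),
    "min": min(points),
    "max": max(points),
}
print(summary)

# Summary by a frequency table: count the scores in bins of width 10.
bins = Counter((p // 10) * 10 for p in points)
for lower in sorted(bins):
    print(f"{lower:3d}-{lower + 9:3d}: {'#' * bins[lower]}")
```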
13The fewer parameters I obtain
- the more transparent and simpler the result is
- the greater the loss of information, though (from an average or a histogram I can never find out how many points František K. had, nor the values of all the numbers)
- the art is to find the boundary where the result is transparent but still retains its predictive quality
14Thanks to the loss of information we are able to tell lies with statistics
According to the statistics, we are all flying. Not high in the clouds, but near the ground, with the tips of our shoes just slightly touching the shit we are sitting in.
15The worse the patient is, the better the medicine works.
16Argument for the harmfulness of fluoridation (data from US states)
Nicaragua should be here
17Storks bring babies
18Distinguish between correlation and causation
- The general scientific method
19The general scientific method, on the example of storks bringing babies
- 1. Observation: finding a pattern
20- 2. Interpretation: the stork brings babies
- 3. Prediction: if we remove the storks, babies won't be born, or their number will decrease if crows also do the job
- 4. Experiment: in half of the regions (randomly selected!) we shoot the storks and watch for changes in natality (in comparison with the changes in the control regions)
- 5. (After statistical analysis) we find that there are no changes, so we can proclaim that storks don't bring babies.
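The random selection in step 4 can be sketched as follows. The region labels and the seed are hypothetical; the point is only that chance, not the experimenter, decides which regions receive the treatment.

```python
import random

# Hypothetical region labels standing in for the experimental units.
regions = [f"region_{i:02d}" for i in range(1, 21)]

rng = random.Random(42)      # fixed seed so the assignment can be reproduced
shuffled = regions[:]        # copy, so the original order is kept
rng.shuffle(shuffled)

half = len(shuffled) // 2
treatment = sorted(shuffled[:half])   # regions where storks are removed
control = sorted(shuffled[half:])     # regions left untouched

print("treatment:", treatment)
print("control:  ", control)
```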
21Hypothetico-deductive approach (K. Popper): a correct assumption can yield only correct predictions, whereas an incorrect assumption can yield both correct and incorrect predictions; because of this we can never prove a hypothesis, only reject it
Observation (pattern) -> explanation
Hypothesis 1, Hypothesis 2, Hypothesis 3 (the hypotheses exclude each other)
Prediction 1, Prediction 2, Prediction 3 (the predictions differ from each other)
Result of the experiment compared with the reality
22Goals of statistics: population and sample
- (2) Inferential statistics: making an inference about a (statistical) population from a sample
- Some (statistical) populations are too large or potentially infinite: I am not able to check all the members
- What can I say about the results of elections in the whole republic when I ask just 1000 people?
- What can I say about the amount of Cd in the blood of wild geese in CZ when I took blood from just 10 specimens?
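For the election question, a rough feeling for the answer comes from the usual normal approximation for a sample proportion. This is only a sketch: it assumes a simple random sample, and the observed share of 0.5 is a hypothetical value (it is also the share that gives the widest margin).

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95 % margin of error for a sample proportion
    (normal approximation, simple random sample assumed)."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical poll: 1000 respondents, 50 % of them favour a party.
moe = margin_of_error(0.5, 1000)
print(f"+/- {100 * moe:.1f} percentage points")
```

Under these assumptions a poll of 1000 people pins the share down to roughly three percentage points either way; a sample of 10, as with the geese, is far less precise.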
23Inferential statistics is common in biology
- I don't want to draw conclusions about my 10 laboratory rats; on the basis of these 10 rats I want to say something about all experiments done in the same way
- If this is to be science, the experiments have to be reproducible (cf. Journal of Irreproducible Results)
24Types of (not only biological) data
- Continuous and discrete data: the mathematical definition versus the reality of measuring data; in reality we always measure data with a certain accuracy
25Types of (not only biological) data
- Ratio scale
- Interval scale
- Ordinal scale
- Nominal scale (categorical data)
- Circular scale (dial marked 0, 90, 180, 270)
26Azimuth of the stem with lichen findings (degrees): 5, 10, 5, 350, 350, 355 => average about 180
Time of doom-mongers ululating: 22:00, 23:00, 24:00, 1:00, 1:00, 2:00 => average is shortly after midday
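The two examples above show why the arithmetic mean fails on a circular scale. One common remedy, sketched here as an illustration rather than as the method used in the course, is the mean direction: each value is treated as a unit vector, the vectors are averaged, and the angle of the resulting vector is taken. The function name is hypothetical.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction in degrees, in the range [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

azimuths = [5, 10, 5, 350, 350, 355]
arith = sum(azimuths) / len(azimuths)    # misleading: points almost due south
circ = circular_mean_deg(azimuths)       # close to north, as the data suggest

# Hours on a 24-hour dial map to angles at 15 degrees per hour.
hours = [22, 23, 24, 1, 1, 2]
circ_hour = circular_mean_deg([h * 15 for h in hours]) / 15

print(f"arithmetic mean: {arith:.1f} deg, circular mean: {circ:.1f} deg")
print(f"circular mean time: {circ_hour:.2f} h after midnight")
```

For the azimuths the mean direction lies just west of north, and for the times it falls shortly after midnight instead of shortly after midday.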
28Population and Random sample
- Sampling: sampling design
- Random sample: every individual must have the same probability of being chosen, independently of whether another individual was chosen
- Tables and generators of (pseudo)random numbers
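A minimal sketch of drawing a simple random sample with a pseudorandom number generator. The sampling frame (identifiers 1 to 500) and the seed are hypothetical; the draw is without replacement, which is the usual practice when sampling from a finite list.

```python
import random

# Hypothetical sampling frame: identifiers of all individuals in the population.
population = list(range(1, 501))        # N = 500

rng = random.Random(2024)               # seeded pseudorandom generator
sample = rng.sample(population, k=10)   # n = 10, drawn without replacement

print(sorted(sample))
```

The hard part in practice is not the generator but obtaining a complete list of individuals to draw from.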
29Population sample and Random sample
- An almost philosophical question: what is random?
- And what is probability?
- In statistics (that is, in this course) we will use the so-called a priori probability (the Bayesian, posterior probability also exists)
30Random sampling usually isn't trivial: in no case is it a sampling of typical individuals; it works reasonably well in agricultural experiments
31It is much more difficult in natural populations: even taking the individual nearest to a random point does not work here
32Basic statistical characteristics
- We usually denote the size of the population by N and the size of the sample by n
- Characteristics of the population are usually denoted by Greek letters and characteristics of the sample by Roman letters
- Characteristics of location
- Means, median and mode
- Means are defined for quantitative data (i.e. on ratio and interval scales)
33Arithmetic mean
of population: $\mu = \frac{1}{N} \sum_{i=1}^{N} X_i$
of sample: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
34Geometric mean
- the n-th root of the product of n values (for a sample here)
35Harmonic mean
- Reciprocal of the mean of reciprocals.
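The three means can be computed directly from their definitions. The values below are hypothetical positive measurements on a ratio scale; the `statistics` functions are used only as a cross-check.

```python
import math
import statistics

# Hypothetical positive measurements on a ratio scale.
x = [2.0, 4.0, 8.0]
n = len(x)

arithmetic = sum(x) / n
geometric = math.prod(x) ** (1.0 / n)     # n-th root of the product
harmonic = n / sum(1.0 / v for v in x)    # reciprocal of the mean of reciprocals

print(f"arithmetic: {arithmetic:.3f}")
print(f"geometric:  {geometric:.3f}")
print(f"harmonic:   {harmonic:.3f}")

# Cross-check against the standard library implementations.
assert math.isclose(geometric, statistics.geometric_mean(x))
assert math.isclose(harmonic, statistics.harmonic_mean(x))
```

For positive values that are not all equal, the harmonic mean is the smallest of the three and the arithmetic mean the largest.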
36Median: also used for ordinal-scale data
- It is defined so that one half of the values lies below the median and the other half above it (in infinite populations, the probability that a random value lies above the median, as well as below it, is 0.5). In a data set with an even number of values, the value halfway between the two middle values is taken as the median
37Upper and lower quartile
- 1/4 of the observations lie above the upper quartile and 1/4 of the observations lie below the lower quartile (similarly for infinite populations)
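A short sketch of the median and the quartiles on a hypothetical sample with an even number of values. Several conventions exist for interpolating quartiles in small samples; the `inclusive` method of `statistics.quantiles` used here is just one of them.

```python
import statistics

# Hypothetical sample with an even number of values.
values = [3, 1, 7, 5, 9, 11, 13, 15]

med = statistics.median(values)    # halfway between the two middle values
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

print("sorted:", sorted(values))
print("median:", med)
print("lower quartile:", q1, "upper quartile:", q3)
```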
38Distinguish between the meaning of the mean and the median
Example: wages in two companies
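The wage figures from the slide are not available, so the sketch below uses hypothetical wages, constructed so that both companies have the same mean but very different medians.

```python
import statistics

# Hypothetical monthly wages in two companies (arbitrary currency units).
company_a = [20, 21, 22, 23, 24]    # everyone earns about the same
company_b = [10, 10, 10, 10, 70]    # one very high wage

for name, wages in (("A", company_a), ("B", company_b)):
    print(name,
          "mean:", statistics.mean(wages),
          "median:", statistics.median(wages))
```

In company B the mean is pulled up by a single wage, while the median still describes what a typical employee earns.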
39Mode: the most common value; in continuous data it is the peak of the frequency diagram (later we will define it as a local maximum of the probability density curve); there can be more than one
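For discrete data the mode can be found by counting. The observations below are hypothetical and were chosen to have two equally common values, to illustrate that the mode need not be unique.

```python
import statistics

# Hypothetical discrete observations with two equally common values.
counts = [1, 2, 2, 2, 3, 4, 5, 5, 5, 6]

modes = statistics.multimode(counts)   # all values with the highest frequency
print("modes:", modes)
```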
40Mean and median (figure: positions of the mean and the median marked on several frequency distributions)
41Characteristics of variability
- 1. Range: the difference between the maximum and the minimum
- 2. Interquartile range
- 3. Variance and standard deviation
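42Variance: the average squared deviation from the mean
of population: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2$
estimate based on the sample: $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
n-1: df, degrees of freedom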
43Standard deviation ($s_x$, often also s.d. or S.D.) is the square root of the variance
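The characteristics of variability can be computed step by step. The sample is hypothetical, and the quartiles use the `inclusive` convention of `statistics.quantiles`, which is only one of several in use.

```python
import math
import statistics

# Hypothetical sample.
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)
mean = sum(x) / n

data_range = max(x) - min(x)
q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
iqr = q3 - q1                            # interquartile range

ss = sum((v - mean) ** 2 for v in x)     # sum of squared deviations from the mean
var_population = ss / n                  # if x were the whole population
var_sample = ss / (n - 1)                # estimate from a sample, n - 1 df
sd_sample = math.sqrt(var_sample)        # standard deviation

print(f"range: {data_range}, IQR: {iqr}")
print(f"variance (n): {var_population:.3f}, variance (n-1): {var_sample:.3f}")
print(f"standard deviation: {sd_sample:.3f}")
```

Dividing by n - 1 instead of n makes the sample estimate slightly larger; the difference shrinks as the sample grows.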
44Compare the variability in weight of elephants and ants
- Use either the variance or standard deviation of log-transformed data, or the coefficient of variation (CV)
- Both make sense only for ratio-scale data
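A sketch of both options on hypothetical weights. The ant weights are deliberately constructed as the elephant weights divided by a constant, so the relative variability is identical by design; the helper names `cv` and `sd_log` are illustrative.

```python
import math
import statistics

# Hypothetical body weights; the ant values are the elephant values / 2000.
elephants_kg = [4800.0, 5200.0, 5500.0, 6100.0]
ants_mg = [2.4, 2.6, 2.75, 3.05]

def cv(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def sd_log(values):
    """Standard deviation of the natural logarithms of the values."""
    return statistics.stdev([math.log(v) for v in values])

for name, w in (("elephants", elephants_kg), ("ants", ants_mg)):
    print(f"{name:9s} sd: {statistics.stdev(w):9.3f}  "
          f"CV: {cv(w):.4f}  sd of logs: {sd_log(w):.4f}")
```

The raw standard deviations differ by orders of magnitude (and depend on the units), whereas the CV and the standard deviation of the logarithms agree for the two species.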
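45Standard error of the mean
- A characteristic of the accuracy of the sample mean: how large the variability of the means would be across many random samples of this size
$s_{\bar{x}} = \frac{s_x}{\sqrt{n}}$ ($s_x$: variability in the data; $s_{\bar{x}}$: accuracy of the mean)
We can achieve higher accuracy with a larger sample.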
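The interpretation of the standard error can be checked by simulation. The sketch assumes the usual estimate, the sample standard deviation divided by the square root of n; the population (normal, mean 50, standard deviation 10), the sample sizes and the seed are all hypothetical.

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 10 000 values from a normal distribution.
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

def sem(sample):
    """Standard error of the mean estimated from one sample: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

n = 25
estimated = sem(rng.sample(population, n))

# Empirical check: standard deviation of the means of many samples of size n.
means = [statistics.mean(rng.sample(population, n)) for _ in range(2000)]
empirical = statistics.stdev(means)

# A larger sample gives a smaller standard error.
estimated_large = sem(rng.sample(population, 400))

print(f"SEM from one sample (n=25):       {estimated:.2f}")
print(f"sd of 2000 sample means (n=25):   {empirical:.2f}")
print(f"SEM from one sample (n=400):      {estimated_large:.2f}")
```

The estimate from a single sample and the spread of many sample means should be of similar size, and both shrink as n grows.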
46Graphical summaries: frequency diagram (variable plotted: NO_SAPLING)
47Box and whisker plot (variable plotted: NO_SAPLING)
Note: nowadays the box-and-whisker plot is also used to display the mean and standard deviation, etc.