Title: MD 5108 Biostatistics for Basic Research
1MD 5108 Biostatistics for Basic Research
- Lecturer Dr K. Mukherjee
- Office S16-06-100
- Tel 874 2764
- Email stamk_at_nus.edu.sg
2Objectives To train practitioners of the
biomedical sciences in the use and interpretation
of statistical data analysis.
- explore and present data using tables, charts and graphs
- do simple statistical calculations with a calculator
- carry out data analysis using a statistical package such as SPSS
- pick the right procedure for analysing a set of data
- interpret results correctly and report findings
- avoid misuse and abuse of statistics
- understand statistical contents of papers in medical journals
- judge claims and statements critically
- discuss and communicate ideas in a quantitative manner
3- Teaching approach
- nonmathematical introduction
- explanation of concepts rather than proofs
- emphasis on methodology and procedures
- emphasis on use of a statistical package rather than manual calculation
- emphasis on choosing the right procedure
- emphasis on correct interpretation of results
- examples from clinical research literature
4Topic 1 What is statistics?
- A branch of mathematics dealing with the analysis and interpretation of masses of numerical data (Merriam-Webster Dictionary)
- The field of study that involves the collection and analysis of numerical facts or data of any kind (Oxford Dictionary)
- The study of how information should be employed to reflect on, and give guidance for action, in a practical situation involving uncertainty (Vic Barnett)
Biostatistics: application of statistical methods to the biological, medical and health sciences
5Why the need for Statistics in Biomedicine ?
- Two main reasons
- Variation
- attributes differ not only among individuals but also within the same individual over time
- Sampling
- biomedical research projects are mostly carried out on small numbers of study subjects
- it is a challenging problem to project results from small-sample studies to individuals at large
6Variation and sampling necessitate the use of statistical methods in biomedicine to put numerical data into a context by which we can better judge their meaning
7From sample to population
Statistical methods are used to produce statistical inferences about a population based on information from a sample derived from that population
[Diagram: sample -> inductive statistical methods -> population]
8Altman (1991) Practical Statistics for Medical
Research, Chapman and Hall.
9Bailar & Mosteller (1986) Medical Uses of Statistics, NEJM Books.
10Many studies have been done on misuse of
statistics in medicine
11From Altman (1991)
12Schor and Karten (1966, J. Am. Med. Assoc.)
- 149 papers classed as analytical studies in 3 issues of 11 most frequently read medical journals
- assessment criteria
- Validity with respect to
- Design of experiment?
- Type of analysis performed?
- Applicability of statistical test used?
13Findings of Schor and Karten
- 28% of papers acceptable
- 68% deficient but acceptable if reviewed
- 4% unsalvageable
Lesson
CARE must be exercised when reading scientific papers in biomedical journals! Knowledge of basic biostatistics is required
14"There are three kinds of lies: lies, damned lies and statistics" (Benjamin Disraeli)
"It is easy to lie with statistics, but it is easier to lie without them" (Frederick Mosteller)
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." (H.G. Wells)
15Types of statistical methods
1. Descriptive statistical methods
- data collection and organization
- summarizing data and describing its characteristics
- presentation and publication
2. Exploratory data analysis
- play around and get a feel of the data
- preliminary analysis, often graphical
- looking for patterns and possible relationships
- are assumptions satisfied?
- which model and procedure to use?
16 3. Inductive (inferential) statistical methods
Statistical inferences about a population based on information from a sample derived from that population
[Diagram: sample -> inductive statistical methods -> population]
- estimation, confidence intervals
- hypothesis testing
- prediction, forecasting
- classification
17Topic 2 Types of data
Sources of data, the raw materials of statistics
- Routinely kept records, e.g., hospital medical records
- Surveys
- Experiments
- Clinical trials
- Databases
- Published reports
Any characteristic that can be measured or classified into categories is called a variable
18Types of variables
(1) Qualitative variables
- cannot be measured numerically
- categorical in nature, e.g., gender
- categories must not overlap and must cover all possibilities
Nominal variables (no inherent ordering of categories)
- M/F, Yes/No
- Blood group (A, B, AB, O)
- Ethnic group (Chinese, Malay, Indian, Others)
Ordinal variables (categories are ordered in some sense)
- response to treatment: unimproved, improved, much improved
- pain severity: no pain, slight pain, moderate pain, severe pain
19(2) Quantitative variables
- can be measured numerically, e.g., weight, height, concentration
- can be continuous or discrete
- a continuous variable can take on any value (subject to precision of measuring instrument) within some range or interval, e.g., weight, height, blood pressure, cholesterol level
- a discrete variable is usually a count of something and hence takes on integer values only, e.g., number of admissions to NUH
Variable types and measurement types
- have implications on how data should be displayed or summarized
- determine the kind of statistical procedures that should be used
20SUMMARY Types of variables and measurement scales
- Qualitative or categorical: nominal (not ordered), e.g. ethnic group; ordinal (ordered), e.g. response to treatment
- Quantitative or measurement: discrete (count data), e.g. number of admissions; continuous (real-valued), e.g. height
21Topic 3 Presenting data graphically
Advantages of graphical data display
- Let data speak for itself
- Get a good feel of the data before formal analysis
- Graphs and plots are easier to understand and interpret
- Reveal patterns in data which may shed light on the appropriate model/analysis to use, e.g.,
- Skewed or symmetric distribution
- Multiple peaks / modes
- Are there any outliers?
- Relationship between variables
22Graphs for categorical data
26- Comparison of methods
- Bar charts can be read more accurately and offer better distinction between close together values
- Pie charts are especially useful for showing percentage distribution
- Pie charts can display large and small values simultaneously without a scale break
- A single bar chart is preferable to a single segmented bar chart
- A series of segmented bar charts is easier to read than a series of pie charts or ordinary bar charts
28Variation of the basic bar chart
32Plotting by sector rather than by profession
- Look at the data from a different angle
- Highlight different aspects of the data
37A back to back bar chart
Source: JAMA, 1978, vol 239, no 21
38Comparison of methods
- A stacked bar chart is also a bar chart for the combined data
- Some of the bars in a stacked bar chart are not aligned
- Bars in clustered bar charts are aligned, but it is harder to visualize how the component bars would stack up
- Back to back bar charts are applicable when there are 2 groups only; the aggregated bars are not aligned
- A series of stacked or segmented bar charts is useful in showing time trend
39Time Trend
The increase in prescriptions written per person is visually exaggerated by starting the axis at 8 rather than 0
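One rough way to see how much a truncated axis exaggerates a change is to compare the ratio of the bar heights as drawn with the ratio of the actual values. The sketch below assumes hypothetical figures of 8.5 and 10.5 prescriptions per person, which are not taken from the slide; only the baseline of 8 comes from the text above. The function name is also an illustrative choice.

```python
def drawn_height_ratio(first, last, baseline=0.0):
    """Ratio of the two bar heights as drawn when the axis starts at `baseline`."""
    if first <= baseline or last <= baseline:
        raise ValueError("values must lie above the axis baseline")
    return (last - baseline) / (first - baseline)


# Hypothetical values, for illustration only (not from the slide).
first_year, last_year = 8.5, 10.5

true_ratio = drawn_height_ratio(first_year, last_year, baseline=0.0)
truncated_ratio = drawn_height_ratio(first_year, last_year, baseline=8.0)

print(f"axis from 0: last bar is {true_ratio:.2f} times the first")
print(f"axis from 8: last bar is {truncated_ratio:.2f} times the first")
```

With these assumed numbers the actual increase is under 25 percent, yet the last bar is drawn five times as tall as the first once the axis starts at 8.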
40Stacked bar chart of yearly mortality rate per 1000 births
Pagano & Gauvreau (1999) Principles of Biostatistics, Duxbury.
41- Response under two treatments
            Response to Treatment
Treatment   None   Partial   Complete   Total
A           3      15        9          27
B           2      22        30         54
42A misleading bar chart
By design, there are twice as many patients
receiving treatment B
43Can compare the response type percentages for the
two treatments
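Because treatment B has twice as many patients as treatment A, raw counts are not directly comparable; converting each row to percentages of its own total puts the two treatments on the same footing. A minimal sketch using the counts from the response table (the variable and function names are illustrative):

```python
# Counts from the response-by-treatment table (None, Partial, Complete).
counts = {
    "A": {"None": 3, "Partial": 15, "Complete": 9},
    "B": {"None": 2, "Partial": 22, "Complete": 30},
}


def row_percentages(row):
    """Convert one treatment's counts to percentages of that treatment's total."""
    total = sum(row.values())
    return {response: 100.0 * n / total for response, n in row.items()}


percentages = {treatment: row_percentages(row) for treatment, row in counts.items()}

for treatment, row in percentages.items():
    cells = ", ".join(f"{response} {pct:.1f}%" for response, pct in row.items())
    print(f"Treatment {treatment}: {cells}")
```

On this scale, about a third of patients on treatment A show a complete response, against more than half on treatment B.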
45Graphs for quantitative data
- Histogram
- Frequency polygon
- Box plot
46Histogram
- Divide the range of the data into a suitably chosen number of intervals/bins, all of the same width
- The number of observations that fall within each interval is plotted
Relative frequency histogram
- Plot the proportions of observations that fall within the class intervals
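The binning step behind a histogram can be sketched in a few lines. This is an illustration under assumptions, not the procedure of any particular package: the data are hypothetical, the bins span exactly the observed range, and an observation on an interior bin edge is counted in the bin to its right.

```python
def histogram_counts(data, n_bins):
    """Count observations in n_bins equal-width intervals spanning the data range."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The maximum lies on the last edge, so it is placed in the last bin.
        index = min(int((x - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical measurements, for illustration only.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram_counts(data, n_bins=4)

# Relative frequencies: the proportion of observations in each interval.
relative = [c / len(data) for c in counts]

for left, right, c, r in zip(edges, edges[1:], counts, relative):
    print(f"[{left:.0f}, {right:.0f}): count={c}, proportion={r:.1f}")
```

Plotting `counts` against the intervals gives the histogram; plotting `relative` gives the relative frequency histogram, which has the same shape.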
47Wild & Seber (2000) Chance Encounters, Wiley.
50Comparison of methods
Histogram
- good at revealing distributional shape such as symmetry, skewness, number of peaks etc
- difficult to superimpose or draw side by side
Frequency polygons
- can be superimposed for easy comparison
51Wild & Seber (2000, p.59)
52Can be superimposed
Pagano & Gauvreau (1999)
53Wild & Seber (2000)
54Median and quartiles
Sort the data in increasing order
The median is the middle value (if n is odd) or the average of the two middle values (if n is even); it is a measure of the center of the data
- Quartiles divide the set of ordered values into 4 equal parts, each holding 25% of the observations
- Q1 = first quartile, Q2 = second quartile = median, Q3 = third quartile
- IQR = interquartile range = Q3 - Q1
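The median and quartiles can be computed with Python's standard `statistics` module, as sketched below on hypothetical data. Several conventions exist for locating quartiles between observations; the `"inclusive"` method used here is one of them, so results may differ slightly from a textbook hand calculation or from a package such as SPSS.

```python
import statistics

# Hypothetical observations, for illustration only.
data = [7, 1, 5, 3, 9, 2, 8, 4, 6]

ordered = sorted(data)               # sort in increasing order
median = statistics.median(ordered)  # middle value, since n = 9 is odd

# Three cut points dividing the ordered values into 4 parts.
q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
iqr = q3 - q1                        # interquartile range

# With an even number of values the median averages the two middle ones.
even_median = statistics.median([1, 2, 3, 4])

print(f"median={median}, Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
print(f"median of [1, 2, 3, 4] = {even_median}")
```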
55Box plot
- Draw a box from the lower quartile to the upper quartile and a line to mark the position of the median
- Extend lines from both edges of the box by 1.5 × IQR, then pull them back until they hit an observation
- Observations more than 1.5 × IQR away from the lower or upper quartile are marked as outside values for further investigation and checking
56How a boxplot is constructed (Wild & Seber, 2000, p.73)
5-Number Summary: min, lower quartile, median, upper quartile, max
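The construction can be sketched numerically without drawing anything. The example below uses hypothetical data and the `"inclusive"` quartile convention from Python's `statistics` module, which is an assumption: Wild & Seber or a given statistics package may locate the quartiles slightly differently. The function name and the returned keys are illustrative.

```python
import statistics


def boxplot_summary(data):
    """Five-number summary, whisker ends and outside values under the 1.5 x IQR rule."""
    ordered = sorted(data)
    q1, med, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers are pulled back to the most extreme observations inside the fences.
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outside = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "five_number": (ordered[0], q1, med, q3, ordered[-1]),
        "whiskers": (inside[0], inside[-1]),
        "outside": outside,
    }


# Hypothetical observations with one unusually large value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

Here the upper whisker stops at 8, the largest observation within 1.5 × IQR of the upper quartile, and 30 is flagged as an outside value to be checked.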
58Advantages of box plot
- quick visual summary of a data set
- captures prominent features like location, spread, skewness and outliers
- can easily draw a series of box plots side by side (not so for histograms)
59Dataset Hotdogs
60Graphical Analysis of the Hotdogs data.
61Parallel box plots can be quite revealing
Rice (1995) Mathematical Statistics and Data Analysis, Duxbury Press.
[Figure: parallel box plots; time axis labelled 1969 and 1972]
- Reduction in concentration through time
- Higher during winter months
- Skewed toward higher values
- Spread increases with level
(Parallel histograms are much harder to visualise)