Introduction to Biostatistics

About This Presentation

Title:

Introduction to Biostatistics

Description:

Title: Review of key biostatistical concepts relevant to EBM Author: Haroon Saloojee Last modified by: Wits-Admin Created Date: 7/11/2004 8:23:31 PM – PowerPoint PPT presentation

Slides: 69

Provided by: HaroonS

Transcript and Presenter's Notes

Title: Introduction to Biostatistics

1
Introduction to Biostatistics
Prof Haroon Saloojee Division of Community
Paediatrics
2
Introduction to Biostatistics: Lecture 1
Summarising your data 1
3
The evidence-based clinician's motto

In God we trust.
All others must bring data.

4
Challenges

Statistical ideas can be difficult and
intimidating
Thus
Statistical results are often skipped over when
reading scientific literature
Data is often misinterpreted

5
Misinterpretation of Data

Celebrating birthdays is healthy

Statistics show that those who celebrate the
most birthdays live the longest.
6
You may think that

A Bar Chart is a map of the locations of the
nearest taverns
A p-value is the result of a urinalysis
A t-test is a taste test between rooibos tea and
Five Roses tea

7
Course Structure

BIO-SADISTICS
Four 45-minute lectures
PowerPoint presentations on student web site
Some text (content) also on web page
Plus, additional internet links

8
Syllabus for the Course

SESSION 1 Summarizing your data 1
Types of data (quantitative and categorical
variables)
Describing data: averages (mean, median, and mode)
Displaying data graphically (box plots,
histograms, bar charts, pie diagrams)
Frequency distributions
SESSION 2 Summarizing your data 2
The normal distribution
Describing data spread (range, variance,
standard deviation, z score)
Quartiles, percentiles
Standard error of the mean
Confidence intervals

SESSION 3 Sampling principles
Study Population
The sample
Random sampling
Non random sampling
Sampling bias
Sample size and power
SESSION 4 Statistical tests and the concept of
significance
Hypothesis testing
p value
Statistical versus clinical significance
Parametric versus non-parametric methods

9
Free textbook on-line
Statistics at Square One
http://bmj.bmjjournals.com/collections/statsbk/index.shtml
10
http://www.medstatsaag.com/mcqs.asp
Relevant topics:
Handling data: 1, 4, 5, 6, 7
Sampling: 10, 11
Hypothesis testing: 17, 18
11
Today's Lecture

What types of data are there?
(numerical vs. categorical variables)
Describing data - measures of central tendency
(mean, median and mode)
Summarising data graphically (histograms, box
plots, bar charts, pie diagrams)

12
Types of data
13
Types of Data

Numerical data
Discrete
Examples
No. of children
No. of asthma attacks in a week
No. of rooms in home

14
Types of Data

Numerical data
Continuous
Any value on the continuum is possible (even
fractions or decimals)
Examples
Weight
Age
Temperature
Heart rate

15
Types of Data

Categorical data
Nominal
Mutually exclusive unordered categories
Examples
Sex (male, female)
Eye colour (brown, grey, green, blue)
Are you happy? (Yes, No)
Diarrhoea (Present, absent)
Can summarize in
Tables using counts and percentages
Bar Chart

16
Types of Data

Categorical data
Ordinal (ordered categories)
Examples
Degree of agreement
(Strongly Agree, Agree, Disagree, Strongly
disagree)
Severity of injury
Severe, Moderate, Mild
Income level
High, medium, low

17
PRACTICE
Discrete or continuous? Nominal or ordinal?

mg of tar in cigarettes: Continuous
number of people in a car: Discrete
high to low temperature in any day: Continuous
weight: Continuous
time: Continuous
number of children in the average family: Discrete

Average / above average / below average: Ordinal
Colours of Smarties: Nominal
Grades (A, B, C, D, F): Ordinal
(No Transcript)
19
Data Summaries

It is ALWAYS a good idea to summarise your data
You become familiar with the data and the
characteristics of the people that you are
studying
You can also identify problems or errors with the
data (data management issues).

20
Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)
Mean
Median
Mode

21
Definitions

The arithmetic mean is what is commonly called
the average. The mean is the sum of all the
scores divided by the number of scores.
The median is the middle of a distribution: half
the scores are above the median and half are
below the median.
The mode is the most frequently occurring score
in a distribution.

"It has been said that a fellow with one leg
frozen in ice and the other leg in boiling water
is comfortable on average."
- J.M. Yancy

23
Sample Mean X̄

The Average or Arithmetic Mean
Add up data, then divide by sample size (n)
The sample size n is the number of observations
(pieces of data)
Example
Systolic blood pressures (mmHg)
X1 = 120
X2 = 80
X3 = 90
X4 = 110
X5 = 95
n = 5
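
The calculation on this slide can be checked in a few lines of Python. This is an illustrative sketch only: the variable names are not from the lecture, and the values are the five blood pressures listed above.

```python
# Systolic blood pressures (mmHg) from the slide: X1 to X5
pressures = [120, 80, 90, 110, 95]

n = len(pressures)                 # sample size
sample_mean = sum(pressures) / n   # add up the data, then divide by n

print(f"n = {n}, sample mean = {sample_mean} mmHg")
```

The five values add up to 495, so the sample mean is 495 / 5 = 99 mmHg.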

24
Notation
Σ (sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x̄ is pronounced "x-bar" and denotes the mean of a set of sample values
µ is pronounced "mu" and denotes the mean of all values in a population

25
Definitions

Mean
the value obtained by adding the scores and
dividing the total by the number of scores

Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$
26
Notes on Sample Mean

Also called sample average or arithmetic mean
Sensitive to extreme values
- One data point could make a great change in
sample mean
Why is it called the sample mean?
To distinguish it from population mean
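
To see how sensitive the mean is to a single extreme value, the sketch below reuses the five blood pressures from the earlier example and changes one of them. The "data-entry error" (120 recorded as 220) is a made-up illustration, not something from the lecture data.

```python
from statistics import mean, median

pressures = [120, 80, 90, 110, 95]    # values from the earlier slide
# Hypothetical data-entry error: 120 typed as 220
with_error = [220, 80, 90, 110, 95]

original_mean, original_median = mean(pressures), median(pressures)
shifted_mean, shifted_median = mean(with_error), median(with_error)

print("original:  ", original_mean, original_median)
print("with error:", shifted_mean, shifted_median)
```

One changed data point moves the mean from 99 to 119 mmHg, while the median stays at 95 mmHg.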

27
Population Versus Sample

Population - The entire group you want
information about
For example: the blood pressure of all
20-year-old male university students in South
Africa
Sample - A part of the population from which we
actually collect information and draw conclusions
about the whole population
For example: a sample of blood pressures (n = 50)
of 20-year-old male university students in South
Africa
The sample mean X̄ is not the population mean µ

28
Population Versus Sample

We don't know the population mean µ but would
like to know it
We draw a sample from the population
We calculate the sample mean X̄
How close is X̄ to µ?
Statistical theory will tell us how close X̄ is to
µ
Statistical inference is the process of trying to
draw conclusions about the population from the
sample
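
A small simulation can make the population-versus-sample idea concrete. Everything in this sketch is an assumption for illustration: the population is simulated rather than real, and the centre (120 mmHg) and spread (15 mmHg) are arbitrary choices. In practice the population mean is unknown, which is the whole point of drawing a sample.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated systolic pressures (mmHg).
# The centre (120) and spread (15) are assumptions, not real data.
population = [rng.gauss(120, 15) for _ in range(10_000)]
mu = mean(population)                 # population mean (normally unknown)

sample = rng.sample(population, 50)   # simple random sample, n = 50
x_bar = mean(sample)                  # sample mean

print(f"population mean = {mu:.1f}, sample mean = {x_bar:.1f}")
print(f"difference = {x_bar - mu:.1f}")
```

Rerunning with a different seed gives a different sample mean each time; how much it varies around the population mean is what the standard error (Session 2) describes.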

29
Weighted Mean
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
Your grades in many courses are weighted means
(averages). In other words, some things count
(are weighted) more than others.
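
As a rough sketch of the weighted mean, the function below multiplies each value by its weight, adds the products, and divides by the sum of the weights. The function name, the marks, and the weights are all hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical course: two tests and an exam that counts double
marks = [60, 70, 80]
weights = [1, 1, 2]

print(weighted_mean(marks, weights))     # exam weighted twice
print(weighted_mean(marks, [1, 1, 1]))   # equal weights: ordinary mean
```

With the exam counting double the result is (60 + 70 + 160) / 4 = 72.5, compared with an ordinary mean of 70.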
30
Geometric Means
These are histograms rotated 90º, and box
plots. Note how the log transformation gives a
symmetric distribution.
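
The link between the log transformation and the geometric mean can be sketched as follows: take logs, average them, and transform back. The three values are hypothetical and chosen only because the arithmetic is easy to follow.

```python
import math
from statistics import mean

# Hypothetical right-skewed measurements
values = [1, 10, 100]

logs = [math.log(v) for v in values]    # log-transform each value
geometric_mean = math.exp(mean(logs))   # back-transform the mean of the logs
arithmetic_mean = mean(values)

print(round(geometric_mean, 6), arithmetic_mean)
```

Here the geometric mean is 10 (up to floating-point rounding), while the arithmetic mean of 37 is pulled upwards by the largest value.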
31
(No Transcript)
32

Median

5 5 5 3 1 5 1 4 3 5 2
1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle: MEDIAN is 4

1 1 3 3 4 5 5 5 5 5
no exact middle: shared by two numbers
MEDIAN is (4 + 5) / 2 = 4.5
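
The two cases on this slide (an odd and an even number of values) can be written as a short function. This is a minimal sketch; the function name is made up, and Python's standard library already offers `statistics.median` for real use.

```python
def median_of(values):
    """Return the middle value of the sorted data (mean of the two middle values if n is even)."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # middle shared by two numbers

odd_case = median_of([5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2])   # 11 values
even_case = median_of([1, 1, 3, 3, 4, 5, 5, 5, 5, 5])     # 10 values
print(odd_case, even_case)
```

The results match the slide: 4 for the eleven values and 4.5 for the ten values.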
33
Mode

The score that occurs most frequently
Bimodal
Multimodal
No Mode
The only measure of central tendency that can be
used with nominal data

34
Examples

a. 5 5 5 3 1 5 1 4 3 5: Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal (2 and 6)
c. 2 3 6 7 8 9 10: No mode
d. 2 2 3 3 3 4: Mode is 3
e. 2 2 3 3 4 4 5 5: No mode
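
The five examples can be reproduced with a small counting function. One assumption is made explicit here: when every distinct value occurs equally often, the function reports "no mode" (an empty list), which is the convention these examples appear to follow. The function name is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Assumed convention: if all distinct values tie, there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}
for label, data in examples.items():
    print(label, modes(data))
```

Note that `statistics.multimode` in the standard library uses a different convention: it returns every tied value rather than an empty list.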
35
Shapes of the Distribution
36
Shapes of the Distribution
37
Distribution Characteristics
38
Shapes of the Distribution
Example: Height of students in the class
39
Shapes of the Distribution
Example: Serum cholesterol level
40
Shapes of the Distribution
Example: Birth weight of newborn babies
41
Shapes of the Distribution
42
(No Transcript)
43
Some visual ways to summarize data

Tables
Frequency table
Graphs
Histograms
Bar graphs
Box plots
Line plots
Scatter graphs
Charts
Bar chart
Pie diagram

44
Frequency Tables

Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and
create categories with it
How do you create categories for a continuous
variable?
Choose cutoffs that are biologically meaningful
Natural breaks in the data

45
Example of frequency table
When raw data are arranged with frequencies, they
are said to form a frequency table for ungrouped
data. When the data are divided into groups/
classes, they are called grouped data. The
classes have to be decided according to the range
of data and size of class. The number of
observations lying in a particular class is
called its frequency and the table showing
classes with frequencies is called a frequency
table. The total of frequencies of a particular
class and of all classes prior to that class is
called the cumulative frequency of that class.
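
A grouped frequency table with cumulative frequencies might be built like this. The sketch assumes whole-number data and classes of equal width; the ages and the function name are invented for illustration, and classes with no observations are simply left out.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values, class_width):
    """Return rows of (lower, upper, frequency, cumulative frequency) for integer data."""
    # Each value is assigned to the class starting at a multiple of class_width.
    counts = Counter((v // class_width) * class_width for v in values)
    lowers = sorted(counts)
    freqs = [counts[lo] for lo in lowers]
    cumulative = list(accumulate(freqs))   # running total of frequencies
    return [(lo, lo + class_width - 1, f, c)
            for lo, f, c in zip(lowers, freqs, cumulative)]

# Hypothetical ages in years
ages = [21, 25, 34, 38, 39, 42, 47, 55]
table = frequency_table(ages, 10)

print("Class   Freq  Cum. freq")
for lower, upper, freq, cum in table:
    print(f"{lower}-{upper}   {freq:>4}  {cum:>9}")
```

The cumulative frequency of the last class equals the total number of observations, which is a quick check that no value was dropped.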
46
Graphical Summaries

Histograms
Continuous or ordinal data on horizontal axis
Bar Graphs
Nominal data
No order to horizontal axis
Box Plots
Continuous data

47
Histogram
A histogram is a graphic representation of the
frequency distribution of a variable. Vertical
rectangles (bars) are drawn in such a way that
their bases lie on a linear scale representing
different intervals, and their heights are
proportional to the frequencies of the values
within each of the intervals.
48
Bar Chart
A bar chart is a method of presenting discrete
data organized in such a way that each
observation falls into exactly one of several
mutually exclusive categories. The frequencies (or
percentages) are listed along the Y axis and the
categories of the variable along the X axis. The
heights of the bars correspond to the
frequencies. The bars should be of equal width
and they should not touch the other bars.
49
Difference between bar chart and histogram

Bar charts are for categories that are separate
Histograms are for categories obtained by dividing up
continuous data.
Bars do not touch; histogram rectangles do touch.

50
Line graph
If the mid-points of the top of the bars of a
histogram are connected together by a line and if
the bars were omitted from the display, the
resultant graph will be a line graph (also called
a frequency polygon). Line graphs are good at
showing trends over a period of time. When trends
of rates (e.g. death rate, Infant Mortality Rate,
etc.) are to be displayed it is better done with
line graphs rather than histograms.
51
Scatter plot
Also called a scattergram. This is a method of
displaying the distribution of two variables in
relation to each other. The values of one
variable are measured on the X axis and the
values of the other on the Y axis. The variables
have to be on a continuous scale. Each point thus
has two values (coordinates) from the Y and X
axis scales. A wide scatter of the points denotes
poor correlation between the two variables. If
the two variables are perfectly correlated, then
all the points will fall on a straight line
(the regression line).
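
How tightly the points cluster around a line is usually summarised with a correlation coefficient. The sketch below computes Pearson's r by hand for two hypothetical data sets; the function name and the numbers are illustrative only, and the lecture itself does not define the coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
on_a_line = [3, 5, 7, 9, 11]   # hypothetical: y = 2x + 1 exactly
scattered = [5, 3, 9, 4, 7]    # hypothetical: widely scattered points

r_line = pearson_r(xs, on_a_line)
r_scattered = pearson_r(xs, scattered)
print(round(r_line, 3), round(r_scattered, 3))
```

Points that fall exactly on a rising straight line give r = 1; the scattered set gives a value much closer to 0.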
52
Survival curve
53
Pie chart
This is a circular diagram (can be shown as 2-D
or 3-D) divided into segments, each representing
a category or subset of data (part of the whole).
The amount for each category is proportional to
the area of the sector (slice of the pie). The
total area of the circle is 100% and it
represents the total population that is being
shown.
54
Pictures of Data: Continuous Variables

Histograms
Means and medians do not tell whole story
Differences in spread (variability)
Differences in shape of the distribution

55
How to Make a Histogram

Divide range of data into intervals (bins) of
equal width
Count the number of observations in each class
Draw the histogram
Label scales
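
The first two steps above (equal-width intervals, then counting) can be sketched without any plotting library. The function name and the data are hypothetical, and the rule that the largest value goes into the last bin is one common convention rather than the only one.

```python
def histogram_counts(values, n_bins):
    """Split the data range into n_bins equal-width intervals and count observations."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are equal, so the range has zero width")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical lengths of stay (days)
stays = [1, 2, 2, 3, 3, 3, 4, 5, 7, 9]
edges, counts = histogram_counts(stays, 4)

# Crude text histogram: one '#' per observation
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:>4.1f} to {right:>4.1f} | {'#' * count}")
```

Changing `n_bins` changes the apparent shape of the distribution, so it is worth trying more than one value.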

56
Pictures of Data: Histograms
57
Pictures of Data: Histograms
58
Pictures of Data: Histograms
59
Box plot

Another common visual display tool is the box
plot
Gives good insight into distribution shape in
terms of skewness and outlying values
Very nice tool for easily comparing distributions
of continuous data in multiple groups, which can be
plotted side by side

60
Box plot
A box plot provides an excellent visual summary
of many important aspects of a distribution. The
box stretches from the lower hinge (defined as
the 25th percentile) to the upper hinge (the 75th
percentile) and therefore contains the middle
half of the scores in the distribution. The
median is shown as a line across the box.
Therefore 1/4 of the distribution is between this
line and the top of the box and 1/4 of the
distribution is between this line and the bottom
of the box.
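
The numbers a box plot is drawn from can be computed with the standard library. This sketch uses `statistics.quantiles` with the "inclusive" method; other software may use slightly different rules for the 25th and 75th percentiles, so hinges can differ a little between packages. The data and the function name are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

# Hypothetical lengths of stay (days)
stays = [1, 2, 3, 4, 5, 6, 7, 8, 9]
lowest, q1, med, q3, highest = five_number_summary(stays)

print("box from", q1, "to", q3, "with median line at", med)
print("range of data:", lowest, "to", highest)
```

For these nine values the box runs from 3 to 7 with the median at 5, so the box holds the middle half of the data.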
61
Hospital Length of Stay
62
Box plot: Length of Stay
63
Box plot: Length of Stay
64
Misuse of graphics

" It pays to be wide awake in studying any graph.
The thing looks so simple, so frank, and so
appealing that the careless are easily fooled."
- M J Moroney.
Graphs and charts are often misused. The honest
researcher must have a good handle on how graphs
can be used to deliberately mislead people so
that such misadventures can be avoided.
Common tricks used to mislead
The problem of scaling
The Advertiser's Graph
The transformed graph
The chart with too much data

65
Which graph to use?
Statistical methods depend on the form of a set
of data, which can be assessed with some common
useful graphics:

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
66
Example of MCQ 1

The arithmetic mean of a set of values
a) Is a particular type of average.
b) Is a useful summary measure of location if the
data are skewed to the right.
c) Coincides with the median if the distribution
of the data is symmetrical.
d) Is always greater than the median.
e) Cannot be calculated if the data set contains
both positive and negative values.

67
Example of MCQ 2

A histogram
a) Can be used instead of a pie chart to display
categorical data.
b) Is similar to a bar chart but there are no
gaps between the bars.
c) Contains contiguous bars, with the height of
each bar being proportional to the frequency of
the observations in the range specified by the
bar.
d) Can be used to display either a frequency or a
relative frequency distribution.
e) Is used to show the relationship between two
variables.

68
Any questions?
