Title: Mahasin Mujahid
1 Biostatistics Academic Preview Session 1
Introduction to Biostatistics
- Mahasin Mujahid
- Doctoral Candidate
- Department of Epidemiology
2 Outline
- Survey
- Math refresher quiz
- Introduction to Statistics
  - The what and why of biostatistics
  - Misconceptions and misuses of biostatistics
  - How to properly use biostatistics
  - Overview of BIOS 503/553
- Study Design
  - Types of epidemiologic studies
  - Statistical Sampling
- Probability
  - Basic definitions and rules of probability
  - Commonly used probability distributions
3 What is biostatistics?
- Statistics is the science and art of collecting, summarizing, and analyzing data that are subject to random variation.
- Biostatistics is the application of statistics and mathematical methods to the design and analysis of health, biomedical, and biological studies.
4 Statistics is an art
- Whether you apply statistics to biological or
other processes, it is the art of decision making
in the face of uncertainty.
5 Components of Biostatistics
6 Why biostatistics?
- It's a required element of your training here at the University of Michigan School of Public Health.
- As a public health professional, you will either do statistical analysis yourself or hire a biostatistician.
- If you don't know and understand it, there will be consequences.
7 What biostatistics has to offer
- Help in developing concrete objectives and data acquisition methods that meet the objectives
- Appropriate experimental and study design
- Sources of bias
- Measurement issues
- Efficiency/power
- Maximizing use of a given number of subjects
- Interpretability of findings
- Reproducibility of analyses
8 What biostatistics has to offer (cont.)
- Increase the likelihood that the sample will yield estimates of adequate precision to make experiments conclusive or to affect medical practice
- More efficient use of the data
- Formulate analysis plans without making inappropriate assumptions
- Estimate sample size
9 What biostatistics has to offer (cont.)
- Place limits on the effects of chance in small-sample experiments
- Determine the sample size needed to detect clinically relevant effects
- Control for confounding variables
- Measure intangible information such as intelligence, depression, well-being, and stress
10 Example
- How are my 10 patients doing after I put them on antihypertensive medication?
  - Describe the results of your 10 patients.
- How do patients with high blood pressure respond after being on antihypertensive medication (for some follow-up period)?
  - Take a sample of patients of interest to approximate what would be observed had all such patients been treated that way.
11 Example
- What is the in-hospital mortality rate after open-heart surgery at my hospital so far this year?
  - Describe the mortality.
- What is the in-hospital mortality after open-heart surgery likely to be this year, given results from last year?
  - Estimate the probability of death for patients like those seen in the previous year.
12 The bottom line
- When analyzing data, your goal is simple: you wish to make the strongest possible conclusions from limited amounts of data.
- How does one achieve this goal?
13 Biostatistics and Public Health
Diagram: Epidemiology, Biostatistics, Health Management and Policy, Health Behavior and Health Education, and Environmental Health Sciences.
14 Misuse of statistics
- About 25% of biological research is flawed because of incorrect conclusions drawn from confounded experimental designs and misuse of statistical methods.
15 Misuse
- Statistics is neither your worst nightmare nor the answer to all of your problems.
- Statistical significance does not equal clinical or theoretical significance.
16 How to properly use biostatistics
- Develop an underlying question of interest
- Generate a hypothesis
- Design a study
- Collect data
- Analyze data
  - Descriptive statistics
  - Statistical inference
17 The Role of Statistics in the Scientific Method
Flowchart boxes:
- Define problems, questions, and research aims
- Review the literature
- Statistical methods, measurement tools, and models
- Develop a hypothesis
- Design experiments or other tests
- Revise or modify protocol
- Collect and record data
- Peer review
- Replication of results
- Analyze and interpret data
- Public understanding of research
- Scientific impact of research
- Disseminate results
Source: Investigating Research Integrity (2001)
18 Screening Statistics
- Who said so?
- How do they know?
- What's missing?
- Does the conclusion make sense?
- Association does not establish causation.
19 How to Lie with Statistics by Darrell Huff
- "The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify."
20 A little humor!!!!!
- "There are three kinds of lies: lies, damned lies, and statistics." (attributed to Benjamin Disraeli)
- "Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital." (Aaron Levenstein)
21 A little humor (cont.)!!!!!
- Top ten reasons to be a statistician
  - Estimating parameters is easier than dealing with real life.
  - Statisticians are significant.
  - I always wanted to learn the entire Greek alphabet.
  - The probability a statistics major will get a job is > .9999.
  - If I flunk out I can always transfer to Engineering.
  - We do it with confidence, frequency, and variability.
  - You never have to be right, only close.
  - We're normal and everyone else is skewed.
  - The regression line looks better than the unemployment line.
  - No one knows what we do, so we are always right.
- http://www.workjoke.com/projoke48.htm
22 Choosing BIOS 503/553
- Biostatistics 503 and 553 are both introductory
courses for non-majors that assume no prior
course work in biostatistics or statistics.
Either course satisfies the school-wide
requirement for biostatistics; however, some
departments require 553 instead of 503. While
both courses cover the same statistical topics
and methods, 553 assumes stronger preparation in
mathematics and goes into greater depth in
statistics. The prerequisite for 503 is
elementary (high school) algebra. Students taking
553 need to have had one term of calculus and be
comfortable with function notation and algebra.
The stronger mathematical prerequisite for 553
allows time for more detailed study. Students who
have satisfied the calculus prerequisite are
strongly encouraged to enroll in 553.
Biostatistics 503 and 553 are offered only in the
fall term.
23 Biostatistics 503: Applied Biostatistics Course Outline, Fall 2003
Lecture: Four days a week, M-Th, SPH II Auditorium. Section 1 meets 8-9 am, Section 2 9-10 am. Students may attend either lecture regardless of how they are registered.
Lab: Meets once per week. Students must attend the lab they are registered for. Labs start Thursday, September 4.
Required texts: Introduction to the Practice of Statistics, 4th Edition, Moore & McCabe; CoursePack Computer Lab Manual, available at Ulrich's.
Recommended text: SPSS Manual for Moore & McCabe's Introduction to the Practice of Statistics, 4th Ed., Rogness, Stephenson & Stephenson.
Prerequisites: Knowledge of algebra (GRE/GMAT quantitative score above the 50th percentile, or passing the algebra placement exam).
Calculator: square root, log (natural), exponential, y to the x.
Web site: We make extensive use of a Coursetools web site. Go to www.coursetools.ummu.umich.edu, then go to Biostatistics 503. The site is also linked from my homepage, www.sph.umich.edu/nichols.
24 Final Grade
The final grade is an equal weighting of homework and labs (25%) and each of the three exams (3 x 25%).
Exams/Grading    Wt.    Notes
Homework/labs    25%    Handed in every week
Exam 1           25%    Covers weeks 1-5
Exam 2           25%    Covers weeks 6-10
Final            25%    Comprehensive, but emphasis on weeks 11-15
25 BIOS 503 Outline
Introduction to the Practice of Statistics, 4th Edition, Moore & McCabe
26 Success in Biostatistics 503/553
- Do's
  - Do required reading before class
  - Attend all lectures
  - Get very familiar with your calculator
  - Write out all given information when working out a problem
  - Ask questions!!!!!!!!
- Don'ts
  - Skip class
  - Skip reading assignments
  - Skip homework assignments
  - Rely only on the lecture to prepare for exams
  - Wait until after the 1st exam to panic
27 Study Design
28 Designing a study
- Statistical designs for producing trustworthy
data are perhaps the single most influential
contribution of statistics to the advancement of
knowledge (Moore and McCabe 1993)
29 Statistical Considerations in Study Design
- Type of study design
  - Experiment
  - Observational study
- Sample size/power analysis
  - How many individuals to include in your study
- Sampling techniques
  - How to identify a sample of individuals to include in your study
30 Epidemiologic Study Designs
- Observational studies
  - Descriptive: case report, case series, cross-sectional, correlative
  - Analytic: cohort, case-control
- Experimental studies: laboratory, clinical trials, field trials, intervention trials
31 Sampling
- The purpose of sampling is to examine some
portion of the population and to extend the
knowledge obtained from the sample to the
population at large.
32 Sampling (cont.)
- It may not be practical or feasible to analyze the entire population
  - Physically impossible
  - Ethical considerations
33 The language of sampling
- Population: the entire collection of things of interest
- Population parameter: a number that results from measuring all the units in the population
- Sampling frame: the specific data from which the sample is drawn
- Unit of analysis: the type of object of interest
  - (persons with condition x, animals, genes/cells)
- Sample: a subset of some of the units in the population
- Statistic: a number that results from measuring all the units in the sample
34 Relationship between population and sample
35 Example
- To find out the average age of all motor vehicles in the state in 1997:
  - Population: all motor vehicles in the state in 1997
  - Sampling frame: all motor vehicles registered with the DMV on July 1, 1997
  - Unit of analysis: motor vehicle
  - Sample: 300 motor vehicles
  - Statistic: the average age of the 300 motor vehicles in the sample
  - Parameter: the true average age of all motor vehicles in the state in 1997
36 Sampling Techniques
Figure: samples drawn from a population by simple random sampling, systematic sampling, stratified random sampling, convenience sampling, and cluster sampling, each labeled as either a bias-free or a biased sample.
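To make these techniques concrete, here is a minimal Python sketch using only the standard `random` module. The sampling frame of 3000 unit IDs, the three regional strata, the 30 clusters of 100 units, and all function names are hypothetical, chosen only so that each method draws 300 units (the sample size in the motor vehicle example). Convenience sampling is included as the contrast: it takes whatever units are easiest to reach, so nothing guarantees it represents the population.

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of n units has the same chance of being selected.
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    # Take every k-th unit after a random start within the first interval.
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    # Draw a separate simple random sample within each stratum.
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every unit in them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def convenience_sample(population, n):
    # Not random: simply takes the first n units that are easiest to reach.
    return population[:n]

rng = random.Random(503)            # fixed seed so the sketch is reproducible
frame = list(range(1, 3001))        # hypothetical sampling frame of 3000 unit IDs
strata = {"north": frame[:1000], "central": frame[1000:2000], "south": frame[2000:]}
clusters = {i: frame[i * 100:(i + 1) * 100] for i in range(30)}

srs = simple_random_sample(frame, 300, rng)
sys_s = systematic_sample(frame, 300, rng)
strat = stratified_sample(strata, 100, rng)
clus = cluster_sample(clusters, 3, rng)
conv = convenience_sample(frame, 300)
print(len(srs), len(sys_s), len(strat), len(clus), len(conv))  # 300 300 300 300 300
```

Only the first four methods give every unit a known chance of selection; the convenience sample here is always the first 300 IDs, whatever the population looks like.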
37 Bias
- Any trend in the collection, analysis, interpretation, publication or review of data that can lead to conclusions that are systematically different from the truth (Las, 2001)
- A systematic error in design or conduct of a study (Szklo et al., 2000)
38 Handling bias
- Design stage
  - Choosing a strong study design (RCT)
- Selection stage
  - Random sampling, matched pairs
- Measurement stage
  - Blinding
- Analysis stage
  - Multivariate analysis
39 Probability
40 Consider the following scenario
- One theory concerning the etiology of breast cancer states that white women are at greater risk of developing breast cancer.
- Suppose we wish to test this hypothesis. We identify 1500 women (aged 45-50 years) free of breast cancer at baseline:
  - 500 white
  - 500 African American
  - 500 Asian
41 Scenario (cont.)
- We follow the women for 10 years and observe:
  - 20 cases of breast cancer (white women)
  - 15 cases of breast cancer (African American women)
  - 10 cases of breast cancer (Asian women)
- Is this difference among the groups enough to make you conclude that white women are at greater risk?
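One way to make this question precise, sketched in Python under stated assumptions: treat the 1500 women as independent, assume complete 10-year follow-up, and compare the three groups with Pearson's chi-square test. That test is not introduced on these slides; it is used here only to illustrate how probability can weigh chance as an explanation. The observed 10-year risks are 20/500 = 0.04, 15/500 = 0.03, and 10/500 = 0.02.

```python
import math

# Counts from the scenario (500 women per group, followed for 10 years).
cases = {"white": 20, "African American": 15, "Asian": 10}
n_per_group = 500

# 10-year cumulative incidence (risk) in each group.
risk = {group: c / n_per_group for group, c in cases.items()}

# Pearson chi-square test on the 3 x 2 table (case / non-case by group).
total_cases = sum(cases.values())
total_n = n_per_group * len(cases)
expected_cases = n_per_group * total_cases / total_n     # same for every group
expected_noncases = n_per_group - expected_cases
chi2 = sum(
    (c - expected_cases) ** 2 / expected_cases
    + ((n_per_group - c) - expected_noncases) ** 2 / expected_noncases
    for c in cases.values()
)

# With (3 - 1) x (2 - 1) = 2 degrees of freedom, the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(risk)                              # {'white': 0.04, 'African American': 0.03, 'Asian': 0.02}
print(round(chi2, 3), round(p_value, 3))  # 3.436 0.179
```

In this sketch, differences at least this large would arise about 18% of the time by chance alone if all three groups had the same risk, so these counts by themselves do not rule chance out.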
42 And the answer is
- Probability can help you rule out chance as an explanation.
- Definitions
  - Probability: the measure of how likely it is that an event will occur
  - Sample space: all possible outcomes
  - Event: an outcome (or set of outcomes) of interest
43 Venn diagram for event E
Figure: event E shown as a region inside the sample space.
44 Basic Properties of Probabilities
Property 1: The probability of an event is always between 0 and 1: $0 \le P(E) \le 1$.
Property 2: The probability of an event that cannot occur is 0. (An event that cannot occur is called an impossible event.)
Property 3: The probability of an event that must occur is 1. (An event that must occur is called a certain event.)
45 Two events A and B are mutually exclusive (or disjoint) if they cannot both happen at the same time. When A and B are mutually exclusive, the probability that A or B occurs is
$P(A \cup B) = P(A) + P(B)$
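A small check of the addition rule, assuming one roll of a fair six-sided die; the die, the events, and the helper `prob` are illustrative and not from the slides. Because A = {1, 2} and B = {5} share no outcomes, the probability of A or B equals P(A) + P(B).

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative assumption).
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    # Equally likely outcomes: P(E) = (outcomes in E) / (outcomes in the sample space).
    return Fraction(len(event & sample_space), len(sample_space))

A = {1, 2}   # roll a 1 or a 2
B = {5}      # roll a 5
assert not (A & B)                       # no shared outcomes: mutually exclusive
print(prob(A | B), prob(A) + prob(B))    # 1/2 1/2
```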
46 When two events A and B can both occur simultaneously, the two events are not mutually exclusive.
- If $P(A \cap B) = 0$, then events A and B are mutually exclusive.
- If $P(A \cap B) \neq 0$, then A and B are not mutually exclusive.
47 The complement of an event E, written $E^c$, is the event that E does not occur. Its probability is
$P(E^c) = 1 - P(E)$
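The complement rule is most useful when the complement is easier to count than the event itself. A sketch, assuming two rolls of a fair die (an illustrative example, not from the slides): "at least one six" covers 11 of the 36 equally likely outcomes, and its complement "no six" covers 25, so 1 - 25/36 gives the same 11/36.

```python
from fractions import Fraction
from itertools import product

# Sample space for two rolls of a fair die: 36 equally likely ordered pairs (assumption).
sample_space = list(product(range(1, 7), repeat=2))

def prob(condition):
    # Fraction of equally likely outcomes for which the condition holds.
    hits = sum(1 for outcome in sample_space if condition(outcome))
    return Fraction(hits, len(sample_space))

def at_least_one_six(outcome):
    return 6 in outcome

def no_six(outcome):              # the complement of "at least one six"
    return 6 not in outcome

print(prob(at_least_one_six), 1 - prob(no_six))   # 11/36 11/36
```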
48 The multiplicative law of probability
- Two events A and B are said to be independent if the occurrence of A has no effect on the probability that B occurs.
- If A and B are independent, $P(A \cap B) = P(A) \times P(B)$
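A sketch of the multiplicative law, assuming a single draw from a standard 52-card deck (13 ranks by 4 suits, no jokers); the helper names are illustrative. Being a king and being a heart are independent here: knowing the card is a heart leaves the chance it is a king at 1/13, so P(king and heart) = 1/13 × 1/4 = 1/52.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck (assumed): 13 ranks x 4 suits, each card equally likely.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(condition):
    # Fraction of the 52 cards for which the condition holds.
    return Fraction(sum(1 for card in deck if condition(card)), len(deck))

def is_king(card):
    return card[0] == "K"

def is_heart(card):
    return card[1] == "hearts"

p_both = prob(lambda card: is_king(card) and is_heart(card))
print(p_both, prob(is_king) * prob(is_heart))   # 1/52 1/52
```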
49 Example 1: Consider a deck of playing cards
50 P(king is selected)
51 P(face card is selected)
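Assuming a standard 52-card deck with no jokers, and counting jacks, queens, and kings as face cards, both probabilities in this example follow from counting equally likely cards: there are 4 kings and 12 face cards among 52. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck (assumed): 13 ranks x 4 suits, no jokers.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))
face_ranks = {"J", "Q", "K"}   # face cards: jack, queen, king

p_king = Fraction(sum(1 for rank, suit in deck if rank == "K"), len(deck))
p_face = Fraction(sum(1 for rank, suit in deck if rank in face_ranks), len(deck))
print(p_king, p_face)  # 1/13 3/13
```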
52 Probability distributions
- Probability distributions are fundamental to the practice of statistics
- Used in descriptive statistics (e.g., interpreting the mean often assumes a normal distribution)
- Used in inferential statistics
  - Estimation (e.g., constructing confidence intervals)
  - Inference (calculating test statistics and p-values for hypothesis testing)
53 Probability distribution
- A probability distribution describes the possible outcomes in a sample space and how often they occur
- For a random variable, a probability distribution describes the possible values of the variable and how likely they are
54 Random Variables
A random variable, x, is the numerical outcome of a probability experiment. Examples:
- x = the number of people in a hospital
- x = the time it takes to exercise
- x = the number of trips to the doctor you make per year
55 Types of Random Variables
A random variable is discrete if the number of
possible outcomes is finite or countable.
Discrete random variables are determined by a
count.
A random variable is continuous if it can take on
any value within an interval. The possible
outcomes cannot be listed. Continuous random
variables are determined by a measure.
56 Types of Random Variables
Identify each random variable as discrete or continuous.
- x = the number of people in a car
- x = the time it takes to drive from home to school
- x = the number of trips to school you make per week
57 Types of probability distributions
- A probability distribution can be either discrete or continuous, depending on the type of random variable it represents.
  - Discrete: binomial distribution, Poisson distribution
  - Continuous: normal distribution, t or F distribution, chi-square distribution
58 Discrete Probability Distributions
A discrete probability distribution lists each possible value of the random variable, together with its probability.
Example: A survey asks a sample of families how many vehicles each owns.
Table: number of vehicles owned and the probability of each.
59 Example 1
- Suppose that a coin is tossed twice, so that the sample space is S = {HH, HT, TH, TT}. Let X (the discrete random variable) be the number of heads that come up. Write out the probability distribution for the random variable X.
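A sketch of how to build this distribution by enumeration, assuming a fair coin so that the four outcomes HH, HT, TH, TT are equally likely: count the heads in each outcome, then tally. It gives P(X = 0) = 1/4, P(X = 1) = 1/2, and P(X = 2) = 1/4.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT, each with probability 1/4.
sample_space = ["".join(toss) for toss in product("HT", repeat=2)]

# X = number of heads; tally how many equally likely outcomes give each value of X.
counts = Counter(outcome.count("H") for outcome in sample_space)
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
# P(X = 0) = 1/4
# P(X = 1) = 1/2
# P(X = 2) = 1/4
```

The probabilities sum to 1, as every discrete probability distribution must.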
60 Binomial Experiments
Characteristics of a binomial experiment:
- There is a fixed number of trials (n).
- The n trials are independent and repeated under identical conditions.
- Each trial has 2 outcomes: S = success or F = failure.
- The probability of success on a single trial is p: P(S) = p.
- The probability of failure is q: P(F) = q, where p + q = 1.
- The central problem is to find the probability of x successes out of n trials, where x = 0, 1, 2, ..., n.
The random variable x is a count of the number of successes in n trials.
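For a binomial experiment, the probability of exactly x successes is P(X = x) = C(n, x) p^x q^(n - x), where C(n, x) counts the ways to choose which x of the n trials are successes. A minimal Python sketch of that formula (the function name `binomial_pmf` is illustrative); with n = 2 and p = 1/2 it reproduces the two-coin-toss distribution from the example above.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for x successes in n independent trials with success probability p."""
    q = 1 - p                      # probability of failure, so p + q = 1
    return comb(n, x) * p**x * q**(n - x)

# Two tosses of a fair coin (n = 2, p = 1/2): matches the coin-toss example.
coin = [binomial_pmf(x, 2, Fraction(1, 2)) for x in range(3)]
print(coin)  # [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# For any n and p, the probabilities over x = 0, 1, ..., n sum to 1.
total = sum(binomial_pmf(x, 10, Fraction(3, 10)) for x in range(11))
print(total)  # 1
```

Exact `Fraction` arithmetic keeps the check that the probabilities sum to 1 free of rounding error; plain floats work too for everyday use.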
61 Continuous Probability Distributions
A continuous probability distribution provides a
shape describing the distribution of a continuous
random variable X. Thus X can take on a range of
values within an interval.
62 Normal Distribution
63 Normal distribution
- Bell-shaped
- Symmetrical about the mean
- Total area under the curve = 1
- Approximately 68% of the distribution is within one standard deviation of the mean
- Approximately 95% of the distribution is within two standard deviations of the mean
- Approximately 99.7% of the distribution is within three standard deviations of the mean
- Mean = median = mode
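The 68%, 95%, and 99.7% figures can be checked numerically. A sketch using Python's standard-library `statistics.NormalDist` (available since Python 3.8): the area within k standard deviations is the same for every normal distribution, so the standard normal is enough.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the normal curve within k standard deviations of the mean.
areas = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {area:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```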
64 Empirical Rule
65 The Standard Score
The standard score, or z-score, represents the number of standard deviations a random variable x falls from the mean: $z = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean and $\sigma$ is the standard deviation.
Exercise: The test scores for a civil service exam are normally distributed with a mean of 152 and a standard deviation of 7. Find the z-score for a person with a score of (a) 161, (b) 148, (c) 152.
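A minimal sketch of the z-score formula applied to the civil service exam exercise; the helper name `z_score` is illustrative. A score of 161 lies about 1.29 standard deviations above the mean, 148 lies about 0.57 below it, and 152 sits exactly at the mean.

```python
def z_score(x, mean, sd):
    # Number of standard deviations that x lies above (+) or below (-) the mean.
    return (x - mean) / sd

mean, sd = 152, 7          # civil service exam from the exercise
for score in (161, 148, 152):
    print(score, round(z_score(score, mean, sd), 2))
# 161 1.29
# 148 -0.57
# 152 0.0
```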
66 The Standard Normal Distribution
The standard normal distribution has a mean of 0 and a standard deviation of 1.
Using z-scores, any normal distribution can be transformed into the standard normal distribution.
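A sketch of what this transformation buys you, using `statistics.NormalDist` and the civil service exam (mean 152, standard deviation 7) from the previous exercise: the area below a score of 161 on the original scale equals the area below its z-score on the standard normal scale, so a single standard normal table serves every normal distribution.

```python
from statistics import NormalDist

exam = NormalDist(mu=152, sigma=7)   # exam score distribution from the exercise
standard = NormalDist(mu=0, sigma=1)

x = 161
z = (x - exam.mean) / exam.stdev     # standardize: about 1.29
p_original = exam.cdf(x)             # P(score <= 161) on the original scale
p_standard = standard.cdf(z)         # P(Z <= z) on the standard normal scale
print(round(p_original, 4), round(p_standard, 4))
```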
67 Basic Properties of the Standard Normal Curve
68 Next session
- Descriptive statistics
  - The what and why of descriptive statistics
  - Types of variables
  - Formulas and interpretations of commonly used descriptive statistics
  - Pictorial representations of descriptive statistics
  - Examining the relationship between two or more variables