Title: Topics in Biostatistics Part 2
1Topics in Biostatistics Part 2
- Sarah J. Ratcliffe, Ph.D.
- Center for Clinical Epidemiology and Biostatistics
- University of Pennsylvania School of Medicine
2Outline
- Hypothesis testing
- Examples
- Interpreting results
- Resources
3Hypothesis testing
- Steps
- Select a one-sided or two-sided test.
- Establish the level of significance (e.g., α = .05).
- Select an appropriate test statistic.
- Compute test statistic with actual data.
- Calculate degrees of freedom (df) for the test statistic.
4Hypothesis testing
- Steps contd
- Obtain a tabled value for the statistical test.
- Compare the test statistic to the tabled value.
- Calculate a p-value.
- Make decision to reject or fail to reject the null hypothesis.
5Hypothesis testing
- Steps
- Select a one-sided or two-sided test.
- Establish the level of significance (e.g., α = .05).
- Select an appropriate test statistic.
- Compute test statistic with actual data.
- Calculate degrees of freedom (df) for the test statistic.
6Hypothesis testing One-sided versus Two-sided
- Determined by the alternative hypothesis.
- Unidirectional = one-sided
- Example
- Infected macaques given vaccine or placebo. Higher viral replication in the vaccine group has no benefit of interest.
- H0: vaccine has no beneficial effect on viral-replication levels at 6 weeks after infection.
- Ha: vaccine lowers viral-replication levels by 6 weeks after infection.
7Hypothesis testing One-sided versus Two-sided
- Bi-directional = two-sided
- Example
- Infected macaques given vaccine or placebo.
- Interested in whether vaccine has any effect on viral-replication levels, regardless of direction of effect.
- H0: vaccine has no effect on viral-replication levels at 6 weeks after infection.
- Ha: vaccine affects the viral-replication levels.
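The choice of alternative changes the p-value obtained from the same data. The sketch below is a minimal illustration, assuming the test statistic is approximately standard normal (a large-sample simplification that the slides do not specify). The statistic z = -1.8 is made up, with negative values meaning lower viral replication in the vaccine group.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-sided, two-sided) p-values for a z statistic.

    One-sided alternative: the vaccine LOWERS replication (z < 0).
    Two-sided alternative: the vaccine changes replication in either direction.
    """
    std_normal = NormalDist()
    one_sided = std_normal.cdf(z)             # lower tail only
    two_sided = 2 * std_normal.cdf(-abs(z))   # both tails
    return one_sided, two_sided

# Hypothetical test statistic, for illustration only.
one_sided, two_sided = p_values(-1.8)
print(f"one-sided p = {one_sided:.3f}, two-sided p = {two_sided:.3f}")
```

When the observed effect lies in the hypothesised direction, the two-sided p-value is twice the one-sided one, so at α = .05 this made-up statistic would lead to rejection under the one-sided alternative but not under the two-sided one.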
8Hypothesis testing
- Steps
- Select a one-sided or two-sided test.
- Establish the level of significance (e.g., α = .05).
- Select an appropriate test statistic.
- Compute test statistic with actual data.
- Calculate degrees of freedom (df) for the test statistic.
9Hypothesis testing Level of Significance
- How many different hypotheses are being examined?
- How many comparisons are needed to answer this hypothesis?
- Are any interim analyses planned?
- e.g. test data, then, depending on results, collect more data and re-test.
- => How many tests will be run in total?
10Hypothesis testing Level of Significance
- α_total = desired total Type-I error (false positives) for all comparisons.
- One test
- α_1 = α_total
- Multiple tests / comparisons
- If α_i = α_total, then Σα_i > α_total
- Need to use a smaller α for each test.
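To see how quickly the total Type-I error grows when every test is run at the full α, the sketch below computes the chance of at least one false positive. It assumes the tests are independent, which the slides do not state; the sum of the per-test α's is shown alongside as an upper bound that holds regardless of dependence.

```python
def familywise_error(alpha_per_test, n_tests):
    """P(at least one false positive) across n independent tests."""
    return 1 - (1 - alpha_per_test) ** n_tests

alpha = 0.05
for m in (1, 5, 10, 20):
    fwer = familywise_error(alpha, m)
    bound = min(1.0, m * alpha)   # sum of the per-test alphas (upper bound)
    print(f"{m:2d} tests: familywise error = {fwer:.3f} (sum of alphas = {bound:.2f})")
```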
11Hypothesis testing Level of Significance
- Conservative approach
- α_i = α_total / number of comparisons
- Can give different α's to each comparison.
- Formal methods include Bonferroni, Tukey-Kramer, Scheffé's method, Duncan-Waller.
- O'Brien-Fleming boundary or a Lan and DeMets analog can be used to determine α_i for interim analyses.
- Benjamini Y, and Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSSB, 57: 289-300.
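Two of the approaches named above can be sketched in a few lines: the conservative split of α_total across comparisons (Bonferroni) and the Benjamini-Hochberg step-up procedure for the false discovery rate. The function names and the five p-values are hypothetical, chosen only to show the two rules disagreeing.

```python
def bonferroni(p_values, alpha_total=0.05):
    """Reject H0_i when p_i <= alpha_total / number of comparisons."""
    alpha_i = alpha_total / len(p_values)
    return [p <= alpha_i for p in p_values]

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject

# Hypothetical p-values from five comparisons.
p = [0.001, 0.012, 0.021, 0.038, 0.300]
print("Bonferroni:        ", bonferroni(p))
print("Benjamini-Hochberg:", benjamini_hochberg(p))
```

The two procedures control different error rates (familywise error versus false discovery rate), so the less conservative answer from Benjamini-Hochberg is expected rather than a contradiction.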
12Hypothesis testing
- Steps
- Select a one-tailed or two-tailed test.
- Establish the level of significance (e.g., α = .05).
- Select an appropriate test statistic.
- Compute test statistic with actual data.
- Calculate degrees of freedom (df) for the test statistic.
13Hypothesis testing Selecting an Appropriate test
- How many samples are being compared?
- One sample
- Two samples
- Multi-samples
- Are these samples independent?
- Unrelated subjects in each sample.
- Subjects in each sample related / same.
14Hypothesis testing Selecting an Appropriate test
- Are your variables continuous or categorical?
- If continuous, are the data normally distributed?
- Normality can be assessed using a P-P (or Q-Q) plot.
- Plot should be approximately a straight line for normality.
- If not normal, can it be transformed to normality?
- Blindly assuming normality can lead to wrong conclusions!!!
15Hypothesis testing Selecting an Appropriate test
Approximately a straight line => normal assumption okay
16Hypothesis testing Selecting an Appropriate test
Not a straight line => NOT normal. Can it be transformed to normality?
17Hypothesis testing Selecting an Appropriate test
The natural log transform of the data is approximately a straight line => normal assumption okay. Analyze the transformed data, NOT the original data.
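The "is it a straight line?" judgement on a Q-Q plot can be given a rough numerical companion: the correlation between the sorted data and the matching normal quantiles. The sketch below is one way to do this, not a formal normality test. The plotting positions (i - 0.5) / n are one convention among several, and the data are artificial, built to be log-normal.

```python
import math
from statistics import NormalDist, mean

def qq_correlation(values):
    """Correlation between sorted data and standard normal quantiles.

    Values near 1 mean the Q-Q plot is close to a straight line.
    """
    n = len(values)
    ordered = sorted(values)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = mean(theoretical), mean(ordered)
    sxy = sum((x - mx) * (y - my) for x, y in zip(theoretical, ordered))
    sxx = sum((x - mx) ** 2 for x in theoretical)
    syy = sum((y - my) ** 2 for y in ordered)
    return sxy / math.sqrt(sxx * syy)

# Artificial right-skewed data: exponentiated normal quantiles,
# so the values are log-normal by construction.
n = 50
raw = [math.exp(1.5 * NormalDist().inv_cdf((i - 0.5) / n)) for i in range(1, n + 1)]
logged = [math.log(v) for v in raw]

print(f"raw data:        r = {qq_correlation(raw):.3f}")
print(f"log-transformed: r = {qq_correlation(logged):.3f}")
```

The correlation for the log-transformed values is far closer to 1 than for the raw values, matching the visual impression of a curved plot becoming straight after the transform.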
18Hypothesis testing Geometric versus Arithmetic
mean
- Geometric mean of n positive numerical values is the nth root of the product of the n values.
- Geometric will never be greater than arithmetic (they are equal only when all values are equal).
- Geometric better when some values are very large in magnitude and others are small.
- If geometric is used, log-transform the data before analyzing.
- Arithmetic mean of log-transformed data is the log of the geometric mean of the data.
- E.g. t-test on log-transformed data = test for location of the geometric mean.
- Langley R., Practical Statistics Simply Explained, 1970, Dover Press
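The relationship between the geometric mean and log-transformed data can be checked directly. The three values below are made up to span two orders of magnitude.

```python
import math
from statistics import geometric_mean, mean

values = [1, 10, 100]   # hypothetical measurements spanning orders of magnitude

arithmetic = mean(values)
geometric = math.prod(values) ** (1 / len(values))        # nth root of the product
via_logs = math.exp(mean([math.log(v) for v in values]))  # back-transformed mean of logs

print(f"arithmetic mean         = {arithmetic}")
print(f"nth root of the product = {geometric:.4f}")
print(f"exp(mean of logs)       = {via_logs:.4f}")
print(f"statistics.geometric_mean = {geometric_mean(values):.4f}")
```

The last three numbers agree up to floating-point rounding, which is why a test on the mean of log-transformed data is a statement about the geometric mean on the original scale.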
19Source: Richardson & Overbaugh (2005). Basic statistical considerations in virological experiments. Journal of Virology, 79(2): 669-676.
20Hypothesis testing Selecting an Appropriate test
- Other tests are available for more complex situations. For example,
- Repeated measures ANOVA: >2 measurements taken on each subject; usually interested in time effect.
- GEEs / Mixed-effects models: >2 measurements taken on each subject; adjust for other covariates.
21Hypothesis testing
- Steps
- Select a one-tailed or two-tailed test.
- Establish the level of significance (e.g., α = .05).
- Select an appropriate test statistic.
- Run the test.
22Example 1
- Expression of chemokine receptors on CD14+/CD14- populations of blood monocytes.
- Percent of cells positive by FACS.
24Example 1 contd
- Continuous data, 2 samples
- => t-test, if normal OR
- => Wilcoxon rank sum or signed-rank test, if non-normal
- Are samples independent or paired?
- If independent, can test for equality of variances using Levene's test
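The questions on this slide amount to a small decision rule for two samples of continuous data. The helper below is a hypothetical encoding of that rule; the function name and the wording of the returned labels are illustrative. For paired data, "normal" refers to the within-subject differences.

```python
def choose_two_sample_test(normal, paired, equal_variances=True):
    """Name a two-sample test for continuous data, following the slide's questions."""
    if paired:
        # For paired data, normality is judged on the differences.
        return "paired t-test" if normal else "Wilcoxon signed-rank test"
    if not normal:
        return "Wilcoxon rank sum test"
    if equal_variances:
        return "independent-samples t-test (equal variance)"
    return "independent-samples t-test (unequal variance)"

print(choose_two_sample_test(normal=True, paired=True))
print(choose_two_sample_test(normal=False, paired=False))
print(choose_two_sample_test(normal=True, paired=False, equal_variances=False))
```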
25Example 1 contd
- T-tests in Excel
- =TTEST(L6:L15, M6:M15, 2, 2)
- 1st argument: cells containing data from sample 1
- 2nd argument: cells containing data from sample 2
- 3rd argument: 1-sided or 2-sided test (1 or 2)
- 4th argument: type of t-test (1 = paired; 2 = independent, equal variance; 3 = independent, unequal variance)
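For readers working outside a spreadsheet, the same calculation can be sketched with the Python standard library. This is an illustrative re-implementation of types 1 and 2 only, not Excel's own code. The tail probability is obtained by numerically integrating the t density, the 1-sided p-value assumes the observed difference lies in the hypothesised direction, and the data are invented.

```python
import math
from statistics import mean, stdev, variance

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom (Simpson's rule)."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    width = abs(t) / steps                     # steps must be even
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * width)
    area *= width / 3                          # integral of the density from 0 to |t|
    upper = 0.5 - area
    return upper if t >= 0 else 1 - upper

def ttest(sample1, sample2, tails, kind):
    """Return (t, df, p). kind: 1 = paired, 2 = independent with equal variance."""
    if kind == 1:
        diffs = [a - b for a, b in zip(sample1, sample2)]
        df = len(diffs) - 1
        t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    elif kind == 2:
        n1, n2 = len(sample1), len(sample2)
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
        t = (mean(sample1) - mean(sample2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        raise ValueError("this sketch covers kind 1 and 2 only")
    p = tails * t_upper_tail(abs(t), df)       # one tail, or both by symmetry
    return t, df, p

# Hypothetical paired percentages (not the data behind the slides).
sample1 = [52, 61, 47, 70, 58, 65]
sample2 = [48, 55, 45, 63, 54, 60]
t_paired, df_paired, p_paired = ttest(sample1, sample2, tails=2, kind=1)
t_indep, df_indep, p_indep = ttest(sample1, sample2, tails=2, kind=2)
print(f"paired:      t = {t_paired:.2f}, df = {df_paired}, p = {p_paired:.4f}")
print(f"independent: t = {t_indep:.2f}, df = {df_indep}, p = {p_indep:.4f}")
```

With these invented values the paired analysis gives a much smaller p-value than the independent one, because the two measurements on each subject move together.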
27Example 1 contd
- Possible results for different assumptions
28Example 1 contd
- Which result is correct?
- Data are paired
- The differences for each subject are normally distributed.
- => Paired t-test
- p = .0095
- There is a difference in the percentage of positive CD14+ and CD14- cells.
29A graph of the 95% CIs for the means would give the impression there is no difference
30When it's really the differences we are testing.
31Example 1 contd
- Note: paired tests don't always give lower p-values.
- A 1-sided test on the CCR5 values would give p-values of
- p = 0.06, independent samples
- p = 0.11, paired samples
- WHY?
32Example 1 contd
- The differences have a larger spread than the
individual variables.
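The spread of the differences depends on how the two measurements move together: var(x - y) = var(x) + var(y) - 2 cov(x, y). The sketch below uses invented numbers (not the CCR5 data) to show both situations. Pairing helps when the measurements rise and fall together and hurts when they do not.

```python
from statistics import mean, variance

def covariance(x, y):
    """Sample covariance (n - 1 in the denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def variance_of_differences(x, y):
    """Sample variance of the paired differences x_i - y_i."""
    return variance([a - b for a, b in zip(x, y)])

# Hypothetical paired measurements, for illustration only.
x = [10, 12, 14, 16, 18]
y_tracks_x = [9, 11, 12, 15, 16]     # rises with x: positive covariance
y_opposes_x = [16, 15, 12, 11, 9]    # falls as x rises: negative covariance

for label, y in (("positive covariance", y_tracks_x), ("negative covariance", y_opposes_x)):
    identity = variance(x) + variance(y) - 2 * covariance(x, y)
    print(f"{label}: var(x) = {variance(x):.1f}, var(y) = {variance(y):.1f}, "
          f"var(x - y) = {variance_of_differences(x, y):.1f} (identity gives {identity:.1f})")
```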
33Example 2
- Does the level of CCR5 expression on PBLs (basal or upregulated using lentiviral vector) determine the % of entry that occurs via CCR5?
- Two viruses
- 89.6
- DH12
34Example 2 contd
35Example 2 contd
- How do we know if the slope of the line is significantly different from 0?
- Can perform a t-test on the slope estimate. For simple linear regression, this is the same as a t-test for correlation (= square root of R²).
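The equivalence of the two tests in simple linear regression can be verified numerically. The sketch below computes the t statistic for the slope and the t statistic for the correlation from the same made-up (expression, entry) pairs. Both have n - 2 degrees of freedom.

```python
import math
from statistics import mean

def slope_and_correlation_t(x, y):
    """Return (t for H0: slope = 0, t for H0: correlation = 0)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

    # Least-squares fit and the standard error of the slope.
    slope = sxy / sxx
    intercept = my - slope * mx
    residual_ss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    t_slope = slope / se_slope

    # Pearson correlation and its t statistic.
    r = sxy / math.sqrt(sxx * syy)
    t_corr = r * math.sqrt((n - 2) / (1 - r * r))
    return t_slope, t_corr

# Hypothetical (expression level, % entry) pairs, for illustration only.
expression = [5, 10, 20, 40, 60, 80]
entry = [8, 15, 22, 45, 52, 79]
t_slope, t_corr = slope_and_correlation_t(expression, entry)
print(f"t (slope) = {t_slope:.4f}, t (correlation) = {t_corr:.4f}, df = {len(entry) - 2}")
```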
36Example 2 contd
37Interpreting Results
- P-values
- Is there a statistically significant result?
- If not, was the sample size large enough to
detect a biologically meaningful difference?
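Whether a sample was large enough can be gauged with a standard sample-size approximation for comparing two means. The sketch below uses the normal-approximation formula n = 2 (z_{1-α/2} + z_{power})² σ² / δ² per group. The function name and inputs are hypothetical, and calculations based on the t distribution give slightly larger values.

```python
import math
from statistics import NormalDist

def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)

# Hypothetical inputs: detect a difference of half a standard deviation.
print(n_per_group(difference=0.5, sd=1.0))
```

Halving the difference to be detected roughly quadruples the required sample size, which is why the biologically meaningful difference should be fixed before judging a non-significant result.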
38Online Resources
- Power / sample size calculators
- http://calculators.stat.ucla.edu/powercalc/
- http://www.stat.uiowa.edu/~rlenth/Power/
- Free statistical software
- http://members.aol.com/johnp71/javasta2.html#Freebies
39BECC Consulting Center
- www.cceb.upenn.edu/main/center/becc.html
- Hourly fee service
- Design and analysis strategies for research proposals
- Selecting and implementing appropriate statistical methods for specific applications to research data
- Statistical and graphical analysis of data
- Statistical review of manuscripts.