Fundamental Concepts of Biostatistics - PowerPoint PPT Presentation

Provided by: biostatMc

1
Fundamental Concepts of Biostatistics
  • Cathy Jenkins, MS
  • Biostatistician II
  • Lisa Kaltenbach, MS
  • Biostatistician II
  • April 17, 2007

2
Prior to any analysis
  • Define research question(s). Write out using no
    more than one sentence per question.
  • Determine statistical analysis plan to address
    each research question.
  • Analysis

[Diagram labels: Confounder, Outcome, Predictor]
3
Population versus Sample
  • Population includes all possible observations of
    a particular type.
  • Observations may be people, animals, places, or
    things
  • Ex. Men and women aged 18 and older; infants;
    penguins; bodies of water
  • Sample includes only some of the observations
    but selected in a way that gives every possible
    observation an equal chance of being observed.
  • Ex. Men and women aged 18 and older in the
    TennCare Database from years 1998-2005; infants
    with a primary care physician at Vanderbilt
    during years 1990-2000; all penguins living in
    the Nashville zoo since 1995; bodies of water
    included in Bay Delta Tributaries Database
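
A minimal sketch of drawing a simple random sample, in which every observation has the same chance of being selected. The population of ID numbers, the sample size, and the seed are hypothetical values chosen only for illustration.

```python
import random

# Hypothetical population: ID numbers for 1,000 observations.
population = list(range(1, 1001))

# A fixed seed makes the draw reproducible; the seed value is arbitrary.
rng = random.Random(2007)

# Draw 50 IDs without replacement; each ID is equally likely to be chosen.
sample = rng.sample(population, k=50)

print(sorted(sample)[:5])
```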

4
Population versus Sample 2
  • In most cases in clinical research, we want to
    generalize from information about our sample to
    information about a population.

5
Descriptive Statistics
  • To describe characteristics of the sample
  • Ex. demographics, distributions, frequencies
  • May want to describe data with numerical or
    graphical summary
  • Characteristics of sample may be continuous or
    categorical variables

6
Continuous Variables
  • Continuous: a variable that can take on any
    value within a range (Ex. weight).
  • Discrete Numeric: a variable whose set of
    possible values is a finite sequence of numbers
    (Ex. pain scale 1 to 5).
  • Numerical Summary
  • Often want to measure central tendency of data
  • Sample mean: The sum of all of the observations
    divided by the number of observations. The mean
    is most informative when the data are roughly
    symmetric (e.g., normally distributed); it is
    sensitive to skewness and outliers.
  • Sample Median (50th Percentile): Order the
    observations from smallest to largest.
  • If n is odd, then the median is the middle
    ordered observation.
  • If n is even, then the median is the average of
    the two middle ordered observations.
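
The mean and median rules above can be sketched as follows. The weights are hypothetical, and the function names are illustrative rather than taken from the slides.

```python
def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def sample_median(values):
    """Middle ordered value (odd n) or average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical weights in kilograms.
weights_odd = [61.0, 72.5, 68.0, 90.5, 58.0]
weights_even = weights_odd + [75.0]

print(sample_mean(weights_odd))     # 70.0
print(sample_median(weights_odd))   # 68.0
print(sample_median(weights_even))  # 70.25
```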

7
Continuous Variables 2
  • Other common percentiles include quartiles (25th,
    50th, and 75th percentiles) and deciles (10th,
    20th, ..., 90th percentiles).
  • The p-th percentile is the value that p% of the
    data are less than or equal to. If p% of the
    data lie below the p-th percentile, it follows
    that (100 - p)% of the data lie above it.
  • Ex. If the 85th percentile of household income
    is 60,000, then 85% of the households have
    incomes of 60,000 or less and the top 15% of
    households have incomes of 60,000 or more.
  • Measures of Dispersion
  • When measurements are collected there will be
    scatter, dispersion, or variability.
  • Sources of dispersion
  • Random error: Error due to chance
  • Systematic error: Wrong result due to bias
  • Biological variability
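
A minimal sketch of the percentile definition given on this slide. Several percentile conventions exist; this one uses the nearest-rank rule, which is an assumption and not necessarily the convention the presenters had in mind. The incomes are hypothetical.

```python
import math

def percentile_nearest_rank(values, p):
    """Smallest observed value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical household incomes, in thousands.
incomes = [12, 18, 25, 31, 40, 47, 52, 60, 75, 98]

p85 = percentile_nearest_rank(incomes, 85)
share_at_or_below = sum(v <= p85 for v in incomes) / len(incomes)

print(p85, share_at_or_below)  # 75 0.9
```

With only ten observations the share at or below the 85th percentile is 90%, not exactly 85%; the two agree more closely as the sample grows.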

8
Continuous Variables 3
  • Minimum: Smallest observed value
  • Maximum: Largest observed value
  • Range: Difference between max and min (often
    reported as (min, max))
  • Interquartile Range (IQR): Difference between
    75th and 25th percentiles (often reported as
    (25th, 75th))
  • Variance: The average of the squares of the
    deviations of the observations from their mean
    (the sample variance divides by n - 1 rather
    than n).
  • Standard Deviation: the square root of the
    variance
  • Standard deviation measures spread about the mean
    and should be used only when the mean is chosen
    as the measure of center
  • Has the same unit of measurement as the mean
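
The dispersion measures above can be computed with Python's standard `statistics` module, as in this sketch. The data are hypothetical, and the quartiles use the module's default "exclusive" method, which is only one of several quartile conventions.

```python
from statistics import quantiles, stdev, variance

# Hypothetical measurements.
values = [4.0, 7.0, 8.0, 10.0, 11.0, 14.0, 16.0]

value_range = max(values) - min(values)

# quantiles(..., n=4) returns the 25th, 50th, and 75th percentiles.
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1

# variance() and stdev() are the sample versions (divide by n - 1).
sample_variance = variance(values)
sample_sd = stdev(values)

print(value_range, (q1, q3), iqr)  # 12.0 (7.0, 14.0) 7.0
print(sample_variance)             # 17.0
print(round(sample_sd, 3))         # 4.123
```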

9
Continuous Variables 4
  • Graphical Summary

10
Categorical Variables
  • Categorical: a variable having only certain
    possible values (ex. race).
  • Binary: a categorical variable with only two
    possible values (ex. gender).
  • Ordinal: a categorical variable for which there
    is a definite ordering of the categories (ex.
    severity of lower back pain ordered as none,
    mild, moderate, and severe).
  • Numerical Summary
  • Frequency Distribution: A listing of distinct
    values for that characteristic and the number of
    observations having each value.
  • Relative frequency: Proportion of the total
    number of observations that fall into each
    category.
  • Cumulative frequency: Proportion of the total
    number of observations that fall into the current
    or previous categories listed (may be useful for
    ordinal variables).
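
A short sketch of the three summaries for an ordinal variable. The pain ratings are hypothetical; the category order follows the lower back pain example above.

```python
from collections import Counter
from itertools import accumulate

# Ordered categories of the ordinal variable.
levels = ["none", "mild", "moderate", "severe"]

# Hypothetical observations for ten subjects.
pain = ["mild", "none", "severe", "mild", "moderate",
        "none", "mild", "moderate", "mild", "none"]

counts = Counter(pain)
n = len(pain)

frequency = [counts[level] for level in levels]
relative = [count / n for count in frequency]
# Running total of counts, divided by n, in category order.
cumulative = [total / n for total in accumulate(frequency)]

for row in zip(levels, frequency, relative, cumulative):
    print(row)
```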

11
Relationships between two variables
  • Two variables measured on the same observations
    are associated if some values of the first
    variable tend to occur more often with some
    values of the second variable than with other
    values of that variable.
  • Two continuous variables
  • Ex. Person's weight and blood pressure
  • Two categorical variables
  • Ex. gender and smoking status
  • One continuous and one categorical variable
  • Ex. blood pressure and gender
  • Keep in mind - relationship between two variables
    can be strongly influenced by other variables
    that are lurking in the background

12
Two Continuous Variables
  • A scatterplot shows the relationship between two
    continuous variables measured on the same
    observations.
  • The values of one variable appear on the x-axis,
    and the values of the other variable appear on
    the y-axis.
  • Each observation appears as a point in the plot
    fixed by the values of both variables for that
    observation.

13
Two Continuous Variables 2
  • Graphical Summary Scatterplot

14
Two Categorical Variables
  • Numerical Summary
  • Cross-tabulation (or 2-way table)
  • Ex. Clinical Pregnancy by Age Groups
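
A minimal sketch of building a 2-way table from paired categorical observations. The records and the age-group labels are hypothetical; they are not the values from the slide's table, which did not survive in this transcript.

```python
from collections import Counter

# Hypothetical (age group, clinical pregnancy) records, one per subject.
records = [
    ("under 35", "yes"), ("under 35", "yes"), ("under 35", "no"),
    ("35 or older", "yes"), ("35 or older", "no"), ("35 or older", "no"),
]

row_labels = ["under 35", "35 or older"]
col_labels = ["yes", "no"]

# Count how many subjects fall into each (row, column) cell.
cell_counts = Counter(records)
table = [[cell_counts[(r, c)] for c in col_labels] for r in row_labels]

for label, row in zip(row_labels, table):
    print(label, row)
```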

15
One Categorical and One Continuous variable
  • Consider descriptive statistics of the continuous
    variable separately for different values of the
    categorical variable
  • Ex. Descriptive statistics of birth weight by
    smoking status during pregnancy for mothers
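
A sketch of summarizing a continuous variable separately within each category. The birth weights (in grams) and group sizes are hypothetical and only illustrate the layout of such a summary.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (smoking status, birth weight in grams) pairs.
rows = [
    ("nonsmoker", 3400.0), ("smoker", 3000.0), ("nonsmoker", 3600.0),
    ("smoker", 3100.0), ("nonsmoker", 3200.0), ("smoker", 3200.0),
    ("nonsmoker", 3500.0),
]

# Collect the continuous values under each category.
by_status = defaultdict(list)
for status, weight in rows:
    by_status[status].append(weight)

summary = {
    status: {"frequency": len(weights), "mean": mean(weights)}
    for status, weights in by_status.items()
}

print(summary)
```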

16
How to study the relationship between two
different variables
  • Quantify the relationship: Measure the strength
    of the relationship (linear, monotonic, ...)
    between two continuous variables.
  • Use hypothesis testing: Test theory to see if
    experimental results only reflect random chance.
  • Fit model: Predict one measure of an individual
    from another.

17
Quantify the relationship
  • Correlation coefficient (r): Quantitative
    summary of the strength of the relationship
    between two continuous variables.
  • Pearson correlation: based on the raw data;
    measures linear association.
  • Spearman correlation: based on the ranks of
    the raw data; measures monotonic association.
  • Coefficient of determination (r^2): Square of
    the correlation coefficient that defines
  • the strength or magnitude of the correlation.
  • not a cause-and-effect relationship, but
    quantifies how well one variable predicts
    another.
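
A minimal sketch of both correlation coefficients, written out by hand so the difference between raw data and ranks is visible. The data are hypothetical: y rises with x but not along a straight line, so Spearman's coefficient is 1 while Pearson's is slightly below 1.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from the raw values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 4), round(r ** 2, 4), round(rho, 4))
```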

18
Hypothesis testing 1
  • Define null hypothesis for question of interest
    that assumes the experimental results are due to
    chance alone.
  • Perform statistical test to determine if we can
    reject or fail to reject the null hypothesis.
  • NOTE: Absence of evidence does not mean evidence
    of absence. In other words, if our test results
    in a non-significant p-value, we do not accept
    the null hypothesis. Rather, we fail to reject
    the null hypothesis. It could be that for the
    same experiment but a different sample we would
    obtain significant results.
  • P-value: the probability of obtaining a result
    at least as extreme as the one observed,
    assuming the null hypothesis is true (i.e., the
    result is due to chance alone).
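
To make the p-value definition concrete, here is a sketch of an exact two-sided binomial test. The scenario is hypothetical: 9 of 10 subjects show a response that would occur with probability 0.5 under the null hypothesis. "At least as extreme" is taken to mean "no more probable than the observed count", which is one common convention and an assumption here.

```python
from math import comb

def binomial_two_sided_p(successes, n, p0=0.5):
    """Sum the null probabilities of all outcomes no more likely than the observed one."""
    pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # The small tolerance guards against floating-point ties.
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))

p_value = binomial_two_sided_p(9, 10)
print(round(p_value, 4))  # 0.0215
```

The outcomes counted are 0, 1, 9, and 10 successes, giving 22/1024.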

19
Hypothesis testing 2
  • Categorical data
  • Chi-square tests
  • Both row and column variables are nominal
  • Row variable nominal, column variable ordinal
  • Both row and column variables are ordinal
  • Tests whether distribution of frequencies differs
    across rows (groups) or whether there is any
    association between the row and column variables.
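
A minimal sketch of the Pearson chi-square test of general association for the simplest case, a 2x2 table of nominal variables. The counts are hypothetical, no continuity correction is applied, and the p-value shortcut below is valid only for 1 degree of freedom. The ordinal variants listed above use different statistics and are not shown.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unassociated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows are groups, columns are outcome categories.
observed_counts = [[30, 20], [20, 30]]
statistic, p_value = chi_square_2x2(observed_counts)
print(statistic, round(p_value, 4))  # 4.0 0.0455
```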

20
Hypothesis testing 3
  • Nominal row and column variables
  • Example: Given data on the neighborhood in which
    a person lives and his political affiliation, you
    wish to test whether a person's politics
    influences where he/she lives.
  • H0: No association exists between a person's
    political affiliation and the neighborhood in
    which he lives.
  • HA: An association exists between a person's
    political affiliation and the neighborhood in
    which he lives.

21
Hypothesis testing 4
  • Nominal row variables and ordinal column
    variables
  • Example: Given data studying hours of headache
    pain relief (hours ranging from 0 to 6) using
    three different treatments: placebo, standard,
    and test treatment.
  • H0: No association between hours of pain relief
    and treatment.
  • HA: A shift in row mean hours of headache pain
    relief exists between the treatment groups.

22
Hypothesis testing 5
  • Ordinal row and column variables
  • Example: Given data assessing how water
    additives (water, standard, super) affect the
    washability of clothes (low, medium, high).
  • H0: No association between the water additive
    and the washability of the clothes.
  • HA: There is a linear association between water
    additive and washability of clothes.

23
Hypothesis Testing 6
  • Continuous variables
  • Parametric tests: Make assumptions about
    underlying distribution of data.
  • 1-sample t-test H0: Mean of data is equal to
    some fixed value (defined by study question).
  • 2-sample t-test H0: No difference in means
    between the two independent groups.
  • Paired t-test H0: Mean of difference in paired
    data is equal to 0.
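
A sketch of the test statistics behind the three t-tests. The data are hypothetical, and the 2-sample version uses the pooled (equal-variance) form, which is an assumption since the slides do not say which form is intended. Turning a statistic into a p-value requires Student's t distribution, which is not in Python's standard library, so only the statistics are computed here.

```python
from math import sqrt
from statistics import mean, stdev, variance

def one_sample_t(x, mu0):
    """t statistic for H0: mean equals mu0 (df = n - 1)."""
    return (mean(x) - mu0) / (stdev(x) / sqrt(len(x)))

def two_sample_t(x, y):
    """Pooled-variance t statistic for H0: equal means (df = nx + ny - 2)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))

def paired_t(first, second):
    """t statistic for H0: mean paired difference equals 0 (df = n - 1)."""
    differences = [b - a for a, b in zip(first, second)]
    return one_sample_t(differences, 0.0)

# Hypothetical measurements.
group_a = [5.0, 6.0, 7.0, 8.0, 9.0]
group_b = [3.0, 4.0, 5.0, 6.0, 7.0]
followup = [6.0, 8.0, 7.0, 10.0, 9.0]  # second measurement on group_a subjects

t_one = one_sample_t(group_a, 5.0)
t_two = two_sample_t(group_a, group_b)
t_paired = paired_t(group_a, followup)
print(round(t_one, 3), round(t_two, 3), round(t_paired, 3))
```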

24
Hypothesis testing 7
  • Non-parametric tests: Do not assume a specific
    underlying distribution (e.g., normal) for the
    data.
  • Wilcoxon signed rank test: analogous to the
    one-sample or paired t-test.
  • 1-sample H0: Median is equal to specified value
    (defined by study question).
  • Paired H0: Median difference is equal to 0.
  • Wilcoxon rank sum test: analogous to the
    two-sample t-test.
  • H0: The distribution of the response variable is
    the same in the two independent groups.
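
A minimal sketch of the Wilcoxon rank sum test using the large-sample normal approximation. The data are hypothetical, the function assumes no tied values, and the approximation is rough for groups this small, where exact tables are normally preferred.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_test(x, y):
    """Rank sum W for group x, its z-score under H0, and a two-sided p-value."""
    combined = sorted(x + y)
    if len(set(combined)) != len(combined):
        raise ValueError("this sketch does not handle tied values")
    rank = {value: position + 1 for position, value in enumerate(combined)}
    w = sum(rank[value] for value in x)
    n1, n2 = len(x), len(y)
    # Mean and standard deviation of W when both groups share one distribution.
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p_value

# Hypothetical responses in two independent groups.
group_1 = [1.1, 2.3, 3.0, 4.8, 5.2]
group_2 = [6.1, 7.4, 8.0, 9.5, 10.2]

w, z, p_value = rank_sum_test(group_1, group_2)
print(w, round(z, 3), round(p_value, 4))
```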

25
Modeling 1 ANOVA
  • 1-way/2-way: extends 2-sample t-test (with 1
    factor/2 factors) to n groups; compares mean of
    continuous variable across n groups.
  • H0: No difference in means between the n groups.
  • HA: At least one group has a different mean than
    the other (n-1) groups.
  • Avoids the multiple-comparisons problem of
    running many pairwise t-tests.
  • Tests whether between-group variability is large
    relative to within-group variability.
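
The F statistic for 1-way ANOVA is the ratio of the between-group mean square to the within-group mean square; a large value suggests that the group means differ. A minimal sketch with hypothetical data:

```python
def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    group_means = [sum(group) / len(group) for group in groups]

    ss_between = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    ss_within = sum(
        (v - m) ** 2
        for group, m in zip(groups, group_means)
        for v in group
    )
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements in three groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]

f_statistic = one_way_anova_f(groups)
print(round(f_statistic, 3))  # 13.0
```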

26
Modeling 2 Linear regression
  • Continuous outcome.
  • Assumes relationship between predictor(s) and
    outcome is linear.
  • Observations assumed to be independent (i.e., only
    one observation per subject, no subjects that are
    related to each other, etc.)
  • Number of predictors allowed in the model depends
    on the sample size.
  • Rule of thumb: no more than n/10 predictors, where
    n = number of subjects.
  • Include confounders in the model for better
    parameter estimates.
  • Outputs are parameter estimates
  • Can give information similar to that obtained
    from hypothesis testing.
  • Allows the investigator to make inference based
    on the parameter estimates.
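
A minimal sketch of how the parameter estimates arise in the simplest case, a single predictor fitted by ordinary least squares. The data are hypothetical, and models with several predictors or confounders need a statistics package.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical predictor and continuous outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(x, y)
print(round(intercept, 2), round(slope, 2))  # 0.09 1.97

# Predicted outcome for a new subject with x = 6.
print(round(intercept + slope * 6.0, 2))  # 11.91
```

The slope estimates the change in outcome per one-unit change in the predictor, which is the kind of inference the parameter estimates support.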

27
Modeling 3 Logistic regression
  • Categorical outcome, typically binary.
  • Number of predictors in model depends on several
    things
  • Each group has at least 10 subjects.
  • Cell counts in a cross-tab table meet certain
    sample size criteria
  • 80% of expected counts are at least 5.
  • All other expected counts are greater than 2,
    with virtually no 0 counts.
  • Outputs are parameter estimates and odds ratios
    calculated from these parameter estimates.
  • Odds ratio: a way of comparing whether the
    odds of a certain event are the same for
    two groups.
  • OR = 1 → The event is equally likely in both
    groups.
  • OR > 1 → The event is more likely in the first
    group.
  • OR < 1 → The event is less likely in the first
    group.
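
A short sketch of the odds ratio computed from a 2x2 table of counts, and of its link to a logistic regression parameter estimate: for a single binary predictor, the estimate is the log of the odds ratio, so exponentiating it recovers the odds ratio. The counts are hypothetical.

```python
from math import exp, log

def odds_ratio(events_1, non_events_1, events_2, non_events_2):
    """Odds of the event in group 1 divided by the odds in group 2."""
    return (events_1 * non_events_2) / (non_events_1 * events_2)

# Hypothetical counts: 20 of 100 subjects in group 1 and 10 of 100 in group 2
# experienced the event.
ratio = odds_ratio(20, 80, 10, 90)
print(ratio)  # 2.25, so the event is more likely in the first group

# The corresponding logistic regression estimate is the log odds ratio.
estimate = log(ratio)
print(round(estimate, 4), round(exp(estimate), 4))
```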

28
Conclusion
  • Statistical analysis plan should be devised
    before collecting data.
  • Use aims of study to decide the best way to study
    the relationship between variables of interest:
    correlations, hypothesis testing, modeling.
  • Make use of the daily biostatistics clinics.
    Refer to http://biostat.mc.vanderbilt.edu for the
    clinic schedule. Click on the Clinics link
    (5th link from the top).
  • Check to see if your department is a part of the
    collaboration plan for more intensive
    biostatistics support.