Lecture 1: Introduction - PowerPoint PPT Presentation

Slides: 31
Provided by: bhea

1
Lecture 1 Introduction
  • Summer Program
  • Brian Healy

2
Welcome!!!
  • Important things to know about Boston
  • Everyone LOVES the Red Sox
  • Very bad drivers
  • Easy to get lost
  • Good variety of restaurants, but pizza and bagels
    are bad
  • Important things to know about Harvard
  • Attractive part of campus is in Cambridge
  • Food in cafeteria quite good
  • Great speakers

3
Who am I?
  • Brian Healy
  • bhealy_at_hsph.harvard.edu
  • Office 412-B (Dave and I share an office)
  • Brighton, MA near Cleveland Circle, but am moving
    to Salem, MA
  • 5th year graduate student, planning to graduate
    in Oct/Nov
  • Just married!!
  • No formal office hour, but I am always available
    to answer questions

4
What is this class?
  • Designed to help students learn
  • Basics of methods including
  • Descriptive statistics
  • Confidence intervals
  • Hypothesis testing
  • Regression
  • Basics of computing
  • If you feel like you know a topic, you are not
    required to attend
  • Short HW every 3 classes

5
Syllabus
  • Basic outline, but may change depending on how
    quickly we go through topics
  • The purpose of the class is to get everyone on
    equal footing at the start of the semester. Any
    and all questions are encouraged. It is never a
    problem to go over a topic a second or third time.

6
Computing background
  • Coming into this program I had very limited
    computing background and I have taught myself the
    basics of programming.
  • Important to practice
  • Google and the internet are great resources
  • Other languages

7
General information
  • Good book to practice with: Rosner, Fundamentals
    of Biostatistics
  • For help with R, there are great web resources.
    Books about S-plus are more plentiful, and the
    information is very applicable to R
  • If you have any feedback on the course, please
    tell me so that I can make the course as useful
    as possible for all. The only caveat is that I
    will not be able to go too fast because this is
    mostly to ensure everyone has a certain
    background, not as enrichment for people with
    lots of stats

8
Question
  • What appeals to you about biostatistics?
  • Is there any particular area that you find most
    interesting?

9
What is the difference between statistics and
biostatistics?
  • My opinion: statistics is more driven by
    interesting statistical questions, while
    biostatistics is more driven by interesting real
    data analysis problems.
  • There is a lot of overlap, but the spectrum looks
    like:

[Spectrum, left to right: Epidemiology, Biostatistics, Statistics, Mathematics]

10
Topics for biostatisticians
  • Application-driven methods
  • Survival analysis
  • Multiple testing methods (genetics)
  • Causal inference
  • Risk / Decision Science
  • Computational biology / Bioinformatics

11
What are biostatisticians trying to do?
  • We want to answer questions about the population
    by using a sample
  • In more technical language, we want to make
    inferences about a population characteristic from
    a sample
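
To make the population/sample idea concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the population is simulated blood pressure values, and the variable names are hypothetical. In a real study the population mean is unknown, and the sample mean (with its standard error) is what we use to make inferences about it.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated systolic blood pressures (mmHg).
population = [random.gauss(120, 15) for _ in range(100_000)]
population_mean = statistics.fmean(population)  # unknown in a real study

# In practice we only get to observe a sample from the population.
sample = random.sample(population, 200)
sample_mean = statistics.fmean(sample)

# The standard error describes how far the sample mean tends to fall
# from the population mean.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean (normally unknown): {population_mean:.1f}")
print(f"sample mean (our estimate):         {sample_mean:.1f}")
print(f"standard error of the estimate:     {standard_error:.2f}")
```

The sample mean should land close to the population mean, typically within a couple of standard errors, even though only 200 of the 100,000 values were observed.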

12
Clinical study
  • Study design
  • Hypothesis formulation: Does the treatment have
    an effect?
  • Sample selection: Who are we going to study?
  • Collection of data
  • Mid-study markers: Should we stop the study
    early?
  • Analysis of data
  • Results: Was there any effect?
  • Conclusions: What does this all mean?
  • Generalizability

13
What does the statistician do?
  • Study design
  • Hypothesis formulation: Define outcome and
    exposure
  • Sample selection: Sample size, type of sampling
  • Collection of data
  • Mid-study markers: Toxicity or effect
  • Analysis of data
  • Results: Estimation and confidence intervals
  • Conclusion: Significance of effect or association
  • Causality vs. association

14
Hypothesis formulation
  • The most important step of any problem, but the
    one that gets the least attention in statistics
    courses.
  • What do we want to know?
  • The more specific the better because statistics
    is designed to answer very specific questions.
  • Ex. Is Treatment A better than Treatment B?
  • More specific: Does Treatment A lead to a larger
    decrease in the size of a tumor than Treatment B?
  • We will discuss hypothesis formulation in every
    class

15
Sample selection
  • Exposure and control group (Lecture 4)
  • What group of people are we most interested in?
  • Can we generalize our conclusions to other cancer
    patients? all men? all people?
  • Sample size (Lecture 9)
  • How many patients are needed to prove what we
    want to show?
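
As a rough preview of the sample size question, here is a sketch of one common textbook approximation for comparing the means of two groups. This particular formula is an assumption on my part, not necessarily the method used later in the course: patients per group is about 2 (z_alpha + z_beta)^2 sigma^2 / delta^2, where delta is the difference we want to detect and sigma is the standard deviation of the outcome. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def patients_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a mean difference
    `delta` between two groups with common standard deviation `sigma`
    (two-sided test at level `alpha`, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up: we cannot enroll a fraction of a patient


# Hypothetical example: tumor shrinkage with a standard deviation of 10 mm.
print(patients_per_group(delta=5, sigma=10))    # detect a 5 mm difference
print(patients_per_group(delta=2.5, sigma=10))  # detect a 2.5 mm difference
```

Halving the difference we want to detect roughly quadruples the number of patients needed, which is why the hypothesis has to be specific before the sample is chosen.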

16
Mid-study markers
  • May use many of the same techniques we will
    discuss in this course to determine how the
    treatment and placebo groups are performing
  • Most commonly used in long clinical trials and
    these are written into the protocol
  • Ex. For HIV trials, it is unethical to keep
    patients on a placebo treatment once it can be
    shown that the treatment is working, even if the
    study has not finished.

17
Results
  • The main results from a study are provided in
    graphs and tables (Lecture 2)
  • Every public health paper has a first table which
    describes the demographics of the study
    population
  • Summary statistics and confidence intervals for
    the effect measures (Lecture 6-7)
  • What is the best analysis for this data?

18
Significance of results
  • Descriptive studies / Comparative studies
  • In the latter, we want to know if there is a
    difference between groups, especially based on an
    exposure of interest. To do this, we use
    hypothesis testing (Lecture 6). The specific type
    of test depends on the question of interest.
  • Have the assumptions of the analysis been met?

19
Generalizability
  • Assume that we have found a difference between
    our exposure and control group, what does this
    mean for the general population? Specifically, to
    which group of people can we apply our results?
  • This is often based on how the sample was
    originally collected.
  • Ex. If the exposed group were school children
    living near power lines and the control group
    were school children living elsewhere, can we
    generalize the findings to all children? adults?

20
Example
  • We are interested in determining if the men in
    Boston are shorter than average for the rest of
    the United States (it sure seems that way
    according to all of my female friends and wife)
  • How can we determine this?

21
Study
  • Hypothesis
  • Sample
  • Mid-study marker
  • Results
  • Significance of results
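
One possible way to fill in the steps above for the Boston height example, sketched in Python. The heights are invented, the national average of 69 inches is an assumed value for illustration, and a one-sample t test is just one reasonable choice of analysis. A one-time survey like this has no mid-study marker, so that step is skipped.

```python
import math
import statistics

# Sample: hypothetical heights (inches) of 12 randomly chosen Boston men.
boston_heights = [67.5, 68.0, 70.5, 66.0, 69.0, 68.5,
                  67.0, 71.0, 68.0, 66.5, 69.5, 67.5]
us_mean_height = 69.0  # assumed national average, for illustration only

# Hypothesis: H0: Boston mean = US mean, versus H1: Boston mean < US mean.
n = len(boston_heights)
sample_mean = statistics.fmean(boston_heights)
standard_error = statistics.stdev(boston_heights) / math.sqrt(n)

# Results: one-sample t statistic (how many standard errors below 69 we are).
t_statistic = (sample_mean - us_mean_height) / standard_error

# Significance: one-sided 5% critical value of the t distribution, 11 df.
critical_value = -1.796
reject_null = t_statistic < critical_value

print(f"sample mean = {sample_mean:.2f} in, t = {t_statistic:.2f}")
print("reject H0" if reject_null else "cannot reject H0")
```

With these made-up numbers the sample mean (68.25 inches) is below the assumed national average, but the t statistic does not pass the critical value, so the data alone would not let us conclude that Boston men are shorter.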

22
Aside
  • Throughout this course, we will go through many
    examples from consulting I have done during my
    time here. I think working with physicians and
    other researchers is extremely valuable because
    you can apply the concepts you learn. Often the
    statistical techniques appear very easy to use
    because the question of interest and the data
    have been given to you in a form that is easy to
    deal with, but this is not how you get the data
    in practice.
  • If there is any medical topic or research area
    that you find interesting, I will try to include
    this in some of my examples.

23
Association vs Causation
  • Association
  • Definition: two factors are related, ex. height
    and weight
  • There is no implied cause and effect relationship
  • Causation
  • Definition: one factor causes a second factor,
    either directly or indirectly, ex. lying in the
    sun causes sunburn
  • The causal factor is required for the outcome to
    occur

24
  • It is much easier to demonstrate an association,
    but for public health, it is much more
    interesting to demonstrate causation
  • For example, the fact that eating red meat is
    associated with heart disease may show an
    important group to direct interventions towards,
    but we do not know whether there is a causal
    relationship. Therefore, it is not clear if
    changing this behavior would change heart disease
    status.
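
To see how an association can appear with no causal relationship at all, here is a small simulation sketch. It is entirely made up: I assume a third factor (smoking, chosen only for illustration) that makes people both more likely to eat red meat and more likely to get heart disease, while red meat itself has no effect on disease in this simulated world.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible


def risk(outcomes):
    """Proportion of people with heart disease in a group."""
    return sum(outcomes) / len(outcomes)


def risk_difference(records):
    """Risk among red meat eaters minus risk among non-eaters."""
    eaters = [disease for _, meat, disease in records if meat]
    non_eaters = [disease for _, meat, disease in records if not meat]
    return risk(eaters) - risk(non_eaters)


# Each record is (smoker, eats_red_meat, heart_disease); probabilities are invented.
people = []
for _ in range(200_000):
    smoker = random.random() < 0.30
    # Smokers are more likely to eat red meat...
    eats_red_meat = random.random() < (0.80 if smoker else 0.40)
    # ...and more likely to get heart disease. Red meat has NO effect here.
    heart_disease = random.random() < (0.30 if smoker else 0.10)
    people.append((smoker, eats_red_meat, heart_disease))

crude = risk_difference(people)
among_smokers = risk_difference([p for p in people if p[0]])
among_non_smokers = risk_difference([p for p in people if not p[0]])

print(f"everyone:    {crude:+.3f}")
print(f"smokers:     {among_smokers:+.3f}")
print(f"non-smokers: {among_non_smokers:+.3f}")
```

Red meat eaters have clearly more heart disease overall, yet within smokers and within non-smokers the difference is close to zero. In this simulated world, getting people to stop eating red meat would not change anyone's heart disease status.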

25
  • The majority of methods we describe in this
    course are used to demonstrate an association
    between two factors because this is much easier
    to show.
  • The important things to ask yourself when you read
    a study or complete an analysis:
  • What do these results mean?
  • Are the two factors associated or causally
    related?
  • What action can be taken based on this?
  • There has been significant research into causal
    inference (some of the best in the world happens
    here).

26
Counterfactual
  • One common way to look at a causal effect is to
    think about counterfactuals
  • Def: What would the outcome be if, contrary to
    fact, the exposure were different?
  • Ex. We know that a patient who ate red meat got
    heart disease. What would have happened if the
    patient lived exactly the same life, except
    he/she did not eat red meat?
  • The causal effect is the difference in the
    counterfactual outcomes,
    $CE = Y_{\text{red meat}} - Y_{\text{no red meat}}$

27
  • The causal effect is the thing we are trying to
    find because this shows which exposures affect
    (directly or indirectly) the outcome
  • Of course, both counterfactual outcomes cannot be
    observed for each patient, so we have to find
    clever ways to estimate this effect. The type of
    study we have will determine how to estimate this
    causal effect
  • When we cannot estimate the causal effect, we can
    describe the association between exposure and
    outcome
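
The counterfactual idea can be sketched with a simulation, where (unlike real life) we get to see both counterfactual outcomes for every patient. All of the numbers are invented, and randomizing the exposure is used here as just one example of a study type that lets us estimate the average causal effect; it is an assumption of this sketch, not a claim about which designs the course covers.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Simulated patients with BOTH counterfactual outcomes (1 = heart disease).
patients = []
for _ in range(50_000):
    y_no_red_meat = 1 if random.random() < 0.10 else 0
    # Made-up world: red meat causes disease in 5% of patients who would
    # otherwise have stayed healthy.
    caused_by_meat = random.random() < 0.05
    y_red_meat = 1 if (y_no_red_meat == 1 or caused_by_meat) else 0
    patients.append((y_red_meat, y_no_red_meat))

# Average of the individual causal effects CE = Y(red meat) - Y(no red meat).
# This is only computable because the data are simulated.
true_average_ce = statistics.fmean([y1 - y0 for y1, y0 in patients])

# A randomized study: a coin flip decides the exposure, and we observe
# only ONE of the two counterfactual outcomes for each patient.
exposed, unexposed = [], []
for y_red_meat, y_no_red_meat in patients:
    if random.random() < 0.5:
        exposed.append(y_red_meat)
    else:
        unexposed.append(y_no_red_meat)

estimated_ce = statistics.fmean(exposed) - statistics.fmean(unexposed)

print(f"true average causal effect:      {true_average_ce:.3f}")
print(f"estimate from randomized groups: {estimated_ce:.3f}")
```

Even though no patient contributes both outcomes to the study, the difference between the two randomized groups comes out close to the true average causal effect.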

28
Difference between this course and the
probability course
  • This course
  • Focus on analysis of data without too much theory
  • Applications
  • Better jokes
  • Probability course
  • Focus on theoretical background for most of the
    data analysis techniques we describe
  • Theory

29
Inference vs Methods
  • This distinction between data analysis and theory
    holds for the academic year as well
  • I think the relationship between these courses
    can sometimes be missed in all of the details
    during the semester, but Dave and I will try to
    make connections where possible this summer to
    get you in the mindset that these are separate,
    but very related, areas

30
Computing
  • Computing resources (SAS, R, etc.) have made it
    much simpler to complete analyses, such as
    determining the significance of results and
    computing confidence intervals.
  • Although software allows analyses to be completed
    faster, it does not guarantee that the correct
    analyses have been done. This is why framing the
    problem is the most important step of the
    analysis.
  • R: focus of the course's computing (Lecture 2)
  • SAS: secondary package (Lecture 10)