Biostat 200: Introduction to Biostatistics

1
Biostat 200: Introduction to Biostatistics
2
Lecture 1
3
Course instructors
  • Judy Hahn, M.A., Ph.D.
  • Judy.hahn_at_ucsf.edu
  • (415) 206-4435
  • TAs
  • Michelle Odden, Ph.D., M.S.
  • Megumi Okumura, M.D.
  • Maya Vijayaraghavan, M.D.
  • Robin Wallace, M.D.

4
The details
  • Lectures: Tuesdays 10:30-12:30
  • Labs: Thursdays 10:30-12:00
  • Lab 1: Room CB 6702
  • Lab 2: Room CB 6704
  • Office hours: Thursdays 12:00-1:00, Room CB 5715
  • Course credits: 3

5
The details
  • Readings
  • Required readings will be from Principles of
    Biostatistics, 2nd edition, by M. Pagano and
    K. Gauvreau (Duxbury)
  • Please read the assigned chapters before lecture,
    and review them after lecture

6
The details
  • Assignments will be posted on Thursdays and are
    due on a Sunday at 5 p.m., 1.5 weeks later
  • Data collection (Assignment 1 only)
  • Data analysis and interpretation
  • Exercises in the book
  • Reading and interpretation of scientific
    publications
  • You must attend Lab 1 to receive assignment 1

7
The details
  • Grading
  • Homework (75%)
  • 5 assignments
  • Varying in length; each homework problem is worth
    (usually) 10 points toward the final homework score
  • Final exam (25%)
  • LATE ASSIGNMENTS WILL NOT BE ACCEPTED!!!

8
Assignments
  • Send to your TAs
  • Lab 1: Megan Okumura, Robin Wallace
  • ticr.biostat200.1_at_gmail.com
  • Lab 2: Michelle Odden, Maya Vijayaraghavan
  • ticr.biostat200.2_at_gmail.com

9
What I do and why
10
Course goals
  • Familiarity with basic biostatistics terms and
    nomenclature
  • Ability to summarize data and do basic
    statistical analyses using STATA
  • Ability to understand basic statistical analyses
    in published journals
  • Understanding of key concepts, including
    statistical hypothesis testing and critical
    quantitative thinking
  • Foundation for more advanced analyses

11
Today's topics
  • Variables: numerical versus categorical
  • Tables (frequencies)
  • Graphs (histograms, box plots, scatter plots,
    line graphs)
  • Required reading: Pagano Chapter 2

12
Types of data
  • Data are made up of a set of variables
  • Categorical variables: any variable that is not
    numerical (values have no numerical meaning),
    e.g. gender, race, drug, disease status
  • Nominal variables
  • Ordinal variables

Pagano and Gauvreau, Chapter 2
13
Types of data
  • Categorical variables
  • Nominal variables
  • The data are unordered (e.g. RACE: 1=Caucasian,
    2=Asian American, 3=African American)
  • A subset of these are binary (dichotomous)
    variables, which have only two categories
    (e.g. GENDER: 1=male, 2=female)
  • Ordinal variables
  • The data are ordered (e.g. AGE: 1=10-19 years,
    2=20-29 years, 3=30-39 years; likelihood of
    participating in a vaccine trial)

14
Types of data
  • Numerical (quantitative) variables: naturally
    measured as numbers, for which arithmetic
    operations make sense (e.g. height, weight, age,
    salary, viral load, CD4 cell counts)
  • Discrete variables: can be counted (e.g. number
    of children in household: 0, 1, 2, 3, etc.)
  • Continuous variables: can take any value within a
    given range (e.g. weight: 2974.5 g, 3012.6 g)

15
Types of data
  • Manipulation of variables
  • Continuous variables can be discretized
  • E.g., age can be rounded to whole numbers
  • Continuous or discrete variables can be
    categorized
  • E.g., age categories
  • Categorical variables can be re-categorized
  • E.g., lumping from 5 categories down to 2
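
The three manipulations above can be sketched in a few lines. This is a minimal Python illustration (the course itself uses Stata); the ages, the 10-year bands, and the two-way split are assumed values chosen for the example, not course data.

```python
# Hypothetical ages in years (illustrative values only)
ages = [23.4, 35.9, 41.2, 18.7, 52.0]

# Discretize: round continuous ages to whole years
ages_rounded = [round(a) for a in ages]


def age_category(age):
    """Categorize an age into bands (assumed cutpoints at 20, 30 and 40)."""
    if age < 20:
        return "under 20"
    if age < 30:
        return "20-29"
    if age < 40:
        return "30-39"
    return "40+"


age_cats = [age_category(a) for a in ages]


def lump(category):
    """Re-categorize: collapse the four bands into two."""
    return "under 30" if category in ("under 20", "20-29") else "30 and over"


age_two = [lump(c) for c in age_cats]

print(ages_rounded)
print(age_cats)
print(age_two)
```

Each step discards information: the rounded ages cannot be un-rounded, and the two-category variable cannot be expanded back into four.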

16
Frequency tables
  • Categorical variables are summarized by:
  • Frequency counts: how many are in each category
  • Relative frequency, as a percent (a number from 0
    to 100)
  • Or as a proportion (a number from 0 to 1)

Gender of new HIV clinic patients, 2006-2007, Mbarara, Uganda
          n (%)
Male      415 (39)
Female    645 (61)
Total     1060 (100)
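
As a check on the arithmetic, the counts, proportions, and percents in the gender table can be recomputed from the two counts alone. A minimal Python sketch (not the course's Stata); the variable names are illustrative.

```python
from collections import Counter

# Rebuild the gender variable from the counts in the table
gender = ["Male"] * 415 + ["Female"] * 645

counts = Counter(gender)      # frequency counts
n = sum(counts.values())      # total number of patients

# For each category: (count, proportion, percent)
table = {
    category: (count, count / n, 100 * count / n)
    for category, count in counts.items()
}

for category, (count, proportion, percent) in table.items():
    print(f"{category:<8}{count:>5}  {proportion:.3f}  {percent:.0f}%")
print(f"{'Total':<8}{n:>5}  1.000  100%")
```

The proportion and the percent carry the same information; the percent is just the proportion multiplied by 100.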
17
Frequency tables
  • Continuous variables can be categorized in
    meaningful ways
  • Choice of cutpoints
  • Even intervals
  • Meaningful cutpoints related to a health outcome
    or decision
  • Equal percentage of the data falling into each
    category
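
The three cutpoint strategies give very different tables from the same data. The Python sketch below uses twelve made-up CD4 counts; the width of 350, the thresholds 50/200/350, and the convention that a value equal to a cutpoint falls in the lower category are all assumptions made for illustration.

```python
import statistics
from bisect import bisect_left

# Hypothetical CD4 counts (cells/mm3); illustrative only
cd4 = [12, 48, 95, 150, 210, 260, 340, 410, 520, 780, 905, 1368]

# 1. Even intervals: a fixed width of 350
even_cuts = list(range(350, max(cd4) + 1, 350))

# 2. Cutpoints tied to a health outcome or decision (assumed thresholds)
clinical_cuts = [50, 200, 350]

# 3. Equal percentages: quartiles put about 25% of the data in each category
quartile_cuts = statistics.quantiles(cd4, n=4)


def count_by_category(values, cuts):
    """Count values per category; a value equal to a cutpoint goes in the lower one."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_left(cuts, v)] += 1
    return counts


for name, cuts in [("even", even_cuts),
                   ("clinical", clinical_cuts),
                   ("quartiles", quartile_cuts)]:
    print(name, cuts, count_by_category(cd4, cuts))
```

Only the quartile cutpoints depend on the data themselves, which is why they balance the categories but are harder to compare across studies.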

18
Frequency tables
CD4 cell counts (cells/mm3) of newly diagnosed HIV positives at Mulago Hospital, Kampala (N=268)
           n (%)
<50        40 (14.9)
50-200     72 (26.9)
201-350    58 (21.6)
>350       98 (36.6)
19
Bar charts
  • General graph for categorical variables
  • Graphical equivalent of a frequency table
  • The x-axis does not have to be numerical

20
Histograms
  • Bar chart for numerical data. The number of bins
    and the bin width affect the appearance of the
    plot and may affect its interpretation

* In a do-file, /// continues a command on the next line
histogram cd4count, fcolor(blue) lcolor(black) ///
    width(50) name(cd4_by50) ///
    title("CD4 among new HIV positives at Mulago") ///
    xtitle("CD4 cell count") percent
21
Histograms
  • This histogram has less detail but gives us the
    % of persons with CD4 <350 cells/mm3

* In a do-file, /// continues a command on the next line
histogram cd4count, fcolor(blue) lcolor(black) ///
    width(350) name(cd4_by350) ///
    title("CD4 among new HIV positives at Mulago") ///
    xtitle("CD4 cell count") percent
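
To see why bin width matters, the bin percentages can be computed by hand. This Python sketch uses made-up CD4 counts and assumes bins start at 0 (Stata chooses its own starting point unless told otherwise), so it illustrates the idea rather than reproducing the plots.

```python
from collections import Counter


def histogram_percent(values, width):
    """Percent of observations in each bin [start, start + width), bins starting at 0."""
    bins = Counter((v // width) * width for v in values)
    n = len(values)
    return {start: 100 * count / n for start, count in sorted(bins.items())}


# Hypothetical CD4 counts (cells/mm3); illustrative only
cd4 = [12, 48, 95, 150, 210, 260, 340, 410, 520, 780, 905, 1368]

by50 = histogram_percent(cd4, 50)    # many narrow bins: more detail
by350 = histogram_percent(cd4, 350)  # few wide bins: less detail

print(len(by50), "non-empty bins of width 50")
print(len(by350), "non-empty bins of width 350")
print(f"Percent with CD4 < 350: {by350[0]:.1f}")
```

With a width of 350 the first bar is exactly the group below 350 cells/mm3, so its height answers a specific question that the width-50 plot leaves spread across several bars.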
22
  • What does this graph tell us?

23
Box plots
  • Middle line = median (50th percentile)
  • Middle box = 25th to 75th percentiles
    (interquartile range, IQR)
  • Bottom whisker: lowest data point at or above
    the 25th percentile − 1.5×IQR
  • Top whisker: highest data point at or below
    the 75th percentile + 1.5×IQR
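
The box plot elements listed above can be computed directly. A minimal Python sketch on made-up data; it relies on the default percentile rule of `statistics.quantiles`, which is an assumption here and may differ slightly from the rule Stata uses.

```python
import statistics


def box_plot_summary(values):
    """Median, quartiles, whisker ends, and points plotted beyond the whiskers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_whisker": min(inside),   # lowest point at or above the lower fence
        "upper_whisker": max(inside),   # highest point at or below the upper fence
        "outside": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical data with one extreme value
summary = box_plot_summary([2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 40])
print(summary)
```

The whiskers end at actual data points, not at the fences themselves; here the upper fence is 20, but the upper whisker stops at 12 and the value 40 is drawn as a separate point.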

24
Box plots
* In a do-file, /// continues a command on the next line
graph box cd4count, ///
    box(1, fcolor(blue) lcolor(black) fintensity(inten100)) ///
    title("CD4 count among new HIV positives at Mulago")
25
Box plots by another variable
  • We can divide up our graphs by another variable
  • What type of variable is gender?

26
Histograms by another variable
27
Numerical variable summaries
  • Mode: the value (or range of values) that occurs
    most frequently
  • Sometimes there is more than one mode, e.g. a
    bi-modal distribution (the two modes do not have
    to be the same height)
  • The mode only makes sense when the values are
    discrete, rounded off, or binned
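
A short Python sketch of these points, using made-up values. Note one limitation: `statistics.multimode` reports only values tied for the highest count, so it would not flag a bi-modal distribution whose two peaks differ in height.

```python
import statistics

# Hypothetical number of children per household (a discrete variable)
children = [0, 1, 1, 2, 2, 2, 3, 4]
single_mode = statistics.mode(children)

# Two values tie for most frequent, so there are two modes
visits = [1, 1, 1, 2, 3, 4, 4, 4, 5]
all_modes = statistics.multimode(visits)

# Continuous values rarely repeat exactly, so bin them first (100 g bins)
weights_g = [2974.5, 3012.6, 3051.2, 3088.1, 3420.0]
binned = [int(w // 100) * 100 for w in weights_g]
modal_bin = statistics.mode(binned)

print(single_mode, all_modes, modal_bin)
```

Without binning, every weight in the list occurs exactly once and no value stands out as the mode.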

Pagano and Gauvreau, Chapter 3
28
Scatter plots
29
The importance of good graphs
http://niemann.blogs.nytimes.com/2009/09/14/good-night-and-tough-luck/
30
Numerical variable summaries
  • Measures of central tendency: where is the
    center of the data?
  • Median: the 50th percentile, the middle value
  • If n is odd, the median is the (n+1)/2-th
    observation in sorted order (e.g. if n=31 then
    the median is the 16th observation)
  • If n is even, the median is the average of the
    two middle observations (e.g. if n=30 then the
    median is the average of the 15th and 16th
    observations)
  • Median CD4 cell count in previous data set: 234.5
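
The odd and even cases can be written out as a small function. A minimal Python sketch; the test data are simply the integers 1 to 31 and 1 to 30, chosen so that the position of the median is easy to read off.

```python
import statistics


def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                     # the (n+1)/2-th observation, counting from 1
    return (data[mid - 1] + data[mid]) / 2   # average of the n/2-th and (n/2 + 1)-th


odd_case = median(range(1, 32))    # n = 31: the 16th observation
even_case = median(range(1, 31))   # n = 30: average of the 15th and 16th

print(odd_case, even_case)
```

Python's built-in `statistics.median` follows the same rule and can be used instead of the hand-written function.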

31
Numerical variable summaries
  • Range
  • Minimum to maximum, or their difference (e.g. age
    range 15-58, or range=43)
  • CD4 cell count range: 0-1368
  • Interquartile range (IQR)
  • 25th and 75th percentiles (e.g. IQR for age
    23-36), or their difference (e.g. 13)
  • Less sensitive to extreme values
  • CD4 cell count IQR: 92-422

32
Numerical variable summaries
  • Measures of central tendency: where is the
    center of the data?
  • Mean: arithmetic average
  • Means are sensitive to very large or small values
  • Mean CD4 cell count: 296.9
  • Mean age: 32.5

33
Interpreting the formula
  • Σ is the symbol for the sum of the elements
    immediately to the right of the symbol
  • These elements are indexed (i.e. subscripted)
    with the letter i
  • (The index letter could be any letter, though i
    is commonly used)
  • The elements are lined up in a list: the first
    one in the list is denoted $x_1$, the second one
    is $x_2$, the third one is $x_3$, and the last
    one is $x_n$
  • n is the number of elements in the list
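
The formula this slide interprets did not survive in the transcript. Given the bullets and the preceding slide on the mean, it is presumably the sample mean; under that assumption, and writing $\bar{x}$ for the mean (a symbol assumed here), it would read:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}
```

The index $i$ runs from 1 to $n$, so the sum adds every element of the list exactly once before dividing by the number of elements.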

34
Numerical variable summaries
  • Sample variance
  • Amount of spread around the mean, calculated in a
    sample by $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$,
    where $\bar{x}$ is the sample mean
  • Sample standard deviation (SD) is the square root
    of the variance
  • The standard deviation has the same units as the
    mean
  • SD of CD4 cell count: 255.4
  • SD of age: 11.2
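
A worked example may help with the sample variance and SD. This Python sketch uses five made-up ages and the usual sample definition, dividing the sum of squared deviations from the mean by n − 1; the function name is illustrative.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical ages in years; mean is 32
ages = [22, 25, 31, 36, 46]

var = sample_variance(ages)   # in years squared
sd = math.sqrt(var)           # back in years, the same units as the mean

print(f"variance = {var:.1f}, SD = {sd:.2f}")
```

The squared deviations are 100, 49, 1, 16 and 196, which sum to 362; dividing by 4 gives a variance of 90.5 and an SD of about 9.5 years.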

35
Numerical variable summaries
  • Coefficient of variation (CV): the standard
    deviation as a percent of the mean
  • For the same relative spread around a mean, the
    variance will be larger for a larger mean
  • Can be used to compare variability across
    measurements that are on different scales (e.g.
    IQ and head circumference)
  • CV for CD4 cell count: 86.0%
  • CV for age: 34.5%
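
The two CV values can be reproduced from the means and SDs reported earlier in the lecture, taking the CV to be the SD expressed as a percent of the mean. A minimal Python sketch; the function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percent of the mean."""
    return 100 * sd / mean


# Means and SDs reported in this lecture
cv_cd4 = coefficient_of_variation(sd=255.4, mean=296.9)
cv_age = coefficient_of_variation(sd=11.2, mean=32.5)

print(f"CV for CD4 cell count: {cv_cd4:.1f}%")
print(f"CV for age: {cv_age:.1f}%")
```

Because the units cancel, the two percentages can be compared directly even though CD4 counts and ages are measured on different scales: CD4 counts are far more variable relative to their mean.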

36
Pocket/wallet change
  • Histogram, box plot
  • Mode, Median, 25th percentile, 75th percentile
  • Mean, SD
  • Differ by gender?

37
For next time
  • Read Pagano and Gauvreau
  • Chapters 1-3 (Review of today's material)
  • Chapter 6