Biostatistics Academic Preview

Slides: 53
Provided by: university57
1

Biostatistics Academic Preview Session 2
Descriptive Statistics
2
Outline
  • Descriptive statistics
  • The what and why of descriptive statistics
  • Types of variables
  • Formulas and interpretations of commonly used
    descriptive statistics
  • Pictorial representations of descriptive
    statistics
  • Examining the relationship between two or more
    variables

3
Descriptive Statistics
  • Used to describe the basic features of the data
    in the study
  • Types of variables
  • Summary statistics
  • Distribution of variables
  • Pictorial representation
  • Allows you to get a feel for the data

4
Purpose of Descriptive Statistics
  • Characterize subjects in a study
  • Sample size
  • Patterns of sampling
  • Summary measures
  • Distribution
  • Finding errors in data collection or data entry
  • Impossible, improbable, or inappropriate values
  • Values too high or too low
  • Outliers
  • Strange combinations
  • Missing data
  • Response rates

5
Purpose (cont)
  • Validity of assumptions
  • Distribution
  • Outliers
  • Equal variance
  • Linearity
  • Hypothesis generating
  • Exploring unanticipated effects
  • Difference in effects across subgroups
  • Characterization of dose response
  • Linear
  • Exponential

6
Types of Descriptive statistics
  • Univariate
  • Describing one variable
  • Bivariate
  • Describing two variables simultaneously
  • Trivariate
  • Describing three variables simultaneously

7
Types of variables
8
Definitions
  • Variable: a characteristic that changes or varies
    over time and/or across the different subjects
    under consideration.
  • Changing over time: blood pressure, height,
    weight
  • Changing across a population: gender,
    race/ethnicity

9
Definitions (cont)
  • Quantitative (numeric) variables measure a
    numerical quantity or amount on each experimental
    unit
  • Qualitative (categorical) variables measure a
    non-numeric quality or characteristic on each
    experimental unit by classifying each subject
    into a category

10
Quantitative variables
  • Discrete variables can only take values from a
    list of possible values
  • Example: number of co-morbidities
  • Continuous variables can assume the infinitely
    many values corresponding to the points on a line
    interval
  • Examples: weight, height

11
Categorical variables
  • Nominal: unordered categories
  • Race/ethnicity
  • Gender
  • Ordinal: ordered categories
  • Likert scales (disagree, neutral, agree)
  • Income categories

12
Univariate statistics (numerical variables)
  • Summary measures
  • Measures of location
  • Measures of spread
  • Overall pattern (distribution)
  • Unimodal (one major peak) vs. bimodal (two peaks)
  • Symmetric vs. skewed
  • Outliers: individual values that fall outside
    the overall pattern

13
Summary statistics: Measures of central tendency
(location)
  • Mean: The mean of a data set is the sum of the
    observations divided by the number of
    observations
  • Population mean: $\mu$, the mean taken over every
    member of the population. Sample mean:
    $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ for a
    sample of N observations
  • Median: The median of a data set is the middle
    value
  • For an odd number of observations, the median is
    the observation exactly in the middle of the
    ordered list
  • For an even number of observations, the median is
    the mean of the two middle observations in the
    ordered list
  • Mode: The mode is the single most frequently
    occurring data value
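The three measures of location above can be sketched with Python's standard `statistics` module. The `ages` list is a made-up illustration, not data from the course.

```python
import statistics

# Hypothetical ages of nine subjects (illustrative values only).
ages = [23, 25, 25, 31, 34, 38, 41, 47, 60]

mean_age = statistics.mean(ages)      # sum of the observations / number of observations
median_age = statistics.median(ages)  # middle value of the ordered list (odd count)
mode_age = statistics.mode(ages)      # most frequently occurring value

# With an even number of observations, the median is the mean
# of the two middle values of the ordered list.
even_median = statistics.median([23, 25, 31, 34])

print(mean_age, median_age, mode_age, even_median)
```

Here the mean (36) sits above the median (34), which already hints at a longer right tail in these made-up ages.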

14
Skewness
  • The skewness of a distribution can be gauged by
    comparing the relative positions of the mean,
    median and mode.
  • Distribution is symmetrical
  • Mean = Median = Mode
  • Distribution skewed right
  • Median lies between mode and mean, and mode is
    less than mean
  • Distribution skewed left
  • Median lies between mode and mean, and mode is
    greater than mean
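As a small check of the right-skewed pattern described above, the sketch below compares mode, median, and mean for a made-up sample with a long right tail. The variable name `lengths_of_stay` and its values are assumptions chosen for illustration. The ordering is a rule of thumb and can fail for unusual distributions.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
lengths_of_stay = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mode_value = statistics.mode(lengths_of_stay)
median_value = statistics.median(lengths_of_stay)
mean_value = statistics.mean(lengths_of_stay)

# For a right-skewed sample we expect mode < median < mean.
follows_right_skew_pattern = mode_value < median_value < mean_value
print(mode_value, median_value, mean_value, follows_right_skew_pattern)
```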

15
Relative positions of the mean and median for (a)
right-skewed, (b) symmetric, and (c) left-skewed
distributions
Note: The mean is sensitive to skewness and
outliers. If the data are markedly skewed, it is
usually better to report the median as the measure
of location.
16
Summary statistics: Measures of spread (scale)
  • Variance: The average of the squared deviations
    of each sample value from the sample mean, except
    that instead of dividing the sum of the squared
    deviations by the sample size N, the sum is
    divided by N-1.
  • Standard deviation: The square root of the sample
    variance
  • Range: The difference between the maximum and
    minimum values in the sample.
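The definitions above translate almost line for line into code. This is a minimal sketch on a made-up sample (`values` is hypothetical); the standard library's `statistics.variance` and `statistics.stdev` use the same N-1 divisor.

```python
import math
import statistics

values = [4, 8, 6, 5, 7]  # hypothetical sample
n = len(values)
sample_mean = sum(values) / n

# Sum of squared deviations from the sample mean, divided by N-1 rather than N.
squared_deviations = [(x - sample_mean) ** 2 for x in values]
sample_variance = sum(squared_deviations) / (n - 1)

sample_sd = math.sqrt(sample_variance)   # standard deviation
value_range = max(values) - min(values)  # range

print(sample_variance, sample_sd, value_range)
print(statistics.variance(values), statistics.stdev(values))
```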

17
Normal curves: same mean but different standard
deviations
18
Summary statistics: Measures of spread (scale)
  • We can describe the spread of a distribution by
    using percentiles.
  • The pth percentile of a distribution is the value
    such that p percent of the observations fall at
    or below it.
  • Median = 50th percentile
  • Quartiles divide data into four equal parts.
  • First quartile = Q1
  • 25% of observations are below Q1 and 75% above Q1
  • Second quartile = Q2
  • 50% of observations are below Q2 and 50% above Q2
  • Third quartile = Q3
  • 75% of observations are below Q3 and 25% above Q3
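One way to turn the percentile definition into code is the nearest-rank rule: take the smallest observation that has at least p percent of the data at or below it. This is only one of several conventions, and statistical software often interpolates between observations instead. For an even number of observations it can also differ from the median defined as the mean of the two middle values. The `percentile` helper and the `data` values are illustrative assumptions.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest observation such that
    at least p percent of the observations fall at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

data = [12, 15, 11, 20, 18, 14, 16, 19]  # hypothetical sample of 8 values

q1 = percentile(data, 25)  # first quartile
q2 = percentile(data, 50)  # second quartile (nearest-rank version of the median)
q3 = percentile(data, 75)  # third quartile
print(q1, q2, q3)
```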

19
Quartiles
20
Five-number summary
  • Maximum
  • Minimum
  • Median = 50th percentile
  • Lower quartile = Q1 = 25th percentile
  • Upper quartile = Q3 = 75th percentile
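A five-number summary can be sketched as below. The quartiles here are computed as the medians of the lower and upper halves of the ordered data, which is one common convention and an assumption of this sketch; other quartile rules give slightly different values. The function name and the sample values are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum, with Q1 and Q3 taken as the
    medians of the lower and upper halves of the ordered data.
    Assumes at least two observations."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For an odd count, the middle observation is left out of both halves.
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return {
        "minimum": ordered[0],
        "Q1": statistics.median(lower_half),
        "median": statistics.median(ordered),
        "Q3": statistics.median(upper_half),
        "maximum": ordered[-1],
    }

summary = five_number_summary([7, 1, 3, 9, 5, 11, 13])
print(summary)
```

These five values are also what a box plot draws: the box spans Q1 to Q3 with a line at the median.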

21
Graphical display of numerical variables (histogram)
Class Interval    Frequency
20-under 30       6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70       3
70-under 80       1
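The frequency table on this slide is enough to draw a rough text histogram. The sketch below uses the class intervals and frequencies as they appear in the table; the function name `text_histogram` is an illustrative choice.

```python
# Class intervals and frequencies as listed in the slide's table.
frequency_table = {
    "20-under 30": 6,
    "30-under 40": 18,
    "40-under 50": 11,
    "50-under 60": 11,
    "60-under 70": 3,
    "70-under 80": 1,
}

total = sum(frequency_table.values())

def text_histogram(table):
    """Return one line per class interval, with one '*' per observation."""
    return [f"{interval:<12} {'*' * count} ({count})"
            for interval, count in table.items()]

for line in text_histogram(frequency_table):
    print(line)
print("Total observations:", total)
```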
22
Graphical display of numerical variables (stem
and leaf plot)
[Figure: stem and leaf plot of the raw data, with
stems 2 through 9]
23
Graphical display of numerical variables (box
plot)
[Figure: box plot, with the median labeled]
24
Graphical display of numerical variables (box
plot)
25
Univariate statistics (categorical variables)
  • Summary measures
  • Count = frequency
  • Percent = (frequency / total sample size) x 100
  • The distribution of a categorical variable lists
    the categories and gives either a count or a
    percent of individuals who fall in each category
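Counts and percents for a categorical variable can be tabulated with `collections.Counter`. The `responses` list is made-up Likert-style data used only for illustration.

```python
from collections import Counter

# Hypothetical responses to a single Likert-type item.
responses = ["agree", "neutral", "agree", "disagree",
             "agree", "neutral", "agree", "disagree"]

counts = Counter(responses)  # count = frequency of each category
total = len(responses)

# Percent = frequency / total sample size, times 100.
percents = {category: 100 * count / total for category, count in counts.items()}

for category in counts:
    print(f"{category:<10} {counts[category]:>3} {percents[category]:>6.1f}%")
```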

26
Displaying categorical variables
27
(No Transcript)
28
Bivariate relationships
  • An extension of univariate descriptive statistics
  • Used to detect evidence of association in the
    sample
  • Two variables are said to be associated if the
    distribution of one variable differs across
    groups or values defined by the other variable

29
Bivariate Relationships
  • Two quantitative variables
  • Scatter plot
  • Side by side stem and leaf plots
  • Two qualitative variables
  • Tables
  • Bar charts
  • One quantitative and one qualitative variable
  • Side by side box plots
  • Bar chart

30
Response and explanatory variables
  • Response variable: the variable which we intend
    to model
  • We intend to explain it through statistical
    modeling
  • Explanatory variable: the variable or variables
    which may be used to model the response variable
  • Its values may be related to the response
    variable

31
Two quantitative variables: Correlation
A relationship between two variables.
Explanatory (independent) variable x    Response (dependent) variable y
Hours of training                       Number of accidents
Shoe size                               Height
Cigarettes smoked per day               Lung capacity
Score on SAT                            Grade point average
Height                                  IQ
What type of relationship exists between the two
variables and is the correlation significant?
32
Scatter Plots and Types of Correlation
x = hours of training, y = number of accidents
Negative correlation: as x increases, y decreases
33
Scatter Plots and Types of Correlation
x = SAT score, y = GPA
Positive correlation: as x increases, y increases
34
Scatter Plots and Types of Correlation
x = height, y = IQ
No linear correlation
35
Correlation Coefficient
A measure of the strength and direction of a
linear relationship between two variables,
denoted r
The range of r is from -1 to 1.
If r is close to 1 there is a strong positive
correlation
If r is close to -1 there is a strong negative
correlation
If r is close to 0 there is little or no linear
correlation
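The slides do not give a formula for r. A minimal sketch, assuming r is the usual Pearson correlation coefficient, is shown below. The function name `pearson_r` and the training and accident values are hypothetical, chosen to mimic the negative relationship in the hours-of-training example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hours_of_training = [1, 2, 3, 4, 5]    # hypothetical x values
number_of_accidents = [9, 8, 6, 4, 3]  # hypothetical y values

r = pearson_r(hours_of_training, number_of_accidents)
print(round(r, 3))  # close to -1: strong negative linear correlation
```

A value near 0 only rules out a linear relationship; two variables can still be related in a curved pattern.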
36
Positive and negative correlation
  • 1. If two variables x and y are positively
    correlated, this means that
  • large values of x tend to be associated with
    large values of y, and
  • small values of x tend to be associated with
    small values of y
  • 2. If two variables x and y are negatively
    correlated, this means that
  • large values of x tend to be associated with
    small values of y, and
  • small values of x tend to be associated with
    large values of y

37
Positive correlation
38
Negative correlation
39
Two qualitative variables (Contingency Tables)
  • Categorical data is usually displayed using a
    contingency table, which shows the frequency of
    each combination of categories observed in the
    data
  • The rows correspond to the categories of the
    explanatory variable
  • The columns correspond to the categories of the
    response variable
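A contingency table can be built by counting each combination of categories. The sketch below uses the explanatory and response categories from the aspirin example, but the individual `records` are invented for illustration and are not the study's data.

```python
from collections import Counter

# Hypothetical records: (drug received, heart attack status).
records = [
    ("placebo", "yes"), ("placebo", "no"), ("placebo", "no"), ("placebo", "yes"),
    ("aspirin", "no"), ("aspirin", "no"), ("aspirin", "yes"), ("aspirin", "no"),
]

cell_counts = Counter(records)  # frequency of each combination of categories

row_levels = ["placebo", "aspirin"]  # explanatory variable defines the rows
column_levels = ["yes", "no"]        # response variable defines the columns

table = {row: {col: cell_counts[(row, col)] for col in column_levels}
         for row in row_levels}

# Row proportions: share of each treatment group with a heart attack.
heart_attack_proportion = {row: table[row]["yes"] / sum(table[row].values())
                           for row in row_levels}

print(table)
print(heart_attack_proportion)
```

Comparing the row proportions is a simple way to see whether the distribution of the response differs across levels of the explanatory variable.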

40
Example
  • Aspirin and Heart Attacks
  • Explanatory variable: drug received
  • Placebo
  • Aspirin
  • Response variable: heart attack status
  • Yes
  • No

41
Contingency table: heart attack example
42
Two qualitative variables
Marijuana use in college: x = parental use,
y = student use
43
Case Study 1: Mean birth weight by race
44
One quantitative, one qualitative
Box plot of age by low birth weight
Mean age by low birth weight
45
Case Study 1: Birth weight and age
r = 0.09
46
Trivariate Relationships
  • An extension of bivariate descriptive statistics
  • We focus on description that helps us decide
    about the role variables might play in the
    ultimate statistical analyses
  • Identify variables that can increase the
    precision of the data analysis used to assess
    associations between two other variables

47
Confounding and effect modification
  • A factor, Z, is said to confound a relationship
    between a risk factor, X, and an outcome, Y, if
    it is not an effect modifier and the unadjusted
    strength of the relationship between X and Y
    differs from the common strength of the
    relationship between X and Y for each level of Z.
  • A factor, Z, is said to be an effect modifier of
    a relationship between a risk factor, X, and an
    outcome measure, Y, if the strength of the
    relationship between the risk factor, X, and the
    outcome, Y, varies among the levels of Z.
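The definition of confounding can be illustrated numerically. The sketch below assumes the "strength of the relationship" is measured by a risk ratio, which the slides do not specify, and uses invented counts. Within each level of Z the risk ratio is 1, yet the unadjusted (crude) risk ratio is well above 1, so Z confounds the X-Y relationship. If the stratum-specific ratios differed from each other instead, Z would be an effect modifier.

```python
def risk(events, total):
    """Proportion of subjects who experienced the outcome Y."""
    return events / total

def risk_ratio(exposed, unexposed):
    """Risk among the exposed divided by risk among the unexposed.
    Each argument is an (events, total) pair."""
    return risk(*exposed) / risk(*unexposed)

# Hypothetical counts of (events, total) by exposure X within each level of Z.
strata = {
    "Z = 0": {"exposed": (10, 100), "unexposed": (40, 400)},
    "Z = 1": {"exposed": (160, 400), "unexposed": (40, 100)},
}

# Strength of the X-Y relationship within each level of Z.
stratum_rr = {level: risk_ratio(groups["exposed"], groups["unexposed"])
              for level, groups in strata.items()}

def pooled(group):
    """Collapse over Z: total events and total subjects for one exposure group."""
    events = sum(groups[group][0] for groups in strata.values())
    total = sum(groups[group][1] for groups in strata.values())
    return events, total

# Unadjusted strength of the X-Y relationship, ignoring Z.
crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))

print(stratum_rr)
print(round(crude_rr, 3))
```

In these made-up data, Z is associated with both the exposure (most exposed subjects have Z = 1) and the outcome (risk is higher when Z = 1), which is what produces the distorted crude ratio.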

48
Example confounding
  • In our low birth weight data suppose we wish to
    investigate the association between race and low
    birth weight.
  • Our ability to detect this association might be
    affected by
  • Smoking status being associated with low birth
    weight
  • Smoking status being associated with race

49
Case Study 1: Race and smoking status
50
Case Study 1: Race, smoking status, LBW
Smokers
Non-smokers
51
Multivariate Statistics
  • Allows one to calculate the association between
    an explanatory variable and an outcome of
    interest, after controlling for potential
    confounders.
  • Allows one to assess the association between
    an outcome and multiple explanatory variables of
    interest.

Statistical Models
52
Next Session
  • The what and why of statistical inference
  • Statistical estimation and confidence
    intervals
  • Statistical significance tests