Introduction to Biostatistics: Data Collection. Descriptive Statistics - PowerPoint PPT Presentation

Slides: 34
Provided by: Thoma402
Learn more at: https://sites.pitt.edu
Transcript and Presenter's Notes

Title: Introduction to Biostatistics: Data Collection. Descriptive Statistics


1
Introduction to Research Methods in the Internet Era
Introduction to Biostatistics:
Data Collection and Descriptive Statistics
Thomas Songer, PhD, with acknowledgment to several
slides provided by M Rahbar and Moataza Mahmoud
Abdel Wahab
2
Key Lecture Concepts
  • Distinguish between different strategies for
    obtaining a sample from a population
  • Distinguish between different forms of data
    collection
  • Identify key approaches to organize and portray
    your data
  • Understand the measures of central tendency and
    variability in your data

3
Descriptive and Inferential Statistics
Descriptive statistics deal with the
enumeration, organization, and graphical
representation of data from a sample.
Inferential statistics deal with reaching
conclusions from incomplete information, that is,
generalizing from the specific sample. They use
the available information in a sample to draw
inferences about the population from which the
sample was selected.
Rahbar
4
Epidemiology is
  • The study of disease and its treatment, control,
    and prevention in a population of individuals.
  • Whole populations may be examined, but
  • More frequently, samples of the population may be
    examined. Samples that are studied must be
    representative of the population for the results
    to be generalized to the total population.

Torrence 1997
5
Hypothetical Population
Sample 1: Representative? Y / N
Sample 2: Representative? Y / N
Sample 3: Representative? Y / N
6
Sampling Approaches
  • Convenience Sampling: select the most accessible
    and available subjects in the target population.
    Inexpensive and less time consuming, but the
    sample is nearly always non-representative of the
    target population.
  • Random Sampling (Simple): select subjects at
    random from the target population. Requires
    identifying everyone in the target population
    first. Frequently provides a representative sample.

7
Sampling Approaches
  • Systematic Sampling: identify everyone in the
    target population, and select every xth person
    as a subject.
  • Stratified Sampling: identify important
    sub-groups in your target population. Sample
    from these groups randomly or by convenience.
    Ensures that important sub-groups are included in
    the sample. May not be representative.
  • More complex sampling
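
The sampling approaches on the last two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 subject IDs, the two sub-groups "A" and "B", and the helper names are all hypothetical and do not come from the lecture.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Select k subjects uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Select every step-th subject, beginning at index start."""
    return population[start::step]

def stratified_sample(population, stratum_of, k_per_stratum, seed=0):
    """Randomly select up to k_per_stratum subjects from each sub-group."""
    rng = random.Random(seed)
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    return {
        name: rng.sample(members, min(k_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical target population: 100 subject IDs in two alternating sub-groups.
population = list(range(100))

def group_of(subject_id):
    return "A" if subject_id % 2 == 0 else "B"

srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, step=10)
stratified = stratified_sample(population, group_of, k_per_stratum=5)
```

Note that all three helpers need the full list of the target population up front, which is the practical cost the slides mention for random and systematic sampling.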

8
Sampling Error
  • The discrepancy between the true population
    parameter and the sample statistic
  • Sampling error likely exists in most studies, but
    can be reduced by using larger sample sizes
  • Sampling error approximates 1 / √n
  • Note that larger sample sizes also require time
    and expense to obtain, and that large sample
    sizes do not eliminate sampling error
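
A small sketch of the 1 / √n rule of thumb may help show why larger samples reduce sampling error without eliminating it. The function name and the sample sizes are illustrative assumptions, not part of the lecture.

```python
import math

def approx_sampling_error(n):
    """Rule-of-thumb scale of sampling error for a sample of size n: 1 / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)

# Each step quadruples the (hypothetical) sample size.
for n in (25, 100, 400, 1600):
    print(f"n = {n:5d}  ->  1/sqrt(n) = {approx_sampling_error(n):.3f}")
```

Quadrupling the sample size only halves the approximate error, and the value stays above zero for any finite n.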

9
Research Process
Research question
Hypothesis
Identify research design
Data collection
Presentation of data
Data analysis
Interpretation of data
Polgar, Thomas
10
Types of Data Collection
  • Surveys/Questionnaires
    • Self-report
    • Interviewer-administered
    • Proxy
  • Direct medical examination
  • Direct measurement (e.g., blood draws)
  • Administrative records

11
Understanding and Presenting Data
12
Types of Data
  1. Categorical (e.g., Sex, Marital Status, income
    category)
  2. Continuous (e.g., Age, income, weight, height,
    time to achieve an outcome)
  3. Discrete (e.g., Number of Children in a family)
  4. Binary or Dichotomous (e.g., responses to Yes
    or No questions)

13
Brain Size and IQ
What types of data do these variables represent?
Gender  FSIQ  VIQ  PIQ  Weight  Height  MRI Count
Female   133  132  124     118    64.5     816932
Male     140  150  124     124    72.5    1001121
Male     139  123  150     143    73.3    1038437
Male     133  129  128     172    68.8     965353
Female   137  132  134     147    65       951545
Female    99   90  110     146    69       928799
Female   138  136  131     138    64.5     991305
Female    92   90   98     175    66       854258
Male      89   93   84     134    66.3     904858
Male     133  114  147     172    68.8     955466
Female   132  129  124     118    64.5     833868
14
Scale of Data
  1. Nominal: These data do not represent an amount
    or quantity (e.g., Marital Status, Sex)
  2. Ordinal: These data represent an ordered series
    of relationships (e.g., level of education)
  3. Interval: These data are measured on an interval
    scale having equal units but an arbitrary zero
    point (e.g., Temperature in Fahrenheit)
  4. Ratio: Variables such as weight, for which we
    can meaningfully compare one value with another
    (say, 100 kg is twice 50 kg)
15
Organizing Data and Presentation
  • Frequency Table
  • Frequency Histogram
  • Relative Frequency Histogram
  • Frequency polygon
  • Relative Frequency polygon
  • Bar chart
  • Pie chart
  • Box plot

16
Frequency Table
  • Generally, the first approach to examining your
    data.
  • Identifies distribution of variables overall
  • Identifies potential outliers
  • Investigate outliers as possible data entry
    errors
  • Investigate a sample of others for data entry
    errors

17
Frequency Table
A research study has been conducted examining
the number of children in the families living in
a community. The following data have been
collected based on a random sample of n = 30
families from the community:
2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7,
3, 2, 4, 1, 0, 5, 8, 6, 5, 4, 2, 4, 4, 7, 6
Organize these data in a Frequency Table!
18
X = No. of Children   Count (Frequency)   Relative Freq.
0                     2                   2/30 = 0.067
1                     3                   3/30 = 0.100
2                     5                   5/30 = 0.167
3                     5                   5/30 = 0.167
4                     6                   6/30 = 0.200
5                     4                   4/30 = 0.133
6                     2                   2/30 = 0.067
7                     2                   2/30 = 0.067
8                     1                   1/30 = 0.033
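
The same tally can be produced programmatically. The sketch below uses the 30 observations from the previous slide; the function name `frequency_table` is an illustrative choice, not something from the lecture.

```python
from collections import Counter

# Number of children in each of the n = 30 sampled families.
children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7,
            3, 2, 4, 1, 0, 5, 8, 6, 5, 4, 2, 4, 4, 7, 6]

def frequency_table(values):
    """Return (value, count, relative frequency) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(x, counts[x], counts[x] / n) for x in sorted(counts)]

table = frequency_table(children)
for x, count, rel in table:
    print(f"{x:>2}  {count:>2}  {count}/{len(children)} = {rel:.3f}")
```

The relative frequencies sum to 1, which is a quick check that no observation was dropped.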
19
Frequency Table
Now, construct a similar frequency table for the
age of patients with heart-related problems in a
clinic. The following data have been collected
based on a random sample of n = 30 patients who
went to the emergency room of the clinic for
heart-related problems. The measurements are:
42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67,
53, 59, 47, 63, 52, 64, 61, 43, 56, 58, 66, 54,
56, 52, 40, 55, 72, 69.
20
Age Groups   Frequency   Relative Frequency
32-36 yr     2           2/30 = 0.067
37-41 yr     3           3/30 = 0.100
42-46 yr     3           3/30 = 0.100
47-51 yr     3           3/30 = 0.100
52-56 yr     8           8/30 = 0.267
57-61 yr     3           3/30 = 0.100
62-66 yr     4           4/30 = 0.133
67-72 yr     4           4/30 = 0.133
Total        n = 30
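
For continuous data such as age, the observations are first grouped into class intervals. The sketch below tallies the 30 listed ages into the age groups used on this slide; the helper name `grouped_frequencies` is a hypothetical choice.

```python
# Ages of the n = 30 sampled patients, as listed on the previous slide.
ages = [42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47,
        63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69]

# Age groups from the slide (inclusive bounds); the last group is one year wider.
intervals = [(32, 36), (37, 41), (42, 46), (47, 51),
             (52, 56), (57, 61), (62, 66), (67, 72)]

def grouped_frequencies(values, bins):
    """Count how many values fall in each inclusive (low, high) interval."""
    return [sum(low <= v <= high for v in values) for low, high in bins]

counts = grouped_frequencies(ages, intervals)
for (low, high), count in zip(intervals, counts):
    print(f"{low}-{high} yr  {count:>2}  {count}/{len(ages)} = {count / len(ages):.3f}")
```

Tallying in code is also a convenient way to double-check a frequency table that was built by hand.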
21
Frequency Polygon
  • Use to identify the distribution of your data

22
Table 1 in a paper
Describe your study population in a frequency
table
Table Title
Name of variable (units of variable)   Frequency (n)   Mean (SD)
  Categories
Total
23
Measures of Central Tendency
Where is the heart of the distribution?
  1. Mean
  2. Median
  3. Mode
24
Sample Mean
The arithmetic mean (or, simply, mean) is
computed by summing all the observations in the
sample and dividing the sum by the number of
observations. For a sample of five household
incomes, 6,000, 10,000, 10,000, 14,000, 50,000, the
sample mean is
(6,000 + 10,000 + 10,000 + 14,000 + 50,000) / 5 = 18,000
25
Median
In a list ranked from the smallest measurement to
the highest, the median is the middle value. In our
example of five household incomes, first we rank
the measurements: 6,000 10,000 10,000 14,000 50,000.
The sample median is 10,000.
26
Mode
  • In nominal data
  • The value which occurs with the greatest frequency
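
The three measures of central tendency can be computed for the five household incomes from the Sample Mean slide with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

# The five household incomes used in the Sample Mean and Median slides.
incomes = [6000, 10000, 10000, 14000, 50000]

mean_income = statistics.mean(incomes)      # sum of observations / number of observations
median_income = statistics.median(incomes)  # middle value of the ranked list
mode_income = statistics.mode(incomes)      # value with the greatest frequency

print(mean_income, median_income, mode_income)
```

In this sample the single income of 50,000 pulls the mean well above the median, while the median and mode are unaffected by it.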

27
Measures of non-central locations
  • Quartiles
  • Quintiles
  • Percentiles
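
As a rough illustration, these cut points can be computed with `statistics.quantiles` (Python 3.8 or later), here applied to the 30 patient ages from the frequency table exercise. Statistical packages differ slightly in how they interpolate between observations; this sketch simply uses the library's default method, which is an assumption rather than the lecture's convention.

```python
import statistics

# Ages of the n = 30 sampled patients from the frequency table exercise.
ages = [42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47,
        63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69]

quartiles = statistics.quantiles(ages, n=4)      # 3 cut points -> 4 equal groups
quintiles = statistics.quantiles(ages, n=5)      # 4 cut points -> 5 equal groups
percentiles = statistics.quantiles(ages, n=100)  # 99 cut points -> 100 equal groups

print("Quartiles:", quartiles)
print("Quintiles:", quintiles)
print("90th percentile:", percentiles[89])
```

The second quartile and the 50th percentile are both the median, so the median can be seen as one special case of these non-central locations.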

28
Measures of Dispersion or Variability
  • Range (present highest and lowest value in a
    distribution. The difference between these
    values is the range)
  • Variance
  • Standard deviation (the square root of the
    variance)

29
Sample Variance
S² = Σ (xᵢ − x̄)² / (n − 1)
S = standard deviation (square root of the
variance)
30
Calculation of Variance and Standard deviation
31
Mean and Standard deviation (SD)
7 8 7 7 7 6     Mean = 7   SD = 0.63
7 7 7 7 7 7     Mean = 7   SD = 0
3 2 7 8 13 9    Mean = 7   SD = 4.05
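
The calculation behind these three data sets can be sketched as follows. The sketch assumes the sample variance divides by n − 1, which is consistent with the SD of 0.63 shown for the first data set; the function names are illustrative.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed divisor)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# The three data sets from the slide; each has a mean of 7.
datasets = [[7, 8, 7, 7, 7, 6], [7, 7, 7, 7, 7, 7], [3, 2, 7, 8, 13, 9]]
for data in datasets:
    mean = sum(data) / len(data)
    print(data, "mean =", mean, "SD =", round(sample_sd(data), 2))
```

All three samples share the same mean, so the standard deviation is what distinguishes how tightly the values cluster around it.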
32
Empirical Rule
  • For a Normal distribution, approximately:
  • a) 68% of the measurements fall within one
    standard deviation around the mean
  • b) 95% of the measurements fall within two
    standard deviations around the mean
  • c) 99.7% of the measurements fall within three
    standard deviations around the mean

33
Suppose the reaction time of a particular drug
has a Normal distribution with a mean of 10
minutes and a standard deviation of 2 minutes
  • Approximately,
  • a) 68% of the subjects taking the drug will have
    reaction time between 8 and 12 minutes
  • b) 95% of the subjects taking the drug will have
    reaction time between 6 and 14 minutes
  • c) 99.7% of the subjects taking the drug will
    have reaction time between 4 and 16 minutes
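
The percentages in this example can be checked against the Normal distribution itself using `statistics.NormalDist` (Python 3.8 or later). This is a sketch; the helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

# Reaction time in minutes: Normal with mean 10 and standard deviation 2.
reaction_time = NormalDist(mu=10, sigma=2)

def share_within(dist, k):
    """Probability of falling within k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low = reaction_time.mean - k * reaction_time.stdev
    high = reaction_time.mean + k * reaction_time.stdev
    print(f"{low:g} to {high:g} minutes: {share_within(reaction_time, k):.1%}")
```

The exact shares are slightly different from the rounded 68, 95 and 99.7 percent figures, which is why the rule is stated as approximate.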
