Biostatistics course Part 3: Data, summary and presentation

1
Biostatistics course Part 3: Data, summary and
presentation
  • Dr. en C. Nicolás Padilla Raygoza
  • Facultad de Enfermería y Obstetricia de Celaya
  • Universidad de Guanajuato México

2
Biosketch
  • Medical Doctor, Autonomous University of
    Guadalajara.
  • Pediatrician certified by the Mexican Council of
    Certification on Pediatrics.
  • Postgraduate Diploma in Epidemiology, London
    School of Hygiene and Tropical Medicine,
    University of London.
  • Master of Science with emphasis in Epidemiology,
    Atlantic International University.
  • Doctor of Science with emphasis in Epidemiology,
    Atlantic International University.
  • Associate Professor B, School of Nursing and
    Obstetrics of Celaya, University of Guanajuato.
  • padillawarm_at_gmail.com

3
Competencies
  • The reader will describe the types of variables.
  • He (she) will analyze how summaries present the
    different types of variables.
  • He (she) will calculate measures of central
    tendency and locate them in charts.
  • He (she) will calculate measures of dispersion
    and locate them in charts.

4
Definitions
  • Data are collected on specific characteristics
    of each subject, and groups are formed to be
    compared.
  • These characteristics are called variables,
    because they can change from one subject to
    another.
  • A variable is recorded because it is either
    • a result of interest (dependent variable), or
    • something that explains the dependent variable
      (risk factor, or independent variable).

5
Type of data
  • Classification by measurement scale
    • Qualitative
      • Binary (dichotomous)
      • Ordinal
      • Nominal
    • Quantitative
      • Discrete
      • Continuous

6
Type of data - Examples
  • Qualitative
    • Dichotomous (binary)
      • Gender: male or female.
      • Employment status: employed or unemployed.
    • Ordinal
      • Socioeconomic level: high, medium, low.
    • Nominal
      • Place of residence: center, North, South,
        East, West.
      • Civil status: single, married, widowed,
        divorced, free union.
  • Quantitative
    • Discrete
      • Number of offspring: 1, 2, 3, 4.
    • Continuous
      • Blood glucose level: 110 mg/dl, 145 mg/dl.

7
Data summary
  • Generally, we want to present the data in
    summary form.
  • The number of times an event occurs is of
    interest, because it shows us the distribution
    of the variable.
  • We can generate a frequency list for
    quantitative or qualitative data.

8
Summary of categorical data
  • We can obtain the frequencies of categorical data
    and summarize them in a table or chart.
  • Example: we have 21 agents of parasitic diseases
    isolated from children.

Giardia lamblia, Entamoeba histolytica, Ascaris lumbricoides, Enterobius vermicularis,
Ascaris lumbricoides, Enterobius vermicularis, Giardia lamblia,
Giardia lamblia, Entamoeba histolytica, Ascaris lumbricoides, Enterobius vermicularis,
Ascaris lumbricoides, Enterobius vermicularis, Giardia lamblia,
Giardia lamblia, Entamoeba histolytica, Ascaris lumbricoides, Enterobius vermicularis,
Ascaris lumbricoides, Enterobius vermicularis, Giardia lamblia
9
Summary of categorical data
  • The list of detected parasites gives us an idea
    of the frequency of each parasite, but it is not
    clear.
  • If we sort the list, the picture becomes clearer.

Giardia lamblia, Giardia lamblia, Giardia lamblia,
Giardia lamblia, Giardia lamblia, Giardia lamblia,
Ascaris lumbricoides, Ascaris lumbricoides, Ascaris lumbricoides,
Ascaris lumbricoides, Ascaris lumbricoides, Ascaris lumbricoides,
Enterobius vermicularis, Enterobius vermicularis, Enterobius vermicularis,
Enterobius vermicularis, Enterobius vermicularis, Enterobius vermicularis,
Entamoeba histolytica, Entamoeba histolytica, Entamoeba histolytica
10
Summary of categorical data
  • We can show the results in a frequency
    distribution.

Frequency distribution of intestinal parasites
detected in children from CAISES Celaya, n = 21.
Source: Laboratory report
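
As an illustration, the frequency distribution of the 21 isolates listed earlier can be tallied with a few lines of Python. This is only a sketch: the variable names are chosen here, and the data are typed in from the list on the earlier slide.

```python
from collections import Counter

# The 21 isolates from the slide, in the order they were reported.
block = [
    "Giardia lamblia", "Entamoeba histolytica", "Ascaris lumbricoides",
    "Enterobius vermicularis", "Ascaris lumbricoides",
    "Enterobius vermicularis", "Giardia lamblia",
]
isolates = block * 3  # the same seven results appear three times

# Counter tallies how many times each category occurs.
frequencies = Counter(isolates)

for parasite, count in frequencies.most_common():
    print(f"{parasite:<25}{count:>3}")
print(f"{'Total':<25}{sum(frequencies.values()):>3}")
```

The tally should agree with the sorted list: six isolates each of Giardia lamblia, Ascaris lumbricoides and Enterobius vermicularis, and three of Entamoeba histolytica.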
11
Summary of categorical data
  • It is useful to show the frequency of each
    category expressed as a percentage of the total
    frequency.
  • This is called the distribution of relative
    frequencies.

Frequency distribution of intestinal parasites
detected in children from CAISES Celaya, n = 21.
Source: Laboratory report
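
A minimal sketch of the relative frequency calculation, assuming the counts obtained from the sorted list of 21 isolates (the dictionary and variable names are illustrative):

```python
# Counts taken from the sorted list of 21 isolates.
counts = {
    "Giardia lamblia": 6,
    "Ascaris lumbricoides": 6,
    "Enterobius vermicularis": 6,
    "Entamoeba histolytica": 3,
}
total = sum(counts.values())

# Relative frequency = category frequency as a percentage of the total.
relative = {name: 100 * n / total for name, n in counts.items()}

for name, pct in relative.items():
    print(f"{name:<25}{counts[name]:>3}{pct:>7.1f}%")
```

Each of the three most common parasites accounts for 6/21 of the isolates, and the percentages add up to 100.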
12
Summary of categorical data
  • Sometimes the number of categories is too large,
    and we should reduce it.

Distribution by cause of death in Celaya, Gto.,
during 2007.
Source: Death certificates
13
Frequency distributions for quantitative data
  • With quantitative data, we need to group the
    data before showing them in a table of
    frequencies or relative frequencies.

Frequency distribution among FEOC students who
have smoked at least once, n = 534.
Source: Health survey
14
Frequency distributions for quantitative data
  • With quantitative data, it is useful to
    calculate the cumulative frequency.

Frequency distribution among FEOC students who
have smoked at least once, n = 534.
Source: Health survey
15
Frequency distributions for grouped
quantitative data
  • Frequently, quantitative data take many distinct
    values, and we have to define intervals to group
    them into categories.

Frequency distribution of the ages of children
with acute streptococcal pharyngotonsillitis.
Source: Padilla N, Moreno M. Comparison between
clarithromycin, azithromycin and propicillin in
the management of acute streptococcal
pharyngotonsillitis in children. Archivos de
Investigación Pediátrica de México 2005 85-11.
(In Spanish)
16
Frequency distributions for grouped
quantitative data
Frequency distribution of the ages of children
with acute streptococcal pharyngotonsillitis.
Source: Padilla N, Moreno M. Comparison between
clarithromycin, azithromycin and propicillin in
the management of acute streptococcal
pharyngotonsillitis in children. Archivos de
Investigación Pediátrica de México 2005 85-11.
(In Spanish)
17
To group data
  • Guide
    • Obtain the minimum and maximum values and
      decide the number of intervals.
    • Use between 5 and 15 intervals.
    • Make sure the interval limits are clearly
      defined.
    • Make sure all intervals have the same width.
    • Avoid leaving the first or last interval open.
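
The guide above can be sketched in Python as follows. This is one possible reading of the steps, not a prescribed procedure: the function name `group_into_intervals` is made up for the illustration, the data are assumed to be whole numbers, and the ages are hypothetical.

```python
import math

def group_into_intervals(values, n_intervals):
    """Count integer values in closed intervals of equal width."""
    if not 5 <= n_intervals <= 15:
        raise ValueError("the guide suggests between 5 and 15 intervals")
    low, high = min(values), max(values)
    # Width in whole units so that n_intervals intervals cover low..high.
    width = math.ceil((high - low + 1) / n_intervals)
    table = []
    for i in range(n_intervals):
        start = low + i * width
        end = start + width - 1  # closed upper limit, no open intervals
        count = sum(start <= v <= end for v in values)
        table.append((start, end, count))
    return table

# Hypothetical ages in years, invented for this illustration.
ages = [1, 2, 2, 3, 4, 5, 5, 6, 7, 7, 8, 8, 9, 10, 11, 12, 13, 14, 15]
for start, end, count in group_into_intervals(ages, 5):
    print(f"{start:>2}-{end:<2} {count}")
```

Because the limits are closed and consecutive, every observation falls in exactly one interval and the counts add up to the number of observations.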

18
Charts
  • Categorical data
    • Bar chart
    • Pie chart
  • Quantitative data
    • Histogram
    • Frequency polygon

19
Bar chart
  • The frequency or relative frequency of a
    categorical variable can be shown easily in a
    bar chart.
  • It is used with categorical or discrete
    numerical data.
  • Each bar represents one category, and its height
    is the frequency or relative frequency.
  • Bars should be separated.
  • It is very important that the Y axis begins at 0.

20
Bar chart
21
Grouped bar chart
  • If we have a nominal categorical variable
    divided into two groups, we can show the data
    with a grouped bar chart.
  • It allows easy comparison between groups.

22
Grouped bar chart
23
Pie chart
  • It is an alternative way to show a categorical
    variable.
  • Each slice of the pie corresponds to the
    frequency or relative frequency of one category
    of the variable.
  • Each pie chart shows only one variable.
  • If we want to make comparisons, we need to build
    two pie charts.

24
Pie chart
25
Pie chart
26
Frequency distribution charts: histograms
  • It is useful for quantitative variables.
  • There are no spaces between bars.
  • The area of each bar, not its height, represents
    its frequency.
  • The X axis should be continuous.
  • The Y axis should begin at 0.
  • The width of each bar represents the interval
    for that group.

27
Frequency distribution charts: histograms
28
Frequency distribution charts: frequency
polygon
  • It is another way to show the frequency
    distribution of a numerical variable.
  • It is built by joining the midpoints of the tops
    of the histogram bars.
  • We should take into account the width of each
    bar.
  • We can plot more than one polygon in the same
    chart to make comparisons.

29
Frequency distribution charts: frequency
polygon
30
Frequency distributions: cumulative histogram
  • We can plot it directly from a table of
    cumulative frequencies.
  • It is not necessary to adjust the height of the
    bars, because each cumulative frequency
    represents the total frequency up to and
    including the upper limit of the interval.

31
Frequency distributions: cumulative histogram
32
Frequency distributions: cumulative frequency
polygon
  • We use it to read the proportion of observations
    below or above a given point on the curve.
  • We can read the median and percentiles directly.
  • If the distribution is symmetrical, the curve
    has a symmetrical S shape.
  • If it is skewed to the right or to the left, the
    curve is flattened on that side.

33
Frequency distributions: cumulative frequency
polygon
34
Other charts: stem-and-leaf plot
  • We use it to show quantitative data directly, or
    as a preliminary step in building a frequency
    distribution.
  • We organize the data by deciding the number of
    divisions (5-15).
  • We draw a vertical line and put the first digit
    of each value to the left of the line (stem) and
    the second digit to the right of the line
    (leaves).
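
A minimal sketch of this construction for two-digit whole numbers. The function name `stem_and_leaf` and the data are hypothetical, invented only to illustrate the layout.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves} for two-digit whole numbers."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # first digit, second digit
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical two-digit observations, invented for this illustration.
data = [24, 12, 37, 21, 15, 28, 33, 24, 40]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row is one stem followed by its leaves, so the length of a row plays the role of the bar in a histogram.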

35
Other charts: stem-and-leaf plot
3 | 5 2
4 | 9 3 2
5 | 4 8 7
6 | 1 4
36
Other charts: box-and-whisker plot
  • We draw a vertical line that represents the range
    of the distribution.
  • We draw a horizontal line at the third quartile
    and another at the first quartile; together they
    form the box.
  • The midpoint of the distribution (the median) is
    shown as a horizontal line inside the box.

37
Other charts: box-and-whisker plot
38
Location measures
  • For a categorical variable: percentage
  • For a quantitative variable
    • Measures of central tendency
      • Mean
      • Median
      • Mode
    • Measures of dispersion
      • Standard deviation
      • Percentiles
      • Range

39
Measures of central tendency
  • Mean
    • It is the conventional arithmetic mean
      (average).
    • If there are n observations with values x_i,
      the mean is

$$\bar{X} = \frac{\sum x_i}{n}$$
40
Measures of central tendency in a frequency distribution
  • Each data value (x_i) occurs with a frequency
    (f_i), and n is the sum of the frequencies.
  • In a grouped distribution, we use the midpoint of
    each interval as the x value.

$$\bar{X} = \frac{\sum x_i f_i}{n}$$
41
Measures of central tendency in a frequency distribution

Interval   Midpoint   Frequency (f_i)
1-3        2          18
4-6        5          27
7-9        8          34
10-12      11         22
13-15      14         13
Total                 114

Example of the mean for a grouped distribution:

$$\bar{X} = \frac{(2 \times 18) + (5 \times 27) + (8 \times 34) + (11 \times 22) + (14 \times 13)}{18 + 27 + 34 + 22 + 13} = \frac{36 + 135 + 272 + 242 + 182}{114} = \frac{867}{114} = 7.61$$

Mean = 7.61 years
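
The same calculation can be checked with a short Python sketch. The intervals and frequencies are those of the table; the variable names are chosen for this illustration.

```python
# (lower limit, upper limit, frequency) for each interval in the table.
intervals = [(1, 3, 18), (4, 6, 27), (7, 9, 34), (10, 12, 22), (13, 15, 13)]

n = sum(f for _, _, f in intervals)
# The midpoint of each interval stands in for every value inside it.
weighted_sum = sum((low + high) / 2 * f for low, high, f in intervals)
mean = weighted_sum / n

print(f"n = {n}, sum of x*f = {weighted_sum:g}, mean = {mean:.2f} years")
```

Using midpoints is an approximation: the result equals the true mean only if the values are spread evenly around the midpoint of each interval.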
42
Measures of central tendency
  • Median
    • It is the value that divides the distribution
      into two equal parts.
    • If the number of observations is even, the two
      central values are added and divided by two.

51.2, 53.5, 55.6, 65.0, 74.2: the median is the
middle value, so Median = 55.6
51.2, 53.5, 55.6, 61.4, 65.0, 74.2:
Median = (55.6 + 61.4) / 2 = 58.5
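
Both cases can be reproduced with Python's standard `statistics` module, which applies the same rule (middle value for an odd count, average of the two central values for an even count). The list names are illustrative.

```python
import statistics

odd = [51.2, 53.5, 55.6, 65.0, 74.2]          # five observations
even = [51.2, 53.5, 55.6, 61.4, 65.0, 74.2]   # six observations

median_odd = statistics.median(odd)    # the single middle value
median_even = statistics.median(even)  # mean of the two central values

print(f"odd count:  {median_odd:.1f}")
print(f"even count: {median_even:.1f}")
```

`statistics.median` sorts the data internally, so the observations do not need to be ordered beforehand.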
43
Measures of central tendency for frequency distributions
  • Median
    • It is the value at which the cumulative
      relative frequency reaches 50%.

44
Measures of central tendency
  • Mode
    • It is the value that occurs most frequently.

Interval   Midpoint   Frequency (f_i)
1-3        2          18
4-6        5          27
7-9        8          34
10-12      11         22
13-15      14         13
Total                 114
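
For a grouped table like this one, the mode is read as the interval with the highest frequency. A minimal sketch, with the table typed in as illustrative (label, frequency) pairs:

```python
# (interval label, frequency) pairs from the table.
table = [("1-3", 18), ("4-6", 27), ("7-9", 34), ("10-12", 22), ("13-15", 13)]

# The modal interval is the row with the largest frequency.
modal_interval, modal_frequency = max(table, key=lambda row: row[1])

print(f"Modal interval: {modal_interval} (frequency {modal_frequency})")
```

Note that `max` returns only the first row when two intervals tie for the highest frequency, so a bimodal table would need extra handling.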
45
Measures of central tendency
  • Properties
    • The mean is sensitive to the tails of the
      distribution; the median and mode are not.
    • The mode can be affected by small changes in
      the data; the median and mean are much less
      affected.
    • The mode and median can be found in a chart.
    • The three measures are the same in a Normal
      distribution.

46
Measures of central tendency
  • Which measure should we use?
    • For skewed distributions, we use the median.
    • For statistical analysis or inference, we use
      the mean.

47
Dispersion measures
  • Range
    • It shows the minimum and maximum values and the
      difference between them.

51.2, 53.5, 55.6, 61.4, 65.0, 74.2: the range of
this distribution is 51.2 to 74.2 kg, a difference
of 23.0 kg. However, the extreme values are far
from the center of the distribution, which obscures
the fact that most of the data lie between 53.5
and 65 kg.
48
Dispersion measures
  • Percentiles
    • A percentile (or centile) is the value below
      which a given percentage of the data falls.

Look at the distribution of stature in this
population. What are the range, the median, and the
25th and 75th percentiles?

Stature (cm)   n     Relative frequency (%)   Cumulative frequency (%)
151            2     0.7                      0.7
152            3     1.1                      1.8
152            6     2.2                      4.0
154            12    4.5                      8.5
155            27    10.0                     18.5
157            29    10.8                     29.3
158            26    9.7                      39.0
159            33    12.3                     51.3
163            37    13.8                     65.1
164            16    5.9                      71.0
165            24    8.9                      79.9
168            18    6.7                      86.6
169            14    5.2                      91.8
171            6     2.2                      94.0
174            7     2.6                      96.6
175            1     0.4                      97.0
177            4     1.5                      98.5
179            2     0.7                      99.2
184            1     0.4                      99.6
185            1     0.4                      100.0
Total          269   100.0
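
One way to answer the question is to walk down the cumulative frequencies until the requested percentage is reached. The sketch below assumes that convention (the first stature whose cumulative percentage reaches p); other percentile conventions exist and can give slightly different answers. The rows are copied as they appear in the table, including the repeated 152, and the function name `percentile` is chosen here.

```python
# (stature in cm, number of subjects), copied row by row from the table.
rows = [
    (151, 2), (152, 3), (152, 6), (154, 12), (155, 27), (157, 29),
    (158, 26), (159, 33), (163, 37), (164, 16), (165, 24), (168, 18),
    (169, 14), (171, 6), (174, 7), (175, 1), (177, 4), (179, 2),
    (184, 1), (185, 1),
]
n = sum(count for _, count in rows)

def percentile(p):
    """First stature whose cumulative percentage reaches p."""
    cumulative = 0
    for stature, count in rows:
        cumulative += count
        if 100 * cumulative / n >= p:
            return stature
    return rows[-1][0]

print("range:", rows[0][0], "to", rows[-1][0], "cm")
print("P25:", percentile(25), "median:", percentile(50), "P75:", percentile(75))
```

Under this convention the 25th percentile is 157 cm, the median is 159 cm and the 75th percentile is 165 cm, matching the rows where the cumulative column first passes 25%, 50% and 75%.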
49
Dispersion measures
  • Standard deviation
    • It is the most common way to quantify the
      variability of a distribution.
    • It measures the distance between each value
      and the mean.

Subject   Height   Deviation (X_i - \bar{X})
1         1.6      -0.1
2         1.7      0
3         1.8      0.1

$$\text{Mean deviation} = \frac{\sum (X_i - \bar{X})}{n}$$

\bar{X} = 1.7, so mean deviation = (-0.1 + 0 + 0.1) / 3 = 0
50
Dispersion measures
  • Standard deviation
    • We are interested in the magnitude of the
      deviations, not in their sign.
    • If we square each deviation, we obtain positive
      values.
    • If we divide their sum by n - 1 we obtain the
      variance, and its square root is the standard
      deviation.

Subject   Height   Squared deviation (X_i - \bar{X})^2
1         1.6      0.01
2         1.7      0
3         1.8      0.01

$$\text{Standard deviation} = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

\bar{X} = 1.7, so standard deviation = \sqrt{0.02 / 2} = 0.1
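
The two steps (deviations that cancel out, then squared deviations divided by n - 1) can be traced in a short Python sketch for the three heights 1.6, 1.7 and 1.8. Variable names are illustrative, and results are compared with a tolerance because of floating-point rounding.

```python
import math

heights = [1.6, 1.7, 1.8]
n = len(heights)
mean = sum(heights) / n

# Signed deviations cancel out, so their average carries no information.
deviations = [x - mean for x in heights]
mean_deviation = sum(deviations) / n

# Squaring removes the sign; dividing by n - 1 gives the sample variance.
variance = sum(d ** 2 for d in deviations) / (n - 1)
standard_deviation = math.sqrt(variance)

print(f"mean = {mean:.2f}")
print("mean deviation is zero:", math.isclose(mean_deviation, 0, abs_tol=1e-12))
print(f"variance = {variance:.4f}")
print(f"standard deviation = {standard_deviation:.2f}")
```

For these heights the deviations are -0.1, 0 and 0.1, the squared deviations add up to 0.02, and the standard deviation is the square root of 0.02 / 2, that is 0.1.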
51
Dispersion measures for grouped data
  • Standard deviation
    • It uses the midpoint of each interval.

$$\text{Standard deviation} = \sqrt{\frac{\sum f_i (x_i - \bar{X})^2}{\sum f_i - 1}}$$

It can also be expressed as

$$\text{Standard deviation} = \sqrt{\frac{\sum f_i x_i^2 - (\sum f_i x_i)^2 / \sum f_i}{\sum f_i - 1}}$$
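
As a check that the two expressions agree, the sketch below applies both to the grouped age table used earlier for the mean (midpoints 2, 5, 8, 11, 14 with frequencies 18, 27, 34, 22, 13). The variable names are chosen for this illustration.

```python
import math

# Midpoints and frequencies from the grouped age table used for the mean.
midpoints = [2, 5, 8, 11, 14]
frequencies = [18, 27, 34, 22, 13]

n = sum(frequencies)
sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
sum_fx2 = sum(f * x ** 2 for x, f in zip(midpoints, frequencies))
mean = sum_fx / n

# Definition: squared deviations from the mean, weighted by frequency.
sd_definition = math.sqrt(
    sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / (n - 1)
)
# Computational form: needs only the sums of f*x and f*x^2.
sd_shortcut = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))

print(f"definition: {sd_definition:.2f}  shortcut: {sd_shortcut:.2f}")
```

Both forms give a standard deviation of roughly 3.7 years for this table. The second form is convenient for hand calculation because it does not require subtracting the mean from every midpoint.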

52
Dispersion measures for grouped data
  • For data with a Normal distribution
    • Around 68% of the data lie within 1 standard
      deviation of the mean (between -1 and +1).
    • Around 95% of the data lie within 2 standard
      deviations of the mean (between -2 and +2).
    • Around 99.7% of the data lie within 3 standard
      deviations of the mean (between -3 and +3).
  • The standard deviation is a measure of the width
    of the distribution. If the standard deviation
    changes, the shape of the distribution changes
    too.
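
These proportions can be verified from the Normal distribution itself. The sketch below uses the standard identity that the share of a Normal distribution within k standard deviations of the mean is erf(k / sqrt(2)); the function name `share_within` is made up for the illustration.

```python
import math

def share_within(k):
    """Proportion of a Normal distribution within k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")
```

The computed shares are approximately 68.3%, 95.4% and 99.7%, which is why the rule is usually quoted as 68, 95 and 99.7.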
