Title: Statistics for clinicians
Slide 1: Statistics for clinicians
- Biostatistics course by Kevin E. Kip, Ph.D., FAHA
- Professor and Executive Director, Research Center, University of South Florida, College of Nursing
- Professor, College of Public Health, Department of Epidemiology and Biostatistics
- Associate Member, Byrd Alzheimer's Institute, Morsani College of Medicine
- Tampa, FL, USA
Slide 2: SECTION 1.1 Module Overview and Introduction
Introduction to biostatistics, descriptive statistics, SPSS, and PowerPoint.
Slide 3: SECTION 1.4 Introduction to SPSS
Slide 4: Introduction to SPSS
- Database structure
- Data view and variable view
- Variable names, labels, and formats
- Interactive menus
- SPSS syntax generated from interactive analyses
Slide 5: SECTION 1.5 Summarizing Data in Charts
Slide 6: Summarizing Data: Charts
1. One categorical, >1 proportion/percentage
   (i) Bar chart
   (ii) Stacked bar chart
   (iii) Stacked bar chart (100%)
2. One categorical, >1 continuous variable
   (i) Box plot
   (ii) High-low
   (iii) Line
   (iv) Kernel-density plots
3. Two continuous variables
   (i) X-Y scatter
   (ii) Histogram (can be used for 1 variable)
Slide 7: 1. One categorical, >1 proportion/percentage: (i) Bar chart
- Rectangular bars with lengths proportional to the values that they represent.
- Bars can be plotted vertically or horizontally.
Slide 8: 1. One categorical, >1 proportion/percentage: (ii) Stacked bar chart
- Can be counts or percentages.
- Bar totals do not sum to a specified value.
[Figure: stacked bar chart; axis labels: Obese, Age Group]
Slide 9: 1. One categorical, >1 proportion/percentage: (iii) Stacked bar chart (100%)
Bar Charts and Stacked Bar Charts
- It is important to choose between row and column percentages.
- Example: Race and blood pressure classification.
- Usually, the row variable is the predictor, and the column variable is the outcome.
- SPSS: Analyze > Descriptive Statistics > Crosstabs
Slide 10: Bar Charts and Stacked Bar Charts: Column Percentage
SPSS:
CROSSTABS
  /TABLES=SCR_RACECAT3 BY SCR_BP_CLASS4
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT COLUMN
  /COUNT ROUND CELL
  /BARCHART.
Race * BP classification Crosstabulation

Race   Statistic                   Normal  Prehypertensive  Hypertensive Stage 1  Hypertensive Stage 2   Total
White  Count                          247              397                   294                    95    1033
       % within BP classification    65.2             58.3                  49.8                  38.0    54.4
Black  Count                          117              262                   275                   149     803
       % within BP classification    30.9             38.5                  46.6                  59.6    42.3
Other  Count                           15               22                    21                     6      64
       % within BP classification     4.0              3.2                   3.6                   2.4     3.4
Total  Count                          379              681                   590                   250    1900
       % within BP classification   100.0            100.0                 100.0                 100.0   100.0
Slide 11: Difficult to identify trends
Slide 12: Bar Charts and Stacked Bar Charts: Row Percentage
SPSS:
CROSSTABS
  /TABLES=SCR_RACECAT3 BY SCR_BP_CLASS4
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT ROW
  /COUNT ROUND CELL
  /BARCHART.
Use row percentages in the stacked bar chart (PowerPoint).
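The difference between row and column percentages can be checked by hand. The following is a minimal Python sketch (not SPSS syntax) that recomputes both from the counts in the Race by BP classification crosstab on slide 10. The function names `row_percentages` and `column_percentages` are illustrative and are not part of the course materials.

```python
# Counts from the Race by BP classification crosstab (slide 10).
# Column order: Normal, Prehypertensive, Hypertensive Stage 1, Hypertensive Stage 2.
counts = {
    "White": [247, 397, 294, 95],
    "Black": [117, 262, 275, 149],
    "Other": [15, 22, 21, 6],
}


def row_percentages(table):
    """Percent of each row (race) that falls in each column (BP class)."""
    return {
        row: [round(100 * n / sum(cells), 1) for n in cells]
        for row, cells in table.items()
    }


def column_percentages(table):
    """Percent of each column (BP class) contributed by each row (race)."""
    totals = [sum(col) for col in zip(*table.values())]
    return {
        row: [round(100 * n / t, 1) for n, t in zip(cells, totals)]
        for row, cells in table.items()
    }


row_pct = row_percentages(counts)
col_pct = column_percentages(counts)
for race in counts:
    print(race, "row %:", row_pct[race], "column %:", col_pct[race])
```

With race as the row (predictor) variable, the row percentages answer the question of interest directly: 9.2% of White participants and 18.6% of Black participants are classified as Hypertensive Stage 2.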
Slide 13: PowerPoint Chart: Column > 100% Stacked Column
Slide 14: PowerPoint Chart (Practice): Column > 100% Stacked Column
Display Quality of Life (QOL) from Poor to Excellent by Gender.
- Column Percentages for QOL
- Row Percentages for QOL
Slide 15: PowerPoint Chart: Column > 100% Stacked Column
Slide 16: PowerPoint Chart: Column > 100% Stacked Column
Slide 17: 2. One categorical, >1 continuous variable: (i) Box plot
- Also known as a box-and-whisker diagram.
- Displays 5 summary statistics: minimum, lower quartile (Q1), median (Q2), upper quartile (Q3), and maximum.
- No assumptions about the underlying statistical distribution (non-parametric).
SPSS: Graphs > Chart Builder > Boxplot
Example: HDL cholesterol (continuous) distribution by gender (categorical)
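As a rough illustration of the five summary statistics behind a box plot, here is a minimal Python sketch using the standard library. The HDL values are hypothetical, and the quartile rule (`method="inclusive"`) is an assumption: statistical packages use different quartile conventions, so the results need not match SPSS output exactly.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Hypothetical HDL cholesterol values (mg/dl), for illustration only.
hdl = [30, 38, 41, 44, 46, 50, 55, 58, 86]
print(five_number_summary(hdl))
```

The box spans Q1 to Q3 with a line at the median, so the five numbers are enough to draw the plot by hand.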
Slide 18: 2. One categorical, >1 continuous variable: (i) Box plot
Question: Are HDL cholesterol levels positively or negatively skewed? Run the SPSS Frequencies procedure.
Slide 19: 2. One categorical, >1 continuous variable: (i) Box plot
Question: Are triglycerides positively or negatively skewed? Run the SPSS Frequencies procedure.
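The direction of skew can be read from the sign of the skewness statistic. The sketch below, in Python rather than SPSS, computes the simple moment coefficient of skewness on hypothetical values. SPSS reports a sample-adjusted version of this statistic, so the magnitude may differ slightly, but the sign has the same meaning.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical values with a long right tail (a few very high readings).
right_tailed = [80, 90, 100, 110, 120, 150, 400]
symmetric = [1, 2, 3, 4, 5]

print("right-tailed data skewness > 0:", skewness(right_tailed) > 0)
print("symmetric data skewness:", skewness(symmetric))
```

A positive value indicates a long right tail (positive skew), a negative value a long left tail (negative skew), and a value near zero a roughly symmetric distribution.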
Slide 20: 2. One categorical, >1 continuous variable: (i) Box plot (Practice)
Draw a box plot of the distribution of HDL cholesterol by ethnicity:
- Hispanic: Min=30, Q1=40, Q2=46, Q3=56, Max=86
- Non-Hispanic: Min=21, Q1=46, Q2=56, Q3=66, Max=131
Example
Slide 21: 2. One categorical, >1 continuous variable: (i) Box plot (Practice)
Slide 22: 2. One categorical, >1 continuous variable: (ii) High-low
- Can trick PowerPoint into using an open-high-low-close chart (i.e. the type used for financial data) to show distributions of continuous variables.
- Upper and lower ends (high-low) can represent any percentiles, such as the 5th/95th percentiles.
Slide 24: Total Cholesterol (mg/dl)
[Figure: high-low chart of total cholesterol for White and Black groups, by Self-Report and Admixture Defined categories (EU>25, EU>85, EU>40, EU<40, EU<25); P=0.003, P trend=0.009; group sizes N: (753), (464), (753), (68), (201), (195)]
The filled rectangles depict the interquartile range (25th and 75th percentiles). The lower and upper limits of the vertical lines depict the 5th and 95th percentiles, respectively.
Slide 25: Total Cholesterol (mg/dl)
- U.S. Black vs. Ghana Urban: P=0.0001
- U.S. Black vs. Ghana Rural: P<0.0001
- Ghana Urban vs. Ghana Rural: P<0.0001
[Figure: high-low chart; group sizes N=594, N=546, N=80, N=111]
The filled rectangles depict the interquartile range (25th and 75th percentiles). The lower and upper limits of the vertical lines depict the 5th and 95th percentiles, respectively.
Slide 26: Total Cholesterol (mg/dl) (Practice in PowerPoint; first draw by hand)
Percentile    5th   25th   75th   95th
Male          137    175    224    271
Female        153    190    245    295
The filled rectangles depict the interquartile range (25th and 75th percentiles). The lower and upper limits of the vertical lines depict the 5th and 95th percentiles, respectively.
Slide 27: Total Cholesterol (mg/dl) (Practice in PowerPoint)
Percentile    5th   25th   75th   95th
Male          137    175    224    271
Female        153    190    245    295
Trick PowerPoint: the Open, High, Low, and Close series take the 25th, 95th, 5th, and 75th percentiles, respectively.
The filled rectangles depict the interquartile range (25th and 75th percentiles). The lower and upper limits of the vertical lines depict the 5th and 95th percentiles, respectively.
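The reordering needed for the open-high-low-close trick is easy to get wrong when typing values into the chart data sheet. As a small Python sketch of the mapping given on the slide (Open = 25th, High = 95th, Low = 5th, Close = 75th), with a hypothetical helper name `to_open_high_low_close`:

```python
def to_open_high_low_close(p5, p25, p75, p95):
    """Reorder percentiles into stock-chart order: Open, High, Low, Close."""
    return {"Open": p25, "High": p95, "Low": p5, "Close": p75}


# 5th, 25th, 75th, and 95th percentiles of total cholesterol (mg/dl) from the slide.
percentiles = {
    "Male": (137, 175, 224, 271),
    "Female": (153, 190, 245, 295),
}

for group, values in percentiles.items():
    print(group, to_open_high_low_close(*values))
```

The Open and Close values bound the filled rectangle (the interquartile range), while High and Low set the ends of the vertical line.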
Slide 28: 2. One categorical, >1 continuous variable: (iii) Line chart
- Typically represents a trend in data over intervals of time (i.e. a time series).
- Often used to show repeated health outcome measurements over time.
[Figure: line chart of Prevalence of Use (%) of Crohn's Disease Medications]
Slide 29: In this example, the categorical variable is the individual subject, nested within each treatment arm of the trial.
Slide 30: 2. One categorical, >1 continuous variable: (iv) Kernel density plots
- Like a histogram, but constructs a smooth probability density function.
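To make the idea of a smooth density concrete, here is a minimal Python sketch of a Gaussian kernel density estimate: each observation contributes a small bell curve, and the curves are averaged. The HDL values and the bandwidth of 5 are hypothetical choices for illustration, and the slide does not specify which kernel or bandwidth the plotting software uses.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density as an average of Gaussian bumps."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


# Hypothetical HDL cholesterol values (mg/dl); bandwidth is an arbitrary choice.
hdl = [38, 41, 44, 46, 50, 55, 58, 62, 70]
density = gaussian_kde(hdl, bandwidth=5.0)

# Evaluate the curve on a grid from 0 to 120 mg/dl and approximate its area.
step = 0.5
grid = [i * step for i in range(241)]
area = sum(density(x) * step for x in grid)
print("approximate area under the curve:", round(area, 3))
```

A larger bandwidth gives a smoother curve and a smaller one follows the data more closely, which plays the same role as bin width in a histogram.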
Slide 31: 3. Two continuous variables: (i) X-Y scatter
- Shows the relationship between two sets of continuous data.
- Also called a scatter chart, scattergram, scatter diagram, or scatter graph.
[Figure: scatter plot; axis labels: Body Density, Body Mass Index]
Slide 32: 3. Two continuous variables: (ii) Histogram(s)
- Probability distribution of a continuous variable(s) displayed over discrete intervals (bins).
- The bins contain frequency counts, or can be normalized to display relative frequencies (i.e. the proportion of cases that fall into each bin), with total area 1.0.
[Figure: histogram; axis label: subjects]
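The two ways of scaling a histogram can be sketched in a few lines of Python. The body mass index values and the bin layout below are hypothetical. Relative frequencies sum to 1.0 across bins, while density heights are scaled so that height times bin width sums to 1.0, which is the "total area 1.0" form.

```python
def histogram(values, low, width, n_bins):
    """Count values in n_bins equal-width bins starting at `low`.

    Bins are [low, low + width), [low + width, low + 2 * width), and so on.
    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - low) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


def relative_frequencies(counts):
    """Proportion of cases in each bin; the proportions sum to 1.0."""
    total = sum(counts)
    return [c / total for c in counts]


def densities(counts, width):
    """Bar heights scaled so that height * width sums to 1.0 (total area 1.0)."""
    total = sum(counts)
    return [c / (total * width) for c in counts]


# Hypothetical body mass index values (kg/m^2), for illustration only.
bmi = [19.5, 22.0, 23.1, 24.8, 25.2, 26.7, 27.3, 28.9, 31.0, 34.4]
counts = histogram(bmi, low=15, width=5, n_bins=4)  # bins 15-20, 20-25, 25-30, 30-35
print(counts)
print(relative_frequencies(counts))
print(densities(counts, width=5))
```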
Slide 33: 3. Two continuous variables: (ii) Histogram(s)
Slide 34: SECTION 1.6 SPSS Data Manipulation
Slide 35: SPSS Data Manipulation and Syntax Editor
- 1. Recode continuous variable into arbitrarily-defined or pre-defined categories
- 2. Visual binning of continuous variable
- 3. Transform a skewed variable
- 4. Using the SPSS Data Editor
Slide 36: SPSS Data Manipulation and Syntax Editor
- 1. Recode continuous variable into arbitrarily-defined or pre-defined categories
- Example: Define age in 3 categories (arbitrary)
  - 45-54
  - 55-64
  - 65 and older
- SPSS
  - Transform
  - Recode into Different Variables
  - Input variable is age
  - Output variable
    - Name: age_cat
    - Label: Age in 3 categories
  - Click on Old and New Values
  - Range: specify explicitly
    - 45-54: value 1
    - 55-64: value 2
    - 65 and older: value 3
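The recode logic above can be written out as a small function to make the category boundaries explicit. This is a Python sketch rather than SPSS syntax, and it makes two assumptions that the slide does not state: ages below 45 are left missing (`None`), and fractional ages such as 54.5 fall into the lower category.

```python
def recode_age(age):
    """Map age in years to the three categories used on the slide.

    Returns 1 for 45-54, 2 for 55-64, 3 for 65 and older, and None
    (missing) for ages below 45, which the slide does not assign.
    """
    if age is None or age < 45:
        return None
    if age < 55:
        return 1
    if age < 65:
        return 2
    return 3


# Value labels corresponding to the new variable age_cat.
AGE_CAT_LABELS = {1: "45-54", 2: "55-64", 3: "65 and older"}

ages = [45, 54, 55, 64, 65, 80, 40]
print([recode_age(a) for a in ages])
```

Checking the boundary ages (54, 55, 64, 65) is a quick way to confirm that no value is left out of a range or assigned to two categories.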
Slide 37: SPSS Data Manipulation and Syntax Editor
2. Visual binning of continuous variable
- Example: Body mass index
- Put in the output name for the binned variable
- Make cutpoints: equal percentiles based on scanned cases
- Put in labels for frequency display in bar chart
- SPSS code: Visual Binning.
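Equal-percentile cutpoints split the cases into groups of roughly equal size. The following Python sketch shows the idea on hypothetical body mass index values. The quartile rule (`method="inclusive"`) and the choice to place a value equal to a cutpoint in the lower bin are assumptions, so the cutpoints need not match those produced by SPSS Visual Binning.

```python
import bisect
import statistics


def percentile_cutpoints(values, n_groups):
    """Cutpoints that split the data into n_groups groups of roughly equal size."""
    return statistics.quantiles(values, n=n_groups, method="inclusive")


def assign_bin(value, cutpoints):
    """1-based bin number; a value equal to a cutpoint goes in the lower bin."""
    return bisect.bisect_left(cutpoints, value) + 1


# Hypothetical body mass index values (kg/m^2), for illustration only.
bmi = [19.5, 22.0, 23.1, 24.8, 25.2, 26.7, 27.3, 28.9, 31.0]
cuts = percentile_cutpoints(bmi, n_groups=4)
bins = [assign_bin(v, cuts) for v in bmi]
print(cuts)
print(bins)
```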
Slide 38: SPSS Data Manipulation and Syntax Editor
3. Transform a skewed variable
- Descriptive statistics for triglycerides in natural scale: mean, median, SD, min, max, skewness, kurtosis
- Chart: histogram with normal curve superimposed
- Triglycerides are skewed. Use a transformation to create a new variable and reduce the skew in triglycerides.
- SPSS: Compute Variable
  - Target Variable: LOG_TRIG
  - Numeric Expression: lg10(LAB_TRIG_VAP)
- SPSS Syntax:
  COMPUTE log_trig=lg10(LAB_TRIG_VAP).
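The effect of the base-10 log transformation on skew can be illustrated outside SPSS. This Python sketch uses hypothetical triglyceride values and the simple moment coefficient of skewness, which is an assumption: SPSS reports a sample-adjusted skewness, so the exact numbers would differ. Note that the log is only defined for positive values.

```python
import math
import statistics


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical triglyceride values (mg/dl) with a long right tail.
trig = [60, 75, 90, 100, 120, 150, 200, 320, 600]

# Same operation as lg10() in the COMPUTE statement: base-10 logarithm.
log_trig = [math.log10(x) for x in trig]

print("skewness, natural scale:", round(skewness(trig), 2))
print("skewness, log10 scale:  ", round(skewness(log_trig), 2))
```

For right-skewed data like these, the skewness of the log-transformed values is closer to zero than that of the original values, which is the purpose of the transformation.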
Slide 39: SPSS Data Manipulation and Syntax Editor
- 4. Using the SPSS Data Editor
- SPSS: File > New (syntax)
- Save the file with a new name
- 1. Select males only (scr_sex=1)
  - Data
  - Select Cases
  - If scr_sex=1
- USE ALL.
- COMPUTE filter_=(SCR_SEX=1).
- VARIABLE LABELS filter_ 'SCR_SEX=1 (FILTER)'.
- VALUE LABELS filter_ 0 'Not Selected' 1 'Selected'.
- FORMATS filter_ (f1.0).
- FILTER BY filter_.
- EXECUTE.
- Run descriptives for age
- Copy code and repeat for females (scr_sex=2)
Slide 40: SPSS Data Manipulation and Syntax Editor
4. Using the SPSS Data Editor
USE ALL.
COMPUTE filter_=(SCR_SEX=1).
VARIABLE LABELS filter_ 'SCR_SEX=1 (FILTER)'.
VALUE LABELS filter_ 0 'Not Selected' 1 'Selected'.
FORMATS filter_ (f1.0).
FILTER BY filter_.
EXECUTE.
DESCRIPTIVES VARIABLES=SCR_AGE
  /STATISTICS=MEAN STDDEV MIN MAX.
USE ALL.
COMPUTE filter_=(SCR_SEX=2).
VARIABLE LABELS filter_ 'SCR_SEX=2 (FILTER)'.
VALUE LABELS filter_ 0 'Not Selected' 1 'Selected'.
FORMATS filter_ (f1.0).
FILTER BY filter_.
EXECUTE.
DESCRIPTIVES VARIABLES=SCR_AGE
  /STATISTICS=MEAN STDDEV MIN MAX.
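The filter-then-describe pattern in the syntax above amounts to selecting the cases for one sex code and summarizing age for those cases only. A minimal Python sketch of the same logic follows. The records are hypothetical, the variable names SCR_SEX (1 = male, 2 = female) and SCR_AGE are taken from the slides, and the helper name `describe_age` is illustrative.

```python
import statistics

# Hypothetical records, for illustration only.
records = [
    {"SCR_SEX": 1, "SCR_AGE": 50},
    {"SCR_SEX": 2, "SCR_AGE": 48},
    {"SCR_SEX": 1, "SCR_AGE": 60},
    {"SCR_SEX": 2, "SCR_AGE": 58},
    {"SCR_SEX": 1, "SCR_AGE": 70},
    {"SCR_SEX": 2, "SCR_AGE": 62},
    {"SCR_SEX": 2, "SCR_AGE": 72},
]


def describe_age(rows, sex_code):
    """Mean, standard deviation, minimum, and maximum of SCR_AGE for one sex code."""
    ages = [r["SCR_AGE"] for r in rows if r["SCR_SEX"] == sex_code]
    return {
        "n": len(ages),
        "mean": statistics.mean(ages),
        "stddev": statistics.stdev(ages),  # sample SD (n - 1 denominator)
        "min": min(ages),
        "max": max(ages),
    }


males = describe_age(records, sex_code=1)
females = describe_age(records, sex_code=2)
print("Males:  ", males)
print("Females:", females)
```

Writing the selection as a parameter avoids copying the whole block for each group, which is the step the slide handles by duplicating the syntax for scr_sex=2.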