Title: Introduction to Statistics
1Introduction to Statistics
2Why we do it
- "What we really want to get at in health care
research is not how many reports have been done,
but how many people's lives are being bettered by
what has been accomplished. In other words, is it
being used, is it being followed, is it actually
being given to patients... What effect is it
having on people"
Rep. John Porter (R-IL), retired chairman, House
Appropriations Subcommittee on Labor, Health and
Human Services (HHS), and Education
3Is Statistics Important?
- Statistics is important because we can use it to
find out whether something we observe can be
applied to new and different situations.
- Knowing this allows us to plan for the future,
and to make decisions about how to allocate our
scarce resources of money, energy, and ultimately
life.
- We use the term generalizable: can what we know
help to predict what will happen in new and
different situations?
4Why Statistics
- Scientific knowledge represents the best
understanding that has been produced by means of
current evidence.
- Research design, if used properly, strengthens
the objectivity of the research.
- Statistical methods allow us to compare what is
actually observed to what is logically expected.
5Why Statistics (contd)
- Knowledge of statistics is . . .
- Useful in conducting investigations.
- Helpful in preparing and evaluating research
proposals.
- Vital in deciding whether the claims of a researcher
are valid.
- Needed to keep abreast of current developments.
- Needed for effective presentations of the findings.
7Evils of Pickle Eating
- Pickles are associated with all the major
diseases of the body. Eating them breeds war and
Communism. They can be related to most airline
tragedies. Auto accidents are caused by pickles.
There exists a positive relationship between
crime waves and consumption of this fruit of the
cucurbit family. For example:
8Evils of Pickle Eating (contd)
- Nearly all sick people have eaten pickles. 99.9%
of all people who die from cancer have eaten
pickles.
- 100% of all soldiers have eaten pickles.
- 96.8% of all Communist sympathizers have eaten
pickles.
- 99.7% of the people involved in air and auto
accidents ate pickles within 14 days preceding
the accident.
- 93.1% of juvenile delinquents come from homes
where pickles are served frequently. Evidence
points to the long-term effects of pickle eating.
- Of the people born in 1839 who later dined on
pickles, there has been a 100% mortality.
9Evils of Pickle Eating (contd)
- All pickle eaters born between 1849 and 1859 have
wrinkled skin, have lost most of their teeth,
have brittle bones and failing eyesight, if the
ills of pickle eating have not already caused
their death.
- Even more convincing is the report of a noted
team of medical specialists: rats force-fed with
20 pounds of pickles per day for 30 days
developed bulging abdomens. Their appetites for
WHOLESOME FOOD were destroyed.
10Evils of Pickle Eating (contd)
- In spite of all the evidence, pickle growers and
packers continue to spread their evil. More than
120,000 acres of fertile U.S. soil are devoted to
growing pickles. Our per capita consumption is
nearly four pounds.
- Eat orchid petal soup. Practically no one has as
many problems from eating orchid petal soup as
they do with eating pickles.
- EVERETT D. EDINGTON
13Types of Statistics
- Descriptive Statistics
- enumerate, organize, summarize, and categorize
- graphical representation of data.
- these types of statistics describe the data.
- Examples
- means and frequency of outcomes
- charts and graphs
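As a rough illustration of the two examples above, the sketch below computes a mean and a frequency table. The length-of-stay values and discharge outcomes are hypothetical data invented for this example; they do not come from the course.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample (not from the slides): length of stay in days
# and discharge outcome for seven patients.
length_of_stay = [2, 3, 3, 5, 8, 3, 4]
outcomes = ["home", "home", "rehab", "home", "rehab", "home", "home"]

los_mean = mean(length_of_stay)      # summarizes a numerical variable
outcome_counts = Counter(outcomes)   # frequency of each category

print(f"mean length of stay: {los_mean}")
for outcome, count in outcome_counts.most_common():
    print(f"{outcome}: {count}")
```

Both results describe only the data in hand; neither makes a claim about a larger population.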
14Types of Statistics
- Inferential Statistics
- drawing conclusions from incomplete information.
- they make predictions about a larger population
given a smaller sample
- these are thought of as the statistical tests
- Examples
- t-test, chi square test, ANOVA, regression
16Variables
J.D. Bramble, Ph.D. Creighton University Medical
Center Med 483 -- Fall 2006
17Types of Data
- Qualitative
- data fall into separate classes with no numerical
relationship
- sex, mortality, correct/incorrect, etc.
- Quantitative
- numerical data that is continuous
- pharmaceutical costs, length of stay (LOS), etc.
18Parameters and Statistics
- Parameters
- characteristics of the population
- calculating the exact population parameter is
often impractical or impossible
- Statistics
- characteristics of the sample
- represent summary measures of observed values
19Types of Variables
- Variables are symbols to which numerals or values
are assigned
- e.g. X and Y are variables
- Dependent (Ys), that which is predicted
- Independent (Xs), that which predicts
- Extraneous (Confounding or Control)
- statistical models adjust for their influence
20Independent variables
- Independent variables are the presumed cause of
the dependent variable
- The variable responsible for the change in the
phenomenon being observed
- Nothing is for sure, so avoid the word "cause"
and think in terms of independent and dependent
variables
21Dependent variables
- Also referred to as the outcome variable
- The outcome of the changes due to the independent
variables
- Example: y = a + bx
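The straight-line example can be sketched in code. The slides do not say how a and b are obtained; the sketch below assumes an ordinary least-squares fit, and the function name `fit_line` and the dose/response numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar   # the fitted line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data: independent variable (dose), dependent variable (response).
dose = [1, 2, 3, 4, 5]
response = [3, 5, 7, 9, 11]

a, b = fit_line(dose, response)
predicted = a + b * 6   # predicted response for a new dose of 6
print(a, b, predicted)
```

Here the independent variable (x) is used to predict the dependent variable (y), which is the sense in which the slide uses the two terms.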
22Confounding variables
- Additional variables that may affect the changes
in the dependent variable attributed to the
independent variables.
- These variables are controlled by measuring them;
statistical methods then adjust for their
influence.
- Sometimes referred to as control variables
23Active vs. attribute variables
- Active variables are those variables under the
control of the researcher
- controlled experimental studies
- e.g., amount of drug administered
- Attribute variables cannot be manipulated by the
researcher
- quasi-experimental studies
- e.g., sex or age of subject, blood pressure, smoking status
24The Wrong data Leads to Migraines
25Levels of Measurement
- Categorical Variables
- Nominal Scale
- Ordinal Scale
- Continuous Variables
- Interval Scale
- Ratio Scale
26Continuous Variables
- Continuous variables are measured and can take on
any value along the scale
- quantitative variables
- measured on an interval or ratio level
- Examples
- Age, income, number of medications
27Categorical Variables
- Categorical variables are measured as dichotomous
or polytomous measures
- qualitative variables
- measured on a nominal or ordinal level
- Examples
- sex, smoking status, ownership
- Categorizing continuous variables
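Categorizing a continuous variable can be sketched as below. The cut points (18 and 65) and the category labels are hypothetical choices made for illustration; the slides do not specify any.

```python
def age_group(age):
    """Map a continuous age in years to an ordered category.

    The cut points are assumptions for this example only.
    """
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "older adult"

ages = [4, 17, 18, 42, 64, 65, 90]
groups = [age_group(age) for age in ages]
print(groups)
```

Note that the conversion loses information: ages 18 and 64 land in the same group, so the original values cannot be recovered from the categories.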
28Nominal measurement scale
- Used for qualitative data
- Two or more categories (levels)
- The name of the groups does not matter
- Examples
- Sex (Male/Female)
- Smoker (Yes/No)
- Political Party (Rep, Dem, Ind)
29Ordinal measurement scale
- All the properties of nominal plus . . .
- The groups are ordered or ranked
- Intervals between groups are not necessarily
equal
- Examples
- Income (low, med, high)
- Disease severity
- Likert scales
30Interval measurement scale
- All properties of nominal and ordinal plus . . .
- A scale is used to measure the response of the
study subjects
- The interval scale's units are equal, however
arbitrary (e.g., a relative scale)
- Examples
- Temperature on Fahrenheit scale
31Ratio measurement scale
- All properties of the previous scales plus . . .
- An absolute zero point
- Can perform mathematical operations
- Highest level of measurement
- Examples
- Income, age, height, weight
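One way to see why an absolute zero matters is to check whether a ratio survives a change of units. The sketch below is only an illustration; the helper names and the specific temperatures and weights are chosen for this example.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def pounds_to_kilograms(lb):
    return lb * 0.45359237

# Interval scale: the zero point is arbitrary, so ratios depend on the unit.
temp_ratio_f = 80 / 40
temp_ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio scale: zero means "none", so ratios are the same in any unit.
weight_ratio_lb = 200 / 100
weight_ratio_kg = pounds_to_kilograms(200) / pounds_to_kilograms(100)

print(temp_ratio_f, temp_ratio_c)        # the two temperature ratios disagree
print(weight_ratio_lb, weight_ratio_kg)  # the two weight ratios agree
```

Because the temperature ratio changes with the unit, a statement such as "80 degrees is twice as hot as 40 degrees" is not meaningful on an interval scale, while "200 pounds is twice as heavy as 100 pounds" is meaningful on a ratio scale.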
32Summarizing Data
Measures of Central Tendency and Variation
33Mean
- Arithmetic mean
- the balance point of the observations
- sum all observations
- divide the sum by the number of observations
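The two calculation steps, and the sense in which the mean is a balance point, can be sketched as follows. The five observations are a small illustrative data set.

```python
observations = [10, 8, 5, 5, 2]   # small illustrative data set

total = sum(observations)          # step 1: sum all observations
mean = total / len(observations)   # step 2: divide by the number of observations

# Balance point: deviations above and below the mean cancel out.
deviations = [x - mean for x in observations]

print(mean)
print(deviations, sum(deviations))
```

The deviations summing to zero is what "balance point" refers to: the values above the mean outweigh it by exactly as much as the values below it fall short.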
34Median
- Divides the distribution into two equal parts.
- Considered the most typical observation
- Less sensitive to extreme values
35Calculating Medians
- To find the median location: q(n + 1), with q = 0.5
- 41, 28, 34, 36, 26, 44, 39, 32, 40, 35,
36, 33
- order data in ascending order
- 26, 28, 32, 33, 34, 35, 36, 36, 39, 40, 41,
44
- Apply the median location formula: 0.5(12 + 1) =
6.5
- Note this is ONLY the location of the median; the
median itself is the average of the 6th and 7th
ordered values, (35 + 36) / 2 = 35.5
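The location-then-value procedure can be sketched in code using the twelve observations from this slide. The function names are hypothetical; Python's built-in `statistics.median` is used only as a cross-check.

```python
import statistics

def median_location(n, q=0.5):
    """1-based position of the median among n ordered observations."""
    return q * (n + 1)

def median_by_location(values):
    ordered = sorted(values)              # order data in ascending order
    loc = median_location(len(ordered))   # e.g. 6.5 for n = 12
    lower = ordered[int(loc) - 1]         # value just at or below the location
    if loc == int(loc):                   # odd n: the location is a whole number
        return lower
    upper = ordered[int(loc)]             # even n: average the two neighbors
    return (lower + upper) / 2

data = [41, 28, 34, 36, 26, 44, 39, 32, 40, 35, 36, 33]
location = median_location(len(data))
median_value = median_by_location(data)
print(location, median_value)
print(statistics.median(data))            # cross-check with the standard library
```

The location (a position in the ordered list) and the median (a data value) are different numbers, which is the point of the note on the slide.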
36Quantiles
- Quantiles are those values that divide the
distribution into n equal parts so that there is
a given proportion of data below each quantile.
- The median is the middle quantile.
- Quartiles are also very common (25%, 50%, 75%)
- If we divide the distribution into 100 equal parts,
we have percentiles.
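A minimal sketch of quartiles and percentiles, using `statistics.quantiles` from the Python standard library (Python 3.8+). Several conventions exist for interpolating quantiles, so other tools such as Excel may return slightly different cut points; the default "exclusive" method used here is an assumption, not something the slides specify.

```python
from statistics import quantiles

data = [41, 28, 34, 36, 26, 44, 39, 32, 40, 35, 36, 33]

# n=4 returns the three cut points that split the data into four parts.
quartiles = quantiles(data, n=4)

# n=100 returns the 99 cut points that split the data into 100 parts.
percentiles = quantiles(data, n=100)

print(quartiles)
print(percentiles[49])   # the 50th percentile, i.e. the median
```

The middle quartile and the 50th percentile are the same value, which is why the median is described as the middle quantile.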
37Mode
- The observation that occurs most frequently
- Graphically it is the value at the peak of the
distribution.
- A distribution may be bimodal (two modes).
- If all values occur equally often, no mode exists
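The single-mode, bimodal, and no-mode cases can be sketched with `statistics.multimode` (Python 3.8+). The wrapper `modes` is a hypothetical helper, and returning an empty list when every value is equally frequent is a convention adopted here to follow the slide, not a universal rule.

```python
from statistics import multimode

def modes(values):
    """Return the most frequent value(s), or [] when no value stands out."""
    found = multimode(values)
    # If every distinct value ties for most frequent, report no mode.
    if len(found) == len(set(values)):
        return []
    return found

print(modes([1, 2, 2, 3, 3, 3]))   # one mode
print(modes([1, 1, 2, 3, 3]))      # bimodal: two modes
print(modes([1, 2, 3]))            # every value occurs once: no mode
```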
38Single Modal
39Bimodal Example
40Symmetrical The relationship between the Mean,
Median, Mode
mean = median = mode
41Positive Skew The relationship between the Mean,
Median, Mode
Mode < Median < Mean
42Negative Skew The relationship between the Mean,
Median, Mode
Mean < Median < Mode
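The usual ordering of the three measures under each shape can be checked on small data sets. The three lists below are hypothetical examples built for this purpose; the ordering is a rule of thumb for typical unimodal data and is not guaranteed for every distribution.

```python
from statistics import mean, median, mode

def center_measures(values):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(values), median(values), mode(values)

symmetric = [1, 2, 3, 3, 3, 4, 5]
positive_skew = [1, 1, 1, 2, 2, 3, 4, 10]    # long tail to the right
negative_skew = [10, 10, 10, 9, 9, 8, 7, 1]  # long tail to the left

for name, values in [("symmetric", symmetric),
                     ("positive skew", positive_skew),
                     ("negative skew", negative_skew)]:
    m, md, mo = center_measures(values)
    print(f"{name}: mean={m}, median={md}, mode={mo}")
```

In the skewed examples the mean is pulled toward the long tail, the mode stays at the peak, and the median falls between them.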
43Summarizing Data
- Frequency distributions
- Measures of central tendency
- The tendency of data to center around certain
numerical and ordinal values.
- Three common measures
- mean, median, mode
- Measures of variation
- standard deviation
44Five Figure Summary
- Median
- Quartiles
- Maximum
- Minimum
- Can be shown in a box and whisker plot
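A minimal sketch of the five values that a box and whisker plot draws. The function name and the sample data are hypothetical, and the quartiles use the default method of `statistics.quantiles`, which is one of several conventions.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return the minimum, lower quartile, median, upper quartile, and maximum."""
    q1, med, q3 = quantiles(values, n=4)
    return {
        "minimum": min(values),
        "q1": q1,
        "median": med,
        "q3": q3,
        "maximum": max(values),
    }

summary = five_number_summary([10, 8, 5, 5, 2])
for label, value in summary.items():
    print(f"{label}: {value}")
```

In a box and whisker plot the box spans q1 to q3 with a line at the median, and in the simplest form the whiskers extend to the minimum and maximum.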
45Which Measure?
- Mean
- numerical data
- symmetric distribution
- Median
- ordinal data
- skewed distribution
- Mode
- bimodal distribution
- most popular
46Variation
- Must also report measures of variation
- Measures of variability reflect the degree to
which data differ from one another as well as from the
mean.
- Together the mean and variability help describe
the characteristics of the data and show how the
distributions vary from one another.
47Example of Variation
- Take the following three sets of data:
1) 10, 8, 5, 5, 2
2) 5, 6, 6, 7, 6
3) 6, 6, 6, 6, 6
- In all three cases the mean is 6.
- There is a lot of variability in set 1.
- No variability in set 3.
- We will discuss three measures of variability: 1)
the range, 2) the standard deviation, and 3) the
variance
48Measures of Variation
- Range
- the difference between the highest and the lowest
observations
- Range = x_max - x_min
- limited usefulness since it only accounts for the
extreme values
- can also report the inter-quartile range (q3 -
q1)
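The sensitivity of the range to extreme values, compared with the inter-quartile range, can be sketched as follows. The function names are hypothetical, the second data set is an invented variant with one extreme value, and the quartiles follow the default method of `statistics.quantiles`.

```python
from statistics import quantiles

def value_range(values):
    """Range = highest observation minus lowest observation."""
    return max(values) - min(values)

def interquartile_range(values):
    """Inter-quartile range = q3 - q1."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

original = [26, 28, 32, 33, 34, 35, 36, 36, 39, 40, 41, 44]
with_outlier = [26, 28, 32, 33, 34, 35, 36, 36, 39, 40, 41, 144]  # 44 replaced by 144

print(value_range(original), interquartile_range(original))
print(value_range(with_outlier), interquartile_range(with_outlier))
```

A single extreme observation changes the range substantially while leaving the inter-quartile range untouched in this example, which is why the latter is often reported alongside it.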
49Standard Deviation
- most widely used and preferred measure of
variation.
- represented by the symbol s or sd
- the square root of the variance (s^2)
- larger values = more heterogeneous distribution
- at least 75% of the observations lie between x̄ - 2s and x̄ + 2s
- if the distribution is normal (bell shaped)
- about 68% lie within 1s of the mean
- about 95% lie within 2s of the mean
- about 99.7% lie within 3s of the mean
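The coverage figures on this slide can be reproduced numerically. The sketch below uses `statistics.NormalDist` (Python 3.8+) for the bell-shaped case and treats the 75% figure as the two-standard-deviation case of Chebyshev's inequality, which is an interpretation of the slide rather than something it states. The function names are hypothetical.

```python
from statistics import NormalDist

def normal_coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()   # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

def chebyshev_lower_bound(k):
    """Minimum share within k standard deviations, for any distribution (k > 1)."""
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(f"within {k} sd (normal): {normal_coverage(k):.4f}")

print(f"within 2 sd (any distribution): at least {chebyshev_lower_bound(2):.2f}")
```

The normal-distribution shares are roughly two thirds, 95%, and 99.7%, while the 75% bound holds for any shape of distribution and is therefore much looser.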
50Variance and Std Deviation
Variance
Standard Deviation
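The formulas on this slide did not survive in the transcript. As a hedged sketch, the code below assumes the usual sample definitions, with the sum of squared deviations divided by n - 1, and applies them to the three data sets from the variation example. The function names are hypothetical.

```python
from math import sqrt

def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1 (assumed)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """s: the square root of the variance."""
    return sqrt(sample_variance(values))

data_sets = {
    "set 1": [10, 8, 5, 5, 2],
    "set 2": [5, 6, 6, 7, 6],
    "set 3": [6, 6, 6, 6, 6],
}

for name, values in data_sets.items():
    print(f"{name}: variance={sample_variance(values):.2f}, "
          f"sd={sample_std_dev(values):.2f}")
```

All three sets share a mean of 6, so the variance and standard deviation are what distinguish them.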
51Example
- Data on the sexual activity of male and
female subjects can be found in Chatterjee,
Handcock, and Simonoff (1995), A casebook for a
first course in statistics. New York: Wiley. They
provide data on the reported number of sexual
partners for 1682 females and 1850 males. The
dependent variable is the number of reported
partners.
52Descriptive Statistics
         Male (n = 1850)   Female (n = 1685)
Mean     10.9              3.4
Median   4                 1
Mode     1                 1
53Using Excel When Syntax Is Known
- Write the formulas right into the spreadsheet
- Be sure to start with an equal sign
- Use your mouse to highlight data to analyze
54Using Excel When Syntax Is Unknown
- Use the wizard and follow the instructions.
- All wizards work about the same way.
Select the fx button to select appropriate test
Select category and then desired test
55Follow the Wizard
Either highlight the array or just write it in
These icons reduce/enlarge the Wizard box