Title: Community Dental Health
1Community Dental Health
2Statistics
- Statistics is the field of study which concerns
itself with the art and science of data analysis
- Planning, collecting, organizing, analyzing,
interpreting, summarizing and presenting the data
- Statistics, when used in the plural form, refers
to the specific bits of data which either have
been or are about to be gathered.
3Introduction To BIOSTATISTICS
- Biostatistics
- The mathematics of collection, organization and
interpretation of numeric data having to do with
living organisms.
- Techniques to manage data
- Descriptive
- Inferential
4Facts About Data
- Two types of data
- Qualitative (attributes): labels used to identify an item when
it cannot be numerically identified.
- e.g. marital status, car colour, occupation
- n.b. has absolutely nothing to do with the
quality of the data
- Quantitative (variates): characteristics that can be
expressed numerically. Any mathematical
manipulation that is carried out on them will
have meaning.
- e.g. height, length, volume, number of DMFTs
5Data Management
- Grouping data to make it easier to understand.
- Descriptive Technique
- Used to describe and summarize a set of numerical
data
- Tabular and graphical methods
- Generalizations apply only to the group
studied
6Descriptive Data Display Types
An Array: a group of scores arranged from lowest
to highest in value.
e.g. Histology test results, 24 students, scores out of 50
Raw Data:
19 28 30 44 41 41
25 33 39 49 42 38
26 35 41 38 33 40
30 38 44 31 36 46
Array:
19, 25, 26, 28, 30, 30, 31, 33, 33, 35,
36, 38, 38, 38, 39, 40, 41, 41, 41, 42, 44, 44,
46, 49
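Turning raw data into an array is just a sort. A minimal sketch in Python, using the histology scores above (the variable names are illustrative, not part of the slides):

```python
# Raw histology scores for the 24 students, in the order recorded.
raw_scores = [
    19, 28, 30, 44, 41, 41,
    25, 33, 39, 49, 42, 38,
    26, 35, 41, 38, 33, 40,
    30, 38, 44, 31, 36, 46,
]

# An array is the same set of scores arranged from lowest to highest.
array = sorted(raw_scores)

print(len(array))  # number of students
print(array)
```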
7Descriptive Data Display Types
- Arrays are bulky and hard to read, thus an
alternative is the frequency distribution.
- Frequency Distribution
- An organization of scores from lowest to highest
which includes the number of times each score
value occurs in the data set.
8Descriptive Data Display Types
- Frequency Distribution: 3 Types
1. Ungrouped Frequency Distribution
- Each possible score value of the variable being
measured is represented in the display and the
frequency of occurrence of the value is recorded.
- Sample: see the next slide
9Descriptive Data Display Types
Frequency Distribution: Ungrouped
Score  F     Score  F     Score  F
50     0     40     1     30     2
49     1     39     1     29     0
48     0     38     3     28     1
47     0     37     0     27     0
46     1     36     1     26     1
45     0     35     1     25     1
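An ungrouped frequency distribution can be tallied mechanically. The sketch below assumes the 24 histology scores from the array slide and a maximum possible score of 50; `Counter` returns 0 for values that never occur, so every possible score still gets a row.

```python
from collections import Counter

scores = [
    19, 28, 30, 44, 41, 41,
    25, 33, 39, 49, 42, 38,
    26, 35, 41, 38, 33, 40,
    30, 38, 44, 31, 36, 46,
]
MAX_SCORE = 50  # assumption: the test is marked out of 50

counts = Counter(scores)

# One row per possible score value, highest first, including zero counts.
ungrouped = {
    value: counts[value]
    for value in range(MAX_SCORE, min(scores) - 1, -1)
}

print("Score  F")
for value, frequency in ungrouped.items():
    print(f"{value:>5}  {frequency}")
```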
10Descriptive Data Display Types
2. Grouped Frequency Distribution: when a broad
range of values on the measurement is possible
(i.e. > 30), the range is collapsed by grouping
scores together into smaller value ranges.
Scores   Grouped F   Cumulative F
16-20    1           1
21-25    1           2
26-30    4           6
31-35    4           10
36-40    6           16
41-45    6           22
46-50    2           24
11Descriptive Data Display Types
3. Cumulative Frequency Distribution: used with
score groupings, where the frequency of any one
group includes all instances of scores in that
group plus all the groups of lower score values.
Scores   Grouped F   Cumulative F
16-20    1           1
21-25    1           2
26-30    4           6
31-35    4           10
36-40    6           16
41-45    6           22
46-50    2           24
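Grouped and cumulative frequencies can be computed directly from the raw scores. This is a sketch only, assuming the 24 histology scores listed on the array slide and the same five-point groupings (16-20 through 46-50); the names are illustrative.

```python
from itertools import accumulate

scores = [
    19, 28, 30, 44, 41, 41,
    25, 33, 39, 49, 42, 38,
    26, 35, 41, 38, 33, 40,
    30, 38, 44, 31, 36, 46,
]
WIDTH = 5

# Groups 16-20, 21-25, ..., 46-50 as (low, high) pairs.
bins = [(low, low + WIDTH - 1) for low in range(16, 50, WIDTH)]

# Grouped frequency: how many scores fall inside each group.
grouped = [sum(low <= s <= high for s in scores) for low, high in bins]

# Cumulative frequency: each group plus all groups of lower score values.
cumulative = list(accumulate(grouped))

print("Scores   Grouped F   Cumulative F")
for (low, high), f, cf in zip(bins, grouped, cumulative):
    print(f"{low}-{high}    {f:<11} {cf}")
```

The last cumulative value should always equal the number of scores, which is a quick check that no score fell outside the groups.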
12Central Tendency
- Term in statistics that describes where the data
set is located.
- Measures of Central Tendency
- Used to describe what is typical in the sample
group based on the data gathered.
- Three Main Indicators
- Mean
- Median
- Mode
13Central Tendency
- Mean: arithmetic average of scores
- Mean symbol is x̄ (x-bar)
- Scores are all added then divided by the number
of scores.
- The most common measure
- Data set: 3, 7, 9, 4, 9, 16; mean = 48 / 6 = 8
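The same calculation as a short Python sketch, using the data set from this slide:

```python
data = [3, 7, 9, 4, 9, 16]

total = sum(data)         # add all the scores
mean = total / len(data)  # divide by the number of scores

print(total, len(data), mean)  # 48 6 8.0
```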
14Central Tendency
- Median
- Is the point that divides the distribution of
scores into 2 equal parts (50 / 50)
- With an odd number of scores, the median is the datum in
the middle
- e.g. 3, 7, 2, 5, 9 rearranged to 2, 3, 5, 7, 9; median = 5
- With an even number of scores, the median is the average
of the two middle values
- e.g. 4, 7, 1, 3, 8, 2 rearranged to 1, 2, 3, 4, 7, 8;
(3 + 4) / 2 = 7 / 2; median = 3.5
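The odd and even cases can be written as one small function. This is a hand-rolled sketch for illustration; Python's standard library also offers `statistics.median`.

```python
def median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)  # rearrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the datum in the middle.
        return ordered[mid]
    # Even number of scores: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 7, 2, 5, 9]))     # 5
print(median([4, 7, 1, 3, 8, 2]))  # 3.5
```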
15Central Tendency
- Mode
- Is the most frequently occurring score in a
distribution
- e.g. 4, 3, 4, 9, 7, 2; mode = 4
- e.g. 3, 8, 4, 2, 4, 9, 7, 4, 9, 1, 9
- bimodal data set; modes = 4 and 9
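Because a data set can have more than one mode, a sketch of the calculation is easiest if it returns a list. The helper name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return every score that shares the highest frequency, lowest first."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([4, 3, 4, 9, 7, 2]))                 # [4]
print(modes([3, 8, 4, 2, 4, 9, 7, 4, 9, 1, 9]))  # [4, 9]
```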
16BIOSTATISTICS Continued
- Previously discussed
- Descriptive statistical techniques
- The first measures of spread / central tendency
- Information about central tendency is important.
Equally important is information about the spread
of data in a set.
17Variability/Dispersion
- Three terms associated with variability /
dispersion
- Range
- Variance
- Standard Deviation
- (They describe the spread around the central
tendency)
18Variability/Dispersion
- Range
- The numerical difference between the highest and
lowest scores
- Subtract the lowest score from the highest score
- e.g. 19, 21, 73, 4, 102, 88
- Range = 102 - 4 = 98
- n.b. easy to find but unreliable, since it depends only on the two extreme scores
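A short sketch of the range calculation on the scores above. The second step, dropping the single lowest score, is an added illustration of why the range is considered unreliable: one extreme value changes it substantially.

```python
scores = [19, 21, 73, 4, 102, 88]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)
print(score_range)  # 98

# Illustration only: remove the single lowest score and recompute.
without_lowest = sorted(scores)[1:]
trimmed_range = max(without_lowest) - min(without_lowest)
print(trimmed_range)  # 83
```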
19Variability/Dispersion
- Variance
- The measure of the average squared deviation, or spread, of
scores around the mean
- Based on each score in the set
- Calculation
1. Obtain the mean of the distribution
2. Subtract the mean from each score to obtain a
deviation score
3. Square each deviation score
4. Add the squared deviation scores
5. Divide the sum of the squared deviation scores by
the number of subjects in the sample
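The calculation steps can be followed one line at a time. The sketch below reuses the data set from the mean slide (3, 7, 9, 4, 9, 16) as an assumed example and divides by the number of scores, as the slide describes.

```python
scores = [3, 7, 9, 4, 9, 16]

# Step 1: obtain the mean of the distribution.
mean = sum(scores) / len(scores)

# Step 2: subtract the mean from each score to get deviation scores.
deviations = [s - mean for s in scores]

# Step 3: square each deviation score.
squared = [d ** 2 for d in deviations]

# Step 4: add the squared deviation scores.
sum_of_squares = sum(squared)

# Step 5: divide by the number of subjects.
variance = sum_of_squares / len(scores)

print(mean, sum_of_squares, variance)  # 8.0 108.0 18.0
```

Dividing by the number of scores matches `statistics.pvariance` in Python's standard library. Many textbooks instead divide by the number of scores minus 1 when the data are a sample from a larger population, which is what `statistics.variance` does.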
20Variability/Dispersion
- Standard Deviation of a set of scores is the
positive square root of the variance
- A number which tells how much the data is
spread around its mean
- Interpretation of Variance and Standard Deviation
- The standard deviation is always equal to the square root of the
variance
- The greater the dispersion around the mean of
the distribution, the greater the standard
deviation and variance
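A sketch of the relationship between variance, standard deviation and dispersion. Both data sets are assumed examples with the same mean (8); the second is deliberately bunched closer to the mean.

```python
import math


def variance(values):
    """Average squared deviation from the mean (divides by the number of scores)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def standard_deviation(values):
    """Positive square root of the variance."""
    return math.sqrt(variance(values))


spread_out = [3, 7, 9, 4, 9, 16]  # mean 8, widely dispersed
bunched = [7, 8, 8, 8, 8, 9]      # mean 8, tightly clustered

print(round(standard_deviation(spread_out), 2))  # 4.24
print(round(standard_deviation(bunched), 2))     # 0.58
```

The more widely dispersed set has the larger variance and the larger standard deviation, even though both sets share the same mean.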
21Normal Curve (Bell)
- A population distribution which appears very
commonly in life science
- Bell-shaped curve that is symmetrical around the
mean of the distribution
- Called normal because its shape occurs so often
- May vary from narrow (pointy) to wide (flat)
distribution
- The mean of the distribution is the focal point
from which all assumptions may be made
- Think in terms of percentages: easier to
interpret the distribution
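One common way to think in percentages is to ask what share of a normal distribution lies within 1, 2 or 3 standard deviations of the mean. The slides do not give these figures; the sketch below computes them with `statistics.NormalDist` from Python's standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of the distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

Because the curve is symmetrical around the mean, half of each share lies on either side of it.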
22Research Techniques
- Inferential Statistics
- (Statistical Inference)
- Techniques used to provide a basis for
generalizing about the probable characteristics
of a large group when only a portion of the group
is studied
- The mathematical result can be applied to the larger
population
23Definitions Relating To Research Techniques
- Population
- Entire group of people, items, materials, etc.
with at least one basic defined characteristic in
common
- Contains all subjects of interest
- A complete set of actual or potential
observations
- e.g. all Ontario dentists or all brands of
toothpaste
- Sample
- A subset (representative portion) of the
population
- Does not have exactly the same characteristics as
the population but can be made truly
representative by using probability sampling
methods and an adequate sample size (5
types of sampling)
24Definitions Relating To Research Techniques
- Parameters
- Numerical descriptive measures of a population
obtained by collecting a specific piece of
information from each member of the population
- In practice, often a number inferred from sample statistics
- e.g. 2,000 women over age 50 with heart disease
25Definitions Relating To Research Techniques
- Statistic
- A number describing a sample characteristic.
Results from manipulation of sample data
according to certain specified procedures
- A characteristic of a sample chosen for study
from the larger population
- e.g. 210 women out of 500 with diabetes have
heart problems
26Sampling Procedures
- 5 Types of Samples
- A random sample: selected by chance
- A stratified sample: categorized, then selected at random within each category
- A systematic sample: every nth item
- A judgment sample: selected using prior knowledge
- A convenience sample: whatever is readily available
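Four of these sampling types can be sketched in a few lines of Python. The population here is entirely hypothetical (60 people labelled urban or rural), and the proportional allocation used for the stratified sample is one common choice, not something the slides specify. A judgment sample depends on the investigator's prior knowledge, so it is not shown.

```python
import random

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
SAMPLE_SIZE = 6

# Hypothetical population: 60 people, 40 "urban" and 20 "rural".
population = [
    {"id": i, "group": "rural" if i % 3 == 0 else "urban"}
    for i in range(1, 61)
]

# Random sample: every member has the same chance of selection.
random_sample = rng.sample(population, SAMPLE_SIZE)

# Stratified sample: categorize first, then sample at random within each
# category, here in proportion to the category's share of the population.
strata = {}
for person in population:
    strata.setdefault(person["group"], []).append(person)
stratified_sample = []
for members in strata.values():
    share = round(SAMPLE_SIZE * len(members) / len(population))
    stratified_sample.extend(rng.sample(members, share))

# Systematic sample: every nth item after a random starting point.
step = len(population) // SAMPLE_SIZE
start = rng.randrange(step)
systematic_sample = population[start::step]

# Convenience sample: whoever is readily available, here the first six.
convenience_sample = population[:SAMPLE_SIZE]

print([p["id"] for p in systematic_sample])
```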
27Concept Of Significance
- Probability: P (symbol)
- When using inferential statistics, we often deal
with statistical probability.
- The expected relative frequency of a particular
outcome by chance, or the likelihood of something
occurring
- e.g. coin toss
28Probability
- Rules of probability
- The (P) of any one event occurring is some value
from 0 to 1 inclusive
- The sum of the probabilities of all possible events in an experiment
must equal 1
- Numerical values can never be negative nor
greater than 1
- P = 0: non-event
- P = 1: event will always happen
29Probability
- Calculating probability
- P = number of possible successful outcomes
/ number of all possible outcomes
- e.g. coin flip
- 1 successful outcome (heads)
/ 2 possible outcomes; P = 0.5 or 50%
- e.g. throw of a die
- 1 successful outcome
/ 6 possible outcomes; P ≈ 0.17 or 16.7%
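The ratio can be kept exact with Python's `fractions.Fraction`, which also makes it easy to check that the probabilities of all possible outcomes add up to 1. The helper name `probability` is illustrative.

```python
from fractions import Fraction


def probability(successful, possible):
    """Successful outcomes divided by all possible outcomes, as an exact fraction."""
    return Fraction(successful, possible)


heads = probability(1, 2)     # coin flip
one_face = probability(1, 6)  # one face of a six-sided die

print(float(heads))               # 0.5
print(round(float(one_face), 3))  # 0.167

# The probabilities of all six faces together must equal 1.
print(sum([one_face] * 6) == 1)   # True
```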
30Hypothesis Testing
- The first step in determining statistical
significance is to establish a hypothesis
- To answer questions about differences or to test
the credibility of a statement
- e.g. does brand X toothpaste really whiten
teeth more than brand Y?
31Hypothesis Testing
- Null hypothesis (Ho): there is no statistically
significant difference between brand X and brand
Y
- Positive hypothesis: brand X does whiten more
- Ho most often used as the hypothesis
- Ho assumed to be true
- Therefore the purpose of most research is to
examine the truth of a theory or the
effectiveness of a procedure and make them seem
more or less likely!
32Hypothesis Characteristics
- A hypothesis must have these characteristics in
order to be researchable.
- Feasible
- Adequate number of subjects
- Adequate technical expertise
- Affordable in time and money
- Manageable in scope
- Interesting to the investigator
- Novel
- Confirms or refutes previous findings
- Extends previous findings
- Provides new findings
33Hypothesis Characteristics
- Ethical
- Relevant
- To scientific knowledge
- To clinical and health policy
- To future research direction
34Significance Level
- A number (α, alpha) that acts as a cut-off point
below which we agree that a difference exists and
Ho is rejected. Alpha is almost always either
0.01, 0.05 or 0.10.
- Represents the amount of risk we are willing to
take of being wrong in our conclusion
- P < 0.10: 10% chance
- P < 0.05: 5% chance
- P < 0.01: 1% chance (cautious)
- Critical value: the cut-off point is set
before conducting the study (usually P < 0.05)
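The cut-off rule amounts to a single comparison. In this sketch the function name and the P values passed to it are hypothetical, chosen only to show how the same result can be significant at one alpha and not at another.

```python
def reject_null(p_value, alpha=0.05):
    """Reject Ho when the observed P falls below the chosen cut-off (alpha)."""
    return p_value < alpha


# Hypothetical P values, for illustration only.
print(reject_null(0.03))              # True: 0.03 < 0.05
print(reject_null(0.03, alpha=0.01))  # False: 0.03 is not below the cautious cut-off
print(reject_null(0.08, alpha=0.10))  # True: 0.08 < 0.10
```

Alpha is passed in as a parameter because it is fixed before the study is conducted, not chosen after seeing the result.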
35Degree Of Freedom (D.F.)
- Most tests for statistical significance require
application of the concept of d.f.
- d.f. refers to the number of values observed which
are free to vary after we have placed certain
restrictions on the data collected
- d.f. usually equals the sample size minus 1
- e.g. 8, 2, 15, 10, 15, 7, 3, 12, 15, 13; sum = 100
- d.f. = number of values (10) minus 1 = 9
- Takes chance into consideration
- A penalty for uncertainty, so the larger the
sample the less the penalty
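One way to see why a value is lost: once the total of the ten values is fixed at 100, any nine of them can vary freely, but the tenth is then forced. A minimal sketch using the example values above:

```python
values = [8, 2, 15, 10, 15, 7, 3, 12, 15, 13]

total = sum(values)  # the restriction placed on the data
print(total)         # 100

# Nine values are free to vary; given the total, the last one is determined.
free_values = values[:-1]
forced_last_value = total - sum(free_values)
print(forced_last_value)  # 13

degrees_of_freedom = len(values) - 1
print(degrees_of_freedom)  # 9
```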