Classification, analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Classification, analysis


Slides: 51
Provided by: nurs54

Transcript and Presenter's Notes


1
Classification, analysis & interpretation of
quantitative data
  • Dr Heather Wharrad

2
There Are Three Kinds Of Lies: Lies, Damned
Lies, And Statistics

(attributed to B Disraeli by Mark Twain)
3
Session Overview
  • To discuss ways in which quantitative data can be
    classified & analysed
  • Identify some important considerations in the use
    of statistics in your dissertation
  • Demonstrate SPSS

4
Research Process
5
Data analysis depends on:
  • The number of variables being examined
  • The level of measurement
  • Whether the purpose is descriptive or inferential

6
  • Levels of measurement

7
Number of variables
  • Single variable (Univariate)
  • describe individual characteristics separately
    eg age, sex, income level
  • 2 variables (Bivariate)
  • sex & income level (do males earn more than
    females?)
  • 3 or more variables (Multivariate)
  • sex, income and education (are differences in
    income level due to sex and/or education?)

8
Methods of analysis
9
Descriptive & Inferential Statistics
  • Descriptive Statistics
  • Methods used to summarise or describe
    observations eg frequency distributions, average,
    range, standard deviation
  • Inferential Statistics
  • use data from a sample(s) to make predictions and
    generalisations about a population(s)
  • statistical tests

10
Frequency Count
  • In order to see patterns in data it is useful to
    classify the data. This might involve dividing
    the range of measured values into groups (ten is
    a reasonable number) and then placing the
    subjects into a group. The number of subjects in
    a category is the frequency count for that
    category.

11
Pulse rate groups (beats per minute):
A 50-54, B 55-59, C 60-64, D 65-69, E 70-74,
F 75-79, G 80-84, H 85-89, I 90-94, J 95-99
Take your pulse. Identify which letter
corresponds to your pulse rate.
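Here is a minimal Python sketch of the grouping and frequency count described above. It assumes whole-number pulse rates between 50 and 99 beats per minute and the ten 5-beat groups A to J. The function name `pulse_group` and the class pulses are hypothetical and are used only for illustration.

```python
from collections import Counter
import string

def pulse_group(pulse):
    """Map a pulse rate (beats/min, assumed 50-99) to its group letter A-J."""
    if not 50 <= pulse <= 99:
        raise ValueError("pulse outside the 50-99 range used for groups A-J")
    # Each group spans 5 beats: 50-54 -> A, 55-59 -> B, ..., 95-99 -> J
    return string.ascii_uppercase[(pulse - 50) // 5]

# Hypothetical pulses for a small class (illustration only)
pulses = [62, 71, 74, 78, 75, 83, 68, 77, 90, 59]

# The frequency count for a group is the number of subjects placed in it
frequency = Counter(pulse_group(p) for p in pulses)
for letter in "ABCDEFGHIJ":
    print(letter, frequency[letter])
```

`Counter` returns 0 for a group with no subjects, so empty groups still appear in the printed count.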
12
(No Transcript)
13
  • Here is an example of what we would expect to see
    from a reasonably homogeneous and representative
    sample.
  • Group Frequency
  • A 1
  • B 2
  • C 4
  • D 6
  • E 9
  • F 15
  • G 9
  • H 6
  • I 4
  • J 2

14
Frequency Distribution Chart
  • This refers to a graph which shows the frequency
    (number of subjects) in each category on a ranked
    scale.
  • The categories are on the horizontal axis and the
    frequency count is on the vertical axis
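A rough, text-only version of such a chart can be drawn from the example frequency table. This is a sketch: the helper `vertical_chart` is hypothetical. Categories run along the bottom, and one symbol per subject is stacked upwards, so column height shows frequency.

```python
# Frequencies from the example table (groups A-J)
table = {"A": 1, "B": 2, "C": 4, "D": 6, "E": 9,
         "F": 15, "G": 9, "H": 6, "I": 4, "J": 2}

def vertical_chart(freqs, symbol="#"):
    """Return the chart as text rows: categories along the bottom
    (horizontal axis), frequency shown by column height (vertical axis)."""
    tallest = max(freqs.values())
    rows = []
    for level in range(tallest, 0, -1):
        # A column gets a symbol at this height if its count reaches it
        rows.append(" ".join(symbol if count >= level else " "
                             for count in freqs.values()))
    rows.append(" ".join(freqs))  # category labels along the bottom
    return rows

for row in vertical_chart(table):
    print(row)
```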

15
Descriptive Measures of central tendency
  • Mean (Average)
  • -sum of the observations divided by sample size
  • Mode
  • -the value with the largest frequency
  • - most useful for large samples in frequency form
  • Median
  • - middle value when values are placed in order
  • - if n is an even number, take the average of the
    two values nearest to the middle

16
Example: five pulses in ascending order
60, 72, 72, 75, 81
Mean = 72 (360 divided by 5)
Mode = 72 (most common number)
Median = 72 (middle value of the data set)
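The three measures in this example can be checked with Python's standard `statistics` module. The sketch below uses the five pulses above. It also shows the even-n median rule from the previous slide, using a hypothetical four-value list.

```python
import statistics

pulses = [60, 72, 72, 75, 81]  # the five pulses from the example

print(statistics.mean(pulses))    # 72  (360 / 5)
print(statistics.mode(pulses))    # 72  (most common value)
print(statistics.median(pulses))  # 72  (middle value)

# With an even number of values, the median is the average of the two middle values
print(statistics.median([60, 72, 75, 81]))  # 73.5
```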
17
Although the mean is the most commonly used
descriptive statistic, certainly by the popular
media, there are situations when it is
inappropriate. Scenario 1: The media reports
that the average income in a village is 100,000.
This might sound like a wealthy place with a high
number of well-paid jobs, until you discover that
one villager is a manager of an international oil
company and has an income of 1 million. In this
case the data is skewed and it would be more
useful to know the median income.
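Scenario 1 can be reproduced with a hypothetical village of 19 earners, using the same income units as the scenario. The 18 ordinary incomes are invented so that the single 1 million income lifts the mean to exactly 100,000. The median stays at a typical income.

```python
import statistics

# Hypothetical village: 18 ordinary incomes plus one oil-company manager
incomes = [30_000, 35_000, 40_000, 45_000] + [50_000] * 10 + [55_000, 60_000, 65_000, 70_000]
incomes.append(1_000_000)

print(statistics.mean(incomes))    # 100000: pulled up by the single very high income
print(statistics.median(incomes))  # 50000: the income of the "middle" villager
```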
18
(No Transcript)
19
Scenario 2: An Olympic athletes' conference
accidentally mingles with the Kenco Coffee Club
AGM on a coffee break. Someone measures
everybody's pulses, and when the pulses are
plotted, the following graph is produced.
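Suppose the athletes have low resting pulses and the coffee drinkers high ones. The combined data would then have two peaks (a bimodal distribution), and a single mean would describe neither group. A minimal sketch with hypothetical pulse values:

```python
import statistics

# Hypothetical pulses (beats/min) for the two mixed groups
athletes = [46, 48, 50, 50, 50, 52, 54]
coffee_club = [86, 88, 90, 90, 90, 92, 94]
everyone = athletes + coffee_club

print(statistics.multimode(everyone))  # [50, 90]: two peaks
print(statistics.mean(everyone))       # 70: a pulse rate nobody in the room has
```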
20
(No Transcript)
21
Descriptive statistics: variation
  • Range
  • difference between the largest and smallest
    values in your sample
  • Standard deviation
  • a measure of how widely the observations are
    spread around the mean
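Both measures of variation can be sketched for the five example pulses. Python's `statistics` module offers two standard deviations. `stdev` divides by n - 1 and is the sample version usually reported for research data. `pstdev` divides by n and treats the values as the whole population.

```python
import statistics

pulses = [60, 72, 72, 75, 81]  # the five pulses from the earlier example

data_range = max(pulses) - min(pulses)
sample_sd = statistics.stdev(pulses)       # divides by n - 1
population_sd = statistics.pstdev(pulses)  # divides by n

print(data_range)               # 21
print(round(sample_sd, 2))      # 7.65
print(round(population_sd, 2))  # 6.84
```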

22
Quantitative Analysis
Interpreting numerical data, Part 2: Inferential
statistics
23
Research Process
Inferential statistics
24
Hypothesis
  • Supposition about the data
  • Is there a difference between data sets A & B?
  • Null hypothesis - there is no difference between
    data sets A & B
  • ACCEPT or REJECT?
  • Significance Testing

25
Significance testing
  • to determine whether an observed difference (or
    association) between 2 or more sets of data is
    real or could have arisen by chance
  • statistical tests have been derived to apply to
    different types of data; the end result is a
    significance level or probability (p) value.
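One way to see what "could have arisen by chance" means is a permutation test. This technique is swapped in here for illustration and is not named on the slides. If the null hypothesis is true, the group labels are interchangeable. So we can count how many relabellings of the pooled data give a difference in means at least as large as the one observed. The function name and the data below are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Try every way of choosing which pooled values are labelled "A"
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for rounding
            extreme += 1
    return extreme / total

# Hypothetical pulse rates for two small groups
group_a = [60, 62, 64, 65, 66]
group_b = [75, 78, 80, 82, 85]
p = permutation_p_value(group_a, group_b)
print(p)  # about 0.0079 (2 of the 252 possible relabellings)
```

Enumerating every relabelling keeps the result exact and repeatable. For larger groups the number of combinations grows quickly, and a random sample of relabellings is normally used instead.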

26
                 Don't have disease   Have disease
Trial group              0                100
Control group            0                100
27
                 Don't have disease   Have disease
Trial group             100                 0
Control group            0                100
28
                 Don't have disease   Have disease
Trial group              75                25
Control group            55                45
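The slides do not say which test would be applied to the third table. One option, listed later under non-parametric tests for nominal data, is Pearson's chi-squared test. Here is a minimal sketch without a continuity correction. The function name is hypothetical, and the function assumes no expected count is zero, so it cannot be applied to the first table, where nobody in either group is free of disease.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]] with rows = groups and columns = outcomes.
    Returns (chi-squared statistic, p value with 1 degree of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if group and outcome were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Third table: rows = trial, control; columns = don't have disease, have disease
chi2, p = chi_squared_2x2([[75, 25], [55, 45]])
print(round(chi2, 2), p)  # statistic about 8.79; p is below 0.01
```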
29
Probability
Probability = number of desired outcomes / number
of possible outcomes
Examples: the chance of getting a head on a coin is
1/2; the probability of obtaining a three on a
die is 1/6.
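The definition above can be sketched by counting outcomes directly, assuming every outcome is equally likely. `Fraction` keeps the answer as an exact ratio. The helper name `probability` is hypothetical.

```python
from fractions import Fraction

def probability(outcomes, is_desired):
    """Number of desired outcomes divided by number of possible outcomes
    (assumes every outcome is equally likely)."""
    desired = sum(1 for o in outcomes if is_desired(o))
    return Fraction(desired, len(outcomes))

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

print(probability(coin, lambda o: o == "H"))  # 1/2
print(probability(die, lambda o: o == 3))     # 1/6
```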
30
Coincidence
If two events are independent and the probability of
each is known, the probability of both events
happening together (their coincidence) is calculated
by multiplying the two probabilities
together. Example: the probability of obtaining both
a head on a coin and a three on a die in any one
go is 1/2 X 1/6 = 1/12. Below are all the
possible outcomes: H1 T1 H2 T2 H3 T3 H4 T4
H5 T5 H6 T6
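The multiplication rule can be checked against the list of outcomes. This sketch enumerates every coin-and-die pair with `itertools.product`, counts the single favourable one (H3), and compares that with 1/2 X 1/6.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Every possible (coin, die) outcome of one go: H1, H2, ..., H6, T1, ..., T6
outcomes = list(product(coin, die))
favourable = [o for o in outcomes if o == ("H", 3)]

by_counting = Fraction(len(favourable), len(outcomes))
by_multiplying = Fraction(1, 2) * Fraction(1, 6)

print(len(outcomes))                # 12
print(by_counting, by_multiplying)  # 1/12 1/12
```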
31
Comparing two sets of data
[Figure: overlapping groups labelled People with brown hair and People with blue eyes]
The shaded area is the overlap, commonly known as
the intersection. It represents a region in which
data from a subject is common to both groups.
In this case it represents people with both
blue eyes and brown hair.
32
Using Probability to compare two sets of data
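Statistical tests calculate a P value. This is
the probability of seeing a difference at least as
large as the one observed if groups A and B really
came from the same population (that is, if the null
hypothesis were true). If P is less than 0.05 we
say there is a significant difference between A and B.
[Figure: panels labelled P 0.8, P 0.05 and P 0]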
33
Probability
  • By convention, a significance level of 5%
    (p < 0.05) is considered to be acceptable
  • this means accepting a 5% risk of rejecting the
    null hypothesis when it is actually true
  • put the other way round, if there were really no
    difference, a result this extreme would arise by
    chance less than 5% of the time.
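The 5% risk can be illustrated by simulation. In this sketch, both samples are repeatedly drawn from the same normal population, so the null hypothesis is true every time. Roughly 5% of tests should still come out "significant". For simplicity it assumes the population SD is known, so a z-test can be used. This test choice, the population values and the seed are all assumptions for illustration.

```python
import math
import random

def z_test_p(a, b, sigma):
    """Two-sided p value for a difference in means when the population SD sigma is known."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
trials, false_positives = 10_000, 0
for _ in range(trials):
    # Both samples come from the SAME population (mean 70, SD 10)
    a = [random.gauss(70, 10) for _ in range(30)]
    b = [random.gauss(70, 10) for _ in range(30)]
    if z_test_p(a, b, sigma=10) < 0.05:
        false_positives += 1

false_positive_rate = false_positives / trials
print(false_positive_rate)  # close to 0.05
```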

34
Significance level
35
Parametric v Non-parametric
  • Parametric tests
  • used on normally distributed data
  • used on interval/ratio data, not nominal or
    ordinal
  • assume that samples have equal variances
  • Non-parametric tests
  • nominal data (eg Chi squared)
  • ordinal data, or interval/ratio data that do not
    meet the parametric assumptions

36
Which tests?
37
Correlation
So far we have considered how to make
comparisons of one variable between two sets of
data taken from either one group of subjects
or two groups of subjects. Sometimes we need to
examine the relationship between two different
variables. This is known as correlation.
38
Scatter Plots
A scatter plot can illustrate how one variable
changes relative to another variable. Here
height and weight are shown together.
39
Calculating Correlation
Data may fall within a range or band of values as
shown in the plot below, rather than on a perfect
straight line. We need to be able to quantify
the strength of the relationship between the two
variables in situations like this. We need to
know how close to a straight line our data falls.
40
Correlation Coefficient
Formulae exist which can calculate this.
Pearson's formula calculates the correlation
coefficient, given the symbol r; Spearman's formula
calculates a rank correlation coefficient, rho (ρ).
Both can be between -1 and 1. The closer the value
is to -1 or 1, the stronger the correlation.
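Here is a minimal sketch of both coefficients in plain Python. Pearson's r is computed from deviations about the means. Spearman's rho is computed as Pearson's r on the ranks, with tied values given the average of their ranks. The heights and weights are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r calculated on the ranks instead of the raw values."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical heights (cm) and weights (kg) for five people
heights = [150, 160, 165, 170, 180]
weights = [50, 58, 63, 66, 75]
print(round(pearson_r(heights, weights), 3))
print(round(spearman_rho(heights, weights), 3))
```

Ranking first is what makes Spearman's rho suitable for ordinal data. It only asks whether the values rise and fall together, not whether they follow a straight line.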
41
The Correlation Coefficient
The correlation coefficient can be between -1 and
1. The sign (+ or -) tells us the direction of
the gradient. The value tells us how close to a
straight line the data falls.
[Figure: example scatter plots with r = 1, r = -1 and r = 0]
42
Some more examples
[Figure: example scatter plots with r = -0.9, r = 0.3 and r = -0.6]
43
Causal and Non-causal Relationships
It is quite common for two variables to show a
strong correlation. But does this mean that a
change in one causes a change in the other ? The
answer is that correlation does not mean that
there is necessarily a cause and effect. A
strong correlation may be the result of two very
different variables which may have changed for
two very different reasons over a period of time
or under certain circumstances.
44
Example of Non-Causal Correlation
Variable 1: Coronary heart disease exhibits a
winter peak and summer trough in incidence and
mortality, in countries both north and south of
the equator. In England and Wales, the winter
peak accounts for an additional 20,000 deaths per
annum. It is likely that this reflects seasonal
variations in risk factors. Seasonal variations
have been demonstrated in a number of lifestyle
risk factors such as physical activity and diet.
However, a number of studies have also suggested
a direct effect of environmental temperature on
physiological and rheological factors. (Pell JP,
Cobbe SM. Seasonal variations in coronary heart
disease. QJM. 1999 Dec;92(12):689-96.)
45
Variable 2: Ice cream sales vary depending upon
time of year. More ice creams are sold when the
weather is warmer, fewer when it is colder.
46
If the numerical data from the ice cream company
is plotted against the numerical data for heart
attacks, the result seems to suggest that an
increase in ice cream consumption could cause a
decrease in heart attacks. Of course this is
wrong; however, there is an associative
relationship: environmental temperature has an
effect on both variables.
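The ice cream example can be simulated with invented monthly figures in which temperature drives both series. The raw correlation between ice cream sales and heart attacks comes out strongly negative. Next, the straight-line effect of temperature is removed from each series and the leftover residuals are correlated. This gives a partial correlation, a technique not covered on the slides, and it is close to zero. All numbers and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for paired lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """What is left of y after removing its least-squares straight-line dependence on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Hypothetical monthly figures: temperature (deg C) drives BOTH series
temperature = [4, 5, 8, 11, 14, 17, 19, 18, 15, 11, 7, 5]
noise_1 = [3, -2, 1, -1, 2, -3, 0, 1, -2, 3, -1, -1]
noise_2 = [-2, 1, 3, -1, 0, 2, -3, 1, -1, 2, -2, 0]
ice_cream_sales = [20 + 10 * t + e for t, e in zip(temperature, noise_1)]
heart_attacks = [300 - 6 * t + e for t, e in zip(temperature, noise_2)]

raw = pearson_r(ice_cream_sales, heart_attacks)
partial = pearson_r(residuals(ice_cream_sales, temperature),
                    residuals(heart_attacks, temperature))
print(round(raw, 2))      # strongly negative
print(round(partial, 2))  # close to zero once temperature is accounted for
```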
47
Non-Causal Correlation
A strong positive correlation has been
demonstrated between change in number of storks
and human population in European Cities. But
does this prove that babies are delivered by
storks ?
Adapted from Mould, R. (1989) Introductory
Medical Statistics. Bristol: Adam Hilger, p170.
48
Which test?
  • type of data
  • paired or unpaired?
  • normally distributed?
  • associations or differences?
  • Plus..
  • Use flow charts
  • ask supervisor or statistician
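The checklist above is usually worked through with a flow chart. As a purely illustrative sketch, a few common textbook choices could be encoded as below. The function `suggest_test` and its categories are hypothetical, and the sketch covers only two-group comparisons and two-variable associations. It ignores other assumptions such as equal variances, so it does not replace a flow chart, your supervisor or a statistician.

```python
def suggest_test(level, goal, paired=False, normal=False):
    """Hypothetical, much-simplified 'which test?' helper.

    level:  "nominal", "ordinal" or "interval/ratio"
    goal:   "difference" (between two groups) or "association"
    paired: True if the same (or matched) subjects are measured twice
    normal: True if interval/ratio data are roughly normally distributed
    """
    parametric_ok = level == "interval/ratio" and normal
    if goal == "association":
        if level == "nominal":
            return "chi-squared test"
        return "Pearson correlation" if parametric_ok else "Spearman rank correlation"
    if goal == "difference":
        if level == "nominal":
            return "McNemar test" if paired else "chi-squared test"
        if parametric_ok:
            return "paired t-test" if paired else "unpaired (independent) t-test"
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"
    raise ValueError("goal must be 'difference' or 'association'")

print(suggest_test("interval/ratio", "difference", normal=True))
print(suggest_test("ordinal", "difference", paired=True))
```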

49
Questions
  • What stats measures have been used?
  • Are they appropriate and how do you know?
  • Are the stats presented appropriately and in a
    way that enables you to understand?
  • Are appropriate conclusions drawn?

50
Finally...
  • statistical significance does not necessarily
    mean clinical significance.
  • significance testing relies on probabilities, so
    it gives no definitive answer (there is a risk of
    wrongly accepting or rejecting the null hypothesis)
  • Make sure that conclusions you draw from your
    results are supported by statistical evidence
  • don't make more of the data than is there
  • a small sample size gives less power, so you may
    not get any statistically significant findings
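The link between sample size and power can be sketched with the usual normal approximation for comparing two group means at the 5% level. The effect size (difference in means divided by the SD) of 0.5 and the sample sizes are assumptions chosen for illustration.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def approx_power(effect_size, n_per_group, z_crit=1.96):
    """Approximate power of a two-sided, two-sample comparison of means at the 5% level.

    Normal approximation; ignores the tiny chance of a significant
    result in the wrong direction.
    """
    return normal_cdf(effect_size * math.sqrt(n_per_group / 2) - z_crit)

for n in (10, 30, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```

With 10 subjects per group, this approximation gives only about a 1 in 5 chance of detecting a real moderate difference. Roughly 64 per group are needed to reach the commonly used 80% power.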