Title: Quantitative Data Analysis
1Quantitative Data Analysis
- JN602
- Week 10
- Veal Ch. 13 and 14, SLT Chapter 11
2Objectives
- edit questionnaire and interview responses
- set up the coding key for the data set and code the data
- categorise data and create a data file
- use SPSS, Excel or other software programs for data entry and data analysis
- get a feel for the data using univariate analysis
- test the goodness of data
- statistically test each hypothesis using bivariate analysis
- interpret the computer results and prepare recommendations based on the quantitative data analysis
3Quantitative Data Analysis Process
- Data Preparation
- Data Cleaning
- Familiarisation: Frequencies, Means, Recoding
- Data Analysis: Crosstabs, Statistics
- Answering research questions
- Graphics
- Interpretation
- Discussion and recommendations
4Survey Analysis Overview
5Data Preparation
- Getting Data Ready for Analysis
- Editing data
- Handling blank responses
- Coding
- Categorising
- Entering data
- Cleaning
6Errors in the analysis process
- Recording errors
- Misreading of questionnaire
- Multiple responses
- Entry errors
- Deliberate
- Accidental: keying errors, misreading of responses
7Entering data
- Enter data from answer sheets directly into computer
- Enter raw data through any software programme, e.g. SPSS Data Editor, Excel, text programme
- Assign meaningful names to columns
- Save regularly
8Cleaning data
- Possible code cleaning
- Check that the distribution of the item is within the possible range of responses
- If possible, computer program should not permit invalid entries
- Contingency cleaning
- Cleaning based on prior responses
- E.g. males should not have responses regarding giving birth
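The two cleaning checks above can be sketched in a few lines of Python. This is only an illustration, not an SPSS procedure: the variable names (`sex`, `satisfaction`, `given_birth`), the coding key and the sample records are all hypothetical.

```python
# Hypothetical coding key: valid codes for each variable (assumed for illustration).
VALID_CODES = {
    "sex": {1, 2},                    # 1 = male, 2 = female
    "satisfaction": {1, 2, 3, 4, 5},  # five-point scale
    "given_birth": {0, 1, None},      # 0 = no, 1 = yes, None = not asked
}

def possible_code_errors(record):
    """Possible code cleaning: list variables whose value is outside the coding key."""
    return [var for var, valid in VALID_CODES.items() if record.get(var) not in valid]

def contingency_errors(record):
    """Contingency cleaning: flag answers that contradict an earlier response."""
    errors = []
    if record.get("sex") == 1 and record.get("given_birth") is not None:
        errors.append("given_birth answered by a male respondent")
    return errors

records = [
    {"id": 1, "sex": 2, "satisfaction": 4, "given_birth": 1},
    {"id": 2, "sex": 1, "satisfaction": 7, "given_birth": None},  # 7 is out of range
    {"id": 3, "sex": 1, "satisfaction": 3, "given_birth": 1},     # contradicts sex = 1
]

for rec in records:
    problems = possible_code_errors(rec) + contingency_errors(rec)
    if problems:
        print(rec["id"], problems)
```

Flagged cases would then be checked against the original questionnaire rather than corrected automatically.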
9Data entry: SPSS Variables specification
- For each variable in the questionnaire, specify:
- Name
- Type: numeric or string
- Width: max. no. of characters
- Decimal places
- Label: longer version of name
- Values
- Missing: blanks, no answer, etc. (see note)
- Columns in Data View
- Alignment: left, right, centre
- Measure/data type: nominal, ordinal, scale (see note)
10A note on Measure/Data type
- Nominal data: non-quantitative data; even if numerical codes are used, data cannot be added, multiplied etc.
- Ordinal data: ranks 1, 2, 3 etc. (first, second, third etc.)
- Scale data: fully numerical; can be added, multiplied, etc.
- The type of data has implications for types of analysis which can be undertaken for individual variables
11A note on Missing values
- If no value is entered for a variable (i.e. it is left blank), SPSS treats this as a Missing value
- Not included in percentages etc.
- You can specify other values as Missing
- e.g. 0 could be specified as No answer or Not applicable
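The effect of Missing values on percentages can be sketched as follows. The responses and codes are hypothetical, and the function only mimics the idea of a "valid per cent" column; it is not SPSS code.

```python
from collections import Counter

# Hypothetical responses to one question: 1 = Yes, 2 = No,
# 0 = "No answer" (user-defined missing), None = blank (system missing).
responses = [1, 2, 1, 0, None, 1, 2, 0, 1, 1]
MISSING = {0, None}

def valid_percentages(values, missing):
    """Percentage of each code, calculated on valid (non-missing) cases only."""
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    return {code: 100 * n / len(valid) for code, n in sorted(counts.items())}

print(valid_percentages(responses, MISSING))
```

Here three of the ten cases are treated as Missing, so the percentages are based on the seven valid cases only.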
12Introduction to SPSS
- SPSS uses two windows
- Variable View window
- Data View window
- User can toggle between the two windows using
the tabs at the bottom of the screen
13SPSS Variable View window
14SPSS Data View window
15Completed Variable View window
16Analysing Questionnaire Survey Data
- Types of research and approaches to analysis
- Starting an SPSS analysis session
- Analysis procedures
- Frequencies one variable
- Frequencies multiple variables
- Missing values
- Analysis procedures (continued)
- Checking for errors
- Multiple response
- Recode
- Means
- Attitude/Likert scales
- Crosstabulation
- Weighting
- Graphics
17Types of research and approaches to analysis
Research type | SPSS procedures
Descriptive research | Frequencies, Means, Graphics
Explanatory research | Crosstabs, Regression (Ch. 14), Graphics
Evaluative research | Comparisons using frequencies and means
18Starting an SPSS analysis session
- Click on SPSS icon to start session OR select START, then PROGRAMS, then SPSS
- Select file from recently used files dialog box or select MORE FILES and locate file, OR
- Select FILE from menu bar, then OPEN, select FILES OF TYPE SPSS (.sav), then locate your file.
- Variable View and Data View windows should appear.
19The statistics approach
- Concepts/terms/ideas used in statistics
- Forms of analysis
- Measures of central tendency and dispersion
- The idea of probabilistic statements
- The normal distribution
- Probabilistic statement formats
- Significance
- The null hypothesis
- Dependent and independent variables
20Forms of quantitative analysis
- Univariate - simplest form, describe a case in terms of a single variable.
- Bivariate - subgroup comparisons, describe a case in terms of two variables simultaneously.
- Multivariate - analysis of three or more variables simultaneously.
21Probabilistic statements
- It is only possible to estimate the probability that results obtained from a sample are true of the population; statements on findings are therefore probabilities.
Nature of statement | Unqualified format | Probabilistic format
Descriptive | 10 per cent of managers use Macs. | We can be 95 per cent confident that the proportion of managers who use Macs is between 9 and 11 per cent.
Comparative | 10 per cent of managers use Macs compared with 90 per cent who use PCs. | The proportion of PC users is significantly higher than the proportion of Mac users (at the 95 per cent level of probability).
Relational | People with high incomes use Macs more than people with low incomes. | There is a positive relationship between level of income and use of Mac computers (at the 95 per cent level of probability).
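The descriptive statement in the table can be illustrated with the usual normal-approximation confidence interval for a proportion. This is a sketch only: the sample size of 3,500 is a hypothetical figure chosen so that the interval comes out at roughly 9 to 11 per cent, and the function name `proportion_ci` is invented for the example.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Approximate confidence interval for a sample proportion (normal approximation).

    z = 1.96 corresponds to the 95 per cent level.
    """
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical sample size, chosen so the interval is roughly 9 to 11 per cent.
low, high = proportion_ci(p=0.10, n=3500)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

A smaller sample with the same 10 per cent result would give a wider interval, which is why the probabilistic format depends on sample size.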
22Basis of probabilistic statements
- Probability is based on the idea of drawing many random samples
- Most results would be close to the population value
- Some would be larger or smaller
- A few would be very much larger or smaller
- This distribution can be estimated using statistical theory
- See Figure 14.1: bell-shaped Normal distribution
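The idea of drawing many random samples can be simulated. The sketch below assumes a population in which 10 per cent of cases are "yes", draws 2,000 samples of 500, and counts how many sample results fall close to or far from the population value; all of these figures are assumptions made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

POPULATION_PROPORTION = 0.10  # assumed "true" population value
SAMPLE_SIZE = 500
N_SAMPLES = 2000

def sample_proportion(p, n):
    """Draw one random sample of size n and return the proportion of 'yes' cases."""
    return sum(random.random() < p for _ in range(n)) / n

results = [sample_proportion(POPULATION_PROPORTION, SAMPLE_SIZE) for _ in range(N_SAMPLES)]

# Share of samples close to, and far from, the population value
close = sum(abs(r - POPULATION_PROPORTION) <= 0.02 for r in results) / N_SAMPLES
far = sum(abs(r - POPULATION_PROPORTION) > 0.04 for r in results) / N_SAMPLES

print(f"mean of sample results: {statistics.mean(results):.3f}")
print(f"within 2 points of the population value: {close:.0%}")
print(f"more than 4 points away: {far:.1%}")
```

Plotting `results` as a histogram would give the bell shape referred to in Figure 14.1.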
23Fig. 14.1 Drawing repeated samples
24Probabilistic statement formats
- So far we have used 95% probability
- this is sometimes expressed as 5%
- and sometimes expressed as 0.05
- 99% probability is also used
- also expressed as 1% or 0.01
- 99.9% probability is occasionally used
- Also expressed as 0.1% or 0.001
- Note particularly in correlation and ANOVA output
25Significance
- A finding which is unlikely to have happened by chance (i.e. is highly probable) is described as significant
- Denoted by the probability of it occurring by chance (e.g. 0.05, 0.01, 0.001)
- The larger the sample the greater the likelihood that a finding will be significant
- But NB: small differences or weak relationships may not be socially or managerially significant even when they are statistically significant
26Univariate Analysis
- Describing a case in terms of the distribution of attributes that comprise it.
- Examples: course of study, sex, age
- Goals
- Provide reader with the fullest degree of detail regarding the data.
- Present data in a manageable form.
27Measures of central tendency and dispersion
- Central tendency
- The mean is the sum of scores in a distribution divided by the number of scores.
- The mode is the most frequent score in a distribution.
- The median is the mid-point or mid-score in a distribution
- Dispersion
- The range is the highest score in a distribution minus the lowest score in the same distribution.
- The variance is the mean of the squared deviation scores about the mean of a distribution.
- The standard deviation is the square root of the variance
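These definitions map directly onto Python's standard `statistics` module. The scores below are hypothetical. Note that `pvariance` and `pstdev` divide by the number of scores, as in the definition above; statistical packages such as SPSS normally report the sample versions, which divide by n - 1 and give slightly larger values.

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 7]  # hypothetical scores

mean = statistics.mean(scores)            # sum of scores / number of scores
mode = statistics.mode(scores)            # most frequent score
median = statistics.median(scores)        # mid-point of the ordered scores
value_range = max(scores) - min(scores)   # highest minus lowest score
variance = statistics.pvariance(scores)   # mean of squared deviations about the mean
std_dev = statistics.pstdev(scores)       # square root of the variance

print(mean, mode, median, value_range, variance, round(std_dev, 3))
```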
28Descriptive statistics
29Frequency tables
- For presentation of CATEGORICAL data
- Nominal or ordinal responses
- Eg. Day of week, sex
- Present the distribution of a small number of
categories
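A frequency table of this kind is a count of cases per category with percentages. A minimal sketch, using hypothetical day-of-week responses and an invented helper name `frequency_table`:

```python
from collections import Counter

# Hypothetical responses for a nominal variable (day of week).
days = ["Mon", "Tue", "Mon", "Fri", "Sat", "Mon", "Fri", "Sun", "Sat", "Mon"]

def frequency_table(values):
    """Return (category, count, per cent) rows, most frequent category first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

for category, count, percent in frequency_table(days):
    print(f"{category:<5}{count:>3}{percent:>7.1f}%")
```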
30Day of week
31Chart of frequencies
32Bivariate Analysis
- Describe a case in terms of two variables simultaneously.
- Aim is to test the relationship between the independent (explanatory) variable and the dependent variable
- Example
- Gender
- Amount of exercise
33Fig. 14.2 Dependent and independent variables
Does this look familiar?
34Null hypothesis
- Setting up two mutually incompatible hypotheses
- if one is true the other must be false
- The null hypothesis and the alternative hypothesis
- H0 Null hypothesis: there is no difference/relationship
- H1 Alternative hypothesis: there is a difference/relationship
35Fig. 14.3a What tests?
Task | Data format | No. of vars | Types of variable | Test
Relationship between two variables | Crosstabulation of frequencies/counts | 2 | Nominal | Chi-square
Difference between two means - paired | Means - whole sample | 2 | Two scale/ordinal | t-test - paired
Difference between two means - independent samples | Means - two sub-groups | 2 | 1. scale/ordinal (means) 2. nominal (2 groups) | t-test - independent samples
Relationship between two variables | Means - 3 or more sub-groups | 2 | 1. scale/ordinal (means) 2. nominal (3 or more groups) | One-way ANOVA
36Fig. 14.3b What tests?
Task | Data format | No. of vars | Types of variable | Test
Relationship between three or more variables | Means - crosstabulated | 3 | 1. scale/ordinal (means) 2. two or more nominal | Factorial ANOVA
Relationship between two variables | Individual measures | 2 | Scale or ordinal (2) | Correlation
Linear relationship between two vars | Individual measures | 2 | Scale or ordinal (2) | Linear regression
Linear relationship between 3 vars | Individual measures | 3 | Scale or ordinal (3) | Multiple regression
Relationships between large numbers of vars | Individual measures | Many | Large numbers of scale/ordinal vars | Factor/Cluster analysis
37Data file
- To demonstrate SPSS statistical procedures
- Data from student background survey
- Data from online diary survey
- PDA survey data available next week
38Chi-square
- Testing the relationship between two variables presented in a frequency crosstabulation.
- Null/alternative hypotheses
- H0 - there is no relationship between exercise activity and gender in the population
- H1 - there is a relationship between exercise activity and gender in the population.
- SPSS - procedures p. 260 - Figure 14.4
39Fig. 14.6 Chi-square distribution
40Interpreting Chi-square output - 1
- Degrees of freedom
- (Number of rows - 1) x (Number of columns - 1)
- Expected counts rule
- Expected count = cell frequency if there was no relationship at all between the variables
- Should be no more than one fifth of cells with expected counts of less than 5
- Should be no cells with expected count of less than 1
- If rule is violated try combining rows or columns
- Presentation of Chi-square results: See Fig. 14.7
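The degrees of freedom, the expected counts and the expected counts rule can be worked through by hand. The sketch below uses a hypothetical 3 x 2 crosstab (exercise level by gender), not the survey data shown in the figures, and the function names are invented for the example. The expected count for a cell is taken as row total x column total / grand total, which is the standard calculation.

```python
def chi_square(observed):
    """Chi-square statistic, degrees of freedom and expected counts for a crosstab.

    `observed` is a list of rows, each a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    # Expected count = row total x column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, expected

def expected_counts_ok(expected):
    """Apply the rule: at most one fifth of cells below 5, and no cell below 1."""
    cells = [e for row in expected for e in row]
    below_5 = sum(e < 5 for e in cells)
    return below_5 <= len(cells) / 5 and min(cells) >= 1

# Hypothetical crosstab: rows = exercise level (low, medium, high), columns = gender.
observed = [[20, 30], [25, 25], [35, 15]]
statistic, df, expected = chi_square(observed)
print(round(statistic, 3), df, expected_counts_ok(expected))
```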
41Interpreting Chi-square output - 2
- Value of chi-square
- If value is in the 5% zone (i.e. probability is less than .05) it is an unlikely value and Null Hypothesis is rejected.
- Value is 6.588 and probability is 0.037 or 3.7%, so Null Hypothesis is rejected
- there is a significant difference in enrolment pattern between men and women.
- Presentation of Chi-square results: See Fig. 14.7
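The reported probability can be checked by hand if the table has 2 degrees of freedom. That is an assumption: the slide does not state the degrees of freedom, but it reproduces the reported figure. For 2 degrees of freedom the probability of a chi-square value at least as large as x is exp(-x / 2).

```python
import math

def chi_square_p_value_df2(statistic):
    """Probability of a chi-square value this large or larger when df = 2.

    For 2 degrees of freedom the upper-tail probability is exp(-x / 2).
    Other degrees of freedom need statistical tables or software.
    """
    return math.exp(-statistic / 2)

p = chi_square_p_value_df2(6.588)
decision = "reject H0" if p < 0.05 else "accept H0"
print(f"p = {p:.3f}: {decision}")
```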
42Chi-squared output
43Comparing two means: t-test
- Situation 1: two variables applying to all members of the sample
- E.g. compare time spent on exercise and time spent on study
- Paired samples t-test
- Situation 2: sample is divided in two
- E.g. compare average happiness levels in different activities
- Independent samples t-test
44Compare two means: t-test, t distribution
45Compare 2 means: Independent samples t-test
46Compare 2 means: Independent samples t-test
- Reading t-tests
- Example 1: Enjoyment and happiness by activity
- Happiness in class: mean 2.53; at work: 2.73
- H0 Null hypothesis: there is no difference between these two
- t value: -0.712, Probability: 0.478 (which is > 0.05)
- Accept the null hypothesis: there is no significant difference
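The t value in such output comes from the difference between the two means divided by its standard error. A minimal sketch, using the pooled-variance (equal variances assumed) form and hypothetical happiness scores; these are not the survey data behind the figures above, and the function name is invented for the example.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """t statistic and degrees of freedom for two independent samples.

    Uses the pooled-variance (equal variances assumed) form.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df

# Hypothetical happiness scores (not the survey data behind the slide).
in_class = [2, 3, 2, 3, 2, 3]
at_work = [3, 2, 3, 3, 2, 4]

t, df = independent_t(in_class, at_work)

# Two-tailed 5% critical value for 10 degrees of freedom, from standard t tables.
CRITICAL_T_DF10 = 2.228
significant = abs(t) > CRITICAL_T_DF10
print(f"t = {t:.3f}, df = {df}, significant at 5%: {significant}")
```

SPSS reports the exact probability instead of a comparison with a table value, but the decision is the same: the null hypothesis is rejected only if the probability is below 0.05.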
47Compare 3 means: One-way Analysis of Variance (ANOVA)
- Comparing a range of means: see Fig. 14.11
- SPSS: see procedure pp. 243-44
- H0 Null hypothesis: each of the group means is equal to the overall mean
- H1 Alternative hypothesis: there is a difference between group means
48One-way Analysis of Variance (ANOVA)
49One-way Analysis of Variance (ANOVA): the idea of Variance
50One-way Analysis of Variance (ANOVA)
- SPSS - procedure p. 271, see Fig. 14.13
51One-way Analysis of Variance (ANOVA)
- Reading Fig. 14.13
- Example 1: exp. on books x course
- Means as shown in Fig. 14.11
- H0 Null Hypothesis: means are not different from the overall mean
- F-ratio value: 1.231, Probability: 0.301 (> 0.05)
- Accept null hypothesis: means are not significantly different
- Example 2: income x course
- H0 Null Hypothesis: means are not different from the overall mean
- F-ratio value: 3.607, Probability: 0.035 (< 0.05)
- Reject null hypothesis: the means are significantly different
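The F-ratio compares the variance between the group means with the variance within the groups. A minimal sketch with hypothetical data for three groups of three cases (not the data behind Fig. 14.13); the function name is invented for the example.

```python
import statistics

def one_way_anova_f(groups):
    """F-ratio for a one-way ANOVA: between-groups variance / within-groups variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)

    # Sums of squared deviations: group means about the overall mean,
    # and individual scores about their own group mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical scores for three courses.
groups = [[4, 5, 6], [5, 6, 7], [7, 8, 9]]
f_ratio, df_between, df_within = one_way_anova_f(groups)

# 5% critical value of F for 2 and 6 degrees of freedom, from standard F tables.
CRITICAL_F_2_6 = 5.14
print(round(f_ratio, 3), df_between, df_within, f_ratio > CRITICAL_F_2_6)
```

In this made-up example the F-ratio exceeds the table value, so the null hypothesis would be rejected, as in Example 2 above.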
52Correlation
- Correlation measures the relationship between two scale/ordinal variables
- The correlation coefficient (r) ranges from -1 to 1
- See Fig. 14.16
- Based on summing the (squared) distances of observations from the mean: see Fig. 14.17
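The calculation of r from distances about the means can be sketched as below. This is the standard Pearson formula written out step by step; the age and income figures are hypothetical and the function name is invented for the example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient, built from deviations about each mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]   # distances of x observations from their mean
    dy = [yi - mean_y for yi in y]   # distances of y observations from their mean
    sum_products = sum(a * b for a, b in zip(dx, dy))
    sum_sq_x = sum(a ** 2 for a in dx)
    sum_sq_y = sum(b ** 2 for b in dy)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)

# Hypothetical data: age in years and income in thousands.
age = [22, 25, 31, 38, 45, 52]
income = [18, 24, 30, 41, 47, 60]

print(round(pearson_r(age, income), 3))          # strong positive relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))     # perfect negative relationship
```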
53Correlation: fairly strong positive (Fig. 14.16a)
54Correlation: strong negative (Fig. 14.16b)
55Correlation: (almost) zero (Fig. 14.16c)
56Correlation: very strong positive (Fig. 14.16d)
58Correlation matrix Fig. 14.18
- SPSS - procedure p. 276, see Fig. 14.18
- Reading Fig. 14.18
- Correlation between Income, Age, Attendance at prof. conferences, Exp. on books and Use of Internet
- Null hypothesis H0 for each pair: correlation is zero
- E.g. Income vs Age: r = 0.917, Sig. 0.000 (< 0.05), reject null hypothesis: correlation is positive
- E.g. Income vs use of Internet: r = 0.049, Sig. 0.735 (> 0.05), accept null hypothesis: correlation is not significantly different from zero
59Conclusion
- This gives us a basic understanding of how to conduct data analysis
- Basis for making initial decisions regarding further investigation
- More complex analysis required if you need to make business decisions