Title: Data Analysis
2. Levels of Measurement
- Nominal: categorical, with no implied ranking among the categories. Also includes written observations and written responses from qualitative interviews or open-ended survey questions.
- Ordinal: categorical data with implied rankings, or data obtained by having respondents rank categories. In some cases, a ranking process may be set up for a particular variable.
- Interval: numerical, not categorical, with no fixed zero point. The rank order among values is explicit, with an equal distance between points in the data set (e.g., -2, -1, 0, 1, 2).
- Ratio: has a fixed zero point; otherwise the same as interval.
3. In general, the type of data can be inferred using the following criteria:
- Any categorical data is either nominal or ordinal.
- All qualitative data is nominal.
- All scores on standardized scales are either interval or ratio. (Note: almost all the scales we use in social work, except IQ scores, are ratio.)
- The level of measurement determines which statistical methods we can use.
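As a rough illustration of how the level of measurement constrains analysis, the sketch below encodes the conventional textbook guidance on which summary statistics are meaningful at each level. Each level inherits the statistics of the levels below it. The `Level` enum and the `permitted_summaries` function are hypothetical names used only for illustration. Real choices also depend on sample size and on how the data are distributed.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def permitted_summaries(level: Level) -> list[str]:
    """Return summary statistics conventionally treated as meaningful at a level.

    Each level inherits the statistics of the levels below it.
    """
    summaries = ["mode", "frequencies"]  # meaningful at every level
    if level >= Level.ORDINAL:
        summaries += ["median", "percentiles"]  # require a rank order
    if level >= Level.INTERVAL:
        summaries += ["mean", "standard deviation"]  # require equal distances
    if level == Level.RATIO:
        summaries += ["ratios between values"]  # require a fixed zero point
    return summaries


for lvl in Level:
    print(lvl.name, permitted_summaries(lvl))
```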
4. In some cases, we can convert a variable into another level of measurement.
- We can change a variable from ratio to either ordinal or nominal.
5. Converting Data (Use Recode in SPSS)
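In SPSS this conversion is done with Recode. The Python function below is a minimal sketch of the same idea outside SPSS: it collapses a ratio-level age variable into ordinal age-group codes. The cut points, the `AGE_GROUP_LABELS` value labels, and the `recode_age` name are hypothetical and were chosen only for illustration.

```python
# Hypothetical value labels for an ordinal age-group variable.
AGE_GROUP_LABELS = {1: "Under 18", 2: "18 to 34", 3: "35 to 64", 4: "65 and over"}


def recode_age(age: float) -> int:
    """Recode a ratio-level age (in years) into an ordinal age-group code.

    The cut points are illustrative assumptions, not a standard.
    """
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 18:
        return 1
    if age < 35:
        return 2
    if age < 65:
        return 3
    return 4


ages = [16, 22, 34, 35, 50, 70]  # hypothetical ratio-level data
age_groups = [recode_age(a) for a in ages]
print([AGE_GROUP_LABELS[g] for g in age_groups])
# The recode is one-way: code 2 says only that a person is 18 to 34,
# so the original ages cannot be recovered from the new variable.
```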
6. Advantages of using ratio data
- We can convert it to another level of measurement; we can't do this with nominal data.
- People can simply write down information about how they fit a particular attribute (e.g., age, income).
- We have more statistical options with ratio data. The inferential statistics covered here require that the dependent variable be ratio level.
7. Primary types of data analysis are:
- Qualitative
- Descriptive (used to describe the distribution of a single variable or the relationship between two nominal variables; e.g., means, frequencies, cross-tabulations)
- Inferential (used to establish relationships among variables; assumes random sampling and a normal distribution)
- Nonparametric (used to establish relationships among variables in small samples or in data sets that are not normally distributed)
8. Much of what you will use in your research will be descriptive statistics.
- For example, the most basic type of descriptive statistic is the frequency: the number of times a specific value, or data within a specific category, occurs.
- Most often we convert frequencies to percentages. The formula is (f/n) × 100, where f = frequency and n = the total number of values in the data set. For example, if the age 25 occurs 5 times in a data set of 50, then 5/50 = .10, or 10%.
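The percentage formula can be checked with a short sketch. It counts each value in a list and converts each count to (f/n) × 100. The data set is hypothetical and was built so that the age 25 occurs 5 times among 50 values, as in the example. The function name `frequencies_and_percentages` is illustrative.

```python
from collections import Counter


def frequencies_and_percentages(values):
    """Return {value: (f, percent)}, where percent = (f / n) * 100."""
    n = len(values)
    if n == 0:
        raise ValueError("cannot compute percentages for an empty data set")
    counts = Counter(values)
    # Multiply before dividing so simple cases such as 5/50 give exactly 10.0.
    return {value: (f, 100 * f / n) for value, f in counts.items()}


# Hypothetical data set of 50 ages in which 25 occurs 5 times.
ages = [25] * 5 + [30] * 20 + [40] * 25
table = frequencies_and_percentages(ages)
print(table[25])  # (5, 10.0)
```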
9. Examples of use of frequency data
- 40% of respondents are male.
- The mean level of income was $35,000.
- 40% of all female voters cast their vote for Arnold, compared to 52% of the male voters.
- Note: the other descriptive statistic we commonly use is the standard deviation. It describes the degree to which data points vary from the mean of a distribution. In a research article, you will usually see the standard deviation reported with the mean.
10. Application of Standard Deviation (SD)
- Mean income was $35,000 with SD = $5,000.
- M = $23,000, SD = $500.
- This is interpreted as there being less variability in income among members of the second data set; that is, scores are grouped more tightly around the mean.
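To see where such summaries come from, the sketch below computes the mean and the sample standard deviation (n - 1 denominator) for two small hypothetical income data sets. The values were picked so that they reproduce the two summaries on this slide. The three-value data sets and the `mean_and_sd` helper are illustrative assumptions, not real data.

```python
import statistics


def mean_and_sd(values):
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical incomes chosen to match the two summaries on the slide.
income_a = [30_000, 35_000, 40_000]
income_b = [22_500, 23_000, 23_500]

for label, data in [("A", income_a), ("B", income_b)]:
    m, sd = mean_and_sd(data)
    print(f"Data set {label}: M = ${m:,.0f}, SD = ${sd:,.0f}")
```

The smaller SD for data set B shows up directly in the data: every value is within $500 of its mean.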
11. Normal Distribution
- Mean = median = mode.
- Bell-shaped curve.
- 50% of scores fall below and 50% fall above the mean.
- A data set can be assessed in terms of how much data falls within one, two, or three standard deviations of the mean (about 68%, 95%, and 99.7%, respectively, for a normal distribution).
- A normal distribution is unimodal; distributions that are bimodal or trimodal are not normal.
- Theoretically, at least, inferential statistics may only be used when a set of scores conforms to a normal distribution. However, this assumption is often violated.
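One way to do the "within one, two, or three standard deviations" check is to compare a data set against the theoretical normal proportions. For a normal distribution, the share within k SDs of the mean equals erf(k/√2). The sketch below computes that value and also the observed share in a sample. Both function names are hypothetical. This is only an informal check and is not a formal test of normality.

```python
import math
import statistics


def normal_share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


def observed_share_within(values, k):
    """Proportion of a data set lying within k sample SDs of its mean."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = sum(1 for v in values if abs(v - m) <= k * sd)
    return inside / len(values)


for k in (1, 2, 3):
    print(k, round(normal_share_within(k), 4))
```

If a data set's observed shares are far from the theoretical ones, that is one informal sign it may not be normally distributed.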
12. Frequencies are used in almost all types of data analysis. Frequency tables can be formatted in a variety of ways. (Some analyses add valid percent and cumulative percent.)
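The sketch below builds a frequency table with percent, valid percent, and cumulative percent columns. It makes two assumptions: missing responses are coded as `None`, and cumulative percent is computed from valid (non-missing) cases. SPSS output is laid out this way as far as we know, but check your own printout. The `frequency_table` name and the responses are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Build rows of (value, frequency, percent, valid percent, cumulative percent).

    Missing responses are assumed to be coded as None. Percent uses all cases;
    valid and cumulative percent use only non-missing cases.
    """
    n = len(values)
    valid = [v for v in values if v is not None]
    valid_n = len(valid)
    if valid_n == 0:
        raise ValueError("no valid (non-missing) values")
    counts = Counter(valid)
    rows = []
    cumulative_f = 0
    for value in sorted(counts):
        f = counts[value]
        cumulative_f += f
        rows.append((value, f, 100 * f / n, 100 * f / valid_n, 100 * cumulative_f / valid_n))
    return rows


# Hypothetical responses to a 4-point item; two respondents skipped it.
responses = [1, 2, 2, 3, 3, 3, 4, 4, None, None]
for row in frequency_table(responses):
    print(row)
```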
13. We can also use tables to determine whether there is a relationship between two nominal variables, although we cannot assess the strength of the relationship. This is called a cross-tabulation.
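A cross-tabulation counts how often each combination of categories occurs. The sketch below builds one from two hypothetical nominal variables and converts each row to row percentages. This is the kind of comparison behind statements like "40% of female voters vs. 52% of male voters." The data and the `crosstab` and `row_percentages` names are illustrative assumptions.

```python
from collections import Counter


def crosstab(row_var, col_var):
    """Count how often each combination of two nominal variables occurs."""
    if len(row_var) != len(col_var):
        raise ValueError("both variables must have one value per case")
    counts = Counter(zip(row_var, col_var))
    row_levels = sorted(set(row_var))
    col_levels = sorted(set(col_var))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for r, cells in table.items():
        total = sum(cells.values())
        result[r] = {c: 100 * f / total for c, f in cells.items()}
    return result


# Hypothetical survey data: gender and candidate choice for five voters.
gender = ["F", "F", "F", "M", "M"]
vote = ["A", "B", "A", "A", "B"]
counts = crosstab(gender, vote)
print(counts)
print(row_percentages(counts))
```

The table shows whether the pattern differs across groups. Whether that difference is statistically significant is a separate question, addressed by chi-square.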
14. Categories in both qualitative analysis and cross-tabulation must be:
- Mutually exclusive (no overlap)
- Exhaustive (all possible categories should be included)
15. Cross-tabulation is the basis for chi-square. Chi-square:
- Tests whether the relationship between the two variables in the table is statistically significant.
- Is a nonparametric test: it does not require a normal distribution, but it is often grouped with inferential statistics.
- Usually requires a random sample, although data collected from everyone in a population group are usually considered sufficient for a chi-square analysis.
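The sketch below computes the uncorrected Pearson chi-square statistic for a table of counts. For each cell, it compares the observed count with the expected count, which is row total × column total / N. It does not apply the Yates continuity correction. The p-value helper covers only tables with 1 degree of freedom (2 × 2 tables). It relies on the fact that a chi-square variable with 1 df is the square of a standard normal variable. For larger tables, look p up in a table or read it from SPSS. The table values and function names are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("every row and column needs at least one case")
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail p value for a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical 2 x 2 table: rows = group, columns = outcome.
observed = [[10, 20], [20, 10]]
stat, df = chi_square(observed)
print(round(stat, 3), df, round(p_value_df1(stat), 4))
```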
16. Means can also be used to make comparisons among groups.
17. You may use means in your project:
- If your variables include ratio data
- If you want to compare groups on a ratio variable
- If you want to summarize scores on a standardized instrument or a Likert scale
18. Some inferential statistics look at the relationship between mean scores on ratio-level variables and membership in a particular demographic group:
- T-tests (two-group comparisons)
- Analysis of variance (compares three or more groups)
- Both answer the question: Is the difference in means between the two (or more) groups large enough to be statistically significant?
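For two groups, the t statistic is the difference in means divided by its standard error. The sketch below uses the pooled-variance (equal variances assumed) form, with df = n1 + n2 - 2. It returns t and df but not p, because p requires the t distribution. As noted elsewhere in these slides, p comes from a table or from the SPSS printout. The scores and the `independent_t` name are hypothetical.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Pooled-variance (equal variances assumed) independent-samples t and its df."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("t is undefined when neither group varies")
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df


# Hypothetical scale scores for two groups.
treatment = [4, 5, 6]
comparison = [1, 2, 3]
print(independent_t(treatment, comparison))
```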
19. We also use correlations to measure the strength of a relationship between two variables. Correlations can only be used:
- To assess the strength of the relationship between two ratio-level variables.
- To measure associations rather than cause-and-effect relationships.
- With data sets in which there are 30 or more observations.
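The Pearson correlation r is the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The minimal sketch below follows that definition. The `pearson_r` name and the example scores are hypothetical. The example uses only five pairs to keep it short, whereas the guideline above calls for 30 or more.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores on two scales for the same five respondents.
scale_x = [1, 2, 3, 4, 5]
scale_y = [2, 4, 6, 8, 10]
print(pearson_r(scale_x, scale_y))
```

In Python 3.10 and later, `statistics.correlation(x, y)` computes the same Pearson coefficient.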
20. Inferential statistics commonly used include:
- Independent t-test (compares two groups on one variable) (test statistic: t)
- Paired-samples t-test (compares ratio-level scores on pretest and posttest data) (test statistic: t)
- ANOVA (compares three or more groups on ratio data) (test statistic: F)
- Correlation (measures the association between two ratio-level variables) (test statistic: r)
- Regression analysis (a ratio-level dependent variable; can include more than one independent variable, and the independent variables can be a combination of ratio, ordinal, and nominal data) (test statistics: R², F, or partial correlation coefficients)
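For ANOVA, F is the ratio of the between-groups mean square to the within-groups mean square. The sketch below computes F and its two degrees of freedom (k - 1 and N - k) for a one-way design. As with the t-test, the p value would come from an F table or from SPSS. The group scores and the `one_way_anova_f` name are hypothetical.

```python
import statistics


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more scores than groups")
    group_means = [statistics.mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between groups: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: how far each score is from its own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three groups.
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```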
21. Inferential statistics require that we assess the probability that a relationship as strong as the one we observed between two variables would occur by chance if no relationship actually existed.
- State the research and null hypotheses.
- State the degree to which we will risk being wrong about whether or not a relationship actually exists between two variables (the level of significance, usually .10 or lower).
- Choose an appropriate statistical test and compute it.
- Compare the probability level (p value) on your computer printout to the level of significance. If the p value is lower than the level of significance, reject the null hypothesis. If the p value is higher than the level of significance, fail to reject (retain) the null hypothesis.
22. For example:
- There is a positive relationship between scores on the self-esteem scale and depression. Level of significance is .05. r = .75, p = .01. Reject the null hypothesis; the research hypothesis is supported.
- Women will have higher test scores than men. Level of significance is .10. t = .30, p = .60. Fail to reject (retain) the null hypothesis; the research hypothesis is not supported.
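The decision rule in these two slides comes down to a single comparison between p and the chosen level of significance (alpha). The sketch below applies it to both examples. The `decide` name is hypothetical. A p value exactly equal to alpha is treated here as not significant, which follows the slide's "lower than" wording; conventions for ties vary.

```python
def decide(p_value, alpha):
    """Reject the null hypothesis only when p is lower than the level of significance."""
    if not 0 < alpha < 1:
        raise ValueError("alpha (the level of significance) must be between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject (retain) the null hypothesis"


# The two examples: r = .75, p = .01 at alpha = .05; t = .30, p = .60 at alpha = .10.
print(decide(0.01, 0.05))
print(decide(0.60, 0.10))
```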
23. Other info
- Chi-square is interpreted in the same way as inferential statistics.
- Most statistics books contain tables that let you determine p values if you calculate test statistics by hand.
- SPSS printouts always contain p values for inferential statistics.
- Theoretical assumptions are often violated in research articles.
- Sample size strongly affects whether a relationship between two or more variables is large enough to be statistically significant.
- Relationships between two variables can be either positive or negative. Strong positive relationships are close to +1.00, and strong negative relationships are close to -1.00.