Quantitative Data Analysis: Univariate (cont - PowerPoint PPT Presentation


Provided by: sfuCacmn


Transcript and Presenter's Notes

1
Quantitative Data Analysis Univariate (contd)
Bivariate Statistics

Neuman and Robson Chapter 11.

Research Data Library at SFU: http://www.sfu.ca/rdl/
2
Class Session Activities

Quiz 2
More on Univariate Statistics
Begin Bivariate Statistics
If time:
Hans Rosling on Using Empirical Research to
Understand World Change
http://www.youtube.com/watch?v=hVimVzgtD6w
Hans Rosling: Let my data set change your mind
http://www.youtube.com/watch?v=KVhWqwnZ1eM&feature=related

3
Recall Univariate Statistics

Frequency distributions: explore each variable in
a data set separately to see the pattern of
responses
Measures of central tendency of the values (mean,
median, mode)
Measures of variation or dispersion (range,
percentile, standard deviation, z-scores)
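As a quick refresher, the measures listed above can be computed with Python's standard library. This is only a sketch: the ratings are invented for illustration (they are not the course data), and the population form of the standard deviation is an assumption.

```python
import statistics

# Hypothetical severity-of-injury ratings for ten cases (not the course data).
ratings = [1, 2, 2, 3, 3, 3, 4, 5, 6, 9]

# Measures of central tendency
mean = statistics.mean(ratings)
median = statistics.median(ratings)
mode = statistics.mode(ratings)

# Measures of variation
value_range = max(ratings) - min(ratings)
std_dev = statistics.pstdev(ratings)  # population standard deviation (assumed)

# A z-score expresses each value as its distance from the mean,
# measured in standard deviations.
z_scores = [(x - mean) / std_dev for x in ratings]

print(mean, median, mode, value_range, round(std_dev, 2))
```

If the ten cases are treated as a sample rather than a whole population, statistics.stdev would be the function to use instead.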

4
Studying Frequency Distributions

Raw data: Obtain a printout of the raw data
for all the variables.
The printout resembles a matrix, with the variable
names heading the columns and the information for
each case or record displayed across the rows.
Source (for the next examples):
http://www.csulb.edu/~msaintg/ppa696/696uni.htm

5
Example: Raw data for a study of injuries among
county workers (first 10 cases)

Raw data is difficult to grasp, especially with
a large number of cases or records.

6
To present the information in a more organized
format, start with univariate descriptive
statistics for each variable. Example: the
variable Severity of Injury
7
Frequency Distribution for Severity of Injury

Obtain a frequency distribution of the data for
the variable:
Identify the lowest and highest values of the
variable.
Put all the values of the variable in order from
lowest to highest.
Count the number of appearances of each value of
the variable. This is a count of the frequency
with which each value occurs in the data set.
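The counting procedure above might be sketched as follows. The ratings are hypothetical (the slide's own table is not reproduced in this transcript); they are chosen so that no case has a rating of 7 or 8.

```python
from collections import Counter

# Hypothetical severity ratings for ten injury cases (illustration only).
ratings = [3, 1, 5, 2, 3, 6, 2, 3, 4, 9]

# Identify the lowest and highest values of the variable.
lowest, highest = min(ratings), max(ratings)

# Count how often each value appears.
counts = Counter(ratings)

# List every value from lowest to highest with its frequency.
# A Counter returns 0 for values that never occur (here 7 and 8).
frequency_table = [(value, counts[value]) for value in range(lowest, highest + 1)]

for value, frequency in frequency_table:
    print(f"{value:>8} {frequency:>9}")
```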

8
Grouped Data

Decide whether the data should be grouped into
classes.
Example: The severity of injury ratings can be
collapsed into just a few categories or groups.
Grouped data usually has from 3 to 7 groups.
There should be no groups with a frequency of
zero (in this example, there are no injuries with
a severity rating of 7 or 8).
Ways to construct groups:
Equal class intervals (e.g., 1-3, 4-6, 7-9).
Approximately equal numbers of observations in
each group.
Remember that class intervals must be both
mutually exclusive and exhaustive.
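A minimal sketch of grouping with equal class intervals, using the 1-3, 4-6, 7-9 intervals mentioned above. The ratings and the helper name group_frequencies are hypothetical. The helper rejects any value that falls into no interval or into more than one, which is a simple way to check that the intervals are exhaustive and mutually exclusive for the data at hand.

```python
# Hypothetical severity ratings (illustration only).
ratings = [3, 1, 5, 2, 3, 6, 2, 3, 4, 9]

# Equal class intervals, each given as (lower bound, upper bound), inclusive.
class_intervals = [(1, 3), (4, 6), (7, 9)]

def group_frequencies(values, intervals):
    """Count how many values fall into each class interval."""
    table = {interval: 0 for interval in intervals}
    for value in values:
        matches = [iv for iv in intervals if iv[0] <= value <= iv[1]]
        if len(matches) != 1:
            # Zero matches: the intervals are not exhaustive.
            # Several matches: the intervals are not mutually exclusive.
            raise ValueError(f"value {value} matches {len(matches)} intervals")
        table[matches[0]] += 1
    return table

grouped = group_frequencies(ratings, class_intervals)
for (low, high), frequency in grouped.items():
    print(f"{low}-{high}: {frequency}")
```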

9
Caution Grouping Response Categories

To make new categories
Facilitate analysis of trends
But decisions have effects on the interpretation
of patterns

10
Cumulative Frequency Distributions

Include a third column in the table showing the
running total of the frequencies (this can be
done with either simple frequency distributions
or grouped data).
How many injuries were at level 5 or lower?
Answer: 7
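The cumulative column is simply a running total of the frequency column. The sketch below uses hypothetical ratings, so its count for "level 5 or lower" will not match the answer from the slide's own table, which is not reproduced in this transcript.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical severity ratings (illustration only).
ratings = [3, 1, 5, 2, 3, 6, 2, 3, 4, 9]

counts = Counter(ratings)
values = sorted(counts)                      # observed values, lowest to highest
frequencies = [counts[v] for v in values]    # second column
cumulative = list(accumulate(frequencies))   # third column: running total

cumulative_by_value = dict(zip(values, cumulative))

# Number of injuries at level 5 or lower in this hypothetical data set.
print(cumulative_by_value[5])
```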

11
Percentaged Frequency Distributions

Frequencies can also be presented in the form
of percentage distributions and cumulative
percentages.
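Converting frequencies to percentages means dividing each count by the total number of cases and multiplying by 100. A short sketch with hypothetical ratings:

```python
from collections import Counter

# Hypothetical severity ratings (illustration only).
ratings = [3, 1, 5, 2, 3, 6, 2, 3, 4, 9]

counts = Counter(ratings)
total = len(ratings)

rows = []
running = 0
for value in sorted(counts):
    running += counts[value]
    percent = 100 * counts[value] / total
    cumulative_percent = 100 * running / total
    rows.append((value, counts[value], percent, cumulative_percent))

print("value  freq  percent  cum. percent")
for value, freq, percent, cumulative_percent in rows:
    print(f"{value:>5} {freq:>5} {percent:>8.1f} {cumulative_percent:>13.1f}")
```

The cumulative percentage in the last row should always come to 100.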

12
Why Graph?

way of visually presenting data
present the data
summarize the data
enhance textual descriptions
describe and explore the data
make comparisons easy
avoid distortion
provoke thought about the data

13
Bar Graphs (Bar Charts)

Used to display frequency distributions for
variables measured at the nominal and ordinal
levels.
Use the same width for all the bars, with space
between bars.
Label the parts of the graph, including the
title, the vertical (Y) axis, the horizontal (X)
axis, and the bar labels.

14
Another Bar Graph
15
Histograms

for interval and ratio level variables
width of the bar is important, since it is the
total area under the bar that represents the
proportion of the phenomenon accounted for by
each category
bars convey the relationship of one group or
class of the variable to the other(s).

16
Histogram example

In the case of the county employee injuries,
we might have information on the rate of injury
according to the number of workers in each county
in State X.

17
Grouping Categories (Histograms)

If we group injury rates into three groups:
Low rate of injury would be 0.0-1.9 injuries per
1,000 workers
Moderate would be 2.0-3.9
High would be 4.0 and above (in this case, up to
5.9).
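The three groups above could be applied to county injury rates as in this sketch. The county names, the rates, and the function name rate_group are hypothetical; the cut points are the ones given on the slide.

```python
from collections import Counter

def rate_group(rate):
    """Classify an injury rate (injuries per 1,000 workers)."""
    if rate < 0:
        raise ValueError("an injury rate cannot be negative")
    if rate < 2.0:
        return "low"        # 0.0-1.9
    if rate < 4.0:
        return "moderate"   # 2.0-3.9
    return "high"           # 4.0 and above

# Hypothetical rates for six counties in State X (illustration only).
county_rates = {
    "County A": 0.8, "County B": 1.9, "County C": 2.0,
    "County D": 3.4, "County E": 4.0, "County F": 5.9,
}

groups = Counter(rate_group(rate) for rate in county_rates.values())
print(groups["low"], groups["moderate"], groups["high"])
```

Using "less than 2.0" and "less than 4.0" as cut points keeps the groups mutually exclusive even for rates recorded with more than one decimal place.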

18
Frequency Polygon

another way of displaying information for an
interval or ratio level variable.
also used to show time series graphs, or the
changes in rates over time.

19
Graph of Frequency Distribution (Univariate)
20
Pie Chart

Another way to show the relationships between
classes or categories
each "slice" represents the proportion of the
total phenomenon that is due to each of the
classes or groups.

21
Another visual representation of a distribution:
pie charts
22
Bivariate Statistics (relations between 2
variables)

After examining the univariate frequency
distribution of the values of each variable
separately, we study the joint occurrence of the
values of the independent and dependent variables
together.
The joint distribution of two variables is called
a bivariate distribution.

23
Contingency Tables (Cross-tabulations)

A contingency table shows the frequency
distribution of the values of the dependent
variable, given the occurrence of the values of
the independent variable.
Both variables must be grouped into a finite
number of categories (usually no more than 2 or 3
categories), such as low, medium, or high;
positive, neutral, or negative; male or female;
etc.

24
Features of Contingency Table

Title
Categories of the Independent Variable head the
tops of the columns
Categories of the Dependent Variable label the
rows
Order categories of the two variables from lowest
to highest (from left to right across the
columns, and from top to bottom along the rows).
This is the usual practice, but not always.
Show totals at the foot of the columns

25
Basic Terminology (Tables)

Parts of a Table
title (conventions)
Order of naming of variables
Dependent, independent, control
body, cell, column, row
marginals
sources, date

26
Bivariate Statistics Parts of the Table
27
Constructing a Contingency Table

1. If the variables are not divided into
categories, decide on how to group the data.
2. Obtain a frequency distribution for the values
of the independent variable.
3. Obtain a frequency distribution for the values
of the dependent variable.
4. Obtain the frequency distribution of the values
of the dependent variable, given the values of
the independent variable (either by tabulating
the raw data or from a computer program).
5. Display the results of step 4 in a table.
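The construction steps above might look like this when tabulating raw data by program. The nine respondents are invented for illustration; the variable names follow the consolidation example used in the next slides.

```python
from collections import Counter

# Hypothetical raw data: one (area of residence, attitude) pair per respondent.
responses = [
    ("City", "For"), ("City", "For"), ("City", "For"),
    ("City", "Against"), ("City", "No opinion"),
    ("Non-city", "For"), ("Non-city", "Against"),
    ("Non-city", "Against"), ("Non-city", "No opinion"),
]

columns = ["City", "Non-city"]              # independent variable
rows = ["For", "Against", "No opinion"]     # dependent variable

# Frequency distribution of each variable on its own.
area_counts = Counter(area for area, _ in responses)
attitude_counts = Counter(attitude for _, attitude in responses)

# Joint frequencies: dependent variable given the independent variable.
cell_counts = Counter(responses)

# Lay the cells out with the independent variable across the columns
# and the column totals at the foot.
table = {r: [cell_counts[(c, r)] for c in columns] for r in rows}
column_totals = {c: sum(cell_counts[(c, r)] for r in rows) for c in columns}

print(" " * 12 + "".join(f"{c:>10}" for c in columns))
for r in rows:
    print(f"{r:<12}" + "".join(f"{n:>10}" for n in table[r]))
print(f"{'Total':<12}" + "".join(f"{column_totals[c]:>10}" for c in columns))
```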

28
Table 1. Attitudes toward Consolidation by Area
of Residence

Interpreting a Contingency Table
Inspect the contingency table for patterns.
(difficult if there are different totals of
observations in the different categories of the
independent variable)

29
Interpreting a Contingency Table

Convert the observations in each cell to a
percentage of the column total.
Be sure to still show the total number of
observations for each column on which the
percentages are based (N = total number per
column).
Compare the percentages across the categories of
the dependent variable (the rows).
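Percentaging down the columns can be sketched as below. The cell counts are hypothetical (they are not the figures from Table 1), and column_percentages is an illustrative helper name. Note that N is kept alongside the percentages for each column.

```python
# Hypothetical cell counts, keyed by category of the independent variable.
counts = {
    "City": {"For": 30, "Against": 12, "No opinion": 8},
    "Non-city": {"For": 20, "Against": 25, "No opinion": 15},
}

def column_percentages(table):
    """Convert each cell to a percentage of its column total, keeping N."""
    result = {}
    for column, cells in table.items():
        n = sum(cells.values())
        if n == 0:
            raise ValueError(f"column {column!r} has no observations")
        result[column] = {
            "N": n,
            "percent": {row: 100 * count / n for row, count in cells.items()},
        }
    return result

percentaged = column_percentages(counts)
for column, info in percentaged.items():
    shares = ", ".join(f"{row} {p:.0f}%" for row, p in info["percent"].items())
    print(f"{column} (N = {info['N']}): {shares}")
```

Because the two columns have different totals (50 and 60 here), comparing percentages row by row is more informative than comparing the raw counts.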

30
Percentaged Contingency Table (example)
Table 1b. Attitudes toward Consolidation by Area
of Residence
31
Interpreting a Contingency Table
Table 1. Attitudes toward Consolidation by Area
of Residence

Description: A higher percentage of city residents
(54%) than non-city residents (37%) are for
consolidation. Conversely, a higher percentage of
non-city residents (39%) than city residents
(19%) are against consolidation. About the same
percentage of both groups have no opinion about
consolidation.
32
Grouping categories (Collapsing categories): U.N.
example
Babbie, E. (1995). The practice of social
research. Belmont, CA: Wadsworth.
33
Collapsing Categories: omitting missing data
Babbie, E. (1995). The practice of social
research. Belmont, CA: Wadsworth.
34
Types of Relationships or Associations between
two variables

Correlation (or covariation)
When two variables vary together
A type of association
Not necessarily causal
Can be in the same direction (positive correlation
or direct relationship)
Can be in different directions (negative
correlation or inverse relationship)
Independence
No correlation, no relationship
Cases with a given value on one variable do not
tend to have any particular value on the other
variable

35
What is an association between two variables?

Can the value of one variable be predicted, if we
know the value of the other variable?
Example: Half the people participating in
training programs get a job. What is the
likelihood of any one participant getting a job?
About fifty-fifty. So we would not be very good
at predicting whether people will get jobs or
not.
If we introduce a second variable (e.g., length of
time in training), does it help us to be more
accurate in our predictions of the likelihood
that someone will get a job?

36
Two variables

Dependent variable: Obtaining a Job
No job: 100; Gets a job: 100
Independent variable: Length of Training Program
Short: 100; Long: 100

37
Bivariate Distribution--Perfect Positive
Relationship (If training is good for getting a
job)

If we know the length of the training program,
we can perfectly predict the likelihood of
getting a job. The longer the training program,
the more likely the participant is to get a job
and, conversely, the shorter the training program
the less likely the participant is to get a job.

38
Bivariate Distribution--Perfect Inverse
Relationship

If we know the length of the training program, we
can perfectly predict the likelihood of getting a
job. The longer the training program, the less
likely the participant is to get a job and,
conversely, the shorter the training program the
more likely the participant is to get a job. That
is, as the training program length increases,
likelihood of obtaining a job decreases.

39
Bivariate Distribution--No Relationship

(If training has no relationship with getting a
job)

50/50 guess. Knowing the length of the training
program does not help to predict the likelihood
of getting a job.
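The three cases above can be compared by asking how often we would guess correctly if, for each length of training, we always predicted the most common outcome. The slide tables themselves are not reproduced in this transcript, so the cell values below are assumptions chosen to fit the totals given earlier (100 short, 100 long, 100 with no job, 100 with a job); prediction_accuracy is an illustrative helper, not a standard statistic from the course text.

```python
def prediction_accuracy(table):
    """Share of cases guessed correctly when, within each category of the
    independent variable, we always predict its most frequent outcome."""
    correct = sum(max(outcomes.values()) for outcomes in table.values())
    total = sum(sum(outcomes.values()) for outcomes in table.values())
    return correct / total

# Assumed cell counts (columns: length of training; cells: job outcome).
perfect_positive = {
    "Short": {"No job": 100, "Gets a job": 0},
    "Long": {"No job": 0, "Gets a job": 100},
}
perfect_inverse = {
    "Short": {"No job": 0, "Gets a job": 100},
    "Long": {"No job": 100, "Gets a job": 0},
}
no_relationship = {
    "Short": {"No job": 50, "Gets a job": 50},
    "Long": {"No job": 50, "Gets a job": 50},
}

for name, table in [("perfect positive", perfect_positive),
                    ("perfect inverse", perfect_inverse),
                    ("no relationship", no_relationship)]:
    print(f"{name}: {prediction_accuracy(table):.0%} predicted correctly")
```

In both perfect relationships every case is predicted correctly; only the direction differs. With no relationship, knowing the length of training leaves us at the same fifty-fifty guess as before.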
40
Techniques for examining relationships between
two variables

Cross-tabulations or percentaged tables
Graphs, scattergrams or plots
Measures of association (e.g. correlation
coefficient, etc.)
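As one example of a measure of association, Pearson's correlation coefficient r runs from -1 (perfect inverse relationship) through 0 (no linear relationship) to +1 (perfect direct relationship). The sketch below computes it from its definition; the data (weeks of training and a made-up outcome score) are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return cross / math.sqrt(ss_x * ss_y)

weeks = [2, 4, 6, 8, 10]                       # hypothetical weeks of training

r_direct = pearson_r(weeks, [1, 2, 3, 4, 5])   # outcome rises with training
r_inverse = pearson_r(weeks, [5, 4, 3, 2, 1])  # outcome falls with training
r_none = pearson_r(weeks, [3, 1, 4, 1, 3])     # no linear pattern

print(round(r_direct, 2), round(r_inverse, 2), round(r_none, 2))
```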

41
Scattergram (Bivariate)
42
Interpreting a Relationship between two variables

Do the patterns in the tables mean that there is
a relationship between the two variables (in the
example, area of residence and attitude toward
consolidation)?
Is one's attitude about consolidation associated
with one's area of residence?
If there is a relationship, how strong is it? Are
the results statistically significant? Are the
results meaningfully significant?
In order to answer these questions, we must turn
to a set of statistics called Measures of
Association (next day).