Title: Crosstabulation
1Crosstabulation Plotting Data
- EDLD 6333 Statistical Reasoning
2Crosstabulation
- For use with 2 nominal variables, or variables with a SMALL number of possible values
- To look at the RELATIONSHIP between two variables with few possible values
- A crosstabulation is MORE than simply two frequency tables
- It examines combinations of values
3Income and Job Satisfaction
- Gssft.sav data file
- The variable income4 divides income into quartiles (only 4 groups)
- The variable jobsat has four categories
- Viewing two separate frequency tables tells you nothing about the relationship between the two variables
4A Crosstabulation
ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
In this case, satjob is the DV and income is the
IV. When selecting variables for a Crosstabs,
remember they must both be CATEGORICAL.
5Cell Display Subdialog Box
You should ask for percentages in the direction
of the IV (if you can tell which is the IV and which is the DV).
In this case, we need the Column percentages, because income is
the IV.
IV: the variable thought to influence another variable
DV: the variable influenced
6Crosstabs Output
Read down the columns: the percentages add to 100. Each
percentage expresses the number of cases in a cell as a
percentage of the column total. The TOTAL row and TOTAL column
display the same values as a frequency table would.
It appears the lowest-income people are less
likely than average to be very satisfied.
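The arithmetic behind column percentages can be sketched in a few lines of Python. This is only an illustration, not SPSS output: the cases below are invented, the labels "Q1" and "Q4" are placeholder income groups, and none of the numbers come from gssft.sav.

```python
from collections import Counter

# Hypothetical (income group, job satisfaction) cases, invented for illustration.
cases = [
    ("Q1", "very satisfied"), ("Q1", "somewhat satisfied"),
    ("Q1", "somewhat satisfied"), ("Q1", "dissatisfied"),
    ("Q4", "very satisfied"), ("Q4", "very satisfied"),
    ("Q4", "very satisfied"), ("Q4", "somewhat satisfied"),
]

def column_percentages(pairs):
    """Return {(iv, dv): percent of the iv column's cases that fall in that cell}."""
    cell_counts = Counter(pairs)                    # cases per (IV, DV) combination
    column_totals = Counter(iv for iv, _ in pairs)  # cases per IV category
    return {
        (iv, dv): 100.0 * n / column_totals[iv]
        for (iv, dv), n in cell_counts.items()
    }

pct = column_percentages(cases)
for (iv, dv), value in sorted(pct.items()):
    print(f"{iv:>2}  {dv:<20} {value:5.1f}%")
```

Because each cell is divided by its own column total, the percentages within one income group add to 100, which is why you read down the columns.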
7Graphical Displays
Compare bar lengths within each cluster and see whether the
patterns are the same across clusters.
8Stacked Bar with Scale
In the chart editor window select GALLERY → BAR → STACKED
Then under OPTIONS, make changes as shown below.
Comparisons are now made easier.
9Control Variables
- Adding another variable into the analysis might change the relationship you find
- Control variable: removing the effect of another variable
- In this example, we are controlling out gender
10Jobsat by income4 by gender
Sex is the layer variable. Income4 is still
the IV, so ask for column percentages. Here we
are looking at IV income and DV satjob,
controlling for gender.
11What did we find?
For women more than men, jobsat increases with
income.
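A layer variable amounts to building one crosstab per category of the control variable. The sketch below shows that idea in Python under stated assumptions: the records are invented so that the pattern differs by gender, and the labels "low" and "high" are placeholders, not the income4 categories or real gssft.sav results.

```python
from collections import Counter, defaultdict

# Hypothetical (gender, income group, satisfaction) records, invented for illustration.
records = [
    ("female", "low", "very satisfied"), ("female", "low", "not very satisfied"),
    ("female", "low", "not very satisfied"), ("female", "low", "not very satisfied"),
    ("female", "high", "very satisfied"), ("female", "high", "very satisfied"),
    ("female", "high", "very satisfied"), ("female", "high", "not very satisfied"),
    ("male", "low", "very satisfied"), ("male", "low", "not very satisfied"),
    ("male", "high", "very satisfied"), ("male", "high", "not very satisfied"),
]

def layered_column_percentages(rows):
    """Build one column-percentage table per layer (control) category."""
    cells = defaultdict(Counter)   # layer -> counts per (iv, dv) cell
    totals = defaultdict(Counter)  # layer -> counts per iv column
    for layer, iv, dv in rows:
        cells[layer][(iv, dv)] += 1
        totals[layer][iv] += 1
    return {
        layer: {cell: 100.0 * n / totals[layer][cell[0]] for cell, n in counts.items()}
        for layer, counts in cells.items()
    }

tables = layered_column_percentages(records)
for gender, table in tables.items():
    low = table[("low", "very satisfied")]
    high = table[("high", "very satisfied")]
    print(f"{gender}: very satisfied = {low:.0f}% (low income) vs {high:.0f}% (high income)")
```

Comparing the two tables side by side is what "controlling for gender" means here: the income and satisfaction relationship is examined separately within each gender.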
12Crosstabs Summary
- Looking at relationships between variables with a small number of possible values (categorical)
- Number of cases in a cell can be expressed as a percentage (in the direction of the IV)
- The variable that is influenced is the DV
- Layer variables control out the effects of other variables
- Bar charts are useful displays for categorical variables
- Later in the semester, we'll do tests of significance for such relationships
13Plotting Data
- To look at relationships between TWO numeric (interval/ratio) variables
- Graphical display shows values of two numeric variables
- Scatterplot, Sunflower Plots
- Scatterplot Matrix, Overlay Plots
- 3-D Scatterplot
- Bar charts, histograms, etc. displayed single
variables only (across groups)
14Why You Plot Data
- Before you do any statistical tests (like the ones we'll learn later) you should always plot the data first
- Scatterplots look for relationships and patterns between TWO variables
- For the following examples, we'll use the country.sav data file
15Life Expectancy and Birthrate
For a Simple scatterplot: the Y AXIS (vertical)
should be the DV; the X AXIS (horizontal) should be
the IV. You can give the scatterplot a title by
selecting Title.
16Resulting Scatterplot
PATTERN: As birthrate increases (see values
along X Axis), life expectancy decreases (values
along Y Axis). This is a NEGATIVE RELATIONSHIP. We don't know
yet whether it is significant, and it is not necessarily causal!
17Scatterplot Points
- We are looking for patterns
- Can visually see linear relationships
- From upper left to bottom right: NEGATIVE
- From bottom left to upper right: POSITIVE
- We are looking at COMBINATIONS of values (stem-and-leaf plots and histograms only look at individual values)
- You can label points by country
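The direction you see in a scatterplot can also be checked numerically. The sketch below uses the sign of the sample covariance, which is a technique swapped in for illustration and not something the slides or SPSS steps describe. The data pairs are invented, not taken from country.sav, and the result says nothing about significance or causation.

```python
# Hypothetical (birthrate, life expectancy) pairs, invented for illustration.
data = [(10, 78), (15, 75), (25, 68), (35, 60), (45, 52)]

def covariance(pairs):
    """Sample covariance of the x and y values in a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)

def direction(pairs):
    """Label the linear trend by the sign of the covariance."""
    cov = covariance(pairs)
    if cov > 0:
        return "positive"  # points run from bottom left to upper right
    if cov < 0:
        return "negative"  # points run from upper left to bottom right
    return "none"

print(direction(data))
```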
18Controlling Out Development
Here we are controlling out the effects of some
variable. You get a plot of 2 variables with a
3rd variable used to classify the points. Each country still
appears only once.
19Scatterplot Controlled for Dev.
Most of the developed nations cluster in the
upper left corner of the plot. You can see a
clear difference in the pattern for developing
vs. developed countries.
20Sunflower Plots
These are used to show the density of points:
whether there are overlapping or nearly
overlapping points. They give a visual representation of
how points cluster.
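The counting behind a sunflower plot can be sketched as follows: divide the plot into cells and count how many points land in each one, so that a cell with several points is drawn as a "sunflower" with that many petals. The points are invented, and the fixed-size square cells are an assumption made for illustration; the slides do not say how SPSS chooses its cells.

```python
from collections import Counter

# Hypothetical (x, y) points, invented for illustration.
points = [(1.1, 2.0), (1.2, 2.1), (1.3, 2.2), (4.8, 0.9), (5.0, 1.0), (9.5, 9.5)]

def sunflower_counts(pts, cell_size=1.0):
    """Count how many points fall in each square cell of side cell_size."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in pts)

counts = sunflower_counts(points)
for cell, n in sorted(counts.items()):
    print(f"cell {cell}: {n} point(s)")
```

Here the three nearly overlapping points share one cell, so that cell would be the only one drawn with multiple petals.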
21Scatterplot Matrices
To see how several variables relate to one
another. Shows scatterplots for all possible pairs of
variables.
22Resulting Matrix
You must look across, and up/down to find the
variable pairing.
Each cell is a scatterplot of a pair of
variables. Split this display diagonally, and the
same plots (w/ variables flipped) are on each
side. The strongest relationship is the tightest grouping:
birthrate/life expectancy.
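The layout of a scatterplot matrix can be sketched by listing the variable pairs. The variable names below are placeholders for the three variables discussed, not necessarily the names used in country.sav.

```python
from itertools import combinations, permutations

# Placeholder variable names (assumed, for illustration).
variables = ["birthrate", "lifeexp", "urban"]

# Each distinct pairing of variables, ignoring which one goes on which axis.
unique_pairs = list(combinations(variables, 2))

# Every off-diagonal cell of the matrix: (vertical-axis variable, horizontal-axis variable).
matrix_cells = list(permutations(variables, 2))

print(len(unique_pairs), "distinct pairs")
print(len(matrix_cells), "off-diagonal cells")
for a, b in unique_pairs:
    print(f"{a} vs {b} appears as cells ({a}, {b}) and ({b}, {a})")
```

With three variables there are three distinct pairs but six plots, because each pair shows up once on each side of the diagonal with its axes flipped.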
23Reading the Results
- Scan across a row or down an entire column
- Look up/down to find the variable on the horizontal axis
- Look right/left to find the variable on the vertical axis
- The strongest relationship was negative
- As birthrate decreases, life expectancy increases. The birthrate also decreases with increasing urbanization, but not as strongly
- Life expectancy and urbanization are positively related
- As urbanization increases, so does life expectancy
24Overlay Plot
To see 2 pairs of variables. Each country
represented twice.
Select pairs of variables to create overlay
scatterplots.
25Resulting Overlay Plot
W/ Lowess smooths
Variables in an overlay plot must be measured on
the same scale to make sense. Here you can see
that both death rates and birthrates decrease as
urbanization increases. Birthrates decline more
steeply.
26 3-D Scatterplot
A 3-D Plot will show the values of three
variables simultaneously. There are X, Y, and Z
Axes. Points are positioned w/in a 3-dimensional
box.
27Resulting 3-D Scatterplot
You have to read point values off three planes
now, so it's tricky. The plot presents the relationship among
three variables.
You could also insert spikes, or control by
status again.
28Identifying Unusual Points
- The data value for Bhutan stands out far from the rest
- Using the Point Selection Tool, select the point and view it in the Data Editor
- You realize that the value of urban MUST be wrong
- Norusis gives directions at the end of chapter 9 (p. 179) for how to change the data value for Urban from 95 to 5. Make sure to change this in your data file and save your changes.
29Where We Are
- We've still just been looking at data
- We've looked at categorical variables and continuous numerical variables
- We will learn tests that will quantify the significance and strength of relationships between variables in future chapters
- How can we use this with education data?
30Homework