Title: Sociology 690
1Sociology 690 Data Analysis
- Simple Quantitative
- Data Analysis
2Four Issues in Describing Quantity
- 1. Grouping/Graphing Quantitative Data
- 2. Describing Central Tendency
- 3. Describing Variation
- 4. Describing Co-variation
3 1. Grouping Quantitative Data
If there are a large number of quantitative
scores, one would not simply create a raw score
frequency distribution, as that would contain too
many unique scores and, therefore, not fulfill
the data reduction goal.
- Intervals and Real Limits
- Widths and midpoints
- Graphing grouped data
4Grouping Data - Intervals
- To group quantitative data, three rules are
followed:
- 1. Make the intervals no wider than the maximum
amount of information you are willing to lose.
- 2. Make the intervals in multiples of five.
- 3. Make the distribution intervals few enough to
be internalized at a glance.
5Grouping Data Intervals Example
- If these are the scores on a midterm:
- 9, 13, 18, 19, 22, 25, 31, 34, 35, 36, 36, 38, 41, 43, 44, 45
- The corresponding grouped frequency distribution
would look like this:
- i fi
- 01-10 1
- 11-20 3
- 21-30 2
- 31-40 6
- 41-50 4
- Total 16
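The grouping above can be reproduced with a short Python sketch. It assumes whole-number scores and stated intervals of width 10 that start at 1 (1-10, 11-20, and so on); the function name `group_scores` is illustrative, not part of the course material.

```python
from collections import Counter

# Midterm scores from the example above.
scores = [9, 13, 18, 19, 22, 25, 31, 34, 35, 36, 36, 38, 41, 43, 44, 45]
WIDTH = 10  # assumed interval width (a multiple of five)


def group_scores(values, width=WIDTH):
    """Return (lower, upper, frequency) rows for intervals 1-10, 11-20, ..."""
    # (v - 1) // width maps 1..10 to 0, 11..20 to 1, and so on.
    counts = Counter((v - 1) // width for v in values)
    rows = []
    for k in range(min(counts), max(counts) + 1):
        lower, upper = k * width + 1, (k + 1) * width
        rows.append((lower, upper, counts.get(k, 0)))
    return rows


grouped = group_scores(scores)
for lower, upper, f in grouped:
    print(f"{lower:02d}-{upper:02d} {f}")
print("Total", sum(f for _, _, f in grouped))
```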
6Grouping Data - Real Limits
- The stated intervals leave gaps between them
(for example, between 10 and 11), which implies
the need for real limits. The real limits of an
interval are one-half unit below the stated lower
limit and one-half unit above the stated upper
limit.
- For example:
- the interval 11-20 becomes 10.5-20.5
- the interval 3.5-4.5 becomes 3.45-4.55
7 Grouped Data Width and Midpoint
- The width of an interval is simply the difference
between the upper and lower real limits.
- e.g. 11-20: 20.5 - 10.5 = 10
- The midpoint is determined by calculating the
interval width, dividing it by 2, and adding that
number to the lower real limit.
- e.g. 10/2 + 10.5 = 15.5
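A minimal Python sketch of real limits, width, and midpoint follows. The `unit` parameter (the measurement unit of the stated limits: 1 for whole numbers, 0.1 for one decimal place) and the function names are assumptions introduced for illustration.

```python
def real_limits(lower, upper, unit=1.0):
    """Extend the stated limits by half a measurement unit on each side."""
    half = unit / 2
    return lower - half, upper + half


def interval_width(lower, upper, unit=1.0):
    """Width = upper real limit minus lower real limit."""
    lo, hi = real_limits(lower, upper, unit)
    return hi - lo


def interval_midpoint(lower, upper, unit=1.0):
    """Midpoint = lower real limit plus half the width."""
    lo, _ = real_limits(lower, upper, unit)
    return lo + interval_width(lower, upper, unit) / 2


print(real_limits(11, 20))        # (10.5, 20.5)
print(interval_width(11, 20))     # 10.0
print(interval_midpoint(11, 20))  # 15.5
# Limits stated to one decimal place use unit=0.1; rounded for display.
print(tuple(round(x, 2) for x in real_limits(3.5, 4.5, unit=0.1)))
```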
8Graphing Grouped Data
- A quantitative version of a bar graph is called
a histogram.
- When the frequencies are connected via a
line, it is called a frequency polygon.
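As a rough text-only sketch, the midterm distribution can be drawn as a histogram of `#` bars, and the frequency polygon can be listed as the (midpoint, frequency) points that a line would connect. A plotting library would draw the real graphs; the helper names here are hypothetical.

```python
# (stated interval, midpoint, frequency) for the midterm distribution.
table = [
    ("01-10", 5.5, 1),
    ("11-20", 15.5, 3),
    ("21-30", 25.5, 2),
    ("31-40", 35.5, 6),
    ("41-50", 45.5, 4),
]


def histogram_rows(rows):
    """One text bar per interval; bar length equals the frequency."""
    return [f"{label} {'#' * f}" for label, _, f in rows]


def polygon_points(rows):
    """Vertices of the frequency polygon: (midpoint, frequency)."""
    return [(mid, f) for _, mid, f in rows]


for line in histogram_rows(table):
    print(line)
print(polygon_points(table))
```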
9 2. Describing Central Tendency
But we can do more than simply create a frequency
distribution. We can also describe how these
observations bunch up and how they
distribute. Describing how they bunch up
involves measures of central tendency.
10Central Tendency - Modes
- The mode for raw data is simply the most frequent
score, e.g. 2,3,5,6,6,8. The mode is 6.
- The mode for grouped data is the midpoint of the
interval containing the highest frequency
(35.5 here).
- i fi
- 01-10 1
- 11-20 3
- 21-30 2
- 31-40 6
- 41-50 4
- Total 16
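Both kinds of mode can be sketched in Python. The sketch assumes whole-number stated limits (so the real limits sit 0.5 beyond them) and returns every tied value for raw data; the function names are illustrative.

```python
from collections import Counter


def mode_raw(values):
    """Return all values that share the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v in counts if counts[v] == top]


def mode_grouped(intervals):
    """Midpoint of the interval with the highest frequency.

    intervals: (stated lower, stated upper, frequency) rows,
    assuming whole-number limits (real limits are +/- 0.5).
    """
    lower, upper, _ = max(intervals, key=lambda row: row[2])
    return ((lower - 0.5) + (upper + 0.5)) / 2


midterm = [(1, 10, 1), (11, 20, 3), (21, 30, 2), (31, 40, 6), (41, 50, 4)]
print(mode_raw([2, 3, 5, 6, 6, 8]))  # [6]
print(mode_grouped(midterm))         # 35.5
```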
11Central Tendency - Medians
- The median for raw data is simply the score
at the middle position. This involves taking the
(N+1)/2 position and stating the value
attached to it.
- e.g. 2,3,5,6,8: (5+1)/2 = 3, the third position
score. The third position score is 5.
- e.g. 2,3,5,8: (4+1)/2 = 2.5, the 2.5 position
score. The 2.5 position score is (3+5)/2 = 4.
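The (N+1)/2 position rule can be sketched as follows. When the position falls halfway between two scores, the sketch averages them, as in the second example above; `median_raw` is an illustrative name.

```python
def median_raw(values):
    """Score at the (N + 1) / 2 position of the sorted values."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2  # 1-based position
    below = int(position)              # whole-number part of the position
    if position == below:
        return ordered[below - 1]
    # Position ends in .5: average the two neighbouring scores.
    return (ordered[below - 1] + ordered[below]) / 2


print(median_raw([2, 3, 5, 6, 8]))  # 5
print(median_raw([2, 3, 5, 8]))     # 4.0
```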
12Medians for Grouped Data
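- The median for grouped data is
L + ((N/2 - cf) / f) x w, where L is the lower real
limit of the interval containing the median, cf is
the cumulative frequency below that interval, f is
the frequency of that interval, and w is the
interval width.
- For our previous distribution of scores,
the answer would be
- 30.5 + ((16/2 - 6) / 6) x 10
- = 30.5 + 3.33 = 33.83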
- i fi
- 01-10 1
- 11-20 3
- 21-30 2
- 31-40 6
- 41-50 4
- Total 16
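The grouped median calculation can be sketched in Python using linear interpolation inside the interval that contains the N/2 position. It assumes whole-number stated limits unless a different `unit` is given; the function name is hypothetical.

```python
def median_grouped(intervals, unit=1.0):
    """Interpolated median for (stated lower, stated upper, frequency) rows."""
    n = sum(f for _, _, f in intervals)
    half = n / 2
    cumulative = 0  # frequency accumulated below the current interval
    for lower, upper, f in intervals:
        if f > 0 and cumulative + f >= half:
            real_lower = lower - unit / 2
            width = (upper + unit / 2) - real_lower
            return real_lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("empty distribution")


midterm = [(1, 10, 1), (11, 20, 3), (21, 30, 2), (31, 40, 6), (41, 50, 4)]
print(round(median_grouped(midterm), 2))  # 33.83
```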
13Central Tendency - Mean
- For raw data, the mean is simply the sum of the
values divided by N.
- Suppose Xi = 2,3,5,6
- The mean would be 16/4 = 4
14Means for Grouped Data
- For grouped data, the mean would be the sum of
the frequencies times midpoints for each
interval, that sum divided by N.
- For our previous distribution,
the answer would be
- i fi
- 01-10 1
- 11-20 3
- 21-30 2
- 31-40 6
- 41-50 4
- Total 16
- (1(5.5) + 3(15.5) + 2(25.5) + 6(35.5) + 4(45.5)) / 16
- = 498 / 16 = 31.125
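Both means can be checked with a few lines of Python. The grouped version takes (midpoint, frequency) pairs; the function names are illustrative.

```python
def mean_raw(values):
    """Sum of the values divided by N."""
    return sum(values) / len(values)


def mean_grouped(rows):
    """Sum of frequency x midpoint, divided by N, for (midpoint, frequency) rows."""
    n = sum(f for _, f in rows)
    return sum(f * midpoint for midpoint, f in rows) / n


midterm = [(5.5, 1), (15.5, 3), (25.5, 2), (35.5, 6), (45.5, 4)]
print(mean_raw([2, 3, 5, 6]))  # 4.0
print(mean_grouped(midterm))   # 31.125
```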
15 3. Describing Variation
- Range
- Mean Deviation
- Variance
- Standard Scores (Z score)
16Describing Variation - Range
- The range for raw scores is the highest minus the
lowest score, plus one (i.e. inclusive).
- The range for grouped scores is the upper real
limit of the highest interval minus the lower
real limit of the lowest interval. In the case
of our previous distribution this would be
- 50.5 - 0.5 = 50
- i fi
- 01-10 1
- 11-20 3
- 21-30 2
- 31-40 6
- 41-50 4
- Total 16
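A short Python sketch of both ranges, assuming whole-number scores and whole-number stated limits (so real limits are 0.5 beyond them). Applying the raw-score rule to the sixteen midterm scores is an added illustration, not a figure from the slides.

```python
def range_raw(values):
    """Inclusive range: highest minus lowest score, plus one."""
    return max(values) - min(values) + 1


def range_grouped(intervals):
    """Upper real limit of the highest interval minus
    lower real limit of the lowest interval."""
    lowest = min(lower for lower, _, _ in intervals)
    highest = max(upper for _, upper, _ in intervals)
    return (highest + 0.5) - (lowest - 0.5)


scores = [9, 13, 18, 19, 22, 25, 31, 34, 35, 36, 36, 38, 41, 43, 44, 45]
midterm = [(1, 10, 1), (11, 20, 3), (21, 30, 2), (31, 40, 6), (41, 50, 4)]
print(range_raw(scores))       # 37
print(range_grouped(midterm))  # 50.0
```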
17Describing Variation Mean Deviation
- The mean deviation is the sum of all deviations
from the mean, in absolute numbers, divided by N.
- Consider the set of observations 6,7,9,10. The
mean is 8 and the MD is
(|6-8| + |7-8| + |9-8| + |10-8|) / 4 = 6/4 = 1.5
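The mean deviation for raw data can be sketched in Python as follows; `mean_deviation` is an illustrative name.

```python
def mean_deviation(values):
    """Average absolute deviation from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


print(mean_deviation([6, 7, 9, 10]))  # 1.5
```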
18 Mean Deviation for Grouped Data
- Again, grouped data implies we substitute
frequencies and midpoints for values.
The mean would be 50,000 (satisfy yourself that
that is true) and, working in thousands, the MD
would be
(6|38-50| + 8|43-50| + 12|48-50| + 12|53-50| + 8|58-50| + 4|63-50|) / 50
= (72 + 56 + 24 + 36 + 64 + 52) / 50
= 304/50 = 6.080, x 1000 = 6,080
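The grouped calculation can be verified with a Python sketch. The (midpoint, frequency) pairs are read off the worked calculation above and are taken to be in thousands; that reading, and the function names, are assumptions.

```python
# (midpoint in thousands, frequency), as implied by the worked calculation.
rows = [(38, 6), (43, 8), (48, 12), (53, 12), (58, 8), (63, 4)]


def grouped_mean(pairs):
    """Sum of frequency x midpoint, divided by N."""
    n = sum(f for _, f in pairs)
    return sum(f * m for m, f in pairs) / n


def grouped_mean_deviation(pairs):
    """Frequency-weighted average absolute deviation from the grouped mean."""
    n = sum(f for _, f in pairs)
    mean = grouped_mean(pairs)
    return sum(f * abs(m - mean) for m, f in pairs) / n


print(grouped_mean(rows))                       # 50.0 (thousand)
print(round(grouped_mean_deviation(rows), 3))   # 6.08 (thousand)
```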
19Variation The Variance
- The variance for raw data is the sum of the
squared deviations divided by N.
- Consider the set Xi = 6,7,9,10. The mean is 8
and the variance is
((6-8)² + (7-8)² + (9-8)² + (10-8)²) / 4 = 10/4 = 2.5
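A minimal Python sketch of the raw-data variance, dividing by N as the definition above does (not by N - 1); the function name is illustrative.

```python
def variance_raw(values):
    """Sum of squared deviations from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


print(variance_raw([6, 7, 9, 10]))  # 2.5
```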
20Variance for Grouped Data
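- Frequencies and midpoints are still
substituted for the values of Xi.
Again the mean is 50 (in thousands) and the variance is
(6(38-50)² + 8(43-50)² + 12(48-50)² + 12(53-50)² + 8(58-50)² + 4(63-50)²) / 50
= (864 + 392 + 48 + 108 + 512 + 676) / 50
= 2600/50 = 52 in squared thousands, or
52 x 1000² = 52,000,000 in squared original units.
The standard deviation is the square root of the
variance: √52 ≈ 7.211 thousand, or about 7,211.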
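The grouped variance and standard deviation can be worked through in Python. The (midpoint, frequency) pairs are assumed to be in thousands, as in the calculation above. Running the arithmetic gives a weighted sum of squared deviations of 2600, so the variance is 2600/50 = 52 squared thousands and the standard deviation is about 7.211 thousand.

```python
import math

# (midpoint in thousands, frequency), as used in the calculation above.
rows = [(38, 6), (43, 8), (48, 12), (53, 12), (58, 8), (63, 4)]


def grouped_variance(pairs):
    """Frequency-weighted squared deviations from the grouped mean, over N."""
    n = sum(f for _, f in pairs)
    mean = sum(f * m for m, f in pairs) / n
    return sum(f * (m - mean) ** 2 for m, f in pairs) / n


variance = grouped_variance(rows)
std_dev = math.sqrt(variance)
print(variance)           # 52.0 (squared thousands)
print(round(std_dev, 3))  # 7.211 (thousands)
```

Because the midpoints are in thousands, the variance converts to original units by a factor of 1000 squared, while the standard deviation converts by a factor of 1000.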
21 4. Covariance and Correlation
- The Definition and Concept
- The Formula
- Proportional Reduction in Error and r²
22Correlation Definition and Concept
- Visually, we can observe the co-variation of
two variables in a scatter diagram, where the
abscissa and ordinate are the quantitative
continua and each point simultaneously
maps a pair of scores.
23Correlation - Formula
- Think of the correlation as a proportional
measure of the relationship between two
variables. It consists of the co-variation
divided by the average variation (the geometric
mean of the variation in each variable).
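One common way to write this ratio is Pearson's r: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The Python sketch below implements that form; the five data pairs are made up purely for illustration and are not from the course.

```python
import math


def pearson_r(xs, ys):
    """Co-variation divided by the geometric mean of the two variations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)


# Hypothetical paired scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 3))  # 0.775
```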
24Correlation and P.R.E.
Consider this scatter diagram. The variation
around the Y mean (variation before knowing X),
less the variation around the regression line
(variation after knowing X), taken as a
proportion of the variation around the Y mean,
is r².
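The proportional-reduction-in-error reading of r² can be sketched in Python by fitting a least-squares line and comparing the two variations directly. The data pairs are hypothetical and the function name is illustrative.

```python
def r_squared_pre(xs, ys):
    """(variation before knowing X - variation after) / variation before."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # least-squares regression slope
    intercept = mean_y - slope * mean_x
    # Variation around the Y mean (errors before knowing X).
    before = sum((y - mean_y) ** 2 for y in ys)
    # Variation around the regression line (errors after knowing X).
    after = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return (before - after) / before


# Hypothetical paired scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_squared_pre(xs, ys), 3))  # 0.6
```

For these pairs the variation around the Y mean is 6 and the variation around the regression line is 2.4, so knowing X reduces the error by 60 percent.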
25Partial Correlation
IV. Quantitative Statistical Example of
Elaboration
Step 1 Construct the zero-order
Pearson's correlations (r).
Assume rxy = .55, where x = divorce rates and y =
suicide rates.
Further, assume that the unemployment rate (z) is
our control variable and that rxz = .60 and ryz =
.40.
Step 2 Calculate the partial correlation
(rxy.z)
rxy.z = (rxy - rxz ryz) / √((1 - rxz²)(1 - ryz²))
= (.55 - (.60)(.40)) / √((1 - .36)(1 - .16)) = .42
Before z: (rxy)² = .30
After z: (rxy.z)² = .18
Step 3 Draw conclusions
Therefore, z accounts for (.30 - .18) or 12% of the
variation in y and (.12/.30) or 40% of the
relationship between x and y.
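The three steps can be reproduced with a short Python sketch using the standard first-order partial correlation formula. Squared correlations are rounded to two decimals before subtracting, mirroring the rounding in the example; the variable names are illustrative.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Step 1: zero-order correlations from the example.
r_xy, r_xz, r_yz = 0.55, 0.60, 0.40

# Step 2: partial correlation and the squared values before and after z.
r_xy_z = partial_correlation(r_xy, r_xz, r_yz)
before = round(r_xy ** 2, 2)   # rounded to two decimals, as in the example
after = round(r_xy_z ** 2, 2)

# Step 3: what z accounts for.
share_of_y = before - after
share_of_relationship = share_of_y / before

print(round(r_xy_z, 2), before, after)
print(round(share_of_y, 2), round(share_of_relationship, 2))
```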