Sociology 690 - PowerPoint PPT Presentation

About This Presentation

Title:

Sociology 690

Description:

Sociology 690 Data Analysis Simple Quantitative Data Analysis Four Issues in Describing Quantity 1. Grouping/Graphing Quantitative Data 2. – PowerPoint PPT presentation

Slides: 26

Provided by: Jeraldf4

Learn more at: http://www.csun.edu

Transcript and Presenter's Notes

Title: Sociology 690

1
Sociology 690 Data Analysis

Simple Quantitative
Data Analysis

2
Four Issues in Describing Quantity

1. Grouping/Graphing Quantitative Data
2. Describing Central Tendency
3. Describing Variation
4. Describing Co-variation

3
1. Grouping Quantitative Data
If there are a large number of quantitative
scores, one would not simply create a raw score
frequency distribution, as that would contain too
many unique scores and, therefore, not fulfill
the data reduction goal.

Intervals and Real Limits
Widths and midpoints
Graphing grouped data

4
Grouping Data - Intervals

To group quantitative data, three rules are
followed:
1. Make the interval width no greater than the
amount of information you are willing to lose.
2. Make the interval widths multiples of five.
3. Make the intervals few enough for the
distribution to be taken in at a glance.

5
Grouping Data - Intervals Example

If these are the scores on a midterm:
9, 13, 18, 19, 22, 25, 31, 34, 35, 36, 36, 38, 41, 43, 44, 45
the corresponding grouped frequency distribution
would look like:
i      fi
01-10  1
11-20  3
21-30  2
31-40  6
41-50  4
Total  16
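The grouping above can be reproduced with a short Python sketch. It assumes integer scores and stated intervals of width 10 starting at 1, as on this slide; the function name grouped_frequencies is hypothetical and used only for illustration.

```python
from collections import Counter

scores = [9, 13, 18, 19, 22, 25, 31, 34, 35, 36, 36, 38, 41, 43, 44, 45]


def grouped_frequencies(values, width=10):
    """Count integer scores per stated interval (1-10, 11-20, ...)."""
    # (v - 1) // width maps 1..10 to bin 0, 11..20 to bin 1, and so on.
    counts = Counter((v - 1) // width for v in values)
    table = []
    for k in range(min(counts), max(counts) + 1):
        lower, upper = k * width + 1, (k + 1) * width
        table.append((lower, upper, counts.get(k, 0)))
    return table


freq_table = grouped_frequencies(scores)
for lower, upper, f in freq_table:
    print(f"{lower:02d}-{upper:02d}  {f}")
print("Total ", sum(f for _, _, f in freq_table))
```

Each row of freq_table is a (stated lower limit, stated upper limit, frequency) tuple, so the same structure can feed later calculations on grouped data.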

6
Grouping Data - Real Limits

The stated intervals leave gaps between them
(e.g. between 10 and 11), which implies the need
for real limits. The real limits of an interval
lie one-half unit of measurement below the stated
lower limit and one-half unit above the stated
upper limit.
For example:
the interval 11-20 becomes 10.5-20.5
the interval 3.5-4.5 becomes 3.45-4.55

7
Grouped Data - Width and Midpoint

The width of an interval is simply the difference
between the upper and lower real limits.
e.g. 11-20 → 20.5 - 10.5 = 10
The midpoint is determined by calculating the
interval width, dividing it by 2, and adding that
number to the lower real limit.
e.g. 10/2 + 10.5 = 15.5
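Real limits, width and midpoint can be sketched as three small Python helpers. The function names and the unit parameter (the unit of measurement, 1 for whole-number scores) are assumptions made for this illustration.

```python
def real_limits(lower, upper, unit=1.0):
    """Extend the stated limits by half a unit of measurement on each side."""
    half = unit / 2
    return lower - half, upper + half


def interval_width(lower, upper, unit=1.0):
    """Width = upper real limit minus lower real limit."""
    lo, hi = real_limits(lower, upper, unit)
    return hi - lo


def interval_midpoint(lower, upper, unit=1.0):
    """Midpoint = lower real limit plus half the width."""
    lo, _ = real_limits(lower, upper, unit)
    return lo + interval_width(lower, upper, unit) / 2


print(real_limits(11, 20))        # (10.5, 20.5)
print(interval_width(11, 20))     # 10.0
print(interval_midpoint(11, 20))  # 15.5
```

For scores recorded to one decimal place, passing unit=0.1 gives real limits of roughly 3.45 and 4.55 for the interval 3.5-4.5, subject to ordinary floating-point rounding.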

8
Graphing Grouped Data

A quantitative version of a bar graph is called
a histogram.
When the frequencies, plotted at the interval
midpoints, are connected by a line, it is called
a frequency polygon.

9
2. Describing Central Tendency
But we can do more than simply create a frequency
distribution. We can also describe how these
observations bunch up and how they
distribute. Describing how they bunch up
involves measures of

Modes
Medians
Means
Skew

10
Central Tendency - Modes

The mode for raw data is simply the most frequent
score, e.g. 2, 3, 5, 6, 6, 8: the mode is 6.
The mode for grouped data is the midpoint of the
interval containing the highest frequency
(35.5 here).

i      fi
01-10  1
11-20  3
21-30  2
31-40  6
41-50  4
Total  16
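Both versions of the mode can be sketched in Python. The helper names are hypothetical, and the grouped version assumes whole-number scores, so real limits sit 0.5 beyond the stated limits.

```python
from collections import Counter


def raw_mode(values):
    """Return every score that shares the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


def grouped_mode(table):
    """Midpoint of the interval with the highest frequency.

    table rows are (stated lower, stated upper, frequency).
    If several intervals tie, the first one is used.
    """
    lower, upper, _ = max(table, key=lambda row: row[2])
    return ((lower - 0.5) + (upper + 0.5)) / 2


table = [(1, 10, 1), (11, 20, 3), (21, 30, 2), (31, 40, 6), (41, 50, 4)]
print(raw_mode([2, 3, 5, 6, 6, 8]))  # [6]
print(grouped_mode(table))           # 35.5
```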
11
Central Tendency - Medians

The median for raw data is simply the score
at the middle position. This involves taking the
(N+1)/2 position and stating the value
attached to it.
e.g. 2, 3, 5, 6, 8: (5+1)/2 → the third position
score.
The third position score is 5.
e.g. 2, 3, 5, 8: (4+1)/2 → the 2.5 position
score.
The 2.5 position score is (3+5)/2 = 4.
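The (N+1)/2 position rule can be written out as a small Python sketch. The function name raw_median is hypothetical, and the input is assumed to be non-empty.

```python
def raw_median(values):
    """Median via the (N + 1) / 2 position rule (positions are 1-based)."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2
    below = ordered[int(position) - 1]
    if position.is_integer():
        return below
    # A half position means averaging the two neighbouring scores.
    above = ordered[int(position)]
    return (below + above) / 2


print(raw_median([2, 3, 5, 6, 8]))  # 5
print(raw_median([2, 3, 5, 8]))     # 4.0
```

Python's standard library function statistics.median follows the same rule and can be used instead.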

12
Medians for Grouped Data

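The median for grouped data is
Median = L + ((N/2 - cf) / f) × w
where L is the lower real limit of the interval
containing the median, cf is the cumulative
frequency below that interval, f is the frequency
within it, and w is the interval width.
For our previous distribution of scores,
the answer would be
30.5 + ((16/2 - 6)/6) × 10
= 30.5 + 3.33 = 33.83

i      fi
01-10  1
11-20  3
21-30  2
31-40  6
41-50  4
Total  16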
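A Python sketch of the grouped median follows. It assumes the usual interpolation within the median interval (lower real limit, plus the share of the interval needed to reach the N/2-th case, times the width), which matches the worked numbers on this slide. The function name and the unit parameter are hypothetical.

```python
def grouped_median(table, unit=1.0):
    """Interpolated median for rows of (stated lower, stated upper, frequency)."""
    n = sum(f for _, _, f in table)
    half = n / 2
    cumulative = 0  # frequency below the current interval
    for lower, upper, f in table:
        if f > 0 and cumulative + f >= half:
            lower_real = lower - unit / 2
            width = (upper + unit / 2) - lower_real
            return lower_real + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequency table is empty")


table = [(1, 10, 1), (11, 20, 3), (21, 30, 2), (31, 40, 6), (41, 50, 4)]
median = grouped_median(table)
print(round(median, 2))  # 33.83
```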
13
Central Tendency - Mean

For raw data, the mean is simply the sum of the
values divided by N.
Suppose Xi = 2, 3, 5, 6.
The mean would be 16/4 = 4.

14
Means for Grouped Data

For grouped data, the mean is the sum of
the frequencies times the midpoints of each
interval, that sum divided by N.
For our previous distribution,
the answer would be
i      fi
01-10  1
11-20  3
21-30  2
31-40  6
41-50  4
Total  16
(1(5.5) + 3(15.5) + 2(25.5) + 6(35.5) + 4(45.5)) / 16
= 498 / 16 = 31.125
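The grouped mean can be checked with a few lines of Python. This is a sketch assuming whole-number scores, so each midpoint is the average of the stated limits; the variable names are chosen for illustration.

```python
table = [(1, 10, 1), (11, 20, 3), (21, 30, 2), (31, 40, 6), (41, 50, 4)]

n = sum(f for _, _, f in table)
# Midpoint of each interval: (lower real limit + upper real limit) / 2,
# which equals (stated lower + stated upper) / 2.
weighted_sum = sum(f * (lower + upper) / 2 for lower, upper, f in table)
grouped_mean = weighted_sum / n

print(weighted_sum)  # 498.0
print(grouped_mean)  # 31.125
```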

15
3. Describing Variation

Range
Mean Deviation
Variance
Standard Scores (Z score)

16
Describing Variation - Range

The Range for raw scores is the highest minus the
lowest score, plus one (i.e. inclusive).
The Range for grouped scores is the upper real
limit of the highest interval minus the lower
real limit of the lowest interval. In the case
of our previous distribution this would be
50.5 - 0.5 = 50

i      fi
01-10  1
11-20  3
21-30  2
31-40  6
41-50  4
Total  16
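Both range definitions can be sketched in Python using the midterm scores from the earlier slide. The variable names are hypothetical, and whole-number scores (a measurement unit of 1) are assumed.

```python
scores = [9, 13, 18, 19, 22, 25, 31, 34, 35, 36, 36, 38, 41, 43, 44, 45]
table = [(1, 10, 1), (11, 20, 3), (21, 30, 2), (31, 40, 6), (41, 50, 4)]
unit = 1.0

# Raw scores: highest minus lowest, plus one (inclusive range).
raw_range = max(scores) - min(scores) + 1

# Grouped scores: upper real limit of the highest interval
# minus lower real limit of the lowest interval.
lowest_stated = table[0][0]
highest_stated = table[-1][1]
grouped_range = (highest_stated + unit / 2) - (lowest_stated - unit / 2)

print(raw_range)      # 37
print(grouped_range)  # 50.0
```

The grouped range is wider than the raw range because grouping discards where scores fall inside the outermost intervals.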
17
Describing Variation - Mean Deviation

The mean deviation is the sum of all deviations
from the mean, in absolute numbers, divided by N.
Consider the set of observations 6, 7, 9, 10. The
mean is 8 and the MD is
(|6-8| + |7-8| + |9-8| + |10-8|) / 4 = 6/4 = 1.5
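As a quick check, the mean deviation can be written as a short Python function. The name mean_deviation is hypothetical.

```python
def mean_deviation(values):
    """Average absolute deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(v - mean) for v in values) / n


md = mean_deviation([6, 7, 9, 10])
print(md)  # 1.5
```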

18
Mean Deviation for Grouped Data

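Again, grouped data implies we substitute
frequencies and midpoints for values. Here the
midpoints are 38, 43, 48, 53, 58 and 63 (in
thousands), with frequencies 6, 8, 12, 12, 8 and
4 (N = 50).

The mean would be 50,000 (satisfy yourself that
that is true) and the MD would be
(6|38-50| + 8|43-50| + 12|48-50| + 12|53-50|
+ 8|58-50| + 4|63-50|) / 50
= (72 + 56 + 24 + 36 + 64 + 52) / 50
= 304/50 = 6.080; × 1000 = 6,080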
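The grouped mean deviation, and the claim that the mean is 50 (thousand), can be checked with a Python sketch. The midpoints and frequencies are read off the calculation on this slide; the variable names are chosen for illustration.

```python
midpoints = [38, 43, 48, 53, 58, 63]  # in thousands
freqs = [6, 8, 12, 12, 8, 4]

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
# Each interval contributes its frequency times the absolute
# distance of its midpoint from the mean.
abs_dev_sum = sum(f * abs(m - mean) for f, m in zip(freqs, midpoints))
md = abs_dev_sum / n

print(n, mean)      # 50 50.0
print(abs_dev_sum)  # 304.0
print(md * 1000)    # about 6080 in original units
```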
19
Variation - The Variance

The variance for raw data is the sum of the
squared deviations from the mean divided by N.
Consider the set Xi = 6, 7, 9, 10. The mean is 8
and the variance is
((6-8)^2 + (7-8)^2 + (9-8)^2 + (10-8)^2) / 4 = 10/4 = 2.5
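The raw-data variance can be verified in Python. Because the slide divides by N rather than N - 1, the matching standard library function is statistics.pvariance (population variance), not statistics.variance.

```python
import statistics

values = [6, 7, 9, 10]
n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean, divided by N.
variance = sum((v - mean) ** 2 for v in values) / n

print(variance)                      # 2.5
print(statistics.pvariance(values))  # same definition, dividing by N
```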

20
Variance for Grouped Data

Frequencies and midpoints are still
substituted for the values of Xi.

Again the mean is 50 and the Variance is
(6(38-50)^2 + 8(43-50)^2 + 12(48-50)^2 + 12(53-50)^2
+ 8(58-50)^2 + 4(63-50)^2) / 50
= (864 + 392 + 48 + 108 + 512 + 676) / 50
= 2600 / 50 = 52 (in thousands squared).
The Standard Deviation is the square root of this:
about 7.21, or 7,211 when multiplied by 1000.
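The grouped variance and standard deviation can be computed directly from the frequencies and midpoints in a Python sketch. Evaluated this way, the six weighted squared deviations are 864, 392, 48, 108, 512 and 676, which sum to 2600 and give a variance of 52 in thousands squared. The variable names are chosen for illustration.

```python
import math

midpoints = [38, 43, 48, 53, 58, 63]  # in thousands
freqs = [6, 8, 12, 12, 8, 4]

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Frequency-weighted squared deviations of each midpoint from the mean.
terms = [f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)]
variance = sum(terms) / n
std_dev = math.sqrt(variance)

print(terms)              # [864.0, 392.0, 48.0, 108.0, 512.0, 676.0]
print(variance)           # 52.0
print(round(std_dev, 2))  # 7.21
```

Note that rescaling to original units multiplies the standard deviation by 1000 but the variance by 1000 squared.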
21
4. Covariance and Correlation

The Definition and Concept
The Formula
Proportional Reduction in Error and r2

22
Correlation - Definition and Concept

Visually, we can observe the co-variation of
two variables in a scatter diagram, where the
abscissa and ordinate are the two quantitative
continua and each point simultaneously maps
one pair of scores.

23
Correlation - Formula

Think of the correlation as a proportional
measure of the relationship between two
variables. It consists of the co-variation
divided by the average (geometric mean) of the
two variables' own variation.
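The formula itself did not survive in this transcript. The Python sketch below assumes it is Pearson's r: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The data are hypothetical and used only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Co-variation divided by the geometric mean of the two variations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)


# Hypothetical paired scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3))  # 0.775
```

Dividing by N in both numerator and denominator would give covariance over the product of standard deviations, which is the same value.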

24
Correlation and P.R.E.
Consider a scatter diagram with a regression line.
The variation around the Y mean (variation before
knowing X), less the variation around the
regression line (variation after knowing X),
expressed as a proportion of the variation around
the Y mean, is r^2.
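The proportional-reduction-in-error reading of r^2 can be checked numerically. The Python sketch below fits a least-squares regression line to hypothetical data and compares the reduction in variation with the squared correlation; all names and data are assumptions for illustration.

```python
import math

# Hypothetical paired scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
variation_x = sum((x - mean_x) ** 2 for x in xs)

# Least-squares regression line: y = intercept + slope * x.
slope = co_variation / variation_x
intercept = mean_y - slope * mean_x

# Variation around the Y mean (errors before knowing X).
error_before = sum((y - mean_y) ** 2 for y in ys)
# Variation around the regression line (errors after knowing X).
error_after = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

pre = (error_before - error_after) / error_before
r = co_variation / math.sqrt(variation_x * error_before)

print(round(pre, 3), round(r ** 2, 3))  # 0.6 0.6
```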
25
Partial Correlation
IV. Quantitative Statistical Example of
Elaboration
Step 1: Construct the zero-order Pearson's
correlations (r).
Assume rxy = .55, where x = divorce rates and
y = suicide rates.
Further, assume that unemployment rates (z) is
our control variable and that rxz = .60 and
ryz = .40.
Step 2: Calculate the partial correlation
(rxy.z)
rxy.z = (rxy - (rxz)(ryz)) / sqrt((1 - rxz^2)(1 - ryz^2))
= (.55 - (.60)(.40)) / sqrt((1 - .36)(1 - .16))
= .31 / .733 = .42
Step 3: Draw conclusions
Before z: (rxy)^2 = .30
After z: (rxy.z)^2 = .18
Therefore, Z accounts for (.30 - .18) or 12% of Y
and (.12/.30) or 40% of the relationship between
X and Y.
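The three steps can be reproduced in a Python sketch. It assumes the standard first-order partial correlation formula, which yields the .42 reported on the slide; the function name partial_r is hypothetical. The squared values are rounded to two decimals before subtracting, as the slide does.

```python
import math


def partial_r(rxy, rxz, ryz):
    """First-order partial correlation of x and y, controlling for z."""
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rxy, rxz, ryz = 0.55, 0.60, 0.40  # zero-order correlations from Step 1
rxy_z = partial_r(rxy, rxz, ryz)

before = round(rxy ** 2, 2)   # variation in Y explained before controlling z
after = round(rxy_z ** 2, 2)  # variation explained after controlling z
accounted_for = before - after
share_of_relationship = accounted_for / before

print(round(rxy_z, 2))                  # 0.42
print(before, after)                    # 0.3 0.18
print(round(accounted_for, 2))          # 0.12
print(round(share_of_relationship, 2))  # 0.4
```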
